Scientific analysis and visualization is better with open source | Opensource.com

Posted 11 Feb 2014 by Jason Baker (Red Hat)

Marcus Hanwell is a physicist by training, but his background in science led him down a different path than most researchers. Today he is a contributor to a number of open source projects aimed at helping the scientific community better analyze and visualize its data. If you've got a question about finding the right open source tool for a scientific application, Marcus can point you in the right direction.

At Opensource.com, Marcus is a member of the Community Moderator program. He writes about everything from specific tools to open access of scientific publications to event recaps. You might also find him in the comments helping drive discussion.

In addition to tools specifically oriented toward scientific research, Marcus frequently uses a number of more general-purpose open source development tools and workflow applications to make his life easier. Learn more about how Marcus uses these tools in his life in this Community Spotlight interview.

The Basics

  • Name: Marcus D. Hanwell
  • Opensource.com username: mhanwell
  • Location: Clifton Park, New York, United States
  • Occupation/Employer/Position: Technical Leader at Kitware, Inc.
  • Open source connection: Originally Gentoo, then KDE and now too many to mention (open science)
  • Favorite open source tool or application: KDE and much of what it contains! I love Kile when editing LaTeX, Qt Creator for coding, and Firefox for browsing the web
  • Favorite opensource.com channel: Life (although I would love to see a science channel!)

Open up to us.

I live in Rexford, NY, very close to Clifton Park, NY where I am a Technical Leader at the headquarters of Kitware, Inc. I have a B.Sc. and PhD in Physics from the University of Sheffield, UK, with a focus on experimental nanomaterials engineering. I spent three months in Silicon Valley back in 2002, in a program to show physics students how startups work. When I started my PhD, I was intent on becoming a research scientist, pushing back the boundaries of science. I thought I might even have a Nobel prize by now!

My open source journey began back in about 1996 or 1997, when I downloaded a Red Hat Linux installer and started trying to get a graphical environment working on my machine. I was frustrated by how much money Microsoft thought I should pay just to get a compiler and experiment with writing software. Later, I built an AMD64 machine and got into Gentoo Linux; I became a packager in 2004 focusing on porting and scientific applications (along with some KDE stuff too). I was featured as developer of the week in 2005, but became interested in more actively creating software rather than packaging/porting.

I have always been passionate about visualizing scientific data, and during my PhD discovered just how difficult (and expensive) it could be to simply draw a molecule and produce an image. At first I was going to start my own project, but then started looking around and found Google Summer of Code with the perfect project proposed as part of KDE. I applied, was accepted, and the rest is history. I took a postdoc at the University of Pittsburgh, but in the end decided academia wasn’t the place for me. When I found out about this small company that would pay me to work on open scientific software I was very excited; I started with Kitware as an R&D Engineer in 2009. Virtually everything I do is open, with the code I write usually being licensed 3-clause BSD and the documents CC-BY! I get to work with some of the best scientists, researchers and engineers in the world on interesting and challenging problems, and am always looking for more collaborators with exciting projects.

What open tools and data help you get things done, and how do they help you?

My day-to-day work is scientific data visualization and analysis, and to make this happen I use a large array of open tools. There is OpenGL for cross-platform, hardware-accelerated rendering and Qt as a graphical toolkit. I compile the code with GCC and, more recently, Clang, develop in Qt Creator, and often use its GDB integration. I make extensive use of CMake to build the projects I work on, and CTest to drive the test framework along with CDash to aggregate the results. Google Test augments what is available in Qt to make test-driven development easier, and I use both along with image difference regressions developed at Kitware to verify rendered images are consistent across platforms, graphics cards, and architectures. I use Git for distributed revision control, and Gerrit for online code review, although I am looking at alternatives as it seems upstream may never integrate support for topic branches.
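For readers unfamiliar with image difference regressions, the idea can be sketched roughly as follows. This is only an illustrative Python sketch, not the testing code developed at Kitware: the function names `image_difference` and `images_match`, the plain list-of-tuples image representation, and the default threshold are all assumptions made for this example.

```python
# Illustrative sketch only: the names, the image representation, and the
# threshold are hypothetical, not taken from any Kitware tool.

def image_difference(baseline, candidate):
    """Return the mean absolute per-channel difference between two images.

    Each image is a list of rows, and each row is a list of (r, g, b)
    tuples with channel values in the range 0-255.
    """
    same_height = len(baseline) == len(candidate)
    same_widths = all(len(a) == len(b) for a, b in zip(baseline, candidate))
    if not (same_height and same_widths):
        raise ValueError("images must have identical dimensions")

    total = 0
    count = 0
    for row_a, row_b in zip(baseline, candidate):
        for pixel_a, pixel_b in zip(row_a, row_b):
            for channel_a, channel_b in zip(pixel_a, pixel_b):
                total += abs(channel_a - channel_b)
                count += 1
    return total / count if count else 0.0


def images_match(baseline, candidate, threshold=2.0):
    """Treat the render as passing if the mean difference is small.

    A small tolerance (hypothetical default) allows for minor variation
    between platforms and graphics hardware.
    """
    return image_difference(baseline, candidate) <= threshold
```

In a test, a stored baseline image would be compared against a freshly rendered one, and the test would fail when `images_match` returns `False`. Using a tolerance rather than exact equality is a design choice that accepts small rendering differences while still catching real regressions.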

Some of these tools I only use, and others I both use and contribute to. The Visualization Toolkit (VTK) is something I hadn’t really used much before I moved to Kitware, but I am now one of its core developers. I use it in many of the projects I work on, and have also spent a lot of time working on it. The Open Chemistry project I lead at Kitware makes extensive use of CMake, Qt, OpenGL, VTK, Google Test, GLEW, and a slew of other things needed to develop, debug, and deploy the code. I also need open data in order to test and demonstrate what I do, and this can be a little tougher at times. NWChem was open sourced a few years ago, and the Clean Energy Project is a good source of some of this data. Many of our collaborators are able to offer openly licensed content, but I would like to see this situation improve, and this is something I spend time advocating for.

What do you wish were more open?

Science! I do what I do because the reality of research really didn’t match my expectations when I went into this. Publicly funded research is often done using closed source tools, and the data, workflow, analysis, and software used are frequently entirely closed. It should all be open to scrutiny; more than that, it should all be open to being built upon and extended. There are papers I wrote during my PhD and postdoctoral research that I cannot access, even though most of the funding that went to pay for the labs, my salary, and everything else came from public funds. I think that is a great injustice. I was told that the publishing industry was adding value somewhere, but frankly I don’t see what value it adds in the modern age.

What are the biggest challenges to openness that you encounter, either at work or in your life?

I think the biggest challenges come from the publishing behemoths clinging to an outdated business model that no longer serves science very well. The problem is that they seem to have forgotten their mission to make public the fruits of scientific research (i.e., to publish); they are entrenched and have well-established funding mechanisms with deep pockets for lobbying, lawyers, and advertising. I think in the modern age, where so many of us are fortunate enough to have high-bandwidth connections to a huge network of shared knowledge, we should find business models where things are shared almost effortlessly. The biggest challenge in that is finding ways to fund this work and build teams of developers who can continue to work and innovate.

I still want to change the world; I still want to leave it better than I found it. I don’t think the way to do that is in software licensing, and I think intellectual property is possibly one of the most detrimental concepts ever invented by lawyers, one that mainly serves to line their pockets and keep monopolies in power. I would like to see intellectual property laws reformed to serve the public, rather than the corporations. I would also like to see more of the funding from government agencies going to fund sustainable software for science that is liberally licensed, so that a much greater cross-section of scientists can benefit (along with the general public). More than anything I want to see research transform into a field where sharing is the default, and if you want to publish results that you claim are science then all steps should be shared and open to question. I think open source borrowed a lot from the ideals of science, and now I would like to see science borrow some of the innovations that open source has made!

Why choose the open source way?

I think at the core it is all about removing barriers and reducing reinvention of the wheel. Another thing I learned while doing my degree, PhD, and postdoc, is that scientists tend to write fairly bad code without all of the rigor you might hope for. There are major codes out there that do not even use version control. I would never mention names, but I have sat in meetings where people have argued against the use of version control! I really want to work on improving that, ideally finding ways to get into universities to teach students about software development and all of the processes that exist around it.

I think we need to break down the walls of all the black boxes and teach people how to look inside of them! It doesn’t matter whether you are just starting out, have a PhD, work for Google/IBM/Facebook/GE, or invented C++. We are all fallible and can learn from one another. The open source way is about the only way I have seen that really enables that. If I want to write a molecular editor, it is a lot easier if I can inspect what others did before me. What kind of sorry state would we all be in if Erwin Schrödinger had written a small proprietary piece of code that described the state of a quantum system, rather than sharing the Schrödinger equation and its full derivation? Computer code is quickly replacing mathematics as one of the main tools for discovery, but it is often not held to the same standard as mathematical proofs in scientific publication.

Jason is passionate about using technology to make the world more open, from software development to bringing sunlight to local governments. He is particularly interested in data visualization/analysis, DIY/maker culture, simulations/modeling, geospatial technologies, and cloud computing, especially OpenStack. Follow him on Twitter or Google+.