My journey from bench scientist to open science software developer and how I develop better tools for open, reproducible scientific research.
The early years
When I was at school, computers were only really just beginning to show their promise and few people had Internet access. I remember begging my Mum for a ZX Spectrum and using it to write basic code to draw things on the screen. From then on I was hooked, but didn’t really know if there were careers programming computers, and it wasn’t at all clear whether this was of any use if I wanted to do scientific research. As I moved to a much faster Amiga 500 Plus, I continued to enjoy programming as a hobby and loved writing simulations to understand mathematics and physical phenomena.
The first Windows-based machine I really used (Windows 95) seemed clunky and very closed. I remember learning about Linux around then, and I installed Red Hat Linux on an old PC in 1996 or so. The installation was pretty rough in those days and involved multiple steps, but I learned a lot. When I was done I also had a full development environment and could write and compile code. I worked for a small company for a few years, mainly maintaining a Microsoft Access database, but I was also tasked with getting our services online. This was my first exposure to a Linux host on the web, and the power of Linux servers was quite clear to me even if the command line felt a little daunting.
Back to school
In 2000, I decided I wanted to go back to school and take my passion for physics further. I applied to the University of Sheffield and started in September of that year. I was disappointed that there were no joint degrees with computer science available, but figured I could teach myself more in my spare time. It was during my degree that I started contributing to my first open source project: Gentoo Linux. It was an exciting community, and I had just built an AMD64-based PC and wanted to help get things working on this new 64-bit architecture. I also wanted to learn more about the visualization and analysis software available for Linux, especially as it related to science. I spent quite some time on the science and amd64 teams, and I learned a great deal about porting and packaging, along with the intricacies of satisfying dependencies and all the possible build options many packages had.
I figured it was a great hobby, but as I neared the end of my BSc in Physics I decided that I would like to pursue a career in research. I loved the idea of discovering new things, sharing my research with the world, and hopefully leaving the world a better place than I found it. I started a PhD in the nanomaterials engineering group in physics at the University of Sheffield in 2003, and one thing that struck me was how primitive the software we used was, especially the control and data acquisition software used to run experiments. During my PhD, I wrote software to control some of our lab equipment. I also found flaws in other software, but its source code was not available. I ended up writing a C++ program to replace an Excel spreadsheet that had produced interesting analysis but was very difficult to modify. Several pieces of software we used were free but came without source code, and others were very expensive and limited in scope.
Discovering the open source way
In the final year of my PhD, I decided that I wanted to go beyond packaging and write some original code to solve a problem I had. I knew about the Google Summer of Code program, talked to my supervisor, and we agreed I would apply and spend the summer writing code if accepted. I found a project under the KDE umbrella to develop a molecular editor. It was absolutely perfect and just what I was hoping to do: actually edit molecules on Linux, visualize their structure, and save them out for later use. I applied and was overjoyed to be accepted. This was a pivotal moment for me, when I was exposed to people doing this in their spare time and others doing it professionally. I think this is where I realized I was more passionate about improving the tools for scientists to collect, store, search, analyze, and visualize their data than I was about doing scientific research. This was also the point at which I began learning about the deficiencies in scientific publication and the reliance on certain citation metrics that didn’t make a lot of sense to me.
Through much of my undergraduate and graduate studies I had been running a one-man consultancy business, which centered on using Linux to set up email, firewall, web, and other services for local businesses. It was successful, but not really challenging enough. After getting a taste for C++, OpenGL, and scientific data visualization, I decided that I wanted to do more of this. As it happened, one of the people mentoring my Google Summer of Code project was starting a new research group, and he offered me a two-year postdoctoral fellowship at the University of Pittsburgh in the Chemistry department. I was also very fortunate at this point to be more deeply immersed in several open source communities; during my time there I would meet several others.
A career in open source
Almost immediately after moving to the US, I was flown back to the UK for a meeting about tools for computational chemistry at the Daresbury Laboratory, and it was there that I met a wider community of scientists interested in approaches to working with computational chemistry codes. During my postdoctoral work, I had the opportunity to continue some of the open source work I had done, as well as work on some new software for data acquisition and some simulation code looking at the roles of defects in electronic transport. I enjoyed my postdoctoral work, but in many ways it served to solidify in my mind that I needed to find a career where I could work with scientists to enable their research, and I became more passionate about open access, open source, open data, and open standards. Above all, I wanted to be a part of the solution, to help scientific researchers use software to enable reproducibility, and to get back to showing all of the working.
I met one of the partners from Kitware at an open source conference and mentioned I was beginning to look for a job. It was then that I realized Kitware does a lot more than just CMake, and I agreed to apply for a position. I have been here for nearly five years now, and I have continued to develop open source software with some of the best researchers in the world. I am fortunate to have served as the principal investigator for several projects and have talked at numerous meetings, including TEDxAlbany, and I just heard that I will be talking at All Things Open later this year.
I really enjoy my role and still spend a lot of time coding. I love attending meetings and hope to continue improving the software available for scientific research and engineering. In recent years, I have had the opportunity to broaden my focus to other areas, including general rendering, working on everything from highly parallel distributed memory applications on supercomputers to mobile phones. I work on some very mature projects, such as the Visualization Toolkit and ParaView, as well as new projects like Open Chemistry and tomviz (which we just launched at the Microscopy and Microanalysis conference this week). I would never have had the opportunity to work in so many diverse fields before moving into my current role, and I hope to continue making a difference and improving the state of the art.
And, did I mention Kitware is hiring?
View the complete collection of articles from Careers in Open Source Week.