Bringing open source tools to scientific research communities

Open Chemistry project raises up the next generation of researchers


In 2007, I took part in Google Summer of Code (GSoC), developing the Avogadro application. As we were developing Avogadro, we founded the Open Chemistry project as an umbrella project to develop related tools for chemistry and materials science. Our goal is to bring high-quality open source tools to research communities working in these areas, and to develop other tools that complement the Avogadro molecular editor.

This year we were very pleased to be selected as a mentoring organization for GSoC. To get started, we lined up a number of mentors from related communities and developed an ideas page. A few of our mentors were Geoff Hutchison, Adam Tenderholt, David Koes, and Karol Langner, who are all long-time contributors to related projects. We were lucky to get three slots for student projects.

The projects

During the proposal period, after selected organizations are announced, students are asked to submit their project proposals. These proposals are often based on the ideas suggested by the mentoring organizations, but they can also be original ideas. We received some amazingly high-quality proposals from students, and we were blown away by some of the proposed directions. It was tough selecting just three out of our proposal pool, but it was a nice problem to have.

We chose three projects: one that uses pybind11 to wrap the Avogadro 2 API; one to add crystal/materials functionality to Avogadro 2; and one to extend our JSON format to support data exchange between the Avogadro 2 and cclib projects.

Our student projects aimed to improve areas that we had identified as lacking, with mentors ready and able to offer guidance and support over the course of the summer. The Python wrapping project was intended to make the Avogadro 2 application scriptable, so that users could automate repetitive tasks or extend the application to do new things without modifying the C++ code. The crystal/materials project was to add support for a whole new class of systems that have periodic, repeating structures of atoms. These systems pose unique challenges, as they rely heavily upon symmetry, minimal representations, and periodic boundary conditions. Finally, the extension of the JSON format was to enable Avogadro 2 to reuse the cclib Python library, making all of the formats cclib supports available to Avogadro 2 users in a friendly desktop application.
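To give a flavor of what periodic boundary conditions involve, here is a minimal Python sketch. It is not code from Avogadro 2; the function names are hypothetical, and it assumes a simple orthorhombic cell (all cell angles 90 degrees) with atoms stored in fractional coordinates.

```python
import math


def wrap_fractional(frac):
    """Map fractional coordinates back into the [0, 1) unit cell.

    Hypothetical helper for illustration, not an Avogadro 2 function.
    """
    return tuple(x - math.floor(x) for x in frac)


def minimum_image_distance(a, b, cell_lengths):
    """Distance between two atoms using the nearest periodic image.

    a, b: fractional coordinates (x, y, z).
    cell_lengths: edge lengths of an orthorhombic cell (an assumption
    made to keep the sketch short; general cells need a cell matrix).
    """
    total = 0.0
    for fa, fb, length in zip(a, b, cell_lengths):
        d = fa - fb
        d -= round(d)  # shift to the nearest periodic image
        total += (d * length) ** 2
    return math.sqrt(total)
```

In a cell with 10-unit edges, two atoms at fractional x positions 0.9 and 0.1 are treated as roughly 2 units apart rather than 8, because the nearest copy of the second atom sits in the neighboring cell.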

(Unfortunately, we lost touch with the student working on the Python binding project quite early on, and were not able to re-establish contact after the initial phases of the program. We let Google know about the situation, and concentrated on our two remaining projects.)

Our two remaining students worked well throughout the summer, and achieved all of the high-level goals set out in their proposals. Our mentors worked with them throughout, and the students quickly learned the development tools. Most of the students' code has been merged to the main development branches of both projects, and should be featured in the next release of each project.

The students

Patrick Avery is a PhD student, and worked on features in Avogadro 2 that will ultimately help him in his own research. He developed an original proposal and discussed adding further features with us. He successfully added a number of features supporting crystals and symmetry, along with other useful tools for expanding symmetry-reduced representations. He also went beyond what was originally proposed and expected of him: he significantly extended support for undo/redo when editing structures, and added initial support for a periodic semi-empirical code to do quantum calculations on crystal/materials systems.

Sanjeed Schamnad submitted a proposal based on things we needed help with, and submitted code to both the cclib and Avogadro 2 projects. He worked on extending the JSON representation developed in Avogadro 2 to support a greater range of electronic structure/quantum mechanical information. This was then added to the cclib project as a writer that exports data from a range of supported codes to this common format, and the reader in Avogadro 2 was extended to consume the data. We are still working on the best way to distribute Avogadro 2 with a Python interpreter, and to package cclib in our installers. This has been a great project in terms of getting more data into Avogadro 2, and also in building bridges between two related open source chemistry projects.
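The writer/reader pairing described above can be sketched in a few lines of Python. This is only an illustration of the idea of a shared JSON format: the function names are hypothetical, and the key layout below is an assumption made for the example, not the actual schema used by Avogadro 2 or cclib.

```python
import json


def molecule_to_json(name, atomic_numbers, coords):
    """Write a molecule to a JSON string (the "writer" side).

    The key names are illustrative assumptions, not the real schema.
    """
    flat = [c for xyz in coords for c in xyz]  # x0, y0, z0, x1, ...
    doc = {
        "name": name,
        "atoms": {
            "elements": {"number": list(atomic_numbers)},
            "coords": {"3d": flat},
        },
    }
    return json.dumps(doc, indent=2)


def molecule_from_json(text):
    """Read the same layout back (the "reader" side)."""
    doc = json.loads(text)
    numbers = doc["atoms"]["elements"]["number"]
    flat = doc["atoms"]["coords"]["3d"]
    if len(flat) != 3 * len(numbers):
        raise ValueError("expected three coordinates per atom")
    coords = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return doc["name"], numbers, coords
```

Because both sides agree on one plain-text format, a program that writes it and a program that reads it can be developed, tested, and released independently, which is what makes this kind of bridge between two projects practical.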

We are working on getting the remaining contributions to the point where they can be merged.

The big idea

We were impressed by the dedication of our students. Their summer projects not only contributed new code to our code base, but also enabled us to grow the Open Chemistry project as an umbrella for related software projects. We hope to extend the Open Chemistry organization to other projects in future years, and to facilitate greater participation of students in the open science movement. It is important to train the next generation of researchers in the art of software development, community building, and collaboration.

We are in the process of preparing to attend the mentor summit hosted by Google, which serves as a great opportunity to discuss what went well and how our mentors and program administrators might improve the experience for all involved in future years. Thank you to our students, and to Google for their continued support in getting students to "flip bits, not burgers."

About the author

Marcus D. Hanwell
Marcus leads the Open Chemistry project, developing open source tools for chemistry, bioinformatics, and materials science research. He completed an experimental PhD in Physics at the University of Sheffield, a Google Summer of Code project developing Avogadro and Kalzium, and a postdoctoral fellowship combining experimental and computational chemistry at the University of Pittsburgh before moving to Kitware in late 2009.