The Open Chemistry project promotes open source, data, and standards.

Open Chemistry project upholds mission of unorganization, The Blue Obelisk


Chemistry is not the most open field of scientific endeavor; in fact, as I began working more in the area (coming from a background in physics), I was surprised by the norms in the field. As a PhD student way back in 2003, I simply wanted to draw a 3D molecular structure on my operating system of choice (Linux) and save an image for a paper or poster discussing my research.

This proved to be nearly impossible, and in 2005 a group of like-minded researchers got together at a meeting of the American Chemical Society and formed an unorganization: The Blue Obelisk (named after their meeting place in San Diego).

The Blue Obelisk

In 2006, the original group published a paper entitled The Blue Obelisk—Interoperability in Chemical Informatics, which detailed the aims of the unorganization, succinctly captured as open data, open standards, and open source (ODOSOS), but not necessarily open access. In fact, that first article is locked behind a paywall.

The second article, Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on, summarizes the progress made in the first five years and was published as open access so that it is available to all. It states the core aims of the group: to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards.

The Blue Obelisk has served as a nucleation point for a large array of researchers and developers in fields as diverse as molecules, reactions, computational chemistry, spectra, and crystallography (beyond the original aim of chemical informatics).

The Blue Obelisk also brings together a large number of open source toolkits (such as Open Babel, RDKit, CDK, and Indigo), written in C++ or Java with bindings to many other languages. There are also several “second-generation” tools building on these toolkits, such as Avogadro and Bioclipse. In addition to open source, significant work has gone into developing open standards, such as the Chemical Markup Language, InChI, OpenSMILES, and QSAR-ML, which are improving the state of data exchange in chemistry. Open data efforts have also provided valuable resources such as the Blue Obelisk Data Repository, which offers a liberally licensed, curated dataset.

The Avogadro Project

The Avogadro paper, Avogadro: an advanced semantic chemical editor, visualization, and analysis platform, was published as an open access article in the same journal the following year. It provides more detail on the Avogadro project and how it has evolved over the years. Since releasing Avogadro 1.0, we’ve been thinking about how we might do things better given the opportunity to rewrite the code. Since my move to Kitware, I have pursued funding to develop an open source suite of tools that meets the needs of computational chemists through an extensible set of applications addressing all major parts of the workflow.

The Open Chemistry project

Thanks to Phase I and later Phase II Small Business Innovation Research (SBIR) funding from the US Army Engineer Research and Development Center, we’ve developed the Open Chemistry project and set up its infrastructure under that name.

As part of this project, we have been developing three applications: Avogadro 2, MoleQueue, and MongoChem. Each application addresses a different part of the chemist’s workflow. Avogadro 2 handles simulation input preparation, visualization, and analysis. MoleQueue executes computational chemistry codes both locally and remotely (integrating with high-performance computing schedulers); it is not actually specialized for chemistry and has already seen use in several other domains.

Open Chemistry Diagram

Finally, MongoChem is a desktop tool for the storage, indexing, search, and informatics analysis of large collections of chemical data. Each application can stand alone, but the applications can also communicate through a simple JSON-RPC 2.0-based API over local sockets to coordinate work.
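For readers unfamiliar with JSON-RPC 2.0, the Python below is a minimal sketch of what such an exchange can look like: a client builds a request object, sends it over a connected pair of local sockets, and a toy handler replies with either a result or one of the standard JSON-RPC error codes. The method name `listQueues`, the returned queue names, and the newline-delimited framing are assumptions made for illustration only; they are not taken from the actual Avogadro 2, MoleQueue, or MongoChem API or wire format. A real client would connect to the running application's local socket rather than use `socket.socketpair()`.

```python
import json
import socket

# Hypothetical method name used only for illustration; it is not taken
# from the documented Avogadro 2, MoleQueue, or MongoChem APIs.
EXAMPLE_METHOD = "listQueues"


def make_request(method, params, request_id):
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def make_error(code, message, request_id):
    """Build a JSON-RPC 2.0 error response object."""
    return {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": request_id}


def send_message(sock, message):
    """Send one message as a single line of JSON (newline framing is an assumption)."""
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))


def read_message(reader):
    """Read one newline-terminated JSON message from a file wrapper around a socket."""
    line = reader.readline()
    if not line:
        raise ConnectionError("peer closed the socket")
    return json.loads(line)


def handle_request(request):
    """Toy server-side dispatcher returning a result or a standard error."""
    request_id = request.get("id")
    if request.get("jsonrpc") != "2.0" or "method" not in request:
        # -32600 is the JSON-RPC 2.0 code for an invalid request object.
        return make_error(-32600, "Invalid Request", request_id)
    if request["method"] == EXAMPLE_METHOD:
        # Placeholder queue names; a real service would report its own.
        return {"jsonrpc": "2.0", "result": ["local", "cluster"], "id": request_id}
    # -32601 is the JSON-RPC 2.0 code for an unknown method.
    return make_error(-32601, "Method not found", request_id)


def run_demo():
    """Round-trip one request and response over a connected pair of local sockets."""
    client, server = socket.socketpair()
    with client, server, \
            client.makefile("r", encoding="utf-8") as client_reader, \
            server.makefile("r", encoding="utf-8") as server_reader:
        send_message(client, make_request(EXAMPLE_METHOD, {}, 1))
        request = read_message(server_reader)
        send_message(server, handle_request(request))
        return read_message(client_reader)


if __name__ == "__main__":
    print(run_demo())
```

Matching the `id` field is what lets a client pair each response with its request, which matters once several applications exchange messages concurrently.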

These tools build on many open source projects, including VTK and CMake, developed principally by Kitware, as well as Qt, MongoDB, Open Babel, Gerrit, Doxygen, GTest, and others developed by other companies and the community at large. The tools have been developed in the open, using a software process designed to ensure quality: code review, continuous integration build testing on all major platforms, unit tests, and everything else expected of a typical open source software project. We have been building binary installers nightly since 2012, and in April we made our first beta release, aiming to gather feedback from the wider community as we continue to add features.

We are excited about opening up chemistry by developing an open, cross-platform, and extensible platform that meets the major needs of chemists using modern techniques. This offers a graphical framework that can be leveraged by many open and closed codes in the community to generate input, store output, and analyze/visualize data produced.

We continue to work with the community to address the need for more open source, open data, and open standards in chemistry via the Open Chemistry project. Get involved.

About the author

Marcus D. Hanwell
Marcus leads the Open Chemistry project, developing open source tools for chemistry, bioinformatics, and materials science research. He completed an experimental PhD in Physics at the University of Sheffield, a Google Summer of Code developing Avogadro and Kalzium, and a postdoctoral fellowship combining experimental and computational chemistry at the University of Pittsburgh before moving to Kitware in late 2009.