Improving the speed and quality of research via shared algorithm implementations

The dissemination of ideas throughout a research field is critical. Recent algorithms have become so complex that the ideas behind them are tightly coupled to their implementations, so the ability to share and obtain these implementations quickly is crucial to the future success of many fields. My message is easily summarized: doing your part to enable the maximum potential of future research is as easy as sharing your code.

To understand the problem, consider a simple invention, the pencil. One can explain the concept of a pencil in a single sentence: "A pencil is a writing instrument usually constructed of a narrow, pigment core inside a protective casing."

Suppose you have thought of an incremental improvement to the pencil: you want the first half of the pigment core to be one color and the second half another. If you were the creator of the original pencil, this would be extremely easy to produce. You would simply use two different pigments in your pencil core production process.

However, if you are not the creator of the original pencil, this is now an extremely complex task. You must re-address fundamental questions like "How do I compact graphite?" These questions have already been studied in detail by the original creator. He is unlikely to let you use his pencil factory to try out your idea, so you must start from scratch.

This contrived example is surprisingly similar to the daily situations encountered by engineering researchers. Someone develops an algorithm and spends years perfecting its implementation. Now someone else wants to use that algorithm as a step in their research. The path rarely strays from:

1) Look online and find no publicly available documentation of the algorithm published by the author.

2) Email the author asking him to share his implementation.

3a) The author agrees! Now you realize the code has been written without regard for future users--there are no comments or reasonable attempts at an API.

3b) More commonly, the author will not respond or cannot share the code. Either way, you move forward with nothing.

4) Decide whether to take your research in a different direction or implement the algorithm yourself.

5) Spend weeks fighting with the nuances that were not in the algorithm’s publication.

Having experienced the above path on countless occasions, I have classified the problems into three categories:

1) Time expenses of newcomers in a field. A new researcher cannot hit the ground running--they must work backwards for several years and write or re-write previously published algorithms so they can compare new results as well as build on old ideas.

2) Research group cliques. When a lab has multiple researchers working together on parts of a unified problem, the effects of the pencil analogy are amplified. After a few years of working together on a single code base, incremental improvements are extremely quick for the group to churn out, and nearly impossible for an outsider to implement. This creates an even more intimidating barrier for a newcomer, which ultimately leads to a less diversified outlook on problems.

3) Bad re-implementations. The original researcher likely dedicated months or years to a particular algorithm. A new researcher simply intends to use it as a small piece in a larger puzzle. The implementation he creates in one week cannot possibly compare to the original implementation in efficiency or correctness. This leads to inaccurate comparisons in research results, as well as overall lower quality and speed of future research in the field.

Most journals and conferences do not currently require the submission of code. However, this seems to be changing. The 2010 Computer Vision and Pattern Recognition conference added some fields to the reviewer score card which ask about repeatability criteria. For example: "Is the code and data publicly available?"

Science has also started to make reproducibility a strict requirement. Another new concept is that of an online journal. For example, the Insight Journal does a great job encouraging the submission of code to accompany submitted articles.

As I have outlined, the lack of code-sharing has several severe negative impacts on scientific research. It makes it harder for new entrants to do good work, and slows progress for everyone.

And the answer is so simple. We can fight this problem with openness. Publish your code. It's that easy.