Years ago, in a graduate computer science course, I was tasked with implementing an algorithm for "variational image segmentation by motion detection." The algorithm was, as they say, a doozy. Tersely described over the course of half a dozen papers, it had dozens of subroutines, which when implemented grew to span thousands of lines of MATLAB code. But there was one subroutine, mysteriously called the "numerical upgrading" routine, whose description was conspicuously absent from the scientific record. Without this small but vital routine, the whole marvelous image segmenting machine just sputtered and ground to a halt. Crash! Panic! Woe.
Fortunately, after many late nights, I managed to track down an unpublished technical report that outlined the missing routine (in Japanese, but that's another story). The marvelous machine rumbled to life, images were segmented, and my GPA was saved.
That course taught me many lessons, and one of them was that we have a long way to go toward making scientific work, and particularly scientific code, reproducible. This is one reason we've recently started an open source project called JotGit.
JotGit brings together git, for powerful version control and offline working, with online, collaborative rich text editing. Our aim is to make it easy to use git to track and publish everything related to a scientific paper: the text of the paper itself, the data that goes into the paper, the code used to process the data, and, well, everything else. Here's a quick demo of the prototype:
The code's here on GitHub. JotGit's still a prototype, but we're releasing early and will be releasing often. To make it easy for you to run, host, and hack JotGit yourself, we've built it with the Meteor web framework, which is very easy to get running on any Mac or Linux system and has minimal dependencies. Meteor makes it really fun to develop for the real-time web, so even if you haven't done any web development before, it's a great way to start!
The big idea behind JotGit is that everyone should be able to contribute to a scientific paper using the tools and processes they love. The scientists who currently use git also tend to write their papers with tools like LaTeX and Markdown, which are text-based and easy to manage with git, but most scientists still use Word documents, which don't work well with git's text-based diffing and merging. And, unfortunately, there's a steep learning curve from Word to Markdown/LaTeX/git.
This is a problem we know well from our experience running writeLaTeX, an online collaborative editor for LaTeX with a rich text layer that brings WYSIWYG to LaTeX. One of our major goals for writeLaTeX has always been to help LaTeX geeks (like us) collaborate with non-LaTeX geeks (like most of the people we work with). If you're used to writing your papers in LaTeX with powerful scripting, version control, and history features, you probably cringe when someone hands you a Word document. But, if you're used to Word, you probably have the same reaction when someone hands you a bunch of computer code that doesn't look anything like a paper. With JotGit, we use powerful tools like git, LaTeX, and Markdown on the back end, but we wrap them up in a simple, collaborative, WYSIWYG front end. Ultimately, you can use whichever tools you prefer.
We're excited about the opportunities for collaborating on and sharing scientific papers afforded by a distributed version control system like git. GitHub has really revolutionized open source software with its fork and pull request collaboration model. Can we do the same for the scientific record? What does it mean to "fork" a paper? Right now JotGit works with local git repositories, but soon we'll be hooking it up to GitHub, so we aim to find out.