OSCON 2012: Kaitlin Thaney calls for open science



At the recent OSCON 2012 convention, Kaitlin Thaney, Manager of External Partnerships at Digital Science, presented a fresh approach to scientific research in her talk. Her views offer a way out of the stagnation of many research fields, where the reward system (how researchers are evaluated for positions, raises, and promotions) and the economics of funding discourage researchers from openly sharing their data and tools with larger communities and the public.

Here is a set of remarkable quotes from her talk:

We are still dealing with some of the archaic principles carried over even from the sixteen hundreds...

Thaney is referring to the retrograde practices still used in scientific publishing, particularly regarding the form and content of scientific papers. While the rest of the world has embraced the Internet, the large majority of scientific publications still use this format: 10 pages of single-spaced writing with occasional figures, pictures, and tables. Overall, it is static and utterly insufficient in its ability to enable anyone to properly evaluate the work, much less attempt to replicate it.

We are locked in old mechanisms, when you think about how science is rewarded, the golden ticket in this case is still the scientific paper. Go back to 1665, the notion of writing something down on a piece of paper, and transferring knowledge that way...

Adding insult to injury is the irony that the Web was invented at CERN with the explicit purpose of accelerating the dissemination and sharing of scientific information. There are certainly a few progressive journals, such as PLoS ONE, that have embraced the idea of online-rich publications, but the large majority of the scientific community is stuck in the publishing practices of the 1700s.

“...traditions last not because they are excellent, but because influential people are averse to change and because of the sheer burdens of transition to a better state.”  

—Cass Sunstein

When it comes to what our future was supposed to be [it's] 'where is my jetpack?' My day to day is: 'Where is my data?'...when it comes to academic research.

Thaney is alluding to the realization that some of the expectations we had a decade ago about our future didn't come to be. We certainly don't have jetpacks to commute to work, flying cars in our garages, or dinner in a pill. However, when it comes to academic research, the missed future comes down to fundamentals. We still don't share data across research groups (except for some rare bright cases), and even more worrisome, in many cases researchers do not know where their own data is, certainly not the data of the experiment they ran last year.

Some of the most expensive research that has been done is managed by post-it notes and poorly annotated excel spreadsheets...

Thaney highlights the fact that very few labs have a systematic revision control system to store and classify their own data, and even fewer have formal databases or combinations of databases and laboratory information systems. Even in the rare case that systematic data management exists, the data is not available for sharing outside of the organization that captured it. In the absence of such basic data tools, Thaney follows up with a fundamental question:

Where is my ability to reproduce experiments?

This may come as a surprise to any external observer of the scientific process, but the unfortunate reality is that the majority of labs are not able to replicate their own experiments, much less facilitate the replication of those same experiments by others. Thaney rightly points to the source of the problem: the incentives for academics are tied to the number of papers they publish, not to the actual relevance of the research, nor to the degree to which their results can be reproduced by independent groups. Researchers are asked for novelty, not for correctness of results. The outcome is that about 90% of scientific papers are not reproducible.

At the core, funding agencies, scientific publishers and academic institutions have confused the profession of researcher with the profession of inventor. By using metrics reserved for inventors to evaluate their researchers, they convert scientific workers into “paper writers”, preventing researchers from becoming what they should have been: agents of discovery.

Scientists can be really stubborn human beings and if the incentives are not there, you are not going to see the behavioral adoption... [Science] is one of the most important enterprises that we have and it is time that we redefine performance.

By applying the hallmarks of open source communities—openness, transparency, meritocracy, and reproducibility—to these problems, we have the opportunity to restore the scientific method in scientific research. 

The work that Thaney does today at Digital Science is focused on providing tools that will enable researchers to restore the practice of reproducibility in their daily work, share data with peers and the public, and accelerate research and discovery for all.

Luis Ibáñez works as Senior Software Engineer at Google Inc in Chicago.


90% is a very sobering statistic... I knew it was bad, but I didn't know it was that bad.

One thing that's interesting is that during my first year as a Ph.D. student, the importance of reproducibility in research was never made explicitly clear to me by anyone. While perhaps this is a result of the discipline I'm in (Computer Science doesn't do the same kind of "empirical studies" as, say, Education), it still feels like something is missing in the training of the graduate students who are most likely writing these papers.

I would love to see some NSF grant money go towards establishing and populating publicly accessible databases for scientific research. Perhaps if grant applicants can be enticed (or coerced) into accepting open data release as a condition of funding, we might see more data out there.


You make an excellent point, and I fully agree with you.

One of the main roots of the problem is that Reproducibility is not a formal part of Ph.D. training. As incredible as it may sound, researchers in training are not educated in the fundamentals of the scientific method. Very few of them go through a formal class on epistemology or a course on experimental design.

Instead, they get an informal mis-education in the misguided and corrupt "Publish or Perish" culture, which only serves the business model of publishers and returns nothing on the economic investment that society at large makes in scientific research. Graduate students learn about this corrupt practice of "you must publish...or else" as free "career advice" given by well-intentioned but misguided mentors and peers. This feeds into a vicious cycle in which personal and institutional reputation is cultivated as a way of securing future research funding.

There is a Reproducible Research movement working hard on raising awareness about this disconnection, and developing tools for bringing reproducibility back into the mainstream practice of scientific research.

For example, this month's issue of IEEE Computer is dedicated to "Reproducible Research".

Most conferences and journals do not require reproducibility verification as part of their review process. Like most of the academic publishing field, they are obsessed with "Novelty" (they have confused themselves with the Patent Office), and pay little attention to whether the content of publications is real and whether it works at all. (See the recent scandal on fraud in scientific publications: http://www.anesth.or.jp/english/pdf/news20120629.pdf, where a researcher managed to fabricate 172 papers...)

The two critical elements to get out of this sad situation are:

a) Educating the next generation of researchers on the true practice of the Scientific Method.

b) Making open tools available for them to easily incorporate reproducibility in their daily work.

The requirement for Reproducible Research leads by necessity to Open Science (open data, open source software, open access publications), since the first thing that an independent group needs in order to verify the reproducibility of published work is access to the data, software, parameters, and reports that fully describe the work presented in a paper.

Here are two interesting talks by Victoria Stodden on these issues:

"Reproducible Research: A Digital Curation Agenda"

"Open Science Summit Keynote 2011"

It has been said that Open Source is the application of the scientific method to the field of software engineering. Curiously, the time has come for Open Source communities to give back to scientific research, and help restore the rightful place of Reproducibility in science.

See also this recent letter, "Let the four freedoms paradigm apply to ecology" (http://www.cell.com/trends/ecology-evolution/fulltext/S0169-5347%2812%2900074-2), published in Trends in Ecology & Evolution: "... the explicit use of Free and Open Source Software (FOSS) with availability of the code is essential for completely open science".

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.