Imagine a world in which reproducible, repurposable, open scientific research is the norm. Certainly there are potential stumbling blocks ahead:
- confidentiality of sensitive medical data
- embargoes on potentially high-risk research findings
- the conundrum of how to facilitate commercial applications whilst reconciling the needs of the academic innovator with those of investors
The flip side of resolving such issues, though, is the incredible power that transparency brings to the research process. Any researcher who has hammered away for months at a piece of published research in a futile attempt to recreate its findings will know the frustration of a scientific literature that falls short of reproducibility. If so much of our research isn't repeatable, aren't we building houses on rather sandy foundations?
Reproducibility concerns provided the basis for Begley and Ellis' 2012 Nature commentary, which called for:
- greater transparency in pre-clinical cancer research
- incentives for the reporting of negative results
- widespread release of data from preclinical studies
Of particular interest are two studies, from Amgen and Bayer Healthcare, which attempted to recreate the results of over fifty "landmark papers" in the field. The two met with appallingly low success rates: 11% and 25% respectively. Such problems are by no means confined to the biological sciences; computational disciplines are equally troubled by a lack of reproducibility and transparency. Take, for instance, this opinion piece in the computational chemistry literature, which asserts that without reproducibility a scientific paper is effectively reduced to an advertisement for its author's research group.
Undoubtedly the prevailing publish-or-perish attitude in academia does little to help the situation. Our research metrics focus almost exclusively on written publications: we're only ever as good as our last paper. If we continue to reward research production alone, whilst ignoring the value that output yields to potential users, the reproducibility problem isn't going to disappear. Open working practices have a huge role to play in addressing this.
Transition to an open model of working will require not only new infrastructure for sharing data, figures, writing and the associated meta-information, but also a significant cultural shift away from traditional closed working practices across the scientific research landscape. Naturally, this latter change requires more than large-scale funding and planning. It demands substantial alterations in how we perceive and quantify the success of our scientific outputs. How are we to achieve such change across the academic community, whilst meeting the diverse needs of a multitude of scientific disciplines? Policy development is beginning to percolate down from the funders at the top of the academic hierarchy. Now we need to think about revising our working culture from the bottom up as well.
Perhaps inevitably, the closed approach of traditional science still exerts a strong influence on young scientists. The 2012 "Researchers of Tomorrow" study of PhD researchers in the UK confirmed that research students' outlooks are heavily influenced by their peers and colleagues. Consequently, the research style a student develops is significantly determined by the ethos of their first, formative, research group. Shouldn't we be educating our researchers *before* this point, providing them with the cultural know-how and awareness that might help them take control in shaping their own research outlook?
In light of this philosophy, a novel teaching approach piloted at the University of Oxford earlier this year is hoping to open the door to new, dynamic ways of training our young scientists. The Open Science Training Initiative (OSTI) introduces an educational model known as Rotation Based Learning (RBL) and was piloted on a cohort of 43 graduate students from the physical and life sciences.
The majority of the two-week timetable is given over to research time, accompanied by daily lectures on methodologies that support open science, including licensing, academic publication, version control and data management planning. Rather than passively absorbing lectures, students are expected to put the techniques into practice in their own assessed work for the course. In a departure from the traditional, single-output approach to assessment, OSTI requires students to deliver their research with prospective users in mind, before being graded and peer-critiqued on the openness and reproducibility of their work. The OSTI scheme mandates delivery of a coherent research story: that is, the code, data, figures and written explanation necessary to understand the research, with all of these outputs appropriately licensed.
The RBL approach splits the teaching cohort into groups; communication between groups is strictly prohibited. Phase 1 sees the groups attempt to reproduce results from the published computational literature and release a coherent research story via GitHub. In Phase 2, the research problems are rotated: successor groups then have to build on their inherited project, relying solely on the research story provided to them by their predecessors. The OSTI pilot demonstrated that this combined focus on producer and user roles creates a dynamic learning environment for the students and improves the quality and utility of their research output. Contextual application of licensing, data curation and data management played a significant role in allowing students to develop confidence in using these approaches and integrating them into their working practices.
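The "research story" handoff described above can be sketched as a minimal version-controlled repository. This is an illustrative assumption of what a Phase 1 group might release, not a layout prescribed by OSTI; the filenames, the CC-BY choice and the commit message are all hypothetical:

```shell
# Sketch of a "research story" repository: code, data, figures and a
# written explanation, all explicitly licensed. Filenames and the
# CC-BY licence choice are illustrative assumptions only.
mkdir -p research-story/code research-story/data research-story/figures
echo "All original content is released under CC-BY 4.0." > research-story/LICENSE
echo "# Research story: aims, methods, and how to re-run the analysis" > research-story/README.md
echo "timestep,value" > research-story/data/results.csv

# Version-control the whole story so a successor group inherits its full history
git -C research-story init -q
git -C research-story add -A
git -C research-story -c user.name="OSTI Student" -c user.email="student@example.org" \
    commit -q -m "Phase 1: initial research story (code, data, figures, licence)"
```

In Phase 2 a successor group would simply clone (or fork on GitHub) this repository and build on it, relying only on the README, licence and committed outputs, which is precisely the user's-eye test of reproducibility the course assesses.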
Following on from the success of the pilot, discussions are already underway to establish instances of OSTI at institutions in the USA, UK and Europe. The post-pilot report was released last week and is available for download via the initiative’s official site, and CC-BY licensed course materials for the exercises and lectures will be appearing on the corresponding GitHub repository in the near future. Let’s hope this heralds the start of a new era in graduate training for careers in science research! Training is unlikely to provide the entire solution to the reproducibility issue, but greater awareness of data techniques, digitally-assisted research and the perspective of the research user has a valuable contribution to make in fostering high-quality, high-utility research amongst the next research generation.