Default to open: The scientific method


As a follow-up to Mel's interesting post on copyright assignment, I started writing a post describing in detail the reasons why nobody should sign the ACM Copyright Transfer Agreement. In the process, however, I realized that those reasons required an explanation of the economics of academic publishing, and that, to understand those economics, it was necessary to first explain how the scientific method works. This is the first of three blog posts that do just that.

The scientific method: it all starts with a simple, essential question: How can you "know" something?

How can we gather knowledge and have confidence in its correctness? The lucubrations of many smart minds over the centuries refined the process into the following steps, which are also sketched in code just after the list:

  1. You can never know something with certainty.
  2. You can propose a hypothesis.
  3. You then proceed to test that hypothesis with experiments.
  4. Each experiment must attempt to show that the hypothesis is false.
  5. The more you fail at showing that the hypothesis is false, the more confidence you gain in the hypothesis.
  6. If you ever succeed in showing that the hypothesis is false, you get rid of the hypothesis, let others know, get more coffee, and go back to step 2.
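For the software developers in the audience, the loop above can be caricatured in code. The following is only a toy sketch; propose_hypothesis and run_experiment are invented stand-ins for this post, not part of any real library:

    import random

    def propose_hypothesis():
        # Stand-in for the creative step of science: invent a testable claim.
        return "all swans are white"

    def run_experiment(hypothesis):
        # Stand-in for a real experiment: True means the outcome is consistent
        # with the hypothesis, False means the experiment falsified it.
        return random.random() > 0.01

    hypothesis = propose_hypothesis()   # step 2: propose
    confidence = 0

    for _ in range(1000):               # step 1: we never reach certainty
        if run_experiment(hypothesis):  # step 4: try to show it is false
            confidence += 1             # step 5: failed falsification, more confidence
        else:                           # step 6: falsified; discard it, tell others,
            print(f"Falsified after {confidence} consistent experiments")
            hypothesis = propose_hypothesis()   # ...get coffee, back to step 2
            confidence = 0

    print(f"Confidence in '{hypothesis}': {confidence} failed falsification attempts")

The telling detail is that the loop only ever accumulates failed attempts at falsification; it never gets to declare the hypothesis "true."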

The reason why, contrary to popular belief, it is not possible in steps 4 and 5 to simply run an experiment that shows the hypothesis is true, is that the outcome of an experiment could also be explained by hundreds of other hypotheses; our current beloved hypothesis is therefore not the only possible explanation for what we observe. The fact that the outcome of an experiment is consistent with what the hypothesis would predict does not imply that the hypothesis is correct. Instead, such consistency only gives us some confidence that we can continue using that hypothesis as a good description of how the real world works. An experiment that fails to prove that the hypothesis is false is simply one more piece of evidence that is consistent with the hypothesis.

Such is the scientific method that we hold today as the best method to "know" something. The best hypotheses we have are the ones that have survived hundreds of repeated attempts to prove them false. We group hypotheses into larger bodies of knowledge that we call theories, and we like the theories for which we continue to fail in our repeated experimental attempts to prove that they are false.

Of course, the experiments that we run in step 4 may be imperfect. Experimental errors may creep in. Computation errors may have slipped into the data analysis. Software bugs may have been present in the program that we used to compute the outcomes. Lack of coffee may have misled the observations of the experimentalist, or gremlins and leprechauns may have been perturbing the measurement instruments... Therefore, before we trust the outcome of the experiment itself, we must repeat it, and we must do so many times: enough times to make the argument that the outcome is unlikely to be a simple statistical outlier. The repetitions must also be done by others, under a variety of conditions, to ensure that our mindset has not biased the experiment or the analysis that leads to our conclusions. Independent verification also helps rule out potential effects of our particular set of equipment and of the local conditions under which we performed the experiment.

When the experiments are replicated by others under processes that are independent of ours, and arrive at similar outcomes, we gain a lot of confidence in such experiments.

This is no news to software developers.

As developers, we are not satisfied with "the program runs on my computer," nor with the lazy "it works for me" explanation. We demand that the program be tested on different computers, by different people. We expect the code to include tests, and the tests to have been run by many different people. Only then do we trust the code.
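A single unit test is, in this spirit, a small falsification attempt that anyone can rerun on any machine. Here is a minimal, hypothetical sketch (the function and its tests are invented purely for illustration):

    import unittest

    def celsius_to_fahrenheit(celsius):
        # Toy function under test.
        return celsius * 9.0 / 5.0 + 32.0

    class TestConversion(unittest.TestCase):
        # Each test tries to falsify the claim "this conversion is correct".
        def test_freezing_point(self):
            self.assertAlmostEqual(celsius_to_fahrenheit(0), 32.0)

        def test_boiling_point(self):
            self.assertAlmostEqual(celsius_to_fahrenheit(100), 212.0)

    if __name__ == "__main__":
        unittest.main()

Having many different people run such tests, on many different machines, is the software equivalent of independent replication.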

It is essential, therefore, for the proper functioning of the scientific method to have a system of dissemination of information by which researchers can share with others the full description of their hypothesis, and the precise recipe used to perform their experiments. Armed with that precise recipe, other researchers are expected to repeat the experiments, and in turn report on the outcome of such experiments. Sometimes the outcomes will match, sometimes they will not, and then all the researchers involved are supposed to look at the details of the process to figure out the causes behind the differences.
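In software terms, that precise recipe is everything another group needs in order to rerun the work: the code, the data, and the exact parameters that were used. A hypothetical sketch of what recording such a recipe might look like (the file name and the parameters are invented for illustration):

    import json
    import random

    # The full "recipe": every parameter another group needs to rerun the
    # analysis, recorded alongside the result instead of being left out.
    RECIPE = {
        "random_seed": 42,
        "sample_size": 1000,
        "analysis_version": "1.0",
    }

    def run_experiment(recipe):
        random.seed(recipe["random_seed"])      # deterministic, hence repeatable
        samples = [random.gauss(0.0, 1.0) for _ in range(recipe["sample_size"])]
        return sum(samples) / len(samples)      # a toy "measurement"

    result = run_experiment(RECIPE)

    # Publish the recipe together with the outcome so others can verify both.
    with open("experiment_report.json", "w") as report:
        json.dump({"recipe": RECIPE, "mean": result}, report, indent=2)

    print(f"Mean of {RECIPE['sample_size']} samples: {result:.4f}")

Anyone who receives the script and the report can rerun the experiment with the same seed and check whether they obtain the same result.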

The information dissemination tools used by the scientific community have changed over the centuries. Initially, they were simple letters sent to other scientists. Then, they evolved into publications printed by scientific societies that were focused on a particular scientific domain. The content of the publications was supposed to include the full details on how someone else could repeat the experiments. The recipients of the publications would attempt the recipe and then publish back their results. This process of experimental verification of reproducibility came to be called “peer review.” It was typically done by your "peers," those other geeky scientists like you, who had a preference for spending time in the labs in the basement of the physics building over the pleasures of going to the beach on a Sunday.

That's how the scientific method is supposed to work. Galileo Galilei built a telescope, pointed it to the sky, found that Jupiter had four satellites, wrote a letter about it, and sent copies to his peers. The peers built similar telescopes, pointed them to the sky, and observed said satellites around Jupiter. Life was good! But that was the 1600s.

Since then, the professionalization of scientific research, and its emergence as an economic activity that brings funding to academic institutions, promotes activity in R&D firms, and provides a way of building careers and making a living for a significant number of people, has led to a degradation of the standards of quality in the process. The rigorous peer review that used to include the full replication of the experiment degraded into a cozier peer review in which the peer now takes the liberty of skipping the verification of the experiment. After all, what's the point of running an experiment if I already "know" what the outcome is going to be? (this is sarcasm)

In today's environment of academic publishing, and its associated process of "peer review," the rigorous tradition of verifying reproducibility has been diluted. Too often peer reviewers base their reviews solely on a reading of the submission and on their previous experience; they do not repeat the experiments, yet they go on to pronounce opinions on the correctness and value of the paper. The time that reviewers dedicate to writing a review is often in the range of two to four hours. This includes the time to read the paper and the time dedicated to figuring out a set of reasonable criticisms that show the editor that the reviewer was indeed qualified to have something to say about the paper. The activity of reviewing is also an unpaid, and normally anonymous, one. Many times reviews are actually performed during flights, in airport waiting areas, or after hours and on weekends. This approach to “peer review” has been degraded and devalued to a mere exchange of “you say, I say.” Such reviewers are, by definition, pundits, not scientists.

The excuses, of course, are many. “It’s too expensive to repeat the experiment.” “It takes too long.” “I have another paper of my own to write, so that it can in turn be reviewed by others and feed the eternal cycle of publish-or-perish.”

Knowing that such is the process, authors don’t bother with describing in their papers the "little details" on how to replicate their experiments. Some authors don’t even bother with repeating their own experiments a couple of times before publishing, so they don't get to discount the known interference effects of gremlins and leprechauns. Some authors go a step further and take the liberty of skipping the experiment altogether, since, after all, what's the point of running an experiment if they already “know” what the outcome is going to be? (sarcasm here again)

The public is often deceived by the label of “peer-reviewed” publications. Academic institutions continue to misrepresent modern “peer review” as the hallmark of science, and they fail to mention that “verification of reproducibility” is the real essence of the scientific method and the core process by which we can really get to “know” something.

Eric Raymond made the point that "open source is the application of the scientific method to the process of software development." In particular, the aspects of information sharing, educational commitment, peer review, and real experimental verification are ingrained in the way open source software projects run their daily business. These concepts were consciously or unconsciously copied from the way scientific communities used to run their operations. Today, there is a thing or two that open source communities can bring back to academic communities when it comes to remembering how knowledge is to be obtained, verified, and shared.

Many academic communities would benefit from Linus Torvalds' simple dictum, "Talk is cheap; show me the code."

That reminds us that just talking and providing opinions about scientific topics is not enough. The means and practices of verifying the claims made in the talk must be part of the scientific and academic discourse.

In my next post, I’ll cover how today’s academic institutions, scientific and technical societies, academic publishers, funding agencies, and scientists implement the peer-review system, and what motivations and reward systems animate their interactions.

Luis Ibáñez works as a Senior Software Engineer at Google Inc. in Chicago.

2 Comments

Lies, Damn Lies, and Statistics

Another area where science is very weak is in the use of statistics. People who work in areas which are not rigorously mathematical, such as psychology or economics, will take a course in statistics in order to analyse their experimental data. Most amateur statisticians don't understand that statistics deal with probabilities, not certainties, even after they have taken statistics 101. They will report that their experiments prove that their hypothesis is true when what they should report is that there is an X% probability that their hypothesis is true.

Or to put it in the terms Luis Ibáñez used in his article: The experiment shows an X% chance that the hypothesis is true and a Y% chance that the hypothesis is false. Since X > Y, the hypothesis is likely to be true.

-----------------------------
Steve Stites

Steve,

Thanks for pointing out the role of statistics in this process,
and how often they are misused in the context of publications.

The topic is discussed by Victoria Stodden in this recent talk:
http://www.youtube.com/watch?v=nYWormzn1Mc

In particular, she mentions the focus on "p-values" and their
role in the publishing process.
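
As a small, hypothetical illustration (it assumes the widely
available scipy library is installed, and the numbers are
invented), a p-value only tells us how surprising the data
would be if the null hypothesis were true; it is not the
probability that our hypothesis is true:

    import random
    from scipy import stats   # assumes scipy is installed

    random.seed(1)

    # Two hypothetical groups of measurements (invented numbers).
    control = [random.gauss(10.0, 2.0) for _ in range(30)]
    treatment = [random.gauss(11.0, 2.0) for _ in range(30)]

    result = stats.ttest_ind(control, treatment)

    # The p-value is the probability of observing data at least this
    # extreme IF the null hypothesis (no difference between the groups)
    # were true. It is not the probability that our hypothesis is true.
    print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")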

A better education on statistics and experimental design should
naturally lead to a better appreciation of the importance of
"Reproducibility Verification" in the academic publishing arena.

As a community we can help transform the practice of scientific
research and academic publishing by clarifying that the label of
"peer-reviewed" paper only means that two or more reviewers read
and commented on the article, without necessarily verifying its
correctness, while the much more significant quality label of
"Reproducible Research" (RR) paper means that an independent group
managed to replicate the claims made in the paper by using the
detailed description it provides, presumably including the source
code and data that the authors publicly shared along with the paper.

Again, no surprise for Open Source communities here.
This is standard practice in the daily business of open communities,
but unfortunately it is more the exception than the rule in academic
communities.

For more on "Reproducible Research", the recent symposium
on "Tools and Strategies for Scientific Computing" is probably
of great interest for software developers:

http://stodden.net/AMP2011/

(videos of the talks and the associated slides
are publicly available at the site).

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.