How to use open source tools for research
Analyze, collaborate, and share research with open source tools
In part one of my series on using open source in research, I looked at LibreOffice, LaTeX, and two packages to use in psychology experiments. In this article, I show you software to help handle the data in your papers and disseminate the results.
Analyzing your data
Those used to SPSS may wish to have a look at two programs. PSPP is very similar, even including the same menus, but is free and easier to install. It also uses the same file format, so existing work in SPSS should open in PSPP without conversion. The other is a new package, JASP, which bills itself as "a low-fat alternative to SPSS." JASP has a very attractive interface where the output updates immediately with every click, meaning that mistakes can be corrected without having to navigate through several windows all over again. Complicated analyses can involve a lot of trial and error to explore the effects on the output, so this is a major timesaver. JASP handles Bayesian as well as classical analyses, and because it is being developed by psychology researchers, the output tables are already in APA6 format and can simply be copied and pasted into your research document.
The most powerful free and open source (FOSS) statistics program, though, is R. Originally a FOSS version of the statistics language S, R has shown explosive growth over the last few years, with some 7,000 add-on packages available to handle nearly any statistical requirement and an increasing number of books, courses, and blogs (e.g., R-bloggers) focusing on practical usage. Some websites concentrate specifically on how to use R for psychological research. One example is William Revelle's Personality Project, which also offers an R package called psych, a toolbox for personality, psychometrics, and experimental psychology.
One of the main advantages of R is that you can write a sequence of commands to a file, and R will then run all of them one after the other, meaning that an entire analysis can be run again with one command instead of having to point and click through multiple screens. (Although you can do something similar with the "syntax" functions of SPSS, it is not as versatile.) An R package called knitr allows R code to be embedded in a LaTeX document so that, if you edit your data, the existing tables and diagrams in your paper are automatically regenerated to show the new data.
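The idea of a scripted, rerunnable analysis can be sketched in a few lines. The example below uses Python rather than R purely for illustration, and it is not how knitr itself works. The group names, the scores, and the helper functions summarize and latex_table are all hypothetical. The point is only that the table is produced from the data by code, so changing the data and rerunning the script regenerates the table.

```python
import statistics


def summarize(scores):
    """Return n, mean, and sample standard deviation for one group."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),
    }


def latex_table(groups):
    """Build a LaTeX tabular from a dict mapping group name to scores."""
    lines = [
        r"\begin{tabular}{lrrr}",
        r"Group & N & Mean & SD \\",
        r"\hline",
    ]
    for name, scores in groups.items():
        s = summarize(scores)
        # Each row ends with a LaTeX line break (two backslashes).
        lines.append(f"{name} & {s['n']} & {s['mean']:.2f} & {s['sd']:.2f} \\\\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up scores, standing in for data read from a file.
    data = {"control": [4, 5, 6], "treatment": [7, 8, 9]}
    print(latex_table(data))
```

If a score is corrected or a participant is added, rerunning the script is enough to bring the table up to date, with no manual retyping of numbers into the paper.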
However, power comes at a price, and R has a notoriously steep learning curve. Beginners like me can start with R packages like mosaic or swirl, which introduce R gently to a new user, and with the RStudio interface, which makes it easier to type in and run R commands.
Collaborating on research
Many people working on a group project or a paper with collaborators have used the track changes feature of a word processor to identify who has changed which bits of the text. In my experience, this soon leaves you with an unreadable mess of colors, strikethroughs, and underlines. Another option is for each collaborator to circulate a saved copy of their changed document under a new filename, but in that case you end up having to manually check whether Bob's copy of February 11 includes the changes Alice circulated in her copy of February 9.
A much better way is to use version control. This keeps a record of all changes made to a document so that you can undo them if necessary, or wind back the clock to an earlier version when you realize that the reworking of the discussion that you spent so much time on does not read as well as you thought it would. Most importantly, everyone is working off a single, up-to-date copy of the document, meaning everyone knows precisely where the project is at any given time.
Version control works best on text files, which is in itself a good reason to use LaTeX files or R scripts rather than word processor documents or spreadsheets. The most popular version control system is Git, and if you're collaborating with researchers outside your immediate location, it makes sense to store the repository on the web. You may be OK with a public repository where anyone can see the files, but if you're working on a brilliant new idea that you want to keep under wraps for the time being, you may need a private repository. Options include GitHub, where public repositories are free but you pay for private ones, and Bitbucket, where both types are free. Both GitHub and Bitbucket offer interface programs for Microsoft Windows and Mac OS X, and users of the latter can also use GitUp.
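To see why plain text suits version control, it helps to look at a line-by-line comparison of two drafts. The sketch below uses Python's standard difflib module to produce a unified diff, the same general style of output that Git shows for text files. It is only an illustration and not a description of how Git works internally. The draft labels, the sample sentences, and the function show_changes are hypothetical.

```python
import difflib


def show_changes(old_text, new_text, old_label="draft_v1", new_label="draft_v2"):
    """Return a unified diff between two versions of a text, as a list of lines."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # lines were split without newlines, so add none back
        )
    )


if __name__ == "__main__":
    # Two made-up versions of a short methods section.
    alice = "Participants were 40 students.\nResults were significant."
    bob = "Participants were 42 students.\nResults were significant."
    for line in show_changes(alice, bob):
        print(line)
```

Lines beginning with a minus sign were removed and lines beginning with a plus sign were added, so a single changed number stands out immediately. A word processor file or spreadsheet is usually stored in a binary or compressed form, where this kind of readable comparison is not possible.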
Sharing and presenting your work
LibreOffice includes a presentation program, but I prefer using LaTeX with the beamer package. The main benefit is that the positioning of text is precisely the same on each slide—for instance, titles or bullet lists do not show up in slightly different positions because you have moved each one a little bit when editing it.
For posters you can also use LibreOffice, or again you can use LaTeX, this time with the a0poster package. And remember, you will attract more of an audience when presenting your poster if you choose clothes that match your poster color!
When it comes to publishing, the assumption is that psychology journals want submissions in Microsoft Word format, but in fact a large number accept PDF submissions, and many also accept LaTeX files. For example, journals such as Body Image, Journal of Health Psychology, and Sex Roles will take submissions in LaTeX (Sex Roles even offers a LaTeX template).
There is an increasing trend toward open access for both research and the data it is based on. The U.K. Research Councils define open access as "unrestricted, on-line access to peer-reviewed and published research papers," and funding recipients are expected to publish with journals that allow this. A parallel data policy states that data should also be made openly available. Various studies indicate that open access papers actually garner more citations, and there is also a citation advantage for papers associated with open data.
Data repositories like Zenodo, Figshare, or Open Science Framework allow you to upload datasets, posters, and so on and have them assigned a DOI (digital object identifier) so that these resources can be referred to in citations. Repositories for specialized data such as functional magnetic resonance imaging (fMRI) datasets also exist. There is even a new open access journal, the Journal of Open Psychology Data, dedicated to publishing papers describing psychology datasets with high reuse potential.
I hope this short review of free and open source tools will encourage students and researchers to take a look at what is available and consider whether it might be useful to them.
One recent example shows a benefit of using open source. A review paper concluded that the experiment package PsychoPy showed serious timing errors. The lead developer of PsychoPy was able to examine the data and, within a couple of weeks, let authors and readers know via the article's comments page that the version of PsychoPy used was three years out of date and that more recent versions did not have these issues.
This shows not only how responsive the developers of open source are towards their users, but also how efficiently corrections can be made when data is open and journals allow readers to make post-publication comments. Open source and open access are likely to become increasingly important in psychology research, not least because they allow and encourage this sort of transparency, replication, and collaboration.