books in a library, stacks

ktchang16 via Flickr. CC BY 2.0

I have always liked librarians. Since ancient Greece, long before the first open source code was shared, librarians have been championing the democratization of knowledge. Modern librarians are no different. Whether they are recommending good books to get kids hooked on reading or leading Open Access Week on campuses all over the world, librarians are heroes of open knowledge.

Universities still brag about the size of their libraries. But as digitization has taken hold, a new metric of knowledge dominance has emerged: How big is your institutional repository?

Universities are attempting to ensure that all of their research is publicly accessible. This is happening internally thanks to librarians, and externally because of the long and growing list of funding mandates (for instance, all NIH-funded research must be made freely available). Those who fund science (i.e., taxpayers) have finally clued in to the absurdity that research they have paid for is locked behind paywalls where they can't access it. One way to meet the funding mandates is to establish campus open access (OA) repositories.

The vast majority of academic publishers allow academics to legally post their preprints (versions of a paper that contain the same information but are not typeset by the publisher). For example, Western University has Scholarship@Western, where you can download tens of thousands of papers for free. My colleagues have, however, written many more papers than are available there. Librarians at all universities are struggling with how to upload millions of manuscripts under numerous license agreements while also linking metadata to make them discoverable. Doing this manually takes an experienced librarian around 15 minutes per manuscript. The time and cost to do this campus-wide are prohibitive even at wealthy schools, let alone at every campus in North America.
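To get a feel for why the manual route does not scale, here is a back-of-the-envelope sketch in Python. The 15 minutes per manuscript comes from the estimate above; the manuscript counts and the function name `manual_hours` are purely illustrative assumptions, not figures from any particular campus.

```python
# Rough estimate of librarian time for manual OA uploads.
# MINUTES_PER_MANUSCRIPT is the figure quoted in the article;
# the manuscript counts below are hypothetical examples.
MINUTES_PER_MANUSCRIPT = 15


def manual_hours(manuscripts: int, minutes_each: float = MINUTES_PER_MANUSCRIPT) -> float:
    """Return the librarian hours needed to process `manuscripts` by hand."""
    return manuscripts * minutes_each / 60


print(manual_hours(10_000))     # 2500.0 hours for ten thousand manuscripts
print(manual_hours(1_000_000))  # 250000.0 hours for a million manuscripts
```

Even the smaller hypothetical case works out to thousands of hours of skilled staff time.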

To reduce the time and costs of this process and to harvest all past work, install this free and open source software: aperta-accessum. It sounds a bit like a magical enchantment, and like magic, aperta-accessum does seven things:

  • Harvests names and emails from a department's faculty webpage
  • Identifies scholars' Open Researcher and Contributor IDentifiers (ORCID iDs)
  • Obtains digital object identifiers (DOIs) of publications for each scholar
  • Checks for existing copies in an institution's OA repository
  • Identifies the legal opportunities to provide OA versions of all of the articles not already in the OA repository
  • Sends authors emails requesting a simple upload of author manuscripts
  • Adds metadata harvested from DOIs, linked with the uploaded preprints, into a repository
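To make the middle of that workflow concrete, here is a minimal sketch in Python of the checking steps: given each scholar's DOIs, skip what the repository already holds, keep what may legally be self-archived, and produce a list of authors to email. This is not the aperta-accessum code itself. The `Scholar` class, the function names, the `may_self_archive` callback, and the placeholder DOIs are all hypothetical, and the sketch works on in-memory data rather than calling any real service.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Scholar:
    """Hypothetical record for one faculty member."""
    name: str
    email: str
    orcid: Optional[str] = None
    dois: List[str] = field(default_factory=list)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so copies compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def missing_from_repository(scholar: Scholar, repository_dois: Iterable[str]) -> List[str]:
    """Return the scholar's DOIs (normalized) that the repository does not hold."""
    held = {normalize_doi(d) for d in repository_dois}
    return [normalize_doi(d) for d in scholar.dois if normalize_doi(d) not in held]


def build_requests(
    scholars: Iterable[Scholar],
    repository_dois: Iterable[str],
    may_self_archive: Callable[[str], bool],
) -> List[Tuple[str, List[str]]]:
    """Pair each author's email with the DOIs they should be asked to upload.

    `may_self_archive` is an assumed callback that answers whether the
    publisher's policy permits posting an author manuscript for a DOI.
    """
    held = list(repository_dois)
    requests = []
    for scholar in scholars:
        todo = [d for d in missing_from_repository(scholar, held) if may_self_archive(d)]
        if todo:
            requests.append((scholar.email, todo))
    return requests


# Usage with made-up data: one paper is already in the repository.
staff = [Scholar("A. Author", "a.author@example.edu",
                 dois=["10.1234/abc", "https://doi.org/10.1234/DEF"])]
print(build_requests(staff, ["10.1234/def"], lambda doi: True))
```

Normalizing DOIs before comparing them matters because DOIs are case-insensitive and often appear with a URL prefix, so a naive string comparison would report copies as missing when they are not.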

Western University chose to use the bepress repository, but there are many other repositories, including open source ones that are even easier to augment. This is where the open source community could really help. If all universities that already have repos use aperta-accessum on their own campuses, most academic papers will be free for anyone who wants to access them. That could be a powerful force for accelerating innovation.

Aperta-accessum logo

(Emily BP, CC-BY-SA 4.0)

The aperta-accessum source code, housed on the Open Science Framework, is released under the GNU General Public License (GPL) 3.0, so it can be freely modified. You can learn more about it in the open-access study in the Journal of Librarianship and Scholarly Communication. In the article, we show that in the administrative time needed to make a single document OA manually, aperta-accessum can process approximately five entire departments' worth of peer-reviewed articles!

Aperta-accessum is an open source OA harvester that enables institutional libraries' stewardship of OA knowledge on a mass scale. It radically reduces costs and could also improve science, because scientists would have access to the information that would push their work forward. So give your favorite university librarian the gift of aperta-accessum. If your school or alma mater has a different type of repo, please consider sharing a little of your time to customize aperta-accessum for their repo, too.

Joshua Pearce
Joshua M. Pearce is the John M. Thompson Chair in Information Technology and Innovation at the Thompson Centre for Engineering Leadership & Innovation.
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.