Open source and humanities in the digital age

Welcome to the first installment of a monthly feature in which I explore how open source software and the open source way are used in the digital humanities. Every month I will look at open source tools you can use in your own digital humanities research and at humanities research projects that are using open source tools today. I'll also cover news about how transparency, open exchange, and the other principles of the open source way are being applied to the humanities.

Let's start with an explanation of the digital humanities. The digital humanities is where traditional humanities scholarship (the academic study of arts, language, history, and the like) meets the digital age. By using technology in new ways, digital humanities scholars can create research projects that explore topics in ways that were impossible, or extremely laborious, before computers.

Text/data mining, visualization, information retrieval, and digital publishing are some of the key features of digital humanities research. With computers, it is possible to visualize the connections between famous figures in British history or find patterns in an author's complete body of work, as the University of Toronto's Ian Lancashire and Graeme Hirst did with the works of Agatha Christie (PDF), where vocabulary changes in her later works suggest she had Alzheimer's disease.

In February this year, there were several interesting developments: new versions of software, new tutorials, a new course in development, and a museum journal that became open access. I've highlighted each of them below. Perhaps one will inspire your own understanding of, or research in, the digital humanities.

R client for the Internet Archive API

The Internet Archive is a massive collection of freely available material, providing a wide variety of research possibilities for humanities scholars. To facilitate using the materials in the Internet Archive, Lincoln Mullen, Assistant Professor at George Mason University, has developed a package for R which uses the Internet Archive API to search for items, download metadata, and retrieve the associated files. Mullen's Internet Archive R package is available on GitHub.
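To give a sense of what a client like this does under the hood, here is a minimal Python sketch. It is not Mullen's R package and does not mirror its functions. It only builds the request URLs for the three steps named above: searching for items, fetching an item's metadata, and locating one of its files. The helper names, the sample identifier, and the sample file name are hypothetical, and the endpoint paths reflect my understanding of the Internet Archive's public URLs, so check the current API documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumption: the Internet Archive's public site is the base for all three endpoints.
BASE = "https://archive.org"


def search_url(query, fields=("identifier", "title"), rows=10):
    """Build a search URL that asks for JSON results (hypothetical helper)."""
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]  # fields to return per item
    params += [("rows", rows), ("output", "json")]
    return f"{BASE}/advancedsearch.php?{urlencode(params)}"


def metadata_url(identifier):
    """Build the URL for one item's metadata record (hypothetical helper)."""
    return f"{BASE}/metadata/{quote(identifier)}"


def download_url(identifier, filename):
    """Build the URL for one file that belongs to an item (hypothetical helper)."""
    return f"{BASE}/download/{quote(identifier)}/{quote(filename)}"


if __name__ == "__main__":
    # "example-item" and "my file.txt" are placeholders, not real archive entries.
    print(search_url("austen", fields=("identifier",), rows=5))
    print(metadata_url("example-item"))
    print(download_url("example-item", "my file.txt"))
```

Each URL could then be requested with any HTTP client. Keeping URL construction separate from the network call makes the logic easy to inspect and test without contacting the archive.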

Learn how to use OpenRefine

Formerly known as Google Refine, OpenRefine is a powerful tool for exploring and cleaning up large data sets. Let's say, for example, that you have a data set that contains gender, age, and a favorite book, and you want to start analyzing the data for trends. Before you do that, you need to make sure the data is consistent. For instance, you would want to avoid Pride & Prejudice and Pride and Prejudice showing up as two separate entries in your charts and graphs. OpenRefine helps you make those kinds of fixes.
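The idea behind this kind of cleanup can be sketched in a few lines of Python. This is not OpenRefine's own clustering code. It is a simplified illustration, loosely in the spirit of grouping values by a normalized key, and the rule that treats "&" as "and" is my own assumption added to handle the example above. The function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Return a comparison key for a title (a simplification, not OpenRefine's algorithm)."""
    key = title.lower().replace("&", " and ")  # assumption: treat "&" as "and"
    key = re.sub(r"[^\w\s]", "", key)          # drop punctuation
    return " ".join(key.split())               # collapse runs of whitespace


def cluster_titles(titles):
    """Group the original titles that share the same normalized key."""
    clusters = defaultdict(list)
    for title in titles:
        clusters[normalize_title(title)].append(title)
    return dict(clusters)


if __name__ == "__main__":
    favorites = ["Pride & Prejudice", "Pride and Prejudice", "Emma"]
    for key, variants in cluster_titles(favorites).items():
        print(key, "<-", variants)
```

Any group with more than one variant is a candidate for merging into a single spelling before you chart the data. In practice you would review each group by hand, since a normalized key can occasionally lump together titles that really are different.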

There are plenty of resources out there for learning how to use OpenRefine, but dh-lib Review has compiled a resource containing two recent tutorials. The first is Michigan State University Digital Scholarship Librarian Thomas Padilla's materials for a workshop on "Data Preparation for Digital Humanities Research." The other is a recording of a training webcast conducted by Texas A&M University's Elizabeth Grumbach and University of Texas's Jennifer Hecker. Both resources will help you learn how to use OpenRefine and make it easier for you to clean up your data sets.

Krikri and Heiðrún from the Digital Public Library of America

Metadata is a crucial part of information in the digital age. Harvesting, editing, and aggregating metadata can be a complex process, but there are tools that can help. One such tool is Krikri, a "Ruby on Rails engine for metadata aggregation, enhancement, and quality control" developed by the Digital Public Library of America (DPLA). Version 0.1.3 has just been released and, despite the low version number, it already has a nice feature set. Krikri is a component of Heiðrún, the DPLA's metadata ingestion system. Both Krikri and Heiðrún are released under the MIT License.

Benaki Museum Journal now open access

Greece's Benaki Museum has announced that its journal, Benaki Museum Journal, is now open access. Opening up the journal is a collaboration between the Benaki Museum and the National Documentation Centre of Greece (EKT) that began in 2014. The collaboration has now produced a working online, open access home for the journal (article in Greek). While there is not yet much open access content on the website, two older volumes (2008 and 2009) are online. In addition to being open access, the journal's website runs on Open Journal Systems, an open source package designed for open access journals. So if the subject interests you, check out Benaki Museum Journal's website, read the articles, and keep Open Journal Systems in mind if you ever need an online platform for publishing an open access journal.

New course from the Roy Rosenzweig Center for History and New Media

The Roy Rosenzweig Center for History and New Media is developing a new course called Teaching Hidden History. The course will first be offered in the summer of 2015 as a hybrid offering with online and in-person components. According to the Center for History and New Media's announcement, "[t]he course integrates digital history, history education, and best practices in teaching and learning history." Given the Center's role in developing a large number of digital humanities tools, including Zotero, Omeka, and PressForward, this course should give students a wonderful opportunity to develop their skills and learn about doing history the open source way.

This is a monthly column on the state of open digital humanities. If you have news pertaining to this topic that you would like to share, please send an email to Joshua Allen Holm. If you would like to contribute an article on this topic, please send your submission to the Opensource.com editorial team.