Is open data living up to the hype? One data journalist weighs in


Journalism is one profession that has embraced open source. Open source enables smaller organizations with little or no budget to effectively extend their news-gathering capabilities. And it's not just smaller news organizations that have been adopting open source: The New York Times recently unveiled a new open source content management system.

But it's in data journalism that open source has gained its biggest foothold. There are a number of open source data collection and analysis tools used extensively in data journalism, and a growing number of repositories related to data journalism on GitHub.

One organization that has effectively melded open source and data journalism is Journalism++. Co-founded by Nicolas Kayser-Bril, it's a European agency that specializes in what Kayser-Bril calls data-driven storytelling. According to Kayser-Bril, Journalism++ delivers the full package, from the data analysis to the story to the visualization.

In this interview, Kayser-Bril discusses why open source is the right choice for journalism organizations, how being open has benefited his clients, and why open data may not be the boon that it seems to be.

How did you get involved in data journalism?

I've been programming since I was five (my father was a computer scientist). It was just normal for me to do small interactive pieces at school, on my blog, and then at work. In 2007 and 2008, people from various media outlets started contacting me and wanted to pay me to code for them.

One of the guiding principles of Journalism++ is that "open beats closed. Open source is our default behavior." Why is that?

At Journalism++, we consider open source to be the normal behavior. Most of our team is fairly young and grew up with Linux distributions that they could install without too much pain. Open source was always on their computers. For them, proprietary software is the exception. Without sounding like open source activists, we believe that code should be public by default.

The only reason we might someday make our code proprietary is if we develop an algorithm that, in itself, provides a competitive edge for our business. But as of now, our core skills are data analysis and storytelling. It's not something you nurture by making code proprietary.

Has open source given Journalism++ a competitive advantage?

Being open by default has many advantages:

Many services are free for open source projects.
Having a large and well-maintained GitHub account is key to recruiting good developers.
Being transparent forces us to always think twice before posting a ticket or writing a commit message.

What are some of the technologies that you use to develop data journalism tools?

Our default stack is AngularJS for the front end and Django for the back end. You'll find both of those frameworks used in all of our projects. We also use a variety of open source libraries. There are exceptions, though: Data Wrapper (a web-based data visualization tool) and Feowl (which monitored power cuts in Cameroon), two projects Journalism++ worked on, were coded partly in PHP because of specific constraints imposed on those projects.

What licenses do you use for your work, and why?

By default, we use LGPLv3, and sometimes GPLv3. They're the licenses that most commercial projects use. It makes sense for Journalism++ to use them: others can build upon our code, but they cannot create a whole new proprietary service based on it. We sometimes use the MIT license, for example when the tool is financed by a foundation that wants to foster innovation.

With our client work, the client chooses the license but we insert GPLv3 by default in our contracts. Most of the time, our clients don't change that.

What advantages does using open source tools for data journalism offer news organizations?

Open source makes an organization more transparent and, therefore, more trustworthy. Newsrooms are moving towards open source; just look at the number of journalists using GitHub now!

The other aspect has to do with surveillance: users (myself included) tend to think of open source software as less prone to having backdoors, even if no audit has been performed on the software.

Are there any disadvantages to using open source data tools?

None that I know of.

Where can someone find the source code for the tools that Journalism++ has developed?

Here's our GitHub repository: https://github.com/jplusplus.

Do you imagine the tools that Journalism++ has developed being used outside of data journalism? If so, how?

Journalism++ works in data analysis and storytelling. Many industries need those skills, from communication departments in corporations, NGOs and institutions to business intelligence agencies. I prefer to think of all those different groups as information professionals. Those are the people we cater to.

More and more governments are opening up public data. What do you think are some of the problems that face data journalists or groups that want to use that data?

Governments don't open public data. They simply select a set of non-controversial material that they publish online. Most of the interesting pieces of information, ones that would let journalists assess the impact of public policy, are censored.

The trend is a move towards more opacity. In Europe and in the United States, it's getting harder and harder to file a freedom of information request. In some European countries, journalists have no chance of getting access to some data, even when it has nothing to do with national security.

Journalists should be extremely careful before reusing a dataset that was proactively published by a government. They need to ask themselves: Was a column removed? Why was this dataset published? Are there other datasets on this issue that have not been made public?
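The first of those questions, whether a column was removed, can be partly checked with a few lines of code when a reference list of fields exists, for example from an earlier release of the same data or from the agency's own documentation. The following is a minimal Python sketch under that assumption; the function name `missing_columns` and the sample budget data are hypothetical. It only flags dropped columns and says nothing about edited values or missing rows.

```python
import csv
import io


def missing_columns(published_csv: str, expected_columns: list[str]) -> list[str]:
    """Return the expected column names that are absent from the CSV header.

    `expected_columns` is assumed to come from a trusted reference, such as an
    earlier release of the dataset or official documentation (hypothetical here).
    """
    reader = csv.reader(io.StringIO(published_csv))
    header = next(reader, [])  # an empty file yields no header at all
    present = {name.strip() for name in header}
    # Keep the reference order so the report is easy to read.
    return [col for col in expected_columns if col not in present]


# Hypothetical example: an earlier release of this budget file had a contractor column.
published = "year,department,amount\n2014,Health,1200000\n"
expected = ["year", "department", "contractor", "amount"]
print(missing_columns(published, expected))  # ['contractor']
```

A missing column is a prompt for the follow-up questions above, not proof of censorship: fields are also renamed or split between releases.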

What do you see as the future of open source in data journalism?

I'm no psychic, but I see no reason why data journalism teams that are embracing open source today should go proprietary tomorrow.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.