What "open data" means—and what it doesn’t

Last week, an article in the Wall Street Journal talked about the Open Data Partnership, which “will allow consumers to edit the interests, demographics and other profile information collected about them. It also will allow people to choose to not be tracked at all.” The article goes on to discuss data mining and privacy issues, which are hot topics in today’s digital world, where we all wonder just how much of our personal data is out there and how it’s being used. These are valid concerns being talked about in other, more appropriate fora. I, however, would like to address my personal pet peeve about the dilution of the term open data.

The Open Knowledge Definition says it this way, “A piece of content or data is open if you are free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.” Generally, this means that the data should be released in a format that is free of royalties and other IP restrictions. The problem is that an increasing number of people are using the term open data to mean publicly available data.

In the article, the CEO of the startup directing the Open Data Partnership says the goal is to “be more transparent and give consumers more control” of the data that is collected and shared. Providing a mechanism by which consumers can decide what information is made available to advertisers is a laudable goal. However, this “open data” initiative focuses on what data is made available, when open data is really about how data is made available. This definitional shift is a problem, particularly for governments that are implementing data policies.

Simply put, all open data is publicly available. But not all publicly available data is open.

Open data does not mean that a government or other entity releases all of its data to the public. It would be unconscionable for the government to give out all of your private, personal data to anyone who asks for it. Rather, open data means that whatever data is released is released in a way that lets the public access it without paying fees or facing unfair restrictions on its use.

In a previous article, I wrote about how the Massachusetts Bay Transportation Authority (MBTA) opened up its transit data to software developers. Within two months, six new trip-planning applications for bus and train riders had been built at no cost to the MBTA. That’s the power of open data. The data was produced by the government and released to the public for free in an open format, the General Transit Feed Specification (GTFS), under a license that allowed for use and redistribution.
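To make "open format" a little more concrete: a GTFS feed is a zip archive of comma-separated text files, one of which, stops.txt, lists each stop with its ID, name, latitude, and longitude. The sketch below is only an illustration of how little is needed to read such a file. The sample rows are invented (they are not MBTA data), and the `read_stops` helper is a hypothetical name of my own, not part of any GTFS tooling.

```python
import csv
import io

# Invented sample in the shape of a GTFS stops.txt file.
# The column names come from the GTFS specification; the rows are made up.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Square,42.3500,-71.0600
S2,Sample Street,42.3600,-71.0700
"""


def read_stops(text):
    """Parse GTFS-style stops.txt content into a list of plain dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "id": row["stop_id"],
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in reader
    ]


stops = read_stops(SAMPLE_STOPS_TXT)
for stop in stops:
    print(stop["name"], stop["lat"], stop["lon"])
```

Nothing here goes beyond the Python standard library, which is the point: with an open format, any developer can read and repurpose the data without buying a particular vendor's product.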

Why does this matter? If open data is misunderstood as releasing any and all data to the public, people will oppose the concept out of concern for their privacy. What we, as policy advocates, want to ensure is that the data governments do and should publish is released in a way that gives all citizens equal access. In other words, you shouldn’t have to buy a particular vendor’s product in order to open, use, or repurpose the data. You, as a taxpayer, have already paid for the collection of the data. You shouldn’t have to pay an additional fee to open it.

We’ve all seen, from the recent news about WikiLeaks, that there are real privacy and security concerns with putting all of the government’s data out there, but that is a separate issue and shouldn’t be confused with open data. Privacy concerns come into play when deciding whether data should be made publicly available. Once it has been determined that government data should be made public, it should be released in an open format.

Am I being nitpicky about the term? Maybe. But we’ve seen from other tech policy battles that good definitions are crucial to framing the debate.