What "open data" means—and what it doesn’t

Last week, an article in the Wall Street Journal talked about the Open Data Partnership, which “will allow consumers to edit the interests, demographics and other profile information collected about them. It also will allow people to choose to not be tracked at all.” The article goes on to discuss data mining and privacy issues, which are hot topics in today’s digital world, where we all wonder just how much of our personal data is out there and how it’s being used. These are valid concerns being talked about in other, more appropriate fora. I, however, would like to address my personal pet peeve about the dilution of the term open data.

The Open Knowledge Definition puts it this way: “A piece of content or data is open if you are free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.” Generally, this means that the data should be released in a format that is free of royalties and other IP restrictions. The problem is that an increasing number of people are using the term open data to mean publicly available data.

In the article, the CEO of the startup directing the Open Data Partnership says the goal is to “be more transparent and give consumers more control” of the data that is collected and shared. Providing a mechanism through which consumers can decide what information is made available to advertisers is a laudable goal. However, this “open data” initiative focuses on what data is made available, whereas open data is really about how data is made available. This definitional shift is a problem, particularly for governments that are implementing data policies.

Simply put, all open data is publicly available. But not all publicly available data is open.

Open data does not mean that a government or other entity releases all of its data to the public. It would be unconscionable for a government to give out all of your private, personal data to anyone who asks for it. Rather, open data means that whatever data is released is released in a way that lets the public access it without paying fees or facing unfair restrictions on its use.

In a previous article, I wrote about how the Massachusetts Bay Transportation Authority (MBTA) opened up its transit data to software developers. Within two months, six new trip-planning applications for bus and train riders had been built at no cost to the MBTA. That’s the power of open data: data produced by the government, released to the public for free in an open format (GTFS) under a license that allows use and redistribution.

Why does this matter? If open data is misunderstood as releasing any and all data to the public, people will come to oppose the concept out of concern for their privacy. What we, as policy advocates, want to encourage is that whatever data governments do and should publish is published in a way that ensures equal access for all citizens. In other words, you shouldn’t have to buy a particular vendor’s product in order to open, use, or repurpose the data. As a taxpayer, you have already paid for the collection of the data. You shouldn’t have to pay an additional fee to open it.

We’ve all seen from the recent news about Wikileaks that there are real privacy and/or security concerns with putting all of the government’s data out there, but that is a separate issue and shouldn’t be confused with open data. Privacy concerns come into play when deciding whether data should be made publicly available. Once it has been determined that government data should be made public, it should be released in an open format.

Am I being nitpicky about the term? Maybe. But we’ve seen from other tech policy battles that good definitions are crucial to framing the debate.

Melanie Chernoff | As Public Policy Manager for Red Hat, Inc., Melanie monitors, evaluates, and works to influence U.S. and international legislation and government regulations affecting open source technologies and open standards. She also serves as chair of the company's Corporate Citizenship committee, coordinating Red Hat's charitable activities.

8 Comments

Nicely said!

Excellent post, Melanie! Any chance we can cross-post this on the Open Knowledge Foundation blog (blog.okfn.org)?

A small correction: 'open data' is defined in the Open Knowledge Definition, not the Open Data Commons project:

http://www.opendefinition.org/

Open Data Commons is a project to provide legal tools for making data open, such as the PDDL (public domain), ODC-BY (attribution) and the ODbL (sharealike):

http://www.opendatacommons.org/

Hi Jonathan,

Thanks for the correction. I've updated the post.

Yes, feel free to cross-post. Unless otherwise stated, all opensource.com content is licensed under the Creative Commons CC-BY-SA license, so you don't have to ask.

thx,
mel

I'm glad others are taking up this issue. In July I posted a bit of a rant about the Canadian cities' "open data" license that wasn't, <a href="http://zzzoot.blogspot.com/2010/07/its-not-open-data-so-stop-calling-it.html">It's not Open Data, so stop calling it that...</a>, and some in the Canadian city Open Data community identified me as a pooh-pooh'er of the wonderful things the cities were doing. In August I tried to explain what the problems were in <a href="http://zzzoot.blogspot.com/2010/08/what-is-open-gov-data-sunlight.html">What is Open Gov Data? The Sunlight Foundation: Ten Principles for Opening Up Government Information</a>. Then, in November, after Tim Berners-Lee announced his five-star system for open government data, I again pointed out the incorrect conflation of publicly available data and Open Data in <a href="http://zzzoot.blogspot.com/2010/11/canadian-open-data-cities-no-stars-in.html">Canadian "Open Data" Cities: 'No stars' in Tim Berners-Lee Five Star Rating for Open Government Data</a>.

We need to work to get this fixed. It devalues the Open Data movement, but more importantly, by not being Open Data, it <b>allows politicians and technocrats to legally block access for applications, companies and people they do not like</b>. If it is not Open, then it is not transparent.

Excellent post, Mel.

Your explanation is very clear and helpful, thanks.

I work on <a href="http://davidpidsley.com/">linked communities and open data</a> and am currently in correspondence with the <a href="http://www.charity-commission.gov.uk">Charity Commission</a> about the <a href="http://www.nationalarchives.gov.uk/doc/open-government-licence/">Open Government Licence</a>.

The Charity Commission registers and regulates charities in England and Wales. They offer advice and provide services and guidance to help charities run as effectively as possible. They keep the online Register of Charities, which provides information about each of the thousands of registered charities in England and Wales.

All the charity data is publicly available (see <a href="http://www.charity-commission.gov.uk/SHOWCHARITY/RegisterOfCharities/CharityWithoutPartB.aspx?RegisteredCharityNumber=1110906&SubsidiaryNumber=0">this example</a> on the Commission's website) but not all publicly available charity data is Open.

In fact, I am not sure it earns even one star under Tim Berners-Lee's suggested <a href="http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/">5-star deployment scheme</a> for Linked Open Data: see the <a href="http://www.charity-commission.gov.uk/misc/copyright.aspx">Copyright notice, disclaimer and privacy statement</a> in their website footer.

Looking forward to reading more from your blog soon.

Kindest regards,

<a href="http://davidpidsley.com">David Pidsley</a>
<a href="http://twitter.com/davidpidsley">@davidpidsley</a>

I really like this post: a nice definition and an explanation of why a definition is important. Great picture to explain your definition: a good visualization explains more than a thousand words. Unfortunately, the size of the bubbles doesn't yet represent reality, but the green open data bubble is growing!

I like to stimulate the use of open data in order to create and share insights. I have two remarks regarding your definition. What is your opinion?

<strong>Open data is not limited to government data</strong>: businesses can profit from the open data concept as well. And in many countries, governments have shifted tasks to (semi-)private companies. So where is the difference?
<strong>Open data means free to access, but doesn’t always mean an easy-to-access format</strong>: the use of open data has only just started. As a consequence, much open data is (unintentionally) published in hard-to-reuse formats (e.g., a data table in PDF instead of CSV). Developers should take up the challenge to reuse the data anyway, in order to boost its use and prove the power of open data. But of course I encourage all efforts to make open data easy to access!

Keep up the great work!
Jacob Houtman

Remark: read my full comment on the site http://DataAlchemist.com

Hi Jacob,

I agree with you, and I did not mean to give the impression that only governments can have "open data." It's just that I focused on the government context because this is the government channel for opensource.com. Any entity, including a business, could release their data in an open fashion.

The term "open data," to me, simply means that data is released to the public without IP restrictions that would prevent use, copying, redistribution, etc. Anyone can open up their data.

I agree with you that even "open data" is sometimes released in a hard-to-access format. I don't think that would prevent it from being described as open, but as a practical matter, it's still a pain. I would also advocate for data being released in open, machine-readable and searchable formats.

btw, here's a great interview with Tim Berners-Lee on this topic:
http://www.huffingtonpost.com/alexander-howard/tim-bernerslee-on-wikilea_b_798671.html

This captures really well the definitions and what open data means in the context of data sharing & re-use.

It would be really great to re-emphasize the Open Gov aspect of Open Government Data, and that Open Data does not necessarily equal Open Gov Data (i.e., it could be data freely shared by a commercial organization), just as the primary point of your article is that not all Gov Data is Open Data. Only Open Gov Data addresses both the "open" and "gov" attributes of this data. A Venn diagram to illustrate: http://www.port25.ca/?page_id=866

Thanks!

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.