Calling all data geeks and journalists! Testing is now open on DocHive, an open source Ruby on Rails project for capturing data from image-based PDFs.
In February of this year, I reported that the Raleigh Public Record, a local, online news publication in Raleigh, NC, was in the process of creating an open source solution to extract data from PDFs. The problem many news journalists have is easily and quickly (which is very important given the...
A group of journalists is announcing the launch of its breakthrough open source solution to a problem many writers and journalists face: how to take data in PDFs or images and easily convert it to a spreadsheet or other usable format.