dochive

New phase of DocHive, open source tool for data extraction

In February of this year, I reported that the Raleigh Public Record, a local online news publication in Raleigh, NC, was in the process of creating an open source solution to extract data from PDFs. The problem many journalists face is converting data and images from the documents they use for their reports into a usable format, and doing so easily and quickly, which matters a great deal given the nature of their job (see an example here).

The project, DocHive, is now moving into its next cycle of development under the leadership of Edward Duncan. He tells us what he has planned for his team over the next six months. But first, I asked: » Read more

Journalist creates open source solution to extract data from PDFs

A group of journalists is announcing the launch of a breakthrough open source solution to a problem many writers and journalists face: how to take data in PDFs or images and easily convert it to a spreadsheet or other usable format. » Read more
