New phase of DocHive, open source tool for data extraction

In February of this year, I reported that the Raleigh Public Record, a local online news publication in Raleigh, NC, was in the process of creating an open source solution to extract data from PDFs. Many journalists struggle to convert the data and images in the documents they rely on for their reports into a usable format, and they need to do it easily and quickly given the nature of their job (see an example here).

The project, DocHive, is now phasing into the next cycle of development under the leadership of Edward Duncan. He tells us what he has planned for his team over the next six months. But first, I asked:

