Import your files from closed or obsolete applications

An interview with Italo Vignoli of the Document Liberation Project.

Opensource.com

One of the biggest risks of using proprietary applications is losing access to your digital content if the software disappears or drops support for old file formats. Moving your content to an open format is the best way to protect yourself from vendor lock-in, and for that, the Document Liberation Project (DLP) has your back.

According to the DLP's homepage, "The Document Liberation Project was created to empower individuals, organizations, and governments to recover their data from proprietary formats and provide a mechanism to transition that data into open and standardized file formats, returning effective control over the content from computer companies to the actual authors."

I recently interviewed Italo Vignoli, director of the Open Source Initiative and a co-founder of The Document Foundation, by email to learn more about DLP's work. DLP is a project of The Document Foundation, which oversees the open source LibreOffice productivity suite.

I was curious about how DLP promotes interoperability and enables individuals and governments to recover data created in proprietary applications.

Italo says, "the objective of the Document Liberation Project is to develop import filters—in the form of software libraries—for legacy and current proprietary formats to convert them to the standard ODF document format by importing them into LibreOffice. For instance, Microsoft Visio files can be opened by Draw and saved as standard ODG files to be perpetually and freely available."
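For readers who want to script the kind of conversion Italo describes, here is a minimal Python sketch. It assumes LibreOffice is installed and that its soffice binary is on the PATH. The helper names build_convert_command and convert, and the file name diagram.vsd, are hypothetical. Whether a particular file converts cleanly depends on the import filters shipped with your LibreOffice build.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target_format="odg", outdir="converted"):
    """Build a headless LibreOffice command line (hypothetical helper)."""
    return [
        "soffice",                      # assumption: soffice is on PATH
        "--headless",                   # run without opening a window
        "--convert-to", target_format,  # e.g. odg, odt, ods
        "--outdir", str(outdir),
        str(Path(source)),
    ]


def convert(source, target_format="odg", outdir="converted"):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, target_format, outdir), check=True)


if __name__ == "__main__":
    # "diagram.vsd" is a placeholder; this only prints the command.
    print(" ".join(build_convert_command("diagram.vsd")))
```

Keeping the command builder separate from the function that runs it makes it possible to inspect or log the exact command before anything touches your files.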

DLP libraries enable users to import files created in numerous proprietary and obsolete applications, including Adobe FreeHand and PageMaker; Apple Keynote, Numbers, and Pages; Corel WordPerfect and Draw; Lotus 1-2-3; Microsoft Publisher and Works; QuarkXPress; Quattro Pro; Zoner Callisto; StarOffice; legacy Macintosh document formats; and e-book formats. Besides LibreOffice, DLP's import libraries are used by AbiWord, Calligra, CorelDRAW File Viewer, Inkscape, and Scribus. Project members suggest ideas for additional formats to support; there is an extensive list of proposed formats on the DLP wiki.

Italo says we can be certain ODF and other open formats will still exist in 10 or 20 years because the standards are "thoroughly documented and therefore can be easily maintained even by people who have not been involved in their development and early evolution." Proprietary standards represent the commercial strategy of a company, meaning no one apart from that company can change them. In contrast, he says, "open standards reflect the community interests, as they are developed by groups or consortia … in a transparent way to allow the community (of experts) to contribute to the development and the evolution over time." Because the ODF standard is managed by a technical committee with a diverse membership at OASIS, and is available from ISO, an independent global standards organization, he says, "ODF is in good and secure hands."

DLP is coordinated by a core group of developers who do a large amount of the coding, Italo says, and they are joined by several contributors who work on specific libraries based on their interests. Creating these import and export libraries can be difficult, he adds, because many proprietary file formats don't have public documentation, and even generating controlled sample files can be a challenge. He says, "it is necessary to reverse engineer the binary file formats, which can be particularly tricky when the structure of the file is not known."

To make this simpler, Valek Filippov and other DLP developers created OLE Toy, a Python graphical tool "that helps to unwind what can often end up being several nesting containments and [provides] helpful highlighting and debugging tools to make reverse engineering easier," Italo says. "Some file formats are a pure stream of somewhat random object serializations and the structure is much harder to deduce."
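As a rough illustration of the very first step in this kind of reverse engineering, the Python sketch below guesses a file's outer container from its leading bytes and prints a simple hex dump. It is not OLE Toy and not DLP code; the function names sniff_container and hexdump are hypothetical. The two signatures are the well-known magic numbers for OLE2 compound files, which wrap many legacy Microsoft binary formats, and for ZIP archives, the packaging ODF uses.

```python
# Well-known file signatures ("magic numbers").
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header


def sniff_container(data: bytes) -> str:
    """Guess the outer container from the first bytes (hypothetical helper)."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def hexdump(data: bytes, width: int = 16) -> str:
    """Return offset-prefixed rows of hex bytes for manual inspection."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = OLE2_MAGIC + bytes(8)  # synthetic bytes, not a real document
    print(sniff_container(sample))
    print(hexdump(sample))
```

Identifying the container only tells you how the file is wrapped. The streams inside it still have to be worked out by hand, which is the part Italo describes as hard.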

If you'd like to learn more, the Document Liberation Project maintains two IRC channels, one for general interest and one for developers. You can follow the project on Twitter, Facebook, or the project's blog, and learn more about contributing on the project's website.

Educator, entrepreneur, open source advocate, life long learner, Python teacher. M.A. in Educational Psychology, M.S. Ed. in Educational Leadership, Linux system administrator.

3 Comments

Nice project, but I notice that there is no support for the old Microsoft Word for DOS (.DOC files, but different from the ones generated by Word for Windows). Microsoft deprecated and abandoned their import filters many years ago, so reading these documents is pretty unreliable these days.

You disparage ODF for not supporting these files, yet it's Microsoft who abandoned them. If there are those who have such files, ODF is who to contact. One thing they would need is a number of these old files to work with. It might be that these old files are not very difficult to sort out.

In reply to a comment by David C.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.