Open source Booktype removes barriers to collaboration

Why Amnesty International uses Booktype 2.0 for report publishing

Human rights NGO Amnesty International, a movement of more than seven million people, released its Annual Report for 2014-15 at the end of February. This 500+ page print book is published simultaneously in English, French, Spanish, and Arabic, and translated into 12 other languages by local teams. It is composed of 160 detailed chapters written by regional experts on the human rights situation in most of the countries of the world.

Richard Swingler, Global Production Services Manager at Amnesty's International Secretariat in London, knew that conventional publishing workflows were not scaling to a world that requires a diverse range of reading and interchange formats, on top of the constant requirement that Amnesty's reports be accurate, consistent, and up to date. Over the northern hemisphere's winter of 2014-15, the authors and translators had to work to a condensed timetable while producing new output formats, including XML for Adobe InDesign and XHTML for the Amnesty International websites.

Swingler led the search for a tool that would handle a structured, XML-based workflow with as much automation as possible while remaining easy to use for book contributors. Because the decision had already been made for Amnesty to switch from QuarkXPress to Adobe InDesign for producing magazine-style layouts, Adobe InCopy was an option; however, contributors were used to working in Microsoft Word and were not keen to abandon it in the short term, especially given the condensed publishing timetable this year. The tool of choice also needed to be extensible at short notice to cover requirements that emerged during the adoption of a new workflow.

These challenges led Amnesty International to select Booktype 2.0 and deploy the software in production during late 2014. Booktype is open source software released by nonprofit Sourcefabric under the GNU AGPL v3, a license intended for web services. A Django application running on a typical web stack including Linux, Apache, PostgreSQL, and Python, Booktype was originally written for book sprints hosted by the FLOSS Manuals documentation community.

English book cover and the Booktype software

The concept of book sprints is radical in the publishing industry and borrows from the experience of coding teams who work collaboratively in parallel, using intensive bursts to deliver a product. This rapid book production model, based on automated server-side publishing and real-time communication between distributed contributors, suited the time-critical and worldwide nature of Amnesty's work.

The Booktype developer team extended its browser-based software to meet Amnesty’s complex requirements, trained the editors and translators, and supported the book production process with on-the-fly enhancements and bug fixes. A number of other open source projects were integrated into the Booktype system to provide critical features, including lxml for document processing, EbookLib for EPUB3 processing, and wkhtmltopdf with WebKit for PDF proof rendering.

With so much information to integrate from contributors around the world, and so many languages and formats required, a structured publishing system based on XML chapter templates and book skeletons helped organize the data and ensured consistency across outputs. A custom interface was deployed so that Amnesty project managers could enable upload access for chapter editors and see the status of each book and chapter at a glance.

The Booktype web interface imports chapters automatically from uploaded Microsoft Word documents. Editors can then review the latest corrections and comments on each chapter, correct text and tag formatting using the Aloha inline editor, and export to InDesign XML, EPUB, PDF proof, and website XHTML formats with a single click.

Some of the simplest code changes made the biggest difference to Amnesty’s authors and translators. The book history feature used a pop-up window to enable comparison of chapter revisions side by side, but in production use it was soon discovered that the amount of vertical browser scrolling required was impractical. As pretty as the pop-up feature appeared in a demo, it was a productivity killer and harmed user experience. Booktype developer Helmy Giacoman added a button that opened the chapter revisions in a new full-sized browser tab, and within a few minutes the book contributors were able to get writing again.

As part of the support for contributors used to working with Microsoft Word, the Booktype team developed and released a new open source software library, Python-OOXML. This library handles .docx import and conversion to HTML, including the transfer of comments made in Word. Like Booktype itself, Python-OOXML is available as a free download from GitHub for other software developers to integrate into their publishing solutions. Improvements to the code from the Python community are actively encouraged in the form of pull requests.
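To make the comment transfer more concrete, here is a minimal sketch of where Word keeps comments inside a .docx archive. It uses only the Python standard library and is not Python-OOXML's API; the function names `read_docx_comments` and `make_sample_docx` are made up for illustration. It assumes the usual layout in which comments live in the `word/comments.xml` part of the zip archive.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the parts of a .docx archive.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_docx_comments(docx_file):
    """Return (author, text) pairs for each comment in a .docx file.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        if "word/comments.xml" not in archive.namelist():
            return []  # this document has no comments part
        root = ET.fromstring(archive.read("word/comments.xml"))

    comments = []
    for comment in root.iter(f"{{{W_NS}}}comment"):
        author = comment.get(f"{{{W_NS}}}author", "")
        # Comment text is split across w:t runs; join the runs back together.
        text = "".join(t.text or "" for t in comment.iter(f"{{{W_NS}}}t"))
        comments.append((author, text))
    return comments


def make_sample_docx():
    """Build a tiny in-memory archive holding only a comments part.

    Illustration only: a real .docx contains many more parts.
    """
    comments_xml = (
        f'<w:comments xmlns:w="{W_NS}">'
        '<w:comment w:id="0" w:author="Editor">'
        "<w:p><w:r><w:t>Check </w:t></w:r><w:r><w:t>this figure</w:t></w:r></w:p>"
        "</w:comment></w:comments>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/comments.xml", comments_xml)
    buffer.seek(0)
    return buffer


print(read_docx_comments(make_sample_docx()))
```

A real importer also has to tie each comment back to the text range it annotates in the main document, which this sketch leaves out.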

To ensure high print standards and total design control over Amnesty’s most important publication, XML was exported from Booktype and imported to an Adobe InDesign template for layout approval. Booktype’s HTML formed a bridge between the XML format in the Word documents and the dialect expected by InDesign, mapping styles automatically and fixing incompatibilities including bi-directional English and Arabic text on the same line.
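The style-mapping step can be pictured with a small sketch. This is not Booktype's exporter: the tag names (`Story`, `ChapterTitle`, `Body`, `PullQuote`), the `quote` class, and the function `html_to_tagged_xml` are all hypothetical. It only assumes that the layout template maps XML tag names to paragraph styles on import, and that the input is well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from (HTML tag, class) to the XML tag names
# that a layout template would map to paragraph styles.
STYLE_MAP = {
    ("h1", None): "ChapterTitle",
    ("p", None): "Body",
    ("p", "quote"): "PullQuote",
}


def html_to_tagged_xml(xhtml, root_tag="Story"):
    """Convert block-level XHTML elements into style-tagged XML."""
    source = ET.fromstring(xhtml)
    story = ET.Element(root_tag)
    for block in source:
        key = (block.tag, block.get("class"))
        # Fall back to the class-less mapping, then to plain body text.
        tag = STYLE_MAP.get(key) or STYLE_MAP.get((block.tag, None), "Body")
        out = ET.SubElement(story, tag)
        # Simplification: inline markup such as <em> is flattened to text.
        out.text = "".join(block.itertext())
    return ET.tostring(story, encoding="unicode")


chapter = (
    "<div><h1>Overview</h1>"
    '<p class="quote">Rights matter.</p>'
    "<p>Text <em>here</em>.</p></div>"
)
print(html_to_tagged_xml(chapter))
```

A production mapping would also need to carry inline styles and text direction through to the output, which is where incompatibilities such as mixed English and Arabic on one line have to be handled.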

The quality of the XML in the Word documents contributed to the publication varied enormously, from the easily parsed to the barely readable, and was especially poor after documents had been run through proprietary translation tools. While investigating one translated chapter of just under 5,100 words with formatting issues, the Booktype team found it had just two lines of XML in the main document file, but the second line was more than 380,000 characters long. That works out at about 75 characters per word; for each character typed by an author, roughly another 10 characters had been added by software. Booktype lead developer Aleksandar Erkalović implemented a post-processing filter, which tidied the HTML output and made it readable again. This cleaned-up HTML made it feasible for editors to fix formatting errors in the book sources, rather than returning to Word and re-uploading XML.
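The arithmetic behind that figure is simply 380,000 / 5,100 ≈ 74.5 characters per word. A rough way to take the same measurement on any XML part is sketched below using only the Python standard library. The function name `markup_overhead` is hypothetical, and counting every text node as visible text is an approximation, since a real Word document also stores some non-visible strings as text.

```python
import xml.etree.ElementTree as ET


def markup_overhead(xml_bytes):
    """Report line lengths and characters per visible word for an XML part."""
    root = ET.fromstring(xml_bytes)
    visible = "".join(root.itertext())  # all text nodes, markup stripped
    words = len(visible.split())

    text = xml_bytes.decode("utf-8")
    lines = text.splitlines()
    return {
        "lines": len(lines),
        "longest_line": max((len(line) for line in lines), default=0),
        "words": words,
        # Total characters stored per word the author actually wrote.
        "chars_per_word": len(text) / words if words else float("inf"),
    }


# The figures quoted in the article, as a sanity check on the arithmetic.
print(round(380_000 / 5_100, 1))

# A tiny stand-in for a main document part: two lines, two words.
sample = b"<doc>\n<p><r><t>Human </t></r><r><t>rights</t></r></p></doc>"
print(markup_overhead(sample))
```

Running the function on the `word/document.xml` part extracted from a .docx archive gives a quick indication of how much a translation tool has inflated the markup before anyone opens the file in an editor.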

The Booktype team is now looking at improving automated document rendering with mPDF, an open source PHP project with support for pre-press features including CMYK or spot colours, and the bleeds and crop marks required for edge-to-edge printing of backgrounds. The recently released mPDF 6.0 can use OpenType layout tables to perfect the display of complex scripts, including Arabic, Khmer, Thai, Vietnamese, and Hebrew. This combination of Booktype and mPDF has the potential to generate high-design-quality documents for global languages automatically, without the need to export to InDesign for manual adjustment.

After the book launch hit the newswires, Richard Swingler commented, “We’ve been able to make a very significant improvement to Amnesty’s Annual report production workflow. Booktype is intuitive and easy to use, and makes complete sense for an organization where there are many different stakeholders in different locations. I’m aiming to build on the strong foundations that have been built to explore how we can further collaborate.”

About the author

Daniel James
Daniel James is the director of 64 Studio Ltd, a company developing GNU/Linux products for OEMs and R&D labs. He was one of the founders of the linuxaudio.org consortium, which promotes the use of GNU/Linux and Free Software in the professional audio field. Daniel is the author of Crafting Digital Media (Apress) and has contributed to LinuxUser & Developer, Linux Format, and Sound on Sound magazines.