Doc like an Egyptian: Managing project documentation with Sphinx

Image by: April Killingsworth. Modified by Rikki Endsley. CC BY-SA 2.0.

At the 14th annual Southern California Linux Expo (a.k.a., SCaLE 14x), Dru Lavigne will discuss common "gotchas" associated with creating and maintaining documentation, and she'll talk about available open source tools. She'll also provide an overview of Sphinx, an open source documentation generation system originally created for the new Python documentation.

In this interview, she explains how Sphinx is different from other open source solutions, and what kinds of projects should consider migrating their docs.

Why did the PC-BSD, FreeNAS, and Lumina documentation projects move to Sphinx?

When I became responsible for the maintenance of the PC-BSD documentation back in 2010, I inherited an existing documentation wiki that contained a lot of user-generated content, most of which was several years out of date. Shortly thereafter, I also became responsible for creating the new FreeNAS documentation, so it made sense to also make a documentation wiki for that project.

Over time, the shortcomings of the wiki approach for maintaining updated and versioned documentation became apparent:

While the main purpose of a wiki is to invite user contributions and to provide a low barrier to entry, very few people come to write documentation (however, every spambot on the planet will quickly find your wiki, which creates its own set of maintenance issues).
Wikis are designed for separate, one-ish page infobytes, such as how-tos. They really aren't designed to provide navigation in a Table of Contents or to provide a flow of Chapters, though you can hack your pages to provide navigational elements to match the document's flow. This gets more difficult as the document increases in size—our guides tend to be 300+ pages. It becomes a nightmare as you try to provide versioned copies of each of those pages so that the user is finding and reading the right page for their version of software.
While wiki translation extensions are available, how to configure them is not well documented, their use is slow and clunky, and translated pages only increase the number of available pages, getting you back to the problems in the previous bullet. This is a big deal for projects that have a global audience.
While output-generation wiki extensions are available (for example, to convert your wiki pages to HTML or PDF), how to configure them is not well documented, and they provide very little control for the layout of the generated format. This is a big deal for projects that need to make their documentation available in multiple formats.

We spent a few years hammering various kludges into our existing wiki infrastructure to convince it to create what we needed: large, versioned, translated documents in various formats. We also spent a good amount of time researching alternatives. While researching, we had these goals in mind:

must support a Table of Contents structure and be able to produce multiple formats, preferably through integration with a source build infrastructure;
must integrate seamlessly into a translations infrastructure;
should provide a low barrier to entry for both doc writers and translators.

In our research, we found that ease of entry tended to be inversely proportional to the quality and number of available output formats: the easier a tool was to pick up, the less it offered in generated output.

Sphinx provided a good middle ground: its syntax is almost as easy to learn as wiki syntax, it integrates into existing source repositories as well as build and translation infrastructures, and it provides decent control over output layout, although that control varies by format.
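As a rough illustration of what "multiple formats from one source" looks like in practice, the sketch below assembles one `sphinx-build` command per output format. The builder names (`html`, `latex`, `epub`, `gettext`) are standard Sphinx builders; the directory names and the helper function are assumptions made for this example, not anything taken from the PC-BSD or FreeNAS build systems.

```python
# Sketch: one sphinx-build invocation per output format.
# "docs" and "docs/_build" are hypothetical paths; adjust to your repository.

def sphinx_build_commands(source_dir="docs", build_dir="docs/_build",
                          builders=("html", "latex", "epub", "gettext")):
    """Return a list of argv lists, one per Sphinx builder."""
    commands = []
    for builder in builders:
        # Each builder writes into its own subdirectory of build_dir.
        commands.append(
            ["sphinx-build", "-b", builder, source_dir, f"{build_dir}/{builder}"]
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # even on a machine without Sphinx installed.
    for argv in sphinx_build_commands():
        print(" ".join(argv))
```

The `gettext` builder is the piece that feeds a translation workflow: it extracts translatable strings into `.pot` files that translators or a translation platform can pick up. To actually run a build, each list could be passed to `subprocess.run(argv, check=True)`.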

What are a couple of the big lessons from migrating to Sphinx?

As an experiment, I first migrated the existing FreeNAS documentation. Since by that point we were maintaining both the wiki and a master OpenOffice document (for generating HTML and PDF), I found a script that converts .odt to .rst (the format used by Sphinx). Having never used either .rst or a Python conf.py before, I spent some time learning how to build an HTML version and experimenting with various themes and extensions. I then spent about a month cleaning up the migrated .rst files, learning as I went how the various tags worked and the best way to lay out our documentation tree. As with any migration script, not everything migrated cleanly, which gave me an opportunity to figure out how tables are formatted and which tags controlled which layout.

After that first migration, I had a good understanding of which tags our documentation used, which extensions were useful, and which theme we liked. I then used this knowledge to migrate the PC-BSD documentation. This time I used a different migration script, which did its tags a little differently. This gave me the opportunity to discover tags I hadn't seen before and to decide which ones I liked best in order to standardize between the two documentation projects. The second migration took less than a week. By the time we had a need for the Lumina documentation project, I created it directly using Sphinx and it took less than an hour to set up the documentation tree, the build infrastructure, themes, and extensions so that I could start writing docs from scratch.
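For readers who have not seen one, a Sphinx `conf.py` is an ordinary Python file of settings. The sketch below shows roughly what a new project's configuration might contain. Every value is a placeholder chosen for illustration; the actual themes and extensions used by the Lumina, PC-BSD, or FreeNAS docs are not stated in this interview.

```python
# conf.py: minimal sketch of a Sphinx configuration.
# All values below are hypothetical placeholders.

project = "Example Handbook"      # title shown in the generated output
author = "Example Docs Team"
release = "1.0"                   # full version string of the documented software

# Extensions shipped with Sphinx; add or remove to suit the project.
extensions = [
    "sphinx.ext.todo",            # enables the "todo" directive
]

html_theme = "alabaster"          # a theme bundled with Sphinx

# Where compiled translation catalogs live, relative to this file.
locale_dirs = ["locale/"]
gettext_compact = False           # one message catalog per source file

numfig = True                     # number figures and tables automatically
exclude_patterns = ["_build"]     # keep build output out of the source scan
```

Because the file is plain Python, it can live in the same repository as the software and be reviewed like any other source change.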

Having gone through this process, I would recommend the following:

If you plan to migrate an existing documentation set, find a migration script for your current format and give yourself time to play with tags, themes, and extensions.
Write a README that contains instructions for doc writers and users who want to create their own formats from your doc source. This should include any software that needs to be installed and a list of the tags used by your doc project—you'll know what these are by the end of your migration.
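One way to build the list of tags for such a README is to scan the migrated source files. The sketch below assumes that "tags" here means reStructuredText directives (`.. note::`) and roles (``:guilabel:`Save` ``); the regular expressions are approximations written for this example and will miss unusual markup.

```python
import re
from collections import Counter

# Approximate patterns, assumed for this sketch:
#   directives look like ".. name::" at the start of a line
#   roles look like ":name:`text`"
DIRECTIVE_RE = re.compile(r"^\s*\.\. ([A-Za-z][\w:-]*)::", re.MULTILINE)
ROLE_RE = re.compile(r"(?<![\w`]):([A-Za-z][\w:-]*):`")


def count_markup(text):
    """Return (directive_counts, role_counts) for a chunk of .rst text."""
    return Counter(DIRECTIVE_RE.findall(text)), Counter(ROLE_RE.findall(text))


if __name__ == "__main__":
    sample = (
        ".. note:: Back up first.\n"
        "\n"
        "Click :guilabel:`Save`, then see :ref:`Backups`.\n"
    )
    directives, roles = count_markup(sample)
    print(sorted(directives))  # directive names found in the sample
    print(sorted(roles))       # role names found in the sample
```

Running `count_markup` over every `.rst` file in the tree and merging the counters gives a starting inventory that can be pasted into the README and pruned by hand.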

How is Sphinx different from other open source solutions?

While Sphinx is easy to learn, it does have its quirks. For example, it does not support stacked tags. This means, for example, you cannot make a phrase bold italic using tags; achieving that requires a CSS workaround. And, while Sphinx does have extensive documentation, a lot of it assumes you already know what you are doing. When you don't, it can be difficult to find an example that does what you are trying to achieve.
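One common form of that CSS workaround is to define a custom role and style it in a stylesheet. The `conf.py` fragment below is a sketch under that assumption. The role name `bolditalic` and the file name `custom.css` are invented for this example, and the interview does not say which workaround the projects actually used.

```python
# conf.py fragment: sketch of a bold-italic workaround via a custom role.
# "bolditalic" and "custom.css" are hypothetical names.

# rst_prolog is prepended to every source file, so the role is
# available everywhere without repeating the definition.
rst_prolog = """
.. role:: bolditalic
   :class: bolditalic
"""

# Serve files from _static/ and load the stylesheet in HTML output.
html_static_path = ["_static"]
html_css_files = ["custom.css"]

# _static/custom.css would then need a rule such as:
#   .bolditalic { font-weight: bold; font-style: italic; }
```

In a source file the role is then written as ``:bolditalic:`important phrase` ``. Note that this styling applies to HTML output only; other formats such as PDF would need their own handling.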

Sphinx is well suited for projects with an existing repository (say, on GitHub), a build infrastructure, and contributors who are comfortable with using text editors and committing to the repo (or creating, say, Git pull requests).

For projects that want control over the look and feel of their documentation beyond the built-in or available themes, access to a CSS guru is useful.

Sphinx is probably overkill for projects with a small documentation set that does not need version control, translations, or multiple published formats.

Which projects stand out as having exceptional documentation? And which ones would benefit from a documentation overhaul?

Having been responsible for documentation for many years, I'm hesitant to point to good and bad examples of documentation. Documentation, for any project, is hard and time-consuming. Software is a moving target, and software users vary in skill and therefore have widely different documentation needs. In this respect, no documentation is ever finished, or truly up-to-date, and that is just the nature of the documentation game. The best we can do is try to make it easy, and compelling, for contributors to assist in keeping the documentation up-to-date, useful, and in the languages and formats that the software's user base requires.