Git and GitHub for open source documentation

We use git extensively for documentation in OpenStack so that we can "treat the docs like the code," and I’m seeing this trend in many places, especially in the Write the Docs community.

Check out these talks from the 2014 Write the Docs event in Portland:

In OpenStack, our documentation workflows mimic our code collaboration. We post patches for people to review (similar to a pull request on GitHub), we review each other’s patches, and we have build and test automation for every doc patch. The idea is to use the collaboration present in the GitHub pull request workflow for docs as well as code. We're all responsible for relevant and accurate documentation for about 25 OpenStack projects written in Python across 130 git repositories, so let's collaborate. Writers who are getting started with these types of workflows do ask me questions, so I wanted to bring together some of the best practices we've found, and find more.

How do you process many pull requests for docs?

OpenStack is a popular open source project, so the documentation needs to scale for many contributions and contributors. We have systems in place that let us merge as many as 50 doc patches a day, though typically it's about 15. Since OpenStack uses Gerrit, some of my tips are specific to that web-based code review tool rather than to GitHub, but the guidelines should apply universally. As additional reading, I'd also highly recommend "How do project owners use pull requests on GitHub," which provides survey results and pulls out themes for how integrators use pull requests.

We process many patch requests (in OpenStack they aren't technically pull requests) with these tools and processes:

Gate tests: robot helpers
Technical accuracy: human checkers
Bug links in the commit message
Commit message standards
Conventions
Dashboard for all open reviews
Calendar item to remind and block time

If I'm missing your favorite helper, please comment below!

Gate tests: robot helpers

We have "gate tests" that automate a lot of the initial quality checks for any contributions that our community makes. Our gate tests check these things:

"Do the docs in this patch build?"
"Is the syntax correct in all the patched files?"
"Are all the links working?"
"Does the patch delete any files that are used by other deliverables?"

All four tests must pass before a patch can merge at all, so they are a true gatekeeper for us. As an additional efficiency gain, each test runs only if it's relevant to the patch. So if no files were deleted in a patch, the deletion test isn't run. This saves humans the time of pulling the patch down locally and running tests manually. There are other tests that report back but do not actually block a patch from going through the gate, such as "Are the translations still working with this patch?" Continuous integration (CI) for documentation is game-changing. I can't emphasize this point enough, but this post is focused on scaling documentation reviews. If you want to read more about OpenStack’s CI systems, take a look at ci.openstack.org.
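
The "run a test only if it's relevant" idea can be sketched in a few lines of Python. This is a minimal illustration, not OpenStack's actual gate configuration: the check names, the `CHECKS` table, and the `select_checks` function are all hypothetical, and the relevance rules are assumptions apart from the deletion rule described above.

```python
from typing import Callable, Dict, List, Tuple

# A change is (status, path), using git's status letters:
# "A" = added, "M" = modified, "D" = deleted.
Change = Tuple[str, str]

# Hypothetical check names, each paired with a predicate that decides
# whether the check is relevant to a given patch.
CHECKS: Dict[str, Callable[[List[Change]], bool]] = {
    "build": lambda changes: len(changes) > 0,
    "syntax": lambda changes: any(s in ("A", "M") for s, _ in changes),
    "links": lambda changes: any(s in ("A", "M") for s, _ in changes),
    # The deletion check only matters when the patch deletes a file.
    "deletions": lambda changes: any(s == "D" for s, _ in changes),
}


def select_checks(changes: List[Change]) -> List[str]:
    """Return the names of the checks worth running for this patch."""
    return [name for name, relevant in CHECKS.items() if relevant(changes)]


if __name__ == "__main__":
    patch = [("M", "doc/install-guide/section_basics.xml")]  # made-up path
    print(select_checks(patch))
```

With this table, a patch that only modifies files skips the deletion check, while a patch that only deletes files skips the syntax and link checks.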

Technical accuracy: human checkers

Testing the technical accuracy of a patch is the time-consuming part of a doc review, and it's hard to predict how long it will take. It is critical that a human double-check the technical accuracy of any contributions to the docs, and we rely on our reviewers here. Having environments set up that let you test user actions is part of being a reviewer, and will save you time. DevStack is a collection of scripts for running a configured development environment, tuned to a local setup. TryStack is a free public cloud run with donated hardware and resources. For example, if I have a working DevStack environment based on stable branches for OpenStack, I can run client commands against a known version of OpenStack, and I have admin access because I run the environment myself. I also have a TryStack account, which gives me free cloud resources through CLI or API commands, and a Rackspace Cloud account that lets me test API calls.

Bug links in the commit message

We have a shortcut system that lets a contributor put phrases like "Closes-bug:nnnnnn" in the commit message, where nnnnnn is the bug number from Launchpad (our bug tracker). This linkage between a doc bug and the contribution itself is handy for seeing whether the patch addresses the concerns of the logged bug. I'll often review a patch by clicking through to the bug first, reading through the comments, and then checking whether the patch fixes what was broken. You also want your process to make sure contributors know whether a bug has been accepted as a bug. In our system, it must be set to Confirmed. In GitHub, you'd need to use labels on issues, and also link to the issue from your pull request.
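
As a rough illustration of how tooling could pick up that phrase, here is a small Python sketch that extracts bug numbers from a commit message. It is not the parser OpenStack's infrastructure uses; the `closed_bugs` function and the exact pattern (case-insensitive, optional space and `#` before the number) are assumptions made for the example.

```python
import re
from typing import List

# Matches a line such as "Closes-bug:1234567" or "Closes-Bug: #1234567".
# Tolerating the optional space and "#" is an assumption of this sketch.
BUG_LINE = re.compile(
    r"^Closes-bug:[ \t]*#?(\d+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def closed_bugs(message: str) -> List[int]:
    """Return every bug number the commit message claims to close."""
    return [int(number) for number in BUG_LINE.findall(message)]


if __name__ == "__main__":
    example = (
        "Fix package name in install guide\n"
        "\n"
        "The package was renamed in the last release.\n"
        "\n"
        "Closes-bug: 1234567\n"  # made-up bug number
    )
    print(closed_bugs(example))
```

A reviewer-facing tool could use the returned numbers to link straight to the bug report before the patch is read.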

Commit message standards

In OpenStack we can merge as many as 50 patches (just for docs) in a day, so consistent commit messages make it easier to scan for patches you can review with your base knowledge. It seems picky, but knowing that you might look at 50 in a day helps you understand the justification for the pickiness. Here's a summary of our standard:

Provide a brief description of the change in the first line.
Insert a single blank line after the first line.
Provide a detailed description of the change in the following lines, breaking paragraphs where needed.
The first line should be limited to 50 characters and should not end with a period (commit messages over 72 characters will be rejected by the gate).
Subsequent lines should be wrapped at 72 characters.
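
The rules above are mechanical enough to check automatically. The following Python sketch is one way to do it, not the check the OpenStack gate actually runs; `check_commit_message` is a hypothetical name, and reading the 72-character rejection as applying to the first line is an assumption.

```python
from typing import List

SUMMARY_TARGET = 50  # recommended length of the first line
SUMMARY_LIMIT = 72   # assumed hard limit for the first line
BODY_WIDTH = 72      # wrap width for the remaining lines


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems: List[str] = []
    lines = message.rstrip("\n").split("\n")
    summary = lines[0]

    if not summary.strip():
        problems.append("error: first line is empty")
    if len(summary) > SUMMARY_LIMIT:
        problems.append("error: first line exceeds 72 characters")
    elif len(summary) > SUMMARY_TARGET:
        problems.append("warning: first line exceeds 50 characters")
    if summary.endswith("."):
        problems.append("error: first line ends with a period")

    # The second line, when present, must be blank.
    if len(lines) > 1 and lines[1] != "":
        problems.append("error: second line is not blank")

    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > BODY_WIDTH:
            problems.append(f"warning: line {number} exceeds 72 characters")
    return problems


if __name__ == "__main__":
    for problem in check_commit_message("Fix typo.\nMissing blank line"):
        print(problem)
```

Run against a message before pushing, a check like this catches the picky problems so that reviewers can spend their time on content.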

Conventions

All reviewers know that we use a set of agreed-upon conventions for OpenStack docs, and we can point to these when reviewing a patch. This set of standards helps us check for consistent naming of services, capitalization for legal purposes, and structure conventions such as "no stacked headers." We publish the conventions on the OpenStack wiki, and we discuss any sweeping changes on the openstack-docs mailing list before making them. When the wiki page doesn't address a concern, we use the IBM Style Guide as our final arbiter in any style or convention questions.

Dashboard for all open reviews

With Gerrit we can do specific searches for relevant patches to review. For example, I've created a dashboard for all OpenStack docs repositories (link requires a login). You can also have a dashboard just for patches that impact the APIs. With the search feature, you can tailor your dashboard to review what you want and prioritize as you like.
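
To give a feel for what such a search looks like, here is a small Python sketch that assembles a Gerrit search string from a list of repositories. It relies only on the `status:` and `project:` search operators and `OR` grouping; the `open_reviews_query` helper is hypothetical, and the repository names in the usage example are illustrative rather than the actual contents of my dashboard.

```python
from typing import Iterable


def open_reviews_query(projects: Iterable[str]) -> str:
    """Build a Gerrit search string for open changes in the given projects."""
    names = list(projects)
    if not names:
        raise ValueError("at least one project is required")
    # Gerrit combines alternatives with OR inside parentheses.
    alternatives = " OR ".join(f"project:{name}" for name in names)
    return f"status:open ({alternatives})"


if __name__ == "__main__":
    # Illustrative repository names, not a definitive list.
    print(open_reviews_query([
        "openstack/openstack-manuals",
        "openstack/api-site",
    ]))
```

The resulting string can be pasted into Gerrit's search box or saved as a dashboard section.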

Calendar item to remind you to do reviews

With many meetings on our calendars, it's a good idea to set aside time for reviews. Some people like to review at the end of their work day, others first thing in the morning to get a huge chunk of work done. It's really up to you to prioritize the reviews in your work day. I use a calendar item both as a reminder and to block off time for reviews.

What's an expected turnaround time for a review?

Really, this depends on the size of the patch or pull request, plus the number of reviewers who know enough to review that patch. We use data analysis to measure our turnaround time on reviews in OpenStack. In the last 30 days or so, we've done these sorts of turnarounds on reviews.

Total reviews: 2102 (70.1 per day)
Total reviewers: 113 (0.6 per reviewer per day)
Total reviews by core team: 1570 (52.3 per day)
Core team size: 21 (2.5 per core per day)
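
The figures in parentheses follow from simple division, and they also reveal the length of the reporting window. The Python sketch below reproduces the arithmetic using only the numbers listed above; the helper names are made up for the example.

```python
def window_days(total_reviews: int, reviews_per_day: float) -> float:
    """Length of the reporting window implied by a total and a daily rate."""
    return total_reviews / reviews_per_day


def per_person(reviews_per_day: float, people: int) -> float:
    """Average reviews per person per day."""
    return reviews_per_day / people


if __name__ == "__main__":
    # All reviews: 2102 total at 70.1 per day.
    print(f"window (all reviews): {window_days(2102, 70.1):.1f} days")
    # Core team reviews: 1570 total at 52.3 per day.
    print(f"window (core reviews): {window_days(1570, 52.3):.1f} days")
    # 113 reviewers overall, 21 on the core team.
    print(f"per reviewer per day: {per_person(70.1, 113):.1f}")
    print(f"per core per day: {per_person(52.3, 21):.1f}")
```

Both totals work out to a window of about 30 days, and the per-person rates match the 0.6 and 2.5 shown in the list.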

Honestly, we have one reviewer (Andreas Jaeger) who's superhuman, and our active core team size is more like 15 than 21. In OpenStack, core reviewers are the ones who can publish to the site, but anyone can review a doc patch. I'd like to get through at least 2-10 reviews a day. In a good review week I can get through about 60 reviews. So the expectation I'd set for our docs contributors is about 3-5 days to get back reviewers' comments that they can address.

Summary

Writers might hesitate to work with others on deliverables, and developers might feel they have little to contribute to the documentation. I assert that no one can know everything, so distribute the workload by writing together, just as you collaborate on code. I hope these tips help you match your docs workflow to git and GitHub workflows so you can accelerate your collaborative writing, especially in open source.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.