6 habits of excellence to gain contributors for your open source project
Community at the speed of light: Best practices for the new era of open source
Co-authored by Michael DeHaan, Co-founder and CTO of Ansible.
It’s likely that many of you have heard about Ansible. For those who haven't, it's an open source software project that radically simplifies the art of system automation. Ansible has become very popular in the DevOps community over the past year. It's currently one of the most popular Python projects on GitHub. Of the roughly 86,000 public Python repositories on GitHub, Ansible is #6 in stars and #3 in forks, putting Ansible in the top 0.01% of Python projects in both popularity and potential contributorship. Ansible is currently the 99th most forked project on all of GitHub, where several thousand new repositories are created every day. According to GitHub, there are already more lifetime contributors to Ansible (878) than to Chef (345), Puppet (324), and CFEngine (64) combined.
Not bad for a project that's only two years old.
There's no simple roadmap for the kind of success that Ansible currently enjoys; to some degree, successful projects succeed by being in the right place at the right time with the right idea. There are, however, some key open source principles that can now be regarded as "tried and true"; what were once theories in the open source world have become well-defined and repeatable practices. At Ansible, we have done our best to follow these practices from the very beginning.
Here, then, are some of the open source practices that have helped to make Ansible so remarkably successful in generating participation in such a relatively short period of time.
...but it’s not just about GitHub.
In the early days of open source, most projects revolved around the mailing list. Most discussions took place on-list, because it was the place that was guaranteed to be archived for all to see, and the self-documenting discussions helped make the decision-making process clear. Once those decisions were made, though, and it was time to submit the actual code, the contributor had to learn the specific mechanisms for contributing to their chosen project. Thus, if a contributor worked across several projects, they needed to learn several different ways of doing things.
Now there’s GitHub, and six million people use it. If your project is on GitHub, it means that no one has to learn special magic tricks to contribute to your project, because every project on GitHub works in basically the same way. In the time it used to take a user just to figure out a project’s contribution mechanisms, a user can now fork a repo, make a fix, and submit a pull request. The default instinct of new developers is no longer “suggest a change”—the instinct is now “fix the problem”.
Just because the infrastructure exists to make forking easier, that does not mean that those forks all produce contributions. At Ansible, a key metric that we use to assess the health of our project is contributors as a percentage of forks. Here's a look at contributors as a percentage of forks for the five most popular projects on GitHub, along with Ansible:
twbs/bootstrap: 71,079 stars / 26,488 forks / 573 contributors (2%)
jquery/jquery: 31,528 stars / 7,234 forks / 196 contributors (3%)
joyent/node: 31,499 stars / 7,023 forks / 544 contributors (8%)
mbostock/d3: 29,377 stars / 6,729 forks / 77 contributors (1%)
angular/angular.js: 27,566 stars / 10,040 forks / 936 contributors (9%)
ansible/ansible: 7,251 stars / 2,227 forks / 846 contributors (38%)
(all stats as of August 18, 2014)
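The metric itself is simple arithmetic: contributors divided by forks, expressed as a percentage. As a minimal sketch, the figures listed above can be recomputed as follows. The function and variable names are our own illustrative choices, and the numbers are copied by hand from the list rather than fetched from GitHub.

```python
# Fork and contributor counts copied from the list above
# (stats as of August 18, 2014). Nothing here is fetched live.
PROJECTS = {
    "twbs/bootstrap": (26488, 573),
    "jquery/jquery": (7234, 196),
    "joyent/node": (7023, 544),
    "mbostock/d3": (6729, 77),
    "angular/angular.js": (10040, 936),
    "ansible/ansible": (2227, 846),
}


def contributor_ratio(forks: int, contributors: int) -> float:
    """Return contributors as a percentage of forks."""
    if forks <= 0:
        raise ValueError("forks must be a positive number")
    return 100.0 * contributors / forks


def report(projects: dict) -> list:
    """Return (name, rounded percentage) pairs, highest ratio first."""
    ranked = sorted(
        projects.items(),
        key=lambda item: contributor_ratio(*item[1]),
        reverse=True,
    )
    return [(name, round(contributor_ratio(*counts))) for name, counts in ranked]


if __name__ == "__main__":
    for name, percent in report(PROJECTS):
        print(f"{name}: {percent}%")
```

Sorting by the ratio rather than by stars or forks puts the emphasis on how many forks turn into contributions, which is the point of the metric.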
If it were as easy as rolling out the welcome mat for pull requests, all of these larger projects would have thousands of contributors. Instead, the majority of projects, even successful ones, see a clear limit to contributorship. Having infrastructure that supports contribution is critical, but building a project for massive contribution goes beyond infrastructure choices.
Build your architecture around modularity and option value
In 2005, Carliss Baldwin and Kim Clark of Harvard Business School wrote a paper titled "The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model?" In this paper, they observed that open source projects with two particular properties were more likely to gain and keep contributors. Those two properties were high modularity and high option value.
Modularity is straightforward. A codebase with high modularity provides a simple framework of platform and modules. The platform supports the modules and provides well-defined rules for module development; modules may then be developed or modified independently according to those rules, thus allowing contributors to add value in corners of the project, with minimum investment.
Option value is a bit more complicated, but can easily be understood in the context of modularity. In a highly modular framework, some modules will clearly be better than others. There may even be competing modules designed to perform similar duties. A highly modular project with high option value allows users to choose some modules but not others, or even to rewrite particular modules. Not having to take the entire toolset, instead picking and choosing from a wide array of options, reinforces this value. More options means more ability to tolerate uncertainty over time.
Thus, a modular design with high option value allows users to identify ways of contributing immediately, and to see the value of that contribution quickly. And nothing drives successful contribution like successful contribution—which is exactly what has happened for Ansible. In browsing through Ansible's libraries, one can find over 230 different modules, the vast majority of which were co-developed and co-maintained by multiple community developers over time. The Ansible "batteries included" philosophy helps to ensure that as new modules are developed, they are tested and integrated in Ansible's core, so that the maximum number of users get the maximum option value for the minimum collective effort. The value of contributing back to a shared commons is also reinforced, creating an immortality of contributions, rather than the eventual bitrot and replacement that is common among custom developed solutions.
By designing for modularity and option value, Ansible ensures that a user can, in a short amount of time, make a contribution that has perpetual value to themselves and to others.
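To make the platform-and-modules idea concrete, here is a minimal sketch of such a design. It is an illustration only, not Ansible's actual module API: the `Platform` class, the `register` decorator, the module names, and the "every module must report `changed`" rule are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical type for a module: takes arguments, returns a result dict.
ModuleFn = Callable[[dict], dict]


class Platform:
    """A tiny platform that hosts modules and enforces one shared rule."""

    def __init__(self) -> None:
        self._modules: Dict[str, ModuleFn] = {}

    def register(self, name: str) -> Callable[[ModuleFn], ModuleFn]:
        """Decorator that adds a module without touching platform code."""
        def decorator(fn: ModuleFn) -> ModuleFn:
            self._modules[name] = fn
            return fn
        return decorator

    def run(self, name: str, args: dict) -> dict:
        """Run the module the user chose and check it followed the rule."""
        if name not in self._modules:
            raise KeyError(f"no module named {name!r}")
        result = self._modules[name](args)
        # The platform's single rule: report whether anything changed.
        if "changed" not in result:
            raise ValueError(f"module {name!r} did not report 'changed'")
        return result


platform = Platform()


# Two competing modules for the same job, contributed independently.
@platform.register("pkg_simple")
def pkg_simple(args: dict) -> dict:
    return {"changed": True, "msg": f"installed {args['name']}"}


@platform.register("pkg_careful")
def pkg_careful(args: dict) -> dict:
    already_present = args.get("present", False)
    return {"changed": not already_present, "msg": f"ensured {args['name']}"}


if __name__ == "__main__":
    # Option value: the user picks one module and ignores the other.
    print(platform.run("pkg_careful", {"name": "nginx", "present": True}))
```

A contributor can add or rewrite one module by following the platform's rule, and a user can choose between competing modules, without either of them needing to understand or modify the rest of the codebase.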
Optimize for first experience
One of the things that allowed Ansible to spread very quickly was that both the product and the documentation were optimized for the quickest possible successful first experience. An easy install experience and a friendly introduction in the documentation help to create a “shallow end” to the pool where new users can wade in, without having to dive into the deep end head first. The idea that a user can try something out over a lunch break, and understand it—and then learn what is left to learn—is a key success driver for open source software. Too many projects fail needlessly because they don’t invest in this critical idea.
If a GitHub project exists as just five lines of README and arbitrary code, it may survive among a small subset of developers, but broad user adoption will be out of reach. That in turn limits the developer pool, because many of the best contributions come from converting users into developers.
In open source, success is viral. The discovery of a new project is essentially an incubation period, during which users try the project out. If the experience is a good one, users “get the fever” and “spread” the project to their friends, creating more users and, in turn, more developers. A lot of Ansible’s viral success has been a direct result of our short incubation time and high conversion rate, and of the fact that Ansible almost never kills its host.
Optimizing for first experience goes beyond the user’s first experience; it also extends to the developer’s first experience. This requires a well-documented development process as well as an intentional effort to make things easy for new contributors. Code must be reviewed for readability as much as for technical accuracy, and clever code tricks should be avoided in favor of code that a large number of people can easily understand, edit, and change.
Gather data and make decisions with it
Ansible came into being because we recognized a pattern: users complained about the need to learn multiple automation tools, despite the apparent commonality between those tools. Why have separate tools for cloud provisioning, configuration management, orchestration, and application deployment, when one tool could do the job?
As projects grow, identifying emerging patterns and acting upon them becomes a critical skill, and those patterns can only emerge through the continual collection and analysis of data. At Ansible, we have multiple ways of collecting user data, and we are aggressive about gathering it and analyzing it. Bug reports, IRC interactions, mailing list threads, Twitter comments, and user surveys all provide the critical data points that drive key decisions.
The most important advantage of careful data gathering is that it allows you to listen to the right users, not just to the noisy users. If you listen only to the noisy users, you increase the risk that you will build the wrong things, or that you will optimize for the few rather than the many.
Communicate, communicate, communicate!
In managing any large project, communication is critical, and there's no way around it. Unfortunately, a lot of developers just don’t like to communicate in any way except through code. But answering questions and writing documentation, both for users and for developers, are indispensable activities. The simpler the project is, the less documentation will be needed, but documentation will still be needed. The better the documentation, the fewer questions need to be answered, but there will still be questions that must be answered. That’s why obsessive refactoring and rewriting of documentation, and obsessive communication on mailing lists and elsewhere, took up perhaps 50% of our time in Ansible’s early days. That investment of time has been vital to Ansible’s rapid growth.
In any large project that tries to integrate many different points of view, there will be disagreements about the right way to do things. Not every pull request will be appropriate, and not every bug report will be useful. Saying no in the right way is even more important than saying yes. It can be hard to take the time necessary to make a contributor feel valued even as you're declining their contribution, especially when there are hundreds of other contributions in the pipeline. We have recently standardized our responses to frequent questions to ensure that, even as we reach epic scale, we always let our contributors know how important their contributions are. Communication can be quick without being terse, and it’s critical to communicate, quickly but sincerely, how dependent Ansible is upon its contributors, and how grateful we are to have them.
We also communicate strongly about our priorities. We make it clear what we’re working on now, and what we’re working on later, and why we make decisions the way we make them. Over time, communicating about these things keeps the community moving forward with a consolidated vision and voice, and the project develops a predictable pulse. The future can be predicted by observing how things worked in the past.
Success is a series of deliberate choices
The methodology of open source development has come a long way in the past twenty years. It took the Linux kernel team eleven years to gain one hundred contributors in a month; it’s taken Ansible two years. Of course, the Linux community had to make up the methodology as they went along; the Ansible team has benefitted from years of studying and participating in Linux and other open source communities.
The biggest lesson, and it goes well beyond open source: excellence is a habit. What you do habitually, you become.
At Ansible, the building of community is our most ingrained habit, expressed in many ways. The architecture of the codebase, the daily communication, the focus on gathering and acting on data, the relentless attention to the new user’s very first experience: these are the habits that have caused so many contributors and champions to rally around Ansible. They are the habits that will continue to drive us as we work with our community to embrace new challenges and new directions.