Data sets are everywhere. Open source communities produce plenty of information in addition to source code, and most community infrastructures rely on several tools to support the software development process. Examples include bug-reporting systems such as Jira and Bugzilla, versioning systems such as Git, and code review tools like Gerrit. Although communication also takes place through these tools, most of it happens on mailing lists, IRC, supporting systems like Discourse, and even Twitter and other social channels (especially for marketing and announcements). In fact, most open source communities use at least five to ten tools.
When it comes to monitoring data, we tend to go for "easy" metrics, or the ones we feel most comfortable with. If your background is in engineering, for example, you might prefer code review or source code metrics. If you're in marketing, the "easy" choice might be leads, visits to the website, and other such data points.
With so much data, monitoring it can seem like an overwhelming task, so it's important to use the right metrics.
Why do you need metrics?
In my experience, metrics serve three main functions: to increase awareness, to lead change, and to motivate.
- Awareness helps you understand where you are in relation to specific policies and goals. For example, if you don't know how many project contributions were made by under-represented minorities, you cannot determine whether workplace policies that aim to create a more inclusive and diverse work environment are successful.
- Leading change focuses on determining a path. If a particular policy is implemented, for example, metrics will indicate whether KPIs increase or decrease.
- Motivational actions help communities attract developers and help members achieve goals. For example, many communities reward developers who detect bugs in beta products. This benefits the community in two ways: The bugs are fixed, and looking for bugs becomes a priority for community members.
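As a rough illustration of the "leading change" function, the sketch below compares a KPI before and after a policy takes effect. The weekly figures, the policy date, and the function name `kpi_change` are all invented for the example; a real analysis would pull the samples from your own tools.

```python
from datetime import date
from statistics import mean

# Hypothetical weekly KPI samples: (week start, number of contributions).
weekly_contributions = [
    (date(2017, 8, 7), 40),
    (date(2017, 8, 14), 44),
    (date(2017, 8, 21), 42),
    (date(2017, 9, 4), 50),
    (date(2017, 9, 11), 54),
    (date(2017, 9, 18), 52),
]


def kpi_change(samples, policy_date):
    """Return (mean before, mean after, relative change) around policy_date."""
    before = [value for day, value in samples if day < policy_date]
    after = [value for day, value in samples if day >= policy_date]
    if not before or not after:
        raise ValueError("need samples on both sides of the policy date")
    mean_before, mean_after = mean(before), mean(after)
    return mean_before, mean_after, (mean_after - mean_before) / mean_before


# Assumed policy date, chosen only to split the sample data in two.
before, after, change = kpi_change(weekly_contributions, date(2017, 9, 1))
print(f"before={before:.1f} after={after:.1f} change={change:+.1%}")
```

A before-and-after comparison like this only shows that the KPI moved, not why, so it works best alongside other evidence about what the policy changed.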
What can you measure?
Open source communities include five measurable areas of interest: activity, community, process, code, and licenses.
- Activity: This is the most basic area, focused on measuring trends and events. It takes the simplest approach, like counting potatoes: You can count commits, code review processes, comments, forks, or stars. Be careful, though, not to confuse activity with popularity. For example, counting only forks or stars on GitHub could be a good metric for popularity, but not for activity.
- Community: This is the core of open source, as members of the community produce and review code, open bug reports, etc. They produce the events measured in the activity area. But who are the main developers? And what does it mean to be a main developer—are they the members who produce most of the code, or whose lines make up the current version? Do they serve as mentors? Analysis of such demographics can create a social network—for example, the image below shows the social network of the Python Interpreter CPython:
This image helps show who the developers working in different areas of a project are, identifying those with a broader knowledge of the architecture and those who focus on a specific repository. The dots represent developers, while the blue rectangles are the repositories in CPython. A graph edge (a link between the dot and the rectangle) appears only if a developer has participated in a repository. The larger the dot, the more repositories that developer has committed to—in this example, CPython seems to have six or seven main developers. The thicker the edge, the more commits a developer has produced in that repository (the blue rectangle).
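The graph described above can be approximated from a plain commit log. The sketch below is only an assumption about how such a network might be built, not the method used to produce the CPython image: the developer names, repository names, and the helper `build_network` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical commit log: one (developer, repository) pair per commit.
commits = [
    ("alice", "core"), ("alice", "core"), ("alice", "docs"),
    ("alice", "tools"), ("bob", "core"), ("bob", "core"),
    ("bob", "core"), ("carol", "docs"),
]


def build_network(commit_log):
    """Build a developer-to-repository graph from (developer, repo) pairs.

    Returns (edge_weights, node_sizes):
      edge_weights[(developer, repo)] = commits by that developer in that repo
      node_sizes[developer] = number of distinct repos the developer touched
    """
    edge_weights = Counter(commit_log)  # edge thickness
    repos_by_developer = defaultdict(set)
    for developer, repo in edge_weights:
        repos_by_developer[developer].add(repo)
    node_sizes = {dev: len(repos) for dev, repos in repos_by_developer.items()}
    return edge_weights, node_sizes


edge_weights, node_sizes = build_network(commits)
# Developers with the broadest reach across repositories come first.
for developer, size in sorted(node_sizes.items(), key=lambda item: -item[1]):
    print(developer, "repositories:", size)
```

Here the edge weight plays the role of edge thickness and the node size plays the role of dot size; a plotting library could render both, but the counts alone already show who works across many repositories and who concentrates on one.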
- Process: This area focuses on analyzing software development activities. It helps you understand whether everyone is following a similar process, or whether a change in the toolchain is slowing development or creating bottlenecks. Measuring the process can help you determine the total time from when an idea is written as a feature request or a user story until it is merged into code. If you know that timeframe, and it remains stable, you can, for example, estimate the total time you might need to deploy to a customer.
- Code: This area of analysis can offer valuable insight by monitoring the quality of the code from several perspectives.
- Licenses: Compliance is a basic tenet of open source communities. All members of the community must understand what the license means; this is even more important when third parties build on the software.
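To make the process area more concrete, here is a minimal sketch of the lead-time measurement mentioned in the Process item: the number of days between a feature request being opened and its code being merged. The record layout, the identifiers, and the dates are made up for illustration; real data would come from your issue tracker and code review tool.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: when a feature request was opened and when it was merged.
items = [
    {"id": "FR-1", "opened": "2017-09-01", "merged": "2017-09-11"},
    {"id": "FR-2", "opened": "2017-09-03", "merged": "2017-09-07"},
    {"id": "FR-3", "opened": "2017-09-05", "merged": None},  # still open
    {"id": "FR-4", "opened": "2017-09-10", "merged": "2017-09-30"},
]


def lead_times_in_days(records):
    """Days from request to merge for every record that has been merged."""
    days = []
    for record in records:
        if record["merged"] is None:
            continue  # unmerged work has no lead time yet
        opened = datetime.strptime(record["opened"], "%Y-%m-%d")
        merged = datetime.strptime(record["merged"], "%Y-%m-%d")
        days.append((merged - opened).days)
    return days


lead_times = lead_times_in_days(items)
print("lead times (days):", lead_times)
print("median:", median(lead_times))
```

The median is used here rather than the mean because a few very slow requests would otherwise dominate the figure; that is a design choice for the sketch, not a rule.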
How do you proceed?
Now that you have the why and the what, you need the how. First, you need to follow a specific methodology. You might consider the OKR (Objectives and Key Results) or the GQM (Goal-Question-Metric) approach many companies use; the key is to use an approach that supports the governance and business goals of the communities. At the end of the day, we are all paid to achieve specific goals, and GQM helps make this happen. To simplify, open source communities have governance, and that governance has specific short-, medium-, and long-term goals. The GQM methodology can help you apply appropriate metrics to business goals:
Governance -> Goals <- Questions <- Metrics
Here's how it works:
- Detail a set of project business goals.
- Create questions that define those goals as completely as possible.
- Answer those questions through metrics, and track the process and product features according to the initial goals.
- Document everything at this level: the original goals, the questions you asked, and the metrics you used to answer them. This makes it possible to compare results against previous iterations and learn from failures.
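One way to keep goals, questions, and metrics documented together is to store them as linked records. The sketch below is an assumed structure, not part of the GQM methodology itself; the class names, the example goal, and the metric names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)  # names of metrics that answer it


@dataclass
class Goal:
    text: str
    questions: list = field(default_factory=list)


# Hypothetical goal, questions, and metric names, for illustration only.
goal = Goal(
    "Shorten the time from feature request to merged code",
    [
        Question("How long does a request wait for review?",
                 ["time_to_first_review"]),
        Question("How long does the whole cycle take?",
                 ["median_lead_time", "open_request_count"]),
        Question("Does the new toolchain create bottlenecks?"),
    ],
)


def metrics_for(goal):
    """All metric names that trace back to this goal, in order, no repeats."""
    seen = []
    for question in goal.questions:
        for metric in question.metrics:
            if metric not in seen:
                seen.append(metric)
    return seen


def unanswered(goal):
    """Questions that no metric answers yet."""
    return [q.text for q in goal.questions if not q.metrics]


print(metrics_for(goal))
print(unanswered(goal))
```

Keeping the links explicit makes two checks cheap: every metric you collect can be traced back to a goal, and any question without a metric shows up as a gap to fill.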
Next you need a strategy. A typical one is Plan-Do-Check-Act. Start by defining the short-term goals; that is your plan for the following months. Then start working to apply the policies that you defined, and measure to determine whether or not these policies worked. Then continue iterating and defining new business goals. Always think of metrics and goals as an iterative process: The metrics you define today are not likely to fit your business goals next year. Thinking this way also answers the question: How long should I use this metric?
Monitoring and metrics are tools that can help in your daily work. Metrics should be seen as an ally, not something to avoid based on often-cited examples of misuse, and they are independent of the role that each of us plays in the community. There is transparency in the products open source communities produce, as well as in the processes they use, and metrics are another branch of transparency. Transparency should influence how communities behave, work, and evolve over time, and this requires that every member be included in the daily monitoring process. It also means that the needs of every community member should be considered in determining the process.
Finally, I'd like to mention the CHAOSS (Community Health Analytics for Open Source Software) project, which was introduced recently at the Open Source Summit in LA, under the umbrella of The Linux Foundation. I have had the opportunity to participate in this project since its inception, and if there is a single best place to discuss metrics in open and inner source communities, this is it.