An open source toolkit for measuring project health

Red Hat open sources Prospector and joins Project CHAOSS.

I've always had a nagging question about open source projects: How does one determine whether a project is a success or a failure? Is "success" or "failure" determined by code commits and gut feel? Or is there some other way?

I wanted to address this ambiguity and conjecture-based assessment of open source software projects, but found that the literature around software analytics and associated theories was largely derived from academia and focused on things such as the number of lines of code and bugs per thousand lines of code. The good news is that the explosion of open source projects over the past 20 years has provided a large corpus of information that can be mined to help with the analysis of a project's health.

Red Hat's product offerings are all derived from one or more upstream, community-based open source projects. Red Hat's product managers need a good sense of what is going on in their respective upstream projects so that each product can continue to evolve based on the strength of the community and the collaboration in the project. Beyond Red Hat's own needs, the explosion of products and services that use the hundreds of thousands of open source projects to drive the technology revolution calls for a coherent, repeatable, and objective tool or method to ascertain how a project is doing.

Enter Prospector, a tool we built internally at Red Hat to help measure project health, which we have now contributed to the Linux Foundation to help form the basis of the new CHAOSS project.

Before building Prospector, we looked at existing efforts at Ohloh (now Open Hub) and some of the academic research I referenced earlier to help establish a set of metrics that we wanted to measure. With Prospector, we aimed to create a tool that included all of the related information about a project—its website, code repository, bug repository, mailing lists, IRC channels, CVE reports, event blogs, and more.

We wanted metrics based on publicly accessible data sources with no paywalls, login credential requirements (apart from generic logins), or throttling triggers. Where throttling occurred, there needed to be a manual, freely available mechanism to get to the information.

Prospector was not about analyzing the code itself, so we made a deliberate design decision that Prospector would not replicate the source code repository.

Built using Django, Python, JavaScript, and Postgres and hosted on Red Hat OpenShift, the container application platform, version 1 of Prospector went live internally at Red Hat in 2013. Since then, we have continued to evolve Prospector and its data sources, adding sentiment analysis using tools such as JamIQ and Google Trends.

Prospector presents all of the information from its core data sources in graphical dashboards. We set thresholds on data for code commits, bug reports, bug fixes, mailing list participation, and IRC channel conversations. For example, here's how the tool looked at source code commits. Prospector used 35% and 55% as two watermarks to indicate the following:

if the largest single email domain among a project's code committers accounts for fewer than 35% of them, Prospector deems the project community-driven;
if that share falls between 35% and 55%, the project is deemed blended; and
if that share is above 55%, the project is deemed corporate.
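
The watermark rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Prospector's actual code: the function name `classify_project`, the choice to count each committer address once, and the decision to treat exactly 35% and exactly 55% as "blended" are all assumptions made for the example.

```python
from collections import Counter

# The two watermarks described in the article.
COMMUNITY_MAX = 0.35
CORPORATE_MIN = 0.55


def classify_project(committer_emails):
    """Classify a project by the share of committers in the largest email domain.

    Hypothetical helper for illustration; not part of Prospector.
    Returns (label, largest_domain, share).
    """
    # Assumption: each distinct address is one committer.
    unique = {e.strip().lower() for e in committer_emails if "@" in e}
    if not unique:
        raise ValueError("no committer email addresses supplied")

    # Count committers per email domain and pick the largest one.
    domains = Counter(e.rsplit("@", 1)[1] for e in unique)
    top_domain, top_count = domains.most_common(1)[0]
    share = top_count / len(unique)

    if share < COMMUNITY_MAX:
        label = "community-driven"
    elif share <= CORPORATE_MIN:
        label = "blended"
    else:
        label = "corporate"
    return label, top_domain, share
```

For instance, a project where three of five committers share one domain has a 60% share, so this sketch would label it corporate. As the article notes, the label is a prompt for further analysis rather than a value judgement.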

Prospector does not make value judgements as to whether each of those thresholds is good or bad. Thresholds in the tool help users assess whether they should do additional analysis based on the broader categorization.

The value of such a tool is not that it goes out and gathers data, but that the gathered data is then made available to those interested in a project to interpret in ways that might not otherwise have been possible.

With Prospector, one gets to view a project's various metrics and then compare them side by side with other projects. The types of insights that one can gain are quite interesting. I recall a comparison that I did between the oVirt project and the Linux kernel. The oVirt project was new, and its code commits during the early days were cyclical: slow early in the week, ramping up through the week, and then quieting down over the weekend. That would suggest a corporate project, with developers busy during the work week. When the timeline was extended to more than 12 months, the weekly cycle evened out and there was a steady uptick in contributions. That is a really useful signal that a project is gaining momentum after the initial corporate push, and it reaffirms the value of community contributions and collaboration.
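
One way to look for the weekly cycle described above is to bucket commit timestamps by day of the week. The sketch below is a minimal illustration, not how Prospector computed its dashboards; it assumes the commit times have already been collected as Python `datetime` objects, and the helper names `weekday_shares` and `weekend_share` are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_shares(commit_times):
    """Return the fraction of commits made on each weekday, Monday first."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(t.weekday() for t in commit_times)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no commit timestamps supplied")
    return {name: counts.get(i, 0) / total for i, name in enumerate(WEEKDAYS)}


def weekend_share(commit_times):
    """Return the fraction of commits made on Saturday or Sunday."""
    shares = weekday_shares(commit_times)
    return shares["Sat"] + shares["Sun"]
```

A very low weekend share over a short window would match the work-week pattern seen in oVirt's early days. Computing the same figure over a longer window, such as 12 months, shows whether that cycle evens out.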

We are excited to have contributed Prospector to the CHAOSS project so that it can continue evolving. To learn more about CHAOSS and the role Prospector will play in that project, visit https://chaoss.community/.