The trials of certifying open source software

Open source won and, over the past five years or so, we have been seeing the acceleration of a new wave of open source projects that got their starts in corporations. This comes with a set of new challenges, as new corporate participants struggle with some of the realities. Folks generally understand that foundations provide neutrality in some form, but don't necessarily know how to drive the competitive discussions from the room. One of the more disturbing symptoms of this confusion is the discussion now beginning around "certification" and what it means to be certified to a particular project. What is Certified Good Software™? [1]

I believe a lot of what drives these certification discussions is a hearkening back to the good times of "open systems" in the late 1980s and 1990s and up into the mid-2000s. We saw the rise of the Internet and the IETF delivering communications protocol standards, followed by the rise of the World Wide Web and the World Wide Web Consortium (W3C) delivering web-related standards. The variations on the UNIX operating system API and commands were standardized as POSIX by the IEEE (and later ISO). We saw the superposition of the Single UNIX Specification from X/Open (now The Open Group) on top of the POSIX standards. UNIX was the engine driving the early Web, and it was replaced by Linux as Linux matured and evolved.

And this is where the differences need consideration in the discussion.

Successful open source communities (e.g., Linux):

build the one true implementation on a single rapidly evolving code base
by collaborating through contributions, and
meritocratic influence is driven by the contribution of code, infrastructure, and effort in the project.

On the other hand, standards organizations (e.g., POSIX and UNIX):

collaborate on interface specifications to enable multiple competing implementations to be built and measured
by negotiating a compromise position amongst participants, and
democratic influence is gained by diplomacy and participation in the bureaucracy of the standards development organization.

Another way to say this is that well-run open source projects with neutral ownership may grow into an ecosystem that includes products, but standards tend to happen in mature markets where competing products already exist.

Standards and certifications

Standards exist to encourage multiple disparate implementations and to measure how they interoperate. This holds whether the standard is for

a communications protocol allowing information to be passed between two different implementations,
a programming language allowing a program to behave consistently when compiled or interpreted by different implementations, or
a systems API allowing a program developed on one system to be (built and) executed on a different system.

To this end, people often want some form of certification or testing to exist so that implementations can prove they measure up and conform to the standard. But who are these people? That's a really important question, because testing and certification is never easy and never inexpensive. I'll go back to my primary experience around the POSIX and C-language standards for two examples from different perspectives.

Vendors collaborated in standards efforts regardless of whether the rules in the venues said participants were individual experts (e.g., IETF, IEEE) or employees of consortia members (e.g., X/Open, W3C) or delegates representing their countries (ISO).

In the late 1980s and up through the early 1990s, the U.S. government was the largest IT procurement organization on the planet. They cared about the IEEE POSIX standard to support application portability across different systems from different vendors, and they participated in the IEEE working groups along with the individual engineers developing POSIX to define the minimal subset of a UNIX operating system interface. The U.S. National Institute of Standards and Technology (NIST) then put in place a Federal Information Processing Standard (FIPS) for POSIX-based government procurement. It had a testing requirement, and the government put in place a test suite and a program to certify testing labs to run the test suite to allow vendors with products to demonstrate their conformance to the standard.

The IEEE standards themselves made fairly simple statements with respect to conformance to the standard—essentially they defined the idea of a "conforming application" as an idealized object that a conforming implementation must run. (Conforming applications are horrible ideas in the real world, but we'll save that debate for another day.) The NIST FIPS program made the standard's conformance statement real by defining a large set of test cases that must pass to allow the vendor to claim they had a POSIX FIPS certificate. Because of the complexity of the POSIX standard and the difficulty and expense of testing it, the FIPS didn't go so far as to guarantee the conformance just because the implementation passed the test suite, but it was a very strong indicator.

Essentially the U.S. government put its money (a certification program) where its mouth was (we want POSIX systems) and paid for the cost of demonstrating conformance. The IEEE was not in a position to build expensive testing programs for standards developed by their individual professional members.

Through that period, there was a broader specification around the core POSIX standards being developed for a modern UNIX system. If POSIX was the minimum subset, the Single UNIX Specification was the superset. Here the standardization efforts were led by the vendors as participants in X/Open, an industry consortium of vendors. The work was a proper superset of the POSIX standards. (No vendor wanted to exclude the U.S. government customer.) X/Open took a slightly different approach to the conformance problem. X/Open wanted to create a level playing field amongst the vendor members, so use of the "UNIX" brand was tied to passing an expanded test suite. To get around the difficulties of certification guarantees, the X/Open certification was stated as a warranty. If anyone discovered a contradiction between a certified vendor implementation and the Single UNIX Specification, the vendor warranted they would fix the problem and conform within very tight timeframes, or they risked publicly losing the UNIX brand.

Again, the group that cared about demonstrating conformance (the UNIX vendors) put their money where their collective mouth was: through the consortium, they paid for the X/Open membership program that tested conformance.

The Internet Engineering Task Force is structured similarly to the IEEE in that participants are individual contributing experts. Their clever "hack" for their networking specifications was to state that a standard could never move from Draft status to Full Use status without two implementations from diverse code bases demonstrating they could communicate completely.

Open source software certifications?

All of this discussion so far is about certifying standards. Why don't we hear about certifications in the open source world over the past few decades?

Standards specifications evolve on a different timeline from open source projects. They often need to support complex externally driven procurement needs in a marketplace and are carefully and deliberately developed to ensure vendor participants can meet those needs. Once agreed, standards change relatively carefully (i.e., slowly and with rightly conservative process). Consider HTML: The IETF Internet Draft (1993) became HTML 2.0 (1994) became W3C HTML 4.01 (1999) became HTML5 (2014).

Although products based on open source can change to meet customer needs, when products must interoperate (e.g., networking products) or must demonstrate compliance (operating system interfaces), then certifications are a way to demonstrate a product meets tolerances with a standard.

Successful open source projects that have moved from a small public community or single vendor project into a multivendor ecosystem do so by defining a neutral playing field, shared or non-profit IP ownership, and bright lines around the collaboration to remove competitive discussion from the collaboration itself. Work and evolution happen in real time. If the project succeeds and evolves to the point of products and services in the marketplace, then the business entities providing those products and services have determined the best differentiated approach to their customers, and how to balance contribution to the projects with development of the services and products.

There is no Perl language standard, because at the end of the day, there is only Perl. It is shipped in a variety of forms by vendors as part of their platforms or IDEs. Multiple other executable versions exist (some for free) and can be run on different platforms, with varying levels of quality and support. But there are no Perl certification tests, because all versions ultimately derive from the one true community project source. Applications written in Perl (at a particular language version) run across implementations, unless someone is dealing with an obvious platform-specific extension or experiment.

The C and Fortran languages do have standards. Multiple compiler implementations from multiple sources and vendors existed and were diverging. There needed to be a standard against which to measure the different implementations, and professionals came together under the auspices of the X3 committees to create the ANSI standards. The U.S. government again put its money where its mouth was as a procurement organization and developed certification processes against the standards for its own procurement. The fact that others trust the certification for their own procurement is a benefit to those organizations, but not a requirement of NIST. It supports the standards but is separate from the standards.

The Linux project is another good example. Linux distributions come and go. Some distributions are packaged as products and the companies that provide such products to customers for money have a myriad of ways of competing. But the Linux kernel community is where the core work still happens on what is the Linux operating system. Some companies have nuanced approaches to the variations on Linux they support. For example, Red Hat is a primary contributor to the kernel project. The Fedora distribution is a Red Hat-supported community project, and Red Hat Enterprise Linux is developed from the Fedora community. The CentOS distribution is a freely released community rebuild of the Red Hat Enterprise Linux source code, which provides a similar execution environment.

Linux provides an interesting further example here. Linux was never certified as a UNIX operating system, despite its obvious lineage, and despite the fact that as the enterprise adoption of Linux servers grew and replaced expensive UNIX servers, the UNIX ISV world moved to several key enterprise distributions of Linux without such certification. I believe that because Linux was close enough to UNIX, the ISVs moved their applications, encouraged by Linux vendor ISV programs, and never looked back.

Behavior versus bits

So stepping back, what does it mean to be certified to an open source project? If it means that the project embodies some particular specification (e.g., Apache httpd implementing RFC 2616), then the specification—not the open source project—is probably the place to focus efforts on certification. The open source project is likely but a single (possibly diverging) implementation, and the standard acts as a measuring stick in the marketplace to ensure the divergence is within acceptable tolerance to the standard's community of interest. This holds true with NGINX as well.

We have seen that with the Linux Standard Base (LSB). The standard was an application binary interface (ABI) standard to support ISVs trying to target multiple Linux distro products in the marketplace with their applications. But the LSB was an ABI standard on parts of Linux, and not on the Linux code base itself.

As vendors wrestle with these ideas in new open source collaborations (for example, the newly formed Open Container Initiative), they will need to focus any testing and certification on the specification of a container and actions on a container, and not on the open source-licensed code. (Maybe they can take a page out of the IETF notebook and demonstrate multiple divergent implementations running the same test containers.) One of the other realities of the standards world is that "certified" products will diverge in the real world, and need to be brought back into alignment with the measuring stick. I'm betting once the group decides on the specification, they'll discover after the fact that the code and the spec have already diverged.
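
To make the "measuring stick" idea concrete, here is a minimal sketch of certifying behavior rather than bits. Everything in it is hypothetical: the toy specification (normalize a path by collapsing repeated slashes and dropping a trailing slash), the two implementations, and the function names are invented for illustration and come from no real project or standard. The point is only that the test cases derived from the specification are the yardstick, and neither code base is.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical specification, written down as (input, expected output) cases.
# Rule: collapse runs of "/" into one, and drop a trailing "/" unless the
# whole path is the root "/".
SPEC_CASES: List[Tuple[str, str]] = [
    ("/usr//lib/", "/usr/lib"),
    ("/", "/"),
    ("a///b", "a/b"),
    ("//", "/"),
]


def impl_regex(path: str) -> str:
    """Hypothetical implementation A, built on regular expressions."""
    collapsed = re.sub(r"/+", "/", path)
    if len(collapsed) > 1 and collapsed.endswith("/"):
        collapsed = collapsed[:-1]
    return collapsed


def impl_split(path: str) -> str:
    """Hypothetical implementation B, built on split and join.

    It has diverged from the specification: the leading slash is lost.
    """
    parts = [p for p in path.split("/") if p]
    return "/".join(parts)


def run_suite(impl: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every case the implementation fails."""
    failures = []
    for given, expected in SPEC_CASES:
        actual = impl(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


def certify(impls: Dict[str, Callable[[str], str]]) -> Dict[str, bool]:
    """An implementation conforms only if it fails no specification case."""
    return {name: not run_suite(impl) for name, impl in impls.items()}


if __name__ == "__main__":
    results = certify({"regex": impl_regex, "split": impl_split})
    for name, ok in results.items():
        print(name, "conforms" if ok else "does not conform")
```

Neither implementation is consulted to decide what the right answer is. When one diverges, the failing cases show exactly where it needs to be brought back into alignment with the specification.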

What certifying open source should hopefully never come to mean is that the one true code base is somehow the yardstick. Open source projects that succeed in delivering rich ecosystems of products and services support a breadth of needs and requirements. Companies building products and offering services develop a myriad of markets for customers around some of these needs and requirements. At its simplest, a company may discover errors in open source project software as they shape components into products. They may rightly contribute those changes back to the project so they aren't eating the engineering economic cost of living on a fork in the code base. But that means there is now a difference between the project's view of the source and the company's product view. And if the project accepts the bug report, but modifies the fix, the product may be shipping out of source code alignment with the project for some time.

Ultimately, the folks who want certifications must be clear about exactly what is being certified to what measuring stick, and the benefits of choosing that particular measuring stick must be clear. Because in the end, the folks receiving those benefits need to be willing to carry the costs of certification.

[1] With a hat tip to Tom Christiansen