Findings from working on Red Hat's installer

Until I started graduate school at the University of Wisconsin-Madison, I had never heard of open source. However, every computer science department of any age and stature uses open source software to support its infrastructure. One or another variant of Linux was always being installed on our desktops by the departmental systems administrators, and many academic programs are open source. I accepted the whole situation more or less as I found it.

View the complete collection of Women in Open Source articles

Open source became more relevant to me when I joined the Cooperative Bug Isolation (CBI) project. This project used open source software as test objects for automated bug finding. My supervisor on the project, Professor Ben Liblit, had written a short paper, The Open Source Proving Grounds, describing his experiences and evaluating the suitability of open source software for bug-finding projects like his. Although the paper was published in 2005, many of his observations are still relevant today.

Once I started thinking about open source software, and especially when I began to study some individual applications, I was dismayed to discover that some of it is very poorly written and consequently riddled with bugs. Naively, I had assumed that any software that anyone would choose to distribute would be at least as carefully crafted as the software I myself, a mere graduate student, wrote. Then, as now, people will use software if it works well enough for their purposes, regardless of what the code looks like, even if the code is exposed for all to see.

Later, I worked on extending some macros in the Chromium project to gather CBI-type data. These extensions were never intended to be incorporated into the project in any permanent way; you will certainly not find them there today. The Chromium project was a very disciplined one. Even though the code I wrote was never going to be part of the project, my patches were required to conform to comprehensive and detailed specifications and were reviewed thoroughly before they were accepted.

My personal computer is a Mac, but as everybody knows, that's just Unix under the hood, so I download, install, and often use numerous open source applications. I've seen some applications, like LyX, mature. Over the time I've known about LyX, my personal evaluation has evolved from "useless" to "by far the best choice available." I would still like to understand how an ambitious open source project like LyX can last through its early, vulnerable years.

By the time I left the CBI project, I had interacted in some way with a variety of open source projects, some apparently emerging from chaos and others emerging from a very disciplined development process, some terribly impressive and others not. After graduation, I held a series of teaching positions and fell back on my old ways. I was grateful that the software was there, used it regularly, contributed the odd bug report, and thought very little about it otherwise. Occasionally, I thought that I ought to find out more about this open source phenomenon and how it managed to sustain itself, but I was always far too busy.

My chance came when I decided to leave academia and enter industry. I had not previously programmed in industry, but I had done extensive development work for the CBI project in a number of languages and for a number of different purposes. Until I interned with the Yocto Project last summer, however, I knew nothing about the mechanisms of open source development or communication within a possibly geographically distributed team. I learned about the overwhelming number of emails that everybody inevitably gets and also about the utility of IRC. It was through that internship that I became fluent with git.

Since I joined Red Hat in September 2013, I've been learning much more about the development cycle and its requirements, although because it hasn't been long, I haven't made it through a full cycle yet. Since I've become an open source contributor, I've started to see open source as a virtue in itself and have become much more likely to seek out open source alternatives to various applications. I'm also more likely to post bug reports that are a bit higher up on the bug food chain: in the Python standard library, for example, rather than in some application written in Python. I see more problems as opportunities for open source collaboration than I did formerly. I would never have dreamed that open source could be so successful and only hope that its success will grow.

At Red Hat, I spend almost all my time working on the installer, focusing mostly on its storage component, blivet. The installer, Anaconda, guarantees that it will not actually do anything to your disks until you have made all your choices and pressed the big red button; at this point it ought to implement all your choices without crashing. This is a hard problem to solve, as there may be surprising restrictions on what the different tools that Anaconda relies on can do, and there are surprising interactions between different choices that you make. There is always tension between allowing the user the most flexibility and, at the same time, guaranteeing that the choices they make can be carried out.
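The "nothing happens until you press the button" guarantee can be pictured as a queue of deferred actions. The sketch below is only an illustration of that idea, not Anaconda's or blivet's actual code; the names `PlannedAction` and `ActionQueue` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PlannedAction:
    """One deferred change; it does nothing until the queue is committed."""
    description: str
    run: Callable[[], None]


@dataclass
class ActionQueue:
    """Hypothetical queue that records choices and applies them only on commit."""
    actions: List[PlannedAction] = field(default_factory=list)
    committed: bool = False

    def schedule(self, description: str, run: Callable[[], None]) -> None:
        # Scheduling only records the intent; the callable is not invoked here.
        if self.committed:
            raise RuntimeError("queue already committed")
        self.actions.append(PlannedAction(description, run))

    def preview(self) -> List[str]:
        # Lets the user review every pending change before committing.
        return [action.description for action in self.actions]

    def commit(self) -> None:
        # The "big red button": run the recorded actions in order.
        self.committed = True
        for action in self.actions:
            action.run()
```

In a real installer the hard part is what this sketch leaves out: checking, before commit, that every scheduled action can actually be carried out by the underlying tools.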

Blivet, the storage component, is responsible for handling devices and device formatting. It must present a uniform interface for a variety of different file systems, for a variety of different devices, and for a variety of different architectures. This is like the device driver problem, where individual devices may be very different, but the drivers must all present the same Application Programming Interface (API). File systems, for example, can be very different from each other, as the file system designers do not have the problems of installers uppermost in their minds when designing their file systems. Moreover, most file systems do not present an API; blivet must generate the necessary commands for each file system's Command-line Interface (CLI), which can be a moving target.
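To make the idea of a uniform interface over differing command lines concrete, here is a minimal sketch. It is not blivet's real class hierarchy; `FileSystemFormat` and its subclasses are hypothetical names, the device path is a placeholder, and the code only builds argument lists without running anything.

```python
from abc import ABC, abstractmethod
from typing import List


class FileSystemFormat(ABC):
    """Hypothetical uniform interface; each subclass knows one tool's CLI."""

    @abstractmethod
    def mkfs_command(self, device: str, label: str) -> List[str]:
        """Return the argument list that would create this file system."""


class Ext4(FileSystemFormat):
    def mkfs_command(self, device: str, label: str) -> List[str]:
        return ["mkfs.ext4", "-L", label, device]


class Xfs(FileSystemFormat):
    def mkfs_command(self, device: str, label: str) -> List[str]:
        return ["mkfs.xfs", "-L", label, device]


class Vfat(FileSystemFormat):
    def mkfs_command(self, device: str, label: str) -> List[str]:
        # mkfs.vfat spells the label option differently from the others.
        return ["mkfs.vfat", "-n", label, device]


def build_commands(formats: List[FileSystemFormat], device: str, label: str) -> List[List[str]]:
    # Callers see one method, whatever tool sits underneath.
    return [fmt.mkfs_command(device, label) for fmt in formats]
```

Even in this small case the option for setting a label differs between tools, and each such difference is something the library has to track as the tools change.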

Although Anaconda is a substantial application in its own right, and blivet is a substantial library, by their very nature they must depend on other significant applications, like package installers and file system utilities, coordinating the actions of these various applications and interrogating them to discover relevant system state. Frequently, these interactions are made much more brittle because the application must be invoked by means of its CLI and must be interrogated by reading and parsing output whose format may change with the next update.
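The brittleness of parsing tool output is easy to show with a small sketch. The sample text below is made up for illustration, shaped like the KEY=value output that some utilities can emit. The function name `parse_key_value_output` is hypothetical, and the parser fails loudly rather than guessing when the format is not what it expects.

```python
from typing import Dict


def parse_key_value_output(text: str) -> Dict[str, str]:
    """Parse lines of the form KEY=value; raise if a line does not fit."""
    result: Dict[str, str] = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue  # tolerate blank lines between records
        key, separator, value = line.partition("=")
        if not separator or not key:
            # A changed output format surfaces here instead of as bad data later.
            raise ValueError(f"line {line_number}: expected KEY=value, got {line!r}")
        result[key] = value
    return result


# Illustrative sample only; not captured from a real tool run.
SAMPLE_OUTPUT = "DEVNAME=/dev/sdX1\nUUID=1234-ABCD\nTYPE=vfat\n"
```

If the next release of the tool switched to, say, `TYPE: vfat`, this parser would raise immediately. That is about the best a caller can do when the only contract is the text a program happens to print.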

An installer like Anaconda is certainly an extreme example of an application that must rely on other applications. However, automation is the norm rather than the exception today, and many applications, especially of the kind that Anaconda and blivet rely on, may expect to be invoked automatically as often as they are invoked by a human being. The more successful they are, the more likely this is to be the case.

I believe that the open source community as a whole would benefit if more open source developers considered the API and associated bindings as primary and the CLI as of secondary importance. Ideally, applications would be designed from the start with a well-defined API, a set of bindings that evolved with the API, and a CLI (if one was necessary) that was defined in a scripting language that made use of the bindings. Not only would this make the application ripe for automation, but it would likely have the added benefit of making the API better defined and more robust.
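One way to picture this API-first arrangement is a small module whose function is the real interface and whose CLI is a thin wrapper over it. Everything here is hypothetical: `plan_format`, its parameters, and the list of supported types are invented for illustration and do not belong to any existing tool.

```python
import argparse
import json
import sys
from typing import List, Optional

SUPPORTED_TYPES = ("ext4", "xfs")  # assumption made for this example only


def plan_format(device: str, fstype: str, label: str = "") -> dict:
    """The API: returns structured data and raises on bad input."""
    if fstype not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file system type: {fstype}")
    return {"device": device, "fstype": fstype, "label": label}


def main(argv: Optional[List[str]] = None) -> int:
    """The CLI: parses arguments, calls the API, prints the result."""
    parser = argparse.ArgumentParser(description="Thin CLI over plan_format()")
    parser.add_argument("device")
    parser.add_argument("--fstype", required=True)
    parser.add_argument("--label", default="")
    args = parser.parse_args(argv)
    try:
        plan = plan_format(args.device, args.fstype, args.label)
    except ValueError as err:
        print(f"error: {err}", file=sys.stderr)
        return 1
    # Machine-readable output, so even CLI callers need not scrape free text.
    print(json.dumps(plan, sort_keys=True))
    return 0
```

A program such as an installer would import the module and call `plan_format()` directly, getting a dictionary or an exception. A person at a terminal would go through `main()`. Because the CLI contains no logic of its own, it cannot drift away from the API.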
