Open source more about process than licensing

Opensource.com

It is a testament to the success of the Open Source Initiative's (OSI) branding campaign for open source software that "open source" and "licensing" have become functionally synonymous. To the extent that people are familiar with open source software, they think of it as source code released under a license that lets anyone see the "crown jewels" of a software program, as opposed to an opaque binary or black box that hides its underpinnings.

This well-trodden trope has dominated the mainstream view of open source software since Eric Raymond pushed it into the public consciousness over 15 years ago. But taking a previously proprietary code base and transitioning it to an open source project makes one seriously question any previous assumptions about code and licensing. It is that undertaking that leads one to appreciate the value of process and governance. Having seen that transition from closed to open firsthand, I am convinced that the choice of whether to release code as a proprietary or an open source project leads to fundamental changes in the end product, a divergence that is very difficult to roll back.

From the point of view of most people, the software license is the most important aspect of releasing open source software, but in my opinion, licensing ranks somewhere below user experience, workflows, and integration with existing data center technologies. Nowhere is the gap between what is "known" (licensing) and the actual reality (user workflows) clearer than in the fearful eyes of a development team tasked with transforming its proprietary product into an open source project. In fact, the development methodology the engineers choose has a direct impact on what type of software is produced. If an open source development model is chosen from the beginning, one can be reasonably sure that the end product will be relatively portable and will plug into the most commonly used environments. If a proprietary model is chosen, it's very easy for the developers to take cheap shortcuts that result in short-term gain and long-term pain, and that's precisely what often happens.

To the extent that people think about these things at all, the common perception is that this change involves a simple search and replace, perhaps the removal of third-party software, an upload to a public repository, and presto! Fork me on GitHub! But nothing could be further from the truth. What most people miss about open source software is that it's much more about process, control, and administration than about licenses. As I argued in "It Was Never About Innovation," the key to the success of open source software is not the desire for innovation but rather the fact that all players in open source ecosystems are on a level playing field. Customers, outside developers, and even freeloaders all have a seat at the table and can influence a project by leveraging the community equity they have built up over time by contributing in various ways. This is in stark contrast to proprietary development models, where developers can essentially do whatever they want as long as they create an end product that meets the expectations of the Product Requirements Document (PRD) supplied by product management.

This is where the difference between open source and proprietary development comes into sharp focus. The open process that accompanies open source development helps ensure that the software will integrate into most environments and that some bad habits are avoided. These two things go hand in hand. For example, proprietary software development often results in software that is monolithic in nature, with minimal dependencies on system software, and often bundled with its own set of libraries and tools. This gives developers the leeway to do whatever they want: relying on specific versions of libraries, reinventing various wheels, and generally veering far from the path of creating software that works well in a broader context.

Open source software developers, by contrast, have no such luxury. From day one, their users demand the ultimate in flexibility, integration, and conformance to standard data center systems practices. This means using existing tools and libraries whenever possible and baking into the process the idea that your software will be a cog in a much larger data center machine. Note that nowhere did I say that open source development is faster or more innovative, although it can be. On the one hand, developers working on proprietary code love having complete control over the end product and not having to deal with annoyances such as customer demands that their precious software honor existing workflows. On the other hand, end users love the fact that their open source deployments likely have a long history of use within large data centers and that those previous users made sure the software was to their liking.

Both approaches come at a cost. Open source development may actually be slower at particular points in its lifecycle because of overhead that is inherent to the model. Proprietary development, while perhaps faster, sends the development team down the road of maintenance hell, endlessly maintaining the bits of glue that generally come for free in open source development. The weight of recent evidence suggests that the open source approach is far more effective in the data center.

Suppose that your team went down the road of proprietary development but eventually concluded that it could win over more users with an open source approach. What then? Here lies the conundrum: undoing the proprietary process and imbuing a project with the open source sauce is spectacularly difficult. Many otherwise knowledgeable people in the tech industry have no idea just how much change is involved. Hell, most engineers have no idea what's actually involved in switching horses midstream. Engaging in the process necessarily means losing valuable development time to tasks that developers feel are, frankly, beneath them. Changing software from a monolithic, proprietary code base to one that plays well with others is a gargantuan task.

"But wait!" I can hear you say. "Can't they just release whatever they have under an open source license and take care of the other stuff later?" Sure, they can, but the result will likely be disappointing at best and a colossal disaster at worst. For starters, mere mortals won't even be able to install the software, much less build it from source. Developers use several tricks to make monolithic black-box products work for their end users, and those same tricks are terrible for open source community building:

Highly customized build environment and tools. This is the #1 reason why most proprietary software cannot simply be set loose as open source: it's completely unusable by anyone except the development team that built it. In open source development, there are a few standard ways to build software. All of them are terrible at producing highly optimized executables that run at the highest level of efficiency, but they're great for giving developers a simple, standardized way to build and distribute software. Making your proprietary software build with standard open source build tools is probably non-trivial. Open source projects, by contrast, came out of the crib compiling with GCC.
Third-party libraries, also proprietary, that you do not have permission to include in your open source code. Even if your code can build with GNU Autotools and GCC, to use one example, you will probably have to rewrite a not-insignificant portion of it. This diverts your developers' time and effort away from implementing new features and toward ripping out and replacing many pieces of code. The extent varies from project to project, but this problem afflicts the vast majority of projects going from closed to open.
Bad security practices. When developers think nobody else is looking, they do all sorts of crazy things. And as long as features are delivered on schedule, nobody bats an eye. It is this primacy of feature development over code quality that can result in some horrendous security holes. Obvious exceptions aside (*cough* Heartbleed *cough*), there is plenty of evidence that open source software is more secure than its proprietary counterparts.
Bad coding practices and magical unicorn libraries. For the same reasons as above, i.e., feature primacy and the belief that nobody's looking, developers tend to grab the latest and greatest from other software packages, especially runtime scripting engines, libraries, and tools. They take the code, modify it, and end up with a product that works. For now. This is great if you're on a deadline, your code must work by midnight, and it's already 23:30. The problem, however, is that the product will live long after midnight tonight, and you will be responsible for maintaining, updating, and syncing your pristine unicorn library with upstream code that will inevitably diverge from what you modified. This is terrible for everyone, developers and admins included. Imagine the poor sod in operations assigned to install and maintain someone's late-night "innovations."

All of the above leads product teams to one obvious conclusion: package and distribute the software in such a way that it runs as far removed as possible from the system on which it resides, usually in the form of a bloated virtual appliance or at least in the form of a self-contained application that relies on the bare minimum of system libraries. Windows admins should take a look at their Program Files directory sometime. Or better yet, don't. All of this, taken together, adds up to an end product that is extremely difficult to release as open source software.

Some ops people might think that an appliance is easier for them to deploy and maintain, but more often they hold their nose and use the thing. They will tolerate such an approach if the software actually makes their jobs easier, but they won't like it. All of the ops people I know (and I used to be one) prefer that the software they deploy conform to their existing processes and workflows, not force them to create new ones.

Put another way: would your software exist in its current form if it started life as an open source project? Or would end users have demanded a different approach?

Open source is about process much more than license, and everyone in an open source community has the ability to influence those processes. Projects that start out as open source have many characteristics baked in from the beginning that often, though not always, save developers from their own worst instincts. If you elect to reverse course and move to the open source model, understand what this change entails: it is a minefield, laden with challenges that will be new to your development team. Its members are unaccustomed to having their practices challenged, don't particularly relish direct customer feedback, and are entirely uncomfortable with the idea of others reading over their shoulders as they write code. The effort required to move from proprietary to open source processes is probably on the same order as going from waterfall to agile development.

Example: ManageIQ

When Red Hat acquired ManageIQ in late 2012, it was with the understanding that the code would be open sourced, eventually. However, several things stood in the way:

1. Many of the user interface (UI) scripts and libraries were proprietary, third-party tools.
2. The software was distributed as an encrypted virtual machine.
3. ManageIQ was and is a Rails app, and some of the accompanying Ruby gems had been modified from their upstream sources to implement specific features.

#1 meant that many parts of the code, particularly in the UI, had to be ripped out and either replaced with open source libraries or rewritten. This took quite a bit of time, but it had to be done before the code could be released.

#2 is not an option for an open source project, a fact that struck fear into the hearts of the development team. Once the (false) sense of security that came with distributing the software as an encrypted appliance was gone, some changes to the code were necessary.

#3 meant that the development team had to keep carrying its modifications to the custom gems forward, a chore that was becoming burdensome and would only get worse over time. The team is still in the process of fixing this, but I'm happy to report that we've hired a strong Ruby developer, Aaron Patterson, who will, among other things, maintain the team's changes to upstream gems and prevent future forks and divergence. He'll also lead the effort to convert ManageIQ to Ruby on Rails 4.

Conclusion

Be considerate of your developers and the challenges ahead of them. Hopefully they understand that the needed changes will ultimately result in a better end product. The transition comes at a price, but it has its own rewards, too. And never forget to remind folks that choosing an open source approach from the beginning would have obviated this pain.