How open source is driving the future of cloud computing

In 1998, Amartya Sen was awarded the Nobel Prize for Economics. The lecture he gave, titled "The Possibility of Social Choice," succinctly captured both the subject of his work (generalizing economic theory to cover social groups of disparate actors rather than just individuals or corporations) and his irrepressible sense of humor (because the generalization applied to Arrow's Impossibility Theorem). Sen's crucial insight (for me) is this (emphasis mine):

Thus, it should be clear that a full axiomatic determination of a particular method of making social choice must inescapably lie next door to an impossibility—indeed just short of it. If it lies far from an impossibility (with various positive possibilities), then it cannot give us an axiomatic derivation of any specific method of social choice. It is, therefore, to be expected that constructive paths in social choice theory, derived from axiomatic reasoning, would tend to be paved on one side by impossibility results (opposite to the side of multiple possibilities). No conclusion about the fragility of social choice theory (or its subject matter) emerges from this proximity.

I am quite familiar with proximity to impossibility. When we started Cygnus Support, the world's first company based on selling commercial support for free software, nearly everybody thought it would be impossible. Those few who did not thought that being so nearly impossible would make the business too fragile to ever be interesting, especially by Silicon Valley standards. The success of Cygnus [1] and the subsequent success of Red Hat [2] [3] strongly validate Sen's bold prediction that being on the edge is not a sign of weakness. Indeed, where do we find leaders, but out in front?

All of the above is a preamble to the subject of this article, which is the presentation of a new economic paradigm for understanding the future and potential of cloud computing. With luck, economists smarter than I will develop the formal methods and analysis that will garner them some recognition in Sweden. But luck or not, the true beneficiaries will be those who embrace this paradigm and profit from the insights that it makes obvious. Insights which, according to today's nay-sayers, are impossible or at best insignificant, but which in fact are the key to recovering trillions of dollars in business value wasted every year under the current paradigms.

Global IT spend tops USD $1.5T per year, and businesses are (or should be) banking on massive IT-enabled returns on that investment. Yet 18% of all projects are abandoned before going into production, and another 55% are "challenged", meaning they are late to market (sometimes very late), buggy (sometimes very buggy), or missing functionality (sometimes key functionality). The estimated cost of these shortfalls is USD $500B per year, but that's only part of the story. The shortfall in terms of expected ROI is 6x to 8x that number, meaning that roughly USD $3.5T of expected business returns never materialize [4]. Each year. No other industry I can think of can tolerate such abysmal performance results, yet that's what we have come to expect from IT. Which is unsustainable.
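The arithmetic behind these figures is easy to check. The short sketch below uses only the rounded numbers quoted above; the variable names are mine, and it assumes the "abandoned" and "challenged" categories do not overlap.

```python
# Rounded figures quoted in the paragraph above (USD per year).
direct_shortfall = 500 * 10**9  # estimated cost of abandoned and challenged projects
roi_multiplier_low, roi_multiplier_high = 6, 8

# Project outcome rates quoted above, in percent.
abandoned_pct = 18
challenged_pct = 55

# Assumption: the two categories are disjoint, so they can simply be added.
troubled_pct = abandoned_pct + challenged_pct
on_track_pct = 100 - troubled_pct

# Range of expected returns that never materialize, and its midpoint.
lost_roi_low = direct_shortfall * roi_multiplier_low
lost_roi_high = direct_shortfall * roi_multiplier_high
lost_roi_mid = (lost_roi_low + lost_roi_high) // 2

print(f"Projects abandoned or challenged: {troubled_pct}%")
print(f"Projects on track: {on_track_pct}%")
print(f"Unrealized returns: ${lost_roi_low / 1e12:.1f}T to ${lost_roi_high / 1e12:.1f}T "
      f"(midpoint ${lost_roi_mid / 1e12:.1f}T)")
```

The midpoint of the 6x to 8x range is where the USD $3.5T figure comes from, and under the stated assumption only about a quarter of projects land on time and as specified.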

This problem has remained so stubbornly entrenched in part because the numbers militate against any solution. The probability of failure is so high (an 18% chance of failing outright and a 55% chance of missing deadlines, milestones, or a clean bill of application health) that there is about a 50/50 chance that any effort to improve one's application environment will actually make it worse, and even in the best of circumstances one will achieve only 50%-80% of what was originally intended [5].

But there is an alternate universe in which we find a working solution: the world of open source, where measured software defect rates are 50x to 150x lower than typical proprietary software, and where the pace of innovation can be seen on literally a daily basis. The first (and still one of the best) economic analyses to explain this remarkable phenomenon was a game theory analysis by Baldwin and Clark [6], showing that selfish developers benefit from forced sharing (involuntary altruism) when systems are modular and there is a community of like-minded (i.e., similarly selfish, lazy, and capable) developers. Their analysis also showed that the results are highly scalable, and that the more modular the system, the larger the community becomes and the greater the payoff for participating. This formal result justified what Tim O'Reilly and so many others observed when they spoke about "The Architecture of Participation" [7]. It also validates the intuition I had when I started Cygnus Support, as well as what I saw happening between our company and the community pretty much from the beginning.

A second finding, explained by Oliver Williamson in his 2009 Nobel Prize lecture, was the formalization of the economics of governance and the economics of organization, specifically to help answer the question: "What efficiency factors determine when a firm produces a good or service to its own needs rather than outsource?" [8]. For too long, economists, and the proprietary software industry for that matter, have treated firms as black boxes, ignoring all the details on the inside and focusing on prices and outputs as the only interesting results to study. Williamson builds a new theory of transaction cost economics based on work first articulated by John R. Commons in 1932 and strongly echoed by W. Edwards Deming in 1982 [9], namely that continuity of contractual relationships is a more meaningful predictor of long-term value than simple prices and outputs. Indeed, when so much is being spent and so much being thrown away when it comes to proprietary systems, the prices and outputs of those systems become almost meaningless. At the limit, the firm that treats IT only as a cost, not a driver of business value, has fallen into a trap from which it is quite difficult to escape. By contrast, the architecture of participation, coupled with ever-increasing utility functions (due to user-driven innovation), shows that the Deming cycle is perfectly applicable to software, and that the long-term relationships between firms build far greater value for all concerned than trading price for quitting.

So what does this all mean for the cloud? One hypothesis is that the macroeconomics of the cloud makes the microeconomics of open source insignificant, and therefore irrelevant. If that is true, then the game is truly fixed: a cloud OS is just another OS, cloud apps are just like traditional apps, cloud protocols and management tools are merely software APIs and consoles, etc. If that is true, then we should all be prepared for the Blue Cloud of Death.

An alternative hypothesis is that open source is the nanotechnology of cloud computing, and its nano-scale properties (architecture of participation, enhanced innovation cycles, quality, and transactional efficiencies) are crucial to all innovation going forward. I argue that this is indeed the case, not only because of the arguments made thus far, but because cloud computing creates a new inductive force that specifically strengthens the arguments just made. And at this point I'm compelled to introduce a rather lengthy analogy; please bear with me. A single tree in the Amazon rainforest can transpire 300 L of water per day, or a bit less than half a (cubic) yard of water for those of us still using the Imperial measurement system. It seems insignificant. But when one considers the whole Amazonian rainforest, not only do these trees transpire as much water as flows through the Amazon river itself, but they propel that sky-borne water as far and as fast as well, effectively creating a second Amazon river in the sky [10]. It is one thing to see a tree as shade, or as a resource for firewood, or a carbon sink, or any other discrete use, but when the lens changes from the small scale to the large, its function in the larger context cannot be imagined by looking at the smaller case. Adam Smith said the same thing about the invisible hand of the market, not to say that it always does the right thing, but to say that it's always doing something [11]. Or, as Gandhi once said:

Whatever you do will be insignificant, but it is very important that you do it.

When I started writing open source software back in 1987, Richard Stallman was the maintainer of the GNU project, the master repository was his local disk, and my version control system was Emacs backup files and, to a lesser extent, the frequent tarballs of software distinguished by a manually-adjusted release number. Merging changes was a time-intensive (and sometimes energy-intensive) process, but the quality of Stallman's code, and that of the few others working with him at the time, was such that I could do in weeks what companies could scarcely do in years. The GNU C++ compiler was developed and first released in six months' time, while at the same time I ported the GNU compilers to half a dozen new architectures. Everything that was wrong about the way we managed our software changes in those days represented an opportunity for us to develop a new software management paradigm for supporting customers commercially. We adopted the newly-developed CVS (Concurrent Versions System) and for a time, the world was our oyster.

Within five years, we had succeeded in many of the ways we imagined: inclusion on the Inc 500 list, the Software 500 list, the cover of a special edition of Fortune magazine, even mentions in the New York Times and the Wall Street Journal. But we succeeded in ways we didn't imagine, nor design for. We stretched CVS to its breaking point. Signing a new customer meant potentially creating a new customer branch in the master repository. This process, which could once be done in a matter of minutes, could take a day. Which meant that with 200 business days a year, if we signed up 200 customers that year, then developers would have precisely zero days with which to do any work against the repository. This frequently led to arguments about forking—developers wanted to work in repositories unconstrained by operational bottlenecks, but somebody had to merge changes that could be delivered to customers. The cost of forking had become intolerable, and the social choice we had to engineer was one of lowered expectations for both customers and employees. Despite those shortcomings, relatively speaking we shined, with the development and delivery of custom compilers and debuggers on time and on budget 98.5% of the time.

But things are different now, and being the best in a broken paradigm is not good enough. In the past five years, a program called "git" has revolutionized how developers and maintainers manage code, and how code can be called into production on a moment's notice, sometimes for just a moment. git has reorganized the open source world so that forking is neither expensive nor problematic, and so that projects can merge and combine so easily that it is almost possible to think of it as a kind of quantum superpositioning. This change not only solves the problem that bottlenecked the old way of doing things (at Cygnus and the FSF), but opens up entirely new concepts as to what an application itself might be. Instead of being some monolithic tangle of code that was difficult to create, expensive to test, and impossible to change, it becomes a momentary instance of code and data, producing precisely the result requested before vanishing back into the ether. At any moment in time, new code, new data, new APIs, and new usage contexts guide the evolution of each generation of the application. An application that evolves by the minute is fundamentally different than one that evolves only every year or two (regardless of how many new features are promised or even delivered).

This rapid new dimension of evolution, at the application/operational level, requires a new economic analysis. Fortunately the groundwork has been laid: Evolutionary Game Theory studies the behavior of populations of agents repeatedly engaging in strategic interactions [12]. Behavior changes in populations are driven either by natural selection via differences in birth and death rates, or by the application of myopic decision rules by individual agents. In the article Radically Simple IT by Dr. David Upton [13], a deployment model is described in which all existing functionality of the system exists in at least two states: the original state and a modified state. Inspired by the design of fault-tolerant systems that always avoid a single point of failure by running independent systems in parallel, new features can be added as optional modules in parallel with the existing system. When new features are judged to be operationally complete and correct, the system can "fail over" the old modules to the new, and if a problem is then later detected, the system can "fail back" to the original. By constantly running all versions in parallel, some version of the correct answer is always available, while some version of a new and better answer may also be available. When implemented by Shinsei Bank in Tokyo, Japan, the bank achieved its operational milestones 4x faster than using conventional deployment methods, and did so at 1/9th the cost. And by designing their system for maximum adaptability (rather than maximum initial functionality) they were able to adapt to customer needs and expectations so successfully they were recognized as the #1 Bank for loyalty and satisfaction two years in a row. When this same approach was implemented by The Emirates Group (coached by the experience of Shinsei) the results were even more impressive [14].
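To make the "fail over" and "fail back" idea concrete, here is a minimal sketch in Python. It is my own illustration of the pattern described above, not code from Upton's article or from Shinsei Bank; the class name ParallelFeature and the two fee functions are hypothetical.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
R = TypeVar("R")


class ParallelFeature(Generic[A, R]):
    """Runs an original and a modified implementation side by side.

    Both versions are executed on every call; only the active one is
    served. Hypothetical illustration, not a production design.
    """

    def __init__(self, original: Callable[[A], R], modified: Callable[[A], R]) -> None:
        self.original = original
        self.modified = modified
        self.active = "original"  # which version's answer callers receive
        self.differences = 0      # how often the two versions disagreed

    def call(self, arg: A) -> R:
        old = self.original(arg)  # a known-good answer is always computed
        try:
            new = self.modified(arg)
        except Exception:
            # A crashing candidate triggers an automatic fail back.
            self.differences += 1
            self.active = "original"
            return old
        if new != old:
            self.differences += 1  # recorded so operators can review it
        return new if self.active == "modified" else old

    def fail_over(self) -> None:
        """Start serving the modified version."""
        self.active = "modified"

    def fail_back(self) -> None:
        """Return to serving the original version."""
        self.active = "original"


# Hypothetical example: a flat transfer fee versus a new tiered fee.
def flat_fee(amount: int) -> int:
    return 300


def tiered_fee(amount: int) -> int:
    return 0 if amount >= 100_000 else 300


fee = ParallelFeature(flat_fee, tiered_fee)
before = fee.call(200_000)   # original is served while the new module is evaluated
fee.fail_over()
during = fee.call(200_000)   # new module is now served
fee.fail_back()
after = fee.call(200_000)    # problem detected later: back to the original
print(before, during, after, fee.differences)
```

The design choice worth noticing is that switching versions is a one-line state change rather than a redeployment, which is what makes selection between versions cheap.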

The combination of low-cost forking (which makes new software generations very rich and diverse) and operational models that can easily select the fittest code in a given generation creates a super-charged Deming cycle of sustainable innovation, quality, and value. But to make this cycle effective, the code itself must be susceptible to innovation. Black boxes of proprietary software define the point at which population-driven innovation stops. To fully realize the benefits of the population dynamics of open source innovation, the source code must be available at every level of the system.
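One standard way to model "selecting the fittest code in a given generation" is the discrete-time replicator equation from evolutionary game theory. The sketch below is a toy illustration under loud assumptions: three competing forks, constant fitness values that I made up, and equal starting shares. It is not drawn from any of the cited studies.

```python
def replicator_step(shares, fitness):
    """Advance one generation of the discrete-time replicator equation.

    Each variant's share grows in proportion to its fitness relative
    to the population's mean fitness.
    """
    mean_fitness = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / mean_fitness for s, f in zip(shares, fitness)]


def evolve(shares, fitness, generations):
    """Apply replicator_step repeatedly and return the final shares."""
    for _ in range(generations):
        shares = replicator_step(shares, fitness)
    return shares


# Hypothetical fitness values for three forks of the same project
# (for example, relative adoption rate per generation).
fork_fitness = [1.0, 1.1, 1.3]
initial_shares = [1 / 3, 1 / 3, 1 / 3]

final_shares = evolve(initial_shares, fork_fitness, generations=50)
print([round(s, 4) for s in final_shares])
```

Even a modest fitness advantage compounds quickly when generations are short, which is the point of the argument above: the faster forks can be created and compared, the faster the better variant comes to dominate.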

We cannot solve problems by using the same thinking we used when we created them. —Albert Einstein

To summarize this rather far-reaching thesis, the world of Enterprise IT has been suffering under the delusion that if we throw enough money at enough black boxes, one of them will surely solve the problems that we were originally tasked with solving. Even if true, the world changes at such a rate that solving a problem once relevant in the past is likely no longer relevant in the future, especially if that problem is merely a symptom of a deeper problem. Recent results in economic theory teach that price and output analysis tends to reveal symptoms, but rarely uncovers real, sustainable solutions. But an economic understanding of governance, transactions, and mutual benefit can not only inform sustainable solutions, but also induce ongoing, sustainable innovation, thereby creating ever-increasing business or social value. Evolutionary Game Theory provides a framework for national-level and enterprise-level analysis of a shift from proprietary applications to cloud computing. Factors such as financial capital, knowledge capital, business value potential, and trust capital influence both the processes of natural selection across populations and the myopic decisions of agents within populations. Open source software enables vital mechanisms prohibited by proprietary software, fundamentally changing the evolutionary rate and quality of successive generations of (cloud) applications. There is perhaps no easier nor faster way to add more value to enterprise, national, or global accounts than to embrace open source cloud computing and evolve beyond the problems of proprietary applications and platforms. All it requires is that you do something, as a member of the open source community, no matter how insignificant it may seem.