The Accumulo challenge, part I


The dozens of software projects launched in the wake of Google's Bigtable and MapReduce papers have changed the way we handle large datasets. Like many organizations, the NSA began experimenting with these "big data" tools and realized that the open source implementations available at the time were not addressing some of its particular needs. The agency decided to embark on its own project: Accumulo. Once it was happy with how Accumulo was working, it did the right thing and released Accumulo to the world through the Apache Foundation.

This is great for open source and the taxpayer. The government found a requirement not being fulfilled by the private sector, and rather than letting its work languish inside its walls or paying a contractor to develop a proprietary solution, it shared what it had with the world. This is what open source advocates have been clamoring for, and with the Shared First initiative at OMB and innovative open source policies like those at the Consumer Financial Protection Bureau, it's likely we'll see more of this kind of sharing of taxpayer-funded work.

There's a catch, though. Since it was launched, Accumulo has joined an increasingly crowded space for tools that manage big data. As these upstarts compete – witness the extraordinary success of MongoDB from 10gen, Hadoop implementations from Cloudera and Hortonworks, tools like Hadapt, HBase, Cassandra, MapR, and many, many others – Accumulo presents a threat. Some of these products and projects already compete with each other, and the private sector doesn't like it when the government competes alongside them, for good reason.

There's a frequently ignored policy called OMB Circular A-130 which says that the government shouldn't build something already available from the private sector. In the language of the policy, the government should:

...acquire off-the-shelf software from commercial sources, unless the cost effectiveness of developing custom software is clear and has been documented through pilot projects or prototypes

So there's a tension here: we want the government to share its innovations, but we don't want the government to crowd out the private sector. That's the thinking behind this recent language in Section 929 of S.3254, the 2013 Defense Authorization as reported out by the Senate Armed Services Committee:

(a) Limitation on Use of NSA Database-

(1) LIMITATION- No component of the Department of Defense may utilize the cloud computing database developed by the National Security Agency (NSA) called Accumulo after September 30, 2013, unless the Chief Information Officer of the Department of Defense certifies one of the following:

(A) That there are no viable commercial open source databases with extensive industry support (such as the Apache Foundation HBase and Cassandra databases) that have security features comparable to the Accumulo database that are considered essential by the Chief Information Officer for purposes of the certification under this paragraph.

(B) That the Accumulo database has become a successful Apache Foundation open source database with adequate industry support and diversification, based on criteria to be established by the Chief Information Officer for purposes of the certification under this paragraph and submitted to the appropriate committees of Congress not later than January 1, 2013.

(2) CONSTRUCTION- The limitation in paragraph (1) shall not apply to the National Security Agency.

(b) Adaptation of Accumulo Security Features to HBase Database- The Director of the National Security Agency shall take appropriate actions to ensure that companies and organizations developing and supporting open source and commercial open source versions of the Apache Foundation HBase and Cassandra databases, or similar systems, receive technical assistance from government and contractor developers of software code for the Accumulo database to enable adaptation and integration of the security features of the Accumulo database.

First of all: Wow. The Senate Armed Services Committee proposes to order the DOD to stop using Accumulo, and direct NSA to help push Accumulo's code back to other projects, specifically calling out HBase and Cassandra. The level of sophistication required for legislative language like this is astonishing. Under different circumstances, I'd find that sophistication encouraging. Instead, I'm concerned that SASC feels compelled to blacklist an open source project for the DOD. Surely there's a better response than this?

Let's put the Committee's reasoning to the side for a moment, and look at the remedy they proposed. What if it weren't Accumulo, but another piece of software? Imagine that we're talking about the Apache Web Server, Red Hat Enterprise Linux, Microsoft SharePoint, or Adobe Acrobat. If Congress put any of those software packages on a blacklist, industry would lose its mind. Accumulo is no different: once it was open sourced, Accumulo became commercial software under the FAR and DFARS. Congress has no business intervening in this way.

There's more. The requirement that Accumulo be certified as "a successful Apache Foundation open source database with adequate industry support and diversification, based on criteria to be established by the Chief Information Officer" is extremely dangerous for Accumulo and for open source in general. It's no longer sufficient that the software be commercial, functional, and available at a reasonable cost. It must now have "adequate industry support and diversification."

If the DOD CIO is compelled to create such criteria for Accumulo, it doesn't take much imagination to see that same "adequacy criteria" applied to all open source software projects. Got a favorite open source project on your DOD program, but no commercial vendor? Inadequate. Only one vendor for the package? Lacks diversity. Proprietary software doesn't have a burden like this.

The last clause of Section 929 is bewildering. SASC directs the NSA to make sure Accumulo's developers help other projects that want to use the Accumulo security code, and singles out HBase and Cassandra. There's nothing wrong with the desire to spread Accumulo's technology, but doesn't an act of Congress seem like an extraordinarily, comically inappropriate tool for that? Wouldn't the Accumulo team, like all open source developers, be generally helpful with folks who want to integrate their code? Perhaps more importantly, why is Congress so interested in HBase and Cassandra?

I think the Committee (and whoever provided them this legislative language) is right to be concerned about the government unnecessarily maintaining a duplicative software project. It's bad for the private sector, and it's bad for the government to maintain its own codebase when there are perfectly good alternatives elsewhere.

At the same time, the Accumulo folks indisputably did the right thing by releasing their code, and even went so far as to join the Apache Foundation, which is no small effort. They should be rewarded for their excellent stewardship of taxpayer money. Through their effort, everyone can already benefit from the work that they've done – with or without legislative orders to do so – and they're perfectly capable of winning or losing market share on their own merits, just like everyone else.

This Accumulo issue opens the door to a host of valid questions about the role of government in open source projects. In part two, we'll put this specific bill aside and ask the questions behind the legislation: Does the government harm the private sector when it releases open source projects? How can we know when open source is an appropriate way for the government to develop software? How should the government handle forks? I'll also examine some possible remedies that could eliminate the need for a dangerously crude tool like Section 929.

In the meantime, please let your Senator know how you feel about this.

Creative Commons License