Interview with Doug Cutting of Cloudera
Chief Architect of Cloudera on growth of Hadoop
Doug Cutting is the founder of numerous successful open source projects, including Lucene and Hadoop. He is currently the chief architect at Cloudera and sits on the board of the Apache Software Foundation.
In this interview, he tells me how working on open source is more about common sense than creed and dives into open source adoption in the enterprise. Prior to his keynote at the All Things Open conference, I asked him about open sourcing Lucene, what his role is like on the board of the Apache Software Foundation, and what the open source way means to him.
What was it like to open source Lucene back in 2000 when you released it on SourceForge under the GPL license?
It wasn't that different than things are today. Folks had shared software for a long time in the academic and research communities, so the concept of downloading free stuff wasn't new, nor were open source licenses. (I first ran into the GPL in 1985 when I contributed some code to GNU Emacs.) The tools were different. We used Concurrent Versions System (CVS), since even Subversion wasn't yet available. We didn't use a bug tracker, just the mailing list, but the fundamental process is much the same. People communicate to coordinate their work on a shared project.
Since Lucene, the first project you founded, you have followed the principles of the open source way. Do you still apply them today, and why?
To me it's more common sense than following any particular creed. I want to help create software that people use, that's useful. I like to do this together with other people. The rest follows naturally. One must treat collaborators with respect or they won't want to collaborate. Similarly, transparency and meritocracy are required to build healthy, long-lived collaborative communities. At this level it's not much different than non-software projects. If you're cleaning up after a party then some folks need to clear the table, some wash dishes, some put chairs away, etc. No one is the boss, everyone just pitches in where they can to achieve the group's goal, which is both to get the house clean and to remain friends.
You are a member of the Apache Software Foundation board. Can you tell us a bit about your role?
Mostly the Apache Board monitors all the projects in the foundation to make sure each has a healthy community. We need to ensure that projects aren't controlled by one person or company, that everyone is acting respectfully, etc. Each of the 150+ Apache projects submits a quarterly report to the board, so we review about 50 projects at each monthly meeting. Most run smoothly. Occasionally we have to give a project a nudge in the right direction. The board also deals with the typical administrivia, like making sure someone keeps the website running, collecting donations, filing taxes, etc.
With the increasing adoption of open source in the enterprise, where do you see both open source and Hadoop in 3 to 5 years?
I gravitated to open source because it suited me as a developer. It lets lots of folks use software I work on, which is personally rewarding. But it also is very attractive to users of software, since they can be less dependent on other businesses ("locked in"). More and more developers are creating open source alternatives to proprietary technologies. Given the choice, users prefer an open source implementation for its lack of lock-in. The Hadoop ecosystem has taken the next step, where the open source implementations came first. Few are motivated to create proprietary alternatives since folks would likely prefer the open source versions. I expect this pattern to continue for many years. The core components of the Hadoop ecosystem will remain open source, even as the core grows and mutates. Some proprietary tools survive at the top of the stack, but few will at the base.
What is your take on the formation of the TODO group?
I spoke briefly with them, and I think it's just a mailing list for folks running corporate open source projects to talk about best practices. They don't seem to have more of an agenda than that. Lots of companies publish something open source and have common technical and legal issues. They'd like to collaborate on approaches, or at least commiserate.
For more on Hadoop, see our Introduction to Hadoop.