Apache Hadoop and related Apache Software Foundation projects have a rapidly growing feature set, high commit rates, and code contributions from across the globe. However, the number of women developers, committers, and Project Management Committee (PMC) members in this vast and diverse ecosystem is very small. For the Hadoop project alone, only 5% of its 84 committers are women, and this has been the case for the past two years.
Driven by the exponential rise in Hadoop adoption, a number of Apache projects, such as Apache Pig, Apache Hive, Apache Ambari, Apache ZooKeeper, and Apache HBase, have come into existence to augment the Hadoop platform. Many new Apache Incubator projects, such as Apache Storm, Apache Tez, and Apache Knox, are also taking shape in the big data space. Let's take a look at the number of women committers and PMC members in some of the projects in the Hadoop ecosystem, both top-level and incubator.
From this data, it is evident that there are very few women committers and PMC members in these projects.
When I joined Hortonworks, a leading commercial vendor focused on the development and support of Apache Hadoop, as part of the group of 24 engineers from Yahoo! back in 2011, I was the only woman in Research and Development, and I remained so for a very long time.
At Hortonworks, I build large, distributed testing frameworks to guarantee high-quality releases of a unified Hadoop stack. Because the Hadoop ecosystem is a collection of open source projects, each with a different release frequency, a rich feature set, and rapid evolution, testing it is a daunting task. With such a high influx of code, it is imperative to stay up to date on new features, API changes, improvements, and the impact every change has on the stack. Validating every component involves a large, multi-dimensional matrix of tens of operating systems, multiple Java Development Kits (JDKs) and versions, different database flavors, and various modes such as secure and non-secure. The code I write includes thousands of tests that run for days on clusters of nearly a thousand nodes, crunching terabytes of data and validating more than a dozen projects.
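The size of such a test matrix grows multiplicatively with every dimension added. A minimal Python sketch shows how quickly the combinations add up. The operating system, JDK, and database names below are hypothetical values chosen for illustration, not an actual Hortonworks test configuration.

```python
from itertools import product

# Hypothetical dimension values, for illustration only.
operating_systems = ["centos6", "ubuntu12", "sles11", "windows2012"]
jdks = ["openjdk7", "oraclejdk7", "oraclejdk8"]
databases = ["mysql", "postgres", "oracle"]
security_modes = ["secure", "non-secure"]


def build_matrix(*dimensions):
    """Return every combination of the given dimensions as a list of tuples."""
    return list(product(*dimensions))


matrix = build_matrix(operating_systems, jdks, databases, security_modes)

# 4 * 3 * 3 * 2 = 72 configurations, before multiplying by the
# number of projects or by the tests run in each configuration.
print(len(matrix))  # 72
```

Even with these few assumed values, each additional OS, JDK, or database multiplies the number of configurations that must be validated.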
All of this is critical to achieving stable, high-quality Hadoop stack releases. It is a great feeling to know that you are working to improve a software product that the industry is so enthusiastically embracing. However, hiring engineers was a long and hard process. In particular, I constantly faced the challenge of finding women engineers with knowledge of distributed systems and open source experience. At conferences and user group meetings, and in the open source community too, I found only a very small percentage of women participants. Yet through all this, I never once felt left out, because the community was always welcoming and supportive.
Whether I am reporting a bug, submitting code for review, '+1ing' a patch, or committing code, there is no bias or double standard. All I see is a vibrant community that is passionate about what it does and strives for excellence. The developer community is brilliant and fair, and it has made working on open source software fun and easy.
So, how does one start contributing to an open source project?
The most straightforward way is the one I chose: join a company that has a strong commitment to open source software. It is a dream come true for anyone who wants to make a living contributing to open source projects. Conferences like ApacheCon, OSCON, and FOSSCON can help you narrow down the projects you are interested in.
Another easy way is to join a project in its early stages. The codebase is smaller, and you get involved in the architectural design and development of the product right from the beginning. For a well-established project, once the code has grown and the project is sizable, it won't be easy, at least at first. But you can start by reporting a bug or a documentation issue, testing a feature, or giving feedback on usability. You can work your way up by understanding the use cases, then the overview of the product, its architectural details, and finally the innards of the code. Source documentation, mailing lists, IRC, online forums, blogs, bug tracking systems, and books are very helpful in this task. You can then submit a patch, either for an improvement you would like to see or for an existing bug. Once your patch is ready for review, you can invite co-developers to critique your work and get immediate feedback.
After multiple such iterations, your code is ready to be committed to the codebase! With a single patch, you have already raised your standards. There is the added benefit of showcasing your work and proving your competence: your code, your responses on the mailing lists, and your review comments establish your merit in the industry.
I am a big believer in innovation through open source development. Working on open source projects is both fun and challenging. You get to interact with the best minds in the industry, all striving to build world-class software. Through constant feedback from co-developers in the open source community, and because ideas are discussed openly, it's all but guaranteed that you will be building the best technology. There is unrivaled potential for personal growth, and it is important for more and more developers, especially women, to participate in the development of open source software.
Let your code speak for itself, and for you.