Couchbase is a NoSQL, document-oriented database for building interactive applications. Trends in the open source database industry show positive growth as NoSQL is used for web, mobile, and the Internet of Things (IoT).
In this interview, Arun Gupta, VP of Developer Advocacy at Couchbase, shares his views on how open source has made an impact on the database industry, and the challenges that lie ahead for the NoSQL industry. Also, find out which open source tools and methodologies Couchbase has adopted.
Arun has authored more than 2,000 blog posts on technology as well as several books. He's been named a JavaOne Rock Star for three years in a row, and he founded the Devoxx4Kids chapter in the U.S., where he continues to promote technology education among children. His recognized titles include Java Champion, JUG Leader, NetBeans Dream Team member, and Docker Captain.
What's it like to work for Couchbase?
Well, I've built and led developer communities for 10+ years at Sun, Oracle, and Red Hat, so I have experience in leading cross-functional teams to develop and execute strategy, content, and marketing campaigns and programs. I've also led engineering teams at Sun, and I’m a founding member of the Java EE team.
At Couchbase, a developer advocate helps developers become effective users of a technology, product, API, or platform. This can be done by sharing knowledge about the product using the medium where developers typically hang out. Some of the more common channels include blogs, articles, webinars, and presentations at conferences and meetups. Answering questions on forums and Stack Overflow, conversations on social media, and seeking contributors for open source projects are some other typical activities that a developer advocate performs on a regular basis.
How has open source impacted the database industry, and NoSQL databases in particular?
Simply put, open source has revolutionized the database industry.
For several decades, databases were closed source and under proprietary licenses. Users were at the mercy of a handful of big vendors like IBM, Oracle, Microsoft, Teradata, and SAP. Enterprises were held captive to release schedules and licensing schemes that often had little to do with providing technical value, and were more about maximizing the vendor's profit margin. There were a few open source databases like MySQL, PostgreSQL, and SQLite that offered an alternative to the big vendors, but for the most part these products lacked the maturity, feature depth, and stability that most enterprises were looking for. Until recently, it appeared that the established vendors had a lock on the database industry and that the barrier to entry was so high that other database providers would be forever relegated to being niche players.
But two things occurred: open source database products matured, and distributed computing happened.
Although enterprises continue to run a lot of "the big five" RDBMS products, industry analysts like Gartner, Forrester, and 451 Research, as well as software ranking websites like db-engines.com, have clearly stated that mature open source databases are no longer niche products. The new innovations in distributed computing leverage modern, high-performance, commodity hardware to provide distributed, scalable data processing (for example, Hadoop and Spark) and distributed, scalable data storage (for example, HDFS and NoSQL).
The vast majority of the software products developed over the last 10 years that provide distributed computing are open source. These new products take advantage of advanced technical features, agile development methodologies, and an open source collaborative licensing model to deliver rapid technological innovation as well as lower cost of operations to their users.
With regard to NoSQL databases, virtually all of these products are available under an open source license (with a few notable exceptions like Amazon DynamoDB and MarkLogic). And the number of open source NoSQL databases continues to grow. What's different about the NoSQL space, however, is not just the number of products that are available, but how vibrant and active the open source community is. What's amazing about this is that the adoption rate among Fortune 1000 companies continues to grow exponentially and these companies are contributing back to the community. Also amazing is the mission-critical, non-niche nature of the NoSQL use cases, as well as the resulting quality and maturity of these products.
As we continue to push our information society forward and expand technology in the form of things like machine learning, what challenges do you see in the next few years for the NoSQL database industry?
Probably the two biggest challenges are around technology integration and maintaining product focus, quality, and scalability.
From an integration perspective, you have multiple technologies (for example, the Hadoop stack, Spark, IoT, and NoSQL) that are rapidly evolving, in a space where there are very few actual standards. On the one hand, you need to choose where to focus your resources and try to identify which trends, products, and APIs are going to become widely adopted, while at the same time adjusting to the rapid changes of those same trends, products, and APIs. Also, we hear from enterprise users that inefficient or poorly executed integrations result in wasted time and effort due to performance and scalability problems. You could argue that open source NoSQL databases should provide every single integration possible, but customers aren’t really looking for every kind of integration—they are looking for integrations that work.
From a product quality and scalability perspective, NoSQL databases provide a high-performance, high-throughput alternative to the RDBMS one-size-fits-all database approach by delivering focused functionality that is specifically designed to provide fast access to operational data. Integrating NoSQL with new technologies like machine learning, for example, requires a more interactive, real-time, operational approach, rather than the bulk, batch-processing approach that is more common with RDBMS and file system integrations.
Additionally, as NoSQL databases become more feature and integration rich, there is a real risk of losing some of the performance and scalability advantages that NoSQL brings to the table. Avoiding that risk doesn't occur by chance or by luck—it has to be planned for and architected into the product.
For example, Couchbase's core database architecture is based on our Multi-Dimensional Scaling (MDS) and Database Change Protocol (DCP). MDS and DCP enable Couchbase to provide a service-oriented architecture (SOA) for data management. With MDS, Couchbase can continuously add data management services, which can be individually configured and resourced, without affecting the throughput or behavior of other services. This allows customers to scale each service independently (scaling up, scaling out, or both) while also providing workload isolation. This SOA-based approach to data management has allowed Couchbase to introduce full SQL for JSON (N1QL), Global Secondary Indexing, Cross Datacenter Replication (XDCR), built-in Full Text Search, and high-performance streaming integration with Spark, Kafka, and Hadoop.
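To make the idea of scaling one service without touching the others concrete, here is a toy Python sketch. It is not Couchbase code and does not use any Couchbase API: the `Cluster` class, the service names, and the node counts are all hypothetical, chosen only to illustrate what independent scaling means.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    """Toy model (hypothetical): each service has its own pool of nodes."""
    nodes: Dict[str, int] = field(default_factory=dict)

    def scale_out(self, service: str, extra_nodes: int) -> None:
        # Only the named service's pool grows; the other pools are untouched.
        self.nodes[service] = self.nodes.get(service, 0) + extra_nodes


# Hypothetical starting layout: three services, each sized separately.
cluster = Cluster({"data": 3, "index": 2, "query": 2})
cluster.scale_out("query", 2)  # add capacity for query workloads only
print(cluster.nodes)
```

In this sketch, growing the query pool leaves the data and index pools at their original sizes, which is the workload isolation described above in miniature.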
Which open source tools and methodologies has Couchbase adopted?
One of the core values at Couchbase is that we are an open source software company. We believe that open source is about much more than simply making the source code available—it's about how we develop our software, and how we engage with the greater Couchbase community.
This affects our development process in the following ways:
- We develop in the open. Our current code line and check-ins are public. Couchbase contributors can (and do) see what we’re working on and influence its direction.
- We perform both internal and external (public) code reviews. Product quality and usefulness are significantly enhanced by the feedback from non-Couchbase subject matter experts. This has been especially useful in the development of our .NET SDK and Node.js/Ottoman framework, among others.
- We practice transparency. Our issue trackers are open, and include product roadmap and future feature direction wherever possible.
We do this because we know that it results in a higher quality product with better features and shorter development cycles.
This also affects how we engage with the Couchbase community, as well as the larger open source community:
- As part of our developer community website, we host the interactive Couchbase developer forums on Couchbase.com to ensure that the community is "part of the company" and that the company is focused on our interaction with the community.
- We are proud of our community contributors and maintain attributions in the Git history as part of the product lineage.
- Couchbase developers make upstream contributions to other open source projects, including Memcached, Erlang, CouchDB, and TCMalloc, among others.
We do this because the benefits of being an open source product increase in direct proportion to the quality and vibrancy of the community surrounding that product. Fostering a great open source community requires providing real technical value to our community members and the opportunity for two-way communication and contribution.
What trends do you see currently evolving in the open source database industry?
The open source industry, like the products themselves, is evolving and changing every day. That said, there are some common trends that we’re seeing amongst the various vendors, products, and customers. For example:
- Open source is ground zero for innovative technology development, and the cloud is where it is getting deployed in production.
- Open source business models continue to evolve, with the Freemium model being the current favorite.
- Historically proprietary licensing companies, like IBM, Microsoft, and Oracle, are increasingly providing their own open source database products.
- There is a growing mandate, across both public government organizations and private enterprises, to use open source solutions whenever possible. There is also a corollary trend to focus and consolidate usage on well-known, proven projects rather than attempting to implement every new thing that comes out.
- Enterprises are increasingly adopting open source databases, actively contributing back to the open source community, and even releasing internal products and tools as open source. Some examples include Google, LinkedIn, Facebook, eBay, Walmart, and Netflix.
- There is a really interesting shift occurring in the motivation for enterprises to move to open source technology. Licensing cost is no longer the #1 reason—it has become product quality, innovative/competitive features, and the ability to fix problems. You can clearly see this trend reflected in the recent 2016 Future of Open Source survey.
The open source database industry reflects a lot of the trends mentioned above. Additionally, we also see:
- Feature crossover and overlap. Traditional relational database vendors are implementing successful open source features, and open source database projects are implementing more traditional relational database features. This benefits users of both technologies.
- Open source databases are broadening their enterprise appeal by adding data types and query capabilities that expand the use cases where they can be used (like graph, full-text search, and in-database analytics), improving their security and manageability, and increasing their integration with both new and existing technologies (like application containers, Hadoop/Spark, and JDBC/ODBC).
As a developer advocate, how do you encourage developers to participate and contribute to open source NoSQL database projects?
Open source NoSQL database projects, and the community in general, benefit enormously from developer participation and contributions. Whether on their own time, or as part of their everyday job, developers are the people who make these products happen. When I talk with enthusiastic developers who want to be a part of this great adventure, I usually suggest:
- GitHub is the center of the universe. If you’re not familiar with it already, you need to be.
- Pick a project that you like, or better yet one that you use, and become an expert in that project. Get corporate backing by asking your company to dedicate some or all of your time towards contributing back to the open source projects that they use.
- Find an area of the project that you are passionate about, and start your contributions there. You know, it’s not always about writing new features – there are APIs to develop, storage options to consider, integration with other tools and projects, as well as examples, documentation, and test suites that need continuing review and improvement. Focusing on documentation, for example, can be of tremendous value to you and to many others in the community.
- Solve real world problems. The contributions that make the biggest impact to the community are the ones that solve a problem that many people encounter. It can be a problem in any of the areas above. Odds are, if it’s a problem for you, then it’s a problem for other users as well.
- Most open source projects have processes to encourage participation and contributions—find them and use them. Couchbase, for example, dedicates an entire section of our developer portal to working with the community. It includes a list of the available open source projects, access to the developer forums, how to engage with the Experts and Champions program, as well as a monthly developer newsletter.
In short, find your community, see what’s needed, and get involved. Before you know it, you’ll be a rock star and a contributor to your favorite open source project.