How Apache Kafka is powering a real-time data revolution

35% of the Fortune 500 companies use Apache Kafka.

Image by: Steve Jurvetson via Flickr (CC-BY-2.0)

Two years ago, Neha Narkhede co-founded a company called Confluent to build on her team's work with Apache Kafka. In this interview, we talk about how lots of companies are deploying Kafka and how that has led to a very busy GitHub repo.

Narkhede will keynote at All Things Open in Raleigh, NC next week.

What was it like leaving LinkedIn to start your own company?

It was a great experience and a natural extension of the mission my co-founders and I had been working on for the past several years: bringing Apache Kafka, and our vision of a company's data architecture built around streaming data, to the forefront.

Today, 35% of the Fortune 500 and thousands of companies worldwide use Kafka. There is a huge opportunity for Confluent to help companies leverage streaming data for mission-critical applications, gain insights to drive key business decisions in seconds instead of hours, react to critical events affecting business continuity in real time, and do that while vastly simplifying the operational footprint of their data architecture.

What advice would you give to someone starting an open source company?

There are several things to think about while building a company, but the ones that are particularly critical for building a successful company based on an open source technology are evangelism, community influence, the business model, and having the pragmatism to balance investment across these areas. Open source technology greatly simplifies the adoption problem for a new technology and empowers developers to use the technology that is right for building products. Essentially, the developer is the new buyer.

Apache Kafka has been touted as a rising star on GitHub. Why do you think it's been so successful?

There are many reasons Apache Kafka has taken off, but one of the major ones is that it offers the best solution to a problem that all companies have: processing data in real time. Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production at thousands of companies. The other big reason for Kafka's success is its growing and thriving community. And last but not least, Kafka became very popular because it just works.

Where are your contributors coming from? What's the most important factor in maintaining a happy, healthy contributor base?

Apache Kafka contributors come from a wide variety of companies, simply because Kafka is applicable to and adopted by companies in a diverse set of industries—from financial services to retail, consumer tech to enterprise tech, and many more. From day one, we have always been focused on building a collaborative and engaged community. As a member of the PMC, I, along with the other contributors, aim to foster a collaborative approach to engineering by discussing major and minor decisions about the direction of the project in the open with the community.

Can you tell us about an interesting or surprising place you found Kafka being employed?

At Kafka Summit 2016, we heard from companies like Uber, Netflix, Dropbox, HomeAway, Goldman Sachs, and more that are all using Apache Kafka to make business decisions in real time. One of my favorite use cases is Uber: They are using Kafka to manage all of the surge pricing, which is a great example of a successful stream processing application.

Another great example is Comcast. The company is taking the massive, heterogeneous set of data collection systems in its technology and product group and centralizing them on a single platform built around Kafka. These data collection systems are used for everything from business analytics to near-real-time operations to executive reporting.

Really, enterprises can use Confluent and Apache Kafka in many different ways. Kafka's versatility is another reason it has become so popular: it can be applied to a myriad of different real-time data needs.

What doesn't Kafka do that you wish it did? What's next for the project?

Over time, the scope of what Kafka could do has grown from a messaging system to what it is today: a full-fledged distributed streaming platform. It enables publish-subscribe for streaming data like a messaging system, processes streams of data efficiently and in real time, and stores streaming data safely in a distributed, replicated cluster.
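For readers who want a concrete picture of the three capabilities described above, here is a minimal, single-process sketch. It is not Kafka's API or implementation: the `ToyLog` class, its `publish` and `poll` methods, and the subscriber names are all hypothetical, chosen only to show how an append-only log lets several subscribers read the same stored records independently.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a single stream (not Kafka's API)."""

    def __init__(self):
        self.records = []                # append-only storage; reading never removes records
        self.offsets = defaultdict(int)  # next position to read, tracked per subscriber

    def publish(self, record):
        """Append a record and return the position it was stored at."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, subscriber):
        """Return every record this subscriber has not yet seen."""
        start = self.offsets[subscriber]
        batch = self.records[start:]
        self.offsets[subscriber] = len(self.records)
        return batch


log = ToyLog()
for fare in (12.5, 8.0, 30.0):
    log.publish(fare)

billing = log.poll("billing")      # each subscriber receives the full stream
analytics = log.poll("analytics")  # reading by one subscriber does not affect another
running_total = sum(billing)       # a trivial stand-in for a stream-processing step
```

Because each subscriber keeps its own read position, a new subscriber can start from the beginning of the stored stream without disturbing the others. A real cluster adds partitioning, replication across machines, and durable storage, none of which this sketch attempts.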

Increasingly, Kafka is being used in mission-critical applications that require stronger, database-like guarantees. As such, I'm really excited about introducing exactly-once guarantees and transactional messaging capability in Kafka. It will open up Kafka to an even larger user base, as it can then be used for applications like billing, ad impression counting, and many more use cases that depend on exactly-once processing.
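To see why use cases like ad impression counting depend on exactly-once processing, consider a toy illustration. It assumes a delivery channel that may redeliver a message after a retry, and it is not how Kafka implements the guarantee: the function names, the `(producer_id, sequence, payload)` tuple layout, and the sample data are all hypothetical.

```python
def count_naive(deliveries):
    """Count every delivery, so a retried message is counted twice."""
    return len(deliveries)


def count_deduplicated(deliveries):
    """Count each (producer_id, sequence) pair once, ignoring redeliveries."""
    seen = set()
    count = 0
    for producer_id, sequence, _payload in deliveries:
        key = (producer_id, sequence)
        if key in seen:
            continue  # duplicate caused by a retry; already counted
        seen.add(key)
        count += 1
    return count


# Hypothetical deliveries: the third entry repeats the second after a retry.
deliveries = [
    ("ad-server-1", 0, "impression"),
    ("ad-server-1", 1, "impression"),
    ("ad-server-1", 1, "impression"),
    ("ad-server-2", 0, "impression"),
]

naive_total = count_naive(deliveries)
deduplicated_total = count_deduplicated(deliveries)
```

The naive counter reports four impressions where only three occurred, which is the kind of error that matters when the count feeds a bill. Pushing this guarantee into the platform spares every application from writing its own deduplication logic.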

What would you like to see more of in open source?

Diversity and inclusion. Research has proved time and again—and I've observed from my own personal experience—that diverse groups perform significantly better and build better products than those that aren't. It is unfortunate that the open source community is pretty far away from attracting and engaging a diverse group of people. Gender diversity is lacking, with women often being underrepresented and mistreated. I'm confident that the Kafka community can serve as a good example of diversity for all open source communities and grow the open source developer base worldwide.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.