Apache Kafka is a distributed publish-subscribe messaging system designed to be fast, scalable, and durable. It provides a unified, high-throughput, low-latency platform for handling real-time data feeds, and its storage layer is essentially a massively scalable pub/sub message queue architected as a distributed transaction log. That architecture makes Kafka, which was originally developed by LinkedIn and open sourced in early 2011, well suited to processing streaming data in enterprise infrastructures.
Originally, Kafka was built for website activity tracking—capturing all the clicks, actions, or inputs on a website and enabling multiple "consumers" to subscribe to real-time updates of that information. Now, however, companies in internet services, financial services, entertainment, and other industries have adapted Kafka's massively scalable architecture and applied it to valuable business data.
Kafka lets enterprises in these verticals take everything happening in their company and turn it into real-time data streams that multiple business units can subscribe to and analyze. For these companies, Kafka acts as a replacement for traditional data stores that were siloed to single business units and as an easy way to unify data from all of their different systems.
Kafka has moved beyond IT operational data and is now also used for consumer transactions, financial market data, and customer data. Here are three ways different industries are using Kafka.
A leading internet service provider (ISP) is using Kafka for service activation. When new customers sign up for internet service by phone or online, the hardware they receive must be validated before it can be used. The validation process generates a series of messages, then a log collector gathers that log data and delivers it to Kafka, which sends the data into multiple applications to be processed.
The benefit of using Kafka in this way is that the IT platform can perform an action for a customer (activating service) and, from the same data, feed an analytics application so that the ISP can analyze activations by geographical area, rates of activation, and much more.
Before Kafka, capturing and routing data to multiple departments required engineering work, business intelligence work, and separate pipelines that duplicated the data. Kafka now serves as the single source of truth, capturing data not only on what's going on with the application but also on what's going on with customers.
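The fan-out described above, where one stream of activation messages feeds both the activation step and an analytics application, can be illustrated with a small sketch. This is a toy in-memory model written for this article, not the Kafka client API: the `TopicLog` class, the group names, and the event fields are all hypothetical. It only shows the idea that each subscribing application tracks its own read position in a shared log, so no separate, duplicated pipeline is needed.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a single-partition topic (not the real Kafka API)."""

    def __init__(self):
        self.records = []                # append-only log of messages
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, group):
        """Return the records this group has not seen yet and advance its offset."""
        start = self.offsets[group]
        batch = self.records[start:]
        self.offsets[group] = len(self.records)
        return batch


# Hypothetical validation events, as a log collector might forward them.
topic = TopicLog()
topic.publish({"device": "modem-1", "region": "north", "status": "validated"})
topic.publish({"device": "modem-2", "region": "south", "status": "validated"})

# Two independent applications read the same events from the same log.
to_activate = [r["device"] for r in topic.poll("activation-service")]

by_region = defaultdict(int)
for r in topic.poll("analytics"):
    by_region[r["region"]] += 1

print(to_activate)
print(dict(by_region))
```

Because reading does not remove records, adding a third application in this model is just another group name passed to `poll`; the publisher does not change.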
Global financial services firms need to analyze billions of daily transactions to look for market trends and stay on top of rapid and frequent changes in financial markets. One firm used to do that by collecting data from multiple business units after the close of the market, sending it to a vast data lake, then running analytics on the captured data.
To shift from a reactive approach to real-time analysis of incoming market data, the firm now uses Kafka as the messaging broker that houses operations data and other market-related financial data. Instead of analyzing operational data after the fact, the firm's analysts can keep their finger on how markets are doing in real time and make decisions accordingly.
An example of a financial firm using Kafka is Goldman Sachs, which led development of Symphony, an industry initiative to build a cloud-based platform for instant communication and content sharing that securely connects market participants. It is based on an open source business model that is cost-effective, extensible, and customizable to suit end-user needs.
An entertainment company with an industry-leading gaming platform must process millions of transactions per day in real time and ensure a very low drop rate for messages. In the past, it used Apache Spark, a powerful open source processing engine, and Hadoop, but it recently switched to Kafka.
Interestingly, the company is using Kafka as an insurance policy for this data, because Kafka will securely store data in a readable format as long as the company needs it to. This enables the company to both route messages via a streamlined architecture and store data via Kafka; in the event of a disaster or critical error, it can recover the data and troubleshoot.
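The recovery idea above can be sketched in a few lines: if the log retains every event, downstream state that is lost can be rebuilt by replaying the log from the beginning. This is a minimal illustration under stated assumptions, not Kafka code: a plain Python list stands in for a retained topic, and the event fields (`player`, `amount`) and the `apply_events` helper are hypothetical.

```python
def apply_events(events):
    """Fold transaction events into per-player totals."""
    totals = {}
    for event in events:
        totals[event["player"]] = totals.get(event["player"], 0) + event["amount"]
    return totals


# A list stands in for a retained topic: reading it never deletes records.
log = [
    {"player": "p1", "amount": 10},
    {"player": "p2", "amount": 7},
    {"player": "p1", "amount": 5},
]

# Normal operation: downstream state is derived from the stream.
live_totals = apply_events(log)

# Simulate a critical error that wipes the downstream state.
lost_state = None

# Recovery: replay the retained log from offset 0 to rebuild the same state.
recovered = apply_events(log[0:])

print(recovered)
```

The same replay can also start from a later offset when troubleshooting, for example to inspect only the events recorded after a known-good point.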
Netflix uses Kafka as the gateway for data collection for all applications, which requires processing hundreds of billions of messages daily. For example, it uses Kafka for unified event publishing, collection, and routing for batch and stream processing, as well as for ad hoc messaging.
In the cloud and on-premises, Kafka has moved beyond its initial function of website activity tracking to become an industry standard, providing a reliable, streamlined messaging solution for companies in a wide range of industries.