Apache Kafka – What Is It?

For the uninitiated, Kafka is a publish-subscribe distributed messaging system. It was originally developed at LinkedIn, open-sourced in 2011, and became a top-level Apache Software Foundation project in 2012. This post gives an overview of Kafka by presenting the ideas behind producers, topics, brokers and consumers.

Introduction to Kafka:

Kafka, written in Scala and Java, is a scalable, high-throughput, replicated, partitioned log system. It was created at LinkedIn to handle large, continuous streams of data such as live activity feeds. Later it was open-sourced so that other organizations could adopt it as well. As in other messaging systems, messages are written to and read from servers, but a Kafka cluster can sustain very high message rates while doing so.

Kafka is considered a “publish-subscribe distributed messaging system” rather than a “queue system” because a message received from a producer is not handed to a single consumer and then removed. Instead, every consumer group subscribed to the topic receives the message, and each group reads it independently.
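The difference from a queue can be made concrete with a toy model. The sketch below is plain Python, not the Kafka client API: it assumes a single topic with one partition, and the group names "analytics" and "search-indexer" are hypothetical. Each consumer group keeps its own read position (offset), so reading a message does not take it away from other groups.

```python
from collections import defaultdict

# Toy model only (not the Kafka client API): one topic with a single partition.
log = []                          # messages published to the topic, in order
group_offsets = defaultdict(int)  # consumer-group name -> next offset to read

def publish(message):
    """Append a message to the topic's log."""
    log.append(message)

def poll(group):
    """Return every message this group has not read yet and advance its offset.

    Each group tracks its own offset, so one group reading a message does not
    remove it for any other group.
    """
    start = group_offsets[group]
    batch = log[start:]
    group_offsets[group] = len(log)
    return batch

publish("post-1")
publish("post-2")
print(poll("analytics"))       # ['post-1', 'post-2']
print(poll("search-indexer"))  # ['post-1', 'post-2']
print(poll("analytics"))       # []
```

In a traditional queue, the second group would have found nothing left to read; here both groups see the full stream.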

Architecture of Kafka:

Having covered the history of Kafka, let us move on to its architecture. The basic terms associated with the Kafka architecture are producer, broker, consumer and topic, with ZooKeeper coordinating the cluster.

Producer:

Producers, such as applications, databases (DBMS) and NoSQL stores, write data to the Kafka cluster. The Kafka cluster consists of many “brokers”; in layman's terms, each “broker” is a “server”. A message may carry a key, and with the default partitioner all messages with the same key arrive at the same partition (as long as the topic's partition count does not change). Producers send messages asynchronously and in batches instead of waiting on each message individually, and how many acknowledgements they wait for is configurable through the producer's "acks" setting. This asynchronous, batched way of writing to the cluster is a large part of what gives Kafka its throughput, which matters for high-volume streams such as live social media feeds.
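The key-to-partition rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Kafka client API: Kafka's default partitioner hashes the key bytes with murmur2, whereas this sketch uses CRC-32 from Python's zlib module as a stand-in. The partition count, the function name choose_partition and the keys are all hypothetical.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for one topic

def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition deterministically.

    Stand-in for Kafka's default partitioner, which hashes the key bytes with
    murmur2; CRC-32 is used here only because it ships with Python. The
    property that matters is the same: one key always maps to one partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Simulate a producer sending keyed messages; each lands in its key's partition.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
events = [("user-42", "liked a post"),
          ("user-7", "shared a link"),
          ("user-42", "commented")]
for key, value in events:
    partitions[choose_partition(key)].append((key, value))

# Both "user-42" events sit in one partition, in the order they were sent.
p = choose_partition("user-42")
print([value for key, value in partitions[p] if key == "user-42"])
# ['liked a post', 'commented']
```

Because ordering in Kafka is guaranteed only within a partition, routing by key is what keeps all events for one user in order.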

Topic:

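A ‘Topic’ is a named stream of messages of a similar type; it is roughly analogous to a folder in a file system. Messages are published to a ‘Topic’, and each ‘Topic’ is divided into one or more partitions. Each partition is an ordered, append-only log in which every message is assigned a sequential offset.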
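A topic's structure can be pictured as a handful of append-only lists, one per partition, where a message's offset is simply its position in its partition. The class and topic name below are hypothetical and not part of any Kafka library; this is a minimal sketch of the data structure only.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Toy model of a topic: a name plus N append-only partition logs."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append to one partition and return the message's offset in it."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> list:
        """Read every message in a partition starting at the given offset."""
        return self.partitions[partition][offset:]

clicks = Topic("page-clicks", num_partitions=2)  # hypothetical topic name
clicks.append(0, "home")     # offset 0 in partition 0
clicks.append(0, "pricing")  # offset 1 in partition 0
clicks.append(1, "blog")     # offset 0 in partition 1
print(clicks.read(0, 1))     # ['pricing']
```

Note that offsets are per partition: partition 0 and partition 1 each start counting from 0.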

Brokers:

The “broker” in Kafka does what a traditional message “broker” does: it holds the messages written by producers until they are read by consumers. Unlike many traditional brokers, however, it does not delete a message as soon as it has been consumed.

There are many “brokers” or “servers” inside the Kafka cluster. Each “broker” hosts one or more partitions, possibly belonging to different topics, and each partition belongs to a ‘Topic’. Brokers store the messages they receive for a configurable retention period (for example, seven days, which is the default of the broker's "log.retention.hours" setting of 168 hours); retention can also be limited by size. Once the retention period has expired, the messages are discarded. Retention does not depend on consumption: Kafka does not check whether each consumer or consumer group has read a message before discarding it.
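Time-based retention can be sketched as a filter on message age. This minimal Python model assumes a seven-day period and represents each record as a hypothetical (offset, timestamp, message) tuple. It simplifies real brokers, which delete whole log segment files rather than individual records, but it shows the key point: only age is consulted, never whether anyone has read the record.

```python
DAY = 24 * 60 * 60           # seconds in a day
RETENTION_SECONDS = 7 * DAY  # assumed period, mirroring log.retention.hours=168

def apply_retention(log, now, retention=RETENTION_SECONDS):
    """Keep only records younger than the retention period.

    Each record is (offset, timestamp_seconds, message); survivors keep their
    original offsets. Whether any consumer group has read a record is never
    consulted: age alone decides.
    """
    return [record for record in log if now - record[1] < retention]

now = 100 * DAY
log = [
    (0, now - 10 * DAY, "old event"),    # older than 7 days: discarded
    (1, now - 8 * DAY, "stale event"),   # older than 7 days: discarded
    (2, now - 1 * DAY, "recent event"),  # within 7 days: kept
]
log = apply_retention(log, now)
print([offset for offset, _, _ in log])  # [2]
```

A consumer that falls more than seven days behind in this model would simply find the older offsets gone.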

Consumer:

After “producers” have sent messages to the Kafka brokers, consumers read them. Consumers are typically organized into “consumer groups” and subscribe to one or more “topics”. The “partitions” of those topics are divided among the consumers in a group, so each partition is read by exactly one consumer of that group. Because partitions are replicated across brokers, if one broker goes down, replicas on the other brokers take over and the system keeps running.
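How partitions are divided within one consumer group can be illustrated with a simple round-robin scheme. Kafka ships several assignment strategies (range, round-robin and sticky among them); the sketch below implements only a plain round-robin in Python, and the consumer names are hypothetical. It also shows what a rebalance looks like when a consumer leaves the group.

```python
def assign_round_robin(partitions, consumers):
    """Spread a topic's partitions over the consumers of one group, round-robin.

    Each partition goes to exactly one consumer; a consumer may receive several
    partitions, and consumers beyond the partition count receive none.
    """
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {c: [] for c in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment

partitions = [0, 1, 2, 3, 4]
print(assign_round_robin(partitions, ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}

# If consumer "c2" leaves the group, a rebalance hands its partitions to "c1".
print(assign_round_robin(partitions, ["c1"]))
# {'c1': [0, 1, 2, 3, 4]}
```

Since a partition is never shared within a group, adding more consumers than partitions leaves the extra consumers idle.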

ZooKeeper:

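ZooKeeper's primary responsibility is to coordinate the different components of the Kafka cluster: it keeps track of which brokers are alive, stores cluster metadata, and is used to elect the controller broker. (Newer Kafka releases can also run without ZooKeeper, using the built-in KRaft mode instead.) For each partition, one broker acts as the “leader”. The producer hands a message to the partition's leader, which writes the message to its own log, and follower replicas on other brokers copy it from the leader.

LinkedIn, Yahoo, Twitter, Pinterest, Tumblr, Goldman Sachs and Netflix are just a few examples of organizations that have adopted Kafka in their production systems.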
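Leader-based replication and failover can be sketched with a toy model. This is a simplified illustration in Python, not Kafka's actual protocol: here followers copy the leader's log immediately, whereas real Kafka followers fetch from the leader asynchronously, and the new leader is picked by the controller from the in-sync replicas rather than simply taking the first live broker. The class names, broker IDs and messages are hypothetical.

```python
class Broker:
    """A server holding its own copy (replica) of one partition's log."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.alive = True
        self.log = []

class ReplicatedPartition:
    """Toy model of one partition replicated across several brokers."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.leader = replicas[0]  # assumption: the first replica starts as leader

    def produce(self, message):
        # The producer writes only to the leader, which appends to its own log.
        self.leader.log.append(message)
        # Live followers then bring their copies up to date. Real Kafka followers
        # fetch from the leader asynchronously; copying inline keeps the sketch short.
        for broker in self.replicas:
            if broker is not self.leader and broker.alive:
                broker.log = list(self.leader.log)

    def fail_leader(self):
        """Simulate the leader crashing and promote the first live follower."""
        self.leader.alive = False
        live = [b for b in self.replicas if b.alive]
        if not live:
            raise RuntimeError("no live replica left to lead the partition")
        self.leader = live[0]

brokers = [Broker(1), Broker(2), Broker(3)]
partition = ReplicatedPartition(brokers)
partition.produce("order-created")
partition.produce("order-paid")
partition.fail_leader()
print(partition.leader.broker_id)  # 2
print(partition.leader.log)        # ['order-created', 'order-paid']
```

Because the follower already held every message, the new leader can keep serving producers and consumers without data loss in this model.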

This post gave an overview of Kafka and then described its architecture. Kafka will no doubt be embraced by more organizations as time goes by.

For more information on Kafka, visit kafka.apache.org.


About Aditi Malhotra

Aditi Malhotra is the Content Marketing Manager at Whizlabs. Holding a Master's in Journalism and Mass Communication, she helps businesses stop playing around with content marketing and start seeing tangible ROI. A writer by day and a reader by night, she is a fine blend of reality and fantasy. Apart from her professional commitments, she is also endeavouring to publish a book of her own very soon.