Kafka 101: Essential Core Concepts for Building Resilient Event-Driven Architectures
Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. It is horizontally scalable, fault-tolerant, and fast.
The core concepts of Kafka are:
- Topics: A stream of messages belonging to a particular category is called a topic.
- Producers: Producers are the client applications that publish messages to a Kafka topic (a minimal producer sketch follows this list).
- Consumers: Consumers are the client applications that read messages from a Kafka topic.
- Brokers: A Kafka cluster is composed of multiple brokers. Each broker is a separate server process, typically running on its own machine, that stores data and serves client requests.
- Zookeeper: Apache Zookeeper is used to manage the Kafka cluster. It is responsible for maintaining the list of brokers, topics, and partitions.
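To make these pieces concrete, here is a minimal producer sketch in Java. It assumes a broker reachable at localhost:9092 and a topic named "orders"; the class name, topic, key, and value are illustrative only, not part of any particular setup.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker(s) the client contacts first; host and port are assumptions.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one message to the "orders" topic; the key influences
            // which partition the record lands in.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            producer.flush();
        }
    }
}
```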
Some limitations of Kafka include the following:
- It does not support random access to messages.
- The broker itself does not transform data; transformations require a separate layer such as Kafka Streams or an external processing framework.
- It does not have a user-friendly interface for monitoring and managing the cluster.
Some competitors of Kafka include:
- RabbitMQ
- Amazon Kinesis
- Google Cloud Pub/Sub
- Apache Flume
Some advantages of Kafka include the following:
- High throughput: Kafka can handle high throughput with low latency.
- Scalability: Kafka is horizontally scalable; it can handle increased traffic by adding more brokers to the cluster.
- Durability: Kafka persists all published messages to disk, providing durability even in the event of server failures.
- Fault tolerance: Kafka is designed to be fault-tolerant, meaning that it can continue to operate even when some of its servers fail (a topic-replication sketch follows this list).
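Durability and fault tolerance come largely from replication, which is configured per topic. Below is a small sketch using the Java AdminClient that creates a topic with a replication factor of 3, so its data survives the loss of individual brokers. The broker address, topic name, and partition count are assumptions for illustration, not values from this article.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for parallelism; replication factor 3 means each
            // partition is copied to three brokers, so the topic tolerates
            // broker failures without losing acknowledged data.
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```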
In this session, we will cover the following:
- Producer
- Consumer
- Broker
- Cluster
- Topic
- Partitions
- Offset
- Consumer groups
We will also walk through a high-level example of a Kafka use case; the consumer sketch below ties several of these concepts together.
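As a preview of how partitions, offsets, and consumer groups fit together, here is a minimal consumer sketch in Java. It assumes the same hypothetical "orders" topic and broker address as the earlier sketches; the group id and printed fields are illustrative only.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Consumers sharing the same group.id split the topic's partitions among themselves.
        props.put("group.id", "order-processors");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // The offset is the record's position within its partition.
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```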