Understanding Topics, Partitions, and
Brokers
Ryan Plant
COURSE AUTHOR
@ryan_plant blog.ryanplant.com
Basic Apache Kafka installation:
- Download the binary package
Demo - Extract the archive
- Explore the installation directory
contents
Prerequisites:
- Linux operating system
- Java 8 JDK installed
- Scala 2.11.x installed
Apache Kafka Topics
Central Kafka abstraction
Named feed or category of messages
- Producers produce to a topic
- Consumers consume from a topic
Logical entity
Physically represented as a log
to: “my_topic”
to: “my_other_topic”
Broker Broker Broker Broker
“MY_TOPIC”
“MY_OTHER_TOPIC”
from: “my_topic” from: “my_other_topic”
ordered sequence (by time)
append-only
older
immutable facts as events newer
0 1 2 3 4 5 6 7 8 9
“MY_TOPIC”
Event Sourcing
An architectural style or approach to maintaining an
application’s state by capturing all changes as a
sequence of time-ordered, immutable events.
older newer
0 1 2 3 4 5 6 7 8 9
“MY_TOPIC”
Message Content
{timestamp}
Each message has a:
- Timestamp
{id} - Referenceable identifier
- Payload (binary)
[data content]
0 1 2 3 4 5 6 7 8 9
“MY_TOPIC”
The Offset
A placeholder:
- Last read message position
- Maintained by the Kafka Consumer
- Corresponds to the message identifier
0 1 2 3 n
“MY_TOPIC”
“from beginning”
offset: 0
offset: 0
offset: 3
0 1 2 3 4 5 6 7 8 9
“MY_TOPIC”
“from last offset”
3 8
Message Retention Policy
Apache Kafka retains all published
messages regardless of consumption
Retention period is configurable
- Default is 168 hours or seven days
Retention period is defined on a per-topic
basis
Physical storage resources can constrain
message retention
Simple Kafka cluster setup
Creating an Apache Kafka topic
Producing some messages to the topic
Demo Consuming the messages from the topic
Look for:
- Built-in Producer and Consumer
clients
- The ordering of the messages within a
topic
Don’t get too caught up on:
- The command line parameters and
options
Does Look This Look Familiar?
older newer
0 1 2 3 4 5 6 7 8 9
TOPIC
Append-only
Ordered sequence (by time)
Immutable facts as events
Transaction or Commit Logs
Source of truth
Physically stored and maintained
Higher-order data structures derive from
the log
- Tables, indexes, views, etc.
Point of recovery
Basis for replication and distribution
Apache Kafka is publish-
subscribe messaging
rethought as a distributed
commit log.
Kafka Partitions
Each topic has one or more partitions
A partition is the basis for which Kafka can:
- Scale
- Become fault-tolerant
- Achieve higher levels of throughput
Partition Partition Partition Each partition is maintained on at least one
or more Brokers
(Partition == Log)
Creating a Topic: Single Partition
~$ bin/kafka-topics.sh --create --topic my_topic \
> --zookeeper localhost:2181 \
> --partitions 1 \
> --replication-factor 1
“my_topic”
Partition 0
0 1 2 3 4 5 6 7 8 n
/tmp/kafka-logs/{topic}-{partition}
“my_topic-0”
.index
Broker
.log
Each partition must fit entirely on one machine.
In general, the scalability of
Apache Kafka is determined
by the number of partitions
being managed by multiple
broker nodes.
Creating a Topic: Multiple Partitions
~$ bin/kafka-topics.sh --create --topic my_topic \
> --zookeeper localhost:2181 \
> --partitions 3 \
> --replication-factor 1
“my_topic”
Partition 0
0 1 2 3 4 5 6 7 8 9
Partition 1
0 1 2 3 4 5 6 7 8 9
Partition 2
0 1 2 3 4 5 6 7 8
older newer
$.s
--create h
1 available brokers
--topic my_topic ZK
--partitions 3 2
--rep-factor 1 assign partitions to leaders
3 status checks
I have
partition 1 of
my_topic I have
id=0 id=1 partition 2 of id=2
my_topic
I have
partition 0 of
my_topic-1 my_topic my_topic-0 my_topic-2
metadata metadata metadata
$.s
h
--broker-list {broker:2}
--topic my_topic ZK
metadata
…
id=0 id=1 id=2
my_topic-1 my_topic-0 my_topic-2
metadata metadata metadata
$.s
h
--zookeeper {..}
--topic my_topic metadata ZK
…
3 1
2
id=0 id=1 id=2
my_topic-1 my_topic-0 my_topic-2
metadata metadata metadata
Partitioning Trade-offs
The more partitions the greater the
Zookeeper overhead
- With large partition numbers ensure
proper ZK capacity
Message ordering can become complex
- Single partition for global ordering
- Consumer-handling for ordering
The more partitions the longer the leader
fail-over time
Something Is Missing
What about fault-tolerance?
- Broker failure
- Network issue
- Disk failure
Houston, we
have a
my_topic problem…
broker id: 2
partitions: 0, 2
ZK
I now have
id=0 id=1 partition 0 of id=2
my_topic
my_topic-0
my_topic-1 my_topic-0 my_topic-2
No redundancy between nodes…
Don’t Forget the Replication Factor
~$ bin/kafka-topics.sh --create --topic my_topic \
> --zookeeper localhost:2181 \
> --partitions 3 \
> --replication-factor 1
Replication Factor
Reliable work distribution
- Redundancy of messages
- Cluster resiliency
- Fault-tolerance
Guarantees
- N-1 broker failure tolerance
- 2 or 3 minimum
Configured on a per-topic basis
Multiple Replica Sets
~$ bin/kafka-topics.sh --create --topic my_topic \
> --zookeeper localhost:2181 \
> --partitions 3 \
> --replication-factor 3
$.s
--create h
--topic my_topic ZK
--partitions 3
--rep-factor 3
id=0 id=1 id=2
my_topic-1 my_topic-0 my_topic-2
metadata metadata metadata
ISR: ISR: ISR:
id=7 3 id=5 id=8 3 id=3 id=4 3 id=6
my_topic-1 my_topic-1 my_topic-0 my_topic-0 my_topic-2 my_topic-2
(replica) (replica) (replica) (replica) (replica) (replica)
$.s
--create h
--topic my_topic ZK
--partitions 3
--rep-factor 3
id=0 id=1 id=2
my_topic-1 my_topic-0 my_topic-2
metadata metadata metadata
ISR: ISR: ISR:
id=7 3 id=5 id=8 2 id=3 id=4 3 id=6
my_topic-1 my_topic-1 my_topic-0 my_topic-2 my_topic-2
(replica) (replica) (replica) (replica) (replica)
Viewing Topic State
~$ bin/kafka-topics.sh --describe --topic my_topic \
> --zookeeper localhost:2181
Multi-broker Kafka Setup
Demo Single Partition Topic
Replication Factor of 3
Look for:
- Using the --describe command
- Failure handling
- Continued operation
Detailed explanation and view:
- Topics and Partitions
- Broker partition management and
behavior
Summary Aligned with distributed systems
principles
- Leader election of partitions
- Work distribution and failover
Kafka in action
- Demos
- Configuration
Foundation upon which to dive deeper
into Producers and Consumers