Getting to Know Apache Kafka’s Architecture
Ryan Plant
COURSE AUTHOR
@ryan_plant blog.ryanplant.com
Apache Kafka as a Messaging System
Producers -> Topics -> Consumers
- Producers address each message to a topic (To: “X”, To: “Y”)
- Consumers retrieve messages by topic (Retrieve: “X”, Retrieve: “Y”)
Producers -> Broker -> Consumers
- The broker sits between producers and consumers
- Each topic has its own location on the broker: A in ~/A/…, B in ~/B/…, C in ~/C/…
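The producer, topic, broker, and consumer relationship above can be sketched as a toy in-memory model. This is an illustration only: `ToyBroker`, `send`, and `retrieve` are hypothetical names and are not part of Kafka's client API.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker (not Kafka's API)."""

    def __init__(self):
        # One append-only list of messages per topic name.
        self._topics = defaultdict(list)

    def send(self, topic, message):
        """Producer side: address a message to a named topic."""
        self._topics[topic].append(message)

    def retrieve(self, topic):
        """Consumer side: read back a copy of the messages in a named topic."""
        return list(self._topics[topic])


broker = ToyBroker()
broker.send("X", "hello")   # To: "X"
broker.send("Y", "world")   # To: "Y"
broker.send("X", "again")   # To: "X"

x_messages = broker.retrieve("X")   # Retrieve: "X"
y_messages = broker.retrieve("Y")   # Retrieve: "Y"
print(x_messages)
print(y_messages)
```

Producers and consumers never reference each other here; the topic name is the only thing they share.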
How Apache Kafka Starts to Differentiate
“A high-throughput distributed messaging system.”
Producers -> multiple Brokers -> Consumers
LinkedIn: 1,400 brokers => 2 petabytes per week
The Apache Kafka Cluster
Producers -> Cluster (a grouping of Brokers) -> Consumers
Cluster size is the number of brokers in the cluster; the diagrams show sizes 1, 2, and 4
Later…
Distributed Systems
Collection of resources that are instructed to achieve a specific goal or function
Consist of multiple workers or nodes (in Kafka, the brokers)
The system of nodes requires coordination to ensure consistency and progress towards a common goal
Nodes communicate with each other through messages
Distributed Systems: Controller Election
Work Items Attendance Status
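A minimal sketch of the controller idea, assuming the controller is chosen from the nodes that are present and then hands out work items to them. The election rule used here (lowest id among live nodes) is a stand-in chosen for illustration and is not claimed to be Kafka's actual mechanism; `elect_controller` and `assign_work` are hypothetical helper names.

```python
def elect_controller(attendance):
    """Pick a controller among the nodes marked present.

    Assumption for illustration: the lowest node id wins.
    """
    present = [node for node, is_up in attendance.items() if is_up]
    if not present:
        raise RuntimeError("no live nodes to elect a controller from")
    return min(present)


def assign_work(work_items, attendance):
    """Spread work items round-robin across the nodes that are present."""
    workers = sorted(node for node, is_up in attendance.items() if is_up)
    if not workers:
        raise RuntimeError("no live nodes to assign work to")
    return {item: workers[i % len(workers)] for i, item in enumerate(work_items)}


# Attendance: which nodes are currently available (node id -> present?).
attendance = {1: False, 2: True, 3: True}

controller = elect_controller(attendance)
assignments = assign_work(["task-a", "task-b", "task-c"], attendance)
print(controller)
print(assignments)
```

Node 1 is absent, so it is neither eligible to be controller nor given any work item.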
Distributed Systems: The Cluster
KAFKA CLUSTER
Distributed Systems: Getting Work Done
PRODUCER
Worker availability and health
Task redundancy
Distributed Systems: Getting Work Done (Reliably)
- Request: “we cannot afford loss… three replicas, please”
- Work is handed to a LEADER: “here you go…”
- A PEER that accepts becomes a FOLLOWER; one that cannot answers “not ready or able”
- Once enough followers have joined: “we have a quorum”
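The replication exchange above can be sketched as a leader recruiting ready peers until the requested number of replicas is covered. This is a toy model under stated assumptions: the leader counts as one replica, peers are asked in order, and `form_quorum` is a hypothetical function, not a Kafka API.

```python
def form_quorum(leader, peers, replication_factor):
    """Recruit ready peers as followers until the replica count is met.

    peers maps a peer name to True (ready) or False ("not ready or able").
    Assumption: the leader itself holds one of the replicas.
    """
    needed = replication_factor - 1
    followers = []
    for name, ready in peers.items():
        if len(followers) == needed:
            break  # enough followers already, stop asking
        if ready:
            followers.append(name)
    return {
        "leader": leader,
        "followers": followers,
        "quorum": len(followers) == needed,
    }


# "three replicas, please": one leader plus two followers.
peers = {"peer-1": True, "peer-2": False, "peer-3": True, "peer-4": True}
result = form_quorum("leader-1", peers, replication_factor=3)
print(result)

# With too few ready peers, no quorum is reached.
short = form_quorum("leader-1", {"peer-1": True, "peer-2": False}, replication_factor=3)
print(short["quorum"])
```

In the first call peer-2 declines and peer-4 is never asked, because two followers are already enough for three replicas.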
Sources of Work in Apache Kafka
PRODUCER CONSUMER
KAFKA CLUSTER
Distributed Systems: Communication and Consensus
Worker node membership and naming
Configuration management
Leader election
Health status
Apache ZooKeeper
Centralized service for maintaining metadata about a cluster of distributed nodes
- Configuration information
- Health status
- Group membership
Hadoop, HBase, Mesos, Solr, Redis, and Neo4j
Distributed system consisting of multiple nodes in an “ensemble”
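The three kinds of metadata listed above (configuration, health status, group membership) can be pictured with a small in-memory registry. This is only a sketch: `ToyMetadataStore` and its methods are hypothetical names, it runs in a single process, and it does not use or imitate the ZooKeeper client API.

```python
class ToyMetadataStore:
    """Hypothetical central registry for cluster metadata (not ZooKeeper's API)."""

    def __init__(self):
        self.config = {}    # configuration information
        self._health = {}   # group membership and health: node -> healthy?

    def register(self, node):
        """Add a node to the group; it starts out healthy."""
        self._health[node] = True

    def report_health(self, node, healthy):
        """Record the latest health status for a registered node."""
        if node not in self._health:
            raise KeyError(f"unknown node: {node}")
        self._health[node] = healthy

    def members(self):
        """All registered nodes, healthy or not."""
        return sorted(self._health)

    def healthy_members(self):
        """Only the nodes whose last reported status was healthy."""
        return sorted(n for n, ok in self._health.items() if ok)


store = ToyMetadataStore()
for name in ("broker-1", "broker-2", "broker-3"):
    store.register(name)
store.config["replication_factor"] = 3   # illustrative setting name
store.report_health("broker-2", False)

print(store.members())
print(store.healthy_members())
```

Keeping this metadata in one place lets every node ask the same question and get the same answer about who is in the cluster and who is healthy.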
Apache Kafka’s Distributed Architecture
APACHE ZOOKEEPER
PRODUCER CONSUMER
KAFKA CLUSTER
Summary
Apache Kafka is a Pub-Sub messaging system, consisting of:
- Producers and Consumers
- Brokers within a Cluster
Characteristics of distributed systems
- Worker node roles: Controllers, Leaders, and Followers
- Reliability through replication
- Consensus-based communication
Role of Apache ZooKeeper