Kafka Architectures Notes

The document provides an overview of the Apache Kafka architecture. It discusses the four core APIs: the Producer API, Consumer API, Streams API, and Connector API. It then describes the key components of the Kafka cluster architecture including brokers, consumers, producers, ZooKeeper, topics, partitions, and replication factors. The document aims to explain the fundamental concepts underlying how Apache Kafka is designed to reliably process streaming data at large scales.

Uploaded by

skhanshaikh3


Apache Kafka Notes

Table Of Contents
• Introduction and Trends
• Kafka Architecture
• Kafka Cluster Architecture
• Fundamental Concepts of Kafka Architecture
  o Kafka Topics
  o Partitions in Kafka
  o Topic Replication Factor in Kafka
  o Consumer Group
• Advantages of Kafka Architecture
• Disadvantages of Kafka Architecture
• Conclusion
• Additional Resources

The four key APIs of the Apache Kafka distributed streaming platform are the
Producer API, Consumer API, Streams API, and Connector API. In addition to
offering redundant storage of massive data volumes, Kafka acts as a message
bus capable of throughput reaching millions of messages per second, and it
can process streaming data from real-time applications. In this Kafka
tutorial, we'll discuss the Kafka architecture and its APIs. We will also
learn about Kafka brokers, consumers, producers, and ZooKeeper, and get to
know some fundamental Kafka concepts.

Let’s get started with the Apache Kafka architecture.

Introduction and Trends

Through its Producer, Consumer, Streams, and Connector APIs, Apache Kafka
provides a few key services: persistent, redundant storage of massive data
volumes; a message bus capable of throughput reaching millions of messages
every second; and parallel processing of huge amounts of streaming data. It
is designed for processing streaming data from real-time applications at
rates of millions of messages every second.

Applications work with Kafka through the Producer API, Consumer API, Streams
API, and Connect API, while the Kafka cluster architecture is made up of
Brokers, Consumers, Producers, and ZooKeeper.

Apache Kafka's architecture is deliberately simple: it avoids the Kafkaesque
complexities that often accompany messaging architectures. The intent of the
architecture is to offer an easier-to-understand method of application
messaging than most of the alternatives. Kafka is known for being a
fault-tolerant, distributed, scalable log with a very simple data design.

Kafka's log is a persistent, ordered data structure. Once written, a record
cannot be modified or individually deleted; new records can only be appended
to the log. The Kafka cluster keeps track of the order of records within each
partition of the log. Messages are stored in the order they are received,
which preserves the order and integrity of the record structure. Every record
in a partition is assigned a unique sequential ID known as an offset, which
is used to retrieve data. This gives the log a well-defined start and end
point.
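
The append-only behaviour described above can be sketched in a few lines of Python. This is only a toy model for one partition, not Kafka's storage code: the PartitionLog class and its method names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionLog:
    """Toy append-only log: records are only appended, never edited in place."""
    _records: List[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Store the record at the end of the log and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[bytes]:
        """Return up to max_records records starting at the given offset."""
        if offset < 0 or offset > len(self._records):
            raise IndexError(f"offset {offset} out of range")
        return self._records[offset:offset + max_records]

    @property
    def end_offset(self) -> int:
        """Offset that the next appended record will receive."""
        return len(self._records)


log = PartitionLog()
offsets = [log.append(v) for v in (b"a", b"b", b"c")]
```

Each append returns the next sequential offset, and a reader asks for records by offset rather than by content.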

By providing a fixed ordering and deterministic processing, Kafka addresses
typical issues with distributed systems. Because Kafka keeps message data on
disk in order, it benefits from sequential disk reads and writes. Disk seeks
are costly, and Kafka's sequential access pattern avoids most of them, which
reduces resource waste. In addition, reads and writes can proceed at the same
time without getting in each other's way, which also makes for efficient use
of resources.

Kafka is also built to scale horizontally: load can be spread across more
brokers and partitions, so throughput can grow while each partition keeps a
similar level of performance.

Kafka Architecture

The Producer API, the Consumer API, the Streams API, and the Connector API
are the four core APIs in the Apache Kafka architecture. We will discuss them
one by one:

• Producer API: An application can publish a stream of records to a
  Kafka topic using the Producer API.
• Consumer API: An application can subscribe to one or more topics and
  process the stream of records produced to them using this API.
• Streams API: The Streams API permits an application to act as a stream
  processor, consuming an input stream from one or more topics and
  producing an output stream to one or more output topics, effectively
  transforming the input streams into output streams.
• Connector API: The Connector API is used to connect Kafka topics to
  existing applications or data systems. For example, a connector to a
  relational database might capture every change to a table.
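
To make the three application-facing roles concrete, here is a minimal in-memory sketch in Python. It does not use any Kafka client library: the topics dictionary and the produce, consume, and stream functions are hypothetical stand-ins that only mirror the roles of the Producer, Consumer, and Streams APIs.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a cluster: topic name -> list of records.
topics: Dict[str, List[str]] = defaultdict(list)


def produce(topic: str, value: str) -> None:
    """Producer role: publish one record to a topic."""
    topics[topic].append(value)


def consume(topic: str, start: int = 0) -> List[str]:
    """Consumer role: read the records of a topic from a start position."""
    return topics[topic][start:]


def stream(src: str, dst: str, transform: Callable[[str], str]) -> None:
    """Stream-processor role: read src, transform each record, write to dst."""
    for value in consume(src):
        produce(dst, transform(value))


produce("orders", "apple")
produce("orders", "pear")
stream("orders", "orders-upper", str.upper)
result = consume("orders-upper")
```

The input topic is left untouched; the stream step only adds records to the output topic.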
Kafka Cluster Architecture

The diagram provides information about the cluster structure of Apache Kafka:

The structure diagram above describes the main components of the Kafka
architecture, including:

Kafka Broker: A Kafka cluster typically consists of multiple brokers. The
brokers are stateless, so the stateful task of maintaining the cluster state
is handed to ZooKeeper. One Kafka broker instance can handle hundreds of
thousands of reads and writes per second.

A broker can handle tens of millions of messages without performance impact.
Kafka broker leader election is performed through ZooKeeper.

Kafka – ZooKeeper: Kafka brokers use ZooKeeper for management and
coordination. ZooKeeper also notifies producers and consumers about the
presence of any new broker or the failure of an existing broker.

As soon as ZooKeeper sends the notification regarding a broker's presence or
failure, producers and consumers act on it and begin coordinating their work
with another broker.

Kafka Producers: Producers push data to Kafka brokers. When a new broker
starts, all producers discover it automatically and begin sending messages
to it.

Depending on its acknowledgment setting (acks), a Kafka producer does not
have to wait for acknowledgments from the broker; in that mode it sends
messages as fast as the broker can handle them.

Kafka Consumers: Because Kafka brokers are stateless, the consumer keeps
track of how many messages it has consumed by maintaining the partition
offset. When a consumer acknowledges a particular message offset, it implies
that the consumer has consumed all prior messages.

The consumer issues a pull request to the broker so that a buffer of bytes is
ready to consume. Consumers can rewind or skip to any point in a partition by
supplying an offset value; for example, supplying 5 moves the consumer to
offset 5. In older Kafka versions, ZooKeeper kept the consumer offset values;
newer versions store committed offsets in an internal Kafka topic.
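
The offset bookkeeping described above can be sketched as follows. This is a toy model under stated assumptions, not a real Kafka client: the ToyConsumer class is hypothetical, and its poll, commit, and seek methods only loosely mirror what a consumer does with a single partition.

```python
from typing import List


class ToyConsumer:
    """Tracks its own position in one partition, as described above."""

    def __init__(self, partition: List[str]) -> None:
        self._partition = partition
        self.position = 0      # next offset to read
        self.committed = 0     # everything before this offset is acknowledged

    def poll(self, max_records: int = 2) -> List[str]:
        """Pull the next batch and advance the position."""
        batch = self._partition[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self) -> None:
        """Acknowledge all records before the current position."""
        self.committed = self.position

    def seek(self, offset: int) -> None:
        """Rewind or skip to any valid offset in the partition."""
        if not 0 <= offset <= len(self._partition):
            raise ValueError(f"offset {offset} out of range")
        self.position = offset


consumer = ToyConsumer(["m0", "m1", "m2", "m3", "m4", "m5", "m6"])
first = consumer.poll()        # reads offsets 0 and 1
consumer.commit()              # acknowledges everything before offset 2
consumer.seek(5)               # skip ahead to offset 5
skipped = consumer.poll()      # reads offsets 5 and 6
consumer.seek(0)               # rewind to the beginning
replayed = consumer.poll()     # reads offsets 0 and 1 again
```

Committing a single offset is enough to mark every earlier record as consumed, which is why the consumer only needs to remember one number per partition.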

Fundamental Concepts of Kafka Architecture

We have listed some of the fundamental concepts of Kafka Architecture that
you must understand.

Kafka Topics

A topic is a logical channel to which producers publish messages and from
which consumers receive messages.

1. A topic defines a stream of data of a particular type/classification
   in Kafka.
2. Messages are organised by topic: a particular type of message is
   published on a particular topic.
3. A producer first writes its messages to a topic. Consumers then read
   those messages from the topic.
4. Within a Kafka cluster, a topic is identified by its name, which must
   be unique.
5. There can be any number of topics; there is no fixed limit.
6. Data cannot be changed or updated once it has been published.

The image provides information about the relationship between Kafka Topics
and partitions:

Partitions in Kafka

Topics in a Kafka cluster are divided into partitions, which are also
replicated across brokers.

1. When a message is published without a key, there is no guarantee about
   which partition it will be written to.
2. If a producer publishes a message with a specific key, Kafka ensures
   that all messages with the same key are delivered to the same
   partition. This feature preserves message ordering per key. Without a
   key, the partition is chosen by the producer, for example at random or
   in a round-robin fashion.
3. Within one partition, messages are stored in sequence.
4. In a partition, each message is assigned an incremental id, also known
   as the offset.
5. Offsets are only meaningful within a partition; comparing offset
   values across partitions is meaningless.
6. There is no fixed limit to the number of partitions.
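
The key-to-partition rule in point 2 can be illustrated with a small Python sketch. The hash used here (zlib.crc32) is a stand-in chosen for this example and is not the hash function Kafka's own partitioner uses; the choose_partition function and the partition count of 3 are likewise assumptions.

```python
import zlib
from itertools import count
from typing import Optional

NUM_PARTITIONS = 3
_round_robin = count()  # used only for records without a key


def choose_partition(key: Optional[bytes], num_partitions: int = NUM_PARTITIONS) -> int:
    """Same key -> same partition; no key -> rotate through the partitions."""
    if key is None:
        return next(_round_robin) % num_partitions
    # crc32 is a stand-in hash for this sketch, not Kafka's own hash function.
    return zlib.crc32(key) % num_partitions


keyed = [choose_partition(b"user-42") for _ in range(3)]
unkeyed = [choose_partition(None) for _ in range(3)]
```

All three keyed records land in one partition, so their relative order is preserved, while the unkeyed records are spread over the partitions.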

Topic Replication Factor in Kafka

When designing a Kafka system, it's important to include topic replication
in the design. When a broker goes down, replicas of its topic partitions on
other brokers can take over. Suppose we have 3 brokers and 3 topics. Broker1
holds Topic 1, Partition 0, and Broker2 holds its replica, and so on and so
forth. The topic has a replication factor of 2, which means there is one
additional copy other than the primary copy. The image is below:

Some important points:

• Replication takes place at the partition level only.
• For a given partition, only one broker can be the leader at a time.
  Meanwhile, the other brokers maintain in-sync replicas.
• The replication factor cannot be greater than the number of available
  brokers.
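
A rough sketch of these rules in Python, assuming a simple round-robin placement (real clusters may place replicas differently). The assign_replicas and elect_leader functions and the broker names are hypothetical.

```python
from typing import Dict, List


def assign_replicas(brokers: List[str], num_partitions: int,
                    replication_factor: int) -> Dict[int, List[str]]:
    """Place each partition's replicas on distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def elect_leader(replicas: List[str], failed: set) -> str:
    """Pick the first replica that is still alive as the partition leader."""
    for broker in replicas:
        if broker not in failed:
            return broker
    raise RuntimeError("no live replica for this partition")


layout = assign_replicas(["broker1", "broker2", "broker3"], 3, 2)
leader_before = elect_leader(layout[0], failed=set())
leader_after = elect_leader(layout[0], failed={"broker1"})
```

With a replication factor of 2, partition 0 lives on broker1 and broker2, so when broker1 fails the copy on broker2 becomes the leader.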

Consumer Group

• There can be multiple consumer processes running.
• Every consumer group has its own unique group-id.
• Within one consumer group, each partition is read by exactly one
  consumer instance at a time.
• Because there can be more than one consumer group, multiple consumer
  groups can read from the same partition.
• If the number of consumers exceeds the number of partitions, some
  consumers will be inactive. For example, if there are 8 consumers and
  6 partitions in a single consumer group, 2 consumers are inactive.
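
The 8-consumers, 6-partitions example can be checked with a short Python sketch. It assumes a plain round-robin assignment inside one group, which is only one possible strategy; the assign_partitions function and the consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(consumers: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Give every partition to exactly one consumer of the group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


group = [f"consumer-{n}" for n in range(8)]
assignment = assign_partitions(group, list(range(6)))
idle = [c for c, owned in assignment.items() if not owned]
```

Here idle contains the two consumers that received no partition, matching the 2 inactive consumers mentioned above.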

Advantages of Kafka Architecture

Apache Kafka has the following features that make it worthwhile:

1. Apache Kafka offers low latency, i.e., up to 10 milliseconds. Because
   Kafka decouples producers from consumers, a consumer can consume a
   message at any time.
2. Because of its low latency, Kafka can handle a high volume of messages
   per second. Businesses such as Uber use Kafka to handle large volumes
   of data.
3. Kafka is able to survive a node or machine failure within the cluster.
4. Kafka replicates data, so messages persist across the cluster in
   addition to being written to disk. This makes Kafka durable.
5. All the data a producer writes goes through Kafka. Therefore, we only
   need to create one integration with Kafka, which automatically
   integrates us with every producing and consuming system.
6. Because data is stored in Kafka, anyone who has access to it can
   easily read it.
7. Kafka has a distributed architecture, which makes it scalable.
   Partitioning and replication are the two capabilities that the
   distributed design provides.
8. With Apache Kafka, you can build and run a real-time data pipeline.
   Such a pipeline also needs processors, analytics, storage, and so on.
9. Kafka supports batch-like use cases and can also function as an ETL
   tool due to its data persistence abilities.
10. A scalable software product is one that can handle large numbers of
    messages simultaneously. Kafka is such a product.

Disadvantages of Kafka Architecture

There are certain restrictions/disadvantages to Apache Kafka.

1. Apache Kafka does not come with a complete set of monitoring and
   management tools. Because of this, new ventures or enterprises may
   hesitate to use Kafka.
2. The Kafka broker uses system calls to deliver messages to consumers.
   If a message needs to be modified on the way, Kafka's performance is
   reduced. So, Kafka works best when messages do not need to change.
3. Apache Kafka does not offer wildcard topic selection in the usual
   sense; topics are normally matched by their exact names (although the
   Java consumer can subscribe using a regular-expression pattern). This
   makes certain use cases harder to address.
4. Brokers and consumers compress and decompress the data flow, which
   reduces performance as well as throughput.
5. When the number of queues in the Kafka cluster increases, Apache Kafka
   can behave a bit clumsily.
6. Some messaging paradigms, such as point-to-point queues and
   request/reply, are missing from Kafka for certain use cases.

Conclusion

In this post we discussed Kafka's architecture and its APIs. We also saw the
Kafka components and basic concepts, along with brief descriptions of Kafka's
brokers, consumers, and producers. If you want to know more about Kafka's
architecture, please read the official documentation.
