
Design a Distributed Messaging Queue
(like RabbitMQ or Amazon SQS)

A Deep Dive for System Design Interviews

This is a very popular question. Designing a distributed queue is actually quite
simple once you understand the basics.

Let’s take a look at the requirements of this queue:

Requirements:

1. High throughput
2. As much persistence as possible without compromising high throughput
3. Support for multiple producers and consumers
4. Once an item is consumed, it is deleted - each item is delivered to exactly one consumer

Let’s say we have a single-machine implementation.

We can add and remove items from a created queue.

We can create multiple queues and add and remove items from each one of them by
specifying the queue id or the queue name.
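As a rough illustration, here is a minimal single-machine sketch in Java. All the names (InMemoryQueueManager and so on) are hypothetical, not a real library's API - it simply keeps named queues in memory, each backed by a thread-safe FIFO:

import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal single-machine sketch: named queues with enqueue/dequeue.
// All class and method names here are illustrative, not a real API.
public class InMemoryQueueManager {
    private final Map<String, Queue<String>> queues = new ConcurrentHashMap<>();

    public void createQueue(String name) {
        queues.putIfAbsent(name, new ConcurrentLinkedQueue<>());
    }

    public void enqueue(String name, String item) {
        queues.get(name).add(item);
    }

    // poll() removes the item, so each item is handed to exactly one
    // caller (requirement 4). Returns null when the queue is empty.
    public String dequeue(String name) {
        return queues.get(name).poll();
    }

    public static void main(String[] args) {
        InMemoryQueueManager mgr = new InMemoryQueueManager();
        mgr.createQueue("tasks");
        mgr.enqueue("tasks", "task-1");
        System.out.println(mgr.dequeue("tasks")); // prints task-1
    }
}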



Using this as a building block, we can figure out how to do this on multiple
machines.

High Throughput Distributed Queue Scenario

Let’s now say we have 3 machines and one queue to create and manage. We have a
large volume of data, so we want this queue to be sharded. How can we manage this?
We have a lot of producers writing to this queue and a lot of consumers pulling from
the queue. How do we handle the high throughput?

Well, the logical thing to do is to partition the queue into 3 separate queues, one
on each machine. Each producer can write to any partition, and consumers can
consume from any partition.

For huge workloads (think Facebook scale), you can partition this queue over 100
machines and achieve a lot of scale.



There is one problem though - this implementation will not be strictly FIFO for the
entire queue. It will only be FIFO within each partition. For most use cases, this is
ok. For example, if we’re using it as a task queue or as a notifications queue, it is ok if
two tasks are out-of-order within a small time frame.



We should try to keep all partitions evenly occupied. That will be ideal for the
queue’s performance because we don’t want to overload one partition.

With X machines, our throughput should be X times the throughput of a single machine.

How to keep the Distributed Queue evenly balanced?

Keeping the queue balanced means writing and reading evenly from all the machines.
If one machine gets starved of items, then its readers will be starved as well.



In the Queue above, readers of Machines B and C will soon get starved, because
Machine A is getting a higher rate of writes.

The real challenge is writing and reading evenly across all partitions. So we
need some sort of balancing approach, or a coordinator.

Coordinating between the Queue partitions

Let’s see how to coordinate a queue between 3 machines.

Let’s say we are creating a Queue Q1. We create 3 partitions - P1, P2 and P3 - each on
a different machine. We will need some sort of Queue Manager that keeps track of
the partitions.

Let’s say that we have such a program running on one of the three machines:



Try to figure this out yourself before reading our solution.

When a producer needs to write to the Queue, it asks the Queue Manager for the IP
address of a machine it can write to. Let’s say the Queue Manager returns the IP
address of Machine C (where P3 is located). The Producer can now establish a
persistent TCP/IP connection and start writing to P3.
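Here is a sketch of that handshake, under the assumption of a hypothetical client API (none of these names come from RabbitMQ or SQS):

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative handshake (all names hypothetical): the producer asks the
// Queue Manager for a partition address, then connects straight to that
// partition machine, keeping the manager off the data path.
public class DirectWriteProducer {

    static class PartitionAddress {
        final String host;
        final int port;
        PartitionAddress(String host, int port) { this.host = host; this.port = port; }
    }

    // Stand-in for an RPC to the Queue Manager; returns a fixed address here.
    static PartitionAddress requestPartition(String queueName) {
        return new PartitionAddress("machine-c.internal", 9000);
    }

    public static void main(String[] args) throws Exception {
        // Step 1: ask the manager which partition to write to (say P3 on Machine C).
        PartitionAddress addr = requestPartition("Q1");

        // Step 2: open a persistent connection to the partition machine and write.
        try (Socket socket = new Socket(addr.host, addr.port);
             OutputStream out = socket.getOutputStream()) {
            out.write("{id:1, data:{...}}\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}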



Now you may ask, why is the producer connecting directly with the Partition
Machine?
Why doesn’t it just pass along the message to the Queue Manager, and the Queue
Manager can send it to a partition, like this:

This would make things simple - just hand the message off to the Queue Manager, and
the Queue Manager can take care of the rest. The problem with this approach is that
the Queue Manager becomes a bottleneck. All of the data has to go through it, and
there's only so much throughput this one machine can handle.



As the number of producers increases, this might slow down the entire queue. You
would then have to add more queue managers to handle more writes. Instead of all
this, the general pattern is to let producers connect directly to the partition
machine. This way the writes are decentralized, which makes the system more
horizontally scalable. This is also the pattern used in most distributed file
systems for writing data - Google File System, for example, uses this exact pattern.

So now we’ve established that the producers will directly connect to the partition
machine and write the data.

So far, we had one producer. Now, let's say one more producer wants to write to the
Queue. The Queue Manager can send this producer to a different machine, so that
the previous partition is not hogged and the other partitions are also written to.

This way, the Queue Manager keeps sending new producers to different partitions,
distributing the throughput across partitions.



As you can see, we also have heartbeat messages flowing between the Queue Manager
and each partition, regularly reporting the partition's health to the Queue
Manager. If a partition goes down or becomes overloaded relative to the other
partitions, the Queue Manager can assign fewer producers to it.
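A minimal sketch of this health tracking, assuming the manager simply pings each partition on a timer (all names hypothetical):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative health tracking: the Queue Manager pings each partition
// machine on a timer and marks unresponsive ones as unhealthy, so new
// producers are not assigned to them.
public class PartitionHealthTracker {
    private final Map<String, Boolean> healthy = new ConcurrentHashMap<>();

    public void track(List<String> partitionHosts) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Send a heartbeat to every partition every 5 seconds.
        scheduler.scheduleAtFixedRate(() -> {
            for (String host : partitionHosts) {
                healthy.put(host, ping(host));
            }
        }, 0, 5, TimeUnit.SECONDS);
    }

    // The Queue Manager consults this before assigning a producer to a partition.
    public boolean isHealthy(String host) {
        return healthy.getOrDefault(host, false);
    }

    private boolean ping(String host) {
        // Stub: a real implementation would send a heartbeat RPC with a timeout.
        return true;
    }
}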

The above setup has a flaw though - can you spot it? Spend some time thinking
about it.

Here it is: this model of assigning a single partition per producer works well if the
producers are homogeneous and greater in number than the queue partitions. That
way, each partition will get a similar amount of load, and we don't need to do any
load balancing.

For load that doesn't have a lot of bursts, and which comes from many producer
servers, this will work well. But if there is even one "power producer" generating
data at a high rate, one partition will be hogged.

This model breaks down as soon as we have different producers producing different
quantities of data.



As we can see above, the power producer fills up one partition more quickly. When
consumers read the queue randomly, the filled-up partition's items are at a
disadvantage, because they will be read much later - after the items in the other,
slower partitions have been read. This weakens the FIFO properties of the queue.

What can we do to solve this? Well, one solution is for each producer to write to
multiple partitions. Each producer can do a round-robin write to different partitions.



When the producer connects, the Queue Manager can give it the IPs of all three
partitions. The producer can then connect to all three machines and write to them
in a round-robin fashion - once to Machine A, then to Machine B, then to C, and
so on.
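A sketch of that round-robin write path, again with hypothetical names (PartitionConnection stands in for a persistent TCP connection handed out by the Queue Manager):

import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative round-robin producer: each write goes to the next
// partition in turn, spreading load evenly even for a "power producer".
public class RoundRobinProducer {
    private final List<PartitionConnection> partitions;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinProducer(List<PartitionConnection> partitions) {
        this.partitions = partitions;
    }

    public void enqueue(String message) {
        // Pick partitions in order: A, B, C, A, B, C, ...
        int index = (int) (counter.getAndIncrement() % partitions.size());
        partitions.get(index).send(message);
    }
}

interface PartitionConnection {
    void send(String message);
}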



Now, you might ask, if I am writing code on the producer, do I need to write a loop
to pick one partition and write, then pick another partition, etc.? No, that will be
handled by the Queue’s client library.

For example, RabbitMQ has client libraries that the producer machine installs.
Let's say the producer is a web server that needs to write lots of JSON objects
to the queue. The web server will install your Queue's (MyQueue) client library.
This library will have functions to connect to the queue and write to it. For
example:

Connection queueConnect = MyQueueManager.get("My JSON Queue");

queueConnect.enqueue("{id:33435, data: {...}}");

The enqueue function and the MyQueueManager library should handle the round-robin
writes to different partitions. All of this is abstracted away from the end user,
i.e. the producer.

So now we have seen two approaches for writing to the Queue - assign one partition
to a producer, or assign multiple partitions to the producer. In practice, different
situations will make different approaches more suitable.

If we have 1000 producers and only 3 machines, then it might make sense to assign
one partition per producer, because evenly spreading so many producers might result
in good load balancing anyway.

In our library, we can expose a configurable property for this. The developer can
set it according to their situation and customize the load balancing.
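For instance, the client library could expose the strategy as a configuration option. This is purely a hypothetical API, continuing the MyQueue example from above:

// Hypothetical configuration: the developer picks the partition
// assignment strategy that fits their workload.
MyQueueConfig config = new MyQueueConfig()
    .queueName("My JSON Queue")
    .partitionStrategy(PartitionStrategy.ROUND_ROBIN); // or SINGLE_PARTITION

Connection queueConnect = MyQueueManager.get(config);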

How do we make sure the system is fault tolerant?

From the discussion of the non-distributed version, we know that until the data is
flushed to disk, it is susceptible to being lost if the process crashes or the
machine goes down.



This window is usually small - the length of the flush interval, typically under a
second. However, it is still a major cause for concern, because in a messaging
queue we want zero data loss. If a task sent to the queue is lost, it is simply
gone. This is not acceptable in most use cases. Imagine sending a message to a
friend, and that message disappearing into thin air without you ever knowing or
being informed. That is what will happen if our queue loses data.

Now, if a hard drive fails, we have suddenly lost the entire contents of the queue.



This is not desirable at all. If the drive fails, none of the data is recoverable.
Assuming the process also crashes when the drive fails, the entire queue is lost,
including the data in RAM. How do we save ourselves from this horrible fate?

We turn to the time-tested method of replication. That is the only way to safeguard
against total machine failure.

This means that we need to replicate each partition into multiple machines. Here is
what that would look like:



Each partition has a leader machine. All reads and writes to that partition go to
that leader machine. Note that in a Queue, a "read" is also effectively a write,
because the item has to be dequeued. So we cannot take advantage of replication to
increase read throughput as we do in replicated databases, where read replicas
increase our read throughput. This is different in a streaming platform like Kafka,
where a read does not delete the item.

Let's look at how exactly the replication happens. As we can see, there is a leader
machine for each partition. The producer will connect to the leader machine and
send data. The leader will write the data to its partition and to all its
synchronous replicas. Synchronous replicas are those that are updated synchronously
with the leader replica. Let's say we have 1 synchronous replica. This ensures that
the write lands on 2 machines right away.



Before we send the producer an acknowledgement that the write is done, the data is
written on 2 machines. This might take longer than writing to one machine, but it
makes data loss very unlikely.
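Here is a sketch of the leader's write path under these assumptions (hypothetical names; real systems implement this with much more care around failures and timeouts):

import java.util.List;

// Illustrative leader write path: persist locally, replicate to all
// synchronous replicas, and only then acknowledge the producer.
// Replica is a hypothetical stand-in for a replica connection.
public class PartitionLeader {
    private final List<Replica> syncReplicas;
    private final List<Replica> asyncReplicas;

    public PartitionLeader(List<Replica> syncReplicas, List<Replica> asyncReplicas) {
        this.syncReplicas = syncReplicas;
        this.asyncReplicas = asyncReplicas;
    }

    public void write(String item) {
        appendToLocalLog(item);
        // Block until every synchronous replica confirms the write,
        // so we never ack a write that exists on only one machine.
        for (Replica r : syncReplicas) {
            r.replicate(item);
        }
        ackProducer(item);
        // Asynchronous replicas are updated after the ack, off the hot path.
        for (Replica r : asyncReplicas) {
            r.replicateAsync(item);
        }
    }

    private void appendToLocalLog(String item) { /* write to the local partition */ }
    private void ackProducer(String item) { /* send ack over the producer connection */ }
}

interface Replica {
    void replicate(String item);       // synchronous: returns after the replica persists
    void replicateAsync(String item);  // asynchronous: fire-and-forget
}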

In order to lose data, now 2 machines have to go down at the same time, which is
much less likely. To make this even less likely, the administrator can also do things
like ensuring the two machines are connected to different power supplies or different
network routers - so that they don’t have a common cause of failure.

We can also have normal, asynchronous replicas, which are synchronized after the
leader replica returns an ack to the producer. These replicas are better for
throughput, because the write is propagated asynchronously. This is what the
replication looks like after adding the asynchronous replicas.

As we can see, we have a replication factor of 3 here - 2 synchronous and 1
asynchronous replica. You can adjust this as per the data's needs. For example, if
the data is not too critical, like user behavior logs, it's probably ok to lose
small amounts of data once in a while, so you can use 1 synchronous replica and 1
async replica. It's up to the developer to adjust this according to business needs
and costs.
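Continuing the hypothetical MyQueue configuration API from earlier, the replication settings could be exposed the same way:

// Hypothetical replication settings: non-critical logs trade a little
// durability for throughput; critical tasks wait on more synchronous copies.
MyQueueConfig logsConfig = new MyQueueConfig()
    .queueName("user-behavior-logs")
    .syncReplicas(1)
    .asyncReplicas(1);

MyQueueConfig tasksConfig = new MyQueueConfig()
    .queueName("critical-tasks")
    .syncReplicas(2)
    .asyncReplicas(1);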

After adding fault tolerance, this queueing system can scale to many machines. We
can auto-scale the queue and add more partitions as load increases. To add another
partition on a new machine, we simply create the new partition and point producers
to it.

Get more such content: harsh.fyi/subscribe
