0% found this document useful (0 votes)
16 views77 pages

CAP Theorem Lect 2

The CAP theorem outlines the trade-offs in distributed systems, stating that a system can only guarantee two of the three properties: Consistency, Availability, and Partition Tolerance. It emphasizes the importance of understanding these trade-offs for designing distributed databases, as misinterpretation can lead to poor design choices. The document also discusses variations of consistency models, examples of eventual consistency in real-world applications, and introduces the PACELC theorem for further understanding of trade-offs in distributed systems.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views77 pages

CAP Theorem Lect 2

The CAP theorem outlines the trade-offs in distributed systems, stating that a system can only guarantee two of the three properties: Consistency, Availability, and Partition Tolerance. It emphasizes the importance of understanding these trade-offs for designing distributed databases, as misinterpretation can lead to poor design choices. The document also discusses variations of consistency models, examples of eventual consistency in real-world applications, and introduces the PACELC theorem for further understanding of trade-offs in distributed systems.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 77

CAP Theorem

CAP Theorem
• Consistency:
– All nodes should see the same data at the same time
• Availability:
– Node failures do not prevent survivors from continuing to
operate
• Partition-tolerance:
– The system continues to operate despite network
partitions
• A distributed system can satisfy any two of these
guarantees at the same time but not all three
CAP Theorem

C A

P
CAP Theorem
• A simple example:
Hotel Booking: are we double-booking the
same room?

Bob Dong
CAP Theorem
• A simple example:
Hotel Booking: are we double-booking the
same room?

Bob Dong
CAP Theorem
• A simple example:
Hotel Booking: are we double-booking the
same room?

Bob Dong
CAP Theorem: Proof
• A simple proof using two nodes:

A B
CAP Theorem: Proof
• A simple proof using two nodes:
Not Consistent!

A B

Respond to client
CAP Theorem: Proof
• A simple proof using two nodes:
Not Available!

A B

Wait to be updated
CAP Theorem: Proof
• A simple proof using two nodes:
Not Partition
Tolerant!
A B

A gets updated from B


Why this is important?
• The future of databases is distributed (Big
Data Trend, etc.)
• CAP theorem describes the trade-offs involved
in distributed systems
• A proper understanding of CAP theorem is
essential to making decisions about the future
of distributed database design
• Misunderstanding can lead to erroneous or
inappropriate design choices
Problem for Relational Database to Scale
• The Relational Database is built on the
principle of ACID (Atomicity, Consistency,
Isolation, Durability)
• It implies that a truly distributed relational
database should have availability, consistency
and partition tolerance.
• Which unfortunately is impossible …
Revisit CAP Theorem
• Of the following three
guarantees potentially offered
a by distributed systems:
• Consistency
• Availability
• Partition tolerance C A
• Pick two
• This suggests there are three P
kinds of distributed systems:
• CP
• AP Any problems?
• CA
A popular misconception: 2 out 3
• How about CA?
• Can a distributed system
(with unreliable network)
really be not tolerant of
C A
partitions?
Consistency or Availability
• Consistency and Availability is
not “binary” decision
• AP systems relax consistency
in favor of availability – but
are not inconsistent C A
• CP systems sacrifice
availability for consistency-
but are not unavailable
P
• This suggests both AP and CP
systems can offer a degree of
consistency, and availability, as
well as partition tolerance
AP: Best Effort Consistency
• Example:
– Web Caching
– DNS
• Trait:
– Optimistic
– Expiration/Time-to-live
– Conflict resolution
CP: Best Effort Availability
• Example:
– Majority protocols
– Distributed Locking (Google Chubby Lock service)
• Trait:
– Pessimistic locking
– Make minority partition unavailable
Types of Consistency
• Strong Consistency
– After the update completes, any subsequent access will
return the same updated value.
• Weak Consistency
– It is not guaranteed that subsequent accesses will return
the updated value.
• Eventual Consistency
– Specific form of weak consistency
– It is guaranteed that if no new updates are made to
object, eventually all accesses will return the last
updated value (e.g., propagate updates to replicas in a
lazy fashion)
Eventual Consistency Variations
• Causal consistency
– Processes that have causal relationship will see
consistent data
• Read-your-write consistency
– A process always accesses the data item after it’s
update operation and never sees an older value
• Session consistency
– As long as session exists, system guarantees read-
your-write consistency
– Guarantees do not overlap sessions
Eventual Consistency Variations
• Monotonic read consistency
– If a process has seen a particular value of data item,
any subsequent processes will never return any
previous values
• Monotonic write consistency
– The system guarantees to serialize the writes by the
same process
• In practice
– A number of these properties can be combined
– Monotonic reads and read-your-writes are most
desirable
Eventual Consistency
- A Facebook Example
• Bob finds an interesting story and shares with
Alice by posting on her Facebook wall
• Bob asks Alice to check it out
• Alice logs in her account, checks her Facebook
wall but finds:
- Nothing is there!

?
Eventual Consistency
- A Facebook Example
• Bob tells Alice to wait a bit and check out later
• Alice waits for a minute or so and checks back:
- She finds the story Bob shared with her!
Eventual Consistency
- A Facebook Example
• Reason: it is possible because Facebook uses
an eventual consistent model
• Why Facebook chooses eventual consistent
model over the strong consistent one?
– Facebook has more than 1 billion active users
– It is non-trivial to efficiently and reliably store the
huge amount of data generated at any given time
– Eventual consistent model offers the option to
reduce the load and improve availability
Eventual Consistency
- A Dropbox Example
• Dropbox enabled immediate consistency via
synchronization in many cases.
• However, what happens in case of a network
partition?
Eventual Consistency
- A Dropbox Example
• Let’s do a simple experiment here:
– Open a file in your drop box
– Disable your network connection (e.g., WiFi, 4G)
– Try to edit the file in the drop box: can you do
that?
– Re-enable your network connection: what
happens to your dropbox folder?
Eventual Consistency
- A Dropbox Example
• Dropbox embraces eventual consistency:
– Immediate consistency is impossible in case of a
network partition
– Users will feel bad if their word documents freeze
each time they hit Ctrl+S , simply due to the large
latency to update all devices across WAN
– Dropbox is oriented to personal syncing, not on
collaboration, so it is not a real limitation.
Eventual Consistency
- An ATM Example
• In design of automated teller machine (ATM):
– Strong consistency appear to be a nature choice
– However, in practice, A beats C
– Higher availability means higher revenue
– ATM will allow you to withdraw money even if the
machine is partitioned from the network
– However, it puts a limit on the amount of withdraw
(e.g., $200)
– The bank might also charge you a fee when a
overdraft happens
Dynamic Tradeoff between C and A
• An airline reservation system:
– When most of seats are available: it is ok to rely on
somewhat out-of-date data, availability is more
critical
– When the plane is close to be filled: it needs more
accurate data to ensure the plane is not
overbooked, consistency is more critical
• Neither strong consistency nor guaranteed
availability, but it may significantly increase the
tolerance of network disruption
Heterogeneity: Segmenting C and A
• No single uniform requirement
– Some aspects require strong consistency
– Others require high availability
• Segment the system into different components
– Each provides different types of guarantees
• Overall guarantees neither consistency nor
availability
– Each part of the service gets exactly what it needs
• Can be partitioned along different dimensions
Discussion
• In an e-commercial system (e.g., Amazon, e-Bay, etc),
what are the trade-offs between consistency and
availability you can think of? What is your strategy?
• Hint -> Things you might want to consider:
– Different types of data (e.g., shopping cart, billing,
product, etc.)
– Different types of operations (e.g., query, purchase, etc.)
– Different types of services (e.g., distributed lock, DNS, etc.)
– Different groups of users (e.g., users in different
geographic areas, etc.)
Partitioning Examples
• Data Partitioning
• Operational Partitioning
• Functional Partitioning
• User Partitioning
• Hierarchical Partitioning
Partitioning Examples
Data Partitioning
• Different data may require different consistency
and availability
• Example:
• Shopping cart: high availability, responsive, can
sometimes suffer anomalies
• Product information need to be available, slight
variation in inventory is sufferable
• Checkout, billing, shipping records must be consistent
Partitioning Examples
Operational Partitioning
• Each operation may require different balance
between consistency and availability
• Example:
• Reads: high availability; e.g.., “query”
• Writes: high consistency, lock when writing; e.g.,
“purchase”
Partitioning Examples
Functional Partitioning
• System consists of sub-services
• Different sub-services provide different
balances
• Example: A comprehensive distributed system
– Distributed lock service (e.g., Chubby) :
• Strong consistency
– DNS service:
• High availability
Partitioning Examples
User Partitioning
• Try to keep related data close together to
assure better performance
• Example: Craglist
– Might want to divide its service into several data
centers, e.g., east coast and west coast
• Users get high performance (e.g., high availability and
good consistency) if they query servers closet to them
• Poorer performance if a New York user query Craglist in
San Francisco
Partitioning Examples
Hierarchical Partitioning
• Large global service with local “extensions”
• Different location in hierarchy may use
different consistency
• Example:
– Local servers (better connected) guarantee more
consistency and availability
– Global servers has more partition and relax one of
the requirement
What if there are no partitions?
• Tradeoff between Consistency and Latency:
• Caused by the possibility of failure in
distributed systems
– High availability -> replicate data -> consistency
problem
• Basic idea:
– Availability and latency are arguably the same
thing: unavailable -> extreme high latency
– Achieving different levels of consistency/availability
takes different amount of time
CAP -> PACELC
• A more complete description of the space of
potential tradeoffs for distributed system:
– If there is a partition (P), how does the system
trade off availability and consistency (A and C);
else (E), when the system is running normally in
the absence of partitions, how does the system
trade off latency (L) and consistency (C)?

Abadi, Daniel J. "Consistency tradeoffs in modern distributed database


system design." Computer-IEEE Computer Magazine 45.2 (2012): 37.
PACELC

C A C L

Partitioned Normal
Examples
• PA/EL Systems: Give up both Cs for availability and lower
latency
– Dynamo, Cassandra, Riak
• PC/EC Systems: Refuse to give up consistency and pay
the cost of availability and latency
– BigTable, Hbase, VoltDB/H-Store
• PA/EC Systems: Give up consistency when a partition
happens and keep consistency in normal operations
– MongoDB
• PC/EL System: Keep consistency if a partition occurs but
gives up consistency for latency in normal operations
– Yahoo! PNUTS
POW
In a Blockchain, each block consists of 4 main
headers.

1. Previous Hash: This hash address locates the


previous block.

2. Transaction Details: Details of all the transactions


that need to occur.

3. Nonce: An arbitrary number given in cryptography


to differentiate the block’s hash address.
• Hash Address of the Block: All of the above
(i.e., preceding hash, transaction details, and
nonce) are transmitted through a hashing
algorithm.
• This gives an output containing a 256-bit, 64
character length value, which is called the
unique ‘hash address.’ Consequently, it is
referred to as the hash of the block.
• PoW or proof of work is a special protocol that aims
to deter cyber-attacks such as DDoS (distributed
denial-of-service attacks), which can use up the
resources of a computer stem with the help of
multiple fake requests.

• It uses a trustless and distributed consensus system.


• PoW implements a decentralized system and
works without needing a central authority.
• The PoW consensus mechanism can verify
transactions without needing a third party.
• PoW makes double-spending difficult by proving
that every user has done several computations.
• Many other blockchain projects that copied the
original Bitcoin code also follow the Proof of
Work model.
POW
How POW Works
• Proof of work requires an expensive computer calculation or,
in other words, the process of mining. Mining needs to be
performed to create trustless transactions on the blockchain.
• Step 1) Transactions are compiled and bundled up together
in the form of a block.
• Step 2) Miners then verify transactions within each block,
checking to see if they are legitimate.
• Step 3) Miners then solve a mathematical puzzle known as a
proof of work problem to proceed. All miners have to
compete.
• Step 4) The first miner who solves each block problem is
rewarded.
• Step 5) The verified transactions are then stored on the
blockchain.
Proof of Work Examples

• Proof of work model has existed for a long time so let us


go through some examples of PoW.
• Emails
• The first example we shall explore is emails attached
with a lengthy piece of text. Ordinary computers can
send millions of emails per day, but executing other
tasks and receiving a lot of spam can affect its efficiency
and reduce processing costs.
• PoW is used to lower processing cycles by providing
complex computation problems which enhance security.
• Cryptocurrencies
• One of the most famously used examples of
PoW is mining a cryptocurrency.
• The PoW model ensures that miners have direct
authority within the network.
• It also prevents double-spending attacks from
occurring.
• Miners have a fixed income because PoW
includes enough headers in new blocks.
• DDoS
• Another example of PoW is migrating DDoS
attacks that cause inconvenience and disruptions.
• The PoW algorithm solves complex mathematical
problems by getting a collective solution.
• PoW helps to solve problems in a distributed sort
of way.
• This way, even a small number of participants can
solve complex problems.
How are transactions verified: PoW

• Understanding how transaction verifications


work in PoW can be difficult without an example.
Let’s look at Bitcoin’s model.
• Step 1) Within every 10 minutes or so, a new
block is created. It takes about the same amount
of time to confirm Bitcoin transactions as valid.
• Step 2) Every single block contains different
transactions that require verification. Within a
decentralized system, it becomes difficult and
energy-consuming to verify every transaction.
• Step 3) Proof-of-Work offers a huge amount of
computational power to solve the
cryptographic algorithm. It makes it
impossible for network participants who have
fewer resources to get better rewards.
• Step 4) Once all transactions within a block are
verified, they are added onto the public
blockchain where other users can see them.
• Let us suppose the mathematical sum 4+8 using
Proof of Work. Now we know that the answer is
12. But in this model, whoever gets to the
answer first wins the mining reward. Imagine
miner 1 and miner 2 competing to solve this
problem. The results would be as follows;
• Miner 1
• Attempt 1: 4+8 = 11 *Incorrect*
• Attempt 2: 4+8 = 9 *Incorrect*
• Attempt 3: 4+8 = 10 *Incorrect*
• Miner 2
• Attempt 1: 4+8 = 13 *Incorrect*
• Attempt 2: 4+8 = 12 *Correct*
• Attempt 3: 4+8 = 14 *Incorrect*
• So you can see that miner 2 guessed the correct
answer on 2nd attempt so that it will get the
miner reward.
• But in reality, computers can execute millions of
combinations each second.
• At any particular moment, many hardware devices
are trying to solve cryptographic equations.
• It is almost like a race to be the first to reach the
finish line and get the mining reward.
Advantages of POW
• Some Important benefits/pros of Proof of Work are:
• Proof-of-Work was invented to stop double-
spending attempts.
• It is one of the most secure consensus mechanisms.
• Cryptos based on PoW has more mining power and
are more secure.
• Mining earns rewards in a typical PoW model.
• Proof of work is random yet fair.
Disadvantages of POW
• Some important risks/cons of Proof of Work are:
• Mining requires extremely powerful hardware.
• Not affordable for every market
participant.
• Energy consumption due to extremely
high mining participation is off-the-charts.
• The majority of mining pools are
controlled by single entities.
• PoW model is prone to 51% attacks
What is PoS?

• Proof of stake (PoS) is a type of consensus


mechanism which is used to validate transactions on
the blockchain.

• It works by allowing cryptocurrency owners to stake


their coins.

• This gives them the right to verify new blocks of


transactions on the blockchain and add them to the
network.
• The model of Proof of Stake exists as an alternative
consensus mechanism. Few cryptocurrencies follow
this protocol which replaces miners with stakes.
• The algorithm chooses any one of these stakers to
publish the next block.
• Two developers named Scott Nadal and Sunny King
created PoS noticing the flaws in PoW in the year
2012.
• Limited scalability and needing a lot of electricity are
not a problem in the PoS model.
• In theory, PoS is an “ideal” solution for scaling
problems within the PoW mechanism.
• Ethereum 2.0 will be 100% proof-of-stake. Hence
it will process its transactions, NFT transactions,
and execute smart contract transactions.
• One must have a powerful computer system and
a sufficiently sized wallet. It increases their
chances of earning a proof-of-stake reward.
• The PoS model handles maintaining integrity
within a blockchain. It also guarantees that
crypto users cannot mint coins without earning
them.
What is Staking?

• Staked funds are set aside and stored in a smart


contract by validators. This is known as the
staking process.
• Whoever has a bigger stake might be chosen to
verify transactions and create blocks. Blocks thus
forged get added to the blockchain.
• All pos coins do not follow the same set of rules
even though the concept of validation is the
same.
• Every qualified validator market participant earns
a reward based on ownership.
• PoS consensus mechanism concept is based on the
following steps:
Step 1) Users who own native tokens of a blockchain
store all or a part of it in staking pools safely.
Step 2) The algorithm pseudo-randomly chooses the
next validator in line.
Step 3) The chosen validator has to propose a block
and the number of transactions in it.
Step 4) Other participants get to approve and verify
the proposed transaction.
Step 5) A new block is added to the blockchain.
Step 6) The selected validator earns a transaction fee.
POS
Some Important benefits/pros of Proof of Stake are:
• The PoS mechanism is safe from 51% of attacks.
• The Proof-of-stake does not need expensive
hardware for processing.
• Transactions are faster and relatively inexpensive.
• Processing in the case of PoS does not use much
energy.
• Stakes act as a financial motivator in the PoS
model.
• PoS models have not been implemented on an
elaborate blockchain.
• Capturing control of the network is easy as it
depends on capital.
• PoS misses out on many PoW benefits, such as
mining rewards.
• Centralized threats like double-spending are
executable.
• PoS has governance issues meaning users with more
tokens can change the rules of the network.
• Proof-of-Stake is the so-called better way of solving
cryptographic problems. Following are a few
cryptocurrencies that use the PoS model that is faster and
more secure than PoW.
• Tezos:
• The decentralized network of Tezos includes an incentive
mechanism that rewards validators. For maintaining and
securing the network, validators receive newly-created
tokens. Stakes increase as new participants enter the
network and become active. The PoS system in Tezos also
protects rewards and blockchain data from tampering.
• Ethereum 2.0:
• The co-founder of Ethereum, Vitalik Buterin,
proposed the Ethereum Improvement Proposal in
2016.
• It uses a modified version of the PoW algorithm
called Sharding.
• The concept of Sharding can improve network
performance by holding more hash power.
• Sharding would also increase the number of
transactions in a block.
• Cosmos:
• Cosmos is popular for deploying a PoS
network for widespread use (more than
Bitcoin). By securing millions of users, the
project hopes to become the largest PoS-
based coin. Its target audience includes
people who do not have access to the banking
system.
Proof-of-Work Proof-of-Stake

PoW or proof of work is a special protocol that aims to


Proof of stake (PoS) is a type of consensus mechanism
deter cyber-attacks such as DDoS (distributed denial-of-
which is used to validate transactions on the blockchain.
service attacks).

Any hacker needs to gain more than 50% of total Hackers must own more than 50% of all cryptocurrencies
computational power to perform a 51% attack. on the same network, which is impossible.

The mining probability depends on the computational


A new block’s validity depends on the size of the stake.
work done.

Miners receive rewards for complex solving cryptographic The validator does not receive a block reward. Instead,
problems. they only collect network fees as their reward.

Requires powerful and up-to-date mining hardware. Requires server-grade unit for efficient processing.

PoW is the original cryptographic consensus mechanism PoS was derived from PoW, but it comes with several
originating long before PoS. improvements.

To achieve more scalability, all nodes within a transaction The entire network is not involved in the verification of
are involved. every transaction.

You might also like