CAP Theorem Lect 2
CAP Theorem Lect 2
CAP Theorem
• Consistency:
– All nodes should see the same data at the same time
• Availability:
– Node failures do not prevent survivors from continuing to
operate
• Partition-tolerance:
– The system continues to operate despite network
partitions
• A distributed system can satisfy any two of these
guarantees at the same time but not all three
CAP Theorem
C A
P
CAP Theorem
• A simple example:
Hotel Booking: are we double-booking the
same room?
Bob Dong
CAP Theorem
• A simple example:
Hotel Booking: are we double-booking the
same room?
Bob Dong
CAP Theorem
• A simple example:
Hotel Booking: are we double-booking the
same room?
Bob Dong
CAP Theorem: Proof
• A simple proof using two nodes:
A B
CAP Theorem: Proof
• A simple proof using two nodes:
Not Consistent!
A B
Respond to client
CAP Theorem: Proof
• A simple proof using two nodes:
Not Available!
A B
Wait to be updated
CAP Theorem: Proof
• A simple proof using two nodes:
Not Partition
Tolerant!
A B
?
Eventual Consistency
- A Facebook Example
• Bob tells Alice to wait a bit and check out later
• Alice waits for a minute or so and checks back:
- She finds the story Bob shared with her!
Eventual Consistency
- A Facebook Example
• Reason: it is possible because Facebook uses
an eventual consistent model
• Why Facebook chooses eventual consistent
model over the strong consistent one?
– Facebook has more than 1 billion active users
– It is non-trivial to efficiently and reliably store the
huge amount of data generated at any given time
– Eventual consistent model offers the option to
reduce the load and improve availability
Eventual Consistency
- A Dropbox Example
• Dropbox enabled immediate consistency via
synchronization in many cases.
• However, what happens in case of a network
partition?
Eventual Consistency
- A Dropbox Example
• Let’s do a simple experiment here:
– Open a file in your drop box
– Disable your network connection (e.g., WiFi, 4G)
– Try to edit the file in the drop box: can you do
that?
– Re-enable your network connection: what
happens to your dropbox folder?
Eventual Consistency
- A Dropbox Example
• Dropbox embraces eventual consistency:
– Immediate consistency is impossible in case of a
network partition
– Users will feel bad if their word documents freeze
each time they hit Ctrl+S , simply due to the large
latency to update all devices across WAN
– Dropbox is oriented to personal syncing, not on
collaboration, so it is not a real limitation.
Eventual Consistency
- An ATM Example
• In design of automated teller machine (ATM):
– Strong consistency appear to be a nature choice
– However, in practice, A beats C
– Higher availability means higher revenue
– ATM will allow you to withdraw money even if the
machine is partitioned from the network
– However, it puts a limit on the amount of withdraw
(e.g., $200)
– The bank might also charge you a fee when a
overdraft happens
Dynamic Tradeoff between C and A
• An airline reservation system:
– When most of seats are available: it is ok to rely on
somewhat out-of-date data, availability is more
critical
– When the plane is close to be filled: it needs more
accurate data to ensure the plane is not
overbooked, consistency is more critical
• Neither strong consistency nor guaranteed
availability, but it may significantly increase the
tolerance of network disruption
Heterogeneity: Segmenting C and A
• No single uniform requirement
– Some aspects require strong consistency
– Others require high availability
• Segment the system into different components
– Each provides different types of guarantees
• Overall guarantees neither consistency nor
availability
– Each part of the service gets exactly what it needs
• Can be partitioned along different dimensions
Discussion
• In an e-commercial system (e.g., Amazon, e-Bay, etc),
what are the trade-offs between consistency and
availability you can think of? What is your strategy?
• Hint -> Things you might want to consider:
– Different types of data (e.g., shopping cart, billing,
product, etc.)
– Different types of operations (e.g., query, purchase, etc.)
– Different types of services (e.g., distributed lock, DNS, etc.)
– Different groups of users (e.g., users in different
geographic areas, etc.)
Partitioning Examples
• Data Partitioning
• Operational Partitioning
• Functional Partitioning
• User Partitioning
• Hierarchical Partitioning
Partitioning Examples
Data Partitioning
• Different data may require different consistency
and availability
• Example:
• Shopping cart: high availability, responsive, can
sometimes suffer anomalies
• Product information need to be available, slight
variation in inventory is sufferable
• Checkout, billing, shipping records must be consistent
Partitioning Examples
Operational Partitioning
• Each operation may require different balance
between consistency and availability
• Example:
• Reads: high availability; e.g.., “query”
• Writes: high consistency, lock when writing; e.g.,
“purchase”
Partitioning Examples
Functional Partitioning
• System consists of sub-services
• Different sub-services provide different
balances
• Example: A comprehensive distributed system
– Distributed lock service (e.g., Chubby) :
• Strong consistency
– DNS service:
• High availability
Partitioning Examples
User Partitioning
• Try to keep related data close together to
assure better performance
• Example: Craglist
– Might want to divide its service into several data
centers, e.g., east coast and west coast
• Users get high performance (e.g., high availability and
good consistency) if they query servers closet to them
• Poorer performance if a New York user query Craglist in
San Francisco
Partitioning Examples
Hierarchical Partitioning
• Large global service with local “extensions”
• Different location in hierarchy may use
different consistency
• Example:
– Local servers (better connected) guarantee more
consistency and availability
– Global servers has more partition and relax one of
the requirement
What if there are no partitions?
• Tradeoff between Consistency and Latency:
• Caused by the possibility of failure in
distributed systems
– High availability -> replicate data -> consistency
problem
• Basic idea:
– Availability and latency are arguably the same
thing: unavailable -> extreme high latency
– Achieving different levels of consistency/availability
takes different amount of time
CAP -> PACELC
• A more complete description of the space of
potential tradeoffs for distributed system:
– If there is a partition (P), how does the system
trade off availability and consistency (A and C);
else (E), when the system is running normally in
the absence of partitions, how does the system
trade off latency (L) and consistency (C)?
C A C L
Partitioned Normal
Examples
• PA/EL Systems: Give up both Cs for availability and lower
latency
– Dynamo, Cassandra, Riak
• PC/EC Systems: Refuse to give up consistency and pay
the cost of availability and latency
– BigTable, Hbase, VoltDB/H-Store
• PA/EC Systems: Give up consistency when a partition
happens and keep consistency in normal operations
– MongoDB
• PC/EL System: Keep consistency if a partition occurs but
gives up consistency for latency in normal operations
– Yahoo! PNUTS
POW
In a Blockchain, each block consists of 4 main
headers.
Any hacker needs to gain more than 50% of total Hackers must own more than 50% of all cryptocurrencies
computational power to perform a 51% attack. on the same network, which is impossible.
Miners receive rewards for complex solving cryptographic The validator does not receive a block reward. Instead,
problems. they only collect network fees as their reward.
Requires powerful and up-to-date mining hardware. Requires server-grade unit for efficient processing.
PoW is the original cryptographic consensus mechanism PoS was derived from PoW, but it comes with several
originating long before PoS. improvements.
To achieve more scalability, all nodes within a transaction The entire network is not involved in the verification of
are involved. every transaction.