
DS Chapter V7replication

This document discusses consistency and replication in distributed systems. It introduces reasons for replication including reliability and performance. It describes object replication and concurrency control methods. The document discusses consistency models and how tight consistency can limit scalability. It proposes loosening consistency constraints. Finally, it outlines distribution protocols for replica placement and update propagation.


Distributed Systems

CHAPTER SEVEN
Consistency and Replication

(CS, CCI, WKU, Ethiopia, 2022)

Lecturer: Habtamu Alemayehu (MSc in CSE)

5/3/2022 1
Introduction
 data are generally replicated to enhance reliability and improve
performance
 but replication may create inconsistency
 consistency models for shared data are often hard to implement in
large-scale distributed systems; hence simpler models such as
client-centric consistency models are used

Objectives of the Chapter

 we discuss
 why replication is useful and its relation with scalability; in
particular object-based replication
 consistency models for shared data designed for parallel computers
which are also useful in distributed shared memory systems
 client-centric consistency models
 how consistency and replication are implemented

Reasons for Replication and Object Replication
 two major reasons: reliability and performance
 reliability
 if a file is replicated, we can switch to another replica when the
one we are using crashes
 we can provide better protection against corrupted data; similar
to mirroring in non-distributed systems
 performance
 if the system has to scale in size and geographical area
 place a copy of the data in the proximity of the processes using
them, reducing access time and improving performance; for
example, a Web server accessed by thousands of clients from
all over the world
 caching, normally done by clients, is strongly related to replication

Cont’d
 Object Replication
 consider a distributed object shared by multiple clients

organization of a distributed remote object shared by two different clients


 before replicating an object, we must decide how to protect the
object against simultaneous access by multiple clients; two methods

Cont’d
1. the object itself can handle concurrent invocations on its own; e.g., a
Java object can be constructed as a monitor by declaring the object's
methods to be synchronized (only one thread is allowed to proceed
while others are blocked until further notice)
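As an illustration, the monitor idea can be sketched in Python (the chapter's example is Java's synchronized methods; here a threading.Lock stands in for the object's intrinsic lock, and MonitorObject is a hypothetical class, not from the chapter):

```python
import threading

class MonitorObject:
    """Hypothetical object built as a monitor: every method takes the
    same lock, so only one thread is inside the object at a time."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for Java's intrinsic lock
        self.value = 0

    def increment(self):
        with self._lock:               # like declaring the method 'synchronized'
            current = self.value
            self.value = current + 1

obj = MonitorObject()
threads = [threading.Thread(target=lambda: [obj.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(obj.value)  # 4000: no increments are lost
```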

Cont’d
2. the server is responsible for concurrency control using an object
adapter, e.g., using a single thread per object; the single thread
serializes all incoming invocations
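A minimal sketch of this object-adapter approach in Python, assuming a hypothetical SerializingAdapter that dedicates one worker thread to the object and queues all invocations:

```python
import queue
import threading

class SerializingAdapter:
    """Hypothetical object adapter: one dedicated thread per object,
    so all incoming invocations are serialized through a queue."""

    def __init__(self, obj):
        self._obj = obj
        self._requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            method, args, done = self._requests.get()
            getattr(self._obj, method)(*args)  # executed strictly one at a time
            done.set()

    def invoke(self, method, *args):
        done = threading.Event()
        self._requests.put((method, args, done))
        done.wait()                            # caller blocks until executed

class Counter:
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n

counter = Counter()
adapter = SerializingAdapter(counter)
for _ in range(100):
    adapter.invoke("add", 1)
print(counter.value)  # 100
```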

Cont’d

 Replication as Scaling Technique


 replication and caching are widely applied as scaling techniques
 processes can use local copies and limit access time and traffic
 however, we need to keep the copies consistent; but this may
1. require more network bandwidth
 if the copies are refreshed more often than they are read (a low
access-to-update ratio), the cost in bandwidth outweighs the
benefit, since many propagated updates are never used

Cont’d
2. itself be subject to serious scalability problems
 intuitively, a read operation made on any copy should return the
same value (the copies are always the same)
 thus, when an update operation is performed on one copy, it should
be propagated to all copies before a subsequent operation takes
place
 this is sometimes called tight consistency (a write is performed at all
copies in a single atomic operation or transaction)
 difficult to implement since it means that all replicas first need to
reach agreement on when exactly an update is to be performed
locally, say by deciding a global ordering of operations using Lamport
timestamps and this takes a lot of communication time
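A Lamport clock itself is simple to sketch; the cost lies in the extra message rounds needed to agree on a global order, not in the clock:

```python
class LamportClock:
    """Minimal Lamport logical clock: local events tick the counter, and a
    received timestamp pushes the clock past anything already observed."""

    def __init__(self):
        self.time = 0

    def tick(self):                  # local event or message send
        self.time += 1
        return self.time

    def receive(self, remote_time):  # message receive
        self.time = max(self.time, remote_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.tick()           # a stamps an outgoing message with 1
t_recv = b.receive(t_send)  # b's clock jumps past the sender's stamp
print(t_send, t_recv)       # 1 2
```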

Cont’d
 dilemma
 scalability problems can be alleviated by applying replication and
caching, leading to better performance
 but, keeping copies consistent requires global synchronization,
which is generally costly in terms of performance
 solution: loosen the consistency constraints
 updates do not need to be executed as atomic
operations (no more instantaneous global
synchronization); but copies may not always be the
same everywhere
 to what extent the consistency can be loosened
depends on the specific application (the purpose of
data as well as access and update patterns)

Distribution Protocols

 there are different ways of propagating, i.e., distributing, updates
to replicas, independent of the consistency model
 we will discuss
 replica placement
 update propagation
 epidemic protocols
a. Replica Placement
 a major design issue for distributed data stores is deciding
where, when, and by whom copies of the data store are to be
placed
 three types of copies:
 permanent replicas
 server-initiated replicas
 client-initiated replicas

Cont’d

the logical organization of different kinds of copies of a data store into three concentric
rings

Cont’d
1. Permanent Replicas
 the initial set of replicas that constitute a distributed data store;
normally a small number of replicas
 e.g., a Web site: two forms
 the files that constitute a site are replicated across a limited
number of servers on a LAN; a request is forwarded to one of the
servers
 mirroring: a Web site is copied to a limited number of servers,
called mirror sites, which are geographically spread across the
Internet; clients choose one of the mirror sites

2. Server-Initiated Replicas (push caches)
 Web hosting companies dynamically create replicas to improve
performance (e.g., create a replica near hosts that use the Web site
very often)

Cont’d

3. Client-Initiated Replicas (client caches or simply caches)


 to improve access time
 a cache is a local storage facility used by a client to temporarily store a
copy of the data it has just received
 placed on the same machine as its client or on a machine shared by
clients on a LAN
 managing the cache is left entirely to the client; the data store from
which the data have been fetched has nothing to do with keeping cached
data consistent

Cont’d
b. Update Propagation
 updates are initiated at a client, forwarded to one of the copies, and
then propagated to the other replicas while ensuring consistency
 some design issues in propagating updates
 state versus operations
 pull versus push protocols
 unicasting versus multicasting
1. State versus Operations
 what is actually to be propagated? three possibilities
 send a notification of the update only (invalidation protocols - useful
when the read/write ratio is small); uses little bandwidth
 transfer the modified data (useful when the read/write ratio is high)
 transfer the update operation (also called active replication); it
assumes that each machine knows how to do the operation; uses little
bandwidth, but more processing power is needed at each replica

Cont’d
2. Pull versus Push Protocols
 push-based approach (also called server-based protocols):
propagate updates to other replicas without those replicas even
asking for the updates (used when a high degree of consistency is
required and there is a high read/write ratio)
 pull-based approach (also called client-based protocols): often used
by client caches; a client or a server requests updates from the
server whenever needed (used when the read/write ratio is low)
 a comparison between push-based and pull-based protocols; for
simplicity assume multiple clients and a single server
Issue                     Push-based                                 Pull-based
State of server           List of client replicas and caches         None
Messages sent             Update (and possibly fetch update later)   Poll and update
Response time at client   Immediate (or fetch-update time)           Fetch-update time
Cont’d
3. Unicasting versus Multicasting
 multicasting can be combined with the push-based approach; the
underlying network takes care of sending a message to multiple
receivers
 unicasting is the only possibility for the pull-based approach; the
server sends separate messages to each receiver

c. Epidemic Protocols
 update propagation in eventual consistency is often implemented by
a class of algorithms known as epidemic protocols
 updates are aggregated into a single message and then exchanged
between two servers
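One round of such an exchange (anti-entropy) might look like this sketch, assuming each server stores items as (timestamp, value) pairs and the higher timestamp wins on conflict:

```python
def anti_entropy(server_a, server_b):
    """One round of an epidemic (anti-entropy) exchange: each server sends
    the other what it has, and on conflict the higher timestamp wins."""
    for key in set(server_a) | set(server_b):
        ts_a, val_a = server_a.get(key, (0, None))
        ts_b, val_b = server_b.get(key, (0, None))
        winner = (ts_a, val_a) if ts_a >= ts_b else (ts_b, val_b)
        server_a[key] = winner
        server_b[key] = winner

# each server maps item -> (timestamp, value); the data are illustrative
s1 = {"x": (2, "new"), "y": (1, "old")}
s2 = {"x": (1, "old"), "z": (3, "fresh")}
anti_entropy(s1, s2)
print(s1 == s2)  # True: both servers now hold the newest version of every item
```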

Consistency Protocols
 consistency protocols describe an implementation of a specific
consistency model
 there are three types
 primary-based protocols
 remote-write protocols
 local-write protocols
 replicated-write protocols
 active replication
 quorum-based protocols
 cache-coherence protocols

Cont’d
1. Primary-Based Protocols
 each data item x in the data store has an associated primary, which is
responsible for coordinating write operations on x
 two approaches: remote-write protocols, and local-write protocols
a. Remote-Write Protocols
 all read and write operations are carried out at a (remote) single
server; in effect, data are not replicated; traditionally used in
client-server systems, where the server may possibly be
distributed

Cont’d

primary-based remote-write protocol with a fixed server to which all read and write
operations are forwarded

Cont’d
 another approach is primary-backup protocols where reads can be
made from local backup servers while writes should be made directly
on the primary server
 the backup servers are updated each time the primary is updated
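A minimal sketch of the primary-backup idea, assuming a blocking protocol in which the primary updates every backup before acknowledging the write (class and method names are illustrative):

```python
class PrimaryBackup:
    """Sketch of a blocking primary-backup store: a write goes to the
    primary, which updates every backup before acknowledging; reads
    can then be served from any local backup."""

    def __init__(self, num_backups):
        self.primary = {}
        self.backups = [{} for _ in range(num_backups)]

    def write(self, key, value):
        self.primary[key] = value
        for backup in self.backups:  # propagate before acknowledging
            backup[key] = value
        return "ack"

    def read_local(self, backup_id, key):
        return self.backups[backup_id].get(key)

store = PrimaryBackup(num_backups=2)
store.write("x", 42)
print(store.read_local(0, "x"), store.read_local(1, "x"))  # 42 42
```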

the principle of primary-backup protocol


Cont’d
b. Local-Write Protocols
 two approaches
i. there is a single copy; no replicas
 when a process wants to perform an operation on some data
item, the single copy of the data item is transferred to the
process, after which the operation is performed

Cont’d

primary-based local-write protocol in which a single copy is migrated between processes


 consistency is straightforward
 keeping track of the current location of each data item is a major
problem

Cont’d
ii. primary-backup local-write protocol
 the primary migrates between processes that wish to perform a write
operation
 multiple, successive write operations can be carried out locally, while
(other) reading processes can still access their local copy
 such improvement is possible only if a nonblocking protocol is
followed

Cont’d

primary-backup protocol in which the primary migrates to the process wanting to perform an update

Cont’d
2. Replicated-Write Protocols
 unlike primary-based protocols, write operations can be carried out at
multiple replicas; two approaches: Active Replication and Quorum-Based
Protocols
a. Active Replication
 each replica has an associated process that carries out update
operations
 updates are generally propagated by means of the write operation
itself (the operation is propagated); it is also possible to send the
resulting update instead
 the operations need to be done in the same order everywhere; totally-
ordered multicast
 two possibilities to ensure that the order is followed
 Lamport’s timestamps, or
 use of a central sequencer that assigns a unique sequence number for
each operation; the operation is first sent to the sequencer then the
sequencer forwards the operation to all replicas
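The sequencer approach can be sketched as follows (class names are illustrative, not from the chapter):

```python
class Replica:
    def __init__(self):
        self.log = []

    def apply(self, seq, op):
        self.log.append((seq, op))  # every replica records the same sequence

class Sequencer:
    """Central sequencer: assigns each operation a unique sequence number
    and forwards it to all replicas, yielding a total order."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.next_seq = 0

    def submit(self, op):
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.replicas:
            replica.apply(seq, op)

r1, r2, r3 = Replica(), Replica(), Replica()
sequencer = Sequencer([r1, r2, r3])
for op in ["deposit 10", "withdraw 5"]:
    sequencer.submit(op)
print(r1.log == r2.log == r3.log)  # True: identical order everywhere
```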

Cont’d
 a problem is replicated invocations
 suppose object A invokes B, and B invokes C; if object B is replicated,
each replica of B will invoke C independently
 this may create inconsistency and other effects; what if the
operation on C is to transfer $10? the amount would be transferred
once per replica of B

the problem of replicated invocations


Cont’d
 one solution is to have a replication-aware communication layer that
avoids the same invocation being sent more than once
 when a replicated object B invokes another replicated object C, the
invocation request is first assigned the same, unique identifier by each
replica of B
 a coordinator of the replicas of B forwards its request to all replicas of
object C; the other replicas of object B hold back; hence only a single
request is sent to each replica of C
 the same mechanism is used to ensure that only a single reply message is
returned to the replicas of B
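A toy sketch of the deduplication idea, assuming the first replica of B acts as coordinator and each replica of C keeps incoming requests keyed by the shared identifier (all names are hypothetical):

```python
def replicated_invoke(b_replicas, c_replicas, request):
    """Replication-aware layer sketch: all replicas of B derive the same
    request identifier, but only the coordinator forwards the call, so
    each replica of C receives the invocation exactly once."""
    request_id = ("B", request)        # deterministic, identical at every replica
    coordinator = b_replicas[0]        # assumption: first replica coordinates
    for replica in b_replicas:
        if replica != coordinator:
            continue                   # the other replicas of B hold back
        for c in c_replicas:
            c.setdefault(request_id, request)  # a duplicate id would be ignored

b_replicas = ["B1", "B2", "B3"]
c_replicas = [{}, {}]                  # each C replica logs requests by id
replicated_invoke(b_replicas, c_replicas, "transfer $10")
print([len(c) for c in c_replicas])    # [1, 1]: one transfer per replica of C
```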

Cont’d

a) forwarding an invocation request from a replicated object


b) returning a reply to a replicated object

Cont’d
b. Quorum-Based Protocols
 use of voting: clients are required to request and acquire the permission of
multiple servers before either reading or writing a replicated data item
 e.g., assume a distributed file system where a file is replicated on N servers
 a client must first contact at least half + 1 (majority) servers and get
them to agree to do an update
 the new update will be done and the file will be given a new version
number
 to read a file, a client must also first contact at least half + 1 and ask
them to send version numbers; if all version numbers agree, this must be
the most recent version
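A sketch of such majority quorums with N = 5: any two quorums of 3 overlap, which is why a read quorum always sees the latest version number (the version-numbering details here are illustrative):

```python
import random

def write_quorum(servers, key, value):
    """Write to a majority (N//2 + 1): read their version numbers, bump
    the highest, and install the new version at every quorum member."""
    quorum = random.sample(servers, len(servers) // 2 + 1)
    version = max(s.get(key, (0, None))[0] for s in quorum) + 1
    for s in quorum:
        s[key] = (version, value)

def read_quorum(servers, key):
    """Read from a majority and keep the highest version: any two
    majorities overlap, so the newest write is always seen."""
    quorum = random.sample(servers, len(servers) // 2 + 1)
    return max(s.get(key, (0, None)) for s in quorum)[1]

servers = [{} for _ in range(5)]           # a file replicated on N = 5 servers
write_quorum(servers, "file", "v1 contents")
write_quorum(servers, "file", "v2 contents")
print(read_quorum(servers, "file"))        # v2 contents
```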

Cont’d
3. Cache-Coherence Protocols
 caches form a special case of replication as they are controlled by clients
instead of servers
 cache-coherence protocols ensure that a cache is consistent with the
server-initiated replicas
 two design issues in implementing caches: coherence detection and
coherence enforcement
 coherence detection strategy: when inconsistencies are actually
detected
 static solution: prior to execution, a compiler performs the analysis to
determine which data may lead to inconsistencies if cached and
inserts instructions that avoid inconsistencies
 dynamic solution: at runtime, a check is made with the server to see
whether the cached data have been modified since they were cached
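The dynamic check can be sketched as a version comparison against the server (a hypothetical scheme; real protocols may use timestamps or leases instead):

```python
def read_through_cache(cache, server, key):
    """Dynamic detection sketch: on every access, check with the server
    (here by comparing version numbers, an illustrative scheme) whether
    the cached copy is still current; refresh it if not."""
    server_version = server[key][0]          # the runtime check with the server
    if key in cache and cache[key][0] == server_version:
        return cache[key][1]                 # cached copy is still coherent
    cache[key] = server[key]                 # refresh a stale or missing entry
    return cache[key][1]

server = {"page": (1, "hello")}              # item -> (version, data)
cache = {}
first = read_through_cache(cache, server, "page")   # miss: caches version 1
server["page"] = (2, "hello, world")                # modified at the server
second = read_through_cache(cache, server, "page")  # stale copy detected, refreshed
print(second)  # hello, world
```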

Cont’d
 coherence enforcement strategy: how caches are kept consistent
with the copies stored at the servers
 simplest solution: do not allow shared data to be cached; but
this forgoes the performance benefit of caching
 allow caching shared data and
 let a server send an invalidation to all caches whenever a data
item is modified
or
 propagate the update

Thank You !!!

