ADB - CH4 - Transaction Management and Recovery


Advanced Database Systems

Chapter 4: Transaction Management and Recovery

1
Chapter Outline:
 Transaction and System Concepts
 Properties of Transaction
 Schedules and Recoverability
 Serializability of Schedules
 Database Concurrency Control
 Locking Techniques for Concurrency Control
• Locking
• 2PL
• Timestamping

2
1. Introduction
A Transaction:
– Logical unit of database processing that includes one or more access
operations (read: retrieval; write: insert or update; delete).
• Examples include ATM transactions, credit card approvals, flight reservations, hotel
check-in, phone calls, supermarket scanning, academic registration and billing.
• Transaction boundaries:

– Any single transaction in an application program is bounded with Begin and End
statements.
• An application program may contain several transactions separated by the Begin and
End transaction boundaries.

3
SIMPLE MODEL OF A DATABASE:
• A database is a collection of named data items.
• Granularity of data: a data item can be a field, a
record, or a whole disk block; granularity
determines the size of the data item.
• The basic operations a transaction can perform
are read and write:
– read_item(X): Reads a database item named X
into a program variable. To simplify our notation,
we assume that the program variable is also
named X.
– write_item(X): Writes the value of program
variable X into the database item named X.
4
Exercise
x = 1000; y = 550;
T1
read(y)
y = y + x
write(y)    y = ?
y = y - 50
write(y)    y = ?
• Basic unit of data transfer from the disk to the computer main
memory is one block.
• read_item(X) command includes the following steps:
– Find the address of the disk block that contains item X.
– Copy that disk block into a buffer in main memory (if that
disk block is not already in some main memory buffer).
– Copy item X from the buffer to the program variable
named X.
• write_item(X) command includes the following steps:
– Find the address of the disk block that contains item X.
– Copy that disk block into a buffer in main memory (if that
disk block is not already in some main memory buffer).
– Copy item X from the program variable named X into its
correct location in the buffer.
– Store the updated block from the buffer back to disk
(either immediately or at some later point in time).
8
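The read_item/write_item steps above can be sketched in Python. This is a toy illustration: the disk dictionary, the item-to-block directory, and the immediate flush in write_item are simplifying assumptions, not a real buffer manager.

```python
disk = {0: {"X": 100, "Y": 550}}   # block number -> block contents on disk
directory = {"X": 0, "Y": 0}       # item name -> block number holding it
buffers = {}                       # main-memory copies of disk blocks

def read_item(name):
    block_no = directory[name]                 # 1. find the block holding the item
    if block_no not in buffers:                # 2. copy block into a buffer if absent
        buffers[block_no] = dict(disk[block_no])
    return buffers[block_no][name]             # 3. copy item into the program variable

def write_item(name, value):
    block_no = directory[name]                 # 1. find the block holding the item
    if block_no not in buffers:                # 2. copy block into a buffer if absent
        buffers[block_no] = dict(disk[block_no])
    buffers[block_no][name] = value            # 3. update the item in the buffer
    disk[block_no] = dict(buffers[block_no])   # 4. flush the block (here: immediately)

X = read_item("X")        # program variable X = 100
write_item("X", X + 50)   # database item X becomes 150
```

A real DBMS would defer step 4 and let the buffer manager decide when to flush, as the slide notes.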
• The DBMS maintains a number of buffers in main
memory that hold the database disk blocks containing the
database items being processed.
– When all of these buffers are occupied, and
– an additional database block needs to be copied
into main memory,
• some buffer replacement policy is used to choose a buffer
to replace; if the chosen buffer has been modified, it
must be written back to disk before it is reused.
• Two sample transactions:
– (a) Transaction T1
– (b) Transaction T2

9
2. Transaction and System Concepts
• A transaction is an atomic unit of work that is either completed in
its entirety or not done at all.
– For recovery purposes, the system needs to keep track of when
the transaction starts, terminates, and commits or aborts.
• Transaction states:
– Active state: indicates the beginning of a transaction's execution.
– Partially committed state: the transaction has issued its last
read/write operation, but this does not yet guarantee permanent
modification of the database.
– Committed state: ensures that all changes made by the transaction
have been recorded persistently.
– Failed state: reached when a transaction is aborted during its
active state, or when one of the recovery-related checks fails.
– Terminated state: corresponds to the transaction leaving the
system.

10
State transition diagram illustrating the states for
transaction execution

11
To summarize, the above state transition diagram illustrates how a transaction moves
through its execution states. A transaction goes into the active state immediately after
it starts execution, where it can issue its READ and WRITE operations. When the
transaction ends, it moves to the partially committed state. At this point, the
recovery protocol must ensure that a system failure will not result in an inability
to record the transaction's changes permanently (usually by recording the changes
in the system log).
Once this check is successful, the transaction is said to have reached its commit
point and enters the committed state. When a transaction is committed, it has
concluded its execution successfully and all its changes must be recorded
permanently in the database, even if a system failure occurs.
We can have one of two outcomes:
 Success: the transaction commits and the database reaches a new consistent state. A
committed transaction cannot be aborted. COMMIT signals the successful end of a
transaction; any changes made by the transaction are saved and are now visible to other
transactions.
 Failure: the transaction aborts, and the database must be restored to the consistent state it
was in before the transaction started. Such a transaction is rolled back or undone. An aborted
transaction that has been rolled back can be restarted later. ROLLBACK signals the
unsuccessful end of a transaction; any changes made by the transaction are undone. It is now
as if the transaction never existed.
– T in the following discussion refers to a unique transaction-id
that is generated automatically by the system and is used to
identify each transaction:
– Types of log record:
• [start_transaction,T]: Records that transaction T has started
execution.
• [write_item,T,X,old_value,new_value]: Records that
transaction T has changed the value of database item X from
old_value to new_value.
• [read_item,T,X]: Records that transaction T has read the value
of database item X.
• [commit,T]: Records that transaction T has completed
successfully, and affirms that its effect can be committed
(recorded permanently) to the database.
• [abort,T]: Records that transaction T has been aborted.

14
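The log record types listed above can be modeled as an append-only list of tuples. This is a hypothetical sketch for illustration; a real DBMS writes a binary on-disk log format, not Python tuples.

```python
log = []  # append-only system log

def start_transaction(T):
    # [start_transaction, T]: transaction T has started execution
    log.append(("start_transaction", T))

def log_read(T, X):
    # [read_item, T, X]: T has read the value of database item X
    log.append(("read_item", T, X))

def log_write(T, X, old_value, new_value):
    # [write_item, T, X, old_value, new_value]: T changed X
    log.append(("write_item", T, X, old_value, new_value))

def commit(T):
    # [commit, T]: T completed successfully; its effects are permanent
    log.append(("commit", T))

start_transaction("T1")
log_read("T1", "X")
log_write("T1", "X", 100, 150)
commit("T1")
```

The old_value in a write record is what recovery uses to undo, and the new_value is what it uses to redo, as later sections discuss.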
3. Desirable Properties of Transactions
To ensure data integrity, the DBMS should maintain the following ACID
properties:
• Atomicity: A transaction is an atomic unit of processing; it is either
performed in its entirety or not performed at all.
• Consistency preservation: A correct execution of the transaction must take
the database from one consistent state to another.
• Isolation: A transaction should not make its updates visible to other
transactions until it is committed; this property, when enforced strictly, solves
the temporary update problem and makes cascading rollbacks of
transactions unnecessary
• Durability or permanency: Once a transaction changes the database and
the changes are committed, these changes must never be lost because of
subsequent failure.

15
Example:
• Suppose that Ti is a transaction that transfers 200 birr from account CA2090 (balance 5,000
birr) to SB2359 (balance 3,500 birr) as follows:
• Read(CA2090)
• CA2090 = CA2090 - 200
• Write(CA2090)
• Read(SB2359)
• SB2359 = SB2359 + 200
• Write(SB2359)
• Atomicity: either all or none of the above operations are done; this is realized by the
transaction management component of the DBMS.
• Consistency: the sum of CA2090 and SB2359 must be unchanged by the execution of Ti,
i.e. 8,500; this is the responsibility of the application programmer who codes the transaction.
• Isolation: when several transactions are processed concurrently on a data item, they
may create inconsistency problems. Handling such cases is the responsibility of the
concurrency control component of the DBMS.
• Durability: once Ti writes its update, it must remain in place when the database restarts
after a failure. This is the responsibility of the recovery management component of the DBMS.

16
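The birr-transfer example and its atomicity property can be sketched as follows. The in-memory dictionary standing in for the database and the before-image rollback mechanism are illustrative assumptions, not the actual DBMS machinery.

```python
# Toy database for the example: the two accounts from transaction Ti.
db = {"CA2090": 5000, "SB2359": 3500}

def transfer(db, src, dst, amount):
    """All-or-nothing transfer: either both writes happen, or neither does."""
    old = {src: db[src], dst: db[dst]}   # remember before-images for undo
    try:
        db[src] -= amount                # Read(src); src = src - amount; Write(src)
        if db[src] < 0:
            raise ValueError("insufficient balance")
        db[dst] += amount                # Read(dst); dst = dst + amount; Write(dst)
    except Exception:
        db.update(old)                   # rollback: restore before-images (atomicity)
        raise

transfer(db, "CA2090", "SB2359", 200)    # balances become 4800 and 3700
```

Consistency is visible in that the sum of the two balances stays 8,500 whether the transfer succeeds or is rolled back.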
Modes of Concurrency
 Interleaved processing: concurrent execution of processes is interleaved on a single
CPU.
 Parallel processing: processes are executed concurrently on multiple CPUs.
Multiple transactions are allowed to run concurrently in the system. Concurrency
control schemes are mechanisms to achieve isolation, i.e., to control the interaction
among the concurrent transactions in order to prevent them from destroying the
consistency of the database. Advantages of concurrent execution are:
o increased processor and disk utilization, leading to better transaction throughput: one
transaction can be using the CPU while another is reading from or writing to the disk
o reduced average response time for transactions: short transactions need not wait
behind long ones
17
Why Concurrency Control is needed: Three cases

i. The Lost Update Problem


– This occurs when two transactions that access the
same database items have their operations
interleaved in a way that makes the value of some
database item incorrect.

18
 E.g. Account with balance A = 100.
 T1 reads account A
 T1 withdraws 10 from A
 T1 makes the update in the database
 T2 reads account A
 T2 adds 100 to A
 T2 makes the update in the database
 If the transactions run one after the other (serially), there is no problem:
 If the execution is T1 followed by T2, then A = 190
 If the execution is T2 followed by T1, then A = 190
 But if they start at the same time and interleave in the following sequence
(time flows downward):

T1                     T2
read_item(A)                              (T1 reads A = 100)
A = A - 10                                (T1's copy becomes 90)
                       read_item(A)       (T2 reads A = 100)
                       A = A + 100        (T2's copy becomes 200)
write_item(A)                             (database A = 90)
                       write_item(A)      (database A = 200)

 After both transactions complete, the final value of A is 200, which
overrides the update made by T1 that changed the value from 100 to 90. 19
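The lost-update interleaving can be replayed step by step. This is a sequential simulation, not real concurrent threads; each transaction keeps a private copy of A, as in the schedule.

```python
A = 100        # database value of item A

a1 = A         # T1: read_item(A) -> 100
a1 = a1 - 10   # T1: A = A - 10  (private copy becomes 90)
a2 = A         # T2: read_item(A) -> still sees 100
a2 = a2 + 100  # T2: A = A + 100 (private copy becomes 200)
A = a1         # T1: write_item(A) -> database A = 90
A = a2         # T2: write_item(A) -> database A = 200; T1's update is lost
```

Either serial order would give 190; the interleaving gives 200 because T2's write is based on a stale read.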
ii. The Temporary Update (or Dirty Read) Problem
– This occurs when one transaction updates a
database item and then the transaction fails for
some reason.
– The updated item is accessed by another
transaction before it is changed back to its original
value. Based on the above scenario:
T1                     T2
                       read_item(A)
                       A = A + 100
                       write_item(A)
read_item(A)
A = A - 10
                       abort
write_item(A)
commit

Fig 2: Temporary update problem. Transaction T2 fails and must change the
value of A back to its old value; meanwhile T1 has read the temporary,
incorrect value of A.

T2 increases A, making it 200, but then aborts before it commits. T1 reads
200, subtracts 10, and writes 190. But the actual balance should be 90.
20
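The dirty-read schedule can be replayed the same way; the abort is simulated by restoring A's old value, and each transaction again keeps a private copy.

```python
A = 100        # database value of item A
old_A = A      # before-image kept for T2's rollback

a2 = A + 100   # T2: read_item(A); A = A + 100 (private copy 200)
A = a2         # T2: write_item(A) -> database A = 200 (uncommitted)
a1 = A         # T1: read_item(A) -> reads the dirty value 200
a1 = a1 - 10   # T1: A = A - 10 -> 190
A = old_A      # T2 aborts: database A restored to 100
A = a1         # T1: write_item(A) -> 190, but the correct value is 90
```

T1's result is wrong because it was based on a value that was never committed.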
iii. The Incorrect Summary Problem
– If one transaction is calculating an aggregate summary
function on a number of records while other transactions
are updating some of these records, the aggregate
function may calculate some values before they are
updated and others after they are updated.
•Example: T1 would like to add the values of A=10, B=20 and C=30. After
the values of A and B are read by T1 and before its completion, T2 updates the
value of B to 50. At the end of the execution of the two transactions, T1 will
come up with a sum of 60, while it should be 90 since B was updated to 50.
T1                     T2
Sum = 0
read_item(A)
Sum = Sum + A
read_item(B)
Sum = Sum + B
                       read_item(B)
                       B = 50
                       write_item(B)
read_item(C)
Sum = Sum + C
21
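Replaying the incorrect-summary schedule with the values from the example (the write of B back to the database is made explicit here):

```python
db = {"A": 10, "B": 20, "C": 30}   # items T1 wants to sum

total = 0
total += db["A"]   # T1: read_item(A); Sum = Sum + A
total += db["B"]   # T1: read_item(B); Sum = Sum + B (old value 20)
db["B"] = 50       # T2: read_item(B); B = 50; write_item(B)
total += db["C"]   # T1: read_item(C); Sum = Sum + C

# total is 60, but with the committed value B = 50 the sum should be 90
```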
Schedule
Schedule: a sequence of instructions that specifies the chronological order in
which the instructions of concurrent transactions are executed. A schedule for a
set of transactions must consist of all instructions of those transactions and
must preserve the order in which the instructions appear in each individual
transaction.
 A transaction that successfully completes its execution will have a commit
instruction as the last statement (will be omitted if it is obvious)
 A transaction that fails to successfully complete its execution will have an
abort instruction as the last statement (will be omitted if it is obvious)
Example Schedules
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
schedule S1 is a serial schedule in which T1 is followed by T2.
schedule S2 is a serial schedule in which T2 is followed by T1.
Schedule S3 is not serial, but it is equivalent to Schedule S1 and S2.

In all schedules S1, S2 and S3, the sum A + B is preserved.
Bad Schedule

The following concurrent schedule does not preserve the value of (A + B) and violates the consistency requirement.
The Transaction Log
 The transaction log records the details of all transactions: any changes a
transaction makes to the database. The log is stored on disk, not in memory,
so if the system crashes it is preserved.
 Write-ahead log rule: the entry in the log must be made before COMMIT
processing can complete.
Serializability of Schedule
Basic assumption: each transaction preserves database consistency.
Thus, serial execution of a set of transactions preserves database consistency.
We ignore operations other than read and write, and we assume that transactions
may perform arbitrary computations on data in local buffers.
Serializability in DBMS
• Serializability is the concept that helps to
identify which non-serial schedules are correct and will maintain
database consistency. It relates to the isolation property of
transactions in the database.

• Serializability is the concurrency scheme where the execution of


concurrent transactions is equivalent to the transactions which
execute serially.
Serializability in DBMS
• Serializable Schedule

• A serial schedule is always a serializable schedule because any


transaction only starts its execution when another transaction has
already completed its execution. However, a non-serial schedule
of transactions needs to be checked for Serializability.
Serializability in DBMS
Serial Schedules                        Serializable Schedules

No concurrency is allowed; all          Concurrency is allowed; multiple
transactions necessarily execute        transactions can execute
serially, one after the other.          concurrently.

Serial schedules lead to lower          Serializable schedules improve
resource utilization and CPU            both resource utilization and CPU
throughput.                             throughput.

For the reasons above, serial           For the reasons above, serializable
schedules are less efficient than       schedules are preferable to serial
serializable schedules.                 schedules.
To illustrate our discussion, consider the schedules in Figure above, and assume that
the initial values of database items are X = 90 and Y = 90 and that N = 3 and M = 2.
After executing transactions T1 and T2, we would expect the database values to be X
= 89 and Y = 93, according to the meaning of the transactions. Sure enough, executing
either of the serial schedules A or B gives the correct results. Now consider the
nonserial schedules C and D. Schedule C (which is the same as Figure (a)) gives the
results X = 92 and Y = 93, in which the X value is erroneous, whereas schedule D
gives the correct results.

Schedule C gives an erroneous result because of the lost update problem discussed
earlier. Transaction T2 reads the value of X before it is changed by transaction T1, so only the
effect of T2 on X is reflected in the database. The effect of T1 on X is lost,
overwritten by T2, leading to the incorrect result for item X. However, some nonserial
schedules give the correct expected result, such as schedule D. We would like to
determine which of the nonserial schedules always give a correct result and which
may give erroneous results. The concept used to characterize schedules in this manner
is that of serializability of a schedule.
Database Concurrency Control
• Transaction Processor is divided into:
– A concurrency-control manager, or scheduler, responsible for
assuring isolation of transactions
– A logging and recovery manager, responsible for the durability
of transactions.
• The scheduler (concurrency-control manager) must assure that the
individual actions of multiple transactions are executed in such an
order that the net effect is the same as if the transactions had in fact
executed one-at-a-time.

[Diagram: requests from transactions go to the Scheduler, which maintains a
Lock Table; the Scheduler issues the resulting reads and writes against the
Buffers.]

33
1. Purpose of Concurrency Control
– To enforce Isolation (through mutual exclusion) among
conflicting transactions.
– To preserve database consistency through consistency
preserving execution of transactions.
– To resolve read-write and write-write conflicts.

• A typical scheduler does its work by maintaining locks on


certain pieces of the database. These locks prevent two
transactions from accessing the same piece of data at the
same time. Example:
– In a concurrent execution environment, if T1 conflicts with
T2 over a data item A, then the concurrency control
mechanism decides whether T1 or T2 gets A and whether the
other transaction is rolled back or waits.

34
Example:

Bank database: 3 accounts

Account balances: A = 500, B = 500, C = 500

Property: A + B + C = 1500

Money does not leave the system.


35
Example

Transaction T1:                Transaction T2:
Transfer 100 from A to B       Transfer 100 from A to C

Read (A, t)                    Read (A, s)
t = t - 100                    s = s - 100
Write (A, t)                   Write (A, s)
Read (B, t)                    Read (C, s)
t = t + 100                    s = s + 100
Write (B, t)                   Write (C, s)

36
Transaction T1     Transaction T2     A    B    C
                                      500  500  500
Read (A, t)
t = t - 100
                   Read (A, s)
                   s = s - 100
                   Write (A, s)       400  500  500
Write (A, t)                          400  500  500
Read (B, t)
t = t + 100
Write (B, t)                          400  600  500
                   Read (C, s)
                   s = s + 100
                   Write (C, s)       400  600  600
Schedule                                             37
400 + 600 + 600 = 1600
Transaction T1     Transaction T2     A    B    C
                                      500  500  500
Read (A, t)
t = t - 100
Write (A, t)                          400  500  500
                   Read (A, s)
                   s = s - 100
                   Write (A, s)       300  500  500
Read (B, t)
t = t + 100
Write (B, t)                          300  600  500
                   Read (C, s)
                   s = s + 100
                   Write (C, s)       300  600  600
Alternative Schedule
300 + 600 + 600 = 1500                               38
So What ?
2. Concurrency Control Techniques
• Basic concurrency control techniques:
– Locking
– Timestamping
• Both are conservative approaches: they delay
transactions in case they conflict with other transactions.
• Optimistic methods assume conflict is rare and only check
for conflicts at commit.

39
2.1. Locking
• Lock is a variable associated with a data item that describes the status of
the data item with respect to the possible operations that can be applied
to it.
• Generally, a transaction must claim a shared (read) or exclusive (write)
lock on a data item before read or write.
• Lock prevents another transaction from modifying item or even reading
it, in the case of a write lock.
– Locking is an operation which secures
• (a) permission to Read
• (b) permission to Write a data item for a transaction.
• Example:
– Lock (X): Data item X is locked on behalf of the requesting
transaction.
– Unlocking is an operation which removes these permissions from the
data item.
• Example:
– Unlock (X): Data item X is made available to all other transactions.
– Lock and Unlock are Atomic operations.
40
 Two locks modes:
• (a) shared (read) (b) exclusive (write).
• Shared mode: shared lock (X)
– More than one transaction can apply a shared lock on X
to read its value, but no write lock can be applied
on X by any other transaction.
• Exclusive mode: Write lock (X)
– Only one write lock on X can exist at any time and no
shared lock can be applied by any other transaction
on X.
 Conflict matrix:

           Read   Write
Read        Y       N
Write       N       N

41
 Lock Manager:
• Manages locks on data items.
 Lock table:
• The lock manager uses it to store the identity of the
transaction locking a data item, the data item, the lock
mode, and a pointer to the next data item locked. One
simple way to implement a lock table is through a linked
list.

Transaction ID | Data item id | Lock mode | Ptr to next data item
T1             | X1           | Read      | Next

 The database requires that all transactions be well-formed.
A transaction is well-formed if:
• It locks a data item before it reads or writes it.
• It does not lock an already-locked data item, and it does
not try to unlock a free data item.
42
Locking - Basic Rules
1. There are two operations: lock_item(X) and unlock_item(X).
2. A transaction requests access to an item X by first issuing a
lock_item(X) operation.
3. If lock(X) = 1, the transaction is forced to wait.
4. If lock(X) = 0, it is set to 1 and the transaction is allowed to
access X.
5. When a transaction finishes operating on X, it issues an
unlock_item operation, which sets lock(X) to 0 so that X may be
accessed by other transactions.
6. If a transaction has a shared lock on an item, it can read but not
update the item.
7. If a transaction has an exclusive lock on an item, it can both read
and update the item.
8. Reads cannot conflict, so more than one transaction can hold
shared locks simultaneously on the same item.
9. An exclusive lock gives a transaction exclusive access to that item. 43
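Rules 1–5 describe a binary lock. A minimal sketch follows, with the simplifying assumption that a conflicting request returns False instead of blocking the caller (rule 3's "forced to wait"):

```python
locks = {}  # item name -> 0 (free) or 1 (locked); absent means free

def lock_item(X):
    """Rules 3 and 4: grant the lock only if X is currently free."""
    if locks.get(X, 0) == 1:
        return False        # rule 3: X is locked, caller must wait
    locks[X] = 1            # rule 4: set lock(X) to 1 and grant access
    return True

def unlock_item(X):
    """Rule 5: release X so another transaction may access it."""
    locks[X] = 0
```

Shared vs. exclusive modes (rules 6–9) would replace the 0/1 flag with a reader count plus a writer flag.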
• Lock conversion
– Lock upgrade: existing read lock to write lock
if Ti has a read-lock (X) and Tj has no read-lock (X) (i
 j) then
convert read-lock (X) to write-lock (X)
else
force Ti to wait until Tj unlocks X
– Lock downgrade: existing write lock to read lock
Ti has a write-lock (X) (*no transaction can have any lock on
X*) convert write-lock (X) to read-lock (X)
• Using such locks in transactions does not by itself guarantee
serializability of schedules: example

44
1.1 Simple Locking
T1                     T2
read_lock (Y);
read_item (Y);
unlock (Y);
                       read_lock (X);
                       read_item (X);
                       unlock (X);
                       write_lock (Y);
                       read_item (Y);
                       Y := X + Y;
                       write_item (Y);
                       unlock (Y);
write_lock (X);
read_item (X);
X := X + Y;
write_item (X);
unlock (X);
(time flows downward)

Result (with initial values X = 20, Y = 30): X = 50; Y = 50. Nonserializable
because it violates the two-phase policy.
• The problem is that transactions release locks too soon, resulting in
loss of total isolation and atomicity.
• To guarantee serializability, we need an additional protocol concerning
the positioning of lock and unlock operations in every transaction.

45
2.2 Two-Phase Locking Techniques: The algorithm
 Transaction follows 2PL protocol if all locking operations precede
first unlock operation in the transaction.
• Every transaction can be divided into Two Phases: Locking (Growing)
& Unlocking (Shrinking)
– Locking (Growing) Phase:
• A transaction applies locks (read or write) on desired data items
one at a time.
• acquires all locks but cannot release any locks.
– Unlocking (Shrinking) Phase:
• A transaction unlocks its locked data items one at a time.
• Releases locks but cannot acquire any new locks.
• Requirement:
– For a transaction, these two phases must be mutually exclusive:
during the locking phase the unlocking phase must not start, and
during the unlocking phase the locking phase must not begin.

[Figure: number of locks held by Ti over time — it rises during the growing
phase, then falls during the shrinking phase.]

46
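The growing/shrinking requirement can be sketched as a small guard object. This hypothetical class only enforces the two-phase rule; lock modes and conflicts between transactions are ignored for brevity.

```python
class TwoPhaseTxn:
    """Enforces 2PL: no lock may be acquired after the first unlock."""

    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)      # growing phase: acquire locks one at a time

    def unlock(self, item):
        self.shrinking = True    # shrinking phase begins; no new locks allowed
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("X")
t.lock("Y")      # growing phase
t.unlock("X")    # shrinking phase begins; t.lock("Z") would now raise
```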


Example
T1                     T2
read_lock (Y);         read_lock (X);
read_item (Y);         read_item (X);
unlock (Y);            unlock (X);
write_lock (X);        write_lock (Y);
read_item (X);         read_item (Y);
X := X + Y;            Y := X + Y;
write_item (X);        write_item (Y);
unlock (X);            unlock (Y);

Result
Initial values: X = 20; Y = 30
Result of serial execution T1 followed by T2: X = 50, Y = 80.
Result of serial execution T2 followed by T1: X = 70, Y = 50.
47
T’1                    T’2
read_lock (Y);         read_lock (X);
read_item (Y);         read_item (X);
write_lock (X);        write_lock (Y);
unlock (Y);            unlock (X);
read_item (X);         read_item (Y);
X := X + Y;            Y := X + Y;
write_item (X);        write_item (Y);
unlock (X);            unlock (Y);

T’1 and T’2 follow the two-phase policy, but they are subject to deadlock,
which must be dealt with.

• If every transaction in a schedule follows the 2PL protocol,
the schedule is guaranteed to be serializable.
• Limitation
– It may limit the amount of concurrency that can occur in
the schedule. How?
49
• Remark:
– The 2PL protocol guarantees serializability, but it does not permit
all possible serializable schedules.
– The use of locking can cause two additional problems:
• Deadlock
• Starvation

50
Dealing with Deadlock and Starvation
A. Deadlock
– A state that results when two or more transactions are
each waiting for locks held by the other to be released.
– Example:
T1                     T2
read_lock (Y);
read_item (Y);
                       read_lock (X);
                       read_item (X);
write_lock (X);
                       write_lock (Y);
– T1 is in the waiting queue for X, which is locked by T2.
– T2 is in the waiting queue for Y, which is locked by T1.
– Neither transaction can continue until the other completes.
– T1 and T2 did follow the two-phase policy, but they are deadlocked.
• So the DBMS must either prevent, or detect and resolve, such
deadlock situations. 51
• There are three possible solutions: deadlock prevention, deadlock detection
and resolution, and lock timeouts.
i. Deadlock prevention protocols: two possibilities
 Conservative two-phase locking
− A transaction locks all data items it refers to before it begins
execution.
− This way of locking prevents deadlock, since a transaction never
waits for a data item.
− Limitation: it restricts concurrency.
 Transaction timestamp (TS(T))
 We can prevent deadlocks by giving each transaction a priority
and ensuring that lower-priority transactions are not allowed to
wait for higher-priority transactions (or vice versa).
 One way to assign priorities is to give each transaction a
timestamp when it starts up.
 A timestamp is a unique identifier given to each transaction based
on its start time: if T1 starts before T2, then TS(T1) < TS(T2).
 The lower the timestamp, the higher the transaction's priority; that
is, the oldest transaction has the highest priority.
 If a transaction Ti requests a lock and transaction Tj holds a
conflicting lock, the lock manager can use one of the
following two policies: Wait-die and Wound-wait. 52
• Wait-die
– If Ti has higher priority (is older), it is allowed to wait; otherwise it
is aborted.
– An older transaction is allowed to wait on a younger transaction.
– A younger transaction requesting an item held by an older
transaction is aborted.
– If TS(Ti) < TS(Tj), then (Ti is older than Tj) Ti is allowed to wait.
– Otherwise (Ti is younger than Tj), abort Ti (Ti dies) and restart it
later with the same timestamp.

[Figure: T1 (ts = 10) holds M and requests K, held by T2: T1 is older, so it
waits. T2 (ts = 20) holds K and requests Z, held by T3: T2 is older, so it
waits. T3 (ts = 25) holds Z and requests M, held by T1: T3 is younger, so it
is aborted, breaking the cycle.]

53
• Wound-wait
– The opposite of wait-die.
– If Ti has higher priority (is older), abort Tj; otherwise Ti waits.
– A younger transaction is allowed to wait on an older one.
– An older transaction requesting an item held by a younger
transaction preempts the younger transaction by aborting it.
– If TS(Ti) < TS(Tj), then (Ti is older than Tj) abort Tj (Ti wounds
Tj) and restart Tj later with the same timestamp.
– Otherwise (Ti is younger than Tj), Ti is allowed to wait.

[Figure: T1 (ts = 25) waits on T2 (ts = 20), which waits on T3 (ts = 10):
each younger transaction waits on an older one.]

54
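Both policies reduce to a comparison of the two timestamps. A sketch, assuming a lower timestamp means older and therefore higher priority, where Ti requests an item held by Tj; the string return values are an illustrative convention:

```python
def wait_die(ts_i, ts_j):
    """Wait-die: an older requester waits; a younger requester dies."""
    return "wait" if ts_i < ts_j else "abort Ti"

def wound_wait(ts_i, ts_j):
    """Wound-wait: an older requester wounds (aborts) the holder;
    a younger requester waits."""
    return "abort Tj" if ts_i < ts_j else "wait"
```

Note that under both policies it is always the younger of the two transactions that ends up aborted, which is what the remark on the next slide states.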
 Remark:
− Both methods end by aborting the younger of the two transactions
that may be involved in a deadlock.
− Limitation:
−Both techniques may cause some transactions to be aborted
and restarted needlessly.
ii. Deadlock detection and resolution
– In this approach, deadlocks are allowed to happen.
– The scheduler maintains a wait-for graph for detecting cycles.
– When a chain like "Ti waits for Tj, Tj waits for Tk, Tk waits for Ti"
occurs, this creates a cycle.
– When the system is in a state of deadlock, some of the transactions
should be selected as victims, aborted, and rolled back.
– Victims can be chosen among the transactions that have done the
least work, hold the fewest locks, have been aborted the fewest
times, and so on.
– Example:

55
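Cycle detection over the wait-for graph can be sketched with a depth-first search. The graph encoding (a dict mapping each transaction to the transactions it waits for) is an assumption for illustration:

```python
def has_cycle(wait_for):
    """Return True if the wait-for graph contains a cycle (a deadlock).
    An edge Ti -> Tj means Ti waits for a lock held by Tj."""
    visited, on_path = set(), set()

    def dfs(t):
        if t in on_path:
            return True          # back edge: a cycle was found
        if t in visited:
            return False         # already fully explored, no cycle through t
        visited.add(t)
        on_path.add(t)
        if any(dfs(n) for n in wait_for.get(t, [])):
            return True
        on_path.discard(t)
        return False

    return any(dfs(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: a deadlock
deadlocked = has_cycle({"T1": ["T2"], "T2": ["T1"]})
```

Once a cycle is found, the scheduler would pick a victim on that cycle (e.g. the one holding the fewest locks) and roll it back.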
iii. Timeouts
– Uses the period of time that transactions have been
waiting to lock items.
– It has lower overhead cost and is simple.
– If a transaction waits longer than a predefined timeout
period, the system assumes it may be deadlocked and
aborts it.
B. Starvation
– Starvation occurs when a particular transaction consistently
waits or is restarted and never gets a chance to proceed,
while other transactions continue normally.
– This may occur if the waiting scheme for item locking:
• gives priority to some transactions over others;
• has a problem in the victim selection algorithm: the same
transaction may repeatedly be selected as victim and rolled
back, e.g. in Wound-wait.
– Solutions:
• FIFO queues
• Give priority to transactions that have waited longer
• Give higher priority to transactions that have been aborted
many times 56
3. Timestamp-based concurrency control algorithms
• Timestamp
– In lock-based concurrency control, conflicting actions of different
transactions are ordered by the order in which locks are obtained.
– Timestamp values are assigned based on the time at which transactions
are submitted to the system, using the system's current date and time.
– A timestamp is a monotonically increasing value (e.g., an integer)
indicating the age of an operation or a transaction.
– A larger timestamp value indicates a more recent event or operation.
– Timestamp-based algorithms use timestamps to serialize the execution
of concurrent transactions.
– They do not use locks, so deadlock cannot occur.
– In timestamp ordering, conflicting operations in the schedule must
not violate the serializable ordering.
– This is achieved by associating timestamp values (TS) with each
database item, denoted as follows:

57
a) read_TS(X): the read timestamp of X — the largest
among all the timestamps of transactions that have successfully
read item X.
b) write_TS(X): the largest of all the timestamps of transactions
that have successfully written item X.
 The concurrency control algorithm checks whether a conflicting
operation violates the timestamp ordering in the following
manner (three variants exist; the strict variant is shown here):
i. Strict Timestamp Ordering
1. Transaction T issues a write_item(X) operation:
•If TS(T) > write_TS(X), then delay T until the
transaction T’ that wrote X has terminated
(committed or aborted).
2. Transaction T issues a read_item(X) operation:
•If TS(T) > write_TS(X), then delay T until the
transaction T’ that wrote X has terminated (committed or
aborted).
58
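The read_TS/write_TS checks can be sketched for the basic timestamp-ordering variant, which aborts the requesting transaction instead of delaying it as the strict variant above does. The function names and the "ok"/"abort" return convention are illustrative assumptions:

```python
read_TS, write_TS = {}, {}   # per-item timestamps; absent means 0

def to_read(ts, X):
    """Basic TO read check for transaction with timestamp ts."""
    if ts < write_TS.get(X, 0):
        return "abort"       # a younger transaction already wrote X
    read_TS[X] = max(read_TS.get(X, 0), ts)
    return "ok"

def to_write(ts, X):
    """Basic TO write check for transaction with timestamp ts."""
    if ts < read_TS.get(X, 0) or ts < write_TS.get(X, 0):
        return "abort"       # a younger transaction already read or wrote X
    write_TS[X] = ts
    return "ok"
```

An aborted transaction is restarted with a new, larger timestamp, so the serialization order always matches timestamp order.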
4. Schedules and Recoverability
 Why recovery is needed:
- Whenever a transaction is submitted to the DBMS for execution, the system is
responsible for making sure that either all operations in the transaction are
completed successfully, or the transaction has no effect whatsoever on the
database or on any other transaction.
- The DBMS may permit some operations of a transaction T to be applied to the
database, but the transaction may fail after executing only some of its operations.
• What causes a transaction to fail:
1. A computer failure (system crash):
A hardware or software error occurs in the computer system during transaction
execution. If the hardware crashes, the contents of the computer’s internal
memory may be lost.
2. A transaction or system error:
Some operation in the transaction may cause it to fail, such as integer
overflow or division by zero. Transaction failure may also occur because of
erroneous parameter values or because of a logical programming error. In
addition, the user may interrupt the transaction during its execution.
59
3. Exception conditions detected by the transaction:
– Certain conditions force cancellation of the transaction.
– For example, data needed by the transaction may not be found, or a
condition such as insufficient account balance in a banking database may
cause a transaction, such as a fund withdrawal from that account, to be
canceled.
4. Concurrency control enforcement:
The concurrency control method may decide to abort the transaction, to
be restarted later, because it violates serializability or because several
transactions are in a state of deadlock (see Chapter 2).
5. Disk failure:
Some disk blocks may lose their data because of a read or write malfunction
or because of a disk read/write head crash. This may happen during a read
or a write operation of the transaction.
6. Physical problems and catastrophes:
This refers to an endless list of problems, including power or air-conditioning
failure, fire, theft, and overwriting disks or tapes by mistake.
60
Transactions should be durable, but we cannot prevent all sorts
of failures:
• Natural disasters
Prevention is better than cure
• Reliable OS
• Security
• UPS and surge protectors
Database Recovery Techniques
• Backup and Recovery Concepts
• Recovery Concepts Based on Deferred Update
• Recovery Concepts Based on Immediate Update
• Shadow Paging
62
 Transaction Manager: Accepts transaction commands from an application, which
tell the transaction manager when transactions begin and end, as well as
information about the expectations of the application.
The transaction processor performs the following tasks:
 Logging: In order to assure durability, every change in the database is
logged separately on disk.
 The log manager initially writes the log in buffers and negotiates with the
buffer manager to make sure that buffers are written to disk at appropriate
times.
 Recovery Manager: will be able to examine the log of changes and restore the
database to some consistent state.
[Figure: Transaction Request flows into the Query Processor, Transaction
Manager and Log Manager; the Buffer Manager and Recovery Manager sit below
them, over the Data and Log stores on disk.]
63
1. Recovery Concept
i. Purpose of Database Recovery
– To bring the database into the last consistent state, which existed
prior to the failure.
– To preserve transaction properties (Atomicity and Durability).
• The recovery manager of a DBMS is responsible for ensuring
– atomicity, by undoing the actions of transactions that do not
commit
– durability, by making sure that all actions of committed
transactions survive a system crash
ii. Types of Failure
– The database may become unavailable for use due to:
• Transaction failure: transactions may fail because of
incorrect input, deadlock, etc.
• System failure: the system may fail because of an addressing
error, application error, operating system fault, RAM
failure, etc.
• Media failure: disk head crash, power disruption, etc. 64
 To recover from system failure, the system keeps information
about the changes in the system log.
 The strategy for recovery may be summarized as:
 Recovery from catastrophic failure
 When there is extensive damage to a wide portion of the
database.
 This method restores a past copy of the database from
backup storage and reconstructs the operations of
committed transactions from the backed-up log, up to the
time of failure.
 Recovery from non-catastrophic failure
 When the database is not physically damaged but has
become inconsistent.
 The strategy undoes and redoes some operations in order
to restore the database to a consistent state. 65
 For instance,
– If a failure occurs between commit and the database buffers
being flushed to secondary storage then, to ensure
durability, the recovery manager has to redo (roll forward)
the transaction's updates.
– If a transaction had not committed at failure time, the
recovery manager has to undo (roll back) any effects of
that transaction, for atomicity.
Fig 2: Status of transactions at the time the system fails
– DBMS starts at time t0, but fails at time tf.
– T1 and T6 have to be undone. In absence of any other
information, recovery manager has to redo T2, T3, T4, and T5. 66
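The undo/redo decision of Fig 2 can be expressed as a tiny classifier. This is an illustrative sketch only; `classify_at_failure` and the status dictionary are invented names, not part of any real DBMS.

```python
# Hypothetical helper mirroring Fig 2: transactions that committed before the
# crash are redone, those still active at tf are undone.

def classify_at_failure(status):
    """status: dict mapping transaction id -> True if committed at crash time."""
    redo = sorted(t for t, committed in status.items() if committed)
    undo = sorted(t for t, committed in status.items() if not committed)
    return redo, undo

# T1 and T6 were still running when the system failed at tf.
redo, undo = classify_at_failure(
    {"T1": False, "T2": True, "T3": True, "T4": True, "T5": True, "T6": False})
```

With the Fig 2 transaction statuses, this yields T2-T5 for redo and T1, T6 for undo.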
iii. Transaction Log
– For recovery from any type of failure, the data value prior to
modification (BFIM - BeFore Image) and the new value
after modification (AFIM - AFter Image) are required.
– These values and other information are stored in a
sequential (append-only) file called the transaction log.
– These log files become very useful in bringing the
system back to a stable state after a system crash.
– A sample log is given below. Back P and Next P point
to the previous and next log records of the same
transaction.
T ID  Back P  Next P  Operation  Data item  BFIM     AFIM
T1    0       1       Begin
T1    1       4       Write      X          X = 100  X = 200
T2    0       8       Begin
T1    2       5       Write      Y          Y = 50   Y = 100
T1    4       7       Read       M          M = 200  M = 200
T3    0       9       Read       N          N = 400  N = 400
T1    5       nil     End
67
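The sample log above can be mimicked with a minimal append-only structure. The field names (`txn`, `bfim`, `afim`) and the list-based log are invented for the example; a real log is a sequential disk file.

```python
# A minimal stand-in for the sequential (append-only) transaction log.

log = []

def log_write(txn, item, bfim, afim):
    """Append a write record carrying both the before and after images."""
    log.append({"txn": txn, "op": "Write", "item": item,
                "bfim": bfim, "afim": afim})

log.append({"txn": "T1", "op": "Begin"})
log_write("T1", "X", 100, 200)   # X: 100 -> 200, matching the first row above
log_write("T1", "Y", 50, 100)
log.append({"txn": "T1", "op": "End"})
```

Because records are only ever appended, the BFIM/AFIM pairs needed for undo and redo are preserved in order.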
iv. Data Update: four types
– Deferred Update:
 All transaction updates are recorded in the local workspace
(cache).
 All modified data items in the cache are then written to disk
after the transaction ends its execution, or after a fixed
number of transactions have completed their execution.
 During commit, the updates are first recorded on the log and
then on the database.
 If a transaction fails before reaching its commit point, no
undo is needed because it has not changed the database yet.
 If a transaction fails after commit (writing the log) but
before finishing saving to the database, redo from the log
is needed.
68
– Immediate Update:
 As soon as a data item is modified in the cache, the disk
copy is updated.
 These updates are first recorded on the log and saved on disk
by force writing, before the database is updated.
 If a transaction fails after recording some changes to the
database but before reaching its commit point, those changes
are rolled back.
– Shadow Update:
 The modified version of a data item does not overwrite
its disk copy but is written at a separate disk location.
 Multiple versions of the same data item can be maintained.
 Thus the old value (before image, BFIM) and the new
value (AFIM) are both kept on disk.
 No log is needed for recovery.
– In-place Update: the disk version of the data item is
overwritten by the cache version.
69
v. Data Caching
 Data items to be modified are first stored in the database cache
by the Cache Manager (CM), and after modification they are
flushed (written) to the disk.
 When the DBMS requests a read/write operation on some item:
 It checks whether the requested data item is in the cache.
 If it is not, the appropriate disk blocks are copied into the
cache.
 If the cache is already full, some buffer replacement
policy can be used, such as:
 Least recently used (LRU)
 FIFO
 Before a buffer is replaced, any updated value in that buffer
must first be saved to the appropriate block in the
database.
70
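The caching behaviour above can be sketched as an LRU buffer pool that writes a dirty page back before evicting it. `BufferPool`, `disk`, and `flushed` are made-up names for illustration, not a real DBMS API.

```python
from collections import OrderedDict

# Illustrative LRU cache sketch: a dirty page is written back to "disk"
# before its buffer is reused, as the replacement rule above requires.

class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity, self.disk = capacity, disk
        self.cache = OrderedDict()      # page id -> (value, dirty flag)
        self.flushed = []               # pages written back on eviction

    def get(self, pid):
        if pid not in self.cache:
            if len(self.cache) >= self.capacity:
                victim, (val, dirty) = self.cache.popitem(last=False)  # LRU
                if dirty:
                    self.disk[victim] = val      # save update before reuse
                    self.flushed.append(victim)
            self.cache[pid] = (self.disk[pid], False)
        self.cache.move_to_end(pid)     # mark as most recently used
        return self.cache[pid][0]

    def put(self, pid, value):
        self.get(pid)                   # ensure the page is cached
        self.cache[pid] = (value, True) # modify in cache, mark dirty

disk = {"A": 1, "B": 2, "C": 3}
pool = BufferPool(2, disk)
pool.put("A", 10)                       # dirty page in cache only
pool.get("B")
pool.get("C")                           # evicts A, flushing 10 to disk first
```

Accessing a third page forces the dirty page A out, and its updated value reaches the disk copy before the buffer is reused.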
vi. Write-Ahead Logging
 When in-place update (immediate or deferred) is used, a log
is necessary for recovery.
 This log must be available to the recovery manager.
– This is achieved by the Write-Ahead Logging (WAL)
protocol. WAL states that:
– For Undo: before a data item's AFIM is flushed to the
database disk (overwriting the BFIM), its BFIM must be
written to the log and the log must be saved on stable
store (the log disk).
– For Redo: before a transaction executes its commit
operation, all its AFIMs must be written to the log and
the log must be saved on stable store.
71
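The WAL undo rule can be sketched in a few lines: the BFIM reaches the stable log before the in-place overwrite of the database copy. All names here (`stable_log`, `database`, `wal_write`) are invented for the sketch.

```python
# A minimal sketch of the WAL undo rule.

stable_log = []          # stands in for the log disk (stable store)
database = {"X": 100}    # stands in for the database disk

def wal_write(item, new_value):
    # 1. Log the before-image first, so undo information survives a crash.
    stable_log.append({"item": item, "bfim": database[item], "afim": new_value})
    # 2. Only then overwrite the database copy in place.
    database[item] = new_value

wal_write("X", 200)
```

If a crash occurs between the two steps, the logged BFIM (100) is still available to roll the overwrite back.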
 Standard Recovery Terminology
 Possible ways of flushing the database cache to the database
disk:
i. No-Steal: cache pages cannot be flushed before transaction
commit.
ii. Steal: cache pages can be flushed before the transaction
commits.
iii. Force: all cache updates are immediately flushed
(forced) to disk when a transaction commits-----
force writing
iv. No-Force: cache pages are flushed to disk as the need
arises, possibly after the transaction has committed.
 These give rise to four different ways of handling
recovery:
• Steal/No-Force (Undo/Redo)
• Steal/Force (Undo/No-redo)
• No-Steal/No-Force (No-undo/Redo)
• No-Steal/Force (No-undo/No-redo) 72
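The four combinations above map directly onto the recovery work required. A tiny sketch with a hypothetical helper name:

```python
# Steal: uncommitted changes may reach disk, so undo may be needed.
# No-Force: committed changes may not have reached disk, so redo may be needed.

def recovery_actions(steal, force):
    return {"undo": steal, "redo": not force}

assert recovery_actions(steal=True,  force=False) == {"undo": True,  "redo": True}
assert recovery_actions(steal=False, force=True)  == {"undo": False, "redo": False}
```

No-Steal/Force is thus the cheapest at recovery time (no undo, no redo) but the most restrictive during normal operation.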
2. Recovery Scheme
i. Deferred Update (No Undo/Redo)
– The data update goes as follows:
 A set of transactions records their updates in the log.
 At the commit point, under the WAL scheme, these updates are saved on
the database disk.
 No undo is required because no AFIM is flushed to the disk before a
transaction commits.
 After reboot from a failure, the log is used to redo all the transactions
affected by the failure.
– Limitation: the cache may run out of buffer space, because a transaction's
changes must be held in the cache buffers until its commit point.
– Types of deferred-update recovery environment:
 single-user and multi-user environments
a) Deferred Update in a single-user system
 There is no concurrent data sharing in a single-user system.
 The data update goes as follows:
 A set of transactions records their updates in the log.
 At the commit point, under the WAL scheme, these updates are saved
on the database disk.
73
74
b) Deferred Update with concurrent users
 This environment requires some concurrency control mechanism
to guarantee the isolation property of transactions.
 In system recovery, transactions which were recorded in the
log after the last checkpoint are redone.
 The recovery manager may scan some of the transactions
recorded before the checkpoint to get the AFIMs.
T4 and T5 are ignored because they did not reach their commit points.
T1, T2 and T3 are redone because their commit points are after the last
checkpoint.
75
 Two tables are required for implementing this protocol:
 Active table: all active transactions are entered in this
table.
 Commit table: transactions to be committed are entered in
this table.
 During recovery, all transactions in the commit table are
redone and all transactions in the active table are ignored, since
none of their AFIMs reached the database.
 It is possible that a commit-table transaction may be redone
twice, but this does not create any inconsistency because
redo is "idempotent":
 that is, applying a redo for an AFIM once is equivalent to
applying it multiple times.
76
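A redo-only recovery pass using the commit table can be sketched as follows. The log layout and names are invented for illustration, and the final assertion shows why idempotence makes repeated redo safe.

```python
# Illustrative No-undo/Redo recovery: only transactions in the commit table
# are redone from their logged AFIMs; active transactions are ignored.

def redo_recovery(log, committed):
    db = {}
    for rec in log:                        # forward scan of the log
        if rec["txn"] in committed and rec["op"] == "write":
            db[rec["item"]] = rec["afim"]  # install the after-image
    return db

log = [
    {"txn": "T1", "op": "write", "item": "X", "afim": 200},
    {"txn": "T4", "op": "write", "item": "Y", "afim": 99},   # T4 never committed
    {"txn": "T2", "op": "write", "item": "Z", "afim": 7},
]
db = redo_recovery(log, committed={"T1", "T2"})

# Redo is idempotent: replaying the whole log twice gives the same state.
assert redo_recovery(log + log, {"T1", "T2"}) == db
```

T4's write never reaches the database because T4 is in the active table, not the commit table.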
ii. Recovery Techniques Based on Immediate Update
 Undo/No-redo Algorithm
– In this algorithm, the AFIMs of a transaction are flushed to the
database disk under WAL before it commits.
– For this reason, the recovery manager undoes all uncommitted
transactions during recovery.
– No transaction is redone.
– It is possible that a transaction has completed execution and is
ready to commit, but such a transaction is also undone.
77
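The undo pass can be sketched as a backward scan that restores the BFIMs of uncommitted transactions. The structures and names here are illustrative only.

```python
# Illustrative Undo/No-redo recovery: since AFIMs were forced to disk before
# commit, recovery only rolls back uncommitted transactions using logged BFIMs.

def undo_recovery(db, log, uncommitted):
    for rec in reversed(log):              # backward scan restores BFIMs
        if rec["txn"] in uncommitted:
            db[rec["item"]] = rec["bfim"]
    return db

db = {"X": 200, "Y": 99}                   # disk state at crash time
log = [
    {"txn": "T1", "item": "X", "bfim": 100},   # T1 committed: keep its AFIM
    {"txn": "T2", "item": "Y", "bfim": 50},    # T2 uncommitted: roll back
]
undo_recovery(db, log, uncommitted={"T2"})
```

Only T2's change is reversed; the committed T1 keeps its value on disk, so no redo is needed.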
iii. Shadow Paging Recovery
 Maintain two page tables during the life of a transaction: the current
page table and the shadow page table.
 When the transaction starts, the two page tables are identical.
 The shadow page table is never changed thereafter and is used to restore
the database in the event of failure.
 During the transaction, the current page table records all updates to the
database.
 When the transaction completes, the current page table becomes the shadow
page table.
[Figure: database pages holding X and Y alongside X' and Y'.]
 X and Y: shadow copies of data items
 X' and Y': current copies of data items
78
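A minimal sketch of the shadow/current page-table mechanism, with all structures invented for illustration: updates go to fresh pages through the current table while the shadow table keeps pointing at the old pages.

```python
# Shadow-paging sketch: writes never happen in place, so a crash before
# commit simply discards the current page table.

pages = {0: "old-X", 1: "old-Y"}          # page id -> contents
shadow = {"X": 0, "Y": 1}                 # shadow page table, never modified
current = dict(shadow)                    # current table starts as a copy
next_page = 2

def update(item, value):
    global next_page
    pages[next_page] = value              # write to a new page, not in place
    current[item] = next_page             # only the current table is redirected
    next_page += 1

update("X", "new-X")
committed_table = current  # at commit, the current table becomes the new shadow
```

Until commit, the shadow table still reaches the old value of X, which is exactly what recovery would restore after a failure.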
 To manage access to data items by concurrent transactions, two
directories (current and shadow) are used.
 The directory arrangement is illustrated below. Here a
page is a data item.
 Advantages over log-based recovery:
 The overhead of maintaining a log is removed.
 Recovery is faster since there is no need for undo and redo.
 Disadvantages:
 Updated pages change their location on disk.
 Garbage collection is needed after a transaction commits, so as to free
the old pages for future use.
 Migration between the current and shadow directories may not be 79
automatic.
Thank You!
80