UNIT – V
Transaction Processing: Transaction concepts, transaction states, ACID properties,
implementation of atomicity and durability, implementation of isolation
Concurrency control: concurrent executions, serializability, testing for serializability,
recoverability, concurrency control protocols, pessimistic and optimistic concurrency control
schemes, log based recovery, recovery with concurrent transactions, advanced recovery
techniques
TRANSACTION CONCEPTS
A transaction is a set of logically related operations; it contains a group of tasks.
A transaction is an action, or series of actions, performed by a single user or application to
read or update the contents of the database.
Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account.
This small transaction involves several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a
buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the buffer.
Let's take the example of a debit transaction on an account, which consists of the following
operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before the start of the transaction is 4000.
The first operation reads X's value from the database and stores it in a buffer.
The second operation decreases the value of X by 500, so the buffer will contain 3500.
The third operation writes the buffer's value back to the database, so X's final value will
be 3500.
However, the transaction may fail before finishing all the operations in the set because of a
hardware, software, or power failure, etc.
For example, if the above debit transaction fails after executing operation 2, then X's value
will remain 4000 in the database, which is not acceptable by the bank.
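The three operations can be mimicked with a short Python sketch (the in-memory account store and the read_item/write_item helpers are hypothetical stand-ins for the buffer and the database); it shows how a crash between operation 2 and operation 3 leaves the stored value unchanged:

```python
# Hypothetical in-memory "database"; in a real DBMS the value lives on disk.
database = {"X": 4000}

def read_item(name):
    """R(X): copy the item's value from the database into a local buffer."""
    return database[name]

def write_item(name, value):
    """W(X): write the buffered value back to the database."""
    database[name] = value

def debit(amount, crash_after_compute=False):
    buffer_x = read_item("X")      # operation 1: R(X)
    buffer_x = buffer_x - amount   # operation 2: X = X - 500 (in the buffer only)
    if crash_after_compute:
        raise RuntimeError("power failure before W(X)")
    write_item("X", buffer_x)      # operation 3: W(X)

try:
    debit(500, crash_after_compute=True)
except RuntimeError:
    pass

print(database["X"])  # still 4000: the update never reached the database
```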
TRANSACTION STATES
In a database, the transaction can be in one of the following states -
Active state
o The active state is the first state of every transaction. In this state, the transaction is being
executed.
o For example: Insertion or deletion or updating a record is done here. But all the records are
still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the data is still
not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed in
this state.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully. In
this state, all the effects are now permanently saved on the database system.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction is said
to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch
the marks, then the transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached a failed state then the database
recovery system will make sure that the database is in its previous consistent state. If not
then it will abort or roll back the transaction to bring the database into a consistent state.
o If the transaction fails in the middle of its execution, then all the operations it has already
performed are rolled back, returning the database to its consistent state before the transaction began.
o After aborting the transaction, the database recovery module will select one of the two
operations:
1. Re-start the transaction
2. Kill the transaction
ACID PROPERTIES
A transaction has four properties, known as the ACID properties. They are used to maintain
consistency in a database before and after the transaction.
Properties of a Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that either all operations of the transaction take place, or none do; if the transaction
cannot complete, it is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as
one unit and either run to completion or is not executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Suppose the following transaction T, consisting of T1 and T2, transfers Rs 100 from
account A to account B. Account A contains Rs 600 and account B contains Rs 300.

| T1           | T2           |
|--------------|--------------|
| Read(A)      | Read(B)      |
| A := A - 100 | B := B + 100 |
| Write(A)     | Write(B)     |
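A minimal Python sketch of this transfer, assuming a hypothetical in-memory accounts dictionary: the try/except plays the role of the recovery component, restoring the old values if the credit step fails so that the transfer is all-or-nothing.

```python
accounts = {"A": 600, "B": 300}   # hypothetical starting balances

def transfer(amount, fail_before_credit=False):
    old_a, old_b = accounts["A"], accounts["B"]       # remember old values (undo information)
    try:
        accounts["A"] = old_a - amount                # T1 part: debit A
        if fail_before_credit:
            raise RuntimeError("crash between debit and credit")
        accounts["B"] = old_b + amount                # T2 part: credit B
    except RuntimeError:
        accounts["A"], accounts["B"] = old_a, old_b   # abort: roll back to the old values

transfer(100, fail_before_credit=True)
print(accounts)   # {'A': 600, 'B': 300} -- nothing happened, atomicity preserved
transfer(100)
print(accounts)   # {'A': 500, 'B': 400} -- both operations took effect
```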
Consistency
o The integrity constraints are maintained so that the database is consistent before and after
the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a new
stable state.
o The consistent property of database states that every transaction sees a consistent database
instance.
o The transaction is used to transform the database from one consistent state to another
consistent state.
For example: The total amount (A + B) must be the same before and after the transaction.
Isolation
o It shows that the data which is used at the time of execution of a transaction cannot be used
by the second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and using the data item X, then that data
item can't be accessed by any other transaction T2 until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
o The durability property guarantees that once a transaction completes successfully, the changes
it has made to the database are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or by the system
failure. When a transaction is completed, then the database reaches a state known as the
consistent state. That consistent state cannot be lost, even in the event of a system's failure.
o The recovery subsystem of the DBMS is responsible for enforcing the durability property.
IMPLEMENTATION OF ATOMICITY AND DURABILITY
Atomicity:
Even in the case of mistakes, failures, or crashes, atomicity ensures that the database maintains
consistency. The following are some of the reasons why atomicity is essential in DBMS:
o Consistency: Atomicity ensures that the database remains in a consistent state at all times.
All changes made by a transaction are rolled back if it is interrupted or fails for any other
reason, returning the database to its initial state. By doing this, the database's consistency
and data integrity are maintained.
o Recovery: Atomicity guarantees that, in the event of a system failure or crash, the database
can be restored to a consistent state. All changes made by a transaction are undone if it is
interrupted or fails, and the database is then reset to its initial state using the undo log. This
guarantees that, even in the event of failure, the database may be restored to a consistent
condition.
o Concurrency: Atomicity ensures that transactions can run simultaneously without
affecting one another. Each transaction is carried out independently of the others, and its
modifications are kept separate. This guarantees that numerous users can access the database
concurrently without resulting in conflicts or inconsistent data.
o Reliability: Atomicity guarantees that the database remains trustworthy even in the face of
mistakes or failures. By ensuring that transactions are atomic, the database remains
consistent and reliable, even in the event of system failures, crashes, or errors.
o Undo Log: An undo log is a mechanism used to keep track of the changes made by a
transaction before it is committed to the database. If a transaction fails, the undo log is used
to undo the changes made by the transaction, effectively rolling back the transaction. By
doing this, the database is guaranteed to remain in a consistent condition.
o Redo Log: A redo log is a mechanism used to keep track of the changes made by a
transaction after it is committed to the database. If a system failure occurs after a transaction
is committed but before its changes are written to disk, the redo log can be used to redo the
changes and ensure that the database is consistent.
o Two-Phase Commit: Two-phase commit is a protocol used to ensure that all nodes in a
distributed system commit or abort a transaction together. This ensures that the transaction
is executed atomically across all nodes and that the database remains consistent across the
entire system.
o Locking: Locking is a mechanism used to prevent multiple transactions from accessing the
same data concurrently. By ensuring that only one transaction can edit a specific piece of
data at once, locking helps to avoid conflicts and maintain the consistency of the database.
Durability:
One of the key characteristics of transactions in database management systems (DBMS) is
durability, which guarantees that changes made by a transaction once it has been committed
are permanently kept in the database and will not be lost even in the case of a system failure
or catastrophe.
Importance:
Durability is a critical property of transactions in database management systems (DBMS)
that ensures that once a transaction is committed, its changes are permanently stored in the
database and will not be lost, even in the event of a system failure or crash. The following
are some of the reasons why durability is essential in DBMS:
o Data Integrity: Durability ensures that the data in the database remains consistent and
accurate, even in the event of a system failure or crash. It guarantees that committed
transactions are durable and will be recovered without data loss or corruption.
o Reliability: Durability guarantees that the database will continue to be dependable despite
faults or failures. In the event of system problems, crashes, or failures, the database is kept
consistent and trustworthy by making sure that committed transactions are durable.
o Recovery: Durability guarantees that, in the event of a system failure or crash, the database
can be restored to a consistent state. The database can be restored to a consistent state if a
committed transaction is lost due to a system failure or crash since it can be recovered from
the redo log or other backup storage.
o Availability: Durability ensures that the data in the database is always available for access
by users, even in the event of a system failure or crash. It ensures that committed
transactions are always retained in the database and are not lost in the event of a system
crash.
o Transactions: Transactions are used to group related operations that need to be executed
atomically. They are either committed, in which case all their changes become permanent,
or rolled back, in which case none of their changes are made permanent.
o Logging: Logging is a technique that involves recording all changes made to the database in
a separate file called a log. The log is used to recover the database in case of a failure.
Write-ahead logging is a common technique that guarantees that data is written to the log
before it is written to the database (a minimal sketch of this rule follows this list).
o Shadow Paging: Shadow paging is a technique that involves making a copy of the database
before any changes are made. The copy is used to provide a consistent view of the database
in case of failure. The modifications are made to the original database after a transaction has
been committed.
o Backup and Recovery: In order to guarantee that the database can be recovered to a
consistent state in the event of a failure, backup and recovery procedures are used. This
involves making regular backups of the database and keeping track of changes made to the
database since the last backup.
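A minimal sketch of the write-ahead rule from the Logging point above, using hypothetical in-memory log and data_pages structures: the log record describing a change is appended (and, in a real system, flushed to stable storage) before the data itself is overwritten, so a crash can always be repaired from the log.

```python
# Hypothetical write-ahead log: each record stores the old and new values so the
# change can later be undone or redone after a crash.
log = []                      # would live on stable storage in a real system
data_pages = {"A": 600}       # the database pages themselves

def wal_update(txn_id, item, new_value):
    old_value = data_pages[item]
    # Rule: the log record must be written BEFORE the data page is modified.
    log.append({"txn": txn_id, "item": item, "old": old_value, "new": new_value})
    data_pages[item] = new_value

wal_update("T1", "A", 500)
log.append({"txn": "T1", "commit": True})   # the commit record makes the change durable
print(log)
```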
IMPLEMENTATION OF ISOLATION
Isolation is one of the core ACID properties of a database transaction, ensuring that the
operations of one transaction remain hidden from other transactions until completion. It means
that no two transactions should interfere with each other and affect the other's intermediate
state.
Isolation Levels
Isolation levels define the degree to which a transaction must be isolated from the data
modifications made by any other transaction in the database system. There are four levels of
transaction isolation defined by SQL -
1. Serializable
The highest isolation level.
Guarantees full serializability and ensures complete isolation of transaction operations.
2. Repeatable Read
This is the second most restrictive isolation level.
The transaction holds read locks on all rows it references.
It holds write locks on all rows it inserts, updates, or deletes.
Since other transactions cannot update or delete these rows, it avoids non-repeatable
reads.
3. Read Committed
This isolation level allows only committed data to be read.
Thus it does not allow dirty reads (i.e., one transaction reading data immediately after it is
written by another, not yet committed transaction).
The transaction holds a read or write lock on the current row, and thus prevents other
transactions from updating or deleting it.
4. Read Uncommitted
This is the lowest isolation level.
In this level, one transaction may read not-yet-committed changes made by another
transaction.
This level allows dirty reads.
The proper isolation level or concurrency control mechanism to use depends on the specific
requirements of a system and its workload. Some systems may prioritize high throughput and
can tolerate lower isolation levels, while others might require strict consistency and higher
isolation.
The isolation levels differ in which read anomalies they permit:

| Isolation Level  | Dirty Read | Non-Repeatable Read |
|------------------|------------|---------------------|
| Serializable     | NO         | NO                  |
| Repeatable Read  | NO         | NO                  |
| Read Committed   | NO         | YES                 |
| Read Uncommitted | YES        | YES                 |
CONCURRENT EXECUTIONS
In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that the same database
is executed simultaneously on a multi-user system by different users.
While working on the database transactions, there occurs the requirement of using the
database by multiple users for performing different operations, and in that case, concurrent
execution of the database is performed.
Executing a single transaction at a time increases the waiting time of the other
transactions, which may result in a delay in the overall execution. Hence, to increase the
overall throughput and efficiency of the system, several transactions are executed concurrently.
Concurrency control is a very important concept of DBMS which ensures the
simultaneous execution or manipulation of data by several processes or users without
resulting in data inconsistency.
Concurrency control provides a procedure that is able to control concurrent execution of
the operations in the database
Implementing isolation typically involves concurrency control mechanisms. Here are common
mechanisms used:
1. Locking Mechanisms
Locking ensures exclusive access to a data item for a transaction. This means that while one
transaction holds a lock on a data item, no other transaction can access that item.
Shared Lock (S-lock): Allows a transaction to read an item but not write to it.
Exclusive Lock (X-lock): Allows a transaction to read and write an item. No other
transaction can read or write until the lock is released.
Two-phase Locking (2PL): This protocol ensures that a transaction acquires all the locks
before it releases any. This results in a growing phase (acquiring locks and not releasing
any) and a shrinking phase (releasing locks and not acquiring any).
2. Timestamp-based Protocols
Every transaction is assigned a unique timestamp when it starts. This timestamp determines the
order of transactions. Transactions can only access the database if they respect the timestamp
order, ensuring older transactions get priority.
Concurrent execution refers to the simultaneous execution of more than one transaction. This
is a common scenario in multi-user database environments where many users or applications
might be accessing or modifying the database at the same time. Concurrent execution is crucial
for achieving high throughput and efficient resource utilization. However, it introduces the
potential for conflicts and data inconsistencies.
Advantages of Concurrent Execution
1. Increased System Throughput: Multiple transactions can be in progress at the same time,
but at different stages
2. Maximized Processor Utilization: If one transaction is waiting for I/O operations, another
transaction can utilize the processor.
3. Decreased Wait Time: Transactions no longer have to wait for other long transactions to
complete.
4. Improved Transaction Response Time: Transactions get processed faster because they
can be executed in parallel.
Problems with Concurrent Execution:
1. Lost Update Problem: one transaction's update to a data item is overwritten by another
concurrent transaction.
Example:

| T1         | T2          |
|------------|-------------|
| Read(A)    |             |
| A = A + 50 |             |
|            | Read(A)     |
|            | A = A + 100 |
| Write(A)   |             |
|            | Write(A)    |
Result: T1's updates are lost.
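The lost update above can be reproduced with a small Python sketch; the shared variable A and the starting value are hypothetical. Both transactions read the original value before either writes, so the second write silently discards the first:

```python
A = 100                      # shared data item (hypothetical starting value)

# Both transactions read A before either of them writes it back.
t1_local = A                 # T1: Read(A)
t1_local = t1_local + 50     # T1: A = A + 50
t2_local = A                 # T2: Read(A)  -- still sees 100
t2_local = t2_local + 100    # T2: A = A + 100

A = t1_local                 # T1: Write(A) -> 150
A = t2_local                 # T2: Write(A) -> 200, T1's +50 is lost

print(A)   # 200 instead of the expected 250
```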
2. Dirty Read Problem: a transaction reads data written by another transaction that has not yet
committed.
Example:

| T1          | T2      |
|-------------|---------|
| Read(A)     |         |
| A = A + 100 |         |
| Write(A)    |         |
|             | Read(A) |
|             | commit  |
| Rollback    |         |

Result: T2 has a "dirty" value that was never committed in T1 and doesn't actually exist in the
database.
3. Unrepeatable Read Problem: occurs when a single transaction reads the same row multiple
times and observes different values each time. This happens because another concurrent
transaction has modified the row between the two reads.
Example:

| T1      | T2          |
|---------|-------------|
| Read(A) |             |
|         | Read(A)     |
|         | A = A + 100 |
|         | Write(A)    |
| Read(A) |             |
Result: Within the same transaction, T1 has read two different values for the same data item.
This inconsistency is the unrepeatable read.
To manage concurrent execution and ensure the consistency and reliability of the database,
DBMSs use concurrency control techniques. These typically include locking mechanisms,
timestamps, optimistic concurrency control, and serializability checks.
SERIALIZABILITY
Serializability is the property that ensures that the concurrent execution of a set of transactions
produces the same result as if these transactions were executed one after the other without any
overlapping, i.e., serially.
Serial Schedule: A schedule in which all the operations of one transaction are executed before
any operation of another transaction begins, i.e., the transactions run one after the other.
For example, consider the following two transactions:

| T1       | T2       |
|----------|----------|
| Read(A)  |          |
| Write(A) |          |
| Read(B)  |          |
| Write(B) |          |
|          | Read(A)  |
|          | Write(A) |
|          | Read(B)  |
|          | Write(B) |

All the operations of transaction T1 on data items A and then B execute first, and then in
transaction T2 all the operations on data items A and B execute.
Non-Serial Schedule: A schedule in which the operations of different transactions are
interleaved (intermixed). This may lead to conflicts in the result or inconsistency in the resultant data.
For example, consider the following two transactions:

| T1       | T2       |
|----------|----------|
| Read(A)  |          |
| Write(A) |          |
|          | Read(A)  |
|          | Write(B) |
| Read(A)  |          |
| Write(B) |          |
|          | Read(B)  |
|          | Write(B) |

The above schedule is non-serial, which may result in inconsistency or conflicts in the data.
Why is Serializability Important?
In a database system, for performance optimization, multiple transactions often run
concurrently. While concurrency improves performance, it can introduce several data
inconsistency problems if not managed properly. Serializability ensures that even when
transactions are executed concurrently, the database remains consistent, producing a result that's
equivalent to a serial execution of these transactions.
Conflict Serializability
A schedule is conflict serializable if it can be transformed into a serial schedule (i.e., a schedule
with no overlapping transactions) by swapping non-conflicting operations. If it is not possible to
transform a given schedule to any serial schedule using swaps of non-conflicting operations,
then the schedule is not conflict serializable.
For example, consider the following schedule S of two transactions T1 and T2, in which T2
reads A before T1 reads and writes A and B:

| T1       | T2      |
|----------|---------|
|          | Read(A) |
| Read(A)  |         |
| Write(A) |         |
| Read(B)  |         |
| Write(B) |         |

To test it, draw the precedence graph:
R1(A) conflicts with W1(A), but both operations belong to the same transaction, so no edge is
added.
R2(A) conflicts with W1(A), and R2(A) comes first, so there's an edge from T2 to T1.
There are no other conflicting pairs.
The graph has nodes T1 and T2 with an edge from T2 to T1. There are no cycles in this graph.
Decision: A cycle is a path along which we can start from one node and reach the same node
again. Since the precedence graph doesn't have any cycles, the schedule S is conflict
serializable. The equivalent serial schedule, based on the graph, would be T2 followed by T1.
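The conflict-serializability test can be automated: build the precedence graph from the schedule and check it for cycles. Below is a minimal sketch, assuming the schedule is represented as a list of (transaction, operation, item) tuples; the representation and function names are illustrative only.

```python
from itertools import combinations

# Schedule S from the example: T2 reads A, then T1 reads/writes A and B.
schedule = [("T2", "R", "A"), ("T1", "R", "A"), ("T1", "W", "A"),
            ("T1", "R", "B"), ("T1", "W", "B")]

def precedence_edges(sched):
    """Two operations conflict if they belong to different transactions,
    touch the same item, and at least one of them is a write."""
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(sched, 2):   # pairs in schedule order
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))     # edge from the earlier operation's transaction to the later one's
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visited, on_stack = set(), set()
    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False
    return any(dfs(n) for n in list(graph) if n not in visited)

edges = precedence_edges(schedule)
print(edges)                                             # {('T2', 'T1')}
print("conflict serializable:", not has_cycle(edges))    # True
```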
View Serializability
View Serializability is one of the types of serializability in DBMS that ensures the consistency
of a database schedule. Unlike conflict serializability, which cares about the order of
conflicting operations, view serializability only cares about the final outcome. That is, two
schedules are view equivalent if they have:
Initial Read: The same set of initial reads (i.e., a read by a transaction with no preceding
write by another transaction on the same data item).
Updated Read: For any other writes on a data item in between, if a transaction Tj reads
the result of a write by transaction Ti in one schedule, then Tj should read the result
of a write by Ti in the other schedule as well.
Final Write: The same set of final writes (i.e., a write by a transaction with no subsequent
writes by another transaction on the same data item).
Example:
Consider two transactions T1 and T2.
Schedule 1 (S1):

| T1       | T2       |
|----------|----------|
| Write(A) |          |
|          | Read(A)  |
|          | Write(B) |
| Read(B)  |          |
| Write(B) |          |
| Commit   | Commit   |
Schedule 2 (S2):

| T1       | T2       |
|----------|----------|
| Read(A)  |          |
| Write(A) |          |
|          | Write(A) |
| Read(B)  |          |
| Write(B) |          |
| Commit   | Commit   |

Here, the two schedules are view equivalent only if they agree on the initial reads, the updated
reads, and the final writes described above.
RECOVERABILITY
The need for recoverability arises because databases are designed to ensure data reliability and
consistency. If a schedule is not recoverable, the failure of one transaction can leave committed
transactions depending on data that was never made permanent, making it impossible to restore
the database to a consistent state.
Levels of Recoverability
1. Recoverable Schedules
A schedule is said to be recoverable if, for any pair of transactions Ti and Tj, if Tj
reads a data item previously written by Ti, then Ti must commit before Tj commits.
If a transaction fails for any reason and needs to be rolled back, the system can recover without
having to rollback other transactions that have read or used data written by the failed transaction.
Example of a Recoverable Schedule
Suppose we have two transactions T1 and T2.

| T1       | T2       |
|----------|----------|
| Write(A) |          |
|          | Read(A)  |
| Commit   |          |
|          | Write(B) |
|          | Commit   |

In the above schedule, T2 reads a value written by T1, but T1 commits before
T2, making the schedule recoverable.
2. Non-Recoverable Schedules
A schedule is said to be non-recoverable (or irrecoverable) if there exists a pair of transactions
Ti and Tj such that Tj reads a data item previously written by Ti, but Ti
has not committed yet and Tj commits before Ti. If Ti fails and needs to be rolled
back after Tj has committed, there's no straightforward way to roll back the effects of
Tj, leading to potential data inconsistency.
Example of a Non-Recoverable Schedule:

| T1       | T2       |
|----------|----------|
| Write(A) |          |
|          | Read(A)  |
|          | Write(B) |
|          | Commit   |
| Abort    |          |

In this schedule, T2 reads a value written by T1 and commits before T1 does. If
T1 encounters a failure and has to be rolled back after T2 has committed, we're left in a
problematic situation since we cannot easily roll back T2, making the schedule non-
recoverable.
3. Cascading Rollback
A cascading rollback occurs when the rollback of a single transaction causes one or more
dependent transactions to be rolled back. This situation can arise when one transaction reads
uncommitted changes of another transaction, and then the latter transaction fails and needs to be
rolled back. Consequently, any transaction that has read the uncommitted changes of the failed
transaction also needs to be rolled back, leading to a cascade effect.
Example of Cascading Rollback
Consider two transactions T1 and T2:

| T1                   | T2       |
|----------------------|----------|
| Write(A)             |          |
|                      | Read(A)  |
|                      | Write(B) |
| Abort (some failure) |          |
| Rollback             |          |
|                      | Rollback |

Here, T2 reads an uncommitted value of A written by T1. When T1 fails and is
rolled back, T2 also has to be rolled back, leading to a cascading rollback. This is
undesirable because it wastes computational effort and can complicate recovery procedures.
4. Cascadeless Schedules
A schedule is considered cascadeless if transactions only read committed values. This means, in
such a schedule, a transaction can read a value written by another transaction only after the
latter has committed. Cascadeless schedules prevent cascading rollbacks.
Example of Cascadeless Schedule
Consider two transactions T1 and T2:

| T1       | T2       |
|----------|----------|
| Write(A) |          |
| Commit   |          |
|          | Read(A)  |
|          | Write(B) |
|          | Commit   |
In this schedule, T2 reads the value of A only after T1 has committed. Thus, even if
T1 were to fail before committing (not shown in this schedule), it would not affect T2.
This means there's no risk of cascading rollback in this schedule.
CONCURRENCY CONTROL PROTOCOLS
Lock-based concurrency control relies on locks for the following reasons:
1. Consistency: Without locks, multiple transactions could modify the same data item
simultaneously, resulting in an inconsistent state.
2. Isolation: Locks ensure that the operations of one transaction are isolated from other
transactions, i.e., they are invisible to other transactions until the transaction is committed.
3. Concurrency: While ensuring consistency and isolation, locks also allow multiple
transactions to be processed simultaneously by the system, optimizing system throughput
and overall performance.
4. Avoiding Conflicts: Locks help in avoiding data conflicts that might arise due to
simultaneous read and write operations by different transactions on the same data item.
5. Preventing Dirty Reads: With the help of locks, a transaction is prevented from reading
data that hasn't yet been committed by another transaction.
Lock-Based Protocols
The simple lock-based protocol is a mechanism in which a transaction must acquire a lock on a
data item before operating on it, so that the item is reserved for the current transaction.
Types of Locks: There are two types of locks used -
1. Shared Lock (Lock-S): This lock allows a transaction to read a data item. Multiple
transactions can hold shared locks on the same data item simultaneously. This is also called a
read lock.
2. Exclusive Lock (Lock-X): This lock allows a transaction to read and write a data item. If a
transaction holds an exclusive lock on an item, no other transaction can hold any kind of lock
on the same item. This is also called a write lock.
Example:

| T1        | T2        |
|-----------|-----------|
| Lock-S(A) |           |
| Read(A)   |           |
| Unlock(A) | Lock-X(A) |
|           | Write(B)  |
|           | Read(A)   |
|           | Write(A)  |
|           | Unlock(A) |
| Shared Lock | Exclusive Lock |
|---|---|
| Shared lock is used when the transaction wants to perform a read operation. | Exclusive lock is used when the transaction wants to perform both read and write operations. |
| Any number of transactions can hold a shared lock on an item. | An exclusive lock can be held by only one transaction at a time. |
| Using a shared lock, the data item can only be viewed. | Using an exclusive lock, the data can be inserted or deleted. |
Two-Phase Locking Protocol (2PL)
The two-phase locking protocol ensures a transaction acquires all the locks it needs before it
releases any. The protocol has two phases:
Growing Phase: The transaction may obtain any number of locks but cannot release any.
Shrinking Phase: The transaction may release but cannot obtain any new locks.
The point where the transaction releases its first lock is the end of the growing phase and the
beginning of the shrinking phase.
This protocol ensures conflict-serializability and avoids many of the concurrency issues, but it
doesn't guarantee the absence of deadlocks.
Example:

| T1        | T2        |
|-----------|-----------|
| Lock-S(A) |           |
|           | Lock-S(A) |
| Lock-X(B) |           |
| Unlock(A) |           |
|           | Lock-X(C) |
| Unlock(B) |           |
|           | Unlock(A) |
|           | Unlock(C) |
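A minimal sketch of the two-phase rule using a hypothetical Transaction class (the lock table and lock modes are omitted): once the transaction releases its first lock it enters the shrinking phase, and any further lock request is refused.

```python
class Transaction:
    """Toy illustration of two-phase locking (2PL); no real lock manager is involved."""
    def __init__(self, name):
        self.name = name
        self.held_locks = set()
        self.shrinking = False       # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire locks in the shrinking phase")
        self.held_locks.add(item)    # growing phase: acquire, never release

    def unlock(self, item):
        self.held_locks.discard(item)
        self.shrinking = True        # the first release ends the growing phase

t = Transaction("T1")
t.lock("A")
t.lock("B")      # still in the growing phase
t.unlock("A")    # shrinking phase begins here
try:
    t.lock("C")  # not allowed any more
except RuntimeError as e:
    print(e)
```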
Drawbacks of Two-Phase Locking
Deadlocks: The main disadvantage of 2PL is that it can lead to deadlocks, where two or
more transactions wait indefinitely for a resource locked by the other.
Reduced Concurrency (in certain cases): Locking can block transactions, which can
reduce concurrency. For example, if one transaction holds a lock for a long time, other
transactions needing that lock will be blocked.
Overhead: Maintaining locks, especially in systems with a large number of items and
transactions, requires overhead. There's a time cost associated with acquiring and releasing
locks, and memory overhead for maintaining the lock table.
Starvation: It's possible for some transactions to get repeatedly delayed if other transactions
are continually requesting and acquiring locks.
Strict Two-Phase Locking
Transactions are not allowed to release any locks until after they commit or abort.
Ensures serializability and avoids the problem of cascading rollbacks.
However, it can reduce concurrency.
Reduced Concurrency: Since transactions hold onto locks until they commit or abort,
other transactions might be blocked for longer periods.
Potential for Deadlocks: Holding onto locks can increase the chances of deadlocks.
Rigorous Two-Phase Locking
In rigorous 2PL, a transaction holds all of its locks (both shared and exclusive) until it commits
or aborts.
Like strict 2PL, rigorous 2PL ensures serializability and avoids cascading rollbacks.
Deadlock Possibility: Still susceptible to deadlocks as transactions might hold some locks
while releasing others.
Conservative (Static) Two-Phase Locking
A transaction must request all the locks it will ever need before it begins execution. If any of
the requested locks are unavailable, the transaction is delayed until they are all available.
This approach can avoid deadlocks since transactions only start when all their required locks
are available.
Pros of Conservative (Static) Two-Phase Locking:
Avoids Deadlocks: By ensuring that all required locks are acquired before a transaction
starts, it inherently avoids the possibility of deadlocks.
Serializability: Ensures serializable schedules.
Cons:
Reduced Concurrency: Transactions might be delayed until all required locks are available,
leading to potential inefficiencies.
Timestamp-Based Protocols
Timestamp-based protocols are concurrency control mechanisms used in databases to ensure
serializability and to avoid conflicts without the need for locking. The main idea behind these
protocols is to use a timestamp to determine the order in which transactions should be executed.
Each transaction is assigned a unique timestamp when it starts.
Here are the primary rules associated with timestamp-based protocols:
1. Timestamp Assignment: When a transaction T starts, it is given a timestamp TS(T).
This timestamp can be the system's clock time or a logical counter that increments with each
new transaction.
2. Reading/Writing Rules: Timestamp-based protocols use the following rules to determine if
a transaction can read or write an item:
Read Rule: If a transaction T wants to read an item that was last written by transaction T'
with TS(T') > TS(T), the read is rejected because it's trying to read a value from the
future. Otherwise, it can read the item.
Write Rule: If a transaction T wants to write an item that has been read or written by a
transaction T' with TS(T') > TS(T), the write is rejected. This avoids overwriting a
value that has been read or written by a younger transaction.
3. Handling Violations: When a transaction's read or write operation violates the rules, the
transaction can be rolled back and restarted or aborted, depending on the specific protocol in
use.
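A minimal sketch of the basic timestamp-ordering rules above, assuming hypothetical per-item bookkeeping of the largest read and write timestamps (read_ts / write_ts); in a real system a rejected operation would cause the transaction to be rolled back and restarted with a new timestamp.

```python
# Per-item bookkeeping: largest timestamp of any transaction that read or wrote the item.
items = {"A": {"read_ts": 0, "write_ts": 0, "value": 100}}

def read(ts, name):
    item = items[name]
    if ts < item["write_ts"]:          # the item was written by a younger transaction
        return False                   # reject: reading a value "from the future"
    item["read_ts"] = max(item["read_ts"], ts)
    return True

def write(ts, name, value):
    item = items[name]
    if ts < item["read_ts"] or ts < item["write_ts"]:
        return False                   # reject: a younger transaction already used the item
    item["write_ts"] = ts
    item["value"] = value
    return True

print(write(ts=5, name="A", value=200))   # True: no younger reader or writer exists
print(read(ts=3, name="A"))               # False: TS(3) tries to read a value written at TS(5)
```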
Advantages of Timestamp-Based Protocols
1. Deadlock-free: Since there are no locks involved, there's no chance of deadlocks occurring.
2. Fairness: Older transactions have priority over newer ones, ensuring that transactions do
not starve and get executed in a fair manner.
3. Increased Concurrency: In many situations, timestamp-based protocols can provide better
concurrency than lock-based methods.
Disadvantages of Timestamp-Based Protocols
1. Starvation: If a transaction is continually rolled back due to timestamp rules, it may starve.
2. Overhead: Maintaining and comparing timestamps can introduce overhead, especially in
systems with a high transaction rate.
3. Cascading Rollbacks: A rollback of one transaction might cause rollbacks of other
transactions.
One of the most well-known refinements of timestamp ordering is the Thomas Write Rule, which
modifies the write rule to allow certain writes that would be rejected under the basic timestamp
protocol. The idea is to ignore an obsolete write that would have no effect on the outcome, rather
than rolling back the transaction. This reduces the number of rollbacks but can produce schedules
that are not conflict serializable (they remain view serializable).
In practice, timestamp-based protocols offer an alternative approach to concurrency control,
especially useful in systems where locking leads to frequent deadlocks or reduced concurrency.
However, careful implementation and tuning are required to handle potential issues like
transaction starvation or cascading rollbacks.
Validation-Based Protocols (Optimistic Concurrency Control)
Validation-based protocols, also known as Optimistic Concurrency Control (OCC), are a set of
techniques that aim to increase system concurrency and performance by assuming that conflicts
between transactions will be rare. Unlike other concurrency control methods, which try to
prevent conflicts proactively using locks or timestamps, OCC checks for conflicts only at
transaction commit time.
1. Read Phase
o The transaction reads from the database but does not write to it.
o All updates are made to a local copy of the data items.
2. Validation Phase
o Before committing, the system checks to ensure that this transaction's local updates
won't cause conflicts with other transactions.
o The validation can take many forms, depending on the specific protocol.
3. Write Phase
o If the transaction passes validation, its updates are applied to the database.
o If it doesn't pass validation, the transaction is aborted and restarted.
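A minimal sketch of the three phases, assuming a hypothetical shared store with a version counter per item: the transaction works on local copies in the read phase, then at commit time validates that the versions it read are unchanged before installing its writes.

```python
store = {"A": {"value": 1000, "version": 1}}   # hypothetical shared database

def run_transaction(updates):
    # Read phase: work on local copies, remembering the versions that were read.
    snapshot = {k: (store[k]["value"], store[k]["version"]) for k in updates}
    local = {k: updates[k](v) for k, (v, _) in snapshot.items()}

    # Validation phase: fail if any item read has since been changed by another transaction.
    for k, (_, version) in snapshot.items():
        if store[k]["version"] != version:
            return False                       # conflict detected: abort, caller may retry

    # Write phase: install the local updates and bump the versions.
    for k, value in local.items():
        store[k]["value"] = value
        store[k]["version"] += 1
    return True

ok = run_transaction({"A": lambda v: v - 200})   # debit 200 from A
print(ok, store["A"])                            # True {'value': 800, 'version': 2}
```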
Advantages:
High Concurrency: Since transactions don't acquire locks, more than one transaction can
process the same data item concurrently.
Deadlock-free: The absence of locks means there's no deadlock scenario.
Disadvantages:
Overhead: The validation process requires additional processing overhead.
Aborts: Transactions might be aborted even if there's no real conflict, leading to wasted
processing.
Validation-based protocols work best in scenarios where conflicts are rare. If the system
anticipates many conflicts, then the high rate of transaction restarts might offset the advantages
of increased concurrency.
Multiple Granularity Locking
In the context of database systems, "granularity" refers to the size or extent of a data item that
can be locked by a transaction. The idea behind multiple granularity is to provide a hierarchical
structure that allows locks at various levels, ranging from coarse-grained (like an entire table) to
fine-grained (like a single row or even a single field). This hierarchy offers flexibility in
achieving the right balance between concurrency and data integrity.
The concept of multiple granularity can be visualized as a tree. Consider a database system
where the entire database is at the root; the database contains files (or tables), each file contains
records (rows), and each record contains individual fields.
To ensure the consistency and correctness of a system that allows multiple granularity, it's
crucial to introduce the concept of "intention locks." These locks indicate a transaction's
intention to acquire a finer-grained lock in the future.
There are three main types of intention locks:
1. Intention Shared (IS): When a transaction needs an S lock on a node "K", it must first apply an
IS lock on all the ancestors of "K", starting from the root node. So, when a node is found locked
in IS mode, it indicates that some of its descendant nodes will be locked in S mode.
Example: Suppose a transaction wants to read a few records from a table but not the whole
table. It might set an IS lock on the table, and then set individual S locks on the specific rows it
reads.
2. Intention Exclusive (IX): When a transaction needs an X lock on a node "K", it must first
apply an IX lock on all the ancestors of "K", starting from the root node. So, when a node is
found locked in IX mode, it indicates that some of its descendant nodes will be locked in X
mode.
Example: If a transaction aims to update certain records within a table, it may set an IX lock on
the table and subsequently set X locks on specific rows it updates.
3. Shared Intention Exclusive (SIX): When a node is locked in SIX mode, it indicates that the
node is explicitly locked in S mode and IX mode. So, the entire tree rooted at that node is
locked in S mode and some nodes in it are locked in X mode. This mode is compatible only
with IS mode.
Example: Suppose a transaction wants to read an entire table but also update certain rows. It
would set a SIX lock on the table. This tells other transactions they can read the table but cannot
update it until the SIX lock is released. Meanwhile, the original transaction can set X locks on
specific rows it wishes to update.
The lock compatibility matrix for these modes is shown below (✓ = compatible, ✗ = incompatible; NL = no lock):
| | NL | IS | IX | S | SIX | X |
|:-----:|:-----:|:------:|:-----:|:------:|:-----:|:------:|
| NL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| IS | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| IX | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| S | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| SIX | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| X | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
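A small sketch of how the matrix is used: before locking a row in X mode, a transaction first takes IX locks on the ancestors (database, table), and each request is checked against the locks already held on that node. The hierarchy names, the lock table, and the COMPATIBLE encoding below are hypothetical simplifications (transaction ids and waiting are omitted).

```python
# Compatibility matrix from above, keyed by (held_mode, requested_mode); True = compatible.
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "SIX"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "SIX"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "SIX"): False, ("S",  "X"): False,
    ("SIX","IS"): True,  ("SIX","IX"): False, ("SIX","S"): False, ("SIX","SIX"): False, ("SIX","X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "SIX"): False, ("X",  "X"): False,
}

locks = {}   # node -> list of lock modes currently held on that node

def request(node, mode):
    """Grant the lock only if it is compatible with every lock already held on the node."""
    if all(COMPATIBLE[(held, mode)] for held in locks.get(node, [])):
        locks.setdefault(node, []).append(mode)
        return True
    return False

# To update one row: IX on the database, IX on the table, then X on the row.
path = [("database", "IX"), ("employees_table", "IX"), ("row_42", "X")]
print([request(node, mode) for node, mode in path])   # [True, True, True]
print(request("employees_table", "S"))                # False: S conflicts with the held IX
```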
PESSIMISTIC AND OPTIMISTIC CONCURRENCY CONTROL
Pessimistic Concurrency Control
Pessimistic concurrency control prevents conflicts by locking the data that a transaction is using
until the transaction is completed. This approach is 'pessimistic' because it assumes the worst-case
scenario: that every transaction might lead to a conflict. The data is therefore locked in order to
prevent conflicts from happening. Two technical terms here need clarification: locks and conflicts.
Pessimistic concurrency control is very safe because it prevents conflicts from occurring by
blocking them before they get a chance to start. A write lock on a database item prevents other
transactions from reading or updating that item while that lock is held, similar to how a library
stops more than one person from trying to borrow the same physical book at the same time.
A read lock on a database item allows other transactions to also obtain a read lock for that item, but
prevents transactions from updating that item. This is analogous to borrowing an e-book, where
multiple people can borrow the same e-book at the same time, but can’t make any updates to it.
Example: Suppose account 12345 has a balance of $1500, and two transactions operate on it:
T1 withdraws $300 and T2 deposits $400.

| AccountID | Balance |
|-----------|---------|
| 12345     | $1500   |

Here are the steps of how this will work using write locks:
1. Start of T1 (Withdrawal): T1 requests to update the balance of AccountID 12345. The database
system places an exclusive write lock on the row for AccountID 12345, preventing other
transactions from reading or writing to this row until T1 is completed. T1 reads the balance
($1500).
2. T1 Processing: T1 calculates the new balance as $1200 ($1500 - $300).
3. Commit T1: T1 writes the new balance ($1200) back to the database. Upon successful commit,
T1 releases the exclusive lock on AccountID 12345.
4. Start of T2 (Deposit) After T1 Completes: Now that T1 has completed and the lock is released,
T2 can start. T2 attempts to read and update the balance for AccountID 12345. The database
system places an exclusive lock on the row for AccountID 12345 for T2, ensuring no other
transactions can interfere. T2 reads the updated balance ($1200).
5. T2 Processing: T2 calculates the new balance as $1600 ($1200 + $400).
6. Commit T2: T2 writes the new balance ($1600) back to the database. Upon successful commit,
T2 releases the exclusive lock on AccountID 12345.
7. Result: The Accounts table is updated safely using locks. After T1: $1200. After T2: $1600.
Without a write lock in this example, T1 and T2 could read the original balance of $1500 at the
same time. So, instead of a balance of $1200 after T1 has committed, T2 still reads the original
balance of $1500 and adds $400. This would cause the final balance to be $1500 + $400 =
$1900 (instead of $1600).
Absence of locking has created free money, which is never a bad thing for a customer. But, if
money can be conjured out of thin air because of these conflicts, it can also vanish, and accidentally
shrinking bank balances are a quick way to make customers unhappy.
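A minimal sketch of the write-lock behaviour in the walkthrough above, using Python's threading.Lock as a stand-in for an exclusive row lock on AccountID 12345 (the account structure is hypothetical): each transaction holds the lock across its whole read-modify-write, so the deposit always sees the withdrawn balance.

```python
import threading

account = {"balance": 1500}
row_lock = threading.Lock()        # stand-in for an exclusive (write) lock on the row

def update_balance(delta):
    with row_lock:                 # hold the lock for the whole read-modify-write
        current = account["balance"]
        account["balance"] = current + delta   # no other transaction can interleave here

t1 = threading.Thread(target=update_balance, args=(-300,))   # withdrawal
t2 = threading.Thread(target=update_balance, args=(400,))    # deposit
t1.start(); t2.start()
t1.join(); t2.join()

print(account["balance"])   # always 1600; without the lock it could end up 1900 or 1200
```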
How Pessimistic Concurrency Controls Guarantee the Read Committed Isolation Level
So, how exactly does pessimistic concurrency control work in ensuring the isolation guarantee, that
is the “I” in ACID? The implementation details can vary across different DBMS. But the
explanation here shows the general approach.
Recall that the read committed isolation level prevents dirty writes and dirty reads.
Preventing Dirty Writes
Overwriting data that has already been written by another transaction but not yet committed is
called a dirty write. A common approach to preventing dirty writes is to use pessimistic
concurrency control. For example, by using a write lock at the row level.
When a transaction wants to modify a row, it acquires a lock on that row and holds it until the
transaction is complete. Recall that write locks can only be held by a single transaction. This
prevents another transaction from acquiring a lock to modify that row.
Preventing Dirty Reads
But what happens if you are trying to read something that is already being updated but the
transaction has not yet committed? In this instance, the write lock saves the day again.
Since write locks are exclusive (can’t be shared with other transactions), any transaction wanting to
read the same database item will have to wait until the transaction with the write lock is committed
(or aborted, if it fails). This prevents other transactions from reading uncommitted changes.
Optimistic Concurrency Control
Unlike pessimistic concurrency control – which prevents conflicts from occurring by blocking them
before they get a chance to start – optimistic concurrency control checks for conflicts at the end of a
transaction. With optimistic concurrency control, multiple transactions can read or update the same
database item without acquiring locks.
Example: Suppose AccountID 12345 now has a balance of $1000 and the row also stores a
VersionNumber (initially 1) and a Timestamp. T1 withdraws $200 while T2 simultaneously
deposits $300.
1. Start of Transactions: T1 reads the balance, version number, and timestamp for AccountID
12345. Simultaneously, T2 reads the same row with the same balance, version number, and
timestamp.
2. Processing: T1 calculates the new balance as $800 ($1000 - $200) but does not write it back
immediately. T2 calculates the new balance as $1300 ($1000 + $300) but also waits to commit.
3. Attempt to Commit T1: Before committing, T1 checks the current VersionNumber and
Timestamp of AccountID 12345 in the database. Since no other transaction has modified the
row, T1 updates the balance to $800, increments the VersionNumber to 2, updates the
Timestamp, and commits successfully.
4. Attempt to Commit T2: T2 attempts to commit by first verifying the VersionNumber and
Timestamp. T2 finds that the VersionNumber and Timestamp have changed (now
VersionNumber is 2, and Timestamp is updated), indicating another transaction (T1) has
updated the row. Since the version number and timestamp have changed, T2 realises there was a
conflict.
5. Resolution for T2: T2 must restart its transaction. It re-reads the updated balance of $800, the
new VersionNumber 2, and the updated Timestamp. T2 recalculates the new balance as $1100
($800 + $300), updates the VersionNumber to 3, updates the Timestamp, and commits
successfully.
Result: The Accounts table is updated sequentially and safely without any locks: After T1:
$800, VersionNumber: 2. After T2: $1100, VersionNumber: 3.
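A compact sketch of the version-number check and retry in steps 3-5, assuming the row is held in a hypothetical dictionary: a transaction commits only if the version it read is still current; otherwise it re-reads and retries.

```python
row = {"balance": 1000, "version": 1}    # AccountID 12345 (hypothetical representation)

def commit_if_unchanged(read_version, new_balance):
    """Validate-and-write: succeed only if no one else committed since we read."""
    if row["version"] != read_version:
        return False
    row["balance"] = new_balance
    row["version"] += 1
    return True

# T1 and T2 both read version 1 and balance 1000.
t1_read, t2_read = row["version"], row["version"]
t1_balance, t2_balance = row["balance"] - 200, row["balance"] + 300

print(commit_if_unchanged(t1_read, t1_balance))   # True: T1 commits, balance 800, version 2

# T2's first attempt fails because the version is now 2, so it re-reads and retries.
if not commit_if_unchanged(t2_read, t2_balance):
    t2_read = row["version"]
    t2_balance = row["balance"] + 300
    print(commit_if_unchanged(t2_read, t2_balance))   # True: balance 1100, version 3

print(row)   # {'balance': 1100, 'version': 3}
```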
In database management systems (DBMS), ensuring the consistency and durability of data is
crucial. One of the key mechanisms used to achieve this is log-based recovery. This section
explores what log-based recovery is, how it works, and its importance in maintaining the integrity
of databases.
LOG-BASED RECOVERY
Log-based recovery is a technique used in DBMS to recover the database to a consistent state in the
event of a system failure or crash. It relies on maintaining a transaction log, which is a record of all
the transactions that have been executed on the database. The log contains information about the
operations performed by each transaction, such as updates, inserts, and deletes, along with the old
and new values of the affected data items.
Log-based recovery works by using the transaction log to redo or undo transactions as necessary to
bring the database back to a consistent state. There are two main phases involved in log-based
recovery: redo and undo.
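A minimal sketch of the two phases, assuming a hypothetical log kept as a list of records with old and new values: after a crash, updates of committed transactions are redone in log order and updates of uncommitted transactions are undone in reverse order.

```python
# Hypothetical log surviving a crash: T1 committed, T2 did not.
log = [
    {"txn": "T1", "item": "A", "old": 4000, "new": 3500},
    {"txn": "T1", "commit": True},
    {"txn": "T2", "item": "B", "old": 300, "new": 700},
]
database = {"A": 4000, "B": 700}   # state on disk at crash time (partially updated)

committed = {rec["txn"] for rec in log if rec.get("commit")}

# Redo phase: reapply every update of committed transactions, in log order.
for rec in log:
    if "item" in rec and rec["txn"] in committed:
        database[rec["item"]] = rec["new"]

# Undo phase: roll back updates of uncommitted transactions, in reverse log order.
for rec in reversed(log):
    if "item" in rec and rec["txn"] not in committed:
        database[rec["item"]] = rec["old"]

print(database)   # {'A': 3500, 'B': 300}: T1's change survives, T2's is rolled back
```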
Atomicity and Durability: Log-based recovery ensures that transactions are either fully
completed or fully undone, preserving the atomicity property of transactions. It also ensures that
committed changes are durable and not lost due to a system failure.
Recovery from Crashes: Log-based recovery allows the DBMS to recover from crashes
quickly and efficiently, minimizing downtime and ensuring that the database remains available
and reliable.
Transaction Rollback: Log-based recovery enables the rollback of transactions that have not
been committed, ensuring that no partial changes are left in the database after a crash.
Log-based recovery is a critical component of modern database management systems, ensuring the
consistency, durability, and recoverability of data in the face of system failures. By maintaining a
transaction log and using it to redo and undo transactions as necessary, DBMS can recover from
crashes and ensure that data remains safe and reliable.
RECOVERY WITH CONCURRENT TRANSACTIONS
This pertains to how the recovery system interacts with the concurrency control component of the
DBMS.
Recovery must be aware of concurrency control mechanisms such as locks to ensure that, during
recovery, operations are redone or undone in a manner consistent with the original schedule of
transactions.
For example, if strict two-phase locking is used, it can simplify the recovery process. Under this
protocol, a transaction holds its exclusive locks until it commits or aborts. This ensures that if a
transaction commits, its effects are permanent and won't be overridden by any other transaction
that was running concurrently.
Transaction Rollback
Sometimes, instead of recovering the whole system, it's more efficient to just rollback a particular
transaction that has caused inconsistency or when a deadlock occurs.
When errors are detected during transaction execution (like constraint violations) or if a user issues
a rollback command, the system uses the logs to undo the actions of that specific transaction,
ensuring the database remains consistent.
The system keeps track of all the operations performed by a transaction. If a rollback is necessary,
these operations are reversed using the log records.
Checkpoints
Checkpointing is a technique where the DBMS periodically takes a snapshot of the current state of
the database and writes all changes to the disk. This reduces the amount of work during recovery.
By creating checkpoints, recovery operations don't need to start from the very beginning. Instead,
they can begin from the most recent checkpoint, thereby considerably speeding up the recovery
process.
During the checkpoint, ongoing transactions might be temporarily suspended, or their logs might be
force-written to the stable storage, depending on the implementation.
Restart Recovery
In the case of a system crash, the recovery manager uses the logs to restore the database to the most
recent consistent state. This process is called restart recovery.
Initially, an analysis phase determines the state of all transactions at the time of the crash.
Committed transactions and those that had not started are ignored.
The redo phase then ensures that all logged updates of committed transactions are reflected in the
database. Lastly, the undo phase rolls back the transactions that were active during the crash to
ensure atomicity.
The presence of checkpoints can make the restart recovery process faster since the system can start
the redo phase from the most recent checkpoint rather than from the beginning of the log.
ADVANCED RECOVERY TECHNIQUES
The database recovery techniques in DBMS are used to recover the data at times of system failure.
The recovery techniques in DBMS maintain the properties of atomicity and durability of the
database. A system is not called durable if it fails during a transaction and loses all its data and a
system is not called atomic, if the data is in a partial state of update during the transaction. The data
recovery techniques in DBMS make sure, that the state of data is preserved to protect the atomic
property and the data is always recoverable to protect the durability property. The following
techniques are used to recover data in a DBMS,
Checkpoint: A checkpoint is created after all the log records of a transaction are written, so that
the logs can be transferred from local (volatile) storage to permanent storage for future use.
What is the Deferred Update Method?
In the deferred update method, the updates are not made to the data until the transaction reaches
its final phase, i.e., the commit operation. Only after this operation is performed is the data
modified and permanently stored in the database. The logs are maintained throughout the operation
and are used in case of failure to find the point of failure. This provides an advantage: even if the
system fails before the commit stage, the data in the database will not have been modified, so its
state is preserved. If the system fails after the commit stage, we can redo the changes to the new
state easily, compared to the process involved with an undo operation.
Deferred updates and immediate updates are database recovery techniques in DBMS that are used
to maintain the transaction log files of the DBMS.
In the Deferred update, the state of the data in the database is not changed immediately after any
transaction is executed, but as soon as the commit has been made, the changes are recorded in the
log file, and the state of data is changed.
In the Immediate update, at every transaction, the database is updated directly, and a log file is also
maintained containing the old and new values.
A backup is a copy of the current state of the database that is stored in another location. This
backup is useful in cases when the system is destroyed by natural disasters or physical damage.
These backups can be used to recover the database to the state at which the backup was made.
Different backup methods are used, as follows:
An immediate backup is a copy kept on devices like hard disks or other drives.
When a disk crashes or any technical fault occurs, we can use this copy to recover the data.
An Archival backup is a copy of the database kept in cloud environments or large storage
systems in another region. These are used to recover the data when the system is affected by
a natural disaster.
What are Transaction Logs?
The transaction logs are used to keep track of all the transactions that have made an update to the
data in the database. The following steps are followed to recover the data using transaction logs,
The recovery manager searches through all the log files and finds the transactions that have
a start_transaction record but do not have a commit record.
Such transactions are rolled back to the old state with the help of the logs, using the
rollback command.
The transactions that have a commit record have made changes to the database, and these
changes are recorded in the logs. If any of these changes have not yet reached the database,
they are reapplied using the redo operation.
In shadow paging, the database is split into n multiple pages, each representing a fixed-size unit
of disk memory.
Similarly, n shadow pages are also created, which are copies of the real database pages.
At the beginning of a transaction, the state in the database is copied to the shadow pages.
The shadow pages will not be modified during the transaction and only the real database
will be modified.
When the transaction reaches the commit stage, the changes are made to the shadow pages.
The changes are made in a way that if the i-th part of the hard disk has modifications, then
the i-th shadow page is also modified.
If there is a failure of the system, the real pages of the database are compared with the
shadow pages, and recovery operations are performed.
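A small sketch of the shadow-page idea along the lines described above, assuming the database is just a Python list of n pages: the current pages are copied into shadow pages at the start of the transaction, updates go to the real pages, and a failure is recovered by falling back to the shadow copy.

```python
n_pages = 4
database = [f"page-{i} v1" for i in range(n_pages)]   # the "real" database pages

def run_transaction(updates, fail=False):
    shadow = list(database)                  # copy the current state into the shadow pages
    for i, new_content in updates.items():
        database[i] = new_content            # only the real pages are modified
    if fail:
        database[:] = shadow                 # recovery: fall back to the shadow copy
        return False
    shadow[:] = database                     # commit: bring the shadow pages up to date
    return True

run_transaction({2: "page-2 v2"}, fail=True)
print(database[2])    # 'page-2 v1' -- the failed transaction left no trace
run_transaction({2: "page-2 v2"})
print(database[2])    # 'page-2 v2' -- the committed change is visible
```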
In the Caching/Buffering method, a collection of buffers called DBMS buffers are present
in the logical memory. The buffers are used to store all logs during the process and the update to
the main log file is made when the transaction reaches the commit stage.
--END--