Advanced Database Notes

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 21
At a glance
Powered by AI
Schedules are used to model the execution of transactions in a database system and are important concepts in concurrency control theory. Different types of schedules like serial and serializable schedules exist based on the ordering of transactions.

A schedule models the execution of transactions as a list or partial order of operations. It abstractly describes the actions performed by transactions in a system ordered by time.

The main types of schedules discussed are serial, serializable, and recoverable schedules. Serializable schedules allow concurrent execution of transactions while avoiding anomalies.

Schedule (computer science)

From Wikipedia, the free encyclopedia



This article needs additional citations for verification. Please help improve this
article by adding citations to reliable sources. Unsourced material may be challenged
and removed. (November 2012)
In the fields of databases and transaction processing (transaction management), a schedule
(or history) of a system is an abstract model to describe execution of transactions running in
the system. Often it is a list of operations (actions) ordered by time, performed by a set of
transactions that are executed together in the system. If order in time between certain
operations is not determined by the system, then a partial order is used. Examples of such
operations are requesting a read operation, reading, writing, aborting, committing, requesting
lock, locking, etc. Not all transaction operation types should be included in a schedule, and
typically only selected operation types (e.g., data access operations) are included, as needed
to reason about and describe certain phenomena. Schedules and schedule properties are
fundamental concepts in database concurrency control theory.
Contents
1 Formal description
2 Types of schedule
o 2.1 Serial
o 2.2 Serializable
2.2.1 Conflicting actions
2.2.2 Conflict equivalence
2.2.3 Conflict-serializable
2.2.4 Commitment-ordered
2.2.5 View equivalence
2.2.6 View-serializable
o 2.3 Recoverable
2.3.1 Unrecoverable
2.3.2 Avoids cascading aborts (rollbacks)
2.3.3 Strict
3 Hierarchical relationship between serializability classes
4 Practical implementations
5 See also
6 References
Formal description
The following is an example of a schedule:

In this example, the horizontal axis represents the different transactions in the schedule D.
The vertical axis represents time order of operations. Schedule D consists of three
transactions T1, T2, T3. The schedule describes the actions of the transactions as seen by the
DBMS. First T1 Reads and Writes to object X, and then Commits. Then T2 Reads and Writes
to object Y and Commits, and finally T3 Reads and Writes to object Z and Commits. This is
an example of a serial schedule, i.e., sequential with no overlap in time, because the actions
of in all three transactions are sequential, and the transactions are not interleaved in time.
Representing the schedule D above by a table (rather than a list) is just for the convenience of
identifying each transaction's operations in a glance. This notation is used throughout the
article below. A more common way in the technical literature for representing such schedule
is by a list:
D = R1(X) W1(X) Com1 R2(Y) W2(Y) Com2 R3(Z) W3(Z) Com3
Usually, for the purpose of reasoning about concurrency control in databases, an operation is
modeled as atomic, occurring at a point in time, without duration. When this is not
satisfactory start and end time-points and possibly other point events are specified (rarely).
Real executed operations always have some duration and specified respective times of
occurrence of events within them (e.g., "exact" times of beginning and completion), but for
concurrency control reasoning usually only the precedence in time of the whole operations
(without looking into the quite complex details of each operation) matters, i.e., which
operation is before, or after another operation. Furthermore, in many cases the before/after
relationships between two specific operations do not matter and should not be specified,
while being specified for other pairs of operations.
In general operations of transactions in a schedule can interleave (i.e., transactions can be
executed concurrently), while time orders between operations in each transaction remain
unchanged as implied by the transaction's program. Since not always time orders between all
operations of all transactions matter and need to be specified, a schedule is, in general, a
partial order between operations rather than a total order (where order for each pair is
determined, as in a list of operations). Also in the general case each transaction may consist
of several processes, and itself be properly represented by a partial order of operations, rather
than a total order. Thus in general a schedule is a partial order of operations, containing
(embedding) the partial orders of all its transactions.
Time-order between two operations can be represented by an ordered pair of these operations
(e.g., the existence of a pair (OP1,OP2) means that OP1 is always before OP2), and a
schedule in the general case is a set of such ordered pairs. Such a set, a schedule, is a partial
order which can be represented by an acyclic directed graph (or directed acyclic graph,
DAG) with operations as nodes and time-order as a directed edge (no cycles are allowed
since a cycle means that a first (any) operation on a cycle can be both before and after (any)
another second operation on the cycle, which contradicts our perception of Time). In many
cases a graphical representation of such graph is used to demonstrate a schedule.
Comment: Since a list of operations (and the table notation used in this article) always
represents a total order between operations, schedules that are not a total order cannot be
represented by a list (but always can be represented by a DAG).
Types of schedule
Serial
The transactions are executed non-interleaved (see example above) i.e., a serial schedule is
one in which no transaction starts until a running transaction has ended.
Serializable
A schedule that is equivalent (in its outcome) to a serial schedule has the serializability
property.
In schedule E, the order in which the actions of the transactions are executed is not the same
as in D, but in the end, E gives the same result as D.

Conflicting actions
Two actions are said to be in conflict (conflicting pair) if:
1. The actions belong to different transactions.
2. At least one of the actions is a write operation.
3. The actions access the same object (read or write).
The following set of actions is conflicting:
R1(X), W2(X), W3(X) (3 conflicting pairs)
While the following sets of actions are not:
R1(X), R2(X), R3(X)
R1(X), W2(Y), R3(X)
Conflict equivalence
The schedules S1 and S2 are said to be conflict-equivalent if following two conditions are
satisfied:
1. Both schedules S1 and S2 involve the same set of transactions (including ordering of
actions within each transaction).
2. Both schedules have same set of conflicting operations.
Conflict-serializable
A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to one
or more serial schedules.
Another definition for conflict-serializability is that a schedule is conflict-serializable if and
only if its precedence graph/serializability graph, when only committed transactions are
considered, is acyclic (if the graph is defined to include also uncommitted transactions, then
cycles involving uncommitted transactions may occur without conflict serializability
violation).

Which is conflict-equivalent to the serial schedule <T1,T2>, but not <T2,T1>.
Commitment-ordered

The neutrality of this section is disputed. Relevant discussion may be found on the
talk page. Please do not remove this message until the dispute is resolved. (November
2011)
A schedule is said to be commitment-ordered (commit-ordered), or commitment-order-
serializable, if it obeys the Commitment ordering (CO; also commit-ordering or commit-
order-serializability) schedule property. This means that the order in time of transactions'
commitment events is compatible with the precedence (partial) order of the respective
transactions, as induced by their schedule's acyclic precedence graph (serializability graph,
conflict graph). This implies that it is also conflict-serializable. The CO property is especially
effective for achieving Global serializability in distributed systems.
Comment: Commitment ordering, which was discovered in 1990, is obviously not
mentioned in (Bernstein et al. 1987). Its correct definition appears in (Weikum and Vossen
2001), however the description there of its related techniques and theory is partial, inaccurate,
and misleading.
[according to whom?]
For an extensive coverage of commitment ordering and its
sources see Commitment ordering and The History of Commitment Ordering.
View equivalence
Two schedules S1 and S2 are said to be view-equivalent when the following conditions are
satisfied:
1. If the transaction in S1 reads an initial value for object X, so does the transaction
in S2.
2. If the transaction in S1 reads the value written by transaction in S1 for
object X, so does the transaction in S2.
3. If the transaction in S1 is the final transaction to write the value for an object X,
so is the transaction in S2.
View-serializable
A schedule is said to be view-serializable if it is view-equivalent to some serial schedule.
Note that by definition, all conflict-serializable schedules are view-serializable.

Notice that the above example (which is the same as the example in the discussion of
conflict-serializable) is both view-serializable and conflict-serializable at the same time.)
There are however view-serializable schedules that are not conflict-serializable: those
schedules with a transaction performing a blind write:

The above example is not conflict-serializable, but it is view-serializable since it has a view-
equivalent serial schedule <T1, T2, T3>.
Since determining whether a schedule is view-serializable is NP-complete, view-
serializability has little practical interest.
Recoverable
Transactions commit only after all transactions whose changes they read, commit.

These schedules are recoverable. F is recoverable because T1 commits before T2, that makes
the value read by T2 correct. Then T2 can commit itself. In F2, if T1 aborted, T2 has to abort
because the value of A it read is incorrect. In both cases, the database is left in a consistent
state.
Unrecoverable
If a transaction T1 aborts, and a transaction T2 commits, but T2 relied on T1, we have an
unrecoverable schedule.

In this example, G is unrecoverable, because T2 read the value of A written by T1, and
committed. T1 later aborted, therefore the value read by T2 is wrong, but since T2
committed, this schedule is unrecoverable.
Avoids cascading aborts (rollbacks)
Also named cascadeless. A single transaction abort leads to a series of transaction rollback.
Strategy to prevent cascading aborts is to disallow a transaction from reading uncommitted
changes from another transaction in the same schedule.
The following examples are the same as the one from the discussion on recoverable:

In this example, although F2 is recoverable, it does not avoid cascading aborts. It can be seen
that if T1 aborts, T2 will have to be aborted too in order to maintain the correctness of the
schedule as T2 has already read the uncommitted value written by T1.
The following is a recoverable schedule which avoids cascading abort. Note, however, that
the update of A by T1 is always lost (since T1 is aborted).

Cascading aborts avoidance is sufficient but not necessary for a schedule to be recoverable.
Strict
A schedule is strict - has the strictness property - if for any two transactions T1, T2, if a write
operation of T1 precedes a conflicting operation of T2 (either read or write), then the commit
event of T1 also precedes that conflicting operation of T2.
Any strict schedule is cascadeless, but not the converse. Strictness allows efficient recovery
of databases from failure.
Hierarchical relationship between serializability classes
The following expressions illustrate the hierarachical (containment) relationships between
serializability and recoverability classes:
Serial commitment-ordered conflict-serializable view-serializable all
schedules
Serial strict avoids cascading aborts recoverable all schedules
The Venn diagram (below) illustrates the above clauses graphically.

Venn diagram for serializability and recoverability classes
Practical implementations
In practice, most general purpose database systems employ conflict-serializable and
recoverable (primarily strict) schedules.
See also
schedule (project management)
References
Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman: Concurrency Control and
Recovery in Database Systems, Addison Wesley Publishing Company, 1987, ISBN 0-
201-10715-5
Gerhard Weikum, Gottfried Vossen: Transactional Information Systems, Elsevier,
2001, ISBN 1-55860-508-8
Categories:
Data management
Transaction processing
DBMS Transaction
Advertisements

Previous Page
Next Page

A transaction can be defined as a group of tasks. A single task is the minimum processing
unit of work, which cannot be divided further.
An example of transaction can be bank accounts of two users, say A & B. When a bank
employee transfers amount of Rs. 500 from A's account to B's account, a number of tasks are
executed behind the screen. This very simple and small transaction includes several steps:
decrease A's bank account from 500
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
In simple words, the transaction involves many tasks, such as opening the account of A,
reading the old balance, decreasing the 500 from it, saving new balance to account of A and
finally closing it. To add amount 500 in B's account same sort of tasks need to be done:
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
A simple transaction of moving an amount of 500 from A to B involves many low level tasks.
ACID Properties
A transaction may contain several low level tasks and further a transaction is very small unit
of any program. A transaction in a database system must maintain some properties in order to
ensure the accuracy of its completeness and data integrity. These properties are refer to as
ACID properties and are mentioned below:
Atomicity: Though a transaction involves several low level operations but this
property states that a transaction must be treated as an atomic unit, that is, either all of
its operations are executed or none. There must be no state in database where the
transaction is left partially completed. States should be defined either before the
execution of the transaction or after the execution/abortion/failure of the transaction.
Consistency: This property states that after the transaction is finished, its database
must remain in a consistent state. There must not be any possibility that some data is
incorrectly affected by the execution of transaction. If the database was in a consistent
state before the execution of the transaction, it must remain in consistent state after
the execution of the transaction.
Durability: This property states that in any case all updates made on the database will
persist even if the system fails and restarts. If a transaction writes or updates some
data in database and commits that data will always be there in the database. If the
transaction commits but data is not written on the disk and the system fails, that data
will be updated once the system comes up.
Isolation: In a database system where more than one transaction are being executed
simultaneously and in parallel, the property of isolation states that all the transactions
will be carried out and executed as if it is the only transaction in the system. No
transaction will affect the existence of any other transaction.
Serializability
When more than one transaction is executed by the operating system in a multiprogramming
environment, there are possibilities that instructions of one transactions are interleaved with
some other transaction.
Schedule: A chronological execution sequence of transaction is called schedule. A
schedule can have many transactions in it, each comprising of number of
instructions/tasks.
Serial Schedule: A schedule in which transactions are aligned in such a way that one
transaction is executed first. When the first transaction completes its cycle then next
transaction is executed. Transactions are ordered one after other. This type of
schedule is called serial schedule as transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered as benchmark. The
execution sequence of instruction in a transaction cannot be changed but two transactions can
have their instruction executed in random fashion. This execution does no harm if two
transactions are mutually independent and working on different segment of data but in case
these two transactions are working on same data, results may vary. This ever-varying result
may cause the database in an inconsistent state.
To resolve the problem, we allow parallel execution of transaction schedule if transactions in
it are either serializable or have some equivalence relation between or among transactions.
Equivalence schedules: Schedules can equivalence of the following types:
Result Equivalence:
If two schedules produce same results after execution, are said to be result equivalent.
They may yield same result for some value and may yield different results for another
values. That's why this equivalence is not generally considered significant.
View Equivalence:
Two schedules are view equivalence if transactions in both schedules perform similar
actions in similar manner.
For example:
o If T reads initial data in S1 then T also reads initial data in S2
o If T reads value written by J in S1 then T also reads value written by J in S2
o If T performs final write on data value in S1 then T also performs final write
on data value in S2
Conflict Equivalence:
Two operations are said to be conflicting if they have the following properties:
Both belong to separate transactions
Both accesses the same data item
At least one of them is "write" operation
Two schedules have more than one transactions with conflicting operations are said to be
conflict equivalent if and only if:
Both schedules contain same set of Transactions
The order of conflicting pairs of operation is maintained in both schedules
View equivalent schedules are view serializable and conflict equivalent schedules are conflict
serializable. All conflict serializable schedules are view serializable too.
States of Transactions:
A transaction in a database can be in one of the following state:

[Image: Transaction States]
Active: In this state the transaction is being executed. This is the initial state of every
transaction.
Partially Committed: When a transaction executes its final operation, it is said to be
in this state. After execution of all operations, the database system performs some
checks e.g. the consistency state of database after applying output of transaction onto
the database.
Failed: If any checks made by database recovery system fails, the transaction is said
to be in failed state, from where it can no longer proceed further.
Aborted: If any of checks fails and transaction reached in Failed state, the recovery
manager rolls back all its write operation on the database to make database in the state
where it was prior to start of execution of transaction. Transactions in this state are
called aborted. Database recovery module can select one of the two operations after a
transaction aborts:
o Re-start the transaction
o Kill the transaction
Committed: If transaction executes all its operations successfully it is said to be
committed. All its effects are now permanently made on database system.
DBMS Concurrency Control
Advertisements

Previous Page
Next Page

In a multiprogramming environment where more than one transactions can be concurrently
executed, there exists a need of protocols to control the concurrency of transaction to ensure
atomicity and isolation properties of transactions.
Concurrency control protocols, which ensure serializability of transactions, are most
desirable. Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock based protocols
Database systems, which are equipped with lock-based protocols, use mechanism by which
any transaction cannot read or write data until it acquires appropriate lock on it first. Locks
are of two kinds:
Binary Locks: a lock on data item can be in two states; it is either locked or
unlocked.
Shared/exclusive: this type of locking mechanism differentiates lock based on their
uses. If a lock is acquired on a data item to perform a write operation, it is exclusive
lock. Because allowing more than one transactions to write on same data item would
lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types lock protocols available:
Simplistic
Simplistic lock based protocols allow transaction to obtain lock on every object before
'write' operation is performed. As soon as 'write' has been done, transactions may
unlock the data item.
Pre-claiming
In this protocol, a transactions evaluations its operations and creates a list of data
items on which it needs locks. Before starting the execution, transaction requests the
system for all locks it needs beforehand. If all the locks are granted, the transaction
executes and releases all the locks when all its operations are over. Else if all the locks
are not granted, the transaction rolls back and waits until all locks are granted.

[Image: Pre-claiming]
Two Phase Locking - 2PL
This locking protocol is divides transaction execution phase into three parts. In the
first part, when transaction starts executing, transaction seeks grant for locks it needs
as it executes. Second part is where the transaction acquires all locks and no other
lock is required. Transaction keeps executing its operation. As soon as the transaction
releases its first lock, the third phase starts. In this phase a transaction cannot demand
for any lock but only releases the acquired locks.

[Image: Two Phase Locking]
Two phase locking has two phases, one is growing; where all locks are being acquired
by transaction and second one is shrinking, where locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read)
lock and then upgrade it to exclusive lock.
Strict Two Phase Locking
The first phase of Strict-2PL is same as 2PL. After acquiring all locks in the first
phase, transaction continues to execute normally. But in contrast to 2PL, Strict-2PL
does not release lock as soon as it is no more required, but it holds all locks until
commit state arrives. Strict-2PL releases all locks at once at commit point.

[Image: Strict Two Phase Locking]
Strict-2PL does not have cascading abort as 2PL does.
Time stamp based protocols
The most commonly used concurrency protocol is time-stamp based protocol. This protocol
uses either system time or logical counter to be used as a time-stamp.
Lock based protocols manage the order between conflicting pairs among transaction at the
time of execution whereas time-stamp based protocols start working as soon as transaction is
created.
Every transaction has a time-stamp associated with it and the ordering is determined by the
age of the transaction. A transaction created at 0002 clock time would be older than all other
transaction, which come after it. For example, any transaction 'y' entering the system at 0004
is two seconds younger and priority may be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system
know, when was last read and write operation made on the data item.
Time-stamp ordering protocol
The timestamp-ordering protocol ensures serializability among transaction in their conflicting
read and write operations. This is the responsibility of the protocol system that the conflicting
pair of tasks should be executed according to the timestamp values of the transactions.
Time-stamp of Transaction Ti is denoted as TS(T
i
).
Read time-stamp of data-item X is denoted by R-timestamp(X).
Write time-stamp of data-item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows:
If a transaction Ti issues read(X) operation:
o If TS(Ti) < W-timestamp(X)
Operation rejected.
o If TS(Ti) >= W-timestamp(X)
Operation executed.
o All data-item Timestamps updated.

If a transaction Ti issues write(X) operation:
o If TS(Ti) < R-timestamp(X)
Operation rejected.
o If TS(Ti) < W-timestamp(X)
Operation rejected and Ti rolled back.
o Otherwise, operation executed.
Thomas' Write rule:
This rule states that in case of:
If TS(Ti) < W-timestamp(X)

Operation rejected and Ti rolled back. Timestamp ordering rules can be modified to
make the schedule view serializable. Instead of making Ti rolled back, the 'write'
operation itself is ignored.
DBMS Deadlock
Advertisements

Previous Page
Next Page

In a multi-process system, deadlock is a situation, which arises in shared resource
environment where a process indefinitely waits for a resource, which is held by some other
process, which in turn waiting for a resource held by some other process.
For example, assume a set of transactions {T
0
, T
1
, T
2
, ...,T
n
}. T
0
needs a resource X to
complete its task. Resource X is held by T
1
and T
1
is waiting for a resource Y, which is held
by T
2
. T
2
is waiting for resource Z, which is held by T
0
. Thus, all processes wait for each
other to release resources. In this situation, none of processes can finish their task. This
situation is known as 'deadlock'.
Deadlock is not a good phenomenon for a healthy system. To keep system deadlock free few
methods can be used. In case the system is stuck because of deadlock, either the transactions
involved in deadlock are rolled back and restarted.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the
operations which transactions are about to execute. DBMS inspects operations and analyze if
they can create a deadlock situation. If it finds that a deadlock situation might occur then that
transaction is never allowed to be executed.
There are deadlock prevention schemes, which uses time-stamp ordering mechanism of
transactions in order to pre-decide a deadlock situation.
Wait-Die Scheme:
In this scheme, if a transaction request to lock a resource (data item), which is already held
with conflicting lock by some other transaction, one of the two possibilities may occur:
If TS(T
i
) < TS(T
j
), that is T
i
, which is requesting a conflicting lock, is older than T
j
, T
i

is allowed to wait until the data-item is available.
If TS(T
i
) > TS(t
j
), that is T
i
is younger than T
j
, T
i
dies. T
i
is restarted later with random
delay but with same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme:
In this scheme, if a transaction request to lock a resource (data item), which is already held
with conflicting lock by some other transaction, one of the two possibilities may occur:
If TS(T
i
) < TS(T
j
), that is T
i
, which is requesting a conflicting lock, is older than T
j
, T
i

forces T
j
to be rolled back, that is T
i
wounds T
j
. T
j
is restarted later with random delay
but with same timestamp.
If TS(T
i
) > TS(T
j
), that is T
i
is younger than T
j
, T
i
is forced to wait until the resource
is available.
This scheme, allows the younger transaction to wait but when an older transaction request an
item held by younger one, the older transaction forces the younger one to abort and release
the item.
In both cases, transaction, which enters late in the system, is aborted.
Deadlock Avoidance
Aborting a transaction is not always a practical approach. Instead deadlock avoidance
mechanisms can be used to detect any deadlock situation in advance. Methods like "wait-for
graph" are available but for the system where transactions are light in weight and have hold
on fewer instances of resource. In a bulky system deadlock prevention techniques may work
well.
Wait-for Graph
This is a simple method available to track if any deadlock situation may arise. For each
transaction entering in the system, a node is created. When transaction T
i
requests for a lock
on item, say X, which is held by some other transaction T
j
, a directed edge is created from T
i

to T
j
. If T
j
releases item X, the edge between them is dropped and T
i
locks the data item.
The system maintains this wait-for graph for every transaction waiting for some data items
held by others. System keeps checking if there's any cycle in the graph.

[Image: Wait-for Graph]
Two approaches can be used, first not to allow any request for an item, which is already
locked by some other transaction. This is not always feasible and may cause starvation, where
a transaction indefinitely waits for data item and can never acquire it. Second option is to roll
back one of the transactions.
It is not feasible to always roll back the younger transaction, as it may be important than the
older one. With help of some relative algorithm a transaction is chosen, which is to be
aborted, this transaction is called victim and the process is known as victim selection.
DBMS Data Backup
Advertisements

Previous Page
Next Page

Failure with loss of Non-Volatile storage
What would happen if the non-volatile storage like RAM abruptly crashes? All transaction,
which are being executed are kept in main memory. All active logs, disk buffers and related
data is stored in non-volatile storage.
When storage like RAM fails, it takes away all the logs and active copy of database. It makes
recovery almost impossible as everything to help recover is also lost. Following techniques
may be adopted in case of loss of non-volatile storage.
A mechanism like checkpoint can be adopted which makes the entire content of
database be saved periodically.
State of active database in non-volatile memory can be dumped onto stable storage
periodically, which may also contain logs and active transactions and buffer blocks.
<dump> can be marked on log file whenever the database contents are dumped from
non-volatile memory to a stable one.
Recovery:
When the system recovers from failure, it can restore the latest dump.
It can maintain redo-list and undo-list as in checkpoints.
It can recover the system by consulting undo-redo lists to restore the state of all
transaction up to last checkpoint.
Database backup & recovery from catastrophic failure
So far we have not discovered any other planet in our solar system, which may have life on it,
and our own earth is not that safe. In case of catastrophic failure like alien attack, the
database administrator may still be forced to recover the database.
Remote backup, described next, is one of the solutions to save life. Alternatively, whole
database backups can be taken on magnetic tapes and stored at a safer place. This backup can
later be restored on a freshly installed database and bring it to the state at least at the point of
backup.
Grown up databases are too large to be frequently backed-up. Instead, we are aware of
techniques where we can restore a database by just looking at logs. So backup of logs at
frequent rate is more feasible than the entire database. Database can be backed-up once a
week and logs, being very small can be backed-up every day or as frequent as every hour.
Remote Backup
Remote backup provides a sense of security and safety in case the primary location where the
database is located gets destroyed. Remote backup can be offline or real-time and online. In
case it is offline it is maintained manually.

[Image: Remote Data Backup]
Online backup systems are more real-time and lifesavers for database administrators and
investors. An online backup system is a mechanism where every bit of real-time data is
backed-up simultaneously at two distant place. One of them is directly connected to system
and other one is kept at remote place as backup.
As soon as the primary database storage fails, the backup system sense the failure and switch
the user system to the remote storage. Sometimes this is so instant the users even can't realize
a failure.
DBMS Data Recovery
Advertisements

Previous Page
Next Page

Crash Recovery
Though we are living in highly technologically advanced era where hundreds of satellite
monitor the earth and at every second billions of people are connected through information
technology, failure is expected but not every time acceptable.
DBMS is highly complex system with hundreds of transactions being executed every second.
Availability of DBMS depends on its complex architecture and underlying hardware or
system software. If it fails or crashes amid transactions being executed, it is expected that the
system would follow some sort of algorithm or techniques to recover from crashes or failures.
Failure Classification
To see where the problem has occurred we generalize the failure into various categories, as
follows:
Transaction failure
When a transaction is failed to execute or it reaches a point after which it cannot be
completed successfully it has to abort. This is called transaction failure. Where only few
transaction or process are hurt.
Reason for transaction failure could be:
Logical errors: where a transaction cannot complete because of it has some code
error or any internal error condition
System errors: where the database system itself terminates an active transaction
because DBMS is not able to execute it or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability systems aborts
an active transaction.
System crash
There are problems, which are external to the system, which may cause the system to stop
abruptly and cause the system to crash. For example interruption in power supply, failure of
underlying hardware or software failure.
Examples may include operating system errors.
Disk failure:
In early days of technology evolution, it was a common problem where hard disk drives or
storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or
any other failure, which destroys all or part of disk storage
Storage Structure
We have already described storage system here. In brief, the storage structure can be divided
in various categories:
Volatile storage: As name suggests, this storage does not survive system crashes and
mostly placed very closed to CPU by embedding them onto the chipset itself for
examples: main memory, cache memory. They are fast but can store a small amount
of information.
Nonvolatile storage: These memories are made to survive system crashes. They are
huge in data storage capacity but slower in accessibility. Examples may include, hard
disks, magnetic tapes, flash memory, non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it many have several transactions being executed and various files
opened for them to modifying data items. As we know that transactions are made of various
operations, which are atomic in nature. But according to ACID properties of DBMS,
atomicity of transactions as a whole must be maintained that is, either all operations are
executed or none.
When DBMS recovers from a crash it should maintain the following:
It should check the states of all transactions, which were being executed.
A transaction may be in the middle of some operation; DBMS must ensure the
atomicity of transaction in this case.
It should check whether the transaction can be completed now or needs to be rolled
back.
No transactions would be allowed to left DBMS in inconsistent state.
There are two types of techniques, which can help DBMS in recovering as well as
maintaining the atomicity of transaction:
Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
Maintaining shadow paging, where are the changes are done on a volatile memory
and later the actual database is updated.
Log-Based Recovery
Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to actual modification and stored on
a stable storage media, which is failsafe.
Log based recovery works as follows:
The log file is kept on stable storage media
When a transaction enters the system and starts execution, it writes a log about it
<Tn, Start>
When the transaction modifies an item X, it write logs as follows:
<Tn, X, V1, V2>
It reads Tn has changed the value of X, from V1 to V2.
When transaction finishes, it logs:
<Tn, commit>
Database can be modified using two approaches:
1. Deferred database modification: All logs are written on to the stable storage and
database is updated when transaction commits.
2. Immediate database modification: Each log follows an actual database
modification. That is, database is modified immediately after every operation.
Recovery with concurrent transactions
When more than one transactions are being executed in parallel, the logs are interleaved. At
the time of recovery it would become hard for recovery system to backtrack all logs, and then
start recovering. To ease this situation most modern DBMS use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the
memory space available in the system. At time passes log file may be too big to be handled at
all. Checkpoint is a mechanism where all the previous logs are removed from the system and
stored permanently in storage disk. Checkpoint declares a point before which the DBMS was
in consistent state and all the transactions were committed.
Recovery
When system with concurrent transaction crashes and recovers, it does behave in the
following manner:


[Image: Recovery with concurrent transactions]
The recovery system reads the logs backwards from the end to the last Checkpoint.
It maintains two lists, undo-list and redo-list.
If the recovery system sees a log with <T
n
, Start> and <T
n
, Commit> or just <T
n
,
Commit>, it puts the transaction in redo-list.
If the recovery system sees a log with <T
n
, Start> but no commit or abort log found, it
puts the transaction in undo-list.
All transactions in undo-list are then undone and their logs are removed. All transaction in
redo-list, their previous logs are removed and then redone again and log saved.

You might also like