This article discusses schedules in database and transaction processing systems. A schedule models the execution order of transactions' operations and can be represented as a list or partial order. Key types of schedules discussed include serial schedules where transactions execute sequentially without overlap, and serializable schedules where the schedule is equivalent to a serial schedule. The article also covers conflict-serializable, view-serializable, and recoverable schedules. Determining if a schedule is view-serializable is NP-complete.
This article discusses schedules in database and transaction processing systems. A schedule models the execution order of transactions' operations and can be represented as a list or partial order. Key types of schedules discussed include serial schedules where transactions execute sequentially without overlap, and serializable schedules where the schedule is equivalent to a serial schedule. The article also covers conflict-serializable, view-serializable, and recoverable schedules. Determining if a schedule is view-serializable is NP-complete.
This article discusses schedules in database and transaction processing systems. A schedule models the execution order of transactions' operations and can be represented as a list or partial order. Key types of schedules discussed include serial schedules where transactions execute sequentially without overlap, and serializable schedules where the schedule is equivalent to a serial schedule. The article also covers conflict-serializable, view-serializable, and recoverable schedules. Determining if a schedule is view-serializable is NP-complete.
This article discusses schedules in database and transaction processing systems. A schedule models the execution order of transactions' operations and can be represented as a list or partial order. Key types of schedules discussed include serial schedules where transactions execute sequentially without overlap, and serializable schedules where the schedule is equivalent to a serial schedule. The article also covers conflict-serializable, view-serializable, and recoverable schedules. Determining if a schedule is view-serializable is NP-complete.
Download as DOCX, PDF, TXT or read online from Scribd
Download as docx, pdf, or txt
You are on page 1of 21
At a glance
Powered by AI
Schedules are used to model the execution of transactions in a database system and are important concepts in concurrency control theory. Different types of schedules like serial and serializable schedules exist based on the ordering of transactions.
A schedule models the execution of transactions as a list or partial order of operations. It abstractly describes the actions performed by transactions in a system ordered by time.
The main types of schedules discussed are serial, serializable, and recoverable schedules. Serializable schedules allow concurrent execution of transactions while avoiding anomalies.
Schedule (computer science)
From Wikipedia, the free encyclopedia
This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (November 2012) In the fields of databases and transaction processing (transaction management), a schedule (or history) of a system is an abstract model to describe execution of transactions running in the system. Often it is a list of operations (actions) ordered by time, performed by a set of transactions that are executed together in the system. If order in time between certain operations is not determined by the system, then a partial order is used. Examples of such operations are requesting a read operation, reading, writing, aborting, committing, requesting lock, locking, etc. Not all transaction operation types should be included in a schedule, and typically only selected operation types (e.g., data access operations) are included, as needed to reason about and describe certain phenomena. Schedules and schedule properties are fundamental concepts in database concurrency control theory. Contents 1 Formal description 2 Types of schedule o 2.1 Serial o 2.2 Serializable 2.2.1 Conflicting actions 2.2.2 Conflict equivalence 2.2.3 Conflict-serializable 2.2.4 Commitment-ordered 2.2.5 View equivalence 2.2.6 View-serializable o 2.3 Recoverable 2.3.1 Unrecoverable 2.3.2 Avoids cascading aborts (rollbacks) 2.3.3 Strict 3 Hierarchical relationship between serializability classes 4 Practical implementations 5 See also 6 References Formal description The following is an example of a schedule:
In this example, the horizontal axis represents the different transactions in the schedule D. The vertical axis represents time order of operations. Schedule D consists of three transactions T1, T2, T3. The schedule describes the actions of the transactions as seen by the DBMS. First T1 Reads and Writes to object X, and then Commits. Then T2 Reads and Writes to object Y and Commits, and finally T3 Reads and Writes to object Z and Commits. This is an example of a serial schedule, i.e., sequential with no overlap in time, because the actions of in all three transactions are sequential, and the transactions are not interleaved in time. Representing the schedule D above by a table (rather than a list) is just for the convenience of identifying each transaction's operations in a glance. This notation is used throughout the article below. A more common way in the technical literature for representing such schedule is by a list: D = R1(X) W1(X) Com1 R2(Y) W2(Y) Com2 R3(Z) W3(Z) Com3 Usually, for the purpose of reasoning about concurrency control in databases, an operation is modeled as atomic, occurring at a point in time, without duration. When this is not satisfactory start and end time-points and possibly other point events are specified (rarely). Real executed operations always have some duration and specified respective times of occurrence of events within them (e.g., "exact" times of beginning and completion), but for concurrency control reasoning usually only the precedence in time of the whole operations (without looking into the quite complex details of each operation) matters, i.e., which operation is before, or after another operation. Furthermore, in many cases the before/after relationships between two specific operations do not matter and should not be specified, while being specified for other pairs of operations. In general operations of transactions in a schedule can interleave (i.e., transactions can be executed concurrently), while time orders between operations in each transaction remain unchanged as implied by the transaction's program. Since not always time orders between all operations of all transactions matter and need to be specified, a schedule is, in general, a partial order between operations rather than a total order (where order for each pair is determined, as in a list of operations). Also in the general case each transaction may consist of several processes, and itself be properly represented by a partial order of operations, rather than a total order. Thus in general a schedule is a partial order of operations, containing (embedding) the partial orders of all its transactions. Time-order between two operations can be represented by an ordered pair of these operations (e.g., the existence of a pair (OP1,OP2) means that OP1 is always before OP2), and a schedule in the general case is a set of such ordered pairs. Such a set, a schedule, is a partial order which can be represented by an acyclic directed graph (or directed acyclic graph, DAG) with operations as nodes and time-order as a directed edge (no cycles are allowed since a cycle means that a first (any) operation on a cycle can be both before and after (any) another second operation on the cycle, which contradicts our perception of Time). In many cases a graphical representation of such graph is used to demonstrate a schedule. Comment: Since a list of operations (and the table notation used in this article) always represents a total order between operations, schedules that are not a total order cannot be represented by a list (but always can be represented by a DAG). Types of schedule Serial The transactions are executed non-interleaved (see example above) i.e., a serial schedule is one in which no transaction starts until a running transaction has ended. Serializable A schedule that is equivalent (in its outcome) to a serial schedule has the serializability property. In schedule E, the order in which the actions of the transactions are executed is not the same as in D, but in the end, E gives the same result as D.
Conflicting actions Two actions are said to be in conflict (conflicting pair) if: 1. The actions belong to different transactions. 2. At least one of the actions is a write operation. 3. The actions access the same object (read or write). The following set of actions is conflicting: R1(X), W2(X), W3(X) (3 conflicting pairs) While the following sets of actions are not: R1(X), R2(X), R3(X) R1(X), W2(Y), R3(X) Conflict equivalence The schedules S1 and S2 are said to be conflict-equivalent if following two conditions are satisfied: 1. Both schedules S1 and S2 involve the same set of transactions (including ordering of actions within each transaction). 2. Both schedules have same set of conflicting operations. Conflict-serializable A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to one or more serial schedules. Another definition for conflict-serializability is that a schedule is conflict-serializable if and only if its precedence graph/serializability graph, when only committed transactions are considered, is acyclic (if the graph is defined to include also uncommitted transactions, then cycles involving uncommitted transactions may occur without conflict serializability violation).
Which is conflict-equivalent to the serial schedule <T1,T2>, but not <T2,T1>. Commitment-ordered
The neutrality of this section is disputed. Relevant discussion may be found on the talk page. Please do not remove this message until the dispute is resolved. (November 2011) A schedule is said to be commitment-ordered (commit-ordered), or commitment-order- serializable, if it obeys the Commitment ordering (CO; also commit-ordering or commit- order-serializability) schedule property. This means that the order in time of transactions' commitment events is compatible with the precedence (partial) order of the respective transactions, as induced by their schedule's acyclic precedence graph (serializability graph, conflict graph). This implies that it is also conflict-serializable. The CO property is especially effective for achieving Global serializability in distributed systems. Comment: Commitment ordering, which was discovered in 1990, is obviously not mentioned in (Bernstein et al. 1987). Its correct definition appears in (Weikum and Vossen 2001), however the description there of its related techniques and theory is partial, inaccurate, and misleading. [according to whom?] For an extensive coverage of commitment ordering and its sources see Commitment ordering and The History of Commitment Ordering. View equivalence Two schedules S1 and S2 are said to be view-equivalent when the following conditions are satisfied: 1. If the transaction in S1 reads an initial value for object X, so does the transaction in S2. 2. If the transaction in S1 reads the value written by transaction in S1 for object X, so does the transaction in S2. 3. If the transaction in S1 is the final transaction to write the value for an object X, so is the transaction in S2. View-serializable A schedule is said to be view-serializable if it is view-equivalent to some serial schedule. Note that by definition, all conflict-serializable schedules are view-serializable.
Notice that the above example (which is the same as the example in the discussion of conflict-serializable) is both view-serializable and conflict-serializable at the same time.) There are however view-serializable schedules that are not conflict-serializable: those schedules with a transaction performing a blind write:
The above example is not conflict-serializable, but it is view-serializable since it has a view- equivalent serial schedule <T1, T2, T3>. Since determining whether a schedule is view-serializable is NP-complete, view- serializability has little practical interest. Recoverable Transactions commit only after all transactions whose changes they read, commit.
These schedules are recoverable. F is recoverable because T1 commits before T2, that makes the value read by T2 correct. Then T2 can commit itself. In F2, if T1 aborted, T2 has to abort because the value of A it read is incorrect. In both cases, the database is left in a consistent state. Unrecoverable If a transaction T1 aborts, and a transaction T2 commits, but T2 relied on T1, we have an unrecoverable schedule.
In this example, G is unrecoverable, because T2 read the value of A written by T1, and committed. T1 later aborted, therefore the value read by T2 is wrong, but since T2 committed, this schedule is unrecoverable. Avoids cascading aborts (rollbacks) Also named cascadeless. A single transaction abort leads to a series of transaction rollback. Strategy to prevent cascading aborts is to disallow a transaction from reading uncommitted changes from another transaction in the same schedule. The following examples are the same as the one from the discussion on recoverable:
In this example, although F2 is recoverable, it does not avoid cascading aborts. It can be seen that if T1 aborts, T2 will have to be aborted too in order to maintain the correctness of the schedule as T2 has already read the uncommitted value written by T1. The following is a recoverable schedule which avoids cascading abort. Note, however, that the update of A by T1 is always lost (since T1 is aborted).
Cascading aborts avoidance is sufficient but not necessary for a schedule to be recoverable. Strict A schedule is strict - has the strictness property - if for any two transactions T1, T2, if a write operation of T1 precedes a conflicting operation of T2 (either read or write), then the commit event of T1 also precedes that conflicting operation of T2. Any strict schedule is cascadeless, but not the converse. Strictness allows efficient recovery of databases from failure. Hierarchical relationship between serializability classes The following expressions illustrate the hierarachical (containment) relationships between serializability and recoverability classes: Serial commitment-ordered conflict-serializable view-serializable all schedules Serial strict avoids cascading aborts recoverable all schedules The Venn diagram (below) illustrates the above clauses graphically.
Venn diagram for serializability and recoverability classes Practical implementations In practice, most general purpose database systems employ conflict-serializable and recoverable (primarily strict) schedules. See also schedule (project management) References Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman: Concurrency Control and Recovery in Database Systems, Addison Wesley Publishing Company, 1987, ISBN 0- 201-10715-5 Gerhard Weikum, Gottfried Vossen: Transactional Information Systems, Elsevier, 2001, ISBN 1-55860-508-8 Categories: Data management Transaction processing DBMS Transaction Advertisements
Previous Page Next Page
A transaction can be defined as a group of tasks. A single task is the minimum processing unit of work, which cannot be divided further. An example of transaction can be bank accounts of two users, say A & B. When a bank employee transfers amount of Rs. 500 from A's account to B's account, a number of tasks are executed behind the screen. This very simple and small transaction includes several steps: decrease A's bank account from 500 Open_Account(A) Old_Balance = A.balance New_Balance = Old_Balance - 500 A.balance = New_Balance Close_Account(A) In simple words, the transaction involves many tasks, such as opening the account of A, reading the old balance, decreasing the 500 from it, saving new balance to account of A and finally closing it. To add amount 500 in B's account same sort of tasks need to be done: Open_Account(B) Old_Balance = B.balance New_Balance = Old_Balance + 500 B.balance = New_Balance Close_Account(B) A simple transaction of moving an amount of 500 from A to B involves many low level tasks. ACID Properties A transaction may contain several low level tasks and further a transaction is very small unit of any program. A transaction in a database system must maintain some properties in order to ensure the accuracy of its completeness and data integrity. These properties are refer to as ACID properties and are mentioned below: Atomicity: Though a transaction involves several low level operations but this property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in database where the transaction is left partially completed. States should be defined either before the execution of the transaction or after the execution/abortion/failure of the transaction. Consistency: This property states that after the transaction is finished, its database must remain in a consistent state. There must not be any possibility that some data is incorrectly affected by the execution of transaction. If the database was in a consistent state before the execution of the transaction, it must remain in consistent state after the execution of the transaction. Durability: This property states that in any case all updates made on the database will persist even if the system fails and restarts. If a transaction writes or updates some data in database and commits that data will always be there in the database. If the transaction commits but data is not written on the disk and the system fails, that data will be updated once the system comes up. Isolation: In a database system where more than one transaction are being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if it is the only transaction in the system. No transaction will affect the existence of any other transaction. Serializability When more than one transaction is executed by the operating system in a multiprogramming environment, there are possibilities that instructions of one transactions are interleaved with some other transaction. Schedule: A chronological execution sequence of transaction is called schedule. A schedule can have many transactions in it, each comprising of number of instructions/tasks. Serial Schedule: A schedule in which transactions are aligned in such a way that one transaction is executed first. When the first transaction completes its cycle then next transaction is executed. Transactions are ordered one after other. This type of schedule is called serial schedule as transactions are executed in a serial manner. In a multi-transaction environment, serial schedules are considered as benchmark. The execution sequence of instruction in a transaction cannot be changed but two transactions can have their instruction executed in random fashion. This execution does no harm if two transactions are mutually independent and working on different segment of data but in case these two transactions are working on same data, results may vary. This ever-varying result may cause the database in an inconsistent state. To resolve the problem, we allow parallel execution of transaction schedule if transactions in it are either serializable or have some equivalence relation between or among transactions. Equivalence schedules: Schedules can equivalence of the following types: Result Equivalence: If two schedules produce same results after execution, are said to be result equivalent. They may yield same result for some value and may yield different results for another values. That's why this equivalence is not generally considered significant. View Equivalence: Two schedules are view equivalence if transactions in both schedules perform similar actions in similar manner. For example: o If T reads initial data in S1 then T also reads initial data in S2 o If T reads value written by J in S1 then T also reads value written by J in S2 o If T performs final write on data value in S1 then T also performs final write on data value in S2 Conflict Equivalence: Two operations are said to be conflicting if they have the following properties: Both belong to separate transactions Both accesses the same data item At least one of them is "write" operation Two schedules have more than one transactions with conflicting operations are said to be conflict equivalent if and only if: Both schedules contain same set of Transactions The order of conflicting pairs of operation is maintained in both schedules View equivalent schedules are view serializable and conflict equivalent schedules are conflict serializable. All conflict serializable schedules are view serializable too. States of Transactions: A transaction in a database can be in one of the following state:
[Image: Transaction States] Active: In this state the transaction is being executed. This is the initial state of every transaction. Partially Committed: When a transaction executes its final operation, it is said to be in this state. After execution of all operations, the database system performs some checks e.g. the consistency state of database after applying output of transaction onto the database. Failed: If any checks made by database recovery system fails, the transaction is said to be in failed state, from where it can no longer proceed further. Aborted: If any of checks fails and transaction reached in Failed state, the recovery manager rolls back all its write operation on the database to make database in the state where it was prior to start of execution of transaction. Transactions in this state are called aborted. Database recovery module can select one of the two operations after a transaction aborts: o Re-start the transaction o Kill the transaction Committed: If transaction executes all its operations successfully it is said to be committed. All its effects are now permanently made on database system. DBMS Concurrency Control Advertisements
Previous Page Next Page
In a multiprogramming environment where more than one transactions can be concurrently executed, there exists a need of protocols to control the concurrency of transaction to ensure atomicity and isolation properties of transactions. Concurrency control protocols, which ensure serializability of transactions, are most desirable. Concurrency control protocols can be broadly divided into two categories: Lock based protocols Time stamp based protocols Lock based protocols Database systems, which are equipped with lock-based protocols, use mechanism by which any transaction cannot read or write data until it acquires appropriate lock on it first. Locks are of two kinds: Binary Locks: a lock on data item can be in two states; it is either locked or unlocked. Shared/exclusive: this type of locking mechanism differentiates lock based on their uses. If a lock is acquired on a data item to perform a write operation, it is exclusive lock. Because allowing more than one transactions to write on same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed. There are four types lock protocols available: Simplistic Simplistic lock based protocols allow transaction to obtain lock on every object before 'write' operation is performed. As soon as 'write' has been done, transactions may unlock the data item. Pre-claiming In this protocol, a transactions evaluations its operations and creates a list of data items on which it needs locks. Before starting the execution, transaction requests the system for all locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. Else if all the locks are not granted, the transaction rolls back and waits until all locks are granted.
[Image: Pre-claiming] Two Phase Locking - 2PL This locking protocol is divides transaction execution phase into three parts. In the first part, when transaction starts executing, transaction seeks grant for locks it needs as it executes. Second part is where the transaction acquires all locks and no other lock is required. Transaction keeps executing its operation. As soon as the transaction releases its first lock, the third phase starts. In this phase a transaction cannot demand for any lock but only releases the acquired locks.
[Image: Two Phase Locking] Two phase locking has two phases, one is growing; where all locks are being acquired by transaction and second one is shrinking, where locks held by the transaction are being released. To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to exclusive lock. Strict Two Phase Locking The first phase of Strict-2PL is same as 2PL. After acquiring all locks in the first phase, transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release lock as soon as it is no more required, but it holds all locks until commit state arrives. Strict-2PL releases all locks at once at commit point.
[Image: Strict Two Phase Locking] Strict-2PL does not have cascading abort as 2PL does. Time stamp based protocols The most commonly used concurrency protocol is time-stamp based protocol. This protocol uses either system time or logical counter to be used as a time-stamp. Lock based protocols manage the order between conflicting pairs among transaction at the time of execution whereas time-stamp based protocols start working as soon as transaction is created. Every transaction has a time-stamp associated with it and the ordering is determined by the age of the transaction. A transaction created at 0002 clock time would be older than all other transaction, which come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger and priority may be given to the older one. In addition, every data item is given the latest read and write-timestamp. This lets the system know, when was last read and write operation made on the data item. Time-stamp ordering protocol The timestamp-ordering protocol ensures serializability among transaction in their conflicting read and write operations. This is the responsibility of the protocol system that the conflicting pair of tasks should be executed according to the timestamp values of the transactions. Time-stamp of Transaction Ti is denoted as TS(T i ). Read time-stamp of data-item X is denoted by R-timestamp(X). Write time-stamp of data-item X is denoted by W-timestamp(X). Timestamp ordering protocol works as follows: If a transaction Ti issues read(X) operation: o If TS(Ti) < W-timestamp(X) Operation rejected. o If TS(Ti) >= W-timestamp(X) Operation executed. o All data-item Timestamps updated.
If a transaction Ti issues write(X) operation: o If TS(Ti) < R-timestamp(X) Operation rejected. o If TS(Ti) < W-timestamp(X) Operation rejected and Ti rolled back. o Otherwise, operation executed. Thomas' Write rule: This rule states that in case of: If TS(Ti) < W-timestamp(X)
Operation rejected and Ti rolled back. Timestamp ordering rules can be modified to make the schedule view serializable. Instead of making Ti rolled back, the 'write' operation itself is ignored. DBMS Deadlock Advertisements
Previous Page Next Page
In a multi-process system, deadlock is a situation, which arises in shared resource environment where a process indefinitely waits for a resource, which is held by some other process, which in turn waiting for a resource held by some other process. For example, assume a set of transactions {T 0 , T 1 , T 2 , ...,T n }. T 0 needs a resource X to complete its task. Resource X is held by T 1 and T 1 is waiting for a resource Y, which is held by T 2 . T 2 is waiting for resource Z, which is held by T 0 . Thus, all processes wait for each other to release resources. In this situation, none of processes can finish their task. This situation is known as 'deadlock'. Deadlock is not a good phenomenon for a healthy system. To keep system deadlock free few methods can be used. In case the system is stuck because of deadlock, either the transactions involved in deadlock are rolled back and restarted. Deadlock Prevention To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations which transactions are about to execute. DBMS inspects operations and analyze if they can create a deadlock situation. If it finds that a deadlock situation might occur then that transaction is never allowed to be executed. There are deadlock prevention schemes, which uses time-stamp ordering mechanism of transactions in order to pre-decide a deadlock situation. Wait-Die Scheme: In this scheme, if a transaction request to lock a resource (data item), which is already held with conflicting lock by some other transaction, one of the two possibilities may occur: If TS(T i ) < TS(T j ), that is T i , which is requesting a conflicting lock, is older than T j , T i
is allowed to wait until the data-item is available. If TS(T i ) > TS(t j ), that is T i is younger than T j , T i dies. T i is restarted later with random delay but with same timestamp. This scheme allows the older transaction to wait but kills the younger one. Wound-Wait Scheme: In this scheme, if a transaction request to lock a resource (data item), which is already held with conflicting lock by some other transaction, one of the two possibilities may occur: If TS(T i ) < TS(T j ), that is T i , which is requesting a conflicting lock, is older than T j , T i
forces T j to be rolled back, that is T i wounds T j . T j is restarted later with random delay but with same timestamp. If TS(T i ) > TS(T j ), that is T i is younger than T j , T i is forced to wait until the resource is available. This scheme, allows the younger transaction to wait but when an older transaction request an item held by younger one, the older transaction forces the younger one to abort and release the item. In both cases, transaction, which enters late in the system, is aborted. Deadlock Avoidance Aborting a transaction is not always a practical approach. Instead deadlock avoidance mechanisms can be used to detect any deadlock situation in advance. Methods like "wait-for graph" are available but for the system where transactions are light in weight and have hold on fewer instances of resource. In a bulky system deadlock prevention techniques may work well. Wait-for Graph This is a simple method available to track if any deadlock situation may arise. For each transaction entering in the system, a node is created. When transaction T i requests for a lock on item, say X, which is held by some other transaction T j , a directed edge is created from T i
to T j . If T j releases item X, the edge between them is dropped and T i locks the data item. The system maintains this wait-for graph for every transaction waiting for some data items held by others. System keeps checking if there's any cycle in the graph.
[Image: Wait-for Graph] Two approaches can be used, first not to allow any request for an item, which is already locked by some other transaction. This is not always feasible and may cause starvation, where a transaction indefinitely waits for data item and can never acquire it. Second option is to roll back one of the transactions. It is not feasible to always roll back the younger transaction, as it may be important than the older one. With help of some relative algorithm a transaction is chosen, which is to be aborted, this transaction is called victim and the process is known as victim selection. DBMS Data Backup Advertisements
Previous Page Next Page
Failure with loss of Non-Volatile storage What would happen if the non-volatile storage like RAM abruptly crashes? All transaction, which are being executed are kept in main memory. All active logs, disk buffers and related data is stored in non-volatile storage. When storage like RAM fails, it takes away all the logs and active copy of database. It makes recovery almost impossible as everything to help recover is also lost. Following techniques may be adopted in case of loss of non-volatile storage. A mechanism like checkpoint can be adopted which makes the entire content of database be saved periodically. State of active database in non-volatile memory can be dumped onto stable storage periodically, which may also contain logs and active transactions and buffer blocks. <dump> can be marked on log file whenever the database contents are dumped from non-volatile memory to a stable one. Recovery: When the system recovers from failure, it can restore the latest dump. It can maintain redo-list and undo-list as in checkpoints. It can recover the system by consulting undo-redo lists to restore the state of all transaction up to last checkpoint. Database backup & recovery from catastrophic failure So far we have not discovered any other planet in our solar system, which may have life on it, and our own earth is not that safe. In case of catastrophic failure like alien attack, the database administrator may still be forced to recover the database. Remote backup, described next, is one of the solutions to save life. Alternatively, whole database backups can be taken on magnetic tapes and stored at a safer place. This backup can later be restored on a freshly installed database and bring it to the state at least at the point of backup. Grown up databases are too large to be frequently backed-up. Instead, we are aware of techniques where we can restore a database by just looking at logs. So backup of logs at frequent rate is more feasible than the entire database. Database can be backed-up once a week and logs, being very small can be backed-up every day or as frequent as every hour. Remote Backup Remote backup provides a sense of security and safety in case the primary location where the database is located gets destroyed. Remote backup can be offline or real-time and online. In case it is offline it is maintained manually.
[Image: Remote Data Backup] Online backup systems are more real-time and lifesavers for database administrators and investors. An online backup system is a mechanism where every bit of real-time data is backed-up simultaneously at two distant place. One of them is directly connected to system and other one is kept at remote place as backup. As soon as the primary database storage fails, the backup system sense the failure and switch the user system to the remote storage. Sometimes this is so instant the users even can't realize a failure. DBMS Data Recovery Advertisements
Previous Page Next Page
Crash Recovery Though we are living in highly technologically advanced era where hundreds of satellite monitor the earth and at every second billions of people are connected through information technology, failure is expected but not every time acceptable. DBMS is highly complex system with hundreds of transactions being executed every second. Availability of DBMS depends on its complex architecture and underlying hardware or system software. If it fails or crashes amid transactions being executed, it is expected that the system would follow some sort of algorithm or techniques to recover from crashes or failures. Failure Classification To see where the problem has occurred we generalize the failure into various categories, as follows: Transaction failure When a transaction is failed to execute or it reaches a point after which it cannot be completed successfully it has to abort. This is called transaction failure. Where only few transaction or process are hurt. Reason for transaction failure could be: Logical errors: where a transaction cannot complete because of it has some code error or any internal error condition System errors: where the database system itself terminates an active transaction because DBMS is not able to execute it or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability systems aborts an active transaction. System crash There are problems, which are external to the system, which may cause the system to stop abruptly and cause the system to crash. For example interruption in power supply, failure of underlying hardware or software failure. Examples may include operating system errors. Disk failure: In early days of technology evolution, it was a common problem where hard disk drives or storage drives used to fail frequently. Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any other failure, which destroys all or part of disk storage Storage Structure We have already described storage system here. In brief, the storage structure can be divided in various categories: Volatile storage: As name suggests, this storage does not survive system crashes and mostly placed very closed to CPU by embedding them onto the chipset itself for examples: main memory, cache memory. They are fast but can store a small amount of information. Nonvolatile storage: These memories are made to survive system crashes. They are huge in data storage capacity but slower in accessibility. Examples may include, hard disks, magnetic tapes, flash memory, non-volatile (battery backed up) RAM. Recovery and Atomicity When a system crashes, it many have several transactions being executed and various files opened for them to modifying data items. As we know that transactions are made of various operations, which are atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a whole must be maintained that is, either all operations are executed or none. When DBMS recovers from a crash it should maintain the following: It should check the states of all transactions, which were being executed. A transaction may be in the middle of some operation; DBMS must ensure the atomicity of transaction in this case. It should check whether the transaction can be completed now or needs to be rolled back. No transactions would be allowed to left DBMS in inconsistent state. There are two types of techniques, which can help DBMS in recovering as well as maintaining the atomicity of transaction: Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the database. Maintaining shadow paging, where are the changes are done on a volatile memory and later the actual database is updated. Log-Based Recovery Log is a sequence of records, which maintains the records of actions performed by a transaction. It is important that the logs are written prior to actual modification and stored on a stable storage media, which is failsafe. Log based recovery works as follows: The log file is kept on stable storage media When a transaction enters the system and starts execution, it writes a log about it <Tn, Start> When the transaction modifies an item X, it write logs as follows: <Tn, X, V1, V2> It reads Tn has changed the value of X, from V1 to V2. When transaction finishes, it logs: <Tn, commit> Database can be modified using two approaches: 1. Deferred database modification: All logs are written on to the stable storage and database is updated when transaction commits. 2. Immediate database modification: Each log follows an actual database modification. That is, database is modified immediately after every operation. Recovery with concurrent transactions When more than one transactions are being executed in parallel, the logs are interleaved. At the time of recovery it would become hard for recovery system to backtrack all logs, and then start recovering. To ease this situation most modern DBMS use the concept of 'checkpoints'. Checkpoint Keeping and maintaining logs in real time and in real environment may fill out all the memory space available in the system. At time passes log file may be too big to be handled at all. Checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently in storage disk. Checkpoint declares a point before which the DBMS was in consistent state and all the transactions were committed. Recovery When system with concurrent transaction crashes and recovers, it does behave in the following manner:
[Image: Recovery with concurrent transactions] The recovery system reads the logs backwards from the end to the last Checkpoint. It maintains two lists, undo-list and redo-list. If the recovery system sees a log with <T n , Start> and <T n , Commit> or just <T n , Commit>, it puts the transaction in redo-list. If the recovery system sees a log with <T n , Start> but no commit or abort log found, it puts the transaction in undo-list. All transactions in undo-list are then undone and their logs are removed. All transaction in redo-list, their previous logs are removed and then redone again and log saved.