DBMS
Notes
Prepared by: Dr. Subhendu Kumar Rath
1. Introduction
2. Disadvantages of file oriented approach
3. Database
4. Why Database
5. Database Management System(DBMS)
6. Function of DBMS
7. Advantages and disadvantages of DBMS
8. Database Basics
9. Three level architecture of DBMS
10. Database users
11. Database language
12. Database structure
Introduction:
What is data:
Data is the known facts or figures that have implicit meaning. It can also be defined as the representation of facts, concepts or instructions in a formal manner, suitable for understanding and processing. Data can be represented using alphabets (A-Z, a-z), digits (0-9) and special characters (+, -, #, $, etc.).
e.g.: 25, “ajit”, etc.
Information:
Information is the processed data on which decisions and actions are based. Information can be defined as organized and classified data that provides meaningful values.
File:
File is a collection of related data stored in secondary memory.
3) Data isolation:
Because data are scattered in various files, and the files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.
4) Integrity problems:
Developers enforce data validation in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them.
5) Atomicity:
Database:
A database is an organized collection of related data of an organization, stored in a formatted way, and shared by multiple users.
The main features of data in a database are:
Persistent:
Data is removed from the database only on an explicit request from the user; otherwise it persists.
Integrated:
A database can be a collection of data from different files; when the redundancy among those files is removed, the database is said to contain integrated data.
Sharing Data:
The data stored in the database can be shared by multiple users simultaneously without affecting the correctness of the data.
Why Database:
In order to overcome the limitations of a file system, a new approach was required; hence the database approach emerged. A database is a persistent collection of logically related data. The initial attempts were to provide a centralized collection of data. A database has a self-describing nature: it contains not only the data but also a description of its own structure, and it supports the sharing and integration of an organization's data in a single database.
A small database can be handled manually, but a large database with multiple users is difficult to maintain; in that case a computerized database is useful.
The advantages of a database system over traditional, paper-based methods of record keeping are:
Compactness: No need for large amounts of paper files.
Speed: The machine can retrieve and modify data far faster than a human being.
Less drudgery: Much of the maintenance of files by hand is eliminated.
Accuracy: Accurate, up-to-date information is fetched as per the requirement of the user at any time.
Function of DBMS:
1. Defining the database schema: The DBMS must provide facilities for defining the database structure and for specifying access rights of authorized users.
2. Manipulation of the database: The DBMS must provide functions such as insertion of records into the database, updation of data, deletion of data and retrieval of data.
3. Sharing of the database: The DBMS must share data items among multiple users while maintaining the consistency of the data.
4. Protection of the database: It must protect the database against unauthorized users.
5. Database recovery: If for any reason the system fails, the DBMS must facilitate database recovery.
Advantages of dbms:
Reduction of redundancies:
Centralized control of data by the DBA avoids unnecessary duplication of data and effectively reduces the total amount of data storage required. It also eliminates the inconsistencies that tend to be present in redundant data files.
Sharing of data:
A database allows the sharing of data under its control by any number of application
programs or users.
Data Integrity:
Data integrity means that the data contained in the database is both accurate and consistent. Therefore, data values being entered for storage can be checked to ensure that they fall within a specified range and are of the correct format.
Data Security:
The DBA, who has the ultimate responsibility for the data in the DBMS, can ensure that proper access procedures are followed, including proper authentication schemes for access to the DBMS and additional checks before permitting access to sensitive data.
Conflict resolution:
The DBA resolves the conflicting requirements of various users and applications. The DBA chooses the best file structure and access method to get optimal performance for the applications.
Data Independence:
Data independence is usually considered from two points of view: physical data independence and logical data independence.
Physical data independence allows changes in the physical storage devices or organization of the files to be made without requiring changes in the conceptual view or any of the external views, and hence without changes in the application programs using the database.
Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schemas or any application program.
Disadvantage of DBMS:
1. DBMS software and hardware (networking, installation) cost is high.
2. There is processing overhead in the DBMS for the implementation of security, integrity and sharing of the data.
3. Centralized database control.
4. Setting up the database system requires more knowledge, money, skills and time.
5. The complexity of the database may result in poor performance.
Database Basics:
Data item:
The data item is also called a field in data processing and is the smallest unit of data that has meaning to its users.
Eg: “e101”, “sumit”
An entity is a thing or object in the real world that is distinguishable from all other objects.
Eg: bank, employee, student
Attributes are properties of an entity.
Eg: empcode, ename, rollno, name
Logical data are the data for the table created by the user in primary memory.
Physical data refers to the data stored in secondary memory.
A subschema is a schema derived from an existing schema as per the user's requirements. There may be more than one subschema created for a single conceptual schema.
A database management system that provides three levels of data abstraction is said to follow a three-level architecture:
External level
Conceptual level
Internal level
[Figure: the external views are mapped to the single conceptual view, which is in turn mapped to the internal level; the mappings are supplied by the DBMS.]
External level :
The external level is the highest level of database abstraction. At this level, there will be many views defined for different users' requirements. A view describes only a subset of the database. Any number of user views may exist for a given global or conceptual schema. For example, each student has a different view of the time table: the view of a student of B.Tech (CSE) is different from the view of a student of B.Tech (ECE). Thus this level of abstraction is concerned with different categories of users.
Each external view is described by means of a schema called an external schema or subschema.
Conceptual level :
At this level of database abstraction, all the database entities and the relationships among them are included. One conceptual view represents the entire database, and this conceptual view is defined by the conceptual schema.
The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations and constraints.
It describes all the records and relationships included in the conceptual view. There is only one conceptual schema per database. It includes features that specify the checks needed to retain data consistency and integrity.
Internal level :
It is the lowest level of abstraction, closest to the physical storage method used. It indicates how the data will be stored and describes the data structures and access methods to be used by the database. The internal view is expressed by the internal schema.
The following aspects are considered at this level:
1. Storage allocation, e.g. B-tree, hashing
2. Access paths, e.g. specification of primary and secondary keys, indexes, etc.
3. Miscellaneous, e.g. data compression and encryption techniques, optimization of the internal structures.
Database users :
Naive users :
Users who need not be aware of the presence of the database system or any other system supporting their usage are considered naive users. A user of an automatic teller machine falls in this category.
Online users :
These are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program. These users are aware of the database system and also know the data manipulation language.
Application programmers :
Database Administrator:
A person who has central control over the system is called the database administrator (DBA). The functions of the DBA are:
1. Creation and modification of the conceptual schema definition
2. Implementation of the storage structure and access methods
3. Schema and physical organization modifications
4. Granting of authorization for data access
5. Integrity constraint specification
6. Executing immediate recovery procedures in case of failures
7. Ensuring the physical security of the database
Database language :
Elements of DBMS:
DML pre-compiler:
DDL compiler:
The DDL compiler converts the data definition statements into a set of tables. These tables contain information concerning the database and are in a form that can be used by other components of the DBMS.
File manager:
The file manager manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
Database manager:
A database manager is a program module which provides the interface between the low
level data stored in the database and the application programs and queries submitted to
the system.
The responsibilities of database manager are:
1. Interaction with the file manager: The data is stored on disk using the file system provided by the operating system. The database manager translates the different DML statements into low-level file-system commands, so the database manager is responsible for the actual storing, retrieving and updating of data in the database.
2. Integrity enforcement: The data values stored in the database must satisfy certain constraints (e.g. the age of a person can't be less than zero). These constraints are specified by the DBA. The database manager checks the constraints and, if they are satisfied, stores the data in the database.
3. Security enforcement: The database manager enforces security measures to protect the database from unauthorized users.
4. Backup and recovery: The database manager detects failures that occur due to different causes (like disk failure, power failure, deadlock, s/w error) and restores the database to its original state.
5. Concurrency control: When several users access the same database file simultaneously, there is a possibility of data inconsistency. It is the responsibility of the database manager to control such concurrent access so that the consistency of the database is preserved.
[Figure: overall DBMS structure – the DBMS passes requests to the database manager, which uses the file manager to access the data files and the data dictionary on disk.]
Q. List some main differences between a database management system and a file-processing system.
Answer: Some main differences between a database management system and a file-processing system are:
• Both systems contain a collection of data and a set of programs which access that
data. A database management system coordinates both the physical and the logical
access to the data, whereas a file-processing system coordinates only the physical
access.
• A database management system reduces the amount of data duplication by
ensuring that a physical piece of data is available to all programs authorized to
have access to it, where as data written by one program in a file-processing system
may not be readable by another program.
• A database management system is designed to allow flexible access to data (i.e.,
queries), whereas a file-processing system is designed to allow predetermined
access to data (i.e., compiled programs).
• A database management system is designed to coordinate multiple users accessing
the same data at the same time. A file-processing system is usually designed to
allow one or more programs to access different data files at the same time. In a
file-processing system, a file can be accessed by two programs concurrently only
if both programs have read-only access to the file.
Q. Explain the difference between physical and logical data independence.
Answer:
• Physical data independence is the ability to modify the physical scheme without
making it necessary to rewrite application programs. Such modifications include
changing from unblocked to blocked record storage, or from sequential to random
access files.
• Logical data independence is the ability to modify the conceptual scheme without
making it necessary to rewrite application programs. Such a modification might
be adding a field to a record; an application program’s view hides this change
from the program.
Q. List six major steps that you would take in setting up a database for a particular
enterprise.
Answer: Six major steps in setting up a database for a particular enterprise are:
• Define the high level requirements of the enterprise (this step generates a
document known as the system requirements specification.)
• Define a model containing all appropriate types of data and data
relationships.
• Define the integrity constraints on the data.
• Define the physical level.
• For each known problem to be solved on a regular basis (e.g., tasks to be
carried out by clerks or Web users) define a user interface to carry out the
task, and write the necessary application programs to implement the user
interface.
• Create/initialize the database.
EXERCISES:
CHAPTER-2
ER-MODEL
Data model:
The data model describes the structure of a database. It is a collection of conceptual tools for describing data, data relationships and consistency constraints. There are various types of data models, such as:
1. Object-based logical models
2. Record-based logical models
3. Physical models
The entity-relationship data model perceives the real world as consisting of basic objects, called entities, and relationships among these objects. It was developed to facilitate database design by allowing the specification of an enterprise schema, which represents the overall logical structure of a database.
Main features of ER-MODEL:
• The entity-relationship model is a high-level conceptual model.
• It allows us to describe the data involved in a real-world enterprise in terms of objects and their relationships.
• It is widely used to develop an initial design of a database.
• It provides a set of useful concepts that make it convenient for a developer to move from a basic set of information to a detailed and accurate description of information that can be easily implemented in a database system.
• It describes data as a collection of entities, relationships and attributes.
Basic concepts:
The E-R data model employs three basic notions : entity sets, relationship sets and
attributes.
Entity sets:
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For example, each person in an enterprise is an entity. An entity has a set of properties, and the values of some set of properties may uniquely identify an entity. BOOK is an entity and its properties (called attributes) are bookcode, booktitle, price, etc.
An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all persons who are customers at a given bank, for example, can be defined as the entity set customer.
Attributes:
Customer is an entity and its attributes are customerid, customername, custaddress, etc.
An attribute, as used in the E-R model, can be characterized by the following attribute types:
a) Simple and composite attributes:
Simple attributes are attributes which cannot be divided into sub-parts.
eg: customerid, empno
Composite attributes are attributes which can be divided into sub-parts.
eg: name, consisting of first name, middle name and last name;
address, consisting of city, pincode and state
b) Single-valued and multi-valued attributes:
An attribute having a single value for a particular entity is a single-valued attribute.
eg: empno, customerid, regdno, etc.
An attribute having more than one value is a multi-valued attribute.
eg: phone-no, dependent name, vehicle
c) Derived attributes:
The values of this type of attribute can be derived from the values of existing attributes.
eg: age, which can be derived from (currentdate - birthdate);
experience_in_year, which can be calculated as (currentdate - joindate)
Relationship sets:
A relationship is an association among several entities.
A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on n ≥ 2 entity sets. If E1, E2, …, En are entity sets, then a relationship set R is a subset of
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
where (e1, e2, …, en) is a relationship.
Consider the two entity sets customer and loan. We define the relationship set borrow to denote the association between customers and the bank loans that the customers have.
Mapping Cardinalities:
Mapping cardinalities or cardinality ratios, express the number of entities to which
another entity can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets, although they
can contribute to the description of relationship sets that involve more than two entity
sets.
For a binary relationship set R between entity sets A and B, the mapping cardinalities
must be one of the following:
One-to-one:
An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
One-to-many:
An entity in A is associated with any number of entities in B, but an entity in B is associated with at most one entity in A.
Many-to-one:
An entity in A is associated with at most one entity in B, while an entity in B is associated with any number of entities in A.
[Figure: Course — teaches — Faculty (1 : M)]
Many-to-many:
Entities in A and B are associated with any number of entities of each other.
[Figure: Customer — deposit — Account]
• The weak entity set must have total participation in the identifying relationship.
Example:
Consider the entity type dependent, related to the employee entity, which is used to keep track of the dependents of each employee. The attributes of dependent are: name, birthdate, sex and relationship. Each employee entity is said to own the dependent entities that are related to it. However, note that the ‘dependent’ entity does not exist on its own; it is dependent on the employee entity. In other words, if an employee leaves the organization, all dependents related to that employee cannot exist without the entity ‘employee’. Thus it is a weak entity.
Keys:
Super key:
A super key is a set of one or more attributes that, taken collectively, allow us to identify uniquely an entity in the entity set.
For example: customer-id, (cname, customer-id), (cname, telno)
Candidate key:
In a relation R, a candidate key for R is a subset of the set of attributes of R which has the following properties:
• Uniqueness: no two distinct tuples in R have the same values for the candidate key.
• Irreducibility: no proper subset of the candidate key has the uniqueness property.
Eg: (cname, telno)
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal means of identifying entities within an entity set. The remaining candidate keys, if any, are called alternate keys.
ER-DIAGRAM:
The overall logical structure of a database can be represented graphically using the ER model, with the help of an ER-diagram.
Symbols used in an ER-diagram:
[Figure: symbols for entity, weak entity, attribute, composite attribute and relationship, with 1 and M labels on the links indicating cardinality.]
Advanced ER-diagram:
[Figure: generalization/specialization hierarchy for the employee entity (empno, name, dob), connected through ‘is-a’ links to full-time, part-time, faculty, staff and teaching categories, each with its own attributes.]
The corresponding relation schemas are:
EMPLOYEE(empno, name, dob)
FULL_TIME_EMPLOYEE(empno, salary)
PART_TIME_EMPLOYEE(empno, type)
Faculty(empno, degree, intrest)
Staff(empno, hour-rate)
Teaching(empno, stipend)
Aggregation:
Aggregation is the process of compiling information on an object, thereby abstracting a higher-level object. In this manner, the entity person is derived by aggregating the characteristics name, address and ssn. Another form of aggregation is abstracting a relationship between objects and viewing the relationship as a higher-level object.
[Figure: aggregation example – the works-on relationship among Employee, Job and Branch is treated as a higher-level entity that participates in the manages relationship with Manager.]
[Figure: ER diagrams used for the reduction-to-tables examples – Student related to Course through the opts relationship, and a department entity (dname) related to another entity (name, sal, address) through a head-of relationship carrying the attribute Date.]
2. For each weak entity type W in the ER diagram, we create another relation R that contains all the simple attributes of W. If E is the owner entity of W, then the key attribute of E is also included in R and is set as a foreign key attribute of R. The combination of the primary key attribute of the owner entity type and the partial key of the weak entity type forms the key of the weak entity type.
DEPARTMENT(D_NO, D_NAME, HEAD_ID, DATE_FROM)
• One-to-many relationship:
For each 1:N relationship type R involving two entities E1 and E2, we identify the entity type (say E1) at the N-side of the relationship type R and include the primary key of the entity on the other side of the relationship (say E2) as a foreign key attribute in the table of E1. We also include all simple attributes (or simple components of composite attributes) of R, if any, in the table of E1.
For example:
Consider the works-in relationship between DEPARTMENT and FACULTY. For this relationship we choose the entity at the N side, i.e. FACULTY, and add the primary key attribute of the other entity DEPARTMENT, i.e. DNO, as a foreign key attribute in FACULTY.
• Many-to-many relationship:
For each M:N relationship type R, we create a new table (say S) to represent R. We also include the primary key attributes of both participating entity types as foreign key attributes in S. Any simple attributes of the M:N relationship type (or simple components of a composite attribute) are also included as attributes of S.
For example:
The M:N relationship taught-by between the entities COURSE and FACULTY should be represented as a new table. The structure of the table will include the primary key of COURSE and the primary key of FACULTY.
• N-ary relationship:
For each n-ary relationship type R, where n > 2, we create a new table S to represent R. We include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. We also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing the participating entity types.
[Figure: ternary relationship loan-sanction among Customer, Loan and Employee.]
LOAN-SANCTION(customer-id, loanno, empno, sancdate, loan_amount)
• Multi-valued attributes:
For each multivalued attribute ‘A’, we create a new relation R that includes an attribute corresponding to A, plus the primary key attribute K of the relation that represents the entity type or relationship type that has A as an attribute. The primary key of R is then the combination of A and K.
For example, if a STUDENT entity has rollno, name and phone number, where phone number is a multivalued attribute, then we will create a table PHONE(rollno, phoneno) whose primary key is the combination of both attributes. The STUDENT table need not store the phone number; it can simply be (rollno, name). A sketch of the corresponding table definitions follows.
PHONE(rollno, phoneno)
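As a minimal, hedged sketch (not part of the original notes), the STUDENT/PHONE mapping above could be created with Python's built-in sqlite3 module; the table and column names simply follow the example and are otherwise illustrative assumptions.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when this pragma is on

    # Base entity table: the multivalued attribute phoneno is NOT stored here.
    conn.execute("CREATE TABLE STUDENT (rollno INTEGER PRIMARY KEY, name TEXT)")

    # Separate relation for the multivalued attribute; the primary key is the
    # combination (rollno, phoneno) and rollno references STUDENT.
    conn.execute("""
        CREATE TABLE PHONE (
            rollno  INTEGER,
            phoneno TEXT,
            PRIMARY KEY (rollno, phoneno),
            FOREIGN KEY (rollno) REFERENCES STUDENT(rollno)
        )
    """)

    conn.execute("INSERT INTO STUDENT VALUES (101, 'Sujit')")
    conn.executemany("INSERT INTO PHONE VALUES (?, ?)",
                     [(101, '9437000001'), (101, '9437000002')])
    print(conn.execute("SELECT * FROM PHONE").fetchall())   # two phone rows for one student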
[Figure: generalization/specialization – an Account entity (account_no, branch, name) is specialized, via an is-a relationship, into Saving (with attribute intrest) and Current (with attribute charges).]
Hierarchical Model:
Single Relationships:
Example E-R diagram with two entity sets, customer and account, related through
a binary, one-to-many relationship depositor.
Corresponding tree-structure diagram has
o the record type customer with three fields: customer-name, customer-
street, and customer-city.
o the record type account with two fields: account-number and balance
o the link depositor, with an arrow pointing to customer
If the relationship depositor is one to one, then the link depositor has two arrows.
Must consider the type of queries expected and the degree to which the database
schema fits the given E-R diagram.
In all versions of this transformation, the underlying database tree (or trees) will
have replicated records.
Create two tree-structure diagrams, T1, with the root customer, and T2, with
the root account.
In T1, create depositor, a many-to-one link from account to customer.
In T2, create account-customer, a many-to-one link from customer to account.
Virtual Records:
For many-to-many relationships, record replication is necessary to preserve the
tree-structure organization of the database.
o Data inconsistency may result when updating takes place
o Waste of space is unavoidable
Virtual record — contains no data value, only a logical pointer to a particular
physical record.
When a record is to be replicated in several database trees, a single copy of that
record is kept in one of the trees and all other records are replaced with a virtual
record.
Let R be a record type that is replicated in T1, T2, . . ., Tn. Create a new virtual
record type virtual-R and replace R in each of the n – 1 trees with a record of type
virtual-R.
Eliminate data replication in the diagram shown on page B.11; create virtual-
customer and virtual-account.
Replace account with virtual-account in the first tree, and replace customer with
virtual-customer in the second tree.
Add a dashed line from virtual-customer to customer, and from virtual-account to
account, to specify the association between a virtual record and its corresponding
physical record.
Network Model:
Data are represented by collections of records.
o similar to an entity in the E-R model
o Records and their fields are represented as record type
type customer = record
    customer-name: string;
    customer-street: string;
    customer-city: string;
end

type account = record
    account-number: integer;
    balance: integer;
end
Relationships among data are represented by links
o similar to a restricted (binary) form of an E-R relationship
o restrictions on links depend on whether the relationship is many-many,
many-to-one, or one-to-one.
Data-Structure Diagrams:
Schema representing the design of a network database.
A data-structure diagram consists of two basic components:
o Boxes, which correspond to record types.
o Lines, which correspond to links.
Specifies the overall logical structure of the database.
Since a link cannot contain any data value, represent an E-R relationship with
attributes with a new record type and links.
1. Replace entity sets account, customer, and branch with record types account,
customer, and branch, respectively.
2. Create a new record type Rlink (referred to as a dummy record type).
3. Create the following many-to-one links:
o CustRlink from Rlink record type to customer record type
o AcctRlnk from Rlink record type to account record type
o BrncRlnk from Rlink record type to branch record type
DBTG Sets:
o The structure consisting of two record types that are linked together is referred
to in the DBTG model as a DBTG set
o In each DBTG set, one record type is designated as the owner, and the other is
designated as the member, of the set.
o Each DBTG set can have any number of set occurrences (actual instances of
linked records).
o Since many-to-many links are disallowed, each set occurrence has precisely
one owner, and has zero or more member records.
o No member record of a set can participate in more than one occurrence of the
set at any point.
o A member record can participate simultaneously in several set occurrences of
different DBTG sets.
RELATIONAL MODEL
The relational model is a simple model in which a database is represented as a collection of “relations”, where each relation is represented by a two-dimensional table. The relational model was proposed by E. F. Codd of IBM in 1970. The basic concept in the relational model is that of a relation.
Properties:
o It is column homogeneous. In other words, in any given column of a table, all
items are of the same kind.
o Each item is a simple number or a character string; that is, a table must be in first normal form.
o All rows of a table are distinct.
o The ordering of rows within a table is immaterial.
o The columns of a table are assigned distinct names, and the ordering of these columns is immaterial.
Relational schema:
A relational schema specifies the relation's name, its attributes and the domain of each attribute. If R is the name of a relation and A1, A2, …, An is a list of attributes representing R, then R(A1, A2, …, An) is called a relational schema. Each attribute Ai in this relational schema takes values from some specific domain, called domain(Ai).
Example:
PERSON(PERSON_ID: INTEGER, NAME: STRING, AGE: INTEGER, ADDRESS: STRING)
The total number of attributes in a relation denotes the degree of the relation. Since the PERSON relation schema contains four attributes, this relation is of degree 4.
Relation Instance:
A relation instance, denoted r, is a collection of tuples for a given relational schema at a specific point of time.
A relation state r of the relation schema R(A1, A2, …, An), also denoted r(R), is a set of n-tuples
r = {t1, t2, …, tm}
where each n-tuple t is an ordered list of n values
t = <v1, v2, …, vn>
and each vi belongs to domain(Ai) or is a null value.
The relation schema is also called the ‘intension’ and the relation state is also called the ‘extension’.
Eg:
Relation instance:
Student:
Rollno Name City Age
101 Sujit Bam 23
102 kunal bbsr 22
Keys:
Super key:
A super key is an attribute or a set of attributes used to identify the records uniquely in
a relation.
Eg: (cname,telno)
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal means of identifying entities within an entity set. The remaining candidate keys, if any, are called alternate keys.
RELATIONAL CONSTRAINTS:
There are three types of constraints on relational database that include
o DOMAIN CONSTRAINTS
o KEY CONSTRAINTS
o INTEGRITY CONSTRAINTS
DOMAIN CONSTRAINTS:
It specifies that each attribute in a relation takes an atomic value from the corresponding domain. The data types associated with commercial RDBMS domains include standard types such as integers, real numbers, fixed- and variable-length character strings, and dates/times.
Key constraints:
This constraint states that the key attribute value in each tuple must be unique, i.e. no two tuples contain the same value for the key attribute (null values may be allowed).
Integrity constraints:
Referential integrity requires that a foreign key value must either match a primary key value in the referenced relation or be null. For example:
Department(deptcode, dname)
Here deptcode is the primary key.
Emp(empcode, name, city, deptcode)
Here deptcode is a foreign key referencing Department.
CODD'S RULES
To access any data item you specify the column and the table in which it exists; there is no reading of characters 10 to 20 of a 255-byte string.
Rule 3 : Systematic treatment of null values.
"Null values (distinct from the empty character string or a string of blank characters and
distinct from zero or any other number) are supported in fully relational DBMS for
representing missing information and inapplicable information in a systematic way,
independent of data type."
If data does not exist or does not apply then a value of NULL is applied, this is
understood by the RDBMS as meaning non-applicable data.
Rule 4 : Dynamic on-line catalog based on the relational model.
"The data base description is represented at the logical level in the same way as-ordinary
data, so that authorized users can apply the same relational language to its interrogation as
they apply to the regular data."
The Data Dictionary is held within the RDBMS, so there is no need for off-line volumes to tell you the structure of the database.
Rule 5 : Comprehensive data sub-language Rule.
"A relational system may support several languages and various modes of terminal use
(for example, the fill-in-the-blanks mode). However, there must be at least one language
whose statements are expressible, per some well-defined syntax, as character strings and
that is comprehensive in supporting all the following items
• Data Definition
• View Definition
• Integrity Constraints
• Authorization.
Every RDBMS should provide a language to allow the user to query the contents of the
RDBMS and also manipulate the contents of the RDBMS.
Rule 6 : View updating Rule.
"All views that are theoretically updateable are also updateable by the system."
Data should be modifiable not only through the base tables but also through any view that is theoretically updatable.
Rule 7 : High-level insert, update and delete.
"The capability of handling a base relation or a derived relation as a single operand
applies not only to the retrieval of data but also to the insertion, update and deletion of
data."
Insert, update and delete operations should work on sets of rows (on a relation or view as a whole), not only on a single record at a time.
Rule 8 : Physical data independence.
"Application programs and terminal activities remain logically unimpaired whenever any
changes are made in either storage representations or access methods."
The user should not be aware of where or upon which media data-files are stored
Rule 9 : Logical data independence.
"Application programs and terminal activities remain logically unimpaired when
information-preserving changes of any kind that theoretically permit un-impairment are
made to the base tables."
User programs and the user should not be aware of any changes to the structure of the
tables (such as the addition of extra columns).
Rule 10 : Integrity independence.
"Integrity constraints specific to a particular relational data base must be definable in the
relational data sub-language and storable in the catalog, not in the application programs."
If a column only accepts certain values, then it is the RDBMS that enforces these constraints, not the user program. This means that an invalid value can never be entered into the column, whereas if the constraints were enforced via programs there is always a chance that a buggy program might allow incorrect values into the system.
Rule 11 : Distribution independence.
"A relational DBMS has distribution independence."
The RDBMS may be spread across more than one system and across several networks; however, to the end user the tables should appear no different from those that are local.
Rule 12 : Non-subversion Rule.
"If a relational system has a low-level (single-record-at-a-time) language, that low level
cannot be used to subvert or bypass the integrity Rules and constraints expressed in the
higher level relational language (multiple-records-at-a-time)."
RELATIONAL ALGEBRA:
Relational algebra is a set of basic operations used to manipulate the data in the relational model. These operations enable the user to specify basic retrieval requests. The result of a retrieval is a new relation, formed from one or more relations. These operations can be classified into two categories.
Basic Set Operation
Union
Intersection
Set difference
Cartesian product
Relational operations
Select
Project
Join
Division
Union Operation – Example
Relation r:      Relation s:
A  B             A  B
α  1             α  2
α  2             β  3
β  1

r ∪ s:
A  B
α  1
α  2
β  1
β  3
UNION OPERATION:
Notation: r ∪ s
Defined as:
r ∪ s = {t | t ∈ r or t ∈ s}
For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes).
2. The attribute domains must be compatible (e.g., the 2nd column of r deals with the same type of values as does the 2nd column of s).
E.g. to find all customers with either an account or a loan ∏customer-name
(depositor) ∪ ∏customer-name (borrower)
Set Difference Operation – Example
Relation r:      Relation s:
A  B             A  B
α  1             α  2
α  2             β  3
β  1

r – s:
A  B
α  1
β  1
Notation r – s
Defined as:
r – s = {t | t ∈ r and t ∉ s}
Set differences must be taken between compatible relations.
o r and s must have the same arity
o attribute domains of r and s must be compatible
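A minimal Python sketch (not part of the original notes) of the two set operations just defined: relations are modelled as sets of tuples, so union and set difference behave exactly as in the definitions above. The relation contents mirror the small r and s examples used in this section.

    # relations as sets of tuples (Greek letters spelled out for readability)
    r = {("alpha", 1), ("alpha", 2), ("beta", 1)}
    s = {("alpha", 2), ("beta", 3)}

    def check_union_compatible(r, s):
        # both relations must have the same arity (same number of attributes)
        arities = {len(t) for t in r | s}
        assert len(arities) == 1, "relations are not union-compatible"

    check_union_compatible(r, s)
    print(r | s)   # union: {('alpha', 1), ('alpha', 2), ('beta', 1), ('beta', 3)}
    print(r - s)   # set difference: {('alpha', 1), ('beta', 1)}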
Cartesian-Product Operation – Example
Relation r:      Relation s:
A  B             C  D   E
α  1             α  10  a
β  2             β  10  a
                 β  20  b
                 γ  10  b

r x s:
A  B  C  D   E
α  1  α  10  a
α  1  β  10  a
α  1  β  20  b
α  1  γ  10  b
β  2  α  10  a
β  2  β  10  a
β  2  β  20  b
β  2  γ  10  b

Notation: r x s
Defined as:
r x s = {t q | t ∈ r and q ∈ s}
Assume that the attributes of r(R) and s(S) are disjoint (that is, R ∩ S = ∅).
If the attributes of r(R) and s(S) are not disjoint, then renaming must be used.

Project Operation – Example
Relation r:
A  B   C
α  10  1
α  20  1
β  30  1
β  40  2

∏A,C (r):
A  C          A  C
α  1          α  1
α  1    =>    β  1
β  1          β  2
β  2

Notation:
∏A1, A2, …, Ak (r)
where A1, A2, …, Ak are attribute names and r is a relation name.
The result is defined as the relation of k columns obtained by erasing the columns that are not listed.
Duplicate rows are removed from the result, since relations are sets.
E.g. to eliminate the branch-name attribute of account:
∏account-number, balance (account)
Select Operation – Example
Relation r:
A  B  C   D
α  α  1   7
α  β  5   7
β  β  12  3
β  β  23  10

σA=B ∧ D>5 (r):
A  B  C   D
α  α  1   7
β  β  23  10
Notation: σp(r)
p is called the selection predicate.
Defined as:
σp(r) = {t | t ∈ r and p(t)}
where p is a formula in propositional calculus consisting of terms connected by ∧ (and), ∨ (or), ¬ (not). Each term is one of:
<attribute> op <attribute> or <constant>
where op is one of: =, ≠, >, ≥, <, ≤
Example of selection:
Q: Display the account details belonging to the branch “Perryridge”.
σbranch-name = “Perryridge” (account)
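A minimal Python sketch (not part of the original notes) of the select, project and Cartesian-product operations, with relations modelled as lists of dictionaries (one dictionary per tuple). The relation r mirrors the project example above; the helper names are illustrative.

    r = [{"A": "alpha", "B": 10, "C": 1},
         {"A": "alpha", "B": 20, "C": 1},
         {"A": "beta",  "B": 30, "C": 1},
         {"A": "beta",  "B": 40, "C": 2}]

    def select(rel, predicate):
        # sigma_p(rel): keep only the tuples that satisfy the predicate p
        return [t for t in rel if predicate(t)]

    def project(rel, attrs):
        # pi_attrs(rel): keep only the listed columns and drop duplicate rows
        result = []
        for t in rel:
            row = {a: t[a] for a in attrs}
            if row not in result:
                result.append(row)
        return result

    def product(rel1, rel2):
        # rel1 x rel2: concatenate every tuple of rel1 with every tuple of rel2
        # (attribute names are assumed to be disjoint, as in the definition above)
        return [{**t1, **t2} for t1 in rel1 for t2 in rel2]

    print(select(r, lambda t: t["C"] == 1))          # the three tuples with C = 1
    print(project(r, ["A", "C"]))                    # [{'A': 'alpha', 'C': 1}, {'A': 'beta', 'C': 1}, {'A': 'beta', 'C': 2}]
    print(product(r[:1], [{"D": "a"}, {"D": "b"}]))  # 1 x 2 = 2 combined tuples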
Rename Operation
Allows us to name, and therefore to refer to, the results of relational-
algebra expressions.
Allows us to refer to a relation by more than one name.
Example:
ρ x (E)
returns the expression E under the name X
If a relational-algebra expression E has arity n, then
ρx (A1, A2, …, An) (E)
returns the result of expression E under the name X, and with the
attributes renamed to A1, A2, …., An.
Set-Intersection Operation
Notation: r ∩ s
Defined as:
r ∩ s ={ t | t ∈ r and t ∈ s }
Assume:
r, s have the same arity
attributes of r and s are compatible
Note: r ∩ s = r - (r - s)
Set-Intersection Operation – Example
Relation r:      Relation s:
A  B             A  B
α  1             α  2
α  2             β  3
β  1

r ∩ s:
A  B
α  2
Natural-Join Operation
Notation: r ⋈ s
Let r and s be relations on schemas R and S respectively. Then r ⋈ s is a relation on schema R ∪ S obtained as follows:
Consider each pair of tuples tr from r and ts from s.
If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where t has the same value as tr on R and the same value as ts on S.
Relation r:          Relation s:
A  B  C  D           B  D  E
α  1  α  a           1  a  α
β  2  γ  a           3  a  β
γ  4  β  b           1  a  γ
α  1  γ  a           2  b  δ
δ  2  β  b           3  b  ∈

r ⋈ s:
A  B  C  D  E
α  1  α  a  α
α  1  α  a  γ
α  1  γ  a  α
α  1  γ  a  γ
δ  2  β  b  δ
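A minimal Python sketch (not part of the original notes) of the natural join: two tuples are combined whenever they agree on all attributes that the two schemas share. The small relations reuse a couple of rows from the example above.

    r = [{"A": "alpha", "B": 1, "C": "alpha", "D": "a"},
         {"A": "beta",  "B": 2, "C": "gamma", "D": "a"}]
    s = [{"B": 1, "D": "a", "E": "alpha"},
         {"B": 3, "D": "a", "E": "beta"}]

    def natural_join(rel1, rel2):
        common = set(rel1[0]) & set(rel2[0])            # attributes in R ∩ S
        result = []
        for t1 in rel1:
            for t2 in rel2:
                if all(t1[a] == t2[a] for a in common):
                    result.append({**t1, **t2})         # t agrees with t1 on R and with t2 on S
        return result

    print(natural_join(r, s))
    # [{'A': 'alpha', 'B': 1, 'C': 'alpha', 'D': 'a', 'E': 'alpha'}]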
Division Operation
r÷s
Suited to queries that include the phrase “for all”.
Let r and s be relations on schemas R and S
respectively where
R = (A1, …, Am, B1, …, Bn)
S = (B1, …, Bn)
The result of r ÷ s is a relation on schema
R – S = (A1, …, Am)
r ÷ s = { t | t ∈ ∏ R-S(r) ∧ ∀ u ∈ s ( tu ∈ r ) }
Relation r:    Relation s:
A  B           B
α  1           1
α  2           2
α  3
β  1
γ  1
δ  1
δ  3
δ  4
∈  6
∈  1
β  2

r ÷ s:
A
α
β
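A minimal Python sketch (not part of the original notes) of the division operation applied to the example above: each A-value is kept only if it is paired in r with every B-value that appears in s.

    r = {("alpha", 1), ("alpha", 2), ("alpha", 3), ("beta", 1), ("beta", 2),
         ("gamma", 1), ("delta", 1), ("delta", 3), ("delta", 4),
         ("epsilon", 6), ("epsilon", 1)}
    s = {(1,), (2,)}

    def divide(r, s):
        # keep an A-value only if (A, b) is in r for every b appearing in s
        a_values = {a for (a, _) in r}
        return {a for a in a_values if all((a, b) in r for (b,) in s)}

    print(divide(r, s))   # {'alpha', 'beta'}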
Example Queries
Find all customers who have an account at least at the “Downtown” and the “Uptown” branches.
Query 1
∏CN (σBN = “Downtown” (depositor ⋈ account)) ∩ ∏CN (σBN = “Uptown” (depositor ⋈ account))
Query 2
∏customer-name, branch-name (depositor ⋈ account) ÷ ρtemp(branch-name) ({(“Downtown”), (“Uptown”)})
Example Queries
Find all customers who have an account at all branches located in Brooklyn city:
∏customer-name, branch-name (depositor ⋈ account) ÷ ∏branch-name (σbranch-city = “Brooklyn” (branch))
Example Queries
Find the loan number for each loan with an amount greater than $1200:
∏loan-number (σamount > 1200 (loan))
Find the names of all customers who have a loan at the Perryridge
branch.
∏customer-name (σbranch-name=“Perryridge”
(σborrower.loan-number = loan.loan-number(borrower x loan)))
Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank:
∏customer-name (σbranch-name = “Perryridge” (borrower ⋈ loan)) – ∏customer-name (depositor)
Example Queries
Find the names of all customers who have a loan at the Perryridge branch.
Query 1:
∏customer-name (σbranch-name = “Perryridge” (σborrower.loan-number = loan.loan-number (borrower x loan)))
Query 2:
∏customer-name (σloan.loan-number = borrower.loan-number ((σbranch-name = “Perryridge” (loan)) x borrower))
Example Queries
Find the largest account balance.
Rename the account relation as d. The query is:
∏balance (account) – ∏account.balance (σaccount.balance < d.balance (account x ρd (account)))
Aggregate functions:
Relation r:
A  B  C
α  α  7
α  β  7
β  β  3
β  β  10

g sum(C) (r):
sum-C
27
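A minimal Python sketch (not part of the original notes) reproducing the aggregate above: summing column C over the whole relation, and, for comparison, summing C grouped by column A.

    r = [("alpha", "alpha", 7), ("alpha", "beta", 7),
         ("beta",  "beta",  3), ("beta",  "beta", 10)]

    print(sum(c for (_, _, c) in r))          # g sum(C)(r)  ->  27

    groups = {}
    for (a, _, c) in r:
        groups[a] = groups.get(a, 0) + c      # A g sum(C)(r): group by A, then sum C
    print(groups)                             # {'alpha': 14, 'beta': 13}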
OUTER JOIN:
The outer join operation is an extension of the join operation to deal with missing
information.
There are three forms of outer join
left outer join
right outer join
full outer join
employee:
Empname Street City
Coyote Toon Hollywood
Rabbit Tunnel carrot
Smith Revolver Death valley
William Seaview Seattle
Ft_works:
Empname Branch name Salary
Coyote Mesa 1500
Rabbit Mesa 1300
Gates Redmond 5300
William Redmond 1500
[The result tables for joining employee with ft_works appeared here.]
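A minimal Python sketch (not part of the original notes) of the left outer join of employee with ft_works on the tables above: every employee tuple is kept, and the ft_works attributes of unmatched employees are padded with None (null).

    employee = [("Coyote", "Toon", "Hollywood"),
                ("Rabbit", "Tunnel", "carrot"),
                ("Smith", "Revolver", "Death valley"),
                ("William", "Seaview", "Seattle")]
    ft_works = [("Coyote", "Mesa", 1500), ("Rabbit", "Mesa", 1300),
                ("Gates", "Redmond", 5300), ("William", "Redmond", 1500)]

    def left_outer_join(emp, works):
        result = []
        for (name, street, city) in emp:
            matches = [w for w in works if w[0] == name]          # join on empname
            if matches:
                for (_, branch, salary) in matches:
                    result.append((name, street, city, branch, salary))
            else:
                result.append((name, street, city, None, None))   # unmatched: nulls
        return result

    for row in left_outer_join(employee, ft_works):
        print(row)
    # Smith appears with (None, None); Gates, who is only in ft_works, is dropped.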
The tuple relational calculus is a non procedural query language. It describes the
desired information with out giving a specific procedure for obtaining that
information.
A query in the tuple relational calculus is expressed as:
{t|P(t)}
where P is a formula. Several tuple variables may appear in a formula. A tuple
variable is said to be a free variable unless it is quantified by a ∃ or ∀.
Example Queries
Find the names of all customers having a loan, an account, or both at the bank:
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]) ∨ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
Find the names of all customers having a loan at the Perryridge branch:
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name] ∧ ∃ u ∈ loan (u[branch-name] = “Perryridge” ∧ u[loan-number] = s[loan-number]))}
Find the names of all customers who have a loan and an account at the bank:
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]) ∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
Safety of Expressions
It is possible to write tuple calculus expressions that generate infinite relations.
Delete all loans with a loan amount between $1300 and $1500. For consistency, we have to delete information from both the loan and borrower tables.
Database design is a process in which you create a logical data model for a database, which stores the data of a company. It is performed after the initial database study phase in the database life cycle. You use the normalization technique to create the logical data model for a database and eliminate data redundancy. Normalization also allows you to organize data efficiently in a database and reduce anomalies during data operations. Various normal forms, such as first, second and third, can be applied to create a logical data model for a database. The second and third normal forms are based on partial dependency and transitive dependency. A partial dependency occurs when a row of a table is uniquely identified by one column that is a part of a primary key. A transitive dependency occurs when a non-key column is uniquely identified by values in another non-key column of a table.
FUNCTIONAL DEPENDENCIES:
The functional dependency X → Y
holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X], it is also the case that t1[Y] = t2[Y].
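A minimal Python sketch (not part of the original notes) that applies this definition to one relation instance: it compares every pair of tuples and reports whether X → Y is violated. Note that such a check on a single instance can only refute an FD; it cannot prove that the FD holds on the schema.

    from itertools import combinations

    def fd_holds(r, X, Y):
        # r: list of dicts (one per tuple); X, Y: lists of attribute names
        for t1, t2 in combinations(r, 2):
            if all(t1[a] == t2[a] for a in X) and not all(t1[a] == t2[a] for a in Y):
                return False      # same X values but different Y values: FD violated
        return True

    student = [{"rollno": 101, "name": "Sujit", "city": "Bam"},
               {"rollno": 102, "name": "kunal", "city": "bbsr"}]
    print(fd_holds(student, ["rollno"], ["name"]))    # True
    print(fd_holds(student + [{"rollno": 101, "name": "Ajit", "city": "Bam"}],
                   ["rollno"], ["name"]))             # False: same rollno, different name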
The closure of F, denoted by F+, is the set of all functional dependencies logically
implied by F.
The closure of F can be found by using a collection of rules called Armstrong's axioms.
Reflexivity rule: If A is a set of attributes and B is a subset of A, then A→B holds.
Augmentation rule: If A→B holds and C is a set of attributes, then CA→CB holds.
Transitivity rule: If A→B holds and B→C holds, then A→C holds.
Union rule: If A→B holds and A→C holds, then A→BC holds.
Decomposition rule: If A→BC holds, then A→B holds and A→C holds.
Pseudotransitivity rule: If A→B holds and BC→D holds, then AC→D holds.
Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of functional dependencies
A→B, A→C, CG→H, CG→I, B→H
We list several members of F+ here:
A→H: since A→B and B→H hold, we apply the transitivity rule.
CG→HI: since CG→H and CG→I hold, the union rule implies that CG→HI holds.
AG→I: since A→C and CG→I hold, the pseudotransitivity rule implies that AG→I holds.
Algorithm to compute F+:
To compute the closure of a set of functional dependencies F:
F+ = F
repeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f
add the resulting functional dependencies to F+
for each pair of functional dependencies f1and f2 in F+
if f1 and f2 can be combined using transitivity
then add the resulting functional dependency to F+
until F+ does not change any further
To test whether a set of attributes α is a super key, we must devise an algorithm for computing the set of attributes functionally determined by α. One way of doing this is to compute F+ and take all functional dependencies with α as the left-hand side. However, doing so can be expensive, since F+ can be large. A sketch of the usual attribute-closure algorithm follows.
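A minimal Python sketch (not part of the original notes) of the standard attribute-closure algorithm, applied to the example FD set above to show that AG is a super key of R.

    def closure(alpha, fds):
        # alpha: set of attributes; fds: list of (lhs, rhs) pairs, each side a set
        result = set(alpha)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs          # lhs is already determined, so add rhs
                    changed = True
        return result

    R = {"A", "B", "C", "G", "H", "I"}
    F = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C", "G"}, {"H"}),
         ({"C", "G"}, {"I"}), ({"B"}, {"H"})]

    print(closure({"A", "G"}, F))          # {'A', 'B', 'C', 'G', 'H', 'I'}
    print(closure({"A", "G"}, F) == R)     # True, so AG is a super key of R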
Canonical Cover
Extraneous Attributes
Consider a set F of functional dependencies and the functional dependency
α → β in F.
Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α →
β}) ∪ {(α – A) → β}.
Attribute A is extraneous in β if A ∈ β and the set of functional
dependencies (F – {α → β}) ∪ {α →(β – A)} logically implies F.
Note: implication in the opposite direction is trivial in each of the cases
above, since a “stronger” functional dependency always implies a weaker
one
Example: Given F = {A → C, AB → C }
B is extraneous in AB → C because {A → C, AB → C} logically implies A →
C (I.e. the result of dropping B from AB → C).
Example: Given F = {A → C, AB → CD}
C is extraneous in AB → CD since AB → C can be inferred even after
deleting C
Canonical Cover
DEPENDENCY PRESERVATION:
Example:
Let R(A,B,C) and F = {A→B}. Then the decomposition of R into R1(A,B) and R2(A,C) is lossless, because the FD {A→B} is contained in R1 and the common attribute A is a key of R1.
Example:
Let R(A,B,C) and F = {A→B}. Then the decomposition of R into R1(A,B) and R2(B,C) is not lossless, because the common attribute B does not functionally determine either A or C, i.e. it is not a key of R1 or R2.
Example:
Let R(A,B,C,D) and F = {A→B, A→C, C→D}. Then consider the decomposition of R into R1(A,B,C) with the FDs F1 = {A→B, A→C} and R2(C,D) with the FDs F2 = {C→D}. In this decomposition all the original FDs can be logically derived from F1 and F2, hence the decomposition is dependency preserving as well. The common attribute C forms a key of R2, so the decomposition is also lossless.
Example:
Let R(A,B,C,D) and F = {A→B, A→C, A→D}. Then the decomposition of R into R1(A,B,D) with the FDs F1 = {A→B, A→D} and R2(B,C) with FDs F2 = {} is lossy, because the common attribute B is not a candidate key of either R1 or R2. In addition, the FD A→C is not implied by any FD of R1 or R2; thus the decomposition is not dependency preserving.
Partial dependency:
Given a relation R, a set of functional dependencies F defined on the attributes of R, and a candidate key K: if X is a proper subset of K and if F |= X→A, then A is said to be partially dependent on K.
NORMALIZATION
The idea of normalizing relations to higher and higher normal forms is to attain the goals
of having a set of ideal relations meeting the above criteria.
Unnormalized relation:
Each row may contain multiple sets of values for some of the columns; these multiple values in a single row are also called non-atomic values.
orderno → orderdate
Defn: A relation scheme R<S,F> is in second normal form (2NF) if it is in 1NF and all non-prime attributes are fully functionally dependent on the relation keys.
A third normal form normalization will be needed where all attributes in a relation tuple are not functionally dependent only on the key attribute. If two non-key attributes are functionally dependent, then there will be unnecessary duplication of data. Consider the relation given in Table 4. Here, Roll no. is the key and all other attributes are functionally dependent on it; thus it is in 2NF. If it is known that in the college all first-year students are accommodated in Ganga hostel, all second-year students in Kaveri, all third-year students in Krishna, and all fourth-year students in Godavari, then the non-key attribute Hostel name is dependent on the non-key attribute Year. This dependency is shown in Figure 6.
Observe that given the year of a student, his hostel is known, and vice versa. The dependency of hostel on year leads to duplication of data, as is evident from Table 4. If it is decided to ask all first-year students to move to Kaveri hostel and all second-year students to Ganga hostel, this change should be made in many places in Table 4. Also, when a student's year of study changes, his hostel change should also be noted in Table 4. This is undesirable. A table is said to be in 3NF if it is in 2NF and no non-key attribute is functionally dependent on any other non-key attribute. Table 4 is thus not in 3NF. To transform it to 3NF, we should introduce another relation which includes the functionally related non-key attributes. This is shown in Table 5.
It is assumed that
1. A professor can work in more than one department.
2. The percentage of time he spends in each department is given.
3. Each department has only one Head of Department.
The relationship diagram for the above relation is given in Figure 8. Table 6 gives the relation attributes. The two possible composite keys are (Professor code, Dept.) and (Professor code, Head of Dept.). Observe that Dept. as well as Head of Dept. are not non-key attributes; they are part of a composite key.
MULTIVALUED DEPENDENCY:
Defn: Given a relation scheme R, let X and Y be subsets of the attributes of R. Then the multivalued dependency X →→ Y holds in a relation r defined on R if, given two tuples t1 and t2 in r with t1[X] = t2[X], r also contains two tuples t3 and t4 with the following characteristics:
t1, t2, t3 and t4 have the same X value, i.e. t1[X] = t2[X] = t3[X] = t4[X];
the Y values of t1 and t3 are the same and the Y values of t2 and t4 are the same, i.e. t1[Y] = t3[Y] and t2[Y] = t4[Y].
Defn:
Given a relation scheme R such that the set D of FDs and MVDs is satisfied, consider sets of attributes X and Y, where X ⊆ R and Y ⊆ R. The relation scheme R is in 4NF if, for every multivalued dependency of the form X →→ Y ∈ D+, either X →→ Y is a trivial MVD or X is a super key of R.
TRANSACTION:
The transaction properties of atomicity, consistency, isolation and durability are often called the ACID properties; the acronym is derived from the first letter of each of the four properties.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction
can be defined as
Ti: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
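As a minimal, hedged sketch (not part of the original notes), the same transfer can be written against Python's built-in sqlite3 module: the two updates either both commit or, if anything fails, both roll back, which illustrates the atomicity property. The account table and values are illustrative assumptions.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 500)])
    conn.commit()

    try:
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
        conn.commit()            # transaction Ti commits: both writes become durable
    except sqlite3.Error:
        conn.rollback()          # on any failure the database returns to its old state

    print(conn.execute("SELECT * FROM account ORDER BY name").fetchall())
    # [('A', 950), ('B', 550)]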
TRANSACTION STATE:
• It can restart the transaction, but only if the transaction was aborted as a result of
some hardware or software error that was not created through the internal logic of
the transaction. A restarted transaction is considered to be a new transaction.
• It can kill the transaction. It usually does so because of some internal logical error
that can be corrected only by rewriting the application program, or because the
input was bad, or because the desired data were not found in the database.
Concurrent Executions:
Multiple transactions are allowed to run concurrently in the system.
Advantages are:
increased processor and disk utilization, leading to better transaction
throughput: one transaction can be using the CPU while another is reading
from or writing to the disk
reduced average response time for transactions: short transactions need not
wait behind long ones.
Concurrency control schemes – mechanisms to achieve isolation, i.e., to control the
interaction among the concurrent transactions in order to prevent them from
destroying the consistency of the database
Schedules
Schedules – sequences that indicate the chronological order in which instructions of
concurrent transactions are executed
a schedule for a set of transactions must consist of all instructions of those
transactions
must preserve the order in which the instructions appear in each individual
transaction
Example Schedules
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The
following is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.
Serializability:
• Basic Assumption – Each transaction preserves database consistency.
• Thus serial execution of a set of transactions preserves database consistency.
• A (possibly concurrent) schedule is serializable if it is equivalent to a serial
schedule. Different forms of schedule equivalence give rise to the notions of:
o conflict serializability
o view serializability
• We ignore operations other than read and write instructions, and we assume that
transactions may perform arbitrary computations on data in local buffers in
between reads and writes. Our simplified schedules consist of only read and
write instructions.
Conflict Serializability
• Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if
there exists some item Q accessed by both li and lj, and at least one of these
instructions wrote Q.
o li = read(Q), lj = read(Q). li and lj don’t conflict.
o li = read(Q), lj = write(Q). They conflict.
o li = write(Q), lj = read(Q). They conflict
o li = write(Q), lj = write(Q). They conflict
• Intuitively, a conflict between li and lj forces a (logical) temporal order between
them. If li and lj are consecutive in a schedule and they do not conflict, their
results would remain the same even if they had been interchanged in the schedule.
• If a schedule S can be transformed into a schedule S´ by a series of swaps of non-
conflicting instructions, we say that S and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict equivalent to a
serial schedule
• Example of a schedule that is not conflict serializable:
T3            T4
read(Q)
              write(Q)
write(Q)
We are unable to swap instructions in the above schedule to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.
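A minimal Python sketch (not part of the original notes) of the usual precedence-graph test for conflict serializability: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, and declare the schedule conflict serializable only if the graph has no cycle. The schedule used is the non-serializable one shown above.

    def conflict_serializable(schedule):
        # schedule: list of (transaction, operation, item) in chronological order
        edges = set()
        for i, (ti, op_i, q_i) in enumerate(schedule):
            for (tj, op_j, q_j) in schedule[i + 1:]:
                if ti != tj and q_i == q_j and "write" in (op_i, op_j):
                    edges.add((ti, tj))               # conflicting pair: Ti must precede Tj

        def reaches_itself(start, node, visited):
            # depth-first search for a path from node back to start
            for (a, b) in edges:
                if a == node and (b == start or (b not in visited and
                                  reaches_itself(start, b, visited | {b}))):
                    return True
            return False

        nodes = {t for (t, _, _) in schedule}
        return not any(reaches_itself(n, n, {n}) for n in nodes)

    # T4's write(Q) sits between T3's read(Q) and T3's write(Q):
    s = [("T3", "read", "Q"), ("T4", "write", "Q"), ("T3", "write", "Q")]
    print(conflict_serializable(s))   # False: edges T3 -> T4 and T4 -> T3 form a cycle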
View Serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q
in schedule S, then transaction Ti must, in schedule S´, also read
the initial value of Q.
2. For each data item Q if transaction Ti executes read(Q) in schedule
S, and that value was produced by transaction Tj (if any), then
transaction Ti must in schedule S´ also read the value of Q that was
produced by transaction Tj .
3. For each data item Q, the transaction (if any) that performs the
final write(Q) operation in schedule S must perform the final
write(Q) operation in schedule S´.
As can be seen, view equivalence is also based purely on reads and writes.
• A schedule S is view serializable if it is view equivalent to a serial
schedule.
• Every conflict serializable schedule is also view serializable.
• Schedule 9 (from text) — a schedule which is view-serializable but not
conflict serializable.
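The three conditions can be checked mechanically. The sketch below is an added illustration using the same triple representation as the earlier example; the concrete instructions of Schedule 9 are assumed to be the usual blind-write example with transactions T3, T4 and T6, since the schedule itself is not reproduced here.

    # Illustrative sketch: view equivalence of two schedules over the same transactions.
    from collections import Counter

    def reads_from(schedule):
        # For every read(Q), record which transaction produced the value read
        # (None means the initial value of Q).
        last_writer, result = {}, []
        for t, op, q in schedule:
            if op == "read":
                result.append((t, q, last_writer.get(q)))
            else:
                last_writer[q] = t
        return result

    def final_writes(schedule):
        # The transaction performing the final write(Q) for each item Q.
        fw = {}
        for t, op, q in schedule:
            if op == "write":
                fw[q] = t
        return fw

    def view_equivalent(s1, s2):
        # Conditions 1 and 2: every read obtains its value from the same source
        # in both schedules; condition 3: the final writers match.
        return (Counter(reads_from(s1)) == Counter(reads_from(s2))
                and final_writes(s1) == final_writes(s2))

    sched9 = [("T3", "read", "Q"), ("T4", "write", "Q"),
              ("T3", "write", "Q"), ("T6", "write", "Q")]
    serial = [("T3", "read", "Q"), ("T3", "write", "Q"),
              ("T4", "write", "Q"), ("T6", "write", "Q")]
    print(view_equivalent(sched9, serial))   # True, yet sched9 is not conflict serializable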
Concurrency Control
• Lock-Based Protocols
• Timestamp-Based Protocols
• Validation-Based Protocols
• Multiple Granularity
• Multiversion Schemes
• Deadlock Handling
Lock-Based Protocols
• A lock is a mechanism to control concurrent access to a data item
• Data items can be locked in two modes:
o exclusive (X) mode. Data item can be both read as well as written. X-lock
is requested using lock-X instruction.
o shared (S) mode. Data item can only be read. S-lock is requested using
lock-S instruction.
• Lock requests are made to concurrency-control manager. Transaction can proceed
only after request is granted.
• Lock-compatibility matrix (true means the two modes can be held on the same item by different transactions at the same time):

                S        X
        S       true     false
        X       false    false
• A transaction may be granted a lock on an item if the requested lock is compatible
with locks already held on the item by other transactions
• Any number of transactions can hold shared locks on an item, but if any
transaction holds an exclusive lock on the item, no other transaction may hold any lock
on the item.
• If a lock cannot be granted, the requesting transaction is made to wait till all
incompatible locks held by other transactions have been released. The lock is
then granted.
• Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
• Locking as above is not sufficient to guarantee serializability — if A and B are
updated by another transaction in between T2’s reads of A and B, the displayed sum will be wrong.
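As a concrete illustration of the compatibility rules, the following is a minimal Python sketch of a lock table (the class and method names are invented for this example, not part of any real DBMS): a request is granted only when it is compatible with the locks already held by other transactions.

    # Sketch: a lock table that grants requests according to the S/X compatibility matrix.
    class LockTable:
        def __init__(self):
            self.locks = {}   # item -> {transaction: "S" or "X"}

        def can_grant(self, item, txn, mode):
            holders = {t: m for t, m in self.locks.get(item, {}).items() if t != txn}
            if not holders:
                return True
            # S is compatible only with S; X is compatible with nothing.
            return mode == "S" and all(m == "S" for m in holders.values())

        def lock(self, item, txn, mode):
            if self.can_grant(item, txn, mode):
                self.locks.setdefault(item, {})[txn] = mode
                return True
            return False      # in a real DBMS the transaction would wait instead

        def unlock(self, item, txn):
            self.locks.get(item, {}).pop(txn, None)

    lt = LockTable()
    print(lt.lock("A", "T1", "S"))   # True  - shared locks can coexist
    print(lt.lock("A", "T2", "S"))   # True
    print(lt.lock("A", "T3", "X"))   # False - X conflicts with the existing S locks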
One protocol that ensures serializability is the two-phase locking protocol. This protocol
requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any
new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as
needed. Once the transaction releases a lock, it enters the shrinking phase, and it
can issue no more lock requests.
For example, transactions T3 and T4 (as given in the text) are two-phase. On the other hand, transactions
T1 and T2 are not two-phase. Note that the unlock instructions do not need to appear
at the end of the transaction. For example, in the case of transaction T3, we could
move the unlock(B) instruction to just after the lock-X(A) instruction, and still retain
the two-phase locking property.
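A minimal sketch of the two-phase rule itself (the class and attribute names are invented for illustration): the transaction object refuses any further lock request once it has issued its first unlock.

    # Sketch: enforce the two-phase rule inside a transaction object.
    class TwoPhaseTxn:
        def __init__(self, name):
            self.name = name
            self.held = set()
            self.shrinking = False           # False = growing phase

        def lock(self, item, mode):
            if self.shrinking:
                raise RuntimeError("2PL violation: lock request after an unlock")
            self.held.add((item, mode))

        def unlock(self, item):
            self.shrinking = True            # the first unlock starts the shrinking phase
            self.held = {(i, m) for i, m in self.held if i != item}

    t3 = TwoPhaseTxn("T3")
    t3.lock("B", "X"); t3.lock("A", "X")     # growing phase: acquire all locks
    t3.unlock("B"); t3.unlock("A")           # shrinking phase: release only
    # A further t3.lock(...) here would raise, which is exactly the two-phase rule.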
Timestamp-Based Protocols
Timestamps:
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted
by TS(Ti). This timestamp is assigned by the database system before the transaction Ti
starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new
transaction Tj enters the system, then TS(Ti) < TS(Tj ). There are two simple methods for
implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction’s
timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been
assigned; that is, a transaction’s timestamp is equal to the value of the counter
when the transaction enters the system.
The timestamps of the transactions determine the serializability order. Thus, if
TS(Ti) < TS(Tj ), then the system must ensure that the produced schedule is equivalent
to a serial schedule in which transaction Ti appears before transaction Tj .
To implement this scheme, we associate with each data item Q two timestamp
values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed
write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed
read(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction is
executed.
The timestamp-ordering protocol ensures that any conflicting read and write operations
are executed in timestamp order. This protocol operates as follows:
1. Suppose that transaction Ti issues read(Q).
• If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was
already overwritten. Hence, the read operation is rejected, and Ti is rolled
back.
• If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and
R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
2. Suppose that transaction Ti issues write(Q).
o If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was
needed previously, and the system assumed that that value would never be
produced. Hence, the system rejects the write operation and rolls Ti back.
o If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete
value of Q. Hence, the system rejects this write operation and rolls Ti
back.
o Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
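The two tests above translate almost directly into code. The following sketch is an added illustration: timestamps are plain integers, and the R_ts/W_ts dictionaries stand in for the per-item R-timestamp and W-timestamp values; each function returns False whenever the protocol would reject the operation and roll the transaction back.

    # Sketch: the timestamp-ordering checks for read(Q) and write(Q).
    R_ts, W_ts = {}, {}            # R-timestamp(Q) and W-timestamp(Q); 0 = none yet

    def ts_read(ti_ts, q):
        if ti_ts < W_ts.get(q, 0):
            return False                           # Q already overwritten: reject, roll back Ti
        R_ts[q] = max(R_ts.get(q, 0), ti_ts)       # execute read, update R-timestamp(Q)
        return True

    def ts_write(ti_ts, q):
        if ti_ts < R_ts.get(q, 0) or ti_ts < W_ts.get(q, 0):
            return False                           # late or obsolete write: reject, roll back Ti
        W_ts[q] = ti_ts                            # execute write, update W-timestamp(Q)
        return True

    print(ts_write(2, "Q"))   # True : W-timestamp(Q) becomes 2
    print(ts_read(1, "Q"))    # False: a transaction with TS 1 would read an overwritten value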
Deadlock Handling
• A system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.
• Deadlock prevention protocols ensure that the system will never enter a deadlock state. Some prevention strategies:
o Require that each transaction locks all its data items before it begins execution (predeclaration).
o Impose a partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol).
Consider the following partial schedule, in which T1 and T2 deadlock: T1 holds an exclusive lock on X and waits for a lock on Y, while T2 holds an exclusive lock on Y and waits for a lock on X.

    T1                        T2
    lock-X on X
    write(X)
                              lock-X on Y
                              write(Y)
    wait for lock-X on Y
                              wait for lock-X on X
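This situation is detected as a deadlock because the corresponding wait-for graph (an edge Ti → Tj means Ti is waiting for a lock held by Tj) contains a cycle; deadlock detection, discussed below, searches for exactly such cycles. A minimal illustrative sketch:

    # Sketch: deadlock detection with a wait-for graph; a cycle means deadlock.
    def has_deadlock(wait_for):
        visited, on_stack = set(), set()

        def dfs(t):
            visited.add(t); on_stack.add(t)
            for u in wait_for.get(t, ()):
                if u in on_stack or (u not in visited and dfs(u)):
                    return True
            on_stack.discard(t)
            return False

        return any(dfs(t) for t in wait_for if t not in visited)

    # The schedule above: T1 waits for T2 (lock on Y), T2 waits for T1 (lock on X).
    print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True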
Timeout-Based Schemes:
• A transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back.
• Thus deadlocks are not possible.
• Simple to implement; but starvation is possible. It is also difficult to determine a good value for the timeout interval.
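For illustration only, Python’s threading.Lock supports exactly this style of waiting, since acquire() accepts a timeout; the printed message below merely stands in for a real transaction rollback.

    # Sketch: timeout-based scheme using threading.Lock, whose acquire() takes a timeout.
    import threading

    lock_on_X = threading.Lock()
    lock_on_X.acquire()                         # some other transaction holds X

    def transaction_with_timeout():
        if lock_on_X.acquire(timeout=0.5):      # wait at most 0.5 s for the lock
            try:
                pass                            # ... use X ...
            finally:
                lock_on_X.release()
        else:
            print("timed out: roll back and restart the transaction")

    transaction_with_timeout()                  # prints the rollback message after ~0.5 s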
Deadlock Detection
When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock. (Detection itself is typically done by maintaining a wait-for graph and periodically searching it for cycles.) The most common solution is to roll back one or more transactions to break the deadlock. Three actions need to be taken:
1. Selection of a victim. Given a set of deadlocked transactions, we must determine
which transaction (or transactions) to roll back to break the deadlock. We should roll
back those transactions that will incur the minimum cost. Unfortunately, the term
minimum cost is not a precise one. Many factors may determine the cost of a rollback,
including
a. How long the transaction has computed, and how much longer the transaction
will compute before it completes its designated task.
b. How many data items the transaction has used.
c. How many more data items the transaction needs for it to complete.
d. How many transactions will be involved in the rollback.
2. Rollback. Once we have decided that a particular transaction must be rolled back, we
must determine how far this transaction should be rolled back. The simplest solution is a
total rollback: Abort the transaction and then restart it. However, it is more effective to
roll back the transaction only as far as necessary to break the deadlock. Such partial
rollback requires the system to maintain additional information about the state of all the
running transactions. Specifically, the sequence of lock requests/grants and updates
performed by the transaction needs to be recorded. The deadlock detection mechanism
should decide which locks the selected transaction needs to release in order to break the
deadlock. The selected transaction must be rolled back to the point where it obtained the
first of these locks, undoing all actions it took after that point. The recovery mechanism
must be capable of performing such partial rollbacks. Furthermore, the transactions must
be capable of resuming execution after a partial rollback. See the bibliographical notes for
relevant references.
3. Starvation. In a system where the selection of victims is based primarily on cost
factors, it may happen that the same transaction is always picked as a victim. As a result,
this transaction never completes its designated task, and thus there is starvation. We must
ensure that a transaction can be picked as a victim only a (small) finite number of times. The
most common solution is to include the number of rollbacks in the cost factor.
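A toy sketch of such a victim-selection policy follows (the cost formula and field names are invented for illustration): the number of prior rollbacks is weighted into the cost, so a repeatedly chosen transaction eventually stops being the cheapest victim.

    # Sketch: victim selection that folds previous rollbacks into the cost.
    def pick_victim(deadlocked):
        # deadlocked: list of dicts with illustrative cost factors
        def cost(t):
            return (t["work_done"] + t["items_locked"]
                    + 10 * t["rollbacks"])      # starved transactions become expensive
        return min(deadlocked, key=cost)["name"]

    txns = [
        {"name": "T1", "work_done": 5, "items_locked": 2, "rollbacks": 3},
        {"name": "T2", "work_done": 8, "items_locked": 1, "rollbacks": 0},
    ]
    print(pick_victim(txns))    # T2: cheaper once T1's repeated rollbacks are counted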
Consider transaction T29 that executes the following SQL query on the bank database:
select sum(balance) from account where branch-name = 'Perryridge'
Transaction T29 requires access to all tuples of the account relation pertaining to the
Perryridge branch.
Let T30 be a transaction that executes the following SQL insertion:
insert into account values (A-201, 'Perryridge', 900)
Let S be a schedule involving T29 and T30. We expect there to be potential for a conflict
for the following reasons:
• If T29 uses the tuple newly inserted by T30 in computing sum(balance), then
T29 read a value written by T30. Thus, in a serial schedule equivalent to S, T30
must come before T29.
• If T29 does not use the tuple newly inserted by T30 in computing sum(balance),