Database Management
Subject: DATABASE MANAGEMENT    Credits: 4
SYLLABUS
Introduction to database management systems – Data versus information, record, file; data dictionary, database administrator, functions and responsibilities; file-oriented systems versus database systems
Database system architecture – Introduction, schemas, subschemas and instances; database architecture, data independence, mapping, data models, types of database systems
Database security – Threats and security issues, firewalls and database recovery; techniques of database security; distributed databases
Data warehousing and data mining – Emerging database technologies, internet databases, digital libraries, multimedia databases, mobile databases, spatial databases
Suggested Readings:
1. A Silberschatz, H Korth, S Sudarshan, "Database System Concepts", Fifth Edition, McGraw-Hill
2. Rob, Coronel, "Database Systems", Seventh Edition, Cengage Learning
DATABASE MANAGEMENT SYSTEM
COURSE OVERVIEW
This course provides immediately usable tools and techniques for database management systems: requirements analysis, definition, specification, design and so on. It gives participants the details of the tools, techniques and methods needed to lead or participate in the front-end phases.
Database systems are designed to manage large bodies of information. Management of data involves both defining structures for the storage of information and providing ways to manipulate that data.
On completion of the course, students shall develop the following skills and competencies:
• Database
• DBMS
• Database system applications
• File systems
• Data inconsistency
Objectives
• To help you learn DBMS and design techniques: what they are and how one goes about using them.
• The primary goal of a DBMS is to provide an environment that is both convenient and efficient for people to use in retrieving and storing information.
• Database systems are designed to store large bodies of information.
By the end of this material, you will be equipped with the technical knowledge needed to develop and understand a DBMS.
CONTENT
Lesson No. Topic Page No.
INTRODUCTION TO DBMS
Lesson 1 Introduction to Database I 1
Lesson 2 Introduction to Database II 5
Lesson 3 Tutorial 8
Lesson 4 Database Concepts I 9
Lesson 5 Database Concepts II 13
DATA MODELS IN DATABASES
Lesson 6 Data Models 17
RELATIONAL DATABASE MANAGEMENT SYSTEM
Lesson 7 Relational Database Management System I 21
Lesson 8 Relational Database Management System II 27
Lesson 9 E-R Model I 31
Lesson 10 E-R Model II 36
STRUCTURED QUERY LANGUAGES
Lesson 11 Structured Query Language (SQL) I 40
Lesson 12 LAB 45
Lesson 13 Lab 46
Lesson 14 SQL II 47
Lesson 15 LAB 55
Lesson 16 LAB 56
Lesson 17 SQL III 57
Lesson 18 Lab 62
Lesson 19 Lab 63
Lesson 20 SQL IV 64
Lesson 21 Lab 67
Lesson 22 Lab 68
Lesson 23 Integrity and security 69
Lesson 24 LAB 75
Lesson 25 LAB 76
Lesson 26 PL/SQL 77
Lesson 27 Lab 82
Lesson 28 Lab 83
Lesson 29 Database Triggers 84
Lesson 30 LAB 89
Lesson 31 LAB 90
Lesson 32 Database Cursors 91
Lesson 33 LAB 100
Lesson 34 LAB 101
NORMALIZATION
Lesson 35 Normalisation I 102
Lesson 36 Normalisation II 107
Lesson 37 Normalisation III 112
FILE ORGANIZATION METHODS
Lesson 38 File Organization Method I 118
Lesson 39 File Organization Method II 123
DATABASE OPERATIONAL MAINTENANCE
Lesson 40 Transactions Management 130
Lesson 41 Concurrency Control I 136
Lesson 42 Concurrency Control II 141
Lesson 43 Concurrency Control III 146
Lesson 44 Database Recovery 152
LESSON 1: INTRODUCTION TO DATABASE I
Lesson Objectives
• Database
• Database management system
• Essentials of data
• Benefits of DBMS
• Database system applications
• Purpose of database systems

1.1 What is a Database Management System (DBMS)?
A database can be termed a repository of data. A collection of actual data constituting the information about an organisation is stored in a database. For example, if there are 1,000 students in a college and we have to store their personal details, marks details and so on, these details will be recorded in a database.
A collection of programs that enables you to store, modify, and extract information from a database is known as a DBMS. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.

[Figure: actual data storage]

We can say that whenever we need a computerised storage system, we need a database system.
Database systems are designed to manage large bodies of information. Management of data involves both defining structures for the storage of information and providing ways to manipulate the data. In addition, the database system must ensure the safety of the data.
There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes.
Good data management is an essential prerequisite to corporate success:

Data → Information → Knowledge → Judgment → Decision → Success

provided that data is:
• complete
• accurate
• timely
• easily available

1.2 Database System Applications
Databases are applied in a wide range of applications. Following are some examples:
• Banking: for customer information, accounts, loans and other banking transactions
• Airlines: for reservation and schedule information
• Universities: for student information, course registration, grades etc.
• Credit card transactions: for purchases on credit cards and generation of monthly statements
• Telecommunication: for keeping records of calls made, generating monthly bills etc.
• Finance: for storing information about holdings, sales and purchases of financial instruments
• Sales: for customer, product and purchase information
• Manufacturing: for management of the supply chain
• Human resources: for recording information about employees, salaries, tax, benefits etc.

1.3 Purpose of Database Systems
A file system is one in which we keep information in operating system files. Before the evolution of the DBMS, organisations used to store information in file systems. A typical file processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs application programs to extract records, or to add or delete records.
We will compare both systems with the help of an example. Consider a savings bank enterprise that keeps information about all customers and savings accounts. The following manipulations have to be done with the system:
• A program to debit or credit an account
• A program to add a new account
• A program to find the balance of an account
• A program to generate monthly statements
As the need arises, new applications can be added at a particular point of time; for example, checking accounts can be added alongside savings accounts. Using a file system for storing data has the following disadvantages:

1. Data Redundancy and Inconsistency
Different programmers work on a single project, so various files are created by different programmers at intervals. Thus various files are created in different formats, and different programs are written in different programming languages.
The same information is repeated. For example, a name and address may appear in the savings account file as well as in the checking account file. This redundancy results in higher storage space and access cost. It also leads to data inconsistency, which means that if we change some record in one place, the change will not be reflected in all the places. For example, a changed customer address may be reflected in the savings record but nowhere else.

2. Difficulty in Accessing Data
Accessing data is also a difficulty in a file system. Suppose we want to see the records of all customers who have a balance of less than $10,000. We can either check the list and find the names manually, or write an application program. If we write an application program and at some later time need to see the records of customers who have a balance of less than $20,000, then again a new program has to be written. This means that file processing systems do not allow data to be accessed in a convenient manner.
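In a DBMS, by contrast, the same ad-hoc request needs no new program, only a new query. As a minimal sketch in SQL (the language covered from Lesson 11 onwards), assuming a hypothetical customer table with a balance column:

-- all customers with a balance below $10,000
select customer_name, balance
from customer
where balance < 10000;

-- the later request is only a changed constant, not a new program
select customer_name, balance
from customer
where balance < 20000;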
3. Data Isolation
As the data is stored in various files, and various files may be stored in different formats, writing application programs to retrieve the data is difficult.

4. Integrity Problems
Sometimes we need the stored data to satisfy certain constraints; for example, a bank may require a minimum deposit of $100. Developers enforce these constraints by writing appropriate programs, but if later some new constraint has to be added, then it is difficult to change the programs to enforce it.

5. Atomicity Problems
Any mechanical or electrical device is subject to failure, and so is the computer system. In case of failure we have to ensure that data is restored to a consistent state. For example, suppose an amount of $50 has to be transferred from account A to account B, and the amount has been debited from account A but has not yet been credited to account B when some failure occurs. This leads to an inconsistent state. So we have to adopt a mechanism which ensures that either the full transaction is executed or no part of it is executed, i.e. the fund transfer should be atomic.

6. Concurrent Access Problems
Many systems allow multiple users to update the data simultaneously. This too can leave the data in an inconsistent state. Suppose a bank account contains a balance of $500 and two customers want to withdraw $100 and $50 simultaneously. Both transactions read the old balance and withdraw from that old balance, which results in balances of $400 and $450 respectively; both are incorrect.

7. Security Problems
Not every user of the database should be able to access all the data. For example, payroll personnel need access only to that part of the data which has information about employees; they do not need access to information about customer accounts.

Points to Ponder
• A DBMS contains a collection of inter-related data and a collection of programs to access the data.
• The primary goal of a DBMS is to provide an environment that is both convenient and efficient for people to use in retrieving and storing information.
• DBMSs are ubiquitous today, and most people interact either directly or indirectly with databases many times every day.
• Database systems are designed to store large bodies of information.
• A major purpose of a DBMS is to provide users with an abstract view of data, i.e. the system hides how the data is stored and maintained.

Review Terms
• Database
• DBMS
• Database system application
• File system
• Data inconsistency
• Consistency constraints
• Atomicity
• Redundancy
• Data isolation
• Data security

Student Activity
1. What is a database? Explain with an example.
2. What is a DBMS? Explain with an example.
3. List four significant differences between a file system and a DBMS.
4. What are the advantages of a DBMS?
5. Explain various applications of databases.
10. Explain consistency constraints in databases.
LESSON 2: INTRODUCTION TO DATABASE II
Lesson Objectives
• Data abstraction
• View of data
• Levels of data
• Physical level
• Logical level
• View level
• Database languages
• DDL, DML

View of Data
A database contains a number of files and certain programs to access and modify these files. But the actual data is not shown to the user; the system hides the actual details of how data is stored and maintained.

Data Abstraction
Data abstraction is the process of distilling data down to its essentials. The data, when needed, should be retrieved efficiently. As all the details are not of use to all users, we hide the actual (complex) details from users. Various levels of abstraction are provided, listed below:
• Physical level
It is the lowest level of abstraction and specifies how the data is actually stored. It describes the complex data structures in detail.
• Logical level
It is the next level of abstraction and describes what data are stored in the database and what relationships exist between the various data. It is less complex than the physical level and specifies simple structures. Though the complexity of the physical level is required at the logical level, users of the logical level need not know these complexities.
• View level
This is the highest level of abstraction. This level contains the actual data which is shown to the users, and a user of this level need not know the actual details (complexity) of data storage.

Database Languages
As a language is required to understand anything, similarly, to create or manipulate a database we need to learn a language. Database languages are divided into two main parts:
1. DDL (Data Definition Language)
2. DML (Data Manipulation Language)

Data Definition Language (DDL)
A DDL is used to specify a database schema as a set of definitions expressed in the DDL.
1. DDL statements are compiled, resulting in a set of tables stored in a special file called a data dictionary or data directory.
2. The data directory contains metadata (data about data).
3. The storage structure and access methods used by the database system are specified by a set of definitions in a special type of DDL called a data storage and definition language.
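To make the DDL/DML split concrete, here is a small sketch in standard SQL; the account table and its columns are invented for illustration:

-- DDL: defines the schema (recorded in the data dictionary)
create table account
( account_no char(10),
  balance numeric(12,2) );

-- DML: manipulates the data stored under that schema
insert into account values ('A-101', 500.00);
update account set balance = balance + 50 where account_no = 'A-101';
select account_no, balance from account;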
LESSON 3: TUTORIAL
LESSON 4: DATABASE CONCEPTS I
Lesson Objectives
• Data dictionary
• Metadata
• Database schema
• Database instance
• Data independence

Data Dictionary
English language dictionaries define data in terms such as "known facts or things used as a basis for inference or reckoning, typically (in modern usage) operated upon or manipulated by computers", or "factual information used as a basis for discussion, reasoning, or calculation".
A data dictionary may cover the whole organisation, a part of the organisation or a single database. In its simplest form, the data dictionary is only a collection of data element definitions. A more advanced data dictionary contains the database schema with reference keys; a still more advanced data dictionary contains the entity-relationship model of the data elements or objects.

Parts of a Data Dictionary
1. Data Element Definitions
Data element definitions may be independent of table definitions or a part of each table definition.
• Data element number
The data element number is used in technical documents.
• Data element name (caption)
A commonly agreed, unique data element name from the application domain. This is the real-life name of the data element.
• Short description
A description of the element in the application domain.
• Security classification of the data element
The organisation-specific security classification level or possible restrictions on use. This may contain technical links to security systems.
• Related data elements
A list of closely related data element names when the relation is important.
• Field name(s)
Field names are the names used for this element in computer programs and database schemas. These are the technical names, often limited by the programming languages and systems used.
• Code format
Data type (characters, numeric, etc.), size and, if needed, special representation. Common programming language notation, input masks, etc. can be used.
• Null value allowed
A null or non-existing data value may or may not be allowed for an element. Elements with possible null values need special consideration in reports and may cause problems if used as a key.
• Default value
A data element may have a default value. The default value may be a variable, like the current date and time of day.
• Element coding (allowed values) and intra-element validation details, or a reference to other documents
Explanation of coding (code tables, etc.) and validation rules when validating this element alone in the application domain.
• Inter-element validation details, or a reference to other documents
Validation rules between this element and other elements in the data dictionary.
• Database table references
References to the tables in which the element is used, and the role of the element in each table. A special indication is given when the data element is the key for the table or a part of the key.
• Definitions and references needed to understand the meaning of the element
Short application domain definitions and references to other documents needed to understand the meaning and use of the data element.
• Source of the data in the element
A short description, in application domain terms, of where the data comes from. Rules used in calculations producing the element values are usually written here.
• Validity dates for the data element definition
Validity dates, start and possible end dates, when the element is or was used. There may be several time periods during which the element has been used.
• History references
The date when the element was defined in its present form, references to superseded elements, etc.
• External references
References to books, other documents, laws, etc.
• Version of the data element document
A version number or other indicator. This may include formal version control or configuration management references, but such references may be hidden, depending on the system used.
• Date of the data element document
The writing date of this version of the data element document.
• Quality control references
Organisation-specific quality control endorsements, dates, etc.
• Data element notes
Short notes not included in the above parts.

Meta-Data
Meta-data is definitional data that provides information about, or documentation of, other data managed within an application or environment. For example, meta-data would document data about data elements or attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.) and data about data (where it is located, how it is associated, ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the data.

Review Terms
• DML
• Query language
• Data dictionary
• Metadata
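In most modern relational systems the data dictionary can itself be queried like any ordinary table. As a sketch, standard SQL exposes much of this metadata through the INFORMATION_SCHEMA views (the view and column names below follow the SQL standard; individual products vary):

-- read the metadata about a table, not the data in it:
-- each column's name, data type and whether nulls are allowed
select column_name, data_type, is_nullable
from information_schema.columns
where table_name = 'student';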
Student Activity
1. What is the difference between a database schema and a database instance?
2. What do you understand by the structure of a database?
LESSON 5: DATABASE CONCEPTS II
Lesson Objectives
• Database manager
• Database user
• Database administrator
• Role of the database administrator
• Role of the database user
• Database architecture

Among the functions of the database administrator are:
• Storage structure and access method definition: writing a set of definitions translated by the data storage and definition language compiler.
• Schema and physical organization modification: writing a set of definitions used by the DDL compiler to generate modifications to the appropriate internal system tables (e.g. the data dictionary). This is done rarely, but sometimes the database schema or physical organization must be modified.

2. Components include:
• Database languages (DDL, DML, query language)
• Data dictionary
• Metadata
• Database administrator
• Database user

Student Activity
1. What are the various kinds of database users?
LESSON 6: DATA MODELS
Lesson Objectives
• Understanding data models
• Different types of data models
• Hierarchical data model
• Network data model
• Relational model

Data models are a collection of conceptual tools for describing data, data relationships, data semantics and data constraints. A data model is a "description" of both a container for data and a methodology for storing and retrieving data from that container. Actually, there isn't really a data model "thing". Data models are abstractions, oftentimes mathematical algorithms and concepts. You cannot really touch a data model. But nevertheless, they are very useful. The analysis and design of data models has been the cornerstone of the evolution of databases. As models have advanced, so has database efficiency.
There are various kinds of data models, i.e. records in a database can be arranged in various ways. The various ways in which data can be represented are:
1. Hierarchical data model
2. Network data model
3. Relational model
4. E-R model

The Hierarchical Model
Organization of the records is as a collection of trees. As its name implies, the Hierarchical Database Model defines hierarchically-arranged data. Perhaps the most intuitive way to visualize this type of relationship is by visualizing an upside-down tree of data. In this tree, a single table acts as the "root" of the database, from which other tables "branch" out. You will be instantly familiar with this relationship, because that is how all windows-based directory management systems (like Windows Explorer) work these days.
Relationships in such a system are thought of in terms of children and parents, such that a child may only have one parent but a parent can have multiple children. Parents and children are tied together by links called "pointers" (perhaps physical addresses inside the file system). A parent will have a list of pointers to each of its children.
Suppose we want to create a structure in which a course has various students, and these students are given certain marks in an assignment. [Figure: a course-student hierarchy]
However, as you can imagine, the hierarchical database model has some serious problems. For one, you cannot add a record to a child table until it has already been incorporated into the parent table. This might be troublesome if, for example, you wanted to add a student who had not yet signed up for any courses.
Worse yet, the hierarchical database model still creates repetition of data within the database. You might imagine that in the database system shown above, there may be a higher level that includes multiple courses. In this case, there could be redundancy because students would be enrolled in several courses, and thus each "course tree" would have redundant student information. Redundancy would occur because hierarchical databases handle one-to-many relationships well but do not handle many-to-many relationships well. This is because a child may only have one parent. However, in many cases you will want the child to be related to more than one parent. For instance, the relationship between student and class is "many-to-many": not only can a student take many subjects, but a subject may also be taken by many students. How would you model this relationship simply and efficiently using a hierarchical database? The answer is that you wouldn't. Though this problem can be solved with multiple databases creating logical links between children, the fix is very kludgy and awkward.
pointers to each of their children. Faced with these serious problems, the computer brains of the
world got together and came up with the network model.
Network Databases
In many ways, the Network Database model was designed to
solve some of the more serious problems with the Hierarchical
Database Model. Specifically, the Network model solves the
problem of data redundancy by representing relationships in
terms of sets rather than hierarchy. The model had its origins in
17
the Conference on Data Systems Languages (CODASYL) which • The Sequence of Columns is Insignificant
DATABASE
had created the Data Base Task Group to explore and design a • The Sequence of Rows is Insignificant
method to replace the hierarchical model.
• Each Column Has a Unique Name
The network model is very similar to the hierarchical model
Certain fields may be designated as keys, which means that
actually. In fact, the hierarchical model is a subset of the network
searches for specific values of that field will use indexing to
MANAGEMENT
18
Student Activity
1. Define data models.
2. Define the hierarchical data model.
LESSON 7: RELATIONAL DATABASE MANAGEMENT SYSTEM I
Lesson Objectives
• Understanding RDBMS
• Understanding data structures
• Understanding data manipulation
• Understanding various relational algebra operations
• Understanding data integrity

The relational model was proposed by E. F. Codd in 1970. It deals with database management from an abstract point of view. The model provides specifications of an abstract database management system. To use database management systems based on the relational model, however, users do not need to master the theoretical foundations. Codd defined the model as consisting of the following three components:
1. Data Structure - a collection of data structure types for building the database.
2. Data Manipulation - a collection of operators that may be used to retrieve, derive or modify data stored in the data structures.
3. Data Integrity - a collection of rules that implicitly or explicitly define a consistent database state or changes of state.

Data Structure
Often the information that an organisation wishes to store in a computer and process is complex and unstructured. For example, we may know that a department in a university has 200 students, most are full-time with an average age of 22 years, and most are females. Since natural language is not a good language for machine processing, the information must be structured for efficient processing. In the relational model the information is structured in a very simple way.
We consider the following database to illustrate the basic concepts of the relational data model. The database could be mapped into the following relational schema, which consists of three relation schemes. Each relation scheme presents the structure of a relation by specifying its name and the names of its attributes enclosed in parentheses. Often the primary key of a relation is marked by underlining.

student(student_id, student_name, address)
enrolment(student_id, subject_id)
subject(subject_id, subject_name, department)

An example of a database based on the above relational model is:

The relation student
student_id  student_name    address
8656789     Peta Williams   9, Davis Hall
8700074     John Smith      9, Davis Hall
8900020     Arun Kumar      90, Second Hall
8801234     Peter Chew      88, Long Hall
8654321     Reena Rani      88, Long Hall
8712374     Kathy Garcia    88, Long Hall
8612345     Chris Watanabe  11, Main Street

The relation enrolment
student_id  subject_id
8700074     CP302
8900020     CP302
8900020     CP304
8700074     MA111
8801234     CP302
8801234     CH001

The relation subject
subject_id  subject_name               department
CP302       Database Management        Comp. Science
CP304       Software Engineering       Comp. Science
CH001       Introduction to Chemistry  Chemistry
PH101       Physics                    Physics
MA111       Pure Mathematics           Mathematics

We list a number of properties of relations:
1. Each relation contains only one record type.
2. Each relation has a fixed number of columns that are explicitly named. Each attribute name within a relation is unique.
3. No two rows in a relation are the same.
4. Each item or element in the relation is atomic; that is, in each row, every attribute has only one value that cannot be decomposed, and therefore no repeating groups are allowed.
5. Rows have no ordering associated with them.
6. Columns have no ordering associated with them (although most commercially available systems do).

The above properties are simple and based on practical considerations. The first property ensures that only one type of information is stored in each relation. The second property involves naming each column uniquely. This has several benefits: the names can be chosen to convey what each column is, the names enable one to distinguish between the column and its domain, and the names are much easier to remember than the position of each column if the number of columns is large.
The third property of not having duplicate rows appears obvious but is not always accepted by all users and designers of DBMSs. The property is essential, since no sensible context-free meaning can be assigned to a number of rows that are exactly the same.
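As a sketch, the three relation schemes above could be declared in SQL as follows; the column sizes are invented, and the keys follow the underlined primary keys of the schema:

create table student
( student_id char(7) primary key,
  student_name varchar(30),
  address varchar(40) );

create table subject
( subject_id char(5) primary key,
  subject_name varchar(40),
  department varchar(30) );

-- the primary key of enrolment is the pair (student_id, subject_id)
create table enrolment
( student_id char(7) references student,
  subject_id char(5) references subject,
  primary key (student_id, subject_id) );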
The next property requires that each element in each relation be atomic: it cannot be decomposed into smaller pieces. In the relational model, the only composite or compound type is the relation itself.
A candidate key of a relation is an attribute, or a set of attributes, such that:
• (a) the attribute or the set of attributes uniquely identifies each tuple in the relation (called uniqueness), and
• (b) if the key is a set of attributes, then no subset of these attributes has property (a) (called minimality).
There may be several distinct sets of attributes that may serve as candidate keys. One of the candidate keys is arbitrarily chosen as the primary key of the relation.
The three relations above, student, enrolment and subject, have degree 3, 2 and 3 respectively, and cardinality 7, 6 and 5 respectively. The primary key of the relation student is student_id, of the relation enrolment it is (student_id, subject_id), and finally the primary key of the relation subject is subject_id. The relation student probably has another candidate key: if we can assume the names to be unique, then student_name is a candidate key. If the names are not unique but the names and addresses together are unique, then the two attributes (student_name, address) form a candidate key. Note that student_id and (student_id, address) cannot both be candidate keys: since student_id alone identifies each tuple, the pair fails the minimality property. Similarly, for the relation subject, subject_name would be a candidate key if the subject names are unique.
The relational model is the most popular data model for commercial data processing applications. It is very simple, which reduces the programmer's work.

Data Manipulation
The manipulative part of the relational model makes set processing (or relational processing) facilities available to the user. Since relational operators are able to manipulate whole relations, the user does not need to use loops in application programs. Avoiding loops can result in a significant increase in productivity. The main relational-algebra operations are:

Selection, Projection, Cartesian product, Join, Union, Intersection, Renaming, Assignment, Left outer join, Right outer join, Full outer join, Semijoin

The Select Operation
The select operation selects tuples that satisfy a given condition. The argument relation is in parentheses after the σ. Thus, to select those tuples of the loan relation where the branch is "Perryridge", we write

σ branch-name = "Perryridge" (loan)

If the loan relation is as shown, then the relation that results from the preceding query contains just the Perryridge tuples.
We can find all tuples in which the amount lent is more than $1200 by writing

σ amount > 1200 (loan)
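Looking ahead to Lesson 11, the two selections above correspond to SQL queries of roughly the following form (hyphens in attribute names become underscores in SQL):

select * from loan where branch_name = 'Perryridge';

select * from loan where amount > 1200;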
In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the selection predicate. Furthermore, we can combine several predicates into a larger predicate by using the connectives and (∧), or (∨), and not (¬). Thus, to find those tuples pertaining to loans of more than $1200 made by the Perryridge branch, we write

σ branch-name = "Perryridge" ∧ amount > 1200 (loan)

The selection predicate may include comparisons between two attributes. To illustrate, consider the relation loan-officer that consists of three attributes: customer-name, banker-name, and loan-number, which specifies that a particular banker is the loan officer for a loan that belongs to some customer. To find all customers who have the same name as their loan officer, we can write

σ customer-name = banker-name (loan-officer)

Projection
Projection is the operation of selecting certain attributes from a relation R to form a new relation S. For example, one may only be interested in the list of names from a relation that has a number of other attributes. The projection operator may then be used. Like selection, projection is a unary operator.

Π loan-number, amount (loan)

loan-number  amount
L-11   900
L-14   1500
L-15   1500
L-16   1300
L-17   1000
L-23   2000
L-93   500

The loan numbers and the amounts of the loans.

Composition of Relational Operations
The fact that the result of a relational operation is itself a relation is important. Consider the more complicated query "Find those customers who live in Harrison." We write:

Π customer-name (σ customer-city = "Harrison" (customer))

Notice that, instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation. In general, since the result of a relational-algebra operation is of the same type (relation) as its inputs, relational-algebra operations can be composed together into relational-algebra expressions. Composing relational-algebra operations into relational-algebra expressions is just like composing arithmetic operations (such as +, -, * and ÷) into arithmetic expressions.

Cartesian Product
The cartesian product of two tables combines each row in one table with each row in the other table.

Example: the table E (for Employee)
enr  ename  dept
1    Bill   A
2    Sarah  C
3    John   A

Example: the table D (for Department)
dnr  dname
A    Marketing
B    Sales
C    Legal

Result E × D:
enr  ename  dept  dnr  dname
1    Bill   A     A    Marketing
1    Bill   A     B    Sales
1    Bill   A     C    Legal
2    Sarah  C     A    Marketing
2    Sarah  C     B    Sales
2    Sarah  C     C    Legal
3    John   A     A    Marketing
3    John   A     B    Sales
3    John   A     C    Legal

• Seldom useful in practice.
• Can give a huge result.

The Union Operation
Consider a query to find the names of all bank customers who have either an account or a loan or both. Note that the customer relation does not contain the information, since a customer does not need to have either an account or a loan at the bank. To answer this query, we need the information in the depositor relation and in the borrower relation. We know how to find the names of all customers with a loan in the bank:

Π customer-name (borrower)

We also know how to find the names of all customers with an account in the bank:

Π customer-name (depositor)

To answer the query, we need the union of these two sets; that is, we need all customer names that appear in either or both of the two relations. We find these data by the binary operation union, denoted, as in set theory, by ∪. So the expression needed is

Π customer-name (borrower) ∪ Π customer-name (depositor)

There are 10 tuples in the result, even though there are seven distinct borrowers and six depositors. This apparent discrepancy occurs because Smith, Jones, and Hayes are borrowers as well as depositors. Since relations are sets, duplicate values are eliminated.

customer-name
Adams
Curry
Hayes
Jackson
Jones
Smith
Williams
Lindsay
Johnson
Turner
Names of all customers who have either a loan or an account.

For a union operation r ∪ s to be valid, we require that two conditions hold:
1. The relations r and s must have the same number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the same, for all i.
Note that r and s can be, in general, temporary relations that are the result of relational-algebra expressions.

The Set-Intersection Operation
The first additional relational-algebra operation that we shall define is set intersection (∩). Suppose that we wish to find all customers who have both a loan and an account. Using set intersection, we can write

Π customer-name (borrower) ∩ Π customer-name (depositor)

Note that we can rewrite any relational-algebra expression that uses set intersection by replacing the intersection operation with a pair of set-difference operations:

r ∩ s = r - (r - s)

Thus, set intersection is not a fundamental operation and does not add any power to the relational algebra. It is simply more convenient to write r ∩ s than to write r - (r - s).

The Set Difference Operation
The set-difference operation, denoted by -, allows us to find tuples that are in one relation but are not in another. The expression r - s produces a relation containing those tuples in r but not in s.
We can find all customers of the bank who have an account but not a loan by writing

Π customer-name (depositor) - Π customer-name (borrower)

As with the union operation, we must ensure that set differences are taken between compatible relations. Therefore, for a set difference operation r - s to be valid, we require that the relations r and s be of the same arity, and that the domains of the ith attribute of r and the ith attribute of s be the same.
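In SQL these three set operations appear directly as union, intersect and except; a sketch in standard SQL follows, though not every product supports all three keywords (Oracle, for instance, spells the difference MINUS):

-- customers with an account, a loan, or both (duplicates eliminated)
select customer_name from borrower
union
select customer_name from depositor;

-- customers with both a loan and an account
select customer_name from borrower
intersect
select customer_name from depositor;

-- customers with an account but not a loan (set difference)
select customer_name from depositor
except
select customer_name from borrower;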
The Assignment Operation
It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables. The assignment operation, denoted by ←, works like assignment in a programming language.

temp1 ← Π loan-number, amount (σ amount > 1200 (loan))
temp2 ← Π loan-number, amount (loan)
result = temp2 - temp1

(result then holds the loan numbers and amounts of the loans of at most $1200.) The evaluation of an assignment does not result in any relation being displayed to the user. Rather, the result of the expression to the right of the ← is assigned to the relation variable on the left of the ←. This relation variable may be used in subsequent expressions. With the assignment operation, a query can be written as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as the result of the query. For relational-algebra queries, assignment must always be made to a temporary relation variable. Note that the assignment operation does not provide any additional power to the algebra. It is, however, a convenient way to express complex queries.

Points to Ponder
• The relational model was proposed by E. F. Codd in 1970.
• It provides specifications of an abstract database management system.
• It consists of the following three components:
1. Data Structure - a collection of data structure types for building the database.
2. Data Manipulation - a collection of operators that may be used to retrieve, derive or modify data stored in the data structures.
3. Data Integrity - a collection of rules that implicitly or explicitly define a consistent database state or changes of state.
• Relational algebra describes a set of algebraic operations that operate on tables and output a table as a result.

Review Terms
• Table/Relation
• Tuple
• Domain
• Database schema
• Database instance
• Keys
  Primary key
  Foreign key
• Relational algebra

Student Activity
1. Why do we use an RDBMS?
2. Define relation, tuple, domain, keys.
3. What is the difference between Intersection, Union and Cartesian product?
LESSON 8: RELATIONAL DATABASE MANAGEMENT SYSTEM II
Lesson Objectives
• Elaborating various other features of relational algebra
• Understanding aggregate functions
• Understanding joins
• Understanding natural, outer and inner joins

Aggregate Functions
Aggregate functions take a collection of values and return a single value as a result. For example, the aggregate function sum takes a collection of values and returns the sum of the values. Thus, the function sum applied on the collection

{1, 1, 3, 4, 4, 11}

returns the value 24. The aggregate function avg returns the average of the values; when applied to the preceding collection, it returns the value 4. The aggregate function count returns the number of elements in the collection, and returns 6 on the preceding collection. Other common aggregate functions include min and max, which return the minimum and maximum values in a collection; they return 1 and 11, respectively, on the preceding collection.
The collections on which aggregate functions operate can have multiple occurrences of a value; the order in which the values appear is not relevant. Such collections are called multisets. Sets are a special case of multisets where there is only one copy of each element.
To illustrate the concept of aggregation, we take the following example, the relation pt-works:

employee-name  branch-name  salary
Adams          Perryridge   1500
Brown          Perryridge   1300
Gopal          Perryridge   5300
Johnson        Downtown     1500
Loreena        Downtown     1300
Peterson       Downtown     2500
Rao            Austin       1500
Sato           Austin       1600

G sum(salary) (pt-works)

The symbol G is the letter G in calligraphic font; read it as "calligraphic G." The relational-algebra operation G signifies that aggregation is to be applied, and its subscript specifies the aggregate operation to be applied. The result of the expression above is a relation with a single attribute, containing a single row with a numerical value corresponding to the sum of the salaries of all employees working part-time in the bank.
There are cases where we must eliminate multiple occurrences of a value before computing an aggregate function. If we do want to eliminate duplicates, we use the same function names as before, with the addition of the hyphenated string "distinct" appended to the end of the function name (for example, count-distinct). An example arises in the query "Find the number of branches appearing in the pt-works relation." In this case, a branch name counts only once, regardless of the number of employees working at that branch. We write this query as follows:

G count-distinct(branch-name) (pt-works)

For the relation above, the result of this query is a single row containing the value 3.
Suppose we want to find the total salary of all part-time employees at each branch of the bank separately, rather than the sum for the entire bank. To do so, we need to partition the relation pt-works into groups based on the branch, and to apply the aggregate function on each group. The following expression using the aggregation operator G achieves the desired result:

branch-name G sum(salary) (pt-works)

In the expression, the attribute branch-name in the left-hand subscript of G indicates that the input relation pt-works must be divided into groups based on the value of branch-name. The following figure shows the resulting groups:

employee-name  branch-name  salary
Rao            Austin       1500
Sato           Austin       1600
Johnson        Downtown     1500
Loreena        Downtown     1300
Peterson       Downtown     2500
Adams          Perryridge   1500
Brown          Perryridge   1300
Gopal          Perryridge   5300

The expression sum(salary) in the right-hand subscript of G indicates that for each group of tuples (that is, each branch), the aggregation function sum must be applied on the collection of values of the salary attribute. The output relation consists of tuples with the branch name and the sum of the salaries for the branch, as shown below:

branch-name  sum of salary
Austin       3100
Downtown     5300
Perryridge   8100

The general form of the aggregation operation G is as follows:

G1, G2, …, Gn G F1(A1), F2(A2), …, Fm(Am) (E)

where E is any relational-algebra expression; G1, G2, …, Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute.
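The grouped aggregation above is exactly what SQL's group by clause expresses. A sketch against a pt_works table matching the figure (the relation name is written with an underscore in SQL):

-- branch-name G sum(salary) (pt-works)
select branch_name, sum(salary) as sum_of_salary
from pt_works
group by branch_name;

-- G count-distinct(branch-name) (pt-works)
select count(distinct branch_name)
from pt_works;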
Join
The Natural-Join Operation
It is often desirable to simplify certain queries that require a Cartesian product. Usually, a query that involves a Cartesian product includes a selection operation on the result of the Cartesian product. Consider the query "Find the names of all customers who have a loan at the bank, along with the loan number and the loan amount." We first form the Cartesian product of the borrower and loan relations. Then, we select those tuples that pertain to the same loan-number, followed by the projection of the resulting customer-name, loan-number, and amount:

Π customer-name, loan.loan-number, amount (σ borrower.loan-number = loan.loan-number (borrower × loan))

The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. It is denoted by the "join" symbol (⋈). The natural-join operation forms a Cartesian product of its two arguments, performs a selection forcing equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes.

Outer Join
The outer-join operation is an extension of the join operation to deal with missing information. Suppose that we have relations with the following schemas, which contain data on full-time employees:

employee (employee-name, street, city)
ft-works (employee-name, branch-name, salary)

Suppose that we want to generate a single relation with all the information (street, city, branch name, and salary) about full-time employees. A possible approach would be to use the natural join operation as follows:

employee ⋈ ft-works

[Figure: the result of employee ⋈ ft-works]

We can use the outer-join operation to avoid this loss of information. There are actually three forms of the operation: left outer join (⟕), right outer join (⟖), and full outer join (⟗). All three forms of outer join compute the join, and add extra tuples to the result of the join.
The left outer join (⟕) takes all tuples in the left relation that did not match with any tuple in the right relation, pads the tuples with null values for all other attributes from the right relation, and adds them to the result of the natural join. The tuple (Smith, Revolver, Death Valley, null, null) is such a tuple. All information from the left relation is present in the result of the left outer join.

[Figure: the result of employee ⟕ ft-works]

The right outer join (⟖) is symmetric with the left outer join: it pads tuples from the right relation that did not match any from the left relation with nulls, and adds them to the result of the natural join. The tuple (Gates, null, null, Redmond, 5300) is such a tuple. Thus, all information from the right relation is present in the result of the right outer join.

[Figure: the result of employee ⟖ ft-works]

The full outer join (⟗) does both of those operations, padding tuples from the left relation that did not match any from the right relation, as well as tuples from the right relation that did not match any from the left relation, and adding them to the result of the join.

[Figure: the result of employee ⟗ ft-works]

Renaming Tables and Columns
Example: the table E (for Employee)
nr  name   dept
1   Bill   A
2   Sarah  C
3   John   A

Example: the table D (for Department)
nr  name
A   Marketing
B   Sales
C   Legal

We want to join these tables, but:
• Several columns in the result will have the same name (nr and name).
• How do we express the join condition, when there are two columns called nr?

Solutions
• Rename the attributes, using the rename operator.
• Keep the names, and prefix them with the table name, as is done in SQL. (This is somewhat unorthodox.)

Result:
enr  ename  dept  dnr  dname
1    Bill   A     A    Marketing
2    Sarah  C     C    Legal
3    John   A     A    Marketing

Relational algebra:

(ρ(enr, ename, dept)(E)) ⋈ dept = dnr (ρ(dnr, dname)(D))

You can use another variant of the renaming operator to change the name of a table, for example to change the name of E to R. This is necessary when joining a table with itself (see below).

ρ R(E)

A third variant lets you rename both the table and the columns:

ρ R(enr, ename, dept)(E)
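In SQL the three outer joins can be written as follows. This is a sketch in standard SQL syntax; some systems require an explicit on or using clause instead of natural:

select * from employee natural left outer join ft_works;
-- keeps the unmatched left tuple (Smith, ..., null, null)

select * from employee natural right outer join ft_works;
-- keeps the unmatched right tuple (Gates: null, null, ..., 5300)

select * from employee natural full outer join ft_works;
-- keeps unmatched tuples from both sides

-- the rename operator ρ corresponds to an alias in SQL:
select * from employee as e1;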
Points to Ponder
• Aggregate functions take a collection of values and return a single value as a result.
• Usually, a query that involves a Cartesian product includes a selection operation on the result of the Cartesian product.

Review Terms
• Aggregate functions
• Joins
• Natural join
• Outer join
• Right outer join
• Left outer join

Student Activity
1. Define aggregate functions with examples.
4. Differentiate between left outer join and right outer join with the help of examples.
5. Define rename operators.
LESSON 9: E-R MODEL I
Lesson Objectives
• Understanding entities
• Understanding relationships
• Understanding attributes, domains, entity sets
• Understanding simple and composite attributes
• Understanding derived attributes
• Understanding relationship sets
• Components of E-R diagrams
• Designing E-R diagrams

The E-R model considers the real world to consist of entities and relationships among them. An entity is a 'thing' which can be distinctly identified, for example a person, a car, a subroutine, a wire, an event.
A relationship is an association among entities. For example, 'person OWNS car' is an association between a person and a car, and 'person EATS dish IN place' is an association among a person, a dish and a place.

Attribute, Value, Domain, Entity Set
The information about one entity is expressed by a set of (attribute, value) pairs. For example, a car model could be:

Name = R1222
Power = 7.3
Nseats = 5

Values of attributes belong to different value sets or domains; for example, for a car, Nseats is an integer between 1 and 12.
Entities defined by the same set of attributes can be grouped into an entity set (abbreviated as ESet), as shown in:

ESET: CarModel
Name   Power  Nseats
R1222  7.3    5
HZ893  6.8    5
R1293  5.4    4

An Entity Set
A given set of attributes may be referred to as an entity type. All entities in a given ESet are of the same type, but sometimes there can be more than one set of the same type. The set of all persons who are customers at a given bank can be defined as an entity set customer. The individual entities that constitute a set are said to be the extension of the entity set. So all the individual bank customers are the extension of the entity set customer.
Each entity has a value for each of its attributes. For each attribute, there is a set of permitted values called the domain or value set.

Simple and Composite Attributes
A simple attribute has one value for its attribute, while a composite attribute is one which can be divided into sub-parts. For example, an attribute name can be divided into first name, middle name and last name.

Single and Multivalued Attributes
An attribute which has only one value is known as a single-valued attribute. For example, the loan_no attribute will have only one loan number. There may be cases when an attribute has a set of values for a specific entity. For example, an attribute phone_no may have zero, one or several phone numbers. This is known as a multivalued attribute.

Derived Attribute
Its value is derived from the values of other related attributes or entities. For example, an attribute age can be calculated from another attribute date_of_birth.

Relationship, Relationship Set
A relationship is an association among several entities. For example, a customer A is associated with loan number L1. A relationship set is a subset of the cartesian product of entity sets. For example, a relationship set (abbreviated as RSet) on the relationship 'Person HAS_EATEN Dish IN Place' could be:

RSet 'Person HAS_EATEN Dish IN Place'
Person  Dish     Place
Steve   Duck     Kuala Lumpur
Weiren  Duck     Beijing
Paolo   Noodles  Naples
Mike    Fondue   Geneva
Paolo   Duck     Beijing

Notice that an RSet is an ESet having ESets as attributes.

Components of E-R Diagrams
The overall logical structure of a database can be expressed graphically by an E-R diagram. The various components of an E-R diagram are as follows:
1. Rectangles, which represent entity sets.
2. Ellipses, which represent attributes.
3. Diamonds, which represent relationship sets.
4. Lines, which link attributes to entity sets and entity sets to relationship sets.
5. Double ellipses, which represent multivalued attributes.
6. Dashed ellipses, which represent derived attributes.
7. Double lines, which indicate total participation of an entity in a relationship set.
Peter Chen developed ERDs in 1976. Since then, Charles Bachman and James Martin have added some slight refinements to the basic ERD principles.

Entity
An entity is an object or concept about which you want to store information.

Weak Entity
A weak entity is dependent on another entity to exist.

Attributes
Attributes are the properties or characteristics of an entity.

Key Attribute
A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute.

Multivalued Attribute
A multivalued attribute can have more than one value. For example, an employee entity can have multiple skill values.

Derived Attribute
A derived attribute is based on another attribute. For example, an employee's monthly salary is based on the employee's annual salary.

Relationships
Relationships illustrate how two entities share information in the database structure.

Weak Relationship
To connect a weak entity with others, you should use a weak relationship notation.

Cardinality
Cardinality specifies how many instances of an entity relate to one instance of another entity.

Recursive Relationship
In some cases, entities can be self-linked. For example, employees can supervise other employees.

Mapping Cardinalities
These express the number of entities to which another entity can be associated via a relationship set. They can be of the following types:
1. One to one: an entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
2. One to many: an entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A.
3. Many to one: an entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A.
4. Many to many: an entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A.
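In relational terms, a one-to-many mapping is usually enforced with a foreign key on the "many" side. A minimal sketch in SQL with invented names, where one department is associated with many employees but each employee with at most one department:

create table department
( dept_id char(3) primary key,
  dept_name varchar(30) );

-- many employee rows may carry the same dept_id (one-to-many),
-- but each employee row references at most one department
create table employee
( emp_id char(5) primary key,
  emp_name varchar(30),
  dept_id char(3) references department );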
[Figure: a one-to-many relationship]

Points to Ponder
• An entity is a 'thing' which can be distinctly identified, for example a person, a car, a subroutine, a wire, an event.
• A relationship is an association among entities, e.g. person OWNS car.
• A given set of attributes may be referred to as an entity type.
• A simple attribute has one value for its attribute; a composite attribute is one which can be divided into sub-parts.
• An attribute whose value is derived from the values of other related attributes or entities is known as a derived attribute.
• A relationship is an association among several entities.
• A relationship set is a subset of the cartesian product of entity sets.

Review Terms
• Entity
• Entity set
• Attribute
• Domain
• Value
• Relationship
• Relationship set
• Cardinality
• Association

Student Activity
1. Define entity, domain, value.
5. Differentiate between single and multi-valued attributes.
6. Define cardinality. Explain various kinds of cardinality.
7. Define various components of an E-R diagram.
LESSON 10: E-R MODEL II
3. Examine relationships between entities closely. Are they
DATABASE
An Entity Relationship Diagram
Methodology necessary? Are there any relationships missing? Eliminate
any redundant relationships. Don’t connect relationships to
Identify the roles, events, locations, tangible things
1. Identify Entities or concepts about which the end-users want to each other.
store data.
Find the natural associations between pairs of
4. Use colors to highlight important portions of your
MANAGEMENT
2. Find Relationships diagram.
entities using a relationship matrix.
Put entities in rectangles and relationships on line
3. Draw Rough ERD
segments connecting the entities.
Determine the number of occurrences of one entity
4. Fill in Cardinality
for a single occurrence of the related entity.
Identify the data attribute(s) that uniquely identify
5. Define Primary Keys
one and only one occurrence of each entity.
Eliminate Many-to-Many relationships and include
6. Draw Key-Based ERD
primary and foreign keys in each entity.
Name the information details (fields) which are
7. Identify Attributes
essential to the system under development.
For each attribute, match it with exactly one entity
8. Map Attributes
that it describes.
Adjust the ERD from step 6 to account for entities
9. Draw fully attributed ERD
or relationships discovered in step 8.
Does the final Entity Relationship Diagram
10. Check Results
accurately depict the system data?
A Simple Example
A company has several departments. Each department has a
supervisor and at least one employee. Employees must be
assigned to at least one, but possibly more departments. At
least one employee is assigned to a project, but an employee
may be on vacation and not assigned to any projects. The
important data fields are the names of the departments,
projects, supervisors and employees, as well as the supervisor
and employee number and a unique project number.
1. Identify Entities
The entities in this system are Department, Employee, Supervi-
sor and Project. One is tempted to make Company an entity,
but it is a false entity because it has only one instance in this
problem. True entities must have more than one instance.
2. Find Relationships
We construct the following Entity Relationship Matrix:
             Department   Employee      Supervisor   Project
Department   -            is assigned   run by       -
Employee     belongs to   -             -            works on
Supervisor   runs         -             -            -
Project      -            uses          -            -
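Carrying this example through step 6 (the key-based ERD) would produce tables like the following. This is a sketch only: the key and name columns are assumptions invented for illustration, and the many-to-many relationships (Employee/Department and Employee/Project) each become their own table:

create table supervisor
(supervisor_no number(4) primary key,
supervisor_name varchar2(30));

create table department
(dept_no number(4) primary key,
dept_name varchar2(30),
supervisor_no number(4) references supervisor);  -- each department is run by a supervisor

create table employee
(employee_no number(4) primary key,
employee_name varchar2(30));

-- Employees may belong to several departments: a many-to-many
-- relationship, so it gets its own table with a composite key.
create table emp_dept
(employee_no number(4) references employee,
dept_no number(4) references department,
primary key (employee_no, dept_no));

create table project
(project_no number(4) primary key,
project_name varchar2(30));

-- A project uses one or more employees; an employee may be on
-- zero or more projects: again a separate relationship table.
create table emp_project
(employee_no number(4) references employee,
project_no number(4) references project,
primary key (employee_no, project_no));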
Figure: E-R diagram with composite, multivalued, and derived attributes

Student Activity
1. Explain the difference between primary key, candidate key, and super key.
Student Notes
LESSON 11:
STRUCTURED QUERY LANGUAGE (SQL) I
Lesson Objective
• Understanding SQL
• Understanding DDL, DML
• Creating tables
• Selecting data
• Constraints
• Drop
• Insert
• Select with conditions

SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. SQL statements are used to perform tasks such as updating data in a database or retrieving data from a database. Some common relational database management systems that use SQL are Oracle, Sybase, Microsoft SQL Server, Access, and Ingres. Although most database systems use SQL, most of them also have their own additional proprietary extensions that are usually used only on their own system. However, the standard SQL commands such as "Select", "Insert", "Update", "Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs to do with a database.
The SQL language has several parts:
• Data-definition language (DDL). The SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas.
• Interactive data-manipulation language (DML). The SQL DML includes a query language based on both the relational algebra and the tuple relational calculus. It also includes commands to insert tuples into, delete tuples from, and modify tuples in the database.
• View definition. The SQL DDL includes commands for defining views.
• Transaction control. SQL includes commands for specifying the beginning and ending of transactions.
• Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages, such as C, C++, Java, PL/I, Cobol, Pascal, and Fortran.
• Integrity. The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.

A relational database stores its data in tables. Tables are uniquely identified by their names and are comprised of columns and rows. Columns contain the column name, data type, and any other attributes for the column. Rows contain the records or data for the columns. Here is a sample table called "weather"; city, state, high, and low are the columns, and the rows contain the data for this table:

Weather
city         state       high  low
Phoenix      Arizona     105   90
Tucson       Arizona     101   92
Flagstaff    Arizona     88    69
San Diego    California  77    60
Albuquerque  New Mexico  80    72

Data-Definition Language
The set of relations in a database must be specified to the system by means of a data definition language (DDL).
The SQL DDL allows specification of not only a set of relations, but also information about each relation, including
• The schema for each relation
• The domain of values associated with each attribute
• The integrity constraints
• The set of indices to be maintained for each relation
• The security and authorization information for each relation
• The physical storage structure of each relation on disk

Creating Tables
The create table statement is used to create a new table. Here is the format of a simple create table statement:
create table "tablename"
("column1" "data type",
"column2" "data type",
"column3" "data type");
Format of create table if you were to use optional constraints:
create table "tablename"
("column1" "data type" [constraint],
"column2" "data type" [constraint],
"column3" "data type" [constraint]);
[] = optional
Example
create table employee
(first varchar(15),
last varchar(20),
age number(3),
address varchar(30),
city varchar(20),
state varchar(20));

To create a new table, enter the keywords create table followed by the table name, an open parenthesis, the first column name, the data type for that column, any optional constraints, and so on, ending with a closing parenthesis. It is important to use an open parenthesis before the first column definition and a closing parenthesis after the end of the last column definition. Make sure you separate each column definition with a comma. All SQL statements should end with a ";".
The table and column names must start with a letter and can be followed by letters, numbers, or underscores, not to exceed a total of 30 characters in length. Do not use any SQL reserved keywords as names for tables or columns (such as "select", "create", "insert", etc.).
Data types specify what type of data a particular column can hold. If a column called "Last_Name" is to be used to hold names, then that particular column should have a "varchar" (variable-length character) data type.
Here are the most common data types:

char(size)      Fixed-length character string. Size is specified in parenthesis. Max 255 bytes.
varchar(size)   Variable-length character string. Max size is specified in parenthesis.
number(size)    Number value with a max number of column digits specified in parenthesis.
date            Date value.
number(size,d)  Number value with a maximum number of digits of "size" total, with a maximum number of "d" digits to the right of the decimal.

Constraints
When tables are created, it is common for one or more columns to have constraints associated with them. A constraint is basically a rule associated with a column that the data entered into that column must follow. For example, a "unique" constraint specifies that no two records can have the same value in a particular column; they must all be unique. The other two most popular constraints are "not null", which specifies that a column can't be left blank, and "primary key". A "primary key" constraint defines a unique identification of each record (or row) in a table.
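Putting those three rules together, here is a hedged sketch of a create table statement that combines them; the customers table and its columns are illustrative, not prescribed by this lesson:

create table customers
(customerid number(6) primary key,   -- uniquely identifies each row
lastname varchar(20) not null,       -- may never be left blank
firstname varchar(20) not null,
email varchar(40) unique);           -- no two rows may share a value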
It's now time for you to design and create your own table. If you decide to change or redesign the table, you can either drop it and recreate it, or you can create a completely different one.
Students Activity
You have just started a new company. It is time to hire some employees. You will need to create a table that will contain the following information about your new employees: firstname, lastname, title, age, and salary.

Drop a Table
The drop table command is used to delete a table and all rows in the table.
To delete an entire table including all of its rows, issue the drop table command followed by the table name. drop table is different from deleting all of the records in the table: deleting all of the records in the table leaves the table, including its column and constraint information, whereas dropping the table removes the table definition as well as all of its rows.
drop table "tablename"
Example
drop table myemployees;

Inserting into a Table
The insert statement is used to insert or add a row of data into the table.
To insert records into a table, enter the keywords insert into followed by the table name, an open parenthesis, a list of column names separated by commas, a closing parenthesis, the keyword values, and the list of values enclosed in parenthesis. The values that you enter will be held in the rows, and they will match up with the column names that you specify. Strings should be enclosed in single quotes, and numbers should not.
insert into "tablename"
(first_column,...last_column)
values (first_value,...last_value);
In the example below, the column name first will match up with the value 'Luke', and the column name state will match up with the value 'Georgia'.
Example
insert into employee
(first, last, age, address, city, state)
values ('Luke', 'Duke', 45, '2130 Boars Nest', 'Hazard Co', 'Georgia');
Note: All strings should be enclosed between single quotes: 'string'

Students Activity
1. Insert data into your new employee table. Your first employees are:
Jonie Weber, Secretary, 28, 19500.00
Potsy Weber, Programmer, 32, 45300.00
Dirk Smith, Programmer II, 45, 75020.00
Enter these employees into your table first, and then insert at least 5 more of your own list of employees in the table.
Selecting Data
The select statement is used to query the database and retrieve selected data that match the criteria that you specify. Here is the format of a simple select statement:
select "column1"
[,"column2", etc]
from "tablename"
[where "condition"];
[] = optional
The column names that follow the select keyword determine which columns will be returned in the results. You can select as many column names as you'd like, or you can use a "*" to select all columns.
The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results.
The where clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where.
Conditional selections used in the where clause:
=     Equal
>     Greater than
<     Less than
>=    Greater than or equal
<=    Less than or equal
<>    Not equal to
LIKE  *See note below
*Note: LIKE is a pattern-matching operator used in the conditional selection of the where clause. The percent sign "%" can be used as a wild card to match any possible character that might appear before or after the characters specified. For example:
select first, last, city
from empinfo
where first LIKE 'Er%';
This SQL statement will match any first names that start with 'Er'. Strings must be in single quotes.
Or you can specify,
select first, last
from empinfo
where last LIKE '%s';
This statement will match any last names that end in an 's'.
select * from empinfo
where first = 'Eric';
This will only select rows where the first name equals 'Eric' exactly.

Sample Table: empinfo
first      last      id     age  city        state
John       Jones     99980  45   Payson      Arizona
Mary       Jones     99982  25   Payson      Arizona
Eric       Edwards   88232  32   San Diego   California
Mary Ann   Edwards   88233  32   Phoenix     Arizona
Ginger     Howell    98002  42   Cottonwood  Arizona
Sebastian  Smith     92001  23   Gila Bend   Arizona
Gus        Gray      22322  35   Bagdad      Arizona
Mary Ann   May       32326  52   Tucson      Arizona
Erica      Williams  32327  60   Show Low    Arizona
Leroy      Brown     32380  22   Pinetop     Arizona
Elroy      Cleaver   32382  22   Globe       Arizona

Some more examples:
select first, last, city from empinfo;
select last, city, age from empinfo
where age > 30;
select first, last, city, state from empinfo
where first LIKE 'J%';
select * from empinfo;
select first, last from empinfo
where last LIKE '%s';
select first, last, age from empinfo
where last LIKE '%illia%';
select * from empinfo where first = 'Eric';
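As a worked check, here is the age filter from the examples above traced against the sample empinfo data; the commented rows are what the query should return (row order may vary, since no sorting is requested):

select last, city, age from empinfo
where age > 30;
-- Expected rows from the sample data:
-- Jones     Payson      45
-- Edwards   San Diego   32
-- Edwards   Phoenix     32
-- Howell    Cottonwood  42
-- Gray      Bagdad      35
-- May       Tucson      52
-- Williams  Show Low    60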
Student Activity
1. Display the first name and age for everyone that's in the table.
2. Display the first name, last name, and city for everyone that's not from Payson.
3. Display all columns for everyone that is over 40 years old.
4. Display the first and last names for everyone whose last name ends in "ay".
5. Display all columns for everyone whose first name equals "Mary".
7. Select all columns for everyone whose last name ends in "ith".

Points to Ponder
• SQL is used to communicate with a database.
• The set of relations in a database must be specified to the system by means of a data definition language (DDL).
• The create table statement is used to create a new table.
• A constraint is basically a rule associated with a column that the data entered into that column must follow.
• The drop table command is used to delete a table and all rows in the table.
• The select statement is used to query the database and retrieve selected data that match the criteria that you specify.
• The insert statement is used to insert or add a row of data into the table.
• The LIKE pattern-matching operator can also be used in the conditional selection of the where clause.

Review Terms
• SQL
• DDL
• DML
• Constraints
• Insert
• Create
• Select
• Drop
• Like
Student Notes

LESSON 12:
LAB

LESSON 13:
LAB
LESSON 14:
SQL-II
Lesson Objective
• Elaborating the Select statement
• Rename operator
• Aggregate functions
• Having
• Order-by
• Group-by
• IN & Between

The SELECT statement is the core of SQL, and it is likely that the vast majority of your SQL commands will be SELECT statements. There is an enormous number of options available for the SELECT statement. When constructing SQL queries with the SELECT statement, it is very useful to know all of the possible options and the best or most efficient way to do things.

The Rename Operation
SQL provides a mechanism for renaming both relations and attributes. It uses the as clause, taking the form:
old-name as new-name
Example
select first as name, last, city as emp_city from empinfo

Select Statement
The SELECT statement is used to query the database and retrieve selected data that match the criteria that you specify. The SELECT statement has five main clauses to choose from, although FROM is the only required clause. Each of the clauses has a vast selection of options, parameters, etc. Here is the format of the Select statement:
Select [All | Distinct] column1 [,column2]
From table1 [,table2]
[Where "conditions"]
[Group By "column-list"]
[Having "conditions"]
[Order By "column-list" [Asc | Desc]]

Select and From Clause Review
Select first_column_name, second_column_name
From table_name
Where first_column_name > 1000;
*The column names that follow the SELECT keyword determine which columns will be returned in the results. You can select as many column names as you'd like, or you can use a * to select all columns. The order they are specified in will be the order that they are returned in your query results.
*The table name that follows the keyword FROM specifies the table that will be queried to retrieve the desired results.
*The WHERE clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where.
Example
SELECT name, age, salary FROM employee WHERE age > 50;
The above statement will select all of the values in the name, age, and salary columns from the employee table whose age is greater than 50.

Comparison Operators
=         Equal
>         Greater than
<         Less than
>=        Greater than or equal to
<=        Less than or equal to
<> or !=  Not equal to
LIKE      String comparison test
*Note about LIKE:
Example
Select name, title, dept
From employee
Where title Like 'Pro%';
The above statement will select all of the rows/values in the name, title, and dept columns from the employee table whose title starts with 'Pro'. This may return job titles including Programmer or Pro-wrestler.
All and Distinct are keywords used to select either All (the default) or the "distinct" or unique records in your query results. If you would like to retrieve just the unique records in specified columns, you can use the "Distinct" keyword. Distinct will discard the duplicate records for the columns you specified after the "SELECT" statement. For example:
Select Distinct age FROM employee_info;
This statement will return all of the unique ages in the employee_info table.
ALL will display "all" of the specified columns including all of the duplicates. The ALL keyword is the default if nothing is specified.
Note: The following two tables we will be using:
Items_ordered
customerid  order_date   item        quantity  price
10330       30-Jun-1999  Pogo stick  1         28.00
10101       30-Jun-1999  Raft        1         58.00
10298       01-Jul-1999  Skateboard  1         33.00

Students Activity
2. Select all columns from the items_ordered table for whoever purchased a Tent.
Aggregate functions are used to compute against a "returned column of numeric data" from your SELECT statement.
Example
SELECT Count(*) FROM employees;
This particular statement is slightly different from the other aggregate functions, since there isn't a column supplied to the count function. This statement will return the number of rows in the employees table.

Students Activity
1. Select the maximum price of any item ordered in the items_ordered table. Hint: Select the maximum price only.
2. Select the average price of all of the items ordered that were purchased in the month of Dec.
3. What is the total number of rows in the items_ordered table?
4. For all of the tents that were ordered in the items_ordered table, what is the price of the lowest tent? Hint: Your query should return the price only.

Group by Clause
The GROUP BY clause will gather all of the rows together that contain data in the specified column(s) and will allow aggregate functions to be performed on one or more columns. This can best be explained by an example.
GROUP BY clause syntax:
SELECT column1, SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list";
Let's say you would like to retrieve a list of the highest paid salaries in each dept:
SELECT max(salary), dept
FROM employee
GROUP BY dept;
This statement will select the maximum salary for the people in each unique department. Basically, the salary for the person who makes the most in each department will be displayed. Their salary and their department will be returned.
For example, take a look at the items_ordered table. Let's say you want to group everything of quantity 1 together, everything of quantity 2 together, everything of quantity 3 together, etc. If you would like to determine what the largest cost item is for each grouped quantity (all quantity 1's, all quantity 2's, all quantity 3's, etc.), you would enter:
SELECT quantity, max(price)
FROM items_ordered
GROUP BY quantity;
Enter the statement above, and take a look at the results to see if it returned what you were expecting. Verify that the maximum price in each quantity group is really the maximum price.

GROUP BY - Multiple Grouping Columns - What if?
What if you ALSO want to display the lastname for the query below:
SELECT max(salary), dept
FROM employee
GROUP BY dept;
What you'll need to do is:
SELECT lastname, max(salary), dept
FROM employee
GROUP BY dept, lastname;
This is called "multiple grouping columns".

Students Activity
1. How many people are in each unique state in the customers table? Select the state and display the number of people in each. Hint: count is used to count rows in a column; sum works on numeric data only.
2. From the items_ordered table, select the item, maximum price, and minimum price for each specific item in the table. Hint: The items will need to be broken up into separate groups.
Having Clause
The HAVING clause allows you to specify conditions on the rows for each group; in other words, which rows should be selected will be based on the conditions you specify. The HAVING clause should follow the GROUP BY clause if you are going to use it.
HAVING clause syntax:
SELECT column1, SUM(column2)
FROM "list-of-tables"
GROUP BY "column-list"
HAVING "condition";
HAVING can best be described by example. Let's say you have an employee table containing the employee's name, department, salary, and age. If you would like to select the average salary for the employees in each department, you could enter:
SELECT dept, avg(salary)
FROM employee
GROUP BY dept;
But, let's say that you want to ONLY calculate and display the average if the average salary is over 20000:
SELECT dept, avg(salary)
FROM employee
GROUP BY dept
HAVING avg(salary) > 20000;

Students Activity
1. How many people are in each unique state in the customers table that have more than one person in the state? Select the state and display how many people are in each, if it's greater than 1.
2. From the items_ordered table, select the item, maximum price, and minimum price for each specific item in the table. Only display the results if the maximum price for one of the items is greater than 190.00.
3. How many orders did each customer make? Use the items_ordered table. Select the customerid, the number of orders they made, and the sum of their orders.

Order by Clause
ORDER BY is an optional clause which will allow you to display the results of your query in a sorted order (either ascending order or descending order) based on the columns that you specify to order by.
ORDER BY clause syntax:
SELECT column1, SUM(column2)
FROM "list-of-tables"
ORDER BY "column-list" [ASC | DESC];
[] = optional
ASC = Ascending Order (the default)
DESC = Descending Order
For example:
SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'
ORDER BY salary;
This statement will select the employee_id, dept, name, age, and salary from the employee_info table where the dept equals 'Sales', and will list the results in ascending (default) order based on salary.
If you would like to order based on multiple columns, you must separate the columns with commas. For example:
SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = 'Sales'
ORDER BY salary, age DESC;
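The clauses from this lesson can also appear together in one query. The following is a hedged sketch against the items_ordered table (the exact rows returned depend on data not fully reproduced here); read top to bottom, it filters rows, forms groups, filters the groups, then sorts what survives:

SELECT item, max(price), min(price)
FROM items_ordered
WHERE quantity = 1              -- filter individual rows first
GROUP BY item                   -- then form one group per item
HAVING max(price) > 20.00       -- then filter whole groups
ORDER BY item;                  -- finally sort the surviving groups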
Students Activity
1. Select the lastname, firstname, and city for all customers in the customers table. Display the results in ascending order based on the lastname.
2. Same thing as exercise #1, but display the results in descending order.
3. Select the item and price for all of the items in the items_ordered table where the price is greater than 10.00. Display the results in ascending order based on the price.

Combining Conditions and Boolean Operators
The AND operator can be used to join two or more conditions in the WHERE clause. Both sides of the AND condition must be true in order for the condition to be met and for those rows to be displayed.
SELECT column1, SUM(column2)
FROM "list-of-tables"
WHERE "condition1" AND "condition2";
The OR operator can also be used to join two or more conditions in the WHERE clause. However, either side of the OR operator can be true and the condition will be met; hence, the rows will be displayed. With the OR operator, either side can be true, or both sides can be true.
For example:
SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE salary >= 50000.00 AND title = 'Programmer';
This statement will select the employeeid, firstname, lastname, title, and salary from the employee_info table where the salary is greater than or equal to 50000.00 AND the title is equal to 'Programmer'. Both of these conditions must be true in order for the rows to be returned in the query. If either is false, the row will not be displayed.
Although they are not required, you can use parentheses around your conditional expressions to make them easier to read:
SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE (salary >= 50000.00) AND (title = 'Programmer');
Another Example
SELECT firstname, lastname, title, salary
FROM employee_info
WHERE (title = 'Sales') OR (title = 'Programmer');
This statement will select the firstname, lastname, title, and salary from the employee_info table where the title is either equal to 'Sales' OR the title is equal to 'Programmer'.

Students Activity
1. Select the customerid, order_date, and item from the items_ordered table for all items unless they are 'Snow Shoes' or 'Ear Muffs'. Display the rows as long as they are not either of these two items.
2. Select the item and price of all items that start with the letters 'S', 'P', or 'F'.

In and Between Conditional Operators
SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 IN (list-of-values);
SELECT col1, SUM(col2)
FROM "list-of-tables"
WHERE col3 BETWEEN value1 AND value2;
The IN conditional operator is really a set membership test operator. That is, it is used to test whether or not a value (stated before the keyword IN) is "in" the list of values provided after the keyword IN.
For example:
SELECT employeeid, lastname, salary
FROM employee_info
WHERE lastname IN (list-of-values);
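BETWEEN works the same way for a range instead of a list. A small sketch against the items_ordered table (the exact rows returned depend on data not fully reproduced here):

SELECT item, price
FROM items_ordered
WHERE price BETWEEN 28.00 AND 58.00;   -- inclusive on both ends
-- Equivalent to: WHERE price >= 28.00 AND price <= 58.00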
Students Activity
2. Select the firstname, city, and state from the customers table for all of the rows where the state value is either: Arizona, Washington, Oklahoma, Colorado, or Hawaii.
Points to Ponder
• SQL provides a rename operator for both relations and attributes.
• Aggregate functions are used to compute against a "returned column of numeric data" from your SELECT statement.
• The GROUP BY clause will gather all of the rows together that contain data in the specified column(s) and will allow aggregate functions to be performed on one or more columns.
• The HAVING clause allows you to specify conditions on the rows for each group.
• ORDER BY is an optional clause which will allow you to display the results of your query in a sorted order.

Review Terms
• Rename operator
• Aggregate function
• Having
• Order-by
• Group-by
Student Notes

LESSON 15:
LAB

LESSON 16:
LAB

LESSON 17:
SQL-III
Lesson Objective
• Table joins
• Tuple variable
• String operators
• Set operators
• Views
• Update
• Delete

Table Joins
All of the queries up until this point have been useful, with the exception of one major limitation: you've been selecting from only one table at a time with your SELECT statement. It is time to introduce you to one of the most beneficial features of SQL and relational database systems: the "Join". To put it simply, the "Join" makes relational database systems "relational".
Joins allow you to link data from two or more tables together into a single query result, from one single SELECT statement. A "Join" can be recognized in a SQL SELECT statement if it has more than one table after the FROM keyword.
For example:
SELECT "list-of-columns"
FROM table1, table2
WHERE "search-condition(s)"
Joins can be explained more easily by demonstrating what would happen if you worked with one table only and didn't have the ability to use "joins". This single-table database is also sometimes referred to as a "flat table". Let's say you have a one-table database that is used to keep track of all of your customers and what they purchase from your store. Every time a new row is inserted into the table, all columns will be updated, thus resulting in unnecessary "redundant data". For example, every time Wolfgang Schultz purchases something, the following rows will be inserted into the table:

id     first     last     address         city  state  zip    date    item       price
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  032299  snowboard  45.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  091199  gloves     15.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  100999  lantern    35.00
10982  Wolfgang  Schultz  300 N. 1st Ave  Yuma  AZ     85002  022900  tent       85.00

An ideal database would have two tables:
1. One for keeping track of your customers
2. And the other to keep track of what they purchase:
"Customer_info" table:
customer_number  firstname  lastname  address  city  state  zip
"Purchases" table:
customer_number  date  item  price
Now, whenever a purchase is made by a repeating customer, only the second table, "Purchases", needs to be updated! We've just eliminated useless redundant data; that is, we've just normalized this database!
Notice how each of the tables has a common "customer_number" column. This column, which contains the unique customer number, will be used to JOIN the two tables. Using the two new tables, let's say you would like to select the customer's name and the items they've purchased. Here is an example of a join statement to accomplish this:
SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info, purchases
WHERE customer_info.customer_number = purchases.customer_number;
This particular "Join" is known as an "Inner Join" or "Equijoin". This is the most common type of "Join" that you will see or use.
Notice that each of the columns is always preceded with the table name and a period. This isn't always required; however, it is good practice so that you won't confuse which columns go with which tables. It is required if the column names are the same between the two tables. I recommend preceding all of your columns with the table names when using joins.
Note: The syntax described above will work with most database systems. However, in the event that it doesn't work with yours, please check your specific database documentation.
Although the above will probably work, here is the ANSI SQL-92 syntax specification for an Inner Join using the preceding statement, which you might want to try:
SELECT customer_info.firstname, customer_info.lastname, purchases.item
FROM customer_info INNER JOIN purchases
ON customer_info.customer_number = purchases.customer_number;
Another Example
SELECT employee_info.employeeid, employee_info.lastname, employee_sales.comission
FROM employee_info, employee_sales
WHERE employee_info.employeeid = employee_sales.employeeid;
This statement will select the employeeid, lastname (from the employee_info table), and the comission value (from the employee_sales table) for all of the rows where the employeeid in the employee_info table matches the employeeid in the employee_sales table.
Students Activity
1. Write a query using a join to determine which items were ordered by each of the customers in the customers table. Select the customerid, firstname, lastname, order_date, item, and price for everything each customer purchased.

String Operations
SQL specifies strings by enclosing them in single quotes, for example, 'Perryridge', as we saw earlier. A single quote character that is part of a string can be specified by using two single quote characters; for example, the string "It's right" can be specified by 'It''s right'.
The result of a join consists of the attributes of the left-hand-side relation followed by the attributes of the right-hand-side relation. Note that the attribute loan-number appears twice in the result of the join with the on condition: the first occurrence is from loan, and the second is from borrower. The SQL standard does not require attribute names in such results to be unique. An as clause should be used to assign unique names to attributes in query and subquery results.
We rename the result relation of a join, and the attributes of the result relation, by using an as clause, as illustrated here:
loan inner join borrower on loan.loan-number = borrower.loan-number
as lb(loan-number, branch, amount, cust, cust-loan-num)
We rename the second occurrence of loan-number to cust-loan-num. The ordering of the attributes in the result of the join is important for the renaming.
Next, we consider an example of the left outer join operation:
loan left outer join borrower on loan.loan-number = borrower.loan-number

The result of loan inner join borrower on loan.loan-number = borrower.loan-number:
Loan number  Branch name  Amount  Customer name  Loan number
L-170        Downtown     3000    Jones          L-170
L-230        Redwood      4000    Smith          L-230

The result of loan left outer join borrower on loan.loan-number = borrower.loan-number:
Loan number  Branch name  Amount  Customer name  Loan number
L-170        Downtown     3000    Jones          L-170
L-230        Redwood      4000    Smith          L-230
L-260        Perryridge   1700    null           null

We can compute the left outer join operation logically as follows. First, compute the result of the inner join as before. Then, for every tuple t in the left-hand-side relation loan that does not match any tuple in the right-hand-side relation borrower in the inner join, add a tuple r to the result of the join: the attributes of tuple r that are derived from the left-hand-side relation are filled in with the values from tuple t, and the remaining attributes of r are filled with null values. The tuples (L-170, Downtown, 3000) and (L-230, Redwood, 4000) join with tuples from borrower and appear in the result of the inner join, and hence in the result of the left outer join. On the other hand, the tuple (L-260, Perryridge, 1700) did not match any tuple from borrower in the inner join, and hence a tuple (L-260, Perryridge, 1700, null, null) is present in the result of the left outer join.
Finally, we consider an example of the natural join operation:
loan natural inner join borrower
This expression computes the natural join of the two relations. The only attribute name common to loan and borrower is loan-number, and it appears only once in the result of the natural join, whereas it appears twice in the result of the join with the on condition.

The result of loan natural inner join borrower:
Loan number  Branch name  Amount  Customer name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith

Join Types and Conditions
We saw examples of the join operations permitted in SQL. Join operations take two relations and return another relation as the result. Although outer-join expressions are typically used in the from clause, they can be used anywhere that a relation can be used.
Each of the variants of the join operations in SQL consists of a join type and a join condition. The join condition defines which tuples in the two relations match and what attributes are present in the result of the join. The join type defines how tuples in each relation that do not match any tuple in the other relation are treated.
Join types: inner join, left outer join, right outer join, full outer join.
The first join type is the inner join, and the other three are the outer joins. Of the three join conditions, we have seen the natural join and the on condition before, and we shall discuss the using condition later in this section.
The use of a join condition is mandatory for outer joins, but is optional for inner joins (if it is omitted, a Cartesian product results). Syntactically, the keyword natural appears before the join type, as illustrated earlier, whereas the on and using conditions appear at the end of the join expression. The keywords inner and outer are optional, since the rest of the join type enables us to deduce whether the join is an inner join or an outer join.
The meaning of the join condition natural, in terms of which tuples from the two relations match, is straightforward. The ordering of the attributes in the result of a natural join is as follows: the join attributes (that is, the attributes common to both relations) appear first, in the order in which they appear in the left-hand-side relation; next come all nonjoin attributes of the left-hand-side relation, and finally all nonjoin attributes of the right-hand-side relation.
The right outer join is symmetric to the left outer join. Tuples from the right-hand-side relation that do not match any tuple in the left-hand-side relation are padded with nulls and are added to the result of the right outer join.
Here is an example of combining the natural join condition with the right outer join type:
loan natural right outer join borrower
The attributes of the result are defined by the join type, which is a natural join; hence, loan-number appears only once. The first two tuples in the result are from the inner natural join of loan and borrower. The tuple (Hayes, L-155) from the right-hand-side relation does not match any tuple from the left-hand-side relation loan in the natural inner join. Hence, the tuple (L-155, null, null, Hayes) appears in the join result.
The join condition using (A1, A2, ..., An) is similar to the natural join condition, except that the join attributes are the attributes A1, A2, ..., An, rather than all attributes that are common to both relations. The attributes A1, A2, ..., An must consist of only attributes that are common to both relations, and they appear only once in the result of the join.
The full outer join is a combination of the left and right outer-join types. After the operation computes the result of the inner join, it extends with nulls those tuples from the left-hand-side relation that did not match any tuple in the right-hand-side relation and adds them to the result; similarly, it extends with nulls those tuples from the right-hand-side relation that did not match any tuple in the left-hand-side relation and adds them to the result.
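The loan/borrower expressions above are written in the textbook's shorthand. As a hedged sketch, the left outer join example reads as follows when written as a complete SQL query (the table and column names follow the discussion, with hyphens replaced by underscores so the names are legal identifiers):

SELECT loan.loan_number, branch_name, amount, customer_name
FROM loan LEFT OUTER JOIN borrower
ON loan.loan_number = borrower.loan_number;
-- (L-260, Perryridge, 1700) has no matching borrower tuple,
-- so it appears padded with nulls, as in the result table shown earlier.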
Review Terms
• Table joins
• Tuple variable
• String operators
• Set operators
• Views
• Update
• Delete

Student Notes

LESSON 18:
LAB

LESSON 19:
LAB
LESSON 20:
SQL-IV
Consider the set of all customers who have an account at the bank, which can be derived by
select customer-name
from depositor
and the set of customers who have a loan at the bank, which can be derived by
select customer-name
from borrower
We shall refer to the relations obtained as the result of the preceding queries as d and b, respectively.

The Union Operation
To find all customers having a loan, an account, or both at the bank, we write
(select customer-name
from depositor)
union
(select customer-name
from borrower)
The union operation automatically eliminates duplicates, unlike the select clause. Thus, in the preceding query, if a customer, say Jones, has several accounts or loans (or both) at the bank, then Jones will appear only once in the result.
If we want to retain all duplicates, we must write union all in place of union:
(select customer-name
from depositor)
union all
(select customer-name
from borrower)
The number of duplicate tuples in the result is equal to the total number of duplicates that appear in both d and b. Thus, if Jones has three accounts and two loans at the bank, then there will be five tuples with the name Jones in the result.

The Intersect Operation
To find all customers who have both a loan and an account at the bank, we write
(select distinct customer-name
from depositor)
intersect
(select distinct customer-name
from borrower)
The intersect operation automatically eliminates duplicates. If we want to retain all duplicates, we must write intersect all in place of intersect:
(select customer-name
from depositor)
intersect all
(select customer-name
from borrower)
The number of duplicate tuples that appear in the result is equal to the minimum number of duplicates in both d and b. Thus, if Jones has three accounts and two loans at the bank, then there will be two tuples with the name Jones in the result.

The Except Operation
To find all customers who have an account but no loan at the bank, we write
(select distinct customer-name
from depositor)
except
(select customer-name
from borrower)
The except operation automatically eliminates duplicates. Thus, in the preceding query, a tuple with customer name Jones will appear (exactly once) in the result only if Jones has an account at the bank, but has no loan at the bank.
If we want to retain all duplicates, we must write except all in place of except:
(select customer-name
from depositor)
except all
(select customer-name
from borrower)
The number of duplicate copies of a tuple in the result is equal to the number of duplicate copies of the tuple in d minus the number of duplicate copies of the tuple in b, provided that the difference is positive. Thus, if Jones has three accounts and one loan at the bank, then there will be two tuples with the name Jones in the result. If, instead, this customer has two accounts and three loans at the bank, there will be no tuple with the name Jones in the result.
64
[and|or “column”
DATABASE
Views
We define a view in SQL by using the create view command. To OPERATOR “value”];
define a view, we must give the view a name and must state the
[] = optional
query that computes the view. The form of the create view
command is Examples
update phone_book
MANAGEMENT
create view v as <query expression>
where <query expression> is any legal query expression. The set area_code = 623
view name is repre-sented by v. where prefix = 979;
As an example, consider the view consisting of branch names update phone_book
and the names of customers who have either an account or a set last_name = ‘Smith’, prefix=555, suffix=9292
loan at that branch. Assume that we want this view to be called
where last_name = ‘Jones’;
all-customer. We define this view as follows:
update employee
create view all-customer as
set age = age+1
(select branch-name, customer-name
where first_name=’Mary’ and last_name=’Williams’;
from depositor, account
where depositor.account-number = account.account-
Students Activity
number) 1. Jonie Weber just got married to Bob Williams. She has
requested that her last name be updated to Weber-Williams.
union
(select branch-name, customer-name
from borrower, loan
where borrower. loan-number = loan.loan-number)
The attribute names of a view can be specified explicitly as
follows:
create view branch-total-Ioan(branch-name, total-loan)
as 2. Dirk Smith’s birthday is today, add 1 to his age.
select branch-name, sum(amount)
from loan
group by branch-name
The preceding view gives for each branch the sum of the
amounts of all the loans at the branch. Since the expression
sum (amount) does not have a name, the attribute name is
specified explicitly in the view definition.
3. All secretaries are now called “Administrative Assistant”.
View names may appear in any place that a relation name may
Update all titles accordingly.
appear. Using the view all-customer, we can find all customers
of the Perryridge branch by writing
select customer-name
from all-customer
where branch-name = ‘Perryridge’
Modification of the Database
Update
The update statement is used to update or change records that 4. Everyone that’s making under 30000 are to receive a 3500 a
match a specified criteria. This is accomplished by carefully year raise.
constructing a where clause.
update “tablename”
set “columnname” =
“newvalue”
[,”nextcolumn” =
“newvalue2”...]
where “columnname”
OPERATOR “value”
65
5. Everyone that’s making over 33500 are to receive a 4500 a
DATABASE
Students Activity
year raise. 1. Jonie Weber-Williams just quit, remove her record from the
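Delete
A minimal sketch of the companion delete statement, in the same style as the update syntax above and reusing the phone_book table from the update examples; the statement removes whole rows that match the where clause:

delete from "tablename"
where "columnname" OPERATOR "value"
[and|or "column" OPERATOR "value"];

Example
delete from phone_book
where last_name = 'Jones';
-- Removes every row whose last_name is 'Jones'; a delete with no
-- where clause would remove all rows but leave the table definition.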
Students Activity
1. Jonie Weber-Williams just quit; remove her record from the table.
LESSON 21:
LAB

LESSON 22:
LAB

LESSON 23:
INTEGRITY AND SECURITY
Lesson Objective
• Data constraints
• Column level constraints
• Table level constraints
• NULL value
• Primary key
• Unique key
• Default value
• Foreign key
• NOT NULL constraints
• Check constraints

Data Constraints
Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. Thus, integrity constraints guard against accidental damage to the database.
Besides the cell name, cell length and cell data type, there are other parameters, i.e. other data constraints, that can be passed to the DBA at cell creation time.
These data constraints will be connected to a cell by the DBA as flags. Whenever a user attempts to load a cell with data, the DBA will check the data being loaded into the cell against the data constraints defined at the time the cell was created. If the data being loaded fails any of the data constraint checks fired by the DBA, the DBA will not load the data into the cell, will reject the entered record, and will flash an error message to the user.
These constraints are given a constraint name, and the DBA stores the constraints with their names and instructions internally, along with the cell itself.
A constraint can be placed either at the column level or at the table level.

Column Level Constraints
If the constraints are defined along with the column definition, they are called column level constraints. A column level constraint can be applied to one column at a time, i.e. it is local to a specific column. If the constraint spans multiple columns, the user will have to use table level constraints.

Table Level Constraints
If the data constraint attached to a specific cell in a table references the contents of another cell in the table, then the user will have to use table level constraints. Table level constraints are stored as a part of the global table definition.
Examples of the different constraints that can be applied on a table are as follows:

Null Value Concepts
While creating tables, if a row lacks a data value for a particular column, that value is said to be null. Columns of any data type may contain null values unless the column was defined as not null when the table was created.
Principles of Null Values
• Setting a null value is appropriate when the actual value is unknown, or when a value would not be meaningful.
• A null value is not equivalent to a value of zero.
• A null value will evaluate to null in any expression, e.g. null multiplied by 10 is null.
• When a column is defined as not null, that column becomes a mandatory column. It implies that the user is forced to enter data into that column.
Example: Create table client_master with a not null constraint on columns client_no, name, address1, address2.
NOT NULL as a column constraint:
CREATE TABLE client_master
(client_no varchar2(6) NOT NULL,
name varchar2(20) NOT NULL,
address1 varchar2(30) NOT NULL,
address2 varchar2(30) NOT NULL,
city varchar2(15), state varchar2(15), pincode number(6),
remarks varchar2(60), bal_due number(10,2));

Primary Key Concepts
A primary key is one or more columns in a table used to uniquely identify each row in the table. Primary key values must not be null and must be unique across the column.
A multicolumn primary key is called a composite primary key. The only function that a primary key performs is to uniquely identify a row, so if one column is used it is just as good as if multiple columns are used. Multiple columns (i.e. composite keys) are used only when the system being designed requires a primary key that cannot be contained in a single column.
Examples
Primary Key as a column constraint:
Create client_master where client_no is the primary key.
CREATE TABLE client_master
(client_no varchar2(6) PRIMARY KEY,
name varchar2(20), address1 varchar2(30), address2 varchar2(30),
city varchar2(15), state varchar2(15), pincode number(6),
remarks varchar2(60), bal_due number(10,2));
Primary Key as a table constraint:
Create a sales_order_details table where the combination of s_order_no and product_no serves as the primary key, as shown in the table-constraint example below.
FOREIGN KEY as a Table Constraint
Create Table sales_order_details
(s_order_no varchar2(6),
product_no varchar2(6),
qty_ordered number(8), qty_disp number(8),
product_rate number(8,2),
Primary Key (s_order_no, product_no),
Foreign Key (s_order_no) References sales_order);

Check Integrity Constraints
Use the Check constraint when you need to enforce integrity rules that can be evaluated based on a logical expression. Never use Check constraints if the constraint can be defined using the not null, primary key or foreign key constraint.
Following are a few examples of appropriate CHECK constraints:
• a Check constraint on the client_no column of client_master so that every client_no value starts with 'C'.
• a Check constraint on the name column of client_master so that the name is entered in upper case.
• a Check constraint on the city column of client_master so that only the cities "Bombay", "New Delhi", "Madras" and "Calcutta" are allowed.
Create Table client_master
(client_no varchar2(6) Constraint ck_clientno
Check (client_no like 'C%'),
name varchar2(20) Constraint ck_cname
Check (name = upper(name)),
address1 varchar2(30), address2 varchar2(30),
city varchar2(15) Constraint ck_city
Check (city In ('New Delhi', 'Bombay', 'Calcutta', 'Madras')),
state varchar2(15), pincode number(6),
remarks varchar2(60), bal_due number(10,2));
When using CHECK constraints, consider the ANSI/ISO standard, which states that a CHECK constraint is violated only if the condition evaluates to False; True and unknown values do not violate a check condition. Therefore, make sure that a CHECK constraint that you define actually enforces the rule you need to enforce.
For example, consider the following CHECK constraint for the emp table:
CHECK (sal > 0 or comm >= 0)
At first glance, this rule may be interpreted as "do not allow a row in the emp table unless the employee's salary is greater than 0 or the employee's commission is greater than or equal to 0". However, note that if a row is inserted with a null salary and a negative commission, the row does not violate the CHECK constraint because the entire check condition is evaluated as unknown. In this particular case, you can account for such violations by placing a not null integrity constraint on both the sal and comm columns.
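To see the unknown case concretely, here is a hedged sketch, assuming an emp table with nullable sal and comm columns, an empno column (an illustrative assumption), and the constraint above in place:

Insert Into emp (empno, sal, comm)
Values (1001, null, -300);
-- sal > 0 evaluates to unknown (sal is null) and comm >= 0 is false,
-- so the whole check condition is unknown, not false: the row is accepted.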
Check with not Null Integrity Constraints
According to the ANSI/ISO standard, a not null integrity constraint is an example of a CHECK integrity constraint, where the condition is CHECK (column_name IS NOT NULL). Therefore, the not null integrity constraint for a single column can, in practice, be written in two forms: by using the not null constraint or by using a CHECK constraint. For ease of use, you should always choose to define the not null integrity constraint instead of a CHECK constraint with the IS NOT NULL condition.
Here we shall look at the method via which data constraints can be attached to the cell, so that data validation can be done at the table level itself using the power of the DBA.
A constraint clause restricts the range of valid values for one column (a column constraint) or for a group of columns (a table constraint). Any INSERT, UPDATE or DELETE statement evaluates a relevant constraint; the constraint must be satisfied for the statement to succeed.
Constraints can be connected to a table by the CREATE TABLE or ALTER TABLE command. Use ALTER TABLE to add or drop a constraint from a table. Constraints are recorded in the data dictionary. If you don't name a constraint, it is assigned the name SYS_Cn, where n is an integer that makes the name unique in the database.

Restrictions on Check Constraints
A Check integrity constraint requires that a condition be true or unknown for every row of the table. If a statement causes the condition to evaluate to false, the statement is rolled back. The condition of a CHECK constraint has the following limitations:
• The condition must be a Boolean expression that can be evaluated using the values in the row being inserted or updated.
• The condition cannot contain subqueries or sequences.
• The condition cannot include the Sysdate, Uid, User or Userenv SQL functions.

Defining Different Constraints on the Table
Create a sales_order_details table where:

Column Name   Data Type  Size  Attributes
s_order_no    varchar2   6     Primary Key; Foreign Key references s_order_no of the sales_order table
product_no    varchar2   6     Primary Key; Foreign Key references product_no of the product_master table
qty_ordered   number     8     not null
qty_disp      number     8
product_rate  number     8,2   not null

Create Table sales_order_details
(s_order_no varchar2(6) Constraint order_fkey
References sales_order,
product_no varchar2(6) Constraint product_fkey
References product_master,
qty_ordered number(8) Not Null,
qty_disp number(8),
product_rate number(8,2) Not Null,
Primary Key (s_order_no, product_no));

Defining Integrity Constraints in the Alter Table Command
You can also define integrity constraints using the constraint clause in the Alter Table command. The following examples show the definitions of several integrity constraints:
1. Add a Primary Key constraint on column supplier_no in table supplier_master:
Alter Table supplier_master
Add Primary Key (supplier_no);
2. Add a Foreign Key constraint on column s_order_no in table sales_order_details referencing table sales_order, and modify column qty_ordered to include a Not Null constraint:
Alter Table sales_order_details
Add Constraint order_fkey
Foreign Key (s_order_no) References sales_order
Modify (qty_ordered number(8) Not Null);

Dropping Integrity Constraints in the Alter Table Command
You can drop an integrity constraint if the rule that it enforces is no longer true or if the constraint is no longer needed. Drop the constraint using the Alter Table command with the Drop clause. The following examples illustrate the dropping of integrity constraints:
1. Drop the Primary Key constraint from supplier_master:
Alter Table supplier_master
Drop Primary Key;
2. Drop the Foreign Key constraint on column product_no in table sales_order_details:
Alter Table sales_order_details
Drop Constraint product_fkey;
Note: Dropping Unique and Primary Key constraints drops the associated indexes.

Points to Ponder
• Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency.
• If the constraints are defined along with the column definition, they are called column level constraints.
• If the data constraint attached to a specific cell in a table references the contents of another cell in the table, then the user will have to use table level constraints.
• While creating tables, if a row lacks a data value for a particular column, that value is said to be null.
• A primary key is one or more columns in a table used to uniquely identify each row.
• A unique key is similar to a primary key, except that the purpose of a unique key is to ensure that information in the column for each record is unique.
• When the user is loading a record with values and leaves a cell empty, the DBA will automatically load this cell with the default value specified.
• Foreign keys represent relationships between tables. A foreign key is a column (or a group of columns) whose values are derived from the primary key of the same or some other table.
• Use the CHECK constraint when you need to enforce integrity rules that can be evaluated based on a logical expression.
• PL/SQL is Oracle's procedural language extension to SQL. PL/SQL enables you to mix SQL statements with procedural constructs.
• Procedures, functions, and packages are all examples of PL/SQL program units.
• Database triggers are procedures that are stored in the database and are implicitly executed (fired) when the contents of a table are changed.

Review Terms
• Integrity constraints
• Table level constraints
• Column level constraints
• Check constraints
• NULL constraints
• Primary key
• Foreign key
• Unique key

Students Activity
1. What is a constraint? How many kinds of constraints are there?
2. Define primary key, unique key, and foreign key.
3. Define check integrity constraints.
4. Differentiate between table level and column level constraints.
5. Define the Not Null constraint with the help of an example.

Student Notes

LESSON 24:
LAB

LESSON 25:
LAB

LESSON 26:
PL/SQL
Lesson objectives
• Difference between SQL & PL/SQL
• Stored procedures
• Packages
• Functions

PL/SQL is Oracle's procedural language extension to SQL. PL/SQL enables you to mix SQL statements with procedural constructs. With PL/SQL, you can define and execute PL/SQL program units such as procedures, functions, and packages. PL/SQL program units generally are categorized as anonymous blocks and stored procedures.
An anonymous block is a PL/SQL block that appears within your application; it is not named or stored in the database. In many applications, PL/SQL blocks can appear wherever SQL statements can appear.
A stored procedure is a PL/SQL block that Oracle stores in the database and that can be called by name from an application. When you create a stored procedure, Oracle parses the procedure and stores its parsed representation in the database. Oracle also allows you to create and store functions (which are similar to procedures) and packages (which are groups of procedures and functions).

An Introduction to Stored Procedures and Packages
Oracle allows you to access and manipulate database information using procedural schema objects called PL/SQL program units. Procedures, functions, and packages are all examples of PL/SQL program units.
PL/SQL is Oracle's procedural language extension to SQL. It extends SQL with flow control and other statements that make it possible to write complex programs in it. The PL/SQL engine is the tool you use to define, compile, and execute PL/SQL program units. This engine is a special component of many Oracle products, including Oracle Server.
While many Oracle products have PL/SQL components, this chapter specifically covers the procedures and packages that can be stored in an Oracle database and processed using the Oracle Server PL/SQL engine. The PL/SQL capabilities of each Oracle tool are described in the appropriate tool's documentation.

Stored Procedures and Functions
Procedures and functions are schema objects that logically group a set of SQL and other PL/SQL programming language statements together to perform a specific task. Procedures and functions are created in a user's schema and stored in a database for continued use. You can execute a procedure or function interactively using an Oracle tool, such as SQL*Plus, or call it explicitly in the code of a database application, such as an Oracle Forms or Precompiler application, or in the code of another procedure or trigger.

Packages
A package is a group of related procedures and functions, together with the cursors and variables they use, stored together in the database for continued use as a unit. Similar to standalone procedures and functions, packaged procedures and functions can be called explicitly by applications or users.

Procedures and Functions
A procedure or function is a schema object that consists of a set of SQL statements and other PL/SQL constructs, grouped together, stored in the database, and executed as a unit to solve a specific problem or perform a set of related tasks. Procedures and functions permit the caller to provide parameters that can be input only, output only, or input and output values. Procedures and functions allow you to combine the ease and flexibility of SQL with the procedural functionality of a structured programming language.
For example, the following statement creates the Credit_Account procedure, which credits money to a bank account:
Create Procedure credit_account
(acct Number, credit Number) AS
/* This procedure accepts two arguments: an account
number and an amount of money to credit to the specified
account. If the specified account does not exist, a
new account is created. */
old_balance Number;
new_balance Number;
Begin
Select balance Into old_balance From accounts
Where acct_id = acct
For Update Of balance;
new_balance := old_balance + credit;
Update accounts Set balance = new_balance
Where acct_id = acct;
Commit;
Exception
When No_Data_Found Then
Insert Into accounts (acct_id, balance)
Values (acct, credit);
When Others Then
Rollback;
End credit_account;
Notice that the Credit_Account procedure includes both SQL and PL/SQL statements.
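Once created, the procedure can be invoked by name; a minimal sketch of calling it from SQL*Plus follows (the account number and amount are made-up values):
Begin
/* credit 500 to account 7831, creating the account if necessary */
credit_account(7831, 500);
End;
/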
Procedure Guidelines
Use the following guidelines to design and use all stored procedures:
• Define procedures to complete a single, focused task. Do not define long procedures with several distinct subtasks, because subtasks common to many procedures might be duplicated unnecessarily in the code of several procedures.
• Do not define procedures that duplicate the functionality already provided by other features of Oracle. For example, do not define procedures to enforce simple data integrity rules that you could easily enforce using declarative integrity constraints.

Benefits of Procedures
Procedures provide advantages in the following areas.

Security
Stored procedures can help enforce data security. You can restrict the database operations that users can perform by allowing them to access data only through procedures and functions.
For example, you can grant users access to a procedure that updates a table, but not grant them access to the table itself. When a user invokes the procedure, the procedure executes with the privileges of the procedure's owner. Users who have only the privilege to execute the procedure (but not the privileges to query, update, or delete from the underlying tables) can invoke the procedure, but they cannot manipulate table data in any other way.

Performance
Stored procedures can improve database performance in several ways:
• The amount of information that must be sent over a network is small compared with issuing individual SQL statements or sending the text of an entire PL/SQL block to Oracle, because the information is sent only once and thereafter invoked when it is used.
• A procedure's compiled form is readily available in the database, so no compilation is required at execution time.
• If the procedure is already present in the shared pool of the SGA, retrieval from disk is not required, and execution can begin immediately.

Memory Allocation
Because stored procedures take advantage of the shared memory capabilities of Oracle, only a single copy of the procedure needs to be loaded into memory for execution by multiple users. Sharing the same code among many users results in a substantial reduction in Oracle memory requirements for applications.

Productivity
Stored procedures increase development productivity. By designing applications around a common set of procedures, you can avoid redundant coding and increase your productivity.
For example, procedures can be written to insert, update, or delete rows from the EMP table. These procedures can then be called by any application without rewriting the SQL statements necessary to accomplish these tasks. If the methods of data management change, only the procedures need to be modified, not all of the applications that use the procedures.

Integrity
Stored procedures improve the integrity and consistency of your applications. By developing all of your applications around a common group of procedures, you can reduce the likelihood of committing coding errors.
For example, you can test a procedure or function to guarantee that it returns an accurate result and, once it is verified, reuse it in any number of applications without testing it again. If the data structures referenced by the procedure are altered in any way, only the procedure needs to be recompiled; applications that call the procedure do not necessarily require any modifications.

Anonymous PL/SQL Blocks vs. Stored Procedures
A stored procedure is created and stored in the database as a schema object. Once created and compiled, it is a named object that can be executed without recompiling. Additionally, dependency information is stored in the data dictionary to guarantee the validity of each stored procedure.
As an alternative to a stored procedure, you can create an anonymous PL/SQL block by sending an unnamed PL/SQL block to Oracle Server from an Oracle tool or an application. Oracle compiles the PL/SQL block and places the compiled version in the shared pool of the SGA, but does not store the source code or compiled version in the database for reuse beyond the current instance. Shared SQL allows anonymous PL/SQL blocks in the shared pool to be reused and shared until they are flushed out of the shared pool.
In either case, by moving PL/SQL blocks out of a database application and into database procedures stored either in the database or in memory, you avoid unnecessary procedure recompilations by Oracle at runtime, improving the overall performance of the application and Oracle.

External Procedures
A PL/SQL procedure executing on an Oracle8 Server can call an external procedure or function that is written in the C programming language and stored in a shared library. The C routine executes in a separate address space from that of the Oracle Server.

Packages
Packages encapsulate related procedures, functions, and associated cursors and variables together as a unit in the database.
You create a package in two parts: the specification and the body. A package's specification declares all public constructs of the package and the body defines all constructs (public and private) of the package. This separation of the two parts provides the following advantages:
• The developer has more flexibility in the development cycle. You can create specifications and reference public procedures without actually creating the package body.
• You can alter procedure bodies contained within the package body separately from their publicly declared specifications in the package specification. As long as the procedure specification does not change, objects that reference the altered procedures of the package are never marked invalid; that is, they are never marked as needing recompilation.
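For contrast, the same credit operation can be issued as an anonymous block; a minimal sketch (unnamed, compiled on the fly, not stored; it assumes the accounts table used earlier and a made-up account number):
Declare
acct_no Number := 7831; /* hypothetical account */
Begin
Update accounts Set balance = balance + 500
Where acct_id = acct_no;
Commit;
End;
/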
The following example creates the specification and body for a package that contains several procedures and functions that process banking transactions.

Create Package bank_transactions AS
minimum_balance Constant Number := 100.00;
Procedure apply_transactions;
Procedure enter_transaction (acct Number, kind Char,
amount Number);
END bank_transactions;
/
Create Package Body bank_transactions AS
/* Package to input bank transactions */
new_status CHAR(20); /* Global variable to record status of
transaction being applied. Used for update in
Apply_Transactions. */
Procedure do_journal_entry (acct Number, kind CHAR) IS
/* Records a journal entry for each bank transaction applied
by the Apply_Transactions procedure. */
Begin
Insert Into journal
Values (acct, kind, sysdate);
IF kind = 'D' THEN
new_status := 'Debit applied';
ELSIF kind = 'C' THEN
new_status := 'Credit applied';
ELSE
new_status := 'New account';
END IF;
END do_journal_entry;
Procedure credit_account (acct Number, credit Number) IS
/* Credits a bank account the specified amount. If the account
does not exist, the procedure creates a new account first. */
old_balance NUMBER;
new_balance NUMBER;
BEGIN
SELECT balance INTO old_balance FROM accounts
WHERE acct_id = acct
FOR UPDATE OF balance; /* Locks account for credit update */
new_balance := old_balance + credit;
UPDATE accounts SET balance = new_balance
WHERE acct_id = acct;
do_journal_entry(acct, 'C');
EXCEPTION
WHEN NO_DATA_FOUND THEN /* Create new account if not found */
INSERT INTO accounts (acct_id, balance)
VALUES (acct, credit);
do_journal_entry(acct, 'N');
WHEN OTHERS THEN /* Return other errors to application */
new_status := 'Error: ' || SQLERRM(SQLCODE);
END credit_account;
PROCEDURE debit_account (acct NUMBER, debit NUMBER) IS
/* Debits an existing account if the result is greater than the
allowed minimum balance. */
old_balance NUMBER;
new_balance NUMBER;
insufficient_funds EXCEPTION;
BEGIN
SELECT balance INTO old_balance FROM accounts
WHERE acct_id = acct
FOR UPDATE OF balance;
new_balance := old_balance - debit;
IF new_balance >= minimum_balance THEN
UPDATE accounts SET balance = new_balance
WHERE acct_id = acct;
do_journal_entry(acct, 'D');
ELSE
RAISE insufficient_funds;
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN
new_status := 'Nonexistent account';
WHEN insufficient_funds THEN
new_status := 'Insufficient funds';
WHEN OTHERS THEN /* Returns other errors to application */
new_status := 'Error: ' || SQLERRM(SQLCODE);
END debit_account;
PROCEDURE apply_transactions IS
/* Applies pending transactions in the table TRANSACTIONS
to the ACCOUNTS table. Used at regular intervals to update bank
accounts without interfering with input of new transactions. */
/* Cursor fetches and locks all rows from the TRANSACTIONS
table with a status of 'Pending'. Locks released after all pending
transactions have been applied. */
CURSOR trans_cursor IS
SELECT acct_id, kind, amount FROM transactions
WHERE status = 'Pending'
ORDER BY time_tag
FOR UPDATE OF status;
BEGIN
FOR trans IN trans_cursor LOOP /* implicit open and fetch */
IF trans.kind = 'D' THEN
debit_account(trans.acct_id, trans.amount);
ELSIF trans.kind = 'C' THEN
credit_account(trans.acct_id, trans.amount);
ELSE
new_status := 'Rejected';
END IF;
/* Update TRANSACTIONS table to return result of applying
this transaction. */
UPDATE transactions SET status = new_status
WHERE CURRENT OF trans_cursor;
END LOOP;
COMMIT; /* Release row locks in TRANSACTIONS table. */
END apply_transactions;
PROCEDURE enter_transaction (acct NUMBER, kind CHAR,
amount NUMBER) IS
/* Enters a bank transaction into the TRANSACTIONS table.
A new transaction is always put into this 'queue' before being
applied to the specified account by the APPLY_TRANSACTIONS
procedure. Therefore, many transactions can be simultaneously
input without interference. */
BEGIN
INSERT INTO transactions
VALUES (acct, kind, amount, 'Pending', sysdate);
/* remainder of this procedure reconstructed: queue the
transaction as 'Pending' and commit */
COMMIT;
END enter_transaction;
END bank_transactions;
/

How Oracle Stores Procedures and Packages
When you create a procedure or package, Oracle
• compiles the procedure or package
• stores the compiled code in memory
• stores the procedure or package in the database
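The data dictionary records the state of each stored unit. A quick check from SQL*Plus might look like this (USER_OBJECTS is the standard Oracle dictionary view; the type filter is only illustrative):
SELECT object_name, object_type, status
FROM user_objects
WHERE object_type IN ('PROCEDURE', 'PACKAGE', 'PACKAGE BODY');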
Student Notes
LESSON 27:
LAB
LESSON 28:
LAB
LESSON 29:
DATABASE TRIGGERS
Row Triggers
A row trigger is fired each time the table is affected by the triggering statement. For example, if an UPDATE statement updates multiple rows of a table, a row trigger is fired once for each row affected by the UPDATE statement. If a triggering statement affects no rows, a row trigger is not executed at all.
Row triggers are useful if the code in the trigger action depends on data provided by the triggering statement or rows that are affected.

Statement Triggers
A statement trigger is fired once on behalf of the triggering statement, regardless of the number of rows in the table that the triggering statement affects (even if no rows are affected). For example, if a DELETE statement deletes several rows from a table, a statement-level DELETE trigger is fired only once, regardless of how many rows are deleted from the table.
Statement triggers are useful if the code in the trigger action does not depend on the data provided by the triggering statement or the rows affected. For example, if a trigger makes a complex security check on the current time or user, or if a trigger generates a single audit record based on the type of triggering statement, a statement trigger is used.

Before vs. After Triggers
When defining a trigger, you can specify the trigger timing: whether the trigger action is to be executed before or after the triggering statement. Before and After apply to both statement and row triggers.

Before Triggers
Before triggers execute the trigger action before the triggering statement is executed. This type of trigger is commonly used in the following situations:
• When the trigger action should determine whether the triggering statement should be allowed to complete. Using a BEFORE trigger for this purpose, you can eliminate unnecessary processing of the triggering statement and its eventual rollback in cases where an exception is raised in the trigger action.
• To derive specific column values before completing a triggering INSERT or UPDATE statement.

After Triggers
AFTER triggers execute the trigger action after the triggering statement is executed. AFTER triggers are used in the following situations:
• When you want the triggering statement to complete before executing the trigger action.
• If a BEFORE trigger is already present, an AFTER trigger can perform different actions on the same triggering statement.

Combinations
Using the options listed above, you can create four types of triggers:
• Before statement trigger
Before executing the triggering statement, the trigger action is executed.
• Before row trigger
Before modifying each row affected by the triggering statement and before checking appropriate integrity constraints, the trigger action is executed provided that the trigger restriction was not violated.
• After statement trigger
After executing the triggering statement and applying any deferred integrity constraints, the trigger action is executed.
• After row trigger
After modifying each row affected by the triggering statement and possibly applying appropriate integrity constraints, the trigger action is executed for the current row provided the trigger restriction was not violated. Unlike Before row triggers, After row triggers lock rows.
You can have multiple triggers of the same type for the same statement for any given table. For example, you may have two Before statement triggers for Update statements on the EMP table. Multiple triggers of the same type permit modular installation of applications that have triggers on the same tables. Also, Oracle snapshot logs use After row triggers, so you can design your own After row trigger in addition to the Oracle-defined After row trigger.
You can create as many triggers of the preceding different types as you need for each type of DML statement (Insert, Update, or Delete). For example, suppose you have a table, SAL, and you want to know when the table is being accessed and the types of queries being issued. The example below contains a sample package and trigger that tracks this information by hour and type of action (for example, Update, Delete, or Insert) on table SAL. A global session variable, Stat.Rowcnt, is initialized to zero by a Before statement trigger. Then it is increased each time the row trigger is executed. Finally the statistical information is saved in the table Stat_Tab by the After statement trigger.

Sample Package and Trigger for SAL Table
Drop Table stat_tab;
Create Table stat_tab(utype CHAR(8),
rowcnt Integer, uhour Integer);
Create or Replace Package stat is
rowcnt Integer;
END;
/
Create Trigger bt Before Update or Delete or Insert on sal
Begin
stat.rowcnt := 0;
END;
/
Create Trigger rt Before Update or Delete or Insert on sal
for Each Row Begin
stat.rowcnt := stat.rowcnt + 1;
END;
/
Create Trigger at After Update or Delete or Insert on sal
Declare
typ CHAR(8);
hour Number;
Begin
IF updating
THEN typ := 'update'; END IF;
IF deleting THEN typ := 'delete'; END IF; /* branch reconstructed
from the description above */
IF inserting THEN typ := 'insert'; END IF; /* branch reconstructed
from the description above */
hour := TRUNC((sysdate - TRUNC(sysdate)) * 24);
/* save the statistics gathered by the row trigger; reconstructed
to match the narrative: one stat_tab row per action type and hour */
INSERT INTO stat_tab VALUES (typ, stat.rowcnt, hour);
END;
/
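To see the triggers at work, you could issue a DML statement against SAL and then inspect stat_tab; a small sketch (SAL's columns are not shown in the lesson, so the SET clause here is only illustrative):
Update sal set amount = amount;
Select * from stat_tab;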
Points to Ponder
• A trigger has three basic parts:
1. a triggering event or statement
2. a trigger restriction
3. a trigger action

Review Terms
• Database triggers
• Parts of triggers
• Types of triggers
• Executing trigger

Students Activity
1. Explain the use of database triggers? Explain its need?
6. Define the various restrictions of a trigger?
Student Notes
LESSON 30:
LAB
LESSON 31:
LAB
LESSON 32:
DATABASE CURSORS
Lesson Objectives
• Defining cursor
• Use of cursor
• Explicit cursor
• Implicit cursor
• Parametrized cursor

Whenever an SQL statement is executed, Oracle performs the following tasks:
• Reserves an area in memory called the private SQL area.
• Populates this area with the appropriate data.
• Processes the data in the memory area.
• Frees the memory area when the execution is complete.
The data that is stored is called the Active Data Set. The size of the cursor in memory is the size required to hold the number of rows in the Active Data Set.

Example
When a user fires a select statement such as
Select empno, job, salary from Employee where dept_no = 20;
the resultant data set is as follows:

Active Data Set
3456  IVAN     MANAGER  10000
3459  PRADEEP  ANALYST   7000
3446  MITA     PROGRMR   4000
3463  VIJAY    CLERK     2000
3450  ALDRIN   ACCTANT   3000

Contents of a Cursor
When a query returns multiple rows, in addition to the data held in the cursor, Oracle will also open and maintain a row pointer. Depending on user requests to view data, the row pointer will be relocated within the cursor's Active Data Set. Additionally, Oracle also maintains cursor variables loaded with the value of the total number of rows fetched from the active data set.

Use of Cursors in PL/SQL
While SQL is the natural language of the DBA, it does not have any procedural capabilities such as condition checking, looping and branching. For this, Oracle provides PL/SQL. Programmers can use it to create programs for validation and manipulation of table data. PL/SQL adds to the power of SQL and provides the user with all the functionality of a programming environment.
A PL/SQL block of code includes the procedural code for looping and branching along with the SQL statement. If the records from a record set created using a select statement are to be evaluated and processed one at a time, then the only method available is by using Explicit Cursors. Also, cursors can be used to evaluate the success of updates and deletes and the number of rows affected (Implicit Cursors).

Explicit Cursor
You can explicitly declare a cursor to process the rows individually. A cursor declared by the user is called an Explicit Cursor. For queries that return more than one row, you must declare a cursor explicitly.

Why Use an Explicit Cursor?
Cursors can be used when the user wants to process data one row at a time. For example, suppose an account has an amount debited or credited in the Accttran table. The records from the Accttran table will be fetched one at a time and updated in the Acctmast table depending upon whether the account is debited or credited.
PL/SQL raises an error if an embedded select statement retrieves more than one row. Such an error forces an abnormal termination of the PL/SQL block. Such an error can be eliminated by using a cursor.

Explicit Cursor Management
The steps involved in declaring a cursor and manipulating data in the active set are:
• Declare a cursor that specifies the SQL select statement that you want to process.
• Open the cursor.
• Fetch data from the cursor one row at a time.
• Close the cursor.
A cursor is defined in the declarative part of a PL/SQL block by naming it and specifying a query. Then, three commands are used to control the cursor: open, fetch and close.
First, initialize the cursor with the open statement. This:
• defines a private SQL area;
• executes the query associated with the cursor;
• populates the Active Data Set;
• sets the Active Data Set's row pointer to the first record.
The fetch statement retrieves the current row and advances the cursor to the next row. You can execute fetch repeatedly until all rows have been retrieved.
When the last row has been processed, close the cursor with the close statement. This will release the memory occupied by the cursor and its Data Set.
Focus: The HRD manager has decided to raise the salary for all the employees in department No. 20 by 0.05. Whenever any raise is given to the employees, a record for the same is
maintained in the emp_raise table. It includes the employee number, the date when the raise was given and the actual raise. Write a PL/SQL block to update the salary of each employee and insert a record in the emp_raise table.
The table definitions are as follows:

Table name: employee
Column name  Data Type  Size  Attributes
emp_code     varchar    10    Primary Key, via which we shall seek data in the table.
Ename        varchar    20    The first name of the candidate.
Deptno       number     5     The department number.
Job          varchar    20    Employee job details.
Sal          number     8,2   The current salary of the employee.

Table name: emp_raise
Column name  Data Type  Size  Attributes
emp_code     varchar    10    Part of a composite key via which we shall seek data in the table.
raise_date   date             The date on which the raise was given.
raise_amt    number     8,2   The raise given to the employee.

Emp_code and raise_date together form a composite primary key.

Declaring a Cursor
To do the above via a PL/SQL block it is necessary to declare a cursor and associate it with a query before referencing it in any statement within the PL/SQL block. This is because forward references to objects are not allowed in PL/SQL.
Syntax: CURSOR cursorname IS
SQL statement;
Example: DECLARE
/* Declaration of the cursor named c_emp.
The active data set will include the emp codes and
salaries of all the employees belonging
to department 20 */
cursor c_emp is
select emp_code, salary from employee
where deptno = 20;
The cursor name is not a PL/SQL variable; it is used only to reference the query. It cannot be assigned any values or be used in an expression.

Opening a Cursor
Opening the cursor executes the query and identifies the active set, which contains all the rows that meet the query search criteria.
Syntax: OPEN cursorname;
Example: DECLARE
cursor c_emp is select emp_code, salary from
employee
where deptno = 20;
BEGIN
/* Opening cursor c_emp */
open c_emp;
END;
The open statement retrieves the records from the database and places them in the cursor (private SQL area).

Fetching a Record from the Cursor
The fetch statement retrieves the rows from the active set into the variables one at a time. Each time a fetch is executed, the cursor advances to the next row in the Active Set.
One can make use of any loop structure (Loop-End Loop along with While, For, If-End If) to fetch the records from the cursor into variables one row at a time.
Syntax: FETCH cursorname INTO variable1, variable2, ...;
For each column value returned by the query associated with the cursor, there must be a corresponding variable in the into list, and their datatypes must match. These variables will be declared in the DECLARE section of the PL/SQL block.
Example: DECLARE
cursor c_emp is select emp_code, salary from
employee
where deptno = 20;
/* Declaration of memory variables that hold data
fetched from the cursor */
str_emp_code employee.emp_code%type;
num_salary employee.salary%type;
BEGIN
open c_emp;
/* infinite loop to fetch data from cursor c_emp one
row at a time */
loop
fetch c_emp into str_emp_code, num_salary;
/* Updating the salary in the employee table as
current salary + raise */
update employee set salary = num_salary +
(num_salary * .05)
where emp_code = str_emp_code;
/* insert a record in the emp_raise table */
insert into emp_raise values
(str_emp_code, sysdate, num_salary * 0.05);
end loop;
commit;
END;
Note that the current program will result in an indefinite loop as there is no exit provided from the loop. The exit from the loop can be provided using cursor variables, as explained in the section Explicit Cursor Variables. Also note that if you execute a fetch and there are no more rows left in the active data set, the values of the explicit cursor variables are indeterminate.

Closing a Cursor
The close statement disables the cursor and the active set becomes undefined. This will release the memory occupied by the cursor and its Data Set. Once a cursor is closed, the user can reopen the cursor using the open statement.
Syntax: CLOSE cursorname;
Example: DECLARE
cursor c_emp is select emp_code, salary from
employee
where deptno = 20;
str_emp_code employee.emp_code%type;
num_salary employee.salary%type;
BEGIN
open c_emp;
loop
fetch c_emp into str_emp_code, num_salary;
/* If no. of records retrieved is 0 or if all the records are
fetched then exit the loop. */
exit when c_emp%notfound;
update employee set salary = num_salary +
(num_salary * .05)
where emp_code = str_emp_code;
insert into emp_raise values
(str_emp_code, sysdate, num_salary * 0.05);
end loop;
commit;
/* Close cursor c_emp */
close c_emp;
END;

Explicit Cursor Variables
• %FOUND: is the logical opposite of %notfound. It evaluates to true, if the last fetch succeeded because a row was available; or to false, if the last fetch failed because no more rows were available.
Syntax: cursorname%Found
• %ISOPEN: evaluates to true if the cursor is open, and to false otherwise.
Syntax: cursorname%IsOpen
Example: The PL/SQL block will be rewritten as follows:
Declare
cursor c_emp is select emp_code, salary from employee
where deptno = 20;
str_emp_code employee.emp_code%type;
num_salary employee.salary%type;
Begin
open c_emp;
if c_emp%isopen then
loop
fetch c_emp into str_emp_code, num_salary;
exit when c_emp%notfound;
update employee set salary = num_salary + (num_salary * .05)
where emp_code = str_emp_code;
insert into emp_raise values
(str_emp_code, sysdate, num_salary * 0.05);
end loop;
commit;
close c_emp;
end if;
END;
A client may withdraw money from his account. This must be 'SUBTRACTED' from the amount held against that specific client's name in the ACCTMAST table. This is referred to as a 'DEBIT' type transaction.
The ACCTTRAN table must therefore hold a 'flag' that indicates whether the transaction type was 'CREDIT' or 'DEBIT'.
Based on this flag, define a cursor which will update the ACCTMAST 'Balance' field contents.
Write a PL/SQL block that updates the acctmast table and sets the balance depending upon whether the account is debited or credited. The updation should be done only for those records that are not processed, i.e. the processed flag is 'N' in the accttran table.
1. Create the tables accttran and acctmast.
A PL/SQL block for the above:
Declare
/* Opening of the block reconstructed; the table layouts
accttran(acctno, amt, trantype, processed) and
acctmast(acctno, balance) are assumptions based on the
narrative above. Cursor acc_updt fetches the unprocessed
transactions. */
cursor acc_updt is
select acctno, amt, trantype from accttran
where processed = 'N';
acctnum accttran.acctno%type;
amt accttran.amt%type;
trantype accttran.trantype%type;
Begin
open acc_updt;
if acc_updt%isopen then
loop
fetch acc_updt into acctnum, amt, trantype;
exit when acc_updt%notfound;
if trantype = 'D' then
/* if the account is debited then update the
acctmast table as balance = balance - amt */
update acctmast
set balance = (balance - amt)
where acctno = acctnum;
else
/* if the account is credited then update the
acctmast table as balance = balance + amt */
update acctmast
set balance = (balance + amt)
where acctno = acctnum;
end if;
update accttran set processed = 'Y'
where acctno = acctnum;
end loop;
close acc_updt;
end if;
END;

• %FOUND: is the logical opposite of %notfound. Note, however, that both attributes evaluate to null until they are set by an implicit or explicit cursor operation. %found evaluates to true, if an insert, update or delete affected one or more rows, or a single-row select returned one or more rows. Otherwise, it evaluates to false.
Syntax: SQL%Found
Example: The example in sql%notfound will be written as follows:
Begin
update employee set salary = salary * 0.15
where emp_code = &emp_code;
if sql%found then
dbms_output.put_line('Employee Record
Modified Successfully');
else
dbms_output.put_line('Employee No. Does not Exist');
end if;
END;
• %Rowcount: returns the number of rows affected by an insert, update or delete, or select into statement.
Example: The HRD manager has decided to raise the salary of employees working as 'Programmers' by 0.15. Write a PL/SQL block to accept the employee number and update the salary of that employee. Display an appropriate message based on the existence of the record in the employee table.
Declare
rows_affected char(4);
Begin
update employee set salary = salary * 0.15
where job = 'Programmers';
rows_affected := to_char(sql%rowcount);
if sql%rowcount > 0 then
dbms_output.put_line(rows_affected || ' employee
records modified successfully'); /* closing lines of this
block reconstructed to match the task statement */
else
dbms_output.put_line('No employee records modified');
end if;
END;

Parameterized Cursors
We should be able to pass a value to a cursor only when it is being opened. For that, the cursor must be declared in such a way that it recognizes that it will receive the requested value(s) at the time of opening the cursor. Such a cursor is known as a Parameterized Cursor.
Syntax: CURSOR cursor_name (variable_name datatype) is
select statement...;
The scope of cursor parameters is local to that cursor, which means that they can be referenced only within the query declared in the cursor declaration. The values of cursor parameters are used by the associated query when the cursor is opened.
For example:
cursor c_emp (num_deptno number) is
select job, ename from emp where deptno > num_deptno;
The parameters to a cursor can be passed in the open statement. They can either be constant values or the contents of a memory variable.
For example:
OPEN c_emp (30);
OPEN c_emp (num_deptno);
Note: The memory variable should be declared in the DECLARE section and the value should be assigned to that memory variable.
Each parameter in the declaration must have a corresponding value in the open statement. Remember that the parameters of a cursor cannot return values.

Example:
Allow insert, update and delete for the table itemmast on the basis of the table itemtran.
1. Create the following tables
Table name: itemmast
Column name  Data Type  Size  Attributes
itemid       Number     4     Primary Key
description  Varchar    20    The item description
Bal_stock    Number     3     The balance stock for an item
1. If Operation = 'I' then the record is inserted in the table itemmast. If insert is successful then the status field of itemtran table is updated to 'Successful' else it is updated as 'Item Already Exists'.
2. If Operation = 'U' then the qty against this operation column is added to the bal_stock column of the table itemmast where itemid of table itemmast is same as that of itemtran. If update is successful then the status column of itemtran table is updated to 'Successful' else it is updated as 'Item Does Not Exist'.
3. If Operation = 'D', then a row from itemmast is deleted whose itemid is equal to the itemid in the table itemtran with the operation column having the value 'D'. If delete is successful then the status column of itemtran table is updated to 'Successful' else it is updated as 'Item Does Not Exist'.
The following PL/SQL code takes care of the above three cases.
Declare
/* Cursor scantable retrieves all the records of table itemtran */
cursor scantable is
select itemid, operation, qty, description from itemtran;
/* Cursor itemchk accepts the value of itemid from the current
row of cursor scantable */
cursor itemchk(mastitemid number) is
select itemid from item_mast
where itemid = mastitemid;
/* variables that hold data from the cursor scantable */
itemidno number(4);
descrip varchar2(30);
oper char(1);
quantity number(3);
/* variable that holds data from the cursor itemchk */
dummyitem number(4);
Begin
/* open the scantable cursor */
open scantable;
loop
/* Fetch the records from the scantable cursor */
fetch scantable into itemidno, oper, quantity, descrip;
exit when scantable%notfound;
/* Open the itemchk cursor. The value of the variable passed
to the itemchk cursor is set to the value of itemid in the
current row of cursor scantable */
open itemchk(itemidno);
fetch itemchk into dummyitem;
/* if the record is not found and the operation is insert
then insert the new record and set the status to 'Successful' */
if itemchk%notfound then
if oper = 'I' then
insert into item_mast(itemid, bal_stock, description)
values(itemidno, quantity, descrip);
update itemtran
set itemtran.status = 'Successful'
where itemid = itemidno;
/* if the record is not found and the operation is update/
delete then set the status to 'Item Not Present' */
elsif oper = 'U' or oper = 'D' then
update itemtran
set itemtran.status = 'Item Not Present'
where itemid = itemidno;
end if;
else
/* if the record is found and the operation is insert then set
the status to 'Item Already Exists' */
if oper = 'I' then
update itemtran
set itemtran.status = 'Item Already Exists'
where itemid = itemidno;
/* if the record is found and the operation is update/delete
then perform the update or delete operation and
set the status to 'Successful' */
elsif oper = 'D' then
delete from item_mast where item_mast.itemid =
itemidno;
update itemtran
set itemtran.status = 'Successful'
where itemid = itemidno;
elsif oper = 'U' then
update item_mast
set item_mast.bal_stock = quantity
where itemid = itemidno;
update itemtran
set itemtran.status = 'Successful'
where itemid = itemidno;
end if;
end if;
close itemchk;
end loop;
close scantable;
commit;
END;

Points to Ponder
• Oracle uses a work area for its internal processing.
• You can explicitly declare a cursor to process the rows individually. A cursor declared by the user is called an Explicit Cursor.
• Opening the cursor executes the query and identifies the active set, which contains all the rows that meet the query search criteria.
• The fetch statement retrieves the rows from the active set to the variables one at a time.
Student Notes
LESSON 33:
LAB
LESSON 34:
LAB
LESSON 35:
NORMALISATION - I
We shall use functional dependencies in two ways:
1. To test relations to see whether they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
2. To specify constraints on the set of legal relations. We shall thus concern ourselves with only those relations that satisfy a given set of functional dependencies. If we wish to constrain ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F holds on R.

Consider the following relation r on a schema (A, B, C, D):

A   B   C   D
a1  b1  c1  d1
a1  b2  c1  d2
a2  b2  c2  d2
a2  b2  c2  d3
a3  b3  c2  d4

To see which functional dependencies are satisfied, observe that A → C is satisfied. There are two tuples that have an A value of a1. These tuples have the same C value, namely c1. Similarly, the two tuples with an A value of a2 have the same C value, c2. There are no other pairs of distinct tuples that have the same A value. The functional dependency C → A is not satisfied, however. To see that it is not, consider the tuples t1 = (a2, b2, c2, d3) and t2 = (a3, b3, c2, d4): these two tuples have the same C value, c2, but different A values.

Customer-name  Customer-street  Customer-city
Jones          Main             Harrison
Smith          North            Rye
Hayes          Main             Harrison
Curry          North            Rye
Lindsay        Park             Pittsfield
Turner         Putnam           Stamford
Williams       Nassau           Princeton
Adams          Spring           Pittsfield
Johnson        Alma             Palo Alto
Glenn          Sand Hill        Woodside
Brooks         Senator          Brooklyn
Green          Walnut           Stamford
The customer relation

Loan number  Branch name  Amount
L-17         Downtown     1000
L-23         Redwood      2000
L-15         Perryridge   1500
L-14         Downtown     1500
L-93         Mianus        500
L-11         Round Hill    900
L-29         Pownal       1200
L-16         North Town   1300
L-18         Downtown     2000
L-25         Perryridge   2500
L-10         Brighton     2200
The loan relation
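In SQL terms, one can probe whether a dependency such as A → C holds on a stored copy of r: the dependency fails exactly when some A value maps to more than one C value. A sketch (the table and column names mirror the abstract example above):
Select a
From r
Group By a
Having Count(Distinct c) > 1;
/* a non-empty result means A -> C is violated; run with
columns c and a swapped, it reports the violation of C -> A */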
• On Branch-schema:
branch-name → branch-city
branch-name → assets
• On Customer-schema:
customer-name → customer-city
Therefore, we have shown that, whenever t1 and t2 are tuples such that t1[A] = t2[A], it must be that t1[H] = t2[H]. But that is exactly the definition of A → H.
Let F be a set of functional dependencies. The closure of F, denoted by F+, is the set of all functional dependencies logically implied by F.
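A small constructed illustration (not taken from the banking schemas above): if F = {A → B, B → C} holds on a schema (A, B, C), then F+ additionally contains, among others, A → C by the transitivity rule, as well as trivial dependencies such as AB → B.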
Applying the transitivity rule to this dependency and CG → I, we infer AG → I.

Points To Ponder
• Normalisation is the process of determining the correct structure for data in files or databases.
• A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic.
• Functional dependencies play a key role in designing a good database. A functional dependency is a type of constraint that is a generalization of the notion of key.
• The bad design of a database suggests that we should decompose a relation schema that has many attributes into several schemas with fewer attributes.

Review Terms
• Normalisation
• First Normal Form
• Functional dependencies
• Closure of a set of functional dependencies

Students Activity
1. Define Normalisation?
3. Define 1NF?
4. Define functional dependency?
5. Define the closure of a set of functional dependencies?
Student Notes
LESSON 36:
NORMALISATION - II
Lesson objectives
• Decomposition
• Properties of decomposition
• Lossless join decomposition
• 2NF

Decomposition
The bad design of a database suggests that we should decompose a relation schema that has many attributes into several schemas with fewer attributes. Careless decomposition, however, may lead to another form of bad design.
Consider an alternative design in which we decompose Lending-schema into the following two schemas:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Using the lending relation of Figure 7.1, we construct our new relations branch-customer (Branch-customer-schema) and customer-loan (Customer-loan-schema):
branch-customer = Π branch-name, branch-city, assets, customer-name (lending)
customer-loan = Π customer-name, loan-number, amount (lending)
Of course, there are cases in which we need to reconstruct the loan relation. For example, suppose that we wish to find all branches that have loans with amounts less than $1000. No relation in our alternative database contains these data. We need to reconstruct the lending relation. It appears that we can do so by writing
branch-customer ⋈ customer-loan

Branch name  Branch city  Assets   Customer name
Downtown     Brooklyn     9000000  Jones
Redwood      Palo Alto    2100000  Smith
Perryridge   Horseneck    1700000  Hayes
Downtown     Brooklyn     9000000  Jackson
Mianus       Horseneck    400000   Jones
Round Hill   Horseneck    8000000  Turner
Pownal       Bennington   300000   Williams
North Town   Rye          3700000  Hayes
Downtown     Brooklyn     9000000  Johnson
Perryridge   Horseneck    1700000  Glenn
Brighton     Brooklyn     7100000  Brooks
The relation branch-customer

Customer name  Loan number  Amount
Jones          L-17         1000
Smith          L-23         2000
Hayes          L-15         1500
Jackson        L-14         1500
Jones          L-93         500
Turner         L-11         900
Williams       L-29         1200
Hayes          L-16         1300
Johnson        L-18         2000
Glenn          L-25         2500
Brooks         L-10         2200
The relation customer-loan

When we compare the result of computing branch-customer ⋈ customer-loan with the lending relation with which we started, we notice a difference: although every tuple that appears in the lending relation appears in branch-customer ⋈ customer-loan, there are tuples in branch-customer ⋈ customer-loan that are not in lending. In our example, branch-customer ⋈ customer-loan has the following additional tuples:
(Downtown, Brooklyn, 9000000, Jones, L-93, 500)
(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
(Mianus, Horseneck, 400000, Jones, L-17, 1000)
(North Town, Rye, 3700000, Hayes, L-15, 1500)
Consider the query "Find all bank branches that have made a loan in an amount less than $1000." We see that the only branches with loan amounts less than $1000 are Mianus and Round Hill. However, when we apply the expression
Π branch-name (σ amount < 1000 (branch-customer ⋈ customer-loan))
we obtain three branch names: Mianus, Round Hill, and Downtown.
A closer examination of this example shows why. If a customer happens to have several loans from different branches, we cannot tell which loan belongs to which branch. Thus, when we join branch-customer and customer-loan, we obtain not only the tuples we had originally in lending, but also several additional tuples. Although we have more tuples in branch-customer ⋈ customer-loan, we actually have less information. We are no longer able, in general, to represent in the database information about which customers are borrowers from which branch. Because of this loss of information, we call the decomposition of Lending-schema into Branch-customer-schema and Customer-loan-schema a lossy decomposition, or a lossy-join decomposition. A decomposition that is not a lossy-join decomposition is a lossless-join decomposition.
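The same spurious-tuple effect can be reproduced in SQL; a sketch (table and column names follow the schemas above, with hyphens replaced by underscores):
Select bc.branch_name
From branch_customer bc, customer_loan cl
Where bc.customer_name = cl.customer_name
And cl.amount < 1000;
/* returns Mianus and Round Hill, plus the spurious Downtown */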
Branch name  Branch city  Assets   Customer name  Loan number  Amount
Downtown     Brooklyn     9000000  Jones          L-17         1000
Downtown     Brooklyn     9000000  Jones          L-93         500
Redwood      Palo Alto    2100000  Smith          L-23         2000
Perryridge   Horseneck    1700000  Hayes          L-15         1500
Perryridge   Horseneck    1700000  Hayes          L-16         1300
Downtown     Brooklyn     9000000  Jackson        L-14         1500
Mianus       Horseneck    400000   Jones          L-17         1000
Mianus       Horseneck    400000   Jones          L-93         500
Round Hill   Horseneck    8000000  Turner         L-11         900
Pownal       Bennington   300000   Williams       L-29         1200
North Town   Rye          3700000  Hayes          L-15         1500
North Town   Rye          3700000  Hayes          L-16         1300
Downtown     Brooklyn     9000000  Johnson        L-18         2000
Perryridge   Horseneck    1700000  Glenn          L-25         2500
Brighton     Brooklyn     7100000  Brooks         L-10         2200
The relation branch-customer ⋈ customer-loan

It should be clear from our example that a lossy-join decomposition is, in general, a bad database design.
Why is the decomposition lossy? There is one attribute in common between Branch-customer-schema and Customer-loan-schema:
Branch-customer-schema ∩ Customer-loan-schema = {customer-name}
The only way that we can represent a relationship between, for example, loan-number and branch-name is through customer-name. This representation is not adequate because a customer may have several loans, yet these loans are not necessarily obtained from the same branch.
Let us consider another alternative design, in which we decompose Lending-schema into the following two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
There is one attribute in common between these two schemas:
Branch-schema ∩ Loan-info-schema = {branch-name}
Thus, the only way that we can represent a relationship between, for example, customer-name and assets is through branch-name. The difference between this example and the preceding one is that the assets of a branch are the same, regardless of the customer to which we are referring; whereas the lending branch associated with a certain loan amount does depend on the customer to which we are referring. For a given branch-name, there is exactly one assets value and exactly one branch-city; whereas a similar statement cannot be made for customer-name. That is, the functional dependency
branch-name → assets branch-city
holds, but customer-name does not functionally determine loan-number.
The notion of lossless joins is central to much of relational-database design. Therefore, we restate the preceding examples more concisely and more formally. Let R be a relation schema. A set of relation schemas {R1, R2, ..., Rn} is a decomposition of R if
R = R1 ∪ R2 ∪ ... ∪ Rn
That is, {R1, R2, ..., Rn} is a decomposition of R if, for i = 1, 2, ..., n, each Ri is a subset of R, and every attribute in R appears in at least one Ri.
Let r be a relation on schema R, and let ri = Π Ri (r) for i = 1, 2, ..., n. That is, {r1, r2, ..., rn} is the database that results from decomposing R into {R1, R2, ..., Rn}.
It is always the case that
r ⊆ r1 ⋈ r2 ⋈ ... ⋈ rn
To see that this assertion is true, consider a tuple t in relation r. When we compute the relations r1, r2, ..., rn, the tuple t gives rise to one tuple ti in each ri, i = 1, 2, ..., n. These n tuples combine to regenerate t when we compute r1 ⋈ r2 ⋈ ... ⋈ rn. The details are left for you to complete as an exercise. Therefore, every tuple in r appears in r1 ⋈ r2 ⋈ ... ⋈ rn.
In general, r ≠ r1 ⋈ r2 ⋈ ... ⋈ rn. As an illustration, consider our earlier example, in which:
• n = 2.
• R = Lending-schema.
• R1 = Branch-customer-schema.
• R2 = Customer-loan-schema.
• r = the relation shown in Figure 7.1.
• r1 = the relation shown in Figure 7.2.
• r2 = the relation shown in Figure 7.10.
• r1 ⋈ r2 = the relation shown in Figure 7.11.
To have a lossless-join decomposition, we need to impose constraints on the set of possible relations. We found that the decomposition of Lending-schema into Branch-schema and Loan-info-schema is lossless because the functional dependency
branch-name → branch-city assets
holds on Branch-schema.
We shall show how to test whether a decomposition is a lossless-join decomposition in the next few sections. A major part of this chapter deals with the questions of how to specify constraints on the database, and how to obtain lossless-join decompositions that avoid the pitfalls represented by the examples of bad database designs that we have seen in this section.

Desirable Properties of Decomposition
We can use a given set of functional dependencies in designing a relational database in which most of the undesirable properties do not occur. When we design such systems, it may become necessary to decompose a relation into several smaller relations. In this section, we outline the desirable properties of a decomposition of a relational schema. In later sections, we outline specific ways of decomposing a relational schema to get the properties we desire. We illustrate our concepts with the Lending-schema:
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
The set F of functional dependencies that we require to hold on Lending-schema are:
branch-name → branch-city assets
loan-number → amount branch-name
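A tiny constructed illustration of r ≠ r1 ⋈ r2 (not drawn from the banking figures): let r on (A, B, C) contain the tuples (1, x, p) and (2, x, q), and decompose into R1 = (A, B) and R2 = (B, C). Then r1 = {(1, x), (2, x)} and r2 = {(x, p), (x, q)}, so r1 ⋈ r2 also contains the spurious tuples (1, x, q) and (2, x, p).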
Lending-schema is an example of a bad database design. Assume that we decompose it into the following three relations:
Branch-schema = (branch-name, branch-city, assets)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
We claim that this decomposition has several desirable properties, which we discuss next.

Lossless-join Decomposition
When we decompose a relation into a number of smaller relations, it is crucial that the decomposition be lossless. We must first present a criterion for determining whether a decomposition is lossy.
Let R be a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless-join decomposition. We can use attribute closure to efficiently test for superkeys, as we have seen earlier.
We now demonstrate that our decomposition of Lending-schema is a lossless-join decomposition by showing a sequence of steps that generate the decomposition. We begin by decomposing Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Since branch-name → branch-city assets, the augmentation rule for functional dependencies implies that
branch-name → branch-name branch-city assets
Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows that our initial decomposition is a lossless-join decomposition.
Next, we decompose Loan-info-schema into
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
This step results in a lossless-join decomposition, since loan-number is a common attribute and loan-number → amount branch-name.
For the general case of decomposition of a relation into multiple parts at once, the test for lossless-join decomposition is more complicated. See the bibliographical notes for references on the topic.
While the test for binary decomposition is clearly a sufficient condition for lossless join, it is a necessary condition only if all constraints are functional dependencies. We shall see other types of constraints later (in particular, a type of constraint called multivalued dependencies) that can ensure that a decomposition is lossless join even if no functional dependencies are present.

Second Normal Form
The general requirements of 2NF are:
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign keys.
These rules can be summarized in a simple statement: 2NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in new table(s) and creating relationships between those tables.
Let's look at an example. Imagine an online store that maintains customer information in a database. Their Customers table might look something like this:

CustNum  FirstName  LastName  Address          City        State  ZIP
1        John       Doe       12 Main Street   Sea Cliff   NY     11579
2        Alan       Johnson   82 Evergreen Tr  Sea Cliff   NY     11579
3        Beth       Thompson  1912 NE 1st St   Miami       FL     33157
4        Jacob      Smith     142 Irish Way    South Bend  IN     46637
5        Sue        Ryan      412 NE 1st St    Miami       FL     33157

A brief look at this table reveals a small amount of redundant data. We're storing the "Sea Cliff, NY 11579" and "Miami, FL 33157" entries twice each. Now, that might not seem like too much added storage in our simple example, but imagine the wasted space if we had thousands of rows in our table. Additionally, if the ZIP code for Sea Cliff were to change, we'd need to make that change in many places throughout the database.
In a 2NF-compliant database structure, this redundant information is extracted and stored in a separate table. Our new table (let's call it ZIPs) might look like this:

ZIP    City        State
11579  Sea Cliff   NY
33157  Miami       FL
46637  South Bend  IN

If we want to be super-efficient, we can even fill this table in advance; the post office provides a directory of all valid ZIP codes and their city/state relationships. Surely, you've encountered a situation where this type of database was utilized. Someone taking an order might have asked you for your ZIP code first and then knew the city and state you were calling from. This type of arrangement reduces operator error and increases efficiency.
Now that we've removed the duplicative data from the Customers table, we've satisfied the first rule of second normal form. We still need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary key from the ZIPs table) to create that relationship. Here's our new Customers table:

CustNum  FirstName  LastName  Address          ZIP
1        John       Doe       12 Main Street   11579
2        Alan       Johnson   82 Evergreen Tr  11579
3        Beth       Thompson  1912 NE 1st St   33157
4        Jacob      Smith     142 Irish Way    46637
5        Sue        Ryan      412 NE 1st St    33157
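As a sketch, the 2NF design above could be declared like this (names follow the example; the column sizes and types are assumptions):
Create Table zips (
zip Char(5) Primary Key,
city Varchar2(30),
state Char(2));
Create Table customers (
custnum Number(5) Primary Key,
firstname Varchar2(20),
lastname Varchar2(20),
address Varchar2(40),
zip Char(5) References zips(zip)); /* foreign key to ZIPs */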
We've now minimized the amount of redundant information stored in the database.
4. Define 2NF with the help of an example?
Student Notes
LESSON 37:
NORMALISATION - III
misled into thinking R2 satisfies BCNF. In fact, there is a dependency AC → D in F+ (which can be inferred using the pseudotransitivity rule from the two dependencies in F), which shows that R2 is not in BCNF. Thus, we may need a dependency that is in F+, but is not in F, to show that a decomposed relation is not in BCNF.
An alternative BCNF test is sometimes easier than computing every dependency in F+. To check if a relation Ri in a decomposition of R is in BCNF, we apply this test:
• For every subset α of attributes in Ri, check that α+ (the attribute closure of α under F) either includes no attribute of Ri - α, or includes all attributes of Ri.
If the condition is violated by some set of attributes α in Ri, consider the following functional dependency, which can be shown to be present in F+:
α → (α+ - α) ∩ Ri
The above dependency shows that Ri violates BCNF, and is a "witness" for the violation. The BCNF decomposition algorithm, which we shall see in Section 7.6.2, makes use of the witness.

Third Normal Form
As we saw earlier, there are relational schemas where a BCNF decomposition cannot be dependency preserving. For such schemas, we have two alternatives if we wish to check if an update violates any functional dependencies:
• Pay the extra cost of computing joins to test for violations.
• Use an alternative decomposition, third normal form (3NF), which we present below, which makes testing of updates cheaper. Unlike BCNF, 3NF decompositions may contain some redundancy in the decomposed schema.
We shall see that it is always possible to find a lossless-join, dependency-preserving decomposition that is in 3NF. Which of the two alternatives to choose is a design decision to be made by the database designer on the basis of the application requirements.

Definition
BCNF requires that all nontrivial dependencies be of the form α → β, where α is a superkey. 3NF relaxes this constraint slightly by allowing nontrivial functional dependencies whose left side is not a superkey.
A relation schema R is in third normal form (3NF) with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is a trivial functional dependency.
• α is a superkey for R.
• Each attribute A in β - α is contained in a candidate key for R.
Note that the third condition above does not say that a single candidate key should contain all the attributes in β - α; each attribute A in β - α may be contained in a different candidate key.
The first two alternatives are the same as the two alternatives in the definition of BCNF. The third alternative of the 3NF definition seems rather unintuitive, and it is not obvious why it is useful. It represents, in some sense, a minimal relaxation of the BCNF conditions that helps ensure that every schema has a dependency-preserving decomposition into 3NF. Its purpose will become clearer later, when we study decomposition into 3NF.
Observe that any schema that satisfies BCNF also satisfies 3NF, since each of its functional dependencies would satisfy one of the first two alternatives. BCNF is therefore a more restrictive constraint than 3NF.
The definition of 3NF allows certain functional dependencies that are not allowed in BCNF. A dependency α → β that satisfies only the third alternative of the 3NF definition is not allowed in BCNF, but is allowed in 3NF.
Let us return to our Banker-schema example (Section 7.6). We have shown that this relation schema does not have a dependency-preserving, lossless-join decomposition into BCNF. This schema, however, turns out to be in 3NF. To see that it is, we note that {customer-name, branch-name} is a candidate key for Banker-schema, so the only attribute not contained in a candidate key for Banker-schema is banker-name. The only nontrivial functional dependencies of the form
α → banker-name
include {customer-name, branch-name} as part of α. Since {customer-name, branch-name} is a candidate key, these dependencies do not violate the definition of 3NF.
As an optimization when testing for 3NF, we can consider only functional dependencies in the given set F, rather than in F+. Also, we can decompose the dependencies in F so that their right-hand side consists of only single attributes, and use the resultant set in place of F.
Given a dependency α → β, we can use the same attribute-closure-based technique that we used for BCNF to check if α is a superkey. If α is not a superkey, we have to verify whether each attribute in β is contained in a candidate key of R; this test is rather more expensive, since it involves finding candidate keys. In fact, testing for 3NF has been shown to be NP-hard; thus, it is very unlikely that there is a polynomial time complexity algorithm for the task.

Comparison of BCNF and 3NF
Of the two normal forms for relational-database schemas, 3NF and BCNF, there are advantages to 3NF in that we know that it is always possible to obtain a 3NF design without sacrificing a lossless join or dependency preservation. Nevertheless, there are disadvantages to 3NF: if we do not eliminate all transitive relation-schema dependencies, we may have to use null values to represent some of the possible meaningful relationships among data items, and there is the problem of repetition of information.
As an illustration of the null value problem, consider again the Banker-schema and its associated functional dependencies. Since banker-name → branch-name, we may want to represent relationships between values for banker-name and values for branch-name in our database. If we are to do so, however, either there must be a corresponding value for customer-name, or we must use a null value for the attribute customer-name.
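As a reminder of the example being referenced (the schema as given in the suggested reading, Silberschatz et al.): Banker-schema = (branch-name, customer-name, banker-name), with the functional dependencies banker-name → branch-name and customer-name branch-name → banker-name.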
customer-name    banker-name    branch-name
Curry            Johnson        Perryridge
Turner           Johnson        Perryridge
An instance of Banker-schema.
As an illustration of the repetition of information problem, consider the instance of Banker-schema shown above. Notice that the information indicating that Johnson is working at the Perryridge branch is repeated.
Recall that our goals of database design with functional dependencies are:
1. BCNF
2. Lossless join
3. Dependency preservation
Since it is not always possible to satisfy all three, we may be forced to choose between BCNF and dependency preservation with 3NF.
It is worth noting that SQL does not provide a way of specifying functional dependencies, except for the special case of declaring superkeys by using the primary key or unique constraints. It is possible, although a little complicated, to write assertions that enforce a functional dependency; unfortunately, testing such assertions would be very expensive in most database systems. Thus, even if we had a dependency-preserving decomposition, with standard SQL we would not be able to efficiently test a functional dependency whose left-hand side is not a key.
Although testing functional dependencies may involve a join if the decomposition is not dependency preserving, we can reduce the cost by using materialized views, which many database systems support. Given a BCNF decomposition that is not dependency preserving, we consider each dependency in a minimum cover Fc that is not preserved in the decomposition. For each such dependency α → β, we define a materialized view that computes a join of all relations in the decomposition, and projects the result on αβ. The functional dependency can then easily be tested on the materialized view by means of a constraint unique(α). On the negative side, there is a space and time overhead due to the materialized view; but on the positive side, the application programmer need not worry about writing code to keep redundant data consistent on updates. It is the job of the database system to maintain the materialized view, that is, to keep it up to date when the database is updated.
Thus, in case we are not able to get a dependency-preserving BCNF decomposition, it is generally preferable to opt for BCNF, and to use techniques such as materialized views to reduce the cost of checking functional dependencies.
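The materialized-view test can be sketched in ordinary code as well. The following Python fragment — an illustration, not any system's actual mechanism — joins two decomposed relations and checks whether a dependency α → β is violated on the result, which is exactly what the unique(α) constraint on the view would detect. The relation contents are hypothetical.

# Test a functional dependency alpha -> beta on the join of a decomposition.
def natural_join(r, s):
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [dict(tr, **ts) for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

def violates_fd(rows, alpha, beta):
    seen = {}  # projection on alpha -> projection on beta
    for t in rows:
        a = tuple(t[x] for x in alpha)
        b = tuple(t[x] for x in beta)
        if seen.setdefault(a, b) != b:
            return True   # two tuples agree on alpha but differ on beta
    return False

r1 = [{"A": 1, "B": 10}, {"A": 2, "B": 20}]
r2 = [{"B": 10, "C": "x"}, {"B": 10, "C": "y"}]
print(violates_fd(natural_join(r1, r2), ["A"], ["C"]))  # True: A=1 yields both x and y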
Fourth Normal Form
Some relation schemas, even though they are in BCNF, do not seem to be sufficiently normalized, in the sense that they still suffer from the problem of repetition of information. Consider again our banking example. Assume that, in an alternative design for the bank database, we have the schema
BC-schema = (loan-number, customer-name, customer-street, customer-city)
The astute reader will recognize this schema as a non-BCNF schema because of the functional dependency
customer-name → customer-street customer-city
that we asserted earlier, and because customer-name is not a key for BC-schema. However, assume that our bank is attracting wealthy customers who have several addresses (say, a winter home and a summer home). Then, we no longer wish to enforce the functional dependency customer-name → customer-street customer-city. If we remove this functional dependency, we find BC-schema to be in BCNF with respect to our modified set of functional dependencies. Yet, even though BC-schema is now in BCNF, we still have the problem of repetition of information that we had earlier.
To deal with this problem, we must define a new form of constraint, called a multivalued dependency. As we did for functional dependencies, we shall use multivalued dependencies to define a normal form for relation schemas. This normal form, called fourth normal form (4NF), is more restrictive than BCNF. We shall see that every 4NF schema is also in BCNF, but there are BCNF schemas that are not in 4NF.
Multivalued Dependencies
Functional dependencies rule out certain tuples from being in a relation. If A → B, then we cannot have two tuples with the same A value but different B values. Multivalued dependencies, on the other hand, do not rule out the existence of certain tuples. Instead, they require that other tuples of a certain form be present in the relation. For this reason, functional dependencies sometimes are referred to as equality-generating dependencies, and multivalued dependencies are referred to as tuple-generating dependencies.
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency
α →→ β
holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]
      α          β            R − α − β
t1    a1 … ai    ai+1 … aj    aj+1 … an
t2    a1 … ai    bi+1 … bj    bj+1 … bn
t3    a1 … ai    ai+1 … aj    bj+1 … bn
t4    a1 … ai    bi+1 … bj    aj+1 … an
Tabular representation of α →→ β.
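The tuple-generation condition in this definition can be tested mechanically. Below is a hedged Python sketch that checks whether α →→ β holds on a small relation, using the Smith example that appears in the text below; the abbreviated attribute names are this sketch's own.

from itertools import product

def proj(t, attrs):
    return tuple(t[a] for a in attrs)

def mvd_holds(rows, attrs, alpha, beta):
    tuples = {proj(t, attrs) for t in rows}
    for t1, t2 in product(rows, repeat=2):
        if proj(t1, alpha) != proj(t2, alpha):
            continue
        # t3 must take beta from t1 and everything else from t2
        # (the symmetric t4 case is covered by swapping t1 and t2).
        t3 = dict(t2, **{a: t1[a] for a in beta})
        if proj(t3, attrs) not in tuples:
            return False
    return True

attrs = ["loan", "cname", "street", "city"]
rows = [
    {"loan": "L-23", "cname": "Smith", "street": "North", "city": "Rye"},
    {"loan": "L-27", "cname": "Smith", "street": "Main", "city": "Manchester"},
]
print(mvd_holds(rows, attrs, ["cname"], ["street", "city"]))  # False
rows += [
    {"loan": "L-23", "cname": "Smith", "street": "Main", "city": "Manchester"},
    {"loan": "L-27", "cname": "Smith", "street": "North", "city": "Rye"},
]
print(mvd_holds(rows, attrs, ["cname"], ["street", "city"]))  # True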
This definition is less complicated than it appears to be. The tabular representation above gives a picture of t1, t2, t3, and t4. Intuitively, the multivalued dependency α →→ β says that the relationship between α and β is independent of the relationship between α and R − β. If the multivalued dependency α →→ β is satisfied by all relations on schema R, then α →→ β is a trivial multivalued dependency on schema R. Thus, α →→ β is trivial if β ⊆ α or β ∪ α = R.
To illustrate the difference between functional and multivalued dependencies, we consider the BC-schema again, and the relation bc(BC-schema). We must repeat the loan number once for each address a customer has, and we must repeat the address for each loan a customer has. This repetition is unnecessary, since the relationship between a customer and his address is independent of the relationship between that customer and a loan. If a customer (say, Smith) has a loan (say, loan number L-23), we want that loan to be associated with all of Smith's addresses; a relation that associates the loan with only one of them is illegal. To make such a relation legal, we need to add the tuples (L-23, Smith, Main, Manchester) and (L-27, Smith, North, Rye) to the bc relation.
Comparing the preceding example with our definition of multivalued dependency, we see that we want the multivalued dependency
customer-name →→ customer-street customer-city
to hold. (The multivalued dependency customer-name →→ loan-number will do as well. We shall soon see that they are equivalent.)
As with functional dependencies, we shall use multivalued dependencies in two ways:
1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies
2. To specify constraints on the set of legal relations; we shall thus concern ourselves with only those relations that satisfy a given set of functional and multivalued dependencies
The multivalued dependencies that occur in practice appear to be quite simple. For complex dependencies, it is better to reason about sets of dependencies by using a system of inference rules. From the definition of multivalued dependency, we can derive the following rule: if α → β, then α →→ β. In other words, every functional dependency is also a multivalued dependency.
Definition of Fourth Normal Form
Consider again our BC-schema example, in which the multivalued dependency customer-name →→ customer-street customer-city holds, but no nontrivial functional dependencies hold. We saw above that, although BC-schema is in BCNF, the design is not ideal, since we must repeat a customer's address information for each loan. We shall see that we can use the given multivalued dependency to improve the database design, by decomposing BC-schema into a fourth normal form decomposition.
A relation schema R is in fourth normal form (4NF) with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:
• α →→ β is a trivial multivalued dependency.
• α is a superkey for schema R.
A database design is in 4NF if each member of the set of relation schemas that constitutes the design is in 4NF.
Note that the definition of 4NF differs from the definition of BCNF in only the use of multivalued dependencies instead of functional dependencies. Every 4NF schema is in BCNF. To see this fact, we note that, if a schema R is not in BCNF, then there is a nontrivial functional dependency α → β holding on R, where α is not a superkey; since α → β implies α →→ β, R cannot be in 4NF.
The analogy between 4NF and BCNF applies to the decomposition algorithm as well. Only the opening lines of that algorithm survive in this text:
result := {R}; done := false;
compute D+; given schema Ri, let Di denote the restriction of D+ to Ri
while (not done) do …
The restriction of D to Ri is the set Di consisting of
1. All functional dependencies in D+ that include only attributes of Ri
2. All multivalued dependencies of the form α →→ (β ∩ Ri), where α ⊆ Ri and α →→ β is in D+
Points to Ponder
• A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, either α → β is trivial or α is a superkey for R
• Third normal form (3NF) makes testing of updates cheaper than BCNF, at the cost of some redundancy
• Multivalued dependencies do not rule out the existence of certain tuples; instead, they require that other tuples of a certain form be present in the relation
• Every 4NF schema is also in BCNF, but there are BCNF schemas that are not in 4NF
Review Terms
• BCNF
• 3NF
• Comparison of BCNF & 3NF
• 4NF
Students Activity
1. Define BCNF?
2. Define 3NF?
3. Define multivalued dependencies?
4. Define 4NF?
Student Notes
LESSON 38:
FILE ORGANIZATION METHOD - I
1. Several types of data storage exist in most computer systems: cache, main memory, flash memory, magnetic-disk storage, optical storage, and tape storage.
• Cache: the most costly and fastest form of storage. Usually very small, and managed by the operating system.
2. In the storage-device hierarchy, the higher levels are expensive (cost per bit) and fast (access time), but their capacity is smaller.
The main measures of the qualities of a disk are:
1. Access time: the time from when a read or write request is issued to when data transfer begins. To access data on a given sector of a disk, the arm first must move so that it is positioned over the correct track, and then must wait for the sector to appear under it as the disk rotates. The time for repositioning the arm is called seek time, and it increases with the distance the arm must move. Typical seek times range from 2 to 30 milliseconds.
Average seek time is the average of the seek time, measured over a sequence of (uniformly distributed) random requests, and it is about one third of the worst-case seek time.
Once the seek has occurred, the time spent waiting for the sector to be accessed to appear under the head is called rotational latency time. Average rotational latency time is about half of the time for a full rotation of the disk. (Typical rotational speeds of disks range from 60 to 120 rotations per second.)
The access time is then the sum of the seek time and the latency, and ranges from 10 to 40 milliseconds.
2. Data-transfer rate: the rate at which data can be retrieved from or stored to the disk. Current disk systems support transfer rates from 1 to 5 megabytes per second.
3. Reliability, measured by the mean time to failure. The typical mean time to failure of disks today ranges from 30,000 to 800,000 hours (about 3.4 to 91 years).
Optimization of Disk-Block Access
1. Data is transferred between disk and main memory in units called blocks.
2. A block is a contiguous sequence of bytes from a single track of one platter.
3. Block sizes range from 512 bytes to several thousand.
4. The lower levels of the file-system manager convert block addresses into the hardware-level cylinder, surface, and sector numbers.
5. Access to data on disk is several orders of magnitude slower than access to data in main memory. Several optimization techniques exist besides buffering of blocks in main memory:
• Scheduling. If several blocks from a cylinder need to be transferred, we may save time by requesting them in the order in which they pass under the heads. A commonly used disk-arm scheduling algorithm is the elevator algorithm.
• File organization. Organize blocks on disk in a way that corresponds closely to the manner in which we expect data to be accessed. For example, store related information on the same track, on physically close tracks, or on adjacent cylinders, in order to minimize seek time. IBM mainframe OSs provide programmers fine control over the placement of files, but increase the programmer's burden; UNIX and PC OSs hide disk organization from users. Over time, a sequential file may become fragmented. To reduce fragmentation, the system can make a backup copy of the data and restore the entire disk; the restore operation writes back the blocks of each file contiguously (or nearly so). Some systems, such as MS-DOS, have utilities that scan the disk and then move blocks to decrease the fragmentation.
• Nonvolatile write buffers. Use nonvolatile RAM (such as battery-backed-up RAM) to speed up disk writes drastically (first write to the nonvolatile RAM buffer and inform the OS that the writes have completed).
• Log disk. Another approach to reducing write latency is to use a log disk, a disk devoted to writing a sequential log. All access to the log disk is sequential, essentially eliminating seek time, and several consecutive blocks can be written at once, making writes to the log disk several times faster than random writes.
File Organization
1. A file is organized logically as a sequence of records.
2. Records are mapped onto disk blocks.
3. Files are provided as a basic construct in operating systems, so we assume the existence of an underlying file system.
4. Blocks are of a fixed size determined by the operating system.
5. Record sizes vary.
6. In a relational database, tuples of distinct relations may be of different sizes.
7. One approach to mapping the database to files is to store records of one length in a given file.
8. An alternative is to structure files to accommodate variable-length records. (Fixed-length is easier to implement.)
Fixed-Length Records
1. Consider a file of deposit records of the form:
type deposit = record
    bname : char(22);
    account# : char(10);
    balance : real;
end
• If we assume that each character occupies one byte, an integer occupies 4 bytes, and a real occupies 8 bytes, our deposit record is 40 bytes long.
• The simplest approach is to use the first 40 bytes for the first record, the next 40 bytes for the second, and so on.
• However, there are two problems with this approach:
• It is difficult to delete a record from this structure. The space it occupied must somehow be reclaimed, or we need to mark deleted records so that they can be ignored.
• Unless the block size is a multiple of 40, some records will cross block boundaries. It would then require two block accesses to read or write such a record.
A concrete sketch of this 40-byte layout follows.
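The following small Python sketch encodes and decodes such a record with the struct module; the format string and field handling are illustrative rather than anything prescribed by the text.

import struct

# "<" = no padding; 22s/10s = fixed-width byte strings; d = 8-byte real.
RECORD = struct.Struct("<22s10sd")          # 22 + 10 + 8 = 40 bytes
assert RECORD.size == 40

def pack(bname, account, balance):
    return RECORD.pack(bname.encode(), account.encode(), balance)

def unpack(buf):
    b, a, bal = RECORD.unpack(buf)
    return b.rstrip(b"\x00").decode(), a.rstrip(b"\x00").decode(), bal

rec = pack("Perryridge", "A-102", 400.0)
print(len(rec), unpack(rec))                # 40 ('Perryridge', 'A-102', 400.0)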
2. When a record is deleted, we could move all successive records up one, which may require moving a lot of records.
• We could instead move the last record into the "hole" created by the deleted record.
• This changes the order the records are in.
• It turns out to be undesirable to move records to occupy freed space, as moving requires block accesses.
• Also, insertions tend to be more frequent than deletions, so it is acceptable to leave the space open and wait for a subsequent insertion.
• This leads to a need for additional structure in our file design.
3. So one solution is:
• At the beginning of the file, allocate some bytes as a file header.
• For now, this header need only store the address of the first record whose contents are deleted.
• That first deleted record can then store the address of the second available record, and so on.
• To insert a new record, we use the record pointed to by the header, and change the header pointer to the next available record.
• If no deleted records exist, we add our new record to the end of the file.
4. Note: the use of pointers requires careful programming. If a record pointed to is moved or deleted, and that pointer is not corrected, the pointer becomes a dangling pointer. Records pointed to are called pinned.
5. Fixed-length file insertions and deletions are relatively simple because "one size fits all". For variable-length records, this is not the case. A sketch of the free-list scheme just described follows.
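A minimal simulation of that free-list scheme, with list slots standing in for 40-byte record positions (illustrative only):

class FixedLengthFile:
    def __init__(self):
        self.slots = []      # each slot: a record, or an int (next free slot)
        self.free_head = -1  # file header: address of first deleted record

    def insert(self, record):
        if self.free_head == -1:           # no deleted records: append at end
            self.slots.append(record)
            return len(self.slots) - 1
        slot = self.free_head              # reuse the first free slot,
        self.free_head = self.slots[slot]  # header now points to the next one
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        self.slots[slot] = self.free_head  # deleted record stores next pointer
        self.free_head = slot

f = FixedLengthFile()
for r in ["r0", "r1", "r2"]:
    f.insert(r)
f.delete(1)
print(f.insert("r3"))  # 1: the hole left by r1 is reused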
Variable-Length Records
Variable-length records arise in a database in several ways:
• Storage of multiple record types in a file.
• Record types allowing variable field sizes.
• Record types allowing repeating fields.
Organization of Records in Files
There are several ways of organizing records in files:
• Heap file organization. Any record can be placed anywhere in the file where there is space for the record. There is no ordering of records.
• Sequential file organization. Records are stored in sequential order, based on the value of the search key of each record.
• Hashing file organization. A hash function is computed on some attribute of each record. The result of the function specifies in which block of the file the record should be placed; this is closely related to indexing structure and is discussed in the next lesson.
• Clustering file organization. Records of several different relations can be stored in the same file. Related records of the different relations are stored on the same block, so that one I/O operation fetches related records from all the relations.
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some search key.
• Records are chained together by pointers to permit fast retrieval in search-key order.
• Each pointer points to the next record in order.
• Records are stored physically in search-key order (or as close to this as possible).
• This minimizes the number of block accesses.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
• Deletion can be managed with the pointer chains.
• Insertion poses a problem if there is no space where the new record should go.
• If there is space, use it; otherwise put the new record in an overflow block, adjusting pointers accordingly.
• Problem: we now have some records out of physical sequential order. If very few records are in overflow blocks, this works well; if order is lost, reorganize the file.
• Reorganizations are expensive, and are done when the system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize when an insertion occurs. In this case, the pointer fields are no longer required.
A toy rendering of the pointer-chain idea appears below.
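In this Python sketch the overflow handling is simplified to appending at the end of the file; the record contents are illustrative.

class SeqFile:
    def __init__(self, records):
        # records: list of (key, data), physically sorted on key.
        self.recs = list(records)          # main area; overflow appended at end
        self.head = 0
        # next[i] = index of the next record in key order, or -1.
        self.next = list(range(1, len(self.recs))) + [-1]

    def insert(self, key, data):
        self.recs.append((key, data))      # overflow area: physical order lost
        new = len(self.recs) - 1
        prev, cur = -1, self.head
        while cur != -1 and self.recs[cur][0] <= key:
            prev, cur = cur, self.next[cur]
        self.next.append(cur)              # splice into the pointer chain
        if prev == -1:
            self.head = new
        else:
            self.next[prev] = new

    def scan(self):                        # retrieval in search-key order
        cur = self.head
        while cur != -1:
            yield self.recs[cur]
            cur = self.next[cur]

f = SeqFile([("A-1", 100), ("C-3", 300)])
f.insert("B-2", 200)
print(list(f.scan()))  # key order, although B-2 lives in the overflow area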
Clustering File Organization
1. One relation per file, with fixed-length records, is good for small databases; it also reduces the code size.
2. Many large-scale database systems do not rely directly on the underlying operating system for file management. Instead, one large OS file is allocated to the database system, and all relations are stored in that one file.
3. To efficiently execute queries involving a join of depositor and customer, one may store the depositor tuples for each cname near the customer tuple for the corresponding cname.
4. This structure mixes together tuples from two relations, but allows for efficient processing of the join.
5. If the customer has many accounts which cannot fit in one block, the remaining records appear on nearby blocks. This file structure, called clustering, allows us to read many of the required records using one block read.
6. Our use of clustering enhances the processing of a particular join, but may result in slow processing of other types of queries, such as a selection on customer. For example, the query
select *
from customer
now requires more block accesses, as our customer relation is interspersed with the deposit relation.
7. Thus it is a trade-off, depending on the types of query that the database designer believes to be most frequent. Careful use of clustering may produce significant performance gains.
Data Dictionary Storage
1. The database also needs to store information about the relations, known as the data dictionary. This includes:
• Names of relations.
• Names of attributes of relations.
• Domains and lengths of attributes.
• Names and definitions of views.
• Integrity constraints (e.g., key constraints).
plus data on the system users:
• Names of authorized users.
• Accounting information about users.
plus (possibly) statistical and descriptive data:
• Number of tuples in each relation.
• Method of storage used for each relation (e.g., clustered or non-clustered).
2. When we look at indices, we will also see a need to store information about each index on each relation:
• Name of the index.
• Name of the relation being indexed.
• Attributes the index is on.
• Type of index.
3. This information is, in itself, a miniature database. We can use the database to store data about itself, simplifying the overall structure of the system and allowing the full power of the database to be used to permit fast access to system data.
Points to Ponder
• Several types of data storage exist in most computer systems: cache, main memory, flash memory, magnetic-disk storage, optical storage, and tape storage.
• The main measures of the qualities of a disk are capacity, access time, data-transfer rate, and reliability.
• A sequential file is designed for efficient processing of records in sorted order on some search key.
• A clustering structure mixes together tuples from two relations, but allows for efficient processing of the join.
• The database also needs to store information about the relations, known as the data dictionary.
Review Terms
• Storage device
• Sequential file storage
• Clustering file storage
• Data dictionary storage
• Search key
Students Activity
1. Define various types of storage media?
2. Define qualities of a disk?
3. Define sequential file storage?
4. Define clustering file storage?
5. Define data dictionary storage?
Student Notes
LESSON 39:
FILE ORGANISATION METHOD - II
Lesson objectives
• Indexing & hashing
• Ordered indices
• Dense and sparse indices
• Multi-level indices
• Index update
• Secondary indices
• Static hashing
• Hash functions
• Bucket overflow
• Dynamic hashing
Indexing and Hashing
1. Many queries reference only a small proportion of the records in a file. For example, finding all records at the Perryridge branch only returns records where bname = "Perryridge".
2. We should be able to locate these records directly, rather than having to read every record and check its branch-name. We then need extra file structuring.
Basic Concepts
1. An index for a file works like a catalogue in a library. Cards in alphabetic order tell us where to find books by a particular author.
2. In real-world databases, indices like this might be too large to be efficient. We'll look at more sophisticated indexing techniques.
3. There are two kinds of indices:
• Ordered indices: based on a sorted ordering of the values.
• Hash indices: based on the values being distributed uniformly across a range of buckets. The bucket to which a value is assigned is determined by a function, called a hash function.
4. We will consider several indexing techniques. No one technique is the best; each technique is best suited to a particular database application.
5. Methods will be evaluated on:
1. Access types — the types of access that are supported efficiently, e.g., value-based search or range search.
2. Access time — the time to find a particular data item or set of items.
3. Insertion time — the time taken to insert a new data item (including the time to find the right place to insert).
4. Deletion time — the time to delete an item (including the time taken to find the item, as well as to update the index structure).
5. Space overhead — the additional space occupied by an index structure.
6. We may have more than one index or hash function for a file. (A library may have card catalogues by author, subject, or title.)
7. The attribute or set of attributes used to look up records in a file is called the search key (not to be confused with primary key, etc.).
Ordered Indices
1. In order to allow fast random access, an index structure may be used.
2. A file may have several indices, on different search keys.
3. If the file containing the records is sequentially ordered, the index whose search key specifies the sequential order of the file is the primary index, or clustering index. Note: the search key of a primary index is usually the primary key, but it is not necessarily so.
4. Indices whose search key specifies an order different from the sequential order of the file are called secondary indices, or nonclustering indices.
Dense and Sparse Indices
1. There are two types of ordered indices:
Dense index:
• An index record appears for every search-key value in the file.
• This record contains the search-key value and a pointer to the actual record.
Sparse index:
• Index records are created only for some of the records.
• To locate a record, we find the index record with the largest search-key value less than or equal to the search-key value we are looking for.
• We start at the record pointed to by the index record, and proceed along the pointers in the file (that is, sequentially) until we find the desired record.
A small sketch contrasting the two kinds of lookup appears below.
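In this Python sketch the file contents are illustrative; the sparse index keeps an entry only for every k-th record.

import bisect

file = [("Brighton", 1), ("Downtown", 2), ("Mianus", 3),
        ("Perryridge", 4), ("Redwood", 5), ("Round Hill", 6)]

dense = {key: pos for pos, (key, _) in enumerate(file)}    # every key

k = 3
sparse = [(file[i][0], i) for i in range(0, len(file), k)]  # some keys

def sparse_lookup(key):
    keys = [e[0] for e in sparse]
    # largest index entry with key <= the one we want...
    i = bisect.bisect_right(keys, key) - 1
    pos = sparse[i][1]
    # ...then scan the file sequentially from there.
    while file[pos][0] != key:
        pos += 1
    return file[pos][1]

print(file[dense["Perryridge"]][1], sparse_lookup("Perryridge"))  # 4 4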
Dense index (figure).
2. Notice how we would find records for the Perryridge branch using both methods. (Do it!)
Sparse secondary index on cname (figure).
2. We can use an extra level of indirection to implement secondary indices on search keys that are not candidate keys. A pointer does not point directly to the file but to a bucket that contains pointers to the file.
• To perform a lookup on Peterson, we must read all three records pointed to by entries in bucket 2.
• Only one entry points to a Peterson record, but three records need to be read.
• As the file is not ordered physically by cname, this may take 3 block accesses.
3. Secondary indices must be dense, with an index entry for every search-key value, and a pointer to every record in the file.
4. Secondary indices improve the performance of queries on non-primary keys.
5. They also impose serious overhead on database modification: whenever a file is updated, every index must be updated. The designer must decide whether to use secondary indices or not.
Static Hashing
Index schemes force us to traverse an index structure. Hashing avoids this.
Hash File Organization
1. Hashing involves computing the address of a data item by computing a function on the search-key value.
2. A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
• We choose a number of buckets to correspond to the number of search-key values we will have stored in the database.
• To perform a lookup on a search-key value Ki, we compute h(Ki), and search the bucket with that address.
• If two search keys Ki and Kj map to the same address, because h(Ki) = h(Kj), then the bucket at that address will contain records with both search-key values.
• In this case we will have to check the search-key value of every record in the bucket to get the ones we want.
• Insertion and deletion are simple.
Hash Functions
1. A good hash function gives an average-case lookup that is a small constant, independent of the number of search keys.
2. We hope records are distributed uniformly among the buckets.
3. The worst hash function maps all keys to the same bucket.
4. The best hash function maps all keys to distinct addresses.
5. Ideally, the distribution of keys to addresses is uniform and random.
6. Suppose we have 26 buckets, and map names beginning with the ith letter of the alphabet to the ith bucket.
• Problem: this does not give a uniform distribution. Many more names will be mapped to "A" than to "X".
• Typical hash functions instead perform some operation on the internal binary machine representations of the characters in a key.
• For example, compute the sum, modulo the number of buckets, of the binary representations of the characters of the search key. The sketch below uses this method for 10 buckets.
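That character-sum scheme is a one-liner in Python; the bucket assignments printed will of course depend on the actual key values.

def h(key, nbuckets=10):
    # sum of character codes, modulo the number of buckets
    return sum(ord(c) for c in key) % nbuckets

for name in ["Perryridge", "Round Hill", "Brighton"]:
    print(name, "-> bucket", h(name))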
Handling of Bucket Overflows
1. Open hashing occurs where records are stored in different buckets: compute the hash function and search the corresponding bucket to find a record.
2. Closed hashing occurs where all records are stored in one bucket, with the hash function computing addresses within it. (Deletions are difficult.) It is not used much in database applications.
3. Drawback to our approach: the hash function must be chosen at implementation time.
• The number of buckets is fixed, but the database may grow.
• If the number is too large, we waste space.
• If the number is too small, we get too many "collisions", resulting in records of many search-key values being in the same bucket.
• Choosing the number to be twice the number of search-key values in the file gives a good space/performance tradeoff.
Hash Indices
1. A hash index organizes the search keys, with their associated pointers, into a hash file structure.
2. We apply a hash function on a search key to identify a bucket, and store the key and its associated pointers in the bucket (or in overflow buckets).
3. Strictly speaking, hash indices are only secondary index structures, since if a file itself is organized using hashing, there is no need for a separate hash index structure on it.
Dynamic Hashing
1. As the database grows over time, we have three options:
• Choose a hash function based on the current file size: performance degrades as the file grows.
• Choose a hash function based on the anticipated file size: space is wasted initially.
• Periodically reorganize the hash structure in response to file growth: reorganization is expensive.
2. Extendable hashing instead allows the hash function to be modified dynamically. An i-bit prefix of the hash value indexes a bucket address table, and several table entries may point to the same bucket:
• All such entries will have a common hash prefix, but the length of this prefix may be less than i.
• So we give each bucket j an integer ij giving the length of its common hash prefix.
• The number of bucket-address-table entries pointing to bucket j is then 2^(i − ij).
When we attempt to insert into a full bucket j, we first double the bucket address table (increasing i by 1) if ij = i; then we split:
• We allocate a new bucket z, and set ij and iz to the original ij value plus 1.
• Now adjust the entries in the bucket address table that previously pointed to bucket j: leave the first half pointing to bucket j, and make the rest point to bucket z.
• Rehash each record in bucket j as before.
• Reattempt the new insert.
Note that in both cases we only need to rehash records in bucket j.
Deletion of records is similar: buckets may have to be coalesced, and the bucket address table may have to be halved.
Insertion is illustrated for the example deposit file:
• 32-bit hash values on bname are used, starting from an initially empty hash structure.
• We insert records one by one.
• We (unrealistically) assume that a bucket can hold only 2 records, in order to illustrate both situations described.
• As we insert the Perryridge and Round Hill records, the first bucket becomes full.
• When we insert the next record (Downtown), we must split the bucket.
• Since i = i0, we need to increase the number of bits we use from the hash. This makes us double the size of the bucket address table to two entries.
• We split the bucket, placing the records whose search-key hash begins with 1 in the new bucket, and those with a 0 in the old bucket.
• Next we attempt to insert the Redwood record, and find it hashes to 1. That bucket is full, and i = i1.
• So we must split that bucket, increasing the number of bits we must use to 2. This necessitates doubling the bucket address table again, to four entries.
• We rehash the entries in the old bucket.
• We continue on for the deposit records, obtaining the extendable hash structure.
Advantages:
• Extendable hashing provides performance that does not degrade as the file grows.
• Minimal space overhead: no buckets need be reserved for future use, and the bucket address table contains only one pointer for each hash value of the current prefix length.
Disadvantages:
• An extra level of indirection in the bucket address table.
• Added complexity.
A compact sketch of the scheme follows.
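The following Python sketch imitates the behaviour described above (bucket capacity 2, table doubling, splitting, rehash of the split bucket only). The class layout is this sketch's own assumption, and it uses Python's built-in hash, which is salted per run, so the printed bucket contents will vary.

class Bucket:
    def __init__(self, depth):
        self.depth, self.items = depth, []

class ExtendableHash:
    def __init__(self, capacity=2):
        self.i, self.cap = 0, capacity
        self.table = [Bucket(0)]                  # bucket address table

    def _prefix(self, key):
        return hash(key) & ((1 << self.i) - 1)    # low i bits as the "prefix"

    def insert(self, key):
        b = self.table[self._prefix(key)]
        if len(b.items) < self.cap:
            b.items.append(key)
            return
        if b.depth == self.i:                     # must double the table
            self.i += 1
            self.table = self.table + self.table
        # split b: ij and iz become the original ij value plus 1
        b.depth += 1
        z = Bucket(b.depth)
        bit = 1 << (b.depth - 1)
        for j in range(len(self.table)):          # redirect half the entries
            if self.table[j] is b and j & bit:
                self.table[j] = z
        old, b.items = b.items, []
        for k in old:                             # rehash records in b only
            self.table[self._prefix(k)].items.append(k)
        self.insert(key)                          # reattempt the new insert

h = ExtendableHash()
for bname in ["Perryridge", "Round Hill", "Downtown", "Redwood", "Brighton"]:
    h.insert(bname)
print(h.i, [b.items for b in {id(b): b for b in h.table}.values()])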
Points to Ponder
• An index for a file works like a catalogue in a library.
• In order to allow fast random access, an index structure may be used.
• Indices whose search key specifies an order different from the sequential order of the file are called secondary indices.
• A dense index has an entry for every search-key value in the file; each entry holds the search-key value and a pointer to the actual record.
• A sparse index has entries for only some of the records.
• Regardless of what form of index is used, every index must be updated whenever a record is either inserted into or deleted from the file.
• Hashing involves computing the address of a data item by computing a function on the search-key value.
Review Terms
• Index
• Ordered index
• Dense index
• Sparse index
• Hashing
• Hash function
• Static hashing
Students Activity
1. Define index and hashing?
2. Define ordered indices?
3. Differentiate between sparse index & dense index?
4. Define secondary index & multilevel index?
Student Notes
LESSON 40:
TRANSACTIONS MANAGEMENT
Suppose that a failure occurs after the write(A) operation but before the write(B) operation. In this case, the values of accounts A and B reflected in the database are $950 and $2000. The system destroyed $50 as a result of this failure. In particular, we note that the sum A + B is no longer preserved.
Thus, because of the failure, the state of the system no longer reflects a real state of the world that the database is supposed to capture. We term such a state an inconsistent state. We must ensure that such inconsistencies are not visible in a database system. Note, however, that the system must at some point be in an inconsistent state. Even if transaction Ti is executed to completion, there exists a point at which the value of account A is $950 and the value of account B is $2000, which is clearly an inconsistent state. This state, however, is eventually replaced by the consistent state where the value of account A is $950, and the value of account B is $2050. Thus, if the transaction never started or was guaranteed to complete, such an inconsistent state would not be visible except during the execution of the transaction. That is the reason for the atomicity requirement: if the atomicity property is present, all actions of the transaction are reflected in the database, or none are.
The basic idea behind ensuring atomicity is this: the database system keeps track (on disk) of the old values of any data on which a transaction performs a write, and, if the transaction does not complete its execution, the database system restores the old values to make it appear as though the transaction never executed.
• Durability: Once the execution of the transaction completes successfully, and the user who initiated the transaction has been notified that the transfer of funds has taken place, it must be the case that no system failure will result in a loss of data corresponding to this transfer of funds.
The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if there is a system failure after the transaction completes execution.
We assume for now that a failure of the computer system may result in loss of data in main memory, but data written to disk are never lost. We can guarantee durability by ensuring that either
1. The updates carried out by the transaction have been written to disk before the transaction completes, or
2. Information about the updates carried out by the transaction and written to disk is sufficient to enable the database to reconstruct the updates when the database system is restarted after the failure.
Ensuring durability is the responsibility of a component of the database system called the recovery-management component. The transaction-management component and the recovery-management component are closely related.
• Isolation: Even if the consistency and atomicity properties are ensured for each transaction, if several transactions are executed concurrently, their operations may interleave in some undesirable way, resulting in an inconsistent state.
For example, as we saw earlier, the database is temporarily inconsistent while the transaction to transfer funds from A to B is executing, with the deducted total written to A and the increased total yet to be written to B. If a second concurrently running transaction reads A and B at this intermediate point and computes A + B, it will observe an inconsistent value. Furthermore, if this second transaction then performs updates on A and B based on the inconsistent values that it read, the database may be left in an inconsistent state even after both transactions have completed.
A way to avoid the problem of concurrently executing transactions is to execute transactions serially, that is, one after the other. However, concurrent execution of transactions provides significant performance benefits. Other solutions have therefore been developed; they allow multiple transactions to execute concurrently.
The isolation property of a transaction ensures that the concurrent execution of transactions results in a system state that is equivalent to a state that could have been obtained had these transactions executed one at a time in some order. Ensuring the isolation property is the responsibility of a component of the database system called the concurrency-control component.
Transaction State
In the absence of failures, all transactions complete successfully. However, as we noted earlier, a transaction may not always complete its execution successfully. Such a transaction is termed aborted. If we are to ensure the atomicity property, an aborted transaction must have no effect on the state of the database. Thus, any changes that the aborted transaction made to the database must be undone. Once the changes caused by an aborted transaction have been undone, we say that the transaction has been rolled back. It is part of the responsibility of the recovery scheme to manage transaction aborts.
A transaction that completes its execution successfully is said to be committed. A committed transaction that has performed updates transforms the database into a new consistent state, which must persist even if there is a system failure.
Once a transaction has committed, we cannot undo its effects by aborting it. The only way to undo the effects of a committed transaction is to execute a compensating transaction. For instance, if a transaction added $20 to an account, the compensating transaction would subtract $20 from the account. However, it is not always possible to create such a compensating transaction. Therefore, the responsibility of writing and executing a compensating transaction is left to the user, and is not handled by the database system.
We need to be more precise about what we mean by successful completion of a transaction. We therefore establish a simple abstract transaction model. A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed
• Failed, after the discovery that normal execution can no longer proceed
• Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction
• Committed, after successful completion
The state diagram corresponding to a transaction appears in Figure 23.1. We say that a transaction has committed only if it has entered the committed state. Similarly, we say that a transaction has aborted only if it has entered the aborted state.
For transactions with observable external writes, the database system can record such writes temporarily in nonvolatile storage, and perform the actual writes only after the transaction enters the committed state. If the system should fail after the transaction has entered the committed state, but before it could complete the external writes, the database system will carry out the external writes (using the data in nonvolatile storage) when the system is restarted.
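The legal state transitions just described can be captured in a few lines of Python; this is a toy model, not a database component.

ALLOWED = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             set(),      # may only be restarted as a new transaction
    "committed":           set(),
}

class Transaction:
    def __init__(self):
        self.state = "active"          # the initial state

    def move(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

t = Transaction()
t.move("partially committed")
t.move("committed")
print(t.state)                         # committed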
Figure 23.2 Shadow-copy technique for atomicity and durability.
The transaction is said to have been committed at the point where the updated db-pointer is written to disk.
We now consider how the technique handles transaction and system failures. First, consider transaction failure. If the transaction fails at any time before db-pointer is updated, the old contents of the database are not affected. We can abort the transaction by just deleting the new copy of the database. Once the transaction has been committed, all the updates that it performed are in the database pointed to by db-pointer. Thus, either all updates of the transaction are reflected, or none of the effects are reflected, regardless of transaction failure.
Now consider the issue of system failure. Suppose that the system fails at any time before the updated db-pointer is written to disk. Then, when the system restarts, it will read db-pointer and will thus see the original contents of the database, and none of the effects of the transaction will be visible on the database. Next, suppose that the system fails after db-pointer has been updated on disk. Before the pointer is updated, all updated pages of the new copy of the database were written to disk. Again, we assume that, once a file is written to disk, its contents will not be damaged even if there is a system failure. Therefore, when the system restarts, it will read db-pointer and will thus see the contents of the database after all the updates performed by the transaction.
The implementation actually depends on the write to db-pointer being atomic; that is, either all its bytes are written or none of its bytes are written. If some of the bytes of the pointer were updated by the write, but others were not, the pointer is meaningless, and neither old nor new versions of the database may be found when the system restarts. Luckily, disk systems provide atomic updates to entire blocks, or at least to a disk sector. In other words, the disk system guarantees that it will update db-pointer atomically, as long as we make sure that db-pointer lies entirely in a single sector, which we can ensure by storing db-pointer at the beginning of a block.
Thus, the atomicity and durability properties of transactions are ensured by the shadow-copy implementation of the recovery-management component.
Unfortunately, this implementation is extremely inefficient in the context of large databases, since executing a single transaction requires copying the entire database. Furthermore, the implementation does not allow transactions to execute concurrently with one another. There are practical ways of implementing atomicity and durability that are much less expensive and more powerful.
As a simple example of a transaction outside the database domain, consider a text-editing session. An entire editing session can be modeled as a transaction. The actions executed by the transaction are reading and updating the file. Saving the file at the end of editing corresponds to a commit of the editing transaction; quitting the editing session without saving the file corresponds to an abort of the editing transaction.
Many text editors use essentially the implementation just described to ensure that an editing session is transactional. A new file is used to store the updated file. At the end of the editing session, if the updated file is to be saved, the text editor uses a file rename command to rename the new file to have the actual file name. The rename, assumed to be implemented as an atomic operation by the underlying file system, deletes the old file as well. A sketch of this editor-style atomic commit appears below.
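The rename trick maps directly onto modern file APIs. In this hedged Python sketch, os.replace plays the role of the atomic rename (it is atomic on POSIX systems), and the file name is illustrative; a crash before the rename leaves the old file intact, mirroring the shadow-copy behaviour.

import os, tempfile

def commit(path, new_contents):
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "w") as f:
        f.write(new_contents)          # all writes go to the new copy
        f.flush()
        os.fsync(f.fileno())           # force the data to disk first
    os.replace(tmp, path)              # the atomic "db-pointer" update

commit("notes.txt", "edited session\n")
print(open("notes.txt").read())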
Points to Ponder
• Collections of operations that form a single logical unit of work are called transactions.
• Usually, a transaction is initiated by a user program written in a high-level data-manipulation language or programming language.
• Transactions access data using two operations: read(X) and write(X).
• A transaction may not always complete its execution successfully; such a transaction is termed aborted.
• A transaction must be in one of the following states: active, partially committed, failed, aborted, or committed.
• The recovery-management component of a database system can support atomicity and durability by a variety of schemes.
Review Terms
• Transactions
• ACID properties
• Transaction state
• Implementation of ACID properties
Students Activity
1. Define transaction with the help of an example?
2. Define ACID properties of a database?
3. Define various states of a transaction?
Student Notes
LESSON 41:
CONCURRENCY CONTROL - I
Lesson objectives
• Concurrent executions
• Advantages of concurrency
• Problems with concurrency
• Serializability
Concurrent Executions
Transaction-processing systems usually allow multiple transactions to run concurrently. Allowing multiple transactions to update data concurrently causes several complications with consistency of the data, as we saw earlier. Ensuring consistency in spite of concurrent execution of transactions requires extra work; it is far easier to insist that transactions run serially, that is, one at a time, each starting only after the previous one has completed. However, there are two good reasons for allowing concurrency:
• Improved throughput and resource utilization. A transaction consists of many steps. Some involve I/O activity; others involve CPU activity. The CPU and the disks in a computer system can operate in parallel, so I/O activity can be done in parallel with processing at the CPU. The parallelism of the CPU and the I/O system can therefore be exploited to run multiple transactions in parallel. While a read or write on behalf of one transaction is in progress on one disk, another transaction can be running in the CPU, while another disk may be executing a read or write on behalf of a third transaction. All of this increases the throughput of the system, that is, the number of transactions executed in a given amount of time. Correspondingly, the processor and disk utilization also increase; in other words, the processor and disks spend less time idle, not performing any useful work.
• Reduced waiting time. There may be a mix of transactions running on a system, some short and some long. If transactions run serially, a short transaction may have to wait for a preceding long transaction to complete, which can lead to unpredictable delays in running a transaction. If the transactions are operating on different parts of the database, it is better to let them run concurrently, sharing the CPU cycles and disk accesses among them. Concurrent execution reduces the unpredictable delays in running transactions. Moreover, it also reduces the average response time: the average time for a transaction to be completed after it has been submitted.
The motivation for using concurrent execution in a database is essentially the same as the motivation for using multiprogramming in an operating system.
When several transactions run concurrently, database consistency can be destroyed despite the correctness of each individual transaction. In this section, we present the concept of schedules to help identify those executions that are guaranteed to ensure consistency.
The database system must control the interaction among the concurrent transactions to prevent them from destroying the consistency of the database. It does so through a variety of mechanisms called concurrency-control schemes.
Consider again the simplified banking system of the previous lesson, which has several accounts, and a set of transactions that access and update those accounts. Let T1 and T2 be two transactions that transfer funds from one account to another. Transaction T1 transfers $50 from account A to account B. It is defined as
T1: read(A);
A := A - 50;
write(A);
read(B);
B := B + 50;
write(B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It is defined as
T2: read(A);
temp := A * 0.1;
A := A - temp;
write(A);
read(B);
B := B + temp;
write(B).
Suppose the current values of accounts A and B are $1000 and $2000, respectively. Suppose also that the two transactions are executed one at a time in the order T1 followed by T2. This execution sequence appears in Figure 41.1. In the figure, the sequence of instruction steps is in chronological order from top to bottom, with instructions of T1 in the left column and instructions of T2 in the right column. The final values of accounts A and B, after the execution in Figure 41.1 takes place, are $855 and $2145, respectively.
T1                      T2
read(A)
A := A - 50
write(A)
read(B)
B := B + 50
write(B)
                        read(A)
                        temp := A * 0.1
                        A := A - temp
                        write(A)
                        read(B)
                        B := B + temp
                        write(B)
Figure 41.1 Schedule 1, a serial schedule in which T1 is followed by T2.
Thus, the total amount of money in accounts A and B, that is, the sum A + B, is preserved after the execution of both transactions.
Similarly, if the transactions are executed one at a time in the order T2 followed by T1, then the corresponding execution sequence is that of Figure 41.2. Again, as expected, the sum A + B is preserved, and the final values of accounts A and B are $850 and $2150, respectively.
T1                      T2
                        read(A)
                        temp := A * 0.1
                        A := A - temp
                        write(A)
                        read(B)
                        B := B + temp
                        write(B)
read(A)
A := A - 50
write(A)
read(B)
B := B + 50
write(B)
Figure 41.2 Schedule 2, a serial schedule in which T2 is followed by T1.
The execution sequences just described are called schedules. They represent the chronological order in which instructions are executed in the system. Clearly, a schedule for a set of transactions must consist of all instructions of those transactions, and must preserve the order in which the instructions appear in each individual transaction. For example, in transaction T1, the instruction write(A) must appear before the instruction read(B), in any valid schedule. In the following discussion, we shall refer to the first execution sequence (T1 followed by T2) as schedule 1, and to the second execution sequence (T2 followed by T1) as schedule 2.
These schedules are serial: each serial schedule consists of a sequence of instructions from various transactions, where the instructions belonging to one single transaction appear together in that schedule. Thus, for a set of n transactions, there exist n! different valid serial schedules.
When the database system executes several transactions concurrently, the corresponding schedule no longer needs to be serial. If two transactions are running concurrently, the operating system may execute one transaction for a little while, then perform a context switch, execute the second transaction for some time, and then switch back to the first transaction for some time, and so on. With multiple transactions, the CPU time is shared among all the transactions.
Several execution sequences are possible, since the various instructions from both transactions may now be interleaved. In general, it is not possible to predict exactly how many instructions of a transaction will be executed before the CPU switches to another transaction. Thus, the number of possible schedules for a set of n transactions is much larger than n!.
Returning to our previous example, suppose that the two transactions are executed concurrently. One possible schedule appears in Figure 41.3. After this execution takes place, we arrive at the same state as the one in which the transactions are executed serially in the order T1 followed by T2. The sum A + B is indeed preserved.
T1                      T2
read(A)
A := A - 50
write(A)
                        read(A)
                        temp := A * 0.1
                        A := A - temp
                        write(A)
read(B)
B := B + 50
write(B)
                        read(B)
                        B := B + temp
                        write(B)
Figure 41.3 Schedule 3, a concurrent schedule equivalent to schedule 1.
Not all concurrent executions result in a correct state. To illustrate, consider the schedule of Figure 41.4. After the execution of this schedule, we arrive at a state where the final values of accounts A and B are $950 and $2100, respectively. This final state is an inconsistent state, since we have gained $50 in the process of the concurrent execution. Indeed, the sum A + B is not preserved by the execution of the two transactions.
T1                      T2
read(A)
A := A - 50
                        read(A)
                        temp := A * 0.1
                        A := A - temp
                        write(A)
                        read(B)
write(A)
read(B)
B := B + 50
write(B)
                        B := B + temp
                        write(B)
Figure 41.4 Schedule 4, a concurrent schedule.
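Schedules like these can be replayed mechanically. The following Python sketch (an illustration; the op encodings are invented for the example) runs schedule 1 and schedule 4 over per-transaction local buffers and reports whether A + B is preserved.

def run(schedule):
    db = {"A": 1000, "B": 2000}
    local = {"T1": {}, "T2": {}}       # each transaction's local buffer
    for txn, op, expr in schedule:
        buf = local[txn]
        if op == "read":
            buf[expr] = db[expr]
        elif op == "write":
            db[expr] = buf[expr]
        else:                          # local computation on buffered values
            exec(expr, {}, buf)
    return db

t1 = [("T1","read","A"), ("T1","calc","A = A - 50"), ("T1","write","A"),
      ("T1","read","B"), ("T1","calc","B = B + 50"), ("T1","write","B")]
t2 = [("T2","read","A"), ("T2","calc","temp = A * 0.1"), ("T2","calc","A = A - temp"),
      ("T2","write","A"), ("T2","read","B"), ("T2","calc","B = B + temp"),
      ("T2","write","B")]

schedule1 = t1 + t2                    # serial: T1 then T2
schedule4 = t1[:2] + t2[:5] + t1[2:] + t2[5:]   # the interleaving of Figure 41.4

for name, s in [("schedule 1", schedule1), ("schedule 4", schedule4)]:
    db = run(s)
    print(name, db, "sum =", db["A"] + db["B"])
# schedule 1 preserves A + B = 3000; schedule 4 yields 950 + 2100 = 3050.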
If control of concurrent execution is left entirely to the operating system, many possible schedules, including ones that leave the database in an inconsistent state, such as the one just described, are possible. It is the job of the database system to ensure that any schedule that gets executed will leave the database in a consistent state. The concurrency-control component of the database system carries out this task.
We can ensure consistency of the database under concurrent execution by making sure that any schedule that is executed has the same effect as a schedule that could have occurred without any concurrent execution. That is, the schedule should, in some sense, be equivalent to a serial schedule. We examine this idea below.
Serializability
The database system must control concurrent execution of transactions, to ensure that the database state remains consistent. Before we examine how the database system can carry out this task, we must first understand which schedules will ensure consistency, and which schedules will not.
Since transactions are programs, it is computationally difficult to determine exactly what operations a transaction performs and how operations of various transactions interact. For this reason, we shall not interpret the type of operations that a transaction can perform on a data item. Instead, we consider only two operations: read and write. We thus assume that, between a read(Q) instruction and a write(Q) instruction on a data item Q, a transaction may perform an arbitrary sequence of operations on the copy of Q that is residing in the local buffer of the transaction. Thus, the only significant operations of a transaction, from a scheduling point of view, are its read and write instructions. We shall therefore usually show only read and write instructions in schedules, as we do for schedule 3 in Figure 41.5.
In this section, we discuss different forms of schedule equivalence; they lead to the notions of conflict serializability and view serializability.
T1                      T2
read(A)
write(A)
                        read(A)
                        write(A)
read(B)
write(B)
                        read(B)
                        write(B)
Figure 41.5 Schedule 3, showing only the read and write instructions.
Points to Ponder
• Transaction-processing systems usually allow multiple transactions to run concurrently.
• Allowing multiple transactions to update data concurrently causes several complications with consistency of the data.
• The parallelism of the CPU and the I/O system can be exploited to run multiple transactions in parallel.
• Concurrent execution reduces the unpredictable delays in running transactions.
• It is the job of the database system to ensure that any schedule that gets executed leaves the database in a consistent state.
• Since transactions are programs, it is computationally difficult to determine exactly what operations a transaction performs and how operations of various transactions interact.
Review Terms
• Concurrent executions
• Advantages of concurrency
• Problems with concurrency
• Serializability
Students Activity
1. Define concurrency?
2. Define the advantages & disadvantages of concurrency control?
3. Define serializability?
4. When can a database enter an inconsistent state?
5. How can we avoid the inconsistency problem?
Student Notes
LESSON 42:
CONCURRENCY CONTROL - II
Lesson objectives
• Types of serializability
• Conflict serializability
• View serializability
• Implementation of isolation
• Transaction definition in SQL

Conflict Serializability
Let us consider a schedule S in which there are two consecutive instructions Ii and Ij of transactions Ti and Tj, respectively (i ≠ j). If Ii and Ij refer to different data items, then we can swap Ii and Ij without affecting the results of any instruction in the schedule. However, if Ii and Ij refer to the same data item Q, then the order of the two steps may matter. Since we are dealing with only read and write instructions, there are four cases that we need to consider:
1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is read by Ti and Tj, regardless of the order.
2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written by Tj. Thus, the order of Ii and Ij matters.
3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to those of the previous case.
4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of these instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q) instruction of S is affected, since the result of only the latter of the two write instructions is preserved in the database. If there is no other write(Q) instruction after Ii and Ij in S, then the order of Ii and Ij directly affects the final value of Q in the database state that results from schedule S.

Thus, only in the case where both Ii and Ij are read instructions does the relative order of their execution not matter.

We say that Ii and Ij conflict if they are operations by different transactions on the same data item, and at least one of these instructions is a write operation.
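This conflict rule is mechanical enough to state directly in code. The sketch below is a hypothetical helper of our own, using the step representation shown earlier; it returns True exactly when two steps conflict.

def conflicts(step_i, step_j):
    """Two steps conflict if they belong to different transactions,
    touch the same data item, and at least one of them is a write."""
    txn_i, op_i, item_i = step_i
    txn_j, op_j, item_j = step_j
    return (txn_i != txn_j
            and item_i == item_j
            and "write" in (op_i, op_j))

# T1's write(A) conflicts with T2's read(A) ...
print(conflicts(("T1", "write", "A"), ("T2", "read", "A")))   # True
# ... but T2's write(A) does not conflict with T1's read(B).
print(conflicts(("T2", "write", "A"), ("T1", "read", "B")))   # False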
To illustrate the concept of conflicting instructions, we consider schedule 3, in Figure 41.5. The write(A) instruction of T1 conflicts with the read(A) instruction of T2. However, the write(A) instruction of T2 does not conflict with the read(B) instruction of T1, because the two instructions access different data items.

Let Ii and Ij be consecutive instructions of a schedule S. If Ii and Ij are instructions of different transactions and Ii and Ij do not conflict, then we can swap the order of Ii and Ij to produce a new schedule S'. We expect S to be equivalent to S', since all instructions appear in the same order in both schedules except for Ii and Ij, whose order does not matter.

Since the write(A) instruction of T2 in schedule 3 of Figure 41.5 does not conflict with the read(B) instruction of T1, we can swap these instructions to generate an equivalent schedule, schedule 5, in Figure 42.1. Regardless of the initial system state, schedules 3 and 5 both produce the same final system state.

We continue to swap non-conflicting instructions:
• Swap the read(B) instruction of T1 with the read(A) instruction of T2.
• Swap the write(B) instruction of T1 with the write(A) instruction of T2.
• Swap the write(B) instruction of T1 with the read(A) instruction of T2.

T1              T2
read(A)
write(A)
                read(A)
read(B)
                write(A)
write(B)
                read(B)
                write(B)

Figure 42.1 Schedule 5 - schedule 3 after swapping of a pair of instructions.

The final result of these swaps, schedule 6 of Figure 42.2, is a serial schedule. Thus, we have shown that schedule 3 is equivalent to a serial schedule. This equivalence implies that, regardless of the initial system state, schedule 3 will produce the same final state as will some serial schedule.

T1              T2
read(A)
write(A)
read(B)
write(B)
                read(A)
                write(A)
                read(B)
                write(B)

Figure 42.2 Schedule 6 - a serial schedule that is equivalent to schedule 3.

If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.

In our previous examples, schedule 1 is not conflict equivalent to schedule 2. However, schedule 1 is conflict equivalent to schedule 3, because the read(B) and write(B) instructions of T1 can be swapped with the read(A) and write(A) instructions of T2.

The concept of conflict equivalence leads to the concept of conflict serializability. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. Thus, schedule 3 is conflict serializable, since it is conflict equivalent to the serial schedule 1.

Finally, consider schedule 7 of Figure 42.3; it consists of only the significant operations (that is, the read and write) of transactions T3 and T4. This schedule is not conflict serializable, since it is not equivalent to either the serial schedule <T3,T4> or the serial schedule <T4,T3>.

T3              T4
read(Q)
                write(Q)
write(Q)

Figure 42.3 Schedule 7.
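In practice, one does not search for swap sequences. A standard way to test conflict serializability is to build a precedence graph, with an edge Ti -> Tj whenever some instruction of Ti conflicts with a later instruction of Tj, and check that the graph is acyclic. Below is a minimal sketch under the same step representation; the function names are ours.

def conflicts(si, sj):
    # Different transactions, same item, at least one write.
    return si[0] != sj[0] and si[2] == sj[2] and "write" in (si[1], sj[1])

def is_conflict_serializable(schedule):
    """Build the precedence graph and check it for cycles."""
    edges = set()
    for i, si in enumerate(schedule):
        for sj in schedule[i + 1:]:
            if conflicts(si, sj):
                edges.add((si[0], sj[0]))        # edge Ti -> Tj
    nodes = {t for edge in edges for t in edge}
    # Repeatedly remove nodes with no incoming edge; leftovers mean a cycle.
    while nodes:
        sources = [n for n in nodes if all(dst != n for (_, dst) in edges)]
        if not sources:
            return False
        nodes -= set(sources)
        edges = {(u, v) for (u, v) in edges if u not in sources}
    return True

schedule_7 = [("T3", "read", "Q"), ("T4", "write", "Q"), ("T3", "write", "Q")]
print(is_conflict_serializable(schedule_7))   # False: edges T3->T4 and T4->T3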
Let T5 be a transaction that transfers $10 from account B to account A. Let schedule 8 be as defined in Figure 42.4. We claim that schedule 8 is not conflict equivalent to the serial schedule <T1,T5>, since, in schedule 8, the write(B) instruction of T5 conflicts with the read(B) instruction of T1. Thus, we cannot move all the instructions of T1 before those of T5 by swapping consecutive non-conflicting instructions. However, the final values of accounts A and B after the execution of either schedule 8 or the serial schedule <T1,T5> are the same: $960 and $2040, respectively.

T1              T5
read(A)
A := A - 50
write(A)
                read(B)
                B := B - 10
                write(B)
read(B)
B := B + 50
write(B)
                read(A)
                A := A + 10
                write(A)

Figure 42.4 Schedule 8.

We can see from this example that there are less stringent definitions of schedule equivalence than conflict equivalence. For the system to determine that schedule 8 produces the same outcome as the serial schedule <T1,T5>, it must analyze the computation performed by T1 and T5, rather than just the read and write operations. In general, such analysis is hard to implement and is computationally expensive. However, there are other definitions of schedule equivalence based purely on the read and write operations. We will consider one such definition in the next section.

View Serializability
In this section, we consider a form of equivalence that is less stringent than conflict equivalence, but that, like conflict equivalence, is based on only the read and write operations of transactions.

Consider two schedules S and S', where the same set of transactions participates in both schedules. The schedules S and S' are said to be view equivalent if three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S', also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was produced by a write(Q) operation executed by transaction Tj, then the read(Q) operation of transaction Ti must, in schedule S', also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S'.

Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and, therefore, performs the same computation. Condition 3, coupled with conditions 1 and 2, ensures that both schedules result in the same final system state.

In our previous examples, schedule 1 is not view equivalent to schedule 2, since, in schedule 1, the value of account A read by transaction T2 was produced by T1, whereas this case does not hold in schedule 2. However, schedule 1 is view equivalent to schedule 3, because the values of accounts A and B read by transaction T2 were produced by T1 in both schedules.

The concept of view equivalence leads to the concept of view serializability. We say that a schedule S is view serializable if it is view equivalent to a serial schedule.

As an illustration, suppose that we augment schedule 7 with transaction T6, and obtain schedule 9 in Figure 42.5. Schedule 9 is view serializable. Indeed, it is view equivalent to the serial schedule <T3, T4, T6>, since the one read(Q) instruction reads the initial value of Q in both schedules, and T6 performs the final write of Q in both schedules.

T3              T4              T6
read(Q)
                write(Q)
write(Q)
                                write(Q)

Figure 42.5 Schedule 9 - a view-serializable schedule.

Every conflict-serializable schedule is also view serializable, but there are view-serializable schedules that are not conflict serializable. Indeed, schedule 9 is not conflict serializable, since every pair of consecutive instructions conflicts, and, thus, no swapping of instructions is possible.

Observe that, in schedule 9, transactions T4 and T6 perform write(Q) operations without having performed a read(Q) operation. Writes of this sort are called blind writes. Blind writes appear in any view-serializable schedule that is not conflict serializable.
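The three view-equivalence conditions can likewise be checked mechanically: for each data item, compare which transaction reads the initial value, which write each read observes, and which transaction writes last. Below is a simplified sketch; the helpers are our own, and it assumes each transaction reads a given item at most once.

def read_behavior(schedule):
    """For every read, record (reader, item, source writer); the source is
    None when the read sees the initial value. Also record final writers."""
    last_writer, reads, final_writer = {}, set(), {}
    for txn, op, item in schedule:
        if op == "read":
            reads.add((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
            final_writer[item] = txn
    return reads, final_writer

def view_equivalent(s1, s2):
    return read_behavior(s1) == read_behavior(s2)

schedule_9 = [("T3", "read", "Q"), ("T4", "write", "Q"),
              ("T3", "write", "Q"), ("T6", "write", "Q")]
serial     = [("T3", "read", "Q"), ("T3", "write", "Q"),
              ("T4", "write", "Q"), ("T6", "write", "Q")]
print(view_equivalent(schedule_9, serial))   # True: schedule 9 is view serializable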
Implementation of Isolation
So far, we have seen what properties a schedule must have if it is to leave the database in a consistent state and allow transaction failures to be handled in a safe manner. Specifically, schedules that are conflict or view serializable and cascadeless satisfy these requirements.

There are various concurrency-control schemes that we can use to ensure that, even when multiple transactions are executed concurrently, only acceptable schedules are generated, regardless of how the operating system time-shares resources (such as CPU time) among the transactions.

As a trivial example of a concurrency-control scheme, consider this scheme: a transaction acquires a lock on the entire database before it starts, and releases the lock after it has committed. While a transaction holds a lock, no other transaction is allowed to acquire the lock, and all must therefore wait for the lock to be released. As a result of the locking policy, only one transaction can execute at a time. Therefore, only serial schedules are generated. These are trivially serializable, and it is easy to verify that they are cascadeless as well.
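This trivial scheme amounts to wrapping every transaction in one exclusive lock on the whole database. Below is a toy sketch of the idea, with Python threads standing in for transactions; no real DBMS works this way.

import threading

db_lock = threading.Lock()    # one exclusive lock for the entire database

def run_transaction(body):
    """Acquire the database lock, run the transaction, then release it.
    Only one transaction can be inside at a time, so every schedule this
    produces is serial: trivially serializable and cascadeless."""
    with db_lock:
        body()

run_transaction(lambda: print("transfer $50 from A to B"))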
A concurrency-control scheme such as this one leads to poor performance, since it forces transactions to wait for preceding transactions to finish before they can start. In other words, it provides a poor degree of concurrency; as explained earlier, concurrent execution has several performance benefits.

The goal of concurrency-control schemes is to provide a high degree of concurrency, while ensuring that all schedules that can be generated are conflict or view serializable, and are cascadeless. The schemes have different trade-offs in terms of the amount of concurrency they allow and the amount of overhead that they incur. Some of them allow only conflict-serializable schedules to be generated; others allow certain view-serializable schedules that are not conflict-serializable to be generated.

Transaction Definition in SQL
A data-manipulation language must include a construct for specifying the set of actions that constitute a transaction.

The SQL standard specifies that a transaction begins implicitly. Transactions are ended by one of these SQL statements:
• Commit work commits the current transaction and begins a new one.
• Rollback work causes the current transaction to abort.
The keyword work is optional in both statements. If a program terminates without either of these commands, the updates are either committed or rolled back; which of the two happens is not specified by the standard and depends on the implementation.

The standard also specifies that the system must ensure both serializability and freedom from cascading rollback. The definition of serializability used by the standard is that a schedule must have the same effect as would some serial schedule. Thus, conflict and view serializability are both acceptable.

The SQL-92 standard also allows a transaction to specify that it may be executed in a manner that causes it to become nonserializable with respect to other transactions. We study such weaker levels of consistency later.
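The commit/rollback pair can be tried from any SQL interface. Below is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the account table and its values are invented for the example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 2000)")
conn.commit()

try:
    # A transaction begins implicitly with the first update.
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()          # COMMIT [WORK]: make both updates durable
except Exception:
    conn.rollback()        # ROLLBACK [WORK]: abort, undoing both updates

print(list(conn.execute("SELECT * FROM account")))  # [('A', 950), ('B', 2050)]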
Points to Ponder
• A transaction is a unit of program execution that accesses and possibly updates various data items.
• To ensure integrity of the data, we require that the database system maintain the atomicity, consistency, isolation, and durability of transactions. These properties are often called the ACID properties.
• Allowing multiple transactions to update data concurrently causes several complications with consistency.
• The database system must control concurrent execution of transactions, to ensure that the database state remains consistent.

Review Terms
• Inconsistent state
• Transaction state
• Active
• Partially committed
• Failed
• Aborted
• Committed
• Terminated
• Transaction
• Restart
• Kill
• Observable external writes
• Shadow copy scheme
• Concurrent executions
• Serial execution
• Schedules
• Conflict of operations
• Conflict equivalence
• Conflict serializability
• View equivalence
• View serializability

Students Activity
1. Define transaction.
2. Define conflict serializability.
Student Notes
LESSON 43:
CONCURRENCY CONTROL - III
Figure 43.1 Tree-structured database graph.

T10             T11             T12             T13
lock-X(B)
                lock-X(D)
                lock-X(H)
                unlock(D)
lock-X(E)
lock-X(D)
unlock(B)
unlock(E)
                                lock-X(B)
                                lock-X(E)
                unlock(H)
lock-X(G)
unlock(D)
                                                lock-X(D)
                                                lock-X(H)
                                                unlock(D)
                                                unlock(H)
                                unlock(E)
                                unlock(B)
unlock(G)

Figure 43.2 Serializable schedule under the tree protocol.

The tree-locking protocol has an advantage over the two-phase locking protocol in that, unlike two-phase locking, it is deadlock-free, so no rollbacks are required. The tree-locking protocol has another advantage over the two-phase locking protocol in that unlocking may occur earlier. Earlier unlocking may lead to shorter waiting times, and to an increase in concurrency.

However, the protocol has the disadvantage that, in some cases, a transaction may have to lock data items that it does not access. For example, a transaction that needs to access data items A and J in the database graph of Figure 43.1 must lock not only A and J, but also data items B, D, and H. This additional locking results in increased locking overhead, the possibility of additional waiting time, and a potential decrease in concurrency. Further, without prior knowledge of what data items will need to be locked, transactions will have to lock the root of the tree, and that can reduce concurrency greatly.

For a set of transactions, there may be conflict-serializable schedules that cannot be obtained through the tree protocol. Indeed, there are schedules possible under the two-phase locking protocol that are not possible under the tree protocol, and vice versa.
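The tree protocol's locking rule is simple to state: a transaction's first lock may be on any data item; afterwards, an item may be locked only while its parent is held; and an item, once unlocked, may never be relocked by the same transaction. Below is a sketch over an explicit parent table; the tree used is a made-up stand-in, not the graph of Figure 43.1, and the class is our own illustration.

parent = {"A": None, "B": "A", "D": "B", "H": "D"}   # a made-up tree

class TreeProtocolTxn:
    def __init__(self):
        self.held = set()        # items this transaction currently locks
        self.released = set()    # items unlocked earlier (never relockable)

    def lock_X(self, item):
        if item in self.released:
            raise RuntimeError("tree protocol violation: relocking " + item)
        if (self.held or self.released) and parent[item] not in self.held:
            raise RuntimeError("tree protocol violation: parent of "
                               + item + " is not held")
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.released.add(item)

t = TreeProtocolTxn()
t.lock_X("B"); t.lock_X("D"); t.unlock("B"); t.lock_X("H")   # legal sequence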
Timestamp-based Protocols
The locking protocols that we have described thus far determine the order between every pair of conflicting transactions at execution time, by the first lock that both members of the pair request that involves incompatible modes. Another method for determining the serializability order is to select an ordering among transactions in advance. The most common method for doing so is to use a timestamp-ordering scheme.

Timestamps
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system.

The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj), then the system must ensure that the produced schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj.

To implement this scheme, we associate with each data item Q two timestamp values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.

The Timestamp-ordering Protocol
The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. This protocol operates as follows:
1. Suppose that transaction Ti issues read(Q).
a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
2. Suppose that transaction Ti issues write(Q).
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the system rejects the write operation and rolls Ti back.
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system rejects this write operation and rolls Ti back.
c. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).

If a transaction Ti is rolled back by the concurrency-control scheme as a result of the issuance of either a read or a write operation, the system assigns it a new timestamp and restarts it.
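These rules translate almost line for line into code. Below is a sketch with a logical counter for timestamps and per-item R- and W-timestamps; the names and the Rollback exception are our own, and a rejected operation is modeled by raising that exception.

import itertools

_next_ts = itertools.count(1)

def new_transaction():
    """Assign a unique, fixed timestamp from a logical counter."""
    return next(_next_ts)

R_ts = {}   # R-timestamp(Q): largest TS of any successful read(Q)
W_ts = {}   # W-timestamp(Q): largest TS of any successful write(Q)

class Rollback(Exception):
    pass

def read(ts, q):
    if ts < W_ts.get(q, 0):
        raise Rollback("read(%s) rejected: value already overwritten" % q)
    R_ts[q] = max(R_ts.get(q, 0), ts)

def write(ts, q):
    if ts < R_ts.get(q, 0):
        raise Rollback("write(%s) rejected: a later read needed the old value" % q)
    if ts < W_ts.get(q, 0):
        raise Rollback("write(%s) rejected: obsolete value" % q)
    W_ts[q] = ts

t14, t15 = new_transaction(), new_transaction()   # TS(T14) < TS(T15)
read(t14, "B"); write(t15, "B")    # allowed; W-timestamp(B) is now TS(T15)
# A further read(t14, "B") would raise Rollback, after which T14 would be
# restarted with a new, larger timestamp.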
To illustrate this protocol, we consider transactions T14 and T15. Transaction T14 displays the contents of accounts A and B:
T14: read(B);
read(A);
display(A + B).
Transaction T15 transfers $50 from account B to account A:
T15: read(B);
B := B - 50;
write(B);
read(A);
A := A + 50;
write(A).
For the validation test of transaction Tj, one of the following two conditions must hold for every transaction Ti with TS(Ti) < TS(Tj):
1. Finish(Ti) < Start(Tj). Since Ti completes its execution before Tj starts, the serializability order is indeed maintained.
2. The set of data items written by Ti does not intersect with the set of data items read by Tj, and Ti completes its write phase before Tj starts its validation phase (Start(Tj) < Finish(Ti) < Validation(Tj)). This condition ensures that the writes of Ti do not affect the reads of Tj, and that the writes of Ti and Tj do not overlap; hence, the serializability order is maintained.

A schedule produced by using validation (transactions T14 and T15):

T14                     T15
read(B)
                        read(B)
                        B := B - 50
                        read(A)
                        A := A + 50
read(A)
(validate)
display(A + B)
                        (validate)
                        write(B)
                        write(A)

Points to Ponder
• Recoverability alone can be ensured by tracking commit dependencies: a transaction is allowed to commit only after every transaction whose updates it has read has committed.
• With each transaction Ti in the system, we associate a unique fixed timestamp. This timestamp is assigned by the database system before the transaction Ti starts execution.
• The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.

Review Terms
• Locking mechanism
• Graph-based protocol
• Timestamps
• Timestamp-based protocol
• Timestamp-ordering protocol
• Validation protocol

Students Activity
1. Define locks in a database system. Why are they advisable?
5. Define the timestamp-ordering protocol.
Student Notes
LESSON 44:
DATABASE RECOVERY
Flash storage, though nonvolatile, has insufficient capacity for most database systems.
• Stable storage. Information residing in stable storage is never lost (never should be taken with a grain of salt, since theoretically never cannot be guaranteed; for example, it is possible, although extremely unlikely, that a black hole may envelop the earth and permanently destroy all data!). Although stable storage is theoretically impossible to obtain, it can be closely approximated by techniques that make data loss extremely unlikely.

The distinctions among the various storage types are often less clear in practice than in our presentation. Certain systems provide battery backup, so that some main memory can survive system crashes and power failures. Alternative forms of nonvolatile storage, such as optical media, provide an even higher degree of reliability than do disks.

Stable-Storage Implementation
To implement stable storage, we need to replicate the needed information in several nonvolatile storage media (usually disk) with independent failure modes, and to update the information in a controlled manner to ensure that failure during data transfer does not damage the needed information.

RAID systems guarantee that the failure of a single disk (even during data transfer) will not result in loss of data. The simplest and fastest form of RAID is the mirrored disk, which keeps two copies of each block, on separate disks. Other forms of RAID offer lower costs, but at the expense of lower performance.

RAID systems, however, cannot guard against data loss due to disasters such as fires or flooding. Many systems store archival backups of tapes off-site to guard against such disasters. However, since tapes cannot be carried off-site continually, updates since the most recent time that tapes were carried off-site could be lost in such a disaster. More secure systems keep a copy of each block of stable storage at a remote site, writing it out over a computer network, in addition to storing the block on a local disk system. Since the blocks are output to a remote system as and when they are output to local storage, once an output operation is complete, the output is not lost, even in the event of a disaster such as a fire or flood. We study such remote backup systems later.

In the remainder of this section, we discuss how storage media can be protected from failure during data transfer. Block transfer between memory and disk storage can result in:
• Successful completion. The transferred information arrived safely at its destination.
• Partial failure. A failure occurred in the midst of transfer, and the destination block has incorrect information.
• Total failure. The failure occurred sufficiently early during the transfer that the destination block remains intact.

We require that, if a data-transfer failure occurs, the system detects it and invokes a recovery procedure to restore the block to a consistent state. To do so, the system must maintain two physical blocks for each logical database block; in the case of mirrored disks, both blocks are at the same location; in the case of remote backup, one of the blocks is local, whereas the other is at a remote site. An output operation is executed as follows:
1. Write the information onto the first physical block.
2. When the first write completes successfully, write the same information onto the second physical block.
3. The output is completed only after the second write completes successfully.

During recovery, the system examines each pair of physical blocks. If both are the same and no detectable error exists, then no further actions are necessary. (Recall that errors in a disk block, such as a partial write to the block, are detected by storing a checksum with each block.) If the system detects an error in one block, then it replaces its content with the content of the other block. If both blocks contain no detectable error, but they differ in content, then the system replaces the content of the first block with the value of the second. This recovery procedure ensures that a write to stable storage either succeeds completely (that is, updates all copies) or results in no change.

The requirement of comparing every corresponding pair of blocks during recovery is expensive to meet. We can reduce the cost greatly by keeping track of block writes that are in progress, using a small amount of nonvolatile RAM. On recovery, only blocks for which writes were in progress need to be compared. The protocols for writing out a block to a remote site are similar to the protocols for writing blocks to a mirrored disk system.

We can extend this procedure easily to allow the use of an arbitrarily large number of copies of each block of stable storage. Although a large number of copies reduces the probability of a failure to even lower than two copies do, it is usually reasonable to simulate stable storage with only two copies.
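The two-block output discipline and the recovery comparison can be sketched in a few lines. Here the two "physical blocks" are simply two files, purely for illustration; a real system would use independent disks or a remote site, and would detect partial writes with checksums.

import os

def output(block_data, path_1, path_2):
    """Write the first copy; only after it succeeds, write the second.
    The output is complete only when both writes have finished."""
    for path in (path_1, path_2):
        with open(path, "wb") as f:
            f.write(block_data)
            f.flush()
            os.fsync(f.fileno())   # force the write out to the medium

def recover(path_1, path_2):
    """Compare the pair; if they differ (and, as assumed here, neither has
    a detectable error), copy the second block over the first."""
    with open(path_1, "rb") as f1, open(path_2, "rb") as f2:
        b1, b2 = f1.read(), f2.read()
    if b1 != b2:
        with open(path_1, "wb") as f:
            f.write(b2)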
Data Access
The database system resides permanently on nonvolatile storage (usually disks), and is partitioned into fixed-length storage units called blocks. Blocks are the units of data transfer to and from disk, and may contain several data items. We shall assume that no data item spans two or more blocks. This assumption is realistic for most data-processing applications, such as our banking example.

Transactions input information from the disk to main memory, and then output the information back onto the disk. The input and output operations are done in block units. The blocks residing on the disk are referred to as physical blocks; the blocks residing temporarily in main memory are referred to as buffer blocks. The area of memory where blocks reside temporarily is called the disk buffer.

Block movements between disk and main memory are initiated through the following two operations:
1. input(B) transfers the physical block B to main memory.
2. output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.

Each transaction Ti has a private work area in which copies of all the data items accessed and updated by Ti are kept. The system creates this work area when the transaction is initiated; the system removes it when the transaction either commits or aborts. Each data item X kept in the work area of transaction Ti is denoted by Xi. Transaction Ti interacts with the database system by transferring data to and from its work area to the system buffer. We transfer data by these two operations:
1. read(X) assigns the value of data item X to the local variable Xi.
2. write(X) assigns the value of local variable Xi to data item X in the buffer block.

Note that both operations may require the transfer of a block from disk to main memory. They do not, however, specifically require the transfer of a block from main memory to disk.

A buffer block is eventually written out to the disk either because the buffer manager needs the memory space for other purposes or because the database system wishes to reflect the change to B on the disk. We shall say that the database system performs a force-output of buffer B if it issues an output(B).

When a transaction needs to access a data item X for the first time, it must execute read(X). The system then performs all updates to X on Xi. After the transaction accesses X for the final time, it must execute write(X) to reflect the change to X in the database itself.

The output(Bx) operation for the buffer block Bx on which X resides does not need to take effect immediately after write(X) is executed, since the block Bx may contain other data items that are still being accessed. Thus, the actual output may take place later. Notice that, if the system crashes after the write(X) operation was executed but before output(Bx) was executed, the new value of X is never written to disk and, thus, is lost.
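The interplay of input(B), output(B), read(X), and write(X) can be mocked up with dictionaries standing in for the disk, the buffer, and the work areas; the block names and granularity below are our own simplifications.

disk = {"block_A": {"A": 1000}, "block_B": {"B": 2000}}   # physical blocks
buffer = {}          # buffer blocks currently in main memory
work_area = {}       # local copies Xi, one dictionary per transaction

def input_block(b):                     # input(B)
    buffer[b] = dict(disk[b])

def output_block(b):                    # output(B): force the block to disk
    disk[b] = dict(buffer[b])

def read(txn, x, block):                # read(X): Xi := value of X
    if block not in buffer:
        input_block(block)
    work_area.setdefault(txn, {})[x] = buffer[block][x]

def write(txn, x, block):               # write(X): buffer copy := Xi
    if block not in buffer:
        input_block(block)
    buffer[block][x] = work_area[txn][x]

read("Ti", "A", "block_A")
work_area["Ti"]["A"] -= 50              # compute on the local copy
write("Ti", "A", "block_A")             # the change is in the buffer only;
                                        # it reaches disk at output_block("block_A")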
Recovery and Atomicity
Consider again our simplified banking system and transaction Ti that transfers $50 from account A to account B, with initial values of A and B being $1000 and $2000, respectively. Suppose that a system crash has occurred during the execution of Ti, after output(BA) has taken place, but before output(BB) was executed, where BA and BB denote the buffer blocks on which A and B reside. Since the memory contents were lost, we do not know the fate of the transaction; thus, we could invoke one of two possible recovery procedures:
• Re-execute Ti. This procedure will result in the value of A becoming $900, rather than $950. Thus, the system enters an inconsistent state.
• Do not re-execute Ti. The current system state has values of $950 and $2000 for A and B, respectively. Thus, the system enters an inconsistent state.

In either case, the database is left in an inconsistent state, and thus this simple recovery scheme does not work. The reason for this difficulty is that we have modified the database without having assurance that the transaction will indeed commit. Our goal is to perform either all or no database modifications made by Ti. However, if Ti performed multiple database modifications, several output operations may be required, and a failure may occur after some of these modifications have been made, but before all of them are made.
Log-Based Recovery
The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database. There are several types of log records. An update log record describes a single database write. It has these fields:
• Transaction identifier is the unique identifier of the transaction that performed the write operation.
• Data-item identifier is the unique identifier of the data item written. Typically, it is the location on disk of the data item.
• Old value is the value of the data item prior to the write.
• New value is the value that the data item will have after the write.

Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or abort of a transaction. We denote the various types of log records as:
• <Ti start>. Transaction Ti has started.
• <Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj. Xj had value V1 before the write, and will have value V2 after the write.
• <Ti commit>. Transaction Ti has committed.
• <Ti abort>. Transaction Ti has aborted.

Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified. Once a log record exists, we can output the modification to the database if that is desirable. Also, we have the ability to undo a modification that has already been output to the database. We undo it by using the old-value field in log records.

For log records to be useful for recovery from system and disk failures, the log must reside in stable storage. For now, we assume that every log record is written to the end of the log on stable storage as soon as it is created. Later, we shall see when it is safe to relax this requirement so as to reduce the overhead imposed by logging. The log can be used to ensure transaction atomicity despite failures; one such technique, deferred database modification, is described below. Observe that the log contains a complete record of all database activity; as a result, the volume of data stored in the log may become unreasonably large.
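An update log record, and the rule that the record must be created before the database is modified, can be sketched as follows. The structures are our own toys; the in-memory list stands in for a log on stable storage.

log = []   # stands in for the log on stable storage

def log_start(t):
    log.append(("start", t))

def log_write(t, x, old, new):
    log.append(("update", t, x, old, new))

def log_commit(t):
    log.append(("commit", t))

database = {"A": 1000}

def transactional_write(t, x, new_value):
    """The log record is appended BEFORE the database is modified, so the
    old value is always available to undo the change later if needed."""
    log_write(t, x, database[x], new_value)
    database[x] = new_value

log_start("T0")
transactional_write("T0", "A", 950)     # log now holds <T0, A, 1000, 950>
log_commit("T0")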
Deferred Database Modification
The deferred-modification technique ensures transaction atomicity by recording all database modifications in the log, but deferring the execution of all write operations of a transaction until the transaction partially commits. Recall that a transaction is said to be partially committed once the final action of the transaction has been executed. The version of the deferred-modification technique that we describe in this section assumes that transactions are executed serially.

When a transaction partially commits, the information on the log associated with the transaction is used in executing the deferred writes. If the system crashes before the transaction completes its execution, or if the transaction aborts, then the information on the log is simply ignored.

The execution of transaction Ti proceeds as follows. Before Ti starts its execution, a record <Ti start> is written to the log. A write(X) operation by Ti results in the writing of a new record to the log. Finally, when Ti partially commits, a record <Ti commit> is written to the log.

When transaction Ti partially commits, the records associated with it in the log are used in executing the deferred writes. Since a failure may occur while this updating is taking place, we must ensure that, before the start of these updates, all the log records are written out to stable storage. Once they have been written, the actual updating takes place, and the transaction enters the committed state.

Observe that only the new value of the data item is required by the deferred-modification technique. Thus, we can simplify the general update-log record structure that we saw in the previous section, by omitting the old-value field.

To illustrate, reconsider our simplified banking system. Let T0 be a transaction that transfers $50 from account A to account B:
T0: read(A);
A := A - 50;
write(A);
read(B);
B := B + 50;
write(B).
Let T1 be a transaction that withdraws $100 from account C:
T1: read(C);
C := C - 100;
write(C).
Suppose that these transactions are executed serially, in the order T0 followed by T1, and that the values of accounts A, B, and C before the execution took place were $1000, $2000, and $700, respectively.

<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
<T1 start>
<T1, C, 600>
<T1 commit>

Figure 44.2 Portion of the database log corresponding to T0 and T1.

There are various orders in which the actual outputs can take place to both the database system and the log as a result of the execution of T0 and T1. One such order appears in Figure 44.3. Note that the value of A is changed in the database only after the record <T0, A, 950> has been placed in the log.

Log                 Database
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
                    A = 950
                    B = 2050
<T1 start>
<T1, C, 600>
<T1 commit>
                    C = 600

Figure 44.3 State of the log and database corresponding to T0 and T1.

Using the log, the system can handle any failure that results in the loss of information on volatile storage. The recovery scheme uses the following recovery procedure:
• redo(Ti) sets the value of all data items updated by transaction Ti to the new values.
The set of data items updated by Ti and their respective new values can be found in the log.

The redo operation must be idempotent; that is, executing it several times must be equivalent to executing it once. This characteristic is required if we are to guarantee correct behavior even if a failure occurs during the recovery process.

After a failure, the recovery subsystem consults the log to determine which transactions need to be redone. Transaction Ti needs to be redone if and only if the log contains both the record <Ti start> and the record <Ti commit>. Thus, if the system crashes after the transaction completes its execution, the recovery scheme uses the information in the log to restore the system to a previous consistent state after the transaction had completed.

As an illustration, let us return to our banking example with transactions T0 and T1 executed one after the other in the order T0 followed by T1. Figure 44.2 shows the log that results from the complete execution of T0 and T1. Let us suppose that the system crashes before the completion of the transactions; Figure 44.4 shows the same log at three different times.
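The recovery rule, redo Ti if and only if both <Ti start> and <Ti commit> appear in the log, is a single scan over the log. Below is a sketch for the deferred-modification scheme, whose update records carry only new values; the structures are our own.

def recover(log, database):
    """Redo every transaction whose start and commit records both appear.
    Redo is idempotent, so running recovery twice (for example, after a
    crash during recovery) leaves the same final state."""
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "update" and rec[1] in (started & committed):
            _, txn, item, new_value = rec
            database[item] = new_value

# Log as in Figure 44.4b: T0 committed, T1 incomplete.
log = [("start", "T0"), ("update", "T0", "A", 950), ("update", "T0", "B", 2050),
       ("commit", "T0"), ("start", "T1"), ("update", "T1", "C", 600)]
db = {"A": 1000, "B": 2000, "C": 700}
recover(log, db)
print(db)   # {'A': 950, 'B': 2050, 'C': 700}: T1's update is ignored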
<T0 commit> <T0 commit> <T1 start> found in the log , the system performs the operation redo(Ti)
DATABASE
<T1 start> <T1,C,600> <T1,C,600> .In other words ,it restarts the recovery action from the begin-
<T1,commit> ning. Since redo writes values to the database independent of
the values currentlyin the database, the result of a successful
(a) (b) (c)
second attempt at redo is the same as though redo has suc-
Figure 44.4 Same log shown at three different times. ceeded the first time.
MANAGEMENT
<To commit> Transactions are not allowed to perform any update action, such
as writing to any buffer blockor writing a log record , while a
appears in the log on the disk. After this operation is executed, checkpoint is in progress.
the values of accounts A and Bare $950 and $2050, respectively.
The value of account Remains $700. As before, the log records Points to Ponder
of the incomplete transaction Tl can be deleted from the log. • There are various types of failure that may occur in a system,
Finally, assume that a crash occurs just after the log record each of which needs to be dealt with in a different manner
as transaction failure,system crash, disk failures.
<Tl commit>
• Storage media can be distinguished by their relative speed,
is written to stable storage. The log at the time of this crash is
capacity, and resilience to failure, and classified as volatile
as in Figure 44.4c. When the system comes back up, two
storage or nonvolatile stor-age
commit records are in the log: one for To and one for Tl’
Therefore, the system must perform operations redo(To) and • RAID systems guarantee that the failure of a single disk
redo(T1), in the order in which their commit records appear in (even during data transfer) will not result in loss of data
the log. After the system executes these operations, the values • When a system failure occurs, we must consult the log to
of accounts A, B, and Care $950, $2050, and $600, respectively. determine those transac-tions that need to be redone and
Finally, let us consider a case in which a second system crash those that need to be undone.
occurs during re-covery from the first crash. Some changes may Review Terms
have been made to the database as a result of the redo opera-
• Failure
tions, but all changes may not have been made. When the
System comes up after the second crash, recovery proceeds • Types of failures
exactly as in the pr_.g examples. For each commit record • Storage structure
<Ticommit> • Storage Type
Finally, let us consider a case in which a second system crash • Stable-Storage Implementation
occurs during re-covery from the first crash. Some changes may • Data Access
have been made to the database as a • Recovery and Atomicity
result of the redo operations, but all changes may not have • Log-Based Recovery
been made. When the System comes up after the second crash,
• Checkpoints
recovery proceeds exactly as in the pr_.g examples. For each
commit record
<Ticommit>
156
Students Activity
1. Define database recovery.
2. What are the various kinds of failures?
4. Define checkpoints.
Student Notes
“The lesson content has been compiled from various sources in public domain including but not limited to the
internet for the convenience of the users. The university has no proprietary right on the same.”