Database Management

?
Database Management
Subject: DATABSE MANAGEMENT Credits: 4
SYLLABUS
Introduction to data base management system – Data versus information, record, file; data dictionary, database
administrator, functions and responsibilities; file-oriented system versus database system
Database system architecture – Introduction, schemas, sub schemas and instances; data base architecture, data
independence, mapping, data models, types of database systems
Data base security – Threats and security issues, firewalls and database recovery; techniques of data base
security; distributed data base
Data warehousing and data mining – Emerging data base technologies, internet, database, digital libraries,
multimedia data base, mobile data base, spatial data base
Lab: Working over Microsoft Access
Suggested Readings:
1. A Silberschatz, H Korth, S Sudarshan, “Database System and Concepts”; fifth Edition; McGraw-Hill
2. Rob, Coronel, “Database Systems ”, Seventh Edition, Cengage Learning
DATABASE
MANAGEMENT
DATABASE MANAGEMENT SYSTEM
COURSE OVERVIEW
This course provides an immediately useable tools and the The students on completion of the course shall develop the
techniques in the methods of database management system, following skills and competencies:
requirements analysis, definition, specification and design etc. It • Database
provides participants with the details if the tools, techniques
• DBMS
and methods to lead or participate in the front-end phases.
• Database System Application
Database systems are designed to manage large bodies of
• File System
information.Management of data involves both defining
structures for storage of information & providing way for • Data Ionsistency
manipulation of data. In addition, the database system must

ensure safety of data.
DBMS is collection of programs that enables you to store,
modify, and extract important information from a database.
There are many different types of DBMS, ranging from small
sys-tems that run on personal computers to huge systems that
run on mainframes.
Objectives
• To help you to learn DBMS and design technique: what it is
and how one goes about doing it.
• The primary goal of DBMS is to provide an environment
that is both convenient & efficient for people to use in
retrieving & storing information..
• Database systems are designed to store large bodies of
information.
By the end of this material, you will be equipped with good
knowledge of technical information that will help you develop
& understand DBMS
ii
DATABASE
DATABASE MANAGEMENT
MANAGEMENT
CONTENT
Lesson No. Topic Page No.
INTRODUCTION TO DBMS
Lesson 1 Introduction to Database I 1
Lesson 2 Introduction to Database II 5
Lesson 3 Tutorial 8
Lesson 4 Database Concepts I 9
Lesson 5 Database Concepts II 13
DATA MODELS IN DATABASES
Lesson 6 Data Models 17
RELATIONAL DATA BASE MANAGEMENT SYSTEM
Lesson 7 Relational Database Management System I 21
Lesson 8 Relational Database Management System II 27
Lesson 9 E-R Model I 31
Lesson 10 E-R Model II 36
STRUCTURED QUERY LANGUAGES
Lesson 11 Structured Query Language(SQL) I 40
Lesson 12 LAB 45
Lesson 13 Lab 46
Lesson 14 SQL II 47
Lesson 15 LAB 55
Lesson 16 LAB 56
Lesson 17 SQL III 57
Lesson 18 Lab 62
Lesson 19 Lab 63
Lesson 20 SQL IV 64
Lesson 21 Lab 67
Lesson 22 Lab 68
Lesson 23 Integrity and security 69
Lesson 24 LAB 75
Lesson 25 LAB 76
Lesson 26 PL/SQL 77
Lesson 27 Lab 82
Lesson 28 Lab 83
Lesson 29 Database Triggers 84
Lesson 30 LAB 89
v
DATABASE
DATABASE MANAGEMENT
MANAGEMENT
CONTENT
Lesson No. Topic Page No.
Lesson 31 LAB 90
Lesson 32 Database Cursors 91
Lessom 33 LAB 100
Lesson 34 LAB 101
NORMALIZATION
Lesson 35 Normalisation I 102
Lesson 36 Normalisation I I 107
Lesson 37 Normalisation III 112
FILE ORGANIZATION METHODS
Lesson 38 File Organization Method I 118
Lesson 39 File Organization Method II 123
DATA BASE OPERATIONAL MAINTENANCE
Lesson 40 Transactions Management 130
Lesson 41 Concurrency Control I 136
Lesson 42 Concurency Control II 141
Lesson 43 Concurrency Control III 146
Lesson 44 Database Recovery 152
vi
DATABASE
LESSON 1:
INTRODUCTION TO DATABASE I
MANAGEMENT
Lessons Objective 1.2 Database System Applications
• Database There are many different types of DBMSs, ranging from small
systems that run on personal computers to huge systems that
• Database management system
run on mainframes. Database are applied in wide no. of
• Essentials of data applications. Following are some of the examples :-
• Benefits of DBMS • Banking: For customer information, accounts, loans & other
• Database system application banking transactions
• Purpose of database system • Airlines: For reservation & schedule information
1.1 What is Database Management • Universities: For student information, course
System(DBMS) registration,grades etc.
A database can be termed as a repository of data. A collection • Credit card transaction: For purchase of credit cards &
of actual data which constitutes the information regarding an generation of monthly statements.
organisation is stored in a database. For ex. There are 1000 • Tlecommunication: For keeping records of calls made ,
students in a college & we have to store their personal details, generating monthly billetc.
marks details etc., these details will be recorded in a database.
• Finance: For storing information about holdings, sales &
A collection of programs that enables you to store, modify, and purchase of financial statements
extract information from a database is known as DBMS.The
• Sales: For customer,product & purchase information
primary goal of a DBMS is to provide a way to store & retrieve
database information that is both convenient & efficient. • Manufacturing: For management of supply chain.
• Human Resource: For recording information about
employees,salaries,tax,benefits etc.
We can say that when ever we need to have a computerised
Actual data
storage
system, we need a database system
1.3 Purpose of Database system
A file system is one in which we keep the information in
Database systems are designed to manage large bodies of operating system files. Before the evolution of DBMS,
information.Management of data involves both defining organisations used to store information in file systems. A
structures for storage of information & providing way for typical file processing system is supported by a conventional
manipulation of data. In addition, the database system must operating system. The system stores permanent records in
ensure safety of data. various files & it need application program to extract records ,
DBMS is collection of programs that enables you to store, or to add or delete records .
modify, and extract important information from a database. We will compare both systems with the help of an example.
There are many different types of DBMS, ranging from small
There is a saving bank enterprise that keeps information about
sys-tems that run on personal computers to huge systems that
all customers & saving accounts. Following manipulations has
run on mainframes.
to be done with the system
Good data management is an essential prerequisite to corporate
• A program to debit or credit an account
success.
• A program to add a new account.
Data Information
• A program to find balance of an account.
Information Knowledge
• A program to generate monthly statements.
Knowledge Judgment
As the need arises new applications can be added at a particular
Judgment Decision
point of time as checking accounts can be added in a saving
Decision Success account. Using file system for storing data has got following
provided that data is: disadvantages:-
• complete 1. Data Redundancy and Inconsistency
• accurate Different programmers work on a single project , so various
• timely files are created by different programmers at some interval of
time. So various files are created in different formats & different
• easily available programs are written in different programming language.
1
Same information is repeated.For ex name & address may • The primary goal of DBMS is to provide an environment
DATABASE
appear in saving account file as well as in checking account. This that is both convenient & efficient for people to use in
redundancy sesults in higher storage space & access cost.It also retrieving & storing information.
leads to data inconsistency which means that if we change some • DBMS systems are ubiquitous today & most people
record in one place the change will not be reflected in all the interact either directly or indirectly with database many times
places. For ex. a changed customer address may be reflected in
MANAGEMENT
every day.
saving record but not any where else.
2. Difficulty in Accesing data information.
Accessing data from a list is also a difficulty in file • A major purpose of a DBMS is to provide users with an
system.Suppose we want to see the records of all customers abstract view of data i.e. the system hides how the data is
who has a balance less than $10,000, we can either check the list stored & maintained.
& find the names manually or write an application program .If
we write an application program & at some later time, we need Review Terms
to see the records of customer who have a balance of less than • Database
$20,000, then again a new program has to be written. • DBMS
It means that file processing system do not allow data to be • Database System Application
accessed in a convenient manner.
• File System
3. Data Isolation • Data Ionsistency
As the data is stored in various files, & various files may be
• Consistency constraints
stored in different format, writing application program to
retrieve the data is difficult. • Atomicity
• Redundancy
4. Integrity Problems
Sometimes, we need that data stored should satisfy certain • Data isolation
constraints as in a bank a minimum deposit should be of $100. • Data Security
Developers enforce these constraints by writing appropriate
Students Activity
programs but if later on some new constraint has to be added
then it is difficult to change the programs to enforce them. 1. What is database?Explain with example?
5. Atomicity Problems
Any mechanical or electrical devive is subject to failure, and so is
the computer system. In this case we have to ensure that data
should be restored to a consistent state.For example an amount
of $50 has to be transferred from Account A to Account B. Let
the amount has been debited from account A but have not
been credited to Account B and in the mean time, some failure
occurred. So, it will lead to an inconsistent state.
So,we have to adopt a mechanism which ensures that either full 2. What is DBMS?Explain with example?
transaction hould b executed or no transaction should be
excuted i.e. the fund transfer should be atomic.
6. Concurrent access Problems
Many systems allows multiple users to update the data
simultaneously. It can also lead the data in an inconsistent
state.Suppose a bank account contains a balance of $ 500 & two
customers want to withdraw $100 & $50 simultaneously. Both
the transaction reads the old balance & withdraw from that old
balance which will result in $450 & &400 which is incorrect.
3. List four significant difference between file system &
7.Security Problems DBMS?
All the user of database should not be able to access all the data.
For example a payroll
Personnel needs to access only that part of data which has
information about various employees & are not needed to
access information about customer accounts.
Points to Ponder
• A DBMS contains collection of inter-related data &
collection of programs to access the data.
2
4. What are the advantages of DBMS? 10. Explain consistency constraints in database?
DATABASE
MANAGEMENT
5. Explain various applications of database?
6. Explain data inconsistency with example?
7. Explain data security? Why it is needed?Explain with

example?
8. Explain isolation & atomicity property of database?
9. Explain why redundancy should be avoided in database?
3
Student Notes
4
DATABASE MANAGEMENT
DATABASE
LESSON 2:
INTRODUCTION TO DATABASE II
MANAGEMENT
Lesson Objective
• Data abstraction
• View of data
• Levels of data
• Physical level
• Logical level
• View level
• Database language
• DDL,DML
View of Data
A database contains a no. of files & certain programs to access &
modify these files.But the actual data is not shown to the user,
• View level
the system hides actual details of how data is stored & main-
tained. This level contains the actual data which is shown to the
users. This is the highest level of abstraction & the user of
Data Abstraction this level need not know the actual details(complexity) of
Data abstraction is the process of distilling data down to its data storage.
essentials. The data when needed should be retrieved efficiently.
As all the details are not of use for all the users, so we hide the
actual(complex) details from users. Various level of abstraction
to data is provided which are listed below:-
• Physical level
It is the lowest level of abstraction & specifies how the data
is actually stored. It describes the complex data structure in
details.
Database Language
As a language is required to understand any thing, similarly to
create or manipulate a database we need to learn a
language.Database language is divided into mainly 2 parts :-
1. DDL(Data definition language)
2. DML(Data Manipulation language)
Data Definition Language (DDL)
Used to specify a database scheme as a set of definitions
• Logical level expressed in a DDL
It is the next level of abstraction & describes what data are 1. DDL statements are compiled, resulting in a set of tables
stored in database & what relationship exists between varius stored in a special file called a data dictionary or data
data. It is less complex than physical level & specifies simple directory.
structures. Though the complexity of physical level is 2. The data directory contains metadata (data about data)
required at logical level, but users of logical level need not 3. The storage structure and access methods used by the
know these complexities. database system are specified by a set of definitions in a
5
special type of DDL called a data storage and definition • Dml
DATABASE
language • Query language

4. basic idea: hide implementation details of the database
Students Activity
schemes from the users
1. Define data abstraction?
Data Manipulation Language (DML)
MANAGEMENT
1. Data Manipulation is:

• retrieval of information from the database
• insertion of new information into the database
• deletion of information in the database
• modification of information in the database
2. A DML is a language which enables users to access and
manipulate data. The goal is to provide efficient human
2. How many views of data abstraction are there? Explain in
interaction with the system.
details?
3. There are two types of DML:
• procedural: the user specifies what data is needed and
how to get it
• nonprocedural: the user only specifies what data is
needed
• Easier for user
• May not generate code as efficient as that produced
by procedural languages
4. A query language is a portion of a DML involving 3. Explain database language? Differentiate between DDL &
information retrieval only. The terms DML and query DML?
language are often used synonymously.
Points to Ponder
• DBMS systems are ubiquitous today & most people
interact either directly or indirectly with database many times
every day.
information.
• A major purpose of a DBMS is to provide users with an
abstract view of data i.e. the system hides how the data is
stored & maintained.
• Structure of a database is defined through DDL. &
manipulated through DML.
• DDL statements are compiled, resulting in a set of tables
stored in a special file called a data dictionary or data
directory.
• A query language is a portion of a DML involving
information retrieval only. The terms DML and query
language are often used synonymously.
Review Terms
• Data Security
• Data Views
• Data Abstraction
• Physical level
• Logical level
• View level
• Database language
• Ddl
6
DATABASE MANAGEMENT
7
Student Notes
DATABASE
LESSON 3:
TUTORIAL
MANAGEMENT
8
DATABASE
LESSON 4:
DATABASE CONCEPTS I
MANAGEMENT
Lesson objective • Null value allowed
• Data dictionary Null or non-existing data value may be or may not be
allowed for an element. Element with possible null values
• Meta data
needs special considerations in reports and may cause
• Database schema problems, if used as a key.
• Database Instance • Default value
• Data independence Data element may have a default value. Default value may
be a variable, like current date and time of the day (DoD).
Data Dictionary
English language dictionaries define data in terms such as • Element coding (allowed values) and intra-element
“…known facts or things used as a basis for inference or validation details or reference to other documents
reckoning, typically (in modern usage) operated upon or Explanation of coding (code tables, etc.) and validation
manipulated by computers, or Factual information used as a rules when validating this element alone in the application
basis for discussion, reasoning, or calculation domain.
Data dictionary may cover the whole organisation, a part of the • Inter-element validation details or reference to other
organisation or a database. In its simplest form, the data documents
dictionary is only a collection of data element definitions. More Validation rules between this element and other elements in
advanced data dictionary contains database schema with the data dictionary.
reference keys, still more advanced data dictionary contains • Database table references
entity-relationship model of the data elements or objects. Reference to tables the element is used and the role of the
Parts of Data Dictionary element in each table. Special indication when the data
element is the key for the table or a part of the key.
1. Data Element Definitions
• Definitions and references needed to understand the
Data element definitions may be independent of table defini-
meaning of the element
tions or a part of each table definition
Short application domain definitions and references to
• Data element number other documents needed to understand the meaning and
Data element number is used in the technical documents. use of the data element.
• Data element name (caption) • Source of the data in the element
Commonly agreed, unique data element name from the Short description in application domain terms, where the
application domain. This is the real life name of this data data is coming. Rules used in calculations producing the
element. element values are usually written here.
• Short description • Validity dates for the data element definition
Description of the element in the application domain. Validity dates, start and possible end dates, when the
element is or was used. There may be several time periods
• Security classification of the data element
the element has been used.
Organisation-specific security classification level or possible
restrictions on use. This may contain technical links to • History references
security systems. Date when the element was defined in present form,
references to superseded elements, etc.
• Related data elements
• External references
List of closely related data element names when the relation
References to books, other documents, laws, etc.
is important.
• Version of the data element document
• Field name(s)
Version number or other indicator. This may include formal
Field names are the names used for this element in
version control or configuration management references,
computer programs and database schemas. These are the
but such references may be hidden, depending on the
technical names, often limited by the programming
system used.
languages and systems.
• Date of the data element document
• Code format
Writting date of this version of the data element
Data type (characters, numeric, etc.), size and, if needed,
document.
special representation. Common programming language
notation, input masks, etc. can be used.
9
• Quality control references For example, meta-data would document data about data
DATABASE
Organisation-specific quality control endorsements, dates, elements or attributes, (name, size, data type, etc) and data
etc. about records or data structures (length, fields, columns, etc)
• Data element notes and data about data (where it is located, how it is associated,
Short notes not included in above parts. ownership, etc.). Meta-data may include descriptive information
about the context, quality and condition, or characteristics of the
MANAGEMENT
Table Definitions data.

Table definition is usually available with SQL command help
table Data Independence
1. The ability to modify a scheme definition in one level
Tablename
without affecting a scheme definition in a higher level is
• Table name called data independence.
• Table owner or database name 2. There are two kinds:
• List of data element (column) names and details • Physical data independence
• Key order for all the elements, which are possible keys • The ability to modify the physical scheme without
• Possible information on indexes causing application programs to be rewritten
• Possible information on table organisation • Modifications at this level are usually to improve
Technical table organisation, like hash, heap, B+ -tree, AVL - performance
tree, ISAM, etc. may be in the table definition. • Logical data independence
• Duplicate rows allowed or not allowed • The ability to modify the conceptual scheme
• Possible detailed data element list with complete data without causing application programs to be
element definitions rewritten
• Possible data on the current contents of the table • Usually done when logical structure of database is
The size of the table and similar site specific information altered
may be kept with the table definition. 3. Logical data independence is harder to achieve as the
• Security classification of the table application programs are usually heavily dependent on the
Security classification of the table is usually same or higher logical structure of the data. An analogy is made to abstract
than its elements. However, there may be views accessing data types in programming languages.
parts of the table with lower security.
Points to Ponder
Database schema • Data dictionary is a collection of data elements & its
It is the overall structure is called a database schema. Database definition.
schema is usually graphical presentation of the whole database.
• Database Schema is the overall structure of a database.
Tables are connected with external keys and key colums. When
accessing data from several tables, database schema will be • Database instance is the structure of a database at a
needed in order to find joining data elements and in complex particular time.
cases to find proper intermediate tables. Some database • Meta-data is the data about data.
products use the schema to join the tables automatically. • The ability to modify a scheme definition in one level
Database system has several schemas according to the level of without affecting a scheme definition in a higher level is
abstraction.The physical schema describes the database design at called data independence.
physical level. The logical schema describes the database design
Review Terms
at logical level.A database can also have sub-schemas(view level)
that describes different views of database. • Database Instance
• Schema
Database Instance
Database Schema
1. Databases change over time.
Physical schema
2. The information in a database at a particular point in time is
called an instance of the database Logical schema
3. Analogy with programming languages: • Physical data independence
• Data type definition - scheme • Database Language
• Value of a variable - instance DDL
Meta-Data DML
Meta-data is definitional data that provides information about Query Language
or documentation of other data managed within an application • Data dictionary
or environment. • Metadata
10
DATABASE
Student Activity
1. What is difference between database Schema & database
instance?
MANAGEMENT
2. What do you understand by the structure of a database?
3. Define physical schema and logical schema?
4. Define data independence?Explain types of data

independence?
5. Define data dictionary, meta-data?
6. Define various elements of data dictionary?
11
Student Notes
12
DATABASE MANAGEMENT
DATABASE
LESSON 5:
DATABASE CONCEPTS II
MANAGEMENT
Lesson Objective • Storage structure and access method definition: writing
• Database manager a set of definitions translated by the data storage and
definition language compiler
• Database user
• Scheme and physical organization modification: writing
• Database administrator
a set of definitions used by the DDL compiler to generate
• Role of Database administrator modifications to appropriate internal system tables (e.g. data
• Role of Database user dictionary). This is done rarely, but sometimes the database
• Database architecture scheme or physical organization must be modified.
Database Manager • Granting of authorization for data access: granting

The database manager is a program module which provides the different types of authorization for data access to various
interface between the low-level data stored in the database and users
the application programs and queries submitted to the system. • Integrity constraint specification: generating integrity
1. Databases typically require lots of storage space (gigabytes). constraints. These are consulted by the database manager
This must be stored on disks. Data is moved between disk module whenever updates occur.
and main memory (MM) as needed. Database Users
2. The goal of the database system is to simplify and facilitate The database users fall into several categories:
access to data. Performance is important. Views provide Application programmers are computer professionals interact-
simplification. ing with the system through DML calls embedded in a program
3. So the database manager module is responsible for written in a host language (e.g. C, PL/1, Pascal).
• Interaction with the file manager: Storing raw data • These programs are called application programs.
on disk using the file system usually provided by a • The DML precompiler converts DML calls (prefaced by a
conventional operating system. The database manager special character like $, #, etc.) to normal procedure calls in a
must translate DML statements into low-level file host language.
system commands (for storing, retrieving and • The host language compiler then generates the object code.
updating data in the database).
• Some special types of programming languages combine
• Integrity enforcement: Checking that updates in the Pascal-like control structures with control structures for the
database do not violate consistency constraints (e.g. no manipulation of a database.
bank account balance below $25)
• These are sometimes called fourth-generation languages.
• Security enforcement: Ensuring that users only have
• They often include features to help generate forms and
access to information they are permitted to see
display data.
• Backup and recovery: Detecting failures due to
• Sophisticated users interact with the system without
power failure, disk crash, software errors, etc., and
writing programs.
restoring the database to its state before the failure
• They form requests by writing queries in a database query
• Concurrency control: Preserving data consistency
language.
when there are concurrent users.
• These are submitted to a query processor that breaks a
4. Some small database systems may miss some of these
DML statement down into instructions for the database
features, resulting in simpler database managers. (For
manager module.
example, no concurrency is required on a PC running MS-
DOS.) These features are necessary on larger systems • Specialized users are sophisticated users writing
special database application programs. These may be
Database Administrator CADD systems, knowledge-based and expert systems,
The database administrator is a person having central control complex data systems (audio/video), etc.
over data and programs accessing that data. Duties of the
• Naive users are unsophisticated users who interact
database administrator include:
with the system by using permanent application
• Scheme definition: the creation of the original database programs (e.g. automated teller machine).
scheme. This involves writing a set of definitions in a DDL
1. Database systems are partitioned into modules for different
(data storage and definition language), compiled by the
functions. Some functions (e.g. file systems) may be
DDL compiler into a set of tables stored in the data
provided by the operating system.
dictionary.
13
2. Components include:
DATABASE
Naïve Application Sophisticated Database

Users Programs Users Administrator
• File manager manages allocation of disk space and
data structures used to represent information on disk.
• Database manager: The interface between low-level Application Application Query Database
Interfaces programs Schema
data and application programs and queries.
MANAGEMENT
• Query processor translates statements in a query

language into low-level instructions the database
manager understands. (May also attempt to find an Query DDL
equivalent but more efficient form.) DML PreCompiler processor compiler
• DML precompiler converts DML statements

embedded in an application program to normal
procedure calls in a host language. The precompiler
Application Program Database Manager
interacts with the query processor. object code
• DDL compiler converts DDL statements to a set of
tables containing metadata stored in a data dictionary.
In addition, several data structures are required for physical
File Manager
system implementation:
• Data files: store the database itself.
• Data dictionary: stores information about the structure of
the database. It is used heavily. Great emphasis should be
placed on developing a good design and efficient
implementation of the dictionary.
• Indices: provide fast access to data items holding particular
Data files
values.
Data
Database System Architecture Data dictionary
Database systems are partitioned into modules for different storage
functions. Some functions (e.g. file systems) may be provided
by the operating system.
Components Include Points to Ponder
• File manager manages allocation of disk space and data • Database manager is a program module which provides
structures used to represent information on disk. the interface between the low-level data stored in the
• Database manager: The interface between low-level data database and the application programs
and application programs and queries. • Database administrator is a person having central control
• Query processor translates statements in a query language over data
into low-level instructions the database manager • Database user is a person who access the database at various
understands. (May also attempt to find an equivalent but level.
more efficient form.) • Data files: store the database itself.
• DML precompiler converts DML statements embedded • Data dictionary: stores information about the structure of
in an application program to normal procedure calls in a the database.
host language. The precompiler interacts with the query
• DML precompiler converts DML statements embedded
processor.
in an application program to normal procedure calls in a
• DDL compiler converts DDL statements to a set of tables host language.
containing metadata stored in a data dictionary.
• File manager manages allocation of disk space and data
In addition, several data structures are required for physical structures used to represent information on disk.
system implementation:
Review Terms
• Data files: store the database itself.
• Database Instance
• Data dictionary: stores information about the structure of
• Schema
the database. It is used heavily. Great emphasis should be
placed on developing a good design and efficient Database Schema
implementation of the dictionary. Physical schema
• Indices: provide fast access to data items holding particular Logical schema
values. • Physical data independence
14
• Database Language
DATABASE
DDL
DML
Query Language
• Data dictionary
MANAGEMENT
• Metadata
• Database Administrator
• Database User
Student Activity
1. What are the various kinds of database users?
2. What do you understand by the structure of a database?
3. Define physical schema and logical schema?
4. Define file manager, dml compiler,data files?
15
Student Notes
16
DATABASE MANAGEMENT
DATABASE
LESSON 6:
DATA MODELS
MANAGEMENT
Lesson objective If we want to create a structure where in a course various
• Understsnding data models students are there & these students are given certain marks in
assignment.
• Different types of data models
• Hierchachical data model
• network data model
• Relational model
Data models are a collection of conceptual tools for describing
data, data relationships, data semantics and data constraints.
A data model is a “description” of both a container for data
and a methodology for storing and retrieving data from that However, as you can imagine, the hierarchical database model
container. Actually, there isn’t really a data model “thing”. Data has some serious problems. For one, you cannot add a record
models are abstractions, oftentimes mathematical algorithms to a child table until it has already been incorporated into the
and concepts. You cannot really touch a data model. But parent table. This might be troublesome if, for example, you
nevertheless, they are very useful. The analysis and design of wanted to add a student who had not yet signed up for any
data models has been the cornerstone of the evolution of courses.
databases. As models have advanced so has database efficiency. Worse, yet, the hierarchical database model still creates repetition
There are various kinds of data models i.e. in a database records of data within the database. You might imagine that in the
can be arranged in various ways.The various ways in which data database system shown above, there may be a higher level that
can be represented are:- includes multiple course. In this case, there could be redundancy
1. Hierarchial data model because students would be enrolled in several courses and thus
each “course tree” would have redundant student information.
2. Network data model
Redundancy would occur because hierarchical databases handle
3. Relational Model
one-to-many relationships well but do not handle many-to-
4. E-R-Model many relationships well. This is because a child may only have
The Hierarchical Model one parent. However, in many cases you will want to have the
Organization of the records is as a collection of trees. As its child be related to more than one parent. For instance, the
name implies, the Hierarchical Database Model defines hierarchi- relationship between student and class is a “many-to-many”.
cally-arranged data. Not only can a student take many subjects but a subject may
Perhaps the most intuitive way to visualize this type of also be taken by many students. How would you model this
relationship is by visualizing an upside down tree of data. In relationship simply and efficiently using a hierarchical database?
this tree, a single table acts as the “root” of the database from The answer is that you wouldn’t.
which other tables “branch” out.
You will be instantly familiar with this relationship because that
is how all windows-based directory management systems (like
Windows Explorer) work these days.
Relationships in such a system are thought of in terms of
children and parents such that a child may only have one parent
but a parent can have multiple children. Parents and children are Though this problem can be solved with multiple databases
tied together by links called “pointers” (perhaps physical creating logical links between children, the fix is very kludgy and
addresses inside the file system). A parent will have a list of awkward.
pointers to each of their children. Faced with these serious problems, the computer brains of the
world got together and came up with the network model.
Network Databases
In many ways, the Network Database model was designed to
solve some of the more serious problems with the Hierarchical
Database Model. Specifically, the Network model solves the
problem of data redundancy by representing relationships in
terms of sets rather than hierarchy. The model had its origins in
17
the Conference on Data Systems Languages (CODASYL) which • The Sequence of Columns is Insignificant
DATABASE
had created the Data Base Task Group to explore and design a • The Sequence of Rows is Insignificant
method to replace the hierarchical model.
• Each Column Has a Unique Name
The network model is very similar to the hierarchical model
Certain fields may be designated as keys, which means that
actually. In fact, the hierarchical model is a subset of the network
searches for specific values of that field will use indexing to
MANAGEMENT
model. However, instead of using a single-parent tree hierarchy,

speed them up. Where fields in two different tables take values
the network model uses set theory to provide a tree-like
from the same set, a join operation can be performed to select
hierarchy with the exception that child tables were allowed to
related records in the two tables by matching values in those
have more than one parent. This allowed the network model to
fields. Often, but not always, the fields will have the same name
support many-to-many relationships.
in both tables. For example, an “orders” table might contain
Visually, a Network Database looks like a hierarchical Database (customer-ID, product-code) pairs and a “products” table
in that you can see it as a type of tree. However, in the case of a might contain (product-code, price) pairs so to calculate a given
Network Database, the look is more like several trees which customer’s bill you would sum the prices of all products
share branches. Thus, children can have multiple parents and ordered by that customer by joining on the product-code fields
parents can have multiple children. of the two tables. This can be extended to joining multiple
tables on multiple fields. Because these relationships are only
specified at retreival time, relational databases are classed as
dynamic database management system. The RELATIONAL
database model is based on the Relational Algebra.
A basic understanding of the relational model is necessary to
effectively use relational database software such as Oracle,
Microsoft SQL Server, or even personal database systems such
as Access or Fox, which are based on the relational model.
Nevertheless, though it was a dramatic improvement, the

network model was far from perfect. Most profoundly, the
model was difficult to implement and maintain. Most imple-
mentations of the network model were used by computer
programmers rather than real users. What was needed was a
simple model which could be used by real end users to solve
real problems.
Relational Model
The relational model was formally introduced by Dr. E. F. Points to Ponder
Codd in 1970 and has evolved since then, through a series of
• Data models are a collection of conceptual tools for
writings. The model provides a simple, yet rigorously defined,
describing data, data relationships, data semantics and data
concept of how users perceive data. Network model solves the
constraints.
problem of data redundancy by representing relationships in
terms of sets. A relational database is a collection of two- • Types of data models are:-
dimensional tables. The organization of data into relational 1. Hierarchial data model
tables is known as the logical view of the database. That is, the 2. Network data model
form in which a relational database presents data to the user and
3. Relational Model
the programmer. The way the database software physically
stores the data on a computer disk system is called the internal 4. E-R-Model
view. The internal view differs from product to product and • The Hierarchical Database Model defines hierarchically-
does not concern us here. arranged data.
A relational database allows the definition of data structures, • Network model solves the problem of data redundancy by
storage and retrieval operations and integrity constraints. In representing relationships in terms of sets.
such a database the data and relations between them are • Network model solves the problem of data redundancy by
organised in tables. A table is a collection of records and each representing relationships in terms of sets.
record in a table contains the same fields.
Review Terms
Properties of Relational Tables • Data models
• Values Are Atomic
• Hierchical data model
• Each Row is Unique
• Network data model
• Column Values Are of the Same Kind
• Relational data model
18
DATABASE
Students Activity
1. Define data models?
MANAGEMENT
2. Define hierarchichal data model?
3. Define relational data model?
4. Define relational data model?
19
Student Notes
20
DATABASE MANAGEMENT
DATABASE
LESSON 7:
RELATIONAL DATABASE MANAGEMENT SYSTEM I
MANAGEMENT
student_id student_name address
Lesson Objective
• Understanding RDBMS
• Understanding data structures 8656789 Peta Williams 9, Davis Hall
• Understanding data manipulation 8700074 John Smith 9, Davis Hall
8900020 Arun Kumar 90, Second Hall
• Understanding various relational algebra operation 8801234 Peter Chew 88, Long Hall
• Understanding data integrity 8654321 Reena Rani 88, Long Hall
8712374 Kathy Garcia 88, Long Hall
The relational model was proposed by E. F. Codd in 1970. It The relation student
8612345 Chris Watanabe 11, Main Street
deals with database management from an abstract point of
student_id subject_id
view. The model provides specifications of an abstract database
management system.To use the database management systems 8700074 CP302
based on the relational model however, users do not need to 8900020 CP302
master the theoretical foundations. Codd defined the model as 8900020 CP304
consisting of the following three components: 8700074 MA111
1. Data Structure - a collection of data structure types for 8801234 CP302
building the database. 8801234 CH001
2. Data Manipulation - a collection of operators that may be
The relation enrolment
used to retrieve, derive or modify data stored in the data
structures. subject_id subject_name department
CP302 Database Management Comp. Science
3. Data Integrity - a collection of rules that implicitly or
CP304 Software Engineering Comp. Science
explicitly define a consistent database state or changes of CH001 Introduction to Chemistry Chemistry
states PH101 Physics Physics
Data Structure MA111 Pure Mathematics Mathematics
Often the information that an organisation wishes to store in a We list a number of properties of relations:
computer and process is complex and unstructured. For
example, we may know that a department in a university has 1. Each relation contains only one record type.
200 students, most are full-time with an average age of 22 years, 2. Each relation has a fixed numer of columns that are
and most are females. Since natural language is not a good explicitly named. Each attribute name within a relation is
language for machine processing, the information must be unique.
structured for efficient processing. In the relational model the 3. No two rows in a relation are the same.
information is structures in a very simple way 4. Each item or element in the relation is atomic, that is, in
We consider the following database to illustrate the basic each row, every attribute has only one value that cannot be
concepts of the relational data model. decomposed and therefore no repeating groups are allowed.
5. Rows have no ordering associated with them.
6. Columns have no ordering associated with them (although
most commercially available systems do).
The above properties are simple and based on practical consider-
The above database could be mapped into the following ations. The first property ensures that only one type of
relational schema which consists of three relation schemes. Each information is stored in each relation. The second property
relation scheme presents the structure of a relation by specifying involves naming each column uniquely. This has several
its name and the names of its attributes enclosed in parenthe- benefits. The names can be chosen to convey what each column
sis. Often the primary key of a relation is marked by is and the names enable one to distinguish between the column
underlining. and its domain. Furthermore, the names are much easier to
student(student_id, student_name, address) remember than the position of the position of each column if
enrolment(student_id, subject_id) the number of columns is large.
subject(subject_id, subject_name, department) The third property of not having duplicate rows appears
An example of a database based on the above relational model obvious but is not always accepted by all users and designers of
is: DBMS. The property is essential since no sensible context free
21
meaning can be assigned to a number of rows that are exactly
DATABASE
Data Manipulation
the same. The manipulative part of relational model makes set processing
The next property requires that each element in each relation be (or relational processing) facilities available to the user. Since
atomic that cannot be decomposed into smaller pieces. In the relational operators are able to manipulate relations, the user
relation model, the only composite or compound type (data does not need to use loops in the application programs.
Avoiding loops can result in significant increase in the produc-
MANAGEMENT
that can be decomposed into smaller pieces) is a relation. This

simplicity of structure leads to relatively simple query and tivity of application programmers.
manipulative languages. The primary purpose of a database in an enterprise is to be able
The relation is a set of tuples and is closely related to the to provide information to the various users in the enterprise.
concept of relation in mathematics. Each row in a relation may The process of querying a relational database is in essence a way
be viewed as an assertion. For exaple, the relation student of manipulating the relations that are the database. For
asserts that a student by the name of Reena Rani has example, one may wish to know
student_id 8654321 and lives at 88, Long Hall. Similarly the 1. names of all students enrolled in CP302, or
relation subject asserts that one of the subjects offered by the 2. names of all subjects taken by John Smith.
Department of Computer Science is CP302 Database Manage-
ment. The Relational Algebra
The relational algebra is a procedural query language. It consists
In the relational model, a relation is the only compound data
of a set of operations that take one or two relations as input
structure since relation do not allow repeating groups or
and produce a new relation as their result. The fundamental
pointers.
operations in the relational algebra are select, project, union, set
We now define the relational terminology: difference, Cartesian product, and rename. In addition to the
Relation - essentially a table fundamental operations, there are several other operations-
Tuple - a row in the relation namely, set intersection, natural join, division, and assignment.
Attribute - a column in the relation We will define these operations in terms of the fundamental
Degree of a relation - number of attributes in the relation operations.
Cardinality of a relation - number of tuples in the relation Fundamental Operations
Domain - a set of values that an attribute is permitted to take. The select, project, and rename operations are called unary
Same domain may be used by a number of different attributes. operations, because they operate on one relation. The other
Primary key - as discussed in the last chapter, each relation must three operations operate on pairs of relations and are, therefore,
have an attribute (or a set of attributes) that uniquely identifies called binary operations.
each tuple.
Various operations are shown as follows:
Each such attribute (or a set of attributes) is called a candidate
key of the relation if it satisfies the following properties: Operation Symbol Operation Symbol
• (a) the attribute or the set of attributes uniquely identifies Projection Cartesian product
each tuple in the relation (called uniqueness), and Join
Selection
• (b) if the key is a set of attributes then no subset of these
attributes has property (a) (called minimality). Left outer join
Renaming
There may be several distinct set of attributes that may serve as Right outer join
candidate keys. One of the candidate keys is arbitrarily chosen as Union
the primary key of the relation. Intersection Full outer join
The three relations above student, enrolment and subject have
Assignment Semijoin
degree 3, 2 and 3 respectively and cardinality 4, 6 and 5 respec-
tively. The primary key of the the relation student is student_id,
of relation enrolment is (student_id, subject_id), and finally the The Select Operation
primary key of relation subject is subject_id. The relation
student probably has another candidate key. If we can assume
The select operation selects tuples that satisfy a given condition..
the names to be unique than the student_name is a candidate
The argument relation is in parentheses after the σ. Thus, to
key. If the names are not unique but the names and address
select those tuples of the loan relation where the branch is
together are unique, then the two attributes (student_id,
“Perryridge,” we write
address) is a candidate key. Note that both student_id and
(student_id, address) cannot be candidate keys, only one can. σ branch-name = “Perryridge” (loan)
Similarly, for the relation subject, subject_name would be a If the loan relation is as shown , then the relation that results
candidate key if the subject names are unique. from the preceding query will be a different relation.
The relational model is the most popular data model for We can find all tuples in which the amount lent is more than
commercial data processing applications.It is very much simple $1200 writing
due to which the programmer’s work is reduced. σ amount>1200 (loan)
22
In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the Example: The table E (for Employee)
DATABASE
selection predicate. Furthermore, we can combine several enr ename dept
predicates into a larger predicate by using the connectives and
1 Bill A
(Λ), or (V), and not (¬). Thus, to find those tuples pertaining
to loans of more than $1200 made by the Perryridge branch, we 2 Sarah C
write 3 John A
MANAGEMENT
σ branch-name = “Perryridge” L amount>1200 (loan) Example: The table D (for Department)
dnr dname
A Marketing
B Sales
C Legal
The selection predicate may include comparisons between two
Result Relational algebra
attributes. To Illustrate, consider the relation loan-officer that enr ename dept dnr dname EXD
consists of three attributes: customer-name, banker-name, and 1 Bill A A Marketing
loan-number, which specifies that a particular banker is the loan 1 Bill A B Sales
1 Bill A C Legal
officer for a loan that belongs to some customer. To find all
2 Sarah C A Marketing
customers who have the same name as their loan officer, we can 2 Sarah C B Sales
write 2 Sarah C C Legal
3 John A A Marketing
σ customer-name = banker-name (loan-officer) 3 John A B Sales
Projection 3 John A C Legal
Projection is the operation of selecting certain attributes from a • Seldom useful in practice.
relation R to form a new relation S. For example, one may only
• Can give a huge result.
be interested in the list of names from a relation that has a
number of other attributes. Projection operator may then be The Union Operation
used. Like selection, projection is an unary operator. Consider a query to find the names of all bank customers who
II loan-number, amount (loan) have either an account or a loan or both. Note that the customer
relation does not contain the information, since a customer
Composition of Relational Operations does not need to have either an account or a loan at the bank.
The fact that the result of a relational operation is itself a To answer this query, we need the information in the depositor
relation is important. Consider the more complicated query relation and in the borrower relation . We know how to find
Perryridge
“Find those customers who live1500
in Harrison.” We write: the names of all customers with a loan in the bank:
II customer-name (σ customer-city = “Harrison” (customer)) II customer-name (borrower)
Notice that, instead of giving the name of a relation as the We also know how to find the names of all customers with an
argument of the projection operation, we give an expression account in the bank:
that evaluates to a relation.
II customer-name (depositor)
In general, since the result of a relational-algebra operation is of
To answer the query, we need the union of these two sets; that
the same type (relation) as its inputs, relational-algebra opera-
is, we need all customer names that appear in either or both of
tions can be composed together into a
the two relations. We find these data by the binary operation
loan-number amount union, denoted, as in set theory, by U. So the expression needed
L-11 900 is
L-14 1500
II customer-name (borrower) U II customer-name (depositor)
L-15 1500
L-16 1300 There are 10 tuples in the result, even though there are seven
L-17 1000 distinct borrowers and six depositors. This apparent discrepancy
L-23 2000 occurs because Smith, Jones, and Hayes are borrowers as well as
L-93 500 depositors. Since relations are sets, duplicate values are elimi-
nated.
customer-name
Loan number and the amount of the loan. Adams
Relational-algebra expression. Composing relational-algebra Curry
operations into relational-algebra expressions in just like Hayes
Jackson
composing arithmetic operations (such as +, -, *, and %) into
Jones
arithmetic expressions. Smith
Cartesian Product Williams
Lindsay
The cartesian product of two tables combines each row in one
Johnson
table with each row in the other table. Turner
23
Names of all customers who have either a loan or an account. of the query. For relational-algebra queries, assignment must
DATABASE
For a union operation r U s to be valid, we require that two always be made to a temporary relation variable.Note that the
conditions hold: assignment operation does not provide any additional power to
the algebra. It is, however, a convenient way to express complex
1. The relations r and s must have the same number of
queries.
attributes.
MANAGEMENT
2. The domains of the ith attribute of r and the ith attribute Points to Ponder
of s must be the same, for all i. • The relational model was proposed by E. F. Codd in
Note that r and s can be, in general, temporary relations that are 1970.
the result of relational-algebra expressions. • provides specifications of an abstract database
management system
The Set-Intersection Operation
The first additional-relational algebra operation that we shall • It consists of the following three components:
define is set intersection (I ). Suppose that we wish to find all 1. Data Structure – a collection of data structure types for
customers who have both a loan and an account. Using set building the database.
intersection, we can write 2. Data Manipulation – a collection of operators that may be
IIcustomer-name (borrower) IIcustomer-name (deposi- used to retrieve, derive or modify data stored in the data
tor) structures.
Note that we can rewrite any relational algebra expression that 3. Data Integrity – a collection of rules that implicitly or
uses set intersection by replacing the intersection operation with explicitly define a consistent database state or changes of.
a pair of set-difference operations as: • Relational algebra describes a set of algebraic
operations that operates on tables, & output a table as
Thus, set intersection is not a fundamental operation and does a result.
not add any power to the relational algebra. It is simply more Review Terms
convenient to write r I s than to write r – (r – s). • Table/Relation
The Set Difference Operation • Tuple
The set-difference operation, denoted by -, allows us to find
• Domain
tuples that are in one relation but are not in another. The
expression r – s produces a relation containing those tuples in r • Database schema
but not in s. • Database instance
We can find all customers of the bank who have an account but • Keys
not a loan by writing Primary key
II customer-name (depositor) – II customer-name (borrower) Foreign key
As with the union operation, we must ensure that set differ- • Relational algebra
ences are taken between compatible relations. Therefore, for a
Student Activity
set difference operation r – s to be valid,
1. Why do we use RDBMS?
we require that the relations r and s be of the same arity, and
that the domains of the ith attribute of r and the ith attribute
of s be the same.
The Assignment Operation
It is convenient at times to write a relational-algebra expression
by assigning parts of it to temporary relation variables. The
assignment operation, denoted by ← , works like assignment in
a programming language.
temp1 amount>1200 (loan)
2. Define relation,tuple,domain,keys?
temp2 II loan-number, amount (loan)
result = temp1 – temp2
The evaluation of an assignment does not result in any relation
being displayed to the user. Rather, the result of the expression
to the right of the is assigned to the relation variable on the
left of the . This relation variable may be used in subsequent
expressions.
With the assignment operation, a query can be written as a
sequential program consisting of a series of assignment
followed by an expression whose value is displayed as the result
24
3. What is the difference between Intersection, Union &
DATABASE
Cartesian product?
MANAGEMENT
25
Student Notes
26
DATABASE MANAGEMENT
DATABASE
LESSON 8:
RELATIONAL DATABASE MANAGEMENT SYSTEM II
MANAGEMENT
Lesson Objectives distinct). An example arises in the query “Find the number of
• Elaborating various other features of Relational algebra branches appearing in the pt-works relation.” In this case, a
branch name count only once, regardless of the number of
• Understanding aggregate function employees working that branch. We write this query as follows:
• Understanding joins Gcount-distinct(branch-name) (pt-works)
Understanding natural, outer, inner joins? In the above figure, the result of this query is a single row
Aggregate Functions containing the value 3.
Aggregate functions take a collection of values and return a Suppose we want to find the total salary sum of all part-time
single value as a result. For example, the aggregate function sum employees at each branch of the bank separately, rather than the
takes a collection of values and returns the sum of the values. sun for the entire bank. To do so, we need to partition the
Thus, the function sum applied on the collection relation pt-works into group based on the branch, and to apply
{1,1,3,4,4,11} the aggregate function on each group.
returns the value 24. The aggregate function avg returns the The following expression using the aggregation operator G
average of the values. When applied to the preceding collection, achieves the desired result:
it returns the value 4. The aggregate function count returns the branch-name
G sum(salary) (pt-works)
number of the elements in the collection, and returns 6 on the In the expression, the attribute branch-name in the left-hand
preceding collection. Other common aggregate function include subscript of G indicates that the input relation pt-works must
min and max, which return the minimum and maximum be divided into groups based on the value of branch-name.
values in a collection; they return 1 and 11, respectively, on the Following Figure shows the resulting groups.
preceding collection.
employee-name branch-name salary
The collections on which aggregate functions operate can have
Rao Austin 1500
multiple occurrences of a value; the order in which the values Sato Austin 1600
appear is not relevant. Such collections are called multisets. Sets Johnson Downtown 1500
are a special case of multisets where there is only one copy of Loreena Downtown 1300
each element. Peterson Downtown 2500
To illustrate the concept of aggregation, we take the following Adams Perryridge 1500
example Brown Perryridge 1300
Gopal Perryridge 5300
Adams Perryridge 1500
Brown Perryridge 1300 The expression sum(salary) in the right-hand subscript of G
Gopal Perryridge 5300 indicates that for each group of tuples (that is, each branch), the
Johnson Downtown 1500 aggregation function sum must be applied on the collection of
Loreena Downtown 1300 values of the salary attribute. The output relation consists of
Peterson Downtown 2500 tuples with the branch name, and the sum of the salaries for
Rao Austin 1500 the branch, as shown in Figure
Sato Austin 1600
branch-name sum of salary
Austin 3100
G sum(salary)
(pt-works) Downtown 5300
The symbol G is the letter G in calligraphic fount; read it as Perryridge 8100
“calligraphic G.” The relational-algebra operation G signifies
that aggregation is to be applied, and its subscript specifies the The general from of the aggregation operation G is as follows:
aggregate operation to be applied. The result of the expression G1,G2,…,Gn G F1 (A1), F2(A2),…,Fm(Am)(E)
above is a relation with a single attribute, containing a single Where E is any relational-algebra expression; G1, G2,…, Gn
row with a numerical value corresponding to the sum of all the constitute a list of attributes on which to group; each Fi is an
series of all employees working part-time in the bank. aggregate function; and each Ai is an attribute
There are cases where we must eliminate multiple occurrences of
a value before computing an aggregate function. If we do want
Join
The Natural-Join Opeation
to eliminate duplicates, we use the same function names as
before, with the addition of the hyphenated string “distinct” If is often desirable to simplify certain queries that require a
appended to the end of the function name (for example, count- Cartesian product. Usually, a query that involves a Cartesian
27
product includes a selection operation on the result of the the natural join. The tuple (Gates, null, null, Redmond, 5300) is
DATABASE
Cartesian product. Consider the query “Find the names of all such a tuple. Thus, all information from the right relation is
customers who have a loan at the bank, along with the loan present in the result of the right outer join.
number and the loan amount.” We first form the Cartesian The full outer join ( ) does both of those operations,
product of the borrower and loan relations. Then, we select padding tuples from the left relation that did not match any
those tuples that pertain to only the same loan-number,
MANAGEMENT
from the right relation, as well as tuples from the right relation
followed by the projection of the resulting customer-name, that did not match any from the left relation, and adding them
loan-number, and amount: to the result of the join.
IIcustomer-name, loan.loan-number, amount
(σ borrower.loan-number = loan.loan-number (borrower x
loan))
The natural join is a binary operation that allows us to combine
certain selections and a Cartesian product into one operation. It
Result of employee ft-works.
is denoted by the “join” symbol ( ). The natural-join opera-
tion forms a Cartesian product of its two arguments, performs
a selection forcing equality on those attributes that appear in
both relation schemas, and finally removes duplicate attributes.
Outer Join
The outer-join operation is an extension of the join operation
to deal with missing information. Suppose that we have the Result of employee ft-works.
relations with the following schemas, which contain data on Renaming Tables and Columns
full-time employees: Example: The table E (for Employee)
employee (employee-name, street, city)
ft-works (employee-name, branch-name, salary)
Suppose that we want to generate a single relation with all the
information (street, city, branch name, and salary) about full-
time employees. A possible approach would be to use the
natural join operation as follows: Example: The table D (for Department)
employee ft-works
nr name
We can use the outer-join operation to avoid this loss of A Marketing
information. There are actually three forms of the operation: let
B Sales
outer join, denoted ; right outer join, denoted ; and
C Legal
full outer join, denoted . All three forms of outer join
compute the join, and add extra tuples to the result of the join. We want to join these tables, but:
• Several columns in the result will have the same name (nr
and name).
• How do we express the join condition, when there are two
columns called nr?
The result of employee ft-works.
Solutions
• Rename the attributes, using the rename operator.
• Keep the names, and prefix them with the table name, as is
done in SQL. (This is somewhat unorthodox.)
Result Relational algebra
Result of employee ft-works. enr ename dept dnr dname
1 Bill A A Marketing (p (enr, ename, dept)(E)) ? dept = dnr (p (dnr,
The left outer join ( ) takes all tuples in the left relation that
2 Sarah C C Legal dname)(D))
did not match with any tuple in the right relation, pads the 3 John A A Marketing
tuple with null values for all other attributes from the right
relation, and adds them to the result of the natural join. The You can use another variant of the renaming operator to change
tuple (Smith, Revolver, Death Valley, null, null) is such a tuple. the name of a table, for example to change the name of E to R.
All information from the left relation is present in the result of This is necessary when joining a table with itself (see below).
the left outer join. .p R(E)
The right outer join ( ) is symmetric with the left outer join: A third variant lets you rename both the table and the columns:
It pads tuples from the right relation that did not match any (E)
.p R(enr, ename, dept)
from the left relation with nulls and adds them to the result of
28
5. Define rename operators?
DATABASE
Points to Ponder
• Aggregate functions take a collection of values and return a
single value as a result.
• Usually, a query that involves a Cartesian product includes a
selection operation on the result of the Cartesian product.
MANAGEMENT
Review Terms
• Aggregate functions
• Joins
• Natural join
• Outer join
• Right outer join
• Left outer join
Student Notes
1. Define aggregate functions with example?
2. Define joins?What is natural join?
3. Differentiate between inner join & outer join?
4. Differentiate between left outer join & right outer join with
the help of example?
29
Student Notes
30
DATABASE MANAGEMENT
DATABASE
LESSON 9:
E-R MODEL - I
MANAGEMENT
Lesson Objective Each entity has a value for each of its attributes.For each
• Understanding entity attribute ,there is a set of permitted values called domain or
value set.
• Understanding relationship
• Understanding attribute, domain, entity set Simple and Composit Attributes
A simple attribute has got one value for its attribute & a
• Understanding Simple & composit Attributes
comosit attribute is one which can be divided into sub-parts.
• Understanding Derived Attribute For example an attribute name can be divided into first name
• Understanding relationship set middle name & last name.
• Components of E-R-Diagrams Single and Multivalued Attributes
• Designing E-R-diagrams An attribute which have got only one value is known as single
ER considers the real world to consist of entities and relation- valued attribute.For ex. the loan_no attribute will have only one
ships among them. An Entity is a ‘thing’ which can be distinctly loan_no. There may be cases when an attribute has a set of
identified, for example a person, a car, a subroutine, a wire, an values for a specific entity.For ex. an attribute phone_no. may
event. have a value zero,one or several phone_no. This is known as
multivalued attribute.
A Relationship is an association among entities, eg person
Owns car Derived Attribute
Its value is derived from value of other related attributes or
is an association between a person and a car person EATS dish
entities.For ex. an attribute age can be calculated from another
IN place is an association among a person, a dish and a place.
attribute date_of_birth.
Attribute, Value, Domain, Entity Set
The information about one entity is expressed by a set of
Relationship, Relationship Set
A relationship is an association among several entities.For ex. a
(attribute,value) pairs, eg a car model could be:
customer A is associated with loan_no L1. A relationship set is
| Name = R1222 |
a subset of the cartesian product of entity sets. For example a
Power = 7.3 relationship set (abbreviated as RSet) on the relationship
Nseats =5 ‘Person HAS_EATEN Dish IN Place’ could be as shown in
Values of attributes belong to different value-sets or domains, ---------------------------------------
| RSet 'Person HAS_EATEN Dish IN Place' |
for example, for a car, |---------------------------------------|
| Person | Dish | Place |
Nseats is an integer between 1 and 12 |---------|--------|--------------------|
Entities defined by the same set of attributes can be grouped | Steve | Duck | Kuala Lumpur |
| Weiren | Duck | Beijing |
into an Entity Set (abbreviated as ESet) as shown in | Paolo | Noodles| Naples |
| Mike | Fondue | Geneva |
|-------------------------------| | Paolo | Duck | Beijing |
| ESET : CarModel | |---------------------------------------|
|-------------------------------|
| Name | Power | Nseats | Notice that an RSet is an ESet having ESets as attributes.
|----------|--------------------|
| R1222 | 7.3 | 5 | Components of ER-D
| HZ893 | 6.8 | 5 | The overall logical structure of a database can be expressed
| R1293 | 5.4 | 4 | graphically by an E-R diagram.The various components of an
|-------------------------------|
E-R diagram are as folows:-
1. Rectapgles, which represent entity sets.
An Entity Set
A given set of attributes may be referred to as an entity type. All 2. _ Ellipses, which represent attributes.
entities in a given ESet are of the same type, but sometimes 3. Diamonds, which represents relationship sets
there can be more that one set of the same type. The set of all 4. Lines, which link attributes to entity sets and entity sets to
persons who are customers at a given bank can be defined as an relationship sets.
entity set customer. The individual entity that constitutes a set
5. Double ellipses, which represent multi-valued attributes.
are said to be extension to entity set. So all the individual bank
customers are the extension of entity set customer. 6. Dashed ellipse which represents derived attributes
7. Double lines,which indicate total participation of an entity
in a relationship set.
31
DATABASE
ENTITY RELATIONSHIP DIAGRAM NOTATIONS
Peter Chen developed ERDs in 1976. Since then Charles Bachman and James Martin
have added some sligh refinements to the basic ERD principles.
MANAGEMENT
Entity
An entity is an object or concept about which you want to
store information.
Weak Entity
A weak entity is dependent on another entity to exist..
Attributes
Attributes are the properties or characteristics of an
entity.
Key attribute
A key attribute is the unique, distinguishing characteristic
of the entity. For example, an employee's social security
number might be the employee's key attribute.
Multivalued attribute
A multivalued attribute can have more than one value.
For example, an employee entity can have multiple skill
values.
Derived attribute
A derived attribute is based on another attribute. For
example, an employee's monthly salary is based on the
employee's annual salary.
Relationships
Relationships illustrate how two entities share information
in the database structure.
32
DATABASE
Weak relationship
To connect a weak entity with others, you
MANAGEMENT
should use a weak relationship notation.
Cardinality
Cardinality specifies how many instances of
an entity relate to one instance of another
entity.
Ordinality is also closely linked to cardinality.

While cardinality specifies the occurences of a
relationship, ordinality describes the
relationship as either mandatory or optional.
In other words, cardinality specifies the
maximum number of relationships and
ordinality specifies the absolute minimum
number of relationships.
Recursive relationship
In some cases, entities can be self-linked. For
example, employees can supervise other
employees.
Mapping Cardinalities
It express the number of entities to which other entity can be
associated via a relationship set. It can be of the following
types:-
1. One to one: An entity in A is associated with at most one
entity in B, & an entity in B is associated with at most one
entity in A.
2. One to many: An entity in A is associated with any
number(zero or more) of entities in B.An entity in B
however can be associated with at most one entity in A.
3. Many to one: An entity in A is associated with at most one
entity in B.An entity in B however can be associated with
any number(zero or more) of entities in A.
4. Many to many: An entity in A is associated with any
number(zero or more) of entities in B & an entity in B is
associated with any number(zero or more) of entities in A.
33
One to many relationship
DATABASE
2. Define relationship, relationship set?

MANAGEMENT
Many to one relationship
3. Differentiate between simple & composit attribute?
One to one relationship
4. Define derived attribute?
Points to Ponder
• An Entity is a ‘thing’ which can be distinctly identified, for
example a person, a car, a subroutine, a wire, an event.
• A Relationship is an association among entities, eg person
Owns car
• A given set of attributes may be referred to as an entity type
• A simple attribute has got one value for its attribute & a 5. Differentiate between single & multi-valued attribute?
comosit attribute is one which can be divided into sub-
parts.
• value is derived from value of other related attributes or
entities is known as derived attribute.
• A relationship is an association among several entities
• A relationship set is a subset of the cartesian product of
entity sets
6. Define cardinality?Explain various kinds of cardinality?
Review Terms
• Entity
• Entity set
• Attribute
• Domain
• Value
• Relationship
7. Define various components of E-R-Diagram?
• Relationship set
• Cardinality
• Association
Student Notes
1. Define entity, domain,value?
34
DATABASE MANAGEMENT
35
Student Notes
DATABASE
LESSON 10:
E-R MODEL - II
MANAGEMENT
Lesson Objective • A superkey may contain extraneous attributes, and we are

• Designing E-R-Diagram often interested in the smallest superkey. A superkey for
which no subset is a superkey is called a candidate key.
• Understanding keys
• In the example above, S.I.N. is a candidate key, as it is
• Super key,primary key, composit key
minimal, and uniquely identifies a customer entity.
Entity relationship diagram methodology
• A primary key is a candidate key (there may be more than
Developing Entity Relationship Diagrams one) chosen by the DB designer to identify entities in an
(ERDs) entity set.
Why An entity set that does not possess sufficient attributes to form
Entity Relationship Diagrams are a major data modelling tool a primary key is called a weak entity set. One that does have a
and will help organize the data in your project into entities and primary key is called a strong entity set.
define the relationships between the entities. This process has For example,
proved to enable the analyst to produce a good database
• The entity set transaction has attributes transaction-number,
structure so that the data can be stored and retrieved in a most
date and amount.
efficient manner.
• Different transactions on different accounts could share the
Information same number.
Entity • These are not sufficient to form a primary key (uniquely
A data entity is anything real or abstract about which we want to identify a transaction).
store data. Entity types fall into five classes: roles, events,
• Thus transaction is a weak entity set.
locations, tangible things or concepts. E.g. employee, payment,
campus, book. Specific examples of an entity are called in- For a weak entity set to be meaningful, it must be part of a one-
stances. E.g. the employee John Jones, Mary Smith’s payment, to-many relationship set. This relationship set should have no
etc. descriptive attributes. (Why?)
The idea of strong and weak entity sets is related to the
Relationship
existence dependencies seen earlier.
A data relationship is a natural association that exists between
one or more entities. E.g. Employees process payments. • Member of a strong entity set is a dominant entity.
Cardinality defines the number of occurrences of one entity • Member of a weak entity set is a subordinate entity.
for a single occurrence of the related entity. E.g. an employee A weak entity set does not have a primary key, but we need a
may process many payments but might not process any means of distinguishing among the entities.
payments depending on the nature of her job.
The discriminator of a weak entity set is a set of attributes that
Attribute allows this distinction to be made.
A data attribute is a characteristic common to all or most The primary key of a weak entity set is formed by taking the
instances of a particular entity. Synonyms include property, data primary key of the strong entity set on which its existence
element, field. E.g. Name, address, Employee Number, pay rate depends (see Mapping Constraints) plus its discriminator.
are all attributes of the entity employee. An attribute or
combination of attributes that uniquely identifies one and only To illustrate
one instance of an entity is called a primary key or identifier. E.g. • transaction is a weak entity. It is existence-dependent on
Employee Number is a primary key for Employee. account.
Keys • The primary key of account is account-number.
Differences between entities must be expressed in terms of • transaction-number distinguishes transaction entities within
attributes. the same account (and is thus the discriminator).
• A superkey is a set of one or more attributes which, taken • So the primary key for transaction would be (account-
collectively, allow us to identify uniquely an entity in the number, transaction-number).
entity set. Just Remember: The primary key of a weak entity is found by
• For example, in the entity set customer, customer-name and taking the primary key of the strong entity on which it is
S.I.N. is a superkey. existence-dependent, plus the discriminator of the weak entity
• Note that customer-name alone is not, as two customers set.
could have the same name.
36
3. Examine relationships between entities closely. Are they
DATABASE
An Entity Relationship Diagram
Methodology necessary? Are there any relationships missing? Eliminate
any redundant relationships. Don’t connect relationships to
Identify the roles, events, locations, tangible things
1. Identify Entities or concepts about which the end-users want to each other.
store data.
Find the natural associations between pairs of
4. Use colors to highlight important portions of your
MANAGEMENT
2. Find Relationships diagram.
entities using a relationship matrix.
Put entities in rectangles and relationships on line
3. Draw Rough ERD
segments connecting the entities.
Determine the number of occurrences of one entity
4. Fill in Cardinality
for a single occurrence of the related entity.
Identify the data attribute(s) that uniquely identify
5. Define Primary Keys
one and only one occurrence of each entity.
Eliminate Many-to-Many relationships and include
6. Draw Key-Based ERD
primary and foreign keys in each entity.
Name the information details (fields) which are
7. Identify Attributes
essential to the system under development.
For each attribute, match it with exactly one entity
8. Map Attributes
that it describes.
Adjust the ERD from step 6 to account for entities
9. Draw fully attributed ERD
or relationships discovered in step 8.
Does the final Entity Relationship Diagram
10. Check Results
accurately depict the system data?
A Simple Example
A company has several departments. Each department has a
supervisor and at least one employee. Employees must be
assigned to at least one, but possibly more departments. At
least one employee is assigned to a project, but an employee
may be on vacation and not assigned to any projects. The
important data fields are the names of the departments,
projects, supervisors and employees, as well as the supervisor
and employee number and a unique project number.
1. Identify Entities
The entities in this system are Department, Employee, Supervi-
sor and Project. One is tempted to make Company an entity,
but it is a false entity because it has only one instance in this
problem. True entities must have more than one instance.
2. Find Relationships
We construct the following Entity Relationship Matrix:
Department Employee Supervisor Project
Department is assigned run by
Employee belongs to works on
Supervisor Runs
Project uses
3. Draw Rough ERD

We connect the entities whenever a relationship is shown in the
entity Relationship Matrix.
E-R diagram with an attribute attached to a relationship set.
If a relationship set has also some attributes associated with it,

Tips for Effective ER Diagrams then we link these ttributes to that relationship set. For
1. Make sure that each entity only appears once per diagram. example, we have the accessdrate descriptive attribute attached
2. Name every entity, relationship, and attribute on your to the relationship set depositor to specify the most ecent date
diagram. on which a customer accessed that account
37
E-R diagram with composite, multivalued, and derived
DATABASE
Student Activity
attributes 1. Explain the difference between primary key, candidate key &
super key?
MANAGEMENT
2. Why is redundancy a bad practice ?

It shows how composite attributes can be represented in the E-
R notation. Here, a composite attribute name, with component
attributes first-name, middle-initial, md last-name replaces the
simple attribute customer-name of customer. Also, a compos-
te attribute address, whose component attributes are street, city,
state, and zip-code) replaces the attributes customer-street and
customer-city of customer. The attribute street is tself a
composite attribute whose component attributes are street-
number, street-name, md apartment number.
3. Construct an E-R-diagram for a car insurance company
It also illustrates a multivalued attribute phone-number,
whose customer owns one or more cars each.Each car is
depicted by a louble ellipse, and a derived attribute age, depicted
associated with it zero to any number of recorded student?
by a dashed ellipse.
Points to Ponder
• ER considers the real world to consist of entities and
relationships among them.
• An Entity is a ‘thing’ which can be distinctly identified
• A Relationship is an association among entities
• The information about one entity is expressed by a set of
(attribute,value)
4. Design an E-R diagram for keeping tracks of the exploits
• A given set of attributes may be referred to as an entity type
of your favourite sports team. You should store the
• Attributes are of various types:- matches played, scores in each match, players in each match
Single & Multi-valued & individual player statistics for each match. Summary
Simple & Composit statistics should be modelled as derived attribute?
Derived
• E-R-diagrams are expressed through various symbols
• Cardinality expresses the number of entities to which other
entity can be associated via a relationship set
Review Terms
• Entity.
• Relationship.
• Attribute.
• dOMAIN
• E-R-D Symbols.
• Various kinds of keys.
• Types of Attributes
• Types of Cardinality
38
DATABASE MANAGEMENT
39
Student Notes
DATABASE
LESSON 11:
STRUCTURED QUERY LANGUAGE(SQL)I
MANAGEMENT
Lesson Objective these tables. Tables are uniquely identified by their names and
• Understanding SQL are comprised of columns and rows. Columns contain the
column name, data type, and any other attributes for the
• Understanding DDL, DML
column. Rows contain the records or data for the columns.
• Creating tables Here is a sample table called “weather”.
• Selecting data city, state, high, and low are the columns. The rows contain the
• Constraints data for this table:
• Drop Weather
• Insert city state high low
• Select with conditions Phoenix Arizona 105 90
SQL (pronounced “ess-que-el”) stands for Structured Query Tucson Arizona 101 92
Language. SQL is used to communicate with a database.
Flagstaff Arizona 88 69
According to ANSI (American National Standards Institute), it
is the standard language for relational database management San Diego California 77 60
systems. SQL statements are used to perform tasks such as New
Albuquerque 80 72
update data on a database, or retrieve data from a database. Mexico
Some common relational database management systems that
Data-Definition Language
use SQL are: Oracle, Sybase, Microsoft SQL Server, Access,
The set of relations in a database must be specified to the
Ingres, etc. Although most database systems use SQL, most of
system by means of a data definition language (DDL).
them also have their own additional proprietary extensions that
are usually only used on their system. However, the standard The SQL DDL allows specification of not only a set of
SQL commands such as “Select”, “Insert”, “Update”, “Delete”, relations, but also information about each relation, including
“Create”, and “Drop” can be used to accomplish almost • The schema for each relation
everything that one needs to do with a database. • The domain of values associated with each attribute
The SQL language has several parts: • The integrity constraints
• Data-definition language (DDL). The SQL DDL • The set of indices to be maintained for each relation
provides commands for defin-ing relation schemas, deleting
• The security and authorization information for each relation
relations, and modifying relation schemas.
• The physical storage structure of each relation on disk
• Interactive data-manipulation language (DML). The
SQL DML includes a query language based on both the Creating Tables
relational algebra and the tuple relational calculus. It includes The create table statement is used to create a new table. Here is
also commands to insert tuples into, delete tuples from, and the format of a simple create table statement:
modify tuples in the database. create table “tablename”
• View definition. The SQL DOL includes commands for (“column1” “data type”,
defining views. “column2” “data type”,
• Transaction control. SQL includes commands for “column3” “data type”);
specifying the beginning and ending of transactions.
Format of create table if you were to use optional constraints:
• Embedded SQL and dynamic SQL. Embedded and
dynamic SQL define how SQL statements can be embedded create table “tablename”
within general-purpose programming lan-guages, such as C, (“column1” “data type”
C++, Java, PUr, Cobol, Pascal, and Fortran. [constraint],
• Integrity. The SQL DDL includes commands for specifying “column2” “data type”
integrity constraints that the data stored in the database
[constraint],
must satisfy. Updates that violate integrity constraints are
disallowed. “column3” “data type”
• Authorization. The SQL DDL includes commands for [constraint]);

specifying access rights to relations and views. [ ] = optional
A relational database system contains one or more objects called Note: You may have as many columns as you’d like, and the
tables. The data or information for the database are stored in constraints are optional.
40
following information about your new employees: firstname,
DATABASE
Example
create table employee lastname, title, age, and salary.
(first varchar(15),
last varchar(20),
age number(3),
MANAGEMENT
address varchar(30),
city varchar(20),
state varchar(20));
To create a new table, enter the keywords create table followed Drop a Table
by the table name, followed by an open parenthesis, followed The drop table command is used to delete a table and all rows
by the first column name, followed by the data type for that in the table.
column, followed by any optional constraints, and followed by
To delete an entire table including all of its rows, issue the drop
a closing parenthesis. It is important to make sure you use an
table command followed by the tablename. drop table is
open parenthesis before the beginning table, and a closing
different from deleting all of the records in the table. Deleting
parenthesis after the end of the last column definition. Make
all of the records in the table leaves the table including column
sure you seperate each column definition with a comma. All
and constraint information. Dropping the table removes the
SQL statements should end with a “;”.
table definition as well as all of its rows.
The table and column names must start with a letter and can be
drop table “tablename”
followed by letters, numbers, or underscores - not to exceed a
total of 30 characters in length. Do not use any SQL reserved Example
keywords as names for tables or column names (such as drop table myemployees;
“select”, “create”, “insert”, etc). Inserting into a Table
Data types specify what the type of data can be for that particu- The insert statement is used to insert or add a row of data into
lar column. If a column called “Last_Name”, is to be used to the table.
hold names, then that particular column should have a To insert records into a table, enter the key words insert into
“varchar” (variable-length character) data type. followed by the table name, followed by an open parenthesis,
Here are the most common Data types: followed by a list of column names separated by commas,
Fixed-length character string. Size is specified in followed by a closing parenthesis, followed by the keyword
char(size)
parenthesis. Max 255 bytes. values, followed by the list of values enclosed in parenthesis.
Varchar(size)
Variable-length character string. Max size is specified in The values that you enter will be held in the rows and they will
parenthesis. match up with the column names that you specify. Strings
Number value with a max number of column digits
number(size)
specified in parenthesis.
should be enclosed in single quotes, and numbers should not.
Date Date value insert into “tablename”
Number value with a maximum number of digits of "size" (first_column,...last_column)
number(size,d) total, with a maximum number of "d" digits to the right of
the decimal. values (first_value,...last_value);
Number value with a maximum number of digits of "size" In the example below, the column name first will match up
number(size,d) total, with a maximum number of "d" digits to the right of
the decimal. with the value ‘Luke’, and the column name state will match up
with the value ‘Georgia’.
Constraints
When tables are created, it is common for one or more columns Example
to have constraints associated with them. A constraint is insert into employee
basically a rule associated with a column that the data entered (first, last, age, address, city, state)
into that column must follow. For example, a “unique”
values (‘Luke’, ‘Duke’, 45, ‘2130 Boars Nest’,
constraint specifies that no two records can have the same value
in a particular column. They must all be unique. The other two ‘Hazard Co’, ‘Georgia’);
most popular constraints are “not null” which specifies that a Note: All strings should be enclosed between single quotes:
column can’t be left blank, and “primary key”. A “primary key” ‘string’
constraint defines a unique identification of each record (or row)
Students Activity
in a table.
1. Insert data into your new employee table.
It’s now time for you to design and create your own table. If
you decide to change or redesign the table, you can either drop it
and recreate it or you can create a completely different one.
Students Activity
You have just started a new company. It is time to hire some
employees. You will need to create a table that will contain the
41
DATABASE
The Like pattern matching operator can also be used in the

conditional selection of the where clause. Like is a very powerful
operator that allows you to select only rows that are “like” what
2. Your first three employees are the following: you specify. The percent sign “%” can be used as a wild card to
MANAGEMENT
Jonie Weber, Secretary, 28, 19500.00 match any possible character that might appear before or after
Potsy Weber, Programmer, 32, 45300.00 the characters specified. For example:
Dirk Smith, Programmer II, 45, 75020.00 select first, last, city
from empinfo
where first LIKE ‘Er%’;
This SQL statement will match any first names that start with
‘Er’. Strings must be in single quotes.
Or you can specify,
select first, last
from empinfo
3. Enter these employees into your table first, and then insert where last LIKE ‘%s’;
at least 5 more of your own list of employees in the table This statement will match any last names that end in a ‘s’.
select * from empinfo
where first = ‘Eric’;
This will only select rows where the first name equals ‘Eric’
exactly. Sample Table: empinfo
first Last id age city state
John Jones 99980 45 Payson Arizona
Mary Jones 99982 25 Payson Arizona
Eric Edwards 88232 32 San Diego California
Selecting Data Mary Ann Edwards 88233 32 Phoenix Arizona
The select statement is used to query the database and retrieve
Ginger Howell 98002 42 Cottonwood Arizona
selected data that match the criteria that you specify. Here is the
format of a simple select statement: Sebastian Smith 92001 23 Gila Bend Arizona
Gus Gray 22322 35 Bagdad Arizona
select “column1”
Mary Ann May 32326 52 Tucson Arizona
[,”column2",etc]
Erica Williams 32327 60 Show Low Arizona
from “tablename”
Leroy Brown 32380 22 Pinetop Arizona
[where “condition”];
Elroy Cleaver 32382 22 Globe Arizona
[] = optional
The column names that follow the select keyword determine
which columns will be returned in the results. You can select as Some more examples
many column names that you’d like, or you can use a “*” to
select all columns. select first, last, city from empinfo;
The table name that follows the keyword from specifies the select last, city, age from empinfo
table that will be queried to retrieve the desired results. where age > 30;
The where clause (optional) specifies which data values or rows select first, last, city, state from empinfo
will be returned or displayed, based on the criteria described where first LIKE ‘J%’;
after the keyword where. select * from empinfo;
=
Conditional selections Equal
used in the where clause: select first, last, from empinfo
> Greater than
where last LIKE ‘%s’;
< Less than
select first, last, age from empinfo
>= Greater than or equal
where last LIKE ‘%illia%’;
<= Less than or equal
select * from empinfo where first = ‘Eric’;
<> Not equal to
LIKE *See note below
42
7. Select all columns for everyone whose last name ends in
DATABASE
Student Activity
1. Display the first name and age for everyone that’s in the “ith”.
table.
MANAGEMENT
Points to Ponder
2. Display the first name, last name, and city for everyone • SQL is used to communicate with a database.
that’s not from Payson.
• The set of relations in a database must be specified to the
system by means of a data definition language (DDL).
• The create table statement is used to create a new table
• A constraint is basically a rule associated with a column that
the data entered into that column must follow
• The drop table command is used to delete a table and all
rows in the table.
• The select statement is used to query the database and
3. Display all columns for everyone that is over 40 years old. retrieve selected data that match the criteria that you specify.
4. Display the first and last names for everyone whose last • The insert statement is used to insert or add a row of data
name ends in an “ay”. into the table.
• The Like pattern matching operator can also be used in the
conditional selection of the where clause
Review Terms
• SQL
• DDL
• DML
• Constraints
5. Display all columns for everyone whose first name equals • Insert
“Mary”. • Create
• Select
• Drop
• Constraints
• Like
6. Display all columns for everyone whose first name contains

“Mary”.
43
Student Notes
44
DATABASE MANAGEMENT
DATABASE MANAGEMENT
45
LESSON 12:
LAB
LESSON 13:
LAB
46
DATABASE MANAGEMENT
DATABASE
LESSON 14:
SQL-II
MANAGEMENT
Lesson Objective *The WHERE clause (optional) specifies which data values or
• Elaborating Select statement rows will be returned or displayed, based on the criteria
described after the keyword where.
• Rename operator
• Aggregate function Example
SELECT name, age, salary FROM employee WHERE age > 50;
• Having
The above statement will select all of the values in the name,
• Order-by
age, and salary columns from the employee table whose age is
• Group-by greater than 50.
• IN & Between
Comparison Operators
The SELECT statement is the core of SQL, and it is likely that
the vast majority of your SQL commands will be SELECT = Equal
statements.There are enormous amount of options available > Greater than
for the SELECT statement.When constructing SQL Queries < Less than
(with the SELECT statement), it is very useful to know all of >= Greater than or equal to
the possible options and the best or more efficient way to do
<= Less than or equal to
things
<> or != Not equal to
The Rename Operation LIKE String comparison test
SQL provides a mechanism for renaming both relations and
*Note about LIKE
attributes. It uses the as clause, taking the form:
old-name as new-name Example
Select name, title, dept
Example
From employee
select first as name, last, city as emp_city from empinfo
Where title Like ‘Pro%’;
Select Statement The above statement will select all of the rows/values in the
The SELECT statement is used to query the database and name, title, and dept columns from the employee table whose
retrieve selected data that match the criteria that you specify. title starts with ‘Pro’. This may return job titles including
The SELECT statement has five main clauses to choose from, Programmer or Pro-wrestler.
although, FROM is the only required clause. Each of the clauses All and Distinct are keywords used to select either All (default)
have a vast selection of options, parameters, etc. The clauses will or the “distinct” or unique records in your query results. If you
be listed below. would like to retrieve just the unique records in specified
Here is the format of the Select statement: columns, you can use the “Distinct” keyword. Distinct will
discard the duplicate records for the columns you specified after
Select [All | Distinct] column1[,column2]
the “SELECT” statement: For example:
From table1[,table2]
[Where “conditions”] Select Distinct age FROM employee_info;
[Group By “column-list”] This statement will return all of the unique ages in the
[Having “conditions] employee_info table.
[Order By “column-list” [Asc | Desc] ] ALL will display “all” of the specified columns including all of
Select and From Clause Review the duplicates. The ALL keyword is the default if nothing is
Select first_column_name, second_column_name specified.
From table_name Note: The following two tables we will be using
Where first_column_name > 1000;
*The column names that follow the SELECT keyword
determine which columns will be returned in the results. You
can select as many column names that you’d like, or you can use
a * to select all columns. The order they are specified will be the
order that they are returned in your query results.
*The table name that follows the keyword FROM specifies the
table that will be queried to retrieve the desired results.
47
2. Select all columns from the items_ordered table for
DATABASE
Items_ordered
whoever purchased a Tent.
customerid order_date item quantity Price
10330 30-Jun-1999 Pogo stick 1 28.00
10101 30-Jun-1999 Raft 1 58.00
10298 01-Jul-1999 Skateboard 1 33.00
MANAGEMENT
10101 01-Jul-1999 Life Vest 4 125.00

10299 06-Jul-1999 Parachute 1 1250.00
10339 27-Jul-1999 Umbrella 1 4.50
10449 13-Aug-1999 Unicycle 1 180.79
10439 14-Aug-1999 Ski Poles 2 25.50
10101 18-Aug-1999 Rain Coat 1 18.30
10449 01-Sep-1999 Snow Shoes 1 45.00 3. Select the customerid, order_date, and item values from the
10439 18-Sep-1999 Tent 1 88.00
items_ordered table for any items in the item column that
10298 19-Sep-1999 Lantern 2 29.00
10410 28-Oct-1999 Sleeping Bag 1 89.22 start with the letter “S”.
10438 01-Nov-1999 Umbrella 1 6.75
10438 02-Nov-1999 Pillow 1 8.50
10298 01-Dec-1999 Helmet 1 22.00
10449 15-Dec-1999 Bicycle 1 380.50
10449 22-Dec-1999 Canoe 1 280.00
10101 30-Dec-1999 Hoola Hoop 3 14.75
10330 01-Jan-2000 Flashlight 4 28.00
10101 02-Jan-2000 Lantern 1 16.00
10299 18-Jan-2000 Inflatable Mattress 1 38.00
10438 18-Jan-2000 Tent 1 79.99
10413 19-Jan-2000 Lawnchair 4 32.00 4. Select the distinct items in the items_ordered table. In other
10410 30-Jan-2000 Unicycle 1 192.50 words, display a listing of each of the unique items from
the items_ordered table.
Customers
customerid firstname lastname city state
10101 John Gray Lynden Washington
10298 Leroy Brown Pinetop Arizona
10299 Elroy Keller Snoqualmie Washington
10315 Lisa Jones Oshkosh Wisconsin
10325 Ginger Schultz Pocatello Idaho
10329 Kelly Mendoza Kailua Hawaii
10330 Shawn Dalton Cannon Beach Oregon Aggregate Functions
10338 Michael Howell Tillamook Oregon
10339 Anthony Sanchez Winslow Arizona MIN returns the smallest value in a given column
10408 Elroy Cleaver Globe Arizona MAX returns the largest value in a given column
10410 Mary Ann Howell Charleston South Carolina SUM returns the sum of the numeric values in a given column
10413 Donald Davids Gila Bend Arizona AVG returns the average value of a given column
10419 Linda Sakahara Nogales Arizona COUNT returns the total number of values in a given column
10429 Sarah Graham Greensboro North Carolina COUNT(*) returns the number of rows in a table
10438 Kevin Smith Durango Colorado
10439 Conrad Giles Telluride Colorado Aggregate functions are used to compute against a “returned
10449 Isabela Moore Yuma Arizona column of numeric data” from your SELECT statement. They
basically summarize the results of a particular column of
Students Activity
selected data. We are covering these here since they are required
1. From the items_ordered table, select a list of all items by the next topic, “GROUP BY”. Although they are required
purchased for customerid 10449. Display the customerid, for the “GROUP BY” clause, these functions can be used
item, and price for this customer. without the “GROUP BY” clause. For example:
SELECT AVG(salary)FROM employee;
This statement will return a single result which contains the
average value of everything returned in the salary column from
the employee table.
Another Example
SELECT AVG(salary)FROM employee WHERE title =
‘Programmer’;
This statement will return the average salary for all employees
whose title is equal to ‘Programmer’
48
SELECT column1, SUM(column2)
DATABASE
Example
SELECT Count(*)FROM employees; FROM “list-of-tables”
This particular statement is slightly different from the other GROUP BY “column-list”;
aggregate functions since there isn’t a column supplied to the
Let’s say you would like to retrieve a list of the highest paid
count function. This statement will return the number of rows
salaries in each dept:
in the employees table.
MANAGEMENT
SELECT max(salary), dept
Students Activity FROM employee
1. Select the maximum price of any item ordered in the GROUP BY dept;
items_ordered table. Hint: Select the maximum price only. This statement will select the maximum salary for the people in
each unique department. Basically, the salary for the person who
makes the most in each department will be displayed. Their,
salary and their department will be returned.
For example, take a look at the items_ordered table. Let’s say
you want to group everything of quantity 1 together, everything
of quantity 2 together, everything of quantity 3 together, etc. If
you would like to determine what the largest cost item is for
each grouped quantity (all quantity 1’s, all quantity 2’s, all
2. Select the average price of all of the items ordered that were quantity 3’s, etc.), you would enter:
purchased in the month of Dec. SELECT quantity, max(price)
FROM items_ordered
GROUP BY quantity;
Enter the statement in above, and take a look at the results to
see if it returned what you were expecting. Verify that the
maximum price in each Quantity Group is really the maximum
price.
GROUP BY - Multiple Grouping Columns - What if ?
What if you ALSO want to display their lastname for the query
3. What are the total number of rows in the items_ordered below:
table? SELECT max(salary), dept
FROM employee
GROUP BY dept;
What you’ll need to do is:
SELECT lastname, max(salary), dept
FROM employee
GROUP BY dept, lastname;
This is a called “multiple grouping columns”.
Students Activity
4. For all of the tents that were ordered in the items_ordered
1. How many people are in each unique state in the customers
table, what is the price of the lowest tent? Hint: Your query
table? Select the state and display the number of people in
should return the price only.
each. Hint: count is used to count rows in a column, sum
works on numeric data only.
Group by Clause
The Group By clause will gather all of the rows together that
contain data in the specified column(s) and will allow aggregate
2. From the items_ordered table, select the item, maximum
functions to be performed on the one or more columns. This
price, and minimum price for each specific item in the table.
can best be explained by an example:
Hint: The items will need to be broken up into separate
GROUP BY clause syntax: groups.
49
DATABASE
2. From the items_ordered table, select the item, maximum

MANAGEMENT
price, and minimum price for each specific item in the table.
Only display the results if the maximum price for one of
the items is greater than 190.00.
3. How many orders did each customer make? Use the
items_ordered table. Select the customerid, number of
orders they made, and the sum of their orders. Click the
Group By answers link below if you have any problems.
3. How many orders did each customer make? Use the

items_ordered table. Select the customerid, number of
orders they made, and the sum of their orders if they
purchased more than 1 item.
Having Clause
The HAVING clause allows you to specify conditions on the
rows for each group - in other words, which rows should be
selected will be based on the conditions you specify. The
HAVING clause should follow the GROUP BY clause if you
are going to use it.
HAVING clause syntax:
SELECT column1, SUM(column2) Order by Clause
FROM “list-of-tables” ORDER BY is an optional clause which will allow you to
GROUP BY “column-list” display the results of your query in a sorted order (either
HAVING “condition”; ascending order or descending order) based on the columns
that you specify to order by.
HAVING can best be described by example. Let’s say you have
an employee table containing the employee’s name, department, ORDER BY clause syntax:
salary, and age. If you would like to select the average salary for SELECT column1, SUM(column2)
each employee in each department, you could enter: FROM “list-of-tables”
SELECT dept, avg(salary) ORDER BY
FROM employee “column-list” [ASC | DESC];
GROUP BY dept; [ ] = optional
But, let’s say that you want to ONLY calculate & display the This statement will select the employee_id, dept, name, age, and
average if their salary is over 20000: salary from the employee_info table where the dept equals
SELECT dept, avg(salary) ‘Sales’ and will list the results in Ascending (default) order based
FROM employee on their Salary.
GROUP BY dept ASC = Ascending Order - default
HAVING avg(salary) > 20000; DESC = Descending Order
Students Activity For example
1. How many people are in each unique state in the customers SELECT employee_id, dept, name, age, salary
table that have more than one person in the state? Select the FROM employee_info
state and display the number of how many people are in WHERE dept = ‘Sales’
each if it’s greater than 1. ORDER BY salary;
If you would like to order based on multiple columns, you
must seperate the columns with commas. For example:
SELECT employee_id, dept, name, age, salary
FROM employee_info
WHERE dept = ‘Sales’
50
ORDER BY salary, age DESC; greater than or equal to 50000.00 AND the title is equal to
DATABASE
‘Programmer’. Both of these conditions must be true in order
Students Activity
for the rows to be returned in the query. If either is false, then it
1. Select the lastname, firstname, and city for all customers in will not be displayed.
the customers table. Display the results in Ascending Order
Although they are not required, you can use paranthesis around
based on the lastname.
MANAGEMENT
your conditional expressions to make it easier to read:
SELECT employeeid, firstname, lastname, title, salary
FROM employee_info
WHERE (salary >= 50000.00) AND (title = ‘Programmer’);
Another Example
SELECT firstname, lastname, title, salary
FROM employee_info
2. Same thing as exercise #1, but display the results in WHERE (title = ‘Sales’) OR (title = ‘Programmer’);
Descending order.
This statement will select the firstname, lastname, title, and
salary from the employee_info table where the title is either
equal to ‘Sales’ OR the title is equal to ‘Programmer’.
Students Activity
1. Select the customerid, order_date, and item from the
items_ordered table for all items unless they are ‘Snow
Shoes’ or if they are ‘Ear Muffs’. Display the rows as long
as they are not either of these two items.
3. Select the item and price for all of the items in the
items_ordered table that the price is greater than 10.00.
Display the results in Ascending order based on the price.
2. Select the item and price of all items that start with the
letters ‘S’, ‘P’, or ‘F’.
Combining conditions and Boolean
Operators
The AND operator can be used to join two or more conditions
in the WHERE clause. Both sides of the AND condition must
be true in order for the condition to be met and for those rows
to be displayed.
SELECT column1,
SUM(column2)
FROM “list-of-tables” In and Between Conditional Operators
SELECT col1, SUM(col2)
WHERE “condition1” AND “condition2”;
FROM “list-of-tables”
The OR operator can be used to join two or more conditions in
the WHERE clause also. However, either side of the OR WHERE col3 IN (list-of-values);
operator can be true and the condition will be met - hence, the SELECT col1, SUM(col2)
rows will be displayed. With the OR operator, either side can be FROM “list-of-tables”
true or both sides can be true. WHERE col3 BETWEEN value1 AND value2;
For Example The IN conditional operator is really a set membership test
SELECT employeeid, firstname, lastname, title, salary operator. That is, it is used to test whether or not a value (stated
before the keyword IN) is “in” the list of values provided after
FROM employee_info
the keyword IN.
WHERE salary >= 50000.00 AND title = ‘Programmer’;
For Example
This statement will select the employeeid, firstname, lastname,
SELECT employeeid, lastname, salary
title, and salary from the employee_info table where the salary is
51
FROM employee_info
DATABASE
WHERE lastname IN (‘Hernandez’, ‘Jones’, ‘Roberts’, ‘Ruiz’);

This statement will select the employeeid, lastname, salary from
the employee_info table where the lastname is equal to either:
Hernandez, Jones, Roberts, or Ruiz. It will return the rows if it
MANAGEMENT
is ANY of these values.

The IN conditional operator can be rewritten by using com-
pound conditions using the equals operator and combining it Mathematical Operators
with OR - with exact same output results: Standard ANSI SQL-92 supports the following first four basic
arithmetic operators:
SELECT employeeid, lastname, salary
FROM employee_info + addition
WHERE lastname = ‘Hernandez’ OR lastname = ‘Jones’ OR - subtraction
lastname = ‘Roberts’
* multiplication
OR lastname = ‘Ruiz’;
/ division
As you can see, the IN operator is much shorter and easier to
read when you are testing for more than two or three values. % modulo
You can also use NOT IN to exclude the rows in your list. The modulo operator determines the integer remainder of the
The BETWEEN conditional operator is used to test to see division. This operator is not ANSI SQL supported, however,
whether or not a value (stated before the keyword BETWEEN) most databases support it. The following are some more useful
is “between” the two values stated after the keyword BE- mathematical functions to be aware of since you might need
TWEEN. them. These functions are not standard in the ANSI SQL-92
For example specs, therefore they may or may not be available on the specific
SELECT employeeid, age, lastname, salary RDBMS that you are using. However, they were available on
several major database systems that I tested. They WILL work
FROM employee_info on this tutorial.
WHERE age BETWEEN 30 AND 40;
ABS(x) returns the absolute value of x
This statement will select the employeeid, age, lastname, and
returns the sign of input x as -1, 0, or 1 (negative, zero, or
salary from the employee_info table where the age is between 30 SIGN(x)
positive respectively)
and 40 (including 30 and 40). modulo - returns the integer remainder of x divided by y
MOD(x,y)
(same as x%y)
This statement can also be rewritten without the BETWEEN
returns the largest integer value that is less than or equal
operator: FLOOR(x)
to x
SELECT employeeid, age, lastname, salary CEILING(x) or returns the smallest integer value that is greater than or
FROM employee_info CEIL(x) equal to x
POWER(x,y) returns the value of x raised to the power of y
WHERE age >= 30 AND age <= 40;
returns the value of x rounded to the nearest whole
You can also use NOT BETWEEN to exclude the values ROUND(x)
integer
between your range. ROUND(x,d)
returns the value of x rounded to the number of decimal
places specified by the value d
Students Ativity SQRT(x) returns the square-root value of x
1. Select the date, item, and price from the items_ordered table
for all of the rows that have a price value ranging from For example
10.00 to 80.00. SELECT round(salary), firstname
FROM employee_info
This statement will select the salary rounded to the nearest
whole value and the firstname from the employee_info table.
Students Activity
Select the item and per unit price for each item in the
items_ordered table. Hint: Divide the price by the quantity.
2. Select the firstname, city, and state from the customers table
for all of the rows where the state value is either: Arizona,
Washington, Oklahoma, Colorado, or Hawaii.
52
DATABASE
Points to Ponder
• SQL provides rename operator both relations and attributes
• Aggregate functions are used to compute against a
“returned column of numeric data” from your SELECT
statement.
MANAGEMENT
• The GROUP BY clause will gather all of the rows together
that contain data in the specified column(s) and will allow
aggregate functions to be performed on the one or more
columns.
• The HAVING clause allows you to specify conditions on
the rows for each group
• ORDER BY is an optional clause which will allow you to
display the results of your query in a sorted order
Review Terms
• Rename operator
• Aggregate function
• Having
• Order-by
• Group-by
53
Student Notes
54
DATABASE MANAGEMENT
DATABASE MANAGEMENT
55
LESSON 15:
LAB
LESSON 16:
LAB
56
DATABASE MANAGEMENT
DATABASE
LESSON 17:
SQL-III
MANAGEMENT
Lesson Objective Now, whenever a purchase is made from a repeating customer,
• Table joins the 2nd table, “Purchases” only needs to be updated! We’ve just
eliminated useless redundant data, that is, we’ve just normal-
• Tuple variable
ized this database!
• String operators
Notice how each of the tables have a common
• Set operators “cusomer_number” column. This column, which contains the
• Views unique customer number will be used to JOIN the two tables.
• Update Using the two new tables, let’s say you would like to select the
customer’s name, and items they’ve purchased. Here is an
• Delete
example of a join statement to accomplish this:
Table Joins SELECT customer_info.firstname, customer_info.lastname,
All of the queries up until this point have been useful with the purchases.item
exception of one major limitation - that is, you’ve been
selecting from only one table at a time with your SELECT FROM customer_info, purchases
statement. It is time to introduce you to one of the most WHERE customer_info.customer_number =
beneficial features of SQL & relational database systems - the purchases.customer_number;
“Join”. To put it simply, the “Join” makes relational database This particular “Join” is known as an “Inner Join” or
systems “relational”. “Equijoin”. This is the most common type of “Join” that you
Joins allow you to link data from two or more tables together will see or use.
into a single query result—from one single SELECT statement. Notice that each of the columns are always preceeded with the
A “Join” can be recognized in a SQL SELECT statement if it table name and a period. This isn’t always required, however, it
has more than one table after the FROM keyword. is good practice so that you wont confuse which colums go
with what tables. It is required if the name column names are
For example the same between the two tables. I recommend preceeding all
SELECT “list-of-columns” of your columns with the table names when using joins.
FROM table1,table2 Note: The syntax described above will work with most
WHERE “search-condition(s)” Database Systems . However, in the event that this doesn’t
Joins can be explained easier by demonstrating what would work with yours, please check your specific database documenta-
happen if you worked with one table only, and didn’t have the tion.
ability to use “joins”. This single table database is also some- Although the above will probably work, here is the ANSI SQL-
times referred to as a “flat table”. Let’s say you have a one-table 92 syntax specification for an Inner Join using the preceding
database that is used to keep track of all of your customers and statement above that you might want to try:
what they purchase from your store: SELECT customer_info.firstname, customer_info.lastname,
Everytime a new row is inserted into the table, all columns will purchases.item
be be updated, thus resulting in unnecessary “redundant data”. FROM customer_info INNER JOIN purchases
For example, every time Wolfgang Schultz purchases some- ON customer_info.customer_number =
thing, the following rows will be inserted into the table: purchases.customer_number;
id first last Address city state zip date item price Another Example
10982 Wolfgang Schultz 300 N. 1st Ave Yuma AZ 85002 032299 snowboard 45.00
10982 Wolfgang Schultz 300 N. 1st Ave Yuma AZ 85002 091199 gloves 15.00 SELECT employee_info.employeeid,
10982 Wolfgang Schultz 300 N. 1st Ave Yuma AZ 85002 100999 lantern 35.00
employee_info.lastname, employee_sales.comission
10982 Wolfgang Schultz 300 N. 1st Ave Yuma AZ 85002 022900 tent 85.00
FROM employee_info, employee_sales
WHERE employee_info.employeeid =
An ideal database would have two tables:
employee_sales.employeeid;
1. One for keeping track of your customers
This statement will select the employeeid, lastname (from the
2. And the other to keep track of what they purchase: employee_info table), and the comission value (from the
“Customer_info” table: employee_sales table) for all of the rows where the employeeid
customer_number firstname lastname address city state zip in the employee_info table matches the employeeid in the
employee_sales table.
“Purchases” table:
customer_number date item price
57
DATABASE
Students Activity String Operations
1. Write a query using a join to determine which items were SQL specifies strings by enclosing them in single quotes, for
ordered by each of the customers in the customers table. example, ‘Perryridge’, as we saw earlier. A single quote character
Select the customerid, firstname, lastname, order_date, that is part of a string can be specified by using two single quote
item, and price for everything each customer purchased in characters; for example the string “It’s right” can be specified by
‘It” s right’.
MANAGEMENT
the items_ordered table.

The most commonly used operation on strings is pattern
matching using the op-erator like. We describe patterns by using
two special characters:
• Percent (%): The % character matches any sub string.
• Underscore (-): The - character matches any character.
Patterns are case sensitive; that is, uppercase characters do not
match lowercase char-acters, or vice versa. To illustrate pattern
matching, we consider the following exam-ples:
• ‘Perry%’ matches any string beginning with “Perry”.
2. Repeat exercise #1, however display the results sorted by • ‘%idge%’ matches any string containing “idge” as a sub
state in descending order string, for example,
‘Perryridge’, ‘Rock Ridge’, ‘Mianus Bridge’, and ‘Ridgeway’.
• ‘- - -’ matches any string of exactly three characters.
• ‘- - % matches any string of at least three characters.
SQL expresses patterns by using the like comparison operator.
Consider the query “Find the names of all customers whose
street address includes the substring ‘Main’.” This query can be
written as
select customer-name
from customer
Tuple Variables where customer-street like ‘%Main%’
The as clause is particularly useful in defining the notion of
tuple variables, as is done in the tuple relational calculus. A tuple Joined Relations
variable in SQL must be associated with a particular relation. SQL provides not only the basic Cartesian-product mechanism
Tuple variables are defined in the from clause by way of the as for joining tuples of relations found in its earlier versions, but,
clause. To illustrate, we rewrite the query “For all customers who SQL also provides various other mechanisms -
have a loan from the bank, find their names, loan numbers, and Loan number Branch name Amout
loan amount” as select customer-name, T. loan-number, L-170 Downtown 3000
S.amount L-230 Redwood 4000
from borrower as T, loan as S L-260 Perryridge 1700
where T. loan-number = S. loan-number Customer name Loan number
Note that we define a tuple variable in the from clause by Jones L-170
placing it after the name of the relation with which it is Smith L-230
associated, with the keyword as in between (the keyword as is Hayes L-155
optional). When we write expressions of the form relation- The loan and borrower relations.
name. attribute-name, the relation name is, in effect, an for joining relations, including condition joins and natural
implicitly defined tuple variable. joins, as well as var-ious forms of outer joins. These additional
operations are typically used as subquery expressions in the
Tuple variables are most useful for comparing two tuples in the
from clause.
same relation. Recall that, in such cases, we could use the rename
operation in the relational algebra. Suppose that we want the Examples
query “Find the names of all branches that have assets greater We illustrate the various join operations by using the relations
than at least one branch located in Brooklyn.” We can write the loan and borrower in . We start with a simple example of inner
SQL expression joins.
select distinct T.branch-name loan inner join borrower on loan. loan-number - borrower.
from branch as T, branch as S loan-number
where T.assets > S.assets and S.branch-city = ‘Brooklyn’ The expression computes the theta join of the loan and the
borrower relations, with the join condition being loan. Loan-
Observe that we could not use the notation branch. asset, since
number = borrower. loan-number. The attributes of the result
it would not be clear which reference to branch is intended.
58
consist of the attributes of the Right-hand-side relation result. Although outer -join expressions are typically used in the
DATABASE
followed by the attributes of the right-hand-side relation. from clause, they can be used anywhere that a relation can be
Note that the attribute loan-number appears twice in the figure- used.
the first occur-rence is from loan, and the second is from Each of the variants of the join operations in SQL consists of a
borrower. The SQL standard does not require attribute names join type and a join condition. The join condition defines which
MANAGEMENT
in such results to be unique. An as clause should be used to tuples in the two relations match and what attributes are
assign unique names to attributes in query and subquery results. present in the result of the join. The join type defines how
We rename the result relation of a join and the attributes of the tuples in each
result relation by using an as clause, as illustrated here: Branch name
Loan number Amount Customer name
loan inner join borrower on loan. loan-number = L-170 Downtown 3000 Jones
borrower.loan-number as Lb(loan-number, branch, amount, L- 230 Redwood 4000 Smith
cust, cust-Loan-num)
The result of loan natural inner join borrower.
We rename the second occurrence of loan-number to cust- loan-
num. The ordering of the attributes in the result of the join is
Join types
important for the renaming.
Inner join
Next, we consider an example of the left outer join operation: Left outer join
loan left outer join borrower on loan. Loan-number = bor- Right outer join
rower. loan-number Full outer join
Loan number Branch name Amount Customer name Loan number
Join Types and Join Conditions
L-170 Downtown 3000 Jones L-170
The first join type is the inner join, and the other three are the
L-230 Redwood 4000 Smith L-230
outer joins. Of the three join conditions, we have seen the
The result of loan inner join borrower on natural join and the on condition before, and we shall discuss
loan. loan-number = borrower .loan-number. the using condition, later in this section.
The use of a join condition is mandatory for outer joins, but is
Loan number Branch name Amount Customer name Loan number optional for inner joins (if it is omitted, a Cartesian product
L-170 Downtown 3000 Jones L-170
L- 230 Redwood 4000 Smith L-230
results). Syntactically, the keyword natural appears before the
L-260 Perryridge 1700 null null join type, as illustrated earlier, whereas the on and using con-
ditions appear at the end of the join expression. The keywords
The result of loan left outer join borrower on loan. loan- inner and outer are optional, since the rest of the join type
number = borrower .loan-number. enables us to deduce whether the join is an inner join or an
We can compute the left outer join operation logically as outer join.
follows. First, compute the result of the inner join as before. The meaning of the join condition natural, in terms of which
Then, for every tuple t in the left-hand-side relation loan that tuples from the two relations match, is straightforward. The
does not match any tuple in the right-hand-side relation ordering of the attributes in the result of a natural join is as
borrower in the inner join, add a tuple r to the result of the follows. The join attributes (that is, the attributes common to
join: The attributes of tuple r that are derived from the left- both relations) appear first, in the order in which they appear in
hand-side relation are filled in with the values from tuple t, and the left-hand-side relation. Next come all nonjoin attributes of
the remaining attributes of r are filled with null values. The the left-hand-side relation, and finally all nonjoin attributes of
tuples (L-170, Downtown, 3000) and (L-230, Redwood, 4000) the right-hand-side relation.
join with tuples from borrower and appear in the result of the
inner join, and hence in the result of the left outer join. On the The right outer join is symmetric to the left outer join. Tuples
other hand, the tuple (L-260, Perryridge, 1700) did not match from the right-hand--side relation that do not match any tuple
any tuple from borrower in the inner join, and hence a tuple (L- in the left-hand-side relation are padded with nulls and are
260, Perryridge, 1700, null, null) is present in the result of the added to the result of the right outer join.
left outer join. Here is an example of combining the natural join condition
Finally, we consider an example of the natural join operation: with the right outer join type:
loan natural inner join borrower loan natural right outer join borrower
This expression computes the natural join of the two relations. The attributes of the result are defined by the join type, which is
The only attribute name common to loan and borrower is loan- a natural join; hence, loan-number appears only once. The first
number. However, the attribute loan-number appears only once two tuples in the result are from the inner natural join of loan
in the result of the natural join, whereas it appears twice in the and borrower. The tuple (Hayes, L-155) from the right-hand-
result of the join with the on condition. side relation does not match any tuple from the left-hand-side
relation loan in the natural inner join. Hence, the tuple (L-155,
Join Types and Conditions null, null, Hayes) appears in the join result.
We saw examples of the join operations permitted in SQL Join
op-erations take two relations and return another relation as the
59
The join condition using (Ai, A2, . . . ,An) is similar to the
DATABASE
natural join condition, ex-cept that the join attributes are the
attributes A1, A2, . . . , An, rather than all attributes that are
common to both relations. The attributes A1, A2,…, An must
consist of only attributes that are common to both relations,
and they appear only once in the result of the join.
MANAGEMENT
The full outer join is a combination of the left and right outer-
join types. After the operation computes the result of the inner
join, it extends with nulls tuples from
Loan number Branch name Amount Customer name

L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
L-155 Null null Hayes
The result of loan natural right outer join borrower.

the left-hand-side relation that did not match with any from the
right-hand-side, and adds them to the result. Similarly, it
extends with nulls tuples from the right-hand--side relation
that did not match with any tuples from the left-hand-side
relation and adds them to the result.
loan full outer join borrower using (loan-number)
Loan number Branch name Amount Customer name

L-170 Downtown 3000 Jones
L- 230 Redwood 4000 Smith
L-260 Perryridge 1700 null
L-155 null Null Hayes
The result of loan full outer join borrower using(loan-

number).
Points to Ponder
• Joins allow you to link data from two or more tables
together into a single query result—from one single
SELECT statement.
• Tuple variables are defined in the from clause by way of the
as clause
• We describe patterns by using two special characters:
• Percent (%): The % character matches any sub string.
• Underscore (-): The - character matches any character.
• Various kinds of joins are as follows:-
Join types
Inner join
Left outer join
Right outer join
Full outer join
Review Terms
• Table joins
• Tuple variable
• String operators
• Set operators
• Views
• Update
• Delete
60
DATABASE MANAGEMENT
61
Student Notes
LESSON 18:
LAB
62
DATABASE MANAGEMENT
DATABASE MANAGEMENT
63
LESSON 19:
LAB
DATABASE
LESSON 20:
SQL-IV
MANAGEMENT
Lesson Objectives The Intersect Operation

• Set operators To find all customers who have both a loan and an account at
the bank, we write
• Union
(select distinct customer-name
• Intersect
from depositor)
• Except
intersect (select distinct customer-name
• View
from borrower)
• Update
The intersect operation automatically eliminates duplicates.
• Delete
Thus, in the preceding query, if a customer-say, Jones-has
Set Operations several accounts and loans at the bank, then Jones will appear
The SQL operations union, intersect, and except operate on only once in the result.
relations and correspond to the relational-algebra operations U, If we want to retain all duplicates, we must write intersect all in
n, and -. Like union, intersection, and set difference in relational place of intersect:
algebra, the relations participating in the operations must be
compatible; that is, they must have the same set of attributes. (select customer-name
select customer-name from depositor)
from depositor intersect all (select customer-name
and the set of customers who have a loan at the bank, which from borrower)
can be derived by The number of duplicate tuples that appear in the result is
select customer-name equal to the minimum number of duplicates in both d and b.
Thus, if Jones has three accounts and two loans at the bank,
from borrower
then there will be two tuples with the name Jones in the result.
We shall refer to the relations obtained as the result of the
preceding queries as d and b, respectively.
The Except Operation
To find all customers who have an account but no loan at the
The Union Operation bank, we write
To find all customers having a loan, an account, or both at the (select distinct customer-name
bank, we write
from depositor)
(select customer-name
except (select customer-name
from depositor)
from borrower)
union
The except operation automatically eliminates duplicates. Thus,
(select customer-name in the preceding query, a tuple with customer name Jones will
from borrower) appear (exactly once) in the result only if Jones has an account at
The union operation automatically eliminates duplicates, unlike the bank, but has no loan at the bank.
the select clause. Thus, in the preceding query, if a customer-say, If we want to retain all duplicates, we must write except all in
Jones-has several accounts or loans (or both) at the bank, then place of except:
Jones will appear only once in the result. (select customer-name
If we want to retain all duplicates, we must write union all in
from depositor)
place of union:
except all (select customer-name from borrower)
(select customer-name
The number of duplicate copies of a tuple in the result is equal
from depositor)
to the number of duplicate copies of the tuple in d minus the
union all (select Customer-name number of duplicate copies of the tuple in b, provided that the
from borrower) difference is positive. Thus, if Jones has three accounts and one
The number of duplicate tuples in the result is equal to the loan at the bank, then there will be two tuples with the name
total number of duplicates that appear in both d and b. Thus, Jones in the result. If, instead, this customer has two accounts
if Jones has three accounts and two loans at the bank, then and three loans at the bank, there will be no tuple with the
there will be five tuples with the name Jones in the result. name Jones in the result.
64
[and|or “column”
DATABASE
Views
We define a view in SQL by using the create view command. To OPERATOR “value”];
define a view, we must give the view a name and must state the
[] = optional
query that computes the view. The form of the create view
command is Examples
update phone_book
MANAGEMENT
create view v as <query expression>
where <query expression> is any legal query expression. The set area_code = 623
view name is repre-sented by v. where prefix = 979;
As an example, consider the view consisting of branch names update phone_book
and the names of customers who have either an account or a set last_name = ‘Smith’, prefix=555, suffix=9292
loan at that branch. Assume that we want this view to be called
where last_name = ‘Jones’;
all-customer. We define this view as follows:
update employee
create view all-customer as
set age = age+1
(select branch-name, customer-name
where first_name=’Mary’ and last_name=’Williams’;
from depositor, account
where depositor.account-number = account.account-
Students Activity
number) 1. Jonie Weber just got married to Bob Williams. She has
requested that her last name be updated to Weber-Williams.
union
(select branch-name, customer-name
from borrower, loan
where borrower. loan-number = loan.loan-number)
The attribute names of a view can be specified explicitly as
follows:
create view branch-total-Ioan(branch-name, total-loan)
as 2. Dirk Smith’s birthday is today, add 1 to his age.
select branch-name, sum(amount)
from loan
group by branch-name
The preceding view gives for each branch the sum of the
amounts of all the loans at the branch. Since the expression
sum (amount) does not have a name, the attribute name is
specified explicitly in the view definition.
3. All secretaries are now called “Administrative Assistant”.
View names may appear in any place that a relation name may
Update all titles accordingly.
appear. Using the view all-customer, we can find all customers
of the Perryridge branch by writing
select customer-name
from all-customer
where branch-name = ‘Perryridge’
Modification of the Database
Update
The update statement is used to update or change records that 4. Everyone that’s making under 30000 are to receive a 3500 a
match a specified criteria. This is accomplished by carefully year raise.
constructing a where clause.
update “tablename”
set “columnname” =
“newvalue”
[,”nextcolumn” =
“newvalue2”...]
where “columnname”
OPERATOR “value”
65
5. Everyone that’s making over 33500 are to receive a 4500 a
DATABASE
Students Activity
year raise. 1. Jonie Weber-Williams just quit, remove her record from the
table.
MANAGEMENT
6. All “Programmer II” titles are now promoted to

“Programmer III”.
2. It’s time for budget cuts. Remove all employees who are
making over 70000 dollars.
7. All “Programmer” titles are now promoted to

“Programmer II”.
Points to Ponder
• The SQL operations union, intersect, and except operate on
relations
• The union operation automatically eliminates duplicates,
unlike the select clause
• The intersect operation automatically eliminates duplicates.
• To define a view, we must give the view a name and must
state the query that computes the view
Deleting Records • The update statement is used to update or change records
The delete statement is used to delete records or rows from the that match a specified criteria
table.
• The delete statement is used to delete records or rows from
delete from “tablename” the table.
where “columnname”
Review Terms
OPERATOR “value”
• Set operators
[and|or “column”
• Union
OPERATOR “value”];
• Intersect
[ ] = optional
• Except
Examples
• View
delete from employee;
• Update
Note: if you leave off the where clause, all records will be
deleted! • Delete
delete from employee Students Activity

where lastname = ‘May’;
delete from employee
where firstname = ‘Mike’ or firstname = ‘Eric’;
To delete an entire record/row from a table, enter “delete from”
followed by the table name, followed by the where clause which
contains the conditions to delete. If you leave off the where
clause, all records will be deleted.
66
DATABASE MANAGEMENT
67
LESSON 21:
LAB
LESSON 22:
LAB
68
DATABASE MANAGEMENT
DATABASE
LESSON 23:
INTEGRITY AND SECURITY
MANAGEMENT
Lesson Objective Null Value Concepts
• Data constraints While creating tables, if a row lacks a data value for a particular
column, that value is said to be null. Columns of any data types
• Column Level Constraints
may contain null values unless the column was defined as not
• Level Constraints null when the table was created.
• Table level constraints
Principles of Null values
• NULL value
• Setting a null value is appropriate when the actual value is
• Primary Key unknown, or when a value would not be meaningful.
• Unique key • A null value is not equivalent to a value of zero.
• Default key • A null value will evaluate to null in any expression. e.g. null
• Foreign key multiplied by 10 is null.
• NOT NULL constraints • When a column name is defined as not null, then that
• Check constraints column becomes a mandatory column. It implies that the
user is forced to enter data into that column.
Data Constraints
Example: Create table client master with a not null constraint
Integrity constraints ensure that changes made to the database
on columns client no, Name, address, address2.
by authorized users do not result in a loss of data consistency.
Thus, integrity constraints guard against accidental damage to NOT NULL as a column constraint:
the database. CREATE TABLE client master
Besides the cell name, cell length and cell data type, there are (client_no varchar2(6) NOT NULL,
other parameters i.e. other data constraints that can be passed to name varchar2(20) NOT NULL,
the DBA at cell creation time.
address 1 varchar2(30) NOT NULL,
These data constraints will be connected to a cell by the DBA as
address2 varchar2(30) NOT NULL,
flags. Whenever a user attempts to load a cell with data, the
DBA will check the data being loaded into the cell against the city varchar2(15), state varchar2(15), pin code number( 6),
data constraints defined at the time the cell was created. If the remarks varchar2(60), bal_due number (10,2));
data being loaded fails any of the data constraint checks fired by
Primary Key Concepts
the DBA, the DBA will not load the data into the cell, reject the
A primary key is one or more columns in a table used to
entered record, and will flash an error message to the user.
uniquely identify each row. in the table. Primary key values must
These constraints are given a constraint name and the DBA not be null and must be unique across the column.
stores the constraints with its name and instructions internally
A multicolumn primary key is called a composite primary key.
along with the cell itself
The only function that a primary key performs is to uniquely
The constraint can either be placed at the column level or at the identify a row and thus if one column is used it is just as good
table level. as if multiple columns are used. Multiple columns i.e. (com-
Column Level Constraints posite keys) are used only when the system designed requires a
If the constraints are defined along with the column definition, primary key that cannot be contained in a single column.
it is called as a column level constraint. Column level constraint Examples
can be applied to anyone column at a time i.e. they are local to a Primary Key as a column constraint:
specific column. If the constraint spans across multiple
Create client_master where client_no is the primary key.
columns, the user will have to use table level constraints.
CREATE TABLE client master
Table Level Constraints
(client_no varchar2(6) PRIMARY KEY,
If the data constraint attached to a specific cell in a table
references the contents of another cell in the table then the user name varchar2(20), add}-essl varchar2(30), address2
will have to use table level constraints. Table level constraints are varchar2(30),
stored as a part of the global table definition. city varcbar2(15), state varchar2(15), pincode number(6),
Examples of different constraints that can be applied on the remarks varchar2(60), bal_due number (10,2));
table are as follows: Primary Key as a table constraint: Create a sales order details
table where
69
CREATE TABLE sales_order
DATABASE
Column Name: Data Type Size Attributes

(s_order_no varchar2(6) PRIMARY KEY,
S_order_no varchar2 6
Primary Key s _order_date date, client_no varchar2(6),
product_no varchar2 6 Primary Key dely _Addr varchar2(25), salesman _no varchar2(6),
qty _ordered Number 8
dely_type char(l) DEFAULT ‘F’,
MANAGEMENT
qty - disp Number 8

billed_yn char( l), dely_date date,
product_rate Number 8,2
order_status varchar2(1 0)) -
CREATE TABLE sales order details Foreign Key Concepts
(s_order_no varchar2(6), product_no varchar2(6), Foreign keys represent relationships between tables. A foreign
key is a Column (or a group of columns) whose values -are
qty _ordered number(8), qty - disp number(8),
derived from the primary key of the same or some other table.
product_rate number(8,2),
The existence of a foreign key implies that the table with the
PRIMARY KEY (s_order_no, product_no)); foreign key is related to the - primary key table from which the
Unique Key Concepts foreign-key is derived. A foreign key must have a corresponding
A unique key is similar to a primary key, except that the purpose primary key value in the primary key table to have a meaning.
of a unique key is to ensure that information in the column for For example, the s_order_no column is the primary key of
each record is unique, as with telephone or driver’s license table sales_order. In table sales_order__details, s _order_no is a
numbers. A table may have many unique keys. foreign key that references the s_order_.no values in table sales
Example: Create Table client_master with unique constraint on order .
column client_no The Foreign Key References Constraint
UNIQUE as a column constraint: • rejects an Insert or Update of a value, if a corresponding
CREATE TABLE client master value does not currently exist in the primary key table
(client_no varchar2( 6) CONSTRAINT cnmn - ukey UNIQUE, • rejects a DEL_TE, if it would invalidate a REFERENCES
name varchar2(20), address 1 varchar2(30), address2 constrain
varchar2(30), • must reference a PRIMARY KEY or UNIQUE column(s)
city varchar2(15), state varchar2(l5), pincode number(6), in primary key table
remarks varch_2(60), bal_due number(lO,2), partpaY311 • will reference the PRIMARY KEY of the primary key table
char(l)); if no column or group of columns is specified in the
constraint
UNIQUE as a table constraint:
• must reference a table, not a view or cluster;
CREATE TABLE client master
• requires that you own the primary key table, have Reference
(client_no varchar2(6), name varchar2(20),
privilege on it, or have column-level Reference privilege on
addressl varchar2(30), address2 varchar2(30), the referenced colwnns in the primary key table;
city varchar2(15), state varchar2(15), pincode number(6), • doesn’t restrict how other constraints may reference the
remarks varchar2(60), bal_due number(lO,2), same tables;
CONSTRAINT cnmn_ukey UNIQUE (client_no)); • requires that the FOREIGN KEY column(s) and the
CONSTRAINT column(s) have matching data types;
Default Value Concepts
At the time of cell creation a ‘default value’ can be assigned to it. • may reference the same table named in the CREATE
When the user is loading a ‘record’ with values and leaves this TABLE statement;
cell empty, the DBA will automatically load this cell with the • must not reference the same column more than once (in a
default value specified- The data type of the default value single constraint).
should match the data type of the column. You can use the Example: Create table sales_order _details with primary key as
default clause to specify any default value you want. s_order_no and product_no and foreign key as s_order_no
Create sales_order table where: referencing column s_order_no in the sales order table.
Column Name Data Type Size Attributes Foreign Key as a Column Constraint
S_order no varchar2 6 Primary Key
S_order date date
CREATE TABLE sales order details
Client_no varchar2 6 (s_order_no varchar2(6) REFERENCES sales_order,
Dely_Addr varchar2 25
Salesman_no varchar2 6 product_no varchar2(6),
Dely_type char I
Delivery: part (P) / full (F)
qty _ordered number(8), qty - disp number(8), product_rate
Billed_yn char I Default 'F' number(8,2),
Dely_date date Primary Key (s_order_no, product_no));
Order_status varchar2 10
70
DATABASE
FOREIGN KEY as a Table Constraint Check with not Null Integrity Constraints
Create Table sales order details According to the ANSI I ISO standard, a not null integrity
( s _order_no varchar2( 6), constraint is an example of a CHECK integrity constraint,
where the condition is, CHECK (column_name IS NOT
product_no varchar2(6),
NULL). Therefore, the not null integrity constraint for a single
qty_ordered number(8), qty_disp number(8), column can, in practice be written in two forms, by using the
MANAGEMENT
product_rate number(8,2), not null constraint or by using a CHECK constrllint. For ease
Primary Key (s_order_no, product_no), of use, you should always choose to define the not null
integrity constraint instead of a CHECK constraint with the is
Foreign Key (s_order_no) References sales_order);
not null condition.
Check Integrity Constraints Here we shall look at the method via which data constraints can
Use the Check constraint when you need to enforce integrity be attached to the cell so that data validation can be done at
rules that can be evaluated based on a logical expression. Never table level itself using the power of the DBA. .
use Check constraints if the constraint can be defined using the
A constraint clause restricts the range of valid values for one
not null, primary key or foreign key constraint.
column (a column constraint) or for a group of columns (a
Following are a few examples of appropriate CHECK con- table constraint). Any INSERT, UPDATE or DELETE
straints: statement evaluates a relevant constraint; the constraint must be
• a Check constraint on the client no column of the client satisfied for the statement to succeed.
master so that no client no value starts with ‘C’. Constraints can be connected to a table by CREATE TABLE or
• a Check constant on name column of the client master so ALTER TABLE command. Use ALTER T ABLE to add. or
that the name is entered in upper case. drop the constraint from a table. Constraints are recorded in the
• a Check constraint on the city column of the client_master data dictionary. If you don’t name a constraint, it is assigned the
so that only the cities “Bombay”, “New Delhl “, “Mapras” name SYS - Cn, where n is an integer that makes the name
and “Calcutta” are allowed. unique in the database.
Create T Able client master Restrictions on Check Constraints
(client_no varchar2(6) Constraint ck_clientno A Check integrity constraint requires that a condition be true or
unknown for every row of the table. If a statement causes the
Check ( client_no like ‘C%’),
condition to evaluate to false; the statement is rolled back. The
name varchar2(20) Constraint ck_cname condition of a CHECK constraint has the following limita-
Check (name = upper(name», tions:
address I varchar2(30), address2 varchar2(30), • The condition must be a Boolean expression that can be
city varchar2( 15) Constraint ck _city evaluated using the values in the row being inserted or
updated.
Check (city In Cnewdelhi’, ‘Bombay’, ‘Calcutta’, ‘Madras’)),
• The condition cannot contain sub queries or sequences.
state varchar2(l5), pin code number(6),
• The condition cannot include the Sysdate, Uid, User or
remarks varchar2(60), bal- due number(10,2));
Userenv SQL function
When using CHECK constraints, consider the ANSI I ISO
standard which states that a CHECK constraint is violated only Defining Different Constraints on the Table
if the condition evaluates to False, True and unknown values Create a sales_order _details table where
do not violate a check condition. Therefore, make sure that a Column Name: Data Type Size Attributes
CHECK constraint that you define actually enforces the rule you Primary Key, Foreign Key
need to enforce. S_order_ no varchar2 6 references s order no of sales
order table.
For example, consider the following CHECK constraint for
Primary Key, Foreign Key
emp table: product_no varchar2 6 references product_no of
CHECK ( sal > 0 or comm >= 0 ) product_master table.
qty_ordered number 8 not null
At first glance, this rule may be interpreted as “do not allow a Qty_disp number 8.
row in emp table unless the employee’s salary is greater than 0 product_rate number 8,2 not null
or the employee’s commission is greater than or equal to “0”.
However, note that if a row is inserted with a null salary and a
negative commission, the row does not violate the CHECK Create Table sales_order details
constraint because the entire check condition is evaluated as (s_order_no varchar2(6) Contraint order_fkey
unknown. In this particular case, you can account for such References sales_order,
violations by placing not null integrity constraint on both the product_no varchar2(6) Constraint product_fkey
sal and comm columns.
References product_master,
qty _ordered number(8) Not Null,
71
qty - disp number(8), • When the user is loading a ‘record’ with values and leaves
DATABASE
product_rate number(8,2) Not Null, this cell empty, the DBA will automatically load this cell
with the default value specified.
Primary Key (s_order_no, product_no));
• Foreign keys represent relationships between tables. A
Defining Integrity Constraints in the Alter Table foreign key is a Column (or a group of columns) whose
MANAGEMENT
Command values -are derived from the primary key of the same or
You can also define integrity constraints using the constraint some other table.
clause in the Alter Table command. The following examples
• Use the CHECK constraint when you need to enforce
show the definitions of several integrity constraints:
integrity rules that can be evaluated based on a logical
1. Add Primary Key constant on column supplier_no in table expression
supplier_master;
• PL/SQL is Oracle’s procedural language extension to SQL.
Alter Table supplier_master PL/SQL enables you to mix SQL statements with
Add Primary Key (suppplier_no); procedural constructs
2. Add Foreign Key constraint on column s order no in table • Procedures, functions, and packages are all examples of PL/
sales order details referencing table sales_order, modify SQL program units.
column qty _ordered to include Not Null constant; • Database triggers are procedures that are stored in the
Alter Table sales order details database and are implicitly executed (fired) when the
Add Constraint order_tkey contents of a table are changed
Foreign Key (s_order_no) References sales_order Review Terms
Modify (qty_ordered number(8) Not Null); • Integrity constraints
Dropping Integrity Constraints in the Alter Table Cmmand • Table level constraints
You can drop an integrity constraint if the rule that it enforces is • Column level constraint
no longer true or if the constraint is no longer needed. Drop • Check constraints
the constraint using the Alter Table command with .the DROP
• NULL constraints
clause. The following examples illustrate the dropping of
integrity constraints: • Primary key
1. Drop the Primary Key constraint from supplier_master; • Foreign key
Alter Table supplier_master • Unique key
Drop Primary Key; Students Activity
2. Drop Foreign Key constraint on column product_no in 1. What is constraint?How many kinds of constraints are
table sales_order_details; there?
Alter Table sales order details
Drop Constraint product- fkey;
Note : Dropping UniquE and Primary Key constraints drops
the associated indexes.
Points to Ponder
• Integrity constraints ensure that changes made to the
database by authorized users do not result in a loss of data 2. Define primary key,unique key,foreign key?
consistency.
• If the constraints are defined along with the column
definition, it is called as a column level constraint.
• If the data constraint attached to a specific cell in a table
references the contents of another cell in the table then the
user will have to use table level constraints
• While creating tables, if a row lacks a data value for a
particular column, that value is said to be null. 3. Define check integrity constraints?
• A primary key is one or more columns in a table used to
uniquely identify each row.
• A unique key is similar to a primary key, except that the
purpose of a unique key is to ensure that information in
the column for each record is unique.
72
4. Differentiate between table level & column level constraint?
DATABASE
MANAGEMENT
5. Define Not Null constraint with the help of an example?
73
Student Notes
74
DATABASE MANAGEMENT
DATABASE MANAGEMENT
75
LESSON 24:
LAB
LESSON 25:
LAB
76
DATABASE MANAGEMENT
DATABASE
LESSON 26:
PL/SQL
MANAGEMENT
Lesson objectives Packages
• Difference between SQL & PL/SQL A package is a group of related procedures and functions,
together with the cursors and variables they use, stored together
• Stored procedures
in the database for continued use as a unit. Similar to
• Packages standalone procedures and functions, packaged procedures and
• Stored procedures functions can be called explicitly by applications or users.
• Functions Procedures and Functions
PL/SQL is Oracle’s procedural language extension to SQL. PL/ A procedure or function is a schema object that consists of a set
SQL enables you to mix SQL statements with procedural of SQL statements and other PL/SQL constructs, grouped
constructs. With PL/SQL, you can define and execute PL/SQL together, stored in the database, and executed as a unit to solve
program units such as procedures, functions, and packages. PL/ a specific problem or perform a set of related tasks. Procedures
SQL program units generally are categorized as anonymous and functions permit the caller to provide parameters that can
blocks and stored procedures. be input only, output only, or input and output values.
An anonymous block is a PL/SQL block that appears within Procedures and functions allow you to combine the ease and
your application and it is not named or stored in the database. flexibility of SQL with the procedural functionality of a
In many applications, PL/SQL blocks can appear wherever SQL structured programming language.
statements can appear. For example, the following statement creates the
A stored procedure is a PL/SQL block that Oracle stores in the Credit_Account procedure, which credits money to a bank
database and can be called by name from an application. When account:
you create a stored procedure, Oracle parses the procedure and Create Procedure credit_account
stores its parsed representation in the database. Oracle also (acct Number, credit Number) AS
allows you to create and store functions (which are similar to
/* This procedure accepts two arguments: an account
procedures) and packages (which are groups of procedures and
functions). number and an amount of money to credit to the specified
account. If the specified account does not exist, a
An Introduction to Stored Procedures and Packages
Oracle allows you to access and manipulate database informa- new account is created. */
tion using procedural schema objects called PL/SQL program old_balance Number;
units. Procedures, functions, and packages are all examples of new_balance Number;
PL/SQL program units.
Begin
PL/SQL is Oracle’s procedural language extension to SQL. It
Select balance Into old_balance From accounts
extends SQL with flow control and other statements that make
it possible to write complex programs in it. The PL/SQL engine Where acct_id = acct
is the tool you use to define, compile, and execute PL/SQL For Update Of balance;
program units. This engine is a special component of many new_balance := old_balance + credit;
Oracle products, including Oracle Server.
Update accounts Set balance = new_balance
While many Oracle products have PL/SQL components, this
Where acct_id = acct;
chapter specifically covers the procedures and packages that can
be stored in an Oracle database and processed using the Oracle Commit;
Server PL/SQL engine. The PL/SQL capabilities of each Oracle Exception
tool are described in the appropriate tool’s documentation. When No_Data_Found Then
Stored Procedures and Functions Insert Into accounts (acct_id, balance)
Procedures and functions are schema objects that logically group Values(acct, credit);
a set of SQL and other PL/SQL programming language
statements together to perform a specific task. Procedures and When Others Then
functions are created in a user’s schema and stored in a database Rollback;
for continued use. You can execute a procedure or function End credit_account;
interactively using an Oracle tool, such as SQL*Plus, or call it Notice that the Credit_Account procedure includes both SQL
explicitly in the code of a database application, such as an Oracle and PL/SQL statements.
Forms or Precompiler application, or in the code of another
procedure or trigger.
77
management change, only the procedures need to be modified,
DATABASE
Procedure Guidelines
Use the following guidelines to design and use all stored not all of the applications that use the procedures.
procedures:
Integrity
• Define procedures to complete a single, focused task. Do Stored procedures improve the integrity and consistency of your
not define long procedures with several distinct subtasks, applications. By developing all of your applications around a
MANAGEMENT
because subtasks common to many procedures might be common group of procedures, you can reduce the likelihood of
duplicated unnecessarily in the code of several procedures. committing coding errors.
• Do not define procedures that duplicate the functionality For example, you can test a procedure or function to guarantee
already provided by other features of Oracle. For example, that it returns an accurate result and, once it is verified, reuse it
do not define procedures to enforce simple data integrity in any number of applications without testing it again. If the
rules that you could easily enforce using declarative integrity data structures referenced by the procedure are altered in any way,
constraints. only the procedure needs to be recompiled; applications that call
Benefits of Procedures the procedure do not necessarily require any modifications.
Procedures provide advantages in the following areas. Anonymous PL/SQL Blocks vs. Stored Procedures
Security A stored procedure is created and stored in the database as a
Stored procedures can help enforce data security. You can restrict schema object. Once created and compiled, it is a named object
the database operations that users can perform by allowing that can be executed without recompiling. Additionally,
them to access data only through procedures and functions. dependency information is stored in the data dictionary to
guarantee the validity of each stored procedure.
For example, you can grant users access to a procedure that
updates a table, but not grant them access to the table itself. As an alternative to a stored procedure, you can create an
When a user invokes the procedure, the procedure executes with anonymous PL/SQL block by sending an unnamed PL/SQL
the privileges of the procedure’s owner. Users who have only block to Oracle Server from an Oracle tool or an application.
the privilege to execute the procedure (but not the privileges to Oracle compiles the PL/SQL block and places the compiled
query, update, or delete from the underlying tables) can invoke version in the shared pool of the SGA, but does not store the
the procedure, but they cannot manipulate table data in any source code or compiled version in the database for reuse
other way. beyond the current instance. Shared SQL allows anonymous
PL/SQL blocks in the shared pool to be reused and shared
Performance until they are flushed out of the shared pool.
Stored procedures can improve database performance in several
In either case, moving PL/SQL blocks out of a database
ways:
application and into database procedures stored either in the
• The amount of information that must be sent over a database or in memory, you avoid unnecessary procedure
network is small compared with issuing individual SQL recompilations by Oracle at runtime, improving the overall
statements or sending the text of an entire PL/SQL block performance of the application and Oracle.
to Oracle, because the information is sent only once and
thereafter invoked when it is used. External Procedures
A PL/SQL procedure executing on an Oracle8 Server can call an
• A procedure’s compiled form is readily available in the
external procedure or function that is written in the C program-
database, so no compilation is required at execution time.
ming language and stored in a shared library. The C routine
• If the procedure is already present in the shared pool of the executes in a separate address space from that of the Oracle
SGA, retrieval from disk is not required, and execution can Server.
begin immediately.
Packages
Memory Allocation Packages encapsulate related procedures, functions, and
Because stored procedures take advantage of the shared associated cursors and variables together as a unit in the
memory capabilities of Oracle, only a single copy of the database.
procedure needs to be loaded into memory for execution by
You create a package in two parts: the specification and the body.
multiple users. Sharing the same code among many users
A package’s specification declares all public constructs of the
results in a substantial reduction in Oracle memory require-
package and the body defines all constructs (public and private)
ments for applications.
of the package. This separation of the two parts provides the
Productivity following advantages:
Stored procedures increase development productivity. By • The developer has more flexibility in the development cycle.
designing applications around a common set of procedures, You can create specifications and reference public procedures
you can avoid redundant coding and increase your productivity. without actually creating the package body.
For example, procedures can be written to insert, update, or • You can alter procedure bodies contained within the package
delete rows from the EMP table. These procedures can then be body separately from their publicly declared specifications in
called by any application without rewriting the SQL statements the package specification. As long as the procedure
necessary to accomplish these tasks. If the methods of data specification does not change, objects that reference the
78
altered procedures of the package are never marked invalid; new_balance NUMBER;
DATABASE
that is, they are never marked as needing recompilation insufficient_funds EXCEPTION;
The following example creates the specification and body for a BEGIN
package that contains several procedures and functions that SELECT balance INTO old_balance FROM accounts
process banking transactions. WHERE acct_id = acct
FOR UPDATE OF balance;
MANAGEMENT
Create Package bank_transactions (null) AS
new_balance := old_balance - debit;
minimum_balance Constant Number := 100.00;
IF new_balance >= minimum_balance THEN
Procedure apply_transactions;
UPDATE accounts SET balance = new_balance
Procedure enter_transaction (acct Number, kind Char,
WHERE acct_id = acct;
amount Number);
do_journal_entry(acct, ‘D’);
END bank_transactions;
ELSE
Create Package Body bank_transactions AS
RAISE insufficient_funds;
/* Package to input bank transactions */
END IF;
new_status CHAR(20); /* Global variable to record status of
EXCEPTION
transaction being applied. Used for update in
WHEN NO_DATA_FOUND THEN
Apply_Transactions. */Procedure do_journal_entry (acct
new_status := ‘Nonexistent account’;
Number, kind CHAR) IS
WHEN insufficient_funds THEN
/* Records a journal entry for each bank transaction applied
new_status := ‘Insufficient funds’;
by the Apply_Transactions procedure. */
WHEN OTHERS THEN /* Returns other errors to applica-
Begin
tion */
Insert Into journal
new_status := ‘Error: ‘ || SQLERRM(SQLCODE);
Values (acct, kind, sysdate);
END debit_account;
IF kind = ‘D’ THEN
PROCEDURE apply_transactions IS
new_status := ‘Debit applied’;
/* Applies pending transactions in the table TRANSACTIONS
ELSIF kind = ‘C’ THEN
to the
new_status := ‘Credit applied’;
ACCOUNTS table. Used at regular intervals to update bank
ELSE
accounts without interfering with input of new transactions. */
new_status := ‘New account’;
/* Cursor fetches and locks all rows from the TRANSAC-
END IF;
TIONS
END do_journal_entry;
table with a status of ‘Pending’. Locks released after all pending
Procedure credit_account (acct Number, credit Number) IS
transactions have been applied. */
/* Credits a bank account the specified amount. If the account
CURSOR trans_cursor IS
does not exist, the procedure creates a new account first. */
SELECT acct_id, kind, amount FROM transactions
old_balance NUMBER;
WHERE status = ‘Pending’
new_balance NUMBER;
ORDER BY time_tag
BEGIN
FOR UPDATE OF status;
SELECT balance INTO old_balance FROM accounts
BEGIN
WHERE acct_id = acct
FOR trans IN trans_cursor LOOP /* implicit open and fetch
FOR UPDATE OF balance; /* Locks account for credit update
*/
*/
IF trans.kind = ‘D’ THEN
new_balance := old_balance + credit;
debit_account(trans.acct_id, trans.amount);
UPDATE accounts SET balance = new_balance
ELSIF trans.kind = ‘C’ THEN
WHERE acct_id = acct;
credit_account(trans.acct_id, trans.amount);
do_journal_entry(acct, ‘C’);
ELSE
EXCEPTION
new_status := ‘Rejected’;
WHEN NO_DATA_FOUND THEN /* Create new account
END IF;
if not found */
/* Update TRANSACTIONS table to return result of applying
INSERT INTO accounts (acct_id, balance)
this transaction. */
VALUES(acct, credit);
UPDATE transactions SET status = new_status
do_journal_entry(acct, ‘N’);
WHERE CURRENT OF trans_cursor;
When Others Then /* Return other errors to application */
END LOOP;
new_status := ‘Error: ‘ || SQLERRM(SQLCODE);
COMMIT; /* Release row locks in TRANSACTIONS table. */
END credit_account;
END apply_transactions;
PROCEDURE debit_account (acct NUMBER, debit NUM-
PROCEDURE enter_transaction (acct NUMBER, kind
BER) IS
CHAR, amount NUMBER) IS
/* Debits an existing account if result is greater than the
/* Enters a bank transaction into the TRANSACTIONS table.
allowed minimum balance. */
A new transaction is always put into this ‘queue’ before being
old_balance NUMBER;
79
applied to the specified account by the
DATABASE
How Oracle Stores Procedures and Packages
APPLY_TRANSACTIONS When you create a procedure or package, Oracle
procedure. Therefore, many transactions can be simultaneously • compiles the procedure or package
input without interference. */
• stores the compiled code in memory
BEGIN
INSERT INTO transactions • stores the procedure or package in the database
MANAGEMENT
VALUES (acct, kind, amount, ‘Pending’, sysdate); Points to Ponder

COMMIT;
• PL/SQL enables you to mix SQL statements with
END enter_transaction;
procedural constructs.
END bank_transactions;
• Procedures and functions are schema objects that logically
Packages allow the database administrator or application group a set of SQL and other PL/SQL programming
developer to organize similar routines. They also offer increased language statements together to perform a specific task.
functionality and database performance.
• A package is a group of related procedures and functions
Benefits of Packages
Packages are used to define related procedures, variables, and Review Terms
cursors and are often implemented to provide advantages in the • Difference between SQL & PL/SQL
following areas: • Stored procedures
• encapsulation of related procedures and variables • Packages
• declaration of public and private procedures, variables, • Stored procedures
constants, and cursors • Functions
• better performance
Students Activity
Encapsulation 1. Differentiate between SQL & PL-SQL?
Stored packages allow you to encapsulate (group) related stored
procedures, variables, datatypes, and so forth in a single named,
stored unit in the database. This provides for better organiza-
tion during the development process.
Encapsulation of procedural constructs in a package also makes
privilege management easier. Granting the privilege to use a
package makes all constructs of the package accessible to the
grantee.
Public and Private Data and Procedures
2. Define stored procedures & functions?
The methods of package definition allow you to specify which
variables, cursors, and procedures are
public Directly accessible to the user of a package.

private Hidden from the user of a package.
For example, a package might contain ten procedures. You can

define the package so that only three procedures are public and
therefore available for execution by a user of the package; the
remainder of the procedures are private and can only be accessed 3. Define packages?
by the procedures within the package.
Performance Improvement
An entire package is loaded into memory when a procedure
within the package is called for the first time. This load is
completed in one operation, as opposed to the separate loads
required for standalone procedures. Therefore, when calls to
related packaged procedures occur, no disk I/O is necessary to
execute the compiled code already in memory.
A package body can be replaced and recompiled without
affecting the specification. As a result, objects that reference a
package’s constructs (always via the specification) need not be
recompiled unless the package specification is also replaced. By
using packages, unnecessary recompilations can be minimized,
resulting in less impact on overall database performance.
80
DATABASE MANAGEMENT
81
Student Notes
LESSON 27:
LAB
82
DATABASE MANAGEMENT
DATABASE MANAGEMENT
83
LESSON 28:
LAB
DATABASE
LESSON 29:
DATABASE TRIGGERS
MANAGEMENT
Lesson Objectives Triggering Event or Statement

• Database triggers A triggering event or statement is the SQL statement that
causes a trigger to be fired. A triggering event can be an Insert,
• Parts of triggers
Update, or Delete statement on a table.
• Types of triggers
. . . Update of parts_on_hand on inventory . . .
• Executing trigger
which means: when the Parts_On_Hand column of a row in
Database triggers are procedures that are stored in the database the INventory table is updated, fire the trigger. Note that when
and are implicitly executed (fired) when the contents of a table the triggering event is an Update statement, you can include a
are changed. column list to identify which columns must be updated to fire
Introduction the trigger. You cannot specify a column list for Insert and
Oracle allows the user to define procedures that are implicitly Delete statements, because they affect entire rows of informa-
executed (i.e.” executed by, Oracle itself), when an insert, update tion.
or delete is issued against a table from SQL * Plus or through A triggering event can specify multiple DML statements, as in
an application. These procedures are called database triggers. . . . Insert or Update or Delete of inventory . . .
The major point that make these triggers stand alone is that
they are fired implicitly (i.e. internally) by Oracle itself and not which means: when an Insert, Update, or Delete statement is
explicitly called by the user, as done in normal procedures. A issued against the Inventory table, fire the trigger. When
trigger can include SQL and PL/SQL statements to execute as a multiple types of DML statements can fire a trigger, you can use
unit and can invoke stored procedures. However, procedures conditional predicates to detect the type of triggering statement.
and triggers differ in the way that they are invoked. A procedure In this way, you can create a single trigger that executes different
is explicitly executed by a user, application, or trigger. Triggers code based on the type of statement that fires the trigger.
(one or more) are implicitly fired (executed) by Oracle when a Trigger Restriction
triggering INSERT, UPDATE, or DELETE statement is A trigger restriction specifies a Boolean (logical) expression that
issued, no matter which user is connected or which application must be True for the trigger to fire. The trigger action is not
is being used. executed if the trigger restriction evaluates to FALSE or
Use Of Database Triggers Unknown. In the example, the trigger restriction is
Triggers can supplement the standard capabilities of Oracle to new.parts_on_hand < new.reorder_point
provide a highly customized database management system. For Trigger Action
example, a trigger can restrict DML operations against a table to A trigger action is the procedure (PL/SQL block) that contains
those issued during regular business hours. A trigger could also the SQL statements and PL/SQL code to be executed when a
restrict DML operations to occur only at certain times during triggering statement is issued and the trigger restriction
weekdays. Other uses for triggers are to evaluates to TRUE.
• automatically generate derived column values Like stored procedures, a trigger action can contain SQL and
• prevent invalid transactions PL/SQL statements, define PL/SQL language constructs
(variables, constants, cursors, exceptions, and so on), and call
• enforce complex security authorizations
stored procedures. Additionally, for row triggers (described in
• enforce referential integrity across nodes in a distributed the next section), the statements in a trigger action have access
database to column values (new and old) of the current row being
• enforce complex business rules processed by the trigger. Two correlation names provide access
• provide transparent event logging to the old and new values for each column.
• provide sophisticated auditing Types of Triggers
• maintain synchronous table replicates This section describes the different types of triggers: row and
statement triggers; and Before, After, and Instead-of triggers.
• gather statistics on table access
Row vs. Statement Triggers
Parts of a Trigger
When you define a trigger, you can specify the number of times
A trigger has three basic parts:
the trigger action is to be executed: once for every row affected
• a triggering event or statement by the triggering statement (such as might be fired by an
• a trigger restriction UPDATE statement that updates many rows), or once for the
• a trigger action triggering statement, no matter how many rows it affects.
84
• Before row trigger
DATABASE
Row Triggers
A row trigger is fired each time the table is affected by the Before modifying each row affected by the triggering
triggering statement. For example, if an UPDATE statement statement and before checking appropriate integrity
updates multiple rows of a table, a row trigger is fired once for constraints, the trigger action is executed provided that the
each row affected by the UPDATE statement. If a triggering trigger restriction was not violated.
statement affects no rows, a row trigger is not executed at all.
MANAGEMENT
• After statement trigger
Row triggers are useful if the code in the trigger action depends After executing the triggering statement and applying any
on data provided by the triggering statement or rows that are deferred integrity constraints, the trigger action is executed.
affected. • After row trigger
Statement Triggers After modifying each row affected by the triggering
A statement trigger is fired once on behalf of the triggering statement and possibly applying appropriate integrity
statement, regardless of the number of rows in the table that constraints, the trigger action is executed for the current row
the triggering statement affects (even if no rows are affected). provided the trigger restriction was not violated. Unlike
For example, if a DELETE statement deletes several rows Before row triggers, After row triggers lock rows.
from a table, a statement-level DELETE trigger is fired only You can have multiple triggers of the same type for the same
once, regardless of how many rows are deleted from the table. statement for any given table. For example you may have two
Statement triggers are useful if the code in the trigger action Before statement triggers for Update statements on the EMP
does not depend on the data provided by the triggering table. Multiple triggers of the same type permit modular
statement or the rows affected. For example, if a trigger makes a installation of applications that have triggers on the same
complex security check on the current time or user, or if a trigger tables. Also, Oracle snapshot logs use After row triggers, so you
generates a single audit record based on the type of triggering can design your own After row trigger in addition to the Oracle-
statement, a statement trigger is used. defined After row trigger.
Before vs. After Triggers You can create as many triggers of the preceding different types
When defining a trigger, you can specify the trigger timing- as you need for each type of DML statement (Insert, Update, or
whether the trigger action is to be executed before or after the Delete). For example, suppose you have a table, SAL, and you
triggering statement. Before and After apply to both statement want to know when the table is being accessed and the types of
and row triggers queries being issued. The example below contains a sample
package and trigger that tracks this information by hour and
Before Triggers type of action (for example, Update, Delete, or Insert) on table
Before triggers execute the trigger action before the triggering SAL. A global session variable, Stat.Rowcnt, is initialized to
statement is executed. This type of trigger is commonly used in zero by a Before statement trigger. Then it is increased each time
the following situations: the row trigger is executed. Finally the statistical information is
• When the trigger action should determine whether the saved in the table Stat_Tab by the After statement trigger.
triggering statement should be allowed to complete. Using
Sample Package and Trigger for SAL Table
a BEFORE trigger for this purpose, you can eliminate
unnecessary processing of the triggering statement and its Drop Table stat_tab;
eventual rollback in cases where an exception is raised in the Create Table stat_tab(utype CHAR(8),
trigger action. rowcnt Integer, uhour Integer);
Create or Replace Package stat is
• To derive specific column values before completing a
rowcnt Integer;
triggering INSERT or UPDATE statement.
END;
After Triggers /
AFTER triggers execute the trigger action after the triggering Create Trigger bt Before Update or Delete or Insert on sal
statement is executed. AFTER triggers are used in the following Begin
situations: stat.rowcnt := 0;
• When you want the triggering statement to complete before END;
executing the trigger action. /
Create Trigger rt Before Update or Delete Oor Insert on sal
• If a BEFORE trigger is already present, an AFTER trigger
for Each Row Begin
can perform different actions on the same triggering
stat.rowcnt := stat.rowcnt + 1;
statement.
END;
Combinations /
Using the options listed above, you can create four types of Create Trigger at After Update or Delete or Insert on sal
triggers: Declare
• BEFORE statement trigger typ CHAR(8);
Before executing the triggering statement, the trigger action hour Number;
is executed. Begin
IF updating
85
THEN typ := ‘update’; END IF; Then
DATABASE
IF deleting THEN typ := ‘delete’; END IF; Insert Into dept

IF inserting THEN typ := ‘insert’; END IF; Values(:n.deptno, :n.dept_type);
hour := TRUNC((SYSDATE - TRUNC(SYSDATE)) * 24); Else
UPDATE stat_tab Update dept SET dept.dept_type = :n.dept_type
SET rowcnt = rowcnt + stat.rowcnt Where dept.deptno = :n.deptno;
MANAGEMENT
WHERE utype = typ End If;

AND uhour = hour; If Not Exists Select * From project
IF SQL%ROWCOUNT = 0 THEN Where project.projno = :n.projno
INSERT INTO stat_tab VALUES (typ, stat.rowcnt, hour); Then
END IF; Insert Into project
EXCEPTION Values(:n.projno, :n.project_level);
WHEN dup_val_on_index THEN ELSE
UPDATE stat_tab Update project SET project.level = :n.level
SET rowcnt = rowcnt + stat.rowcnt Where project.projno = :n.projno;
WHERE utype = typ End If;
AND uhour = hour; END;
END; The actions shown for rows being inserted into the
/ Manager_Info view first test to see if appropriate rows already
Instead of Triggers exist in the base tables from which Manager_Info is derived.
Instead of triggers provide a transparent way of modifying The actions then insert new rows or update existing rows, as
views that cannot be modified directly through SQL DML appropriate. Similar triggers can specify appropriate actions for
statements (Insert, Update, and Delete). These triggers are called UPDATE and Delete.
Instead of triggers because, unlike other types of triggers, Trigger Execution
Oracle fires the trigger instead of executing the triggering A trigger can be in either of two distinct modes:
statement. The trigger performs update, insert, or delete
operations directly on the underlying tables. enabled An enabled trigger executes its trigger action if a triggering
Users write normal INSERT, DELETE, and UPDATE statement is issued and the trigger restriction (if any) evaluates
to TRUE.
statements against the view and the INSTEAD OF trigger
disabled A disabled trigger does not execute its trigger action, even if a
works invisibly in the background to make the right actions take triggering statement is issued and the trigger restriction (if any)
place. By default, INSTEAD OF triggers are activated for each would evaluate to TRUE.
row.
Example of an Instead of Trigger For enabled triggers, Oracle automatically
The following example shows an Instead of trigger for • executes triggers of each type in a planned firing sequence
inserting rows into the Manager_Info view. when more than one trigger is fired by a single SQL
Create View manager_info AS statement
SELECT e.name, e.empno, d.dept_type, d.deptno, p.level, • performs integrity constraint checking at a set point in time
p.projno with respect to the different types of triggers and guarantees
FROM emp e, dept d, project p that triggers cannot compromise integrity constraints
WHERE e.empno = d.mgr_no
• provides read-consistent views for queries and constraints
AND d.deptno = p.resp_dept;
• manages the dependencies among triggers and objects
Create Trigger manager_info_insert
referenced in the code of the trigger action
Instead of Insert on manager_info
Eferencing New As n — new manager information • uses two-phase commit if a trigger updates remote tables
For Each Row in a distributed database
Begin fires multiple triggers in an unspecified order, if more than one
If Not Exists Select * From emp trigger of the same type exists for a given statement
WHERE emp.empno = :n.empno Points to Ponder
THEN
• Allowing the user to define procedures that are implicitly
INSERT INTO emp
executed is known as triggers
VALUES(:n.empno, :n.name);
ELSE • Triggers can supplement the standard capabilities of Oracle
UPDATE emp SET emp.name = :n.name to provide a highly customized database management
WHERE emp.empno = :n.empno; system
END IF; • A trigger restriction specifies a Boolean (logical) expression
If Not Exists Select * From dept that must be TRUE for the trigger to fire.
Where dept.deptno = :n.deptno
86
• A trigger has three basic parts:
DATABASE
1. a triggering event or statement
2. a trigger restriction
3. a trigger action
6. Define various restriction of trigger?
MANAGEMENT
Review Terms
• Database triggers
• Parts of triggers
• Types of triggers
• Executing trigger
Students Activity
1. Explain the use of database triggers?Explain its need?
2. What are the various parts of database trigger?
3. Differentiate between before & after trigger?
4. Differentiate between row & after trigger?
5. Define INSTEAD OF Trigger with the help of an example?
87
Student Notes
88
DATABASE MANAGEMENT
DATABASE MANAGEMENT
89
LESSON 30:
LAB
LESSON 31:
LAB
90
DATABASE MANAGEMENT
DATABASE
LESSON 32:
DATABASE CURSORS
MANAGEMENT
Lesson Objectives records from a record, set created using a select statement are to
• Defining cursor be evaluated & processed one at a time, then the only method
available is by using Explicit Cursors.
• Use of cursor
Also cursors can be used to evaluate the success of updates and
• Explicit cursor
deletes and the number of this affected (Implicit ).
• Implicit cursor
Explicit Cursor
• Parametrized cursor
You can explicitly declare a cursor to process the rows individu-
When ever an SQL statement is executed, Oracle DBA performs ally. A cursor declared by the user is called Explicit Cursor. For
the following tasks:- queries that return more than one row, you must declare a
• Reserves an area in memory called private SQL Area. cursor explicitly.
• Populates this area with the appropriate data Why Use an Explicit Cursor?
• Processes the data in memory area. Cursors can be’ used when the users wants to process data one
• Frees the memory area when the execution is complete. row at a time.
What is Cursor ? Example

Oracle DBA uses a work area for its internal processing. This Update an Acctmast table und set a value in its Balance amount
work area is private to SQL’s operation and is called a cursor. column depending upon whether
The data that is stored is called Active Data Set. The size of the the account has an amount debited or credited in the Accttran
cursor in memory is the size required to hold the number of table. The records from the Accttran table will be fetched one at
rows in the Active Data set. a time and. updated in the Acctmast table depending upon
whether the account is debited or credited.
Example
PL/SQL raises an error if an embedded select statement
When a user first select statement as
retrieves more than one row. Such an error forces an abnormal
Select empno,job,salary from Employeewhere dept_no=20 termination of the PL/SQL block. Such an error can be
The resultant data set is as follows:- eliminated by using a cursor.
Active Data Set Explicit Cursor Management
3456 IVAN MANAGER 10000
• The steps involved in declaring a cursor and manipulating
3459 PRADEEP - ANALYST 7000 data in the active set are : Declare a cursor that specifies the
3446 MITA PROGRMR 4000 SQL select statement that you want to process
3463 VIJAY CLERK 2000 • Open a cursor.
3450 ALDRIN ACCTANT 3000
• Fetch data from the cursor one row at a time.
Contents -of a cursor. • Close the cursor.
When a query returns multiple rows, in addition to the data A cursor is defined in the declarative part of a PL/SQL block by
held in, the cursor, Oracle will also open and maintain a row naming it and specifying a query. Then, three commands are
pointer. Depending on user requests to view data the row used to control the cursor: open, fetch and close.
pointer will be relocated within the cursor’s Active Data Set. First, initialize the cursor with the open statement, this ;
Additionally Oracle also maintains cursor variables loaded with
defines a private SQL area executes a query associated with ‘“the
the value of the total no. of rows fetched from the active data
cursor populates the Active Data Set.
set.
Sets the Active Data Set1s row pointer to ‘the first record.
Use of Cursors in PL/SQL
While SQL is the natural language of the DBA, it does not have The fetch statement retrieves the current row and advances the
any procedural capabilities such as’ condition checking, looping cursor to the next row. You can execute fetch repeatedly until all
and branching. For this, Oracle provides the PL/SQL. Program- rows have been retrieved.
mers can use it to create programs for validation and When the last row has been processed, close the cursor with the
manipulation of table data. PL/SQIJ adds to the power of close statement. This will release the memory occupied by the
SQL and provides the user with all the functionality of a cursor and its Data Set.
programming environment. Focus: The HRD manager has decided to raise the salary for all
A PL/SQL block of code includes the Procedural code for the employees in department No. 20 by 0.05. Whenever any
looping and branching along with the SQL statement. If such raise is given to the. employees, a record for the same is
91
maintained in the emp_ raise table. It includes the employee
DATABASE
Fetching a Record from the Cursor
number, the date when the raise was given and the actual raise. The fetch statement retrieves the rows from the active set to the
Write a PL/SQL block to update the salary of each employee variables one at a time. Each time a fetch is executed, The focus
and insert a record in the emp _raise table;- of the DBA cursor advances to the next row in the Active Set.
The table definition is as follows: One can make use of any loop structure (Loop-End Loop
MANAGEMENT
Table name: employee along with While, For, IF-End If ’) to fetch the records from
the cursor into variables one row at a time.
Column name Data Type Size Attributes
Syntax : FETCH cursorname INTO variable1, variable2, ... ,
emp _code varchar 10 Primary Key, via which we shall seek
data in the table. For each column value returned by the query associated cursor,
Ename varchar 20 The first name of the candidate. there must be a corresponding variable in the into list. And,
Depmo number 5 The department number.
Job varchar 20 Employee job details. their datatypes must match. These variables will be declared in
Sal number 8,2 The current salary of the employee. the DECLARE section of the PL/SQL block.
-
Example : DECLARE
emp _code varchar 10 is the part of a composite key via which
we shall seek data in the table. cursor c - emp is select emp _code, salary from
raise date date The date on which the raise was given employee
raise amt number 8,2 The raise given to the employee.
where deptno = 20;
Emp _code and raise_date together form a composite primary /* Declaration of memory variable that holds data
key. fetched from the cursor */
str - emp _code employee.emp - code%type;
Declaring a Cursor
To do the above via a PL/SQL block it is necessary to declare a num _salary employee.salary%type;
cursor and associate it with a query before referencing it in any BEGIN
statement within the PL/SQL block. This is because forward open c_emp;
references to object are not allowed in PL/SQL.
/* infinite loop to fetch data from cursor c - emp one
Syntax : CURSOR cursorname IS row at a time */
SQL statement; loop
Example: DECLARE fetch c - emp into str - emp _code, num _salary;
/* Declaration of the cursor named c - emp /* Updating the salary in the employee table as
The active data set wi# include the names, department current salary + raise */
numbers and salaries of all the employees belonging
update employee set salary = num_salary +
to department 20 */
(num_salary * .05)
cursor c - emp is
where emp _code = str - emp _code;
select emp _code, salary from employee
/*insert a record in the emp _raise table */ insert into
where deptno = 20; emp _raise values
The cursor name is not a PL/SQL variable;. it is used only to (str_emp_code, sysdate, num_salary * 0.05)
reference the query. It cannot be assigned any values or be used
end loop;
in an expression.
commit;
Opening a Cursor
Opening the cursor executes the query and identifies the active END;
set, that contains all the rows which meet the query search Note that the current program will result into an indefinite loop
criteria. as there is no exit provided from the loop. The Exit from the
SYntax : OPEN cursorname ; loop can be provided using cursor variables as explained on
page 101 section Explicit Cursor Variables. Also note if you
Example: DECLARE
execute a fetch and there are no more rows left in the active data
cursor c - emp is select emp _code, salary from set, the values of the Explicit cursor ,variables are indetermi-
employee nate.
where deptno = 20; Closing a Cursor
BEGIN The close statement disables the cursor and the active set
becomes undefined. This will release the memory occupied by
/* Opening cursor c - emp */
the cursor and its Data Set. Once a cursor is closed, the user can
open c_emp; reopen the cursor using the open statement.
END; Syntax : CLOSE cursorname ;
Open statements retrieves the records frO1n the database and
places it in the cursor (private SQL area).
92
Example : DECLARE commit;
DATABASE
ursor c - emp is select _mp _code, salary from close c - emp ;
employee END;
where deptno = 20; • %FOUND: is the logical opposite of %notfound. It
str- emp _code employee.emp - code%type; evaluates to true, if the last fetch succeeded because a row
MANAGEMENT
num _salary employee.salary%type; was available; or to false, if the last fetch failed because no
more rows were available.
BEGIN
Syntax : cursorname%Found
open c_emp;
Example: The PL/SQL block will be as follows:
loop
Declare
fetch c - emp into str - emp _code, num_salary;
cursor c - emp is select emp _code, salary from employee
update employee set salary = num_salary +
(num_salary * .05) where deptno = 20;
where emp _code = str - emp _code; str - emp _code employee.emp - code%type;
insert into emp _raise values num_salary employee.salary_type;
(str - emp code, sysdate, num_salary * 0.05) Begin
end loop; open c_emp;
commit; loop
/* Close cursor c - emp */ fetch c - emp into str - emp _code, num _salary;
close c_emp; * If no. of record” retrieved:> 0 then
END; process the data

else exit the loop. */
Explicit Cursor Attributes
Oracle provides certain attributes / cursor variables to control if c_emp%)found then
the execution of the cursor. Whenever any cursor (explicit or update employee set salary = num_salary + (num_salary *
implicit) is opened and used Oracle creates a set of four system. .05)”
variables via which Oracle keeps track of the ‘Current status of where emp _code =str - emp _code;
the cursor. You can access these cursor variables. They are
insel1 into emp _raise values
described below.
(str_emp_code, sysdate, num_salary * 0.05)
• %Notfound : evaluates to t11,le, if the last fetch has failed
because no more rows were available; or to false, if the last else
fetch returned a row. exit;
Svntax : cursorname%NOTFOUND end if;
Example: DECLARE end loop;
cursor c - emp is select emp _code, salary from employee commit;
where deptno = 20; close c - emp ;
str - emp _code employee.emp -=.code%type; END;
num_salary employee. salary%type; • %ISOPEN : evaluates to true, if an explicit cursor is open;
BEGIN or to false, if it is closed.
open c_emp; Syntax : Cursorname%ISOPEN
loop Example : Declare
fetch c_emp into str_emp_code, num_salary; cursor c - emp is select emp _code, salary from employee
/ * If no. of records retrieved is 0 or if all the records are where deptno = 20;
fetched then exit the loop. */ str - emp _code employee.emp - code%type;
exit when c_emp%notfound ; num _salary employee. salary%type;
update employee set salary = num_salary + (num_salary * Begin

.05) open c - emp;
where -emp _code = str - emp _code; /* If the cursor is open
insert into emp _raise values continue with the data processing
(str_emp_code, sysdate, num_salary * O_05) else
end loop; display an appropriate error message */
93
if c_emp%isopen then Example : ‘The PL/SQL block will be rewritten as follows:
DATABASE
loop Declare
fetch c - emp into str - emp _code, num _salary; cursor c - emp is select emp _code, salary from employee
exit when c - emp%notfound ; where deptno = 20;
update employee set salary = num_salary + (num_salary * Begin
MANAGEMENT
.05} for emp_rec in c_emp

where emp _code = str - emp _code; loop
insert into emp _raise values update employee
(str - emp _code, sysdate, num _salary * 0.05) set salary = emp_rec.salary + (emp_rec.salary * .05)
end loop; where emp _code = emp _rec.emp _code;
commit; insert into emp _raise values
close c - emp ; (emp_rec.emp_code, sysdate, emp_rec.salary *.05)
else end loop;
dbms_output. put_line (‘Unable to open Cursor’); commit;
end if ; End;
END; When you use a cursor for loop, the cursor for loop
%Rowcount : returns the number of rows fetched from the • implicitly declares emp _rec as belonging to type c _
active set. It is set to zero when the cursor is opened. emp%rowtype and retrieves the records as declared in the
Syntax : cursomaine%Rowcount cursor c - emp.
Example: Display the names, department name and salary • The sequence of statements inside the loop is executed once
of the first 10 employees getting the highest salary. for every row that satisfies the query associated with the
Declare cursor. with each iteration, a record will be fetched from c
_emp into emp _rec. Dot notation should be used to make
cursor c - emp is
a reference to individual items.
select ename, deptno, salary from employee, deptmaster
Example: emp yec.emp _code
where deptmaster.deptno = employee.deptno
where emp _rec is a row type variable and emp _code is the
order by salary desc ; name of the field.
str - ename employee.ename%type ; • When you leave the loop, the cursor is closed automatically.
num - deptno employee.deptno%type ; This is true even if you use an exit or goto statement to
num_salary employee.salary%type; leave the loop prematurely, or if an exception is raised inside
the loop. Thus, When you exit the loop it closes the cursor c
Begin
- emp.
open c_emp;
Note: The record is defined only inside the loop. You cannot
dbms_output.Put_line (‘Name Department Salary’); refer to it’s item outside the loop. The reference in the following
dbms - output. Put _line (‘—— ———— ——-‘); example is illegal:
-loop Begin
fetch c - emp into str - ename, num - deptno, num _salary; for c1rec in c1 loop
dbms -output.Put_line (str - ename II ‘ , II ………
num - deptno II ‘ end loop;
exit when c_emp%rowcount = 10 ; result:= clrec.n2 + 3; /* referencing c 1rec outside for loop is
end loop; ‘. illegal */
‘II num_salary); End;
END; Focus: A bank has an ‘ACCTMAST’ table where it holds the
current status of a ,client’s bank account ( i.e. currently what the
Cursor for Loops
client has in the savings bank account.)
In most situations that require an explicit cursor, you can
simplify coding by using a cursor for loop instead of the open, Another table called the ACCTTRAN table holds each transac-
fetch and close statements. tion as it occurs at the back. i.e. Deposits / Withdrawals of
clients. A client can deposit money which must be then
A cursor for loop implicitly declares its loop index as a
‘ADDED’ to the amount held against that specific clients name
%rowtype record, opens a cursor, repeatedly fetches rows of
in the ACCTMAST table. This is referred to as a ‘CREDIT’ type
values from the active set into items in the record, and closes the
transaction.
cursor when all rows have been processed.
94
A client may withdraw money from his account. This must be where acctno = acctnum;
DATABASE
‘SUBTRACTED’ from the amount held against that specific else
client’s name in the ACCTMAST table. This is referred to as a
/* if the account is credited then update the
‘DEBIT type transaction.
acctmast table as balance = balance + amt */ update
The ACCTTRAN table must therefore hold a ‘flag’ that
acctmast
MANAGEMENT
indicates whether the transaction type was ‘CREDIT or
‘DEBIT’. set balance = (balance + amt)
Based on this flag define a cursor which will update the where acctno = acctnum ;
ACCTMAST ‘Balance’ field contents. end if;
Write a PL/SQL block that updates the acctmast table and sets update accttran set processed = ‘Y’
the balance depending upon whether the account is debited ‘OT where acctno = acctnum;
credited. The updation should be done only for those records
exit when acc - updt%notfound ;
that are not processed i.e. the processed flag is ‘N’ in the accttran
table. end loop;
1. Create the following tables close acc - updt;
Table name: acctmast commit;

End;
Column name Data Type Size Attributes
Implicit Cursor
Acctno varchar2 4 'Primary Key Oracle implicitly opens a cursor to process each SQL statement
Name varchar2 20 Account Name not associated with an explicitly declared cursor. PL/SQL lets
Balance Number 8 The balance in the account. you refer to the most recent implicit cursor as the SQL cursor.
So, although you cannot use the open, fetch, and close state-
Table name : Accttran ments to control an implicit cursor, you can still use cursor
attributes to access “information about 1:\1e most recently
Column name Data Type Size Attributes executed SQL statement.
Acctno varchar2 4 is the foreign key field which
Implicit Cursor Attributes
references table acttmast.
TrnDate date The Transaction Date: The SQL cursor has four attributes as described below. When
deb crd char 1 The Dr / Cr Flag. appended to the cursor name (i.e. SQL), these attributes let you
Amount number 7,2
Processed char 1 A flag indicating whether the record is
access information about the execution of insert, update, delete
processed or not and single-row select statements. Implicit cursor attributes
return the boolean null value, until they are set by a cursor
The following PL/SQL code updates the acctmast table operation.
depending upon the daily transactions entered in the accttran The values of the cursor attributes always refer to the most
table. recently executed SQL statement, wherever the statement
Declare appears. It might be in a different scope (in a sub-block). So, if
cursor acc-updt is you want to save an attribute value for later use, assign it a
boolean variable immediately.
select acctno, deb - crd, amount from accttran
• %Notfound : evaluates to true, if an insert, update or delete
where processed = ‘N’;
affected no rows, or a single -row select returns no rows.
acctnum char(4); Otherwise, it evaluates to false.
db_cd char( 1); Syntax : QL %Notfound
amt number(7,2); Example : The HRD manager has decided to raise the salary
Begin of employees by 0.15. Write a PL/SQL block to accept the
open acc - updt ; employee number and update the salary of that employee.
Display appropriate message based on the existence of the
/* perform the updation for all the records retrieved by the
record in the employee table.
cursor */
Begin
loop
update employee set salary = salary * 0.15
fetch acc - updt into acctnum, db - cd,amt;
where emp _code = &emp _code;
/* if the account is debited then update the
if sq1%Botfound- then
acctmast table as balance = balance - amt */
dbms_output.put_line(‘Employee No. Does not Exist’);
if db_cd = ‘d’ then
else
update acctmast
dbms_- output.put_line(‘Employee Record Modified
set balance = (balance - amt)
Successfully’);
95
end if; We should be able to pass a value to cursor only when it is
DATABASE
END; being opened. For $at the cursor must be declared in such a way
that it recognizes that it will receive the requested value( s) at the
• %FOUND : is the logical opposite of %notfound. Note,
time of opening the cursor. Such a cursor is known as Param-
however that both attributes evaluate to null until they are
eterized Cursor.
set by an implicit or explicit cursor operation. %found
MANAGEMENT
evaluates to’ true, if an insert, update or delete affected one Syntax : CURSOR cursor_name (variable_name datatype) is
or more rows, or a single-row select returned one or more select statement...
tows. Otherwise, it evaluates to false. The scope of cursor parameters are local to that cursor, which
Syntax: SQL%Notfound metros that they can be referenced only within the query declared
Example: The example in sq1%notfound will be written as in the cursor declaration. The values of cursor parameters are
follows: used by the associated query when the cursor is opened.
Begin For example.
update employee set salaty = salaty * 0.15 cursor c_emp (num_deptno number) is
where emp _code = &emp _code; select job, ename from emp where deptno > num - deptn;
if sq1%found then The parameters to a cursor can be passed in the open statement.
They can either be constant values or the contents of a memory
dbms - output.put _line(‘Employee Record
variable.
Modified Successfully’);
For example
else
OPEN c_emp (30)
dbms_output.put_line(‘Employee No. Does not Exist’);
OPEN c_emp (num_deptno)
end if;
Note: The memory variable should be declared in the DE-
END; CLARE section and the value should be assigned to that
• %Rowcount : returns the number of rows affected by an memory variable.
insert, update or delete, or select into statement. Each parameter in the declaration must have a corresponding
Example: The HRD manager has decided to raise the salary value in the open statement. Remember that the parameters of
of employees working as ‘Programmers’ by 0.15. Write a a cursor cannot return values.
PL/SQL block to accept the employee number and update Example :
the salary of that employee. Display appropriate message
based on the existence of the record in the employee table. Allow insert, update and delete for the table itemmast on the
bases of the table itemtran table..
Declare
1. Create the following tables
rows_affected char( 4) ;
Table name: itemmast
Begin
update employee set salary = salary * 0.15 Column name Data Type Size Attributes
where job = ‘Programmers’ ; itemid Number 4 Primary Key
rows_affected := to_char(sq1%rowcount) description Varchar 20 The item description
if sq1%rowcount > 0 then Bal stock Number 3 The balance stock for an item
dbms - output.put_line(rows _affected II ‘Employee

Table name: ItemTran
Records
Modified Successfully’) ; Column name Data Type Size Attributes
else Itemid Number 4 Foreign key via which we shall seek data
in the table.
dbms_output.put_line(‘There are no Employees Description Varchar 30 Item description
Operation Char 1 The kind of operation on Item Mast table
working as Programmers’); i.e. Insert Update Delete(I U, D)
Qty Number 3 The Qty sold
end if;
Status Varchar 30 The status of the Operation.
END;
• %ISOPEN : Oracle automatically closes the SQL cursor after Based on the value in the operation column of table itemtran
executing its associated SQL statement. As a result, the records for table itemmast is either inserted updated or
sq1%isopen always evaluates to false. deleted. On the basis of success/failure of insert, update and
delete operation the status column in the table itemtran is
Parameterized Cursors updated with appropriate text indicating success or reason for
So far we have used cursors querying all the records from a table. failure. Following are the 3 cases which are to be taken care of.
Sometimes records are brought in memory selectively. While
1. If Operation = ‘ I ‘ then the itemid against along with
declaring a cursor? the select statement must include a where
description and qty is inserted into the required columns of
clause to retrieve data conditionally.
96
the table itemmast. If insert is successful then the status update itemtran
DATABASE
field of itemtran table is updated to’ ‘ Successful ‘- else it is set itemtran.status = ‘SUCCESSFUL’
updated as ‘Item Already Exist’.
where itemid = itemidno;
2. If Operation = ‘ U ‘ then the qty against this operation
/* if the record is notfound and the operation is update/
column is added to bal:...:.stock column of the table
delete then set the status to ‘Item Not Present’ */
MANAGEMENT
itemmast where itemid of table itemmast is same as that of
itemtran. If update is successful then the status column of elsif oper = ‘U’ or oper = ‘D’ then
itemtran table is updated to ‘ Successful’ else it is updated as update itemtran
‘ Item Does Not Exist’. set itemtran.status = ‘Item Not Present’
3. If Operation”=’ D ‘ , then a Tow from itemmast is deleted where itemid = itemidno;
whose itemid is equal to the itemid in the table itemtran
end if;
with the operation column having the value’ D ‘. If delete is
successful then the status column of itemtran table is else
updated to ‘Successful’ else it is updated as ‘Item Does Not /* if the record is found and the operation is insert then set
Exist’. the status to ‘Item Already exists’ *,/
The following PL/SQL code takes care of the above three cases. if oper = ‘I’ then
Declare update itemtran
/* Cursor scantable retrieves all the records of table itemtran */ set itemtran.status = ‘Item Already Exists’
cursor scantable is where itemid = itemidno;
select itemchk operation, qty, description from itemtran; /* if the record is found and the operation is update/delete
/* Cursor Itemchk accepts the value of item id from the current then perform the update or delete operation
raw of cursor set the status to ‘Successful’ */ .
scantable */ elsif oper = ‘D’ then
cursor itemchk(mastitemid number) is delete from item_mast where item - mast.itemid =
select itemid from item_mast itemidno;
update itemtran
where itemid = mastitemid;
set itemtran.status = ‘Successful’
/* variable_ that hold data from the cursor scantable */
itemidno number( 4);
elsif oper = ‘U’ then
de scrip varchar2(30);
update item_mast
oper char(l);
set item_mast. bal- stock = quantity
quantity number(3);
/* variable that hold data from the cursor itemchk */
dummyitem number( 4); update itemtran -
Begin set itemtran.status = ‘Successful’
/* open the scantable cursor */ where itemid = itemidno;
open scantable; end if;
loop’ close itemchk;
/ * Fetch the records from the scan table cursor */fetch exit when scantable%notfound;
scantable into itemidno, oper, quantity, ‘descrip; end loop;
/* Open the itemchk cursor close scantable;
Note that the value of variable passed to the itemchk cursor commit;
is set to the value of item id in the current row of cursor END;
scantable */
Points to Ponder
open itemchk(itemidno);
• Oracle DBA uses a work area for its internal processing.
fetch itemchk into dummyitem;
• You can explicitly declare a cursor to process the rows
/* if the record is not found and the operation is insert
individually. A cursor declared by the user is called Explicit
then insert the new record and set the status to ‘Successful’
Cursor
*/if itemchk%notfound then
• Opening the cursor executes the query and identifies the
if oper = ‘T’ then insert into item - mast(itemid, bal- stock,
active set, that contains all the rows which meet the query
description)
search criteria.
values(itemidno; quantity, descrip);
97
• The fetch statement retrieves the rows from the active set to
DATABASE
the variables one at a time

• The close statement disables the cursor and the active set
becomes undefined
• Oracle implicitly opens a cursor to process each SQL
MANAGEMENT
statement not associated with an explicitly declared cursor.

Review Terms
• Defining cursor
• Use of cursor
• Explicit cursor
• Implicit cursor
• Parametrized cursor
Students Activity
1. Define cursors?Explain its usage?
2. Differentiate between explicit & implicit cursor with the help

of an example?
98
DATABASE MANAGEMENT
99
Student Notes
LESSON 33:
LAB
100
DATABASE MANAGEMENT
DATABASE MANAGEMENT
101
LESSON 34:
LAB
DATABASE
LESSON 35:
NORMALISATION - I
MANAGEMENT
Lesson Objectives relation schema had an attribute whose domain consists of

• Normalisation iden-tification numbers encoded as above, the schema would
not be in first normal form.
• First Normal Form
First normal form (1NF) sets the very basic rules for an
• Functional dependencies
organized database:
• Closure of a set of functional dependencies
• Eliminate duplicative columns from the same table.
When deciding upon the structure of data to be stored in a
• Create separate tables for each group of related data and
file(s), or a database, the two main issues to be considered are:
identify each row with a unique column or set of columns
1. removing data duplication from the files/database (the primary key).
2. avoiding data confusion, where different versions of the
Functional Dependencies
same information is located in different places in the same
Functional dependencies plays key role in designing good
file
database. A functional dependency is a type of constraint that is
Data Normalisation is the process of determining the correct a generalization of the notion of key.
structure for data in files or databases so that the problems
Functional dependencies are constraints. on the set of legal
mentioned cannot occur
relations. They. allow us to express facts about the enterprise
Data is structured by following a series of steps. Each step that we are modeling with our database.
removes the potential for a particular problem to occur in the
If we define the notion of a superkey as follows. Let R be a
data, e.g. duplication, and each step builds upon the previous
relation schema. A subset K of R is a superkey of R if, in any
steps
legal relation r(R), for all pairs tl and t2 of tuples in r such that
First Normal Form tl≠t2, then t1[K] ≠t2[K_] . That is, no two tuples in any legal
The first of the normal forms that we study, first normal form, relation r(R) may have the same value on attribute set K.
imposes a very basic requirement on relations. The notion of functional dependency generalizes the notion of
A domain is atomic if elements of the domain are considered superkey. Consider a relation schema R, and let α ⊆ R and β⊆R.
to be indivisible units. We say that a relational schema R is in The functional dependency
first normal form (lNF) if the domains of all attributes of R
C→α
are atomic.
holds on schema R if, in any legal relation r(R), for all pairs of
A set of names is an example of a non atomic value. For
tuples t1 and t2 in r such that t1(α) = t2 (α) it is also the case
example, if the schema of a relation employee included an
that t1(β) = t2(β)
attribute children whose domain elements are sets of names,
the schema would not be in first normal form. Using the functional-dependency notation, we say that K is a
superkey of R if K→ R. That is, K is a superkey if, whenever
Composite attributes, such as an attribute address with
t1[K) = t2[K], it is also the case that t1[R]= t2[R] (that is,
component attributes street and city, also have nonatomic
t1=t2).
domains.
Functional dependencies allow us to express constraints that we
Integers are assumed to be atomic, so the set of integers is an
cannot express with superkeys. Consider the schema
atomic domain; the set of all sets of integers is a nonatomic
domain. The distinction is that we do not normally consider Loan-info-schema = (loan-number, branch-name, customer-
integers to have subparts, but we consider-sets of integers to name, amount) which is simplification of the Lending-schema
have subparts-namely, the integers making up the set. But the that we saw earlier. The set of functional dependencies that we
important issue is not what the domain itself is, but rather expect to hold on this elation schema is
how we use domain elemeents in our database. The domain of loan-nurnber → amount loan-number → branch-name
all integers would be nonatomic if we considered each integer to We would not, however, expect the functional dependency
be an ordered list of digits.
loan-number → customer-name
As a practical illustration of the above point, consider an
to hold, since, in general, a given loan can be made to more than
organization that as-signs employees identification numbers of
one customer (for example, to both members of a husband-
the following form: The first two letters specify the department
wife pair).
and the remaining four digits are a unique number within the
department for the employee. Examples of such numbers
would be CS0012 and EE1127. Such identification numbers can
be divided into smaller units, and are there-fore nonatomic. If a
102
We shall use functional dependencies in two ways:
DATABASE
Customer-name Customer-street Customer-city
1. To test relations to see whether they are legal under a given
Jones Main Harrison
set of functional dependencies. If a relation r is legal under Smith North Rye
a set F of functional dependencies, we say that r satisfies F. Hayes Main Harrison
2. To specify constraints on the set of legal relations. We shall Curry North Rye
MANAGEMENT
thus concern our-selves with only those relations that satisfy Lindsay Park Pittsfield
a given set of functional dependen-cies. If we wish to Tufl1et Putnam Stamford
constrain ourselves to relations on schema R that satisfy a Williams Nassau Princeton
set F of functional dependencies, we say that F holds on R. Adams Spring Pittsfield
Johnson Alma Palo Alto
To see which functional dependencies are satisfied. Observe that
Glenn Sand Hill Woodside
A → C is satisfied. There are two tuples that have an A value
Brooks Senator Brooklyn
of a1. These tuples have the same C value-namely, C1. Similarly,
Green Walnut Stamford
the two tu-ples with an A value of a2 have the same C value,
C2. There are no other pairs of distinct tuples that have the The customer relation
same A value. The functional dependency C → A is not L-17 Downtown 1000
satisfied, however. To see that it is not, consider the tuples t1 = L-23 Redwood 2000
(a2, b3, C2, d3) and L-15 Perryridge 1500
L-14 Downtown 1500
L-93 Mianus 500
a1 B1 C1 d1 L-11 Round Hill 900
a1 B2 c1 d2 L-29 Pownal 1200
a2 B2 c2 d2 L-16 North Town 1300
a2 B2 C2 d3 L-18 Downtown 2000
L-25 Perryridge 2500
a3 B3 C2 d4
L-10 Brighton 2200
Sample relation r. The loan relation.

t2= (a3, b3, C2, d4). These two tuples have the same C values, can have streets with the same name. Thus, it is possible, at
C2, but they have dif-ferent A values, a2 and a3, respectively. some time, to have an instance of the customer relation in
Thus, we have found a pair of tuples t1 and t2 such that t1(C) which customer-street → customer-city is not satis-fied. So, we
= t2 [C], but t1 [A] ≠ t2 [A]. would not include customer-street → customer-city in the set
Many other functional dependencies are satisfied by r, including, of functional dependencies that hold on Customer-schema.
for example, the functional dependency AB → D. Note that we In the loan relation (on Loan-schema), we see that the depen-
use AB as a shorthand for {A,B}, to conform with standard dency loan--number → amount is satisfied. In contrast to the
practice. Observe that there is no pair of distinct tuples tl and t2 case of customer-city and customer-street in Customer-schema,
such that tl [AB] = t2 [AB]. Therefore, if t1 [AB] = t2 [AB], it we do believe that the real-world enterprise that we are model-
must be that t1 = t2 and, thus, t1 [D] = t2[D]. So, r satisfies ing requires each loan to have only one amount. Therefore, we
AB→ D. want to require that loan-number → amount be satisfied by the
Some functional dependencies are said to be trivial because they loan relation at all times. In other words, we require that the
are satisfied by all relations. For example, A → A is satisfied by constraint loan-number → amount hold on Loan-schema.
all relations involving attribute A. Reading the definition of In the branch relation, we see that branch-name → assets is
functional dependency literally, we see that, for all tuples t1 and satisfied, as is assets → branch-name. We want to require that
t2 such that t1(A] = t2 [A], it is the case that h[A] = t2 [A]. branch-name → assets hold on Branch-schema. However, we
Similarly, AB → A is satisfied by all relations involving attribute do not wish to require that assets → branch-name hold, since it
A. In general, a functional dependency of the form α → β is is possible to have several branches that have the same asset
trivial if β ⊆ α. value.
To distinguish between the concepts of a relation satisfying a In what follows, we assume that, when we design a relational
dependency and a dependency holding on a schema, we return database, we first list those functional dependencies that must
to the banking example. If we consider the customer relation always hold. In the banking example, our list of dependencies
(on Customer-schema) in Figure 7.3, we see that customer- includes the following:
street → customer-city is satisfied. However, we believe that, in Downtown Brooklyn 9000000
the real world, two cities Redwood Palo Alto 2100000
Perryridge Horseneck 1700000
Mianus Horseneck 400000
Round Hill Horseneck 8000000
Pownal Bennington 300000
North Town Rye 3700000
Brighton Brooklyn 7100000
103
• On Branch-schema: Therefore, we have shown that, whenever tl and t2 are tuples
DATABASE
branch-name → branch-city such that tl [A] = t2 [A], it must be that tl [H] = t2[H]. But
that is exactly the-definition of A → H.
branch-name → assets
Let F be a set of functional dependencies. The closure of F,
• On Customer-schema:
denoted by F+, is the set of all functional dependencies logically
customer-name → customer-city
MANAGEMENT
implied by F. Given F, we can compute F+ directly from the

customer-name → customer-street formal definition of functional dependency. If F were large, this
• On Loan-schema: process would be lengthy and difficult. Such a computation of
F+ requires argu-ments of the type just used to show that
loan-number → amount
A→H is in the closure of our example set of dependencies.
loan-nll1nber → branch-name
Axioms, or rules of inference, provide a simpler technique for
• On Borrower-schema: reasoning about functional dependencies. In the rules that
No functional dependencies follow, we use Greek letters (α, β, γ . . .) for sets of attributes,
• On Account-schema: and uppercase Roman letters from the beginning of the
alphabet for individual attributes. We use αβ to denote αUβ.
account-number → branch-name
account-number → balance We can use the following three rules to find logically implied
functional dependencies. By applying these rules repeatedly, we
• On Depositor-schema: can find all of F+ given F. This collection of rules is called
No functional dependencies Armstrong’s axioms in honor of the person who first pro-
Closure of a Set of Functional posed it.
Dependencies • Reflexivity rule. If α is a set of attributes and β ⊆ α then
It is not sufficient to consider the given set of functional a→β holds.
dependencies. Rather, we need to consider all functional • Augmentation rule. If α → β holds and γ is a set of
dependencies that hold. We shall see that, given a set F of attributes, then γα → γβ holds.
functional dependencies, we can prove that certain other
• Transitivity rule. If α → β holds and β → γ holds, then
functional dependencies hold. We say that such functional
α→γ holds.
dependencies are “logically implied” by F.
Armstrong’s axioms are sound, because they do not generate
More formally, given a relational schema R, a functional
any incorrect func-tional dependencies. They are complete,
dependency f on R is log-ically implied by a set of functional
because, for a given set F of functional de-pendencies, they
dependencies F on R if every relation instance r(R) that satisfies
allow us to generate all F+. The bibliographical notes provide
F also satisfies f.
ref-erences for proofs of soundness and completeness.
Suppose we are given a relation schema R = (A, B, C, G, H, I)
Although Armstrong’s axioms are complete, it is tiresome to
and the set of functional dependencies
use them directly for the computation of F+. To simplify
A→B matters further, we list additional rules. It is pos-sible to use
A→C Armstrong’s axioms to prove that these rules are correct
CG → H • Union rule. If α →β holds and α → γ holds, then α→βγ
CG → I • Decomposition rule. If α → βγ holds, then α →β holds
B→H and α → γ holds.
The functional dependency • Pseudotransitivity rule. If α → β holds and γβ → δ
holds, then αγ → δ holds.
A→H
Let us apply our rules to the example of schema R = (A, B, C,
is logically implied. That is, we can show that, whenever our
G, H, I) and the set F of functional dependencies {A → B, A →
given set of functional dependencies holds on a relation, A→H
C, CG → H, CG → I, B → H}. We list several members of F+
must also hold on the relation. Suppose that tl and t2 are tuples
here:
such that
A → H. Since A → B and B → H hold, we apply the transitiv-
t1(A] = t2[A]
ity rule. Observe that it was much easier to use Armstrong’s
Since we are given that A → B, it follows from the definition of axioms to show that A → H holds than it was to argue directly
functional dependency that from the definitions, as we did earlier in this section.
t1(B] = t2[B] • CG → HI. Since CG → and CG → I, the union rule
Then, since we are given that B ® H, it follows from the implies that CG → HI.
definition of functional dependency that • AG → I. Since A → C and CG → I, the pseudotransitivity
t1(H] = t2[H] rule implies that A G → I holds.
Another way of finding that AG → I holds is as follows. We
use the aug-mentation rule on A → C to infer AG → CG.
104
Applying the transitivity rule to this dependency and CG → I, 4. Define functional dependecy?
DATABASE
we infer AG → I.
Points To Ponder
• Normalisation is the process of determining the correct
structure for data in files or databases
MANAGEMENT
• A relational schema R is in first normal form (lNF) if the
domains of all attributes of R are atomic.
• Functional dependencies plays key role in designing good
database. A functional dependency is a type of constraint
5. Define Closure of a set of functional dependency?
that is a generalization of the notion of key.
• The bad design of a database suggests that we should
decompose a relation schema that has many attributes into
several schemas with fewer attributes.
Review Terms
• Normalisation
• First Normal Form
• Functional dependencies
• Closure of a set of functional dependencies
Students Activity
1. Define Normalisation?
2. How normalisation can improve the redundancy of data?
3. Define 1NF?
105
Student Notes
106
DATABASE MANAGEMENT
DATABASE
LESSON 36:
NORMALISATION - II
MANAGEMENT
Lesson objectives the relation branch customer
• Decomposition Customer name Loan number Amount
• Properties of decomposition Jones L-17 1000
Smith L-23 2000
• Lossless join decomposition Hayes L-15 1500
• 2NF Jackson L-14 1500
Jones L-93 500
Decomposition Turner L-11 900
The bad design of a database suggests that we should decom- Williams L-29 1200
pose a relation schema that has many attributes into several Hayes L-16 1300
schemas with fewer attributes. Careless decom-position, Johnson L-18 2000
however, may lead to another form of bad design. Glenn L-25 2500
Brooks L-10 2200
Consider an alternative design in which we decompose Lend-
ing-schema into the following two schemas: The relation customer-loan.
Branch-customer-schema = (branch-name, branch-city, assets, The result of computing branch-customer [><I customer-loan.
customer-name) When we compare this relation and the lending relation with
Customer-loan-schema = (customer-name, loan-number, which we started, we notice a difference: Although every tuple
amount) that appears in the lending relation ap-pears in branch-customer
[><I customer-loan has the following additional
loan (Customer-loan-schema): tuples
branch-customer = Π branch-name, branch-city, assets, (Downtown, Brooklyn, 9000000, Jones, L-93, 500)
customer-name (lending)
(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
customer-loan = Π customer-name, loan-number, amount
(lending) (Mianus, Horseneck, 400000, Jones, L-17, 1000)
Of course, there are cases in which we need to reconstruct the (North Town, 3700000, Hayes, L-15, 1500)
loan relation. For example, suppose that we wish to find all Consider the query “Find all bank branches that have made a
branches that have loans with amounts less than $1000. No loan in an amount less than $1000.” . we see that the only
relation in our alternative database contains these data. We need branches with loan amounts less than $1000 are Mianus and
to reconstruct the lending relation. It appears that we can do so Round Hill However, when we apply the expression
by writing Π branch-name ((J amount < 1000 (branch-customer [><I
branch-customer _ customer-loan customer-loan))
we obtain three branch names: Mianus, Round Hill, and
Branch name Branch city Assets Customer name Downtown.
Downtown Brooklyn 9000000 Jones A closer examination of this example shows why. If a customer
Redwood Palo Alto 2100000 Smith happens to have several loans from different branches, we
Perryridge Horseneck 1700000 Hayes cannot tell which loan belongs to which branch. Thus, when we
Downtown Brooklyn . 9000000 Jackson join branch-customer and customer-loan, we obtain not only
Mianus Horseneck 400000 Jones
the tuples we had originally in lending, but also several addi-
Round Hill Horseneck 8000000 Turner
tional tuples. Although we have more tuples in
Pownal Bennington 300000 Williams
branch-customer [><I customer-loan, we actually have less in-
North Town Rye 3700000 Hayes
Downtown Brooklyn 9000000 Johnson formation. We are no longer able, in general, to represent in the
Perryridge Horseneck 1700000 Glenn database information about which customers are borrowers
Brighton Brooklyn . 7100000 Brooks from which branch. Because of this loss of information, we call
the decomposition of Lending-schema into Branch-customer-
schema and customer-loan-schema a lossy decomposition, or a
lossy-join decomposition. Decomposition that is not a lossy-
join decomposition is a lossless-join decomposing.
107
That is, {R1, R2,..., Rn} is a decomposition of R if, for i =
DATABASE
Customer Laon
Branch name Branch city Assets Amount
name number 1,2,..., n, each Ri is a subset of R, and every attribute in R
Downtown Brooklyn 9000000 Jones L-17 1000
Downtown Brooklyn 9000000 Jones L-93 500 appears in at least one Ri.
Redwood Palo Alto 2100000 Smith L-23 2000 Let r be a relation on schema R, and let Ti = Π Ri (T) for i =
Perryridge Horseneck 1700000 Hayes L-15 1500
1,2,..n. That is, {r1, r2, r3……rn} is the database that results
Perryridge Horseneck 1700000 Hayes L-16 1300
MANAGEMENT
Downtown Brooklyn 9000000 Jackson L-14 1500 from decomposing R into {R1, R2" . . , Rn}.
Mianus Horseneck 400000 Jones L-17 1000 It is always the case that
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900 r⊆ r1 [><I rn
Pownal Bennington 300000 Williams L-29 1200
North Town Rye 3700000 Hayes L-15 1500
To see that this assertion is true, consider a tuple t in relation r.
North Town Rye 3700000 Hayes L-16 1300 When we compute the relations r1, r2, r3, ... rn , the tuple t gives
Downtown Brooklyn 9000000 Johnson L-18 2000 rise to one tuple ti in each ri, i = 1,2, . . . , n. These n tuples
Perryridge Horseneck 1700000 Glenn L-25 2500 combine to regenerate t when we compute r1, [><Ir2…[><Ir2…[><I…. [><I rn. As an illustration,
sition is, in gen-eral, a bad database design. consider our earlier example, in which
Why is the decomposition lossy? There is one attribute in • n = 2.
common between Branch-customer-schema and Customer-
• R. = Lending-schema.
loan-schema:-
• R 1 = Branch-customer-schema.
Branch-customer-schema n Customer-loan-schema = {cus-
tomer-name} • R2 = Customer-Loan-schema.
The only way that we can represent a relationship between, for • r = the relation shown in Figure 7.1.
example, loan-number and branch-name is through customer- • r1 = the relation shown in Figure 7.2
name. This representation is not adequate because a customer • r2 = the relation shown in Figure 7.10.
may have several loans, yet these loans are not necessarily
• r1 [><I r2 = the relation shown in Figure 7.11.
obtained from the same branch.
To have a lossless-join decomposition, we need to impose
Let us consider another alternative design, in which we decom-
constraints on the set of possible relations. We found that the
pose Lending-schema into the following two schemas:
decomposition of Lending-schema into Branch-schema and
Branch-schema = (branch-name, branch-city, assets) Loan-info-schema is lossless because the functional dependency
Loan-info-schema = (branch-name, customer-name, loan- branch-name → branch:...city assets
number, amount)
hold on branch schema.
There is one attribute in common between these two schemas:
We shall show how to test whether decomposition is lossless-
Branch-loan-schema n Customer-loan-schema = {branch- join decomposition in the next few sections. A major part of
name} this chapter deals with the questions of how to specify con-
Thus, the only way that we can represent a relationship’ straints on the database, and how to obtain lossless-join
between, for example, customer-name and assets is through decom-positions that avoid the pitfalls represented by the
branch-name. The difference between this exam-ple and the examples of bad database designs that we have seen in this
preceding one is that the assets of a branch are the same, section.
regardless of the customer to which we are referring; whereas Desirable Properties of Decomposition
the lending branch associated with a certain loan amount does We can use a given set of functional dependencies in designing a
depend on the customer to which we are referring. For a given relational database in which most of the undesirable properties
branch-name, there is exactly one assets value and exactly one do not occur. When we design such systems, it may become
branch-city; necessary to decompose a relation into several smaller relations.
Where as a similar statement cannot be made for customer- In this section, we outline the desirable properties of a decom-
name. That is, the functional dependency position of a rela-tional schema. In later sections, we outline
branch-name → assets branch-city specific ways of decomposing a relational schema to get the
holds, but customer-name does not functionally determine properties we desire. We illustrate our concepts with the
loan-number. Lending -schema
The notion of loss less joins is central to much of relational- Lending-schema = (branch-name, branch-city, assets, customer-
database design. There-fore, we restate the preceding examples name, loan-number, amount)
more concisely and more formally. Let. R be a relation schema. The set F of functional dependencies that we require to hold on
A set of relation schemas {R1, R2,..., Rn} is a decomposition Lending-schema are
of R if branch-name -+ branch-city assets
R = Rl U R2 U ... U Rn loan-number -+ amount branch-name
108
Lending-schema is an example of a bad database design.
DATABASE
Second Normal Form
Assume that we decompose it to the following three relations: The general requirements of 2NF are:
Branch-schema = (branch-name, branch-city, assets) • Remove subsets of data that apply to multiple rows of a
Loan-schema = (loan-number, branch-name, amount) table and place them in separate rows.
Borrower-schema = (customer-name, loan-number) • Create relationships between these new tables and their
MANAGEMENT
predecessors through the use of foreign keys.
We claim that this decomposition has several desirable proper-
ties, which we discuss next. These rules can be summarized in a simple statement: 2NF
attempts to reduce the amount of redundant data in a table by
Lossless-join Decomposition extracting it, placing it in new table(s) and creating relationships
When we decompose a relation into a number of smaller between those tables.
relations, it is crucial that the decomposition be lossless. We
Let’s look at an example. Imagine an online store that main-
must first present a criterion for determining whether a
tains customer information in a database. Their Customers
decomposition is lossy.
table might look something like this:
Let R be a relation schema, and let F be a set of functional
dependencies on R. Let R1 and R2 form a decomposition of R. CustNum FirstName LastName Address City State ZIP
1 John Doe 12 Main Street Sea Cliff NY 11579
This decomposition is a lossless-join decom-position of R if at
2 Alan Johnson 82 Evergreen Tr Sea Cliff NY 11579
least one of the following functional dependencies is in F+: 3 Beth Thompson 1912 NE 1st St Miami FL 33157
R1 ∩ R2 → Rl 4 Jacob Smith 142 Irish Way South Bend IN 46637
5 Sue Ryan 412 NE 1st St Miami FL 33157
R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or A brief look at this table reveals a small amount of redundant
R2, the decomposition of R is a lossless-join decomposition. data. We’re storing the “Sea Cliff, NY 11579” and “Miami, FL
We can use attribute closure to efficiently test for superkeys, as 33157” entries twice each. Now, that might not seem like too
we have seen earlier. much added storage in our simple example, but imagine the
We now demonstrate that our decomposition of Lending- wasted space if we had thousands of rows in our table.
schema is a lossless-join decomposition by showing a sequence Additionally, if the ZIP code for Sea Cliff were to change, we’d
of steps that generate the decomposition. We begin by need to make that change in many places throughout the
decomposing Lending-schema into two schemas: database.
Branch-schema = (branch-name, branch-city, assets) In a 2NF-compliant database structure, this redundant informa-
tion is extracted and stored in a separate table. Our new table
Loan-info-schema = (branch-name, customer-name, loan-
(let’s call it ZIPs) might look like this:
number, amount)
Since branch-name → branch-city assets, the augmentation rule ZIP City State
for functional depen-dencies implies that 11579 Sea Cliff NY
branch-name → branch-name branch-city assets 33157 Miami FL
Since Branch-schema n Loan-info-schema = {branch-name}, it 46637 South Bend IN
follows that our initial decomposition is a lossless-join If we want to be super-efficient, we can even fill this table in
decomposition. advance — the post office provides a directory of all valid ZIP
Next, we decompose Loan-info-schema into codes and their city/state relationships. Surely, you’ve encoun-
Loan-schema = (loan-number, branch-name, amount) tered a situation where this type of database was utilized.
Someone taking an order might have asked you for your ZIP
Borrower-schema = (customer-name, loan-number)
code first and then knew the city and state you were calling
This step results in a lossless-join decomposition, since loan- from. This type of arrangement reduces operator error and
number is a common at-tribute and loan-number → amount increases efficiency.
branch-name.
Now that we’ve removed the duplicative data from the
For the general case of decomposition of a relation into Customers table, we’ve satisfied the first rule of second normal
multiple parts at once, the test for lossless join decomposition form. We still need to use a foreign key to tie the two tables
is more complicated. See the bibliographical notes for references together. We’ll use the ZIP code (the primary key from the
on the topic. ZIPs table) to create that relationship. Here’s our new Custom-
While the test for binary decomposition is clearly a sufficient ers table:
condition for lossless join, it is a necessary condition only if all
constraints are functional dependencies. We shall see other types CustNum FirstName LastName Address ZIP
of constraints later (in particular, a type of constraint called 1 John Doe 12 Main Street 11579
multivalued dependencies), that can ensure that a decomposi- 2 Alan Johnson 82 Evergreen Tr 11579
3 Beth Thompson 1912 NE 1st St 33157
tion is lossless join even if no functional dependencies are
4 Jacob Smith 142 Irish Way 46637
present.
5 Sue Ryan 412 NE 1st St 33157
109
We’ve now minimized the amount of redundant information 4. Define 2NF with the help of an example?
DATABASE
stored within the database and our structure is in second

normal form.
Points to Ponder
• The bad design of a database suggests that we should
MANAGEMENT
decompose a relation schema that has many attributes into

several schemas with fewer attributes.
• when we decompose a relation into a number of smaller
relations, it is crucial that the decomposition be lossless
• 2NF attempts to reduce the amount of redundant data in a
table by extracting it, placing it in new table(s) and creating
relationships between those tables.
• We can use a given set of functional dependencies in
designing a relational database in which most of the
undesirable properties do not occur
• When we decompose a relation into a number of smaller
relations, it is crucial that the decomposition be lossless
• 2NF remove subsets of data that apply to multiple rows of
a table and place them in separate rows.
Review Terms
• Decomposition
• Properties of decomposition
• Lossless join decomposition
• 2NF
Students Activity
1. Define decomposition? Why it is required?
2. Define desirable properties of decomposition?
3. Define Lossless join decomposition?
110
DATABASE MANAGEMENT
111
Student Notes
DATABASE
LESSON 37:
NORMALISATION - III
MANAGEMENT
Lesson Objectives We claim that Loan-info-schema is not in a desirable form, since

• BCNF it suffers from the problem of repetition of information. We
observe that, if there are several customer names associated
• 3NF
with a loan, in a relation on Loan-info--schema, then we are
• Comparison of BCNF & 3NF forced to repeat the branch name and the amount once for each
• 4NF customer. We can eliminate this redundancy by redesigning our
database such that all schemas are in BCNF. One approach to
Boyce-Codd Normal Form
this problem is to take the existing non--BCNF design as a
Using functional dependencies, we can define several normal
starting point, and to decompose those schemas that are not in
forms that represent “good” database designs.
BCNF. Consider the decomposition of Loan-info-schema into
Definition two schemas:
One of the more desirable normal forms that we can obtain is
Loan-schema = (loan-number, branch-name, amount)
Boyce-Codd normal form (BCNF). A relation schema R is in
BCNF with respect to a set F of functional dependencies if, for Borrower-schema = (customer-name, loan-number)
all functional dependencies in F+ of the form α →β, where α ⊆ This decomposition is a lossless-join decomposition.
R and β ⊆ R, at least one of the following holds: To determine whether these schemas are in BCNF, we need to
• α → β is a trivial functional dependency (that is, β ⊆ α). determine what functional dependencies apply to them. In this
α is a superkey for schema R. example, it is easy to see that
A database design is in BCNF if each member of the set of loan-number → amount branch-name
relation schemas that constitutes the design is in BCNF. applies to the Loan-schema, and that only trivial functional
As an illustration, consider the following relation schemas dependencies apply to Borrower-schema. Although loan-
and their respective functional dependencies: number is not a superkey for Loan-info-schema, it is a candidate
• Customer-schema = (customer-name, customer-street, key for Loan-schema. Thus, both schemas of our decomposi-
customer-city) tion are in SCNF.
customer-name -t customer-street customer-city It is now possible to avoid redundancy in the case where there
are several cus-tomers associated with a loan. There is exactly
• Branch-schema = (branch_name, assets, branch-city)
one tuple for each loan in the relation on Loan-schema, and one
branch-name -t assets branch-city tuple for each customer of each loan in the relation on Bor-
• Loan-info-schema = (branch-name, customer-name, loan- rower-schema. Thus, we do not have to repeat the branch name
number, amount) and the amount once for each customer associated with a loan.
loan-number -t amount branch-name Often testing of a relation to see if it satisfies BCNF can be
We claim that Customer-schema is in BCNF. We note that a simplified:
candidate key for the, schema is cust9mer-name. The only • To check if a nontrivial dependency α → β causes a violation
nontrivial functional dependencies that hold on Customer- of SCNF, comput α+ (the attribute closure of α), and
schema have customer-name on the left side of the arrow. Since verify that it includes all attributes of R; that is, it is a
customer-name is a candidate key, functional dependencies with superkey of R.
customer-name on the left side do not violate the definition of • To check if a relation schema R is in SCNF, it suffices to
BCNF. Similarly, it can be shown easily that the relation schema check only the dependencies in the given set F for violation
Branch-schema is in BCNF. ‘of BCNF, rather than check all depen-dencies in F+.
The schema Loan-info-schema, however, is not in BCNF. First, We can show that if none of the dependencies in F causes a
note that loan-number is not a superkey for Loan-info-schema, violation of SCNF, then none of the dependencies in F+ will
since we could have a pair of tuples represent-ing a single loan cause a violation of SCNF either.
made to two people - for example,
Unfortunately, the latter procedure does not work when a
(Dowhtown, John Bell, L-44, 1000) relation is decomposed. That is it does not suffice to use F
(Downtown, Jane Bell, L-44, 1000) when we test a relation Ri in a decomposition of R, for
Because we did not list functional dependencies that rule out violation of BCNF. For example” consider relation schema R
the preceding case, loan--number is not a candidate key. (A, B, C, D, E), with functional dependencies F containing A →
However, the functional dependency loan-nurnber → amount Band BC → D. Suppose this were decomposed into R1(A, B)
is nontriviaL Therefore, Loan-info-schema does not satisfy the and R2(A, C, D, E). Now, neither of the dependencies in F
definition of BCNF. contains only attributes from (A, C, D, E) so we might be
112
misled into thinking R2 satisfies BCNF. In fact, there is a definition seems rather unintuitive, and it is not obvious why it
DATABASE
dependency AC → D in F+ (which can be inferred using the is useful. It represents, in some sense, a minimal relaxation of
pseudotransitivity rule from the two dependencies in F), which the BCNF conditions that helps ensure that every schema has a
shows that R2 is not in BCNF. Thus, we may need a depen- dependency-preserving decomposition into 3NF. Its purpose
dency that is in F+, but is not in F, to show that a decomposed will become more clear later, when we study decomposition into
relation is not in BCNF. 3NF.
MANAGEMENT
An alternative BCNF test is sometimes easier than computing Observe that any schema that satisfies BCNF also satisfies 3NF,
every dependency in F+. To check if a relation Ri in a decompo- since each of its functional dependencies would satisfy one of
sition of R is in BCNF, we apply this test: the first two alternatives. BCNF is there-fore a more restrictive
. For every subset a of attributes in R.i, check that a+ (the constraint than is 3NF.
attribute closure of a under F) either includes no attribute of Ri The definition of 3NF allows certain functional dependencies
- a, or includes all attributes of Ri. that are not allowed in BCNF. A dependency α → β that
If the condition is violated by some set of attributes a in Ri, satisfies only the third alternative of the 3NF definition is not
consider the following functional dependency, which can be allowed in BCNF, but is allowed in 3NF.
shown to be present in F+: Let us return to our Banker-schema example (Section 7.6). We
α → (α+ - α)∩ Ri have shown that this relation schema does not have a depen-
dency-preserving, lossless-join decomposition into BCNF. This
The above dependency shows that Ri violates BCNF, and is a
schema, however, turns out to be in 3NF. To see that it is, we
“witness” for the viola-tion. The BCNF decomposition
note that {customer-name, branch-name} is a candidate key for
algorithm, which we shall see in Section 7.6.2, makes use of the
Banker-schema, so the only attribute not contained in a
witness.
candidate key for Banker-schema is banker-name. The only
Third Normal Form nontrivial functional dependencies of the form
As we saw earlier, there are relational schemas where a BCNF α → banker-name
decomposition cannot be dependency preserving. For such
include {customer-name, branch-name} as part of Q. Since
schemas, we have two alternatives if we wish to check if an
{customer-name, branch-name} is a candidate key, these
update violates any functional dependencies:
dependencies do not violate the definition of 3NF.
• Pay the extra cost of computing joins to test for violations.
As an optimization when testing for 3NF, we can consider only
• Use an alternative decomposition, third normal form functional depen-dencies in the given set F, rather than in F+.
(3NF), which we present below, which makes testing of Also, we can decompose the dependen-cies in F so that their.
updates cheaper. Unlike BCNF, 3NF decompo-sitions may right-hand side consists of only single attributes, and use the
contain some redundancy in the decomposed schema. resultant set in place of F.
We shall see that it is always possible to find a lossless-join, Given a dependency α → β, we can use the same attribute-
dependency-preserving decomposition that is in 3NF. Which of closure-based tech-nique that we used for BCNF to check if α is
the two alternatives to choose is a design decision to be made a superkey. If α is not a superkey, we have to verify whether each
by the database designer on the basis of the application require- attribute in β is contained in a candidate key of R; this test is
ments. rather more expensive, since it involves finding candidate keys.
Definition In fact, test-ing for 3NF has been .shown to be NP-hard; thus,
BCNF requires that all nontrivial dependencies be of the form it is very unlikely that there is a polynomial time complexity
α → β, where α is a super key. 3NF relaxes this constraint algorithm for the task.
slightly by allowing nontrivial functional de-pendencies whose Comparison of BCNF and 3NF
left side is not a superkey. Of the two normal forms for relational-database schemas, 3NF
A relation schema R is in third normal form (3NF) with respect and BCNF, there are advantages to 3NF in that we know that it
to a set F of func-tional dependencies if, for all functional is always possible to obtain a 3NF design without sacrificing a
dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ lossless join or dependency preservation. Nevertheless, there are
R, at least one of the following holds: disadvantages to 3NF: If we do not eliminate all transitive
• α → β is a trivial functional dependency. relations schema depen-dencies, we may have to use null values
• α is a superkey for R. to represent some of the possible meaningful relationships
among data items, and there is the problem of repetition of
• Each attribute A in β - α is contained in a candidate key for information.
R.
As an illustration of the null value problem, consider again the
Note that the third condition above does not say that a single Banker-schema and its associated functional dependencies. Since
candidate key should contain all the attributes in β - α; each banker-name → branch-name, we may want to represent
attribute A in β - α may be contained in a different candidate relationships between values for banker-name and values for
key. branch--name in our database. If we are to do so, however,
The first two alternatives are the same as the two alternatives in either there must be a correspond-ing value for customer-name,
the definition of BCNF. The third alternative of the 3NF or we must use a null value for the attribute customer--name‘1.
113
Customer-name bank-name branch-name Consider again our banking example. Assume that, in an
DATABASE
alternative design for the bank database schema, we have the

Jones Johnson Perryridge schema
Smith Johnson Perryridge BC-schema = (loan-number, customer-name, customer-street,
Hayes Johnson Perryridge
customer-city)
Jackson Johnson Perryridge
MANAGEMENT
Curry Johnson Perryridge The astute reader will recognize this schema as a non-BCNF
Turner Johnson Perryridge schema because of the functional dependency
customer-name → customer-street customer-city
An instance of Banker-schema.
that we asserted earlier, and because customer-name is not a key
As an illustration of the repetition of information problem, for BC-schema. How-ever, assume that our bank is attracting
consider the instance of Banker-schema. Notice that the wealthy customers who have several ad-dresses (say, a winter
information indicating that Johnson is working at the home and a summer home). Then, we no longer wish to en-
Perryridge branch is repeated. force the functional dependency customer-name →
Recall that our goals of database design with functional customer-street customer-city. If we remove this functional
dependencies are: dependency, we find BC-schema to be in BCNF with respect to
1. BCNF our modified set of functional dependencies. Yet, even though
BC-schema is now in BCNF, we still have the problem of
2. Lossless join
repetition of information that we had earlier.
3. Dependency preservation
To deal with this problem, we must define a new form of
Since it is not always possible to satisfy all three, we may be constraint, called a mul-tivalued dependency. As we did for
forced to choose between BCNF and dependency preservation functional dependencies, we shall use multivalued dependencies
with 3NF. to define a normal form for relation schemas. This normal
It is worth noting that SQL does not provide a way of form, called fourth normal form (4NF), is more restrictive than
specifying functional depen-dencies, except for the special case BCNF. We shall see that every 4NF schema is also in BCNF, but
of declaring superkeys by using the primary key or unique there are BCNF schemas that are not in 4NF.
constraints. It is possible, although a little complicated, to write
Multivalued Dependencies
assertions that enforce a functional dependency, unfortunately
Functional dependencies rule out certain tuples from being in a
testing the assertions would be very expensive in most database
relation. If A → B, then we cannot have two tuples with the
systems. Thus even if we had a dependency-preserving
same A value but different B values. Mul-tivalued dependencies,
decomposition, if we use standard SQL we would not be able
on the other hand, do not rule out the existence of certain
to efficiently test a functional dependency whose left-hand side
tuples. Instead, they require that other tuples of a certain form
is not a key.
be present in the rela-tion. For this reason, functional depen-
Although testing functional dependencies may involve a join if dencies sometimes are referred to as equality- generating
the decomposition is not dependency preserving, we can reduce dependencies, and multivalued dependencies are referred to as
the cost by using materialized views, which many database tuple- generating dependencies.
systems support. Given a BCNF decomposition that is not de-
Let R be a relation schema and let α ⊆ R and β ⊆ R. The
pendency preserving, we consider each dependency in a
multivalued dependency
minimum cover Fc that is not preserved in the decomposition.
For each such dependency α → β, we define a materialized view α ⊆→→β
that computes a join of all relations in the decomposition, and holds on R if, in any legal relation r(R), for all pairs of tuples t1
projects the result on αβ. The functional dependency can be and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in
easily tested on the ma-terialized view, by means of a constraint r such that
unique (α). On the negative side, there is a space and time tr[α] = t2[α] = t3[α] = t4[α]
overhead due to the materialized view, but on the positive side,
t3 [β] = td{β]
the application programmer need not worry about writing code
to keep redundant data consistent on updates; it is the job of t3[R - β] = t2[R - β]
the database system to maintain the material-ized view, that is, t4 [β] .= t2 [β]
keep up up to date when the database is updated t4[R - β] = t1(R - β]
Thus, in case we are not able to get a dependency-preserving
BCNF decomposition, it is generally preferable to opt for α β R- α - β
BCNF, and use techniques such as materialized views to reduce T1 A1 …..ai ai +1 ….aj Aj +1 …an
T2 A1 …..ai bi + 1 … bj Bj +1 ….bn
the cost of checking functional dependencies.
T3 A1 …..ai Ai+1 …. aj Bj +1 ….bn
Fourth Normal Form T4 A1 …..ai Bi +1 ….bj Aj +1 … an
Some relation schemas, even though they are in BCNF, do not
seem to be sufficiently normalized, in the sense that they still Tabular representation of α →→ β
suffer from the problem of repetition of infor-mation.
114
This definition. is less complicated than it appears to be. dependencies that occur in practice appear to be quite simple.
DATABASE
Figure’7.16 gives a tabular picture of t1, t2, t3 and t4. Intuitively, For complex dependencies, it is better to reason about sets of
the multivalued dependency α →→ β says that the relationship dependencies by using a system of inference rules.
between α and α is independent of the relationship between α From the definition of multivalued dependency, we can derive
and R - β. If the multivalued dependency α →→ β is satisfied the following rule:
by all relations on schema R, then α →→ β is a trivial
MANAGEMENT
α → β, then α →→ β
multivalued dependency on schema R. Thus α →→ β is trivial
if β ⊆ α or β U α = R. In other words, every functional dependency is also a
multivalued dependency.
To illustrate the difference between functional and multivalued
dependencies, we consider the BC-schema again, and the Definition of Fourth Normal Form
relation bc (BC-schema). We must repeat the loan number once Consider again our BC-schema example in which the
for each address a customer has, and we must repeat the address multivalued dependency customer-name ®® customer-street
for each loan a customer has. This repetition is unnecessary, customer-city holds, but no nontrivial functional de-pendencies
since the relationship between a customer and his address is hold. We saw in the opening paragraphs of Section 7.8 that,
independent of the relationship between that customer and a although BC-schema is in BCNF, the design is not ideal, since
loan. If a customer (say, Smith) has a loan (say, loan number L- we must repeat a customer’s address information lor each loan.
23), we want that loan to We shall see that we can use the given multivalued de-pendency
be associated with all Smith’s addresses. Thus, the relation of to improve the database design, by decomposing BC-schema
Figure 7.18 is illegal. To make this relation legal, we need to add into a fourth normal form decomposition.
the tuples (L-23, Smith, Main, Manchester) and (L-27, Smith, A relation schema R is in fourth normal form (4NF) with
North, Rye) to the bc relation respect to a set 0 of functional and multivalued dependencies if,
Comparing the preceding example with our definition of for all multivalued dependencies in D+ of the form α →→ β,
multivalued dependency, we see that we want the multivalued where α ⊆ R and β ⊆ R, at least one of the following holds
dependency • α →→ β is a trivial multivalued dependency.
customer-name →→ customer-street customer-city • α is a superkey for schema R.
to hold. (The multivalued dependency customer-name →→ A database design is in 4NF if each member of the set of
loan-number will do as well. We shall soon see that they are relation schemas that consti-tutes the design is in 4NE
equivalent.) Note that the definition of 4NF differs from the definition of
As with functional dependencies, we shall use multivalued BCNF in only the use of multivalued dependencies instead of
dependencies in two ways: functional dependencies. Every 4NF schema is in
1. To test relations to determine whether they are legal under a BCNF. To see this fact, we note that, if a schema R is not in
given set of func-tional and multivalued dependencies BCNF, then there is
2. To specify constraints on the set of legal relations; we shall result := {R}; done := false;
thus concern our-selves with only those relations that satisfy compute D+; Given schema Ri, let Di denote the restriction of
a given set of functional and mul-tivalued dependencies D+ to Ri while (not done) do
Customer-street if (there is a schema Ri in result that is not in 4NF W.r.t.- Di)

Loan-number Customer-name Customer-city
then begin
L-23 Smith Rye
North let α →→ β a nontrivial multivalued dependency that holds on
L-23 Smith Main Manchester
Ri such that α → Ri is not in Di, and α ∩ β = φ;
L-93 Curry Lake Horseneck
result := (result - Ri) U (Ri - β U (α , β);
Relation bc: An example of redundancy in a BCNF relation. end
Loan-number Customer-name Customer-street customercity else done := true;
L-23 Smith North Rye
L-27 Smith Main Manchester 4NF decomposition algorithm.
An illegal bc relation. a nontrivial functional dependency α → β holding on R, where
a is not a superkey. Since α → β implies α →→ β, R cannot be
Note that, if a relation r fails to satisfy a given multivalued
in 4NF.
dependency, we can con-struct a relation r’ that does satisfy the
multivalued dependency by adding tuples to r. Let R be a relation schema, and let R1, R2,. . . , Rn be a decom-
position of R. To check if each relation schema Ri in the
Let D denote a set of functional and multivalued dependencies.
decomposition is in 4NF, we need to find what multivalued
The closure D+ of D is the set of all functional and
dependencies hold on each Ri. Recall that, for a set F of
multivalued dependencies logically implied by o. As we did for
functional dependencies, the restriction Fi of F to Ri is all
functional dependencies, we can compute D+ from 0, using the
functional dependencies in F+ that include only attributes of Ri.
formal definitions of functional dependencies and multivalued
Now consider a set D of both functional and multivalued
dependencies. We can man-age with such reasoning for very
simple multivalued dependencies. Luckily, multi-valued
115
dependencies. The restriction of D to Ri is the set Di consisting 4. Define Multivalued dependencies?
DATABASE
of
1. All functional dependencies in D+ that include only
attributes of Ri
2. All multivalued dependencies of the form
MANAGEMENT
α →→ β ∩ Ri
where α ⊆ Ri and α →→ β is in D+.
Points to Ponder
• A relation schema R is in BCNF with respect to a set F of 5. Define 4NF?
functional dependencies if, for all functional dependencies in
F+ of the form α →β
• third normal form (3NF), which we present below, which
makes testing of updates cheaper
• Mul-tivalued dependencies do not rule out the existence of
certain tuples Instead, they require that other tuples of a
certain form be present in the rela-tion.
• Every 4NF schema is also in BCNF, but there are BCNF
schemas that are not in 4NF.
Review Terms
• BCNF
• 3NF
• Comparison of BCNF & 3NF
• 4NF
Students Activity
1. Define BCNF?
2 Define 3NF?
3. Differentiate between BCNF & 3NF?
116
DATABASE MANAGEMENT
117
Student Notes
DATABASE
LESSON 38:
FILE ORGANIZATION METHOD - I
MANAGEMENT
Lesson objectives • Disk storage is called direct access storage as it is

• Physical Storage Media possible to read data on the disk in any order
(unlike sequential access).
• Performance Measures of Disks
• Disk storage usually survives power failures and
• Optimization of Disk-Block Access
system crashes.
• Fixed-Length Records
• Optical storage: CD-ROM (compact-disk read-only
• Variable-Length Records memory), WORM (write-once read-many) disk (for
• Sequential File Organization archival storage of data), and Juke box (containing a
• Clustering File Organization few drives and numerous disks loaded on demand).
• Data Dictionary Storage • Tape Storage: used primarily for backup and archival
data.
Overview of Physical Storage Media
• Cheaper, but much slower access, since tape must
1. Several types of data storage exist in most computer be read sequentially from the beginning.
systems. They vary in speed of access, cost per unit of data,
and reliability. • Used as protection from disk failures!
• Cache: most costly and fastest form of storage. 2. The storage device hierarchy is presented , where the higher
Usually very small, and managed by the operating levels are expensive (cost per bit), fast (access time), but the
system. capacity is smaller.
• Main Memory (MM): the storage area for data

available to be operated on.
• General-purpose machine instructions operate on
main memory.
• Contents of main memory are usually lost in a
power failure or “crash”.
• Usually too small (even with megabytes) and too
expensive to store the entire database.
• Flash memory: EEPROM (electrically erasable
programmable read-only memory).
• Data in flash memory survive from power failure.
• Reading data from flash memory takes about 10
nano-secs (roughly as fast as from main memory),
and writing data into flash memory is more Storage-device hierarchy
complicated: write-once takes about 4-10 microsecs.
3. Another classification: Primary, secondary, and tertiary
• To overwrite what has been written, one has to first storage.
erase the entire bank of the memory. It may
1. Primary storage: the fastest storage media, such as cash
support only a limited number of erase cycles (104
and main memory.
to 106).
2. Secondary (or on-line) storage: the next level of the
• It has found its popularity as a replacement for
hierarchy, e.g., magnetic disks.
disks for storing small volumes of data (5-10
megabytes). 3. Tertiary (or off-line) storage: magnetic tapes and optical
disk juke boxes.
• Magnetic-disk storage: primary medium for long-
term storage. 4. Volatility of storage. Volatile storage loses its contents when
the power is removed. Without power backup, data in the
• Typically the entire database is stored on disk.
volatile storage (the part of the hierarchy from main
• Data must be moved from disk to main memory memory up) must be written to nonvolatile storage for
in order for the data to be operated on. safekeeping.
• After operations are performed, data must be Performance Measures of Disks
copied back to disk if any changes were made. The main measures of the qualities of a disk are capacity, access
time, data transfer rate, and reliability,
118
1. Access time: the time from when a read or write request is The restore operation writes back the blocks of each
DATABASE
issued to when data transfer begins. To access data on a file continuously (or nearly so). Some systems, such as
given sector of a disk, the arm first must move so that it is MS-DOS, have utilities that scan the disk and then
positioned over the correct track, and then must wait for the move blocks to decrease the fragmentation.
sector to appear under it as the disk rotates. The time for • Nonvolatile write buffers. Use nonvolatile RAM (such
repositioning the arm is called seek time, and it increases
MANAGEMENT
as battery-back-up RAM) to speed up disk writes
with the distance the arm must move. Typical seek time drastically (first write to nonvolatile RAM buffer and
range from 2 to 30 milliseconds. inform OS that writes completed).
Average seek time is the average of the seek time, measured • Log disk. Another approach to reducing write latency
over a sequence of (uniformly distributed) random is to use a log disk, a disk devoted to writing a
requests, and it is about one third of the worst-case seek sequential log. All access to the log disk is sequential,
time. essentially eliminating seek time, and several
Once the seek has occurred, the time spent waiting for the consecutive blocks can be written at once, making
sector to be accesses to appear under the head is called writes to log disk several times faster than random
rotational latency time. Average rotational latency time is writes.
about half of the time for a full rotation of the disk.
File Organization
(Typical rotational speeds of disks ranges from 60 to 120
rotations per second). 1. A file is organized logically as a sequence of records.
The access time is then the sum of the seek time and the 2. Records are mapped onto disk blocks.
latency and ranges from 10 to 40 milli-sec. 3. Files are provided as a basic construct in operating systems,
2. Data transfer rate, the rate at which data can be retrieved so we assume the existence of an underlying file system.
from or stored to the disk. Current disk systems support 4. Blocks are of a fixed size determined by the operating
transfer rate from 1 to 5 megabytes per second. system.
3. Reliability, measured by the mean time to failure. The typical 5. Record sizes vary.
mean time to failure of disks today ranges from 30,000 to 6. In relational database, tuples of distinct relations may be of
800,000 hours (about 3.4 to 91 years). different sizes.
Optimization of Disk-Block Access 7. One approach to mapping database to files is to store
1. Data is transferred between disk and main memory in units records of one length in a given file.
called blocks. 8. An alternative is to structure files to accommodate variable-
2. A block is a contiguous sequence of bytes from a single length records. (Fixed-length is easier to implement.)
track of one platter. Fixed-Length Records
3. Block sizes range from 512 bytes to several thousand. 1. Consider a file of deposit records of the form:
4. The lower levels of file system manager covert block aaaaaaaaaaaa¯type deposit = record
addresses into the hardware-level cylinder, surface, and bname : char(22);
sector number.
account# : char(10);
5. Access to data on disk is several orders of magnitude slower
than is access to data in main memory. Optimization balance : real;
techniques besides buffering of blocks in main memory. end
• Scheduling: If several blocks from a cylinder need to • If we assume that each character occupies one byte, an
be transferred, we may save time by requesting them in integer occupies 4 bytes, and a real 8 bytes, our deposit
the order in which they pass under the heads. A record is 40 bytes long.
commonly used disk-arm scheduling algorithm is the • The simplest approach is to use the first 40 bytes for
elevator algorithm. the first record, the next 40 bytes for the second, and
• File organization. Organize blocks on disk in a way so on.
that corresponds closely to the manner that we expect • However, there are two problems with this approach.
data to be accessed. For example, store related • It is difficult to delete a record from this structure.
information on the same track, or physically close
• Space occupied must somehow be deleted, or we need
tracks, or adjacent cylinders in order to minimize seek
to mark deleted records so that they can be ignored.
time. IBM mainframe OS’s provide programmers fine
control on placement of files but increase • Unless block size is a multiple of 40, some records will
programmer’s burden. cross block boundaries.
UNIX or PC OSs hide disk organizations from users. • It would then require two block accesses to read or
Over time, a sequential file may become fragmented. write such a record.
To reduce fragmentation, the system can make a back- 2. When a record is deleted, we could move all successive
up copy of the data on disk and restore the entire disk. records up one, which may require moving a lot of records.
119
• We could instead move the last record into the “hole”
DATABASE
Sequential File Organization
created by the deleted record 1. A sequential file is designed for efficient processing of
• This changes the order the records are in. records in sorted order on some search key.
• It turns out to be undesirable to move records to • Records are chained together by pointers to permit fast
occupy freed space, as moving requires block accesses. retrieval in search key order.
MANAGEMENT
• Also, insertions tend to be more frequent than • Pointer points to next record in order.
deletions. • Records are stored physically in search key order (or as
• It is acceptable to leave the space open and wait for a close to this as possible).
subsequent insertion. • This minimizes number of block accesses.
• This leads to a need for additional structure in our file 2. It is difficult to maintain physical sequential order as records
design. are inserted and deleted.
3. So one solution is: • Deletion can be managed with the pointer chains.
• At the beginning of a file, allocate some bytes as a file • Insertion poses problems if no space where new
header. record should go.
• This header for now need only be used to store the • If space, use it, else put new record in an overflow
address of the first record whose contents are deleted. block.
• This first record can then store the address of the • Adjust pointers accordingly.
second available record, To insert a new record, we use
• Problem: we now have some records out of physical
the record pointed to by the header, and change the
sequential order.
header pointer to the next available record.
• If very few records in overflow blocks, this will work
• If no deleted records exist we add our new record to
well.
the end of the file.
• If order is lost, reorganize the file.
4. Note: Use of pointers requires careful programming. If a
record pointed to is moved or deleted, and that pointer is • Reorganizations are expensive and done when system
not corrected, the pointer becomes a dangling pointer. load is low.
Records pointed to are called pinned. 3. If insertions rarely occur, we could keep the file in physically
5. Fixed-length file insertions and deletions are relatively sorted order and reorganize when insertion occurs. In this
simple because “one size fits all”. For variable length, this is case, the pointer fields are no longer required.
not the case. Clustering File Organization
Variable-Length Records 1. One relation per file, with fixed-length record, is good for
Variable-length records arise in a database in several ways: small databases, which also reduces the code size.
• Storage of multiple items in a file. 2. Many large-scale DB systems do not rely directly on the
• Record types allowing variable field size underlying operating system for file management. One large
OS file is allocated to DB system and all relations are stored
• Record types allowing repeating fields
in one file.
Organization of Records in Files 3. To efficiently execute queries involving
There are several ways of organizing records in files. , one may store the depositor tuple
• heap file organization. Any record can be placed anywhere for each cname near the customer tuple for the
in the file where there is space for the record. There is no corresponding cname
ordering of records. 4. This structure mixes together tuples from two relations,
• sequential file organization. Records are stored in but allows for efficient processing of the join.
sequential order, based on the value of the search key of 5. If the customer has many accounts which cannot fit in one
each record. block, the remaining records appear on nearby blocks. This
• hashing file organization. A hash function is computed file structure, called clustering, allows us to read many of the
on some attribute of each record. The result of the function required records using one block read.
specifies in which block of the file the record should be 6. Our use of clustering enhances the processing of a
placed — to be discussed in chapter 11 since it is closely particular join but may result in slow processing of other
related to the indexing structure. types of queries, such as selection on customer.
• clustering file organization. Records of several different For example, the query
relations can be stored in the same file. Related records of
aaaaaaaaaaaa¯select *
the different relations are stored on the same block so that
one I/O operation fetches related records from all the from customer
relations. now requires more block accesses as our customer relation is
now interspersed with the deposit relation.
120
7. Thus it is a trade-off, depending on the types of query that
DATABASE
Students Activity
the database designer believes to be most frequent. Careful 1. Define various types of storage media?
use of clustering may produce significant performance gain.
Data Dictionary Storage
1. The database also needs to store information about the
MANAGEMENT
relations, known as the data dictionary. This includes:
• Names of relations.
• Names of attributes of relations.
• Domains and lengths of attributes.
• Names and definitions of views.
2. Define qualities of a disk?
• Integrity constraints (e.g., key constraints).
plus data on the system users:
• Names of authorized users.
• Accounting information about users.
plus (possibly) statistical and descriptive data:
• Number of tuples in each relation.
• Method of storage used for each relation (e.g., clustered
or non-clustered).
2. When we look at indices we’ll also see a need to store 3. Define sequential file storage?
information about each index on each relation:
• Name of the index.
• Name of the relation being indexed.
• Attributes the index is on.
• Type of index.
3. This information is, in itself, a miniature database. We can
use the database to store data about itself, simplifying the
overall structure of the system, and allowing the full power
4. Define clustering file storage?
of the database to be used to permit fast access to system
data.
Points to Ponder
• Several types of data storage exist in most computer
systems as main memory,cache, Flash memory, Magnetic-
disk storage,optical storage, tape storage.
• The main measures of the qualities of a disk are capacity,
access time, data transfer rate, and reliability
• A sequential file is designed for efficient processing of 5. Define data dictionary storage?
records in sorted order on some search key
• Clustering structure mixes together tuples from two
relations, but allows for efficient processing of the join.
• The database also needs to store information about the
relations, known as the data dictionary.
Review Terms
• Storage device
• Sequential file storage
• Clustering file storage
• Data dictionary storage
• Search key
121
Student Notes
122
DATABASE MANAGEMENT
DATABASE
LESSON 39:
FILE ORGANISATION METHOD - II
MANAGEMENT
Lesson objectives 5. Space Overhead — additional space occupied by an
• Indexing & Hashing index structure.
• Ordered Indices 6. We may have more than one index or hash function for a
file. (The library may have card catalogues by author, subject
• Dense and Sparse Indices
or title.)
• Multi-Level Indices
7. The attribute or set of attributes used to look up records in
• Index Update a file is called the search key (not to be confused with
• Secondary Indices primary key, etc.).
• Static Hashing Ordered Indices
• Hash functions 1. In order to allow fast random access, an index structure may
• Bucket overflow be used.
• Dynamic hashing 2. A file may have several indices on different search keys.
Indexing and Hashing 3. If the file containing the records is sequentially ordered, the
index whose search key specifies the sequential order of the
1. Many queries reference only a small proportion of records in
file is the primary index, or clustering index. Note: The
a file. For example, finding all records at Perryridge branch
search key of a primary index is usually the primary key, but
only returns records where bname = “Perryridge”.
it is not necessarily so.
2. We should be able to locate these records directly, rather
4. Indices whose search key specifies an order different from
than having to read every record and check its branch-name.
the sequential order of the file are called the secondary
We then need extra file structuring.
indices, or nonclustering indices.
Basic Concepts
1. An index for a file works like a catalogue in a library. Cards
in alphabetic order tell us where to find books by a
particular author.
2. In real-world databases, indices like this might be too large
to be efficient. We’ll look at more sophisticated indexing
techniques.
3. There are two kinds of indices.
• Ordered indices: indices are based on a sorted ordering
of the values.
• Hash indices: indices are based on the values being
Dense and Sparse Indices
distributed uniformly across a range of buckets. The
buckets to which a value is assigned is determined by a 1. There are two types of ordered indices:
function, called a hash function. Dense Index:
4. We will consider several indexing techniques. No one • An index record appears for every search key value in
technique is the best. Each technique is best suited for a file.
particular database application. • This record contains search key value and a pointer to
5. Methods will be evaluated on: the actual record.
1. Access Types — types of access that are supported Sparse Index:
efficiently, e.g., value-based search or range search. • Index records are created only for some of the records.
2. Access Time — time to find a particular data item or • To locate a record, we find the index record with the
set of items. largest search key value less than or equal to the search
3. Insertion Time — time taken to insert a new data key value we are looking for.
item (includes time to find the right place to insert). • We start at that record pointed to by the index record,
4. Deletion Time — time to delete an item (includes and proceed along the pointers in the file (that is,
time taken to find item, as well as to update the index sequentially) until we find the desired record.
structure).
123
DATABASE
MANAGEMENT
Dense index.
2. Notice how we would find records for Perryridge branch
using both methods. (Do it!)
4. Two-level sparse index.

5. Use binary search on outer index. Scan index block found
until correct index record found. Use index record as before
- scan block pointed to for desired record.
6. For very large files, additional levels of indexing may be
required.
7. Indices must be updated at all levels when insertions or
deletions require it.
Sparse index. 8. Frequently, each level of index corresponds to a unit of
3. Dense indices are faster in general, but sparse indices require physical storage (e.g. indices at the level of track, cylinder and
less space and impose less maintenance for insertions and disk).
deletions. (Why?) Index Update
4. A good compromise: to have a sparse index with one entry Regardless of what form of index is used, every index must be
per block. updated whenever a record is either inserted into or deleted
Why is this good? from the file.
• Biggest cost is in bringing a block into main memory. 1. Deletion
• We are guaranteed to have the correct block with this • Find (look up) the record
method, unless record is on an overflow block (actually • If the last record with a particular search key value,
could be several blocks). delete that search key value from index.
• Index size still small. • For dense indices, this is like deleting a record in a file.
Multi-Level Indices • For sparse indices, delete a key value by replacing key
1. Even with a sparse index, index size may still grow too value’s entry in index by next search key value. If that
large. For 100,000 records, 10 per block, at one index record value already has an index entry, delete the entry.
per block, that’s 10,000 index records! Even if we can fit 100 2. Insertion
index records per block, this is 100 blocks. • Find place to insert.
2. If index is too large to be kept in main memory, a search • Dense index: insert search key value if not present.
results in several disk reads.
• Sparse index: no change unless new block is created. (In
• If there are no overflow blocks in the index, we can use this case, the first search key value appearing in the new
binary search. block is inserted into the index).
• This will read as many as 1+log2(b) blocks (as many as
Secondary Indices
7 for our 100 blocks).
1. If the search key of a secondary index is not a candidate key,
• If index has overflow blocks, then sequential search
it is not enough to point to just the first record with each
typically used, reading all b index blocks.
search-key value because the remaining records with the
3. Solution: Construct a sparse index on the index same search-key value could be anywhere in the file.
Therefore, a secondary index must contain pointers to all
the records.
124
DATABASE
Hash Functions
1. A good hash function gives an average-case lookup that is a
small constant, independent of the number of search keys.
2. We hope records are distributed uniformly among the
buckets.
MANAGEMENT
3. The worst hash function maps all keys to the same bucket.
4. The best hash function maps all keys to distinct addresses.
5. Ideally, distribution of keys to addresses is uniform and
random.
6. Suppose we have 26 buckets, and map names beginning
Sparse secondary index on cname. with ith letter of the alphabet to the ith bucket.
2. We can use an extra-level of indirection to implement • Problem: this does not give uniform distribution.
secondary indices on search keys that are not candidate keys. • Many more names will be mapped to “A” than to “X”.
A pointer does not point directly to the file but to a bucket
• Typical hash functions perform some operation on the
that contains pointers to the file.
internal binary machine representations of characters in
• To perform a lookup on Peterson, we must read all a key.
three records pointed to by entries in bucket 2.
• For example, compute the sum, modulo # of buckets,
• Only one entry points to a Peterson record, but three of the binary representations of characters of the
records need to be read. search key.
• As file is not ordered physically by cname, this may take • using this method for 10 buckets (assuming the ith
3 block accesses. character in the alphabet is represented by integer i).
3. Secondary indices must be dense, with an index entry for Handling of Bucket Overflows
every search-key value, and a pointer to every record in the
1. Open hashing occurs where records are stored in different
file.
buckets. Compute the hash function and search the
4. Secondary indices improve the performance of queries on corresponding bucket to find a record.
non-primary keys.
2. Closed hashing occurs where all records are stored in one
5. They also impose serious overhead on database bucket. Hash function computes addresses within that
hh((KKi ) ) = h ( K )
i j modification: whenever a file is updated, every index must bucket. (Deletions are difficult.) Not used much in database
be updated. applications.
Designer must decide whether to use secondary indices or not. 3. Drawback to our approach: Hash function must be
Static Hashing chosen at implementation time.
Index schemes force us to traverse an index structure. Hashing • Number of buckets is fixed, but the database may
avoids this. grow.
Hash File Organization • If number is too large, we waste space.
1. Hashing involves computing the address of a data item by • If number is too small, we get too many “collisions”,
computing a function on the search key value. resulting in records of many search key values being in
2. A hash function h is a function from the set of all search the same bucket.
key values K to the set of all bucket addresses B. • Choosing the number to be twice the number of
• We choose a number of buckets to correspond to the search key values in the file gives a good space/
number of search key values we will have stored in the performance tradeoff.
database. Hash Indices
• To perform a lookup on a search key value K i , we 1. A hash index organizes the search keys with their associated
compute , and search the bucket with that address. pointers into a hash file structure.
• If two search keys i and j map to the same address, 2. We apply a hash function on a search key to identify a
because , then the bucket at the address bucket, and store the key and its associated pointers in the
obtained will contain records with both search key bucket (or in overflow buckets).
values.
3. Strictly speaking, hash indices are only secondary index
• In this case we will have to check the search key value of structures, since if a file itself is organized using hashing,
every record in the bucket to get the ones we want. there is no need for a separate hash index structure on it.
• Insertion and deletion are simple.
125
• All such entries will have a common hash prefix, but
DATABASE
Dynamic Hashing
1. As the database grows over time, we have three options: the length of this prefix may be less than i.
• Choose hash function based on current file size. Get • So we give each bucket an integer giving the length of
performance degradation as file grows. the common hash prefix.
• Choose hash function based on anticipated file size. • Number of bucket entries pointing to bucket j is then
MANAGEMENT
Space is wasted initially. 2 (i–ij).

• Periodically re-organize hash structure as file grows. 4. To find the bucket containing search key value Ki:
Requires selecting new hash function, recomputing all • Compute h(Kt ).
addresses and generating new bucket assignments. • Take the first i high order bits of h(Kt).
Costly, and shuts down database.
• Look at the corresponding table entry for this i-bit
2. Some hashing techniques allow the hash function to be string.
modified dynamically to accommodate the growth or
• Follow the bucket pointer in the table entry.
shrinking of the database. These are called dynamic hash
functions. 5. We now look at insertions in an extendable hashing scheme.
• Extendable hashing is one form of dynamic hashing. • Follow the same procedure for lookup, ending up in
some bucket j.
• Extendable hashing splits and coalesces buckets as
database size changes. • If there is room in the bucket, insert information and
insert record in the file.
• This imposes some performance overhead, but space
efficiency is maintained. • If the bucket is full, we must split the bucket, and
redistribute the records.
• As reorganization is on one bucket at a time, overhead
is acceptably low. • If bucket is split we may need to increase the number
of bits we use in the hash.
3. How does it work?
6. Two cases exist:
1. If i = ij, then only one entry in the bucket address table
points to bucket j.
• Then we need to increase the size of the bucket
address table so that we can include pointers to the
two buckets that result from splitting bucket j.
• We increment i by one, thus considering more of
the hash, and doubling the size of the bucket
address table.
• Each entry is replaced by two entries, each
containing original value.
• Now two entries in bucket address table point to
bucket j.
General extendable hash structure. • We allocate a new bucket z, and set the second
• We choose a hash function that is uniform and pointer to point to z.
random that generates values over a relatively large • Set ij and iz to i.
range.
• Rehash all records in bucket j which are put in either
• Range is b-bit binary integers (typically b=32). j or z.
• is over 4 billion, so we don’t generate that many • Now insert new record.
buckets!
• It is remotely possible, but unlikely, that the new
• Instead we create buckets on demand, and do not use hash will still put all of the records in one bucket.
all b bits of the hash initially.
• If so, split again and increment i again.
• At any point we use i bits where .
2. If i>ij , then more than one entry in the bucket address
• The i bits are used as an offset into a table of bucket table points to bucket j.
addresses.
• Then we can split bucket j without increasing the
• Value of i grows and shrinks with the database. size of the bucket address table (why?).
• Note that the i appearing over the bucket address table • Note that all entries that point to bucket j
tells how many bits are required to determine the correspond to hash prefixes that have the same
correct bucket. value on the leftmost ij bits.
• It may be the case that several entries point to the same
bucket.
126
• We allocate a new bucket z, and set ij and iz to the
DATABASE
Points to Ponder
original ij value plus 1. • An index for a file works like a catalogue in a library
• Now adjust entries in the bucket address table that • In order to allow fast random access, an index structure may
previously pointed to bucket j. be used.
• Leave the first half pointing to bucket j, and make • Indices whose search key specifies an order different from
MANAGEMENT
the rest point to bucket z. the sequential order of the file are called the secondary
• Rehash each record in bucket j as before. indices
• Reattempt new insert. • Dense Index appears for every search key value in file. This
7. Note that in both cases we only need to rehash records in record contains search key value and a pointer to the actual
bucket j. record.
8. Deletion of records is similar. Buckets may have to be • Sparse Index: are created only for some of the records.
coalesced, and bucket address table may have to be halved. • Regardless of what form of index is used, every index must
9. Insertion is illustrated for the example deposit file be updated whenever a record is either inserted into or
deleted from the file.
• 32-bit hash values on bname are shown An initial
empty hash structure is shown • Hashing involves computing the address of a data item by
computing a function on the search key value.
• We insert records one by one.
• We (unrealistically) assume that a bucket can only hold Review Terms
2 records, in order to illustrate both situations • Index
described. • Ordered index
• As we insert the Perryridge and Round Hill records, • Dense Index
this first bucket becomes full.
• Sparse index
• When we insert the next record (Downtown), we must
• Hashing
split the bucket.
• Hash function
• Since i = i0, we need to increase the number of bits we
use from the hash. • Static hashing
• We now use 1 bit, allowing us 21 = 2 buckets. • Dynamic hashing
• This makes us double the size of the bucket address Students Activity
table to two entries. 1. Define index and hashing?
• We split the bucket, placing the records whose search
key hash begins with 1 in the new bucket, and those
with a 0 in the old bucket
• Next we attempt to insert the Redwood record, and
find it hashes to 1.
• That bucket is full, and i = i1.
• So we must split that bucket, increasing the number of
bits we must use to 2. 2. Define ordered indices?
• This necessitates doubling the bucket address table
again to four entries (F
• We rehash the entries in the old bucket.
• We continue on for the deposit records , obtaining the
extendable hash structure
8. Advantages:
• Extendable hashing provides performance that does 3. Differentiate between sparse index & dense index?
not degrade as the file grows.
• Minimal space overhead - no buckets need be reserved
for future use. Bucket address table only contains one
pointer for each hash value of current prefix length.
9. Disadvantages:
• Extra level of indirection in the bucket address table
• Added complexity
127
4. Define secondary index & multilevel index?
DATABASE
MANAGEMENT
5. Define Hashing?What is hash function?
6. Differentiate between Static hashing & dynamic hashing?
128
DATABASE MANAGEMENT
129
Student Notes
DATABASE
LESSON 40:
TRANSACTIONS MANAGEMENT
MANAGEMENT
Lesson Objectives To gain a better understanding of ACID properties and the

• Transactions need for them, con-sider a simplified banking system consisting
of several accounts and a set of trans-actions that access and
• ACID properties
update those accounts. For the time being, we assume that the
• Transaction state database permanently resides on disk, but that some portion of
• Implementation of ACID properties it is temporarily residing in main memory.
Collections of operations that form a single logical unit of Transactions access data using two operations:
work are called transac-tions. A database system must ensure • read(X), which transfers the data item X from the database
proper execution of transactions despite fail-ures-either the to a local buffer belonging to the transaction that executed
entire transaction executes, or none of it does. For example, a the read operation.
transfer of funds from a checking account to a savings account
• write(X), which transfers the data item X from the the local
is a single operation from the customer’s standpoint; within the
buffer of the trans action that executed the write back to the
database system, however, it consists of several operations.
database.
Clearly, it is essential that all these operations occur, or that, in
case of a failure, none occur. Furthermore, it must manage In a real database system, the write operation does not necessar-
concurrent execution of transactions in a way that avoids the ily result in the imme-diate update of the data on the disk; the
introduction of inconsistency. In our funds-transfer example, a write operation may be temporarily stored in memory and
transaction computing the customer’s total money might see executed on the disk later. For now, however, we shall assume
the checking-account balance before it is debited by the funds - that the write operation updates the database immediately.
transfer transaction, but see the savings balance after it is Let Ti be a transaction that transfers $50 from account A to
credited. As a result, it would obtain an incorrect result. account B. This trans-action can be defined as
Transaction Ti: read(A);
A transaction is a unit of program execution that accesses and A := A-50;
possibly updates var-ious data items. Usually, a transaction is write(A);
initiated by a user program written in a high-level data-manipu-
read(B);
lation language or programming language (for example, SQL,
COBOL, C, C++, or Java), where it is delimited by statements B := B + 50;
(or function calls) of the form begin transaction and end write(B).
transaction. The transaction consists of all opera-tions executed Let us now consider each of the ACID requirements. (For ease
between the begin transaction and end transaction. of presentation, we consider them in an order different from
To ensure integrity of the data, we require that the database the order A-C-I-D).
system maintain the following properties of the transactions. • Consistency: The consistency requirement here is that the
These properties are often called the ACID properties, the sum of A and B be unchanged by the execution of the
acronym is derived from the first letter of each of the four transaction. Without the consistency requirement, money
properties. could be created or destroyed by the transaction It can be
• Atomicity. Either all operations of the transaction are verified easily that, if the database is consistent before an
reflected properly in the database, or none are. execution of the transaction, the database remains
• Consistency. Execution of a transaction in isolation (that consistent after the execution of the transac-tion.
is, with no other transaction executing concurrently) Ensuring consistency for an individual transaction is the
preserves the consistency of the database. responsibility of the application programmer who codes the
• Isolation. Even though multiple transactions may execute transaction. This task may be facil-itated by automatic testing of
concurrently, the system guarantees that, for every pair of integrity constraints
transactions Ti and Tj, it appears to Ti that either Tj finished • Atomicity: Suppose that, just before the execution of
execution before Ti started, or Tj started execu-tion after Ti transaction Ti the values of accounts A and Bare $1000 and
finished. Thus, each transaction is unaware of other $2000, respectively. Now suppose that, dur-ing the
transactions executing concurrently in the system. execution of transaction Ti, a failure occurs that prevents Ti
• Durability. After a transaction completes successfully, the from com-pleting its execution successfully. Examples of
changes it has made to the database persist, even if there are such failures include power failures, hardware failures, and
system failures. software errors. Further, suppose that the fail-ure happened
after the write (A) operation but before the write(B)
130
operation. In this case, the values of accounts A and B is executing, with the de-ducted total written to A and the
DATABASE
reflected in the database are $950 and $2000. The system increased total yet to be written to B. If a second concurrently
destroyed $50 as a result of this failure. In particular, we running transaction reads A and B at this intermediate point
note that the sum A + B is no longer preserved. and computes A + B, it will observe an inconsistent value.
Thus, because of the failure, the state of the system no longer Furthermore, if this second transaction then performs updates
on A and B based on the in-consistent values that it read, the
MANAGEMENT
reflects a real state of the world that the database is supposed to
capture. We term such a state an inconsistent state. We must database may be left in an inconsistent state even after both
ensure that such inconsistencies are not visible in a database transactions have completed.
system. Note, however, that the system must at some point be A way to avoid the problem of concurrently executing transac-
in an inconsistent state. Even if transaction Ti is executed to tions is to execute transactions serially-that is, one after the
comple-tion, there exists a point at which the value of account other. However, concur-rent execution of transactions provides
A is $950 and the value of account B is $2000, which is clearly significant performance benefits Other solutions have therefore
an inconsistent state. This state, how-ever, is eventually replaced been developed; they allow multiple transactions to execute
by the consistent state where the value of account A is $950, concurrently.
and the value of account B is $2050. Thus, if the transaction The isolation property of a transaction ensures that the concur-
never started or was guaranteed to complete, such an inconsis- rent execution of transactions results in a system state that is
tent state would not be visible except during the execution of equivalent to a state that could have been obtained had these
the transaction. That is the reason for the atomicity requirement: transactions executed one at a time in some order. Ensuring the
If the atomicity property is present, all actions of the transac- isolation property is the responsibility of a component of the
tion are reflected in the database, or none are. database system called the concurrency-control component.
The basic idea behind ensuring atomicity is this: The database
Transaction State
system keeps track (on disk) of the old values of any data on
In the absence of failures, all transactions complete successfully.
which a transaction performs a write, and, if the transaction
However, as we noted earlier, a transaction may not always
does not complete its execution, the database sys-tem restores
complete its execution successfully. Such a transaction is termed
the old values to make it appear as though the transaction never
aborted. If we are to ensure the atomicity property, an aborted
executed.
transaction must have no effect on the state of the database.
• Durability: Once the execution of the transaction Thus, any changes that the aborted transaction made to the
completes successfully, and the user who initiated the database must be undone. Once the changes caused by an
transaction has been notified that the transfer of funds has aborted transaction have been undone, we say that the transac-
taken place, it must be the case that no system failure will tion has been rolled back. It is part of the responsibility of the
result in a loss of data corresponding to this transfer of recovery scheme to manage transaction aborts.
funds.
A transaction that completes its execution successfully is said to
The durability property guarantees that, once a transaction be committed. A committed transaction that has performed
completes suc-cessfully, all the updates that it carried out on the updates transforms the database into a new consistent state,
database persist, even if there is a system failure after the which must persist even if there is a system failure.
transaction completes execution.
Once a transaction has committed, we cannot undo its effects by
We assume for now that a failure of the computer system may aborting it. The only way to undo the effects of a committed
result in loss of data in main memory, but data written to disk transaction is to execute a compensating transaction. For
are never lost. We can guarantee durability by ensuring that instance, if a transaction added $20 to an account, the compen-
either sating transaction would subtract $20 from the account.
1. The updates carried out by the transaction have been written However, it is not always possible to create such a compensating
to disk be-fore the transaction completes. transaction. Therefore, the responsibility of writing and
2. Information about the updates carried out by the executing a compensating transaction is left to the user, and is
transaction and written to disk is sufficient to enable the not handled by the database system.
database to reconstruct the updates when the database We need to be more precise about what we mean by successful
system is restarted after the failure. completion of a trans-action. We therefore establish a simple
Ensuring durability is the responsibility of a component of the abstract transaction model. A transaction must be in one of the
database sys-tem called the recovery-management component. following states:
The transaction-manage-ment component and the recovery- • Active, the initial state; the transaction stays in this state
management component are closely re-lated. while it is executing . Partially committed, after the final
• Isolation: Even if the consistency and atomicity properties statement has been executed
are ensured for each transaction, if several transactions are • Failed, after the discovery that normal execution can no
executed concurrently, their oper-ations may interleave in longer proceed
some undesirable way, resulting in an inconsistent state. • Aborted, after the transaction has been rolled back and the
For example, as we saw earlier, the database is temporarily database has been restored to its state prior to the start of
inconsistent while the transaction to transfer funds from A to B the transaction
131
• Committed, after successful completion writes temporarily in nonvolatile storage, and to perform the ac-
DATABASE
The state diagram corresponding to a transaction appears in tual writes only after the transaction enters the committed state.
Figure 23.1 . We say that a transaction has committed only if it If the system should fail after the transaction has entered the
has entered the committed state. Simi-larly, we say that a committed state, but before it could complete the external
transaction has aborted only if it has entered the aborted state. writes, the database system will carry out the external writes
(using\~he data in nonvolatile storage) when the system is
MANAGEMENT
A transaction is said to have terminated if has either committed

or aborted. restarted.
A transaction starts in the active state. When it finishes its final Handling external writes can be more complicated in some
statement, it enters the partially committed state. At this point, situations. For example suppose the external action is that of
the transaction has completed its exe-cution, but it is still dispensing cash at an automated teller machine, and the system
possible that it may have to be aborted, since the actual output fails just before the cash is actually dispensed (we assume that
may still be temporarily residing in main memory, and thus a cash can be dispensed atomically). It makes no sense to
hardware failure may preclude its successful completion. dispense cash when the system is restarted, since the user may
have left the machine. In such a case a compensat-ing transac-
The database system then writes out enough information to
tion, such as depositing the cash back in the users account,
disk that, even in the event of a failure, the updates performed
needs to be executed when the system is restarted.
by the transaction can be re-created when the system restarts
after the failure. When the last of this information is written For certain applications, it may be desirable to allow active
out, the transaction enters the committed state. transactions to dis-play data to users, particularly for long-
duration transactions that run for minutes or hours.
As mentioned earlier, we assume for now that failures do not
Unfortunately, we cannot allow such output of observable data
result in loss of data on disk.
unless we are willing to compromise transaction atomicity. Most
A transaction enters the failed state after the system determines current transaction systems ensure atomicity and, therefore,
that the transac-tion can no longer proceed with its normal forbid this form of interaction with users.
execution (for example, because of hard-ware or logical errors).
Such a transaction must be rolled back. Then, it enters the Implementation of Atomicity and
aborted state. At this point, the system has two options: Durability
The recovery-management component of a database system can
Committed support atomicity and durability by a variety of schemes. We
Partially
committed first consider a simple, but extremely in-efficient, scheme called
the shadow copy scheme. This scheme, which is based on
making copies of the database, called shadow copies, assumes
that only one transac-tion is active at a time. The scheme also
Active assumes that the database is simply a file on disk. A pointer
called db-pointer is maintained on disk; it points to the current
copy of the database.
In the shadow-copy scheme, a transaction that wants to update
the database first creates a complete copy of the database. All
Failed Aborted
updates are done on the new database copy, leaving the original
copy, the shadow copy, untouched. If at any point the trans-
action has to be aborted, the system merely deletes the new
Figure 23.1 state diagram of a transaction
copy. The old copy of the database has not been affected.
• It can restart the transaction, but only if the transaction was
If the transaction completes, it is committed as follows. First,
aborted as a result of some hardware or software error that
the operating system is asked to make sure that all pages of the
was not created through the internal logic of the
new copy of the database have been written out to disk. (Unix
transaction. A restarted transaction is considered to be a new
systems use the flush command for this purpose.) After the
transaction.
operat-ing system has written all the pages to disk, the database
• It can kill the transaction. It usually does so because of system updates the pointer db-pointer to point to the new copy
some internal logical error that can be corrected only by of the database; the new copy then becomes the current copy of
rewriting the application program, or be-cause the input was the database. The old copy of the database is then deleted. Fig-
bad, or because the desired data were not found in the ure 23.2 depicts the scheme, showing the database state before
database. and after the update.
We must be cautious when dealing with observable external db-pointer db-pointer
writes, such as writes to a terminal or printer. Once such a write

has occurred, it cannot be erased, since it may have been seen
external to the database system. Most systems allow such writes Old copy of
Old copy of database (to be
to take place only after the transaction has entered the commit- database deleted )
New copy of
database
ted state. One way to implement such a scheme is for the
database system to store any value associated with such external (a) Before update (a) After update
132
Figure 23.2 Shadow-copy technique for atomicity and durabil- Unfortunately, this implementation is extremely inefficient in
DATABASE
ity. the context of large databases, since executing a single transac-
The transaction is said to have been committed at the point tion requires copying the entire database. Furthermore, the
where the updated db-pointer is written to disk. implementation does not allow transactions to execute
concurrently with one another. There are practical ways of
We now consider how the technique handles transaction and
implementing atomicity and durability that are much less
MANAGEMENT
system failures. First, consider transaction failure. If the
expensive and more powerful.
transaction fails at any time before db-pointer is updated, the
old contents of the database are not affected. We can abort the Points to Ponder
trans-action by just deleting the new copy of the database. Once • Collections of operations that form a single logical unit of
the transaction has been committed, all the updates that it work are called transac-tions
performed are in the database pointed to by db-pointer. Thus,
• Usually, a transaction is initiated by a user program written
either all updates of the transaction are reflected, or none of the
in a high-level data-manipulation language or programming
effects are reflected, regardless of transaction failure.
language
Now consider the issue of system failure. Suppose that the • Transactions access data using two operations:
system fails at any time before the updated db-pointer is written
to disk. Then, when the system restarts, it will read db-pointer Read(X)
arid will thus see the original contents of the database, and Write(X)
none of the effects of the transaction will be visible on the • A transaction may not always complete its execution
database. Next, suppose that the system fails after db-pointer successfully. Such a transaction is termed aborted.
has been updated on disk. Before the pointer is updated, all • A transaction must be in one of the following states:
updated pages of the new copy of the database were written to
Active
disk. Again, we assume that, once a file is written to disk, its
contents will not be damaged even if there is a system failure. Failed
Therefore, when the system restarts, it will read db-pointer and Aborted
will thus see the contents of the database after all the updates Commited
performed by the transaction.
• The recovery-management component of a database system
The implementation actually depends on the write to db- can support atomicity and durability by a variety of schemes
pointer being atomic; that is, either all its bytes are written or
none of its bytes are written. If some of the bytes of the Review Terms
pointer were updated by the write, but others were not, the • Transactions
pointer is meaningless, and neither old nor new versions of the • ACID properties
database may be found when the system restarts. Luckily, disk
• Transaction state
systems provide atomic updates to entire blocks, or at least to a
disk sector. In other words, the disk system guarantees that it • Implementation of ACID properties
will update db-pointer atomically, as long as we make sure that Students Activity
db-pointer lies entirely in a single sector, which we can ensure by 1. Define transaction with the help of an example?
storing db-pointer at the beginning of a block.
Thus, the atomicity and durability properties of transactions are
ensured by the shadow-copy implementation of the recovery-
management component.
As a simple example of a transaction outside the database
domain, consider a text editing session. An entire editing
session can be modeled as a transaction. The actions executed by
the transaction are reading and updating the file. Saving the file
at the end of editing corresponds to a commit of the editing 2. Define ACID properties of a database?
transaction; quitting the editing session without saving the file
corresponds to an abort of the editing transaction.
Many text editors use essentially the implementation just
described, to ensure that an editing session is transactional. A
new file is used to store the updated file. At the end of the
editing session, if the updated file is to be saved, the text editor
uses a file rename command to rename the new file to have the
actual file name. The rename, assumed to be implemented as an 3. Define various states of a transaction ?
atomic operation by the underlying file system, deletes the old
file as well.
133
DATABASE
MANAGEMENT
4. Define the implementation of Atomicity property of a

database?
5. Define implementation of Durability?
6. Define the importance of consistency in a database

transaction?
7. Define Isolation of a database?
134
DATABASE MANAGEMENT
135
Student Notes
DATABASE
LESSON 41:
CONCURRENCY CONTROL - I
MANAGEMENT
Lesson objectives to help identify those executions that are guaranteed to ensure
• concurrent executions consistency.
• Advantages of concurrency The database system must control the interaction among the
concurrent trans-actions to prevent them from destroying the
• Problems with concurrency
consistency of the database. It does so through a variety of
• Serializibility mechanisms called concurrency-control schemes
Concurrent Executions Consider again the simplified banking system of Section 23.1,
Transaction-processing systems usually allow multiple transac- which has several accounts, and a set of transactions that access
tions to run concur-rently. Allowing multiple transactions to and update those accounts. Let Tl and T2 be two transactions
update data concurrently causes several complications with that transfer funds from one account to another. Transaction Tl
consistency of the data, as we saw earlier. Ensuring consistency transfers $50 from account A to account R It is defined as
in spite of concurrent execution of transactions requires extra T1: read(A);
work; it is far easier to insist that transactions run serially-that is,
A := A-50;
one at a time, each starting only after the previous one has
completed. However, there are two good reasons for allowing write(A);
concurrency: read(B);
• Improved throughput and resource utilization. A B := B + 50;
transaction consists of many steps. Some involve I/O write(B).
activity; others involve CPU activity. The CPU and the disks
in a computer system can operate in parallel. Therefore, I/O Transaction T2 transfers 10 percent of the balance from account
activity can be done in parallel with processing at the CPU. A to account B. It is defined as
The parallelism of the CPU and the I/O system can T2: read(A);
therefore be exploited to run multiple transactions in temp:= A * 0.1;
parallel. While a read or write on behalf of one transaction A :=A - temp;
is in progress on one disk, another transaction can be
running in the CPU, while another disk may be executing a write(A);
read or write on behalf of a third transaction. All of this read(B);
increases the throughput of the system-that is, the number B := B + temp;
of transactions executed in a given amount of time. write(B).
Correspondingly, the processor and disk utilization also
increase; in other words, the processor and disk spend less Suppose the current values of accounts A and Bare $1000 and
time idle, or not performing any useful work. $2000, respectively. Suppose also that the two transactions are
executed one at a time in the order Tl followed by T2. This
• Reduced waiting time. There may be a mix of execution sequence appears in Figure 23.3. In the figure, the
transactions running on a sys-tem, some short and some sequence of instruction steps is in chronological order from top
long. If transactions run serially, a short transaction may to bottom, with in-structions of Tl appearing in the left
have to wait for a preceding long transaction to complete, column and instructions of T2 appearing in the right column.
which can lead to unpredictable delays in running a The final values of accounts A and B, after the execution in
transaction. If the transactions are oper-ating on different Figure 23.3 takes place, are $855 and $2145, respectively. Thus,
parts of the database, it is better tolet them run the total amount of money in
concurrently, sharing the CPU cycles and disk accesses
among them. Concurrent execution reduces the T1 T2
unpredictable delays in running transactions. Moreover, it read(A) read(A)
also reduces the average response time: the average time for A := A-50 temp:= A * 0.1
a transaction to be completed after it has been submitted. write (A) A := A ~ temp
The motivation for using concurrent execution in a database is read(B) write(A)
essentially the same as the motivation for using multiprogram-
ming in an operating system. B := B + 50 read(B)
When several transactions run concurrently, database consistency write (B) B := B + temp
can be destroyed despite the correctness of each individual write(B)
transaction. In this section, we present the concept of schedules Figure 41.1 Schedule I-a serial schedule in which Tl is followed
by T2.
136
Accounts A and B-that is, the sum A + B - is preserved after T1 T2
DATABASE
the execution of both transactions. read(A)
Similarly, if the transactions are executed one at a time in the A := A-50
order T2 followed by TI, then the corresponding execution
write(A)
sequence is that of Figure 41.2. Again, as expected, the sum A +
read(A)
MANAGEMENT
B is preserved, and the final values of accounts A and Bare $850
and $2150, respectively. temp := A * 0.1
The execution sequences just described are called schedules. A :=A - temp
They represent the chronological order in which instructions are write(A)
executed in the system. Clearly, a sched-ule for a set of transac-
read(B)
tions must consist of all instructions of those transactions, and
must preserve the order in which the instructions appear in each B := B + 50
individual transac-tion. For example, in transaction TI, the write(B)
instruction write(A) must appear before the instruction read(B), read(B)
in any valid schedule. In the following discussion, we shall refer
B := B + temp
to the first execution sequence (TI followed by T2) as schedule 1,
and to the second execution sequence (T2 followed by T1 )as write(B)
schedule 2. Figure 41.3 Schedule 3-a concurrent schedule equivalent to
These schedules are serial: Each serial schedule consists of a schedule 1.
sequence of instruc-tions from various transactions, where the another transaction. Thus, the number of possible schedules
instructions belonging to one single trans-action appear for a set of n transactions is much larger than n!.
together in that schedule. Thus, for a set of n transactions, there Returning to our previous example, suppose that the two
exist n! different valid serial schedules. transactions are exe-cuted concurrently. One possible schedule
When the database system executes several transactions appears in Figure 41.3. After this execution takes place, we arrive
concurrently, the corre-sponding schedule no longer needs to be at the same state as the one in which the transactions are
serial. If two transactions are running con-currently, the executed serially in the order Tl followed by T2. The sum A + B
operating system may execute one transaction for a little while, is indeed preserved.
then perform a context switch, execute the second transaction Not all concurrent executions result in a correct state. To
for some time, and then switch back to the first transaction for illustrate, consider the schedule of Figure 23.6. After the
some time, and so on. With multiple transac-tions, the CPU execution of this schedule, we arrive at a state where the final
time is shared among all the transactions. values of accounts A and Bare $950 and $2100, respectively.
Several execution sequences are possible, since the various This final state is an inconsistent state, since we have gained $50
instructions from both transactions may now be interleaved. In in the process of the concur-rent execution. Indeed, the sum A
general, it is not possible to predict exactly how many instruc- + B is not preserved by the execution of the two transactions.
tions of a transaction will be executed before the CPU switches If control of concurrent execution is left entirely to the operat-
to ing system, many possible schedules, including ones that leave
Tl T2 the database in an inconsistent state, such as the one just
read(A) described, are possible. It is the job of the database system to
ensure that any schedule that gets executed will leave the
temp:= A * 0.1
database in a consistent state. The concurrency-control compo-
A :=A - temp nent of the database system carries out this task.
write(A) We can ensure consistency of the database under concurrent
read(B) execution by making sure that any schedule that executed has
B := B + temp the same effect as a schedule that could have occurred without
any concurrent execution. That is, the schedule should, in some
write(B)
sense, be equivalent to a serial schedule. We examine this idea in
read(A) Section
A := A-50
Serializability
write(A) The database system must control concurrent execution of
read(B) transactions, to ensure that the database state remains consis-
B := B + 50 tent. Before we examine how the database
write(B) T1 T2
Figure 41.2 Schedule 2-a serial schedule in which T2 is followed read(A)
by TI. A := A-50
read(A)
137
temp := A * 0.1 • Concurrent execution reduces the unpredictable delays in
DATABASE
A :=A – temp running transactions. Moreover, it also reduces the average

response time
write (A)
• The database system must control concurrent execution of
read(B)
transactions, to ensure that the database state remains
write (A)
MANAGEMENT
consistent
read(B) • Since transactions are programs, it is computationally
B := B + 50 difficult to determine ex-actly what operations a transaction
write (B) performs and how operations of various trans-actions
interact
B := B + temp
write (B) Review Terms
Figure 41.4 Schedule 4-a concurrent schedule. • Concurrent executions
system can carry out this task, we must first understand which • Advantages of concurrency
schedules will en-sure consistency, and which schedules will not. • Problems with concurrency
Since transactions are programs, it is computationally difficult to • Serializibility
determine ex-actly what operations a transaction performs and Students Activity
how operations of various trans-actions interact. For this
1. Define concurrency?
reason, we shall not interpret the type of operations that a
transaction can perform on a data item. Instead, we consider
only two operations: read and write. We thus assume that,
between a read (Q) instruction and a write (Q) instruction on a
data item Q, a transaction may perform an arbitrary sequence of
op-erations on the copy of Q that is residing in the local buffer
of the transaction. Thus, the only significant operations of a
transaction, from a scheduling point of view, are its read and
write instructions. We shall therefore usually show only read
and write instructions in schedules, as we do in schedule 3 in
2. Define the advantages & disadvantages of concurrency
Figure 41.5.
control?
In this section, we discuss different forms of schedule equiva-
lence; they lead to the notions of conflict serializability and view
serializability.
T1 T2
read(A)
write(A)
read(A)
write(A)
read(B) 3. Define serializibility?
write(B)
read(B)
write(B)
Figure 41.5 Schedule 3-showing only the read and write
instructions.
Points to Ponder
• Transaction-processing systems usually allow multiple
transactions to run concur-rently 4. When a database can enter into an inconsistent state?
• Allowing multiple transactions to update data concurrently
causes several complications with consistency of the data
• The parallelism of the CPU and the I/O system can
therefore be exploited to run multiple transactions in
parallel
138
5. How we can ignore inconsistency problem?
DATABASE
MANAGEMENT
139
Student Notes
140
DATABASE MANAGEMENT
DATABASE
LESSON 42:
CONCURENCY CONTROL - II
MANAGEMENT
Lesson objectives new schedule S!. We expect S to be equivalent to S!, since all
• Types of serializibility instructions appear in the same order in both schedules except
for Ii and Ij, whose order does not matter.
• Conflict Serializability
Since the write(A) instruction of T2 in schedule 3 of Figure 23.7
• View Serializability
does not conflict with the read(B) instruction of TI, we can
• Implementation of isolation swap these instructions to generate an equivalent schedule,
• Transaction definition in SQL schedule 5, in. Figure 23.8. Regardless of the initial system state,
Conflict Serializability schedules 3 and 5 both produce the same final system state.
Let us consider a schedule 5 in which there are two consecutive We continue to swap non-conflicting instructions:
instructions Ii and Ij, of transactions Ti and Tj, respectively (i ≠ • Swap the read (B) instruction of TI with the read (A)
j). If Ii and Ij refer to different data items, then we can swap Ii instruction of T2.
and Ij without affecting the results of any instruction in the • Swap the write (B) instruction of TI with the write (A)
schedule. However, if Ii and Ij refer to the same data item Q, instruction of T2.
then the order of the two steps may matter. Since we are dealing
• Swap the write(B) instruction of TI with the read(A)
with only read and write instructions, there are four cases that
instruction of T2.
we need to consider:
T1 T2
1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not
matter, since the same value of Q is read by Ti and Tj, Read (A)
regardless of the order. Write (A)
2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti Read (A)
does not read the value of Q that is written by Tj in Read (B)
instruction Ij. If Ij comes before Ii, then Ti reads the value
of Q that is written by Tj. Thus, the order of Ii and Ij Write (A)
matters. Write (B)
3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters Read (B)
for reasons similar to those of the previous case. Write (B)
4. Ii = write(Q), Ij = write(Q). Since both instructions are write Figure 42.1 Schedule 5-schedule 3 after swapping of a pair of
operations, the order of these instructions does not affect instructions.
either Ti or Tj. However, the value obtained by the next The final result of these swaps, schedule 6 of Figure 42.2, is a
read(Q) instruction of 5 is affected, since the result of only serial schedule. Thus, we have shown that schedule 3 is
the latter of the two write instructions is preserved in the equivalent to a serial schedule. This equivalence implies that,
database. If there is no other write(Q) instruction after Ii regardless of the initial system state, schedule 3 will produce the
and Ij in 5, then the order of Ii and Ij directly affects the final same final state as will some serial schedule.
value of Q in the database state that results from schedule
5. If a schedule S can be transformed into a schedule S’ by a series
of swaps of non--conflicting instructions, we say that Sand S’
Thus, only in the case where both Ii and Ij are read instructions are conflict equivalent.
does the relative order of their execution not matter.
In our previous examples, schedule 1 is not conflict equivalent
We say that Ii and Ij conflict if they are operations by different to schedule 2. How- ever, schedule 1 is conflict equivalent to
transactions on the same data item, and at least one of these schedule 3, Because the read(B) and write(B) instruction of Tl
instructions is a write operation. can be swapped with the read(A) and write(A) instruction of T2.
To illustrate the concept of conflicting instructions, we consider The concept of conflict equivalence leads to the concept of
schedule 3, in Fig-ure 23.7. The write(A) instruction of Tl conflict serializability. We say that a schedule S is conflict
conflicts with the read(A) instruction of T2. However, the serializable if it is conflict equivalent to a serial schedule. Thus,
write(A) instruction of T2 does not conflict with the read(B) schedule 3 is conflict serializable, since it is conflict equivalent to
instruction of TI, because the two instructions access different the serial schedule 1.
data items.
Finally, consider schedule 7 of Figure 42.3; it consists of only
Let Ii and Ij be consecutive instructions of a schedule 5. If Ii and the significant op-erations (that is, the read and write) of
Ij are instructions of different transactions and Ii and Ij do not transactions T3 and T4. This schedule is not conflict serializable,
conflict, then we can swap the order of Ii and Ij to produce a
141
since it is not equivalent to either the serial schedule <T3,T4> or read(B)
DATABASE
the serial schedule <T4,T3>. B := B + 50

It is possible to have two schedules that produce the same write(B)
outcome, but that are not conflict equivalent. For example,
read(A)
consider transaction T5, which transfers $10
A := A + 10
MANAGEMENT
T1 T2
write (A)
read(A)
figure 42.4 schedule 8.
write(A)
Consider two schedules S and S’, where the same set of
read(B)
transactions participates in both schedules. The schedules S and
write(B) S’ are said to be view equivalent if three conditions are met:
read(A) 1. For each data item Q, if transaction Ti reads the initial value
write(A) of Q in schedule S, then transaction Ti must, in schedule S’,
read(B) also read the initial value of Q.
write (B) 2. For each data item Q, if transaction Ti executes read(Q) in
schedule S, and if that value was produced by a write(Q)
Figure 42.2 Schedule 6-a serial schedule that is equivalent to
operation executed by transaction Ti, then the read(Q)
schedule 3.
operation of transaction Ti must, in schedule S’, also read
T3 T4 the value of Q that was produced by the same write(Q)
Read (Q) operation of transaction Ti.
Write (Q) 3. For each data item Q, the transaction (if any) that performs
Write (Q) the final write(Q) operation in schedule S must perform the
final write(Q) operation in sched-ule S’.
Figure 42.3 Schedule 7.
Conditions 1 and 2 ensure that each transaction reads the same
from account B to account A. Let schedule 8 be as defined in
values in both schedules and, therefore, performs the same
Figure 42.4. We claim that schedule 8 is not conflict equivalent
computation. Condition 3, coupled with conditions 1 and 2,
to the serial schedule <TI, T5>, since, in schedule 8, the write (B)
ensures that both schedules result in the same final system
instruction of Ts conflicts with the read (B) instruction of TI.
state.
Thus, we cannot move all the instructions of TI before those of
T5 by swapping con-secutive non-conflicting instructions. In our previous examples, schedule 1 is not view equivalent to
However, the final values of accounts A and B after the schedule 2, since, in schedule 1, the value of account A read by
execution of either schedule 8 or the serial schedule <TI,T5> are transaction T2 was produced by T1 whereas this case does not
the same -$960 and $2040, respectively. hold in schedule 2. However, schedule 1 is view equivalent to
schedule 3, because the values of account A and B read by
We can see from this example that there are less stringent
transaction T2 were produced by T1 in both schedules.
definitions of schedule equivalence than conflict equivalence.
For the system to determine that schedule 8 produces the same The concept of view equivalence leads to the concept of view
outcome as the serial schedule <TI,T5>, it must analyze the serializability. We say that a schedule 5 is view serializable if it is
com-putation performed by TI and T5, rather than just the read view equivalent to a serial schedule.
and write operations. In general, such analysis is hard to As an illustration, suppose that we augment schedule 7 with
implement and is computationally expensive. How-ever, there transaction T6, and obtain schedule 9 in Figure 42.5. Schedule 9
are other definitions of schedule equivalence based purely on is view serializable. Indeed, it is view equivalent to the serial
the read and write operations. We will consider one such schedule <T3, T4, T6>, since the one read(Q) instruction reads
definition in the next section. the initial value of Q in both schedules, and T6 performs the
final write of Q in both schedules.
View Serializability
In this section, we consider a form of equivalence that is less Every conflict-serializable schedule is also view serializable, but
stringent than conflict equivalence, but that, like conflict there are view-serializable schedules that are not conflict
equivalence, is based on only the read and write operations of serializable. Indeed, schedule 9 is not con-flict serializable, since
transactions. every pair of consecutive instructions conflicts, and, thus, no
swapping of instructions is possible.
TI T2
Observe that, in schedule 9, transactions T4 and T6 perform
read(A)
write(Q) operations without having performed a read(Q)
A := A-50 operation. Writes of this sort are called blind writes. Blind
write (A) writes appear in any view-serializable schedule that is not conflict
read(B) seri-alizable.
B := B - 10
write(B)
142
schedule must have the same effect as would some serial
DATABASE
T3 T4 T6
schedule. Thus, conflict and view serializability are both
read( Q)
acceptable.
write(Q)
The SQL.92 standard also allows a transaction to specify that it
write( Q) write(Q)
may be executed in a manner that causes it to become
Figure 42.5 Schedule 9-a view-serializable schedule.
MANAGEMENT
nonserializable with respect to other transactions. We study
Implementation of Isolation such weaker levels of consistency in Section 16.8.
So far, we have seen what properties a schedule must have if it Points to Ponder
is to leave the database in a consistent state and allow transac-
• A transaction is a unit of program execution that accesses
tion failures to be handled in a safe manner. Specifically,
and possibly updates var-ious data items.
schedules that are conflict or view serializable and cascadeless
satisfy these requirements. • To ensure integrity of the data, we require that the database
system maintain the following properties of the
There are various concurrency-control schemes that we can use
transactions. These properties are often called the ACID
to ensure that, even when multiple transactions are executed
properties.
concurrently, only acceptable sched-ules are generated, regardless
of how the operating-system time-shares resources (such as • Allowing multiple transactions to update data concurrently
CPU time) among the transactions. causes several complications with consistency
As a trivial example of a concurrency-control scheme, consider • The database system must control concurrent execution of
this scheme: A transaction acquires a lock on the entire database transactions, to ensure that the database state remains
before it starts and releases the lock after it has committed. consistent
While a transaction holds a lock, no other transaction is allowed Review Terms
to acquire the lock and all must therefore wait for the lock to be • Inconsistent state
released. As a result of the locking policy, only one transaction
can execute at a time. Therefore, only serial schedules are • Transaction state
generated. These are trivially serializable, and it is easy to verify • Active
that they are cascadeless as well. • Partially committed
A concurrency-control scheme such as this one leads to poor • Failed
performance, since it forces transactions to wait for preceding • Aborted
transactions to finish before they can start. In other words, it
• Committed
provides a poor degree of concurrency. As explained in Section
23.4, concurrent execution has several performance benefits. • Terminated
The goal of concurrency-control schemes is to provide a high • Transaction
degree of concur-rency, while ensuring that all schedules that can • Restart
be generated are conflict or view serializable, and are cascadeless. • Kill
The schemes have different trade-offs in terms of the amount • Observable external writes
of concurrency they allow and the amount of overhead that
• Shadow copy scheme
they incur. Some of them allow only conflict serializable
schedules to be generated; others allow certain view-serializable • Concurrent executions
schedules that are not conflict-serializable to be generated. • Serial execution
Transaction Definition in SQL • Schedules
A data-manipulation language must include a construct for • Conflict of operations
specifying the set of ac-tions that constitute a transaction. • Conflict equivalence
The SQL standard specifies that a transaction begins implicitly. • Conflict serializability
Transactions are ended by one of these SQL statements:
• View equivalence
• Commit work commits the current transaction and begins
• View serializability
a new one.
• Rollback work causes the current transaction to abort. Students Activity
The keyword work is optional in both the statements. If a 1. Define Transaction?
program terminates with-out either of these commands, the
updates are either committed or rolled back -which of the two
happens is not specified by the standard and depends on the
im-plementation.
The standard also specifies that the system must ensure both
serializability and freedom from cascading rollback. The
definition of serializability used by the stan-dard is that a
143
2. Define Conflict Serializibility?
DATABASE
MANAGEMENT
3. Define View Serializibility?
4. Define Commit,Rollback transaction?
144
DATABASE MANAGEMENT
145
Student Notes
DATABASE
LESSON 43:
CONCURRENCY CONTROL - III
MANAGEMENT
Lesson objectives 2. Subsequently, a data item Q can be locked by Ti only if the

• Locking mechanism parent of Q is currently locked by Ti.
• Graph based protocol 3. Data items may be unlocked at any time.
• Timestamps 4. A data item that has been locked and unlocked by Ti cannot
subsequently be relocked by T1.
• Timestamp based protocol
All schedules that are legal under the tree protocol are conflict
• Timestamp ordering protocols
serializable.
• Validation protocol
To illustrate this protocol, consider the database graph of
When the lock manager receives an unlock message from a Figure 43.1. The follow-ing four transactions follow the tree
transaction, it deletes the record for that data item in the linked protocol on this graph. We show only the lock and unlock
list corresponding to that transaction. It tests the record that instructions:
follows, if any to see if that request can now be granted. If it
TlO : lock-X(B); lock-X(E); lock-X(D); unlock(B); unlock(E);
can, the lock man-ager grants that request, and processes the
lock-X(G); unlock(D); unlock(G).
record following it, if any, similarly, and so on.
T11: lock-X(D); lock-X(H); unlock(D); unlock(H).
If a transaction aborts, the lock manager deletes any waiting
request made by the transaction. Once the database system has T12: lock-X(B); lock-X(E); unlock(E); unlock(B).
taken appropriate actions to undo the transaction it releases all T13: lock-X(D); lock-X(H); unlock(D); unlock(H).
locks held by the aborted transaction. One possible schedule in which these four transactions
This algorithm guarantees freedom from starvation for lock participated appears in Figure 43.2. Note that, during its
requests, since a re-quest can never be granted while a request execution, transaction T1O holds locks on two dis-joint subtrees.
received earlier is waiting to be granted. Observe that the schedule of Figure 43.2 is conflict serializable.
Graph-based Protocols It can be shown not only that the tree protocol ensures conflict
But, if we wish to develop protocols that are not two phase, we serializability, but also that this proto-col ensures freedom from
need additional information on how each transaction will access deadlock.
the database. There are various models that can give us the The tree protocol in Figure 43.2 does not ensure recoverability
additional information, each differing in the amount of and cascadeless-ness. To ensure recoverability and
information provided. The simplest model requires that we cascadelessness, the protocol can be modified to not permit
have prior knowledge about the order in which the database release of exclusive locks until the end of the transaction.
items will be accessed. Given such information, it is possible to Holding exclu-sive locks until the end of the transaction reduces
construct locking protocols that are not two phase, but that, concurrency. Here is an alterna-tive that improves concurrency,
nevertheless, ensure conflict serializability. but ensures only recoverability: For each data item with an
To acquire such prior knowledge, we impose a partial ordering uncommitted write we record which transaction performed the
® on the set D = {d1, d2, . . ., dh} of all data items. If di → dj, last write to the data item. Whenever a transaction Ti performs a
then any transaction accessing both di and dj must access di read of an uncommitted data item, we record a commit
before accessing dj. This partial ordering may be the result of dependency of Ti on the transaction that performed the last
either the logical or the physical organization of the data, or it write to the data item. Transaction Ti is then not permitted to
may be imposed solely for the purpose of concurrency control. commit until the commit of all transactions on which it has a
The partial ordering implies that the set D may now be viewed commit dependency. If any of these trans-actions aborts, Ti
as a directed acyclic graph, called a database graph. In this section, must also be aborted.
for the sake of simplicity, we will restrict our attention to only
those graphs that are rooted trees. We will present a simple
protocol, called the tree protocol, which is restricted to employ
only exclusive locks. References to other, more complex, graph-
based locking protocols are in the bibliographical notes.
In the tree protocol, the only lock instruction allowed is lock-X.
Each transaction Ti can lock a data item at most once, and must
observe the following rules:
1. The first lock by Ti may be on any data item.
146
Figure 43.1 Tree-structured database graph TS(Ti), and a new transaction Tj enters the system, then TS(Ti)
DATABASE
< TS(Tj). There are two simple methods for implementing this
T10 T11 T12 T13
lock-X(B) scheme:
lock-x (D) 1. Use the value of the system clock as the timestamp; that is,
lock-x(H) a transaction’s time-stamp is equal to the value of the clock
unlock(D)
MANAGEMENT
when the transaction enters the system.
lock-x(E)
lock-x(D) 2. Use a logical counter that is incremented after a new
unlock(B) timestamp has been assigned; that is, a transaction’s
unlock(E) timestamp is equal to the value of the counter when the
lock-x (B) transaction enters the system.
lock-x (E)
unlock (H) The timestamps of the transactions determine the serializability
lock-X(G) order. Thus, if TS(Ti) < TS(Tj), then the system must ensure
unlock(D) that the produced schedule is equiva-lent to a serial schedule in
lock-x (D) which transaction Ti appears before transaction Tj.
lock-X(H)
unlock(D) To implement this scheme, we associate with each data item Q
unlock(H) two timestamp values:
unlock(E)
• W-timestamp(Q) denotes the largest timestamp of any
unlock(B)
unlock (G) transaction that exe-cuted write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any
Figure 43.1 Serializable schedule under the tree protocol.
transaction that exe-cuted read(Q) successfully.
The tree-locking protocol has an advantage over the two-phase
These timestamps are updated whenever a new read(Q) or
locking protocol in that, unlike two-phase locking, it is dead-
write(Q) instruction is executed.
lock-free, so no rollbacks are required. The tree-locking protocol
has another advantage over the two-phase locking protocol in The Timestamp-ordering Protocol
that unlocking may occur earlier. Earlier unlocking may lead to The timestamp-ordering protocol ensures that any conflicting
shorter waiting times, and to an increase in concurrency. read and write opera-tions are executed in timestamp order.
However, the protocol has the disadvantage that, in some cases, This protocol operates as follows:
a transaction may have to lock data items that it does not access. 1. Suppose that transaction Ti issues read(Q).
For example, a transaction that needs to access data items A and a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a
J in the database graph of Figure 43.1 must lock not only A and value of Q that was already overwritten. Hence, the
J, but also data items B, D, and H. This additional locking read operation is rejected, and Ti is rolled back.
results in increased locking overhead, the possibility of addi- b. If TS(Ti) ~ W-timestamp(Q), then the read operation
tional waiting time, and a potential decrease in concurrency. is executed, and R-timestamp(Q) is set to the
Further, without prior knowledge of what data items will need maximum of R-timestamp(Q) and TS(Ti).
to be locked, transactions will have to lock the root of the tree,
2. Suppose that transaction Ti issues write(Q).
and that can reduce concurrency greatly.
a. If TS(Ti) < R-timestamp(Q), then the value of Q that
For a set of transactions, there may be conflict-serializable
Ti is producing was needed previously, and the system
schedules that cannot be obtained through the tree protocol.
assumed that that value would never be produced.
Indeed, there are schedules possible under the two-phase
Hence, the system rejects the write operation and rolls
locking protocol that are not possible under the tree protocol,
Ti back.
and vice versa. Examples of such schedules are explored in the
exercises. b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to
write an obsolete value of Q. Hence, the system rejects
Timestamp-based Protocols this write operation and rolls Ti back.
The locking protocols that we have described thus far determine
the order between every pair of conflicting transactions at c. Otherwise, the system executes the write operation and
execution time by the first lock that both members of the pair sets W-timestamp(Q) to TS(Ti). TS(Ti) = R/W -
request that involves incompatible modes. Another method for TS(Q).
determining the serializability order is to select an ordering If a transaction Ti is rolled back by the concurrency-control
among transactions in advance. The most common method for scheme as result of is-suance of either a read or write operation,
doing so is to use a timestamp-ordering scheme. the system assigns it a new timestamp and restarts it.
Timestamps To illustrate this protocol, we consider transactions TI4 and T15.
With each transaction Ti in the system, we associate a unique Transaction TI4 displays the contents of accounts A and B:
fixed timestamp, de-noted by TS(Ti). This timestamp is T14: read(B);
assigned by the database system before the trans-action Ti starts read(A);
execution. If a transaction Ti has been assigned timestamp
display(A + B).
147
Transaction TI5 transfers $50 from account A to account B, and • Recoverability alone can be ensured by tracking
DATABASE
then displays the uncommitted writes, and al-lowing a transaction Ti to

contents of both: commit only after the commit of any transaction that wrote
a value that Ti read. Commit dependencies.
T15: read(B);
B := B - 50; Validation-based Protocols
MANAGEMENT
In cases where a majority of transactions are read-only transac-

write(B);
tions, the rate of con-flicts among transactions may be low.
read(A); Thus, many of these transactions, if executed without the
A := A + 50; supervision of a concurrency-control scheme, would neverthe-
write(A); less leave the system in a consistent state. A concurrency-control
scheme imposes overhead of code execution and possible delay
display(A + B).
of transactions. It may be better to use an alterna-tive scheme
In presenting schedules under the timestamp protocol, we shall that imposes less overhead. A difficulty in reducing the
assume that a trans-action is assigned a timestamp immediately overhead is that we do not know in advance which transactions
before its first instruction. Thus, in sched-ule 3 of Figure 43.3, will be involved in a conflict. To gain that knowledge, we need a
TS(TI4) < TS(TI5), and the schedule is possible under the time- scheme for monitoring the system.
stamp protocol.
We assume that each transaction Ti executes in two or three
We note that the preceding execution can also be produced by different phases in its lifetime, depending on whether it is a
the two-phase lock-ing protocol. There are, however, schedules read-only or an update transaction. The phases are, in order,
that are possible under the two-phase locking protocol, but are
1. Read phase. During this phase, the system executes
not possible under the timestamp protocol, and vice versa
transaction Ti. It reads the values of the various data items
The timestamp-ordering protocol ensures conflict serializability. and stores them in variables local to Ti. It performs all write
This is because conflicting operations are processed in operations on temporary local variables, without updates of
timestamp order. the actual database.
The protocol ensures freedom from deadlock, since no 2. Validation phase. Transaction Ti performs a validation test
transaction ever waits. However, there is a possibility of to determine whe-ther it can copy to the database the
starvation of long transactions if a sequence of conflicting temporary local variables that hold the results of write
short transactions causes repeated restarting of the long operations without causing a violation of serializability.
transaction. If
3. Write phase. If transaction Ti succeeds in validation (step 2),
T14 T15 then the system applies the actual updates to the database.
read (B) Otherwise, the system rolls back Ti.
Read (B)
Each transaction must go through the three phases in the order
B: = B-50
shown. However, all three phases of concurrently executing
Write (B)
transactions can be interleaved.
read (A)
read (A) To perform the validation test; we need to know when the
Display(A + B) various phases of trans-actions Ti took place. We shall, there-
A : = A+50 fore, associate three different timestamps with transaction Ti:
Write (A) 1. Start(Ti), the time when Ti started its execution.
Display (A+B)
2. Validation(Ti), the time when Ti finished its read phase and
Figure 43.13 Schedule 3. started its vali-dation phase.
a transaction is found to be getting restarted repeatedly, 3. Finish(Ti), the time when Ti finished its write phase.
conflicting transactions need to be temporarily blocked to enable We determine the serializability order by the timestamp-
the transaction to finish. ordering technique, using the value of the timestamp
The protocol can generate schedules that are not recoverable. Validation(Ti). Thus, the value TS(Ti) = Validation(Ti) and, if
However, it can be extended to make the schedules recoverable, TS(Tj) < TS(Tk), then any produced schedule must be equiva-
in one of several ways: lent to a serial schedule in which transaction Tj appears before
Recoverability and cascadelessness can be ensured by performing transaction Tk. The reason we have chosen Validation(Ti), rather
all writes together at the end of the transaction. The writes than Start(Ti), as the timestamp of transaction Ti is that we can
must be atomic in the fol-lowing sense: While the writes are in expect faster response time provided that conflict rates among
progress, no transaction is permitted to access any of the data transactions are indeed low.
items that have been written. The validation test for transaction Tj requires that, for all
• Recoverability and cascadelessness can also be guaranteed by transactions Ti with TS(Ti) < TS(Tj), one of the following two
using a limited form of locking, whereby reads of conditions must hold:
uncommitted items are postponed until the transaction that
updated the item commits
148
1. Finish(Ti) < Start(Tj) Since Ti completes its execution • With each transaction Ti in the system, we associate a unique
DATABASE
before Tj started, the serializability order is indeed fixed timestamp, This timestamp is assigned by the
maintained. database system before the trans-action Ti starts execution
2. The set of data items written by Ti does not intersect with • The timestamp-ordering protocol ensures that any
the set of data items read by T, and Ti completes its write conflicting read and write opera-tions are executed in
MANAGEMENT
phase before Tj starts its validation phase (Start(Tj) < timestamp order
Finish(Ti) < Validation(Tj)). This condition ensures that
Review Terms
T14 T15 • Locking mechanism
read(B) • Graph based protocol
read(B)
B := B – 50 • Timestamps
read (A) • Timestamp based protocol
A := A + 50
read (A) • Timestamp ordering protocols
(validate) • Validation protocol
display (A + B)
(validate) Students Activity
write(B) 1. Define locks in a database system?Why it is advisable
write (A)
Figure 43.5 Schedule 5, a schedule produced by using valida-

tion.
the writes of Ti and Tj do not overlap. Since the writes of Ti do
not affect the read of Tj, and since Tj cannot affect the read of Ti
the serializability order is indeed maintained.
As an illustration, consider again transactions T14 and T15.
Suppose that TS(T14) < TS(T15). Then, the validation phase 2. Define graph based protocol?
succeeds in the schedule 5 in Figure 43.5. Note that the writes to
the actual variables are performed only after the validation phase
of T15. Thus, T14 reads the old values of B and A, and this
schedule is serializable.
The validation scheme automatically guards against cascading
rollbacks, since the actual writes take place only after the
transaction issuing the write has committed. However, there is a
possibility of starvation of long transactions, due to a sequence
of conflicting short transactions that cause repeated restarts of
the long transaction. To avoid starvation, conflicting transac- 3. Define Timestamps?
tions must be temporarily blocked, to enable the long
transaction to finish.
This validation scheme is called the optimistic concurrency
control scheme since transactions execute optimistically,
assuming they will be able to finish execution and validate at the
end. In contrast, locking and timestamp ordering are pessimistic
in that they force a wait or a rollback whenever a conflict is
detected, even though there is a chance that the schedule may be
conflict serializable.
4. Define Timestamp based protocol?
Points to Ponder
• When the lock manager receives an unlock message from a
transaction, it deletes the record for that data item in the
linked list corresponding to that transaction
• If we wish to develop protocols that are not two phase, we
need additional information on how each transaction will
access the database
• One method for determining the serializability order is to
select an ordering among transactions in advance
149
5. Define Timestamp ordering protocol?
DATABASE
MANAGEMENT
6. Define validation based protocol?
150
DATABASE MANAGEMENT
151
Student Notes
DATABASE
LESSON 44:
DATABASE RECOVERY
MANAGEMENT
Lesson objectives when there is an error. Hence, the fail-stop assumption is a

• Failure reasonable one.
• Types of failures • Disk failure. A disk block loses its content as a result of
either a head crash or failure during a data transfer
• Storage structure
operation. Copies of the data on other disks, or archival
• Storage Type backups on tertiary media, such as tapes, are used to recover
• Stable-Storage Implementation from the failure.
• Data Access To determine how the system should recover from failures, we
• Recovery and Atomicity need to identify the failure modes of those devices used for
storing data. Next, we must consider how these failure modes
• Log-Based Recovery
affect the contents of the database. We can then propose
• Checkpoints algorithms to ensure database consistency and transaction
The database system must take actions in advance to ensure that atomicity despite failures. These algorithms, known as recovery
the atomicity and durability properties of transactions as a algorithms, have two parts:
computer system, like any other device, is subject to failure from 1. Actions taken during normal transaction processing to
a variety of causes: disk crash, power outage, software error, a ensure that enough information exists to allow recovery
fire in the machine room, even sabotage”. In any failure, from failures.
information may be lost. are preserved. An integral part of a
2. Actions taken after a failure to recover the database contents
database system is a recovery scheme that can restore the
to a state that ensures database consistency, transaction
database to the consistent state that existed before the failure.
atomicity, and durability.
The recovery scheme must also provide high availability; that is,
it must minimize the time for which the database is not usable Storage Structure
after a crash. The various data items in the database may be stored and
accessed in a number of different storage media. To understand
Failure Classification
how to ensure the atomicity and durability properties of a
There are various types of failure that may occur in a system,
transaction, we must gain a better under-standing of these
each of which needs to be dealt with in a different manner. The
storage media and their access methods.
simplest type of failure is one that does not result in the loss of
information in the system. The failures that are more difficult Storage Types
tom deal with are those that result in loss of information. Storage media can be distinguished by their relative speed,
Various types of failure are: capacity, and resilience to failure, and classified as volatile storage
or nonvolatile stor-age. We review these terms, and introduce
Transaction Failure
another class of storage, called stable stor-age.
There are two types of errors that may cause a transaction to fail:
• Volatile storage. Information residing in volatile storage
Logical error. The transaction can no longer continue with its
does not usually sur-vive system crashes. Examples of such
normal ex-ecution because of some internal condition, such as
storage are main memory and cache memory. Access to
bad input, data not found, overflow, or resource limit exceeded.
volatile storage is extremely fast, both because of the speed
System error. The system has entered an undesirable state (for of the memory access itself, and because it is possible to
example, deadlock), as a result of which a transaction cannot access any data item in volatile storage directly.
continue with its nor-mal execution. The transaction, however,
• Nonvolatile storage. Information residing in nonvolatile
can be executed at a later time.
storage survives sys-tem crashes. Examples of such storage
System crash There is a hardware malfunction, or a bug in the are disk and magnetic tapes. Disks are used for online
database software or the operating system, that causes the loss storage, whereas tapes are used for archival storage.Both
of the content of volatile storage, and brings transaction however, are subject to failure (for example, head crash),
processing to a halt. The content of nonvolatile storage remains which may result in loss of information. At the current
intact, and is not corrupted. state of technology, nonvolatile st-age is slower than volatile
The assumption that hardware errors and bugs in the software storage by several orders of magnitude. This is because disk
bring the system to a halt, but do not corrupt the nonvolatile and tape devices are electromechanical, rather than based en-
storage contents, is known as the fail-stop assumption. Well- tirely on chips, as is volatile storage. In database systems,
designed systems have numerous internal checks, at the disks are used for most nonvolatile storage. Other
hardware and the software level, that bring the system to a halt nonvolatile media are normally used only for backup data.
152
Flash storage, though nonvolatile, has insuffi-cient capacity of remote backup, one of the blocks is local, whereas the other
DATABASE
for most database systems. is at a remote site. An output operation is executed as follows:
• Stable storage. Information residing in stable storage is 1. Write the information onto the first physical block.
never lost (never should be taken with a grain of salt, since 2. When the first write completes successfully, write the same
theoretically never cannot be guaranteed for example, it is information onto the second physical block.
MANAGEMENT
possible, although extremely unlikely, that a black hole may
3. The output is completed only after the second write
envelop the earth and permanently destroy all data!).
completes successfully.
Although stable stor-age is theoretically impossible to
obtain, it can be closely approximated by techniques that During recovery, the system examines each pair of physical
make data loss extremely unlikely. blocks. If both are the same and no detectable error exists, then
no further actions are necessary. (Recall that errors in a disk
The distinctions among the various storage types are often less
block, such as a partial write to the block, are detected by storing
clear in practice than in our presentation. Certain systems
a checksum with each block.) If the system detects an error in
provide battery backup, so that some main memory can survive
one block, then it replaces its content with the content of the
system crashes and power failures. Alternative forms of non-
other block. If both blocks contain no detectable error, but they
volatile storage, such as optical media, provide an even higher
differ in content, then the system replaces the content of the
degree of reliability than do disks.
first block with the value of the second. This recovery procedure
Stable-Storage Implementation ensures that a write to stable storage either succeeds completely
To implement stable storage, we need to replicate the needed (that is, updates all copies) or results in no change.
information in sev-eral nonvolatile storage media (usually disk) The requirement of comparing every corresponding pair of
with independent failure modes, and to update the information blocks during recovery is expensive to meet. We can reduce the
in a controlled manner to ensure that failure during data transfer cost greatly by keeping track of block writes that are in progress,
does not damage the needed information. using a small amount of nonvolatile RAM. On recovery, only
RAID systems guarantee that the failure of a single disk (even blocks for which writes were in progress need to be compared.
during data transfer) will not result in loss of data. The The protocols for writing out a block to a remote site are similar
simplest and fastest form of RAID is the mirrored disk, which to the protocols for writing blocks to a mirrored disk system
keeps two copies of each block, on separate disks. Other forms
We can extend this procedure easily to allow the use of an
of RAID offer lower costs, but at the expense of lower
arbitrarily large number of copies of each block of stable
performance.
storage. Although a large number of copies reduces the
RAID systems, however, cannot guard against data loss due to probability of a failure to even lower than two copies-do, it is
disasters such as fires or flooding. Many systems store archival usually reasonable to simulate stable storage with only two
backups of tapes off-site to guard against such disasters. copies.
However, since tapes cannot be carried off-site continually,
updates since the most recent time that tapes were carried off- Data Access
site could be lost in such a disaster. More secure systems keep a The database system resides permanently on nonvolatile storage
copy of each block of stable storage at a remote site, writing it (usually disks), and is partitioned into fixed-length storage units
out over a computer network, in addition to storing the block called blocks. Blocks are the units of data transfer to and from
on a local disk system. Since the blocks are output to a remote disk, and may contain several data items. We shall assume that
system as and when they are output to local storage, once an no data item spans two or more blocks. This assumption is
output operation is complete, the output is not lost, even in realistic for most data-processing applications, such as our
the event of a disaster such as a fire or flood. We study such banking example.
remote’ backup systems Transactions input information from the disk to main memory,
In the remainder of this section, we discuss how storage media and then output the information back onto the disk. The input
can be protected from failure during data transfer. Block transfer and output operations are done in block units. The blocks
between memory and disk storage can result in residing on the disk are referred to as physical blocks; the blocks
residing temporarily in main memory are referred to as buffer
• Successful completion. The transferred information
blocks. The area of memory where blocks reside temporarily is
arrived safely at its des-tination.
called the disk buffer.
• Partial failure. A failure occurred in the midst of transfer,
Block movements between disk and main memory are initiated
and the destination block has incorrect information.
through the fol-lowing two operations:
• Total failure. The failure occurred sufficiently early during
1. input(B) transfers the physical block B to main memory.
the transfer that the destination block remains intact.
2. output(B) transfers the buffer block B to the disk, and
We require that, if a data-transfer failure occurs, the system replaces the appropriate physical block there.
detects it and invokes a recovery procedure to restore the block Each transaction Ti has a private work area in which copies of all
to a consistent state. To do so, the system must maintain two the data items accessed and updated by Ti are kept. The system
physical blocks for each logical database block; in the case of creates this work area when the transaction is initiated; the
mirrored disks, both blocks are at the same location; in the case system removes it when the transaction either commits or
153
aborts. Each data item X kept in the work area of transaction Ti for this difficulty is that we have modified the database without
DATABASE
is denoted by Xi. Transaction Ti interacts with the database having assurance that the transaction will indeed commit. Our
system by transferring data to and from its work area to the goal is to perform either all or no database modifications made
system buffer. We transfer data by these two operations: by Ti. However, if Ti performed multiple database modifica-
1. read(X) assigns the value of data item X to the local variable tions, several output operations may be re-quired, and a failure
may occur after some of these modifications have been made,
MANAGEMENT
Xi. It executes this operation as follows:

but before all of them are made.
a. If block B x on which X resides is not in main
memory, it issues input(B x). To achieve our goal of atomicity, we must first output informa-
tion describing the modifications to stable storage, without
b. It assigns to Xi the value of X from the buffer block.
modifying the database itself. As we shall see, this procedure
2. write(X) assigns the value of local variable Xi to data item X will allow us to output all the modifications made by a commit-
in the buffer block. ted transaction, despite failures. There are two ways to perform
It executes this operation as follows: such outputs; we study them in Sections 44.4 and 44.5. In these
a. If block B x on which X resides is not in main two sections, we shall assume that transactions are executed
memory, it issues input(Bx L serially; in other words, only a single transaction is active at a
time. We shall describe how to handle concurrently executing
b. It assigns the value of Xi to X in buffer B x.
transactions later, in Section 44.6.
Note that both operations may require the transfer of a block Log-Based Recovery
from disk to main mem-ory. They do not, however, specifically The most widely used structure for recording database modifi-
require the transfer of a block from main mem-ory to disk. cations is the log. The log is a sequence of log records, recording
all the update activities in the database. There are several types
A buffer block is eventually written out to the disk either
of log records. An update log record describes a single data-base
because the buffer man-ager needs the memory space for other
write. It has these fields:
purposes or because the database system wishes to reflect the
change to B on the disk. We shall say that the database system Transaction identifier is the unique identifier of the transaction
performs a force-output of buffer B if it issues an output(B). that performed the write operation.
When a transaction needs to access a data item X for the first • Data-item identifier is the unique identifier of the data item
time, it must execute read (X). The system then performs all written. Typically, it is the location on disk of the data item.
updates to X on Xi. After the transaction ac-cesses X for the • Old value is the value of the data item prior to the write.
final time, it must execute write(X) to reflect the change to X in • New value is the value that the data item will have after the
the database itself. write.
The output(B x) operation for the buffer block B x on which X Other special log records exist to record significant events during
resides ,does not need to take effect immediately after write(X) transaction pro-cessing, such as the start of a transaction and
is executed, since the block Bx may contain other data items that the commit or abort of a transaction. We denote the various
are still being accessed. Thus, the actual output may take place types of log records as:
later. Notice that, if the system crashes after the write(X)
• <Ti start>. Transaction Ti has started.
operation was executed but before output(Bx) was executed,
the new value of X is never written to disk and, thus, is lost. • <Ti, Xj, VI, V2>. Transaction Ti has performed a write on
data item Xj. Xj
Recovery and Atomicity
• had value VI before the write, and will have value V2 after
Consider again our simplified banking system and transaction
the write.
Ti that transfers $50 from account A to account B, with initial
values of A aI)d B being $1000 and $2000, respectively. • <Ti commit>. Transaction Ti has committed. . <Ti abort>.
Suppose that a system crash has occurred during the execution Transaction Ti has aborted.
of Ti, after output (B A) has taken place, but before output(B Whenever a transaction performs a write, it is essential that the
B) was executed, where B A and B B denote the buffer blocks log record for that write be created before the database is
on which A and B reside. Since the memory contents were lost, modified. Once a log record exists, we can output the modifica-
we do not know the fate of the transaction; thus, we could tion to the database if that is desirable. Also, we have the ability
invoke one of two possible recovery procedures: to undo a modification that has already been output to the
• Execute Ti. This procedure will result in the value of A database. We undo it by using the old-value field in log records.
becoming $900, rather than $950. Thus, the system enters For log records to be useful for recovery from system and disk
an inconsistent state. failures, the log must reside in stable storage. For now, we
• Do not execute Ti. The current system state has values of assume that every log record is written to the end of the log on
$950 and $2000 for A and B, respectively. Thus, the system stable storage as soon as it is created. In Section 44.7, we shall
enters an inconsistent state. see when it is safe to relax this requirement so as to reduce the
overhead imposed by logging. Two techniques for using the log
In either case, the database is left in an inconsistent state, and
to ensure transaction atomicity despite failures. Observe that the
thus this simple re-covery scheme does not work. The reason
log contains a complete record of all database activity. As a
154
result, the volume of data stored in the log may become <To, B, 2050>
DATABASE
unreasonably large. <To commit>
Deferred Database Modification <T 1 start>
The deferred-modification technique ensures transaction <T1, C, 600>
atomicity by recording all database modifications in the log, but
<Tl commit>
MANAGEMENT
deferring the execution of all write operations of a transaction
until the transaction partially commits. Recall that a transaction Figure 44.2 Portion of the database log corresponding to To
is said to be partially committed once the final action of the and Tl’
transaction has been ex-ecuted. The version of the deferred- appears in Figure 44.3. Note that the value of A is changed in
modification technique that we describe in this section assumes the database only after the record <To, A, 950> has been placed
that transactions are executed serially. in the log.
When a transaction partially commits, the information on the Using the log, the system can handle any failure that results in
log associated with the transaction is used in executing the the loss of informa-tion on volatile storage. The recovery
deferred writes. If the system crashes before the transaction scheme uses the following recovery procedure:
completes its execution, or if the transaction aborts, then the • redo(Ti) sets the value of all data items updated by
informa-tion on the log is simply ignored. transaction Ti to the new values.
The execution of transaction Ti proceeds as follows. Before Ti The set of data items updated by Ti and their respective new
starts its execution, a record <Ti start> is written to the log. A values can be- found in the log.
write(X) operation by Ti results in the writing of a new record
The redo operation must be independent; that is, executing it
to the log. Finally, when Ti partially commits, a record <Ti
several times must be equivalent to executing it once. This
commit> is written to the log.
characteristic is required if we are to guarantee correct behavior
When transaction Ti partially commits, the records associated even if a failure occurs during the recovery process.
with it in the log are used in executing the deferred writes. Since
After a failure, the recovery subsystem consults the log to
a failure may occur while this updating is taking place, we must
determine which trans-actions need to be redone. Transaction Ti
ensure that, before the start of these updates, all the log records
needs to be redone if and only if the log contains both the
are written out to stable storage. Once they have been written,
record <Ti start> and the record <Ti commit>. Thus, if the
the actual updating takes place, and the transaction enters the
system crashes after the transaction completes its execution, the
committed state.
recovery scheme uses the information in the log to restore the
Observe that only the new value of the data item is required by system to a previous consistent state after the transaction had
the deferred modification technique. Thus, we can simplify the completed.
general update-log record struc-ture that we saw in the previous
As an illustration, let us return to our banking example with
section, by omitting the old-value field.
transactions To and Tl executed one after the other in the order
To illustrate, reconsider our simplified banking system. Let To To followed by Tl’ Figure 44.2 shows the log that results from
be a transaction that transfers $50 from account A to account B: the complete execution of To and Tl’ Let us suppose that the
To: read (A); Log Database
A := A-50; <To start>
write(A); <To, A, 950>
read(B); <To, B, 2050>
B := B + 50; <To commit>
write(B). A = 950
Let Tl be a transaction that withdraws $100 from account C: B = 2050
T1: read(C); <Tl start>
C := C - 100; <T1, C, 600>
write(C). <Tl commit>
Suppose that these transactions are executed serially, in the order C = 600
To followed by Tl’ and that the values of accounts A, B, and C
Figure 44.3 State of the log and database corresponding. to To
before the execution took place were $1000, $2000, and $700,
and Tl’
respectively.
<T0 start> <T0 start> <T0 start>
There are various orders in which the actual outputs can take
place to both the database system and the log as a result of the < T0,A,950> < T0,A,950> < T0,A,950>
execution of To and Tl’ One such order <T0,B,2050> <T0,B,2050> <T0,B,2050>
<To start>
<To, A, 950>
155
<T0 commit> <T0 commit> <T1 start> found in the log , the system performs the operation redo(Ti)
DATABASE
<T1 start> <T1,C,600> <T1,C,600> .In other words ,it restarts the recovery action from the begin-
<T1,commit> ning. Since redo writes values to the database independent of
the values currentlyin the database, the result of a successful
(a) (b) (c)
second attempt at redo is the same as though redo has suc-
Figure 44.4 Same log shown at three different times. ceeded the first time.
MANAGEMENT
System crashes before the completion of transaction, so that we

Checkpoints
can see how the recovery technique restores the database to a
When a system failure occurs, we must consult the log to
consistent state.Assume that the crash occurs just after the log
determine those transac-tions that need to be redone and those
record for the step.
that need to be undone. In principle, we need to search the
Write(B) entire log to determine this information. There are two major
Of transaction t0 has been written to stable storage.The log at difficulties with this approach:
the time of crash appears in figure 44.4(a) When the system 1. The search process is time consuming.
comes back up, no redo actions need to be taken, since no
2. Most of the transaction that, need to be redone have already
commit record appears in the log. The values of ‘accounts A
written their updates into the database.
and B remain $1000 and $2000, respectively. The log records of
the incomplete transaction To can be deleted from the log. To reduce these types of overhead, we introduce checkpoints.
During execution, the system maintains the log. In addition,
Now, let us assume the crash comes just after the log record for
the system periodically performs checkpoints, which require the
the step
following sequence of actions to take place:
write( C)
1. Output onto stable storage all log records currently residing
of transaction Tl has been written to stable storage. In this case, in main memory.
the log at the time of the crash is as in Figure 17.4b. When the 2. Output to the disk all modified buffer blocks.
system comes back up, the operation redo(To) is performed,
since the record 3. Output onto stable storage a log record <checkpoint>.
<To commit> Transactions are not allowed to perform any update action, such
as writing to any buffer blockor writing a log record , while a
appears in the log on the disk. After this operation is executed, checkpoint is in progress.
the values of accounts A and Bare $950 and $2050, respectively.
The value of account Remains $700. As before, the log records Points to Ponder
of the incomplete transaction Tl can be deleted from the log. • There are various types of failure that may occur in a system,
Finally, assume that a crash occurs just after the log record each of which needs to be dealt with in a different manner
as transaction failure,system crash, disk failures.
<Tl commit>
• Storage media can be distinguished by their relative speed,
is written to stable storage. The log at the time of this crash is
capacity, and resilience to failure, and classified as volatile
as in Figure 44.4c. When the system comes back up, two
storage or nonvolatile stor-age
commit records are in the log: one for To and one for Tl’
Therefore, the system must perform operations redo(To) and • RAID systems guarantee that the failure of a single disk
redo(T1), in the order in which their commit records appear in (even during data transfer) will not result in loss of data
the log. After the system executes these operations, the values • When a system failure occurs, we must consult the log to
of accounts A, B, and Care $950, $2050, and $600, respectively. determine those transac-tions that need to be redone and
Finally, let us consider a case in which a second system crash those that need to be undone.
occurs during re-covery from the first crash. Some changes may Review Terms
have been made to the database as a result of the redo opera-
• Failure
tions, but all changes may not have been made. When the
System comes up after the second crash, recovery proceeds • Types of failures
exactly as in the pr_.g examples. For each commit record • Storage structure
<Ticommit> • Storage Type
Finally, let us consider a case in which a second system crash • Stable-Storage Implementation
occurs during re-covery from the first crash. Some changes may • Data Access
have been made to the database as a • Recovery and Atomicity
result of the redo operations, but all changes may not have • Log-Based Recovery
been made. When the System comes up after the second crash,
• Checkpoints
recovery proceeds exactly as in the pr_.g examples. For each
commit record
<Ticommit>
156
DATABASE
Students Activity
1. Define database recovery?
MANAGEMENT
2. What are the various kinds of failures?
3. Define log-based recovery?
4. Define checkpoints?
157
Student Notes
158
DATABASE MANAGEMENT
“The lesson content has been compiled from various sources in public domain including but not limited to the
internet for the convenience of the users. The university has no proprietary right on the same.”
Rai Technology University Campus

Dhodballapur Nelmangala Road, SH -74, Off Highway 207, Dhodballapur Taluk, Bangalore - 561204
E-mail: info@raitechuniversity.in | Web: www.raitechuniversity.in

Database Management

Uploaded by

Copyright:

Available Formats

Database Management

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Database Management

Uploaded by

Copyright:

Available Formats

?

Lab: Working over Microsoft Access

manipulation of data. In addition, the database system must

6. Explain data inconsistency with example?

7. Explain data security? Why it is needed?Explain with

8. Explain isolation & atomicity property of database?

9. Explain why redundancy should be avoided in database?

language • Query language

1. Data Manipulation is:

Table Definitions data.

3. Define physical schema and logical schema?

4. Define data independence?Explain types of data

5. Define data dictionary, meta-data?

6. Define various elements of data dictionary?

Database Manager • Granting of authorization for data access: granting

Naïve Application Sophisticated Database

• Query processor translates statements in a query

• DML precompiler converts DML statements

2. What do you understand by the structure of a database?

3. Define physical schema and logical schema?

4. Define file manager, dml compiler,data files?

model. However, instead of using a single-parent tree hierarchy,

Nevertheless, though it was a dramatic improvement, the

3. Define relational data model?

4. Define relational data model?

that can be decomposed into smaller pieces) is a relation. This

2. Define joins?What is natural join?

3. Differentiate between inner join & outer join?

ENTITY RELATIONSHIP DIAGRAM NOTATIONS

Ordinality is also closely linked to cardinality.

2. Define relationship, relationship set?

Many to one relationship

3. Differentiate between simple & composit attribute?

One to one relationship

4. Define derived attribute?

Lesson Objective • A superkey may contain extraneous attributes, and we are

3. Draw Rough ERD

E-R diagram with an attribute attached to a relationship set.

If a relationship set has also some attributes associated with it,

2. Why is redundancy a bad practice ?

• Authorization. The SQL DDL includes commands for [constraint]);

The Like pattern matching operator can also be used in the

6. Display all columns for everyone whose first name contains

10101 01-Jul-1999 Life Vest 4 125.00

2. From the items_ordered table, select the item, maximum

3. How many orders did each customer make? Use the

WHERE lastname IN (‘Hernandez’, ‘Jones’, ‘Roberts’, ‘Ruiz’);

is ANY of these values.

the items_ordered table.

Loan number Branch name Amount Customer name

The result of loan natural right outer join borrower.

Loan number Branch name Amount Customer name

The result of loan full outer join borrower using(loan-

Lesson Objectives The Intersect Operation

select customer-name from depositor)

from depositor intersect all (select customer-name

6. All “Programmer II” titles are now promoted to

7. All “Programmer” titles are now promoted to

delete from employee Students Activity