Data Base Management Systems
Data Base Management Systems
Data Base Management Systems
Management System
Version: DBMS/Handout/0307/2.1
Date: 05-03-07
Cognizant
500 Glen Pointe Center West
Teaneck, NJ 07666
Ph: 201-801-0233
www.cognizant.com
Database Management System
TABLE OF CONTENTS
Introduction ................................................................................................................................5
About this Document..................................................................................................................5
Target Audience.........................................................................................................................5
Objectives ..................................................................................................................................5
Pre-requisite ..............................................................................................................................5
Page 2
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 3
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Glossary ......................................................................................................................................87
References ..................................................................................................................................90
Websites ..................................................................................................................................90
Books.......................................................................................................................................90
Page 4
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Introduction
Target Audience
In-Campus Trainees
Objectives
Explain the need of Databases
Define the different types of Databases
Explain the Database Design Process using E-R Model and Relational Database
Concepts
Write queries using Structured Query Language
Describe the Physical File Organisation and File Access Methods
Explain the Overview of the Database Trends
Pre-requisite
Basic Knowledge of Data Processing and basics of Computer Organisation and Operating
System Concepts
Page 5
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
Explain the need for Databases
Define Database and Database Management Systems
Describe File-Based Systems and their Limitations
List the features of Database Systems
Identify the different database users
Introduction
Today we live in the age where we need data for all our daily life. In the course of a day we interact
with different kinds of databases. For instance, a person withdraws money from his bank account
on his way to office, then during his office time he checks for some details about some products in
the internet and in the evening he purchases grocery items from a nearby supermarket. In this
case, the person does nothing but interact with different databases for different needs.
The basic type of database we used is the traditional database which has data in the form of text
and numbers. But with the advancement in the Computer and Communication Technology we
have new types of databases. Multimedia databases where data includes pictures, video clips and
audio along with text; Geographic Information Systems which has data about satellite pictures and
weather images; Data warehousing and Online Analytical Processing where some historical data
is analyzed using statistical tools which can be used for decision making. Though there are
different databases for various applications, it is important to get to the basics of the database
concepts. This session gives the basic concepts about the database and database management
systems.
A database is a set of related data. By data, we mean known facts that can be recorded and have
an implicit meaning. For example, an address book of a person contains the details of his friends
and relatives which have an implicit meaning. Now organizations are increasingly aware of the
importance of information in the solution of their problems. Because of decreasing cost of data
storage, organizations store increasing quantities of data and this data must be managed in the
most efficient and effective manner.
Page 6
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
The sales department is responsible for selling and renting of properties. Whenever a client
approaches the sales department with a view to marketing his or her property , he is required to fill
up a form. This gives the details of the property such as address and number of rooms together
with the owner’s details. The sales department also handles inquiries from clients and a separate
form is completed for each one. With the assistance of the data processing department, the sales
department creates an information system to handle the renting of property.
The contracts department is responsible for handling the lease agreements associated with
properties for rent. Whenever a client agrees to rent a property, a form is filled in by one of the
sales staff giving the client and property details. This form is passed to the contracts department
which allocates a lease number and completes the payment and rental period details. Again, with
the assistance of the data processing department the contracts department creates an information
system to handle lease agreements. The system consists of three files storing lease, property and
client details, containing similar data held by the sales department.
The situation is illustrated in Fig 1.1. It shows each department accessing their own files through
application programs written specially for them. Each set of departmental application programs
handles data entry, file maintenance and the generation of a fixed set of specific reports.
Duplication of data
Same data is held by different programs. Space is wasted. Same item can have potentially
different values and/or different formats.
Page 7
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Data dependence
File structure is defined in the program code.
where defining a database means identify the data in terms of data types, structures and
constraints; constructing the database means to load it on a secondary storage medium;
manipulating the database means querying, generating reports, insertions, deletions and
modifications to its content.
It is assumed that operations (update, insert, retrieve, etc.) on the database can be carried out in a
simple and flexible way. Also since a database tends to be a long term resource of an
organization, it is expected that planned as well as unplanned applications can (in general) be
carried out without great difficulty.
Page 8
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
To have a controlled access to the database the dbms includes the following subsystems - a
security system,an integrity system,a concurrency control system,a recovery control system,and a
user-accessible catalog.
To summaries, a database system consists of :
The database (data)
A DBMS (software)
A DDL and a DML (Part of the DBMS)
Application programs
Self-describing nature of a database system: A DBMS catalog stores the description of the
database. The description is called meta-data). This allows the DBMS software to work with
different databases.
Insulation between programs and data: Called program-data independence. Allows changing
data storage structures and operations without having to change the DBMS access programs.
Page 9
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Data Abstraction: A data model is used to hide storage details and present the users with a
conceptual view of the database.
Support of multiple views of the data: Each user may see a different view of the database,
which describes only the data of interest to that user.
Sharing of data and multiuser transaction processing : allowing a set of concurrent users to
retrieve and to update the database. Concurrency control within the DBMS guarantees that each
transaction is correctly executed or completely aborted. OLTP (Online Transaction Processing) is
a major part of database applications.
There other main features of a database system are centralized data management, data
independence, and systems integration.
In a database system, the data is managed by the DBMS and all access to the data is through the
DBMS providing a key to effective data processing. This contrasts with conventional data
processing systems where each application program has direct access to the data it reads or
manipulates. In a conventional DP system, an organization is likely to have several files of related
data that are processed by several different application programs.
In the conventional data processing application programs, the programs usually are based on a
considerable knowledge of data structure and format. In such environment any change of data
structure or format would require appropriate changes to the application programs. These changes
could be as small as the following:
Coding of some field is changed. For example, a null value that was coded as -1 is
now coded as -9999.
A new field is added to the records.
The length of one of the fields is changed. For example, the maximum number of
digits in a telephone number field or a postcode field needs to be changed.
The field on which the file is sorted is changed.
If some major changes were to be made to the data, the application programs may need to be
rewritten. In a database system, the database management system provides the interface between
the application programs and the data. When changes are made to the data representation, the
metadata maintained by the DBMS is changed but the DBMS continues to provide data to
application programs in the previously used way. The DBMS handles the task of transformation of
data wherever necessary.
This independence between the programs and the data is called data independence. Data
independence is important because every time some change needs to be made to the data
structure, the programs that were being used before the change would continue to work. To
provide a high degree of data independence, a DBMS must include a sophisticated metadata
management system.
Page 10
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Advantages of DBMS
Control of data redundancy
Data consistency
More information from the same amount of data
Sharing of data
Improved data integrity
Improved security
Enforcement of standards
Economy of scale
Balance conflicting requirements
Improved data accessibility and responsiveness
Increased productivity
Improved maintenance through data independence
Increased concurrency
Improved backup and recovery services
Database Users
Users may be divided into those who actually use and control the content (called “Actors on the
Scene”) and those who enable the database to be developed and the DBMS software to be
designed and implemented (called “Workers Behind the Scene”).
Database administrators: responsible for authorizing access to the database, for co-ordinating
and monitoring its use, acquiring software, and hardware resources, controlling its use and
monitoring efficiency of operations.
Database Designers: responsible to define the content, the structure, the constraints, and
functions or transactions against the database. They must communicate with the end-users and
understand their needs.
End-users: they use the data for queries, reports and some of them actually update the database
content.
Categories of End-users
Casual: access database occasionally when needed
Naïve:
Naïve or Parametric: they make up a large section of the end-user population. They use
previously well-defined functions in the form of “canned transactions” against the database.
Examples are bank-tellers or reservation clerks who do this activity for an entire shift of operations.
Page 11
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Sophisticated: these include business analysts, scientists, engineers, others thoroughly familiar
with the system capabilities. Many use tools in the form of software packages that work closely
with the stored database.
3. A DBMS is a set of programs that serve as an interface between application programs and a
database (T/F) True
5. The database administrator creates and maintains edit controls regarding changes and
additions to the database. (T/F) True
Page 12
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
Explain the Three – Level Architecture of DBMS
Explain the functions of Database Systems
Describe the Overall System Architecture
DBMS Architecture
Definitions
Entity
An entity is a distinct object (a person, place, thing concept or event) in the organization that is to
be represented in the database.
Attribute
An attribute is a property that describes various characteristics of an entity.
Relationship
A relationship is an association between entities.
The major purpose of a database system is to provide users with a view of the system they need
and understand. The system hides certain details of how data is stored, created and maintained.
Complexity should also be hidden from database users. This is achieved through layering. The
three levels are:
External Level
Conceptual Level
Internal Level
Database change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database. The overall
design of the database is called the database schema.
Page 13
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
A DBMS provides a mapping and conversion between these schemas speedily and correctly. The
way users perceive the data is called the external level. The way DBMS and the operating system
perceive the data is the internal level, where the data is actually stored using the data structures
and file organizations. The conceptual level provides both the mapping and the desired
independence between the external and internal levels.
The figure below gives an illustration of the three- level architecture of DBMS.
The external level gives the user’s view of the database. This level describes that part of the
database that is relevant to each user. The external level consists of a number of different external
views of the database. Each user has a view of the ‘real world’ represented in a form that is
familiar for that user. The external view includes only those entities, attributes and relationships in
the ‘real world’ that the user is interested. Other entities, attributes and relationships that are not of
interest may be represented in the database, but the user will be unaware of them. For example, in
a university database, a department head may only be interested in the departmental finances and
student enrolments but not the library information. The librarian would not be expected to have any
interest in the information about academic staff. The payroll office would have no interest in
student enrolments.
The conceptual level gives the community view of the database. This level describes what data is
stored in the database and the relationships among the data. This level contains the logical
structure of the entire database. It represents all entities, their attributes and their relationships, the
constraints on the data, security and integrity information. However, this level should not contain
any storage-dependent details. For instance, the description of an entity should contain only data
types of attributes (for example, integer, real, character) and their length (such as the maximum
number of digits or characters), but not any storage considerations such as the number of bytes
occupied.
Page 14
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
The internal level gives the physical representation of the database on the computer. This level
describes how the data is stored in the database. It covers the data structures and file
organizations used to store data on storage devices. It interfaces with the operating system access
methods to place the data on the storage devices, build the indexes, and retrieve the data and so
on. It is also concerned with data compression and data encryption techniques.
Assuming the three level view of the database, a number of mappings are needed to enable the
users working with one of the external views.
For example, the payroll office of XYZ Company may have an external view of the database that
consists of the following information only:
Staff number, name and address.
Staff tax information e.g. number of dependents.
Staff employment status, salary level, leave information etc.
The conceptual view of the database may contain academic staff, general staff, casual staff etc. A
mapping will need to be created where all the staff in the different categories is combined into one
category for the payroll office. The conceptual view would include information about each staff's
position, the date employment started, full-time or part-time, etc . This will need to be mapped to
the salary level for the salary office. Also, if there is some change in the conceptual view, the
external view can stay the same if the mapping is changed.
Data Independence
A major objective for the three-level architecture is to provide data independence, which means
that upper levels are unaffected by changes to lower levels. There are two kinds of data
independence: logical and physical.
Logical data independence refers to the immunity of the external schemas to changes in the
conceptual schema. Changes to the conceptual schema such as the addition or removal of new
entities, attributes or relationships, should be possible without having to change existing external
schemas or having to rewrite application programs.
Physical data independence refers to the immunity of the conceptual schema to changes in the
internal schema.
Changes to the internal schema, such as using different file orgainsations or storage structures,
using different storage devices, modifying indexes, or hashing algorithms, should be possible
without having to change the conceptual or external schemas.
The figure below illustrates where each type of data independence occurs in relation to the three-
level architecture.
Page 15
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Functions of DBMS
Data storage, retrieval and update: A DBMS provides users with the ability to store, retrieve and
update data in the database.
A user-accessible catalog: A DBMS provides a catalog in which descriptions of data items are
stored and which is accessible to users.
Transaction support: A DBMS provides a mechanism which will ensure either that all the updates
corresponding to a given transaction are made or that none of them is made.
Concurrency control services: A DBMS provides a mechanism to ensure that the databse is
updated correctly when multiple users are updating the database concurrently.
Recovery services: A DBMS provides a mechanism for recvering the databse in the event that
the database is damaged in any way.
Authorization services: A DBMS provides a mechanism to ensure that only authorized users can
access the database.
Support for data communication: A DBMS must be capable of integrating with communication
software
Integrity services: A DBMS provides a means to ensure that both the data in the database and
changes to the data follow certain rules.
Page 16
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
The storage manager is important because databases typically require a large amount of storage
space. The query processor is important because it helps the database system simplify and
facilitate access to data. It is the job of the database system to translate updates and queries
written in a nonprocedural language, at the logical level into an efficient sequence of operations at
the physical level.
The figure above shows the overall system architecture. The subsystems are transaction manager,
query processor and storage manager.
Page 17
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 18
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
Define the various types of databases
Hierarchical Databases
The Hierarchical model is the oldest of the database models, and unlike the other data models,
does not have a well documented history of its conception and initial release. It is derived from the
Information Management Systems (IMS) of IBM in the 1950's and 60's.
It was adopted by many banks and insurance companies who are still running it as legacy systems
to this day. Hierarchical database systems can also be found in inventory and accounting systems
used by government departments and hospitals.
Structure:
As its name implies, the Hierarchical Database Model defines hierarchically-arranged data.
Perhaps the most intuitive way to visualize this type of relationship is by visualizing an upside
down tree of data. In this tree, a single table acts as the "root" of the database from which other
tables "branch" out.
Relationships in such a system are thought of in terms of children and parents such that a child
may only have one parent but a parent can have multiple children. Parents and children are tied
together by links called "pointers" (physical addresses inside the file system). A parent will have a
list of pointers to each of their children.
This child/parent rule assures that data is systematically accessible. To get to a low-level record,
you start at the root and work your way down through the tree until you reach your target. Of
course, as you might imagine, one problem with this system is that the user must know how the
tree is structured in order to find anything.
Page 19
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
The hierarchical model however, is much more efficient than the flat-file model because there is
not as much need for redundant data. If a change in the data is necessary, the change might only
need to be processed once. Consider the student flat file
As mentioned, this flat file database would store an excessive amount of redundant data. If we
implemented this in a hierarchical database model, we would get much less redundant data.
Consider the following hierarchical database schema:
In this schema, to access the mid term marks of a student, we have to traverse from the top, ie the
course record, identify the course, then move down to student record and then finally we retrieve
the mid term marks of the student.
Page 20
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Limitations
However, the hierarchical database model has some serious problems. For one, we cannot add a
record to a child until it has already been incorporated into the parent. This might be troublesome
if, for example, you wanted to add a student who had not yet signed up for any courses.
The hierarchical database model still creates repetition of data within the database. You might
imagine that in the database system shown above, there may be a higher level that includes
multiple course. In this case, there could be redundancy because students would be enrolled in
several courses and thus each "course tree" would have redundant student information.
Redundancy would occur because hierarchical databases handle one-to-many relationships well
but do not handle many-to-many relationships well. This is because a child may only have one
parent. However, in many cases you will want to have the child be related to more than one
parent. For instance, the relationship between student and class is a "many-to-many". Not only can
a student take many subjects but a subject may also be taken by many students. How would you
model this relationship simply and efficiently using a hierarchical database? The answer is that it is
possible but it is very complex. It can be done by having intermediate connecting nodes called the
Virtual Parent-Child relationship.
Faced with these serious problems, the database community came up with the network model.
Network Databases
The Network database model was first introduced in 1971 by CODASYL Data Base Task Group
and because of this is sometimes called the DBTG Model. The concept of the network model
comes with the idea of IDS (Integrated Data Store) database.
It is called the Network Model because it represents the data it contains in the form of a network of
records and sets which are related to each other, forming a network of links.
Key Concepts:
Records are sets of related data values. They store the name of the record type, the attributes
associated with it and the format for these attributes
Page 21
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Student
Surname character 32
Name character 32
Course character 30
Year Int
Fig 3.4: Network Data Model
Record Types, are set of records of same type. These are the equivalent of tables in the relational
model.
Set Types, are named, 1:N relationships between 2 record types. These do not have a direct
parallel in the relational model, the closest comparison is to a query statement which has joined
two tables together. This makes the network model faster to run certain queries. An example of a
set type would been the relationship between a university department and the students in it. The
network model uses a Bachman diagram to represent these relationships as we can see below,
Department
|
Associated with
|
\/
Student
This relationship is called "Associated with" and is used to set the relationship between the
departments in the university, and the individual students in them.
Page 22
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
The network model is not commonly used today to design database systems, however, there are a
few instances of it being used by companies as part of a legacy system. Because it represents
data in a format closer to the way in which it stores it than the other models, it can be faster than
relational systems. Unfortunately, this increase in speed is at the cost of its adaptability.
Relational Databases
The Relational Model is the data model which is based on the relational algebra of mathematics. It
was first outlined in a paper published by Ted Codd in 1970. It is used by the majority of business
community today.
Structure:
The relational model is based on Set Theory and Predicate Logic. Relational databases try to hide
as much of the implementation detail from their users as possible.
The use of set theory allows data to be structured in a series of tables, which have columns
representing the variables and rows that contain specific instances of data. These tables are
organized using Normal Forms.
For example - in a university environment we may have a table with columns labeled surname,
name, address, course, year etc. and rows with individual students information. E.g.
Operations, which can be carried out on the data, include insert, query and delete commands.
Query operations are the most interesting as there are often several ways to obtain the same
information, but with different processing time requirements to gather the information.
Users can also query several tables at the same time using a join command.
The join command is used to merge the data in two or more tables together according to a
common value. For example, in one table, which we will call employees, we can store an
employees details.
Page 23
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
The following are the few terms associated with object technology:
Object Type
Is a type of entity in the real world, either real or abstract about which we store data Includes a
specification of the types of operations that can be performed to manipulate the data stored about
that object type.
Operation
Is an activity that is performed to access or manipulate the data associated with an object type.
Operations associated with an object type reference only the data associated with that object type
and not the data in any other object type
Method
It specifies the way in which an operation that is performed on the data associated
with an object type is encoded in computer software.
It consists of procedures that can be executed in a computer to access the data
associated with an object type.
Class
It is a computer implementation of an object type
It includes the data structure that defines the data associated with the object type and
a set of methods specifying all the operations that can be performed on the data
associated with that class.
Page 24
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Object
An object is an instance of an object type.
Consists of a description of the set of data items associated with the object instance
Consists of a description of the computer code that implements the methods
associated with the object’s type.
Encapsulation
Refers to the way in which the object’s data and the operations that can be performed
on that data are packaged together.
It hides the implementation details of an object from its users
Inheritance
Refers to the characteristic of an object type that allows it to take on (inherit) the
properties of its parent class including the data structures and the methods used by
the parent class
An object type can override an inherited property with data structures and methods of
its own.
Page 25
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
List the steps in Database Design Process.
Design Conceptual Database using ER Model
Explain the representations in a E-R Diagram
The database design process starts with identification of that part of the real world which needs to
be converted into an Information System. Database planning should include development of
standards that govern:
How data will be collected,
How the format should be specified,
What necessary documentation will be needed,
How design and implementation should proceed.
Requirements Collection and Analysis phase is the Process of collecting and analyzing information
about the part of organization to be supported by the database system, and using this information
to identify users’ requirements of new system. Information is gathered for each major user view
including:
A description of data used or generated;
Details of how data is to be used/generated;
Any additional requirements for new database system.
Data model representing single user view (or a subset of all user views) is called a local data
model.
Each model includes diagrams and documentation describing requirements for one or more but
not all user views of database.
Database Design phase refers to the process of creating a design for a database that will support
the enterprise’s mission statement of the enterprise and mission objectives for the required
database system.
Page 26
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Conceptual Database Design refers to the process of constructing a model of the data used in an
enterprise, independent of all physical considerations. Data model is built using the information in
users’ requirements specification. Conceptual data model is the source of information for logical
design phase.
Logical Database Design refers to the process of constructing a model of the data used in an
enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS
and other physical considerations.
Physical Database Design refers to the process of producing a description of the database
implementation on secondary storage. It describes base relations, file organizations, and indexes
used to achieve efficient access to data. Also describes any associated integrity constraints and
security measures. Tailored to a specific DBMS system.
Page 27
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
An entity set is a set of entities of the same type that share the same properties.
For example:
Set of all persons, companies, trees, holidays
Example:
customer=(customer-name, social-security, customer-street, customer-city)
account = (account-number, balance)
Attribute
An attribute describes the property of an entity or a relationship type.
Attribute Domain
Set of allowable values for one or more attributes.
Types of Attribute
Simple and composite attributes.
Single-valued and multi-valued attributes.
Null attributes.
Derived attributes.
NULL Attribute:
Represents value for an attribute that is currently unknown or not applicable for tuple.
Derived Attribute
Represents a value that is derivable from value of a related attribute, or set of attributes, not
necessarily in the same entity type.
Page 28
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Relational Keys
Superkey
a. An attribute or a set of attributes that is used to uniquely identify a tuple within a
relation.
Candidate Key
b. Superkey (K) such that no proper subset is a superkey within the relation.
c. In each tuple of R, values of K uniquely identify that tuple (uniqueness).
d. No proper subset of K has the uniqueness property (irreducibility).
Primary Key
e. Candidate key selected to identify tuples uniquely within relation.
Alternate Keys
f. Candidate keys that are not selected to be primary key.
Foreign Key
g. Attribute, or set of attributes, within one relation that matches candidate key of
some (possibly same) relation.
The Fig 4.1 illustrates an Entity-Relationship Diagram with all the entities, attributes and
relationships.
Fig 4.1: ER diagram of Staff and Branch entities and their attributes
Page 29
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
X (n,m)
Person Entity Type R
Relationship X ⊂ Y
Is-insured
Type
Entity type X is a subtype of type Y
Wedding Weak
Entity Type T1 T2
Structural Constraints
Cardinality: Describes maximum number of possible relationship occurrences for an entity
participating in a given relationship type.
Page 30
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Specialization / Generalization
Associated with special types of entities known as superclasses and subclasses, and the process
of attribute inheritance
Superclass
h. An entity type that includes one or more distinct subgroupings of its occurrences.
Subclass
i. A distinct subgrouping of occurrences of an entity type.
Superclass/subclass relationship is one-to-one (1:1). Superclass may contain
overlapping or distinct subclasses. Not all members of a superclass need be a
member of a subclass.
Attribute Inheritance
j. An entity in a subclass represents same ‘real world’ object as in superclass, and
may possess subclass-specific attributes, as well as those associated with the
superclass.
Page 31
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Fig 4.5: Specialization/generalization of Staff entity into subclasses representing job roles
Participation constraint :
Determines whether every member in superclass must participate as a member of a
subclass.
May be mandatory or optional.
Disjoint constraint:
Describes relationship between members of the subclasses and indicates whether
member of a superclass can be a member of one, or more than one, subclass.
May be disjoint or nondisjoint.
Page 32
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Mapping Cardinalities
Express the number of entities to which another entity can be associated via a
relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following
types:
k. One to one
Page 33
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
l. One to many
m. Many to one
n. Many to many
We distinguish among these types by drawing either a directed line (→), signifying
“one,” or an undirected line (—), signifying “many,” between the relationship set and
the entity set.
One-To-One Relationship
In the one-to-many relationship (a), a loan is associated with at most one customer via
borrower, a customer is associated with several (including 0) loans via borrower
In the many-to-one relationship (b), a loan is associated with several (including 0)
customers via borrower, a customer is associated with at most one loan via borrower
Page 34
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Many-To-Many Relationship
In a certain educational institution, a student can register for several courses, for instance, a
course in Calculus, Algebra, or Spanish History. A student has to register for at least one course.
Every course offered has a code and a title, and assigned to a classroom, a day of the week, and
a time at which it is conducted. Only one instructor teaches a course. However, an instructor can
teach several courses.
A student can be a day scholar or a resident scholar. A resident scholar is assigned to a hostel,
and a hostel room. Day scholars are assigned lockers to store their belongings while on campus.
During the course, the student is required to appear in several examinations. A course must have
at least one examination; a course can have several examinations. The examinations may be of
many types – tests, quizzes, seminars and projects. The date, time and place of these
examinations are assigned at the beginning of the course. The instructors evaluate the
examinations. Several instructors may correct the same set of answer scripts belonging to a
course. An instructor may have to correct the answer scripts for more than one examination. An
instructor may not correct any answer scripts. A record of the student’s marks has to be
maintained.
Page 35
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
Explain Relational Data Model
Implement Relational Operators in your queries
Convert E-R Model to Tables
Tables are required to have at least one column. Tables are not required to have rows. A table
with no rows is called an empty table. The process of inserting tuples for the very first time into a
table is called populating the table.
For each column of a table there is a set of possible values called its domain. The domain contains
all permissible values that can appear under that column. The order of the rows is irrelevant since
they are identified by their content and not by their position within the table. No two rows or tuples
are identical to each other in all their entries.
The notion of keys is the fundamental concept in the relational model. It provides the basic
mechanism for retrieving tuples within any table of the database. To distinguish a candidate key,no
two different tuples of the relation will have identical entries in all attributes of the key. The number
of attributes that comprises the key must be minimal
Since a relation may have more than one candidate key, one of these candidate keys should be
designated as the primary key (PK) of the relation. The values of the primary key can be used as
the identification and the addressing mechanism of the relation. Once a primary key has been
selected, the remaining candidate keys, if they exist are called alternate keys.
Page 36
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
A Relational Model allows only one primary key per table. A primary key may be composed of a
single attribute (single primary key) or more than one attribute (composite primary key)
.Attributes that are part of any key are called prime attributes. Since the primary key is
used to identify the tuples or rows of a relation, none of its attributes may be NULL. In a relation,
the NULL value is used to represent missing information, unknown, or inapplicable data. A NULL
value is not a zero value A NULL value doesn’t represent a particular value within the computer.
This imposes an additional condition or constraint on the keys known as the integrity constraint.
Because columns that have the same underlying domain can be used to relate tables of a
database, the concept of foreign key (FK) allows the DBMS to maintain consistency among the
rows of two relations or between the rows of the same relation. The attributes of a FK have the
same underlying domain as the set of attributes defined for the PK.
The FK values in any tuple of a relation are either NULL or must appear as the PK values of a
tuple of the corresponding relation.
Relational Operators
Relational Model is based on the relational algebra of mathematics .Relational operators are part
of the relational algebra which are mainly used to retrieve data from the tables.
Page 37
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Restrict
Works on a single relation R and defines a relation that contains only those tuples (rows) of R that
satisfy the specified condition (predicate).
Project
Projection
Works on a single relation R and defines a relation that contains a vertical subset of R, extracting
the values of specified attributes and eliminating duplicates.
Product
Works on two or more relations and produces a Cartesian product of the relations as the result.
Union
Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R
and S, duplicate tuples being eliminated. To be union-compatible both the tables should have the
same set of attributes. R and S must be union-compatible.
Intersect
Intersection
Defines a relation consisting of the set of all tuples that are in both R and S.
R and S must be union-compatible.
Difference
Works on two relations and retrieves data in the first relation which is not in the second relation.
Defines a relation consisting of the tuples that are in relation R, but not in S.
R and S must be union-compatible.
Product
Defines a relation that is the concatenation of every tuple of relation R with every tuple of relation
S.
Join Operations
Join is a derivative of Cartesian product. Equivalent to performing a Selection, using join predicate
as selection formula, over Cartesian product of the two operand relations.
Division
Defines a relation over the attributes C that consists of set of tuples from R that match combination
of every tuple in S.
Page 38
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Converting Relationships
The relationships representation is based on cardinality and degree
Cardinalities - 1:1, 1:M, M:N
Unary 1:1
Example: Employee who are also a couple
The primary key itself will become the foreign key in the same table
Unary 1:N
Example: Employee who is also a manager
The primary key itself will become the foreign key in the same table
Unary M:N
Example: Employee Guarantor
There will be two resulting tables. One to represent the entity and another to represent the many
side
Binary 1:1
Example: Employee head of Department
The primary key of the partial participant will become the foreign key of the total participant. The
primary key of either of the participants can become a foreign key in the other if the participation
types are uniform (both total or both partial)
Page 39
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Binary 1:M
Example: Teacher teaches Subject
The primary key of the relation on the “1” side of the relationship becomes a foreign key in the
relation on the “M” side
Binary M:N
Example: Books borrowed by Employee
A new table is created to represent the relationship
Contains two foreign keys – one from each of the participants in the relationship
The primary key of the new table is the combination of the two foreign keys
Ternary Relationship
Example: Doctor prescription contains Medicine given to Patients
The new table contains three foreign keys – one from each of the participating entities
The primary key of the new table is the combination of all three foreign keys
Page 40
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Session 6: Normalization
Learning Objectives
After completing this chapter, you will be able to:
Define Normalisation
Explain the role of Normalization in database design
Explain the steps in Normalization: 1NF, 2NF, 3NF, BCNF
Perform Normalisation
Introduction
We have considered the primary features of the relational database model. We have noted that
relations that form the database must satisfy some properties, for example, relations have no
duplicate tuples, tuples have no ordering associated with them, and each element in the relation is
atomic. Relations that satisfy these basic requirements may still have some undesirable properties,
for example, data redundancy and update anomalies. We illustrate these properties and study how
relations may be transformed or decomposed (or normalized) to eliminate them. Most such
undesirable properties do not arise if the database modeling has been carried out very carefully
using some technique like the Entity-Relationship Model
The central concept is the notion of functional dependency which depends on understanding the
semantics of the data and which deals with what information in a relation is dependent on what
other information in the relation. We will define the concept of functional dependency and discuss
how to reason with the dependencies. We will then show how to use the dependencies information
to decompose relations whenever necessary to obtain relations that have the desirable properties
that we want without loosing any of the information in the original relations.
Description of Normalization
Normalization is the process of organizing data in a database. This includes creating tables and
establishing relationships between those tables according to rules designed both to protect the
data and to make the database more flexible by eliminating two factors: redundancy and
inconsistent dependency.
Redundant data wastes disk space and creates maintenance problems. If data that exists in more
than one place must be changed, the data must be changed in exactly the same way in all
locations. A change in the customer is much easier to implement if that data is stored only in the
Customers table and nowhere else in the database.
What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table
for the address of a particular customer, it may not make sense to look there for the salary of the
employee who calls on that customer. The employee's salary is related to, or dependent on, the
employee and thus should be moved to the Employees table. Inconsistent dependencies can
make data difficult to access; the path to find the data may be missing or broken.
Page 41
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
There are a few rules for database normalization. Each rule is called a "normal form." If the first
rule is observed, the database is said to be in "first normal form." If the first three rules are
observed, the database is considered to be in "third normal form." Although other levels of
normalization are possible, third normal form is considered the highest level necessary for most
applications.
For example, in an Employee Recruitment table, a candidate's university name and address may
be included. But you need a complete list of universities for group mailings. If university information
is stored in the Candidates table, there is no way to list universities with no current candidates.
Create a separate Universities table and link it to the Candidates table with a university code key.
Table 6.1
Sno sname address Cno cname instructor office
85001 Smith 1, Main CP302 Database Gupta 102
85001 Smith 1, Main CP303 Communication Wilson 102
85001 Smith 1, Main CP304 Software Engg. Williams 1024
85005 Jones 12, 7th CP302 Database Gupta 102
The above table satisfies the properties of a relation and is said to be in first normal form (or 1NF).
Conceptually it is convenient to have all the information in one relation since it is then likely to be
easier to query the database. But the above relation has the following undesirable features:
1. Repetition of information --- A lot of information is being repeated. Student name, address,
course name, instructor name and office number are being repeated often. Every time we wish
to insert a student enrolment, say, in CP302 we must insert the name of the course CP302 as
well as the name and office number of its instructor. Also every time we insert a new
enrolment for, say Smith, we must repeat his name and address. Repetition of information
results in wastage of storage as well as other problems.
2. Update Anomalies --- Redundant information not only wastes storage but makes updates
more difficult since, for example, changing the name of the instructor of CP302 would require
that all tuples containing CP302 enrolment information be updated. If for some reason, all
tuples are not updated, we might have a database that gives two names of instructor for
subject CP302. This difficulty is called the update anomaly.
3. Insertion Anomalies -- Inability to represent certain information --- Let the primary key of the
above relation be (sno, cno). Any new tuple to be inserted in the relation must have a value for
the primary key since existential integrity requires that a key may not be totally or partially
NULL. However, if one wanted to insert the number and name of a new course in the
database, it would not be possible until a student enrolls in the course and we are able to
insert values of sno and cno. Similarly information about a new student cannot be inserted in
the database until the student enrolls in a subject. These difficulties are called insertion
anomalies.
4. Deletion Anomalies -- Loss of Useful Information --- In some instances, useful information may
be lost when a tuple is deleted. For example, if we delete the tuple corresponding to student
85001 doing CP304, we will lose relevant information about course CP304 (viz. course name,
instructor, office number) if the student 85001 was the only student enrolled in that course.
Similarly deletion of course CP302 from the database may remove all information about the
student named Jones. This is called deletion anomalies.
Page 42
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
The above problems arise primarily because the relation student has information about students
as well as subjects. One solution to deal with the problems is to decompose the relation into two or
more smaller relations.
Decomposition may provide further benefits, for example, in a distributed database different
relations may be stored at different sites if necessary. Of course, decomposition does increase the
cost of query processing since the decomposed relations will need to be joined, sometime
frequently.
The above relation may be easily decomposed into three relations to remove most of the above
undesirable properties:
S (sno, sname, address)
C (cno, cname, instructor, office)
SC (sno, cno)
Such decomposition is called normalization and is essential if we wish to overcome undesirable
anomalies. As noted earlier, normalization often has an adverse effect on performance. Data
which could have been retrieved from one relation before normalization may require several
relations to be joined after normalization. Normalization does however lead to more efficient
updates since an update that may have required several tuples to be updated before normalization
could well need only one tuple to be updated after normalization.
Although in the above case we are able to look at the original relation and propose a suitable
decomposition that eliminates the anomalies that we have discussed, in general this approach is
not possible. A relation may have one hundred or more attributes and it is then almost impossible
for a person to conceptualize all the information and suggest a suitable decomposition to
overcome the problems. We therefore need an algorithmic approach to finding if there are
problems in a proposed database design and how to eliminate them if they exist.
There are several stages of the normalization process. These are called the first normal form
(1NF), the second normal form (2NF), the third normal form (3NF), Boyce-Codd normal form
(BCNF), the fourth normal form (4NF) and the fifth normal form (5NF). For all practical purposes,
3NF or the BCNF are quite adequate since they remove the anomalies discussed above for most
common situations. It should be clearly understood that there is no obligation to normalize
relations to the highest possible level. Performance should be taken into account and this may
result in a decision not to normalize, say, beyond second normal form.
Intuitively, the second and third normal forms are designed to result in relations such that each
relation contains information about only one thing (either an entity or a relationship). That is, non-
key attributes in each relation must provide a fact about the entity or relationship that is being
identified by the key. Again, a sound E-R model of the database would ensure that all relations
either provide facts about an entity or about a relationship resulting in the relations that are
obtained being in 3NF.
Page 43
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
It should be noted that decomposition of relations has to be always based on principles that ensure
that the original relation may be reconstructed from the decomposed relations if and when
necessary. If we are able to reduce redundancy and not loose any information, it implies that all
that redundant information can be derived given the other information in the database. Therefore
information that has been removed must be related or dependent on other information that still
exists in the database. That is why the concept of redundancy is important. Careless
decomposition of a relation can result in loss of information. We will discuss this in detail later.
Role of Normalization
In the E-R model, a conceptual schema using an entity- relationship diagram is built and then
mapped to a set of relations. This technique ensures that each entity has information about only
one thing and once the conceptual schema is mapped into a set of relations, each relation would
have information about only one thing. The relations thus obtained would normally not suffer from
any of the anomalies. This bottom-up approach is likely to lead to a relation that is likely to suffer
from all the problemsn. For example, the relation is highly likely to have redundant information and
update, deletion and insertion anomalies. Normalization of such large relation will then be essential
to avoid (or at least minimize) these problems.
Now to define the normal forms more formally, we first need to define the concept of functional
dependence.
Functional Dependency
Consider a relation R that has two attributes A and B. The attribute B of the relation is functionally
dependent on the attribute A if and only if for each value of A no more than one value of B is
associated. In other words, the value of attribute A uniquely determines the value of B and if there
were several tuples that had the same value of A then all these tuples will have an identical value
of attribute B.
For example, in the student database that we have discussed earlier, we have the following
functional dependencies:
sno -> sname
sno -> address
cno -> cname
cno -> instructor
instructor -> office
These functional dependencies imply that there can be only one name for each sno, only one
address for each student and only one subject name for each cno. It is of course possible that
several students may have the same name and several students may live at the same address. If
we consider cno -> instructor, the dependency implies that no subject can have more than one
instructor (perhaps this is not a very realistic assumption). Functional dependencies therefore
place constraints on what information the database may store. In the above example, one may be
wondering if the following FDs hold
sname -> sno
cname -> cno
Page 44
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
A relation is in 1NF if and only if all underlying domains contain atomic values only.
The first normal form deals only with the basic structure of the relation and does not resolve the
problems of redundant information or the anomalies discussed earlier. All relations discussed in
these notes are in 1NF.
The attribute dob is the date of birth and the primary key of the relation is sno with the functional
dependencies sno -> sname and sno -> dob. The relation is in 1NF as long as dob is considered
an atomic value and not consisting of three components (day, month, year). The above relation of
course suffers from all the anomalies that we have discussed earlier and needs to be normalized.
(add example with date of birth)
A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate
key of the relation.
To understand the above definition of 2NF we need to define the concept of key attributes. Each
attribute of a relation that participates in at least one candidate key of is a key attribute of the
relation. All other attributes are called non-key.
The concept of 2NF requires that all attributes that are not part of a candidate key be fully
dependent on each candidate key. If we consider the relation
student (sno, sname, cno, cname)
Page 45
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
and assume that (sno, cno) is the only candidate key (and therefore the primary key), the relation
is not in 2NF since sname and cname are not fully dependent on the key. The above relation
suffers from the same anomalies and repetition of information as discussed above since sname
and cname will be repeated. To resolve these difficulties we could remove those attributes from
the relation that are not fully dependent on the candidate keys of the relations. Therefore we
decompose the relation into the following projections of the original relation:
S1 (sno, sname)
S2 (cno, cname)
SC (sno, cno)
Use an example that leaves one relation in 2NF but not in 3NF.
We may recover the original relation by taking the natural join of the three relations.
If however we assume that sname and cname are unique and therefore we have the following
candidate keys
(sno, cno)
(sno, cname)
(sname, cno)
(sname, cname)
The above relation is now in 2NF since the relation has no non-key attributes. The relation still has
the same problems as before but it then does satisfy the requirements of 2NF. Higher level
normalization is needed to resolve such problems with relations that are in 2NF and further
normalization will result in decomposition of such relations.
A relation R is in third normal form if it is in 2NF and every non-key attribute of R is non-transitively
dependent on each candidate key of R.
To understand the third normal form, we need to define transitive dependence which is based on
one of Armstrong's axioms. Let A, B and C be three attributes of a relation R such that A -> B and
B -> C. From these FDs, we may derive A -> C. As noted earlier, this dependence A -> C is
transitive.
Page 46
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
The 3NF differs from the 2NF in that all non-key attributes in 3NF are required to be directly
dependent on each candidate key of the relation. The 3NF therefore insists, in the words of Kent
(1983) that all facts in the relation are about the key (or the thing that the key identifies), the whole
key and nothing but the key. If some attributes are dependent on the keys transitively then that is
an indication that those attributes provide information not about the key but about a kno-key
attribute. So the information is not directly about the key, although it obviously is related to the key.
Consider the following relation
Assume that cname is not unique and therefore cno is the only candidate key. The following
functional dependencies exist
We can derive cno -> office from the above functional dependencies and therefore the above
relation is in 2NF. The relation is however not in 3NF since office is not directly dependent on cno.
This transitive dependence is an indication that the relation has information about more than one
thing (viz. course and instructor) and should therefore be decomposed. The primary difficulty with
the above relation is that an instructor might be responsible for several subjects and therefore his
office address may need to be repeated many times. This leads to all the problems that we
identified at the beginning of this chapter. To overcome these difficulties we need to decompose
the above relation in the following two relations:
s(cno, cname)
inst(instructor, office)
si(cno, instructor)
The decomposition into three relations is not necessary since the original relation is based on the
assumption of one instructor for each course.
The 3NF is usually quite adequate for most relational database designs. There are however some
situations, for example the relation student(sno, sname, cno, cname) discussed in 2NF above,
where 3NF may not eliminate all the redundancies and inconsistencies. The problem with the
relation student(sno, sname, cno, cname) is because of the redundant information in the candidate
keys. These are resolved by further normalization using the BCNF.
Page 47
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Since the relation has no non-key attributes, the relation is in 2NF and also in 3NF, in spite of the
relation suffering the problems that we discussed at the beginning of this chapter.
The difficulty in this relation is being caused by dependence within the candidate keys. The second
and third normal forms assume that all attributes not part of the candidate keys depend on the
candidate keys but does not deal with dependencies within the keys. BCNF deals with such
dependencies.
It should be noted that most relations that are in 3NF are also in BCNF. Infrequently, a 3NF
relation is not in BCNF and this happens only if
a. the candidate keys in the relation are composite keys (that is, they are not single
attributes),
b. there is more than one candidate key in the relation, and
c. the keys are not disjoint, that is, some attributes in the keys are common.
The BCNF differs from the 3NF only when there are more than one candidate keys and the keys
are composite and overlapping. Consider for example, the relationship
Let us assume that the relation has the following candidate keys:
(sno, cno)
(sno, cname)
(sname, cno)
(sname, cname)
(we have assumed sname and cname are unique identifiers). The relation is in 3NF but not in
BCNF because there are dependencies
Page 48
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
These difficulties may be overcome by decomposing the above relation in the following three
relations:
(sno, sname)
(cno, cname)
(sno, cno, date-of-enrolment)
We now have a relation that only has information about students, another only about subjects and
the third only about enrolments. All the anomalies and repetition of information have been
removed.
Page 49
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
2. Classify the following relations as unnormalised, INF, 2NF or 3NF. State any assumptions you
make. If the relation is not in 3NF, normalise to 3NF.
a. EMPLOYEE (EMP# , EMPNAME, JOBCODE)
b. EMPLOYEE (EMP# , EMPNAME, (JOBCODE, YEARS))
c. EMPLOYEE (EMP# , EMPNAME, JOBCODE, JOB DESCRIPTION)
d. EMPLOYEE (EMP# , EMPNAME, PROJECT , HRS WORK)
Page 50
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
Write SQL – Data Definition Language
Write SQL – Data Manipulation Language
Perform data retrieval using SQL Select statements
Introduction to SQL
Structured Query Language - (SQL) is the most widely used commercial relational database
language. The SQL has several parts:
DML – The Data Manipulation Language (DML)
DDL – The Data Definition Language (DDL)
Embedded and dynamic SQL
Security
Transaction management
Client-server execution and remote database access
Objectives of SQL
Ideally, database language should allow user to:
Create the database and relation structures;
Perform insertion, modification, deletion of data from relations;
Perform simple and complex queries.
It must perform these tasks with minimal user effort and command structure/syntax must be easy
to learn.
SQL is relatively easy to learn:
It is non-procedural - you specify what information you require, rather than how to get
it;
It is essentially free-format.
History of SQL
In 1974, D. Chamberlin (IBM San Jose Laboratory) defined language called
‘Structured English Query Language’ (SEQUEL).
Still pronounced ‘see-quel’, though official pronunciation is ‘S-Q-L’.
In late 70s, ORACLE appeared and was probably first commercial RDBMS based on
SQL.
In 1987, ANSI and ISO published an initial standard for SQL.
Page 51
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 52
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Required Data
position VARCHAR(10) NOT NULL
Domain Constraints
(a) CHECK
Gender CHAR NOT NULL
CHECK (Gender IN (‘M’, ‘F’))
searchCondition can involve a table lookup:
CREATE DOMAIN BranchNo AS CHAR (4)
CHECK (VALUE IN (SELECT branchNo FROM Branch));
Domains can be removed using DROP DOMAIN:
DROP DOMAIN DomainName
[RESTRICT | CASCADE]
Entity Integrity
Primary key of a table must contain a unique, non-null value for each row.
ISO standard supports FOREIGN KEY clause in CREATE and ALTER TABLE
statements:
PRIMARY KEY (staffNo)
PRIMARY KEY (clientNo, propertyNo)
Can only have one PRIMARY KEY clause per table. Can still ensure uniqueness for
alternate keys using UNIQUE:
Page 53
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
UNIQUE(telNo)
Referential Integrity
FK is column or set of columns that links each row in child table containing foreign FK
to row of parent table containing matching PK.
Referential integrity means that, if FK contains a value, that value must refer to
existing row in parent table.
ISO standard supports definition of FKs with FOREIGN KEY clause in CREATE and
ALTER TABLE:
FOREIGN KEY(branchNo) REFERENCES Branch
Any INSERT/UPDATE attempting to create FK value in child table without matching
CK value in parent is rejected.
Action taken attempting to update/delete a CK value in parent table with matching
rows in child is dependent on referential action specified using ON UPDATE and ON
DELETE subclauses:
dd. CASCADE - SET NULL
ee. SET DEFAULT - NO ACTION
CASCADE: Delete row from parent and delete matching rows in child, and so on in cascading
manner.
SET NULL: Delete row from parent and set FK column(s) in child to NULL. Only valid if FK
columns are NOT NULL.
SET DEFAULT: Delete row from parent and set each component of FK in child to specified
default. This is valid only if DEFAULT specified for FK columns.
CREATE TABLE
CREATE TABLE
Creates a table with one or more columns of the specified dataType.
With NOT NULL, system rejects any attempt to insert a null in the column.
Can specify a DEFAULT value for the column.
Primary keys should always be specified as NOT NULL.
FOREIGN KEY clause specifies FK along with the referential action.
Page 54
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
ALTER TABLE
ALTER TABLE
Add a new column to a table.
Drop a column from a table.
Add a new table constraint.
Drop a table constraint.
Set a default for a column.
Drop a default for a column.
ALTER TABLE
Change Staff table by removing default of ‘Assistant’ for position column and setting default for sex
column to female (‘F’).
ALTER TABLE Staff
Modify position DROP DEFAULT;
ALTER TABLE Staff
MODIFY sex SET DEFAULT ‘F’;
Page 55
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
ALTER TABLE
Remove constraint from PropertyForRent that staff are not allowed to handle more than 100
properties at a time. Add new column to Client table.
DROP TABLE
DROP TABLE TableName [RESTRICT | CASCADE]
INSERT
Page 56
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Example:
ii. Insert a new row into Staff table supplying data for all columns.
iii. Insert a new row into Staff table supplying data for all mandatory columns
INSERT … SELECT
Second form of INSERT allows multiple rows to be copied from one or more tables to another:
Assume there is a table StaffPropCount that contains names of staff and number of properties they
manage:
UPDATE
Syntax:
UPDATE TableName
SET columnName1 = dataValue1
[, columnName2 = dataValue2...]
[WHERE searchCondition]
Page 57
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
UPDATE Staff
SET salary = salary*1.03;
UPDATE Staff
SET salary = salary*1.05
WHERE position = ‘Manager’;
UPDATE Staff
SET position = ‘Manager’, salary = 18000
WHERE staffNo = ‘SG14’;
DELETE
Syntax:
DELETE FROM TableName
[WHERE searchCondition]
Page 58
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Syntax:
SELECT [DISTINCT | ALL]
{* | [columnExpression [AS newName]] [,...] }
FROM TableName [alias] [, ...]
[WHERE condition]
[GROUP BY columnList]
[HAVING condition]
[ORDER BY columnList]
FROM Specifies table(s) to be used.
WHERE Filters rows.
GROUP BY Forms groups of rows with same column value.
HAVING Filters groups subject to some condition.
SELECT Specifies which columns are to appear in output.
ORDER BY Specifies the order of the output.
Order of the clauses cannot be changed.
Only SELECT and FROM are mandatory.
SELECT staffNo, fName, lName, address, position, sex, DOB, salary, branchNo
FROM Staff;
Page 59
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 60
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 61
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 62
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 63
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 64
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
SELECT clientNo, viewDate FROM Viewing WHERE propertyNo = ‘PG4’ AND comment IS NULL;
Negated version (IS NOT NULL) can test for non-null values.
Page 65
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Four flats in this list - as no minor sort key specified, system arranges these rows in
any order it chooses.
To arrange in order of rent, specify minor order:
Page 66
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 67
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 68
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 69
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Subqueries
Some SQL statements can have a SELECT embedded within them.
A subselect can be used in WHERE and HAVING clauses of an outer SELECT, where
it is called a subquery or nested query.
Subselects may also appear in INSERT, UPDATE, and DELETE statements.
Inner SELECT finds branch number for branch at ‘163 Main St’ (‘B003’).
Outer SELECT then retrieves details of all staff who work at this branch.
Outer SELECT then becomes:
Page 70
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Subquery Rules
ORDER BY clause may not be used in a subquery (although it may be used in
outermost SELECT).
Subquery SELECT list must consist of a single column name or expression, except for
subqueries that use EXISTS.
By default, column names refer to table name in FROM clause of subquery. Can refer
to a table in FROM using an alias.
When subquery is an operand in a comparison, subquery must appear on right-hand
side.
A subquery may not be used as an operand in an expression.
Page 71
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
2. Can you increase the size of a column using ALTER table command?
Answers
1. AS
2. YES
3. No
4. No
5. No
Page 72
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
Explain SQL Joins
Describe other database Objects – Indexes, Sequences, and Views
SQL JOINS
Joining Tables
To appropriately join tables, the tables must be related and we apply a where clause which
equates the primary key column of the table on the one side of the relationship with the parallel
foreign key column of the many side table.
Joining Two Tables - SQL will compare every row of the 1st table with the first row of the 2nd
table. Then it will compare all rows of the 1st with the second row of the second, and so on only
rows where the condition is met are placed in the result table.
Page 73
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
SQL Indexes
An Index is a schema object that can speed up the retrieval of rows by using a pointer. Indexes
can be created explicitly or automatically. If there is no index on the column then a full table scan
occurs. An index provides fast access to rows in a table. Indexes are logically and physically
independent of the table they index. This means that they can be created or dropped at any time
and have no effect on the base tables or other indexes. When a table is dropped the
corresponding indexes are also dropped.
When to create an index :
If the column contains wide range of values
If the column contains large number of null values
If one or more columns are frequently used together in a WHERE clause or join
condition
To create an index
SQL Sequences
A sequence is a user created database object that can be shared by multiple users to generate
unique integers. A typical usage for sequences is to create primary key value which must be
unique for each row.
CREATE SEQUENCE sequence
[INCREMENT BY n]
[START WITH n]
[{MAXVALUE n I NOMAXVALUE}]
[{MINVALUE n I NOMINVALUE}]
[{CYCLE I NOCYCLE}]
[{CACHE I NOCACHE}];
SQL Views
A View is a dynamic result of one or more relational operations operating on base relations to
produce another relation.
Virtual relation that does not necessarily actually exist in the database but is produced
upon request, at time of request.
Contents of a view are defined as a query on one or more base relations.
With view resolution, any operations on view are automatically translated into
operations on relations from which it is derived.
With view materialization, the view is stored as a temporary table, which is maintained
as the underlying base tables are updated.
Page 74
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 75
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Restrictions on Views
SQL imposes several restrictions on creation and use of views.
(a) If column in view is based on an aggregate function:
Column may appear only in SELECT and ORDER BY clauses of queries that access
view.
Column may not be used in WHERE nor be an argument to an aggregate function in
any query based on view.
For example, following query would fail:
SELECT COUNT(cnt)
FROM StaffPropCnt;
Similarly, following query would also fail:
SELECT *
FROM StaffPropCnt
WHERE cnt > 2;
(b) Grouped view may never be joined with a base table or a view.
For example, StaffPropCnt view is a grouped view, so any attempt to join this view
with another table or view fails.
2. How many join conditions are needed for joining 3 tables? Ans. 2
3. Does the select statement retrieve rows directly from the tables? Ans. No
Page 76
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
Explain the File Organisation
Explain the Performance Tuning of the Database
File Organization
Database is stored internally in the secondary storage devices like tapes and disks. Since disks
are random access devices, we will see the file organization of database in the disks. Disks are
divided into concentric circular tracks on each disk surface. Track capacities vary typically from 4
to 50 Kbytes. Because a track usually contains a large amount of information, it is divided into
smaller blocks or sectors. The division of a track into sectors is hard-coded on the disk surface and
cannot be changed.
A track is divided into blocks. The block size B is fixed for each system. Typical block sizes range
from B=512 bytes to B=4096 bytes. Whole blocks are transferred between disk and main memory
for processing.
Records
The records can be either Fixed or variable length records. When records contain fields which
have values of a particular type (e.g., amount, date, time, age) they are called fixed length records
while fields themselves may be variable length such that of an address field, are called variable
length record.
Files of Records
A file is a sequence of records, where each record is a collection of data values (or data items).A
file descriptor (or file header ) includes information that describes the file, such as the field names
and their data types, and the addresses of the file blocks on disk. Records are stored on disk
blocks. The blocking factor for a file is the (average) number of file records stored in a disk block. A
file can have fixed-length records or variable-length records. File records can be unspanned (no
record can span two blocks) or spanned (a record can be stored in more than one block).The
physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or
indexed. In a file of fixed-length records, all records have the same format. Usually, unspanned
blocking is used with such files. Files of variable-length records require additional information to be
stored in each record, such as separator characters and field types. Usually spanned blocking is
used with such files.
So for a database to store data, it needs computer system as well as operating system. The
aspect of storing data and file management is taken care of by the Operating system. So the two
main components of an OS which is needed for a DBMS are file manager and disk manager.
Page 77
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
When a new row has to be inserted the file manager requests the disk manager who in turn
allocates certain memory space to the file manager. The file manager interacts with the DBMS and
the disk manager interacts with the low level and does the logical to the physical mapping of
address.
DBMS/Host inter-communication
DBMS
File Request
File Manager
Database Tuning
Tuning refers to the process of continuing to revise/adjust the physical database design by
monitoring resource utilization as well as internal DBMS processing to reveal bottlenecks such as
contention for the same data or devices.
Page 78
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Tuning Queries
Indications for tuning queries
A query issues too many disk accesses
The query plan shows that relevant indexes are not being used.
Page 79
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
If multiple options for join condition are possible, choose one that uses a clustering
index and avoid those that contain string comparisons.
The order of tables in the FROM clause may affect the join processing.
Some query optimizers perform worse on nested queries compared to their equivalent
un-nested counterparts.
Many applications are based on views that define the data of interest to those
applications. Sometimes these views become an overkill.
2. What is a block?
Page 80
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Learning Objectives
After completing this chapter, you will be able to:
Identify the latest trends in the Database Arena
db
Site A Site D
Communication
Network
Site B Site C
Fig 14.1: Centralized Databases
Page 81
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
In Distributed Database Environment data is stored at different i.e.. more than one location
Site A Site C
db db
Network
db Site B
A fully DDBMS governs the storage and processing of logically related data over interconnected
computer systems in which both data and processing function are distributed among several sites.
Advantages of DDBMS
Faster access to data.
Improved communication.
Cost effective – as cost of dedicated communication lines and mainframes is reduced.
Less danger of single point failure.
Processor independence – available processor may process the user’s request.
Disadvantages of DDBMS
Complex in data management and control – distributed data across various locations
and maintaining location transparency is difficult. Transaction management,
concurrency control, security, backup, recovery, query optimization must all be
addressed and resolved.
Lack of standard protocol for communication.
Types of DDBMS
Homogeneous DDBMS – In this type of DDBMS all the servers on which data is
distributed run the same DBMS software.
Heterogeneous DDBMS - In this type of DDBMS all the servers on which data is
distributed run different DBMS software I.e. a multidatabase system. Such networks
must support standards for gateway protocol. A gateway protocol is an API
(Application Programming Interface) that exposes DBMS functionality external
applications. Eg ODBC and JDBC. However gateway is an additional processing layer
that can be expensive.
Page 82
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Data Fragmentation
A single object can be broken into two or more segments or fragments. The details about
fragments are maintained in the distributed data catalog (DDC).
Data Fragmentation may be
Horizontal – subset of tuples
Vertical – attribute subset.(Key repeated)
Mixed
Data Replication
Storing the same data copies at multiple location or sites is called data replication.
Replication of data is subjected to mutual consistency rule which means they must be identical.
This increases availability of data and faster query evaluation.
Data Warehousing
Since 1970s, organizations gained competitive advantage through systems that automate
business processes to offer more efficient and cost-effective services to the customer. This
resulted in accumulation of growing amounts of data in operational databases. Organizations now
focus on ways to use operational data to support decision-making, as a means of gaining
competitive advantage. However, operational systems were never designed to support such
business activities.
Businesses typically have numerous operational systems with overlapping and sometimes
contradictory definitions. Organizations need to turn their archives of data into a source of
knowledge, so that a single integrated / consolidated view of the organization’s data is presented
to the user.
Page 83
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
A data warehouse was deemed the solution to meet the requirements of a system capable of
supporting decision-making, receiving data from multiple operational data sources. Data
warehouse can be thought of as a subject-oriented, integrated, time-variant, and non-volatile
collection of data in support of management’s decision-making process (Inmon, 1993).
The warehouse is organized around the major subjects of the enterprise (e.g. customers, products,
and sales) rather than the major application areas (e.g. customer invoicing, stock control, and
product sales). This is reflected in the need to store decision-support data rather than application-
oriented data. The data warehouse integrates corporate application-oriented data from different
source systems, which often includes data that is inconsistent.
The integrated data source must be made consistent to present a unified view of the data to the
users. Data in the warehouse is only accurate and valid at some point in time or over some time
interval. Time-variance is also shown in the extended time that the data is held, the implicit or
explicit association of time with all data, and the fact that the data represents a series of
snapshots. Data in the warehouse is not updated in real-time but is refreshed from operational
systems on a regular basis. New data is always added as a supplement to the database, rather
than a replacement.
Page 84
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 85
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Page 86
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Glossary
Attribute
An attribute is a named column of a relation.
Candidate key:
An attribute or group of attributes that identifies a unique row in a relation. One of the candidate
keys is chosen to be the primary key.
Cardinality
The cardinality of a relation is the number of tuples it contains.
Child:
A row, record, or node on the “many” side of a one-to-many relationship.
Composite key
A key of a relation that consists of two or more columns.
Concurrency
A condition in which two or more transactions are processed against a database at the same time.
In a single CPU system, the changes are interleaved; in a multi-CPU system, the transactions can
be processed simultaneously, and the changes on the database server are interleaved.
Constraint
A rule concerning the allowed values of attributes whose truth can be evaluated. A constraint
usually does not include dynamic rules such as “Salesperson Pay can never decrease” or “Salary
now must be greater than Salary last quarter.”
Database
A shared collection of logically related data (and a description of this data), designed to meet the
information needs of an organization.
DBMS
A software system that enables users to define, create, and maintain the database and provides
controlled access to this database.
Data dictionary
A user-accessible catalog of database and application metadata. An active data dictionary is a
dictionary whose contents are updated automatically by the DBMS whenever changes are made to
Page 87
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
the database or application structure. A passive data dictionary is one whose contents must be
updated manually when changes are made.
Data integrity
The state of a database in which all constraints are fulfilled; usually refers to interrelation
constraints in which the value of a foreign key must be present in the table having that foreign key
as its primary key.
Deletion anomaly
In a relation, the situation in which the removal of one row of a table deletes facts about two or
more themes.
Denormalization
The process of intentionally designing a relation that is not normalized. Denormalization is done to
improve performance or security.
Determinant
One or more attributes that functionally determine another attribute or attributes. In the functional
dependency (A, B) ? C, the attributes (A, B) are the determinant
Distributed database
A database that is stored and processed on two or more computers.
Domain
(1) The set of all possible values an attribute can have. (2) A description of the format (data type,
length) and the semantics (meaning) of an attribute.
Entity
(1) Something of importance to a user that needs to be represented in a database. In an entity-
relationship model, entities are restricted to things that can be represented by a single table.
File-based System
A collection of application programs that perform services for the end-users such as the production
of reports. Each program defines and manages its own data.
Foreign key:
An attribute that is a key of one or more relations other than the one in which it appears.
Index
Overhead data used to improve access and sorting performance. Indexes can be
constructed for a single column or groups of columns. They are especially useful for
columns used for control breaks in reports and to specify conditions in joins.
Page 88
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
Join
A relational algebra operation on two relations, A and B, that produces a third relation, C. A row of
A is concatenated with a row of B to form a new row in C if the rows in A and B meet restrictions
concerning their values. For example, A1 is an attribute in A, and B1 is an attribute in B. The join of
A with B in which A1, B1 will result in a relation C having the concatenation of rows in A and B in
which the value of A1 is less than the value of B1
Normalization
The process of evaluating a relation to determine whether it is in a specified normal form and, if
necessary, of converting it to relations in that specified normal form.
Null value
An attribute value that has never been supplied. Such values are ambiguous and can mean (a) the
value is unknown, (b) the value is not appropriate, or (c) the value is known to be blank.
Primary key
A candidate key selected to be the key of a relation.
Program/data independence
The condition existing when the structure of the data is not defined in application programs.
Rather, it is defined in the database, and the application programs obtain it from the DBMS. In this
way, changes can be made in the data structures that might not necessarily be made in the
application programs.
Primary key
A candidate key selected to be the key of a relation.
Program/data independence
The condition existing when the structure of the data is not defined in application programs.
Rather, it is defined in the database, and the application programs obtain it from the DBMS. In this
way, changes can be made in the data structures that might not necessarily be made in the
application programs.
Weak entity
In an entity-relationship model, an entity whose logical existence in the database depends on the
existence of another entity. See ID-dependent entity and Strong entity.
Page 89
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
References
Websites
www.cs.usaka.ca
www.cs.auckland.ac.nz
www.cs.purdue.edu
www.stanford.edu
www.ocw.mit.edu
www.w3schools.com
www.wps.prenhall.com
www.cs.sfu.ca
Books
Thomas Connolly and Carolyn Begg, “Database Systems – A Practical Approach to
Design, Implementation and Management” , Pearson Education , 4th Edition
Elmasri and Navathe, “Fundamentals of Database Systems”, Pearson Education
Edition, 4th Edition
Korth, Sudarshan and Silberschatz, “Database System Concepts”, Mc Graw hill
Publications, 5th Edition
C J Date, “An Introduction to Database Management”, Addison Wesely Publications,
th
6 Edition.
Page 90
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Database Management System
STUDENT NOTES:
Page 91
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected