Unit I Dbms

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 45

1

CHAPTER 1
INTRODUCTION

Overview of DBMS (Database Management System)


DBMS is generally defined as a collection of logically related data and a set of programs
to access the data. Strictly speaking, this is definition of “Database System”, which
comprises of two components i.e. (i) Database and (ii) DBMS.

USER QUERIES

Query Processing
Software

DBMS
Storage Management
Software

DATABAS
E
SYSTEM
Schema DATA
Definition

DATABASE

DATABASE A Database is a collection of logically related data that can be recorded.


The information stored in the database must have the following implicit properties:-

(a) It must represent some real-world aspect; like a college or a company etc.
The aspect represented by the database is called its “Mini-world”.

(b) It must comprise a logically coherent collection of data, which should


have well-understood inherent meaning (semantics).

(c) The repository of data must be designed, developed and implemented for a
specific purpose. There must exist an intended group of users, who must have
some pre-conceived applications of the data.
2

A Database System will have the following major organs:-

- Sources of information, from where it derives its data.


- Some related real-world events, which influence its data.
- Some intended users, who would be interested in its data.

For example, in the college database, sources of information will be students, faculty,
labs etc. The real-world events affecting the information in the database will be
admissions, exams, results & placements etc. The set of intended users will be faculty,
students, admin staff etc.

Database Management System (DBMS)

A Database Management System (DBMS) refers to a set of programs for defining,


creation, maintenance and manipulation of a database. A DBMS must facilitate the
following major functions:-

- Defining of Database Schema:- The DBMS must facilitate defining the


database structure i.e. defining of data types, relationships amongst the data and
specification of the integrity constraints to be enforced on the database. It should
also facilitate specifying the access rights of authorized users.

- Manipulation of the Database:- The DBMS must facilitate functions like:-

Insertion of new data into the database


Update of changed information
Deletion of data, which might have been rendered defunct
Reading of stored information, including generation of reports

- Sharing of a database The DBMS must enable concurrent access of shared


data items by multiple users, while preserving the consistency of the
database.

- Protection of a database The DBMS must protect the database against


unauthorized/ malicious access.

- Database Recovery In the event of system failures, DBMS must facilitate


database recovery.

File Processing System

Before the evolution of DBMS, dedicated systems known as “File-Processing


Systems” were in vogue to handle the data repositories of organizations. Such systems
3

needed a dedicated set of application programs, to add information to the files, to extract
information from the files and to update the existing information. The structure of the
files used to be hard-coded in the application programs. Normally, such application
programs used to be written by different programmers in different languages, as and
when need arose. Also, the same information used to be stored at multiple places, in
different formats, on different machines, which were not even interconnected.

Limitations of a File Processing System

(i) Data Redundancy and Inconsistency:- Since the same information is stored
at multiple places, it causes data-inconsistency problems during updates.

(ii) Difficulty in Accessing of Data Suppose there exists some information in the
files, but the existing set of application programs do not support extraction of that
information. Under such situations, the application programs need to be updated and this
is very inconvenient, time consuming and costly solution.

(iii) Data Isolation The information is scattered over a large number of files,
on a number of stand-alone (not networked) machines, making it very difficult to process
certain queries, which need information to be extracted from multiple locations.

(iv) Difficulty in Enforcing Integrity Constraints: Enforcing of integrity


constraints has to be handled at application program level, making the programs very
complex. The redundancy of information makes this task all the more difficult.

(v) Atomicity Problems: Since the information needed to rollback a transaction may
not be readily available in a file-processing systems, ensuring atomicity of transactions
will be difficult.

(vi) Difficulty in Concurrency Control: It is complex to build in the


concurrency control features at the application programs level.

(vii) Security Problems: Since the information is scattered and does not have
centralized access path, effective enforcing of user access rights will not be fail-proof.

Features of a Database System


A DBMS will support the following features:-

(a) Data Dictionary A Database System will support a Data Dictionary (or Data
Directory or DBMS Catalog), which contains information like Data Types,
Relationships amongst the data and Data Constraints of the underlying
database. In addition, it also contains the information about Authorized Users
of the database like their Access Rights. Since, this information defines the
nature of the data stored in the database, it is called metadata (data about
4

data). This information makes the DBMS software independent of its


underlying database. When a need arises to change the structure of data, no
changes need to be made to the DBMS software; only the dictionary is
updated to reflect the changes. Whereas in a file processing system, the
application programs would need to be changed. Also, this feature makes the
DBMS software generic. The same DBMS can be used for different
organization having entirely different set of data; the distinguishing feature
will be the information stored in the Data Directory. This feature of DBMS is
generally referred to as ‘Self Describing Nature of a Database’, since the
information stored in the Data Dictionary fully describes the nature of the data
stored in the Database.

(b) Storage Management DBMS supports a File Manager to manage the


allocation of disk space for the DBMS files. Also, it supports a Buffer
Manager to manage the memory buffers, used for processing database
information. Whenever, some information is to be updated, it is first read from
the files into the buffer, where it is manipulated and then the updated
information is written back into the files.

(c) Language Interfaces DBMS supports language interfaces with 4GL


languages like PL/SQL for data manipulation applications.

(d) Transaction Management DBMS ensures atomicity of transaction


processing. A Transaction, when executed transforms the database from one
consistent state to another consistent state. During its execution, a log is
maintained in a system Log File of all the operations performed by the
Transaction. If a Transaction fails during its execution, then the log file is
used to rollback the transaction during recovery of the database. This ensures
atomicity of transaction processing.

(e) Concurrency control DBMS will support concurrency control tools for
permitting multiple users or application programs to access the database
concurrently, while preserving the consistency of database.

(f) Security Management Security Mechanism of a Database System will


ensure that only the authorized users can access the database; and that too
only to the extent, which is explicitly authorized by the Database
Administrator. The authorized Access Rights are explicitly stored in the Data
Dictionary. The access by each user and the type of operations performed on
various data will be monitored and controlled by the DBMS. This will protect
database against the authorized/ malicious access.

(g) Database Recovery Since DBMS maintains a log of all transactions


being executed, it will enable recovery of the underlying database, in the event
of failures. For example, if a Transaction fails during its execution, it is rolled
back to initial state; thus reverting back to the consistent state that existed
5

prior to the commencement of the failed transaction. This is made possible by


the information stored in the system log file. Also, DBMS will support taking
of periodic backups, which are used to recover databases in case of
catastrophic failures; like Disk Crash.

DATA MODELS

A Data Model defines the underlying structure of a Database. It comprises a collection of


conceptual tools for describing the Data, the Data Relationships, the Data Semantics and
the Data Integrity Constraints.

CATEGORIES OF DATA MODELS

Basically, there are three categories of data models:-

(a) Object Based Logical Models.


(b) Record Based Logical Models.

(a) Object Based Logical Models.

The Object Based Logical Models view the universe as a collection of objects.
(i) Entity-Relationship Model.

- An Entity refers to a real-world ‘object’ or a ‘concept’ that is distinguishable


from other objects and other concepts in the real-world. For example, a
person, a bank-account, a payment are all entities of different kinds.

- An Entity will have a set of properties, known as Attributes; for example, the
Entity “Account” may have attributes like “Account-Number”, “Current-
Balance” etc

- Each attribute will have a set of permitted values, called its Domain; for
example the domain of Balance of an account can be the set of +ve real
numbers.

- A collection of entities of the same kind, having same set of attributes, is


called an “Entity Set”.

- A relationship refers to the association amongst entities. For example, in a


banking database, an entity ‘Customer’ can have relationship ‘Depositor’ with
another entity ‘Account’.

- A set of Relationships of the same kind, having the same set of attributes is
called a Relationship Set.
6

- A database in E-R Model is modeled as a collection of Entity Sets and


Relationships Sets.

- E-R Model also specifies certain constraints, like Mapping Cardinalities i.e.
whether the relationship is one-to-one, one-to-many, many-to-one or many-to-
many.

- The E-R Diagram below depicts two Entity Sets “STUDENT”, “COURSE”
and a relationship set “RESULT” indicating the marks obtained by students in
different Courses.

S_Name
Sub_Cod
S-Address e
Roll_No
Marks Sub_Title

STUDENT RESULT COURSE

(ii) Object-Oriented Model. Like the E-R Model, this model also
models a database as a Collection of Objects. An Object Body
encapsulates Data (Variables) as well as Methods (Functions) to
manipulate the Data (Variables). The Objects that contain same Type of
Data Variables and same Type of Functions are grouped together as a
Class. Thus, a Class may be viewed as a Type Definition of the Objects.
The only way an Object “A” can access the Data Items of another Object
“B” is by invoking the Methods of “B”. “A” can accomplish this by
making calls to the methods of “B”, through B’s Interface. The methods
defined within an object are made visible to the external world, through its
Interface.

Variables

Functions
Interface

OBJECT
7

The structure of an object-oriented database is modeled as a set of classes


and database will comprise of objects belonging to those classes.

(b) Record Based Logical Models. These models describe data at the Logical
level, as a collection of fixed-format Records of different types. Each Record Type can
have a fixed number of Fields (or Attributes) and each Field is usually of fixed length.
Use of fixed-length Records simplifies the Physical Level implementation of a database.
The most widely used Record Based Logical Models are:-

(i) Hierarchical Model. This is one of the oldest models, dating back to
1960s. The first commercial DBMS, based on this model, was “Information
Management System” (IMS), released by IBM in 1966. At one time, it was the
most used DBMS. In the Hierarchical Model, the Data is represented as Records;
and the Records are organized as a collection of Trees. The relationships among
the data are represented by Links, which can be viewed as pointers. The tree
structure permits that each record can have only one parent record. Thus, it
permits modeling of only one-to-many relationship (not many-to-many
relationship) amongst the Records.

The following diagram, showing an Academic Database in Hierarchical Model,


represents Records of three types “Course”, “Teacher” and “Student”; and links
indicating relationship “Offered By” from “Course” to “Teacher”- indicating the
faculty offering a course and the relationship “Attended By” from “Course” to
“Student”- indicating the students attending a course.

Course

Offered By Attended By

Student
Teacher
HIERARCHICAL MODEL

It does not indicate the relationships “ What are the courses being offered
by a faculty”, “What are courses being attended by a student”, “who are the
students being taught by a faculty” and “who are the faculty teaching a student”.
This is due the limitation of tree structure that a node can have only one parent
node; and thus we can represent only one-to-many relationship but not many-to-
many relationship.
8

(ii) Network Model. Like the Hierarchical Model, this Model also
models a database as a collection of Records; and the Records are organized as a
collection of arbitrary graphs (or Networks). Thus a Record can have any number
of parent records; and thus supports many-to-many relationship amongst records.
The relationships among the records are represented as links (pointers). Since, this
Model supports many-to-many relationship amongst the records, it is considered
more versatile as compared to Hierarchical Model.

The above database can be better modeled in Network as indicated below. It


contains additional information i.e. relationship “Offers” from “Teacher” to
“Course” and relationship “Attends” from “Student” to “Course”. Since, the
Hierarchical Model can strictly model only Tree Structures, it was not possible to
depict “Offers” and “Attends” in the Hierarchical Model. Also, it depicts
relationships “Teaches” and “Taught By” between “Teacher” and “Taught”.

Course
Offered By Attends
Offers Attended By

Teaches
Teacher Student
Taught By

NETWORK MODEL

(iii) Relational Model. This is most modern and most commonly used
model amongst the Record Based Models. It has been widely accepted. The
Relational Model models a database as a collection of Tables to represent both
data and the relationships amongst the data. Each Table is called a Relation,
which is assigned a unique name. Each relation has a number of Columns,
representing the Fields (or Attributes) of the relation. Each Field is also uniquely
named. A Relation (or Table) can have an un-limited number of Rows and each
Row represents an Instance of the Relation. A Row is also termed as a Tuple.
Each Tuple will be unique in a Relation. So, a Relation can be viewed as a set of
Tuples of the same type. The relationships amongst the tables will be modeled as
Foreign Key- Primary Key Relationships.

The “Course-Student-Teacher” Database Schema in Relational Model


will be represented by six Tables- three tables to represent entities i.e. STUDENT
(giving details of all students), TEACHER (giving details of all teachers),
COURSE (giving details of all courses); and three tables to represent relationships
i.e. COURSE-TEACHER (indicating relationships – OFFERED BY and
9

OFFERS), COURSE-STUDENT (indicating relationships ATTENDS and


ATTENDED BY) and TEACHER-STUDENT (indicating relationships
TAUGHT BY and TEACHES).

STUDENT
Roll_No S_Name Branch Semester Section S_Address

COURSE
Sub_Code Sub_Title Semester Branch Contact_Hrs

TEACHER
Fac_Code Fac_Name Desig Dept Fac_Address

COURSE-TEACHER
Sub_Code Fac_Code

COURSE-STUDENT
Sub_Code Roll_No

TEACHER-STUDENT
Fac_Code Roll_No

The Relational Model has become extremely popular because:-

(a) It is extremely simple and easy to implement.


(b) It has a strong mathematical foundation.
(c) It has been highly standardized.

SCHEMAS AND INSTANCES


10

Schema. Database Schema refers to the overall structure of a database. Once


defined, the schemas are rarely changed. A Database System will have several Schemas,
partitioned according to the levels of its abstraction.

Instance. It refers to the actual collection of data (a Snapshot of data) existing in the
database at a particular moment of time. Since, a database will continuously experience
insertion of new data, deletion of defunct data and update of changed data, the Instance
will be under continuous change.

DATA ABSTRACTION & VARIOUS SCHEMAS OF A DATABASE

There are three levels of data abstraction in a database; and each level is described by a
schema as explained below:-

(a) Physical Level. This is the lowest level of abstraction. At this level, a
Physical Schema describes “how data is physically stored”. The Physical Schema may
describe complex structures, used to store the data, with the sole aim of achieving an
efficient access of the data.

(b) Logical Level. This is the intermediate level of abstraction. At this


level, a Logical Schema (or Conceptual Schema) would describe “what data is stored in
the database” and “what are the relationships amongst the data”. This Schema is used by
Database Administrators, who decide what information is to be kept in the Database. It
would describe the logical structure of database, data types and integrity constraints. As
compared to Physical Level, Database at Logical Level is described by relatively smaller
number of simpler structures. But, the implementation of these simple structures may be
quite complex at the Physical Level. The user operating at Logical Level need not be
aware of the complexities at the Physical Level.

(c ) View Level. This is the highest level of abstraction. At this level, there will be
many Views, defined for different categories of users. A View for a certain group of users
describes “what subset of the database is to be made visible” to that group. A view will
describe only a subset of the underlying database. This is the subset, which the intended
group of users needs to access. There may be many Views, tailored to the specific needs
of various users. At the view level, the main goal is to provide an efficient and a user-
friendly human-interaction with the system. So, the interface at this level is made as
simple and user-friendly as possible. A user doesn’t have to be aware of the complexities
at the conceptual level and physical level.

DATA INDEPENDENCE

The ability of a DBMS to modify its Schema definition at one level, without affecting a
Schema definition at the next higher level, is called Data Independence. There are two
levels of Data Independence:-
11

(a) Physical Data Independence. It is the ability of DBMS to modify the


Physical Schema without causing any changes in the schema at the logical level and at
the view level. Modifications at Physical Level are driven by advancements in hardware
technology and by the requirements to upgrade hardware for improving system
performance.
(b) Logical Data Independence. This refers to the ability of DBMS to modify the
Logical Schema without causing any changes in the application programs at the view
level. Modifications at Logical Level are necessitated by need to alter the Logical
Structure of the database. The Logical Data Independence is much more difficult to
achieve than the Physical Data independence, since the application programs are heavily
dependent on the logical structure of the database.

DATABASE LANGUAGES
A DBMS will support two kinds of languages; one called Data Definition Language
(DDL) to specify the Database Schema and the other called Data Manipulation Language
(DML) to enable accessing and manipulation of the data stored in the database.

(a) DDL. A database schema is specified by a set of definitions expressed in DDL.


In a Relational Database, the result of interpretation of DDL statements will be a set of
Tables that are stored in a special file called Data Dictionary or Data Directory or DBMS
Catalog. This data stored in Data Dictionary is called Metadata i.e. data about data.
Whenever the database is to be accessed, the DBMS will first make a reference to the
Data Dictionary with a view to determine the structure of data to be accessed; only then it
will access the actual data in the database. Thus the data dictionary is accessed during
processing of each query.
The storage structure and access methods used by the database system are
specified by a set of definitions in a special type of DDL called Data Storage and
Definition Language. The result of interpretation of these definitions will be a set of
physical schema structures and a set of access methods supported by the system. These
details are usually hidden from the database-users.
(b) DML. A DML is a language that enables users to access and manipulate the data
stored in the database. A DML query is a statement specifying information to be accessed
for retrieval or insert/update/delete. The portion of a DML that involves information
retrieval is called a query language. The goal of a DML is to provide an efficient and
friendly human interface for the following operations in a database:-

(i) Retrieval on information stored in the database.


(ii) Insertion of new information into the database.
(iii) Deletion of information from the database.
(iv) Update of information stored in the database.

There are two types of DMLs:-

(i) Procedural DMLs. A query in procedural DML requires the user to


specify not only “what data is required to be extracted from the database” but also
to specify “how to extract those data”.
12

(ii) Non-Procedural DMLs. A Query in Non-Procedural DML requires


the user to specify only “what data is needed”, without specifying how to get
those data.
Non-procedural DMLs are easier to learn and to use than the procedural DMLs.
However, since non-Procedural DMLs do not specify “how to get the data”, the
queries in Non-Procedural DMLs may not generate as efficient code as the
equivalent queries in Procedural DMLs. This limitation of Non-Procedural DMLs
is overcome by performing query optimization at the System Level.
OVERALL STRUCTURE OF DBMS

Users
Unskilled Application DML DBA
Users Programmers Users

Application
Program DML Tools DBA Tools
Interfaces
Development Tools
DML DDL
Statements
4GL Programs Queries

Application
Programs Pre-Compiler DML DDL
Object Code For 4GL Compiler Interpreter
Programs
Query Evaluation Engine Query Processor

Buffer Manager
Authorization
Transaction
& Integrity
Manager
Manager

File
Manager Storage Manager

Index Query Evaluation


information Statistics

Application Data Dictionary


Database Data Files (Schema + Access
Rights)
DATA MODELING USING E-R MODEL
Disk Storage
13

Functions of a Database Administrator (DBA)

DBA is the custodian of the Database System placed under his control and is responsible
for the following functions:-

1. Creation of Conceptual Schema and its periodic update to adapt to the changed
requirements.

2. Implementation of efficient Storage Structure and Access Methods.

3. Liaise with the Users to ensure that the information required by the Users is made
available.

4. Ensure system security, through Grant and Revoke of Access Rights to the Users.
A user must have only as much rights as required by his role in the organization- nothing
more, nothing less.

5. To ensure Physical Security of Database against malicious access and accidents


like fire etc.

6. Take periodic backups and keep the archived data safely.

7. Execute immediate recovery procedures in case of failures.

8. Monitor the system performance. In case of degradation in system performance,


perform tuning procedures. If necessary, upgrade the system (hardware / software) to
meet the changed requirements of the organization.

9. Ensure sufficient Disk Space is always available. If needed, upgrade the Disk
Drives to meet the increased requirements.

10. To liaise with the DBMS vendor to obtain necessary technical supports and to
obtain the necessary tools & software upgrades, whenever made available by the vendor.

Characteristics of a Database System, which distinguish it from a conventional


File-Processing System

In a traditional file system, each user defines & implements the files needed for a specific
application, as a part of programming the application itself. Multiple users of the same set
of data will create replicated sets of files, specific to their respective applications. This
redundancy in defining & storage of data results in higher storage costs and database
inconsistencies during updates. On the other hand, in a database approach, a single
repository of data is maintained, which is defined once and then accessed by various
users of the data.
14

The main characteristics of a database approach, which distinguish it from a file-


processing approach are:-

(i) Self-Describing nature of a Database System A database contains not only


the data, but also a complete definition of the data structure, data types & data
constraints. This additional information is called meta-data, which is stored in
a file called Data-Dictionary (also called DBMS Catalog). The information
stored in the Data Dictionary is accessible to the DBMS software. This
additional information makes the DBMS software independent of its
applications. When a new need arises to change the structure of data, no
changes need to be made to the DBMS software; only the meta-data in the
Data Dictionary needs to be changed, to reflect the changes. This feature
enables the DBMS software to be adapted for any application. The same
DBMS will work for a college, a bank or a factory. Whereas in a traditional
file processing system, the application programs would need major changes
while shifting from one application to another.

(ii) Data Abstraction In a traditional file processing system, the structure of the
data files is hard coded in the application programs; thus any changes in
structure would need the related application programs to be modified
accordingly. Whereas in a Database System, the application programs are
insulated from the data stored in the database. The application programs are
only concerned with ‘what data’ is stored in the database and not concerned
with ‘how the data is stored’. As long as the contents of data remain
unchanged, the database structure can be changed, without affecting the
existing application programs. This feature is called Data Abstraction.

(iii) Support for Multiple Views of the Data Depending on different


needs and different levels of authorizations, different users would be provided
different perspectives of the same data, called Views. A View refers to a subset
of the stored data or a set of Virtual Data i.e. data derived from the stored
data. A View is not explicitly stored in the Database; only its Definition is
stored in the DBMS Catalog. Whenever a user or a program submits a query
to access a View, the View is instantly computed and presented to the User or
the Program. Next time, when the same view is again accessed, it is re-
computed fresh.

(iv) Multi-User Access & Concurrency Control A Multi-User DBMS allows


multiple users to access the same database concurrently. This is achieved by
including Concurrency Control Software in the DBMS, to ensure that
database remains consistent, despite access by multiple users concurrently.

(v) Effective System Protection through grant of Access Rights Access Rights
are granted to the users, to the extent required for their roles in the
organization. These rights are stored in the data dictionary itself. When a
query is to be processed, the DBMS will first ensure that the user submitting
15

the query has sufficient rights for the processing of that query; only then the
query is processed.

(vi) Support for efficient Recovery. When a system is restarted after a failure,
log-based recovery recovers the database efficiently.

Advantages of using a DBMS vis-à-vis File Processing System

(a) Controlling Redundancy While designing a database, various Views


of different users are integrated into a single database, thus controlling
redundancy. This results in reduced effort and reduced storage space. Also, it
ensures database consistency, in case of updates.

(b) Restricting unauthorized access The user access rights are stored in
the data dictionary. Whenever, any query is received from any user, it is
checked for valid access rights. If access rights exist, the query is processed
else it is rejected as ‘Invalid Query’. This prevents unauthorized access of
data.

(c) Providing Multiple User-Interfaces A DBMS provides various types of


user interfaces for various categories of users:-

- Query Languages (like SQL) for skilled users

- Programming Languages (like PL/SQL) for application programmers

- Menus, Forms for Naive Users

- DDL for Database Administrator

(d) Enforcing of Data Integrity Constraints The Data Integrity


Constraints are stored in the data dictionary itself. Whenever, some data is
inserted/updated/deleted, the data constraints are automatically applied to the
related data items and invalid operations are rejected.

(e) Supporting Concurrent Access A DBMS supports concurrent access by


multiple users. Despite concurrent access by multiple users, database
consistency is maintained.

(f) Providing backup & recovery A DBMS supports data backup & recovery
in case of failures.

(g) Reduced Application Development Time Development time of a new


application using DBMS is of the order of 15 – 25% as compared to the time
needed in development of equivalent applications in a traditional file
processing system.
16

(h) Easy Adaptability A database system can be easily adapted to changed


requirement, with minimal time and cost implications.

(i) Potential for enforcing Standards It permits the Database


Administrator (DBA) to define & enforce standards among the database users.
The standards can be defined for naming conventions, formats of data items,
display formats or report structures etc.
17

Exercises

Ex.1.1 Explain three level of data abstraction. Distinguish between Physical Data
Independence and Logical Data Independence. Which is more difficult to achieve
and why?

Ex.1.2 Explain the characteristics of DBMS that distinguish it from a File


Processing System. Explain how the application development is mush shorter
in a DBMS environment than in a File Processing Environment.

Ex.1.3 Compare the three data models: Hierarchical, Network and Relational.
What are the distinguishing features of Relational Model that make it so popular?

Ex.1.4 Distinguish between:-

(a) DDL & DML


(b) Schema & Instance
(c) Procedural DML & Non-Procedural DML

Ex.1.5 Explain the role of the following components of DBMS:-

(a) DML Compiler


(b) Query Processing Engine
(c) Buffer Manager
(d) Transaction Manager

Ex.1.5 What is the roe of a Data Dictionary in DBMS? How does this feature
make the DBMS independent of the underlying database?

Ex.1.6 Explain major functions of a Database Administrator (DBA)?

Ex.1.7 Explain what is implied by the statement“ In DBMS, views of different


users can be integrated into a single database”.

Ex.1.8 What is meant by “Self-describing nature of a database”?

Ex.1.9 Compare Procedural DMLs and Non-Procedural DMLs from the


viewpoints of (i) User Friendliness (ii) Query Optimization.
18

CHAPTER 2

ENTITY-RELATIONSHIP MODELING
The Entity Relationship Model (ER Model) models the real world situations as a
collection of entities and relationships amongst the entities.

Entity An Entity is an object (like a “CAR”) or a concept (like an “ACCOUNT”)


from the real world, which is distinguishable from other objects and other concepts. Each
Entity will be defined by a set of properties (called Attributes). For example entity
“ACCOUNT” may be defined by Attributes like “ACCOUNT-NUMBER”, “BRANCH-
NAME” and “BALANACE” etc.

Entity Set An Entity-Set refers to a collection of entities of the same kind. Each
entity in an Entity-Set will have the same set of attributes and the set of attributes will
distinguish it from other Entity Sets. No other entity set will have exactly the same set of
attributes. Some of the attributes of an entity set may overlap with other entity sets.

Relationship A Relationship refers to an association amongst Entity Sets. Like there


may be relationship “DEPOSITOR” between Entity Set “CUSTOMER” and Entity Set
“ACCOUNT”.

Relationship Set A Relationship Set refers to the collection of Relationships of the


same kind (i.e. having exactly same set of Attributes). A Relationship Set will inherit
some of the Attributes (properties) of the associating Entity Sets. Like the Relationship
Set “DEPOSITOR” between Entity Sets “CUSTOMER” and “ACCOUNT” will inherit
Attributes “CUSTOMER-ID” from “CUSTOMER” and Attribute “ACCOUNT-
NUMBER” from “ACCOUNT”. In addition, a Relationship Set may have some of its
own attributes called “Descriptive Attributes”; for example the relationship set
“DEPOSITOR” may have a descriptive attribute “DATE-OF-OPERATION”, indicating
the date on which a customer has last operated an account.

Domain of an Attribute

Each attribute has a set of permitted values called its domain or value set, like the
attribute ‘NAME’ may have a domain that is set strings of characters of specified
maximum length.

A database will consist of a set of Entity-sets and Relationship-Sets, each of


which will contain a number of entities of the same type or Relationships of the same
type. An entity in a database may be described by a set of (attribute, data value) pairs;
like a student in Entity-Set “STUDENT” may be described by {(ROLL-NUMBER,
0990013010), (NAME, ‘Karan Singh’), (DATE-OF-BIRTH, ‘10-DEC-1985’)}.
19

Attribute Types:-

(i) Simple Vs Composite Attributes. A Simple attribute is the one, which


is not divisible into sub-parts like ‘BRANCH’. On the other hand, a Composite
attribute is the one, which can be divided into sub-parts like ‘DATE-OF-BIRTH’,
which may be divided into ‘birth-date’, ‘birth-month’ & ‘birth-year’.

(ii) Single-Valued Vs Multi-Valued Attributes. An attribute, which


can assume one value at a time, is called Single-Valued attribute; like ‘name’ of an
EMPLOYEE entity. On the other hand, an attribute, which may assume a set of
values at a time, is called multi-valued attribute; like attribute ‘dependant’ of an
Attribute Set “EMPLOYEE”, which may have none or one or multiple values,
depending upon the number of dependents of an employee.

(iii) Null Attribute. A null value is assigned to an attribute under any of


the following three conditions:-

(d) If the attribute value is not applicable to an entity; like SPOUSE-NAME


will not be applicable if an employee is unmarried.

(e) If value is applicable, but not specified; like TEL#- an employee may not
be owning a Telephone.

(f) If value is applicable and specified but not known to the agency entering
the information; like an employee may be owning a Telephone but the
number may not be known to the organization.

Null value can only be assigned to an Attribute, if assigning value to that attribute
is optional (not mandatory). The Mandatory attributes cannot be assigned a
“Null” value.

(iv) Derived Attribute Vs Stored Attribute . A derived attribute is the one,


whose value is not stored in the database, but is derived from the value of other
stored attributes; like the value of attribute ‘age’ can be derived from attribute
‘date-of-birth’ and current date obtained from the system.

Degree of Relationship Sets. Degree of a Relationship Set refers to the number of


Entity Sets participating in the Relationship. Most of the relationships are binary.

E-R Diagram Notations

Rectangle represents an entity set.


20

Ellipse represents an attribute.

Diamond represents a relationship set.

Line links an attribute to an entity set or an entity set to


a relation set.

Double Line indicates total participation of an entity set in a


relation set.

Dashed Ellipse indicates derived attribute.

Double Ellipse indicates multi-valued attribute.

Double Rectangle indicates weak entity set.

Double Diamond indicates a relationship set with participation


of some weak entity sets.

RELATIONSHIP constraints

- Mapping Cardinalities
- Participation Constraint

Mapping Cardinalities. For a binary relationship set R between entity sets A and B,
the mapping cardinalities can be on of the following:-

(a) One-to-one. An entity in A is associated with at most one entity in B and an


entity in B is associated with at most one entity in A. It is represented in E-R Model as
follows:-

A R B

One-to-one cardinality is represented by directed lines drawn from R to A & B both.


21

(b) One-to-many. One to many cardinality from A to B implies than an entity in A is


associated with any number (Nil/ one/ many) of entities in B; however, an entity in B is
associated with at most one entity in A. It is represented in E-R Model as follows:-

R
A
B

(c) Many-to-one. Many to one cardinality from A to B implies that an entity in A is


associated with at most one entity in B; however, one entity in B can be associated with
any number of entities in A. It is represented in E-R Model as follows:-

R
A B

(d) Many-to-many. Many to many cardinality from A to B implies that an


entity in A can be associated with any number of entities in B and one entity in B can be
associated with any number of entities in A. It is represented in E-R Model as follows:-

R
A B

Example:-

One-to-One relationship from CUSTOMER to ACCOUNT implies that each customer


can have only one account and each account has to be Single.

DEPOSITOR
CUSTOMER ACCOUNT

(One-to-One Relationship)

One-to-Many relationship from CUSTOMER to ACCOUNT implies that each customer


can have any number (NIL or One or More than One) of accounts, but each account has
to be Single.

DEPOSITOR
CUSTOMER ACCOUNT

(One-to-Many Relationship)

Many-to-One relationship from CUSTOMER to ACCOUNT implies that each customer


can have only one accounts, but each account can be Joint (held by one or more).

DEPOSITOR
CUSTOMER ACCOUNT
22

(Many-to-One Relationship)

Many-to-Many relationship from CUSTOMER to ACCOUNT implies that each


customer can have any number (Nil or One or More than One) accounts and each account
can be Joint (held by one or more).

DEPOSITOR
CUSTOMER ACCOUNT

(Many-to-Many Relationship)

Participation Constraints in Relationship Sets

- Total Participation
- Partial Participation

Total Participation

An Entity Set E is said to have total participation in relationship set R if each entity in E
is participating at least in one relationship through R. In E-R Diagram, the Total
Participation is represented by a “Double Line” drawn between the Entity Set symbol and
the Relationship Set symbol.

Partial Participation
An Entity Set E is said to have partial participation in relationship set R if some of the
entities in E are not participating in any relationship through R. In E-R Diagram, the
Partial Participation is represented by a “Single Line” drawn between the Entity Set
symbol and the Relationship Set symbol.

Example:- Suppose Entity Sets “CUSTOMER” and “ACCOUNT” are related by


Relationship Set “DEPOSITOR” and Entity Sets “CUSTOMER” and “LOAN” are
related by Relationship Set “BORROWER”. Suppose it is possible that a customer may
have only account or only loan or both, then the situation can be modeled as follows:-

CUSTOMER Partial DEPOSITOR Total Participation ACCOUNT


Participation
Partial Participation

BORROWER

Total Participation

LOAN
23

Concept of Key

Super Key. A Super Key of an Entity Set or Relationship Set refers to the set of
attributes, which when taken collectively, will uniquely determine an entity within the
Entity Set or a Relationship within the Relationship Set. If K forms a Super Key (SK) of
an Entity Set E then any super set of K will also be a Super Key of E. So, a Super Key
may have some extraneous (unnecessary) attributes, which if removed, the balance set
may still form a Super Key of R.

Example :- Suppose each student in the Entity Set STUDENT (ROLL_NO, NAME,
BRANCH, FATHERS-NAME, ADDRESS, DOB, TEL-NO) has a unique value of
ROLL-NO. This implies that no two students can have same ROLL-NO. Then
{ROLL-NO, NAME} forms a super key of Entity-Set STUDENT. In this, the attribute
NAME is extraneous; which if removed, the balance set i.e. {ROLL-NO} still forms a
Super Key of STUDENT.

Candidate Key. A Super Key, whose no proper subset forms a Super Key, is called
a Candidate Key. Thus, Candidate Key is a minimal Super Key (i.e. a Super Key having
no extraneous attributes). An Entity Set may have more than one Candidate Keys.

Example:- The Entity Set STUDENT will have at least two Candidate Keys i.e.
{ROLL-NO} and {NAME, FATHERS-NAME, DOB, ADDRESS}.

Primary Key. Primary Key is one of the Candidate Keys that is designated by the
database designers as primary means of identifying entities within an entity set. In the
E-R Diagram, the Primary Key Attributes are underlined with a firm line.

Primary Key of a Relationship Set

Let R be a binary relationship set between Entity Sets E 1 and E2. Let K1 and K2 be the
respective Primary Keys of E1 and E2. Then the Primary Key of Relationship Set R will
depend upon the cardinality mapping of the relationship set, as explained below:-

(i) One to One Relationship


PK (R) = PK (E1)
or
= PK (E2)

(ii) One to Many Relationship from E1 to E2


Here E2 is called “Many-Side” Entity Set and E1 is called “One-Side” Entity-
Set. PK (R) = PK (E2) i.e. Primary Key of “Many-Side” Entity-Set.
24

(iii) Many to One Relationship from E1 to E2


Here E1 is called “Many-Side” Entity Set and E2 is called “One-Side” Entity-
Set. PK (R) = PK (E1) i.e. Primary Key of “Many-Side” Entity-Set.

(iv) Many to Many Relationship from E1 to E2

PK (R) = PK (E1)  PK (E2)


Example
CN AN
(i)
DEPOSITOR
CUSTOMER ACCOUNT

(One-to-One Relationship)
PK (DEPOSITOR) = CN or AN

CN AN
(ii)

DEPOSITOR
CUSTOMER ACCOUNT

(One-to-One Relationship)

PK (DEPOSITOR) = AN i.e. PK of ACCOUNT

CN AN
(iii)
DEPOSITOR
CUSTOMER ACCOUNT

(One-to-One Relationship)

PK (DEPOSITOR) = CN i.e. PK of CUSTOMER

CN AN
(iv)
DEPOSITOR
CUSTOMER ACCOUNT
25

(One-to-One Relationship)

PK (DEPOSITOR) = {CN, AN} i.e. PK (CUSTOMER)  PK (ACCOUNT)

Concept of Weak Entity Set

An Entity Set is said to be a Weak Entity Set if it does not have sufficient attributes to
form its Primary Key. On the other hand, an entity set having a primary key of its own is
called a Strong Entity Set. A Weak Entity Set (say E 2) will be dependent for its existence
on a Strong Entity Set (say E 1) to form its Candidate Key. Then Entity Set E 2 is said to be
“Existence-Dependent” on E1 and E1 is said to be the “Owner Entity Set” of E 2. The
relationship R between E2 and E1 is called “Identifying Relationship”. The Weak Entity
Set E2 will have a set of attributes called its “Discriminator”, which together with the
Primary Key of E1 will form the Primary Key of E2.

E1
E2

Owner Entity Set Identifying Relationship Weak Entity Set

Example:-Suppose an Entity Set EMPLOYEE (EMP_ID, EMP_NAME, SALARY,


DEPENDENTS) has an attribute DEPENDENT which is multi-valued i.e. an employee
may have none or one or more than dependents. This situation can be best modeled as
follows:-
EMP-NAME
D-NAME RELATIO
N
EMP-ID
DOB
SALARY

EMPLOYEE DEPENDENT

Owner Entity Set Identifying Relationship Weak Entity Set

- The Weak Entity Set DEPENDENT is Existence Dependent on the Strong Entity
Set EMPLOYEE.

- The Weak Entity Set “DEPENDENT” has a Discriminator Attribute D-NAME,


which along with primary key EMP-ID of EMPLOYEE, forms Primary Key of the weak
entity set DEPENDENT. In E-R Diagram, the Discriminator (also called Partial Key) of
a weak entity set is marked by underlining with a broken line.

Special Features of an Identifying Relationship


26

Normally, a situation modeled by Weak Entity Set will have following features:-

(i) The Identifying Relationship will be one-to-many from Owner Entity Set to Weak
Entity Set.

(ii) The Participation of Owner Entity Set in the Identifying Relationship will be
partial and the participation of the Weak Entity Set in the Identifying Relationship
will be Total.

Modeling of a Multi-valued Attribute as Weak Entity Set

In the above example, the Weak Entity Set DEPENDENT can also be modeled as a
multi-valued attribute of Entity Set EMPLOYEE. The multi-valued attribute can be used
to indicate the names of the dependents of employees. But suppose we want to indicate
other parameters of dependents like dependent’s relationship with the employee then the
multi-valued approach will not be suitable. In this case, the Weak Entity approach will be
the ideal choice, since then the weak entity set DEPENDENT can have any number of
attributes.

Extended E-R Features


Specialization. An entity set E may include some sub-groups of entities (say E1, C
E2, ….. En), such that each of these sub-groups may have some distinct attributes different
than the other sub-groups. There will be some attributes that will be common to all sub-
groups. The process of designating these sub-groups within an entity set is called
specialization;
A1 A2 2

Higher Level Entity Set E


Or Super Class

A2 ISA
C1
A1 B1
En
E1 E2

Lower Level Entity Sets or Sub Classes

In the above example, an Entity Set E has been specialized into Sub-groups designated as
E1 , E2 ….. En. E is called “Super Class” or “Higher Level Entity Set” and the entity sets E1
, E2 ….. En are called “Sub Classes” or “Lower Level Entity Sets” of E. The common
27

attributes of all sub entity sets are represented with the super entity sets. And the distinct
attributes of each sub entity set are represented with the sub entity set.

The relationship of Higher Level Entity Set with its Lower Level Entity Sets is called ISA
relationship. It is read as “is a”.

Inheritance of Attributes in Specialization

Each Sub Class will inherit the Attributes of its Super Class; plus it will have its own
distinct Attributes. Like in the above case, each lower entity set will inherit attributes A 1
and A2 of the Super Class E.

Example:- Consider an entity set ACCOUNT with attributes Account-Number and


Balance. The Entity Set ACCOUNT may be specialized into different types of accounts
like SAVINGS-ACCOUNT, CURRENT-ACCOUNT, FIXED-DEPOSIT (FD) and
RECURRING-DEPOSIT (RD). The SAVINGS-ACCOUNT may have an attribute
Interest-Rate and CURRENT-ACCOUNT may have attribute Over-Draft. Similarly, FD
and RD have distinct attributes of their own.

Account-Numbe Balance
r

ACCOUNT
Mat-Date

Int-Rate Installment
Interest-Rate ISA

RD
SAVINGS-ACCOU Over-Draft
NT Int-Rate

Mat-Date
CURRENT-ACC
OUNT
FD

Specialization Constraints

Disjoint Vs Overlapping Specialization

Disjoint. It implies that an entity does not belong to more than one lower-
level entity set i.e. an account is either savings-account or current-account but not
both.

Overlapping. In overlapping generalizations, an entity may belong to more than


one lower-level entity sets within a single generalization.
28

Total Vs Partial Specialization

Total Each higher level entity must belong to a lower-level entity set.
Partial. Some higher-level entities may not belong to any lower-level entity set.

Generalization. Specialization is a top-down approach; whereas Generalization is


exactly inverse of that. Generalization refers to the process of fusing several distinct
entity sets into a single Higher Level Entity Set, on the basis of commonality of their
attributes. Then the fused sets form sub classes or lower level entity sets. The common
attributes of the Lower Level Entity Sets will be assigned to the Higher Level Entity Set.
Thus, generalization is a process, which proceeds in a bottom-up manner, in which
multiple entities are synthesized into a single higher-level entity set, on the basis of their
common features. The higher-level entity set is termed as super-class and lower level
entity set is termed as sub-class. As regards E-R Diagram, both Specialization and
Generalization are represented exactly in the same manner.

Aggregation. One limitation of E-R Model is that it fails to express relationships


among relationship sets or relationship between a relationship set on one side and an
entity set on the other side. Aggregation provides a solution in this case. Aggregation is
an abstraction through which relationships are treated as higher-level entities, which can
then participate in relationships with other Entity Sets or with other relationship sets. For
example the relationship between R1 and E3 as indicated below.

A1 A2 B1 B2

E1 R1 E2

Aggregated Higher Level Entity Set “R1”

R2

E3

C1 C2 C3

Here, the Relationship Set R1 between Entity Set E1 and Entity Set E2 has been
aggregated as Higher Level Entity Set “R1”. This Higher Level Entity Set is participating
in a Relationship R2 with Entity Set E3. Thus, through aggregation, we are able to
represent a Relationship between Relationship Set R1 and Entity Set E3.
29

Example:- Suppose, we have Entity Sets “EMPLOYEE”, “BRANCH” and “JOB”


which are related through a Relationship “EBJ” which indicates, “which employee” is
performing “what jobs” at “which branch”. There will be multiple jobs at each at each
branch and assume that each employee may be performing multiple jobs at one of the
branches. Suppose, we want to relate another Entity Set “MANAGER” to indicate:-

(i) The set of Employees managed by a

manager. (ii)The set of jobs managed by a

manager.

(iii)The Branches managed by a manager (assume a manager can manages only


one branch).

If we represent this scenario without use of aggregation, then the E-R Diagram will be as
follows:-
BRANCH
EMPLOYEE
EBJ

JOB
EM BM

JM

MANAGER

The above Scenario can be better modeled by aggregating the Relationship Set “EBJ” a a
higher level Entity Set and the creating a relationship between this higher level entity set
and the Entity Set “MANAGER”, as indicated below:-

BRANCH

EMPLOYEE JOB
PraveenEBJ
Kumar
30

EBJM

MANAGER

This modeling represents the situation more realistically, wherein the Relationship Set
“EBJM” indicates “which combinations of employee-branch-job” are being managed by
each manager.

Reduction of E-R Schema to Tables


An E-R Diagram can be reduced to a set of Tables, as explained below:-

(a) Tabular representation of a Strong Entity Set. A Strong Entity Set E will be
represented by a Table named “E”. The Table will have columns as follows:-

(i) Simple, Single-valued Attributes There will be a column for each


simple, single-valued attribute of Entity Set E.

(ii) Composite Attributes There will be a column for each sub-part of


a Composite Attribute; no column needs to be assigned for composite attribute as
such. For example for NAME comprising of First Name (FN), Middle Name
( MN) and Last Name (LN) there will be three columns for FN, MN and LN. No
column needs to be assigned for NAME. If NAME needs to be produced, it can
be done by combining the sub-parts.

(iii) Derived Attributes No column needs to be assigned for the derived


attributes; since the values of these attributes are not stored in database.

(iv) Multi-Valued Attribute Each Multi-Valued Attribute (say M) will be


represented by a separate Table (say named E-M) which will have a column each
for the primary key attributes of E and a column for Attribute M. Each value of
the multi-valued attribute will be represented in a separate row in this table.

Let E be a Strong Entity Set with simple single-valued attributes a1,a2,……,an. This
Entity Set will be represented by a Table called E with n distinct columns, each of which
will correspond to one of the attributes. Let D1,D2,…Dn be the domains of attributes

Praveen Kumar
31

a1,a2,….,an respectively. The Table E will comprise of a set of rows, which will be a
subset of the Cartesian Product D1 X D2 X…….Dn.

Example Age
DOB Tel_No

Name
Univ_Roll_No Cit
y
Stree
STUDENT H-N t Pin
o
Addres
s

The derived attribute Age will not be represented in the STUDENT table. When required,
its value will be derived from DOB.

The Tel-No will be represented in a separate table (say named STUDENT-TEL-NO),


which will have a column for Primary Key of STUDENT i.e. Roll-No and a column for
Tel-No. Suppose, a student has more than one Tel-No then his Roll-No will appear that
many times in this table.

The Above E-R Diagram will be reduced to following two Tables:-

STUDENT
Univ_Roll_No Name DOB H-No Street City Pin

STUDENT-TEL-NO
Univ_Roll_No Tel_No

(b) Tabular representation of Relationship Sets. Let R be a Relation Set and


let a1, a2,…..am be the set of attributes formed by the union of the primary keys of all the
Entity Sets participating in Relation R and let the descriptive attributes of R (if any) be

Praveen Kumar

Date-of-Operation
32

b1,b2,…..bn. Then the Relation R will be represented by a Table named say “R”, which
will (m+n) columns, each column representing one of the attributes from the set {a1, a2,
……am} U {b1,b2,….bn}.

Example
C-Addres
Account-No Branch-Name
s
C-Id
Balance
C-Name

ACCOUNT
CUSTOMER DEPOSITOR

The Relationship Set DEPOSITOR will be represented by a table named DEPOSITOR.


The Entity Sets CUSTOMER and ACCOUNT have Primary Keys C-Id and Account-No
respectively, which will also form part of the DEPOSITOR table. In addition, the
DEPOSITOR table will have a column for its Descriptive Attribute “date-of-Operation”.
The above E-R Diagram will be reduced to the following set of tables:-

CUSTOMER
C-Id C-Name C-address

ACCOUNT
Account-Number Balance Branch-Name

DEPOSITOR
C-Id Account-Number Date-of-Operation

Shifting of Descriptive Attributes of a Relationship Set and Merging of Relationship


Set Table with the tables of participating Entity Sets. Depending on the Cardinality

Praveen Kumar
33

Mapping of the participating Entity Sets, the Descriptive Attributes of the Relationship
set can be shifted to one of the participating Entity Sets. Also, a Relationship Set Table
can be combined with the table of one of the participating Entity Sets, as per the
following conditions:-

(1) One-to-One Relationship Suppose there is a One-to-One relationship between


two entity sets, then the rows in the Relationship Set table will have one-to-one mapping
with the rows in the tables of the participating entity sets. Under this condition, it is
possible to shift the descriptive attributes of the relationship set to any of the participating
Entity Sets and also it is possible to merge the table of the Relationship Set with the table
of any of the participating Entity Sets, without loss of any information.

Example:-

Date-of-Operation

C-Addres Account-No Branch-Name


s
C-Id
Balance
C-Name

ACCOUNT
CUSTOMER DEPOSITOR

As indicated above, there is One-to-One Relationship between CUSTOMER and


ACCOUNT i.e. Each Customer has at most one account and each account is “Single”
(i.e. owned by only one customer).
CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22, RKP

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP

Praveen Kumar
34

DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-310 10-Jan-2007
C-220 A-101 23-Dec-2006
C-310 A-203 03-Feb-2007
C-505 A-305 27-Dec-2007

As obvious, the rows in DEPOSITOR table are having one-to-one mapping with the rows
in the CUSTOMER Table and also with the rows in the ACCOUNT Table. That is, the
first row of DEPOSITOR maps onto the fourth row of ACCOUNT, the second row of
DEPOSITOR maps onto the first row of ACCOUNT, the third row of DEPOSITOR
maps onto the second row of ACCOUNT and the last row of DEPOSITOR maps onto the
third row of ACCOUNT. Thus, the descriptive attribute Date-Of-Operation of the
Relationship Set DEPOSITOR can be shifted to either CUSTOMER or ACCOUNT.
Also, the DEPOSITOR Table can be combined either with the CUSTOMER Table or
with the ACCOUNT Table, without losing any information. The combined table will
have union of the columns of the two merged tables. Suppose, DEPOSITOR Table is
merged with the CUSTOMER Table, then the CUSTOMER Table will also include
attributes Account_Number and Date_Of_Operation . The resulting set of tables will then
be:-

CUSTOMER
C-Id C-Name C-address Account-Nu Date-of-Opera
mber tion
C-001 Ajay 320, Sector-26, Noida A-310 10-Jan-2007
C-220 Vijay 110,Sector-8, RKP A-101 23-Dec-2006
C-310 Ram 120,Sector-25, Noida A-203 03-Feb-2007
C-505 Shyam 303,Sector-22,RKP A-305 27-Dec-2007

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP

The combined CUSTOMER Table now includes the Primary Key (AN) of ACCOUNT
and descriptive attribute Date_Of_Operation of DEPOSITOR.

Praveen Kumar
35

(2) One-To-Many Relationship Suppose there is a One-to-Many relationship


between CUSTOMER and ACCOUNT i.e. each customer can have many accounts, but
each account has to be single.

Date-of-Operation
Example
C-Addres
Account-No Branch-Name
s
C-Id
Balance
C-Name

ACCOUNT
CUSTOMER DEPOSITOR

CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22,RKP

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP
A-550 35000 CP
A-670 60000 Sec-18

DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-310 10-Jan-2007
C-220 A-101 23-Dec-2006
C-310 A-203 03-Feb-2007
C-505 A-305 27-Dec-2007
C-101 A-550 22-Dec-2006
C-310 A-670 01-Jan-2007

The rows in the DEPOSITOR table have one-to-one mapping onto the rows in
ACCOUNT Table i.e. with the “Many-Side Entity Set” Table. That is, the first row of
DEPOSITOR maps onto the fourth row of ACCOUNT, the second row of DEPOSITOR
maps onto the first row of ACCOUNT, the third row of DEPOSITOR maps onto the

Praveen Kumar
36

second row of ACCOUNT, the fourth row of DEPOSITOR maps onto the third row of
ACCOUNT, the fifth row of DEPOSITOR maps onto the fifth row of ACCOUNT and
the last row of DEPOSITOR maps onto the last row of ACCOUNT table. Thus, the
descriptive attribute Date-Of-Operation can be shifted to ACCOUNT (The “Many-Side”
Entity Set) and the DEPOSITOR Table can be with the ACCOUNT Table (i.e. with the
table of the “Many-Side” Entity Set), without losing any information. The resultant
ACCOUNT table will also include the Primary Key C-Id of CUSOMER table and
descriptive attribute DOO of the DEPOSITOR table. The resulting set of tables will then
be:-

CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22,RKP

ACCOUNT
Account-Numbe Balance Branch-Name Customer_Id Date_of_Operation
r
A-101 10000 Sec-18 C-220 23-Dec-2006
A-203 30000 Sec-26 C-310 03-Feb-2007
A-305 50000 CP C-505 27-Dec-2007
A-310 25000 RKP C-101 10-Jan-2007
A-550 35000 CP C-101 22-Dec-2006
A-670 60000 Sec-18 C-310 01-Jan-2007

(3) Many-to-One Relationship Suppose there is many-to-one relationship between


CUSTOMER and ACCCOUNT, which implies that each account can be “Joint” but each
customer can hold only one account. In this case, the table DEPOSITOR can be
combined with “Many-Side” Entity-Set table CUSTOMER.

Date-of-Operation
Example
C-Addres
Account-No Branch-Name
s
C-Id
Balance
C-Name

ACCOUNT
CUSTOMER DEPOSITOR

CUSTOMER
C-Id C-Name C-address

Praveen Kumar
37

C-001 Ajay 320, Sector-26, Noida


C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22, RKP

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26

DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-101 10-Jan-2007
C-220 A-203 23-Dec-2006
C-310 A-101 03-Feb-2007
C-505 A-203 27-Dec-2007
The rows in the DEPOSITOR table have one-to-one mapping onto the rows in
CUSTOMER Table i.e. with the “Many-Side Entity Set” Table. Thus, the descriptive
attributes of DEPOSITOR can be shifted to “Many-Side” Entity Set CUSTOMER and
the DEPOSITOR Table can be with the CUSTOMER Table, without losing any
information. The resultant CUSTOMER table will also include the Primary Key
Account_Number of ACCOUNT table and descriptive attribute DOO of the
DEPOSITOR table. The resulting set of tables will then be:-

CUSTOMER
C-Id C-Name C-address Account_Number DOO
C-001 Ajay 320, Sector-26, Noida A-101 10-Jan-2007
C-220 Vijay 110,Sector-8, RKP A-203 23-Dec-2006
C-310 Ram 120,Sector-25, Noida A-101 03-Feb-2007
C-505 Shyam 303,Sector-22, RKP A-203 27-Dec-2007

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26

(4) Many-to-Many Relationship Suppose there is many-to-many relationship


between CUSTOMER and ACCCOUNT, which implies that each account can be “Joint”
but each customer can hold many accounts. In this case, the table DEPOSITOR cannot be
combined with any Entity Set and it has be created as a separate table. Since, if we
combine then we have to combine with both the Entity Sets and that would add
unnecessary data redundancy, which is not acceptable.

Praveen Kumar
38

Date-of-Operation
Example
C-Addres
Account-No Branch-Name
s
C-Id
Balance
C-Name

ACCOUNT
CUSTOMER DEPOSITOR

CUSTOMER
C-Id C-Name C-address
C-001 Ajay 320, Sector-26, Noida
C-220 Vijay 110,Sector-8, RKP
C-310 Ram 120,Sector-25, Noida
C-505 Shyam 303,Sector-22, RKP

ACCOUNT
Account-Number Balance Branch-Name
A-101 10000 Sec-18
A-203 30000 Sec-26
A-305 50000 CP
A-310 25000 RKP

DEPOSITOR
C-Id Account-Number Date-of-Operation
C-001 A-101 10-Jan-2007
C-220 A-203 23-Dec-2006
C-310 A-101 03-Feb-2007
C-505 A-203 27-Dec-2007
C-101 A-305 30-Dec-2007
C-505 A-310 02-Jan-2007

Now, the rows in the DEPOSITOR table do not have one-to-one mapping with
CUSTOMER table and also with the ACCOUNT table. So, the DEPOSITOR table can
neither be merged with CUSTOMER table nor with ACCOUNT table. Thus, there has to
be a separate table for DEPOSITOR as indicated above. Also, the descriptive attributes of
the Relationship Set cannot be shifted to the participating Entity Sets; the descriptive
attributes have to remain with the relationship set itself.

Praveen Kumar
39

(c) Tabular representation of Weak Entity Sets. Let A be a Weak Entity Set
with descriptive Attributes a1,a2,……,am. Let B be the Strong Entity Set on which A is
existence dependent. Let the primary key of B consist of attributes b1,b2,….bn. The
Entity Set A is represented by a Table called A with (m+n) columns, each column
representing one of the attributes from the set {a1,a2,……am} U {b1,b2,….bn}.

Example: Payment-Date
Payment-N
Amoun o
Loan-No t Installmen
t

LOAN- PAYMENT
LOAN PAYMENT

There will be Tables LOAN and PAYMENT; the PAYMENT table will also include the
Primary Key of Loan i.e. Loan-No. The Primary Key of table PAYMENT will be
{Loan-No, Payment-No} where the attribute Payment-No is called a “Discriminator” or
“Partial Key” of the table PAYMENT.

Redundancy of Tables in Weak Entity Sets The Table for Identifying


Relationship LOAN-PAYMENT is not required because if we create such a table, it will
have only two attributes i.e. Loan-No and Payment-No, which as such form part of table
PAYMENT. Thus, no table needs to be created for an Identifying Relationship. In case
there exists a Descriptive Attribute of an Identifying that can be shifted to the “Many-
Side Entity Set” i.e. the Weak Entity Set.

(e) Tabular representation of Generalization. The steps involved are:-

Create a Table each for the higher-level entity set and for each lower-level entity
set. The table for lower-level entity set will include its own attributes plus all the
Primary-Key attributes of its higher-level entity set.

Praveen Kumar
40

Account-Numbe Balance
r

ACCOUNT
Mat-Date

Int-Rate Installment
Interest-Rate ISA

RD
SAVINGS-ACCOU Over-Draft
NT Int-Rate

Mat-Date
CURRENT-ACC
OUNT
FD

For example, in the above case there will five tables i.e. ACCOUNT, SAVINGS-
ACCOUNT, CURRENT-ACCOUNT, FD and RD. The table ACCOUNT will
have columns Account-Number and Balance; and table SAVINGS-ACCOUNT will have
columns Account-Number and Interest Rate; and table CURRENT-RATE will have
columns Account-Number and Over-Draft. Same is applicable to the tables FD and RD.

Combining of Tables in Generalization If a generalization is “Total”, which implies


that each entity in the super-class (higher-level entity set) is a member of at least one sub-
class (lower-level entity set), no table is required to be created for the higher-level entity
set. Instead a table needs to be created for each lower-level entity set; and each such
table will also include all the attributes of higher-level entity set, in addition to its own
distinct attributes. For example, the table SAVINGS-ACCOUNT will also have columns
Account-Number, Balance and Interest-Rate; and the table CURRENT-ACCOUNT will
also have the columns Account-Number, Balance and Over-Draft. The same is applicable
for FD and RD tables.

(f) Tabular representation of Aggregation. Take the following Example:-

Praveen Kumar
41

B-Name
E-Nam B#
E# e
BRANCH J#

EMPLOYEE JOB

EBJ

Aggregated Higher- evel-Entity-Set “EBJ”

Mgr-I EBJM
d

MANAGER

In the above scenario, there will be tables for Entity Sets EMPLOYEE, BRANCH, JOB
and MANAGER. There will be one table for Relationship Set EBJM having Attributes
E#, B#, J# and Mgr-Id. No table is required for the Relationship Set EBJ because this
table would be a subset of table EBJM.

EBJM
E# B# J# Mgr-Id

E-R DIAGRAM OF AN AIRLINE RESERVATION SYSTEM


CAPACITY
AC_NO AC_TYPE

AIRCRAFT
TO_PLACE

FROM_PLAC DATE ATD


ETD CREW_NAM
E E

Praveen Kumar
42

FLT_NO AT
ETA A
DESIGNATION
FLT_SCHEDULE FLIGHT CREW-ID

FLT_ CREW
CREW
C_DATE

CONFIRMED
CANCELLATION RESERVATION
SEAT_NO

TICKET_N
O ISSUE_DATE
AMOUNT
VOUCHER
_NO
FARE

TICKET
REFUND
P_ADDRES
S
P_TEL_NO
P_NAME

PASSENGER

Where ETD: Estimated Time of Departure (i.e. scheduled take-off time).


ETA: Estimated Time of Arrival (i.e. scheduled landing time at the destination)
ATD: Actual Time of Departure. Initially, it will have a NULL value. It will
get defined after aircraft actually takes-off.
ATA: Actual Time of Arrival. Initially, it will have a NULL value. Its value will
get defined only after aircraft actually lands at the destination.

The above E-R Diagram can be reduced to the following set of tables:-

FLIGHT_SCHEDULE (FLT_NO, FROM_PLACE, TO_PLACE, ETD,

Praveen Kumar
43

ETA)

Praveen Kumar
44

AIRCRAFT (AC_NO, AC_TYPE, CAPACITY)


CREW (CREW_ID, CREW_NAME,
DESIGNATION) FLIGHT (FLT_NO, DATE, AC_NO,
ATD, ATA) FLT_CREW (FLT_NO, DATE,
CREW_ID)
TICKET (TICKET_NO, ISSUE_DATE, FARE, P_NAME, P_ADDR,
P_TEL_NO) RESERVATION (TICKET_NO, FLT_NO, DATE, CONFIRMED,
SEAT_NO) CANCELLATION (TICKET_NO, FLT_NO, DATE,
VOUCHER_NO, C_DATE) REFUND (VOUCHER_NO, AMOUNT)

Since, there is one-to-one relationship between TICKET and PASSENGER, a common


table TICKET will suffice for these two entity sets. Even the CANCELLATION table can
be combined with RESERVATION table, since there is many-to-one relationship between
RESERVATION and REFUND.

The SEAT_NO will get defined only after a passenger checks in for a flight.

E-R DIAGRAM FOR VEHICLE INSURANCE

O_ADDRESS
O_TEL_NO

O-NAME

OWNER
PREMIUM

COLOR EXPIRY_DATE

MODEL
REG_NO
BONUS
POLICY_NO
MAKE

VEHICLE INSURANCE_POLICY

P_DAT
A_DATE E
A_REPORT_NO PAYMENT_
VOUCHER P_AMOUN
_NO T
PLACE

ACCIDENT CLAIM_PAYMENT

S_REPORT_NO
ASSESSED
_DAMAGE Praveen Kumar
45

SURVEYOR REPORT

REPAIR_ITEM
COST
REF_NO

REPAIRS

Praveen Kumar

You might also like