UNIT1_DBMS-1
UNIT1_DBMS-1
UNIT1_DBMS-1
Introduction: Database System Applications, Purpose of Database Systems, View of Data, Components
and Structure, Database Users and Administrator, History of Database Systems. Data models: ER
model, Entities, Attributes and Entity sets, Relationships and Relationship sets, Additional features of
ER Model, Conceptual Design with the ER Model, Enhanced E-R Model- Mapping ER Model to
Relational Model.
5. Reservation systems
6.Telecommunication
1.2 PURPOSE OF DATABASE SYSTEM
In the early days, database applications were built directly on top of file systems. For
example, consider a savings-bank enterprise that keeps information about all customers and savings
accounts in files. The users can manipulate the information in the files using application programs.
Some of the programs are,
A program to debit or credit an account
A program to add an account
A program to find the balance of an account
A program to generate monthly statements
System programmers wrote these applications to meet the needs of the bank. New
application programs are written as the need arises. For example ,if the savings bank decides to
offer checking accounts, the bank creates new permanent files that contain information about all the
checking accounts, and system programmers has to write new application programs to deal with
situations that do not arise in savings accounts, such as overdrafts. Thus as time goes by, the system
DEPTOFCSE 1
23CS1303 DATABASE MANAGEMENT SYSTEMS UNIT I
requires more files and more application programs. This leads to several disadvantages as given
below:
Disadvantages in file systems
The file processing system has a number of disadvantages:
1) Data redundancy and inconsistency
Since different programmers create the files and application programs, the files will have a
different format and the programs may be written in several programming languages. Also the
same information may be duplicated in several files. For example the address and telephone
number of a particular customer may appear in a file that consists of saving-account records and
in a file that consist of checking account records. This redundancy leads to higher storage and
access cost.
Data inconsistency occurs, if the various copies of the same data may no longer agree. For
example, a changed customer address may be reflected in savings account records but not
elsewhere in the system.
2) Difficulty in accessing data
Suppose that a bank officer needs to find the names of all customers who live in a particular
postal-code area. The officer insists the data processing department to generate such a list. The
original system has no application program to meet the request. However, the system has an
application program to generate the list of all customers. The officer has only two choices:
either can obtain the list of all customers and extract the needed information manually or ask the
system programmer to write the necessary application program. Both are unsatisfactory.
Thus the conventional file-processing environments do not allow needed data to be retrieved
in a convenient and efficient manner.
3) Data isolation
Because data are scattered in various files, and files maybe in ,writing new application
programs to retrieve the appropriate data is difficult.
4) Integrity problems
The data values stored in a database must satisfy some consistency constraints. For example,
the balance of a bank account may never fall below a prescribed amount (say, $50). Developers
enforce these constraints in the system by adding appropriate code in the various application
programs. However, when new constraints are added it is difficult to change the programs to
enforce them.
5) Atomicity Problems
A computer system like any other mechanical or electrical device is subject to failure. In
many applications, if a failure occurs, it is difficult to restore the data to a consistent state as it
existed prior to the failure. For example consider a program to transfer $50 from account A to
account B. if a system failure occurs during the execution of the program, it is possible that $50
was removed from account A but was not credited to account Resulting in inconsistent
DEPTOFCSE 2
23CS1303 DATABASE MANAGEMENT SYSTEMS UNIT I
database state. The funds transfer must be atomic. It must happen in its entirety or not at all.
Thus it is difficult to ensure atomicity in conventional file processing system.
6) Concurrent access anomalies
To improve the overall performance of the system, many systems allow multiple users to
update the data simultaneously. In such an environment, interaction of concurrent updates may
result in inconsistent data. For example consider bank account A, containing $500. If two
customers withdraw funds (say $50 and $100 respectively) from account at about the same
time, the result of the concurrent executions may leave the account in an incorrect state. If the
two programs run concurrently, they may both read the value $500, and write the result back
$450 and $400 respectively. Depending on which one writes the value last, the account may
contain either $450 or $400, rather than the correct value of $350.
7) Security Problems
Every user of the database system should not be able to access all the data. For example, in
a banking system, payroll personnel need to see only that part of the database that has
information about the various bank employees. They do not need to access information about
customer accounts. The file processing systems do not enforce security constraints.
Areas of Database Applications:
o Banking: all transactions Airlines: reservations, schedules
o Universities: registration, grades
o Sales: customers, products, purchases
o Online retailers: order tracking, customized recommendations
o Manufacturing: production, inventory, orders, supply
chain
o Human resources: employee records, salaries, tax deductions
o Examples of Database Management Systems
IBM’sDB2, Microsoft’s Access and SQL Server, Oracle
Advantages of Using the Database Approach:
o Controlling redundancy in data storage and in development and maintenance
efforts.
DEPTOFCSE 3
23CS1303 DATABASE MANAGEMENT SYSTEMS UNIT I
DEPTOFCSE 4
CS8492 DATABASE MANAGEMENT SYSTEMS UNITI
Programs refer to the data model constructs rather than data storage details
Support of multiple views of the data:
Each user may see a different view of the database, which describes only the data of interest
to that user.
Sharing of data and multi-user transaction processing:
Allowing a set of concurrent users to retrieve from and to update the database.
Concurrency control within the DBMS guarantees that each transaction is correctly executed
or aborted
Recovery subsystem ensures each completed transaction has its effect permanently recorded
in the database
OLTP(Online Transaction Processing) is a major part of database applications. This allows
hundreds of concurrent transactions to execute per second.
• Programs are isolated from data through abstraction
• It does not expose details of how(or where) data is stored or how operations are
implemented
External Schema
External View External View
Database
DEPTOFCSE 5
CS8492 DATABASE MANAGEMENT SYSTEMS UNITI
Mapping:
The transformation of requests and results from one level to another is called as
mapping.
Schema
It is the logical structure of the database. The two types of schema are:
Physical schema represents database design at the physical level
Logical schema represents database design at the logical level
Instance– the actual content of the database at a particular point in time is called the instance of the
schema
Data Independence
The ability to change a schema at one level without affecting the higher level schemas is
called as data independence. There are two types of data independence:
o Physical Data Independence– is the ability to modify the physical or internal
schema without changing the logical or external schemas.
o Logical Data Independence– is the ability to modify logical or conceptual schema
without changing the external schemas or application programs.
DEPTOFCSE 6
CS8492 DATABASE MANAGEMENT SYSTEMS UNITI
DBA staff:
They work on defining the database and tuning it by making changes to its definition using
the DDL and other privileged commands.
DDL Compiler:
The DDL compiler processes schema definitions, specified in the DDL, and stores
descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes information
such as the names and sizes of files, names and data types of data items, storage details of each file,
mapping information among schemas, and constraints. In addition, the catalog stores many other
types of information that are needed by the DBMS modules, which can then look up the catalog
information as needed.
Query Compiler & Optimizer:
Query compiler compiles the queries of casual users and translates the min to an internal
form. This internal query is subjected to query optimization. The query optimizer is concerned
with the rearrangement and possible reordering of operations, elimination of redundancies, and use
of correct algorithms and indexes during execution. It consults the system catalog for statistical and
other physical information about the stored data and generates executable code that performs the
necessary operations for the query and makes calls on the runtime processor.
Precompiler, DML Compiler & Host Language Compiler:
Application programmers write programs in host languages such as Java, C, or C++ that
aresubmittedtoaprecompiler.TheprecompilerextractsDMLcommandsfromanapplicationprogram
written in a host programming language. These commands are sent to the DML compiler for
compilation in to object code for database access.
The rest of the program is sent to the host language compiler. The object codes for the DML
commands and the rest of the program are linked, forming a canned transaction whose executable
code includes calls to the runtime database processor. Canned transactions are executed repeatedly
by parametric users, who simply supply the parameters to the transactions. Each execution is
considered to be a separate transaction.
Lower Module:
Runtime database processor:
It executes the privileged commands, the executable query plans, and the canned
transactions with runtime parameters. It works with the system catalog and may update it with
statistics. It also works with the stored data manager, which in turn uses basic operating system
services for carrying out low-level input/output (read/write) operations between the disk and main
memory.
The runtime database processor handles other aspects of data transfer, such as management
of buffers in the main memory. Some DBMSs have their own buffer management module while
others depend on the OS for buffer management. We have shown concurrency control and
backup and recovery systems separately as a module in this figure. They are integrated into the
working of the runtime database processor for purposes of transaction management.
DEPTOFCSE 7
CS8492 DATABASE MANAGEMENT SYSTEMS UNITI
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers characteristic of being able to modify the schema at one level
of the database system without altering the schema at the next higher level.
DEPTOFCSE 8
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
Logical data independence refers characteristic of being able to change the conceptual
schema without having to change the external schema.
Logical data independence is used to separate the external level from the conceptual
view.
If we do any changes in the conceptual view of the data, then the user view of the data
would not be affected.
Logical data independence occurs at the user interface level.
Physical data independence can be defined as the capacity to change the internal
schema without having to change the conceptual schema.
If we do any changes in the storage size of the database system server, then the
Conceptual structure of the database will not be affected.
Physical data independence is used to separate conceptual levels from the internal levels.
Physical data independence occurs at the logical interface level.
DEPTOFCSE 9
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
1.7 DATAMODELS:
A data model is a collection of concepts that can be used to describe the structure of a
database. By structure of a database we mean the data types, relationships and constraints that
apply to the data.
1. Relational Model
The relational model uses a collection of tables to represent both data and relationships
among the data. Each table has multiple columns, and each column has a unique name. The
below table called customer table, shows, for example, that the customer identified by customer-
id 100 is named john and lives at 12 anna st. in Chennai and also shows his account number.
DEPTOFCSE 10
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
DEPTOFCSE 12
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
relationship sets.
For a binary relationship set R between entity sets A and B, the mapping cardinality must
be one of the following:
One to one: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A. The next figure, shows one to one relation.
A B
a1 b1
B
a2 2
a3 b3
b1
a1
b2
DEPTOFCSE a2 3
b3
a3
a2 a
a3 a
a4 a
a5
Many to many : An entity in A is associated with any number of entities in B, and an entity in
DEPTOFCSE 14
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
B is associated with any number of entities in A. the below figure shows many to many relation.
Participation Constraints:
Total Participation:
TheparticipationofanentitysetEinarelationshipsetRissaidtobetotalifeveryentityinEparticipate
sinatleastonerelationshipinR.
A B
a1 b1
a2 b2
a3 b3
a4 b4
Example: every loan entity should be related to atleast one customer through the borrower
relationship. Therefore the participation of loan in the relationship set borrower is total.
Partial Participation:
The participation of an entity set E in a relationship set R is said to be partial if only some
entities in E participate in relationships in R.
Example:
An individual can be a bank customer whether or not the person has a loan with the bank.
Hence it is possible that only some of the customer entities are related to the loan entity set
through the borrower relationship. Hence the participation of customer in the borrower
relationship set is partial.
DEPTOFCSE 15
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
Cardinality constraints:
Cardinality constraints can be expressed by drawing either a directed line(), signifying
“one,” or an undirected line(—),signifying “many” between the relationship set and the entity set.
One-to-one relationship:
A customer is associated with atmost one loan via the relationship borrower
A loan is associated with atmost one customer via borrower as shown below.
One-to-Many:
In the one-to-many relationship a loan is associated with atmost one customer via
borrower, and a customer is associated with several(including)loans via borrower as shown below.
DEPTOFCSE 16
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
Many-to-one:
In a many-to-one relationship a loan is associated with several(including)customers via
borrower, and a customer is associated with atmost one loan via borrower as shown below.
Many-to-Many:
A customer is associated with several(possibly)loans via borrower
A loan is associated with several(possibly)customers via borrower as shown below.
The figure above shows the descriptive attribute attached to the relationship set
deposit or to specify the most recent date on which a customer accessed that account.
E-R diagram with composite, Multivalued and derived attributes:
DEPTOFCSE 17
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
Composite attribute:
In the above figure, the composite attribute name, is represented with the
component attributes first-name, middle-initial, and last-name. Also a composite attributes
address is represented with it’s component attributes street, city, state and zip-code. The attribute
street is itself a composite attribute with the component attributes street-number, street-name and
apartment-number.
Multivalued attribute and Derived attribute:
The previous figure also illustrates a multivalued attribute phone-number, depicted
by a double ellipse, and a derived attribute age, depicted by a dashed ellipse.
Roles: Roles are indicated in E-R diagrams by labeling the lines that connect diamond
store Rectangles. The below figure shows the role indicators manager and worker between the
employee entity set and the works-for relationship set.
In the above figure the labels “manager” and “worker” are called roles; they specify
how employee entities interact via the works for relationship set.
Role labels are optional, and are used to clarify semantics of the relationship.
The above figure consists of three entity sets employee, job, and branch, related through
the relationship set works-on.
Participation of an entity set in a relationship set:
Total participation (indicated by double line):
DEPT OF CSE 18
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
Every entity in the entity set Participates in atleast one relationship in the relationship set.
Double lines are used in E-R diagram to indicate the participation of an entity set in a
relationship set is total.
Example:
Participation of loan in borrower is total. Every loan must have a customer associated to it
via borrower
Partial participation:
Some entities may not participate in any relationship in the relationship set
Example:
Participation of customer in borrower is partial
The below figure indicates the participation of entity sets.
Cardinality limits can also express participation constraints an edge between an entity
set and a binary relationship set can have an associated minimum and maximum cardinality, in the
forml...h, where l is the minimum and h is the maximum cardinality. A minimum value of 1
indicates total participation of the entity set in the relationship set. A maximum value of 1
indicates that the entity participates in at most one relationship, while a maximum value *
indicates no limit. The label 1..* on an edge is equivalent to a double line.
Example:
In the previous figure the edge between loan and borrower has a cardinality constraint of
1...1,meaning the minimum and the maximum cardinality are both 1. That is, each loan must have
exactly one associated customer. The limit 0...* on the edge from customer to borrower indicates
that a customer can have zero or more loans. Thus, the relationship borrower is one to many from
customer to loan, and further the participation of loan in borrower is total.
DEPT OF CSE 19
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
In the case of entity set payment its primary key is {loan-number, payment-number},
where loan-number is the primary key of the identifying entity set, namely loan,and payment-
number distinguishes payment entities within the same loan.
The weak entity set is depicted by double rectangles in the E-R diagram. The
discriminator of a weak entity set is underlined with a dashed line as shown below.
Specialization:
An entity set may include sub-groupings of entities that are distinct in some way from
other entities in the set. The E-R model provides a means for representing these distinctive entity
groupings.
For example consider an entity set person with attributes name, street and city. A person
may be further classified as one of the following:
Customer
Employee
Each of these person types is described by a set of attributes that includes all the attributes
of entity set person plus possibly additional attributes. For example, customer entity may be
described further by the attribute customer-id, and employee entity may be described further by
the attributes employee-id and salary.
The process of designating sub groupings within an entity set is called specialization.
The specialization of person allows distinguishing among persons according to whether they are
employees or customers.
Specialization can be applied repeatedly to refine a design scheme. For example, the bank
employees may be classified as one of the following:
Officer
Teller
Secretary
Each of these types has its own attributes. In E-R diagram, specialization is depicted by a
triangle component labeled ISA. The label ISA stands for “is a” and represents, for example,
that a customer “is a” person. The ISA relationship may also be referred as a superclass-subclass
relationship. Higher and lower level entity sets are depicted as rectangles containing the name of
the entity set.
Generalization:
Generalization is a bottom-up design process which combines a number of entity sets that
share the same features into a higher-level entity set. For example employee and customer both
share the attributes name, street, and city which are combined into higher level entity set person.
Specialization and generalization are simple inversions of each other; they are represented in
an E-R diagram in the same way. The terms specialization and generalization are used
interchangeably.
DEPT OF CSE 21
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
Attribute inheritance:
Acrucialpropertyofthehigherandlowerlevelentitiescreatedbyspecializationandgeneralizatio
n is attribute inheritance. The attributes of higher-level entity sets are said to be inherited by the
lower-level entity sets. For example, customer and employee inherit the attributes of person. Thus,
customer is described by its name, street, and city attributes and additionally customer-id attribute.
Employee is described by its name, street, and city attributes, and additionally employee-id and
salary attributes.
In the above figure employee, is the lower-level entity set of person and higher level
entity set of the officer, teller, and secretary entity sets. The lower-level entity set with only one
ISA relationship indicates single inheritance. If an entity set is a lower- level entity set with more
than one ISA relationship, then the entity set has multiple inheritance, and the resulting structure
is said to be lattice.
DEPT OF CSE 22
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
Aggregation:
There is a redundancy in the above figure, since every employee, branch, job
combination in manages is also in works-on. The redundancy can be avoided by aggregation as
shown in the below figure.
.
1.9 CONCEPTUAL DESIGN WITH THE ER MODEL
DEPT OF CSE 23
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
The database can be represented using the notations, and these notations can be reduced to a
collection of tables.
In the database, every entity set or relationship set can be represented in tabular form.
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE forms individual tables.
In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the column of STUDENT table.
Similarly, COURSE_NAME and COURSE_ID form the column of COURSE table and so on.
In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key
attribute of the entity.
DEPT OF CSE 25
CS8492-DATABASE MANAGEMENT SYSTEMS UNITI
In the student table, a hobby is a multivalued attribute. So it is not possible to represent multiple values in a
single column of STUDENT table. Hence we create a table STUD_HOBBY with column name
STUDENT_ID and HOBBY. Using both the column, we create a composite key.
In the given ER diagram, student address is a composite attribute. It contains CITY, PIN, DOOR#, STREET,
and STATE. In the STUDENT table, these attributes can merge as an individual column.
In the STUDENT table, Age is the derived attribute. It can be calculated at any point of time by calculating
the difference between current date and Date of Birth.
Using these rules, you can convert the ER diagram to tables and columns and assign the mapping
between the tables. Table structure for the given ER diagram is as below:
DEPT OF CSE 26