Unit 1 (Introduction & ER Model)
Unit 1 (Introduction & ER Model)
The ER Model - The Relational Model, Relational Calculus, Introduction to Database Design,
Database Design and ER Diagrams-Entities Attributes, and Entity Sets-Relationship and
What
Relationship Sets - Conceptual Design is Database
with ER Model System
• A database system is basically a computerized record keeping system. i.e., it is a computerized system
whoseoverall purpose is to store information and to allow users to retrieve and update that information on
demand.
• A simplified picture of the database system is shown below.
• This system is a multi-user system in which many users can access the database at the same time.
Hardware:
• The hardware components of the system include disk, CPU, main memory etc.
Software:
• Between the physical database and the user is a layer of software, known as Database Management
System(DBMS)
• DBMS handles all requests for access to the database.
• Functions of DBMS:
o Shields the database users from hardware level details.
o Data Dictionary Management
o Data Storage Management
o Data Transformation and Presentation
o Security Management
What is a Database?
• A database is collection of related data
• Def: A database is a collection of persistent data that is used by the application systems of some given
enterprise.
o Database is persistent because once it has been accepted by the DBMS for entry, it can
subsequentlybe removed from the database only by some explicit request to the DBMS.
o Enterprise can be commercial, scientific, technical or other organization. It can be a single
individual ora complete corporation.
• A real world scenario will be translated into a database. It is combination of entities, relationships and
properties.
• Entity is any real world object w.r.t our problem domain. It is an object about which we wish to
recordinformation.
• An entity set is a set of that share common properties of entities
• A relationship is an association among two or more entities or other relationships.
• The properties/attributes are the information we want to record about entities.
• Properties can be either simple or complex.
Why database?
• A database is a collection of data or information which is held together in an organised or logical way.
• Databases can either be paper based or computerised.
The advantages of database system over traditional paper-based methods of record keeping are
1. Compactness : There is no need for paper files which are big at volume.
2. Speed : machine retrieves and updates data faster than machine.
3. Less drudgery : mechanical works like maintaining files by hand is eliminated.
4. accurate : Accurate, up-to-date information is available on demand at anytime.
5. Protection : Database can be better protected against unintentional loss & lawful access.
6. Multi-user environment can be provided.
Computerised databases can either use a File management System or Database Management system.
They are
1. Data redundancy and inconsistency : Redundancy means multiple copies of the same data. To access data
different programmers create files with different structures in different programming languages. It leads to
data duplication.
Eg: If the student address is stored in several files , and we change only one copy then it leads to data
inconsistency.
Data inconsistency means, multiple copies of the same data may no longer agree.
2. Difficulty in accessing the data: FPS doesn’t allow required data to be retrieved in a convenient and
efficient manner.
Eg: if a user asks a list of customers living in a particular town then they have to write a program manually,
which is a tedious job.
3. Data isolation: New application is needed to retrieve data, as data scatters in multiple files, which has
different formats. Writing new applications every time is a difficult task.
4. Integrity problems : Data stored in the system has to satisfy certain consistency
constraints.Eg: minimum balance of an account be Rs.500/-
We say that data is maintaining integrity, when all these conditions are met.
In FPS, when new constraints are added, it is difficult to change the programs to enforce them.
5. Atomicity Problems : Atomicity means either all actions of a program should be executed, or none of
theactions should be executed.
Eg: A fund transfer of Rs. 500/- from Account A to B
A B
Assume the initial balances 1000 2000
A B
If all actions are done 500 2500
A B
If none of them are done 1000 2000
A B
But partial execution of operations is not acceptable 500 2000
It is difficult to ensure atomicity in FPS.
6. Concurrent access anomalies : When many programs are executed simultaneously, sometimes anomalies
maycome. Dealing these anomalies is difficult case of FPS.
7. Security Problems : Security can be provided by keeping passwords to files. But this is not sufficient when
we have to allow users to access different subsets of the data.
These problems of FPS prompted the development of database systems. A DBMS is a piece of software designed to
make the preceding tasks easier.
Single Tier DBMS Architecture is the most straightforward DBMS architecture. All the DBMS
components reside on a single server or platform, i.e., the database is directly accessible by
the end-user. Because of this direct connection, the DBMS provides a rapid response, due to
which programmers widely use this architecture to enhance the local application.
In this structure, any modifications done by the client are reflected directly in the database, and
all the processing is done on a single server. Also, no network connection is required to perform
actions on the database. This database management system is also known as the local database system.
• Conceptual schema (logical schema) describes the stored data interms of the data model of the DBMS.
• Physical schema specifies storage details
• External schema allows access to individual users or group of users.
Architecture of DBMS
The Architecture of most of commercial DBMS available today is mostly based on this ANSI-SPARC1 database
architecture.
The objectives of this three-level architecture is to separate the user's view,
• It allows independent customized user views: Each user should be able to access the same data, but have a
different customized view of the data. These should be independent: changes to one view should not affect
others.
• It hides the physical storage details from users: Users should not have to deal with physical database
storage details.
• The database administrator should be able to change the database storage structures without affecting the
users’ views.
• The internal structure of the database should be unaffected by changes to the physical aspects of the
storage: For example, a changeover to a new disk.
1
ANSI-SPARC stands for American National Standards Institute, Standards Planning And Requirements Committee, is an
abstract design standard for a Database Management System (DBMS)
• These three levels provide data abstraction, that means they hide the low level complexities from end users .
• Using these three levels, it is possible to use complex structures at internal level for efficient operations
andto provide simpler convenient interface at external level.
• The detailed database architecture is shown in the below figure.
• Any database will have only one conceptual and physical schema, but it may have more number of
external schemas, each tailored to a particular group of users.
• External Schema is a collection of one or more views and relations from the conceptual schema. By
concept aview is a relation, but it is not stored in the DBMS.
• External Schema design is guided by end user requirement
Data Independence :
• The main benefit of DBMS is it offers data independence i.e., the
application programs are insulated from changes in the way the data is
structured and stored and this is achieved through the use of the three
levels ofabstraction.
• The main purpose of data abstraction is achieving data independence
in order to save time and cost requiredwhen the database is modified
or altered.
Mappings:
• In addition to the levels, architecture also involves mappings.
o One conceptual / internal mapping
o Several external / conceptual mappings
Properties of Database
• Data sharing
• Data Integration
• Data Integrity
• Data Security
• Data Abstraction
• Data Independence
The above given image represents the data model of the Vehicle database,
vehicle are classified into two types Viz. two-wheelers and four-wheelers
and then they are further classified. The main drawback we can see here is
we can only have one to many relationships under this model, hence the
hierarchical data model is very rarely used nowadays.
Network Model : A network model is nothing but a generalization of the hierarchical data model as this data model
allows many to many relationships therefore in this model a record can also have more than one parent.
The network model can be represented as a graph and hence it replaces
the hierarchical tree with a graph in which object types are the nodes and
relationships are the edges.
For example -
Here you can see all the three departments are linked with the director
which was not possible in the hierarchical data model. In the network
model, there can be many possible paths to reach a node from the root
node (College is the root node in the above case), therefore the data can
be accessed efficiently when compared to the hierarchical data model.
But, on the other hand, the process of insertion and deletion of data is
quite complex.
Entity-Relationship model (ER Model) : An Entity-Relationship model is a high-level data model that describes the
structure of the database in a pictorial form which is known as ER-diagram. In simple words, an ER diagram are used to
represent logical structure of the database easily. ER model develops a conceptual view of the data hence it can be
used as a blueprint to implement the database in the future. Developers can easily understand the system just by
looking at ER diagram. Let's first have a look at the components of an ER
diagram.
In the above-represented ER diagram, we have two entities that are Employee and Company and the relationship
among them. Also, in the above-represented ER diagram, we can see that both employee and company have some
attributes and the relationship is of "works in" type, which means employee works in a company.
The above table shows a relation "STUDENT" with attributes as Stu. Id, Name, and Branch which consists of 4 records
or tuples.
Object-Oriented Data model: As suggested by its name, the object-oriented data model is a combination of object
oriented programming, and relational data model. In this data model, the data and their relationship are
represented in a single structure which is known as an object. Since data is stored as objects we can easily
store audio, video, images, etc in the database which was very
difficult and inconvenient to do in the relational model. As shown in
the image below two objects are connected with each other through
links.
Advantages of DBMS
1. Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e storing same data
multiple times). In a database system, by having a centralized database and centralized control of data by
the DBA the unnecessary duplication of data is avoided. It also eliminates the extra time for processing the
large volume of data. It results in saving the storage space.
2. Improved Data Sharing: DBMS allows a user to share the data in any number of application programs.
3. Data Integrity: Integrity means that the data in the database is accurate. Centralized control of the data
helps in permitting the administrator to define integrity constraints to the data in the database. For
example: in customer database we can enforce an integrity that it must accept the customer only from
Noida and Meerut city.
4. Security: Having complete authority over the operational data, enables the DBA in ensuring that the only
mean of access to the database is through proper channels. The DBA can define authorization checks to be
carried out whenever access to sensitive data is attempted.
5. Data Consistency: By eliminating data redundancy, we greatly reduce the opportunities for inconsistency.
For example: is a customer address is stored only once, we cannot have disagreement on the stored values.
Also updating data values is greatly simplified when each value is stored in one place only. Finally, we avoid
the wasted storage that results from redundant data storage.
6. Efficient Data Access: In a database system, the data is managed by the DBMS and all access to the data is
through the DBMS providing a key to effective data processing
7. Enforcements of Standards: With the centralized of data, DBA can establish and enforce the data
standards which may include the naming conventions, data quality standards etc.
8. Data Independence: Ina database system, the database management system provides the interface
between the application programs and the data. When changes are made to the data representation, the
meta data obtained by the DBMS is changed but the DBMS is continues to provide the data to application
program in the previously used way. The DBMs handles the task of transformation of data wherever
necessary.
9. Reduced Application Development and Maintenance Time : DBMS supports many important functions
that are common to many applications, accessing data stored in the DBMS, which facilitates the quick
development of application.
1. What is the need of data model in DBMS and give its classification
2. Who are the different database users? Explain their interfaces to database management system.
3. Describe the client server architecture for the database with necessary diagram.
4. Define Database Management Systems.
5. What is Data Base Administrator? Discuss the functions of DBA.
6. Explain DBMS applications
7. What are the disadvantages in file system?
8. What is data independence? Discuss three tire schema architecture of data independence.
9. Explain storage manager component.
10. Explain object-oriented data model.
11. Explain briefly the languages supported by database systems.
12. What is Data modeling? Explain relational model.
13. List various types of database users. Explain.
14. Discuss abstract view of data with diagram.
15. Explain about Entity-Relationship model with an example.
16. Define the two levels of data independence.
17. Explain the merits and demerits of data base system.
18. Differentiate between schema and instance.
19. Describe the characteristics of a database system.
20. Draw and explain three-tier schema architecture of database system.
21. Present any two database applications by describing their features.
22. What do you mean by environment in database systems? Explain with the help of database system
structures.
23. Mention various groups of database users. Explain about their roles in detail.
24. What is a data model? Describe various data models.
25. Distinguish between centralized and client-server architectures of a database system.
26. Differentiate between File system and Database System
27. List out Database applications.
28. Explain in detail about Database Management System advantages over file management system.
29. Explain the concept of Data independence.
30. Briefly describe various architectures of database systems.
31. What is Data Independence? Why is it essential?
Relational Calculus:
• It is an alternative to Relational Algebra.
• It allows us to describe the set of answers without being explicit about how they should be computed. i.e.,
it isa non-procedural language. It can also be called as declarative language.
• Calculus has variables, constants, comparison operators, logical connectives and quantifiers
• Relational calculus comes in two flavours
o Tuple Relational Calculus (TRC)
o Domain Relational Calculus (DRC)
• Both TRC and DRC are simple subsets of first-order logic.
• Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants
tovariables that make the formula evaluate to true.
• Result of { T | P(T) } this query is the set of all tuples t which the formula P(T) evaluates to true.
Examples :
1. Find the names and ages of sailors with a rating above 7.
{P | S Sailors(S.rating > 7 P.name = S.sname P.age = S.age)}
This query illustrates a useful convention: P is considered to be a tuple variable with exactly two fields, which
are called name and age, because these are the only fields of P that are mentioned and P does not range
over any of the relations in the query; that is, there is no subformula of the form P Relname. The result of
this query is a relation with two fields, name and age. The atomic formulas P.name = S.sname and P.age =
S.age give values to the fields of an answer tuple P . On instances B1, R2, and S3, the answer is the set of
tuples 〈Lubber, 55.5〉, 〈Andy, 25.5〉, 〈Rusty, 35.0〉, 〈Zorba, 16.0〉, and 〈Horatio,35.0〉.
2. Find the sailor name, boat id and reservation date for each reservation.
{P | R Reserves S Sailors
(R.sid = S.sid
P.bid = R.bid
P.day = R.day
P.sname = S.sname)}
For each Reserves tuple, we look for a tuple in Sailors with the same sid. Given a pair of such tuples, we
construct an answer tuple P with fields sname, bid, and day bycopying the corresponding fields from these
two tuples. This query illustrates how we can combine values from different relations in each answer tuple.
5. Find the names of sailors who have reserved at least two boats.
{P | S Sailors R1 Reserves R2 Reserves (S.sid = R1.sid
R1.sid = R2.sid
R1.bid ≠ R2.bid
P.sname = S.sname)}
We can write this query without using implication, by observing that an expression of the form p q is
logically equivalent to ¬p ^ q:
{S | S Sailors B Boats
(B.color ≠’red’ (R Reserves(
S.sid = R.sid
R.bid = B.bid)))}
This query should be read as follows: “Find sailors S such that for all boats B, either the boat is not red or a
Reserves tuple shows that sailor S has reserved boat B.”
• The result of this query is the set of all tuples <x1, x2, ..., xn> for which the formula evaluates to true.
(or)
3. Find the names of sailors who have reserved at least two boats.
{〈N 〉| I, T, A(〈I, N, T, A〉 Sailors
Br1, Br2, D1, D2(I, Br1, D1〉 Reserves
〈I, Br2,D2 〉 Reserves Br1 = Br2)
Database Design
Database Design process is divided into 6 basic steps.
1. Requirements Analysis
2. Conceptual Database Design
3. Logical Database Design
4. Schema Refinement
5. Physical Database Design
6. Application and Security Design.
1. Requirements Analysis
• This is the first step in designing any database application.
• This is an informal process that involves
o Discussion with users,
o Study of current environment and its changes
o Analysis of documentation on existing applications.
4. Schema Refinement :
• Under this, we have to analyze the collection of relations in our relational database schema to identify
thepotential problems.
6. Security Design:
• In this step, we must identify different user groups and different roles played by various users.
• For each role, and user group, we must identify the parts of the database that they must be able to access
Entity :
• An entity is a thing or object in the real world with an independent existence.
• An entity may be an object
o with a physical existence (for example, a particular person, car, house, or employee) or
o it may be an object with a conceptual existence (for instance, a company, a job, or a university
course).
• Collection of similar entities is referred as Entity Set
o Eg: Faculty set
• Entity Sets need not be disjoint.
o Eg: A student who is part of CSE entity set may also be a part of CRT entity set.
• Entities are represented by Rectangle with entity name inside.
Attribute :
• Attributes are particular properties that describe the entity.
• All entities in a given entity set have the same attributes.
o Eg: Attributes of Employee entity set are empno, name, salary .
• It is represented by an oval.
• The attribute value that describes it becomes major part of the data stored in the Database.
• Domain is set of possible values for an attribute. For every attribute of an entity set, we must
identify adomain.
o Eg: employee name is set of 50 character string.
• NULL values : sometimes a particular entity may not have an applicable value for an attribute or the value
is unknown. To handle these situations a special value NULL is used.
o Eg: Flat number in the address, college degree of a person(may or may not exist)
Composite Attributes :
o The attributes which can be further divided into more attributes with independent meanings.
• Eg: Address can be further divided into street, city, state, country.
o The notation used is :
Multivalued Attribute :
o Attributes which can have multiple values for a single entity.
• Eg: color of a car, phone number.
o These attributes are represented using a double lined oval.
Stored Attribute :
o Attributes that cannot be derived from other attributes.
• Eg: Birth_date
Complex Attributes:
o An attribute which is made by using the multivalued attributes and composite attributes.
• Eg: A person can have more than one residence; each residence can have more than one phone.
Key Attributes :
• The attribute which uniquely identifies each entity in the entity set is called key attribute.
• For example, Roll_No will be unique for each student.
• These attributes have unique values for all entities.
• In ER diagram, key attribute is represented by an oval with underlying line.
Descriptive attributes: A relationship can have attributes which describes about the relation. These are
usedto record information about the relationship, rather than about any one of the participating entities.
o Eg: Latha works in CSE department since 01/01/15.
Department of CSE, Vishnu Institute of Technology 22
Database Management Systems UNIT-I
• The relationships are identified the participating entities, not by descriptive attributes.
o i.e., in the above example relationship is identified by eid and did.
Degree of a relationship :
The number of different entity sets participating in the relationship set is the degree. It can be
o Unary relationship
o Binary relationship
o Ternary relationship
o N-ary relationship
Unary relationship :
• This is a relationship between the entities of the same entity set.
• The entity sets that participate in a relationship set need not be distinct.
o Eg: Latha reports to Sumit.
Both Latha and Sumit are entities of Employee Entity set, but they are playing different roles.
Here they are supervisor and subordinate.
Binary relationship :
• This is a relationship between any two different entities.
o Eg: The worksin relationship shown above is a binary relationship. Because the participating
entitiesare only two (Employee and Department)
Ternary Relationship:
• This is a relationship between any three different entities.
o Eg: Each department has offices in several locations. The ER diagram to record the locations at
which each employee works is
one – to –
one
One entity from entity set A can be associated with more than one entities of entity
set B however an entity from entity set B, can be associated with at most one entity.
It is marked as 1 :N
one – to –
many
More than one entities from entity set A can be associated with at most one entity of
entity set B, however an entity from entity set B can be associated with more than
one entity from entity set A.
It is marked as N :1
many – to –
one
One entity from A can be associated with more than one entity from B and vice versa.
It is marked as M :N
many – to –
many
Participation Constraint:
• It specifies whether the existence of an entity depends on its being related to another entity via a
relationshiptype.
• This constraint specifies the minimum number of relationship instances that each entity can participate in.
• There are two types of participation constraints
• Total Participation : Every entity in the entity set must be related to another entity in the relationship set.
o Eg: Every employee must work for a department.
In ER diagrams the total participation is represented by double line between participating entity
type torelationship.
• Partial Participation : Some part of the entityset is related to some other entity via a relationship.
o Eg: Few of the employee are managing the department.
In ER diagrams the total participation is represented by solid line between participating entity
type torelationship.
o Consider the relationship - Employee is head of the department. Here all employees will not be the
head of the department. Only one employee will be the head of the department. In other words,
only few instances of employee entity participate in the above relationship. So employee entity's
participation is partial in the said relationship. However each department will be headed by some
employee. So department entity's participation n is total in the said relationship.
• As weak entities cannot be identified on their own, they always has a total participation w.r.t to its
identifyingrelationship.
• Partial Key : the attribute that can uniquely identify weak entities that are related to the same owner entity.
o Eg: pname (name of the dependent)
• In ER diagram, the weak entity types are represented by double lined rectangle.
EER Model
• It is natural to classify the entities in an entity set into subclasses.
• EER model stands for Enhanced Entity Relationship Model.
• It includes all the modelling concepts of ER model and also the concepts of subclass and superclass and the
related concepts of specialization and generalization. The resultant diagrams are called Enhanced ER
diagrams.
• Specialization:
o Specialization is the process of identifying subsets of an entity set that share some
distinguishing characteristic.
o The entity type from which subclasses are defined is called superclass of specialization.
o Eg:
• set of classes Secretary, Technician, engineer are subclasses of superclass employee. This
distinguishing is on the basis of job type.
• Another specialization on employee based on method_of_pay can be hourly_employee
and salaried_employee.
• **we may have several specializations of the same entity type based on different distinguishing
characteristic**
• Attributes of subclass are called specific attributes.
• In the above ER diagram, two specialized entities were identified based on job type.
• Generalization:
o This is reverse process of abstraction.
o Generalization is the process of defining generalized entity type from the given entity types.
o Here we suppress the differences among several entity types, identify their common features,
andgeneralize them into a single super class.
o Eg: consider two entity types CAR and TRUCK
o From the above two entites a generalized entity can identified with the attributes vehicle_id,
licence_plate_no, price.
• Aggregation:
o A relationship is an association between entitysets. But sometimes Database Designers may need to
model a relationship between a collection of entities and relationships.
o Aggregation is a feature, which allows us to indicate that a relationship participates in another
relationship.
o Eg: Each project entity is sponsored by one or more departments. Sponsors relationship captures this
information. The department that sponsors a project might assign employees to monitor the
sponsorship. Monitors is a relationship that is associated with sponsors relationship.
o We use aggregation, to express relationship among relationships.
Works_In2 does not allow an employee to work in a department for two or more periods.
Similar to the problem of wanting to record several addresses for an employee: we want to record several
values of the descriptive attributes for each instance of this relationship.
Entity versus Relationship: There is at most one employee managing a department, but a given employee
could manage several departments; we store the starting date and discretionary budget for each manager-
department pair. This approach is natural if we assume that a manager receives a separate discretionary
budget for each department that he or she manages.
What if manager’s dbudget covers all managed depts? (can repeat value, but such redundancy is
problematic)
Advantages of ER Modeling
1. ER Modeling is simple and easily understandable. It is represented in business users language and it can be
understood by non-technical specialist.
2. Intuitive and helps in Physical Database creation.
3. Can be generalized and specialized based on needs.
4. Can help in database design.
5. Gives a higher level description of the system.
Disadvantages of ER Modeling
1. Physical design derived from E-R Model may have some amount of ambiguities or inconsistency.
2. Sometime diagrams may lead to misinterpretations
Example:
ER Diagram of a company
ER-Diagram on Hospital
Important Questions
1. How to represent a weak entity set in ER diagram? Quote suitable example.
2. Define the following terms and give examples
a) cardinality (ii)unary relationships (iii)aggregation (iv)specialization
3. How to maintain class hierarchies in ER-Diagrams? Explain with employee database.
4. Explain the following terms:
a) Entity and entity set.
b) Attribute and attribute sets.
c) Relationship and relationship sets.
5. Define generalization and aggregation. Demonstrate generalization and aggregation using E-R diagram.
6. Explain about Entity-Relationship model with an example.
7. Explain about domain constraints and key constraints.
8. What are the major components used in E-R diagram design?
9. Differentiate between super class and sub class.
10. Construct an ER diagram for university registrar’s office. The office maintains data about each class,
including the instructor, the enrollment and the time and place of the class meetings. For each
student class pair a grade is recorded.
11. Determine the entities and relationships.