Chapter 7 - Database Design
Chapter 7 - Database Design
1
Outlines
Steps of Database Design
Convert ER to relations
Normalization
2
Database Design
Database design is the process of coming up with different kinds of
specification for the data to be stored in the database.
Describe how data is stored in the computer system.
Defining its structure, characteristics and contents of data.
Database Design Process
Step 1:- Requirements collection and Analysis
Prospective users are interviewed to collect information.
This step result in a concise set of users requirement.
The functional requirement should be specified as well as data
requirements.
Functional requirement can be documented using diagrams such as
sequence diagrams, DFD scenarios.
3
Database Design … Cont’d
Step 2:- Conceptual Design
Create conceptual schema.
Conceptual schema:- concise description of data requirement of the user,
and include a detailed description of the entity types, relationships,
constraint.
End user must understand it.
Step 3:- Database implementation (Logical Design)
Use one of DBMS for implementation.
The conceptual schema is transformed from the high level data model
into implementation model.
Step 4:- Physical Design
International storage structure, indexes, access paths and file
organizations are specified.
Application programs are designed and implemented.
4
Database Design … Cont’d
Generally the design part has divided into three sub phases
Conceptual Design
Logical Design
Physical Design
Design strategies:
1. Top Down
Start with high level abstraction and refine it.
High level entities
Add sub classes
Attributes
2. Bottom up
Start with basic abstraction and then combine them.
Attributes
Group in to entities
5
Database Design … Cont’d
3. Inside out
Special case of bottom up.
Focus on central set of concepts and work out wards.
No burned on initial designer.
4. Mixed
Start with top down then use inside out or bottom up.
Divide and conquer.
6
Conceptual Design
Is the process of constructing a model of the information used in an
enterprise, independent of any physical consideration.
Is the source of information for logical design.
Is high level and understand by non technical user.
Conceptual model of enterprise, independent of implementation detail
such as target DBMS, application programs, programming language,
hardware platform, performance issues etc.
Tasks to be performed:-
Identity entity types and relationships.
Associate attributes with entities.
Determine attribute domains.
Determine unique identifier (Key) attributes.
Use entity relationship model (ER).
7
Conceptual Design … Cont’d
Why conceptual model:
Independent of DBMS.
Allow easy common b/n user and developer.
Is permanent description of the database requirements.
Database requirements
We must convert written database requirement in to an E-R diagram.
Need to determine the entities, attributes and relationships.
Nouns = entities
Adjectives = attributes
Verbs = relationships
8
Logical Design
Is the process of constructing model of data used in an organization.
9
Logical Design … Cont’d
Converting ER Diagram to Relations
Three basic rules to convert ER into tables.
For a relation with one to one cardinality
All the attributes are merged into a single table.
i.e. primary key or candidate key of one relation is foreign key for the
other.
For a relation with one to many cardinality
Post the primary key or candidate key for the “one” side as a foreign
key attribute to the “many side”.
For a relationship with many to many
Create a new table (which is the associative entity) and post primary
key or candidate key from each entity as attributes in the new table
along with some additional attributed (if applicable)
10
Logical Design … Cont’d
11
Logical Design … Cont’d
12
Logical Design … Cont’d
13
Logical Design … Cont’d
Mapping Regular Entities to relation
Simple attributes: ER Attributes map directly on to the relation.
Composite attribute: Use only their simple, component attributes
Multi-Valued Attribute: Becomes a separate relation with a foreign
key taken from the super entity.
14
Logical Design … Cont’d
15
Normalization
A relational database is merely a collection of data, organized in a
particular manner. The father of the relational database approach, Codd
created a series of rules called normal forms that help define that
organization.
One of the best ways to determine what information should be stored in a
database is to clarify what questions will be asked of it and what data
would be included in the answers.
Database normalization is a series of steps followed to obtain a database
design that allows for consistent storage and efficient access of data in a
relational database. These steps reduce data redundancy and the risk of
data becoming inconsistent.
NORMALIZATION is the process of identifying the logical associations
between data items and designing a database that will represent such
associations but without suffering the update anomalies which are;
Insertion, Deletion and Modification Anomalies
16
Normalization … Cont’d
Normalization may reduce system performance since data will be cross
referenced from many tables.
Thus DE normalization is sometimes used to improve performance, at
the cost of reduced consistency guarantees.
Normalization normally is considered as good if it is lossless
decomposition.
Mnemonic for remembering the rationale for normalization could be
the following:
No Repeating or Redundancy: no repeating fields in the table
The Fields Depend Upon the Key: the table should solely depend on the key
The Whole Key: no partial key dependency
And Nothing But The Key: no inter data dependency
17
Normalization … Cont’d
All the normalization rules will eventually remove the update
anomalies that may exist during data manipulation after the
implementation.
Pitfalls of Normalization
Requires data to see the problems
May reduce performance of the system
Is time consuming,
Difficult to design and apply and
Prone to human error
18
Normalization … Cont’d
The underlying ideas in normalization are simple enough. Through
normalization we want to design for our relational database a set of
tables that;
1. Contain all the data necessary for the purposes that the
database is to serve,
2. Have as little redundancy as possible,
3. Accommodate multiple values for types of data that require
them,
4. Permit efficient updates of the data in the database, and
5. Avoid the danger of losing data unknowingly
The type of problems that could occur in insufficiently normalized
table is called update anomalies which includes;
19
Normalization … Cont’d
1. Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database
entry into all the places in the database where information about that new
entry needs to be stored.
In a properly normalized database, information about a new entry needs to be
inserted into only one place in the database.
In an inadequately normalized database, information about a new entry may
need to be inserted into more than one place and, human fallibility being what
it is, some of the needed additional insertions may be missed.
2. Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing
database entry when it is time to remove that entry.
In a properly normalized database, information about an old, to-be-gotten-rid-
of entry needs to be deleted from only one place in the database.
20
Normalization … Cont’d
In an inadequately normalized database, information about that old entry may
need to be deleted from more than one place, and, human fallibility being what
it is, some of the needed additional deletions may be missed.
3. Modification anomalies
A modification of a database involves changing some value of the attribute of a
table. In a properly normalized database table, what ever information is
modified by the user, the change will be effected and used accordingly.
The purpose of normalization is to reduce the chances for anomalies to occur in a
database.
21
Normalization … Cont’d
Deletion Anomalies: If employee with ID 16 is deleted then ever
information about skill C++ and the type of skill is deleted from the
database. Then we will not have any information about C++ and its skill
type.
Insertion Anomalies: What if we have a new employee with a skill called
Pascal? We can not decide weather Pascal is allowed as a value for skill
and we have no clue about the type of skill that Pascal should be
categorized as.
Modification Anomalies: What if the address for Helico is changed fro
Piazza to Mexico? We need to look for every occurrence of Helico and
change the value of School_Add from Piazza to Mexico, which is prone to
error.
Database-management system can work only with the information that we
put explicitly into its tables for a given database and into its rules for
working with those tables, where such rules are appropriate and possible.
22
Functional Dependency (FD)
Before moving to steps of normalization, it is important to have an
understanding of "functional dependency."
Data Dependency
The logical association between data items that point the database
designer in the direction of a good database design are referred to as
determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent
relationship if certain values of data item B always appears with
certain values of data item A.
If the data item A is the determinant data item and B the dependent
data item then the direction of the association is from A to B and not
vice versa.
23
Functional Dependency (FD) … Cont’d
The essence of this idea is that if the existence of something, call it A, implies
that B must exist and have a certain value, then we say that "B is functionally
dependent on A."
We also often express this idea by saying that "A determines B," or that "B is
a function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the
statement, "If A, then B.“
It is important to note that the value B must be unique for a given value of A,
i.e., any given value of A must imply just one and only one value of B, in
order for the relationship to qualify for the name "function." (However, this
does not necessarily prevent different values of A from implying the same
value of B.)
X Y holds if whenever two tuples have the same value for X, they must
have the same value for Y
24
Functional Dependency (FD) … Cont’d
The notation is: A B which is read as; B is functionally dependent on A
In general, a functional dependency is a relationship among attributes. In
relational databases, we can have a determinant that governs one other
attribute or several other attributes.
FDs are derived from the real-world constraints on the attributes
Since the type of Wine served depends on the type of Dinner, we say
Wine is functionally dependent on Dinner. Dinner Wine
Since both Wine type and Fork type are determined by
the Dinner type, we say Wine is functionally dependent
on Dinner and Fork is functionally dependent on Dinner.
Dinner Wine
Dinner Fork
25
Partial Dependency
If an attribute which is not a member of the primary key is dependent
on some part of the primary key (if we have composite primary key)
then that attribute is partially functionally dependent on the primary
key.
Let {A,B} is the Primary Key and C is no key attribute.
Then if {A,B} C and B C
Then C is partially functionally dependent on {A,B}
26
Full Dependency
If an attribute which is not a member of the primary key is not
dependent on some part of the primary key but the whole key (if we
have composite primary key) then that attribute is fully functionally
dependent on the primary key.
27
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of
the following form: "If A implies B, and if also B implies C, then A
implies C."
28
Steps of Normalization
We have various levels or steps in normalization called Normal Forms.
The level of complexity, strength of the rule and decomposition increases
as we move from one lower level Normal Form to the higher.
A table in a relational database is said to be in a certain normal form if it
satisfies certain constraints.
Normal form below represents a stronger condition than the previous one.
Normalization towards a logical design consists of the following steps:
Un Normalized Form: Identify all data elements
First Normal Form: Find the key with which you can find all data
Second Normal Form: Remove part-key dependencies. Make all data
dependent on the whole key.
Third Normal Form: Remove non-key dependencies. Make all data
dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they
29 adhere to third normal form.
UNNORMALIZED FORM (UNF)
A table that contains one or more repeating groups.
A repeating group is a field or group of fields that hold multiple values
for a single occurrence of a field.
30
First Normal Form (1NF)
Requires that all column values in a table are atomic (e.g., a number is
an atomic value, while a list or a set is not).
We have two ways of achieving this:
1. Putting each repeating group into a separate table and
connecting them with a primary key-foreign key relationship
2. Moving this repeating groups to a new row by repeating the
common attributes. If so then Find the key with which you can find
all data
Definition of a table (relation) in 1NF if:
There are no duplicated rows in the table. Unique identifier
Each cell is single-valued (i.e., there are no repeating groups).
Entries in a column (attribute, field) are of the same kind.
31
First Normal Form (1NF) … Cont’d
FIRST NORMAL FORM (1NF): Remove all repeating groups.
Distribute the multi-valued attributes into different rows and identify
a unique identifier for the relation so that is can be said is a relation in
relational database.
32
First Normal Form (1NF) … Cont’d
Example 2: Consider the following UNF relation.
33
Second Normal form 2NF
No partial dependency of a non key attribute on part of the primary
key.
Any table that is in 1NF and has a single-attribute (i.e., a non-
composite) key is automatically also in 2NF.
Definition of a table (relation) in 2NF
It is in 1NF and
If all non-key attributes are dependent on all of the key. i.e. no
partial dependency.
Since a partial dependency occurs when a non-key attribute is
dependent on only a part of the (composite) key, the definition of
2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and
if it has no partial dependencies."
34
Second Normal form 2NF … Cont’d
Example for 2NF:
This schema is in its 1NF since we don’t have any repeating groups or
attributes with multi-valued property. To convert it to a 2NF we need to
remove all partial dependencies of non key attributes on part of the
primary key.
{EmpID, ProjNo} EmpName, ProjName, ProjLoc, ProjFund,
ProjMangID
But in addition to this we have the following dependencies
EmpID EmpName
ProjNo ProjName, ProjLoc, ProjFund, ProjMangID
35
Second Normal form 2NF … Cont’d
As we can see some non key attributes are partially dependent on
some part of the primary key. Thus these collections of attributes
should be moved to a new relation.
36
Second Normal form 2NF … Cont’d
• Example 2: Normalize the following relation.
• The primary key for this table is the composite key (PatientId,
RelativeId).
37
Second Normal form 2NF … Cont’d
So, to determine if it satisfies 2NF, you have to find out if all other
fields in it depend fully on both PatientId and RelativeId; that is,you
need to decide whether the following conditions are true:
(PatientId, RelativeId) Relationship and
(PatientId, RelativeId) Patient_tel.
However, on the dependencies in the patient table, only the following
are true:
(PatientId, RelativeId) Relationship and
(PatientId) Patient_tel.
38
Second Normal form 2NF … Cont’d
39
Third Normal Form (3NF )
Eliminate Columns Not Dependent On Key - If attributes do not contribute to
a description of the key, remove them to a separate table.
This level avoids update and delete anomalies.
Definition of a Table (Relation) in 3NF
It is in 2NF and
There are no transitive dependencies between attributes.
Example for (3NF): Assumption: Students of same batch (same year) live in
one building or dormitory
• This schema is in its 2NF since the primary key is a single attribute.
40
Third Normal Form (3NF ) … Cont’d
41
Third Normal Form (3NF ) … Cont’d
Consider the following example:
Now, PK = empid
We have functional dependencies:
Empid → depid
Depid → depname
Or Depid → depbudjet
Therefore, the above table is not is 3NF. To normalize it, we can use the
functional dependencies:
Depid → depname
Depid → depbudjet And
Empid → depid
42
Third Normal Form (3NF ) … Cont’d
So that the resulting tables are the following:
43
Other Normal Forms
Boyce-Codd Normal Form (BCNF): Isolate Independent Multiple
Relationships - No table may contain two or more 1:n or N:M
relationships that are not directly related. The correct solution, to cause
the model to be in 4th normal form, is to ensure that all M:M
relationships are resolved independently if they are indeed independent.
Def.: A table is in BCNF if it is in 3NF and if every determinant is a
candidate key.
Forth Normal form (4NF): Isolate Semantically Related Multiple
Relationships - There may be practical constrains on information that
justify separating logically related many-to-many relationships.
Def.: A table is in 4NF if it is in BCNF and if it has no multi-valued
dependencies.
44
Other Normal Forms … Cont’d
Fifth Normal Form (5NF): A model limited to only simple
(elemental) facts.
Def.: A table is in 5NF, also called "Projection-Join Normal
Form" (PJNF), if it is in 4NF and if every join dependency in the
table is a consequence of the candidate keys of the table.
Domain-Key Normal Form (DKNF): A model free from all
modification anomalies.
Def.: A table is in DKNF if every constraint on the table is a
logical consequence of the definition of keys and domains.
45
Physical Database Design … Cont’d
The Logical database design is concerned with the what;
The Physical database design is concerned with the how.
Physical database design is the process of producing a description of the
implementation of the database on secondary storage.
It describes the base relations, file organization, and indexes used to
achieve effective access to the data along with any associated integrity
constraints and security measures.
Sources of information for the physical design process include global
logical data model and documentation that describes model.
Knowledge of the DBMS that is selected to host the database systems,
with all its functionalities, is required since functionalities of current
DBMS vary widely.
46
Steps in physical database design
1. Translate logical data model for target DBMS
To determine the file organizations and access methods that will be
used to store the base relations; i.e. the way in which relations and
tuples will be held on secondary storage
Design enterprise constraints for target DBMS
This phase is the translation of the global logical data model to produce
a relational database schema in the target DBMS. This includes creating
the data dictionary based on the logical model and information
gathered.
After the creation of the data dictionary, the next activity is to
understand the functionality of the target DBMS so that all necessary
requirements are fulfilled for the database intended to be developed.
47
Steps in physical database design … Cont’d
Knowledge of the DBMS includes:
how to create base relations
whether the system supports:
definition of Primary key
definition of Foreign key
definition of Alternate key
definition of Domains
Referential integrity constraints
definition of enterprise level constraints
Some tasks to be done:
1.1. Design base relation
1.2. Design representation of derived data
1.3. Design enterprise constraint
48
Steps in physical database design … Cont’d
1.1. Design base relation
Designing base relation involves identification of all necessary requirements
about a relation starting from the name up to the referential integrity constraints.
The implementation of the physical model is dependent on the target DBMS
since some has more facilities than the other in defining database definitions.
The base relation design along with every justifiable reason should be fully
documented.
1.2. Design representation of derived data
While analyzing the requirement of users, we may encounter that there are some
attributes holding data that will be derived from existing or other attributes. A
decision on how to represent such data should be devised.
Most of the time derived attributes are not expressed in the logical model but
will be included in the data dictionary. Whether to store stored attributes in a
base relation or calculate them when required is a decision to be made by the
designer considering the performance impact.
49
Steps in physical database design … Cont’d
1.3. Design enterprise constraint
Data in the database is not only subjected to constraints on the
database and the data model used but also with some enterprise
dependent constraints.
This constraint definition is also dependent on the DBMS selected
and enterprise level requirements.
All the enterprise level constraints and the definition method in the
target DBMS should be fully documented.
50
Steps in physical database design … Cont’d
2. Design physical representation
This phase is the level for determining the optimal file organizations to store the
base relations and indexes that are required to achieve acceptable performance,
that is, the way in which relations and tuples will be held on the secondary
storage.
2.1. Analyze transactions
To understand the functionality of the transactions that will run on the
database and to analyze the important transactions
2.2. Choose file organization
To determine an efficient file organization for each base relation
2.3. Choose indexes
Used for quick access
2.4. Estimate disk space and system requirement
To estimate the amount of disk space that will be required by the database.
51
Steps in physical database design … Cont’d
3. Design user view
To design the user views that were identified in the conceptual
database design methodology
4. Design security mechanisms
5. Consider controlled redundancy
To determine whether introducing redundancy in a controlled
manner by relaxing the normalization rules will improve the
performance of the system.
6. Monitor and tune the operational system
To design the access rules to the base relations and user views
52