0% found this document useful (0 votes)
43 views

Chapter# 14 Database Design Theory and Normalization

The document discusses database normalization and design theory. It describes four guidelines for determining the quality of relation schema design: 1) ensuring attribute semantics are clear, 2) reducing redundant information, 3) reducing null values, and 4) avoiding spurious tuples. The document then discusses normalization, which restructures data to eliminate redundancy and logical data dependencies through normal forms. It also covers anomalies like insertion, deletion, and update anomalies that can occur when referencing and referenced relations are not properly structured.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views

Chapter# 14 Database Design Theory and Normalization

The document discusses database normalization and design theory. It describes four guidelines for determining the quality of relation schema design: 1) ensuring attribute semantics are clear, 2) reducing redundant information, 3) reducing null values, and 4) avoiding spurious tuples. The document then discusses normalization, which restructures data to eliminate redundancy and logical data dependencies through normal forms. It also covers anomalies like insertion, deletion, and update anomalies that can occur when referencing and referenced relations are not properly structured.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 54

Database

Systems
CHAPTER # 14
DATABASE DESIGN THEORY AND NORMALIZATION
Normalization

 Normalization is the process of


reorganizing/restructuring data in a database
with a series of so called normal-forms, so that it
meets two basic requirements:
 There is no redundancy of data (all data is stored in
only one place)
 data dependencies are logical (all related data
items are stored together).
 Normalization is important for many reasons, but
chiefly because it allows databases to take up as
little disk space as possible, resulting in increased
performance.
Four informal guidelines

 Four informal guidelines used to determine the


quality of relation schema design:
 Making sure that the semantics of the attributes is
clear in the schema
 Reducing the redundant information in tuples
 Reducing the NULL values in tuples
 Disallowing the possibility of generating spurious
tuples.
Figure 14.1 A
simplified COMPANY
relational database
schema.
Guideline 1

 Design a relation schema so that it is easy to


explain its meaning.
 Do not combine attributes from multiple entity
types and relationship types into a single relation.
 If the relation corresponds to a mixture of multiple
entities and relationships, semantic ambiguities
will result and the relation cannot be easily
explained.
Examples of Violating
Guideline 1.

Guideline 1 by mixing attributes from distinct real-world entities: EMP_DEPT


mixes attributes of employees and departments, and EMP_PROJ mixes
attributes of employees and projects and the WORKS_ON relationship.They
fare poorly Against the above measure of design quality. They may be used
as views, but they Cause problems when used as base relations,
Redundant Information in
Tuples and Update Anomalies
 Goal of schema design: to minimize the storage
space used by the base relations
 Grouping attributes into relation schemas has a
significant effect on storage space.

Natural join of Employee and


Department Tables
Anomalies
 Anomalies in the relational model refer to inconsistencies or errors
that can arise when working with relational databases,
specifically in the context of data insertion, deletion, and
modification.
 There are different types of anomalies that can occur in
referencing and referenced relations which can be discussed as:
 These anomalies can be categorized into three types:
 Insertion Anomalies
 Deletion Anomalies
 Update Anomalies.
Insertion Anomalies.

 Insertion anomalies can be differentiated into two


types:
 To insert a new employee tuple into EMP_DEPT,
we must include either the attribute values for the
department that the employee works for, or NULLs
(if the employee does not work for a department
as yet).
 For example, to insert a new tuple for an employee
who works in department number 5, we must enter
all the attribute values of department 5 correctly so
that they are consistent
Insertion Anomalies.

For example, to insert a


new tuple for an employee
In the design of Figure 14.2, we do not have
who works in department
to worry about this consistency problem
number 5, we must enter
because we enter only the department
all the attribute values of
number in the employee tuple; once in the
department 5 correctly so
database, as a single tuple in the
that they are consistent
DEPARTMENT relation.
Insertion Anomalies.

 It is difficult to insert a new department that has no


employees as yet in the EMP_DEPT relation.
 The only way to do this is to place NULL values in the

This violates the entity integrity for EMP_DEPT because its primary
key Ssn cannot be null.
EXAMPLE OF AN INSERT
ANOMALY
 Consider the relation:
 EMP_PROJ(Emp#, Proj#, Hours
Ename, Pname,Plocation)
 Insert Anomaly:
 Cannot insert a project unless
an employee is assigned to it.
 Conversely
 Cannot insert an employee
unless an he/she is assigned to
a project.
EXAMPLE OF AN UPDATE
ANOMALY(Repeated Update)
 Consider the relation:
 EMP_PROJ(Emp#, Proj#, Hours
Ename, Pname,Plocation)
 Update Anomaly:
 Changing the name of
project number P1 from
“ProjectX” to “Customer-
Accounting” may cause this
update to be made for all
100 employees working on
project P1.
EXAMPLE OF A DELETE
ANOMALY
 Consider the relation:
 EMP_PROJ(Emp#, Proj#, Hours
Ename, Pname,Plocation)
 Delete Anomaly:
 When a project is deleted, it
will result in deleting all the
employees who work on that
project.
 Alternately, if an employee is
the sole employee on a
project, deleting that
employee would result in
deleting the corresponding
project.
Guideline 2.

 Guideline 2. Design the base relation schemas so


that no insertion, deletion, or modification
anomalies are not present in the relations.
Null Values

 If many of the attributes do not apply to all tuples


in the relation, we end up with many NULLs in
those tuples.
 This can waste space at the storage level and
may also lead to problems with understanding the
meaning of the attributes.
 Another problem with NULLs is how to account for
them when aggregate operations such as COUNT
or SUM are applied.
Reasons for NULLS

 Attribute not applicable or invalid (e.g. Visa_Status


may not apply to local students)
 Attribute value unknown (may exist) (e.g.
Date_of_birth may be unknown for an employee)
 Value known to exist, but unavailable (e.g.
Home_Phone_Number for an employee may exist,
but may not be available and recorded yet.
 Guideline 3. As far as possible, avoid placing
attributes in a base relation whose values may
frequently be NULL.
 If NULLs are unavoidable, make sure that they apply
in exceptional cases only and do not apply to a
majority of tuples in the relation.
Generation of Spurious
Tuples
 Additional tuples that are not present in
original table are called spurious tuples
because they represent spurious
information that is not valid.
Guideline 4

 Design relation schemas so that they can be


joined with equality conditions on attributes that
are appropriately related (primary key, foreign
key) pairs in a way that guarantees that no
spurious tuples are generated.
 Avoid relations that contain matching attributes
that are not (foreign key, primary key)
combinations because joining on such attributes
may produce spurious tuples.
Spurious Tuples
 There are two important properties of decompositions:
 Non-additive or lossless ness of the corresponding join
 Preservation of the functional dependencies.
 Note that:
 Property (a) is extremely important and cannot be sacrificed.
 Property (b) is less stringent and may be sacrificed.
Functional Dependencies

 A functional dependency is a constraint between


two sets of attributes from the database.
 Definition.
 A functional dependency, denoted by X → Y,
between two sets of attributes X and Y that are
subsets of R specifies a constraint on the possible
tuples that can form a relation state r of R. The
constraint is that, for any two tuples t1 and t2 in r
that have
 t1[X] = t2[X], they must also have t1[Y] = t2[Y].
Defining Functional Dependencies
 If A and B are attributes of relation R, B is functionally dependent on
A (denoted A  B), if each value of A in R is associated with exactly
one value of B in R.

 X  Y holds if whenever two tuples have the same value for X, they
must have the same value for Y
 For any two tuples t1 and t2 in any relation instance r(R): If
t1[X]=t2[X], then t1[Y]=t2[Y]
 X  Y in R specifies a constraint on all relation instances r(R)
 Written as X  Y; can be displayed graphically on a relation
schema as in Figures. ( denoted by the arrow: ).
 FDs are derived from the real-world constraints on the attributes
Examples of FD constraints
(1)
 Social security number determines employee
name
 SSN  ENAME
 Project number determines project name and
location
 PNUMBER  {PNAME, PLOCATION}
 Employee ssn and project number determines the
hours per week that the employee works on the
project
 {SSN, PNUMBER}  HOURS
Defining FDs from
instances
 Note that in order to define the FDs, we need to
understand the meaning of the attributes
involved and the relationship between them.
 An FD is a property of the attributes in the schema
R
 Given the instance (population) of a relation, all
we can conclude is that an FD may exist between
certain attributes.
 What we can definitely conclude is – that certain
FDs do not exist because there are tuples that
show a violation of those dependencies.
Determining Functional
Dependencies
Figure 14.8 What FDs may
exist?
 A relation R(A, B, C, D) with its extension.
 Which FDs may exist in this relation?
Figure 14.8 What FDs may
exist? FD holds
FD not
B -> C
 A relation R(A, B, C, D) with its extension.
holds
C -> B
A -> B
 Which FDs may exist in this relation? {A,B} -> C
A -> C
{A,B} -> D
A -> D
{A,C} -> B
B -> A
{A,C} -> D
B -> D
{A,D} -> B
C -> A
{A,D} -> C
C -> D
{B,D} -> A
D -> A
{B,D} -> C
D -> B
{C,D} -> A
D -> C
{C,D} -> B
{B,C} -> A
{A,B,C} -> D
{B,C} -> D
{A,B,D} -> C
{B,C,D} -> A
Important Definitions
 Determinant
 Refers to the attribute, or group of attributes, on the left-
hand side of the arrow of a functional dependency.
 Full Functional dependency:
 Indicates that if A and B are attributes of a relation, B is
fully functionally dependent on A if B is functionally
dependent on A, but not on any proper subset of A.
 {StaffNo, StaffName}  BranchNo
 StaffNo  BranchNo
 Transitive Dependency
 A condition where A, B, and C are attributes of a relation
such that if A → B and B → C, then C is transitively
dependent on A via B (provided that A is not functionally
dependent on B or C).
Example Transitive
Dependency

 Consider functional dependencies in the Staff-


Branch relation
staffNo → sName, position, salary, branchNo,
bAddress
branchNo → bAddress
• Transitive dependency, 29
Normalization of Relations

 Definition. The normal form of a relation refers to the


highest normal form condition that it meets, and
hence indicates the degree to which it has been
normalized.
 Normalization of data: process of analyzing the
given relation schemas based on their FDs and
primary keys to achieve the desirable properties of
(1) minimizing redundancy
(2) minimizing the insertion, deletion, and update
anomalies
 Normal form:
 Condition using keys and FDs of a relation to certify
whether a relation schema is in a particular normal
form
Normalization of Relations

 2NF, 3NF, BCNF


 based on keys and FDs of a relation schema
 4NF
 based on keys, multi-valued dependencies : MVDs;
 5NF
 based on keys, join dependencies : JDs
 Additional properties may be needed to ensure a
good relational design (lossless join, dependency
preservation)
Practical Use of Normal
Forms
 The database designers need not
normalize to the highest possible normal
form
 (usually up to 3NF and BCNF. 4NF rarely
used in practice.)
 DE normalization:
 The process of storing the join of higher
normal form relations as a base relation—
which is in a lower normal form
Definitions of Keys and
Attributes Participating in Keys
 If a relation schema has more than one key, each
is called a candidate key.
 One of the candidate keys is arbitrarily designated
to be the primary key, and the others are called
secondary keys.
 A Prime attribute must be a member of some
candidate key
 A Nonprime attribute is not a prime attribute—that
is, it is not a member of any candidate key.
Non-additive or Lossless
Join Decomposition
 If we decompose a relation R into relations R1 and R2,
 Decomposition is lossy if R1 ⋈ R2 ⊃ R
 Decomposition is lossless if R1 ⋈ R2 = R
 To check for lossless join decomposition using FD set,
following conditions must hold:
 Union of Attributes of R1 and R2 must be equal to attribute of
R. Each attribute of R must be either in R1 or in R2.
 Att(R1) U Att(R2) = Att(R)
 Intersection of Attributes of R1 and R2 must not be NULL.
 Att(R1) ∩ Att(R2) ≠ Φ
 Common attribute must be a key for at least one relation (R1
or R2)
 Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
Example

 R (A, B, C, D)
 FD: {A->BC} is decomposed into R1(ABC) and
R2(AD) which is a lossless join decomposition as:
 First condition holds true as Att(R1) U Att(R2) =
(ABC) U (AD) = (ABCD) = Att(R).
 Second condition holds true as Att(R1) ∩ Att(R2) =
(ABC) ∩ (AD) ≠ Φ
 Third condition holds true as Att(R1) ∩ Att(R2) = A is
a key of R1(ABC) because A->BC is given.
Dependency Preserving
Decomposition
 If we decompose a relation R into relations R1 and
R2, All dependencies of R either must be a part of
R1 or R2 or must be derivable from combination of
FD’s of R1 and R2.
 For Example,
 R (A, B, C, D) with FD set{A->BC} is decomposed
into R1(ABC) and R2(AD) which is dependency
preserving because FD A->BC is a part of
R1(ABC).
FIRST NORMAL FORM
It states that
the domain of an attribute
must include only atomic
(simple, indivisible) values
and
that the value of any
attribute in a tuple must be
a single value from the
domain of
that attribute.

Figure 14.9
Normalization into 1NF. (a)
A relation schema that is not
in 1NF. (b) Sample state of
relation DEPARTMENT. (c)
1NF version of the same
relation with redundancy.
Normalizing nested relations
into 1NF.
Figure 14.10
Normalizing nested
relations into 1NF. (a)
Schema of the
EMP_PROJ relation
with a nested relation
attribute PROJS. (b)
Sample extension of
the EMP_PROJ relation
showing nested
relations within each
tuple. (c)
Decomposition of
EMP_PROJ into
relations EMP_PROJ1
and EMP_PROJ2 by
propagating the
primary key.
Second Normal Form

 Uses the concepts of FDs, primary key


 Definitions
 Prime attribute: An attribute that is member of
the primary key K
 Full functional dependency: a FD Y -> Z where
removal of any attribute from Y means the FD
does not hold any more
 Examples:
 {SSN, PNUMBER} -> HOURS is a full FD since
neither SSN -> HOURS nor PNUMBER -> HOURS
hold
 {SSN, PNUMBER} -> ENAME is not a full FD (it is
called a partial dependency ) since SSN ->
ENAME also holds
Second Normal Form

 Definition. A relation schema R is in 2NF if every


nonprime attribute A in R is fully functionally
dependent on the primary key of R.
Figure 14.11
Normalizing into 2NF

Second Normal Form and 3NF. (a)


Normalizing EMP_PROJ
into 2NF relations. (b)
Normalizing EMP_DEPT
into 3NF relations.
Normalize this table upto
3nf

42
Example Transitive Dependency

 Consider functional dependencies in the Staff-Branch relation


staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
• Transitive dependency,
• branchNo → bAddress exists on staffNo via branchNo
43
Third Normal Form

 Definition. a relation schema R is in 3NF if it satisfies


2NF and no nonprime attribute of R is transitively
dependent on the primary key.
 Definition of Transitive functional dependency:
 Transitive functional dependency: a FD X -> Z that
can be derived from two FDs X -> Y and Y -> Z
 Examples:
 SSN -> DMGRSSN is a transitive FD
 Since SSN -> DNUMBER and DNUMBER -> DMGRSSN
hold
 SSN -> ENAME is non-transitive
 Since there is no set of attributes X where SSN -> X and
X -> ENAME
Third Normal Form
General Definition. A relation schema R
is in third normal form (3NF) if, whenever
a
Figure 14.11 nontrivial functional dependency X →
Normalizing into 2NF A holds in R, either
and 3NF. (b) (a) X is a super key/Candidate key of R,
Normalizing EMP_DEPT OR
into 3NF relations. (b) A is a prime attribute of R.
Tax_rate is partially
Figure 14.12 dependent on the
Normalization into candidate key
2NF and 3NF. (a)
The LOTS relation Price is transitively
with its functional dependent on each of
dependencies the candidate keys via
FD1 through FD4. non-prime attribute
(b) Decomposing area
into the 2NF
relations LOTS1
and LOTS2. (c)
Decomposing
LOTS1 into the 3NF
relations LOTS1A
and LOTS1B. (d)
Progressive
normalization of
LOTS into a 3NF
design.
Boyce-Codd Normal Form
 Definition. A relation schema R is in BCNF
if whenever a nontrivial functional
dependency X → A holds in R, then X is a
superkey of R.
Decomposition of
Relations not in BCNF
FD1: {Student, Course} Instructor
FD2:14 Instructor Course
Decomposition of
Relations not in BCNF
FD1: {Student, Course} Instructor
FD2:14 Instructor Course

Normalized Relations:
R1 (Instructor, Course) and
R2(Instructor, Student)
Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce–Codd Normal Form (BCNF)

You might also like