Normalization PDF
Normalization PDF
Normalization PDF
Databases
Introduction:
Relational database design ultimately produces a set of relations.
The implicit goals of the design activity are: information
preservation and minimum redundancy.
Informal Design Guidelines for Relation Schemas
Four informal guidelines that may be used as measures to
determine the quality of relation schema design:
Making sure that the semantics of the attributes is clear in the
schema
Reducing the redundant information in tuples
Reducing the NULL values in tuples
Disallowing the possibility of generating spurious tuples
Imparting Clear Semantics to Attributes in Relations
The semantics of a relation refers to its meaning resulting from
the interpretation of attribute values in a tuple. The relational
schema design should have a clear meaning.
Guideline 1
1. Design a relation schema so that it is easy to explain.
Deletion Anomalies:
Guideline 4
Design relation schemas so that they can be joined with equality
conditions on attributes that are appropriately related (primary
key, foreign key) pairs in a way that guarantees that no spurious
tuples are generated.
Avoid relations that contain matching attributes that are not
(foreign key, primary key) combinations because joining on such
attributes may produce spurious tuples.
Summary and Discussion of Design Guidelines
We proposed informal guidelines for a good relational design.
The problems we pointed out, which can be detected without
additional tools of analysis, are as follows:
Anomalies that cause redundant work to be done during
insertion into and modification of a relation, and that may cause
accidental loss of information during a deletion from a relation
Waste of storage space due to NULLs and the difficulty of
performing selections, aggregation operations, and joins due to
NULL values
Generation of invalid and spurious data during joins on base
relations with matched attributes that may not represent a proper
(foreign key, primary key) relationship
Course
Text
Smith
Data Structures
Bartram
Smith
Data Management
Martin
Hall
Compilers
Hoffman
Brown
Data Structures
Horowitz
Example:
A
a1
b1
c1
d1
a1
b2
c2
d2
a2
b2
c2
d3
a3
b3
c4
d3
The following FDs may hold because the four tuples in the
current extension have no violation of these constraints:
{Ssn, Ename, Bdate}, and any set of attributes that includes Ssn
are all superkeys.
If a relation schema has more than one key, each is called a
candidate key.
One of the candidate keys is arbitrarily designated to be the
primary key, and the others are called secondary keys.
In a practical relational database, each relation schema must
have a primary key. If no candidate key is known for a relation,
the entire relation can be treated as a default superkey. In the
Table EMPLOYEE, {Ssn} is the only candidate key for
EMPLOYEE, so it is also the primary key.
Definition:
An attribute of relation schema R is called a prime attribute of R
if it is a member of some candidate key of R.
An attribute is called nonprime if it is not a prime attributethat
is, if it is not a member of any candidate key, both Ssn and
Pnumber are prime attributes of WORKS_ON, whereas other
attributes of WORKS_ON are nonprime.
We now present the first three normal forms: 1NF, 2NF, and 3NF.
As we shall see, 2NF and 3NF attack different problems.
Generalized Definition:
Attribute A determines attribute B ( that is B is functionally
dependent on A) if all of the rows in the table that agree in value
for attribute A also agree in value for attribute B.
Fully functional dependency (composite key)
If attribute B is functionally dependent on a composite key A but
not on any subset of that composite key, the attribute B is fully
functionally dependent on A.
Partial Dependency:
When there is a functional dependence in which the determinant
is only part of the primary key, then there is a partial dependency.
For example if (A, B) (C, D) and B C and (A, B) is the
primary key, then the functional dependence B C is a partial
dependency.
{Ssn, Pnumber} Hours is a full dependency
(neither Ssn Hours nor PnumberHours holds).
However, the dependency {Ssn, Pnumber}Ename is partial
because SsnEname holds.
Transitive Dependency:
When there are the following functional dependencies such that
XY, Y Z and X is the primary key, then X Z is a transitive
dependency because X determines the value of Z via Y.
Whenever a functional dependency is detected amongst nonprime, there is a transitive dependency.
Definition. A relation schema R is in 2NF if every nonprime
attribute A in R is fully functionally dependent on the primary key
of R.
The test for 2NF involves testing for functional dependencies
whose left-hand side attributes are part of the primary key.
If the primary key contains a single attribute, the test need not be
applied at all.
If a relation schema is not in 2NF, it can be second normalized or
2NF normalized into a number of 2NF relations in which nonprime
attributes are associated only with the part of the primary key on
which they are fully functionally dependent.
Remedy (Normalization)
Form new relations for each
multivalued attribute or nested
relation.
Decompose and set up a new
relation for each partial key
with its dependent attribute(s).
Make sure to keep a relation
with the original primary key
and any attributes that are fully
functionally dependent on it.
Third (3NF)
Step 3:
Identify all dependencies:
PROJ_NUM and EMP_NUM PROJ_NAME, EMP_NAME,
JOB_CLASS, CHG_HOUR, HOURS.
Additional dependencies:
PROJ_NUM PROJ_NAME
EMP_NUM EMP_NAME, JOB_CLASS, CHG_HOUR
JOB_CLASS CHG_HOUR
This dependency exists between two nonprime attributes, which
signals a transitive dependency.
Conversion to Second Normal Form:
Conversion to 2NF only occurs when the 1NF has a composite
primary key. If the 1NF has a single-attribute primary key, then
the table is automatically 2NF.
Step 1: Make new tables to Eliminate Partial Dependencies
For each component of the primary key that acts as a determinant
in a partial dependency, create a new table with a copy of that
component as the primary key. It is also important that the
determinant attribute remains in the original table because they
will be the foreign keys that will relate the new tables to the
original one.
Step 2: Reassign Corresponding Dependent Attributes
Determine all attributes that are dependent in the partial
dependencies. These are removed from the original table and
placed in the new table with their determinant.
Any attributes that are dependent in a partial dependency will
remain in the original table.
Now, we have 3 tables:
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS,
CHG_HOURS)
ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
Conversion to third Normal Form:
Step 1: Make new tables to eliminate transitive dependencies.
For every transitive dependency, write a copy if its determinant as
a primary key for a new table. It is also important that the
determinant remains in the original table to serve as a foreign key.
Step 2:
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_NAME, JOB_ID)
JOB(JOB_ID, JOB_CLASS, CHG_HOUR)
ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
Consider the table below, describing a badly designed database.
Follow the steps defined above and seen in class to make the
design 3NF compliant.
StdNo
StdCity
StdClass
OfferNo
OffTerm
OffYear
EnrGrade
CourseNo CrsDescr
S1
Seattle
JUN
01
FALL
2013
3.5
C1
DB
02
FALL
2013
3.3
C2
VB
03
SPRING
2014
3.1
C3
OO
02
FALL
2013
3.4
C2
VB
S2
Bothell
JUN