Functional Dependencies and Normalization
Functional Dependencies and Normalization
Functional Dependencies and Normalization
1. Functional Dependencies
A B
where A and B are sets of attributes of R. A is called the LHS (left hand side) and B is called the
RHS (right hand side), and we say the LHS (functionally) determines the RHS, or the RHS is
(functionally) dependent on the LHS.
The meaning of the functional dependency is that for every value of A, there is a unique value of
B. We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all
attributes of A, they also agree in value for all attributes of B.
However, assuming a student may have multiple phone numbers, then the FD
{studentID} {phoneNumber}
By convention, we often omit the curly braces { } for the set, and write the first functional
dependency in Example 1 as
studentIDname, DateOfBirth.
Note that the above FD can also be written equivalently into the two FDs below:
studentID name
studentIDDateOfBirth
Trivial FD: A trivial FD is one where the RHS is a subset of the LHS. That is, every attribute on the
RHS is also on the LHS. For example, the two FDs below are trivial:
name name,
Trivial FDs do not provide real restrictions on the data in the relation, and they are usually ignored.
A set of FDs may logically imply some other FDs. For instance, ab, and bc together imply ac.
Similarly, ab and c d together imply a,cb,d.
Given a set F of FDs, we can find all FDs implied by those in F using the following three inference
rules.
Given two sets of FDs, E and F, we say E and F are equivalent, if every FD in E is implied by F, and
vice versa.
Given a set F of FDs, and a set X of attributes, the closure set of X (w.r.t F), denoted X+ , is the set of
all attributes that can be inferred to be functionally dependant on X. Surely X is a subset of X+ . For
example, if F={ ab, bc }, then {a}+ ={a,b,c} .
To find the closure set X+, we can use the simple steps below:
1) Initially X+=X
2) For each FD in F, if the LHS is in X+, add the RHS to X+
3) Repeat until no change to X+ is possible.
a,bd,
ce
X= {a,c}.
Initially X+={a,c}. Using the first FD, we can add b to X+. Now using the second FD, we can add d to
X+, and using the last FD we can add e to X+. Thus the final closure set is X+={a,b,c,d,e}.
X is a candidate key if and only if it is a superkey, but none of its proper subset is a superkey.
In other words, X is a candidate key if and only if X+=R, but for any proper subset Y of X, Y+ ≠
R.
The above observation provides a way to find all candidate keys of R using the functional
(dependencies. We can simply try all subsets of R, and test whether it is a candidate key.
However, the candidate keys can be found more efficiently, using the observations below:
Observation 1: any candidate key must contain attributes that have not appeared on the
RHS of any functional dependency.
Observation 2: if an attributed has occurred on the RHS of some FD, but not on the LHS of
any FD, then it cannot be in any candidate key.
Given a set F of functional dependencies that hold on R, we can find all candidate keys of R as
follows:
1. Find all attributes that have not appeared on the RHS of any FD. Denote this set by A.
2. Denote the set of attributes that appear on the RHS of some FD, but not on the LHS of any
FD by B.
3. Compute the closure set A+, if A+=R, then A is the only candidate key.
4. If A+≠R, then for each attribute x in R-B, test whether AU{x} is a candidate key. If not, try to
add another attribute in R-B to A and test whether it is candidate key.
5. Repeat step 4, until all candidate keys have been found.
Example 3.
The set of attribute not occurred on the RHS of any FD is {a, d, e}.
Since the closure set of {a,d,e} contains all attributes of R, it is the only candidate key of R.
{e} is the set of attributes that have not appeared on the RHS of any FD. So every candidate key
must contain e. b only appeared on RHS, so no candidate key can contain b.
Since {e}+ = {e}, {e} is not a superkey. We test each of {c,e}, {a,e}, and {d,e} next.
{c,e}+ ={c,e,b,d,a} . Therefore {c,e} is a superkey. {c,e} is also a candidate key since neither {e} nor
{c} is a superkey.
Initially E=F
Step 1: rewrite each FD that has m attributes on the RHS into m FDs where the RHS is a single
attribute.
For each FD Xy in E, and for each attribute in x in X, if X-{x} y is implied by E, then replace
Xy with X-{x}y.
Example 4.
a,b,c c
a,b,c d
a,b,ce
a,b,cf
Step 4 will remove a,cf (since it is implied by a,cd and df), and one of the ce, and change F
to the minimal cover
4. Normal Forms
A table R is in 3NF if for every non-trivial FD Ab, either A is a superkey or b is a key attribute.
A table R is in 2NF if for every non-trivial FD Ab, either A is not a proper subset of any candidate
key, or b is a key attribute.
An attribute that has appeared in some candidate key is called a key attribute. Attributes that are
not key attributes are called non-key attributes. If A is a proper subset of some candidate key, we
say Ab is a partial dependency, and b is partially dependent on the candidate key. If A is not a
proper subset of some candidate key, and A is not a superkey, we say Ab is a transitive
dependency, and b is transitively dependent on the candidate keys.
With these concepts, 3NF and 2NG can be equivalently defined as follows:
A table R is in 2NF if there are no non-key attributes that are partially dependent on any candidate
key.
A table R is in 3NF if it is already in 2NF, and there are no non-key attributes that are transitively
dependent on any candidate key.
The above definitions also provide a way to test whether a table is in 2NF, 3NF or BCNF. One just
needs to check each non-trivial FD to see whether it violates the definition of the normal forms.
5. Normalization
Given a set F of FDs that hold for table R, if R is not in 2NF (or 3NF, BCNF), we can decompose R into
smaller tables so that each of the smaller tables are in 2NF (or 3NF, BCNF). This process is called
normalization.
The approach is: for each FD Ab that violates the definition of the normal form, we decompose R
into R1 = (A, b), and R2=(R-{b}). This process is repeated until all tables are in the normal form.
Example 4
The only candidate key is {a,b}. ac is a partial dependency and c is a non-key attribute. Therefore,
ac violates the definition of 2NF. We can decompose R into (a,c) and (a,b,d), which are both in
2NF. Actually both are in BCNF.
Caution: when decomposing a table using the FDs, we need to make sure the FD is such that the
RHS attribute does not appear on the LHS (otherwise it is trivial), and the LHS is minimal, that is,
removing at attribute from the LHS the FD will no longer hold.
Suppose R is decomposed into smaller tables R1,R2,…,Rn. We say that the decomposition is lossless
if for any instance r of R, r is equivalent to the natural join of its projections onto R1, R2, …., Rn.
If both A and B are attributes in Ri , and AB is implied by F, then we AB is a FD that hold on Ri.
We say the decomposition is FD-preserving if the FDs that hold on R1, R2,…, Rn altogether is
equivalent to F.
It is known that any table can be decomposed into 3NF tables with lossless-join property and FD-
preserving property, but we may not be able to decompose a table into BCNF with both of these
properties.
An algorithm for decomposing a table into 3NF with both properties is as follows:
2. For each FD Ab in E, if b is a non-key attribute, have a table (A,b), where A is the primary
key.
3. (optional but good) Merge tables that have identical primary key.
4. Have a table R-B, where B is the set of non-key attributes that appeared on the RHS of some
FD in E.
Example 5
Suppose R= (a, b, c, d, e, f)
The FD set F contains a,b c,d,e and da.
The minimal cover of the functional dependencies is:
a,b-->c
a,b-->d
a,b-->e
d-->a
The table is NOT in 2NF because there are partial dependencies .e.g., a.bc
Using step 2, we will have two tables (a,b,c) and (a,b,e), where {a,b} is the primary key.
Using step 3, we merge the above table into (a,b,c,e).
Using step 4, we get the table (a,b,d,f).
1. There may be multiple minimal covers of a set of FDs (For instance, {ab; bc; ca} and
{ac; cb; ba} are both minimal covers of {ab,c; bc,a; ca,b}). The minimal cover
obtained by this tool is one of them.
2. There may be multiple correct ways for a table to be decomposed into 3NF or BCNF tables.
Therefore, if your own decomposition is different from what is obtained by this tool, it does
not mean your decomposition is incorrect.
3. The 3NF decomposition obtained by this tool is a dependency-preserving decomposition.