Unit 3

Download as pdf or txt
Download as pdf or txt
You are on page 1of 19

1

UNIT-3 (Notes)

What is Functional Dependency?


A functional dependency occurs when one attribute uniquely determines another attribute within a
relation. It is a constraint that describes how attributes in a table relate to each other. If attribute A
functionally determines attribute B we write this as the A→B.
Functional dependencies are used to mathematically express relations among database entities
and are very important to understanding advanced concepts in Relational Database Systems.
Example:

roll_no name dept_name dept_building

42 abc CO A4

43 pqr IT A3

44 xyz CO A4

45 xyz IT A3

46 mno EC B2

47 jkl ME B2

From the above table we can conclude some valid functional dependencies:
• roll_no → { name, dept_name, dept_building },→ Here, roll_no can determine values
of fields name, dept_name and dept_building, hence a valid Functional dependency
• roll_no → dept_name , Since, roll_no can determine whole set of {name, dept_name,
dept_building}, it can determine its subset dept_name also.
• dept_name → dept_building , Dept_name can identify the dept_building accurately,
since departments with different dept_name will also have a different dept_building
• More valid functional dependencies: roll_no → name, {roll_no, name} ⇢ {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
• name → dept_name Students with the same name can have different dept_name,
hence this is not a valid functional dependency.
• dept_building → dept_name There can be multiple departments in the same
building. Example, in the above table departments ME and EC are in the same building
B2, hence dept_building → dept_name is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no,
dept_building → roll_no, etc.
Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
Example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the
augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no, name, dept_name}
→ {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid
by the Transitivity rule.
2

Example, roll_no → dept_name & dept_name → dept_building, then roll_no →


dept_building is also valid.
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e. If X →
Y and Y is the subset of X, then it is called trivial functional dependency
Example:

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of
trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant.
i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.
Example:

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional
dependency, since age is not a subset of {roll_no, name}
3. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not dependent on each
other. i.e. If a → {b, c} and there exists no functional dependency between b and c, then it is
called a multivalued functional dependency.
For example,

roll_no name age

42 abc 17
3

roll_no name age

43 pqr 18

44 xyz 18

45 abc 19

Here, roll_no → {name, age} is a multivalued functional dependency, since the


dependents name & age are not dependent on each other(i.e. name → age or age → name
doesn’t exist !)
4. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on determinant. i.e. If a →
b & b → c, then according to axiom of transitivity, a → c. This is a transitive functional
dependency.
For example,

enrol_no name dept building_no

42 abc CO 4

43 pqr EC 2

44 xyz IT 1

45 abc EC 2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect functional
dependency, hence called Transitive functional dependency.
5. Fully Functional Dependency
In full functional dependency an attribute or a set of attributes uniquely determines another
attribute or set of attributes. If a relation R has attributes X, Y, Z with the dependencies X->Y and
X->Z which states that those dependencies are fully functional.
6. Partial Functional Dependency
In partial functional dependency a non key attribute depends on a part of the composite key, rather
than the whole key. If a relation R has attributes X, Y, Z where X and Y are the composite key and
Z is non key attribute. Then X->Z is a partial functional dependency in RBDMS.

Normalization
If a database design is not perfect, it may contain anomalies, which are like
a bad dream for any database administrator. Managing a database with
anomalies is next to impossible.

• Update anomalies − If data items are scattered and are not linked to
each other properly, then it could lead to strange situations. For
example, when we try to update one data item having its copies
4

scattered over several places, a few instances get updated properly


while a few others are left with old values. Such instances leave the
database in an inconsistent state.
• Deletion anomalies − We tried to delete a record, but parts of it was
left undeleted because of unawareness, the data is also saved
somewhere else.
• Insert anomalies − We tried to insert data in a record that does not
exist at all.

Normalization is a method to remove all these anomalies and bring the


database to a consistent state.

First Normal Form


First Normal Form is defined in the definition of relations (tables) itself. This
rule defines that all the attributes in a relation must have atomic domains.
The values in an atomic domain are indivisible units.

We re-arrange the relation (table) as below, to convert it to First Normal


Form.

Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form


Before we learn about the second normal form, we need to understand the
following −

• Prime attribute − An attribute, which is a part of the candidate-key, is


known as a prime attribute.
5

• Non-prime attribute − An attribute, which is not a part of the prime-


key, is said to be a non-prime attribute.

If we follow second normal form, then every non-prime attribute should be


fully functionally dependent on prime key attribute. That is, if X → A holds,
then there should not be any proper subset Y of X, for which Y → A also
holds true.

We see here in Student_Project relation that the prime key attributes are
Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e.
Stu_Name and Proj_Name must be dependent upon both and not on any
of the prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in
Second Normal Form.

We broke the relation in two as depicted in the above picture. So there


exists no partial dependency.

Third Normal Form


For a relation to be in Third Normal Form, it must be in Second Normal
form and the following must satisfy −

• No non-prime attribute is transitively dependent on prime key


attribute.
• For any non-trivial functional dependency, X → A, then either −
6

X is a superkey or,
• A is prime attribute.

We find that in the above Student_detail relation, Stu_ID is the key and
only prime key attribute. We find that City can be identified by Stu_ID as
well as Zip itself. Neither Zip is a superkey nor is City a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists transitive dependency.

To bring this relation into third normal form, we break the relation into two
relations as follows −

Boyce-Codd Normal Form


Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on
strict terms. BCNF states that −

• For any non-trivial functional dependency, X → A, X must be a super-


key.

In the above image, Stu_ID is the super-key in the relation Student_Detail


and Zip is the super-key in the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip

and

Zip → City

Which confirms that both the relations are in BCNF.

1st Normal Form (1NF)


7

• A table is referred to as being in its First Normal Form if atomicity of the table is 1.

• Here, atomicity states that a single cell cannot hold multiple values. It must hold only a
single-valued attribute.

• The First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.

Now you will understand the First Normal Form with the help of an example.

Below is a students’ record table that has information about student roll number,
student name, student course, and age of the student.

In the studentsrecord table, you can see that the course column has two values.
Thus it does not follow the First Normal Form. Now, if you use the First Normal Form
to the above table, you get the below table as a result.

By applying the First Normal Form, you achieve atomicity, and also every column
has unique values.

Before proceeding with the Second Normal Form, get familiar with Candidate Key
and Super Key.
Candidate Key
A candidate key is a set of one or more columns that can identify a record uniquely
in a table, and YOU can use each candidate key as a Primary Key.

Now, let’s use an example to understand this better.


8

Super Key
Super key is a set of over one key that can identify a record uniquely in a table, and
the Primary Key is a subset of Super Key.

Let’s understand this with the help of an example.

Second Normal Form (2NF)


The first condition for the table to be in Second Normal Form is that the table has to
be in First Normal Form. The table should not possess partial dependency. The
partial dependency here means the proper subset of the candidate key should give a
non-prime attribute.

Now understand the Second Normal Form with the help of an example.

Consider the table Location:

The Location table possesses a composite primary key cust_id, storeid. The non-key
attribute is store_location. In this case, store_location only depends on storeid, which
is a part of the primary key. Hence, this table does not fulfill the second normal form.
9

To bring the table to Second Normal Form, you need to split the table into two parts.
This will give you the below tables:

As you have removed the partial functional dependency from the location table, the
column store_location entirely depends on the primary key of that table, storeid.

Now that you understood the 1st and 2nd Normal forms, you will look at the next part
of this Normalization in SQL tutorial.

Third Normal Form (3NF)

• The first condition for the table to be in Third Normal Form is that the table should be in
the Second Normal Form.

• The second condition is that there should be no transitive dependency for non-prime
attributes, which indicates that non-prime attributes (which are not a part of the candidate
key) should not depend on other non-prime attributes in a table. Therefore, a transitive
dependency is a functional dependency in which A → C (A determines C) indirectly,
because of A → B and B → C (where it is not the case that B → A).

• The third Normal Form ensures the reduction of data duplication. It is also used to
achieve data integrity.

Below is a student table that has student id, student name, subject id, subject name,
and address of the student as its columns.

In the above student table, stu_id determines subid, and subid determines sub.
Therefore, stu_id determines sub via subid. This implies that the table possesses a
transitive functional dependency, and it does not fulfill the third normal form criteria.

Now to change the table to the third normal form, you need to divide the table as
shown below:
10

As you can see in both the tables, all the non-key attributes are now fully functional,
dependent only on the primary key. In the first table, columns name, subid, and
addresses only depend on stu_id. In the second table, the sub only depends on
subid.

Boyce CoddNormal Form (BCNF)

Boyce Codd Normal Form is also known as 3.5 NF. It is the superior version of 3NF
and was developed by Raymond F. Boyce and Edgar F. Codd to tackle certain types
of anomalies which were not resolved with 3NF.

The first condition for the table to be in Boyce Codd Normal Form is that the table
should be in the third normal form. Secondly, every Right-Hand Side (RHS) attribute
of the functional dependencies should depend on the super key of that particular
table.
For example :
You have a functional dependency X → Y. In the particular functional dependency, X
has to be the part of the super key of the provided table.
Consider the below subject table:

The subject table follows these conditions:

• Each student can enroll in multiple subjects.

• Multiple professors can teach a particular subject.

• For each subject, it assigns a professor to the student.

In the above table, student_id and subject together form the primary key because
using student_id and subject; you can determine all the table columns.
11

Another important point to be noted here is that one professor teaches only one
subject, but one subject may have two professors.

Which exhibit there is a dependency between subject and professor, i.e. subject
depends on the professor's name.

The table is in 1st Normal form as all the column names are unique, all values are
atomic, and all the values stored in a particular column are of the same domain.
The table also satisfies the 2nd Normal Form, as there is no Partial Dependency.
And, there is no Transitive Dependency; hence, the table also satisfies the 3rd
Normal Form.
This table follows all the Normal forms except the Boyce Codd Normal Form.
As you can see stuid, and subject forms the primary key, which means the subject
attribute is a prime attribute.
However, there exists yet another dependency - professor → subject.
BCNF does not follow in the table as a subject is a prime attribute, the professor is a
non-prime attribute.
To transform the table into the BCNF, you will divide the table into two parts. One
table will hold stuid which already exists and the second table will hold a newly
created column profid.

And in the second table will have the columns profid, subject, and professor, which
satisfies the BCNF.

With this, you have reached the conclusion of the ‘Normalization in SQL’ tutorial.

Multivalued Dependency and Fourth Normal Form

In database management systems, normalization is an essential process to ensure


that data is organized efficiently and effectively. Multivalued dependency (MVD) is a
12

concept that helps to identify and eliminate data redundancy and anomalies, and
Fourth Normal Form (4NF) is a normalization form that addresses the challenges
associated with multivalued dependency.
Multivalued Dependency (MVD)
Multivalued dependency (MVD) is a type of dependency that exists when a table
contains more than one multivalued attribute, and changes to one attribute can
affect another attribute. In other words, MVD occurs when a table has a non-trivial
relationship between attributes that are not part of the same composite key.
For example, consider a table called "Students". It has columns: "Student ID,"
"Course," and "Textbook." Each student can take multiple courses, and each course
may require multiple textbooks. Therefore, the "Course" and "Textbook" columns
are multivalued attributes.

Student ID Course Textbook

1 Math Algebra

1 Math Calculus

2 Science Biology

2 Science Chemistry

2 Art Art History

3 History American History

In this example, we can note that there is a relationship between the "Course" and
"Textbook" columns. The "Course" column determines which textbooks are needed.
For example, a student taking "Math" will need both "Algebra" and "Calculus"
textbooks. This relationship between "Course" and "Textbook" is a multivalued
dependency.
To express this MVD, we can write the following formula:
Course →→ Textbook
This indicates that for any given value of "Course," there is a set of corresponding
values of "Textbook." For example, if we know that a student is taking "Math," we
can infer that the student needs both "Algebra" and "Calculus" textbooks.
MVD occurs when a table has a non-trivial relationship between attributes that are
not part of the same composite key. In the example above, the "Course" and
"Textbook" columns have a multivalued dependency because the "Course" column
determines which textbooks are needed.
Fourth Normal Form (4NF)

Fourth Normal Form (4NF) is a level of database normalization that requires a


relation to be in BCNF and have no non-trivial multivalued dependencies other than
13

the candidate key, to eliminate redundant data and maintain data consistency. If a
table violates this standard, it needs to be split into two tables to achieve 4NF.
For a relation R to be in 4NF, it must meet two conditions −
• It should be in Boyce-Codd Normal Form (BCNF).
• It should not have any non-trivial multivalued dependencies.
To remove the multivalued dependency (MVD) in the "Students" table example, we
can create two new tables, one for "Courses" and another for "Textbooks," and
establish a relationship between them using foreign keys.
Here's how we can create the tables:
Table 1: Students

Student ID Course ID

1 1

1 2

2 3

2 4

2 5

3 6

Table 2: Courses

Course ID Course Name

1 Math

2 Science

3 Art

4 History

Table 3: Textbooks

Textbook ID Textbook Name Course ID

1 Algebra 1
14

2 Calculus 2

3 Biology 2

4 Chemistry 2

5 Art History 3

6 American History 4

So, we removed the multivalued dependency by splitting the "Course" and


"Textbook" columns into separate tables.
We have also added a new "Course ID" column to the "Students" table. It has a
foreign key that references the "Course ID" column in the "Courses" table. Similarly,
the "Textbooks" table also has a "Course ID" column that serves as a foreign key
referencing the "Course ID" column in the "Courses" table.
Hence, we have achieved the fourth normal form (4NF) for the "Students" table. It
has done after by removing the multivalued dependency and creating separate
tables. The Resultant schema eliminates data redundancy and improves data
integrity, making it easier to manage and query the database.
Challenges of Fourth Normal Form
Achieving 4NF can be challenging, as it requires careful analysis of functional
dependencies, selecting appropriate keys, and balancing between normalization
and performance. Achieving 4NF can be time-consuming, and it may require
additional storage space.
Examples of Achieving Fourth Normal Form
An example of achieving 4NF is splitting a table with MVDs into two or more tables,
with each table containing attributes that are functionally dependent on the primary
key. In another example, an intersection table can be used to eliminate MVDs by
creating a separate table that links two other tables with MVDs.

What is Join Dependency in DBMS?


Whenever we can recreate a table by simply joining various tables where each of these tables
consists of a subset of the table’s attribute, then this table is known as a Join Dependency.
Thus, it is like a generalization of MVD. We can relate the JD to 5NF. Herein, a relation can
be in 5NF only when it’s already in the 4NF. Remember that it cannot be further
decomposed.

Examples of Join Dependency in DBMS

Example 1
<Student>
15

Stu_Name Stu_Skills Stu_Job (Assigned Work)

Tag Marketing GK001

Barry PR GK002

Paulo Graphic Designing GK003

We can decompose the table given above into these three tables given below. And thus, it is
not in the Fifth Normal Form.

<Student_Skills>

Stu_Name Stu_Skills

Tag Marketing

Barry PR

Paulo Graphic Designing

<Student_Job>

Stu_Name Stu_Job

Tag GK001

Barry GK002

Paulo GK002

<Job_Skills>

Stu_Skills Stu_Job
16

Marketing GK001

PR GK002

Graphic Designing GK003

Our Join Dependency would be:

{(Stu_Name, Stu_Skills ), ( Stu_Name, Stu_Job), (Stu_Skills, Stu_Job)}

The relations given above have join dependency. Thus, they do not happen to be in 5NF. It
means that the join relation of the three relations given above is equal to the very original
relation <Student>.

Characteristics of Join Dependency in DBMS?


• The join decomposition is like a further generalization of the Multivalued dependencies.
• In case the join of X1 and X2 over C is equal to relation X, then one can say that there exists a
join dependency (JD).
• Where X1 and X2 are the decompositions X1(A, B, C) and X2(C, D) of a given relation X (A, B,
C, D).
• Alternatively, X1 and X2 are lossless forms of decomposition of X.
• A JD ⋈ {X1, X2,…, Xn} holds over a relation X if X1, X2,….., Xn is a lossless-join type of
decomposition.
• The *(A, B, C, D), (C, D) happen to be a Join Dependency of X if the join of the join’s attribute
happens to be equal to the relation X.
• Here, we use the *(X1, X2, X3) to indicate that relation X1, X2, X3 and so on are a Join
Decomposition of X.
Introduction of 4th and 5th Normal form in DBMS

Prerequisite – Functional Dependency, Database Normalization, Normal Forms
If two or more independent relations are kept in a single relation or we can
say multivalue dependency occurs when the presence of one or more rows in a table
implies the presence of one or more other rows in that same table. Put another way,
two attributes (or columns) in a table are independent of one another, but both depend
on a third attribute. A multivalued dependency always requires at least three
attributes because it consists of at least two attributes that are dependent on a third.
For a dependency A -> B, if for a single value of A, multiple values of B exist, then
the table may have a multi-valued dependency. The table should have at least 3
attributes and B and C should be independent for A ->> B multivalued dependency.
Example:
17

Person Mobile Food_Likes

Mahesh 9893/9424 Burger/Pizza

Ramesh 9191 Pizza

Person->-> mobile,
Person ->-> food_likes
This is read as “person multi determines mobile” and “person multi determines
food_likes.”
Note that a functional dependency is a special case of multivalued dependency. In a
functional dependency X -> Y, every x determines exactly one y, never more than
one.

Fifth Normal Form / Projected Normal Form (5NF)


A relation R is in Fifth Normal Form if and only if everyone joins dependency in R is
implied by the candidate keys of R. A relation decomposed into two relations must
have lossless join Property, which ensures that no spurious or extra tuples are
generated when relations are reunited through a natural join.
Properties
A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should be already in 4NF.
2. It cannot be further non loss decomposed (join dependency).
Example – Consider the above schema, with a case as “if a company makes a product
and an agent is an agent for that company, then he always sells that product for the
company”. Under these circumstances, the ACP table is shown as:
Table ACP
Agent Company Product

A1 PQR Nut

A1 PQR Bolt

A1 XYZ Nut

A1 XYZ Bolt
18

Agent Company Product

A2 PQR Nut

The relation ACP is again decomposed into 3 relations. Now, the natural Join of all
three relations will be shown as:
Table R1
Agent Company

A1 PQR

A1 XYZ

A2 PQR

Table R2
Agent Product

A1 Nut

A1 Bolt

A2 Nut

Table R3
Company Product

PQR Nut

PQR Bolt
19

Company Product

XYZ Nut

XYZ Bolt

The result of the Natural Join of R1 and R3 over ‘Company’ and then the Natural
Join of R13 and R2 over ‘Agent’and ‘Product’ will be Table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of
ACP is a lossless join decomposition. Therefore, the relation is in 5NF as it does not
violate the property of lossless join.

You might also like