Unit 3
Unit 3
Unit 3
UNIT-3 (Notes)
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can conclude some valid functional dependencies:
• roll_no → { name, dept_name, dept_building },→ Here, roll_no can determine values
of fields name, dept_name and dept_building, hence a valid Functional dependency
• roll_no → dept_name , Since, roll_no can determine whole set of {name, dept_name,
dept_building}, it can determine its subset dept_name also.
• dept_name → dept_building , Dept_name can identify the dept_building accurately,
since departments with different dept_name will also have a different dept_building
• More valid functional dependencies: roll_no → name, {roll_no, name} ⇢ {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
• name → dept_name Students with the same name can have different dept_name,
hence this is not a valid functional dependency.
• dept_building → dept_name There can be multiple departments in the same
building. Example, in the above table departments ME and EC are in the same building
B2, hence dept_building → dept_name is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no,
dept_building → roll_no, etc.
Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
Example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the
augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no, name, dept_name}
→ {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid
by the Transitivity rule.
2
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of
trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant.
i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.
Example:
42 abc 17
43 pqr 18
44 xyz 18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional
dependency, since age is not a subset of {roll_no, name}
3. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not dependent on each
other. i.e. If a → {b, c} and there exists no functional dependency between b and c, then it is
called a multivalued functional dependency.
For example,
42 abc 17
3
43 pqr 18
44 xyz 18
45 abc 19
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect functional
dependency, hence called Transitive functional dependency.
5. Fully Functional Dependency
In full functional dependency an attribute or a set of attributes uniquely determines another
attribute or set of attributes. If a relation R has attributes X, Y, Z with the dependencies X->Y and
X->Z which states that those dependencies are fully functional.
6. Partial Functional Dependency
In partial functional dependency a non key attribute depends on a part of the composite key, rather
than the whole key. If a relation R has attributes X, Y, Z where X and Y are the composite key and
Z is non key attribute. Then X->Z is a partial functional dependency in RBDMS.
Normalization
If a database design is not perfect, it may contain anomalies, which are like
a bad dream for any database administrator. Managing a database with
anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not linked to
each other properly, then it could lead to strange situations. For
example, when we try to update one data item having its copies
4
Each attribute must contain only a single value from its pre-defined domain.
We see here in Student_Project relation that the prime key attributes are
Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e.
Stu_Name and Proj_Name must be dependent upon both and not on any
of the prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in
Second Normal Form.
X is a superkey or,
• A is prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and
only prime key attribute. We find that City can be identified by Stu_ID as
well as Zip itself. Neither Zip is a superkey nor is City a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists transitive dependency.
To bring this relation into third normal form, we break the relation into two
relations as follows −
and
Zip → City
• A table is referred to as being in its First Normal Form if atomicity of the table is 1.
• Here, atomicity states that a single cell cannot hold multiple values. It must hold only a
single-valued attribute.
• The First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
Now you will understand the First Normal Form with the help of an example.
Below is a students’ record table that has information about student roll number,
student name, student course, and age of the student.
In the studentsrecord table, you can see that the course column has two values.
Thus it does not follow the First Normal Form. Now, if you use the First Normal Form
to the above table, you get the below table as a result.
By applying the First Normal Form, you achieve atomicity, and also every column
has unique values.
Before proceeding with the Second Normal Form, get familiar with Candidate Key
and Super Key.
Candidate Key
A candidate key is a set of one or more columns that can identify a record uniquely
in a table, and YOU can use each candidate key as a Primary Key.
Super Key
Super key is a set of over one key that can identify a record uniquely in a table, and
the Primary Key is a subset of Super Key.
Now understand the Second Normal Form with the help of an example.
The Location table possesses a composite primary key cust_id, storeid. The non-key
attribute is store_location. In this case, store_location only depends on storeid, which
is a part of the primary key. Hence, this table does not fulfill the second normal form.
9
To bring the table to Second Normal Form, you need to split the table into two parts.
This will give you the below tables:
As you have removed the partial functional dependency from the location table, the
column store_location entirely depends on the primary key of that table, storeid.
Now that you understood the 1st and 2nd Normal forms, you will look at the next part
of this Normalization in SQL tutorial.
• The first condition for the table to be in Third Normal Form is that the table should be in
the Second Normal Form.
• The second condition is that there should be no transitive dependency for non-prime
attributes, which indicates that non-prime attributes (which are not a part of the candidate
key) should not depend on other non-prime attributes in a table. Therefore, a transitive
dependency is a functional dependency in which A → C (A determines C) indirectly,
because of A → B and B → C (where it is not the case that B → A).
• The third Normal Form ensures the reduction of data duplication. It is also used to
achieve data integrity.
Below is a student table that has student id, student name, subject id, subject name,
and address of the student as its columns.
In the above student table, stu_id determines subid, and subid determines sub.
Therefore, stu_id determines sub via subid. This implies that the table possesses a
transitive functional dependency, and it does not fulfill the third normal form criteria.
Now to change the table to the third normal form, you need to divide the table as
shown below:
10
As you can see in both the tables, all the non-key attributes are now fully functional,
dependent only on the primary key. In the first table, columns name, subid, and
addresses only depend on stu_id. In the second table, the sub only depends on
subid.
Boyce Codd Normal Form is also known as 3.5 NF. It is the superior version of 3NF
and was developed by Raymond F. Boyce and Edgar F. Codd to tackle certain types
of anomalies which were not resolved with 3NF.
The first condition for the table to be in Boyce Codd Normal Form is that the table
should be in the third normal form. Secondly, every Right-Hand Side (RHS) attribute
of the functional dependencies should depend on the super key of that particular
table.
For example :
You have a functional dependency X → Y. In the particular functional dependency, X
has to be the part of the super key of the provided table.
Consider the below subject table:
In the above table, student_id and subject together form the primary key because
using student_id and subject; you can determine all the table columns.
11
Another important point to be noted here is that one professor teaches only one
subject, but one subject may have two professors.
Which exhibit there is a dependency between subject and professor, i.e. subject
depends on the professor's name.
The table is in 1st Normal form as all the column names are unique, all values are
atomic, and all the values stored in a particular column are of the same domain.
The table also satisfies the 2nd Normal Form, as there is no Partial Dependency.
And, there is no Transitive Dependency; hence, the table also satisfies the 3rd
Normal Form.
This table follows all the Normal forms except the Boyce Codd Normal Form.
As you can see stuid, and subject forms the primary key, which means the subject
attribute is a prime attribute.
However, there exists yet another dependency - professor → subject.
BCNF does not follow in the table as a subject is a prime attribute, the professor is a
non-prime attribute.
To transform the table into the BCNF, you will divide the table into two parts. One
table will hold stuid which already exists and the second table will hold a newly
created column profid.
And in the second table will have the columns profid, subject, and professor, which
satisfies the BCNF.
With this, you have reached the conclusion of the ‘Normalization in SQL’ tutorial.
concept that helps to identify and eliminate data redundancy and anomalies, and
Fourth Normal Form (4NF) is a normalization form that addresses the challenges
associated with multivalued dependency.
Multivalued Dependency (MVD)
Multivalued dependency (MVD) is a type of dependency that exists when a table
contains more than one multivalued attribute, and changes to one attribute can
affect another attribute. In other words, MVD occurs when a table has a non-trivial
relationship between attributes that are not part of the same composite key.
For example, consider a table called "Students". It has columns: "Student ID,"
"Course," and "Textbook." Each student can take multiple courses, and each course
may require multiple textbooks. Therefore, the "Course" and "Textbook" columns
are multivalued attributes.
1 Math Algebra
1 Math Calculus
2 Science Biology
2 Science Chemistry
In this example, we can note that there is a relationship between the "Course" and
"Textbook" columns. The "Course" column determines which textbooks are needed.
For example, a student taking "Math" will need both "Algebra" and "Calculus"
textbooks. This relationship between "Course" and "Textbook" is a multivalued
dependency.
To express this MVD, we can write the following formula:
Course →→ Textbook
This indicates that for any given value of "Course," there is a set of corresponding
values of "Textbook." For example, if we know that a student is taking "Math," we
can infer that the student needs both "Algebra" and "Calculus" textbooks.
MVD occurs when a table has a non-trivial relationship between attributes that are
not part of the same composite key. In the example above, the "Course" and
"Textbook" columns have a multivalued dependency because the "Course" column
determines which textbooks are needed.
Fourth Normal Form (4NF)
the candidate key, to eliminate redundant data and maintain data consistency. If a
table violates this standard, it needs to be split into two tables to achieve 4NF.
For a relation R to be in 4NF, it must meet two conditions −
• It should be in Boyce-Codd Normal Form (BCNF).
• It should not have any non-trivial multivalued dependencies.
To remove the multivalued dependency (MVD) in the "Students" table example, we
can create two new tables, one for "Courses" and another for "Textbooks," and
establish a relationship between them using foreign keys.
Here's how we can create the tables:
Table 1: Students
Student ID Course ID
1 1
1 2
2 3
2 4
2 5
3 6
Table 2: Courses
1 Math
2 Science
3 Art
4 History
Table 3: Textbooks
1 Algebra 1
14
2 Calculus 2
3 Biology 2
4 Chemistry 2
5 Art History 3
6 American History 4
Example 1
<Student>
15
Barry PR GK002
We can decompose the table given above into these three tables given below. And thus, it is
not in the Fifth Normal Form.
<Student_Skills>
Stu_Name Stu_Skills
Tag Marketing
Barry PR
<Student_Job>
Stu_Name Stu_Job
Tag GK001
Barry GK002
Paulo GK002
<Job_Skills>
Stu_Skills Stu_Job
16
Marketing GK001
PR GK002
The relations given above have join dependency. Thus, they do not happen to be in 5NF. It
means that the join relation of the three relations given above is equal to the very original
relation <Student>.
Person->-> mobile,
Person ->-> food_likes
This is read as “person multi determines mobile” and “person multi determines
food_likes.”
Note that a functional dependency is a special case of multivalued dependency. In a
functional dependency X -> Y, every x determines exactly one y, never more than
one.
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
18
A2 PQR Nut
The relation ACP is again decomposed into 3 relations. Now, the natural Join of all
three relations will be shown as:
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table R3
Company Product
PQR Nut
PQR Bolt
19
Company Product
XYZ Nut
XYZ Bolt
The result of the Natural Join of R1 and R3 over ‘Company’ and then the Natural
Join of R13 and R2 over ‘Agent’and ‘Product’ will be Table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of
ACP is a lossless join decomposition. Therefore, the relation is in 5NF as it does not
violate the property of lossless join.