Normalization of Database
Database Normalization is a technique of organizing the data in the database.
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics like Insertion, Update and Deletion
Anomalies.
It is a multi-step process that puts data into tabular form by removing duplicated data
from the relation tables.
Normalization is used for mainly two purpose,
Eliminating redundant (useless) data
Ensuring data dependencies make sense i.e data is logically stored
Normalization Rule
Normalization rule are divided into following normal form.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
Email: meraj09034@gmail.com Page. 1
4. BCNF
First Normal Form (1NF)
As per First Normal Form, no two Rows of data must contain repeating group of information
i.e each set of column must have a unique value, such that multiple columns cannot be used to
fetch the same row. Each table should be organized into rows, and each row should have a
primary key that distinguishes it as unique.
The Primary key is usually a single column, but sometimes more than one column can be
combined to create a single primary key. For example consider a table which is not in First
normal form.
Student Table:
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
In First Normal Form, any row must not have a column in which more than one value is saved,
like separated with commas. Rather than that, we must separate such data into multiple rows.
Email: meraj09034@gmail.com Page. 2
Student Table following 1NF will be:
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
Using the First Normal Form, data redundancy increases, as there will be many columns with
same data in multiple rows but each row as a whole will be unique.
Second Normal Form (2NF)
Before we learn about the second normal form, we need to understand the following
Prime attribute − An attribute, which is a part of the prime-key, is known as a prime
attribute.
Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a
non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully
functionally dependent on prime key attribute. That is, if X → A holds, then there should not
be any proper subset Y of X, for which Y → A also holds true.
Email: meraj09034@gmail.com Page. 3
Example:
We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually. But we find that Stu_Name
can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal Form.
Email: meraj09034@gmail.com Page. 4
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
Third Normal Form (3NF)
For a relation to be in Third Normal Form, it must be in Second Normal form and the following
must satisfy −
No non-prime attribute is transitively dependent on prime key attribute.
For any non-trivial functional dependency, X → A, then either −
o X is a superkey or,
o A is prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a
superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists transitive dependency.
Email: meraj09034@gmail.com Page. 5
To bring this relation into third normal form, we break the relation into two relations as follows
−
For example, consider a table with following fields.
Student_Detail Table :
Student_id Student_name DOB Street city State Zip
Email: meraj09034@gmail.com Page. 6
In this table Student_id is Primary key, but street, city and state depends upon Zip. The
dependency between zip and other fields is called transitive dependency. Hence to apply 3NF,
we need to move the street, city and state to new table, with Zip as primary key.
New Student_Detail Table :
Student_id Student_name DOB Zip
Address Table :
Zip Street city state
The advantage of removing transitive dependency is,
Amount of data duplication is reduced.
Data integrity achieved.
Example: Suppose a company wants to store the complete address of each employee, they create
a table named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district
1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any
candidate keys.
Here, emp_state, emp_city & emp_district dependent on emp_zip. And, emp_zip is dependent on
emp_id that makes non-prime attributes (emp_state, emp_city & emp_district) transitively
dependent on super key (emp_id). This violates the rule of 3NF.
To make this table complies with 3NF we have to break the table into two tables to remove the
transitive dependency:
Email: meraj09034@gmail.com Page. 7
employee table:
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
employee_zip table:
emp_zip emp_state emp_city emp_district
282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
Boyce and Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms.
BCNF states that −
For any non-trivial functional dependency, X → A, X must be a super-key.
In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-
key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
Email: meraj09034@gmail.com Page. 8
Which confirms that both the relations are in BCNF.
Example: Suppose there is a company wherein employees work in more than one department.
They store the data like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone are keys.
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
Email: meraj09034@gmail.com Page. 9
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both the functional dependencies left side part is a key.
Third Normal Form Comparison of BCNF and 3NF
BCNF or 3NF?
o Relations in BCNF and 3NF
Relations in BCNF: no repetition of information
Relations in 3NF: problem of repetition of information
o Decomposition in BCNF and in 3NF
It is always possible to decompose a relation into relations in 3NF and
the decomposition is lossless
dependencies are preserved
It is always possible to decompose a relation into relations in BCNF and
the decomposition is lossless
the information is not repeated
Email: meraj09034@gmail.com Page.
10