Normalization of Database
Database Normalization is a technique of organizing the data in the
database. Normalization is a systematic approach of decomposing
tables to eliminate data redundancy(repetition) and undesirable
characteristics like Insertion, Update and Deletion Anomalies. It is a
multi-step process that puts data into tabular form, removing
duplicated data from the relation tables.
Normalization is used for mainly two purposes,
Eliminating redundant(useless) data.
Ensuring data dependencies make sense i.e data is logically
stored.
If a table is not properly normalized and have data redundancy then
it will not only eat up extra memory space but will also make it
difficult to handle and update the database, without facing data
loss. Insertion, Updation and Deletion Anomalies are very frequent if
database is not normalized.
Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a
branch, data of the student cannot be inserted, or else we will have
to set the branch information as NULL.
Updation Anomaly
What if Mr. X leaves the college? or is no longer the HOD of
computer science department? In that case all the student records
will have to be updated, and if by mistake we miss any record, it will
lead to data inconsistency. This is Updation anomaly.
Deletion Anomaly
In our Student table, two different informations are kept together,
Student information and Branch information. Hence, at the end of
the academic year, if student records are deleted, we will also lose
the branch information. This is Deletion anomaly.
Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form
For a table to be in the First Normal Form, it should follow the
following 4 rules:
1. It should only have single(atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.
Second Normal Form (2NF)
For a table to be in the Second Normal Form,
1. It should be in the First Normal form.
2. And, it should not have Partial Dependency.
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,
1. It is in the Second Normal form.
2. And, it doesn't have Transitive Dependency.
Boyce and Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a higher version of the Third
Normal form. This form deals with certain type of anomaly that is
not handled by 3NF. A 3NF table which does not have multiple
overlapping candidate keys is said to be in BCNF. For a table to be
in BCNF, following conditions must be satisfied:
R must be in 3rd Normal Form
and, for each functional dependency ( X → Y ), X should be a
super Key.
Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,
1. It is in the Boyce-Codd Normal Form.
2. And, it doesn't have Multi-Valued Dependency.