NORMALISATION

Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

NORMALISATION

Normalization is the process of organizing the data in the database.It is used


to minimize the redundancy from a relation or set of relations. It is used to
eliminate Insertion, Update, and Deletion Anomalies.It decomposes the
larger table into smaller and links them using relationships.

Types of Anomalies

 Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a


new tuple into a relationship due to lack of data.
 Deletion Anomaly: The delete anomaly refers to the situation where the
deletion of data results in the unintended loss of some other important data.
 Updation Anomaly: The update anomaly is when an update of a single data
value requires multiple rows of data to be updated.

Normalization is a method to remove all these anomalies and bring the


database to a consistent state.

Types of Normal Forms

 First Normal Form (1NF): This is the most basic level of normalization. In
1NF, each table cell should contain only a single or atomic value, and each
column should have a unique name. The first normal form helps to eliminate
duplicate data and simplify queries.

Example: Suppose a company wants to store the names and contact details
of its employees. It creates a table that looks like this:
You can see that the emp_mobile column has two values. Thus it does not
follow the First Normal Form. Now, if you use the First Normal Form to the
above table, you get the below table as a result.

 Second Normal Form (2NF):

A table is said to be in 2NF if both the following conditions hold:

● Table is in First normal form


● All non-prime attributes should be fully functionally dependent on the
primary key/candidate key.(No partial dependency)
Example: Suppose a school wants to store the data of teachers and the
subjects they teach. They create a table that looks like this:

The table is in 1 NF because each attribute has atomic values. However, it is


not in 2NF because non-prime attribute teacher_age is dependent on
teacher_id alone.

To make the table complies with 2NF we can break it in two tables like this:
teacher_details table: teacher_subject table:

Now the tables comply with Second normal form (2NF).

 Third Normal Form (3NF):

A table is said to be in 3NF if the following conditions hold:


● Table must be in 2NF
● there should be no transitive dependency, i.e. non-prime attributes should
not depend on other non-prime attributes in a table.

Example: Suppose a company wants to store the complete address of each


employee, they create a table named employee_details that looks like this:

All attributes except emp_id are non-prime as they are not part of any candidate
keys. To make this table complies with 3NF we have to break the table into two
tables to remove the transitive dependency:

Employee table: Employee_zip table:


 Boyce Codd normal form (BCNF)

Boyce Codd Normal Form is also known as 3.5 NF. It is a strict version of 3NF. BCNF
ensures that each non-key attribute is dependent only on the candidate key.

Example: Suppose there is a company where employees work in more than one
department.

The table is not in BCNF as neither emp_id nor emp_dept alone are keys.To make
the table comply with BCNF we can break the table in three tables like this:

Emp_nationality table: Emp_dept table:


Emp_dept_mapping table:

 Fourth Normal Form (4NF):

4NF is a further refinement of BCNF that ensures that a table does not contain any
multi-valued dependencies.

 Fifth Normal Form (5NF):

This level of normalization involves decomposing a table into smaller tables to


remove data redundancy and improve data integrity.Decomposition should be
lossless.

Advantages of Normal Form


 Reduced data redundancy: Normalization helps to eliminate duplicate data

in tables, reducing the amount of storage space needed and improving


database efficiency.
 Improved data consistency: Normalization ensures that data is stored in a

consistent and organized manner, reducing the risk of data inconsistencies


and errors.
 Simplified database design: Normalization provides guidelines for organizing

tables and data relationships, making it easier to design and maintain a


database.
 Improved query performance: Normalized tables are typically easier to

search and retrieve data from, resulting in faster query performance.


 Easier database maintenance: Normalization reduces the complexity of a

database by breaking it down into smaller, more manageable tables, making


it easier to add, modify, and delete data.
FUNCTIONAL DEPENDENCY

It is a concept that specifies the relationship between two sets of attributes where
one attribute determines the value of another attribute. It is denoted as X → Y,
where the attribute set on the left side of the arrow, X is called Determinant, and Y
is called the Dependent.

Types of Functional dependency


1)Trivial functional dependency

● A → B has trivial functional dependency if B is a subset of A.

● The following dependencies are also trivial like: A → A, B → B

2)Non-trivial functional dependency

● A → B has a non-trivial functional dependency if B is not a subset of A.

● When A intersection B is NULL, then A → B is called as complete non-trivial.

Properties of functional dependencies/Inference Rules/Armstrong’s axioms

 Reflexivity: if B is a subset of A then A determines B. Formally, B ⊆

A then A → B.
Example, {roll_no, name} → name is valid.
 Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by

augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no, name,
dept_name} → {dept_building, dept_name} is also valid.
 Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also

valid by the Transitivity rule.


Example, roll_no → dept_name & dept_name → dept_building, then roll_no
→ dept_building is also valid.etc.

 Union : if X determines Y and X determines Z, then X must also determine Y


and Z.

If X → Y and X → Z then X → YZ

 Decomposition : if X determines Y and Z, then X determines Y and X


determines Z separately

If X → YZ then X → Y and X → Z

 Pseudo-transitivity: if X determines Y and YZ determines W, then XZ


determines W.

If X → Y and YZ → W then XZ → W

TRANSITIVE DEPENDENCY

In transitive dependency, dependent is indirectly dependent on determinant. i.e. If a


→ b & b → c, then according to axiom of transitivity, a → c. This is a transitive
dependency.
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid transitive dependency.

MULTIVALUED DEPENDENCY

Multivalued dependency occurs when two attributes in a table are independent of


each other but, both depend on a third attribute.It consists of at least two
attributes that are dependent on a third attribute that's why it always requires at
least three attributes.

In Multivalued dependency, entities of the dependent set are not dependent on


each other. i.e. If a → {b, c} and there exists no functional dependency between b
and c, then it is called a multivalued dependency.
Here, roll_no → {name, age} is a multivalued dependency, since the dependents
name & age are not dependent on each other(i.e. name → age or age → name
doesn’t exist !)
DECOMPOSITION

Decomposition in Database Management System is to break a relation into multiple


relations to bring it into an appropriate normal form. It helps to remove
redundancy, inconsistencies, and anomalies from a database. The decomposition of
a relation R in a relational schema is the process of breaking the original relation R
into two or more relations in a relational schema. Each of these relations contains a
subset of the attributes of R and together they include all attributes of R.If a
relation is not properly decomposed, then it may lead to other problems like
information loss, etc.

Types of Decomposition

1)Lossless Decomposition

Lossless decomposition is a decomposition of a relation R into relations R1, and R2


such that if we perform a natural join of relation R1 and R2, it will return the
original relation R. This is effective in removing redundancy from databases while
preserving the original data.

In other words by lossless decomposition, it becomes feasible to reconstruct the


relation R from decomposed tables R1 and R2 by using Joins.

We can follow certain rules to ensure that the decomposition is a lossless join
decomposition Let’s say we have a relation R and we decomposed it into R1 and R2,
then the rules are:
1. The union of attributes of both the sub relations R1 and R2 must contain all

the attributes of original relation R.


R1 ∪ R2 = R

2. The intersection of attributes of both the sub relations R1 and R2 must not

be null, i.e., there should be some attributes that are present in both R1 and
R2.
R1 ∩ R2 ≠ ∅

3. The intersection of attributes of both the sub relations R1 and R2 must be

the superkey of R1 or R2, or both R1 and R2.


R1 ∩ R2 = Super key of R1 or R2

2)Lossy Decomposition

Whenever we decompose a relation into multiple relational schemas, then the loss
of data/information is unavoidable whenever we try to retrieve the original
relation.This is called as lossy decomposition.
DEPENDENCY PRESERVATION

Dependency Preserving is a technique used in Database Management System


(DBMS) to decompose a relation into smaller relations while preserving the
functional dependencies between the attributes. The goal is to improve the
efficiency of the database by reducing redundancy and improving query
performance.

In this technique, the original relation is decomposed into smaller relations in such
a way that the resulting relations preserve the functional dependencies of the
original relation. This is important because if the decomposition results in losing
any of the original functional dependencies, it can lead to data inconsistencies and
anomalies.

To achieve dependency preserving decomposition, there are various algorithms


available, such as the Boyce-Codd Normal Form (BCNF) decomposition and the
Third Normal Form (3NF) decomposition. These algorithms are based on the
concept of functional dependencies and are used to identify the attributes that
should be grouped together to form smaller relations.
NULL VALUE

The SQL NULL is the term used to represent a missing value. A NULL value in a table
is a value in a field that appears to be blank. A field with a NULL value is a field with
no value & is totally different than a zero value.

Problems with null values

In general null value does not impact anything until it is used just for showing the
status of any field. But once that field is used in mathematical calculation with
some other logical field then it can impact the outcome result.

Null values will also affect the aggregate function that incorporates the value of a
given field. If we were using an aggregate function, for example, a count function,
the result will always contain a null if the field contains a null.etc.

DANGLING TUPLES

In DBMS if there is a tuple that does not participate in a natural join we called it as
dangling tuple . It may gives indication consistency problem in the database.
Another definition of dangling problem tuple is that a tuple with a foreign key
value that not appear in the referenced relation is known as dangling tuple. In
DBMS Referential integrity constraints specify us exactly when dangling tuples
indicate problem.

You might also like