0% found this document useful (0 votes)
65 views

Normalization With Example2

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
65 views

Normalization With Example2

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

10/21/2024

Normalization with example

DB Normalization
• DB normalization is a process used for data modelling or DB creation,
where you organize your data & tables so it can be added & updated
efficiently.
• It is something a person does manually. It is commonly done by DB
developers & DB administrators.
• It can be done on any relational DB, where data is stored in tables
that are linked to each other.
• To perform the normalization process, you
• Start with a rough idea of the data you want to store,
• Then apply certain rules to it to get it to a more efficient form.

1
10/21/2024

Why Normalize a Database


• Why do we want to go through this manual process of rearranging the data?
• Make the database more efficient
• Prevent the same data from being stored in more than one place (called an “insert anomaly”)
• Prevent updates being made to some data but not others (called an “update anomaly”)
• Prevent data not being deleted when it is supposed to be, or from data being lost when it is
not supposed to be (called a “delete anomaly”)
• Ensure the data is accurate
• Reduce the storage space that a database takes up
• Ensure the queries on a database run as fast as possible
• Normalization in a DBMS is done to achieve these points. Without normalization on a
database, the data can be slow, incorrect, and messy.

the Normal Forms


• The process of normalization involves applying rules to a set of data.
• Each of these rules transforms the data to a certain structure, called
a normal form.
• There 6 normal forms & the first 3 are the most common
• When the first rule is applied, the data is in “first normal form“
• The second rule is applied & the data is in “second normal form“.
• The third rule is then applied and the data is in “third normal form“.
• ….

2
10/21/2024

First Normal Form (1NF)


• Normalization in DBMS starts with the first rule being applied
• 1NF is the way the data is represented after it has the first rule of
normalization applied to it.
• Let us start with a sample DB.
• we’re going to use a student & teacher DB at a school.
• We have a set of data we want to capture in our DB.
• The table here is how it currently looks, a single table called “STUDENT” with a lot of
columns.

Example
Sample DB.
Uses a student & teacher DB at a school.
We have a set of data we want to capture in our DB.
The table here is how it currently looks, a single table called “STUDENT” with a lot of columns.
Table: STUDENT
Date of Teacher
Student Name Fees Paid Address Subject 1 Subject 2 Subject 3 Subject 4 Teacher Address Course Name
Birth Name
3 Main Street,
Economics 1 Biology 1 James 44 March Way,
John Smith 18-Jul-20 04-Aug-01 North Boston Economics
(Business) (Science) Peterson Glebe 56100
56125
16 Leeds Road, Business
Programming James 44 March Way,
Maria Griffin 14-May-21 10-Sep-02 South Boston Biology 1 (Science) Intro Computer Science
2 (IT) Peterson Glebe 56100
56128 (Business)
21 Arrow Street,
Sarah
Susan Johnson 03-Feb-21 13-Jan-01 South Boston Biology 2 (Science) Medicine
Francis
56128
14 Milk Lane,
Shane 105 Mist Road,
Matt Long 29-Apr-22 25-Apr-02 South Boston Dentistry
Cobson Faulkner 56410
56128

3
10/21/2024

Example
• Everything is in one table.
• How can we normalize this?
• We start with getting the data to 1NF.
• To apply 1NF to a DB, we look at each table, one by one, & ask
ourselves the following questions of it:
• Does the combination of all columns make a unique row every single time?
• What field can be used to uniquely identify the row?

Example
• Does the combination of all columns make a unique row every single time?
• No. There could be the same combination of data, & it would represent a different row.
• There could be the same values for this row & it would be a separate row (although it is rare).
• What field can be used to uniquely identify the row?
• Is this the student name? No, there could be 2 students with the same name.
• Address? No, this is not unique either.
• Any other field? No, there isn’t a field that can uniquely identify the row.
• So we need to create it (PK)
• When we create a new PK, we can call it whatever we like
• But, it should be obvious & consistently named between tables.
• I prefer using the ID suffix, so I would call it student ID.

4
10/21/2024

Example: Data & ERD in 1NF


This is our new table:
Student (student ID, student name, fees paid, date of birth, address, subject 1, subject 2,
subject 3, subject 4, teacher name, teacher address, course name)

This is a common way of representing tables in text format: The table in Entity Relationship
Diagram (ERD)
1. Start with writing the table name
2. Then in parenthesis list all the columns, separated with a
comma AND underline the PK

This data is now in 1NF.

Second Normal Form (2NF)


• The rule of 2NF on a DB can be described as:
1. Fulfil the requirements of 1NF
2. Each non-key attribute must be functionally dependent on the PK

Functional Dependency
means each field that is not the PK is determined by that PK
so it is specific to that record.

10

5
10/21/2024

Example
• In our table, are all these columns dependent on and specific to the PK?
Student (student ID, student name, fees paid, date of birth, address, subject 1, subject 2, subject 3, subject 4, teacher name, teacher address,
course name)

• The PK is student ID, which represents the student. Let us look at each column:
• student name Yes, this is dependent on the PK. A different student ID means a different student name.
• fees paid Yes, this is dependent on the PK. Each fees paid value is for a single student.
• date of birth Yes, it is specific to that student.
• Address Yes, it is specific to that student.
• subject 1 No, this column is not dependent on the student. More than one student can be enrolled in one subject.
• subject 2 No, as above, more than one subject is allowed.
• subject 3 No, same rule as subject 2.
• subject 4 No, same rule as subject 2
• teacher name No, the teacher name is not dependent on the student.
• teacher address No, the teacher address is not dependent on the student.
• course name No, the course name is not dependent on the student.

11

Example
• We have a mix of Yes & No here.
• Some fields are dependent on the student ID, & others are not.
• How can we resolve those we marked as No?
• They are
• Subject
• Teacher
• Course

12

6
10/21/2024

Example
• Subject
• First, the subject 1 column.
• It is not dependent on the student, as more than one student can have a subject, and the subject is not a part of the
definition of a student.
• So, we move it to a new table: Subject (subject name)
• Name: subject name reflects what the value represents.
When we are writing queries on this table or looking at diagrams, its clearer what subject name is instead of using subject.
• Unique field? Subject name is not unique necessarily.
Two subjects could have the same name and this would cause problems in our data.
• So, what do we do? We add a PK column, just like we did for student.
• I will call this subject ID, to be consistent with the student ID.
Subject (subject ID, subject name)
• This means we have a student table & a subject table.
• We can do this for all 4 of our subject columns in the student table, removing them from the student table so it looks like
this:
Student (student ID, student name, fees paid, date of birth, address, teacher name, teacher address, course name)

13

Example
• Teacher
• The next column we marked as No was the Teacher Name column.
• The teacher is separate to the student so should be captured separately.
• This means we should move it to its own table. Teacher (teacher name)
• We should also move the teacher address to this table, as it is a property of the teacher.
• I will also rename teacher address to be just address. Teacher (teacher name, address)
• Teacher name & address is not unique.
• Sure, in most cases it would be, but to avoid duplication we should add a PK.
• Let us call it teacher ID: Teacher (teacher ID, teacher name, address)
• Student table so it looks like this:
Student (student ID, student name, fees paid, date of birth, address, course name)

14

7
10/21/2024

Example
• Course
• The last column we must look at was the Course Name column.
• This indicates the course that the student is currently enrolled in.
• While the course is related to the student (a student is enrolled in a course), the name of
the course itself is not dependent on the student.
• So, we should move it to a separate table.
• This is so any changes to courses can be made independently of students.
• The course table would look like this: Course (course name)
• Let us also add a PK called course ID. Course (course ID, course name)
• Student table so it looks like this:
Student (student ID, student name, fees paid, date of birth, address)

15

Example
• Our DB so far looks like this:
• Student (student ID, student name, fees paid, date of birth, address)
• Subject (subject ID, subject name)
• Teacher (teacher ID, teacher name, address)
• Course (course ID, course name)

16

8
10/21/2024

Creating Relationships (Linking Entities)


• How do we link these tables together?
• We still need to know:
• which subjects a student is taking
• which course they are in
• who their teachers are.

17

Creating Relationships (Linking Entities)


• Foreign Keys (FK) in Tables
• We have 4 separate tables
• Each capture different pieces of information
• In our case, we need to capture that students are taking certain courses,
have teachers, & subjects.
• But the data is in different tables.
• How can we keep track of this?
• By using a FK.

18

9
10/21/2024

Where Does PK Go to Become FK


• Ask:
1. Does one row from table1 have many rows from table2? OR
2. Does one row from table2 have many rows from table1?

• If it is the first, then table1 ID goes into table 2.


• If it is the second, then table2 ID goes into table1.

19

Example
• start with a simple one: STUDENTS taking courses.
Student (student ID, student name, fees paid, date of birth, address)
Course (course ID, course name)
• In the example, a student can only be enrolled in one course at a time, & a course
can have many students.
• We need to either:
• Add the course ID from the course table into the student table
• Add the student ID from the student table into the course table
• But which one?

20

10
10/21/2024

Example
• Does a course have many students? or
• Does a student have many courses?
• Based on our rules, the first statement is true:
• A course has many students.
• This means that the course ID goes into the student table.
Student (student ID, course ID, student name, fees paid, date of birth,
address)
Course (course ID, course name) Note: FKs are bolded & italicized

21

Example
• The populate tables,, now have the course ID in the student table.
• Instead of course name
• Then, the course name can be linked using this ID.
student ID course ID student name fees paid date of birth address
1 3 John Smith 200 04-Aug-01 3 Main Street, North Boston 56125
2 1 Maria Griffin 500 10-Sep-02 16 Leeds Road, South Boston 56128
3 4 Susan Johnson 400 13-Jan-01 21 Arrow Street, South Boston 56128
4 2 Matt Long 850 25-Apr-02 14 Milk Lane, South Boston 56128

• This also means that the course name is stored in one place only
• Which then, can be added/removed/updated without impacting other tables.

22

11
10/21/2024

Example
• Next we look at TEACHER • Does a teacher have many courses, or

• How are teachers related? • Does a course have many teachers?

• Depending on the scenario, they could be • In our example, the first statement is true.
related in one of a few ways: • So, the teacher ID goes into the course table:
• A student can have one teacher that teaches • Student (student ID, course ID, student name,
them all subjects fees paid, date of birth, address)
• A subject could have a teacher than teaches it • Subject (subject ID, subject name)
• A course could have a teacher that teaches all • Teacher (teacher ID, teacher name,
subjects in a course teacher address)
• In our example, a teacher is related to a • Course (course ID, teacher ID, course name)
course.
• The two table need to relate using a FK.

23

Example
• The table data would look like this:
Course Teacher
course ID teacher ID Course name teacher ID teacher name address
1 1 Computer Science
1 James Peterson 44 March Way, Glebe 56100
2 3 Dentistry
2 Sarah Francis
3 1 Economics
4 2 Medicine 3 Shane Cobson 105 Mist Road, Faulkner 56410

• Now we can change the teacher’s information without impacting the courses
or students.

24

12
10/21/2024

Example
• Next we examine Student & Subject
• What about the subject table?
• Does a subject have many students, or
• Does a student have many subjects?
• The answer is both.
• How?
• A student can be enrolled in many subjects at a time. AND
• A subject can have many students in it.

25

Example
• How can we represent that?
• We could try to put one table’s ID in the other table:
student ID course ID subject ID student name fees paid date of birth address
1 3 1, 2 John Smith 200 04-Aug-01 3 Main Street, North Boston 56125
2 1 2, 3, 2004 Maria Griffin 500 10-Sep-02 16 Leeds Road, South Boston 56128
3 4 5 Susan Johnson 400 13-Jan-01 21 Arrow Street, South Boston 56128
4 2 Matt Long 850 25-Apr-02 14 Milk Lane, South Boston 56128

• But, then multiple pieces of information are in one column, maybe separated
by commas.
• This makes it hard to maintain & is very likely to cause errors.

26

13
10/21/2024

Example
• This kind of relationship, is a many to many relationship.
• It means that many of one record is related to many of the other record.
• A M:N relationship is common in DBs. Examples:
• Students and subjects
• A student can be enrolled in many subjects at a time, AND
• A subject can have many students in it
• Employees and companies
• An employee can have many jobs at different companies, AND
• A company has many employees)
• Actors and movies
• An actor is in multiple movies, AND
• A movie has multiple actors

27

Associative Table
(AKA bridge, junction, or linking table)
• If we can not represent M:M relationship by putting a FK in each table, how
can we represent it?

• We create intermediary table (AKA associative, junction, or linking table)

• This table is created only for storing the relationships between the two tables.
• Name the table appropriately

• Then place the id of the two tables into it.

• Make sure you have an PK for the table.

28

14
10/21/2024

Example
• In our case, having the two tables:
Student (student ID, course ID, student name, fees paid, date of birth, address)
Subject (subject ID, subject name)

• The intermediate table: Subject_Student (student ID, subject ID)


• It has 2 columns
• Student ID is a FK to the student table, and
• Subject ID is a FK to the subject table
student ID subject ID
• Each row represents a relationship between a student & a subject. 1 1
• Student 1 is linked to subject 1. 1 2

• Student 1 is linked to subject 2. 2 2


2 3
• Student 2 is linked to subject 2.
2 4
• ...
3 5

29

Example
• Advantages:
• It allows us to store many subjects for each student, & many students for each subject.
• It separates the data that describes the records from the relationship of the records
(linking ID to ID).
• Descriptive data: subject name, student name, address, etc.

• We can add and remove relationships easily.


• We can add more information about the relationship.
• We can add additional data into the table.
• For example: an enrolment date to capture when a student enrolled in a subject.

30

15
10/21/2024

Example: Final Table Structure in 2NF


• The final table structure:
Student (student ID, course ID, student name, fees paid, date of birth, address)
Subject (subject ID, subject name)
Table name “student subject” could
Subject Enrolment (student ID, subject ID) have been left as it was.

Teacher (teacher ID, teacher name, teacher address) However “Subject Enrolment”,
seems to be a better name as it is a
Course (course ID, teacher ID, course name) more descriptive of what the table
captures, a student has enrolled in a
subject.

In Subject Enrolment table both student ID & subject ID are


• bolded & italicized because they are FKs
• underlined, because together they are the PK of Subject Enrolment table

31

Example: 2NF ERD

DB structure is in 2NF

32

16
10/21/2024

Third Normal Form (3NF)


• 3NF is the final stage of the most common normalization process.
• The rule for this is:
• Fulfils the requirements of 2NF
• Has no transitive functional dependency
Transitive Functional Dependency
It means that every attribute that is not the PK, must depend on the PK & the PK only.
For example:
• Column A determines column B
• Column B determines column C
• Therefore, column A determines C
Meaning: column A defines column B which defines column C.
• This is a transitive functional dependency, & it should be removed.
• Column C should be in a separate table.

33

Example
• Investigating for transitive functional dependency in our tables.
Student (student ID, course ID, student name, fees paid, date of birth, address)
• Do any of the non-PK fields depend on something other than the PK? address
• No, none of them do. 3 Main Street, North Boston 56125
• But, examining address, there is a relationship between the ZIP code, the
16 Leeds Road, South Boston 56128
city, & the suburb.
• The ZIP code determines the city, state & the suburb. 21 Arrow Street, South Boston 56128

• This is common with addresses, & you may have noticed this if you have filled out 14 Milk Lane, South Boston 56128
any forms for addresses online recently.

• In our example case, 56128 is South Boston, & 56125 is North Boston.
• Note that these zip codes are made up and not accurate.

• This the pattern: A determines B which determines C.


• Student determines the address ZIP code which determines the suburb.

34

17
10/21/2024

Example
• So, move the ZIP code, & everything it identifies, to another table; then link it
to the student table.
• Our table could look like this:
Student (student ID, course ID, student name, fees paid, date of birth,
street address, address code ID)
Address Code (address code ID, ZIP code, suburb, city, state)
• We created a new table called Address Code & linked it to the student table.
• Address Code has a new column for the address code ID
• because the ZIP code may refer to more than one suburb.

• This way we can capture that fact


• It is in a separate table to make sure it is only stored in one place.
35

Example
• Investigating for transitive functional dependency in our tables.

Subject (subject ID, subject name)


Subject Enrolment (student ID, subject ID)
• Both of these tables have no columns that are not dependent on the PK.

Course (course ID, teacher ID, course name)


• The course name is dependent on the course ID, so this table is okay as well.

36

18
10/21/2024

Example
• Investigating for transitive functional dependency in our tables.
Teacher (teacher ID, teacher name, teacher address)
• The teacher table has the same issue as the student table with the address.
• We use the same approach for remedy the issue of the address.
Teacher (teacher ID, teacher name, street address, address code ID)
Address Code (address code ID, ZIP code, suburb, city, state)

Note: we are using the same Address Code table already created; we are not creating a
new address code table.

37

Example: Final Table Structure in 3NF


Student (student ID, course ID, student name, fees paid, date of birth,
street address, address code ID)

Address Code (address code ID, ZIP code, suburb, city, state)

Subject (subject ID, subject name)

Subject Enrolment (student ID, subject ID)

Teacher (teacher ID, teacher name, street address, address code ID)

Course (course ID, teacher ID, course name)

38

19
10/21/2024

Example: 3NF ERD

DB structure is in 3NF
39

20

You might also like