Normalization With Example2
Normalization With Example2
DB Normalization
• DB normalization is a process used for data modelling or DB creation,
where you organize your data & tables so it can be added & updated
efficiently.
• It is something a person does manually. It is commonly done by DB
developers & DB administrators.
• It can be done on any relational DB, where data is stored in tables
that are linked to each other.
• To perform the normalization process, you
• Start with a rough idea of the data you want to store,
• Then apply certain rules to it to get it to a more efficient form.
1
10/21/2024
2
10/21/2024
Example
Sample DB.
Uses a student & teacher DB at a school.
We have a set of data we want to capture in our DB.
The table here is how it currently looks, a single table called “STUDENT” with a lot of columns.
Table: STUDENT
Date of Teacher
Student Name Fees Paid Address Subject 1 Subject 2 Subject 3 Subject 4 Teacher Address Course Name
Birth Name
3 Main Street,
Economics 1 Biology 1 James 44 March Way,
John Smith 18-Jul-20 04-Aug-01 North Boston Economics
(Business) (Science) Peterson Glebe 56100
56125
16 Leeds Road, Business
Programming James 44 March Way,
Maria Griffin 14-May-21 10-Sep-02 South Boston Biology 1 (Science) Intro Computer Science
2 (IT) Peterson Glebe 56100
56128 (Business)
21 Arrow Street,
Sarah
Susan Johnson 03-Feb-21 13-Jan-01 South Boston Biology 2 (Science) Medicine
Francis
56128
14 Milk Lane,
Shane 105 Mist Road,
Matt Long 29-Apr-22 25-Apr-02 South Boston Dentistry
Cobson Faulkner 56410
56128
3
10/21/2024
Example
• Everything is in one table.
• How can we normalize this?
• We start with getting the data to 1NF.
• To apply 1NF to a DB, we look at each table, one by one, & ask
ourselves the following questions of it:
• Does the combination of all columns make a unique row every single time?
• What field can be used to uniquely identify the row?
Example
• Does the combination of all columns make a unique row every single time?
• No. There could be the same combination of data, & it would represent a different row.
• There could be the same values for this row & it would be a separate row (although it is rare).
• What field can be used to uniquely identify the row?
• Is this the student name? No, there could be 2 students with the same name.
• Address? No, this is not unique either.
• Any other field? No, there isn’t a field that can uniquely identify the row.
• So we need to create it (PK)
• When we create a new PK, we can call it whatever we like
• But, it should be obvious & consistently named between tables.
• I prefer using the ID suffix, so I would call it student ID.
4
10/21/2024
This is a common way of representing tables in text format: The table in Entity Relationship
Diagram (ERD)
1. Start with writing the table name
2. Then in parenthesis list all the columns, separated with a
comma AND underline the PK
Functional Dependency
means each field that is not the PK is determined by that PK
so it is specific to that record.
10
5
10/21/2024
Example
• In our table, are all these columns dependent on and specific to the PK?
Student (student ID, student name, fees paid, date of birth, address, subject 1, subject 2, subject 3, subject 4, teacher name, teacher address,
course name)
• The PK is student ID, which represents the student. Let us look at each column:
• student name Yes, this is dependent on the PK. A different student ID means a different student name.
• fees paid Yes, this is dependent on the PK. Each fees paid value is for a single student.
• date of birth Yes, it is specific to that student.
• Address Yes, it is specific to that student.
• subject 1 No, this column is not dependent on the student. More than one student can be enrolled in one subject.
• subject 2 No, as above, more than one subject is allowed.
• subject 3 No, same rule as subject 2.
• subject 4 No, same rule as subject 2
• teacher name No, the teacher name is not dependent on the student.
• teacher address No, the teacher address is not dependent on the student.
• course name No, the course name is not dependent on the student.
11
Example
• We have a mix of Yes & No here.
• Some fields are dependent on the student ID, & others are not.
• How can we resolve those we marked as No?
• They are
• Subject
• Teacher
• Course
12
6
10/21/2024
Example
• Subject
• First, the subject 1 column.
• It is not dependent on the student, as more than one student can have a subject, and the subject is not a part of the
definition of a student.
• So, we move it to a new table: Subject (subject name)
• Name: subject name reflects what the value represents.
When we are writing queries on this table or looking at diagrams, its clearer what subject name is instead of using subject.
• Unique field? Subject name is not unique necessarily.
Two subjects could have the same name and this would cause problems in our data.
• So, what do we do? We add a PK column, just like we did for student.
• I will call this subject ID, to be consistent with the student ID.
Subject (subject ID, subject name)
• This means we have a student table & a subject table.
• We can do this for all 4 of our subject columns in the student table, removing them from the student table so it looks like
this:
Student (student ID, student name, fees paid, date of birth, address, teacher name, teacher address, course name)
13
Example
• Teacher
• The next column we marked as No was the Teacher Name column.
• The teacher is separate to the student so should be captured separately.
• This means we should move it to its own table. Teacher (teacher name)
• We should also move the teacher address to this table, as it is a property of the teacher.
• I will also rename teacher address to be just address. Teacher (teacher name, address)
• Teacher name & address is not unique.
• Sure, in most cases it would be, but to avoid duplication we should add a PK.
• Let us call it teacher ID: Teacher (teacher ID, teacher name, address)
• Student table so it looks like this:
Student (student ID, student name, fees paid, date of birth, address, course name)
14
7
10/21/2024
Example
• Course
• The last column we must look at was the Course Name column.
• This indicates the course that the student is currently enrolled in.
• While the course is related to the student (a student is enrolled in a course), the name of
the course itself is not dependent on the student.
• So, we should move it to a separate table.
• This is so any changes to courses can be made independently of students.
• The course table would look like this: Course (course name)
• Let us also add a PK called course ID. Course (course ID, course name)
• Student table so it looks like this:
Student (student ID, student name, fees paid, date of birth, address)
15
Example
• Our DB so far looks like this:
• Student (student ID, student name, fees paid, date of birth, address)
• Subject (subject ID, subject name)
• Teacher (teacher ID, teacher name, address)
• Course (course ID, course name)
16
8
10/21/2024
17
18
9
10/21/2024
19
Example
• start with a simple one: STUDENTS taking courses.
Student (student ID, student name, fees paid, date of birth, address)
Course (course ID, course name)
• In the example, a student can only be enrolled in one course at a time, & a course
can have many students.
• We need to either:
• Add the course ID from the course table into the student table
• Add the student ID from the student table into the course table
• But which one?
20
10
10/21/2024
Example
• Does a course have many students? or
• Does a student have many courses?
• Based on our rules, the first statement is true:
• A course has many students.
• This means that the course ID goes into the student table.
Student (student ID, course ID, student name, fees paid, date of birth,
address)
Course (course ID, course name) Note: FKs are bolded & italicized
21
Example
• The populate tables,, now have the course ID in the student table.
• Instead of course name
• Then, the course name can be linked using this ID.
student ID course ID student name fees paid date of birth address
1 3 John Smith 200 04-Aug-01 3 Main Street, North Boston 56125
2 1 Maria Griffin 500 10-Sep-02 16 Leeds Road, South Boston 56128
3 4 Susan Johnson 400 13-Jan-01 21 Arrow Street, South Boston 56128
4 2 Matt Long 850 25-Apr-02 14 Milk Lane, South Boston 56128
• This also means that the course name is stored in one place only
• Which then, can be added/removed/updated without impacting other tables.
22
11
10/21/2024
Example
• Next we look at TEACHER • Does a teacher have many courses, or
• Depending on the scenario, they could be • In our example, the first statement is true.
related in one of a few ways: • So, the teacher ID goes into the course table:
• A student can have one teacher that teaches • Student (student ID, course ID, student name,
them all subjects fees paid, date of birth, address)
• A subject could have a teacher than teaches it • Subject (subject ID, subject name)
• A course could have a teacher that teaches all • Teacher (teacher ID, teacher name,
subjects in a course teacher address)
• In our example, a teacher is related to a • Course (course ID, teacher ID, course name)
course.
• The two table need to relate using a FK.
23
Example
• The table data would look like this:
Course Teacher
course ID teacher ID Course name teacher ID teacher name address
1 1 Computer Science
1 James Peterson 44 March Way, Glebe 56100
2 3 Dentistry
2 Sarah Francis
3 1 Economics
4 2 Medicine 3 Shane Cobson 105 Mist Road, Faulkner 56410
• Now we can change the teacher’s information without impacting the courses
or students.
24
12
10/21/2024
Example
• Next we examine Student & Subject
• What about the subject table?
• Does a subject have many students, or
• Does a student have many subjects?
• The answer is both.
• How?
• A student can be enrolled in many subjects at a time. AND
• A subject can have many students in it.
25
Example
• How can we represent that?
• We could try to put one table’s ID in the other table:
student ID course ID subject ID student name fees paid date of birth address
1 3 1, 2 John Smith 200 04-Aug-01 3 Main Street, North Boston 56125
2 1 2, 3, 2004 Maria Griffin 500 10-Sep-02 16 Leeds Road, South Boston 56128
3 4 5 Susan Johnson 400 13-Jan-01 21 Arrow Street, South Boston 56128
4 2 Matt Long 850 25-Apr-02 14 Milk Lane, South Boston 56128
• But, then multiple pieces of information are in one column, maybe separated
by commas.
• This makes it hard to maintain & is very likely to cause errors.
26
13
10/21/2024
Example
• This kind of relationship, is a many to many relationship.
• It means that many of one record is related to many of the other record.
• A M:N relationship is common in DBs. Examples:
• Students and subjects
• A student can be enrolled in many subjects at a time, AND
• A subject can have many students in it
• Employees and companies
• An employee can have many jobs at different companies, AND
• A company has many employees)
• Actors and movies
• An actor is in multiple movies, AND
• A movie has multiple actors
27
Associative Table
(AKA bridge, junction, or linking table)
• If we can not represent M:M relationship by putting a FK in each table, how
can we represent it?
• This table is created only for storing the relationships between the two tables.
• Name the table appropriately
28
14
10/21/2024
Example
• In our case, having the two tables:
Student (student ID, course ID, student name, fees paid, date of birth, address)
Subject (subject ID, subject name)
29
Example
• Advantages:
• It allows us to store many subjects for each student, & many students for each subject.
• It separates the data that describes the records from the relationship of the records
(linking ID to ID).
• Descriptive data: subject name, student name, address, etc.
30
15
10/21/2024
Teacher (teacher ID, teacher name, teacher address) However “Subject Enrolment”,
seems to be a better name as it is a
Course (course ID, teacher ID, course name) more descriptive of what the table
captures, a student has enrolled in a
subject.
31
DB structure is in 2NF
32
16
10/21/2024
33
Example
• Investigating for transitive functional dependency in our tables.
Student (student ID, course ID, student name, fees paid, date of birth, address)
• Do any of the non-PK fields depend on something other than the PK? address
• No, none of them do. 3 Main Street, North Boston 56125
• But, examining address, there is a relationship between the ZIP code, the
16 Leeds Road, South Boston 56128
city, & the suburb.
• The ZIP code determines the city, state & the suburb. 21 Arrow Street, South Boston 56128
• This is common with addresses, & you may have noticed this if you have filled out 14 Milk Lane, South Boston 56128
any forms for addresses online recently.
• In our example case, 56128 is South Boston, & 56125 is North Boston.
• Note that these zip codes are made up and not accurate.
34
17
10/21/2024
Example
• So, move the ZIP code, & everything it identifies, to another table; then link it
to the student table.
• Our table could look like this:
Student (student ID, course ID, student name, fees paid, date of birth,
street address, address code ID)
Address Code (address code ID, ZIP code, suburb, city, state)
• We created a new table called Address Code & linked it to the student table.
• Address Code has a new column for the address code ID
• because the ZIP code may refer to more than one suburb.
Example
• Investigating for transitive functional dependency in our tables.
36
18
10/21/2024
Example
• Investigating for transitive functional dependency in our tables.
Teacher (teacher ID, teacher name, teacher address)
• The teacher table has the same issue as the student table with the address.
• We use the same approach for remedy the issue of the address.
Teacher (teacher ID, teacher name, street address, address code ID)
Address Code (address code ID, ZIP code, suburb, city, state)
Note: we are using the same Address Code table already created; we are not creating a
new address code table.
37
Address Code (address code ID, ZIP code, suburb, city, state)
Teacher (teacher ID, teacher name, street address, address code ID)
38
19
10/21/2024
DB structure is in 3NF
39
20