0% found this document useful (0 votes)
33 views

Model Data Object

Model Data Object

Uploaded by

getnetzd7
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views

Model Data Object

Model Data Object

Uploaded by

getnetzd7
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 39

WEB DEVELOPMENT AND DATABASE

ADMINISTRATION
LEVEL – III

Module Title: Modeling Data Objects

Course Code: EIS WDDBA3 02 1221


LEARNING GUIDE

Unit of Competence: Model Data Object

Module Title: Modeling Data Objects

MODULE CODE: EIS WDDBA3 04


1221

Code: Nominal Duration : 75hrs


MODULE DESCRIPTION MODULE DESCRIPTION:

This module covers the knowledge, skills and required to understand business operations, identify
entities and data, diagrammatically represent their relationships and prepare a data model

.LEARNING OUTCOMES: At the end of the module the learner will be able to:

• LO1 identify entities and relationships

• LO2 develop normalization

• LO3 validate model


MODULE CONTENTS:

MODULE CONTENTS:

LO1 Identify entities and relationships


1.1 Understanding and analyzing business operations
• Identify Entity types entity sets, attributes and keys
1.2 Identify the scope of the system
• Databases
• Applications
• Servers
• Operating systems
• Gateways
• Application service provider and
• ISP
1.3 Reviewing business rules to determine impact
• Over view of business rules
1.4 Relationships
• One-to-many
• Many-to-one
• Many-to-many
• One-to-one
1.5 Documenting Entity relationship Diagram
• Person, object or concept
LO2 Develop Normalization
2.1 Introduction to Normalization
2.2 Informal design guide lines for relational schema
2.3 Functional dependencies
2.4 Normal forms based on primary keys
2.5 General definition of second and third normal forms
2.6 Boyce Codd normal form
2.7 Fourth Normal form
2.8 Fifth Normal form
LO3 Validate data Model using Normalization
3.1 Validating data model
3.2 Resolving issues or recommendations
3.3 Documenting completed data model
• Submitting final approval to client
LEARNING STRATEGIES:
• Lecture-discussion
• Group work
Individual assignment

ASSESSMENT METHODS:

• Interview/Written Test
• Demonstration/Observation with Oral Questioning ASSESSMENT CRITERIA:

LO1 Identify entities and relationships

• Business data are analyzed to understand operations


• Boundaries of the system are identified
• Entities, attributes, data types and relationships of data are identified
• Business rules are reviewed to determine impact
• Relationships are documented in an entity relationship diagram
LO2 Develop normalization
• Normalization of business data undertaken and results are documented
• Normalization results are compared with entity relationship diagram
• Differences between data are reconciled, if any
LO3 Validate model

• Data model is validated with client


• Issues or recommendations arising are resolved
• Completed data model are documented Final approval is submitted to client

Model Data Object


LO1. Identify entities and relationships What
is entity?

An entity is an existing or real thing. The fact that something exists also seems to indicate
separateness from other existences or entities. In programming, engineering, and probably many
other contexts, the word is used to identify units, whether concrete things or abstract ideas.

1. In relation to a database , an entity is a single person, place, or thing about which data can
be stored. ex. school, student, course, deparment, employee, university.

2. In data modeling (a first step in the creation of a database), an entity is some unit of data
that can be classified and have stated relationships to other entities.

Entities: are concepts within the data model. Each entity is represented by a box within the ERD.
Entities are abstract concepts, each representing one or more instances of the concept in question.
An entity might be considered a container that holds all of the instances of a particular thing in a
system. Entities are equivalent to database tables in a relational database, with each row of the
table representing an instance of that entity.
Remember that each entity represents a container for instances of the thing in question. The
diagram below has an entity for “student” and another for “school.” This indicates that the system
being modeled may contain one or more students and one or more schools.

Entity Set and entity type


An entity set is a logical container for instances of an entity type and instances of any type derived
from that entity type. The relationship between an entity type and an entity set is analogous to the
relationship between a row and a table in a relational database: Like a row, an entity type describes
data structure, and, like a table, an entity set contains instances of a given structure. An entity set
is not a data modeling construct; it does not describe the structure of data. Instead, an entity set
provides a construct for a hosting or storage environment (such as the common language runtime
or a SQL Server database) to group entity type instances so that they can be mapped to a data store.
Example entity set
The diagram below shows a conceptual model with three entity types: Book, Publisher, and
Author.

The entity type is the fundamental building block for describing the structure of data with the Entity
Data Model (EDM). In a conceptual model, an entity type represents the structure of toplevel
concepts, such as customers or orders. An entity type is a template for entity type instances. Each
template contains the following information:

• A unique name. (Required.)

• An entity key defined by one or more properties. (Required.)


Entity Key

An entity key is a property or a set of properties of an entity type that are used to determine identity.
The properties that make up an entity key are chosen at design time. The values of entity key
properties must uniquely identify an entity type instance within an entity set at run time. The
properties that make up an entity key should be chosen to guarantee uniqueness of instances in an
entity set.

The following are the requirements for a set of properties to be an entity key:

• No two entity keys within an entity set can be identical. That is, for any two entities within
an entity set, the values for all of the properties that constitute a key cannot be the same.
However, some (but not all) of the values that make up an entity key can be the same.

• An entity key must consist of a set of non-nullable, immutable, primitive type properties.

• The properties that make up an entity key for a given entity type cannot change. You cannot
allow more than one possible entity key for a given entity type; surrogate keys are not
supported.
Note: An entity is weak if it depends on another entity for part of its key

What Attribute and entity mean?

Entities are the things you're keeping track of. example in a video rental database, you keep track
of customers, invoices, titles, copies, etc. These are represented in tables where the rows are
individual instances of a customer or title. The columns are the attributes, the things that tell us
about the instance in the row. The customer's name, address, city, balance, etc. are attributes that
help identify the customer. Databases link entities/tables so that a customer, described by its
attributes.so that attributes describe an entity.

Entity and Attribute

An entity with an attribute


Relationship
• A relationship captures how entities are related to one another. Relationships can be
thought of as verbs, linking two or more nouns.

• Relationships are represented by lines between entities. Relationship lines indicate that
each instance of an entity may have a relationship with instances of the connected entity,
and vice versa.
The diagram above now indicates that students may have some relationship with schools. More specifically,
there may be a relationship between a particular student (an instance of the student entity) and a particular
school (an instance of the school entity).

If necessary, a relationship line may be labeled to define the relationship. In this case, one can infer that a
student may attend a school, or that a school may enroll students.

Entity and relationship

Entities and relationships can both have attributes. Examples: an employee entity might have a
Social Security Number (SSN) attribute; the proved relationship may have a date attribute.
Two related entities

Two related entities

Relationship and attribute

A relationship with an attribute

A relationship is how the data is shared between entities. There are three types of relationships
between entities:
• one-to-one: one instance of an entity (A) is associated with one other instance of another
entity (B). For example, in a database of employees, each employee name (A) is
associated with only one social security number (B).

• one-to-many: one instance of an entity (A) is associated with zero, one or many instances
of another entity (B), but for one instance of entity B there is only one instance of entity
A. For example, for a company with all employees working in one building, the building
name (A) is associated with many different employees (B), but those employees all share
the same singular association with entity A.

• many-to-many: one instance of an entity (A) is associated with one, zero or many
instances of another entity (B), and one instance of entity B is associated with one, zero or
many instances of entity A. For example, for a company in which all of its employees
work on multiple projects, each instance of an employee (A) is associated with many
instances of a project (B), and at the same time, each instance of a project (B) has multiple
employees (A) associated with it.

A 1:N (one-to-many) Relationship is a hierarchical relationship created or viewed from the


primary entity. Any one entity instance from the primary entity can be referenced by many entity
instances from the related entity.

An N:1 (many-to-one) Relationship is a hierarchical relationship created or viewed from the


related entity. Many entity instances from the related entity can reference any one entity instance
from the primary entity. Remember that the same relationship can be viewed from either of the two
entities that participate in the relationship.
N:N (many-to-many) Relationships

A many-to-many relationship lets users relate one or more entity instances from another entity to
an entity instance of the current entity. A many-to-many relationship is reciprocal. Therefore, entity
instances can be related from either entity. A many-to-many relationship may also be
selfreferential. This means that one or more other entity instances of the current entity can be related
to an entity instance of the same entity.

Primary key

The primary key is a fundamental concept in relational database design. It's an easy concept: each
record should have something that identifies it uniquely. The primary key can be a single field, or
a combination of fields. A table's primary key also serves as the basis of relationships with other
tables

Functional Dependency
Closely tied to the notion of a key is a special normalization concept called functional dependence
or functional dependency . The second and third normal forms verify that your functional
dependencies are correct. So what is a "functional dependency"? It describes how one field (or
combination of fields) determines another field. Consider an example:

[ZIP Code]
ZIP Code
City
County
State Abbreviation
State Name
ZIP Code is a unique 5-digit key. What makes it a key? It is a key because it determines the other
fields. For each ZIP Code there is a single city, county, and state abbreviation. These fields are
functionally dependent on the ZIP Code field. In other words, they belong with this key. Look at
the last two fields, State Abbreviation and State Name. State Abbreviation determines State Name,
in other words, State Name is functionally dependent on State Abbreviation. State Abbreviation is
acting like a key for the State Name field

ER-Diagram
Data modeling is the formalization and documentation of existing processes and events that occur
during application software design and development. Data modeling techniques and tools capture
and translate complex system designs into easily understood representations of the data flows and
processes, creating a blueprint for construction and/or re-engineering.
A data model can be thought of as a diagram or flowchart that illustrates the relationships between
data. Although capturing all the possible relationships in a data model can be very timeintensive,
it's an important step and shouldn't be rushed. Well-documented models allow stakeholders to
identify errors and make changes before any programming code has been written.

An entity – Relationship model (ER model for short) is an abstract way to describe a database. It
usually starts with a relational database, which stores data in tables.

There are three levels of ER models that may be developed. The conceptual data model is the
highest level ER model in that it contains the least granular detail but establishes the overall scope
of what is to be included within the model set. The conceptual ER model normally defines master
reference data entities that are commonly used by the organization. Developing an enterprise-wide
conceptual ER model is useful to support documenting the data architecture for an organization.

1. Conceptual Data Modeling


A conceptual ER model may be used as the foundation for one or more logical data models. The
purpose of the conceptual ER model is then to establish structural metadata commonality for the
master data entities between the set of logical ER models. The conceptual data model may be used
to form commonality relationships between ER models as a basis for data model integration.

2. Logical Data Modeling


similar to conceptual data modeling, but addresses the unique requirements of a specific business.
Illustrates the specific entities, attributes and relationships involved in a business function. Serves
as the basis for the creation of the physical data model. describes the semantics, as represented by
a particular data manipulation technology. This consists of descriptions of tables and columns,
object oriented classes.

A logical ER model does not require a conceptual ER model especially if the scope of the logical
ER model is to develop a single disparate information system. The logical ER model contains more
detail than the conceptual ER model. In addition to master data entities, operational and
transactional data entities are now defined. The details of each data entity are developed and the
entity relationships between these data entities are established.

The logical ER model is however developed independent of technology into which it will be
implemented.

3. Physical Data Modeling


Represents an application and database-specific implementation of a logical data odel.describes the
physical means by which data are stored. This is concerned with partitions, CPUs, table spaces, and
the like.

One or more physical ER models may be developed from each logical ER model. The physical ER
model is normally developed be instantiated as a database. Therefore, each physical ER model must
contain enough detail to produce a database and each physical ER model is technology dependent
since each database management system is somewhat different.
The physical model is normally forward engineered to instantiate the structural metadata into a
database management system as relational database objects such as database tables, database
indexes such as unique key indexes, and database constraints such as a foreign key constraint or a
commonality constraint. The ER model is also normally used to design modifications to the
relational database objects and to maintain the structural metadata of the database.

An Entity Relationship Diagram (ERD) is a visual representation of different data using


conventions that describe how these data are related to each other. For example, the elements
writer, novel, and consumer may be described using ER diagrams this way:
ER diagram with basic objects

In the diagram, the elements inside rectangles are called entities while the items inside diamonds
denote the relationships between entities.

ER Diagrams Usage
ER While able to describe just about any system, ER diagrams are most often associated with
complex databases that are used in software engineering and IT networks. In particular, ER
diagrams are frequently used during the design stage of a development process in order to identify
different system elements and their relationships with each other. For example, an inventory
software used in a retail shop will have a database that monitors elements such as purchases, item,
item type, item source and item price. Rendering this information through an ER diagram would
be something like this:

ER diagram example with entity having attributes

In the diagram, the information inside the oval shapes are attributes of a particular entity.

There are three basic elements in an ER Diagram: entity, attribute, relationship. There are more
elements which are based on the main elements. They are weak entity, multivalued attribute,
derived attribute, weak relationship and recursive relationship. Cardinality and ordinality are two
other notations used in ER diagrams to further define relationships.

Entity
An entity can be a person, place, event, or object that is relevant to a given system. For example, a
school system may include students, teachers, major courses, subjects, fees, and other items.
Entities are represented in ER diagrams by a rectangle and named using singular nouns.

Weak Entity
A weak entity is an entity that depends on the existence of another entity. In more technical terms
it can defined as an entity that cannot be identified by its own attributes. It uses a foreign key
combined with its attributed to form the primary key. An entity like order item is a good example
for this. The order item will be meaningless without an order so it depends on the existence of
order.

Weak Entity Example in ER diagrams

Attribute
An attribute is a property, trait, or characteristic of an entity, relationship, or another attribute. For
example, the attribute Inventory Item Name is an attribute of the entity Inventory Item. An entity
can have as many attributes as necessary. Meanwhile, attributes can also have their own specific
attributes. For example, the attribute “customer address” can have the attributes number, street,
city, and state. These are called composite attributes. Note that some top level ER diagrams do not
show attributes for the sake of simplicity. In those that do, however, attributes are represented by
oval shapes.

Attributes in ER diagrams, note that an attribute can have its own attributes ( composite attribute
)

Multivalued Attribute
If an attribute can have more than one value it is called an multivalued attribute. It is important to
note that this is different to an attribute having its own attributes. For example a teacher entity can
have multiple subject values.
Example of a multivalued attribute

Derived Attribute
An attribute based on another attribute. This is found rarely in ER diagrams. For example for a
circle the area can be derived from the radius.

Derived Attribute in ER diagrams

Relationship
A relationship describes how entities interact. For example, the entity “carpenter” may be related
to the entity “table” by the relationship “builds” or “makes”. Relationships are represented by
diamond shapes and are labeled using verbs.

Using Relationships in Entity Relationship Diagrams

Recursive Relationship
If the same entity participates more than once in a relationship it is known as a recursive
relationship. In the below example an employee can be a supervisor and be supervised, so there is
a recursive relationship.
Example of a recursive relationship in ER diagrams

Cardinality and Ordinality


These two further defines relationships between entities by placing the relationship in the context
of numbers. In an email system, for example, one account can have multiple contacts. The
relationship in this case follows a “one to many” model. There are number of notations used to
present cardinality in ER diagrams.
ER-Diagram: Example1. exam database
Example 2. University database

Explanation

• A Course has meaning only in the context of a Program, so it‟s a weak entity, with course_id
as a weak key. This means that a Course is uniquely identified using its course_id and the
program_id of its owning program.

• As a weak entity, Course participates totally in the many-to-one identifying relationship


with its owning Program. This relationship has Year and Semester attributes that identify
its sequence position.

• Student and Course are related through the many-to-many Attempts relationships; a course
can exist without a student, and a student can be enrolled without attempting any courses,
so the participation is not total.

• When a student attempts a course, there are attributes to capture the Year and Semester, and
the Mark and Grade.

Stage 1 : Define Scope of the Database Project


• Identify which organizational subdivisions will be served by the database

• Define which functions within these organizations will utilize the database

• Identify which existing and planned applications will be converted to the database system

• Prepare proposal for management and obtain go-ahead

Stage 2 : Organize Database Project


• Pick users for design team

• Select database Administrator (DBA)

• Establish regular meetings and periodic management reporting for design team

Stage 3 : Select Database Management System Products


• Document requirements in formal proposal requests

• Select DBMS vendor

Stage 4 : Develop Initial Implementation Plan and Schedule


• Identify files that will be converted

• Identify programs within applications specified

• Estimate programmer hours needed to modify applications programs

• Estimate user clerical support needed to verify data using conversion

• Develop implementation schedule

Stage 5 : Design Database


• Complete detailed information requirements
• Identify data requirements
• Determine data structure and complete design specifications

• Review and approve design specifications

Stage 6 : Perform Training


• Develop training requirements and training schedule

• Train programmers in the use of DML (Data Manipulation Language)

• Train DBA in DMCL (Data Manipulation Control Language) and DDL (Data Definition
Language)

Stage 7 : Install and Test Database


• Code DMCL, schema and subschema

• Modify representative programs for DBMS test

• Code conversion programs

• Generate the database

• Test and debug

• Review and approve test results

Stage 8 : Develop Detailed Conversion Plan


• Make individual programming assignments for each program to be modified and each file
to be loaded

• Schedule users to verify and correct file contents

• Schedule computer availability


• Prepare formal written conversion schedule and obtain commitments and from all parties
involved

• Approve conversion involved


Stage 9 : Convert Existing Applications
• Bring up one application at a time

• Update and regenerate database as required

• Approve revised applications as they are converted

• Begin using database for new applications and programs

Stage 10 : Fine-Tune Database


• Monitor DBMS access statistics and visible performance, and modify database as required

• Regenerate database when necessary

Stage 11 : Periodically Review Database Performance


• Restate organizational goals and information requirements

• Evaluate success of database project

• Begin new database projects when required

Self-check
LO1: Model Data object

1. Which is an entity
A. Borrower D. Borrower-Address
B. Borrower-ID E. Title
C. Borrower-Name F. Loan-Date
2. Which is an attribute
A. Book D. Student
B. Publisher E. Course
C. Order F. SSN(social security Number)
3. Of the following the one represents an entity diagrammatically

A.

B.

C. ___________

D.

4. Of the following the one represents a diagram of an


attribute
A.

B.

C.

5. The following can be a one-to-one relation ship

A. Relation between Department and employees


B. A relationship between a president and a country
C. Relation between Employee and project
D. Relation between Employees and department

Answer:
1. A 2. F. 3. A. 4. C. 5. B

LO2. Develop Normalization


Database designed based on ER model may have some amount of inconsistency, ambiguity and
redundancy. To resolve these issues some amount of refinement is required. This refinement
process is called as Normalization.
The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are
normalized. These are referred to as normal forms and are numbered from one (the lowest form of
normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In
practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth
normal form is very rarely seen

Benefits of database normalization


Reduced usage of storage space by intelligently categorizing data is one of the many benefits
database normalization. It enables better, faster, stronger searches as it entails fewer entities to scan
in comparison with the earlier searches based on mixed entities. Data integrity is improved through
database normalization as it splits all the data into individual entities yet building strong linkages
with the related data.
• More efficient database structure.

• Better understanding of your data.

• More flexible database structure.

• Easier to maintain database structure.

• Few (if any) costly surprises down the road.

• Validates your common sense and intuition.

• Avoids redundant fields.

• Ensures that distinct tables exist when necessary

First normal form (1NF). Eliminating Repeating Groups


This is the "basic" level of normalization and generally corresponds to the definition of any
database, namely: First normal form (1NF) has two requirements: that there be a primary key, and
that no column shall contain more than one value.

• It contains two-dimensional tables with rows and columns.

• Each column corresponds to a sub-object or an attribute of the object represented by the


entire table.
• Each row represents a unique instance of that sub-object or attribute and must be different
in some way from any other row (that is, no duplicate rows are possible).

• All entries in any column must be of the same kind. For example, in the column labeled
"Customer," only customer names or numbers are permitted.

Steps
• Eliminate duplicative columns from the same table.

• Create separate tables for each group of related data and identify each row with a unique
column or set of columns (the primary key).

Second normal form (2NF). Second Normal Form: Eliminating Redundant Data

• Second normal form (2NF) requires that all non-key columns are fully dependent on the
entire primary key. If the table has only a single-column primary key, this requirement is
easily met.

At this level of normalization, each column in a table that is not a determiner of the contents of
another column must itself be a function of the other columns in the table. For example, in a table
with three columns containing customer ID, product sold, and price of the product when sold, the
price would be a function of the customer ID (entitled to a discount) and the specific product.

steps
• Meet all the requirements of the first normal form.

• Remove subsets of data that apply to multiple rows of a table and place them in separate
tables.

• Create relationships between these new tables and their predecessors through the use of
foreign keys

Third normal form (3NF) Third Normal Form: Eliminating Columns Not
Dependent on Keys
Third normal form (3NF) requires that there are no transitive dependencies, where one column
depends on another column which depends on the primary key.

At the second normal form, modifications are still possible because a change to one row in a table
may affect data that refers to this information from another table. For example, using the customer
table just cited, removing a row describing a customer purchase (because of a return perhaps) will
also remove the fact that the product has a certain price. In the third normal form, these tables would
be divided into two tables so that product pricing would be tracked separately.

Steps
• Meet all the requirements of the second normal form.

• Remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF or 3.5NF)


The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one
more requirement:

• Meet all the requirements of the third normal form.


• Every determinant must be a candidate key.

* Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. A row is in Boyce Codd
normal form if and only if every determinant is a candidate key. Most entities in 3NF are already
in BCNF.

Fourth Normal Form (4NF)


Fourth normal form (4NF) has one additional requirement:

• Meet all the requirements of the third normal form.

• A relation is in 4NF if it has no multi-valued dependencies.

* An entity is in Fourth Normal Form (4NF) if and only if it is in 3NF and has no multiple
sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a
single one-to-many relationship within an entity if the one-to-many attributes are independent of
each other.

* Fifth Normal Form (5NF) specifies that every join dependency for the entity must be a
consequence of its candidate keys.

A row is in first normal form if and only if all underlying domains contain atomic values only. 1NF
eliminates repeating groups by putting each into a separate table and connecting them with a one-
to-many relationship. A row is in second normal form if and only if it is in first normal form and
every non-key attribute is fully dependent on the key. 2NF eliminates functional dependencies on
a partial key by putting the fields in a separate table from those that are dependent on the whole
key. A row is in third normal form if and only if it is in second normal form and every non-key
attribute is non-transitively dependent on the primary key. 3NF eliminates functional dependencies
on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent
on the key, the whole key and nothing but the key.

Example1 the following shows and answer what is the need of normalization?

What are the problems we can face if we proceed without normalization? And
What are the advantages of normalization?
Asking question to oneself is the best way to get familiar with all the concepts of normalization.

The need of Normalization


Observe the following table carefully
Student Details Course Details Result details

1001 Ram 11/09/1986 M4 Basic Maths 7 11/11/2004 89 A

1002 Shyam 12/08/1987 M4 Basic Maths 7 11/11/2004 78 B


1001 Ram 23/06/1987 H6 4 11/11/2004 87 A

1003 Sita 16/07/1985 C3 Basic Chemistry 11 11/11/2004 90 A

1004 Gita 24/09/1988 B3 8 11/11/2004 78 B

1002 Shyam 23/06/1988 P3 Basic Physics 13 11/11/2004 67 C

1005 Sunita 14/09/1987 P3 Basic Physics 13 11/11/2004 78 B

1003 Sita 23/10/1987 B4 5 11/11/2004 67 C

1005 Sunita 13/03/1990 H6 4 11/11/2004 56 D

1004 Gita 21/08/1987 M4 Basic Maths 7 11/11/2004 78 B

In first look the above table is looking so arranged and well in format but if we try to find out what exactly
this table is saying to us, we can easily figure out the various anomalies (irregularities) in this table.

1. Insert Anomaly: We cannot insert prospective course which does not have any registered student
or we cannot insert student details that is yet to register for any course.

2. Update Anomaly: if we want to update the course M4‟s name we need to do this operation three
times. Similarly we may have to update student 1003‟s name twice if it changes.

3. Delete Anomaly: if we want to delete a course M4 , in addition to M4 occurs details , other critical
details of student also will be deleted. This kind of deletion is harmful to business. Moreover, M4
appears thrice in above table and needs to be deleted thrice.

4. Duplicate Data: Course M4‟s data is stored three and student 1002‟s data stored twice .This
redundancy will increase as the number of course offerings increases.
Process of normalization:
Before getting to know the normalization techniques in detail, let us define a few building blocks which
are used to define normal form.
1. Determinant : Attribute X can be defined as determinant if it uniquely defines the value
Y in a given relationship or entity .To qualify as determinant attribute need NOT be a key
attribute .Usually dependency of attribute is represented as X->Y ,which means attribute
X decides attribute Y.
Example: In RESULT relation, Marks attribute may decide the grade attribute .This is represented
as Marks->grade and read as Marks decides Grade.
Marks -> Grade
In the result relation, Marks attribute is not a key attribute .Hence it can be concluded that key
attributes are determinants but not all the determinants are key attributes.
2. Functional Dependency: Functional dependency has definition but let‟s not care about
that. Let‟s try to understand the concept by example. Consider the following relation :
REPORT(Student#, Course#, CourseName, IName, Room#, Marks, Grade) Where:
• Student#-Student Number

• Course#-Course Number

• CourseName -CourseName

• IName- Name of the instructor who delivered the course

• Room#-Room number which is assigned to respective instructor

• Marks- Scored in Course Course# by student Student #

• Grade –Obtained by student Student# in course Course #

• Student#,Course# together (called composite attribute) defines EXACTLY ONE


value of marks .This can be symbolically represented as Student#Course#
Marks
This type of dependency is called functional dependency. In above example Marks is functionally
dependent on Student#Course#.
Other Functional dependencies in above examples are:
• Course# -> CourseName

• Course#-> IName(Assuming one course is taught by one and only one instructor )

• IName -> Room# (Assuming each instructor has his /her own and non shared room)

• Marks ->Grade
Formally we can define functional dependency as: In a given relation R, X and Y are attributes.
Attribute Y is functional dependent on attribute X if each value of X determines exactly one value
of Y. This is represented as :
X->Y
However X may be composite in nature.

3. Full functional dependency: In above example Marks is fully functional dependent on


student#Course# and not on the sub set of Student#Course# .This means marks cannot be
determined either by student # or Course# alone .It can be determined by using Student#
and Course# together. Hence Marks is fully functional dependent on student#course#.
CourseName is not fully functionally dependent on student#course# because one of the subset
course# determines the course name and Student# does not having role in deciding Course name
.Hence CourseName is not fully functional dependent on student #Course#.
Student#
Marks
Course#
Formal Definition of full functional dependency : In a given relation R ,X and Y are attributes. Y
is fully functionally dependent on attribute X only if it is not functionally dependent on sub-set of
X.However X may be composite in nature.

4. Partial Dependency: In the above relationship CourseName,IName,Room# are partially


dependent on composite attribute Student#Course# because Course# alone can defines the
coursename, IName,Room#.
Room#
IName
CourseName
Course#
Student#
Formal Definition of Partial dependency: In a given relation R, X and Y are attributes .Attribute
Y is partially dependent on the attribute X only if it is dependent on subset attribute X .However
X may be composite in nature.

5. Transitive Dependency: In above example , Room# depends on IName and in turn


depends on Course# .Here Room# transitively depends on Course#.
IName
Room#
Course#
Similarly Grade depends on Marks,in turn Marks depends on Student#Course# hence Grade Fully
transitively depends on Student#Course#.

6. Key attributes : In a given relationship R ,if the attribute X uniquely defines all other
attributes ,then the attribute X is a key attribute which is nothing but the candidate key.
Ex: Student#Course# together is a composite key attribute which determines all attributes
in relationship
REPORT(student#,Course#,CourseName,IName,Room#,Marks,Grade)uniquely.Hence Student#
and Course# are key attributes.

Types of Normal Forms


1. First Normal Form(1NF)
A relation R is said to be in first normal form (1NF) if and only if all the attributes of the relation R are
atomic in nature
Student Details Course Details Result details

1001 Ram 11/09/1986 M4 Basic Maths 7 11/11/2004 89 A

1002 Shyam 12/08/1987 M4 Basic Maths 7 11/11/2004 78 B

1001 Ram 23/06/1987 H6 4 11/11/2004 87 A

1003 Sita 16/07/1985 C3 Basic Chemistry 11 11/11/2004 90 A

1004 Gita 24/09/1988 B3 8 11/11/2004 78 B

1002 Shyam 23/06/1988 P3 Basic Physics 13 11/11/2004 67 C

1005 Sunita 14/09/1987 P3 Basic Physics 13 11/11/2004 78 B

1003 Sita 23/10/1987 B4 5 11/11/2004 67 C

1005 Sunita 13/03/1990 H6 4 11/11/2004 56 D

1004 Gita 21/08/1987 M4 Basic Maths 7 11/11/2004 78 B

Table shown above Student Details ,Course Details and Result Details can be further divided. Student
Details attribute is divided into Student#(Student Number) , Student Name and date of birth. Course
Details is divided into Course# ,Course Name,Prerequisites and duration. Similarly Results attribute is
divided into DateOfexam,Marks and Grade.

Second Normal Form (2NF)


A relation is said to be in Second Normal Form if and only If :
• It is in the first normal form ,and

• No partial dependency exists between non-key attributes and key attributes.

Let us re-visit 1NF table structure.


• Student# is key attribute for Student ,

• Course# is key attribute for Course

• Student#Course# together form the composite key attributes for result relationship.

• Other attributes are non-key attributes.


To make this table 2NF compliant, we have to remove all the partial dependencies.
• StudentName and DateOfBirth depends only on student#.

• CourseName,PreRequisite and DurationInDays depends only on Course#

• DateOfExam depends only on Course#.


To remove this partial dependency we need to split Student_Course_Result table into four separate tables
,STUDENT ,COURSE,RESULT and EXAM_DATE tables as shown in figure.

STUDENT TABLE
Student # Student Name DateofBirth

1001 Ram Some value

1002 Shyam Some value

1003 Sita Some value

1004 Geeta Some value

1005 Sunita Some value

COURSE TABLE

Course# CourseName Duration of days

C3 Bio Chemistry 3

B3 Botany 8

P3 Nuclear Physics 1

M4 Applied Mathematics 4

H6 American History 5

B4 Zoology 9

RESULT TABLE

Student# Course# Marks Grade


1001 M4 89 A

1002 M4 78 B

1001 H6 87 A

1003 C3 90 A

1004 B3 78 B

1002 P3 67 C

1005 P3 78 B

1003 B4 67 C

1005 H6 56 D

1004 M4 78 B

EXAM DATE Table

Course# DateOfExam

M4 Some value

H6 Some value

C3 Some value

B3 Some value

P3 Some value

B4 Some value

• In the first table (STUDENT) ,the key attribute is Student# and all other non-key attributes,
StudentName and DateOfBirth are fully functionally dependant on the key attribute.

• In the Second Table (COURSE) , Course# is the key attribute and all the non-key attributes,
CourseName, DurationInDays are fully functional dependant on the key attribute.

• In third table (RESULT) Student#Course# together are key attributes and all other non key
attributes, Marks and Grade are fully functional dependant on the key attributes. In the
fourth Table (EXAM DATE) Course# is the key attribute and the non key attribute
,DateOfExam is fully functionally dependant on the key attribute.
At first look it appears like all our anomalies are taken away ! Now we are storing Student 1003
and M4 record only once. We can insert prospective students and courses at our will. We will
update only once if we need to change any data in STUDENT, COURSE tables. We can get rid of
any course or student details by deleting just one row.

Let us analyze the RESULT Table


Student# Course# Marks Grade

1001 M4 89 A

1002 M4 78 B

1001 H6 87 A

1003 C3 90 A

1004 B3 78 B

1002 P3 67 C

1005 P3 78 B

1003 B4 67 C

1005 H6 56 D

1004 M4 78 B

We already concluded that :

• All attributes are atomic in nature

• No partial dependency exists between the key attributes and non-key attributes

• RESULT table is in 2NF

Assume, at present, as per the university evaluation policy,

• Students who score more than or equal to 80 marks are awarded with “A” grade

• Students who score more than or equal to 70 marks up till 79 are awarded with “B” grade
• Students who score more than or equal to 60 marks up till 69 are awarded with “C” grade

• Students who score more than or equal to 50 marks up till 59 are awarded with “D” grade

The University management which is committed to improve the quality of education ,wants to
change the existing grading system to a new grading system .In the present RESULT table
structure ,

• We don‟t have an option to introduce new grades like A+ ,B- and E

• We need to do multiple updates on the existing record to bring them to new grading
definition

• We will not be able to take away “D” grade if we want to.

• 2NF does not take care of all the anomalies and inconsistencies.

3. Third Normal Form (3NF)


A relation R is said to be in 3NF if and only if

• It is in 2NF

• No transitive dependency exists between non-key attributes and key attributes. In


the above RESULT table Student# and Course# are the key attributes. All other attributes,
except grade are non-partially , non – transitively dependant on key attributes. The grade
attribute is dependant on “Marks “ and in turn “Marks” is dependent on Student#Course#. To
bring the table in 3NF we need to take off this transitive dependency.

Student# Course# Marks

1001 M4 89

1002 M4 78

1001 H6 87

1003 C3 90

1004 B3 78

1002 P3 67

1005 P3 78
1003 B4 67

1005 H6 56

1004 M4 78

UpperBound LowerBound Grade

100 95 A+

94 90 A

89 85 B+

84 80 B

79 75 B-

74 70 C

69 65 C-

After Normalizing tables to 3NF , we got rid of all the anomalies and inconsistencies. Now we can
add new grade systems, update the existing one and delete the unwanted ones.
Hence the Third Normal form is the most optimal normal form and 99% of the databases which
require efficiency in

• INSERT

• UPDATE

• DELETE
Operations are designed in this normal form.

Example 2. The following example will illustrate how database normalization performed.
Title Author Bio ISBN Subject Pages Publishe
r
MY-SQL Rabiya Database En00011 Database 500 Mega
Network administrator 1 Design 200 Mega
Quality Yonas Network En00011 110 Mega
con. administrator 2 54 Aster
Kidus Member of ISO En00011
Teamwork
member of the 3
documentation En00011
team. 4
In the example shown above, a lot of storage space will be wasted if any one criterion (author or
publisher) is considered as the identification key.

Step 1: Create first normal form (1NF)


First normal form is the basic level of database normalization.

For 1NF, ensure that the values in each column of a table are atomic; which means they are unique,
containing no sets of values. In our case, Author and Subject do not comply.

One method for bringing a table into 1NF is to separate the entities contained in the table into
separate tables. In our case this would result in Book, Author, Subject, and Publisher tables.

Book’s table:

ISBN Title Pages

En000111 MY-SQL 500

Author’s table:
Author_I First Last
D Name Name

1 Rabiay Russell

2 Kidus Yared

3 Mike Hilyer
Subject’s table:

Subject Last_name
_ID

1 Rusell

2 Yared
Publisher’s table:
Name Address City State Zip
Publisher_ID

1 Mega 4 killo Addis 14 251


Ababa

Step 2: Define relationships


Three types of relations can be established:
• One-to-one (Example: marriage)

• One-to-many (Example: kids)

• Many-to-many (Example: facebook)

The book‟s table may have many to many relations with the Author‟s table.

Author‟s table may have many books and a book may have more than one author.

The Book‟s table may have many to many relations with the Subject table.

The books may fit in many subjects and the subjects may have many books.

Many-to-many relations have to be presented by “link” tables Book


Author table:

ISBN Subject_ID

En000111 1

En000112 2
Book_Subject table:
ISBN Subject_ID

En000111 1

En000112 2
• One-to-many in our example will be Books to Publisher. Each book has only one
Publisher but one Publisher may have many books.

We can achieve „one-to-many‟ relationships with a foreign key. A foreign key is a mechanism in
database management systems (DBMS) that defines relations and creates constraints between
data segments. It is not possible to review what is not related to the specific book. It is not
possible to have a book without an author or publisher.
ISBN Title Pages Publisher_
ID

En000111 MySQL 500 1

Step 3: Make second normal form (2NF)


This table does not comply with the 2NF:

ISBN Reviewer ID Summary Reviewer_URL

En000111 3 A great
book!

Step 4: Third Normal Form (3NF)


This requires that all columns depend directly on the primary key. Tables violate the 3NF
when one column depends on another column which in turn depends on the primary key.
(A transitive dependency)

In the publisher table, the City and State are actually dependent on the zip code not the
Publisher_ID

Publisher_I Name Address City State Zip


D

1 Mega 4 killo Addis 14 251


Ababa
To comply with 3NF we have to move these outside the publisher‟s table:

Zip City State

251 Addis 14
Ababa
Through the process of database normalization we bring our schema's tables into conformance with
progressive normal forms. As a result the tables each represent a single entity (a book, an author, a
subject, etc) and we benefit from decreased redundancy, fewer anomalies and improved efficiency.
Self-check
LO2: Develop normalization

1. One is the benefit of database normalization


A. More efficient database structure
B. Better understanding of your data C.
Easier to maintain database structure.

D. Avoids redundant fields.

E. All are benefits

2. First normal form (1NF) Eliminates Repeating Groups


G. True
H. False
3. Second Normal Form Eliminates Redundant Data
A. False B.
True
4. Third Normal Form is Eliminating Columns Not Dependent
on primary Keys
A. True
B. False
5. The following is an example of one-to-one relationship
A. Marriage

B. kids

C. Face book

6. The following is an example of one-to-many relationship


A. Marriage
B. Kids
C. Face book
7. Of the following one can be a many to many relationship
A. Face book
B. Kids
C. Marriage

Answers:
1. E 2. A 3. B 4. A 5. A. 6. B. 7. A
LO3. Validate data Model using Normalization

Validating data model


Model validation essential parts of the model development process if models to be accepted and
used to support decision making. One of the very first questions that a person who is promoting a
model is likely to encounter is “has your model been validated?”

Does the model represent and correctly reproduce the behaviors of the real world system?
• Validation ensures that the model meets its intended requirements in terms of the
methods employed and the results obtained
• The ultimate goal of model validation is to make the model useful in the sense that the
model addresses the right problem, provides accurate information about the system being
modeled, and to makes the model actually used.

Business Data model describes the major information of interest from a business respective. The
business data model is independent of specific application system needs and is best developed by
the data management group with input from business subject matter experts.

Model validation by the business community can be performed in several ways. From the modeler's
perspective, the most straightforward approach is to print a copy of the diagram and walk through
it with the business representatives. This approach requires the businessperson to understand a
diagram that is IT oriented, and it is appropriate when the validation is performed by people directly
involved in IT projects. Validation activities include:

Completeness for use: Is all of the data needed to support the pertinent application development
project modeled

Completeness: Are the business areas and concepts represented by the model completely
addressed?

Accuracy: Does the model correctly represent the entities, attributes and business relationships of
interest?

Generalization and specialization: Does the model represent the data at the appropriate level of
abstraction? And Does the model follow the data modeling rules for the level of normalization
(typically third normal form) used?
Specification and generalization can be classified into disjoints and completeness. The
disjointness constraint allows us to specify whether an instance of a super type may
simultaneously be a member of two or more subtype.
Self-check
LO3: Validate model

8. During model validation we should consider one of following


A. Completeness D. Specialization
B. Accuracy E. All
C. Generalization
9. Are the business areas and concepts represented by the model completely
addressed? This characteristics of the model is
B. Accuracy D. Generalization
C. Completeness
3.-----------shows the correct representation of entities, attributes and business relationships
of the data model
A. Accuracy
B. Completeness
Answers
1. E 2. B 3. A

You might also like