Model Data Object
Model Data Object
ADMINISTRATION
LEVEL – III
This module covers the knowledge, skills and required to understand business operations, identify
entities and data, diagrammatically represent their relationships and prepare a data model
.LEARNING OUTCOMES: At the end of the module the learner will be able to:
MODULE CONTENTS:
ASSESSMENT METHODS:
• Interview/Written Test
• Demonstration/Observation with Oral Questioning ASSESSMENT CRITERIA:
An entity is an existing or real thing. The fact that something exists also seems to indicate
separateness from other existences or entities. In programming, engineering, and probably many
other contexts, the word is used to identify units, whether concrete things or abstract ideas.
1. In relation to a database , an entity is a single person, place, or thing about which data can
be stored. ex. school, student, course, deparment, employee, university.
2. In data modeling (a first step in the creation of a database), an entity is some unit of data
that can be classified and have stated relationships to other entities.
Entities: are concepts within the data model. Each entity is represented by a box within the ERD.
Entities are abstract concepts, each representing one or more instances of the concept in question.
An entity might be considered a container that holds all of the instances of a particular thing in a
system. Entities are equivalent to database tables in a relational database, with each row of the
table representing an instance of that entity.
Remember that each entity represents a container for instances of the thing in question. The
diagram below has an entity for “student” and another for “school.” This indicates that the system
being modeled may contain one or more students and one or more schools.
The entity type is the fundamental building block for describing the structure of data with the Entity
Data Model (EDM). In a conceptual model, an entity type represents the structure of toplevel
concepts, such as customers or orders. An entity type is a template for entity type instances. Each
template contains the following information:
An entity key is a property or a set of properties of an entity type that are used to determine identity.
The properties that make up an entity key are chosen at design time. The values of entity key
properties must uniquely identify an entity type instance within an entity set at run time. The
properties that make up an entity key should be chosen to guarantee uniqueness of instances in an
entity set.
The following are the requirements for a set of properties to be an entity key:
• No two entity keys within an entity set can be identical. That is, for any two entities within
an entity set, the values for all of the properties that constitute a key cannot be the same.
However, some (but not all) of the values that make up an entity key can be the same.
• An entity key must consist of a set of non-nullable, immutable, primitive type properties.
• The properties that make up an entity key for a given entity type cannot change. You cannot
allow more than one possible entity key for a given entity type; surrogate keys are not
supported.
Note: An entity is weak if it depends on another entity for part of its key
Entities are the things you're keeping track of. example in a video rental database, you keep track
of customers, invoices, titles, copies, etc. These are represented in tables where the rows are
individual instances of a customer or title. The columns are the attributes, the things that tell us
about the instance in the row. The customer's name, address, city, balance, etc. are attributes that
help identify the customer. Databases link entities/tables so that a customer, described by its
attributes.so that attributes describe an entity.
• Relationships are represented by lines between entities. Relationship lines indicate that
each instance of an entity may have a relationship with instances of the connected entity,
and vice versa.
The diagram above now indicates that students may have some relationship with schools. More specifically,
there may be a relationship between a particular student (an instance of the student entity) and a particular
school (an instance of the school entity).
If necessary, a relationship line may be labeled to define the relationship. In this case, one can infer that a
student may attend a school, or that a school may enroll students.
Entities and relationships can both have attributes. Examples: an employee entity might have a
Social Security Number (SSN) attribute; the proved relationship may have a date attribute.
Two related entities
A relationship is how the data is shared between entities. There are three types of relationships
between entities:
• one-to-one: one instance of an entity (A) is associated with one other instance of another
entity (B). For example, in a database of employees, each employee name (A) is
associated with only one social security number (B).
• one-to-many: one instance of an entity (A) is associated with zero, one or many instances
of another entity (B), but for one instance of entity B there is only one instance of entity
A. For example, for a company with all employees working in one building, the building
name (A) is associated with many different employees (B), but those employees all share
the same singular association with entity A.
• many-to-many: one instance of an entity (A) is associated with one, zero or many
instances of another entity (B), and one instance of entity B is associated with one, zero or
many instances of entity A. For example, for a company in which all of its employees
work on multiple projects, each instance of an employee (A) is associated with many
instances of a project (B), and at the same time, each instance of a project (B) has multiple
employees (A) associated with it.
A many-to-many relationship lets users relate one or more entity instances from another entity to
an entity instance of the current entity. A many-to-many relationship is reciprocal. Therefore, entity
instances can be related from either entity. A many-to-many relationship may also be
selfreferential. This means that one or more other entity instances of the current entity can be related
to an entity instance of the same entity.
Primary key
The primary key is a fundamental concept in relational database design. It's an easy concept: each
record should have something that identifies it uniquely. The primary key can be a single field, or
a combination of fields. A table's primary key also serves as the basis of relationships with other
tables
Functional Dependency
Closely tied to the notion of a key is a special normalization concept called functional dependence
or functional dependency . The second and third normal forms verify that your functional
dependencies are correct. So what is a "functional dependency"? It describes how one field (or
combination of fields) determines another field. Consider an example:
[ZIP Code]
ZIP Code
City
County
State Abbreviation
State Name
ZIP Code is a unique 5-digit key. What makes it a key? It is a key because it determines the other
fields. For each ZIP Code there is a single city, county, and state abbreviation. These fields are
functionally dependent on the ZIP Code field. In other words, they belong with this key. Look at
the last two fields, State Abbreviation and State Name. State Abbreviation determines State Name,
in other words, State Name is functionally dependent on State Abbreviation. State Abbreviation is
acting like a key for the State Name field
ER-Diagram
Data modeling is the formalization and documentation of existing processes and events that occur
during application software design and development. Data modeling techniques and tools capture
and translate complex system designs into easily understood representations of the data flows and
processes, creating a blueprint for construction and/or re-engineering.
A data model can be thought of as a diagram or flowchart that illustrates the relationships between
data. Although capturing all the possible relationships in a data model can be very timeintensive,
it's an important step and shouldn't be rushed. Well-documented models allow stakeholders to
identify errors and make changes before any programming code has been written.
An entity – Relationship model (ER model for short) is an abstract way to describe a database. It
usually starts with a relational database, which stores data in tables.
There are three levels of ER models that may be developed. The conceptual data model is the
highest level ER model in that it contains the least granular detail but establishes the overall scope
of what is to be included within the model set. The conceptual ER model normally defines master
reference data entities that are commonly used by the organization. Developing an enterprise-wide
conceptual ER model is useful to support documenting the data architecture for an organization.
A logical ER model does not require a conceptual ER model especially if the scope of the logical
ER model is to develop a single disparate information system. The logical ER model contains more
detail than the conceptual ER model. In addition to master data entities, operational and
transactional data entities are now defined. The details of each data entity are developed and the
entity relationships between these data entities are established.
The logical ER model is however developed independent of technology into which it will be
implemented.
One or more physical ER models may be developed from each logical ER model. The physical ER
model is normally developed be instantiated as a database. Therefore, each physical ER model must
contain enough detail to produce a database and each physical ER model is technology dependent
since each database management system is somewhat different.
The physical model is normally forward engineered to instantiate the structural metadata into a
database management system as relational database objects such as database tables, database
indexes such as unique key indexes, and database constraints such as a foreign key constraint or a
commonality constraint. The ER model is also normally used to design modifications to the
relational database objects and to maintain the structural metadata of the database.
In the diagram, the elements inside rectangles are called entities while the items inside diamonds
denote the relationships between entities.
ER Diagrams Usage
ER While able to describe just about any system, ER diagrams are most often associated with
complex databases that are used in software engineering and IT networks. In particular, ER
diagrams are frequently used during the design stage of a development process in order to identify
different system elements and their relationships with each other. For example, an inventory
software used in a retail shop will have a database that monitors elements such as purchases, item,
item type, item source and item price. Rendering this information through an ER diagram would
be something like this:
In the diagram, the information inside the oval shapes are attributes of a particular entity.
There are three basic elements in an ER Diagram: entity, attribute, relationship. There are more
elements which are based on the main elements. They are weak entity, multivalued attribute,
derived attribute, weak relationship and recursive relationship. Cardinality and ordinality are two
other notations used in ER diagrams to further define relationships.
Entity
An entity can be a person, place, event, or object that is relevant to a given system. For example, a
school system may include students, teachers, major courses, subjects, fees, and other items.
Entities are represented in ER diagrams by a rectangle and named using singular nouns.
Weak Entity
A weak entity is an entity that depends on the existence of another entity. In more technical terms
it can defined as an entity that cannot be identified by its own attributes. It uses a foreign key
combined with its attributed to form the primary key. An entity like order item is a good example
for this. The order item will be meaningless without an order so it depends on the existence of
order.
Attribute
An attribute is a property, trait, or characteristic of an entity, relationship, or another attribute. For
example, the attribute Inventory Item Name is an attribute of the entity Inventory Item. An entity
can have as many attributes as necessary. Meanwhile, attributes can also have their own specific
attributes. For example, the attribute “customer address” can have the attributes number, street,
city, and state. These are called composite attributes. Note that some top level ER diagrams do not
show attributes for the sake of simplicity. In those that do, however, attributes are represented by
oval shapes.
Attributes in ER diagrams, note that an attribute can have its own attributes ( composite attribute
)
Multivalued Attribute
If an attribute can have more than one value it is called an multivalued attribute. It is important to
note that this is different to an attribute having its own attributes. For example a teacher entity can
have multiple subject values.
Example of a multivalued attribute
Derived Attribute
An attribute based on another attribute. This is found rarely in ER diagrams. For example for a
circle the area can be derived from the radius.
Relationship
A relationship describes how entities interact. For example, the entity “carpenter” may be related
to the entity “table” by the relationship “builds” or “makes”. Relationships are represented by
diamond shapes and are labeled using verbs.
Recursive Relationship
If the same entity participates more than once in a relationship it is known as a recursive
relationship. In the below example an employee can be a supervisor and be supervised, so there is
a recursive relationship.
Example of a recursive relationship in ER diagrams
Explanation
• A Course has meaning only in the context of a Program, so it‟s a weak entity, with course_id
as a weak key. This means that a Course is uniquely identified using its course_id and the
program_id of its owning program.
• Student and Course are related through the many-to-many Attempts relationships; a course
can exist without a student, and a student can be enrolled without attempting any courses,
so the participation is not total.
• When a student attempts a course, there are attributes to capture the Year and Semester, and
the Mark and Grade.
• Define which functions within these organizations will utilize the database
• Identify which existing and planned applications will be converted to the database system
• Establish regular meetings and periodic management reporting for design team
• Train DBA in DMCL (Data Manipulation Control Language) and DDL (Data Definition
Language)
Self-check
LO1: Model Data object
1. Which is an entity
A. Borrower D. Borrower-Address
B. Borrower-ID E. Title
C. Borrower-Name F. Loan-Date
2. Which is an attribute
A. Book D. Student
B. Publisher E. Course
C. Order F. SSN(social security Number)
3. Of the following the one represents an entity diagrammatically
A.
B.
C. ___________
D.
B.
C.
Answer:
1. A 2. F. 3. A. 4. C. 5. B
• All entries in any column must be of the same kind. For example, in the column labeled
"Customer," only customer names or numbers are permitted.
Steps
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique
column or set of columns (the primary key).
Second normal form (2NF). Second Normal Form: Eliminating Redundant Data
• Second normal form (2NF) requires that all non-key columns are fully dependent on the
entire primary key. If the table has only a single-column primary key, this requirement is
easily met.
At this level of normalization, each column in a table that is not a determiner of the contents of
another column must itself be a function of the other columns in the table. For example, in a table
with three columns containing customer ID, product sold, and price of the product when sold, the
price would be a function of the customer ID (entitled to a discount) and the specific product.
steps
• Meet all the requirements of the first normal form.
• Remove subsets of data that apply to multiple rows of a table and place them in separate
tables.
• Create relationships between these new tables and their predecessors through the use of
foreign keys
Third normal form (3NF) Third Normal Form: Eliminating Columns Not
Dependent on Keys
Third normal form (3NF) requires that there are no transitive dependencies, where one column
depends on another column which depends on the primary key.
At the second normal form, modifications are still possible because a change to one row in a table
may affect data that refers to this information from another table. For example, using the customer
table just cited, removing a row describing a customer purchase (because of a return perhaps) will
also remove the fact that the product has a certain price. In the third normal form, these tables would
be divided into two tables so that product pricing would be tracked separately.
Steps
• Meet all the requirements of the second normal form.
• Remove columns that are not dependent upon the primary key.
* Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. A row is in Boyce Codd
normal form if and only if every determinant is a candidate key. Most entities in 3NF are already
in BCNF.
* An entity is in Fourth Normal Form (4NF) if and only if it is in 3NF and has no multiple
sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a
single one-to-many relationship within an entity if the one-to-many attributes are independent of
each other.
* Fifth Normal Form (5NF) specifies that every join dependency for the entity must be a
consequence of its candidate keys.
A row is in first normal form if and only if all underlying domains contain atomic values only. 1NF
eliminates repeating groups by putting each into a separate table and connecting them with a one-
to-many relationship. A row is in second normal form if and only if it is in first normal form and
every non-key attribute is fully dependent on the key. 2NF eliminates functional dependencies on
a partial key by putting the fields in a separate table from those that are dependent on the whole
key. A row is in third normal form if and only if it is in second normal form and every non-key
attribute is non-transitively dependent on the primary key. 3NF eliminates functional dependencies
on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent
on the key, the whole key and nothing but the key.
Example1 the following shows and answer what is the need of normalization?
What are the problems we can face if we proceed without normalization? And
What are the advantages of normalization?
Asking question to oneself is the best way to get familiar with all the concepts of normalization.
In first look the above table is looking so arranged and well in format but if we try to find out what exactly
this table is saying to us, we can easily figure out the various anomalies (irregularities) in this table.
1. Insert Anomaly: We cannot insert prospective course which does not have any registered student
or we cannot insert student details that is yet to register for any course.
2. Update Anomaly: if we want to update the course M4‟s name we need to do this operation three
times. Similarly we may have to update student 1003‟s name twice if it changes.
3. Delete Anomaly: if we want to delete a course M4 , in addition to M4 occurs details , other critical
details of student also will be deleted. This kind of deletion is harmful to business. Moreover, M4
appears thrice in above table and needs to be deleted thrice.
4. Duplicate Data: Course M4‟s data is stored three and student 1002‟s data stored twice .This
redundancy will increase as the number of course offerings increases.
Process of normalization:
Before getting to know the normalization techniques in detail, let us define a few building blocks which
are used to define normal form.
1. Determinant : Attribute X can be defined as determinant if it uniquely defines the value
Y in a given relationship or entity .To qualify as determinant attribute need NOT be a key
attribute .Usually dependency of attribute is represented as X->Y ,which means attribute
X decides attribute Y.
Example: In RESULT relation, Marks attribute may decide the grade attribute .This is represented
as Marks->grade and read as Marks decides Grade.
Marks -> Grade
In the result relation, Marks attribute is not a key attribute .Hence it can be concluded that key
attributes are determinants but not all the determinants are key attributes.
2. Functional Dependency: Functional dependency has definition but let‟s not care about
that. Let‟s try to understand the concept by example. Consider the following relation :
REPORT(Student#, Course#, CourseName, IName, Room#, Marks, Grade) Where:
• Student#-Student Number
• Course#-Course Number
• CourseName -CourseName
• Course#-> IName(Assuming one course is taught by one and only one instructor )
• IName -> Room# (Assuming each instructor has his /her own and non shared room)
• Marks ->Grade
Formally we can define functional dependency as: In a given relation R, X and Y are attributes.
Attribute Y is functional dependent on attribute X if each value of X determines exactly one value
of Y. This is represented as :
X->Y
However X may be composite in nature.
6. Key attributes : In a given relationship R ,if the attribute X uniquely defines all other
attributes ,then the attribute X is a key attribute which is nothing but the candidate key.
Ex: Student#Course# together is a composite key attribute which determines all attributes
in relationship
REPORT(student#,Course#,CourseName,IName,Room#,Marks,Grade)uniquely.Hence Student#
and Course# are key attributes.
Table shown above Student Details ,Course Details and Result Details can be further divided. Student
Details attribute is divided into Student#(Student Number) , Student Name and date of birth. Course
Details is divided into Course# ,Course Name,Prerequisites and duration. Similarly Results attribute is
divided into DateOfexam,Marks and Grade.
• Student#Course# together form the composite key attributes for result relationship.
STUDENT TABLE
Student # Student Name DateofBirth
COURSE TABLE
C3 Bio Chemistry 3
B3 Botany 8
P3 Nuclear Physics 1
M4 Applied Mathematics 4
H6 American History 5
B4 Zoology 9
RESULT TABLE
1002 M4 78 B
1001 H6 87 A
1003 C3 90 A
1004 B3 78 B
1002 P3 67 C
1005 P3 78 B
1003 B4 67 C
1005 H6 56 D
1004 M4 78 B
Course# DateOfExam
M4 Some value
H6 Some value
C3 Some value
B3 Some value
P3 Some value
B4 Some value
• In the first table (STUDENT) ,the key attribute is Student# and all other non-key attributes,
StudentName and DateOfBirth are fully functionally dependant on the key attribute.
• In the Second Table (COURSE) , Course# is the key attribute and all the non-key attributes,
CourseName, DurationInDays are fully functional dependant on the key attribute.
• In third table (RESULT) Student#Course# together are key attributes and all other non key
attributes, Marks and Grade are fully functional dependant on the key attributes. In the
fourth Table (EXAM DATE) Course# is the key attribute and the non key attribute
,DateOfExam is fully functionally dependant on the key attribute.
At first look it appears like all our anomalies are taken away ! Now we are storing Student 1003
and M4 record only once. We can insert prospective students and courses at our will. We will
update only once if we need to change any data in STUDENT, COURSE tables. We can get rid of
any course or student details by deleting just one row.
1001 M4 89 A
1002 M4 78 B
1001 H6 87 A
1003 C3 90 A
1004 B3 78 B
1002 P3 67 C
1005 P3 78 B
1003 B4 67 C
1005 H6 56 D
1004 M4 78 B
• No partial dependency exists between the key attributes and non-key attributes
• Students who score more than or equal to 80 marks are awarded with “A” grade
• Students who score more than or equal to 70 marks up till 79 are awarded with “B” grade
• Students who score more than or equal to 60 marks up till 69 are awarded with “C” grade
• Students who score more than or equal to 50 marks up till 59 are awarded with “D” grade
The University management which is committed to improve the quality of education ,wants to
change the existing grading system to a new grading system .In the present RESULT table
structure ,
• We need to do multiple updates on the existing record to bring them to new grading
definition
• 2NF does not take care of all the anomalies and inconsistencies.
• It is in 2NF
1001 M4 89
1002 M4 78
1001 H6 87
1003 C3 90
1004 B3 78
1002 P3 67
1005 P3 78
1003 B4 67
1005 H6 56
1004 M4 78
100 95 A+
94 90 A
89 85 B+
84 80 B
79 75 B-
74 70 C
69 65 C-
After Normalizing tables to 3NF , we got rid of all the anomalies and inconsistencies. Now we can
add new grade systems, update the existing one and delete the unwanted ones.
Hence the Third Normal form is the most optimal normal form and 99% of the databases which
require efficiency in
• INSERT
• UPDATE
• DELETE
Operations are designed in this normal form.
Example 2. The following example will illustrate how database normalization performed.
Title Author Bio ISBN Subject Pages Publishe
r
MY-SQL Rabiya Database En00011 Database 500 Mega
Network administrator 1 Design 200 Mega
Quality Yonas Network En00011 110 Mega
con. administrator 2 54 Aster
Kidus Member of ISO En00011
Teamwork
member of the 3
documentation En00011
team. 4
In the example shown above, a lot of storage space will be wasted if any one criterion (author or
publisher) is considered as the identification key.
For 1NF, ensure that the values in each column of a table are atomic; which means they are unique,
containing no sets of values. In our case, Author and Subject do not comply.
One method for bringing a table into 1NF is to separate the entities contained in the table into
separate tables. In our case this would result in Book, Author, Subject, and Publisher tables.
Book’s table:
Author’s table:
Author_I First Last
D Name Name
1 Rabiay Russell
2 Kidus Yared
3 Mike Hilyer
Subject’s table:
Subject Last_name
_ID
1 Rusell
2 Yared
Publisher’s table:
Name Address City State Zip
Publisher_ID
The book‟s table may have many to many relations with the Author‟s table.
Author‟s table may have many books and a book may have more than one author.
The Book‟s table may have many to many relations with the Subject table.
The books may fit in many subjects and the subjects may have many books.
ISBN Subject_ID
En000111 1
En000112 2
Book_Subject table:
ISBN Subject_ID
En000111 1
En000112 2
• One-to-many in our example will be Books to Publisher. Each book has only one
Publisher but one Publisher may have many books.
We can achieve „one-to-many‟ relationships with a foreign key. A foreign key is a mechanism in
database management systems (DBMS) that defines relations and creates constraints between
data segments. It is not possible to review what is not related to the specific book. It is not
possible to have a book without an author or publisher.
ISBN Title Pages Publisher_
ID
En000111 3 A great
book!
In the publisher table, the City and State are actually dependent on the zip code not the
Publisher_ID
251 Addis 14
Ababa
Through the process of database normalization we bring our schema's tables into conformance with
progressive normal forms. As a result the tables each represent a single entity (a book, an author, a
subject, etc) and we benefit from decreased redundancy, fewer anomalies and improved efficiency.
Self-check
LO2: Develop normalization
B. kids
C. Face book
Answers:
1. E 2. A 3. B 4. A 5. A. 6. B. 7. A
LO3. Validate data Model using Normalization
Does the model represent and correctly reproduce the behaviors of the real world system?
• Validation ensures that the model meets its intended requirements in terms of the
methods employed and the results obtained
• The ultimate goal of model validation is to make the model useful in the sense that the
model addresses the right problem, provides accurate information about the system being
modeled, and to makes the model actually used.
Business Data model describes the major information of interest from a business respective. The
business data model is independent of specific application system needs and is best developed by
the data management group with input from business subject matter experts.
Model validation by the business community can be performed in several ways. From the modeler's
perspective, the most straightforward approach is to print a copy of the diagram and walk through
it with the business representatives. This approach requires the businessperson to understand a
diagram that is IT oriented, and it is appropriate when the validation is performed by people directly
involved in IT projects. Validation activities include:
Completeness for use: Is all of the data needed to support the pertinent application development
project modeled
Completeness: Are the business areas and concepts represented by the model completely
addressed?
Accuracy: Does the model correctly represent the entities, attributes and business relationships of
interest?
Generalization and specialization: Does the model represent the data at the appropriate level of
abstraction? And Does the model follow the data modeling rules for the level of normalization
(typically third normal form) used?
Specification and generalization can be classified into disjoints and completeness. The
disjointness constraint allows us to specify whether an instance of a super type may
simultaneously be a member of two or more subtype.
Self-check
LO3: Validate model