0% found this document useful (0 votes)
10 views

Chapter 7 - Database Design

The document outlines the steps of database design, including requirements analysis, conceptual design, logical design, and physical design. It emphasizes the importance of normalization to reduce data redundancy and avoid anomalies such as insertion, deletion, and modification issues. Additionally, it discusses functional dependencies and the various normal forms to ensure a well-structured relational database.

Uploaded by

kefitagedeno791
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views

Chapter 7 - Database Design

The document outlines the steps of database design, including requirements analysis, conceptual design, logical design, and physical design. It emphasizes the importance of normalization to reduce data redundancy and avoid anomalies such as insertion, deletion, and modification issues. Additionally, it discusses functional dependencies and the various normal forms to ensure a well-structured relational database.

Uploaded by

kefitagedeno791
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 52

Database Design

1
Outlines
 Steps of Database Design

 Convert ER to relations

 Normalization

 Physical database design

2
Database Design
 Database design is the process of coming up with different kinds of
specification for the data to be stored in the database.
 Describe how data is stored in the computer system.
 Defining its structure, characteristics and contents of data.
Database Design Process
Step 1:- Requirements collection and Analysis
 Prospective users are interviewed to collect information.
 This step result in a concise set of users requirement.
 The functional requirement should be specified as well as data
requirements.
 Functional requirement can be documented using diagrams such as
sequence diagrams, DFD scenarios.

3
Database Design … Cont’d
Step 2:- Conceptual Design
 Create conceptual schema.
 Conceptual schema:- concise description of data requirement of the user,
and include a detailed description of the entity types, relationships,
constraint.
 End user must understand it.
Step 3:- Database implementation (Logical Design)
 Use one of DBMS for implementation.
 The conceptual schema is transformed from the high level data model
into implementation model.
Step 4:- Physical Design
 International storage structure, indexes, access paths and file
organizations are specified.
 Application programs are designed and implemented.
4
Database Design … Cont’d
 Generally the design part has divided into three sub phases
 Conceptual Design
 Logical Design
 Physical Design
Design strategies:
1. Top Down
 Start with high level abstraction and refine it.
 High level entities
 Add sub classes
 Attributes
2. Bottom up
 Start with basic abstraction and then combine them.
 Attributes
 Group in to entities
5
Database Design … Cont’d
3. Inside out
 Special case of bottom up.
 Focus on central set of concepts and work out wards.
 No burned on initial designer.
4. Mixed
 Start with top down then use inside out or bottom up.
 Divide and conquer.

6
Conceptual Design
 Is the process of constructing a model of the information used in an
enterprise, independent of any physical consideration.
 Is the source of information for logical design.
 Is high level and understand by non technical user.
 Conceptual model of enterprise, independent of implementation detail
such as target DBMS, application programs, programming language,
hardware platform, performance issues etc.
Tasks to be performed:-
 Identity entity types and relationships.
 Associate attributes with entities.
 Determine attribute domains.
 Determine unique identifier (Key) attributes.
 Use entity relationship model (ER).
7
Conceptual Design … Cont’d
Why conceptual model:
 Independent of DBMS.
 Allow easy common b/n user and developer.
 Is permanent description of the database requirements.
Database requirements
 We must convert written database requirement in to an E-R diagram.
 Need to determine the entities, attributes and relationships.
 Nouns = entities
 Adjectives = attributes
 Verbs = relationships

8
Logical Design
 Is the process of constructing model of data used in an organization.

 Constructing a model based on a specific data model. (E.g. relation, oo)

 Independent of a particular DBMS and other physical consideration.

 Conceptual schema – Logical schema.

 ER Diagram converts to relations.

9
Logical Design … Cont’d
Converting ER Diagram to Relations
 Three basic rules to convert ER into tables.
 For a relation with one to one cardinality
 All the attributes are merged into a single table.
 i.e. primary key or candidate key of one relation is foreign key for the
other.
 For a relation with one to many cardinality
 Post the primary key or candidate key for the “one” side as a foreign
key attribute to the “many side”.
 For a relationship with many to many
 Create a new table (which is the associative entity) and post primary
key or candidate key from each entity as attributes in the new table
along with some additional attributed (if applicable)
10
Logical Design … Cont’d

11
Logical Design … Cont’d

12
Logical Design … Cont’d

13
Logical Design … Cont’d
Mapping Regular Entities to relation
 Simple attributes: ER Attributes map directly on to the relation.
 Composite attribute: Use only their simple, component attributes
 Multi-Valued Attribute: Becomes a separate relation with a foreign
key taken from the super entity.

14
Logical Design … Cont’d

15
Normalization
 A relational database is merely a collection of data, organized in a
particular manner. The father of the relational database approach, Codd
created a series of rules called normal forms that help define that
organization.
 One of the best ways to determine what information should be stored in a
database is to clarify what questions will be asked of it and what data
would be included in the answers.
 Database normalization is a series of steps followed to obtain a database
design that allows for consistent storage and efficient access of data in a
relational database. These steps reduce data redundancy and the risk of
data becoming inconsistent.
 NORMALIZATION is the process of identifying the logical associations
between data items and designing a database that will represent such
associations but without suffering the update anomalies which are;
Insertion, Deletion and Modification Anomalies
16
Normalization … Cont’d
 Normalization may reduce system performance since data will be cross
referenced from many tables.
 Thus DE normalization is sometimes used to improve performance, at
the cost of reduced consistency guarantees.
 Normalization normally is considered as good if it is lossless
decomposition.
 Mnemonic for remembering the rationale for normalization could be
the following:
 No Repeating or Redundancy: no repeating fields in the table
 The Fields Depend Upon the Key: the table should solely depend on the key
 The Whole Key: no partial key dependency
 And Nothing But The Key: no inter data dependency

17
Normalization … Cont’d
 All the normalization rules will eventually remove the update
anomalies that may exist during data manipulation after the
implementation.
 Pitfalls of Normalization
 Requires data to see the problems
 May reduce performance of the system
 Is time consuming,
 Difficult to design and apply and
 Prone to human error

18
Normalization … Cont’d
 The underlying ideas in normalization are simple enough. Through
normalization we want to design for our relational database a set of
tables that;
1. Contain all the data necessary for the purposes that the
database is to serve,
2. Have as little redundancy as possible,
3. Accommodate multiple values for types of data that require
them,
4. Permit efficient updates of the data in the database, and
5. Avoid the danger of losing data unknowingly
 The type of problems that could occur in insufficiently normalized
table is called update anomalies which includes;

19
Normalization … Cont’d
1. Insertion anomalies
 An "insertion anomaly" is a failure to place information about a new database
entry into all the places in the database where information about that new
entry needs to be stored.
 In a properly normalized database, information about a new entry needs to be
inserted into only one place in the database.
 In an inadequately normalized database, information about a new entry may
need to be inserted into more than one place and, human fallibility being what
it is, some of the needed additional insertions may be missed.
2. Deletion anomalies
 A "deletion anomaly" is a failure to remove information about an existing
database entry when it is time to remove that entry.
 In a properly normalized database, information about an old, to-be-gotten-rid-
of entry needs to be deleted from only one place in the database.

20
Normalization … Cont’d
 In an inadequately normalized database, information about that old entry may
need to be deleted from more than one place, and, human fallibility being what
it is, some of the needed additional deletions may be missed.
3. Modification anomalies
 A modification of a database involves changing some value of the attribute of a
table. In a properly normalized database table, what ever information is
modified by the user, the change will be effected and used accordingly.
 The purpose of normalization is to reduce the chances for anomalies to occur in a
database.

21
Normalization … Cont’d
 Deletion Anomalies: If employee with ID 16 is deleted then ever
information about skill C++ and the type of skill is deleted from the
database. Then we will not have any information about C++ and its skill
type.
 Insertion Anomalies: What if we have a new employee with a skill called
Pascal? We can not decide weather Pascal is allowed as a value for skill
and we have no clue about the type of skill that Pascal should be
categorized as.
 Modification Anomalies: What if the address for Helico is changed fro
Piazza to Mexico? We need to look for every occurrence of Helico and
change the value of School_Add from Piazza to Mexico, which is prone to
error.
 Database-management system can work only with the information that we
put explicitly into its tables for a given database and into its rules for
working with those tables, where such rules are appropriate and possible.
22
Functional Dependency (FD)
 Before moving to steps of normalization, it is important to have an
understanding of "functional dependency."
Data Dependency
 The logical association between data items that point the database
designer in the direction of a good database design are referred to as
determinant or dependent relationships.
 Two data items A and B are said to be in a determinant or dependent
relationship if certain values of data item B always appears with
certain values of data item A.
 If the data item A is the determinant data item and B the dependent
data item then the direction of the association is from A to B and not
vice versa.

23
Functional Dependency (FD) … Cont’d
 The essence of this idea is that if the existence of something, call it A, implies
that B must exist and have a certain value, then we say that "B is functionally
dependent on A."
 We also often express this idea by saying that "A determines B," or that "B is
a function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the
statement, "If A, then B.“
 It is important to note that the value B must be unique for a given value of A,
i.e., any given value of A must imply just one and only one value of B, in
order for the relationship to qualify for the name "function." (However, this
does not necessarily prevent different values of A from implying the same
value of B.)
 X Y holds if whenever two tuples have the same value for X, they must
have the same value for Y

24
Functional Dependency (FD) … Cont’d
 The notation is: A B which is read as; B is functionally dependent on A
 In general, a functional dependency is a relationship among attributes. In
relational databases, we can have a determinant that governs one other
attribute or several other attributes.
 FDs are derived from the real-world constraints on the attributes

 Since the type of Wine served depends on the type of Dinner, we say
Wine is functionally dependent on Dinner. Dinner Wine
Since both Wine type and Fork type are determined by
the Dinner type, we say Wine is functionally dependent
on Dinner and Fork is functionally dependent on Dinner.
Dinner Wine
Dinner Fork
25
Partial Dependency
 If an attribute which is not a member of the primary key is dependent
on some part of the primary key (if we have composite primary key)
then that attribute is partially functionally dependent on the primary
key.
 Let {A,B} is the Primary Key and C is no key attribute.
Then if {A,B} C and B C
Then C is partially functionally dependent on {A,B}

26
Full Dependency
 If an attribute which is not a member of the primary key is not
dependent on some part of the primary key but the whole key (if we
have composite primary key) then that attribute is fully functionally
dependent on the primary key.

 Let {A,B} is the Primary Key and C is no key attribute

27
Transitive Dependency
 In mathematics and logic, a transitive relationship is a relationship of
the following form: "If A implies B, and if also B implies C, then A
implies C."

28
Steps of Normalization
 We have various levels or steps in normalization called Normal Forms.
 The level of complexity, strength of the rule and decomposition increases
as we move from one lower level Normal Form to the higher.
 A table in a relational database is said to be in a certain normal form if it
satisfies certain constraints.
 Normal form below represents a stronger condition than the previous one.
 Normalization towards a logical design consists of the following steps:
 Un Normalized Form: Identify all data elements
 First Normal Form: Find the key with which you can find all data
 Second Normal Form: Remove part-key dependencies. Make all data
dependent on the whole key.
 Third Normal Form: Remove non-key dependencies. Make all data
dependent on nothing but the key.
 For most practical purposes, databases are considered normalized if they
29 adhere to third normal form.
UNNORMALIZED FORM (UNF)
 A table that contains one or more repeating groups.
 A repeating group is a field or group of fields that hold multiple values
for a single occurrence of a field.

Repeating group= (Skill, SkillType, School, SchoolAdd, SkillLevel)

30
First Normal Form (1NF)
 Requires that all column values in a table are atomic (e.g., a number is
an atomic value, while a list or a set is not).
 We have two ways of achieving this:
 1. Putting each repeating group into a separate table and
connecting them with a primary key-foreign key relationship
 2. Moving this repeating groups to a new row by repeating the
common attributes. If so then Find the key with which you can find
all data
 Definition of a table (relation) in 1NF if:
 There are no duplicated rows in the table. Unique identifier
 Each cell is single-valued (i.e., there are no repeating groups).
 Entries in a column (attribute, field) are of the same kind.

31
First Normal Form (1NF) … Cont’d
 FIRST NORMAL FORM (1NF): Remove all repeating groups.
 Distribute the multi-valued attributes into different rows and identify
a unique identifier for the relation so that is can be said is a relation in
relational database.

32
First Normal Form (1NF) … Cont’d
 Example 2: Consider the following UNF relation.

Here , Tele and fax fields are multi-valued


- To change in to 1NF relation, we need to split the table in to three
- The following tables are equivalent 1st Normal form of the above employee
table:

33
Second Normal form 2NF
 No partial dependency of a non key attribute on part of the primary
key.
 Any table that is in 1NF and has a single-attribute (i.e., a non-
composite) key is automatically also in 2NF.
 Definition of a table (relation) in 2NF
 It is in 1NF and
 If all non-key attributes are dependent on all of the key. i.e. no
partial dependency.
 Since a partial dependency occurs when a non-key attribute is
dependent on only a part of the (composite) key, the definition of
2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and
if it has no partial dependencies."

34
Second Normal form 2NF … Cont’d
 Example for 2NF:

 This schema is in its 1NF since we don’t have any repeating groups or
attributes with multi-valued property. To convert it to a 2NF we need to
remove all partial dependencies of non key attributes on part of the
primary key.
 {EmpID, ProjNo} EmpName, ProjName, ProjLoc, ProjFund,
ProjMangID
 But in addition to this we have the following dependencies
EmpID EmpName
ProjNo ProjName, ProjLoc, ProjFund, ProjMangID

35
Second Normal form 2NF … Cont’d
 As we can see some non key attributes are partially dependent on
some part of the primary key. Thus these collections of attributes
should be moved to a new relation.

36
Second Normal form 2NF … Cont’d
• Example 2: Normalize the following relation.

• The primary key for this table is the composite key (PatientId,
RelativeId).

37
Second Normal form 2NF … Cont’d
 So, to determine if it satisfies 2NF, you have to find out if all other
fields in it depend fully on both PatientId and RelativeId; that is,you
need to decide whether the following conditions are true:
 (PatientId, RelativeId) Relationship and
 (PatientId, RelativeId) Patient_tel.
 However, on the dependencies in the patient table, only the following
are true:
 (PatientId, RelativeId) Relationship and
 (PatientId) Patient_tel.

Therefore; based on the above dependency the normalized relation will be


divided into to tables.

38
Second Normal form 2NF … Cont’d

39
Third Normal Form (3NF )
 Eliminate Columns Not Dependent On Key - If attributes do not contribute to
a description of the key, remove them to a separate table.
 This level avoids update and delete anomalies.
 Definition of a Table (Relation) in 3NF
 It is in 2NF and
 There are no transitive dependencies between attributes.
 Example for (3NF): Assumption: Students of same batch (same year) live in
one building or dormitory

• This schema is in its 2NF since the primary key is a single attribute.
40
Third Normal Form (3NF ) … Cont’d

41
Third Normal Form (3NF ) … Cont’d
 Consider the following example:

Now, PK = empid
 We have functional dependencies:
 Empid → depid
 Depid → depname
 Or Depid → depbudjet
 Therefore, the above table is not is 3NF. To normalize it, we can use the
functional dependencies:
 Depid → depname
 Depid → depbudjet And
 Empid → depid
42
Third Normal Form (3NF ) … Cont’d
 So that the resulting tables are the following:

43
Other Normal Forms
 Boyce-Codd Normal Form (BCNF): Isolate Independent Multiple
Relationships - No table may contain two or more 1:n or N:M
relationships that are not directly related. The correct solution, to cause
the model to be in 4th normal form, is to ensure that all M:M
relationships are resolved independently if they are indeed independent.
 Def.: A table is in BCNF if it is in 3NF and if every determinant is a
candidate key.
 Forth Normal form (4NF): Isolate Semantically Related Multiple
Relationships - There may be practical constrains on information that
justify separating logically related many-to-many relationships.
 Def.: A table is in 4NF if it is in BCNF and if it has no multi-valued
dependencies.

44
Other Normal Forms … Cont’d
 Fifth Normal Form (5NF): A model limited to only simple
(elemental) facts.
 Def.: A table is in 5NF, also called "Projection-Join Normal
Form" (PJNF), if it is in 4NF and if every join dependency in the
table is a consequence of the candidate keys of the table.
 Domain-Key Normal Form (DKNF): A model free from all
modification anomalies.
 Def.: A table is in DKNF if every constraint on the table is a
logical consequence of the definition of keys and domains.

45
Physical Database Design … Cont’d
 The Logical database design is concerned with the what;
 The Physical database design is concerned with the how.
 Physical database design is the process of producing a description of the
implementation of the database on secondary storage.
 It describes the base relations, file organization, and indexes used to
achieve effective access to the data along with any associated integrity
constraints and security measures.
 Sources of information for the physical design process include global
logical data model and documentation that describes model.
 Knowledge of the DBMS that is selected to host the database systems,
with all its functionalities, is required since functionalities of current
DBMS vary widely.

46
Steps in physical database design
1. Translate logical data model for target DBMS
 To determine the file organizations and access methods that will be
used to store the base relations; i.e. the way in which relations and
tuples will be held on secondary storage
 Design enterprise constraints for target DBMS
 This phase is the translation of the global logical data model to produce
a relational database schema in the target DBMS. This includes creating
the data dictionary based on the logical model and information
gathered.
 After the creation of the data dictionary, the next activity is to
understand the functionality of the target DBMS so that all necessary
requirements are fulfilled for the database intended to be developed.

47
Steps in physical database design … Cont’d
 Knowledge of the DBMS includes:
 how to create base relations
 whether the system supports:
 definition of Primary key
 definition of Foreign key
 definition of Alternate key
 definition of Domains
 Referential integrity constraints
 definition of enterprise level constraints
 Some tasks to be done:
 1.1. Design base relation
 1.2. Design representation of derived data
 1.3. Design enterprise constraint

48
Steps in physical database design … Cont’d
1.1. Design base relation
 Designing base relation involves identification of all necessary requirements
about a relation starting from the name up to the referential integrity constraints.
 The implementation of the physical model is dependent on the target DBMS
since some has more facilities than the other in defining database definitions.
 The base relation design along with every justifiable reason should be fully
documented.
1.2. Design representation of derived data
 While analyzing the requirement of users, we may encounter that there are some
attributes holding data that will be derived from existing or other attributes. A
decision on how to represent such data should be devised.
 Most of the time derived attributes are not expressed in the logical model but
will be included in the data dictionary. Whether to store stored attributes in a
base relation or calculate them when required is a decision to be made by the
designer considering the performance impact.
49
Steps in physical database design … Cont’d
1.3. Design enterprise constraint
 Data in the database is not only subjected to constraints on the
database and the data model used but also with some enterprise
dependent constraints.
 This constraint definition is also dependent on the DBMS selected
and enterprise level requirements.
 All the enterprise level constraints and the definition method in the
target DBMS should be fully documented.

50
Steps in physical database design … Cont’d
2. Design physical representation
This phase is the level for determining the optimal file organizations to store the
base relations and indexes that are required to achieve acceptable performance,
that is, the way in which relations and tuples will be held on the secondary
storage.
 2.1. Analyze transactions
To understand the functionality of the transactions that will run on the
database and to analyze the important transactions
 2.2. Choose file organization
To determine an efficient file organization for each base relation
 2.3. Choose indexes
Used for quick access
 2.4. Estimate disk space and system requirement
To estimate the amount of disk space that will be required by the database.
51
Steps in physical database design … Cont’d
3. Design user view
To design the user views that were identified in the conceptual
database design methodology
4. Design security mechanisms
5. Consider controlled redundancy
 To determine whether introducing redundancy in a controlled
manner by relaxing the normalization rules will improve the
performance of the system.
6. Monitor and tune the operational system
 To design the access rules to the base relations and user views

52

You might also like