Developed by
Interactive Training Division
IDM Computer Studies (Pvt) Ltd.
http://www.idm.edu
National Diploma in Information & Communication Technology
Database Management System
CHAPTER 1
Fundamentals of Database Concept
CHAPTER 2
Database Design and Modeling
CHAPTER 3
Data Modeling Using ER Model
CHAPTER 4
Relational Data Model and Languages
CHAPTER 5
Normalization
CHAPTER 1
Introduction
In this section we discuss in an informal manner the idea of a database as an abstract
machine. An abstract machine is a model of the key features of some system without
any details of implementation. The objective of this section is to describe the
fundamental concepts of a database system without introducing any formal notation, or
introducing any concepts of representation, development or implementation.
What is a Database?
Most modern-day organizations have a need to store data relevant to their day-to-day activities. Those organizations choose an electronic database to organize and store some of this data.
Take for instance a university. Most universities need to record data to help in the activities of teaching and learning. Most universities need to record, among other things, details of the courses and modules they offer, the lecturers who teach them, the students who enrol, and the enrolments of those students on courses and modules.
Various members of staff at a university will be entering data such as this into a database system. For instance, administrators in academic departments may enter data relevant to courses and modules, course coordinators may enter data pertaining to lecturers, and data relevant to students, particularly their enrolments on courses and modules, may be entered by staff at a central registry.
Once the data is entered into the database it may be utilized in a variety of ways. For
example, a complete and accurate list of enrolled students may be used to generate
membership records for the learning resources center; it may be used as a claim to
educational authorities for student income or as an input into a timetabling system which
might attempt to optimize room utilization across a university campus.
In this section our objective is to learn the fundamental concepts of a database system, using this example of an academic database to illustrate them.
Properties of a Database
The term database also usually implies a series of related properties: data sharing, data
integration, data integrity, data security, data abstraction and data independence.
Data sharing
Data stored in a database is not usually held solely for the use of one person. A
database is normally expected to be accessible by more than one person, perhaps at
the same time. Hence a students' database might be accessible by members of not only
academic but also administrative staff.
Data integration
Shared data brings numerous advantages to the organization. Such advantages,
however, only result if the database is treated responsibly. One major responsibility of
database usage is to ensure that the data is integrated. This implies that a database
should be a collection of data, which, at least ideally, has no redundant data. Redundant
data is unnecessarily duplicated data.
In the past, for instance separate files of student information might have been
maintained by different academic and administrative departments of a university with
many Fields in common. The aim of a database system would be to store one logical
item of data in one place only. Hence, one student record would be accessible to a
range of information systems.
Data integrity
Another responsibility arising as a consequence of shared data is that a database should
display integrity. In other words, that the database should accurately reflect the universe
of discourse that it is attempting to model.
Data security
One of the major ways of ensuring the integrity of a database is by restricting access. In
other words, securing the database. The main way that this is done in contemporary
database systems is by defining in some detail a set of authorized users of the whole, or
more usually parts of the database. For instance, a secure system would be one where
the finance department has access to information used for the collection of student fees
but is prohibited from changing the fee levels of given students.
Data abstraction
A database can be viewed as a model of reality. The information stored in a database is
usually an attempt to represent the properties of some objects in the real world.
Data independence
One immediate consequence of abstraction is the idea of buffering data from the
processes that use such data. The ideal is to achieve a situation where data
organization is transparent to the users or application programs which feed off data. If for
instance, a change is made to some part of the underlying database no application
programs using affected data should need to be changed. Also, if a change is made to
some part of an application system then this should not affect the structure of the
underlying data used by the application.
These properties amount to desirable features of the ideal database. As we shall see,
properties such as data independence are only partly achieved in current
implementations of database technology.
1. Data maintenance - Adding new data structures to the database, removing data structures from the database, modifying the structure of existing data, inserting new data into existing data structures, updating data in existing data structures, and deleting data from existing data structures.
2. Data retrieval - Querying existing data by end-users and extracting data for use by application programs.
3. Data control - Creating and monitoring users of the database, restricting access to data in the database and monitoring the performance of databases.
[Figure: The data management subsystem - data maintenance, information retrieval and database administration functions operating on the database]
Universe Of Discourse
A database is a model of some aspect of the reality of an organization. It is conventional
to call this reality a universe of discourse (UOD). Consider, for instance, an academic
setting such as a university. The UOD in this case might encompass, amongst other
things, modules offered to students and students taking modules.
Modules and students are examples of things of interest to us in a university. We call such things of interest classes or entities. We are particularly interested in the phenomenon that students take offered modules. This facet of the UOD would be regarded as a relationship between those classes or entities.
A class or entity such as module or student is normally defined as such because we wish to store some information about occurrences of the class. In other words, classes have properties or attributes. For instance, students have names, addresses and telephone numbers; modules have titles, credit points and current rolls.
A database of whatever form, electronic or otherwise, must be designed. The process of database design is the activity of representing classes, attributes and their relationships in a database. (This process is illustrated in figure 1.1)
Fact Bases
A database can be considered as a well-organized collection of data, which is meant to represent some UOD. Data are facts about the domain. A datum, a unit of data, is one symbol or a collection of symbols that is used to represent something within that domain. Facts by themselves are meaningless; to be useful they must be interpreted. Information is therefore interpreted raw data: data placed within a meaningful context. Information is data with an assigned semantic meaning.
Consider the string of symbols 55. Taken together these symbols form a datum. By themselves, however, they are meaningless. To turn them into information we have to supply a meaningful context. We might interpret this symbol as a student number, a student's age, or the number of students taking a module. Information of this sort will contribute to our knowledge of a particular domain, in this case educational administration.
A database can be considered as a collection of facts or positive assertions about a
UOD, such as relational database design is a module and John Davies takes relational
database design. Usually negative facts, such as what modules are not taken by a
student, are not stored. Hence, databases constitute 'closed worlds' in which only what
is explicitly represented is regarded as being true.
A database is said to be in a given state at a given time. A state denotes the entire collection of facts that are true at a given point in time. A database system can therefore be considered as a fact base which changes over time.
Below, for instance, we informally define the schema relevant to a university database:
Schema: university
Classes:
Modules - courses run by the institution in an academic semester
Students - people taking modules at the institution
Relationships:
Students take Modules
Attributes:
Modules have names
Students have names
Extension: university
Modules:
Computer Science
System Analysis and Design
Students:
Anne Johon
Peter Jones
Miller L.H
Takes:
Miller L.H takes System Analysis and Design
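As a minimal sketch (not given in the original schema definition), the intension and extension above could be expressed in SQL roughly as follows; the table and column names here are illustrative assumptions:
-- classes become tables; the takes relationship becomes a third table
CREATE TABLE Modules
(moduleName CHARACTER(30) PRIMARY KEY)
CREATE TABLE Students
(studentName CHARACTER(30) PRIMARY KEY)
CREATE TABLE Takes
(studentName CHARACTER(30) REFERENCES Students,
moduleName CHARACTER(30) REFERENCES Modules,
PRIMARY KEY (studentName, moduleName))
-- the extension is the set of rows currently stored
INSERT INTO Modules VALUES ('Computer Science')
INSERT INTO Modules VALUES ('System Analysis and Design')
INSERT INTO Students VALUES ('Anne Johon')
INSERT INTO Students VALUES ('Peter Jones')
INSERT INTO Students VALUES ('Miller L.H')
INSERT INTO Takes VALUES ('Miller L.H', 'System Analysis and Design')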
Integrity
When we say that a database displays integrity we mean that it is an accurate reflection
of its UOD. The process of ensuring integrity is a major feature of modern information
systems. The process of designing for integrity is a much neglected aspect of database
development.
Integrity is an important issue because most databases are designed, once in use, to
change. In other words, the data in a database will change over a period of time. If a
database does not change, i.e. it is only used for reference purposes, then integrity is
not an issue of concern.
It is useful to think of database change as occurring in discrete rather than continuous
time. In this sense, we may conceive of a database as undergoing a number of state-
changes, caused by external and internal events. Of the set of possible future states
feasible for a database some of these states are valid and some are invalid. Each valid
state forms the extension of the database at that moment in time. Integrity is the process
of ensuring that a database travels through a space defined by valid states.
Integrity involves determining whether a transition to the state below is a valid one. That is, integrity involves answering questions such as: is it valid to add to the university database another record which relates Miller L.H to Computer Science?
Extension: university
Modules:
Computer Science
System Analysis and Design
Students:
Anne Johon
Peter Jones
Miller L.H
Takes:
Miller L.H takes System Analysis and Design
Miller L.H takes Computer Science
Replication
When we design a database we should design it to minimize the replication of data. In a database we attempt to store only one item of data about each object or relationship between objects in our UOD. Ideally, a database should be a repository with no replicated facts.
Suppose we try to add the assertion Miller L.H takes Computer Science to our extension again. This is not a valid transition: adding the assertion would merely replicate the relationship between Miller L.H and Computer Science that is already recorded.
Integrity Constraints
Database integrity is ensured through integrity constraints. An integrity constraint is a
rule, which establishes how a database is to remain an accurate reflection of its UOD.
Constraints may be divided into two major types: static constraints and transition
constraints.
A static constraint, or "state invariant", is used to check that an incoming transaction will not change a database into an invalid state. A static constraint is a restriction defined on states of the database. An example of a static constraint relevant to our university database might be: students can only take currently offered modules. This static constraint would prevent us from entering into the database any fact that relates a student to a module which is not currently offered.
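A hedged sketch of how part of such a constraint might be declared in SQL follows; the foreign key ensures that a student can only be related to a module that actually exists in the Modules table (the table and column names are assumptions), while enforcing the "currently offered" flag in full would typically need a trigger or assertion in addition:
CREATE TABLE Modules
(moduleName CHARACTER(30) PRIMARY KEY,
offered CHARACTER(1) CHECK (offered IN ('Y', 'N')))   -- static rule on the offered flag
CREATE TABLE Takes
(studentName CHARACTER(30),
moduleName CHARACTER(30) REFERENCES Modules (moduleName))   -- only existing modules can be taken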
Database Functions
Most data held in a database is there to fulfill some organizational need. To perform useful activity with a database we need two types of function: update functions and query functions. Update functions cause changes to data. Query functions extract data from the database.
Update Functions
An update function changes a database from one state to another; it is applied to the database as a transaction. Update functions relevant to the university database might include:
Update Functions:
Initiate Semester
Offer Module
Cancel Module
Enroll student on course
Enroll student in module
Transfer Student between modules
Query Functions
The other major type of function is the query function. This does not modify the database in any way, but is used primarily to check whether a fact or group of facts holds in a given database. As we shall see, query functions can also be used to retrieve data from the database. Query functions relevant to our UOD include:
Is course X being offered?
Does student Y take course X?
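Assuming the Modules and Takes tables sketched earlier in this chapter (an illustrative representation, not one given in the text), these query functions might be phrased in SQL as:
-- Is course X being offered?
SELECT *
FROM Modules
WHERE moduleName = 'Computer Science'
-- Does student Y take course X?
SELECT *
FROM Takes
WHERE studentName = 'Miller L.H'
AND moduleName = 'System Analysis and Design'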
Formalisms
Every database system must use some representation formalism. Patrick Henry Winston
has defined a representation formalism as being, “a set of syntactic and semantic
conventions that make it possible to describe things” (Winston, 1984). The syntax of a
representation specifies a set of rules for combining symbols and arrangements of
symbols to form statements in the representation formalism. The semantics of a
representation specify how such statements should be interpreted. That is, how can we
derive meaning from them?
In database terms the idea of representation formalism corresponds with the concept of
a data model. A data model provides for database developers a set of principles by
which they can construct a database system.
Database Views
Part of the reason that data in databases is shared is that a database may be used for
different purposes within one organization. For instance, the academic database
described in this chapter might be used for various purposes in a university such as
recording student grades or timetabling classes. Each distinct user group may demand a
particular subset of the database in terms of the data it needs to perform its work. Hence
administrators in academic departments will be interested in items such as student
names and grades while a timetabler will be interested in rooms and times. This subset
of data is known as a view.
In practice a view is merely a query function that is packaged for use by a particular user
group or program. It provides a particular window into a database and is discussed in
detail in chapter 10.
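For instance, a minimal sketch of such a view in SQL, assuming a Students table holding studentName and grade columns (assumed names, used purely for illustration):
-- a view packaging the subset of data needed by departmental administrators
CREATE VIEW StudentGrades AS
SELECT studentName, grade
FROM Students
-- the view is then queried like any other table
SELECT *
FROM StudentGrades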
Database Architecture
Modern DBMS are conventionally described in terms of a three-level architecture: an external level made up of the individual user views, a conceptual level describing the community view of the data, and an internal level. The internal level is the one closest to physical storage - i.e. the one concerned with the way the data is actually stored.
[Figure 1.2: The three-level architecture - external views, the conceptual level and the internal level, with mappings between the levels down to the stored database]
Mappings
In a DBMS based on the three schema architecture, each user group refers only to its
own external schema which is converted into a request on the conceptual schema, then
into a request on the internal schema for processing on the stored database. If the
request is a database retrieval, the data extracted from the stored database must be
reformatted to match the user's external view before it is presented to the user.
The processes of transforming requests and results between levels are called mappings.
There are two levels of mapping in the architecture, one between the external and
conceptual levels of the system and one between the conceptual and internal levels.
The conceptual/internal mapping defines the correspondence between the conceptual
view and the stored database; it specifies how conceptual records and fields are
represented at the internal level. If the structure of the stored database is changed i.e. if
a change is made to the storage structure definition - then the conceptual/internal
mapping must also be changed accordingly, so that the conceptual schema may remain
invariant.
An external/conceptual mapping defines the correspondence between a particular external view and the conceptual view. The differences that may exist between these
two levels are similar to those that may exist between the conceptual view and the
stored database. For example, fields can have different data types, field and record
names can be changed, and multiple conceptual fields can be combined into a single
(virtual) external field, and so on. Any number of external views can exist at the same
time; any number of users can share a given external view; different external views can
overlap.
Data Models
Every database, and indeed every DBMS, must adhere to the principles of some data
model. However, the term data model is somewhat ambiguous. In the database system
literature the term is used in a number of different senses, two of which are the most
important: that of architecture for data; that of an integrated set of data requirements.
The set of principles that defines a data model may be divided into three major parts:
1. Data definition - A set of principles concerned with how data is structured.
2. Data manipulation - A set of principles concerned with how data is operated on.
3. Data integrity - A set of principles concerned with determining which states are
valid for a database.
Data definition involves defining an organization for data: a set of templates into which
data will be fitted.
Data manipulation concerns the process of how the data is accessed and how it is
changed in the database. Data integrity is very much linked with the idea of data
manipulation in the sense that integrity concerns the idea of what are valid changes and
invalid changes to data.
Any database and DBMS must adhere to the tenets of some data model. Hence, in the
relational data model, data definition involves the concept of a relation, data
manipulation involves a series of relational operators, and data integrity amounts to two
rules - entity and referential integrity. Note that by the data integrity part of a data model we are describing only those rules that are inherently part of the data model. A great many other integrity constraints or rules will have to be specified by additional means, i.e. using mechanisms not inherently part of the data model.
Distributed Databases
CHAPTER 2
Introduction
This chapter of the course is concerned with the general subject of database design and modeling (more specifically, relational database design). The database design problem can be stated very simply: given some body of data to be represented in a database, how do we decide on a suitable logical structure for that data? In other words, how do we decide what entities should exist and what attributes they should have? The practical significance of this problem is obvious.
Before we start getting into details, a number of preliminary remarks are in order:
1. First, note that we are concerned here primarily with logical (or conceptual) design rather than physical design. This is not to say that physical design is unimportant; it is just as important as conceptual design. However:
2. Physical design can be treated as a separate, follow-on activity. In other words, the "right" way to do database design is to do a clean logical (i.e., relational) design first, and then, as a separate and subsequent step, to map that logical design into whatever physical structures the target DBMS happens to support. (In other words, as noted in this chapter, the physical design should be derived from the logical design, not the other way around.)
3. Physical design, by definition, tends to be somewhat DBMS-specific. Logical design, by contrast, is or should be quite DBMS-independent, and there are some solid theoretical principles that can be applied to the problem.
The structure of this chapter is as follows. First, the chapter lays some theoretical groundwork on database design and modeling. We then map the design process onto the modeling of enterprise database requirements. Next we consider logical and physical database design briefly, introducing the concepts of entity/relationship modeling and normalization, which are used to build the logical database structure. The latter part of the chapter briefly considers different approaches: the hierarchical, network and relational approaches.
[Figure 2.1: The database development process - requirements analysis of the real world, view modeling and view integration leading to conceptual modeling, logical database modeling, volume/usage/integrity analysis, and physical modeling leading to implementation decisions for the database system]
There are hence three core stages to any database development task: conceptual modeling, logical modeling and physical modeling (see figure 2.1). Some people see conceptual modeling as a stage preceded by a requirements analysis stage. The process of requirements analysis involves eliciting the initial set of information and processing requirements from users.
The conceptual modeling stage can be thought of as comprising two sub stages: view
modeling, which transforms the user requirements into a number of individual user
views, and view integration which combines these schemas into a single global schema.
Most conceptual models are built using constructs from the semantic data models. In
chapter 3 we shall use the extended entity-relationship model to illustrate the process of
building conceptual models.
The process of logical modeling is concerned with determining the contents of a
database independently of the exigencies of a particular physical implementation. This is
achieved by taking the conceptual model as input, and transforming it into the
architectural data model supporting the target database management system (DBMS).
This is usually the relational data model. In chapter 4 we shall outline the key concepts
of the relational data model.
Physical modeling involves the transformation of the logical model into a definition of the physical model suitable for a specific software/hardware configuration. This is usually some schema expressed in the data definition language of SQL. In chapter 4 we illustrate the primary components of this modern-day database sub-language.
1. Data is facts. A datum, a unit of data, is one or more symbols that are used to
represent something.
2. Information is interpreted data. Information is data placed within a meaningful
context.
3. Knowledge is derived from information by integrating information with existing
knowledge.
4. Information is necessarily subjective. Information must always be set in the
context of its recipient. The same data may be interpreted differently by different
people depending on their existing knowledge and different contexts.
[Figure 2.2: An information technology system supports an information system, which in turn supports a human activity system]
Layers of an IT System
It is useful to consider an information technology system as being made up of a number
of subsystems or layers (see figure 2.3):
1. Interface subsystem. This subsystem is responsible for managing all interactions
with the end-user. Hence, it is frequently referred to as the user interface.
2. Rules subsystem. This subsystem manages the application logic in terms of a
defined model of business rules.
3. Transaction subsystem. This subsystem acts as the link between the data
subsystem and the rules and interface subsystems. Querying, insertion and
update activity is triggered at the interface, validated by the rules subsystem and
packaged as units (transactions) that will initiate actions (responses or changes)
in the data subsystem.
4. Data subsystem. This subsystem is responsible for managing the underlying data
needed by an application.
[Figure 2.3: The layers of an IT system - interface subsystem, rules subsystem, transaction subsystem and data subsystem]
Historically, the four parts of a conventional IT application were constructed using one
tool, the high-level or third generation programming language (3GL).
The key conclusions to be drawn from this section are that information systems support
human activity systems and consequently cannot be designed without an effective
understanding of the context of human activity. Information technology systems are
critical components of contemporary information systems. Database systems are critical
elements of information technology systems.
Corporate Data Model
Consider the case of a university setting. Traditionally, to support the activities of the
university given application data models may have been produced to document the data
requirements relevant to a particular information system such as a library information
system or a student enrolment system. At a higher level a data model may be produced
to integrate the data requirements relevant to a particular business area. In a university
setting, for instance, we might develop a data model relevant to the key 'business' areas
of teaching, research and consultancy.
At the highest level the university may have developed a corporate data model to
integrate the data requirements for the entire organization. This may either be a
unification of the business area data models or a summary of their key elements
There is a close relationship between the three levels of database models and the three
levels of an IS architecture. This is illustrated in figure 2.5.
[Figure 2.5: Levels of data model against levels of strategy - business strategy with the corporate data model, information strategy with the business area data models, and information systems strategy with application data models]
Production Databases
Mass-Deployment Databases
These databases are used to deliver data to the desktop. Generally such databases are
single-user tools running under some PC-based DBMS such as Microsoft Access. They
may be updated on a regular basis either from production or decision-support
databases.
As an example, in terms of a university, a mass-deployment database will be needed by
each lecturer to maintain an ongoing record of student attendance at lectures and
tutorials.
Ideally we would like any database system to fulfill each of these purposes at the same
time. However, in practice, medium to large-scale databases can rarely fulfill all these
purposes without sacrificing something in terms of either retrieval or update
performance. Many organizations therefore choose to design separate databases to
fulfill production, decision-support and mass deployment needs and build necessary
update strategies between each type.
Network Database Applications
Database systems have been impacted upon by developments in Internet technology
through so-called Web-enabled or network database applications. Users access such
applications through a Web browser on their desktop system. Browser software accesses and displays Web pages sited on a Web server identified by a universal resource locator (URL). A Web page is basically a document: a text file with HTML (HyperText Markup Language) codes inserted. HTML commands instruct the browser how to display the specified text file.
A user requests a service by specifying the URL. The specified Web page is then
transferred across the network to the desktop, allowing the browser to display the page.
Universal Servers
Traditional database applications support structured data such as numbers or character
strings. Newer applications such as network database applications demand the ability to
store and manipulate more complex data types such as image data, audio data and
video data.
Many contemporary DBMS are hence casting themselves as so-called universal servers.
The universal server approach involves the extension of DBMS to support both
traditional and non-traditional data. Non-traditional data is generally supported through
user-defined data types (UDTs) and user-defined functions (UDFs). UDTs, also known as abstract data types, define non-standard data structures. UDFs permit the users of a database to alter and manipulate UDTs.
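The syntax for UDTs and UDFs varies considerably between products and is not defined in the text; the fragment below is a PostgreSQL-flavoured sketch, given purely as an illustration:
-- a user-defined (composite) type
CREATE TYPE address AS (
  street VARCHAR(30),
  town VARCHAR(20),
  postcode VARCHAR(8));
-- a user-defined function operating on that type
CREATE FUNCTION town_of(a address) RETURNS VARCHAR(20)
AS 'SELECT ($1).town' LANGUAGE SQL;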
[Figure: The database development process - requirements analysis of the universe of discourse produces the requirements; conceptual modeling produces the conceptual model; logical modeling produces the logical model; physical modeling produces the database]
1. Requirements analysis
2. Conceptual modeling
   § View modeling
      § Defining entities/objects
      § Defining relationships and constraints on relationships
      § Defining attributes
      § Defining abstraction mechanisms
      § Defining behaviour
   § View integration
      § Identifying communalities among views
      § Producing a global conceptual model
   § Accommodating the conceptual model to a relational schema
3. Logical modeling
   § Normalization
      § Producing a 3NF schema through non-loss decomposition
      § Producing a 3NF schema through a dependency analysis
   § Reconciling the normalized schema with the schema produced from conceptual modeling
4. Physical modeling
   § Physical database design
      § Volume analysis
      § Usage/transaction analysis
      § Integrity analysis
      § Control/security analysis
      § Distribution analysis
   § Database Implementation
      § Selecting the DBMS
      § Creating the physical schema
      § Establishing storage structures and associated access mechanisms
      § Adding indexes
      § De-normalization
      § Defining users and privileges
      § Tuning in terms of the chosen DBMS
[Figure: Reconciliation - normalization of existing databases reconciled with the schema produced from conceptual modeling]
The output from the physical database design process is an implementation plan for the
database. This process is illustrated in figure 2.8. This plan will be composed of the
following elements:
1. Data structures declared in a suitable DDL
2. Indexes declared on the data structures
3. Clustering data where appropriate
4. A set of inherent integrity constraints expressed in some DDL and a set of
additional integrity constraints expressed in some DIL
5. A distribution strategy for the database system
6. A set of queries optimized for running on some database
7. A set of defined users
8. A plan for securing data
Volume Analysis
One of the first steps we need to take in moving from logical to physical database design
is to establish estimates as to the average and maximum number of instances per entity
in our database. An estimate of the maximum possible number of instances per entity is
useful in deciding upon realistic storage requirements. An estimate of how many
instances are likely to be present in the system on average also gives us a picture of the
model's ability to fulfill access requirements.
The table below summarizes some provisional sizing estimates for the USC database. Using the column sizes established in the schema above it is relatively straightforward to translate entity sizing into table sizing for each relation described in the ERD.
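To illustrate with purely hypothetical figures (none of these numbers come from the text): if the Students table is expected to hold 5,000 rows on average and 7,000 at most, and each row occupies roughly 100 bytes, then the table needs about 5,000 x 100 bytes = 500 KB on average and about 700 KB at worst. Repeating this calculation for every table, and allowing extra space for indexes, gives a first estimate of the database's overall storage requirement.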
Usage Analysis
Usage analysis requires that we identify the major transactions required for a database system. Transactions are considered here as being made up of a series of insertions, updates, retrievals or deletions, or a mixture of all four. In the university student database, typical transactions might include:
§ Register a new student on a presentation.
§ Add new presentation details.
§ Purge all lapsed presentations before a certain date.
§ Assign a lecturer to a presentation.
§ Record details of a possible student.
These can be sample retrieval, update and deletion transactions. Based on the frequency of transactions against the tables of the database, the tables can be categorized into two groups: volatile (tables that change frequently) and non-volatile (tables that change infrequently).
Table         Volatility
Courses       Low
Lecturer      Low
Attendance    High
Students      High
Transaction Analysis
Transaction analysis involves analyzing and documenting the set of critical transactions
that are expected to impact against some database system. Ideally transaction analysis
will fall out of conventional information systems analysis and design in the sense that
these activities should document the major data management activities expected in an
application.
Integrity Analysis
Proper data analysis provides a logical database design which indicates appropriate
data structures and a set of inherent integrity constraints. For example, three types of
inherent constraints should be documented in a relational schema
DOMAIN CONSTRAINTS
Domain constraints such as:
1. That courseNo in courses should be number (3) and taken from the set {<course
numbers >}
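Such a domain constraint might be declared with a CHECK clause in the table definition; the course numbers shown below are placeholders, not values given in the text:
CREATE TABLE Courses
(courseNo NUMERIC(3) CHECK (courseNo IN (101, 102, 103)),   -- the set of <course numbers>
courseName CHARACTER(20))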
Security/Control Analysis
Security is a large issue involving securing buildings and rooms, hardware, operating
systems and DBMS. Many organizations now employ special persons to deal with
information security issues. In this section we concentrate on the issue of database
security. However, there is little point in securing a DBMS and database by itself. To
take an example, most DBMS run on top of native operating systems and place their
database tables within operating system files. Hence one must secure the operating
system against attack by unwanted persons as well as securing the tables within the
database.
Database security is normally assured by using the data control mechanisms available under a particular DBMS. Data control comes in two parts: preventing unauthorized access to data, and preventing unauthorized access to the facilities of a particular DBMS.
Database security will be a task for the DBA (chapter 10) normally conducted in
collaboration with the organization’s security expert.
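In SQL, such data control is normally expressed with GRANT and REVOKE statements; a minimal sketch, assuming a Students table and a user called finance_clerk (both names are assumptions):
-- allow the finance user to read, but not change, student data
GRANT SELECT ON Students TO finance_clerk
REVOKE UPDATE ON Students FROM finance_clerk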
Distribution Analysis
The ability to distribute data among different nodes in a network is now a commonplace feature of modern DBMS. The design of distributed database systems is therefore an important aspect of modern database design. Distributed database design can be seen as a variant of physical database design. We discuss this in detail in chapter 11.
Hierarchical approach
The hierarchical data model does not have the strong theoretical foundation of the relational data model. Probably the most prominent DBMS adhering to a hierarchical approach is IBM's IMS (Information Management System) (see History of the databases). In this section we provide a brief description of a number of key features which characterize the hierarchical data model.
[Figure: A hierarchy - a parent record type with two child record types, Child 1 and Child 2]
The hierarchical data model uses two data structures: record types and parent child
relationships. A record type is a named data structure composed of a collection of
named fields such as courseCode and courseName. Each field is used to store a simple
attribute and is given a data type such as integer, character, etc. A parent child
relationship is a one-to-many relationship between two record types. The record type at
the one end of a relationship is said to be the parent record type, such as course; that at
the many end the child record type, such as module. Hence, a hierarchical schema is
made up of a number of record types related by parent child relationship.
Consider, for instance, the relationship between courses, modules and students. Courses can be considered the parent record type of modules in the sense that there are many modules making up one course. The record type modules can in turn be considered the parent record type of students, as there are many students taking each module.
SCHEMA: University
RECORD: Course
PARENT: none
FIELDS (
courseCode: CHARACTER (6)
courseName: CHARACTER (10)
validationYear: DATE)
KEY: courseCode
ORDER BY courseCode
RECORD: Modules
PARENT: Course
FIELDS (
moduleName: CHARACTER (20)
staffNo: INTEGER (6)
level: INTEGER (1))
KEY: moduleName
ORDER BY moduleName
RECORD: Student
PARENT: Modules
FIELDS (
studentNo: INTEGER (6)
studentName: CHARACTER (20)
termAddress: CHARACTER (30))
KEY: studentNo
ORDER BY studentName
Note that this schema has many similarities with the relational schema described in the next section. The key differences are:
1. The data structures are different. In the hierarchical data model we have the record type, while in the relational data model we have the relation.
2. Relationships are implemented differently. In the hierarchical data model relationships are implemented via parent-child links; in the relational model relationships are implemented via foreign keys.
Network Approach
The network data model is the successor to the hierarchical data model. The dominant
influence in the development of the network data model was a series of proposals put
forward by the Database Task Group (DBTG) of the Conference on Data Systems and
Languages (CODASYL) in 1971 (DBTG 1971). During the 1970s the majority of
commercial DBMS adhered, albeit loosely, to a network data model. More recently,
ANSI made a recommendation for a network definition language (NDL) in 1986.
The network model represents a complex structure. Within a network any record may
have many immediate parents as well as many dependants.
Like the hierarchical model, the network data model has two data structures: record
types and set types .
A record type is similar in concept to the record type of the hierarchical model except
that fields may be used to store multiple values or represent a composite of values which
repeat. For example, a record type students may have the following fields: studentNo,
studentName and studentProfile. StudentProfile could be considered a composite in that
it clusters together a repeating group made up of courseName, year and grade.
A set type is a description of a one-to-many relationship between two record types.
[Figure: A network of data - a customer record (C1) linked to invoice records (I1, I2) and a payment record (P1), with invoice-payment records (IP1, IP2, IP3) linking invoices to payments]
Relational Approach
In the relational approach a database is constructed from a set of relations. A relation is a table which obeys a certain set of restrictions; a table which conforms to all of these rules constitutes a relation.
Module
Module Name                   Level   Course Code   Staff No
Relational Data Base Design   1       CSD           244

Lecturers
Staff No   Staff Name   Status
244        David T      L
245        Johon P      SL
246        Evan R       PL

Courses
Course Code   Module Name
CSD           Relational Data Base Design
              Basic Accountancy
CHAPTER 3
Introduction
An entity is a 'thing' about which an organization holds information. Entity modeling is a
technique for showing relationships between entities. Entity analysis is the process by
which the model is developed, and identifies the underlying structure of the data and
relationships of those data. The diagram produced from entity modeling is called the
Entity Relationship Diagram (ERD). Computer systems, which store large quantities of
data, need to retrieve the data rapidly, in a variety of sequences and combinations.
Problems can often result, whether the system uses a database or individual files,
because of the way in which individual data items are stored within the computer
system; this may not reflect the underlying structure of the data that exists in the
business world outside the computer system
ER Model Concepts
Entity analysis is performed at two stages during the analysis phase of a system:
§ during analysis of the existing system, to aid the analyst's understanding of it;
§ in conjunction with relational data analysis (see Chapter 5), to produce a model of the required data structure for the new system.
Ideally, analysts beginning work on a new project do not have to start the ERD from a
blank sheet of paper. Many organizations have recognized the importance of their data
sufficiently to develop a corporate data model, which is just a company-wide ERD.
Such a model ensures that the underlying structure of data from the point of view of the
organization as a whole is appreciated. If a corporate ERD is available, the individual
project's ERD will just be a subset of this, and the analysts, whilst developing more detail
in relation to the entities within their particular project area, will have a framework to
guide them. However even if such a corporate data model does exist, entity analysis is
still essential to the individual project.
For example, an order processing system might have the following data model (figure 3.1):
Supplier - who supplies products
Item - the different types of items
Product - the different types of products
Customer - who purchases the products
Order - order details of customers
Shipment - shipment details of items
Figure 3.1
Entities
As defined above, an entity is a thing about which the organization wants to hold
information. Entities may be physical things, such as:
§ Customer;
§ Invoice;
§ Product;
§ Supplier;
§ Employee;
§ Training course;
For example,
Employee no: Name: Address: Dept: Salary: ...etc.
A624 Brown, B.J. Nottingham Accounts £25,000
is one occurrence of the entity employee, with a unique key of employee number.
Attributes
All entities have attributes; they are the data, which describes or qualifies the entity. In
the entity employee the attributes could be:
§ Employee number;
§ Employee name;
§ Employee address;
§ Employee department;
§ Employee salary, etc.
In the previous example, 'A624' is the specific value of the attribute employee number for one occurrence of the entity employee.
Relationships
The relationship between two entities describes the way in which an occurrence of one
entity is linked to, or influenced by, occurrences of another. For example, a department
may have zero, one or many employees, but each employee will relate to only one
department (in this example, where the business rule is that an employee only ever
works in one department).
Notation
Entity description. The entity name is always in the singular within a rectangle, which can be 'hard' or 'soft' (with rounded corners) depending on which structured development method is followed.
[Figure 3.2: A relationship between two entities, Customer and Order]
[Figure 3.3: Occurrences of the Customer entity]
One-to-many relationship
Customer    Order
This notation shows that one customer can make many orders.
One-to-one relationship
Customer    Order
This notation shows that one customer can make only one order.
Many-to-many relationship
Supplier    Product
This notation shows that one supplier can supply many products, and one product can be supplied by many suppliers.
Extended ER model
To draw the extended ERD we should identify the additional characteristics.
Name Relationships
The relationships between two entities should be named in both directions. In Figure 3.4,
an author is the writer of a book, and the book is written by an author.
[Figure 3.4: Author is the writer of Book; Book is written by Author]
Optional Relationships
A book can exist without being issued; the relationship with issue is zero, one or more;
however, an issue cannot exist without a book. Some disciplines use a dotted line to
show optional relationships and a solid line to show a permanent relationship, as in
Figure 3.5. In the figure, there is no optionality between author and book: if details of an
author are kept it is because there are books in the library written by that author. A book
is always written by an author. Borrower details can be kept without that borrower taking
out any books (the optionality is shown by the dotted line); however, issue details must
be associated with a borrower (shown by the solid part of the line between issue and
borrower). A relationship defines the association between a master entity and a detail
entity.
For each master occurrence it should be possible to access all details for that master. The customer is the master and orders the detail. Given a customer number, all orders for that customer can be accessed. In order/product the order is the master and product the detail. Given an order number, all products on that order can be accessed. An entity can therefore be a master in one relationship and a detail in another (Figure 3.6). The dotted line can indicate optional masters, e.g. an invoice has a one-to-many relationship with a training course, but training course details can exist without an invoice. An alternative approach, where the dotted line approach is not used, is to place a '0' on the relationship line indicating optional masters (Figure 3.7).
[Figure 3.5: Author written by Book; Borrower made to Issue - optional relationships shown by dotted lines]
[Figure 3.6: Customer (master), Order (detail/master), Product (detail)]
[Figure 3.7: Invoice and Training Course - an optional master shown with a '0' on the relationship line]
[Figure 3.8: Employee (master) with Weekly Pay and Monthly Pay details]
[Figure 3.9: Customer and Supplier as masters of Invoice]
Recursive relationships.
There are times when entities have relationships with themselves. In Figure 3.10, for
example, each employee reports to a supervisor who is also an employee. Each
employee who is a supervisor may supervise many other employees. An optional
recursive relationship is shown in Figure 3.10.
[Figure 3.10: A recursive relationship - an Employee supervises other Employees]
CHAPTER 4
Introduction
The relational model defined by E.F. Codd in the early seventies is a theoretical model of a database. The model appealed to the computer science community because of its mathematical basis, and to the computing industry at large because of the simple way in which it represented information by the well-understood convention of tables of values.
This session will give an overview of the relational model, relational algebra and database constraints, and a basic overview of the Structured Query Language (SQL) as a data manipulation language.
Definitions
[Figure: Anatomy of a relation - the header is the set of attributes (A, B, C, D); each row of the body is a tuple; the number of tuples is the cardinality of the relation; the number of attributes is its degree]
A   B   C   D
1   a   q   10
2   b   w   45
3   c   r   88
4   d   t   77
5   e   y   11
6   f   i   12
Domain - the pool of legal values from which one or more attributes draw their actual values. Consider, for example, a weight attribute and a quantity attribute:
§ Both of these attributes are numbers, but they are different kinds of numbers.
The weight and quantity domains would therefore be distinct. We can state
that if two attributes draw their values from the same domain, then
comparisons - and hence joins, unions and many other operations – involving
those two attributes make sense because they are comparing like with like.
Thus, one advantage of having the system support domains is that it enables
the system to prevent users from making silly mistakes.
Keys -A set of attributes whose values uniquely identify each tuple of a relation.
§ Candidate key - Any attribute that satisfies the above definition. A relation
may have many keys.
§ Primary key and alternate key - Among all the candidate keys of a given
relation, one will be chosen to be the primary key, and the others are called
alternate keys. Remember that primary key values are used to identify
uniquely each tuple within a table. Therefore, the values of a primary key
attribute must each be unique within the domain of that attribute.
§ Composite key - When more than one attribute column is needed to establish
unique identification for tuples within a table, the resulting primary key is
referred to as a concatenated primary key.
§ Foreign key - These are attributes used to cross-reference tuples using the
tuples' primary key values. In other words, a primary key for one table is
known as a foreign key in the table into which it is embedded for the purpose
of identifying relationship occurrences.
Foreign keys are used to represent relationships. There are no links or
pointers connecting one table to another. In non-relational systems, by
contrast, such information is typically represented by some kind of physical
link or pointer that is explicitly visible to the user.
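These key concepts map directly onto SQL table definitions; the sketch below uses the lecturer/module example found elsewhere in this text, with assumed column types:
CREATE TABLE Lecturers
(staffNo INTEGER PRIMARY KEY,          -- primary key
staffName CHARACTER(20))
CREATE TABLE Modules
(moduleName CHARACTER(20),
courseCode CHARACTER(3),
staffNo INTEGER REFERENCES Lecturers (staffNo),   -- foreign key
PRIMARY KEY (moduleName, courseCode))             -- composite (concatenated) primary key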
Relations
There is only one data structure in the relational data model - the relation. Because the
idea of a relation is modeled on a mathematical construct, a relation is a table, which
obeys a certain restricted set of rules:
Integrity Constraints
An integrity constraint can be regarded as a condition that all correct states of the
database are required to satisfy. Integrity is enforced via integrity rules.
There are three main types of integrity specified in the relational model:
§ Domain integrity - Domain integrity is concerned with the values that may be
contained within a particular column of relations.
§ Entity integrity - Entity integrity is concerned with the primary keys of relations.
§ Referential integrity - Referential integrity is concerned with the foreign keys of
relations.
Domain Integrity
Domain integrity is concerned with the values that may be contained within a particular
column. The domain integrity rule states that every attribute is required to satisfy the
constraint that its values are drawn from the relevant domain.
All columns have an implicit domain derived from their data types (for example, a
telephone number is made up of 10 numeric digits). However, more explicit domain
integrity can also de defined (for example, each telephone number is preceded by the
digits 071).
Ideally commands should be available within the database system to allow the user to
define domain specifications when the database is created - not all systems currently
allow this.
It would also be helpful to have some type of domain constraint. For example, a CHECK clause could be used to specify the gender column as CHECK (gender IN ('M', 'F')). This would be more useful than just being able to specify the column as type character, which would allow any character.
Entity Integrity
Entity integrity is concerned with primary keys. The entity integrity rule states that every
base relation must have a primary key and no component of that primary key is allowed
to accept null values.
§ Null values - Null values here mean that information is missing for some reason
(for example, the property is non-applicable), or the value is unknown.
§ The function of primary keys - Primary keys perform the unique identification
function in the relational model.
Referential integrity
Referential integrity is concerned with foreign keys. The referential integrity rule states that the database must not contain any unmatched foreign key values. In other words, the referential integrity rule simply says that if B references A, then A must exist.
o Deleting - What should happen on an attempt to delete the target of a foreign key reference? For example, an attempt to delete a customer for which there exists at least one matching order. There are three possibilities:
1. Restricted - the delete operation is restricted to the case where there are no such matching orders (it is rejected otherwise).
For example, the deletion of a parent row will be disallowed if there are any dependent rows.
2. Cascades - the delete operation cascades to delete those matching orders also.
For example, if a parent row is deleted, then the system will automatically delete the dependent rows.
3. Nullifies - the foreign key is set to null in all such matching orders and the customer is then deleted.
For example, when a parent row is deleted, all dependent foreign keys are set to null.
o Updating - What should happen on an attempt to update the primary key of the target of a foreign key reference? For example, an attempt to update the customer number of a customer for which there exists at least one matching order. The possibilities parallel those for deletion:
1. Restricted - if an update command changes the primary key of a parent row and dependants exist for that row, the update is disallowed.
2. Cascades - the update operation cascades to update those matching orders also.
For example, when an update command changes the primary key of a parent row, the foreign keys of all its dependent rows are updated to the new value.
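In SQL these choices are normally declared as referential action clauses on the foreign key; a sketch using an assumed customer/order pair of tables:
CREATE TABLE Customers
(customerNo INTEGER PRIMARY KEY)
CREATE TABLE Orders
(orderNo INTEGER PRIMARY KEY,
customerNo INTEGER REFERENCES Customers (customerNo)
ON DELETE CASCADE       -- 'cascades'; ON DELETE SET NULL would give 'nullifies'
ON UPDATE NO ACTION)    -- 'restricted': updates to a referenced customerNo are rejected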
Relational Algebra
The manipulative part of the relational model defines the set of things, which can be
done to relational databases. Relational algebra and relational calculus are the two main
alternative ways of expressing the manipulative part of the relational model.
Relational algebra is addressed in this session. Relational algebra is basically a set of
operations, which can be applied to a relational database in order to manipulate and
access the data contained in tables.
Relational algebra (RA) consists of a collection of high-level operators that operate on
relations. In his original paper on the relational model, Codd introduced eight such basic
operators which could be used to manipulate data within the body (relation) parts of
tables of a relational database.
These basic operators are all incorporated into the standard, international relational database language, Structured Query Language (SQL). Note, however, that SQL supports other Data Manipulation Language (DML) operators beyond those just described and that it also supports Data Definition Language (DDL) operators.
Property of closure - It is important that the results of using the above operators on
tables must themselves be tables. This is because these operators can be used
sequentially in various combinations to obtain desired results. Thus each operation on
completion must leave data as a table (or tables) for the next operator to use. This
property, which all the above operators must have, is referred to as closure.
Restrict (also called Select) - Builds a relation consisting of all tuples from a specified relation that satisfy a specified condition.
Project - Builds a relation consisting of all specified attributes from a specified relation.
Product - Builds a relation from two specified relations consisting of all possible
combinations of tuples, one from each of the two relations.
Union - Builds a relation consisting of all tuples appearing in either or both of two
specified relations.
Intersect - Builds a relation consisting of all tuples appearing in both the first and the second of two specified relations.
Difference - Builds a relation consisting of all tuples appearing in the first and not the
second of two specified relations.
Join - Builds a relation from two specified relations consisting of all possible
combinations of tuples, one from each of the two relations, such that the two tuples
contributing to any given combination satisfy some specified condition.
Divide - Takes two relations, one binary and one unary, and builds a relation consisting
of all values of one attribute of the binary relation that match (in the other attribute) all
values in the unary relation.
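The operators above correspond closely to SQL constructs; the fragments below are sketches written against the Modules and Lecturers tables used elsewhere in this text (column names assumed):
-- Restrict: rows satisfying a condition
SELECT * FROM Modules WHERE level = 1
-- Project: chosen columns only
SELECT moduleName, level FROM Modules
-- Join: combine tuples from two relations that satisfy a condition
SELECT *
FROM Modules, Lecturers
WHERE Modules.staffNo = Lecturers.staffNo
-- Union: tuples appearing in either of two union-compatible results
SELECT staffNo FROM Lecturers
UNION
SELECT staffNo FROM Modules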
Data Definition
Data definition defines the database structure. The initial activity is creating a table (relation).
Syntax:
CREATE TABLE <table name>
(<column name> <data type>(<length>),
<column name> <data type>(<length>),
... )
For example:
CREATE TABLE Modules
(moduleName CHARACTER(15),
courseCode CHARACTER(3),
staffNo INTEGER)
For example:
CREATE TABLE Modules
(moduleName CHARACTER(15) NOT NULL UNIQUE,
level SMALLINT,
courseCode CHARACTER(3),
staffNo INTEGER)
DEFAULT VALUES
A clause can be added to a column definition specifying the value that a column should take in response to incomplete information being entered by the user. For instance, a DEFAULT <value> specification can be added to the level column for modules indicating that the default level should be 1.
For example:
CREATE TABLE Modules
(moduleName CHARACTER(15) NOT NULL UNIQUE,
level SMALLINT DEFAULT 1,
courseCode CHARACTER(3),
staffNo INTEGER)
DROP TABLE
Table definitions can be created and table definitions can be deleted. To remove a table
from the database we use the following command:
Syntax:
DROP TABLE <table name>
For example:
DROP TABLE Modules
Columns can be added to an existing table definition using the ALTER TABLE command. For instance:
ALTER TABLE Lecturers
ADD COLUMN roomNo SMALLINT
Simple Retrieval
Simple retrieval is accomplished by a combination of the select, from and where clauses:
Syntax:
SELECT <attribute name>, <attribute2 name>, ...
FROM <table name>
[WHERE <condition>]
The select clause indicates the table columns to be retrieved. The “FROM” clause
defines the tables to be referenced. The “WHERE” clause indicates a condition or
conditions to be satisfied. The following command, for instance, is a direct analogue of
the relational algebra select or restrict. The asterisk ' * ' acts as a wildcard. That is, all
the attributes in the table are listed:
SELECT *
FROM Modules
Note that the example above has no WHERE clause; the WHERE clause is optional. If we omit the WHERE clause then all the rows of the table are considered by the query. The addition of a WHERE clause restricts the retrieval to the set of rows matching a given condition:
SELECT *
FROM Modules
WHERE moduleName = 'Relational Data Base Design'
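Retrieval can combine projection (the column list) with restriction (the WHERE clause); for example, against the Modules table defined above, with an illustrative course code:
SELECT moduleName, level
FROM Modules
WHERE courseCode = 'CSD'
AND level = 1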
CHAPTER 5
Introduction
In this chapter we consider Codd's original ideas on normalization, while also describing a graphic technique used for designing a fully normalized schema. We particularly emphasize the use of normalization as a bottom-up technique for relational database design.
Normal forms
In his seminal paper on the relational data model, E. F. Codd formulated a number of
design principles for a relational database (Codd 1970). These principles were originally
formalized in terms of three normal forms: first normal form, second normal form and
third normal form. The process of transforming a database design through these three
normal forms is known as normalization. By the mid-1970s third normal form was shown
to have certain inadequacies and a stronger normal form, known as Boyce-Codd normal
form was introduced (Codd 1974). Subsequently Fagin introduced fourth normal form
and indeed fifth normal form (Fagin 1977, 1979).
Why Normalise?
Suppose we are given the brief of designing a database to maintain information about
students, modules and lecturers in a university. An analysis of the documentation used
by the administrative staff gives us the following sample data set
with which to work. If we pool all the data together in one table as below, a number of
problems, sometimes called file maintenance anomalies, would arise in maintaining this
data set.
Modules

Module Name                  Staff No  Staff Name  Student No  Student Name  Ass Grade  Ass Type
Relational Database Systems  234       Lee T       3468        Smith S       B3         Cwk1
Relational Database Systems  234       Lee T       3468        Smith S       B1         Cwk2
Relational Database Systems  234       Lee T       3778        Jones S       B2         Cwk1
Relational Database Systems  234       Lee T       3488        Patel P       B1         Cwk1
Relational Database Systems  234       Lee T       3488        Patel P       B3         Cwk2
Relational Database Design   234       Lee T       3468        Smith S       B2         Cwk1
Relational Database Design   234       Lee T       3468        Smith S       B3         Cwk2
Deductive Databases          345       Evans       3478        Smith J       A1         Exam

Figure 5.1
1. What if we wish to delete student 3468? The result is that we lose some valuable
information. We lose information about deductive databases and its associated
lecturer. This is called a deletion side effect.
2. What if we change the lecturer of deductive databases to V Konstantinou? We
need to update not only the Staff Name but also the Staff No for this module.
This is called an update side effect.
3. What if we admit a new student on to a module, say student-No 3898? We
cannot enter a student record until a student has had at least one assessment.
This is known as an insertion side effect.
4. The size of our sample file is small. One can imagine the seriousness of the file
maintenance anomalies mentioned above multiplying as the size of the file
grows. The above structure is therefore clearly not a good one for the data of this
enterprise. Normalization is a formal process whose aim is to eliminate such file
maintenance anomalies.
Stages of Normalization
Normalization is carried out in the following steps;
1. Collect the data set - the set of data items.
2. Transform the unnormalised data set into tables in first normal form.
3. Transform first normal form tables to second normal form.
4. Transform second normal form tables to third normal form.
Occasionally, the data may still be subject to anomalies in third normal form. In this
case, we may have to perform further steps;
1. Transform third normal form to Boyce-Codd normal form.
2. Transform third normal form to fourth normal form.
3. Transform fourth normal form to fifth normal form.
The process of transforming an unnormalised data set into a fully normalized (Third
normal form) database is frequently referred to as a process of non-loss decomposition.
This is because we continually fragment our data structure into more and more tables
without losing the fundamental relationships between data items.
Functional dependencies
Determinacy/Dependency
Normalization is the process of identifying the logical associations between data items
and designing a database, which will represent such associations, but without suffering
the file maintenance anomalies discussed in section 5.1. The logical associations
between data items that point the database designer in the direction of a good database
design are referred to as determinant or dependent relationships. Two data items, A
and B, are said to be in a determinant or dependent relationship if certain values of data
item B always appear with certain values of data item A. Determinacy/dependency also
implies some direction in the association. If data item A is the determinant data item and
B the dependent data item then the direction of the association is from A to B and not
vice versa.
There are two major types of determinacy or its opposite dependency: functional (single-
valued) determinacy and non-functional (multi-valued) determinacy. We introduce here
the concept of functional determinacy.
Data item B is said to be functionally dependent on data item A if for every value of A
there is one, unambiguous value for B. In such a relationship data item A is referred to
as the determinant data item, while data item B is referred to as the dependent data
item. Functional determinacy is so-called because it is modeled on the idea of a
mathematical function. A function is a directed one-to-one mapping between the
elements of one set and the elements of another set. In a university personnel database,
staff No and Staff Name are in a functional determinant relationship. Staff No is the
determinant and Staff Name is the dependent data item. This is because for every Staff
No there is only one associated value of Staff Name. For example, 345 may be
associated with the value J.Smith. This does not mean to say that we cannot have more
than one member of staff named J.Smith in our organization. It simply means that each
J.Smith will have a different Staff No. Hence, although there is a functional determinacy
from Staff No to Staff Name the same is not true in the opposite direction – Staff Name
does not functionally determine Staff No. Staying with the personnel information, Staff
No will probably functionally determine Department Name. For every member of staff
there is only one associated
department or school name which applies. A member of staff cannot belong to more
than one department or school at any one time.
Determinacy Diagrams
A diagram, which documents the determinacy or dependency between data items we
shall refer to as a determinacy or dependency diagram. Data items are drawn on a
determinacy diagram as labeled ovals, circles or bubbles. Functional dependency is
represented between two data items by drawing a single-beaded arrow from the
determinant data item to the dependent data item. For example, figure 5. 4(A) represents
a number of functional dependencies as diagrams.
Modules

Module Name                  Staff No  Staff Name  Student No  Student Name  Ass Grade  Ass Type
Relational Database Systems  234       Lee T       3468        Smith S       B3         Cwk1
                                                                             B1         Cwk2
                                                   3778        Jones S       B2         Cwk1
                                                   3488        Patel P       B1         Cwk1
                                                                             B3         Cwk2
Relational Database Design   234       Lee T       3468        Smith S       B2         Cwk1
                                                                             B3         Cwk2
Deductive Databases          345       Evans       3478        Smith J       A1         Exam

Figure 5.2
A given cell of the table for the attributes Student No, Student Name, Ass Grade and Ass
Type contains multiple values. Examining the table above we see that Student No,
Student Name, Ass Grade and Ass Type all repeat with respect to Module Name.
(A relation is in first normal form if and only if every non-key attribute is functionally
dependent upon the primary key.)
The attributes Student No, Student Name, Ass Grade and Ass Type are clearly not
functionally dependent on our chosen primary key Module Name. The attributes Staff No
and Staff Name clearly are. This means that we form two tables: one for the functionally
dependent attributes, and one for the non-dependent attributes. We declare a compound
key of Module Name, Student No and Ass Type to be the primary key of this second table.
Splitting the original table in this way gives the following tables:

Modules

Module Name                  Staff No  Staff Name
Relational Database Systems  234       Lee T
Relational Database Design   234       Lee T
Deductive Databases          345       Evans

Assessments

Module Name  Student No  Ass Type  Student Name  Ass Grade

Student Name depends only on Student No, part of the compound key, so removing this partial dependency gives the second normal form tables:

Modules

Module Name                  Staff No  Staff Name
Relational Database Systems  234       Lee T
Relational Database Design   234       Lee T
Deductive Databases          345       Evans

Assessments

Module Name  Student No  Ass Type  Ass Grade

Students

Student No  Student Name
3468        Smith S
3778        Jones S
3488        Patel P
3478        Smith J

Finally, Staff Name depends on Staff No, which in turn depends on Module Name; removing this transitive dependency gives the third normal form tables:

Modules

Module Name  Staff No

Lecturers

Staff No  Staff Name
234       Lee T
345       Evans

Assessments

Module Name  Student No  Ass Type  Ass Grade

Students

Student No  Student Name
3468        Smith S
3778        Jones S
3488        Patel P
3478        Smith J
A diagram incorporating all these business rules is illustrated in figure 5.3. On the basis
of these business rules a schema is produced in 3NF, represented in the bracketing
notation below:

Majors (Student No, Area, Staff No)

Figure 5.3: a determinacy diagram relating Student No, Area and Staff No
This schema is in 3NF because there are no partial dependencies and no interdata
dependencies. However, anomalies will still arise when we come to update this relation.
For instance:
1. Suppose student 123456 changes one of her majors from computer science to
information systems. Doing this means that we lose information about staff No
234 tutoring on computer science. This is an update anomaly.
2. Suppose we wish to insert a new row to establish the fact that staff No 789 tutors
on computer science. We cannot do this until at least one student takes this area
as their major. This is an insertion anomaly.
3. Suppose student 345678 withdraws from the university. In removing the relevant
row we lose information about staff No 567 being a tutor in the area of
information systems. This is a deletion anomaly.
These anomalies can be addressed by decomposing the relation; two possible decompositions are:
Schema 1:
Student Tutors (Student No, Staff No)
Tutor Areas (Staff No, Area)
Schema 2:
Student Tutors (Student No, Area)
Tutor Areas (Staff No, Area)
Figure 5.4: determinacy diagrams showing (A) functional dependencies (e.g. Module Name determines Staff No) and (B) non-functional dependencies

Non-Functional Dependencies
Let us assume that the university maintains a list of languages relevant to the
organization. The university wishes to record which members of staff have which
language skills. Clearly the relationship between staff Nos and languages is not a
functional determinacy. Many staff members may just have one language, but some will
have two or more languages. Also, each language, particularly in the case of European
languages such as English, French and German is likely to be spoken by more than one
staff member.
Therefore, staff No and staff Language are in a non-functional or multi-valued
determinacy. In other words, for every staff No we can identify a delimited set of
language codes, which apply to that staff member.
Multi valued or non-functional dependency is indicated by drawing a double-headed
arrow from the determinant to the dependent data item. Figure 5.4(B) represents two
non-functional relationships as determinacy diagrams.
EUEmployees
Employee No Skill Language
0122443 Typing English
0122443 Typing French
0122443 Dictation English
0221133 Typing German
0221133 Dictation French
0332222 Typing French
0332222 Typing English
However, we wish to add the restriction that each employee exercises skill and language
use independently. In other words, typing as a skill is not inherently linked with the ability
to speak a particular language. Under fourth normal form these two relationships should
not be represented in a single table as in figure 5.5(A). This is evident when we draw the
determinacy diagram as in figure 5.5(B). Having two independent multi-valued
dependencies means that we must split the table into two as below:
EUEmployees
Employee No Skill
0122443 Typing
0122443 Dictation
0221133 Typing
0221133 Dictation
0332222 Typing
EUEmployees
Employee No Language
0122443 English
0122443 French
0221133 German
0221133 French
0332222 French
Figure 5.5: determinacy diagrams showing Staff No multi-determining both Staff Skill and Staff Language
Join Dependencies
Outlets (Agent, Company, Product)
If agents represent companies, companies make products, and agents sell products,
then we might want to record which agent sells which product for which company. To do
this we need the structure above. We cannot decompose the structure because although
agent Jones sells cars made by Ford and vans made by Vauxhall he does not sell Ford
vans or Vauxhall cars.
Fifth normal form concerns interdependent multi-value dependencies, otherwise known
as join dependencies.
Inclusion dependency
An inclusion dependency can be regarded as a generalization of a referential integrity
constraint. It is a constraint on the values appearing in one attribute with respect to the
values appearing in another attribute, possibly in a different table.
Bonus table:
Emp_No Bonus
0001 1000
0002 2500
0003 1500
According to the inclusion dependency, the Emp_No values in the Bonus table must be a
subset of the Emp_No values appearing in the employee master file. Unlike referential
integrity constraints, Emp_No in the Bonus file need not be a foreign key, and the
Emp_No attribute in the employee master file need not be a candidate key.
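As an illustration, a query of the following form could be used to find Bonus rows that violate such an inclusion dependency (EmployeeMaster is an assumed name for the employee master table):
SELECT Emp_No
FROM Bonus
WHERE Emp_No NOT IN (SELECT Emp_No FROM EmployeeMaster)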
CHAPTER 6
Introduction
In this chapter we will discuss the query management function of the DBMS kernel.
Under query management the main topics of query processing and optimization will be
discussed, along with some other related functions.
The DBMS ensures database integrity by maintaining the ACID properties of database
transactions.
A single transaction can contain one or more query functions.
1. Convert the query into a more suitable internal form. First, the original query is
converted into some internal representation, which is more suitable for machine
manipulation.
i. The typical forms used for this representation are:
1. Query tree;
2. Relational algebra.
2. Convert to a more efficient canonical form. This internal representation is further
converted into some equivalent canonical form which is more efficient, making
use of well-defined transformation rules.
3. Choose a set of candidate low-level procedures, using statistics about the
database:
i. Low-level operation
ii. Implementation procedure
iii. Cost formula
4. Generate query plans and choose the best (cheapest) plan by evaluating the
cost formulae
Query Optimization
Query optimization is an important component of a modern relational database system.
When processing user queries, the query optimizer transforms them into a set of low
level operations, and decides in what order these operations are to be executed in the
most efficient way.
In a non-relational database system, any optimization has to be carried out manually.
Relational database systems, however, offer a system-managed optimization facility by
making use of the wealth of statistical information available to the system. The overall
objective of a query optimizer is to improve the efficiency and performance of a relational
database system.
Retrieve the names of students who have borrowed the book B1.
This query can be expressed in SQL:
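One plausible SQL form of this query, assuming Students and Lending tables joined on a Stud_No column (only Stud_Name, Lending and book B1 are taken from the discussion below; the other identifiers are assumptions):
SELECT Stud_Name
FROM Students, Lending
WHERE Students.Stud_No = Lending.Stud_No
AND Lending.Book_No = 'B1'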
2. Select the result of Step 1 for just the tuples for book B1.
• This results in reading the 10,000 joined tuples (obtained in step 1) back into
memory.
• Then Select produces a relation containing only 50 tuples, which can be kept in
memory (see assumption).
3. Project the result of Step 2 over Stud_Name to get result (50 max).
• This results in reading a relation of 50 tuples (obtained in step 2) which is already
in memory, and producing a final relation of no more than 50 tuples, again in
memory.
The number of tuple I/Os in this step is:
0 + 0 = 0
Therefore, the total number of tuple I/Os for query plan A is:
(1,010,000 + 10,000).
Select-Join-Project
1. Select the relation Lending for just the tuples for book B1.
• This results in reading 10,000 tuples of the Lending relation, but only generates a
relation with 50 tuples, which will be kept in memory (see assumption).
Therefore, the total number of tuple I/Os for query plan B is (10,000 + 100).
Database Statistics
Various decisions, which have to be made in the optimization process are based upon
the database statistics stored in the system, often in the form of a system catalogue or a
data dictionary
Schedules
Locking schemes can be described as pessimistic, inasmuch as they make the worst-
case assumption that every piece of data accessed by a given transaction might be
needed by some concurrent transaction and had therefore better be locked.
By contrast, optimistic scheduling also known as certification or validation schemes
make the opposite assumption that conflicts are likely to be quite rare in practice. Thus,
they operate by allowing transactions to run to completion completely unhindered, and
then checking at COMMIT time to see whether a conflict did in fact occur. If it did, the
offending transaction is simply started again from the beginning. No updates are ever
written to the database prior to successful completion of commit processing, so such
restarts do not require any updates to be undone.
Optimistic methods have certain inherent advantages over traditional locking methods in
terms of the expected level of concurrency (i.e., number of simultaneous transactions)
they can support, suggesting that optimistic methods might become the technique of
choice in systems with large numbers of parallel processors.
Recoverability
Recoverability means the possibility of recovering the database itself in the case of a
database failure. Recovery is therefore to "restore the database to a state that is known
to be correct after some failure has rendered the current state incorrect, or at least suspect."
The underlying principles for handling recovery can be summarized in a single word
“redundancy”. By applying this principle, any piece of information the database
contains can be reconstructed (recovered) from some other information stored,
redundantly, somewhere else in the system.
If there is a failure in the database, the recovery manager in the DBMS will recover the
database using some recovery technique. That technique can be rolling back an entire
transaction, or undoing or redoing transactions.
To recover from a failure the recovery manager keeps track of information about the operations performed.
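As a simple illustration of rolling back a transaction, a minimal sketch using standard SQL transaction statements (the table and values are only illustrative):
START TRANSACTION;
UPDATE Modules SET staffNo = 345 WHERE moduleName = 'Deductive Databases';
-- a failure or error is detected before the change is committed
ROLLBACK;
The ROLLBACK undoes the uncommitted update, restoring the previous state of the database.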
Serialisability of schedules
We have now laid the groundwork for explaining the crucial notion of serializability.
Serializability is the generally accepted criterion for correctness for the execution of a
given set of transactions. More precisely, a given execution of a set of transactions is
considered to be correct if it is serializable—i.e., if it produces the same result as some
serial execution of the same transactions, running them one at a time. Here is the
justification for this claim:
Individual transactions are assumed to be correct—i.e., they are assumed to transform a
correct state of the database into another correct state.
Running the transactions one at a time in any serial order is therefore also correct ("any"
serial order, because individual transactions are assumed to be independent of one
another).
An interleaved execution is therefore correct if it is equivalent to some serial execution
i.e., if it is serializable.
Terminology: Given a set of transactions, any execution of those transactions,
interleaved or otherwise, is called a schedule. Executing the transactions one at a time,
with no interleaving, constitutes a serial schedule; a schedule that is not serial is an
interleaved schedule (or simply a nonserial schedule). Two schedules are said to be
equivalent if they are guaranteed to produce the same result, independent of the initial
state of the database. Thus, a schedule is correct (i.e., serializable) if it is equivalent to
some serial schedule.
The point is worth emphasizing that two different serial schedules involving the same set
of transactions might well produce different results, and hence that two different
interleaved schedules involving those transactions might also produce different results
and yet both be considered correct. For example, suppose transaction A is of the form
"Add 1 to x" and transaction B is of the form "Double x" (where x is some item in the
database). Suppose also that the initial value of x is 10. Then the serial schedule A-then-
B gives x = 22, whereas the serial schedule B-then-A gives x = 21. These two results are
equally correct, and any schedule that is guaranteed to be equivalent to either A-then-B
or B-then-A is likewise correct.
The concept of serializability was first introduced (although not by that name) by
Eswaran et al., who also introduced the two-phase locking method:
1. Before operating on any data item (e.g., a database tuple), a transaction must
acquire all necessary locks on that data item.
2. After releasing a lock, a transaction must never go on to acquire any more locks.
CHAPTER 7
Introduction
We have already discussed the integrity of the database. The objective of maintaining
database integrity is to make sure that the database is an accurate reflection of the real
world it is attempting to represent. This is a more critical issue in the multi-user
environment, and the kernel of a DBMS is concerned with preserving this integrity in a
multi-user environment. Database integrity can be lost for many reasons.
In this chapter we examine one of the most important aspects of database integrity by
discussing the concurrency problems and possible solutions to them, such as locking,
time stamping etc.
What is concurrency?
Database systems typically provide multi user access to the database in order to share
the data in the database. The process, which enables this simultaneous access to a
database, is known as concurrency.
What is Transaction?
In a multi-user database system the events that cause changes to a database or retrieve
data from the database are called transactions. A transaction may define as a logical
unit of work. It normally corresponds to some coherent activities performed by an
individual, group, organization or software process on a database. In practice this can
constitute an entire program or a set of commands, which accesses or updates some
database.
Concurrency Problems
There are three concurrency problems, i.e. three types of potential mistake which
could occur if concurrency control is not properly enforced in the database system.
The lost update problem - This relates to a situation where two concurrent transactions,
say A and B, are allowed to update an uncommitted change on the same data item, say
x. The second update by transaction B replaces the value of the first update by
transaction A. Consequently, the updated value of x by A is lost following the second
update by B.
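A minimal sketch of the lost update problem, assuming a hypothetical Accounts table with an accNo key and a balance column (the interleaved order of execution is shown by the comments):
-- the balance of account 1 starts at 100
SELECT balance FROM Accounts WHERE accNo = 1;        -- transaction A reads 100
SELECT balance FROM Accounts WHERE accNo = 1;        -- transaction B also reads 100
UPDATE Accounts SET balance = 110 WHERE accNo = 1;   -- A writes 100 + 10
UPDATE Accounts SET balance = 120 WHERE accNo = 1;   -- B writes 100 + 20, overwriting A's change
-- the final balance is 120: A's update of 10 has been lost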
The inconsistent analysis problem - This problem relates to a situation where transaction
A uses a data item which is in an inconsistent state and, as a result, carries out an
inconsistent analysis.
There are three main approaches to enforcing concurrency control:
• Locking;
• Optimistic scheduling;
• Time stamping.
X – Exclusive Lock (Write Lock)
S – Shared Lock (Read Lock)

Lock compatibility matrix (transaction A holds the lock shown in the row; transaction B requests the lock shown in the column; Y = the request is granted, N = B must wait):

             B requests X   B requests S   B requests -
A holds X    N              N              Y
A holds S    N              Y              Y
A holds -    Y              Y              Y
Two-Phase Locking
Using locks by themselves does not guarantee serialisability of transactions. To
guarantee such serialisability the DBMS must enforce an additional protocol known as
two-phase locking. This technique divides the life of every transaction into two phases:
a growing phase, in which it acquires all the locks it needs to perform its work, and a
shrinking phase, in which it releases its locks. During the growing phase it is not allowed
to release any locks, and during the shrinking phase it is not allowed to acquire any new
locks. This does not mean that all locks must be acquired simultaneously, since a
transaction normally will engage in the process of acquiring some locks, conducting
some processing, then going on to acquire more locks, and so on. The two-phase locking
protocol merely involves enforcing two rules (a simple sketch follows below):
1. A transaction must acquire a lock on a data item before performing any
processing on that data item.
2. Once a transaction releases a lock it is not permitted to acquire any new locks.
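A minimal sketch of a transaction obeying these two rules, assuming a DBMS that supports explicit row locking through SELECT ... FOR UPDATE (the Accounts table is only illustrative):
START TRANSACTION;
SELECT balance FROM Accounts WHERE accNo = 1 FOR UPDATE;    -- growing phase: acquire the lock before processing
UPDATE Accounts SET balance = balance + 10 WHERE accNo = 1; -- processing is done while the lock is held
COMMIT;                                                     -- all locks are released together at commit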
Figure: transactions A and B each hold a lock on one data item while waiting for the lock held by the other, producing a deadlock
There are a number of strategies that can be employed to prevent deadlock. One
strategy is to force transaction B to acquire all required locks prior to execution. This
strategy, however, tends to have a detrimental effect on performance as large
transactions continuously wait to execute.
More effective strategies in database systems involve detecting deadlock after it has
occurred and then resolving it.
A deadlock can be detected by maintaining a wait-for graph of the database transactions
and periodically invoking an algorithm which searches for a cycle in the graph. Each
transaction involved in a cycle is involved in the deadlock.
After the detection algorithm has identified a deadlock, the system must try to recover
from it. The most common solution is to roll back one or more transactions so that the
deadlock can be broken.
Bear in mind that data items held by deadlocked transactions will be unavailable to
other transactions until the deadlock is broken.
The system must determine which transactions, among the set of deadlocked transactions,
to roll back to break the deadlock. A transaction can be selected based on its timestamp value.
A transaction timestamp is a unique identifier assigned to each transaction. The
timestamps are ordered according to the order in which the transactions started.
If T1 starts before T2
Then
TS(T1) < TS(T2)
(Older Tx) (Younger Tx)
In the deadlock prevention mechanism the system will roll back the youngest transaction
out of the several transactions in the transaction list.
CHAPTER 8
Introduction
This chapter will concentrate on the Data organisation and database processing. Data
organisation (sometimes called file organisation) concerns the way in which data is
structured on physical storage devices, the most important being disk devices. The idea
of data organisation and access is inherently inter-linked. In this chapter we shall discuss
the main types of data organisation found in DBMS. We also discuss a type of data
organisation known as a cluster, which is particularly commonplace in contemporary
relational DBMS.
Data models as described in chapter 2 are all forms of logical data organisation. For
instance, relations or tables are logical data structures. A database organized in terms of
any particular data model must be mapped onto the organisation relevant to a physical
storage device. This form of data organisation is known as physical organisation.
Figure 8.1: the storage hierarchy, in which a file consists of records, a record of fields, a field of bytes and a byte of bits

Figure 8.2: a file is physically made up of a number of pages
The layering described above does not always exist in practice. Some file managers are
not particularly well suited to the needs of database applications. In this case, the DBMS
frequently bypasses the file manager and interacts directly with the disk manager to retrieve data.
Figure 8.3: the DBMS issues file requests to the file manager, which in turn calls the disk manager to access data on the disk
Sequential Files
The most basic form of file organisation is the sequential file. In this form of file organisation,
records are placed in the file in the order in which they are inserted. A new record is
simply added to the last block in the file. If there is insufficient space in the last block
then a new block is added to the file. Therefore, insertion into sequential file is very
efficient.
Hence, suppose we have a sequential file of student records. As each student enrolls
with the university a new student record is created and added to the end of the file.
However, searching for a record in a sequential file involves a linear search through the
file record by record. Hence, for a file of N records, N/2 records will be searched on
average. This makes searching through a sequential file that is more than a few blocks
long a slow process.
Another problem arises in relation to deletion activity performed against sequential files.
To delete a record we first need to retrieve the appropriate block from disk. The relevant
record is then marked as deleted and then written back to disk. Because the deleted
records are not reused, a database administrator normally has to re-organise a
sequential file periodically to reclaim deleted space.
Ordered Files
The problems of maintenance associated with sequential files mean that most systems
tend to maintain some form of ordered file organisation. In an ordered file the records
are ordered on the basis of one or more of the fields – generally referred to as the key
fields of the file.
If we store student data as an ordered file then we might use student number as our key
field.
Ordered files allow more efficient access algorithms to be employed to search a file. One
of the most popular of such algorithms is the binary search or binary chop algorithm,
which involves a continuous half-wise refinement of the search space.
Insertion and deletion is more complicated in an ordered file. To insert a new record we
first have to find the correct position for the insertion in the key sequence. If there is
space for the insertion in the relevant block then we can transfer it to main memory, re-
order the block by adding the new record, and then write the block back to disk. If there
is insufficient space in the block then the record may be written to a temporary area
known as overflow. Periodically, when the overflow area becomes full the DBA needs to
initiate a merging of the overflow records with the main file.
Hashed Files
Hashed files provide very fast access to records based on certain criteria. They are
organized in terms of a hash function, which calculates the address of the block a record
should be assigned to or accessed from.
A hashed file must be declared in terms of a so-called hash key. This means that there
can be only one hashed order per file. Inserting a record into a hashed file means that
the key of the record is submitted to a hash function. The hash function translates the
logical key value into a physical key value - a relative block address. Because this
causes records to be randomly distributed throughout a file they are sometimes known
as random Files or direct files.
Hash functions are chosen to ensure even distribution of records throughout a file. One
technique for constructing a hash function involves applying an arithmetic function to the
hash key. For instance, suppose we use student code as the hash key for a student file.
The hash function might involve taking the first three characters from a student code,
converting these characters to an integer, and adding the result to a similar integer
conversion performed on the last three characters of the student code. The result of this
addition would then be used as the address of the block into which the record should be
placed.
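A minimal sketch of such a hash function expressed as an SQL expression, assuming a six-character, purely numeric studentCode column on a Students table (the names are assumptions):
SELECT studentCode,
CAST(SUBSTRING(studentCode FROM 1 FOR 3) AS INTEGER)
+ CAST(SUBSTRING(studentCode FROM 4 FOR 3) AS INTEGER) AS relativeBlockAddress
FROM Students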
Hashed files must have a mechanism for handling collisions. A collision occurs when two
logical values are mapped into the same physical key value. Suppose we have a logical
key value: 2034. We find that feeding this key value through the hash function computes
the relative block address: 12 (in practice, this address would be a hexadecimal
number). A little later we try to insert a record with the key value 5678. We find that this
also translates to relative block address 12. This is no problem if there is space in the
block for the record. As the size of the file grows, however, blocks are likely to fill up. If
the file manager cannot insert a record into the computed block then a collision is said to
have occurred. One of the simplest schemes for handling collisions is to declare an
overflow area into which collided records are placed. As the hashed file grows, however,
the number of records placed in the overflow area increases. This can cause
degradation in access performance. For this reason, a number of more complex
algorithms are now employed to handle collisions. Various types of hashed file are now
also available which dynamically resize themselves in response to update activity.
Clustering
Clustering is a technique whereby the physical organisation of data reflects some aspect
of the logical organisation of data. We may distinguish between two types of cluster:
1. Intra-file clustering
2. Inter-file clustering.
Intra-file clustering involves storing data in order of some key value where by each set of
records having the same key value are organized as a cluster. For instance, we might
decide to cluster a Students table in terms of a department or school code. Hence all
computing students would be stored in a cluster, all humanities students would be stored
as a cluster, and so on.
Inter-file clustering involves interleaving the data from two or more tables. The table
below, for instance, is a clustered version of the Lecturers and Modules tables with
which we are familiar.
Cluster: Teaching

234  Davies T  L
345  Evans R   PL
237  Jones S   SL

(Each lecturer row in the cluster would be followed by the interleaved Modules rows for that lecturer.)
The rationale for clustering Lecturers data with Modules data in this way is to improve
the joining of Lecturer records with Module records. It must be remembered, however,
that clustering, like indexing, is a physical concern. All that matters in terms of the data
model is that the user perceives the data as being organized in relational terms. How the
data is stored on disk is outside the domain of the data model.
For example:
CREATE CLUSTER Teaching
(staffNo NUMBER(5))
We have hence created a cluster named Teaching and declared staffNo to be the
so-called cluster key. This will be used to organise the data in the cluster. Next we
create the tables and assign each table to the cluster:
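A hedged sketch of what these statements might look like, using Oracle-style syntax to match the CREATE CLUSTER example above (the column definitions are assumptions):
CREATE TABLE Lecturers
(staffNo NUMBER(5),
staffName CHARACTER(20),
status CHARACTER(3))
CLUSTER Teaching (staffNo);
CREATE TABLE Modules
(moduleName CHARACTER(15),
staffNo NUMBER(5))
CLUSTER Teaching (staffNo);
In Oracle an index on the cluster key (CREATE INDEX teachingIdx ON CLUSTER Teaching) would also normally be created before any data is inserted.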
It is valid to use clusters to implement established joins, i.e. stable access paths into
your data. However, use of clusters can degrade other access paths, for example, full
table scan on single tables in a cluster.
CHAPTER 9
Introduction
In this chapter we shall examine the essentials of the CODASYL recommendations. We
shall not look at any specific CODASYL system. There are many different
implementations of the CODASYL recommendations on the market. They all vary in
some degree from the CODASYL recommendations, but the variations are for the most
part minor ones. And a grasp of the CODASYL specifications is sufficient for an
understanding of any of the current implementations. The CODASYL recommendations
have developed over a period of about 15 years.
CODASYL is an acronym for Conference on Data System Languages, which is a
voluntary organization representing user groups and manufacturers of computer
equipment. It was CODASYL that was originally responsible for the development of
COBOL.
In a CODASYL database, conceptual records are grouped in two ways:
1. Into groups of conceptual records of the same type, that is, into distinct
conceptual files.
2. Into groups of records called owner-coupled set types or CODASYL sets. An
owner-coupled set grouping of conceptual records embodies a 1:n relationship between
conceptual files.
It is worthwhile gaining a grasp of the owner-coupled set concept even at this stage.
Figure 9.1: an owner-coupled set WAR-EMP, with WAREHOUSE as the owner record type and EMPLOYEE as the member record type
A subschema definition language depends on the programming language used with the
application program. For example, if we intended to manipulate the data base by means
of FORTRAN programs, we would use a subschema definition that was essentially the
same but which has a syntax more like that of FORTRAN. It is quite clear that the
subschema above is in the style of COBOL.
The above subschema is called PERSONNEL and is derived from the conceptual
schema OUTSTANDING-ORDERS.
The subschema is also used to define the User Work Area (UWA), which consists of
record structure variables, each of which can hold a record corresponding to an external
file declared in the subschema.
CHAPTER 10
Database Administration
Introduction
The database administrator (DBA) is responsible for the technical implementation of
database systems, managing the database systems currently in use and setting and
enforcing policies for their use. Whereas the data administrator works primarily at the
conceptual level of business data, the database administrator works primarily at the
physical level. The place where the data administrator and the database administrator
meet is at the logical level. Both the data administrator and database administrator must
be involved in the system-independent specification and management of data.
The need for a specialist DBA function varies depending on the size of the database
system being administered. In terms of a small desktop database system the main user
of the database will probably perform all DBA tasks such as taking regular backups of
data. However, when many users are using a database and the volume of data is
significant, the need for a person or persons who specialize in administration
functions becomes apparent.
VIEWS
A view is a virtual table. It has no concrete reality in the sense that no data is actually
stored in a view. A view is simply a window into a database in that it presents to the user
a particular perspective on the world represented by the database.
We can identify three main interdependent uses for views in a database system: to
simplify, to perform some regular function, or to provide security.
In a university we might want to restrict each department or school to only be able to
update the data relating to its own students. A view declared for each department or
school and attached to a range of user privileges can ensure this. Two such views,
Computing Students and Business Students, might be secured for access only by
administrative staff of the school of computing and the business school, respectively.
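A minimal sketch of what such view definitions might look like, assuming a Students table with a schoolCode column (the names and codes are assumptions):
CREATE VIEW ComputingStudents AS
SELECT * FROM Students WHERE schoolCode = 'COM';
CREATE VIEW BusinessStudents AS
SELECT * FROM Students WHERE schoolCode = 'BUS';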
Encryption
Encryption is a very important issue in data security. When a user attempts to bypass
the system (e.g., by physically removing part of the database, or by tapping into a
communication line) the most effective countermeasure against such threats is data
encryption—that is, storing and transmitting sensitive data in encrypted form.
In the encryption process the plaintext data is encrypted by an encryption algorithm. The
encryption algorithm takes the plaintext and an encryption key as input; the output from
this algorithm, the encrypted form of the plaintext, is called the ciphertext. To recover the
plaintext the user needs the decryption key. The encryption key is kept secret. The
ciphertext, which should be unintelligible to anyone not holding the encryption key, is
what is stored in the database or transmitted down the communication line.
CHAPTER 11
Distributed Database
Introduction
The 1990s have been portrayed by many as the era of the distributed database system
(Ozsu and Valduriez, 1991 ). We will describe some of the major features of distributed
systems within this chapter. This leads us to a discussion of some of the major types of
distributed database system.
A distributed database system is a database system, which is fragmented or replicated
on the various configurations of hardware and software located usually at different
geographical sites within an organisation. Distribution is normally discussed solely in
terms of the fragmentation and replication of data. A data fragment constitutes some
subset of the original database. A data replicate constitutes some copy of the whole or
part of the original database. However, distribution can also be discussed in terms of
distribution of functions. It is for this reason that we include client-server database
systems within our discussion of distributed database systems. Although most current
client-server database systems do not effectively distribute data, they do distribute
functionality. Therefore we can define a distributed database as a collection of multiple,
logically interrelated databases which are physically distributed over a computer
network.
According to above definition the words “logically interrelated” are highlighted to
emphasize the fact that it only makes sense to group together data in a distributed
fashion if there is some relationship between them.
The words “physically distributed” are highlighted to emphasize the fact that two or more
separate sites are usual. This will of course have advantages and disadvantages.
The words "computer network" are highlighted to emphasize the fact that obviously some
type of network technology will have to be utilised in this area. Again there will be
advantages and disadvantages to this.
"A distributed database management system is defined as the software system that
permits the management of the DDBs and makes the distribution transparent to the
users." This is simply to clarify that even in the distributed environment a DBMS is
necessary for management purposes. It shares many of the aspects of a centralized
DBMS, but has to have additional facilities such as making the use of the DDB
transparent to the users.
1. Data dictionary - The data dictionary holds information on sites, fragments and
replicated copies.
2. Remote data processor - The remote data processor software is responsible for
most of the distribution functions; it accesses data distribution information from
the data dictionary and is responsible for processing all requests that require
access to more than one site. An important function of the RDP is to hide the
details of data distribution from the user – this is known as Transparency.
3. Network processor - The network processor provides the communication
primitives that are used by the RDP to transmit commands and data among the
various sites as needed.
Data warehouse
A "data warehouse" is an organization-wide snapshot of data, typically used for
decision-making.
Warehouse properties
• Very large: 100gigabytes to many terabytes (or as big as you can go)
• Tends to include historical data
• Workload: mostly complex queries that access lots of data and do many scans,
joins and aggregations, tending to look for "the big picture". Some workloads are
canned queries (OLAP), some are ad-hoc (general DSS). Parallelism is a must.
• Updates pumped to warehouse in batches (overnight)
• Data may be heavily summarized and/or consolidated in advance (must be done
in batches too, must finish overnight). This is where the lion's share of the
research work has been (e.g. "materialized views") -- a small piece of the
problem.
Data Cleaning
• Data Migration: simple transformation rules (replace "gender" with "sex")
• Data Scrubbing: use domain-specific knowledge (e.g. zip codes) to modify data.
Try parsing and fuzzy matching from multiple sources.
• Data Auditing: discover rules and relationships (or signal violations thereof). Not
unlike data "mining".
Data Load: can take a very long time! (Sorting, indexing, summarization, integrity
constraint checking, etc.) Parallelism a must.
• Full load: like one big transaction; the change from old data to new is atomic.
• Incremental loading ("refresh") makes sense for big warehouses, but transaction
model is more complex – have to break the load into lots of transactions, and
commit them periodically to avoid locking everything. Need to be careful to keep
metadata & indices consistent along the way.
ASSIGNMENT 1
Figure: an ERD for the medical system, with elements labelled Admission, Bed, Treatment, Under, Outdoor, Doctor and Indoor

This is the ERD for the medical system database of a small hospital. The following
assignments are based on this ERD.
Assignments
Question 01
It is required to implement the above ERD as a Relational Database. Write down all the
corresponding relational tables by clearly indicating their primary keys and foreign Keys.
The non-key attributes are not required.
Question 02
Define the following with reference to Relational Database concepts by giving example
from above system.
o Candidate Key
o Primary Key
o Foreign Key
Question 03
“How could data redundancy lead to data inconsistency?” Explain with an example from
the above system.
Question 04
What is data redundancy and how has it been avoided in relational databases?
Question 05
Discuss the advantages of having a database over a traditional filing system.
Question 06
Write down the following rules and discuss their applications in the above relational
database.
ASSIGNMENT 2
Assignments
Question 01
Explain what is meant by DISTRIBUTED DATABASE. Discuss key ADVANTAGES of
managing data using a database.
Question 02
Plan and design a suitable fragmentation plan for the relations of the distributed database.
Question 03
What is meant by the “transparency” in the distributed databases?
Question 04
What is data redundancy and how has it been avoided in distributed databases?
Question 05
Discuss the advantages of having distributed databases over a traditional centralized
database system.
Question 06
Discuss the Role of Database Administrator in the above system.
ASSIGNMENT 3
Assignments
Question 01
Identify FOUR of the main types of data files used in data processing.
Question 02
Explain how data in the MEMBER table can be stored as an INDEXED SEQUENTIAL
file. Use a diagram to illustrate your answer. Assume that about 5 records will fit onto
each disc page. Your diagram should show the required index layers as well as the data
pages.
Question 03
Explain how overflow can occur in INDEXED SEQUENTIAL structures and describe how
it is resolved.
Question 04
A decision has been taken to store the BOOK table as an INDEXED SERIAL file.
Compared to an indexed sequential file, explain why this storage method is less efficient
in terms of access to the data.
Question 05
In relational DBMSs 'clusters' may be implemented. Explain how the use of many
clusters can degrade overall performance of the database.
ASSIGNMENT 4
Assignments
Question 01
Write a series of relational algebra statements to process the same query as specified in
above
Question 02
Calculate how many tuples are read as each statement is processed
Question 03
Calculate how many rows and columns there would be in the resulting relation.
Question 04
State the total number of tuples read in the processing of the query.
Question 05
Rewrite the same query in order to optimize and Calculate how many tuples are read as
each statement is processed
Question 06
Calculate how many rows and columns there would be in the new resulting relation.
ASSIGNMENT 5
Assignments
Question 01
Describe the Database transaction concept in respect to the multi user environment.
Question 02
Describe THREE well-known problems, which can be caused by interleaving the
operations of concurrent transactions.
Question 03
Two main solutions to overcome the problems in (Q2) are locking and time stamping.
Explain how locking and time stamping overcome the above problems.
Question 04
What is deadlock? Show, with the aid of a timing diagram, how locking can cause
deadlocks.
CASE STUDY 1
Description
The new system is planned to provide services/functions to support book loans, book
reservations, issue of overdue loan reminders, book registration, and reader registration.
• To lend a book copy - Loans are one week long. The maximum number of books
on loan for each reader is 10. Readers who reach this limit or with outstanding
overdue loans cannot borrow more books. If the reader borrows the book
reserved by him/her, the corresponding reservation is terminated.
• To send reminders - Overdue loans should be monitored and up to two
reminders sent. When a reminder is sent its date will be stored in the loan record.
• To reserve a book title - A reader can reserve a title if all copies are out on loan.
• To deal with a book return - When a book copy is returned by a reader, the
corresponding loan record is updated and outstanding reservations are checked.
If the book has been reserved, the reservation record is updated to indicate
which copy is available and the appropriate reader is notified.
• To register a new reader - This creates the reader's record.
• To deregister an existing reader - This deletes the reader's record.
• To add a new book copy - This creates the book copy record and the book title
record (if the first copy of the title is added).
• To remove an existing book copy - This deletes the book copy record and the
book title record (if the last copy is removed).
To simplify this case study, other aspects of the system (for example, fines) are not
specified.
Requirements
1. Describe the database design process of above library system by dividing the
process in to three steps:
a. Conceptual database design
b. Logical database design
c. Physical database design
3. Draw the entity event matrix by giving all the entities and their corresponding
events.
4. Draw the entity relationship diagram with its attributes for the library system to
provide the required services.
5. Draw the set of relational database tables to implement the ER diagram that you
created in the above question.
6. Plan the physical database design to implement the above system, specifying
the software and hardware requirements.
CASE STUDY 2
Description
A Manufacturing Company makes Parts and assembles these Parts to make a Product
(such as a universal steering joint, a shock absorber). A Part can be used in more than
one Product. Each Part is made on a machine (such as a capstan lathe) by a Machinist,
an employee of the Company. Machinists work on a particular machine throughout their
employment but Parts can be made on different machines as tool sets can be changed
for machining a different Part. Skilled employees called Setters supervise and manage a
number of machines during the manufacturing process. A customer can buy Products by
completing a Job sheet: this gives details such as the quantity of each product requested
and the selling price.
• What is the total cost of the parts in the assembly of one product with Product No
2209 (a shock absorber)?
• What customer has ordered more than 5 products with Product No 2209 on a
single order?
State any assumptions you make and include any necessary entities or services.
Requirements
2. Draw the entity event matrix by giving all the entities and their corresponding
events.
3. Draw the entity relationship diagram with its attributes for the above system to
provide the required services.
4. Draw the set of relational database tables to implement the ER diagram that you
created in the above question.
5. What are the database constraints required to maintain the database integrity?
Justify your solutions.
7. How would you plan the fragmentation in the new database system? Justify your
answer.