
Mattu University
College of Engineering and Technology

Department of Information Technology

Exit Exam Module for Database System and Information Management

By:

1. Mr. Abebayehu Meketa (MSc.)


2. Mr. Ibsa Beyene (MSc.)

Module Name: Database System and Information Management

Course Names: 1. Fundamentals of Database System (ITec2071)

2. Advanced Database System (ITec3071)

Mattu – Ethiopia
March 2025
1) Mr. Firaol Tesfaye(MSc) __________________ ________________


Name of Reviewer-1 Signature Date
2) Mr. Segni Bedasa (MSc) ___________________ ________________
Name of Reviewer-2 Signature Date
3) Mr/Dr. ___________________ ________________
Approved By Signature Date

Module Name: Database System and Information Management

Course Name: Fundamentals of Database System (ITec2071)

Course Objectives: After completing this course, students will be able to:

 Understand database systems and file systems
 Understand database schemas
 Understand different types of database models
 Understand distributed database systems
 Design and create databases, tables, views, triggers, and indices
 Write SQL queries and database programs
Table of Contents

1. Introduction
   1.1. File based system
   1.2. Database System
        1.2.1. Database Development Lifecycle
        1.2.2. Roles in database design environment
        1.2.3. The ANSI-SPARC Architecture
        Categories of Database System
   1.3. Database Languages
   1.4. Database Model
        A. Hierarchical Model
        B. Network Model
        C. Entity Relationship Data Model
        D. Relational Model
        E. Object oriented Model
        F. NOSQL Database Model
   1.5. Database Design
        i) Requirements Analysis
        ii) Conceptual Design
        iii) Logical Design
        iv) Physical Design
   1.6. Functional Dependencies
   1.7. Normalization using Functional Dependencies
        1.7.1. Normalization
   1.8. Structured Query Language (SQL)
        SQL Basics
2. Introduction to Advanced Database Management System
   2.1. Database query processing and optimization
        2.1.1. Introduction to Query Processing
        2.1.2. Introduction to Query Optimization
   2.2. Transaction Management Procedure
        Transaction Initiation
   2.3. Introduction to Database security
        2.3.1. Database Security technique and procedure
   2.4. Database Failure and Recovery Methods
        Introduction
        2.4.1. Types of Database Failures
        2.4.2. Recovery Methods
   2.5. Distributed Database system
        2.5.1. Types of Distributed Database System
   2.6. Query Processing in Distributed Databases
Appendix A
Appendix B
Appendix C
1. Introduction
1.1. File based system

File-based systems were an early attempt to computerize the manual filing system. A file-based
system is a collection of application programs that perform services for the end-users. Each
program defines and manages its data. It is a method of handling and organizing data in which
information is stored in separate files, often organized in a hierarchical or flat structure. Each file
typically contains a specific type of data related to a particular application or aspect of the system.
In this approach, data is managed directly by the applications themselves, without the use of a
centralized system to oversee data storage and retrieval.

Problems occurred in using the file-based approach are:

 Separation and isolation of data: When data is isolated in separate files, it is more difficult
for us to access data that should be available. The application programmer is required to
synchronize the processing of two or more files to ensure the correct data is extracted.
 Duplication of data: When employing the decentralized file-based approach, uncontrolled
duplication of data occurs. Uncontrolled duplication of data is undesirable
because:
o Duplication is wasteful
o Duplication can lead to loss of data integrity
 Data dependence: Using file-based system, the physical structure and storage of the data files
and records are defined in the application program code. This characteristic is known as
program-data dependence. Making changes to an existing structure is rather difficult and will
lead to modification of the programs. Such maintenance activities are time-consuming and subject
to error.
 Incompatible file formats: The structures of the files are dependent on the application
programming language. A file structure provided by one programming language, such as a
direct or indexed-sequential file available in COBOL, may be
different from the structure generated by another programming language such as C. This direct
incompatibility makes the files difficult to process jointly.
 Fixed queries / proliferation of application programs: File-based systems are very
dependent upon the application programmer. Any required queries or reports have to be written
by the application programmer. Normally, only a fixed-format query or report can be
entertained, and no facility for ad-hoc queries is offered. File-based systems also place
tremendous pressure on data-processing staff, with users' complaints about programs that are
inadequate or inefficient in meeting their demands. Documentation may be limited and
maintenance of the system is difficult. Provision for security, integrity and recovery capability
is very limited.

1.2. Database System

In order to overcome the limitations of the file-based approach, the concepts of the database and the
Database Management System (DBMS) emerged in the 1960s.

A database is a collection of interrelated data that helps in the efficient retrieval, insertion, and
deletion of data from the database and organizes the data in the form of tables, views, schemas,
reports, etc. For Example, a university database organizes the data about students, faculty, admin
staff, etc. which helps in the efficient retrieval, insertion, and deletion of data from it. It represents
some aspect of the real world, sometimes called the mini world or the universe of discourse (UoD).
In the context of websites and applications, when people refer to a “database” they are often talking
about a computer program that allows them to interact with their database. These programs, known
more formally as a database management system (DBMS), are often installed on a virtual private
server and accessed remotely. Redis, MariaDB, and PostgreSQL are a few examples of open-
source DBMSs.

Example: Consider a UNIVERSITY database for maintaining information concerning students,


courses, and grades in a university environment. The database is organized as five files, each of
which stores data records of the same type.

1) STUDENT file: stores data on each student.


2) COURSE file: stores data on each course.
3) SECTION file: stores data on each section of a course.
4) GRADE_REPORT file: stores the grades that students receive in the various sections they
have completed.
5) PREREQUISITE file: stores the prerequisites of each course.

Advantages of Database System

 Control of data redundancy: The database approach attempts to eliminate the redundancy by
integrating the file. Although the database approach does not eliminate redundancy entirely, it
controls the amount of redundancy inherent in the database.
 Data consistency: By eliminating or controlling redundancy, the database approach reduces
the risk of inconsistencies occurring. It ensures all copies of the data are kept consistent.
 More information from the same amount of data: With the integration of the operational data
in the database approach, it may be possible to derive additional information from the same data.
 Sharing of data: The database belongs to the entire organization and can be shared by all
authorized users.
 Improved data integrity: Database integrity refers to the validity and consistency of stored
data. Integrity is usually expressed in terms of constraints, which are consistency rules that the
database is not permitted to violate.
 Improved security: The database approach provides protection of the data from
unauthorized users. It may take the form of user names and passwords to identify user types and
their access rights for operations including retrieval, insertion, updating and deletion.
 Enforcement of standards: The integration of the database enforces the necessary standards
including data formats, naming conventions, documentation standards, update procedures and
access rules.
 Improved maintenance: The database approach provides data independence. Since a change to the
data structure in the database does not affect the application programs, it simplifies database
application maintenance.

1.2.1. Database Development Lifecycle


The Database Development Life Cycle (DBDLC) is a structured approach to designing,
developing, and maintaining a database system. It ensures the database meets the organization's
needs, is robust, and operates efficiently. The life cycle typically includes the following phases:

i) Requirements Analysis

During this step, the database designers have to interview the customers (database users) to
understand the proposed system and obtain and document the data and functional requirements.
The result of this step is a document that includes the detailed requirements provided by the users.

Organization experts (managers, accountants, users) are necessarily involved in the creation of the
database.

Generally,

 Objective: Understand the business needs and define what the database should achieve.
 Key Activities:

Gather requirements from stakeholders.

Identify the types of data to be stored.

Define the relationships between data entities.

Document business rules and constraints.

 Outcome: A requirements specification document


ii) Conceptual Design

The conceptual data model provides a shared, formal representation of what is being
communicated between clients and developers during database development. It is focused on the
data in a database, irrespective of the eventual use of that data in user processes or the implementation
of the data in specific computer environments. Therefore, a conceptual data model is concerned
with the meaning and structure of data, but not with the details affecting how they are implemented.
The conceptual data model then is a formal representation of what data a database should contain
and the constraints the data must satisfy. This should be expressed in terms that are independent
of how the model may be implemented.

Objective: Create a high-level model of the database.


Key Activities:

o Develop an Entity-Relationship Diagram (ERD) or similar models.

o Identify entities, attributes, and relationships.

o Focus on data abstraction and avoid technical details.

Outcome: Conceptual schema that outlines the database structure.

iii) Logical Design

Database design starts with a conceptual data model and produces a specification of a logical
schema; this will determine the specific type of database system (network, relational, object-
oriented) that is required. The relational representation is still independent of any specific DBMS;
it is another conceptual data model. We can use a relational representation of the conceptual data
model as input to the logical design process. The output of this stage is a detailed relational
specification, the logical schema, of all the tables and constraints needed to satisfy the description
of the data in the conceptual data model. It is during this design activity that choices are made as
to which tables are most appropriate for representing the data in a database. These choices must
take into account various design criteria including, for example, flexibility for change, control of
duplication and how best to represent the constraints. It is the tables defined by the logical schema
that determine what data are stored and how they may be manipulated in the database.

Objective: Transform the conceptual model into a logical schema.

Key Activities:

o Map entities and relationships into tables.

o Define primary and foreign keys.

o Normalize the database to reduce redundancy and improve integrity.

Outcome: Logical schema independent of the specific database management system (DBMS).
iv) Physical Design
Objective: Optimize the database for a specific DBMS and hardware environment.
Key Activities:

o Choose data types for attributes.

o Define storage structures (e.g., indexes, partitions).

o Optimize for performance, including query response times.

Outcome: Physical schema tailored to the implementation environment.

1.2.2. Roles in database design environment
Database design involves multiple roles, each contributing to the effective development,
implementation, and maintenance of a database system. Here are the primary roles in a database
design environment:

 Database Administrator (DBA)

A Database Administrator (DBA) is a professional responsible for the management,


maintenance, and optimization of an organization's database systems. They ensure the reliability,
security, and availability of databases while supporting the organization's data needs. The DBA
plays a critical role in both the operational and strategic aspects of database management.

Key Responsibilities of a DBA

1. Database Installation and Configuration:

o Set up database management systems (DBMS) such as Oracle, SQL Server, MySQL, or
PostgreSQL.

o Configure systems for optimal performance and reliability.

2. Performance Monitoring and Tuning:

o Regularly monitor database performance and optimize queries, indexes, and configurations to
maintain efficiency.

3. Data Security:

o Implement access controls, encryption, and other measures to protect sensitive data from
unauthorized access or breaches.

4. Backup and Recovery:

o Develop and execute backup strategies to safeguard data.

o Plan for disaster recovery to minimize data loss in case of hardware failure or cyberattacks.

5. Database Maintenance:

o Perform routine maintenance tasks such as updating software, reorganizing indexes, and
ensuring the database is running smoothly.

6. User Support:

o Provide support to developers and analysts by granting appropriate permissions and


troubleshooting database-related issues.

7. Capacity Planning:

o Forecast future database storage needs and plan for upgrades or expansions.

8. Compliance and Auditing:

o Ensure databases comply with regulatory standards and industry best practices.

o Conduct audits to maintain accountability and data integrity.

Skills and Tools Used by a DBA

 Technical Skills:

o Proficiency in SQL and database programming.

o Knowledge of DBMS like Oracle, Microsoft SQL Server, MySQL, or PostgreSQL.

o Understanding of operating systems and network architecture.

o Expertise in backup, recovery, and security strategies.

 Tools:

o Database monitoring tools such as SolarWinds, Nagios, or Oracle Enterprise Manager.

o Scripting tools (e.g., Python, Bash) for automation.

o Backup and recovery tools like RMAN or Veeam.

 Soft Skills:

o Problem-solving abilities for quick issue resolution.

o Communication skills to work with cross-functional teams and explain complex concepts.

Importance of a DBA

A DBA is essential for:

1. Data Availability: Ensures databases are accessible and responsive to meet business
needs.

2. Data Security: Protects critical data from unauthorized access or loss.

3. Operational Efficiency: Maintains system performance and minimizes downtime.

4. Regulatory Compliance: Ensures databases meet legal and industry requirements.

5. Strategic Planning: Supports organizational growth by planning for future data needs.

 Database Designer

A Database Designer is a professional responsible for designing the structure and architecture of
a database to meet the specific needs of an organization or application. They play a crucial role in
the development lifecycle of a database, ensuring that it is efficient, scalable, and capable of storing
and retrieving data effectively.

Key Responsibilities of a Database Designer

1. Requirement Analysis:

o Understand the needs of the users or organization by collaborating with stakeholders to


determine what data will be stored, accessed, and processed.

2. Database Schema Design:

o Create the conceptual, logical, and physical designs of the database. This includes defining
tables, relationships, constraints, and indexes.

3. Normalization:

o Organize data to reduce redundancy and improve data integrity while maintaining efficient
access.

4. Data Modeling:

o Use tools such as Entity-Relationship Diagrams (ERDs) or Unified Modeling Language


(UML) to visually represent data relationships.

5. Collaboration:

o Work closely with database administrators, application developers, and business analysts to
ensure the database supports application requirements and business objectives.

6. Optimization:

o Design databases with performance and scalability in mind, considering factors like indexing,
query optimization, and data access patterns.

7. Documentation:

Provide clear documentation for the database design, including data dictionaries, schemas, and
entity-relationship diagrams

 Skills and Tools Used by a Database Designer


Technical Skills:

o Proficiency in database management systems (DBMS) like MySQL, PostgreSQL,


Oracle, or Microsoft SQL Server.

o Knowledge of Structured Query Language (SQL).

o Familiarity with database design principles and best practices.

Tools:

o ERD tools such as Lucidchart, ER/Studio, or dbForge Studio.

o Data modeling tools like IBM InfoSphere, SAP PowerDesigner, or Toad Data Modeler.

Soft Skills:

o Analytical thinking and problem-solving.

o Communication skills to collaborate effectively with cross-functional teams.

 Importance of a Database Designer


A database designer is essential to ensure:

o Data Integrity: Ensures that the database consistently represents accurate and reliable data.

o Performance: A well-designed database supports fast and efficient queries.

o Scalability: The database can grow with the organization’s needs.

o Maintainability: Simplifies troubleshooting, updates, and modifications.

1.2.3. The ANSI-SPARC Architecture


The ANSI-SPARC architecture, also known as the three-schema architecture, is a framework
for designing and managing database systems. It was developed in the 1970s by the American
National Standards Institute (ANSI) Standards Planning and Requirements Committee (SPARC).
This architecture provides a clear separation of concerns for database systems by dividing them
into three levels: external, conceptual, and internal. Each level serves a specific purpose and
helps manage complexity while maintaining independence between different users and system
functions.

1. External Level (User View)

Purpose: Represents how individual users or user groups view the data.

Features:

o Each user has a customized view tailored to their specific requirements, often called the
external schema.

o Hides irrelevant details and presents only the data of interest to the user.

o Ensures data security and privacy by restricting access to specific subsets of the database.

Examples: A financial analyst sees sales reports, while a customer support representative sees
customer profiles.
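In relational systems, each external schema is often implemented as a set of views. The following is a minimal sketch, assuming hypothetical SALES and CUSTOMER base tables, of how the two user views above might be defined:

-- External schema for the financial analyst: aggregated sales only
CREATE VIEW analyst_view AS
SELECT product_id, region, SUM(amount) AS total_sales
FROM SALES
GROUP BY product_id, region;

-- External schema for customer support: profiles without billing details
CREATE VIEW support_view AS
SELECT customer_id, name, phone, email
FROM CUSTOMER;

Each user queries only their own view, which hides irrelevant columns and restricts access to specific subsets of the database.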

2. Conceptual Level (Logical View)

Purpose: Represents the entire database logically as a unified view, independent of physical
storage details.

Features:

o Provides a global view of the data, including relationships, constraints, and rules.

o Defines the conceptual schema, which abstracts the data structure and relationships in
terms of entities, attributes, and constraints.

o Ensures physical data independence: changes at the internal level do not affect this level.

Examples: An entity-relationship model describing customers, orders, and products.

3. Internal Level (Physical View)

Purpose: Represents how data is physically stored in the database system.

Features:

o Defines the internal schema, specifying storage structures, file layouts, indexes, and access
paths.

o Handles data optimization and ensures efficient storage and retrieval.

o Ensures physical independence: changes in storage structures do not affect the conceptual
level.

Examples: Data stored as B-trees, hash tables, or heap files.

 Key Benefits of the ANSI-SPARC Architecture


1. Data Abstraction: Clear separation between physical, logical, and user-specific views
simplifies system design and maintenance.

2. Data Independence:

Logical independence: The ability to change the conceptual schema without affecting
external schemas.

Physical independence: The ability to change the internal schema without affecting the
conceptual schema.

3. Flexibility and Scalability: Supports multiple user views and accommodates changes at
different levels without disrupting the overall system.

4. Security and Privacy: External schemas restrict user access to specific parts of the
database.

Categories of Database System

i) Relational Database Model

Since the 1970s, most DBMSs have been designed around the relational model. The most
fundamental elements in the relational model are relations, which users and modern relational

DBMSs (RDBMSs or relational databases) recognize as tables. A relation is a set of tuples, or
rows in a table, with each tuple sharing a set of attributes, or columns:

You can think of each tuple as a unique instance of whatever type of people, objects, events, or
associations the table holds. These instances might be things like employees at a company, sales
from an online business, or test results in a medical lab. For example, in a table that holds employee
records of teachers at a school, the tuples might have attributes like name, subjects, start_date, and
so on.

In the relational model, each table contains at least one column that can be used to uniquely
identify each row, called a primary key. Building on the example of a table storing employee
records of teachers at a school, the database administrator could create a primary key column
named employee_ID whose values automatically increment. This would allow the DBMS to keep
track of each record and return them on an ad hoc basis. In turn, it would mean that the records
have no defined logical order, and users have the ability to return their data in whatever order or
through whatever filters they wish.

If you have two tables that you’d like to associate with one another, one way you can do so is with
a foreign key. A foreign key is essentially a copy of one table’s (the “parent” table) primary key
inserted into a column in another table (the “child”). The following example highlights the
relationship between two tables, one used to record information about employees at a company
and another used to track the company’s sales. In this example, the primary key of the
EMPLOYEES table is used as the foreign key of the TRAINER table:

Table: EMPLOYEES

Table: TRAINER

In the above tables, the primary key of the EMPLOYEES table is Employee_ID, and the foreign key in
the TRAINER table is TrainerPerson_ID.
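As a sketch of how this parent-child link could be declared in SQL (only the key columns follow the example above; the remaining columns are illustrative):

CREATE TABLE EMPLOYEES (
    Employee_ID INT PRIMARY KEY,   -- parent table's primary key (often auto-incremented)
    Name        VARCHAR(50)
);

CREATE TABLE TRAINER (
    Trainer_ID       INT PRIMARY KEY,
    TrainerPerson_ID INT,          -- child table's copy of the parent key
    FOREIGN KEY (TrainerPerson_ID) REFERENCES EMPLOYEES (Employee_ID)
);

With this constraint in place, the DBMS rejects any TRAINER row whose TrainerPerson_ID does not match an existing Employee_ID.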

The relational model’s structural elements help to keep data stored in an organized way, but storing
data is only useful if you can retrieve it. To retrieve information from an RDBMS, you can issue
a query, or a structured request for a set of information. Most relational databases use a language
called Structured Query Language known as SQL and informally pronounced like “sequel” to
manage and query data. SQL allows you to filter and manipulate query results with a variety of
clauses, predicates, and expressions, giving you fine control over what data will appear in the result
set.

There are many open-source RDBMSs available today, including the following:

 MySQL
 MariaDB
 PostgreSQL
 SQLite

ii. Non-relational Databases

Nowadays, most applications still use the relational model to store and organize data. However, the
relational model cannot meet the needs of every application. For example, it can be difficult to scale
relational databases horizontally, and though they’re ideal for storing structured data, they are less
useful for storing unstructured data. These and other limitations of the relational model have led
to the development of alternatives. Collectively, these database models are often referred to as
non-relational databases. Because these alternative models typically don’t implement SQL for
defining or querying data, they are also sometimes referred to as NoSQL databases. This also
means that many NoSQL databases implement a unique syntax to insert and retrieve data.

It can be helpful to think of “NoSQL” and “non-relational” as broad umbrella terms, as there are
many database models that are labeled as NoSQL, with significant differences between them. The
remainder of this section highlights a few of the more commonly used non-relational database
models:

 Key-Value Databases

Key-value databases, also known as key-value stores, work by storing and managing associative
arrays. An associative array, also known as a dictionary or hash table, consists of a collection of
key-value pairs in which a key serves as a unique identifier to retrieve an associated value. Values
can be anything from simple objects, like integers or strings, to more complex objects, like JSON
structures. Redis is an example of a popular, open-source key-value store.

 Document-Oriented Databases

Document-oriented databases, or document stores, are NoSQL databases that store data in the form
of documents. Document stores are a type of key-value store: each document has a unique
identifier (its key) and the document itself serves as the value. The difference between these two
models is that, in a key-value database, the data is treated as opaque and the database doesn’t know

or care about the data held within it; it’s up to the application to understand what data is stored. In
a document store, however, each document contains some kind of metadata that provides a degree
of structure to the data. Document stores often come with an API or query language that allows
users to retrieve documents based on the metadata they contain. They also allow for complex data
structures, as you can nest documents within other documents. MongoDB is a widely used
document database. The documents you store in a MongoDB database are written in BSON, which
is a binary form of JSON.

 Columnar Databases

Columnar databases, sometimes called column-oriented databases, are database systems that store
data in columns. This may seem similar to traditional relational databases, but rather than grouping
columns together into tables, each column is stored in a separate file or region in the system’s
storage. The data stored in a columnar database appears in record order, meaning that the first
entry in one column is related to the first entry in other columns. This design allows queries to
only read the columns they need, rather than having to read every row in a table and discard
unneeded data after it’s been stored in memory. Apache Cassandra is a widely used open-source
column store.

1.3. Database Languages

A DBMS has appropriate languages and interfaces to express database queries and updates. It can
be used to read, store and update the data in the database.

Types of Database Language

i) Data Definition Language (DDL)

Data definition language is used to define and manage all attributes and properties of a database,
including row layouts, column definitions, key columns, file locations, and storage strategy. DDL
statements are used to build and modify the structure of tables and other objects such as views,
triggers, stored procedures, and so on. For each object, there are usually CREATE, ALTER, and
DROP statements (such as, CREATE TABLE, ALTER TABLE, and DROP TABLE).

 Common tasks that come under DDL:


o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to comment on the data dictionary

Most DDL statements take the following form:

o CREATE object_name
o ALTER object_name
o DROP object_name

In DDL statements, object_name can be a table, view, trigger, stored procedure, and so on.
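For illustration, here are DDL statements of this form against a hypothetical Student table (exact ALTER syntax varies slightly between DBMSs):

CREATE TABLE Student (
    Student_ID INT PRIMARY KEY,
    Name       VARCHAR(50),
    Dept       VARCHAR(30)
);

ALTER TABLE Student ADD Age INT;   -- alter the structure by adding a column

TRUNCATE TABLE Student;            -- remove all records but keep the structure

DROP TABLE Student;                -- delete the table object itself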

ii) Data Manipulation Language (DML)

It is used for accessing and manipulating data in a database and handles user requests.

o Select: It is used to retrieve data from a database.


o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table.
o Merge: It performs UPSERT operation, i.e., insert or update operations.
o Call: It is used to call a structured query language or a Java subprogram.
o Explain Plan: It has the parameter of explaining data.
o Lock Table: It controls concurrency.
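A short sketch of the most common DML statements, reusing the hypothetical Student table from the DDL example above:

INSERT INTO Student (Student_ID, Name, Dept) VALUES (1, 'Abebe', 'IT');   -- add a row

SELECT Name, Dept FROM Student WHERE Dept = 'IT';                         -- retrieve rows

UPDATE Student SET Dept = 'CS' WHERE Student_ID = 1;                      -- change existing data

DELETE FROM Student WHERE Student_ID = 1;                                 -- remove the row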
iii) Data Control Language (DCL)

It is used to control access to the stored data, and its execution is transactional, with
rollback support. (However, in the Oracle database, the execution of data control language
statements cannot be rolled back.)

Some tasks that come under DCL are:

o Grant: It is used to give user access privileges to a database.


o Revoke: It is used to take back permissions from the user.

The following operations can have their authorization granted or revoked: CONNECT, INSERT,
USAGE, EXECUTE, DELETE, UPDATE and SELECT.
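For example (the user name is hypothetical):

GRANT SELECT, INSERT ON Student TO exam_user;    -- give access privileges
REVOKE INSERT ON Student FROM exam_user;         -- take the INSERT privilege back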

iv) Transaction Control Language (TCL)

TCL is used to manage the changes made by DML statements and to group them into logical
transactions.

Here are some tasks that come under TCL

o Commit: It is used to save the transaction on the database.


o Rollback: It is used to restore the database to its state as of the last Commit.
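A minimal sketch of a logical transaction (the Account table and values are illustrative; some DBMSs open the transaction with BEGIN instead of START TRANSACTION):

START TRANSACTION;
UPDATE Account SET Balance = Balance - 100 WHERE AccNo = 1;   -- debit
UPDATE Account SET Balance = Balance + 100 WHERE AccNo = 2;   -- credit
COMMIT;                                                       -- save both changes together
-- ROLLBACK; would instead undo both changes made since the last COMMIT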

1.4. Database Model

A database model refers to the structure or framework used to define, organize, and store data in
a database. It determines how data is stored, how relationships between data are managed, and
how data can be accessed or modified. There are several types of database models, each offering
different ways of organizing and managing data.

Here are some common types of database models:

A. Hierarchical Model

This is one of the oldest data models, developed by IBM in the 1960s. In a
hierarchical model, data are viewed as a collection of tables, or segments, that form a
hierarchical relation. The data is organized into a tree-like structure where each record
has one parent record and possibly many children. Even when the segments are connected in a chain-
like structure by logical associations, the resulting structure can be a fan structure with multiple
branches. These logical associations are called directional associations.

In the hierarchical model, the segment pointed to by a logical association is called the child
segment and the other segment is called the parent segment. A segment without a parent
is called the root, and segments which have no children are called the leaves.
The main disadvantage of the hierarchical model is that it can only have one-to-one and one-to-many
relationships between the nodes; many-to-many relationships cannot be represented directly.

Applications of hierarchical model :

Hierarchical models are generally used as semantic models in practice, as many real-world
occurrences are hierarchical in nature, like biological, political, or social
structures.
Hierarchical models are also commonly used as physical models because of the inherent
hierarchical structure of the disk storage system like tracks, cylinders, etc. There are various
examples such as Information Management System (IMS) by IBM, NOMAD by NCSS, etc.

Example: IMS (Information Management System) is a hierarchical database model.

Figure: Hierarchical database model

B. Network Model
Some data were more naturally modeled with more than one parent per child. So, the network
model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on
Data Systems Languages (CODASYL) formally defined the network model. The basic data
modeling construct in the network model is the set construct. A set consists of an owner record

type, a set name, and a member record type. A member record type can have that role in more than
one set, hence the multipart concept is supported. An owner record type can also be a member or
owner in another set.

The data model is a simple network, and link and intersection record types (called junction records
by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is
represented by several pairwise sets; in each set some (one) record type is owner (at the tail of the
network arrow) and one or more record types are members (at the head of the relationship arrow).
Usually, a set defines a 1: M relationship, although 1:1 is permitted. The CODASYL network
model is based on mathematical set theory.

Example: for network database model

C. Entity Relationship Data Model


It is well suited to data modelling for use with databases because it is fairly abstract and is easy to
discuss and explain. ER models are readily translated to relations. ER models, also called an ER
schema, are represented by ER diagrams.

ER modelling is based on two concepts:

o Entities, defined as tables that hold specific information (data)


o Relationships, defined as the associations or interactions between entities

Here is an example of how these two concepts might be combined in an ER data model: Mr. Ebisa
(entity) teaches (relationship) the Database Systems course (entity).

 Entity, Entity Set and Entity Type

An entity can be a real-world object, either animate or inanimate, that can be easily identifiable.
For example, in a school database, students, teachers, classes, and courses offered can be
considered as entities. All these entities have some attributes or properties that give them their
identity. An entity set is a collection of similar types of entities. An entity set may contain entities
with attributes sharing similar values. For example, a Students set may contain all the students of a
school; likewise a Teachers set may contain all the teachers of a school from all faculties. Entity
sets need not be disjoint.

An entity is a real-world thing, either living or non-living, that is easily recognizable. It is
anything in the enterprise that is to be represented in our database. It may be a physical thing or
simply a fact about the enterprise or an event that happens in the real world.

An entity can be a place, person, object, event or concept which stores data in the database. An
entity must have attributes and a unique key. Every entity is made up of
some 'attributes' which represent that entity.

Examples of entities:

o Person: Employee, Student, Patient


o Place: Store, Building
o Object: Machine, product, and Car
o Event: Sale, Registration, Renewal
o Concept: Account, Course

Entities can be classified based on their strength. An entity is considered to be weak if its tables
are existence dependent.

o It cannot exist without a relationship with another entity


o Its primary key is derived from the primary key of the parent entity

An entity is considered strong if it can exist apart from all of its related entities.

o Kernels are strong entities.

o A table without a foreign key or a table that contains a foreign key that can contain nulls
is a strong entity

An entity set is a collection of entities of an entity type at a particular point of time. In an entity
relationship diagram (ERD), an entity type is represented by a name in a box.

Figure: ERD with entity type EMPLOYEE

 Types of Entities
i. Independent entities

Independent entities, also referred to as kernels, are the backbone of the database. They are what
other tables are based on. Kernels have the following characteristics:

o They are the building blocks of a database.


o The primary key may be simple or composite.
o The primary key is not a foreign key.
o They do not depend on another entity for their existence.
ii. Dependent entities

Dependent entities, also referred to as derived entities, depend on other tables for their meaning.
These entities have the following characteristics:

o Dependent entities are used to connect two kernels together.


o They are said to be existence dependent on two or more tables.
o Many to many relationships become associative tables with at least two foreign keys.
o They may contain other attributes.
o The foreign key identifies each associated table.
o There are three options for the primary key:
 Use a composite of foreign keys of associated tables if unique
 Use a composite of foreign keys and a qualifying column

 Create a new simple primary key
iii. Characteristic entities

Characteristic entities provide more information about another table. These entities have the
following characteristics:

o They represent multivalued attributes.


o They describe other entities.
o They typically have a one to many relationship.
o The foreign key is used to further identify the characterized table.
Options for primary key are as follows:
i. Use a composite of foreign key plus a qualifying column
ii. Create a new simple primary key. In the COMPANY database, these might include:
o Employee (EID, Name, Address, Age, Salary) – EID is the simple primary key.
o EmployeePhone (EID, Phone) – EID is part of a composite primary key. Here, EID is also
a foreign key.
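A minimal sketch of the EmployeePhone case in SQL, showing how the composite primary key reuses the foreign key:

CREATE TABLE Employee (
    EID  INT PRIMARY KEY,
    Name VARCHAR(50)
);

CREATE TABLE EmployeePhone (
    EID   INT,
    Phone VARCHAR(20),
    PRIMARY KEY (EID, Phone),                   -- composite key: foreign key plus qualifying column
    FOREIGN KEY (EID) REFERENCES Employee (EID) -- EID is also a foreign key
);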
 Attributes

Each entity is described by a set of attributes (e.g., Employee = (Name, Address, Birthdate (Age),
Salary). Each attribute has a name, and is associated with an entity and a domain of legal values.
However, the information about attribute domain is not presented on the ERD.

Figure: How attributes are represented in an ERD.

 Types of Attributes

There are a few types of attributes you need to be familiar with. Some of these are to be left as is,
but some need to be adjusted to facilitate representation in the relational model. This first section
will discuss the types of attributes. Later on we will discuss fixing the attributes to fit correctly
into the relational model.

 Simple attributes

Simple attributes are those drawn from the atomic value domains; they are also called single-
valued attributes. In the COMPANY database, an example of this would be: Name = {John}; Age =
{23}.

 Composite attributes

Composite attributes are those that consist of a hierarchy of attributes. Address may consist of
Number, Street and Suburb. So this would be written as → Address = {59 + ‘Meek Street’ +
‘Kingsford’}.

Figure: An example of composite attributes

 Multivalued attributes

Multivalued attributes are attributes that have a set of values for each entity. An example of a
multivalued attribute from the COMPANY database, are the degrees of an employee: BSc, MIT,
PhD.

Figure: Example of a multivalued attribute

 Derived attributes

Derived attributes are attributes that contain values calculated from other attributes. Age can be
derived from the attribute Birthdate. In this situation, Birthdate is called a stored attribute, which
is physically saved to the database.

Figure: Example of a derived attribute.
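Because a derived attribute is calculated rather than stored, it is usually computed in a query from the stored attribute. A sketch in PostgreSQL-style SQL, where subtracting two dates yields a number of days (date arithmetic syntax differs between DBMSs, and dividing by 365 is only approximate):

SELECT Name,
       BirthDate,                                -- stored attribute
       (CURRENT_DATE - BirthDate) / 365 AS Age   -- derived attribute, computed on demand
FROM Employee;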

 Keys

An important constraint on an entity is the key. The key is an attribute or a group of attributes
whose values can be used to uniquely identify an individual entity in an entity set.

Types of Keys
 Candidate key

A candidate key is a simple or composite key that is unique and minimal. It is unique because no
two rows in a table may have the same value at any time. It is minimal because every column is
necessary in order to attain uniqueness. From our COMPANY database example, if the entity is
Employee (EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID),
possible candidate keys are:

o EID, SIN
o First Name and Last Name - assuming there is no one else in the company with the same
name
o Last Name and DepartmentID - assuming two people with the same last name don’t work
in the same department
 Composite key

A composite key is composed of two or more attributes, but it must be minimal. Using the example
from the candidate key section, possible composite keys are:

o First Name and Last Name - assuming there is no one else in the company with the same name
o Last Name and Department ID - assuming two people with the same last name don’t work in
the same department.

 Primary key

The primary key is a candidate key that is selected by the database designer to be used as an
identifying mechanism for the whole entity set. It must uniquely identify tuples in a table and not
be null. The primary key is indicated in the ER model by underlining the attribute.

o A candidate key is selected by the designer to uniquely identify tuples in a table. It must not
be null.
o A key is chosen by the database designer to be used as an identifying mechanism for the whole
entity set.

This is referred to as the primary key. This key is indicated by underlining the attribute in the ER
model.

In the following example, EID is the primary key:

Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID).

Secondary key: is an attribute used strictly for retrieval purposes (can be composite), for
example: Phone and Last Name.
Alternate keys: are all candidate keys not chosen as the primary key.
Foreign key: is an attribute in a table that references the primary key in another table, or it
can be null. Both foreign and primary keys must be of the same data type.

In the COMPANY database example below, DepartmentID is the foreign key: Employee(EID,
First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID).
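The following sketch shows how these key types might be declared for the Employee table above, assuming a Department table already exists; SIN is kept unique as an alternate key:

CREATE TABLE Employee (
    EID          INT PRIMARY KEY,    -- the candidate key chosen as primary key
    FirstName    VARCHAR(30),
    LastName     VARCHAR(30),
    SIN          CHAR(9) UNIQUE,     -- alternate key: a candidate key not chosen as primary
    DepartmentID INT,                -- foreign key; may be null
    FOREIGN KEY (DepartmentID) REFERENCES Department (DepartmentID)
);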

 Nulls

A null is a special symbol, independent of data type, which means either unknown or inapplicable. It does
not mean zero or blank. Features of null include:

o No data entry
o Not permitted in the primary key
o Should be avoided in other attributes
o Can represent
 An unknown attribute value
 A known, but missing, attribute value
 A “not applicable” condition
o Can create problems when functions such as COUNT, AVERAGE and SUM are used
o Can create logical problems when relational tables are linked.

NOTE: The result of a comparison operation is null when either argument is null. The result of an
arithmetic operation is null when either argument is null (except functions that ignore nulls).

 Example of how null can be used

Use the Salary table (Salary_tbl) below to follow an example of how null can be used.

Table: Salary_tbl

EmpNo JobName Salary Commission

E10 Teacher 12,500 32,090

E11 Null 25,000 8,000

E12 Teacher 44,000 0

E13 Teacher 44,000 Null

To begin, find all employees (EmpNo) who are teachers (under the JobName column) whose salary
plus commission is greater than 30,000.

SELECT EmpNo FROM Salary_tbl
WHERE JobName = 'Teacher' AND
(Commission + Salary) > 30000;   --> Output: E10 and E12

This result does not include E13 because of the null value in the Commission column: adding
commission and salary for employee E13 yields a null value, so the comparison fails. To ensure
that the row with the null value is included, we need to test the individual fields. The solution is
shown below.

SELECT EmpNo FROM Salary_tbl
WHERE JobName = 'Teacher' AND
(Commission > 30000 OR Salary > 30000 OR (Commission + Salary) > 30000);
--> Output: E10, E12 and E13
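An equivalent way to include E13, assuming the DBMS supports the standard COALESCE function, is to substitute 0 for a null commission before adding:

SELECT EmpNo FROM Salary_tbl
WHERE JobName = 'Teacher'
  AND (COALESCE(Commission, 0) + Salary) > 30000;   -- null commission treated as 0 => E10, E12 and E13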
 Relationships
Relationships are the glue that holds the tables together. They are used to connect related
information between tables. Relationship strength is based on how the primary key of a related
entity is defined. A weak, or non-identifying, relationship exists if the primary key of the related
entity does not contain a primary key component of the parent entity. Company database examples
include:

Customer(CustID, CustName)
Order(OrderID, CustID, Date)

A strong, or identifying, relationship exists when the primary key of the related entity contains the
primary key component of the parent entity. Examples include:

Course(CrsCode, DeptCode, Description)

Class (CrsCode, Section, ClassTime…)

Types of relationship

i) One to one (1:1) relationship

When only one instance of an entity is associated with the relationship, it is known as a one-to-
one relationship. For example, a female can marry one male, and a male can marry one
female.

ii) One-to-many relationship

When only one instance of the entity on the left, and more than one instance of an entity on the
right, is associated with the relationship, this is known as a one-to-many relationship. For
example, a scientist can make many inventions, but each invention is made by only one specific
scientist.

iii) Many-to-one relationship

When more than one instance of the entity on the left, and only one instance of an entity on the
right, is associated with the relationship, it is known as a many-to-one relationship. For example,
a student enrolls in only one course, but a course can have many students.

iv) Many-to-many relationship

When more than one instance of the entity on the left, and more than one instance of an entity on
the right, is associated with the relationship, it is known as a many-to-many relationship. For
example, an employee can be assigned to many projects, and a project can have many employees.
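In a relational schema, a many-to-many relationship is implemented with an associative (junction) table that holds foreign keys to both sides. A sketch for the employee-project example (table and column names are illustrative):

CREATE TABLE Assignment (
    EID       INT,
    ProjectID INT,
    PRIMARY KEY (EID, ProjectID),                       -- composite of the two foreign keys
    FOREIGN KEY (EID)       REFERENCES Employee (EID),
    FOREIGN KEY (ProjectID) REFERENCES Project (ProjectID)
);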

Enhanced Entity – Relationship Model


The enhanced entity–relationship (EER) model (or extended entity–relationship model) in
computer science is a high-level or conceptual data model incorporating extensions to the original
entity–relationship (ER) model, used in the design of databases.

 Prerequisite

Generalization, Specialization and Aggregation in ER model are used for data abstraction in which
abstraction mechanism is used to hide details of a set of objects.

 Generalization

Generalization is the process of extracting common properties from a set of entities and creating a
generalized entity from them. It is a bottom-up approach in which two or more entities can be
generalized to a higher level entity if they have some attributes in common. For Example,
STUDENT and FACULTY can be generalized to a higher level entity called PERSON as shown

below. In this case, common attributes like P_NAME, P_ADD become part of higher entity
(PERSON) and specialized attributes like S_FEE become part of specialized entity (STUDENT).

 Specialization
In specialization, an entity is divided into sub-entities based on their characteristics. It is a top-
down approach where higher level entity is specialized into two or more lower level entities. For
Example, EMPLOYEE entity in an Employee management system can be specialized into
DEVELOPER, TESTER etc. as shown below. In this case, common attributes like E_NAME,
E_SAL etc. become part of higher entity (EMPLOYEE) and specialized attributes like T_TYPE
become part of specialized entity (TESTER).
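One common way to map such a specialization to tables, sketched here as an assumption since several mapping strategies exist, is a table for the higher-level entity plus one table per specialized entity sharing its primary key:

CREATE TABLE EMPLOYEE (
    E_ID   INT PRIMARY KEY,
    E_NAME VARCHAR(50),        -- common attributes stay with the higher-level entity
    E_SAL  DECIMAL(10, 2)
);

CREATE TABLE TESTER (
    E_ID   INT PRIMARY KEY,    -- same key as the parent EMPLOYEE row
    T_TYPE VARCHAR(30),        -- specialized attribute
    FOREIGN KEY (E_ID) REFERENCES EMPLOYEE (E_ID)
);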

 Aggregation
An ER diagram is not capable of representing relationship between an entity and a relationship
which may be required in some scenarios. In those cases, a relationship with its corresponding
entities is aggregated into a higher level entity. For Example, Employee working for a project may
require some machinery. So, REQUIRE relationship is needed between relationship
WORKS_FOR and entity MACHINERY. Using aggregation, WORKS_FOR relationship with its
entities EMPLOYEE and PROJECT is aggregated into single entity and relationship REQUIRE
is created between aggregated entity and MACHINERY.

D. Relational Model
Relational Database Model represents the database as a collection of relations. A relation is
nothing but a table of values. Every row in the table represents a collection of related data values.
These rows in the table denote a real-world entity or relationship. The table name and column
names are helpful to interpret the meaning of values in each row. The data are represented as a set
of relations. In the relational model, data are stored as tables. However, the physical storage of the
data is independent of the way the data are logically organized.

Some popular Relational Database management systems are:

DB2 and Informix Dynamic Server:- IBM


Oracle and RDB:- Oracle
SQL Server and Access:- Microsoft

 Relational Model Concepts in DBMS
Attribute: - Each column in a Table. Attributes are the properties which define a relation.
e.g., Student_Rollno, Name, etc.
Tables: - In the relational model, relations are saved in table format. A relation is stored along
with its entities. A table has two properties, rows and columns. Rows represent records and
columns represent attributes.
Tuple: It is nothing but a single row of a table, which contains a single record.
Relation Schema: A relation schema represents the name of the relation with its attributes.
Degree: The total number of attributes in the relation is called the degree of the relation.
Cardinality: Total number of rows present in the Table.
Column: - The column represents the set of values for a specific attribute.
Relation instance: - Relation instance is a finite set of tuples in the RDBMS system.
Relation instances never have duplicate tuples.
Relation key: An attribute, or set of attributes, that uniquely identifies each row is called the relation key.
Attribute domain – Every attribute has some pre-defined value and scope which is known
as attribute domain.

 Properties of a Table
A table has a name that is distinct from all other tables in the database.
There are no duplicate rows; each row is distinct.

Entries in columns are atomic. The table does not contain repeating groups or multivalued
attributes.
Entries from columns are from the same domain based on their data type including:
o Number (numeric, integer, float, small int…)
o Character (string)
o Date
o Logical (true or false)
Operations combining different data types are disallowed.
Each attribute has a distinct name.
The sequence of columns is insignificant.
The sequence of rows is insignificant.
 Relational Integrity Constraints

Relational integrity constraints in DBMS refer to conditions which must be present for a
valid relation. These relational constraints in DBMS are derived from the rules in the mini-world
that the database represents.

There are many types of integrity constraints in DBMS. Constraints on the relational database
management system are mostly divided into three main categories:

Domain Constraints
Key Constraints
Referential Integrity Constraints
 Domain Constraints

Domain constraints can be violated if an attribute value does not appear in the corresponding
domain or is not of the appropriate data type. Domain constraints specify that within each tuple,
the value of each attribute must be an atomic value drawn from the attribute's domain. Domains are
specified as data types, including standard data types such as integers, real numbers, characters,
Booleans, variable-length strings, etc.

 Key Constraints

An attribute that can uniquely identify a tuple in a relation is called the key of the table. The value
of the attribute for different tuples in the relation has to be unique.

Example

CustomerID CustomerName Status

101 Google Active

102 Amazon Active

103 Apple Inactive

In the given table, CustomerID is a key attribute of the Customer table. There is most likely a
single key for one customer: CustomerID = 101 belongs only to CustomerName = "Google".

 Referential Integrity Constraints

Referential integrity constraints in DBMS are based on the concept of foreign keys. A foreign
key is an attribute of a relation that refers to a key attribute of a different or
the same relation. The referential integrity constraint states that any value the foreign key
takes must exist as a key value in the referenced table.

Example:

In the above example, we have 2 relations, Customer and Billing

The tuple for CustomerID = 1 is referenced twice in the relation Billing. So we know that
CustomerName = "Google" has a billing amount of $300.
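A sketch of how this constraint is declared and what it prevents (column names are illustrative):

CREATE TABLE Billing (
    BillNo     INT PRIMARY KEY,
    CustomerID INT,
    Amount     DECIMAL(10, 2),
    FOREIGN KEY (CustomerID) REFERENCES Customer (CustomerID)
);

-- Rejected by the DBMS: no Customer row with CustomerID = 999 exists
INSERT INTO Billing VALUES (3, 999, 150.00);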

 Operations in Relational Model

Four basic update operations performed on relational database model are:

Insert is used to insert data into the relation


Delete is used to delete tuples from the table.
Modify allows you to change the values of some attributes in existing tuples.
Select allows you to choose a specific range of data.
 Insert Operation

The insert operation gives values of the attribute for a new tuple which should be inserted into a
relation.

 Update Operation
You can see that in the relation table below, CustomerName = 'Apple' is updated from Inactive to Active.

 Delete Operation
To specify deletion, a condition on the attributes of the relation selects the tuple to be deleted.

In the above-given example, the tuple with CustomerName = 'Apple' is deleted from the table. The Delete operation could violate referential integrity if the deleted tuple is referenced by foreign keys from other tuples in the same database.

 Select Operation
In the above-given example, the tuple with CustomerName = 'Amazon' is selected.
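As a sketch, the four operations map onto SQL as follows, using the Customer table from the examples above (the inserted values are illustrative):

-- Insert: add a new tuple to the relation
INSERT INTO Customer (CustomerID, CustomerName, Status)
VALUES (104, 'Samsung', 'Active');

-- Modify (update): change attribute values of existing tuples
UPDATE Customer SET Status = 'Active' WHERE CustomerName = 'Apple';

-- Delete: remove the tuples selected by a condition
DELETE FROM Customer WHERE CustomerName = 'Apple';

-- Select: choose tuples that satisfy a condition
SELECT * FROM Customer WHERE CustomerName = 'Amazon';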

 Advantages of Relational Database Model


Simplicity: A Relational data model in DBMS is simpler than the hierarchical and network
model.
Structural Independence: The relational database is concerned only with data and not with the underlying storage structure. This independence can improve the performance of the model.
Easy to use: The relational model in DBMS is easy to use, as tables consisting of rows and columns are quite natural and simple to understand.
Query capability: It makes it possible for a high-level query language like SQL to avoid complex database navigation.

Data independence: The Structure of Relational database can be changed without having
to change any application.
Scalable: Regarding the number of records (rows) and the number of fields, a relational database can be enlarged to enhance its usability.
 Disadvantages of Relational Model
 Some relational databases impose limits on field lengths that cannot be exceeded.
 Relational databases can sometimes become complex as the amount of data grows, and the
relations between pieces of data become more complicated.
 Complex relational database systems may lead to isolated databases where the information
cannot be shared from one system to another.

 Database Schema
A database schema is the skeleton structure that represents the logical view of the entire database.
It defines how the data is organized and how the relations among them are associated. It formulates
all the constraints that are to be applied on the data. A database schema defines its entities and the
relationship among them. It contains a descriptive detail of the database, which can be depicted by
means of schema diagrams. It’s the database designers who design the schema to help
programmers understand the database and make it useful.

A database schema can be divided broadly into two categories:

Physical Database Schema: This schema pertains to the actual storage of data and its form
of storage like files, indices, etc. It defines how the data will be stored in a secondary storage.
Logical Database Schema: This schema defines all the logical constraints that need to be
applied on the data stored. It defines tables, views, and integrity constraints.

 Database Instance
It is important that we distinguish these two terms individually. Database schema is the skeleton
of database. It is designed when the database doesn't exist at all. Once the database is operational,
it is very difficult to make any changes to it. A database schema does not contain any data or
information.

A database instance is a state of operational database with data at any given time. It contains a
snapshot of the database. Database instances tend to change with time. A DBMS ensures that its
every instance (state) is in a valid state, by diligently following all the validations, constraints, and
conditions that the database designers have imposed.

E. Object oriented Model


The first obstacle faced by programmers using the relational data model was the limited type
system supported by the relational model. Complex application domains require correspondingly
complex data types, such as nested record structures, multivalued attributes, inheritance etc. The
object relational data model extends the relational data model by providing a richer type system
including complex data types and object orientation.

Example: Computer-aided-design and Geographical Information Systems.

F. NOSQL Database Model


Though traditional Relational Database Management Systems (RDBMS) have existed for decades
and are constantly being improved by the database vendors, RDBMS struggle to handle the large
volumes of data. In contrast, a newer category of database technology called NoSQL databases is
able to support larger volumes of data by providing faster data access and cost savings. Basically,
the cost savings and improved performance of NoSQL databases results from a physical
architecture that includes the use of inexpensive commodity servers that leverage distributed
processing.

In addition to distributed processing and inexpensive hardware, NoSQL databases differ
significantly on their approach to maintaining data integrity and consistency. A more relaxed
approach to data consistency helps NoSQL databases improve the performance of data storage.
Because RDBMS highly value data integrity, they use the ACID properties for data consistency, formalized in the early 1980s building on the work of Jim Gray. ACID is an acronym for Atomicity, Consistency, Isolation and Durability and supports the concept of a transaction.

Atomicity: ensures that all tasks with a transaction are performed completely (all or
nothing).
Consistency: ensures that a transaction must leave the database in a consistent state at the
end of the transaction.
Isolation: ensures that transactions are isolated and do not interfere with other transactions.
Durability: ensures that once the transaction is complete, the data will persist permanently,
even in the event of a system failure.

In contrast, NoSQL databases use the CAP theorem (Consistency, Availability and Partition
Tolerance) for data consistency. The CAP theorem states that of the three guarantees, only two can be provided at a given point in time. In other words, NoSQL databases can
have partition tolerance (in a distributed environment) and consistency or partition tolerance and
availability, but not all three factors at the same time. CAP theorem has evolved into what is now
called BASE (Basically Available, Soft state and Eventual consistency).

Basically Available: means that data will be available, however the response from the
database can be a failure to retrieve the data or the data retrieved may be in an inconsistent
state.
Soft state: means that the data can change over time as the database seeks consistency.
Eventual consistency: means that sooner or later, the database will eventually become
consistent.

1.5. Database Design

What is Database Design?

Database design is the process of creating a detailed data model of a database. It involves defining
the structure, format, and organization of the data stored in the database system. The goal is to

optimize data management, ensuring data is stored efficiently, and can be retrieved and updated
quickly and accurately. It involves defining data structures, relationships, and rules for storing,
accessing, and manipulating data. Effective database design reduces redundancy, ensures data
integrity, and optimizes performance.

 Database Design Fundamentals

Database design encompasses several stages that collectively ensure the database meets the needs
of its users and applications. These stages are:

i) Requirements Analysis

The first step in database design is to gather and analyze requirements. This involves
understanding what data needs to be stored, how it will be used, and the expected performance
requirements.

Example of Requirements Analysis

Consider a simple requirement: a library database system needs to store information about
books, authors, and borrowers. Key activities in this stage include:

Interviews and Surveys: Engaging with users, managers, and stakeholders to collect
detailed information about their data needs.

Document Analysis: Reviewing existing documentation, such as reports, forms, and


organizational charts, to understand current data usage and flow.

Requirement Specification: Writing detailed requirements documents that outline all data
needs, including data types, relationships, constraints, and security requirements.

ii) Conceptual Design

Once requirements are gathered, the next step is to create a conceptual design. This involves
creating an abstract model of the database. Using the library example, the conceptual design might include entities like 'Book', 'Author', and 'Member', along with their relationships (e.g., each book is written by one or more authors).

Example of Conceptual Design

Using the library example, the conceptual design might include:

 Entities: Book, Author, Member, Loan

 Attributes

o Book: ISBN, Title, PubYear

o Author: AuthorID, Name

o Member: MemberID, Name, Email

o Loan: LoanID, LoanDate, DueDate, MemberID, ISBN

 Relationships

o A Book can have multiple Authors.

o A Member can borrow multiple Books.

o A Loan links a Member to a Book.

iii) Logical Design


The logical design phase involves translating the conceptual model into a logical schema using a data model such as the relational model. Here, each entity and relationship is defined in detail, specifying attributes and cardinalities.

Example of Logical Design

For the library system, the logical design might involve:

Tables:

o Book: ISBN (PK), Title, PubYear


o Author: AuthorID (PK), Name
o Member: MemberID (PK), Name, Email
o Loan: LoanID (PK), LoanDate, DueDate, MemberID (FK), ISBN (FK)
o BookAuthor: ISBN (FK), AuthorID (FK)

iv) Physical Design


Physical database design decides how the logical database design will be implemented. For
the relational database, this involves:

Defining a set of the table structures, data types for fields, and constraints on these tables such
as primary key, foreign key, unique key, not null and domain definitions to check if data are
out of the range.
Identifying the specific storage structures and access methods to retrieve data efficiently. For
example, adding a secondary index to a relation.
Designing security features for the database system including account creation, privilege
granting/revocation, access protection, and security level assignment.
Logical Design Compared with Physical Design

Generally, Physical design is where you translate the expected schemas into actual database
structures. At this time, you have to map:
Entities to Tables

Relationships to Foreign Keys

Attributes to Columns

Primary Unique Identifiers to the Primary Key

Unique Identifiers to Unique Keys

Example: A physical database design for staff in a department is as follows:

Table: Staff

Column Name Data Type Constraints Description


Staff_id INT PRIMARY KEY, Unique identifier for each staff
AUTO_INCREMENT member.
First_name VARCHAR(50) NOT NULL Staff member's first name.
Last_name VARCHAR(50) NOT NULL Staff member's last name.
Email VARCHAR(100) UNIQUE, NOT Staff member's email address.
NULL
Hire_date DATE NOT NULL Date when the staff member was
hired.

Department_id INT FOREIGN KEY Links to the department table.

Salary DECIMAL(10,2) Staff member's salary.

Table: Department

Column Name Data Type Constraints Description

Department_id INT PRIMARY KEY, Unique identifier for the


AUTO_INCREMENT department.

Department_name VARCHAR(100) NOT NULL, UNIQUE Name of the department.

Manager_id INT FOREIGN KEY ID of the department


manager (linked to staff).
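A minimal SQL sketch of this physical design (MySQL-style syntax; because Staff and Department reference each other, the manager foreign key is added after both tables exist):

CREATE TABLE Department (
    Department_id INT AUTO_INCREMENT PRIMARY KEY,
    Department_name VARCHAR(100) NOT NULL UNIQUE,
    Manager_id INT                      -- foreign key added below
);

CREATE TABLE Staff (
    Staff_id INT AUTO_INCREMENT PRIMARY KEY,
    First_name VARCHAR(50) NOT NULL,
    Last_name VARCHAR(50) NOT NULL,
    Email VARCHAR(100) UNIQUE NOT NULL,
    Hire_date DATE NOT NULL,
    Department_id INT,
    Salary DECIMAL(10,2),
    FOREIGN KEY (Department_id) REFERENCES Department(Department_id)
);

ALTER TABLE Department
    ADD FOREIGN KEY (Manager_id) REFERENCES Staff(Staff_id);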


1.6. Functional Dependencies

Functional Dependency (FD) describes the relationship of one attribute to another attribute in a database management system (DBMS). Functional dependency helps you to maintain the quality of data in the database. A functional dependency is denoted by an arrow →; the dependency X → Y means that X functionally determines Y, i.e., Y is functionally dependent on X. Functional dependency plays a vital role in distinguishing good database design from bad.

Example

In this example, if we know the value of Employee number, we can obtain Employee Name, city, salary, etc. Thus the city, Employee Name, and salary are functionally dependent on Employee number, written Employee_Number → {Employee_Name, City, Salary}.

Key terms

Key Terms Description Terms

Axiom Axioms are a set of inference rules used to infer all the functional dependencies on a relational database

Decomposition It is a rule that suggests if you have a table that appears to contain two entities
which are determined by the same primary key then you should consider
breaking them up into two different tables.

Dependent It is displayed on the right side of the functional dependency diagram

Determinant It is displayed on the left side of the functional dependency Diagram.

Union It suggests that if two tables are separate and the PK is the same, you should consider putting them together.

Rules of Functional Dependencies

Below given are the three most important rules for Functional Dependency:

Reflexive rule: If X is a set of attributes and Y is a subset of X, then X -> Y holds.

Augmentation rule: If X -> Y holds and Z is a set of attributes, then XZ -> YZ also holds. That is, adding the same attributes to both sides does not change the basic dependency.

Transitivity rule: Much like the transitive rule in algebra, if X -> Y holds and Y -> Z holds, then X -> Z also holds. In X -> Y, X is said to functionally determine Y.
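A short worked example (the attribute names are illustrative): suppose Emp_ID -> Dept_ID and Dept_ID -> Dept_Name hold. By the transitivity rule, Emp_ID -> Dept_Name also holds; and by the augmentation rule, {Emp_ID, Salary} -> {Dept_ID, Salary} holds as well.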

 Types of Functional Dependencies


i. Multivalued dependency in DBMS

Multivalued dependency occurs in the situation where there are multiple independent multivalued
attributes in a single table. A multivalued dependency is a full constraint between two sets of attributes in a relation. It requires that certain tuples be present in the relation.

Example

In this example, maf_year and color are independent of each other but dependent on car_model.
In this example, these two columns are said to be multivalued dependent on car_model.

This dependence can be represented like this:

car_model ->> maf_year

car_model ->> colour

(The double arrow ->> denotes a multivalued dependency.)
ii. Trivial Functional dependency:

A functional dependency is called trivial if the attributes on the dependent (right-hand) side are already included in the determinant (left-hand side).

So, X -> Y is a trivial functional dependency if Y is a subset of X.

For example:

Consider this table with two columns Emp_id and Emp_name.


{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is a subset of
{Emp_id,Emp_name}.

iii. Non trivial functional dependency in DBMS

A functional dependency known as a nontrivial dependency occurs when A -> B holds true and B is not a subset of A. In a relationship, if attribute B is not a subset of attribute A, then it is considered a nontrivial dependency.

Company CEO AGE

Microsoft Satya Nadella 51

Google Sundar Pichai 46

Apple Tim Cook 57

Example:

{Company} -> {CEO} (if we know the Company, we know the CEO's name). But CEO is not a subset of Company, and hence it is a non-trivial functional dependency.

iv. Transitive dependency:

A transitive dependency is a type of functional dependency that is indirectly formed by two functional dependencies.

Example:

o {Company} -> {CEO} (if we know the company, we know its CEO's name)
o {CEO} -> {Age} If we know the CEO, we know the Age

Therefore, according to the rule of transitive dependency:

o {Company} -> {Age} should hold; that makes sense because if we know the company name, we can determine its CEO's age.

Note: You need to remember that transitive dependency can only occur in a relation of three or
more attributes.

1.7. Normalization using Functional Dependencies

The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.

X→Y

The left side of FD is known as a determinant, the right side of the production is known as a
dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, and Emp_Address. Here
Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if we
know the Emp_Id, we can tell that employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

 Types of Functional dependency


i. Trivial functional dependency
o A → B has trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial like: A → A, B → B

Example:

Consider a table with two columns Employee_Id and Employee_Name. {Employee_id,


Employee_Name} →Employee_Id is a trivial functional dependency as Employee_Id is a subset
of {Employee_Id, Employee_Name}. Also, Employee_Id → Employee_Id and Employee_Name
→ Employee_Name are trivial dependencies too.

ii. Non-trivial functional dependency
A → B has a non-trivial functional dependency if B is not a subset of A.
o When A ∩ B is empty, then A → B is called completely non-trivial.

Example:

o ID → Name,
o Name → DOB

1.7.1. Normalization

o Normalization is the process of organizing the data in the database.


o Normalization is used to minimize the redundancy from a relation or set of relations. It is
also used to eliminate the undesirable characteristics like Insertion, Update and Deletion
Anomalies.
o Normalization divides the larger table into the smaller table and links them using
relationship.
o The normal form is used to reduce redundancy from the database table.
 Types of Normal Forms
i) First Normal Form (1NF)
A relation will be 1NF if it contains an atomic value.
It states that an attribute of a table cannot hold multiple values. It must hold only single-
valued attribute.
First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.

Example: The relation EMPLOYEE below is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab
The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

ii) Second Normal Form (2NF)


In the 2NF, relational must be in 1NF.
In the second normal form, all non-key attributes are fully functional dependent on the
primary key

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE


25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is
a proper subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
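A minimal SQL sketch of this 2NF decomposition (the data types are assumptions):

CREATE TABLE TEACHER_DETAIL (
    TEACHER_ID INT PRIMARY KEY,
    TEACHER_AGE INT
);

CREATE TABLE TEACHER_SUBJECT (
    TEACHER_ID INT,
    SUBJECT VARCHAR(50),
    PRIMARY KEY (TEACHER_ID, SUBJECT),
    FOREIGN KEY (TEACHER_ID) REFERENCES TEACHER_DETAIL(TEACHER_ID)
);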

iii) Third Normal Form (3NF)


A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.
3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
If there is no transitive dependency for non-prime attributes, then the relation must be in third
normal form.

A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y.

X is a super key.
Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example: EMPLOYEE_DETAIL table with attributes EMP_ID, EMP_NAME, EMP_ZIP, EMP_STATE, and EMP_CITY:

Super key in the table above: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME,
EMP_ZIP}....so on
Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime. Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP>
table, with EMP_ZIP as a Primary key.
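A minimal SQL sketch of this 3NF decomposition (the data types are assumptions):

CREATE TABLE EMPLOYEE_ZIP (
    EMP_ZIP VARCHAR(10) PRIMARY KEY,
    EMP_STATE VARCHAR(50),
    EMP_CITY VARCHAR(50)
);

CREATE TABLE EMPLOYEE_DETAIL (
    EMP_ID INT PRIMARY KEY,
    EMP_NAME VARCHAR(50),
    EMP_ZIP VARCHAR(10),
    FOREIGN KEY (EMP_ZIP) REFERENCES EMPLOYEE_ZIP(EMP_ZIP)
);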

1.8. Structured Query Language(SQL)

SQL is a standard database language used to access and manipulate data in databases. SQL stands
for Structured Query Language. It was developed by IBM Computer Scientists in the 1970s.
By executing queries SQL can create, update, delete, and retrieve data in databases like
MySQL, Oracle, PostgreSQL, etc. Overall, SQL is a query language that communicates with
databases.

SQL Example

We'll create a table to store employee information and populate it with sample data like
Employee_Id, Name, Age, Department, and Salary.

Employee_ID Name Age Department Salary


1 Ajay 25 HR 50000
2 Priya 30 Finance 60000
3 Rahul 28 IT 55000
4 Neha 35 Sales 65000
5 Amit 27 Marketing 52000
6 Anjali 32 HR 58000
7 Vikram 29 IT 56000
8 Meena 31 Finance 59000

If you want to retrieve data from the employees table where the salary is greater than 55000.00, use the SELECT statement.

Query: SELECT * FROM employees WHERE Salary > 55000.00;

Employee_ID Name Age Department Salary

2 Priya 30 Finance 60000

4 Neha 35 Sales 65000

6 Anjali 32 HR 58000

7 Vikram 29 IT 56000

8 Meena 31 Finance 59000

Explanation:

This SELECT statement retrieves records from the employees table where the Salary column is
greater than 55000.00. The * symbol indicates that we want to retrieve all columns for the matching
records. The WHERE clause filters the results based on the specified condition.

 SQL Basics

SQL is a fundamental skill for anyone who wants to interact with databases. This standard query language allows users to create, manage, and retrieve data from relational databases. This section covers the basics of SQL.

 Create Database in SQL


The first step to storing information electronically using SQL is creating a database. In this section we will learn how to create, select, drop, and rename databases, with examples.

SQL CREATE DATABASE


The CREATE DATABASE statement is a foundational SQL command used to create new
databases in SQL-based Database Management Systems (DBMS), including MySQL,
PostgreSQL, SQL Server, and others. Understanding how to use this command effectively is
crucial for developers, database administrators, and anyone working with relational databases.

A database is a repository that organizes data in structured formats through tables, views, stored
procedures, and other components.

Syntax: CREATE DATABASE database_name;

To show created databases:

Syntax: SHOW DATABASES;

Example: Create a database with the name IT:

CREATE DATABASE IT;

USE IT;

SQL DROP DATABASE


The DROP DATABASE command in SQL is used to completely delete a database and all of its
objects, such as tables, indexes, views, and other dependent objects. Once executed, this
operation cannot be undone, so it's crucial to back up the database before performing this action.

This command is typically used in scenarios where a database is no longer required or needs
to be removed for reorganization. Before running the command, ensure that we have
appropriate permissions to drop the database, as only database administrators typically have
the necessary rights.

o Purpose: To remove a database from the DBMS permanently.

o Outcome: All data, schema, and database objects are erased.

Syntax: DROP DATABASE database_name;

Examples of SQL DROP DATABASE

To demonstrate the functionality of the SQL DROP DATABASE command, Let's look at some
examples of the DROP DATABASE statement in SQL

Step 1: Create a Sample Database

First, let's create a database on which we will run the query. This will allow us to perform
operations and test the SQL DROP DATABASE command.

CREATE DATABASE Staff;

Step 2: Verify the Database

To confirm that the Staff database was successfully created, use the following command to list all
databases:

Query: SHOW DATABASES;

Step 3: Drop the Database

Now that the database is confirmed to exist, let's use the DROP TABLE's counterpart for databases, the DROP DATABASE command, to delete the database 'Staff'.

Query: DROP DATABASE Staff;

SQL DROP DATABASE IF EXISTS

To avoid any error while running the DROP DATABASE command use the IF EXISTS clause,
which will delete the database only if it exists in the system.

Syntax: DROP DATABASE IF EXISTS Database_Name;

Example: DROP DATABASE IF EXISTS Staff

SQL Query to Rename Database


Renaming a database in SQL is an essential task that database administrators and developers
frequently perform. Whether you’re reorganizing your data, correcting naming conventions, or
simply updating your project structure, knowing how to rename a database properly is critical.

How to Rename Database in SQL?

To rename a database in SQL, you’ll need to use the ALTER DATABASE command. The syntax
varies slightly between different SQL platforms, but the core functionality remains the same. The
ALTER DATABASE statement allows you to modify the properties of a database, including its
name. However, it’s important to note that while SQL Server uses the ALTER DATABASE
statement with a MODIFY NAME clause, MySQL employs a different approach using
the RENAME DATABASE statement.

To change the name of a database in SQL, use the syntax:

ALTER DATABASE [current_database_name]


MODIFY NAME = [new_database_name];

In older MySQL versions, the following statement was used:

RENAME DATABASE [current_database_name] TO [new_database_name];

(Note: recent MySQL versions have removed RENAME DATABASE because it was unsafe; a common workaround is to create a new database and move the tables into it.)

SQL Rename Database Example

Let’s look at an example of how to rename a database in SQL.

First, we will create a database which will be renamed in the example:

Query: CREATE DATABASE Test

Rename database in SQL Example

In this example, we will use the ALTER command with MODIFY NAME clause to rename
the database.

Query: ALTER DATABASE Test MODIFY NAME = Example

Important Considerations When Renaming a Database

Database Availability: Please be aware that renaming a database may temporarily render
it inaccessible while the process is underway. It is advisable to schedule this task during
off-peak hours, particularly in a production environment.

Dependencies: Ensure that any applications, scripts, or users relying on the database name
are updated to reflect the new designation.

Permissions: Confirm that you possess the requisite permissions to rename the database,
which typically necessitates administrative or root-level access.

Backups: Prior to renaming a database, it is prudent to create a backup, especially in a


production environment, to mitigate the risk of data loss should any issues arise.

SQL Select Database


The USE DATABASE statement is a command in certain SQL-based database management
systems that allows users to select and set a specific database as the default for the current session.
By selecting a database, subsequent queries are executed within the context of that database,
making it easier to interact with tables and other objects contained within it.

Additionally, the SELECT statement in SQL is used to query and retrieve data from the tables within the selected database. In this section, we will work through both with detailed examples.

 USE DATABASE Statement
The USE DATABASE statement is not a standard SQL command, but rather a variation of
the USE command used in some SQL database management systems (DBMS) to select a
specific database for the current session.

This command sets the default database for subsequent queries in that session.

Syntax: USE database_name;

Example of SQL Select Database

Let's take a look at how to select a database in SQL, using MySQL as an example. Suppose we want a database called Technology to hold employee information.

o Create a Database: To begin, you can create a new database if it doesn’t already exist

CREATE DATABASE Technology;

Select the Database: To set Technology as the active database, use the USE command

USE Technology;
How to Query Data from the Selected Database

The SELECT statement in SQL is used to query and retrieve data from the tables within the
selected database. Here are some key ways to use the SELECT statement effectively.

Table: employees

ID Name Age Department Salary

1 Alice 30 Sales 50000

2 Bob 40 Marketing 60000

3 Charlie 35 Sales 55000

4 David 28 HR 45000

5 Eve 45 Marketing 65000

6 Frank 50 HR 70000

7 Grace 29 IT 48000

8 Hannah 38 IT 53000

1. Selecting Specific Columns

SELECT name, age FROM employees;

Output

Name Age

Alice 30

Bob 40

Charlie 35

David 28

Eve 45

Frank 50

Grace 29

Hannah 38

Explanation: Retrieves only the name and age columns for all rows.

2. Filtering Results with WHERE

SELECT name, age FROM employees WHERE age >= 35;

Output

Name Age

Bob 40

Charlie 35

Eve 45

Frank 50

Hannah 38

Explanation: Retrieves names and ages of employees aged 35 or older.

3. Sorting Results with ORDER BY


SELECT name, age FROM employees ORDER BY age DESC;
Output

Name Age

Frank 50

Eve 45

Bob 40

Hannah 38

Charlie 35

Alice 30

Grace 29

David 28

Explanation: Retrieves names and ages of all employees, ordered by age in descending order.

4. Aggregating Data with GROUP BY and Aggregation Functions
SELECT department, AVG(salary) AS average_salary FROM employees GROUP BY department;

Output
Department Average_Salary

Sales 52500

Marketing 62500

HR 57500

IT 50500

Explanation: Calculates the average salary for each department.

Tables in SQL

The cornerstone of any SQL database is the table. These structures function much like spreadsheets, storing data in an organized grid of rows and columns. In this section, you will learn how to create, drop, and delete tables, and more.

 SQL CREATE TABLE

The SQL CREATE TABLE statement is a foundational command used to define and structure
a new table in a database. By specifying the columns, data types, and constraints such
as PRIMARY KEY, NOT NULL, and CHECK, helps you design the database schema.

In this section, we'll learn the syntax, best practices, and practical examples of using the CREATE TABLE statement in SQL. We'll also cover how to create tables from existing data and troubleshoot common errors.

SQL CREATE TABLE Statement

To create a new table in the database, use the SQL CREATE TABLE statement. A table’s
structure, including column names, data types, and constraints like NOT NULL, PRIMARY KEY,
and CHECK, are defined when it is created in SQL.

The CREATE TABLE command is a crucial tool for database administration because these constraints help ensure data integrity.

Syntax: To create a table in SQL, use this CREATE TABLE syntax:

CREATE TABLE table_name
(
    column1 datatype(size),
    column2 datatype(size),
    ...
    columnN datatype(size)
);

table_name: The name you assign to the new table.

column1, column2, ...: The names of the columns in the table.

datatype(size): Defines the data type and size of each column.

Example: CREATE TABLE in SQL and Insert Data.

Let us create a table to store data of Customers, so the table name is Customer, Columns are
Name, Country, age, phone, and so on.
CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(50),
    LastName VARCHAR(50),
    Country VARCHAR(50),
    Age INT CHECK (Age >= 0 AND Age <= 99),
    Phone VARCHAR(15)   -- stored as text so placeholder values like 'xxxxxxxxxx' are valid
);
Output
Customer

CustomerID CustomerName LastName Country Age Phone

Empty

To add data to the table, we use INSERT INTO command, the syntax is as shown below:

Syntax: INSERT INTO table_name (column1, column2, …) VALUES (value1, value2, …);

Example Query

This query will add data to the table named Customer.

INSERT INTO Customer (CustomerID, CustomerName, LastName, Country, Age, Phone)

VALUES (1, 'Shubham', 'Thakur', 'India','23','xxxxxxxxxx'),


(2, 'Aman ', 'Chopra', 'Australia','21','xxxxxxxxxx'),
(3, 'Naveen', 'Tulasi', 'Sri lanka','24','xxxxxxxxxx'),
(4, 'Aditya', 'Arpan', 'Austria','21','xxxxxxxxxx'),
(5, 'Nishant', 'Jain', 'Spain','22','xxxxxxxxxx');
Customer

CustomerID CustomerName LastName Country Age Phone

1 Shubham Thakur India 23 xxxxxxxxxx

2 Aman Chopra Australia 21 xxxxxxxxxx

3 Naveen Tulasi Sri lanka 24 xxxxxxxxxx

4 Aditya Arpan Austria 21 xxxxxxxxxx

5 Nishant Jain Spain 22 xxxxxxxxxx

Create Table From Another Table

We can also use CREATE TABLE to create a copy of an existing table. The new table receives the exact column definitions; all columns or specific columns can be selected.

If an existing table is used to create a new table, by default the new table is populated with the existing values from the old table.
Syntax:
CREATE TABLE new_table_name AS SELECT column1, column2,…
FROM existing_table_name
WHERE ….;
Query:
CREATE TABLE SubTable AS
SELECT CustomerID, CustomerName
FROM customer;

Output
Sub Table

CustomerID CustomerName

1 Shubham

2 Aman

3 Naveen

4 Aditya

5 Nishant

 SQL DROP TABLE

The DROP TABLE command in SQL is a powerful and essential tool used to permanently delete
a table from a database, along with all of its data, structure, and associated constraints such as
indexes, triggers, and permissions. When executed, this command removes the table and all its
contents, making it unrecoverable unless backed up.

DROP TABLE in SQL

The DROP TABLE statement in SQL is used to delete a table and all of its data from the database
permanently. This operation cannot be undone, and once the table is dropped all data in that table
is lost.

Once executed, this operation removes the table definition and all of its rows, so the table can no
longer be accessed or used. This action is irreversible which means that once a table is dropped,
it cannot be recovered unless there is a backup.

Syntax:

The syntax to use the DROP TABLE command in SQL is:

DROP TABLE table_name;

Example:

Step 1: First, we will create a database and table on which the SQL queries will be run.

CREATE DATABASE NewCafe;
USE NewCafe;
CREATE TABLE categories (
CategoryID INT NOT NULL PRIMARY KEY,
CategoryName NVARCHAR(50) NOT NULL,
ItemDescription NVARCHAR(50) NOT NULL
);
INSERT INTO categories (CategoryID, CategoryName, ItemDescription)
VALUES
(1, 'Beverages', 'Soft Drinks'),
(2, 'Condiments', 'Sweet and Savory Sauces'),
(3, 'Confections', 'Sweet Breads');
SELECT * FROM categories;

Output
CategoryID CategoryName ItemDescription

1 Beverages Soft Drinks

2 Condiments Sweet and Savory Sauces

3 Confections Sweet Breads

Step 2: Now, let's use the DROP TABLE statement to delete the categories table permanently.
Query:
DROP TABLE categories;
 SQL DELETE Statement

The SQL DELETE statement is one of the most commonly used commands in SQL
(Structured Query Language). It allows you to remove one or more rows from the table
depending on the situation. Unlike the DROP statement, which removes the entire table,
the DELETE statement removes data (rows) from the table retaining only the table structure,
constraints, and schema.

Syntax:

DELETE FROM table_name WHERE some_condition;
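For instance, a minimal example against the Customer table created earlier:

DELETE FROM Customer WHERE CustomerID = 5;

This removes only the row for CustomerID 5; the table structure, constraints, and remaining rows are unchanged.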

 ALTER (RENAME) in SQL

In SQL, making structural changes to a database is often necessary. Whether it’s renaming a
table or a column, adding new columns, or modifying data types, the SQL ALTER TABLE
command plays a critical role. This command provides flexibility to manage and adjust
database schemas without affecting the existing data.

 What is the ALTER Command in SQL?

The SQL ALTER TABLE command lets users change the structure of an existing table. Whether we need to rename a table, add a new column, or change the name of an existing column, the ALTER command makes schema changes without affecting the data that is already there. This is essential for tasks like:

Renaming a table.

Syntax:

ALTER TABLE table_name

RENAME TO new_table_name;

Changing a column name.

Syntax:

ALTER TABLE table_name

RENAME COLUMN old_column_name TO new_column_name;

Adding or deleting columns.

Syntax:

ALTER TABLE table_name

ADD column_name datatype;

Modifying the data type of a column.

Syntax:

ALTER TABLE table_name

MODIFY COLUMN column_name new_datatype;
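As a sketch, applied to the Customer table from earlier (the new names are illustrative; RENAME COLUMN requires MySQL 8.0 or later):

ALTER TABLE Customer RENAME TO Clients;

ALTER TABLE Clients RENAME COLUMN Phone TO Contact_Number;

ALTER TABLE Clients ADD Joined_date DATE;

ALTER TABLE Clients MODIFY COLUMN Age SMALLINT;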

 Basic SQL Commands


Here is the list of some important SQL Commands:

Commands Descriptions

SELECT Used to retrieve data from a database.

INSERT Adds new records or rows to a table.

UPDATE Modifies existing records in a table.

DELETE Removes records from a table.

CREATE TABLE Creates a new table in the database.

ALTER TABLE Modifies an existing table structure.

DROP TABLE Deletes a table from the database.

CREATE INDEX Creates an index on a table column to speed up data


retrieval.

DROP INDEX Removes an index from a table.

CREATE VIEW Creates a virtual table based on the result of a SQL


statement.

DROP VIEW Deletes a view from the database.

INSERT INTO SELECT Inserts data from one table into another based on a
query.

TRUNCATE TABLE Removes all records from a table, but keeps the table
structure intact.

ALTER TABLE ADD Adds a constraint to a table after it has been created.
CONSTRAINT

GRANT Gives specific privileges to database users.

REVOKE Removes specific privileges from database users.

COMMIT Saves all changes made since the last commit.

ROLLBACK Discards all changes made since the last commit.

BEGIN TRANSACTION Starts a new transaction.

SET TRANSACTION Sets characteristics for the transaction.

Module Name Database System and Information Management

Course Name Advanced Database System (ITec3071)

Course After completing this course students will be able to understand:


Objectives  Database query processing and optimization.
 Basics of transaction management procedure.
 Identify database security technique and procedure.
 Analyze and evaluate the different recovery methods when there is a
database failure.
 Compare and contrast a distributed database system in homogeneous and heterogeneous environments.

2. Introduction to Advanced Database Management System
Advanced Database Management involves the study of complex database systems, focusing on the
more intricate and sophisticated features of databases that are typically not covered in basic
database courses. It builds on the fundamental concepts of relational databases, extending them to
handle larger and more intricate data management challenges. This field is essential for managing
vast amounts of data in various industries, including finance, healthcare, e-commerce, and more.

Key Areas in Advanced Database Management

1. Database Design and Modeling

Normalization and Denormalization: Ensuring data is organized efficiently for query


performance and storage. Advanced techniques often deal with balancing normalization
(reducing redundancy) and denormalization (improving performance).

Entity-Relationship (ER) Models: Designing complex relationships between entities in


large datasets.

Object-Oriented Databases: Combining object-oriented programming principles with


database management for more flexible and reusable data models.

2. Transaction Management

ACID Properties: Ensuring that database transactions are reliable, following the properties
of Atomicity, Consistency, Isolation, and Durability.

Concurrency Control: Techniques to handle multiple users accessing or modifying the


database simultaneously without causing conflicts or inconsistency.

Distributed Databases: Managing databases spread across different locations or servers,


ensuring they work as a single system, maintaining consistency and reliability.

3. Indexing and Query Optimization

Indexes: Advanced indexing strategies, such as B-trees, hash indexing, and bitmap indexes,
are used to speed up search operations in large databases.

Query Optimization: Developing algorithms to improve the performance of SQL queries by
reducing their computational cost. This includes techniques like query rewriting, join
reordering, and parallel execution.

4. Data Warehousing and Big Data

Data Warehouses: Building large-scale systems for storing and querying vast amounts of
data, often used in business intelligence and analytics.

Big Data Management: Technologies such as Hadoop, NoSQL databases (e.g., MongoDB,
Cassandra), and distributed computing frameworks that handle large-scale, unstructured, or
semi-structured data.

5. NoSQL Databases

Types of NoSQL Databases: Exploring key-value stores, document-oriented databases,


column-family stores, and graph databases, which provide more flexibility for handling
unstructured or semi-structured data.

Use Cases: How NoSQL databases support applications requiring high scalability, real-time
data, and horizontal scaling.

6. Cloud Databases

Cloud-Based Database Systems: Managing databases hosted on cloud platforms like AWS,
Google Cloud, or Microsoft Azure. This includes scaling databases dynamically based on
demand.

Database as a Service (DBaaS): Offering managed database services with automatic scaling,
backup, and maintenance.

7. Data Security and Privacy

Encryption: Ensuring data is secure by applying encryption techniques both in transit and
at rest.

Access Control: Defining roles and permissions to restrict access to sensitive data.

Auditing and Compliance: Maintaining records of database activity to ensure adherence to


legal and regulatory standards.

8. Distributed and Parallel Databases

Distributed Systems: Dealing with data that is distributed across multiple machines or
locations, ensuring consistency and fault tolerance (CAP theorem).

Parallel Databases: Enhancing performance by distributing query processing across multiple


processors or servers.

9. Machine Learning and AI in Databases

Predictive Analytics: Using machine learning algorithms to predict trends based on data
stored in databases.

Automated Database Tuning: AI techniques that can optimize queries, index creation, and
database performance automatically.

2.1. Database query processing and optimization

2.1.1. Introduction to Query Processing

Query processing in a database system refers to the series of steps that are performed to translate
and execute a query. The goal of query processing is to efficiently retrieve the required data while
ensuring minimal resource consumption.

Steps in Query Processing

i. Parsing:

The SQL query is parsed to check its syntax and translate it into an internal representation.

The query is converted into a parse tree, which is then analyzed for correctness and
transformed into a query tree or relational algebra expression.

Syntax errors are detected in this step, such as missing clauses or incorrect keywords.

ii. Translation:

The query is transformed into an intermediate representation, often in the form of relational
algebra, which can be used to generate the actual execution plan.

For example, SQL queries are converted to relational algebra expressions like select, project,
and join operations.

iii. Optimization:

The main goal here is to improve the efficiency of the query execution. The database engine
explores different ways of executing the query to determine the most efficient plan.

The query optimizer evaluates multiple execution plans and chooses the one that minimizes
time or resource consumption (e.g., CPU, memory, I/O).

Optimizations can be logical (changing the sequence of operations) or physical (choosing


the best algorithm for each operation).

iv. Execution:

Once the optimal query execution plan is chosen, the database engine executes the query using
the selected strategy.

The result is retrieved from the database and returned to the user.

 Query Processing Example

Example Query:

SELECT customer_name, order_date


FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE order_date > '2024-12-05';

Step-by-Step Query Processing:

1. Parsing:

The database parses the SQL query to ensure its syntax is correct.

The SQL query is converted into an internal parse tree.

During this step, the SQL engine checks for errors like invalid column names or missing
FROM clauses.

2. Translation:

The parsed query is converted into relational algebra. For example, the JOIN operation between the customers and orders tables is translated into a relational algebra expression like σ_order_date > '2024-12-05' (π_customer_name, order_date (customers ⨝ orders)).

The selection (σ) operation filters the rows based on the order_date, and projection (π) limits
the output columns to customer_name and order_date.

3. Optimization:

During optimization, the database may apply various techniques to improve query
performance, such as reordering operations or using indexes.

4. Execution:

After the query is optimized, the database engine executes it by performing the join operation,
filtering the data using the condition order_date > '2024-12-05', and returning the results.

2.1.2. Introduction to Query Optimization


Query optimization is a critical part of query processing. It involves improving the execution plan
to reduce the time and resources needed to retrieve the result. This process can be complex due to
the vast number of potential execution plans for a given query.

Types of Query Optimization:

I. Heuristic Optimization:

This optimization method involves applying a set of rules (heuristics) to the query in order to
rearrange its operations to minimize execution cost.

Common rules include:

o Selectivity Pushdown: Moving selection operations as early as possible in the query plan
to reduce the number of rows involved.

o Join Commutativity: Changing the order of joins, as certain join orders can be more
efficient.

o Projection Pushdown: Moving projection (column selection) as early as possible to


reduce the number of columns being worked on during subsequent operations.

II. Cost-Based Optimization
o This involves evaluating different execution plans based on their cost, which is usually
a combination of factors like:

 CPU time: The time needed to process the query.

 I/O cost: The number of disk reads and writes needed.

 Memory usage: The amount of memory required.

o A cost model is used to calculate the cost for each potential plan. The query optimizer then
chooses the plan with the lowest cost.

III. Transformational Rules:

o These are used in both heuristic and cost-based optimization. Transformational rules allow
a query to be restructured into an equivalent query that may be more efficient to execute.

o Common transformations include:

 Join reordering: Changing the order in which tables are joined.

 Subquery flattening: Rewriting subqueries into joins or other more efficient
expressions.

IV. Join Optimization:

a. Joins are typically the most expensive operations in a query, so much of optimization is
aimed at choosing the most efficient join algorithm.

b. Common types of join algorithms include:

i. Nested-loop join: A simple but often inefficient algorithm.

ii. Merge join: Efficient when the input tables are already sorted.

iii. Hash join: Efficient for large, unsorted datasets.

 Join Ordering
The order in which joins are performed can significantly impact the query performance.
Optimizers use a search space to find the most efficient join order, but this can be
computationally expensive.
Dynamic programming and greedy algorithms are often used to explore different join
orders and choose the most efficient one.

 Physical Query Optimization

Physical query optimization focuses on selecting the best physical implementation for each
relational operation in the query plan. It involves:

Index Selection: Choosing whether to use indexes for searching or sorting data, based on
their availability and cost.

Access Path Selection: The physical method used to access data (e.g., table scan, index
scan).

Join Strategy Selection: Choosing the best algorithm for joins (nested-loop, hash join, etc.).

Sorting: Deciding whether to use external sorting, in-memory sorting, or other methods.

 Advanced Optimization Techniques

I. Materialized Views:

These are precomputed results of a query stored in the database. The optimizer can choose to use a materialized view instead of recalculating a complex query repeatedly (see the sketch after this list).

II. Query Rewriting:

Sometimes queries can be rewritten to improve performance. For example, converting a query
that requires a full scan to one that can use an index or eliminating redundant operations.

III. Distributed Query Optimization:

In distributed databases, optimization must account for the cost of network communication.
It must decide which part of the query to execute on which node and how to handle data
transfer.
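A minimal sketch of item I, using PostgreSQL-style syntax (the view name and query are illustrative):

CREATE MATERIALIZED VIEW dept_salary_summary AS
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

-- Later queries read the precomputed result directly:
SELECT * FROM dept_salary_summary;

-- The stored result can be refreshed when the base data changes:
REFRESH MATERIALIZED VIEW dept_salary_summary;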
 Query Optimization Examples
Optimization Example 1: Using Indexes

Consider the query:

SELECT product_name FROM products WHERE product_id = 123;

Without Index: If there is no index on product_id, the database will have to scan all the rows
in the products table to find the matching product. This can be inefficient for large tables.

With Index: If there is an index on product_id, the database can quickly locate the product
using the index, reducing the need to scan the entire table. This drastically improves
performance.

Optimized Query Plan with Index:

The optimizer may choose to use an Index Scan instead of a Full Table Scan, which is a
more efficient way to retrieve the data.
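A minimal sketch of creating such an index (the index name is illustrative):

CREATE INDEX idx_product_id ON products (product_id);

With the index in place, the optimizer can satisfy WHERE product_id = 123 with an index lookup instead of scanning every row.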

2.2. Transaction Management Procedure

Transaction management is a systematic process used to ensure the proper handling of database
transactions, maintaining data consistency, integrity, and security throughout their lifecycle.
Below is a standard procedure for managing transactions effectively.

 Transaction Initiation

Start the Transaction: A transaction begins when an application or user requests to perform
an operation that will modify the database. This is typically done by sending a BEGIN
TRANSACTION command (in SQL).

Establish Transaction Context: Ensure the necessary context (like user identification and
session details) is established for auditing and tracking.

 Transaction Execution

Data Manipulation: During the execution phase, the transaction performs operations such
as inserts, updates, or deletes on the database. These operations are not yet permanent.

Maintain ACID Properties:

o Atomicity: Ensure that operations are grouped in a way that they all succeed or fail
together. If any operation fails, the entire transaction must be rolled back.

o Consistency: Ensure that the database maintains its consistency rules, such as
constraints, triggers, and referential integrity.

o Isolation: Ensure that the changes made by the transaction are not visible to other
transactions until the transaction is committed. Techniques like locking, multiversion
concurrency control (MVCC), or isolation levels (e.g., Serializable, Read Committed) may
be used.

o Durability: Once committed, the transaction’s effects must be permanent, and even if the
system crashes afterward, the data must remain intact.

 Concurrency Control

o Locking Mechanisms: Use locks (e.g., shared or exclusive locks) to prevent conflicts
between concurrent transactions that may lead to inconsistencies or conflicts.

o Deadlock Detection and Resolution: Implement techniques for detecting deadlocks
(where two or more transactions are waiting for each other to release resources) and take
corrective actions, such as rolling back one of the transactions.

o Optimistic Concurrency: In some cases, allow transactions to execute without locks and
only validate at commit time, rolling back if conflicts are detected.

 Commit or Rollback

Commit Transaction: Once the transaction has completed successfully and all operations are
ready to be finalized, issue a COMMIT statement to make all changes permanent.

Rollback Transaction: If an error occurs or the transaction cannot be completed (e.g., due to
a violation of constraints, business logic failure, or system failure), issue a ROLLBACK
statement to undo all changes made during the transaction.
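A minimal sketch of this pattern in SQL (the accounts table and amounts are illustrative):

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 50 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 50 WHERE account_id = 'B';

-- If both updates succeed, make the changes permanent:
COMMIT;

-- If anything failed (e.g., a constraint violation), we would instead undo everything:
-- ROLLBACK;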

 Error Handling
Transaction Failure: If any part of the transaction encounters an error (e.g., a constraint
violation, a system crash, or an integrity violation), the transaction should be rolled back to
maintain consistency.
Automatic Recovery: Implement recovery mechanisms to restore the system to a consistent
state after a failure. This may involve transaction logs, backups, or journaling.
Logging: Maintain detailed logs of all transaction activities. These logs can be used for
recovery and auditing purposes.
 Logging and Auditing

Transaction Log: Keep a detailed record of all changes made during the transaction,
including the start and end time, data modified, and the user initiating the transaction.

Audit Trails: Ensure that a secure audit trail is kept, recording any changes made to critical data, especially for sensitive operations like financial transactions.

 Transaction State Management

Active: When the transaction is in progress and has not yet been committed or rolled back.

Partially Committed: When the transaction has executed, but has not yet been fully
committed.

Committed: When the transaction has been successfully completed and the changes are
permanent.

Aborted: When the transaction has failed and any changes have been undone.

 Concurrency Isolation Levels

Read Uncommitted: Transactions can see uncommitted changes from other transactions,
which can lead to dirty reads.

Read Committed: A transaction can only read committed data, avoiding dirty reads but still
susceptible to non-repeatable reads.

Repeatable Read: Prevents dirty and non-repeatable reads by locking the data being read,
ensuring the same result if read multiple times.

Serializable: The highest isolation level, ensuring that transactions are executed as if they were
processed serially (one after the other), eliminating all concurrency issues.
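In SQL, the isolation level can be chosen explicitly for a transaction; a sketch using standard syntax:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Higher levels eliminate more anomalies but permit less concurrency, so the choice trades correctness guarantees against throughput.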

 Performance Optimization

Minimize Transaction Duration: Keep transactions short and avoid holding locks for
extended periods, as long transactions can reduce system throughput and cause contention.

Batch Processing: For large data modifications, consider using batch processing to reduce
the load on the system.

Use Indexing: Ensure proper indexing of frequently accessed columns to speed up data
retrieval during transactions.

Avoid Blocking: Use techniques like deadlock detection and transaction queuing to prevent
transactions from blocking each other unnecessarily.

 Compliance and Security

Data Integrity: Ensure that all transactions respect the integrity constraints of the database,
such as foreign keys, uniqueness, and check constraints.

Authorization: Only authorized users or systems should be allowed to initiate or modify


transactions.

Confidentiality: Sensitive data should be encrypted, and access should be restricted to
prevent unauthorized access during the transaction.

 System Recovery and Transaction Logging

Transaction Logs: Keep a detailed log of all transactions that include both committed and
aborted transactions to allow for recovery in case of failure.

Checkpoints and Rollback Segments: Implement checkpoints in the system to periodically


save the state of the system to enable faster recovery after a crash.

 Concurrent Executions
Multiple transactions are allowed to run concurrently in the system. The main advantages of concurrent execution are:

o Increased processor and disk utilization, leading to better transaction throughput

E.g. one transaction can be using the CPU while another is reading from or writing to the disk

o Reduced average response time for transactions: short transactions need not wait behind long
ones.
o Concurrency control schemes: mechanisms to achieve isolation

That is, to control the interaction among the concurrent transactions in order to prevent them from
destroying the consistency of the database.

• Schedules: sequences of instructions that specify the chronological order in which instructions of concurrent transactions are executed.

o A schedule for a set of transactions must consist of all instructions of those transactions and must preserve the order in which the instructions appear in each individual transaction.

o A transaction that successfully completes its execution will have a commit instruction as its last statement. By default, a transaction is assumed to execute a commit instruction as its last step.

o A transaction that fails to complete its execution successfully will have an abort instruction as its last statement.
• Schedule 1

o Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
o An example of a serial schedule in which T1 is followed by T2.

• Schedule 2

A serial schedule in which T2 is followed by T1.

• Schedule 3

o Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1.

Note: In Schedules 1, 2, and 3, the sum A + B is preserved.
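Since the schedule figures are not reproduced here, the two transactions can be sketched in the usual read/write notation (temp is a local variable):

    T1: read(A); A := A - 50; write(A); read(B); B := B + 50; write(B)
    T2: read(A); temp := 0.1 * A; A := A - temp; write(A); read(B); B := B + temp; write(B)

Schedule 1 simply runs every instruction of T1 before any instruction of T2; Schedule 3 interleaves the two but, as noted, produces the same final values of A and B as Schedule 1.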
• Serializability

Basic Assumption: Each transaction preserves database consistency. Thus, serial execution of a set of transactions preserves database consistency. A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:

o Conflict serializability

o View serializability
• Simplified View of Transactions

o We ignore operations other than read and write instructions.
o We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes.
o Our simplified schedules consist of only read and write instructions.

• Conflicting Instructions

Let li and lj be two instructions of transactions Ti and Tj, respectively. Instructions li and lj conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

I. li = read(Q), lj = read(Q): li and lj do not conflict.
II. li = read(Q), lj = write(Q): they conflict.
III. li = write(Q), lj = read(Q): they conflict.
IV. li = write(Q), lj = write(Q): they conflict.

Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
• Conflict Serializability

If a schedule S can be transformed into a schedule S′ by a series of swaps of non-conflicting instructions, we say that S and S′ are conflict equivalent. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions. Therefore, Schedule 3 is conflict serializable.
Example of a schedule that is not conflict serializable:

We are unable to swap instructions in the above schedule to obtain either the serial schedule <T3, T4> or the serial schedule <T4, T3>.
2.3. Introduction to Database Security

Database security refers to the measures, policies, and practices employed to protect database
systems against unauthorized access, misuse, and malicious attacks. It ensures the confidentiality,
integrity, and availability of the data stored within the database, making it a critical aspect of any
organization's overall cybersecurity strategy.
As organizations increasingly rely on databases to store sensitive information, such as customer records, financial data, and intellectual property, the importance of robust database security has grown. The risks of neglecting database security include data breaches, financial loss, reputational damage, and compliance violations.
A. Key Objectives of Database Security

Confidentiality: Ensuring that only authorized users can access sensitive data.

Integrity: Maintaining the accuracy and reliability of the data.

Availability: Guaranteeing that authorized users have reliable access to data when needed.

B. Common Threats to Database Security

Unauthorized Access: Gaining access to the database without proper permissions.

SQL Injection: Exploiting vulnerabilities in database queries to manipulate data.

Malware and Ransomware: Infecting databases with malicious software.

Insider Threats: Employees or contractors misusing their access privileges.

Data Leaks: Accidental or intentional exposure of sensitive data.

Physical Threats: Theft or damage to the hardware hosting the database.
2.3.1. Database Security Techniques and Procedures

A. Authentication and Authorization

Authentication is a critical aspect of database security that ensures that only authorized users or systems can access a database and its associated resources. It is the process of verifying the identity of users or applications attempting to interact with the database. Proper authentication mechanisms prevent unauthorized access and help protect sensitive data.
• Common methods include:

Password-based Authentication: Users must provide a username and password to gain access.

Multi-factor Authentication (MFA): Requires multiple forms of identification, such as a password and a code sent to a mobile device, for added security.

Single Sign-On (SSO): Allows users to authenticate once and gain access to multiple systems or databases.

Biometric Authentication: Uses physical characteristics like fingerprints or facial recognition.
Authorization in database security refers to the process of granting or denying access to specific resources within a database system based on user roles and permissions. It determines what actions a user can perform on the database, such as reading, writing, modifying, or deleting data. This can be done using:

Role-Based Access Control (RBAC): Users are assigned roles with specific privileges. For example, an admin has full access, while a regular user may only have read access.

Discretionary Access Control (DAC): The owner of the data determines who has access.

Mandatory Access Control (MAC): Access control policies are enforced by the system, not the user.

Attribute-Based Access Control (ABAC): Uses attributes (e.g., user roles, location) to define access permissions.
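A minimal RBAC sketch in standard SQL (the role, table, and user names are hypothetical); it also illustrates the least-privilege idea by granting only the rights the role actually needs:

    CREATE ROLE clerk;
    GRANT SELECT, INSERT ON orders TO clerk;   -- only the privileges the job requires
    GRANT clerk TO app_user;                   -- the user inherits the role's privileges
    REVOKE INSERT ON orders FROM clerk;        -- withdraw a privilege when no longer needed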
B. Data Encryption

Encryption at Rest: Ensures that data stored in the database (e.g., files, backups) is encrypted, making it unreadable without the proper decryption key. Techniques include:

o Transparent Data Encryption (TDE): Encrypts the entire database file system.

o Column-Level Encryption: Encrypts sensitive fields, such as Social Security numbers or credit card details, within the database.

Encryption in Transit: Protects data as it moves between the database and applications, using protocols like SSL/TLS to prevent interception and tampering.
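As an illustrative sketch of encryption at rest, TDE in SQL Server-style syntax is enabled roughly as follows (the database, certificate, and password names are hypothetical):

    USE master;
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ngP@ssw0rd!';  -- hypothetical password
    CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';

    USE SalesDB;  -- hypothetical database
    CREATE DATABASE ENCRYPTION KEY
        WITH ALGORITHM = AES_256
        ENCRYPTION BY SERVER CERTIFICATE TDECert;

    ALTER DATABASE SalesDB SET ENCRYPTION ON;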
C. Database Auditing and Monitoring

Database Activity Monitoring (DAM): Tracks and analyzes all database activity, such as who is accessing the data, what actions they are performing, and when.

Audit Logs: Maintain a detailed record of database interactions for security reviews and regulatory compliance. Logs should be immutable and regularly reviewed.

Alerting: Automatically notifies administrators of suspicious activities, such as unauthorized access or large-scale data exports.
D. Backup and Recovery

Regular Backups: Periodically back up the database to prevent data loss. Backups can be full (entire database), incremental (only changes since the last backup), or differential (changes since the last full backup).

Offsite Backups: Store backups in a separate physical location (or the cloud) to protect against local disasters like fires, floods, or theft.

Automated Backup Systems: Ensure regular backups without manual intervention, minimizing human error.

Restore Testing: Periodically test backup restoration processes to ensure data recovery is possible in case of a disaster.
E. Database Patching and Updates

Regular Patch Management: Ensure that the database management system (DBMS) is regularly updated to fix vulnerabilities. This includes applying patches for security fixes, bug fixes, and performance improvements.

Automated Patch Management: Systems can be configured to automatically download and apply critical patches to reduce the window of vulnerability.
F. Network Security

Firewall Protection: Use network firewalls to block unauthorized access to the database from external sources.

Virtual Private Network (VPN): Encrypts the connection between remote users and the database, ensuring secure access over untrusted networks.

IP Whitelisting: Only allow database access from predefined IP addresses to reduce the attack surface.

Segmentation: Use network segmentation to isolate databases from other systems or applications, reducing the risk of lateral movement in case of a breach.
G. Database Hardening

Removing Unnecessary Features: Disable unused database features or services to minimize the attack surface.

Strong Password Policies: Enforce strong, complex passwords for database users, and use password vaults for secure storage.

Minimum Privilege Principle: Ensure users have only the minimum privileges necessary to perform their job functions.

Account Lockout: Lock accounts after a certain number of failed login attempts to protect against brute-force attacks.
H. Database Integrity and Consistency

Checksums: Use checksums to verify that data hasn't been tampered with or corrupted.

Data Validation: Ensure that data entered into the database is consistent, complete, and adheres to defined rules (e.g., field lengths, data types).

Referential Integrity: Use foreign keys and constraints to ensure data consistency across tables, preventing orphaned records and other integrity violations.
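A minimal sketch of these integrity mechanisms declared in standard SQL (the table and column names are hypothetical):

    CREATE TABLE departments (
        dept_id   INT PRIMARY KEY,
        dept_name VARCHAR(50) NOT NULL UNIQUE
    );

    CREATE TABLE employees (
        emp_id  INT PRIMARY KEY,
        email   VARCHAR(100) UNIQUE,                   -- uniqueness constraint
        salary  DECIMAL(10,2) CHECK (salary > 0),      -- data validation rule
        dept_id INT REFERENCES departments(dept_id)    -- referential integrity
    );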
I. Data Masking

Static Data Masking: Creates a copy of the database with sensitive data obscured for testing or development purposes.

Dynamic Data Masking: Masks sensitive data on the fly when it is queried, ensuring that sensitive data is not exposed to unauthorized users.

Tokenization: Replaces sensitive data with a token that can be used for processing but cannot be reverse-engineered to reveal the original data.
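As an illustrative sketch, dynamic data masking in SQL Server-style syntax can be declared per column (the table, column, and role names are hypothetical):

    ALTER TABLE customers
        ALTER COLUMN email ADD MASKED WITH (FUNCTION = 'email()');
    -- Unprivileged users now see masked values such as aXXX@XXXX.com;
    -- trusted roles can be allowed to see the real data:
    GRANT UNMASK TO auditor_role;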
J. Database Security Testing

Database security testing is the process of evaluating the security of a database to identify vulnerabilities, weaknesses, or misconfigurations that could lead to unauthorized access, data breaches, or other security incidents. This type of testing is essential for protecting sensitive information stored within databases and ensuring compliance with regulatory requirements.

Penetration Testing: Regularly test the database for vulnerabilities and attempt to exploit weaknesses to identify and address potential security gaps.

Vulnerability Scanning: Use automated tools to identify known vulnerabilities in the database software and configurations.

Security Audits: Regular security audits by internal or third-party experts to identify potential weaknesses and ensure compliance with security standards.
2.4. Database Failure and Recovery Methods

Introduction

Database failures can disrupt operations, result in data loss, and impact business continuity. To address these challenges, understanding the causes of database failures and implementing effective recovery methods is essential.

2.4.1. Types of Database Failures

i) Transaction Failures

A transaction failure in a database management system (DBMS) occurs when a transaction is unable to complete. This can happen for a number of reasons, including logical errors, syntax errors, deadlocks, system crashes, system-imposed constraints, and catastrophic failures.
Example: A customer initiates a bank transaction to transfer $1,000 from their Savings Account to their Checking Account.

1. The database debits $1,000 from the Savings Account.

2. Before it credits the amount to the Checking Account, the database server crashes due to a sudden power outage.

As a result:

• The Savings Account shows a $1,000 deduction.

• The Checking Account does not reflect the corresponding credit.

This failure leaves the database in an inconsistent state. To address this, the database recovery system would use a transaction log to roll back the incomplete transaction, ensuring no partial updates are retained and maintaining consistency.
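A minimal sketch of the transfer written as a single transaction in standard SQL (the accounts table and identifiers are hypothetical), so that recovery can undo it as one unit:

    BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 1000 WHERE account_id = 'SAV-001';  -- debit savings
    UPDATE accounts SET balance = balance + 1000 WHERE account_id = 'CHK-001';  -- credit checking
    COMMIT;
    -- If the server crashes before COMMIT, the transaction log is used to
    -- roll back the debit, so no partial update survives.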
ii) System Failures

System failures in database systems refer to unexpected disruptions or malfunctions that affect the normal operations of a database. They typically result from hardware or software issues that cause the database server to crash, and can lead to data loss, inconsistencies, or downtime, potentially impacting the integrity and availability of the stored information. Understanding system failures is critical for designing robust database systems that ensure data reliability and recoverability.
• Example: The 2000 Microsoft SQL Server Failure

A well-known example of a system failure in a database system is the Microsoft SQL Server failure of 2000, which caused significant downtime for several businesses.

• Scenario:

In 2000, Microsoft SQL Server experienced a major system failure in a large-scale, high-traffic e-commerce application during a critical shopping period. The failure was caused by a bug in the SQL Server's transaction log system, which led to a deadlock situation. The bug prevented the system from writing transaction logs properly, leading to database corruption.

• Details of the Failure:
Transaction Log Corruption: SQL Server uses transaction logs to ensure the ACID properties (Atomicity, Consistency, Isolation, Durability). In this case, the transaction log became corrupted during the execution of a transaction. Because the logs could not be written properly, the system was unable to recover from the failure.

Crash and Data Inconsistency: As a result of the corrupted logs, several transactions could not be rolled back or completed, leading to inconsistent data within the database. Users trying to perform operations on the affected system encountered errors.

Downtime: The failure led to several hours of downtime for the affected systems, which was detrimental to the businesses relying on SQL Server for handling customer transactions.

• Consequences:

Data Loss and Inconsistency: Some data was lost, and other parts of the database became inconsistent. Even though data could be partially recovered from backups, the system had issues with incomplete transactions, leading to partial orders and inconsistent customer records.

Business Impact: Many online transactions could not be processed during the peak shopping period, resulting in revenue loss, customer dissatisfaction, and reputational damage.

Recovery Efforts: The failure required a lengthy disaster recovery process, with Microsoft providing patches and hotfixes to prevent the issue from recurring.
iii) Media Failures

In database systems, media failures refer to problems caused by physical or logical issues in the storage media where data is stored. These failures can disrupt database operations, leading to potential data loss, corruption, or downtime.

• Example

Imagine a company that uses a relational database management system (RDBMS) like MySQL, PostgreSQL, or Oracle. The database stores critical data for the company, such as customer information, orders, and financial transactions. This data is stored on physical disk drives in the form of files that make up the database.

Media Failure: If the disk on which the database files are stored fails (e.g., due to hardware malfunction, bad sectors, or power surges), the database system will be unable to access the files that contain the critical data. This constitutes a media failure.

• Impact

o Data Loss: The data stored on the failed disk may be lost, especially if there are no backups or redundant storage mechanisms (such as RAID) in place.

o System Downtime: The database will be unavailable until the media failure is resolved, causing downtime for applications and services relying on the database.

o Corruption: In some cases, if the failure happens during a write operation, the database could become corrupted, making it impossible to restore or read the data.
iv) Human Errors

Human errors in database systems can significantly impact data integrity, security, and performance. These errors may stem from user mistakes, misconfigurations, or lack of training.

Example: A common human error in a database management system (DBMS) is incorrect data entry. For example, a user accidentally enters "1234" as a phone number instead of "234-567-8901." This mistake could lead to data inconsistency and cause issues in data retrieval or reporting.
v) Natural Disasters

In the context of a database management system (DBMS), natural disasters refer to events like earthquakes, floods, fires, or other calamities that can physically damage the hardware storing the database, causing data loss or corruption. For instance:

Floods can damage servers and storage devices, leading to potential loss of critical data.

Earthquakes can cause server racks or data centers to collapse, resulting in system downtime and the inability to access databases.

Fires can destroy physical infrastructure, including databases, if data is not properly backed up or stored off-site.
vi) Malicious Attacks

Malicious attacks can lead to database failures in several ways, causing both direct and indirect impacts. Here are some common malicious attack vectors that can compromise database integrity and result in failure (a parameterized-query sketch follows this list):

• SQL Injection

How it Leads to Failure: Attackers inject malicious SQL queries into input fields or application parameters. If these queries are executed by the database, the attacker can retrieve, modify, or delete sensitive data. In some cases, attackers might even drop entire tables or databases, causing the system to fail.

Example: An attacker enters '; DROP TABLE users;-- into a form field, which could delete the users table if input is not properly sanitized.

• Denial of Service (DoS) or Distributed Denial of Service (DDoS) Attacks

How it Leads to Failure: Malicious actors flood the database with an overwhelming number of requests, causing resource exhaustion. This overload can cause the database to crash or become unresponsive, leading to service downtime.

Example: A DDoS attack may flood a database server with thousands of requests per second, eventually causing it to crash due to high CPU usage or memory depletion.

• Man-in-the-Middle (MitM) Attacks

How it Leads to Failure: In MitM attacks, the attacker intercepts communication between the database and its clients, potentially altering or stealing data. This can lead to data corruption or loss, especially if database integrity checks are bypassed.

Example: Sensitive data, such as login credentials, is intercepted during transmission between a user and the database, allowing an attacker to hijack sessions and make unauthorized changes.

• Brute Force Attacks

How it Leads to Failure: Attackers may use brute force techniques to guess database login credentials. Once successful, they gain unauthorized access, which can be used to cause damage, including data corruption or deletion.

Example: An attacker continuously tries different password combinations until they gain access to the database, where they delete or modify key records.
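A minimal sketch of the standard defense against SQL injection, using a server-side prepared statement (MySQL-style syntax; the table and variable names are hypothetical). Because user input is bound as a parameter instead of being concatenated into the query text, hostile input is treated as plain data:

    PREPARE find_user FROM 'SELECT user_id, name FROM users WHERE name = ?';
    SET @input = '''; DROP TABLE users;--';  -- hostile input, bound safely as data
    EXECUTE find_user USING @input;          -- matches no rows; nothing is injected
    DEALLOCATE PREPARE find_user;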
• Key Impacts of Malicious Attacks on Databases:

Loss of Data: Attackers can delete, corrupt, or exfiltrate sensitive information, leading to significant data loss.

Data Integrity Issues: Altered or corrupted data may lead to inconsistent or unreliable information, causing system failures.

Business Disruption: Extended downtime due to an attack can halt business operations, resulting in financial losses.

Reputational Damage: A database failure due to a malicious attack can harm an organization's reputation and erode customer trust.
2.4.2. Recovery Methods

Database recovery refers to the process of restoring a database to a correct and consistent state after a failure, such as hardware failure, software bugs, or human errors. Various methods can be employed to ensure minimal data loss and restore databases to their previous state. Here are some common database recovery methods:

• Full Database Backup and Restore

A full database backup involves creating a complete copy of all data, schema, and transaction logs in the database. This ensures that if data loss or corruption occurs, the database can be fully restored to its state at the time of the backup.

Backup Process:

i. Full Backup: A snapshot of the entire database is taken, including data, indexes, and system files.

ii. Store Backup: The backup file is stored securely for future recovery.

Restore Process:

i. Full Restore: During recovery, the full backup is applied to restore the entire database to its last backup point.

ii. Point-in-Time Restore (Optional): If combined with transaction log backups, the database can be restored to a specific point in time.
• Transaction Log Recovery

Transaction log recovery is a process in which the database uses transaction logs to restore data after a failure, ensuring data consistency and integrity.

Process:

i. Transaction Logs: These logs track all changes (inserts, updates, deletes) made to the database, allowing for precise recovery.

ii. Restore Full Backup: The most recent full backup is restored as the base for recovery.

iii. Apply Transaction Logs: After restoring the full backup, the transaction logs are applied in sequence to roll forward all committed transactions and undo any incomplete or uncommitted transactions.

Transaction log recovery allows databases to be restored to a consistent state, either to the last backup or to a specific point in time, minimizing data loss and ensuring system reliability.
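A minimal sketch of this backup-and-roll-forward workflow in SQL Server-style syntax (the database name, file paths, and timestamp are hypothetical):

    -- Take a full backup, then periodic transaction log backups
    BACKUP DATABASE SalesDB TO DISK = 'D:\backups\SalesDB_full.bak';
    BACKUP LOG SalesDB TO DISK = 'D:\backups\SalesDB_log1.trn';

    -- After a failure: restore the base, then roll the log forward
    RESTORE DATABASE SalesDB FROM DISK = 'D:\backups\SalesDB_full.bak' WITH NORECOVERY;
    RESTORE LOG SalesDB FROM DISK = 'D:\backups\SalesDB_log1.trn'
        WITH STOPAT = '2025-03-01 10:00:00', RECOVERY;  -- optional point-in-time stop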
• RAID (Redundant Array of Independent Disks) and Mirroring

Both are techniques used in database failure recovery to ensure data availability, redundancy, and fault tolerance in the event of hardware failures.

Redundant Array of Independent Disks (RAID):

o RAID 1 (Mirroring): Data is copied to two or more disks. If one fails, the other keeps the data available.

o RAID 5/6 (Parity): Distributes data with parity across multiple disks; can recover from one (RAID 5) or two (RAID 6) disk failures.

o RAID 10 (1+0): Combines RAID 1 and RAID 0 for redundancy and performance.

Mirroring:

o Involves duplicating database data to a secondary location. If the primary fails, the system switches to the mirrored copy, ensuring minimal downtime.
• Choosing the Right Recovery Method

The choice of recovery method depends on:

Criticality of Data: Mission-critical systems benefit from transaction log recovery or high-availability solutions.

Database Size: Differential or incremental backups are suitable for large databases.

Cost Considerations: RAID and clustering incur higher costs but offer near-instant recovery.

Recovery Time Objective (RTO): High-availability and checkpoint-based methods minimize downtime.
2.5. Distributed Database System

A Distributed Database System (DDBS) is a collection of interconnected databases spread across multiple locations. These systems are designed to manage, store, and retrieve data efficiently while maintaining consistency across all nodes in the network. Unlike traditional centralized databases, distributed databases distribute data across several physical locations, which can be geographically dispersed or within a single organization.

• Key Characteristics of Distributed Database Systems

Data Distribution:

o Data is stored across multiple sites or nodes, which may reside in different physical locations.

o Each node operates independently but collaborates to process queries.

Transparency:

o Location Transparency: Users can access data without needing to know where it is located.

o Replication Transparency: Users are unaware of data replication, ensuring consistency across replicas.

o Fragmentation Transparency: Users don't need to know how data is partitioned across nodes.

Autonomy:

o Each node can function autonomously and may have its own database management system (DBMS).

o Nodes can manage their own operations while still being part of the larger distributed system.

Scalability:

o A DDBS can handle increasing amounts of data and users by adding more nodes.

o Performance remains efficient even as the system grows.

Fault Tolerance:

o Distributed systems are designed to continue functioning even if some nodes fail, ensuring high availability and reliability.
2.5.1. Types of Distributed Database Systems

i) Homogeneous Databases

Homogeneous databases are characterized by all participating nodes sharing the same database management system (DBMS) and schema structure. These databases are designed to offer a unified and consistent view of data across all nodes.

Key Characteristics:

i. Uniform Software: All nodes use the same DBMS software. This ensures compatibility and avoids issues arising from differences in database functionalities.

ii. Consistent Schema: The database schema is identical across all nodes. This means that data representation, table structures, and relationships are uniform.

iii. Ease of Management: Due to the homogeneity, managing and maintaining the system is easier than in heterogeneous systems, as the tools and configurations are standardized.

iv. Transparency: Homogeneous systems provide location and distribution transparency, making it easier for users to interact with the system without worrying about where data is stored.

v. Simplified Communication: Since all nodes use the same software, they can communicate directly without the need for additional middleware or translation mechanisms.

Example: Consider a distributed e-commerce system with multiple nodes handling customer orders, inventory management, and shipping. Employing a homogeneous database approach, all nodes share the same DBMS (e.g., MySQL) and adhere to a consistent schema. When an order is placed on one node, the system automatically synchronizes the order details and inventory updates across all nodes, enabling real-time visibility and consistency.
ii) Heterogeneous Databases

In contrast to homogeneous databases, heterogeneous databases allow nodes in a distributed system to use different DBMS software or possess varying database schemas. This approach caters to diverse requirements and facilitates integration between nodes employing different technologies.

A heterogeneous system is a distributed database system where the databases are stored across multiple locations and managed by different database management systems (DBMSs), which may have diverse structures, data models, query languages, and access protocols. The main purpose of such a system is to provide a unified interface for accessing and managing the distributed, heterogeneous data.

• Key Characteristics of Heterogeneous Distributed Database Systems

Diverse Data Models: Combines relational, NoSQL, hierarchical, or object-oriented databases with varying schemas.

Different DBMSs: Integrates databases managed by different systems (e.g., Oracle, MySQL, MongoDB).

Autonomy: Each database operates independently with its own control.

Geographic Distribution: Data is physically distributed across multiple locations.

Query Translation: Converts queries to match the specific syntax of each database.

Middleware Coordination: Uses middleware to enable communication and data integration between diverse systems.
• Homogeneous vs. Heterogeneous Distributed Database Systems

Definition — Homogeneous: a distributed database in which all nodes use the same DBMS. Heterogeneous: a distributed database in which nodes use different DBMSs.

DBMS Consistency — Homogeneous: consistent DBMS across all sites. Heterogeneous: DBMSs may differ between sites, leading to possible inconsistency.

Architecture — Homogeneous: uniform architecture across all nodes. Heterogeneous: mixed architecture, with different systems and database models at different sites.

Data Model — Homogeneous: same data model (e.g., relational, object-oriented) across all nodes. Heterogeneous: different data models (e.g., relational, NoSQL, object-oriented) at different sites.

Query Language — Homogeneous: same query language (e.g., SQL) at all sites. Heterogeneous: different query languages may be used at different sites (e.g., SQL, NoSQL query languages).

Data Management — Homogeneous: simplified, as all nodes follow the same standards and protocols. Heterogeneous: complex, due to the heterogeneity of the DBMSs and data models.

Data Integration — Homogeneous: easier, due to uniformity in systems and protocols. Heterogeneous: more complex, as different DBMSs require different interfaces and connectors.

Communication Protocols — Homogeneous: similar protocols across all nodes. Heterogeneous: different protocols may be required between DBMSs.

Security — Homogeneous: easier to implement uniform security measures across all nodes. Heterogeneous: more challenging to implement uniform security policies across different systems.

Consistency Models — Homogeneous: more straightforward, since all sites run the same DBMS. Heterogeneous: complex, as different systems may follow different consistency paradigms.

Data Distribution — Homogeneous: easier to manage, as all nodes follow the same principles. Heterogeneous: more challenging, due to varying data models and systems.

Replication — Homogeneous: simplified, since the same DBMS is used at all sites. Heterogeneous: more complicated, as different DBMSs have varying replication strategies.

Performance — Homogeneous: generally better, due to uniform architecture. Heterogeneous: may vary, depending on how the different systems are optimized and interact.

Example — Homogeneous: distributed MySQL or distributed PostgreSQL. Heterogeneous: a system combining MySQL, MongoDB, and Oracle databases at different sites.
2.6. Query Processing in Distributed Databases

Query processing in distributed databases refers to the process of executing a query across multiple interconnected databases, which can be spread across different locations, machines, or even organizations. The goal is to ensure that the query is executed efficiently, data is fetched from the appropriate sites, and results are returned in a timely manner.

Here are the key steps and concepts involved in query processing for distributed databases:

1. Query Decomposition

Parsing: The query is parsed to understand its structure, such as the tables, columns, and conditions involved.

Translation: The high-level query (e.g., SQL) is translated into a more abstract internal representation, such as a relational algebra expression.

Decomposition: The query is decomposed into subqueries based on data distribution. These subqueries can be run on different databases in the system. The challenge is to determine how to break down the query efficiently based on data location and availability.
2. Query Optimization

Centralized vs. Distributed Optimization: In distributed databases, query optimization can be challenging because the optimizer must consider the distribution of data across nodes and the cost of data transfer between them.

Join Ordering: The order in which joins are executed can significantly affect performance. The optimizer must choose the most efficient join strategy (e.g., nested loops, hash join) considering the location of the data.

Minimizing Data Transfer: Transferring large amounts of data over the network can be expensive. The optimizer may choose to filter or aggregate data early in the execution to reduce the amount of data that needs to be sent.

3. Query Execution

Data Localization: The system must decide where to execute each part of the query. Some operations may be performed locally on each node, while others might require communication between nodes (e.g., if the data is distributed).

Parallel Execution: Distributed databases often execute queries in parallel across different sites. This requires managing concurrency and ensuring that results from multiple sites are combined correctly.

Distributed Joins: If the data for a join resides on multiple nodes, the database must decide how to perform the join. Techniques such as broadcast joins or partitioned joins can be used to reduce data movement.
4. Data Fragmentation and Replication

Fragmentation: Data can be fragmented horizontally (dividing rows across different sites) or vertically (dividing columns). Query execution must take into account where the fragments are stored (see the sketch after this subsection).

Replication: Data may be replicated across multiple nodes for fault tolerance and performance reasons. The query processor must decide whether to access a replica or the original copy based on factors such as load and availability.
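A minimal sketch of horizontal fragmentation in standard SQL (the customers table, regions, and fragment names are hypothetical); each fragment would be stored at a different site, and a view reunites them so queries stay location-transparent:

    -- One fragment per site, split by region
    CREATE TABLE customers_east AS SELECT * FROM customers WHERE region = 'East';
    CREATE TABLE customers_west AS SELECT * FROM customers WHERE region = 'West';

    -- Reconstruct the global relation for transparent querying
    CREATE VIEW customers_all AS
        SELECT * FROM customers_east
        UNION ALL
        SELECT * FROM customers_west;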
5. Transaction Management and Consistency

Distributed Transactions: Queries in distributed databases are often part of larger transactions. Ensuring the ACID properties (Atomicity, Consistency, Isolation, Durability) in a distributed setting requires careful management of resources and coordination between nodes.

Two-Phase Commit (2PC): This protocol ensures that all participating sites either commit or abort a transaction in a consistent manner; a sketch follows below.
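As an illustrative sketch, MySQL exposes the two phases through its XA statements (the transaction identifier and update are hypothetical); a coordinator issues PREPARE on every site and sends COMMIT only if all sites vote yes:

    XA START 'txn42';
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 7;
    XA END 'txn42';

    XA PREPARE 'txn42';  -- phase 1: this site votes "ready to commit"
    XA COMMIT 'txn42';   -- phase 2: the coordinator orders the commit everywhere
    -- If any site fails to prepare, the coordinator issues XA ROLLBACK 'txn42' instead.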
6. Cost Estimation and Selection

Cost Models: A cost model is used to estimate the resources required to execute a query, including computation, data transfer, and disk I/O. The optimizer uses these estimates to choose the best execution plan.

Factors in Cost Estimation: These include network latency, data transfer costs, disk I/O costs, CPU usage, and memory usage.

7. Result Aggregation and Presentation

Once the query is executed on all relevant nodes, the results are aggregated and presented to the user. In distributed systems, this step might involve combining data from different sites and resolving conflicts if replication is involved.
• Challenges in Distributed Query Processing

Data Distribution: Deciding where data should reside and how it can be accessed efficiently is a key challenge.

Network Latency: Minimizing the cost of data transfer between nodes in the system.

Concurrency Control: Ensuring correct and consistent results when multiple users or processes access distributed data simultaneously.

Fault Tolerance: Ensuring that query processing is resilient to network or node failures, which requires mechanisms for replication and recovery.

• Tools and Techniques

Distributed Query Optimizers: Specialized optimizers that take the distributed nature of the database into account to improve query performance.

Materialized Views: Storing precomputed results of queries to avoid repeated execution (see the sketch below).

Data Sharding: Dividing the database into smaller, more manageable parts, each of which can be stored on different nodes.
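A minimal sketch of a materialized view in PostgreSQL-style syntax (the orders table and columns are hypothetical):

    -- Precompute a summary so repeated reports avoid rescanning the base table
    CREATE MATERIALIZED VIEW monthly_sales AS
        SELECT region, date_trunc('month', order_date) AS month, SUM(amount) AS total
        FROM orders
        GROUP BY region, date_trunc('month', order_date);

    -- Reads are fast; refresh when fresher data is needed
    REFRESH MATERIALIZED VIEW monthly_sales;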
Appendix A

Tools used for the Information and Database Management Module

Hardware Tools

- Fundamentals of Database: Computer Workstations/Servers; Storage Systems (SSD, HDD, Cloud Storage)
- Advanced Database: High-performance Servers/Clusters (for distributed systems); Network Infrastructure (for distributed databases)

Software Tools

- Fundamentals of Database:
  o Relational Database Management Systems (RDBMS): MySQL, PostgreSQL, SQLite, Microsoft SQL Server
  o Database Query Languages: SQL (Structured Query Language), PL/SQL (Procedural SQL)
  o Database Design Tools: MySQL Workbench, pgAdmin, DBDesigner, Lucidchart/Draw.io (ERD Design)
  o Database Management and Administration: DBeaver, phpMyAdmin
  o NoSQL Databases: Redis
  o Data Warehousing and OLAP: Microsoft SQL Server Analysis Services (SSAS)
- Advanced Database:
  o Distributed Database Systems: Apache Cassandra, MongoDB, HBase
  o Big Data Tools: Hadoop, Apache Spark, Hive (Data Warehousing on Hadoop)
  o Cloud-based DBMS: Amazon RDS (Relational Database Service), Google Cloud SQL, Azure SQL Database
  o Advanced Querying and Optimization: EXPLAIN (SQL for query execution plans), Query Optimizer Tools (built-in DBMS tools)
  o Data Warehousing and OLAP: Microsoft SQL Server Analysis Services (SSAS), Apache Kylin (Distributed Analytics Engine)

Additional Tools

- Fundamentals of Database: Docker (for containerization); Virtual Machines/VMware (for virtualized environments)
- Advanced Database:
  o NoSQL Databases: Couchbase, Redis
  o Database Security Tools: Oracle Database Vault, Vulnerability Scanners (Nexpose, OpenVAS)
  o Data Modeling and Analysis: ERwin Data Modeler, IBM InfoSphere Data Architect
Appendix B
Here are some common terms in the fundamentals of database systems:
1. Database: A collection of organized data that can be easily accessed, managed, and
updated.
2. Database Management System (DBMS): Software that allows users to create, manage,
and interact with databases. It provides an interface for users and programs to access data.
3. Table: A collection of related data organized into rows and columns. Each table represents
a specific type of entity in the database.
4. Row (Record/Tuple): A single data item or entry in a table, representing a unique instance of an entity.
5. Column (Field/Attribute): A single characteristic or property of an entity, represented as
a vertical division in a table.
6. Primary Key: A column or a combination of columns in a table that uniquely identifies
each record in that table.
7. Foreign Key: A column or a set of columns in one table that refers to the primary key in
another table, establishing a relationship between the two tables.
8. Entity: A real-world object or concept that can have data stored about it in the database
(e.g., an employee, a product).
9. Schema: The structure that defines the organization of data in a database, including tables,
views, and relationships.
10. Normalization: The process of organizing data in a database to reduce redundancy and
improve data integrity by dividing large tables into smaller, more manageable ones.
11. Denormalization: The process of combining tables to reduce the complexity of a database,
often for performance reasons.
12. SQL (Structured Query Language): A standardized language used to query and manage
databases, including commands for selecting, inserting, updating, and deleting data.
13. Query: A request for data or information from the database, typically written in SQL.
14. Index: A data structure that improves the speed of data retrieval operations on a database
table by providing quick access to rows based on the values of one or more columns.
15. Transaction: A unit of work that is executed as a single, indivisible operation, ensuring
database consistency. It typically includes operations like INSERT, UPDATE, and
DELETE.
16. ACID Properties: The set of properties that guarantee database transactions are processed
reliably:
o Atomicity: The transaction is all-or-nothing.
o Consistency: The database moves from one valid state to another.
o Isolation: Transactions do not interfere with each other.
o Durability: Once a transaction is committed, it is permanent.
17. Relational Model: A type of database model that organizes data into tables (relations),
using rows and columns.
18. Normalization Forms (1NF, 2NF, 3NF): Specific guidelines for organizing database
tables to minimize redundancy and dependency.
19. Data Integrity: The accuracy and consistency of data within a database.
20. Backup: The process of making a copy of the database to ensure data can be restored in
case of failure or loss.
21. View: A virtual table created by querying one or more tables. It doesn't store data itself but
displays data from underlying tables.
22. Stored Procedure: A precompiled collection of one or more SQL statements that can be
executed as a single unit to perform a task.
23. Trigger: A set of actions automatically performed in the database in response to specific
events (e.g., data insertion, deletion, or update).
24. Data Warehouse: A large collection of data used for analysis and reporting, often
structured differently from operational databases to support decision-making processes.
25. NoSQL: A class of databases designed to handle large volumes of unstructured data,
providing more flexibility than traditional relational databases.
26. Distributed Database: A database that is distributed across multiple physical locations,
either on different machines or in different geographical locations. Data may be replicated
or partitioned.
27. Sharding: A method of horizontal partitioning where a database is divided into smaller,
more manageable pieces (shards) that are distributed across multiple servers.
28. Replication: The process of copying data from one database server to another to improve
data availability, fault tolerance, and load balancing.
29. Concurrency Control: A mechanism to ensure that database transactions are executed in
such a way that the database remains in a consistent state, even when multiple transactions
occur simultaneously. Common methods include locking and optimistic concurrency
control.
30. Locking: A mechanism to control access to data in a multi-user environment. Locks can be
placed on data to prevent other transactions from modifying it until the lock is released.
o Exclusive Lock: Prevents other transactions from reading or writing to the locked data.

o Shared Lock: Allows other transactions to read the data but prevents writing.
31. Deadlock: A situation where two or more transactions are blocked because each is waiting
for the other to release a lock. Deadlock detection and resolution techniques are used to
handle this problem.
32. Eventual Consistency: A model of consistency in distributed databases where updates to
data are propagated through the system over time, but immediate consistency is not
guaranteed. It is often used in NoSQL databases.
33. Database Partitioning: The practice of dividing a database into smaller, more manageable pieces to improve performance and scalability. This can be done in different ways:

o Horizontal Partitioning: Dividing a table into rows, with each partition containing a subset of the rows.

o Vertical Partitioning: Dividing a table into columns, where each partition contains a subset of the columns.

34. CAP Theorem: A principle that states that a distributed database can provide only two out of three guarantees at a time:

o Consistency: Every read receives the most recent write.

o Availability: Every request receives a response, even if some nodes are unavailable.

o Partition Tolerance: The system continues to function even if network partitions occur.

The CAP theorem is used to understand trade-offs in distributed database systems.
35. OLTP (Online Transaction Processing): A type of database management that supports transaction-oriented applications, such as banking or e-commerce systems, which require fast query processing, high concurrency, and consistency.

36. OLAP (Online Analytical Processing): A database design optimized for query processing and analytics, focusing on data retrieval for decision support rather than transactional operations. OLAP systems typically use multidimensional data models (e.g., star or snowflake schema).

37. Data Mining: The process of discovering patterns, relationships, or knowledge from large datasets using statistical, machine learning, or computational techniques.

38. Big Data: Large, complex datasets that require special handling and storage techniques due to their volume, variety, and velocity. Big data tools and databases (e.g., Hadoop, Apache Spark) handle high-volume, high-velocity data.

39. NoSQL Databases: Non-relational databases designed for unstructured or semi-structured data, typically offering scalability and flexibility for applications with large datasets and real-time analytics. Types of NoSQL databases include:

o Document Stores: Store data in the form of documents (e.g., MongoDB).

o Key-Value Stores: Data is stored as key-value pairs (e.g., Redis, DynamoDB).

o Column-Family Stores: Store data in columns rather than rows (e.g., Cassandra, HBase).

o Graph Databases: Use graph structures with nodes and edges to represent and query data (e.g., Neo4j).

40. Cloud Databases: Databases that are hosted on cloud platforms (e.g., AWS, Azure, and Google Cloud), offering scalability, high availability, and flexibility in terms of infrastructure management.

41. Data Lake: A storage repository that can hold vast amounts of raw, unstructured, or semi-structured data in its native format. Unlike traditional databases, data lakes are designed to store large-scale data without the need for preprocessing.

42. Data Federation: A data integration technique that allows access to multiple databases and data sources as if they were a single database, without physically consolidating them.

43. Data Warehousing: The practice of collecting, storing, and managing large amounts of data from different sources for reporting and analysis. Data warehouses typically store historical data and support complex queries.

44. ETL (Extract, Transform, Load): A process used in data warehousing to extract data from different sources, transform it into a usable format, and load it into a data warehouse.

45. Materialized Views: A database object that stores the result of a query for faster access. Unlike regular views, which are computed on demand, materialized views are precomputed and stored on disk.

46. Query Optimization: The process of improving the efficiency of SQL queries to reduce execution time and resource consumption. This can involve indexing, rewriting queries, and adjusting the database configuration.

47. Hybrid Databases: A database that supports both transactional and analytical workloads (e.g., HTAP - Hybrid Transactional/Analytical Processing), enabling real-time analytics on transactional data.

48. In-Memory Databases: A database that stores data primarily in the system's main memory (RAM) rather than on traditional disk storage, which provides much faster data access and processing speeds (e.g., Redis, SAP HANA).

49. ACID vs BASE: A comparison of two different approaches to transaction handling:

o ACID (Atomicity, Consistency, Isolation, Durability): Ensures reliability and consistency for transactional systems.

o BASE (Basically Available, Soft state, Eventually consistent): Focuses on availability and flexibility in distributed systems, often used in NoSQL databases.

50. Blockchain Databases: A decentralized, distributed ledger system used to securely store data in a tamper-evident manner. Blockchain databases are often used for cryptocurrencies and other applications requiring transparent and immutable records.
Appendix C
Model Questions
1. Which of the following is a relational database?

A. MongoDB C. Redis
B. MySQL D. Cassandra

2. Which of these is a characteristic of a DBMS?

A. Data Redundancy C. Inconsistent Data
B. Data Independence D. Manual Data Management

3. Which language is used to interact with relational databases?

A. SQL C. Python
B. HTML D. JavaScript

4. Which of the following is the key used to uniquely identify a record in a table?

A. Foreign Key C. Composite Key
B. Primary Key D. Secondary Key

5. What does SQL stand for?

A. Simple Query Language C. Sequential Query Language
B. Structured Query Language D. Standard Query Language

6. In SQL, which command is used to remove a table?

A. DROP C. REMOVE
B. DELETE D. CLEAR

7. Which of the following is used to store the schema of a database?

A. Data Dictionary C. Data Table
B. Data Set D. Data Collection

8. What does the term 'Normalization' refer to in database management?

A. Reducing data redundancy C. Combining data from multiple tables
B. Increasing query speed D. Encrypting data

9. Which of the following is the correct order of SQL clauses?

A. SELECT, FROM, WHERE, ORDER BY C. SELECT, WHERE, ORDER BY, FROM
B. WHERE, FROM, SELECT, ORDER BY D. FROM, SELECT, WHERE, ORDER BY

10. What does ACID stand for in database management?

A. Atomicity, Consistency, Isolation, Durability
B. Accuracy, Consistency, Integration, Durability
C. Atomicity, Consistency, Interaction, Durability
D. Aggregation, Consistency, Isolation, Durability

11. Which of the following is a relational database model concept?

A. Tables C. Documents
B. Objects D. Nodes

12. What is the purpose of a transaction in a DBMS?

A. To handle security of the database
B. To group multiple operations into a single unit
C. To update all records in the database
D. To increase query performance

13. What is the default sorting order in SQL?

A. Descending C. Alphabetical
B. Ascending D. Random

14. What does a "NULL" value indicate in a database?

A. Data is missing or unknown C. The value is blank
B. The value is 0 D. The value is not valid
15. What is a schema in database terms?

A. The structure of a database C. A method of data encryption
B. The physical storage location D. The process of indexing data

16. Which of these commands is used to modify existing records in a table?

A. MODIFY C. CHANGE
B. UPDATE D. ALTER

17. Which SQL clause is used to filter records?

A. WHERE C. SELECT
B. HAVING D. ORDER BY

18. Which is the correct syntax for creating a table in SQL?

A. CREATE TABLE table_name (column1 datatype, column2 datatype, ...);
B. CREATE DATABASE table_name (column1 datatype, column2 datatype, ...);
C. ADD TABLE table_name (column1 datatype, column2 datatype, ...);
D. CREATE TABLE (column1 datatype, column2 datatype, ...);

19. Which of the following ensures that no two rows in a table have identical values?

A. Foreign Key C. Index
B. Primary Key D. Unique Key

20. What is the purpose of the SQL GROUP BY clause?

A. To filter records C. To join tables
B. To group rows that share a property D. To sort the result set

21. What is a tuple in database terminology?

A. A row in a table C. A key in a table
B. A column in a table D. A database schema

22. What does "atomicity" mean in the context of ACID properties?
A. Transactions are executed in a complete manner
B. Transactions are stored in discrete units
C. Transactions can be partially rolled back
D. Transactions involve multiple steps

23. Which of these is a type of database model?

A. Relational Model C. Network Model
B. Object-Oriented Model D. All of the above

24. Which of these is an example of a NoSQL database?

A. MySQL C. PostgreSQL
B. MongoDB D. Oracle

25. Which database technology is primarily designed for handling real-time data?

A. OLTP (Online Transaction Processing) C. Data Warehouses
B. OLAP (Online Analytical Processing) D. Cloud Database

26. Which of the following is a benefit of data replication in distributed databases?

A. It reduces redundancy
B. It improves system performance and availability
C. It creates more data inconsistencies
D. It increases the time taken for data updates

27. What is sharding in a distributed database?

A. Partitioning data into small chunks and distributing them across multiple machines
B. Combining different database technologies into a single database
C. Encrypting data for security
D. Backing up data into multiple regions

28. What is a consistency model in distributed databases?

A. A rule that defines how changes to data are propagated across nodes
B. A type of query optimization algorithm
C. A security mechanism for encryption
D. A method for archiving data

29. Which is a common challenge in distributed databases?

A. Single-point failure
B. Centralized data storage
C. Increased query speed
D. Ensuring data consistency across nodes

30. Which of the following is NOT a goal of query optimization?

A) Minimize resource usage
B) Improve query execution time
C) Ensure referential integrity
D) Generate efficient execution plans

31. Which of the following strategies is used in query optimization?

A) Indexing C) Predicate pushdown
B) Join reordering D) All of the above

32. Which of the following is NOT a type of serializability?

A) Conflict serializability C) Deadlock serializability
B) View serializability D) Precedence graph serializability

33. The main purpose of concurrency control in DBMS is to:

A) Maximize CPU usage C) Optimize query execution time
B) Prevent data anomalies D) Reduce storage overhead

34. Which of the following concurrency control methods uses locks?

A) Time-stamping C) Multiversion concurrency control
B) Two-phase locking D) Shadow paging

35. The problem of deadlocks in concurrency control arises when:

A) Transactions are executed serially
B) Multiple transactions wait for each other to release resources
C) There is no locking mechanism
D) The database is in read-only mode

36. What is the function of a cost-based query optimizer?

A) It measures the runtime of the query.
B) It uses predefined rules to optimize queries.
C) It estimates the cost of different query plans and chooses the lowest-cost plan.
D) It rewrites queries into equivalent forms for execution.

37. A schedule is conflict-serializable if:

A) There are no conflicting operations between transactions.
B) The precedence graph has no cycles.
C) All transactions are executed in a predefined order.
D) It avoids cascading aborts.

38. Which security mechanism can prevent a brute force attack on a database?

A) Limiting the number of login attempts
B) Disabling indexing
C) Avoiding SQL joins
D) Increasing server capacity

39. Which of the following is NOT a feature of homogeneous distributed databases?

A) Easier integration
B) Uniform query language
C) Seamless communication between nodes
D) Supports multiple database platforms

40. Which of the following challenges is common in heterogeneous distributed databases?

A) Query translation between different query languages
B) Data integrity enforcement in a single format
C) Uniform access to all data
D) Centralized control of data

41. Which of the following is a key difference between homogeneous and heterogeneous distributed databases?

A) Heterogeneous systems support multiple DBMSs, while homogeneous systems use only one.
B) Homogeneous systems require manual synchronization of data.
C) Heterogeneous systems cannot handle distributed queries.
D) Homogeneous systems always operate without failures.

42. An example of a heterogeneous distributed database is:

A) A collection of MySQL databases replicated across multiple sites
B) An integration of Oracle, SQL Server, and MongoDB databases
C) A system of multiple SQLite instances sharing the same schema
D) A single PostgreSQL database split into shards

43. A trigger in a database is:

A) A condition that enforces constraints on data integrity
B) A stored procedure that automatically executes in response to specific database events
C) A special type of index for query optimization
D) A tool for encrypting sensitive data

44. What is the primary limitation of triggers?

A) They can only be executed once.
B) They cannot interact with other tables.
C) They may lead to performance overhead if overused.
D) They cannot handle DELETE operations.

45. A database view is:

A) A physical copy of a table
B) A virtual table based on a SELECT query
C) An index on a table
D) A trigger for data validation

46. What is the difference between WHERE and HAVING in SQL?

A) WHERE filters rows before grouping, and HAVING filters rows after grouping
B) WHERE filters columns, and HAVING filters rows
C) WHERE is used in SELECT, and HAVING is used in INSERT
D) There is no difference

47. What is the purpose of the GROUP BY clause in SQL?

A) To group rows that have the same values into summary rows
B) To filter rows after grouping
C) To join multiple tables together
D) To sort the result set

48. What is the ROLLBACK command used for in SQL?

A) To permanently save changes to the database
B) To undo changes made in the current transaction
C) To create a backup of the database
D) To remove records from a table

49. Which of the following statements will return the second highest salary from an Employee table?

A) SELECT MAX(salary) FROM Employee WHERE salary < (SELECT MAX(salary) FROM Employee);
B) SELECT DISTINCT salary FROM Employee ORDER BY salary LIMIT 1 OFFSET 2;
C) SELECT salary FROM Employee WHERE salary = (SELECT MAX(salary) FROM Employee);
D) SELECT salary FROM Employee WHERE salary < (SELECT MIN(salary) FROM Employee);

50. Which of the following statements will return the total number of employees from an Employee table?

A) SELECT COUNT(*) FROM Employee;
B) SELECT SUM(*) FROM Employee;
C) SELECT TOTAL(*) FROM Employee;
D) SELECT ALL(*) FROM Employee;
Answer Key
1) B 18) A 35) B
2) B 19) B 36) C
3) A 20) B 37) B
4) B 21) A 38) A
5) B 22) A 39) D
6) A 23) D 40) A
7) A 24) B 41) A
8) A 25) A 42) B
9) A 26) B 43) B
10) A 27) A 44) C
11) A 28) A 45) B
12) B 29) D 46) A
13) B 30) C 47) A
14) A 31) D 48) B
15) A 32) C 49) A
16) B 33) B 50) A
17) A 34) B