Introduction of DBMS

Download as pdf or txt
Download as pdf or txt
You are on page 1of 147

UNIT 1: Database System Concepts & Architecture

Basic Database Concepts


The database system is an excellent computer-based record-keeping system. A collection of data, commonly called a
database, contains information about a particular enterprise. It maintains any information that may be necessary to
the decision-making process involved in the management of that organization. It can also be defined as a collection
of interrelated data stored together to serve multiple applications, the data is stored so that it is independent of
programs that use the data. A generic and controlled approach is used to add new data and modify and retrieve
existing data within the database. The data is structured so as to provide the basis for future application
development.

Purpose of Database

The intent of a database is that a collection of data should serve as many applications as possible. Therefore, a
database is often thought of as a repository of information needed to run certain functions in a corporation or
organization. It would permit only the retrieval of data but also the continuous modification of data needed for the
control of operations. It may be possible to search the database to obtain answers to questions or information for
planning purposes.

In a typical file-processing system, permanent records are stored in different files. Many different application
programs are written to extract the records and add the records to the appropriate files. However, this scheme has
several major limitations and disadvantages, such as data redundancy (duplication of data), data inconsistency,
maladaptive data, non-standard data, insecure data, incorrect data, etc. A database management system is an
answer to all these problems as it provides centralized control of the data.

Database Abstraction

A major purpose of a database is to provide the user with only as much information as is required of them. This
means that the system does not disclose all the details of the data, rather it hides some details of how the data is
stored and maintained. The complexity of databases is hidden from them which, if necessary, are ordered through
multiple levels of abstraction to facilitate their interaction with the system. The different levels of the database are
implemented through three layers:

1. Internal Level(Physical Level): The lowest level of abstraction, the internal level, is closest to physical
storage. It describes how the data is stored concretely on the storage medium.

2. Conceptual Level: This level of abstraction describes what data is concretely stored in the database. It also
describes the relationships that exist between the data. At this level, databases are described logically in
terms of simple data structures. Users at this level are not concerned with how these logical data structures
will be implemented at the physical level.

3. External Level(View Level): It is the level closest to users and is related to the way the data is viewed by
individual users.
UNIT 1: Database System Concepts & Architecture

Since a database can be viewed through three levels of abstraction, any change at one level can affect plans at other
levels. As databases continue to grow, there may be frequent changes to it at times. This should not lead to redesign
and re-implementation of the database. In such a context the concept of data independence proves beneficial.

Concept of Database

To store and manage data efficiently in the database let us understand some key terms:

1. Database Schema: It is a design of the database. Or we can say that it is a skeleton of the database that is used to
represent the structure, types of data will be stored in the rows and columns, constraints, relationships between the
tables.

2. Data Constraints: In a database, sometimes we put some restrictions on the table that what type of data can be
stored in one or more columns of the table, it can be done by using constraints. Constraints are defined while we are
creating a table.

3. Data dictionary or Metadata: Metadata is known as the data about the data. Or we can say that the database
schema along with different types of constraints on the data is stored by DBMS in the dictionary is known as
metadata.

4. Database instance: In a database, a database instance is used to define the complete database environment and
its components. Or we can say that it is a set of memory structures and background processes that are used to
access the database files.

5. Query: In a database, a query is used to access data from the database. So users have to write queries to retrieve
or manipulate data from the database.

6. Data manipulation: In a database, we can easily manipulate data using the three main operations that is Insertion,
Deletion, and updation.

7. Data Engine: It is an underlying component that is used to create and manage various database queries.
UNIT 1: Database System Concepts & Architecture
Advantages of Database

Let us consider some of the benefits provided by a database system and see how a database system overcomes the
above-mentioned problems:-

1. Reduces database data redundancy to a great extent

2. The database can control data inconsistency to a great extent

3. The database facilitates sharing of data.

4. Database enforce standards.

5. The database can ensure data security.

6. Integrity can be maintained through databases.

Therefore, for systems with better performance and efficiency, database systems are preferred.

Disadvantages of Database

With the complex tasks to be performed by the database system, some things may come up which can be termed as
the disadvantages of using the database system. These are:-

1. Security may be compromised without good controls.

2. Integrity may be compromised without good controls.

3. Extra hardware may be required

4. Performance overhead may be significant.

5. The system is likely to be complex.

database-related concept that is often overlooked or noDatabase system is an excellent computer-based record-
keeping system. A collection of data, commonly called a database, contains information about a particular
enterprise. It maintains any information that may necessary to the decision-making process involved in the
management of that organization. It can also be defined as a collection of interrelated data stored together to serve
multiple applications, the data is stored so that it is independent of programs that use the data. A generic and
controlled approach is used to add new data and modify and retrieve existing data within the database. The data is
structured so as to provide the basis for future application development.

Purpose of Database

The intent of a database is that a collection of data should serve as many applications as possible. Therefore, a
database is often thought of as a repository of information needed to run certain functions in a corporation or
organization. It would permit only the retrieval of data but also the continuous modification of data needed for the
control of operations. It may be possible to search the database to obtain answers to questions or information for
planning purposes.

In a typical file-processing system, permanent records are stored in different files. Many different application
programs are written to extract the records and add the records to the appropriate files. But this scheme has several
major limitations and disadvantages, such as data redundancy (duplication of data), data inconsistency, maladaptive
data, non-standard data, insecure data, incorrect data, etc. A database management system is an answer to all these
problems as it provides centralized control of the data.
UNIT 1: Database System Concepts & Architecture
Concept of Database

To store and manage data efficiently in the database let us understand some key terms:

1. Database Schema: It is a design of the database. Or we can say that it is a skeleton of the database that is used to
represent the structure, types of data will be stored in the rows and columns, constraints, relationships between the
tables.

2. Data Constraints: In a database, sometimes we put some restrictions on the table that what type of data can be
stored in one or more columns of the table, it can be done by using constraints. Constraints are defined while we are
creating a table.

3. Data dictionary or Metadata: Metadata is known as the data about the data. Or we can say that the database
schema along with different types of constraints on the data is stored by DBMS in the dictionary is known as
metadata.

4. Database instance: In a database, a database instance is used to define the complete database environment and
its components. Or we can say that it is a set of memory structures and background processes that are used to
access the database files.

5. Query: In a database, a query is used to access data from the database. So users have to write queries to retrieve
or manipulate data from the database.

6. Data manipulation: In a database, we can easily manipulate data using the three main operations that is Insertion,
Deletion, and updation.

7. Data Engine: It is an underlying component that is used to create and manage various database queries.

Advantages of Database

Let us consider some of the benefits provided by a database system and see how a database system overcomes the
above-mentioned problems:-

 Reduces database data redundancy to a great extent

 The database can control data inconsistency to a great extent

 The database facilitates sharing of data.

 Database enforce standards.

 The database can ensure data security.

 Integrity can be maintained through databases.

Therefore, for systems with better performance and efficiency, database systems are preferred.

Disadvantages of Database

With the complex tasks to be performed by the database system, some things may come up which can be termed as
the disadvantages of using the database system. These are:-

 Security may be compromised without good controls.

 Integrity may be compromised without good controls.

 Extra hardware may be required

 Performance overhead may be significant.

 The system is likely to be complex.


UNIT 1: Database System Concepts & Architecture
Normalization

This is the process of organizing a database to minimize redundancy and dependency by breaking down complex
tables into smaller, more manageable ones. It’s important to understand normalization because it helps you create
efficient and scalable databases, reduces data inconsistency and duplication, and makes it easier to update and
maintain the database over time.

This information is often skipped over in introductory material because it can be technical and complex, but it is
crucial for understanding how to properly design and maintain a database.

Data Abstraction and Data Independence


Database systems comprise complex data structures. In order to make the system efficient in terms of retrieval of
data, and reduce complexity in terms of usability of users, developers use abstraction i.e. hide irrelevant details from
the users. This approach simplifies database design.

Level of Abstraction in a DBMS

There are mainly 3 levels of data abstraction:

 Physical or Internal Level

 Logical or Conceptual Level

 View or External Level

Physical or Internal Level

This is the lowest level of data abstraction. It tells us how the data is actually stored in memory. Access methods like
sequential or random access and file organization methods like B+ trees and hashing are used for the same.
Usability, size of memory, and the number of times the records are factors that we need to know while designing the
database.
Suppose we need to store the details of an employee. Blocks of storage and the amount of memory used for these
purposes are kept hidden from the user.

Logical or Conceptual Level

This level comprises the information that is actually stored in the database in the form of tables. It also stores the
relationship among the data entities in relatively simple structures. At this level, the information available to the
user at the view level is unknown.
We can store the various attributes of an employee and relationships, e.g. with the manager can also be stored.

View or External Level

This is the highest level of abstraction. Only a part of the actual database is viewed by the users. This level exists to
ease the accessibility of the database by an individual user. Users view data in the form of rows and columns. Tables
and relations are used to store data. Multiple views of the same database may exist. Users can just view the data
and interact with the database, storage and implementation details are hidden from them.

Example: In case of storing customer data,

 Physical level – it will contains block of storages (bytes,GB,TB,etc)

 Logical level – it will contain the fields and the attributes of data.

 View level – it works with CLI or GUI access of database


UNIT 1: Database System Concepts & Architecture

Data Abstraction

The main purpose of data abstraction is to achieve data independence in order to save the time and cost required
when the database is modified or altered.

Data Independence

Data Independence is mainly defined as a property of DBMS that helps you to change the database schema at one
level of a system without requiring to change the schema at the next level. it helps to keep the data separated from
all program that makes use of it.
We have namely two levels of data independence arising from these levels of abstraction:

 Physical level data independence

 Logical level data independence

Data Independence
UNIT 1: Database System Concepts & Architecture
Physical Level Data Independence

It refers to the characteristic of being able to modify the physical schema without any alterations to the conceptual
or logical schema, done for optimization purposes, e.g., the Conceptual structure of the database would not be
affected by any change in storage size of the database system server. Changing from sequential to random access
files is one such example. These alterations or modifications to the physical structure may include:

 Utilizing new storage devices.

 Modifying data structures used for storage.

 Altering indexes or using alternative file organization techniques etc.

Logical Level Data Independence

It refers characteristic of being able to modify the logical schema without affecting the external schema or
application program. The user view of the data would not be affected by any changes to the conceptual view of the
data. These changes may include insertion or deletion of attributes, altering table structures entities or relationships
to the logical schema, etc.

Database Languages in DBMS

o A DBMS has appropriate languages and interfaces to express database queries and updates.

o Database languages can be used to read, store and update the data in the database.

Types of Database Languages

1. Data Definition Language (DDL)

o DDL stands for Data Definition Language. It is used to define database structure or pattern.

o It is used to create schema, tables, indexes, constraints, etc. in the database.

o Using the DDL statements, you can create the skeleton of the database.

o Data definition language is used to store the information of metadata like the number of tables and
schemas, their names, indexes, columns in each table, constraints, etc.
UNIT 1: Database System Concepts & Architecture
Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.

o Alter: It is used to alter the structure of the database.

o Drop: It is used to delete objects from the database.

o Truncate: It is used to remove all records from a table.

o Rename: It is used to rename an object.

o Comment: It is used to comment on the data dictionary.

These commands are used to update the database schema that's why they come under Data definition language.

2. Data Manipulation Language (DML)

DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database. It handles
user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.

o Insert: It is used to insert data into a table.

o Update: It is used to update existing data within a table.

o Delete: It is used to delete all records from a table.

o Merge: It performs UPSERT operation, i.e., insert or update operations.

o Call: It is used to call a structured query language or a Java subprogram.

o Explain Plan: It has the parameter of explaining data.

o Lock Table: It controls concurrency.

3. Data Control Language (DCL)

o DCL stands for Data Control Language. It is used to retrieve the stored or saved data.

o The DCL execution is transactional. It also has rollback parameters.

(But in Oracle database, the execution of data control language does not have the feature of rolling back.)

Here are some tasks that come under DCL:

o Grant: It is used to give user access privileges to a database.

o Revoke: It is used to take back permissions from the user.

There are the following operations which have the authorization of Revoke:

CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.

4. Transaction Control Language (TCL)

TCL is used to run the changes made by the DML statement. TCL can be grouped into a logical transaction.
UNIT 1: Database System Concepts & Architecture
Here are some tasks that come under TCL:

o Commit: It is used to save the transaction on the database.

o Rollback: It is used to restore the database to original since the last Commit.

Components of DBMS

DBMS stands for DataBase Management System. DBMS is a type of software by which we can save and retrieve the
user's data with the security process. DBMS can manipulate the database with the help of a group of programs. The
DBMS can accept the request from the operating system to supply the data. The DBMS also can accept the request
to retrieve a large amount of data through the user and third-party software.

DBMS also give permission to the user to use the data according to their needs. The word "DBMS" contains
information regarding the database program and the users. It also provides an interface between the user and the
software. In this topic, we are going to discuss the various types of DBMS.

Components of DBMS

There are many components available in the DBMS. Each component has a significant task in the DBMS. A database
environment is a collection of components that regulates the use of data, management, and a group of data. These
components consist of people, the technique of Handel the database, data, hardware, software, etc. there are
several components available for the DBMS. We are going to explain five main topics of the database below.

1. Hardware

o Here the hardware means the physical part of the DBMS. Here the hardware includes output devices like a
printer, monitor, etc., and storage devices like a hard disk.
UNIT 1: Database System Concepts & Architecture
o In DBMS, information hardware is the most important visible part. The equipment which is used for the
visibility of the data is the printer, computer, scanner, etc. This equipment is used to capture the data and
present the output to the user.

o With the help of hardware, the DBMS can access and update the database.

o The server can store a large amount of data, which can be shared with the help of the user's own system.

o The database can be run in any system that ranges from microcomputers to mainframe computers. And this
database also provides an interface between the real worlds to the database.

o When we try to run any database software like MySQL, we can type any commands with the help of our
keyboards, and RAM, ROM, and processor are part of our computer system.

2. Software

o Software is the main component of the DBMS.

o Software is defined as the collection of programs that are used to instruct the computer about its work. The
software consists of a set of procedures, programs, and routines associated with the computer system's
operation and performance. Also, we can say that computer software is a set of instructions that is used to
instruct the computer hardware for the operation of the computers.

o The software includes so many software like network software and operating software. The database
software is used to access the database, and the database application performs the task.

o This software has the ability to understand the database accessing language and then convert these
languages to real database commands and then execute the database.

o This is the main component as the total database operation works on a software or application. We can also
be called as database software the wrapper of the whole physical database, which provides an easy interface
for the user to store, update and delete the data from the database.

o Some examples of DBMS software include MySQL, Oracle, SQL Server, dBase, FileMaker, Clipper, Foxpro,
Microsoft Access, etc.

3. Data

o The term data means the collection of any raw fact stored in the database. Here the data are any type of raw
material from which meaningful information is generated.

o The database can store any form of data, such as structural data, non-structural data, and logical data.

o The structured data are highly specific in the database and have a structured format. But in the case of non-
structural data, it is a collection of different types of data, and these data are stored in their native format.

o We also call the database the structure of the DBMS. With the help of the database, we can create and
construct the DBMS. After the creation of the database, we can create, access, and update that database.

o The main reason behind discovering the database is to create and manage the data within the database.

o Data is the most important part of the DBMS. Here the database contains the actual data and metadata.
Here metadata means data about data.

o For example, when the user stores the data in a database, some data, such as the size of the data, the name
of the data, and some data related to the user, are stored within the database. These data are called
metadata.
UNIT 1: Database System Concepts & Architecture
4. Procedures

o The procedure is a type of general instruction or guidelines for the use of DBMS. This instruction includes
how to set up the database, how to install the database, how to log in and log out of the database, how to
manage the database, how to take a backup of the database, and how to generate the report of the
database.

o In DBMS, with the help of procedure, we can validate the data, control the access and reduce the traffic
between the server and the clients. The DBMS can offer better performance to extensive or complex
business logic when the user follows all the procedures correctly.

o The main purpose of the procedure is to guide the user during the management and operation of the
database.

o The procedure of the databases is so similar to the function of the database. The major difference between
the database procedure and database function is that the database function acts the same as the SQL
statement. In contrast, the database procedure is invoked using the CALL statement of the DBMS.

o Database procedures can be created in two ways in enterprise architecture. These two ways are as below.

o The individual object or the default object.

o The operations in a container.

1. CREATE [OR REPLACE] PROCEDURE procedure_name (<Argument> {IN, OUT, IN OUT}

2. <Datatype>,...)

3. IS

4. Declaration section<variable, constant> ;

5. BEGIN

6. Execution section

7. EXCEPTION

8. Exception section

9. END

5. Database Access Language

o Database Access Language is a simple language that allows users to write commands to perform the desired
operations on the data that is stored in the database.

o Database Access Language is a language used to write commands to access, upsert, and delete data stored in
a database.

o Users can write commands or query the database using Database Access Language before submitting them
to the database for execution.

o Through utilizing the language, users can create new databases and tables, insert data and delete data.

o Examples of database languages are SQL (structured query language), My Access, Oracle, etc. A database
language is comprised of two languages.
UNIT 1: Database System Concepts & Architecture
1. Data Definition Language(DDL):It is used to construct a database. DDL implements database schema at the
physical, logical, and external levels.

The following commands serve as the base for all DDL commands:

o ALTER<object>

o COMMENT

o CREATE<object>

o DESCRIBE<object>

o DROP<object>

o SHOW<object>

o USE<object>

2. Data Manipulation Language(DML): It is used to access a database. The DML provides the statements to retrieve,
modify, insert and delete the data from the database.

The following commands serve as the base for all DML commands:

o INSERT

o UPDATE

o DELETE

o LOCK

o CALL

o EXPLAIN PLAN

6. People

o The people who control and manage the databases and perform different types of operations on the
database in the DBMS.

o The people include database administrator, software developer, and End-user.

o Database administrator-database administrator is the one who manages the complete database
management system. DBA takes care of the security of the DBMS, its availability, managing the license keys,
managing user accounts and access, etc.

o Software developer- theThis user group is involved in developing and designing the parts of DBMS. They can
handle massive quantities of data, modify and edit databases, design and develop new databases, and
troubleshoot database issues.

o End user - These days, all modern web or mobile applications store user data. How do you think they do it?
Yes, applications are programmed in such a way that they collect user data and store the data on a DBMS
system running on their server. End users are the ones who store, retrieve, update and delete data.

o The users of the database can be classified into different groups.

i. Native Users

ii. Online Users

iii. Sophisticated Users


UNIT 1: Database System Concepts & Architecture
iv. Specialized Users

v. Application Users

vi. DBA - Database Administrator

DBA Full Form


A Database Administrator (DBA) is an individual or person responsible for controlling, maintaining, coordinating,
and operating a database management system. Managing, securing, and taking care of the database systems is a
prime responsibility. They are responsible and in charge of authorizing access to the database, coordinating,
capacity, planning, installation, and monitoring uses, and acquiring and gathering software and hardware resources
as and when needed. Their role also varies from configuration, database design, migration, security, troubleshooting,
backup, and data recovery. Database administration is a major and key function in any firm or organization that is
relying on one or more databases. They are overall commanders of the Database system.

Types of Database Administrator (DBA) :

 Administrative DBA –
Their job is to maintain the server and keep it functional. They are concerned with data backups, security,
troubleshooting, replication, migration, etc.

 Data Warehouse DBA –


Assigned earlier roles, but held accountable for merging data from various sources into the data warehouse.
They also design the warehouse, with cleaning and scrubs data prior to loading.

 Cloud DBA –
Nowadays companies are preferring to save their workpiece on cloud storage. As it reduces the chance of
data loss and provides an extra layer of data security and integrity.

 Development DBA –
They build and develop queries, stores procedure, etc. that meets firm or organization needs. They are par
at programming.

 Application DBA –
They particularly manage all requirements of application components that interact with the database and
accomplish activities such as application installation and coordination, application upgrades, database
cloning, data load process management, etc.

 Architect –
They are held responsible for designing schemas like building tables. They work to build a structure that
meets organizational needs. The design is further used by developers and development DBAs to design and
implement real applications.

 OLAP DBA –
They design and build multi-dimensional cubes for determination support or OLAP systems.

 Data Modeler –
In general, a data modeler is in charge of a portion of a data architect’s duties. A data modeler is typically
not regarded as a DBA, but this is not a hard and fast rule.

 Task-Oriented DBA –
To concentrate on a specific DBA task, large businesses may hire highly specialised DBAs. They are quite
uncommon outside of big corporations. Recovery and backup DBA, whose responsibility it is to guarantee
that the databases of businesses can be recovered, is an example of a task-oriented DBA. However, this
specialism is not present in the majority of firms. These task-oriented DBAs will make sure that highly
qualified professionals are working on crucial DBA tasks when it is possible.
UNIT 1: Database System Concepts & Architecture
 Database Analyst –
This position doesn’t actually have a set definition. Junior DBAs may occasionally be referred to as database
analysts. A database analyst occasionally performs functions that are comparable to those of a database
architect. The term “Data Administrator” is also used to describe database analysts and data analysts.
Additionally, some businesses occasionally refer to database administrators as data analysts.

Importance of Database Administrator (DBA) :

 Database Administrator manages and controls three levels of database internal level, conceptual level, and
external level of Database management system architecture and in discussion with the comprehensive user
community, gives a definition of the world view of the database. It then provides an external view of
different users and applications.

 Database Administrator ensures held responsible to maintain integrity and security of database restricting
from unauthorized users. It grants permission to users of the database and contains a profile of each and
every user in the database.

 Database Administrators are also held accountable that the database is protected and secured and that any
chance of data loss keeps at a minimum.

 Database Administrator is solely responsible for reducing the risk of data loss as it backup the data at regular
intervals.

Role and Duties of Database Administrator (DBA) :

 Decides hardware –
They decide on economical hardware, based on cost, performance, and efficiency of hardware, and best
suits the organization. It is hardware that is an interface between end users and the database.

 Manages data integrity and security –


Data integrity needs to be checked and managed accurately as it protects and restricts data from
unauthorized use. DBA eyes on relationships within data to maintain data integrity.

 Database Accessibility –
Database Administrator is solely responsible for giving permission to access data available in the database. It
also makes sure who has the right to change the content.

 Database design –
DBA is held responsible and accountable for logical, physical design, external model design, and integrity and
security control.

 Database implementation –
DBA implements DBMS and checks database loading at the time of its implementation.

 Query processing performance –


DBA enhances query processing by improving speed, performance, and accuracy.

 Tuning Database Performance –


If the user is not able to get data speedily and accurately then it may lose organization’s business. So by
tuning SQL commands DBA can enhance the performance of the database.

Various responsibilities of Database Administrator (DBA) :

 Responsible for designing overall database schema (tables & fields).

 To select and install database software and hardware.

 Responsible for deciding on access methods and data storage.


UNIT 1: Database System Concepts & Architecture
 DBA selects appropriate DBMS software like oracle, SQL server or MySQL.

 Used in designing recovery procedures.

 DBA decides the user access level and security checks for accessing, modifying or manipulating data.

 DBA is responsible for specifying various techniques for monitoring the database performance.

 DBA is responsible for operation managements.

 The operation management deals with the data problems which arises on day to day basis, and the
responsibilities include are:

1. Investigating if any error is been found in the data.

2. Supervising of restart and recovery procedures in case of any event failure.

3. Supervising reorganization of the databases.

4. Controlling and handling all periodic dumps of data.

Skills Required for DBA:

1. The various programming and soft skills are required to DBA are as follows,

 Good communication skills

 Excellent knowledge of databases architecture and design and RDBMS.

 Knowledge of Structured Query Language (SQL).

2. In addition, this aspect of database administration includes maintenance of data security, which involves
maintaining security authorization tables, conducting periodic security audits, investigating all known security
breaches.

3. To carry out all these functions, it is crucial that the DBA has all the accurate information about the company’s
data readily on hand. For this purpose he maintains a data dictionary.

4. The data dictionary contains definitions of all data items and structures, the various schemes, the relevant
authorization and validation checks and the different mapping definitions.

5. It should also have information about the source and destination of a data item and the flow of a data item as it is
used by a system. This type of information is a great help to the DBA in maintaining centralized control of data.

Short Note on Data Dictionary


A Data Dictionary comprises two words i.e. Data which simply means information being collected through some
sources and Dictionary means where this information is available.

A Data Dictionary can be defined as a collection of information on all data elements or contents of databases such as
data types, and text descriptions of the system. It makes it easier for users and analysts to use data as well as
understand and have common knowledge about inputs, outputs, components of a database, and intermediate
calculations.

Why Data Dictionary is Essential?

There is less information and details provided by data models. So, a data dictionary is essential and needed to have
proper knowledge and usage of contents. Data Dictionary provides all information about names that are used in
system models. Data Dictionary also provides information about entities, relationships, and attributes that are
present in the system model. As a part of the structured analysis and design tool, the implementation of a data
dictionary is done.
UNIT 1: Database System Concepts & Architecture
The following data name type of Information is used to store in a data dictionary:

Name Description

Name generally includes the primary name of all composite data or control items
Name
available, and the name of the external entity or data store.

Aliases Any other word used in place of Name

Where or How it’s A data dictionary generally gives information about where and how data or control items
used? are used which may include an input/ output to process.

Description A notation for representing content

Types of Data Dictionary

Data Dictionary is basically of two types. We will discuss each of them.

 Integrated Data Dictionary

 Stand Alone Data Dictionary

1. Integrated Data Dictionary

Integrated Data Dictionary can be seen as a catalog that can be maintained by the relational database. In previous
databases, there is not any functionality of Integrated Data Dictionary, so they use Stand Alone Data Dictionary.

There are two types of Integrated Data Dictionary.

 Active Data Dictionary: Active Database Dictionary is a type of database that is updated automatically in
case any changes are to be done to the database. These are self-updating.

 Passive Data Dictionary: Passive Databases are the databases that have to be maintained or updated
manually in case of any changes have been made to the system.

2. Stand Alone Data Dictionary

Stand Alone Data Dictionary is a type of flexible data dictionary as Database Administrator has ease of managing
data. It does not require data that is computer-based. It has no fixed format. But some elements are common in this
kind of database.

 Data Elements: It has the elements like name, datatype, validation rules, etc.

 Tables: These contain all the necessary information that is required for the table, how many rows in the
table, how many columns in the table, etc.

 Index: The index of the databases is to be stored here.

 Programs: These are used for accessing the database, and can include SQL Queries, Reports, etc.

 Relationship between Data Elements: This stores the relationship among the different databases, like
cardinality, connectivity, etc.
UNIT 1: Database System Concepts & Architecture
How to Create a Data Dictionary?

Data Dictionary can be formed with the help of the following relational database like MySQL, SQL Server, etc. Several
notations are present everywhere which enhances the quality of the Data Dictionary. We will also look into that.
Database Administrators can have the following templates like SQL Server, that help in making Data Dictionary.

Notations in Data Dictionary

Data Construct Notation Meaning

Composition = Is composed of

Sequence + Represents AND

Selection [ |] Represents OR

Repetition { }n Represents n repetitions or repetition for n times

Parentheses () Optional data that may or may not be present

Comment *…* Delimits a comment or commented information

Example: Let us understand this by taking an example of the reservation system. In the reservation system,
“passenger” is a data item whose information is available on the data dictionary as follows:

Keys Values

Name Passenger

Aliases None

Where or how it’s used? Passenger’s query (input) Ticket (output)

 Passenger = Passenger_name + Passenger_address

 Passenger_name = Passenger_lastname + Passenger_firstname + Passenger_middle_initial

Description  Passenger_address = Local_address + Community_address + Zip_code

 Local_address = House_number + street_name + Apartment_number

 Community_Address = City_name + State_name


UNIT 1: Database System Concepts & Architecture
Challenges with Data Dictionary

 The main challenge that occurs in front of us is that a data dictionary is somehow difficult and it may lead to
take much time in case when data is not prepared previously.

 Data Preparation is a bit lengthy and hectic process for a large scale of data.

 When you don’t do Data Preparation, Data Dictionary is not that much efficient.

 It can be an expensive process when resources are not to be utilized efficiently.

 Data Dictionary can be an insecure process as if you give access to so many persons, it might be a challenge
to the security of the Data Dictionary.

Conclusion

 Data dictionaries enhance the maintainability, security, accessibility, and quality of data.

 Data dictionaries can be installed as a stand-alone program or integrated into the database management
system.

 An informational database about the data within a database is called a data dictionary.

 They include thorough explanations of all database elements, such as constraints, tables, columns, and
indexes.
UNIT 2: Data Modeling Technique
Data Models in DBMS
A Data Model in Database Management System (DBMS) is the concept of tools that are developed to summarize the
description of the database. Data Models provide us with a transparent picture of data which helps us in creating an
actual database. It shows us from the design of the data to its proper implementation of data.

Types of Relational Models

1. Conceptual Data Model

2. Representational Data Model

3. Physical Data Model

It is basically classified into 3 types:-

1. Conceptual Data Model

The conceptual data model describes the database at a very high level and is useful to understand the needs or
requirements of the database. It is this model, that is used in the requirement-gathering process i.e. before the
Database Designers start making a particular database. One such popular model is the entity/relationship model (ER
model). The E/R model specializes in entities, relationships, and even attributes that are used by database designers.
In terms of this concept, a discussion can be made even with non-computer science(non-technical) users and
stakeholders, and their requirements can be understood.

Entity-Relationship Model( ER Model): It is a high-level data model which is used to define the data and the
relationships between them. It is basically a conceptual design of any database which is easy to design the view of
data.

Components of ER Model:

1. Entity: An entity is referred to as a real-world object. It can be a name, place, object, class, etc. These are
represented by a rectangle in an ER Diagram.

2. Attributes: An attribute can be defined as the description of the entity. These are represented by Eclipse in
an ER Diagram. It can be Age, Roll Number, or Marks for a Student.

3. Relationship: Relationships are used to define relations among different entities. Diamonds and Rhombus
are used to show Relationships.
UNIT 2: Data Modeling Technique
Characteristics of a conceptual data model

 Offers Organization-wide coverage of the business concepts.

 This type of Data Models are designed and developed for a business audience.

 The conceptual model is developed independently of hardware specifications like data storage capacity,
location or software specifications like DBMS vendor and technology. The focus is to represent data as a user
will see it in the “real world.”

Conceptual data models known as Domain models create a common vocabulary for all stakeholders by establishing
basic concepts and scope

2. Representational Data Model

This type of data model is used to represent only the logical part of the database and does not represent the physical
structure of the database. The representational data model allows us to focus primarily, on the design part of the
database. A popular representational model is a Relational model. The relational Model consists of Relational
Algebra and Relational Calculus. In the Relational Model, we basically use tables to represent our data and the
relationships between them. It is a theoretical concept whose practical implementation is done in Physical Data
Model.

The advantage of using a Representational data model is to provide a foundation to form the base for the Physical
model

3. Physical Data Model

The physical Data Model is used to practically implement Relational Data Model. Ultimately, all data in a database is
stored physically on a secondary storage device such as discs and tapes. This is stored in the form of files, records,
and certain other data structures. It has all the information on the format in which the files are present and the
structure of the databases, the presence of external data structures, and their relation to each other. Here, we
basically save tables in memory so they can be accessed efficiently. In order to come up with a good physical model,
we have to work on the relational model in a better way. Structured Query Language (SQL) is used to practically
implement Relational Algebra.

This Data Model describes HOW the system will be implemented using a specific DBMS system. This model is
typically created by DBA and developers. The purpose is actual implementation of the database.

Characteristics of a physical data model:

 The physical data model describes data need for a single project or application though it maybe integrated
with other physical data models based on project scope.

 Data Model contains relationships between tables that which addresses cardinality and nullability of the
relationships.

 Developed for a specific version of a DBMS, location, data storage or technology to be used in the project.

 Columns should have exact datatypes, lengths assigned and default values.

 Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined

Some Other Data Models

1. Hierarchical Model

The hierarchical Model is one of the oldest models in the data model which was developed by IBM, in the 1950s. In a
hierarchical model, data are viewed as a collection of tables, or we can say segments that form a hierarchical
relation. In this, the data is organized into a tree-like structure where each record consists of one parent record and
UNIT 2: Data Modeling Technique
many children. Even if the segments are connected as a chain-like structure by logical associations, then the instant
structure can be a fan structure with multiple branches. We call the illogical associations as directional associations.

2. Network Model

The Network Model was formalized by the Database Task group in the 1960s. This model is the generalization of the
hierarchical model. This model can consist of multiple parent segments and these segments are grouped as levels
but there exists a logical association between the segments belonging to any level. Mostly, there exists a many-to-
many logical association between any of the two segments.

3. Object-Oriented Data Model

In the Object-Oriented Data Model, data and their relationships are contained in a single structure which is referred
to as an object in this data model. In this, real-world problems are represented as objects with different attributes.
All objects have multiple relationships between them. Basically, it is a combination of Object Oriented programming
and a Relational Database Model.

4. Float Data Model

The float data model basically consists of a two-dimensional array of data models that do not contain any duplicate
elements in the array. This data model has one drawback it cannot store a large amount of data that is the tables can
not be of large size.

5. Context Data Model

The Context data model is simply a data model which consists of more than one data model. For example, the
Context data model consists of ER Model, Object-Oriented Data Model, etc. This model allows users to do more than
one thing which each individual data model can do.

6. Semi-Structured Data Model

Semi-Structured data models deal with the data in a flexible way. Some entities may have extra attributes and some
entities may have some missing attributes. Basically, you can represent data here in a flexible way.

Advantages of Data Models

1. Data Models help us in representing data accurately.

2. It helps us in finding the missing data and also in minimizing Data Redundancy.

3. Data Model provides data security in a better way.

4. The data model should be detailed enough to be used for building the physical database.

5. The information in the data model can be used for defining the relationship between tables, primary and
foreign keys, and stored procedures.

Disadvantages of Data Models

1. In the case of a vast database, sometimes it becomes difficult to understand the data model.

2. You must have the proper knowledge of SQL to use physical models.

3. Even smaller change made in structure require modification in the entire application.

4. There is no set data manipulation language in DBMS.

5. To develop Data model one should know physical data stored characteristics.
UNIT 2: Data Modeling Technique
Conclusion

 Data modeling is the process of developing data model for the data to be stored in a Database.

 Data Models ensure consistency in naming conventions, default values, semantics, security while ensuring
quality of the data.

 Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures.

 There are three types of conceptual, logical, and physical.

 The main aim of conceptual model is to establish the entities, their attributes, and their relationships.

 Logical data model defines the structure of the data elements and set the relationships between them.

 A Physical Data Model describes the database specific implementation of the data model.

 The main goal of a designing data model is to make certain that data objects offered by the functional team
are represented accurately.

 The biggest drawback is that even smaller change made in structure require modification in the entire
application.

 Reading this Data Modeling tutorial, you will learn from the basic concepts such as What is Data Model?
Introduction to different types of Data Model, advantages, disadvantages, and data model example.

Difference between Hierarchical, Network and Relational Data Model


1. Hierarchical Data Model:
Hierarchical data model is the oldest type of the data model. It was developed by IBM in 1968. It organizes data in
the tree-like structure. Hierarchical model consists of the following :

 It contains nodes which are connected by branches.

 The topmost node is called the root node.

 If there are multiple nodes appear at the top level, then these can be called as root segments.

 Each node has exactly one parent.

 One parent may have many child.


UNIT 2: Data Modeling Technique

Figure – Hierarchical Data Model

In the above figure, Electronics is the root node which has two children i.e. Televisions and Portable Electronics.
These two has further children for which they act as parent. For example: Television has children as Tube, LCD and
Plasma, for these three Television act as parent. It follows one to many relationship.

2. Network Data Model:


It is the advance version of the hierarchical data model. To organize data it uses directed graphs instead of the tree-
structure. In this child can have more than one parent. It uses the concept of the two data structures i.e. Records
and Sets.

Figure – Network Data Model

In the above figure, Project is the root node which has two children i.e. Project 1 and Project 2. Project 1 has 3
children and Project 2 has 2 children. Total there are 5 children i.e Department A, Department B and Department C,
they are network related children as we said that this model can have more than one parent. So, for the Department
B and Department C have two parents i.e. Project 1 and Project 2.

3. Relational Data Model:


The relational data model was developed by E.F. Codd in 1970. There are no physical links as they are in the
hierarchical data model. Following are the properties of the relational data model :

 Data is represented in the form of table only.


UNIT 2: Data Modeling Technique
 It deals only with the data not with the physical structure.

 It provides information regarding metadata.

 At the intersection of row and column there will be only one value for the tuple.

 It provides a way to handle the queries with ease.

Figure – Relational Data Model

Difference between Hierarchical, Network and Relational Data Model:

S.
No. Hierarchical Data Model Network Data Model Relational Data Model

In this model, to store data It organizes records in the form


hierarchy method is used. It is It organizes records to one another of table and relationship
1.
the oldest method and not in through links or pointers. between tables are set using
use today. common fields.

To organize records, it uses It organizes records in the form of It organizes records in the form
2.
tree structure. directed graphs. of tables.

In addition to 1:1 and 1:n it also In addition to 1:1 and 1:n it also
It implements 1:1 and 1:n
3. implements many to many implements many to many
relations.
relationships. relationships.
UNIT 2: Data Modeling Technique
S.
No. Hierarchical Data Model Network Data Model Relational Data Model

The logical representation is


Pointers are used to establish A linked list is used to establish a
used with rows and columns to
4. relationships among records relationship among records
depict relationship among
physically. physically.
records.

Insertion anomaly exits in this


model i.e. child node cannot be
5. There is no insertion anomaly. There is no insertion anomaly.
inserted without the parent
node.

Deletion anomaly exists in this


6. model i.e. it is difficult to There is no deletion anomaly. There is no deletion anomaly.
delete the parent node.

Update leads to inconsistency Updating a record is easy and


problems because of the No such problem as only one simple with the process of
7.
existence of multiple instances instance of records exist. normalization, the redundant
of a child record. data gets removed.

This model lacks data There is partial data independence This model provides data
8.
independence. in this model. independence.

No such facility for querying No such facility for querying SQL-based declarative querying
9.
database is supported. database is supported. is supported.

It is used to access the data It is used to access the data


It is used to access the data which is
10. which is complex and which is complex and
complex and symmetric.
asymmetric. symmetric.

Difficult to design a database and


It is easy to comprehend due to
Difficult to design a database manipulate a database because of
11. concealed physical level details
because of its complexity. its complexity. Hence, it imposes a
from end-users.
burden on the programmer.

It is flexible as compared to the It is flexible as compared to the


12. It is less flexible.
hierarchical model. hierarchical model.
UNIT 2: Data Modeling Technique
S.
No. Hierarchical Data Model Network Data Model Relational Data Model

&XML and XAML use this VAX-DBMS, DMS-1100 of UNIVAC It is mostly used in real world
13.
model. and SUPRADBMS’s use this model. applications. Oracle, SQL.

Enhanced ER Model
Prerequisite – Introduction of ER Model

Today the complexity of the data is increasing so it becomes more and more difficult to use the traditional ER model
for database modeling. To reduce this complexity of modeling we have to make improvements or enhancements to
the existing ER model to make it able to handle the complex application in a better way.

Enhanced entity-relationship diagrams are advanced database diagrams very similar to regular ER diagrams which
represent the requirements and complexities of complex databases.

It is a diagrammatic technique for displaying the Sub Class and Super Class; Specialization and Generalization; Union
or Category; Aggregation etc.

Generalization and Specialization: These are very common relationships found in real entities. However, this kind of
relationship was added later as an enhanced extension to the classical ER model. Specialized classes are often
called subclass while a generalized class is called a superclass, probably inspired by object-oriented programming. A
sub-class is best understood by “IS-A analysis”. The following statements hopefully make some sense to your mind
“Technician IS-A Employee”, and “Laptop IS-A Computer”.

An entity is a specialized type/class of another entity. For example, a Technician is a special Employee in a university
system Faculty is a special class of Employees. We call this phenomenon generalization/specialization. In the
example here Employee is a generalized entity class while the Technician and Faculty are specialized classes of
Employee.

Example:

This example instance of “sub-class” relationships. Here we have four sets of employees: Secretary, Technician, and
Engineer. The employee is a super-class of the rest three sets of individual sub-class is a subset of Employee set.
UNIT 2: Data Modeling Technique

 An entity belonging to a sub-class is related to some super-class entity. For instance emp, no 1001 is a
secretary, and his typing speed is 68. Emp no 1009 is an engineer (sub-class) and her trade is “Electrical”, so
forth.

 Sub-class entity “inherits” all attributes of super-class; for example, employee 1001 will have attributes eno,
name, salary, and typing speed.
UNIT 2: Data Modeling Technique
Enhanced ER model of above example

Constraints – There are two types of constraints on the “Sub-class” relationship.

1. Total or Partial – A sub-classing relationship is total if every super-class entity is to be associated with some
sub-class entity, otherwise partial. Sub-class “job type based employee category” is partial sub-classing – not
necessary every employee is one of (secretary, engineer, and technician), i.e. union of these three types is a
proper subset of all employees. Whereas other sub-classing “Salaried Employee AND Hourly Employee” is
total; the union of entities from sub-classes is equal to the total employee set, i.e. every employee
necessarily has to be one of them.

2. Overlapped or Disjoint – If an entity from a super-set can be related (can occur) in multiple sub-class sets,
then it is overlapped sub-classing, otherwise disjoint. Both the examples: job-type based and salaries/hourly
employee sub-classing are disjoint.

Note – These constraints are independent of each other: can be “overlapped and total or partial” or “disjoint and
total or partial”. Also, sub-classing has transitive properties.

Multiple Inheritance (sub-class of multiple superclasses) –


An entity can be a sub-class of multiple entity types; such entities are sub-class of multiple entities and have multiple
super-classes; Teaching Assistant can subclass of Employee and Student both. A faculty in a university system can be
a subclass of Employee and Alumnus. In multiple inheritances, attributes of sub-class are the union of attributes of
all super-classes.
UNIT 2: Data Modeling Technique
Union –

 Set of Library Members is UNION of Faculty, Student, and Staff. A union relationship indicates either type;
for example, a library member is either Faculty or Staff or Student.

 Below are two examples that show how UNION can be depicted in ERD – Vehicle Owner is UNION of
PERSON and Company, and RTO Registered Vehicle is UNION of Car and Truck.

You might see some confusion in Sub-class and UNION; consider an example in above figure Vehicle is super-class of
CAR and Truck; this is very much the correct example of the subclass as well but here use it differently we are saying
RTO Registered vehicle is UNION of Car and Vehicle, they do not inherit any attribute of Vehicle, attributes of car and
truck are altogether independent set, where is in sub-classing situation car and truck would be inheriting the
attribute of vehicle class.

An Enhanced Entity-Relationship (EER) model is an extension of the original Entity-Relationship (ER) model that
includes additional concepts and features to support more complex data modeling requirements. The EER model
includes all the elements of the ER model and adds new constructs, such as subtypes and supertypes, generalization
and specialization, and inheritance.

Here are some of the key features of the EER model:

 Subtypes and Supertypes: The EER model allows for the creation of subtypes and supertypes. A supertype is
a generalization of one or more subtypes, while a subtype is a specialization of a supertype. For example, a
vehicle could be a supertype, while car, truck, and motorcycle could be subtypes.

 Generalization and Specialization: Generalization is the process of identifying common attributes and
relationships between entities and creating a supertype based on these common features. Specialization is
UNIT 2: Data Modeling Technique
the process of identifying unique attributes and relationships between entities and creating subtypes based
on these unique features.

 Inheritance: Inheritance is a mechanism that allows subtypes to inherit attributes and relationships from
their supertype. This means that any attribute or relationship defined for a supertype is automatically
inherited by all its subtypes.

 Constraints: The EER model allows for the specification of constraints that must be satisfied by entities and
relationships. Examples of constraints include cardinality constraints, which specify the number of
relationships that can exist between entities, and participation constraints, which specify whether an entity
is required to participate in a relationship.

 Overall, the EER model provides a powerful and flexible way to model complex data relationships, making it
a popular choice for database design. An Enhanced Entity-Relationship (EER) model is an extension of the
traditional Entity-Relationship (ER) model that includes additional features to represent complex
relationships between entities more accurately. Some of the main features of the EER model are:

 Subclasses and Superclasses: EER model allows for the creation of a hierarchical structure of entities where
a superclass can have one or more subclasses. Each subclass inherits attributes and relationships from its
superclass, and it can also have its unique attributes and relationships.

 Specialization and Generalization: EER model uses the concepts of specialization and generalization to
create a hierarchy of entities. Specialization is the process of defining subclasses from a superclass, while
generalization is the process of defining a superclass from two or more subclasses.

 Attribute Inheritance: EER model allows attributes to be inherited from a superclass to its subclasses. This
means that attributes defined in the superclass are automatically inherited by all its subclasses.

 Union Types: EER model allows for the creation of a union type, which is a combination of two or more
entity types. The union type can have attributes and relationships that are common to all the entity types
that make up the union.

 Aggregation: EER model allows for the creation of an aggregate entity that represents a group of entities as
a single entity. The aggregate entity has its unique attributes and relationships.

 Multi-valued Attributes: EER model allows an attribute to have multiple values for a single entity instance.
For example, an entity representing a person may have multiple phone numbers.

 Relationships with Attributes: EER model allows relationships between entities to have attributes. These
attributes can describe the nature of the relationship or provide additional information about the
relationship.

Generalization, Specialization and Aggregation in ER Model


Using the ER model for bigger data creates a lot of complexity while designing a database model, So in order to
minimize the complexity Generalization, Specialization, and Aggregation were introduced in the ER model and these
were used for data abstraction in which an abstraction mechanism is used to hide details of a set of objects. Some of
the terms were added to the Enhanced ER Model, where some new concepts were added. These new concepts are:

 Generalization

 Specialization

 Aggregation

Generalization

Generalization is the process of extracting common properties from a set of entities and creating a generalized entity
from it. It is a bottom-up approach in which two or more entities can be generalized to a higher-level entity if they
UNIT 2: Data Modeling Technique
have some attributes in common. For Example, STUDENT and FACULTY can be generalized to a higher-level entity
called PERSON as shown in Figure 1. In this case, common attributes like P_NAME, and P_ADD become part of a
higher entity (PERSON), and specialized attributes like S_FEE become part of a specialized entity (STUDENT).

Generalization is also called as ‘ Bottom-up approach”.

Generalization

Specialization

In specialization, an entity is divided into sub-entities based on its characteristics. It is a top-down approach where
the higher-level entity is specialized into two or more lower-level entities. For Example, an EMPLOYEE entity in an
Employee management system can be specialized into DEVELOPER, TESTER, etc. as shown in Figure 2. In this case,
common attributes like E_NAME, E_SAL, etc. become part of a higher entity (EMPLOYEE), and specialized attributes
like TES_TYPE become part of a specialized entity (TESTER).

Specialization is also called as ” Top-Down approch”.


UNIT 2: Data Modeling Technique

Specialization

Inheritance: It is an important feature of generalization and specialization

 Attribute inheritance: allows lower level entities to inherit the attributes of higher level entities and vice
versa.

 in diagram: Car entity is an inheritance of Vehicle entity ,So Car can acquire attributes
of Vehicle example:car can acquire Model attribute of Vehicle.

 Participation inheritance: In participation inheritance, relationships involving higher level entity set also
inherited by lower level entity and vice versa.

 in diagram: Vehicle entity has an relationship with Cycle entity ,So Cycle entity can acquire attributes of
lower level entities i.e Car and Bus since it is inheritance of Vehicle

Aggregation

An ER diagram is not capable of representing the relationship between an entity and a relationship which may be
required in some scenarios. In those cases, a relationship with its corresponding entities is aggregated into a higher-
level entity. Aggregation is an abstraction through which we can represent relationships as higher-level entity sets.

For Example, an Employee working on a project may require some machinery. So, REQUIRE relationship is needed
between the relationship WORKS_FOR and entity MACHINERY. Using aggregation, WORKS_FOR relationship with its
entities EMPLOYEE and PROJECT is aggregated into a single entity and relationship REQUIRE is created between the
aggregated entity and MACHINERY.
UNIT 2: Data Modeling Technique

Aggregation

Representing Aggregation Via Schema

To represent aggregation, create a schema containing the following things.

 the primary key to the aggregated relationship

 the primary key to the associated entity set

 descriptive attribute, if exists


UNIT 3: Relational Data Model in Depth
Relational Model in DBMS
E.F. Codd proposed the relational Model to model data in the form of relations or tables. After designing the
conceptual model of the Database using ER diagram, we need to convert the conceptual model into a relational
model which can be implemented using any RDBMS language like Oracle SQL, MySQL, etc. So we will see what the
Relational Model is.

What is the Relational Model?

The relational model represents how data is stored in Relational Databases. A relational database consists of a
collection of tables, each of which is assigned a unique name. Consider a relation STUDENT with attributes ROLL_NO,
NAME, ADDRESS, PHONE, and AGE shown in the table.

Table Student

ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

2 RAMESH GURGAON 9652431543 18

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 18

Important Terminologies

 Attribute: Attributes are the properties that define an entity. e.g.; ROLL_NO, NAME, ADDRESS

 Relation Schema: A relation schema defines the structure of the relation and represents the name of the
relation with its attributes. e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, and AGE) is the relation
schema for STUDENT. If a schema has more than 1 relation, it is called Relational Schema.

 Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one of which is
shown as:

1 RAM DELHI 9455123451 18

 Relation Instance: The set of tuples of a relation at a particular instance of time is called a relation instance.
Table 1 shows the relation instance of STUDENT at a particular time. It can change whenever there is an
insertion, deletion, or update in the database.

 Degree: The number of attributes in the relation is known as the degree of the relation.
The STUDENT relation defined above has degree 5.

 Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation defined above
has cardinality 4.
UNIT 3: Relational Data Model in Depth
 Column: The column represents the set of values for a particular attribute. The column ROLL_NO is
extracted from the relation STUDENT.

ROLL_NO

 NULL Values: The value which is not known or unavailable is called a NULL value. It is represented by blank
space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.

 Relation Key: These are basically the keys that are used to identify the rows uniquely or also help in
identifying tables. These are of the following types.

 Primary Key

 Candidate Key

 Super Key

 Foreign Key

 Alternate Key

 Composite Key

Constraints in Relational Model

While designing the Relational Model, we define some conditions which must hold for data present in the database
are called Constraints. These constraints are checked before performing any operation (insertion, deletion, and
updation ) in the database. If there is a violation of any of the constraints, the operation will fail.

Domain Constraints

These are attribute-level constraints. An attribute can only take values that lie inside the domain range. e.g.; If a
constraint AGE>0 is applied to STUDENT relation, inserting a negative value of AGE will result in failure.

Key Integrity

Every relation in the database should have at least one set of attributes that defines a tuple uniquely. Those set of
attributes is called keys. e.g.; ROLL_NO in STUDENT is key. No two students can have the same roll number. So a key
has two properties:

 It should be unique for all tuples.

 It can’t have NULL values.

Referential Integrity
UNIT 3: Relational Data Model in Depth
When one attribute of a relation can only take values from another attribute of the same relation or any other
relation, it is called referential integrity. Let us suppose we have 2 relations

Table Student

ROLL_NO NAME ADDRESS PHONE AGE BRANCH_CODE

1 RAM DELHI 9455123451 18 CS

2 RAMESH GURGAON 9652431543 18 CS

3 SUJIT ROHTAK 9156253131 20 ECE

4 SURESH DELHI 18 IT

Table Branch

BRANCH_CODE BRANCH_NAME

CS COMPUTER SCIENCE

IT INFORMATION TECHNOLOGY

ECE ELECTRONICS AND COMMUNICATION ENGINEERING

CV CIVIL ENGINEERING

BRANCH_CODE of STUDENT can only take the values which are present in BRANCH_CODE of BRANCH which is called
referential integrity constraint. The relation which is referencing another relation is called REFERENCING RELATION
(STUDENT in this case) and the relation to which other relations refer is called REFERENCED RELATION (BRANCH in
this case).

Anomalies in the Relational Model

An anomaly is an irregularity or something which deviates from the expected or normal state. When designing
databases, we identify three types of anomalies: Insert, Update, and Delete.

Insertion Anomaly in Referencing Relation

We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is not present in the referenced
attribute value. e.g.; Insertion of a student with BRANCH_CODE ‘ME’ in STUDENT relation will result in an error
because ‘ME’ is not present in BRANCH_CODE of BRANCH.
UNIT 3: Relational Data Model in Depth
Deletion/ Updation Anomaly in Referenced Relation:

We can’t delete or update a row from REFERENCED RELATION if the value of REFERENCED ATTRIBUTE is used in the
value of REFERENCING ATTRIBUTE. e.g; if we try to delete a tuple from BRANCH having BRANCH_CODE ‘CS’, it will
result in an error because ‘CS’ is referenced by BRANCH_CODE of STUDENT, but if we try to delete the row from
BRANCH with BRANCH_CODE CV, it will be deleted as the value is not been used by referencing relation. It can be
handled by the following method:

On Delete Cascade

It will delete the tuples from REFERENCING RELATION if the value used by REFERENCING ATTRIBUTE is deleted from
REFERENCED RELATION. e.g.; For, if we delete a row from BRANCH with BRANCH_CODE ‘CS’, the rows in STUDENT
relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be deleted.

On Update Cascade

It will update the REFERENCING ATTRIBUTE in REFERENCING RELATION if the attribute value used by REFERENCING
ATTRIBUTE is updated in REFERENCED RELATION. e.g;, if we update a row from BRANCH with BRANCH_CODE ‘CS’ to
‘CSE’, the rows in STUDENT relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be updated with
BRANCH_CODE ‘CSE’.

Super Keys

Any set of attributes that allows us to identify unique rows (tuples) in a given relationship is known as super keys.
Out of these super keys, we can always choose a proper subset among these that can be used as a primary key. Such
keys are known as Candidate keys. If there is a combination of two or more attributes that are being used as the
primary key then we call it a Composite key.

Codd Rules in Relational Model

Edgar F Codd proposed the relational database model where he stated rules. Now these are known as Codd’s Rules.
For any database to be the perfect one, it has to follow the rules.

For more, refer to Codd Rules in Relational Model.

Advantages of the Relational Model

 Simple model: Relational Model is simple and easy to use in comparison to other languages.

 Flexible: Relational Model is more flexible than any other relational model present.

 Secure: Relational Model is more secure than any other relational model.

 Data Accuracy: Data is more accurate in the relational data model.

 Data Integrity: The integrity of the data is maintained in the relational model.

 Operations can be Applied Easily: It is better to perform operations in the relational model.

Disadvantages of the Relational Model

 Relational Database Model is not very good for large databases.

 Sometimes, it becomes difficult to find the relation between tables.

 Because of the complex structure, the response time for queries is high.

Characteristics of the Relational Model

 Data is represented in rows and columns called relations.

 Data is stored in tables having relationships between them called the Relational model.
UNIT 3: Relational Data Model in Depth
 The relational model supports the operations like Data definition, Data manipulation, and Transaction
management.

 Each column has a distinct name and they are representing attributes.

 Each row represents a single entity.

Types of Keys in Relational Model (Candidate, Super, Primary, Alternate and Foreign)
Keys are one of the basic requirements of a relational database model. It is widely used to identify the tuples(rows)
uniquely in the table. We also use keys to set up relations amongst various columns and tables of a relational
database.

Different Types of Database Keys

 Candidate Key

 Primary Key

 Super Key

 Alternate Key

 Foreign Key

 Composite Key

Candidate Key

The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For Example, STUD_NO
in STUDENT relation.

 It is a minimal super key.

 It is a super key with no repeated data is called a candidate key.

 The minimal set of attributes that can uniquely identify a record.

 It must contain unique values.

 It can contain NULL values.

 Every table must have at least a single candidate key.

 A table can have multiple candidate keys but only one primary key.

 The value of the Candidate Key is unique and may be null for a tuple.

 There can be more than one candidate key in a relationship.

Example:

STUD_NO is the candidate key for relation STUDENT.


UNIT 3: Relational Data Model in Depth
Table STUDENT

STUD_NO SNAME ADDRESS PHONE

1 Shyam Delhi 123456789

2 Rakesh Kolkata 223365796

3 Suraj Delhi 175468965

 The candidate key can be simple (having only one attribute) or composite as well.

Example:

{STUD_NO, COURSE_NO} is a composite


candidate key for relation STUDENT_COURSE.

Table STUDENT_COURSE

STUD_NO TEACHER_NO COURSE_NO

1 001 C001

2 056 C005

Note: In SQL Server a unique constraint that has a nullable column, allows the value ‘null‘ in that column only once.
That’s why the STUD_PHONE attribute is a candidate here, but can not be a ‘null’ value in the primary key attribute.

Primary Key

There can be more than one candidate key in relation out of which one can be chosen as the primary key. For
Example, STUD_NO, as well as STUD_PHONE, are candidate keys for relation STUDENT but STUD_NO can be chosen
as the primary key (only one out of many candidate keys).

 It is a unique key.

 It can identify only one tuple (a record) at a time.

 It has no duplicate values, it has unique values.

 It cannot be NULL.

 Primary keys are not necessarily to be a single column; more than one column can also be a primary key for
a table.

Example:

STUDENT table -> Student(STUD_NO, SNAME,


ADDRESS, PHONE) , STUD_NO is a primary key
UNIT 3: Relational Data Model in Depth
Table STUDENT

STUD_NO SNAME ADDRESS PHONE

1 Shyam Delhi 123456789

2 Rakesh Kolkata 223365796

3 Suraj Delhi 175468965

Super Key

The set of attributes that can uniquely identify a tuple is known as Super Key. For Example, STUD_NO, (STUD_NO,
STUD_NAME), etc. A super key is a group of single or multiple keys that identifies rows in a table. It supports NULL
values.

 Adding zero or more attributes to the candidate key generates the super key.

 A candidate key is a super key but vice versa is not true.

 Super Key values may also be NULL.

Example:

Consider the table shown above.


STUD_NO+PHONE is a super key.

Relation between Primary Key, Candidate Key, and Super Key


UNIT 3: Relational Data Model in Depth
Alternate Key

The candidate key other than the primary key is called an alternate key.

 All the keys which are not primary keys are called alternate keys.

 It is a secondary key.

 It contains two or more fields to identify two or more records.

 These values are repeated.

 Eg:- SNAME, and ADDRESS is Alternate keys

Example:

Consider the table shown above.


STUD_NO, as well as PHONE both,
are candidate keys for relation STUDENT but
PHONE will be an alternate key
(only one out of many candidate keys).

Primary Key, Candidate Key, and Alternate Key

Foreign Key

If an attribute can only take the values which are present as values of some other attribute, it will be a foreign key to
the attribute to which it refers. The relation which is being referenced is called referenced relation and the
corresponding attribute is called referenced attribute the relation which refers to the referenced relation is called
referencing relation and the corresponding attribute is called referencing attribute. The referenced attribute of the
referenced relation should be the primary key to it.
UNIT 3: Relational Data Model in Depth
 It is a key it acts as a primary key in one table and it acts as
secondary key in another table.

 It combines two or more relations (tables) at a time.

 They act as a cross-reference between the tables.

 For example, DNO is a primary key in the DEPT table and a non-key in EMP

Example:

Refer Table STUDENT shown above.


STUD_NO in STUDENT_COURSE is a
foreign key to STUD_NO in STUDENT relation.

Table STUDENT_COURSE

STUD_NO TEACHER_NO COURSE_NO

1 005 C001

2 056 C005

It may be worth noting that, unlike the Primary Key of any given relation, Foreign Key can be NULL as well as may
contain duplicate tuples i.e. it need not follow uniqueness constraint. For Example, STUD_NO in the
STUDENT_COURSE relation is not unique. It has been repeated for the first and third tuples. However, the STUD_NO
in STUDENT relation is a primary key and it needs to be always unique, and it cannot be null.

Relation between Primary Key and Foreign Key


UNIT 3: Relational Data Model in Depth
Composite Key

Sometimes, a table might not have a single column/attribute that uniquely identifies all the records of a table. To
uniquely identify rows of a table, a combination of two or more columns/attributes can be used. It still can give
duplicate values in rare cases. So, we need to find the optimal set of attributes that can uniquely identify rows in a
table.

 It acts as a primary key if there is no primary key in a table

 Two or more attributes are used together to make a composite key.

 Different combinations of attributes may give different accuracy in terms of identifying the rows uniquely.

Example:

FULLNAME + DOB can be combined


together to access the details of a student.

Different Types of Keys


UNIT 3: Relational Data Model in Depth
Difference between Primary Key and Foreign Key
PRIMARY KEY FOREIGN KEY

A foreign key is a column or group of columns in a


A primary key is used to ensure data
relational database table that provides a link between data in
in the specific column is unique.
two tables.

It uniquely identifies a record in the It refers to the field in a table which is the primary key of
relational database table. another table.

Only one primary key is allowed in a


Whereas more than one foreign key is allowed in a table.
table.

It is a combination of UNIQUE and It can contain duplicate values and a table in a relational
Not Null constraints. database.

It does not allow NULL values. It can also contain NULL values.

Its value cannot be deleted from the


Its value can be deleted from the child table.
parent table.

It constraint can be implicitly It constraint cannot be defined on the local or global


defined on the temporary tables. temporary tables.

Conclusion

In conclusion, the relational model makes use of a number of keys: Candidate keys allow for distinct identification,
the Primary key serves as the chosen identifier, Alternate keys offer other choices, and Foreign keys create vital
linkages that guarantee data integrity between tables. The creation of strong and effective relational databases
requires the thoughtful application of these keys.

Constraints on Relational Database Model


In modeling the design of the relational database we can put some restrictions like what values are allowed to be
inserted in the relation, and what kind of modifications and deletions are allowed in the relation. These are the
restrictions we impose on the relational database.

In models like Entity-Relationship models, we did not have such features. Database Constraints can be categorized
into 3 main categories:

1. Constraints that are applied in the data model are called Implicit Constraints.

2. Constraints that are directly applied in the schemas of the data model, by specifying them in the DDL(Data
Definition Language). These are called Schema-Based Constraints or Explicit Constraints.

3. Constraints that cannot be directly applied in the schemas of the data model. We call these Application-
based or Semantic Constraints.

So here we are going to deal with Implicit constraints.


UNIT 3: Relational Data Model in Depth
Relational Constraints

These are the restrictions or sets of rules imposed on the database contents. It validates the quality of the database.
It validates the various operations like data insertion, updation, and other processes that have to be performed
without affecting the integrity of the data. It protects us against threats/damages to the database. Mainly
Constraints on the relational database are of 4 types

 Domain constraints

 Key constraints or Uniqueness Constraints

 Entity Integrity constraints

 Referential integrity constraints

Types of Relational Constraints

Let’s discuss each of the above constraints in detail.

1. Domain Constraints

 Every domain must contain atomic values(smallest indivisible units) which means composite and multi-
valued attributes are not allowed.

 We perform a datatype check here, which means when we assign a data type to a column we limit the
values that it can contain. Eg. If we assign the datatype of attribute age as int, we can’t give it values other
than int datatype.
UNIT 3: Relational Data Model in Depth
Example:

EID Name Phone

123456789
01 Bikash Dutta
234456678

Explanation: In the above relation, Name is a composite attribute and Phone is a multi-values attribute, so it is
violating domain constraint.

2. Key Constraints or Uniqueness Constraints

 These are called uniqueness constraints since it ensures that every tuple in the relation should be unique.

 A relation can have multiple keys or candidate keys(minimal superkey), out of which we choose one of the
keys as the primary key, we don’t have any restriction on choosing the primary key out of candidate keys,
but it is suggested to go with the candidate key with less number of attributes.

 Null values are not allowed in the primary key, hence Not Null constraint is also part of the key constraint.

Example:

EID Name Phone

01 Bikash 6000000009

02 Paul 9000090009

01 Tuhin 9234567892

Explanation: In the above table, EID is the primary key, and the first and the last tuple have the same value in EID ie
01, so it is violating the key constraint.

3. Entity Integrity Constraints

 Entity Integrity constraints say that no primary key can take a NULL value, since using the primary key we
identify each tuple uniquely in a relation.

Example:

EID Name Phone

01 Bikash 9000900099

02 Paul 600000009
UNIT 3: Relational Data Model in Depth
EID Name Phone

NULL Sony 9234567892

Explanation: In the above relation, EID is made the primary key, and the primary key can’t take NULL values but in
the third tuple, the primary key is null, so it is violating Entity Integrity constraints.

4. Referential Integrity Constraints

 The Referential integrity constraint is specified between two relations or tables and used to maintain the
consistency among the tuples in two relations.

 This constraint is enforced through a foreign key, when an attribute in the foreign key of relation R1 has the
same domain(s) as the primary key of relation R2, then the foreign key of R1 is said to reference or refer to
the primary key of relation R2.

 The values of the foreign key in a tuple of relation R1 can either take the values of the primary key for some
tuple in relation R2, or can take NULL values, but can’t be empty.

Example:

EID Name DNO

01 Divine 12

02 Dino 22

04 Vivian 14

DNO Place

12 Jaipur

13 Mumbai

14 Delhi

Explanation: In the above tables, the DNO of Table 1 is the foreign key, and DNO in Table 2 is the primary key. DNO =
22 in the foreign key of Table 1 is not allowed because DNO = 22 is not defined in the primary key of table 2.
Therefore, Referential integrity constraints are violated here.

Advantages of Relational Database Model

 It is simpler than the hierarchical model and network model.

 It is easy and simple to understand.


UNIT 3: Relational Data Model in Depth
 Its structure can be changed anytime upon requirement.

 Data Integrity: The relational database model enforces data integrity through various constraints such as
primary keys, foreign keys, and unique constraints. This ensures that the data in the database is accurate,
consistent, and valid.

 Flexibility: The relational database model is highly flexible and can handle a wide range of data types and
structures. It also allows for easy modification and updating of the data without affecting other parts of the
database.

 Scalability: The relational database model can scale to handle large amounts of data by adding more tables,
indexes, or partitions to the database. This allows for better performance and faster query response times.

 Security: The relational database model provides robust security features to protect the data in the
database. These include user authentication, authorization, and encryption of sensitive data.

 Data consistency: The relational database model ensures that the data in the database is consistent across
all tables. This means that if a change is made to one table, the corresponding changes will be made to all
related tables.

 Query Optimization: The relational database model provides a query optimizer that can analyze and
optimize SQL queries to improve their performance. This allows for faster query response times and better
scalability.

Disadvantages of the Relational Model

 Few database relations have certain limits which can’t be expanded further.

 It can be complex and it becomes hard to use.

 Complexity: The relational model can be complex and difficult to understand, particularly for users who are
not familiar with SQL and database design principles. This can make it challenging to set up and maintain a
relational database.

 Performance: The relational model can suffer from performance issues when dealing with large data sets or
complex queries. In particular, joins between tables can be slow, and indexing strategies can be difficult to
optimize.

 Scalability: While the relational model is generally scalable, it can become difficult to manage as the
database grows in size. Adding new tables or indexes can be time-consuming, and managing relationships
between tables can become complex.

 Cost: Relational databases can be expensive to license and maintain, particularly for large-scale
deployments. Additionally, relational databases often require dedicated hardware and specialized software
to run, which can add to the cost.

 Limited flexibility: The relational model is designed to work with tables that have predefined structures and
relationships. This can make it difficult to work with data that does not fit neatly into a table-based format,
such as unstructured or semi-structured data.

 Data redundancy: In some cases, the relational model can lead to data redundancy, where the same data is
stored in multiple tables. This can lead to inefficiencies and can make it difficult to ensure data consistency
across the database.
UNIT 3: Relational Data Model in Depth
Difference between Data Security and Data Integrity
1. Data Security :
Data security refers to the prevention of data from unauthorized users. It is only allowed to access the data to the
authorized users. In database, the DBA or head of department can access all the data. Some users are only allowed
to retrieve data, whereas others are allowed to retrieve as well as to modify the data.

2. Data Integrity :
Data integrity is defined as the data contained in the database is both correct and consistent. For this purpose, the
data stored in the database must satisfy certain types of procedures (rules). The data in a database must be correct
and consistent. DBMS provides different ways to implement such types of constraints (rules). It can be implemented
by rules i.e., Primary Key, Secondary Key, Foreign key. This improves data integrity in a database.

Difference between Data Security and Data Integrity :

S.No. Data Security Data Integrity

Data integrity refers to the quality of data, which


Data security refers to the prevention of data corruption assures the data is complete and has a whole
1. through the use of controlled access mechanisms. structure.

2. Its motive is the protection of data. Its motive is the validity of data.

Its work is to only the people who should have access to


3. the data are the only ones who can access the data. Its work is to check the data is correct and not corrupt.

It refers to making sure that data is accessed by its


intended users, thus ensuring the privacy and protection of It refers to the structure of the data and how it
4. data. matches the schema of the database.

Some of the means to preserve integrity are backing


Some of the popular means of data security are up, error detection, designing a suitable user interface
5. authentication/authorization, masking, and encryptions. and correcting data.

It relates to the physical form of data against accidental or It relates to the logical protection (correct, complete
6. intentional loss or misuse and destruction. and consistence) of data.

7. It avoids unauthorized access of data. It avoids human error when data is entered.

It can be implemented through :

 user accounts (passwords) It can be implemented by following rule


 Primary Key
 authentication schemes
 Foreign Key
8.  Relationship
UNIT 3: Relational Data Model in Depth
Introduction of Relational Algebra in DBMS
Pre-Requisite: Relational Model in DBMS

Relational Algebra is a procedural query language. Relational algebra mainly provides a theoretical foundation for
relational databases and SQL. The main purpose of using Relational Algebra is to define operators that transform one
or more input relations into an output relation. Given that these operators accept relations as input and produce
relations as output, they can be combined and used to express potentially complex queries that transform
potentially many input relations (whose data are stored in the database) into a single output relation (the query
results). As it is pure mathematics, there is no use of English Keywords in Relational Algebra and operators are
represented using symbols.

Fundamental Operators

These are the basic/fundamental operators used in Relational Algebra.

1. Selection(σ)

2. Projection(π)

3. Union(U)

4. Set Difference(-)

5. Set Intersection(∩)

6. Rename(ρ)

7. Cartesian Product(X)

1. Selection(σ): It is used to select required tuples of the relations.

Example:

A B C

1 2 4

2 2 3

3 2 3

4 3 4

For the above relation, σ(c>3)R will select the tuples which have c more than 3.
UNIT 3: Relational Data Model in Depth
A B C

1 2 4

4 3 4

Note: The selection operator only selects the required tuples but does not display them. For display, the data
projection operator is used.

2. Projection(π): It is used to project required column data from a relation.

Example: Consider Table 1. Suppose we want columns B and C from Relation R.

π(B,C)R will show following columns.

B C

2 4

2 3

3 4

Note: By Default, projection removes duplicate data.

3. Union(U): Union operation in relational algebra is the same as union operation in set theory.

Example:

FRENCH

Student_Name Roll_Number

Ram 01

Mohan 02

Vivek 13

Geeta 17
UNIT 3: Relational Data Model in Depth
GERMAN

Student_Name Roll_Number

Vivek 13

Geeta 17

Shyam 21

Rohan 25

Consider the following table of Students having different optional subjects in their course.

π(Student_Name)FRENCH U π(Student_Name)GERMAN

Student_Name

Ram

Mohan

Vivek

Geeta

Shyam

Rohan

Note: The only constraint in the union of two relations is that both relations must have the same set of Attributes.

4. Set Difference(-): Set Difference in relational algebra is the same set difference operation as in set theory.

Example: From the above table of FRENCH and GERMAN, Set Difference is used as follows

π(Student_Name)FRENCH - π(Student_Name)GERMAN
UNIT 3: Relational Data Model in Depth
Student_Name

Ram

Mohan

Note: The only constraint in the Set Difference between two relations is that both relations must have the same set
of Attributes.

5. Set Intersection(∩): Set Intersection in relational algebra is the same set intersection operation in set theory.

Example: From the above table of FRENCH and GERMAN, the Set Intersection is used as follows

π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN

Student_Name

Vivek

Geeta

Note: The only constraint in the Set Difference between two relations is that both relations must have the same set
of Attributes.

6. Rename(ρ): Rename is a unary operation used for renaming attributes of a relation.

ρ(a/b)R will rename the attribute 'b' of the relation by 'a'.

7. Cross Product(X): Cross-product between two relations. Let’s say A and B, so the cross product between A X B will
result in all the attributes of A followed by each attribute of B. Each record of A will pair with every record of B.

Example:

Name Age Sex

Ram 14 M

Sona 15 F

Kim 20 M

B
UNIT 3: Relational Data Model in Depth
ID Course

1 DS

2 DBMS

AXB

Name Age Sex ID Course

Ram 14 M 1 DS

Ram 14 M 2 DBMS

Sona 15 F 1 DS

Sona 15 F 2 DBMS

Kim 20 M 1 DS

Kim 20 M 2 DBMS

Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘ tuples.

Derived Operators

These are some of the derived operators, which are derived from the fundamental operators.

1. Natural Join(⋈)

2. Conditional Join

1. Natural Join(⋈): Natural join is a binary operator. Natural join between two or more relations will result in a set of
all combinations of tuples where they have an equal common attribute.

Example:
UNIT 3: Relational Data Model in Depth
EMP

Name ID Dept_Name

A 120 IT

B 125 HR

C 110 Sales

D 111 IT

DEPT

Dept_Name Manager

Sales Y

Production Z

IT A

Natural join between EMP and DEPT with condition :

EMP.Dept_Name = DEPT.Dept_Name

EMP ⋈ DEPT

Name ID Dept_Name Manager

A 120 IT A

C 110 Sales Y

D 111 IT A

3. Conditional Join: Conditional join works similarly to natural join. In natural join, by default condition is equal
between common attributes while in conditional join we can specify any condition such as greater than, less
than, or not equal.

Example:
UNIT 3: Relational Data Model in Depth
R

ID Sex Marks

1 F 45

2 F 55

3 F 60

ID Sex Marks

10 M 20

11 M 22

12 M 59

Join between R and S with condition R.marks >= S.marks

R.ID R.Sex R.Marks S.ID S.Sex S.Marks

1 F 45 10 M 20

1 F 45 11 M 22

2 F 55 10 M 20

2 F 55 11 M 22

3 F 60 10 M 20

3 F 60 11 M 22

3 F 60 12 M 59
UNIT 3: Relational Data Model in Depth
Relational Calculus

As Relational Algebra is a procedural query language, Relational Calculus is a non-procedural query language. It
basically deals with the end results. It always tells me what to do but never tells me how to do it.

There are two types of Relational Calculus

1. Tuple Relational Calculus(TRC)

2. Domain Relational Calculus(DRC)


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
SQL Tutorial
SQL is a standard database language used to access and manipulate data in databases. SQL stands for Structured
Query Language. SQL was developed by IBM Computer Scientists in the 1970s. By executing queries SQL can create,
update, delete, and retrieve data in databases like MySQL, Oracle, PostgreSQL, etc. Overall SQL is a query language
that communicates with databases.

In this SQL tutorial, you’ll learn all the basic to advanced concepts like SQL queries, SQL join, SQL injection, SQL
insert, create tables in SQL. SQL is easy to learn, there are no higher prerequisites to learning SQL.

What is a Database?

Data is the new fuel of this world but only data is unorganized information, so to organize that data we make a
database. A database is the organized collection of structured data which is usually controlled by a database
management system (DBMS). Databases help us with easily storing, accessing, and manipulating data held on a
computer.

SQL CREATE TABLE


In the SQL database for creating a table, we use a command called CREATE TABLE.

SQL CREATE TABLE Statement

A Table is a combination of rows and columns. For creating a table we have to define the structure of a table by
adding names to columns and providing data type and size of data to be stored in columns.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Syntax:

CREATE table table_name

Column1 datatype (size),

column2 datatype (size),

columnN datatype(size)

);

Here table_name is name of the table, column is the name of column

SQL CREATE TABLE Example

Let us create a table to store data of Customers, so the table name is Customer, Columns are Name, Country, age,
phone, and so on.

CREATE TABLE Customer(

CustomerID INT PRIMARY KEY,

CustomerName VARCHAR(50),

LastName VARCHAR(50),

Country VARCHAR(50),

Age int(2),

Phone int(10)

);

Output:

Insert Data into Table

To add data to the table, we use INSERT INTO, the syntax is as shown below:

Syntax:

//Below query adds data in specific column, (like Column1=Value1)//

Insert into Table_name(Column1, Column2, Column3)

Values (Value1, value2, value3);

//Below query adds data in table in sequence of column name(Value1 will be

added in Column1 and so on)//


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Insert into Table_name

Values (Value1, value2, value3);

//Adding multiple data in the table in one go//

Insert into Table_name

Values (Value01, value02, value03),

(Value11, value12, value13),

(Value21, value22, value23),

(ValueN1, valueN2, valueN3)

Example Query

This query will add data in the table named Subject

-- Insert some sample data into the Customers table

INSERT INTO Customer (CustomerID, CustomerName, LastName, Country, Age, Phone)

VALUES (1, 'Shubham', 'Thakur', 'India','23','xxxxxxxxxx'),

(2, 'Aman ', 'Chopra', 'Australia','21','xxxxxxxxxx'),

(3, 'Naveen', 'Tulasi', 'Sri lanka','24','xxxxxxxxxx'),

(4, 'Aditya', 'Arpan', 'Austria','21','xxxxxxxxxx'),

(5, 'Nishant. Salchichas S.A.', 'Jain', 'Spain','22','xxxxxxxxxx');

Output:

Create a Table Using Another Table

We can also use CREATE TABLE to create a copy of an existing table. In the new table, it gets the exact column
definition all columns or specific columns can be selected.

If an existing table was used to create a new table, by default the new table would be populated with the existing
values ??from the old table.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Syntax:

CREATE TABLE new_table_name AS

SELECT column1, column2,…

FROM existing_table_name

WHERE ….;

Query:

CREATE TABLE SubTable AS

SELECT CustomerID, CustomerName

FROM customer;

Output:

 We can use * instead of column name to copy whole table to another table.

Note : In this query another table will be created with all the columns instead of just two columns.

Syntax :

CREATE TABLE new_table_name AS

SELECT *

FROM existing_table_name

WHERE ….;

Query :

CREATE TABLE customer_copy AS SELECT * FROM customer;

Output :
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)

 You can also use LIMIT to insert specific number of records from old table. In below example it will populate
new table with first three records.

Syntax :

CREATE TABLE <new_table_name> AS SELECT * FROM <old_table_name> LIMIT <numeric_value>;

Query :

CREATE TABLE customer_copy AS SELECT * FROM customer LIMIT 3;

Output :

SQL INSERT INTO Statement


The INSERT INTO statement of SQL is used to insert a new row/record in a table. There are two ways of using the SQL
INSERT INTO statement for inserting rows.

SQL INSERT Query

1. Only Values

The first method is to specify only the value of data to be inserted without the column names.

INSERT INTO Syntax:

INSERT INTO table_name VALUES (value1, value2, value3);

table_name: name of the table. value1, value2

value of first column, second column,… for the new record


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Column Names And Values Both

In the second method we will specify both the columns which we want to fill and their corresponding values as
shown below:

Insert Data in Specified Columns – Syntax:

INSERT INTO table_name (column1, column2, column3)

VALUES ( value1, value2, value3); table_name:

name of the table.

column1: name of first column, second column .


value1, value2, value3 value of first column, second column,… for the new record

Suppose there is a Student database and we want to add values.

ROLL_NO NAME ADDRESS PHONE AGE

1 Ram Delhi xxxxxxxxxxxxxx 18

2 RAMESH GURGAON xxxxxxxxxxxxxx 18

3 SUJIT ROHTAK xxxxxxxxxxxxxx 20

4 SURESH ROHTAK xxxxxxxxxxxxxx 18

3 SUJIT ROHTAK xxxxxxxxxxxxxx 20

2 RAMESH GURGAON xxxxxxxxxxxxxx 18

Method 1 (Inserting only values) – SQL INSERT Query

If we want to insert only values then we use the following query:

Query:

INSERT INTO Student VALUES


('5','HARSH','WEST BENGAL',
'XXXXXXXXXX','19');

Output:

The table Student will now look like this:


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
ROLL_NO NAME ADDRESS PHONE Age

1 Ram Delhi XXXXXXXXXX 18

2 RAMESH GURGAON XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

4 SURESH Delhi XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

2 RAMESH GURGAON XXXXXXXXXX 18

5 HARSH WEST BENGAL XXXXXXXXXX 19

Method 2 (Inserting values in only specified columns) – SQL INSERT INTO Statement

If we want to insert values in the specified columns then we use the following query:

Query:

INSERT INTO Student (ROLL_NO,


NAME, Age) VALUES ('5','PRATIK','19');

Output:

The table Student will now look like this:

ROLL_NO NAME ADDRESS PHONE Age

1 Ram Delhi XXXXXXXXXX 18

2 RAMESH GURGAON XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

4 SURESH Delhi XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
ROLL_NO NAME ADDRESS PHONE Age

2 RAMESH GURGAON XXXXXXXXXX 18

5 PRATIK null null 19

Notice that the columns for which the values are not provided are filled by null. Which are the default values for
those columns?

2. Using SELECT in INSERT INTO Statement

We can use the SELECT statement with INSERT INTO statement to copy rows from one table and insert them into
another table. The use of this statement is similar to that of the INSERT INTO statement. The difference is that the
SELECT statement is used here to select data from a different table. The different ways of using INSERT INTO SELECT
statement are shown below:

Inserting all columns of a table – INSERT INTO SELECT Statement

We can copy all the data of a table and insert it into a different table.

Syntax:

INSERT INTO first_table SELECT * FROM second_table;

first_table: name of first table.

second_table: name of second table.

We have used the SELECT statement to copy the data from one table and the INSERT INTO statement to insert from
a different table.

Inserting specific columns of a table – INSERT INTO SELECT Statement

We can copy only those columns of a table that we want to insert into a different table.

Syntax:

INSERT INTO first_table(names_of_columns1)

SELECT names_of_columns2 FROM second_table;

first_table: name of first table. second_table: name of second table.

names of columns1: name of columns separated by comma(,) for table 1.

names of columns2: name of columns separated by comma(,) for table 2.

We have used the SELECT statement to copy the data of the selected columns only from the second table and the
INSERT INTO statement to insert in the first table.

Copying specific rows from a table – INSERT INTO SELECT Statement

We can copy specific rows from a table to insert into another table by using the WHERE clause with the SELECT
statement. We have to provide appropriate conditions in the WHERE clause to select specific rows.

INSERT INTO table1 SELECT * FROM table2 WHERE condition;

first_table: name of first table.


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
second_table: name of second table.

condition: condition to select specific rows.

Suppose there is a LateralStudent database.

ROLL_NO NAME ADDRESS PHONE Age

7 SOUVIK HYDERABAD XXXXXXXXXX 18

8 NIRAJ NOIDA XXXXXXXXXX 19

9 SOMESH ROHTAK XXXXXXXXXX 20

Method 1 – (Inserting all rows and columns)

If we want to insert only values then we use the following query:

SQL INSERT INTO SELECT Query:

INSERT INTO Student


SELECT * FROM LateralStudent;

Output:

This query will insert all the data of the table LateralStudent in the table Student. The table Student will now look like
this,

ROLL_NO NAME ADDRESS PHONE Age

1 Ram Delhi XXXXXXXXXX 18

2 RAMESH GURGAON XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

4 SURESH Delhi XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

2 RAMESH GURGAON XXXXXXXXXX 18


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
ROLL_NO NAME ADDRESS PHONE Age

7 SOUVIK DUMDUM XXXXXXXXXX 18

8 NIRAJ NOIDA XXXXXXXXXX 19

9 SOMESH ROHTAK XXXXXXXXXX 20

Method 2(Inserting specific columns)

If we want to insert values in the specified columns then we use the following query:

SQL INSERT INTO SELECT Query:

INSERT INTO Student(ROLL_NO,NAME,Age)


SELECT ROLL_NO, NAME, Age FROM LateralStudent;

Output:

This query will insert the data in the columns ROLL_NO, NAME, and Age of the table LateralStudent in the table
Student and the remaining columns in the Student table will be filled by null which is the default value of the
remaining columns. The table Student will now look like this,

ROLL_NO NAME ADDRESS PHONE Age

1 Ram Delhi XXXXXXXXXX 18

2 RAMESH GURGAON XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

4 SURESH Delhi XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

2 RAMESH GURGAON XXXXXXXXXX 18

7 SOUVIK null null 18

8 NIRAJ null null 19


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
ROLL_NO NAME ADDRESS PHONE Age

9 SOMESH null null 20

Select specific rows to insert:

INSERT INTO Student SELECT *


FROM LateralStudent WHERE Age = 18;

Output:

This query will select only the first row from table LateralStudent to insert into the table Student. The table Student
will now look like this,

ROLL_NO NAME ADDRESS PHONE Age

1 Ram Delhi XXXXXXXXXX 18

2 RAMESH GURGAON XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

4 SURESH Delhi XXXXXXXXXX 18

3 SUJIT ROHTAK XXXXXXXXXX 20

2 RAMESH GURGAON XXXXXXXXXX 18

7 SOUVIK DUMDUM XXXXXXXXXX 18

To insert multiple rows in a table using Single SQL Statement:

Syntax:

INSERT INTO table_name(Column1,Column2,Column3,…….)

VALUES (Value1, Value2,Value3,…..),

(Value1, Value2,Value3,…..),

(Value1, Value2,Value3,…..),

……………………….. ;

Where,

 table_name: name of the table.


Column 1: name of the first column, second column.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
 Values: Value1, Value2, Value3: the value of the first column, second column.

 For each new row inserted, you need To provide Multiple lists of values where each list is separated by “,”.
Every list of values corresponds to values to be inserted in each new row of the table. Values in the next list
tell values to be inserted in the next Row of the table.

Example:

The following SQL statement inserts multiple rows in Student Table.

Query:

INSERT INTO STUDENT(ID, NAME,AGE,GRADE,CITY)


VALUES(1,"AMIT KUMAR",15,10,"DELHI"),
(2,"GAURI RAO",18,12,"BANGALORE"),
(3,"MANAV BHATT",17,11,"NEW DELHI"),
(4,"RIYA KAPOOR",10,5,"UDAIPUR");

Output:

Thus STUDENT Table will look like this:

ID NAME AGE GRADE CITY

1 AMIT KUMAR 15 10 DELHI

2 GAURI RAO 18 12 BANGALORE

3 MANAV BHATT 17 11 NEW DELHI

4 RIYA KAPOOR 10 5 UDAIPUR

SQL | UPDATE Statement


The UPDATE statement in SQL is used to update the data of an existing table in the database. We can update single
columns as well as multiple columns using the UPDATE statement as per our requirement.

In a very simple way, we can say that SQL commands(UPDATE and DELETE) are used to change the data that is
already in the database. The SQL DELETE command uses a WHERE clause.

Syntax

UPDATE table_name SET column1 = value1, column2 = value2,…

WHERE condition;

table_name: name of the table

column1: name of first , second, third column….

value1: new value for first, second, third column….

condition: condition to select the rows for which the

values of columns needs to be updated.


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Parameter Explanation

1. UPDATE: Command is used to update the column value in the table.

2. WHERE: Specifies the condition which we want to implement on the table.

Note: In the above query the SET statement is used to set new values to the particular column and
the WHERE clause is used to select the rows for which the columns are needed to be updated. If we have not used
the WHERE clause then the columns in all the rows will be updated. So the WHERE clause is used to choose the
particular rows.

Let’s see the SQL update statement with examples.

Query:

CREATE TABLE Customer(


CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(50),
LastName VARCHAR(50),
Country VARCHAR(50),
Age int(2),
Phone int(10)
);

-- Insert some sample data into the Customers table


INSERT INTO Customer (CustomerID, CustomerName, LastName, Country, Age, Phone)
VALUES (1, 'Shubham', 'Thakur', 'India','23','xxxxxxxxxx'),
(2, 'Aman ', 'Chopra', 'Australia','21','xxxxxxxxxx'),
(3, 'Naveen', 'Tulasi', 'Sri lanka','24','xxxxxxxxxx'),
(4, 'Aditya', 'Arpan', 'Austria','21','xxxxxxxxxx'),
(5, 'Nishant. Salchichas S.A.', 'Jain', 'Spain','22','xxxxxxxxxx');

Select * from Customer;

Output:

Update Single Column

Update the column NAME and set the value to ‘Nitin’ in the rows where the Age is 22.

UPDATE Customer SET CustomerName


= 'Nitin' WHERE Age = 22;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output:

Updating Multiple Columns

Update the columns NAME to ‘Satyam’ and Country to ‘USA’ where CustomerID is 1.

UPDATE Customer SET CustomerName = 'Satyam',


Country = 'USA' WHERE CustomerID = 1;

Output:

Note: For updating multiple columns we have used comma(,) to separate the names and values of two columns.

Omitting WHERE Clause

If we omit the WHERE clause from the update query then all of the rows will get updated.

UPDATE Customer SET CustomerName = 'Shubham';

Output:

The table Customer will now look like this,


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)

SQL DELETE Statement


SQL DELETE is a basic SQL operation used to delete data in a database. SQL DELETE is an important part of database
management DELETE can be used to selectively remove records from a database table based on certain conditions.
This SQL DELETE operation is important for database size management, data accuracy, and integrity.

Syntax:

DELETE FROM table_name

WHERE some_condition;

Parameter Explanation

 Some_condition: condition to choose a particular record.

 table_name: name of the table

Note: We can delete single as well as multiple records depending on the condition we provide in the WHERE clause.
If we omit the WHERE clause then all of the records will be deleted and the table will be empty.

The sample table is as follows GFG_Employees:

Query:

Assume we have created a table named GFG_Employee which contains the personal details of the Employee
including their id, name, email and department etc. as shown below −

CREATE TABLE GFG_Empoyees (


id INT PRIMARY KEY,
name VARCHAR (20) ,
email VARCHAR (25),
department VARCHAR(20),
);
INSERT INTO GFG_Employees (id,name,email,department) VALUES
(1,Jessie,jessie23@gmail.com,Developmet),
(2,Praveen,praveen_dagger@yahoo.com,HR),
(3,Bisa,dragonBall@gmail.com,Sales),
(4,Rithvik,msvv@hotmail.com,IT),
(5,Suraj,srjsunny@gmail.com,Quality Assurance),
(6,Om,OmShukla@yahoo.com,IT),
(7,Naruto,uzumaki@konoha.com,Development);
Select * From GFG_Employee

Output
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)

GFG_Employees

Deleting Single Record

You can delete the records named Rithvik by using the below query:

Query

DELETE FROM GFG_Employees WHERE NAME = 'Rithvik';

Output

output

Deleting Multiple Records

Delete the rows from the table GFG_Employees where the department is “Development”. This will delete 2
rows(the first row and the seventh row).

Query

DELETE FROM GFG_Employees


WHERE department = 'Development';

Output
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)

output

Delete All of the Records

To remove all the entries from the table, you can use the following query:

Query

DELETE FROM GFG_EMPLOyees;


Or
DELETE * FROM GFG_EMPLOyees;

Output

All of the records in the table will be deleted, there are no records left to display. The table GFG_EMPLOyees will
become empty.

output

SQL | Constraints
Constraints are the rules that we can apply on the type of data in a table. That is, we can specify the limit on the type
of data that can be stored in a particular column in a table using constraints.

The available constraints in SQL are:

 NOT NULL: This constraint tells that we cannot store a null value in a column. That is, if a column is specified
as NOT NULL then we will not be able to store null in this particular column any more.

 UNIQUE: This constraint when specified with a column, tells that all the values in the column must be
unique. That is, the values in any row of a column must not be repeated.

 PRIMARY KEY: A primary key is a field which can uniquely identify each row in a table. And this constraint is
used to specify a field in a table as primary key.

 FOREIGN KEY: A Foreign key is a field which can uniquely identify each row in a another table. And this
constraint is used to specify a field as Foreign key.

 CHECK: This constraint helps to validate the values of a column to meet a particular condition. That is, it
helps to ensure that the value stored in a column meets a specific condition.

 DEFAULT: This constraint specifies a default value for the column when no value is specified by the user.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
How to specify constraints?
We can specify constraints at the time of creating the table using CREATE TABLE statement. We can also specify the
constraints after creating a table using ALTER TABLE statement.

Syntax:
Below is the syntax to create constraints using CREATE TABLE statement at the time of creating the table.

CREATE TABLE sample_table

column1 data_type(size) constraint_name,

column2 data_type(size) constraint_name,

column3 data_type(size) constraint_name,

....

);

sample_table: Name of the table to be created.

data_type: Type of data that can be stored in the field.

constraint_name: Name of the constraint. for example- NOT NULL, UNIQUE, PRIMARY KEY etc.

Let us see each of the constraint in detail.

1. NOT NULL –
If we specify a field in a table to be NOT NULL. Then the field will never accept null value. That is, you will be not
allowed to insert a new row in the table without specifying any value to this field.
For example, the below query creates a table Student with the fields ID and NAME as NOT NULL. That is, we are
bound to specify values for these two fields every time we wish to insert a new row.

CREATE TABLE Student

ID int(6) NOT NULL,

NAME varchar(10) NOT NULL,

ADDRESS varchar(20)

);

2. UNIQUE –
This constraint helps to uniquely identify each row in the table. i.e. for a particular column, all the rows should have
unique values. We can have more than one UNIQUE columns in a table.
For example, the below query creates a table Student where the field ID is specified as UNIQUE. i.e, no two students
can have the same ID. Unique constraint in detail.

CREATE TABLE Student

(
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
ID int(6) NOT NULL UNIQUE,

NAME varchar(10),

ADDRESS varchar(20)

);

3. PRIMARY KEY –
Primary Key is a field which uniquely identifies each row in the table. If a field in a table as primary key, then the field
will not be able to contain NULL values as well as all the rows should have unique values for this field. So, in other
words we can say that this is combination of NOT NULL and UNIQUE constraints.
A table can have only one field as primary key. Below query will create a table named Student and specifies the field
ID as primary key.

CREATE TABLE Student

ID int(6) NOT NULL UNIQUE,

NAME varchar(10),

ADDRESS varchar(20),

PRIMARY KEY(ID)

);

4. FOREIGN KEY –
Foreign Key is a field in a table which uniquely identifies each row of a another table. That is, this field points to
primary key of another table. This usually creates a kind of link between the tables.
Consider the two tables as shown below:

Orders

O_ID ORDER_NO C_ID

1 2253 3

2 3325 3

3 4521 2

4 8532 1

Customers

C_ID NAME ADDRESS

1 RAMESH DELHI

2 SURESH NOIDA

3 DHARMESH GURGAON
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
As we can see clearly that the field C_ID in Orders table is the primary key in Customers table, i.e. it uniquely
identifies each row in the Customers table. Therefore, it is a Foreign Key in Orders table.
Syntax:

CREATE TABLE Orders

O_ID int NOT NULL,

ORDER_NO int NOT NULL,

C_ID int,

PRIMARY KEY (O_ID),

FOREIGN KEY (C_ID) REFERENCES Customers(C_ID)

(i) CHECK –
Using the CHECK constraint we can specify a condition for a field, which should be satisfied at the time of entering
values for this field.
For example, the below query creates a table Student and specifies the condition for the field AGE as (AGE >= 18 ).
That is, the user will not be allowed to enter any record in the table with AGE < 18. Check constraint in detail

CREATE TABLE Student

ID int(6) NOT NULL,

NAME varchar(10) NOT NULL,

AGE int NOT NULL CHECK (AGE >= 18)

);

(ii) DEFAULT –
This constraint is used to provide a default value for the fields. That is, if at the time of entering new records in the
table if the user does not specify any value for these fields then the default value will be assigned to them.
For example, the below query will create a table named Student and specify the default value for the field AGE as
18.

CREATE TABLE Student

ID int(6) NOT NULL,

NAME varchar(10) NOT NULL,

AGE int DEFAULT 18

);
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Primary key constraint in SQL
A primary key constraint depicts a key comprising one or more columns that will help uniquely identify every
tuple/record in a table.

Properties :

1. No duplicate values are allowed, i.e. Column assigned as primary key should have UNIQUE values only.

2. NO NULL values are present in column with Primary key. Hence there is Mandatory value in column having
Primary key.

3. Only one primary key per table exist although Primary key may have multiple columns.

4. No new row can be inserted with the already existing primary key.

5. Classified as : a) Simple primary key that has a Single column 2) Composite primary key has Multiple column.

6. Defined in Create table / Alter table statement.

The primary key can be created in a table using PRIMARY KEY constraint. It can be created at two levels.

1. Column

2. Table.

SQL PRIMARY KEY at Column Level :


If Primary key contains just one column, it should be defined at column level. The following code creates the Primary
key “ID” on the person table.

Syntax :

Create Table Person

Id int NOT NULL PRIMARY KEY,

Name varchar2(20),

Address varchar2(50)

);

Here NOT NULL is added to make sure ID should have unique values. SQL will automatically set null values to the
primary key if it is not specified.

Example-1 :
To verify the working of Primary key :

Insert into Person values(1, 'Ajay', 'Mumbai');

Output :

1 row created

Example-2 :
Let’s see if you will insert the same values again.

Insert into Person values(1, 'Ajay', 'Mumbai');

Output :

Error at line 1: unique constraint violated


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Since we are inserting duplicate values, an error will be thrown since the Primary key “Id” can contain only unique
values.

Example-3 :

Insert into Person values('', 'Ajay', 'Mumbai');

Output :

Error at line 1: cannot insert Null into<"user"."Person"."ID">

Primary Key cannot contain Null Values so That too will throw an error.

SQL PRIMARY KEY at Table Level :


Whenever the primary key contains multiple columns it has to be specified at Table level.

Syntax:

Create Table Person

(Id int NOT NULL,

Name varchar2(20),

Address varchar2(50),

PRIMARY KEY(Id, Name)

);

Here, you have only one Primary Key in a table but it consists of Multiple Columns(Id, Name). However, the
Following are permissible.

Insert into Person values(1, 'Ajay', 'Mumbai');

Insert into Person values(2, 'Ajay', 'Mumbai');

Since multiple columns make up Primary Key so both the rows are considered different. SQL permits either of the
two values can be duplicated but the combination must be unique.

SQL PRIMARY KEY with ALTER TABLE :


Most of the time, Primary Key is defined during the creation of the table but sometimes the Primary key may not be
created in the already existing table. We can however add Primary Key using Alter Statement.

Syntax :

Alter Table Person add Primary Key(Id);

To add Primary key in multiple columns using the following query.

Alter Table Person add Primary Key(Id, Name);

It is necessary that the column added as primary key MUST contain unique values or else it will be violated. An id
cannot be made Primary key if it contains duplicate values. It violates the basic rule of Primary Key. Altering the table
to add Id as a primary key that may contain duplicate values generates an error.

Output :

Error at line 1: cannot validate- primary key violated

Also, A column added as primary key using alter statement should not have null values. Altering table to add Id as
primary key that may contain null values generates an error.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output :

Error at line 1: column contains NULL values; cannot alter to NOT NULL

DELETE PRIMARY KEY CONSTRAINT :


To remove Primary Key constraint on table use given SQL as follows.

ALTER table Person DROP PRIMARY KEY;

Set Operations
Union
Union of the sets A and B, denoted by A ∪ B, is the set of distinct elements that belong to set A or set B, or both.

Venn diagram of A ∪ B

Above is the Venn Diagram of A U B.

Example: Find the union of A = {2, 3, 4} and B = {3, 4, 5};

Solution : A ∪ B = {2, 3, 4, 5}.

Intersection
The intersection of the sets A and B, denoted by A ∩ B, is the set of elements that belong to both A and B i.e. set of
the common elements in A and B.

Venn diagram of A ∩ B

Above is the Venn Diagram of A ∩ B.

Example: Find the intersection of A = {2, 3, 4} and B = {3, 4, 5}

Solution : A ∩ B = {3, 4}.


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Disjoint
Two sets are said to be disjoint if their intersection is the empty set. i.e, sets have no common elements.

Above is the Venn Diagram of A disjoint B.

Example: Let A = {1, 3, 5, 7, 9} and B = { 2, 4, 6, 8}

A and B are disjoint sets since both of them have no common elements.

Set Difference
The difference between sets is denoted by ‘A – B’, which is the set containing elements that are in A but not in B. i.e.,
all elements of A except the element of B.

Above is the Venn Diagram of A-B.

Example: If A = {1, 2, 3, 4, 5} and B = { 2, 4, 6, 8}, find A-B

Solution: A-B = {1, 3, 5}

Complement
The complement of a set A, denoted by AC is the set of all the elements except the elements in A. Complement of the
set A is U – A.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)

Above is the Venn Diagram of Ac

Example: Let U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and A = {2, 4, 6, 8}.

Find AC

Solution: AC = U-A = {1, 3, 5, 7, 9, 10}

Addition & Subtraction

Addition of sets A and B, referred to as Minkowski addition, is the set in whose elements are the sum of each
possible pair of elements from the 2 sets (that is one element is from set A and the other is from set B).
Set subtraction follows the same rule, but with the subtraction operation on the elements. It is to be observed that
these operations are operable only on numeric data types. Even if operated otherwise, it would only be a symbolic
representation without any significance. Further, it can be seen easily that set addition is commutative, while
subtraction is not.

For addition and consequently subtraction, please refer to this answer.

[Tex]A-B=A\cap \bar{B} [/Tex]

1. Associative Properties: A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C

2. Commutative Properties: A ∪ B = B ∪ A and A ∩ B = B ∩ A

3. Identity Property for Union: A ∪ φ = A

4. Intersection Property of the Empty Set: A ∩ φ = φ

5. Distributive Properties: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) similarly for intersection.

Aggregate functions in SQL


In database management an aggregate function is a function where the values of multiple rows are grouped
together as input on certain criteria to form a single value of more significant meaning.

Various Aggregate Functions

1) Count()

2) Sum()

3) Avg()

4) Min()

5) Max()
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Now let us understand each Aggregate function with a example:

Id Name Salary

-----------------------

1 A 80

2 B 40

3 C 60

4 D 70

5 E 60

6 F Null

Count():

Count(*): Returns total number of records .i.e 6.

Count(salary): Return number of Non Null values over the column salary. i.e 5.

Count(Distinct Salary): Return number of distinct Non Null values over the column salary .i.e 4

Sum():

sum(salary): Sum all Non Null values of Column salary i.e., 310

sum(Distinct salary): Sum of all distinct Non-Null values i.e., 250.

Avg():

Avg(salary) = Sum(salary) / count(salary) = 310/5

Avg(Distinct salary) = sum(Distinct salary) / Count(Distinct Salary) = 250/4

Min():

Min(salary): Minimum value in the salary column except NULL i.e., 40.

Max(salary): Maximum value in the salary i.e., 80.

SQL | String functions


String functions
are used to perform an operation on input string and return an output string.
Following are the string functions defined in SQL:

1. ASCII(): This function is used to find the ASCII value of a character.

2. Syntax: SELECT ascii('t');

Output: 116

3. CHAR_LENGTH(): Doesn’t work for SQL Server. Use LEN() for SQL Server. This function is used to find the
length of a word.

4. Syntax: SELECT char_length('Hello!');

Output: 6

5. CHARACTER_LENGTH(): Doesn’t work for SQL Server. Use LEN() for SQL Server. This function is used to find
the length of a line.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
6. Syntax: SELECT CHARACTER_LENGTH('geeks for geeks');

Output: 15

7. CONCAT(): This function is used to add two words or strings.

8. Syntax: SELECT 'Geeks' || ' ' || 'forGeeks' FROM dual;

Output: ‘GeeksforGeeks’

9. CONCAT_WS(): This function is used to add two words or strings with a symbol as concatenating symbol.

10. Syntax: SELECT CONCAT_WS('_', 'geeks', 'for', 'geeks');

Output: geeks_for_geeks

11. FIND_IN_SET(): This function is used to find a symbol from a set of symbols.

12. Syntax: SELECT FIND_IN_SET('b', 'a, b, c, d, e, f');

Output: 2

13. FORMAT(): This function is used to display a number in the given format.

14. Syntax: Format("0.981", "Percent");

Output: ‘98.10%’

15. INSERT(): This function is used to insert the data into a database.

16. Syntax: INSERT INTO database (geek_id, geek_name) VALUES (5000, 'abc');

Output: successfully updated

17. INSTR(): This function is used to find the occurrence of an alphabet.

18. Syntax: INSTR('geeks for geeks', 'e');

Output: 2 (the first occurrence of ‘e’)

Syntax: INSTR('geeks for geeks', 'e', 1, 2 );

Output: 3 (the second occurrence of ‘e’)

19. LCASE(): This function is used to convert the given string into lower case.

20. Syntax: LCASE ("GeeksFor Geeks To Learn");

Output: geeksforgeeks to learn

21. LEFT(): This function is used to SELECT a sub string from the left of given size or characters.

22. Syntax: SELECT LEFT('geeksforgeeks.org', 5);

Output: geeks

23. LENGTH(): This function is used to find the length of a word.

24. Syntax: LENGTH('GeeksForGeeks');

Output: 13

25. LOCATE(): This function is used to find the nth position of the given word in a string.

26. Syntax: SELECT LOCATE('for', 'geeksforgeeks', 1);


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output: 6

27. LOWER(): This function is used to convert the upper case string into lower case.

28. Syntax: SELECT LOWER('GEEKSFORGEEKS.ORG');

Output: geeksforgeeks.org

29. LPAD(): This function is used to make the given string of the given size by adding the given symbol.

30. Syntax: LPAD('geeks', 8, '0');

31. Output:

000geeks

32. LTRIM(): This function is used to cut the given sub string from the original string.

33. Syntax: LTRIM('123123geeks', '123');

Output: geeks

34. MID(): This function is to find a word from the given position and of the given size.

35. Syntax: Mid ("geeksforgeeks", 6, 2);

Output: for

36. POSITION(): This function is used to find position of the first occurrence of the given alphabet.

37. Syntax: SELECT POSITION('e' IN 'geeksforgeeks');

Output: 2

38. REPEAT(): This function is used to write the given string again and again till the number of times mentioned.

39. Syntax: SELECT REPEAT('geeks', 2);

Output: geeksgeeks

40. REPLACE(): This function is used to cut the given string by removing the given sub string.

41. Syntax: REPLACE('123geeks123', '123');

Output: geeks

42. REVERSE(): This function is used to reverse a string.

43. Syntax: SELECT REVERSE('geeksforgeeks.org');

Output: ‘gro.skeegrofskeeg’

44. RIGHT(): This function is used to SELECT a sub string from the right end of the given size.

45. Syntax: SELECT RIGHT('geeksforgeeks.org', 4);

Output: ‘.org’

46. RPAD(): This function is used to make the given string as long as the given size by adding the given symbol on
the right.

47. Syntax: RPAD('geeks', 8, '0');

Output: ‘geeks000’

48. RTRIM(): This function is used to cut the given sub string from the original string.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
49. Syntax: RTRIM('geeksxyxzyyy', 'xyz');

Output: ‘geeks’

50. SPACE(): This function is used to write the given number of spaces.

51. Syntax: SELECT SPACE(7);

Output: ‘ ‘

52. STRCMP(): This function is used to compare 2 strings.

 If string1 and string2 are the same, the STRCMP function will return 0.

 If string1 is smaller than string2, the STRCMP function will return -1.

 If string1 is larger than string2, the STRCMP function will return 1.

53. Syntax: SELECT STRCMP('google.com', 'geeksforgeeks.com');

Output: -1

54. SUBSTR(): This function is used to find a sub string from the a string from the given position.

55. Syntax:SUBSTR('geeksforgeeks', 1, 5);

Output: ‘geeks’

56. SUBSTRING(): This function is used to find an alphabet from the mentioned size and the given string.

57. Syntax: SELECT SUBSTRING('GeeksForGeeks.org', 9, 1);

Output: ‘G’

58. SUBSTRING_INDEX(): This function is used to find a sub string before the given symbol.

59. Syntax: SELECT SUBSTRING_INDEX('www.geeksforgeeks.org', '.', 1);

Output: ‘www’

60. TRIM(): This function is used to cut the given symbol from the string.

61. Syntax: TRIM(LEADING '0' FROM '000123');

Output: 123

62. UCASE(): This function is used to make the string in upper case.

63. Syntax: UCASE ("GeeksForGeeks");

64. Output:

GEEKSFORGEEKS

SQL | Date functions


In SQL, dates are complicated for newbies, since while working with a database, the format of the data in the table
must be matched with the input data to insert. In various scenarios instead of date, datetime (time is also involved
with date) is used.

For storing a date or a date and time value in a database,MySQL offers the following data types:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
DATE format YYYY-MM-DD

DATETIME format: YYYY-MM-DD HH:MI: SS

TIMESTAMP format: YYYY-MM-DD HH:MI: SS

YEAR format YYYY or YY

Now, come to some popular functions in SQL date functions.

NOW()

Returns the current date and time.

Query:

SELECT NOW();

Output:

CURDATE()

Returns the current date.

Query:

SELECT CURDATE();

Output:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
CURTIME()

Returns the current time.

Query:

SELECT CURTIME();

Output:

DATE()

Extracts the date part of a date or date/time expression. Example: For the below table named ‘Test’

Id Name BirthTime

4120 Pratik 1996-09-26 16:44:15.581

Query:

SELECT Name, DATE(BirthTime)

AS BirthDate FROM Test;

Output:

Name BirthDate

Pratik 1996-09-26

EXTRACT()

Returns a single part of a date/time.

Syntax

EXTRACT(unit FROM date);

Several units can be considered but only some are used such as MICROSECOND, SECOND, MINUTE, HOUR, DAY,
WEEK, MONTH, QUARTER, YEAR, etc. And ‘date’ is a valid date expression. Example: For the below table named
‘Test’
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Id Name BirthTime

4120 Pratik 1996-09-26 16:44:15.581

Query:

SELECT Name, Extract(DAY FROM

BirthTime) AS BirthDay FROM Test;

Output:

Name Birthday

Pratik 26

Query:

SELECT Name, Extract(YEAR FROM BirthTime)

AS BirthYear FROM Test;

Output:

Name BirthYear

Pratik 1996

Query:

SELECT Name, Extract(SECOND FROM

BirthTime) AS BirthSecond FROM Test;

Output:

Name BirthSecond

Pratik 581

DATE_ADD()

Adds a specified time interval to a date.

Syntax:

DATE_ADD(date, INTERVAL expr type);

Where, date – valid date expression, and expr is the number of intervals we want to add. and type can be one of the
following: MICROSECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR, etc. Example: For the
below table named ‘Test’
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Id Name BirthTime

4120 Pratik 1996-09-26 16:44:15.581

Query:

SELECT Name, DATE_ADD(BirthTime, INTERVAL

1 YEAR) AS BirthTimeModified FROM Test;

Output:

Name BirthTimeModified

Pratik 1997-09-26 16:44:15.581

Query:

SELECT Name, DATE_ADD(BirthTime,

INTERVAL 30 DAY) AS BirthDayModified FROM Test;

Output:

Name BirthDayModified

Pratik 1996-10-26 16:44:15.581

Query:

SELECT Name, DATE_ADD(BirthTime, INTERVAL

4 HOUR) AS BirthHourModified FROM Test;

Output:

Name BirthSecond

Pratik 1996-10-26 20:44:15.581

DATE_SUB()

Subtracts a specified time interval from a date. The syntax for DATE_SUB is the same as DATE_ADD just the
difference is that DATE_SUB is used to subtract a given interval of date.

DATEDIFF()

Returns the number of days between two dates.

Syntax:

DATEDIFF(date1, date2);
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
date1 & date2- date/time expression

Query:

SELECT DATEDIFF('2017-01-13','2017-01-03') AS DateDiff;

Output:

DateDiff

10

DATE_FORMAT()

Displays date/time data in different formats.

Syntax:

DATE_FORMAT(date,format);

the date is a valid date and the format specifies the output format for the date/time. The formats that can be used
are:

 %a-Abbreviated weekday name (Sun-Sat)

 %b-Abbreviated month name (Jan-Dec)

 %c-Month, numeric (0-12)

 %D-Day of month with English suffix (0th, 1st, 2nd, 3rd)

 %d-Day of the month, numeric (00-31)

 %e-Day of the month, numeric (0-31)

 %f-Microseconds (000000-999999)

 %H-Hour (00-23)

 %h-Hour (01-12)

 %I-Hour (01-12)

 %i-Minutes, numeric (00-59)

 %j-Day of the year (001-366)

 %k-Hour (0-23)

 %l-Hour (1-12)

 %M-Month name (January-December)

 %m-Month, numeric (00-12)

 %p-AM or PM

 %r-Time, 12-hour (hh:mm: ss followed by AM or PM)

 %S-Seconds (00-59)

 %s-Seconds (00-59)
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
 %T-Time, 24-hour (hh:mm: ss)

 %U-Week (00-53) where Sunday is the first day of the week

 %u-Week (00-53) where Monday is the first day of the week

 %V-Week (01-53) where Sunday is the first day of the week, used with %X

 %v-Week (01-53) where Monday is the first day of the week, used with %x

 %W-Weekday name (Sunday-Saturday)

 %w-Day of the week (0=Sunday, 6=Saturday)

 %X-Year for the week where Sunday is the first day of the week, four digits, used with %V

 %x-Year for the week where Monday is the first day of the week, four digits, used with %v

 %Y-Year, numeric, four digits

 %y-Year, numeric, two digits

NULL values in SQL


In SQL there may be some records in a table that do not have values or data for every field and those fields are
termed as a NULL value.

NULL values could be possible because at the time of data entry information is not available. So SQL supports a
special value known as NULL which is used to represent the values of attributes that may be unknown or not apply
to a tuple. SQL places a NULL value in the field in the absence of a user-defined value. For example, the
Apartment_number attribute of an address applies only to addresses that are in apartment buildings and not to
other types of residences.

So, NULL values are those values in which there is no data value in the particular field in the table.

Importance of NULL Value

 It is important to understand that a NULL value differs from a zero value.

 A NULL value is used to represent a missing value, but it usually has one of three different interpretations:

 The value unknown (value exists but is not known)

 Value not available (exists but is purposely withheld)

 Attribute not applicable (undefined for this tuple)

 It is often not possible to determine which of the meanings is intended. Hence, SQL does not distinguish
between the different meanings of NULL.

Principles of NULL values

 Setting a NULL value is appropriate when the actual value is unknown, or when a value is not meaningful.

 A NULL value is not equivalent to a value of ZERO if the data type is a number and is not equivalent to spaces
if the data type is a character.

 A NULL value can be inserted into columns of any data type.

 A NULL value will evaluate NULL in any expression.

 Suppose if any column has a NULL value, then UNIQUE, FOREIGN key, and CHECK constraints will ignore by
SQL.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
In general, each NULL value is considered to be different from every other NULL in the database. When a NULL is
involved in a comparison operation, the result is considered to be UNKNOWN. Hence, SQL uses a three-valued logic
with values True, False, and Unknown. It is, therefore, necessary to define the results of three-valued logical
expressions when the logical connectives AND, OR, and NOT are used.

How To Test for NULL Values?

SQL allows queries that check whether an attribute value is NULL. Rather than using = or to compare an attribute
value to NULL, SQL uses IS and IS NOT. This is because SQL considers each NULL value as being distinct from every
other NULL value, so equality comparison is not appropriate.

Now, consider the following Employee Table.

Query:

CREATE TABLE Employee (

Fname VARCHAR(50),

Lname VARCHAR(50),

SSN VARCHAR(11),

Phoneno VARCHAR(15),

Salary FLOAT

);

INSERT INTO Employee (Fname, Lname, SSN, Phoneno, Salary)

VALUES

('Shubham', 'Thakur', '123-45-6789', '9876543210', 50000.00),

('Aman', 'Chopra', '234-56-7890', NULL, 45000.00),

('Aditya', 'Arpan', NULL, '8765432109', 55000.00),

('Naveen', 'Patnaik', '345-67-8901', NULL, NULL),

('Nishant', 'Jain', '456-78-9012', '7654321098', 60000.00);


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output:

The IS NULL Operator

Suppose we find the Fname and Lname of the Employee having no Super_ssn then the query will be:

Query:

SELECT Fname, Lname FROM Employee WHERE SSN IS NULL;

Output:

The IS NOT NULL Operator

Now if we find the Count of number of Employees having SSNs.

Query:

SELECT COUNT(*) AS Count FROM Employee WHERE SSN IS NOT NULL;

Output:

Updating NULL Values in a Table

We can update the NULL values present in a table using the UPDATE statement in SQL. To do so, we can use the IS
NULL operator in the WHERE clause to select the rows with NULL values and then we can set the new value using the
SET keyword.

Let’s suppose that we want to update SSN in the row where it is NULL.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Query:

UPDATE Employee

SET SSN = '789-01-2345'

WHERE Fname = 'Aditya' AND Lname = 'Arpan';

select* from Employee;

Output:

Nested Queries in SQL


Prerequisites :

Basics of SQL

Nested queries are a way to perform more complex queries by embedding one query within another. A nested
query is a query that appears inside another query, and it helps retrieve data from multiple tables or apply
conditions based on the results of another query. The result of inner query is used in execution of outer query. We
will use STUDENT, COURSE, STUDENT_COURSE tables for understanding nested queries.

STUDENT

S_ID S_NAME S_ADDRESS S_PHONE S_AGE

S1 RAM DELHI 9455123451 18

S2 RAMESH GURGAON 9652431543 18

S3 SUJIT ROHTAK 9156253131 20

S4 SURESH DELHI 9156768971 18


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
COURSE

C_ID C_NAME

C1 DSA

C2 Programming

C3 DBMS

STUDENT_COURSE

S_ID C_ID

S1 C1

S1 C3

S2 C1

S3 C2

S4 C2

S4 C3

There are mainly two types of nested queries:

 Independent Nested Queries: In independent nested queries, query execution starts from innermost query
to outermost queries. The execution of inner query is independent of outer query, but the result of inner
query is used in execution of outer query. Various operators like IN, NOT IN, ANY, ALL etc are used in writing
independent nested queries. IN: If we want to find out S_ID who are enrolled in C_NAME ‘DSA’ or ‘DBMS’,
we can write it with the help of independent nested query and IN operator. From COURSE table, we can find
out C_ID for C_NAME ‘DSA’ or DBMS’ and we can use these C_IDs for finding S_IDs
from STUDENT_COURSE TABLE. STEP 1: Finding C_ID for C_NAME =’DSA’ or ‘DBMS’
Select C_ID from COURSE where C_NAME = ‘DSA’ or C_NAME = ‘DBMS’ STEP 2: Using C_ID of step 1 for
finding S_ID Select S_ID from STUDENT_COURSE where C_ID IN
(SELECT C_ID from COURSE where C_NAME = ‘DSA’ or C_NAME=’DBMS’); The inner query will return a set
with members C1 and C3 and outer query will return those S_IDs for which C_ID is equal to any member of
set (C1 and C3 in this case). So, it will return S1, S2 and S4. Note: If we want to find out names of STUDENTs
who have either enrolled in ‘DSA’ or ‘DBMS’, it can be done as: Select S_NAME from STUDENT where S_ID IN
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
(Select S_ID from STUDENT_COURSE where C_ID IN (SELECT C_ID from COURSE where C_NAME=’DSA’
or C_NAME=’DBMS’)); NOT IN: If we want to find out S_IDs of STUDENTs who have neither enrolled in
‘DSA’ nor in ‘DBMS’, it can be done as: Select S_ID from STUDENT where S_ID NOT IN
(Select S_ID from STUDENT_COURSE where C_ID IN (SELECT C_ID from COURSE where C_NAME=’DSA’
or C_NAME=’DBMS’)); The innermost query will return a set with members C1 and C3. Second inner query
will return those S_IDs for which C_ID is equal to any member of set (C1 and C3 in this case) which are S1, S2
and S4. The outermost query will return those S_IDs where S_ID is not a member of set (S1, S2 and S4). So it
will return S3.

 Co-related Nested Queries: In co-related nested queries, the output of inner query depends on the row
which is being currently executed in outer query. e.g.; If we want to find out S_NAME of STUDENTs who are
enrolled in C_ID ‘C1’, it can be done with the help of co-related nested query as: Select S_NAME
from STUDENT S where EXISTS ( select * from STUDENT_COURSE SC where S.S_ID=SC.S_ID and
SC.C_ID=’C1’); For each row of STUDENT S, it will find the rows from STUDENT_COURSE where S.S_ID =
SC.S_ID and SC.C_ID=’C1’. If for a S_ID from STUDENT S, atleast a row exists in STUDENT_COURSE SC
with C_ID=’C1’, then inner query will return true and corresponding S_ID will be returned as output.

EXAMPLE IN SQL CODE:

SELECT StudentName

FROM Students

WHERE StudentID IN (

SELECT StudentID

FROM Grades

WHERE Subject = ‘Mathematics’ AND Score > 90

);

This article has been contributed by Muhammed Sami. Please write comments if you find anything incorrect, or
you want to share more information about the topic discussed above

SQL Concepts and Queries


In this article, we will discuss the overview of SQL and will mainly focus on Concepts and Queries and will understand
each with the help of examples. Let’s discuss it one by one.

Overview :
SQL is a computer language that is used for storing, manipulating, and retrieving data in a structured format. This
language was invented by IBM. Here SQL stands for Structured Query Language. Interacting databases with SQL
queries, we can handle a large amount of data. There are several SQL-supported database servers such as
MySQL, PostgreSQL, sqlite3 and so on. Data can be stored in a secured and structured format through these
database servers. SQL queries are often used for data manipulation and business insights better.

SQL Database :
Here, we will discuss the queries and will understand with the help of examples.

Query-1 :
Show existing databases –
Let’s consider the existing database like information_schema, mysql, performance_schema, sakila, student, sys, and
world. And if you want to show the exiting database then we will use the show database query as follows.

SHOW DATABASES;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output :

Existing database Name

information_schema

mysql

performance_schema

sakila

student

sys

world

Query-2 :
Drop a database –
Suppose we want to drop the database namely student.

DROP DATABASE student;

SHOW DATABASES;

Database Name

information_schema

mysql

performance_schema

sakila

sys
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Database Name

world

Query-3 :
Create a database –
Suppose we want to create a database namely a bank.

CREATE DATABASE bank;

SHOW DATABASES;

Database Name

bank

information_schema

mysql

performance_schema

sakila

sys

world

Query-4 :
Using a database –

USE bank;

Query-5 :
Create a Table –
Here data type may be varchar, integer, date, etc.

CREATE TABLE table_name (

column1 datatype,

column2 datatype,

....

);
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Example –

CREATE TABLE IF NOT EXISTS Employee (

EmployeeID int,

FirstName varchar(55),

LastName varchar(55),

Email varchar(150),

DOB date

);

Query-6 :
Show tables in the same database –

SHOW TABLES;

Query-7 :
Dropping a Table –

DROP TABLE table_name;

Query-8 :
Inserting values into an existing table –

INSERT INTO Employee

VALUES(1111,'Dipak','Bera','dipakbera@gmail.com','1994-11-22');

Query-9 :
Fetching values in a table –

SELECT * FROM Employee;

Query-10 :
Not Null –
We can specify which column does not accept the null value when we insert a value(row) in a table. It will be done at
the time of table creation.

CREATE TABLE table_name (

column1 datatype NOT NULL,

column2 datatype,

....

);

Query-11 :
Unique –
We can also specify that entries in a particular column should be unique.

CREATE TABLE table_name (

column1 datatype UNIQUE,

column2 datatype,

....
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
);

Example –

CREATE TABLE demo_table

EmployeeID int NOT NULL UNIQUE,

FirstName varchar(55),

LastName varchar(55)

);

KEY CONCEPTS in SQL :


Here, we will discuss some important concepts like keys, join operations, having clauses, order by, etc. Let’s discuss it
one by one.

 PRIMARY KEY –
The constraint PRIMARY KEY suggests that entries should be neither null nor duplicate corresponding to the
specified column.

CREATE TABLE IF NOT EXISTS Customer(

CustID int NOT NULL,

FName varchar(55),

LName varchar(55),

Email varchar(100),

DOB date,

CONSTRAINT customer_custid_pk PRIMARY KEY(CustID)

);

 FOREIGN KEY –
The FOREIGN KEY is used to build a connection between the current table and the previous table containing
the primary key.

CREATE TABLE Account(

AccNo int NOT NULL,

AType varchar(20),

OBal int,

OD date,

CurBal int,

CONSTRAINT customer_AccNo_fk FOREIGN KEY(AccNo) REFERENCES Customer(CustID)

);

 Here, AccNo column in the Account table is referred to by CustID column in the Customer table. Here
Account table is a child table and the Customer table is the parent table.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
ORDER BY :
The ORDER BY keyword is used to show the result in ascending or descending order. By default, it is in ascending
order.

Syntax –

SELECT column1, column2, ...

FROM table_name

ORDER BY column1, column2, ... ASC|DESC;

 Scenario-1 :
Suppose we have the Account Table as follows.

AccNo AType OBal OD CurBal

1111 savings 1000 1990-11-09 6000

1114 current 2000 1992-10-07 1000

1113 current 7000 1992-11-03 4000

1112 savings 1000 2003-12-12 3000

 Now, we will use the Order By command as follows.

SELECT * FROM Account ORDER BY CurBal;

Output :
(By default it will be in increasing order)

AccNo AType OBal OD CurBal

1114 current 2000 1992-10-07 1000

1112 savings 1000 2003-12-12 3000

1113 current 7000 1992-11-03 4000

1111 savings 1000 1990-11-09 6000

 Scenario-2 :
For descending order :

SELECT * FROM Account ORDER BY CurBal DESC;


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output :

AccNo AType OBal OD CurBal

1111 savings 1000 1990-11-09 6000

1113 current 7000 1992-11-03 4000

1112 savings 1000 2003-12-12 3000

1114 current 2000 1992-10-07 1000

GROUP BY :
This keyword is used for grouping the results.

Example –

SELECT COUNT(AType) FROM Account GROUP BY AType;

Output :

AType count(AType)

savings 2

current 2

JOIN CONCEPTS :
Here, we will discuss the join concept as follows.

 LEFT JOIN :
The LEFT JOIN keyword returns all records from the left table (table1) along with the matching records from
the right table (table2).
Syntax –

SELECT column_name(s)

FROM table1

LEFT JOIN table2

ON table1.column_name = table2.column_name;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)

LEFT JOIN

 RIGHT JOIN :
The RIGHT JOIN keyword returns all records from the right table (table2) along with the matching records
from the left table (table1).

 Syntax –

SELECT column_name(s)

FROM table1

RIGHT JOIN table2

ON table1.column_name = table2.column_name;

RIGHT JOIN

 INNER JOIN :
The INNER JOIN keyword returns all matching records from both the table.

 Syntax –

SELECT column_name(s)

FROM table1

INNER JOIN table2

ON table1.column_name = table2.column_name;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
INNER JOIN

 FULL JOIN :
The FULL JOIN or FULL OUTER JOIN keyword returns all records from both the table.

 Syntax –

SELECT column_name(s)

FROM table1

FULL OUTER JOIN table2

ON table1.column_name = table2.column_name;

FULL JOIN

 Note –
This keyword is not used in MySQL’s latest version. Instead, the keyword UNION is used. Here the syntax is
as follows.

SELECT column_name(s) FROM table1

UNION

SELECT column_name(s) FROM table2;

 SELF JOIN :
This is a regular join between aliases of the same table.

 Syntax –

SELECT column_name(s)

FROM table1 T1, table1 T2

WHERE condition;

WHERE CLAUSE :
This clause is used for filtering our data.

Syntax –

SELECT column1, column2, ...

FROM table_name

WHERE condition;

Example –

SELECT AccNo,CurBal FROM Account WHERE CurBal>=1000;

Output :
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
AccNo CurBal

1111 6000

1113 4000

1114 1000

HAVING CLAUSE :
This is required as the WHERE clause does not support aggregate functions such as count, min, max, avg, sum, and so
on.

SELECT column1, column2, ...

FROM table_name

HAVING condition;

Example –

SELECT AccNo,CurBal FROM Account HAVING CurBal=MAX(CurBal);

Output :

AccNo CurBal

1111 6000

SQL | Join (Inner, Left, Right and Full Joins)


SQL Join statement is used to combine data or rows from two or more tables based on a common field between
them. Different types of Joins are as follows:

 INNER JOIN

 LEFT JOIN

 RIGHT JOIN

 FULL JOIN

 NATURAL JOIN

Consider the two tables below as follows:


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Student

StudentCourse

The simplest Join is INNER JOIN.

A. INNER JOIN

The INNER JOIN keyword selects all rows from both the tables as long as the condition is satisfied. This keyword will
create the result-set by combining all rows from both the tables where the condition satisfies i.e value of the
common field will be the same.

Syntax:

SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
table1: First table.
table2: Second table
matching_column: Column common to both the tables.

Note: We can also write JOIN instead of INNER JOIN. JOIN is same as INNER JOIN.

Example Queries(INNER JOIN)

This query will show the names and age of students enrolled in different courses.

SELECT StudentCourse.COURSE_ID, Student.NAME, Student.AGE FROM Student


INNER JOIN StudentCourse
ON Student.ROLL_NO = StudentCourse.ROLL_NO;

Output:

B. LEFT JOIN

This join returns all the rows of the table on the left side of the join and matches rows for the table on the right side
of the join. For the rows for which there is no matching row on the right side, the result-set will contain null. LEFT
JOIN is also known as LEFT OUTER JOIN.

Syntax:

SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)

table1: First table.


table2: Second table
matching_column: Column common to both the tables.

Note: We can also use LEFT OUTER JOIN instead of LEFT JOIN, both are the same.

Example Queries(LEFT JOIN):

SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
LEFT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;

Output:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
C. RIGHT JOIN

RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table on the right side of the join and
matching rows for the table on the left side of the join. For the rows for which there is no matching row on the left
side, the result-set will contain null. RIGHT JOIN is also known as RIGHT OUTER JOIN.

Syntax:

SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;

table1: First table.


table2: Second table
matching_column: Column common to both the tables.

Note: We can also use RIGHT OUTER JOIN instead of RIGHT JOIN, both are the same.

Example Queries(RIGHT JOIN):

SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
RIGHT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;

Output:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
D. FULL JOIN

FULL JOIN creates the result-set by combining results of both LEFT JOIN and RIGHT JOIN. The result-set will contain
all the rows from both tables. For the rows for which there is no matching, the result-set will contain NULL values.

Syntax:

SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;

table1: First table.


table2: Second table
matching_column: Column common to both the tables.

Example Queries(FULL JOIN):

SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
FULL JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output:

NAME COURSE_ID

HARSH 1

PRATIK 2

RIYANKA 2

DEEP 3

SAPTARHI 1

DHANRAJ NULL

ROHIT NULL

NIRAJ NULL

NULL 4

NULL 5

NULL 4

Left JOIN (Video)


Right JOIN (Video)
Full JOIN (Video)
SQL | JOIN (Cartesian Join, Self Join)

E. Natural join (?)

Natural join can join tables based on the common columns in the tables being joined. A natural join returns all rows
by matching values in common columns having same name and data type of columns and that column should be
present in both tables.

Both table must have at list one common column with same column name and same data type.

The two table are joined using Cross join.


UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
DBMS will look for a common column with same name and data type Tuples having exactly same values in common
columns are kept in result.

Example:

Employee

Emp_id Emp_name Dept_id

1 Ram 10

2 Jon 30

3 Bob 50

Department

Dept_id Dept_name

10 IT

30 HR

40 TIS

Query: Find all Employees and their respective departments.

Solution: (Employee) ? (Department)

Emp_id Emp_name Dept_id Dept_id Dept_name

1 Ram 10 10 IT

2 Jon 30 30 HR

Employee data Department data

PL/SQL Introduction
PL/SQL is a block structured language that enables developers to combine the power of SQL with procedural
statements.All the statements of a block are passed to oracle engine all at once which increases processing speed
and decreases the traffic.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Basics of PL/SQL

 •PL/SQL stands for Procedural Language extensions to the Structured Query Language (SQL).

 PL/SQL is a combination of SQL along with the procedural features of programming languages.

 Oracle uses a PL/SQL engine to processes the PL/SQL statements.

 PL/SQL includes procedural language elements like conditions and loops. It allows declaration of constants
and variables, procedures and functions, types and variable of those types and triggers.

Disadvantages of SQL:

 SQL doesn’t provide the programmers with a technique of condition checking, looping and branching.

 SQL statements are passed to Oracle engine one at a time which increases traffic and decreases speed.

 SQL has no facility of error checking during manipulation of data.

Features of PL/SQL:

1. PL/SQL is basically a procedural language, which provides the functionality of decision making, iteration and
many more features of procedural programming languages.

2. PL/SQL can execute a number of queries in one block using single command.

3. One can create a PL/SQL unit such as procedures, functions, packages, triggers, and types, which are stored
in the database for reuse by applications.

4. PL/SQL provides a feature to handle the exception which occurs in PL/SQL block known as exception
handling block.

5. Applications written in PL/SQL are portable to computer hardware or operating system where Oracle is
operational.

6. PL/SQL Offers extensive error checking.

Differences between SQL and PL/SQL:

SQL PL/SQL

SQL is a single query that is used to perform DML and PL/SQL is a block of codes that used to write the entire
DDL operations. program blocks/ procedure/ function, etc.

It is declarative, that defines what needs to be done, PL/SQL is procedural that defines how the things needs
rather than how things need to be done. to be done.

Execute as a single statement. Execute as a whole block.

Mainly used to manipulate data. Mainly used to create an application.

Cannot contain PL/SQL code in it. It is an extension of SQL, so it can contain SQL inside it.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Structure of PL/SQL Block:

PL/SQL extends SQL by adding constructs found in procedural languages, resulting in a structural language that is
more powerful than SQL. The basic unit in PL/SQL is a block. All PL/SQL programs are made up of blocks, which can

be nested within each other.

Typically, each block performs a logical action in the program. A block has the following structure:

DECLARE

declaration statements;

BEGIN

executable statements

EXCEPTIONS

exception handling statements

END;

 Declare section starts with DECLARE keyword in which variables, constants, records as cursors can be
declared which stores data temporarily. It basically consists definition of PL/SQL identifiers. This part of the
code is optional.

 Execution section starts with BEGIN and ends with END keyword.This is a mandatory section and here the
program logic is written to perform any task like loops and conditional statements. It supports
all DML commands, DDL commands and SQL*PLUS built-in functions as well.

 Exception section starts with EXCEPTION keyword.This section is optional which contains statements that
are executed when a run-time error occurs. Any exceptions can be handled in this section.

PL/SQL identifiers

There are several PL/SQL identifiers such as variables, constants, procedures, cursors, triggers etc.

1. Variables: Like several other programming languages, variables in PL/SQL must be declared prior to its use.
They should have a valid name and data type as well. Syntax for declaration of variables:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
variable_name datatype [NOT NULL := value ];

Blocks in PL/SQL
In PL/SQL, All statements are classified into units that is called Blocks. PL/SQL blocks can include variables, SQL
statements, loops, constants, conditional statements and exception handling. Blocks can also build a function or a
procedure or a package.

The Declaration section: Code block start with a declaration section, in which memory variables, constants, cursors
and other oracle objects can be declared and if required initialized.

The Begin section: Consist of set of SQL and PL/SQL statements, which describe processes that have to be applied to
table data. Actual data manipulation, retrieval, looping and branching constructs are specified in this section.

The Exception section: This section deals with handling errors that arise during execution data manipulation
statements, which make up PL/SQL code block. Errors can arise due to syntax, logic and/or validation rule.

The End section: This marks the end of a PL/SQL block.

Broadly, PL/SQL blocks are two types: Anonymous blocks and Named blocks are as follows:

1. Anonymous blocks: In PL/SQL, That’s blocks which is not have header are known as anonymous blocks. These
blocks do not form the body of a function or triggers or procedure. Example: Here a code example of find greatest
number with Anonymous blocks.

2. Named blocks: That’s PL/SQL blocks which having header or labels are known as Named blocks. These blocks can
either be subprograms like functions, procedures, packages or Triggers. Example: Here a code example of find
greatest
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
PL/SQL Architecture
Prerequisite : PL/SQL Introduction

What do you mean by PL/SQL?

In Oracle, PL/SQL (Procedural Language/SQL) is the procedural language extension to the non-procedural SQL. It
combines the data manipulation power of SQL and the procedural power of standard programming
languages. PL/SQL was developed by Oracle Corporation within the early ’90s to reinforce the capabilities of SQL. It
integrates well with SQL* PLUS and other application development products of Oracle. PL/SQL is the superset of SQL.
It provides SQL data manipulation commands and SQL data types. In PL/SQL, a block without any name is called
Anonymous Block. PL/ SQL block consists of various functions, library, procedures, trigger, packages etc.

The following points should be remembered while writing a PL/SQL program –

 In PL/SQL the semicolon (;) is placed at the end of an SQL statement or PL/SQL control statement.

 Section keyword DECLARE, BEGIN and EXCEPTION are not followed by semicolons.

 END keyword and all other PL/SQL statements require a semicolon to terminate the statements.

 Before you start creating code, define the goals and capabilities of your PL/SQL software.

 Select names for constants, variables, and other identifiers that are both meaningful and descriptive.

 To keep track of the modifications in your PL/SQL code, use version control tools (like Git).

 Keep the use of universal variables to a minimum.

 Observe the least privilege principle.

 Use the COMMIT and ROLLBACK instructions to provide proper transaction management.

Features of PL/SQL :

The various features of PL/SQL are given below –

 PL/SQL runs on various operating systems such as windows, Linux etc.

 PL/SQL have an error-checking facility and displays user-friendly messages when an error occurs in a
program.

 It offers logging and debugging capabilities, including the capacity to use exception messages.

 SQL can be executed dynamically.

 When certain data events, such as INSERT, UPDATE, or DELETE actions on a table, occur, triggers are
specialized forms of stored processes that are automatically invoked.

 Multi-row queries are handled using cursors.

 The declaration of variables and constants to store data values is supported.

What do you mean by PL/SQL Architecture?

The PL/SQL runtime system is a technology and not an independent product. This technology is actually like an
engine that exhibits PL/SQL blocks, subprograms like functions and procedures. This engine can be installed in an
Oracle Server or in application development tools such as Oracle Form Builder, Oracle Reports Builder etc.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)

PL/SQL Architecture

PL/SQL can reside in two environments –

1. The Oracle Server

2. The Oracle tools

These two environments are independent of each other. In either environment, the PL/SQL engine accepts any valid
PL/SQL block as input. The PL/SQL engine executes the procedural part of the statements and sends the SQL
statement executer in the Oracle Server. A single transfer is required to send the block from the application to the
Oracle Server, thus improving performance, especially in a Client-Server network. PL/SQL code can also be stored in
the Oracle server as subprograms that can be referenced by any number of applications connected to the database.

Advantages of PL/SQL :

 PL/SQL provides better performance.

 PL/SQL has high Productivity.

 It supports Object-Oriented Programming concepts.

 It has Scalability and Manageability.

 PL/SQL supports various Web Application Development tools.

Disadvantages of PL/SQL :

 PL/SQL requires high memory.

 Lacks of functionality debugging in stored procedures.

number with Named blocks means using function.


UNIT 5: Functional Dependencies & Normalization
Normalization Process in DBMS
Database Normalization is a stepwise formal process that allows us to decompose database tables in such a way that
both data dependency and update anomalies are minimized. It makes use of functional dependency that exists in the
table and the primary key or candidate key in analyzing the tables. Normal forms were initially proposed called

 First Normal Form (INF)

 Second Normal Form (2NF)

 Third Normal Form (3NF)

Subsequently, R, Boyce, and E. F. Codd introduced a stronger definition of 3NF called Boyce-Codd Normal Form. With
the exception of 1NF, all these normal forms are based on functional dependency among the attributes of a table.
Higher normal forms that go beyond BCNF were introduced later such as Fourth Normal Form (4NF) and Fifth
Normal Form (5NF). However, these later normal forms deal with situations that are very rare.

Normalization Process in DBMS

Now, we are going to describe the process of Normalization by considering an Example. Let us look into that.

Normalization Process Example

In this example, we will look into that how we can normalize the database with the help of different Normal Forms.
We will look into each of the Normal Forms separately in this example.
UNIT 5: Functional Dependencies & Normalization
Consider the table 1 shown below:

Full Name Institute Courses Result

Naveen Kumar IIT Delhi DBMS, OS Pass

Utkarsh Tiwari IIT Bombay CN. COA Fail

Utkarsh Tiwari IIT Kanpur OS Fail

Now, we are re-structuring the table according to the 1st Normal Form.

Rules of 1st Normal Form

 Each table should contain a single value.

 Each record needs to be unique.

Table 1 after applying 1st Normal Form:

Full Name Institute Subject Result

Naveen Kumar IIT Delhi DBMS Pass

Naveen Kumar IIT Delhi OS Pass

Utkarsh Tiwari IIT Bombay CN Fail

Utkarsh Tiwari IIT Bombay COA Fail

Utkarsh Tiwari IIT Kanpur OS Fail

Consider the table 2 shown below,

Full Name Institute Subject Result

Utkarsh Tiwari IIT Bombay COA Fail

Utkarsh Tiwari IIT Kanpur OS Fail


UNIT 5: Functional Dependencies & Normalization
Here, People having the same name are from different institutes. So, we require Full Name and Institute to Identify a
Row of the database. For this, we have to remove Partial Dependency.

Let’s look at the 2nd Normal Form.

Rules of 2nd Normal Form

 The table should be in 1NF.

 Primary Key does not functionally dependent on any subset of Candidate Key.

Table 1

ID Full Name Institute Result

1 Naveen Kumar IIT Delhi Pass

2 Utkarsh Tiwari IIT Bombay Fail

3 Utkarsh Tiwari IIT Kanpur Fail

Table 2

ID Subject

1 DBMS

1 OS

2 CN

2 COA

3 OS

Here, the Id in Table 2 is Foreign Key to the Id in Table 1. Now, we have to remove Transitive Functional
Dependency from our Table to Normalize our Database. A Transitive Functional Dependency basically tells us that
there is an indirect relationship between functional dependency.

Now, let us look at the 3rd Normal Form.


UNIT 5: Functional Dependencies & Normalization
Rules of 3rd Normal Form

 The tables should be in 2NF.

 There will be no Transitive Dependency.

Table 1

ID Full Name Institute Result_ID

1 Naveen Kumar IIT Delhi 1

2 Utkarsh Tiwari IIT Bombay 2

3 Utkarsh Tiwari IIT Kanpur 2

Table 2

ID Subject

1 DBMS

1 OS

2 CN

2 COA

3 OS
UNIT 5: Functional Dependencies & Normalization
Table 3

Result_ID Result

1 Pass

2 Fail

3 On Hold

Finally, Our Database is Normalized. From the above-mentioned example, we have reached our level of
Normalization. In fact, there are also some higher forms or next levels of Normalization. Now, we are going to
discuss them one by one.

Other Normal Forms

Boyce-Codd Normal Form (BCNF)

Sometimes, when the database is in the 3rd Normal Form, there exist some anomalies in DBMS, like when more
than one Candidate Keys is present in the Database. This has to be removed under BCNF. BCNF is also called 3.5NF.

4th Normal Form

Whenever a Database contains multivalued and independent data in two or more tables, then the database is to be
considered in the 4th Normal Form.

5th Normal Form

Whenever a Database Table is not in 4NF, and we cannot divide it into smaller tables keeping our data safe with us,
then our Database is in 5th Normal Form.
UNIT 5: Functional Dependencies & Normalization
Summary of Normalization in a Nutshell

Normal
Form Test Remedy (Normalization)

The relation should have no non-atomic attributes or Form a name relation for each non-
1NF
nested relations. atomic attribute or nested relation.

Decompose and set up a new relation for


each partial key with its dependent
For relations where the primary key contains multiple
attributes.
attributes, no non-key
2NF Make sure to keep a relationship with the
attributes should be functionally dependent on a part of
original primary key and any attributes
the primary key.
that are fully functionally dependent on
it.

The relation should not have a non-key attribute


functionally determined by another
Decompose and set up a relation that
non-key attribute (or by a set of non-key attributes) i.e., includes the non-key attribute(s) that
3NF
there should be no functionally determine(s) another non-
key attribute (s).
transitive dependency of a non-key attribute of the
primary key.

The relation should not have any attribute in Functional


Dependency which is Make sure that the left side of every
BCNF
non-prime, the attribute that doesn’t occur in any functional dependency is a candidate key.
candidate key.

The relation should not have a multi-value dependency


means it occurs when

4NF two attributes of a table are independent of each other Decompose the table into two subtables.
but both depend on a

third attribute.

The relation should not have join dependency means if a


Decompose all the tables into as many as
table can be recreated by joining multiple tables and
5NF possible numbers in order to avoid
each of the tables has a subset of the attributes of the
dependency.
table, then the table is in Join Dependency.
UNIT 5: Functional Dependencies & Normalization
The Problem of Redundancy in Database
Redundancy means having multiple copies of the same data in the database. This problem arises when a database is
not normalized. Suppose a table of student details attributes is: student ID, student name, college name, college
rank, and course opted.

Student_ID Name Contact College Course Rank

100 Himanshu 7300934851 GEU B.Tech 1

101 Ankit 7900734858 GEU B.Tech 1

102 Ayush 7300936759 GEU B.Tech 1

103 Ravi 7300901556 GEU B.Tech 1

It can be observed that values of attribute college name, college rank, and course are being repeated which can lead
to problems. Problems caused due to redundancy are:

 Insertion anomaly

 Deletion anomaly

 Updation anomaly

Insertion Anomaly

If a student detail has to be inserted whose course is not being decided yet then insertion will not be possible till the
time course is decided for the student.

Student_ID Name Contact College Course Rank

100 Himanshu 7300934851 GEU 1

This problem happens when the insertion of a data record is not possible without adding some additional unrelated
data to the record.

Deletion Anomaly

If the details of students in this table are deleted then the details of the college will also get deleted which should
not occur by common sense. This anomaly happens when the deletion of a data record results in losing some
unrelated information that was stored as part of the record that was deleted from a table.

It is not possible to delete some information without losing some other information in the table as well.

Updation Anomaly

Suppose the rank of the college changes then changes will have to be all over the database which will be time-
consuming and computationally costly.
UNIT 5: Functional Dependencies & Normalization
Student_ID Name Contact College Course Rank

100 Himanshu 7300934851 GEU B.Tech 1

101 Ankit 7900734858 GEU B.Tech 1

102 Ayush 7300936759 GEU B.Tech 1

103 Ravi 7300901556 GEU B.Tech 1

All places should be updated, If updation does not occur at all places then the database will be in an inconsistent
state.

Redundancy in a database occurs when the same data is stored in multiple places. Redundancy can cause various
problems such as data inconsistencies, higher storage requirements, and slower data retrieval.

Problems Caused Due to Redundancy

 Data Inconsistency: Redundancy can lead to data inconsistencies, where the same data is stored in multiple
locations, and changes to one copy of the data are not reflected in the other copies. This can result in
incorrect data being used in decision-making processes and can lead to errors and inconsistencies in the
data.

 Storage Requirements: Redundancy increases the storage requirements of a database. If the same data is
stored in multiple places, more storage space is required to store the data. This can lead to higher costs and
slower data retrieval.

 Update Anomalies: Redundancy can lead to update anomalies, where changes made to one copy of the data
are not reflected in the other copies. This can result in incorrect data being used in decision-making
processes and can lead to errors and inconsistencies in the data.

 Performance Issues: Redundancy can also lead to performance issues, as the database must spend more
time updating multiple copies of the same data. This can lead to slower data retrieval and slower overall
performance of the database.

 Security Issues: Redundancy can also create security issues, as multiple copies of the same data can be
accessed and manipulated by unauthorized users. This can lead to data breaches and compromise
the confidentiality, integrity, and availability of the data.

 Maintenance Complexity: Redundancy can increase the complexity of database maintenance, as multiple
copies of the same data must be updated and synchronized. This can make it more difficult to troubleshoot
and resolve issues and can require more time and resources to maintain the database.

 Data Duplication: Redundancy can lead to data duplication, where the same data is stored in multiple
locations, resulting in wasted storage space and increased maintenance complexity. This can also lead to
confusion and errors, as different copies of the data may have different values or be out of sync.

 Data Integrity: Redundancy can also compromise data integrity, as changes made to one copy of the data
may not be reflected in the other copies. This can result in inconsistencies and errors and can make it
difficult to ensure that the data is accurate and up-to-date.
UNIT 5: Functional Dependencies & Normalization
 Usability Issues: Redundancy can also create usability issues, as users may have difficulty accessing the
correct version of the data or may be confused by inconsistencies and errors. This can lead to frustration and
decreased productivity, as users spend more time searching for the correct data or correcting errors.

To prevent redundancy in a database, normalization techniques can be used. Normalization is the process of
organizing data in a database to eliminate redundancy and improve data integrity. Normalization involves breaking
down a larger table into smaller tables and establishing relationships between them. This reduces redundancy and
makes the database more efficient and reliable.

Advantages of Redundant Data

 Enhanced Query Performance: By eliminating the need for intricate joins, redundancy helps expedite data
retrieval.

 Offline Access: In offline circumstances, redundant copies allow data access even in the absence of
continuous connectivity.

 Increased Availability: Redundancy helps to increase fault tolerance, which makes data accessible even in
the event of server failures.

Disadvantages of Redundant Data

 Increased storage requirements: Redundant data takes up additional storage space within the database,
which can increase costs and slow down performance.

 Inconsistency: If the same data is stored in multiple places within the database, there is a risk that updates
or changes made to one copy of the data may not be reflected in other copies, leading to inconsistency and
potentially incorrect results.

 Difficulty in maintenance: With redundant data, it becomes more difficult to maintain the accuracy and
consistency of the data. It requires more effort and resources to ensure that all copies of the data are
updated correctly.

 Increased risk of errors: When data is redundant, there is a greater risk of errors in the database. For
example, if the same data is stored in multiple tables, there is a risk of inconsistencies between the tables.

 Reduced flexibility: Redundancy can reduce the flexibility of the database. For example, if a change needs to
be made to a particular piece of data, it may need to be updated in multiple places, which can be time-
consuming and error-prone.

Types of Functional dependencies in DBMS


In a relational database management, functional dependency is a concept that specifies the relationship between
two sets of attributes where one attribute determines the value of another attribute. It is denoted as X → Y, where
the attribute set on the left side of the arrow, X is called Determinant, and Y is called the Dependent.

Functional dependencies are used to mathematically express relations among database entities and are very
important to understand advanced concepts in Relational Database System and understanding problems in
competitive exams like Gate.

Example:
UNIT 5: Functional Dependencies & Normalization
roll_no name dept_name dept_building

42 abc CO A4

43 pqr IT A3

44 xyz CO A4

45 xyz IT A3

46 mno EC B2

47 jkl ME B2

From the above table we can conclude some valid functional dependencies:

 roll_no → { name, dept_name, dept_building },→ Here, roll_no can determine values of fields name,
dept_name and dept_building, hence a valid Functional dependency

 roll_no → dept_name , Since, roll_no can determine whole set of {name, dept_name, dept_building}, it can
determine its subset dept_name also.

 dept_name → dept_building , Dept_name can identify the dept_building accurately, since departments with
different dept_name will also have a different dept_building

 More valid functional dependencies: roll_no → name, {roll_no, name} ⇢ {dept_name, dept_building}, etc.

Here are some invalid functional dependencies:

 name → dept_name Students with the same name can have different dept_name, hence this is not a valid
functional dependency.

 dept_building → dept_name There can be multiple departments in the same building. Example, in the
above table departments ME and EC are in the same building B2, hence dept_building → dept_name is an
invalid functional dependency.

 More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building →
roll_no, etc.

Armstrong’s axioms/properties of functional dependencies:

1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule


Example, {roll_no, name} → name is valid.

2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the augmentation rule.


Example, {roll_no, name} → dept_building is valid, hence {roll_no, name, dept_name} → {dept_building,
dept_name} is also valid.

3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building, then roll_no → dept_building is also valid.
UNIT 5: Functional Dependencies & Normalization
Types of Functional Dependencies in DBMS

1. Trivial functional dependency

2. Non-Trivial functional dependency

3. Multivalued functional dependency

4. Transitive functional dependency

1. Trivial Functional Dependency

In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e. If X → Y and Y is the
subset of X, then it is called trivial functional dependency

Example:

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of
determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial functional dependency.

2. Non-trivial Functional Dependency

In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant. i.e. If X → Y and Y is
not a subset of X, then it is called Non-trivial functional dependency.

Example:

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset
of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not
a subset of {roll_no, name}
UNIT 5: Functional Dependencies & Normalization
3. Multivalued Functional Dependency

In Multivalued functional dependency, entities of the dependent set are not dependent on each other. i.e. If a →
{b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional
dependency.

For example,

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

45 abc 19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name & age are not
dependent on each other(i.e. name → age or age → name doesn’t exist !)

4. Transitive Functional Dependency

In transitive functional dependency, dependent is indirectly dependent on determinant. i.e. If a → b & b → c, then
according to axiom of transitivity, a → c. This is a transitive functional dependency.

For example,

enrol_no name dept building_no

42 abc CO 4

43 pqr EC 2

44 xyz IT 1

45 abc EC 2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity, enrol_no →
building_no is a valid functional dependency. This is an indirect functional dependency, hence called Transitive
functional dependency.
UNIT 5: Functional Dependencies & Normalization
5. Fully Functional Dependency

In full functional dependency an attribute or a set of attributes uniquely determines another attribute or set of
attributes. If a relation R has attributes X, Y, Z with the dependencies X->Y and X->Z which states that those
dependencies are fully functional.

6. Partial Functional Dependency

In partial functional dependency a non key attribute depends on a part of the composite key, rather than the whole
key. If a relation R has attributes X, Y, Z where X and Y are the composite key and Z is non key attribute. Then X->Z is
a partial functional dependency in RBDMS.

Advantages of Functional Dependencies

Functional dependencies having numerous applications in the field of database management system. Here are some
applications listed below:

1. Data Normalization

Data normalization is the process of organizing data in a database in order to minimize redundancy and increase
data integrity. Functional dependencies play an important part in data normalization. With the help of functional
dependencies we are able to identify the primary key, candidate key in a table which in turns helps in normalization.

2. Query Optimization

With the help of functional dependencies we are able to decide the connectivity between the tables and the
necessary attributes need to be projected to retrieve the required data from the tables. This helps in query
optimization and improves performance.

3. Consistency of Data

Functional dependencies ensures the consistency of the data by removing any redundancies or inconsistencies that
may exist in the data. Functional dependency ensures that the changes made in one attribute does not affect
inconsistency in another set of attributes thus it maintains the consistency of the data in database.

4. Data Quality Improvement

Functional dependencies ensure that the data in the database to be accurate, complete and updated. This helps to
improve the overall quality of the data, as well as it eliminates errors and inaccuracies that might occur during data
analysis and decision making, thus functional dependency helps in improving the quality of data in database.

First Normal Form (1NF)


If a table has data redundancy and is not properly normalized, then it will be difficult to handle and update the
database, without facing data loss. It will also eat up extra memory space and Insertion, Update, and Deletion
Anomalies are very frequent if the database is not normalized.

Normalization is the process of minimizing redundancy from a relation or set of relations. Redundancy in relation
may cause insertion, deletion, and update anomalies. So, it helps to minimize the redundancy in relations. Normal
forms are used to eliminate or reduce redundancy in database tables.

Levels of Normalization

There are various levels of normalization. These are some of them:

 First Normal Form (1NF)

 Second Normal Form (2NF)

 Third Normal Form (3NF)


UNIT 5: Functional Dependencies & Normalization
 Boyce-Codd Normal Form (BCNF)

 Fourth Normal Form (4NF)

 Fifth Normal Form (5NF)

In this article, we will discuss the First Normal Form (1NF).

First Normal Form with Example

If a relation contains a composite or multi-valued attribute, it violates the first normal form, or the relation is in the
first normal form if it does not contain any composite or multi-valued attribute. A relation is in first normal form if
every attribute in that relation is single-valued attribute.

A table is in 1 NF if:

 There are only Single Valued Attributes.

 Attribute Domain does not change.

 There is a unique name for every Attribute/Column.

 The order in which data is stored does not matter.

Consider the examples given below.

Example 1:

Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE. Its decomposition into
1NF has been shown in table 2.

Example
UNIT 5: Functional Dependencies & Normalization
Example 2:

ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3

In the above table, Course is a multi-valued attribute so it is not in 1NF.

Below Table is in 1NF as there is no multi-valued attribute:

ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3

Note: A database design is considered as bad if it is not even in the First Normal Form (1NF).

Second Normal Form (2NF)


First Normal Form (1NF) does not eliminate redundancy, but rather, it’s that it eliminates repeating groups. Instead
of having multiple columns of the same kind of data in a record, (0NF or Unnormalized form) you remove the
repeated information into a separate relation and represent them as rows. This is what constitutes 1NF.

Second Normal Form

The second Normal Form (2NF) is based on the concept of fully functional dependency. The second Normal Form
applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes. A
relation with a single-attribute primary key is automatically in at least 2NF. A relation that is not in 2NF may suffer
from the update anomalies. To be in the second normal form, a relation must be in the first normal form and the
relation must not contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no non-
prime attribute (attributes that are not part of any candidate key) is dependent on any proper subset of any
candidate key of the table. In other words,

A relation that is in First Normal Form and every non-primary-key attribute is fully functionally dependent on the
primary key, then the relation is in Second Normal Form (2NF).

Note – If the proper subset of the candidate key determines a non-prime attribute, it is called partial dependency.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency
exists, we remove the partially dependent attribute(s) from the relation by placing them in a new relation along with
a copy of their determinant. Consider the examples given below.

Example-1: Consider the table as follows below.

STUD_NO COURSE_NO COURSE_FEE


1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
UNIT 5: Functional Dependencies & Normalization
{Note that, there are many courses having the same course fee. } Here, COURSE_FEE cannot alone decide the value
of COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO; Hence, COURSE_FEE would be a non-
prime attribute, as it does not belong to the one only candidate key {STUD_NO, COURSE_NO} ; But, COURSE_NO ->
COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. Non-
prime attribute COURSE_FEE is dependent on a proper subset of the candidate key, which is a partial dependency
and so this relation is not in 2NF. To convert the above relation to 2NF, we need to split the table into two tables
such as : Table 1: STUD_NO, COURSE_NO Table 2: COURSE_NO, COURSE_FEE

Table 1 Table 2
STUD_NO COURSE_NO COURSE_NO COURSE_FEE
1 C1 C1 1000
2 C2 C2 1500
1 C4 C3 1000
4 C3 C4 2000
4 C1 C5 2000
2 C5

Note – 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are 100 students
taking C1 course, we dont need to store its Fee as 1000 for all the 100 records, instead once we can store it in the
second table as the course fee for C1 is 1000.

Example-2: Consider following functional dependencies in relation R (A, B, C, D )

AB -> C [A and B together determine C]


BC -> D [B and C together determine D]

In this case, we can see that the relation R has a composite candidate key {A,B} as AB->C. Therefore, A and B
together uniquely determine the value of C. Similarly, BC -> D shows that B and C together uniquely determine the
value of D.

The relation R is already in 1NF because it does not have any repeating groups or nested relations.

However, we can see that the non-prime attribute D is functionally dependent on only part of a candidate key, BC.
This violates the 2NF condition.

Third Normal Form (3NF)


Although Second Normal Form (2NF)relations have less redundancy than those in 1NF, they may still suffer from
update anomalies. If we update only one tuple and not the other, the database will be in an inconsistent state. This
update anomaly is caused by a transitive dependency. We need to remove such dependencies by progressing to the
Third Normal Form (3NF).

Third Normal Form (3NF)

A relation is in the third normal form, if there is no transitive dependency for non-prime attributes as well as it is in
the second normal form. A relation is in 3NF if at least one of the following conditions holds in every non-trivial
function dependency X –> Y.

 X is a super key.

 Y is a prime attribute (each element of Y is part of some candidate key).

In other words,

A relation that is in First and Second Normal Form and in which no non-primary-key attribute is transitively
dependent on the primary key, then it is in Third Normal Form (3NF).
UNIT 5: Functional Dependencies & Normalization
Example 1:

In relation STUDENT given in Table 4,

FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO ->
STUD_AGE} Candidate Key: {STUD_NO} For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal
form. To convert it in third normal form, we will decompose the relation STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_COUNTRY_STUD_AGE) as:

STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)


STATE_COUNTRY (STATE, COUNTRY)

Example 2:

Consider relation R(A, B, C, D, E)

A -> BC,
CD -> E,
B -> D,
E -> A

All possible candidate keys in above relation are {A, E, CD, BC} All attribute are on right sides of all functional
dependencies are prime.

Multivalued Dependency (MVD) in DBMS


In Database Management Systems (DBMS), multivalued dependency (MVD) deals with complex attribute
relationships in which an attribute may have many independent values while yet depending on another attribute or
group of attributes. It improves database structure and consistency and is essential for data integrity and database
normalization.

MVD or multivalued dependency means that for a single value of attribute ‘a’ multiple values of attribute ‘b’ exist.
We write it as,

a --> --> b

It is read as a is multi-valued dependent on b. Suppose a person named Geeks is working on 2 projects Microsoft and
Oracle and has 2 hobbies namely Reading and Music. This can be expressed in a tabular format in the following way.
UNIT 5: Functional Dependencies & Normalization

Example

Project and Hobby are multivalued attributes as they have more than one value for a single person i.e., Geeks.

What is Multivalued Dependency?

When one attribute in a database depends on another attribute and has many independent values, it is said to have
multivalued dependency (MVD). It supports maintaining data accuracy and managing intricate data interactions.

Multi Valued Dependency (MVD)

We can say that multivalued dependency exists if the following conditions are met.

Conditions for MVD

Any attribute say a multiple define another attribute b; if any legal relation r(R), for all pairs of tuples t1 and t2 in r,
such that,

t1[a] = t2[a]

Then there exists t3 and t4 in r such that.

t1[a] = t2[a] = t3[a] = t4[a]


t1[b] = t3[b]; t2[b] = t4[b]
t1 = t4; t2 = t3

Then multivalued (MVD) dependency exists. To check the MVD in given table, we apply the conditions stated above
and we check it with the values in the given table.
UNIT 5: Functional Dependencies & Normalization

Example

Condition-1 for MVD

t1[a] = t2[a] = t3[a] = t4[a]

Finding from table,

t1[a] = t2[a] = t3[a] = t4[a] = Geeks

So, condition 1 is Satisfied.

Condition-2 for MVD

t1[b] = t3[b]
And
t2[b] = t4[b]

Finding from table,

t1[b] = t3[b] = MS
And
t2[b] = t4[b] = Oracle

So, condition 2 is Satisfied.

Condition-3 for MVD

∃c ∈ R-(a ∪ b) where R is the set of attributes in the relational table.


t1 = t4
And
t2=t3

Finding from table,

t1 = t4 = Reading
And
t2 = t3 = Music

So, condition 3 is Satisfied. All conditions are satisfied, therefore,

a --> --> b
UNIT 5: Functional Dependencies & Normalization
According to table we have got,

name --> --> project

And for,

a --> --> C

We get,

name --> --> hobby

Hence, we know that MVD exists in the above table and it can be stated by,

name --> --> project


name --> --> hobby

Boyce-Codd Normal Form (BCNF)


Application of the general definitions of 2NF and 3NF may identify additional redundancy caused by dependencies
that violate one or more candidate keys. However, despite these additional constraints, dependencies can still exist
that will cause redundancy to be present in 3NF relations. This weakness in 3NF resulted in the presentation of a
stronger normal form called the Boyce-Codd Normal Form (Codd, 1974).

Although, 3NF is an adequate normal form for relational databases, still, this (3NF) normal form may not remove
100% redundancy because of X−>Y functional dependency if X is not a candidate key of the given relation. This can
be solved by Boyce-Codd Normal Form (BCNF).

Boyce-Codd Normal Form (BCNF)

Boyce–Codd Normal Form (BCNF) is based on functional dependencies that take into account all candidate keys in a
relation; however, BCNF also has additional constraints compared with the general definition of 3NF.

Rules for BCNF

Rule 1: The table should be in the 3rd Normal Form.

Rule 2: X should be a superkey for every functional dependency (FD) X−>Y in a given relation.

Note: To test whether a relation is in BCNF, we identify all the determinants and make sure that they are candidate
keys.

BCNF in DBMS
UNIT 5: Functional Dependencies & Normalization
You came across a similar hierarchy known as the Chomsky Normal Form in the Theory of Computation. Now,
carefully study the hierarchy above. It can be inferred that every relation in BCNF is also in 3NF. To put it another
way, a relation in 3NF need not be in BCNF. Ponder over this statement for a while.

To determine the highest normal form of a given relation R with functional dependencies, the first step is to check
whether the BCNF condition holds. If R is found to be in BCNF, it can be safely deduced that the relation is also
in 3NF, 2NF, and 1NF as the hierarchy shows. The 1NF has the least restrictive constraint – it only requires a relation
R to have atomic values in each tuple. The 2NF has a slightly more restrictive constraint.

The 3NF has a more restrictive constraint than the first two normal forms but is less restrictive than the BCNF. In this
manner, the restriction increases as we traverse down the hierarchy.

Examples

Here, we are going to discuss some basic examples which let you understand the properties of BCNF. We will discuss
multiple examples here.

Example 1

Let us consider the student database, in which data of the student are mentioned.

Stu_ID Stu_Branch Stu_Course Branch_Number Stu_Course_No

101 Computer Science & Engineering DBMS B_001 201

101 Computer Science & Engineering Computer Networks B_001 202

Electronics & Communication


102 VLSI Technology B_003 401
Engineering

Electronics & Communication Mobile


102 B_003 402
Engineering Communication

Functional Dependency of the above is as mentioned:

Stu_ID −> Stu_Branch

Stu_Course −> {Branch_Number, Stu_Course_No}

Candidate Keys of the above table are: {Stu_ID, Stu_Course}

Why this Table is Not in BCNF?

The table present above is not in BCNF, because as we can see that neither Stu_ID nor Stu_Course is a Super Key. As
the rules mentioned above clearly tell that for a table to be in BCNF, it must follow the property that for functional
dependency X−>Y, X must be in Super Key and here this property fails, that’s why this table is not in BCNF.

How to Satisfy BCNF?

For satisfying this table in BCNF, we have to decompose it into further tables. Here is the full procedure through
which we transform this table into BCNF. Let us first divide this main table into two
tables Stu_Branch and Stu_Course Table.
UNIT 5: Functional Dependencies & Normalization
Stu_Branch Table

Stu_ID Stu_Branch

101 Computer Science & Engineering

102 Electronics & Communication Engineering

Candidate Key for this table: Stu_ID.

Stu_Course Table

Stu_Course Branch_Number Stu_Course_No

DBMS B_001 201

Computer Networks B_001 202

VLSI Technology B_003 401

Mobile Communication B_003 402

Candidate Key for this table: Stu_Course.

Stu_ID to Stu_Course_No Table

Stu_ID Stu_Course_No

101 201

101 202

102 401

102 402

Candidate Key for this table: {Stu_ID, Stu_Course_No}.

After decomposing into further tables, now it is in BCNF, as it is passing the condition of Super Key, that in functional
dependency X−>Y, X is a Super Key.
UNIT 5: Functional Dependencies & Normalization
Example 2

Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:

{ BC->D, AC->BE, B->E }

Explanation:

 Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can determine all attributes of the
relation, So AC will be the candidate key. A or C can’t be derived from any other attribute of the relation, so
there will be only 1 candidate key {AC}.

 Step-2: Prime attributes are those attributes that are part of candidate key {A, C} in this example and others
will be non-prime {B, D, E} in this example.

 Step-3: The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or composite
attributes.

The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper subset of candidate key
AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E is in 2nd normal form (B is not a proper subset
of candidate key AC).

The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a prime attribute) and in
B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd normal for, either LHS of an FD should be
super key or RHS should be a prime attribute. So the highest normal form of relation will be the 2nd Normal form.

Note: A prime attribute cannot be transitively dependent on a key in BCNF relation.

Consider these functional dependencies of some relation R

AB ->C

C ->B

AB ->B

Suppose, it is known that the only candidate key of R is AB. A careful observation is required to conclude that the
above dependency is a Transitive Dependency as the prime attribute B transitively depends on the key AB through C.
Now, the first and the third FD are in BCNF as they both contain the candidate key (or simply KEY) on their left sides.
The second dependency, however, is not in BCNF but is definitely in 3NF due to the presence of the prime attribute
on the right side. So, the highest normal form of R is 3NF as all three FDs satisfy the necessary conditions to be in
3NF.

Example 3

For example consider relation R(A, B, C)

A -> BC,

B -> A

A and B both are super keys so the above relation is in BCNF.

Note: BCNF decomposition may always not be possible with dependency preserving, however, it always satisfies
the lossless join condition. For example, relation R (V, W, X, Y, Z), with functional dependencies:

V, W -> X

Y, Z -> X

W -> Y
UNIT 5: Functional Dependencies & Normalization
Introduction of 4th and 5th Normal Form in DBMS
Two of the highest levels of database normalization are the fourth normal form (4NF) and the fifth normal form
(5NF). Multivalued dependencies are handled by 4NF, whereas join dependencies are handled by 5NF.

If two or more independent relations are kept in a single relation or we can say multivalue dependency occurs when
the presence of one or more rows in a table implies the presence of one or more other rows in that same table. Put
another way, two attributes (or columns) in a table are independent of one another, but both depend on a third
attribute. A multivalued dependency always requires at least three attributes because it consists of at least two
attributes that are dependent on a third.

For a dependency A -> B, if for a single value of A, multiple values of B exist, then the table may have a multi-valued
dependency. The table should have at least 3 attributes and B and C should be independent for A ->> B multivalued
dependency.

Example:

Person Mobile Food_Likes

Mahesh 9893/9424 Burger/Pizza

Ramesh 9191 Pizza

Person->-> mobile,
Person ->-> food_likes

This is read as “person multi determines mobile” and “person multi determines food_likes.”
Note that a functional dependency is a special case of multivalued dependency. In a functional dependency X -> Y,
every x determines exactly one y, never more than one.

Fourth Normal Form (4NF)

The Fourth Normal Form (4NF) is a level of database normalization where there are no non-trivial multivalued
dependencies other than a candidate key. It builds on the first three normal forms (1NF, 2NF, and 3NF) and
the Boyce-Codd Normal Form (BCNF). It states that, in addition to a database meeting the requirements of BCNF, it
must not contain more than one multivalued dependency.

Properties

A relation R is in 4NF if and only if the following conditions are satisfied:

1. It should be in the Boyce-Codd Normal Form (BCNF).


2. The table should not have any Multi-valued Dependency.

A table with a multivalued dependency violates the normalization standard of the Fourth Normal Form (4NF)
because it creates unnecessary redundancies and can contribute to inconsistent data. To bring this up to 4NF, it is
necessary to break this information into two tables.

Example: Consider the database table of a class that has two relations R1 contains student ID(SID) and student name
(SNAME) and R2 contains course id(CID) and course name (CNAME).
UNIT 5: Functional Dependencies & Normalization
Table R1

SID SNAME

S1 A

S2 B

Table R2

CID CNAME

C1 C

C2 D

When their cross-product is done it resulted in multivalued dependencies.

Table R1 X R2

SID SNAME CID CNAME

S1 A C1 C

S1 A C2 D

S2 B C1 C

S2 B C2 D

Multivalued dependencies (MVD) are:

SID->->CID; SID->->CNAME; SNAME->->CNAME

Join Dependency

Join decomposition is a further generalization of Multivalued dependencies. If the join of R1 and R2 over C is equal to
relation R then we can say that a join dependency (JD) exists, where R1 and R2 are the decomposition R1(A, B, C) and
R2(C, D) of a given relations R (A, B, C, D). Alternatively, R1 and R2 are a lossless decomposition of R. A JD ⋈ {R1, R2,
…, Rn} is said to hold over a relation R if R1, R2, ….., Rn is a lossless-join decomposition. The *(A, B, C, D), (C, D) will
be a JD of R if the join of joins attribute is equal to the relation R. Here, *(R1, R2, R3) is used to indicate that relation
R1, R2, R3 and so on are a JD of R. Let R is a relation schema R1, R2, R3……..Rn be the decomposition of R. r( R ) is
said to satisfy join dependency if and only if
UNIT 5: Functional Dependencies & Normalization

Joint Dependency

Example:

Table R1

Company Product

C1 Pendrive

C1 mic

C2 speaker

C2 speaker

Company->->Product

Table R2

Agent Company

Aman C1

Aman C2

Mohan C1

Agent->->Company

Table R3

Agent Product

Aman Pendrive

Aman Mic

Aman speaker
UNIT 5: Functional Dependencies & Normalization
Agent Product

Mohan speaker

Agent->->Product

Table R1⋈R2⋈R3

Company Product Agent

C1 Pendrive Aman

C1 mic Aman

C2 speaker speaker

C1 speaker Aman

Agent->->Product

Fifth Normal Form/Projected Normal Form (5NF)

A relation R is in Fifth Normal Form if and only if everyone joins dependency in R is implied by the candidate keys of
R. A relation decomposed into two relations must have lossless join Property, which ensures that no spurious or
extra tuples are generated when relations are reunited through a natural join.

Properties

A relation R is in 5NF if and only if it satisfies the following conditions:

1. R should be already in 4NF.


2. It cannot be further non loss decomposed (join dependency).

Example – Consider the above schema, with a case as “if a company makes a product and an agent is an agent for
that company, then he always sells that product for the company”. Under these circumstances, the ACP table is
shown as:

Table ACP

Agent Company Product

A1 PQR Nut

A1 PQR Bolt
UNIT 5: Functional Dependencies & Normalization
Agent Company Product

A1 XYZ Nut

A1 XYZ Bolt

A2 PQR Nut

The relation ACP is again decomposed into 3 relations. Now, the natural Join of all three relations will be shown as:

Table R1

Agent Company

A1 PQR

A1 XYZ

A2 PQR

Table R2

Agent Product

A1 Nut

A1 Bolt

A2 Nut

Table R3

Company Product

PQR Nut

PQR Bolt
UNIT 5: Functional Dependencies & Normalization
Company Product

XYZ Nut

XYZ Bolt

The result of the Natural Join of R1 and R3 over ‘Company’ and then the Natural Join of R13 and R2 over ‘Agent’and
‘Product’ will be Table ACP.

Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP is a lossless join
decomposition. Therefore, the relation is in 5NF as it does not violate the property of lossless join.

Conclusion

 Multivalued dependencies are removed by 4NF, and join dependencies are removed by 5NF.

 The greatest degrees of database normalization, 4NF and 5NF, might not be required for every application.

 Normalizing to 4NF and 5NF might result in more complicated database structures and slower query speed,
but it can also increase data accuracy, dependability, and simplicity.

You might also like