Unit-2 Part-1 Relational Data Model in DBMS
Unit-2 Part-1 Relational Data Model in DBMS
E.F. Codd proposed the relational Model to model data in the form of relations or tables. After designing the
conceptual model of the Database using ER diagram, we need to convert the conceptual model into a relational
model which can be implemented using any RDBMS language like Oracle SQL, MySQL, etc. So we will see
what the Relational Model is.
What is the Relational Model?
The relational model represents how data is stored in Relational Databases. A relational database consists of a
collection of tables, each of which is assigned a unique name. Consider a relation STUDENT with attributes
ROLL_NO, NAME, ADDRESS, PHONE, and AGE shown in the table.
Table Student
ROLL_NO NAME ADDRESS PHONE AGE
4 SURESH DELHI 18
Important Terminologies
Attribute: Attributes are the properties that define an entity. e.g.; ROLL_NO, NAME, ADDRESS
Relation Schema: A relation schema defines the structure of the relation and represents the name of the
relation with its attributes. e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, and AGE) is the
relation schema for STUDENT. If a schema has more than 1 relation, it is called Relational Schema.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one of which is
shown as:
RA
1 DELHI 9455123451 18
M
Relation Instance: The set of tuples of a relation at a particular instance of time is called a relation instance.
Table 1 shows the relation instance of STUDENT at a particular time. It can change whenever there is an
insertion, deletion, or update in the database.
Degree: The number of attributes in the relation is known as the degree of the relation.
The STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation defined
above has cardinality 4.
Column: The column represents the set of values for a particular attribute. The column ROLL_NO is
extracted from the relation STUDENT.
ROLL_NO
NULL Values: The value which is not known or unavailable is called a NULL value. It is represented by
blank space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
Relation Key: These are basically the keys that are used to identify the rows uniquely or also help in
identifying tables. These are of the following types.
Primary Key
Candidate Key
Super Key
Foreign Key
Alternate Key
Composite Key
Constraints in Relational Model
While designing the Relational Model, we define some conditions which must hold for data present in the
database are called Constraints. These constraints are checked before performing any operation (insertion,
deletion, and updation ) in the database. If there is a violation of any of the constraints, the operation will fail.
Domain Constraints
These are attribute-level constraints. An attribute can only take values that lie inside the domain range. e.g.; If a
constraint AGE>0 is applied to STUDENT relation, inserting a negative value of AGE will result in failure.
Key Integrity
Every relation in the database should have at least one set of attributes that defines a tuple uniquely. Those set of
attributes is called keys. e.g.; ROLL_NO in STUDENT is key. No two students can have the same roll number.
So a key has two properties:
It should be unique for all tuples.
It can’t have NULL values.
Referential Integrity
When one attribute of a relation can only take values from another attribute of the same relation or any other
relation, it is called referential integrity. Let us suppose we have 2 relations
Table Student
ROLL_NO NAME ADDRESS PHONE AGE BRANCH_CODE
4 SURESH DELHI 18 IT
Table Branch
BRANCH_CODE BRANCH_NAME
CS COMPUTER SCIENCE
IT INFORMATION TECHNOLOGY
CV CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take the values which are present in BRANCH_CODE of BRANCH
which is called referential integrity constraint. The relation which is referencing another relation is called
REFERENCING RELATION (STUDENT in this case) and the relation to which other relations refer is called
REFERENCED RELATION (BRANCH in this case).
Anomalies in the Relational Model
An anomaly is an irregularity or something which deviates from the expected or normal state. When designing
databases, we identify three types of anomalies: Insert, Update, and Delete.
Insertion Anomaly in Referencing Relation
We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is not present in the
referenced attribute value. e.g.; Insertion of a student with BRANCH_CODE ‘ME’ in STUDENT relation will
result in an error because ‘ME’ is not present in BRANCH_CODE of BRANCH.
Deletion/ Updation Anomaly in Referenced Relation:
We can’t delete or update a row from REFERENCED RELATION if the value of REFERENCED ATTRIBUTE
is used in the value of REFERENCING ATTRIBUTE. e.g; if we try to delete a tuple from BRANCH having
BRANCH_CODE ‘CS’, it will result in an error because ‘CS’ is referenced by BRANCH_CODE of STUDENT,
but if we try to delete the row from BRANCH with BRANCH_CODE CV, it will be deleted as the value is not
been used by referencing relation. It can be handled by the following method:
On Delete Cascade
It will delete the tuples from REFERENCING RELATION if the value used by REFERENCING ATTRIBUTE
is deleted from REFERENCED RELATION. e.g.; For, if we delete a row from BRANCH with
BRANCH_CODE ‘CS’, the rows in STUDENT relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this
case) will be deleted.
On Update Cascade
It will update the REFERENCING ATTRIBUTE in REFERENCING RELATION if the attribute value used by
REFERENCING ATTRIBUTE is updated in REFERENCED RELATION. e.g;, if we update a row from
BRANCH with BRANCH_CODE ‘CS’ to ‘CSE’, the rows in STUDENT relation with BRANCH_CODE CS
(ROLL_NO 1 and 2 in this case) will be updated with BRANCH_CODE ‘CSE’.
Super Keys
Any set of attributes that allows us to identify unique rows (tuples) in a given relationship is known as super
keys. Out of these super keys, we can always choose a proper subset among these that can be used as a primary
key. Such keys are known as Candidate keys. If there is a combination of two or more attributes that are being
used as the primary key, then we call it a Composite key.
Codd Rules in Relational Model
Edgar F Codd proposed the relational database model where he stated rules. Now these are known as Codd’s
Rules. For any database to be the perfect one, it has to follow the rules.
For more, refer to Codd Rules in Relational Model.
Advantages of the Relational Model
Simple model: Relational Model is simple and easy to use in comparison to other languages.
Flexible: Relational Model is more flexible than any other relational model present.
Secure: Relational Model is more secure than any other relational model.
Data Accuracy: Data is more accurate in the relational data model.
Data Integrity: The integrity of the data is maintained in the relational model.
Operations can be Applied Easily: It is better to perform operations in the relational model.
Disadvantages of the Relational Model
Relational Database Model is not very good for large databases.
Sometimes, it becomes difficult to find the relation between tables.
Because of the complex structure, the response time for queries is high.
Characteristics of the Relational Model
Data is represented in rows and columns called relations.
Data is stored in tables having relationships between them called the Relational model.
The relational model supports the operations like Data definition, Data manipulation, and Transaction
management.
Each column has a distinct name and they are representing attributes.
Each row represents a single entity.
1 C1 DBMS
2 C2 Computer Networks
1 C2 Computer Networks
RDBMS Vendors
There are several vendors that offer Relational Database Management Systems (RDBMS). Here are some of the
most popular ones:
Oracle: Oracle Database is one of the most widely used RDBMS products in the market. It is known for its
robustness, scalability, and reliability. It is used by many large enterprises and is particularly well-suited for data
warehousing and transaction processing.
Microsoft: Microsoft SQL Server is a popular RDBMS used in Windows environments. It offers a range of
features, including data mining, business intelligence, and reporting services.
IBM: IBM DB2 is a popular RDBMS used in enterprise environments. It offers high availability, disaster
recovery, and scalability features.
MySQL: MySQL is an open-source RDBMS used by many small to medium-sized businesses. It is known for
its ease of use, flexibility, and low cost.
PostgreSQL: PostgreSQL is another popular open-source RDBMS. It is known for its scalability, reliability, and
support for complex transactions.
SAP: SAP HANA is an in-memory RDBMS that is designed for high-performance analytics and data
processing. It is often used in enterprise environments for real-time reporting and business intelligence.
Relational Algebra
It is a procedural Language. It consists of a set of operators that can be performed on relations. Relational
Algebra forms the basis for many other high-level data sub-languages like SQL and QBE.
Relational algebra has mainly 9 types of operators.
UNION
INTERSECTION
MINUS
TIMES
SELECTION
PROJECTION
JOIN
DIVISION
RENAME
1. UNION (U): A and B are two relations. It displays total values (Attributes) in both relations. It avoids
duplicate values in both relations. U symbol can be used.
Syntax:
A UNION B (or) A U B
Example:
A = { clerk, manager, salesman}
B = { president, clerk, manager}
A UNION B = {clerk, manager, salesman, president}
2. INTERSECTION (∩): A and B are two relations. It displays common elements in both relations. “∩” symbol
can be used.
Syntax:
A INTERSECT B (or) A ∩ B
Example:
A = { clerk, manager, salesman}
B = { president, clerk, manager}
A INTERSECT B = { clerk, manager}
3. DIFFERENCE (─): A and B are two relations. It displays elements in relation A not in relation B.
Syntax:
A MINUS B (OR) A ─ B
Example:
A = { clerk, manager, salesman}
B = { president, clerk, manager}
A MINUS B = {salesman}
4. CARTESIAN PRODUCT(X): A and B are two relations. It has a new relation consisting of all pair wises
combinations of all elements in A and B. The relation A has “m” elements and relation B has “n” elements,
then the resultant relation will be “ m * n “.
Syntax:
A TIMES B (OR) A X B
Example:
A = {clerk, manager, salesman}
B = {president, clerk, manager}
A TIMES B = {(clerk, president), (clerk, clerk),(clerk, manager), (manager, president), (manager, clerk),
(manager, manager),(salesman, president), (salesman, clerk), (salesman, manager) }
5. SELECTION (σ): It displays all the attributes or columns of all the rows in a relation. “” operator can be
used to select the attributes in a relation. In general SELECT operation is denoted by
(σ)(R)
Syntax:
σ condition (relation name)
6. PROJECTION (π): It displays some specified columns in a relation. “π” operator can be used to select some
specified columns in a relation. It selects tuples that satisfy the given predicate from a relation. It displays some
specified columns by using some conditions.
Syntax:
Syntax: π(col1,col2…) Relation Name
Example:
π(sno, sname, total) MARKS
7. JOIN( ): It combines two or more relations. Symbols can be used. It can be mainly divided into mainly 4
types. These are mainly
Inner Join
Outer Join
Left Outer Join
Right Outer Join
8. DIVIDE (÷): It divides the tuple from one relation to another relation
Syntax:
A DIVIDE B (OR) A ÷ B
Example:
A = {clerk, manager, salesman}
B = {clerk, manager}
A DIVIDE B = {salesman}
Syntax:
ρ(OLD RELATION, NEW RELATION)
Example:
ρ(STUDENT,MARKS)
It changes the “student” relation to “Marks” relation. It also renames the specified column. It changes the old-
column name to new-column name. Features of the relational model and Codd’s Rules:
Tables/Relations: The basic building block of the relational model is the table or relation, which represents a
collection of related data. Each table consists of columns, also known as attributes or fields, and rows, also
known as tuples or records.
Primary Keys: In the relational model, each row in a table must have a unique identifier, which is known as the
primary key. This ensures that each row is unique and can be accessed and manipulated easily.
Foreign Keys: Foreign keys are used to link tables together and enforce referential integrity. They ensure that
data in one table is consistent with data in another table.
Normalization: The process of organizing data into tables and eliminating redundancy is known as
normalization. Normalization is important in the relational model because it helps to ensure that data is
consistent and easy to maintain.
Codd’s Rules: Codd’s Rules are a set of 12 rules that define the characteristics of a true relational DBMS. These
rules ensure that the DBMS is consistent, reliable, and easy to use.
Atomicity, Consistency, Isolation, Durability (ACID): The ACID properties are a set of properties that ensure
that transactions are processed reliably in the relational model. Transactions are sets of operations that are
executed as a single unit, ensuring that data is consistent and accurate.
Advantages of Relational Algebra
Relational Algebra is a formal language used to specify queries to retrieve data from a relational database. It has
several advantages that make it a popular choice for managing and manipulating data. Here are some of the
advantages of Relational Algebra:
Simplicity: Relational Algebra provides a simple and easy-to-understand set of operators that can be used to
manipulate data. It is based on a set of mathematical concepts and principles, which makes it easy to learn and
use.
Formality: Relational Algebra is a formal language that provides a standardized and rigorous way of expressing
queries. This makes it easier to write and debug queries, and also ensures that queries are correct and consistent.
Abstraction: Relational Algebra provides a high-level abstraction of the underlying database structure, which
makes it easier to work with large and complex databases. It allows users to focus on the logical structure of the
data, rather than the physical storage details.
Portability: Relational Algebra is independent of any specific database management system, which means that
queries can be easily ported to other systems. This makes it easy to switch between different databases or
vendors without having to rewrite queries.
Efficiency: Relational Algebra is optimized for efficiency and performance, which means that queries can be
executed quickly and with minimal resources. This is particularly important for large and complex databases,
where performance is critical.
Extensibility: Relational Algebra provides a flexible and extensible framework that can be extended with new
operators and functions. This allows developers to customize and extend the language to meet their specific
needs.
Disadvantages of Relational Algebra
While Relational Algebra has many advantages, it also has some limitations and disadvantages that should
be considered when using it. Here are some of the disadvantages of Relational Algebra:
Complexity: Although Relational Algebra is based on mathematical principles, it can be complex and difficult
to understand for non-experts. The syntax and semantics of the language can be challenging, and it may require
significant training and experience to use it effectively.
Limited Expressiveness: Relational Algebra has a limited set of operators, which can make it difficult to
express certain types of queries. It may be necessary to use more advanced techniques, such as subqueries or
joins, to express complex queries.
Lack of Flexibility: Relational Algebra is designed for use with relational databases, which means that it may
not be well-suited for other types of data storage or management systems. This can limit its flexibility and
applicability in certain contexts.
Performance Limitations: While Relational Algebra is optimized for efficiency and performance, it may not be
able to handle large or complex datasets. Queries can become slow and resource-intensive when dealing with
large amounts of data or complex queries.
Limited Data Types: Relational Algebra is designed for use with simple data types, such as integers, strings,
and dates. It may not be well-suited for more complex data types, such as multimedia files or spatial data.
Lack of Integration: Relational Algebra is often used in conjunction with other programming languages and
tools, which can create integration challenges. It may require additional programming effort to integrate
Relational Algebra with other systems and tools.
Relational Algebra is a powerful and useful tool for managing and manipulating data in relational databases, it
has some limitations and disadvantages that should be carefully considered when using it.
Codd’s Twelve Rules of Relational Database
Codd rules were proposed by E.F. Codd which should be satisfied by the relational model. Codd’s Rules are
basically used to check whether DBMS has the quality to become Relational Database Management System
(RDBMS). But, it is rare to find that any product has fulfilled all the rules of Codd. They generally follow the 8-
9 rules of Codd. E.F. Codd has proposed 13 rules which are popularly known as Codd’s 12 rules. These rules are
stated as follows:
Rule 0: Foundation Rule– For any system that is advertised as, or claimed to be, a relational database
management system, that system must be able to manage databases entirely through its relational capabilities.
Rule 1: Information Rule– Data stored in the Relational model must be a value of some cell of a table.
Rule 2: Guaranteed Access Rule– Every data element must be accessible by the table name, its primary
key, and the name of the attribute whose value is to be determined.
Rule 3: Systematic Treatment of NULL values– NULL value in the database must only correspond to
missing, unknown, or not applicable values.
Rule 4: Active Online Catalog– The structure of the database must be stored in an online catalog that can be
queried by authorized users.
Rule 5: Comprehensive Data Sub-language Rule- A database should be accessible by a language
supported for definition, manipulation, and transaction management operation.
Rule 6: View Updating Rule- Different views created for various purposes should be automatically
updatable by the system.
Rule 7: High-level insert, update and delete rule- Relational Model should support insert, delete, update,
etc. operations at each level of relations. Also, set operations like Union, Intersection, and minus should be
supported.
Rule 8: Physical data independence- Any modification in the physical location of a table should not
enforce modification at the application level.
Rule 9: Logical data independence- Any modification in the logical or conceptual schema of a table should
not enforce modification at the application level. For example, merging two tables into one should not affect
the application accessing it which is difficult to achieve.
Rule 10: Integrity Independence- Integrity constraints modified at the database level should not enforce
modification at the application level.
Rule 11: Distribution Independence- Distribution of data over various locations should not be visible to
end-users.
Rule 12: Non-Subversion Rule- Low-level access to data should not be able to bypass the integrity rule to
change data.
GATE Question
Given the basic ER and relational models, which of the following is INCORRECT? [GATE CS 2012]
1. An attribute of an entity can have more than one value.
2. An attribute of an entity can be a composite.
3. In a row of a relational table, an attribute can have more than one value.
4. In a row of a relational table, an attribute can have exactly one value or a NULL value.
Answer: In the relation model, an attribute can’t have more than one value. So, option 3 is the answer.
Introduction of Relational Algebra in DBMS
Relational Algebra is a procedural query language. Relational algebra mainly provides a theoretical foundation
for relational databases and SQL. The main purpose of using Relational Algebra is to define operators that
transform one or more input relations into an output relation. Given that these operators accept relations as input
and produce relations as output, they can be combined and used to express potentially complex queries that
transform potentially many input relations (whose data are stored in the database) into a single output relation
(the query results). As it is pure mathematics, there is no use of English Keywords in Relational Algebra and
operators are represented using symbols.
Fundamental Operators
These are the basic/fundamental operators used in Relational Algebra.
1. Selection(σ)
2. Projection(π)
3. Union(U)
4. Set Difference(-)
5. Set Intersection(∩)
6. Rename(ρ)
7. Cartesian Product(X)
1. Selection(σ): It is used to select required tuples of the relations.
Example:
A B C
1 2 4
2 2 3
3 2 3
4 3 4
For the above relation, σ(c>3)R will select the tuples which have c more than 3.
A B C
1 2 4
4 3 4
Note: The selection operator only selects the required tuples but does not display them. For display, the data
projection operator is used.
2. Projection(π): It is used to project required column data from a relation.
Example: Consider Table 1. Suppose we want columns B and C from Relation R. π(B,C)R will show following
columns.
B C
2 4
2 3
3 4
Ram 01
Mohan 02
Vivek 13
Geeta 17
GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
Consider the following table of Students having different optional subjects in their course.
π(Student_Name) FRENCH U π(Student_Name) GERMAN
Student_Name
Ram
Mohan
Vivek
Geeta
Shyam
Rohan
Note: The only constraint in the union of two relations is that both relations must have the same set of
Attributes.
4. Set Difference (-): Set Difference in relational algebra is the same set difference operation as in set theory.
Example: From the above table of FRENCH and GERMAN, Set Difference is used as follows
π(Student_Name) FRENCH - π(Student_Name) GERMAN
Student_Name
Ram
Mohan
Note: The only constraint in the Set Difference between two relations is that both relations must have the same
set of Attributes.
5. Set Intersection (∩): Set Intersection in relational algebra is the same set intersection operation in set theory.
Example: From the above table of FRENCH and GERMAN, the Set Intersection is used as follows
π(Student_Name) FRENCH ∩ π(Student_Name) GERMAN
Student_Name
Vivek
Student_Name
Geeta
Note: The only constraint in the Set Difference between two relations is that both relations must have the same
set of Attributes.
7. Cross Product(X): Cross-product between two relations. Let’s say A and B, so the cross product between A
X B will result in all the attributes of A followed by each attribute of B. Each record of A will pair with every
record of B.
Example:
A
Name Age Sex
Ram 14 M
Sona 15 F
Kim 20 M
B
ID Course
1 DS
2 DBMS
AXB
Name Age Sex ID Course
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Name Age Sex ID Course
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘ tuples.
Derived Operators
These are some of the derived operators, which are derived from the fundamental operators.
1. Natural Join(⋈)
2. Conditional Join
1. Natural Join(⋈): Natural join is a binary operator. Natural join between two or more relations will result in a
set of all combinations of tuples where they have an equal common attribute.
Example:
EMP
Name ID Dept_Name
A 120 IT
B 125 HR
C 110 Sales
D 111 IT
DEPT
Dept_Name Manager
Sales Y
Production Z
IT A
A 120 IT A
C 110 Sales Y
D 111 IT A
2. Conditional Join: Conditional join works similarly to natural join. In natural join, by default condition is
equal between common attributes while in conditional join we can specify any condition such as greater than,
less than, or not equal.
Example:
R
ID Sex Marks
1 F 45
2 F 55
3 F 60
S
ID Sex Marks
10 M 20
11 M 22
12 M 59
1 F 45 10 M 20
R.ID R.Sex R.Marks S.ID S.Sex S.Marks
1 F 45 11 M 22
2 F 55 10 M 20
2 F 55 11 M 22
3 F 60 10 M 20
3 F 60 11 M 22
3 F 60 12 M 59
Relational Calculus
As Relational Algebra is a procedural query language, Relational Calculus is a non-procedural query language. It
basically deals with the end results. It always tells me what to do but never tells me how to do it. Relational
Calculus is the formal query language. It is also known as Declarative language. In Relational
Calculus, the order is not specified in which the operation has to be performed. Relational
Calculus means what result we have to obtain.
There are two types of Relational Calculus
1. Tuple Relational Calculus(TRC)
2. Domain Relational Calculus(DRC)
{ t | P(t) }
Where,
t: the set of tuples
p: is the condition which is true for the given set of tuples.
Difference between Relational Algebra and Relational Calculus:
Basis of
S.NO Comparison Relational Algebra Relational Calculus
8. Query The evaluation of the query relies The order of operations does not matter
Evaluation on the order specification in in relational calculus for the evaluation of
which the operations must be
Basis of
S.NO Comparison Relational Algebra Relational Calculus
performed. queries.