DBMS_SQL
DBMS_SQL
What is Database
A database is an organized collection of data, so that it can be easily accessed
and managed. You can organize data into tables, rows, columns, and index.
The main purpose of the database is to operate a large amount of information
by storing, retrieving, and managing data.
Relational Database
This database is based on the relational data model, which stores data in the form
of rows(tuple) and columns(attributes), and together forms a table(relation).
ACID Properties:
To ensure the consistency of the database, certain properties are followed by all
the transactions occurring in the system. These properties are called as ACID
Properties of a transaction.
● Atomicity – This property ensures that either the transaction occurs completely
or it does not occur at all.
o In other words, it ensures that no transaction occurs partially.
● Consistency – This property ensures that integrity constraints are maintained.
o In other words, it ensures that the database remains consistent before and after
the transaction.
● Isolation – This property ensures that multiple transactions can occur
simultaneously without causing any inconsistency.
● Durability – This property ensures that all the changes made by a transaction
after its successful execution are written successfully to the disk.
o It also ensures that these changes exist permanently and are never lost even if
there occurs a failure of any kind
Data Integrity
Data integrity is defined as the data contained in the database is both correct
and consistent.
There are the following categories of data integrity exist with each RDBMS:
Entity integrity: It specifies that there should be no duplicate rows in a table.
Domain integrity: It enforces valid entries for a given column by restricting the
type, the format, or the range of values.
Referential integrity: It specifies that rows cannot be deleted, which are used
by other records.
User-defined integrity: It enforces some specific business rules that are defined
by users. These rules are different from entity, domain or referential integrity.
What is table
A table is a collection of related data entries and contains rows and columns to
store data.
In DBMS, data is generally stored In RDBMS, the tables have an identifier called
2) in either a hierarchical form or a primary key and the data values are stored in the
navigational form. form of tables.
DBMS does not apply any RDBMS defines the integrity constraint for the
4) security with regards to data purpose of ACID (Atomocity, Consistency, Isolation
manipulation. and Durability) property.
DBMS uses file system to store in RDBMS, data values are stored in the form of
5) data, so there will be no relation tables, so a relationship between these data
between the tables. values will be stored in the form of a table as well.
DBMS has to provide some RDBMS system supports a tabular structure of the
6) uniform methods to access the data and a relationship between them to access
stored information. the stored information.
Examples of DBMS are file Example of RDBMS are mysql, postgre, sql
9)
systems, xml etc. server, oracle etc.
DBMS Architecture
•The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers,
database servers and other components that are connected with networks.
•DBMS architecture depends upon how users are connected to the database
to get their request done.
1-Tier Architecture
•In this architecture, the database is directly available to the user. It means
the user can directly sit on the DBMS and uses it.
•Any changes done here will directly be done on the database itself. It
doesn't provide a handy tool for end users.
2-Tier Architecture
•The 2-Tier architecture is same as basic client-server.
In the two-tier architecture, applications on the client
end can directly communicate with the database
at the server side. For this interaction, API's like:
ODBC, JDBC are used.
•The user interfaces and application programs are
run on the client-side.
3-Tier Architecture
•The 3-Tier architecture contains another layer
between the client and server. In this
architecture, client can't directly communicate
with the server.
•The application on the client-end interacts with
an application server which further
communicates with the database system.
•The 3-Tier architecture is used in case of
large web application.
Three schema Architecture
•This framework is used to describe the structure of a specific database system.
•The three schema architecture is also used to separate the user applications and
physical database.
•The three schema architecture contains three-levels. It breaks the database down
into three different categories.
The three-schema architecture is as follows:
1. Internal Level
•The internal level has an internal schema which describes the physical storage
structure of the database.
•Physical level of database describes how the data is being stored in secondary
storage device.
•It uses the physical data model. It is used to define that how the data will be
stored in a block.
2. Conceptual Level
•The conceptual schema describes the design of a database at the conceptual
level. Conceptual level is also known as logical level.
•The conceptual schema describes the structure of the whole database.
•The conceptual level describes what data are to be stored in the database and
also describes what relationship exists among those data.
3. External Level
•It is define as the design of the database at the externel level.
•The view schema describes the end user interaction with database systems.
Data Independence
•Data Independence means, a chance of data at one level should not be affect
another level. There are two types of data independence:
1. Logical Data Independence
• The data at conceptual and externel level must be independent. This means, a
change in conceptual level should not be affect externel level schema. EX:-
Adding or deleting attributes of table should not affect the user’s view of the table.
2. Attribute
Attributes are the descriptive properties which are owned by each entity of an
Entity Set.
Eclipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
Types of Attributes:
Simple Attributes - Simple attributes are those attributes which cannot be divided
further. It represents a primary key. Ex. Age
Composite Attributes - Composite attributes are those attributes which are
composed of many other simple attributes. Ex. Name, Address
Multi Valued Attributes - Multi valued attributes are those attributes which can
take more than one value for a given entity from an entity set. Ex. Mobile No,
Email ID.
The double oval is used to represent multivalued attribute.
Derived Attributes - Derived attributes are those attributes which can be derived
from other attribute(s). Ex. Age can be derived from DOB.
● Key Attributes - Key attributes are those attributes which can identify an entity
uniquely in an entity set. Ex. Roll No.
3. Relationship
A relationship is used to describe the relation between entities. Diamond or
rhombus is used to represent the relationship.
Types of relationship are as follows:
a. One-to-One Relationship
An entity in set A can be associated with at most one entity in set B. An entity in
set B can be associated with at most one entity in set A. OR
EX:-A female can marry to one male, and a male can marry to one female.
b. One-to-many relationship
An entity in set A can be associated with any number (zero ormore) of entities in
set B. An entity in set B can be associated with at most one entity in set A.
EX:- Scientist can invent many inventions, but the invention is done by the only
specific scientist.
c. Many-to-one relationship
An entity in set A can be associated with at most one entity in
set B. An entity in set B can be associated with any number of entities in set A.
EX:- Student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship
An entity in set A can be associated with any number (zero or
more) of entities in set B. An entity in set B can be associated with any number
(zero or more) of entities in set A.
EX:- Employee can assign by many projects and project can have many
employees.
Relationship:
A relationship is defined as an association among several entities.
● Unary Relationship Set - Unary relationship set is a relationship set where only one
entity set participates in a relationship set.
● Binary Relationship Set - Binary relationship set is a relationship set where two entity
sets participate in a relationship set.
● Ternary Relationship Set - Ternary relationship set is a relationship set where three
entity sets participate in a relationship set.
● N-ary Relationship Set - N-ary relationship set is a relationship set where ‘n’ entity
sets participate in a relationship set.
Keys:- A key is a set of attributes that can identify each tuple uniquely in the
given relation.
1. Primary key:-The primary key defines a set of attributes that are used to
uniquely identify and NOT NULL every tuple.
2. Candidate key:- A candidate key is an attribute or set of an attribute which
can uniquely identify a tuple.
The remaining attributes except for primary key are considered as a candidate
key. The candidate keys are as strong as the primary key.
3. Super Key:- A super key is a combination of all possible attributes which can
uniquely identify two tubles in a table.
4. Foreign key:- It is an attribute or set of attributes that references to primary
key of same table or another table.(Foreign key maintain referential integrity.)
•Referential integrity refers to the relationship between tables. Because each table in
a database must have a primary key, this primary key can appear in other tables
because of its relationship to data within those tables.
5 Composite Key - A primary key composed of multiple attributes and not just
a single attribute is called a composite key.
6 Unique Key - It is unique for all the records of the table. Once assigned, its
value cannot be changed i.e. it is non-updatable. It may have a NULL value.
Integrity Constraints
•Integrity constraints are a set of rules. It is used to maintain the quality of
information.
•Integrity constraints ensure that the data insertion, updating, and other processes
have to be performed in such a way that data integrity is not affected.
•Thus, integrity constraint is used to guard against accidental damage to the
database.
1. Domain constraints:- Domain constraint defines the domain or set of values for an
attribute. It specifies that the value taken by the attribute must be the atomic value from its
domain.
2. Entity integrity constraints:- Entity integrity constraint specifies that no
attribute of primary key must contain a null value in any relation.
A table can contain a null value other than the primary key field.
3. Referential Integrity Constraints:- It specifies that all the values taken by
the foreign keymust either be available in the relation of the primary key or be
null.
4. Key constraints:- All the values of the primary key must be unique. The value
of the primary key must not be null.
Partial Dependency –If the proper subset of candidate key determines non-
prime attribute, it is called partial dependency.
Transitive dependency –If A->B and B->C are two FDs then A->C is called
transitive dependency.
Natural Join:- A natural join is the set of tuples of all combinations are equal on their
common attribute names and duplicate attributes are removed by default.
Note: Natural Join is equivalent to cross product if two relations have no attribute
in common and natural join of a relation R with itself will return R only.
•It is denoted by ⋈.
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched
records from the right table. Ex:-
SELECT * FROM Table_A A LEFT JOIN Table_B B ON A.col = B.col;
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched
records from the left table. Ex:-
SELECT * FROM Table_A A RIGHT JOIN Table_B B ON A.col = B.col;
FULL (OUTER) JOIN: Returns all records when there is a match in either left or right
table. Ex:-
SELECT * FROM Table_A A FULL JOIN Table_B B ON A.col = B.col;
Self-Join : - A table is joined to itself based on some relation between its own column(s). Self-join
uses the INNER JOIN or LEFT JOIN clause and a table alias is used to assign different names to the
table within the
query.
Cross-Join:- The table are join contains the same number of rows as in the cross-product of the
number of rows in the two tables. If a WHERE clause is used in cross join then the query will work like
an INNER JOIN.
3. Equi join:-It is a special case of conditional join when only equality conditions
are applied between attributes.
Relational Algebra
Relational Algebra is a procedural query language which takes a relation as an
input and generates a relation as an output.
∩(Intersection):- Returns those tuples which are in both R1 and R2. Max no.
of rows returned = min(m,n) and Min no. of rows returned =0.
File Structures:
● Primary Index: A primary index is an ordered file, records of fixed length with two fields.
First field is the same as the primary key as a data file and the second field is a pointer to
the data block, where the key is available. The average number of block accesses using
index = log2 Bi + 1, where Bi = number of index blocks.
● Clustering Index: Clustering index is created on data file whose records are physically
ordered on a non-key field (called Clustering field).
● Secondary Index: Secondary index provides secondary means of accessing a file for
which primary access already exists.
B Trees
At every level , we have Key and Data Pointer and data pointer points to either block or record.
Properties of B-Trees:
Root of B-tree can have children between 2 and P, where P is Order of tree.
Order of tree – Maximum number of children a node can have.
Internal node can have children between ⌈ P/2 ⌉ and P
Internal node can have keys between ⌈ P/2 ⌉ – 1 and P-1
B+ Trees
In B+ trees, the structure of leaf and non-leaf are different, so their order is. Order of non-leaf
will be higher as compared to leaf nodes.
Searching time will be less in B+ trees, since it doesn’t have record pointers in non-leaf
because of which depth will decrease.
SQL
SQL stands for Structured Query Language. It is used for storing and
managing data in relational database management system (RDMS).
Advantages of SQL
High speed:- the user can quickly and efficiently retrieve a large amount of
records from a database.
Multiple data view:- the users can make different views of the database
structure.
There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
BETWEEN It is used to search for values that are within a set of values.
SELECT:
WHERE:
The WHERE clause is used to filter records.
Syntax –
● SELECT column1, column2, ...
FROM table_name
WHERE condition;
Ex –
● SELECT * FROM Customers
WHERE Country='Mexico';
Ex –
● SELECT * FROM Customers
WHERE Country='Germany' AND City='Berlin';
INSERT INTO:
The INSERT INTO statement is used to insert new records in a table.
Syntax –
● INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
● INSERT INTO table_name
VALUES (value1, value2, value3, ...);
*In the second syntax, make sure the order of the values is in the same order as the columns in
the table.
Ex –
● INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode,
Country)
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');
NULL Value:
It is not possible to test for NULL values with comparison operators, such as =, <, or
<>. We will have to use the IS NULL and IS NOT NULL operators instead.
Syntax –
● SELECT column_names
FROM table_name
WHERE column_name IS NULL;
● SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;
Ex –
● SELECT CustomerName, ContactName, Address
FROM Customers
WHERE Address IS NULL;
UPDATE:
The UPDATE statement is used to modify the existing records in a table.
Syntax –
● UPDATE table_nameSET column1 = value1, column2 = value2, ...
WHERE condition;
Ex –
● UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;
DELETE:
The DELETE statement is used to delete existing records in a table.
Syntax –
● DELETE FROM table_name WHERE condition;
● DELETE FROM table_name;
In 2ndsyntax, all rows are deleted. The table structure, attributes, and indexes will be intact
Ex –
● DELETE FROM Customers WHERE CustomerName='Alfreds Futterkiste';
SELECT TOP:
The SELECT TOP clause is used to specify the number of records to return.
Syntax –
● SELECT TOP number|percent column_name(s)
FROM table_name
WHERE condition;
● SELECT column_name(s)
FROM table_name
WHERE condition
LIMIT number;
● SELECT column_name(s)
FROM table_name
ORDER BY column_name(s)
FETCH FIRST number ROWS ONLY;
● SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number;*In case the interviewer asks other than the TOP, rest are also correct.
(Diff. DB Systems)
Ex –
● SELECT TOP 3 * FROM Customers;
● SELECT * FROM Customers
LIMIT 3;
● SELECT * FROM Customers
FETCH FIRST 3 ROWS ONLY;
Aggregate Functions:
MIN():
The MIN() function returns the smallest value of the selected column.
Syntax –
● SELECT MIN(column_name) FROM table_name WHERE condition;
Ex –
● SELECT MIN(Price) AS SmallestPrice FROM Products;
MAX():
The MAX() function returns the largest value of the selected column.
Syntax –
● SELECT MAX(column_name) FROM table_name WHERE condition;
Ex –
● SELECT MAX(Price) AS LargestPrice FROM Products;
COUNT():
The COUNT() function returns the number of rows that matches a specified criterion.
Syntax –
● SELECT COUNT(column_name) FROM table_name WHERE condition;
Ex –
● SELECT COUNT(ProductID) FROM Products;
AVG():
The AVG() function returns the average value of a numeric column.
Syntax –
● SELECT AVG(column_name) FROM table_name WHERE condition;
Ex –
● SELECT AVG(Price) FROM Products;
SUM():
The SUM() function returns the total sum of a numeric column.
Syntax –
● SELECT SUM(column_name) FROM table_name WHERE condition;
Ex –
● SELECT SUM(Quantity) FROM OrderDetails;
LIKE Operator:
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
There are two wildcards often used in conjunction with the LIKE operator:
● The percent sign (%) represents zero, one, or multiple characters
● The underscore sign (_) represents one, single character
Syntax –● SELECT column1, column2, ...
FROM table_name WHERE columnN LIKE pattern;
LIKE Operator Description
WHERE CustomerName LIKE 'a%' Finds any values that start with "a"
WHERE CustomerName LIKE '%a' Finds any values that end with "a"
WHERE CustomerName LIKE '%or%' Finds any values that have "or" in any position
WHERE CustomerName LIKE '_r%' Finds any values that have "r" in the second
position
WHERE CustomerName LIKE 'a_%' Finds any values that start with "a" and are at
least 2 characters in length
WHERE CustomerName LIKE 'a__%' Finds any values that start with "a" and are at
least 3 characters in length
WHERE ContactName LIKE 'a%o' Finds any values that start with "a" and ends with
"o"
IN:
The IN operator allows you to specify multiple values in a WHERE clause.
The IN operator is a shorthand for multiple OR conditions.
Syntax –
● SELECT column_name(s) FROM table_name
WHERE column_name IN (value1, value2, ...);
● SELECT column_name(s) FROM table_name
WHERE column_name IN (SELECT STATEMENT);
Ex –
● SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');
● SELECT * FROM CustomersWHERE Country IN (SELECT Country FROM Suppliers);
BETWEEN:
The BETWEEN operator selects values within a given range. The values can be numbers, text, or
dates.
The BETWEEN operator is inclusive: begin and end values are included.
Syntax –
● SELECT column_name(s) FROM table_name
WHERE column_name BETWEEN value1 AND value2;
Ex –
● SELECT * FROM Products WHERE Price BETWEEN 10 AND 20;
Joins:
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
INNER JOIN:
The INNER JOIN keyword selects records that have matching values in both tables.
Syntax –
● SELECT column_name(s) FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;
Ex –
● SELECT Orders.OrderID, Customers.CustomerName FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
LEFT (OUTER) JOIN:The LEFT JOIN keyword returns all records from the left table (table1), and
the matching records
from the right table (table2). The result is 0 records from the right side, if there is no match.
Syntax –
● SELECT column_name(s) FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;
Ex –
● SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
UNION:
The UNION operator is used to combine the result-set of two or more SELECT statements.
● Every SELECT statement within UNION must have the same number of columns
● The columns must also have similar data types
● The columns in every SELECT statement must also be in the same order
The UNION operator selects only distinct values by default. To allow duplicate values,
use UNION ALL
Syntax –
● SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
● SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
Ex –
● SELECT City FROM Customers UNION
SELECT City FROM Suppliers ORDER BY City;
GROUP BY:
The GROUP BY statement groups rows that have the same values into summary rows, like "find
the number of customers in each country".The GROUP BY statement is often used with aggregate
functions
(COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or more
columns.
Syntax –
● SELECT column_name(s) FROM table_name
WHERE condition GROUP BY column_name(s)
ORDER BY column_name(s);
Ex –
● SELECT COUNT(CustomerID), Country
FROM Customers GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;
HAVING:
The HAVING clause was added to SQL because the WHERE keyword cannot be used with
aggregate functions.
*WHERE is given priority over HAVING.
Syntax –
● SELECT column_name(s) FROM table_name
WHERE condition GROUP BY column_name(s)
HAVING condition ORDER BY column_name(s);
Ex –
● SELECT COUNT(CustomerID), Country
FROM Customers GROUP BY Country
HAVING COUNT(CustomerID) > 5;
CREATE DATABASE:
The CREATE DATABASE statement is used to create a new SQL database.Syntax –
● CREATE DATABASE databasename;
DROP DATABASE:
The DROP DATABASE statement is used to drop an existing SQL database.
Syntax –
● DROP DATABASE databasename;
CREATE TABLE:
The CREATE TABLE statement is used to create a new table in a database.
Syntax –
● CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
....
);
DROP TABLE:
The DROP TABLE statement is used to drop an existing table in a database.
Syntax –
● DROP TABLE table_name;
TRUNCATE TABLE:
The TRUNCATE TABLE statement is used to delete the data inside a table, but not the table itself.
Syntax –
● TRUNCATE TABLE table_name;
ALTER TABLE:
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
The ALTER TABLE statement is also used to add and drop various constraints on an existing
table.
Syntax –
● ALTER TABLE table_name
ADD column_name datatype;
● ALTER TABLE table_name
DROP COLUMN column_name;
● ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
Ex –
● ALTER TABLE Customers
ADD Email varchar(255);
● ALTER TABLE Customers
DROP COLUMN Email;
● ALTER TABLE Persons
ALTER COLUMN DateOfBirth year;
Distributed Databases
In a distributed database, there are a number of databases that may
be geographically distributed all over the world. A distributed DBMS
manages the distributed database in a manner so that it appears as
one single database to users.
Features
• Databases in the collection are logically interrelated with each
other. Often they represent a single logical database.
• Data is physically stored across multiple sites. Data in each site
can be managed by a DBMS independent of the other sites.
• The processors in the sites are connected via a network. They do
not have any multiprocessor configuration.
• A distributed database is not a loosely connected file system.
• A distributed database incorporates transaction processing, but it
is not synonymous with a transaction processing system.
Homogeneous distributed
Homogeneous distributed database, all the sites use identical DBMS
and operating systems. Its properties are −
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other
sites to process user requests.
• The database is accessed through a single interface as if it is a
single database.
There are two types of homogeneous distributed database −
•Autonomous − Each database is independent that functions on
its own. They are integrated by a controlling application and use
message passing to share data updates.
•Non-autonomous − Data is distributed across the
homogeneous nodes and a central or master DBMS co-ordinates
data updates across the sites.
Heterogeneous distributed
In a heterogeneous distributed database, different sites have different
operating systems, DBMS products and data models. Its properties are
−
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs like
relational, network, hierarchical or object oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and so there is limited co-
operation in processing user requests.
Types of Heterogeneous Distributed Databases
•Federated− The heterogeneous database systems are
independent in nature and integrated together so that they
function as a single database system.
•Un-federated− The database systems employ a central
coordinating module through which the databases are accessed.
Codd’s Rules
Codd’s rules are proposed by a computer scientist named Dr. Edgar F. Codd and
he also invent the relational model for database management. These rules are made
to ensure data integrity, consistency, and usability. This set of rules basically
signifies the characteristics and requirements of a relational database management
system (RDBMS). In this article, we will learn about various Codd’s rules.
Transaction?
Indexing
Indexing improves database performance by minimizing the number of
disc visits required to fulfill a query. It is a data structure technique used
to locate and quickly access data in databases.