0% found this document useful (0 votes)
46 views

Basic Database Concepts

Uploaded by

Ishmal Touqeer
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
46 views

Basic Database Concepts

Uploaded by

Ishmal Touqeer
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 17

Basic Database Concepts

The database system is an excellent computer-based record-keeping system. A


collection of data, commonly called a database, contains information about a
particular enterprise. It maintains any information that may be necessary to the
decision-making process involved in the management of that organization. It can
also be defined as a collection of interrelated data stored together to serve multiple
applications, the data is stored so that it is independent of programs that use the
data. A generic and controlled approach is used to add new data and modify and
retrieve existing data within the database. The data is structured so as to provide
the basis for future application development.

A database is information that's set up for easy access, management and updating.
Computer databases typically store aggregations of data records or files that contain
information such as sales transactions, customer data, financials and product
information.

Databases are used for storing, maintaining and accessing any sort of data. They
collect information on people, places or things. This information is gathered in one
place so it can be observed and analyzed. Databases can be thought of as an
organized collection of information.

Databases are essential for storing large amounts of data in one place. With
databases, organizations can quickly access, manage, modify, update, organize and
retrieve their data

Databases are normally controlled using a database management system (DBMS).


In the database, data is organized into tables consisting of rows and columns. Many
databases also use Structured Query Language (SQL) for writing and querying data.
There are different kinds of databases, however, so the exact language used and
how it works depends on the type of database.

What are databases used for?

Business, government and scientific fields use databases for data storage, analysis
and management. Organizations use the data stored in databases to make informed
business decisions. Some of the ways organizations use databases include the
following:
 Improve business processes. Companies collect data about business
processes, such as sales, order processing and customer service. They analyze
that data to improve these processes, expand their business and grow revenue.

 Store personal data. Databases can also be used to store personal


information. For example, personal cloud storage is available for individual
users to store media, such as photos, in a managed cloud.

What are the components of a database?

While the different types of databases vary in schema, data structure and data types
most suited to them, they're all comprised of the following five basic components:

 Hardware. This is the physical device that database software runs on. Database
hardware includes computers, servers and hard drives.

 Software. Database software or applications give users control of the database.


DBMS software is used to manage and control databases.

 Data. This is the raw information that the database stores. Database
administrators organize the data to make it more meaningful.

 Data access language. This is the programming language that controls the
database. The programming language and the DBMS must work together. One
of the most common database languages is SQL.

 Procedures. These rules determine how the database works and how it handles
the data.

Advantages of using Database

 Database minimizes data redundancy to a great extent.


 The database can control the inconsistency of data to a large extent.
 Sharing of data is also possible using the database.
 Database enforce standards.
 The use of Databases can ensure data security.
 Integrity can be managed using the database.

Various Levels of Database Implementation


The database is implemented through three general levels. These levels are:

 Internal Level or Physical level


 Conceptual Level
 External Level or View Level
The Concept of Data Independence
As the database may be viewed through three levels of abstraction, any change at
any level can affect other levels' schemas. Since the database keeps on growing,
then there may be frequent changes at times. This should not lead to redesigning
and re-implementation of the database. The concepts of data independence prove
beneficial in such types of contexts.

 Physical data independence


 Logical data independence

Basic Terminologies Related to Database and SQL


Relation: In general, a relation is a table, i.e., data is arranged in rows and columns.
A relation has the following properties:

 In any given column of a table, all the items are of the same kind, whereas
items in different columns may not be of the same kind.
 For a row, each column must have an atomic value, and also, for a row, a
column cannot have more than one value.
 All rows of a relation are distinct.
 The ordering of rows in a relationship is immaterial.
 The column of a relation are assigned distinct names, and the ordering of
these columns is immaterial.

Tuple: The rows of tables in a relationship are generally termed Tuples.

Attributes: The columns or fields of a table are termed Attributes.

Degree: The number of attributes in a relation determines the degree of the relation.
A relation having three attributes is said to have a relation of degree 3.

Cardinality: The number of tuples or rows in a relation is termed cardinality.

Types of databases

There are many types of databases. They're classified according to content type:
bibliographic, full text, numeric and images. In computing, databases are often
classified by the organizational approach they use.

Some of the main organizational databases include the following.

1. There are several types of databases, that are briefly explained below.
2. Hierarchical databases
3. Network databases
4. Object-oriented databases
5. Relational databases
6. Cloud Database
7. Operational Database
8. NoSQL databases

What is a database management system?


A DBMS is software that enables users to create and manage a database. It also
helps them create, read, update and delete data in a database, and it assists
with logging and auditing functions.

A DBMS provides physical and logical independence from data. Users and
applications don't need to know either the physical or logical locations of data. A
DBMS can also limit and control access to the database and provide different views
of the same database schema to multiple users. Some examples of DBMSes include
Microsoft SQL Server, MySQL and Oracle Database.

Relational Model in DBMS

A relational database is a type of database that stores and provides


access to data points that are related to one another.
Relational databases are based on the relational model, an intuitive,
straightforward way of representing data in tables. In a relational
database, each row in the table is a record with a unique ID called the
key. The columns of the table hold attributes of the data, and each record
usually has a value for each attribute, making it easy to establish the
relationships among data points.

After designing the conceptual model of the Database using ER diagram, we need to
convert the conceptual model into a relational model which can be implemented using
any RDBMS language like Oracle SQL, MySQL, etc. So we will see what the Relational
Model is.

The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
Tables are also known as relations. The relational model is an example of a record-based
model. Record-based models are so named because the database is structured in fixed-
format records of several types. Each table contains records of a particular type. Each record
type defines a fixed number of fields, or attributes. The columns of the table correspond to
the attributes of the record type. The relational data model is the most widely used data
model, and a vast majority of current database systems are based on the relational model.

What is the Relational Model?


The relational model represents how data is stored in Relational Databases. A
relational database consists of a collection of tables, each of which is assigned a
unique name

Important Terminologies
 Attribute: Attributes are the properties that define an entity.
e.g.; ROLL_NO, NAME, ADDRESS
 Relation Schema: A relation schema defines the structure of the relation and
represents the name of the relation with its attributes. e.g.; STUDENT
(ROLL_NO, NAME, ADDRESS, PHONE, and AGE) is the relation schema for
STUDENT. If a schema has more than 1 relation, it is called Relational Schema.
 Tuple: Each row in the relation is known as a tuple. The above relation contains
4 tuples, one of which is shown as:

1 RAM DELHI 9455123451 18

 Relation Instance: The set of tuples of a relation at a particular instance of time


is called a relation instance. Table 1 shows the relation instance of STUDENT
at a particular time. It can change whenever there is an insertion, deletion, or
update in the database.
 Degree: The number of attributes in the relation is known as the degree of the
relation. The STUDENT relation defined above has degree 5.
 Cardinality: The number of tuples in a relation is known as cardinality.
The STUDENT relation defined above has cardinality 4.
 Column: The column represents the set of values for a particular attribute. The
column ROLL_NO is extracted from the relation STUDENT.
ROLL_NO

 NULL Values: The value which is not known or unavailable is called a NULL
value. It is represented by blank space. e.g.; PHONE of STUDENT having
ROLL_NO 4 is NULL.
Relational Model Concepts

As discussed earlier, a relational database is based on the relational model. This database
consists of various components based on the relational model. These include:
 Relation: Two-dimensional table used to store a collection of data elements.
 Tuple: Row of the relation, depicting a real-world entity.
 Attribute/Field: Column of the relation, depicting properties that define the relation.
 Attribute Domain: Set of pre-defined atomic values that an attribute can take i.e., it
describes the legal values that an attribute can take.
 Degree: It is the total number of attributes present in the relation.
 Cardinality: It specifies the number of entities involved in the relation i.e., it is the
total number of rows present in the relation. Read more about Cardinality in DBMS.
 Relational Schema: It is the logical blueprint of the relation i.e., it describes the
design and the structure of the relation. It contains the table name, its attributes,
and their types:

TABLE_NAME(ATTRIBUTE_1 TYPE_1, ATTRIBUTE_2 TYPE_2, ...)

For our Student relation example, the relational schema will be:

STUDENT(ROLL_NUMBER INTEGER, NAME VARCHAR(20), CGPA FLOAT)

 Relational Instance: It is the collection of records present in the relation at a given


time

Keys in Relational Model


Keys are one of the basic requirements of a relational database model. It is widely used to
identify the tuples (rows) uniquely in the table. We also use keys to set up relations amongst
various columns and tables of a relational database.

Different Types of Database Keys


 Candidate Key
 Primary Key
 Super Key
 Alternate Key
 Foreign Key
 Composite Key

Super Key:

Super Key is defined as a set of attributes within a table that can uniquely identify each
record within a table. Super Key is a superset of Candidate key. In the table defined
above super key would include student_id, (student_id, name), phone etc.

Candidate Key

Candidate keys are defined as the minimal set of fields which can uniquely identify each
record in a table. It is an attribute or a set of attributes that can act as a Primary Key for a
table to uniquely identify each record in that table. There can be more than one candidate key.

In our example, student_id and phone both are candidate keys for table Student.

 A candidate key can never be NULL or empty. And its value should be unique.

 There can be more than one candidate keys for a table.

 A candidate key can be a combination of more than one columns (attributes).


Primary Key

Primary key is a candidate key that is most appropriate to become the main key for any table.
It is a key that can uniquely identify each record in a table.

For the table Student we can make the student_id column as the primary key.

Composite Key

Key that consists of two or more attributes that uniquely identify any record in a table is
called Composite key. But the attributes which together form the Composite key are not a
key independentely or individually.

In the above picture we have a Score table which stores the marks scored by a student in a
particular subject.

In this table student_id and subject_id together will form the primary key, hence it is a
composite key.

Foreign Key FOREIGN KEY is a field (or collection of fields) in one table, that refers to the

PRIMARY KEY in another table. The table with the foreign key is called the child table, and

the table with the primary key is called the referenced or parent table. It is a key it acts as a

primary key in one table and it acts as

secondary key in another table.

It combines two or more relations (tables) at a time.

They act as a cross-reference between the tables.

For example, DNO is a primary key in the DEPT table and a non-key in EMP
Use of Foreign Key

The use of a foreign key is simply to link the attributes of two tables together with the help of

a primary key attribute. Thus, it is used for creating and maintaining the relationship between

the two relation

Constraints in Relational Model


While designing the Relational Model, we define some conditions which must hold
for data present in the database are called Constraints. These constraints are
checked before performing any operation (insertion, deletion, and updation ) in the
database. If there is a violation of any of the constraints, the operation will fail.
Domain Constraints
These are attribute-level constraints. An attribute can only take values that lie
inside the domain range. e.g.; If a constraint AGE>0 is applied to STUDENT
relation, inserting a negative value of AGE will result in failure.
Key Integrity
Every relation in the database should have at least one set of attributes that
defines a tuple uniquely. Those set of attributes is called keys. e.g.; ROLL_NO in
STUDENT is key. No two students can have the same roll number. So a key has
two properties:
 It should be unique for all tuples.
 It can’t have NULL values.
Referential Integrity
When one attribute of a relation can only take values from another attribute of the
same relation or any other relation, it is called referential integrity.

Advantages of the Relational Model


 Simple model: Relational Model is simple and easy to use in comparison to
other languages.
 Flexible: Relational Model is more flexible than any other relational model
present.
 Secure: Relational Model is more secure than any other relational model.
 Data Accuracy: Data is more accurate in the relational data model.
 Data Integrity: The integrity of the data is maintained in the relational model.
 Operations can be Applied Easily: It is better to perform operations in the
relational model.

Disadvantages of the Relational Model


 Relational Database Model is not very good for large databases.
 Sometimes, it becomes difficult to find the relation between tables.
 Because of the complex structure, the response time for queries is high.

Characteristics of the Relational Model


 Data is represented in rows and columns called relations.
 Data is stored in tables having relationships between them called the Relational
model.
 The relational model supports the operations like Data definition, Data
manipulation, and Transaction management.
 Each column has a distinct name and they are representing attributes.
 Each row represents a single entity.
What is SQL (Structured Query Language)?

What is SQL?

Structured query language (SQL) is a programming language for storing and


processing information in a relational database. A relational database stores
information in tabular form, with rows and columns representing different data
attributes and the various relationships between the data values. You can use SQL
statements to store, update, remove, search, and retrieve information from the
database. You can also use SQL to maintain and optimize database performance.

Why is SQL important?

Structured query language (SQL) is a popular query language that is frequently used
in all types of applications. Data analysts and developers learn and use SQL
because it integrates well with different programming languages. For example, they
can embed SQL queries with the Java programming language to build high-
performing data processing applications with major SQL database systems such as
Oracle or MS SQL Server. SQL is also fairly easy to learn as it uses common
English keywords in its statements

History of SQL

SQL was invented in the 1970s based on the relational data model. It was initially
known as the structured English query language (SEQUEL). The term was later
shortened to SQL. Oracle, formerly known as Relational Software, became the first
vendor to offer a commercial SQL relational database management system.

What are the components of a SQL system?

Relational database management systems use structured query language (SQL) to


store and manage data. The system stores multiple database tables that relate to
each other. MS SQL Server, MySQL, or MS Access are examples of relational
database management systems. The following are the components of such a
system.

SQL table

A SQL table is the basic element of a relational database. The SQL database table
consists of rows and columns. Database engineers create relationships between
multiple database tables to optimize data storage space.

For example, the database engineer creates a SQL table for products in a store:

Product ID Product Name Color ID


0001 Mattress Color 1
0002 Pillow Color 2

Then the database engineer links the product table to the color table with the Color
ID:

Color ID Color Name


Color 1 Blue
Color 2 Red

SQL statements

SQL statements, or SQL queries, are valid instructions that relational database
management systems understand. Software developers build SQL statements by
using different SQL language elements. SQL language elements are components
such as identifiers, variables, and search conditions that form a correct SQL
statement.

The data in RDBMS is stored in database objects called tables. A table is a


collection of related data entries and it consists of columns and rows.

Look at the "Customers" table:

example

SELECT * FROM Customers;

Every table is broken up into smaller entities called fields. The fields in the
Customers table consist of CustomerID, CustomerName, ContactName,
Address, City, PostalCode and Country. A field is a column in a table that
is designed to maintain specific information about every record in the
table.

A record, also called a row, is each individual entry that exists in a table.
For example, there are 91 records in the above Customers table. A record
is a horizontal entity in a table.
A column is a vertical entity in a table that contains all information
associated with a specific field in a table.

What are SQL commands?

Structured query language (SQL) commands are specific keywords or SQL


statements that developers use to manipulate the data stored in a relational
database. You can categorize SQL commands as follows.

Data definition language

Data definition language (DDL) refers to SQL commands that design the database
structure. Database engineers use DDL to create and modify database objects
based on the business requirements. For example, the database engineer uses the
CREATE command to create database objects such as tables, views, and indexes.

Data query language

Data query language (DQL) consists of instructions for retrieving data stored in
relational databases. Software applications use the SELECT command to filter and
return specific results from a SQL table.

Data manipulation language

Data manipulation language (DML) statements write new information or modify


existing records in a relational database. For example, an application uses the
INSERT command to store a new record in the database.

Data control language

Database administrators use data control language (DCL) to manage or authorize


database access for other users. For example, they can use the GRANT command
to permit certain applications to manipulate one or more tables.

Transaction control language

The relational engine uses transaction control language (TCL) to automatically make
database changes. For example, the database uses the ROLLBACK command to
undo an erroneous transaction.

What is NoSQL?

NoSQL refers to non-relational databases that don't use tables to store data.
Developers store information in different types of NoSQL databases, including
graphs, documents, and key-values. NoSQL databases are popular for modern
applications because they are horizontally scalable. Horizontal scaling means
increasing the processing power by adding more computers that run NoSQL
software.
SQL vs. NoSQL

Structured query language (SQL) provides a uniform data manipulation language,


but NoSQL implementation is dependent on different technologies. Developers use
SQL for transactional and analytical applications, whereas NoSQL is suitable for
responsive, heavy-usage applications.
What is backup and recovery?
Backup and recovery is the process of duplicating data and storing it in a secure place
in case of loss or damage, and then restoring that data to a location—the original one or
a safe alternative—so it can be used again in operations. Ideally, this backup copy
(often called a snapshot) is immutable—meaning it cannot be altered after it is created
to protect against mutations such as ransomware. Backup and recovery is also a
category of onsite and cloud-based technology solutions that automate and support this
process, enabling organizations to protect and retain their data for business and
compliance reasons.

What are the 3 types of backups?


Backups are often bucketed into three categories:
 Full backups – Like filling up an extra tire at the service station, think of this process
as pumping all of the data stored on a production system into a backup system for
safekeeping. Full backups protect every bit of data from a single server, database,
virtual machine (VM), or data source connected to the network. These backups can
take many hours, even days, depending on the amount of data being saved. The
more modern a data management solution is, the fewer full backups it must perform,
and when it does, the faster it goes.
 Incremental backups – Think of incremental backups as adding just a little more air
each time you revisit the station—just in case—so you’re always ready to replace
your tire. An incremental backup captures only new data since the last full
incremental was performed. However, a full backup is required before a backup
solution can perform its first incremental backup. Then it can automatically do them
based on the last incremental taken.
 Differential backups – Like incremental backups, these add more air but the delta
is from the last full backup, not the last incremental. Think of this backup as what’s
different from the last time you even filled the tire with air. Again, this can only
happen if a full backup has been performed first. Organizations typically establish
policies about how much data and when incremental or differential backups should
occur.
What is the difference between backup and recovery?
The key difference between backup and recovery is that the backup process is how you
save and protect your production data and safely store it away so you have it for a later
time, when you might need to use it.

Recovery is the process whereby you retrieve and restore that backup data to your
production systems to avoid downtime.

Reliable backups and fast recovery together ensure business continuity and business
resilience.
What are the types of data recovery?
The amount of data organizations create, capture, and store has skyrocketed over the
last decade. And analysts anticipate the amount of new data generated will grow at
more than 50% compounded annually.

Because enterprises and people are storing data in more places, new categories of
backup data recovery have emerged. These include:

 Granular recovery of files, folders and objects – Also known as file-level or


object-level recovery, this is the process of quickly getting back one or just a few
specific data sets from among many volumes.
 Instant mass restore – This process allows IT staff to recover not just files but
hundreds of virtual machines (VMs) instantly, at scale, to any point in time, saving
time and resources.
 Volume recovery – A process teams that need to recover an unlimited number of
VMs at the same time used for faster recovery; for example, all VMs belonging to an
application group.
 Virtual Machine Disk (VMDK) recovery – This recovery process ensures all data
and apps on a VM are restored quickly.
 Bare machine recovery – Restoring an entire operating system (software, apps,
and data) in one process.
 Instant volume mounts – Teams can save time using a backup solution as a target
to restore an entire volume to a Windows VM.
 Instant restores of VMs – This process restores a large number of VMs to any
previous recovery point with backup copies fully hydrated and available immediately.

Database Recovery Techniques in DBMS


Database Systems like any other computer system, are subject to failures but the
data stored in them must be available as and when required. When a database
fails it must possess the facilities for fast recovery. It must also have atomicity i.e.
either transactions are completed successfully and committed (the effect is
recorded permanently in the database) or the transaction should have no effect on
the database.
Types of Recovery Techniques in DBMS
Database recovery techniques are used in database management systems
(DBMS) to restore a database to a consistent state after a failure or error has
occurred. The main goal of recovery techniques is to ensure data integrity and
consistency and prevent data loss.
There are mainly two types of recovery techniques used in DBMS
 Rollback/Undo Recovery Technique
 Commit/Redo Recovery Technique
 CheckPoint Recovery Technique

Rollback/Undo Recovery Technique


The rollback/undo recovery technique is based on the principle of backing out or
undoing the effects of a transaction that has not been completed successfully due
to a system failure or error. This technique is accomplished by undoing the
changes made by the transaction using the log records stored in the transaction
log. The transaction log contains a record of all the transactions that have been
performed on the database. The system uses the log records to undo the changes
made by the failed transaction and restore the database to its previous state.
Commit/Redo Recovery Technique
The commit/redo recovery technique is based on the principle of reapplying the
changes made by a transaction that has been completed successfully to the
database. This technique is accomplished by using the log records stored in the
transaction log to redo the changes made by the transaction that was in progress at
the time of the failure or error. The system uses the log records to reapply the
changes made by the transaction and restore the database to its most recent
consistent state.
Checkpoint Recovery Technique
Checkpoint Recovery is a technique used to improve data integrity and system
stability, especially in databases and distributed systems. It entails preserving the
system’s state at regular intervals, known as checkpoints, at which all ongoing
transactions are either completed or not initiated. This saved state, which includes
memory and CPU registers, is kept in stable, non-volatile storage so that it can
withstand system crashes. In the event of a breakdown, the system can be
restored to the most recent checkpoint, which reduces data loss and downtime.
The frequency of checkpoint formation is carefully regulated to decrease system
overhead while ensuring that recent data may be restored quickly.
Overall, recovery techniques are essential to ensure data consistency and
availability in Database Management System, and each technique has its own
advantages and limitations that must be considered in the design of a recovery
system.
Head-to-head comparison between Backup and Recovery

Features Backup Recovery

It is the process of restoring


It is a replication of data that is
inaccessible, damaged, lost,
Definition utilized to recover the original data in
corrupted, or formatted data to its
the event of a data loss.
original state.

It stores the replication of the original It is the technique of restoring lost


Basic
data at a distinct location. data.

Cost It is less expensive. It is more expensive than a backup.


Its main aim is to preserve one Its main purpose is to recover
Main Goal additional copy for reference in the original data in the event of original
event of original data loss. data failure.

Production usage of backup is very Production usage of recovery is very


Production usage
common. rare.

It stores copies of the files in an It is performed internally on your


Location
external location. computer system.

It doesn't require additional storage


Storage Space It requires additional storage space.
space.

It assists in improving data It assists in improving the database's


Assistance
protection. reliability.

It is the automatic creation of restore


Creation It is not generated automatically.
points by your computer system.

SnapManager is an example of a SnapManager is an example of


Example backup because it creates a backup Recover, which retrieves data from
of the entire database. the latest transaction.

Here, you will learn head-to-head comparisons between Backup and Recovery. The
main differences between Backup and Recovery are as follows:

Database index

An index in a database is a data structure that improves the speed of data retrieval operations
on a database table. It acts like a lookup table that allows the database management system
(DBMS) to find data more efficiently without scanning the entire table.

Key Points about Indexes:

1. Purpose:
o Indexes are used to quickly locate and access data in a database table,
significantly speeding up query performance, especially for large datasets.
2. Types of Indexes:
o Single-column Index: Created on a single column of a table.
o Multi-column (Composite) Index: Created on multiple columns to enhance
queries that filter on more than one column.
o Unique Index: Ensures that all values in the indexed column(s) are unique,
often used for primary keys.
o Full-text Index: Allows for efficient searching of text in large textual data.

3. Structure:
o Indexes are typically implemented using data structures like B-trees or hash
tables, allowing for efficient searching and retrieval.

4. Trade-offs:
o While indexes speed up read operations (SELECT queries), they can slow
down write operations (INSERT, UPDATE, DELETE) because the index needs to
be updated as well.
o Indexes consume additional disk space.

5. Creating an Index:
o In SQL, you can create an index using the CREATE INDEX statement. For
example:

sql
Copy code
CREATE INDEX idx_last_name ON employees(last_name);

6. Using Indexes:
o When a query is executed, the database engine decides whether to use an
index based on its query optimizer and can significantly reduce the number of
rows scanned.

Conclusion:

Indexes are crucial for optimizing database performance, particularly in read-heavy


environments. Understanding how to effectively create and manage indexes can lead to
significant improvements in application responsiveness and efficiency.

You might also like