De Unit 1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 13

UNIT-1

Introduction to DBMS & concept


Database Management System or DBMS in short refers to the technology of storing and
retrieving data with most efficiency along with appropriate security measures. This tutorial
explains the basics of DBMS such as its architecture, data models, data schemas, data
independence, E-R model, relation model, relational database design, and storage and file
structure and much more.

Why to Learn DBMS?

Traditionally, data was organized in file formats. DBMS was a new concept then, and all the
research was done to make it overcome the deficiencies in traditional style of data
management. A modern DBMS has the following characteristics −
• Real-world entity − A modern DBMS is more realistic and uses real-world entities to
design its architecture. It uses the behavior and attributes too. For example, a
school database may use students as an entity and their age as an attribute.
• Relation-based tables − DBMS allows entities and relations among them to form
tables. A user can understand the architecture of a database just by looking at the
table names.
• Isolation of data and application − A database system is entirely different than its
data. A database is an active entity, whereas data is said to be passive, on which the
database works and organizes. DBMS also stores metadata, which is data about
data, to ease its own process.
• Less redundancy − DBMS follows the rules of normalization, which splits a relation
when any of its attributes is having redundancy in values. Normalization is a
mathematically rich and scientific process that reduces data redundancy.
• Consistency − Consistency is a state where every relation in a database remains
consistent. There exist methods and techniques, which can detect attempt of
leaving database in inconsistent state. A DBMS can provide greater consistency as
compared to earlier forms of data storing applications like file-processing systems.
• Query Language − DBMS is equipped with query language, which makes it more
efficient to retrieve and manipulate data. A user can apply as many and as different
filtering options as required to retrieve a set of data. Traditionally it was not
possible where file-processing system was used.
Applications of DBMS

Database is a collection of related data and data is a collection of facts and figures that can
be processed to produce information.
Mostly data represents recordable facts. Data aids in producing information, which is based
on facts. For example, if we have data about marks obtained by all students, we can then
conclude about toppers and average marks.
A database management system stores data in such a way that it becomes easier to
retrieve, manipulate, and produce information. Following are the important characteristics
and applications of DBMS.
• ACID Properties − DBMS follows the concepts of Atomicity, Consistency, Isolation,
and Durability (normally shortened as ACID). These concepts are applied on
transactions, which manipulate data in a database. ACID properties help the
database stay healthy in multi-transactional environments and in case of failure.
• Multiuser and Concurrent Access − DBMS supports multi-user environment and
allows them to access and manipulate data in parallel. Though there are restrictions
on transactions when users attempt to handle the same data item, but users are
always unaware of them.
• Multiple views − DBMS offers multiple views for different users. A user who is in the
Sales department will have a different view of database than a person working in
the Production department. This feature enables the users to have a concentrate
view of the database according to their requirements.
• Security − Features like multiple views offer security to some extent where users are
unable to access data of other users and departments. DBMS offers many different
levels of security features, which enables multiple users to have different views
with different features.
For example, a user in the Sales department cannot see the data that belongs to the
Purchase department. Additionally, it can also be managed how much data of the
Sales department should be displayed to the user. Since a DBMS is not saved on the
disk as traditional file systems, it is very hard for miscreants to break the code.

Purpose of Database Management Systems


Organizations use large amounts of data. A database management system (DBMS) is a
software tool that makes it possible to organize data in a database.
The standard acronym for database management system is DBMS, so you will often see this
instead of the full name. The ultimate purpose of a database management system is to store
and transform data into information to support making decisions.
A DBMS consists of the following three elements:
->The physical database: the collection of files that contain the data
->The database engine: the software that makes it possible to access and modify the
contents of the database
->The database scheme: the specification of the logical structure of the data stored in the
database
While it sounds logical to have a DBMS in place, it is worth thinking for a moment about the
alternative. What would the data in an organization look like without a DBMS? Consider
yourself as the organization for a moment, and the data are all the files on your computer.
How is your data organized? If you are like most typical computer users, you have a large
number of files, organized in folders.
You may have word processor documents, presentation files, spreadsheets, photographs,
etc. You find the information you need based on the folder structure you have created and
the names you have given to your files. This is called a file system and is typical for individual
computer users.

Data Models
Data Model is the modeling of the data description, data semantics, and consistency
constraints of the data. It provides the conceptual tools for describing the design of a
database at each level of data abstraction. Therefore, there are following four data models
used for understanding the structure of the database:

1) Relational Data Model: This type of model designs the data in the form of rows and
columns within a table. Thus, a relational model uses tables for representing data and in-
between relationships. Tables are also called relations. This model was initially described by
Edgar F. Codd, in 1969. The relational data model is the widely used model which is
primarily used by commercial data processing applications.

2) Entity-Relationship Data Model: An ER model is the logical representation of data as


objects and relationships among them. These objects are known as entities, and relationship
is an association among these entities. This model was designed by Peter Chen and
published in 1976 papers. It was widely used in database designing. A set of attributes
describe the entities. For example, student_name, student_id describes the 'student' entity.
A set of the same type of entities is known as an 'Entity set', and the set of the same type of
relationships is known as 'relationship set'.

3) Object-based Data Model: An extension of the ER model with notions of functions,


encapsulation, and object identity, as well. This model supports a rich type system that
includes structured and collection types. Thus, in 1980s, various database systems following
the object-oriented approach were developed. Here, the objects are nothing but the data
carrying its properties.

4) Semistructured Data Model: This type of data model is different from the other three
data models (explained above). The semistructured data model allows the data
specifications at places where the individual data items of the same type may have different
attributes sets. The Extensible Markup Language, also known as XML, is widely used for
representing the semistructured data. Although XML was initially designed for including the
markup information to the text document, it gains importance because of its application in
the exchange of data.

Database Language
o A DBMS has appropriate languages and interfaces to express database queries and
updates.
o Database languages can be used to read, store and update the data in the database.

Types of Database Language

1. Data Definition Language


o DDL stands for Data Definition Language. It is used to define database structure or
pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the
number of tables and schemas, their names, indexes, columns in each table,
constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.


o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.

These commands are used to update the database schema that's why they come under Data
definition language.

2. Data Manipulation Language

DML stands for Data Manipulation Language. It is used for accessing and manipulating data
in a database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.


o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete all records from a table.

3. Data Control Language


o DCL stands for Data Control Language. It is used to retrieve the stored or saved data.
o The DCL execution is transactional. It also has rollback parameters.

(But in Oracle database, the execution of data control language does not have the
feature of rolling back.)

Here are some tasks that come under DCL:

o Grant: It is used to give user access privileges to a database.


o Revoke: It is used to take back permissions from the user.

There are the following operations which have the authorization of Revoke:

CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.

4. Transaction Control Language

TCL is used to run the changes made by the DML statement. TCL can be grouped into a
logical transaction.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction on the database.


o Rollback: It is used to restore the database to original since the last Commit.
o Savepoint-it is used to saved the temporary copied data in a datatable of rollback
segment

Database Users and Administrators


Database Users
Database users are the one who really use and take the benefits of database. There will be
different types of users depending on their need and way of accessing the database.

1. Application Programmers – They are the developers who interact with the database
by means of DML queries. These DML queries are written in the application
programs like C, C++, JAVA, .net,python,Pascal etc. These queries are converted into
object code to communicate with the database.
For example, writing a C program to generate the report of employees who are
working in particular department will involve a query to fetch the data from
database. It will include a embedded SQL query in the C Program.
2. Sophisticated Users – They are database developers, who write SQL queries to
select/insert/delete/update data. They do not use any application or programs to
request the database. They directly interact with the database by means of query
language like SQL. These users will be scientists, engineers, analysts who thoroughly
study SQL and DBMS to apply the concepts in their requirement. In short, we can say
this category includes designers and developers of DBMS and SQL.
3. Specialized Users – These are also sophisticated users, but they write special
database application programs. They are the developers who develop the complex
programs to the requirement.
4. Stand-alone Users – These users will have stand –alone database for their personal
use. These kinds of database will have readymade database packages which will
have menus and graphical interfaces.
5. Native Users – these are the users who use the existing application to interact with
the database.
For example, online library system, ticket booking systems, ATMs etc which has
existing application and users use them to interact with the database to fulfill their
requests.
Database Administrators

The life cycle of database starts from designing, implementing to administration of it. A
database for any kind of requirement needs to be designed perfectly so that it should work
without any issues. Once all the design is complete, it needs to be installed. Once this step is
complete, users start using the database. The database grows as the data grows in the
database. When the database becomes huge, its performance comes down. Also accessing
the data from the database becomes challenge. There will be unused memory in database,
making the memory inevitably huge. These administration and maintenance of database is
taken care by database Administrator – DBA.
A DBA has many responsibilities. A good performing database is in the hands of DBA.
• Installing and upgrading the DBMS Servers: – DBA is responsible for installing a new
DBMS server for the new projects. He is also responsible for upgrading these servers
as there are new versions comes in the market or requirement. If there is any failure
in upgradation of the existing servers, he should be able revert the new changes
back to the older version, thus maintaining the DBMS working. He is also responsible
for updating the service packs/ hot fixes/ patches to the DBMS servers.
• Design and implementation: – Designing the database and implementing is also
DBA’s responsibility. He should be able to decide proper memory management, file
organizations, error handling, log maintenance etc for the database.
• Performance tuning: – Since database is huge and it will have lots of tables, data,
constraints and indices, there will be variations in the performance from time to
time. Also, because of some designing issues or data growth, the database will not
work as expected. It is responsibility of the DBA to tune the database performance.
He is responsible to make sure all the queries and programs works in fraction of
seconds.
• Migrate database servers: – Sometimes, users using oracle would like to shift to SQL
server or Netezza. It is the responsibility of DBA to make sure that migration happens
without any failure, and there is no data loss.
• Backup and Recovery: – Proper backup and recovery programs needs to be
developed by DBA and has to be maintained him. This is one of the main
responsibilities of DBA. Data/objects should be backed up regularly so that if there is
any crash, it should be recovered without much effort and data loss.
• Security: – DBA is responsible for creating various database users and roles, and
giving them different levels of access rights.
• Documentation: – DBA should be properly documenting all his activities so that if he
quits or any new DBA comes in, he should be able to understand the database
without any effort. He should basically maintain all his installation, backup, recovery,
security methods. He should keep various reports about database performance.
In order to perform his entire task, he should have very good command over DBMS.

Types of DBA

There are different kinds of DBA depending on the responsibility that he owns.
• Administrative DBA – This DBA is mainly concerned with installing, and maintaining
DBMS servers. His prime tasks are installing, backups, recovery, security,
replications, memory management, configurations and tuning. He is mainly
responsible for all administrative tasks of a database.
• Development DBA – He is responsible for creating queries and procedure for the
requirement. Basically his task is similar to any database developer.
• Database Architect – Database architect is responsible for creating and maintaining
the users, roles, access rights, tables, views, constraints and indexes. He is mainly
responsible for designing the structure of the database depending on the
requirement. These structures will be used by developers and development DBA to
code.
• Data Warehouse DBA –DBA should be able to maintain the data and procedures
from various sources in the data warehouse. These sources can be files, COBOL, or
any other programs. Here data and programs will be from different sources. A good
DBA should be able to keep the performance and function levels from these sources
at same pace to make the data warehouse to work.
• Application DBA –He acts like a bridge between the application program and the
database. He makes sure all the application program is optimized to interact with the
database. He ensures all the activities from installing, upgrading, and patching,
maintaining, backup, recovery to executing the records works without any issues.
• OLAP(online analytical processing) DBA – He is responsible for installing and
maintaining the database in OLAP systems. He maintains only OLAP databases.

Three Schema Architecture of DBMS


Three schema Architecture
o The three schema architecture is also called ANSI/SPARC architecture or three-level
architecture.
o This framework is used to describe the structure of a specific database system.
o The three schema architecture is also used to separate the user applications and
physical database.
o The three schema architecture contains three-levels. It breaks the database down
into three different categories.

The three-schema architecture is as follows:


In the above diagram:

o It shows the DBMS architecture.


o Mapping is used to transform the request and response between various database
levels of architecture.
o Mapping is not good for small DBMS because it takes more time.
o In External / Conceptual mapping, it is necessary to transform the request from
external level to conceptual schema.
o In Conceptual / Internal mapping, DBMS transform the request from the conceptual
to internal level.

1. Internal Level
o The internal level has an internal schema which describes the physical storage
structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It is used to define that how the data will be stored
in a block.
o The physical level is used to describe complex low-level data structures in detail.

2. Conceptual Level
o The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data are to be stored in the database and also
describes what relationship exists among those data.
o In the conceptual level, internal details such as an implementation of the data
structure are hidden.
o Programmers and database administrators work at this level.

3. External Level
o At the external level, a database contains several schemas that sometimes called as
subschema. The subschema is used to describe the different view of the database.
o An external schema is also known as view schema.
o Each view schema describes the database part that a particular user group is
interested and hides the remaining database from that user group.
o The view schema describes the end user interaction with database systems.

Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers characteristic of being able to modify the schema at one
level of the database system without altering the schema at the next higher level.

There are two types of data independence:

1. Logical Data Independence

o Logical data independence refers characteristic of being able to change the


conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual
view.
o If we do any changes in the conceptual view of the data, then the user view of the
data would not be affected.
o Logical data independence occurs at the user interface level.

2. Physical Data Independence


o Physical data independence can be defined as the capacity to change the internal
schema without having to change the conceptual schema.
o If we do any changes in the storage size of the database system server, then the
Conceptual structure of the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal
levels.
o Physical data independence occurs at the logical interface level.

Fig: Data Independence

EF Code Rule
Dr Edgar F Codd
Dr E.F.Codd, also known to the world as the ‘Father of Database Management Systems’ had
propounded 12 rules which are in-fact 13 in number. The rules are numbered from zero to
twelve.
Here is brief note on E.F Codd’s Twelve rules:
Rule 0 − Foundation rule
Any relational database management system that is propounded to be RDBMS or advocated
to be a RDBMS should be able to manage the stored data in its entirety through its
relational capabilities.
Rule 1 − Rule of Information
Relational Databases should store the data in the form of relations. Tables are relations in
Relational Database Management Systems. Be it any user defined data or meta-data, it is
important to store the value as an entity in the table cells.
Rule 2 − Rule of Guaranteed Access
The use of pointers to access data logically is strictly forbidden. Every data entity which is
atomic in nature should be accessed logically by using a right combination of the name of
table, primary key represented by a specific row value and column name represented by
attribute value.
Rule 3 − Rule of Systematic Null Value Support
Null values are completely supported in relational databases. They should be uniformly
considered as ‘missing information’. Null values are independent of any data type. They
should not be mistaken for blanks or zeroes or empty strings. Null values can also be
interpreted as ‘inapplicable data’ or ‘unknown information.’
Rule 4 − Rule of Active and online relational Catalog
In the Database Management Systems lexicon, ‘metadata’ is the data about the database or
the data about the data. The active online catalog that stores the metadata is called ‘Data
dictionary’. The so called data dictionary is accessible only by authored users who have the
required privileges and the query languages used for accessing the database should be used
for accessing the data of data dictionary.
Rule 5 − Rule of Comprehensive Data Sub-language
A single robust language should be able to define integrity constraints, views, data
manipulations, transactions and authorizations. If the database allows access to the
aforementioned ones, it is violating this rule.
Rule 6 − Rule of Updating Views
Views should reflect the updates of their respective base tables and vice versa. A view is a
logical table which shows restricted data. Views generally make the data readable but not
modifiable. Views help in data abstraction.
Rule 7 − Rule of Set level insertion, update and deletion
A single operation should be sufficient to retrieve, insert, update and delete the data.
Rule 8 − Rule of Physical Data Independence
Batch and end user operations are logically separated from physical storage and respective
access methods.
Rule 9 − Rule of Logical Data Independence
Batch and end users can change the database schema without having to recreate it or
recreate the applications built upon it.
Rule 10 − Rule of Integrity Independence
Integrity constraints should be available and stored as metadata in data dictionary and not
in the application programs.
Rule 11 − Rule of Distribution Independence
The Data Manipulation Language of the relational system should not be concerned about
the physical data storage and no alterations should be required if the physical data is
centralized or distributed.
Rule 12 − Rule of Non Subversion
Any row should obey the security and integrity constraints imposed. No special privileges
are applicable.

You might also like