De Unit 1
De Unit 1
De Unit 1
Traditionally, data was organized in file formats. DBMS was a new concept then, and all the
research was done to make it overcome the deficiencies in traditional style of data
management. A modern DBMS has the following characteristics −
• Real-world entity − A modern DBMS is more realistic and uses real-world entities to
design its architecture. It uses the behavior and attributes too. For example, a
school database may use students as an entity and their age as an attribute.
• Relation-based tables − DBMS allows entities and relations among them to form
tables. A user can understand the architecture of a database just by looking at the
table names.
• Isolation of data and application − A database system is entirely different than its
data. A database is an active entity, whereas data is said to be passive, on which the
database works and organizes. DBMS also stores metadata, which is data about
data, to ease its own process.
• Less redundancy − DBMS follows the rules of normalization, which splits a relation
when any of its attributes is having redundancy in values. Normalization is a
mathematically rich and scientific process that reduces data redundancy.
• Consistency − Consistency is a state where every relation in a database remains
consistent. There exist methods and techniques, which can detect attempt of
leaving database in inconsistent state. A DBMS can provide greater consistency as
compared to earlier forms of data storing applications like file-processing systems.
• Query Language − DBMS is equipped with query language, which makes it more
efficient to retrieve and manipulate data. A user can apply as many and as different
filtering options as required to retrieve a set of data. Traditionally it was not
possible where file-processing system was used.
Applications of DBMS
Database is a collection of related data and data is a collection of facts and figures that can
be processed to produce information.
Mostly data represents recordable facts. Data aids in producing information, which is based
on facts. For example, if we have data about marks obtained by all students, we can then
conclude about toppers and average marks.
A database management system stores data in such a way that it becomes easier to
retrieve, manipulate, and produce information. Following are the important characteristics
and applications of DBMS.
• ACID Properties − DBMS follows the concepts of Atomicity, Consistency, Isolation,
and Durability (normally shortened as ACID). These concepts are applied on
transactions, which manipulate data in a database. ACID properties help the
database stay healthy in multi-transactional environments and in case of failure.
• Multiuser and Concurrent Access − DBMS supports multi-user environment and
allows them to access and manipulate data in parallel. Though there are restrictions
on transactions when users attempt to handle the same data item, but users are
always unaware of them.
• Multiple views − DBMS offers multiple views for different users. A user who is in the
Sales department will have a different view of database than a person working in
the Production department. This feature enables the users to have a concentrate
view of the database according to their requirements.
• Security − Features like multiple views offer security to some extent where users are
unable to access data of other users and departments. DBMS offers many different
levels of security features, which enables multiple users to have different views
with different features.
For example, a user in the Sales department cannot see the data that belongs to the
Purchase department. Additionally, it can also be managed how much data of the
Sales department should be displayed to the user. Since a DBMS is not saved on the
disk as traditional file systems, it is very hard for miscreants to break the code.
Data Models
Data Model is the modeling of the data description, data semantics, and consistency
constraints of the data. It provides the conceptual tools for describing the design of a
database at each level of data abstraction. Therefore, there are following four data models
used for understanding the structure of the database:
1) Relational Data Model: This type of model designs the data in the form of rows and
columns within a table. Thus, a relational model uses tables for representing data and in-
between relationships. Tables are also called relations. This model was initially described by
Edgar F. Codd, in 1969. The relational data model is the widely used model which is
primarily used by commercial data processing applications.
4) Semistructured Data Model: This type of data model is different from the other three
data models (explained above). The semistructured data model allows the data
specifications at places where the individual data items of the same type may have different
attributes sets. The Extensible Markup Language, also known as XML, is widely used for
representing the semistructured data. Although XML was initially designed for including the
markup information to the text document, it gains importance because of its application in
the exchange of data.
Database Language
o A DBMS has appropriate languages and interfaces to express database queries and
updates.
o Database languages can be used to read, store and update the data in the database.
These commands are used to update the database schema that's why they come under Data
definition language.
DML stands for Data Manipulation Language. It is used for accessing and manipulating data
in a database. It handles user requests.
(But in Oracle database, the execution of data control language does not have the
feature of rolling back.)
There are the following operations which have the authorization of Revoke:
TCL is used to run the changes made by the DML statement. TCL can be grouped into a
logical transaction.
1. Application Programmers – They are the developers who interact with the database
by means of DML queries. These DML queries are written in the application
programs like C, C++, JAVA, .net,python,Pascal etc. These queries are converted into
object code to communicate with the database.
For example, writing a C program to generate the report of employees who are
working in particular department will involve a query to fetch the data from
database. It will include a embedded SQL query in the C Program.
2. Sophisticated Users – They are database developers, who write SQL queries to
select/insert/delete/update data. They do not use any application or programs to
request the database. They directly interact with the database by means of query
language like SQL. These users will be scientists, engineers, analysts who thoroughly
study SQL and DBMS to apply the concepts in their requirement. In short, we can say
this category includes designers and developers of DBMS and SQL.
3. Specialized Users – These are also sophisticated users, but they write special
database application programs. They are the developers who develop the complex
programs to the requirement.
4. Stand-alone Users – These users will have stand –alone database for their personal
use. These kinds of database will have readymade database packages which will
have menus and graphical interfaces.
5. Native Users – these are the users who use the existing application to interact with
the database.
For example, online library system, ticket booking systems, ATMs etc which has
existing application and users use them to interact with the database to fulfill their
requests.
Database Administrators
The life cycle of database starts from designing, implementing to administration of it. A
database for any kind of requirement needs to be designed perfectly so that it should work
without any issues. Once all the design is complete, it needs to be installed. Once this step is
complete, users start using the database. The database grows as the data grows in the
database. When the database becomes huge, its performance comes down. Also accessing
the data from the database becomes challenge. There will be unused memory in database,
making the memory inevitably huge. These administration and maintenance of database is
taken care by database Administrator – DBA.
A DBA has many responsibilities. A good performing database is in the hands of DBA.
• Installing and upgrading the DBMS Servers: – DBA is responsible for installing a new
DBMS server for the new projects. He is also responsible for upgrading these servers
as there are new versions comes in the market or requirement. If there is any failure
in upgradation of the existing servers, he should be able revert the new changes
back to the older version, thus maintaining the DBMS working. He is also responsible
for updating the service packs/ hot fixes/ patches to the DBMS servers.
• Design and implementation: – Designing the database and implementing is also
DBA’s responsibility. He should be able to decide proper memory management, file
organizations, error handling, log maintenance etc for the database.
• Performance tuning: – Since database is huge and it will have lots of tables, data,
constraints and indices, there will be variations in the performance from time to
time. Also, because of some designing issues or data growth, the database will not
work as expected. It is responsibility of the DBA to tune the database performance.
He is responsible to make sure all the queries and programs works in fraction of
seconds.
• Migrate database servers: – Sometimes, users using oracle would like to shift to SQL
server or Netezza. It is the responsibility of DBA to make sure that migration happens
without any failure, and there is no data loss.
• Backup and Recovery: – Proper backup and recovery programs needs to be
developed by DBA and has to be maintained him. This is one of the main
responsibilities of DBA. Data/objects should be backed up regularly so that if there is
any crash, it should be recovered without much effort and data loss.
• Security: – DBA is responsible for creating various database users and roles, and
giving them different levels of access rights.
• Documentation: – DBA should be properly documenting all his activities so that if he
quits or any new DBA comes in, he should be able to understand the database
without any effort. He should basically maintain all his installation, backup, recovery,
security methods. He should keep various reports about database performance.
In order to perform his entire task, he should have very good command over DBMS.
Types of DBA
There are different kinds of DBA depending on the responsibility that he owns.
• Administrative DBA – This DBA is mainly concerned with installing, and maintaining
DBMS servers. His prime tasks are installing, backups, recovery, security,
replications, memory management, configurations and tuning. He is mainly
responsible for all administrative tasks of a database.
• Development DBA – He is responsible for creating queries and procedure for the
requirement. Basically his task is similar to any database developer.
• Database Architect – Database architect is responsible for creating and maintaining
the users, roles, access rights, tables, views, constraints and indexes. He is mainly
responsible for designing the structure of the database depending on the
requirement. These structures will be used by developers and development DBA to
code.
• Data Warehouse DBA –DBA should be able to maintain the data and procedures
from various sources in the data warehouse. These sources can be files, COBOL, or
any other programs. Here data and programs will be from different sources. A good
DBA should be able to keep the performance and function levels from these sources
at same pace to make the data warehouse to work.
• Application DBA –He acts like a bridge between the application program and the
database. He makes sure all the application program is optimized to interact with the
database. He ensures all the activities from installing, upgrading, and patching,
maintaining, backup, recovery to executing the records works without any issues.
• OLAP(online analytical processing) DBA – He is responsible for installing and
maintaining the database in OLAP systems. He maintains only OLAP databases.
1. Internal Level
o The internal level has an internal schema which describes the physical storage
structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It is used to define that how the data will be stored
in a block.
o The physical level is used to describe complex low-level data structures in detail.
2. Conceptual Level
o The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data are to be stored in the database and also
describes what relationship exists among those data.
o In the conceptual level, internal details such as an implementation of the data
structure are hidden.
o Programmers and database administrators work at this level.
3. External Level
o At the external level, a database contains several schemas that sometimes called as
subschema. The subschema is used to describe the different view of the database.
o An external schema is also known as view schema.
o Each view schema describes the database part that a particular user group is
interested and hides the remaining database from that user group.
o The view schema describes the end user interaction with database systems.
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers characteristic of being able to modify the schema at one
level of the database system without altering the schema at the next higher level.
EF Code Rule
Dr Edgar F Codd
Dr E.F.Codd, also known to the world as the ‘Father of Database Management Systems’ had
propounded 12 rules which are in-fact 13 in number. The rules are numbered from zero to
twelve.
Here is brief note on E.F Codd’s Twelve rules:
Rule 0 − Foundation rule
Any relational database management system that is propounded to be RDBMS or advocated
to be a RDBMS should be able to manage the stored data in its entirety through its
relational capabilities.
Rule 1 − Rule of Information
Relational Databases should store the data in the form of relations. Tables are relations in
Relational Database Management Systems. Be it any user defined data or meta-data, it is
important to store the value as an entity in the table cells.
Rule 2 − Rule of Guaranteed Access
The use of pointers to access data logically is strictly forbidden. Every data entity which is
atomic in nature should be accessed logically by using a right combination of the name of
table, primary key represented by a specific row value and column name represented by
attribute value.
Rule 3 − Rule of Systematic Null Value Support
Null values are completely supported in relational databases. They should be uniformly
considered as ‘missing information’. Null values are independent of any data type. They
should not be mistaken for blanks or zeroes or empty strings. Null values can also be
interpreted as ‘inapplicable data’ or ‘unknown information.’
Rule 4 − Rule of Active and online relational Catalog
In the Database Management Systems lexicon, ‘metadata’ is the data about the database or
the data about the data. The active online catalog that stores the metadata is called ‘Data
dictionary’. The so called data dictionary is accessible only by authored users who have the
required privileges and the query languages used for accessing the database should be used
for accessing the data of data dictionary.
Rule 5 − Rule of Comprehensive Data Sub-language
A single robust language should be able to define integrity constraints, views, data
manipulations, transactions and authorizations. If the database allows access to the
aforementioned ones, it is violating this rule.
Rule 6 − Rule of Updating Views
Views should reflect the updates of their respective base tables and vice versa. A view is a
logical table which shows restricted data. Views generally make the data readable but not
modifiable. Views help in data abstraction.
Rule 7 − Rule of Set level insertion, update and deletion
A single operation should be sufficient to retrieve, insert, update and delete the data.
Rule 8 − Rule of Physical Data Independence
Batch and end user operations are logically separated from physical storage and respective
access methods.
Rule 9 − Rule of Logical Data Independence
Batch and end users can change the database schema without having to recreate it or
recreate the applications built upon it.
Rule 10 − Rule of Integrity Independence
Integrity constraints should be available and stored as metadata in data dictionary and not
in the application programs.
Rule 11 − Rule of Distribution Independence
The Data Manipulation Language of the relational system should not be concerned about
the physical data storage and no alterations should be required if the physical data is
centralized or distributed.
Rule 12 − Rule of Non Subversion
Any row should obey the security and integrity constraints imposed. No special privileges
are applicable.