Dbms Notes

Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

DATABASE MANAGEMENT SYSTEM

Data is the collection of raw facts, figures, statistics which can be processed to produce
meaningful information.
For Ex: Name of student, marks obtained in the exam, rate, quantity, sales figures
Information is the processed data with some definite meaning.
For Ex: Student marks are processed to get the final result.
Payroll generation
Example for data and information
Student data maintenance- regno, name, class, comb fees
Payroll generation-empno,empname,empaddress,basicpay

Definition of Data processing


A series of actions or operations that converts data into useful information

Types of Data processing


1. Manual data processing
This method of data processing involves human intervention. The manual process of data
entry implies many opportunities for errors, delay in data capture ,high labor expenses etc.
2. Electronic data processing/computerized file systems
In this method the data is processed using electronic means such as computers, calculators,
servers and other similar electronic equipment’s. Faster and accurate processing are the important
feature of the electronic data processing.

Difference between Manual and Electronic Data Processing

MANUAL DATA PROCESSING ELECTRONIC DATA PROCESSING


The volume of data that can be processed is The volume of data that can be processed is
limited unlimited or very large
Requires large quantity of paper Requires reasonable less quantity of paper
The processing speed and accuracy is limited The processing speed and accuracy is
extremely high
The cost of processing is more The cost of processing is less

Paper is used for storing processed data Electronic devices are used for storing
processed data
Initial setup cost is low Initial setup cost is high

Data processing cycle


Data
communication

Data output Data input

Data processing

Data storage
Data input:
The data has to be collected i.e. gathering data, fact, figures and strategies.
The collected data should be given as input to the processing system using various input devices
such as keyboard, mouse, scanners, digital camera.
This stage converts the raw data into computer understandable form.

Data processing
This is a series of operations that converts the data into information
It includes various operations such as calculations, classification, sorting of data, data verification,
data manipulation, indexing, accessing, extracting.

Data storage
The processed data (information) are stored in suitable devices such as hard disk, CD and other
secondary storage devices permanently.

Output
The processed data is presented to the user in the required format such as hardcopy or softcopy
represented in the form of printouts, animated audio or video, picture.

Communication
The processed data (information) can be communicated to its users through various modern
communication modes using computers and other devices.

DATABASE
DEFINITION
A database is a collection of logically related data organized in a way that data can be easily
accessed, managed and updated

Applications of Database

Banking: For customer information, accounts, and loans, and banking transactions.
Water meter billing : The RR number and all the details are stored in the database and connected to
the server based works.

Sales: For customer, product, and purchase information.

Rail and Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner terminals situated around the world accessed the
central database system through phone lines and other data networks.

Colleges : For student information, course registrations, and grades.

Credit card transactions: For purchases on credit cards and generation of monthly statements.

Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.

Finance: For storing information about holdings, sales, and purchases of financial instruments such
as stocks and bonds.
Manufacturing: For management of supply chain and for tracking production of items in factories,
inventories of items in warehouses/ stores, and orders for items.

Human resources: For information about employees recruitment, salaries, payroll taxes and
benefits, and for generation of paychecks

TERMS USED IN DATABASE


1. File : File is the basic unit of storage in a computer system.
File is a collection of logically related information.
2. Database : Collection of large volumes of logically related data
3. Tables : Collection of data organized in the form of rows and columns.
4. Fields : Each column name in a table is called a field.
5. Attribute : Each column heading is called as an attribute.
6. Records : A single entry in a table called a record or row.
7. Tuple: Each row in a table is called as tuple.
8. Domain : Set of all permitted values for an attribute.
9. Entity : It is an object such as a table or a form.
10. Instance: instance is a set of data values that are given to an entity.

FIELD/ATTRIBUTES Student database ENTITY/TABLE/RELATION

REGNO NAME COMB CLASS FEES


195912 VARUN CEBA II PUC 35000
195155 DINESH PCMC II PUC 32000 RECORD/
TUPLE
195367 KARAN PCMC II PUC 32000
195234 MANJUNATH PCMB II PUC 28000

DOMAIN

DATABASE MANAGEMENT SYSTEM

DEFINITION
A DBMS is a software that allows users to create, define, manipulate and maintain a database.

In the database the data is stored at central location and is shared among multiple users.

Some examples of popular DBMS are: MySql, Oracle, Sybase, Microsoft Access and IBM DB2 etc.

ADVANTAGES OF DBMS FEATURES OF DBMS

Centralized data management


The data is stored at a central location and shared among multiple users. This helps in easy
management of the entire database

Controlled data redundancy


Since whole data resides in one central database, avoids unnecessary duplication of data and
reduces the total amount of space required to store the data.
Data integrity
It refers to the correctness(validity) of the data in the database. Data integrity ensures consistent
data throughout the database by following the standard formats that were specified during
database creation.

Data sharing
Data stored in the database can be shared among multiple users or application programs. New
applications can be developed using same stored data.

Ease of application development


The programmers can develop any number of applications easily as the database takes care of the
issues like security, redundancy, multiple access and data integrity.

Data security
As the data is stored at a centralized location, enforcing security constraints such as identifying key
attributes, providing passwords and user rights are easy to implement.

Multiple User interface


In order to meet the needs of various users having different types of interfaces such as query
languages, application program interfaces and graphical user interfaces(GUI) that include form style
and menu-driven interfaces.

Backup and recovery


The database management system provides backup and recovery from hardware and software
failures.

DISADVANTAGES OF DATA BASE MANAGEMENT SYSTEM

➢ Initial investments are high for procuring software, hardware and training.
➢ Additional processing is involved in implementing data integrity, data security and data
sharing.
➢ Backup and recovery operations are complex in DBMS environment.

DBMS USERS

DBA

RDBMS PROGRAMMER
DATABASE

END USERS
System analyst
System analysts determine the requirement of end users develop specifications for transactions
that meet these requirements.

Application programmers
Application programmers implement these parameters in programs.

Database Administrator (DBA)


DBA is responsible for authorization access to the database for coordinating and monitoring its use,
and for acquiring the needed software and hardware resources.

End users
People who require access to the database for querying, updating and generating reports.

Database designers
Database designers are responsible to design appropriate structures to represent and store the data.

DATA ABSTRACTION/3-LAYERED ARCHITECTURE/3-LEVEL DBMS ARCHITECTURE

A major purpose of a database system is to provide users with an abstract view of the data i.e. the
system hides certain details of how the data are stored and maintained.
The database description can be define at three levels namely
1. Internal or physical level
2. Conceptual or logical level
3. External or view level

This architecture was proposed by ANSI/SPARC so it is also known as ANSI/SPARC architecture.


SPARC- standard planning and requirements committee
ANSI- american national standard institute

Schema: Description of data at some level or the overall design of the database
Each level has its own schema.
Schema objects are database objects that contain data or govern or perform operations on data.
INTERNAL LEVEL
This is the lowest level of data abstraction that
describes how the data are physically stored and organized on the storage medium.
The physical level describes complex low-level data structures in detail.
At this level various aspects are considered such as storage space allocation techniques for data and
indexes, access paths such as indexes, data compression and encryption techniques and record
placement.

CONCEPTUAL LEVEL
This level of abstraction deals with the logical structure of the entire database and thus, it is also
known as LOGICAL LEVEL
Conceptual level describes what data are stored in the database and what relationships exist among
those data.
That is, it hides the complexity of physical storage structures.
The conceptual level is the overall view of the database and it includes all the information that is
going to be represented in the database.

EXTERNAL LEVEL
This is the highest level of abstraction and it is close to the user’s and it is concerned with the way
in which data are viewed by individual users. It is also known as VIEW LEVEL
Most of the users of the database are not concerned with all the information contained in the
database. Instead they need only a part of the database relevant to them. Eg: bank account holder

DATA INDEPENDENCE
The major objective of 3 level architecture is to provide data independence, which means
upperlevels are unaffected by change in lowerlevels.

Definition
The ability to modify schema definition in one level without affecting a schema definition in the
next higher level is called data independence.

Data independence occurs because when one schema is changed at one level, the schema at next
level remains unchanged and only the mapping between the levels is changed.

What is mapping?
The process of transforming the requests and results between various levels of DBMS architecture
is known as mapping.
CLASSIFICATION OF DATA INDEPENDENCE

Data
Independence

physical data Logical data


independence independence

File Entity
Data Models Normalization
organization relationship

There are two levels of data independence. They are


1. Logical data independence
2. Physical data independence.

LOGICAL DATA INDEPENDENCE


• The ability to modify logical level definition without affecting the external level or
application programs. That is it gives the overview of the records in a file, arrangement of
records in a file and also key used for accessing records in a file.

PHYSICAL DATA INDEPENDENCE


• The ability to change the internal level without affecting the schema at the conceptual or
external level.
• Physical data independence are supported based on the accessing of records.
• Changes in physical level is induced to improve the performance of the system for faster
update and retrieval
DATA INDEPENDENCE

Internal

Physical data
Independence mapping

mapping
mapping
Conceptual Logical data External
independence

TERMINOLOGIES USED IN RELATIONAL MODEL


We have already learnt terms used in database,i.e. Relation, attribute, domain, tuple, table etc.
Key: A key field is the data in a file using which the required information can be accessed from the
file

Types of Keys
Primary key: Primary key is a key that uniquely identify each row/record in a database table.
• A table can have only one primary key
• It is also called as super key
• The value of the primary key should not be repeated and it should not be null/blank.

Super Key: An attribute or set of attributes that uniquely identifies a tuple within a relation/table.

Alternate Key or Secondary key: The alternate keys of any table are those candidate keys which are
not currently selected as a primary key
For eg: In student database regno is the primary key whereas name, class, comb serves as an
alternate key.

Candidate key: Any set of one or more columns whose combined values are unique among all
occurrences
For eg: In student database (Name, class) – whose combined values are unique

Foreign Key: A key used to link two tables together is called foreign key. It is also known as
External key/referencing key.

Composite Key: when multiple fields are used as a primary key they are called a composite key

FILE ORGANIZATION
What is file organization?
It refers to the way records are physically arranged on a storage device

There are three types of file organization used in DBMS data processing. They are
 Sequential
 Direct Access /Random Access
 Indexed Sequential access method (ISAM)

SERIAL FILE ORGANIZATION


 In serial file organization records are arranged one after other i.e. there is no particular
order.
 A collection of records
 No particular sequence / order.
 Cannot be used as master file.
 Used as temporary transaction file.
 Records are created and stored in a file in the order in which transaction takes place.
Advantages
• Fast access to next record
• The storage devices used are very economical (Magnetic tapes)
• Easy for file back-up
Disadvantage
• Updating is slow
Example: Grocery list (temporary collection of data and unsorted)
SEQUENTIAL FILE ORGANIZATION
FEATURES
• A collection of records
• Stored in key sequence
• Records are accessed always from beginning of file.
• The addition of new records takes place only at the end of the file.
• storage devices used are magnetic tape, punched cards, or magnetic disks.
• Example payroll file-records are stored in the form of employee id., Linear search technique,
audio cassettes
ADVANTAGES
 Simple to use and understand/ simple file design
 Files are easy to construct in case of system failure.
 Very efficient when most of the records must be processed e.g. Payroll
 Accessing of records requires only the record key.
 Storage devices used are cheap/inexpensive like magnetic tape, punched cards, magnetic
disk.
DISADVANTAGES
 Entire file must be processed even if a single record is to be searched.(time consuming)
 Transactions must be sorted in sequential order before processing
 Overall processing is slow
 Data duplication is very high as the file does not support enough mechanism

RANDOM FILE ORGANIZATION


FEATURES
• Direct access file organization allow immediate direct access to individual records in the file.
• Records are stored on the disk by using a hashing algorithm.
• The key field/ record key is fed through hashing algorithm and the relative address is
created and this address gives the position on the disk where the record is to be stored.
• These disks contains tracks and sectors, data is stored in the random order at known
physical address.
• The data access takes place randomly with the help of record key.
• This type of organization also allows the file to be accessed sequentially.
• Magnetic and optical disks allow data to be stored and accessed randomly.
ADVANTAGES
• Accessing and retrieval of a records is fast and direct.
• Transactions need not be sorted.
• Best used for online transaction.
• Records can be accessed in any order (sequentially/non-sequentially
DISADVANTAGES
• Address generation overhead is involved for accessing each record due to hashing
function.(Due to address generating function, the method is not so efficient)
• The type of storage device used is expensive.
• May be less efficient in the use of storage space than sequentially organized files.

INDEXED SEQUENTIAL ACCESS METHOD (ISAM)


• ISAM is the combination of sequential and direct access file organization.
• ISAM consists of (1) main file for storage (2) A table to maintain the index variables.
• The records within the file are stored sequentially but accessed randomly through an index.
• Indexing permit access to selected records without searching the entire file.
• Uses binary search techniques to search for the required data.
• The storage device used is magnetic disk.
ADVANTAGES
• ISAM permits efficient and economical use of sequential processing techniques when the
activity ratio is high.
• Permits direct access processing of records in a relatively efficient way when the activity
ratio is low.
DISADVANTAGES
• Files must be stored in a direct-access storage device.
• Relatively expensive hardware and software resources are required.
• Access to records may be slower than direct file.
• Required more storage space than other types of files because of the index files.

DATABASE MODELS OR DATA MODELS


Definition: A Database model defines the logical design of data. The model describes the
relationships between different parts of the data.
It also defines a set of operations that can be performed.
A database model provides the necessary means to achieve data abstraction.
The design of the database can be any one of three models :
* Hierarchical Model
* Network Model
* Relational Model

HIERARCHICAL MODEL

• The hierarchical data model is the oldest type of data model, developed by IBM in 1968.
• In this model data is represented in a tree-like structure.
• A hierarchical database consists of collection of records which are connected to one another
through links.
• Each record is a collection of attributes, each of which contains only one data value.
• A link is an association between two records.
• To create a link between the records, the hierarchical model uses parent-child relationship.
• In hierarchical model data is represented as collection of records and the relationship are
represented by links.
ADVANTAGES
• It provides an easy way for addition and deletion of new records into the database.
• The data at the top level in the hierarchy is fast to access.
• This model work well to represent one-to-one and one-to-many relationship.

DISADVANTAGES
• Accessing the data at lower level in the hierarchy is time consuming.
• Searching a particular information requires dbms to run through entire model.
• The physical links make it difficult to expand or modify the database and the changes require
substantial redesigning efforts.
• This model does not support many-to-many relationship.

NETWORK DATABASE MODEL

• This model was presented by Conference on Data Systems Languages (CODASYL) in 1969.
• In a network model the data is represented by a collection of records,
• Relationships among data are represented by links.
• The data is organized in the form of graphs and some entities can be accessed through
several paths.
• However, the link in a network data model represents an association between precisely two
records.
• Each record of a particular record type represents a node.
• All the nodes are linked to each other without any hierarchy.

ADVANTAGES
• The network data model supports one-to-one and many to many relation.
• The network model is simple and easy to design.
DISADVANTAGES
• The network data model is that it can be quite complicated to maintain all the links and a
single broken link can lead to problems in the database.
• In addition, since there are no restrictions on the number of relationships, the database
design can become complex.
RELATIONAL DATABASE MODEL
• The relational data model was developed by E. F. Codd in 1970.
• All data is maintained in the form of two-dimensional tables (generally, known as relations)
consisting of rows and columns.
• Each table consists of rows and columns.
• A row (record) is alos called as tuple.
• A column hearder is called an attribute
• A table is a relation.
• A domain is a set of atomic values drawn from the column.
• The relationship between the two tables is implemented through a common attribute in the
tables.
• This makes the querying much easier in a relational database system.
• Oracle, Sybase, DB2, Ingres, Informix, MS-SQL Server are few of the popular relational
DBMSs

DBMS ARCHITECTURE
ONE-TIER ARCHITECTURE
• The user directly operates on the database.
• Any changes done here will directly done on the DBMS itself.
• It does not provide any tools for end users
• Database designer and programmers use single-tier architecture
• Example: Client platform, hosting web browser

TWO-TIER ARCHITECTURE
• Two-tier Client / Server architecture is used for User Interface program and Application
Programs that runs on client side.
• An interface is provided to access the DBMS.
• As interface called ODBC(open database connectivity) provides an API, that allow client-side
program to call the DBMS
• A client program may connect to several DBMS's.

THREE-TIER ARCHITECTURE
• Three-tier Client / Server database architecture is commonly used architecture for many
web applications
APPLICATION SERVER:
• Intermediate layer called Application server stores the web connectivity software and the
business logic (constraints) part of application used to access the right amount of data from
the database server.
• This layer acts like medium for sending partially processed data between the database
server and the client.
DATABSE TIER
• In this tier only database, query processing languages, relations and their constraints
resides.
PRESENTATION TIER/CLIENT/USER TIER
• This tier is used by end users.
• It provides multiple views of database.
• All views are generated by applications which resides in application tier.
Entity Relationship (ER) diagram
Entity is an object, place, person or class. It is represented using rectangle.

Attribute describes the property or characteristic of an entity. It is represented using elipse.

Relationship describes relations between entities. It is represented using rhombus.

Structure
ATTRIBUTE

ATTRIBUTE ENTITY ATTRIBUTE

RELATION

ENTITY ATTRIBUTE
ATTRIBUTE

Example
NAME

REGNO STUDENT COURSE

TEACH

TEACHER SUBJECT
IDNO
Data warehouse
A data ware house is a repository of an organization’s electronically stored data.
OR
A data warehouse is a structured repository of historic data.

Components of data warehouse


Data Sources:
Data sources refer to any electronic repository of information.
Data needs to be passed from PC databases / mainframe / client-server databases to the data
warehouse either on the transaction-by-transaction basis for real-time data warehouses or on a
regular cycle(e.g daily or weekly) for offline data warehouses.
Data transformation: T
he data transformation layer receives data from the data sources, cleaned and standardizes and
loads it into the data repository. This is often called “staging”.
Reporting :
The data in the data warehouses must be available to the organization’s staff if the data warehouses
is to be useful.
Reporting can be custom developed.
Example: Business intelligence tools, Executive information systems, Online Analytical
processing(OLAP) Tools, Data Mining etc.,
Metadata:
Metadata or “Data about data” is used to inform operators and end users of the data warehouses
about its status and the information held within the data warehouses.
Operations : Data warehouses operations comprises of the process of loading, manipulating and
extracting data from the data warehouse. Operations also cover users management security,
capacity management and related functions.

Stages of data warehouse


• Offline operational databases
Data warehouse in this initial stage are developed by simply copying the database of an operational
system to an off-line server where the processing load of reporting does not impact on the
operational system’s performance.
• Offline data warehouse
Database warehouses in this stages of evolution are updated on regular time cycle(usually daily,
weekly or monthly) form operational systems and the data is stored in a integrated reporting-
oriented data structure.
• Real Time data warehouse
Data warehouses are updated on transaction or event basis, event time an operational system
performs a transaction.

• Integrated data warehouses


Data warehouses are used to transactions that are passed back into the operational systems for use
in the daily activity of the organization.

Advantages
• Enhance end-user access to reports and analysis of information.
• Increases data consistency.
• Increases productivity and decreases computing costs.
• Able to combine data from different sources, in one place.
• Data warehouses provide an infrastructure that could support changes to data and
replication of the changed data back into the operational systems.
Disadvantages
• Extracting, cleaning and loading data could be time consuming.
• Data warehouses can get outdated relatively quickly.
• Providing training to end-users.
• Security could develop into a serious issue, especially if the data warehouses is internet
accessible.
• A data warehouses is usually not static and maintenance costs are high

DATA MINING

Definition: it refers to extracting or “mining” knowledge from large amounts of data.


Data mining is analysis and picking out relevant information.
The computer finds the patterns by identifying the underling rules of the features in the data.

Some of the data mining software’s are SPSS, SAS, Think Analytics and G-Sat etc.

Stages in Data Mining


• Selection- Selecting or segmenting the data according to some criteria .
• Preprocessing – This is the data cleaning stage where certain information is removed which
seems unnecessary and may slow down queries
• Transformation – The data is not merely transferred, but transformed. The data is made
useable and navigable.
• Data mining- This stage is concerned with the extraction of patterns from the data.
• Interpretation and Evaluation – The patterns identified by the system are interpreted into
knowledge which can be used to support human decision-making.

You might also like