0% found this document useful (0 votes)
5 views65 pages

DBMS_BasicIntroduction

The document provides an introduction to databases, defining key concepts such as databases, DBMS, and data independence. It discusses the roles of database users and administrators, the advantages of the database approach over file processing systems, and outlines the three-schema architecture. Additionally, it highlights the main characteristics of database systems, including data abstraction, sharing, and multiuser transaction processing.

Uploaded by

kartikrohra2006
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views65 pages

DBMS_BasicIntroduction

The document provides an introduction to databases, defining key concepts such as databases, DBMS, and data independence. It discusses the roles of database users and administrators, the advantages of the database approach over file processing systems, and outlines the three-schema architecture. Additionally, it highlights the main characteristics of database systems, including data abstraction, sharing, and multiuser transaction processing.

Uploaded by

kartikrohra2006
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 65

Chapter 1: Introduction

Outline
 Users of Database system
 Data Models
 Schemas and Instances
 Three Schema Architecture
 Data Independence
 DBA & its role
 The Need for Databases
Basic Definitions
 Database: A collection of related data.
 Data: Known facts that can be recorded and have an
implicit meaning.
 Database Management System (DBMS): A software
package/ system to facilitate the creation and maintenance
of a computerized database.
 Database System: The DBMS software together with the
data itself. Sometimes, the applications are also included.
Definitions of Database
 Def 1: Database is an organized collection of
logically related data
 Def 2: A database is a shared collection of logically

related data that is stored to meet the requirements of


different users of an organization
 Def 3: A database is a self-describing collection of

integrated records
 Def 4: A database models a particular real world

system in the computer in the form of data


Definitions
 Data: stored representations of meaningful objects and
events or
 Referred to facts concerning objects and events that

could be recorded and stored on computer media


 Structured: numbers, text, dates
 Unstructured: images, video, documents
 Information: data processed to increase knowledge in
the person using the data
 Metadata: data that describes the properties and

context of user data


5
What is a Database
 Shared collection of logically related data (and a
description of this data), designed to meet the
information needs of an organization.
 System catalog (metadata) provides description of
data to enable program–data independence.
 Logically related data comprises entities,
attributes, and relationships of an organization’s
information.

6
Database Management System
 A software system that is used to create, maintain, and
provide controlled access to users of a database
 (Database) application program: A computer program that
interacts with database by issuing an appropriate request
(SQL statement) to the DBMS
Database Management System

DBMS manages data resources like an operating system manages hardware resources
Typical DBMS Functionality
 Define a database : in terms of data types,
structures and constraints
 Construct or Load the Database on a secondary
storage medium
 Manipulating the database : querying, generating
reports, insertions, deletions and modifications to
its content
 Concurrent Processing and Sharing by a set of
users and programs – yet, keeping all data valid
and consistent
Typical DBMS Functionality
Other features:
 Protection or Security measures to prevent unauthorized
access
 “Active” processing to take internal actions on data
 Presentation and Visualization of data
Example of a Database
(with a Conceptual Data Model)
 Mini-world for the example: Part of a
UNIVERSITY environment.
 Some mini-world entities:
 STUDENTs
 COURSEs
 SECTIONs (of COURSEs)
 (academic) DEPARTMENTs
 INSTRUCTORs
Note: The above could be expressed in the ENTITY-
RELATIONSHIP data model.
Example of a Database
(with a Conceptual Data Model)
 Some mini-world relationships:
 SECTIONs are of specific COURSEs
 STUDENTs take SECTIONs
 COURSEs have prerequisite COURSEs
 INSTRUCTORs teach SECTIONs
 COURSEs are offered by DEPARTMENTs
 STUDENTs major in DEPARTMENTs

Note: The above could be expressed in the ENTITY-


RELATIONSHIP data model.
The concept of a shared organizational database
Management Marketing

Product
Planning Control Sales
Development

Corporate
Database

Accounting Manufacturing

Accounts Accounts 13
Scheduling Production
Receivable Payable
A bit of History
 Computer initially used for computational/ engineering
purposes
 Commercial applications introduced File Processing

System

14
File Processing System
 A collection of application programs that perform
services for the end-users such as production of
reports
 Each program defines and manages its own data

15
File Processing Systems
Library Examination Registration

Library Examination Registration


Applications Applications Applications

Library Examination Registration


Data Data Data
Files Files Files 16

Program and Data Interdependence


File Processing Systems

Library Examination Registration


Reg_Number Reg_Number Reg_Number
Name Name Name
Father Name Address Father Name
Books Issued Class Phone
Fine Semester Address
Grade Class

17
Files Based Processing
Disadvantages of File Processing
 Program-Data Dependence
 File structure is defined in the program code.
 All programs maintain metadata for each file they use
 Duplication of Data (Data Redundancy)
 Different systems/programs have separate copies of the same data
• Same data is held by different programs.
• Wasted space and potentially different values and/or different formats for
the same item.
 Limited Data Sharing
 No centralized control of data
 Programs are written in different languages, and so cannot easily access
each other’s files.
Disadvantages of File Processing
 Lengthy Development Times
 Programmers must design their own file formats
 Excessive Program Maintenance
 80% of of information systems budget
 Vulnerable to Inconsistency
 Change in one table need changes in corresponding tables as well
otherwise data will be inconsistent

20
Problems with Data Dependency
 Each application programmer must maintain their
own data
 Each application program needs to include code
for the metadata of each file
 Each application program must have its own
processing routines for reading, inserting, updating
and deleting data
 Lack of coordination and central control
 Non-standard file formats
Problems with Data Redundancy
 Waste of space to have duplicate data
 Causes more maintenance headaches

 The biggest problem:

 When data changes in one file, could cause inconsistencies


( Vulnerable to Inconsistency)
 Compromises data integrity (data reliability)
SOLUTION:
The DATABASE Approach

 Central repository of shared data


 Data is managed by a controlling agent

 Stored in a standardized, convenient form

This requires a
Database and Database Management System (DBMS)
Advantages of Database Approach
Library Examination Registration

Library Examination Registration


Applications Applications Applications

Database
Management
System

- Data Sharing - Data Independence


- Controlled Redundancy University - Better Data Integrity
24
Students
Database
Levels of Abstraction
 Physical level: describes how a record (e.g., instructor) is
stored.
 Logical level: describes data stored in database, and the
relationships among the data.
type instructor = record
ID : string;
name : string;
dept_name : string;
salary : integer;
end;
 View level: application programs hide details of data
types. Views can also hide information (such as an
employee’s salary) for security purposes.
View of Data
An architecture for a database system
Instances and Schemas
 Similar to types and variables in programming languages
 Logical Schema – the overall logical structure of the
database
 Example: The database consists of information about a set
of customers and accounts in a bank and the relationship
between them
 Analogous to type information of a variable in a program
 Physical schema– the overall physical structure of the
database
 Instance – the actual content of the database at a
particular point in time
 Analogous to the value of a variable
Database Users
Users may be divided into those who actually use and
control the content (called “Actors on the Scene”)
and those who enable the database to be developed
and the DBMS software to be designed and
implemented (called “Workers Behind the Scene”).
Database Users and Administrators

Database
Database Users
Actors on the scene
• Database administrators: responsible for
authorizing access to the database,
• for co-ordinating and monitoring its use,
acquiring software, and hardware resources,
controlling its use and monitoring efficiency of
operations.
• Database Designers: responsible to define the
content, the structure, the constraints, and
functions or transactions against the database.
They must communicate with the end-users and
understand their needs.
• End-users: they use the data for queries, reports
and some of them actually update the database
Database Users
• Application Programmers - They are the developers
who interact with the database by means of DML
queries. These queries are converted into object code to
communicate with the database. For example, writing a
C program to generate the report of employees who are
working in particular department will involve a query to
fetch the data from database. It will include a embedded
SQL query in the C Program.
Database Users
• Sophisticated Users - They are database developers,
who write SQL queries to select/insert/delete/update
data. They do not use any application or programs to
request the database. They directly interact with the
database by means of query language like SQL. These
users will be scientists, engineers, analysts who
thoroughly study SQL and DBMS to apply the concepts
in their requirement. In short, we can say this category
includes designers and developers of DBMS and SQL.
Database Users
• Specialized Users - These are also sophisticated
users, but they write special database application
programs. They are the developers who develop
the complex programs to the requirement.
• Naive Users - these are the users who use the
existing application to interact with the database.
For example, online library system, ticket
booking systems, ATMs etc which has existing
application and users use them to interact with
the database to fulfill their requests.
Database Administrator
• A Database Administrator, Database Analyst or
Database Developer is the person responsible for
managing the information within an organization.
• As most companies continue to experience
inevitable growth of their databases, these
positions are probably the most solid within the
IT industry.
Role of Database Administrator
1. Schema Definition:
• The DBA definition the logical Schema of the database.
A Schema refers to the overall logical structure of the
database.
• According to this schema, database will be developed
to store required data for an organization.
2. Storage Structure and Access Method Definition:
• The DBA decides how the data is to be represented in
the stored database.
3. Assisting Application Programmers:
• The DBA provides assistance to application
programmers to develop application programs.
Role of Database Administrator
4. Physical Organization Modification:
• The DBA modifies the physical organization of the database to reflect the
changing needs of the organization or to improve performance.
5. Approving Data Access:
• The DBA determines which user needs access to which part of the database.
• According to this, various types of authorizations are granted to different
users.
6. Monitoring Performance:
• The DBA monitors performance of the system.The DBA ensures that better
performance is maintained by making changes in physical or logical schema
if required.
7. Backup and Recovery:
• Database should not be lost or damaged.
• The DBA ensures this periodically backing up the database on magnetic
tapes or remote servers.
• In case of failure, such as virus attack database is recovered from this
backup.
Three-Schema Architecture
• Proposed to support DBMS characteristics of:
• Program-data independence.
• Support of multiple views of the data.
Three-Schema Architecture
• Defines DBMS schemas at three levels:
• Internal schema at the internal level to describe physical storage
structures and access paths. Typically uses a physical data model.
• Conceptual schema at the conceptual level to describe the structure
and constraints for the whole database for a community of users. Uses
a conceptual or an implementation data model.
• External schemas at the external level to describe the various user
views. Usually uses the same data model as the conceptual level.
Three-Schema Architecture
• Mappings among schema levels are needed to
transform requests and data. Programs refer to
an external schema, and are mapped by the
DBMS to the internal schema for execution.
Data Independence
• Logical Data Independence: The capacity to change
the conceptual schema without having to change the
external schemas and their application programs.
• Physical Data Independence: The capacity to
change the internal schema without having to change
the conceptual schema.
Data Independence
• When a schema at a lower level is changed, only the
mappings between this schema and higher-level
schemas need to be changed in a DBMS that fully
supports data independence.
• The higher-level schemas themselves are unchanged.
Hence, the application programs need not be changed
since they refer to the external schemas.
Database System Internals
Database Engine
• Storage manager
• Query processing
• Transaction manager
Storage Management
• Storage manager is a program module that provides
the interface between the low-level data stored in the
database and the application programs and queries
submitted to the system.
• The storage manager is responsible to the following
tasks:
• Interaction with the OS file manager
• Efficient storing, retrieving and updating of data
• Issues:
• Storage access
• File organization Indexing and hashing
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Query Processing (Cont.)
• Alternative ways of evaluating a given query
• Equivalent expressions
• Different algorithms for each operation
• Cost difference between a good and a bad way of
evaluating a query can be enormous
• Need to estimate the cost of operations
• Depends critically on statistical information about relations
which the database must maintain
• Need to estimate statistics for intermediate results to
compute cost of complex expressions
Transaction Management
• What if the system fails?
• What if more than one user is concurrently updating the same
data?
• A transaction is a collection of operations that performs a single
logical function in a database application
• Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures
(e.g., power failures and operating system crashes) and
transaction failures.
• Concurrency-control manager controls the interaction among
the concurrent transactions, to ensure the consistency of the
database.
Database Architecture
The architecture of a database systems is greatly influenced by
the underlying computer system on which the database is
running:
• Centralized
• Client-server
• Parallel (multi-processor)
• Distributed
Main Characteristics of the
Database Approach
 Self-describing nature of a database system: A
DBMS catalog stores the description of the
database. The description is called meta-data). This
allows the DBMS software to work with different
databases.
 Insulation between programs and data: Called
program-data independence. Allows changing data
storage structures and operations without having to
change the DBMS access programs.
Main Characteristics of the
Database Approach
 Data Abstraction: A data model is used to hide
storage details and present the users with a
conceptual view of the database.
 Support of multiple views of the data: Each user
may see a different view of the database, which
describes only the data of interest to that user.
Main Characteristics of the
Database Approach
 Sharing of data and multiuser transaction processing :
allowing a set of concurrent users to retrieve and to
update the database. Concurrency control within the
DBMS guarantees that each transaction is correctly
executed or completely aborted. OLTP (Online
Transaction Processing) is a major part of database
applications.
Data Models
 A collection of tools for describing
 Data
 Data relationships
 Data semantics
 Data constraints
 Relational model
 Entity-Relationship data model (mainly for database design)
 Object-based data models (Object-oriented and Object-
relational)
 Semistructured data model (XML)
 Other older models:
 Network model
 Hierarchical model
Relational Model
 All the data is stored in various tables.
Columns
 Example of tabular data in the relational model

Rows
A Sample Relational Database
Data Definition Language (DDL)
 Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))

 DDL compiler generates a set of table templates stored in a data dictionary


 Data dictionary contains metadata (i.e., data about data)
 Database schema
 Integrity constraints
Primary key (ID uniquely identifies instructors)
 Authorization
 Who can access what
Data Manipulation Language (DML)
 Language for accessing and manipulating the data
organized by the appropriate data model
 DML also known as query language
 Two classes of languages
 Pure – used for proving properties about computational
power and for optimization
 Relational Algebra
 Tuple relational calculus
 Domain relational calculus
 Commercial – used in commercial systems
 SQL is the most widely used commercial language
SQL
 The most widely used commercial language
 SQL is NOT a Turing machine equivalent language
 SQL is NOT a Turing machine equivalent language
 To be able to compute complex functions SQL is
usually embedded in some higher-level language
 Application programs generally access databases
through one of
 Language extensions to allow embedded SQL
 Application program interface (e.g., ODBC/JDBC)
which allow SQL queries to be sent to a database
Database Design

 Logical Design – Deciding on the database schema.


Database design requires that we find a “good” collection
of relation schemas.
 Business decision – What attributes should we record in the
database?
 Computer Science decision – What relation schemas
should we have and how should the attributes be
distributed among the various relation schemas?
 Physical Design – Deciding on the physical layout of the
database
Database Design (Cont.)
 Is there any problem with this relation?
Design Approaches
• Need to come up with a methodology to ensure that each of the
relations in the database is “good”
• Two ways of doing so:
• Entity Relationship Model
• Models an enterprise as a collection of entities and relationships
• Represented diagrammatically by an entity-relationship diagram:
• Normalization Theory
• Formalize what designs are bad, and test for them
Object-Relational Data Models
• Relational model: flat, “atomic” values
• Object Relational Data Models
• Extend the relational data model by including object orientation and
constructs to deal with added data types.
• Allow attributes of tuples to have complex types, including non-atomic
values such as nested relations.
• Preserve relational foundations, in particular the declarative access to
data, while extending modeling power.
• Provide upward compatibility with existing relational languages.
XML: Extensible Markup Language
• Defined by the WWW Consortium (W3C)
• Originally intended as a document markup language not a
database language
• The ability to specify new tags, and to create nested tag
structures made XML a great way to exchange data, not just
documents
• XML has become the basis for all new generation data
interchange formats.
• A wide variety of tools is available for parsing, browsing and
querying XML documents/data
End of Chapter 1

You might also like