Chapter 1 - Introduction to Database Systems
Introduction
In the early days of computing, data was collected and stored on tapes, which were mostly
sequential-access: to reach a particular record, everything stored before it had to be read
first. Tapes were slow and bulky, and computer scientists soon realized that they needed a
better solution to this problem.
Databases and database technology have had a major impact on the growing use of computers. It is
fair to say that databases play a critical role in almost all areas where computers are used,
including business, e-commerce, engineering, medicine, law, education, library services, and
more.
Database
A database is a collection of related data organized so that the data can be easily accessed,
managed, and updated. By data, we mean known facts that can be recorded and that have implicit
meaning. For example, consider the names, telephone numbers, and addresses of the people you
know. You may have recorded this data in an indexed address book, or stored it on the hard disk
of a computer using software such as MS-Excel or MS-Access. This collection of related
data with an implicit meaning is a database.
• A database represents some aspect of the real world, sometimes called the mini world
or Universe of Discourse (UoD). Changes to the mini world are reflected in the database.
• A database is a logically coherent collection of data with some inherent meaning. A
random mixture/collection of data cannot be correctly referred to as a database.
• A database is designed, built, and populated with data for a specific purpose. It has an
intended group of users and some fixed applications in which these users are interested.
In other words, a database has some source from which data is derived, some degree of
interaction with events in the real world, and an audience that is actively interested in its
contents.
File Organization
Chapter 1. Introduction to Database Systems (G2 CS)
are placed on the storage medium and interlinked. The operating system (OS) decides how
files are stored on a secondary storage medium; this arrangement is referred to as the file system.
A database can be of any size and complexity. For example, the list of names and addresses of the
students in a class may consist of only a few hundred records, each with a simple structure.
On the other hand, a large computerized library may contain millions of entries organized under
different categories (primarily by author's name, by subject, and by book title), with each
category organized alphabetically. This huge amount of information must be organized and
managed so that users can easily search for, retrieve, and update the data as needed.
An example of a large commercial database is Amazon.com. It contains data for over 20 million
books, CDs and DVDs, videos, games, electronic items, and many other items. This database
occupies 2 TB of storage and is stored on 200 different server computers. About 15 million
visitors access Amazon.com each day and use the database to make purchases. The database is
continually updated as new books and other items are added to the inventory, and stock quantities
are updated as purchases are transacted. About 100 employees are responsible for keeping the
Amazon.com database up to date.
A database management system (DBMS) is a collection of programs that enables users to create
and maintain a database. The DBMS is a general-purpose software system that facilitates the
process of defining, constructing, manipulating, and sharing a database among various users and
applications. Defining a database involves specifying the data types, structures, and
constraints/restrictions of the data to be stored in the database.
Constructing the database is the process of storing the data on some storage medium that is
controlled by the DBMS. Manipulating a database includes functions such as querying the
database to retrieve specific data, updating the database to reflect changes in the mini world, and
generating reports from the data. Sharing a database allows multiple users and programs to
access the database simultaneously.
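The defining, constructing, and manipulating steps described above can be sketched with SQLite through Python's standard sqlite3 module. This is only an illustration: the contact table, its columns, and the sample rows are assumptions made for the example and do not come from the text.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure, and constraints.
# (Table and column names are hypothetical.)
conn.execute(
    """
    CREATE TABLE contact (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        phone TEXT NOT NULL,
        city  TEXT
    )
    """
)

# Constructing: store the initial data under the control of the DBMS.
conn.executemany(
    "INSERT INTO contact (name, phone, city) VALUES (?, ?, ?)",
    [("Asha", "555-0101", "Pune"), ("Ben", "555-0102", "Leeds")],
)

# Manipulating: update the database to reflect a change in the mini world,
# then query it to retrieve specific data.
conn.execute("UPDATE contact SET city = ? WHERE name = ?", ("Mumbai", "Asha"))
rows = conn.execute("SELECT name, city FROM contact ORDER BY name").fetchall()
conn.commit()

print(rows)
```

Sharing is not shown here, because an in-memory SQLite database is visible only to the connection that created it; a server-based DBMS would let several programs connect to the same database at once.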
Other important functions provided by the DBMS include protecting the database and
maintaining it over a long period of time. Protection includes system protection against
hardware or software malfunction (or crashes) and security protection against unauthorized or
malicious access. Both general-purpose and special-purpose DBMS software can be used
to implement a computerized database. Most DBMSs are very complex software
systems. The database and the DBMS software together are called a database system. Some
examples of popular DBMSs in use today are MySQL, SQL Server, PostgreSQL, IBM
DB2, Oracle, and Amazon SimpleDB (cloud-based).
File system
Traditionally, data accessed through computers has been stored on different storage media in the
form of individual files. Files proved to be quite satisfactory as long as computerization was
limited to a few application areas. However, as the number of users increased, especially
with the advent of the internet and online transaction systems, file systems gave rise to many
serious problems. The discipline of database systems evolved in response to these problems.
Most data processing systems use files for storing, accessing, and manipulating data. Files are
typically stored on magnetic tapes and disks. Most of the problems with files arise from the fact
that the files are specific to an application. As the number of applications increases, the total
number of computerized files grows considerably. A large number of files gives rise to the
following problems.
• Files involve a high level of redundancy in data; that is, the same data item is stored in many
different places. Redundancy often results in inconsistency. What is worse, the same item may
be at different stages of update in different places and thus may have different values.
• Individual files are not adaptable to rapid changes, especially with respect to the way the
data items are structured within the file. Data stored in a file system is entirely dependent
on the physical medium used.
• In a file system, only pre-defined questions can be answered.
• Modifications in one program may require modifications in other programs that
interface with this program.
• Size of data: Storing a small amount of data in a spreadsheet is fine, but the approach
breaks down as the data grows. Once the records run into the millions, the data has to be split
across multiple spreadsheet files, and finding a record in them takes a long time.
• Ease of updating data: Multiple people cannot edit the same file at the same time. Others
must wait until the file is available for update, which wastes time.
• Managing large chunks of data: You can store data in a spreadsheet, but adding large
chunks of data to a sheet slows it down. For instance, once the data grows to
thousands of records, speed becomes a problem.
• Accuracy: When data is entered into a spreadsheet, it is difficult to maintain
accuracy because the entries are not validated.
• Security of data: Data is less secure in spreadsheets. Anyone who can get access to
the file can make changes to it. With databases, you set up security groups and
privileges to restrict access.
• Data integrity: Data integrity also becomes a concern when storing data in spreadsheets.
In databases, the built-in integrity checks and access controls help ensure the accuracy and
consistency of the data.
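The accuracy and integrity points above can be illustrated with a small sketch, again using SQLite through Python's sqlite3 module. The student table, its columns, and the 0 to 100 range for marks are assumptions chosen for the example, not part of the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraints are declared once, with the table (hypothetical schema).
conn.execute(
    """
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        marks   INTEGER NOT NULL CHECK (marks BETWEEN 0 AND 100)
    )
    """
)


def try_insert(roll_no, name, marks):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student (roll_no, name, marks) VALUES (?, ?, ?)",
                (roll_no, name, marks),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, "Asha", 82),   # valid row
    try_insert(2, "Ben", 140),   # marks outside the allowed range
    try_insert(1, "Chen", 75),   # duplicate roll number
    try_insert(3, None, 60),     # missing name
]
stored = conn.execute("SELECT roll_no, name, marks FROM student").fetchall()
print(results, stored)
```

Because the rules live in the table definition, every program that inserts data is subject to the same checks, whereas a spreadsheet would accept all four entries unless someone added validation by hand.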
Data Abstraction
A DBMS supports data abstraction. By data abstraction, we mean that the DBMS provides users with
an abstract view of the system; that is, the system hides certain details of how data is stored,
created, and maintained. A main purpose of a database system is to hide this complexity from
database users.
3 Levels of Abstraction
• Physical level
• Conceptual (logical) level
• View level
Physical level:
– Lowest level of abstraction.
– Describes how the data are actually stored.
Conceptual (logical) level:
– Next highest level of abstraction.
– Database administrator level.
– Describes what data are stored in the database.
– Describes the relationships among the data.
For example:
type customer = record
customer_id:string;
customer_name:string;
customer_street:string;
customer_city : string;
end;
View level:
The level that is closest to the users.
– It is concerned with the way the data are seen by individual users.
– Describes part of the database for a particular group of users.
– There can be many different views of a database.
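As a rough illustration of the view level, the sketch below stores the customer record from the example above as a table and defines a view over it, using SQLite through Python's sqlite3 module. The view name customer_city_view, the idea of a mailing team, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Logical level: what data are stored.
    CREATE TABLE customer (
        customer_id     TEXT PRIMARY KEY,
        customer_name   TEXT,
        customer_street TEXT,
        customer_city   TEXT
    );
    INSERT INTO customer VALUES
        ('C1', 'Asha', '12 Hill Road', 'Pune'),
        ('C2', 'Ben',  '4 Mill Lane',  'Leeds');

    -- View level: a hypothetical mailing team sees names and cities only.
    CREATE VIEW customer_city_view AS
        SELECT customer_name, customer_city FROM customer;
    """
)

cursor = conn.execute("SELECT * FROM customer_city_view ORDER BY customer_name")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Users of the view never see customer_id or customer_street, and several such views can be defined over the same table for different groups of users.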
Characteristics of the database approach
In the database approach, a single repository of data is maintained; the data is defined once
and then accessed by various users. In file systems, each application is free to name data
elements independently. In contrast, in a database, the names and labels of data are defined
once and used repeatedly by queries, transactions, and applications.
The main characteristics of the database approach, as opposed to the file processing approach,
are the following:
program themselves. Hence, these programs are constrained to work with only one specific
database, whose structure is declared in application programs.
Insulation between Programs and Data, and Data Abstraction:-
In a traditional file system, the structure of data files is embedded in the application
programs, so any change to the structure of a file may require changing all programs that access
that file. By contrast, DBMS access programs do not require such changes in most cases. The
structure of data files is stored in the DBMS catalog separately from the access programs.
This property is known as program-data independence.
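Program-data independence can be sketched as follows, using SQLite through Python's sqlite3 module. The customer table, the customer_names function, and the added customer_city column are assumptions made for the example. SQLite keeps its catalog in a table named sqlite_master, which the sketch reads at the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT)"
)
conn.execute("INSERT INTO customer VALUES ('C1', 'Asha')")


def customer_names(connection):
    """Application code: asks for data by column name, not by storage layout."""
    query = "SELECT customer_name FROM customer ORDER BY customer_id"
    return [row[0] for row in connection.execute(query)]


before = customer_names(conn)

# The structure of the data changes; the DBMS records the new definition
# in its catalog, and the application function above is left untouched.
conn.execute("ALTER TABLE customer ADD COLUMN customer_city TEXT")
conn.execute("INSERT INTO customer VALUES ('C2', 'Ben', 'Leeds')")

after = customer_names(conn)

# Read the stored table definition back from the catalog.
catalog_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'customer'"
).fetchone()[0]
print(before, after)
print(catalog_sql)
```

A program that parsed a fixed-layout file would typically need to be rewritten after such a change, whereas here the same function keeps working because the structure is described in the catalog rather than in the program.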
A DBMS provides users with a conceptual representation of data that does not include
many of the details of how the data is stored or how the operations are implemented. This
characteristic is called data abstraction. Many other details of file storage organization,
such as the access path specified on a file, can be hidden from database users by the DBMS.
Support of Multiple Views of the Data:-
A database typically has many users, each of whom may require a different perspective or
view of the database. A multiuser DBMS, as its name implies, must allow multiple users to
access the database at the same time. These users have a variety of distinct applications,
each of which defines its own view of the data.
Sharing of Data and Multiuser Transaction Processing:-
The DBMS must include concurrency control software to ensure that updates are controlled and
correct, since several users may try to update the same data in a multiuser DBMS. For
example, when several reservation clerks try to assign a seat on an airline flight, the DBMS
should ensure that each seat can be accessed by only one clerk at a time for assignment to a
passenger. These types of applications are generally called online transaction processing
(OLTP) applications. A fundamental role of multiuser DBMS software is to ensure that
concurrent transactions operate correctly and efficiently. A transaction is an executing
program or process that includes one or more database accesses, such as reading or updating
of database records.
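The seat-assignment example can be sketched as a transaction, using SQLite through Python's sqlite3 module. The seat table and the assign_seat function are assumptions made for the example. The sketch uses a single connection and issues the two requests one after the other, so it only imitates two clerks competing for a seat; it does not demonstrate the locking a real multiuser DBMS performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no TEXT PRIMARY KEY, passenger TEXT)")
conn.executemany("INSERT INTO seat VALUES (?, NULL)", [("12A",), ("12B",)])
conn.commit()


def assign_seat(connection, seat_no, passenger):
    """Assign the seat only if it is still free; return True on success."""
    with connection:  # one transaction: commit on success, roll back on error
        cursor = connection.execute(
            "UPDATE seat SET passenger = ? "
            "WHERE seat_no = ? AND passenger IS NULL",
            (passenger, seat_no),
        )
        # rowcount is 1 if the seat was free and is now assigned, 0 otherwise.
        return cursor.rowcount == 1


first = assign_seat(conn, "12A", "Asha")   # seat is free
second = assign_seat(conn, "12A", "Ben")   # seat has already been assigned
holder = conn.execute(
    "SELECT passenger FROM seat WHERE seat_no = '12A'"
).fetchone()[0]
print(first, second, holder)
```

The check and the update happen in one statement, so a second request for the same seat finds it already taken instead of overwriting the first assignment.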
In large organizations, many people are involved in the design, use, and maintenance of a large
database with hundreds of users. The people whose jobs involve the day-to-day use of a large
database are called actors on the scene, and are classified as follows:
System analysts determine the requirements of end users, especially naive users, and develop
specifications for canned transactions that meet these requirements. Application programmers
implement these specifications as programs; then they test, debug, document, and maintain these
canned transactions.
of queries and updates (called canned transactions) that have been carefully programmed and
tested. The tasks that such users perform are varied:
• Bank tellers check account balances and post withdrawals and deposits.
• Reservation agents for airlines, hotels, and car rental companies check availability for a
given request and make reservations.
• Employees at receiving stations for shipping companies enter package identifications via
bar codes and descriptive information through buttons to update a central database of
received and in-transit packages.
■ Sophisticated end users include engineers, scientists, business analysts, and others who
thoroughly familiarize themselves with the facilities of the DBMS in order to implement their
own applications to meet their complex requirements.
■ Standalone users maintain personal databases by using ready-made program packages that
provide easy-to-use menu-based or graphics-based interfaces. An example is the user of a tax
A typical DBMS provides multiple facilities to access a database. Naive end users need to learn
very little about the facilities provided by the DBMS; they simply have to understand the user
interfaces of the standard transactions designed and implemented for their use. Casual users
learn only a few facilities that they may use repeatedly. Sophisticated users try to learn most of
the DBMS facilities in order to achieve their complex requirements. Standalone users typically
become very proficient in using a specific software package.