A.1.1 Outline the differences between data and information.
What is Data: Data consists of raw, unorganized facts that must be processed to become
meaningful. On its own, data can be simple and seemingly random until it is organized.
Generally, data comprises facts, observations, perceptions, numbers, characters, symbols,
images, etc.
What is Information: Information is processed data that possesses context, relevance, and
purpose; producing it involves the manipulation of raw data.
Information assigns meaning, improves the reliability of the data, and reduces uncertainty.
So, when data is transformed into information, it carries no useless details.
Data Vs. Information
Dependency: Data is never designed to meet the specific needs of the user, whereas
information is always specific to the user's requirements and expectations, because all the
irrelevant facts and figures are removed during the transformation process.
DIKW is a model used to discuss data, information, knowledge, and wisdom and their
interrelationships. It represents the structural and functional relationships between data,
information, knowledge, and wisdom.
KEY DIFFERENCE
Data can be structured as tabular data, graphs, or data trees, whereas information consists
of language, ideas, and thoughts based on the given data.
Define Database
A database is a collection of inter-related data which is used to retrieve, insert, and delete
data efficiently. It is also used to organize the data in the form of tables, schemas, views,
reports, etc.
For example: a college database organizes the data about the admin, staff, students,
faculty, etc. Using the database, you can easily retrieve, insert, and delete information.
Advantages of DBMS
Controls database redundancy: It can control data redundancy because it stores all the
data in one single database file, so the same data is not recorded in multiple places.
Data sharing: In DBMS, the authorized users of an organization can share the data
among multiple users.
Easy maintenance: It is easily maintainable due to the centralized nature of the
database system.
Reduced time: It reduces development time and maintenance needs.
Backup: It provides backup and recovery subsystems which create automatic backups
of data against hardware and software failures and restore the data if required.
Multiple user interfaces: It provides different types of user interfaces, such as graphical
user interfaces and application program interfaces.
Disadvantages of DBMS
Cost of hardware and software: It requires a high-speed data processor and a large
memory size to run DBMS software.
Size: It occupies a large amount of disk space and memory to run efficiently.
Complexity: A database system creates additional complexity and requirements.
Higher impact of failure: Failures have a high impact on the database because, in most
organizations, all the data is stored in a single database; if the database is damaged
due to power failure or database corruption, the data may be lost forever.
DBMS Architecture
The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers and other components that are connected with networks.
The client/server architecture consists of many PCs and a workstation which are
connected via the network.
DBMS architecture depends upon how users are connected to the database to get
their request done.
Database architecture can be seen as single-tier or multi-tier. But logically, database
architecture is of two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
In this architecture, the database is directly available to the user: the user sits directly
on top of the DBMS and uses it.
Any changes done here are made directly on the database itself. It doesn't provide
a handy tool for end users.
The 1-tier architecture is used for the development of local applications, where
programmers can communicate directly with the database for a quick response.
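As a minimal sketch of 1-tier access, the application below opens an embedded database directly, with no server in between (SQLite behaves this way). The table and column names are hypothetical, chosen only for illustration:

```python
import sqlite3

# 1-tier access: the application talks to the database engine directly,
# with no intermediate server process. Names here are illustrative.
conn = sqlite3.connect(":memory:")  # in-memory database for the example
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO students (name) VALUES (?)", ("Alice",))
conn.commit()

rows = cur.execute("SELECT name FROM students").fetchall()
print(rows)  # [('Alice',)]
conn.close()
```

Any change made through this connection is applied directly to the database file (here, to the in-memory database), which is exactly the property that makes 1-tier suitable for quick local development but not for end users.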
2-Tier Architecture
The 2-tier architecture is the same as a basic client-server model. In the two-tier
architecture, applications on the client end can directly communicate with the database
at the server side. For this interaction, APIs such as ODBC and JDBC are used.
The user interfaces and application programs run on the client side.
The server side is responsible for providing functionalities such as query processing
and transaction management.
To communicate with the DBMS, the client-side application establishes a connection
with the server side.
3-Tier Architecture
The 3-tier architecture contains another layer between the client and the server. In
this architecture, the client cannot communicate directly with the server.
The application on the client end interacts with an application server, which in turn
communicates with the database system.
The end user has no idea about the existence of the database beyond the application
server, and the database has no idea about any user beyond the application.
The 3-tier architecture is used for large web applications.
Three schema Architecture
The three schema architecture is also called the ANSI/SPARC architecture or three-level
architecture.
This framework is used to describe the structure of a specific database system.
The three schema architecture is also used to separate the user applications and
physical database.
The three schema architecture contains three-levels. It breaks the database down
into three different categories.
The three-schema architecture is as follows:
1. Internal Level
The internal level has an internal schema which describes the physical storage structure of
the database. It uses a physical data model and describes the complete details of data
storage and access paths.
2. Conceptual Level
The conceptual schema describes the structure of the whole database and hides the details
of physical storage. Programmers and database administrators work at this level.
3. External Level
At the external level, a database contains several schemas, sometimes called subschemas.
A subschema describes a particular view of the database; an external schema is also
known as a view schema.
Each view schema describes the part of the database that a particular user group is
interested in and hides the remaining database from that user group.
The view schema describes the end-user interaction with the database system.
A schema diagram can display only some aspects of a schema, such as the names of record
types, data types, and constraints. Other aspects can't be specified through the schema
diagram; for example, it shows neither the data type of each data item nor the
relationships among the various files.
In the database, the actual data changes quite frequently; for example, the database
changes whenever we add a new grade or a new student. The data at a particular
moment of time is called an instance of the database.
Data Independence
Data independence can be explained using the three-schema architecture.
Data independence refers to the characteristic of being able to modify the schema at one
level of the database system without altering the schema at the next higher level.
Physical data independence is used to separate the conceptual level from the internal
level: changes to the internal (physical) schema do not require changes to the conceptual
schema. It occurs at the interface between the conceptual and internal levels.
Example: a transaction transferring 500 from A's account to B's account consists of the
following operations on the two accounts.

A's Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)

B's Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
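The pseudocode above can be mirrored as a runnable sketch. The Account class and the in-memory balances are illustrative stand-ins, not part of any real DBMS API:

```python
# A runnable sketch of the transfer pseudocode above, using simple
# in-memory accounts; the 500-unit amount mirrors the example.
class Account:
    def __init__(self, balance):
        self.balance = balance

def transfer(a, b, amount):
    # Debit A's account
    old_balance = a.balance
    a.balance = old_balance - amount
    # Credit B's account
    old_balance = b.balance
    b.balance = old_balance + amount

a, b = Account(1000), Account(200)
transfer(a, b, 500)
print(a.balance, b.balance)  # 500 700
```

Note that if the process crashed between the debit and the credit, the 500 units would simply vanish; preventing exactly that half-finished state is what the ACID properties below are about.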
A.1.7. Explain the importance of the ACID properties of a database transaction
ACID Properties
A transaction is a very small unit of a program and it may contain several low level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation, and
Durability − commonly known as ACID properties − in order to ensure accuracy,
completeness, and data integrity.
Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of
the transaction or after the execution/abortion/failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain consistent after
the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if the
system fails or restarts. If a transaction updates a chunk of data in a database and commits,
then the database will hold the modified data. If a transaction commits but the system fails
before the data could be written on to the disk, then that data will be updated once the system
springs back into action.
Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that all the transactions will
be carried out and executed as if each were the only transaction in the system. No
transaction will affect the existence of any other transaction.
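Atomicity can be demonstrated with SQLite's transaction support: either both halves of a transfer commit together, or a failure rolls both back. The table, column names, and simulated crash are illustrative only:

```python
import sqlite3

# A sketch of atomicity: a debit and its matching credit must commit
# together, or a failure must roll both back. Names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A", 1000), ("B", 200)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'A'")
    raise RuntimeError("simulated crash before the matching credit")
    # conn.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'B'")
    # conn.commit()
except RuntimeError:
    conn.rollback()  # undo the partial debit: no half-finished state remains

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'A': 1000, 'B': 200} -- unchanged
```

Because the debit was never committed, rolling back restores the database to its state before the transaction began, which is exactly what the atomicity property requires.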
What is a Database?
A database is a collection of related data which represents some aspect of the real world. A
database system is designed to be built and populated with data for a certain task.
What is DBMS?
Database Management System (DBMS) is software for storing and retrieving users' data
while applying appropriate security measures. It consists of a group of programs which
manipulate the database. The DBMS accepts requests for data from an application and
instructs the operating system to provide the specific data. In large systems, a DBMS helps
users and other third-party software to store and retrieve data.
DBMS allows users to create their own databases as per their requirements. The term
"DBMS" includes the users of the database and other application programs. It provides an
interface between the data and the software application.
Define Normalisation
Normalization is the process of organizing the data in a database so as to avoid data
redundancy and insertion, update, and deletion anomalies.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are:
Insertion anomaly
Update anomaly
Deletion anomaly
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing the
department details in which the employee works. At some point in time the table looks like
this:
emp_id emp_name emp_address emp_dept
emp_id emp_name emp_address emp_mobile
102    Jon      Kanpur      8812121212, 9900012222
104    Lester   Bangalore   9990000123, 8123450987
Two employees (Jon and Lester) have two mobile numbers each, so the company stored
them in the same field, as you can see in the table above.
This table is not in 1NF: the rule says "each attribute of a table must have atomic (single)
values", and the emp_mobile values for Jon and Lester violate that rule.
To make the table comply with 1NF we should have the data like this:
emp_id emp_name emp_address emp_mobile
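The rows of the corrected table were partially lost in extraction. As a sketch (with the employee data assumed from the table above), splitting each multi-valued mobile field into its own row yields the atomic values 1NF requires:

```python
# A sketch of the 1NF fix: each mobile number gets its own row, so every
# field holds a single (atomic) value. Data assumed from the table above.
employees = [
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]

rows_1nf = [
    (emp_id, name, address, mobile)
    for emp_id, name, address, mobiles in employees
    for mobile in mobiles
]
for row in rows_1nf:
    print(row)  # one row per (employee, mobile) pair: 4 rows in total
```

Each employee with two numbers now occupies two rows, duplicating the name and address but keeping every attribute atomic; removing that duplication is what the higher normal forms address.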
Example (2NF): consider a table that stores the age of each teacher along with the subjects
they teach:

teacher_id subject   teacher_age
111        Maths     38
111        Physics   38
222        Biology   38
333        Physics   40
333        Chemistry 40
Candidate Keys: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age depends on teacher_id alone, which is a
proper subset of the candidate key. This violates the rule for 2NF: "no non-prime attribute
may depend on a proper subset of any candidate key of the table".
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
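The decomposition above can be sketched in a few lines: project out the (teacher_id, teacher_age) pairs for one table and the (teacher_id, subject) pairs for the other. The variable names are illustrative:

```python
# A sketch of the 2NF decomposition above: split (teacher_id, subject,
# teacher_age) into teacher_details and teacher_subject.
rows = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# Projecting onto (teacher_id, teacher_age) via a set removes duplicates,
# so each teacher's age is stored exactly once.
teacher_details = sorted({(tid, age) for tid, _, age in rows})
teacher_subject = [(tid, subject) for tid, subject, _ in rows]

print(teacher_details)  # [(111, 38), (222, 38), (333, 40)]
```

Note how teacher 111's age, stored twice in the original table, appears only once in teacher_details: eliminating that redundancy is the point of the decomposition.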
Transitive dependency in DBMS
A functional dependency is said to be transitive if it is formed indirectly from two other
functional dependencies. For example, X -> Z is a transitive dependency if the following
three conditions hold:
X -> Y
Y does not determine X (Y -/-> X)
Y -> Z
Note: A transitive dependency can only occur in a relation of three or more attributes.
This dependency helps us in normalizing the database into 3NF (Third Normal Form).
Book Author Author_age
Harry Potter J. K. Rowling 49
{Book} -> {Author} (if we know the book, we know the author's name)
{Author} -> {Author_age} (if we know the author, we know the author's age)
Therefore, by the rule of transitive dependency, {Book} -> {Author_age} should hold,
which makes sense: if we know the book name, we can know the author's age.
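One way to make the notion concrete is to check whether a functional dependency holds on actual rows: X -> Y holds if no two rows agree on X but differ on Y. The helper fd_holds and the column-index convention below are illustrative, not a standard API:

```python
# A sketch of checking whether a functional dependency X -> Y holds in a
# table: no two rows may agree on the X columns but differ on the Y columns.
def fd_holds(rows, x, y):
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in x)
        val = tuple(row[c] for c in y)
        # setdefault returns the value stored the first time this key was seen
        if seen.setdefault(key, val) != val:
            return False
    return True

# Columns: 0 = Book, 1 = Author, 2 = Author_age (data from the example).
books = [("Harry Potter", "J. K. Rowling", 49)]
assert fd_holds(books, x=(0,), y=(1,))  # Book -> Author
assert fd_holds(books, x=(1,), y=(2,))  # Author -> Author_age
assert fd_holds(books, x=(0,), y=(2,))  # transitive: Book -> Author_age
```

Such a check can only refute a dependency from sample data, never prove it in general, but it captures exactly the "same X implies same Y" reading of the definition.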
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
1001 stores
This is now in BCNF, as in both functional dependencies the left-hand side is a key.
A.2.1: Define the terms: database management system (DBMS) and relational database
management system (RDBMS).
DBMS: A database management system is software that stores, retrieves, and manipulates
the data in a database while applying appropriate security measures (see the definition
above).
RDBMS: A relational database management system is a DBMS based on the relational
model: data is stored in tables (relations) of rows and columns, and tables are linked to
one another through keys. MySQL, Oracle, and PostgreSQL are examples.
A.2.5: Identify the characteristics of the three levels of the schema: conceptual, logical,
physical.
A.2.9: Define the following database terms: table, record, field, primary key, secondary
key, foreign key, candidate key, composite primary key, join.
3. The Database Administrator (DBA), who is like the super-user of the system. The
individual in the organization who controls the design and the use of the database is
referred to as the DBA.
The role of the DBA is very important and is defined by the following functions.
Defining the schema: The DBA defines the schema which contains the structure of the
data in the application. The DBA determines what data needs to be present in the
system and how this data has to be represented and organized. Based on this schema,
the database will be created to store the required data for the organization.
Defining storage structure and access method: The DBA decides how the data is to be
represented in the stored database.
Liaising with users: The DBA needs to interact continuously with the users to
understand the data in the system and its use. The DBA determines which user needs
access to which part of the database.
Defining security and integrity checks: The DBA establishes the access restrictions to
be defined and defines security checks accordingly. Data integrity checks are also
defined by the DBA.
Defining backup/recovery procedures: The DBA also defines procedures for backup
and recovery. Defining backup procedures includes specifying what data is to be backed
up, the periodicity of taking backups, and also the medium and storage place for the
backup data.
Monitoring performance: The DBA has to continuously monitor the performance of
the queries and take measures to optimize all the queries in the application.
Assistance to application programmers: The DBA assists application programmers in
developing application programs.
The DBA roles listed above are the ones we learn from textbooks. In practice, the role of a
DBA differs from organization to organization. In any case, the essential tasks that many
DBAs are charged with are listed below:
1. Install and configure the database software (e.g. Oracle).
2. Take backups and plan for recovery of the database.
3. Design the database schema and create any necessary database objects.
4. Tune application SQL for optimal performance.
5. Enforce database security to safeguard the data.
6. Work closely with application developers and system administrators to ensure all
database needs are being met.
7. Apply patches and upgrade the database as required.
End users are the people whose jobs require access to the database for querying, updating,
and generating reports; the database primarily exists for their use. There are several
categories of end users, as follows:
1. Casual end users – These are users who access the database occasionally but require
different information each time. They use a sophisticated database query language to
specify their requests and are typically middle- or high-level managers or other occasional
browsers. These users learn only the few facilities of the DBMS that they use repeatedly.
2. Naive or parametric end users – These users make up a sizeable portion of database end
users. Their main job function revolves around constantly querying and updating the
database using standard types of queries known as canned transactions, which have been
programmed and tested in advance. These users need to learn very little about the
facilities provided by the DBMS; they only have to understand the user interfaces of the
standard transactions designed and implemented for their use. The following tasks are
typically performed by naive end users:
Bank tellers check account balances and post withdrawals and deposits.
Reservation clerks for airlines, railways, hotels, and car rental companies check
availability for a given request and make reservations.
Clerks at receiving stations for shipping companies enter package identifications via
barcodes and descriptive information through buttons, to update a central database of
received and in-transit packages.
3. Sophisticated end users – These include engineers, scientists, business analysts, and
others who thoroughly familiarize themselves with the facilities of the DBMS in order to
implement applications that meet their complex requirements. They try to learn most of
the DBMS facilities in order to achieve those requirements.
4. Standalone users – These users maintain personal databases by using ready-made
program packages that provide easy-to-use menu-based or graphics-based interfaces. An
example is the user of a tax package that stores a variety of personal financial data for tax
purposes. Such users become very proficient in using a specific software package.
A.3.3 Describe different methods of database recovery.
Database Recovery Techniques in DBMS
Database systems, like any other computer system, are subject to failures but the data stored
in it must be available as and when required. When a database fails it must possess the
facilities for fast recovery. It must also have atomicity i.e. either transactions are completed
successfully and committed (the effect is recorded permanently in the database) or the
transaction should have no effect on the database.
There are both automatic and non-automatic ways for both, backing up of data and recovery
from any failure situations. The techniques used to recover the lost data due to system crash,
transaction errors, viruses, catastrophic failure, incorrect commands execution etc. are
database recovery techniques. So to prevent data loss recovery techniques based on deferred
update and immediate update or backing up data can be used.
Recovery techniques are heavily dependent upon the existence of a special file known as a
system log. It contains information about the start and end of each transaction and any
updates which occur in the transaction. The log keeps track of all transaction operations that
affect the values of database items. This information is needed to recover from transaction
failure.
The log is kept on disk.
start_transaction(T): This log entry records that transaction T starts execution.
read_item(T, X): This log entry records that transaction T reads the value of database item X.
write_item(T, X, old_value, new_value): This log entry records that transaction T changes
the value of the database item X from old_value to new_value. The old value is sometimes
known as a before an image of X, and the new value is known as an afterimage of X.
commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the database.
abort(T): This records that transaction T has been aborted.
checkpoint: Checkpoint is a mechanism where all the previous logs are removed from the
system and stored permanently in a storage disk. Checkpoint declares a point before which
the DBMS was in consistent state, and all the transactions were committed.
A transaction T reaches its commit point when all its operations that access the database have
been executed successfully i.e. the transaction has reached the point at which it will not abort
(terminate without completing). Once committed, the transaction is permanently recorded in
the database. Commitment always involves writing a commit entry to the log and writing the
log to disk. At the time of a system crash, the log is searched backwards for all transactions
T that have written a start_transaction(T) entry into the log but have not written a
commit(T) entry yet; these transactions may have to be rolled back to undo their effect on
the database during the recovery process.
Undoing – If a transaction crashes, the recovery manager may undo it, i.e. reverse its
operations. This involves examining the log for each write_item(T, X, old_value,
new_value) entry of the transaction and setting the value of item X in the database back
to old_value.
There are two major techniques for recovery from non-catastrophic transaction failures:
deferred updates and immediate updates.
Deferred update – This technique does not physically update the database on disk until a
transaction has reached its commit point. Before reaching commit, all transaction updates are
recorded in the local transaction workspace. If a transaction fails before reaching its commit
point, it will not have changed the database in any way so UNDO is not needed. It may be
necessary to REDO the effect of the operations that are recorded in the local transaction
workspace, because their effect may not yet have been written to the database. Hence, a
deferred update is also known as the No-undo/redo algorithm.
Immediate update – In the immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However, these
operations are recorded in a log on disk before they are applied to the database, making
recovery still possible. If a transaction fails to reach its commit point, the effect of its
operation must be undone i.e. the transaction must be rolled back hence we require both undo
and redo. This technique is known as undo/redo algorithm.
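The immediate-update scheme can be sketched as a tiny write-ahead log: every change is logged with its old and new values before it is applied, so an uncommitted transaction can be rolled back by scanning the log in reverse and restoring old values. The dictionary database, log format, and function names are illustrative only:

```python
# A sketch of immediate update with a write-ahead log: the log entry is
# written BEFORE the database is modified, so failed transactions can be
# undone. All names and the log format are illustrative.
db = {"X": 5}
log = []

def write_item(t, item, new_value):
    log.append(("write_item", t, item, db[item], new_value))  # log first
    db[item] = new_value                                      # then apply

def rollback(t):
    # Undo: scan the log backwards, restoring the old value of each of
    # transaction t's writes.
    for entry in reversed(log):
        if entry[0] == "write_item" and entry[1] == t:
            _, _, item, old_value, _ = entry
            db[item] = old_value

log.append(("start_transaction", "T1"))
write_item("T1", "X", 99)   # database is modified before commit
rollback("T1")              # T1 failed before writing commit(T1)
print(db)                   # {'X': 5} -- the update has been undone
```

Because the old value was captured in the log before the write was applied, the undo pass can always restore the pre-transaction state, which is exactly why logging must precede modification in this scheme.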
Caching/Buffering – In this technique, one or more disk pages that include the data items
to be updated are cached into main-memory buffers and updated in memory before being
written back to disk. A collection of in-memory buffers, called the DBMS cache, is kept
under the control of the DBMS for holding these buffers. A directory is used to keep track
of which database items are in the buffers. A dirty bit is associated with each buffer: 0 if
the buffer has not been modified, 1 if it has.
Shadow paging – This technique provides atomicity and durability. A directory with n
entries is constructed, where the i-th entry points to the i-th database page on disk. When
a transaction begins executing, the current directory is copied into a shadow directory.
When a page is to be modified, a new page is allocated in which the changes are made,
and when the transaction is ready to become durable, all directory entries that refer to the
original pages are updated to refer to the new replacement pages.
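Shadow paging can be sketched with a page store and two directories: updates go to freshly allocated pages, and commit is a single swap of the directory. The names pages, current_dir, and shadow_dir are illustrative:

```python
# A sketch of shadow paging: updates go to newly allocated pages and
# commit is one atomic directory swap. All names are illustrative.
pages = {0: "old-A", 1: "old-B"}   # physical page store
current_dir = {0: 0, 1: 1}         # logical page number -> physical page

shadow_dir = dict(current_dir)     # transaction begins: copy the directory
new_page = max(pages) + 1          # allocate a fresh page for the update
pages[new_page] = "new-A"          # write the modified copy there
shadow_dir[0] = new_page           # the shadow directory points to it

# Before commit, readers using current_dir still see the old data:
assert pages[current_dir[0]] == "old-A"

current_dir = shadow_dir           # commit: atomically install the new directory
print(pages[current_dir[0]])       # new-A
```

If the system crashes before the final swap, the old directory still points at unmodified pages, which is how this scheme gets atomicity without an undo pass.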
Some of the backup techniques are as follows :
Full database backup – The full database, including the data and the database meta-
information needed to restore the whole database (including full-text catalogues), is
backed up at predefined intervals.
Differential backup – It stores only the data changes that have occurred since the last full
database backup. When the same data has changed many times since the last full backup,
a differential backup stores only the most recent version of the changed data. To restore
from it, we first need to restore the last full database backup.
Transaction log backup – In this, all events that have occurred in the database, such as a
record of every single statement executed, are backed up. It is the backup of the
transaction log entries and contains all transactions that have happened to the database.
Through this, the database can be recovered to a specific point in time. It is even possible
to perform a backup from the transaction log if the data files are destroyed, without losing
a single committed transaction.
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every
second. The durability and robustness of a DBMS depends on its complex architecture and its
underlying hardware and system software. If it fails or crashes amid transactions, it is
expected that the system would follow some sort of algorithm or techniques to recover lost
data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as
follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it
can’t go any further. This is called transaction failure where only a few transactions or
processes are hurt.
Reasons for a transaction failure could be −
Logical errors − Where a transaction cannot complete because it has some code error
or any internal error condition.
System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system
aborts an active transaction.
System crash: There are problems, external to the system, that may cause it to stop
abruptly and crash; for example, interruptions in the power supply may cause the failure
of the underlying hardware or software. Examples may include operating system errors.
Disk failure: In the early days of technology evolution, it was a common problem that
hard-disk drives or storage drives failed frequently. Disk failures include the formation of
bad sectors, unreachability of the disk, disk head crashes, or any other failure which
destroys all or part of the disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided
into two categories −
Volatile storage − As the name suggests, a volatile storage cannot survive system
crashes. Volatile storage devices are placed very close to the CPU; normally they are
embedded onto the chipset itself. For example, main memory and cache memory are
examples of volatile storage. They are fast but can store only a small amount of
information.
Non-volatile storage − These memories are made to survive system crashes. They are
huge in data storage capacity, but slower in accessibility. Examples may include hard-
disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity: When a system crashes, it may have several transactions being
executed and various files opened for them to modify the data items. Transactions are made
of various operations, which are atomic in nature. But according to ACID properties of
DBMS, atomicity of transactions as a whole must be maintained, that is, either all the
operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
It should check the states of all the transactions, which were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs to be rolled
back.
No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −
Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
Maintaining shadow paging, where the changes are done on a volatile memory, and
later, the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to the actual modification and stored
on a stable storage media, which is failsafe.
Log-based recovery works as follows −
The log file is kept on a stable storage media.
When a transaction enters the system and starts execution, it writes a log about it.
<Tn, Start>
When the transaction modifies an item X, it writes a log entry as follows −
<Tn, X, V1, V2>
It reads: transaction Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs −
<Tn, commit>
The database can be modified using two approaches −
Deferred database modification − All logs are written on to the stable storage and
the database is updated when a transaction commits.
Immediate database modification − Each log follows an actual database
modification. That is, the database is modified immediately after every operation.
Recovery with concurrent transactions: When more than one transaction is being executed
in parallel, the logs are interleaved. At recovery time, it would become hard for the
recovery system to backtrack through all the logs and then start recovering. To ease this
situation, most modern DBMSs use the concept of 'checkpoints'.
Checkpoint: Keeping and maintaining logs in real time and in real environment may fill out
all the memory space available in the system. As time passes, the log file may grow too big to
be handled at all. Checkpoint is a mechanism where all the previous logs are removed from
the system and stored permanently in a storage disk. Checkpoint declares a point before
which the DBMS was in consistent state, and all the transactions were committed.
Recovery: When a system with concurrent transactions crashes and recovers, it behaves in the
following manner −
The recovery system reads the logs backwards from the end to the last checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn,
Commit>, it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log entry, it
puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the
transactions in the redo-list are redone and their logs are retained.
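The backward scan described above can be sketched directly: read the log in reverse until the last checkpoint, classifying each transaction as committed (redo) or unfinished (undo). The tuple-based log format and transaction names are illustrative:

```python
# A sketch of recovery with checkpoints: scan the log backwards from the
# end to the last checkpoint, putting committed transactions in the
# redo-list and unfinished ones in the undo-list. Log format is illustrative.
log = [
    ("checkpoint",),
    ("start", "T1"), ("start", "T2"),
    ("commit", "T1"),
    ("start", "T3"),
    # crash here: T2 and T3 never committed
]

redo_list, undo_list = [], []
finished = set()
for record in reversed(log):
    if record[0] == "checkpoint":
        break  # nothing before the checkpoint needs recovery
    if record[0] in ("commit", "abort"):
        finished.add(record[1])
    elif record[0] == "start":
        if record[1] in finished:
            redo_list.append(record[1])
        else:
            undo_list.append(record[1])

print(sorted(redo_list), sorted(undo_list))  # ['T1'] ['T2', 'T3']
```

Scanning backwards means a transaction's commit record is always seen before its start record, which is why a single pass suffices to classify every transaction since the checkpoint.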