BCA2 Dbmsnotes
BCA2 Dbmsnotes
BCA2 Dbmsnotes
system
unit -1
Contents
• What is Data, information and knowledge
• Introduction to database
• How data managed with databases
• Problem
• Purpose of database
• Advantage of database
• Introduction to DBMS
What is data, information and knowledge
Data
Data are the raw materials,pieces of facts and figures Or unprocessed data.
(like for example 33 is the data, Ram is the data, bricks, sand are the data).
Data - facts can be recorded in the form of text, speech, video, audio, any numbers and images.
Data (singular: date) are understood differently in different sectors. In the basic form, data are
different symbols and characters whose meaning only becomes clear when they connect with context.
Collecting and measuring observations generates data. Usually machines sent, receive and
process data. The confusion between data and information often arises because information is made out of
data.
Information
Information is data that has been processed in such a way as to be meaningful to the person
who receives it.
Information is data that has structure and context.
Data reaches a more complex level and becomes information by integrating them to a context.
Information provides expertise about facts or persons.
Example of information: The information about a date of birth still has very little value when it
is unknown to which person it belongs. By adding more information like the name, the linked
information creates knowledge about a person.
Knowledge
Knowledge is basically what a person knows. This is known as their knowledge base.
Knowledge is information that has been aggregated and analyzed and can be used for making
decisions.
Knowledge thus describes the collected information that is available about a particular fact or a person.
The knowledge of this situation makes it possible to make informed decisions and solve problems.
Thus, knowledge influences the thinking and actions of people.
Machines can also make decisions based on new knowledge generated by information. In order to gain
knowledge, it is necessary to process information.
Introduction to database
Application
Application
Database
System
Structured data which is
managed by DBMS
Database
Database
Purpose of database
It provides protection and security to the database. In the case of multiple users, it also maintains data
consistency
Characteristics of DBMS
It uses a digital repository established on a server to store and manage the information.
It can provide a clear and logical view of the process that manipulates data.
It contains ACID properties which maintain data in a healthy state in case of failure.
It can view the database from different viewpoints according to the requirements of the user.
Advantages of DBMS
Controls database redundancy: It can control data redundancy because it stores all the data in one single
database file and that recorded data is placed in the database.
Data sharing: In DBMS, the authorized users of an organization can share the data among multiple users.
Easily Maintenance: It can be easily maintainable due to the centralized nature of the database system.
Backup: It provides backup and recovery subsystems which create automatic backup of data from hardware
and software failures and restores the data if required.
multiple user interface: It provides different types of user interfaces like graphical user interfaces,
application program interfaces
DBMS vs File System
DBMS is a collection of data. In DBMS, the user is not required to write the procedures.
File system is a collection of data. In this system, the user has to write the procedures for
managing the database.
DBMS takes care of Concurrent access of data using some form of locking.
In the File system, concurrent access has many problems like redirecting the file
while other deleting some information or updating some information.
DBMS provides a crash recovery mechanism, i.e., DBMS protects the user from the system failure.
File system doesn't have a crash mechanism, i.e., if the system crashes while entering
some data, then the content of the file will lost.
DBMS contains a wide variety of sophisticated techniques to store and retrieve the data.
File system can't efficiently store and retrieve the data.
Data processing
Data is processed to get the required results. For the purpose, different operations
may be performed on data. Therefore, data processing is defined as: " a sequence of
operations on data to convert it into useful information". The important operations that
can be performed on data are:
Physical Level
This is the lowest level in the three level architecture. It is also known as the internal level. The physical
level describes how data is actually stored in the database. In the lowest level, this data is stored in the
external hard drives in the form of bits and at a little high level, it can be said that the data is stored in
files and folders. The physical level also discusses compression and encryption techniques.
Conceptual Level
The conceptual level is at a higher level than the physical level. It is also known as the logical level. It
describes how the database appears to the users conceptually and the relationships between various data
tables. The conceptual level does not care for how the data in the database is actually stored.
External Level
This is the highest level in the three level architecture and closest to the user. It is also known as the
view level. The external level only shows the relevant database content to the users in the form of views
and hides the rest of the data. So different users can see the database as a different view as per their
individual requirements.
Data Independence
Logical data independence refers characteristic of being able to change the conceptual schema without
having to change the external schema.
Logical data independence is used to separate the external level from the conceptual view.
If we do any changes in the conceptual view of the data, then the user view of the data would not be
affected.
•Physical data independence can be defined as the capacity to change the internal
schema without having to change the conceptual schema.
•If we do any changes in the storage size of the database system server, then the
Conceptual structure of the database will not be affected.
•Physical data independence is used to separate conceptual levels from the internal
levels.
The role of the DBA is very important and is defined by the following functions.
•Defining the Schema The DBA defines the schema which contains the structure of the data in
the application. The DBA determines what data needs to be present in the system ad how this data
has to be represented and organized. As per this construction, database will be produced to store
required information for an association.
•Defining Storage Structure and Access Method: The DBA chooses how the information is to
be spoken to in the put away database.
•Liaising with Users The DBA needs to interact continuously with the users to understand the
data in the system and its use. The DBA figures out which client needs access to which part of the
database
•Defining Security & Integrity Checks The DBA finds about the access restrictions to be
defined and defines security checks accordingly. Data Integrity checks are also defined by the
DBA.
•Defining Backup / Recovery Procedures The DBA also defines procedures for backup and
recovery. Defining backup procedures includes specifying what data is to backed up, the
periodicity of taking backups and also the medium and storage place for the backup data.
•Monitoring Performance The DBA has to continuously monitor the performance of the queries
and take measures to optimize all the queries in the application.
A Database stores critical information and helps access data quickly and securely. Therefore,
selecting the correct Architecture of DBMS helps in easy and efficient data management.
Query processor
The storage manager is important because databases typically require a large amount of storage space.
Corporate databases range in of data. A gigabyte is 1000 megabytes (1 billion bytes), and a terabyte is 1 million
megabytes (1 trillion bytes), since the main memory of computers cannot store this much information, the
information is stored on disks. Data are moved between disk storage and main memory of computers cannot
store this much information, the information is stored on disks. Data are moved between disk storage and main
memory as needed. Since the movement of data to and from disk is slow relative to the speed of the central
processing unit, it is imperative that the database system structure the data so as to minimize the need to move
data between disk and main memory.
The Query Processor
is important because it helps the database system simplify and facilitate access to data. High level views
help to achieve this goal; with them, users of the system are not be burdened unnecessarily with the physical
details of the implementation of the system. However, quick processing of updates and queries is important.
It is the job of the database system to translate updates and queries is important. It is the job of the database
system to translate updates and queries written in a nonprocedural language, at the logical level, into an
efficient sequence of operations at the physical level.
Storage Manager
A storage manager is a program module that provides the interface between the low level data stored in the
database and the application programs and queries submitted to the system. The storage manager is
responsible for the interaction with the file manager. The raw data are stored on the disk using the file
system, which is usually provided by a conventional operating system. The storage manager translates the
various DML statements into low level file system commands. Thus, the storage manager is responsible for
storing, retrieving and updating data in the database.
File Manager
Which manages the allocation of space on disk storage and the data structures used
torepresent information stored on disk.
Buffer manager
Which is responsible for fetching data from disk storage into main memory, and deciding what
data to cache in main memory. The buffer manager is a critical part of the database system, since it
enables the database to handle data sizes that are much larger than the size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
DML compiler, which translates DML statements in a query language into an evaluation plan consisting of
low level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization, that is it picks the lowest cost evaluation plan
from among the alternatives.
Query evaluation engine, which executes low level instructions generated by the DML compiler
Types of Database Users in DBMS
There are two types of database users, Users, and Administrators.
Database Users
Database users are the ones who really use and take the benefits of the database. There will be different
types of users depending on their needs and way of accessing the database.
Application Programmers – They are the developers who interact with the database by means of DML
queries. These DML queries are written in the application programs like C, C++, JAVA, Pascal, etc. These
queries are converted into object code to communicate with the database. For example, writing a C
program to generate the report of employees who are working in a particular department will involve a
query to fetch the data from the database. It will include an embedded SQL query in the C Program.
Sophisticated Users – They are database developers, who write SQL queries to select/insert/delete/update
data. They do not use any application or programs to request the database. They directly interact with the
database by means of a query language like SQL. These users will be scientists, engineers, analysts who
thoroughly study SQL and DBMS to apply the concepts in their requirements. In short, we can say this
category includes designers and developers of DBMS and SQL.
Specialized Users – These are also sophisticated users, but they write special database application
programs. They are the developers who develop the complex programs to the requirement.
Stand-alone Users – These users will have a stand-alone database for their personal use. These kinds of
the database will have readymade database packages which will have menus and graphical interfaces.
Native Users – these are the users who use the existing application to interact with the database. For
example, online library system, ticket booking systems, ATMs etc which has existing application and
users use them to interact with the database to fulfill their requests
What is Data Dictionary
A data dictionary contains metadata i.e data about the database. The data dictionary is very
important as it contains information such as what is in the database, who is allowed to access it, where is
the database physically stored etc. The users of the database normally don't interact with the data
dictionary, it is only handled by the database administrators.
Physical information about the tables such as where they are stored and how.
Table constraints such as primary key attributes, foreign key information etc.
Active Data Dictionary
If the structure of the database or its specifications change at any point of time, it should be reflected
in the data dictionary. This is the responsibility of the database management system in which the data
dictionary resides. So, the data dictionary is automatically updated by the database management
system when any changes are made in the database. This is known as an active data dictionary as it is
self updating.
DDL stands for Data Definition Language. It is used to define database structure or pattern.
It is used to create schema, tables, indexes, constraints, etc. in the database.
Using the DDL statements, you can create the skeleton of the database.
Data definition language is used to store the information of metadata like the number of tables and schemas,
their names, indexes, columns in each table, constraints, etc.
These commands are used to update the database schema that's why they come under Data definition
language.
Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a
database. It handles user requests.
.
Why use Data Model?
The primary goal of using data model are:
Ensures that all data objects required by the database are accurately represented. Omission of data will lead
to creation of faulty reports and produce incorrect results.
A data model helps design the database at the conceptual, physical and logical levels.
Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures.
It provides a clear picture of the base data and can be used by database developers to create a physical
database.
It is also helpful to identify missing and redundant data.
The very first data model could be flat data-models, where all the data used are to be kept in the same
plane. Earlier data models were not so scientific, hence they were prone to introduce lots of duplication
and update anomalies.
Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships among
them. While formulating real-world scenario into the database model, the ER Model creates entity set,
relationship set, general attributes and constraints.
Entity − An entity in an ER Model is a real-world entity having properties called attributes. Every attribute
is defined by its set of values called domain. For example, in a school database, a student is considered as
an entity. Student has various attributes like name, age, class, etc.
Relationship − The logical association among entities is called relationship. Relationships are mapped with
entities in various ways. Mapping cardinalities define the number of association between two entities.
Mapping cardinalities −
one to one
one to many
many to one
many to many
Relational Model
The most popular data model in DBMS is the Relational Model. It is more scientific a model than others.
This model is based on first-order predicate logic and defines a table as an n-ary relation.
An extension of the ER model with notions of functions, encapsulation, and object identity, as well. This
model supports a rich type system that includes structured and collection types. Thus, in 1980s, various
database systems following the object-oriented approach were developed. Here, the objects are nothing
but the data carrying its properties.
4) Semistructured Data Model:
This type of data model is different from the other three data models (explained above). The semistructured
data model allows the data specifications at places where the individual data items of the same type may have
different attributes sets. The Extensible Markup Language, also known as XML, is widely used for
representing the semistructured data. Although XML was initially designed for including the markup
information to the text document, it gains importance because of its application in the exchange of data.
Advantages and Disadvantages of Data Model:
Advantages of Data model:
The main goal of a designing data model is to make certain that data objects offered by the functional team
are represented accurately.
The data model should be detailed enough to be used for building the physical database.
The information in the data model can be used for defining the relationship between tables, primary and
foreign keys, and stored procedures.
Data Model helps business to communicate the within and across organizations.
To develop Data model one should know physical data stored characteristics.
This is a navigational system produces complex application development, management. Thus, it requires a
knowledge of the biographical truth.
Even smaller change made in structure require modification in the entire application.
There is no set data manipulation language in DBMS.
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire database. It
defines how the data is organized and how the relations among them are associated. It formulates
all the constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. It’s the database
designers who design the schema to help programmers understand the database and make it useful.
A database schema can be divided broadly into two
categories −
Physical Database Schema − This schema pertains to the actual storage of data and its form of storage
like files, indices, etc. It defines how the data will be stored in a secondary storage.
Logical Database Schema − This schema defines all the logical constraints that need to be applied on
the data stored. It defines tables, views, and integrity constraints.
Database Instance
It is important that we distinguish these two terms individually. Database schema is the skeleton of
database. It is designed when the database doesn't exist at all. Once the database is operational, it is very
difficult to make any changes to it. A database schema does not contain any data or information.
A database instance is a state of operational database with data at any given time. It contains a snapshot
of the database. Database instances tend to change with time. A DBMS ensures that its every instance
(state) is in a valid state, by diligently following all the validations, constraints, and conditions that the
database designers have imposed.
What is the ER Model?
ENTITY RELATIONAL (ER) MODEL is a high-level conceptual data model diagram. ER modeling
helps you to analyze data requirements systematically to produce a well-designed database. The Entity-
Relation model represents real-world entities and the relationship between them. It is considered a best
practice to complete ER modeling before implementing your database.
ER modeling helps you to analyze data requirements systematically to produce a well-designed database.
So, it is considered a best practice to complete ER modeling before implementing your database.
History of ER models
ER diagrams are a visual tool which is helpful to represent the ER model. It was proposed by Peter Chen
in 1971 to create a uniform convention which can be used for relational database and network. He aimed
to use an ER model as a conceptual modeling approach.
What is ER Diagrams?
A real-world thing either living or non-living that is easily recognizable and nonrecognizable. It is
anything in the enterprise that is to be represented in our database. It may be a physical thing or simply
a fact about the enterprise or an event that happens in the real world.
An entity can be place, person, object, event or a concept, which stores data in the database. The
characteristics of entities are must have an attribute, and a unique key. Every entity is made up of some
'attributes' which represent that entity.
Examples of entities:
•Person: Employee, Student, Patient
•Place: Store, Building
•Object: Machine, product, and Car
•Event: Sale, Registration, Renewal
•Concept: Account, Course
Notation of an Entity
Entity set:
Student
An entity set is a group of similar kind of entities. It may contain entities with attribute
sharing similar values. Entities are represented by their properties, which also called
attributes. All attributes have their separate values. For example, a student entity may
have a name, age, class, as attributes.
Example of Entities:
A university may have some departments. All these departments employ
various lecturers and offer several programs.
Some courses make up each program. Students register in a particular
program and enroll in various courses. A lecturer from the specific
department takes each course, and each lecturer teaches a various group
of students.
Relationship
Relationship is nothing but an association among two or more entities. E.g., Tom works
in the Chemistry department.
Entities take part in relationships. We can often identify relationships with verbs or
verb phrases.
For example:
•You are attending this lecture
•I am giving the lecture
•Just loke entities, we can classify relationships according to relationship-types:
•A student attends a lecture
•A lecturer is giving a lecture.
Weak Entities
A weak entity is a type of entity which doesn't have its key attribute. It can be
identified uniquely by considering the primary key of another entity. For that,
weak entity sets need to have participation.
In aboe example, "Trans No" is a discriminator within a group of transactions in
an ATM.
Entity
Attributes
Every ellipse represents one attribute and is directly connected to its
entity (rectangle).
One-to-one
Many-to-many
What are Keys in DBMS?
KEYS in DBMS is an attribute or set of attributes which helps you to identify a
row(tuple) in a relation(table). They allow you to find the relation between
two tables. Keys help you uniquely identify a row in a table by a combination
of one or more columns in that table. Key is also helpful for finding unique
record or row from the table. Database key is also helpful for finding unique
record or row from the table.
Example:
Employee FirstName LastName
ID
11 Andrew Johnson
22 Tom Wood
33 Alex Hale
•Primary Key - is a column or group of columns in a table that uniquely identify every row in that
table.
•Candidate Key - is a set of attributes that uniquely identify tuples in a table. Candidate Key is a super
key with no repeated attributes.
•Alternate Key - is a column or group of columns in a table that uniquely identify every row in that
table.
•Foreign Key - is a column that creates a relationship between two tables. The purpose of Foreign keys
is to maintain data integrity and allow navigation between two different instances of an entity.
•Compound Key - has two or more attributes that allow you to uniquely recognize a specific record. It
is possible that each column may not be unique by itself within the database.
•Composite Key - An artificial key which aims to uniquely identify each record is called a
surrogate key. These kind of key are unique because they are created when you don't have any
natural primary key.
•Surrogate Key - An artificial key which aims to uniquely identify each record is called a
surrogate key. These kind of key are unique because they are created when you don't have any
natural primary key.
Example:
In the following example, <code>StudID</code> is a Primary Key.
Example:
In this table, StudID, Roll No, Email are qualified to become a primary key. But since StudID is the
primary key, Roll No, Email becomes the alternative key.
CANDIDATE KEY is a set of attributes that uniquely identify tuples in a table. Candidate Key is a
super key with no repeated attributes. The Primary key should be selected from the candidate keys.
Every table must have at least a single candidate key. A table can have multiple candidate keys but
only a single primary key.
Properties of Candidate key:
•It must contain unique values
•Candidate key may have multiple attributes
•Must not contain null values
•It should contain minimum fields to ensure uniqueness
•Uniquely identify each record in a table
Example: In the given table Stud ID, Roll No, and email are candidate keys which help us to
uniquely identify the student record in the table.
FOREIGN KEY is a column that creates a relationship between two tables. The purpose of Foreign
keys is to maintain data integrity and allow navigation between two different instances of an entity. It
acts as a cross-reference between two tables as it references the primary key of another table.
In this key in dbms example, we have two table, teach and department in a school. However,
there is no way to see which search work in which department.
In this table, adding the foreign key in Deptcode to the Teacher name, we can create a
relationship between the two tables.
Teacher ID DeptCode Fname Lname
•ER diagrams are a visual tool which is helpful to represent the ER model
•Entity relationship diagram displays the relationships of entity set stored in a database
•ER diagrams help you to define terms related to entity relationship modeling
•ER model is based on three basic concepts: Entities, Attributes & Relationships
•An entity can be place, person, object, event or a concept, which stores data in the database
•A weak entity is a type of entity which doesn't have its key attribute
•It helps you to defines the numerical attributes of the relationship between two entities or entity sets
•ER- Diagram is a visual representation of data that describe how data is related to each other
•While Drawing ER diagram you need to make sure all your entities and relationships are properly
labeled.
Thankyou