dbms unit1
dbms unit1
What is DBMS?
A DBMS is a software that allows creation, definition and manipulation of database, allowing
users to store, process and analyse data easily. DBMS provides us with an interface or a tool, to
perform various operations like creating database, storing data in it, updating data, creating tables
in the database and a lot more.
DBMS also provides protection and security to the databases. It also maintains data
consistency in case of multiple users.
Here are some examples of popular DBMS used these days:
● MySql
● Oracle
● SQL Server
● IBM DB2
● PostgreSQL
● Amazon SimpleDB (cloud based) etc.
Goals of DBMS
Providing a way to store and retrieve database information that is both convenient and efficient.
Ensuring the safety of the information.
What is file based system? Write its advantages and disadvantages.
A file based system is a collection of application programs that perform services for the users
wishing to access information. Each program within a file based system defines and manages its
own data. Because of this, there are limits as to how that data can be used or transported.
File based systems were developed as better alternatives to paper based filing systems. By having
files stored on computers, the data could be accessed more efficiently. It was common practice for
larger companies to have each of its departments looking after its own data.
What is role of a DBMS, and what are its advantages? What ate its disadvantages?
The DBMS serves as the intermediary between the user and the database. The database structure
itself is stored as a collection of files, and the only way to access the data in those files is through
the DBMS. The DBMS receives all the application requests and translates them into the complex
operations required to fulfill those requests.
Advantages of DBMS :
1.Reduces data redundancy : The DBMS do not maintain separate copies of the some data.
Rather, all the data are kept at one place and all the applications that require data refer to the
centrally maintained database. Now, if any change is to be made to data, it will be made at just one
place and the same changed information will be available to all the applications referring to it. Thus
redundancy gets controlled and so are the problems associated with it.
2.Improved data sharing : The DBMS helps creates an environment in which end users have
better access to more data and better-managed data. Such access makes its possible for end user
to respond quickly to changes in their environment.
3.Improved data security : The more users access data, the greater the risks of data security
breaches. A DMBS provides a framework for better enforcement of data privacy and security
policies.
4.Better data integration : Wider access to well-managed data promotes an integrated view of the
organization’s operations. It becomes much easier to see how actions in one segment of the
company affect other segment.
5.Minimized data inconsistency : Data inconsistency exists when different versions of the same
data appear in different places. The probability of data inconsistency is greatly reduced in a properly
designed database.
6.Improved data access : The DBMS makes it possible to produce quick answers to queries. From
a database perspective, a query is a specific
request issued to the DBMS for data manipulation. The DBMS sends back an answer to the
application very quickly.
7.Improved decision making : Better-managed data and improved data access makes it possible
to generate better quality information, on which better decisions are based.
8.Increased end-user productivity : The availability of data, combined with the tools that
transform data into usable information, empowers end users to make quick, informed decisions that
can make difference between success and failure in the global economy.
Although the DBMS yields considerable advantages over previous data management approaches,
DBMS do carry significant disadvantages.
Disadvantages of DBMS :
Increased costs : Database systems require sophisticated hardware and software and highly
skilled personnel. The cost maintaining the hardware,
software and personnel required to operate and manage a database system can be substantial.
Training, licensing and regulation compliance costs are often overlooked when database system are
implemented.
Management complexity : Database system interface with different
technologies and have a significant impact on the company’s resources and culture. Given the fact
that database system hold crucial company data that are accessed from multiple sources, security
issues must be accessed constantly.
Maintaining currency : To maximize the efficiency of the database system, you must keep your
system current. Therefore you must perform frequent updates and apply the latest patches and
security measures to all components.
Vender dependence : Given the heavy investment in technology and personnel training,
companies might be reluctant to change database venders. As a consequence, venders are less
likely to offer pricing point advantages to existing customers, and those customers might be limited
in their choice of database system components.
File System DBMS
A file system is a software that manages and DBMS or Database Management System is
organizes the files in a storage medium. It a software application. It is used for
controls accessing,
how data is stored and retrieved. creating, and managing databases.
The file system provides the details of data DBMS gives an abstract view of data that
representation and storage of data. hides the details
Storing and retrieving of data can’t be done DBMS is efficient to use as there are a wide
efficiently in a file system. variety of methods to store and retrieve
data.
It does not offer data recovery processes. There is a backup recovery for data in
DBMS.
The file system doesn’t have a crash recovery
DBMS provides a crash recovery
mechanism.
mechanism
Protecting a file system is very difficult. DBMS offers good protection mechanism.
In a file management system, the redundancy
The redundancy of data is low in the DBMS
of data is greater.
system.
Data inconsistency is low in a database
Data inconsistency is higher in the file system.
management system.
The file system offers lesser security. Database Management System offers high
security.
File System allows you to stores the data as Database Management System stores data
isolated data files and entities. as well as defined constraints and
interrelation.
Not provide support for complicated Easy to implement complicated
transactions. transactions.
The centralization process is hard in File Centralization is easy to achieve in the
Management System. DBMS system.
It doesn’t offer backup and recovery of data if DBMS system provides backup and
it is lost. recovery of data even if it is lost.
There is no efficient query processing in the You can easily query data in a database
file system. using the SQL language.
These system doesn’t offer concurrency. DBMS system provides a concurrency
facility.
Frequent update/replacement cycles : DBMS vendors frequently upgrade their products by
adding new functionality. Such new features often come bundled in new upgrade versions of the
software. Some of these versions require hardware upgrades. Not only do the upgrades themselves
cost money, but it also costs money to train database users and administrators to properly use and
manage the new features.
DBMS Approach:
In order to remove all limitations of the File Based Approach, a new approach was required
that must be more effective known as Database approach.
The Database is a shared collection of logically related data, designed to meet the
information needs of an organization. A database is a computer based record keeping system
whose over all purpose is to record and maintains information.
The database is a single, large repository of data, which can be used simultaneously by
many departments and users. Instead of disconnected files with redundant data, all data items are
integrated with a minimum amount of duplication.
The database is no longer owned by one department but is a shared corporate resource. The
database holds not only the organization's operational data but also a description of this data. For
this reason, a database is also defined as a self-describing collection of integrated records.
The description of the data is known as the Data Dictionary or Meta Data (the 'data about
data'). It is the self-describing nature of a database that provides program-data independence.
In the DBMS approach, application program written in some programming language like
Java, Visual Basic.Net, and Developer 2000 etc. uses database connectivity to access the database
stored in the disk with the help of operating system's file management system.
Database Models:
A Database model defines the logical design and structure of a database and defines how
data will be stored, accessed and updated in a database management system. While
the Relational Model is the most widely used database model, there are other models too:
● Hierarchical Model
● Network Model
● Entity-relationship Model
● Relational Model
● Object oriented database
Hierarchical Model
This database model organises data into a tree-like-structure, with a single root, to which all
the other data is linked. The heirarchy starts from the Root data, and expands like a tree, adding
child nodes to the parent nodes.
In this model, a child node will only have a single parent node.
This model efficiently describes many real-world relationships like index of a book, recipes etc.In
hierarchical model, data is organised into tree-like structure with one one-to-many relationship
between two different types of data, for example, one department can have many courses, many
professors and of-course many students.
Network Model
This is an extension of the Hierarchical model. In this model data is organised more like a
graph, and are allowed to have more than one parent node.
In this database model data is more related as more relationships are established in this
database model. Also, as the data is more related, hence accessing the data is also easier and fast.
This database model was used to map many-to-many data relationships.
This was the most widely used database model, before Relational Model was introduced.
Entity-relationship Model
In this database model, relationships are created by dividing object of interest into entity and
its characteristics into attributes.
Different entities are related using relationships.E-R Models are defined to represent the
relationships into pictorial form to make it easier for different stakeholders to understand.
This model is good to design a database, which can then be turned into tables in relational
model(explained below).
Let's take an example, If we have to design a School Database, then Student will be
an entity with attributes name, age, address etc. As Address is generally complex, it can be
another entity with attributes street name, pincode, city etc, and there will be a relationship
between them.
Relationships can also be of different types.
Relational Model
In this model, data is organised in two-dimensional tables and the relationship is maintained
by storing a common field.
This model was introduced by E.F Codd in 1970, and since then it has been the most widely
used database model, infact, we can say the only database model used around the world.
The basic structure of data in the relational model is tables. All the information related to a
particular type is stored in rows of that table.
Hence, tables are also known as relations in relational model.
Object oriented database – The information here is in the form of the object as used in object oriented
programming. It adds the database functionality to object programming languages. It requires less
code, use more natural data and also code bases are easy to maintain. Examples are ObjectDB
(ObjectDB software).
DBMS ARCHITECTURE
DatabaseUsers: Users are differentiated by the way they expect to interact with the system:
Naïve users: Also called Unsophisticated users, who have zero knowledge of database system.
(ex: Clerk in bank) They interact with the system by invoking one of the application programs that
have been written previously.
Application programmers:
Programmers who write software using tools such as Java, . Net, PHP etc… (ex: Software
developers)
Sophisticated users: Sophisticated users interact with the system without writing programs.
Instead, they form their requests in a database query language (SQL).
Database Administrator (Specialized Users): write specialized database applications program.
Coordinates all the activities of the database system. Query Processor: Query processors come
with the following components, DDL Interpreter: It interprets the DDL statements and records the
definitions in data dictionary.
DML Compiler: It translates the DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation understands. It also performs query
optimization which actually picks up the lowest cost evaluation plan from various alternatives.
Query Evaluation Engine: It executes the low-level instruction compiled by the DML compiler.
Storage Manager: The storage manager is the component of a database system that provides the
interface between the low level data stored in the database and the application programs and
queries submitted to the system.
The storage manager components include:
Authorization and Integrity Manager: It tests the integrity constraints and checks the authorization
of users to access data.
Transaction Manager: It ensures that no kind of change will be brought to the database until a
transaction has been completed totally
File Manager: It manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
Buffer Manager: It decides which data is in need to be cached in main memory and then fetch it up
in main memory. This is very important as it defines the speed in which the database can be used.
Data structures implemented by storage manager.
Data files: Stored in the database itself.
Data dictionary: Stores metadata about the structure of the database.
Indices: Provide fast access to data items.
2-tier architecture:
Two tier architecture is similar to a basic client-server model. The application at the client
end directly communicates with the database at the server side. API’s like ODBC,JDBC are used for
this interaction.
The server side is responsible for providing query processing and transaction management
functionalities. On the client side, the user interfaces and application programs are run.
The application on the client side establishes a connection with the serverside in order to
communicate with the DBMS.
An advantage of this type is that maintenance and understanding is easier, compatible with existing
systems. However this model gives poor performance when there are a large number of users.
3-tier architecture:
In this type, there is another layer between the client and the server. The client does not
directly communicate with the server. Instead, it interacts with an application server which further
communicates with the database system and then the query processing and transaction
management takes place.
This intermediate layer acts as a medium for exchange of partially processed data between
server and client. This type of architecture is used in case of large web applications.
ANSI-SPARC Architecture:
The ANSI-SPARC Architecture, where ANSI-SPARC stands for American National
Standards Institute, Standards Planning And Requirements Committee, is an abstract design
standard for a Database Management System (DBMS), first proposed in 1975.
The ANSI-SPARC model however never became a formal standard. No mainstream DBMS
systems are fully based on it (they tend not to exhibit full physical independence or to prevent direct
user access to the conceptual level), but the idea of logical data independence is widely adopted.
What are types of Data Independence in DBMS?
Data Independence is defined as a property of DBMS that helps you to change the Database
schema at one level of a database system without requiring to change the schema at the next
higher level. Data independence helps you to keep data separated from all programs that make use
of it.
In DBMS there are two types of data independence
1. Physical data independence
2.Logical data independence.
Physical data independence helps you to separate conceptual levels from the internal/physical
levels. It allows you to provide a logical description of the database without the need to specify
physical structures. Compared to Logical Independence, it is easy to achieve physical data
independence. With Physical independence, you can easily change the physical storage structures
or devices with an effect on the conceptual schema. Any change done would be absorbed by the
mapping between the conceptual and internal levels.
1.Due to Physical independence, any of the below change will not affect the conceptual layer.
2.Using a new storage device like Hard Drive or Magnetic Tapes 3. Modifying the file organization
technique in the Database 4. Switching to different data structures.
5.Changing the access method
6.Modifying indexes.
7.Changes to compression techniques or hashing algorithms. 8.Change of Location of Database
from say C drive to D Drive
Logical Data Independence is the ability to change the conceptual scheme without changing
External views, External API or programs,
Due to Logical independence, any of the below change will not affect the external layer.
1.Add/Modify/Delete a new attribute, entity or relationship is possible without a rewrite of existing
application programs
2.Merging two records into one
3.Breaking an existing record into two or more records
Data Abstraction
Database systems comprise complex data-structures. In order to make the system efficient in terms
of retrieval of data, and reduce complexity in terms of usability of users, developers use abstraction
i.e., hide irrelevant details from the users. This approach simplifies database design.
There are mainly 3 levels of data abstraction:
Physical: This is the lowest level of data abstraction. It tells us how the data is actually stored in
memory. The access methods like sequential or random access and file organization methods like
B+ trees, hashing used for the same. Usability, size of memory, and the number of times the records
are factors that we need to know while designing the database.
Suppose we need to store the details of an employee. Blocks of storage and the amount of memory
used for these purposes are kept hidden from the user.
Logical: This level comprises the information that is actually stored in the database in the form of
tables. It also stores the relationship among the data entities in relatively simple structures. At this
level, the information available to the user at the view level is unknown.
We can store the various attributes of an employee and relationships, e.g. with the manager can
also be stored.
View: This is the highest level of abstraction. Only a part of the actual database is viewed by the
users. This level exists to ease the accessibility of the database by an individual user. Users view
data in the form of rows and columns. Tables and relations are used to store data. Multiple views of
the same database may exist. Users can just view the data and interact with the database, storage
and implementation details are hidden from them.
The main purpose of data abstraction is to achieve data independence in order to save time and
cost required when the database is modified or altered.
DATABASE DESIGN
1. Requirement Analysis: - This step gathers data requirements from different set of users.
2. Conceptual Database Design: - Once the requirements have been collected and analysed, the
next step is to create a conceptual design using a high-level conceptual data model (ER Model).
The conceptual schema is a concise description of the data requirements of the users and includes
detailed descriptions of the entity types, relationships, and constraints.
3. Schema Refinement: - Schema Refinement is an approach that is used to refine the schema
based on decomposition. Data Redundancy is the main problem that will lead to schema
refinement. To eliminate redundancy, we use concept of normalization.
4. Logical Database Design: - The next step in database design is the actual implementation of
the database, using a commercial DBMS such as the relational (RDBMS or SQL) model. so, the
conceptual schema is transformed from the high-level data model (E-R model) into the
implementation data model. This step is called logical design.
5. Physical Database Design: - Physical design phase, during which the internal storage
structures, file organizations, indexes, access paths, and physical design parameters for the
database files are specified
Database Applications
• Railway Reservation System
• Library Management System
• Banking
• Education Sector
• Credit card exchanges
• Airlines
• Finance
1. Instances :
Instances are the collection of information stored at a particular moment. The instances can be
changed by certain CRUD operations as like addition, deletion of data. It may be noted that any
search query will not make any kind of changes in the instances.
2. SCHEMA
Schema is the overall description of the database. The basic structure of how the data will be stored
in the database is called schema. The overall design of a database is called schema.
A database schema is the skeleton structure of the database. It represents the logical view of the
entire database.
Entity-Relationship Model
1) What is Entity Relationship Model?
A) An Entity–relationship model (ER model) describes the structure of a database with the help
of a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER model is a
design or blueprint of a database that can later be implemented as a database. The main
components of E-R model are: entity set and relationship set.
Strong Entity
The Strong Entity is the one whose existence does not depend on the existence of any other entity
in a schema. It is denoted by a single rectangle. A strong entity always has the primary key in the
set of attributes that describes the strong entity.
Weak Entity
An entity that cannot be uniquely identified by its own attributes and relies on the relationship with
other entity is called weak entity. The weak entity is represented by a double rectangle.
For example: A bank account cannot be uniquely identified without knowing the bank to which the
v. Multivalued Attribute
A derived attribute is one whose value is dynamic and derived from another
attribute. It is represented by dashed oval in an ER Diagram. For example – In
Student entity age is a derived attribute as it changes over time and can be
derived from another attribute date_of_birth.
7) What is a Relationship?
5. Define Primary Identify the data attribute(s) that uniquely identify one
Keys and only one occurrence of each entity.
9. Draw fully attributed Adjust the ERD from step 6 to account for entities or
ERD relationships discovered in step 8.
Structural Constraints
Participation Constraints and Cardinality Ratios taken together are called
Structural Constraints.
Participation Constraints:
TotalParticipation: A Total participation of an entity set represents that
each entity in entity set must have at least one relationship in a
relationship set.
• Partial participation: Not all entities of an entity set are involved in the
relationship. Partial participation is represented by single lines
8) Define Cardinality in Relationships.
A) Cardinality defines the numerical attributes of the relationship between two
entities or entity sets.
When a single instance of an entity is associated with more than one instances
of another entity then it is called one to many relationship.
Ex: A customer can place many orders but a order cannot be placed by many
customers.
Types of degree
Now, based on the number of linked entity types, we have 4 types of degrees of
relationships.
1. Unary
2. Binary
3. Ternary
Let’s discuss them one by one with the help of examples.
Unary
In this type of relationship, both the associating entity type are the same. So, we
can say that unary relationships exist when both entity types are the same and
we call them the degree of relationship is 1. Or in other words, in a relation only
one entity set is participating then such type of relationship is known as a unary
relationship.
Example: In a particular class, we have many students, there are monitors too.
So, here class monitors are also students. Thus, we can say that only students
are participating here. So the degree of such type of relationship is 1.
Binary (degree 2)
In a Binary relationship, there are two types of entity associates. So, we can say
that a Binary relationship exists when there are two types of entity and we call
them a degree of relationship is 2. Or in other words, in a relation when two
entity sets are participating then such type of relationship is known as a binary
relationship. This is the most used relationship and one can easily be converted
into a relational table.
Example: We have two entity types ‘Student’ and ‘ID’ where each ‘Student’ has
his ‘ID’. So, here two entity types are associating we can say it is a binary
relationship. Also, one ‘Student’ can have many ‘daughters’ but each ‘daughter’
should belong to only one ‘father. We can say that it is a one-to-many binary
relationship.
Ternary(degree 3)
In the Ternary relationship, there are three types of entity associates. So, we
can say that a Ternary relationship exists when there are three types of entity
and we call them a degree of relationship is 3. Since the number of entities
increases due to this, it becomes very complex to turn E-R into a relational
table. Now let’s understand with the examples.
Example: We have three entity types ‘Teacher’, ‘Course’, and ‘Class’. The
relationship between these entities is defined as the teacher teaching a
particular course, also the teacher teaches a particular class. So, here three
entity types are associating we can say it is a ternary relationship.
EER Model:
EER is a high-level data model that incorporates the extensions to the original
ER model.
● The relationship between sub class and super class is denoted with
symbol.
Super Class:
● Super class is an entity type that has a relationship with one or more subtypes.
● An entity cannot exist in database merely by being member of any super class.
For example: Shape super class is having sub groups as Square, Circle,
Triangle.
Sub Class:
● Sub class is a group of entities with unique attributes.
● Sub class inherits properties and attributes from its super class.
For example: Square, Circle, Triangle are the sub class of Shape super class.
B. Specialization
Specialization is a process that defines a group entities which is divided into
sub groups based on their characteristic.
● It is a top down approach, in which one higher entity can be broken down into
two lower level entity.
● It maximizes the difference between the members of an entity by identifying the
unique characteristic or attributes of each member.
● It defines one or more sub class for the super class and also forms the
superclass/subclass relationship.
For example
In the above example, Employee can be specialized as Developer or Tester,
based on what role they play in an Organization.
C. Generalization:
● Generalization is the process of generalizing the entities which contain the
properties of all the generalized entities.
● It is a bottom approach, in which two lower level entities combine to form a
higher level entity.
● Generalization is the reverse process of Specialization.
● It defines a general entity type from a set of specialized entity type.
● It minimizes the difference between the entities by identifying the common
features.
For example:
In the above example, Tiger, Lion, Elephant can all be generalized as Animals.
D. Aggregation:
● Aggregation is a process that represents a relationship between a whole object
and its component parts.
● It abstracts a relationship between objects and viewing the relationship as an
object.
● It is a process when two entity is treated as a single entity.
In the above example, the relation between College and Course is acting as an
Entity in Relation with Student.
DIFFERENT TYPES OF KEYS
KEYS
1. Primary key
o It is the first key which is used to identify one and only one instance of an entity
uniquely. An entity can contain multiple keys as we saw in PERSON table. The
key which is most suitable from those lists become a primary key.
o In the EMPLOYEE table, ID can be primary key since it is unique for each
employee. In the EMPLOYEE table, we can even select License_Number and
Passport_Number as primary key since they are also unique.
o For each entity, selection of the primary key is based on requirement and
developers.
2. Candidate key
o A candidate key is an attribute or set of an attribute which can uniquely identify
a tuple.
o The remaining attributes except for primary key are considered as a candidate
key. The candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key.
Rest of the attributes like SSN, Passport_Number, and License_Number, etc.
are considered as a candidate key.
3. Super Key
Super key is a set of an attribute which can uniquely identify a tuple. Super key
is a superset of a candidate key.
For example: In the above EMPLOYEE table, for(EMPLOEE_ID,
EMPLOYEE_NAME) the name of two employees can be the same, but their
EMPLYEE_ID can't be the same. Hence, this combination can also be a key.
The super key would be EMPLOYEE-ID, (EMPLOYEE_ID,
EMPLOYEE-NAME), etc.
4. Foreign key
o Foreign keys are the column of the table which is used to point to the primary
key of another table.
o In a company, every employee works in a specific department, and employee
and department are two different entities. So we can't store the information of
the department in the employee table. That's why we link these two tables
through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id as a new
attribute in the EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the
tables are related.
o
o
Note:
NOT NULL
NOT NULL constraint makes sure that a column does not hold NULL value.
When we don’t provide value for a particular column while inserting a record
into a table, it takes NULL value by default. By specifying NOT NULL constraint,
we can be sure that a particular column(s) cannot have NULL values.
Ex:
CREATE TABLE STUDENT (
SID NUMBER (3) NOT NULL, NAME VARCHAR (35) NOT NULL
);
UNIQUE
DEFAULT
The DEFAULT constraint provides a default value to a column when there is no
value provided while inserting a record into a table.
Ex:
CREATE TABLE STUDENT (
SID NUMBER (3) NOT NULL, NAME VARCHAR (35) NOT NULL, PHONE
NUMBER (10) UNIQUE
DONATION NUMBER DEFAULT 500;
);
CHECK
This constraint is used for specifying range of values for a particular column of a
table. When this constraint is being set on a column, it ensures that the
specified column must have the value falling in the specified range.
Ex:
CREATE TABLE STUDENT (
SID NUMBER(3) NOT NULL CHECK(SID<100, NAME VARCHAR (35) NOT
NULL,
PHONE NUMBER(10) UNIQUE DONATION NUMBER DEFAULT 500;
);
KEY CONSTRAINTS
PRIMARY KEY
Primary key is an attribute or set of attributes that uniquely identifies each
record in a table. It must have unique values and cannot contain nulls.
FOREIGN KEY
Foreign keys are the columns of a table that points to the primary key of another
table. They act as a cross-reference between tables.
STUDENT(SID,NAME,PHONE,CITY,DONATION)
COURSE(CID,CNAME,SID) -- contains Foreign Key that refers Primary Key of
STUDENT table.