Database Management System

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 23

DATABASE MANAGEMENT SYSTEM

UNIT-1

A database is a collection of related data which represents some aspect of the real world. A database system is
designed to be built and populated with data for a certain task. A database is a collection of inter-related data which
helps in efficient retrieval, insertion and deletion of data from database and organizes the data in the form of tables,
views, schemas, reports etc. For Example, university database organizes the data about students, faculty, and admin
staff etc. which helps in efficient retrieval, insertion and deletion of data from it.
Database Management System (DBMS) is software for storing and retrieving users' data while considering
appropriate security measures. It consists of a group of programs which manipulate the database. The DBMS accepts
the request for data from an application and instructs the operating system to provide the specific data. In large
systems, a DBMS helps users and other third-party software to store and retrieve data.

HISTORY OF DBMS -
1960 - Charles Bachman designed first DBMS system
1970 - Codd introduced IBM'S Information Management System (IMS)
1976- Peter Chen coined and defined the Entity-relationship model also known as the ER model
1980 - Relational Model becomes a widely accepted database component
1985- Object-oriented DBMS develops.
1990s- Incorporation of object-orientation in relational DBMS.
1991- Microsoft ships MS access, a personal DBMS and that displaces all other personal DBMS products.
1995: First Internet database applications
1997: XML applied to database processing. Many vendors begin to integrate XML into DBMS products.

CHARACTERISTICS OF DATABASE MANAGEMENT SYSTEM -


 Provides security and removes redundancy
 Self-describing nature of a database system
 Insulation between programs and data abstraction
 Support of multiple views of the data
 Sharing of data and multiuser transaction processing
 DBMS allows entities and relations among them to form tables.
 It follows the ACID concept (Atomicity, Consistency, Isolation, and Durability).
 DBMS supports multi-user environment that allows users to access and manipulate data in parallel.

POPULAR DBMS SOFTWARE -


 MySQL  dBASE  IBM DB2
 Microsoft Access  FoxPro  LibreOffice Base
 Oracle  SQLite  MariaDB

TYPES OF DBMS –
 Hierarchical DBMS: In a Hierarchical database, model data is organized in a tree-like structure. Data is Stored
Hierarchically (top down or bottom up) format. Data is represented using a parent-child relationship. In
Hierarchical DBMS parent may have many children, but children have only one parent.
 Network Model: The network database model allows each child to have multiple parents. It helps you to address
the need to model more complex relationships like as the orders/parts many-to-many relationship. In this
model, entities are organized in a graph which can be accessed through several paths.
 Relational model: Relational DBMS is the most widely used DBMS model because it is one of the easiest. This
model is based on normalizing data in the rows and columns of the tables. Relational model stored in fixed
structures and manipulated using SQL.
 Object-Oriented Model: In Object - oriented Model data stored in the form of objects. The structure which is
called classes which display data within it. It defines a database as a collection of objects which stores both data
member’s values and operations.

ADVANTAGES OF DBMS -
 DBMS offers a variety of techniques to store & retrieve data
• DBMS serves as an efficient handler to balance the needs of multiple applications using the same data
 Uniform administration procedures for data
 Application programmers never exposed to details of data representation and storage.
 A DBMS uses various powerful functions to store and retrieve data efficiently.
 Offers Data Integrity and Security
 The DBMS implies integrity constraints to get a high level of protection against prohibited access to data.
 A DBMS schedules concurrent access to the data in such a manner that only one user can access the same data
at a time
 Reduced Application Development Time

DISADVANTAGE OF DBMS -
 Cost of Hardware and Software of a DBMS is quite high which increases the budget of your organization.
 Most database management systems are often complex systems, so the training for users to use the DBMS is
required.
 In some organizations, all data is integrated into a single database which can be damaged
 because of electric failure or database is corrupted on the storage media
 Use of the same program at a time by many users sometimes leads to the loss of some data.
 DBMS can't perform sophisticated calculations

DATA VS. INFORMATION: -


Data- Data is the complete list of facts and details like text, observations, figures, symbols and description of things.
It is the raw list of facts that are processed to gain information. The basic concept of data is associated with scientific
research collected by different research organizations.
Information- Information is the processed, organized and structured data. It provides context for data. However,
both the terms are used together; information can be easily understood than data.

DIFFERENCE BETWEEN DATA AND INFORMATION

DATA INFORMATION
Data is unorganized raw facts that need processing Information is a processed, organized data presented in
without which it is seemingly random and useless to a given context and is useful to humans.
humans
Data is an individual unit that contains raw material Information is a group of data that collectively carry a
which does not carry any specific meaning. logical meaning.

Data doesn’t depend on information. Information depends on data.


It is measured in bits and bytes. Information is measured in meaningful units like time,
quantity, etc.
Data is never suited to the specific needs of a Information is specific to the expectations and
designer. requirements because all the irrelevant facts and figures
are removed, during the transformation process.
An example of data is a student’s test score The average score of a class is the information derived
from the given data.

RECORDS IN THE DATABASE: -


Records provide a practical way to store and retrieve data from the database. Each record can have different kinds of
data, and thus a single row could have several types of information. A customer record could contain an ID number,
name, birth date, cell phone number, and email. The term record can also be pronounced record, meaning to save
information. In database management systems, a complete set of information. Records are composed of fields, each
of which contains one item of information. A set of records constitutes a file.
For example, a personnel file might contain records that have three fields: a name field, an address field, and a
phone number field. In relational database management systems, records are called tuples.
Types of record:
 Fixed length record- Each and every record in fixed length record has exactly the same size. These are simpler
and
easy to implement. It has uniform slots and are arranged in a continuous manner in the file, a record is identified
using both record Id and slot no. of record.
 Variable length record- Each and every record in this, is not of same size that is record size may vary depending
on the requirement. A record may be variable length fields, repeating or optional fields.

FILES: -
Relative data and information are stored collectively in file formats. A file is a sequence of records stored in binary
format. A disk drive is formatted into several blocks that can store records. File records are mapped onto those disk
blocks.
Types of Files:
 Master file - A master file is the main that contains relatively permanent records about particular items or
entries. For example, a customer file will contain details of a customer such as customer ID, name and contact
address.
 Transaction (movement) file - A transaction file is used to hold data during transaction processing. The file is
later used to update the master file and audit daily, weekly or monthly transactions. For example, in a busy
supermarket, daily sales are recorded on a transaction file and later used to update the stock file. The file is also
used by the management to check on the daily or periodic transactions.
 Reference file - A reference file is mainly used for reference or look-up purposes. Look-up information is that
information that is stored in a separate file but is required during processing. For example, in a point of sale
terminal, the item code entered either manually or using a barcode reader looks up the item description and
price from a reference file stored on a storage device.
 Backup file - A backup files is used to hold copies (backups) of data or information from the computers fixed
storage (hard disk). Since a file held on the hard disk may be corrupted, lost or changed accidentally, it is
necessary to keep copies of the recently updated files. In case of the hard disk failure, a backup file can be used
to reconstruct the original file.
 Report file - It is used to store relatively permanent records extracted from the master file or generated after
processing. For example, you may obtain a stock levels report generated from an inventory system while a copy
of the report will be stored in the report file.
 Sort file - It stores data which is arranged in a particular order. It is used mainly where data is to be processed
sequentially. In sequential processing, data or records are first sorted and held on a magnetic tape before
updating the master file.

DATA DICTIONARY:
Data Dictionary consists of database metadata. It has records about objects in the database.
Data Dictionary consists of the following information −
 Name of the tables in the database
 Constraints of a table i.e. keys, relationships, etc.
 Columns of the tables that related to each other
 Owner of the table
 Last accessed information of the object
 Last updated information of the object
An example of Data Dictionary can be personal details of a student
Student _ID Student _NameStudent_ Address Student City

DATABASE ADMINISTRATOR (DBA)


A database administrator (DBA) is a specialized computer systems administrator who maintains a successful
database environment by directing or performing all related activities to keep the data secure. The top responsibility
of a DBA professional is to maintain data integrity. This means the DBA will ensure that data is secure from
unauthorized access but is available to users.
A database administrator will often have a working knowledge and experience with a wide range of database
management products such as Oracle-based software, SAP and SQL.

DUTIES AND RESPONSIBILITIES


 Software installation and Maintenance - A DBA often collaborates on the initial installation and configuration of
a new Oracle, SQL Server etc. database. The system administrator sets up hardware and deploys the operating
system for the database server, and then the DBA installs the database software and configures it for use. As
updates and patches are required, the DBA handles this on-going maintenance.
 Data Extraction, Transformation, and Loading - Known as ETL, data extraction, transformation, and loading refer
to efficiently importing large volumes of data that have been extracted from multiple systems into a data
warehouse environment. This external data is cleaned up and transformed to fit the desired format so that it can
be imported into a central repository.
 Specialized Data Handling - Today’s databases can be massive and may contain unstructured data types such as
images, documents, or sound and video files. Managing a very large database (VLDB) may require higher- level
skills and additional monitoring and tuning to maintain efficiency.
 Database Backup and Recovery - DBAs create backup and recovery plans and procedures based on industry best
practices, then make sure that the necessary steps are followed. Backups cost time and money, so the DBA may
have to persuade management to take necessary precautions to preserve data.
 Security - A DBA needs to know potential weaknesses of the database software and the company’s overall
system and work to minimize risks. No system is one hundred per cent immune to attacks, but implementing
best practices can minimize risks.
 Authentication - Setting up employee access is an important aspect of database security. DBAs control who has
access and what type of access they are allowed. For instance, a user may have permission to see only certain
pieces of information, or they may be denied the ability to make changes to the system.
 Capacity Planning - The DBA needs to know how large the database currently is and how fast it is growing in
order to make predictions about future needs. Storage refers to how much room the database takes up in server
and backup space. Capacity refers to usage level.
 Performance Monitoring - Monitoring databases for performance issues is part of the on-going system
maintenance a DBA performs. If some part of the system is slowing down processing, the DBA may need to make
configuration changes to the software or add additional hardware capacity.
 Database Tuning - Performance monitoring shows where the database should be tweaked to operate as
efficiently as possible. The physical configuration, the way the database is indexed, and how queries are handled
can all have a dramatic effect on database performance. With effective monitoring, it is possible to proactively
tune a system based on application and usage instead of waiting until a problem develops.
 Troubleshooting - DBAs are on call for troubleshooting in case of any problems. Whether they need to quickly
restore lost data or correct an issue to minimize damage, a DBA needs to quickly understand and respond to
problems when they occur.

FILE ORIENTED SYSTEM VS. DATABASE SYSTEM


Sr. No. FILE SYSTEM DBMS
1. File system is software that manages and DBMS is software for managing the database.
organizes the files in a storage medium within a
computer.
2. Redundant data can be present in a In DBMS there is no redundant data.
file system.
3. It doesn’t provide backup and recovery of data if It provides backup and recovery of data even if it
it is lost. is lost.
4. There is no efficient query Efficient query processing is there in DBMS.
processing in file system.
5. There is less data consistency in file There is more data consistency because of the
system. process of normalization.
6. It is less complex as compared to DBMS. It has more complexity in handling as
compared to file system.
7. File systems provide less security in DBMS has more security mechanisms as
comparison to DBMS. compared to file system.
8. It is less expensive than DBMS. It has a comparatively higher cost than a file
system.
9. Multiple users cannot access the Multiple users can access the database at same
info. at same time. time.
10. Data is isolated and separated. Data is not separated and separated.
11. Ex- software or prog. are developed using Ex- software or prog. are developed using RDBM
languages like C, C++, COBOL, Pascal. and SQL.
UNIT 2
DBMS ARCHITECTURE: -
The design of a DBMS depends on its architecture. It can be centralized or decentralized or hierarchical. The
architecture of a DBMS can be seen as either single tier or multi-tier. N-tier architecture divides the whole system
into related but independent n modules, which can be independently modified, altered, changed, or replaced.
In 1-tier architecture, the DBMS is the only entity where the user directly sits on the DBMS and uses it. Any changes
done here will directly be done on the DBMS itself. It does not provide handy tools for end-users. Database designers
and programmers normally prefer to use single-tier architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS can be accessed.
Programmers use 2-tier architecture where they access the DBMS by means of an application. Here the application
tier is entirely independent of the database in terms of operation, design, and programming.

3-TIER ARCHITECTURE:
3-tier architecture separates its tiers from each other based on the complexity of the users and how they use the
data present in the database. It is the most widely used architecture to design a DBMS.
 Database (Data) Tier − At this tier, the database resides along with its query processing languages. We also have
the relations that define the data and their constraints at this level.
 Application (Middle) Tier − At this tier reside the application server and the programs that access the database.
For a user, this application tier presents an abstracted view of the database. End-users are unaware of any
existence of the database beyond the application. At the other end, the database tier is not aware of any other
user beyond the application tier. Hence, the application layer sits in the middle and acts as a mediator between
the end-user and the database.
 User (Presentation) Tier − End-users operate on this tier and they know nothing about any existence of the
database beyond this layer. At this layer, multiple views of the database can be provided by the application. All
views are generated by applications that reside in the application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are independent and can be
changed independently.

LEVELS OF ARCHITECTURE
 Physical Level - This is the lowest level in the three -level architecture. It is also known as the internal level. The
physical level describes how data is actually stored in the database. In the lowest level, this data is stored in the
external hard drives in the form of bits and at a little high level, it can be said that the data is stored in files and
folders. The physical level also discusses compression and encryption techniques.
 Conceptual Level - The conceptual level is at a higher level than the physical level. It is also known as the logical
level. It describes how the database appears to the users conceptually and the relationships between various
data tables. The conceptual level does not care for how the data in the database is actually stored.
 External Level - This is the highest level in the three -level architecture and closest to the user. It is also known as
the view level. The external level only shows the relevant database content to the users in the form of views and
hides the rest of the data. So different users can see the database as a different view as per their individual
requirements

DATABASE SCHEMA:
A database schema is the skeleton structure that represents the logical view of the entire database. It defines how
the data is organized and how the relations among them are associated. It formulates all the constraints that are to
be applied on the data. A database schema defines its entities and the
relationship among them. It contains a descriptive detail of the
database, which can be depicted by means of schema diagrams. It’s the
database designers who design the schema to help programmers
understand the database and make it useful.
A database schema can be divided broadly into two categories −
• Physical Database Schema − This schema pertains to the actual
storage of data and its form of storage like files, indices, etc. It
defines how the data will be stored in a secondary storage.
• Logical Database Schema − This schema defines all the logical constraints that need to be applied on the data
stored. It defines tables, views, and integrity constraints.
SUBSCHEMA
A sub schema is a subset of the
schema and inherits the same
property that a schema has. Sub
schema refers to an application
programmer’s (user’s) view of the
data item types and record types,
which he or she uses. A subschema
lets the user have access to
different areas of applications in
which the user designed. The areas
that are included in an application are set, types, record types, data items, and data aggregates. Schemas may have
many different subschema’s’ that are all very different. It gives the users a window through which he or she can view
only that part of the database, which is of interest to him. Therefore, different application programs can have
different view of data.

DATABASE INSTANCE:
It is important that we distinguish these two terms individually. Database schema is the skeleton of database. It is
designed when the database doesn't exist at all. Once the database is operational, it is very difficult to make any
changes to it. A database schema does not contain any data or information.
A database instance is a state of operational database with data at any given time. It contains a snapshot of the
database. Database instances tend to change with time. A DBMS ensures that its every instance (state) is in a valid
state, by diligently following all the validations, constraints, and conditions that the database designers have
imposed.

MAPPING CARDINALITIES:
Cardinality defines the number of entities in one entity set, which can be associated with the number of entities of
other set via relationship set.
• A mapping constraint is a data constraint that expresses the number of entities to which another entity can be
related via a relationship set.
• It is most useful in describing the relationship sets that involve more than two entity sets.
• For binary relationship set R on an entity set A and B, there are four possible mapping cardinalities. These are as
follows:
A. 1.One to one (1:1)
B. One to many (1:M)
C. 3.Many to one (M:1)
D. 4.Many to many (M:M)
 One-to-one − One entity from entity set A can be associated with at most one entity of entity set B and vice
versa.
 One-to-many − One entity from entity set A can be associated with more than one entity of entity set B however
an entity from entity set B, can be associated with at most one entity.
 Many-to-one − More than one entity from entity set A can be associated with at most one entity of entity set B,
however an entity from entity set B can be associated with more than one entity from entity set A.
 Many-to-many − One entity from A can be associated with more than one entity from B and vice versa.

Mapping Process (Algorithm):


• Create table for each entity.
• Entity's attributes should become fields of tables with their respective data types.
• Declare primary key.

Mapping Process:
• Create table for a relationship.
• Add the primary keys of all participating Entities as fields of table with their respective data types.
• If relationship has any attribute, add each attribute as field of table.
• Declare a primary key composing all the primary keys of participating entities.

Mapping Weak Entity Sets:


A weak entity set is one which does not have any primary key associated with it.

Mapping Hierarchical Entities


ER specialization or generalization comes in the form of hierarchical entity sets.

Data Independence: -
A database system normally contains a lot of data in addition to users’ data. For example, it stores data about data,
known as metadata, to locate and retrieve data easily. It is rather difficult to modify or update a set of metadata
once it is stored in the database. But as a DBMS expands, it needs to change over time to satisfy the requirements of
the users. If the entire data is dependent, it would become a tedious and highly complex job.

Metadata itself follows a layered architecture, so that when we change data at one layer, it does not affect the data
at another level. This data is independent but mapped to each other.
Logical Data Independence:
Logical data is data about database, that is, it stores information about how data is managed inside. For example, a
table (relation) stored in the database and all its constraints, applied on that relation.
Logical data independence is a kind of mechanism, which liberalizes itself from actual data stored on the disk. If we
do some changes on table format, it should not change the data residing on the disk.
Physical Data Independence:
All the schemas are logical, and the actual data is stored in bit format on the disk. Physical data independence is the
power to change the physical data without impacting the schema or logical data.
For example, in case we want to change or upgrade the storage system itself − suppose we want to replace hard-
disks with SSD − it should not have any impact on the logical data or schemas.
The ER model defines the conceptual view of a database. It works around real-world entities and the associations
among them. At view level, the ER model is considered a good option for designing databases.

INTRO TO DATA MODELS: -


Data models define how the logical structure of a database is modeled. Data Models are fundamental entities to
introduce abstraction in a DBMS. Data models define how data is connected to each other and how they are
processed and stored inside the system.
The very first data model could be flat data-models, where all the data used are to be kept in the same plane. Earlier
data models were not so scientific; hence they were prone to introduce lots of duplication and update anomalies.

ENTITY-RELATIONSHIP MODEL:
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships among them. While
formulating real-world scenario into the database model, the ER Model creates entity set, relationship set, general
attributes and constraints. ER Model is best used for the conceptual design of a database.
ER Model is based on −
• Entities and their attributes.
• Relationships among entities.

• Entity − an entity in an ER Model is a


real-world entity having properties
called attributes. Every attribute is
defined by its set of values called domain. For example, in a school database, a student is considered as an
entity. Student has various attributes like name, age, class, etc.
• Relationship − the logical association among entities is called relationship. Relationships are mapped with
entities in various ways. Mapping cardinalities define the number of associations between two entities.

RELATIONAL MODEL:
The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model
is based on first-order predicate logic and defines a table as an n-ary relation.
The main highlights of this model are −
• Data is stored in tables called relations.
• Relations can be normalized.
• In normalized relations, values saved are atomic values.
• Each row in a relation contains a unique value.
• Each column in a relation contains values from a same domain.

ENTITY:
An entity can be a real-world object, either animate or inanimate, that can be easily identifiable. For example, in a
school database, students, teachers, classes, and courses offered can be considered as entities. All these entities
have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities with attribute sharing
similar values. For example, a Students set may contain all the students of a school; likewise, a Teachers set may
contain all the teachers of a school from all faculties. Entity sets need not be disjoint.
Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.

ATTRIBUTES:
Entities are represented by means of their properties, called attributes. All attributes have values. For example, a
student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a student's name cannot be
a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.
Types of Attributes:
• Simple attribute − Simple attributes are atomic average_salary in a department should not be
values, which cannot be divided further. For saved directly in the database, instead it can be
example, a student's phone number is an atomic derived. For another example, age can be derived
value of 10 digits. from data_of_birth.
• Single-value attribute − Single-value attributes
• Composite attribute − Composite attributes are contain single value. For example – Social Security
made of more than one simple attribute. For Number.
example, a student's complete name may have • Multi-value attribute − Multi-value attributes may
first name and last name. contain more than one values. For example, a
• Derived attribute − Derived attributes are the person can have more than one phone number,
attributes that do not exist in the physical email address, etc.
database, but their values are derived from other
attributes present in the database. For example,

These attribute types can come together in a way like −


• simple single-valued attributes
• simple multi-valued attributes
• composite single-valued attributes
• composite multi-valued attributes

RELATIONSHIP:
Relationships are represented by diamond-shaped box. Name of the relationship is written inside the diamond-box.
All the entities (rectangles) participating in a relationship, are connected to it by a line.

RELATIONSHIP SET:
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have attributes.
These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree

BINARY RELATIONSHIP AND CARDINALITY


A relationship where two entities are participating is called a binary relationship. Cardinality is the number of
instances of an entity from a relation that can be associated with the relation.
• One-to-one − when only one instance of an entity than one instance of an entity on the right can be
is associated with the relationship, it is marked as associated with the relationship. It depicts one-to-
'1:1'. The following image reflects that only one many relationship.
instance of each entity should be associated with • Many-to-one − when more than one instance of
the relationship. It depicts one-to-one entity is associated with the relationship, it is
marked as 'N:1'. The following image reflects that
more than one instance of an entity on the left
and only one instance of an entity on the right can
be associated with the relationship. It depicts
many-to-one relationship.
relationship. • Many-to-many − The following image reflects that
• One-to-many − when more than one instance of more than one instance of an entity on the left
an entity is associated with a relationship, it is and more than one instance of an entity on the
marked as '1: N’. The following image reflects that right can be associated with the relationship. It
only one instance of entity on the left and more depicts many-to-many relationship.
Participation Constraints
• Total Participation − each entity is involved in the relationship. Total participation is represented by double lines.
• Partial participation − Not all entities are involved in the relationship. Partial participation is represented by
single lines.

The ER Model has the power of expressing database entities in a conceptual hierarchical manner. As the hierarchy
goes up, it generalizes the view of entities, and as we go deep in the hierarchy, it gives us the detail of every entity
included.
Going up in this structure is called generalization, where entities are clubbed together to represent a more
generalized view. For example, a particular student named Mira can be generalized along with all the students. The
entity shall be a student, and further, the student is a person. The reverse is called specialization where a person is a
student, and that student is Mira.

TYPES OF DATABASE: -
1. Centralized Database - It is the type of database that stores data at a centralized database system. It comforts
the users to access the stored data from different locations through several applications. These applications
contain the authentication process to let users access data securely. An example of a Centralized database can
be Central Library that carries a central database of each library in a college/university.
2. Distributed Database - Unlike a centralized database system, in distributed systems, data is distributed among
different database systems of an organization. These database systems are connected via communication links.
Such links help the end-users to access the data easily. Examples of the Distributed database are Apache
Cassandra, HBase, Ignite, etc.
3. Relational Database - This database is based on the relational data model, which stores data in the form of
rows(tuple) and columns(attributes), and together forms a table(relation). A relational database uses SQL for
storing, manipulating, as well as maintaining the data. E.F. Codd invented the database in 1970. Each table in the
database carries a key that makes the data unique from others. Examples of Relational databases are MySQL,
Microsoft SQL Server, Oracle, etc.
4. NoSQL Database - Non-SQL/Not Only SQL is a type of database that is used for
storing a wide range of data sets. It is not a relational database as it stores data not
only in tabular form but in several different ways. It came into existence when the
demand for building modern applications increased. Thus, NoSQL presented a wide
variety of database technologies in response to the demands. We can further
divide a NoSQL database into the following four types --->
5. Cloud Database - A type of database where data is stored in a virtual environment
and executes over the cloud computing platform. It provides users with various
cloud computing services (SaaS, PaaS, IaaS, etc.) for accessing the database.
6. Object-oriented Databases - The type of database that uses the object-based data model approach for storing
data in the database system. The data is represented and stored as objects which are similar to the objects used
in the object-oriented programming language.
7. Hierarchical Databases - It is the type of database
that stores data in the form of parent-children
relationship nodes. Here, it organizes data in a tree-
like structure.
8. Network Databases - It is the database that typically
follows the network data model. Here, the
representation of data is in the form of nodes
connected via links between them. Unlike the
hierarchical database, it allows each record to have multiple children and parent nodes to form a generalized
graph structure.
UNIT 3
DATABASE SECURITY AND THREATS: -
Data security is an imperative aspect of any database system. It is of particular importance in distributed systems
because of large number of users, fragmented and replicated data, multiple sites and distributed control.
Database security concerns the use of a broad range of information security controls to protect databases
(potentially including the data, the database applications or stored functions, the database systems, the database
servers and the associated network links) against compromises of their confidentiality, integrity and availability. It
involves various types or categories of controls, such as technical, procedural/administrative and physical.

THREATS IN A DATABASE:
• Unauthorized or unintended activity or misuse by authorized database users, database administrators, or
network/systems managers, or by unauthorized users or hackers
• Malware infections causing incidents such as unauthorized access, leakage or disclosure of personal or
proprietary data, deletion of or damage to the data or programs, interruption or denial of authorized access to
the database, attacks on other systems and the unanticipated failure of database services;
• Overloads, performance constraints and capacity issues resulting in the inability of authorized users to use
databases as intended;
• Physical damage to database servers caused by computer room fires or floods, overheating, lightning,
accidental liquid spills, static discharge, electronic breakdowns/equipment failures and obsolescence;
• Design flaws and programming bugs in databases and the associated programs and systems, creating various
security vulnerabilities (e.g. unauthorized privilege escalation), data loss/corruption, performance degradation
etc.
• Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in database or system
administration processes, sabotage/criminal damage etc.

3 ELEMENTS OF DATABASE SECURITY


The basic security standards which technologies can assure are:
Confidentiality
• Access control - Access to data is controlled by means of privileges, roles and user accounts. Authenticated users
– Authentication is a way of implementing decisions of whom to trust. It can be employ passwords, finger prints
etc.
• Secure storage of sensitive data – It is required to prevent data from hackers who could damage the sensitive
data.
• Privacy of communication - The DBMS should be capable of controlling the spread of confidential personal
information from unauthorized people such as credit cards etc.
Integrity
Integrity contributes to maintaining a secure database by preventing the data from becoming invalid and giving
misleading results. It consists of following aspects:
System and object privileges control access to applications tables and system commands so that only authorized
users can change the data.
• Integrity constraints are applied to maintain the correctness and validity of the data in the database.
• Database must be protected from viruses so firewalls and anti-viruses should be used.
• Ensures that access to the network is controlled and data is not vulnerable to attacks during transmission across
network.
Availability
Data should always be made available for the authorized user by the secure system without any delays. Availability is
often thought of as a continuity of service assuring that database is available. Denials of service attacks are attempts
to block authorized users’ ability to access and use the system when needed. It has number of aspects.
• Ease of use – Resources managed by users for working with databases should be effectively managed so that it is
available all the time to valid users.
• Flexibility – Administrators must have all the relevant tools for managing user population.
• Scalability - System performance should not get affected by the increase in number of users or processes which
require services from system.
• Resistance – User profiles must be defined and the resource used by any user should be limited.

MEASURES OF CONTROL:
The measures of control can be broadly divided into the following categories −
• Access Control − Access control includes security mechanisms in a database management system to protect
against unauthorized access. A user can gain access to the database after clearing the login process through only
valid user accounts. Each user account is password protected.
• Flow Control − Distributed systems encompass a lot of data flow from one site to another and also within a site.
Flow control prevents data from being transferred in such a way that it can be accessed by unauthorized agents.
A flow policy lists out the channels through which information can flow. It also defines security classes for data
as well as transactions.
• Data Encryption − Data encryption refers to coding data when sensitive data is to be communicated over public
channels. Even if an unauthorized agent gains access of the data, he cannot understand it since it is in an
incomprehensible format.

DATA SECURITY:
In distributed systems, it is imperative to adopt measure to secure data apart from communications. The data
security measures are −
• Authentication and authorization − these are the access control measures adopted to ensure that only
authentic users can use the database. To provide authentication digital certificates are used. Besides, login is
restricted through username/password combination.
• Data encryption − The two approaches for data encryption in distributed systems are −
 Internal to distributed database approach: The user applications encrypt the data and then store the encrypted
data in the database. For using the stored data, the applications fetch the encrypted data from the database and
then decrypt it.
 External to distributed database: The distributed database system has its own encryption capabilities. The user
applications store data and retrieve them without realizing that the data is stored in an encrypted form in the
database.
• Validated input − In this security measure, the user application checks for each input before it can be used for
updating the database. An un-validated input can cause a wide range of exploits like buffer overrun, command
injection, cross-site scripting and corruption in data.

DATA AUDITING:
A database security system needs to detect and monitor security violations, in order to ascertain the security
measures it should adopt. It is often very difficult to detect breach of security at the time of occurrences. One
method to identify security violations is to examine audit logs. Audit logs contain information such as −
• Date, time and site of failed access attempts.
• Details of successful access attempts.
• Vital modifications in the database system.
• Access of huge amounts of data, particularly from databases in multiple sites.
All the above information gives an insight of the activities in the database. A periodical analysis of the log helps to
identify any unnatural activity along with its site and time of occurrence. This log is ideally stored in a separate server
so that it is inaccessible to attackers.

SECURITY THREATS:
• Poor Privileges - This is a great place to start because it can be so easy to fix and yet potentially so devastating to
your data. Human privileges can often exceed the requirements of a person's job function. They can also remain
unchanged when someone moves roles within the organization, or leaves altogether. This exposes the data to
people who may have ill intent.
• Malware - Cyber-criminals often use advanced attacks that blend multiple tactics such as spear phishing emails
and malware. Just on its own, spear phishing has become an endemic scourge: 95% of US and 83% of UK
respondents in a recent Cloud mark survey said they have experienced an attack. Malware or "malicious
software" is also on the rise. Unaware that malware has infected their device; legitimate database users become
a direct conduit for these groups to access your networks and sensitive data.
• Exploitation of Vulnerable Misconfigured Databases - Our enterprise database consultants find vulnerable and
unpatched databases all too often as we audit systems considered safe by their owners. We often see databases
with their default accounts and configuration parameters in place too - especially in the case of open source
implementations where a staff member has downloaded the software and done a DIY job on the database.

• Limited Security Expertise and Education - According to the Ponemon Institute 2014 Cost of Data Breach Study,
30% of data breach incidents are caused by “human factor”. The Online Trust Alliance (OTA) also stated in 2013
that more than 97% of breaches were preventable by implementing simple steps and following best practice and
internal controls. Ensure your internal staff is trained and capable of maintaining the security of your enterprise
database to a professional business-critical level. If you are not sure, then engage the services of a professional
database service provider such as Fujitsu.

• Input Injection (SQL Injection) - This type of attack allows an attacker to inject code into a program or query or
inject malware onto a computer in order to execute remote commands that can read or modify a database, or
change data on a web site. There are two major types of database injection attacks. The first is SQL Injection that
targets traditional database systems, and then there is no SQL Injection targeting Big Data platforms.

FIREWALL
Firewall is a barrier between Local Area Network (LAN) and the Internet. It allows keeping private resources
confidential and minimizes the security risks. It controls network traffic, in both directions.
The following diagram depicts a sample firewall between LAN and the internet. The connection between the two is
the point of vulnerability. Both hardware and the software can be used at this point to filter network traffic.
Key Points
• Firewall management must be addressed by both system managers and the network managers.
• The amount of filtering a firewall varies. For the same firewall, the amount of filtering may be different in
different directions.

TYPES OF FIREWALL:
There are software and hardware firewalls. Each format serves a different but important purpose. A hardware
firewall is physical, like a broadband router — stored between your network and gateway. A software firewall is
internal — a program on your computer that works through port numbers and applications.
Hardware Firewalls
A hardware firewall sits between your local network of computers and the Internet. The firewall will inspect all the
data that comes in from the Internet, passing along the safe data packets while blocking the potentially dangerous
packets. In order to properly protect a network without hindering performance, hardware firewalls require expert
setup, and so may not be a feasible solution for companies without a dedicated IT department. For businesses with
many computers, however, being able to control network security from one single device simplifies the job.
Software Firewalls
Software firewalls are installed on individual computers on a network. Unlike hardware firewalls, software firewalls
can easily distinguish between programs on a computer. This lets them allow data to one program while blocking
another. Software firewalls can also filter outgoing data, as well as remote responses to outgoing requests. The
major downside to software firewalls for a business is their upkeep: they require installation, updating and
administration on each individual computer.
FIREWALL SYSTEMS FALL IN TO TWO CATEGORIES:
• Network level
• Application level
Network Firewalls
Network firewalls are security devices used to stop or mitigate unauthorized access to private networks connected
to the Internet, especially intranets. The only traffic allowed on the network is defined via firewall policies – any
other traffic attempting to access the network is blocked. Network firewalls sit at the front line of a network, acting
as a communications liaison between internal and external devices.
A network firewall can be configured so that any data entering or exiting the network has to pass through it – it
accomplishes this by examining each incoming message and rejecting those that fail to meet the defined security
criteria. When properly configured, a firewall allows users to access any of the resources they need while
simultaneously keeping out unwanted users, hackers, viruses, worms or other malicious programs trying to access
the protected network.
Application firewall
An application firewall is a form of firewall that controls input/output or system calls of an application or service. It
operates by monitoring and blocking communications based on a configured policy, generally with predefined rule
sets to choose from. The application firewall can control communications up to the application layer of the OSI
model, which is the highest operating layer, and where it gets its name.

DATABASE RECOVERY:
It is the method of restoring the database to its correct state in the event of a failure at the time of the transaction or
after the end of a process. Earlier, you have been given the concept of database recovery as a service that should be
provided by all the DBMS for ensuring that the database is dependable and remains in a consistent state in the
presence of failures. In this context, dependability refers to both the flexibility of the DBMS to various kinds of failure
and its ability to recover from those failures. In this chapter, you will gather a brief knowledge of how this service can
be provided. To gain a better understanding of the possible problems you may encounter in providing a consistent
system, you will first learn about the need for recovery and its types of failure, which usually occurs in a database
environment.

WHAT IS THE NEDD FOR RECOVERY OF DATA


The storage of data usually includes four types of media with an increasing amount of reliability: the main memory,
the magnetic disk, the magnetic tape, and the optical disk. Many different forms of failure can affect database
processing and/or transaction, and each of them has to be dealt with differently. Some data failures can affect the
main memory only, while others involve non-volatile or secondary storage also. Among the sources of failure are:
• Due to hardware or software errors, the system crashes, which ultimately resulting in loss of main memory.
• Failures of media, such as head crashes or unreadable media that results in the loss of portions of secondary
storage.
• There can be application software errors, such as logical errors that are accessing the database that can cause
one or more transactions to abort or fail.
• Natural physical disasters can also occur, such as fires, floods, earthquakes, or power failures.
• Carelessness or unintentional destruction of data or directories by operators or users.
• Damage or intentional corruption or hampering of data (using malicious software or files) hardware or software
facilities.
Whatever the grounds of the failure are, there are two principal things that you have to consider:
• Failure of main memory, including that database buffers.
• Failure of the disk copy of that database.

RECOVERY FACILITY
Every DBMS should offer the following facilities to help out with the recovery mechanism:
• Backup mechanism makes backup copies at a specific interval for the database.
• Logging facilities keep tracing the current state of transactions and any changes made to the database.
• Checkpoint facility allows updates to the database for getting the latest patches to be made permanent and
keep secure from vulnerability.
• Recovery manager allows the database system for restoring the database to a reliable and steady-state after any
failure occurs.
DISTRIBUTED DATABASE MANAGEMENT SYSTEM
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer
network. A distributed database management system (D–DBMS) is the software that manages the DDB and provides
an access mechanism that makes this distribution transparent to the users. A distributed database management
system (DDBMS) is a centralized software system that manages a distributed database in a manner as if it were all
stored in a single location.
A distributed database is basically a database that is not limited to one system; it is spread over different sites, i.e.,
on multiple computers or over a network of computers. A distributed database system is located on various sited
that don’t share physical components. This may be required when a particular database needs to be accessed by
various users globally. It needs to be managed such that for the users it looks like one single database.
TYPES:
1. Homogeneous Database:
In a homogeneous database, all different sites store database identically. The operating system, database
management system and the data structures used – all are same at all sites. Hence, they’re easy to manage.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different schema and software that can lead to
problems in query processing and transactions. Also, a particular site might be completely unaware of the other
sites. Different computers may use a different operating system, different database application. They may even
use different data models for the database. Hence, translations are required for different sites to communicate.

Features
• It is used to create, retrieve, update and delete distributed databases.
• It synchronizes the database periodically and provides access mechanisms by the virtue of which the distribution
becomes transparent to the users.
• It ensures that the data modified at any site is universally updated.
• It is used in application areas where large volumes of data are processed and accessed by numerous users
simultaneously.
• It is designed for heterogeneous database platforms.
• It maintains confidentiality and data integrity of the databases.

FACTORS ENCOURAGING DDBMS


1. Distributed Nature of Organizational Units − Most organizations in the current times are subdivided into
multiple units that are physically distributed over the globe. Each unit requires its own set of local data. Thus, the
overall database of the organization becomes distributed.
2. Need for Sharing of Data − the multiple organizational units often need to communicate with each other and
share their data and resources. This demands common databases or replicated databases that should be used in
a synchronized manner.
3. Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online Analytical Processing
(OLAP) work upon diversified systems which may have common data. Distributed database systems aid both
these processing by providing synchronized data.
4. Database Recovery − One of the common techniques used in DDBMS is replication of data across different sites.
Replication of data automatically helps in data recovery if database in any site is damaged. Users can access data
from other sites while the damaged site is being reconstructed. Thus, database failure may become almost
inconspicuous to users.
5. Support for Multiple Application Software − Most organizations use a variety of application software each with
its specific database support. DDBMS provides a uniform functionality for using the same data among different
platforms.

ADVANTAGES OF DISTRIBUTED DATABASES


• Modular Development − If the system needs to be expanded to new locations or new units, in centralized
database systems, the action requires substantial efforts and disruption in the existing functioning. However, in
distributed databases, the work simply requires adding new computers and local data to the new site and finally
connecting them to the distributed system, with no interruption in current functions.
• More Reliable − In case of database failures, the total system of centralized databases comes to a halt. However,
in distributed systems, when a component fails, the functioning of the system continues may be at a reduced
performance. Hence DDBMS is more reliable.
• Better Response − If data is distributed in an efficient manner, then user requests can be met from local data
itself, thus providing faster response. On the other hand, in centralized systems, all queries have to pass through
the central computer for processing, which increases the response time.
• Lower Communication Cost − In distributed database systems, if data is located locally where it is mostly used,
then the communication costs for data manipulation can be minimized. This is not feasible in centralized
systems.
UNIT 4
DATA WAREHOUSE:
Data warehouse refers to the process of compiling and organizing data into one common database, whereas data
mining refers to the process of extracting useful data from the databases. The data mining process depends on the
data compiled in the data warehousing phase to recognize meaningful patterns. A data warehousing is created to
support management systems.
A Data Warehouse refers to a place where data can be stored for useful mining. It is like a quick computer system
with exceptionally huge data storage capacity. Data from the various organizations’ systems are copied to the
Warehouse, where it can be fetched and conformed to delete errors. Here, advanced requests can be made against
the warehouse storage of data.
Data warehouse combines data from
numerous sources which ensure the data
quality, accuracy, and consistency. Data
warehouse boosts system execution by
separating analytics processing from
transnational databases. Data flows into a
data warehouse from different databases. A data warehouse works by sorting out data into a pattern that depicts
the format and types of data. Query tools examine the data tables using patterns.
Data warehouses and databases both are relative data systems, but both are made to serve different purposes. A
data warehouse is built to store a huge amount of historical data and empowers fast requests over all the data,
typically using Online Analytical Processing (OLAP). A database is made to store current transactions and allow quick
access to specific transactions for ongoing business processes, commonly known as Online Transaction Processing
(OLTP).

IMPORTANT FEATURES OF DATA WAREHOUSE:


 Subject Oriented - A data warehouse is subject-oriented. It provides useful data about a subject instead of the
company's ongoing operations, and these subjects can be customers, suppliers, marketing, product, promotion,
etc. A data warehouse usually focuses on modeling and analysis of data that helps the business organization to
make data-driven decisions.
 Time-Variant - The different data present in the data warehouse provides information for a specific period.
 Integrated - A data warehouse is built by joining data from heterogeneous sources, such as social databases,
level documents, etc.
 Non- Volatile - It means, once data entered into the warehouse cannot be change.

ADVANTAGES OF DATA WAREHOUSE:


 More accurate data access
 Improved productivity and performance
 Cost-efficient
 Consistent and quality data

DATA MINING:
Data mining refers to the analysis of data. It is the computer-supported process of analyzing huge sets of data that
have either been compiled by computer systems or have been downloaded into the computer. In the data mining
process, the computer analyzes the data and extracts useful information from it. It looks for hidden patterns within
the data set and tries to predict future behavior. Data mining is primarily used to discover and indicate relationships
among the data sets.
Data mining aims to enable business organizations to view business behaviors, trends relationships that allow the
business to make data-driven decisions. It is also known as knowledge Discover in Database (KDD). Data mining tools
utilize AI, statistics, databases, and machine learning systems to discover the relationship between the data. Data
mining tools can support business-related questions that traditionally time-consuming to resolve any issue.

IMPORTANT FEATURES OF DATA MINING:


 It utilizes the automated discovery of patterns.
 It predicts the expected results.
 It focuses on large data sets and databases
 It creates actionable information.

BENEFITS OF DATA MINING:


• Data mining technique helps companies to get knowledge-based information.
• Data mining helps organizations to make the profitable adjustments in operation and production.
• The data mining is a cost-effective and efficient solution compared to other statistical data applications.
• Data mining helps with the decision-making process.
• Facilitates automated prediction of trends and behaviors as well as automated discovery of hidden patterns.
• It can be implemented in new systems as well as existing platforms.
• It is the speedy process which makes it easy for the users to analyze huge amount of data in less time.

DISADVANTAGES OF DATA MINING


• There are chances of companies may sell useful information of their customers to other companies for money.
For example, American Express has sold credit card purchases of their customers to the other companies.
• Many data mining analytics software is difficult to operate and requires advance training to work on.
• Different data mining tools work in different manners due to different algorithms employed in their design.
Therefore, the selection of correct data mining tool is a very difficult task.
• The data mining techniques are not accurate, and so it can cause serious consequences in certain conditions.

DIFFERENCES BETWEEN DATA MINING AND DATA WAREHOUSING:


DATA MINING DATA WAREHOUSING

Data mining is the process of determining data A data warehouse is a database system designed for
patterns. analytics.
Data mining is generally considered as the process of Data warehousing is the process of combining all the
extracting useful data from a large set of data. relevant data.

Business entrepreneurs carry data mining with the help Data warehousing is entirely carried out by the
of engineers. engineers.
In data mining, data is analyzed repeatedly. In data warehousing, data is stored periodically.

Data mining uses pattern recognition Data warehousing is the process of extracting and
techniques to identify patterns. storing data that allow easier reporting.
One of the most amazing data mining technique is the One of the advantages of the data warehouse is its
detection and identification of the unwanted errors ability to update frequently. That is the reason why it
that occur in the system. is ideal for business entrepreneurs who want up to
date with the latest stuff.

The data mining techniques are cost-efficient as The responsibility of the data warehouse is to
compared to other statistical data applications. simplify every type of business data.

The data mining techniques are not 100 percent In the data warehouse, there is a high possibility that
accurate. It may lead to serious consequences in a the data required for analysis by the company may
certain condition. not be integrated into the warehouse. It can simply
lead to loss of data.

Companies can benefit from this analytical tool by Data warehouse stores a huge amount of historical
equipping suitable and accessible knowledge-based data that helps users to analyze different periods
data. and trends to make future predictions.

INTERNET DATABASE
An online database is a database accessible from a local network or the Internet, as opposed to one that is stored
locally on an individual computer or its attached storage (such as a CD). Online databases are hosted on websites,
made available as software as a service product accessible via a web browser.
Used for both professional and personal use, they are hosted on websites, and are software as service products
which mean that access is provided via a web browser. The TCM Movie Database is an example of an online
database.

MULTIMEDIA DATABASE
A Multimedia database (MMDB) is a collection of related for multimedia data. The multimedia data include one or
more primary media data types such as text, images, graphic objects (including drawings, sketches and illustrations)
animation sequences, audio and video.
A multimedia database is a database that include one or more primary media file types such as .txt (documents), .jpg
(images), .swf (videos), .mp3(audio) etc.
A Multimedia Database Management System (MMDBMS) is a framework that manages different types of data
potentially represented in a wide diversity of formats on a wide array of media sources. It provides support for
multimedia data types, and facilitate for creation, storage, access, query and control of a multimedia database.

A Multimedia Database (MMDB) needs to manage additional information pertaining to the actual multimedia data.
The information is about the following:
• Media data: the actual data representing an object.
• Media format data: information about the format of the media data after it goes through the acquisition,
processing and encoding phases.
• Media keyword data: the keyword descriptions, usually relating to the generation of the media data.
• Media feature data: content dependent data such as contain information about the distribution of colors, the
kinds of textures and the different shapes present in an image.
The last three types are called metadata as they describe several different aspects of the media data. The media
keyword data and media feature data are used as indices for searching purpose. The media format data is used to
present the retrieved information.

DIGITAL LIBRARY
A digital library is a collection of documents in organized electronic form, available on the Internet or on CD-ROM
(compact-disk read-only memory) disks. Depending on the specific library, a user may be able to access magazine
articles, books, papers, images, sound files, and videos.

MOBILE DATABASE
A mobile database is a database that can be connected to by a mobile computing device over a mobile network.
A database that is portable and physically separate from the corporate database server.
The mobile database includes the following components −
• The mobile database that allows users to view information even while on the move. It shares information with
the main database.
• The device that uses the mobile database to access data. This device can be a mobile phone, laptop etc.
• A communication link that allows the transfer of data between the mobile database and the main database.

Advantages of Mobile Databases


• The data in a database can be accessed from anywhere using a mobile database. It provides wireless database
access.
• The database systems are synchronized using mobile databases and multiple users can access the data with
seamless delivery process.
• Mobile databases require very little support and maintenance.
• The mobile database can be synchronized with multiple devices such as mobiles, computer devices, laptops etc.

Disadvantages of Mobile Databases


• The mobile data is less secure than data that is stored in a conventional stationary database. This presents a
security hazard.
• The mobile unit that houses a mobile database may frequently lose power because of limited battery. This
should not lead to loss of data in database.

SPATIAL DATABSE
A spatial database is a database that is optimized for storing and querying data that represents objects defined in a
geometric space. Most spatial databases allow the representation of simple geometric objects such as points, lines
and polygons. Spatial data is associated with geographic locations such as cities, towns etc.

Example
A road map is a visualization of geographic information. A road map is a 2-dimensional object which contains points,
lines, and polygons that can represent cities, roads, and political boundaries such as states or provinces.
In general, spatial data can be of two types −
• Vector data: This data is represented as discrete points, lines and polygons
• Rastor data: This data is represented as a matrix of square cell

METADATA:
Metadata is simply defined as data about data. The data that are used to represent other data is known as metadata.
For example, the index of a book serves as a metadata for the contents in the book. In other words, we can say that
metadata is the summarized data that leads us to the detailed data.
In terms of data warehouse, we can define metadata as following −
• Metadata is a road-map to data warehouse.
• Metadata in data warehouse defines the warehouse objects.
• Metadata acts as a directory. This directory helps the decision support system to locate the contents of a data
warehouse.

METADATA REPOSITORY:
Metadata repository is an integral part of a data warehouse system. It contains the following metadata −
• Business metadata − It contains the data ownership information, business definition, and changing policies.
• Operational metadata − It includes currency of data and data lineage. Currency of data refers to the data being
active, archived, or purged. Lineage of data means history of data migrated and transformation applied on it.
• Data for mapping from operational environment to data warehouse − It metadata includes source databases
and their contents, data extraction, data partition, cleaning, transformation rules, data refresh and purging rules.
• The algorithms for summarization − It include dimension algorithms, data on granularity, aggregation,
summarizing, etc.

Multimedia database is the collection of interrelated multimedia data that includes text, graphics (sketches,
drawings), images, animations, video, audio etc. and have vast amounts of multisource multimedia data. The
framework that manages different types of multimedia data which can be stored, delivered and utilized in different
ways is known as multimedia database management system. There are three classes of the multimedia database
which includes static media, dynamic media and dimensional media.
Content of Multimedia Database management system:
• Media data – The actual data representing an object.
• Media format data – Information such as sampling rate, resolution, encoding scheme etc. about the format of
the media data after it goes through the acquisition, processing and encoding phase.
• Media keyword data – Keywords description relating to the generation of data. It is also known as content
descriptive data. Example: date, time and place of recording.
• Media feature data – Content dependent data such as the distribution of colors, kinds of texture and different
shapes present in data.

Types of multimedia applications based on data management characteristic are:


• Repository applications – A Large amount of multimedia data as well as meta-data (Media format date, Media
keyword data, Media feature data) that is stored for retrieval purpose, e.g., Repository of satellite images,
engineering drawings, radiology scanned pictures.
• Presentation applications – They involve delivery of multimedia data subject to temporal constraint. Optimal
viewing or listening requires DBMS to deliver data at certain rate offering the quality of service above a certain
threshold. Here data is processed as it is delivered. Example: Annotating of video and audio data, real-time
editing analysis.
• Collaborative work using multimedia information – It involves executing a complex task by merging drawings,
changing notifications. Example: Intelligent healthcare network.

There are still many challenges to multimedia databases, some of which are:
• Modeling – Working in this area can improve database versus information retrieval techniques thus, documents
constitute a specialized area and deserve special consideration.
• Design – The conceptual, logical and physical design of multimedia databases has not yet been addressed fully as
performance and tuning issues at each level are far more complex as they consist of a variety of formats like
JPEG, GIF, PNG, MPEG which is not easy to convert from one form to another.
• Storage – Storage of multimedia database on any standard disk presents the problem of representation,
compression, mapping to device hierarchies, archiving and buffering during input-output operation. In DBMS, a
“BLOB” (Binary Large Object) facility allows untied bitmaps to be stored and retrieved.
• Performance – For an application involving video playback or audio-video synchronization, physical limitations
dominate. The use of parallel processing may alleviate some problems but such techniques are not yet fully
developed. Apart from this multimedia database consume a lot of processing time as well as bandwidth.
• Queries and retrieval –For multimedia data like images, video, audio accessing data through query opens up
many issues like efficient query formulation, query execution and optimization which need to be worked upon.

You might also like