Summ of Dbms

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

What is a database?

A database is a collection of logically related data. Data means raw facts or data that can be
recorded and that have implicit meaning.

A database has the following implicit properties:

I. A database represents some aspect of the real world, sometimes called the miniworld
or the Universe of Discourse (UoD).

II. A database is a logically coherent collection of data with some inherent meaning.

III. A database is designed, built, and populated with data for a specific purpose.

Database Management System

Database Management Systems are software or program products that help in defining,
creating, maintaining and controlling access to a database. Defining a database involves
specifying the data types, structures, and constraints for the data to be stored in the database.
Constructing the database is the process of storing the data itself on some storage medium
that is controlled by the DBMS. Manipulating a database includes such functions as querying
the database to retrieve specific data, updating the database to reflect changes in the
miniworld, and generating reports from the data.

Data Vs Information

Data means raw facts or data that can be recorded and have implicit meaning.

Information is a data that has been processed, organized, or structured in a way that provides
meaning, making it useful for understanding or decision-making.

Difference
Data Information
Data doesn’t depend on information. Information depends on data.
Data is measured in bits and bytes. Information is measured in meaningful units
like time, quantity, etc.
Data is unorganized and often unrelated Information is organized and meaningful
Data lacks inherent meaning Information contains meaning and context
Data requires processing to be understood Already interpreted and understandable

Classification of database

• The number of users supported (single-user and multiuser (workgroup or enterprises))


• Where the data is located (centralized or distributed).
• Types of data stored (general-purpose and discipline-specific).
• The intended data usage (operational or analytic).
• The degree to which the data is structured (structured and unstructured).
File Processing Approach

File processing approach is a traditional approach to information system design that focuses
on data processing needs of a particular department or an area within organization.

The file processing is being developed with a third-generation programming language where
each application or program is designed with its own set of data files. Traditional files often
exist in environments where there is no overall plan to guide the growth of applications.
Hence, there is a need to write an application for each file with a different storage structure.

File Processing Approach has the following drawbacks:-

• Data Redundancy: Duplication of data across different files, leading to increased


storage and maintenance.
• Data Inconsistency: Conflicting or mismatched data values in different files due to
redundant data.
• Data Isolation: Data is stored in separate files, making it difficult to access and
integrate across different applications.
• Dependency on Application Program: Changes in data structure require changes to
application programs that access the data, limiting flexibility.
• Atomicity Issues: Lack of mechanisms to ensure complete or partial transactions,
making it hard to revert changes in case of failure.
• Data Security: Limited access control, making it challenging to restrict or secure data
effectively from unauthorized access.

Database Approach

The database approach used database management system to provide facilities for querying,
data security, and integrity, and allows simultaneous access to data by a number of different
users.

advantages of Database management system over file system:

i. Minimal Data Redundancy.


ii. Consistency of Data.
iii. Integration of Data.
iv. Sharing of Data.
v. Ease of Application Development.

Costs of the Database Approach

• New Specialized Personnel: Requires hiring trained database administrators and


developers, which adds to operational costs.
• Need for Explicit Backup: Requires regular, structured backups to prevent data loss,
demanding time and resources.
• Interference with Shared Data: Multiple users accessing shared data can lead to
conflicts and slow performance if not managed well.
• Organizational Conflict: Changes in data control and management can lead to
conflicts between departments over data ownership and access policies.
Actors on the Scene

This refers to the people whose jobs involve the day-to-day use of a large database. They are:

• Database Administrators: Responsible for managing, maintaining, and securing the


database system, ensuring performance, availability, and data integrity.

• Database Designers: Focus on creating the database structure and schema,


determining how data will be stored, organized, and accessed.

• End Users: Individuals who interact with the database through applications to
retrieve or input data, including both casual users and power users.

• System Analysts and Application Programmers: Analyze user requirements to


design and develop database applications, ensuring that they meet organizational
needs and are user-friendly.

Workers behind the scene

This refers to the people whose work is to maintain the database system environment but who
are not actively interested in the database itself. They are:

• System designers and implementers.


• Tool developers.
• Operators and maintenance personnel.

Components of the Database Environment


The overhead costs of using a DBMS are due to the following

• Licensing Fees.
• Hardware Costs.
• Integration Costs.
• Training Costs.
• Maintenance and Support.

Circumstances where file processing approach is more desirable:

• The database and applications are simple, well-defined, and not expected to change.
• There are stringent real-time requirements for some programs that may not be met
because of DBMS overhead.
• Multiple-user access to data is not required.

Database Development Process

Database application development is the process of obtaining real-world requirements,


analyzing requirements, designing the data and functions of the system, and then
implementing the operations in the system.

The waterfall Software Development Life Cycle has been adopted as the basis for a model of
database development.

Stages

• Planning: The goal of the planning phase is to align information technology with the
business strategies of an organization.
• Analysis: The purpose of this phase is to develop detailed specifications for the
information systems required to support the organization.
• Design: The purpose of this stage is to transform the conceptual data model into an
implementation model. ( logical and physical).
• Implementation.

Data Modeling and Data Model

Database design focuses on how the database structure will be used to store and manage
end-user data.

Data modeling refers to the process of creating a specific data model for a determined
problem domain.

Data model is a relatively simple representation, usually graphical, of more complex real-
world data structures. Types:

• Relational: A data model that organizes data into tables (relations) with rows and
columns, allowing relationships between tables through foreign keys. It uses SQL for
data manipulation and retrieval.
• Semi-structured: A data model that does not have a fixed schema but still contains
organizational properties that make it easier to analyze, such as XML and JSON. It
allows for flexibility in data representation.
• Entity-relationship: A conceptual data model that is used to visually represent data
entities, their attributes, and the relationships between them. ER diagrams help in
database design and structure planning.
• Object-base: A data model that represents data as objects, similar to object-oriented
programming. It encapsulates both data and behaviors, allowing for complex data
structures and relationships, typically used in applications requiring advanced data
representation.

Database schema and instance

A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relations among them are
associated.

A database instance is the data that is stored in the database at a particular moment.

Three-Schema Architecture (level)

The goal of the three-schema architecture is to separate the user applications and the physical
database.

i. The internal level describes the physical storage structure of the database.
ii. The conceptual level describes the structure of the whole database.
iii. The external or view level includes many external schemas or user views.

Data Independence

This is defined as the capacity to change the schema at one level of a database system without
having to change the schema at the next higher level.

types

i. Logical data independence is the capacity to change the conceptual schema without
having to change external schemas.
ii. Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas.

DATABASE MODELING USING ENTITY-RELATIONSHIP (ER)

This is a conceptual data model used to visually represent data entities, their attributes, and
the relationships between them. ER diagrams help in database design and structure planning.
the ER Model creates entity sets, relationship sets, general attributes, and constraints.
Entity

An entity in an ER model is a real-world entity having properties called attributes. Every


attribute is defined by its set of values called domain.

Entity type is a collection of entities that share common properties or characteristics.

Entity instance is a single occurrence of an entity type.

Attribute is a property or characteristic of an entity. Attributes are represented by means of


ellipses. Every ellipse represents one attribute and is directly connected to its entity
(rectangle).

Types of Attributes

1. Simple attribute: Simple attributes are atomic values that cannot be divided further.
For example, a student's phone number has an atomic value of 10 digits.
2. Composite attribute: Composite attributes are made of more than one simple
attribute. For example, a student's complete name may have a Surname, first name,
and other name.
3. Derived attribute: Derived attributes are attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. It
is represented by a dashed oval in an ER Diagram.

Types

1. Single-value attribute: Single-value attributes contain single value. For example


NationalIdentificationNumber or InternationalPassportNo.
2. Multi-value attribute: Multi-value attributes may contain more than one value. For
example, a person can have more than one phoneNo, email address, etc.
Entity-Set and Keys

The key is an attribute or collection of attributes that uniquely identifies an entity among
entity sets. For example, the StudentID of a student makes him/her identifiable among
students.

1. Super Key: A set of attributes (one or more) that collectively identifies an entity in an
entity set.
2. Candidate Key: A minimal super key is called a candidate key. An entity set may
have more than one candidate key.
3. Primary Key: A primary key is one of the candidate keys chosen by the database
designer to uniquely identify the entity set.

Relationship This refers to the logical association among entities. Diamond or rhombus is
used to represent the relationship.

Relationship Set A set of relationships of similar type.

Degree of Relationship The number of participating entities in a relationship defines the


degree of the relationship.

1. Unary relationship (recursive relationship): It is a relationship between the instances


of one entity type.

2. Binary relationship: It is a relationship between instances of two entity types.


3. Ternary relationship: It is a simultaneous relationship among instances of three
entity types.

Mapping cardinalities

Cardinality defines the number of entities in one entity set, which can be associated with the
number of entities of other set via relationship set.

1. One to one
2. One to may
3. Many to one
4. Many to many

Existence Dependenc: an entity existence depends on another entity.

Weak Entity

An entity that requires a strong entity for identification and can not exist independently. It has
a partial key which is represented by a double rectangle.

RELATIONAL DATABASE MODEL, RELATIONAL DATABASE CONSTRAINT

Relational Database Model

A relational data model organizes data into tables (relations) with rows and columns,
allowing relationships between tables through foreign keys. It uses SQL for data
manipulation and retrieval.

Advantage of relational data model over order data model

1. Simple data representation


2. Ease of expression of complex queries.

Relational Database Model

1. Table: relations are saved in the format of Tables. A table has rows and columns,
where rows represent records and columns represent the attributes.
2. Tuple: A single row of a table, which contains a single record for that relation is
called a tuple.
3. Relation instance: A finite set of tuples in the relational database system.
4. Relation schema: A relation schema describes the relation name (table name),
attributes, and their names.
5. Relation key: Each row has one or more attributes, known as relation key, which can
identify the row in the relation (table) uniquely.
6. Attribute domain: Every attribute has some predefined value scope, known as
attribute domain.

Relational Integrity Constraints referred to conditions which must be present for a valid
relation.

Key constraints is an attribute that can uniquely identify a tuple in a relation is called the key
of the table. Key constraints force that:

• In a relation with a key attribute, no two tuples can have identical values for key
attributes.
• A key attribute cannot have NULL values.
• Key constraints are also referred to as Entity Constraints.

Domain constraints specify that within each tuple, and the value of each attribute must be
unique. This is specified as data types which include standard data types integers, real
numbers, characters, Booleans, variable length strings, etc.

Operations in Relational Model

There are four basic update operations performed on relational database model are insert,
update, delete and select.

1. Insert is used to insert data into the relation.


2. Delete is used to delete tuples from the table.
3. Modify/update allows you to change the values of some attributes in existing tuples.
4. Select allows you to choose a specific range of data.
NOTE: Whenever one of these operations is applied, integrity constraints specified on the
relational database schema must never be violated.

Advantages of using Relational Model

1. Simplicity
2. Structural Independence
3. Easy to use
4. Query capability
5. Data independence
6. Scalable

Disadvantages of using Relational Model

1. Performance issues
2. Scalability challenges
3. Complexity

Mapping Entity Relationship Model to Relational Model

Mapping Process (Algorithm):-

• Create table for each entity.


• Entity's attributes should become fields of tables with their respective data types.
• Declare primary key.

FUNCTIONAL DEPENDENCIES AND NORMALIZATION

Normalization is the process of effectively organizing data into multiple relational tables to
minimize data redundancy. "Normalization is all about the key, the whole key, and
nothing but the key".

Normalization is a technique/method that can help you avoid data anomalies and other
problems with managing your data. It consists of transforming a table through various stages:
1st normal form, 2nd normal form, 3rd normal form, and beyond. It aims to:

1. Eliminate data redundancies (and therefore use less space).


2. Make it easier to make changes to data, and avoid anomalies when doing so.
3. Make referential integrity constraints easier to enforce.
4. Produce an easily comprehensible structure that closely resembles the situation the
data represents, and allows for growth.

There are three types of anomalies that may arise which include the following:-

1. Insertion Anomalies: Insertion anomalies occur when we try to insert data into a
flawed table. Likewise if you try to insert data in a record that does not exist at all.
2. Deletion Anomalies: Deletion anomalies occur when we delete data from a flawed
schema. If you try to delete a record, but parts of it was left undeleted because of
unawareness, the data is also saved somewhere else.
3. Update Anomalies: Update anomalies occur when we change data in a flawed
schema. − If data items are scattered and are not linked to each other properly, then it
could lead to strange situations.

Advantages of Normalization

1. Redundant data gets removed efficiently.


2. Improved data quality and flexibility in database designing.
3. The improved overall organization of data in the database.
4. Data is consistent and logically stored in the database.

Functional Dependency

A Functional dependency defines the relationship between two attributes, typically between a
prime attribute (primary key) and non-prime attributes. If for a table X containing attributes
A and B, among which the attribute A is a primary key then the functional dependency is
defined as: A→B.

NOTE: The left-hand side attributes determine the values of attributes on the right-hand side.

Normalization consists of transforming a table through various stages: 1st normal form,
2nd normal form, 3rd normal form, and beyond.

First Normal Form (1NF):

• A table is in 1NF if all its columns contain atomic (indivisible) values, meaning each
column must contain only one value per row.
• It eliminates repeating groups or arrays within rows.
• For example, instead of storing multiple phone numbers in one cell, each phone
number should be in its row.

Second Normal Form (2NF):

• A table is in 2NF if it is already in 1NF and all non-key attributes (columns that are
not part of the primary key) are fully dependent on the entire primary key.
• This form applies primarily to tables with a composite primary key (a primary key
made up of more than one column).
• It eliminates partial dependencies, where a non-key column depends on only part of a
composite primary key.
• For example, if a table stores information about a student and their courses, details
like “student name” should not depend on only part of the primary key like “course
ID.”

Third Normal Form (3NF):

• A table is in 3NF if it is already in 2NF and all non-key attributes are only dependent
on the primary key, not on other non-key attributes.
• This eliminates transitive dependencies, where a non-key attribute depends on another
non-key attribute rather than directly on the primary key.
• For example, if a table has columns for “student ID,” “student address,” and “city,”
the “city” column should depend on the “student ID” directly, not indirectly through
“student address.”

Boyce-Codd Normal Form (BCNF):

• A stronger version of 3NF, BCNF requires that for any dependency \( X \rightarrow Y
\), \( X \) must be a superkey (a unique identifier) of the table.
• BCNF addresses situations where a table in 3NF may still have anomalies, often due
to overlapping candidate keys.

You might also like