Summary of DBMS
A database is a collection of logically related data. Data means known raw facts that can be
recorded and that have implicit meaning.
I. A database represents some aspect of the real world, sometimes called the miniworld
or the Universe of Discourse (UoD).
II. A database is a logically coherent collection of data with some inherent meaning.
III. A database is designed, built, and populated with data for a specific purpose.
Database Management Systems are software or program products that help in defining,
creating, maintaining and controlling access to a database. Defining a database involves
specifying the data types, structures, and constraints for the data to be stored in the database.
Constructing the database is the process of storing the data itself on some storage medium
that is controlled by the DBMS. Manipulating a database includes such functions as querying
the database to retrieve specific data, updating the database to reflect changes in the
miniworld, and generating reports from the data.
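For illustration, here is a minimal SQL sketch (the Student table and its columns are hypothetical) of how these three activities map onto statements: defining a database with CREATE TABLE, constructing it with INSERT, and manipulating it with SELECT and UPDATE.

    -- Defining: specify data types, structures, and constraints
    CREATE TABLE Student (
        StudentID    INT PRIMARY KEY,
        Name         VARCHAR(100) NOT NULL,
        StudentLevel INT
    );
    -- Constructing: store the data itself under DBMS control
    INSERT INTO Student (StudentID, Name, StudentLevel) VALUES (1001, 'Ada', 200);
    -- Manipulating: query the data and update it to reflect the miniworld
    SELECT Name FROM Student WHERE StudentLevel = 200;
    UPDATE Student SET StudentLevel = 300 WHERE StudentID = 1001;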
Data Vs Information
Data means raw facts that can be recorded and that have implicit meaning.
Information is data that has been processed, organized, or structured in a way that provides
meaning, making it useful for understanding or decision-making.
Difference
Data | Information
Data does not depend on information. | Information depends on data.
Data is measured in bits and bytes. | Information is measured in meaningful units such as time, quantity, etc.
Data is unorganized and often unrelated. | Information is organized and meaningful.
Data lacks inherent meaning. | Information carries meaning and context.
Data requires processing to be understood. | Information is already interpreted and understandable.
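As a rough illustration of the comparison above (the Sales table and its columns are assumed, not taken from any particular system), the stored rows are data, while an aggregating query produces information that can support a decision.

    -- Data: individual recorded facts
    CREATE TABLE Sales (
        SaleID   INT PRIMARY KEY,
        Product  VARCHAR(50),
        Amount   DECIMAL(10, 2),
        SaleDate DATE
    );
    -- Information: processed, organized output with meaning
    SELECT Product, SUM(Amount) AS TotalRevenue
    FROM Sales
    GROUP BY Product
    ORDER BY TotalRevenue DESC;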
Classification of Database Approaches
File Processing Approach
The file processing approach is a traditional approach to information system design that focuses
on the data processing needs of a particular department or area within an organization.
File processing systems are typically developed with a third-generation programming language,
where each application or program is designed with its own set of data files. Traditional files
often exist in environments where there is no overall plan to guide the growth of applications.
Hence, a separate application has to be written for each file with a different storage structure.
Database Approach
The database approach uses a database management system to provide facilities for querying,
data security, and integrity, and it allows simultaneous access to data by a number of different
users.
Actors on the Scene
These are the people whose jobs involve the day-to-day use of a large database. They include:
• End Users: Individuals who interact with the database through applications to
retrieve or input data, including both casual users and power users.
Workers Behind the Scene
These are the people whose work is to maintain the database system environment but who are
not actively interested in the database itself. They include DBMS system designers and
implementers, tool developers, and operators and maintenance personnel.
Costs of Using a DBMS
The main costs of adopting a DBMS include:
• Licensing Fees.
• Hardware Costs.
• Integration Costs.
• Training Costs.
• Maintenance and Support.
When Not to Use a DBMS
It may be better not to use a DBMS when:
• The database and applications are simple, well-defined, and not expected to change.
• There are stringent real-time requirements for some programs that may not be met
because of DBMS overhead.
• Multiple-user access to data is not required.
Database Development Life Cycle
The waterfall Software Development Life Cycle (SDLC) has been adopted as the basis for a
model of database development.
Stages
• Planning: The goal of the planning phase is to align information technology with the
business strategies of an organization.
• Analysis: The purpose of this phase is to develop detailed specifications for the
information systems required to support the organization.
• Design: The purpose of this stage is to transform the conceptual data model into an
implementation model (logical and physical).
• Implementation.
Database design focuses on how the database structure will be used to store and manage
end-user data.
Data modeling refers to the process of creating a specific data model for a determined
problem domain.
Data model is a relatively simple representation, usually graphical, of more complex real-
world data structures. Types:
• Relational: A data model that organizes data into tables (relations) with rows and
columns, allowing relationships between tables through foreign keys. It uses SQL for
data manipulation and retrieval.
• Semi-structured: A data model that does not have a fixed schema but still contains
organizational properties that make it easier to analyze, such as XML and JSON. It
allows for flexibility in data representation.
• Entity-relationship: A conceptual data model that is used to visually represent data
entities, their attributes, and the relationships between them. ER diagrams help in
database design and structure planning.
• Object-based: A data model that represents data as objects, similar to object-oriented
programming. It encapsulates both data and behaviors, allowing for complex data
structures and relationships, typically used in applications requiring advanced data
representation.
A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relations among them are
associated.
A database instance is the data that is stored in the database at a particular moment.
The goal of the three-schema architecture is to separate the user applications and the physical
database.
i. The internal level describes the physical storage structure of the database.
ii. The conceptual level describes the structure of the whole database.
iii. The external or view level includes many external schemas or user views.
Data Independence
This is defined as the capacity to change the schema at one level of a database system without
having to change the schema at the next higher level.
Types
i. Logical data independence is the capacity to change the conceptual schema without
having to change external schemas.
ii. Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas (see the SQL sketch after this list).
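A minimal SQL sketch of both ideas (all names are illustrative): a view plays the role of an external schema over the conceptual schema, and an index is a change at the internal level only, so neither change forces user applications to be rewritten.

    -- Conceptual level: the base table
    CREATE TABLE Employee (
        EmpID  INT PRIMARY KEY,
        Name   VARCHAR(100),
        Salary DECIMAL(10, 2),
        DeptID INT
    );
    -- External level: a user view that hides Salary; columns can later be
    -- added to the base table without breaking it (logical data independence)
    CREATE VIEW EmployeeDirectory AS
    SELECT EmpID, Name, DeptID FROM Employee;
    -- Internal level: a storage/access-path change only; existing queries and
    -- views are unaffected (physical data independence)
    CREATE INDEX idx_employee_dept ON Employee (DeptID);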
The Entity-Relationship (ER) Model
This is a conceptual data model used to visually represent data entities, their attributes, and
the relationships between them. ER diagrams help in database design and structure planning.
The ER model defines entity sets, relationship sets, general attributes, and constraints.
Entity
An entity is a real-world object or concept with an independent existence, such as a student, a
course, or an employee. In an ER diagram, an entity is represented by a rectangle.
Types of Attributes
1. Simple attribute: Simple attributes are atomic values that cannot be divided further.
For example, a student's phone number has an atomic value of 10 digits.
2. Composite attribute: Composite attributes are made of more than one simple
attribute. For example, a student's complete name may have a Surname, first name,
and other name.
3. Derived attribute: Derived attributes are attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. It
is represented by a dashed oval in an ER Diagram.
Keys
A key is an attribute or collection of attributes that uniquely identifies an entity within an
entity set. For example, the StudentID of a student makes him/her identifiable among other
students. The types of keys are listed below (a SQL sketch after the list shows how they
appear in practice).
1. Super Key: A set of attributes (one or more) that collectively identifies an entity in an
entity set.
2. Candidate Key: A minimal super key is called a candidate key. An entity set may
have more than one candidate key.
3. Primary Key: A primary key is one of the candidate keys chosen by the database
designer to uniquely identify the entity set.
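A small SQL sketch (attributes are hypothetical) of how these keys appear in practice: {StudentID} and {Email} are both candidate keys, StudentID is chosen as the primary key, and Email is kept unique.

    CREATE TABLE Student (
        StudentID INT,                 -- chosen candidate key
        Email     VARCHAR(100) NOT NULL,
        Name      VARCHAR(100),
        PRIMARY KEY (StudentID),       -- primary key
        UNIQUE (Email)                 -- another candidate key, enforced as unique
    );
    -- Any superset of {StudentID}, e.g. {StudentID, Name}, is a super key,
    -- but only the minimal sets {StudentID} and {Email} are candidate keys.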
Relationship
This refers to the logical association among entities. A diamond (rhombus) is used to
represent a relationship in an ER diagram.
Mapping cardinalities
Cardinality defines the number of entities in one entity set that can be associated with
entities of another entity set via a relationship set. The four types (illustrated in the sketch
after this list) are:
1. One to one
2. One to many
3. Many to one
4. Many to many
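A hedged SQL sketch of the two most common cases (table and column names are illustrative): a one-to-many relationship is expressed with a foreign key on the "many" side, while a many-to-many relationship needs a separate junction table.

    -- One to many: one Department has many Employees
    CREATE TABLE Department (
        DeptID   INT PRIMARY KEY,
        DeptName VARCHAR(100)
    );
    CREATE TABLE Employee (
        EmpID  INT PRIMARY KEY,
        Name   VARCHAR(100),
        DeptID INT REFERENCES Department(DeptID)
    );
    -- Many to many: a Student enrolls in many Courses and vice versa
    CREATE TABLE Student (StudentID INT PRIMARY KEY, Name VARCHAR(100));
    CREATE TABLE Course  (CourseID  INT PRIMARY KEY, Title VARCHAR(100));
    CREATE TABLE Enrollment (
        StudentID INT REFERENCES Student(StudentID),
        CourseID  INT REFERENCES Course(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );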
Weak Entity
An entity that requires a strong (owner) entity for identification and cannot exist
independently. It has a partial key (discriminator) rather than a full primary key of its own,
and it is drawn as a double rectangle in an ER diagram (see the sketch below).
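A minimal relational sketch of a weak entity (names are assumed): a Dependent is identified by its partial key together with the key of its owning Employee, so its primary key is composite and includes the owner's key as a foreign key.

    CREATE TABLE Employee (
        EmpID INT PRIMARY KEY,
        Name  VARCHAR(100)
    );
    -- Weak entity: cannot be identified without its owner Employee
    CREATE TABLE Dependent (
        EmpID         INT REFERENCES Employee(EmpID),
        DependentName VARCHAR(100),         -- partial key (discriminator)
        Relationship  VARCHAR(50),
        PRIMARY KEY (EmpID, DependentName)  -- owner key + partial key
    );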
Relational Data Model
A relational data model organizes data into tables (relations) with rows and columns,
allowing relationships between tables through foreign keys. It uses SQL for data
manipulation and retrieval.
1. Table: relations are saved in the format of Tables. A table has rows and columns,
where rows represent records and columns represent the attributes.
2. Tuple: A single row of a table, which contains a single record for that relation is
called a tuple.
3. Relation instance: A finite set of tuples in the relational database system.
4. Relation schema: A relation schema describes the relation name (table name),
attributes, and their names.
5. Relation key: Each row has one or more attributes, known as relation key, which can
identify the row in the relation (table) uniquely.
6. Attribute domain: Every attribute has some predefined value scope, known as
attribute domain.
Relational integrity constraints refer to conditions that must hold for a relation to be valid.
Key constraints: An attribute that can uniquely identify a tuple in a relation is called the key
of the table. Key constraints enforce that:
• In a relation with a key attribute, no two tuples can have identical values for key
attributes.
• A key attribute cannot have NULL values.
• Key constraints are also referred to as Entity Constraints.
Domain constraints specify that, within each tuple, the value of each attribute must be an
atomic value drawn from that attribute's domain. Domains are specified as data types, which
include standard types such as integers, real numbers, characters, Booleans, and variable-length
strings (see the sketch below).
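A short sketch with illustrative names of how these constraints are typically declared: the primary key rules out duplicate and NULL key values, while data types and a CHECK clause restrict each attribute to its domain.

    CREATE TABLE Product (
        ProductID INT PRIMARY KEY,        -- key constraint: unique, never NULL
        Name      VARCHAR(100) NOT NULL,  -- domain: variable-length string
        Price     DECIMAL(10, 2) CHECK (Price >= 0),  -- domain constraint on values
        InStock   BOOLEAN                 -- domain: Boolean
    );
    -- Would violate the key constraint if ProductID 1 already exists:
    -- INSERT INTO Product VALUES (1, 'Pen', 0.50, TRUE);
    -- Would violate the domain (CHECK) constraint:
    -- INSERT INTO Product VALUES (2, 'Pencil', -3.00, TRUE);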
The four basic operations performed on the relational database model are insert, update,
delete, and select, as shown below.
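A brief sketch of the four operations against a hypothetical Student table:

    -- Insert: add a new tuple
    INSERT INTO Student (StudentID, Name, StudentLevel) VALUES (1002, 'Grace', 100);
    -- Select: retrieve specific tuples
    SELECT StudentID, Name FROM Student WHERE StudentLevel = 100;
    -- Update: change existing values
    UPDATE Student SET StudentLevel = 200 WHERE StudentID = 1002;
    -- Delete: remove tuples
    DELETE FROM Student WHERE StudentID = 1002;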
Advantages of the Relational Model
1. Simplicity
2. Structural Independence
3. Easy to use
4. Query capability
5. Data independence
6. Scalable
Disadvantages of the Relational Model
1. Performance issues
2. Scalability challenges
3. Complexity
Normalization is the process of effectively organizing data into multiple relational tables to
minimize data redundancy. "Normalization is all about the key, the whole key, and
nothing but the key".
Normalization is a technique/method that can help you avoid data anomalies and other
problems with managing your data. It consists of transforming a table through various stages:
1st normal form, 2nd normal form, 3rd normal form, and beyond. It aims to eliminate the
anomalies described below.
There are three types of anomalies that may arise, which include the following:
1. Insertion Anomalies: Insertion anomalies occur when we try to insert data into a
flawed table; for example, we may be unable to record a fact without also supplying
data for a record that does not exist at all.
2. Deletion Anomalies: Deletion anomalies occur when we delete data from a flawed
schema; deleting a record may unintentionally remove other facts that were stored
only in that row, unless the data is also saved somewhere else.
3. Update Anomalies: Update anomalies occur when we change data in a flawed
schema. If data items are scattered and not linked to each other properly, the same
fact may have to be updated in several places, which can lead to inconsistent data.
Advantages of Normalization
Normalization reduces data redundancy, prevents insertion, deletion, and update anomalies,
and improves data integrity and consistency.
Functional Dependency
A functional dependency defines the relationship between two attributes, typically between a
prime attribute (primary key) and non-prime attributes. If a table contains attributes A and B,
where A is the primary key, then the functional dependency is written as A→B.
NOTE: The left-hand side attributes determine the values of attributes on the right-hand side.
Normalization consists of transforming a table through various stages: 1st normal form,
2nd normal form, 3rd normal form, and beyond.
First Normal Form (1NF)
• A table is in 1NF if all its columns contain atomic (indivisible) values, meaning each
column must contain only one value per row.
• It eliminates repeating groups or arrays within rows.
• For example, instead of storing multiple phone numbers in one cell, each phone
number should be in its own row (see the sketch after this list).
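A hedged sketch of the phone-number example (table and column names are assumed): instead of a repeating or multi-valued column, each phone number gets its own row in a separate table.

    -- Not in 1NF: one cell holds several phone numbers
    -- Student(StudentID, Name, PhoneNumbers = '0801..., 0902...')
    -- In 1NF: atomic values only, one phone number per row
    CREATE TABLE Student (
        StudentID INT PRIMARY KEY,
        Name      VARCHAR(100)
    );
    CREATE TABLE StudentPhone (
        StudentID INT REFERENCES Student(StudentID),
        Phone     VARCHAR(15),
        PRIMARY KEY (StudentID, Phone)
    );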
Second Normal Form (2NF)
• A table is in 2NF if it is already in 1NF and all non-key attributes (columns that are
not part of the primary key) are fully dependent on the entire primary key.
• This form applies primarily to tables with a composite primary key (a primary key
made up of more than one column).
• It eliminates partial dependencies, where a non-key column depends on only part of a
composite primary key.
• For example, if a table stores information about a student and their courses, details
like “student name” should not depend on only part of the primary key, such as
“course ID” (see the sketch after this list).
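A sketch of removing the partial dependency in the student/course example (column names are assumed): StudentName depends only on StudentID, which is just part of the composite key, so it is moved into its own table.

    -- Not in 2NF: StudentName depends on StudentID alone,
    -- only part of the composite key (StudentID, CourseID)
    -- Enrollment(StudentID, CourseID, StudentName, Grade)
    -- In 2NF: the partial dependency is removed
    CREATE TABLE Student (
        StudentID   INT PRIMARY KEY,
        StudentName VARCHAR(100)
    );
    CREATE TABLE Enrollment (
        StudentID INT REFERENCES Student(StudentID),
        CourseID  INT,
        Grade     CHAR(2),
        PRIMARY KEY (StudentID, CourseID)
    );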
Third Normal Form (3NF)
• A table is in 3NF if it is already in 2NF and all non-key attributes are only dependent
on the primary key, not on other non-key attributes.
• This eliminates transitive dependencies, where a non-key attribute depends on another
non-key attribute rather than directly on the primary key.
• For example, if a table has columns for “student ID,” “student address,” and “city,”
the “city” column should depend on “student ID” directly, not indirectly through
“student address” (see the sketch after this list).
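A sketch of removing the transitive dependency in the address example (names are assumed): City depends on Address rather than directly on StudentID, so the address-to-city mapping is split into its own table.

    -- Not in 3NF: StudentID -> Address and Address -> City,
    -- so City depends on the key only transitively
    -- Student(StudentID, Address, City)
    -- In 3NF: the transitive dependency is removed
    CREATE TABLE AddressCity (
        Address VARCHAR(200) PRIMARY KEY,
        City    VARCHAR(100)
    );
    CREATE TABLE Student (
        StudentID INT PRIMARY KEY,
        Address   VARCHAR(200) REFERENCES AddressCity(Address)
    );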
Boyce-Codd Normal Form (BCNF)
• A stronger version of 3NF, BCNF requires that for any dependency X→Y, X must be
a superkey (a unique identifier) of the table.
• BCNF addresses situations where a table in 3NF may still have anomalies, often due
to overlapping candidate keys (see the sketch after this list).
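A hedged sketch of a classic BCNF case (the schema is assumed): in Enrollment(StudentID, CourseID, Instructor), suppose each instructor teaches exactly one course, so Instructor→CourseID holds even though Instructor is not a superkey. The table satisfies 3NF (CourseID is a prime attribute) but not BCNF, so it is decomposed until every determinant is a key.

    -- FDs: (StudentID, CourseID) -> Instructor and Instructor -> CourseID
    -- Decomposition so that every determinant is a key:
    CREATE TABLE InstructorCourse (
        Instructor VARCHAR(100) PRIMARY KEY,  -- Instructor -> CourseID
        CourseID   INT
    );
    CREATE TABLE StudentInstructor (
        StudentID  INT,
        Instructor VARCHAR(100) REFERENCES InstructorCourse(Instructor),
        PRIMARY KEY (StudentID, Instructor)
    );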