Database 3

Download as pdf or txt
Download as pdf or txt
You are on page 1of 72

Relational Model Concepts

Andrej Tunevič
andrej.tunevic@vilniustech.lt
Relational Model
Concepts
• A Relation is a mathematical concept based on the ideas
of sets
• The model was first proposed by Dr. E.F. Codd of IBM
Research in 1970 in the following paper:
– "A Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970
• The above paper caused a major revolution in the field of
database management and earned Dr. Codd the coveted
ACM Turing Award
Outline
• Relational Model Concepts
– Domains, Attributes, and Relations
– Characteristics of Relations
• Relational Model Constrains and Relational
Database Schemas
• Update Operations, Transactions, and Dealing
with Constrain Violations
Relational Model
Concepts
• The relational model represents the database as
a collection of relations

• Each relation resembles a table of values

• When a relation is thought of as a table of


values, each row in the table represents a
collection of related data values
Formal Terminology
• A row is called a tuple
• A column header is called an attribute
• The table is called relation
Domain
• A Domain D is a set of atomic values.

• Atomic means that each value in the domain is indivisible


as far as the relational model is concerned

• It means that if we separate an atomic value, the value


itself become meaningless, for example:
– SSN
– Local_phone_number
– Names
– Employee_ages
Relational Model
Concepts
• The relational Model of Data is based on the concept of a
Relation
– The strength of the relational approach to data
management comes from the formal foundation provided by
the theory of relations
• We review the essentials of the formal relational model in
this chapter
• In practice, there is a standard model based on SQL
• Note: There are several important differences between
the formal model and the practical model, as we shall see
Relational Model
Concepts
• A Relation is a mathematical concept based on
the ideas of sets
• The model was first proposed by Dr. E.F. Codd of
IBM Research in 1970 in the following paper:
– "A Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970
• The above paper caused a major revolution in
the field of database management and earned
Dr. Codd the coveted ACM Turing Award
Informal Definitions
• Informally, a relation looks like a table of values.

• A relation typically contains a set of rows.

• The data elements in each row represent certain facts that


correspond to a real-world entity or relationship
– In the formal model, rows are called tuples

• Each column has a column header that gives an indication


of the meaning of the data items in that column
– In the formal model, the column header is called an attribute
name (or just attribute)
Informal Definitions
• Key of a Relation:
– Each row has a value of a data item (or set of items)
that uniquely identifies that row in the table
• Called the key
– In the STUDENT table, SSN is the key

– Sometimes row-ids or sequential numbers are


assigned as keys to identify the rows in a table
• Called artificial key or surrogate key
Formal Definitions -
Schema
• The Schema (or description) of a Relation:
– Denoted by R(A1, A2, .....An)
– R is the name of the relation
– The attributes of the relation are A1, A2, ..., An
• Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
– CUSTOMER is the relation name
– Defined over the four attributes: Cust-id, Cust-name,
Address, Phone#
• Each attribute has a domain or a set of valid values.
– For example, the domain of Cust-id is 6 digit numbers.
Formal Definitions -
Tuple
• A tuple is an ordered set of values (enclosed in angled
brackets ‘< … >’)
• Each value is derived from an appropriate domain.
• A row in the CUSTOMER relation is a 4-tuple and would
consist of four values, for example:
– <632895, "John Smith", "101 Main St. Atlanta, GA 30332",
"(404) 894-2000">
– This is called a 4-tuple as it has 4 values
– A tuple (row) in the CUSTOMER relation.
• A relation is a set of such tuples (rows)
Formal Definitions -
Domain
• A domain has a logical definition:
– Example: “USA_phone_numbers” are the set of 10 digit phone
numbers valid in the U.S.
• A domain also has a data-type or a format defined for it.
– The USA_phone_numbers may have a format: (ddd)ddd-dddd where
each d is a decimal digit.
– Dates have various formats such as year, month, date formatted
as yyyy-mm-dd, or as dd mm,yyyy etc.

• The attribute name designates the role played by a domain in a


relation:
– Used to interpret the meaning of the data elements corresponding
to that attribute
– Example: The domain Date may be used to define two attributes
named “Invoice-date” and “Payment-date” with different meanings
Formal Definitions -
State
• The relation state is a subset of the Cartesian
product of the domains of its attributes
– each domain contains the set of all possible values the
attribute can take.
• Example: attribute Cust-name is defined over the
domain of character strings of maximum length
25
– dom(Cust-name) is varchar(25)
• The role these strings play in the CUSTOMER
relation is that of the name of a customer.
Cartesian (example)
An illustrative example is the standard 52-card deck. The standard
playing card ranks {A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2} form a 13-element
set. The card suits {♠, ♥, ♦, ♣} form a four-element set.
The Cartesian product of these sets (Suits × Ranks) returns a 52-
element set of the form {(♠, A), (♠, K), (♠, Q), (♠, J), (♠, 10), ..., (♥, 6), (♥,
5), (♥, 4), (♥, 3), (♥, 2)}
Definition Summary
Characteristics Of
Relations
• Ordering of tuples in a relation r(R):
– The tuples are not considered to be ordered,
even though they appear to be in the tabular
form.
• Ordering of attributes in a relation schema R (and
of values within each tuple):
– We will consider the attributes in R(A1, A2, ...,
An) and the values in t=<v1, v2, ..., vn> to be
ordered .
• (However, a more general alternative definition of relation
does not require this ordering).
Characteristics Of
Relations
• Values in a tuple:
– All values are considered atomic (indivisible).
– Each value in a tuple must be from the domain of the
attribute for that column
• If tuple t = <v1, v2, …, vn> is a tuple (row) in the relation state
r of R(A1, A2, …, An)
• Then each vi must be a value from dom(Ai)

– A special null value is used to represent values that


are unknown or inapplicable to certain tuples.
Relational Integrity
Constraints
• Constraints are conditions that must hold on all valid
relation states.
• There are three main types of constraints in the relational
model:
– Key constraints
– Entity integrity constraints
– Referential integrity constraints
• Another implicit constraint is the domain constraint
– Every value in a tuple must be from the domain of its
attribute (or it could be null, if allowed for that attribute)
Key Constraints
• Superkey of R:
– Is a set of attributes SK of R with the following condition:
• No two tuples in any valid relation state r(R) will have the
same value for SK
• That is, for any distinct tuples t1 and t2 in r(R), t1[SK]  t2[SK]
• This condition must hold in any valid state r(R)
• Key of R:
– A "minimal" superkey
– That is, a key is a superkey K such that removal of any
attribute from K results in a set of attributes that is not a
superkey (does not possess the superkey uniqueness
property)
Difference between
Key and Superkey
SUPER KEY: Attribute or set of attributes used to uniquely identify tuples
in the database.
CANDIDATE KEY:
• Minimal super key is the candidate key
• Can be one or many
• Potential primary keys
• not null
• attribute or set of attributes to uniquely identify records in DB
PRIMARY KEY:
• one of the candidate key which is used to identify records in DB
uniquely
• not null
COMPANY Database
Schema
Entity Integrity
• Entity Integrity:
– The primary key attributes PK of each relation schema
R in S cannot have null values in any tuple of r(R).
• This is because primary key values are used to identify the
individual tuples.
• t[PK]  null for any tuple t in r(R)
• If PK has several attributes, null is not allowed in any of these
attributes
– Note: Other attributes of R may be constrained to
disallow null values, even though they are not
members of the primary key.
Referential Integrity
• A constraint involving two relations
– The previous constraints involve a single relation.
• Used to specify a relationship among tuples in
two relations:
– The referencing relation and the referenced
relation.
Referential Integrity (or
foreign key) Constraint
• Statement of the constraint
– The value in the foreign key column (or columns) FK of
the the referencing relation R1 can be either:
• (1) a value of an existing primary key value of a corresponding
primary key PK in the referenced relation R2, or
• (2) a null.
• In case (2), the FK in R1 should not be a part of
its own primary key.
Displaying a relational
database schema and

its constraints
Each relation schema can be displayed as a row of
attribute names
• The name of the relation is written above the attribute
names
• The primary key attribute (or attributes) will be underlined
• A foreign key (referential integrity) constraints is displayed
as a directed arc (arrow) from the foreign key attributes to
the referenced table
– Can also point the the primary key of the referenced relation
for clarity
• Next slide shows the COMPANY relational schema
diagram
Referential Integrity
Constraints for
COMPANY database
Other Types of
Constraints
• Semantic Integrity Constraints:
– based on application semantics and cannot be
expressed by the model per se
– Example: “the max. no. of hours per employee for all
projects he or she works on is 56 hrs per week”
• A constraint specification language may have
to be used to express these
• SQL-99 allows triggers and ASSERTIONS to
express for some of these
Populated database
state
• Each relation will have many tuples in its current relation
state
• The relational database state is a union of all the
individual relation states
• Whenever the database is changed, a new state arises
• Basic operations for changing the database:
– INSERT a new tuple in a relation
– DELETE an existing tuple from a relation
– MODIFY an attribute of an existing tuple
• Next slide shows an example state for the COMPANY
database
Populated database
state for COMPANY
Update Operations on
Relations
• INSERT a tuple.
• DELETE a tuple.
• MODIFY a tuple.
• Integrity constraints should not be violated by the
update operations.
• Several update operations may have to be
grouped together.
• Updates may propagate to cause other updates
automatically. This may be necessary to maintain
integrity constraints.
Update Operations on
Relations
• In case of integrity violation, several actions can
be taken:
– Cancel the operation that causes the violation
(RESTRICT or REJECT option)
– Perform the operation but inform the user of the
violation
– Trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
– Execute a user-specified error-correction routine
Possible violations for
each operation
• INSERT may violate any of the constraints:
– Domain constraint:
• if one of the attribute values provided for the new tuple is not
of the specified attribute domain
– Key constraint:
• if the value of a key attribute in the new tuple already exists in
another tuple in the relation
– Referential integrity:
• if a foreign key value in the new tuple references a primary key
value that does not exist in the referenced relation
– Entity integrity:
• if the primary key value is null in the new tuple
Possible violations for
each operation
• DELETE may violate only referential integrity:
– If the primary key value of the tuple being deleted is
referenced from other tuples in the database
• Can be remedied by several actions: RESTRICT, CASCADE,
SET NULL
– RESTRICT option: reject the deletion
– CASCADE option: propagate the new primary key value into the
foreign keys of the referencing tuples
– SET NULL option: set the foreign keys of the referencing tuples
to NULL
– One of the above options must be specified during
database design for each foreign key constraint
Possible violations for
each operation
• UPDATE may violate domain constraint and NOT NULL
constraint on an attribute being modified
• Any of the other constraints may also be violated,
depending on the attribute being updated:
– Updating the primary key (PK):
• Similar to a DELETE followed by an INSERT
• Need to specify similar options to DELETE
– Updating a foreign key (FK):
• May violate referential integrity
– Updating an ordinary attribute (neither PK nor FK):
• Can only violate domain constraints
Building an Application
with a DBMS
• Requirements modeling (conceptual, pictures)
– Decide what entities should be part of the application
and how they should be linked.
• Schema design and implementation
– Decide on a set of tables, attributes.
– Define the tables in the database system.
– Populate database (insert tuples).
• Write application programs using the DBMS
– way easier now that the data management is taken
care of.
Database Design
• Why do we need it?
– Agree on structure of the database before deciding on
a particular implementation.
• Consider issues such as:
– What entities to model
– How entities are related
– What constraints exist in the domain
– How to achieve good designs
2. Entity / Relationship
Diagrams
Entities Product

Attributes address

Relationships between entities buys


Keys in E/R Diagrams
• Every entity set must have a key

name category

price

Product
name category
name

price
makes Company
Product

stockprice

buys
employs

Person

address name ssn


• A mathematical definition:
– if A, B are sets, then a relation R is a subset of
AxB
• A={1,2,3}, B={a,b,c,d},
R = {(1,a), (1,c), (3,b)}

- makes is a subset of Product x Company:


makes Company
Product
Multiplicity of E/R
Relations

• one-one:
1 a
2 b
3 c
d
• many-one
1 a
2 b
3 c
d
• many-many
1 a
2 b
3 c
d
name category
name

price
makes Company
Product

stockprice
What does
this say ?
buys employs

Person

address name ssn


Multi-way Relationships
How do we model a purchase relationship between buyers,
products and stores?

Product

Purchase Store

Person
Arrows in Multiway
Relationships
Q: what does the arrow mean ?
Invoice
VideoStore

Rental Movie

Person
A: if I know the store, person, invoice, I know the
movie too
Arrows in Multiway
Relationships
Q: what do these arrow mean ?
Invoice
VideoStore

Rental Movie

Person
A: store, person, invoice determines movie and
store, invoice, movie determines person
Arrows in Multiway
Relationships
Q: how do I say: “invoice determines store” ?
A: no good way; best approximation:
Invoice
VideoStore

Rental Movie

Person

Q: Why is this incomplete ?


Roles in Relationships
What if we need an entity set twice in one relationship?

Product

Purchase Store

salesperson buyer

Person
Attributes on
Relationships
date
Product

Purchase Store

Person
Converting Multi-way
Relationships to Binary
date
ProductOf Product

Purchase
StoreOf Store

BuyerOf Person
From E/R Diagrams
to Relational Schema
• Entity set → relation
• Relationship → relation
Entity Set to Relation
name category

price

Product

Product(name, category, price)

name category price

gizmo gadgets $19.99


Relationships to
Relations
price name category
Start Year name

makes Company
Product

Stock price
Makes(product-name, product-category, company-name, year)
Product-name Product-Category Company-name Starting-year

gizmo gadgets gizmoWorks 1963


(watch out for attribute name conflicts)
Relationships to
Relations
price name category
Start Year name

makes Company
Product

Stock price
No need for Makes. Modify Product:

name category price StartYear companyName

gizmo gadgets 19.99 1963 gizmoWorks


Multi-way Relationships to
Relations
name address
Product

name price Purchase Store

Person
Purchase( , , )
ssn name
3. Design Principles
What’s wrong?

Product Purchase Person

Country President Person

Moral: be faithful!
Design Principles:
What’s Wrong?
date
Product

Purchase Store

Moral: pick the right


kind of entities.
personAddr personName
Design Principles:
What’s Wrong?

Dates date

Product

Purchase Store

Moral: don’t
complicate life more
than it already is.
Person
Modeling Subclasses

The world is inherently hierarchical. Some entities are


special cases of others
• We need a notion of subclass.
• This is supported naturally in object-oriented
formalisms.
Products

Software Educational
products products
Subclasses
name in E/R
category
Diagrams
price

Product

isa isa

Software Product Educational Product

platforms Age Group


Understanding
Subclasses
• Think in terms of records:
– Product field1
field2

– SoftwareProduct field1
field2
field3
– EducationalProduct
field1
field2
field4
field5
Subclasses to Relations
Product

name category

price

Product Sw.Product

isa isa
Ed.Product
Software Product Educational Product

platforms Age Group


Modeling UnionTypes
With Subclasses
FurniturePiece

Company
Person

Say: each piece of furniture is owned either


by a person, or by a company
Modeling Union Types
with Subclasses
Say: each piece of furniture is owned either by a
person, or by a company
Solution 1. Acceptable, imperfect (What’s wrong ?)
Person FurniturePiece Company

ownedByPerson ownedByPerson
Modeling Union Types
with Subclasses
Solution 2: better, more laborious
Owner

isa isa

ownedBy
Person Company

FurniturePiece
Constraints in E/R
Diagrams
• Finding constraints is part of the modeling process.
• Commonly used constraints:

• Keys: social security number uniquely identifies a person.

• Single-value constraints: a person can have only one father.

• Referential integrity constraints: if you work for a company, it


• must exist in the database.

• Other constraints: peoples’ ages are between 0 and 150.


Keys in E/R Diagrams
name category
Underline:

price

No formal way Product


to specify multiple
keys in E/R diagrams
Person

address name ssn


Single Value
Constraints

makes

v. s.

makes
Referential Integrity
Constraints
Product makes Company

Product makes Company


Thank you

You might also like