504 Lecture4

Download as pdf or txt
Download as pdf or txt
You are on page 1of 42

Relational Model

Data Models and Database


Design
• Think “logically” when design a database, need
some kind of framework to design the database
• Like designing a data structure in some
programming language, a data model is like a
type system, but is abstract
• Organize the data into tables in the relational
data model
• Initially no need to worry about how these tables
are implemented
Relational Model Concepts
• The relational model of data is based on
the concept of a mathematical relation
• A relation is a mathematical concept
based on set theory
• The strength of the relational approach to
data management comes from the formal
foundation provided by the theory of
relations
Relational Model Concepts
• The model was first proposed by Dr. E.F.
Codd of IBM in 1970 in the following paper:
"A Relational Model for Large Shared Data
Banks," Communications of the ACM, June
1970

The above paper caused a major revolution in the


field of Database management and earned Ted
Codd the coveted Turing Award
What is a Relational Database?
Routes
RId RName Grade Rating Height
1 Last Tango II 12 100
2 Garden Path I 2 60
3 The Sluice I 8 60
4 Picnic III 3 400
Climbers Climbs
CId Cname Skill Age CId RId Date Duration
123 Edmund EXP 80 123 1 10/10/88 5
214 Arnold BEG 25 123 3 11/08/87 1
313 Bridget EXP 33 313 1 12/08/89 5
212 James MED 27 214 2 08/07/92 2
313 3 06/07/94 3
Why is the Database Like This?
• Each route has an id, a name, a grade (an
estimate of the time needed), a rating (how
difficult it is), and a height
• Each climber has an id, a name, a skill level and
an age
• A climb records who climbed what route on what
date and how long it took (duration)
• The data values in these tables are all “simple”
Describing Relations
• Relations are described by a schema which
can be expressed in various ways
• A database schema is usually expressed in a
data definition language (DDL)
Routes(RId:int, RName:string, Grade:string,
Rating:int, Height:int)
Climbers(CId:int, CNname:string,
Skill:string, Age:int)
Climbs(CId:int, RId:int, Date:date,
Duration:int)
Expressing Constraints
• In SQL, constraints are defined as follows:

CREATE TABLE Climbers CREATE TABLE Climbs


(CId INTEGER, (CId INTEGER,
CName CHAR(20), RId INTEGER,
Skill CHAR(4), Date DATE,
Age INTEGER, Duration INTEGER,
PRIMARY KEY (CId), PRIMARY KEY (CId, RId, Date),
UNIQUE (CName,Age)) FOREIGN KEY (CId) REFERENCES
Climbers,
FOREIGN KEY (RId) REFERENCES
Routes)
Informal Definition
• RELATION: A table of values
– A relation: a set of rows
– A relation: alternately a set of columns
– Each row represents a fact that corresponds to a real-
world entity or relationship
– Each row has a value of an item or set of items that
uniquely identifies that row in the table
– Sometimes row-ids or sequential numbers are assigned
to identify the rows in the table
– Each column: called by its column name or column
header or attribute name
Relational Model Terminology
• Table = relation
• Column headers = attributes
• Row = tuple
• Possible values of each attribute = domain
– E.g., the domain of CName is string and that for Rating is real
• Relation schema = relation name + attributes + other
structure info.,
– E.g., keys, other constraints
• Relation instance is the current set of rows for a relation
schema
• Database schema = collection of relation schemas
Schema Definition
• The schema of a Relation: R (A1, A2, .....
An )
• Relation schema R is defined over
attributes A1, A2, .....An
– R: Name
– Degree: # of attributes n
– Cardinality: # of tuples (rows)
Example
CUSTOMER (Cust-id, Cust-name, Address, Phone#)

• CUSTOMER is a relation defined over four attributes


Cust-id, Cust-name, Address, Phone#
• Each attribute has a domain or a set of valid values
– E.g., the domain of Cust-id could be 6 digit numbers
• A tuple is an ordered set of values
– <632895, "John Smith", "101 Main St. Atlanta, GA 30332",
"(404) 894-2000">
– A relation may be regarded as a set of tuples (rows)
Formal Definition
• A relation is a subset of the Cartesian product of the
domains
– Domain is used in a specific role conveyed by the attribute
name
– E.g., attribute Cust-name is defined over the domain of strings
of 26 characters. The role these strings play in the CUSTOMER
relation is the name of customers
• Formally,
Given R(A1, A2, .........., An)
r(R) Ì dom (A1) X dom (A2) X ....X dom(An)
R: schema of the relation (intension)
r(R): a specific "value" or population of R (extension)
Example
• Let D1 = {0,1}
• Let D2 = {a, b, c}
• Let r(R) Ì D1 X D2
• For example: r(R) = {<0,a> , <0,b> , <1,c>}
is one possible “state” or “population” or
“extension” r of the relation R, defined over
domains D1 and D2
Definition Summary
Informal Terms Formal Terms
Table Relation
Column name Attribute
Row Tuple
Values in a column Domain
Table Definition Schema of a Relation
Populated Table Extension
Example
Characteristics of Relations
• Ordering of tuples in a relation
– Not ordered (like a set)
• Ordering of attributes and their
values in a relation
– Not important as long as the
correspondence between attributes and
values is maintained
Characteristics of Relations
• Values in a tuple:
– A set of atomic values
– Each value in the domain is indivisible
– Composite and multi-valued attributes are not
allowed
– null value: unknown or inapplicable to certain
tuples
Some Relational Model Notation

• t[Ai] = vi (the value of attribute Ai for tuple


t)
• t[Au, Av, ..., Aw]: the tuple of t containing
the values of attributes Au, Av, ..., Aw
Example
Relational Data Model
Relational Integrity Constraints
• Constraints are conditions that must hold
on all valid relation instances
• Three main constraints:
1. Key constraints (single relation)
2. Entity integrity constraints (single relation)
3. Referential integrity constraints (two
relations)
Key Constraints
• Superkey: A set of attributes SK of R such
that no two tuples in any valid relation
instance r(R) will have the same value for
SK
– For any distinct tuples t1 and t2 in r(R), t1[SK] ¹
t2[SK]
• Candidate Key: A "minimal" superkey
– A superkey K such that removal of any
attribute from K results in a set of attributes
that is not a superkey
Example 1
CAR(State, Reg#, SerialNo, Make, Model, Year)

• Two candidate keys:


– Key1 = {State, Reg#}
– Key2 = {SerialNo}
– Are Key1 and Key2 superkeys?
• {SerialNo, Make}, candidate key or superkey?
• Primary key: If a relation has several
candidate keys, one is chosen arbitrarily to be
the primary key
– The primary key attributes are underlined
– Implies “not null”
Example 2
• The CAR relation, with two candidate keys:
License-number and Engine_serial_number
Entity Integrity
• Relational Database Schema: A set of relation
schemas that belong to the same database
– S is the name of the database
S = {R1, R2, ..., Rn}
• Entity Integrity: The PK (primary key) attributes of
each relation schema R in S cannot have null values
in any tuple of r(R)
– t[PK] ¹ null for any tuple t in r(R)
– Why?
• Other attributes of R may be similarly constrained to
disallow null values, even though they are not
members of the primary key
Referential Integrity
Constraint
• If a tuple in R1 refers to R2, it must refer to
an existing tuple in that relation
• E.g., Dno in every Employee tuple must
match Dnumber value of some tuple in the
Department relation
Referential Integrity
(Foreign Key Constraints)
• A constraint involving two relations
• Specify a relationship among tuples in two
relations:
– The referencing relation (R1) and the referenced
relation (R2)
– FK (foreign key attributes) of R1 reference PK
(primary key attributes) or candidate key of R2
• Displayed in a relational database schema as
a directed arc from R1.FK to R2.PK or R2.CK
Foreign Key
• A set of attributes in R1 is a foreign key of R1 that
references relation R2 if it satisfies the following
two rules:
– The attributes in FK have the same domain(s) as the
primary key (PK) attributes or candidate key of R2
– A value in the foreign key column (or columns) of the
the referencing relation R1 can be either:
1. a value of an existing primary key or candidate key value in
the referenced relation R2
2. is null
• In case (2), the FK in R1 should not be a part of
its own primary key
Example

• If no row exists in R2 – violation of referential integrity


• Not all rows in R2 need to be referenced
• Value of a foreign key might not be specified
• Names of K1 and K2 need not to be the same
Foreign Key (Cont.)
• Foreign key can refer to its own relation
• E.g. SuperSSN is a FK that refers to
Employee relation itself
Example 1
• Schema diagram for the COMPANY relational database
schema, the primary keys are underlined
Example 1 (Cont.)
• One possible relational database state corresponding to
the COMPANY schema
Example 1 (Cont.)
• Referential integrity constraints displayed on the
COMPANY relational database schema diagram
Example 2
Example 2 (Cont.)
Other Types of Constraints
• Domain constraints: values of each attribute
A must be an atomic value from the domain
for that attribute, dom(A)
• Semantic Integrity Constraints:
- Based on application semantics and cannot be
expressed by the model
- E.g., “one student cannot register for more than 9
credits”
- A constraint specification language may have to
be used to express these
- SQL-99 uses triggers and ASSERTIONS to allow for
some of these constraints
Exercise
Consider the following relations for a database that keeps track of
student enrollment in courses and the books adopted for each
course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Add foreign key constraints to the schema above
Update Operations on Relations
• INSERT a tuple
– Domain, key, entity and referential integrity constraints can
be violated
• DELETE a tuple
– Only referential integrity constraint can be violated
• MODIFY a tuple
– Modify PK: delete one tuple and insert another in its place
– Modify FK: make sure the new value refers to an existing
tuple in the referenced relation (or is null)
– Neither PK nor FK: only check new value is of the correct
domain and data type
Update Operations on Relations
(Cont.)
• Integrity constraints should not be violated
by the update operations
• Several update operations may have to be
grouped together
• Updates may propagate to cause other
updates automatically
Update Operations on Relations
(Cont.)
• In case of integrity violation, several actions
can be taken:
– Cancel the operation that causes the violation
(REJECT option)
– Perform the operation but inform the user of the
violation
– Trigger additional updates so the violation is
corrected (CASCADE option, SET NULL option)
– Execute a user-specified error-correction routine
Example
• The instances below satisfy some constraints
Climbers: Climbs:
CId CName Skill Age CId RId Date Duration
123 Edmund EXP 80 123 1 10/10/88 5
214 Arnold BEG 25 123 3 11/08/87 1
313 Bridget EXP 33 313 1 12/08/89 5
212 James MED 27 214 2 08/07/92 2
313 1 06/07/94 3

• Insert (123, Jeremy, MED, 16) into Climbers?


• Insert (456, 2, 09/13/98, 3) into Climbs?
• Delete (313, Bridget, EXP, 33) from Climbers?
• Modify 123 to 456 in Climbers?

You might also like