Fundamental Concepts in Relational Data Model

Download as pdf or txt
Download as pdf or txt
You are on page 1of 11

For useful Documents like this and Lots of more Educational and Technological Stuff Visit...

www.thecodexpert.com

Fundamental concepts in Relational Data Model


The Relation is the basic element in a relational data model.

A relation is subject to the following rules:

1. Relation (file, table) is a two-dimensional table.


2. Attribute (i.e. field or data item) is a column in the table.
3. Each column in the table has a unique name within that table.
4. Each column is homogeneous. Thus the entries in any column are all of the same type (e.g. age, name,
employee-number, etc).
5. Each column has a domain, the set of possible values that can appear in that column.
6. A Tuple (i.e. record) is a row in the table.
7. The order of the rows and columns is not important.
8. Values of a row all relate to some thing or portion of a thing.
9. Repeating groups (collections of logically related attributes that occur multiple times within one record
occurrence) are not allowed.
10. Duplicate rows are not allowed (candidate keys are designed to prevent this).
11. Cells must be single-valued (but can be variable length). Single valued means the following:
o Cannot contain multiple values such as 'A1, B2, and C3’.
o Cannot contain combined values such as 'ABC-XYZ' where 'ABC' means one thing and 'XYZ'
another.

Domain: A domain D is the original sets of atomic values used to model data.

By atomic, we mean that each value in the domain is indivisible as far as the relational model is concerned.
For example:

* The domain of day shift is the set of all possible days: {Mon, Tue, Wed…}

* The domain of salary is the set of all floating-point numbers greater than 0 and less than 200,000 (say).

* The domain of name is the set of character strings that represents names of person

Relation can be viewed as a “table”. In that table, each row represents a Tuple of data values and each column
represents an attribute.

1. Attribute: A column of a relation designated by name. The name associated should be meaningful. Each
attributes associates with a domain.

2. A relation schema denoted by R is a list of attributes (A1, A2, …, An)


For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

The degree of the relation is the number of attributes of its relation schema.

The cardinality of the relation is the number of tuples in the relation.

Example of relation, relation schema and attribute:

EMPLOYEE is Relation Name

Eid, Name, Birthdate, Salary, Department are called Attributes (or Columns)

Each row of the tables is called Tuple (or Row/Record)

Relation schema: EMPLOYEE (Eid, Name, Birthdate, Salary, Department)

Relational model
The relational model consists of three components:

1. A Structural component -- a set of TABLES (also called RELATIONS).


2. MANIPULATIVE component consisting of a set of high-level operations which act upon and produce
whole tables.
3. A SET OF RULES for maintaining the INTEGRITY of the database.

Constraints in Relational Data


Constraint is a very important feature in relational model. In fact, relational model support a well-defined
theory of constraint on attributes or tables.

Constraint is useful because it allows designer to specify the semantics of data in database and it is the rules to
enforce DBMSs to check that new data satisfies the semantics.

Keys

1. A simple key contains a single attribute.


2. A composite key is a key that contains more than one attribute.
3. A candidate key is an attribute (or set of attributes) that uniquely identifies a row. A candidate key
must possess the following properties:
o Unique identification - For every row the value of the key must uniquely identify that row.
o Non redundancy - No attribute in the key can be discarded without destroying the property of
unique identification.
For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

4. A primary key is the candidate key which is selected as the principal unique identifier. Every relation
must contain a primary key. The primary key is usually the key selected to identify a row when the
database is physically implemented. For example, a part number is selected instead of a part
description.
5. A superkey is any set of attributes that uniquely identifies a row. A superkey differs from a candidate
key in that it does not require the non redundancy property.
6. A foreign key is an attribute (or set of attributes) that appears (usually) as a non key attribute in one
relation and as a primary key attribute in another relation. I say usually because it is possible for a
foreign key to also be the whole or part of a primary key:
7. A semantic or natural key is a key for which the possible values have an obvious meaning to the user
or the data. For example, a semantic primary key for a COUNTRY entity might contain the value
'USA' for the occurrence describing the United States of America. The value 'USA' has meaning to the
user.

Integrity constraint
Domain restricts the values of attributes in the relation and it is a constraint of relational model. However,
there are real –world semantics on data that cannot specify if use only domain. We need more specific way to
state what data values are/are not allows, what format is suitable for an attributes. For example, employee
number must be unique; employees’ age is in the range [23, 65].
Such information is provided in logical statements called integrity constraints.

Data integrity allows defining certain data quality requirements that the data in the database needs to meet. If
a user tries to insert data that doesn't meet these requirements, DBMS will not allow so.

Not Null
A column in a table can be specified not null. It's not possible to insert a null in such a column. The default is
null. So, in the following create table statement, a null can be inserted into the column named c.

Unique Key
The unique constraint doesn't allow duplicate values in a column. If the unique constraint encompasses two or
more columns, no two equal combinations are allowed.
For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

Primary Key
On a technical level, a primary key combines a unique and a not null constraint. Additionally, a table can have
at most one primary key. After creating a primary key, it can be referenced by a foreign key.

Foreign Key
A foreign key constraint (also called referential integrity constraint) on a column ensures that the value in that
column is found in the primary key of another table.

In relational database, key are often implemented by introducing an attributes specially designed for being
key.

Superkey: {Name, Salary} , {Eid}, {Name, Birthdate}

Candidate key: {Eid}

Primary key: {Eid}

Entity constraint: No attribute in the primary key can be NULL. This is because, NULL values for the
primary key means we cannot identify some tuples. For example, in the EMPLOYEE relation showed above,
CellPhone cannot be a key since we cannot use this attribute to identify employees 20012322 and employee
19991323.

Referential Integrity constraint: The constraint that is specified between two relations and maintain the
correspondence between tuples in these relations. It means the reference from a tuple in one relation to other
relation must be valid.

Examples of Referential integrity constraint

* In the Bank Database (From Data Modelling lecture) : The ACCOUNT relation need to take note the
BRANCH where each account is held so in implementation, in each tuple of ACCOUNT relation, there is an
attribute such as branchname to identify the associate BRANCH . The referential integrity constraint must
state that the branchname attribute in the ACCOUNT relation refer to a valid branch (i.e. existing branch)

* In the Company Database: In the EMPLOYEE relation, we need to store the information about the
department where each employee works in. The attribute Department in EMPLOYEE relation is used for that
purpose so the values appears in the columns Department must be taken from set of values identify the
Department number in the relation DEPARTMENT if this relation exists.

Semantic constraints: This is a special kind of constraints that may have to enforce in relational database.
Such constraints describe the semantics of data in the database or sometimes called the rules on data. For
example, in the COMPANY database, we have the rule “An employee cannot take a part in more than 5
projects” or “Salary of an employee cannot exceed the salary of the employee’s manager”
For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

Functional Dependency constraints: This constraint establishes a functional relationship among two sets of
attributes.

Constraint Checking using Query


Relational database instance is changing over time. At a moment of time, we can have an instance that
satisfied all the constraints but when some update operations performs, we must re-check the constraints.
There are three basic update operations on relations: insert a new record, delete an existing record and modify
an existing record.

Domain constraint checking:

* For insert operation, it is need to check attribute value for type and other domain restrictions.
* For delete operation, it is no need to check any domain constraints
* For update operation, it is also need to check attribute value for type and other domain restrictions.

The following changes satisfy domain constraints

* Insert Account(HaThanh, 50000, S-20071280)


* Insert Account(HaThan, 20000, C-20072242) ( it is looks ok but actually the data value is not correct)
* Update Account(HaThanh, 50000, S-20071280) to Account(HaThanh, 60000, S-20071280)

The changes that do not satisfy domain constraints:


For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

* Insert Account(HaThanh, 5000USD, S-20071280)


* Insert Account(DongDo, -20, C-12894349)
* Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-34252525)

Key constraint checking:

* For insert operation, it is need to check the key value does not occur in any existing tuple in the relation.
* For delete operation, it is no need to check any domain key constraints
* For update operation, if the key value is modified then need the same check as for insertion.

Changes that satisfy key constraints:

* Insert Account(DongDo, 20000, C-12894350) (there is no account with that account number in the
current relation)
* Insert Account-Holder(12334, C-12894350) (ok, but no such customer with number 12334)
* Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-34252525) (key is not
modified)

Changes that do not satisfied key constraints:

* Insert Account(DongDo, 50000, C-12894350) (key is alredy there in the relation)


* Update Account(DongDo, 20000, C-12894350) to Account(DongDo, 20000, C-12894349) ( the account
C-12894349 is already in the relation)

Referential integrity constraint checking:

* For insert operation, it is need to check the foreign keys occur as primary keys in the referenced relation.
* For delete operation, check all relations that have foreign keys refering to this relation
* An update need to treat as delete - then – insert for referential constraints checking.

Changes that satisfy referential constraint

* Insert Account(ThangLong, 5000, C-12891230)


* Insert Account-Holder(111111, C-12891230)
* Update Customer(515016, Son, Hoan Kiem, HaThanh) to Customer(515016, Son, Hoan Kiem,
ThangLong)
* Delete Account-Holder(111111, C-12894350)

Changes that does not satisty referential constraint

* Insert Account-Holder(12334, C-12894350) ( no such customer)


* Insert Customer(222222, Nha, DongDa, An Binh) ( no such branch)
* Delete Customer with customerNumber = ‘111111’ ( this is not acceptable since there are tuples in
Account-Holder relation refer to this customer)

Deletion can violate referential constraint when the tuple being deleted is referenced by the foreign keys from
others tuples in a different relation. Several approaches are consider to handle this kind of violation.

* The first approach is simply disallow the deletion.


For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

* The second approach: User must find the refering tuple then either delete them manually or change their
foreign key to an acceptable value or NULL value ( not possible if the foreign key also forms part of the
primary key such as in the Account-Holder relation)
* The third approach: attempt to remove all refering tuple automatically (cascade)

When the referential constraint is specified in the database during the creation phase, the DBMSs will allow
user to specify which of the above approach applies when a violation occur.

Relational Algebra
The Relational Algebra was introduced by E. F. Codd in 1972. It consists of a set of operations on relations:

 SELECT
 PROJECT
 PRODUCT
 UNION
 INTERSECT
 DIFFERENCE

SELECT is used to obtain a subset of the tuples of a relation that satisfy a select condition.

For example, find all employees born after 1st Jan 1950:

SELECTdob '01/JAN/1950'(employee)

PROJECT operation is used to select a subset of the attributes of a relation by specifying the names of the
required attributes.

For example, to get a list of all employees surnames and employee numbers:

PROJECTsurname,empno(employee)
PRODUCT
Builds a relation from two specified relations consisting of all possible combinations of rows, one from each
of the two relations.

For example, consider two relations, A and B, consisting of rows:

A: a B: d => A product B: a d
b e a e
c b d
b e
c d
c e
For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

UNION

Builds a relation consisting of all rows appearing in either or both of the two relations.

For example, consider two relations, A and B, consisting of rows:

A: a B: a => A union B: a
b e b
c c
e
INTERSECT

Builds a relation consisting of all rows appearing in both of the two relations

For example, consider two relations, A and B, consisting of rows:

A: a B: a => A intersect B: a
b e
c
DIFFERENCE
Builds a relation consisting of all rows appearing in the first and not in the second of the two relations.

For example, consider two relations, A and B, consisting of rows:

A: a B: a => A - B: b and B - A: e
b e c
c

Logical Data Model

At the highest level, there are five big entities which can be defined and documented:

People Places Things Events Concepts

These five entities and the relationships among them can document anything in the entire spectrum of human
(or inhuman) experience. This highest-level model is sometimes called a Conceptual Data Model. It contains
major entities, broadly defined and without attributes or details.

The task of a Logical Data Model is to particularize the Conceptual Data Model entities and relate them to
each other, creating a data structure which supports the intellectual and physical worlds in which work is
done.

A logical data model does not contain real data. Rather, it contains the infrastructure into which real data fits.
The following will describes the infrastructure and distinguishes it from the physical database structure.
For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

A. Metadata
Data in a relational data model is called metadata, i.e., data about data.
Metadata provides
 A commonly understood body of data which can be used in multiple applications and
 Common data structures which users from diverse process areas can populate with unique data
values.
B. Principles for Creating Metadata
When defining metadata, the following principles apply:
• Logical data is defined in the abstract and without redundancy.
• Logical data is defined independent of, and outside the context of, functions, processes, and
automated applications.
• Logical data is defined by users from diverse functional areas who need the same logical data.
• Logical data element names are consistent and meaningful; they are created according to naming
standards.
• Composite data is broken down into its smallest meaningful parts, each of which is defined
separately.

CODD Rules
In 1985 Codd produced the following set of 'rules' by which systems should be judged:

4.1. Information.
All information is represented at the logical level and in exactly one way - by values in tables.

4.2. Guaranteed access.


Each atomic value in a relational database is guaranteed to be logically accessible through a combination of
table-name, primary key value and column-name.

4.3. Systematic treatment of null values.


Null values are supported in a fully relational DBMS for representing missing information and inapplicable
information in a systematic way, independent of data type.

4.4. Database description.


The database description is represented at the logical level in the same way as ordinary data, so that authorised
users can apply the same relational language to its interrogation as they apply to the regular data.

4.5. Comprehensive sub-language.


A relational system may support several languages and various modes of terminal use. However there must be
at least one language whose statements are expressible through some well defined syntax as character strings,
and that is comprehensive in supporting all the following items:

1. Data definition
2. View definition
3. Data manipulation (interactive and by program)
4. Integrity constraints
5. Authorization
6. Transaction boundaries (begin, commit and rollback)
For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

4.6. View updating.


All theoretically-updatable views are also updatable by the system. (A view is theoretically updatable if there
is a time-independent algorithm for unambiguously determining a single series of changes to the base tables,
having as their effect precisely the requested changes in the view.)

4.7. Insert and update.


The capability of handling a base table or a derived table as a single operand applies not only to retrieval of
data but also to insertion, updating, and deletion. (This allows the system to optimise its execution sequence
by determining the best access paths. It may be important in obtaining efficient handling of transactions across
a distributed database, avoiding the communications costs of transmitting separate requests for each record
obtained from remote sites.)

4.8. Physical data independence.


Application programs and terminal activities remain logically unimpaired whenever any changes are made in
either storage representations or access methods. (There must be a clear distinction between logical and
physical design levels.)

4.9. Logical data independence.


Application programs and terminal activities remain logically unimpaired when information-preserving
changes of any kind that theoretically permit unimpairment are made to the base tables. (This rule permits
logical database design to be changed dynamically, e.g. by splitting or joining base tables in ways which do
not entail loss of information.)

4.10. Integrity independence.


Integrity constraints must be definable in the relational data sub-language and storable in the catalogue, not in
the applications program. Certain integrity constraints hold for every relational database, further application-
specific rules may be added. The general rules relate to:

1. Entity integrity : no component of a primary key may have a null value.


2. Referential integrity : for each distinct non-null 'foreign key' value in the database, there must exist a
matching primary key value from the same domain.

4.11. Distribution independence.


A relational DBMS has distributional independence - i.e. if a distributed database is used it must be possible
to execute all relational operations upon it without knowing or being constrained by the physical locations of
data. This must apply both when distribution is originally introduced, and when data is redistributed.

4.12. Non-subversion.

A low-level (single record-at-a-time) language cannot be used to subvert or bypass the integrity rules and
constraints expressed in the higher-level (multiple record-at-a-time) language.
For useful Documents like this and Lots of more Educational and Technological Stuff Visit...
www.thecodexpert.com

For useful Documents like


this and
Lots of more
Educational and
Technological Stuff...

Visit...

www.thecodexpert.com

You might also like