Sta 2107 Database Mangement 2017 - Copy-1-1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 80

Database System Notes

Database and File System are two methods used to store, retrieve, manage and manipulate
data. Both systems can be used to allow the user to work with data in a similar way. A File
System is a collection of raw data files stored in the hard-drive, whereas a database is
intended for easily organizing, storing and retrieving large amounts of data. In other words,
a database holds a bundle of organized data (typically in a digital form) for one or more
users. Databases, often abbreviated DB, are classified according to their content, such as
document-text, bibliographic and statistical. It should be noted that, even in a database, data
are eventually (physically) stored in some sort of files.
What is a File system?
As mentioned above, in a typical File System electronic data are directly stored in a set of
files. If only one table is stored in a file, it is called a flat file. They contain values in each row
separated with a special delimiter like commas. In order to query some random data, first it
is required to parse each row and load it to an array at run time, but for this file should be
read sequentially (because, there is no control mechanism in files); therefore it is quite
inefficient and time consuming. The burden of locating the necessary file, going through the
records (line by line), checking for the existence of a certain data and remembering what
files/records to edit are on the user. The user either has to perform each task manually or
has to write a script that does them automatically with the help of the file management
capabilities of the operating system. Because of these reasons, File Systems are easily
vulnerable to serious issues like inconsistency, inability to maintain concurrency, data
isolation, threats on integrity and lack of security.
What is a Database?
A database is a collection of information stored, that is organized so that it can easily be
accessed, managed, and updated.
A Database may contain different levels of abstraction in its architecture. Typically, the
three levels: external, conceptual and internal make up the database architecture. External
level defines how the users view the data. A single database can have multiple views. The
internal level defines how the data is physically stored. The conceptual level is
the communication medium between internal and external levels. It provides a unique view
of the database regardless of how it is stored or viewed. There are several types of
databases such as Analytical databases, Data warehouses and Distributed databases.
Databases (more correctly, relational databases) are made up of tables, and they contain
rows and columns, much like spreadsheets in Excel. Each column corresponds to an
attribute while each row represents a single record. For example, in a database, which
stores employee information of a company, the columns could contain employee name,
employee Id and salary, while a single row represents a single employee. Most databases
come with a Database Management System (DBMS) that makes it very easy to
create/manage/organize data.
What is the difference between File system and Database?
As a summery, in a File System, files are used to store data while, a database is a collection
of organized data. Although File System and databases are two ways of managing data,
databases clearly have many advantages over File Systems. Typically when using a File
System, most tasks such as storage, retrieval and search are done manually (even though
most operating systems provide graphical interfaces to make these tasks easier) and it is
quite tedious whereas when using a database, the inbuilt DBMS will provide automated
methods to complete these tasks.

Because of this reason, using a File System will lead to problems like data integrity, data
inconsistency and data security, but these problems could be avoided by using a database.
Unlike a File System, databases are efficient because reading line by line is not required, and
certain control mechanisms are in place.
Disadvantages of File Systems
 Data redundancy is found- the presence of duplicate data in multiple data files e.g.
Customer Name, NI number, Address is present in several different files on several
different systems. Errors are generated, time entering data is wasted, computer
resources are needlessly taken up and updates can be an enormous problem.
 Data inconsistency will become a problem e.g. information is duplicated in each
system and may be updated in most systems but not necessarily in all – the savings
account and loan account may have different addresses for a particular customer.
 Lack of data independence. (Data dependence refers to the close relationship
between data stored in files and the specific software programs needed to update
and maintain those files). In the bank imagine cases where any change in data
format or structure requires a change in all the programs that access the data. E.g.
effort required to change from 3-digit STD to 4-digit STD may be very difficult.
 Data lacks integrity, i.e. this is the quality by which information from the system
can be trusted. The problems already stated show that information can be out of
date, can have different values in different parts of the system, can be inaccurate etc.
 Lack of flexibility i.e. creating reports, e.g. in this system a summary of account
balances in each separate account would require separate reports for each open
account or cutting and pasting into a word processor file – not a satisfactory state of
affairs.
 Data not shareable. Similarly if different departments want to access data this
would be very awkward.
Main Functions of Databases System.
 Facilitate remote data entry.
 Facilitate information and resource sharing.
 Facilitate long term and large amount of information storage
 Facilitate centralized information storage.

A DBMS (Database management System) Vs Database Systems (DBS)


A DBMS is application software that lets you create, organize, update, store, and
retrieve data from a single database or several databases. Lets you transform or map
data from one model to another, or between the central model and stored database
while database is a collection of information stored, that is organized so that it can
easily be accessed, managed, and updated.

There are many functions a Database Management System (DBMS) serves that are key
components to the operation of database management. When deciding to implement a
DBMS in your business, first you must decide what type of DBMS you want. Common types
of DBMS are the relational, network, hierarchy and object oriented models. Each kind of
database structure has its own pros and cons.

While each is unique in its own way, there are some standard functions of a DBMS, and
these are:

The ability to update and retrieve data

This is a fundamental component of a DBMS and essential to database management.


Without the ability to view or manipulate data, there would be no point to using a database
system.

Updating data in a database includes adding new records, deleting existing records and
changing information within a record. The user does not need to be aware of how DBMS
structures this data, all the user needs to be aware of is the availability of updating and/or
pulling up information, the DBMS handles the processes and the structure of the data on a
disk.

Support Concurrent Updates

Concurrent updates occur when multiple users make updates to the database
simultaneously. Supporting concurrent updates is also crucial to database management as
this component ensures that updates are made correctly and the end result is accurate.
Without DBMS intervention, important data could be lost and/or inaccurate data stored.

DBMS uses features to support concurrent updates such as batch processing, locking, two-
phase locking, and time stamping to help make certain that updates are done accurately.
Again, the user is not aware all this is happening as it is the database management system’s
responsibility to make sure all updates are stored properly.

Backup and recovery of Data

In the event a catastrophe occurs, DBMS must provide ways to recover a database so that
data is not permanently lost. There are times computers may crash, a fire or other natural
disaster may occur, or a user may enter incorrect information invalidating or making
records inconsistent.

If the database is destroyed or damaged in any way, the DBMS must be able to recover the
correct state of the database, and this process is called Recovery. The easiest way to do this
is to make regular backups of information. This can be done at a set structured time so in
the event a disaster occurs, the database can be restored to the state that it was last at prior
to backup.

A disadvantage to this is any data or changes entered after the backup would be lost. A way
to counteract this is to set the DBMS to provide a feature called Journaling. This involves
keeping a log of all updates made to the database, it is maintained in a file separate from the
database and can be obtained to re-update the database after it is recovered from the
backup.

Security

Security is the prevention of unauthorized users accessing the database. DBMS uses
features such as encryption, authentication, authorization and views to provide security to
the database. Encryption is when DBMS converts the data in a database to an
indecipherable format. No unauthorized person trying to access this information will be
able to read it. Authorized users will be able to see it in normal form.

Authentication is a technique in which the database administrator can identify the person
accessing the database. Authorized users are given passwords and successful entry of a
valid password will allow the user entry into the database, if a password is not successfully
entered, the user will be denied access. Authorization is a set of rules that the database
administrator (DBA) sets up to specify levels of usage that individuals or groups are allowed
to have. Some users may only be allowed viewing options, while others may be allowed to
both view/make changes.

In some circumstances, users may only be allowed to access certain pieces of the database,
and be denied access to areas that does not relate to their specific needs. In these cases, the
DBA will assign workgroups, and these workgroups will be assigned levels of access and
permissions. Views are when DBA allows certain users the ability to view the tables or fields
that pertain to them, any other view does not exist for them in the database, DBMS does this
behind the scenes and to the user it appears that the information they see are the only
existing data.

Data Integrity

Data integrity is an important function in database management. This is a set of rules that
DBMS provides to see that data integrity is enforced, thus avoiding incorrect or inconsistent
data. Types of integrity that DBMS provides are data type, legal values and format. Key
integrity also falls into this function of DBMS. This enforces that the primary key of a record
remains unique.

Without any of the above listed functions, a database would not be able to work effectively.
Each of these functions plays an important role in database management. And many more
such as

Builds the database – creates and maintains the database schema.

Manages the database - has to ensure problems do not arise if two people simultaneously
access a record and try to update it.

Governs interactions between application programs, input data and the database itself i.e.
allows users to store, retrieve and update as easily as possible without having to be aware
of the internal structure of the database.
Provides an interface between users and the database.

Backup and Recovery in event of a system failure.

DBMS Benefits to businesses

 Improved strategic use of corporate data

 Reduced complexity of the organizations information systems

 environment

 Reduced data redundancy and inconsistency

 Enhanced data integrity

 Application-data independence

 Improved security

 Reduced application development and maintenance costs

 Improved flexibility of information systems

 Increased access and availability of data and information

 Logical & Physical data independence

 Concurrent access anomalies.

 Facilitate atomicity problem.

 Provides central control on the system through DBA.

A transaction

A transaction is a logical unit of work that contains one or more SQL statements. A
transaction is an atomic unit. The effects of all the SQL statements in a transaction can be
either all committed (applied to the database) or all rolled back (undone from the
database).

A transaction begins with the first executable SQL statement. A transaction ends when it is
committed or rolled back, either explicitly with a COMMIT or ROLLBACK statement or
implicitly when a DDL statement is issued.
To illustrate the concept of a transaction, consider a banking database. When a bank
customer transfers money from a savings account to a checking account, the transaction can
consist of three separate operations:

 Decrement the savings account


 Increment the checking account
 Record the transaction in the transaction journal

Oracle must allow for two situations. If all three SQL statements can be performed to
maintain the accounts in proper balance, the effects of the transaction can be applied to the
database. However, if a problem such as insufficient funds, invalid account number, or a
hardware failure prevents one or two of the statements in the transaction from completing,
the entire transaction must be rolled back so that the balance of all accounts is correct.

A Banking

Transaction

ACID properties
(all good DBMS's should guarantee these)
 Atomicity
o should not be able to execute half of an operation
o either all or none of the effects of a transaction are made permanent
 Consistency
o there should be no surprises in the world, e.g., gpa > 4.0, balance < 0, cats
should never have more than 1 tail!
o the effect of concurrent transactions is equivalent to some serial execution
o use constraints, triggers, active DB elements (context-free)
 Isolation
o concurrency control
o transactions should not be able to observe the partial effects of other
transactions
o use locks (whole relations or individual tuples?)
 Durability
o if power goes out, nothing bad should happen
o once accepted, the effects of a transaction are permanent (until, of course,
changed by another transaction)
o use logs

Applications of database systems


(Shifts in application domains help illustrate evolution of DBMS's)

 Reservation systems, banking systems


 Record/book keeping (corporate, university, medical), statistics
 Bioinformatics, e.g., gene databases
 Criminal justice
o fingerprint matching
o how do you encode `looks like'?
 Multimedia systems
o require terabytes (1012 bytes) of storage
o tertiary storage devices, e.g., CD, DVDs
o image/audio/video retrieval
o streaming, interactivity
 Satellite imaging; can require petabytes (1015 bytes) of storage
 The web
o client-server and multi-tier architectures
o almost all data-intensive websites are database-driven; IMDB.com is an
exception
 Information integration
o Over the web
o Legacy systems; must deal with issues of
 synonymy: different words having the same meaning, e.g., coffee
shop vs. café
 polysemy: same word (homonym) having different meanings, e.g.,
shot
o Data warehouses
o Data mining (KDD, Knowledge Discovery in Databases), e.g., association
rules: `diapers → beer'; we pass these on to the marketing folks
 in sum, databases are everywhere!

Three classical data models

1. Hierarchical model

The hierarchical data model organizes data in a tree structure. There is a hierarchy of
parent and child data segments. This structure implies that a record can have repeating
information, generally in the child data segments. Data in a series of records, which have
a set of field values attached to it. It collects all the instances of a specific record
together as a record type. These record types are the equivalent of tables in the
relational model, and with the individual records being the equivalent of rows. To create
links between these record types, the hierarchical model uses Parent Child
Relationships. These are a 1:N mapping between record types. This is done by using
trees, like set theory used in the relational model, "borrowed" from maths. For example,
an organization might store information about an employee, such as name, employee
number, department, salary. The organization might also store information about an
employee's children, such as name and date of birth. The employee and children data
forms a hierarchy, where the employee data represents the parent segment and the
children data represents the child segment. If an employee has three children, then
there would be three child segments associated with one employee segment. In a
hierarchical database the parent-child relationship is one to many. This restricts a child
segment to having only one parent segment. Hierarchical DBMSs were popular from the
late 1960s, with the introduction of IBM's Information Management System (IMS)
DBMS, through the 1970s.
For example, the following is the hierarchical schema of a company database:

The tree representation of the above hierarchical schema is shown below:

The Hierarchical Data Model structures data in a tree of records, with each record having
one parent record and many children. It can be represented as follows:

Figure 1 - The Hierarchical Data Model


A hierarchical database consists of the following:

1. It contains nodes connected by branches.


2. The top node is called the root.
3. If multiple nodes appear at the top level, the nodes are called root segments.
4. The parent of node nx is a node directly above nx and connected to nx by a branch.
5. Each node (with the exception of the root) has exactly one parent.
6. The child of node nx is the node directly below nx and connected to nx by a branch.
7. One parent may have many children.

Network model

The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than one
parent per child. So, the network model permitted the modeling of many-to-many
relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL)
formally defined the network model. The basic data modeling construct in the network
model is the set construct. A set consists of an owner record type, a set name, and a
member record type. A member record type can have that role in more than one set,
hence the multiparent concept is supported. An owner record type can also be a
member or owner in another set. The data model is a simple network, and link and
intersection record types (called junction records by IDMS) may exist, as well as sets
between them . Thus, the complete network of relationships is represented by several
pairwise sets; in each set some (one) record type is owner (at the tail of the network
arrow) and one or more record types are members (at the head of the relationship
arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The
CODASYL network model is based on mathematical set theory.

The Network Data Model uses a lattice structure in which a record can have many parents
as well as many children. It can be represented as follows:

The Network Data Model

Like the The Hierarchical Data Model the Network Data Model also consists of nodes and
branches, but a child may have multiple parents within the network structure instead of
being restricted to just one.

Both hierarchical and network databases, and they both suffered from the following
deficiencies (when compared with relational databases):

 Access to the database was not via SQL query strings, but by a specific set of API's,
typically for FIND, CREATE, READ, UPDATE and DELETE.

 Each API would only access a single table (dataset), so it was not possible to
implement a JOIN, which would return data from several tables.

 It was not possible to provide a variable WHERE clause. The only selection
mechanism available was

a. Read all entries (a full table scan).


b. Read a single entry using a specific primary key.
c. Read all entries on a child table which were associated with a selected entry
on a parent table
d. Any further filtering had to be done within the application code.
 It was not possible to provide an ORDER BY clause. Data was presented in the order
in which it existed in the database. This mechanism could be tuned by specifying sort
criteria to be used when each record was inserted, but this had several disadvantages:

 Only a single sort sequence could be defined for each path (link to a parent), so all
records retrieved on that path would be provided in that sequence.
 It could make inserts rather slow when attempting to insert into the middle of a
large collection, or where a table had multiple paths each with its own set of sort
criteria.
Relational model
(RDBMS - relational database management system) A database based on the relational
model developed by E.F. Codd. A relational database allows the definition of data structures,
storage and retrieval operations and integrity constraints. In such a database the data and
relations between them are organised in tables. A table is a collection of records and each
record in a table contains the same fields.

Properties of Relational Tables:

 Values Are Atomic

 Each Row is Unique

 Column Values Are of the Same Kind

 The Sequence of Columns is Insignificant

 The Sequence of Rows is Insignificant

 Each Column Has a Unique Name

Certain fields may be designated as keys, which means that searches for specific
values of that field will use indexing to speed them up. Where fields in two different
tables take values from the same set, a join operation can be performed to select
related records in the two tables by matching values in those fields. Often, but not
always, the fields will have the same name in both tables. For example, an "orders"
table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs so to calculate a given customer's bill you would
sum the prices of all products ordered by that customer by joining on the product-
code fields of the two tables. This can be extended to joining multiple tables on
multiple fields. Because these relationships are only specified at retreival time,
relational databases are classed as dynamic database management system. The
RELATIONAL database model is based on the Relational Algebra.
The Relational Data Model

The Relational Data Model has the relation at its heart, but with a whole series of rules
governing it for example,

keys, relationships, joins, functional dependencies, transitive dependencies, multi-valued


dependencies, and modification anomalies.

The Relation is the basic element in a relational data model.

Figure 3 - Relations in the Relational Data Model

A relation is subject to the following rules:

1. Relation (file, table) is a two-dimensional table.


2. Attribute (i.e. field or data item) is a column in the table.
3. Each column in the table has a unique name within that table.
4. Each column is homogeneous. Thus the entries in any column are all of the same
type (e.g. age, name, employee-number, etc).
5. Each column has a domain, the set of possible values that can appear in that
column.
6. A Tuple (i.e. record) is a row in the table.
7. The order of the rows and columns is not important.
8. Values of a row all relate to some thing or portion of a thing.
9. Repeating groups (collections of logically related attributes that occur multiple
times within one record occurrence) are not allowed.
10. Duplicate rows are not allowed (candidate keys are designed to prevent this).
11. Cells must be single-valued (but can be variable length). Single valued means the
following:
 Cannot contain multiple values such as 'A1,B2,C3'.
 Cannot contain combined values such as 'ABC-XYZ' where 'ABC' means one
thing and 'XYZ' another.

A relation may be expressed using the notation R(A,B,C, ...) where:

 R = the name of the relation.


 (A,B,C, ...) = the attributes within the relation.
 A = the attribute(s) which form the primary key.
Keys

1. A simple key contains a single attribute.


2. A composite key is a key that contains more than one attribute.
3. A candidate key is an attribute (or set of attributes) that uniquely identifies a row.
A candidate key must possess the following properties:
in the key can be discarded without destroying the property of unique identification
Unique identification - For every row the value of the key must uniquely identify that
row.

Non redundancy - No attribute.

4. A primary key is the candidate key which is selected as the principal unique
identifier. Every relation must contain a primary key. The primary key is usually the
key selected to identify a row when the database is physically implemented. For
example, a part number is selected instead of a part description.
5. A superkey is any set of attributes that uniquely identifies a row. A superkey differs
from a candidate key in that it does not require the non redundancy property.
6. A foreign key is an attribute (or set of attributes) that appears (usually) as a non
key attribute in one relation and as a primary key attribute in another relation. I
say usuallybecause it is possible for a foreign key to also be the whole or part of a
primary key:
 A many-to-many relationship can only be implemented by introducing an
intersection or link table which then becomes the child in two one-to-many
relationships. The intersection table therefore has a foreign key for each of its
parents, and its primary key is a composite of both foreign keys.

 A one-to-one relationship requires that the child table has no more than one
occurrence for each parent, which can only be enforced by letting the foreign
key also serve as the primary key.

7. A semantic or natural key is a key for which the possible values have an obvious
meaning to the user or the data. For example, a semantic primary key for a COUNTRY
entity might contain the value 'USA' for the occurrence describing the United States of
America. The value 'USA' has meaning to the user.
8. A technical or surrogate or artificial key is a key for which the possible values
have no obvious meaning to the user or the data. These are used instead of semantic
keys for any of the following reasons:
 When the value in a semantic key is likely to be changed by the user,
or can have duplicates. For example, on a PERSON table it is unwise
to use PERSON_NAME as the key as it is possible to have more than
one person with the same name, or the name may change such as
through marriage.

 When none of the existing attributes can be used to guarantee


uniqueness. In this case adding an attribute whose value is
generated by the system, e.g from a sequence of numbers, is the only
way to provide a unique value. Typical examples would be
ORDER_ID and INVOICE_ID. The value '12345' has no meaning to the
user as it conveys nothing about the entity to which it relates.

9. A key functionally determines the other attributes in the row, thus it is always
a determinant.
10. Note that the term 'key' in most DBMS engines is implemented as an index which
does not allow duplicate entries.
Relationships

One table (relation) may be linked with another in what is known as a relationship.
Relationships may be built into the database structure to facilitate the operation
of relational joins at runtime.

1. A relationship is between two tables in what is known as a one-to-many or parent-


child or master-detail relationship where an occurrence on the 'one' or 'parent' or
'master' table may have any number of associated occurrences on the 'many' or
'child' or 'detail' table. To achieve this the child table must contain fields which link
back the primary key on the parent table. These fields on the child table are known
as a foreign key, and the parent table is referred to as the foreign table (from the
viewpoint of the child).

2. It is possible for a record on the parent table to exist without corresponding


records on the child table, but it should not be possible for an entry on the child table
to exist without a corresponding entry on the parent table.

3. A child record without a corresponding parent record is known as an orphan.


4. It is possible for a table to be related to itself. For this to be possible it needs
a foreign key which points back to the primary key. Note that these two keys cannot
be comprised of exactly the same fields otherwise the record could only ever point to
itself.

5. A table may be the subject of any number of relationships, and it may be


the parent in some and the child in others.

6. Some database engines allow a parent table to be linked via a candidate key, but if
this were changed it could result in the link to the child table being broken.

7. Some database engines allow relationships to be managed by rules known


as referential integrity or foreign key restraints. These will prevent entries
on child tables from being created if the foreign key does not exist on
the parent table, or will deal with entries on child tables when the entry on
the parent table is updated or deleted.

Personal Guidelines

Even if you obey all the preceding rules it is still possible to produce a database design that
causes problems during development. I have come across many different implementation
tips and techniques over the years, and some that have worked in one database system have
been successfully carried forward into a new database system. Some tips, on the other hand,
may only be applicable to a particular database system.

For particular options and limitations you must refer to your database manual.

Database Names

1. Database names should be short and meaningful, such


as products, purchasing and sales.
o Short, but not too short, as in prod or purch.

o Meaningful but not verbose, as in 'the database used to store product


details'.

2. Do not waste time using a prefix such as db to identify database names. The SQL
syntax analyser has the intelligence to work that out for itself - so should you.
3. If your DBMS allows a mixture of upper and lowercase names, and it is case
sensitive, it is better to stick to a standard naming convention such as:
o All uppercase.
o All lowercase (my preference - see The choice between upper and lower
case).

o Leading uppercase, remainder lowercase.

Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can
lead to disasters.

4. If a database name contains more than one word, such as in sales


orders and purchase orders, decide how to deal with it:
o Separate the words with a single space, as in sales orders (note that some
DBMSs do not allow embedded spaces, while most languages will require such
names to be enclosed in quotes).

o Separate the words with an underscore, as in sales_orders (my preference -


see The choice between upper and lower case).

o Separate the words with a hyphen, as in sales-orders.

o Use camel caps, as in SalesOrders.

Again, be consistent.

5. Rather than putting all the tables into a single database it may be better to create
separate databases for each logically related set of tables. This may help with
security, archiving, replication, etc.
Table Names

1. Table names should be short and meaningful, such as part, customer and invoice.
o Short, but not too short.

o Meaningful, but not verbose.

2. Do not waste time using a prefix such as tbl to identify table names. The SQL syntax
analyser has the intelligence to work that out for itself - so should you.
3. Table names should be in the singular (e.g. customer not customers). The fact that
a table may contain multiple entries is irrelevant - any multiplicity can be derived
from the existence of one-to-many relationships.
4. If your DBMS allows a mixture of upper and lowercase names, and it is case
sensitive, It is better to stick to a standard naming convention such as:
o All uppercase.

o All lowercase. (my preference - see The choice between upper and lower
case)
o Leading uppercase, remainder lowercase.

Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can
lead to disasters.

5. If a table name contains more than one word, such as in sales order and purchase
order, decide how to deal with it:
o Separate the words with a single space, as in sales order (note that some
DBMSs do not allow embedded spaces, while most languages will require such
names to be enclosed in quotes).

o Separate the words with an underscore, as in sales_order (my preference -


see The choice between upper and lower case).

o Separate the words with a hyphen, as in sales-order.

o Use camel caps, as in SalesOrder.

Again, be consistent.

6. Be careful if the same table name is used in more than one database - it may lead to
confusion.
Field Names

1. Field names should be short and meaningful, such


as part_name and customer_name.
o Short, but not too short, such as in ptnam.

o Meaningful, but not verbose, such as the name of the part.

2. Do not waste time using a prefix such as col or fld to identify column/field names.
The SQL syntax analyser has the intelligence to work that out for itself - so should
you.
3. If your DBMS allows a mixture of upper and lowercase names, and it is case
sensitive, it is better to stick to a standard naming convention such as:
o All uppercase.

o All lowercase. (my preference - see The choice between upper and lower
case)

o Leading uppercase, remainder lowercase.

Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can
lead to disasters.
4. If a field name contains more than one word, such as in part name and customer
name, decide how to deal with it:
o Separate the words with a single space, as in part name (note that some
DBMSs do not allow embedded spaces, while most languages will require such
names to be enclosed in quotes).

o Separate the words with an underscore, as in part_name (my preference -


see The choice between upper and lower case).

o Separate the words with a hyphen, as in part-name.

o Use camel caps, as in PartName.

Again, be consistent.

5. Common words in field names may be abbreviated, but be consistent.


o Do not allow a mixture of abbreviations, such as 'no', 'num' and 'nbr' for
'number'.

o Publish a list of standard abbreviations and enforce it.

6. Although field names must be unique within a table, it is possible to use the same
name on multiple tables even if they are unrelated, or they do not share the same set
of possible values. It is recommended that this practice should be avoided, for
reasons described in Field names should identify their content and The naming of
Foreign Keys.
Primary Keys

1. It is recommended that the primary key of an entity should be constructed from the
table name with a suffix of _ID. This makes it easy to identify the primary key in a
long list of field names.
2. Do not waste time using a prefix such as pk to identify primary key fields. This has
absolutely no meaning to any database engine or any application.
3. Avoid using generic names for all primary keys. It may seem a clever idea to use the
name ID for every primary key field, but this causes problems:
o It causes the same name to appear on multiple tables with totally different
contexts. The string ID='ABC123' is extremely vague as it gives no idea of the
entity being referenced. Is it an invoice id, customer id, or what?
o It also causes a problem with foreign keys.
4. There is no rule that says a primary key must consist of a single attribute - both
simple and composite keys are allowed - so don't waste time creating artificial keys.
5. Avoid the unnecessary use of technical keys. If a table already contains a satisfactory
unique identifier, whether composite or simple, there is no need to create another
one. Although the use of a technical key can be justified in certain circumstances, it
takes intelligence to know when those circumstances are right. The indiscriminate
use of technical keys shows a distinct lack of intelligence. For further views on this
subject please refer to Technical Keys - Their Uses and Abuses.
Foreign Keys

1. It is recommended that where a foreign key is required that you use the same name
as that of the associated primary key on the foreign table. It is a requirement of
arelational join that two relations can only be joined when they share at least one
common attribute, and this should be taken to mean the attribute name(s) as well as
the value(s). Thus where the customer and invoice tables are joined in a parent-
child relationship the following will result:
o The primary key of customer will be customer_id.

o The primary key of invoice will be invoice_id.

o The foreign key which joins invoice to customer will be customer_id.

2. For MySQL users this means that the shortened version of the join condition may be
used:
o Short: A LEFT JOIN B USING (a,b,c)

o Long: A LEFT JOIN B ON (A.a=B.a AND A.b=B.b AND A.c=B.c)

3. The only exception to this naming recommendation should be where a table


contains more than one foreign key to the same parent table, in which case the names
must be changed to avoid duplicates. In this situation I would simply add a
meaningful suffix to each name to identify the usage, such as:
o To signify movement I would use location_id_from and location_id_to.
o To signify positions in a hierarchy I would
use node_id_snr and node_id_jnr.
o To signify replacement I would use part_id_old and part_id_new.
I prefer to use a suffix rather than a prefix as it makes the leading characters match
(as in PART_ID_old and PART_ID_new) instead of having the traiing characters
match (as in old_PART_ID and new_PART_ID).

Do not waste time using a prefix such as fk to identify foreign key fields
Entity-Relationship Diagram (ERD)

An entity-relationship diagram (ERD) is a data modeling technique that creates a graphical


representation of the entities, and the relationships between entities, within an information
system. Any ER diagram has an equivalent relational table, and any relational table has an
equivalent ER diagram. ER diagramming is an invaluable aid to engineers in the design,
optimization, and debugging of database programs.

 The entity is a person, object, place or event for which data is collected. It is
equivalent to a database table. An entity can be defined by means of its properties,
called attributes. For example, the CUSTOMER entity may have attributes for such
things as name, address and telephone number.

 The relationship is the interaction between the entities. It can be described using a
verb such as:

o A customer places an order.


o A sales rep serves a customer.
o A order contains a product.
o A warehouse stores a product.

In an entity-relationship diagram entities are rendered as rectangles, and relationships are


portrayed as lines connecting the rectangles. One way of indicating which is the 'one' or
'parent' and which is the 'many' or 'child' in the relationship is to use an arrowhead, as
in figure 4.

Figure 4 - One-to-Many relationship using arrowhead notation

This can produce an ERD as shown in figure 5:

Figure 5 - ERD with arrowhead notation


Another method is to replace the arrowhead with a crowsfoot, as shown in figure 6:

Figure 6 - One-to-Many relationship using crowsfoot notation

The relating line can be enhanced to indicate cardinality which defines the relationship
between the entities in terms of numbers. An entity may be optional (zero or more) or it
may be mandatory (one or more).

 A single bar indicates one.


 A double bar indicates one and only one.
 A circle indicates zero.
 A crowsfoot or arrowhead indicates many.

As well as using lines and circles the cardinality can be expressed using numbers, as in:

 One-to-One expressed as 1:1


 Zero-to-Many expressed as 0:M
 One-to-Many expressed as 1:M
 Many-to-Many expressed as N:M

This can produce an ERD as shown in figure 7:

Figure 7 - ERD with crowsfoot notation and cardinality


In plain language the relationships can be expressed as follows:

 1 instance of a SALES REP serves 1 to many CUSTOMERS


 1 instance of a CUSTOMER places 1 to many ORDERS
 1 instance of an ORDER lists 1 to many PRODUCTS
 1 instance of a WAREHOUSE stores 0 to many PRODUCTS

In order to determine if a particular design is correct here is a simple test that I use:

1. Take the written rules and construct a diagram.


2. Take the diagram and try to reconstruct the written rules.

If the output from step (2) is not the same as the input to step (1) then something is wrong.
If the model allows a situation to exist which is not allowed in the real world then this could
lead to serious problems. The model must be an accurate representation of the real world in
order to be effective. If any ambiguities are allowed to creep in they could have disastrous
consequences.

We have now completed the logical data model, but before we can construct the physical
database there are several steps that must take place:

 Assign attributes (properties or values) to all the entities. After all, a table without
any columns will be of little use to anyone.

 Refine the model using a process known as 'normalisation'. This ensures that each
attribute is in the right place. During this process it may be necessary to create new
tables and new relationships.
Data Normalisation

Relational database theory, and the principles of normalisation, were first constructed by
people with a strong mathematical background. They wrote about databases using
terminology which was not easily understood outside those mathematical circles. Below is
an attempt to provide understandable explanations.

Data normalisation is a set of rules and techniques concerned with:

 Identifying relationships among attributes.


 Combining attributes to form relations.
 Combining relations to form a database.

It follows a set of rules worked out by E F Codd in 1970. A normalised relational database
provides several benefits:

 Elimination of redundant data storage.


 Close modeling of real world entities, processes, and their relationships.
 Structuring of data so that the model is flexible.

Because the principles of normalisation were first written using the same terminology as
was used to define the relational data model this led some people to think that
normalisation is difficult. Nothing could be more untrue. The principles of normalisation are
simple, common sense ideas that are easy to apply.

Although there are numerous steps in the normalisation process -


1NF, 2NF, 3NF, BCNF, 4NF, 5NF and DKNF - a lot of database designers often find it
unnecessary to go beyond 3rd Normal Form. This does not mean that those higher forms
are unimportant, just that the circumstances for which they were designed often do not
exist within a particular database. However, all database designers should be aware of all
the forms of normalisation so that they may be in a better position to detect when a
particular rule of normalisation is broken and then decide if it is necessary to take
appropriate action.
The guidelines for developing relations in 3rd Normal Form can be summarised as follows:

1. Define the attributes.


2. Group logically related attributes into relations.
3. Identify candidate keys for each relation.
4. Select a primary key for each relation.
5. Identify and remove repeating groups.
6. Combine relations with identical keys (1st normal form).
7. Identify all functional dependencies.
8. Decompose relations such that each non key attribute is dependent on all the
attributes in the key.
9. Combine relations with identical primary keys (2nd normal form).
10. Identify all transitive dependencies.

o Check relations for dependencies of one non key attribute with another non
key attribute.

o Check for dependencies within each primary key (i.e. dependencies of one
attribute in the key on other attributes within the key).

11. Decompose relations such that there are no transitive dependencies.


12. Combine relations with identical primary keys (3rd normal form) if there are
no transitive dependencies.

1st Normal Form

A table is in first normal form if all the key attributes have been defined and it
contains no repeating groups.
Taking the ORDER entity in figure 7 as an example we could end up with a set of attributes
like this:

ORDER

order_id customer_id product1 product2 product3

123 456 abc1 def1 ghi1

456 789 abc2

This structure creates the following problems:

 Order 123 has no room for more than 3 products.


 Order 456 has wasted space for product2 and product3.

In order to create a table that is in first normal form we must extract the repeating groups
and place them in a separate table, which I shall call ORDER_LINE.

ORDER

order_id customer_id

123 456

456 789

I have removed 'product1', 'product2' and 'product3', so there are no repeating groups.

ORDER_LINE

order_id product

123 abc1
123 def1

123 ghi1

456 abc2

Each row contains one product for one order, so this allows an order to contain any number
of products.

This results in a new version of the ERD, as shown in figure 8:

Figure 8 - ERD with ORDER and ORDER_LINE

The new relationships can be expressed as follows:

 1 instance of an ORDER has 1 to many ORDER LINES


 1 instance of a PRODUCT has 0 to many ORDER LINES
2nd Normal Form

A table is in second normal form (2NF) if and only if it is in 1NF and every non
key attribute is fully functionally dependent on the whole of the primary key (i.e.
there are no partial dependencies).
1. Anomalies can occur when attributes are dependent on only part of a multi-
attribute (composite) key.
2. A relation is in second normal form when all non-key attributes are dependent on
the whole key. That is, no attribute is dependent on only a part of the key.
3. Any relation having a key with a single attribute is in second normal form.

Take the following table structure as an example:

order(order_id, cust, cust_address, cust_contact, order_date, order_total)

Here we should realise that cust_address and cust_contact are functionally


dependent on cust but not on order_id, therefore they are not dependent on the whole key.
To make this table 2NF these attributes must be removed and placed somewhere else.

3rd Normal Form

A table is in third normal form (3NF) if and only if it is in 2NF and every non key
attribute is non transitively dependent on the primary key (i.e. there are
no transitive dependencies).

1. Anomalies can occur when a relation contains one or more transitive dependencies.
2. A relation is in 3NF when it is in 2NF and has no transitive dependencies.
3. A relation is in 3NF when 'All non-key attributes are dependent on the key, the
whole key and nothing but the key'.

Take the following table structure as an example:

order(order_id, cust, cust_address, cust_contact, order_date, order_total)

Here we should realise that cust_address and cust_contact are functionally


dependent on cust which is not a key. To make this table 3NF these attributes must be
removed and placed somewhere else.

You must also note the use of calculated or derived fields. Take the example where a table
contains PRICE, QUANTITY and EXTENDED_PRICE where EXTENDED_PRICE is calculated as
QUANTITY multiplied by PRICE. As one of these values can be calculated from the other two
then it need not be held in the database table. Do not assume that it is safe to drop any one
of the three fields as a difference in the number of decimal places between the various fields
could lead to different results due to rounding errors. For example, take the following fields:

 AMOUNT - a monetary value in home currency, to 2 decimal places.


 EXCH_RATE - exchange rate, to 9 decimal places.
 CURRENCY_AMOUNT - amount expressed in foreign currency, calculated as
AMOUNT multiplied by EXCH_RATE.

If you were to drop EXCH_RATE could it be calculated back to its original 9 decimal places?

Reaching 3NF is is adequate for most practical needs, but there may be circumstances which
would benefit from further normalisation.

Normalizing an Example Table


These steps demonstrate the process of normalizing a fictitious student table.

1. Unnormalized table:

Student# Advisor Adv-Room Class1 Class2 Class3

1022 John 412 101-07 143-01 159-02

4123 Simon 216 201-01 211-02 214-01

2. First Normal Form: No Repeating Groups

Tables should have only two dimensions. Since one student has several classes, these classes
should be listed in a separate table. Fields Class1, Class2, and Class3 in the above records are
indications of design trouble.

Spreadsheets often use the third dimension, but tables should not. Another way to look at
this problem is with a one-to-many relationship, do not put the one side and the many side
in the same table. Instead, create another table in first normal form by eliminating the
repeating group (Class#), as shown below:

Student# Advisor Adv-Room Class#

1022 John 412 101-07

1022 John 412 143-01

1022 John 412 159-02

4123 Simon 216 201-01


4123 Simon 216 211-02

4123 Simon 216 214-01

Second Normal Form


 Create separate tables for sets of values that apply to multiple records.

 Relate these tables with a foreign key.

Records should not depend on anything other than a table's primary key (a compound key, if
necessary). For example, consider a customer's address in an accounting system. The address is
needed by the Customers table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and
Collections tables. Instead of storing the customer's address as a separate entry in each of these
tables, store it in one place, either in the Customers table or in a separate Addresses table.

The following two tables demonstrate second normal form:

Students:

Student# Advisor Adv-Room

1022 John 412

4123 Simon 216

3.

Registration:

Student# Class#

1022 101-07

1022 143-01

1022 159-02

4123 201-01
4123 211-02

4123 214-01

Third Normal Form


Eliminate fields that do not depend on the key. Values in a record that are not part of that record's
key do not belong in the table. In general, any time the contents of a group of fields may apply to
more than a single record in the table, consider placing those fields in a separate table.
For example, in an Employee Recruitment table, a candidate's university name and address may be
included. But you need a complete list of universities for group mailings. If university information is
stored in the Candidates table, there is no way to list universities with no current candidates. Create
a separate Universities table and link it to the Candidates table with a university code key.
EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical.
If you have a Customers table and you want to eliminate all possible interfield dependencies, you
must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any
other factor that may be duplicated in multiple records. In theory, normalization is worth pursing.
However, many small tables may degrade performance or exceed open file and memory capacities.
It may be more feasible to apply third normal form only to data that changes frequently. If some
dependent fields remain, design your application to require the user to verify all related fields when
any one is changed.

4. Third Normal Form: Eliminate Data Not Dependent On Key

In the last example, Adv-Room (the advisor's office number) is functionally dependent on
the Advisor attribute. The solution is to move that attribute from the Students table to the
Faculty table, as shown below:

Students:

Student# Advisor

1022 John

4123 Simon

5.
Faculty:

Name Room Dept


John 412 42

Simon 216 42

Boyce-Codd Normal Form

A table is in Boyce-Codd normal form (BCNF) if and only if it is in 3NF and every
determinant is a candidate key.

1. Anomalies can occur in relations in 3NF if there is a composite key in which part of
that key has a determinant which is not itself a candidate key.
2. This can be expressed as R(A,B,C), C A where:
o The relation contains attributes A, B and C.

o A and B form a candidate key.

o C is the determinant for A (A is functionally dependent on C).

o C is not part of any key.

3. Anomalies can also occur where a relation contains several candidate keys where:
o The keys contain more than one attribute (they are composite keys).
o An attribute is common to more than one key.

Take the following table structure as an example:

schedule(campus, course, class, time, room/bldg)

Take the following sample data:

campus course class time room/bldg

East English 101 1 8:00-9:00 212 AYE

East English 101 2 10:00-11:00 305 RFK


West English 101 3 8:00-9:00 102 PPR

Note that no two buildings on any of the university campuses have the same name, thus
ROOM/BLDG CAMPUS. As the determinant is not a candidate key this table is NOT in
Boyce-Codd normal form.

This table should be decomposed into the following relations:

R1(course, class, room/bldg, time)

R2(room/bldg, campus)

As another example take the following structure:

enrol(student#, s_name, course#, c_name, date_enrolled)

This table has the following candidate keys:

 (student#, course#)
 (student#, c_name)
 (s_name, course#) - this assumes that s_name is a unique identifier
 (s_name, c_name) - this assumes that c_name is a unique identifier

The relation is in 3NF but not in BCNF because of the following dependencies:

 student# s_name
 course# c_name
4th Normal Form

A table is in fourth normal form (4NF) if and only if it is in BCNF and contains no
more than one multi-valued dependency.

1. Anomalies can occur in relations in BCNF if there is more than one multi-valued
dependency.
2. If A B and A C but B and C are unrelated, ie A (B,C) is false, then we have
more than one multi-valued dependency.
3. A relation is in 4NF when it is in BCNF and has no more than one multi-valued
dependency.

Take the following table structure as an example:


info(employee#, skills, hobbies)

Take the following sample data:

employee# skills hobbies

1 Programming Golf

1 Programming Bowling

1 Analysis Golf

1 Analysis Bowling

2 Analysis Golf

2 Analysis Gardening

2 Management Golf

2 Management Gardening

This table is difficult to maintain since adding a new hobby requires multiple new rows
corresponding to each skill. This problem is created by the pair of multi-valued
dependencies EMPLOYEE# SKILLS and EMPLOYEE# HOBBIES. A much better
alternative would be to decompose INFO into two relations:

skills(employee#, skill)

hobbies(employee#, hobby)

5th (Projection-Join) Normal Form

A table is in fifth normal form (5NF) or Projection-Join Normal Form (PJNF) if it is


in 4NF and it cannot have a lossless decomposition into any number of smaller
tables.

Another way of expressing this is:


... and each join dependency is a consequence of the candidate keys.

Yet another way of expressing this is:

... and there are no pairwise cyclical dependencies in the primary key comprised
of three or more attributes.

 Anomalies can occur in relations in 4NF if the primary key has three or more fields.
 5NF is based on the concept of join dependence - if a relation cannot be decomposed
any further then it is in 5NF.
 Pairwise cyclical dependency means that:
o You always need to know two values (pairwise).
o For any one you must know the other two (cyclical).

Take the following table structure as an example:

buying(buyer, vendor, item)

This is used to track buyers, what they buy, and from whom they buy.

Take the following sample data:

buyer vendor item

Sally Liz Claiborne Blouses

Mary Liz Claiborne Blouses

Sally Jordach Jeans

Mary Jordach Jeans

Sally Jordach Sneakers

The question is, what do you do if Claiborne starts to sell Jeans? How many records must
you create to record this fact?

The problem is there are pairwise cyclical dependencies in the primary key. That is, in order
to determine the item you must know the buyer and vendor, and to determine the vendor
you must know the buyer and the item, and finally to know the buyer you must know the
vendor and the item.

The solution is to break this one table into three tables; Buyer-Vendor, Buyer-Item, and
Vendor-Item.

6th (Domain-Key) Normal Form

A table is in sixth normal form (6NF) or Domain-Key normal form (DKNF) if it is


in 5NF and if all constraints and dependencies that should hold on the relation
can be enforced simply by enforcing the domain constraints and the key
constraints specified on the relation.

Another way of expressing this is:

... if every constraint on the table is a logical consequence of the definition of keys
and domains.

1. An domain constraint (better called an attribute constraint) is simply a constraint to


the effect a given attribute A of R takes its values from some given domain D.
2. A key constraint is simply a constraint to the effect that a given set A, B, ..., C of R
constitutes a key for R.

This standard was proposed by Ron Fagin in 1981, but interestingly enough he made no
note of multi-valued dependencies, join dependencies, or functional dependencies in his
paper and did not demonstrate how to achieve DKNF. However, he did manage to
demonstrate that DKNF is often impossible to achieve.

If relation R is in DKNF, then it is sufficient to enforce the domain and key constraints for R,
and all constraints on R will be enforced automatically. Enforcing those domain and key
constraints is, of course, very simple (most DBMS products do it already). To be specific,
enforcing domain constraints just means checking that attribute values are always values
from the applicable domain (i.e., values of the right type); enforcing key constraints just
means checking that key values are unique.

Unfortunately lots of relations are not in DKNF in the first place. For example, suppose
there's a constraint on R to the effect that R must contain at least ten tuples. Then that
constraint is certainly not a consequence of the domain and key constraints that apply to R,
and so R is not in DKNF. The sad fact is, not all relations can be reduced to DKNF; nor do we
know the answer to the question "Exactly when can a relation be so reduced?"
Personal Guidelines

Even if you obey all the preceding rules it is still possible to produce a database design that
causes problems during development. I have come across many different implementation
tips and techniques over the years, and some that have worked in one database system have
been successfully carried forward into a new database system. Some tips, on the other hand,
may only be applicable to a particular database system.

For particular options and limitations you must refer to your database manual.

Database Names

6. Database names should be short and meaningful, such


as products, purchasing and sales.
o Short, but not too short, as in prod or purch.

o Meaningful but not verbose, as in 'the database used to store product


details'.

7. Do not waste time using a prefix such as db to identify database names. The SQL
syntax analyser has the intelligence to work that out for itself - so should you.
8. If your DBMS allows a mixture of upper and lowercase names, and it is case
sensitive, it is better to stick to a standard naming convention such as:
o All uppercase.

o All lowercase (my preference - see The choice between upper and lower
case).

o Leading uppercase, remainder lowercase.

Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can
lead to disasters.

9. If a database name contains more than one word, such as in sales


orders and purchase orders, decide how to deal with it:
o Separate the words with a single space, as in sales orders (note that some
DBMSs do not allow embedded spaces, while most languages will require such
names to be enclosed in quotes).

o Separate the words with an underscore, as in sales_orders (my preference -


see The choice between upper and lower case).

o Separate the words with a hyphen, as in sales-orders.

o Use camel caps, as in SalesOrders.


Again, be consistent.

10. Rather than putting all the tables into a single database it may be better to create
separate databases for each logically related set of tables. This may help with
security, archiving, replication, etc.
Table Names

7. Table names should be short and meaningful, such as part, customer and invoice.
o Short, but not too short.

o Meaningful, but not verbose.

8. Do not waste time using a prefix such as tbl to identify table names. The SQL syntax
analyser has the intelligence to work that out for itself - so should you.
9. Table names should be in the singular (e.g. customer not customers). The fact that
a table may contain multiple entries is irrelevant - any multiplicity can be derived
from the existence of one-to-many relationships.
10. If your DBMS allows a mixture of upper and lowercase names, and it is case
sensitive, It is better to stick to a standard naming convention such as:
o All uppercase.

o All lowercase. (my preference - see The choice between upper and lower
case)

o Leading uppercase, remainder lowercase.

Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can
lead to disasters.

11. If a table name contains more than one word, such as in sales order and purchase
order, decide how to deal with it:
o Separate the words with a single space, as in sales order (note that some
DBMSs do not allow embedded spaces, while most languages will require such
names to be enclosed in quotes).

o Separate the words with an underscore, as in sales_order (my preference -


see The choice between upper and lower case).

o Separate the words with a hyphen, as in sales-order.

o Use camel caps, as in SalesOrder.

Again, be consistent.
12. Be careful if the same table name is used in more than one database - it may lead to
confusion.
Field Names

7. Field names should be short and meaningful, such


as part_name and customer_name.
o Short, but not too short, such as in ptnam.

o Meaningful, but not verbose, such as the name of the part.

8. Do not waste time using a prefix such as col or fld to identify column/field names.
The SQL syntax analyser has the intelligence to work that out for itself - so should
you.
9. If your DBMS allows a mixture of upper and lowercase names, and it is case
sensitive, it is better to stick to a standard naming convention such as:
o All uppercase.

o All lowercase. (my preference - see The choice between upper and lower
case)

o Leading uppercase, remainder lowercase.

Inconsistencies may lead to confusion, confusion may lead to mistakes, mistakes can
lead to disasters.

10. If a field name contains more than one word, such as in part name and customer
name, decide how to deal with it:
o Separate the words with a single space, as in part name (note that some
DBMSs do not allow embedded spaces, while most languages will require such
names to be enclosed in quotes).

o Separate the words with an underscore, as in part_name (my preference -


see The choice between upper and lower case).

o Separate the words with a hyphen, as in part-name.

o Use camel caps, as in PartName.

Again, be consistent.

11. Common words in field names may be abbreviated, but be consistent.


o Do not allow a mixture of abbreviations, such as 'no', 'num' and 'nbr' for
'number'.

o Publish a list of standard abbreviations and enforce it.


12. Although field names must be unique within a table, it is possible to use the same
name on multiple tables even if they are unrelated, or they do not share the same set
of possible values. It is recommended that this practice should be avoided, for
reasons described in Field names should identify their content and The naming of
Foreign Keys.
Primary Keys

6. It is recommended that the primary key of an entity should be constructed from the
table name with a suffix of _ID. This makes it easy to identify the primary key in a
long list of field names.
7. Do not waste time using a prefix such as pk to identify primary key fields. This has
absolutely no meaning to any database engine or any application.
8. Avoid using generic names for all primary keys. It may seem a clever idea to use the
name ID for every primary key field, but this causes problems:
o It causes the same name to appear on multiple tables with totally different
contexts. The string ID='ABC123' is extremely vague as it gives no idea of the
entity being referenced. Is it an invoice id, customer id, or what?
o It also causes a problem with foreign keys.
9. There is no rule that says a primary key must consist of a single attribute - both
simple and composite keys are allowed - so don't waste time creating artificial keys.
10. Avoid the unnecessary use of technical keys. If a table already contains a satisfactory
unique identifier, whether composite or simple, there is no need to create another
one. Although the use of a technical key can be justified in certain circumstances, it
takes intelligence to know when those circumstances are right. The indiscriminate
use of technical keys shows a distinct lack of intelligence. For further views on this
subject please refer to Technical Keys - Their Uses and Abuses.
Foreign Keys

4. It is recommended that where a foreign key is required that you use the same name
as that of the associated primary key on the foreign table. It is a requirement of
arelational join that two relations can only be joined when they share at least one
common attribute, and this should be taken to mean the attribute name(s) as well as
the value(s). Thus where the customer and invoice tables are joined in a parent-
child relationship the following will result:
o The primary key of customer will be customer_id.

o The primary key of invoice will be invoice_id.

o The foreign key which joins invoice to customer will be customer_id.

5. For MySQL users this means that the shortened version of the join condition may be
used:
o Short: A LEFT JOIN B USING (a,b,c)

o Long: A LEFT JOIN B ON (A.a=B.a AND A.b=B.b AND A.c=B.c)

6. The only exception to this naming recommendation should be where a table


contains more than one foreign key to the same parent table, in which case the names
must be changed to avoid duplicates. In this situation I would simply add a
meaningful suffix to each name to identify the usage, such as:
o To signify movement I would use location_id_from and location_id_to.
o To signify positions in a hierarchy I would
use node_id_snr and node_id_jnr.
o To signify replacement I would use part_id_old and part_id_new.
I prefer to use a suffix rather than a prefix as it makes the leading characters match
(as in PART_ID_old and PART_ID_new) instead of having the traiing characters
match (as in old_PART_ID and new_PART_ID).

7. Do not waste time using a prefix such as fk to identify foreign key fields. This has
absolutely no meaning to any database engine or any application.
Generating Unique ids

Where a technical primary key is used a mechanism is required that will generate new and
unique values. Such keys are usually numeric, so there are several methods available:

1. Some database engines will maintain a set of sequence numbers for you which can
be referenced using code such as :

2. SELECT <seq_name>.NEXTVAL FROM DUAL

Using such a sequence is a two-step procedure:

o Access the sequence to obtain a value.


o Use the supplied value on an INSERT statement.

It is sometimes possible to access the sequence directly from an INSERT statement, as


in the following:

INSERT INTO tablename (col1,col2,...) VALUES


(tablename_seq.nextval,'value2',...)

If the number just used needs to be retrieved so that it can be passed back to the
application it can be done so with the following:
SELECT <seq_name>.CURRVAL FROM DUAL

I have used this method, but a disadvantage that I have found is that the DBMS has no
knowledge of what primary key is linked to which sequence, so it is possible to insert
a record with a key not obtained from the sequence and thus cause the two to
become unsynchronised. The next time the sequence is used it could therefore
generate a value which already exists as a key and therefore cause an INSERT error.

3. Some database engines will allow you to specify a numeric field as 'auto-increment',
and on an INSERT they will automatically generate the next available number
(provided that no value is provided for that field in the first place). This is better than
the previous method because:

o The sequence is tied directly to a particular database table and is not a


separate object, thus it is impossible to become unsynchronised.
o It is not necessary to access the sequence then use the returned value on an
INSERT statement - just leave the field empty and the DBMS will fill in the value
automatically.

4. While the previous methods have their merits, they both have a common failing in
that they are not-standard extensions to the SQL standard, therefore they are not
available in all SQL-compliant database engines. This becomes an important factor if
it is ever decided to switch to another database engine. A truly portable method
which uses a standard technique and can therefore be used in any SQL-compliant
database is to use an SQL statement similar to the following to obtain a unique key for
a table:

5. SELECT max(table_id) FROM <tablename>

6. table_id = table_id+1

Some people seem to think that this method is inefficient as it requires a full table
search, but they are missing the fact that table_id is a primary key, therefore the
values are held within an index. The SELECT max(...) statement will automatically be
optimised to go straight to the last value in the index, therefore the result is obtained
with almost no overhead. This would not be the case if I used SELECT count(...) as
this would have to physically count the number of entries. Another reason for not
using SELECT count(...) is that if records were to be deleted then record count would
be out of step with the highest current value.
7. The Radicore development framework has separate data access objects for each
DBMS to which it can connect. This means that the different code for dealing with
auto_increment keys can be contained within each object, so is totally transparent to
the application. All that is necessary is that the key be identified as 'auto_increment'
in the Data Dictionary and the database object will take care of all the necessary
processing.

What is SQL
Structured Query Language

Using sql one can store manipulate and retrieve data stored in databases

What can SQL do


 SQL can execute queries against a database

 Create new databases and tables in a database

 it can retrieve data from a database

 Insert records in database

 update records in database

 delete records in database

Division of SQL

Data definition language


Used to create (define) data structures such as tables, indexes, clusters ie,

 CREATE databases, tables

 ALTER databases, tables


 DROP tables

Data Manipulation Language


The data manipulation language is used to access and update data; it is not

important for representing the data. (Of course, the data manipulation language

must be aware of how data is represented, and reflects this in the constructs that

it supports ie

 SELECT - extracts data from databases

 UPDATE- updates data in a database

 DELETE- deletes data from tables

 INSERT INTO - inserts data into tables

Data Definition

 Creating a database:

CREATE DATABASE databasename

 Create a table:

CREATE TABLE tablename

column1 datatype,

columnname2 datatype,

columnname3 datatype,

...

 Data types: bigint,int,datetime,char,varchar,text,image

 Alter: TABLE tablename


 Deleting a table:

 DROP TABLE tablename

Insert data into table

NSERT INTO table name

VALUES (value1, value2, value3,. . . )

INSERT INTO table name (column1, column2, column3,. . . )

VALUES (value1, value2, value3,. . . )

Select data
 SELECT column name(s)

FROM table name

 SELECT FROM table name

 SELECT DISTINCT column name(s)

FROM table name

 Selecting data items based upon some condition

WHERE

SELECT column name(s)

FROM table name

WHERE column name operator value

 Operators:=, <>, <, >, <=, >=, BETWEEN, LIKE

 condtions may be joined using AND/OR

 Sorting on some attribute

SELECT column name(s) FROM table name


WHERE condition(S)

ORDER BY column name(s) ASC|DESC

 Updating some data items

UPDATE table name SET column1=value,

column2=value2,. . . WHERE some column=some value

 What if we omit where clause?

Sorting on some attribute

SELECT column name(s) FROM table name

WHERE condition(S)

ORDER BY column name(s) ASC|DESC

 Updating some data items

UPDATE table name SET column1=value,

column2=value2,. . . WHERE some column=some value

 What if we omit where clause?

 DELETE FROM table name

WHERE some column=some value

 How would you delete all rows?

 Sorting on some attribute

SELECT column name(s) FROM table name

WHERE condition(S)

ORDER BY column name(s) ASC|DESC

 Updating some data items

UPDATE table name SET column1=value,

column2=value2,. . . WHERE some column=some value

 What if we omit where clause?


DELETE FROM table name

WHERE some column=some value

 How would you delete all rows?

DELETE * FROM table name

SQL Functions

 The Average function

SELECT AVG(column name) FROM table name

 The Count function

SELECT COUNT(column name) FROM table name

 To count distinct values

SELECT COUNT(DISTINCT column name) FROM

table name

 The max function

SELECT MAX(column name) FROM table name

Similarly there is a min function

 The sum function

SELECT SUM(column name) FROM table name

 The now function

SELECT ProductName, UnitPrice, Now( as PerDate FROM

Products)
Creating a Schema

 Creating a Table:
 CREATE TABLE employee (

 EmployeeID VARCHAR(10) NOT NULL,

 Last_Name VARCHAR(20) NOT NULL,

 First_name VARCHAR(18) NOT NULL,

 Soc_Sec VARCHAR(11) NOT NULL,

 Date_of_Birth DATE,

 Salary NUMBER

 ) ;

 CREATE TABLE dependant (

 Last_Name VARCHAR(20) NOT NULL,

 First_name VARCHAR(18) NOT NULL,

 Soc_Sec VARCHAR(11) NOT NULL,

 Date_of_Birth DATE,

 EmployeeID VARCHAR(10) NOT NULL

 );

 Note: When naming tables, columns and other database objects, do not
include spaces in the names. For example, do not call the last name
column: Last Name
If you wish to separate words in a name, use the underscore character.
 Specifying Primary and Foreign keys:
 CREATE TABLE order_header (

 order_number NUMBER(10,0) NOT NULL,

 order_date DATE,
 sales_person VARCHAR(25),

 bill_to VARCHAR(35),

 bill_to_address VARCHAR(45),

 bill_to_city VARCHAR(20),

 bill_to_state VARCHAR(2),

 bill_to_zip VARCHAR(10),

 PRIMARY KEY (order_number)

 );

 CREATE TABLE order_items (

 order_number NUMBER(10,0) NOT NULL,

 line_item NUMBER(4,0) NOT NULL,

 part_number VARCHAR(12) NOT NULL,

 quantity NUMBER(4,0),

 PRIMARY KEY (order_number, line_item),

 FORIEGN KEY (order_number)

 REFERENCES order_header (order_number),

 FOREIGN KEY (part_number)

 REFERENCES parts (part_number)

 );

 CREATE INDEX order_index

 ON order_header (order_number) ASC ;

 CREATE INDEX items_index

 ON order_items (order_number, line_item) ASC ;


 CREATE TABLE workers (

 FirstName TEXT,

 LastName TEXT,

 ssn INTEGER

 CONSTRAINT ssnConstraint PRIMARY KEY

 );

 CREATE INDEX worker_index

 ON worker (ssn) ;

More examples using Primary Key and Foreign keys

CREATE TABLE department (

departmentid NUMBER NOT NULL

CONSTRAINT pk_department PRIMARY KEY,

department_name VARCHAR(30) NOT NULL,

department_location VARCHAR(30)

CREATE TABLE employee (

EmployeeID VARCHAR(10) NOT NULL

CONSTRAINT pk_Employee PRIMARY KEY,

Last_Name VARCHAR(20) NOT NULL,

First_name VARCHAR(18) NOT NULL,


Soc_Sec VARCHAR(11) NOT NULL,

Date_of_Birth DATE,

Salary NUMBER,

departmentid NUMBER

) ;

ALTER TABLE employee

ADD CONSTRAINT fk_department

FOREIGN KEY (departmentid)

REFERENCES department (departmentid)

Specifying Constraints on Columns and Tables

 Constraints on attributes:
o NOT NULL - Attribute may not take a NULL value
o DEFAULT - Store a given default value wh no value is specified
o PRIMARY KEY - Indicate which attribute(s) form the primary key
o FOREIGN KEY - Indicate which attribute(s) form a foreign key.
This enforces referential integrity
o UNIQUE - Indicates which attribute(s) must have unique values.
 Specify when constraint should be enforced:
o Immediate
o Deferrable until commit time
 Referential Integrity Constraint: Specify the behavior for child tuples
when a parent tuple is modified.
 Action to take if referential integrity is violated:
o SET NULL - Child tuples foreign key is set to NULL - Orphans.
o SET DEFAULT - Set the value of the foreign key to some default
value.
o - Child tuples are updated (or deleted) according to the
CASCADE
action take on the parent tuple.
 Examples of ON DELETE and ON UPDATE
 CREATE TABLE order_items (

 order_number NUMBER(10,0) NOT NULL,

 line_item NUMBER(4,0) NOT NULL,

 part_number VARCHAR(12) NOT NULL,

 quantity NUMBER(4,0),

 PRIMARY KEY (order_number, line_item),

 FORIEGN KEY (order_number)

 REFERENCES order_header (order_number)

 ON DELETE SET DEFAULT

 ON UPDATE CASCADE,

 FOREIGN KEY (part_number)

 REFERENCES parts (part_number)

 );

 Constraints can also be given names so that they can later be modified
or dropped easily.
 CREATE TABLE order_header (

 order_number NUMBER(10,0) NOT NULL,

 order_date DATE,

 sales_person VARCHAR(25),

 bill_to VARCHAR(35),

 bill_to_address VARCHAR(45),

 bill_to_city VARCHAR(20),

 bill_to_state VARCHAR(2),

 bill_to_zip VARCHAR(10),
 CONSTRAINT pk_order_header

 PRIMARY KEY (order_number)

 );

 CREATE TABLE order_items (

 order_number NUMBER(10,0) NOT NULL,

 line_item NUMBER(4,0) NOT NULL,

 part_number VARCHAR(12) NOT NULL,

 quantity NUMBER(4,0),

 CONSTRAINT pk_order_items

 PRIMARY KEY (order_number, line_item),

 CONSTRAINT fk1_order_items

 FORIEGN KEY (order_number)

 REFERENCES order_header (order_number)

 ON DELETE SET DEFAULT

 ON UPDATE CASCADE,

 CONSTRAINT fk2_order_items

 FOREIGN KEY (part_number)

 REFERENCES parts (part_number)

 ON DELETE SET DEFAULT

 ON UPDATE CASCADE

 );
 An even better approach is to create the tables without constraints and
then add them separately with ALTER TABLE statements
 CREATE TABLE order_header (

 order_number NUMBER(10,0) NOT NULL,

 order_date DATE,

 sales_person VARCHAR(25),

 bill_to VARCHAR(35),

 bill_to_address VARCHAR(45),

 bill_to_city VARCHAR(20),

 bill_to_state VARCHAR(2),

 bill_to_zip VARCHAR(10)

 );

 ALTER TABLE order_header

 ADD CONSTRAINT pk_order_header

 PRIMARY KEY (order_number);

 CREATE TABLE order_items (

 order_number NUMBER(10,0) NOT NULL,

 line_item NUMBER(4,0) NOT NULL,

 part_number VARCHAR(12) NOT NULL,

 quantity NUMBER(4,0)

 );

 ALTER TABLE order_items ADD


 CONSTRAINT pk_order_items

 PRIMARY KEY (order_number, line_item) ;

 ALTER TABLE order_items ADD

 CONSTRAINT fk1_order_items

 FORIEGN KEY (order_number)

 REFERENCES order_header (order_number)

 ON DELETE SET DEFAULT

 ON UPDATE CASCADE;

 ALTER TABLE order_items ADD

 CONSTRAINT fk2_order_items

 FOREIGN KEY (part_number)

 REFERENCES parts (part_number)

 ON DELETE SET DEFAULT

 ON UPDATE CASCADE;

Removing Schema Components with DROP

 DROP SCHEMA schema_name CASCADE


Drop the entire schema including all tables. CASCADE option deletes all
data, all tables, indexes, domains, etc.
 DROP SCHEMA schema_name RESTRICT
Removes the schema only if it is empty.
 DROP TABLE table_name
Remove the table and all of its data.
 DROP TABLE table_name CASCADE
Remove the table and all related tables as specified by FOREIGN KEY
constraints.
 DROP TABLE table_name RESTRICT
Remove the table only if it is not referenced (via a FORIEGN KEY
constraint) by other tables.
 DROP INDEX index_name
Removes an index.
 DROP CONSTRAINT table_name.constraint_name
Removes a constraint from a table.

Changing Schema Components with ALTER

 Changing Attributes:
ALTER TABLE student ALTER last_name VARCHAR(35);
ALTER TABLE student ALTER gpa DROP DEFAULT
ALTER TABLE student ALTER gpa SET DEFAULT 0.00;
 Adding Attributes:
ALTER TABLE student ADD admission DATE;
 Removing Attributes (not widely implemented):
ALTER TABLE student DROP home_phone;

Data Manipulation Language

 DDL is used to create and specify the schema. DML is then used to
manipulate (select, insert, update, delete) data.

Inserting Data into Tables

 General syntax:
 INSERT INTO tablename (column1, column2, ... columnX)

 VALUES (val1, val2, ... valX);

 Examples:

 INSERT INTO stocks (symbol, close_date, close_price)

 VALUES ("IBM", "03-JUN-94", 104.25);


 INSERT INTO student_grades (student_id, test_name, score, grade)

 VALUES (101, "Quiz 1", 88, "B+");

 Quotes are placed around the data depending on the Data type and on
the specific RDBMS being used:

RDBMS Text Data Type Dates

MS Access TEXT: Either " or ' DATETIME: Either # or '

Oracle VARCHAR: ' DATE: '

IBM DB2 VARCHAR: ' DATE: '

Sybase CHAR and VARCHAR: " DATE: "

Examples for Employee and Department .

INSERT INTO department (departmentid, department_name,


department_location)

VALUES (1, "Finance", "NY");

INSERT INTO department (departmentid, department_name,


department_location)

VALUES (2, "Marketing", "NY");

INSERT INTO department (departmentid, department_name,


department_location)

VALUES (3, "IT", "NJ");

INSERT INTO department (departmentid, department_name,


department_location)
VALUES (4, "Accounting", "PA");

INSERT INTO department (departmentid, department_name,


department_location)

VALUES (5, "Shipping", "CA");

INSERT INTO employee (employeeid, first_name, last_name, soc_sec,


date_of_birth, salary, departmentid)

VALUES ("E101", "Joe", "Smith", "111-22-3344", '5/3/1972',


43000.00, 1)

INSERT INTO employee (employeeid, first_name, last_name, soc_sec,


date_of_birth, salary, departmentid)

VALUES ("E102", "Mary", "Green", "333-11-2222", '12/4/1968',


41500.00, 2)

INSERT INTO employee (employeeid, first_name, last_name, soc_sec,


date_of_birth, salary, departmentid)

VALUES ("E103", "Ed", "Jones", "777-66-5555", '1/9/1971', 39500.00,


3)

INSERT INTO employee (employeeid, first_name, last_name, soc_sec,


date_of_birth, salary, departmentid)

VALUES ("E104", "Sally", "Smith", "661-61-6161", '8/19/1973',


55000.00, 3)

INSERT INTO employee (employeeid, first_name, last_name, soc_sec,


date_of_birth, salary, departmentid)
VALUES ("E105", "Howard", "Brown", "777-88-9999", '8/12/1976',
45000.00, 1)

INSERT INTO employee (employeeid, first_name, last_name, soc_sec,


date_of_birth, salary, departmentid)

VALUES ("E106", "Jane", "Foster", "222-22-2222", '1/1/1972',


35000.00, 4)

Retrieving Data from Tables with Select

 Main way of getting data out of tables is with the SELECT statement.
 SELECT syntax:
 SELECT column1, column2, ... columnN

 FROM tableA, tableB, ... tableZ

 WHERE condition1, condition2, ...conditionM

 GROUP BY column1, ...

 HAVING condition

 ORDER BY column1, column2, ... columnN

Assume an employee table:


employee(employeeid, first_name, last_name, date_of_birth, salary)
and a "Stocks" table:
stocks(symbol, close_date, close_price)

 Some example queries:


 SELECT employeeid, last_name, first_name

 FROM employee

 WHERE last_name = 'Smith'

 ORDER BY first_name DESC


 SELECT employeeid, last_name, first_name

 FROM employee

 WHERE salary > 41000

 ORDER BY last_name, first_name DESC

 SELECT *

 FROM employee

 ORDER BY 2;

 SELECT last_name, first_name, salary

 FROM employee

 WHERE departmentid = 3

 ORDER BY salary DESC

 SELECT symbol, close_price

 FROM stocks

 WHERE close_date > "01-JAN-95" AND

 symbol = "IBM"

 ORDER BY close_date


 SELECT symbol, close_date, close_price

 FROM stocks

 WHERE close_date >= "01-JAN-95"

 ORDER BY symbol, close_date

SQL Built-in Functions

 SQL has two main types of functions:


1. Aggregate Functions: Operate on a collection of records. For
example, SUM, AVG, MIN, MAX and COUNT are all aggregate
functions.
2. Scalar functions: Operate on every record value individually. For
example math functions like SIN, COS, ROUND and text functions
like UCASE, MID, LEN and FORMAT.
 Examples of Aggregate functions follow:
 Average salary in the company:
 SELECT AVG(salary)

 FROM employee;

 Results:

 Expr1000

 ---------

 43166.667

SELECT AVG(salary) AS AverageSalary

 FROM employee;

 Results:

 AverageSalary
 -------------

 43166.667

Give the name of the employee with the highest salary in the company:
This is an example of a subquery

 SELECT first_name, last_name, salary

 FROM employee

 WHERE salary =

 ( SELECT MAX(salary) FROM employee

 );

Results:

first_name last_name salary

Sally Smith 55000

Show the employees with the highest salaries in each department:

SELECT first_name, last_name, departmentid, salary

FROM employee e1

WHERE salary =(SELECT max(salary)

FROM employee e2 WHERE e1.departmentid = e2.departmentid

ORDER BY salary DESC;

Results:
first_name last_name departmentid salary

Sally Smith 3 55000

Howard Brown 1 45000

Mary Green 2 41500

Jane Foster 4 35000

Note the two aliases given to the employee table: e1 and e2. These
allow us to refer to different views of the same table.

Show a COUNT of the number of employees in each department

SELECT departmentid, COUNT(employeeid) AS EmployeeCount

FROM employee

GROUP BY departmentid

Results:

departmentid EmployeeCount

3 2

1 2

2 1

4 1

 SELECT MID(first_name, 1, 1) AS first_init, last_name

 FROM employee

 Result:

 first_init last_name

 J Smith
 M Green

 E Jones

 S Smith

 H Brown

 J Foster

 Show the First initial and last name of each employee but make them all
upper case
 SELECT UCASE(MID(first_name, 1, 1)) AS first_init,
UCASE(last_name) AS last_name

 FROM employee

 Result:

 first_init last_name

 J SMITH

 M GREEN

 E JONES

 S SMITH

 H BROWN

 J FOSTER

 Show the current salary and a proposed new salary for each employee:
 SELECT first_name, salary AS CurrentSalary,

 (salary * 1.04) AS SalaryWithRaise

 FROM employee


 Result:

 first_name CurrentSalary SalaryWithRaise

 Joe 43000 44720

 Mary 41500 43160

 Ed 39500 41080

 Sally 55000 57200

 Howard 45000 46800

 Jane 35000 36400

 Show the Date of Birth and the age of each employee in Department 3.
This uses the Now() function which returns the current date and time.
 SELECT first_name, date_of_birth,

 (Now() - date_of_birth) AS Age

 FROM employee

 WHERE departmentid = 3

 Result:

 first_name date_of_birth Age

 Ed 1/9/1971 14901.5790625

 Sally 8/19/1973 13948.5790625

 Note that the "Age" is given in number of days. A better output would
be to give this in years. So divide answer by 365.
 SELECT first_name, date_of_birth,

 ((Now() - date_of_birth) / 365) AS Age

 FROM employee
 WHERE departmentid = 3

 Result:

 first_name date_of_birth Age

 Ed 1/9/1971 40.8262484462202

 Sally 8/19/1973 38.2152895421106

 Perhaps we do not want so many decimal places. In this case use


FORMAT function (in MS ACCESS)
 SELECT first_name, date_of_birth,

 FORMAT(((Now() - date_of_birth) / 365), "0.0") AS Age

 FROM employee

 WHERE departmentid = 3

 Result:

 first_name date_of_birth Age

 Ed 1/9/1971 40.8

 Sally 8/19/1973 38.2

 Show all employees who have a birthday in August. this uses the
MONTH function. Given a date, MONTH(date) returns the month as a
number. Similar functions include DAY and YEAR.
 SELECT first_name, last_name

 FROM employee

 WHERE MONTH(date_of_birth) = 8;

 Result:

 first_name last_name
 Sally Smith

 Howard Brown

Selecting from 2 or More Tables

 In the FROM portion, list all tables separated by commas. Called a Join.
 The WHERE part becomes the Join Condition
 List all of the employees working in New York:
 SELECT employee.first_name, employee.last_name

 FROM employee, department

 WHERE employee.departmentid = department.departmentid

 AND department.department_location = 'NY';

 Results:

 first_name last_name

 Joe Smith

 Howard Brown

 Mary Green

The Where clause: WHERE employee.departmentid =


department.departmentid is the Join Condition

 List each employee name and what state (location) they work in. List
them in order of location and name:
 SELECT employee.last_name, department.department_location

 FROM employee, department

 WHERE employee.departmentid = department.departmentid

 ORDER BY department.department_location, employee.last_name;

 Results:
 last_name department_location

 Jones NJ

 Smith NJ

 Brown NY

 Green NY

 Smith NY

 Foster PA

This is similar to a LEFT JOIN.

 List each department and all employees that work there. Show the
department and location even if no employees work there.
 SELECT department.departmentid,
department.department_location,

 employee.last_name

 FROM employee RIGHT JOIN department

 ON employee.departmentid = department.departmentid

 Results:

 departmentid department_location last_name

 1 NY Smith

 1 NY Brown

 2 NY Green

 3 NJ Jones

 3 NJ Smith

 4 PA Foster
 5 CA NULL

 What is the highest paid salary in New York ?


 SELECT MAX(employee.salary)

 FROM employee, department

 WHERE employee.departmentid = department.departmentid

 AND department.department_location = 'NY';

 Results:

 MAX(SALARY)

 ------------

 45000

 Cartesian Product of the two tables:


 SELECT *

 FROM employee, department;

 In which states do our employees work ?


 SELECT DISTINCT department_location

 FROM department;

 From our Bank Accounts example.


List the Customer name and their total account holdings:
 SELECT customers.LastName, Sum(Balance)

 FROM customers, accounts

 WHERE customers.CustomerID = accounts.customerid

 GROUP BY customers.LastName

 Results:

 LASTNAME SUM(BALANCE)
 --------- ------------

 Axe $15,000.00

 Builder $1,300.00

 Jones $1,000.00

 Smith $6,000.00

 We can also use a Column Alias to change the title of the columns
 SELECT customers.LastName, Sum(Balance) AS TotalBalance

 FROM customers, accounts

 WHERE customers.CustomerID = accounts.customerid

 GROUP BY customers.LastName

 Results:

 LASTNAME TotalBalance

 --------- ------------

 Axe $15,000.00

 Builder $1,300.00

 Jones $1,000.00

 Smith $6,000.00

 Here is a combination of a function and a column alias:


 SELECT first_name, last_name, departmentid,

 salary AS CurrentSalary,

 (salary * 1.03) AS ProposedRaise

 FROM employee;

 Results:

 first_name last_name departmentid CurrentSalary


ProposedRaise

 Joe Smith 1 43000 44290

 Mary Green 2 41500 42745

 Ed Jones 3 39500 40685

 Sally Smith 3 55000 56650

 Howard Brown 1 45000 46350

 Jane Foster 4 35000 36050

 Provide a listing of each student and the name of their tutor:


 SELECT s1.name AS Student, tutors.name AS Tutor

 FROM students s1, students tutors

 WHERE s1.student_tutorid = tutors.studentid;

WHERE Clause Expressions

 There are a number of expressions one can use in a WHERE clause.


 Typical Logic expressions:
COLUMN = value
Also:

 < > = != <= >=

Also consider BETWEEN

SELECT name, grade, "You Got an A"

FROM students
WHERE grade between 91 and 100

 Using BETWEEN with dates: Show the employees who were born in the
first 5 months of 1972.
 For MS Access:

 SELECT first_name, last_name, date_of_birth

 FROM employee

 WHERE date_of_birth BETWEEN #1/1/1972# AND #6/30/1972#

 Result:

 first_name last_name date_of_birth

 Joe Smith 5/3/1972

Jane Foster 1/1/1972

Note for Oracle or DB2, enclose the dates in single quotes like so:

SELECT first_name, last_name, date_of_birth

FROM employee

WHERE date_of_birth BETWEEN '01-JAN-1972' AND '30-JUN-1972'

 Subqueries using = (equals): Find the student with the highest grade.
 SELECT name, grade

 FROM students

 WHERE grade =

 ( SELECT MAX(grade) FROM students

 );

 Find th eemployee with the lowest salary:


 SELECT first_name, last_name, salary
 FROM employee

 WHERE salary = (SELECT MIN(salary) FROM employee)

 Result:

 first_name last_name salary

 Jane Foster 35000

This assumes the subquery returns only one tuple as a result.


Typically used for aggregate functions.

 Subqueries using IN:


 SELECT last_name

 FROM employee

 WHERE departmentid IN (1, 3);

 SELECT last_name, first_name

 FROM employee

 WHERE departmentid IN

 (SELECT departmentid

 FROM department

 WHERE department.department_location = 'NY');

In the above case, the subquery returns a set of tuples. The IN clause
returns true when a tuple matches a member of the set.

 Subqueries using EXISTS:


 SELECT first_name, last_name, salary

 FROM employee
 WHERE EXISTS

 (SELECT last_name

 FROM EMPLOYEE e2

 WHERE e2.salary > employee.salary)

 AND EXISTS

 (SELECT last_name

 FROM EMPLOYEE e3

 WHERE e3.salary < employee.salary)

 Results:

 first_name last_name salary

 Joe Smith 43000

 Mary Green 41500

 Ed Jones 39500

 Howard Brown 45000

The above query shows all employees names and salaries where there
is at least one person who makes more money (the first exists) and at
least one person who makes less money (second exists).

 NOT EXISTS:
 SELECT first_name, salary

 FROM employee

 WHERE NOT EXISTS

 (SELECT last_name

 FROM EMPLOYEE e2

 WHERE e2.salary > employee.salary)


 Results:

 first_name salary

 Sally 55000

Above query shows all employees for whom there does not exist an
employee who is paid less.

 INLINE SQL Queries


The result of a query is a table. Normally the results are shown to the
user. However we can use the reuslts of a query as the basis for another
query. That is, the subquery becomes a table we can select FROM. For
exmaple:
 SELECT first_name, salary

 FROM (SELECT employee.first_name, employee.salary,


department.department_location

 FROM employee, department

 WHERE employee.departmentid = department.departmentid

 AND department.department_location = 'NY'

 )

 WHERE salary > 35000

Recall this example:

SELECT s1.name AS TutorName,

COUNT(tutors.student_tutorid) AS NumberTutored

FROM students s1, students tutors

WHERE s1.studentid = tutors.student_tutorid

GROUP BY s1.name;
We might be interested in the maximum number of people one person
is tutoring:

SELECT TutorName, NumberTutored

FROM (SELECT s1.name AS TutorName,

COUNT(tutors.student_tutorid) AS NumberTutored

FROM students s1, students tutors

WHERE s1.studentid = tutors.student_tutorid

GROUP BY s1.name

WHERE NumberTutored =

(SELECT MAX(NumberTutored)

FROM (SELECT s1.name AS TutorName,

COUNT(tutors.student_tutorid) AS NumberTutored

FROM students s1, students tutors

WHERE s1.studentid = tutors.student_tutorid

GROUP BY s1.name

 LIKE operator:
Use the LIKE operator to perform a partial string match. Generally, the
% character is used as the wild card although in some DBMS, the *
character is used.

Show all employees whose name starts with 'S'

SELECT first_name, last_name, salary


FROM employee

WHERE last_name LIKE 'S%';

Show all employees whose name contains the letters 'en'

SELECT first_name, salary

FROM employee

WHERE first_name LIKE '%ar%';

Note that chatacters within quotes are case sensitive.

Show all employees whose name contains the letter 'e' and the letter 'n'
in that order:

SELECT first_name, salary

FROM employee

WHERE first_name LIKE '%a%r%';

Show all employees whose name contains the letter 'e' and the letter 'n'
in any order:

SELECT first_name, salary

FROM employee

WHERE first_name LIKE '%a%r%' OR

first_name LIKE '%r%a%';

HAVING Clause Expressions


HAVING is like WHERE except that it works on aggregate functions.

For example, assume we want to get a total of salaries paid in each


department but only want to show those with total salary greater than
$50,000

SELECT department.department_name, SUM(salary) AS TotalDeptSalary

FROM employee, department


WHERE employee.departmentid = department.departmentid

GROUP BY department.department_name

HAVING SUM(salary) > 50000

Result:

department_name TotalDeptSalary

Finance 88000

IT 94500

Or we can put it another way. Show total of salaries paid by each department
but only if the total is greater than the average paid by each department

SELECT department.department_name, SUM(salary) AS TotalDeptSalary

FROM employee, department

WHERE employee.departmentid = department.departmentid

GROUP BY department.department_name

HAVING SUM(salary) > AVG(salary)

Suppose we are interested in the name of the department with the largest
total salary. In this case we need to find the MAX of the SUM

SELECT department.department_name, SUM(salary) AS TotalDeptSalary

FROM employee, department

WHERE employee.departmentid = department.departmentid

GROUP BY department.department_name

HAVING SUM(salary) =

(SELECT MAX(TotalDeptSalary)

FROM

(SELECT department.department_name,

SUM(salary) AS TotalDeptSalary

FROM employee, department


WHERE employee.departmentid = department.departmentid

GROUP BY department.department_name

Deleting Tuples with DELETE

 DELETE is used to remove tuples from a table.


 With no WHERE clause, DELETE will remove all tuples from a table.
 Remove all employees:
 DELETE employee;

 Remove only employees making more than $50,000


 DELETE employee

 WHERE salary > 50000;

 Remove all employees working in New York:


 DELETE employee

 WHERE departmentid IN

 (SELECT departmentid

 FROM department

 WHERE department_location = 'NY');

 DELETE will not be successful if a constraint would be violated.


For example, consider the department attribute in the Employee table as
a Foreign Key.
Removing a department would then be contingent upon no employees
working in that department.
This is what we call enforcing Referential Integrity
Change Values using UPDATE

 The UPDATE command is used to change attribute values in the


database.
 UPDATE uses the SET clause to overwrite the value.
 Change the last name of an Employee:
 UPDATE employee

 SET last_name = 'Smith'

 WHERE employeeid = 'E101';

 Give an Employee a raise:


 UPDATE employee

 SET salary = salary * 1.05

 WHERE employeeid = 'E101';

You might also like