Database Theory
Database Theory
Database Theory
In a flat-file database all the data is held in a single table and there are
several problems associated with this. Consider the following scenario:
Duplicated data – highlights show typing errors made during data entry
The problems that can then occur:
Data inconsistency: Note the data entry errors highlighted in the above
table. Such data inconsistency occurs when large amounts of duplicate
data have to be entered into a database. By duplicating data across
records in flat files we introduce a risk of data inconsistency.
SUMMARY: The problems with flat-file database systems are that they can
lead to:
Data redundancy – the needless duplication of data
Data inconsistency – errors in the values or format of data that
should be identical
An entity is a “real world thing” about which data can is held in a database.
Examples of entities include:
Customers
Products
Pupils
Suppliers
Loans
Videos/DVD’s
Flights
Employees
Treatments
Contracts
Library books
Cars
Orders
Zoo animals
Rentals
In general, entities can have attributes but attributes have no smaller parts.
An attribute belongs to a single entity.
What is a key field and how are they used to link database tables?
The key field (sometimes called a primary key) is used to store an attribute
that makes that particular entity entry in the database unique. For example,
a NHS number, a passport number, a vehicle registration, a booking
reference, a flight number etc.
The tables in a relational database are linked by the key field in one
table being added as a foreign key in another table to form a
relationship between the entities.
A diagram
of a simplified relational database showing the tables linked by the key fields
The book table would only have details about the book itself while details of
the author would be stored in a separate table. A further table would hold
details of the library members who could borrow books.
Another further table would also be needed to store the details of the
actual loans, such as the lending data and the return date.
This loan table would not need to store details of the book, the author or
the borrower, just the links to such data in the other tables.
The simplified diagram shows how the key fields in each table are used to
link each entity.
Student ID
Forename
Surname
Date of Birth
How is the key field used to connect tables and avoid data
redundancy?
To link tables, the key field in one table must appear as a foreign key field
in the other table. This is best explained with an example, in this case a
simplified relational database for a doctor’s surgery.
Two tables in a
relational database showing the link between them
There are two entities, ‘Patient’ and ‘Test Results’ so two tables are
created.
‘Forename’, ‘Surname’ etc. are examples of attributes in the
‘Patient’ entity.
‘Test Performed’ and ‘Test Result’ are examples of attributes in the
‘Test Result’ entity.
The example below shows how the two tables might look with some real
data.
Data stored in two tables in a relational database, showing the linked field
As can be seen, the Patient ID key field makes a link between the two
tables and, apart from the foreign key field, there is no duplication of data
so no data redundancy. If more details of the patient who was tested are
needed they can easily be accessed because of the relationship between
the two tables of data.
Problems with data inconsistency and data integrity are also avoided
because there is only one set of data.
One-to-one – e.g. Husband and wife. A husband can have one wife
and a wife can have one husband
One-to-many – e.g. Political party and politician. A political party has
many politicians, but a politician belongs to only one political party
Many-to-many – e.g. Actor and film. An actor stars in many films and
a film can have many actors
Note that database tables are often designed so that when a new record is
created, the key field entry is generated automatically as an incremental
number (e.g. in Microsoft Access you can assign a data type of
“AutoNumber” to a field). Sometimes however there is a more natural
choice of unique data which can be used instead, e.g. “Car Registration
Number”, “National Insurance Number” or “Product Serial Number”.
In order to create a relationship between two tables the primary key of one
of the tables must be added as a field to the other table where it will is
becomes the foreign key.
In the case of the library database, the shared field would be the ‘Student
ID’ field, i.e. the primary key of the Students table. It is selected because,
as a general rule, in one-to-many relationships between entities,
the primary key of the entity that is on the one end of the relationship is
used as the foreign key field of the entity that is on the many end of the
relationship (see the E-R diagram below).
After the Student ID field is added to the Library Loans table, the tables
would now look like the one below.
Database relationships
showing primary and foreign keys
By appearing in both tables, the ‘Student ID’ field has created a relationship
between the two tables. It acts as the primary key field in the Students
table and as a foreign key field in the Library Loans table.
Unlike the ‘Loan ID’ field, the ‘Student ID’ field can occur in more than one
record of the Library Loans table because a student may have borrowed
more than one resource from the library.
The diagram below shows example data in the Students and Library Loans
tables. Notice how the values of the primary key in the Students table
match the value of the foreign key in the Library Loans table. This creates a
link between the two tables and permits questions such as “What loans has
student 6783 borrowed?” to be answered.
A form is used for data INPUT and for displaying and editing selected
data, rather than having to use a database table directly. When edited, the
changes update the database table.
Data can be laid out in a meaningful way with clear labels rather
than just columns of data and field names.
Font colour, type and size can be customised to improve clarity.
Graphics such as boxes and lines can be used to group and separate
information.
Images can be included on forms, both as fields and as graphics such
as logos.
Forms can be tailored to accept only certain information or only
information in the correct format.
Drop-down list boxes can speed up data input as well as reducing
errors.
Forms can include control buttons to navigate to other records or
forms, close forms, open help screens or print out information.
A typical database form
In the GCSE Computing exam, you might well be asked to design a layout
for a form. Make sure you take note of the following:
always include a title that clearly describes the purpose of the form;
arrange the fields in a logical order, grouping similar fields if
necessary;
make the field labels as clear as possible (they do not have to
exactly match the field names);
position each field label carefully, either above or to the left of the
field;
make sure the field label and the field itself are easily
distinguished(showing the field as a box with the field name inside is
the best option);
if the form uses data from multiple tables, make sure you include all
the fields relevant to the form;
optional – add symbols to show drop down lists and navigation
buttons (but label these clearly if you include them);
if an image field is required, think about the orientation (portrait or
landscape) and don’t forget to add a label;
make sure you fill the box you are given to create your design in and
spread things out.
A typical examination question and design sketch answeral
What is a query?
There are many reasons why users may wish to query data, including:
The above list shows that queries are a means of producing information
from data. This information is used by the decision makers in
organisations to plan strategies and tactics. Databases usually allow users
to create and reuse their own queries.
A query design specifies which records the user is searching for and what
fields to display out of those records. There are two types of query:
A
simple report from a database
A report is used for data OUTPUT. It can either be displayed on a screen or
more typically be printed.
Reports can include charts and formatting to make their information easy
to assimilate. As with forms, reports can be customised to include the
logos, colours and font styles of an organisation.
When a database is designed, all the fields are set to accept a particular
data type. This helps check for the wrong type of data being entered
(validation). For example, a database will not allow you to store text in a
field of data type: ‘integer’.
Data types also help to make sure that the data is stored as efficiently as
possible. For example a field of data type ‘integer’ requires less storage
space to hold the value 2 compared a field of data type “real”.
Data types also help ensure that data can be sorted correctly. For example,
if the integer values 1, 2, 3 and 20 are stored in a number field then they
can easily be sorted correctly. However, if the same values were stored in
a text/alphanumeric field as text then they would sort in the alphabetical
order: 1, 2, 20, 3. In the same way, date fields allow data to be sorted by
year, then month and then day.
Text/String/Alphanumeric
A field of data type text will accept both numbers and text.
When to use: Use text data types if you want to enter text or a
mixture of text and numbers into a field.
When NOT to use: if you want to store only numbers or only dates.
You can enter numbers or dates into text fields but they would get
sorted as if they were words and could not easily be used in number
or date calculations. They would also be more difficult to validate
using range checks.
Note, telephone ‘numbers’ are always stored as text, this allows a
leading zero, spaces, brackets etc.
Number
Real: This is for numbers which can have decimal places. For
example:
132.01
45.7
9.18652493
Real numbers can be formatted as currency (e.g. £5.67) and/or to a
fixed number of decimal places (e.g. entering 3.1427 into a real
number field formatted to two decimal places would mean it appeared
as 3.14)
When to use the real data type:
use the real data type if the field is going to be used to store
numbers where precision is important. For example:
Height of people (e.g. 1.56m)
Length of a room (e.g. 3.7m)
Price of goods (e.g. £4.25) (Note: this data would be
formatted as currency but would still be a real number)
Integer: This for whole numbers with no decimal places. For example:
44
-763
9321389
When to use the integer data type:
Integers are used when accuracy is not of vital importance or
when the value allocated is specifically a whole number (e.g. a
count of something).
For example:
Number of car doors
Large amounts of money (e.g. Average house price
£155000)
Number of children
Logical/Boolean
A field of data type Boolean is one which can contain only one of two
values: true and false. These two values can be used to represent
anything containing two states.
When to use a Boolean data type – when you need to store a data
item that has only one of two possible values. It is often phrased as a
question:
Has the pet been vaccinated?
Does the property have a garden?
Are you female?
Date/Time
A field of this type stores days, months and years so that records can
be sorted correctly (by year, then month and finally day) and be easily
validated using range checks. Date fields can display the date information
in different formats such as the full name of the day/month (28th August
1961) or the numerical versions (28/08/61).
When to use a date/time field data type – when you need to store
dates or times and have the data easily validated and sorted.
Inventory Management
Payroll
Human Resources (i.e. staff management)
Marketing
Sales
Purchasing Order Management
The following example uses a complex query to find all of the pupils in
Form 10B who were born before 1995. This query uses the logical
operator AND and is written as:
(Form = “10B”) AND (Date of Birth < 01/01/1995)
= (equals)
< (less than)
≤ (less than or equal to)
> (greater than)
≥ (greater than or equal to)
<> (not equal to)
A query design in MS Access is shown below. Note that there are two
entries in the search criteria row and that the query has been given a
meaningful name (“Select Query – Pupils in 10B born before 1995”).
This time, in the query definition there will be two criteria rows. The query
and its results are shown below:
A complex
query using OR
What are wildcards in queries?
sensible
reasonable
within acceptable boundaries
complete
Type Checks
Field data types provide a basic method of validation. Field data types are
assigned to fields during the creation of the database table and data types
such as Numeric, Boolean, Date/Time and Image restrict what can be
entered.
Range checks
These are used with numerical data to limit the range of numbers a user
can enter.
Check digits
This type of check is used with numbers. An extra ‘check digit’ is calculated
from the numbers to be entered and added to the end. The numbers can
then be checked at any stage by re-calculating the check digit from the
other numbers and seeing if it matches the one entered. One example
where a check digit is used is in the 10 digit ISBN number which uniquely
identifies books. The last number of the ISBN is actually the check digit for
the other numbers, for example – the ISBN 0192761501.
i.e. the number 019276150 would have a check digit of 1 as shown below.
Check
digit
Position 10 9 8 7 6 5 4 3 2 1
(Number) x 17 remainder
0 9 72 14 42 30 4 15 0 1 187
(position) 0
The weighted checksum without the check digit is 186. 186 divided
by 11 = 16 remainder 10. 11 minus the remainder therefore gives a
check digit of 1
When the check digit is included in the calculation in position 1 then
the checksum is 187. 187 divided by 11 = 17 remainder 0 so the
check digit matches the rest of the numbers which means they were
probably entered correctly.
Presence checks
A presence check
These simply check that an entry has been made in a particular field
before the user can continue i.e. a null value (empty field) is not permitted.
Usually, not every field in a record needs to be filled in, however there are
likely to be some that must have a value and the presence check means
that the system will not allow the record to be saved until an entry is made.
Length Checks
All alphanumeric data has a length. A single character has a length of 1
and a string of text such as “Hello World” has a length of 11 (spaces are
counted in text strings). A length check ensures that such data is either an
exact length or does not exceed a specified number of characters.
Lookup
A lookup check takes the value entered and compares it against a list of
values in a separate table. It can then return confirmation of the value
entered or a second list based on the value.
Dropdown list
One use of lookups restricts users to pre-defined input using drop-down
lists. A user is forced to use a list box to select from a predetermined list of
valid values. For example, they may have to select Male or Female or an
exam grade that should be in the range ‘A’…’G’ or ‘U’. , etc.
A database system that manages airline seat bookings could use this
check to limit the entries available in a list box selection. Rather than
checking for double-bookings after data has been entered, travel agents
could be presented with a list box that only shows the seats that are still
available. This would eliminate double-bookings and is an example of more
complicated, ‘real-time’ lookup validation.
Validation using referential integrity in a
database
Referential integrity is a special form of lookup. It is used in relational
databases to enforce valid inputs and involves checking entries in certain
fields against values in other tables.
Input Masks
TY856J
FS871K
AP120N
The first two characters are letter, the next three characters are numbers
and the last character is a letter.
Pasword verification
SUMMARY