Database & File Concepts Key Terms
Database & File Concepts Key Terms
Key Terms:
Each entity has categories of information. These categories are known as attributes or fields. In
the example above Last Name, First Name, Job Title, titles and Employee ID are all field/attribute
names.
One of the fields will be known as the primary key. The primary key contains a unique value for
each record. This means that each value can only exist once in that table so it is possible to
identify each record using the primary key. In the example above, Sales Rep ID is the primary key.
DBMS
The software that is used to manage the database is called a database management system
(DBMS). Sometimes, this is referred to as a relational database management system (RDBMS) as
it is managing a database that includes relationships.
Data Types
Each field in a table will have a data type assigned to it. Data types include:
• Text
• Alphanumeric
• Numeric (integer/decimal)
• Date/time
• Boolean
Fields within a table will have field sizes applied to them. This is because most fields are a fixed
length. This means that only a specified amount of data can be stored in each field.
Text and alphanumeric fields will have a length to specify the maximum number of characters
that can be stored. For example, the Product Name in the Product table is limited to 40
characters. This avoids having lots of wasted storage space where field space is not used up if the
length is too long.
Numbers can also have a field size. This could be defined as the number of digits or it could be
defined as the maximum numeric value.
Dates will always be the same field size as they will always store the date is the same way, but
they can be formatted to be displayed differently.
Some text fields can be formatted to be a variable length which means they can store as little or
as much data as possible. These are sometimes referred to as memo or long text data types.
These are useful for fields that will contain notes or comments.
Relationships
Relationships within a database can be used to connect entities together. A foreign key is an
attribute (field) in one entity that connects to a primary key in another entity. This allows related
data to be looked up and found.
In the Order table, the foreign key is Customer ID which connects to the primary key Customer
ID in the customer table. For Order Number 4, the Customer ID 3 is looked up in the Customer
table to find Damion Matkin.
The Sales Rep table stores details of the sales representatives within a business. This only
contains basic information about their name but their full employee details are stored in a
separate table called Employee. Each sales representative only has one employee record and
each employee record can only refer to one sales rep record.
One-to-Many
A one-to-many relationship is when each record in one table can connect to many (zero or more)
records in another table. A foreign key will exist within the table on the many side of the
relationship and will connect to a primary key in the one side of the relationship. This is the most
common type of relationship within relational databases.
Figure: One to Many relationship Between Product table & Category table
Many-to-Many
Many-to-many relationships are only conceptual. They are not used in relational databases
because they are converted into two sets of one-to-many relationships. In a many-to-many
relationship, each record in one table can connect to many records in another table but each
record in the other table can also connect to many records in the original table.
Create & Use Relationships
One-to-Many
When creating a one-to-many relationship, there are some rules to follow:
• the table on the one side must have a primary key.
• the table on the many side will have a foreign key.
• the data type and field size of the foreign key must match the primary key on the one side.
• only data items that exist in the primary key on the one side can be used in the foreign key.
One-to-One
When creating a one-to-one relationship, there are also some rules to follow:
• at least one of the tables (table A) must have a primary key
• the other table (table B) must either have a primary key that is also a foreign key and will link
to the primary key in table A or a foreign key field with a unique index that will link to the primary
key in table A.
• the data type and field size of the foreign key in table B and primary key in table A must match.
• only data items that exist in the primary key in table A can be used in the foreign key in another
table.
An entity relationship diagram (ERD) shows the relationships (connections) between each entity.
Each entity is represented by a rectangle. Each relationship is represented by a line.
Figure shows a one-to-one relationship between a Sales Rep and an Employee. Each sales rep is
related to one employee and each employee can only be one sales rep.
Figure shows a one-to-many relationship between Category and Product. Each category can have
many products, but each product has only one category.
Figure shows a many-to-many relationship between Order and Product. Each order can be for
many products and each product can exist on many orders. This is a conceptual diagram only.
Other RDBMSs may use two symbols at each end of the relationship. For example, 0:1or 0| could
be used to depict that there can be between zero and one related record on that side of the
relationship, whereas 1:1or || could be used to depict that there must be exactly one related
record on that side of the relationship.
Compound key
A compound key is two or more fields combined to form a unique identity.
Foreign key
A foreign key is a field in a table that refers to the primary key in another table. It is used to create
the relationship between the two tables. The foreign key must always have the same data type
and field size as the primary key it is linking to.
Referential Integrity
Referential integrity exists when data in the foreign key of the table on the many side of a
relationship exists in the primary key of the table on the one side of a relationship.
In the Order table above, Customer ID 5 does not exist in the Customer table. This means that
the Order table does not contain referential integrity because the related customer does not
exist.
Validation Rules
The data that has been validated is not necessarily correct. It is therefore necessary to verify data.
When inputting data into a database, you should verify that the data input has been input
correctly by comparing what has been entered with the original source.
Simple Queries
A simple query is a query that only has one criterion (singular for criteria). A criterion is the rule
that will be applied to the data.
Complex Queries
Complex queries are queries that have more than one criterion. They use either the AND or OR
Boolean operators. If all specified criteria need to be satisfied, then the AND operator is used. If
any one of the specified criteria needs to be satisfied, then the OR operator is used.
Nested Queries
So far, each query that you have seen has been based upon existing tables within a database. It
is also possible to create a query on an existing query. This means that the results of the original
query will be narrowed down further by the new query. This is called a nested query.
Summary Queries
A cross-tab query is a type of summary query used to summarize data. It can be used to show the
total number of records, the sum of values within a field, averages or other statistical data.
Complex queries should be used when more than one criterion is required, such as Gender=Male
AND Country = Israel, or Gender= Male OR Country=Israel.
Static Parameters
So far, all the queries you have used include static parameters. The parameters are the values
used within the criteria. Each time the query is run, the values of the parameters will not change.
Dynamic Parameters
Dynamic parameters allow the values within the criteria to be chosen at the time of running the
query. This can be useful when the user needs to decide what those parameters might be.
Analyzing and evaluating when static and dynamic parameters should be used in queries. Static
parameter values should be used when those parameter values will not change, no matter how
many times the query is used. For example, if you wanted to search a table on a regular basis for
all customers based in China, then you would use a static parameter query with the criterion of
Country = “China". Dynamic parameters should be used when the user is likely to want to change
the value each time the query is run. For example, if you wanted a query that would enable you
to search for a specific product code, but that product code could be different each time, then
you would use a dynamic query with the criterion of Product Code=[Please enter product code],
Export Data
Table
Data in tables can be exported as a delimited file in CSV format, as fixed length fields in text
format or as a table within a RTF file. It is sensible to save the field names when saving to CSV or
text files.
Query
Data from queries can be exported in the same way as tables, but only the data that meets the
criteria of the query will be exported.
Report
Reports include formatting, so the best method of export is to a RTF file which will include all the
original formatting. Data that is exported from a report to a text file will lose its formatting.
Designing a Relational Database
Normalisation is a set of formal rules that must be considered once we have a set of table
designs. By following the normalisation rules we ensure that the final table designs do not result
in duplicated data. If the initial designs were well thought through then the normalisation process
will not result in any changes to the table designs.
Table: ORDER
Table: ORDER-PRODUCTS
In our ORDER-PRODUCTS table, Description depends only on ProdID and not on Num. Hence the
non-key attribute (Description) is not dependent on all of the primary key. We say that
Description is dependent on ProdID or, turned around: ProdID determines Description or ProdID
→ Description.
We remove the partial dependency by:
● moving the Description attribute to a new table.
● linking the new table to the ORDER-PRODUCTS table with a foreign key.
Third normal form (like second normal form) is concerned with the non-key attributes. To be in
3NF, there must be no dependencies between any of the non-key attributes. A table with no or
one non-key attribute must be in 3NF, so PRODUCT and ORDER-PRODUCTS are in 3NF.
There is a problem with the original ORDER table. City determines the Country, so we have two
nonkey attributes which are dependent. This means that ORDER is not in 3NF. Below tables and
show the data in third normal form.
File Types
When data is saved it is stored in a file. Different software applications use data in different ways
and so the way the data is stored differs between application types. For example, a database
stores data in tables, whereas graphics software stores data about pixels. Each file will typically
include a header, which will be metadata (data about the file), then the main content will be
stored followed by an end-of-file marker. To a user, file types are usually identified by their
extension. For example, Students.txt has an extension of txt which identifies it as a text file.
Proprietary file formats are file types that have been developed by software manufacturers solely
for use within their software. Using their own formats means that software manufacturers are
free to develop software features that will store data in a way that is most suitable for the
software and without waiting for a standard format to adapt to the software’s needs. This
enables software to improve and provide new features that otherwise would not be available.
Open-Source File Formats
Open-source file formats are file types that have been developed for the purpose of being used
by any proprietary software or open-source software. They are free from copyright, patents and
trademarks, and their structure is known publicly. They are usually maintained by an
international standards organization or a public interest group. Their main advantage is that the
files can be shared between users of different software. However, they can hold back
development of open-source software because new features will require the file format standard
to be updated.
However, when reading the data, it was still necessary to read the whole file serially from the
beginning because there was no way of knowing where each record was stored. Indexed
sequential files are stored in exactly the same way as sequential files but the file also has an index
based on the field used to sort the file. A field with an index is known as a secondary key. The
index file stores each secondary key value and the address in storage (e.g. tape or disk) where
the first record containing that value is stored.
The index is small enough to store in main memory and so all that needs to be done to find a
record is to search the index, find the location in storage and then read the records from that
point until the record is found.
With direct file access, records are stored in a random order. There is no sequence. When storing
a file, a hashing algorithm (calculation) is performed on the key field to determine the storage
address where the record should be stored. Then when the record is searched for, the same
hashing algorithm will be performed on the key field to determine where the record can be
found. The computer system can then directly access that record without having to read through
other records.
The additional software is likely to collate data from more than one database and interconnect
the data from those databases to produce reports that analyze all the data together. When
additional software is used to collate data from more than one database, it is often referred to
as an executive information system (EIS).
Example
Managers within a large second-hand car dealership Need to be able to monitor sales. They need
to be able to identify trends in sales for different makes and models of cars at different times of
the year. This will enable them to identify which cars are selling the most and which are making
the most profit. They can then decide which second-hand cars they want to acquire to sell.
Marketing managers can analyze how effective a marketing campaign was by comparing sates
figures during an advertising campaign with sales figures outside the advertising campaign. This
will help them to decide whether to run similar campaigns in the future.