0% found this document useful (0 votes)
120 views23 pages

Database & File Concepts Key Terms

Uploaded by

kai Leo-Andoh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
120 views23 pages

Database & File Concepts Key Terms

Uploaded by

kai Leo-Andoh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

IT Chapter 10th Database & File Concepts Fahim Siddiq 03336581412

Key Terms:

Database: It is a structured method of storing data.


Table: A set of similar data (about people, places, objects or events).
Record: A common word for entity.
Entity: A set of data about one thing (person, place, object or event).
Attribute: A category of information within an entity.
Field: A common word for attribute.
Primary key: A field that contains the unique identifier for a record.
Database management system: software used to manage a database.
Relationship: The way in which two entities in two different tables are connected.
Foreign key: A field in a table that refers to the primary key in another table.
Normal form: The extent to which a database has been normalized.
Index: A list of keys or keywords which identify a unique record and can be used to search and
sort records more quickly.
Entity relationship diagram: A diagram that represents the relationships between entities.
Flat file: A database stored in a single table.
Compound key: Two or more fields that form the primary key.
Referential Integrity: data in the foreign key of the table on the many side of a relationship
must exist in the primary key of the table on the one side of a relationship.
Query: A question used to retrieve data from a database.
Parameter: Data used within the criteria for a query.
What is a database?
A database is a structured method for storing data in sets of tables. Each table contains similar
data about people, places, objects or events.
The above table shows the sales representatives, within a table are rows known as records. Each
record is an entity which is a set of data about one single thing (person, place, object or event).
In the example above, the data about Davolio Nancy is a single record/entity.

Each entity has categories of information. These categories are known as attributes or fields. In
the example above Last Name, First Name, Job Title, titles and Employee ID are all field/attribute
names.

One of the fields will be known as the primary key. The primary key contains a unique value for
each record. This means that each value can only exist once in that table so it is possible to
identify each record using the primary key. In the example above, Sales Rep ID is the primary key.

DBMS
The software that is used to manage the database is called a database management system
(DBMS). Sometimes, this is referred to as a relational database management system (RDBMS) as
it is managing a database that includes relationships.

Data Types
Each field in a table will have a data type assigned to it. Data types include:

• Text
• Alphanumeric
• Numeric (integer/decimal)
• Date/time
• Boolean

Fields within a table will have field sizes applied to them. This is because most fields are a fixed
length. This means that only a specified amount of data can be stored in each field.
Text and alphanumeric fields will have a length to specify the maximum number of characters
that can be stored. For example, the Product Name in the Product table is limited to 40
characters. This avoids having lots of wasted storage space where field space is not used up if the
length is too long.

Numbers can also have a field size. This could be defined as the number of digits or it could be
defined as the maximum numeric value.

Dates will always be the same field size as they will always store the date is the same way, but
they can be formatted to be displayed differently.

Some text fields can be formatted to be a variable length which means they can store as little or
as much data as possible. These are sometimes referred to as memo or long text data types.
These are useful for fields that will contain notes or comments.

Relationships
Relationships within a database can be used to connect entities together. A foreign key is an
attribute (field) in one entity that connects to a primary key in another entity. This allows related
data to be looked up and found.
In the Order table, the foreign key is Customer ID which connects to the primary key Customer
ID in the customer table. For Order Number 4, the Customer ID 3 is looked up in the Customer
table to find Damion Matkin.

The Three Relationships


One-to-one
A one-to-one relationship is when each record in one table only connects to one record in
another table. Each foreign key value will link to one primary key value and each primary key
value will only be linked to by one foreign key value. The foreign key can exist on either side of
the relationship.

Figure - One-to-one relationship, Sales Rep table; & Employee table.

The Sales Rep table stores details of the sales representatives within a business. This only
contains basic information about their name but their full employee details are stored in a
separate table called Employee. Each sales representative only has one employee record and
each employee record can only refer to one sales rep record.

One-to-Many
A one-to-many relationship is when each record in one table can connect to many (zero or more)
records in another table. A foreign key will exist within the table on the many side of the
relationship and will connect to a primary key in the one side of the relationship. This is the most
common type of relationship within relational databases.
Figure: One to Many relationship Between Product table & Category table

Many-to-Many

Many-to-many relationships are only conceptual. They are not used in relational databases
because they are converted into two sets of one-to-many relationships. In a many-to-many
relationship, each record in one table can connect to many records in another table but each
record in the other table can also connect to many records in the original table.
Create & Use Relationships

One-to-Many
When creating a one-to-many relationship, there are some rules to follow:
• the table on the one side must have a primary key.
• the table on the many side will have a foreign key.
• the data type and field size of the foreign key must match the primary key on the one side.
• only data items that exist in the primary key on the one side can be used in the foreign key.

One-to-One

When creating a one-to-one relationship, there are also some rules to follow:
• at least one of the tables (table A) must have a primary key
• the other table (table B) must either have a primary key that is also a foreign key and will link
to the primary key in table A or a foreign key field with a unique index that will link to the primary
key in table A.
• the data type and field size of the foreign key in table B and primary key in table A must match.
• only data items that exist in the primary key in table A can be used in the foreign key in another
table.

Create and Interpret an Entity Relationship Diagram

An entity relationship diagram (ERD) shows the relationships (connections) between each entity.
Each entity is represented by a rectangle. Each relationship is represented by a line.

Figure shows a one-to-one relationship between a Sales Rep and an Employee. Each sales rep is
related to one employee and each employee can only be one sales rep.

Figure shows a one-to-many relationship between Category and Product. Each category can have
many products, but each product has only one category.

Figure shows a many-to-many relationship between Order and Product. Each order can be for
many products and each product can exist on many orders. This is a conceptual diagram only.
Other RDBMSs may use two symbols at each end of the relationship. For example, 0:1or 0| could
be used to depict that there can be between zero and one related record on that side of the
relationship, whereas 1:1or || could be used to depict that there must be exactly one related
record on that side of the relationship.

The Difference Between a Flat File and a Relational Database


Flat Files Relational Database
Data are stored in a number of files. Data are contained in a single software
application – the relational database or DBMS
software.
Data are highly likely to be duplicated and may Duplication of data is minimized and so the
become inconsistent – it can never be certain chance of data inconsistency is reduced.
that all copies of a piece of data have been As long as there is a link to the table storing
updated. the data, they can always be accessed via the
link rather than repeating the data.
Good database design avoids data
duplication.
Because of data duplication, the volume of Because data duplication is minimized, the
data stored is large. volume of data is reduced, leading to faster
searching and sorting of data.
When data structures need to be altered, the Data structures remain the same even when
software must be re-written. the tables are altered. Existing programs do
not need to be altered when a table design is
changed.
Views of the data are governed by the Queries and reports can be set up with simple
different files used to control the data and “point and click” features or using the data
produced by individual departments. All views manipulation language. A novice user can
of the data have to be programmed and this is write queries quickly.
very time-consuming.
Primary key
A primary key is a unique identifier for each record in a table. Therefore, the field used for the
primary key must contain unique values and not have any repeating values.
Examples of primary keys could include:
• registration plate for a car
• student number for a student
• product code for a product.

Compound key
A compound key is two or more fields combined to form a unique identity.

Foreign key
A foreign key is a field in a table that refers to the primary key in another table. It is used to create
the relationship between the two tables. The foreign key must always have the same data type
and field size as the primary key it is linking to.

Candidate key: A key that could be chosen as the primary key.


Secondary key: A candidate key that has not been chosen as the primary key.

Referential Integrity
Referential integrity exists when data in the foreign key of the table on the many side of a
relationship exists in the primary key of the table on the one side of a relationship.

In the Order table above, Customer ID 5 does not exist in the Customer table. This means that
the Order table does not contain referential integrity because the related customer does not
exist.
Validation Rules

Type Field Rule


Lookup in list Gender ”M"or“F”
Lookup in list Title IN (“Mr”, “Mrs”, "Miss", “Dr”)
Range Date of Birth >DATE() (must be after today's date)
Range Date Joined >28/02/1995
Range Reorder Amount Between 1 and 2000
Range Reorder Level >0
Data Type State Like "[A-Z][A-Z]” (must be two text characters)
Format Email Address Like “*@*.*” (* means any character)
Length Color Like “??” (must be two characters)
Presence Forename IS NOT NULL

Verify Data Entry

The data that has been validated is not necessarily correct. It is therefore necessary to verify data.
When inputting data into a database, you should verify that the data input has been input
correctly by comparing what has been entered with the original source.

Simple Queries
A simple query is a query that only has one criterion (singular for criteria). A criterion is the rule
that will be applied to the data.
Complex Queries
Complex queries are queries that have more than one criterion. They use either the AND or OR
Boolean operators. If all specified criteria need to be satisfied, then the AND operator is used. If
any one of the specified criteria needs to be satisfied, then the OR operator is used.

Nested Queries
So far, each query that you have seen has been based upon existing tables within a database. It
is also possible to create a query on an existing query. This means that the results of the original
query will be narrowed down further by the new query. This is called a nested query.

Summary Queries
A cross-tab query is a type of summary query used to summarize data. It can be used to show the
total number of records, the sum of values within a field, averages or other statistical data.
Complex queries should be used when more than one criterion is required, such as Gender=Male
AND Country = Israel, or Gender= Male OR Country=Israel.

Static Parameters
So far, all the queries you have used include static parameters. The parameters are the values
used within the criteria. Each time the query is run, the values of the parameters will not change.

Dynamic Parameters
Dynamic parameters allow the values within the criteria to be chosen at the time of running the
query. This can be useful when the user needs to decide what those parameters might be.
Analyzing and evaluating when static and dynamic parameters should be used in queries. Static
parameter values should be used when those parameter values will not change, no matter how
many times the query is used. For example, if you wanted to search a table on a regular basis for
all customers based in China, then you would use a static parameter query with the criterion of
Country = “China". Dynamic parameters should be used when the user is likely to want to change
the value each time the query is run. For example, if you wanted a query that would enable you
to search for a specific product code, but that product code could be different each time, then
you would use a dynamic query with the criterion of Product Code=[Please enter product code],

Perform Calculations Within a Database


Within a database, calculations can be performed on fields. This can be done within forms,
reports and queries. Within a form, a text box can be added to the form which includes the
calculation. As with spreadsheets, the calculation must start with the equals (=) sign.
Sort Data (Ascending/Descending)
Data can be sorted in ascending or descending order. This can be done within a table, within a
query or within a report. The order can be based on numbers, alphabetical characters, dates or
times.

Sort Data (Grouped)


More than one field can be used to create the sort. This is commonly used when sorting names
into order. The surname would be set as the first sort order and then if there are any people with
the same surname, the forename would be set as the second sort order.

Data Entry Forms


As its name suggests, a data entry form is used for the entry of data. It can also be used to view
existing data. When designing a data entry form, you should consider who will be using it and
how to make the form effective and user friendly by using some of the following techniques.

• Appropriate font styles and sizes


• Spacing between fields
• Character spacing of individual fields
• Use of white space
• Navigation buttons
• Radio buttons
• Check boxes
• Drop-down menus
• Highlighting key fields

Export Data

Table
Data in tables can be exported as a delimited file in CSV format, as fixed length fields in text
format or as a table within a RTF file. It is sensible to save the field names when saving to CSV or
text files.

Query
Data from queries can be exported in the same way as tables, but only the data that meets the
criteria of the query will be exported.

Report
Reports include formatting, so the best method of export is to a RTF file which will include all the
original formatting. Data that is exported from a report to a text file will lose its formatting.
Designing a Relational Database

Normalisation is a set of formal rules that must be considered once we have a set of table
designs. By following the normalisation rules we ensure that the final table designs do not result
in duplicated data. If the initial designs were well thought through then the normalisation process
will not result in any changes to the table designs.

Table: ORDER

First normal form (1NF)


A table with no repeating groups is said to be in first normal form. Table “Order” has repeating
groups in the attributes ProdID and Description. We remove the
repeating groups by:
● moving the ProdID and Description attributes to a new table.
● linking the new table to the original table ORDER with a foreign key.
Below tables show the data in first normal form. The primary key of each table is shown in red.

Table: ORDER (1NF).

Table: ORDER-PRODUCTS

Second normal form (2NF)


A table is in second normal form if any partial dependencies have been removed. That is, every
non-key attribute must be fully dependent on all of the primary key.

In our ORDER-PRODUCTS table, Description depends only on ProdID and not on Num. Hence the
non-key attribute (Description) is not dependent on all of the primary key. We say that
Description is dependent on ProdID or, turned around: ProdID determines Description or ProdID
→ Description.
We remove the partial dependency by:
● moving the Description attribute to a new table.
● linking the new table to the ORDER-PRODUCTS table with a foreign key.

Below tables show the data in second normal form.


Table: ORDERPRODUCTS (2NF). Table: The PRODUCT(2NF)

At this stage, the ORDER-PRODUCTS table is fully normalised:


● 1NF – it does not have a repeated group of attributes.
● 2NF – there are no non-key attributes.

The PRODUCT table is also fully normalised:


● 1NF – it does not have a repeated group of attributes.
● 2NF – it has a single-attribute primary key.

Third normal form (3NF)

Third normal form (like second normal form) is concerned with the non-key attributes. To be in
3NF, there must be no dependencies between any of the non-key attributes. A table with no or
one non-key attribute must be in 3NF, so PRODUCT and ORDER-PRODUCTS are in 3NF.

There is a problem with the original ORDER table. City determines the Country, so we have two
nonkey attributes which are dependent. This means that ORDER is not in 3NF. Below tables and
show the data in third normal form.

Table: ORDER table (3NF). Table: CITY-COUNTRIES (3NF).


To summarise, we have been through the stages shown in below table. The primary key is
underlined.

Data Dictionary: Metadata (information) about the Database.


A data dictionary is a document or file that describes the structure of the data held within the
database. It is known as metadata which means ‘data about data’. It is a tool that is used by
database developers and administrators. It will include the following items:

1. Data About Fields:


• Field names to identify each field.
• Data types, such as text, integer, date/time.
• Field size, such as the length of a text field or the maximum value of a numeric field
format of fields.
• Default values which are values a field is set to be initially when a new record is created
primary keys, compound keys and foreign keys.
• Indexed fields which improve search times.
• Validation rules that restrict data entry for that field.

2. Data About Tables:


• The primary key of the table.
• What sort order to use when displaying data relationships to other tables.
• Total number of records.
• Validation rules that apply based on multiple fields within the table.
• Permissions and security as to which users can access the table.
File and Data Management

File Types
When data is saved it is stored in a file. Different software applications use data in different ways
and so the way the data is stored differs between application types. For example, a database
stores data in tables, whereas graphics software stores data about pixels. Each file will typically
include a header, which will be metadata (data about the file), then the main content will be
stored followed by an end-of-file marker. To a user, file types are usually identified by their
extension. For example, Students.txt has an extension of txt which identifies it as a text file.

Extension File type Purpose


Stores plain text without any
formatting. It is useful for
.txt Text transferring data between
applications, but any
formatting is lost.
Stores structured data as
plain text in rows with each
column separated by
commas. It is useful for
.csv Comma separated values transferring data between
databases and spreadsheets
or other applications which
require data in a structured
format.
Stores text-based documents
and includes the formatting
(rich text). It is used to
.rtf Rich text format transfer data between
different word processing or
other text-based applications.
Stores Microsoft’s word
processing documents in
.docx Microsoft Word XML open XML format by saving
document all objects separately with in a
compressed file.
Used to share read-only
documents in a common
.pdf Portable Document Format format that can be accessed
by any PDF reader software. It
is commonly used for storing
documents on the web as its
contents can be indexed by
search engines.
An open-source file type for
word processor documents
.odt OpenDocument Text that is used by opensource
word processors and is not
tied to one manufacturer.
An open-source file type for
spreadsheets that is used by
Open Document Spreadsheet open-source spreadsheet
.ods software and is not tied to
one manufacturer.
An open-source file type for
presentations that is used by
open-source presentation
.odp Open Document Presentation software and is not tied to
one manufacturer.
.html Hypertext Markup Language Stores web pages that can be
opened by any web browser.
A data file that uses markup
language to define objects
and their attributes.
.xml Extensible Markup Language They are used to transfer data
between applications and can
be read by a simple text
editor.
Microsoft’s method of storing
video files with very little
Audio Video Interleave (video compression. File
.avi file) sizes are very big but no data
is lost.
Audio and video are
Moving Pictures Experts compressed and videos can
.mp4 Group (MPEG) Layer-4 (video be shared across the
file) internet.
Stores audio files as
waveform data and enables
Waveform Audio File Format different sampling rates and
.wav bit rates. This is the standard
format for audio CDs but does
not include compression so
files are large.
Stores audio files in a
compressed format
MPEG Layer-3 audio approximately 10% the size
.mp3 compression of .wav files. Enables audio
files to be shared across the
internet.
Stores images as
uncompressed raster images,
.bmp Bitmap image storing each pixel individually.
They are large files but can be
accessed by any software.
Stores images as compressed
raster images. It is used by
Joint Photographic Experts most digital cameras and is a
.jpeg Group (compressed image) common format for web
graphics but its use of lossy
compression can mean some
quality is lost.
Stores images as compressed
raster images and can include
.png Portable Network Graphic background transparency
colors making it useful when
images are required on
different color backgrounds.
Stores images as two-
dimensional (2D) vector
.svg Scalable Vector Graphics graphics. It is a standard
format for using vector
graphics on the web.
Stores program object code
.exe Executable program file which enables the program to
be executed by the computer.

Proprietary File Formats

Proprietary file formats are file types that have been developed by software manufacturers solely
for use within their software. Using their own formats means that software manufacturers are
free to develop software features that will store data in a way that is most suitable for the
software and without waiting for a standard format to adapt to the software’s needs. This
enables software to improve and provide new features that otherwise would not be available.
Open-Source File Formats
Open-source file formats are file types that have been developed for the purpose of being used
by any proprietary software or open-source software. They are free from copyright, patents and
trademarks, and their structure is known publicly. They are usually maintained by an
international standards organization or a public interest group. Their main advantage is that the
files can be shared between users of different software. However, they can hold back
development of open-source software because new features will require the file format standard
to be updated.

Generic File Formats


Generic file formats enable data to be transferred between software. Data can be exported from
software to a generic file format and generic file formats can be imported into software. They
store the essential data but will not include any formatting. The two main file formats used within
databases are CSV and TXT.
Indexed Sequential Access
Many years ago, data was often stored on tape, which required records to be written one after
another onto the tape. This was known as storing the data serially. To access the data, all the
records would need to be read from the first onwards until the required record was found or
until the end of the file was reached. It could take a very longtime to read through a whole table
of data and so indexed sequential files were developed. Indexed sequential files still store records
one after each other but they are sorted into an order based upon a field. For example, data
about customers might be sorted into surname order or customer ID order. Sequential files are
particularly useful when data is being batch processed such as when gas bills are being generated
and the master customer file will be processed in order of customer ID and any transaction files
will also be processed in order of customer ID.

However, when reading the data, it was still necessary to read the whole file serially from the
beginning because there was no way of knowing where each record was stored. Indexed
sequential files are stored in exactly the same way as sequential files but the file also has an index
based on the field used to sort the file. A field with an index is known as a secondary key. The
index file stores each secondary key value and the address in storage (e.g. tape or disk) where
the first record containing that value is stored.

The index is small enough to store in main memory and so all that needs to be done to find a
record is to search the index, find the location in storage and then read the records from that
point until the record is found.

Direct File Access


The use of indexed sequential file access still requires some serial access of data and there are
problems with trying to maintain a file in a sequential order as new records are added and old
records deleted.

With direct file access, records are stored in a random order. There is no sequence. When storing
a file, a hashing algorithm (calculation) is performed on the key field to determine the storage
address where the record should be stored. Then when the record is searched for, the same
hashing algorithm will be performed on the key field to determine where the record can be
found. The computer system can then directly access that record without having to read through
other records.

Hierarchical Database Management Systems


The hierarchical database model was created in the 1960s and is not commonly used today. The
model relies upon a tree structure where each parent branch is the one side of a relationship and
each child branch is the many side of a relationship. The tree structure can only deal with one to
many relationships and can only work in one direction. Hierarchical databases are only suitable
for models which have a strict hierarchy.
Each disk contains folder sand there may be further subfolders within each folder. Each subfolder
has only one folder at the level above it. To find the data, the user browses through the system,
selects the disk the data is stored on, then selects the folder, then selects the next subfolder until
eventually the file is found.

Management Information Systems

A management information system (MIS) provides summary information to managers to enable


them to make decisions. The MIS will collate data from a database and present it in the form of
reports and charts. These reports and charts can be produced within the database system itself
or they may be part of an additional piece of software that is used to analyze the data.

The additional software is likely to collate data from more than one database and interconnect
the data from those databases to produce reports that analyze all the data together. When
additional software is used to collate data from more than one database, it is often referred to
as an executive information system (EIS).

A MIS has the following essential features:


• data is collated from databases and other sources.
• data is interconnected from different sources.
• data is analyzed to provide the data that is required by management.
• summary reports and charts are produced for managers. that will help with decision making.
The reports and charts are created by people, but once they are created, they can be reused as
the data changes within the data sources. It’s important that the reports and charts provide
information that managers need.

Example
Managers within a large second-hand car dealership Need to be able to monitor sales. They need
to be able to identify trends in sales for different makes and models of cars at different times of
the year. This will enable them to identify which cars are selling the most and which are making
the most profit. They can then decide which second-hand cars they want to acquire to sell.

Marketing managers can analyze how effective a marketing campaign was by comparing sates
figures during an advertising campaign with sales figures outside the advertising campaign. This
will help them to decide whether to run similar campaigns in the future.

You might also like