Cape It Unit 2 Compiled
Several tools are available for documenting the flow of information within an organization, from a simple
descriptive method to a diagrammatic model. For example, using short statements to list the set of
activities to be performed within a department of an organization is an example of a simple descriptive
method.
Types of DFD
In analyzing a business, several sets of DFDs are drawn. Initial DFDs might model the existing system
(flaws and all), while later DFDs may model a solution to the problem being analyzed. For these solution
DFDs, both a logical and a physical DFD are drawn. Physical DFDs represent physical files and
transactions, while logical or conceptual DFDs represent business functions or processes.
Components of a DFD
A data flow diagram illustrates the processes, data stores, and external entities in a business or other
system and the connecting data/resource flows.
The components of a data flow diagram (DFD) are:
External Entities (represented by a square or oval)
Processes (represented by a rounded rectangle)
Data Stores (represented by an open-ended narrow rectangle)
Data Flows (represented by an arrow)
Resource Flows
[Diagram omitted: an example DFD of a banking process that maintains customer accounts.]
External Entities
An external entity is a source (where the data comes from) or destination (where the data goes) of a data
flow which is outside the system but interacts with the system (e.g., person, department or organization).
When modeling a DFD, the designer is not interested in the inner workings of the external entity, but only
what data is produced/needed by the entity. The symbol used to represent an external entity is a square or
oval.
[Symbols omitted: a square labelled 'Student' and an oval labelled 'Customer', the diagrams used to
represent external entities.]
Process
A process shows a transformation or manipulation of data flows within the system. A process is a manual
or computerized routine that changes data by performing a calculation, updating information, sorting a list,
and so on. There are two ways in which a process is represented. The first is a rounded rectangle,
numbered and labelled with a verb phrase (except for the process numbered 0, which gives the system's
name). The second is a rectangle with three parts: an identification number; a location, appearing to the
right of the identifier, which describes where in the system the process takes place; and a descriptive title
placed in the centre of the box. The process shows a part of the system that transforms inputs into outputs;
that is, it shows how one or more inputs are changed into outputs.
[Symbols omitted: a rounded rectangle and a three-part rectangle, each representing the process 'Enroll
student'.]
Data Store
A data store is a holding place for information within the system. It is represented by an open-ended
narrow rectangle. Data stores may be long-term files such as sales ledgers, or may be short-term
accumulations: for example, batches of documents that are waiting to be processed. Each data store should
be given a reference letter followed by a number (e.g. D1). It could represent the data being held manually or
on a computer. Basically, a data store is a logical collection of data.
[Symbols omitted: two versions of the open-ended rectangle for a data store, labelled 'D1 Student's
Records'.]
Data flow
A data flow shows the flow of information from its source to its destination. A data flow is represented by
a line, with arrowheads showing the direction of flow. Information always flows to or from a process or a
data store and may be written, verbal or electronic.
Resource Flow
A resource flow shows the flow of any physical material from its source to its destination. For this reason
they are sometimes referred to as physical flows. The physical material in question should be given a
meaningful name. Resource flows are usually restricted to early, high-level diagrams and are used when a
description of the physical flow of materials is considered to be important to help the analysis.
Data Flows
For data flows the rules are as follows:
Data flows and resource flows are allowed between external entities and processes. Data flows are also
allowed between different external entities. However, data flows and resource flows are not allowed
between external entities and data stores.
Processes
For processes the data flow rules are as follows:
Data flows and resource flows are allowed between processes and external entities and between processes
and data stores. They are also allowed between different processes. In other words processes can
communicate with all other areas of the business process diagram.
Data Stores
For data stores the data flow rules are as follows:
Data flows and resource flows are allowed between data stores and processes. However, these flows are
not allowed between data stores and external entities or between one data store and another. In practice
this means that data stores cannot initiate a communication of information, they require a process to do
this.
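The connection rules above amount to a small lookup table of which component pairs a data flow may connect. The following Python sketch is illustrative only; the names are invented, not from any standard tool:

```python
# Which pairs of DFD component types may be linked by a data flow,
# per the rules above: entities and processes may exchange data, as may
# two entities or two processes; processes may read/write data stores;
# but entities may never touch stores, and stores never touch each other.

ALLOWED = {
    ("entity", "process"),
    ("process", "entity"),
    ("entity", "entity"),    # flows between external entities are allowed
    ("process", "store"),
    ("store", "process"),
    ("process", "process"),  # processes can communicate with all areas
}

def flow_allowed(source_type: str, dest_type: str) -> bool:
    """Return True if a data flow from source_type to dest_type is legal."""
    return (source_type, dest_type) in ALLOWED

print(flow_allowed("entity", "process"))  # True
print(flow_allowed("entity", "store"))    # False: stores need a process
print(flow_allowed("store", "store"))     # False: stores cannot initiate flows
```

Note that the absence of ("store", "store") and ("entity", "store") pairs captures the rule that data stores cannot initiate a communication of information; they require a process to do this.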
The first step in constructing DFDs is to draw a context diagram. A context diagram is a top-level view of
an information system that shows the system’s boundaries and scope. To draw a context diagram, you
start by placing a single process symbol in the centre of the page. The symbol represents the entire system,
and you identify it as process 0. Then you place the entities around the perimeter of the page and use data
flows to connect the entities to the central process. You do not show any data stores in a context diagram
because data stores are internal to the system. In drawing the context diagram you should only be
concerned with the most important information flows. Remember that no data flow process diagram is
the definitive solution - there is no absolute right or wrong.
The level 1 diagram shows the main functional areas of the system under investigation. As with the
context diagram, any system under investigation should be represented by only one level 1 diagram.
The level 1 diagram is surrounded by the outline of a process box that represents the boundaries of the
system. Because the level 1 diagram depicts the whole of the system under investigation, it can be difficult
to know where to start.
1. All processes must have at least one data flow in and one data flow out.
2. All processes should modify the incoming data, producing new forms of outgoing data.
3. Each data store must be involved with at least one data flow.
4. Each external entity must be involved with at least one data flow.
5. A data flow must be attached to at least one process.
6. Two entities, or two data stores, or an entity and a data store or vice versa, cannot be connected
together with a data flow.
7. Choose meaningful names for processes, flows, stores, and external entities.
8. Number the processes.
9. Avoid overly complex DFDs; do not cross lines.
10. Make sure the DFD is internally consistent and consistent with any associated DFDs.
11. The general principle in Data Flow Diagramming is that a system can be decomposed into
subsystems, and subsystems can be decomposed into lower level subsystems, and so on.
12. Each subsystem represents a process or activity in which data is processed. At the lowest level,
processes can no longer be decomposed.
13. Each process in a DFD has the characteristics of a system.
14. Data enters the system from the environment; data flows between processes within the system;
and data is produced as output from the system.
2. Two Data Stores, two Entities, or an Entity and a Data Store (or vice versa), must not be
directly connected.
[Diagrams omitted: illegal direct flows between two data stores (D1 Video File and D1 Member File),
between two entities (Member and Student), and between an entity (Member) and a data store (D1 Video
File).]
4. A Process that has no input is called a Miracle. It is impossible for an information system to
process data it doesn't have.
[Diagram omitted: a process 'Print airline tickets' with an outgoing 'Ticket' flow but no input.]
5. A Process that has no output is called a Black Hole. A process that just does something to data,
but doesn't put that data anywhere, is useless.
[Diagram omitted: a process 'Sort serial numbers' with an incoming 'Serial numbers' flow but no output.]
6. A Data Flow must not be split into two or more Data Flows.
8. The Data Flow label leading into a Process must be different from the Data Flow label being
output from that same Process.
(If you put grapes into a blender and blend, you will not get grapes when finished but grape juice.)
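The miracle and black-hole errors above can be detected mechanically: a process is a miracle if no flow enters it, and a black hole if no flow leaves it. A small Python sketch, with made-up flow data for illustration:

```python
# Detect "miracle" and "black hole" processes: every process needs at
# least one incoming and one outgoing data flow. The flows below are an
# invented example, not taken from any real diagram.

flows = [
    ("Member", "Get Video Details"),        # entity -> process
    ("Get Video Details", "Compute Bill"),  # process -> process
    ("Compute Bill", "Member"),             # process -> entity
    ("Print Tickets", "Customer"),          # this process has no input
    ("Serial Numbers", "Sort Serials"),     # this process has no output
]

processes = {"Get Video Details", "Compute Bill", "Print Tickets", "Sort Serials"}

inputs = {dest for _, dest in flows}    # everything that receives a flow
outputs = {src for src, _ in flows}     # everything that sends a flow

miracles = sorted(p for p in processes if p not in inputs)
black_holes = sorted(p for p in processes if p not in outputs)

print("Miracles:", miracles)        # ['Print Tickets']
print("Black holes:", black_holes)  # ['Sort Serials']
```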
This data flow diagram represents a system used in a video rental store.
[Diagram omitted: the video rental DFD. The Member entity supplies a Video Number and a Member
Number. A 'Get Video Details' process reads a Video Record from the D1 Video File; a 'Get Member
Details' process reads a Member Record from the D1 Member File; a 'Compute Bill Total' process
produces the Bill Total from the video and member details; a 'Get Payment' process accepts the member's
Payment; and a 'Print Receipt' process prints a receipt from the Payment Details.]

Class Activity
Use the Data Flow Diagram below to answer the following questions.
1. How many processes are in the diagram?
Question 1
A student completes an application to become a member of a library. The student takes the application to
the librarian, who checks the form and then issues a membership card. The completed application is stored
as a Member Record in a file named Member Folder.
To borrow books, a member takes the books and his membership card to the librarian. She updates her
cards in the Book Folder.
When a member returns the borrowed books to the library, the librarian updates her records.
New books come into the library periodically. The librarian records the relevant details on cards, which
are stored in the Book Folder.
Question 2
You have an address book in which you keep details of your friends. From time to time, you will make
amendments, such as a change of address or telephone number for an 'old' friend, or add details for 'new'
friends. In addition, a friend may request that you get the address or telephone number (from the address
book) of another contact.
Required: Construct a Context Diagram (Level 0).
Content: Definition of normalisation; attribute redundancy and anomalies; normal forms, including First
Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF); keys: primary, foreign
and composite (or compound or concatenated); partial and non-key dependencies; relationships; use of
entity-relationship diagrams (ERD).
Higher degrees of normalization typically involve more tables and create the need for a larger number of
joins, which can reduce performance. Accordingly, more highly normalized tables are typically used in
database applications involving many isolated transactions (e.g. an automated teller machine), while less
normalized tables tend to be used in database applications that need to map complex relationships between
data entities and data attributes (e.g. a reporting application, or a full-text search application).
Database theory describes a table's degree of normalization in terms of normal forms of successively
higher degrees of strictness. A table in third normal form (3NF), for example, is consequently in second
normal form (2NF) as well; but the reverse is not necessarily the case.
Although the Normal Forms are often defined informally in terms of the characteristics of tables, rigorous
definitions of the normal forms are concerned with the characteristics of mathematical constructs known
as relations. Whenever information is represented relationally, it is meaningful to consider the extent to
which the representation is normalised.
Background to Normalisation
Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than
a number of possible unique keys. A primary key is a key which the database designer has
designated for this purpose.
NORMAL FORMS
The Normal Forms (abbrev. NF) of Relational Database theory provide criteria for determining a table's
degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to
a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form"
(HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower
than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its
HNF.
The Normal Forms are applicable to individual tables; to say that an entire database is in Normal Form n
is to say that all of its tables are in Normal Form n.
First normal form (1NF or Minimal Form) is a normal form used in database normalization. A
relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. These
criteria are basically concerned with ensuring that the table is a faithful representation of a relation and
that it is free of repeating groups.
The concept of a "repeating group" is, however, understood in different ways by different theorists. As a
consequence, there is not universal agreement as to which features would disqualify a table from being in
1NF. Most notably, 1NF as defined by some authors (for example, Ramez Elmasri and Shamkant B.
Navathe, following the precedent established by Edgar F. Codd) excludes relation-valued attributes (tables
within tables); whereas 1NF as defined by other authors (for example, Chris Date) permits them.
According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to some relation",
which means, specifically, that it satisfies the following five conditions:
1. There is no top-to-bottom ordering to the rows.
2. There is no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the applicable domain (and
nothing else).
5. All columns are regular, i.e. rows have no hidden components such as row IDs, object IDs, or
hidden timestamps.
Examples of tables (or views) that would not meet this definition of 1NF are:
A table that lacks a unique key. Such a table would be able to accommodate duplicate rows, in
violation of condition 3.
A table whose definition mandates that results be returned in a particular order, so that the row-
ordering is an intrinsic and meaningful aspect of the view. This violates condition 1. The tuples
in true relations are not ordered with respect to each other.
A table with at least one nullable attribute. A nullable attribute would be in violation of
condition 4, which requires every field to contain exactly one value from its column's domain. It
should be noted, however, that this aspect of condition 4 is controversial. It marks an important
departure from Codd's original vision of the relational model, which made explicit provision for
nulls.
Date's fourth condition, which expresses "what most people think of as the defining feature of 1NF", is
concerned with repeating groups. The following example illustrates how a database design might
incorporate repeating groups, in violation of 1NF.
The designer then becomes aware of a requirement to record multiple telephone numbers for some
customers. He reasons that the simplest way of doing this is to allow the "Telephone Number" field in any
given record to contain more than one value:
CUSTOMER
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659, 555-776-4100
789 Maria Fernandez 555-808-9633
Assuming, however, that the
Telephone Number column is defined on some Telephone Number-like domain (e.g. the domain of strings
12 characters in length), the representation above is not in 1NF. 1NF (and, for that matter, the RDBMS)
prohibits a field from containing more than one value from its column's domain.
The designer might attempt to get around this restriction by defining multiple Telephone Number
columns:
CUSTOMER
Customer ID First Name Surname Tel. No. 1 Tel. No. 2 Tel. No. 3
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659 555-776-4100
789 Maria Fernandez 555-808-9633
This representation, however, makes use of nullable columns, and therefore does not conform to Date's
definition of 1NF. Even if the view is taken that nullable columns are allowed, the design is not in
keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2, and Tel. No. 3 share exactly the same domain and
exactly the same meaning; the splitting of Telephone Number into three headings is artificial and causes
logical problems. These problems include:
Difficulty in querying the table. Answering such questions as "Which customers have telephone
number X?" and "Which pairs of customers share a telephone number?" is awkward.
Inability to enforce uniqueness of Customer-to-Telephone Number links through the RDBMS.
Customer 789 might mistakenly be given a Tel. No. 2 value that is exactly the same as her Tel. No. 1
value.
Restriction of the number of telephone numbers per customer to three. If a customer with four
telephone numbers comes along, we are constrained to record only three and leave the fourth
unrecorded. This means that the database design is imposing constraints on the business process,
rather than (as should ideally be the case) vice-versa.
The designer might, alternatively, retain the single Telephone Number column but alter its domain,
making it a string of sufficient length to accommodate multiple telephone numbers:
CUSTOMER
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659, 555-776-4100
789 Maria Fernandez 555-808-9633
This design is not consistent with 1NF, and presents several design issues. The Telephone Number
heading becomes semantically woolly, as it can now represent either a telephone number, a list of
telephone numbers, or indeed anything at all. A query such as "Which pairs of customers share a
telephone number?" is more difficult to formulate, given the necessity to cater for lists of telephone
numbers as well as individual telephone numbers. Meaningful constraints on telephone numbers are also
very difficult to define in the RDBMS with this design.
A design that is unambiguously in 1NF makes use of two tables: a Customer Name table and a Customer
Telephone Number table.
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone
Number link appears on its own record.
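The two-table 1NF design can be sketched in SQL. The following Python/sqlite3 sketch uses the customer data from the tables above; the exact schema (column names, key choices) is an assumption, not a prescription:

```python
# The unambiguously-1NF design: a Customer Name table and a separate
# Customer Telephone Number table, one row per customer-to-number link.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer_name (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        surname     TEXT NOT NULL
    );
    CREATE TABLE customer_telephone (
        customer_id      INTEGER NOT NULL REFERENCES customer_name,
        telephone_number TEXT NOT NULL,
        -- the composite key enforces uniqueness of each link
        PRIMARY KEY (customer_id, telephone_number)
    );
""")
con.executemany("INSERT INTO customer_name VALUES (?, ?, ?)",
                [(123, "Robert", "Ingram"),
                 (456, "Jane", "Wright"),
                 (789, "Maria", "Fernandez")])
con.executemany("INSERT INTO customer_telephone VALUES (?, ?)",
                [(123, "555-861-2025"),
                 (456, "555-403-1659"),
                 (456, "555-776-4100"),
                 (789, "555-808-9633")])

# "Which customers have telephone number X?" is now a simple query, and
# a customer may have any number of telephone numbers.
rows = con.execute("""
    SELECT surname FROM customer_name
    JOIN customer_telephone USING (customer_id)
    WHERE telephone_number = '555-776-4100'
""").fetchall()
print(rows)  # [('Wright',)]
```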
Newcomers to database design sometimes suppose that normalisation proceeds in an iterative fashion, i.e.
a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of
how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt;
furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher"
normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the
designer, because 3NF tables usually need no modification to meet the requirements of these higher
normal forms.
Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These Normal
Forms have been summarized as requiring that all non-key attributes be dependent on "the key, the whole
key and nothing but the key".
A table is in first normal form (1NF) if and only if it represents a relation. Given that database tables
embody a relation-like form, the defining characteristic of one in First Normal Form is that it does not
allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents
duplicate rows) and without any nullable columns is in 1NF.
If 1NF is redefined to exclude nullable columns, then no level of normalization can ever be achieved
unless all nullable columns are completely eliminated from the entire database.
One requirement of a relation is that every table contains exactly one value for each attribute. This is
sometimes expressed as "no repeating groups". While that statement itself is axiomatic, experts disagree
about what qualifies as a "repeating group", in particular whether a value may be a relation value; thus the
precise definition of 1NF is the subject of some controversy. Notwithstanding, this theoretical uncertainty
applies to relations, not tables. Table manifestations are intrinsically free of variable repeating groups
because they are structurally constrained to the same number of columns in all rows.
Put at its simplest: when applying 1NF to a database, every record must be the same length. This means
that each record has the same number of fields, and none of them contain a null value.
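That simplest reading of 1NF, i.e. same record length, no nulls, no duplicates, can be expressed as a small check. A Python sketch (the function name is my own):

```python
# Minimal 1NF check on a list of record tuples: every record has the
# same number of fields, none of them null, and there are no duplicates.
def is_1nf(rows):
    if not rows:
        return True
    width = len(rows[0])
    same_length = all(len(r) == width for r in rows)
    no_nulls = all(v is not None for r in rows for v in r)
    no_duplicates = len(set(rows)) == len(rows)
    return same_length and no_nulls and no_duplicates

good = [("Jones", "Typing"), ("Jones", "Shorthand")]
bad = [("Jones", "Typing"), ("Jones", None)]   # a null value breaks 1NF
print(is_1nf(good))  # True
print(is_1nf(bad))   # False
```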
EMPLOYEES' SKILLS
Employee Skill Current Work Location
Jones Typing 114 Main Street
Jones Shorthand 114 Main Street
Jones Whittling 114 Main Street
Roberts Light Cleaning 73 Industrial Way
Ellis Alchemy 73 Industrial Way
Ellis Juggling 73 Industrial Way
Harrison Light Cleaning 73 Industrial Way
Neither Employee nor Skill alone uniquely identifies a row; the candidate key is {Employee, Skill}. The
remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely
Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are
represented: we are told three times that Jones works at 114 Main Street, and twice that Ellis works at 73
Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example,
possible to update Jones' work location on his "Typing" and "Shorthand" records and not update his
"Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones'
current work location?"
To remove the partial dependency, the table is split into EMPLOYEES (Employee, Current Work
Location) and EMPLOYEES' SKILLS (Employee, Skill). Update anomalies cannot occur in these tables,
which are both in 2NF.
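The decomposition can be sketched directly from the data above; this Python fragment uses the rows of the EMPLOYEES' SKILLS table and shows the redundancy disappearing:

```python
# Split the non-2NF EMPLOYEES' SKILLS table: Current Work Location
# depends only on Employee, so it moves to its own table.
skills_table = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Roberts", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Juggling", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]

# EMPLOYEES: one row per employee, so each location is stated exactly once.
employees = {emp: loc for emp, _, loc in skills_table}
# EMPLOYEES' SKILLS: just the candidate key {Employee, Skill}.
employee_skills = [(emp, skill) for emp, skill, _ in skills_table]

print(employees["Jones"])   # 114 Main Street, stored once instead of three times
print(len(employees))       # 4 employees
print(len(employee_skills)) # 7 skill records
```

Because Jones's location now appears in a single row, it is impossible to update it on some of his skill records and not others.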
Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers
from update anomalies is:
TOURNAMENT WINNERS
Tournament Year Winner Winner Date of Birth
Des Moines Masters 1998 Chip Masterson 14 March 1977
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and
not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple
records. This problem is addressed by third normal form (3NF).
A table for which there are no partial functional dependencies on the primary key is typically, but not
always, in 2NF. In addition to the primary key, the table may contain other candidate keys; it is necessary
to establish that no non-prime attributes have part-key dependencies on any of these candidate keys.
Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF.
{Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset
of it: Manufacturer.
The Third Normal Form (3NF) is a normal form used in database normalization. 3NF was originally
defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the
following conditions hold:
1. The table is in second normal form (2NF).
2. Every non-prime attribute of the table is non-transitively dependent on every key of the table.
An example of a 2NF table that fails to meet the requirements of 3NF is:
TOURNAMENT WINNERS
Tournament Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent
on {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is
functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing
to stop the same person from being shown with different dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table into two:
Tournament Winners (Tournament, Year, Winner) and Winner Dates of Birth (Winner, Date of Birth).
Additional Reading
By Mike Hillyer
Table of contents:
Introduction
Mike's Bookstore
First Normal Form
Defining Relationships
Second Normal Form
Third Normal Form
Joining Tables
Conclusion
Resources
Introduction
This article will take a practical look at database normalization, focusing on the first three of seven
generally recognized normal forms. Additional resources that look at the theory of database normalization
and the additional normal forms can be found in the Resources section at the end of this article.
Note:
This article has been updated from an original version published in 2003.
Mike's Bookstore
Let's say you were looking to start an online bookstore. You would need to track certain information about
the books available to your site viewers, such as:
Title
Author
Author Biography
ISBN
Price
Subject
Number of Pages
Publisher
Publisher Address
Description
Review
Reviewer Name
Let's start by adding the book that coined the term "Spreadsheet Syndrome". Because this book has two
authors, we are going to need to accommodate both in our table. Let's take a look at a typical approach.
Note:
Normalization is a part of relational theory, which requires that each relation (AKA table) has a
primary key. As a result, this article assumes that all tables have primary keys, without which a
table cannot even be considered to be in first normal form.
Second, this table is not very efficient with storage. Let's imagine for a second that our publisher is
extremely busy and managed to produce 5000 books for our database. Across 5000 rows we would need
to store information such as a publisher name, address, phone number, URL, contact email, etc. All that
information repeated over 5000 rows is a serious waste of storage resources.
Third, this design does not protect data consistency. Let's once again imagine that Jon Stephens has written
20 books. Someone has had to type his name into the database 20 times, and it is possible that his name
will be misspelled at least once (e.g. John Stevens instead of Jon Stephens). Our data is now in an
inconsistent state, and anyone searching for a book by author name will find some of the results missing.
This also contributes to the update anomalies mentioned earlier.
The normalization process involves getting our data to conform to progressive normal forms, and a higher
level of normalization cannot be achieved unless the previous levels have been satisfied (though many
experienced designers can create normalized tables directly without iterating through the lower forms).
The first normal form (or 1NF) requires that the values in each column of a table are atomic. By atomic
we mean that there are no sets of values within a column.
In our example table, we have a set of values in our author and subject columns. With more than one value
in a single column, it is difficult to search for all books on a given subject or by a specific author. In
addition, the author names themselves are non-atomic: first name and last name are in fact different
values. Without separating first and last names it becomes difficult to sort on last name.
One method for bringing a table into first normal form is to separate the entities contained in the table into
separate tables. In our case this would result in Book, Author, Subject and Publisher tables.
Subject_ID Name
1 MySQL
2 Database Design
The Author, Subject, and Publisher tables use what is known as a surrogate primary key -- an
artificial primary key used when a natural primary key is either unavailable or impractical. In the
case of author we cannot use the combination of first and last name as a primary key because there
is no guarantee that each author's name will be unique, and we cannot assume to have the author's
government ID number (such as SIN or SSN), so we use a surrogate key.
Some developers use surrogate primary keys as a rule, others use them only in the absence of a
natural candidate for the primary key. From a performance point of view, an integer used as a
surrogate primary key can often provide better performance in a join than a composite primary key
across several columns. However, when using a surrogate primary key it is still important to create
a UNIQUE key to ensure that duplicate records are not created inadvertently (but some would
argue that if you need a UNIQUE key it would be better to stick to a composite primary key).
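The surrogate-key-plus-UNIQUE pattern described above can be sketched in sqlite3. The schema is an illustration, and, as the text notes, treating a name pair as unique is itself a simplifying assumption, since in reality two authors can share a name:

```python
# A surrogate primary key (author_id) plus a UNIQUE key on the natural
# columns, so duplicate records cannot be created inadvertently.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE author (
        author_id  INTEGER PRIMARY KEY,   -- surrogate key
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        UNIQUE (first_name, last_name)    -- guards against duplicates
    )
""")
con.execute("INSERT INTO author (first_name, last_name) VALUES ('Jon', 'Stephens')")

duplicate_rejected = False
try:
    # The same author entered a second time...
    con.execute("INSERT INTO author (first_name, last_name) VALUES ('Jon', 'Stephens')")
except sqlite3.IntegrityError:
    duplicate_rejected = True  # ...is blocked by the UNIQUE key

print(duplicate_rejected)  # True
```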
By separating the data into different tables according to the entities each piece of data represents, we can
now overcome some of the anomalies mentioned earlier: we can add authors who have not yet written
books, we can delete books without losing author or publisher information, and information such as author
names is recorded only once, preventing potential inconsistencies when updating.
Depending on your point of view, the Publisher table may or may not meet the 1NF requirements because
of the Address column: on the one hand it represents a single address, on the other hand it is a
concatenation of a building number, street number, and street name.
The decision on whether to further break down the address will depend on how you intend to use the data:
if you need to query all publishers on a given street, you may want to have separate columns. If you only
need the address for mailings, having a single address column should be acceptable (but keep potential
future needs in mind).
Defining Relationships
As you can see, while our data is now split up, relationships between the tables have not been defined.
There are various types of relationships that can exist between two tables:
The relationship between the Book table and the Author table is a many-to-many relationship: A book can
have more than one author, and an author can write more than one book. To represent a many-to-many
relationship in a relational database we need a third table to serve as a link between the two. By naming
the table appropriately, it becomes instantly clear which tables it connects in a many-to-many relationship
(in the following example, between the Book and the Author table).
ISBN Author_ID
1590593324 1
1590593324 2
Similarly, the Subject table also has a many-to-many relationship with the Book table, as a book can cover
multiple subjects, and a subject can be explained by multiple books:
ISBN Subject_ID
1590593324 1
1590593324 2
As you can see, we now have established the relationships between the Book, Author, and Subject tables.
A book can have an unlimited number of authors, and can refer to an unlimited number of subjects. We
can also easily search for books by a given author or referring to a given subject.
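A search for books by a given author goes through the link table with two joins. A sqlite3 sketch, reusing the ISBN from the examples above; the book title and second author name are invented for illustration:

```python
# Many-to-many via a link table: Book <- book_author -> Author,
# then a query for all books by one author.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE book   (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book_author (
        isbn      TEXT    NOT NULL REFERENCES book,
        author_id INTEGER NOT NULL REFERENCES author,
        PRIMARY KEY (isbn, author_id)
    );
    INSERT INTO book VALUES ('1590593324', 'Sample Book Title');
    INSERT INTO author VALUES (1, 'Jon Stephens'), (2, 'Another Author');
    INSERT INTO book_author VALUES ('1590593324', 1), ('1590593324', 2);
""")

titles = con.execute("""
    SELECT b.title
    FROM book b
    JOIN book_author ba ON ba.isbn = b.isbn
    JOIN author a ON a.author_id = ba.author_id
    WHERE a.name = 'Jon Stephens'
""").fetchall()
print(titles)  # [('Sample Book Title',)]
```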
The case of a one-to-many relationship exists between the Book table and the Publisher table. A given
book has only one publisher (for our purposes), and a publisher will publish many books. When we have a
one-to-many relationship, we place a foreign key in the table representing the "many", pointing to the
primary key of the table representing the "one". Here is the new Book table:
Since the Book table represents the "many" portion of our one-to-many relationship, we have placed the
primary key value of the Publisher in a Publisher_ID column as a foreign key.
In the tables above the values stored refer to primary key values from the Book, Author, Subject and
Publisher tables. Columns in a table that refer to primary keys from another table are known as foreign
keys, and serve the purpose of defining data relationships.
Where the First Normal Form deals with atomicity of data, the Second Normal Form (or 2NF) deals with
relationships between composite key columns and non-key columns. As stated earlier, the normal forms
are progressive, so to achieve Second Normal Form, your tables must already be in First Normal Form.
The second normal form (or 2NF) requires that any non-key column must depend on the entire primary
key. In the case of a composite primary key, this means that a non-key column cannot depend on only part
of the composite key.
In this situation, the URL for the author of the review depends on the Author_ID, and not to the
combination of Author_ID and ISBN, which form the composite primary key. To bring the Review table
into compliance with 2NF, the Author_URL must be moved to the Author table.
Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the
Third Normal Form when one column depends on another column, which in turn depends on the primary
key (a transitive dependency).
One way to identify transitive dependencies is to look at your table and see if any columns would require
updating if another column in the table was updated. If such a column exists, it probably violates 3NF.
In the Publisher table the City and State fields are really dependent on the Zip column and not the
Publisher_ID. To bring this table into compliance with Third Normal Form, we would need a table based
on zip code:
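A minimal sketch of that decomposition, again using Python's sqlite3 (the Zip table layout and the sample data are assumptions for illustration):

```python
import sqlite3

# 3NF sketch: City and State depend on Zip, so they move into their own
# table keyed by zip code, and Publisher keeps only the Zip value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Zip (Zip TEXT PRIMARY KEY, City TEXT, State TEXT)")
conn.execute("""CREATE TABLE Publisher (
    Publisher_ID INTEGER PRIMARY KEY,
    Name TEXT,
    Zip TEXT REFERENCES Zip(Zip))""")

conn.execute("INSERT INTO Zip VALUES ('10001', 'New York', 'NY')")
conn.execute("INSERT INTO Publisher VALUES (1, 'Example House', '10001')")

# City and State are now stored once per zip code; a JOIN recovers them,
# and updating the city for a zip code touches a single row.
row = conn.execute("""SELECT Name, City, State
                      FROM Publisher JOIN Zip USING (Zip)""").fetchone()
```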
A complete normalization of tables is desirable, but you may find in practice that full normalization
can introduce complexity to your design and application. More tables often mean more JOIN operations,
and in most database management systems (DBMSs) such JOIN operations can be costly, leading to
decreased performance. The key lies in finding a balance where the first three normal forms are generally
met without creating an exceedingly complicated schema.
Joining Tables
With our tables now separated by entity, we join the tables together in our SELECT queries and other
statements to retrieve and manipulate related data. When joining tables, there are a variety of JOIN
syntaxes available, but typically developers use the INNER JOIN and OUTER JOIN syntaxes.
An INNER JOIN query returns one row for each pair of matching rows in the tables being joined. Take
our Author and Book_Author tables as an example:
The third author in the Author table is missing because there are no corresponding rows in the
Book_Author table. When we need at least one row in the result set for every row in a given table,
regardless of matching rows, we use an OUTER JOIN query.
There are three variations of the OUTER JOIN syntax: LEFT OUTER JOIN, RIGHT OUTER JOIN and
FULL OUTER JOIN. The syntax used determines which table will be fully represented. A LEFT OUTER
JOIN returns one row for each row in the table specified on the left side of the LEFT OUTER JOIN
clause. The opposite is true for the RIGHT OUTER JOIN clause. A FULL OUTER JOIN returns one row
for each row in both tables.
In each case, a row of NULL values is substituted when a matching row is not present. The following is an
example of a LEFT OUTER JOIN:
The third author is returned in this example, with a NULL value for the ISBN column, indicating that
there are no matching rows in the Book_Author table.
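The difference between the two joins can be demonstrated with a small sqlite3 sketch (author names and ISBNs are invented; the table names follow the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Author (Author_ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Book_Author (ISBN TEXT, Author_ID INTEGER)")
conn.executemany("INSERT INTO Author VALUES (?, ?)",
                 [(1, 'Ann'), (2, 'Ben'), (3, 'Cal')])   # Cal has no books
conn.executemany("INSERT INTO Book_Author VALUES (?, ?)",
                 [('0-111', 1), ('0-222', 2)])

# INNER JOIN: one row per matching pair, so Cal is missing.
inner = conn.execute("""SELECT Name, ISBN FROM Author
                        INNER JOIN Book_Author USING (Author_ID)
                        ORDER BY Name""").fetchall()

# LEFT OUTER JOIN: every Author row appears; NULL (None in Python)
# fills the ISBN column where there is no matching row.
outer = conn.execute("""SELECT Name, ISBN FROM Author
                        LEFT OUTER JOIN Book_Author USING (Author_ID)
                        ORDER BY Name""").fetchall()
```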
Conclusion
Through the process of database normalization we bring our schema's tables into conformance with
progressive normal forms. As a result our tables each represent a single entity (a book, an author, a
subject, etc) and we benefit from decreased redundancy, fewer anomalies and improved efficiency.
What is SQL?
SYNTAX
N.B. The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and City
columns are of type varchar with a maximum length of 255 characters.
SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data Definition
Language (DDL).
The query and update commands form the DML part of SQL:
The DDL part of SQL permits database tables to be created or deleted. It also defines indexes (keys),
specifies links between tables, and imposes constraints between tables. The most important DDL
statements in SQL are:
SYNTAX
OR
SYNTAX
UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value
UPDATE Persons
SET City='Kingston'
WHERE FirstName='Jamiel';
SYNTAX
DELETE FROM table_name
WHERE column1='value';
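The UPDATE and DELETE statements can be exercised against the Persons table using Python's sqlite3 module. The sample rows are invented; the column names follow the earlier description of the Persons table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Persons (P_Id INTEGER, LastName TEXT,
                FirstName TEXT, Address TEXT, City TEXT)""")
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
                 [(1, 'Brown', 'Jamiel', '1 Main St', 'Kingston'),
                  (2, 'Smith', 'Ann', '2 High St', 'Bridgetown')])

# UPDATE: SET gives the new value, WHERE picks which rows change.
conn.execute("UPDATE Persons SET City='Montego Bay' WHERE FirstName='Jamiel'")

# DELETE: WHERE picks which rows are removed; without it, every row goes.
conn.execute("DELETE FROM Persons WHERE FirstName='Ann'")

rows = conn.execute("SELECT FirstName, City FROM Persons").fetchall()
```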
This will become tricky, as there are many ways to use the SELECT statement.
SYNTAX
SELECT column_name(s)
FROM table_name
OR
SELECT *
FROM table_name
OR
SELECT column_name(s)
FROM table_name
WHERE column_name(s)=value
EXAMPLES
SELECT *
FROM Persons
SELECT LastName
FROM Persons
N.B. Literal words need single quotes around them; numbers DO NOT.
WHERE FirstName='Jamiel'
N.B. Using the words "AND" and "OR", one can specify more than one column and criterion.
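AND and OR in a WHERE clause can be tried out with sqlite3 (the Persons rows below are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (FirstName TEXT, LastName TEXT, City TEXT)")
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?)",
                 [('Jamiel', 'Brown', 'Kingston'),
                  ('Ann', 'Brown', 'Bridgetown'),
                  ('Cal', 'Smith', 'Kingston')])

# AND: both conditions must hold for a row to be returned.
both = conn.execute("""SELECT FirstName FROM Persons
                       WHERE LastName='Brown' AND City='Kingston'""").fetchall()

# OR: either condition on its own is enough.
either = conn.execute("""SELECT FirstName FROM Persons
                         WHERE LastName='Smith' OR City='Bridgetown'
                         ORDER BY FirstName""").fetchall()
```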
Specific Objective 1: Explain the function and uses of the major types of software tools
Content: types of software packages and their functions and uses, including financial
packages, software development tools, statistical tools, word processors, spreadsheets, presentation
tools, database management tools, desktop publishing, graphics and personal information
management.
Productivity tools
Within the category of general purpose software is a set of application tools known as productivity tools.
These software tools were originally designed for use in businesses. The most popular productivity tools
that are available on the market today are
Word processing – produce documents such as letters, memos, newsletter and reports
Spreadsheet – process numerical data and formulae to produce tabulated information or graphs.
Also perform what-if scenario analysis.
Desktop publishing – combine text and graphics such as photographs and line drawings to create
visually appealing, high-quality pages for books, magazines etc. Examples of DTP packages include
PageMaker, MS Publisher and PagePlus.
Presentation packages – prepare great-looking visuals, such as slide shows, for
presentations, especially to large audiences.
Financial or accounting packages – track income and expenses and produce financial reports and
statements. Examples of this type of software include ACCPAC and Quicken.
Statistical tools – perform statistical functions such as variance, regression, correlation and distribution.
Spreadsheet
A spreadsheet is a program that is designed to create tables and financial schedules by allowing you to enter
data into rows and columns arranged as a grid.
Dan Bricklin and Bob Frankston invented the first practical spreadsheet for microcomputers. They called it
VisiCalc. It was released for the Apple II computer in 1979. Prior to this, spreadsheets had to be manually
re-calculated each time one cell was changed. With VisiCalc, work that might take a week could be done
in minutes. Several years after VisiCalc's release, the rights to the program were acquired by Lotus
Development Corporation, which retired it in favour of its own spreadsheet, Lotus 1-2-3, designed to run
on an IBM PC or compatible computer.
Function
Spreadsheets can do more than perform simple arithmetic calculations. A spreadsheet can translate
complicated data and reports into a combination of numbers and graphs. Modern versions include an
extensive list of financial calculators, such as interest calculations, loan amortization, even calculations for
Treasury bill rates. Statistical functions from common calculations (Chi Square, Pearson Coefficient of
Correlation and Standard Deviation) to abstruse functions like the Hypergeometric Distribution and the
Poisson distribution return the needed values with ease. There are at least 100 of these formulas included
in contemporary spreadsheets.
Features
Spreadsheets can function as basic databases. By inserting data and numbers in different columns, the
results can be sorted, searched or filtered. What-if analysis can be run using data from cells in a spreadsheet.
There are mathematical and trigonometric functions as well. A user can create a simplified mailing list in
a spreadsheet by entering names and addresses in individual columns. There are even word processing
features offering control over fonts, bold or italic typeface, size, color and page formatting.
This package allows the user to enter data and to perform calculations on them. A typical use might be to
keep simple accounts. Spreadsheets are used primarily by persons who work with numbers. They are ideally
suited for fast, accurate number crunching and forecasting.
Examples of spreadsheet on the market are Ms. Excel, Lotus 1-2-3 and Quattro Pro.
Spreadsheets hide mathematical formulae “under” the numbers, so that some of the numbers that you are
seeing on the screen are actually generated by the underlying formulae.
Built-in functions allow you to quickly and easily create complex formulae. The capability to recalculate
complex tables of numbers makes spreadsheets ideal for what-if scenarios.
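The idea that a cell displays a number while a formula sits "under" it, and that changing one input recalculates every dependent cell, can be sketched in a few lines of Python. This toy model is an illustration only, not how a real spreadsheet engine is implemented.

```python
# Toy spreadsheet: a cell holds either a plain value or a formula (a
# function of the other cells). Reading a cell evaluates its formula,
# so changing one input automatically "recalculates" its dependants.
cells = {
    "A1": 10,
    "B1": 4,
    # C1 behaves like the formula "=A1*B1".
    "C1": lambda c: get(c, "A1") * get(c, "B1"),
}

def get(c, ref):
    v = c[ref]
    return v(c) if callable(v) else v

before = get(cells, "C1")   # 10 * 4
cells["A1"] = 7             # change one input cell...
after = get(cells, "C1")    # ...and the displayed result follows
```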
You can create a variety of graphs and charts by using the program’s built in wizard.
The data are organised into cells that can contain data or formulae. The formulae can use numbers or the
contents of other cells to perform the calculations. Typical features of a spreadsheet package are:
• Enter numerical data or text.
• Enter formulae that contain numerical operations.
• Functions to perform mathematical or statistical calculations.
• Sorting of data into sequence.
• Charts of various types.
• Automatically perform a series of operations using a macro.
• Importing and exporting capabilities.
Electronic spreadsheets allow for calculation across multiple spreadsheets whereas paper spreadsheets
make this task very difficult and time consuming.
Electronic spreadsheets can be sent to numerous recipients via email and shared across different locations
in a matter of minutes. Paper spreadsheets need to be faxed or mailed; this would take more time to
distribute. Additionally, parts of the spreadsheets can be locked to prevent data from being modified.
A spreadsheet is a grid, like a paper ledger, made up of columns and rows. But unlike a paper ledger,
spreadsheets are on the computer and have more components and functions.
Row
1. The horizontal strips of the spreadsheet are called rows. They are identified by number at
the far left hand side of the spreadsheet. This is called the row header.
Columns
2. The vertical strips of the spreadsheet are columns. They are identified by letters across the
top of the spreadsheet. This is called the column header.
Cell
3. The point where the rows and columns intersect is called a cell. To activate a cell, simply
click on it.
Formula Bar
4. Data or formulas can be entered directly into the active cell or entered into the formula
bar. If you click on a cell that already has data in it, the formula used will show in this bar.
Name Box
5. The name box is located at the upper left corner before the formula bar and lists the
location of the active cell you are working on.
Shortcomings of spreadsheet
While spreadsheets are a great step forward in quantitative modeling, they have deficiencies. At the level
of overall user benefits, spreadsheets have several main shortcomings.
The practical expressiveness of spreadsheets is limited. Several factors contribute to this limitation.
Implementing a complex model requires implementing detailed layouts, cell-at-a-time. Authors
have difficulty remembering the meanings of hundreds or thousands of cell addresses that appear
in formulas
Collaboration in authoring spreadsheet formulas is difficult because such collaboration must occur
at the level of cells and cell addresses. By comparison, programming languages aggregate cells
with similar meaning into indexed variables with names that indicate meaning. Although some
spreadsheets have good collaboration features, authoring at the level of cells and cell formulas
remains a significant obstacle to collaboration in authoring spreadsheet models. On the other hand,
many people collaborate on entering numerical data and many people can use the same
spreadsheet.
These deficiencies in high-level benefits have deeper causes that, ironically, flow directly from the
signature strength of spreadsheets (that they capture the structure of models in terms of WYSIWYG sheet
layout for authors and report users).
Spreadsheets capture model logic in terms of sheet layout, especially contiguous layout of cells in
a table. Spreadsheets have weak or nonexistent methods to capture higher level structures such as
named variables, segmentation dimensions, and time series.
Formulas are subordinated to the cell layout. This forces the sheet layout to carry the structure of
the model, not variables and formulas that relate variables. This also causes a large proliferation of
cells, formulas and cell-level tasks even when only a few basic concepts are involved in a model.
This forces authors to think and work at the level of cells instead of at the level of the natural
concepts and structures of the model.
Formulas expressed in terms of cell addresses are hard to keep straight and hard to audit. Research
shows that spreadsheet auditors who check numerical results and cell formulas find no more errors
than auditors who only check numerical results.
Some sources advocate the use of specialized software instead of spreadsheets for some applications
(budgeting, statistics).
Many spreadsheet software products, such as Microsoft Excel (versions prior to 2007) and
OpenOffice.org Calc have a capacity limit of 65,536 rows by 256 columns. This can present a
problem for people using very large datasets, and may result in lost data.
Lack of auditing and revision control. This makes it difficult to determine who changed what and
when. This can cause problems with regulatory compliance. Lack of revision control greatly
increases the risk of errors due to the inability to track, isolate and test changes made to a document.
Lack of security. Generally, if one has permission to open a spreadsheet, one has permission to
modify any part of it. This, combined with the lack of auditing above, can make it easy for
someone to commit fraud.
Because they are loosely structured, it is easy for someone to introduce an error, either accidentally
or intentionally, by entering information in the wrong place or expressing dependencies among
cells (such as in a formula) incorrectly.
The result of a formula (for example "=A1*B1") applies only to a single cell (that is, the cell the
formula is actually located in - in this case perhaps C1), even though it can "extract" data from
many other cells, and even real time dates and actual times. This means that to cause a similar
calculation on an array of cells, an almost identical formula (but residing in its own "output" cell)
must be repeated for each row of the "input" array. This differs from a "formula" in a conventional
computer program which would typically have one calculation which would then apply to all of
the input in turn. With current spreadsheets, this forced repetition of near identical formulas can
have detrimental consequences from a quality assurance standpoint and is often the cause of many
spreadsheet errors. Some spreadsheets have array formulas to address this issue.
Trying to manage the sheer volume of spreadsheets which sometimes exists within an organization
without proper security, audit trails, the unintentional introduction of errors and other items listed
above can become overwhelming.
While there are built-in and third-party tools for desktop spreadsheet applications that address some of
these shortcomings, awareness and use of these is generally low. A good example of this is that 55% of
capital market professionals "don't know" how their spreadsheets are audited; only 6% invest in a
third-party solution.
Database
This package stores and manipulates large quantities of data. It allows the user to enter the data and
provides reports that allow the data to be viewed in different forms. Most modern database packages
store the data in tables and allow the data in different tables to be related in various ways. Typical
features of a database package are:
• Forms to enter the data.
• Queries that allow the data to be selected and sorted in various ways.
• Tabular reports.
History
The history of software tools began with the first computers in the early 1950s that used linkers, loaders,
and control programs. Tools became famous with Unix in the early 1970s with tools like grep, awk and
make that were meant to be combined flexibly with pipes. The term "software tools" came from the book
of the same name by Brian Kernighan and P. J. Plauger.
Tools were originally simple and lightweight. As some tools have been maintained, they have been
integrated into more powerful integrated development environments (IDEs). These environments
consolidate functionality into one place, sometimes increasing simplicity and productivity, other times
sacrificing flexibility and extensibility. The workflow of IDEs is routinely contrasted with alternative
approaches, such as the use of Unix shell tools with text editors like Vim and Emacs.
The distinction between tools and applications is murky. For example, developers use simple databases
(such as a file containing a list of important values) all the time as tools. However a full-blown database is
usually thought of as an application in its own right.
For many years, computer-assisted software engineering (CASE) tools were sought after. Successful tools
have proven elusive. In one sense, CASE tools emphasized design and architecture support, such as for
UML. But the most successful of these tools are IDEs.
The ability to use a variety of tools productively is one hallmark of a skilled software engineer.
Categories
Software development tools can be roughly divided into the following categories:
In DTP your first task is to determine the layout, followed by the text and graphics. The page layout
capabilities, combined with its precision, make DTP ideal for professionally printed documents such as
books and magazines.
Spreadsheets give greater flexibility in building formulae, performing what-if analysis, generating charts and graphs,
and in the manipulation of rows and columns of data. DBMS are ideal for data entry, building queries,
generating formatted reports and developing information systems.
Word-processing packages are the most popular type of computer software. For many people a word
processing package is the only software they use. It is hard to think of a single job where some use for
word-processing could not be found.
Many people now type their own documents directly into a computer rather than give them to a typist. This
saves time, if people can type quickly. Because the text is stored in memory once it is typed, people can
alter their text before finally saving and printing.
Now, even printing letters is becoming out-of-date because electronic mail is often used to send word-processed
documents in electronic form from one place to another. There is no need for an envelope or a stamp and the
recipient may store the letter on disk for future reference, thus saving valuable storage space.
Apart from the operating system, the only Software needed is a word-processing package. This can be a
separate package which just does word-processing or part of an integrated package, where wordprocessing is
one part of the complete package.
You can also use special voice recognition software to dictate straight into a word-processor.
2. Fewer resources are used, provided that material is carefully proofread on the screen before finally printing.
Since we can edit and correct mistakes before printing, this reduces the amount of paper used. Most large
companies now send all their internal letters and memos using electronic mail. The widespread use of
electronic mail will help to conserve valuable resources.
3. More people are able to produce their own documents rather than pass them to someone else to type. This
can save both time and money.
4. More word processors are becoming like simple desktop publishing packages. These extra facilities enable
people to produce notices, posters, tickets, etc. with very little training.
Most word processors allow you to change the fonts (type styles and sizes).
Spell checkers
Nearly all word processors have a dictionary against which the words in a document can be compared to check
their spelling. Most allow you to add words to the dictionary which is useful if you use special terms in
subjects such as law or medicine. It is important to note that spell checking a document will not remove all
the errors. For instance, if you intended to type 'the' and typed 'he' instead, then the spell checker will not
detect this since 'he' as a word is spelt correctly. After using a spell checker it is still necessary to proofread a
document.
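The limitation described above can be seen in a tiny dictionary-based sketch (the word list and sample text are invented):

```python
# A spell checker flags words not found in its dictionary. A wrong word
# that IS in the dictionary ('he' typed instead of 'the') slips through,
# which is why proofreading is still needed after a spell check.
dictionary = {"he", "the", "cat", "sat"}

def misspelled(text):
    return [w for w in text.lower().split() if w not in dictionary]

flags = misspelled("he cat zat")   # 'he' passes even if 'the' was meant
```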
Thesaurus
A thesaurus is useful for creative writing (perhaps for English GCSE coursework). It allows you to highlight
a word in a document and the computer lists words with similar meanings (called synonyms)
Mail merge
Merging involves combining a list of, say, names and addresses with a typed letter, so that a series of similar
letters is produced, each addressed to a different person. The list is either created using the word processor
or by importing data from a database of names and addresses. The letter is typed, using the word processor,
with blanks where the data from the list is to be inserted.
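A minimal mail-merge sketch in Python: a letter template with blanks, filled in once per row of a hypothetical name-and-address list (all names and addresses are made up).

```python
# The {name} and {address} blanks play the role of merge fields in the
# letter; each record from the "database" produces one personalised letter.
template = "Dear {name},\nYour order will be sent to {address}.\n"

records = [
    {"name": "Jamiel Brown", "address": "1 Main St, Kingston"},
    {"name": "Ann Smith", "address": "2 High St, Bridgetown"},
]

letters = [template.format(**row) for row in records]
```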
Indexing
Macros
Macros allow you automatically to produce a sequence of keystrokes so that, for example, you can just press
one key or a combination of keys and have your name and address printed at the top of the page. You can
also insert dates just by pressing a couple of keys. Macros are very useful for things that need to be done
repetitively.
Grammar checkers
Some of the more sophisticated word processors have a feature called a grammar checker. This is useful if
your English is not so good. If the word processor you use does not have this facility, then you can buy a
separate grammar checking package to use with it. Figure 28.5 shows a grammar checker being used.
Because of the complexities of the English language, grammar checkers do have their limitations. As yet
they find only a few faults and do tend to provide an incorrect analysis of the grammar of writing. You need
to exercise care when using them.
Print preview - using the print preview you can see the whole page (as a smaller version) before you print
it out. This can save time and paper.
Templates and wizards - here the framework of the document (fonts, sizes, headings, etc.) is laid out for
you. You do not need to worry about the layout. All you have to do is supply the text.
Save in different file formats - this means you can transfer your wordprocessed material to almost any other
package.
Word-count - this counts the number of words in a document or part of a document. This is useful if you
have to write essays or course work of a certain number of words.
On-line help - you can search for help by entering a keyword or by selecting a topic from a list. Good on-
line help is particularly important as a lot of software is supplied without manuals.
Drawing tools - these tools allow you to produce arrows, shapes, boxes, etc. without having to use a special
graphics package.
Zooming in and out - particularly useful if you have difficulty seeing small text on the screen.
Tables - rather than use a spreadsheet you can use the tables function in the wordprocessor. If there are
numbers in the tables you can even perform simple calculations on them.
Most people's favourite word processor is the one they are used to, but there are many new and useful
features in the newer packages to tempt you. How do you decide which one to use? You may like to ask
yourself the following questions to help you decide.
1. If you are a fast typist you would look for a wordprocessor that could keep up with
your typing speed. Windows word-processors can be slow on older computers, so a DOS based
package might be better.
2. If you would like to do simple desktop publishing, then many of the Windows,
Macintosh or Windows 98-based word processors would be suitable. Many of these have some of the
features you used to find only in DTP packages.
3. If you need to send out a number of letters to lots of different people, it would be best to go for a word
processor with a mail merge facility. Mail merging allows you to create standard letters and link them
to a database of names and addresses so that you can make it appear that the letters are all tailor made.
4. If you use a variety of word processors, then you should check whether you can transfer the files
between the different packages. With some it is easy, but with others you may find that one word
processor is unable to read files from a different word processor.
In this tutorial we use a plain text editor (like Notepad) to edit HTML. I believe this is the best way to
learn HTML.
Always save the document with whatever name you choose, ending with ".html".
<html>
<body>
</body>
</html>
The code <html> </html> tells us that you want to create a web page. Please note a webpage must have
an open and a close tag.
The code <body></body> defines what will be shown on the web page. Please note the body must have an
opening and a closing tag.
HTML headings are defined with the <h1> to <h6> tags. A heading is what we call the title. You can
change the number to get different sizes: <h1> is the largest while <h6> is the smallest. All heading tags must
have an opening and a closing tag.
This is called text aligning, where you can choose for the words to be aligned left, right or centre. We can do
this from within the "h" or "p" tags. For instance, if I wanted to centre the title I would use the following code:
<h1 style="text-align:center">
Similarly, if I wanted to centre the paragraph I would use the following code:
<p style="text-align:center">
We use the following codes when we want to give the web page some life.
<body style="background-color:yellow">
We write the code within the body tag. Also notice that the word colour is spelt the American way, color,
as that is the way the computer understands it.
We use the following codes to change the style of the writing. We do this in the <body> tag, the <p> tag
or the <h> wherever you would like to change the style of the writing.
<body style="font-family:arial">
Note that you must type the code "font-family:" before the name of the font you would like to use.
We use the following codes to change the colour of the writing. We do this in the <body> tag, the <p> tag
or the <h> wherever you would like to change the colour of the writing.
<p style="color:red">
<body style="color:blue">
Note that you must type the code “color:” before the name of the colour you would like to use. If you
want a particular paragraph to have a different colour you use the <p> tag or if you want the whole page
to have a particular colour you use the <body> tag and similarly if you want your title to have a different
colour you use the <h> tag.
We use the following codes to change the size of the words. We do this in the <body> tag or the <p> tag,
wherever you would like to change the size of the words.
<p style="font-size:10px">
<body style="font-size:15px">
Note that you must type the code “font-size:” before how big you would like your words to look. If you
want a particular paragraph to have a different size you use the <p> tag or if you want the whole page to
have a particular font size you use the <body> tag.
Tables are defined with the <table> tag. A table is divided into rows (with the <tr> tag), and each row is
divided into data cells (with the <td> tag). The letters td stand for "table data", which is the content of a
cell. A data cell can contain text, images, lists, paragraphs, forms, horizontal rules, tables, etc.
<table>
<tr>
<th>Day</th>
<th>Subject</th>
</tr>
<tr>
<td>Monday</td>
<td>Information Technology</td>
</tr>
</table>
The table has an opening and a closing tag. The rows and columns are defined within these tags <table>
</table>.
If you would like to make more rows and columns, simply copy the <tr> to </tr> code.
Notice in the code above there are some tags called <th></th>. Within these tags is the table header,
which gives the title for each of the columns.
Images are inserted with the <img> tag; the src attribute names the image file:
<img src="mypicture.jpg"/>
A web portal is most often a specially designed web site that brings information together from
diverse sources in a uniform way. Usually, each information source gets its dedicated area on the page for
displaying information.
Justification: Yahoo! is an American multinational technology company. It operates a portal that provides
the latest news, entertainment, and sports information. It is important to note that the portal also gives
users access to other Yahoo services like Yahoo! Search, Yahoo Mail, Yahoo Maps, Yahoo Finance,
Yahoo Groups and Yahoo Messenger. For this reason, it can be classified as a suitable web portal.
News
A news website is one that is created for the main purpose of providing accurate and up-to-date
news information from around the region and the world and gathering it together in one spot so the user
can see it. A news website can sometimes be the online version of a newspaper.
Justification: The BBC News website is one of the largest online newspapers in the world and is the
most frequently accessed news website in the United Kingdom. The website contains international news
coverage, as well as British, entertainment, science, and political news. Many reports are accompanied by
audio and video from the BBC's television and radio news services, while the latest TV and radio bulletins
are also available to view or listen to on the site together with other current affairs programmes. Therefore,
it can be classified as a news website.
Blog
A blog is a discussion or informational website published on the World Wide Web consisting of discrete,
often informal diary-style text entries ("posts").
Example: Lifehacker
Justification: Lifehacker is a weblog about life hacks and software (a life hack refers to any shortcut or
trick that increases productivity in all walks in life). The blog posts cover a wide variety of topics that
discuss both software and life itself. Topics include Microsoft Windows, Mac, Linux programs, iOS and
Android, as well as general life tips and tricks. The staff of Lifehacker updates the site about 18 times per
weekday, with a reduced schedule on weekends. This is in accordance with a blog’s characteristic of periodic
updates. Because of its nature to share stories, tricks and advice on a timely basis, Lifehacker can easily be
classified as a blog.
Social Networking
A social networking site is an online platform that is used by people to build social networks or
social relations with other people who share similar personal or career interests, activities, backgrounds or
real-life connections. Depending on the social media platform, members may be able to contact any other
member. In other cases, members can contact anyone they have a connection to, and subsequently anyone
that person has a connection to.
Example: Facebook
Justification: Facebook is easily, and by far, “the biggest and most recognizable social media network on
the Internet, both in terms of total number of users and name recognition. Founded on February 4, 2004,
Facebook has within 12 years managed to accumulate more than 1.59 billion monthly active users and this
automatically makes it one of the best mediums for connecting people from all over the world with your
business.” (Maina, 2016). Facebook provides a plethora of options for individuals to connect to friends,
Personal
A personal website is a set of World Wide Web pages created by an individual to contain content of a
personal nature rather than content pertaining to a company, organization or institution. Personal web
pages are primarily used for informative or entertainment purposes but can also be used for personal
career marketing.
Justification: Josh Johnson is a personal website created by a man of the same name. It is used to sell
himself as a very skilled web designer. Characteristic of a personal website, JoshJohnson contains
personal information of Johnson as well as being used to market his personal career.
Business/Marketing
A business website is a collection of webpages that are used to carry out some form of business
and/or marketing purposes. They are set up so that individuals can purchase a product, request a service or
market a product of their own. Businesses use these websites to expand their range of customers and carry
out transactions online.
Example: Amazon.com
Justification: Amazon is an American electronic commerce company. It is the
largest Internet-based retailer in the world by total sales and market capitalization. Amazon more than
qualifies as a business website as it allows users to purchase a wide range of products from their stores by