S2:
Data
Structures
and
SQL
Shawndra
Hill
Spring
2013
TR
1:30-3pm
and
3-4:30
APIs
Phases
in
the
DM
Process:
CRISP-DM
Business
Understanding
Data
Understanding
Data
PreparaIon
Modeling
EvaluaIon
Deployment
www.crisp-dm.org
4
CRoss
Industry
Standard
Process-DM
Business
Understanding:
Understanding
project
objecIves
and
data
mining
problem
idenIcaIon
Data
Understanding:
Capturing,
understand,
explore
your
data
for
quality
issues
Data
Prepara2on:
Data
cleaning,
merge
data,
derive
aUributes
etc.
Modeling:
Select
the
data
mining
techniques,
build
the
model
Evalua2on:
Evaluate
the
results
and
approved
models
Deployment:
Put
models
into
pracIce,
monitoring
and
maintenance
5
Data
Structures
A
data
structure
is
a
scheme
for
organizing
data
in
the
memory
of
a
computer.
Some
of
the
more
commonly
used
data
structures
include
lists,
arrays,
stacks,
queues,
heaps,
trees,
and
graphs.
Binary Tree
Data
Structures
The
way
in
which
the
data
is
organized
aects
the
performance
of
a
program
for
dierent
tasks.
Computer
programmers
decide
which
data
structures
to
use
based
on
the
nature
of
the
data
and
the
processes
that
need
to
be
performed
on
that
data.
Binary Tree
Example:
A
Queue
A
queue
is
an
example
of
commonly
used
simple
data
structure.
A
queue
has
beginning
and
end,
called
the
front
and
back
of
the
queue.
Data
enters
the
queue
at
one
end
and
leaves
at
the
other.
Because
of
this,
data
exits
the
queue
in
the
same
order
in
which
it
enters
the
queue,
like
people
in
a
checkout
line
at
a
supermarket.
Example:
A
Binary
Tree
A
binary
tree
is
another
commonly
used
data
structure.
It
is
organized
like
an
upside
down
tree.
Each
spot
on
the
tree,
called
a
node,
holds
an
item
of
data
along
with
a
leY
pointer
and
a
right
pointer.
Binary Tree
Example:
A
Binary
Tree
The
pointers
are
lined
up
so
that
the
structure
forms
the
upside
down
tree,
with
a
single
node
at
the
top,
called
the
root
node,
and
branches
increasing
on
the
leY
and
right
as
you
go
down
the
tree.
Binary Tree
Choosing
Data
Structures
By
comparing
the
queue
with
the
binary
tree,
you
can
see
how
the
structure
of
the
data
aects
what
can
be
done
eciently
with
the
data.
Choosing
Data
Structures
A
queue
is
a
good
data
structure
to
use
for
storing
things
that
need
to
be
kept
in
order,
such
as
a
set
of
documents
waiIng
to
be
printed
on
a
network
printer.
Choosing
Data
Structures
The
jobs
will
be
printed
in
the
order
in
which
they
are
received.
Most
network
print
servers
maintain
such
a
print
queue.
.
Choosing
Data
Structures
A
binary
tree
is
a
good
data
structure
to
use
for
searching
sorted
data.
The
middle
item
from
the
list
is
stored
in
the
root
node,
with
lesser
items
to
the
leY
and
greater
items
to
the
right.
Choosing
Data
Structures
A
search
begins
at
the
root.
The
computer
either
nd
the
data,
or
moves
leY
or
right,
depending
on
the
value
for
which
you
are
searching.
Each
move
down
the
tree
cuts
the
remaining
data
in
half.
Choosing
Data
Structures
Items
can
be
located
very
quickly
in
a
tree.
Telephone
directory
assistance
informaIon
is
stored
in
a
tree,
so
that
a
name
and
phone
number
can
be
found
quickly.
Lecture Outline
See the Access Database File (Lab1_Database_BBB.accdb) for this lecture on the course website ! Relational Databases
Tables Columns Rows
Queries
Query Design View Structured Query Language (SQL View)
Selecting rows meeting a condition Algebraic operations Text matching (with LIKE) Computing summary figures Joining multiple tables
Inserting, updating, and deleting data
17
Databases: Overview
Overview what databases are, and what a database management system is. Present key concepts and vocabulary. Demonstrate Microsoft Access and the BBB database.
BBB: Bookbinders Book Club We are using a reduced version, of a real database, for class discussion Full version exists
18
Rationale
Why do we have databases?
Because data are valuable and essential for conduct of business Because data are created in huge volumes, and need to be retained (on disk) and efficiently accessed
Why do we have DBMSs?
To manage the huge amounts of data and to facilitate use of these data
Our focus will be on extracting data from databases in useful ways. a.k.a. querying.
19
RelaIonal
Databases
Relational databases
Databases that use a series of logically related two-dimensional tables to store their information Tables are comprised of fields/records, which in turn contain field values
Field
Last Name SS#
Student
DOB 06/11/84 1/1/85 12/31/81 3/3/88
Table
Major IS FIN MKT ACC
20
Record
Smith Kim
100201122 200202222 300201232 999132212
Field value
Davis Pat
RelaIonal
Databases
Relational Database Tables Records Fields Field values Bytes, bits
Field
Last Name SS# 100201122 200202222 300201232 999132212
Student
DOB 06/11/84 1/1/85 12/31/81 3/3/88
Table
Major IS FIN MKT ACC
21
Record
Smith Kim
Field value
Davis Pat
Why
do
we
need
relaIonal
databases?
Problems arise when data is stored in one big file
Order# Date
Customer ID Last Name First Name Address
ISBN
Book Name
Author
Price
1 9/1/03 C1001 2 9/2/03 C1004 3 9/3/03 C1002 4 9/4/03 C1003 5 9/5/03 C1003 6 9/6/03 C1001 7 9/7/03 C1002 8 9/8/03 C1001
Bezos Sproull Student Gates Gates Bezos Student Bezos
Jeff Lee Pat Bill Bill Jeff Pat Jeff
1 Amazon Plaza Dean's Office, Stern School, New York
#0465039138 Code and other laws of cyberspace Lessig, Lawrence Digital Copyright: Protecting #1573928895 Intellectual Property on the Internet Litman, Jessica Haag, Stephen Barabasi, AlbertLaszlo Rheingold, Howard Rheingold, Howard Litman, Jessica Rheingold, Howard
$25.00 $55.00 $98.75 $34.95 $29.95 $29.95 $55.00 $29.95
Tisch LC-12, New York #0072952849 MIS in the Information Age Microsoft Corporation, Redmond Microsoft Corporation, Redmond Linked: The New Science of #0738206679 Networks
Smart Mobs: The Next Social #0738206083 Revolution Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution Digital Copyright: Protecting Tisch LC-12, New York #1573928895 Intellectual Property on the Internet Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution
22
Insertion anomalies
Anomalies
in
un-normalized
data
Inability to insert a piece of information about an object that can exist independently of another object without having to insert a piece of information about the other object Example: Adding a new customer/book before it is ordered
Deletion anomalies
The loss of a piece of information about one object when a piece of information about a different object is deleted Example: Deleting an order => deleting a customer/book
Update anomalies
A need to change the same piece of information about an object multiple times Example: Changing Bill Gates address
23
A
normalized
version
of
Amazon s
data
Normalization
The process of assuring that a database can be implemented effectively as a set of two-dimensional tables Prevents insertion, deletion and update anomalies
24
Benefits of Normalization
Greater
overall
database
organizaIon
Minimize
data
redundancies
Data
consistency
within
the
database
A
more
exible
database
design
Data
can
be
used
more
producIvely
A
beUer
handle
on
database
security
Disadvantage of Normalization
Reduced database performance because database must locate requested tables and join data - requires additional processing logic A lot of Planning goes in to the design of a database
Primary keys
ConnecIng
tables
together
A field (or group of fields in some cases) that uniquely describes each record in a table Examples: Customer ID, ISBN, Order#
Foreign keys
A field that is a primary key in one table and appears in a different table (though not as the primary key) Examples: Customer ID in Orders
Integrity constraints
rules (most built in) that help ensure the quality of the information. Not NULL
26
Concepts & Terminology
Database: an organized collection of information about entities Sales, business transactions, personnel, inventory, products, Entities have attributes. e.g. Person entity has attributes: Last Name, First Name, IDnumber, Database management system (DBMS): a computerized recordkeeping system. DBMS: software for managing (creating, modifying, viewing, retrieving information, etc.) from databases. Note: We focus on Relational DBMSs (RDBMs), the kind most widely used in business
27
Let s Look at Microsoft Access
Microsoft Access: A RDBMS, part of the Microsoft Office Professional suite, bundled with Word, Excel, PowerPoint, NOT an industrial strength DBMS Very useful for small applications, data analysis, decision making Works well with Excel; easy to use
28
ACCDB files
Access 2007/2010 files end in .accdb Previous versions ended in .mdb One database = one .accdb file Access handles one database file at a time. Each database consists of a number of tables. In other databases, you might find one file per table, rather than just one file per database. You will notice a temporary .laccdb file created whenever you open your Access database: this is a lock file that prevents problems when there is concurrent access by multiple users.
29
Industrial Strength Databases
Different brands of RDBMSs:
Commercial providers: Oracle, Microsoft SQL Server, DB2, Sybase, Informix, Open Source (free): MySQL, PostgreSql, All share a common view of things: tables All may be interacted with using SQL Knowledge of one directly transfers to all
30
Launch Microsoft Access
Open Lab1_Database_BBB.accdb The database window: your portal into the database
There are 6 tables in this database. Double-click a table to open it.
31
Relationships between tables
To explore the relationships between tables Go to Database Tools tab Click on Relationships You will see an overview of tables and their interconnections (see next slide)
32
Relationships between Tables
Primary key for Product table is ProdNum
33
Concepts & Terminology
Tables and Relationships In an RDBMS, all records (all data) are kept in tables (a.k.a. relations).
A relation = A table A relationship = A cross-reference between tables
The database consists of tables, each table having many rows (a.k.a. records or tuples). The 'Relationships View' of the database shows the tables and their attributes (columns), and the crossreferences between tables.
34
Structure vs Content
of a Database Table To view or edit the structure of a table:
Right-click it, then choose Design View or Table Design
To view or edit the contents of a table:
Double-click it, or Right-click it, then choose Open
35
Structure of a Table
Design View of Customers Table (ACCTNUM is Key)
Primary key for Customer table is AcctNum
36
Concepts & Terminology
Design View of a Single Table (e.g. Customers) Every table is made-up of attributes Every attribute has a data type Every table has a primary key: one or more attributes whose values uniquely identify each table record. Primary keys are shown with a key icon in Design View
37
Contents
of a Database Table If in Design View, and you want to view or edit the contents of the table:
Choose Datasheet View from the Design tab
38
Data in the Table
A Portion of the Customer Table
Records (rows) Attributes (columns) Fields (row elements or "cells").
39
Design View of Purchase
A two-attribute primary key: ACCTNUM and CAMPNUM Details of primary key assignment and database design are beyond the scope of this course. Typically, though, it is best to create a single, system generated (auto-incremented) column, that uniquely identifies each row.
40
Indexes
to improve performance
To improve the performance of your database queries, you should create indexes on any columns that are regularly looked up. Creating an index is easy: simply indicate Indexed as Yes in the settings for the column in Design view.
41
Key Points
Tables: hold all data in a database Rows = records Columns = attributes Attributes: have a data type (number, text, date, etc.) Fields: at the intersection of a row and a column Primary Keys: attributes whose values uniquely refer to any given row in a table
42
Single-Table Queries: Overview
Querying single tables:
Intuitions Microsoft Access Query Design View SQL Microsoft SQL-details
Database design intuitions
(Why split one table into many small ones)
43
Querying single tables
Table structure
Attributes (columns) Records (rows) Column order and row order do not matter
Queries subdividing a table
Pick the table you want to look at Pick the column(s) you wish to see Pick the row(s) you wish to see
44
Querying single tables
BBB Customers: What data do we have about customers
Name (first, middle, last) Address (street num, street, city, state, zip) Demographics (gender, money non-book purchases)
What do we want to know
45
Querying single tables: Intuitions
Names of all male customers
ACCTNUM 13015 13016 13017 13018 13971 14200 17640 17641 17642 17643 17644 17645 17646 19630 19631 19632 19633 19634 19635 23088 23089 23090 23360 23361 FIRSTNAME DENNIS LOIS JANET ANN SAMUEL EARL AUDREY DOROTHY SANDRA GERALD BETTY LYNN MARY CATHERINE EDITH KIMBERLY MARILYN JOSEPH ROBERT JAMES PATRICIA MARY CATHERINE HARRIET MIDDLEN AME FAMILYNAME N SKRYPEK T ENGLANDER SIECK A MARUZZI E HAYES J TUCHMAN S KAUFFMANN VACCHER M POSNOCK C CANNON K DEL HUEBBERS C TALLUTO R CASANO E WISCHHUSEN A SCHOENBERG FLAHERTY L FAIELLA A SANDLER M MILLER WERNER J MILLER J IMPERATORE A FAILLA STREET NUMBER STREET 1260 34th AV 105 9th ST 767 3rd ST 1948 10th ST 382 7th ST 1707 29th ST 1002 RICHMOND HILL AV 1751 17th AV 1016 MOON AV 1841 2nd AV 1794 MARIA ST 1357 14th AV 1861 2nd ST 965 7th ST 1520 1st ST 1081 18th ST 33 24th AV 1322 4th AV 316 18th ST 409 11th AV 1535 MARIS ST 379 5th ST 1671 24th ST 140 JAMES ST CITY PRT WASHINGTN WILLINGBORO FOXBORO TEWKSBURY CENTERVILLE LAKEWOOD WEST ORANGE BROOKLYN RIDGWAY LEXINGTN MULLICA ANDOVER NEPTUNE SCRANTON PITTSBURGH KINGSVILLE WARREN CLEVERDALE PHILADELPHIA SPRING LONG BEACH FRAMINGHAM BROOKLYN REHOBOTH STATE NY NJ MA MA PA NJ NJ NY PA MD NJ MA NJ PA PA MD NJ NY PA NY NY MA NY DE ZIP GENDER MONEY 11050 M 164 8046 F 254 2035 F 169 1876 F 65 16404 M 81 8701 M 135 7052 F 236 11234 M 68 15853 F 180 20653 M 159 8062 F 112 1810 F 41 7753 F 102 18503 F 209 15243 F 89 21087 F 262 7059 F 170 12820 M 200 19148 M 93 10977 M 63 11561 F 137 1701 F 50 11226 F 187 19971 F 190
46
Querying single tables: Intuitions
Names of all male customers
MIDDLE FIRSTNAME NAME DENNIS N SAMUEL E EARL J DOROTHY GERALD C JOSEPH L ROBERT A JAMES M
FAMILYNAME SKRYPEK HAYES TUCHMAN VACCHER CANNON FAIELLA SANDLER MILLER
47
Querying single tables: Intuitions
Query variations on single tables: 1. Pick attributes (columns) 2. Pick records (rows) specifying a single condition
(e.g. Pick all rows where Gender = 'M')
3. 4. 5. 6. 7. 8. 9.
Pick rows using two or more conditions Selecting rows with attributes you do not display Using string matches to pick rows Renaming an attribute (column) name Arithmetic operations in a query Sorting the output (ordering the rows) Mathematical aggregations in a query
48
Querying single tables:
Microsoft Access Query Design View
Go to the Create tab Choose either Query Wizard or Query Design Pick one table (for querying single tables) Pick the columns you want to retrieve, and specify the criteria for the rows you want to retrieve. Once in Query Design view, choose Design tab, then Run (! button on the toolbar) OR Datasheet View to see your query results.
49
Query Wizard
50
Query Wizard
Specify which table your data is in.
Press > button to select columns one at a time. Press >> button to select all columns.
51
Query Wizard
Choose Detail to retrieve individual rows that meet certain criteria. Choose Summary to compute aggregates (e.g. min, max, average, ..)
52
Query Wizard
Name your query, so that you can save it and view / modify it later
53
Query Design View
Select the Table you want to query, then press Add the button. Press Done button when finished.
54
Query Design View
Selecting Columns
Double-click the columns you want to include in your query, or drag-and-drop them from the table into the grid view at the bottom Double-click * to include them all
55
Query Design View
Selecting Columns
The grid view at the bottom shows you whether the column is visible (tick Show row). It also lets you specify filter conditions, using the Criteria row.
56
Query Design View
Selecting Columns
View the query results
Show tables, so you can add more columns to your query results.
Choose whether you want to: View data (SELECT) Insert a table (MAKE) Insert a row (APPEND) Edit a row (UPDATE) Produce aggregate reports (CROSSTAB) Remove rows of data (DELETE)
57
Viewing Saved Queries
Click on the down arrow to the right of the Tables heading in the database view Then choose Queries
58
Query Design View
Query 1: Picking attributes (columns)
Drag and drop
59
Query Design View
Viewing the query results
60
Query Design View
Saving the query
Right-click in the title bar for the Query Then choose Save from the pop-up menu Choose an intuitve and descriptive name (e.g. Names of our customers )
61
Viewing Saved Queries
After you save the SELECT query, it will look like this in the query list on the left
62
Query Results View
Switching back to the query design
63
Query Design View
Making a copy of a query
Right-click in the Query list on the left Choose Copy Right-click again, and choose Paste Choose an intuitive name for your new query
64
Query Design View
Query 2: Picking records (rows)
Drag and drop your criterion into the grid Specify the filter condition (e.g. = PA )
65
Query Design View
Query 2: Picking records (rows)
Now press Run or press DataSheet View to view your results Notice how the results only include customers who live in Pennsylvania (PA)
66
Query Design View
Query 3.1: Picking rows with two conditions - AND
67
Query Design View
Query 3.1: Picking rows with two conditions - AND
Here there are no records satisfying both conditions.
68
Query Design View
Query 3.2: Picking rows with two conditions - AND
69
Query Design View
Query 3.2: Picking rows with two conditions - AND
The ACCTNUM column is numeric, so specifying a string condition causes a data type mismatch error..
70
Query Design View
Query 3.3: Picking rows with two conditions - OR
71
Query Design View
Query 3.3: Picking rows with two conditions - OR
72
Query Design View
Query 4: Picking rows using attributes you do not display
Uncheck the Show box if you don t want to show the column in your results
73
Query Design View
Query 4: Picking rows using attributes you do not display
Notice that the customers Last Name is not shown, because we unchecked the Show box for Last Name in the Query Design View in the previous slide. 74
Query Design View
Query 5: Using string matching to pick rows
LIKE means match wildcards The wildcard * means match any number of characters So LIKE *IL* means find any strings that have IL anywhere inside them. That is, anything before IL and anything after. You can also use ? to match a single character In most other databases: use % instead of * use _ instead of ? You must use the keyword LIKE if you use a wildcard (e.g. * or ?). If you forget LIKE the database will look for an actual * or ? and you won t get the matches you wanted !!
75
Query Design View
Query 5: Using string matching to pick rows
Notice here we have all customers whose familyname has 'IL' in the middle of the string
76
Query Design View
Query 6: Renaming an attribute (column) name
[New column name] : [Old column name]
77
Query Design View
Query 6: Renaming an attribute (column) name
New column name appears in results, in place of old column name.
78
Query Design View
Query 7: Arithmetic operations in a query
Simply type the formula in the Field row. No loops or VBA code are necessary.
79
Query Design View
Query 7: Arithmetic operations in a query
SQL is called a declarative language, because we tell the database what we need, rather than how to get it, and the database gets the result in the most efficient way possible. There is no need to write loops. In contrast, VBA is a procedural language: we have to write loops to tell VBA exactly how to get the data we want. Procedural languages are far more verbose, and require more code to get the same job done.
80
Query Design View
Query 8: Sorting the output (ordering the rows)
81
Query Design View
Query 8: Sorting the output (ordering the rows)
Notice our results are now sorted by Tax, from highest to lowest
82
Query Design View
Query 9.1: Mathematical aggregations Average for all customers
Other aggregate functions: Min, Max, Count, See the Query Design Wizard.
83
Query Design View
Query 9.1: Mathematical aggregations Average for all customers
We can see the average Money for all customers in the data set
84
Query Design View
Query 9.2: Mathematical aggregations Average for a subset of customers
Aggregate function returns one row.
Non-Aggregate criterion returns multiple rows.
85
Query Design View
Query 9.2: Mathematical aggregations Average for a subset of customers
Aggregate function returns one row.
Non-Aggregate criterion returns multiple rows.
86
Query Design View
Query 9.2: Mathematical aggregations Average for a subset of customers
Solution: Do not show the non-aggregate filter condition column. That will fix the error.
87
Query Design View
Query 9.2: Mathematical aggregations Average for a subset of customers
We can see the average Money for all customers in Pennsylvania
88
Query Design View
Query 9.3: Mathematical aggregations Average for each group of customers
Sort by average money spent for the state.
Group customers by state.
89
Query Design View
Query 9.3: Mathematical aggregations Average for each group of customers
We can see the average money spent for customers in each state, and we ve sorted by average money so we can easily see which are most lucrative states are.
90
Query Design View
Building Complex Expressions
Right-click in the Field row. Then choose Build to build a more complex expression.
91
SQL
General query framework SQL is the language underlying Microsoft Access, and is the query language standard used by all major RDBMS
SELECT <pick attribute(s)>
in the order you want to see them
FROM <pick the table> [WHERE] <pick the row(s)>
92
Editing SQL
Just like "Editing a recorded VBA Macro : Write a query using Query Design View and then edit using SQL View or Open the SQL View from a new query and write in SQL from scratch
93
SQL View
Viewing the SQL underlying all your graphical query designs
For any query, you can view the SQL that Microsoft Access wrote for you, by simply choosing View and then SQL View. Editing the SQL directly is quicker and more powerful, so expert users tend to use SQL view, whereas novices tend to use the Graphical Design view.
94
SQL
Query 1: Picking attributes (columns)
SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER SELECT STATE, ACCTNUM, FAMILYNAME FROM CUSTOMER
Attribute order does not matter
95
SQL
Query 2: Picking records (rows)
SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA'
ACCTNUM FAMILYNAME 17642 POSNOCK 19630 CASANO 19631 WISCHHUSEN 19635 SANDLER 50670 KEYSER STATE PA PA PA PA PA ACCTNUM FAMILYNAME 19630 CASANO 50670 KEYSER 17642 POSNOCK 19635 SANDLER 19631 WISCHHUSEN STATE PA PA PA PA PA
Order of rows in output does not matter, and is not necessarily predictable
96
SQL
Query 3.1: Picking rows with two conditions - AND
SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME = 'MILLER' SQL is case sensitive within quotes Microsoft Access is NOT SELECT acctnum, familyname, state FROM customer WHERE state = 'PA' and familyname = 'Miller' SELECT acctnum, familyname, state FROM customer WHERE state = 'PA' AND familyname = 'MILLER'
97
SQL
Query 3.2: Picking rows with two conditions - AND Remember that all attributes have a datatype SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND ACCTNUM = 'MILLER'
ACCTNUM is a NUMBER MILLER is a STRING
98
SQL
Query 3.3: Picking rows with two conditions - OR
SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' OR FAMILYNAME = 'MILLER' SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND (FAMILYNAME = 'MILLER' OR FAMILYNAME = 'SMITH')
99
SQL
Query 4: Picking rows using attributes you do not display SELECT ACCTNUM, STATE FROM CUSTOMER WHERE STATE = 'PA' OR FAMILYNAME = 'MILLER'
100
SQL
Query 5: Using string matching to pick rows*
SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE '%IL%' SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE 'IL%' SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE '_IL'
% matches any number of characters (Use * in Access)
_ matches any one character (Use ? in Access)
Standard SQL notation shown above Microsoft Access notation differs as shown
SQL
Query 6: Renaming an attribute (column) name
SELECT ACCTNUM, FAMILYNAME AS Last_Name, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE '*IL*' SELECT ACCTNUM, FAMILYNAME AS [Last Name], STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE '*IL*' Bracket notation to allow white-space is unique to Microsoft
102
SQL
Query 7: Arithmetic operations in a query
SELECT ACCTNUM, FAMILYNAME AS Last_Name, MONEY * 0.10 AS Taxed_Expense FROM CUSTOMER WHERE FAMILYNAME LIKE '*IL*' SELECT ACCTNUM, FAMILYNAME AS Last_Name, (MONEY + ACCTNUM)*1.09 AS Random_nonsense FROM CUSTOMER WHERE FAMILYNAME LIKE '*IL*'
103
SQL
Query 8: Sorting the output (ordering the rows)
SELECT ACCTNUM, FAMILYNAME AS Last_Name, MONEY * 1.09 AS Taxed_Expense FROM CUSTOMER You must use the original WHERE FAMILYNAME LIKE '%IL%' column name in your ORDER BY clause. ORDER BY FAMILYNAME ASC Use ASC for Ascending order or
DESC for Descending order.
SELECT ACCTNUM, FAMILYNAME AS Last_Name, MONEY * 1.09 AS Taxed_Expense FROM CUSTOMER If you use the new column name, it won t work. WHERE FAMILYNAME LIKE '%IL%' ORDER BY Last_Name DESC
104
SQL
Query 9.1: Mathematical aggregations Average for all customers
SELECT AVG(MONEY) AS Average_Expense FROM CUSTOMER
105
SQL
Query 9.2: Mathematical aggregations Average for a subset of customers
SELECT AVG(MONEY) AS Average_Expense, STATE FROM CUSTOMER WHERE STATE = "PA"
SELECT AVG(MONEY) AS Average_Expense, STATE FROM CUSTOMER
Be careful when selecting both aggregations and ordinary attributes
106
SQL
Query 9.2: Mathematical aggregations Average for a subset of customers
SELECT AVG(MONEY) AS Average_Expense, STATE FROM CUSTOMER WHERE STATE = "PA"
Be careful when selecting both aggregations and ordinary attributes
107
SQL
Query 9.3: Mathematical aggregations Average for each group of customers
SELECT Avg(CUSTOMER.[Money]) AS [Average Money], CUSTOMER.State FROM CUSTOMER GROUP BY CUSTOMER.State ORDER BY Avg(CUSTOMER.[Money]) DESC;
108
Microsoft Flavored SQL
Attribute names in the SELECT clause are prefaced by table name
(e.g. CUSTOMER.ACCTNUM)
Use of the * versus % in matching multiple characters in LIKE expression Use of ? Versus _ in String matching single character in LIKE expression Use of [ ] in renaming to allow white spaces e.g. CUSTOMER.FAMILYNAME AS [Last Name] Case sensitivity (Microsoft Access is case insensitive e.g. State = "PA" vs. State = "pa")
109
WARNING
Case Sensitivity in SQL
Text in Microsoft Access is not case-sensitive. However, many other databases are case-sensitive. To avoid problems with case-sensitivity in SQL queries, use the UPPER() function, to convert the text to uppercase, before using it !
SELECT * FROM CUSTOMER WHERE UPPER(STATE) = PA
110
Key Points
Querying is simple
Pick the table Pick the columns Pick the rows (using some simple or complex criteria)
Create queries just like VBA Macros
Begin in Query Design View; Edit in SQL View Write queries in SQL View from scratch
Nine (9) simple cases were given as examples. Be wary of exceptions (e.g. SELECT AVG(MONEY), STATE)
111
Multi-Table Queries: Overview
Database design intuitions
(Why split one table into many small ones)
Querying two tables
Intuitions SQL Microsoft Access Query Design View Microsoft SQL-details
Querying more than two tables
112
Laying out Data in a Single Spreadsheet Table
Sort by product
PRODNAME CAR MAINTANANCE CATNAME PRICE CAMPDATE CHANNAME DO-IT-YOURSELF 15 12/1/1987 2 MIN TV SPOT ESPN ACCT NUM QTY PURCHDATE 13971 1 1/21/1988 14200 1 1/8/1988 42763 1 1/13/1988 13018 1 1/26/1987 19635 1 1/9/1987 23361 1 1/22/1987 25508 1 1/26/1987 27028 1 1/20/1987 27259 1 1/22/1987 34652 1 1/23/1987 39403 1 1/18/1987 50670 1 1/26/1987 58775 1 1/11/1987 17646 1 3/8/1987 23088 1 3/20/1987 32698 1 3/18/1987 44985 1 3/16/1987 54690 1 3/20/1987
SECRETS OF FRENCH COOKING COOK
15
12/1/1986 "ROCK STARS" - MAG
2/1/1987 2 MIN TV SPOT ESPN
113
Laying out Data in a Single Spreadsheet Table
Sort by channel
PRODNAME CATNAME PRICE CAMPDATE CHANNAME SECRETS OF FRENCH COOKING COOK 15 12/1/1986 "ROCK STARS" - MAG CAR MAINTANANCE DO-IT-YOURSELF 15 12/1/1987 2 MIN TV SPOT ESPN 2/1/1987 2 MIN TV SPOT ESPN ACCT NUM QTY PURCHDATE 13018 1 1/26/1987 13971 1 1/21/1988 17646 1 3/8/1987 14200 1 1/8/1988 42763 1 1/13/1988 19635 1 1/9/1987 23361 1 1/22/1987 25508 1 1/26/1987 27028 1 1/20/1987 27259 1 1/22/1987 34652 1 1/23/1987 39403 1 1/18/1987 50670 1 1/26/1987 58775 1 1/11/1987 23088 1 3/20/1987 32698 1 3/18/1987 44985 1 3/16/1987 54690 1 3/20/1987
114
Laying out Data in a Single Spreadsheet Table
Filling out the table
PRODNAME CAR MAINTANANCE CAR MAINTANANCE CAR MAINTANANCE SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING CATNAME DO-IT-YOURSELF DO-IT-YOURSELF DO-IT-YOURSELF COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK PRICE CAMPDATE CHANNAME 15 12/1/1987 2 MIN TV SPOT ESPN 15 12/1/1987 2 MIN TV SPOT ESPN 15 12/1/1987 2 MIN TV SPOT ESPN 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 2/1/1987 2 MIN TV SPOT ESPN 15 2/1/1987 2 MIN TV SPOT ESPN 15 2/1/1987 2 MIN TV SPOT ESPN 15 2/1/1987 2 MIN TV SPOT ESPN 15 2/1/1987 2 MIN TV SPOT ESPN ACCT NUM QTY PURCHDATE 13971 1 1/21/1988 14200 1 1/8/1988 42763 1 1/13/1988 13018 1 1/26/1987 19635 1 1/9/1987 23361 1 1/22/1987 25508 1 1/26/1987 27028 1 1/20/1987 27259 1 1/22/1987 34652 1 1/23/1987 39403 1 1/18/1987 50670 1 1/26/1987 58775 1 1/11/1987 17646 1 3/8/1987 23088 1 3/20/1987 32698 1 3/18/1987 44985 1 3/16/1987 54690 1 3/20/1987
115
Database Design
Splitting tables Products
PROD CAT NUM PRODNAME NUM 1 ALICE IN WONDERLAND 1 2 PINNOCCHIO 1 21 SECRETS OF FRENCH COOKING 3 31 CAR MAINTANANCE 4 33 GARDENING 4
Campaign
CAMP PROD CHAN NUM NUM PRICE CAMPDATE NUM 21 21 15 12/1/1986 13 221 21 15 2/1/1987 3 201 1 15 2/1/1987 4
Category
CAT NUM 1 2 3 4 5 6 7 CATNAME CHILDREN YOUTH COOK DO-IT-YOURSELF REFERENCE ART GEOGRAPHY
Channels
CHAN NUM 3 4 6 13 CHANNAME 2 MINUTES SPOT - TV - ESPN (SPORT CHANNEL) 2 MINUTES SPOT - TV - MTV (MUSIC CHANNEL) ADVERTISING IN LOCAL NEWSPAPERS "ROCK STARS" - MAGAZINE
116
Database Design
Splitting tables (cont)
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987 25508 21 1 1/26/1987 27028 21 1 1/20/1987 27259 21 1 1/22/1987 34652 21 1 1/23/1987 39403 21 1 1/18/1987 50670 21 1 1/26/1987 58775 21 1 1/11/1987 17646 221 1 3/8/1987 23088 221 1 3/20/1987 32698 221 1 3/18/1987 44985 221 1 3/16/1987 54690 221 1 3/20/1987 13971 1231 1 1/21/1988 14200 1231 1 1/8/1988 42763 1231 1 1/13/1988
117
Purchase
Database Design
Splitting tables
Go to Database Tools table and click Relationships button to view this
118
Database Design
Splitting tables
Avoid repetition within individual tables Separate attributes that are used independently into separate tables Introduce (foreign) keys to link tables and keep any redundancy minimal/simple
119
Querying two tables
Table structure
Attributes (columns) Records (rows) Column order and row order do not matter
Queries combine and then subdivide
Pick two tables that you want to look at Combine the two tables (Cartesian product) Pick the row(s) you wish to see Pick the column(s) you wish to see
120
Querying two tables
BBB Customers and Purchases What data do we have about customers What data do we have about purchases
Who purchased (Acctnum) In response to what advertising campaign In what quantity When did they purchase
What do we want to know
e.g. Find responses by customers to campaigns
121
Querying two tables
Customer response (purchases) to campaigns
Purchase
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987
Step 1. Pick two tables
Customer (truncated to fit on the slide)
ACCT NUM FIRSTNAME 13015 DENNIS 19635 ROBERT 23361 HARRIET STATE ZIP GENDER MONEY NY 11050 M 164 PA 19148 M 93 DE 19971 F 190
122
Querying two tables
Customer response (purchases) to campaigns
Purchase
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987
Step 2. Combine the two tables
(Cartesian Product i.e. every row from first table combined, in turn, with every row from second)
Customer (truncated to fit on the slide)
ACCT NUM FIRSTNAME 13015 DENNIS 19635 ROBERT 23361 HARRIET STATE ZIP GENDER MONEY NY 11050 M 164 PA 19148 M 93 DE 19971 F 190
123
Querying two tables
Customer response (purchases) to campaigns
Purchase
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987
Customer
ACCT NUM FIRSTNAME 13015 DENNIS 19635 ROBERT 23361 HARRIET
STATE ZIP GENDER MONEY NY 11050 M 164 PA 19148 M 93 DE 19971 F 190
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987
ACCT NUM FIRSTNAME STATE ZIP GENDER MONEY 13015 DENNIS NY 11050 M 164
13018 13018
21 21
1 1
1/26/1987 1/26/1987
19635 ROBERT
23361 HARRIET
PA
DE
19148 M
19971 F
93
190
continue for all permutations of Purchase and Customer records
124
Querying two tables
Customer response (purchases) to campaigns
Purchase
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987
Customer
ACCT NUM FIRSTNAME 13015 DENNIS 19635 ROBERT 23361 HARRIET
STATE ZIP GENDER MONEY NY 11050 M 164 PA 19148 M 93 DE 19971 F 190
Step 3. Pick some rows:
Purchase.Acctnum = Customer.Acctnum
PURC. CUST. ACCT CAMP ACCT NUM NUM QTY PURCHDATE NUM FIRSTNAME 13018 21 1 1/26/1987 13015 DENNIS 13018 21 1 1/26/1987 19635 ROBERT 13018 21 1 1/26/1987 23361 HARRIET 19635 21 1 1/9/1987 13015 DENNIS 19635 21 1 1/9/1987 19635 ROBERT 19635 21 1 1/9/1987 23361 HARRIET 23361 21 1 1/22/1987 13015 DENNIS 23361 21 1 1/22/1987 19635 ROBERT 23361 21 1 1/22/1987 23361 HARRIET
STATE NY PA DE NY PA DE NY PA DE
ZIP 11050 19148 19971 11050 19148 19971 11050 19148 19971
GENDER MONEY M 164 M 93 F 190 M 164 M 93 F 190 M 164 M 93 F 190
125
Querying two tables
Customer response (purchases) to campaigns
PURC. CUST. ACCT CAMP ACCT NUM NUM QTY PURCHDATE NUM FIRSTNAME 13018 21 1 1/26/1987 13015 DENNIS 13018 21 1 1/26/1987 19635 ROBERT 13018 21 1 1/26/1987 23361 HARRIET 19635 21 1 1/9/1987 13015 DENNIS 19635 21 1 1/9/1987 19635 ROBERT 19635 21 1 1/9/1987 23361 HARRIET 23361 21 1 1/22/1987 13015 DENNIS 23361 21 1 1/22/1987 19635 ROBERT 23361 21 1 1/22/1987 23361 HARRIET
STATE NY PA DE NY PA DE NY PA DE
ZIP 11050 19148 19971 11050 19148 19971 11050 19148 19971
GENDER MONEY M 164 M 93 F 190 M 164 M 93 F 190 M 164 M 93 F 190
Step 4. Pick some columns
PURC. CUST. ACCT CAMP ACCT NUM NUM QTY PURCHDATE NUM FIRSTNAME STATE ZIP GENDER MONEY PA 19148 M 93 19635 21 1 1/9/1987 19635 ROBERT DE 19971 F 190 23361 21 1 1/22/1987 23361 HARRIET
126
Querying two tables: The final result
Customer response (purchases) to campaigns
CAMP NUM QTY 21 1 21 1
ACCT NUM GENDER 19635 M 23361 F
127
SQL
General JOIN framework: two tables
SELECT
<pick attribute(s)>
in the order you want to see them prefix attributes with tablename e.g. table1.attr1
FROM WHERE
<table1>, <table2> <pick the row(s)>
You must cross-reference (i.e. join) the related columns in the tables, using table1.columnA = table2.columnA
128
SQL
Customer responses to campaigns (i.e. customer purchases)
Notice how, unlike VBA, no line-continuation character is used in SQL
SELECT CAMPNUM, QUANTITY, CUSTOMER.ACCTNUM, GENDER FROM PURCHASE, CUSTOMER WHERE PURCHASE.ACCTNUM = CUSTOMER.ACCTNUM
129
SQL
Customer responses to campaigns (i.e. customer purchases)
If you make a spelling mistake in your query, you ll get an (error) message like the one shown below
130
SQL
A nonsense query for example purposes
SELECT CAMPNUM, PRICE, ACCTNUM FROM CAMPAIGN, CUSTOMER WHERE PRICE > MONEY OR FIRSTNAME LIKE 'ROB%'
131
Query Design View
Querying two tables Two cases using Microsoft Access Query Design View: 1. If both tables have a key attribute with the same name (e.g. PURCHASE and CUSTOMER both have an attribute called ACCTNUM), then Microsoft Access will automatically create the join. 2. In all other circumstances, you must manually specify the join:
the tables have attributes with the same name, but one or both are not designated as key attributes the key attributes in different tables do not have the same name the tables have no shared attributes whatsoever
132
Query Design View
Finding all purchases by customers: Picking our tables
Press Shift + Click to select multiple tables, and press the Add button. Press Close button when done.
133
Query Design View
Case 1: Key attribute with same name
Drag and drop your columns into the grid In this case there is no need to explicitly say: Customer.AcctNum = Purchase.Acctnum because Microsoft Access is intelligent enough to infer the relationship between the tables automatically
134
Query Design View: Results of Query
Case 1: Key attribute with same name
This result isn t very readable Campaign number doesn t mean much: Product name (from the Product table) would be more descriptive Account number doesn t mean much: Customer name (from the Customer table) would be more descriptive
135
SQL View
Case 1: Key attribute with same name
Microsoft Access generates some pretty nasty looking SQL:
You can actually simplify it to this:
136
Query Design View
Case 2: No shared attributes Finding all products purchased by customers
The Customer and Product tables are not directly related, so Microsoft Access is unable to infer the relationship.
137
Query Design View
Case 2: No shared attributes Finding all products purchased by customers
To find the relationship between Customers and Products: Go to Show Tables, then Select Purchase and Campaign tables, which link Customer to Product Press Add button Notice the relationships now visible
138
Querying more than two tables
Example 1: What products did each customer purchase?
139
Querying more than two tables
What products did each customer purchase?
140
Querying more than two tables
Following the Yellow Brick Road
141
Querying more than two tables
What products did each customer purchase?
142
Querying more than two tables
What products did each customer purchase?
Microsoft Access generates some pretty nasty looking SQL:
You can actually simplify it to this:
143
Querying more than two tables: Results
What products did each customer purchase?
To confirm you have the correct number of results, check that you have the same number of rows here as there are in the PURCHASES table. There were 19 purchases, so there should be 19 results here !
144
Querying more than two tables: Results What happens if you forget the join / cross-reference?
Notice that you get an impossible number of results if you forget the JOIN in the query: i.e. if you forget to specify which columns cross-reference ! There were 19 purchases, so there should be 19 results here !
145
Querying more than two tables
Example 2: What channels was each product advertised on ?
146
Querying more than two tables
What channels was each product advertised in ?
147
Querying more than two tables
What channels was each product advertised in ?
148
Querying more than two tables
What channels was each product advertised in ?
149
Querying more than two tables
What channels was each product advertised in ?
150
Querying more than two tables
What channels was each product advertised in ?
Microsoft Access generates some pretty nasty looking SQL:
You can actually simplify it to this:
151
Querying more than two tables
What channels was each product advertised in ?
The results of the query :
To confirm you have the correct number of results, check that you have the same number of rows here as there are in the CAMPAIGN table. There were 3 campaigns, so there should be 3 results here !
152
Querying more than two tables
Option 1: Create one giant table Create one giant table (Cartesian Product)
ABCD
Option 2: Combine tables pair-wise Associative: (A B) C = A (B C) Combine in stages. For three or more
A B (A B) C ((A B) C ) D
153
Key Points
Divide tables for ease in managing data Query two (or more) tables like one big table
Combine tables: Cartesian Product Put tables side-by-side: every row permutation Look for attributes that link tables (e.g. Acctnum in CUSTOMER and PURCHASES)
Create queries just like VBA Macros
Begin in Query Design View; Edit in SQL View, or Write queries in SQL View from scratch
154
Inserting, Updating, and Deleting Data
When you open Access, you must click Options on the message bar, and choose Enable this content, if you want SQL INSERTs, UPDATES, and DELETES to work!
155
Inserting Data
Use the INSERT statement to append new data.
Here are the various parts of the INSERT statement: INSERT INTO [table-name] ( [ column1, column2, ] ) VALUES ( [value for column1], [value for column2], );
156
Inserting Data
Use the INSERT statement to append new data.
157
Inserting Data
After you save the INSERT query, it will look like this in the query list on the left
158
Updating Data
Use the UPDATE statement to edit existing data.
Here are the various parts of the UPDATE statement: UPDATE [table-name] SET [column1] = [value], [column2] = [value], WHERE [column] = [value]
DON T FORGET YOUR WHERE CLAUSE !! IF YOU FORGET IT YOU COULD END UP MISTAKENLY UPDATING ALL DATA IN THE TABLE ! Access 2007 will give you a warning if you inadvertently update multiple rows, but, if you execute the UPDATE statement from Excel 2007, you ll get no such warning !!
159
Updating Data
Use the UPDATE statement to edit existing data.
160
Updating Data
After you save the UPDATE query, it will look like this in the query list on the left
NOTE: It is often bad practice to update data in a database. For example, rather than update a customer s address, you can add an event table e.g. a table called MOVE with a date the customer moved. That way you always have an audit log of when the customer moved, and you have not lost any data. You might, for instance, later want to count how many times the customer moved in the past 3 years, so you can compute a credit score for them.
161
Deleting Data
Use the DELETE statement to permanently remove a row of data.
Here are the various parts of the DELETE statement: DELETE FROM [table-name] WHERE [column] = [value]
DON T FORGET YOUR WHERE CLAUSE !! IF YOU FORGET IT YOU COULD END UP MISTAKENLY DELETING ALL DATA IN THE TABLE ! Access 2007 will give you a warning if you inadvertently delete multiple rows, but, if you execute the DELETE statement from Excel 2007, you ll get no such warning, and it will be impossible to recover the data you have deleted !!
162
Deleting Data
Use the DELETE statement to permanently remove a row of data.
163
Deleting Data
After you save the DELETE query, it will look like this in the query list on the left
NOTE: It is generally bad practice to delete data from a database. Rather, you want to add a column, and mark the data as inactive (include a column with the date it was marked inactive and a column with the user-ID of the user who marked it as inactive, if necessary). That way you always have an audit log of when data was deleted , and you can always recover the data later.
164
Obtaining Data via Email
Microsoft Access 2007 contains great new features to allow you to collect data via email. You need to have Microsoft Outlook 2007 installed and working. Click on External Data tab in Access 2007. Then click on Create Email.
165
Obtaining Data via Email
A wizard will help you through the process of composing your electronic data collection form.
166
Obtaining Data via Email
Recipients will need to have an HTML-capable email client, to be able to respond to your emails. Luckily, most people have one nowadays.
167
Obtaining Data via Email
Specify what you would like to do with the data.
168
Obtaining Data via Email
Specify which fields (database columns) you want to include in your email.
169
Obtaining Data via Email
Specify how you want to hand replies.
170
Obtaining Data via Email
Set automatic processing options. You can choose to process replies automatically, or to manually add replies to the database after you have checked each individually in Outlook.
171
Obtaining Data via Email
Select how you want to specify the email addresses of recipients.
172
Obtaining Data via Email
If your email addresses are in your database, specify whether the addresses are in the current table, or in a linked table.
173
Obtaining Data via Email
Specify the contents of the email you are sending.
174
Obtaining Data via Email
Confirm the email addresses of the message recipients.
175
Obtaining Data via Email
After sending your email, wait for some responses, then click Manage Replies to manage your responses.
176
Obtaining Data via Email
Under Manage Replies, you can choose to resend the email, if it wasn t sent successfully the first time.
177
Lab Exercise
Our next session will be a
Databases
lab exercise ROOM JMHH 380
178
CREDITS: Alan Abrahams, VA Tech
179
S2:
Data
Structures
and
SQL
Shawndra
Hill
Spring
2013
TR
1:30-3pm
and
3-4:30