0% found this document useful (0 votes)
42 views127 pages

Course Slides - SQL Fundamentals

The document outlines the SQL Fundamentals course objectives, focusing on basic SQL queries, data manipulation, and SQL theory. It includes practical steps for using Azure Data Studio, examples of SQL code, and various SQL functions such as SELECT, GROUP BY, and HAVING. Additionally, it provides exercises for students to apply their SQL knowledge in real-world scenarios.

Uploaded by

waihong114
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views127 pages

Course Slides - SQL Fundamentals

The document outlines the SQL Fundamentals course objectives, focusing on basic SQL queries, data manipulation, and SQL theory. It includes practical steps for using Azure Data Studio, examples of SQL code, and various SQL functions such as SELECT, GROUP BY, and HAVING. Additionally, it provides exercises for students to apply their SQL knowledge in real-world scenarios.

Uploaded by

waihong114
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 127

SQL Fundamentals

BIDA TM - Business Intelligence & Data Analysis


Course Objectives

Basic SQL Queries Manipulating Values SQL Theory


Retrieve and filter data from a Transform your data with the Understand essential terminology
relational database using basic help of numerical, date and text related to SQL, databases and data
SQL syntax. functions. warehouses.

Working With Multiple Tables SQL for Reporting


Write queries that combine Use SQL in popular BI tools, and learn
data from multiple tables. how to summarize the results of SQL
queries.

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals
Basic SQL Queries
BIDA TM
- Intro
- Business
to Business
Intelligence
Intelligence
& Data
® Analysis
Basic SQL Queries - Section Objectives

Tasks

Retrieve data from a relational Filter the results and remove


01. database using basic SQL queries. 02. duplicate values

Skills

Order of SELECT & WHERE GROUP BY and ORDER BY TOP N,


Operations FROM Clause Clause HAVING OFFSET-FETCH,
DISTINCT

BIDA TM - Business Intelligence & Data Analysis


Applied Steps

Video – Install and Intro to Azure Data Studio

1 Do a web search for ‘Download and install azure data studio’


2 Download the latest release and install it. (Do not click Download SQL Server)

Video – Creating a connection to a database

1 Open SQL Operations Studio/Azure Data Studio


2 Click new connection & enter the credentials provided in the student files folder.
5 Expand the database folder, you should see AdventureWorksDW
6 Right click on the newly connected query icon and Select New Query

BIDA TM - Business Intelligence & Data Analysis


Adventure Works Intro

Company Overview
• Large, multinational manufacturing company
• Focused on metal and composite bicycles to
North American, European, and Australian
commercial markets
• 290 employees
• Several regional sales teams

Business Goals:
• Broaden market share by targeting their
sales
• Extend product availability through an
external Website

BIDA TM - Business Intelligence & Data Analysis


Database Connection Details

Please refer to the SQL Credentials text file in your SQL Student Files folder

BIDA TM - Business Intelligence & Data Analysis


Example SQL Code

Create a new query and type the following code. Run the code using the PLAY button.

SQL Code

SELECT
CustomerKey AS CustomerID,
SUM(SalesAmount) AS SalesAmount

FROM FactInternetSales

Query WHERE YEAR(OrderDate) > 2020

GROUP BY CustomerKey

HAVING SUM(SalesAmount) > 10000

ORDER BY SalesAmount DESC

BIDA TM - Business Intelligence & Data Analysis


Saving your queries in Notepadd++

An advanced notepad is a great way to create a code repository, for SQL and other coding languages.

NOTEPAD++ (Windows) Bbedit 13 (MAC)

BIDA TM - Business Intelligence & Data Analysis


SELECT and FROM

SELECT and FROM are the two most basic key words that form part of any SQL query.

SELECT * allows us to see all the columns in a table, however we should only use this for testing.
SQL Code

SELECT *
Query
FROM FactInternetSales

BIDA TM - Business Intelligence & Data Analysis


SELECT specific columns

We can include specific columns in our SELECT statement to make our query more specific.

SQL Code

SELECT

SalesOrderNumber,
OrderDate,
Query SalesAmount,
TaxAmt,
OrderQuantity

FROM FactInternetSales

BIDA TM - Business Intelligence & Data Analysis


Creating a column alias

A column alias allows us to modify the database name of a column.

SQL Code

SELECT

SalesOrderNumber AS InvoiceNumber,
OrderDate,
Code SalesAmount,
TaxAmt,
OrderQuantity

FROM FactInternetSales

BIDA TM - Business Intelligence & Data Analysis


Using WHERE to filter rows

WHERE allows us to filter the rows that are selected in the FROM tables.

SQL Code

SELECT
SalesOrderNumber AS InvoiceNumber,
OrderDate,
SalesAmount,
TaxAmt,
Query OrderQuantity,
SalesTerritoryKey

FROM FactInternetSales
WHERE SalesTerritoryKey = 6

After using a WHERE filter, use the ROW count at the bottom of the query to see how many rows remain.

BIDA TM - Business Intelligence & Data Analysis


Limiting results to 1 invoice number for testing

We can use WHERE to filter to a single invoice number to help us test the group by in the next video.

SQL Code

SELECT

SalesOrderNumber AS InvoiceNumber,
OrderDate,
SalesAmount,
TaxAmt,
Query
OrderQuantity

FROM FactInternetSales
--WHERE SalesTerritoryKey = 6
WHERE SalesOrderNumber = ‘S051182’

BIDA TM - Business Intelligence & Data Analysis


Using GROUP BY to Combine Rows

The GROUP BY clause allows us to combine rows into a single row.

Current Returned Rows Grouped Rows

InvoiceNumber OrderDate SalesAmount TaxAmt OrderQuantity InvoiceNumber OrderDate SalesAmount TaxAmt OrderQuantity

SO51182 2012-12-28 1700.99 136.0792 1 SO51182 2012-12-28 1750.98 140.0784 2


SO51182 2012-12-28 49.99 3.9992 1

GROUP GROUP SUM SUM SUM


BY BY SELECT

SalesOrderNumber AS InvoiceNumber,
Every column in the selected dataset must either be grouped, or
OrderDate,
aggregated.
SUM(SalesAmount) AS InvoiceSubTotal,
SUM(TaxAmt) AS TaxAmount,
• Groups are determined by the attributes we specify in the
SUM(OrderQuantity) AS TotalQuantity
GROUP BY statement
FROM FactInternetSales
• Aggregate functions are listed in the SELECT statement.
--WHERE SalesTerritoryKey = 6
WHERE SalesOrderNumber = ‘S051182’

GROUP BY SalesOrderNumber, OrderDate

BIDA TM - Business Intelligence & Data Analysis


Using GROUP BY to combine invoice numbers

The GROUP BY function allows us to combine rows in our output table.

SQL Code

SELECT

SalesOrderNumber AS InvoiceNumber,
OrderDate,
SUM(SalesAmount) AS InvoiceSubTotal,
SUM(TaxAmt) AS TaxAmount,
Query SUM(OrderQuantity) AS TotalQuantity

FROM FactInternetSales
--WHERE SalesTerritoryKey = 6
WHERE SalesOrderNumber = ‘S051182’

GROUP BY SalesOrderNumber, OrderDate

BIDA TM - Business Intelligence & Data Analysis


Using HAVING to Filter Grouped Rows

Updated request: View only invoices where total amount is more than $1,000.

Original Rows from FactSales Grouped Invoice Rows


SalesNumber SalesTerritoryKey Amount InvoiceNumber InvoiceSubTotal
SO24982 1 3255 SO43697 3578.2700

SO24982 1 40 SO43769 3578.2700

SO24982 2 565 SO43836 699.0982

SO34506 4 25 SO44005 699.0982

SO32452 6 99.99 SO44023 3374.9900

SO53142 6 27.99 SO53142 3374.9900

SO53142 7 48.99 SO53143 3374.9900

WHERE SalesTerritoryKey = 6 HAVING InvoiceSubTotal > 1000


SalesNumber SalesTerritoryKey Amount InvoiceNumber InvoiceSubTotal
SO32452 6 99.99 SO43836 699.0982

SO53142 6 27.99 SO44005 699.0982

BIDA TM - Business Intelligence & Data Analysis


Filtering grouped invoices with HAVING

The HAVING statement allows us to filter grouped rows in our output table.

SQL Code

SELECT

SalesOrderNumber AS InvoiceNumber,
OrderDate,
SUM(SalesAmount) AS InvoiceSubTotal,
SUM(TaxAmt) AS TaxAmount,
SUM(OrderQuantity) AS TotalQuantity
Query
FROM FactInternetSales
WHERE SalesTerritoryKey = 6
--WHERE SalesOrderNumber = ‘S051182’

GROUP BY SalesOrderNumber, OrderDate


HAVING SUM(SalesAmount) > 1000

BIDA TM - Business Intelligence & Data Analysis


SQL Order of Operations

SQL code is not processed in the order it is written.

In particular, the SELECT statement is processed later on.

This means the results of the SELECT statement such as column aliases cannot be used in earlier steps.

FROM WHERE GROUP BY HAVING SELECT ORDER BY

BIDA TM - Business Intelligence & Data Analysis


Using ORDER BY to sort query rows

The ORDER BY statement allows us to sort the final rows in our table.

SQL Code

SELECT

SalesOrderNumber AS InvoiceNumber,
OrderDate,
SUM(SalesAmount) AS InvoiceSubTotal,
SUM(TaxAmt) AS TaxAmount,
SUM(OrderQuantity) AS TotalQuantity

Query FROM FactInternetSales


WHERE SalesTerritoryKey = 6
--WHERE SalesOrderNumber = ‘S051182’

GROUP BY SalesOrderNumber, OrderDate


HAVING SUM(SalesAmount) > 1000

ORDER BY InvoiceSubTotal ASC

BIDA TM - Business Intelligence & Data Analysis


Recap and common errors

The SELECT statement is processed after the HAVING statement.


Derived columns in the SELECT statement cannot refer to other column aliases.

SQL Code
SELECT

SalesOrderNumber AS InvoiceNumber,
OrderDate,
SUM(SalesAmount) AS InvoiceSubTotal,
SUM(TaxAmt) AS TaxAmount,
SUM(OrderQuantity) AS TotalQuantity,
SUM(SalesAmount) + SUM (TaxAmt) AS InvoiceTotal
Query
FROM FactInternetSales
WHERE SalesTerritoryKey = 6
--WHERE SalesOrderNumber = ‘S051182’

GROUP BY SalesOrderNumber, OrderDate


HAVING SUM(SalesAmount) > 1000

ORDER BY InvoiceSubTotal ASC

BIDA TM - Business Intelligence & Data Analysis


Filtering Rows: TOP N

The TOP (N) statement allows us to return the first N rows from our output table.

You will need to use ORDER BY to ensure you get the true TOP rows.

SQL Code

SELECT TOP(10)
FROM
SalesOrderNumber AS InvoiceNumber,
OrderDate,
SUM(SalesAmount) AS InvoiceSubTotal, WHERE
SUM(TaxAmt) AS TaxAmount,
SUM(OrderQuantity) AS TotalQuantity,
SUM(SalesAmount) + SUM (TaxAmt) AS InvoiceTotal GROUP BY
Query
FROM FactInternetSales
WHERE SalesTerritoryKey = 6 HAVING

--WHERE SalesOrderNumber = ‘S051182’


SELECT & TOP
GROUP BY SalesOrderNumber, OrderDate
HAVING SUM(SalesAmount) > 1000
ORDER BY
ORDER BY InvoiceSubTotal ASC

BIDA TM - Business Intelligence & Data Analysis


Filtering Rows: TOP N Percent

The TOP (N) PERCENT statement allows us to return the first N % of rows from our output table.

SQL Code

SELECT TOP(10) PERCENT


FROM
SalesOrderNumber AS InvoiceNumber,
OrderDate,
SUM(SalesAmount) AS InvoiceSubTotal, WHERE
SUM(TaxAmt) AS TaxAmount,
SUM(OrderQuantity) AS TotalQuantity,
SUM(SalesAmount) + SUM (TaxAmt) AS InvoiceTotal GROUP BY

Query
FROM FactInternetSales
WHERE SalesTerritoryKey = 6 HAVING
--WHERE SalesOrderNumber = ‘S051182’

GROUP BY SalesOrderNumber, OrderDate SELECT & TOP


HAVING SUM(SalesAmount) > 1000
ORDER BY
ORDER BY InvoiceSubTotal ASC

BIDA TM - Business Intelligence & Data Analysis


Filtering Rows: OFFSET FETCH

The OFFSET FETCH statement is similar to TOP but allows us to skip the first N rows from our output table.

SQL Code

SELECT
SalesOrderNumber AS InvoiceNumber, FROM

OrderDate,
SUM(SalesAmount) AS InvoiceSubTotal,
WHERE
SUM(TaxAmt) AS TaxAmount,
SUM(OrderQuantity) AS TotalQuantity,
SUM(SalesAmount) + SUM (TaxAmt) AS InvoiceTotal GROUP BY

FROM FactInternetSales
Query WHERE SalesTerritoryKey = 6 HAVING
--WHERE SalesOrderNumber = ‘S051182’

GROUP BY SalesOrderNumber, OrderDate SELECT


HAVING SUM(SalesAmount) > 1000

ORDER BY InvoiceSubTotal ASC ORDER BY

OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY


OFFSET FETCH

BIDA TM - Business Intelligence & Data Analysis


Filtering Rows: DISTINCT

The DISTINCT statement allows us to return a unique list of rows in the output table.

SQL Code

FROM

WHERE
SELECT DISTINCT

CustomerKey GROUP BY

Query
FROM FactInternetSales HAVING

ORDER BY CustomerKey
SELECT & DISTINCT

ORDER BY

BIDA TM - Business Intelligence & Data Analysis


ORDER BY another column

We can order our final output table using a column that is not present in our output table.

SQL Code

SELECT

CustomerKey
Query
FROM FactInternetSales

ORDER BY SalesAmount

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 1a

Create a list of product costs, grouped by invoice numbers.

1. Write a query to return InvoiceNumber and InvoiceNumber TotalProductCost


TotalProductCost from the FactInternetSales table. 1 SO43697 2171.2942

1. Return only invoices that HAVE a total product cost 2 SO43702 2171.2942

per Invoice Number > 2000. 3 SO43703 2171.2942

4 SO43706 2171.2942
Hint: You first need to group by the invoice to get the
5 SO43707 2171.2942
total and then filter.
6 SO43709 2171.2942

7 SO43710 2171.2942

..

1551 SO46602 2171.2942

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 1b

We need a detailed list of invoices and invoice line numbers, but we’re only interested in currency key 100.

1. Write a query to return InvoiceNumber, Invoice InvoiceNumber InvoiceLineNumber SalesAmount


LineNumber and SalesAmount from the
FactInternetSales table. 1 SO43699 1 3399.99

2 SO43700 1 699.0982
1. Return only lines WHERE the currency key is 100.
3 SO43702 1 3578.27

Hint: Since line number is the lowest level of detail in the 4 SO43706 1 3578.27
FactInternetSales table, you won’t need to use GROUP 5 SO43707 1 3578.27
BY.
6 SO43711 1 3578.27

7 SO43713 1 3578.27

...

33400 SO75123 3 8.99

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 1c

We have a new data analyst in the team who wants to see a unique list of sales territory keys. This will help
her to better understand the database.
SalesTerritory
1. Write a query to return the sales territory column
1 1
from the FactInternetSales table.
2 2
1. Return a unique list of territories only. 3 3

4 4
1. Order the results alphabetically for ease.
5 5

6 6

7 7

8 8

9 9

10 10

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals
Manipulating Values
BIDA TM
- Intro
- Business
to Business
Intelligence
Intelligence
& Data
® Analysis
Manipulating Values - Section Objectives

Tasks

Transform your data with the help Filter our data by creating
01. of numerical, date and text 02. conditions that test if something
functions. is true or false.

Skills

Data Types Functions Comparison Logical Logical Data Type


Operators Operators Functions Conversion

BIDA TM - Business Intelligence & Data Analysis


Aggregate Functions

Aggregate functions help define how we want values to be treated when we use a
GROUP BY in our query.

Function Description
SUM Returns the sum of all the values, or only the DISTINCT values, in the
expression. SUM works with numeric columns only. Null values are ignored.
AVG Returns the average of the values in a group. It ignores null values.
COUNT Returns the number of items found in a group.
MAX Returns the maximum value in a group.
MIN Returns the minimum value in a group

BIDA TM - Business Intelligence & Data Analysis


Counting rows with COUNT(*) aggregation

COUNT(*) is a special aggregated function that allows us to count the rows in a table.

Aggregated functions cannot be used with non-aggregated columns.

SQL Code

SELECT

COUNT(*) AS TotalCustomers,
Query AVG(YearlyIncome) AS AverageIncome,
--YearlyIncome AS TotalIncome –Non aggregated columns cannot be used alongside aggregated columns

FROM DimCustomer

BIDA TM - Business Intelligence & Data Analysis


How aggregate functions respond to NULL values

Aggregate functions ignore NULL values. For example COUNT will not include NULL values.

SQL Code

--Use this query to look at everything in the customer table.


Query 1
SELECT * FROM DimCustomer

--Use this query to count the total number of customers


Query 2
SELECT COUNT(*) AS TotalCustomers FROM DimCustomer

--Notice that when you count the same table by middle name, you get a lower count. That’s because it ignore the NULLs.
Query 3
SELECT COUNT(MiddleName) AS MiddleNameCount FROM DimCustomer

BIDA TM - Business Intelligence & Data Analysis


Data Types

Numeric Date & Time String Other

BIDA TM - Business Intelligence & Data Analysis


Numeric Data Types

Data Type Description Storage


bit An integer data type that can take a value of 1, 0, or NULL 1 byte
tinyint whole numbers from 0 to 255 1 byte
smallint whole numbers between -32,768 and 32,767 2 bytes
int whole numbers between -2,147,483,648 and 2,147,483,647 4 bytes
bigint whole numbers between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 8 bytes
P = The maximum number of decimal digits to be stored. This number includes both the
left and the right sides of the decimal point.
Decimal(p,s) 5-17 bytes
S = The number of decimal digits that are stored to the right of the decimal point.
Ex. decimal (4,2), 2 digits before the decimal point, two digits after. Allows up to +/- 10^38.
Numeric(p,s) Same as above 5-17 bytes
Smallmoney Accurate to a ten-thousandth of the monetary units, from - 214,748.3648 to 214,748.3647 4 bytes
money Accurate to a ten-thousandth of the monetary units, up to +/- 922,337,203,685,477.5808 8 bytes
Float(n) n is the number of bits that are used to store the mantissa of the float number and, 4-8 bytes
therefore, dictates the precision and storage size. If n is specified, it must be a value
between 1 and 53. The default value of n is 53.
Takes values from -1.79E + 308 to 1.79E + 308.
real Floating precision number data from -3.40E + 38 to 3.40E + 38 4 bytes

BIDA TM - Business Intelligence & Data Analysis


Numeric Functions

Function Description Statement Result

ROUND Rounds a number to a specified SELECT ROUND(2.36, 1) 2.4


number of decimal places
ABS Returns the absolute value of a SELECT ABS (-353) 353
number
CEILING Returns the smallest integer value that SELECT CEILING (5.75) 6
is >= a number
FLOOR Returns the largest integer value that SELECT FLOOR (5.75) 5
is <= to a number

BIDA TM - Business Intelligence & Data Analysis


Numeric Functions

Numeric functions are used to manipulate number type data in a column.


SQL Code

SELECT TOP(10) PERCENT


SalesOrderNumber AS InvoiceNumber,
OrderDate,

SUM(SalesAmount) AS InvoiceSubTotal,
ROUND(SUM(SalesAmount),1) AS InvoiceSubTotalRounded,

SUM(TaxAmt) AS TaxAmount,
FLOOR(SUM(TaxAmt)) AS TaxAmountFloor,
Query
SUM(OrderQuantity) AS TotalQuantity,
SUM(SalesAmount) + SUM(TaxAmt) AS InvoiceTotal

FROM FactInternetSales
WHERE SalesTerritoryKey = 6

GROUP BY SalesOrderNumber, OrderDate


HAVING SUM(SalesAmount) > 1000

ORDER BY InvoiceSubTotal DESC

BIDA TM - Business Intelligence & Data Analysis


Where is the BOOLEAN data type

The BOOLEAN data type is represented by the BIT number type in SQL.

A BIT number type = 1 for TRUE and 0 for FALSE.

A BIT number can also be NULL.

SQL Code

SELECT *
FROM DimProduct
Query
WHERE FinishedGoodsFlag = 1

BIDA TM - Business Intelligence & Data Analysis


Date and Time Data Types

Data Type Format Storage Date range

DATETIME* ‘YYYYMMDD hh:mm:ss.nnn’ 8 bytes January 1st ,1753 though December 31st ,9999
2020-01-01 11:45:32.547
SMALLDATETIME* ‘YYYYMMDD hh:mm’ 4 bytes January 1st,1900 through June 6th,2079
2020-01-01 11:45
DATE ‘YYYY-MM-DD’ 3 bytes January 1st,0001 through Dec 31,9999
2020-01-01
TIME ‘hh:mm:ss:nnnnnnn’ 3-5 bytes Stores times only to an accuracy of 100
11:45:42.4356456 nanoseconds
DATETIME2 ‘YYYYMMDD hh:mm:ss:nnnnnnn’ 6 to 8 January 1st,0001 through Dec 31,9999
2020-01-01 11:45:42.4356456 bytes
DATETIMEOFFSET ‘YYYYMMDD hh:mm:ss:nnnnnnn’ +/- 8 to 10 January 1st,0001 through Dec 31,9999
hh:mm’ bytes
2020-01-01 11:45:42.4356456 + 08:00
*These data types are legacy types, meaning they are still supported, but are not as up to date or accurate.

BIDA TM - Business Intelligence & Data Analysis


Datepart Values

Parameter Description
year, yyyy, yy Returns the year
quarter, qq, q Returns the quarter
month, mm, m Returns the month
day, dd, d Returns the day of the month
dayofyear, dy, y Returns the day of the year
week, ww, wk Returns the week
weekday, dw, w Returns the weekday
hour, hh Returns the hour

BIDA TM - Business Intelligence & Data Analysis


Date and Time Functions

Function Description Statement Result


GETDATE() Returns the current date and time SELECT GETDATE() 2020-12-03
04:16:09.247
DATENAME (datepart, date) Returns a character string representing SELECT January
a specified datepart of a specified date. DATENAME(month,'20200101')
DATEPART (datepart, date) Returns an integer representing the SELECT 2020
specified datepart of the specified date. DATEPART(year,'20200101')
MONTH (date) Returns an integer representing the SELECT MONTH('20200301') 3
month part of a specified date.
YEAR (date) Returns an integer representing the SELECT YEAR(‘20190301') 2019
year part of a specified date.
DATEDIFF Returns the number of days or SELECT 29
(datepart, startdate, enddate) time datepart boundaries, crossed DATEDIFF(day,'20201201','2020
between two specified dates. 1230' )
DATEADD Returns a new datetime value by SELECT 2020-12-30
(datepart, number, date ) adding an interval to the specified DATEADD(day,29,'20201201' )
datepart of the specified date.

BIDA TM - Business Intelligence & Data Analysis


Date and time functions in practice

Date and time functions allow us to manipulate date and time values in a column.

SQL Code

SELECT

GETDATE() AS DateTimeStamp
DueDate,
Query ShipDate,
DATEDIFF (day, ShipDate, DueDate) AS DaysBetweenShippedAndDueDate,
DATEDIFF (hour, ShipDate, DueDate) AS HoursBetweenShippedAndDueDate

FROM FactInternetSales

BIDA TM - Business Intelligence & Data Analysis


DATEADD

The DATEADD function allows us to add N date parts to a date.

SQL Code

SELECT

GETDATE() AS DateTimeStamp
DueDate,
ShipDate,
Query DATEDIFF (day, ShipDate, DueDate) AS DaysBetweenShippedAndDueDate,
DATEDIFF (hour, ShipDate, DueDate) AS HoursBetweenShippedAndDueDate,
DATEADD (day, 10, DueDate) AS DueDatePlusTenDays,
DATEADD (day, -10, DueDate) AS DueDateLessTenDays

FROM FactInternetSales

BIDA TM - Business Intelligence & Data Analysis


Working with specific dates

To avoid confusion or mis-interpretation of regions, you should use YYYY-MM-DD format.

SQL Code

SELECT

MONTH(‘20201011’) AS MonthNumerical,
Query 1 MONTH(‘2020-10-11’) AS MonthNumerical

SET LANGUAGE British

Query 2 SELECT
DATENAME(month, ‘02/12/2020’) AS MonthName

BIDA TM - Business Intelligence & Data Analysis


Server Properties: Collation

Collation tells us how our database deals with case sensitivity and what languages we can use.

SQL Code

SELECT CONVERT (varchar(256), SERVERPROPERTY(‘collation’))


Query 1
RESULT: SQL_Latin1_General_CP1_CI_AS

Query 2 Right click on the dimProduct table and SELECT TOP 1000

Query 3 SELECT * FROM DimProduct WHERE Color = ‘silver’

Query 4 SELECT * FROM DimProduct WHERE Color = ‘Silver’

BIDA TM - Business Intelligence & Data Analysis


String or Text Data Types
Text, String and Char are all used interchangeably when referring to text.

Data Type Description Max Size Storage (bytes)


(characters)
Char(n) REGULAR: Fixed length 8000 1x Defined length
varchar(n) REGULAR: Variable length 8000 Number of characters + 2
hide this
varchar(max) REGULAR: Variable length Very Large Number of characters + 2
row
text REGULAR: Variable length Very Large Number of characters + 4
nchar(n) UNICODE: Fixed length, multiple languages 4000 2x Defined length
nvarchar(n) UNICODE: Variable length, multiple languages 4000 2x number of characters + 2
nvarchar(max) UNICODE: Variable length, multiple languages Very Large 2x number of characters + 2
ntext
hide this row UNICODE: Variable length, multiple languages Very Large 2x Defined length
binary(n) BINARY: Fixed length 8000 Defined length
varbinary(n) BINARY: Variable length 8000 Defined length + 2
varbinary(max) BINARY: Variable length Very Large Defined length + 2

BIDA TM - Business Intelligence & Data Analysis


String or Text Data Types
Text, String and Char are all used interchangeably when referring to text.

nvarchar(20) UNICODE: Variable length, multiple languages 4000 2x number of characters + 2

“Finance” = 2(7) + 2 = 16 bytes

“Corporate Finance” = 2(17) + 2 = 36 bytes

BIDA TM - Business Intelligence & Data Analysis


String Functions

Function Description Statement Result


CONCAT returns a string resulting from the concatenation, SELECT CONCAT(‘Short, ‘sell’') Shortsell
or joining, of two or more string values
LEFT Returns the left n characters of a string. SELECT LEFT('Amazon', 3) Ama

RIGHT Returns the right characters of a string. SELECT RIGHT('Amazon', 3) zon

REPLACE Replaces all occurrences of a specified string SELECT REPLACE('Buy Stock', 'Buy', 'Sell') Sell Stock
value with another string
UPPER Changes the format to UPPERCASE SELECT UPPER(‘futures’’) FUTURES

LOWER Changes the format to LOWERCASE SELECT LOWER(‘FUTURES’) futures

LEN Returns the string length, excluding trailing SELECT LEN (’stock ’) 5
spaces.

BIDA TM - Business Intelligence & Data Analysis


The CONCAT String Function

The CONCAT function joins several strings together into one long string.

SQL Code

SELECT

EnglishProductName AS ProductName,
EnglishDescription AS ProductDescription,
Query 1
CONCAT(EnglishProductName, '-', EnglishDescription) AS ProductNameAndDescription

FROM DimProduct
WHERE ProductKey=555

SELECT *
Query 2
FROM DimProduct

BIDA TM - Business Intelligence & Data Analysis


LEN UPPER LOWER REPLACE string functions

These are the most common string functions, allowing us to perform manipulations on text.

SQL Code
SELECT

EnglishProductName AS ProductName,
EnglishDescription AS ProductDescription,
CONCAT(EnglishProductName, '-', EnglishDescription) AS ProductNameAndDescription,
LEN(EnglishDescription) AS DescriptionLength,
Query 1
UPPER(EnglishProductName) AS UpperProductName,
LOWER(EnglishProductName) AS LowerProductName,
REPLACE(EnglishProductName, 'Front', 'Ultra Durable Front' ) AS EnglishProductNameReplaced

FROM DimProduct
WHERE ProductKey=555

SELECT *
Query 2
FROM DimProduct

BIDA TM - Business Intelligence & Data Analysis


Flexible LEFT and RIGHT functions

We can use LEFT and RIGHT to return a dynamic number of characters from a string.

SQL Code
SELECT

ProductKey,
ProductAlternateKey,
EnglishProductName AS ProductName,
EnglishDescription AS ProductDescription,
CONCAT(EnglishProductName, '-', EnglishDescription) AS ProductNameAndDescription,
LEN(EnglishDescription) AS DescriptionLength,
Query UPPER(EnglishProductName) AS UpperProductName,
LOWER(EnglishProductName) AS LowerProductName,
REPLACE(EnglishProductName, 'Front', 'Ultra Durable Front' ) AS EnglishProductNameReplaced
LEFT(ProductAlternateKey,2) AS ProductShort,
RIGHT(ProductAlternateKey,LEN(ProductAlternateKey)-3) AS ProductSize

FROM DimProduct
WHERE ProductKey=555

BIDA TM - Business Intelligence & Data Analysis


Comparison Operators

Comparison operators are usually used to determine if a certain condition is TRUE OR FALSE.
However, comparisons that involve a NULL return UNKNOWN.

=
Equal To
>
Greater than
<
Less than

Not equal to

Greater or

Less or equal
equal to to

Syntax = > < != OR <> >= <=

Date = ‘2020-08- SUM(SaleAmount) SUM(TaxAmt)< [ProductName] <> SUM(SaleAmount) SUM(TaxAmt) <=


Example Code 01’ > 25000 9,000 ‘Mountain-500’ >= 3000 2000

BIDA TM - Business Intelligence & Data Analysis


Comparison Operators – Dealing with NULL

NULLs are not captured with regular comparison operators. We have to identify them with IS NULL.
SQL Code
SELECT * SELECT * SELECT *
FROM DimProduct FROM DimProduct FROM DimProduct
Query 1 WHERE Class <> ‘H’ WHERE Class <> ‘H’

RESULT 🡪 606 ROWS RESULT 🡪 X ROWS RESULT 🡪 X ROWS


In the above examples, the number of rows where Class = H and Class does not = H, is not equal to the total rows. This
is because of the NULLS that exist in the Class column.

SELECT * SELECT * SELECT *


FROM DimProduct FROM DimProduct FROM DimProduct
Query 2
WHERE Class IS NULL WHERE Class IS NOT NULL
(a, b, c)
RESULT 🡪 606 ROWS RESULT 🡪 276 ROWS RESULT 🡪 330 ROWS
In the above examples, the number of rows where Class IS NULL and Class IS NOT NULL, does infact equal the total.

SELECT *
Query 3 FROM DimProduct
WHERE Class <> ‘H’ OR Class IS NULL

BIDA TM - Business Intelligence & Data Analysis


Logical Operators

Logical operators allow us to test multiple conditions at once, or to reverse a condition.

OR AND NOT

TRUE if either Boolean TRUE if both Boolean


Used to reverse a condition
expression is TRUE expressions are TRUE

Example Class <> ‘H’ OR Class IS NULL Size = ‘XL’ AND Color = ‘Black’ NOT(Color = ’Black’)

BIDA TM - Business Intelligence & Data Analysis


Logical Operators – Common Errors

We can use parenthesis to explicitly decide on the order of logical operators.

SQL Code
SELECT

EnglishProductName,
EnglishDescription,
Color,
[Status],
Class
Query
FROM DimProduct

-- Incorrect use of logical operators


-- WHERE Class <> ‘H’ OR Class IS NULL AND [Status] IS NOT NULL
Correct use of parenthesis to make the order of
WHERE (Class <> ‘H’ OR Class IS NULL) AND [Status] IS NOT NULL

BIDA TM - Business Intelligence & Data Analysis


Advanced Logical Operators - Simplifying Code

IN BETWEEN

TRUE if the value is equal to any TRUE if the value is within a


item in a list range (inclusive).

SalesAmount BETWEEN 500


Example Color IN (‘Red’, ‘Blue’,’White’)
AND 1000

Same Color = ‘Red’ OR Color = ‘Blue’ SalesAmount >= 500 AND


as… OR Color = ’White’ SalesAmount <= 1000

BIDA TM - Business Intelligence & Data Analysis


IN and BETWEEN in practice

IN and BETWEEN can help us simplify our code when we are dealing with multiple conditions.

SQL Code

SELECT EnglishProductName, EnglishDescription, Color, [Status], Class, SafetyStockLevel

FROM DimProduct
Query 1
WHERE (SafetyStockLevel BETWEEN 500 AND 1000) AND [Status] IS NOT NULL --BETWEEN IS INCLUSIVE OF BOTH END
-- WHERE (SafetyStockLevel >= 500 AND SafetyStockLevel <= 1000) AND [Status] IS NOT NULL

SELECT EnglishProductName, EnglishDescription, Color, [Status], Class, SafetyStockLevel

FROM DimProduct
Query 2
WHERE Color IN (‘Black’, ‘Silver’, ’White’ , ’Yellow’)
WHERE Color = ‘Black’ OR Color = ‘Silver’ OR Color =’White’ AND ‘Color’=‘Yellow’

Add these two queries to Manipulating Data – Logical Operators

BIDA TM - Business Intelligence & Data Analysis


Advanced Logical Operators - Partial Matches

WILDCARDS
LIKE
% represents any number of characters
_ represents one character
TRUE if the value matches a
specified pattern

EnglishProductName
Example
LIKE ‘%BRAKE%’

BIDA TM - Business Intelligence & Data Analysis


LIKE in practice

The LIKE operator can be used to find partial matches.

The _ represents a single wildcard character.

The % represents any number of wildcard characters.

SQL Code
SELECT

FirstName,
EmailAddress
Query
FROM DimCustomer

WHERE FirstName LIKE ’_R%’

BIDA TM - Business Intelligence & Data Analysis


Using IIF statements to create a conditional column

We can use IIF statements in the same way as we do in Excel.

SQL Code
SELECT

FirstName,
LastName,
Query YearlyIncome,
EmailAddress,
IIF(YearlyIncome > 50000, ‘Above Average’ , ’Below Average’ ) AS IncomeCategory

FROM DimCustomer

BIDA TM - Business Intelligence & Data Analysis


Using a CASE statement for multiple conditions

A CASE statement helps us define multiple conditional scenarios in one column.

SQL Code
SELECT

FirstName,
LastName,
YearlyIncome,
EmailAddress,
IIF(YearlyIncome > 50000, ‘Above Average’ , ’Below Average’ ) AS IncomeCategory,
CASE
Query WHEN NumberChildrenAtHome = 0 THEN '0’
WHEN NumberChildrenAtHome = 1 THEN '1’
WHEN NumberChildrenAtHome BETWEEN 2 AND 4 THEN '2-4’
WHEN NumberChildrenAtHome >=5 THEN '5+’
ELSE 'UNKN’
END AS NumberOfChildrenCategory,
NumberOfChildrenCategory AS ActualChildren

FROM DimCustomer

BIDA TM - Business Intelligence & Data Analysis


Basic SQL Formatting

Indenting SQL code helps to keep sections clear.

SQL Code
SELECT

FirstName,
LastName,
YearlyIncome,
EmailAddress,
IIF(YearlyIncome > 50000, ‘Above Average’ , ’Below Average’ ) AS IncomeCategory,
CASE
Query WHEN NumberChildrenAtHome = 0 THEN '0’
WHEN NumberChildrenAtHome = 1 THEN '1’
WHEN NumberChildrenAtHome BETWEEN 2 AND 4 THEN '2-4’
WHEN NumberChildrenAtHome >=5 THEN '5+’
ELSE 'UNKN’
END AS NumberOfChildrenCategory,
NumberOfChildrenCategory AS ActualChildren

FROM DimCustomer

BIDA TM - Business Intelligence & Data Analysis


Using IF in a WHERE statement

We can replicate a conditional IIF column for use in a WHERE statement.

SQL Code
SELECT

FirstName,
LastName,
YearlyIncome,
EmailAddress,
IIF(YearlyIncome > 50000, ‘Above Average’ , ’Below Average’ ) AS IncomeCategory,
CASE
WHEN NumberChildrenAtHome = 0 THEN '0’
Query WHEN NumberChildrenAtHome = 1 THEN '1’
WHEN NumberChildrenAtHome BETWEEN 2 AND 4 THEN '2-4’
WHEN NumberChildrenAtHome >=5 THEN '5+’
ELSE 'UNKN’
END AS NumberOfChildrenCategory,
NumberOfChildrenCategory AS ActualChildren

FROM DimCustomer

WHERE IIF(YearlyIncome > 50000, ‘Above Average’ , ’Below Average’ ) = ‘Above Average’

BIDA TM - Business Intelligence & Data Analysis


Replacing NULL using IIF, ISNULL, and COALESCE

We can test for and replace NULL values in multiple ways.


SQL Code
SELECT
FirstName,
IIF(MiddleName IS NULL, ‘UNKN’ , MiddleName) AS MiddleName,
ISNULL(MiddleName, ‘UNKN’ ) AS MiddleName2,
COALESCE(MiddleName, ‘UNKN’ ) AS MiddleName3,
LastName,
YearlyIncome,
EmailAddress,
IIF(YearlyIncome > 50000, ‘Above Average’ , ’Below Average’ ) AS IncomeCategory
CASE
Query WHEN NumberChildrenAtHome = 0 THEN '0’
WHEN NumberChildrenAtHome = 1 THEN '1’
WHEN NumberChildrenAtHome BETWEEN 2 AND 4 THEN '2-4’
WHEN NumberChildrenAtHome >=5 THEN '5+’
ELSE 'UNKN’
END AS NumberOfChildrenCategory,
NumberOfChildrenCategory AS ActualChildren

FROM DimCustomer

WHERE IIF(YearlyIncome > 50000, ‘Above Average’ , ‘Below Average’ ) =

BIDA TM - Business Intelligence & Data Analysis


Using CAST to change the data type

The CAST operator allows us to change the data type.

SQL Code

SELECT

SalesAmount,
CAST(SalesAmount AS INT) AS SalesAmountCast,
Query
OrderDate,
CAST(OrderDate AS DATE) AS OrderDateCast

FROM FactInternetSales

BIDA TM - Business Intelligence & Data Analysis


Practical examples using CAST

In this example, the CAST operator is used to change two INTEGERS into DECIMALS.

SQL Code
/*SELECT

SalesAmount,
CAST(SalesAmount AS INT) AS SalesAmountCast,
OrderDate,
CAST(OrderDate AS DATE) AS OrderDateCast

FROM FactInternetSales*/
Query
SELECT

EnglishProductName,
ReOrderPoint,
SafetyStockLevel
CAST(ReOrderPoint AS DECIMAL(8,4)) / CAST(SafetyStockLevel AS DECIMAL(8,4)) AS PctOfTotalSafetyStock

FROM DimProduct
WHERE [Status] = ‘Current’

BIDA TM - Business Intelligence & Data Analysis


NATIONAL for Unicode best practice

The National ‘N’ should be used before all strings interacting with a UNICODE column type.

SQL Code

SELECT
EnglishProductName,
EnglishDescription,
Color,
[Status],
Class,
Query
SafetyStockLevel

FROM DimProduct

WHERE Color IN (N‘Black’, N‘Silver’, N’White’ , N’Yellow’)


WHERE Color = N‘Black’ OR Color = N‘Silver’ OR Color = N’White’ AND Color= N‘Yellow’

BIDA TM - Business Intelligence & Data Analysis


Precedence Among Operators

Earlier, we mentioned that the AND operator is executed before the OR operator.

Here is a full list of precedence for operators in SQL.

1. ( ) (Parentheses)

2. * (Multiplication), / (Division), % (Modulo)

3. + (Positive), – (Negative), + (Addition), + (Concatenation), – (Subtraction)

4. =, >, <, >=, <=, <>, !=, !>, !< (Comparison operators)
5. NOT

6. AND

7. BETWEEN, IN, LIKE, OR

8. = (Assignment)

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 2a

Sales territory 1 need a summary of their sales for the lead up period to Christmas.

1. Write a query against the FactInternet InvoiceNumber InvoiceLineNumber SalesAmount TaxAmount

Sales table that returns orders placed in


December for the Sales Territory 1 1 SO43699 1 3399.99 271.9992

2 SO46406 1 3578.27 286.2616


2. The query should include
3 SO46431 1 3578.27 286.2616
SalesOrderNumber,
SalesOrderLineNumber, SalesAmount 4 SO46445 1 3578.27 286.2616

and TaxAmount. 5 SO46446 1 3578.27 286.2616

6 SO46452 1 3374.99 269.9992

7 SO46466 1 3578.27 286.2616

.. ... ... ... ...

919 SO74251 3 34.99 2.7992

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 2b

Marketing need a list of homeowner customers, along with the number of cars owned.

1. Write a query against the CustomerName NumberOfCarsOwned Email


dimCustomer table that
returns all customers that
are homeowners and have 1 Elizabeth Johnson 4+ elizabeth5@adventure-works.com

more than 1 car. 2 Marco Mehta 2-3 marco14@adventure-works.com

2. The query should include 3 Rob Verhoff 2-3 rob4@adventure-works.com


full customer names,
number of cars owned, and 4 Curtis Lu 4+ curtis9@adventure-works.com
email.
5 Lauren Walker 2-3 lauren41@adventure-works.com

1. The numbers of cars owned 6 Ian Jenkins 2-3 ian47@adventure-works.com


should categorize
customers into groups: 7 Shannon Wang 2-3 shannon1@adventure-works.com
- 2-3 .. ... ... ...
- 4+
6126 Colin Xu 2-3 colin28@adventure-works.com

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals
SQL Theory
BIDA TM
- Intro
- Business
to Business
Intelligence
Intelligence
& Data
® Analysis
SQL Theory - Section Objectives

Tasks

Understand the difference Learn the most important


01. between OLTP and Data 02. terminology relating to SQL
Warehouse

Skills

Server vs Cloud vs On OLTP vs Database Fact vs Star,


Instance vs Premise Data Normalization Dimension Snowflake
Database Warehouse Tables and Hybrid
Schema

BIDA TM - Business Intelligence & Data Analysis


A History of SQL

1979
Relational Software Inc. introduced
the first commercially available
implementation of SQL Today

1989 o Microsoft SQL Server is


Microsoft SQL Server V1.0 one of many instances
shipped for the first time of an RDBMS

SQL is here to stay


SQL was created SQL becomes
(initially called an ASNI and ISO o Widely adopted
SEQUEL) standard
o Built on mathematical
1970 1986 principles

BIDA TM - Business Intelligence & Data Analysis


SQL Terminology
RDBMS
SQL =
Database 1
Structured
Query Database 2
Language
Default
System Database
RDBMS =
Relational
RDBMS
DataBase
Management Database 1
System
Database 2
Popular RDBMS Named
Microsoft SQL Server System Database
Oracle
MY SQL Server Instances Database
(Hardware) (Installation of SQL) (Related tables of data)

BIDA TM - Business Intelligence & Data Analysis


RDBMS Installation Options

Box (On Prem) IAAS - Infrastructure As A Service (hardware only)


Business responsible for PAAS - Platform As A Service (hardware & software)
server and software
Host looks after server hardware and/or software

Advantages Advantages
o Security o No upfront cost
o Flexibility o Easy to scale

Disadvantages Disadvantages
o Upfront cost o Targeted by hackers
o Requires expertise o Cost escalation

BIDA TM - Business Intelligence & Data Analysis


Data Security
Privacy Laws

Personal Data

PII
(Personally
Identifiable Information)

Name, Email, Address, SSN


IP Address, Phone Number
Date of Birth, Place of Work,
Company Phone Number

BIDA TM - Business Intelligence & Data Analysis


Data Security
If data leaves the operational environment, for example to be used in analysis, we should remove all
information that can identify an individual in the data.

First Name Last Name Email User ID Country Visit Count

John Xi xigi777@yahuu.com 000021 US 1

Nathan Wilson Nathan45.Wilson@bmail.com 000022 CAN 5

Alex Lee Alex@youtax.com 000023 MEX 2

Sophia Smith Fifi-foever@me.com 000024 CAN 2

Mia MacDonald Mia23@bmail.com 000025 US 15

Total Visits ???


25

BIDA TM - Business Intelligence & Data Analysis


Types of Database Systems

OLTP DW

ETL
Extract, Transform, and Load

o Designed for efficient data entry o Efficient for analysis and reporting
o Reduce data redundancy o Data Warehouse 🡪 Organization
o Fast at reading / writing data o Data Mart 🡪 Specific team

BIDA TM - Business Intelligence & Data Analysis


Data Normalization in OLTP Systems

Normalization is the process of organizing data to minimize redundancy.

Department
First Name Last Name Salary Hourly Salary Department Location
Manager

John Xi 52000 30 IT Thomas Vancouver

Nathan Wilson 45000 25.96 Finance Felix Toronto

Alex Lee 60000 34.62 Finance Felix Toronto

Sophia Smith 47000 27.12 IT Thomas Vancouver

Mia MacDonald 50000 28.85 Finance Felix Toronto

BIDA TM - Business Intelligence & Data Analysis


Data Normalization in OLTP Systems

Department
First Name Last Name Salary Hourly Salary Department Location
Manager

John Xi 52000 30 IT Thomas Vancouver


Nathan Wilson 45000 25.96 Finance Felix Toronto
Alex Lee 60000 34.62 Finance Felix Toronto
Sophia Smith 47000 27.12 IT Thomas Vancouver
Mia MacDonald 50000 28.85 Finance Felix Toronto

Normalization

First Last Department


Id Salary DepID DepId Department Location
Name Name Manager
1 John Xi 52000 1
1 IT Thomas Vancouver San Francisco
2 Nathan Wilson 45000 2
2 Finance Felix Toronto
3 Alex Lee 60000 2
4 Sophia Smith 47000 1
5 MacDona
Mia 50000 2
ld

BIDA TM - Business Intelligence & Data Analysis


Denormalization in DW Systems

Denormalization is used to simplify multiple related tables into one.

Category
CategoryID Category
Product Name DimProduct
1 Cell Phone Product Cateogy BrandID Product Product Category Brand Product
ID ID Name ID Name
2 Tablet
1 1 1 Iphone X 1 Cell Phone Apple Iphone X

2 1 1 Iphone
XI 2 Cell Phone Apple Iphone XI
Brand
3 2 1 Ipad 3 Tablet Apple Ipad 10.2
BrandID Brand Name 10.2
4 1 2 Galaxy 4 Cell Phone Samsung Galaxy
1 Apple Note20 Note20

2 Samsung

BIDA TM - Business Intelligence & Data Analysis


Fact Tables

Fact tables contain measurements about a particular business event.

OrderID CutomerID OrderDate Revenue Quantity Discount Total Cost

544122 1 2020-5-12 250.12 1 0 190

545428 2 2020-06-11 5211.45 24 25 2500

546584 3 2020-06-15 2000.24 8 50 940

547514 4 2020-08-11 5201.2 18 0 2800

Fact tables contain IDs (or Keys).

BIDA TM - Business Intelligence & Data Analysis


Dimension Tables

Dimension tables provide descriptive information about the attributes in the fact table.

CustomerID First Name Last Name Education Level Occupation

1 John Brooks Bachelors Professional

2 Lilly Xi Graduate Degree Management

3 Taylor Hess Partial College Skilled Manual

4 Tina Navarro High School Manual

IDs are used to connect dimension tables to fact tables.

BIDA TM - Business Intelligence & Data Analysis


Relationships & Keys
Date Table (Dimension Table) Customer Table (Dimension Table)

Date (PK) Day of Week CustomerID Customer


(PK) Name
01/12/2020 Tuesday
1 John Brooks
02/12/2020 Wednesday
2 Lilly Xi
03/12/2020 Thursday 3 Taylor Hess
Order CustomerID ProductID
Date (FK) Revenue
ID (PK) (FK) (FK)
152156 01/12/2020 261.24 1 3

Sales Transaction Table (Fact Table) 138688 02/12/2020 22.99 2 1

108966 02/12/2020 350.99 1 2


115812 03/12/2020 350.99 3 2

Columns used to join tables


Product Table (Dimension Table)
are known as keys.
ProductID
ProductName SubCategory Category
A Primary Key (PK) identifies (PK)
a row in the current table. 1
Water Bottle
Water Bottles Accessories
(Blue)
A Foreign Key (FK) identifies a
2 Winter Jacket Jackets Clothing
row in another table.
3 Rain Jacket Jackets Clothing

BIDA TM - Business Intelligence & Data Analysis


Star Schema
Date Table (Dimension Table) Customer Table (Dimension Table)

Date (PK) Day of Week CustomerID Customer


(PK) Name
01/12/2020 Tuesday
1 John Brooks
02/12/2020 Wednesday
2 Lilly Xi
03/12/2020 Thursday 3 Taylor Hess
Order CustomerID ProductID
Date (FK) Revenue
ID (PK) (FK) (FK)
152156 01/12/2020 261.24 1 3

Sales Transaction Table (Fact Table) 138688 02/12/2020 22.99 2 1

108966 02/12/2020 350.99 1 2


115812 03/12/2020 350.99 3 2

Product Table (Dimension Table)


ProductID
ProductName SubCategory Category
(PK)
Water Bottle
1 Water Bottles Accessories
(Blue)

2 Winter Jacket Jackets Clothing

3 Rain Jacket Jackets Clothing

BIDA TM - Business Intelligence & Data Analysis


Star Schema

A simple star schema has one central fact table, and a number of single dimension tables.

BIDA TM - Business Intelligence & Data Analysis


Snowflake/Hybrid Schema

Product Table (Dimension Table)


Date Table (Dimension Table) Customer Table (Dimension Table)
Product SubCatego
Customer ProductName
Date (PK) Day of Week Customer Name ID (PK) ryID(FK)
ID (PK)
01/12/2020 Tuesday Water Bottle
1 1
1 John Brooks (Blue)
02/12/2020 Wednesday
2 Lilly Xi 2 Winter Jacket 2
03/12/2020 Thursday
3 Taylor Hess 3 Rain Jacket 2

Sub-Category Table
(Dimension Table)
Order CustomerID ProductID SubCategory CategoryID(FK
Date (FK) Revenue SubCategory
ID (PK) (FK) (FK) ID (PK) )
152156 01/12/2020 261.24 1 3 1 Water Bottles 1

138688 02/12/2020 22.99 2 1 2 Jackets 2

108966 02/12/2020 350.99 1 2

115812 03/12/2020 350.99 3 2


Category Table (Dimension Table)
CategoryID (PK) Category
Sales Transaction Table (Fact Table)
1 Accessories
2 Clothing

BIDA TM - Business Intelligence & Data Analysis


Star Schema

A snowflack schema has one central fact table and includes dimension tables which are further
normalized with additional dimension tables.

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals
Working with Multiple
Tables
BIDA TM
- Intro
- Business
to Business
Intelligence
Intelligence
& Data
® Analysis
Working with Multiple Tables - Section Objectives

Tasks

Combine data from multiple Understand how to deal with


01. tables into a single query 02. MANY to MANY relationships

Skills

Learn how JOINS and Bridge Create Views Create dynamic


to read an UNIONS Tables results using
ERD subqueries

BIDA TM - Business Intelligence & Data Analysis


Relationships and ER diagrams

Scenario
Marketing would like a list of top customers
by sales, along with email addresses.

Primary keys uniquely identify


rows in a table.

Multiple primary keys mean that


the combination of both
columns is used to uniquely
identify a row.

Indicates a foreign key. This tells


us we can join these tables
together.

BIDA TM - Business Intelligence & Data Analysis


Purpose of DW Relationships
1) Relationship cardinality must be respected:

Many to One
Each key can only appear once in one table, but
may appear many times in the other.
E.g. Product tbl to Sales tbl

One to One
A single occurrence of the key in each table.
E.g. Tax Payers tbl to Social Security tbl

Many to Many
Potentially many occurrences of each key in each
table.
E.g. Customer tbl to Addresses tbl

2) Primary keys must exist in dimensions before we add the


same foreign key in fact tables.

eg. ProductID 922 must exist in DimProduct PK

DW relationships help users understand the connections between tables.


before

DW relationships also help maintain data integrity.


ProductID 922 can be used in FactInternetSales FK

BIDA TM - Business Intelligence & Data Analysis


Inner Join

FROM JOIN
OrderID Revenue CustomerID CustomerID Customer Name

152156 261.24 CG-12520 CG-12520 Claire Gute

138688 14.62 DV-13045 DV-13045 Darrin Van Huff

FROM JOIN 108966 957.57 SO-20335 SW-14531 Sara Wei

115812 1706.18 BH-11710

Inner Join
Returns only the rows
where the linking value(s) OrderID Revenue CustomerID Customer Name
match in both tables. 152156 261.24 CG-12520 Claire Gute

138688 14.62 DV-13045 Darrin Van Huff

In SQL, the INNER JOIN can be written as INNER JOIN or JOIN

BIDA TM - Business Intelligence & Data Analysis


Left Join

OrderID Revenue CustomerID CustomerID Customer Name

152156 261.24 CG-12520 CG-12520 Claire Gute

138688 14.62 DV-13045 DV-13045 Darrin Van Huff


FROM JOIN
108966 957.57 SO-20335 SW-14531 Sara Wei

115812 1706.18 BH-11710


Left Join
Returns all records from the
left (primary) table, but only
OrderID Revenue CustomerID Customer Name
the matched rows from the
JOIN table. 152156 261.24 CG-12520 Claire Gute

138688 14.62 DV-13045 Darrin Van Huff

108966 957.57 SO-20335 NULL

115812 1706.18 BH-11710 NULL

BIDA TM - Business Intelligence & Data Analysis


Right Join

OrderID Revenue CustomerID CustomerID Customer Name

152156 261.24 CG-12520 CG-12520 Claire Gute

138688 14.62 DV-13045 DV-13045 Darrin Van Huff


FROM JOIN
108966 957.57 SO-20335 SW-14531 Sara Wei

115812 1706.18 BH-11710


Right Join
Returns all records from the
right (JOIN) table, but only
OrderID Revenue CustomerID Customer Name
the matched rows from the
primary table table. 152156 261.24 CG-12520 Claire Gute

138688 14.62 DV-13045 Darrin Van Huff

NULL NULL SW-14531 Sara Wei

BIDA TM - Business Intelligence & Data Analysis


Full Outer Join

OrderID Revenue CustomerID CustomerID Customer Name

152156 261.24 CG-12520 CG-12520 Claire Gute

138688 14.62 DV-13045 DV-13045 Darrin Van Huff

108966 957.57 SO-20335 SW-14531 Sara Wei


FROM JOIN
115812 1706.18 BH-11710

Full Outer Join


Returns all records from OrderID Revenue CustomerID Customer Name
BOTH tables, matching rows
152156 261.24 CG-12520 Claire Gute
where possible.
138688 14.62 DV-13045 Darrin Van Huff

108966 957.57 SO-20335 NULL


115812 1706.18 BH-11710 NULL
NULL NULL SW-14531 Sara Wei

BIDA TM - Business Intelligence & Data Analysis


A basic INNER JOIN using sales and customers

The INNER JOIN here matches the Customer Key (FK) from FactInternetSales to the Customer Key (PK)
from DimCustomer.

SQL Code

SELECT *

Query FROM FactInternetSales AS f


INNER JOIN DimCustomer AS c
ON f.CustomerKey = c.CustomerKey

BIDA TM - Business Intelligence & Data Analysis


Returning only the TOP 100 customers

Here we GROUP BY Customer Names and Email Addresses, to ensure we have unique customers.

We then used ORDER BY – DESC to rank the unique customers, and TOP to select only 100.

We could also have grouped by Customer Key.

SQL Code
SELECT TOP(100)

CONCAT(c.FirstName,' ', c.LastName) AS CustomerName,


c.EmailAddress AS EmailAddress,
SUM(f.SalesAmount) AS AmountSpent

Query FROM FactInternetSales AS f


INNER JOIN DimCustomer AS c
ON f.CustomerKey = c.CustomerKey

GROUP BY dc.FirstName, dc.LastName, dc.EmailAddress

ORDER BY AmountSpent DESC

BIDA TM - Business Intelligence & Data Analysis


INNER JOIN the currency table

We can use INNER JOIN again, to JOIN a second table to our fact table.

SQL Code
SELECT TOP(100)

CONCAT(dc.FirstName,' ‘, dc.LastName) AS CustomerName,


dc.EmailAddress AS EmailAddress,
SUM(fs.SalesAmount) AS AmountSpent,
dcy.CurrencyName AS Currency

FROM FactInternetSales AS fs
INNER JOIN DimCustomer AS dc
Query
ON fs.CustomerKey = dc.CustomerKey
INNER JOIN DimCurrency AS dcy
ON fs.CurrencyKey = dcy.CurrencyKey

GROUP BY dc.FirstName, dc.LastName, dc.EmailAddress, dcy.CurrencyName

HAVING dcy.CurrencyName = N‘US Dollar’

ORDER BY AmountSpent DESC

BIDA TM - Business Intelligence & Data Analysis


HAVING or WHERE

In this example, it is more efficient to filter the original data using WHERE.

SQL Code
SELECT TOP(100)

CONCAT(dc.FirstName,' ‘, dc.LastName) AS CustomerName,


dc.EmailAddress AS EmailAddress,
SUM(fs.SalesAmount) AS AmountSpent,
--dcy.CurrencyName AS Currency

FROM FactInternetSales AS fs
INNER JOIN DimCustomer AS dc
ON fs.CustomerKey = dc.CustomerKey
Query
INNER JOIN DimCurrency AS dcy
ON fs.CurrencyKey = dcy.CurrencyKey

WHERE dcy.CurrencyName = N’US Dollar’

GROUP BY dc.FirstName, dc.LastName, dc.EmailAddress --, dcy.CurrencyName

--HAVING dcy.CurrencyName = N‘US Dollar’

ORDER BY AmountSpent DESC

BIDA TM - Business Intelligence & Data Analysis


RIGHT JOIN to retrieve full product catalogue

The RIGHT JOIN keeps the whole table right of the join (DimProduct)

SQL Code
SELECT

dp.EnglishProductName AS ProductName,
dp.Color AS ProductColor,
ISNULL(dp.Size,'UNKN') AS ProductSize,
ISNULL(SUM(fs.SalesAmount),0) AS SalesAmount

FROM FactInternetSales AS fs
Query
RIGHT JOIN DimProduct AS dp
ON fs.ProductKey = dp. ProductKey

WHERE dp.Status = N’Current’

GROUP BY dp.EnglishProductName, dp.Color, dp.Size

ORDER BY SalesAmount DESC

BIDA TM - Business Intelligence & Data Analysis


LEFT JOIN vs LEFT JOIN

LEFT JOINS can be easier to visualize.

Here we return all rows from our main table, and find matches where available from our JOIN table.

SQL Code
SELECT

dp.EnglishProductName AS ProductName,
dp.Color AS ProductColor,
ISNULL(dp.Size,'UNKN') AS ProductSize,
ISNULL(SUM(fs.SalesAmount),0) AS SalesAmount

FROM DimProduct AS dp
Query LEFT JOIN FactInternetSales AS fs
ON dp.ProductKey = fs. ProductKey

WHERE dp.Status = N’Current’

GROUP BY dp.EnglishProductName, dp.Color, dp.Size

ORDER BY SalesAmount DESC

BIDA TM - Business Intelligence & Data Analysis


Bridge Tables for Many to Many Relationships

Sales Transaction Table (Fact Table) Sales Reason Table (dimension)

SalesOrder SalesOrder SalesO SalesOrd


Reason
Number Line Revenue rderN er Sales Reason
Id
(PK) (PK) umber Line

SO51178 1 2319 SO51178 1 1 Price

SO51178 2 9.99 SO51178 1 2 On Promotion

SO51187 1 539.99 SO511


2 1 Price
78
SO51187 2 21.5 SO511
2 2 On Promotion
78
SO511
1 1 Price
87
SO511
1 2 On Promotion
87
SO511
2 1 Price
87
SO511
2 2 On Promotion
87
BIDA TM - Business Intelligence & Data Analysis
Bridge Tables

BIDA TM - Business Intelligence & Data Analysis


Bridge Tables for Many to Many Relationships

FactInternetSales
SalesOrder SalesOrder Revenue dimSalesReason
Number Line Reason Sales
SO51178 1 2319 ID Reason

SO51178 2 9.99 1 Price


Promotio
SO51187 1 539.99 2
n
SO51187 2 21.5

BIDA TM - Business Intelligence & Data Analysis


Bridge Tables for Many to Many Relationships
Bridge Table

FactInternetSalesReason
SalesOrder SalesOrder Reason
Number Line ID
FactInternetSales
SO51178 1 1
SalesOrder SalesOrder Revenue dimSalesReason
Number Line SO51178 1 2
Reason Sales
SO51178 1 2319 SO51178 2 1 ID Reason

SO51178 2 9.99 SO51178 2 2 1 Price


Promotio
SO51187 1 539.99 SO51187 1 1 2
n
SO51187 2 21.5 SO51187 1 2

SO51187 2 1

SO51187 2 2

BIDA TM - Business Intelligence & Data Analysis


Creating JOINS across BRIDGE tables.
The first INNER JOIN creates a link between our fact table and the bridge table.

The second INNER JOIN creates a link between the bridge table and the dimension.
SQL Code
SELECT

--fs.SalesOrderNumber AS InvoiceNumber,
--fs.SalesOrderLineNumber AS InvoiceLineNumber,
dsr.SalesReasonReasonType AS SalesReason,
SUM(fs.SalesAmount) AS SalesAmount

FROM FactInternetSales AS fs
Query
INNER JOIN FactInternetSalesReason AS fsr
ON fs.SalesOrderNumber = fsr.SalesOrderNumber AND fs.SalesOrderLineNumber = fsr.SalesOrderLineNumber
INNER JOIN DimSalesReason dsr
ON fsr.SalesReasonKey = dsr.SalesReasonKey

--WHERE fs.SalesOrderNumber = N'SO51178'

GROUP BY dsr.SalesReasonReasonType

The MANY to MANY nature of this connection means we should use caution when presenting results.

BIDA TM - Business Intelligence & Data Analysis


Adventure Works Schemas
FactInternetSales records of sales from the website. FactResellerSales records of sales from re-sellers.

BIDA TM - Business Intelligence & Data Analysis


UNION

A UNION combines 2 or more tables by adding the rows from one table to another.

OrderID Revenue CustomerID

1 261.24 1
OrderID Revenue CustomerID
2 14.62 2
1 261.24 1

3 957.57 3 2 14.62 2

3 957.57 3
OrderID Revenue CustomerID
4 1706.18 4
3 957.57 3

4 1706.18 4

A UNION discards duplicate values.

BIDA TM - Business Intelligence & Data Analysis


UNION ALL

A union combines 2 or more tables by adding the rows from one table to another.

OrderID Revenue CustomerID

1 261.24 1
OrderID Revenue CustomerID
2 14.62 2
1 261.24 1

3 957.57 3 2 14.62 2

3 957.57 3
OrderID Revenue CustomerID
3 957.57 3
3 957.57 3
4 1706.18 4
4 1706.18 4

UNION ALL keeps duplicate values.

BIDA TM - Business Intelligence & Data Analysis


Creating a UNION between Internet and Reseller Sales

A UNION combines the rows from multiple tables (queries).

The UNION should combine two queries that have the same columns.
The ORDER BY can only be used after the UNION of the two queries.

SQL Code
-- For illustrative purposes only. For full code refer to Completed Queries folder.

SELECT
COL1
COL2
COL3
FROM FactInternetSales

Query UNION

SELECT
COL1
COL2
COL3
FROM FactResellerSales

ORDER BY COL3

BIDA TM - Business Intelligence & Data Analysis


SELF JOIN

The SELF JOIN links a table to a copy of itself.

SELF JOINS are particularly common when dealing with hierarchical data, such as manager employee relationships.

EmployeeID EmployeeName ManagerID


1 Jason 2
2 Flo NULL EmployeeID EmployeeNa ManagerID ManagerNa
3 Rahul 2
me me
1 Jason 2 Flo
2 Flo NULL NULL
EmployeeID EmployeeName ManagerID
3 Rahul 2 Flo
1 Jason 2
2 Flo NULL
3 Rahul 2

BIDA TM - Business Intelligence & Data Analysis


CROSS JOIN
Need Something like this

The Cross Join combines every row in the first table with every row from the second table

CardType
MasterCard
Visa CardType Color
Cross Join
MasterCard Blue
Visa Blue
Color MasterCard Yellow
Blue Visa Yellow
Yellow MasterCard Red
Red Visa Red

BIDA TM - Business Intelligence & Data Analysis


Creating a VIEW

Views allow us to save queries in the database.

SQL Code

-- For illustrative purposes only. For full code refer to Completed Queries folder.

CREATE VIEW vwOrdersALL


AS

--Description of view here to assist future analysts.

SELECT
Query
COL1
COL2
COL3
FROM FactResellerSales

ORDER BY COL3

GO - GO is used as best practice at the end of the View creation code.

BIDA TM - Business Intelligence & Data Analysis


Querying a view

We can query a pre-defined view, which means we don’t need to re-do all our hard work, including JOINS.

SQL Code

SELECT *
Query
FROM vwOrdersALL

BIDA TM - Business Intelligence & Data Analysis


Creating dynamic results using SUBQUERIES

A subquery (inner query) returns data that we can re-use in our main query (outer query).

SQL Code

SELECT *
FROM vwOrdersALL

SELECT

InvoiceNumber,
InvoiceLineNumber,
Query OrderDate,
SalesAmount,
ProductName,
ProductSubcategory

FROM vwOrdersALL

WHERE OrderDate = (SELECT MAX(OrderDate) FROM vwOrdersALL)

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 4a

Summarize the Internet Sales by Subcategory and return the top 5 subcategories.

1. Write a query that returns the top 5 best-selling


subcategories by SalesAmount.
SubCategory SalesAmount

2. We’re only interested in sales from our website 1 Road Bikes 4289925.9
(internet sales). 2 Mountain Bikes 3417457.74
3 Touring Bikes 1292475.9
1. Finally, the data should only include sales where
the country is United States and the currency is 4 Tires and Tubes 88762.86

US Dollar 5 Helmets 76663.09

1. You are avoid using the view we created.

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 4b

It’s performance review time. HR Europe need to see sales by sales representative, and by currency.

EmployeeName EmployeeTitle Currency TotalSalesAmount


1. Write a query that will return a list
of all current Sales Representatives
European Sales United Kingdom
or Sales Managers in the European
1 Amy Alberts Manager Pound 441081.6364
territory.
European Sales
2. For each person, HR need to see 1 Amy Alberts Manager EURO 200960.57
sales amounts grouped by
currency. European Sales
2 Amy Alberts Manager US Dollar 90036.2386
3. Please include the following fields:
Sales United Kingdom
Full employee name, Employee 3 José Saraiva Representative Pound 3837927.19
Title, Currency Name and total sales
amount Sales
4 Rachel Valdez Representative EURO 1790640.23
4. The query should be sorted by
Employee Name and Sales Amount. Ranjit Varkey Sales
5 Chudukatil Representative US Dollar 4026954.02

Ranjit Varkey Sales


6 Chudukatil Representative EURO 482934.909

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals
SQL For Reporting
BIDA TM
- Intro
- Business
to Business
Intelligence
Intelligence
& Data
® Analysis
SQL for Reporting - Section Objectives

Tasks

Connect to our database and views Create three summary reports that
01. from popular BI tools 02. require more advanced
functions

Skills

Query SQL Query SQL in Query SQL Create a Create a Calculate %


in Power BI Excel in Tableau summary with summary with of Total
CUBE Function ROLLUP
Function

BIDA TM - Business Intelligence & Data Analysis


Finance Schema

BIDA TM - Business Intelligence & Data Analysis


Using CUBE to return subtotals and totals

When using GROUP BY across multiple columns, we lose the ability to see subtotals.

CUBE allows us to see the subtotals and totals of each row combination.

SQL Code

SELECT

Region,
ProductSubCategory,
SUM(SalesAmount) AS TotalSales
Query
FROM vwOrdersALL

WHERE YEAR(OrderDate) = 2013


AND Currency = N'US Dollar'

GROUP BY CUBE(Region, ProductCategory)

BIDA TM - Business Intelligence & Data Analysis


Using ROLLUP to return subtotals and totals

ROLLUP gives a similar outcome to CUBE, but performs better on hierarchical data.

SQL Code

SELECT

Region,
ProductSubCategory,
SUM(SalesAmount) AS TotalSales
Query
FROM vwOrdersALL

WHERE YEAR(OrderDate) = 2013


AND Currency = N'US Dollar'

GROUP BY CUBE(ProductCategory, ProductSubCategory)

BIDA TM - Business Intelligence & Data Analysis


Common scenario: Percent of total

We can use a subquery to calculate the grand total, and use it in our SELECT statement to calc % of total.

SQL Code

SELECT

Source AS Reseller,
SUM(SalesAmount) AS Sales,
SUM(SalesAmount) / (SELECT SUM(SalesAmount) FROM vwOrdersALL WHERE Country = N'United States' AND Source
<> N'Web') AS PctOfTotal
Query
FROM vwOrdersALL

WHERE Country = N'United States' AND Source <> N'Web'

GROUP BY Source

ORDER BY Sales DESC

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 5a

Create a summary of expenditure accounts.

1. Write a query that will Organization AccountType Account Amt


return the sum of actuals
from the FactFinance table. Southwest Standard Cost of
1 Division Expenditures Sales 122573
2. Filter the data to meet the
Southwest
following conditions:
2 Division Expenditures Salaries 39240
- January 2011 only
- Southwest division only Southwest
- Expenditure accounts only 3 Division Expenditures Taxes 36299

3. For each row, list the 4


Southwest
Division Expenditures Salaries 28280
Organization, Account Type
and Account Name.
...
4. Group the rows by
Southwest
Organization, Account Type 28 Division Expenditures Equipment 43
and Account.

BIDA TM - Business Intelligence & Data Analysis


SQL Fundamentals: Student Exercise 5b

Create a summary of expenditure account totals, and then calculate a PCT of total.

1. Write a query that will return Account


Description, and amounts corresponding AccountDescription Amount PctofTotal
to ACTUALS.
Standard Cost of
2. Filter the results to meet the following 1 Sales 2672904.1 0.3572165
conditions:
- ACTUALS only
2 Salaries 2163556.3 0.28914543
- Canadian Division only
- Calendar year 2013 only
- Expenditure accounts only 3 Taxes 665875.37 0.08898998

3. Create a subquery to help calculate the 4 Variances 441498.92 0.0590035


total sales that meet the same
conditions. 5 Commissions 320161.79 0.04278757

HINT: Sometimes it’s easier to create the subquery


separately and then add it to your main query. ...

Other Travel
29 Related 3053.93 0.00040814

BIDA TM - Business Intelligence & Data Analysis

You might also like