Advanced SQL
FOR DATA ANALYSIS
Agenda
Installing SQLiteStudio
Subqueries, Derived Tables, and Unions
Regular Expressions
Advanced Joins
Window FuncCons
Programming with SQL (Python, R and Java)
About the Speaker
Thomas Nield
Business consultant for Southwest Airlines in Schedule IniCaCves
Author of Ge#ng Started with SQL by O'Reilly and Learning RxJava by Packt
Teaches a few online trainings at O'Reilly
◦ SQL Fundamentals for Data
◦ Advanced SQL for Data Analysis
◦ Reac@ve Python for Data Science
Setting Up SQLite
SQLiteStudio can be downloaded at the official site:
hSps://sqlitestudio.pl/index.rvt?act=download
For Windows and Linux, simply unzip the folder to a locaCon of your choice and open
SQLiteStudio or SQLiteStudio.exe
For MacOS, double-click the downloaded DMG to install it or drag it to your ApplicaCons folder
Getting Resource Files
The few resources needed for this class are available on GitHub:
hSps://github.com/thomasnield/oreilly_advanced_sql_for_data
Unzip the contents to a locaCon of your choice, and note where you put them
Contents include:
◦ A SQLite database file called thunderbird_manufacturing.db
◦ Class notes with all examples (in three formats)
◦ A customer_order.sql SQL script file to create a CUSTOMER_ORDER table
Section II Exercise
Bring in all fields from CUSTOMER_ORDER, but for each record show the total quanCty ordered
for that given CUSTOMER_ID and PRODUCT_ID.
Section III Exercise
Find all customers with an address ending in "Blvd" or "St"
Section VI Exercise
For every CALENDAR_DATE and CUSTOMER_ID, show the total QUANTITY ordered for the date
range of 2017-01-01 to 2017-03-31:
Section V Exercise
For the month of March, bring in the rolling sum of QUANTITY ordered (to each ORDER_DATE) by
CUSTOMER_ID and PRODUCT_ID.
Windowing Functions Support
Windowing funcCons are found on many database plagorms, including:
◦ Oracle
◦ Teradata
◦ PostgreSQL
◦ SQL Server
◦ Apache Spark SQL
◦ MySQL 8
These plagorms notably do not have windowing funcCons:
◦ MySQL (previous to version 8)
◦ SQLite
◦ MariaDB
Since SQLite does not support windowing funcCons, we will use a web-based PostgreSQL
client: hSp://rextester.com/l/postgresql_online_compiler
Mixing Programming with SQL
When using SQL with a programming plagorm like Python, Java, or R, you will constantly be
making a decision where the onus of processing will happen.
Should the database engine do the computaCon work, or the programming plagorm?
◦ You can simply pull in data and have your Python/Java/R codebase do the heavy-liiing.
◦ You can also leverage more complex SQL against the database, and have Python/Java/R consume the
results.
◦ With a very large, expensive and calculated dataset you can save it to a temporary table and use it to
support your Python/R/Java applicaCon.
A good rule of thumb: start with the simplest soluCon with minimal code/SQL that liberally hits
the database as-needed, and gradually introduce caching strategies as performance starts to
warrant it.
Never concatenate parameters, and use established SQL libraries to inject parameters safely to
prevent SQL injecCon.