Data Analysis
with
Databricks SQL
©2022 Databricks Inc. — All rights reserved 1
Getting Started
in the Course
Introductions
▪ Introductions
▪ Name
▪ SQL/Databricks Experience
▪ Professional Responsibilities
▪ Fun Personal Interest/Fact
▪ Expectations for the Course
Lesson goals
1 Get to know the instructor and other students
2 Describe the course objectives
3 Describe the course structure
4 Give an overview of the technical environment
Course Objectives
● Import data and persist it in Databricks SQL as schemas, tables, and views
● Query data in Databricks SQL
● Use Databricks SQL to create visualizations and dashboards
Agenda
Module Name | Duration
Getting Started with Databricks SQL | 1 hr 40 min
Basic SQL on Databricks SQL | 2 hrs
Presenting Data Visually | 2 hrs 20 min
● We will take 10-minute breaks at the beginning of every hour
● We will take a one hour break for lunch
Getting Started with Databricks SQL
Lesson Name | Duration
Getting Started with Databricks SQL | 10 min
Navigating Databricks SQL Demo | 20 min
Unity Catalog on Databricks SQL | 10 min
Schemas, Tables, and Views on Databricks SQL Demo | 20 min
Schemas, Tables, and Views on Databricks SQL Lab | 20 min
Basic SQL on Databricks SQL
Lesson Name | Duration
Ingesting Data for Databricks SQL | 10 min
Ingesting Data Demo | 20 min
Ingesting Data Lab | 20 min
Joins | 10 min
Delta Commands in Databricks SQL Demo | 20 min
Optional: Basic SQL Demo | 30 min
Basic SQL Lab | 30 min
Presenting Data Visually
Lesson Name | Duration
Data Visualizations | 10 min
Data Visualizations and Dashboards Demo | 30 min
Data Visualizations and Dashboards Lab | 20 min
Notifying Stakeholders Demo | 20 min
Notifying Stakeholders Lab | 20 min
Final Lab Assignment Overview | 10 min
Final Lab Assignment | 30 min
Technical Environment Overview
The Databricks SQL workspace
• Everyone is in the same workspace
• Everyone has their own schema and SQL warehouse
• Only the instructor has administrator privileges in the workspace
• Only a select few tasks in this course require admin privileges
• You will see these tasks in the slides to provide context
• The demos and labs do not require admin privileges
Technical Environment Overview, cont.
Working through the course
• Slides
• Some content is provided using slide decks
• Take notes, if you wish
• The slide decks will be provided to you
• Demonstrations
• Work through these together with the instructor
• Labs
• Reinforce your learning by accomplishing tasks on your own
• Ask questions, as needed
Databricks Certified Data Analyst Associate
Become Certified
• Five certification pathways
• Data Analyst
• Business Leader
• Data Engineer
• Machine Learning Practitioner
• Platform Administrator
• Recommended Self-Paced Courses
• How to Ingest Data for Databricks SQL
• How to Integrate BI Tools with Databricks SQL
Getting Started
with
Databricks SQL
Lesson goals
1 Describe what Databricks is.
2 Describe what Databricks SQL is.
3 Describe the benefits of Databricks SQL.
4 Describe how SQL warehouses are configured.
What is Databricks?
Databricks’ vision is to enable data-driven
innovation to all enterprises
Lakehouse Platform
SIMPLE · OPEN · COLLABORATIVE
(Diagram: Lakehouse Platform layers)
• BI & SQL Analytics • Real-time Data Applications • Data Science & Machine Learning • Data Engineering
• Data Management & Governance
• Open Data Lake: structured, semi-structured, unstructured, and streaming data
Unify your data ecosystem with open source, standards, and formats
(Diagram: partner integrations around the Lakehouse Platform, spanning Business Intelligence; Visual ETL & Data Ingestion (Azure Data Factory); data warehouses (Azure Synapse, Google BigQuery, Amazon Redshift); Machine Learning (Amazon SageMaker, Azure Machine Learning, Google AI Platform); Data Providers; and Centralized Governance (AWS Glue))
450+ partners across the data landscape, including top consulting & SI partners
Databricks SQL
Databricks SQL
Delivering analytics on the freshest data with data warehouse performance and data lake economics
■ Better price/performance than other cloud data warehouses
■ Simplified discovery and sharing of new insights
■ Connections to familiar BI tools, like Tableau or Power BI
■ Simplified administration and governance
Better price / performance
Run SQL queries on your
lakehouse and analyze your
freshest data with up to 6x
better price/performance than
traditional cloud data
warehouses.
Source: Performance Benchmark with Barcelona Supercomputing Center
Better together | Broad integration with BI tools
Connect your preferred BI tools with optimized connectors that provide fast performance, low latency, and high user concurrency to your data lake.
(Logos of supported BI tools, with more coming soon)
Why use Databricks SQL?
A new home for data analysts
Enable data analysts to quickly perform ad-hoc and exploratory data analysis with a new SQL query editor, visualizations, and dashboards. Automatic alerts can be triggered for critical changes, allowing teams to respond to business needs faster.
Simple administration and governance
Quickly set up SQL/BI-optimized compute with SQL warehouses. Databricks automatically determines instance types and configuration for the best price/performance. Then easily manage usage, audit, and troubleshoot with query history.
Use Cases
Connect existing BI tools to one source of truth for all your data
Maximize existing investments by connecting your preferred BI tools to your data lake with Databricks SQL Warehouses. Re-engineered and optimized connectors ensure fast performance, low latency, and high user concurrency to your data lake. Now analysts can use the best tool for the job on one single source of truth for your data while minimizing ETL and data silos.

Collaboratively explore the latest and freshest data
Respond to business needs faster with a self-served experience designed for every analyst in your organization. Databricks SQL Analytics provides simple and secure access to data, the ability to create or reuse SQL queries to analyze the data that sits directly on your data lake, and the ability to quickly mock up and iterate on visualizations and dashboards that best fit the business.

Build data-enhanced applications
Build rich and custom data-enhanced applications for your own organization or your customers. Benefit from the ease of connectivity, management, and better price/performance of Databricks SQL Analytics to simplify development of data-enhanced applications at scale, all served from your data lake.
Warehouse
Configuration
Warehouse Configuration
(Screenshots: SQL warehouse configuration on AWS and Azure)
Warehouse Configuration
In the course, SQL Warehouses have the following settings
• Cluster size – 2X-Small
• Scaling – Min: 1, Max: 1
• Auto-stop – After two hours
• Photon – Off
01-2 – DEMO: NAVIGATING
DATABRICKS SQL
Unity Catalog
on
Databricks SQL
Lesson goals
1 Describe the object model in Unity Catalog.
2 Write queries using three-level namespace notation.
Unity Catalog
Object Model
Object Model
(Diagram: the Unity Catalog object model)
Metastore
• Stores metadata about data assets and the permissions that govern access to them
• Created with a default storage location (an external object store)
• Managed by a metastore admin
Catalog
• First level of organization
• Users can see all catalogs where
USAGE is granted
Schema
• aka, Database
• Second level of organization
• Users can see all schemas where
USAGE is granted on both the
schema and the catalog
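As a sketch of that visibility rule, a schema can be created and made visible to a group like this (the catalog, schema, and group names here are hypothetical, chosen for illustration):

```sql
-- Hypothetical names: catalog "main", schema "sales", group "analysts"
CREATE SCHEMA main.sales;

-- A user needs USAGE on both the catalog and the schema to see the schema
GRANT USAGE ON CATALOG main TO analysts;
GRANT USAGE ON SCHEMA main.sales TO analysts;
```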
Managed Table
• Third level of organization
• Supported format: Delta
• Data is written to a new directory in
the metastore’s default location
• Created using CREATE TABLE
statement with no LOCATION
clause
• Example:
CREATE TABLE table1 …
External Table
• Third level of organization
• Data stored in a location outside
the managed storage location
• DROP TABLE does not delete data
• Can easily clone a table to a new
schema or table name without
moving data
• Supported formats:
• Delta, csv, json, avro, parquet, orc, text
Creating External Tables
• Two credential types:
• Storage Credential or External Location
• Use the LOCATION clause
• Example using External Location only
CREATE TABLE table2
LOCATION 's3://<bucket_path>/<table_directory>'
...
• Example using Storage Credential
CREATE TABLE table2
LOCATION 's3://<bucket_path>/<table_directory>'
...
WITH CREDENTIAL <credential-name>;
View
• Third level of organization
• Can be composed from tables and
views in multiple schemas or
catalogs
• Created using CREATE VIEW:
CREATE VIEW view1 AS
SELECT column1, column2
FROM table1 ...
Three-Level Namespace Notation
• Data objects are specified with up to three elements, depending on the
granularity required: catalog, schema, and table
• Example:
CREATE TABLE main.default.department
(
deptcode INT,
deptname STRING,
location STRING
);
• Or, with a USE statement:
USE main.default;
SELECT * FROM department;
Suggestions for Best Practices
• Use Delta as the data format when creating tables
• If possible, use external tables
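Taken together, those two suggestions might look like the following sketch: an external Delta table (the table name, columns, and storage path are placeholders, not from the course environment):

```sql
-- Hypothetical external Delta table; names and path are placeholders
CREATE TABLE main.default.sales_events
(
  event_id INT,
  event_ts TIMESTAMP,
  amount   DOUBLE
)
USING DELTA
LOCATION 's3://<bucket_path>/<table_directory>';
```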
01-4 – DEMO: SCHEMAS,
TABLES, AND VIEWS ON
DATABRICKS SQL
01-5 – LAB: SCHEMAS,
TABLES, AND VIEWS ON
DATABRICKS SQL
Ingesting Data
for Databricks
SQL
Lesson goals
1 Describe how to connect Databricks SQL to an object store
2 Explain how Partner Connect can be used to ingest data
3 Provide proper data access privileges to users
Ingesting Existing Data
• Databricks SQL can ingest Parquet, JSON, CSV, Delta, and more
• Individual file
• Full directory of files of a single type
• Example (Azure Databricks):
CREATE TABLE table1 LOCATION
'wasbs://[account].blob.core.windows.net/[container]/[path/]'
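As a sketch, a directory of CSV files could be registered the same way by adding a USING clause and reader options (the options shown here are illustrative; the path placeholder matches the example above):

```sql
-- Hypothetical: register a directory of CSV files as a table
CREATE TABLE table1
USING CSV
OPTIONS (header "true", inferSchema "true")
LOCATION 'wasbs://[account].blob.core.windows.net/[container]/[path/]';
```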
Partner Connect
• Connect to Databricks partners
• Data ingestion, preparation, BI, and visualization tools
• Data Ingestion:
• Fivetran
• Rivery
• Click Partner Connect in the sidebar menu to get started
• More detail in Databricks Academy course:
• How to Ingest Data for Databricks SQL
GRANT and REVOKE
• Databricks SQL supports standard GRANT and REVOKE statements in SQL
• Permission types include CREATE, MODIFY, SELECT, USAGE, and more
• Permissions can be granted to users, groups, or both
• Can also grant all permissions
• Example:
GRANT ALL PRIVILEGES ON TABLE table1 TO finance;
• Revoke privileges in the same way
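For instance, a narrower grant and its matching revoke might look like this (the table and group names are hypothetical, following the slide's example):

```sql
-- Hypothetical table and group names
GRANT SELECT ON TABLE table1 TO finance;

-- Revoking uses the same shape
REVOKE SELECT ON TABLE table1 FROM finance;
```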
Data Explorer
• A UI tool for working with database entities
• Grant and revoke permissions, view schema details, preview
sample data, and see table details and properties
• Click “Data” in the sidebar menu to access the Data Explorer
02-1 – DEMO:
INGESTING DATA
02-3 – LAB:
INGESTING DATA
Joins
Join
• Combine rows from two relations based on join criteria
• Relations include tables, views, and more
• Many join types: INNER, LEFT, RIGHT, FULL, SEMI, ANTI, and CROSS
• The join criterion is a boolean expression that specifies how rows from
the two relations are matched
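The join slides that follow all query two small tables, table1 and table2. As a sketch, sample data like the following would exercise every join type (the row values are assumptions for illustration, not the course dataset):

```sql
-- Hypothetical sample data for the join examples
CREATE TABLE table1 (id INT, name STRING);
CREATE TABLE table2 (pid INT, f_color STRING);

INSERT INTO table1 VALUES (1, 'apple'), (2, 'banana'), (3, 'cherry');
INSERT INTO table2 VALUES (1, 'red'), (2, 'yellow'), (4, 'green');

-- An INNER JOIN keeps only rows whose ids match in both tables
SELECT name, f_color
FROM table1
INNER JOIN table2 ON table1.id = table2.pid;
```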
Join
• Example:
SELECT id, name, deptname
FROM employee
INNER JOIN department ON employee.deptno = department.deptno;
INNER JOIN
SELECT name, f_color FROM table1 INNER JOIN table2 ON table1.id = table2.pid;
Output: (result table shown on slide)
LEFT JOIN
SELECT name, f_color FROM table1 LEFT OUTER JOIN table2 ON table1.id = table2.pid;
Output: (result table shown on slide)
RIGHT JOIN
SELECT name, f_color FROM table1 RIGHT OUTER JOIN table2 ON table1.id = table2.pid;
Output: (result table shown on slide)
FULL JOIN
SELECT name, f_color FROM table1 FULL OUTER JOIN table2 ON table1.id = table2.pid;
Output: (result table shown on slide)
SEMI JOIN
SELECT name FROM table1 LEFT SEMI JOIN table2 ON table1.id = table2.pid;
Output: (result table shown on slide)
ANTI JOIN
SELECT name FROM table1 LEFT ANTI JOIN table2 ON table1.id = table2.pid;
Output: (result table shown on slide)
CROSS JOIN
SELECT name, f_color FROM table1 CROSS JOIN table2;
Output: (result table shown on slide)
02-5-1 – DEMO:
DELTA COMMANDS
IN DATABRICKS SQL
02-5-2 – DEMO:
OPTIONAL:
BASIC SQL
02-6 – LAB:
BASIC SQL
Data
Visualization
Table
• Default visualization
• Customizable columns
• Change heading
• Add description
• Change font color
• Conditional font color, based on each data value
Details, Counter, Pivot
Details Counter Pivot
Charts
• Chart types: Line, Bar,
Area, Pie, Scatter, Bubble,
Heatmap, and Box
• Grouping
• Stacking
• Error Bars
Histogram
• Displays a count of values in each bucket
• Control over the number of buckets
Cohort, Funnel, Word Cloud
Cohort Funnel Word Cloud
Maps
Choropleth Map Marker Map
Sankey and Sunburst
Sankey Sunburst
Databricks Academy
• We aren’t going to cover all visualizations in this course.
• More detail can be found in the Databricks Academy course:
• Data Visualization with Databricks SQL
• If you are using Tableau or Power BI, you can connect both to Databricks SQL
• More detail in the Databricks Academy course:
• How to Integrate BI Tools with Databricks SQL
03-2 – DEMO: DATA
VISUALIZATIONS AND
DASHBOARDS
03-3 – LAB: DATA
VISUALIZATIONS AND
DASHBOARDS
03-4 – DEMO:
NOTIFYING STAKEHOLDERS
03-5 – LAB:
NOTIFYING STAKEHOLDERS
03-6 – LAB:
FINAL LAB ASSIGNMENT