Overview of
Databricks SQL
D ATA B R I C K S C O N C E P T S
Kevin Barlow
Data Practitioner
DATABRICKS CONCEPTS
Databricks for SQL Users
Databricks SQL
Data Warehousing for the Lakehouse
Familiar environment for SQL users
SQL-optimized performance (Photon)
Connect to your favorite BI tools
Comes built into the platform!
DATABRICKS CONCEPTS
Databricks SQL vs. other databases
Databricks SQL Other Data Warehouses
Open file format (Delta) Proprietary data format
DATABRICKS CONCEPTS
Databricks SQL vs. other databases
Databricks SQL Other Data Warehouses
Open file format (Delta) Proprietary data format
Separation of compute and storage Storage often tied to compute
DATABRICKS CONCEPTS
Databricks SQL vs. other databases
Databricks SQL Other Data Warehouses
Open file format (Delta) Proprietary data format
Separation of compute and storage Storage often tied to compute
ANSI SQL Tech-specific SQL
DATABRICKS CONCEPTS
Databricks SQL vs. other databases
Databricks SQL Other Data Warehouses
Open file format (Delta) Proprietary data format
Separation of compute and storage Storage often tied to compute
ANSI SQL Tech-specific SQL
Integrated into other data workloads Usually lacking advanced analytics
DATABRICKS CONCEPTS
SQL in the Lakehouse Architecture
DATABRICKS CONCEPTS
Let's review!
D ATA B R I C K S C O N C E P T S
Getting started with
Databricks SQL
D ATA B R I C K S C O N C E P T S
Kevin Barlow
Data Practitioner
SQL Compute vs. General Compute
Designing compute clusters for data science is inherently different than designing compute
or data engineering workloads... for SQL workloads
import pyspark.sql.functions as F SELECT *
FROM user_table u
spark_df = (spark LEFT JOIN product_use p
.read ON u.userId = p.userId
.table('user_table')) WHERE country = 'USA'
AND utilization >= 0.6
spark_df = (spark_df
.withColumn('score',
F.flatten(...))
)
DATABRICKS CONCEPTS
SQL Warehouse
DATABRICKS CONCEPTS
SQL Warehouse
SQL Warehouse Configuration Options
1. Cluster Name
2. Cluster Size (S, M, L, etc.)
3. Scaling behavior
DATABRICKS CONCEPTS
SQL Warehouse
SQL Warehouse Configuration Options
1. Cluster Name
2. Cluster Size (S, M, L, etc.)
3. Scaling behavior
4. Cluster Type
DATABRICKS CONCEPTS
SQL Warehouse Types
Different types provide different benefits Classic
Most basic SQL compute
In customer cloud
Pro Serverless
More advanced features than Classic Cutting edge features
In customer cloud In Databricks cloud
Most cost performant
DATABRICKS CONCEPTS
SQL Editor
DATABRICKS CONCEPTS
Common SQL Commands
COPY INTO CREATE <entity> AS
Grab raw data and put into Delta Create a Table or View
The Extract of ETL The Transform in ETL
COPY INTO my_table CREATE TABLE events
FROM '/path/to/files' USING DELTA
FILEFORMAT = <format> AS (
FORMAT_OPTIONS ('mergeSchema' = 'true') SELECT *
COPY_OPTIONS ('mergeSchema' = 'true'); FROM raw_events
WHERE ...
)
DATABRICKS CONCEPTS
Let's practice!
D ATA B R I C K S C O N C E P T S
Databricks SQL
queries and
dashboards
D ATA B R I C K S C O N C E P T S
Kevin Barlow
Data Practitioner
Databricks SQL Assets
DATABRICKS CONCEPTS
Databricks SQL Assets
DATABRICKS CONCEPTS
Visualizations
Lightweight, in-platform visualizations
Support for standard visual types
Ability to quickly comprehend data in a graphical way
DATABRICKS CONCEPTS
Databricks SQL Assets
DATABRICKS CONCEPTS
Databricks SQL Assets
DATABRICKS CONCEPTS
Dashboards
Lightweight, easily created dashboards
Ability to share and govern across your organization
Scalable and performant
DATABRICKS CONCEPTS
Query Filters
Filters
Interactive query / dashboard components
that allow the user to reduce the size of the
result dataset SELECT *
Works on the client-side, so is very fast FROM nyctaxi.trips
WHERE pickup_zip = 10103
Supports single select, multi-select, text
AND dropoff_zip = 10023
fields, and date / time pickers
DATABRICKS CONCEPTS
Query Parameters
Parameters
More flexible than filters, and supports
more kinds of selectors SELECT *
Allow the user to provide a value that is FROM nyctaxi.trips
input into the underlying SQL query text WHERE pickup_zip = 10103
AND dropoff_zip = 10023
Created in the query by using the {{ }}
AND {{ nullCheck }} IS NOT NULL
syntax
DATABRICKS CONCEPTS
Let's practice!
D ATA B R I C K S C O N C E P T S
Creating a
Databricks SQL
Dashboard
D ATA B R I C K S C O N C E P T S
Kevin Barlow
Data Practitioner
Let's practice!
D ATA B R I C K S C O N C E P T S