All Course Slides
All Course Slides
All Course Slides
Data Warehousing 17 Fail Safe and Time Travel 92 Data Masking 155
Snowflake Editions 31 Zero Copy Cloning 101 Snowflake & Other 182
Tools
Snowflake Pricing 34 Swapping 104 Best Practices 186
Enjoy learning!
Best practices
✓ Pay only for what you use
Enjoy learning!
Snowflake Architecture
Snowflake Architecture
CLOUD SERVICES
QUERY PROCESSING
STORAGE
Snowflake Architecture
- Brain of the system -
Managing infrastructure, Access control, security,
Optimizier, Metadata etc.
XS 1 L 8
S 2
XL 16
M 4
4XL 128
Multi-Clustering
Multi-Clustering
… More queries …
S
S
Multi-Clustering
… More queries …
S
> Auto-Scaling
S
Multi-Clustering
Queue
Standard Economy
Favors starting Favors conserving
additional credits rather than
warehouses starting additional
warehouses
Scaling policy
HR data
HR data
ETL
sales data
data warehouse
ETL = Extract, Transform & Load
Different layers
Reporting
Data Science
Raw Data
Access layer
data integration
Staging Data
area Transformation
Cloud Computing
Cloud Computing
• Infrastucture
• Security
• Electricity
Data Center • Software/Hardware upgrades
Software-as-a-Service
Cloud Computing
Application Databases, tables etc.
Software
Snowflake
Data Managing data storage,
Virtual warehouses,
Software-as-a-service Upgrades/Metadata etc.
Operating System
Physical servers
Physical servers
Enterprise
additional features
for the needs of
large-scale
Standard enterprises Business
introductory level Critical
even higher levels of data
protection for
organizations with
extremely
sensitive data
Virtual Private
highest level of
security
Snowflake Editions
Compute Storage
Compute Storage
$/€ Credits
Consumed
Snowflake Pricing
Region: EU (Frankfurt)
Platform: AWS
Virtual Warehouse Sizes
XS 1 L 8
S 2
XL 16
M 4
4XL 128
Snowflake Pricing
On Demand Capacity
Storage ✓ We think we need 1 TB of storage Storage
On Demand Capacity
Storage Storage
ACCOUNTADMIN
SECURITYADMIN SYSADMIN
Custom Role 3
PUBLIC
Snowflake Roles
BULK CONTINUOUS
LOADING LOADING
✓ stages
Location of data files where data can be loaded from
External Internal
Stage Stage
Understanding Stages
External Internal
Stage Stage
RETURN_n_ROWS (e.g. RETURN_10_ROWS) Validates & returns the specified number of rows;
fails at the first error encountered
RETURN_ERRORS Returns all errors in Copy Command
✓ Specify maximum size (in bytes) of data loaded in that command (at least one file)
✓ Specifies whether to return only files that have failed to load in the statement result
✓ DEFAULT = FALSE
Copy Options
✓ Specifies whether to truncate text strings that exceed the target column length
Copy Options
✓ Specifies to load all files, regardless of whether they’ve been loaded previously and
have not changed since they were loaded
✓ Specifies whether to truncate text strings that exceed the target column length
✓ FALSE = COPY produces an error if a loaded string exceeds the target column length
✓ DEFAULT = FALSE
Copy Options
✓ Specify maximum size (in bytes) of data loaded in that command (at least one file)
✓ DEFAULT: FALSE
Load unstructured data
Create Stage
✓ Separated according to different workloads ✓ For known patterns of high work load
Scaling Maximize
Out Cache Usage
Data scources
Reporting
BI
ETL/ELT
Marketing
Data Science
Dedicated virtual warehouse
Identify &
Classify
Create dedicated
virtual
warehouses
✓ For every class of workload & assign users
Considerations
✓ Avoid underutilization
Refine
classifications
Let's practice!
How does it work in Snowflake?
ETL/ELT
Data scources
Scaling Up/Down
✓ Changing the size of the virtual warehouse
depending on different work loads in different periods
✓ Results are cached for 24 hours or until underlaying data has changed
What can we do?
✓ Example: Team of Data Scientists run similar queries, so they should all use
the same warehouse
Clustering in Snowflake
✓ Cluster keys are not always ideal and can change over time
✓ For large tables this improves the scan efficiency in our queries
What is a cluster key?
Event Date Event ID Customers City
2021-03-12 134584 … …
2021-12-04 134586 … …
2021-11-04 134588 … …
2021-04-05 134589 … …
2021-06-07 134594 … …
2021-07-03 134597 … …
2021-03-04 134598 … …
2021-08-03 134599 … …
2021-08-04 134601 … …
What is a cluster key?
Event Date Event ID Customers City Event Date Event ID Customers City
1
2021-12-04 134586 … … 2021-12-04 134586 … …
2
2021-06-07 134594 … … 2021-06-07 134594 … …
2021-08-03
2021-08-04
134599
134601
…
…
…
…
2021-08-03
2021-08-04
134599
134601
…
…
…
…
3
What is a cluster key?
Event Date Event ID Customers City
2021-03-12 134584 … …
2021-12-04 134586 … …
2021-11-04 134588 … …
SELECT COUNT(*)
2021-04-05 134589 … … WHERE Event_Date > '2021-07-01'
2021-06-07 134594 … … AND Event_Date < '2021-08-01 '
2021-07-03 134597 … …
2021-03-04 134598 … …
2021-08-03 134599 … …
2021-08-04 134601 … …
What is a cluster key?
Event Date Event ID Customers City
2021-03-12 134584 … …
2021-12-04 134586 … …
2021-11-04 134588 … …
SELECT COUNT(*)
2021-04-05 134589 … … WHERE Event_Date > '2021-07-01'
2021-06-07 134594 … … AND Event_Date < '2021-08-01'
2021-03-04 134598 … …
2021-08-03 134599 … …
2021-08-04 134601 … …
What is a cluster key?
Event Date Event ID Customers City Event Date Event ID Customers City
1
2021-12-04 134586 … … 2021-12-04
2021-03-12 134586
134584 … …
2
2021-06-07 134594 … … 2021-06-07
2021-07-03 134594
134597 … …
2021-08-03
2021-08-04
134599
134601
…
…
…
…
2021-08-03
2021-11-04
2021-08-04
2021-12-04
134599
134588
134601
134586
…
…
…
…
3
When to cluster?
✓ If you typically use filters on two columns then the table can also benefit
from two cluster keys
✓ If you typically use filters on two columns then the table can also benefit
from two cluster keys
S3 notification Serverless
Load
COPY
S3 bucket
Snowflake DB
Setting up Snowpipe
RETENTION
PERIODE
DEFAULT = 1
Fail Safe
Current
Data Storage
Continuous Data Protection Lifecycle
✓ No user
✓ Recovery beyond Time Travel
operations/queries ✓ SELECT … AT | BEFORE
✓ Access and query data etc.
✓ Restoring only by snowflake support UNDROP
etc.
✓ Time Travel Retention Period ✓ Time Travel Retention Period ✓ Time Travel Retention Period
0 – 90 days 0 – 1 day 0 – 1 day
✓ Time Travel Retention Period ✓ Time Travel Retention Period ✓ Time Travel Retention Period
0 – 90 days 0 – 1 day 0 – 1 day
Development Production
Swap
Meta data Meta data
Swapping
Copy
Account 1
Producer
Account 2
Consumer
Read-only
Data Sharing
Account 1
Account 2
Compute Resources Account 2
Read-only
Data Sharing
Account 1
Own Compute
Resources
Reader Account
Data Sharing
Reader Account
Data Sharing with
Non Snowflake Users
Reader Account
Sharing with
Non Snowflake users
New Reader ✓ Indepentant instance with
Account own url & own compute resources
Why Sampling? 10 TB
SAMPLE
500 GB
Data Sampling
Why Sampling?
Why Sampling? 10 TB
SAMPLE
500 GB
Data Sampling Methods
Every row is chosen with percentage p Every block is chosen with percentage p
Reader Account
Tasks & Streams
Scheduling Tasks
Task A Task B
Table
Stream object
Streams
Table
Stream object
DELETE
INSERT
UPDATE
Streams
Table
Stream object
Streams
Table
Stream object
METADATA$ACTION
METADATA$UPDATE
METADATA$ROW_ID
Streams
INSERT
Streams
ETL
Data scources
Streams
Reporting
Data Science
Raw Data
Access layer
data integration
Reporting
Data Science
Raw Data
Access layer
data integration
Reporting
Data Science
Raw Data
Access layer
data integration
Reporting
Data Science
Raw Data
Access layer
data integration
✓ INSERT ✓ INSERT
✓ UPDATE
✓ DELETE
Syntax
✓ Benefits
✓ Maintenance costs
When to use MV?
Stream object
VIEW / TABLE
Underlaying Table
× UDFs
× HAVING clauses.
× ORDER BY clause.
× LIMIT clause
Data Masking
Data Masking
Data Masking
Data Masking
Column-level Security
Access Control
Access Control
Discretionary Role-based
Access Control Access Control
(DAC) (RBAC)
✓ Each object has an owner who can ✓ Access privileges are assigned to
grant access to that object roles, which are in turn assigned
to users
Access Control
GRANT <role>
TO <user>
User 1
Creates
Role 1 Table Role 2
Owns User 2
Privilege
Role 3
GRANT <privilege> User 3
ON <obeject>
TO <role>
Securable objects
Account
Other Account
User Role Database Warehouse objects
Schema
Other Schema
Table View Stage Integration objects
Access Control
ACCOUNTADMIN
SECURITYADMIN SYSADMIN
USERADMIN
PUBLIC
Snowflake Roles
ACCOUNTADMIN
SECURITYADMIN SYSADMIN
Custom Role 3
PUBLIC
Snowflake Roles
ACCOUNTADMIN
SECURITYADMIN SYSADMIN
USERADMIN
PUBLIC
ACCOUNTADMIN
Top-Level-Role
✓ Account operations
✓ Multi-factor authentification
(create reader account, billing
At least two users should be assigned to that role
✓ First etc.)
✓
user will have this role assigned
✓ Avoid creating objects with that role unless you have
✓ Initial setup & managing account level to
objects
ACCOUNTADMIN
ACCOUNTADMIN
✓ Reader Account
✓ Multi-Factor Authentification
✓ USERADMIN role is
granted to
SECURITYADMIN
✓ Can manage users
and roles
✓ Can manage any
object grant globally
SECURITYADMIN
Custom Role 1
Custom Role 3
SYSADMIN
Database administrators
Data scources
Reporting
BI
ETL/ELT
Marketing
Data Science
Snowflake & other tools
✓ Virtual warehouses
✓ Table design
✓ Monitoring
✓ Retention period
How does it work in Snowflake?
ETL/ELT
Data scources
Virtual warehouse
✓ Dimensions
Retention period