Snowflake Notes
Key Concepts
1. Database Storage
2. Query Processing
3. Cloud Services
-- Example warehouse creation (the warehouse name MY_WH is illustrative):
CREATE WAREHOUSE IF NOT EXISTS MY_WH WITH
WAREHOUSE_SIZE = 'X-SMALL'
AUTO_SUSPEND = 180
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE;
Snowflake Editions: Standard, Enterprise, Business Critical, and Virtual Private Snowflake (VPS).
4. Caching in Snowflake & Types of Caching?
When a user submits a query, Cloud Services takes the request and sends it to the
virtual warehouse (VWH); the warehouse reads the required data from database storage
and returns it to Cloud Services.
Finally, Cloud Services returns the result to the user and stores it in the result cache.
Types of Caching:
1. Result Cache: query results are stored in the Cloud Services layer.
2. Local Disk Cache: data read by a query is cached on the virtual warehouse's local disk (warehouse cache).
3. Remote Disk: the underlying cloud storage layer where the table data permanently resides.
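A minimal sketch of result-cache reuse (the sample query is illustrative; USE_CACHED_RESULT is the session parameter that controls use of the result cache):
-- Result-cache reuse is controlled by the USE_CACHED_RESULT session parameter.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
-- First execution runs on the warehouse and stores the result in the result cache.
SELECT C_NATIONKEY, COUNT(*)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
GROUP BY C_NATIONKEY;
-- Re-running the identical statement (while the underlying data is unchanged)
-- is answered from the result cache without using the warehouse.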
5. Clustering in Snowflake?
Clustering: Clustering co-locates similar rows in the same micro-partitions.
In Snowflake, table data is stored as micro-partitions in database storage. If a
clustering key is applied after the data has already been loaded, the existing
micro-partitions must be re-clustered, which incurs processing cost. It is better to
add an ORDER BY clause while loading data into the table.
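A minimal sketch of both approaches (the table and column names SALES/SALE_DATE are illustrative):
-- Load pre-sorted data and define the clustering key up front.
CREATE OR REPLACE TABLE SALES_CLUSTERED CLUSTER BY (SALE_DATE) AS
SELECT * FROM SALES ORDER BY SALE_DATE;
-- Check how well the table is clustered on that key.
SELECT SYSTEM$CLUSTERING_INFORMATION('SALES_CLUSTERED', '(SALE_DATE)');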
Providers: A data provider is any Snowflake account that creates shares and
makes them available to other Snowflake accounts to consume.
Consumers: A data consumer is any account that chooses to create a database
from a share made available by a data provider.
Reader Account: Used to share data with a consumer who does not already have a
Snowflake account and/or is not ready to become a licensed Snowflake customer.
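A minimal sketch of creating a reader account (the account name, admin name, and password are illustrative):
CREATE MANAGED ACCOUNT READER_ACCT
ADMIN_NAME = reader_admin,
ADMIN_PASSWORD = 'Choose-A-Strong-Password-1',
TYPE = READER;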
• Sharing Data
• Web UI – Inbound , Outbound
• Outbound Shares (Providers)
• Shares that you have created and you have access to
• Create a Share
• Edit Shares
• Inbound Shares (Consumers)
• View shares from providers that you have been granted access to
• Create a Database for the share
• Sharing Considerations
• Currently, consumer accounts must be in the same Snowflake
Region as your account; i.e. you can only share with other accounts
in your Snowflake Region.
• A share can include data from multiple databases.
• For data security and privacy reasons, only secure views are
supported in shares at this time. If a standard view is added to a
share, Snowflake returns an error.
• Adding accounts to a share immediately makes the share available
to consume by the accounts.
• New and modified rows in tables in a share (or in tables referenced
by a view in a share) are available immediately to all consumers who
have created a database from the share. Keep this in mind when
updating these tables.
• A new object created in a database in a share is not automatically
available to consumers.
• General Limitations for Shared Databases
• Shared databases are read-only. Users in a consumer account can
view/query data, but cannot insert or update data, or create any
objects in the database.
• The following actions are not supported:
• Creating a clone of a shared database or any schemas/tables
in the database.
• Time Travel for a shared database or any schemas/tables in
the database.
• Editing the comments for a shared database
Share DDLs
• Create Share
• Alter Share
• Drop Share
• Describe Share
• Show Shares
• Grant <Privileges> to Share
• Revoke <Privileges> from Share
• Show Grants to Share
• Show Grants of Share
create database reader_sales;
create table customer as
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER;
create share sales_s;
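Before consumers can use the share, the relevant objects must be granted to it and consumer accounts added (a minimal sketch; the consumer account identifier is illustrative):
grant usage on database reader_sales to share sales_s;
grant usage on schema reader_sales.public to share sales_s;
grant select on table reader_sales.public.customer to share sales_s;
-- Add the consumer account (replace with the real account identifier).
alter share sales_s add accounts = consumer_account1;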
Fail Safe:
Fail-safe provides a 7-day period during which historical data may be recoverable by
Snowflake.
Fail-safe does not give users access to historical data after the Time Travel
retention period has ended.
It is for use only by Snowflake to recover data that may have been lost or
damaged due to extreme operational failures.
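Fail-safe storage consumed per table can be inspected through TABLE_STORAGE_METRICS (a minimal sketch):
SELECT TABLE_NAME, ACTIVE_BYTES, TIME_TRAVEL_BYTES, FAILSAFE_BYTES
FROM INFORMATION_SCHEMA.TABLE_STORAGE_METRICS
WHERE TABLE_DROPPED IS NULL;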
Permanent Table:
CREATE OR REPLACE TABLE READER_SALES.PUBLIC.CUSTOMER
AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10000.CUSTOMER;
TRANSIENT:
CREATE OR REPLACE TRANSIENT TABLE READER_SALES.PUBLIC.CUSTOMER
AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10000.CUSTOMER;
TEMPORARY:
CREATE OR REPLACE TEMPORARY TABLE
READER_SALES.PUBLIC.CUSTOMER_TEMP
AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10000.CUSTOMER;
11. Clone & Swap creation in Snowflake?
Clone: Cloning takes a metadata-only copy of a table; the new metadata initially
points to the same storage in the backend. After cloning, the two tables are
independent of each other.
If data is added to either table, the new files are stored separately (in S3 storage).
-- CLONE TABLE
CREATE OR REPLACE TRANSIENT TABLE EMP_CLONE CLONE EMP;
SELECT * FROM EMP_CLONE;
-- Make changes in EMP_CLONE it should not affect EMP Table.
DELETE FROM EMP_CLONE WHERE FIRST_NAME='Arlene';
SELECT * FROM EMP_CLONE WHERE FIRST_NAME='Violette';
-- Check clone IDs: for EMP, ID and CLONE_GROUP_ID are the same; for EMP_CLONE,
-- CLONE_GROUP_ID matches EMP's ID, but its own ID is unique.
SELECT * FROM INFORMATION_SCHEMA.TABLE_STORAGE_METRICS WHERE
TABLE_NAME LIKE 'EMP'
AND TABLE_CATALOG='SFNOWFLAKE_TUTORIAL' AND TABLE_DROPPED IS NULL;
SELECT * FROM INFORMATION_SCHEMA.TABLE_STORAGE_METRICS WHERE
TABLE_NAME LIKE 'EMP_CLONE'
AND TABLE_CATALOG='SFNOWFLAKE_TUTORIAL' AND TABLE_DROPPED IS NULL;
Swap: Swapping exchanges the metadata of one table with another. Once EMP_DEV is
swapped with EMP_PROD, the data that was in EMP_DEV will no longer be there; those
records will now be in EMP_PROD (and vice versa).
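A minimal sketch of a swap, using the table names from the example above:
ALTER TABLE EMP_DEV SWAP WITH EMP_PROD;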
-- We will try copying data files that have error values in the date column.
copy into emp
from @my_s3_stage
file_format = (type = csv field_optionally_enclosed_by='"')
--pattern = '.*employees0[1-5].csv'
validation_mode = 'RETURN_ERRORS';
-- We will change the data type of the date column and try to load it again.
copy into emp
from @my_s3_stage
file_format = (type = csv field_optionally_enclosed_by='"')
--pattern = '.*employees0[1-5].csv'
validation_mode = 'RETURN_ERRORS';
File Options:
-- Create emp table.
create or replace table emp (
first_name string ,
last_name string ,
email string ,
streetaddress string ,
city string ,
start_date date
);
-- Copy data using a filter condition on the file name.
copy into emp
from @my_s3_stage
file_format = (type = csv field_optionally_enclosed_by='"')
pattern = '.*employees0[1-5].csv'
-- ON_ERROR='CONTINUE'
files=('employees01.csv','employees_error_file0.csv','employees_error_file1.csv');
ON ERROR:
Note 1: If there is an error, Snowpipe will not give you any notification; it will
remain silent. It is your responsibility to check for errors.
-- Validate
select SYSTEM$PIPE_STATUS('snowpipe');
select * from
table(information_schema.copy_history(table_name=>'emp_snowpipe',
start_time=> dateadd(hours, -1, current_timestamp())));
--arn:aws:sqs:us-east-1:628993367716:sf-snowpipe-AIDAZE4XND2SCZEJXXYBR-
gmNjf-iFPpApjNiM7VwsSQ
Integration: The AWS access key & secret key can be configured in a storage
integration instead of being given directly in the COPY command, because otherwise
others can see the keys, and any future change can be made in one place instead of
updating every COPY command.
create or replace storage integration s3_int
type = external_stage
storage_provider = s3
enabled = true
storage_aws_role_arn = 'arn:aws:iam::579834220952:role/snowflake_role'
storage_allowed_locations = ('s3://hartfordstar/');
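A stage can then reference the integration instead of embedding credentials (a minimal sketch, reusing the my_s3_stage name and file format seen earlier):
create or replace stage my_s3_stage
storage_integration = s3_int
url = 's3://hartfordstar/'
file_format = (type = csv field_optionally_enclosed_by='"');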
XML:
20. AWS s3 account creation (S3 bucket, Policy, Role) & Creating Integration.
AWS S3: A simple storage service that we can use to store and retrieve any
amount of data, at any time, from anywhere on the web.
How to connect S3 bucket & Integration from SK:
• Step 1: Create S3 Bucket / Access Permissions for the S3 Bucket
• Step 2: Snowflake related AWS IAM Role creation
• Step 3: Creation of Snowflake Cloud Storage Integration
• Step 4: Retrieve the AWS IAM user for your Snowflake account from the
integration and update the IAM role's trust relationship (see the sketch after this list)
• Step 5: Granting the IAM User Permissions to Access Bucket Objects
• Step 6: Creating an External Stage
• Step 7: Load Data into Table
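For Step 4, the values needed for the IAM role's trust policy can be read from the integration created earlier (a minimal sketch using s3_int):
desc integration s3_int;
-- Note STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID from the output and
-- add them to the trust relationship of the snowflake_role IAM role in AWS.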
Integration:
• Integration With S3 can be done through Below Options
• Option 1: Configuring a Snowflake Storage Integration
• Option 2: Configuring an AWS IAM Role
• Option 3: Configuring AWS IAM User Credentials
==================END================