Snowflake - Interview Questions
# How can you enable query acceleration?
Query acceleration is a serverless compute service offered by Snowflake that can allocate additional compute to eligible queries to improve their performance. The service is enabled at the warehouse level with an ALTER WAREHOUSE statement (see the example under the next question).
# What is QUERY_ACCELERATION_MAX_SCALE_FACTOR?
The query acceleration scale factor is a cost control that sets an upper limit on the compute the query acceleration service can lease for a warehouse, expressed as a multiple of the warehouse size. If we set the value to 1, the service can consume at most the same number of credits as the warehouse itself (COMPUTE_WH in this example):
ALTER WAREHOUSE COMPUTE_WH SET
  ENABLE_QUERY_ACCELERATION = TRUE
  QUERY_ACCELERATION_MAX_SCALE_FACTOR = 1;
# What is memory spilling?
When a warehouse runs out of memory while executing a query, it moves intermediate data to the local disk of the warehouse. This is called memory spilling. If the local disk is not sufficient, the data spills further to remote cloud storage. Memory spilling is bad for query performance because writing to disk, and especially to remote storage, is much slower than processing in memory.
Spilling can be identified with a query against the account usage views, for example:
SELECT
  warehouse_name,
  bytes_spilled_to_local_storage,
  bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
ORDER BY bytes_spilled_to_remote_storage DESC, bytes_spilled_to_local_storage DESC;
Memory spillage happens because the memory available is not sufficient for the data being processed. To reduce it:
• Reduce the amount of data being processed, for example by eliminating columns that are not used or filtering out rows that do not need to be processed.
• Increase the size of the warehouse to increase the amount of memory available.
• Use a Snowpark-optimized warehouse, which has 16x the memory and 10x the local disk space of a standard warehouse.
For comparison:
• X-Small (standard): 16 GB RAM, 200 GB SSD (Snowpark-optimized not available at this size)
• Medium (standard): 64 GB RAM, 800 GB SSD, 4 credits per hour
• Medium (Snowpark-optimized): 1024 GB RAM, 8000 GB SSD, 6 credits per hour
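A minimal sketch of creating a Snowpark-optimized warehouse for memory-heavy jobs; the warehouse name and settings are illustrative:
-- Medium Snowpark-optimized warehouse with aggressive auto-suspend
CREATE WAREHOUSE IF NOT EXISTS SNOWPARK_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;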
The process of skipping micro-partitions when reading data is called partition pruning or query pruning. The metadata stored for each micro-partition allows Snowflake to skip partitions that cannot contain data matching the WHERE clause or join condition.
# What is the metadata cache?
Snowflake stores certain information about each table, such as the row count and the min/max values of each column, as metadata in the cloud services layer, so that certain queries about the table can be answered without reading the table data.
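For example, a query like the one below can typically be answered from the metadata alone; the table and column names are illustrative:
-- Row count and min/max values are served from metadata, no table scan needed
SELECT COUNT(*), MIN(order_date), MAX(order_date)
FROM sales.orders;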
# What is the difference between automatic clustering and manually clustering tables?
Automatic clustering, as the name suggests, is done automatically by Snowflake: data is organized into micro-partitions based on Snowflake's own analysis of the data.
With manual clustering, which is only recommended for tables over 1 TB in size, we pick the cluster keys based on our understanding of the data and the columns used in filter conditions and joins for that table. We use the CLUSTER BY keyword to cluster the table; it can be specified at creation time or later with an ALTER TABLE statement, as shown below.
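A sketch of both options, using an illustrative orders table clustered on the columns most often used in filters and joins:
-- Define cluster keys at creation time
CREATE TABLE sales.orders (order_id NUMBER, order_date DATE, region STRING)
  CLUSTER BY (order_date, region);
-- Or add/change cluster keys on an existing table
ALTER TABLE sales.orders CLUSTER BY (order_date, region);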
In general, a table with a large clustering depth will perform poorly in queries. The depth can be checked with the SYSTEM$CLUSTERING_DEPTH('<table_name>') function.
# You suspect that queries are waiting to get picked up by a warehouse to be processed, how will you confirm this?
If simple queries are taking more time than they should, the cause could be too many queries running concurrently. Queuing can be confirmed with a query against WAREHOUSE_LOAD_HISTORY, for example:
SELECT
  HOUR(start_time) AS start_time_hour,
  AVG(avg_running) AS avg_nbr_of_query_running_per_hour,
  AVG(avg_queued_load) AS avg_nbr_of_queued_query_due2overloaded_WH_per_hour,
  AVG(avg_queued_provisioning) AS avg_nbr_of_queued_query_due2provisioning_WH_per_hour,
  AVG(avg_blocked) AS avg_blocked
FROM snowflake.account_usage.warehouse_load_history
GROUP BY start_time_hour
ORDER BY start_time_hour;
If we observe that there is queuing, we can convert the warehouse to a multi-cluster warehouse or increase the maximum number of clusters in the multi-cluster warehouse (see the example below).
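A sketch of enabling multi-cluster scaling on an existing warehouse; the warehouse name and cluster counts are illustrative:
-- Allow the warehouse to scale out to up to 4 clusters when queries queue
ALTER WAREHOUSE REPORTING_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';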
Unstructured Data
# Can you unload data from a relational table to a JSON stage with the COPY command?
If we create a JSON stage and try to unload relational data into it directly, the unload will fail because the stage expects JSON format. We need to use the OBJECT_CONSTRUCT function to convert the relational rows to JSON, as shown below.
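A sketch of unloading relational rows as JSON; the stage name @json_stage and the folder path are illustrative:
-- Convert each row to a JSON object before unloading to the JSON stage
COPY INTO @json_stage/customers/
FROM (SELECT OBJECT_CONSTRUCT(*) FROM customer)
FILE_FORMAT = (TYPE = 'JSON');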
# What data type do you recommend to store JSON and XML data ?
Snowflake natively supports storing of semi-structured data like JSON and XML with the VARIANT
datatype.
# Can you explain a scenario in your project where you needed to use the variant data type?
We receive IoT data files in JSON format and copy this data into Snowflake tables in the RAW layer, which have a VARIANT column. Once data lands in the raw tables, we parse the data from the variant columns and load it into the final tables in Snowflake.
The FLATTEN function is used with the ARRAY, OBJECT and VARIANT data types and expands the values they contain into multiple rows, one row per element, so that semi-structured data can be queried relationally.
SELECT student_id, first_name, last_name, cert.this, cert.value
FROM students stu,
LATERAL FLATTEN(input => stu.courses) cert;
DATA SECURITY
# How do you protect sensitive data like Date of Birth and SSN# from being seen by end users?
We can use Dynamic Data Masking to protect sensitive data from being seen by end users. Only roles that are authorized will see the data in its unmasked form.
We can also use views: create a view that contains only the columns that can be exposed to the end user and grant SELECT access on the view.
Static data masking means permanently updating sensitive data like Date of Birth and SSN# so that the real values are no longer present. For example, all birth dates can be updated to 31-Dec-2010: every row still has a birth date, but none of them is the real one.
We can also use Row Access Policies to limit the rows seen by end users who should not have access to them.
Dynamic Data Masking is a column-level security feature that uses masking policies to selectively mask plain-text data in table and view columns at query time. It is called dynamic because the masking changes depending on who is querying the data: the mask is not hard-coded into the stored data, it is applied on top of the data according to the rules we define.
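A minimal sketch of a masking policy on an SSN column; the policy, table, column and role names are illustrative:
-- Mask SSN for every role except those allowed to see it
CREATE MASKING POLICY ssn_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '***-**-****' END;
-- Attach the policy to the column
ALTER TABLE customer MODIFY COLUMN ssn SET MASKING POLICY ssn_mask;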
A Row Access Policy is a row-level security feature which determines what rows a user will see when they run a SELECT statement. Two users who select from the same table may see different data if the row access policy evaluates differently for them.
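A minimal sketch of a row access policy that limits rows by region; the policy name, the mapping table region_role_map and the orders table are all hypothetical:
-- Return only rows whose region is mapped to the current role
CREATE ROW ACCESS POLICY sales.region_policy AS (region STRING) RETURNS BOOLEAN ->
  EXISTS (
    SELECT 1
    FROM sales.region_role_map m            -- hypothetical role-to-region mapping table
    WHERE m.role_name = CURRENT_ROLE()
      AND m.region = region
  );
-- Attach the policy to the table on the region column
ALTER TABLE sales.orders ADD ROW ACCESS POLICY sales.region_policy ON (region);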
Snowflake Architecture
# What is the max data storage capacity of snowflake? The max capacity is limited only by the underlying cloud storage, which is effectively unlimited.
Cost Saving
# What are the best practices that we can follow to reduce credit consumption?
• Size the warehouse appropriately: start with a smaller warehouse, test the query, and progressively increase the size until you find a good fit. Over-provisioning wastes credits and should be avoided because cost doubles every time we size up.
• Use WAREHOUSE_LOAD_HISTORY to ensure that warehouses are being fully utilized while they are running.
• Enable auto-suspend: credits are consumed even if no queries are being executed, so enable auto-suspend to shut down warehouses when they are not needed.
• Control access to the CREATE and MODIFY privileges on warehouses to control the creation and re-sizing of warehouses, which can increase cost.
• Use STATEMENT_TIMEOUT_IN_SECONDS to kill hung or runaway queries, which can consume a lot of credits; it defines the maximum amount of time a SQL statement can run before it is canceled.
• Use resource monitors to enforce spending limits (see the example after this list).
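A sketch of a few of these settings applied together; the warehouse name, quota and monitor name are illustrative:
-- Auto-suspend after 60 seconds of inactivity and cancel statements running over 1 hour
ALTER WAREHOUSE ETL_WH SET
  AUTO_SUSPEND = 60
  STATEMENT_TIMEOUT_IN_SECONDS = 3600;
-- Resource monitor that suspends the warehouse when the credit quota is used up
CREATE RESOURCE MONITOR etl_monitor WITH CREDIT_QUOTA = 100
  TRIGGERS ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE ETL_WH SET RESOURCE_MONITOR = etl_monitor;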
Views should be defined as secure when they are specifically needed to limit access to sensitive information, for example:
CREATE SECURE VIEW view_customer AS SELECT * FROM customer;
• The query used to create a secure view is not visible to users (except the owner), providing greater security.
• Regular views are more performant than secure views: some internal optimizations used for regular views are skipped for secure views because they could expose underlying data through user code, so the secure view gives greater data security but reduced performance.
• Secure views can be included in a data share; regular views cannot be shared.
# Have you created a data share and shared data in your project?
We use data shares to share data between accounts within our own organization. We also share data with external partners who need access to our data, with the help of data shares (see the sketch below).
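A minimal sketch of creating and granting a share; the share, database, table and consumer account names are illustrative:
-- Create a share, grant access to the objects being shared, and add the consumer account
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;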
Reader accounts enable these users to get set up with Snowflake quickly without having to go through the process of signing a licensing agreement.
The account that creates the reader account is responsible for providing the compute needed for
the consumption of the data.
You can see all reader accounts by using the command SHOW MANAGED ACCOUNTS
Objects that can be included in a data share:
• Tables
• External Tables
• Secure Views
• Secure UDFs
Unloading Data
# What is data unloading ? Data unloading is the process of moving data out of snowflake to cloud
storage. We use the COPY command to unload data from snowflake.
# When unloading data into cloud storage what best practice do you use?
To ensure that the export completes quickly and with the least credit usage, I use the PARTITION BY clause in the COPY command so that Snowflake's parallelism is used and multiple files are generated per partition. I also name the output files explicitly instead of letting Snowflake name them generically, as in the sketch below.
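A sketch of an unload using PARTITION BY and an explicit file prefix; the stage, table and column names are illustrative:
-- Unload one folder per region with a meaningful file name prefix
COPY INTO @my_stage/exports/orders_
FROM sales.orders
PARTITION BY ('region=' || region)
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
HEADER = TRUE;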
# Was there a scenario in your project where you had to unload the data to an internal stage?
Requests for data from Snowflake are frequently received from business users, either for data analysis or for dealing with data quality issues. The way I provide the data is to unload it into an internal stage and use the GET command to download it; I use the SINGLE = TRUE option to ensure that the file does not get split into multiple files, as sketched below.
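A sketch of that flow, assuming an internal named stage called @adhoc_stage and an illustrative filter:
-- Unload the result set as a single uncompressed file to an internal stage
COPY INTO @adhoc_stage/customer_extract.csv
FROM (SELECT * FROM customer WHERE region = 'EAST')
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'NONE')
SINGLE = TRUE;
-- Download the file from SnowSQL on the user's machine
GET @adhoc_stage/customer_extract.csv file:///tmp/;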
# Do you prefer unloading data into multiple files or a single file from snowflake?
To take advantage of the parallelism provided by Snowflake and get the files faster, I prefer multiple files; this is the default behavior. There is also a 5 GB per-file limit when unloading to cloud storage, so if your output is bigger you need to generate multiple files.
The COPY statement supports the full SELECT syntax of Snowflake SQL, so you can join tables in the COPY command and unload data from more than one table.
The default compression when unloading data from Snowflake in Parquet format is Snappy; we can also use LZO if we wish, but we need to provide the compression type explicitly if we don't want the default. For CSV and JSON files the default compression is gzip, and the other supported compression types are bzip2, Brotli and Zstandard.
When using the COPY command we can cast columns to modify data types when unloading data in Parquet format, as sketched below.
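A sketch combining an explicit compression choice with a cast in the unload query; the stage, table and columns are illustrative:
-- Cast the amount column and unload as LZO-compressed Parquet instead of the default Snappy
COPY INTO @my_stage/orders/
FROM (SELECT order_id, amount::VARCHAR AS amount FROM sales.orders)
FILE_FORMAT = (TYPE = 'PARQUET' COMPRESSION = 'LZO');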
Load failures due to files already being present in the folder are a challenge we have encountered; we resolved that by creating a Lambda function (or Azure Function) that copies each file to a different folder and adds a timestamp to the file name.
When unloading data in CSV format we observed that Snowflake truncates decimal columns to a precision/scale of (15,9). We overcame this problem by casting the data to string before unloading.
# How can you limit the size of a file generated by unloading data from snowflake?
We use the MAX_FILE_SIZE parameter, specified in bytes, to limit the size of the files. Generated files are typically around 16 MB by default; you can make them smaller (or larger) by setting MAX_FILE_SIZE.
COPY INTO @MYDB.MYSTAGES.UNLOAD_OUTPUT
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
MAX_FILE_SIZE = 10000000;
Virtual Warehouse
# What is the maximum number of clusters you can have in a multi-cluster warehouse? 10 is the maximum number of clusters a multi-cluster warehouse can have.
It depends on the warehouse and what jobs the warehouse is designed to process. If the warehouse is designed for end-user consumption in Snowsight, we use the Economy scaling policy. For warehouses that run ETL jobs the scaling policy is Standard, and we also use the Standard policy for warehouses used for reporting via Tableau and Power BI.
Scaling up means increasing the compute and memory capacity of a warehouse; we scale up by resizing the warehouse from its current size to a larger one, and we do it when a job needs more memory and processing power.
Scaling out means increasing the number of clusters in a multi-cluster warehouse; it is recommended when a large number of queries are waiting to be picked up.
# What is the auto suspension time period you use on the warehouse used in your project?
For ETL jobs and tasks we auto-suspend the warehouse immediately. For end-user warehouses we have kept the default auto-suspend period of 10 minutes.
# Can you re-size a warehouse while it is running?
Yes, you can resize a warehouse while it is running, but if you resize it to make it bigger the currently running queries will not be able to take advantage of the larger size; only queued and new queries will.
# What is zero copy cloning, what are its advantages over CREATE TABLE AS SELECT?
Zero copy cloning is a metadata operation which creates an exact copy of the object being cloned. Because it is a metadata operation, the data is not replicated when a table, schema or database is cloned; the clone simply points to the same underlying data.
Advantages:
• It is faster to create
• It does not consume additional storage until the clone or the original is modified
# When cloning a table do the grants get copied over to roles that have access to the original table?
The grants do not get copied over when we clone a table; we need to use the COPY GRANTS clause along with the clone to ensure that the privileges get copied over.
When we clone a table, all the properties including comments, cluster keys and data get copied over; the only thing that does not get copied over is grants.
(SCENARIO QUESTION) Zero copy cloning can be used to back up tables that are needed beyond the time travel retention period. It can also be used to create new DEV/QA environments or to fully refresh data in those environments.
# Let us say that data in your production table was accidentally updated, how will you recover it if you have time travel available?
• Create a clone of the prod table with the help of time travel, at a point before the data was updated:
CREATE TABLE MYDB.PUBLIC.CLONE_CUSTOMER CLONE MYDB.PUBLIC.CUSTOMER
  AT (TIMESTAMP => TO_TIMESTAMP_TZ('04/05/2024 01:02:03', 'mm/dd/yyyy hh24:mi:ss'));
• Validate the data to verify that the cloned data is correct.
• Restore the validated data back into the production table, for example by swapping the clone with the original (ALTER TABLE ... SWAP WITH ...).
# You have a table in a snowflake prod account and you want to clone it and bring it to a snowflake database in a different account (QA), can you use zero copy cloning to do that?
Zero copy cloning works only across databases inside a single account; we cannot do zero copy cloning across accounts.
# If zero copy cloning does not work across accounts, how can you bring the data from one account to another?
• We can use shares to share the data between the accounts and then create a new table from the shared data.
• We can unload the data from Snowflake into cloud storage and then use the COPY command to load it into a table in the other account.
• We can use Python to copy data from one Snowflake account to another.
• Database replication.
# If I clone a table and update the clone, will the original table also get updated?
No. When the clone is created, both tables reference the same data files, but whenever there is an update or insert, new micro-partitions are created to store the changed data. Hence the two tables will diverge in what they return once either table is modified.
Loading Data
# Do you use transformations when loading data using the COPY command?
We generally don't use transformations when loading data into the RAW tables with the COPY command, because transformations tend to slow down the loads of large files. We prefer to bring the data into Snowflake and then do the necessary transformation in SQL.
Exception: in some cases we may receive very large files with over 1000 columns, and in this situation we may choose to load only a subset of fields into the target raw table. In that scenario we need to use a SELECT in the COPY command and may need to apply transformations.
# What is needed to establish connectivity between snowflake and cloud storage (S3 and Azure Blob Storage)?
A storage integration object helps us establish connectivity between cloud storage and Snowflake, and we can then use the COPY command to copy data from S3 into Snowflake tables, as sketched below.
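A sketch for S3, assuming an IAM role ARN and bucket created for this purpose; all names and the ARN are illustrative:
-- Storage integration that delegates authentication to an IAM role
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/landing/');
-- Stage that uses the integration; COPY commands then reference this stage
CREATE STAGE landing_stage
  URL = 's3://my-bucket/landing/'
  STORAGE_INTEGRATION = s3_int;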
# How can we ingest data without using snowpipe, COPY or ETL tools like DBT?
We can ingest data into Snowflake by creating external tables over cloud storage folders; this gives us SELECT access to the data files, and we can then run INSERT INTO target_table ... SELECT ... FROM the external table (see the sketch below). This method is very useful when you have large files, greater than around 250 MB, which are not well suited to loading with Snowpipe.
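A sketch of that pattern over a Parquet folder in an external stage; the stage, table and column names are illustrative:
-- External table exposing the staged files as rows (a single VARIANT column named VALUE)
CREATE EXTERNAL TABLE raw.ext_orders
  WITH LOCATION = @landing_stage/orders/
  FILE_FORMAT = (TYPE = 'PARQUET');
-- Pull the data into a regular Snowflake table
INSERT INTO raw.orders
SELECT value:order_id::NUMBER, value:amount::NUMBER
FROM raw.ext_orders;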
# You received files from a new vendor and you are aware of the file layout because they shared the details in the past. A target table has been created with that structure. Before loading the file into the target table, how will you validate that the file will load without any issues?
We can use VALIDATION_MODE = RETURN_ALL_ERRORS in the COPY command to see any errors the file would encounter when loaded into the target table. When the COPY command is run in validation mode, no data is loaded into the target table (see the sketch below).
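A sketch of the validation run; the table, stage and file format settings are illustrative:
-- Returns every error the load would hit, without loading any rows
COPY INTO raw.orders
FROM @landing_stage/orders/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
VALIDATION_MODE = RETURN_ALL_ERRORS;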
# How will you set up a process of automated validation and quality control of all files received from your source system before loading them to snowflake?
When running jobs on a regular basis, we can validate files by using Python to read the files from S3, check that the important fields and values are populated, and throw an error if the validation fails.
The second option is to ingest the data into Snowflake raw tables and run SQL to validate the file before loading it further.
We can also use the ON_ERROR = SKIP_FILE copy option so that files with load errors are skipped during the load.
# Once a file is loaded into a table with the COPY command, can you reload the file again?
If you try to reload the file, Snowflake will ignore it. Snowflake stores file-load metadata associated with the table, knows the file is already loaded, and stops the reload to prevent duplicates. This metadata is reset when the table is truncated, or dropped and re-created.
# How can you reload the file, even if it is already loaded?
I will use the FORCE option in the COPY command to force-load the files, as sketched below. Use case: when you need a large volume of data for performance testing.
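A sketch of the forced reload; the table, stage and file format are illustrative:
-- Reload files even though the load metadata says they were already loaded
COPY INTO perf_test.orders
FROM @landing_stage/orders/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
FORCE = TRUE;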
We use Tasks/Airflow and stored procedures to ingest data that needs to be loaded in batch mode; MERGE statements inside the stored procedure update the target table.
For tables that need to be updated in near real time we use tasks and streams: the task detects data in the stream, and the SQL statement associated with the task, generally a MERGE that uses the stream as input, updates the target table (see the sketch below).
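A minimal sketch of that stream-plus-task pattern; the stream, task, warehouse and table names are illustrative:
-- Stream that captures new rows arriving in the raw table
CREATE STREAM raw.orders_stream ON TABLE raw.orders;
-- Task that runs only when the stream has data and merges it into the target
CREATE TASK merge_orders
  WAREHOUSE = ETL_WH
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw.orders_stream')
AS
  MERGE INTO final.orders t
  USING raw.orders_stream s ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET t.amount = s.amount
  WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);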
We use Snowpipe for loading files that need to be ingested immediately and are less than about 250 MB; IoT data is ingested in this way. We also use external tables to ingest files that are over 1 GB in size, using a SELECT statement to get the data into Snowflake. Streams on external tables are used when files need to be ingested without any delay.
# When ingesting data from the data lake, are most of your files in parquet format or csv, and what is the reason for that?
Data in S3/Azure Blob Storage is stored in Parquet format; this is our enterprise standard for the data lake. Parquet helps conserve space and stores data in columnar format, which is more efficient when we select a subset of columns from Athena/Azure Data Lake Analytics.
While CSV files tend to load into Snowflake faster than Parquet files with the COPY command, we were not interested in re-creating the Parquet files just for ingestion into Snowflake, as this would consume a lot of time and compute resources and lead to duplication of data.
UDFs do not permit DDL statements, so we need to use a stored procedure instead; also, UDFs are called from a SELECT statement and cannot be invoked independently.
• We use stored procedures in tasks because a task can have only one SQL statement.
• We use stored procedures to run the COPY command to ingest files from cloud storage.
• We use stored procedures to truncate staging tables for loads that are batch based.
Stored procedures and UDFs can be written in SQL, Python, JavaScript and Java.
# What do you use in your project to create UDFs and stored procedures?
We use SQL, mainly because it is easier to maintain.
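A minimal sketch of a SQL (Snowflake Scripting) stored procedure of the kind used to truncate staging tables; the procedure and table names are illustrative:
CREATE OR REPLACE PROCEDURE truncate_staging()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  -- Clear the staging table before the next batch load
  TRUNCATE TABLE raw.stg_orders;
  RETURN 'staging truncated';
END;
$$;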