
Snowpark For Python

This document demonstrates connecting to Snowflake via Snowpark without using PySpark. It shows how to join and aggregate large tables, write results to a new table, and scale the warehouse size. Key benefits of Snowpark over Spark/PySpark are also summarized, including being quicker to migrate to, cheaper by using serverless compute that scales instantly, faster by eliminating unnecessary data movement, and easier to use with less maintenance required.

https://github.com/NickAkincilar/Sample_Snowpark_Demos/blob/main/Snowpark_Data_Engineering_Public.ipynb

https://h2o.ai/blog/h2o-integrates-with-snowflake-snowpark-java-udfs-how-to-better-leverage-the-snowflake-data-marketplace-and-deploy-in-database/

Install Snowpark
In [ ]:
# !pip install snowflake-snowpark-python

Connect to Snowflake via Snowpark (without PySpark)
In [32]:
import time
# ---> REMOVE PYSPARK REFERENCES

# import pyspark.sql.functions as f
# from pyspark.sql import SparkSession
# from pyspark.sql.functions import udf,col
# from pyspark.sql.types import IntegerType
# spark = SparkSession.builder.appName("DataEngeering1").getOrCreate()

# <--- REPLACE WITH SNOWPARK REFERENCES (rest of the code is almost identical)

import snowflake.snowpark.functions as f
from snowflake.snowpark import Session, DataFrame
from snowflake.snowpark.functions import udf, col
from snowflake.snowpark.types import IntegerType
from snowflake.snowpark.functions import call_udf

# <----- Make these changes before running the notebook -------
# Change Connection params to match your environment
# ------------------------------------------------------------------------

Warehouse_Name = 'MY_DEMO_WH'
Warehouse_Size = "LARGE"
DB_name = 'DEMO_SNOWPARK'
Schema_Name = 'Public'

CONNECTION_PARAMETERS= {
'account': '<Snowflake_Account_Locator>',
'user': 'SomeUser',
'password': 'Not4u2Know',
'role': 'SYSADMIN'
}

print("Connecting to Snowflake.....\n")
session = Session.builder.configs(CONNECTION_PARAMETERS).create()
print("Connected Successfully!...\n")

sql_cmd = f"CREATE OR REPLACE WAREHOUSE {Warehouse_Name} WAREHOUSE_SIZE = 'X-Small' AUTO_SUSPEND = 10"
session.sql(sql_cmd).collect()
print("XS Cluster Created & Ready \n")

sql_cmd = f"CREATE OR REPLACE DATABASE {DB_name}"
session.sql(sql_cmd).collect()
print("Database is Created & Ready \n")

session.use_database(DB_name)
session.use_schema(Schema_Name)
session.use_warehouse(Warehouse_Name)
Connecting to Snowflake.....

Connected Successfully!...

XS Cluster Created & Ready

Database is Created & Ready

Start Data Engineering Process


In [30]:
# 2 - READ & JOIN 2 LARGE TABLES (600M & 1M rows)
print("Joining, Aggregating with 2 large tables (600M & 1M rows) & Writing results to new table (80M rows) ..\n")

dfLineItems = session.table("SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM")  # 600 Million Rows
dfSuppliers = session.table("SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.SUPPLIER")  # 1 Million Rows

print('Lineitems Table: %s rows' % dfLineItems.count())
print('Suppliers Table: %s rows' % dfSuppliers.count())

# 3 - JOIN TABLES
dfJoinTables = dfLineItems.join(dfSuppliers,
                                dfLineItems.col("L_SUPPKEY") == dfSuppliers.col("S_SUPPKEY"))

# 4 - SUMMARIZE THE DATA BY SUPPLIER, PART, SUM, MIN & MAX
dfSummary = dfJoinTables.groupBy("S_NAME", "L_PARTKEY").agg([
    f.sum("L_QUANTITY").alias("TOTAL_QTY"),
    f.min("L_QUANTITY").alias("MIN_QTY"),
    f.max("L_QUANTITY").alias("MAX_QTY"),
])
Joining, Aggregating with 2 large tables (600M & 1M rows) & Writing results to new table (80M rows) ..

Lineitems Table: 600037902 rows

Suppliers Table: 1000000 rows

↑ Compute is NOT used up to this point (Lazy Execution Model) !!!

3. Storing the results in a table or showing them triggers the compute for this and all previous steps.
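
As a minimal illustration of this lazy model (a hypothetical snippet reusing the session and f objects created above; it is not part of the original notebook, and the PART columns are taken from the TPC-H sample schema):

# Hypothetical sketch of lazy execution: these transformations only build a
# query plan on the client side, so no warehouse compute is used yet.
dfParts = session.table("SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.PART")
dfExpensive = dfParts.filter(f.col("P_RETAILPRICE") > 1500) \
                     .select("P_PARTKEY", "P_NAME", "P_RETAILPRICE")

# Only an action (show, collect, count, or a table write) compiles the plan
# into SQL and runs it on the warehouse.
dfExpensive.show(5)

The same behavior is why the join and aggregation above cost nothing until the table write in the next cell.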
In [31]:
start_time = time.time()

# 4 - INCREASE COMPUTE SIZE
print(f"Resizing from XS(1 Node) to {Warehouse_Size} ..")

sql_cmd = f"ALTER WAREHOUSE {Warehouse_Name} SET WAREHOUSE_SIZE = '{Warehouse_Size}' WAIT_FOR_COMPLETION = TRUE"
session.sql(sql_cmd).collect()

print("Completed!...\n\n")

# 5 - WRITE THE RESULTS TO A NEW TABLE (80 Million Rows)
# <-- This is when all the previous operations are compiled & executed as a single job
print("Creating the target SALES_SUMMARY table...\n\n")
dfSummary.write.mode("overwrite").saveAsTable("SALES_SUMMARY")
print("Target Table Created!...")

# 6 - QUERY THE RESULTS (80 Million Rows)
print("Querying the results..\n")
dfSales = session.table("SALES_SUMMARY")
dfSales.show()
end_time = time.time()
# 7 - SCALE DOWN COMPUTE TO 1 NODE
print("Reducing the warehouse to XS..\n")
sql_cmd = "ALTER WAREHOUSE {} SET WAREHOUSE_SIZE = 'XSMALL'".format(Warehouse_Name)
session.sql(sql_cmd).collect()

print("Completed!...\n")

print("--- %s seconds to Join, Summarize & Write Results to a new Table --- \n" % int(end_time - start_time))
print("--- %s Rows Written to SALES_SUMMARY table" % dfSales.count())
Resizing from XS(1 Node) to LARGE ..
Completed!...

Creating the target SALES_SUMMARY table...

Target Table Created!...


Querying the results..

---------------------------------------------------------------------------
|"S_NAME"            |"L_PARTKEY"  |"TOTAL_QTY"  |"MIN_QTY"  |"MAX_QTY"  |
---------------------------------------------------------------------------
|Supplier#000941845  |13441818     |163.00       |14.00      |45.00      |
|Supplier#000816569  |1316566      |287.00       |3.00       |50.00      |
|Supplier#000305838  |18555783     |219.00       |3.00       |49.00      |
|Supplier#000030491  |10030490     |203.00       |4.00       |47.00      |
|Supplier#000659231  |1409229      |158.00       |19.00      |50.00      |
|Supplier#000911793  |13911792     |310.00       |2.00       |49.00      |
|Supplier#000560166  |9310156      |108.00       |6.00       |44.00      |
|Supplier#000598113  |7598112      |155.00       |12.00      |47.00      |
|Supplier#000951634  |16701617     |190.00       |9.00       |50.00      |
|Supplier#000460895  |7210887      |268.00       |4.00       |49.00      |
---------------------------------------------------------------------------
Reducing the warehouse to XS..

Completed!...

--- 19 seconds to Join, Summarize & Write Results to a new Table ---

--- 79975543 Rows Written to SALES_SUMMARY table
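
As an optional cleanup step (an illustrative addition, not part of the original notebook): the warehouse was created with AUTO_SUSPEND = 10, so it pauses itself after 10 seconds of inactivity, but it can also be suspended explicitly and the session closed once the job is done.

# Optional cleanup sketch: suspend the warehouse right away instead of
# waiting for AUTO_SUSPEND, then close the Snowpark session.
session.sql(f"ALTER WAREHOUSE {Warehouse_Name} SUSPEND").collect()
session.close()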

Benefits of Snowpark Over Spark & PySpark
- Quick to migrate, as the code is mostly identical and does not require learning a new language (see the sketch after this list).

- Cheaper, as compute is fully serverless. It can scale up/down instantly via code and runs (and costs) only while in use.

- Faster, as all unnecessary data movement is eliminated = less time using compute = less cost.

- Easier to use = less FTE, as little to no maintenance is needed for compute & storage.
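
To make the first point concrete, here is a hedged before/after sketch of a typical migration. The PySpark fragment is a generic example, and the target table name QTY_BY_SUPPLIER is made up for illustration; the Snowpark version reuses the session and f imports from this notebook.

# PySpark (before), shown as comments for comparison:
# result = spark.table("LINEITEM") \
#               .groupBy("L_SUPPKEY") \
#               .agg(f.sum("L_QUANTITY").alias("TOTAL_QTY"))
# result.write.mode("overwrite").saveAsTable("QTY_BY_SUPPLIER")

# Snowpark (after): same shape, only the imports and the session object change.
result = session.table("SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM") \
                .groupBy("L_SUPPKEY") \
                .agg(f.sum("L_QUANTITY").alias("TOTAL_QTY"))
result.write.mode("overwrite").saveAsTable("QTY_BY_SUPPLIER")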

https://github.com/NickAkincilar/Sample_Snowpark_Demos
