Data Engineers Guide to Python on Snowflake
Snowpark is Snowflake's developer framework that enables all data users to bring their work to the Snowflake Data Cloud, with native support for Python, SQL, Java, and Scala. With Snowpark, data engineers can execute pipelines that feed ML models and applications faster and more securely, in a single platform, using their language of choice.

This guide covers how Snowpark fits into the larger Snowflake data engineering ecosystem. We'll also share resources designed to help data engineers get started with Snowflake and Snowpark.
SNOWPARK FOR
DATA ENGINEERING
Snowpark, a modern developer framework for Snowflake, allows data engineers to build simple, governed, and fast pipelines in their preferred programming languages. Snowpark offers data engineers many benefits:

• A single platform that supports multiple languages, including SQL, Python, Java, and Scala
• Consistent security across all workloads with no governance trade-offs
• Faster, cheaper, and more resilient pipelines

A SINGLE PLATFORM
Architecture complexity increases significantly when different teams use different languages across multiple processing engines. Snowpark streamlines architectures by natively supporting the programming languages of choice, without the need for separate processing engines, as shown in Figure 1. Instead, Snowpark brings all teams together to collaborate on the same data in a single platform—Snowflake.

CUSTOMER SPOTLIGHT
Snowflake customer HyperFinity, a no-code decision intelligence platform for retailers and CPGs, uses SQL and Python for its ML and AI initiatives. With Snowpark, HyperFinity has a single platform that supports both languages, eliminating cumbersome data movement and the code developed to keep data in sync across different services. As a result, HyperFinity works more seamlessly—developing, testing, and deploying Python and SQL in one environment for more agile overall operations.
Figure 1: Snowpark allows developers working in many languages to leverage the power of Snowflake.
NO GOVERNANCE TRADE-OFFS
Enterprise-grade governance controls and security are built into Snowflake. For example, Snowpark is secure by design: it isolates data to protect the network and host from malicious workloads, and it gives administrators control over the libraries developers execute. Developers can build confidently, knowing data security and compliance measures are consistent and built in.

CUSTOMER SPOTLIGHT
EDF, a supplier of gas and zero-carbon electricity to homes and businesses in the United Kingdom, tapped Snowpark to help deploy data applications. By working within Snowflake, the project did not require additional sign-offs and meetings to approve data accessibility. Instead, the EDF team could scale seamlessly by working within the security rules Snowflake enables that already applied to the project. Since integrating Snowpark into its data engineering operations, EDF sped up production of customer-facing, ML-driven programs from several months to just three to four weeks, increasing output by 4x.

FASTER, CHEAPER PIPELINES
Snowpark enables pipelines with better price performance, transparent costs, and less operational overhead thanks to Snowflake's unique multi-cluster shared data architecture. Snowflake is a single, integrated platform that delivers the performance, scale, elasticity, and concurrency today's organizations require.

CUSTOMER SPOTLIGHT
These benefits can be seen at IQVIA, a leading provider of analytics, technology solutions, and clinical research services in the life sciences industry, and a Snowflake customer. As IQVIA processed increasingly large volumes of structured, semi-structured, and unstructured data, the company had to manage mounting complexity as its business scaled. Since implementing Snowpark in Snowflake, IQVIA has developed data engineering pipelines and intelligent apps more quickly and easily, with consistent enterprise-level governance features such as row-level access, data masking, and closer proximity of data to processing. By leveraging Snowpark to build pipelines that process large volumes of data, IQVIA has realized a 3x cost savings compared to its previous pipeline processes.

SNOWFLAKE (AND SNOWPARK) FOR DATA ENGINEERING
Snowpark is a powerful developer framework for data engineering. Some of the critical use cases for data engineers working in Snowpark include:

• ETL/ELT: Data teams can use Snowpark to transform raw data into modeled formats regardless of type, including JSON, Parquet, and XML. All data transformations can then be packaged as Snowpark stored procedures to operate and schedule jobs with Snowflake Tasks or other orchestration tools (a minimal sketch follows this list).
• Custom logic: Users can leverage Snowpark's User Defined Functions (UDFs) to streamline architecture, with complex data processing and custom business logic written in Python or Java running in the same platform as SQL queries and transformations. There are no separate clusters to manage, scale, or operate.
• Data science and ML pipelines: Data teams can use the integrated Anaconda repository and package manager to collaborate in bringing ML data pipelines to production. Trained ML models can also be packaged as a UDF to run model inference close to the data, enabling faster paths from model development to production.
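To make the ETL/ELT use case concrete, here is a minimal sketch of a Snowpark transformation that models raw JSON into a typed table. The table names (raw_orders, orders_modeled), the column names, and the existing session object are illustrative assumptions, not examples from this guide.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col
from snowflake.snowpark.types import StringType, DoubleType

# Assumes `session` is an existing Snowpark Session and `raw_orders`
# holds semi-structured JSON in a VARIANT column named `v`.
def model_raw_orders(session: Session) -> None:
    raw = session.table("raw_orders")
    modeled = raw.select(
        col("v")["order_id"].cast(StringType()).alias("order_id"),
        col("v")["customer"]["id"].cast(StringType()).alias("customer_id"),
        col("v")["amount"].cast(DoubleType()).alias("amount"),
    )
    # Write the modeled output; the transformation runs fully inside Snowflake.
    modeled.write.mode("overwrite").save_as_table("orders_modeled")

A transformation like this can then be registered as a stored procedure and scheduled with a task, as described later in this guide.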
SNOWFLAKE FOR PYTHON
Using Snowpark for Python, data engineers can take advantage of familiar tools and programming languages while benefiting from the scale, security, and performance of the Snowflake engine. All processing runs in a secure Python sandbox right next to your data, resulting in faster, more scalable pipelines with built-in governance regardless of the language used. Figure 2 gives an overview of both the Snowpark Client API and the Snowflake server-side runtime.

Figure 2: Snowpark for Python architecture.
SNOWPARK CLIENT API
The Snowpark Client API is open source and works with any Python environment. It allows data engineers to build queries using DataFrames right in their Python code, without having to create and pass along SQL strings.

SNOWPARK DATAFRAME
Snowpark brings deeply integrated, DataFrame-style programming to the languages that data engineers prefer to use. Data engineers can build queries in Snowpark using DataFrame-style programming in Python, with their IDE or development tool of choice. Behind the scenes, all DataFrame operations are transparently converted into SQL queries that are pushed down to the Snowflake scalable processing engine. Because DataFrames use first-class language constructs, engineers also benefit from support for type checking, IntelliSense, and error reporting in their development environment.

For custom Python or Java code, there is no translation to SQL. Rather, the code is serialized and sent to Snowflake to be processed inside the Java or Python secure sandbox. In the case of Python, if the custom code includes any third-party open source libraries available in the integrated Anaconda package repository, the package manager can help ensure code runs without complex environment management.

SNOWFLAKE SERVER-SIDE RUNTIME
Snowflake is built for the cloud as a data platform that architecturally separates but logically integrates storage and compute, and is optimized to enable near-limitless amounts of these resources. Elastic scaling, multi-language processing, and unified governance also underpin Snowflake's architecture.

The intelligent infrastructure is what makes everything just work. Compute clusters can be started, stopped, or resized—automatically or on the fly—accommodating the need for more or less compute at any time. Along with flexibility, Snowflake prioritizes speed, granting near-instant access to dedicated compute clusters for each workload, so users can take advantage of near-limitless concurrency without degrading performance. The three architectural layers that integrate within Snowflake's single platform are shown in Figure 3.

Figure 3: The Snowflake platform architecture.

The Snowpark Python server-side runtime makes it possible to write Python UDFs and stored procedures that are deployed into Snowflake's secured Python sandbox. UDFs and stored procedures are two other key components of Snowpark that allow data engineers to bring custom Python logic to Snowflake's compute engine, while taking advantage of open source packages pre-installed in Snowpark.
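As a brief illustration of the Client API and DataFrame pushdown, the following sketch connects to Snowflake and builds a query that executes entirely inside the Snowflake engine. The connection parameters and the table name (store_sales) are placeholders, not values from this guide.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Placeholder credentials; in practice, load these from a secure config.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

# DataFrame operations are lazily translated to SQL and pushed down;
# nothing is pulled to the client until an action such as show() or collect().
daily_sales = (
    session.table("store_sales")
    .filter(col("sale_date") >= "2023-01-01")
    .group_by("sale_date")
    .agg(sum_(col("amount")).alias("total_amount"))
)
daily_sales.show()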
SNOWPARK USER DEFINED FUNCTIONS (UDFS)
Custom logic written in Python runs directly in Snowflake using UDFs. Functions can stand alone or be called as part of a DataFrame operation to process the data. Snowpark takes care of serializing the custom code into Python bytecode and pushes all of the logic to Snowflake, so it runs next to the data.

# Given geo-coordinates, UDF to calculate distance between distribution center and shipping locations
from snowflake.snowpark.functions import udf
import geopandas as gpd
from shapely.geometry import Point
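The snippet above shows only the imports for this UDF. Below is a minimal sketch of how such a distance UDF might be completed and registered, assuming coordinates arrive as longitude/latitude floats, an active Snowpark session, and geopandas/shapely installed in the local client environment; the function name, package list, and projection choice are illustrative rather than taken from the original example.

from snowflake.snowpark.functions import udf
import geopandas as gpd
from shapely.geometry import Point

# Assumes an active Snowpark session has already been created.
@udf(name="calculate_distance",
     packages=["geopandas", "shapely", "pandas"],
     replace=True)
def calculate_distance(dc_lon: float, dc_lat: float,
                       ship_lon: float, ship_lat: float) -> float:
    # Build point geometries and project to a metric CRS so the
    # result is a distance in meters (EPSG:3857 as a simple default).
    points = gpd.GeoSeries(
        [Point(dc_lon, dc_lat), Point(ship_lon, ship_lat)], crs="EPSG:4326"
    ).to_crs(epsg=3857)
    return float(points.iloc[0].distance(points.iloc[1]))

Once registered, the UDF can be called inline from a DataFrame operation, so the distance calculation runs next to the data.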
STORED PROCEDURES
Snowpark stored procedures help data engineers operationalize their Python code and run, orchestrate, and schedule their pipelines. A stored procedure is created once and can be executed many times with a simple CALL statement from your orchestration or automation tools. Snowflake supports stored procedures in SQL, Python, Java, Scala, and JavaScript. Snowpark serializes the Python code and its dependencies into bytecode and stores them in a Snowflake stage automatically. Stored procedures can be created either as a temporary (session-level) or permanent object in Snowflake.

Stored procedures are single-node, which means transformations or analysis of data at scale inside a stored procedure should leverage the Client API or other deployed UDFs to scale compute across all nodes of a compute cluster.

Below is a simple example of how to operationalize a Snowpark for Python pipeline that calculates and applies a company's sales bonuses on a daily basis.

-- Create python stored procedure to host and run the snowpark pipeline to calculate and apply bonuses
create or replace procedure apply_bonuses(sales_table string, bonus_table string)
returns string
language python
runtime_version = '3.8'
packages = ('snowflake-snowpark-python')
handler = 'apply_bonuses'
as
$$
from snowflake.snowpark.functions import col
from snowflake.snowpark.types import *

def apply_bonuses(session, sales_table, bonus_table):
    session.table(sales_table).select(col("rep_id"), col("sales_amount")*0.1).write.save_as_table(bonus_table)
    return "SUCCESS"
$$;

-- Call stored procedure to apply bonuses
call apply_bonuses('wholesale_sales', 'bonuses');

-- Create a task to run the pipeline on a daily basis
create or replace task bonus_task
warehouse = 'xs'
schedule = '1440 minute'
as
call apply_bonuses('wholesale_sales', 'bonuses');

-- Query bonuses table to see newly applied bonuses
select * from bonuses;
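The same procedure can also be registered and invoked from Python through the Snowpark client, which some teams prefer so pipeline code stays in one codebase. This is a minimal sketch assuming an existing session object and a stage named @my_stage for the permanent procedure; none of these names come from the guide's example.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Assumes `session` is an active Snowpark Session.
def apply_bonuses(session: Session, sales_table: str, bonus_table: str) -> str:
    (
        session.table(sales_table)
        .select(col("rep_id"), (col("sales_amount") * 0.1).alias("bonus"))
        .write.save_as_table(bonus_table)
    )
    return "SUCCESS"

# Register the function as a permanent stored procedure backed by a stage.
session.sproc.register(
    func=apply_bonuses,
    name="apply_bonuses",
    packages=["snowflake-snowpark-python"],
    is_permanent=True,
    stage_location="@my_stage",
    replace=True,
)

# Execute it, just as a CALL statement or an orchestration tool would.
session.call("apply_bonuses", "wholesale_sales", "bonuses")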
SNOWFLAKE, ANACONDA, AND THE OPEN SOURCE ECOSYSTEM
One of the benefits of Python is its rich ecosystem of open-source packages and libraries. In recent years, open-source packages have been one of the biggest enablers of faster and easier data engineering. To leverage open-source innovation, Snowpark has partnered with Anaconda for a product integration at no additional cost or licensing to the user beyond warehouse usage.

Data engineers in Snowflake are now able to speed up their Python-based pipelines by taking advantage of the seamless dependency management and comprehensive set of curated open-source packages provided by Anaconda—all without moving or copying the data. All Snowpark users can benefit from thousands of the most popular packages pre-installed from the Anaconda repository, including fuzzywuzzy for string matching, h3 for geospatial analysis, and scikit-learn for machine learning and predictive data analysis. Additionally, Snowpark is integrated with the Conda package manager, so users can avoid broken Python environments caused by missing dependencies.

Snowpark also fully supports dbt, one of the most popular solutions for data transformation today. dbt supports a SQL-first transformation workflow, and in 2022 it introduced support for Python, running Snowpark under the hood. With dbt's support for both SQL and Python, users can write transformations in the language they find most familiar and fit for purpose, and perform analyses using state-of-the-art open-source Python packages for data engineering and data science, all within the dbt framework familiar to many SQL users.

Using open-source packages in Snowflake is as simple as the code below, which demonstrates how users can call packages such as NumPy, XGBoost, and pandas directly from Snowpark.
create or replace function py_udf()  -- illustrative function name; the original snippet omits the header
returns array
language python
runtime_version = 3.8
packages = ('numpy','pandas==1.4.*','xgboost==1.5.0')
handler = 'udf'
as $$
import numpy as np
import pandas as pd
import xgboost as xgb

def udf():
    # Return the versions of the bundled packages to confirm they are available.
    return [np.__version__, pd.__version__, xgb.__version__]
$$;
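The same Anaconda packages can also be requested from the Snowpark client when registering a function in Python. The following is a small sketch, assuming an active session; the function name mirrors the SQL example above but is otherwise illustrative.

from snowflake.snowpark.functions import udf

# Make third-party packages from the Anaconda channel available to UDFs
# registered in this session; Conda resolves the dependencies server-side.
session.add_packages("numpy", "pandas==1.4.*", "xgboost==1.5.0")

@udf(name="py_udf_versions", replace=True)
def py_udf_versions() -> list:
    import numpy as np
    import pandas as pd
    import xgboost as xgb
    return [np.__version__, pd.__version__, xgb.__version__]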
BEST PRACTICES:
DATA ENGINEERING IN
SNOWPARK WITH PYTHON
As the Snowpark for Python developer community grows rapidly, data engineers are looking for best practices to guide their work. Understanding how Snowpark DataFrames, UDFs, and stored procedures work together can make data engineers' work in Snowflake more efficient and secure. We've compiled a short list of best practices for data engineers working with Python in Snowpark.

1. Maximize use of the Snowpark client for development and the Snowflake engine for secure execution.
Snowpark can be used with your preferred IDE and development and debugging tools, and execution can be transparently pushed down to Snowflake. Maximize this utility while being mindful of the use of to_pandas() from the Snowpark client, which brings the full data set into memory. Also, cachetools is a Python library that provides a collection of caching algorithms to store a limited number of items for a specified duration. It can be used to speed up UDFs and stored procedures by ensuring the logic is cached in memory in cases of repeated reads.

2. Accelerate development-to-production flow with the Anaconda integration.
We recommend using the Snowflake Anaconda channel for local development to ensure compatibility between client- and server-side operations. Building your code against the latest stable versions of third-party packages doesn't require users to specify dependencies, because the Conda package manager takes care of this, offering tremendous peace of mind. If a desired package is not available inside Snowflake, please submit feedback through the Snowflake Community so Snowflake teams can facilitate its integration. If the package is a pure Python package, you can unblock yourself and bring in the package via stages.

3. Use vectorized UDFs for feature transformations and ML scoring.
Vectorized UDFs using the Batch API execute scalar UDFs in batches. Use the Python UDF Batch API when leveraging third-party Python packages where transformations are done independently row by row and the process can be efficiently scaled out by processing rows in batches. This is a common scenario when using third-party Python packages for machine learning-specific transformations as part of feature engineering, or when executing ML batch inference (a minimal sketch follows this list).

4. Use Snowpark-optimized warehouses for memory-intensive workloads.
Snowpark-optimized warehouses are important for data engineers working on large data sets. Consider using a Snowpark-optimized warehouse when you run into a "100357 (P0000): UDF available memory exhausted" error during development. Avoid mixing other workloads with workloads that require Snowpark-optimized warehouses. If you must mix them, consider calling the session.use_warehouse() method to switch back to standard warehouses.
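As an illustration of the third practice, here is a minimal sketch of a vectorized UDF registered through the Snowpark client using pandas type hints, so Snowflake passes rows to the function in batches. The column count, max_batch_size value, and names are assumptions for the example; in practice the function body would typically call a third-party library's batch scoring routine.

import pandas as pd
from snowflake.snowpark.types import PandasSeries, PandasDataFrame

# Assumes `session` is an active Snowpark Session.
def total_score(features: PandasDataFrame[float, float, float]) -> PandasSeries[float]:
    # `features` arrives as a pandas DataFrame holding a whole batch of rows,
    # so pandas (or an ML library) can process the batch in one call.
    return features.sum(axis=1)

total_score_udf = session.udf.register(
    total_score,
    name="total_score_batch",
    max_batch_size=1000,
    replace=True,
)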
BEYOND SNOWPARK:
OTHER CAPABILITIES IN THE SNOWFLAKE
DATA ENGINEERING ECOSYSTEM
In addition to Snowpark, Snowflake has many other data engineering capabilities that make it a fast and flexible platform that comprehensively supports simple, reliable data pipelines in any language of choice. Figure 4 offers an overview of Snowflake's advanced functionality for ingestion, transformation, and delivery that simplifies data engineering.

Snowflake allows data engineering teams to ingest all types of data using a single platform, including streaming or batch and structured, semi-structured, or unstructured data. Supported data formats include JSON, XML, Avro, Parquet, ORC, and Iceberg. Streaming data, including streams from Apache Kafka topics, can also be ingested directly into a Snowflake table with Snowpipe Streaming, currently in public preview. Thanks to the Data Cloud, all this data can be accessed and shared across providers and between internal teams, customers, partners, and other data consumers via the Snowflake Marketplace.

Data can be transformed in the data engineer's language of choice using Snowpark. Tasks can be combined with table streams for continuous ELT workflows that process recently changed table rows (a minimal sketch appears at the end of this section), and tasks are easily chained together for successive execution to support more complex periodic processing. All of this can be done fast, and scaled to meet the evolving number of users, data volumes, and jobs of complex projects.

Snowflake is constantly enhancing functionality. Dynamic tables, which provide a way to build declarative pipelines, are currently in private preview and offer a different approach to building pipelines than Snowpark for Python UDFs and stored procedures. They are designed to automatically process data incrementally as it changes, to simplify data engineering workloads. Snowflake automates all the database object and data manipulation language management, enabling data engineers to easily build scalable, performant, and cost-effective data pipelines.

The resulting data pipelines have intelligent infrastructure, pipeline automation, and data programmability. Snowflake's simplified pipelines then power analytics, applications, and ML models with only one copy of data to manage and near-zero maintenance. Data can also be accessed and shared directly, using secure data sharing capabilities, with internal teams, customers, partners, and other data providers and consumers through the Snowflake Marketplace. Data doesn't move with Snowflake's modern data sharing technology. Instead, a data provider grants a data consumer near-instant access to live, read-only copies of the data. This approach reduces latency, removes the need to copy and move stale data, and dramatically reduces the governance challenges of managing multiple copies of the same data.

Snowflake was designed to give data engineers access to all data at speed, with performance and reliability at scale, to build radically simple data pipelines. With innovative pipeline automation and data programmability, data engineers can simplify their workflows and eliminate what's unnecessary, so they can focus their effort on their most impactful work.
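To illustrate the stream-plus-task pattern mentioned above, here is a minimal sketch that creates a table stream and a task that processes newly changed rows on a schedule. It is written as Snowpark session.sql() calls to stay in Python; the object names, warehouse, and five-minute schedule are illustrative assumptions.

# Assumes `session` is an active Snowpark Session with rights to create objects.
session.sql("create or replace stream raw_orders_stream on table raw_orders").collect()

session.sql("""
    create or replace task process_raw_orders
      warehouse = 'transform_wh'
      schedule = '5 minute'
      when system$stream_has_data('RAW_ORDERS_STREAM')
    as
      insert into orders_clean
      select order_id, customer_id, amount
      from raw_orders_stream
""").collect()

# Tasks are created suspended; resume the task to start the continuous ELT loop.
session.sql("alter task process_raw_orders resume").collect()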
DATA ENGINEERING WITH SNOWFLAKE
Figure 4: Snowflake supports ingestion of unstructured, semi-structured, and structured data while automated workflows facilitate transformation and delivery.
GETTING STARTED
WITH SNOWPARK
To develop and deploy code with Snowpark, developers have always had the flexibility to work from their favorite integrated development environment (IDE) or notebook. Data engineers can easily get started in Snowpark, beginning development anywhere that can run a Python kernel. Minimizing learning curves by eliminating the need for a new tool, data engineers simply install the Snowpark Client API and establish a connection to their Snowflake account.

Snowpark aims to give developers flexibility. It supports many development interfaces, including:

• Code editors and IDEs: Many data engineers prefer to build using code editors and IDEs. These offer capabilities such as local debugging, autocomplete, and integration with source control. Snowpark works well in VS Code, IntelliJ, PyCharm, and other tools. VS Code works with a Jupyter extension that provides a notebook experience within the editor, bringing breakpoints and debugging to the notebook experience without requiring separate management of the Jupyter container or runtime. Code editors and IDEs are a great choice for a rich development and testing experience when building pipelines.

• Snowsight worksheets: Snowsight is Snowflake's web interface, providing SQL and Python (currently in public preview) support in a unified, easy-to-use experience. These worksheets provide autocomplete for the Snowpark session and can run directly from the browser as a stored procedure. Snowsight is a good option for teams looking for a zero-install editor for writing and running Snowpark, and for quickly turning that code into stored procedures that can be orchestrated as part of an automated pipeline.

• Open-source notebook solutions: One popular option for building pipelines in Snowpark is to leverage notebooks. Notebooks enable rapid experimentation using cells. With Snowpark, you can run a variety of notebook solutions such as Jupyter Notebooks, which can run locally while connected securely to Snowflake to execute data operations. Any machine running containers or Python can build and execute Snowpark pipelines. A similar approach can be used for working with Snowpark in other notebook solutions, including Apache Zeppelin. Open-source notebook solutions are a great choice for data exploration.

• Partner integrated solutions: Many Snowpark Accelerated partners offer either hosted open-source notebooks or their own integrated experiences. Their solutions come with the Snowpark APIs preinstalled out of the box and offer secure data connections. These deeply integrated experiences speed up the building and deploying of pipelines, models, and apps. More information on partner integrations can be found on the Snowpark Accelerated page.

RESOURCES
Start harnessing the power of Snowflake with Snowpark for data engineering, and get started with the resources below:

FREE TRIAL
QUICKSTART
DEVELOPER DOCUMENTATION
MEDIUM BLOG
SNOWFLAKE FORUMS
ABOUT SNOWFLAKE
Snowflake enables every organization to mobilize their data with Snowflake’s Data Cloud. Customers use the Data Cloud to unite siloed data,
discover and securely share data, and execute diverse analytic workloads. Wherever data or users live, Snowflake delivers a single data experience
that spans multiple clouds and geographies. Thousands of customers across many industries, including 573 of the 2022 Forbes Global 2000
(G2K) as of January 31, 2023, use Snowflake Data Cloud to power their businesses.
© 2023 Snowflake Inc. All rights reserved. Snowflake, the Snowflake logo, and all other Snowflake product, feature and service names mentioned herein
are registered trademarks or trademarks of Snowflake Inc. in the United States and other countries. All other brand names or logos mentioned or used
herein are for identification purposes only and may be the trademarks of their respective holder(s). Snowflake may not be associated with, or be
sponsored or endorsed by, any such holder(s).