Snowflake

Snowflake is a cloud data warehouse that integrates data warehousing, data lakes, and data marts into a single platform, simplifying data architecture and reducing costs. It offers full SQL support, a managed service model, and features like real-time data sharing, automated ingestion, and zero-copy cloning for efficient data management. Snowflake's architecture supports scalable, concurrent access and provides robust tools for data organization, loading, and transformation.


Snowflake

Overview
• Snowflake is a next-generation cloud data warehouse designed to address modern data and
analytics challenges. Its vision is to allow customers to access all their data in one place,
enabling actionable insights, anytime, anywhere, with any number of users.

• Core features include full SQL compatibility, so anyone with SQL skills can work with it; a
design built exclusively for the cloud; and a fully managed service model that relieves users
of infrastructure setup and management.

• Snowflake simplifies data architecture, integrating data warehousing, data lakes, and data
marts under one platform, thus consolidating multiple legacy tools and reducing failure points
and costs.
Legacy Data Landscape vs. Snowflake

Legacy Architecture Complexity

Traditional data pipelines required multiple tools and technologies: ETL tools for extracting
and transforming data, separate platforms for structured and unstructured data, and independent
systems for data warehouses and data marts. This setup often led to increased cost, complexity,
and multiple points of failure.

Snowflake's Unified Platform

Snowflake unifies the roles of data warehouse, data lake, and data mart, and can also handle
ELT processes. Businesses can consolidate all these capabilities in a single, managed platform,
which streamlines pipelines, reduces costs, and enhances reliability.
Key Features of Snowflake

Full SQL Support

Snowflake supports standard SQL, making it accessible to professionals with existing SQL
knowledge. Its familiar structure of databases, schemas, tables, and views accelerates
onboarding and productivity.

Self-Tuning & Managed Service

The platform automatically manages performance optimization, query tuning, and infrastructure
scalability. Users do not need to handle hardware or database tuning, allowing teams to focus
on analytics.

All Data Types and Users

Snowflake stores structured, semi-structured, and unstructured data. It serves multiple
departments concurrently, including analytics, reporting, data exploration, and machine
learning teams.

Pay-Per-Use & Live Data Sharing

Snowflake employs a credit-based, pay-per-use billing model. It enables real-time data sharing
and ensures all users see updated, consistent data, supporting collaborative workflows and
eliminating data silos.
Snowflake Editions

1. Standard

Includes complete SQL data warehouse features, 1-day Time Travel, customer-dedicated
warehouses, and database replication. Designed for core BI and analytics requirements.

2. Premier

Adds 24x7 support, faster response times, and SLAs with refunds for outages. Ideal for use
cases requiring higher support guarantees.

3. Enterprise & Business Critical

Enterprise adds multi-cluster warehousing, up to 90-day Time Travel, and materialized view
support. Business Critical adds HIPAA/PCI compliance, enhanced encryption, AWS PrivateLink
support, and failover for business continuity.

4. Virtual Private Snowflake (VPS)

Runs on dedicated virtual private cloud infrastructure with additional operational visibility
and security for organizations with the most stringent requirements.
Snowflake Architecture: Multi-Cluster Shared Data

Hybrid Benefits

Snowflake combines elements of shared-disk and shared-nothing architectures, using centralized
storage with independent compute clusters. This enables scalable, concurrent access without
bottlenecks.

Key Layers

The architecture includes three main layers:
• Cloud Services (the 'brain' for management, security, and optimization)
• Storage Layer (using micro-partitioning for efficient access and Time Travel)
• Compute Layer (virtual warehouses for isolated, scalable compute resources)
Snowflake UI & User Experience

1. Getting Started

Users can set up a free trial, access the platform through a web portal, and quickly start
working with SQL or Python worksheets.

2. Navigation & Objects

The UI provides easy access to databases, schemas, tables, roles, users, warehouses, and
monitoring tools. Users can create and manage data objects through both the UI and SQL
commands.

3. Roles and Security

Access is role-based, with fine-grained permissions assigned to users. System-defined roles
(account admin, security admin, etc.) can be extended and customized to suit organizational
needs.
Data Organization: Databases, Schemas, and Tables

• Databases and schemas logically organize data within a Snowflake account.

• A database is a logical grouping of schemas; each database belongs to a single account.

• Schemas, in turn, contain objects such as tables, views, and procedures, organizing them
for different use cases or departments.
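The hierarchy above maps directly to DDL. A minimal sketch, with all object names illustrative:

```sql
-- A database groups schemas; a schema groups tables, views, procedures, etc.
CREATE DATABASE sales_db;
CREATE SCHEMA sales_db.marketing;

-- Objects are addressed by fully qualified name: database.schema.object
CREATE TABLE sales_db.marketing.campaigns (
  campaign_id INT,
  name        STRING,
  start_date  DATE
);
```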
Table Types & Views in Snowflake

Table Types

Snowflake supports permanent (full data protection), temporary (session-based), and transient
(persistent without full retention) tables. External tables reference data stored outside
Snowflake, such as in S3 or Azure.
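The four table types can be sketched in DDL as follows; names are illustrative, and the external stage is assumed to already exist:

```sql
-- Permanent: full Time Travel plus Fail-safe protection
CREATE TABLE orders (id INT, amount NUMBER(10,2));

-- Temporary: visible only to the current session, dropped when it ends
CREATE TEMPORARY TABLE orders_tmp (id INT, amount NUMBER(10,2));

-- Transient: persists across sessions, but has no Fail-safe period
CREATE TRANSIENT TABLE orders_work (id INT, amount NUMBER(10,2));

-- External: reads files in place from a stage over S3/Azure/GCS
CREATE EXTERNAL TABLE orders_ext
  WITH LOCATION = @my_s3_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET);
```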

Standard Views

Virtual tables that execute a query each time the view is accessed, simplifying complex
queries and reducing redundancy.

Secure Views

Enhanced privacy: only accessible and visible to authorized users, hiding underlying table
structures and data from unauthorized access.

Materialized Views

Store precomputed results for fast, repeated access and auto-refresh when base data changes,
improving efficiency for aggregation-heavy queries.
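The three view flavors differ only by a keyword. A sketch against a hypothetical orders table (materialized views require Enterprise edition or above):

```sql
-- Standard view: the defining query runs on every access
CREATE VIEW v_customer_totals AS
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id;

-- Secure view: definition and underlying data hidden from unauthorized roles
CREATE SECURE VIEW v_customer_totals_secure AS
  SELECT customer_id, total
  FROM v_customer_totals;

-- Materialized view: results stored and auto-refreshed as base data changes
CREATE MATERIALIZED VIEW mv_daily_totals AS
  SELECT order_date, SUM(amount) AS total
  FROM orders
  GROUP BY order_date;
```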
Stages & File Formats

Stages for Data Ingestion


Stages are named locations (internal or external)
for temporarily storing data files before loading or
unloading. Table, user, and named stages support
different sharing, security, and access needs.

Configuring File Formats


File formats (CSV, JSON, Parquet, XML, ORC) define
how Snowflake reads and writes data to and from
stages. Easily configurable through UI or SQL,
ensuring proper parsing and compatibility.
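A minimal sketch of a file format plus an internal and an external stage; the bucket URL and integration name are illustrative:

```sql
-- Reusable parsing rules for CSV files
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

-- Internal stage: files are stored inside Snowflake
CREATE STAGE my_internal_stage
  FILE_FORMAT = my_csv_format;

-- External stage: points at cloud storage via a storage integration
CREATE STAGE my_s3_stage
  URL = 's3://my-bucket/data/'
  STORAGE_INTEGRATION = my_s3_int
  FILE_FORMAT = my_csv_format;
```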
Data Loading Approaches

Bulk Loading with Copy Command


Data can be loaded from local file systems, internal
stages, or external cloud storage (AWS S3, Azure
Blob, GCP). Snowflake’s copy command enables
parallel loading, efficient error handling, and
scalable ingestion.
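A bulk-load sketch using the table's own internal stage; the PUT step runs from a SnowSQL client, and paths and names are illustrative:

```sql
-- Upload a local file into the orders table's internal stage (@%orders)
PUT file:///tmp/orders.csv @%orders;

-- Parallel bulk load with per-record error tolerance
COPY INTO orders
  FROM @%orders
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  ON_ERROR = 'CONTINUE';
```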

Continuous Loading with Snowpipe


Snowpipe enables real-time, automated ingestion
from cloud storage. Through REST APIs or auto-
ingest integration with services like S3 event
notifications, new files are continuously loaded into
Snowflake tables.
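An auto-ingest pipe can be sketched as follows; stage and table names are illustrative:

```sql
-- New files arriving in the stage are loaded automatically
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO orders
    FROM @my_s3_stage/orders/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- SHOW PIPES exposes the notification_channel (an SQS ARN on AWS)
-- that the bucket's event notifications must be pointed at
SHOW PIPES;
```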
External Stages and Snowpipe Setup
External Stage Integration
Set up secure integration with AWS, Azure, or GCP
by defining storage integration objects, IAM roles,
and external stages. Ensure secure, scalable data
access for enterprise cloud environments.
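For AWS, the integration setup can be sketched as below; the role ARN and bucket are placeholders, and the IAM trust policy on the AWS side must reference the values DESC reports:

```sql
-- Delegates authentication to an IAM role instead of stored credentials
CREATE STORAGE INTEGRATION my_s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/data/');

-- Shows the Snowflake-side IAM user and external ID to add
-- to the role's trust relationship in AWS
DESC INTEGRATION my_s3_int;

CREATE STAGE my_s3_stage
  URL = 's3://my-bucket/data/'
  STORAGE_INTEGRATION = my_s3_int;
```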

Automated Ingestion with Event Notifications


Combine Snowpipe with cloud storage event
notifications to create end-to-end automated
pipelines. Uploaded files trigger ingestion, enabling
continuous real-time analytics without manual
intervention.
Streams: Change Data Capture in Snowflake
Streams Overview
Streams are objects that track changes (inserts,
updates, deletes) on source tables, adding
metadata columns to enable change data capture
(CDC) without duplicating all data.

Types of Streams
Standard streams track all modifications, append-
only streams record only new rows, and insert-only
streams are limited to external tables. Useful for
ETL pipelines, incremental loading, and data
replication.
Using Streams in ETL
Stream changes can be consumed by downstream processes and written to target tables, enabling
just-in-time or batch data transformation and movement, which is more efficient than
repeatedly querying entire source tables.
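A CDC sketch with a standard stream (table names are illustrative); note that reading a stream inside a DML statement advances its offset, so the same changes are not delivered twice:

```sql
-- Track inserts, updates, and deletes on the source table
CREATE STREAM orders_stream ON TABLE orders;

-- Append-only variant: records new rows only
CREATE STREAM orders_append ON TABLE orders APPEND_ONLY = TRUE;

-- Consume the pending changes into a history table
INSERT INTO orders_history
  SELECT id, amount, METADATA$ACTION, METADATA$ISUPDATE
  FROM orders_stream;
```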
Tasks: Scheduling and Automating Processes

What Are Tasks?


Tasks enable scheduling and automation of SQL
queries or stored procedures. Tasks can use a user-
managed warehouse or Snowflake-managed
serverless compute.
Task Scheduling
Tasks run at defined intervals using simple time-
based expressions or cron notation. Automates
recurring ETL steps, table refreshes, and more.
Automating Data Movement
Move data from landing to target tables, refresh
materialized views, or trigger complex workflows
without manual intervention, improving data
pipeline robustness and reliability.
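A scheduled task can be sketched as follows; the warehouse, stream, and table names are illustrative:

```sql
CREATE TASK load_orders_task
  WAREHOUSE = etl_wh
  SCHEDULE = 'USING CRON 0 * * * * UTC'          -- top of every hour
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')   -- skip runs with nothing to do
AS
  INSERT INTO orders_history
    SELECT id, amount FROM orders_stream;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK load_orders_task RESUME;
```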
Time Travel & Fail Safe

Time Travel
Retrieve historical data or earlier states of tables,
schemas, or databases within a defined retention
period (up to 90 days on Enterprise edition and
above). Allows undoing accidental drops and
restoring previous versions.

Fail Safe
After the time travel retention expires, fail safe
provides an additional seven-day window for
Snowflake support to recover lost data. Designed
for critical recovery scenarios and enterprise
resiliency.
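Time Travel is exposed through AT/BEFORE clauses and UNDROP; offsets are in seconds, and names are illustrative:

```sql
-- Table contents as of five minutes ago
SELECT * FROM orders AT (OFFSET => -300);

-- Contents as of a specific timestamp within the retention period
SELECT * FROM orders AT (TIMESTAMP => '2024-01-15 08:00:00'::TIMESTAMP_TZ);

-- Recover an accidentally dropped table
DROP TABLE orders;
UNDROP TABLE orders;
```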
Cloning: Zero-Copy Replication

• Snowflake offers zero-copy cloning, allowing instant replication of databases, schemas, or
tables for development, testing, or backup without additional storage costs.

• Both the original and the clone share the same underlying micro-partitioned storage,
differing only when new changes occur in the clone.

• This speeds up dev/test environment setup, enables rapid backups, and supports full data
recovery and sandboxing for analytics or code testing.

• Cloning simplifies what, in legacy systems, would require complex, resource-intensive copy
processes.
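Zero-copy cloning is a single statement and composes with Time Travel; object names are illustrative:

```sql
-- Instant dev copy of an entire database; no data is physically duplicated
CREATE DATABASE dev_db CLONE prod_db;

-- Table-level clone for a quick backup
CREATE TABLE orders_backup CLONE orders;

-- Clone the table as it existed one hour ago
CREATE TABLE orders_before_fix CLONE orders AT (OFFSET => -3600);
```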
