Snowflake
Overview
• Snowflake is a next-generation cloud data warehouse designed to address modern data and
analytics challenges. Its vision is to allow customers to access all their data in one place,
enabling actionable insights, anytime, anywhere, with any number of users.
• Core features include full SQL compatibility, so anyone with SQL skills can work with it; a design built exclusively for the cloud; and a fully managed service model that relieves users of infrastructure setup and management.
• Snowflake simplifies data architecture, integrating data warehousing, data lakes, and data
marts under one platform, thus consolidating multiple legacy tools and reducing failure points
and costs.
Legacy Data Landscape vs. Snowflake
Legacy Architecture Complexity
Snowflake supports standard SQL, making it accessible to professionals with existing SQL knowledge. Its familiar structure (databases, schemas, tables, and views) accelerates onboarding and productivity.
The platform automatically manages performance optimization, query tuning, and infrastructure scalability. Users do not need to handle hardware or database tuning, allowing teams to focus on analytics.
All Data Types and Users
Includes complete SQL data warehouse features, 1-day time travel, customer-dedicated warehouses, and database replication. Designed for core BI and analytics requirements.
Pay-Per-Use & Live Data Sharing
Adds 24x7 support, faster response times, and SLAs with refunds for outages: ideal for use cases requiring higher support guarantees.
Snowflake combines elements of shared-disk and shared-nothing architectures, using centralized storage with independent compute clusters. This enables scalable, concurrent access without bottlenecks. The architecture includes three main layers:
• Cloud Services (the 'brain' for management, security, and optimization)
• Storage Layer (using micro-partitioning for efficient access and time travel)
• Compute Layer (virtual warehouses for isolated, scalable compute resources).
Snowflake UI & User Experience
1. Users can set up a free trial, access the platform through a web portal, and quickly start working with SQL or Python worksheets.
2. The UI provides easy access to databases, schemas, tables, roles, users, warehouses, and monitoring tools. Users can create and manage data objects through both the UI and SQL commands.
3. Access is role-based, with fine-grained permissions assigned to users. System-defined roles (account admin, security admin, etc.) can be extended and customized to suit organizational needs.
Data Organization: Databases, Schemas,
and Tables
• Databases and schemas logically
organize data within a Snowflake
account.
Snowflake supports permanent (full data protection), temporary (session-based), and transient
(persistent without full retention) tables. External tables reference data stored outside
Snowflake, such as in S3 or Azure.
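These table types can be illustrated with DDL; a minimal sketch, where the table names and the `@my_s3_stage` stage are hypothetical:

```sql
-- Permanent table: full Time Travel and Fail-safe protection
CREATE TABLE sales (id INT, amount NUMBER(10,2));

-- Temporary table: exists only for the current session
CREATE TEMPORARY TABLE staging_sales (id INT, amount NUMBER(10,2));

-- Transient table: persists across sessions, but without Fail-safe
CREATE TRANSIENT TABLE audit_log (event STRING, ts TIMESTAMP);

-- External table: reads files held outside Snowflake (e.g. S3) via a stage
CREATE EXTERNAL TABLE ext_sales
  LOCATION = @my_s3_stage
  FILE_FORMAT = (TYPE = PARQUET);
```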
Types of Streams
Standard streams track all modifications, append-only streams record only new rows, and insert-only streams are limited to external tables. Streams are useful for ETL pipelines, incremental loading, and data replication.
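The three stream types map directly to `CREATE STREAM` options; a sketch, assuming the `sales` and `ext_sales` tables already exist:

```sql
-- Standard stream: captures inserts, updates, and deletes on the table
CREATE STREAM sales_stream ON TABLE sales;

-- Append-only stream: captures only newly inserted rows
CREATE STREAM sales_append_stream ON TABLE sales APPEND_ONLY = TRUE;

-- Insert-only stream: the variant supported for external tables
CREATE STREAM ext_sales_stream ON EXTERNAL TABLE ext_sales INSERT_ONLY = TRUE;
```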
Using Streams in ETL
Changes captured by a stream can be consumed by downstream processes and loaded into target tables, enabling just-in-time or batch data transformation and movement. This is more efficient than repeatedly querying entire source tables.
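Consuming a stream in DML advances its offset, so each change is processed once; a sketch with a hypothetical `sales_history` target:

```sql
-- Reading the stream inside a DML statement consumes the pending changes
INSERT INTO sales_history
SELECT id, amount, METADATA$ACTION, CURRENT_TIMESTAMP()
FROM sales_stream;
```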
Tasks: Scheduling and Automating Processes
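Tasks run SQL on a schedule and pair naturally with streams for incremental loads; a minimal sketch, with hypothetical warehouse, stream, and table names:

```sql
-- Run an incremental load every 5 minutes, but only when the stream has data
CREATE TASK load_sales_task
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('SALES_STREAM')
AS
  INSERT INTO sales_history SELECT * FROM sales_stream;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK load_sales_task RESUME;
```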
Time Travel
Retrieve historical data or earlier states of tables, schemas, or databases within a defined retention period (up to 90 days on Enterprise Edition). Time Travel allows undoing accidental deletes, restoring dropped objects, and recovering previous versions of data.
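Time Travel is exposed through the `AT`/`BEFORE` clauses and `UNDROP`; a sketch against a hypothetical `sales` table:

```sql
-- Query the table as it was 15 minutes ago
SELECT * FROM sales AT (OFFSET => -60 * 15);

-- Query the table as of a specific timestamp
SELECT * FROM sales AT (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP);

-- Restore an accidentally dropped table within the retention period
UNDROP TABLE sales;
```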
Fail Safe
After the time travel retention expires, fail safe
provides an additional seven-day window for
Snowflake support to recover lost data. Designed
for critical recovery scenarios and enterprise
resiliency.
Cloning: Zero-Copy Replication
• Both the original and the clone share the same underlying micro-partitioned storage,
differing only when new changes occur in the clone.
• This speeds up dev/test environment setup, enables rapid backups, and supports full data
recovery and sandboxing for analytics or code testing.
• Cloning simplifies what, in legacy systems, would require complex, resource-intensive copy
processes.
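A zero-copy clone is a single statement, and it can be combined with Time Travel; a sketch with hypothetical database and table names:

```sql
-- Zero-copy clone of a whole database for a dev environment
CREATE DATABASE dev_db CLONE prod_db;

-- Clone a table as it was one hour ago (cloning plus Time Travel)
CREATE TABLE sales_backup CLONE sales AT (OFFSET => -3600);
```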