0% found this document useful (0 votes)
32 views2 pages

Data Validation in Snowflake

Data

Uploaded by

Goutham Prakash
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
32 views2 pages

Data Validation in Snowflake

Data

Uploaded by

Goutham Prakash
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
You are on page 1/ 2

Data validation in Snowflake:

Example 1: Check for Null Values


Example 2: Ensure Unique Constraints
Example 3: Cross-Table Referential Integrity
Example 4: Check Data Ranges
Example 5: Validate Record Counts
Example 6: Validate Email Formats
Example 7: Calculate Data Completeness
Example 8: Schedule Data Quality Checks
-----------------------------------------------------
1. Basic Data Quality Checks
--------------------------------
Perform basic checks to validate data integrity and quality within tables.
Example 1: Check for Null Values
--------------------------------
-- Check for null values in critical columns
SELECT *
FROM my_table
WHERE column1 IS NULL OR column2 IS NULL;

Example 2: Ensure Unique Constraints


------------------------------------------
-- Check for duplicate records based on a unique constraint
SELECT duplicate_column, COUNT(*)
FROM my_table
GROUP BY duplicate_column
HAVING COUNT(*) > 1;

2. Cross-Table Data Consistency Checks


--------------------------------------------
Verify data consistency across related tables or datasets.
Example 3: Cross-Table Referential Integrity

-- Check for orphaned records in a child table


SELECT child_table.*
FROM child_table
LEFT JOIN parent_table ON child_table.parent_id = parent_table.parent_id
WHERE parent_table.parent_id IS NULL;

3. Data Completeness Checks


--------------------------
Ensure data completeness by verifying expected data ranges or counts.
Example 4: Check Data Ranges

-- Check date range validity


SELECT *
FROM sales_data
WHERE order_date < '2023-01-01' OR order_date > '2023-12-31';

Example 5: Validate Record Counts


---------------------------------
-- Validate expected record counts in a table
SELECT COUNT(*) AS actual_count, expected_count
FROM my_table
GROUP BY expected_count
HAVING COUNT(*) <> expected_count;

4. Data Consistency and Format Checks


---------------------------------------
Verify data consistency and format adherence using regular expressions or pattern
matching.

Example 6: Validate Email Formats

-- Validate email formats using regular expressions


SELECT email_column
FROM my_table
WHERE NOT REGEXP_LIKE(email_column, '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]
{2,}$');

5. Advanced Data Quality Metrics


--------------------------------------
Calculate advanced data quality metrics such as completeness or accuracy
percentages.

Example 7: Calculate Data Completeness


-------------------------------------
-- Calculate data completeness percentage for a column
SELECT column_name,
COUNT(*) AS total_records,
COUNT(column_name) AS non_null_records,
100 * COUNT(column_name) / COUNT(*) AS completeness_percentage
FROM my_table;

6. Automated Data Validation with Snowflake Tasks


----------------------------------------------------
Automate data validation tasks using Snowflake tasks and scheduling capabilities.

Example 8: Schedule Data Quality Checks


-- Create a Snowflake task for automated data validation
CREATE OR REPLACE TASK data_validation_task
WAREHOUSE = my_warehouse
SCHEDULE = '5 MINUTE'
AS
BEGIN
-- Example: Run data quality checks
INSERT INTO validation_results
SELECT 'Null Check' AS check_name,
COUNT(*) AS issues_found
FROM my_table
WHERE column1 IS NULL OR column2 IS NULL;
END;

You might also like