Data validation in Snowflake:
Example 1: Check for Null Values
Example 2: Ensure Unique Constraints
Example 3: Cross-Table Referential Integrity
Example 4: Check Data Ranges
Example 5: Validate Record Counts
Example 6: Validate Email Formats
Example 7: Calculate Data Completeness
Example 8: Schedule Data Quality Checks
-----------------------------------------------------
1. Basic Data Quality Checks
--------------------------------
Perform basic checks to validate data integrity and quality within tables.
Example 1: Check for Null Values
--------------------------------
-- Check for null values in critical columns
SELECT *
FROM my_table
WHERE column1 IS NULL OR column2 IS NULL;
Example 2: Ensure Unique Constraints
------------------------------------------
-- Check for duplicate records based on a unique constraint
SELECT duplicate_column, COUNT(*)
FROM my_table
GROUP BY duplicate_column
HAVING COUNT(*) > 1;
2. Cross-Table Data Consistency Checks
--------------------------------------------
Verify data consistency across related tables or datasets.
Example 3: Cross-Table Referential Integrity
-- Check for orphaned records in a child table
SELECT child_table.*
FROM child_table
LEFT JOIN parent_table ON child_table.parent_id = parent_table.parent_id
WHERE parent_table.parent_id IS NULL;
3. Data Completeness Checks
--------------------------
Ensure data completeness by verifying expected data ranges or counts.
Example 4: Check Data Ranges
-- Check date range validity
SELECT *
FROM sales_data
WHERE order_date < '2023-01-01' OR order_date > '2023-12-31';
Example 5: Validate Record Counts
---------------------------------
-- Validate expected record counts in a table
SELECT COUNT(*) AS actual_count, expected_count
FROM my_table
GROUP BY expected_count
HAVING COUNT(*) <> expected_count;
4. Data Consistency and Format Checks
---------------------------------------
Verify data consistency and format adherence using regular expressions or pattern
matching.
Example 6: Validate Email Formats
-- Validate email formats using regular expressions
SELECT email_column
FROM my_table
WHERE NOT REGEXP_LIKE(email_column, '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]
{2,}$');
5. Advanced Data Quality Metrics
--------------------------------------
Calculate advanced data quality metrics such as completeness or accuracy
percentages.
Example 7: Calculate Data Completeness
-------------------------------------
-- Calculate data completeness percentage for a column
SELECT column_name,
COUNT(*) AS total_records,
COUNT(column_name) AS non_null_records,
100 * COUNT(column_name) / COUNT(*) AS completeness_percentage
FROM my_table;
6. Automated Data Validation with Snowflake Tasks
----------------------------------------------------
Automate data validation tasks using Snowflake tasks and scheduling capabilities.
Example 8: Schedule Data Quality Checks
-- Create a Snowflake task for automated data validation
CREATE OR REPLACE TASK data_validation_task
WAREHOUSE = my_warehouse
SCHEDULE = '5 MINUTE'
AS
BEGIN
-- Example: Run data quality checks
INSERT INTO validation_results
SELECT 'Null Check' AS check_name,
COUNT(*) AS issues_found
FROM my_table
WHERE column1 IS NULL OR column2 IS NULL;
END;