EDA SQL Document

EDA

Uploaded by

Vitor Hugo Ferreira

Exploratory Data Analysis (EDA) Using SQL

1. Understanding the Dataset

- Data Overview:

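Preview a few rows and list the column types to quickly understand the structure of your dataset (TOP is SQL Server syntax; MySQL, PostgreSQL, and SQLite use LIMIT 5 at the end of the query instead):

SELECT TOP (5) * FROM table_name;

SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'table_name';

Check whether each column's data type is appropriate for what it holds (categorical, numerical, dates).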
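As a minimal sketch of the same overview step outside SQL Server, the snippet below runs it through Python's built-in sqlite3 module. The `orders` table, its columns, and its rows are hypothetical and exist only for illustration. SQLite has neither TOP nor INFORMATION_SCHEMA, so LIMIT and PRAGMA table_info are assumed here as the closest equivalents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "North", 120.0, "2024-01-05"),
     (2, "South", 80.5, "2024-01-06"),
     (3, "North", None, "2024-01-07")],
)

# Preview the first rows (SQLite's counterpart of SELECT TOP (5) *).
preview = conn.execute("SELECT * FROM orders LIMIT 5").fetchall()

# List column names and declared types (counterpart of INFORMATION_SCHEMA.COLUMNS).
# Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(orders)")]

print(preview)
print(columns)
```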

2. Data Cleaning

- Check for Missing Values:

Identify missing values with a query like:

SELECT COUNT(*) AS missing_count FROM table_name WHERE column_name IS NULL;
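To check several columns in one pass, one option is to use the fact that COUNT(column) skips NULLs while COUNT(*) does not. The sketch below shows this with Python's sqlite3 module; the `customers` table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "a@example.com", 34), (2, None, 41), (3, None, None), (4, "d@example.com", 29)],
)

# COUNT(*) counts every row, COUNT(col) counts only non-NULL values,
# so the difference is the number of missing values in that column.
missing_email, missing_age = conn.execute(
    "SELECT COUNT(*) - COUNT(email), COUNT(*) - COUNT(age) FROM customers"
).fetchone()

print(missing_email, missing_age)
```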

- Check for Duplicates:

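Find values that appear more than once in a column (to find fully duplicated rows, list every column in both SELECT and GROUP BY):

SELECT column_name, COUNT(*) AS occurrences FROM table_name GROUP BY column_name HAVING COUNT(*) > 1;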
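A short sketch of the full-row variant, run through Python's sqlite3 module. The `signups` table and its rows are made up for illustration. Grouping by both columns reports only the rows that are identical in every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (email TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [("a@example.com", "free"), ("a@example.com", "free"),
     ("a@example.com", "pro"), ("b@example.com", "free")],
)

# Grouping by every column flags rows that are identical in full.
duplicate_rows = conn.execute(
    """
    SELECT email, plan, COUNT(*) AS occurrences
    FROM signups
    GROUP BY email, plan
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicate_rows)
```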

- Outlier Detection:

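For numerical data, outliers can be detected using the mean and standard deviation:

SELECT AVG(column_name), STDEV(column_name) FROM table_name;

Then list the rows whose values fall beyond a chosen threshold, for example more than three standard deviations above the mean. Aggregates are not allowed directly in a WHERE clause, so the threshold is computed in a subquery:

SELECT * FROM table_name WHERE column_name > (SELECT AVG(column_name) + 3 * STDEV(column_name) FROM table_name);

Mirror the condition (< and -) to catch unusually low values as well.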
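The three-standard-deviation rule can be sketched in a runnable form with Python's sqlite3 module. The `readings` table and its values are hypothetical. SQLite has no STDEV function, so this version assumes the population variance built from AVG, which differs slightly from the sample-based STDEV of SQL Server. It also checks both tails at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
rows = [(i, 10.0) for i in range(1, 21)] + [(21, 100.0)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

# SQLite has no STDEV, so the population variance is built from AVG:
# var = AVG(x^2) - AVG(x)^2. Comparing squared distances avoids a square root:
# |x - mean| > 3 * stdev  is equivalent to  (x - mean)^2 > 9 * var.
outliers = conn.execute(
    """
    WITH stats AS (
        SELECT AVG(value) AS mean,
               AVG(value * value) - AVG(value) * AVG(value) AS variance
        FROM readings
    )
    SELECT r.id, r.value
    FROM readings AS r, stats AS s
    WHERE (r.value - s.mean) * (r.value - s.mean) > 9 * s.variance
    """
).fetchall()

print(outliers)
```

Computing the statistics once in a common table expression keeps the threshold out of the WHERE clause's aggregate restriction and makes it easy to change the multiplier.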
- Data Type Corrections:

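Use ALTER TABLE statements to ensure columns have the correct data type. In SQL Server syntax, for example (the statement fails if an existing value cannot be converted to the new type):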

ALTER TABLE table_name ALTER COLUMN column_name INT;

3. Descriptive Statistics

- Summary Statistics:

Get summary statistics (mean, min, max, etc.) with:

SELECT MIN(column_name), MAX(column_name), AVG(column_name), COUNT(*) FROM table_name;

- Frequency Distribution:

For categorical data, get the frequency distribution:

SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name;
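Both descriptive queries can be tried end to end with Python's sqlite3 module, as in the sketch below. The `sales` table and its rows are hypothetical. The ORDER BY clause is an addition so that the most frequent categories come first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 10.0), ("North", 30.0), ("South", 20.0), ("East", 40.0)],
)

# Numerical column: minimum, maximum, mean, and row count in one pass.
summary = conn.execute(
    "SELECT MIN(amount), MAX(amount), AVG(amount), COUNT(*) FROM sales"
).fetchone()

# Categorical column: rows per category, most frequent first
# (ties are broken alphabetically by region).
frequencies = conn.execute(
    """
    SELECT region, COUNT(*) AS n
    FROM sales
    GROUP BY region
    ORDER BY n DESC, region
    """
).fetchall()

print(summary)
print(frequencies)
```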

4. Data Relationships

- Correlation Analysis:

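Engines such as PostgreSQL and Oracle provide a CORR(x, y) aggregate. Where none is available, the Pearson correlation coefficient can be assembled from ordinary aggregates: the covariance AVG(x * y) - AVG(x) * AVG(y) divided by the product of the two standard deviations.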
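One way to build a correlation coefficient from plain aggregates is sketched below with Python's sqlite3 module. The `pairs` table and its values are hypothetical. The covariance and variances use the population form (AVG of products minus product of AVGs), and the final division happens in Python.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)],
)

# Let SQL compute the aggregates: covariance and both variances (population form).
cov_xy, var_x, var_y = conn.execute(
    """
    SELECT AVG(x * y) - AVG(x) * AVG(y),
           AVG(x * x) - AVG(x) * AVG(x),
           AVG(y * y) - AVG(y) * AVG(y)
    FROM pairs
    """
).fetchone()

# Pearson r = cov(x, y) / (stdev(x) * stdev(y)); the square root is taken in
# Python because not every SQLite build ships the SQRT function.
pearson_r = cov_xy / math.sqrt(var_x * var_y)
print(round(pearson_r, 4))
```

The result is undefined when either column is constant, because its variance is zero. A production query would need to guard against that division.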

- Scatter Plots:

SQL cannot directly create plots, but you can retrieve the necessary data for visualization.

5. Data Visualization:

SQL doesn't produce charts directly. Export results for visualization using external tools.

6. Handling Categorical Variables

- Encoding:

Use CASE expressions to manually assign numerical values to categories.
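A minimal sketch of CASE-based encoding, run through Python's sqlite3 module. The `tickets` table, its `priority` labels, and the chosen codes are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, priority TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [(1, "low"), (2, "high"), (3, "medium"), (4, "urgent")],
)

# Map each known category to a number; anything unexpected becomes NULL
# so that unmapped labels stay visible instead of silently turning into 0.
encoded = conn.execute(
    """
    SELECT id,
           CASE priority
               WHEN 'low' THEN 1
               WHEN 'medium' THEN 2
               WHEN 'high' THEN 3
               ELSE NULL
           END AS priority_code
    FROM tickets
    ORDER BY id
    """
).fetchall()

print(encoded)
```

This mapping implies an order (low < medium < high), which suits ordinal categories. For unordered categories, one 0/1 CASE column per category is the usual alternative.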

- Frequency Analysis:

Analyze category frequency distribution with GROUP BY.

7. Feature Engineering:

Use SQL to create new features, bins, or calculated columns.
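As an illustration of binning, the sketch below derives a categorical feature from a numerical column using Python's sqlite3 module. The `people` table and the age cut-offs (18 and 65) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", 17), ("Bruno", 34), ("Carla", 70)],
)

# Derive a categorical feature (age_group) from a numerical column.
# CASE branches are evaluated top to bottom, so each one only needs an upper bound.
# A NULL age would fall through to ELSE; add a WHEN age IS NULL branch if needed.
binned = conn.execute(
    """
    SELECT name,
           CASE
               WHEN age < 18 THEN 'minor'
               WHEN age < 65 THEN 'adult'
               ELSE 'senior'
           END AS age_group
    FROM people
    ORDER BY name
    """
).fetchall()

print(binned)
```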

8. Outlier Treatment:

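Identify outliers using threshold-based queries, then manage them, for example by excluding them with a WHERE filter or capping them at the threshold with a CASE expression.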

9. Dimensionality Reduction:

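SQL has no built-in dimensionality reduction technique; what it supports is column filtering, that is, listing only the relevant columns in SELECT.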

10. Summarizing Findings:

Use GROUP BY and aggregations to reveal trends.
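A sketch of a trend summary, assuming a hypothetical `orders` table with ISO-formatted date strings, run through Python's sqlite3 module. The month is extracted with SQLite's strftime function. SQL Server would need a different date function, such as FORMAT or DATEPART.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-01-05", 100.0), ("2024-01-20", 50.0),
     ("2024-02-03", 200.0), ("2024-02-17", 100.0)],
)

# Aggregate per calendar month to see how volume and value move over time.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', order_date) AS month,
           COUNT(*) AS n_orders,
           SUM(amount) AS total,
           AVG(amount) AS average
    FROM orders
    GROUP BY month
    ORDER BY month
    """
).fetchall()

for row in monthly:
    print(row)
```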
