Exploratory Data Analysis

Prasad Deshmukh
Exploratory Data Analysis
• EDA is a crucial step in data analysis,
involving data exploration,
visualization, and summarization to
uncover patterns and gain insights.
• EDA helps to understand the structure
and characteristics of the dataset, detect
outliers, and identify relationships
between variables through statistical
analysis and visualizations.

Data Collection

• Obtain the dataset you want to analyze.
• This may involve downloading data from a database, gathering data from
surveys, or accessing publicly available datasets.

import pandas as pd

# Read data from a CSV file
data = pd.read_csv('data.csv')
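The slide only shows the CSV case. As a minimal sketch of the database case, the example below builds a throwaway in-memory SQLite database and reads a query result into a DataFrame. The table name survey, its columns, and its rows are made up for illustration; a real project would connect to its own database instead.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database so the example is self-contained.
# The table name "survey" and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (respondent_id INTEGER, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [(1, 34, "Pune"), (2, 27, "Mumbai"), (3, 45, "Pune")],
)
conn.commit()

# Load the query result straight into a DataFrame.
db_data = pd.read_sql_query("SELECT * FROM survey", conn)
conn.close()

print(db_data)
```

From here on, db_data can be explored in the same way as a DataFrame loaded with pd.read_csv.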

Data Exploration
• Explore the dataset to gain an initial understanding.
• This can involve examining the structure of the data, checking the number of
rows and columns, and previewing the first few rows to get a sense of the
variables and their values.

# Check the number of rows and columns
data.shape

# Preview first few rows
data.head()

# View column names
data.columns
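Examining the structure of the data usually also means looking at column types and missing values. The following is a small sketch of that, using a made-up DataFrame in place of data.csv; the column names and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Small made-up dataset standing in for data.csv.
data = pd.DataFrame(
    {
        "age": [25, 32, np.nan, 41],
        "city": ["Pune", "Mumbai", "Pune", None],
        "income": [30000.0, 52000.0, 45000.0, 61000.0],
    }
)

# Data type of each column
print(data.dtypes)

# Number of missing values per column
missing_counts = data.isna().sum()
print(missing_counts)

# Number of distinct values per column (missing values are not counted)
unique_counts = data.nunique()
print(unique_counts)
```

data.info() prints much of the same information in a single call.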
Data Cleaning
• Clean the data to ensure it is in a usable format.
• This includes handling missing values, removing duplicates, correcting
inconsistent data, and transforming data types if necessary.

# Handling missing values
# These methods return a new DataFrame, so assign the result back.
data = data.dropna()       # Drop rows with missing values
data = data.fillna(value)  # Fill missing values with a specific value (e.g. value = 0)

# Removing duplicates
data = data.drop_duplicates()

# Correcting inconsistent data
data['column_name'] = data['column_name'].replace(old_value, new_value)
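The slide mentions transforming data types but does not show it. Here is one possible sketch, assuming a made-up DataFrame in which every column was read as text. The column names and values are hypothetical.

```python
import pandas as pd

# Made-up raw data in which every column was read as text.
raw = pd.DataFrame(
    {
        "price": ["10.5", "7", "n/a"],
        "signup_date": ["2023-01-15", "2023-02-01", "2023-03-20"],
        "city": [" pune", "Pune ", "PUNE"],
    }
)

cleaned = raw.copy()

# Convert text to numbers; entries that cannot be parsed become NaN.
cleaned["price"] = pd.to_numeric(cleaned["price"], errors="coerce")

# Convert text to datetime values.
cleaned["signup_date"] = pd.to_datetime(cleaned["signup_date"])

# Make category labels consistent: trim whitespace and unify the case.
cleaned["city"] = cleaned["city"].str.strip().str.title()

print(cleaned.dtypes)
print(cleaned)
```

Using errors="coerce" turns unparseable entries into missing values, which can then be handled in the missing value treatment step.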
Missing Value Treatment
• Address missing values in the dataset.
• This can involve imputing missing values using techniques like mean, median,
mode, or advanced imputation methods like regression or machine learning
algorithms.

# Choose one of the following strategies.

# Drop rows with missing values
data = data.dropna()

# Fill missing values in numeric columns with the column mean
data = data.fillna(data.mean(numeric_only=True))

# Fill missing values with forward fill (carry the previous valid value forward)
data = data.ffill()
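Median and mode imputation are mentioned above but not shown. The sketch below applies the median to a numeric column and the mode to a categorical column. The DataFrame is made up for illustration, and choosing the strategy per column is an assumption about what suits the data.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each column.
data = pd.DataFrame(
    {
        "age": [20.0, 30.0, np.nan, 100.0],
        "city": ["Pune", None, "Pune", "Mumbai"],
    }
)

imputed = data.copy()

# Numeric column: the median is less sensitive to extreme values than the mean.
imputed["age"] = imputed["age"].fillna(imputed["age"].median())

# Categorical column: use the most frequent value (the mode).
most_common_city = imputed["city"].mode().iloc[0]
imputed["city"] = imputed["city"].fillna(most_common_city)

print(imputed)
```

Here the missing age becomes 30.0, the median of 20, 30, and 100, whereas the mean would have been pulled up to 50 by the extreme value.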
Summary Statistics
• Compute basic summary statistics such as mean, median, mode, standard
deviation, and quartiles for numerical variables.
• For categorical variables, you can calculate frequency counts or proportions
for each category.

# Compute basic summary statistics
data.describe()

# Calculate mean, median, mode
data.mean(numeric_only=True)
data.median(numeric_only=True)
data.mode()
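For the categorical case, frequency counts and proportions can be sketched as follows. The column name category_column follows the placeholder naming used elsewhere in these slides, and the values are made up.

```python
import pandas as pd

# Made-up categorical column.
data = pd.DataFrame({"category_column": ["A", "B", "A", "C", "A", "B"]})

# Frequency count of each category
counts = data["category_column"].value_counts()
print(counts)

# Proportion of each category (the proportions sum to 1)
proportions = data["category_column"].value_counts(normalize=True)
print(proportions)
```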

Data Visualization
• Create visual representations of the data using graphs, charts, and plots.
• This helps to identify patterns, trends, and outliers.
• Common visualizations include histograms, box plots, scatter plots, bar
charts, and heatmaps.

import matplotlib.pyplot as plt
import seaborn as sns

# Histogram
plt.hist(data['column_name'])
plt.show()

# Box plot
sns.boxplot(x=data['column_name'])
plt.show()

# Scatter plot
plt.scatter(data['x_column'], data['y_column'])
plt.show()

# Bar chart of category counts
sns.countplot(x=data['category_column'])
plt.show()

# Heatmap of correlations between numeric columns
sns.heatmap(data.corr(numeric_only=True))
plt.show()
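When comparing several views of the same variable, it can help to place them side by side in one figure. The following is a minimal sketch using only matplotlib, with made-up values that include one extreme point. The layout and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values with one extreme point (20).
data = pd.DataFrame({"column_name": [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]})

# One row, two panels: histogram on the left, box plot on the right.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

ax_hist.hist(data["column_name"], bins=10)
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("column_name")

ax_box.boxplot(data["column_name"])
ax_box.set_title("Box plot")

fig.tight_layout()
```

Call plt.show() to display the figure interactively, or fig.savefig with a file name of your choice to write it to disk.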
Correlation Analysis
• Examine the relationships between variables by calculating correlation
coefficients.
• This helps to identify variables that are highly correlated, positively or
negatively, and can provide insights into potential predictors or
multicollinearity.

# Calculate correlation matrix (numeric columns only)
correlation_matrix = data.corr(numeric_only=True)

# Heatmap of correlation matrix
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
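A heatmap becomes hard to read with many variables, so it can be useful to list the strongly correlated pairs directly. The sketch below does this on made-up data. The column names and the 0.8 threshold are assumptions for illustration, not a recommended cutoff.

```python
import numpy as np
import pandas as pd

# Made-up data: height_in is height_cm converted to inches, score is unrelated.
data = pd.DataFrame(
    {
        "height_cm": [150, 160, 170, 180, 190],
        "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
        "score": [3, 9, 1, 7, 5],
    }
)

correlation_matrix = data.corr(numeric_only=True)

# Keep only the upper triangle so each pair appears once and the
# diagonal (every variable with itself) is excluded.
upper_mask = np.triu(np.ones(correlation_matrix.shape, dtype=bool), k=1)
pairs = correlation_matrix.where(upper_mask).stack().dropna()

# Assumed cutoff for a "strong" correlation, positive or negative.
threshold = 0.8
strong_pairs = pairs[pairs.abs() > threshold]
print(strong_pairs)
```

Only the pair height_cm and height_in exceeds the threshold here, which is the kind of redundancy that signals possible multicollinearity.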

Outlier Detection
• Identify and handle outliers in the data.
• Outliers can significantly impact analysis results, so it's important to detect and understand
their presence.
• Common techniques for outlier detection include box plots, z-scores, and clustering
methods.

# Box plot
sns.boxplot(x=data['column_name'])

# Z-score method
from scipy.stats import zscore

data['z_score'] = zscore(data['column_name'])
outliers = data[(data['z_score'] > 3) | (data['z_score'] < -3)]
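A box plot marks points beyond 1.5 times the interquartile range (IQR) from the quartiles, and the same rule can be applied in code. The following is a sketch of that rule on made-up values; the 1.5 multiplier is the conventional default rather than something stated in the slides.

```python
import pandas as pd

# Made-up values with one extreme point (95).
data = pd.DataFrame({"column_name": [10, 12, 11, 13, 12, 11, 95]})

# First and third quartiles, and the interquartile range.
q1 = data["column_name"].quantile(0.25)
q3 = data["column_name"].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

iqr_outliers = data[
    (data["column_name"] < lower_bound) | (data["column_name"] > upper_bound)
]
print(iqr_outliers)
```

In this seven-value sample the z-score of 95 is roughly 2.4 (using the population standard deviation, which is scipy's default), so a threshold of 3 would not flag it, while the IQR rule does. In small samples the two methods can disagree in this way.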
Data Transformation
• Perform transformations on variables to make the data more suitable for
analysis or modeling.
• Examples include log transformations, square roots, normalization, or
standardization.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Log transformation (values must be positive)
data['log_transformed'] = np.log(data['column_name'])

# Standardization (zero mean, unit variance)
scaler = StandardScaler()
data['standardized_column'] = scaler.fit_transform(data[['column_name']]).ravel()
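Square-root transformation and normalization are listed above without code. The sketch below shows both with plain pandas and NumPy on made-up values, plus log1p as one possible way to handle zeros. The new column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up non-negative values, including a zero.
data = pd.DataFrame({"column_name": [0.0, 4.0, 9.0, 16.0, 25.0]})

col = data["column_name"]

# Min-max normalization: rescale to the range [0, 1].
data["normalized"] = (col - col.min()) / (col.max() - col.min())

# Square-root transformation (defined for non-negative values).
data["sqrt_transformed"] = np.sqrt(col)

# log(1 + x) handles zeros, which a plain log transformation cannot.
data["log1p_transformed"] = np.log1p(col)

print(data)
```

Min-max normalization divides by the range, so it assumes the column is not constant.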
Hypothesis Testing
• If applicable, conduct statistical tests to validate hypotheses or
assumptions about the data.
• This can involve t-tests, chi-square tests, ANOVA, or other
appropriate tests based on the nature of the data and the research
questions.

from scipy.stats import ttest_ind

# Perform t-test between two groups
group1 = data[data['group'] == 1]['column_name']
group2 = data[data['group'] == 2]['column_name']
statistic, p_value = ttest_ind(group1, group2)
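The t-test above compares a numeric variable across two groups. For two categorical variables, a chi-square test of independence is one option. The sketch below runs it on made-up data; the group and outcome columns and their counts are assumptions for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up data: 50 rows per group, with different yes/no splits.
data = pd.DataFrame(
    {
        "group": ["A"] * 50 + ["B"] * 50,
        "outcome": ["yes"] * 40 + ["no"] * 10 + ["yes"] * 20 + ["no"] * 30,
    }
)

# Contingency table of observed counts.
contingency = pd.crosstab(data["group"], data["outcome"])
print(contingency)

# Chi-square test of independence between group and outcome.
chi2, p_value, dof, expected = chi2_contingency(contingency)
print(chi2, p_value, dof)
```

A small p-value suggests that outcome is not independent of group in this sample. For a 2x2 table, chi2_contingency applies a continuity correction by default.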
Iterative Analysis

• EDA is often an iterative process.

• As you uncover insights, you may go back and refine
your analysis, perform additional transformations, or
explore specific aspects in more detail.

In conclusion, Exploratory Data Analysis (EDA) is a
crucial step in the data analysis process that helps to
understand the dataset, identify patterns, relationships,
and outliers, and inform subsequent analysis and
modeling decisions. It provides valuable insights and
serves as a foundation for data-driven decision-making.

THANK YOU
