Python Data Analysis: Exploratory Data Analysis

This document is a cheat sheet for exploratory data analysis using Python, detailing various methods and their corresponding code examples. It covers techniques such as correlation matrices, scatter plots, regression plots, box plots, grouping by attributes, group by statements, pivot tables, pseudocolor plots, and calculating the Pearson coefficient and p-value. Each method is accompanied by a brief description and a code snippet for implementation.

Uploaded by

w123lucy
Copyright
© All Rights Reserved


Data Analysis with Python


Cheat Sheet: Exploratory Data Analysis

Package/Method Description Code Example

Complete dataframe correlation
Description: Correlation matrix created using all the attributes of the dataset.
Code Example:
    df.corr()

Specific attribute correlation
Description: Correlation matrix created using specific attributes of the dataset.
Code Example:
    df[['attribute_1','attribute_2',...]].corr()
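As a minimal sketch of both correlation calls, the example below builds a small toy dataframe. The column names and values are hypothetical placeholders, not data from this cheat sheet. It assumes a recent pandas release, where `numeric_only=True` is needed for `df.corr()` to skip non-numeric columns.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "engine_size": [1.0, 2.0, 3.0, 4.0],
    "price": [10.0, 20.0, 30.0, 40.0],
    "mpg": [40.0, 30.0, 20.0, 10.0],
    "make": ["a", "b", "c", "d"],  # non-numeric column
})

# Correlation matrix over every numeric attribute.
# numeric_only=True leaves the text column "make" out of the calculation.
full_corr = df.corr(numeric_only=True)

# Correlation matrix restricted to two chosen attributes.
subset_corr = df[["engine_size", "price"]].corr()

print(full_corr)
print(subset_corr)
```

Each result is a square dataframe indexed by attribute name on both axes, so a single coefficient can be read with `.loc[row_name, column_name]`.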

Scatter Plot
Description: Create a scatter plot with the independent variable along the x-axis and the dependent variable along the y-axis.
Code Example:
    from matplotlib import pyplot as plt
    plt.scatter(df['attribute_1'], df['attribute_2'])
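A short, self-contained sketch of the scatter plot follows. The dataframe and its column names are hypothetical, and the `Agg` backend is selected only so that the example runs without a display. In a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical toy data: engine_size is treated as the independent variable.
df = pd.DataFrame({
    "engine_size": [1.0, 2.0, 3.0, 4.0],
    "price": [11.0, 19.0, 32.0, 41.0],
})

fig, ax = plt.subplots()
# Single brackets pass one-dimensional Series for x and y.
points = ax.scatter(df["engine_size"], df["price"])
ax.set_xlabel("engine_size")
ax.set_ylabel("price")
```

Using the `fig, ax` interface instead of the bare `plt.scatter` call is a design choice here: it keeps a handle on the axes so labels can be set explicitly.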

Regression Plot
Description: Uses the dependent and independent variables in a Pandas data frame to create a scatter plot with a generated linear regression line for the data.
Code Example:
    import seaborn as sns
    sns.regplot(x='attribute_1', y='attribute_2', data=df)
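The regression plot can be sketched in the same way. The data below is hypothetical, and the example assumes seaborn is installed alongside matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import pandas as pd
import seaborn as sns

# Hypothetical toy data with a roughly linear relationship.
df = pd.DataFrame({
    "engine_size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price": [12.0, 19.0, 33.0, 38.0, 52.0],
})

# regplot draws the scatter points and overlays a fitted regression line.
ax = sns.regplot(x="engine_size", y="price", data=df)
```

`sns.regplot` returns the matplotlib axes it drew on, so titles or axis limits can be adjusted afterwards through `ax`.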

Box plot
Description: Create a box-and-whisker plot that uses the pandas dataframe, the dependent, and the independent variables.
Code Example:
    import seaborn as sns
    sns.boxplot(x='attribute_1', y='attribute_2', data=df)
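For the box plot, a typical arrangement (assumed here, not stated in the cheat sheet) is a categorical attribute on the x-axis and a numerical attribute on the y-axis. The column names and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import pandas as pd
import seaborn as sns

# Hypothetical toy data: one categorical and one numerical attribute.
df = pd.DataFrame({
    "body_style": ["sedan", "sedan", "sedan", "wagon", "wagon", "wagon"],
    "price": [10.0, 12.0, 15.0, 20.0, 22.0, 30.0],
})

# One box is drawn per category of body_style, summarising the price distribution.
ax = sns.boxplot(x="body_style", y="price", data=df)
```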

Grouping by attributes
Description: Select a group of attributes from the dataset to create a subset of the data.
Code Example:
    df_group = df[['attribute_1','attribute_2',...]]

GroupBy statements
Description:
    a. Group the data by different categories of an attribute, displaying the average value of numerical attributes with the same category.
    b. Group the data by different categories of multiple attributes, displaying the average value of numerical attributes with the same category.
Code Example:
    a) df_group = df_group.groupby(['attribute_1'], as_index=False).mean()
    b) df_group = df_group.groupby(['attribute_1','attribute_2'], as_index=False).mean()
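The subset and group-by steps can be sketched together on hypothetical data. One assumption worth noting: in recent pandas releases, calling `.mean()` on a group that still contains a text column raises an error, so variant (a) selects only the grouping key and the numerical attribute first.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "drive": ["fwd", "fwd", "rwd", "rwd"],
    "body": ["sedan", "wagon", "sedan", "sedan"],
    "price": [10.0, 20.0, 30.0, 50.0],
})

# Subset of attributes to work with.
df_group = df[["drive", "body", "price"]]

# a) Average price per category of a single attribute.
by_one = df_group[["drive", "price"]].groupby(["drive"], as_index=False).mean()

# b) Average price per combination of two attributes.
by_two = df_group.groupby(["drive", "body"], as_index=False).mean()

print(by_one)
print(by_two)
```

With `as_index=False` the grouping attributes stay as ordinary columns, which is the shape a later pivot on those columns expects.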

Pivot Tables
Description: Create pivot tables for a clearer representation of the data, based on the chosen index and column parameters.
Code Example:
    grouped_pivot = df_group.pivot(index='attribute_1', columns='attribute_2')
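A minimal sketch of the pivot step, using a hypothetical grouped dataframe with one row per combination of two attributes. Combinations that do not occur in the data come out as `NaN`.

```python
import pandas as pd

# Hypothetical grouped data: one row per (drive, body) combination.
grouped = pd.DataFrame({
    "drive": ["fwd", "fwd", "rwd"],
    "body": ["sedan", "wagon", "sedan"],
    "price": [10.0, 20.0, 40.0],
})

# Rows become categories of "drive", columns become categories of "body".
# The remaining column ("price") supplies the cell values, so the resulting
# columns form a two-level index such as ("price", "sedan").
grouped_pivot = grouped.pivot(index="drive", columns="body")

print(grouped_pivot)
```

The combination (rwd, wagon) is absent from the input, so that cell is missing. Whether to fill such cells, for example with `grouped_pivot.fillna(0)`, depends on the analysis and is left open here.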

Pseudocolor plot
Description: Create a heatmap image using a pseudocolor plot (or pcolor) with the pivot table as data.
Code Example:
    from matplotlib import pyplot as plt
    plt.pcolor(grouped_pivot, cmap='RdBu')
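The heatmap can be sketched on a small hypothetical pivot table. Passing `to_numpy()` and adding a colorbar are choices made for this example rather than part of the cheat sheet's one-line call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical pivot table: rows are drive types, columns are body styles.
grouped_pivot = pd.DataFrame(
    [[10.0, 20.0], [40.0, 30.0]],
    index=["fwd", "rwd"],
    columns=["sedan", "wagon"],
)

fig, ax = plt.subplots()
# Each cell of the table is drawn as a coloured rectangle.
mesh = ax.pcolor(grouped_pivot.to_numpy(), cmap="RdBu")
# The colorbar maps colours back to the underlying values.
fig.colorbar(mesh, ax=ax)
```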

Pearson Coefficient and p-value
Description: Calculate the Pearson coefficient and p-value of a pair of attributes.
Code Example:
    from scipy import stats
    pearson_coef, p_value = stats.pearsonr(df['attribute_1'], df['attribute_2'])
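As a small worked sketch on hypothetical data, the call below returns a coefficient close to 1 for two attributes that rise together, along with the p-value for the null hypothesis of no linear correlation.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data with a strong positive linear relationship.
df = pd.DataFrame({
    "engine_size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price": [12.0, 19.0, 33.0, 38.0, 52.0],
})

# pearsonr returns the correlation coefficient and its two-sided p-value.
pearson_coef, p_value = stats.pearsonr(df["engine_size"], df["price"])

print(pearson_coef, p_value)
```

A coefficient near +1 or -1 indicates a strong linear relationship, and a small p-value indicates that such a coefficient would be unlikely if the attributes were uncorrelated.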
