Python Data Wrangling Tutorial: Pandas Cheatsheet

Download as pdf or txt
Download as pdf or txt
You are on page 1of 1

Pandas Cheatsheet:

Python Data Wrangling Tutorial

This Pandas cheatsheet will cover some of the most common and useful functionalities for data wrangling in
Python. Broadly speaking, data wrangling is the process of reshaping, aggregating, separating, or otherwise
transforming your data from one format to a more useful one.

Pandas is the best Python library for wrangling relational (i.e. table-format) datasets, and it will be doing most of
the heavy lifting for us.

To see the most up-to-date full tutorial and download the sample dataset, visit the online tutorial at
elitedatascience.com.

SETUP Shift the pivoted dataset


First, make sure you have the following installed on your computer: delta_dict = {}
for offset in [7, 14, 21, 28]:
• Python 2.7+ or Python 3
• Pandas delta_dict[‘delta_{}’.format(offset)] = pivoted_df /
• Jupyter Notebook (optional, but recommended)
pivoted_df.shift(offset) - 1
*note: We strongly recommend installing the Anaconda Distribution, which
comes with all of those packages. Simply follow the instructions on that
download page. Melt the shifted dataset
Once you have Anaconda installed, simply start Jupyter (either through the melted_dfs = []
command line or the Navigator app) and open a new notebook.
for key, delta_df in delta_dict.items():
melted_dfs.append( delta_df.reset_index().melt(id_vars=[‘Date’],
Import libraries and dataset value_name=key) )
import pandas as pd
pd.options.display.float_format = ‘{:,.2f}’.format return_df = pivoted_df.shift(-7) / pivoted_df - 1.0
pd.options.display.max_rows = 200 melted_dfs.append( return_df.reset_index().melt(id_vars=[‘Date’],
pd.options.display.max_columns = 100 value_name=’return_7’) )

df = pd.read_csv(‘BNC2_sample.csv’,
Reduce-merge the melted data
names=[‘Code’, ‘Date’, ‘Open’, ‘High’, ‘Low’
from functools import reduce
‘Close’, ‘Volume’, ‘VWAP’, ‘TWAP’])
*The sample dataset can be downloaded here.
base_df = df[[‘Date’, ‘Code’, ‘Volume’, ‘VWAP’]]
feature_dfs = [base_df] + melted_dfs
Filter unwanted observations
gwa_codes = [code for code in df.Code.unique() if ‘GWA_’ in code] abt = reduce(lambda left,right: pd.merge(left,right,on=[‘Date’,
df = df[df.Code.isin(gwa_codes)] ‘Code’]), feature_dfs)

Pivot the dataset Aggregate with group-by


pivoted_df = df.pivot(index=’Date’, columns=’Code’, values=’VWAP’) abt[‘month’] = abt.Date.apply(lambda x: x[:7])
gb_df = abt.groupby([‘Code’, ‘month’]).first().reset_index()

To see the most up-to-date full tutorial, explanations, and additional context, visit the online tutorial at elitedatascience.com.
We also have plenty of other tutorials and guides.

ELITEDATASCIENCE.COM

You might also like