Python Data Wrangling Tutorial: Pandas Cheatsheet

This Pandas cheatsheet provides a tutorial on data wrangling techniques in Python using the Pandas library. It covers reshaping, aggregating, separating, and transforming data from one format into a more useful one. The cheatsheet walks through importing data, filtering observations, pivoting a dataset, calculating shifts in the data over time, melting the shifted data, merging the melted datasets, and aggregating with group-by operations, and it points to additional online resources.

Uploaded by

Rajas

Pandas Cheatsheet: Python Data Wrangling Tutorial

This Pandas cheatsheet will cover some of the most common and useful functionalities for data wrangling in
Python. Broadly speaking, data wrangling is the process of reshaping, aggregating, separating, or otherwise
transforming your data from one format to a more useful one.

Pandas is the best Python library for wrangling relational (i.e. table-format) datasets, and it will be doing most of
the heavy lifting for us.

To see the most up-to-date full tutorial and download the sample dataset, visit the online tutorial at
elitedatascience.com.

SETUP

First, make sure you have the following installed on your computer:
• Python 3 (recent Pandas versions no longer support Python 2.7)
• Pandas
• Jupyter Notebook (optional, but recommended)

*note: We strongly recommend installing the Anaconda Distribution, which comes with all of those packages. Simply follow the instructions on its download page.

Once you have Anaconda installed, start Jupyter (either through the command line or the Navigator app) and open a new notebook.

Import libraries and dataset

import pandas as pd

# Display floats with two decimals and thousands separators; show more rows/columns
pd.options.display.float_format = '{:,.2f}'.format
pd.options.display.max_rows = 200
pd.options.display.max_columns = 100

# names= supplies the column labels, so the first line of the file is read as data
df = pd.read_csv('BNC2_sample.csv',
                 names=['Code', 'Date', 'Open', 'High', 'Low',
                        'Close', 'Volume', 'VWAP', 'TWAP'])

*The sample dataset can be downloaded from the online tutorial at elitedatascience.com.

Filter unwanted observations

# Keep only the rows whose Code contains 'GWA_'
gwa_codes = [code for code in df.Code.unique() if 'GWA_' in code]
df = df[df.Code.isin(gwa_codes)]

Pivot the dataset

# Wide table: one row per Date, one column per Code, values are VWAP
pivoted_df = df.pivot(index='Date', columns='Code', values='VWAP')

Shift the pivoted dataset

# Relative change in VWAP versus 7, 14, 21 and 28 rows earlier
# (shift() moves by rows, not by calendar days)
delta_dict = {}
for offset in [7, 14, 21, 28]:
    delta_dict['delta_{}'.format(offset)] = pivoted_df / pivoted_df.shift(offset) - 1

Melt the shifted dataset

# Back to long format: one row per (Date, Code). The variable column is named
# 'Code' because melt() reuses the name of the pivoted columns.
melted_dfs = []
for key, delta_df in delta_dict.items():
    melted_dfs.append(delta_df.reset_index().melt(id_vars=['Date'], value_name=key))

# Forward-looking change: VWAP 7 rows ahead relative to the current VWAP
return_df = pivoted_df.shift(-7) / pivoted_df - 1.0
melted_dfs.append(return_df.reset_index().melt(id_vars=['Date'], value_name='return_7'))

Reduce-merge the melted data

from functools import reduce

# Inner-join the base columns with every melted feature table on Date and Code
base_df = df[['Date', 'Code', 'Volume', 'VWAP']]
feature_dfs = [base_df] + melted_dfs
abt = reduce(lambda left, right: pd.merge(left, right, on=['Date', 'Code']),
             feature_dfs)

Aggregate with group-by

# Month key: first 7 characters of the Date string ('YYYY-MM' for ISO dates)
abt['month'] = abt.Date.apply(lambda x: x[:7])
# First non-null value of each column for every (Code, month) pair
gb_df = abt.groupby(['Code', 'month']).first().reset_index()
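If BNC2_sample.csv is not at hand, the pipeline above can be tried end to end on a small synthetic frame. This is a minimal sketch, not the tutorial's data: the codes GWA_A, GWA_B and OTHER_X, the 40 business days starting 2018-01-01, and the Volume/VWAP values are all made up for illustration, and only the columns the pipeline actually uses are created.

```python
from functools import reduce

import pandas as pd

# Hypothetical synthetic data: two 'GWA_' codes plus one code the filter should drop.
dates = pd.bdate_range('2018-01-01', periods=40).strftime('%Y-%m-%d')
codes = ['GWA_A', 'GWA_B', 'OTHER_X']
rows = []
for i, code in enumerate(codes):
    for j, date in enumerate(dates):
        rows.append({'Code': code, 'Date': date,
                     'Volume': 1000 + j,
                     'VWAP': 10.0 * (i + 1) + 0.1 * j})
df = pd.DataFrame(rows)

# Filter: keep only the GWA_ codes
gwa_codes = [code for code in df.Code.unique() if 'GWA_' in code]
df = df[df.Code.isin(gwa_codes)]

# Pivot: Date x Code table of VWAP
pivoted_df = df.pivot(index='Date', columns='Code', values='VWAP')

# Shift: relative change versus N rows earlier
delta_dict = {}
for offset in [7, 14, 21, 28]:
    delta_dict['delta_{}'.format(offset)] = pivoted_df / pivoted_df.shift(offset) - 1

# Melt: back to long format; the variable column inherits the name 'Code'
melted_dfs = []
for key, delta_df in delta_dict.items():
    melted_dfs.append(delta_df.reset_index().melt(id_vars=['Date'], value_name=key))
return_df = pivoted_df.shift(-7) / pivoted_df - 1.0
melted_dfs.append(return_df.reset_index().melt(id_vars=['Date'], value_name='return_7'))

# Reduce-merge: inner-join every feature table on Date and Code
base_df = df[['Date', 'Code', 'Volume', 'VWAP']]
feature_dfs = [base_df] + melted_dfs
abt = reduce(lambda left, right: pd.merge(left, right, on=['Date', 'Code']),
             feature_dfs)

# Aggregate: first non-null value per (Code, month)
abt['month'] = abt.Date.apply(lambda x: x[:7])
gb_df = abt.groupby(['Code', 'month']).first().reset_index()

print(gb_df[['Code', 'month', 'VWAP', 'delta_7', 'return_7']])
```

With this made-up data, gb_df has one row per code and month (four rows). January's delta_7 comes from the eighth business day, the first date with seven earlier rows, because shift() counts rows rather than calendar days.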
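The aggregation step relies on GroupBy.first(), which returns the first non-null value of each column within a group rather than the first row as a whole. Early delta_N values are NaN (there are not yet N earlier rows), so the two can differ. The tiny hypothetical frame below, with made-up dates and values, contrasts first() with keeping the literal first row via sort_values() and drop_duplicates().

```python
import numpy as np
import pandas as pd

# Hypothetical miniature 'abt': one code, two January dates and one February date,
# with a feature that is still NaN on the first date (as early delta_N values are).
abt = pd.DataFrame({
    'Date': ['2018-01-02', '2018-01-03', '2018-02-01'],
    'Code': ['GWA_A', 'GWA_A', 'GWA_A'],
    'VWAP': [10.0, 11.0, 12.0],
    'delta_7': [np.nan, 0.05, 0.02],
})

# Month key by string slicing, as in the cheatsheet (assumes ISO 'YYYY-MM-DD' strings)
abt['month'] = abt.Date.apply(lambda x: x[:7])

# first() skips NaN column by column, so January's VWAP (10.0, from 2018-01-02)
# and delta_7 (0.05, from 2018-01-03) come from different dates.
gb_first = abt.groupby(['Code', 'month']).first().reset_index()

# To keep the literal first row of each month instead, sort and drop duplicates.
gb_row = abt.sort_values('Date').drop_duplicates(['Code', 'month'])

print(gb_first)
print(gb_row)
```

Which behavior is appropriate depends on the analysis; the cheatsheet itself uses first().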

To see the most up-to-date full tutorial, explanations, and additional context, visit the online tutorial at elitedatascience.com.
We also have plenty of other tutorials and guides.

ELITEDATASCIENCE.COM

M1 - Introducing Google Cloud v5.2 - ILT (69 pages)
Fast Data Processing with Spark 2 - Third Edition (Krishna Sankar)
Microsoft SQL Server 2012 Integration Services: An Expert Cookbook (Pedro Perfeito)
Python For Quants. (4 pages)
Pandas Cheat Sheet (6 pages)
Finding Similar Items (85 pages)
3 - Big Data Insight V.2019 PDF (28 pages)
Python Data Viz Tutorial: Setup Overlaying Plots (1 page)
The Definitive Guide to Data Integration: Unlock the power of data integration to efficiently manage, transform, and analyze data (Pierre-yves Bonnefoy)
Databricks Cloud How To Log Analysis Example (9 pages)
Informatica Power Center Best Practices (8 pages)
T-GCPBDML-B - M2 - Data Engineering For Streaming Data - ILT Slides (71 pages)
(Treading On Python 2) Matt Harrison - Treading On Python Volume 2 - Intermediate Python 2 (2013, Hairysun) (144 pages)
Enterprise GENERATIVE AI Well-Architected Framework & Patterns: An Architect's Real-life Guide to Adopting Generative AI in Enterprises at Scale (Suvoraj Biswas)
How To Model Residual Errors To Correct Time Series Forecasts With Python (22 pages)
Creating Stunning Dashboards With QlikView - Sample Chapter (18 pages)
Cheat Sheet - Pandas (12 pages)
Managing Memory in SAS (17 pages)
Predictive Modeling: Project Documentation Team 10 (16 pages)
Scala and Spark Overview PDF (37 pages)
Memsql (23 pages)
Azure Data Lake A Clear and Concise Reference (Gerardus Blokdyk)
Fundamentals of Analytics Engineering: An introduction to building end-to-end analytics solutions (Dumky De Wilde)
An Insurance Recommendation System Using Bayesian Networks (5 pages)
Databricks Certified Associate Developer for Apache Spark Using Python: The ultimate guide to getting certified in Apache Spark using practical examples with Python (Saba Shah)
AI Mastery Trilogy: AI Fundamentals (Andrew Hinton)
XL Wings (214 pages)
Pentaho Data Integration Cookbook - Second Edition (María Carina Roldán)
Pyspark Cashing & Persisting - Complete Guide (3 pages)
IICT - Data Science (22 pages)
Apache Spark Graph Processing (Ramamonjison Rindra)
Semantic Knowledge Graphing Third Edition (Gerardus Blokdyk)
Practical Data Analytics for BFSI (Mr. Bharat Sikka)
Tableau Syllabus (13 pages)
Data Mart Info (5 pages)
PSD02 - Data Science Overview (64 pages)
Data Scientist Master Program v4 (28 pages)
Effective Amazon Machine Learning (Alexis Perrier)
Advanced Dimensional Modeling (19 pages)
Tabular Iceberg-Spark Cheat-Sheet (1 page)
Chapter 4: Advanced SQL (57 pages)
Deep Learning for Computer Vision with SAS: An Introduction (Robert Blanchard)
Pandas Tutorial 1: Pandas Basics (Reading Data Files, Dataframes, Data Selection) (15 pages)
SQL With Python Guide (17 pages)
IBM Cognos 10 Framework Manager (Terry Curran)
Apache Spark Essential Training (30 pages)
HDInsight Essentials - Second Edition (Rajesh Nadipalli)
The Stream Processing Paradigm: Research Report For HIT-382 (36 pages)
Patterns of Big Data Forrester (74 pages)
3 Lecture 3-ETL (42 pages)
Power BI Cheat Sheet (10 pages)
Qlik_Replicate_More_Data_AnalyticsReady_White_Paper_US (14 pages)
Data Mart V Data Warehouse (19 pages)
Cheat Sheet - Machine Learning - Data Science Interview PDF (16 pages)
Excel With Python Performing Advanced Operations (57 pages)
Comparison Kubeflow TFX (12 pages)
Professional Microsoft SQL Server 2014 Integration Services (Devin Knight)
Getting Started with Greenplum for Big Data Analytics (Sunila Gollapudi)
Lecture 2: Markov Decision Processes: David Silver (57 pages)
Django 1.0 Template Development (Scott Newman)
Trust-In Machine Learning Models (11 pages)
CS-2 Unit Test-2: Class XII Computer Science (4 pages)
Chapter 4 AI (2 pages)
Final Examination (Free Elective 3) (3 pages)
Smart Display For Displaying Faculty Availability in Cabin: A Project Report (24 pages)
Notes Ip (3 pages)
HCA Paper Exam For Python Programming Making Guide (5 pages)
Get (Ebook) Effective Python 90 Specific Ways to Write Better Python 2nd Edition by Brett Slatkin ISBN 9780134853987, 0134853989 free all chapters (47 pages)
Python Data Visualization Cookbook 2nd Edition Igor Milovanovic - Own the ebook now with all fully detailed content (62 pages)
417-AI-X (10 pages)
w23 (2 pages)
Mojo Basics (3 pages)
Py Slides 7 (11 pages)
Python and Data Structures Roadmap (14 pages)
Algo Brochure PDF (4 pages)
Project K (11 pages)
Python For Finance PDF (28 pages)
Req - Full Doc - Online Fake Reviews Detection in E-Commerce (52 pages)
CLASS 12 PRE BOARD 2023-24 ComputerScience (25 pages)
4.GE3171_SET5 (4 pages)
Question Bank - Python (2 pages)
Python Project Report (12 pages)
MODULE 1 (8 pages)
Vaibhav Sharma 12-A Roll No. 34assignment - 3 Binary File (12 pages)
Python 1 (289 pages)
Python Automation (12 pages)
Scapy (105 pages)
Python Interview Questions (2 pages)
SP - 34 (18 pages)
sushmitha.cv (1 page)