Pandas Notes

Uploaded by

Qudrat Ullah

Data Manipulation With Pandas

1) Transforming DataFrames
Inspecting a DataFrame
.head() returns the first few rows (the “head” of the DataFrame).

.info() shows information on each of the columns, such as the data type and the number of non-missing (non-null) values.

.shape returns the number of rows and columns of the DataFrame.

.describe() calculates a few summary statistics for each column (by default, only the numeric columns).
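
As a minimal sketch of the four inspection tools above, the following uses a small dogs DataFrame whose rows are invented purely for illustration (they are not data from these notes).

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer"],
    "color": ["brown", "black", "tan", "gray"],
    "height_cm": [56, 43, 46, 49],
})

first_rows = dogs.head(2)     # first 2 rows; with no argument, the first 5
n_rows, n_cols = dogs.shape   # an attribute, not a method: no parentheses
summary = dogs.describe()     # summary statistics for the numeric columns

print(first_rows)
dogs.info()                   # prints dtypes and non-null counts, returns None
print(n_rows, n_cols)
print(summary)
```

Note that .shape is written without parentheses, while .head(), .info() and .describe() are method calls.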

Parts of a DataFrame
.values: A two-dimensional NumPy array of values.

.columns: An index of columns: the column names.

.index: An index for the rows: either row numbers or row names.
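
A short sketch of the three parts, again on invented toy data. The row names passed to index= are an assumption made here so that .index shows names rather than row numbers.

```python
import pandas as pd

# Hypothetical toy data; the row names are chosen only for illustration.
dogs = pd.DataFrame(
    {"breed": ["Labrador", "Poodle"], "height_cm": [56, 43]},
    index=["Bella", "Charlie"],
)

values = dogs.values     # two-dimensional NumPy array of the cell values
columns = dogs.columns   # Index object holding the column names
rows = dogs.index        # Index object holding the row names

print(values)
print(columns)
print(rows)
```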

Sorting rows
Finding interesting bits of data in a DataFrame is often easier if you change the order of the rows. You
can sort the rows by passing a column name to .sort_values().

In cases where rows have the same value (this is common if you sort on a categorical variable), you may
wish to break the ties by sorting on another column. You can sort on multiple columns in this way by
passing a list of column names.

Sort on …           Syntax
one column          df.sort_values("breed")
multiple columns    df.sort_values(["col1", "col2"])

homelessness_reg_fam = homelessness.sort_values(["region", "family_members"], ascending=[True, False])
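
The sorting calls above can be tried on a small stand-in table. The homelessness rows below are made up for illustration (placeholder state names and numbers, not real figures); only the column names come from the notes.

```python
import pandas as pd

# Hypothetical stand-in for the homelessness data; all values are invented.
homelessness = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
    "state": ["state_a", "state_b", "state_c", "state_d"],
    "family_members": [500, 2600, 20000, 3200],
})

# One column, largest values first.
by_family = homelessness.sort_values("family_members", ascending=False)

# Two columns: region A-to-Z, then family_members from largest to smallest
# within each region. The ascending list lines up with the column list.
homelessness_reg_fam = homelessness.sort_values(
    ["region", "family_members"], ascending=[True, False]
)

print(by_family)
print(homelessness_reg_fam)
```

.sort_values() returns a new DataFrame and leaves the original row order untouched, which is why the result is assigned to a new name.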
Subsetting columns
When working with data, you may not need all of the variables in your dataset. Square brackets ([]) can
be used to select only the columns that matter to you in an order that makes sense to you. To select only
"col_a" of the DataFrame df, use

df["col_a"]

To select "col_a" and "col_b" of df, use

df[["col_a", "col_b"]]
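
A small sketch of the difference between the two forms, on an invented three-column df: single brackets return a Series, while a list inside the brackets returns a DataFrame with the columns in the order listed.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "col_a": [1, 2, 3],
    "col_b": [4, 5, 6],
    "col_c": [7, 8, 9],
})

one_col = df["col_a"]               # a single column, returned as a Series
two_cols = df[["col_b", "col_a"]]   # a DataFrame, columns in the order given

print(type(one_col).__name__)
print(two_cols)
```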

Subsetting rows
A large part of data science is about finding which bits of your dataset are interesting. One of the
simplest techniques for this is to find a subset of rows that match some criteria. This is sometimes known
as filtering rows or selecting rows.

There are many ways to subset a DataFrame. Perhaps the most common is to use relational operators to
produce True or False for each row, then pass that Boolean result inside square brackets.

dogs[dogs["height_cm"] > 60]

dogs[dogs["color"] == "tan"]

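You can filter for multiple conditions at once by using the "bitwise and" operator, &. Wrap each
condition in parentheses, because & binds more tightly than comparison operators such as > and ==.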

dogs[(dogs["height_cm"] > 60) & (dogs["color"] == "tan")]
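
To make the intermediate step visible, this sketch stores each True/False mask in a variable before using it. The dogs rows are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "color": ["tan", "black", "tan", "tan"],
    "height_cm": [56, 65, 62, 40],
})

is_tall = dogs["height_cm"] > 60   # Boolean Series, one value per row
is_tan = dogs["color"] == "tan"

tall = dogs[is_tall]               # rows where the mask is True
tan = dogs[is_tan]
tall_tan = dogs[is_tall & is_tan]  # both conditions must hold

print(tall_tan)
```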

Subsetting rows by categorical variables

Subsetting data based on a categorical variable often involves using the "or" operator (|) to select rows
from multiple categories. This can get tedious when you want all states in one of three different regions,
for example. Instead, use the .isin() method, which will allow you to tackle this problem by writing one
condition instead of three separate ones.

colors = ["brown", "black", "tan"]

condition = dogs["color"].isin(colors)
dogs[condition]
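
For comparison, here is a sketch of the same filter written both ways, with three conditions joined by | and with a single .isin() call. The data is invented; the two results should match.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max"],
    "color": ["brown", "black", "tan", "gray", "white"],
})

# Three separate conditions combined with the "or" operator.
with_or = dogs[
    (dogs["color"] == "brown")
    | (dogs["color"] == "black")
    | (dogs["color"] == "tan")
]

# One condition using .isin().
colors = ["brown", "black", "tan"]
with_isin = dogs[dogs["color"].isin(colors)]

print(with_isin)
```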

Adding new columns

You aren't stuck with just the data you are given. Instead, you can add new columns to a DataFrame. This
has many names, such as transforming, mutating, and feature engineering.

You can create new columns from scratch, but it is also common to derive them from other columns, for
example, by adding columns together or by changing their units.

df["new_col"] = df["col_1"] + df["col_2"]
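
A minimal sketch of both cases mentioned above, adding two columns together and changing units. The column names height_m and weight_kg and all values are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({"col_1": [1, 2], "col_2": [10, 20]})
df["new_col"] = df["col_1"] + df["col_2"]       # element-wise sum per row

dogs = pd.DataFrame({"height_cm": [56, 43], "weight_kg": [24, 20]})
dogs["height_m"] = dogs["height_cm"] / 100      # unit change: cm to m

print(df)
print(dogs)
```

Assigning to a column name that does not exist yet creates the column; assigning to an existing name overwrites it.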

Statistics
2) Aggregating DataFrames
Efficient summaries
pandas and NumPy provide many built-in summary functions, but sometimes you need a function
they do not offer.

The .agg() method lets you apply your own custom functions to a DataFrame. It can also apply
functions to more than one column at once, or several functions in a single call, which keeps
aggregation code short. For example,

df['column'].agg(function)

A typical custom function is the inter-quartile range (IQR), which is the 75th percentile minus
the 25th percentile. It is an alternative to standard deviation as a measure of spread and is
helpful if your data contains outliers.
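
As a sketch of how such a custom function might be written and passed to .agg(): the function name iqr, the sales table, and its column names and values are all assumptions made for illustration.

```python
import pandas as pd

def iqr(column):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    return column.quantile(0.75) - column.quantile(0.25)

# Hypothetical toy data, invented for illustration only.
sales = pd.DataFrame({
    "weekly_sales": [10.0, 20.0, 30.0, 40.0, 50.0],
    "temperature_c": [1.0, 2.0, 3.0, 4.0, 5.0],
})

single = sales["weekly_sales"].agg(iqr)                          # one column
multi = sales[["weekly_sales", "temperature_c"]].agg(iqr)        # several columns
both = sales[["weekly_sales", "temperature_c"]].agg([iqr, "median"])  # several functions

print(single)
print(multi)
print(both)
```

Passing a list of functions returns one row per function, so custom and built-in summaries (here named by the string "median") can be compared side by side.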

Cumulative statistics
Cumulative statistics can also be helpful in tracking summary statistics over time. For
example, the cumulative sum (.cumsum()) and cumulative max (.cummax()) of a department's weekly
sales show what the total sales were so far as well as what the highest weekly sales were
so far.
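
A minimal sketch of the cumulative sum and cumulative max, using .cumsum() and .cummax() on an invented weekly sales table. The column names and numbers are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical weekly sales for one department; values are invented.
sales = pd.DataFrame({
    "week": [3, 1, 2, 4],
    "weekly_sales": [180, 100, 250, 300],
})

# Cumulative statistics depend on row order, so sort by time first.
sales = sales.sort_values("week")

sales["cum_weekly_sales"] = sales["weekly_sales"].cumsum()  # running total
sales["cum_max_sales"] = sales["weekly_sales"].cummax()     # running maximum

print(sales)
```

Unlike .sum() or .max(), these methods return a value for every row rather than a single number.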
