python 2.1.3 (2)

The document provides an overview of various data manipulation techniques in Pandas, including merging and joining datasets, aggregation and grouping, creating pivot tables, vectorized string operations, and working with time series data. It also introduces high-performance functions like eval() and query() for efficient data evaluation and filtering. Examples are included to illustrate the usage of each technique.

Uploaded by

hritikp266

3.

Combining Datasets: Merge and Join, Aggregation and Grouping, Pivot Tables, Vectorized String
Operations, Working with Time Series. High-Performance Pandas: eval() and query()

1. Combining Datasets: Merge and Join

Pandas provides several ways to combine multiple datasets. The most common are merge() and
join(): merge() matches rows on one or more common columns (or on indices), while join()
matches rows on the index by default.

Merge:

The merge() function is used to combine two DataFrames based on common columns or
indices. It is similar to SQL joins (e.g., inner, left, right, and outer joins).

Example:

import pandas as pd

# Creating two DataFrames

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 4],
    'Age': [25, 30, 35]
})

# Merging the DataFrames on 'ID'

merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

Output:

   ID   Name  Age
0   1  Alice   25
1   2    Bob   30

Explanation:

• The merge() function combines df1 and df2 on the common column ID.
• The how='inner' argument specifies an inner join, meaning only the rows whose ID
appears in both DataFrames are kept (IDs 3 and 4 are dropped).
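
The other join types mentioned above behave differently for unmatched rows. The following is a minimal sketch, reusing the same df1 and df2, of a left join and an outer join; the variable names left_df and outer_df are chosen here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})

# Left join: keep every row of df1; IDs missing from df2 get NaN in 'Age'
left_df = pd.merge(df1, df2, on='ID', how='left')

# Outer join: keep rows from both sides; indicator=True adds a '_merge'
# column recording whether each row came from the left, right, or both
outer_df = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(left_df)
print(outer_df)
```

The _merge column is a convenient way to check which keys failed to match before deciding on a join type.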

Join:

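The join() method combines DataFrames on their index by default; its on parameter lets the
calling DataFrame match one of its columns against the other's index. It performs a left join
unless how is specified, and is most convenient when the two DataFrames share an index.

Example: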

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}, index=[1, 2, 3])

df2 = pd.DataFrame({
    'Country': ['USA', 'Canada', 'UK']
}, index=[1, 2, 3])

# Using join to combine the DataFrames based on index

joined_df = df1.join(df2)
print(joined_df)

Output:

      Name  Age Country
1    Alice   25     USA
2      Bob   30  Canada
3  Charlie   35      UK

Explanation:

• The join() method combines df1 and df2 using their index.
• The resulting DataFrame contains the columns from both df1 and df2.
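
When the indices only partly overlap, the join type matters. This is a small sketch under that assumption; the DataFrames people and countries are hypothetical and not part of the example above.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[1, 2, 3])
countries = pd.DataFrame({'Country': ['USA', 'UK']}, index=[1, 3])

# Default is a left join: every row of 'people' is kept,
# and index 2 gets NaN because 'countries' has no such label
left_joined = people.join(countries)

# how='inner' keeps only index labels present in both DataFrames
inner_joined = people.join(countries, how='inner')

print(left_joined)
print(inner_joined)
```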

2. Aggregation and Grouping

Aggregation and grouping allow you to perform calculations (such as sum, mean, or count)
on subsets of your data.

GroupBy:

The groupby() function splits the data into groups based on some criteria, applies a function
to each group, and then combines the results.

Example:

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Points': [10, 20, 15, 25, 30, 35]
})

# Grouping by 'Team' and calculating the sum of 'Points'

grouped = df.groupby('Team').sum()
print(grouped)

Output:

      Points
Team
A         55
B         80

Explanation:

• The groupby() function groups the data by the column Team.
• The sum() function is applied to each group to calculate the total Points for each
team.
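
Several aggregations can be computed in one pass with agg(). The sketch below uses the same Team/Points data; the output column names total_points, mean_points and games are illustrative choices, not names from the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Points': [10, 20, 15, 25, 30, 35]
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = df.groupby('Team').agg(
    total_points=('Points', 'sum'),
    mean_points=('Points', 'mean'),
    games=('Points', 'count'),
)

print(summary)
```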

3. Pivot Tables

Pivot tables allow you to reshape data and perform aggregations. They are similar to Excel
pivot tables.

Example:

df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
    'Temperature': [32, 75, 30, 77]
})

# Creating a pivot table to find the average temperature by date and city
pivot_table = df.pivot_table(values='Temperature', index='Date',
                             columns='City', aggfunc='mean')
print(pivot_table)

Output:

City        Los Angeles  New York
Date
2021-01-01         75.0      32.0
2021-01-02         77.0      30.0

Explanation:

• The pivot_table() function reshapes the data.
• It aggregates Temperature by Date and City and calculates the mean temperature for
each group.
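
The role of aggfunc is easier to see when a Date/City pair has more than one reading. The following sketch uses a hypothetical readings table (not the data above) with two New York readings on the same day and one missing combination.

```python
import pandas as pd

readings = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02'],
    'City': ['New York', 'New York', 'Los Angeles', 'New York'],
    'Temperature': [30, 34, 75, 28]
})

# aggfunc decides how multiple values in one cell are reduced
mean_table = readings.pivot_table(values='Temperature', index='Date',
                                  columns='City', aggfunc='mean')

# A different aggfunc gives a different table; fill_value replaces
# cells that have no readings at all (here: Los Angeles on 2021-01-02)
max_table = readings.pivot_table(values='Temperature', index='Date',
                                 columns='City', aggfunc='max', fill_value=0)

print(mean_table)
print(max_table)
```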

4. Vectorized String Operations

Pandas provides vectorized string operations through the .str accessor. These operations are
applied to an entire column or Series in a single call, without an explicit Python loop, and
they pass missing values through instead of raising an error.

Example:

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Convert all names to uppercase using vectorized string operations

df['Name'] = df['Name'].str.upper()
print(df)

Output:

      Name         City
0    ALICE     New York
1      BOB  Los Angeles
2  CHARLIE      Chicago

Explanation:

• The str.upper() method is applied to the entire Name column, converting all the
names to uppercase.
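
A few other .str methods, and their behaviour with missing values, can be sketched as follows. The cities Series is a hypothetical example that includes one missing entry.

```python
import pandas as pd

cities = pd.Series(['New York', 'Los Angeles', None, 'Chicago'])

# Length of each string; the missing entry stays missing
lengths = cities.str.len()

# Boolean test per element; na=False treats the missing entry as False
has_space = cities.str.contains(' ', na=False)

# Split on spaces, then take the first piece of each resulting list
first_word = cities.str.split(' ').str[0]

print(lengths)
print(has_space)
print(first_word)
```

Calling a plain Python string method such as upper() on each element in a loop would fail on the None entry, whereas the .str methods simply propagate it.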

5. Working with Time Series

Pandas provides extensive functionality for working with time series data, including
generating ranges of dates, resampling data, and performing date/time operations.

Example:

# Creating a DatetimeIndex of six consecutive days

dates = pd.date_range('20210101', periods=6)
df = pd.DataFrame({
    'Date': dates,
    'Temperature': [32, 35, 31, 30, 29, 28]
})

# Setting 'Date' as the index

df.set_index('Date', inplace=True)

# Resampling the data to get the average temperature per month
# (pandas 2.2 and later prefer the alias 'ME' for month-end and warn on 'M')

monthly_avg = df.resample('M').mean()
print(monthly_avg)

Output:

            Temperature
Date
2021-01-31    30.833333

Explanation:

• The date_range() function generates six consecutive daily dates starting from '2021-01-01'.
• We resample the data by month using .resample('M') and calculate the mean of the
Temperature column. All six days fall in January, so the result has a single row.
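
Resampling works with other frequencies too, and a rolling window gives a moving average. The sketch below reuses the same six temperatures; the 2-day bin size and 3-day window are arbitrary choices for illustration.

```python
import pandas as pd

dates = pd.date_range('2021-01-01', periods=6)
temps = pd.Series([32, 35, 31, 30, 29, 28], index=dates, name='Temperature')

# Downsample into 2-day bins and average each bin
two_day_avg = temps.resample('2D').mean()

# 3-day moving average; the first two entries are NaN because
# a full window of three observations is not yet available
rolling_avg = temps.rolling(window=3).mean()

print(two_day_avg)
print(rolling_avg)
```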

6. High-Performance Pandas: eval() and query()

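Pandas provides two functions, eval() and query(), that evaluate expressions written as
strings. On large DataFrames they can be faster and use less memory than the equivalent plain
Python expressions, particularly when the optional numexpr library is installed; on small
DataFrames the overhead of parsing the string usually outweighs any gain.

eval():

The eval() function evaluates an expression given as a string. It is available both as the
top-level pd.eval() and as the DataFrame.eval() method.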

Example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40]
})

# Using eval() to perform arithmetic operations on columns

df['C'] = pd.eval('df.A + df.B')
print(df)

Output:

   A   B   C
0  1  10  11
1  2  20  22
2  3  30  33
3  4  40  44

Explanation:

• The eval() function evaluates the expression 'df.A + df.B' and returns a Series, which
is then stored in a new column C.
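
The DataFrame.eval() method lets the expression refer to columns by name and to local Python variables with an @ prefix. This is a minimal sketch; the variable scale and the column D are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
scale = 2  # hypothetical local variable, referenced below as @scale

# Column names can be used directly inside DataFrame.eval()
total = df.eval('A + B')

# An assignment inside the expression returns a new DataFrame
# with the extra column (the original is unchanged by default)
df = df.eval('D = (A + B) * @scale')

print(total)
print(df)
```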

query():

The query() function allows you to filter rows of a DataFrame based on a condition
expressed as a string.

Example:

# Using query() to filter rows where A is greater than 2

filtered_df = df.query('A > 2')
print(filtered_df)

Output:

   A   B   C
2  3  30  33
3  4  40  44

Explanation:

• The query() function filters the DataFrame based on a condition. Here, we selected
rows where the value in column A is greater than 2.
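
Conditions can be combined with and/or, and local variables can be referenced with @. The sketch below, which uses a hypothetical threshold variable, also shows the equivalent boolean-mask filter for comparison.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
threshold = 2  # hypothetical local variable, referenced below as @threshold

# Compound condition written as a string
by_query = df.query('A > @threshold and B < 40')

# The same filter written with boolean masks
by_mask = df[(df['A'] > threshold) & (df['B'] < 40)]

print(by_query)
print(by_query.equals(by_mask))
```

Both forms select the same rows; query() is mainly a readability aid for longer conditions.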

Questions:

1. What is the difference between the merge() and join() functions in Pandas? Provide an
example of when to use each of them.
2. What is a pivot table in Pandas? Explain how to create a pivot table and describe the role
of the aggfunc parameter.
3. Define vectorized string operations in Pandas. How are they different from using
Python’s regular string methods? Provide an example of using a vectorized string operation
on a column in a DataFrame.
4. Explain how to handle missing data in Pandas. What are the common techniques for
dealing with NaN values in a DataFrame?
5. What is the purpose of the eval() function in Pandas? How does it improve performance
compared to traditional methods for column operations?
6. Explain the query() function in Pandas. How is it used to filter data based on a specific
condition or expression?
