
Chapter-2 Python Pandas

This chapter discusses pandas, a popular Python library for data analysis. It covers pandas data structures like Series and DataFrame, as well as common operations like selecting/accessing data, descriptive statistics, pivoting, sorting, grouping, and aggregating. Functions like loc, iloc, describe(), hist(), groupby(), agg(), and transform() are explained. The chapter aims to teach readers how to efficiently work with data frames in pandas.

Uploaded by Swarnim Jain


Chapter-2

Python Pandas

Introduction :

Pandas is one of the most preferred and widely used data science libraries. It offers efficient data structures that are not only powerful but also very convenient and flexible.

In this chapter we will learn :

1. Advanced operations on DataFrames: pivoting, sorting and aggregation.
2. Descriptive statistics: histograms and quantiles.
3. Various functions, reindexing and altering labels.

Pandas provides two basic data structures – Series and DataFrame.

Series – represents a one-dimensional array of indexed data.

DataFrame – stores data in a two-dimensional (tabular) form, as labelled rows and columns.
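A minimal sketch of both structures; the student names and marks are made-up illustration data, not taken from the chapter.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label
marks = pd.Series([78, 85, 92], index=["Amit", "Bela", "Chen"])

# A DataFrame: two-dimensional, with row labels (index) and column labels
students = pd.DataFrame(
    {"Marks": [78, 85, 92], "Grade": ["B", "A", "A"]},
    index=["Amit", "Bela", "Chen"],
)

print(marks)
print(students)
```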

Some attributes of DataFrame are :

1. index – the index (row labels) of the DataFrame.

2. columns – the column labels of the DataFrame.

3. axes – a list representing both axes: the row index and the column labels.

4. size – the number of elements in the DataFrame (rows × columns).

5. shape – the dimensions of the DataFrame as a (rows, columns) tuple.

6. values – the data of the DataFrame as a NumPy array.
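The six attributes can be inspected on a small DataFrame. This sketch uses hypothetical data; the comments describe what each attribute holds for it.

```python
import pandas as pd

df = pd.DataFrame(
    {"Marks": [78, 85, 92], "Grade": ["B", "A", "A"]},
    index=["Amit", "Bela", "Chen"],
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.axes)     # [row index, column index]
print(df.size)     # 3 rows x 2 columns = 6 elements
print(df.shape)    # (3, 2)
print(df.values)   # the data as a NumPy array
```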

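Selecting/Accessing a Column :

<DataFrameObject>[<column name>]  or  <DataFrameObject>.<column name>

(The dot form works only when the column name is a valid Python identifier.)

Selecting/Accessing a Subset from a DataFrame using Row/Column Names :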

To access a range of rows, use :

<DataFrameObject>.loc[<startrow>:<endrow>]

To access selected columns, use :

<DF object>.loc[:, <start column>:<end column>]

To access a range of columns from a range of rows, use :

<DataFrameObject>.loc[<startrow>:<endrow>, <startcolumn>:<endcolumn>]

With loc, both the start and the end labels are included in the result.
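A sketch of column access and label-based slicing with loc, on a hypothetical marks table. Note that each end label (for example "Chen") is part of the result.

```python
import pandas as pd

df = pd.DataFrame(
    {"Maths": [78, 85, 92, 66], "Science": [81, 74, 88, 70], "English": [90, 68, 75, 84]},
    index=["Amit", "Bela", "Chen", "Dev"],
)

print(df["Maths"])   # one column (a Series)
print(df.Science)    # same idea, using attribute access

rows = df.loc["Bela":"Chen"]                      # rows Bela..Chen, end label included
cols = df.loc[:, "Maths":"Science"]               # all rows, columns Maths..Science
sub = df.loc["Amit":"Bela", "Science":"English"]  # range of rows x range of columns
print(rows, cols, sub, sep="\n\n")
```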

Obtaining a Subset/Slice from a DataFrame using Row/Column Numeric Index/Position :

<DF object>.iloc[<start row index> : <end row index>, <start col index> : <end col index>]

Unlike loc, iloc follows Python's slicing rules: the end position is excluded.
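The same hypothetical table sliced by position with iloc; the end positions (3 for rows, 2 for columns) are not included.

```python
import pandas as pd

df = pd.DataFrame(
    {"Maths": [78, 85, 92, 66], "Science": [81, 74, 88, 70], "English": [90, 68, 75, 84]},
    index=["Amit", "Bela", "Chen", "Dev"],
)

# Rows at positions 1 and 2, columns at positions 0 and 1
sub = df.iloc[1:3, 0:2]
print(sub)
```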

Selecting/Accessing Individual Values :

<DF object>.<column>[<row name or row numeric index>]
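A sketch of reading one value. Besides the column-then-row form above, it shows the at (by label) and iat (by position) accessors, which the chapter does not mention. Recent pandas releases warn when a numeric position is passed to [] on a Series whose labels are not integers, so at/iat are the less ambiguous choice. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Maths": [78, 85, 92], "Science": [81, 74, 88]},
    index=["Amit", "Bela", "Chen"],
)

by_name = df.Maths["Bela"]      # column attribute, then row label
by_at = df.at["Bela", "Maths"]  # label-based single value
by_iat = df.iat[1, 0]           # position-based single value (row 1, column 0)
print(by_name, by_at, by_iat)
```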

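Adding and Deleting Columns in DataFrames

Adding Columns : assign values to a new column label

<DF object>[<new column name>] = <values>

Deleting Columns : use the del statement to delete a column

del <DF object>[<column name>]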
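Adding a computed column and then deleting another one; the column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Maths": [78, 85, 92], "Science": [81, 74, 88]},
                  index=["Amit", "Bela", "Chen"])

df["Total"] = df["Maths"] + df["Science"]  # add a new column computed from existing ones
del df["Science"]                          # delete a column in place
print(df)
```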

Descriptive Statistics with Pandas :

Pandas includes many useful statistical functions. The examples in this section refer to a reference dataframe named sal_df.

Functions min() and max() : The min() and max() functions find the minimum and maximum values, respectively, from a given set of data.

Parameters :

axis : (0 or 1) by default, the minimum and maximum are calculated along axis 0, i.e. for each column.
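The chapter's sal_df table is not reproduced, so this sketch uses a hypothetical stand-in with two columns, Col1 and Col2, whose values match the median calculations later in this section.

```python
import pandas as pd

# Hypothetical stand-in for sal_df
sal_df = pd.DataFrame({"Col1": [2000, 10000, 12000, 13000],
                       "Col2": [7000, 2000, 7000, 6000]},
                      index=["R1", "R2", "R3", "R4"])

col_min = sal_df.min()        # axis=0 (default): minimum of each column
col_max = sal_df.max()        # maximum of each column
row_max = sal_df.max(axis=1)  # axis=1: maximum of each row
print(col_min, col_max, row_max, sep="\n\n")
```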
Functions mode() , mean(), median() :

mode() – returns the mode value (i.e., the value that appears most often) from a set of values.

Parameters :

axis : 0 or ‘index’ : get the mode of each column

1 or ‘columns’ : get the mode of each row

mean() – returns the computed mean (average) from a set of values.

median() – returns the middle number from a set of numbers. When the count is even, it is the average of the two middle values :

(2000, 10000, 12000, 13000) : (10000 + 12000) = 22000, 22000/2 = 11000

(2000, 6000, 7000, 7000) : (6000 + 7000) = 13000, 13000/2 = 6500
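The two worked medians above can be checked with pandas. The hypothetical sal_df below holds those same values (in an arbitrary row order); mode() and mean() are shown alongside.

```python
import pandas as pd

# Hypothetical stand-in for sal_df, holding the values of the worked medians
sal_df = pd.DataFrame({"Col1": [2000, 10000, 12000, 13000],
                       "Col2": [7000, 2000, 7000, 6000]})

modes = sal_df["Col2"].mode()  # 7000 appears twice, so it is the mode
means = sal_df.mean()          # column averages
medians = sal_df.median()      # average of the two middle values per column
print(sal_df.mode(), means, medians, sep="\n\n")
```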

Functions count( ) and sum( )

count( ) – counts the non-NA entries for each row or column.

sum( ) – returns the sum of values for the requested axis.
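A sketch showing that count() skips missing values and that sum() works along either axis; the NaN in the hypothetical data is there on purpose.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [2000, np.nan, 12000],
                   "Col2": [7000, 2000, 6000]})

counts = df.count()          # non-NA entries per column
totals = df.sum()            # column sums; NaN is skipped by default
row_totals = df.sum(axis=1)  # row sums
print(counts, totals, row_totals, sep="\n\n")
```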

Applying Functions on a Subset of Dataframe :

Sometimes, we need to apply a function on a selected column or row, or on a subset of the dataframe. In each case, first select the subset using one of the forms below, then call the function on it.

Applying Function on a Column of a DataFrame :

<dataframe>[<column name>]

Applying Function on Multiple Columns of a DataFrame :

<dataframe>[[<column name>, <column name>, ...]]

Applying Function on a Row of a DataFrame :

<dataframe>.loc[<row index>, :]

Applying Function on a Range of Rows of a DataFrame :

<dataframe>.loc[<start row> : <end row>, :]

Applying Function on a Subset of the DataFrame :

<dataframe>.loc[<start row> : <end row>, <start column> : <end column>]
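One call per selection form listed above, on a hypothetical three-column sal_df. The functions used (max, min, sum, mean) are just examples; any of the statistical functions works the same way.

```python
import pandas as pd

sal_df = pd.DataFrame({"Col1": [2000, 10000, 12000, 13000],
                       "Col2": [7000, 2000, 7000, 6000],
                       "Col3": [5000, 4000, 3000, 8000]},
                      index=["R1", "R2", "R3", "R4"])

one_col = sal_df["Col1"].max()                           # one column
two_cols = sal_df[["Col1", "Col3"]].min()                # several columns
one_row = sal_df.loc["R2", :].sum()                      # one row
row_range = sal_df.loc["R2":"R3", :].mean()              # range of rows
subset = sal_df.loc["R1":"R2", "Col2":"Col3"].sum()      # rows x columns
print(one_col, two_cols, one_row, row_range, subset, sep="\n\n")
```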

Advanced Operations on DataFrame :

Three advanced operations are :

1. pivoting
2. sorting
3. aggregation
1. Pivoting : Pivoting is a summary technique that works on tabular data (i.e., data in rows and columns). It rearranges the data from rows and columns, possibly aggregating data from multiple sources, into a report form (with rows transferred to columns) so that the data can be viewed from a different perspective.

Real life example :

An online tutoring company maintains its data about tutors and online classes in the following table.

Using pivot( ) function :

Cells in the pivoted table that have no matching entry in the original table are set to NaN.

Now swap the rows and columns, i.e. the index and columns arguments.

We can also skip the values argument; all the remaining columns then appear in the pivoted table.
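The company's quarter-1 table is not reproduced here, so this sketch assumes a two-row stand-in: Tahira's quarter-1 row comes from the chapter's later table, while the Anusha row is made up. It shows the NaN cells, the swapped index/columns, and pivoting without values.

```python
import pandas as pd

# Hypothetical quarter-1 data (Anusha's row is invented for illustration)
q1 = pd.DataFrame({
    "Tutor": ["Tahira", "Anusha"],
    "Classes": [28, 36],
    "Quarter": [1, 1],
    "Country": ["USA", "UK"],
})

pv = q1.pivot(index="Tutor", columns="Country", values="Classes")
print(pv)  # Tahira has no UK entry and Anusha no USA entry, so those cells are NaN

pv_swapped = q1.pivot(index="Country", columns="Tutor", values="Classes")
print(pv_swapped)

pv_all = q1.pivot(index="Tutor", columns="Country")  # values omitted
print(pv_all)  # every remaining column (Classes, Quarter) gets its own block
```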

The above data is for one quarter only. The online tutoring company has data for the entire year, as shown below. With the index (rows) specified as 'Tutor' and the columns as 'Country', a tutor can now have several entries, with different values, for the same country.

Consider Tahira's entries :

Tutor    Classes  Quarter  Country
Tahira   28       1        USA
Tahira   36       2        USA
Tahira   24       3        Brazil
Tahira   36       4        Japan

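Try to create a row for tutor Tahira from the above data with Country as the columns :

         USA          Brazil   Japan
Tahira   28 or 36 ?   24       36

Tahira has two USA entries (28 and 36), so there is no single value to place in the USA cell. pivot( ) cannot resolve such duplicate index/column pairs and raises an error; pivot_table( ) handles them by aggregating the duplicate entries.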
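Tahira's four rows from the table above are enough to reproduce the problem. pivot() raises a ValueError for the duplicate (Tahira, USA) pair, while pivot_table() with aggfunc="sum" (one possible choice of aggregation) adds the two USA entries together.

```python
import pandas as pd

tahira = pd.DataFrame({
    "Tutor": ["Tahira"] * 4,
    "Classes": [28, 36, 24, 36],
    "Quarter": [1, 2, 3, 4],
    "Country": ["USA", "USA", "Brazil", "Japan"],
})

try:
    tahira.pivot(index="Tutor", columns="Country", values="Classes")
    pivot_failed = False
except ValueError as err:  # duplicate (Tahira, USA) pair
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() aggregates the duplicates instead of failing
summary = tahira.pivot_table(index="Tutor", columns="Country",
                             values="Classes", aggfunc="sum")
print(summary)
```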

Using pivot_table( ) function :

Example-1 Considering the tutoring company data, compute total classes per tutor.

Example-2 Considering the tutoring company data, compute the number of countries (count) per tutor.

Example-3 Considering the tutoring company data, compute total classes by country.

Example-4 Considering the tutoring company data, compute total classes on two fields, tutor-wise and country-wise.

Example-5 Considering the tutoring company data, compute average classes on two fields.
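A sketch of the five examples. The full-year table is not reproduced, so the data below is an assumption: Tahira's rows come from the chapter, Anusha's rows are invented for illustration.

```python
import pandas as pd

# Hypothetical full-year data (Anusha's rows are made up)
tutors = pd.DataFrame({
    "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
    "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
    "Quarter": [1, 2, 3, 4, 1, 2, 3, 4],
    "Country": ["USA", "USA", "Brazil", "Japan", "UK", "UK", "USA", "UK"],
})

ex1 = tutors.pivot_table(index="Tutor", values="Classes", aggfunc="sum")
ex2 = tutors.pivot_table(index="Tutor", values="Country", aggfunc="count")
ex3 = tutors.pivot_table(index="Country", values="Classes", aggfunc="sum")
ex4 = tutors.pivot_table(index=["Tutor", "Country"], values="Classes", aggfunc="sum")
ex5 = tutors.pivot_table(index=["Tutor", "Country"], values="Classes", aggfunc="mean")
for result in (ex1, ex2, ex3, ex4, ex5):
    print(result, end="\n\n")
```

Note that aggfunc="count" in Example-2 counts country entries, not distinct countries; counting distinct countries would need a different aggregation such as "nunique".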
2. Sorting : Sorting refers to arranging values in a particular order.

sort_values( ) – arranges the values in ascending or descending order.
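Sorting Tahira's rows from the table above: ascending is the default, ascending=False reverses it, and a list of columns sorts by the first column and breaks ties with the next.

```python
import pandas as pd

tahira = pd.DataFrame({
    "Tutor": ["Tahira"] * 4,
    "Classes": [28, 36, 24, 36],
    "Quarter": [1, 2, 3, 4],
    "Country": ["USA", "USA", "Brazil", "Japan"],
})

asc = tahira.sort_values(by="Classes")                    # smallest first
desc = tahira.sort_values(by="Classes", ascending=False)  # largest first
both = tahira.sort_values(by=["Country", "Classes"])      # Country, then Classes
print(asc, desc, both, sep="\n\n")
```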
Creating Histogram :

Histogram – A histogram is a plot that lets you discover and show the underlying frequency distribution (shape) of a set of continuous data.

The hist( ) function of pandas is used to create a histogram.

Consider the following histogram, computed from a dataset containing the ages of 20 people :

37 28 38 44 53 69 74 53 35 38 66 46 24 45 92 48 51 62 58 57
Bin      Frequency  Ages included in bin
20-30    2          28, 24
30-40    4          37, 38, 35, 38
40-50    4          44, 46, 45, 48
50-60    5          53, 53, 51, 58, 57
60-70    3          69, 66, 62
70-80    1          74
80-90    0          --
90-100   1          92
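The frequencies in the table can be recomputed from the 20 ages. This sketch uses numpy.histogram with bin edges 20, 30, ..., 100 (each bin includes its left edge; the last also includes 100). Drawing the plot with hist( ) needs matplotlib, so that step is skipped if matplotlib is not installed.

```python
import numpy as np
import pandas as pd

ages = pd.Series([37, 28, 38, 44, 53, 69, 74, 53, 35, 38,
                  66, 46, 24, 45, 92, 48, 51, 62, 58, 57])
edges = list(range(20, 101, 10))  # 20, 30, ..., 100

# Frequency per bin
counts, _ = np.histogram(ages, bins=edges)
print(dict(zip(edges[:-1], counts)))

# Draw the histogram with the same bin edges (requires matplotlib)
try:
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend
    ax = ages.hist(bins=edges)
except ImportError:
    ax = None
```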
Function Application : A function (a library function or a user-defined function) may be applied on a dataframe in multiple ways :

(a) on the whole dataframe
(b) row-wise or column-wise
(c) on individual elements, i.e. element-wise

For these three types of function application, Pandas offers the following three functions :

(a) pipe( ) – dataframe-wise function application.
(b) apply( ) – row-wise/column-wise function application.
(c) applymap( ) – element-wise function application.

(a) pipe( ) function : Piping functions through pipe( ) means chaining them, so that they are executed in the order in which they are written, each one receiving the result of the previous one.

pipe( ) Example 1 : Function add( ) followed by multiply( ) applied on a dataframe.
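A sketch of Example 1. The bodies of add( ) and multiply( ) are assumptions (they add or multiply every value by a number n); pipe( ) passes the dataframe as the first argument and any extra arguments after it.

```python
import pandas as pd

def add(df, n):
    """Hypothetical helper: add n to every value."""
    return df + n

def multiply(df, n):
    """Hypothetical helper: multiply every value by n."""
    return df * n

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Same as multiply(add(df, 2), 3), but read left to right
result = df.pipe(add, 2).pipe(multiply, 3)
print(result)
```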

The apply( ) and applymap( ) functions :

apply( ) – passes each column or each row of the dataframe to the given function (a single row/column of a dataframe is a Series), so the function works on one whole row or column at a time.

Syntax - <dataframe>.apply(<funcname>, axis = 0)

axis : 0 or 1, default 0; the axis along which the function is applied.

If axis is 0 or ‘index’ : the function is applied on each column.

If axis is 1 or ‘columns’ : the function is applied on each row.

applymap( ) – is an element-wise function, so it applies the given function to each individual element, separately.

Syntax - <dataframe>.applymap(<funcname>)

To use apply( ) row-wise, write :

<dataframe>.apply(<func>, axis = 1)

NOTE – apply( ) works on whole columns/rows when the passed function operates on a Series (e.g. sum or max). If the passed function works on single values and also happens to work element by element on a Series (e.g. numpy.sqrt or simple arithmetic), apply( ) returns a result with the same shape as the dataframe, just like applymap( ).
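A sketch of column-wise, row-wise and element-wise application on hypothetical data. pandas 2.1 deprecated applymap( ) in favour of DataFrame.map( ), so the code uses map( ) when it exists and falls back to applymap( ) on older versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 9], "B": [16, 25, 36]})

col_max = df.apply(lambda col: col.max())          # axis=0 (default): each column
row_sum = df.apply(lambda row: row.sum(), axis=1)  # axis=1: each row

# Element-wise application (DataFrame.map from pandas 2.1, else applymap)
elementwise = df.map if hasattr(df, "map") else df.applymap
labels = elementwise(lambda x: "even" if x % 2 == 0 else "odd")

same_shape = df.apply(np.sqrt)  # element-wise function: result has df's shape
print(col_max, row_sum, labels, same_shape, sep="\n\n")
```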
Function groupby( ) :

Duplicate values in the same field are grouped together to form groups. For example, when creating Tutor-wise groups, all the rows having Tutor as Tahira are clubbed together to form the Tahira group.

groupby( ) function – is used to create groups for the duplicate values in the same field.

The groupby( ) creates the groups internally and does not display the grouped data by default.
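Printing a GroupBy object shows only the object, not the data; get_group( ) retrieves one group's rows. The data is hypothetical (Tahira's rows from the chapter, Anusha's invented).

```python
import pandas as pd

tutors = pd.DataFrame({
    "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
    "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
    "Quarter": [1, 2, 3, 4, 1, 2, 3, 4],
    "Country": ["USA", "USA", "Brazil", "Japan", "UK", "UK", "USA", "UK"],
})

gb = tutors.groupby("Tutor")
print(gb)                      # a GroupBy object; no data is displayed
print(list(gb.groups.keys()))  # the group names
tahira_rows = gb.get_group("Tahira")
print(tahira_rows)
```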
Grouping on Multiple Columns : Create a group for Tutor and, within each tutor group, a Country-wise subgroup :

Example : to get the group having tutor name as ‘Anusha’ and Country as ‘UK’, write :
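A sketch of the Anusha/UK example on the same hypothetical data: with two grouping columns, the group name passed to get_group( ) is a tuple.

```python
import pandas as pd

tutors = pd.DataFrame({
    "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
    "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
    "Quarter": [1, 2, 3, 4, 1, 2, 3, 4],
    "Country": ["USA", "USA", "Brazil", "Japan", "UK", "UK", "USA", "UK"],
})

gb2 = tutors.groupby(["Tutor", "Country"])  # Tutor groups, Country subgroups
anusha_uk = gb2.get_group(("Anusha", "UK"))
print(anusha_uk)
```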

Aggregation via groupby( ) :

agg( ) method – aggregates the data of the dataframe using one or more operations over the specified axis.

Syntax - <dataframe>.agg(func, axis = 0)

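For example, for the values 36, 40, 30, 32 :

mean : (36 + 40 + 30 + 32) / 4 = 138 / 4 = 34.5

median : the sorted values are 30 32 36 40; n = 4 is even, so the median is the average of the values at positions n/2 = 2 and (n/2) + 1 = 3 : (32 + 36) / 2 = 34

sum : 36 + 40 + 30 + 32 = 138

We may combine groupby( ) and agg( ) in a single command :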
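A single-command sketch. The hypothetical Anusha rows are chosen to hold the values 36, 40, 30, 32 used in the calculations above, so her row of the result reproduces 34.5, 34 and 138.

```python
import pandas as pd

# Hypothetical data (Anusha's rows are made up)
tutors = pd.DataFrame({
    "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
    "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
    "Country": ["USA", "USA", "Brazil", "Japan", "UK", "UK", "USA", "UK"],
})

# Group by Tutor, then aggregate the Classes column three ways at once
stats = tutors.groupby("Tutor")["Classes"].agg(["mean", "median", "sum"])
print(stats)
```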
The transform( ) function : This function transforms the aggregate data by repeating the summary result for each row of the group, so that the result has the same shape as the original data.

Q- What if we want to add this aggregate data to the dataframe itself?
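One way to answer the question: because transform( ) returns one value per original row, its result can be assigned directly as a new column. The column name TotalClasses and the data are hypothetical.

```python
import pandas as pd

tutors = pd.DataFrame({
    "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
    "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
})

# Each row receives the total of its own tutor's group
tutors["TotalClasses"] = tutors.groupby("Tutor")["Classes"].transform("sum")
print(tutors)
```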
Reindexing and Altering Labels : The methods provided by Pandas for reindexing and relabeling are :

(i) rename( ) – simply renames the index and/or column labels in a dataframe.

(ii) reindex( ) – specifies a new order for the existing index and column labels, and/or creates new index/column labels.

(a) Reordering the existing indexes using reindex( )

(b) Adding indexes

(iii) reindex_like( ) – creates index/column labels based on another dataframe object.

<dataframe>.reindex_like(other)
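A sketch of the three methods on hypothetical data. New labels introduced by reindex( ) get NaN, and reindex_like( ) copies the row and column labels of another dataframe (here a hypothetical template).

```python
import pandas as pd

df = pd.DataFrame({"Maths": [78, 85, 92], "Science": [81, 74, 88]},
                  index=["Amit", "Bela", "Chen"])

renamed = df.rename(index={"Amit": "A1"}, columns={"Maths": "Math"})  # (i)
reordered = df.reindex(["Chen", "Amit", "Bela"])                      # (ii)(a)
extended = df.reindex(["Amit", "Bela", "Chen", "Dev"])                # (ii)(b): Dev -> NaN
new_cols = df.reindex(columns=["Science", "Maths", "English"])        # English -> NaN

template = pd.DataFrame(index=["Bela", "Amit"], columns=["Science"])
like = df.reindex_like(template)                                      # (iii)
print(renamed, reordered, extended, new_cols, like, sep="\n\n")
```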
Solved Problems :

1. Consider the following code to create two dataframes with similar values. What will be printed by the code given below? Justify your answer.

import pandas as pd
df1 = pd.DataFrame([1,2,3])
df2 = pd.DataFrame([[1,2,3]])
print("df1")
print(df1)
print("df2")
print(df2)

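Ans :

df1
   0
0  1
1  2
2  3
df2
   0  1  2
0  1  2  3

df1 is created from a flat list, so each element becomes a separate row: 3 rows × 1 column. df2 is created from a nested list containing one inner list, so that inner list becomes a single row: 1 row × 3 columns. In both cases pandas assigns default integer labels (0, 1, 2, ...) to the rows and columns.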
