What is PANDAS
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating
data.
• Pandas can clean messy data sets, and make them readable and
relevant.
• Relevant data is very important in data science.
PANDAS vs SQL
• Pandas is ideal for working with smaller datasets that can fit into memory and
provides a more flexible and intuitive interface for data manipulation. SQL, on the
other hand, is ideal for working with larger datasets that cannot fit into memory
and provides powerful aggregation and filtering capabilities.
• When it comes to speed and performance, SQL has the upper hand over Pandas.
SQL is optimized for working with large datasets and can handle millions of rows of
data with ease. However, Pandas provides a more flexible and intuitive interface for
data manipulation, making it easier to work with for smaller datasets.
• Choosing between Pandas and SQL depends on the specific requirements of your
project. If you're working with smaller datasets or need more flexibility in data
manipulation, Pandas is the way to go. If you're working with larger datasets or
need more advanced aggregation and filtering capabilities, SQL is preferable.
Types of Data Structures in Pandas
• Series
• DataFrame
• Panel (deprecated and removed in pandas 1.0; use MultiIndex DataFrames instead)
Series
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
import pandas as pd
a = [12, 74, 26]
var = pd.Series(a)
print(var)
Labels
• We can create our own labels using the index parameter.
var = pd.Series(a, index = ["x", "y", "z"])
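A runnable sketch of the labelled Series above (variable names are illustrative):

```python
import pandas as pd

a = [12, 74, 26]
var = pd.Series(a)                          # default integer index: 0, 1, 2
var2 = pd.Series(a, index=["x", "y", "z"])  # custom labels via index

# elements can now be accessed by label
print(var2["y"])  # 74
```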
DATAFRAMES
• Data sets in Pandas are usually multi-dimensional
tables, called DataFrames.
• A Series is like a column; a DataFrame is the whole table.
• It is created using the DataFrame constructor.
pd.DataFrame()
import pandas as pd
pd.DataFrame([1,3,5,6], columns=['A'])
# Creates 4 rows and 1 column: each list element becomes a row.
pd.DataFrame([[23,43,89,1]], columns=['w','x','y','z'])
# Creates 1 row and 4 columns: the inner list is a single row.
Create dataframe using Dictionary
import pandas as pd
dic = {'A': [112], 'B': [78], 'C': [43]}  # values must be list-like, or pass an index
pd.DataFrame(dic)
Locate elements in a DataFrame
• To locate particular elements of a DataFrame we use the loc and iloc methods
in pandas.
• To locate an element based on index name and column name, we use loc.
df.loc[0]
df.loc[2,'x']
• To locate the element based on index number and column number we use
iloc.
df.iloc[2,3]
Other functions of loc and iloc
• df.loc[:,'A']
• df.loc[:,:]
• df.loc[2:6,'X']
• df.iloc[2:10,23:56]
• df.iloc[:]
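A small sketch of loc vs iloc on a toy DataFrame (labels are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [1, 2, 3]},
                  index=['r0', 'r1', 'r2'])

# loc: label-based lookup
print(df.loc['r1', 'A'])     # 20
col_a = df.loc[:, 'A']       # the whole column A

# iloc: position-based lookup
print(df.iloc[1, 0])         # 20
first_two = df.iloc[0:2, :]  # first two rows
print(first_two.shape)       # (2, 2)
```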
Replace value using loc
• df.loc[1,'A'] = 34
• df.iloc[2,4] = 'ABC'
Read a file in Pandas
import pandas as pd
df = pd.read_csv('data.csv')
df.head()                # first 5 rows
df.tail()                # last 5 rows
df.shape                 # (rows, columns)
df.dtypes                # data type of each column
df.select_dtypes('int')  # only the integer columns
df.describe()            # summary statistics for numeric columns
Select and filter records
To select records there are multiple ways.
df['col1']
df[df.col1 == 'val']
df[df.col.isin([val1, val2, ..., valn])]
df.loc[df[df.col1 == 'val'].index, 'col2']
df[df.col.between(10, 200)]
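The filters above, sketched on hypothetical sales data:

```python
import pandas as pd

df = pd.DataFrame({'city': ['NY', 'LA', 'NY', 'SF'],
                   'sales': [150, 90, 300, 40]})

ny = df[df.city == 'NY']                 # rows where a column equals a value
subset = df[df.city.isin(['NY', 'SF'])]  # rows matching a list of values
mid = df[df.sales.between(50, 200)]      # rows with sales in [50, 200]
print(len(ny), len(subset), len(mid))    # 2 3 2
```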
Handle Null and duplicate values
• Detect NaN values
df.isnull().sum()        # NaN count per column
df.dropna(subset=[], how='all', inplace=True)  # drop rows that are all-NaN within the subset
df.fillna(value)         # fill NaNs with a given value
df.duplicated()          # boolean mask marking duplicate rows
df.drop_duplicates()     # remove duplicate rows
Drop a row or column in Dataframe
df.drop('A',axis=1)
df.drop(1,axis=0)
df.drop(df[df.col=='val'].index,axis=0)
df = df[~(df.col == 'val')]
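A quick sketch of the drop patterns above (column names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'x']})

no_col = df.drop('A', axis=1)                  # drop a column
no_row = df.drop(1, axis=0)                    # drop a row by index label
no_x = df.drop(df[df.B == 'x'].index, axis=0)  # drop rows matching a condition
print(len(no_x))  # 1
```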
Mean, Median, Mode
df['col'].mean()
df['col'].median()
df['col'].mode()
df['col'].max()
Insert a row and column to a dataframe
df.insert(col_pos, col_name, value)  # insert a column at a given position
df['col1'] = val                     # add a new column at the end
df.iloc[row_number] = [2,3,4,5]      # overwrite an existing row (use df.loc[new_label] to append)
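A sketch of adding columns and rows (values are illustrative; note that iloc assignment overwrites an existing row, while loc with a new label appends one):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

df['C'] = [5, 6]                 # add a new column at the end
df.insert(0, 'id', [10, 11])     # insert a column at position 0
df.loc[len(df)] = [12, 7, 8, 9]  # append a new row under a new label
print(df.shape)                  # (3, 4)
```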
Other Important Methods
• df.rename() - rename columns or index labels
• df.col.unique() - distinct values in a column
• df.nunique() - number of distinct values
• df.value_counts('col') - frequency of each value
• df.replace() - replace values
Plot in Pandas
• Bar plot
• Histogram
• Line plot
• Scatter plot
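A minimal sketch of these plot types via DataFrame.plot (assumes matplotlib is installed; the Agg backend avoids opening a window):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 15, 30]})

ax_bar = df.plot(kind='bar', x='x', y='y')          # bar plot
ax_line = df.plot(kind='line', x='x', y='y')        # line plot
ax_scatter = df.plot(kind='scatter', x='x', y='y')  # scatter plot
ax_hist = df['y'].plot(kind='hist')                 # histogram
```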
Lambda Function
An anonymous function, used to express a small operation in a
single line of code.
It is convenient for short, throwaway operations, but it is not
faster than a regular def function.
Syntax : lambda x : x*2
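A quick sketch of the syntax above, including the common pairing with apply:

```python
import pandas as pd

double = lambda x: x * 2
print(double(5))  # 10

# lambdas are often combined with apply on a Series
s = pd.Series([1, 2, 3])
doubled = s.apply(lambda x: x * 2)
print(list(doubled))  # [2, 4, 6]
```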
Map vs apply vs applymap
• map is defined on Series data only
• applymap is defined on DataFrames only (deprecated since pandas 2.1 in favour of DataFrame.map)
• apply is defined on both
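A side-by-side sketch of the three (applymap still works on pandas 2.x, with a deprecation warning on newer versions):

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

s2 = s.map(lambda x: x + 1)          # map: element-wise on a Series
col_sums = df.apply(sum)             # apply: per column (axis=0) or per row (axis=1)
df2 = df.applymap(lambda x: x * 10)  # applymap: element-wise on a DataFrame
print(list(s2), list(col_sums))      # [2, 3, 4] [3, 7]
```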
Sort and groupby
Sort a dataframe
df.sort_values('col1',ascending=False)
Groupby
df.groupby('col1')['col2'].mean()
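The two calls above on a toy table (names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B', 'A', 'B'],
                   'score': [10, 20, 30, 40]})

# sort rows by a column, highest first
top = df.sort_values('score', ascending=False)
print(top.iloc[0]['score'])  # 40

# average score per team
means = df.groupby('team')['score'].mean()
print(means['A'], means['B'])  # 20.0 30.0
```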
Iterate over dataframe
• for i in df.col:
      pass
• df.iterrows() - iterate row-wise as (index, Series) pairs
• df.items() - iterate column-wise (replaces iteritems(), removed in pandas 2.0)
• df.itertuples() - iterate row-wise as named tuples (faster than iterrows)
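The iteration patterns above, sketched on a tiny DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# iterrows: (index, row-as-Series) pairs
total = 0
for idx, row in df.iterrows():
    total += row['a'] + row['b']
print(total)  # 10

# itertuples: one named tuple per row (usually faster)
firsts = [t.a for t in df.itertuples()]

# items: (column_name, column-as-Series) pairs
names = [name for name, col in df.items()]
print(firsts, names)  # [1, 2] ['a', 'b']
```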
Handle datetime data
• from datetime import datetime
• pd.date_range(start, end, freq)
• datetime.strptime(x, '%Y-%m-%d')
• df.col.dt.strftime()
Ref : https://www.programiz.com/python-
programming/datetime/strptime
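The three calls above, sketched with illustrative dates:

```python
import pandas as pd
from datetime import datetime

# a range of daily dates
dates = pd.date_range(start='2024-01-01', end='2024-01-05', freq='D')
print(len(dates))  # 5

# parse a string into a datetime object
d = datetime.strptime('2024-01-03', '%Y-%m-%d')
print(d.day)  # 3

# format datetime values back into strings
df = pd.DataFrame({'when': dates})
labels = df.when.dt.strftime('%Y-%m-%d')
print(labels[0])  # 2024-01-01
```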
Concat and Joins
Like SQL, we can perform join operations in Pandas.
The operations are almost identical to SQL joins, apart from the syntax.
pd.merge(df1,df2,on='col1',how='')
Concat vs Join vs Merge
Concat is used to concatenate DataFrames row-wise or column-wise, depending on axis.
pd.concat()
Join is used to join the dataframes based on index.
df1.join(df2)
Merge will join the DataFrames based on columns, like a SQL join.
df.merge(df1, on=[], how='inner')
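A sketch contrasting the three on hypothetical data:

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

# merge: SQL-style join on a column
inner = pd.merge(left, right, on='key', how='inner')
print(len(inner))  # 2 (only keys 'b' and 'c' appear in both)

# concat: stack DataFrames along an axis
stacked = pd.concat([left, left], axis=0)
print(len(stacked))  # 6

# join: combine DataFrames on the index
joined = left.set_index('key').join(right.set_index('key'))
print(len(joined))  # 3 (left join on the index by default)
```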
Pivoting tables
• import pandas as pd
• pd.pivot_table(df,columns=[],index=[],values=[],aggfunc='mean')
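A worked example of pivot_table on toy sales data (names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'city': ['NY', 'NY', 'LA', 'LA'],
                   'year': [2023, 2024, 2023, 2024],
                   'sales': [100, 200, 50, 150]})

# one row per city, one column per year, mean of sales in each cell
table = pd.pivot_table(df, index='city', columns='year',
                       values='sales', aggfunc='mean')
print(table.loc['NY', 2024])  # 200.0
```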
Handle categorical values
• Categorical values represent the classes of a variable.
• In real-world data processing we can't work with class labels directly,
as models only understand numerical values.
Using get_dummies(), we can convert them to numerical columns; this is
known as one-hot encoding.
pd.get_dummies(df, columns=['col1'])
We can also use the map function to assign our own value to each class,
e.g. {'IND': 1, "AUS": 2, "PAK": 3}
df.col.map({'IND': 1, "AUS": 2, "PAK": 3})
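Both approaches on a toy column:

```python
import pandas as pd

df = pd.DataFrame({'country': ['IND', 'AUS', 'PAK', 'IND']})

# one-hot encoding: one indicator column per class
dummies = pd.get_dummies(df, columns=['country'])
print(list(dummies.columns))  # ['country_AUS', 'country_IND', 'country_PAK']

# manual mapping of classes to numbers
codes = df.country.map({'IND': 1, 'AUS': 2, 'PAK': 3})
print(list(codes))  # [1, 2, 3, 1]
```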
Save a Dataframe
• We can save a Pandas DataFrame to many file formats, e.g. df.to_csv(), df.to_excel(), df.to_json().
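A round-trip sketch with to_csv (the file path is illustrative):

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# write to CSV, then read it back
path = os.path.join(tempfile.gettempdir(), 'out.csv')
df.to_csv(path, index=False)
round_trip = pd.read_csv(path)
print(round_trip.equals(df))  # True
```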
Outlier checking
• An outlier is a data point that significantly deviates from the rest of the data. It can be caused by
measurement errors, data entry errors, or simply natural variation in the data. Outliers can skew
statistical analyses and lead to incorrect conclusions.
We have several methods to identify them:
• IQR:
Defines outliers as values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
• Z-score:
Measures how many standard deviations a data point is from the mean. Values greater than 3 or less than
-3 are often considered outliers.
z =(x-mean)/std_dev
• Box plot:
Visually represents the distribution of the data and highlights potential outliers.
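The IQR and z-score rules above, sketched on a toy series:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95])  # 95 is the obvious outlier

# IQR rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
print(list(outliers))  # [95]

# z-score rule: |z| > 3
z = (s - s.mean()) / s.std()
flagged = s[z.abs() > 3]
print(len(flagged))  # 0 here: on a tiny sample the outlier inflates the std
```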
Methods for Handling Outliers
• Removing outliers:
• If outliers are due to errors or anomalies, they can be removed from
the dataset.
• Imputing outliers:
• If outliers are valid data points, they can be replaced with the mean,
median, or mode of the remaining data.
• Capping outliers:
• Outliers can be capped at a certain value, such as the 90th or 95th
percentile.
Read large files
To read large files efficiently we use chunking in pandas.
e.g
import pandas as pd
chunksize = 1000
for chunk in pd.read_csv('datasets.csv', chunksize=chunksize):
print(chunk)