Pandas DataFrame

The document provides an overview of the Python Pandas library, focusing on its data handling capabilities for manipulation and analysis. It details the two primary data structures in Pandas, Series and DataFrame, and demonstrates various operations such as creating, modifying, and accessing data within these structures. Additionally, it covers methods for adding and deleting rows and columns, as well as selecting and renaming data elements.

Uploaded by akritijha0908

Unit 1: Data Handling using Pandas

Introduction
• Python Pandas is a software library for the Python programming language, used for
data manipulation and analysis regardless of where the data comes from.
• Pandas is an open-source library that provides high-performance data
manipulation in Python.
• The name Pandas is derived from the term "panel data".
• It was developed by Wes McKinney in 2008.
• Using Pandas, we can accomplish five typical steps in processing data: load, prepare, manipulate,
model and analyse.

Benefits of Pandas
• It can easily represent data in a form naturally suited for data
analysis.
• Its concise syntax keeps code clear, so you can focus on the core part of the analysis.
Made by PGT Comp Sc. Ms. Puja Gupta
Data structures in Pandas
A data structure is a way of storing and managing data so that it can be accessed
efficiently and easily later, and so that the data can be collected, modified and
operated on in various ways.

Pandas provides two data structures for processing data:

(1) Series: It is a one-dimensional object similar to an array, a list or a column in a
table. It assigns a labelled index to each item in the series. By default, each
item receives an index label from 0 to N, where N is the length of the series
minus one.
(2) DataFrame: It is a tabular data structure made up of rows and columns.
A DataFrame is a standard way to store data that has two different
indexes, i.e., a row index and a column index.
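The following minimal sketch creates one object of each kind and prints their index labels. It assumes pandas is installed, and the sample values are illustrative only.

```python
import pandas as pd

# Series: one-dimensional; default labels run from 0 to N-1
s = pd.Series([10, 20, 30])
print(s.index.tolist())      # [0, 1, 2]

# DataFrame: two-dimensional, with a row index and a column index
df = pd.DataFrame({'roll': [1, 2, 3], 'marks': [24, 53, 66]})
print(df.index.tolist())     # [0, 1, 2]
print(df.columns.tolist())   # ['roll', 'marks']
print(s.ndim, df.ndim)       # 1 2
```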
Pandas DataFrame
Key Points of DataFrame
1. Two-Dimensional Data Structure - It has both rows and columns.
2. Labelled Indexes - The rows and columns have indices.
3. Heterogeneous Data - Each column holds data of a single type; however, the
DataFrame as a whole can have multiple columns with different data types.
4. Value is Mutable - Data can be updated at any point in time.
5. Size is Mutable - Rows and columns can be added or removed after the creation of
the DataFrame.
6. Rows run along axis=0 and columns along axis=1.
A pandas DataFrame can be created using the following function:

pandas.DataFrame(data, index, columns, dtype, copy)
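The sketch below passes the data, index, columns and dtype parameters explicitly. It then shows what axis=0 and axis=1 mean and adds a column after creation. The labels and values are made up for illustration.

```python
import pandas as pd

# data: a 2-D list; index: row labels; columns: column labels; dtype: type forced on all values
df = pd.DataFrame([[1, 2], [3, 4]], index=['r1', 'r2'], columns=['c1', 'c2'], dtype=float)
print(df.dtypes)                 # c1 and c2 are both float64

# axis=0 works down the rows (one result per column); axis=1 works across the columns (one result per row)
print(df.sum(axis=0).tolist())   # [4.0, 6.0]
print(df.sum(axis=1).tolist())   # [3.0, 7.0]

# Size is mutable: a new column can be added after creation
df['c3'] = [5.0, 6.0]
print(df.shape)                  # (2, 3)
```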
Pandas DataFrame- Creating Empty DataFrame

import pandas as pd
df=pd.DataFrame()
#pd.DataFrame(None) also creates an empty DataFrame
print(df)
# Output:
# Empty DataFrame
# Columns: []
# Index: []
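As a quick check, an empty DataFrame reports True for its empty attribute and has a shape of (0, 0). This minimal sketch only uses attributes covered later in this unit.

```python
import pandas as pd

df = pd.DataFrame()
print(df.empty)   # True: no rows and no columns
print(df.shape)   # (0, 0)
```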

Pandas DataFrame- Creating From Dictionary
When a DataFrame is created from a dictionary, the keys of the dictionary become the columns of
the DataFrame. Using the columns parameter, you can change the order of the columns or keep only some of them. If you
pass a column name that is not a key of the dictionary, that column is filled with NaN.
Note: Column names must match the dictionary keys.

import pandas as pd
Dic={'roll':[1,2,3],'name':('a','b','c'),'marks':(24,53,66)}
df=pd.DataFrame(Dic)
print(df) # keys become column names; the row index defaults to 0, 1, 2
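The columns parameter described above can be sketched as follows. The dictionary is the one from the example. The extra name 'grade' is a hypothetical label chosen because it is not a key.

```python
import pandas as pd

Dic = {'roll': [1, 2, 3], 'name': ('a', 'b', 'c'), 'marks': (24, 53, 66)}
df1 = pd.DataFrame(Dic, columns=['marks', 'roll'])   # keep two keys, in a new order
df2 = pd.DataFrame(Dic, columns=['name', 'grade'])   # 'grade' is not a key, so it is all NaN
print(df1)
print(df2)
```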
DataFrame- Creating From Dictionary with custom index

import pandas as pd
Dic={'roll':[1,2,3],'name':('a','b','c'),'marks':(24,53,66)}
df=pd.DataFrame(Dic,index=[11,12,13])
# or df=pd.DataFrame(index=[11,12,13],data=Dic)
print(df)
DataFrame- Creating From nested list (2D List)

import pandas as pd
l=[['eng',101],['chem',99],['comp',100]]
df=pd.DataFrame(l) # column labels default to 0, 1
print(df)
DataFrame- Creating From nested list (2D List) with labelled indexes

import pandas as pd
l=[['eng',101],['chem',99],['comp',100]]
df=pd.DataFrame(l,index=list(range(5,20,5)),columns=['sname','scode'])
print(df)

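A small sketch compares the default labels with the custom labels from the two nested-list examples, using the same list l.

```python
import pandas as pd

l = [['eng', 101], ['chem', 99], ['comp', 100]]
df_default = pd.DataFrame(l)   # labels default to 0, 1, 2 ... on both axes
df_labelled = pd.DataFrame(l, index=list(range(5, 20, 5)), columns=['sname', 'scode'])
print(df_default.columns.tolist())    # [0, 1]
print(df_labelled.index.tolist())     # [5, 10, 15]
print(df_labelled.loc[10, 'sname'])   # chem
```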
DataFrame- Add a column (always at the end)

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
df['UT1']=[12,13,14]
df['UT2']=df['UT1']+5
df['UT3']=df['UT1']+df['UT2']
print(df)
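Two more cases are worth knowing when adding a column. A single (scalar) value is repeated for every row. A list of the wrong length is rejected with a ValueError. This is a minimal sketch; the column name 'section' is illustrative.

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
df['section'] = 'A'          # a scalar is repeated for every row
try:
    df['UT1'] = [12, 13]     # only 2 values for 3 rows
except ValueError as err:
    print('ValueError:', err)
print(df)                    # 'UT1' was not added
```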
DataFrame- Add a column using insert() method
Syntax: df.insert(loc, column, value, allow_duplicates=False)
loc is an integer giving the position at which the new column is inserted. The existing column at that
position, and every column after it, shifts to the right.
import pandas as pd
dict1={"Name":["sanah",'chavi','suditi'],"PB1":[78,88,98],"PB2":[87,93,97]}
df=pd.DataFrame(dict1,index=['a','b','c'])
print(df) # before insertion
df.insert(2,'age',[1,2,3],allow_duplicates=True)
# 2 is the position at which the column 'age' is inserted
# allow_duplicates=True permits the insertion even if an 'age' column already exists
print(df) # after insertion
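With the default allow_duplicates=False, inserting a column name that already exists raises a ValueError. This minimal sketch reuses dict1 from the example above.

```python
import pandas as pd

dict1 = {"Name": ["sanah", 'chavi', 'suditi'], "PB1": [78, 88, 98], "PB2": [87, 93, 97]}
df = pd.DataFrame(dict1, index=['a', 'b', 'c'])
df.insert(2, 'age', [1, 2, 3])
try:
    df.insert(0, 'age', [4, 5, 6])   # 'age' already exists and allow_duplicates is False
except ValueError as err:
    print('ValueError:', err)
print(df.columns.tolist())           # ['Name', 'PB1', 'age', 'PB2']
```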
DataFrame- Add a Row
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
df.loc[len(df)]=['raji',100]
# df.loc[3]=['raji',100] gives the same result here, since len(df) is 3
# if the label already exists, that row is replaced instead of a new row being added
print(df)
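The sketch below shows both outcomes of assigning with loc. A new label adds a row, while an existing label overwrites that row. The replacement mark of 90 is an illustrative value.

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
df.loc[len(df)] = ['raji', 100]   # label 3 is new, so a row is added
df.loc[1] = ['aadi', 90]          # label 1 already exists, so that row is overwritten
print(df)
```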
DataFrame- Drop or Delete a Row
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
df=df.drop(1,axis=0)
# axis defaults to 0 (rows), so axis=0 may be omitted
# df.drop(1,axis=0,inplace=True) would modify df itself instead of returning a new DataFrame
print(df)
DataFrame- Drop or Delete a Column
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
df=df.drop('name',axis=1)
# df.drop('name',axis=1,inplace=True) would modify df itself
print(df)
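drop() also accepts the keyword arguments index= and columns=, which avoid having to remember the axis numbers. A minimal sketch:

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
no_row = df.drop(index=1)          # same as df.drop(1, axis=0)
no_col = df.drop(columns='name')   # same as df.drop('name', axis=1)
print(no_row)
print(no_col)
print(df.shape)                    # (3, 2): df itself is unchanged without inplace=True
```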
DataFrame- Accessing Elements from a DataFrame on the basis of some condition
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df.loc[df['name']=='puja'])
# same result: print(df[df['name']=='puja'])
DataFrame- Evaluating just a condition (the result is a Series of Boolean values)

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df['name']=='puja') # True for each row where name is 'puja', otherwise False

print(df.loc[df['marks']>=80]) # the Boolean Series then selects the matching rows
# same result: print(df[df['marks']>=80])
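Conditions can be combined with & (and) and | (or). This minimal sketch uses the same data. Each condition is wrapped in parentheses because & and | bind more tightly than comparison operators such as >= and ==.

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
both = df[(df['marks'] >= 80) & (df['name'] != 'srish')]    # both conditions true
either = df[(df['marks'] < 80) | (df['name'] == 'srish')]   # at least one condition true
print(both)
print(either)
```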
DataFrame- Accessing Elements from a DataFrame on the basis of some condition
Showing a few columns

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df.loc[df['name']=='puja']['name']) # single column
print(df.loc[df['name']=='puja'][['name','marks']]) # a few columns

DataFrame- Accessing Elements from a DataFrame on the basis of some condition,
showing one column / a few columns / all columns

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df.loc[df['marks']>=80,'marks']) # one column

print(df.loc[df['marks']>=80,['marks','name']]) # a few columns

print(df.loc[df['marks']>=80,:]) # all columns
DataFrame- Drop or Delete a Row with some condition
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
ind=df.loc[df['name']=='puja'].index
print(ind)
df=df.drop(ind,axis=0)
print(df)
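An equivalent approach keeps only the rows that do not match, instead of looking up their index and dropping them. This is a minimal sketch with the same data; it is an alternative, not the method shown above.

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
# Keep every row whose name is not 'puja' (same result as dropping the 'puja' rows)
df = df[df['name'] != 'puja']
print(df)
```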
DataFrame- Selecting a column

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df['name']) # returns a Series

print(df[['name']]) # returns a DataFrame

#print(df['UT1']) # raises KeyError because the column does not exist
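The return types and the KeyError described above can be checked directly. A minimal sketch:

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
single = df['name']     # one label -> Series
double = df[['name']]   # list of labels -> DataFrame
print(type(single).__name__)   # Series
print(type(double).__name__)   # DataFrame
try:
    df['UT1']
except KeyError:
    print("KeyError: no column named 'UT1'")
```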
DataFrame- Selecting a subset of DataFrame with loc and iloc methods
• loc[] is a label-based selection method, which means that we pass the labels (names) of the rows and
columns we want to select.
• When a range (slice) of labels is passed, loc[] includes the last label of the range, unlike iloc[].
• loc[] accepts a Boolean Series (for example df['marks']>=80) to select rows, unlike iloc[].

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
print(df.loc[:,:]) # all rows and columns

print(df.loc[0:1,'name':'marks'])#range of rows & columns

print(df.loc[[0,2],['name']]) #few rows & columns


DataFrame- Selecting a subset of DataFrame with loc and iloc methods
• iloc[] is an integer position-based selection method, which means that we pass integer positions
to select specific rows/columns.
• When a range (slice) of positions is passed, iloc[] excludes the last position of the range, unlike loc[].
• iloc[] accepts a plain Boolean list or array, but not a Boolean Series, unlike loc[].

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
print(df.iloc[:,:]) # all rows and columns

print(df.iloc[0:1,0:1]) # row position 0 and column position 0 (end positions are excluded)

print(df.iloc[[0,2],[0,1]]) # rows 0 and 2, columns 0 and 1
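The sketch below contrasts the two methods on the same data. The same-looking slice returns two rows with loc[] and one row with iloc[]. A Boolean mask works with iloc[] once it is converted to a plain array with to_numpy().

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
by_label = df.loc[0:1]       # labels 0 and 1: the end label is included
by_position = df.iloc[0:1]   # position 0 only: the end position is excluded
mask = df['marks'] >= 80
print(df.loc[mask])                # a Boolean Series works with loc[]
print(df.iloc[mask.to_numpy()])    # iloc[] needs a plain Boolean array
```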
DataFrame- Rename column names(few/all)/ Row index(few/all)
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

df=df.rename(columns={'name':'Child Name'},index={0:'stud1'})
print(df)

DataFrame- Rename column names(all)/ Row index(all)
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

df.columns=['Child Name','marks']
df.index=['stud1','stud2','stud3']
print(df)
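The two renaming styles behave differently. rename() changes only the labels it is given. Assigning to df.columns must supply exactly one label per column, and a wrong count raises a ValueError. A minimal sketch:

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
renamed = df.rename(columns={'name': 'Child Name'})   # only 'name' changes
print(renamed.columns.tolist())   # ['Child Name', 'marks']
try:
    df.columns = ['Child Name']   # 1 label for 2 columns
except ValueError as err:
    print('ValueError:', err)
print(df.columns.tolist())        # unchanged: ['name', 'marks']
```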
DataFrame- head() and tail() function
head()- By default, DataFrame.head() displays the top 5 rows. To display the top n rows, pass n as
a parameter, i.e. DataFrame.head(n).
tail()- By default, DataFrame.tail() displays the last 5 rows. To display the last n rows, pass n as a parameter,
i.e. DataFrame.tail(n).

import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
print(df.head())
DataFrame- head() and tail() function
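head() with a negative argument (-n): returns all rows except the last n, i.e. the first (total no. of rows - n) rows.
tail() with a negative argument (-n): returns all rows except the first n, i.e. the last (total no. of rows - n) rows.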

import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
print(df.head(-4)) # all rows except the last 4, i.e. the first 2 rows
DataFrame- head() and tail() function

import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
print(df.tail()) # last 5 rows

print(df.tail(-4)) # all rows except the first 4, i.e. the last 2 rows
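The results of head() and tail() with positive and negative arguments can be checked on the same six-row DataFrame. A minimal sketch:

```python
import pandas as pd

d = {'name': ['a', 'b', 'c', 'd', 'e', 'f'], 'marks': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(d)
print(df.head(2)['name'].tolist())    # ['a', 'b']
print(df.head(-4)['name'].tolist())   # ['a', 'b']: all but the last 4
print(df.tail(2)['name'].tolist())    # ['e', 'f']
print(df.tail(-4)['name'].tolist())   # ['e', 'f']: all but the first 4
```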
DataFrame- modifying/accessing a single cell
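import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
df.loc[0,'name']='puja gupta' # by row label and column label
df.at[1,'name']='aadya gupta' # at[]: fast label-based access to a single cell

df.iloc[0,1]=7 # by row position and column position
df.iat[1,1]=8 # iat[]: fast position-based access to a single cell
print(df)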
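With non-numeric row labels, the label-based and position-based accessors are easier to tell apart. In this minimal sketch the labels 'x', 'y' and 'z' are illustrative.

```python
import pandas as pd

d = {'name': ['a', 'b', 'c'], 'marks': [1, 2, 3]}
df = pd.DataFrame(d, index=['x', 'y', 'z'])
print(df.at['y', 'marks'])    # 2: row label 'y', column label 'marks'
print(df.iat[1, 1])           # 2: row position 1, column position 1
df.at['z', 'marks'] = 30      # at[] can also write a single cell
print(df.loc['z', 'marks'])   # 30
```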
DataFrame- Attributes
1. DataFrame.index- to display row labels
import pandas as pd
FArea={'Pre Board 1':[10,11,12],"Pre Board 2":[20,21,22]}
df1=pd.DataFrame(FArea,index=['a','b','c'])
print(df1)
print(df1.index)

2. DataFrame.columns- to display column labels

print(df1.columns)

3. DataFrame.dtypes- to display the data type of each column

print(df1.dtypes)

DataFrame- Attributes
4. DataFrame.values- to display a NumPy ndarray holding all the
values in the DataFrame, without axis labels
print(df1.values)

5. DataFrame.size- to display the total no. of elements in the DataFrame

print(df1.size) # 6

6. DataFrame.T- to transpose the DataFrame, i.e. rows become columns
and columns become rows.
print(df1.T)
DataFrame- Attributes
7. DataFrame.empty- returns True if the DataFrame is empty, otherwise False
import numpy as np
df2=pd.DataFrame([1,np.nan])
print(df2.empty) # False: the DataFrame has rows, even though one value is NaN

8. DataFrame.axes- returns a list representing both axes (axis 0: row index, axis 1: column labels)
print(df1.axes)

9. DataFrame.ndim- returns the number of dimensions, which is
2 for a DataFrame instance.
print(df1.ndim) # 2

10. DataFrame.shape- returns a tuple representing the dimensionality of the DataFrame (no. of
rows, no. of columns)
print(df1.shape) # (3, 2)
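All the attributes listed above can be checked together on the df1 example. A minimal sketch:

```python
import pandas as pd

FArea = {'Pre Board 1': [10, 11, 12], "Pre Board 2": [20, 21, 22]}
df1 = pd.DataFrame(FArea, index=['a', 'b', 'c'])
print(df1.size)             # 6 = 3 rows x 2 columns
print(df1.shape)            # (3, 2)
print(df1.ndim)             # 2
print(df1.T.shape)          # (2, 3): rows and columns swapped
print(df1.values.tolist())  # [[10, 20], [11, 21], [12, 22]]
print(len(df1.axes))        # 2: the row index and the column labels
print(pd.DataFrame().empty) # True
```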
DataFrame- Deleting a Single Row/more than 1 row
import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d)

df=df.drop(0,axis=0) # deletes a single row
df=df.drop([1,2],axis=0) # deletes more than one row
print(df)
DataFrame- Deleting a Single Column/more than 1 column
import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d)

df=df.drop('name',axis=1) # deletes a single column
#df=df.drop(['name','marks'],axis=1) deletes more than one column
print(df)
DataFrame- Iterating row-wise
• iterrows(): iterates over the rows of a DataFrame. For each row it returns a pair
(row index, Series), where the Series holds the data in that row. In other words, it gives
horizontal subsets of the DataFrame.

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
for a,b in df.iterrows():
    print(a) # row index
    print(b) # row data as a Series

DataFrame- Iterating column-wise
items(): iterates over the data column-wise. For each column it returns a pair
(column name, Series), so it gives vertical subsets of the DataFrame. Older pandas versions
also provided this as iteritems(), which was removed in pandas 2.0.

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
for a,b in df.items(): # df.iteritems() in pandas versions before 2.0
    print(a) # column name
    print(b) # column values as a Series
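As a small illustration of both loops, the sketch below collects the names of students with at least 80 marks row by row. It then records each column's length column by column. The threshold of 80 is an illustrative choice.

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish'], 'marks': [77, 88, 99]}
df = pd.DataFrame(d)
# iterrows(): one (row label, row Series) pair per row
passed = [row['name'] for idx, row in df.iterrows() if row['marks'] >= 80]
# items(): one (column label, column Series) pair per column
lengths = {col: len(values) for col, values in df.items()}
print(passed)    # ['aadi', 'srish']
print(lengths)   # {'name': 3, 'marks': 3}
```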

DataFrame- 2 types of indexing: Label based and Boolean based
1. Label based indexing: DataFrame.loc[] is an important method used for label
based indexing with DataFrames.
2. Boolean based indexing: a Boolean is a binary value that can represent either of two
states - True (indicated by 1) or False (indicated by 0). In Boolean indexing, we
select subsets of data based on the actual values in the DataFrame rather than on their
row/column labels. Thus, we can apply conditions to columns to filter data values.

import pandas as pd
df=pd.DataFrame([1,2,3,4,5],index=[True,False,True,False,True])
#It divides dataframe into 2 groups-True rows and False rows
print(df.loc[True])
print(df.loc[False])


DataFrame- slicing a DataFrame
import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[11,12,23,24,25,36]}
df=pd.DataFrame(d,index=[1,2,3,4,5,6])
# df[start:stop:step] selects rows by position, not by label

print(df[:]) # first slicing: all rows

print(df[1:3:1]) # second slicing: rows at positions 1 and 2
print(df[-1:-4:-1]) # third slicing: the last three rows, in reverse order
print(df[1::2]) # fourth slicing: every second row, starting at position 1
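The row labels selected by each slice can be printed to confirm that df[...] slices by position. For contrast, the sketch also shows loc[], which slices by label and includes the end label.

```python
import pandas as pd

d = {'name': ['a', 'b', 'c', 'd', 'e', 'f'], 'marks': [11, 12, 23, 24, 25, 36]}
df = pd.DataFrame(d, index=[1, 2, 3, 4, 5, 6])
print(df[1:3:1].index.tolist())     # [2, 3]: positions 1 and 2
print(df[-1:-4:-1].index.tolist())  # [6, 5, 4]
print(df[1::2].index.tolist())      # [2, 4, 6]
print(df.loc[1:3].index.tolist())   # [1, 2, 3]: labels 1 to 3, end included
```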

DataFrame- Sorting values in ascending order
The values can be sorted on the basis of a specific column or columns, in ascending or descending order.
Syntax: df.sort_values(by, ascending=True, inplace=False), where by is a string or a list of strings

import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d)
df1=df.sort_values(by='marks',axis=0,inplace=False)
print(df1)

DataFrame- Sorting values in descending order
The syntax is the same as for ascending order; passing ascending=False sorts the values in descending order.

import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d)
df1=df.sort_values(by='marks',axis=0,inplace=False,ascending=False)
print(df1)
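Since by can be a list of strings, ascending can also be a list with one entry per column. In this minimal sketch one mark has been changed to 77 to create a tie (illustrative data). Tied rows are then ordered by name.

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish', 'raji'], 'marks': [77, 88, 77, 86]}
df = pd.DataFrame(d)
# Sort by marks (descending); rows with equal marks are then ordered by name (ascending)
df1 = df.sort_values(by=['marks', 'name'], ascending=[False, True])
print(df1)
```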

DataFrame- Sorting index in descending order by axis=0
import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d,index=['s3','s1','s2','s4'])
df1=df.sort_index(axis=0,ascending=False)
print(df1)


DataFrame- Sorting index in descending order by axis=1
import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d,index=['s3','s1','s2','s4'])
df1=df.sort_index(axis=1,ascending=False)
print(df1)

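The resulting label order from sort_index() can be checked as follows. The sketch also includes the default ascending sort for comparison.

```python
import pandas as pd

d = {'name': ['puja', 'aadi', 'srish', 'raji'], 'marks': [77, 88, 99, 86]}
df = pd.DataFrame(d, index=['s3', 's1', 's2', 's4'])
rows_asc = df.sort_index()                              # axis=0, ascending by default
rows_desc = df.sort_index(axis=0, ascending=False)
cols_desc = df.sort_index(axis=1, ascending=False)
print(rows_asc.index.tolist())    # ['s1', 's2', 's3', 's4']
print(rows_desc.index.tolist())   # ['s4', 's3', 's2', 's1']
print(cols_desc.columns.tolist()) # ['name', 'marks']
```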

DataFrame- Arithmetic Operations
When arithmetic is performed on two DataFrames, pandas matches rows and columns by their labels. Where the labels match, the operation takes place; otherwise the result is NaN (Not a Number). This is called data alignment in pandas objects.

DataFrame- Arithmetic Operations
import pandas as pd
d1={ 'marks':[77,88,99,86]}
df1=pd.DataFrame(d1,index=[1,3,4,5])
print(df1)
d2={'marks':[7,8,9,6]}
df2=pd.DataFrame(d2,index=[3,2,4,5])
print(df2)
print(df1+df2) # same as df1.add(df2)
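The method form add() accepts a fill_value parameter. It treats a value missing on one side as that number, here 0, instead of producing NaN. The sketch reuses the two DataFrames above.

```python
import pandas as pd

df1 = pd.DataFrame({'marks': [77, 88, 99, 86]}, index=[1, 3, 4, 5])
df2 = pd.DataFrame({'marks': [7, 8, 9, 6]}, index=[3, 2, 4, 5])
plain = df1 + df2                     # labels 1 and 2 exist in only one frame -> NaN
filled = df1.add(df2, fill_value=0)   # the missing side is treated as 0
print(plain)
print(filled)
```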

Statistical Functions in DataFrame- min() with axis=0
Syntax: df.min(axis=0, skipna=True, numeric_only=False)
axis (0/1)- 0 (the default) finds the minimum of each column; 1 finds the minimum of each row.
skipna (True/False)- excludes NA/null/NaN values when computing the result.
numeric_only (True/False)- if True, includes only float, int and boolean columns. (Older pandas versions also accepted None, which tried all columns first.)
Note: if a column has a NaN value, its data type becomes float, because an integer column cannot store NaN.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.min(0)) # axis=0 (default): minimum of each column

Statistical Functions in DataFrame- min() with axis=1
Syntax and parameters are the same as for min() with axis=0. With axis=1, the minimum of each row is found;
numeric_only=True leaves out the text column, since pandas 2.x raises a TypeError when it has to compare numbers with strings.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.min(1,numeric_only=True)) # axis=1: minimum of each row

Statistical Functions in DataFrame- max() with axis=0
Syntax: df.max(axis=0, skipna=True, numeric_only=False)
The parameters have the same meaning as for min().

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.max(0)) # axis=0 (default): maximum of each column

Statistical Functions in DataFrame- max() with axis=1
Syntax and parameters are the same as for max() with axis=0. With axis=1, the maximum of each row is found;
numeric_only=True leaves out the text column, which cannot be compared with numbers.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.max(1,numeric_only=True)) # axis=1: maximum of each row
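The effect of numeric_only and skipna can be checked on the same data. This is a minimal sketch assuming pandas 2.x.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Benjamin": [99, 90, 95, 94], "Krishna": [94, 89, np.nan, 88],
                   'subject': ["Acct", 'Eco', 'Eng', 'IP']})
col_min = df.min(numeric_only=True)             # per column; NaN is skipped
row_max = df.max(axis=1, numeric_only=True)     # per row; 'subject' is left out
krishna_min = df['Krishna'].min(skipna=False)   # NaN: missing values are not skipped
print(col_min)
print(row_max)
print(krishna_min)
```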

Statistical Functions in DataFrame- count() with axis=0
It counts the non-NaN entries.
If you pass no argument, or pass 0 (the default), it returns the count of non-NaN values for each column. If you pass
1, it returns the count of non-NaN values for each row. Syntax: df.count(axis=0, numeric_only=False)

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.count(0)) # axis=0 (default): one count per column

Statistical Functions in DataFrame- count() with axis=1
It counts the non-NaN entries. With axis=1, it returns the count of non-NaN values for each row,
e.g. df.count(1). Syntax: df.count(axis=0, numeric_only=False)
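Here is a sketch of count() with axis=1 on the same data used in the other examples. Only row 2 has a missing value, so only that row counts 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Benjamin": [99, 90, 95, 94], "Krishna": [94, 89, np.nan, 88],
                   'subject': ["Acct", 'Eco', 'Eng', 'IP']})
print(df.count(1))   # non-NaN entries per row: 3, 3, 2, 3
print(df.count(0))   # non-NaN entries per column: 4, 3, 4
```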

Statistical Functions in DataFrame- sum() with axis=0
import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.sum()) # same as df.sum(0): one total per column (the strings in 'subject' are joined together)

Statistical Functions in DataFrame- sum() with axis=1
import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.sum(1,numeric_only=True)) # same as df.sum(axis=1,numeric_only=True): one total per row

Statistical Functions in DataFrame- sum() of some columns
import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df['Benjamin'].sum()) # total of a single column: 378
print(df[['Benjamin','Krishna']].sum()) # total of each selected column

Statistical Functions in DataFrame- Other functions
quantile()- quantile, std()- standard deviation, var()- variance, mad()- mean absolute
deviation (mad() was removed in pandas 2.0)
1. describe()- gives the following information about the DataFrame:
count, mean, std, min, 25%, 50%, 75%, max
2. info()- gives the following information about the DataFrame:
type
Index values
Number of rows
Data columns and the values in them
Data type of each column
Memory usage
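A minimal sketch of describe() and the quantile(), var() and std() functions on the numeric columns of the example data. var() and std() use the sample formula (dividing by n - 1) by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Benjamin": [99, 90, 95, 94], "Krishna": [94, 89, np.nan, 88]})
summary = df.describe()                # count, mean, std, min, 25%, 50%, 75%, max
print(summary)
print(df['Benjamin'].quantile(0.5))    # 94.5, the median
print(df['Benjamin'].var())            # sample variance: 41 / 3
print(df['Benjamin'].std())            # square root of the variance
df.info()                              # prints index, columns, dtypes and memory usage
```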

Statistical Functions in DataFrame- isnull()
detects missing values: returns True where a value is missing (NaN) and False otherwise.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.isnull())

Statistical Functions in DataFrame- dropna() with axis=0
drops every row that has a NaN in it

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.dropna(axis=0)) # row 2 is dropped

Statistical Functions in DataFrame- dropna() with axis=1
drops every column that has a NaN in it

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.dropna(axis=1)) # column 'Krishna' is dropped

Statistical Functions in DataFrame- fillna()
fills the NaN values with the given value

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
                 'subject':["Acct",'Eco','Eng','IP']})
print(df.fillna(33))
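isnull() and fillna() are often combined. The sketch below first counts the missing values in each column. It then fills only the Krishna column with that column's mean. Using the mean is an illustrative choice, not the only option.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Benjamin": [99, 90, 95, 94], "Krishna": [94, 89, np.nan, 88],
                   'subject': ["Acct", 'Eco', 'Eng', 'IP']})
missing = df.isnull().sum()   # True counts as 1, so this counts NaNs per column
print(missing)                # Krishna has 1 missing value
filled = df.fillna({'Krishna': df['Krishna'].mean()})   # fill only 'Krishna', with its mean
print(filled)
```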

Statistical Functions in DataFrame- idxmax() with axis=1
finds the label of the maximum value along the given axis; with axis=1 it returns, for each row, the column label holding the maximum

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88]})
print(df.idxmax(axis=1))

Statistical Functions in DataFrame- idxmax() with axis=0
finds the label of the maximum value along the given axis; with axis=0 it returns, for each column, the row label holding the maximum

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88]})
print(df.idxmax(axis=0))

Statistical Functions in DataFrame- idxmin() with axis=1
finds the label of the minimum value along the given axis; with axis=1 it returns, for each row, the column label holding the minimum

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88]})
print(df.idxmin(axis=1))

Statistical Functions in DataFrame- idxmin() with axis=0
finds the label of the minimum value along the given axis; with axis=0 it returns, for each column, the row label holding the minimum

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88]})
print(df.idxmin(axis=0))
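The outputs of the four idxmax()/idxmin() examples can be checked together. NaN is skipped by default, so row 2 only compares Benjamin's value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Benjamin": [99, 90, 95, 94], "Krishna": [94, 89, np.nan, 88]})
print(df.idxmax().tolist())        # [0, 0]: row labels of each column's maximum
print(df.idxmin().tolist())        # [1, 3]: row labels of each column's minimum
print(df.idxmax(axis=1).tolist())  # ['Benjamin', 'Benjamin', 'Benjamin', 'Benjamin']
print(df.idxmin(axis=1).tolist())  # ['Krishna', 'Krishna', 'Benjamin', 'Krishna']
```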

Statistical Functions in DataFrame- append()
appends or adds the second DataFrame to the first DataFrame.
The rows of the second DataFrame are added at the end of the first DataFrame. Columns not present in the first DataFrame are
added as new columns, filled with NaN where values are missing.
Note: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() does the same job.
import pandas as pd
df1=pd.DataFrame({"a":[1,2],"b":[5,6]},index=['puja','aashi'])
df2=pd.DataFrame({"c":[9,10,11],"b":[13,14,15]},index=['puja','aashi','dhruv'])
df3=pd.concat([df2,df1]) # pandas < 2.0: df3=df2.append(df1); neither df1 nor df2 changes
print(df3)

Statistical Functions in DataFrame- append()
appends or adds the second DataFrame to the first DataFrame.
ignore_index=True discards the original row labels and renumbers the rows from 0; by default it is False.
import pandas as pd
df1=pd.DataFrame({"a":[1,2],"b":[5,6]},index=['puja','aashi'])
df2=pd.DataFrame({"c":[9,10,11],"b":[13,14,15]},index=['puja','aashi','dhruv'])
df3=pd.concat([df2,df1],ignore_index=True) # pandas < 2.0: df2.append(df1,ignore_index=True)
print(df3)

Statistical Functions in DataFrame- append()
appends or adds the second DataFrame to the first DataFrame.
verify_integrity=True raises a ValueError if the result has duplicate row labels; with the default False, duplicate labels are allowed and no error is shown.
import pandas as pd
df1=pd.DataFrame({"a":[1,2],"b":[5,6]},index=['puja','aashi'])
df2=pd.DataFrame({"c":[9,10,11],"b":[13,14,15]},index=['puja','aashi','dhruv'])
df3=pd.concat([df2,df1],verify_integrity=False) # pandas < 2.0: df2.append(df1,verify_integrity=False)
print(df3)
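The three pd.concat() variants can be checked side by side. This is a sketch assuming pandas 2.x, where concat() replaces the removed append(). With verify_integrity=True, the repeated labels 'puja' and 'aashi' trigger the ValueError.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [5, 6]}, index=['puja', 'aashi'])
df2 = pd.DataFrame({"c": [9, 10, 11], "b": [13, 14, 15]}, index=['puja', 'aashi', 'dhruv'])

df3 = pd.concat([df2, df1])                              # rows of df1 follow rows of df2
renumbered = pd.concat([df2, df1], ignore_index=True)    # labels replaced by 0..4
print(df3.index.tolist())          # ['puja', 'aashi', 'dhruv', 'puja', 'aashi']
print(renumbered.index.tolist())   # [0, 1, 2, 3, 4]

try:
    pd.concat([df2, df1], verify_integrity=True)         # duplicate labels are rejected
except ValueError as err:
    print('ValueError:', err)
```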

Some other ways of Creating DataFrame
•To create from a NumPy array
import numpy as np
import pandas as pd
a1=np.array([10,20,30])
df1=pd.DataFrame(a1) # a single column labelled 0

•To create a DataFrame from more than one ndarray (NumPy array)
a1=np.array([10,20,30])
a2=np.array(["A","b","c"])
df1=pd.DataFrame([a1,a2],index=["ayush","samudra"],columns=['m1','m2','m3']) # each array becomes one row

•To create from a list of Dictionary
Here, the dictionary keys are taken as column labels, and the values corresponding to each
key are taken as rows. There will be as many rows as the number of dictionaries present in
the list.
l=[{'grade':'A',"per":82},{'grade':'B',"per":77.9}, {'grade':'C',"per":66,"age":12}]
d=pd.DataFrame(l) # 'age' is NaN in the rows whose dictionary has no 'age' key

•To create from Series

ser1=pd.Series([1,2,3,4,5])
ser2=pd.Series([1000,2000,3000,4000,5000])
ser3=pd.Series([-1000,-2000,-3000,-4000,-5000])
df1=pd.DataFrame([ser1,ser2,ser3]) # each Series becomes one row
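The shapes produced by the list-of-dictionaries and list-of-Series constructors can be checked with the same data. A minimal sketch:

```python
import pandas as pd

l = [{'grade': 'A', "per": 82}, {'grade': 'B', "per": 77.9}, {'grade': 'C', "per": 66, "age": 12}]
d = pd.DataFrame(l)
print(d.shape)                 # (3, 3): one row per dictionary, one column per distinct key
print(d['age'].isna().sum())   # 2: only the last dictionary has an 'age' key

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([1000, 2000, 3000, 4000, 5000])
ser3 = pd.Series([-1000, -2000, -3000, -4000, -5000])
df1 = pd.DataFrame([ser1, ser2, ser3])
print(df1.shape)               # (3, 5): each Series becomes a row
```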

•To create from a dictionary of Series
s1=pd.Series([78,88],index=["aashi",'dhruv'])
s2=pd.Series([87,93],index=["adriyan",'aashi'])
d1={"pre board 1":s1, "pre board 2":s2}
d=pd.DataFrame(d1) # rows: union of the Series indexes; missing values become NaN

•Creating a DataFrame from another DataFrame

dict1={"pre board 1":pd.Series([78,88]),"pre board 2":pd.Series([87,93])}
d1=pd.DataFrame(dict1)
d2=pd.DataFrame(d1) # d2 shares its data with d1 (no copy is made), so without
# Copy-on-Write a change in one can be reflected in the other;
# d2=pd.DataFrame(d1,copy=True) gives an independent copy
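This sketch checks the union of row labels for a dictionary of Series. It also shows that an explicit copy (copy=True) stays independent of the original; the value 0 is illustrative. Whether an uncopied DataFrame shares changes depends on the pandas version and its Copy-on-Write setting, so the sketch only relies on the explicit copy.

```python
import pandas as pd

s1 = pd.Series([78, 88], index=["aashi", 'dhruv'])
s2 = pd.Series([87, 93], index=["adriyan", 'aashi'])
d = pd.DataFrame({"pre board 1": s1, "pre board 2": s2})
print(sorted(d.index))   # ['aashi', 'adriyan', 'dhruv']: union of both Series indexes
print(d)                 # labels missing from one Series get NaN in that column

# An explicit copy is independent of the original
d_copy = pd.DataFrame(d, copy=True)
d_copy.loc['aashi', 'pre board 1'] = 0
print(d.loc['aashi', 'pre board 1'])   # 78.0: the original is unchanged
```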

•Creating a DataFrame from a nested dictionary: the outer dictionary keys become the columns
and the inner dictionary keys become the rows.
d={'stud1':{'name':'puja','age':1},'stud2':{'name':'aadi','age':2}}
df=pd.DataFrame(d)
print(df)
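The orientation can be checked directly. pd.DataFrame.from_dict() with orient='index' does the opposite and makes the outer keys the rows. A minimal sketch with the same nested dictionary:

```python
import pandas as pd

d = {'stud1': {'name': 'puja', 'age': 1}, 'stud2': {'name': 'aadi', 'age': 2}}
df = pd.DataFrame(d)                                  # outer keys -> columns
df_rows = pd.DataFrame.from_dict(d, orient='index')   # outer keys -> rows instead
print(df)
print(df_rows)
```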

2 pages
Abb 61850 Sas General
No ratings yet
Abb 61850 Sas General
33 pages
Lecture-1 ARM Cortex M4-Based System PDF
100% (2)
Lecture-1 ARM Cortex M4-Based System PDF
36 pages
CH 6
No ratings yet
CH 6
11 pages
Taha Science Academy: Subjective Type
No ratings yet
Taha Science Academy: Subjective Type
1 page
Abdul Hadi Walizai - Scrum Master
No ratings yet
Abdul Hadi Walizai - Scrum Master
6 pages
SAP CRM Overview
No ratings yet
SAP CRM Overview
75 pages
Database Presentation Slides
No ratings yet
Database Presentation Slides
52 pages
Testing and Debugging: Logic Probe - Very Simple But Enough For Quick Test Oscilloscope
No ratings yet
Testing and Debugging: Logic Probe - Very Simple But Enough For Quick Test Oscilloscope
21 pages
The List, Stack, and Queue Adts Abstract Data Type (Adt)
No ratings yet
The List, Stack, and Queue Adts Abstract Data Type (Adt)
17 pages
Visvesvaraya Technological University
No ratings yet
Visvesvaraya Technological University
33 pages
Toshiba L515-SP4031
No ratings yet
Toshiba L515-SP4031
3 pages
Chef - Configuration Management Tool
No ratings yet
Chef - Configuration Management Tool
5 pages
CRTE Notes
No ratings yet
CRTE Notes
12 pages
Key Injection by Master POS Operation Manual - v1.00 - 20161228
100% (2)
Key Injection by Master POS Operation Manual - v1.00 - 20161228
9 pages
SCORM 2004 4ED v1 1 RTE 20090814
No ratings yet
SCORM 2004 4ED v1 1 RTE 20090814
202 pages
2G4 S4HANA1909 Set-Up EN XX
No ratings yet
2G4 S4HANA1909 Set-Up EN XX
12 pages
List of Handsets Supportin Telenor Auto Location
No ratings yet
List of Handsets Supportin Telenor Auto Location
8 pages
Udyog Aadhar 1
100% (1)
Udyog Aadhar 1
1 page
Add2Exchange Guide
No ratings yet
Add2Exchange Guide
175 pages
LAB8 DSA W23 Open Ended
No ratings yet
LAB8 DSA W23 Open Ended
5 pages
Tarp Da 3
No ratings yet
Tarp Da 3
7 pages
Samsung Galaxy Note 10.1 GT-N8000 UM User Manual Guide
No ratings yet
Samsung Galaxy Note 10.1 GT-N8000 UM User Manual Guide
163 pages
The Extraction of A Digital Surface Model (DSM) : Jun 29th, 2016 Page 1 of 18
No ratings yet
The Extraction of A Digital Surface Model (DSM) : Jun 29th, 2016 Page 1 of 18
18 pages
Pizza Restaurant PowerPoint Templates
No ratings yet
Pizza Restaurant PowerPoint Templates
48 pages
L&T Infotech Sample Technical Placement Paper Level1
No ratings yet
L&T Infotech Sample Technical Placement Paper Level1
6 pages