Pandas DataFrame

The document provides an overview of the Python Pandas library, focusing on its data handling capabilities for manipulation and analysis. It details the two primary data structures in Pandas, Series and DataFrame, and demonstrates various operations such as creating, modifying, and accessing data within these structures. Additionally, it covers methods for adding and deleting rows and columns, as well as selecting and renaming data elements.

Uploaded by

akritijha0908
Copyright
© © All Rights Reserved

Unit 1: Data Handling using Pandas

Introduction
• Pandas is a software library for the Python programming language, used for data
manipulation and analysis regardless of the origin of the data.
• It is an open-source library that provides high-performance data manipulation in Python.
• The name Pandas is derived from the term "panel data".
• Wes McKinney began developing it in 2008.
• Using Pandas, we can accomplish five typical steps: load, prepare, manipulate,
model and analyse.

Benefits of Pandas
• It can easily represent data in a form naturally suited to data analysis.
• It allows clear, concise code, so the programmer can focus on the core logic.
Made by PGT Comp Sc. Ms. Puja Gupta
Data structures in Pandas
A data structure is a way of storing and organising data so that it can be accessed
efficiently later, and so that the data can be collected, modified and operated on.

Pandas provides two data structures for processing the data:


(1) Series: It is a one-dimensional object similar to an array, a list or a column in a
table. It assigns a labelled index to each item in the Series. By default, the items
receive the index labels 0 to N, where N is the length of the Series minus one.
(2) DataFrame: It is a tabular data structure made up of rows and columns. It is the
standard way to store data that has two different indexes, i.e. a row index and a
column index.
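The difference between the two structures can be seen in a small sketch. The values and column names below are only illustrative.

```python
import pandas as pd

# A Series: one-dimensional, default integer labels 0..N-1
s = pd.Series([10, 20, 30])
print(list(s.index))      # default row labels

# A DataFrame: two-dimensional, with a row index and a column index
df = pd.DataFrame({"roll": [1, 2, 3], "marks": [24, 53, 66]})
print(list(df.index))     # row labels
print(list(df.columns))   # column labels
```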
Pandas DataFrame
Key Points of DataFrame
1. Two Dimensional Data Structure - That is because it has both rows and columns
2. Labelled Indexes - The rows and columns have indices.
3. Heterogeneous Data - Each column holds values of one data type, but different columns of
the same DataFrame can have different data types.
4. Value is Mutable - Data can be updated at any point in time.
5. Size is Mutable - Rows and columns can be added or removed after the creation of
the DataFrame.
6. Axes - axis=0 refers to rows and axis=1 refers to columns.
A pandas DataFrame can be created
using the following constructor

pandas.DataFrame(data, index, columns, dtype, copy)


Pandas DataFrame- Creating Empty DataFrame

import pandas as pd
df=pd.DataFrame()
#pd.DataFrame(None)
print(df)

Pandas DataFrame- Creating From Dictionary
When a DataFrame is created from a dictionary, the keys of the dictionary become the columns of the
DataFrame. With the columns argument you can change the order of the columns or keep only some of
them. If columns contains a name that is not a key of the dictionary, that column is filled with NaN.
Note: Column names passed in columns must match the dictionary keys
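A short sketch of the columns argument follows. The name "grade" is a made-up column used only to show the NaN behaviour.

```python
import pandas as pd

dic = {"roll": [1, 2, 3], "name": ["a", "b", "c"], "marks": [24, 53, 66]}

# Reorder the columns and keep only some of the dictionary keys
df1 = pd.DataFrame(dic, columns=["marks", "roll"])

# "grade" is not a key of dic, so that column is filled with NaN
df2 = pd.DataFrame(dic, columns=["roll", "grade"])

print(df1)
print(df2)
```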

import pandas as pd
Dic={'roll':[1,2,3],'name':('a','b','c'),'marks':(24,53,66)}
df=pd.DataFrame(Dic)
print(df)
DataFrame- Creating From Dictionary with custom index

import pandas as pd
Dic={'roll':[1,2,3],'name':('a','b','c'),'marks':(24,53,66)}
df=pd.DataFrame(Dic,index=[11,12,13])
# or df=pd.DataFrame(index=[11,12,13],data=Dic)
print(df)
DataFrame- Creating From nested list (2D List)

import pandas as pd
l=[['eng',101],['chem',99],['comp',100]]
df=pd.DataFrame(l )
print(df)

DataFrame- Creating From nested list (2D List) with labelled indexes

import pandas as pd
l=[['eng',101],['chem',99],['comp',100]]
df=pd.DataFrame(l,index=list(range(5,20,5)),columns=['sname','scode'])
print(df)

DataFrame- Add a column (always at the end)

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
df['UT1']=[12,13,14]
df['UT2']=df['UT1']+5
df['UT3']=df['UT1']+df['UT2']
print(df)
DataFrame- Add a column using insert() method
Syntax: df.insert(loc, column, value, allow_duplicates=False)
loc is the integer position at which the new column is inserted. The column currently at that position,
and every column after it, shifts to the right.
import pandas as pd
dict1={"Name":["sanah",'chavi','suditi'],"PB1":[78,88,98],"PB2":[87,93,97]}
df=pd.DataFrame(dict1,index=['a','b','c'])
print(df)
df.insert(2,'age',[1,2,3],allow_duplicates=True)
# 2 is the position at which the column 'age' will be inserted
# allow_duplicates=True permits the insert even if a column named 'age' already exists
print(df)
DataFrame- Add a Row
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
df.loc[len(df)]=['raji',100]
# df.loc[3]=['raji',100] does the same here, because len(df) is 3
# if the given label already exists, that row is replaced instead
print(df)
DataFrame- Drop or Delete a Row
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
df=df.drop(1,axis=0)
# axis defaults to 0 (rows), so df.drop(1) gives the same result
# df.drop(1,axis=0,inplace=True) changes df itself instead of returning a new DataFrame
print(df)
DataFrame- Drop or Delete a Column
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
df=df.drop('name',axis=1)
#df.drop('name',axis=1,inplace=True)
print(df)
DataFrame- Accessing Elements from a DataFrame on the basis of some condition
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df.loc[df['name']=='puja'])
#print(df[df['name']=='puja'])

DataFrame- Evaluating JUST the condition on a DataFrame column:
the answer is a Series of Boolean values

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df['name']=='puja')
DataFrame- Accessing Elements from a DataFrame on the basis of some condition
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df.loc[df['marks']>=80])
#print(df[df['marks']>=80])

DataFrame- Accessing Elements from a DataFrame on the basis of some condition
Showing few columns

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df.loc[df['name']=='puja']['name'])
#single column
print(df.loc[df['name']=='puja'][['name','marks']])



DataFrame- Accessing Elements from a DataFrame on the basis of some condition,
showing one columns/few columns/ all columns

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df.loc[df['marks']>=80,'marks'])

print(df.loc[df['marks']>=80,['marks','name']])

print(df.loc[df['marks']>=80,:])
DataFrame- Drop or Delete a Row with some condition
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
ind=df.loc[df['name']=='puja'].index
print(ind)
df=df.drop(ind,axis=0)
print(df)
DataFrame- Selecting a column

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

print(df['name'])# return data type is series

print(df[['name']]) #return data type is DataFrame

#print(df['UT1'])#if column does not exists will give KeyError


DataFrame- Selecting a subset of DataFrame with loc and iloc methods
• loc[] is a label-based selection method: we pass the labels (names) of the rows or
columns we want to select, inside square brackets.
• A slice passed to loc[] includes its last element, unlike iloc[].
• loc[] also accepts a Boolean Series (a condition) as the row selector; iloc[] accepts only a Boolean list or array.

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
print(df.loc[:,:]) # all rows and columns

print(df.loc[0:1,'name':'marks'])#range of rows & columns

print(df.loc[[0,2],['name']]) #few rows & columns



DataFrame- Selecting a subset of DataFrame with loc and iloc methods
• iloc[] is an integer position-based selection method: we pass integer positions, inside square brackets, to
select specific rows/columns.
• A slice passed to iloc[] does not include its last element, unlike loc[].
• iloc[] does not accept a Boolean Series, unlike loc[]; it accepts a Boolean list or array of matching length.

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
print(df.iloc[:,:])

print(df.iloc[0:1,0:1])

print(df.iloc[[0,2],[0,1]])
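
The two selectors can be compared side by side. This is a minimal sketch on the same data; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["puja", "aadi", "srish"], "marks": [77, 88, 99]})

# Label slice: the end label 1 is included, so two rows come back
by_label = df.loc[0:1, "name":"marks"]

# Position slice: the end position 1 is excluded, so one row comes back
by_position = df.iloc[0:1, 0:2]

print(by_label.shape)
print(by_position.shape)

# A condition gives a Boolean Series: loc takes it directly,
# iloc needs it converted to a plain Boolean array first
mask = df["marks"] >= 80
print(df.loc[mask])
print(df.iloc[mask.to_numpy()])
```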
DataFrame- Rename column names(few/all)/ Row index(few/all)
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

df=df.rename(columns={'name':'Child Name'},index={0:'stud1'})
print(df)

DataFrame- Rename column names(all)/ Row index(all)
import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)

df.columns=['Child Name','marks']
df.index=['stud1','stud2','stud3']
print(df)
DataFrame- head() and tail() function
head()- By default DataFrame.head() returns the first 5 rows. To get the first n rows, pass n as
the argument, i.e. DataFrame.head(n)
tail()- By default DataFrame.tail() returns the last 5 rows. To get the last n rows, pass n as the argument,
i.e. DataFrame.tail(n)

import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
print(df.head())
DataFrame- head() and tail() function
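head() with a negative argument (-n): returns all rows except the last n, i.e. the first (total number of rows - n) rows
tail() with a negative argument (-n): returns all rows except the first n, i.e. the last (total number of rows - n) rows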

import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
print(df.head(-4))
DataFrame- head() and tail() function
tail()- By default DataFrame.tail() returns the last 5 rows. To get the last n rows, pass n as the argument,
i.e. DataFrame.tail(n)

import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
print(df.tail())
DataFrame- head() and tail() function
tail() with a negative argument (-n): returns all rows except the first n, i.e. the last (total number of rows - n) rows

import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
print(df.tail(-4))
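
The effect of a negative argument can be checked on the same six-row DataFrame. This is a small sketch; the variable names first_rows and last_rows are only illustrative.

```python
import pandas as pd

d = {"name": ["a", "b", "c", "d", "e", "f"], "marks": [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(d)

# 6 rows in total, so head(-4) keeps the first 6-4 = 2 rows
first_rows = df.head(-4)

# and tail(-4) keeps the last 6-4 = 2 rows
last_rows = df.tail(-4)

print(first_rows)
print(last_rows)
```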
DataFrame- modifying/accessing a single cell
import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[1,2,3,4,5,6]}
df=pd.DataFrame(d)
df.loc[0,'name']='puja gupta'# row label and column label
df.at[1,'name']='aadya gupta'# label based, single cell only

df.iloc[0,1]=7# row position and column position
df.iat[1,1]=8# position based, single cell only
print(df)
DataFrame- Attributes
1. DataFrame.index- to display row labels
import pandas as pd
FArea={'Pre Board 1' :[10,11,12],"Pre Board 2":[20,21,22]}
df1=pd.DataFrame(FArea,index=['a','b','c'])
print(df1)
print(df1.index)

2. DataFrame.columns- to display column labels
print(df1.columns)

3. DataFrame.dtypes- to display data type of each column
print(df1.dtypes)


DataFrame- Attributes
4. DataFrame.values- to display a NumPy ndarray holding all the
values in the DataFrame, without axis labels
print(df1.values)

5. DataFrame.size- to display total number of elements in the DataFrame
print(df1.size)# 6 (3 rows x 2 columns)

6. DataFrame.T- to transpose the DataFrame, i.e. rows become columns
and columns become rows.
print(df1.T)
DataFrame- Attributes
7. DataFrame.empty- returns True if the DataFrame is empty, otherwise False
import numpy as np
df2=pd.DataFrame([1,np.nan])
print(df2.empty)# False: a DataFrame that holds NaN values is still not empty

8. DataFrame.axes- returns a list holding both axes: axis 0 (row index) and axis 1 (columns)
print(df1.axes)

9. DataFrame.ndim- returns the number of dimensions, which is
2 for a DataFrame instance.
print(df1.ndim)# 2

10. DataFrame.shape- returns a tuple giving the dimensions of the DataFrame (number of
rows, number of columns)
print(df1.shape)# (3, 2) for the 3-row, 2-column df1 built from FArea
DataFrame- Deleting a Single Row/more than 1 row
import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d)

df=df.drop(0,axis=0)
df=df.drop([1,2],axis=0)
print(df)
DataFrame- Deleting a Single Column/more than 1 column
import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d)

df=df.drop('name',axis=1)
#df=df.drop(['name','marks'],axis=1)
print(df)
DataFrame- Iterating row-wise
• iterrows(): To iterate over the rows of a DataFrame, we use the iterrows() method. For each row it
returns the row index together with a Series containing the data of that row, i.e. a horizontal
subset in the form (row-index, Series).

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
for a,b in df.iterrows():
    print(a)#row index
    print(b)#row data as a Series


DataFrame- Iterating column -wise
items(): It is used to iterate over the data column-wise. It gives a vertical subset in the form
(column-name, Series). Older pandas versions also offered iteritems() for this; it was removed in pandas 2.0.

import pandas as pd
d={'name':['puja','aadi','srish'], 'marks':[77,88,99]}
df=pd.DataFrame(d)
for a,b in df.items():
    print(a)#column name
    print(b)#column values as a Series


DataFrame- 2 types of indexing Label Based, Boolean based
1. Label based indexing: DataFrame.loc[ ] is the main method used for label
based indexing with DataFrames.
2. Boolean based indexing: a Boolean value can represent either of two
states - True (equivalent to 1) or False (equivalent to 0). In Boolean indexing, we
select subsets of the data based on the actual values in the DataFrame rather than on their
row/column labels. Thus, we can apply conditions to columns to filter data values.

import pandas as pd
df=pd.DataFrame([1,2,3,4,5],index=[True,False,True,False,True])
#It divides dataframe into 2 groups-True rows and False rows
print(df.loc[True])
print(df.loc[False])
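
Filtering on the values themselves, as described in point 2, could look like the following sketch. The data and the variable names high and both are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["puja", "aadi", "srish"], "marks": [77, 88, 99]})

# The condition produces a Boolean Series; rows where it is True are kept
high = df[df["marks"] >= 80]

# Conditions are combined with & (and) or | (or), each wrapped in parentheses
both = df[(df["marks"] >= 80) & (df["name"] != "srish")]

print(high)
print(both)
```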



DataFrame- slicing a DataFrame()
import pandas as pd
d={'name':['a','b','c','d','e','f'], 'marks':[11,12,23,24,25,36]}
df=pd.DataFrame(d,index=[1,2,3,4,5,6])

print(df[:])# first slicing: all rows
print(df[1:3:1])# second slicing: rows at positions 1 and 2
print(df[-1:-4:-1])# third slicing: last three rows in reverse order
print(df[1::2])# fourth slicing: every second row from position 1


DataFrame- Sorting values in ascending order
The values can be sorted on one or more columns, in ascending or descending order.
Syntax: df.sort_values(by, axis=0, ascending=True, inplace=False), where by is a column name or a list of column names

import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d)
df1=df.sort_values(by='marks',axis=0,inplace=False)
print(df1)
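
Passing a list to by sorts on several columns at once. The sketch below adds a hypothetical 'section' column (not part of the original example) to show this, with a separate sort direction for each column.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["puja", "aadi", "srish", "raji"],
    "section": ["B", "A", "B", "A"],   # hypothetical column for illustration
    "marks": [77, 88, 99, 86],
})

# Sort by section (ascending), then by marks (descending) within each section
out = df.sort_values(by=["section", "marks"], ascending=[True, False])
print(out)
```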


DataFrame- Sorting values in descending order
The values can be sorted on one or more columns, in ascending or descending order.
Syntax: df.sort_values(by, axis=0, ascending=True, inplace=False), where by is a column name or a list of column names

import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d)
df1=df.sort_values(by='marks',axis=0,inplace=False,ascending=False)
print(df1)


DataFrame- Sorting index in descending order by axis=0
import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d,index=['s3','s1','s2','s4'])
df1=df.sort_index(axis=0,ascending=False)
print(df1)



DataFrame- Sorting index in descending order by axis=1
import pandas as pd
d={'name':['puja','aadi','srish','raji'], 'marks':[77,88,99,86]}
df=pd.DataFrame(d,index=['s3','s1','s2','s4'])
df1=df.sort_index(axis=1,ascending=False)
print(df1)



DataFrame- Arithmetic Operations
Arithmetic between two DataFrames is performed on the cells whose row and column labels match. Where a label is present in only one of them, the result is NaN (Not a Number). This is called data alignment in pandas objects.


DataFrame- Arithmetic Operations
import pandas as pd
d1={ 'marks':[77,88,99,86]}
df1=pd.DataFrame(d1,index=[1,3,4,5])
print(df1)
d2={'marks':[7,8,9,6]}
df2=pd.DataFrame(d2,index=[3,2,4,5])
print(df2)
print(df1+df2)# same as df1.add(df2)


DataFrame- Arithmetic Operations
import pandas as pd
d1={ 'marks':[77,88,99,86]}
df1=pd.DataFrame(d1,index=[1,3,4,5])
print(df1)
d2={'marks':[7,8,9,6]}
df2=pd.DataFrame(d2,index=[3,2,4,5])
print(df2)
print(df1.add(df2,fill_value=0))# a label missing from one DataFrame is treated as 0
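
The results of the two forms can be compared directly. This sketch reuses the same data; the names plain and filled are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"marks": [77, 88, 99, 86]}, index=[1, 3, 4, 5])
df2 = pd.DataFrame({"marks": [7, 8, 9, 6]}, index=[3, 2, 4, 5])

# Labels 1 and 2 exist in only one DataFrame, so + gives NaN there
plain = df1 + df2

# With fill_value=0 the missing side counts as 0, so no NaN appears
filled = df1.add(df2, fill_value=0)

print(plain)
print(filled)
```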



Statistical Functions in DataFrame- min() with axis=0
Syntax: df.min(axis=0, skipna=True, numeric_only=False)
axis (0/1)- 0 (default) gives one result per column, 1 gives one result per row
skipna (True/False)- if True, NA/null/NaN values are excluded when computing the result
numeric_only (True/False)- if True, only float, int and Boolean columns are included. Older pandas versions
defaulted to None, which tried to use every column; from pandas 2.0 the default is False.
Note: a column containing NaN is stored as float, because the default integer type cannot store NaN.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.min(0))# axis=0 (default): minimum of each column


Statistical Functions in DataFrame- min() with axis=1
Syntax: df.min(axis=1, skipna=True, numeric_only=True) gives the minimum of each row.
Here numeric_only=True leaves out the text column 'subject', because numbers and strings cannot be compared with each other.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.min(axis=1,numeric_only=True))# axis=1: minimum of each row


Statistical Functions in DataFrame- max() with axis=0
Syntax: df.max(axis=0, skipna=True, numeric_only=False)
The parameters have the same meaning as for min(): axis 0 (default) gives one result per column, skipna=True
excludes NaN values, and numeric_only=True restricts the result to float, int and Boolean columns.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.max(0))# axis=0 (default): maximum of each column


Statistical Functions in DataFrame- max() with axis=1
Syntax: df.max(axis=1, skipna=True, numeric_only=True) gives the maximum of each row.
Here numeric_only=True leaves out the text column 'subject', because numbers and strings cannot be compared with each other.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.max(axis=1,numeric_only=True))# axis=1: maximum of each row


Statistical Functions in DataFrame- count() with axis=0
It counts the non-NaN entries.
If you pass no argument or pass 0 (the default), it returns the count of non-NaN values for each column. If you pass
1, it returns the count of non-NaN values for each row. Syntax: df.count(axis=0, numeric_only=False)

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.count(0))# axis=0 (default): count for each column


Statistical Functions in DataFrame- count() with axis=1
It counts the non-NaN entries.
With the argument 1 it returns the count of non-NaN values for each row. Syntax: df.count(axis=1, numeric_only=False)

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.count(1))# axis=1: count for each row


Statistical Functions in DataFrame- sum() with axis=0
import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.sum()) #same as df.sum(0): total of each column (text values are joined)


Statistical Functions in DataFrame- sum() with axis=1
import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.sum(axis=1,numeric_only=True)) #total of each row, leaving out the text column


Statistical Functions in DataFrame- sum() of some columns
import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df['Benjamin'].sum())


Statistical Functions in DataFrame- Other functions
quantile(), std()- standard deviation, var()- variance, mad()- mean absolute
deviation (mad() was removed in pandas 2.0)
1. describe()- gives the following information about the numeric columns of the DataFrame:
count, mean, std, min, 25%, 50%, 75%, max
2. info()- gives the following information about the DataFrame:
type
index values
number of rows
data columns and the number of non-null values in them
data type of each column
memory usage
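
A minimal sketch of these functions on a single numeric column follows. The data is illustrative, and the mean absolute deviation is computed by hand as a stand-in for mad(), which recent pandas versions no longer provide.

```python
import pandas as pd

df = pd.DataFrame({"marks": [1, 2, 3, 4, 5, 6]})

median = df["marks"].quantile(0.5)   # 50th percentile
variance = df["marks"].var()         # sample variance (divides by n-1)
std_dev = df["marks"].std()          # square root of the sample variance

# Mean absolute deviation: average distance of the values from their mean
mean_abs_dev = (df["marks"] - df["marks"].mean()).abs().mean()

summary = df.describe()              # count, mean, std, min, quartiles, max
print(median, variance, std_dev, mean_abs_dev)
print(summary)
df.info()                            # prints type, index, columns, dtypes, memory usage
```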
Data Visualization



Statistical Functions in DataFrame- isnull()
Detects missing values: it returns a DataFrame of the same shape, with True where a value is missing and False elsewhere.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.isnull())


Statistical Functions in DataFrame- dropna() with axis=0
Drops every row that has a NaN in it.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.dropna(axis=0))


Statistical Functions in DataFrame- dropna() with axis=1
Drops every column that has a NaN in it.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.dropna(axis=1))


Statistical Functions in DataFrame- fillna()
Fills the NaN values with the given value.

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,np.nan,88],
'subject':["Acct",'Eco','Eng','IP']})
print(df.fillna(33))


Statistical Functions in DataFrame- idxmax() with axis=1
Finds the column label of the maximum value in each row (axis=1).

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,
np.nan,88]})
print(df.idxmax(axis=1))


Statistical Functions in DataFrame- idxmax() with axis=0
Finds the row index of the maximum value in each column (axis=0).

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,
np.nan,88]})
print(df.idxmax(axis=0))


Statistical Functions in DataFrame- idxmin() with axis=1
Finds the column label of the minimum value in each row (axis=1).

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,
np.nan,88]})
print(df.idxmin(axis=1))


Statistical Functions in DataFrame- idxmin() with axis=0
Finds the row index of the minimum value in each column (axis=0).

import pandas as pd
import numpy as np
df=pd.DataFrame({"Benjamin":[99,90,95,94],"Krishna":[94,89,
np.nan,88]})
print(df.idxmin(axis=0))


Statistical Functions in DataFrame- append()
append() adds the rows of one DataFrame to the end of another and returns a new DataFrame.
Columns present in only one of the DataFrames are kept, with NaN in the rows where they have no value.
Note: DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result and is used here.
import pandas as pd
df1=pd.DataFrame({"a":[1,2],"b":[5,6]},index=['puja','aashi'])
df2=pd.DataFrame({"c":[9,10,11],"b":[13,14,15]},index=['puja','aashi','dhruv'])
df3=pd.concat([df2,df1])# older pandas: df2.append(df1); df1 and df2 are not changed
print(df3)


Statistical Functions in DataFrame- append()
The rows of one DataFrame are added to the end of another. Columns present in only one of the DataFrames are kept,
with NaN where they have no value. ignore_index=True reassigns the row index from 0; by default it is False.
Note: DataFrame.append() was removed in pandas 2.0; pd.concat() takes the same ignore_index argument and is used here.
import pandas as pd
df1=pd.DataFrame({"a":[1,2],"b":[5,6]},index=['puja','aashi'])
df2=pd.DataFrame({"c":[9,10,11],"b":[13,14,15]},index=['puja','aashi','dhruv'])
df3=pd.concat([df2,df1],ignore_index=True)# older pandas: df2.append(df1,ignore_index=True)
print(df3)


Statistical Functions in DataFrame- append()
The rows of one DataFrame are added to the end of another. Columns present in only one of the DataFrames are kept,
with NaN where they have no value. verify_integrity=False (the default) raises no error when the result has
duplicate row indexes; verify_integrity=True raises an error in that case.
Note: DataFrame.append() was removed in pandas 2.0; pd.concat() takes the same verify_integrity argument and is used here.
import pandas as pd
df1=pd.DataFrame({"a":[1,2],"b":[5,6]},index=['puja','aashi'])
df2=pd.DataFrame({"c":[9,10,11],"b":[13,14,15]},index=['puja','aashi','dhruv'])
df3=pd.concat([df2,df1],verify_integrity=False)# older pandas: df2.append(df1,verify_integrity=False)
print(df3)
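
DataFrame.append() is not available in pandas 2.0 and later, so the three options can be tried with pd.concat() instead. The sketch below is one way to do that; the variable names stacked, renumbered and duplicate_error are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [5, 6]}, index=["puja", "aashi"])
df2 = pd.DataFrame({"c": [9, 10, 11], "b": [13, 14, 15]},
                   index=["puja", "aashi", "dhruv"])

# Rows of df1 are placed after the rows of df2; duplicate labels are kept
stacked = pd.concat([df2, df1])

# ignore_index=True renumbers the rows 0, 1, 2, ...
renumbered = pd.concat([df2, df1], ignore_index=True)

# verify_integrity=True refuses a result with duplicate row labels
try:
    pd.concat([df2, df1], verify_integrity=True)
    duplicate_error = False
except ValueError:
    duplicate_error = True

print(stacked)
print(renumbered)
print(duplicate_error)
```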



Some other ways of Creating DataFrame
•To create from a NumPy array
import numpy as np
import pandas as pd
a1=np.array([10,20,30])
df1=pd.DataFrame(a1)

•To create a DataFrame from more than one ndarray (NumPy array); each array becomes one row
a1=np.array([10,20,30])
a2=np.array(["A","b","c"])
df1=pd.DataFrame([a1,a2],index=["ayush","samudra"],columns=['m1','m2','m3'])


•To create from a list of dictionaries
Here, the dictionary keys are taken as column labels, and each dictionary supplies the values of one
row. There are as many rows as there are dictionaries in the list; a key missing from a dictionary
gives NaN in that row.
l=[{'grade':'A',"per":82},{'grade':'B',"per":77.9}, {'grade':'C',"per":66,"age":12}]
d=pd.DataFrame(l)

•To create from Series; each Series becomes one row
ser1=pd.Series([1,2,3,4,5])
ser2=pd.Series([1000,2000,3000,4000,5000])
ser3=pd.Series([-1000,-2000,-3000,-4000,-5000])
df1=pd.DataFrame([ser1,ser2,ser3])


•To create from a dictionary of Series; the row index is the union of the Series indexes, with NaN where a Series has no value
s1=pd.Series([78,88],index=["aashi",'dhruv'])
s2=pd.Series([87,93],index=["adriyan",'aashi'])
d1={"pre board 1":s1, "pre board 2":s2}
d=pd.DataFrame(d1)

•Creating a DataFrame from another DataFrame
dict1={"pre board 1":pd.Series([78,88]),"pre board 2":pd.Series([87,93])}
d1=pd.DataFrame(dict1)
d2=pd.DataFrame(d1)# no copy is forced, so d2 may share its data with d1; in pandas versions without
# Copy-on-Write a change in one can be reflected in the other. Use d1.copy() for an independent copy.
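
When the new DataFrame must not be affected by later changes to the original, an explicit copy is the safe choice. This is a minimal sketch using DataFrame.copy(); the data matches the example above.

```python
import pandas as pd

d1 = pd.DataFrame({"pre board 1": [78, 88], "pre board 2": [87, 93]})

# copy() makes an independent DataFrame with its own data
d2 = d1.copy()
d2.loc[0, "pre board 1"] = 0

print(d1)   # unchanged
print(d2)   # holds the new value
```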



•Creating a DataFrame from a nested dictionary: the outer dictionary keys become the columns
and the inner dictionary keys become the row labels
d={'stud1':{'name':'puja','age':1},'stud2':{'name':'aadi','age':2}}
df=pd.DataFrame(d)
print(df)