Name of Chapter: Pandas Dataframe
(Ref: Various websites and blogs on IP)
Topics Covered: Data Frames: creation from a dictionary of Series, a list of dictionaries, and Text/CSV files;
display; iteration; operations on rows and columns: add, select, delete, rename; head and tail functions;
indexing using labels; Boolean indexing; importing/exporting data between CSV files and
Data Frames.
Key Points
DATAFRAME: It is a two-dimensional labelled data structure with columns of potentially different data
types. It represents the data in the form of rows and columns. It is similar to a spreadsheet, an SQL table,
or a dictionary of Series objects, and is one of the most used objects in Pandas.
Dataframe Structure:
¤ Properties of a Dataframe:
1) A Dataframe has two indexes (axes): • Row index (axis=0) • Column index (axis=1)
2) Conceptually it is like a spreadsheet where each value is identified by the combination of its row index
and column index. The row index is known as the index and the column index is known as the column name.
3) A Dataframe can hold heterogeneous data, and it is both size-mutable and value-mutable.
❖A Dataframe can be created using any of the following:
1) Series 2) Lists 3) Dictionaries 4) A numpy 2D array 5) Text files 6) CSV files
1) Creating an empty DataFrame-
import pandas as pd
df=pd.DataFrame()
print(df)
2) Creating DataFrame from Series-
import pandas as pd
s=pd.Series([1,2,3,4])
df=pd.DataFrame(s)
print(df)
3) Creating DataFrame from Dictionary of Series-
import pandas as pd
name=pd.Series(['Shyam','Disha'])
stream=pd.Series(['Science','Commerce'])
Dict={'Name':name,'Stream':stream}
df1=pd.DataFrame(Dict)
print(df1)
4) Creating DataFrame from List of Dictionaries-
import pandas as pd
IP=[{"Name":"Raj","Marks":33},{"Name":"Priya","Marks":29}]
df=pd.DataFrame(IP)
print(df)
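The list above also mentions a numpy 2D array as a source, and a list of dictionaries does not need every key in every record. The following is a minimal sketch of both cases; the names and marks are hypothetical values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical marks, used only for illustration
arr = np.array([[33, 29], [41, 38]])

# From a numpy 2D array: row and column labels are supplied separately
df_arr = pd.DataFrame(arr, columns=["Test1", "Test2"], index=["Raj", "Priya"])
print(df_arr)

# From a list of dictionaries where one record has no "Marks" key:
# the missing value is filled with NaN
records = [{"Name": "Raj", "Marks": 33}, {"Name": "Priya"}]
df_rec = pd.DataFrame(records)
print(df_rec)
```

The dictionary keys become the column names, and any key that is absent from a record shows up as NaN in that row.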
¤ Iteration on Rows and Columns:
In case we need to access any record or data from a Dataframe row-wise or column-wise, iteration can be
used. Pandas provides us with 2 functions to perform iterations:
1) iterrows(): It is used to access the Dataframe row-wise.
2) iteritems(): It is used to access the Dataframe column-wise. (In pandas 2.0 and later, iteritems() has
been removed; items() does the same job.)
i) iterrows( )- Used to access data row-wise.
EXAMPLE:
import pandas as pd
# a variable name cannot be a number, so the list is named lst
lst=[{'Name':'Riya','Surname':'Verma'},{'Name':'Dia','Surname':'Sen'}]
D=pd.DataFrame(lst)
print(D)
for (row_index,row_value) in D.iterrows():
    print('\n Row Index is::',row_index)
    print('Row Value is::')
    print(row_value)
OUTPUT:
   Name Surname
0  Riya   Verma
1   Dia     Sen

 Row Index is:: 0
Row Value is::
Name        Riya
Surname    Verma
Name: 0, dtype: object

 Row Index is:: 1
Row Value is::
Name       Dia
Surname    Sen
Name: 1, dtype: object
ii) iteritems( ) - Used to access data column-wise.
EXAMPLE:
import pandas as pd
# a variable name cannot be a number, so the list is named lst
lst=[{'Name':'Riya','Surname':'Verma'},{'Name':'Dia','Surname':'Sen'}]
D=pd.DataFrame(lst)
print(D)
# in pandas 2.0 and later, write D.items() instead of D.iteritems()
for (col_name,col_value) in D.iteritems():
    print('\n')
    print('Column Name is::',col_name)
    print('Column Values are::')
    print(col_value)
OUTPUT:
   Name Surname
0  Riya   Verma
1   Dia     Sen


Column Name is:: Name
Column Values are::
0    Riya
1     Dia
Name: Name, dtype: object


Column Name is:: Surname
Column Values are::
0    Verma
1      Sen
Name: Surname, dtype: object
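Both kinds of iteration can also be used to collect values rather than only print them. The sketch below reuses the same two records and uses items() for the column-wise loop, since that name is available in both older and newer pandas versions; the variable names full_names and column_names are illustrative only.

```python
import pandas as pd

D = pd.DataFrame([{"Name": "Riya", "Surname": "Verma"},
                  {"Name": "Dia", "Surname": "Sen"}])

# Row-wise: each row_value is a Series indexed by the column names
full_names = []
for row_index, row_value in D.iterrows():
    full_names.append(row_value["Name"] + " " + row_value["Surname"])

# Column-wise: items() yields (column name, column Series) pairs
column_names = []
for col_name, col_value in D.items():
    column_names.append(col_name)

print(full_names)
print(column_names)
```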
¤ Different operations in Dataframe:
1) To access the data in a column, we can mention the column name as a subscript.
For example: "df['empid']". This can also be done by using "df.empid".
2) To access multiple columns, we pass a list of column names: "df[['col1','col2',...]]".
3) We can add columns to a Dataframe too:
df = pd.DataFrame({"D": [1, 2, 3], "E": [4, 5, 6]})
f = [7,8,9]
df['F'] = f
Rows can be added in a similar way by assigning values to a new row label with loc.
Select operation in DataFrame-
● To access the column data, we can mention the column name as subscript.
i) To access single column:
Syntax- df['col_name'] or, df.col_name
Eg- df['rollno'] or, df.rollno
ii) To access multiple columns:
Syntax- df[['col1','col2',...]]
Eg- df[['rollno','sname']]
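A short sketch of the select syntax follows. The student records are hypothetical; the point is that a single column name returns a Series, while a list of names returns a DataFrame.

```python
import pandas as pd

# Hypothetical student records, used only for illustration
df = pd.DataFrame({"rollno": [1, 2, 3],
                   "sname": ["Asha", "Ravi", "Meena"],
                   "marks": [78, 85, 91]})

one_col = df["rollno"]               # single column: returns a Series
same_col = df.rollno                 # attribute form of the same selection
two_cols = df[["rollno", "sname"]]   # list of names: returns a DataFrame

print(type(one_col).__name__)
print(type(two_cols).__name__)
```

The attribute form (df.rollno) only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the subscript form is the more general one.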
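Adding/Modifying a Column and Row-
☆ Columns in a dataframe can be modified in several ways. Assigning a value to a column:
◇ will modify it, if the column already exists.
◇ will add a new column, if it does not exist already.
□ To change or add a column, syntax:
df['new_col_name']=<new value>
To replace all the column labels at once, syntax:
df.columns=<list of new labels>
Eg-
df.columns=['List1']    (renames the only column of a one-column DataFrame to List1)
And,
df['List2']=10    (adds a column List2 in which every row holds 10)
Note: the list assigned to df.columns must contain exactly one label for each existing column.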
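The assignments described above can be sketched as follows, starting from the two-column DataFrame used earlier in this section. The new labels P, Q, R and S are arbitrary names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"D": [1, 2, 3], "E": [4, 5, 6]})

df["F"] = [7, 8, 9]     # "F" does not exist yet, so a new column is added
df["G"] = 10            # a single value is repeated for every row
df["D"] = df["D"] * 2   # "D" already exists, so it is modified

# Replace all column labels at once (one label per existing column)
df.columns = ["P", "Q", "R", "S"]
print(df)
```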
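Adding two columns of a DataFrame, syntax:
df['new_col_name']=df['col1']+df['col2']
Eg-
df['List3']=df['List1']+df['List2']
Note: to join the rows of two DataFrames, older versions of pandas provide the append() function.
Syntax-
df1.append(df2)
append() has been removed in pandas 2.0 and later, where pd.concat([df1,df2]) gives the same result.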
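As a rough illustration, the sketch below joins two small DataFrames with pd.concat(), which works in both older and newer pandas versions, and then adds two columns element-wise. The values are made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({"List1": [1, 2], "List2": [10, 20]})
df2 = pd.DataFrame({"List1": [3, 4], "List2": [30, 40]})

# Stack the rows of df2 below df1; ignore_index=True renumbers the rows 0..3
combined = pd.concat([df1, df2], ignore_index=True)

# Element-wise sum of two columns stored as a new column
combined["List3"] = combined["List1"] + combined["List2"]
print(combined)
```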
□ Adding/Modifying a Row:
We can change or add rows in a Dataframe using the at or loc attributes.
To change or add a row, syntax:
df.at[row_name,:]=<new value> Or,
df.loc[row_name,:]=<new value>
Eg-
df.at['Bangalore',:]=1000 And,
df.loc['Mohali',:]=[45200,56,211]
Note:
• While adding a row, we have to make sure that the sequence of values contains one value for every
column, otherwise a ValueError is raised.
• at is designed to access a single value, and newer versions of pandas may not accept a slice (:) with it,
so loc is the safer choice when working with a whole row.
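The row operations above can be sketched with loc as follows. The city figures are taken from the examples in this chapter; the row label Pune and the copy named trial are hypothetical and exist only to demonstrate the error case.

```python
import pandas as pd

df = pd.DataFrame({"Population": [10927986, 12691836],
                   "Hospitals": [189, 200],
                   "Schools": [7916, 8508]},
                  index=["Delhi", "Mumbai"])

df.loc["Mohali", :] = [45200, 56, 211]   # label not present: a new row is added
df.loc["Delhi", :] = 1000                # label present: every value in the row is changed

# Supplying two values for three columns is rejected
trial = df.copy()
try:
    trial.loc["Pune", :] = [1, 2]
    raised = False
except ValueError:
    raised = True

print(df)
print("ValueError raised:", raised)
```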
We can delete columns and rows from a Dataframe by using any of the following:
i) del ii) pop() iii) drop()
del and pop() delete a column, while drop() can delete either rows or columns.
i) del
Syntax- del df[col_name]
Eg- del df['List3']
ii) pop()
Syntax- df.pop(col_name)
Eg- df.pop('List2')
iii) drop()
Syntax for deleting the data column wise with example:
df1=df.drop('List2',axis=1)
Syntax for deleting the data row wise with example:
df2=df.drop([2,3],axis=0)
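The three ways of deleting behave slightly differently, which the following sketch tries to show with made-up values: del and pop() change the DataFrame itself, while drop() returns a new DataFrame and leaves the original unchanged unless inplace=True is passed.

```python
import pandas as pd

df = pd.DataFrame({"List1": [1, 2, 3, 4],
                   "List2": [10, 20, 30, 40],
                   "List3": [11, 22, 33, 44]})

del df["List3"]                        # removes the column from df itself
popped = df.pop("List2")               # removes the column and returns it as a Series
fewer_rows = df.drop([2, 3], axis=0)   # new DataFrame without rows 2 and 3

print(df)
print(popped)
print(fewer_rows)
```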
Renaming Rows/Columns:
i) To rename rows-
Syntax- df.rename(index={<names dictionary>},inplace=False)
Eg-
df.rename(index={'Sec A':'A','Sec B':'B'})
ii) To rename columns-
Syntax- df.rename(columns={<names dictionary>},inplace=False)
Eg- df.rename(columns={'Rollno':'Rno'})
Note:
i) For both the index and columns arguments, specify a names-change dictionary containing the
original names and the new names in a form like {old name:new name}.
ii) Specify the inplace argument as True if we want to rename rows/columns in the
same dataframe. With inplace=False (the default), a renamed copy is returned instead.
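A small sketch of both forms of rename() follows, using the labels from the examples above; the names in the Name column are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Rollno": [1, 2], "Name": ["Asha", "Ravi"]},
                  index=["Sec A", "Sec B"])

# Default inplace=False: a renamed copy is returned, df keeps its row labels
renamed = df.rename(index={"Sec A": "A", "Sec B": "B"})

# inplace=True: df itself is changed and nothing is returned
df.rename(columns={"Rollno": "Rno"}, inplace=True)

print(renamed)
print(df)
```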
Accessing a Dataframe:
Pandas provides the loc and iloc indexers to access a subset of a Dataframe by row and column. Both are
written with square brackets, not parentheses:
1) loc: It is label based and is used to access a group of rows and columns by their labels- .loc[ : , : ]
2) iloc: It is integer based and is used to access a group of rows and columns by their numeric
positions- .iloc[ : , : ]
• The above two syntaxes are generally used to access single/multiple rows/columns. There are other
syntaxes as well which are used for accessing a particular type of subset such as single rows only, multiple
rows only, etc.
1) Accessing the DataFrame through loc and iloc-
i) loc - label based. Used to access a group of rows and columns.
Syntax- df.loc[Start Row : End Row, Start Column : End Column]
Note- if we pass : alone in the row or column part, all the rows or all the columns are selected
respectively. With loc, both the start label and the end label are included in the result.
Eg-
We are using DataFrame dtf5:
         Population  Hospitals  Schools
Delhi      10927986        189     7916
Mumbai     12691836        200     8508
Kolkata     4631392        149     7226
Chennai     4328063        157     7617
>>>dtf5.loc['Delhi',:]
Population    10927986
Hospitals          189
Schools           7916
>>>dtf5.loc['Mumbai' : 'Chennai',:]
         Population  Hospitals  Schools
Mumbai     12691836        200     8508
Kolkata     4631392        149     7226
Chennai     4328063        157     7617
>>>dtf5.loc[:,'Population' : 'Hospitals']
         Population  Hospitals
Delhi      10927986        189
Mumbai     12691836        200
Kolkata     4631392        149
Chennai     4328063        157
>>>dtf5.loc['Delhi' : 'Mumbai','Population' : 'Hospitals']
        Population  Hospitals
Delhi     10927986        189
Mumbai    12691836        200
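To try the loc examples, dtf5 first has to be built. The sketch below constructs it from a dictionary of lists, which is one assumed way of creating it, and then repeats a few label-based selections.

```python
import pandas as pd

dtf5 = pd.DataFrame({"Population": [10927986, 12691836, 4631392, 4328063],
                     "Hospitals": [189, 200, 149, 157],
                     "Schools": [7916, 8508, 7226, 7617]},
                    index=["Delhi", "Mumbai", "Kolkata", "Chennai"])

one_row = dtf5.loc["Delhi", :]                                 # one row, as a Series
block = dtf5.loc["Delhi":"Mumbai", "Population":"Hospitals"]   # both end labels included
single = dtf5.loc["Kolkata", "Schools"]                        # one cell

print(one_row)
print(block)
print(single)
```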
ii) iloc - integer based. Used to access a group of rows and columns based on their numeric
index positions.
Syntax- df.iloc[Start Row index : End Row index, Start Column index : End Column index]
Note- Here, the end index is excluded from the result.
Eg-
We are using DataFrame dtf5:
         Population  Hospitals  Schools
Delhi      10927986        189     7916
Mumbai     12691836        200     8508
Kolkata     4631392        149     7226
Chennai     4328063        157     7617
>>>dtf5.iloc[0:2,1:2]
        Hospitals
Delhi         189
Mumbai        200
>>>dtf5.iloc[:,0:2]
         Population  Hospitals
Delhi      10927986        189
Mumbai     12691836        200
Kolkata     4631392        149
Chennai     4328063        157
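The difference between the two indexers is easiest to see side by side. In this sketch dtf5 is rebuilt from a dictionary of lists (an assumed construction), and the same block of data is selected once by position and once by label.

```python
import pandas as pd

dtf5 = pd.DataFrame({"Population": [10927986, 12691836, 4631392, 4328063],
                     "Hospitals": [189, 200, 149, 157],
                     "Schools": [7916, 8508, 7226, 7617]},
                    index=["Delhi", "Mumbai", "Kolkata", "Chennai"])

first_two = dtf5.iloc[0:2, 1:2]     # rows 0 and 1, column 1 only (end positions excluded)
by_position = dtf5.iloc[0:2, 0:2]   # positions 0 and 1 on both axes
by_label = dtf5.loc["Delhi":"Mumbai", "Population":"Hospitals"]   # end labels included

print(first_two)
print(by_position.equals(by_label))
```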
¤ head() and tail() Method:
The head() method returns the first 5 rows by default and the tail() method returns the last 5 rows by
default. head(5) and tail(5) give the same results as the defaults, and any other number of rows can be
passed as the argument.
i) head() - used to access the first 5 rows.
head(3) - used to access the first 3 rows.
ii) tail() - used to access the last 5 rows.
tail(3) - used to access the last 3 rows.
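A quick sketch on a hypothetical ten-row DataFrame shows the default and the customised forms.

```python
import pandas as pd

# Hypothetical DataFrame with the numbers 1 to 10 in a single column
df = pd.DataFrame({"n": list(range(1, 11))})

first_five = df.head()     # default: first 5 rows
first_three = df.head(3)   # first 3 rows
last_five = df.tail()      # default: last 5 rows
last_three = df.tail(3)    # last 3 rows

print(first_three)
print(last_three)
```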
¤ Boolean Indexing in Dataframe: Boolean indexing, as the name suggests, means using Boolean values
(True or False, sometimes written as 1 or 0) as the index labels of a Dataframe. While creating a Dataframe
with the Boolean index values True and False, we must make sure that True and False are not enclosed in
quotes (i.e., like 'True' or 'False'). Quoted values are strings, not Boolean values, so accessing the data with
.loc[True] or .loc[False] would then raise a KeyError.
Boolean Indexing in DataFrame-
Used to select data from the DataFrame with the Boolean values True and False.
import pandas as pd
# named data rather than dict, so that the built-in dict is not overwritten
data = {'name':["Deep","Rahul","Priya","Vinod"],
        'age': ["28", "39", "34", "36"]}
info = pd.DataFrame(data, index = [True, True, False, True])
print(info)
print(info.loc[True])
print(info.iloc[2])
Output-
        name age
True    Deep  28
True   Rahul  39
False  Priya  34
True   Vinod  36
       name age
True   Deep  28
True  Rahul  39
True  Vinod  36
name    Priya
age        34
Name: False, dtype: object
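In everyday pandas code, Boolean indexing more often refers to a related technique: a condition produces a Series of True/False values (a mask), and that mask is used to pick rows. The sketch below illustrates it with the same names; the ages are stored as integers here, which is an assumption made so that they can be compared with a number.

```python
import pandas as pd

info = pd.DataFrame({"name": ["Deep", "Rahul", "Priya", "Vinod"],
                     "age": [28, 39, 34, 36]})

mask = info["age"] > 30                    # Boolean Series: one True/False per row
older = info[mask]                         # keeps only the rows where mask is True
names = info.loc[mask, "name"].tolist()    # mask for the rows, label for the column

print(mask.tolist())
print(older)
print(names)
```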
Some Important DataFrame Attributes:
I) index- displays the index (row labels) of the DataFrame.
II) columns- displays the column labels of the DataFrame.
III) size- returns an int representing the number of elements in this object.
IV) shape- returns a tuple representing the dimensionality of the DataFrame.
V) empty- indicates whether the DataFrame is empty.
VI) ndim- returns an int representing the number of array dimensions.
VII) T- transposes index and columns.
(Transpose means to interchange the rows and columns with each other.)
Eg-
We take DataFrame dfn:
     Marketing  Sales
Age         25     24
Name      Neha  Rohit
Sex     Female   Male
>>>dfn.index
Index(['Age', 'Name', 'Sex'], dtype='object')
>>>dfn.columns
Index(['Marketing', 'Sales'], dtype='object')
>>>dfn.size
6
>>>dfn.shape
(3, 2)
>>>dfn.empty
False
>>>dfn.ndim
2
>>>dfn.T
          Age   Name     Sex
Marketing  25   Neha  Female
Sales      24  Rohit    Male
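The attribute values shown above can be checked with a short script. Building dfn from a dictionary of lists is an assumption made here; any construction that gives the same rows and columns would do.

```python
import pandas as pd

dfn = pd.DataFrame({"Marketing": [25, "Neha", "Female"],
                    "Sales": [24, "Rohit", "Male"]},
                   index=["Age", "Name", "Sex"])

print(list(dfn.index))     # row labels
print(list(dfn.columns))   # column labels
print(dfn.size)            # rows x columns
print(dfn.shape)           # (rows, columns)
print(dfn.empty)           # True only when there are no elements
print(dfn.ndim)            # number of axes
transposed = dfn.T         # rows become columns and columns become rows
print(transposed)
```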
15 Objective Questions (1 Mark)
Q1. Complete the following code –
_____________________ #missing statement
D = {'code' : [102 , 104, 105 ], 'ename' : ['Arun', 'Geet', 'Amy'] }
df1 = pp.DataFrame(D)
print(df1)
a) import pandas
b) import pandas as pp
c) import Pandas as pp
d) import pandas as pd
Ans b) import pandas as pp
Q2. Missing data in a Dataframe object is represented through –
a) NULL b) None
c) NaN d) <empty>
Ans c) NaN
Q3. The function to create a dataframe from a CSV file is –
a) to_csv() b) load_csv()
c) fetch_csv() d) read_csv()