Chapter 1 and 2 Series and Data Frame

Uploaded by

Kushagra Karan
Copyright © All Rights Reserved

UNIT - I
DATA HANDLING USING PANDAS
Pandas:
• Python libraries contain a collection of built-in modules
• NumPy, Pandas and Matplotlib are three well-established Python libraries for scientific and analytical
use.
• PANDAS (PANel DAta) is a high-level data manipulation tool used for data analysis.
• Pandas is an Open Source library built for Python Programming language.
• The main author of Pandas is Wes McKinney.

Data Structure in Pandas :


• A data structure is a collection of data values and the operations that can be applied to that data.
• It enables efficient storage, retrieval and modification of the data.
• Data structures in Pandas are:
1) Series
2) DataFrame
3) Panel (deprecated, and removed from pandas in version 0.25)
Series:
• Series is a data structure of Pandas
• It is a one-dimensional structure
• It contains homogeneous data
• Data values are associated with a labelled index
• The index can be numeric, string or any other datatype
• The default index starts from 0 (0, 1, 2, ...) if no index is given
• A Series has two main components:
✓ An array of actual data
✓ An associated array of indexes or data labels
Key Points:
● Homogeneous data
● Size Immutable
● Values of Data Mutable

DataFrame :
• Pandas stores tabular data using a DataFrame.
• DataFrame is a data structure of Pandas
• A DataFrame is a two-dimensional structure
• It contains heterogeneous data
• It is like a table in MySQL
• It contains rows and columns, and therefore has both a row index and a column index.
• The row index is called the index and the column index is called the column name
• The dimensions of a DataFrame are also called axes [row index (axis=0), column index (axis=1)]
• A DataFrame is both size mutable and value mutable
Key Points:
● Heterogeneous data
● Size Mutable
● Values of Data Mutable
What is a Series?
▪ A Pandas Series is like a column in a table.
▪ It is a one-dimensional array holding data of any type

Creation of Series :
There are a number of ways to create a Series

(A) Creation of an empty Series:


An empty Series can be created as follows:

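Coding:
import pandas as pd
s1=pd.Series()
print(s1)
Output:
Series([], dtype: float64)
(From pandas 2.0 onwards the default dtype of an empty Series is object, so the output is Series([], dtype: object).)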

(B) Creation of a Series from a List: A Series can be created from a list. The default indices range from 0 through N - 1, where N is the number of data elements.
Coding: Output:
import pandas as pd
s2=pd.Series(['p', 'y', 't', 'h', 'o', 'n'])
print(s2)

The index of a Series can be replaced by user-defined labels using the index parameter:
Coding: Output:
import pandas as pd
s2=pd.Series(['p', 'y', 't', 'h', 'o', 'n'],
index=[111,222,333,444,555,666])

print(s2)

(C) Creation of a Series from a NumPy Array: [a one-dimensional (1D) NumPy array]
Coding: Output:
import pandas as pd
import numpy as np
a1=np.array([10,20,30,40])
s=pd.Series(a1)
print(s)

Both data and index taken from NumPy arrays (the second argument of pd.Series() is the index):


a1=np.array([10,20,30,40])
a2=np.array([11,22,33,44])
s=pd.Series(a1,a2)
print(s)

(D) Creation of a Series from Dictionary: Keys become Index and Values become Data
Coding: Output:
import pandas as pd
d1={'I': 'one', 'II': 'two', 'III': 'three'}
s=pd.Series(d1)
print(s)

(E) Creation of a Series from a Scalar Value (a single value for all items)
Coding: Output:
import pandas as pd
s=pd.Series(5)
print(s)

[The scalar value is repeated once for each index label]
s=pd.Series(5,[11,22,33])
print(s)

(F) Creation of a Series with Missing Values

Coding: Output:
import pandas as pd
import numpy as np
s=pd.Series([10,20,30,np.nan,50])
print(s)
Note: numpy must be imported to use np.nan (the older alias np.NaN was removed in NumPy 2.0)

(G) Creation of a Series using range() function


Coding: Output:
import pandas as pd
s=pd.Series(range(1,10,2))
print(s)

(H) Creation of a Series using a for loop (generator expression)


Coding: Output:
import pandas as pd
s=pd.Series(y for y in "chennai")
print(s)
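The creation methods (B) to (H) can be tried together in one short script. This is a minimal sketch that reuses the sample data from the text; the variable names (s_list, s_labels, s_arr and so on) are illustrative only.

```python
import numpy as np
import pandas as pd

# (B) from a list: default index 0 .. N-1
s_list = pd.Series(['p', 'y', 't', 'h', 'o', 'n'])

# (B) from a list with user-defined labels
s_labels = pd.Series(['p', 'y', 't', 'h', 'o', 'n'],
                     index=[111, 222, 333, 444, 555, 666])

# (C) from NumPy arrays: the second argument is used as the index
a1 = np.array([10, 20, 30, 40])
a2 = np.array([11, 22, 33, 44])
s_arr = pd.Series(a1, a2)

# (D) from a dictionary: keys -> index, values -> data
s_dict = pd.Series({'I': 'one', 'II': 'two', 'III': 'three'})

# (E) from a scalar: the value is repeated for every index label
s_scalar = pd.Series(5, index=[11, 22, 33])

# (F) with a missing value: np.nan makes the dtype float64
s_nan = pd.Series([10, 20, 30, np.nan, 50])

# (G) from range(): 1, 3, 5, 7, 9
s_range = pd.Series(range(1, 10, 2))

# (H) from a generator expression over a string
s_gen = pd.Series(y for y in "chennai")

print(s_labels)
print(s_scalar)
print(s_nan)
```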

Accessing Elements of a Series


There are two common ways of accessing the elements of a series:
(i) Indexing
(ii) Slicing
(A) Indexing:
Indexes are of two types: positional index and labelled index.
a) Positional index :
It takes an integer value that corresponds to its position in the series starting from 0
b) Labelled index:
It takes any user-defined label as index
Positional index :
▪ Single element can be accessed using positional index (Seriesobject[index])
▪ More than one element of a series can be accessed using a list of positional integers

If s is the series given below
s=pd.Series(['p', 'y', 't', 'h', 'o', 'n'])

1. s[0] gives p
2. s[2] gives t
3. s[[1,3]] gives the elements at positions 1 and 3 (y and h) as a Series

Labelled index :
▪ Single element can be accessed using labelled index (Seriesobject [labelled index])
▪ More than one element of a series can be accessed using a list of index labels

If s is the series given below (data is the city name, index is the state):
city=['Mumbai','Kolkata','Chennai','Bangalore','Hyderabad']
state=['Maharashtra','west Bengal','Tamilnadu',
'karnataka','Telangana']
s=pd.Series(city,state)
print(s)

1) s['Maharashtra'] gives Mumbai
2) s['Tamilnadu'] gives Chennai
3) s[['west Bengal','Telangana']] gives the elements labelled west Bengal and Telangana (Kolkata and Hyderabad) as a Series
4) Even though a labelled index is used, we can also access elements of the series by position
5) Both s.iloc[3] and s['karnataka'] give Bangalore. (s[3] also works, but recent pandas versions warn that treating an integer key as a position with [] on a labelled Series is deprecated, so s.iloc[3] is preferred.)
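A minimal sketch of label-based and position-based access on the city/state Series above. It uses .iloc for positions, which avoids the deprecation warning that s[3] triggers on a labelled Series in recent pandas versions.

```python
import pandas as pd

# Data = city, index = state (same values as in the text)
city = ['Mumbai', 'Kolkata', 'Chennai', 'Bangalore', 'Hyderabad']
state = ['Maharashtra', 'west Bengal', 'Tamilnadu', 'karnataka', 'Telangana']
s = pd.Series(city, state)

print(s['Maharashtra'])                 # single label
print(s[['west Bengal', 'Telangana']])  # list of labels -> Series

# Positional access spelled explicitly with .iloc
print(s.iloc[3])         # fourth element
print(s.iloc[[1, 4]])    # list of positions -> Series
```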

(B) Slicing:
▪ It is used to extract a part of a series.

▪ Part of the series to be sliced can be defined by specifying the start and end parameters
[start :end] with the series name. eg: s[2:5]

▪ When we use positional indices for slicing, the value at the end index position is excluded. For example, in s[2:5] the element at position 5 is excluded, so 5 - 2 = 3 elements (at positions 2, 3 and 4) are extracted.
▪ If labelled indexes are used for slicing, the value at the end index label is also included, i.e. s['west Bengal':'Telangana'] includes all elements from the label west Bengal up to and including Telangana.

Note: Negative positions also work in slices (for example s[-2:]); to read a single element by a negative position, use s.iloc[-1].

Slicing: If s is the given series, each element has a positive position (0 to 5) and a matching negative position (-6 to -1):
Positive position:  0   1   2   3   4   5
Negative position: -6  -5  -4  -3  -2  -1

s[1:3] displays elements at positional index 1 and 2
s[:2] displays elements at positional index 0 and 1
s[2:] displays elements from positional index 2 till the last
s[::-1] displays the elements in reverse order
s[-2:] displays elements from positional index -2 till the end (i.e. -2 and -1)
s[:-2] displays elements from positional index 0 till -3 (-2 is excluded)
s[-4:-2] displays elements at positional index -4 and -3
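A short sketch contrasting positional slicing (end excluded) with labelled slicing (end included). The Series t with labels 'a' to 'e' is a made-up example, not from the text.

```python
import pandas as pd

s = pd.Series(['p', 'y', 't', 'h', 'o', 'n'])

print(s[1:3])    # positions 1 and 2
print(s[::-1])   # reversed
print(s[-4:-2])  # positions -4 and -3

# Labelled slicing includes the end label
t = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(t['b':'d'])   # b, c and d
```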

Modifying Series Data:

Modifying Single Element using index:


s.iloc[1]=75 #This changes the data at positional index 1 to 75 (s[1]=75 also works, but is deprecated for labelled Series)
s['Amit']=88 #This changes the data at labelled index Amit to 88

Modifying Multiple Elements using slicing:

s[3:5]=77
#This changes the elements at positional index 3 and 4 to 77

s[['Julie','Amar']]=90
#This changes Julie's and Amar's data to 90

s['Laxman':'Amit']=33
#This changes the data from Laxman till Amit (inclusive) to 33
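The modifications above assume a Series whose labels include Julie, Amar, Laxman and Amit, but that Series is not shown in this copy. The sketch below builds a hypothetical one (names and marks are made up) and applies the same kinds of assignment, using .iloc for the positional ones:

```python
import pandas as pd

# Hypothetical marks Series; the labels and values are for illustration only
s = pd.Series([60, 65, 70, 72, 80, 85],
              index=['Julie', 'Amar', 'Laxman', 'Ravi', 'Amit', 'Kavita'])

s.iloc[1] = 75                 # one element by position (Amar)
s['Amit'] = 88                 # one element by label
s.iloc[3:5] = 77               # positions 3 and 4 (end excluded): Ravi, Amit
s[['Julie', 'Amar']] = 90      # several labels at once
s['Laxman':'Amit'] = 33        # label slice: 'Amit' is included
print(s)
```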

Attributes of Series :
We can access certain properties, called attributes, of a series by using the property name with the series name.

If s1 and s2 are the two series given below

s1=pd.Series([10,20,30])
s2=pd.Series([11,22,np.nan,44], index=['a','b','c','d'])

index: It returns the index of the object
  s1.index gives RangeIndex(start=0, stop=3, step=1)
  s2.index gives Index(['a', 'b', 'c', 'd'], dtype='object')
values: It returns the ndarray of the data values
  s1.values gives [10 20 30]
  s2.values gives [11. 22. nan 44.]
nbytes: It returns the number of bytes
  s1.nbytes gives 24
  s2.nbytes gives 32
dtype: It returns the data type of the data
  s1.dtype gives int64
  s2.dtype gives float64
shape: It returns the shape of the data in the form of a tuple
  s1.shape gives (3,)
  s2.shape gives (4,)
size: It returns the total number of elements in the data
  s1.size gives 3
  s2.size gives 4
empty: It returns True in case of an empty series
  s1.empty gives False
  s2.empty gives False
hasnans: It returns True if the series contains NaN
  s1.hasnans gives False
  s2.hasnans gives True
ndim: It returns the number of dimensions
  s1.ndim gives 1
  s2.ndim gives 1
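The attribute values listed above can be checked directly. This sketch recreates s1 and s2 and reads several of the attributes (s1.nbytes is left out of the checks because the default integer width may differ between platforms).

```python
import numpy as np
import pandas as pd

s1 = pd.Series([10, 20, 30])
s2 = pd.Series([11, 22, np.nan, 44], index=['a', 'b', 'c', 'd'])

for name in ['shape', 'size', 'ndim', 'empty', 'hasnans', 'dtype']:
    # getattr reads the attribute by name, e.g. s1.shape
    print(name, getattr(s1, name), getattr(s2, name))
```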
Head() & Tail()
Head():
i. head(<n>) fetches the first n rows from a pandas object
ii. To access the first 3 rows, write Series_name.head(3)
iii. If you do not provide any value for n (Series_name.head()), it returns the first 5 rows

Series s
s.head()

s.head(3)

Tail():
i. tail(<n>) fetches the last n rows from a pandas object
ii. To access the last 3 rows, write Series_name.tail(3)
iii. If you do not provide any value for n (Series_name.tail()), it returns the last 5 rows

Series s

s.tail()

s.tail(3)
Note: if fewer rows exist than the number requested, all the available rows are displayed
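A small sketch of head() and tail() on a made-up seven-element Series, including the case where more rows are requested than exist:

```python
import pandas as pd

s = pd.Series(range(10, 80, 10))   # 10, 20, ..., 70 (7 elements)

print(s.head())     # first 5
print(s.head(3))    # first 3
print(s.tail(3))    # last 3
print(s.head(20))   # only 7 exist, so all 7 are returned
```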

Mathematical Operations on a series

Mathematical processing can be performed on series using scalar values and functions. All the arithmetic
operators such as +, -, *, /, etc. can be successfully performed on series.

Note:

Arithmetic operations align the two series on their index labels; a label that is present in only one of the series gives NaN in the result.

Coding:


import pandas as pd
s1 = pd.Series([10,20,30,40,50])
s2 = pd.Series([1,2,3,4])
s = s1 + s2
print("Addition of two Series:")
print(s)
print("Subtraction of two Series:")
s = s1 - s2
print(s)
print("Multiplication two Series:")
s = s1 * s2
print(s)
print("Division of Series1 by Series2:")
s = s1 / s2
print(s)
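As the note says, s1 + s2 aligns on index labels, so label 4 (present only in s1) becomes NaN. If a missing value should instead count as 0, the method form add() accepts a fill_value argument; a minimal sketch:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50])
s2 = pd.Series([1, 2, 3, 4])

total = s1 + s2                    # label 4 exists only in s1 -> NaN
filled = s1.add(s2, fill_value=0)  # treat the missing value as 0
print(total)
print(filled)
```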

Vector Operations on a series:

Series also supports vector operations. Any operation to be performed on a series gets performed on every
single element of it

import pandas as pd
s1 = pd.Series([1,3,6,4])
print(s1)

print(s1+2) # 2 gets added to every element
print(s1*2) # every element gets multiplied by 2
print(s1>2) # It returns True if the element is >2, otherwise False

Retrieving values using conditions:

We can also give conditions to retrieve values from a series that satisfies the given condition

The following examples perform a filter operation and return a filtered result containing only those values for which the given Boolean expression is True.

print(s1[s1>2]) #This returns only those values for which s1>2 is True (values for which it is False are not displayed)
print(s1[s1%2==0]) #This returns only those values for which s1%2==0 is True
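The vector operations and filters above, collected into one runnable sketch with the same s1:

```python
import pandas as pd

s1 = pd.Series([1, 3, 6, 4])

print(s1 + 2)            # 3, 5, 8, 6
print(s1 * 2)            # 2, 6, 12, 8
mask = s1 > 2            # Boolean Series: False, True, True, True
print(s1[mask])          # only the values where mask is True
print(s1[s1 % 2 == 0])   # even values: 6 and 4 (labels 2 and 3)
```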

Deleting elements from a Series:


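We can delete an element from a series using the drop() method by passing the index label of the element to be deleted as the argument. drop() returns a new series and leaves the original unchanged unless the result is assigned back or inplace=True is given.

s.drop("Kavita")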
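A minimal sketch of drop() on a hypothetical Series labelled by name (the labels and marks are made up), showing that the original is unchanged unless inplace=True is used:

```python
import pandas as pd

# Hypothetical marks Series indexed by name
s = pd.Series([80, 91, 67], index=['Amit', 'Kavita', 'Ravi'])

t = s.drop('Kavita')       # returns a new Series
print(t)
print(len(s))              # the original still has 3 elements

s.drop(['Amit', 'Ravi'], inplace=True)   # modify s itself
print(s)
```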

Accessing Data through iloc & loc:


● Indexing and accessing can also be done using iloc and loc.
● iloc :- iloc is used for indexing and selecting based on position (positions start from 0). It refers to position-based indexing.
Syntax: iloc [<row no. range>, <column no. range>]
● loc :- loc is used for indexing and selecting based on name (user-defined label). It refers to name-based indexing.
Syntax: loc [<list of row names>, <list of column names>]
(A series has only one axis, so only the row part is given.)

s.loc['b'] gives the element with label 'b'
s.iloc[2] gives the element at position 2
s.iloc[1:4] gives positions 1 till 3; iloc does not include the end position (4 is excluded)
s.loc['b':'e'] gives labels b till e; loc includes the end label (e is included)
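A sketch of the four expressions above on a Series with labels 'a' to 'e' (the same data as exercise 7 below):

```python
import pandas as pd

s = pd.Series([11, 22, 33, 44, 55], index=['a', 'b', 'c', 'd', 'e'])

print(s.loc['b'])        # by label -> 22
print(s.iloc[2])         # by position -> 33
print(s.iloc[1:4])       # positions 1, 2, 3 (4 excluded)
print(s.loc['b':'e'])    # labels b .. e (e included)
```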

Pandas Series Assignment: Find the output of the following: ( 1 to 15)


1. import pandas as pd
s = pd.Series()
print (s)
2. import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s)
3. import pandas as pd
s = pd.Series([17,88,9,33,44],index = ['a','b','c','d','e'])
print (s['a'])
print (s[3])
print(s[:3])
print(s[-2:])
print (s[['d','e']])
4. import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[1,12,123,145])
print (s)
5. import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print (s)
6. import pandas as pd
import numpy as np
data = {'Mammal' : 'Tiger', 'Snake' : 'Python', 'Bird' : 'Peacock'}
s = pd.Series(data)
print (s)
7. import pandas as pd
k=[11,22,33,44,55]
i=['a','b','c','d','e']
s=pd.Series(data=k,index=i)
print(s)
print(s.loc['a'])
print(s.loc['a':'d'])
print(s.iloc[1])
print(s.iloc[2:4])
8. import pandas as pd
k=[11,22,33,44,55,66,77,88,99]
i=[1,2,3,4,5,6,7,8,9]
s=pd.Series(data=k,index=i)
print(s.head(1))
print(s.tail(3))
9. import pandas as pd
k=[11,22,33,44,55,66,77,88,99]
i=[1,2,3,4,5,6,7,8,9]
s=pd.Series(data=k,index=i)
print(s/2)

10. import pandas as pd


s1=pd.Series([10,20,30,40])
s2=pd.Series([1,2,3,4])
s3=pd.Series([10,20,30,40,50,60])
s4=pd.Series([10,20,30,40,5,6,7,8,9])
print(s1+s2)
print(s1+s3)
print(s3*s4)
11. import pandas as pd
s=pd.Series([34,56,78])
print(s>40)
12. import pandas as pd
k=[11,22,33,44,55]
i=['a','b','c','d','e']
s=pd.Series(data=k,index=i)
print(s)
print("val=",s.loc[:'c'])
print("val by iloc=",s.iloc[1:4])
13. import pandas as pd
k=[11,22,33,44,55,66,77,88,99,100]
s=pd.Series(k)
print(s[0],s[0:4],s[:3],s[3:],s[3:8])
print(s[:-1],s[-10:-5],s[-8:])
14. import pandas as pd
k=[11,22,33,44,55,66,77,88,99,100]
s=pd.Series(k)
print(s[0:5],s[5:8],s[:2],s[5:],s[6:8])
print(s[-1:],s[-3:],s[-5:])
15. Consider the following Series object “S1” and write the output of the following statement :
import pandas as pd
L1=[2, 4, 2, 1, 3, 5, 8, 9]
S1 = pd.Series(L1)
print("1. ",S1.index)
print("2. ",S1.values)
print("3. ",S1.shape)
print("4. ",S1.ndim)
print("5. ",S1.size)
print("6. ",S1.nbytes)
print("9. ",S1[5]**2)
print("10. ",S1.empty)
print("11.\n", S1>60)
print("12.\n", S1[: : -1])
16. Write a program to create the following series and display only those values greater than 200 in the
given Series “S1”
0 300
1 100
2 1200
3 1700
17. Write a program to create the following series and modify the value 5000 to 7000 in the given Series "S1"
A 25000
B 12000
C 8000
D 5000
18. Write a Pandas program to convert a dictionary to a Pandas series.
Sample dictionary: d1 = {'a': 100, 'b': 200, 'c':300, 'd':400, 'e':800}
19. Define Pandas (Python pandas).
20. Mention the different types of Data Structures in Pandas?

Creation of DataFrame :
There are a number of ways to create a DataFrame

(A) Creation of an empty DataFrame:


An empty DataFrame can be created as follows:

Coding:
import pandas as pd
df1=pd.DataFrame()
print(df1)
Output:
Empty DataFrame
Columns: []
Index: []

(B) Creation of DataFrame from List of Dictionaries:


We can create DataFrame from a list of Dictionaries, for example:

Coding: Output:
[Keys of the dictionaries (Name, Age, Marks) become column names]
import pandas as pd
d1={'Name':'Priya','Age':16,'Marks':70}
d2={'Name':'Harshini','Age':11,'Marks':99}
d3={'Name':'Kanishka','Age':15,'Marks':90}
df1=pd.DataFrame([d1,d2,d3])
print(df1)

▪ The dictionary keys are taken as column labels


▪ The values corresponding to each key are taken as data
▪ No of dictionaries= No of rows, As No of dictionaries=3, No of rows=3
▪ No of columns= Total Number of unique keys of all the dictionaries of the list, as all dictionaries
have same 3 keys, no of columns=3
Coding: Output:
[Keys of the dictionaries (Name, Age, Marks, Gender, Grade) become column names]
import pandas as pd
d1={'Name':'Priya','Age':16,'Marks':70,'Gender':'f'}
d2={'Name':'Harshini','Age':11,'Marks':99,'Grade':'A'}
d3={'Name':'Kanishka','Age':15,'Marks':90}
df1=pd.DataFrame([d1,d2,d3])
print(df1)

▪ The dictionary keys are taken as column labels


▪ The values corresponding to each key are taken as data
▪ No of dictionaries= No of rows, As No of dictionaries=3, No of rows=3
▪ No of columns= Total Number of distinct keys of all the dictionaries of the list, as total keys is 5, no
of columns=5
▪ NaN (Not a Number) is inserted if a corresponding value for a column is missing
(dictionary d1 has no Grade, so its Grade is NaN; dictionary d2 has no Gender, so its Gender is NaN; and d3 has neither Gender nor Grade, so both of its values are NaN)
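The row and column counts described above can be checked on the second example; a minimal sketch:

```python
import pandas as pd

d1 = {'Name': 'Priya', 'Age': 16, 'Marks': 70, 'Gender': 'f'}
d2 = {'Name': 'Harshini', 'Age': 11, 'Marks': 99, 'Grade': 'A'}
d3 = {'Name': 'Kanishka', 'Age': 15, 'Marks': 90}
df1 = pd.DataFrame([d1, d2, d3])

print(df1)
print(df1.shape)   # 3 dictionaries -> 3 rows, 5 distinct keys -> 5 columns
```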

(C) Creation of DataFrame from Dictionary of Lists


DataFrames can also be created from a dictionary of lists.

Coding: Output:
[Keys of the dictionary (name, age, gender, marks) become column names]
import pandas as pd
name=['ramya','ravi','abhinav','priya','akash']
age=[16,17,18,17,16]
gender=['f','m','m','f','m']
marks=[88,34,67,73,45]
d1={'name':name,'age':age,'gender':gender,'marks':marks}
df1=pd.DataFrame(d1)
print(df1)

Dictionary keys become column labels by default in a DataFrame, and each list becomes the data of its column (so all the lists must have the same length).
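Because each list becomes one column, the lists must all have the same length. This sketch uses a shortened version of the data above (an assumption for brevity) and shows the ValueError raised for lists of different lengths:

```python
import pandas as pd

d1 = {'name': ['ramya', 'ravi', 'abhinav'],
      'marks': [88, 34, 67]}
df1 = pd.DataFrame(d1)
print(df1['marks'])    # each list became one column

failed = False
try:
    # 2 names but only 1 mark: pandas cannot build a rectangular table
    pd.DataFrame({'name': ['ramya', 'ravi'], 'marks': [88]})
except ValueError as err:
    print("ValueError:", err)
    failed = True
```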

(D) Creation of DataFrame from Series

DataFrame created from One Series (as no index is passed, the row index starts from 0, and there is only one column, with the default label 0):
Coding:

import pandas as pd
s1=pd.Series([100,200,300,400])
df1=pd.DataFrame(s1)
print(df1)

DataFrame from One Series:


No of rows = No of elements in Series=4 (As s1 has 4 elements)
No of columns = one (As single series used)

DataFrame created from Multiple Series:

Coding: [default row indices and column indices start from 0]
import pandas as pd
s1=pd.Series([100,200,300,400])
s2=pd.Series([111,222,333,444])
df1=pd.DataFrame([s1,s2])
print(df1)

[The column index is the index of the series]
s1=pd.Series([100,200,300,400],index=['a','b','c','d'])
s2=pd.Series([111,222,333,444],index=['a','b','c','d'])
df1=pd.DataFrame([s1,s2])
print(df1)

[The column index is the union of the indexes of all the series]
s1=pd.Series([100,200,300,400],index=['a','b','c','d'])
s2=pd.Series([111,222,333,444],index=['a','b','c','e'])
df1=pd.DataFrame([s1,s2])
print(df1)
DataFrame from Multiple Series:
• The labels(index) in the series object become the column names
• Each series becomes a row
• No of columns =No of distinct labels in all the series
• If a particular series does not have a corresponding value for a label, NaN is inserted in the DataFrame
column
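A sketch of the third example above (labels d and e each appear in only one Series), checking the shape and where NaN appears:

```python
import pandas as pd

s1 = pd.Series([100, 200, 300, 400], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([111, 222, 333, 444], index=['a', 'b', 'c', 'e'])
df1 = pd.DataFrame([s1, s2])   # each Series becomes a row

print(df1)
```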

(E) Creation of DataFrame from Dictionary of Series

DataFrame created from Dictionary of Series (keys become column names; values (series) become column data):
Coding:
import pandas as pd
name=pd.Series(['ramya','ravi','abhinav','priya','akash'])
age=pd.Series([16,17,18,17,16])
gender=pd.Series(['f','m','m','f','m'])
marks=pd.Series([88,34,67,73,45])
d1={'name':name,'age':age,'gender':gender,'marks':marks}
df1=pd.DataFrame(d1)
print(df1)

DataFrame created from Dictionary of Series with different indexes (keys become column names; values (series) become column data; if a series has no value for a particular row index, NaN is inserted):
Coding:
import pandas as pd
name=pd.Series(['ramya','ravi','abhinav','priya','akash'],[111,222,333,444,555])
age=pd.Series([16,17,18,17,16],[111,555,666,222,333])
gender=pd.Series(['f','m','m','f','m'],[111,333,444,555,666])
marks=pd.Series([88,34,67,73,45],[222,333,444,555,666])
d1={'name':name,'age':age,'gender':gender,'marks':marks}
df1=pd.DataFrame(d1)
print(df1)

DataFrame from Dictionary of Series


• Keys of dictionary become column name
• Values of dictionary(Series) become column data
• The labels (index) in the series objects become the row index
• No of rows = No of distinct labels in all the series
• If a particular series does not have a corresponding value for an index, NaN is inserted in the DataFrame column

Operations on rows and columns in DataFrames


We can perform some basic operations on rows and columns of a DataFrame, like selection, deletion, addition and renaming.

(A) Adding a New Column to a DataFrame

• If the new column name does not exist, a new column will be created
• If it already exists, the old values will be updated with the new values
• If we try to add a column with fewer values than the number of rows in the DataFrame, it results in a ValueError with the error message:
ValueError: Length of values does not match length of index

The following command will add a new column city with a list of values to the given DataFrame df1:

df1['city'] = ['chennai','mumbai','delhi','mumbai','kolkata']

The following command will add a new column newcity with the same value 'chennai' for all rows

df1['newcity']='chennai'

The following command will change the content of the existing column city to the new value chennai for all rows

df1['city']='chennai'
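The DataFrame df1 itself is not reproduced in this copy. This sketch rebuilds it from the marks listed later under DATA FRAME ATTRIBUTES (an assumption that it is the same df1) and applies the column commands above, including the length-mismatch case:

```python
import pandas as pd

# df1 rebuilt from the marks listed under "DATA FRAME ATTRIBUTES" (assumed)
df1 = pd.DataFrame({'Eng': [80, 70, 75, 86, 90],
                    'Maths': [88, 98, 77, 96, 95],
                    'Science': [73, 81, 66, 94, 92]},
                   index=['Priya', 'Ramya', 'Kavita', 'Kanishka', 'Harshini'])

df1['city'] = ['chennai', 'mumbai', 'delhi', 'mumbai', 'kolkata']  # new column
df1['newcity'] = 'chennai'   # a scalar is repeated for every row
df1['city'] = 'chennai'      # existing column is overwritten

try:
    df1['bad'] = [1, 2, 3]   # only 3 values for 5 rows
except ValueError as err:
    print("ValueError:", err)

print(df1)
```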

(B) Adding a New Row to a DataFrame


We can add a new row to a DataFrame using DataFrame.loc[ ], by assigning to a row label that does not exist yet

The following command will add a new row 'Sita' with the given list of values

df1.loc['Sita'] = [77,67,76]

The following command will add a new row 'Gita' with the value 80 for all columns

df1.loc['Gita'] = 80

The following command can set all values of a DataFrame to a particular value

df1[:]=0
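A sketch of the row commands above on a small made-up frame (two of the students from df1). A copy is kept before the final reset so the added rows can be inspected:

```python
import pandas as pd

df1 = pd.DataFrame({'Eng': [80, 70], 'Maths': [88, 98], 'Science': [73, 81]},
                   index=['Priya', 'Ramya'])

df1.loc['Sita'] = [77, 67, 76]   # new label -> new row
df1.loc['Gita'] = 80             # scalar fills every column
print(df1)

snapshot = df1.copy()            # keep the values before resetting
df1[:] = 0                       # set every value to 0
```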

(C) Deleting Rows or Columns from a DataFrame


• The DataFrame.drop() method can be used to delete rows and columns from a DataFrame.
• To delete a row give axis=0, and to delete a column give axis=1; the default value of axis is 0

Each of the following commands is shown on the original DataFrame df1.

The following command removes the row 'Ramya' [default value of axis is 0]

df1=df1.drop('Ramya')

The following command removes the row 'Priya' [axis is 0]

df1=df1.drop('Priya',axis=0)

The following command removes the row 'Kavita'. inplace=True makes the change in the DataFrame itself (drop() then returns None, so the result is not assigned back)

df1.drop('Kavita',inplace=True)

The following command removes the column 'Eng'

df1=df1.drop('Eng',axis=1)

The following command removes the columns Eng and Maths

df1=df1.drop(['Eng','Maths'],axis=1)
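A minimal sketch of the drop() variants on a three-row frame (a made-up subset of df1):

```python
import pandas as pd

df1 = pd.DataFrame({'Eng': [80, 70, 75], 'Maths': [88, 98, 77],
                    'Science': [73, 81, 66]},
                   index=['Priya', 'Ramya', 'Kavita'])

df2 = df1.drop('Ramya')                    # axis=0 (rows) is the default
df3 = df1.drop(['Eng', 'Maths'], axis=1)   # remove two columns
df1.drop('Kavita', inplace=True)           # changes df1 itself, returns None
print(df2, df3, df1, sep="\n\n")
```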

(D) Renaming Row Labels of a DataFrame


▪ The labels of rows and columns can be changed using the DataFrame.rename() method.
▪ If no new label is passed for an existing label, that label is left as it is

The following command renames the row label Ramya to Ram [a mapping passed without axis applies to the row labels by default]

df1=df1.rename({'Ramya':'Ram'})

The following command renames the row label Kavita to Savita [index used]

df1=df1.rename(index={'Kavita':'Savita'})

The following command renames the row label Priya to Riya [axis=0 used; the default axis is 0]

df1=df1.rename({'Priya':'Riya'},axis=0)

The following command renames the column label Eng to English [axis=1]

df1=df1.rename({'Eng':'English'},axis=1)

The following command renames the column labels Science to EVS and Maths to Mathematics [columns used]

df1=df1.rename(columns={'Maths':'Mathematics','Science':'EVS'})
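A sketch of the rename() forms above on a two-row frame (a made-up subset of df1); each call returns a new DataFrame, which is assigned back:

```python
import pandas as pd

df1 = pd.DataFrame({'Eng': [80, 70], 'Maths': [88, 98], 'Science': [73, 81]},
                   index=['Ramya', 'Kavita'])

df1 = df1.rename({'Ramya': 'Ram'})                 # mapper alone -> row labels
df1 = df1.rename(index={'Kavita': 'Savita'})       # explicit index mapping
df1 = df1.rename({'Eng': 'English'}, axis=1)       # mapper + axis=1 -> columns
df1 = df1.rename(columns={'Maths': 'Mathematics', 'Science': 'EVS'})
print(df1)
```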

Accessing DataFrames Elements:

A) Indexing: Accessing a Single Column

Select Columns by Name in a Pandas DataFrame using [ ]

[ ] is used to select a column by giving the column name: Df['Columnname']
Note: Df.Columnname can also be used when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method
In the given DataFrame df1,

df1['Maths']

df1.Maths

Indexing: Accessing Multiple Columns

[ ] is used to select multiple columns passed as a list: Df[[list of column names]]
In the given DataFrame df1,

df1[['Eng','Maths']]

B) Slicing:

• We can use slicing to select a subset of rows and/or columns from a DataFrame, for example all rows with particular columns, or particular rows with all columns.

C) Accessing the DataFrame through loc[] [label indexing] and iloc[] [positional indexing]
• Pandas provides the loc[] and iloc[] indexers to access a subset of a DataFrame using row and column labels or positions

loc :
• The loc property is used to access a group of rows and columns by label(s) [label index]
Df.loc[StartRow : EndRow, StartColumn : EndColumn]
• When the row label is passed as an integer value, it is interpreted as a label of the index and not as an integer position along the index
• When labelled indices are used for slicing, the value at the end index label is also included in the output.
Df1.loc['a':'e','col1':'col4'] accesses rows 'a' to 'e' [including 'e'] and columns col1 to col4 [including col4]

iloc :
• It is used to access a group of rows and columns based on their integer positions
Df.iloc[StartRowindex : EndRowindex, StartColumnindex : EndColumnindex]
• When positional indices are used for slicing, the value at the end index position is excluded
Df1.iloc[1:5,2:6] accesses rows 1 to 4 [excluding 5] and columns 2 to 5 [excluding 6]

Note: If we pass ":" in the row or column part, pandas returns all the rows or all the columns respectively

Using Label Indexing loc()

1) Single Row Access:

The following commands help to access a single row [details of Ramya] [symbol ":" indicates all columns]

df1.loc['Ramya']

df1.loc['Ramya',:]

2) Multiple Row Access:

The following command helps to access multiple rows (details of Ramya and Kanishka)
[the records need not be continuous; the labels should be enclosed in a list]
df1.loc[['Ramya','Kanishka']]

3) Multiple Row Access:

The following command helps to access multiple rows (details from Ramya to Kanishka)
[symbol ':' should be used]

df1.loc['Ramya':'Kanishka']

4) Multiple Row Access:

The following command helps to access multiple rows [display all rows from Ramya till the last row]

df1.loc['Ramya':]

32
5) Single Column Access:
The following command helps to access a single column [details of Maths] [symbol ":" indicates all rows]
df1.loc[:,'Maths']

6) Multiple Column Access:

The following command helps to access multiple columns (details of Eng and Science)
[the columns need not be continuous; the column names should be given as a list]

df1.loc[:,['Eng','Science']]

7) Multiple Columns Access:

The following command helps to access multiple columns (details from Eng till Science)
[symbol ':' should be used]

df1.loc[:,'Eng':'Science']

Using Positional Indexing (iloc):


1) Single Row Access:
The following commands help to access a single row [details of Ramya, position 1] [symbol ":" indicates all columns]

df1.iloc[1]
df1.iloc[1,:]

2) Multiple Row Access:

The following command helps to access multiple rows (details from Ramya to Kanishka, positions 1 to 3)
[symbol ':' should be used; the end position 4 is excluded]

df1.iloc[1:4]

3) Multiple Row Access:
The following command helps to access multiple rows [display all rows from Ramya (position 1) till the last row]

df1.iloc[1:]

4) Single Column Access:
The following command helps to access a single column [details of Maths, position 1] [symbol ":" indicates all rows]

df1.iloc[:,1]

5) Multiple Columns Access:

The following command helps to access multiple columns (details of Eng and Science, positions 0 and 2)
[the columns need not be continuous; the positions should be given as a list]

df1.iloc[:,[0,2]]

6) Multiple Columns Access:

The following command helps to access multiple columns (details from Eng till the last column, positions 0 onwards)
[symbol ':' should be used]
df1.iloc[:,0:]
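The loc and iloc commands above, run against df1 rebuilt from the marks listed under DATA FRAME ATTRIBUTES (assumed to be the same frame). The checks confirm that iloc[1:4] and loc['Ramya':'Kanishka'] select the same rows:

```python
import pandas as pd

# df1 rebuilt from the marks listed under "DATA FRAME ATTRIBUTES" (assumed)
df1 = pd.DataFrame({'Eng': [80, 70, 75, 86, 90],
                    'Maths': [88, 98, 77, 96, 95],
                    'Science': [73, 81, 66, 94, 92]},
                   index=['Priya', 'Ramya', 'Kavita', 'Kanishka', 'Harshini'])

print(df1.loc['Ramya'])                # one row as a Series
print(df1.loc['Ramya':'Kanishka'])     # label slice, end included
print(df1.loc[:, 'Eng':'Science'])     # all rows, a range of columns
print(df1.iloc[1:4])                   # positions 1..3, end excluded
print(df1.iloc[:, [0, 2]])             # Eng and Science by position
```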

Boolean Indexing :

▪ Boolean means a binary variable that can represent either of two states: True (indicated by 1) or False (indicated by 0).
▪ In Boolean indexing, we can select subsets of data based on the actual values in the DataFrame rather than their row/column labels.
▪ Thus, we can use conditions on columns to filter data values.

The following command displays True or False depending on whether each data value satisfies the given condition (if Maths>=95 it returns True, otherwise it returns False)

df1.Maths>=95

The following command displays the details of those students who secured >= 95 in Maths

df1[df1.Maths>=95]

The following command displays the English and Science marks of those students who secured >= 95 in Maths

df1[df1.Maths>=95][['Eng','Science']]
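A minimal sketch of the three Boolean indexing commands on the same rebuilt df1 (assumed from the attribute listing below), plus the equivalent single-step form with loc:

```python
import pandas as pd

df1 = pd.DataFrame({'Eng': [80, 70, 75, 86, 90],
                    'Maths': [88, 98, 77, 96, 95],
                    'Science': [73, 81, 66, 94, 92]},
                   index=['Priya', 'Ramya', 'Kavita', 'Kanishka', 'Harshini'])

mask = df1.Maths >= 95                  # Boolean Series, one value per row
print(mask)
top = df1[mask]                         # rows where mask is True
print(top)
print(df1[mask][['Eng', 'Science']])    # then pick two columns
print(df1.loc[mask, ['Eng', 'Science']])  # same result in one step
```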

DATA FRAME ATTRIBUTES:

When we create a DataFrame object, information related to it such as its size and data types can be accessed through attributes: <DataFrame Object>.<attribute name>
For the DataFrame df1 (rows Priya to Harshini, columns Eng, Maths and Science):

index: It shows the index (row labels) of the DataFrame
  Index(['Priya', 'Ramya', 'Kavita', 'Kanishka', 'Harshini'], dtype='object')
columns: It shows the column labels of the DataFrame
  Index(['Eng', 'Maths', 'Science'], dtype='object')
axes: It returns both axes, i.e. the index and the columns
  [Index(['Priya', 'Ramya', 'Kavita', 'Kanishka', 'Harshini'], dtype='object'),
   Index(['Eng', 'Maths', 'Science'], dtype='object')]
dtypes: It returns the data type of each column
  Eng int64
  Maths int64
  Science int64
  dtype: object
size: It returns the number of elements in the object
  15
shape: It returns a tuple with the dimensions of the DataFrame
  (5, 3)
values: It returns the data of the DataFrame as a NumPy array
  [[80 88 73]
   [70 98 81]
   [75 77 66]
   [86 96 94]
   [90 95 92]]
empty: It indicates whether the DataFrame is empty or not
  False
ndim: It returns an int representing the number of axes/dimensions
  2
T: It transposes the index and columns
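The attribute values listed above can be verified on df1 rebuilt from those same values; a short sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'Eng': [80, 70, 75, 86, 90],
                    'Maths': [88, 98, 77, 96, 95],
                    'Science': [73, 81, 66, 94, 92]},
                   index=['Priya', 'Ramya', 'Kavita', 'Kanishka', 'Harshini'])

print(df1.index)
print(df1.columns)
print(df1.dtypes)
print(df1.shape, df1.size, df1.ndim, df1.empty)   # (5, 3) 15 2 False
print(df1.T)   # 3 rows (Eng, Maths, Science) x 5 columns
```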

Head() and Tail():

▪ DataFrame.head(n) displays the first n rows of the DataFrame
▪ If n is not specified, it gives the first 5 rows of the DataFrame.

The following command displays the first 2 rows

df1.head(2)

If df1.head() is executed, it displays the first 5 rows; if the DataFrame has fewer than 5 rows, it displays all of them

▪ DataFrame.tail(n) displays the last n rows of the DataFrame
▪ If n is not specified, it gives the last 5 rows of the DataFrame.

The following command displays the last 2 rows

df1.tail(2)

If df1.tail() is executed, it displays the last 5 rows; if the DataFrame has fewer than 5 rows, it displays all of them

Iterations in DataFrame:
iterrows():
▪ DataFrame.iterrows() is used to iterate over rows
▪ Each iteration produces an index label and a row (a Pandas Series object)

CODING:
for i,j in df1.iterrows():
    print("Details of ",i,":\n",j)

In df1.iterrows(), the data is iterated row-wise, where in
i,j -> i represents the row index
       j represents the row data as a series

iteritems():
▪ DataFrame.iteritems() is used to iterate over columns. It was deprecated in pandas 1.5 and removed in pandas 2.0; DataFrame.items() works the same way and should be used instead.
▪ Each iteration produces a column name and a column (a Pandas Series object)

CODING:
for i,j in df1.items():
    print(i,j)
In df1.items() (df1.iteritems() in older pandas), the data is iterated column-wise, where in
i,j -> i represents the column name
       j represents the column data as a series

itertuples():
▪ DataFrame.itertuples() returns a named tuple for each row in the DataFrame
▪ The first element of the tuple is the row's index value, while the remaining values are the row values

CODING:
for i in df1.itertuples():
    print(i)
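A sketch of the three iteration methods on the rebuilt df1 (assumed from the attribute listing above), collecting values instead of only printing them. It uses items(), since iteritems() no longer exists in pandas 2.0 and later:

```python
import pandas as pd

df1 = pd.DataFrame({'Eng': [80, 70, 75, 86, 90],
                    'Maths': [88, 98, 77, 96, 95],
                    'Science': [73, 81, 66, 94, 92]},
                   index=['Priya', 'Ramya', 'Kavita', 'Kanishka', 'Harshini'])

maths_by_student = {}
for label, row in df1.iterrows():      # row is a Series indexed by column name
    maths_by_student[label] = row['Maths']

col_max = {}
for col, data in df1.items():          # data is one column as a Series
    col_max[col] = data.max()

rows = []
for t in df1.itertuples():             # namedtuple fields: Index, Eng, Maths, Science
    rows.append((t.Index, t.Eng, t.Maths, t.Science))

print(maths_by_student, col_max, rows[0], sep="\n")
```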

Binary Operations in a DataFrame


It is possible to perform addition, subtraction, multiplication and division on DataFrames; the operations are element-wise, matching row and column labels.
To Add : ( +, add or radd )

#Addition
df3=df1+df2 # This performs element-wise addition of two DataFrames
print("df3=df1+df2","\n",df3)
print("********************")

df4=df1.add(df2) # add() also performs element-wise addition
print("df4=df1.add(df2)","\n",df4)
print("********************")

df5=df1.radd(df2) # radd() also performs element-wise addition,
# but with the operands reversed: df5=df2+df1
print("df5=df1.radd(df2)","\n",df5)
print("********************")

Similarly, subtraction, multiplication and division can be performed:
To Subtract : ( - , sub or rsub )
To Multiply : ( * , mul or rmul )
To Divide : ( / , div or rdiv )
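The snippet above assumes that df1 and df2 already exist. This sketch uses two small made-up DataFrames with the same labels, to show that the r-versions (radd, rsub, ...) only swap the operands:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

print(df1 + df2)       # same as df1.add(df2)
print(df1.sub(df2))    # df1 - df2
print(df1.rsub(df2))   # df2 - df1 (operands reversed)
print(df1.mul(df2))    # df1 * df2
```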
Note: Use of at, iat
iat:
▪ The iat property gets, or sets, the value at the specified position.
▪ Specify both row and column as numbers representing their positions.

Syntax : dataframe.iat[row, column]

df1.iat[1,2] # it gives the data at row position 1 and column position 2 (so it displays 81)
df1.iat[2,1]=45 # it changes the data at row position 2, column position 1 to 45
(It changes Kavita's Maths mark to 45)
at :

▪ The at property gets, or sets, the value for the specified row label and column label.
▪ Specify both the row (index) label and the column label of the cell you want.

Syntax : dataframe.at[index, label]

print(df1.at['Harshini','Science']) # it displays Harshini's Science mark (92 will be displayed)
df1.at['Ramya','Maths']=77 # it changes Ramya's Maths mark to 77
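A sketch of the at/iat examples on the rebuilt df1 (assumed from the attribute listing above); the values 81 and 92 match the comments in the text:

```python
import pandas as pd

df1 = pd.DataFrame({'Eng': [80, 70, 75, 86, 90],
                    'Maths': [88, 98, 77, 96, 95],
                    'Science': [73, 81, 66, 94, 92]},
                   index=['Priya', 'Ramya', 'Kavita', 'Kanishka', 'Harshini'])

print(df1.iat[1, 2])                   # row position 1, column position 2 -> 81
df1.iat[2, 1] = 45                     # Kavita's Maths, by position
print(df1.at['Harshini', 'Science'])   # by labels -> 92
df1.at['Ramya', 'Maths'] = 77          # Ramya's Maths, by labels
```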

Some Important Points at a Glance

1. Creating an Empty Dataframe:

Output:
Empty DataFrame
Columns: []
Index: []
2. Creating an Empty Dataframe with column names:

Output:
Empty DataFrame
Columns: [Name, Articles, Improved]
Index: []
3. Creating an Empty Dataframe with column names and indices:

Output:
  Name Articles Improved
a  NaN      NaN      NaN
b  NaN      NaN      NaN
c  NaN      NaN      NaN
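The code for points 1 to 3 is not reproduced in this copy. A minimal sketch that produces outputs of the same form (the column names and index labels are taken from the outputs shown):

```python
import pandas as pd

df_a = pd.DataFrame()                                    # 1. no columns, no rows
df_b = pd.DataFrame(columns=['Name', 'Articles', 'Improved'])   # 2. columns only
df_c = pd.DataFrame(columns=['Name', 'Articles', 'Improved'],
                    index=['a', 'b', 'c'])               # 3. every cell is NaN
print(df_a)
print(df_b)
print(df_c)
```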
4. Creating Dataframes using Dictionary(Keys of dictionary- becomes column names)


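The original example and its output are not reproduced here, so this is a sketch with hypothetical student data. Each key of the dictionary becomes a column name and each list becomes that column's values; since no index is given, pandas uses the default index 0, 1, 2, 3.

```python
import pandas as pd

# Hypothetical data: keys become column names, lists become column values
data = {
    "Name": ["Ankit", "Aishwarya", "Shaurya", "Shivangi"],
    "Age": [23, 21, 22, 21],
    "University": ["BHU", "JNU", "DU", "BHU"],
}
df = pd.DataFrame(data)   # default index 0, 1, 2, 3
print(df)
```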
5. Creating a DataFrame object from a dictionary with custom indexing:


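A sketch using the same hypothetical dictionary as before, this time passing the index parameter so that the rows are labelled 'a' to 'd' instead of 0 to 3. The number of labels must match the number of rows.

```python
import pandas as pd

# Hypothetical data
data = {
    "Name": ["Ankit", "Aishwarya", "Shaurya", "Shivangi"],
    "Age": [23, 21, 22, 21],
    "University": ["BHU", "JNU", "DU", "BHU"],
}
# Custom row labels instead of the default 0, 1, 2, 3
df = pd.DataFrame(data, index=["a", "b", "c", "d"])
print(df)
print(df.loc["b"])   # rows can now be accessed by their custom label
```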
6. Creating a DataFrame from a dictionary with the required columns only:

# creating a DataFrame object that skips one column, i.e. the Age column


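A sketch of skipping a column, again with hypothetical data: the columns parameter lists only the keys to keep, so the Age key in the dictionary is ignored.

```python
import pandas as pd

# Hypothetical data
data = {
    "Name": ["Ankit", "Aishwarya", "Shaurya", "Shivangi"],
    "Age": [23, 21, 22, 21],
    "University": ["BHU", "JNU", "DU", "BHU"],
}
# Only the listed keys become columns; 'Age' is skipped
df = pd.DataFrame(data, columns=["Name", "University"])
print(df)
```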
7. Find the output:

a) print(df)
b) print(df.index)
c) print(df.columns)
d) print(df.axes)
e) print(df.dtypes)
f) print(df.size)
g) print(df.shape)
h) print(df.values)
i) print(df.empty)
j) print(df.ndim)
k) print(df.T)
a) df: It displays the complete DataFrame

b) df.index: It returns the index (row labels) of the DataFrame

c) df.columns: It returns the column labels of the DataFrame

d) df.axes: It returns a list of both axes, i.e. the row index and the column labels

e) print(df.dtypes): It returns the data type of each column of the DataFrame

f) print(df.size): It returns the number of elements in the DataFrame (rows × columns)

g) print(df.shape): It returns a tuple of the dimensions of the DataFrame (rows, columns)

h) print(df.values): It returns the data of the DataFrame as a NumPy array

i) print(df.empty): It returns True if the DataFrame is empty (has no rows or no columns), otherwise False

j) print(df.ndim): It returns an int representing the number of axes/dimensions

k) print(df.T): It transposes the index and columns
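The DataFrame for question 7 is not reproduced here, so this sketch applies the same attributes to a small hypothetical DataFrame with 3 rows and 2 columns. The comments describe what each attribute holds rather than its exact printed form, which can vary slightly between pandas versions.

```python
import pandas as pd

# Hypothetical DataFrame: 3 rows, 2 columns
df = pd.DataFrame({"Name": ["Ravi", "Sita", "John"], "Age": [21, 22, 20]},
                  index=["a", "b", "c"])

print(df.index)    # row labels: 'a', 'b', 'c'
print(df.columns)  # column labels: 'Name', 'Age'
print(df.axes)     # list of [row index, column index]
print(df.dtypes)   # data type of each column
print(df.size)     # 6, i.e. 3 rows x 2 columns
print(df.shape)    # (3, 2)
print(df.values)   # the data as a 2-D NumPy array
print(df.empty)    # False
print(df.ndim)     # 2
print(df.T)        # rows and columns swapped
```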

8. For the given DataFrame, give the commands to access column ‘Age’.

Answer: (Each of the statements given below displays column ‘Age’)
print(df['Age'])
print(df.Age)
print(df.loc[:,'Age'])
print(df.iloc[:,1])
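A quick check that the four statements return the same column, using a hypothetical DataFrame in which Age is the second column (position 1), as the iloc answer assumes.

```python
import pandas as pd

# Hypothetical DataFrame in which 'Age' is at column position 1
df = pd.DataFrame({"Name": ["Ravi", "Sita", "John"],
                   "Age": [21, 22, 20],
                   "University": ["DU", "JNU", "BHU"]},
                  index=["a", "b", "c"])

a1 = df["Age"]          # by column label
a2 = df.Age             # attribute access
a3 = df.loc[:, "Age"]   # label-based: all rows, column 'Age'
a4 = df.iloc[:, 1]      # position-based: all rows, column position 1
print(a1)
```

df.Age only works when the column name is a valid Python identifier (no spaces) and does not clash with an existing DataFrame attribute or method; df['Age'] always works.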
9. For the given DataFrame, give the commands to do the following:

a) Display columns TotalMarks and Grade


Answer: (Each of the statements given below displays columns TotalMarks and Grade)
print(df[['TotalMarks','Grade']])
print(df.iloc[:,[1,2]])
print(df.iloc[:,1:3])
print(df.loc[:,['TotalMarks','Grade']])
print(df.loc[:,'TotalMarks':'Grade'])
b) Display columns TotalMarks and Promoted
Answer: (Each of the statements given below displays columns TotalMarks and Promoted)
print(df[['TotalMarks','Promoted']])
print(df.iloc[:,[1,3]])
print(df.iloc[:,1:4:2])
print(df.loc[:,['TotalMarks','Promoted']])
print(df.loc[:,'TotalMarks':'Promoted':2])
print(df.get(['TotalMarks','Promoted']))
c) Display all columns from TotalMarks onwards
Answer: (Each of the statements given below displays all columns from TotalMarks onwards)
print(df.iloc[:,1:])
print(df.loc[:,'TotalMarks':])
d) Display columns Name, TotalMarks and Grade
Answer: (Each of the statements given below displays columns Name, TotalMarks and Grade)
print(df[['Name','TotalMarks','Grade']])
print(df.iloc[:,[0,1,2]])
print(df.iloc[:,0:3])
print(df.loc[:,'Name':'Grade'])
print(df.loc[:,:'Grade'])
print(df.get(['Name','TotalMarks','Grade']))
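The answers above assume the column order Name, TotalMarks, Grade, Promoted. This sketch builds a hypothetical DataFrame with that order and checks that the alternatives agree. It also highlights the key rule: an iloc slice excludes the end position, while a loc slice includes the end label.

```python
import pandas as pd

# Hypothetical DataFrame with columns in the order Name, TotalMarks, Grade, Promoted
df = pd.DataFrame({"Name": ["Asha", "Vikas", "Meena"],
                   "TotalMarks": [450, 380, 410],
                   "Grade": ["A", "B", "A"],
                   "Promoted": ["Yes", "Yes", "Yes"]})

# b) TotalMarks and Promoted
s1 = df[["TotalMarks", "Promoted"]]
s2 = df.iloc[:, [1, 3]]
s3 = df.iloc[:, 1:4:2]                      # positions 1 and 3 (end position 4 excluded)
s4 = df.loc[:, "TotalMarks":"Promoted":2]   # label slice with step 2 (end label included)
s5 = df.get(["TotalMarks", "Promoted"])

# d) Name, TotalMarks and Grade
d1 = df.iloc[:, 0:3]      # positions 0, 1, 2
d2 = df.loc[:, :"Grade"]  # from the first column up to and including 'Grade'
print(s1)
print(d1)
```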
10. Adding a single column:

a) Give the command to add a column named ‘C’ with all values equal to 11
Answer: (Each of the statements given below is an alternative way to add column ‘C’)

df['C']=11
df['C']=[11,11,11,11]
df.insert(2, "C", 11)
df.insert(2, "C", [11,11,11,11])
[The insert() function takes 3 parameters: the position (index) of the new column, the name of the column, and the values. Column positions start from 0, so we set the position parameter to 2 to add the new column right after column B. insert() raises an error if column ‘C’ already exists.]
df.loc[:, "C"]=11
df=df.assign(C=11) [Note: C is not enclosed in quotes, and assign() returns a new DataFrame, so the result must be assigned back to df]
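Because these statements are alternatives, the sketch below applies each one to a fresh copy of a hypothetical 4-row DataFrame with columns A and B (the original DataFrame is not shown), and then confirms that they all produce the same column C.

```python
import pandas as pd

# Hypothetical 4-row DataFrame with columns A and B
base = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

d1 = base.copy()
d1["C"] = 11              # the scalar is repeated down the whole column

d2 = base.copy()
d2.insert(2, "C", 11)     # position 2 = right after column B

d3 = base.copy()
d3.loc[:, "C"] = 11       # label-based assignment creates the new column

d4 = base.assign(C=11)    # returns a new DataFrame; base itself is unchanged
print(d4)
```

Calling insert() a second time with the same column name raises a ValueError, which is why each variant here starts from its own copy.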

b) Add a single column City with the list of values ['Delhi', 'Bangalore', 'Chennai', 'Patna']
Answer: (Each of the statements given below is an alternative way to add column ‘City’)
df['City']=['Delhi', 'Bangalore', 'Chennai', 'Patna']
df.insert(2,"City",['Delhi', 'Bangalore', 'Chennai', 'Patna'])
df = df.assign(City = ['Delhi', 'Bangalore', 'Chennai', 'Patna'])
df.loc[:,'City']=['Delhi', 'Bangalore', 'Chennai', 'Patna']
df.at[:,'City']=['Delhi', 'Bangalore', 'Chennai', 'Patna']

c) Adding a single row

Answer: (Each of the statements given below adds a row with label 4 and values ['a','b'])
df.at[4]=['a','b']
df.at[4,:]=['a','b']
df.loc[4]=['a','b']
df.loc[4,:]=['a','b']
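A sketch of the loc form of row addition, using a hypothetical 4-row DataFrame with two columns and the default index 0 to 3. Assigning to a row label that does not exist yet (here 4) appends a new row.

```python
import pandas as pd

# Hypothetical 4-row DataFrame with two columns and default index 0..3
df = pd.DataFrame({"X": ["p", "q", "r", "s"], "Y": ["t", "u", "v", "w"]})

# Label 4 does not exist yet, so this appends a new row
df.loc[4] = ["a", "b"]
print(df)
```

The list must contain exactly one value per column; otherwise pandas raises a ValueError.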

Worksheet - Basic Level Questions: (L1)

1) Create an Empty DataFrame


2) Create the above DataFrame
3) Find the output
a. df.shape
b. df.size
c. df.ndim
d. df.empty
e. df.columns
f. df.T
4) print(df.loc['d'])
5) print(df.loc['d','Name'])
6) print(df.loc['d',['Name','University']])
7) print(df.loc['d','Age':])
8) print(df.loc['d',:'Age'])
9) print(df.loc['b':'d'])
10) print(df.loc[['b','d']])

11) Which of the following can be used to specify the data while creating a DataFrame?
i. Series ii. List of Dictionaries iii. Structured ndarray iv. All of these
12) Carefully observe the following code:
import pandas as pd
Year1={'Q1':5000,'Q2':8000,'Q3':12000,'Q4': 18000}
Year2={'A' :13000,'B':14000,'C':12000}
totSales={1:Year1,2:Year2}
df=pd.DataFrame(totSales)
print(df)
Answer the following:
a. List the index of the DataFrame df
b. List the column names of DataFrame df.

Worksheet - Moderate Level Questions: (L2)

1) Write a Python code to create a DataFrame with appropriate column headings from the list given
below: [[101,'Gurman',98],[102,'Rajveer',95],[103,'Samar' ,96],[104,'Yuvraj',88]]
2) Consider the given DataFrame ‘Stock’:

Write suitable Python statements for the following:
i. Add a column called Special_Price with the following data: [135,150,200,440].
ii. Add a new book named 'The Secret' having price 800.
iii. Remove the column Special_Price.
3) Mark the correct choice as
i. Both A and R are true and R is the correct explanation for A
ii. Both A and R are true and R is not the correct explanation for A
iii. A is True but R is False
iv. A is false but R is True
Assertion (A):- DataFrame has both a row and column index.
Reasoning (R): - A DataFrame is a two-dimensional labelled data structure like a table of
MySQL.
4) Mr. Som, a data analyst has designed the DataFrame df that contains data about Computer
Olympiad with ‘CO1’, ‘CO2’, ‘CO3’, ‘CO4’, ‘CO5’ as indexes shown below. Answer the
following questions:

A. Predict the output of the following Python statements:


i. df.shape
ii. df[2:4]
B. Write Python statement to display the data of Topper column of indexes CO2 to CO4.

Worksheet - Difficult Questions (L3):
If df is as given below, find the output of 1 to 14 and write commands for 15 to 20
1. print(df.loc['a':'d':2])
2. print(df.loc['b':'d','Name'])
3. print(df.loc[['b','d'],'Name'])
4. print(df.loc['a':'d':2,['Name','Age']])
5. print(df.loc['a':'d':2,'Name':'University'])
6. print(df.at['b','Name'])
7. df.at['b','Name']='Ravi'
print(df)
8. print(df.iat[2,1])
9. df.iat[2,1]=111
print(df)
10. print(df.iloc[2])
11. print(df.iloc[2:4])
12. print(df.iloc[2,2])
13. print(df.iloc[1:,1:])
14. df.iloc[2,2]='RU'
15. Display the details of Students who are from BHU university
16. Display the details of Students whose age is more than 21
17. Display the names of Students who are from JNU University
18. Display name and age whose university is DU
19. Give all the possible ways of displaying column Age
20. Set all the values to 0

Additional Practice Questions on Series:


1. What do you mean by pandas in Python?
2. Name three data structures available in pandas.
3. What do you mean by a Series in Python?
4. Write the code in Python to create an empty Series.
5. Name a method which is used to create a Series in Python.
6. Write a program in Python to create a Series of the first five even numbers.
7. Write a program in Python to create a Series of vowels.
8. Write a program in Python to create a Series from the given tuple: A=(11,22,33,44,55)
9. Write a program in Python to create a pandas Series of all the characters in the name
accepted from the user.
10. Write a program in Python to create a Series from the given dictionary.
D={"Jan":31,"Feb":28,"Mar":31}
11. Write a program in Python to create a Series from a dictionary that stores classes
(8,9,10,11,12) as keys and the number of students as values.
12. Write the output of the following:
import pandas as pd
S1=pd.Series(15,index=[1,2,3])
print(S1)
13. Write the output of the following:
import pandas as pd
S1=pd.Series(range(2,16,2),index=[a for a in "super"])
print(S1)
14. Write the output of the following:
import pandas as pd
S1=pd.Series(range(101,151,11),index=[a for a in "My name is Arpita Misra".split()])
print(S1)
15. Write the output of the following:
import pandas as pd
L1=[1,"A",23]
S1=pd.Series(data=2*L1)
print(S1)
16. Name any two attributes of series in python.
17. Which property of a Series returns all the index values?
18. Which property of a Series returns the number of elements in the Series?
19. Write the output of the following:
import numpy as num
import pandas as pd
arr=num.array([1,7,21])
S1=pd.Series(arr)
print(S1)
20. Write the output of the following:
import numpy as num
import pandas as pd
arr=num.array([1,7,21])
S1=pd.Series(arr,index=(88,888))
print(S1)
21. Write the output of the following:
import numpy as num
import pandas as pd
arr=num.array([21,57,131])
S1=pd.Series(arr,index=(8,88,888))
print(S1[888])
22. Write the output of the following :
import numpy as num
import pandas as pd
arr=num.array([21,57,141])
S1=pd.Series(arr)
print(S1[0])
23. Write the output of the following :
import pandas as pd
L1=list("My name is Aarthi")
S1=pd.Series(L1)
print(S1[0])
24. Write the output of the following :
import pandas as pd
L1=list("My name is Aarthi".split( ))
S1=pd.Series(L1)
print(S1[0])
25. Give an example of creating Series from numpy array.
26. Which property of a Series helps to check whether a Series is empty or not? Explain with an
example.
27. Fill in the blanks in the given code.
import pandas as pd
__________= _____________.Series([1,2,3,4,5])
print(S1)
28. Fill in the blank in the given code if the output is 71.
import pandas as pd
S1=pd.Series([10,20,30,40,71,50])
print(S1[________])
29. Complete the code to get the required output.
import ______ as pd
________=pd.Series([21,28,41],
index=["Jan","Feb","Mar"])
print(S1["________"])

Output:
28
30. Explain any three methods of pandas Series.

Importing and Exporting data between CSV files and Dataframes

CSV files
• Comma separated values files
• Data in tabular format
• Can be imported and exported from programs

To create a CSV file


• Open Notepad and create a new file
• Enter the data with values separated by commas and each row on a new line
• Save the file with the extension .csv

Importing data to dataframe from csv file


Function used
pd.read_csv() is the function used to read a CSV file into a DataFrame
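A minimal sketch of importing CSV data with pd.read_csv(). In practice the argument would be a file path such as "student.csv" (a hypothetical file name); here an io.StringIO object holds hypothetical CSV text, so the example runs without creating a file. The first line supplies the column names, and each following line becomes one row.

```python
import io
import pandas as pd

# Hypothetical CSV contents: header line first, then one row per line
csv_text = """RollNo,Name,Marks
1,Amit,85
2,Bela,92
3,Chirag,78
"""

# read_csv accepts a file path or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```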

Exporting data from dataframe to csv file
Function used
DataFrame.to_csv() is the function used to write a DataFrame to a CSV file
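A sketch of exporting a DataFrame with to_csv() and reading the file back to check it. The employee data and the file name employee.csv are hypothetical, and the file is written inside a temporary folder so that nothing is left behind.

```python
import os
import tempfile
import pandas as pd

# Hypothetical employee details
emp = pd.DataFrame({"EmpNo": [101, 102, 103],
                    "Name": ["Ria", "Sam", "Tara"],
                    "Salary": [45000, 52000, 48000]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "employee.csv")   # hypothetical file name
    emp.to_csv(path, index=False)   # index=False: do not write the row labels 0, 1, 2
    with open(path) as f:
        text = f.read()             # the raw CSV text that was written
    back = pd.read_csv(path)        # read the same file back into a DataFrame

print(text)
print(back)
```

Without index=False, to_csv() also writes the row index as an extra unnamed first column, which then reappears as a column when the file is read back.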

Worksheet for CSV files


1 Full form of CSV is …………………..
2 The function used to import data from csv file to dataframe is ………….
3 The function used to export data to csv file from dataframe is ………….
4 Write a program to export data to csv from a dataframe containing employee details.
5 Write a program to import data from csv file containing student details to dataframe and
display it.
