Data Handling using pandas - I Q & ANS (1)
Worksheet
1. Write Python pandas code to create a one-dimensional Series of size 8 with all elements zero.
Assign 20 to the 2nd element. (1)
Ans:
import pandas as pd
import numpy as np
a=np.zeros(8,dtype=int)
s=pd.Series(a)
print(s)
s[1]=20   # the 2nd element has index label 1
print(s)
OR
import pandas as pd
s=pd.Series(0,index=[0,1,2,3,4,5,6,7])
print(s)
s[1]=20
print(s)
Ans:
(i) b 30
d 40
(ii) b 20
b 30
3. Differentiate between the Series and DataFrame data structures. (2)
Series
➢ One-dimensional array
➢ Homogeneous data
➢ One axis: it has rows only, and therefore only a row index.
➢ Data is mutable.
➢ Size is immutable.
DataFrame
➢ A two-dimensional labelled data structure, like a MySQL table.
➢ Heterogeneous data
➢ Two axes: it has rows and columns, and therefore both a row index and a column index.
➢ Data is mutable.
➢ Size is mutable.
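Some of the differences listed above can be checked directly in code. The sketch below uses made-up values. It shows the number of axes (ndim), per-column dtypes in a DataFrame, in-place changes to data, and a DataFrame growing when a new column is added.

```python
import pandas as pd

# A Series has one axis (a row index only)
s = pd.Series([10, 20, 30])
print(s.ndim)    # 1

# A DataFrame has two axes (row index and column index),
# and each column may hold a different dtype
df = pd.DataFrame({'name': ['ali', 'zubair'], 'age': [16, 17]})
print(df.ndim)   # 2
print(df.dtypes)

# Data is mutable in both: change values in place
s[0] = 99
df.loc[0, 'age'] = 18

# A DataFrame's size is mutable: adding a column enlarges it
df['mark'] = [80, 75]
print(df.shape)  # (2, 3)
```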
6 70
7 80
8 90
dtype: int64
0 1
1 30 40
2 50 60
3 70 80
dtype: int64
5. Write a Python program to create a pandas DataFrame from a dictionary of Series. (2)
Ans:
import pandas as pd
s1 = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
s2 = pd.Series([10,20,30,40,50], index=['a','b','c','d','e'])
# each dictionary key becomes a column label; the Series index becomes the row index
df = pd.DataFrame({'s1': s1, 's2': s2})
print(df)
Output:
s1 s2
a 1 10
b 2 20
c 3 30
d 4 40
e 5 50
6. Write a Python program to create a pandas DataFrame from a list of dictionaries. (2)
Ans:
import pandas as pd
# a key missing from a dictionary ('mark' for zubair and amaan) becomes NaN
df=pd.DataFrame([{'name':'ali','age':16,'mark':80},
                 {'name':'zubair','age':17},
                 {'name':'amaan','age':16}])
print(df)
Output:
name age mark
0 ali 16 80.0
1 zubair 17 NaN
2 amaan 16 NaN
7. Give the output of the following code: (1)
import pandas as pd
data={'Name':pd.Series(['Anoop','Abhi','Raju','Mitu']),'Age':pd.Series([16,15,17,18]),
'Score':pd.Series([57,97,76,65])}
df=pd.DataFrame(data)
print("Dataframe contents")
print("*********************")
print(df)
Ans:
Dataframe contents
*********************
Name Age Score
0 Anoop 16 57
1 Abhi 15 97
2 Raju 17 76
3 Mitu 18 65
8. Differentiate between the loc[] and iloc[] indexers. Illustrate them with the help of a program. (2)
Ans:
loc[]: used for selecting or setting elements of a DataFrame (or Series) by label, i.e. by row
label or column name. A label slice such as 's1':'s3' includes the end label.
iloc[]: used for selecting or setting elements by integer position (position-based indexing).
A position slice such as 0:3 excludes the end position.
Both are indexers accessed with square brackets, not functions called with ().
import pandas as pd
print("creation of dataframe from dictionary of lists")
df2=pd.DataFrame({'name':['ali','giri','mini','geena','meena','reena'],
                  'age':[15,16,17,16,16,17],
                  'mark':[60,70,80,90,85,75]},
                 index=['s1','s2','s3','s4','s5','s6'])
print(df2)
print("to display the rows s1,s3,s5 and columns name,mark using loc")
print(df2.loc[['s1','s3','s5'],['name','mark']])
print("to display the rows s1,s3,s5 and columns name,mark using iloc")
print(df2.iloc[0:5:2,0:3:2])   # rows 0,2,4 (s1,s3,s5); columns 0,2 (name,mark)
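The two indexers also differ in how they treat the end of a slice. The sketch below illustrates this with a smaller, made-up DataFrame rather than df2. It also shows that both indexers can set values as well as select them.

```python
import pandas as pd

# Small illustrative DataFrame with string row labels
df = pd.DataFrame({'name': ['ali', 'giri', 'mini', 'geena'],
                   'mark': [60, 70, 80, 90]},
                  index=['s1', 's2', 's3', 's4'])

# loc slices by label and INCLUDES the end label: rows s1, s2, s3
by_label = df.loc['s1':'s3']
print(by_label)

# iloc slices by position and EXCLUDES the end position: rows 0 and 1 (s1, s2)
by_position = df.iloc[0:2]
print(by_position)

# Both can set values: the same cell addressed by label, then by position
df.loc['s2', 'mark'] = 72
df.iloc[1, 1] = 73           # row 1 is 's2', column 1 is 'mark'
print(df.loc['s2', 'mark'])  # 73
```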
13. Given two objects, a list named Mylist and a Series named MySeries, both containing the
values 1, 3, 5, 7, 9, find the output produced by the following two statements:
a. print(Mylist * 2) b. print(MySeries * 2) (2)
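No answer is given for Q13 above. Here is a minimal sketch that shows the behaviour, assuming both objects are built from 1, 3, 5, 7, 9 and MySeries uses the default index 0 to 4:

```python
import pandas as pd

Mylist = [1, 3, 5, 7, 9]
MySeries = pd.Series([1, 3, 5, 7, 9])

# a. Multiplying a list by 2 repeats it (the list is concatenated with itself)
print(Mylist * 2)     # [1, 3, 5, 7, 9, 1, 3, 5, 7, 9]

# b. Multiplying a Series by 2 multiplies every element (vectorised arithmetic)
print(MySeries * 2)   # values 2, 6, 10, 14, 18 against index 0 to 4
```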
14. Write a Python program that creates the two Series given below, performs any four
arithmetic operations on them and prints the results. (2)
Series1 Series2
0 10 0 2
1 20 1 3
2 30 2 4
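One possible program for Q14 is sketched below. It assumes both Series use the default index 0 to 2, as in the table. Addition, subtraction, multiplication and division are applied element by element, and values are matched by index label.

```python
import pandas as pd

Series1 = pd.Series([10, 20, 30])
Series2 = pd.Series([2, 3, 4])

# Each operation pairs values with the same index label
print(Series1 + Series2)   # 12, 23, 34
print(Series1 - Series2)   # 8, 17, 26
print(Series1 * Series2)   # 20, 60, 120
print(Series1 / Series2)   # 5.0, 6.666667, 7.5 (division gives floats)
```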
Ans:
b 20
c 30
dtype: int64
19. Explain Boolean indexing in a DataFrame and illustrate it with a program. (2)
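In pandas, Boolean indexing usually means selecting rows with a condition. The condition produces one True/False value per row, and passing that Boolean Series inside [] keeps only the rows marked True. A sketch with made-up data, taking a pass mark of 33 as an example:

```python
import pandas as pd

df = pd.DataFrame({'name': ['ali', 'giri', 'mini', 'geena'],
                   'mark': [60, 25, 80, 30]})

# The condition gives a Boolean Series, one value per row
mask = df['mark'] >= 33
print(mask)

# Only the rows where mask is True are kept
passed = df[mask]
print(passed)

# Conditions combine with & (and) / | (or); each condition needs its own brackets
low_not_giri = df[(df['mark'] < 33) & (df['name'] != 'giri')]
print(low_not_giri)
```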
20. Write a program to create a DataFrame and perform the following operations on its rows
and columns. (6)
(i) Create a new row in the existing DataFrame
(ii) Create a new column in the existing DataFrame
(iii) Print the first 3 rows
(iv) Print the first and third columns
(v) Delete a column using the drop() function
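No answer is given for Q20. The sketch below covers the five operations. The sample data, the new row label 's4' and the 'grade' column are illustrative assumptions.

```python
import pandas as pd

# Illustrative starting DataFrame
df = pd.DataFrame({'name': ['ali', 'giri', 'mini'],
                   'age': [15, 16, 17],
                   'mark': [60, 70, 80]},
                  index=['s1', 's2', 's3'])

# (i) New row: assigning to a new label with loc adds a row
df.loc['s4'] = ['geena', 16, 90]

# (ii) New column: assign a list with one value per row
df['grade'] = ['C', 'B', 'A', 'A']

# (iii) First 3 rows
print(df.head(3))

# (iv) First and third columns (positions 0 and 2: name and mark)
print(df.iloc[:, [0, 2]])

# (v) Delete a column with drop(); axis=1 means "columns"
df = df.drop('grade', axis=1)
print(df)
```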
21. Write the command to find the sum of the Series S1 and S2. (1)
Ans:
print(S1+S2)
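The + operator matches values by index label, not by position. A label found in only one Series gives NaN in the result. The sketch below uses hypothetical S1 and S2 with partly overlapping labels. It also uses the Series add() method with fill_value to treat a missing label as 0.

```python
import pandas as pd

# Hypothetical Series whose labels only partly overlap
S1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
S2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# a -> NaN (only in S1), b -> 21.0, c -> 32.0, d -> NaN (only in S2)
print(S1 + S2)

# a -> 10.0, b -> 21.0, c -> 32.0, d -> 3.0
print(S1.add(S2, fill_value=0))
```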
Ans:
i. stud['mark']=[30,45]
ii. stud.loc['S3']=[105,'Murali','X']
23. Consider two objects x and y. x is a list whereas y is a Series. Both contain the values
20, 40, 90, 110. What will be the output of the following two statements, assuming both
objects have already been created? (3)
a. print(x*2) b. print(y*2)
Justify your answer.
Ans:
a. gives the output: [20, 40, 90, 110, 20, 40, 90, 110]
b. gives the output:
0 40
1 80
2 180
3 220
Justification: x is a list, so multiplying it by a number n repeats (replicates) the whole list
n times. y is a Series, so multiplying it by a number multiplies each element of the Series by
that number (vectorised arithmetic).
24. The command used to display the last 2 rows in a dataframe df is………… (1)
Ans:
df.tail(2)
25. Name any one python library generally used for data analysis. (1)
Ans: pandas
Ans:
iii) option a and c
a) Replace the index with student name as [Siya, Ram, Fiza, Diya, Manish].
b) Display the failed students (passing mark is 33)
Ans:
a) S.index=['Siya','Ram','Fiza','Diya','Manish']
b) print(S[S<33])
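The original Series S for this question is not shown. The sketch below assumes five hypothetical marks so that both answers can be run together.

```python
import pandas as pd

# Hypothetical marks: the worksheet's original Series S is not shown
S = pd.Series([45, 28, 67, 30, 90])

# a) Replace the default index 0-4 with student names
S.index = ['Siya', 'Ram', 'Fiza', 'Diya', 'Manish']

# b) Boolean indexing keeps only marks below the pass mark of 33
print(S[S < 33])   # Ram 28, Diya 30
```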
28. Consider the following DataFrame df and answer any four questions from (i) to (iv)
i) Write down the command to add a new column “Height” with values
156,173,140,146,185 (1)
a) df['Height']=[156,173,140,146,185]
b) df.Height=[156,173,140,146,185]
c) df(Height)=[156,173,140,146,185]
d) both (a) and (b)
Ans:
a) df['Height']=[156,173,140,146,185]
ii) Write down the command to display the column “Name” from the dataframe. (1)
a) print(df.Name) b) print(df[column]='Name')
c) print(df['Name']) d) Both (a) and (c)
Ans:
d) Both (a) and (c)
iii) Write command to display the number of rows and columns in dataframe. (1)
a) print(df.size) b) print(df[index,column])
c) print(df.shape) d) print(df.ndim)
Ans:
c) print(df.shape)
iv) Write the command to delete the column “Age” from the dataframe. (1)
a) del df['Age'] b) drop df['Age']
c) df.del['Age'] d) drop['Age']
Ans:
a) del df['Age']
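The DataFrame df for Q28 is not reproduced in the worksheet. The sketch below therefore assumes a hypothetical five-row df with Name and Age columns and runs the correct option for each part. It also shows why option (b) in part (i) fails: attribute assignment cannot create a new column.

```python
import pandas as pd

# Hypothetical df: the worksheet's original DataFrame is not shown
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                   'Age': [15, 16, 14, 15, 17]})

# (i) Option (a) adds the new column
df['Height'] = [156, 173, 140, 146, 185]

# Option (b) does not create a NEW column: pandas sets an ordinary
# Python attribute instead (and emits a UserWarning)
tmp = pd.DataFrame({'Name': ['A', 'B']})
tmp.Weight = [50, 60]
print('Weight' in tmp.columns)   # False

# (ii) Both forms select the Name column
print(df.Name)
print(df['Name'])

# (iii) shape gives (rows, columns); size is rows * columns
print(df.shape)   # (5, 3)
print(df.size)    # 15

# (iv) del removes the column in place
del df['Age']
print(df.columns.tolist())   # ['Name', 'Height']
```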
29. Create a Series named Month (Jan to May) from a dictionary that has the month names as
keys and the number of days as values. (2)
Ans:
import pandas as pd
dic={'Jan':31,'Feb':28,'Mar':31,'Apr':30,'May':31}
Month=pd.Series(dic)
print(Month)
30. The average marks of three subjects in three divisions are given below: (5)
Ans:
i) import pandas as pd
dic={'Division A':{'English':65,'Maths':45,'Science':87},
'Division B':{'English':67,'Maths':34,'Science':87},
'Division C':{'English':87,'Maths':87,'Science':56}}
d=pd.DataFrame(dic)
print(d)
ii) print(d.rename(columns={'Division C':'Division D'}))
iii) print(d['Division A'])
iv) print(d.index.values)
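Two points in the answer above are easy to miss. First, in the nested dictionary the outer keys become column labels and the inner keys become row labels. Second, rename() returns a new DataFrame and leaves d unchanged unless the result is kept. A short sketch reusing the same data:

```python
import pandas as pd

dic = {'Division A': {'English': 65, 'Maths': 45, 'Science': 87},
       'Division B': {'English': 67, 'Maths': 34, 'Science': 87},
       'Division C': {'English': 87, 'Maths': 87, 'Science': 56}}
d = pd.DataFrame(dic)

# Outer keys -> columns, inner keys -> row index
print(d.columns.tolist())   # ['Division A', 'Division B', 'Division C']
print(d.index.tolist())     # ['English', 'Maths', 'Science']

# rename() returns a NEW DataFrame; d itself still has 'Division C'
renamed = d.rename(columns={'Division C': 'Division D'})
print(renamed.columns.tolist())   # ['Division A', 'Division B', 'Division D']
print(d.columns.tolist())         # ['Division A', 'Division B', 'Division C']

# Keep the new name by assigning the result back
d = renamed
```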