DXE 24gksmknvj

This document is a question paper for a Data Exploration and Visualization course, containing instructions and questions divided into two sections. Section A is compulsory and consists of various programming tasks related to Python libraries such as NumPy, Pandas, and Matplotlib. Section B includes additional questions that require practical coding skills and understanding of data manipulation and visualization techniques.

Uploaded by

royalnitinrao1

This question paper contains 16 printed pages.

Your Roll No. ....

Sr. No. of Question Paper : 4270 H
Unique Paper Code : 2343012009
Name of the Paper : Data Exploration and Visualization
Name of the Course : Common Prog Group (DSE/GE)
Year of Admission : 2022
Semester : IV

Duration: 3 Hours Maximum Marks: 90

Instructions for Candidates


1. Write your Roll No. on the top immediately on receipt of this question paper.

2. Section A is compulsory.
3. Attempt any four questions from Section B.
4. All the parts of a question must be answered together.
5. Assume that numpy has been imported as np and
pandas has been imported as pd.


Section A

1. (a) What will be the output produced on the execution of the following code? (2)

narr = np.array([-4.6, 8.2, 14.5, 56.9, 29.4, -72.7])
narr1 = narr.astype(np.int32)
print(narr1)

(b) Consider a DataFrame df1 with a column EmpId. Write a Python statement to remove rows containing duplicate EmpId values from df1 and keep only the last observed value combination. (2)

(c) (i) Plot the graph generated by the code below: (2)

import matplotlib.pyplot as plt
plt.plot([5, 4, 3, 2])

(ii) Show the change in plot, if any, when the code is modified to

plt.plot ([5, 4, 3, 2], *b')

(d) Write a code in Python using the plotly express library to draw a line graph for the function y = sin(x) for x in the range 0 to π (pi). (3)

(e) Given the following arrays: (4)

arr_x = np.array([[3, 5, 7], [2, 4, 6]])
arr_y = np.array([[1, 2, 3], [4, 5, 6]])

What will be the output produced on the execution of the following statements?

(i) arr_x.dtype

(ii) arr_y - arr_x

(iii) 1/arr_y

(iv) arr_x > arr_y

(f) What are the following functions used for? Illustrate with suitable examples. (4)

(i) reindex()

(ii) arange()

(g) Differentiate between the use of the following Python functions/data structures with the help of suitable examples. (4)

(i) np.random.rand and np.random.randn

(ii) Series and DataFrames


(h) (4)
data = pd.DataFrame({
'Dept': ['X', 'X', 'Y', 'Y',
'Age': [19, 21, 23, 21, 20],
'Category': ['I', 'II', 'I', 'II',
'Salary': [1800, 2100, 2000, 2200, 1100]
})

For the DataFrame df given above, write Python statements to:

(i) Generate a cross-tabulation to summarize the data by Dept and Category.

(ii) Define a function to select the row with the largest values in the Salary column.

(iii) Group the data by Dept.

(iv) Apply the function defined in part (ii) to print the row with the maximum Salary for each group.

(i) State whether the following statements are True or False. (5)

(i) Reshape and pivot tables are used for rearranging tabular data.

(ii) The dimension of a numpy array with shape (2, 1, 4) is 8.

(iii) Plotly is an open-source library for interactive visualization.

(iv) The plot() function in the module matplotlib.pyplot draws scatter plots by default.

(v) Pandas is designed to work with Series and DataFrames while NumPy is best suited for numeric arrays.
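
As an illustration of two ideas that Section A touches on (casting with astype in question 1(a), and the difference between the number of dimensions and the number of elements of an array), a minimal sketch is given below. The variable name arr is an assumption introduced only for this example.

```python
import numpy as np

# Array from question 1(a): astype(np.int32) discards the fractional part,
# truncating toward zero rather than rounding.
narr = np.array([-4.6, 8.2, 14.5, 56.9, 29.4, -72.7])
narr1 = narr.astype(np.int32)
print(narr1.tolist())  # → [-4, 8, 14, 56, 29, -72]

# Hypothetical array of shape (2, 1, 4): ndim counts the axes,
# size counts the elements.
arr = np.ones((2, 1, 4))
print(arr.ndim, arr.size)  # → 3 8
```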

Section B

2. (a) Given the array arr1: (5)

arr1 = np.array([[[3, -7], [9, 2]], [[-15, 4], [1, 2]]])

Write numpy commands to perform the following operations:

(i) Create an array of ones with the same shape as arr1.

(ii) Print the elements of arr1 which are less than 2.

(iii) Print the dimensions of arr1.

(iv) Print the datatype of elements stored in arr1.

(b) Consider the following DataFrame df. (5)

   Col1  Col2
B  23    52
A  43    89
C  12    86

Write statements to perform the following operations:

(i) Set the name of the index as ID.

(ii) Add column Col3 with the values 23, 12, 10.

(iii) Find the index for the row containing the maximum value of Col2.

(iv) Reindex df in the order A, B, and C.

(v) Delete the row corresponding to index

(c) In which situations is the read_csv() function used? List any two parameters used in the function and describe their significance giving suitable examples. (5)
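
One way to picture the role of read_csv() parameters is the sketch below, which uses sep and index_col. The CSV text, the column names id and score, and the use of io.StringIO in place of a file on disk are all assumptions made for this illustration.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated text standing in for a file on disk.
csv_text = "id;score\na;10\nb;20\n"

# sep sets the field delimiter; index_col names the column used as the row index.
df = pd.read_csv(io.StringIO(csv_text), sep=";", index_col="id")
print(df)
```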

3. (a) Given the following Series ser1 and ser2 in Pandas: (5)

ser1      ser2
A  3      A  9
B  2      B  8
C  1      D  6
D  4         7

What will be the output produced on the execution of the following code?

(i) ser1.values

(ii) ser2.index

(iii) ser2[:3] * 5

(iv) ser1 ser2

(v) ser2|: :

(b) How are missing values represented in Pandas? Explain how the missing values are detected, removed and replaced using functions in Pandas. (6)

(c) Consider the year-wise sales made by a salesman in the last ten years for product A: [80, 92, 88, 89, 100, 70, 78, 88, 80, 34] and product B: [35, 76, 35, 88, 100, 48, 79, 88, 65, 35]. Write Python statements to

(i) Import the necessary libraries and draw a scatter plot of year-wise sales of products A and B.

(ii) Label Y-axis of the plot as 'Sales'.

(iii) Title the plot as 'Comparison of Sales'.

(iv) Place ticks on the Y-axis as 30, 40, 50, ..., 100. (4)
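
The handling of missing values raised in question 3(b) might be sketched as follows, using isna(), dropna() and fillna(). The Series contents and the fill value 0.0 are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series; both np.nan and None are stored as NaN in a float Series.
ser = pd.Series([1.0, np.nan, 3.0, None])

mask = ser.isna()         # detect: True where a value is missing
dropped = ser.dropna()    # remove: keeps only the non-missing entries
filled = ser.fillna(0.0)  # replace: substitutes 0.0 for each missing entry

print(mask.tolist())     # → [False, True, False, True]
print(dropped.tolist())  # → [1.0, 3.0]
print(filled.tolist())   # → [1.0, 0.0, 3.0, 0.0]
```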

4. (a) Consider the DataFrames First and Second given below: (5)

First Second
C_One C_Two    C_One C_Two
0 W 0 X
2 X 1 7

3
U
6 7 2 W

Consider the following Python code segment:

right_res = pd.merge(First, Second, how='right', on='C_Two')
left_res = pd.merge(First, Second, how='inner', on='C_One')

Show the contents of the new DataFrames right_res and left_res.

(b) Consider the given Series names and data given below: (5)

names = np.array(['Ram', 'Raj', 'Rahim', 'Shamshi', 'Gourav', 'Raj', 'Raj'])
data = np.arange(14).reshape(7, 2)

What will be the output produced on the execution of the following code?


(i) print(data.T)

(ii) print(data[~(names == 'Raj'), 2:])

(iii) print(data[[-2, -4, -7]])

(iv) print(data[[1, 5, 2], [0, 0, 1]])

(v) d1 = data[0].copy()
d1 = 13
print(data)

(c) Consider the monthly rent (in INR thousands) for ten apartments in a city: (5)

rent = [20, 12.5, 35, 47.5, 56, 22, 17.5, 31, 28, 40]

Write Python code to:

(i) create a Pandas object to hold rents of apartments.

(ii) display the average and total of the given rent values.

(iii) display the rent values greater than the median rent.

(iv) draw a line graph using matplotlib, plotting data as a dashed line with black color, and the points marked with an x sign on the drawn line.
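
The effect of the how argument of pd.merge, which question 4(a) relies on, can be sketched with two small frames. The frames left and right and their column names are hypothetical and are not the First and Second DataFrames of the question.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": [1, 2, 3], "l_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r_val": ["x", "y", "z"]})

# how="inner" keeps only the keys present in both frames.
inner = pd.merge(left, right, how="inner", on="key")

# how="right" keeps every key of the right frame; columns coming from
# the left frame are NaN where no matching key exists.
right_join = pd.merge(left, right, how="right", on="key")

print(inner["key"].tolist())       # → [2, 3]
print(right_join["key"].tolist())  # → [2, 3, 4]
```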

5. (a) Given the DataFrame df: (7.5)

df = pd.DataFrame({
'category': ['A', 'B', 'B', 'A'],
'type': ['X', 'Y', 'Z'
'value': np.arange(10, 20, 2)
})

What will be the output produced on the execution of the following code?

(i) print(df)

(ii) x1 = df['value'].groupby([df['category'], df['type']]).mean()
print(x1)

(iii) x2 = df['value'].groupby(df['type']).mean()
print(x2)

(iv) x3 = dict(list(df.groupby('key1')))
print(x3['a'])

(b) Consider the following data about five students enrolled in a course: (7.5)

RollNo  Dept   Age  Marks
2101    CS     18   87
2102    CS     19   45
2103    CS     19   69
3101    Stats  20   91
3102    Stats  21   56

Write commands in Python to

(i) Create a DataFrame df to store the above data.

(ii) Give the summary level statistics for data stored in df.

(iii) Compute the correlation between columns Age and Marks.

(iv) Compute the average marks obtained by students of each department.
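
One possible approach to the operations listed in question 5(b) is sketched below with describe(), Series.corr() and groupby(). The variable names summary, corr and avg are assumptions introduced for the example, and other formulations are equally valid.

```python
import pandas as pd

# Student data from the table in question 5(b).
df = pd.DataFrame({
    "RollNo": [2101, 2102, 2103, 3101, 3102],
    "Dept": ["CS", "CS", "CS", "Stats", "Stats"],
    "Age": [18, 19, 19, 20, 21],
    "Marks": [87, 45, 69, 91, 56],
})

summary = df.describe()                       # count, mean, std, quartiles of numeric columns
corr = df["Age"].corr(df["Marks"])            # Pearson correlation between Age and Marks
avg = df.groupby("Dept")["Marks"].mean()      # average marks per department

print(summary)
print(corr)
print(avg.to_dict())  # → {'CS': 67.0, 'Stats': 73.5}
```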

6. (a) Write Python commands to plot a 2 x 2 grid similar to the one shown below using matplotlib's figure and add_subplot method such that: (7)

(i) Plot 1 is a scatter plot with 10 random points.

(ii) Plot 2 is a line plot with 5 points in decreasing order.

(iii) Plot 3 is a histogram with 50 randomly generated numbers plotted in 10 bins.

(iv) Plot 4 is empty.

[Figure: 2 x 2 grid of subplots showing a scatter plot, a line plot, a histogram, and an empty plot]

(b) Consider the file data.csv with the following contents: (8)

1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

(i) Write a Python statement to read the above file into a DataFrame data with the last column (hello, world, and foo) constituting the index.

(ii) List the contents of DataFrame data.

(iii) Write a Python statement to store the DataFrame data in the file updated_data.csv using colon (:) as the separator.

(iv) List the contents of the text file updated_data.csv created in the previous step.
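
The round trip described in question 6(b) could be sketched as below. To keep the example self-contained, io.StringIO stands in for data.csv and to_csv() returns a string instead of writing updated_data.csv; both substitutions are assumptions of this sketch, as is the choice of header=None for a file without a header row.

```python
import io

import pandas as pd

# Text standing in for the contents of data.csv.
text = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# header=None: the file has no header row; index_col=4: the last column becomes the index.
data = pd.read_csv(io.StringIO(text), header=None, index_col=4)
print(data)

# With no path given, to_csv returns the CSV text; sep=":" sets the separator.
out = data.to_csv(sep=":")
print(out)
```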

7. (a) Consider the lists containing names of some tech companies and their share prices given below as Python lists: (5)

company_names = ['Tesla', 'Apple', 'Google', 'Amazon']
share_price = [25, 25, 40, 10]

Write a code in Python using the plotly graph_objects module to plot a pie-chart with company names as labels and share prices as values.
(b) What will be the output produced on the execution of the following code? (10)

(i) a = np.arange(16).reshape(4, 4)
print(a)
a[(a 7) ] = -1
print(a)

(ii) x = np.array([[4, 3], [2, 1]])
y = np.zeros_like(x)
print(y)
z = np.eye(4) * x
print(z)

(iii) obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
print(obj)
obj.reindex(range(6), method='ffill')
print(obj)

(iv) obj1 = pd.Series(['X', 'Y', 'A', 'B'], index=['one', 'two', 'three', 'five'])
print(obj1)
print(obj1['three':'five'])

(v) df = pd.DataFrame([[1, 2, 1], [3, 1, 1], [4, 2, 0], [1, 2, 3]],
index=['x1', 'x2', 'x3', 'x4'],
columns=pd.Index(['P', 'Q', 'R'], name='Plot'))
df.plot.bar()
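
A point that question 7(b)(iii) turns on is that reindex returns a new object and leaves the original unchanged. A minimal sketch follows; the name filled is an assumption introduced to hold the result.

```python
import pandas as pd

obj = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# reindex builds a new Series; method="ffill" carries the last seen value
# forward into the newly introduced labels 1, 3 and 5.
filled = obj.reindex(range(6), method="ffill")

print(filled.tolist())  # → ['blue', 'blue', 'purple', 'purple', 'yellow', 'yellow']
print(len(obj))         # → 3 (the original Series is not modified)
```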
