DL Assignment Solution 00 To 10


NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 0
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________

QUESTION 1:
Find df/dx, where f = |x|. (|x| denotes the absolute value of x.)

a. 1
b. 𝑆𝑖𝑔𝑛(𝑥)
c. 0
d. ∞

Correct Answer: b

Detailed Solution:

df/dx = 1 for x > 0 and −1 for x < 0 (the derivative is not defined at x = 0), i.e. df/dx = sign(x).
______________________________________________________________________________

QUESTION 2:
Find dσ/dx, where σ(x) = 1/(1 + e^(−x)).

a. dσ/dx = 1 − σ(x)
b. dσ/dx = 1 + σ(x)
c. dσ/dx = σ(x)(1 − σ(x))
d. dσ/dx = σ(x)(1 + σ(x))

Correct Answer: c

Detailed Solution:

σ(x) = 1/(1 + e^(−x))

dσ/dx = (1 + e^(−x))^(−2) · e^(−x)

dσ/dx = e^(−x)/(1 + e^(−x))²
      = (1 + e^(−x) − 1)/(1 + e^(−x))²
      = 1/(1 + e^(−x)) − 1/(1 + e^(−x))²
      = 1/(1 + e^(−x)) · (1 − 1/(1 + e^(−x)))

dσ/dx = σ(x)(1 − σ(x))
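The identity dσ/dx = σ(x)(1 − σ(x)) can also be verified numerically against a finite-difference approximation; the sketch below is illustrative (the helper names are ours):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Analytic derivative: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(f, x, h=1e-6):
    # Central finite-difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Analytic and numeric derivatives agree at several sample points
for x in (-2.0, 0.0, 0.5, 3.0):
    assert abs(sigmoid_grad(x) - numeric_grad(sigmoid, x)) < 1e-6
```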
______________________________________________________________________________

QUESTION 3:
There are 5 black and 7 white balls. Assume we draw two balls randomly, one by one, without replacement. What is the probability that both balls are black?

a. 20/132
b. 25/144
c. 20/144
d. 25/132

Correct Answer: a

Detailed Solution:

Probability that the first ball is black = 5/(5 + 7) = 5/12.

Probability that the second ball is black, given the first was black = 4/(4 + 7) = 4/11.

So the probability that both balls are black = (5/12) × (4/11) = 20/132.
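The same computation can be done with exact fractions in Python (illustrative sketch):

```python
from fractions import Fraction

# Two draws without replacement from 5 black + 7 white balls
p_first_black = Fraction(5, 12)               # 5 black out of 12
p_second_black_given_first = Fraction(4, 11)  # 4 black out of 11 remaining
p_both_black = p_first_black * p_second_black_given_first

assert p_both_black == Fraction(20, 132) == Fraction(5, 33)
```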

______________________________________________________________________________

QUESTION 4:
Two dice are rolled together. What is the probability of getting a 1 and a 4 together?

a. 1/18
b. 1/36
c. 1
d. None of the above

Correct Answer: a

Detailed Solution:

Number of possible outcomes = 6 × 6 = 36.

Number of outcomes with 1 and 4 together = 2 (1 on the first die and 4 on the second, or 4 on the first die and 1 on the second).

So, probability = 2/36 = 1/18.
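Brute-force enumeration of the 36 outcomes confirms the count (illustrative sketch):

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))       # all 36 rolls of two dice
favourable = [o for o in outcomes if sorted(o) == [1, 4]]
probability = len(favourable) / len(outcomes)

assert len(outcomes) == 36 and len(favourable) == 2
assert abs(probability - 1 / 18) < 1e-12
```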

_____________________________________________________________________________

QUESTION 5:
What will be the possible median of the distribution?

a. 26
b. 34
c. 43
d. 55

Correct Answer: b

Detailed Solution:

Total population = 275 + 291 + 105 + 123 + 131 + 150 + 110 + 90 + 60 + 49 + 50 = 1434.

So, the median is the average of the (1434/2) = 717th and 718th values.

So, the median of the distribution lies in the range 30–40.

So, option b may be the result.
______________________________________________________________________________

QUESTION 6:

The image shows three normally distributed probability density functions with zero mean and three different variances (σ1, σ2, σ3). Which of the following relationships is valid?

a. 𝜎1 > 𝜎2 > 𝜎3
b. 𝜎1 < 𝜎2 < 𝜎3
c. 𝜎1 = 𝜎2 = 𝜎3
d. 𝜎1 > 𝜎2 < 𝜎3

Correct Answer: b

Detailed Solution:

Higher variance means the spread of the distribution will be higher. So, 𝝈𝟏 < 𝝈𝟐 < 𝝈𝟑

____________________________________________________________________________

QUESTION 7:
The inverse of a square matrix A exists if:

a. Determinant of A, det(A) = 0
b. Eigenvalues of A are non-zero
c. Sum of the eigenvalues is non-zero
d. None of the above
d. None of the above

Correct Answer: b

Detailed Solution:

The matrix inverse exists iff det(A) is not equal to zero. Since det(A) equals the product of all the eigenvalues of the square matrix, the inverse exists iff all eigenvalues are non-zero.

_________________________________________________________________________

QUESTION 8:
x1, x2, x3 are linearly independent vectors. If x1 = (1, 3, 0)ᵀ and x2 = (−2, 4, −5)ᵀ, what is a possible value of x3?

a. (−1, 7, −5)ᵀ
b. (0, 10, −5)ᵀ
c. (3, 4, 5)ᵀ
d. (5, −5, 10)ᵀ
Correct Answer: c

Detailed Solution:

Let X = [x1 x2 x3]. x1, x2, x3 are linearly independent if and only if det(X) ≠ 0.

For option c:

det([1 −2 3; 3 4 4; 0 −5 5]) = 25 ≠ 0

We can also verify the linear dependence of options a, b, and d:

Option a: x1 + x2 = x3

Option b: 2x1 + x2 = x3

Option d: x1 − 2x2 = x3
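The determinant test can be reproduced in plain Python; a small sketch (the helper name det3 is ours), with the candidate vectors stacked as columns:

```python
def det3(m):
    # Cofactor expansion along the first row of a 3x3 matrix
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

x1, x2, x3 = (1, 3, 0), (-2, 4, -5), (3, 4, 5)    # option c as the third vector
X = [[x1[r], x2[r], x3[r]] for r in range(3)]     # vectors as columns
assert det3(X) == 25                              # non-zero => independent

# Option a is dependent: x1 + x2 equals the candidate (-1, 7, -5)
dep = [[x1[r], x2[r], x1[r] + x2[r]] for r in range(3)]
assert det3(dep) == 0
```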

______________________________________________________________________________

QUESTION 9:
x + 2y − z = 1 … (1)
−2x − 4y + 2z = −2 … (2)
z = 2 … (3)

What are the values of 𝑥, 𝑦, 𝑧?

a. 𝑥 = 0, 𝑦 = 0, 𝑧 = 2
b. 𝑧 = 2 and infinitely possible 𝑥, 𝑦
c. 𝑧 = 2 and no possible 𝑥, 𝑦
d. None of the above

Correct Answer: b

Detailed Solution:

Equation (2) is simply −2 times equation (1), so it adds no independent constraint. From equation (3), z = 2; substituting into equation (1) gives x + 2y = 3, a single equation in two unknowns. Hence z = 2 and there are infinitely many possible (x, y).
____________________________________________________________________________

QUESTION 10:
What are the eigenvalues of the matrix A?
5 4
𝐴=[ ]
−3 −2
a. 4, −3
b. 5, −2
c. −2, −1
d. 2, 1

Correct Answer: d

Detailed Solution:

det(λI − A) = 0
or, det([λ − 5, −4; 3, λ + 2]) = 0
or, (λ − 5)(λ + 2) + 12 = 0
or, λ² − 3λ + 2 = 0
or, λ = 2, 1
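For a 2×2 matrix the eigenvalues follow directly from the characteristic polynomial λ² − tr(A)λ + det(A) = 0; a plain-Python sketch (helper name ours, assumes real eigenvalues):

```python
import math

def eig2(a, b, c, d):
    # Eigenvalues of [[a, b], [c, d]] via the quadratic formula applied to
    # lambda^2 - (a + d)*lambda + (a*d - b*c) = 0 (assumes real roots)
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    return sorted(((tr - disc) / 2, (tr + disc) / 2))

assert eig2(5, 4, -3, -2) == [1.0, 2.0]
```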

______________________________________________________________________________

************END*******

Deep Learning
Assignment- Week 1
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2= 20
______________________________________________________________________________

QUESTION 1:
The signature descriptor of an unknown shape is given in the figure. Can you identify the unknown shape?

a. Circle
b. Square
c. Straight line
d. Cannot be predicted
Correct Answer: a
Detailed Solution:
The distance from the centroid to the boundary is the same for every value of θ. This is true for a circle with radius k.

______________________________________________________________________________

QUESTION 2:
To measure the smoothness, coarseness, and regularity of a region, which of the following transformations do we use to extract features?
a. Gabor Transformation
b. Wavelet Transformation
c. Both Gabor, and Wavelet Transformation.
d. None of the Above.

Correct Answer: c
Detailed Solution:
One of the important approaches to region description is texture content. It provides measures of important properties of an image region such as smoothness, coarseness, and regularity. We use the Gabor filter and the Wavelet transformation to extract texture features.

QUESTION 3:
Suppose the Fourier descriptor of a shape has K coefficients, and we remove the last few coefficients and use only the first m (m < K) coefficients to reconstruct the shape. What will be the effect of using this truncated Fourier descriptor on the reconstructed shape?
a. We will get a smoothed boundary version of the shape.
b. We will get only the fine details of the boundary of the shape.
c. Full shape will be reconstructed without any loss of information.
d. Low frequency component of the boundary will be removed from contour of the
shape.

Correct Answer: a
Detailed Solution:
The low-frequency components of the Fourier descriptor capture the general shape properties of the object, while the high-frequency components capture the finer detail. So, if we remove the last few components, the finer details are lost, and the reconstructed shape is a smoothed version of the original. The boundary of the reconstructed shape is a low-frequency approximation of the original shape boundary.
______________________________________________________________________________
QUESTION 4:

While computing the polygonal descriptor of an arbitrary shape using the splitting technique, which of the following do we take as the starting guess?

a. The vertex joining the two closest points above a threshold on the boundary.
b. The vertex joining the two farthest points on the boundary.
c. The vertex joining any two arbitrary points on the boundary.
d. None of the above.

Correct Answer: b

Detailed Solution:

Options are self-explanatory.

_____________________________________________________________________________

QUESTION 5:
Consider a two-class Bayes' minimum-risk classifier. The probabilities of classes ω1 and ω2 are P(ω1) = 0.3 and P(ω2) = 0.7 respectively. P(x) = 0.545, P(x|ω1) = 0.65, P(x|ω2) = 0.5, and the loss matrix is

[λ11 λ12]
[λ21 λ22]

If the classifier assigns x to class ω1, then which one of the following is true?

a. (λ21 − λ11)/(λ12 − λ22) < 1.79

b. (λ21 − λ11)/(λ12 − λ22) > 1.79

c. (λ21 − λ11)/(λ12 − λ22) < 1.09

d. (λ21 − λ11)/(λ12 − λ22) > 1.09

Correct Answer: b

Detailed Solution:

Deciding x ∈ ω1 under minimum risk requires

(λ21 − λ11)/(λ12 − λ22) > P(ω2|x)/P(ω1|x)

Now, P(ω1|x) = P(ω1) · P(x|ω1) / P(x) = 0.3 × 0.65/0.545 = 0.358

P(ω2|x) = P(ω2) · P(x|ω2) / P(x) = 0.7 × 0.50/0.545 = 0.642

Therefore (λ21 − λ11)/(λ12 − λ22) > 0.642/0.358 ≈ 1.79.
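Plugging the question's numbers into Bayes' rule reproduces the threshold (illustrative sketch; variable names are ours):

```python
# Numbers from the question
p_w1, p_w2 = 0.3, 0.7
p_x = 0.545
p_x_given_w1, p_x_given_w2 = 0.65, 0.5

post_w1 = p_w1 * p_x_given_w1 / p_x   # P(w1 | x)
post_w2 = p_w2 * p_x_given_w2 / p_x   # P(w2 | x)
ratio = post_w2 / post_w1             # bound on (l21 - l11)/(l12 - l22)

assert abs(post_w1 - 0.358) < 1e-3
assert abs(post_w2 - 0.642) < 1e-3
assert abs(ratio - 1.79) < 0.01
```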
____________________________________________________________________________

QUESTION 6:
The Fourier transform of a complex sequence of numbers s(k), for k = 0, …, N − 1, is given by:

a. a(u) = Σ_{k=0}^{N−1} s(k) e^{j2πuk/N}

b. a(u) = Σ_{k=0}^{N} s(k) e^{j2πuk/N}

c. a(u) = Σ_{k=0}^{N−1} s(k) e^{−j2πuk/N}

d. a(u) = Σ_{k=−N/2}^{N/2} s(k) e^{−j2πuk/N}

Correct Answer: c

Detailed Solution:

The forward discrete Fourier transform sums over k = 0, …, N − 1 with a negative exponent in the kernel, so option c is correct.

_____________________________________________________________________________

QUESTION 7:
The gray-level co-occurrence matrix C of an unknown image is given below. What is the value of the maximum-probability descriptor?

    [1 2 2]
C = [2 1 2]
    [2 3 2]

Fig 1: C

a. 3/17

b. 1/12

c. 3/16

d. 5/16

Correct Answer: a

Detailed Solution:

Maximum probability = max(c_ij), where c_ij is the normalized co-occurrence matrix. The sum of all the entries of C is 17 and the largest entry is 3, so the maximum-probability descriptor is 3/17.
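The descriptor can be computed directly from the matrix entries (illustrative sketch):

```python
C = [[1, 2, 2],
     [2, 1, 2],
     [2, 3, 2]]

total = sum(sum(row) for row in C)             # normalizing constant: 17
max_prob = max(max(row) for row in C) / total  # maximum-probability descriptor

assert total == 17
assert abs(max_prob - 3 / 17) < 1e-12
```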

______________________________________________________________________________

QUESTION 8:
Which of the following is not a boundary descriptor?

a. Polygonal Representation
b. Fourier descriptor
c. Signature
d. Histogram.

Correct Answer: d

Detailed Solution:

Histogram is a region descriptor.

______________________________________________________________________________

QUESTION 9:

We use the gray-level co-occurrence matrix to extract which type of information?


a. Boundary

b. Texture

c. MFCC

d. Zero Crossing rate.


Correct Answer: b
Detailed Explanation: We use different features from the gray-level co-occurrence matrix to determine the textural content of an image region.

______________________________________________________________________________

QUESTION 10:
If the larger values of the gray-level co-occurrence matrix are concentrated around the main diagonal, then which one of the following will be true?

a. The value of the element difference moment will be low.

b. The value of the inverse element difference moment will be low.

c. The value of entropy will be very low.

d. None of the above.

Correct Answer: a

Detailed Solution:

We cannot comment on the entropy based only on the diagonal values, because entropy depends on the randomness of the values. However, the element difference moment will be low, and the inverse element difference moment will be high.

______________________________________________________________________________

************END***********

Deep Learning
Assignment- Week 2
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2 = 20
______________________________________________________________________________

QUESTION 1:
Suppose you are solving an n-class problem. How many discriminant functions will you need?

a. n-1
b. n
c. n+1
d. n-2

Correct Answer: b

Detailed Solution: For an n-class problem we need n discriminant functions.

______________________________________________________________________________

QUESTION 2:
If we choose the discriminant function g_i(x) as a function of the posterior probability, i.e. g_i(x) = f(p(ω_i|x)), then which of the following cannot be the function f( )?

a. f(x) = a^x, where a > 1
b. f(x) = a^(−x), where a > 1
c. f(x) = 2x + 3
d. f(x) = exp(x)

Correct Answer: b

Detailed Solution:

The function f( ) must be monotonically increasing; f(x) = a^(−x) with a > 1 is monotonically decreasing, so it cannot be used.

______________________________________________________________________________

QUESTION 3:
What will be the nature of the decision surface when the covariance matrices of the different classes are identical but otherwise arbitrary? (Given that all the classes have equal class probabilities.)

a. Always orthogonal to the two surfaces
b. Generally not orthogonal to the two surfaces
c. Bisector of the line joining the two means, but not always orthogonal to the two surfaces
d. Arbitrary

Correct Answer: c

Detailed Solution:

With identical covariances and equal priors, the decision boundary passes through the midpoint of the line joining the two means, but it is orthogonal to Σ⁻¹(μ1 − μ2), so it is not in general orthogonal to the line joining the means.

_____________________________________________________________________________

QUESTION 4:
The means and covariance matrices of all the samples of two normally distributed classes ω1 and ω2 are given as

μ1 = (3, 6)ᵀ, Σ1 = [1/2 0; 0 2] and μ2 = (3, −2)ᵀ, Σ2 = [2 0; 0 2]

What will be the expression of the decision boundary between these two classes if both classes have equal class probability 0.5? For the input sample x = (x1, x2)ᵀ, consider

g_i(x) = −(1/2) xᵀ Σ_i⁻¹ x + (Σ_i⁻¹ μ_i)ᵀ x − (1/2) μ_iᵀ Σ_i⁻¹ μ_i − (1/2) ln|Σ_i| + ln P(ω_i)

a. x2 = 3.514 − 1.12x1 + 0.187x1²

b. x1 = 3.514 − 1.12x2 + 0.187x2²
c. x1 = 0.514 − 1.12x2 + 0.187x2²
d. x2 = 0.514 − 1.12x2 + 0.187x2²

Correct Answer: a

Detailed Solution:

This is the most general case of the discriminant function for a normal density. The inverse covariance matrices are

Σ1⁻¹ = [2 0; 0 1/2] and Σ2⁻¹ = [1/2 0; 0 1/2]

Setting g1(x) = g2(x), we get the decision boundary x2 = 3.514 − 1.12x1 + 0.187x1².

QUESTION 5:

For a two-class problem, the linear discriminant function is given by g(x) = aᵀy, where y is the augmented feature vector. What is the updating rule for finding the weight vector a?

a. Adding the sum of all misclassified augmented feature vectors, multiplied by the learning rate, to the current weight vector.
b. Subtracting the sum of all misclassified augmented feature vectors, multiplied by the learning rate, from the current weight vector.
c. Adding the sum of all augmented feature vectors belonging to the positive class, multiplied by the learning rate, to the current weight vector.
d. Subtracting the sum of all augmented feature vectors belonging to the negative class, multiplied by the learning rate, from the current weight vector.

Correct Answer: a
Detailed Solution:

a(k + 1) = a(k) + η Σ_{y ∈ Y_k} y, where Y_k is the set of misclassified augmented feature vectors at step k.

For the derivation, refer to the video lectures.
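A minimal sketch of this batch update rule on toy 2-D data (data and names are ours; class −1 samples are negated so that correct classification means a·y > 0):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Augmented vectors y = (1, x1, x2); class -1 samples are negated
ys = [(1, 2.0, 1.0), (1, 1.0, 2.0),    # class +1
      (-1, 1.0, 1.0), (-1, 2.0, 0.5)]  # class -1, negated
a, eta = [0.0, 0.0, 0.0], 1.0

for _ in range(100):
    mis = [y for y in ys if dot(a, y) <= 0]   # misclassified samples
    if not mis:
        break
    # a(k+1) = a(k) + eta * sum of misclassified augmented vectors
    a = [ai + eta * sum(y[i] for y in mis) for i, ai in enumerate(a)]

assert all(dot(a, y) > 0 for y in ys)   # converged: all correctly classified
```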

____________________________________________________________________________

QUESTION 6:

For the minimum distance classifier, which of the following must be satisfied?

a. All the classes should have an identical, diagonal covariance matrix.
b. All the classes should have an identical covariance matrix, but otherwise arbitrary.
c. All the classes should have equal class probability.
d. None of the above.

Correct Answer: c

Detailed Solution: Options are self-explanatory.



QUESTION 7:
Which of the following is the updating rule of the gradient descent algorithm? Here ∇ is the gradient operator and η is the learning rate.

a. 𝑎𝑛+1 = 𝑎𝑛 − 𝜂∇𝐹(𝑎𝑛 )

b. 𝑎𝑛+1 = 𝑎𝑛 + 𝜂∇𝐹(𝑎𝑛 )

c. 𝑎𝑛+1 = 𝑎𝑛 − 𝜂∇𝐹(𝑎𝑛−1 )

d. 𝑎𝑛+1 = 𝑎𝑛 + 𝜂∇𝐹(𝑎𝑛−1 )

Correct Answer: a

Detailed Solution:

Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient.

______________________________________________________________________________

QUESTION 8:
The decision surface between two normally distributed classes ω1 and ω2 is shown in the figure. Which of the following is true?

a. 𝑝(𝜔1 ) = 𝑝(𝜔2 )

b. 𝑝(𝜔2 ) > 𝑝(𝜔1 )

c. 𝑝(𝜔1 ) > 𝑝(𝜔2 )

d. None of the above.

Correct Answer: c

Detailed Solution:

If the prior probabilities are not equal, the optimal boundary hyperplane is shifted away
from the more likely mean.

______________________________________________________________________________

QUESTION 9:

In the k-nearest neighbours algorithm (k-NN), how do we classify an unknown object?

a. Assigning the label that is most frequent among the k nearest training samples.
b. Assigning the unknown object to the class of its nearest neighbour among the training samples.
c. Assigning the label that is most frequent among all the training samples except the k farthest neighbours.
d. None of these.

Correct Answer: a

Detailed Solution:

Options are self-explanatory.

QUESTION 10:

What is the direction of the weight vector w.r.t. the decision surface for a linear classifier?

a. Parallel
b. Normal
c. At an inclination of 45°
d. Arbitrary

Correct Answer: b

Detailed Solution:

For a linear classifier g(x) = wᵀx + b, g(x) is constant on the decision surface, so w is normal (perpendicular) to that surface.

************END*******

Deep Learning
Assignment- Week 3
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________

QUESTION 1:
Find the distance of the 3D point, 𝑃 = (−3, 1, 3) from the plane defined by
2𝑥 + 2𝑦 + 5𝑧 + 9 = 0?

a. 3.1
b. 4.6
c. 0
d. ∞ (infinity)

Correct Answer: b

Detailed Solution:

Distance = (−3×2 + 1×2 + 3×5 + 9) / √((−3)×(−3) + 1×1 + 3×3) = 4.6
______________________________________________________________________________

QUESTION 2:
What is the shape of the loss landscape during optimization of SVM?

a. Linear
b. Paraboloid
c. Ellipsoidal
d. Non-convex with multiple possible local minimum

Correct Answer: b

Detailed Solution:

In SVM the objective is to find the maximum-margin hyperplane (W) such that

Wᵀx + b = 1 for class +1, else Wᵀx + b = −1.

For the max-margin condition to be satisfied, we solve to minimize ||W||.

The above optimization is a quadratic optimization with a paraboloid landscape for the loss function.

______________________________________________________________________________

QUESTION 3:
How many local minimum can be encountered while solving the optimization for maximizing
margin for SVM?

a. 1
b. 2
c. ∞ (infinite)
d. 0

Correct Answer: a

Detailed Solution:

In SVM the objective is to find the maximum-margin hyperplane (W) such that

Wᵀx + b = 1 for class +1, else Wᵀx + b = −1.

For the max-margin condition to be satisfied, we solve to minimize ||W||.

The above optimization is a quadratic optimization with a paraboloid landscape for the loss function. Since the shape is a paraboloid, there can be only 1 (global) minimum.

______________________________________________________________________________

QUESTION 4:
Which of the following classifiers can be replaced by a linear SVM?

a. Logistic Regression
b. Neural Networks
c. Decision Trees
d. None of the above

Correct Answer: a

Detailed Solution:

The logistic regression framework belongs to the genre of linear classifiers, which means its decision boundary can segregate classes only if they are linearly separable. SVM is also capable of doing so and thus can be used instead of a logistic regression classifier. Neural networks and decision trees can model non-linear decision boundaries, which a linear SVM cannot model directly.

______________________________________________________________________________

QUESTION 5:
Find the scalar projection of vector b = ⟨−2, 3⟩ onto vector a = ⟨1, 2⟩.

a. 0

b. 4/√5

c. 2/√17

d. −2/17

Correct Answer: b

Detailed Solution:
The scalar projection of b onto a is given by (b · a)/|a| = (−2×1 + 3×2)/√(1² + 2²) = 4/√5.
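A quick numeric check (helper name ours):

```python
import math

def scalar_projection(b, a):
    # (b . a) / |a|
    dot = sum(bi * ai for bi, ai in zip(b, a))
    return dot / math.sqrt(sum(ai * ai for ai in a))

proj = scalar_projection((-2.0, 3.0), (1.0, 2.0))
assert abs(proj - 4 / math.sqrt(5)) < 1e-12
```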

____________________________________________________________________________

QUESTION 6:
For a 2-class problem what is the minimum possible number of support vectors. Assume there
are more than 4 examples from each class?

a. 4
b. 1
c. 2
d. 8

Correct Answer: c

Detailed Solution:

To determine the separating hyperplane, we need at least 1 example (which becomes a support vector) from each of the classes.

____________________________________________________________________________

QUESTION 7:
Which one of the following is a valid representation of hinge loss (with margin = 1) for a two-class problem?

y = class label (+1 or −1).

p = predicted value (not normalized to denote any probability) for a class.

a. L(y, p) = max(0, 1 − yp)
b. L(y, p) = min(0, 1 − yp)
c. L(y, p) = max(0, 1 + yp)
d. None of the above

Correct Answer: a

Detailed Solution:

Hinge loss is meant to yield a value of 0 if the predicted output p has the same sign as the class label and satisfies the margin condition |p| ≥ 1. Otherwise, the loss increases linearly as yp decreases. Option (a) satisfies these criteria.
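The formula in option (a) behaves exactly this way; a minimal sketch:

```python
def hinge_loss(y, p, margin=1.0):
    # Zero when y*p >= margin; grows linearly as y*p falls below it
    return max(0.0, margin - y * p)

assert hinge_loss(+1, 2.0) == 0.0   # correct side, outside the margin
assert hinge_loss(+1, 0.5) == 0.5   # correct side, but inside the margin
assert hinge_loss(-1, 2.0) == 3.0   # wrong side: penalized linearly
```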

______________________________________________________________________________

QUESTION 8:
Suppose we have one feature x ∈ ℝ and a binary class y. The dataset consists of 3 points: p1: (x1, y1) = (−1, −1), p2: (x2, y2) = (1, 1), p3: (x3, y3) = (3, 1). Which of the following is true with respect to SVM?

a. Maximum margin will increase if we remove the point p2 from the training
set.
b. Maximum margin will increase if we remove the point p3 from the training
set.
c. Maximum margin will remain same if we remove the point p2 from the
training set.
d. None of the above.

Correct Answer: a

Detailed Solution:

Here the point p2 is a support vector, if we remove the point p2 then maximum margin will
increase.

____________________________________________________________________________

QUESTION 9:

If we employ SVM to realize two-input logic gates, then which of the following will be true?

a. The weight vector for AND gate and OR gate will be same.
b. The margin for AND gate and OR gate will be same.
c. Both the margin and weight vector will be same for AND gate and OR
gate.
d. None of the weight vector and margin will be same for AND gate and
OR gate.

Correct Answer: b

Detailed Solution:

As we can see, although the weight vectors are not the same, the margin is the same for the AND and OR gates.

______________________________________________________________________________

QUESTION 10:
What will happen to the margin length of a max-margin linear SVM if one of the non-support-vector training examples is removed?

a. Margin will be scaled down by the magnitude of that vector


b. Margin will be scaled up by the magnitude of that vector
c. Margin will be unaltered
d. Cannot be determined from the information provided

Correct Answer: c

Detailed Solution:

In max-margin linear SVM, the separating hyper-planes are determined only by the
training examples which are support vectors. The non-support vector training examples do
not influence the geometry of the separating planes. Thus, the margin, in our case, will be
unaltered.

____________________________________________________________________________

************END*******

Deep Learning
Assignment- Week 4
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________

QUESTION 1:
A given cost function is of the form J(θ) = θ² − θ + 2. What is the weight update rule for gradient descent optimization at step t + 1? Consider α = 0.01 to be the learning rate.

a. 𝜃𝑡+1 = 𝜃𝑡 − 0.01(2𝜃 − 1)
b. 𝜃𝑡+1 = 𝜃𝑡 + 0.01(2𝜃)
c. 𝜃𝑡+1 = 𝜃𝑡 − (2𝜃 − 1)
d. 𝜃𝑡+1 = 𝜃𝑡 − 0.01(𝜃 − 1)

Correct Answer: a

Detailed Solution:

∂J(θ)/∂θ = 2θ − 1

So the weight update will be θ_{t+1} = θ_t − 0.01(2θ − 1).
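Iterating this update drives θ to the minimizer of J; a short sketch:

```python
def grad(theta):
    # dJ/dtheta for J(theta) = theta**2 - theta + 2
    return 2 * theta - 1

theta, alpha = 0.0, 0.01
for _ in range(2000):
    theta -= alpha * grad(theta)   # theta_{t+1} = theta_t - 0.01*(2*theta_t - 1)

# J has its unique minimum at theta = 0.5
assert abs(theta - 0.5) < 1e-3
```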
______________________________________________________________________________

QUESTION 2:
Can you identify in which of the following graphs gradient descent will not work correctly?

a. First figure
b. Second figure
c. First and second figure
d. Fourth figure
Correct Answer: b

Detailed Solution:

This is a classic example of the saddle point problem of gradient descent. In the second graph, gradient descent may get stuck at the saddle point.
______________________________________________________________________________

QUESTION 3:
From the following two figures can you identify which one corresponds to batch gradient
descent and which one to Stochastic gradient descent?

a. Graph-A: Batch gradient descent, Graph-B: Stochastic gradient descent


b. Graph-B: Batch gradient descent, Graph-A: Stochastic gradient descent
c. Graph-A: Batch gradient descent, Graph-B: Not Stochastic gradient descent
d. Graph-A: Not batch gradient descent, Graph-B: Not Stochastic gradient descent

Correct Answer: a

Detailed Solution:

The graph of cost vs. epochs is quite smooth for batch gradient descent because we average over all the gradients of the training data for a single step. In stochastic gradient descent, the average cost over the epochs fluctuates because we use one example at a time.

______________________________________________________________________________

QUESTION 4:
Suppose for the cost function J(θ) = 0.25θ², shown in the graph below, at which point do you feel the magnitude of the weight update will be greater? θ is plotted along the horizontal axis.

a. Red point (Point 1)


b. Green point (Point 2)
c. Yellow point (Point 3)
d. Red (Point 1) and yellow (Point 3) have same magnitude of weight update

Correct Answer: a

Detailed Solution:

The weight update is directly proportional to the magnitude of the gradient of the cost function. In our case, ∂J(θ)/∂θ = 0.5θ. So the weight update will be larger for higher values of θ.

______________________________________________________________________________

QUESTION 5:
Which logic function can be performed using a 2-layered Neural Network?

a. AND
b. OR
c. XOR
d. All

Correct Answer: d

Detailed Solution:

A two-layer neural network can implement any logic gate (linear or non-linear).
____________________________________________________________________________

QUESTION 6:
Let X and Y be two features used to discriminate between two classes. The feature values and class labels are given below. The minimum number of neuron layers required to design the neural network classifier is:

X    Y    Class
0    2    Class-II
1    2    Class-I
2    2    Class-I
1    3    Class-I
1   −3    Class-II

a. 1
b. 2
c. 4
d. 5
Correct Answer: a.

Detailed Solution:

Plot the feature points: they are linearly separable. Hence a single layer is able to do the classification task.

____________________________________________________________________________

QUESTION 7:
Which among the following options give the range for a logistic function?

a. -1 to 1
b. -1 to 0
c. 0 to 1
d. 0 to infinity

Correct Answer: c

Detailed Solution:

Refer to lectures, specifically the formula for logistic function.

______________________________________________________________________________

QUESTION 8:
The number of weights (including biases) to be learned by a neural network having 3 inputs, 2 classes, and a hidden layer with 5 neurons is:

a. 12
b. 15
c. 25
d. 32
Correct Answer: d

Detailed Solution:

Please refer to lecture note week 4

Weights in the 1st layer: (#inputs + 1 bias) × (#hidden nodes) = (3 + 1) × 5 = 20

Weights in the 2nd layer: (#hidden nodes + 1 bias) × (#classes) = (5 + 1) × 2 = 12

Hence, total weights = 20 + 12 = 32.
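The same count can be computed programmatically (helper name ours):

```python
def count_weights(layer_sizes, bias=True):
    # Sum of (fan_in + optional bias) * fan_out over consecutive layer pairs
    extra = 1 if bias else 0
    return sum((n_in + extra) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 3 inputs -> 5 hidden -> 2 outputs, with bias weights
assert count_weights([3, 5, 2]) == 32
assert count_weights([3, 5, 2], bias=False) == 25
```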



______________________________________________________________________________

QUESTION 9:
For the XNOR function given in the figure below, the activation function of each node is f(x) = 1 if x ≥ 0, and 0 otherwise. Consider X1 = 1 and X2 = 0. What will be the output of the above neural network?

a. 1.5
b. 2
c. 0
d. 1

Correct Answer: c

Detailed Solution:

Output of a1: f(0.5×1 + (−1)×1 + (−1)×0) = f(−0.5) = 0

Output of a2: f(−1.5×1 + 1×1 + 1×0) = f(−0.5) = 0

Output of a3: f(−0.5×1 + 1×0 + 1×0) = f(−0.5) = 0

So, the correct answer is c.
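The forward pass can be replayed in code using the weights read off the solution above (a sketch; node names follow the figure):

```python
def f(x):
    # Threshold activation: 1 if x >= 0 else 0
    return 1 if x >= 0 else 0

def xnor_net(x1, x2):
    a1 = f(0.5 * 1 + (-1) * x1 + (-1) * x2)   # fires only when both inputs are 0
    a2 = f(-1.5 * 1 + 1 * x1 + 1 * x2)        # fires only when both inputs are 1
    a3 = f(-0.5 * 1 + 1 * a1 + 1 * a2)        # OR of a1 and a2
    return a3

assert xnor_net(1, 0) == 0   # the case asked in the question
# Full truth table of XNOR
assert [xnor_net(p, q) for p, q in ((0, 0), (0, 1), (1, 0), (1, 1))] == [1, 0, 0, 1]
```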

____________________________________________________________________________

QUESTION 10:
Which activation function is more prone to vanishing gradient problem?

a. ReLU

b. Tanh

c. sigmoid

d. Threshold

Correct Answer: b

Detailed Solution:

Please refer to the lectures of week 4.

************END*******
Deep Learning
Assignment- Week 5
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2 = 20

_____________________________________________________________________________

QUESTION 1:
Suppose a fully-connected neural network has a single hidden layer with 30 nodes. The input is
represented by a 3D feature vector and we have a binary classification problem. Calculate the
number of parameters of the network. Consider there are NO bias nodes in the network.

a. 100
b. 120
c. 140
d. 125

Correct Answer: b

Detailed Solution:

Number of parameters = (3 * 30) + (30 * 1) = 120

--------------------------------------------------------------------------------------------------------------------

QUESTION 2:
For a binary classification setting, if the probability of belonging to class= +1 is 0.22, what is the
probability of belonging to class= -1 ?

a. 0
b. 0.22
c. 0.78
d. -0.22
Correct Answer: c
Detailed Solution:

In the binary classification setting we keep a single output node, which denotes the probability p of belonging to class +1. So the probability of belonging to class −1 is (1 − p), since the 2 classes are mutually exclusive.

______________________________________________________________________________

QUESTION 3:
Input to SoftMax activation function is [2,4,6]. What will be the output?

a. [0.11,0.78,0.11]
b. [0.016,0.117, 0.867]
c. [0.045,0.910,0.045]
d. [0.21, 0.58,0.21]

Correct Answer: b

Detailed Solution:
SoftMax: σ(x_j) = e^{x_j} / Σ_{k=1}^{n} e^{x_k}, for j = 1, 2, …, n.

Therefore, σ(2) = e² / (e² + e⁴ + e⁶) ≈ 0.016, and similarly for the other values.
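The computation can be checked with a few lines of Python (illustrative sketch):

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]   # no max-subtraction; fine for small inputs
    s = sum(exps)
    return [e / s for e in exps]

out = softmax([2, 4, 6])
assert all(abs(o - e) < 1e-3 for o, e in zip(out, [0.016, 0.117, 0.867]))
assert abs(sum(out) - 1.0) < 1e-9
```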

______________________________________________________________________________

QUESTION 4:
A 3-input neuron has weights 1, 0.5, 2. The transfer function is linear, with the constant of
proportionality being equal to 2. The inputs are 2, 20, 4 respectively. The output will be:
a. 40
b. 20
c. 80
d. 10
Correct Answer: a

Detailed Solution:

To find the output, we multiply each weight by its respective input, add the results, and then multiply the sum by the constant of proportionality of the linear transfer function:

output = 2 × (1×2 + 0.5×20 + 2×4) = 40

______________________________________________________________________

QUESTION 5:
Which one of the following activation functions is NOT analytically differentiable for all real
values of the given input?

a. Sigmoid
b. Tanh
c. ReLU
d. None of the above
Correct Answer: c

Detailed Solution:

ReLU(x) is not differentiable at x = 0, where x is the input to the ReLU layer.

______________________________________________________________________________

QUESTION 6:

Which function does the following perceptron realize? x1 and x2 can take only binary values. h(x)
is the activation function: h(x) = 1 if x > 0, else 0.
a. NAND
b. NOR
c. AND
d. OR
Correct Answer: b

Detailed Solution:

In the above figure, when either i1 or i2 is 1, the output is 0. When both i1 and i2 are 0, the
output is 1. When both i1 and i2 are 1, the output is 0. This is NOR logic.
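The perceptron figure is not reproduced here, so the following sketch uses one hypothetical choice of weights (w1 = w2 = −1, bias = 0.5) that realizes NOR with the given activation h(x):

```python
# Threshold activation from the question: h(x) = 1 if x > 0, else 0.
def h(x):
    return 1 if x > 0 else 0

# Hypothetical NOR weights (the original figure's weights may differ).
def perceptron(i1, i2, w1=-1.0, w2=-1.0, bias=0.5):
    return h(w1 * i1 + w2 * i2 + bias)

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, perceptron(i1, i2))  # output is 1 only for (0, 0)
```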

______________________________________________________________________________

QUESTION 7:
In a simple MLP model with 10 neurons in the input layer, 100 neurons in the hidden layer and
1 neuron in the output layer, what are the sizes of the weight matrices between the hidden and
output layers and between the input and hidden layers?
a. [10x1] , [100 X 2]
b. [100x1] , [ 10 X 1]
c. [100 x 10], [10 x 1]
d. [100x 1] , [10 x 100]
Correct Answer: d

Detailed Solution:

The size of the weight matrix between any layer 1 and layer 2 is given by [nodes in layer 1 X
nodes in layer 2]. Hence the hidden-output matrix is [100 x 1] and the input-hidden matrix is [10 x 100].

______________________________________________________________________________

QUESTION 8:

Consider a fully connected neural network with input, one hidden layer, and output layer with
40, 2, 1 nodes respectively in each layer. What is the total number of learnable parameters (no
biases)?

a. 2
b. 82
c. 80
d. 40

Correct Answer: b

Detailed Solution:

The learnable parameters are the weights and biases; here there are no bias nodes, so only
weights count. In a fully connected network every node in one layer is connected to every node
in the next, so the total is (40*2) + (2*1) = 82.

QUESTION 9:
You want to build a 10-class neural network classifier, given a cat image, you want to classify
which of the 10 cat breeds it belongs to. Which among the 4 options would be an appropriate
loss function to use for this task?

a. Cross Entropy Loss


b. MSE Loss
c. SSIM Loss
d. None of the above

Correct Answer: a

Detailed Solution:

Out of the given options, Cross Entropy Loss is well suited for classification problems which is
the end task given in the question.

______________________________________________________________________________

QUESTION 10:

You’d like to train a fully-connected neural network with 5 hidden layers, each with 10 hidden
units. The input is 20-dimensional and the output is a scalar. What is the total number of
trainable parameters in your network? There is no bias.
a. (20+1)*10 + (10+1)*10*4 + (10+1)*1
b. (20)*10 + (10)*10*4 + (10)*1
c. (20)*10 + (10)*10*5 + (10)*1
d. (20+1)*10 + (10+1)*10*5 + (10+1)*1
Correct Answer: b

Detailed Solution:

With no bias terms, the parameter count is the sum of products of consecutive layer sizes: 20*10
weights from the input to the first hidden layer, 10*10 for each of the 4 hidden-to-hidden
connections, and 10*1 from the last hidden layer to the output, i.e. (20)*10 + (10)*10*4 + (10)*1 = 610,
which is option (b).
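As an illustrative check (not part of the original solution), the count can be computed by summing the products of consecutive layer sizes:

```python
# 20 inputs -> five hidden layers of 10 units each -> 1 output, no biases.
layers = [20, 10, 10, 10, 10, 10, 1]
params = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))
print(params)  # 610, i.e. 20*10 + 10*10*4 + 10*1
```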

______________________________________________________________________________

************END*******
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 6
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________

QUESTION 1:
Suppose a neural network has 3 input nodes, a, b, c. There are 2 neurons, X and Y. X = a+ b and
Y = X * c. What is the gradient of Y with respect to a, b and c? Assume, (a, b, c) = (6, -1, -4).
a. (5, -4, -4)
b. (4, 4, -5)
c. (-4, -4, 5)
d. (3, 3, 4)

Correct Answer: c

Detailed Solution:

Y = X·c, so ∂Y/∂c = X = a + b = 5

Y = X·c = (a + b)·c, so ∂Y/∂a = c = −4 and ∂Y/∂b = c = −4
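For illustration, the same gradients can be obtained with hand-coded backpropagation through the two-node graph:

```python
# Computation graph: X = a + b, then Y = X * c.
a, b, c = 6.0, -1.0, -4.0

X = a + b          # forward pass
Y = X * c

dY_dc = X          # local gradient of the multiply node
dY_dX = c
dY_da = dY_dX * 1  # chain rule through X = a + b
dY_db = dY_dX * 1

print(dY_da, dY_db, dY_dc)  # -4.0 -4.0 5.0
```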
______________________________________________________________________________

QUESTION 2:
y = max(a, b) and a > b. What are the values of dy/da and dy/db?

a. 1, 0
b. 0, 1
c. 0, 0
d. 1, 1

Correct Answer: a

Detailed Solution:

y = max(a, b) and a > b.

So y = a, which gives dy/da = 1 and dy/db = 0.

______________________________________________________________________________

QUESTION 3:
PCA reduces the dimension by finding a few________.

a. Hexagonal linear combination


b. Orthogonal linear combinations
c. Octagonal linear combination
d. Pentagonal Linear Combination

Correct Answer: b

Detailed Solution:

Direct from classroom lecture

______________________________________________________________________________

QUESTION 4:
Consider the four sample points below, 𝑋𝑖 ∈ ℝ2 .

We want to represent the data in 1D using PCA. Compute the unit-length principal component
directions of X, and then choose from the options below which one the PCA algorithm would
choose if you request just one principal component.

a. [1/√2 1/√2]𝑇
b. [1/√2 −1/√2]𝑇
c. [−1/√2 1/√2]𝑇
d. [1/√2 1/√2]𝑇

Correct Answer: d

Detailed Solution:

Centering X,

The above matrix is X_c. Now,

X_c^T X_c = [10 6; 6 10]

The eigenvector with eigenvalue 16 is [1/√2 1/√2]^T, and the eigenvector with eigenvalue 4 is
[1/√2 −1/√2]^T. If just one principal component is requested, PCA returns the eigenvector with
the largest eigenvalue, [1/√2 1/√2]^T.
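As an illustrative check (not part of the original solution), the leading eigenvector of this 2x2 matrix can be found with a simple power iteration:

```python
import math

# Power iteration: repeatedly apply A and renormalize; the iterate
# converges to the eigenvector of the largest eigenvalue.
def power_iteration(A, iters=100):
    v = [1.0, 0.0]
    for _ in range(iters):
        w = [A[0][0] * v[0] + A[0][1] * v[1],
             A[1][0] * v[0] + A[1][1] * v[1]]
        norm = math.sqrt(w[0] ** 2 + w[1] ** 2)
        v = [w[0] / norm, w[1] / norm]
    return v

A = [[10, 6], [6, 10]]  # X_c^T X_c from the solution above
v = power_iteration(A)
print(v)  # approximately [0.7071, 0.7071], i.e. [1/sqrt(2), 1/sqrt(2)]^T
```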

QUESTION 5:

Which of the following is FALSE about PCA and Autoencoders?

a. Both PCA and Autoencoders can be used for dimensionality reduction


b. PCA works well with non-linear data but Autoencoders are best suited for linear
data
c. Output of both PCA and Autoencoders is lossy
d. None of the above

Correct Answer: b

Detailed Solution:

PCA is a linear method, whereas autoencoders with non-linear activations can model non-linear
structure in the data, so statement (b) reverses the truth. The other statements are true.

____________________________________________________________________________

QUESTION 6:
What is true regarding backpropagation rule?
a. It is a feedback neural network
b. Gradient of the final layer of weights being calculated first and the gradient of the first
layer of weights being calculated last
c. Hidden layers are not important, only meant for supporting input and output layers
d. None of the mentioned
Correct Answer: b

Detailed Solution:

In backpropagation, gradients are computed in the reverse order of the forward pass: the
gradient of the final layer of weights is calculated first and the gradient of the first layer of
weights is calculated last.

_____________________________________________________________________________

QUESTION 7:
Which of the following is true for PCA? Tick all the options that are correct.

a. Rotates the axes to lie along the principal components


b. Is calculated from the covariance matrix
c. Removes some information from the data
d. Eigenvectors describe the length of the principal components

Correct Answer: a,b,c

Detailed Solution:

PCA rotates the axes to lie along the principal components, which are calculated from the
covariance matrix, and discarding low-variance components removes some information from the
data. Option (d) is false: it is the eigenvalues, not the eigenvectors, that describe the lengths
(variances) along the principal components.

_________________________________________________________________________

QUESTION 8:
A single hidden and no-bias autoencoder has 100 input neurons and 10 hidden neurons. What
will be the number of parameters associated with this autoencoder?

a. 1000
b. 2000
c. 2110
d. 1010

Correct Answer: b

Detailed Solution:

For a single-hidden-layer autoencoder with no biases:

Input neurons = 100, hidden neurons = 10, so output neurons = 100 (the output reconstructs the input).

Total number of parameters = 100*10 + 10*100 = 2000

______________________________________________________________________________

QUESTION 9:
Which of the following two vectors can form the first two principal components?

a. {2; 3; 1} and {3; 1; −9}


b. {2; 4; 1} and {−2; 1; −8}
c. {2; 3; 1} and {−3; 1; −9}
d. {2; 3; −1} and {3; 1; −9}

Correct Answer: a

Detailed Solution:

Only in option (a) are the vectors orthogonal: {2; 3; 1} · {3; 1; −9} = 6 + 3 − 9 = 0.
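Since principal components must be mutually orthogonal, each option can be checked (illustratively) by computing the dot product of its two vectors:

```python
# Dot product of two vectors; principal components must give 0.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

options = {
    "a": ([2, 3, 1], [3, 1, -9]),
    "b": ([2, 4, 1], [-2, 1, -8]),
    "c": ([2, 3, 1], [-3, 1, -9]),
    "d": ([2, 3, -1], [3, 1, -9]),
}
for name, (u, v) in options.items():
    print(name, dot(u, v))  # only option a gives 0
```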

____________________________________________________________________________

QUESTION 10:
Lets say vectors 𝑎⃗ = {2; 4} and 𝑏⃗⃗ = {𝑛; 1} forms the first two principle components after
applying PCA. Under such circumstances, which among the following can be a possible value
of n?
a. 2
b. -2
c. 0
d. 1

Correct Answer: b

Detailed Solution:

Only option (b) makes the two vectors orthogonal: 2n + 4 = 0 gives n = −2.



______________________________________________________________________________

************END*******
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 7
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________

QUESTION 1:
Select the correct option about sparse autoencoders.

Statement 1: Sparse autoencoders introduce an information bottleneck by reducing the number
of nodes at the hidden layers

Statement 2: The idea is to encourage the network to learn an encoding and decoding which
relies on activating only a small number of neurons

a. Both the statements are true


b. Statement 1 is true, but Statement 2 is false
c. Statement 1 is false, but statement 2 is true
d. Both the statements are false

Correct Answer: c

Detailed Solution:

Sparse autoencoders introduce an information bottleneck without requiring a reduction in the
number of nodes at the hidden layers, so Statement 1 is false. They encourage the network to
learn an encoding and decoding which relies on activating only a small number of neurons, so
Statement 2 is true.

______________________________________________________________________________

QUESTION 2:
Select the correct option about denoising autoencoders.

Statement A: The loss is between the original input and the reconstruction from a noisy version
of the input

Statement B: Denoising autoencoders can be used as a tool for feature extraction.

a. Both the statements are false


b. Statement A is false but Statement B is true

c. Statement A is true but Statement B is false


d. Both the statements are true

Correct Answer: d

Detailed Solution:

For a denoising autoencoder, both Statement A and Statement B are true. Thus option (d) is correct.

______________________________________________________________________________

QUESTION 3:
Which of the following autoencoder methods uses corrupted versions of the input?

a. Overcomplete design
b. Undercomplete Design
c. Sparse Design
d. Denoising Design

Correct Answer: d

Detailed Solution:

Refer to classroom lecture.

______________________________________________________________________________

QUESTION 4:
Which of the following autoencoder methods uses a hidden layer with fewer units than the
input layer?

a. Overcomplete design
b. Undercomplete Design
c. Sparse Design
d. Denoising Design

Correct Answer: b

Detailed Solution:

Refer to classroom lecture.



QUESTION 5:

Which of the following is false about autoencoder?

a. Autoencoders possesses generalization capabilities


b. Autoencoders are best suited for image captioning task
c. Its objective is to minimize the reconstruction loss so that output is similar to
input
d. It compresses the input into a latent space representation and then reconstruct
the output from it

Correct Answer: b

Detailed Solution:

Except option (b), all the other options are true about autoencoders.

____________________________________________________________________________

QUESTION 6:
Find the value of 𝑑(𝑡 − 34) ∗ 𝑥(𝑡 + 56); 𝑑(𝑡) being the delta function and * being the
convolution operation.

a. 𝑥(𝑡 + 56)
b. 𝑥(𝑡 + 32)
c. 𝑥(𝑡 + 22)
d. 𝑥(𝑡 − 56)

Correct Answer: c

Detailed Solution:

Convolving a function with a shifted delta shifts the function by the same amount:
δ(t − 34) ∗ x(t + 56) = x(t + 56 − 34) = x(t + 22).
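The shifting property has a simple discrete analogue, sketched here for illustration: convolving a sequence with a delta shifted by k moves the sequence by k samples.

```python
# Full discrete convolution of two finite sequences.
def convolve(x, h):
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x = [1, 2, 3]
delta_shift2 = [0, 0, 1]          # delta[n - 2]
print(convolve(x, delta_shift2))  # [0, 0, 1, 2, 3]: x delayed by 2
```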

_____________________________________________________________________________

QUESTION 7:
Impulse response is the output of ________________system due to impulse input applied at
time=0. Fill in the blanks from the options below.

a. Linear

b. Time Varying
c. Time Invariant
d. Linear And Time Invariant

Correct Answer: d

Detailed Solution:

The impulse response is the output of a linear and time-invariant (LTI) system due to an impulse
input applied at time t = 0 (or n = 0). The behaviour of an LTI system is characterized by its
impulse response.

_________________________________________________________________________

QUESTION 8:
Convolution of an input with the system impulse function gives the output of a___ system. Fill
in the blanks.

a. Linear Time Invariant


b. Non-linear system
c. Time Invariant system
d. None of the above

Correct Answer: a

Detailed Solution:

Direct from classroom lecture

______________________________________________________________________________

QUESTION 9:
Given the image below where, Row 1: Original Input, Row 2: Noisy input, Row 3: Reconstructed
output. Choose one of the following variants of autoencoder that is most suited to get Row 3
from Row 2.

a. Stacked autoencoder
b. Sparse autoencoder
c. Denoising autoencoder
d. None of the above

Correct Answer: c

Detailed Solution:

Reconstruction of original noise-free data from noisy input is the tasks of denoising
autoencoder

____________________________________________________________________________

QUESTION 10:
Which of the following is true for Contractive Autoencoders?

a. penalizing instances where a small change in the input leads to a large change in
the encoding space
b. penalizing instances where a large change in the input leads to a small change in
the encoding space
c. penalizing instances where a small change in the input leads to a small change in
the encoding space
d. None of the above

Correct Answer: a

Detailed Solution:

Direct from definition of Contractive autoencoders

______________________________________________________________________________

************END*******
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 8
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________

QUESTION 1:
Which of the following is false about CNN?

a. Output should be flattened before feeding it to a fully connected layer
b. There can be only one fully connected layer in a CNN
c. We can use as many convolutional layers as needed in a CNN
d. None of the above
Correct Answer: b

Detailed Solution:

A CNN can have more than one fully connected layer, so option (b) is false; the other statements are true.


______________________________________________________________________________

QUESTION 2:
The input image has been converted into a matrix of size 64 X 64 and a kernel/filter of size 5x5
with a stride of 1 and no padding. What will be the size of the convoluted matrix?

a. 5x5
b. 59x59
c. 64x64
d. 60x60

Correct Answer: d

Detailed Solution:

The size of the convoluted matrix is given by CxC where C=((I-F+2P)/S)+1, where C is the
size of the Convoluted matrix, I is the size of the input matrix, F the size of the filter matrix
and P the padding applied to the input matrix. Here P=0, I=64, F=5 and S=1. Therefore,
the answer is 60x60.
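The output-size formula can be wrapped in a small helper for illustration (the function name is ours, not from the lecture):

```python
# Convolution output size: C = (I - F + 2P) / S + 1.
def conv_out(i, f, p, s):
    return (i - f + 2 * p) // s + 1

print(conv_out(64, 5, 0, 1))  # 60 (this question)
print(conv_out(4, 3, 0, 1))   # 2  (valid padding, as in Question 3)
```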
______________________________________________________________________________

QUESTION 3:

Filter size of 3x3 is convolved with matrix of size 4x4 (stride=1). What will be the size of output
matrix if valid padding is applied:

a. 4x4
b. 3x3
c. 2x2
d. 1x1

Correct Answer: c

Detailed Solution:

Valid padding means no padding is applied (P = 0). The output matrix after convolution has
dimension ((n − f + 2P)/S + 1) x ((n − f + 2P)/S + 1) = ((4 − 3 + 0)/1 + 1) x ((4 − 3 + 0)/1 + 1) = 2 x 2.

______________________________________________________________________________

QUESTION 4:
Let us consider a Convolutional Neural Network having three different convolutional layers in
its architecture as:

Layer-1: Filter Size – 3 X 3, Number of Filters – 10, Stride – 1, Padding – 0

Layer-2: Filter Size – 5 X 5, Number of Filters – 20, Stride – 2, Padding – 0

Layer-3: Filter Size – 5 X5 , Number of Filters – 40, Stride – 2, Padding – 0

Layer 3 of the above network is followed by a fully connected layer. If we give a 3-D
image input of dimension 39 X 39 to the network, then which of the following is the input
dimension of the fully connected layer.

a. 1960
b. 2200
c. 4563
d. 13690

Correct Answer: a

Detailed Solution:

The input image of dimension 39 X 39 X 3 convolves with 10 filters of size 3 X 3, stride 1 and no
padding. After these operations, we get an output of 37 X 37 X 10.

Output of layer 2 would be 17 x 17 x 20.

Output of layer 3 would be 7 x 7 x 40. Flattening this gives 7 * 7 * 40 = 1960.
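The layer-by-layer trace above can be sketched in Python (illustrative only; the helper name conv_out is ours):

```python
# Spatial size of a square convolution output with no padding.
def conv_out(size, f, stride, p=0):
    return (size - f + 2 * p) // stride + 1

size, depth = 39, 3
# (filter size, number of filters, stride) for the three layers.
for f, n_filters, stride in [(3, 10, 1), (5, 20, 2), (5, 40, 2)]:
    size = conv_out(size, f, stride)
    depth = n_filters
print(size, depth, size * size * depth)  # 7 40 1960
```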

______________________________________________________________________________

QUESTION 5:
Suppose you have 40 convolutional kernels of size 3 x 3 with no padding and stride 1 in the first
layer of a convolutional neural network. You pass an input of dimension 1024x1024x3 through
this layer. What are the dimensions of the data which the next layer will receive?

a. 1020x1020x40
b. 1022x1022x40
c. 1021x1021x40
d. 1022x1022x3

Correct Answer: b

Detailed Solution:

The layer accepts a volume of size W1 x H1 x D1; in our case, 1024 x 1024 x 3.

It requires four hyperparameters: number of filters K = 40, their spatial extent F = 3, the
stride S = 1, and the amount of padding P = 0.

It produces a volume of size W2 x H2 x D2, where W2 = (W1 − F + 2P)/S + 1 = (1024 − 3)/1 + 1 = 1022,
H2 = (H1 − F + 2P)/S + 1 = (1024 − 3)/1 + 1 = 1022 (width and height are computed equally by
symmetry), and D2 = K = 40.

____________________________________________________________________________

QUESTION 6:
Consider a CNN model which aims at classifying an image as either a rose, or a marigold, or a lily
or an orchid (consider the test image can have only 1 of the classes at a time). The last (fully-
connected) layer of the CNN outputs a vector of logits, L, that is passed through a ____
activation that transforms the logits into probabilities, P. These probabilities are the model
predictions for each of the 4 classes. Fill in the blanks with the appropriate option.

a. Leaky ReLU
b. Tanh
c. ReLU
d. Softmax

Correct Answer: d

Detailed Solution:

Softmax works best if there is one true class per example, because it outputs a probability
vector whose entries sum to 1.

____________________________________________________________________________

QUESTION 7:
Suppose your input is a 300 by 300 color (RGB) image, and you use a convolutional layer with
100 filters that are each 5x5. How many parameters does this hidden layer have (without bias)

a. 2501
b. 2600
c. 7500
d. 7600

Correct Answer: c

Detailed Solution:

As we have an RGB image, each filter is 3-D with dimension 5 * 5 * 3 = 75.

We have 100 such filters and there is no bias, so the total number of parameters = 5 * 5 * 3 * 100 = 7500.
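For illustration, this count generalizes to any bias-free convolutional layer:

```python
# Each filter spans the full input depth, so (without bias)
# params = k_h * k_w * in_channels * num_filters.
def conv_params(kernel, in_channels, num_filters):
    return kernel * kernel * in_channels * num_filters

print(conv_params(5, 3, 100))  # 7500
```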

______________________________________________________________________________

QUESTION 8:
Which of the following activation functions can lead to vanishing gradients?

a. ReLU
b. Sigmoid
c. Leaky ReLU
d. None of the above

Correct Answer: b

Detailed Solution:

For sigmoid activation with large-magnitude inputs, even a large change in the input causes only
a small change in the output, because the function saturates; hence the derivative becomes
small. When more and more layers use such activations, the gradient of the loss function
becomes very small, making the network difficult to train.
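The shrinking effect can be illustrated numerically: the sigmoid derivative is at most 0.25, so stacking n sigmoid layers multiplies the backpropagated gradient by at most 0.25^n.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)  # maximum value 0.25, attained at x = 0

print(sigmoid_grad(0))  # 0.25
print(0.25 ** 10)       # the per-layer cap after 10 layers: nearly zero
```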

___________________________________________________________________________

QUESTION 9:
Statement 1: Residual networks can be a solution for vanishing gradient problem

Statement 2: Residual networks provide residual connections straight to earlier layers

Statement 3: Residual networks can never be a solution for vanishing gradient problem

Which of the following option is correct?

a. Statement 2 is correct
b. Statement 3 is correct
c. Both Statement 1 and Statement 2 are correct
d. Both Statement 2 and Statement 3 are correct

Correct Answer: c

Detailed Solution:

Residual networks can be a solution to the vanishing gradient problem, as they provide residual
connections straight to earlier layers. These residual connections don't go through activation
functions that "squash" the derivatives, resulting in a higher overall derivative of the block.

____________________________________________________________________________

QUESTION 10:
Input to SoftMax activation function is [0.5,0.5,1]. What will be the output?

a. [0.28,0.28,0.44]
b. [0.022,0.956, 0.022]
c. [0.045,0.910,0.045]
d. [0.42, 0.42,0.16]

Correct Answer: a

Detailed Solution:

SoftMax: σ(x_j) = e^(x_j) / Σ_{k=1..n} e^(x_k), for j = 1, 2, …, n

Therefore, σ(0.5) = e^0.5 / (e^0.5 + e^0.5 + e^1) ≈ 0.28, and similarly for the other values.

______________________________________________________________________

______________________________________________________________________________

************END*******
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________

QUESTION 1:
What can be a possible consequence of choosing a very small learning rate?
a. Slow convergence
b. Overshooting minima
c. Oscillations around the minima
d. All of the above

Correct Answer: a
Detailed Solution:
Choosing a very small learning rate can lead to slower convergence and thus option (a) is
correct.
______________________________________________________________________________

QUESTION 2:
The following is the equation of update vector for momentum optimizer. Which of the
following is true for 𝛾?
𝑉𝑡 = 𝛾𝑉𝑡−1 + 𝜂∇𝜃 𝐽(𝜃)
a. 𝛾 is the momentum term which indicates acceleration
b. 𝛾 is the step size
c. 𝛾 is the first order moment
d. 𝛾 is the second order moment

Correct Answer: a
Detailed Solution:
A fraction of the update vector of the past time step is added to the current update vector. 𝛾 is
that fraction which indicates how much acceleration you want and its value lies between 0 and 1.
______________________________________________________________________________

QUESTION 3:
Which of the following is true about momentum optimizer?

a. It helps accelerating Stochastic Gradient Descent in right direction


b. It helps prevent unwanted oscillations
c. It helps to know the direction of the next step with knowledge of the previous step
d. All of the above

Correct Answer: d
Detailed Solution:
Option (a), (b) and (c) all are true for momentum optimiser. Thus, option (d) is correct.
______________________________________________________________________________

QUESTION 4:
Let J(θ) be the cost function, and let the gradient descent update rule for θ_i be

θ_{i+1} = θ_i + ∇θ_i

What is the correct expression for ∇θ_i? α is the learning rate.

a. −α dJ(θ_i)/dθ_i
b. α dJ(θ_i)/dθ_i
c. −dJ(θ_i)/dθ_{i+1}
d. dJ(θ_i)/dθ_i

Correct Answer: a
Detailed Solution:
The gradient descent update rule for θ_i is

θ_{i+1} = θ_i − α dJ(θ_i)/dθ_i, where α is the learning rate,

so ∇θ_i = −α dJ(θ_i)/dθ_i.
______________________________________________________________________________

QUESTION 5:
A given cost function is of the form J(θ) = 6θ² − 6θ + 6. What is the weight update rule for
gradient descent optimization at step t+1? Consider α to be the learning rate.

a. 𝜃𝑡+1 = 𝜃𝑡 − 6𝛼(2𝜃 − 1)
b. 𝜃𝑡+1 = 𝜃𝑡 + 6𝛼(2𝜃)
c. 𝜃𝑡+1 = 𝜃𝑡 − 𝛼(12𝜃 − 6 + 6)

d. 𝜃𝑡+1 = 𝜃𝑡 − 6𝛼(2𝜃 + 1)

Correct Answer: a
Detailed Solution:

∂J(θ)/∂θ = 12θ − 6

So the weight update will be

θ_{t+1} = θ_t − α(12θ − 6) = θ_t − 6α(2θ − 1)
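Running this update (with an illustrative learning rate, not one given in the question) converges to the minimum of J at θ = 0.5:

```python
# Gradient descent on J(theta) = 6*theta^2 - 6*theta + 6
# using the update theta <- theta - 6*alpha*(2*theta - 1).
alpha, theta = 0.05, 0.0
for _ in range(200):
    theta = theta - 6 * alpha * (2 * theta - 1)
print(round(theta, 4))  # 0.5, the minimizer of J
```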
______________________________________________________________________________

QUESTION 6:
If the first few iterations of gradient descent cause the function f(θ0,θ1) to increase rather than
decrease, then what could be the most likely cause for this?

a. we have set the learning rate to too large a value


b. we have set the learning rate to zero
c. we have set the learning rate to a very small value
d. learning rate is gradually decreased by a constant value after every epoch

Correct Answer: a
Detailed Solution:
If the learning rate were small enough, gradient descent would take a small downhill step and
decrease f(θ0,θ1) at least a little. If gradient descent instead increases the objective value,
the learning rate is too high.
______________________________________________________________________________

QUESTION 7:
For a function f(θ0,θ1), if θ0 and θ1 are initialized at a global minimum, then what should be the
values of θ0 and θ1 after a single iteration of gradient descent?

a. θ0 and θ1 will update as per gradient descent rule


b. θ0 and θ1 will remain same
c. Depends on the values of θ0 and θ1
d. Depends on the learning rate

Correct Answer: b
Detailed Solution:

At a minimum (global or local), the derivative (gradient) is zero, so gradient descent will not
change the parameters.
______________________________________________________________________________

QUESTION 8:
What can be one of the practical problems of exploding gradient?
a. Too large update of weight values leading to unstable network
b. Too small update of weight values inhibiting the network to learn
c. Too large update of weight values leading to faster convergence
d. Too small update of weight values leading to slower convergence

Correct Answer: a
Detailed Solution:
Exploding gradients are a problem where large error gradients accumulate and result in very
large updates to neural network model weights during training. This has the effect of your model
being unstable and unable to learn from your training data.
______________________________________________________________________________

QUESTION 9:
What are the steps for using a gradient descent algorithm?

1. Calculate error between the actual value and the predicted value
2. Update the weights and biases using gradient descent formula
3. Pass an input through the network and get values from output layer
4. Initialize weights and biases of the network with random values
5. Calculate gradient value corresponding to each weight and bias

a. 1, 2, 3, 4, 5
b. 5, 4, 3, 2, 1
c. 3, 2, 1, 5, 4
d. 4, 3, 1, 5, 2

Correct Answer: d
Detailed Solution:
Initialize random weights, then pass input instances through the network, calculate the error at
the output layer, and back-propagate the error through the preceding layers. Then update the
neuron weights using the learning rate and the gradient of the error. Please refer to the lectures of week 4.
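The step order (4, 3, 1, 5, 2) can be sketched for a single linear neuron with a squared-error loss; this is an illustrative toy, not the lecture's code:

```python
import random

random.seed(0)
x, target = [1.0, 2.0], 1.0
w = [random.uniform(-1, 1) for _ in x]     # step 4: random initialization
b = random.uniform(-1, 1)
lr = 0.1

for _ in range(100):
    y = sum(wi * xi for wi, xi in zip(w, x)) + b      # step 3: forward pass
    error = y - target                                # step 1: error vs. actual
    grad_w = [2 * error * xi for xi in x]             # step 5: gradients (MSE)
    grad_b = 2 * error
    w = [wi - lr * gi for wi, gi in zip(w, grad_w)]   # step 2: update
    b -= lr * grad_b

pred = sum(wi * xi for wi, xi in zip(w, x)) + b
print(round(pred, 4))  # converges to the target, 1.0
```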
______________________________________________________________________________

QUESTION 10:
You run gradient descent for 15 iterations with learning rate 𝜂 = 0.3 and compute error after
each iteration. You find that the value of error decreases very slowly. Based on this, which of
the following conclusions seems most plausible?

a. Rather than using the current value of η, use a larger value of η
b. Rather than using the current value of η, use a smaller value of η
c. Keep 𝜂 = 0.3
d. None of the above

Correct Answer: a
Detailed Solution:
Error rate is decreasing very slowly. Therefore increasing the learning rate is a most plausible
solution.
______________________________________________________________________________

______________________________________________________________________________

************END*******
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 10
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________

QUESTION 1:

What is not a reason for using batch normalization?

a. Prevent overfitting
b. Faster convergence
c. Faster inference time
d. Prevent Co-variant shift

Correct Answer: c

Detailed Solution:
Inference time does not become faster due to batch normalization; the normalization adds
computation, so inference time increases.
____________________________________________________________________________

QUESTION 2:
A neural network has 3 neurons in a hidden layer. The activations of the neurons for three
batches are [1, 2, 3]^T, [0, 2, 5]^T and [6, 9, 2]^T respectively. What will be the value of the
mean if we use batch normalization in this layer?

a. [2.33, 4.33, 3.33]^T
b. [2.00, 2.33, 5.66]^T
c. [1.00, 1.00, 1.00]^T
d. [0.00, 0.00, 0.00]^T

Correct Answer: a

Detailed Solution:
(1/3) × ([1, 2, 3]^T + [0, 2, 5]^T + [6, 9, 2]^T) = [2.33, 4.33, 3.33]^T
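The per-neuron mean over the batch, the first step of batch normalization, can be sketched as:

```python
# Three samples, each with 3 neuron activations.
batch = [[1, 2, 3], [0, 2, 5], [6, 9, 2]]

n = len(batch)
mean = [sum(sample[j] for sample in batch) / n for j in range(3)]
print([round(m, 2) for m in mean])  # [2.33, 4.33, 3.33]
```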
______________________________________________________________________________

QUESTION 3:
How can we prevent underfitting?

a. Increase the number of data samples


b. Increase the number of features
c. Decrease the number of features
d. Decrease the number of data samples

Correct Answer: b

Detailed Solution:
Underfitting happens when the features are not expressive enough to capture the data
distribution. We need to increase the number of features so the data can be fitted well.
______________________________________________________________________________

QUESTION 4:
How do we generally calculate mean and variance during testing?

a. Batch normalization is not required during testing


b. Mean and variance based on test image
c. Estimated mean and variance statistics during training
d. None of the above

Correct Answer: c

Detailed Solution:
We generally calculate batch mean and variance statistics during training and use the estimated
batch mean and variance during testing.
______________________________________________________________________________

QUESTION 5:
Which one of the following is not an advantage of dropout?

a. Regularization
b. Prevent Overfitting
c. Improve Accuracy
d. Reduce computational cost during testing

Correct Answer: d

Detailed Solution:
Dropout randomly zeroes out some features during training, but while testing we don't zero out
any feature. So there is no reduction of computational cost during testing.
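An illustrative sketch of (inverted) dropout makes the train/test asymmetry explicit; the helper names are ours:

```python
import random

def dropout_train(activations, p=0.5, seed=0):
    # Inverted dropout: zero each activation with probability p and
    # rescale survivors by 1/(1-p), so test time needs no rescaling.
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

def dropout_test(activations):
    return list(activations)  # identity: every unit runs at test time

acts = [0.2, 0.9, 0.4, 0.7]
print(dropout_train(acts))  # some entries zeroed, the rest scaled by 2
print(dropout_test(acts))   # unchanged: [0.2, 0.9, 0.4, 0.7]
```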
______________________________________________________________________________

QUESTION 6:
What is the main advantage of layer normalization over batch normalization?

a. Faster convergence
b. Lesser computation
c. Useful in recurrent neural network
d. None of these

Correct Answer: c

Detailed Solution:
See the lectures/lecture materials.
______________________________________________________________________________

QUESTION 7:
While training a neural network for image recognition task, we plot the graph of training error
and validation error. Which is the best for early stopping?

a. A
b. B
c. C
d. D

Correct Answer: c

Detailed Solution:
Minimum validation point is the best for early stopping.
______________________________________________________________________________

QUESTION 8:
Which among the following is NOT a data augmentation technique?

a. Random horizontal and vertical flip of image


b. Random shuffle all the pixels of an image
c. Random color jittering
d. All the above are data augmentation techniques

Correct Answer: b

Detailed Solution:
Randomly shuffling all the pixels of an image will destroy its structure, and the neural network
will be unable to learn anything. So, it is not a data augmentation technique.
______________________________________________________________________________

QUESTION 9:
Which of the following is true about model capacity (where model capacity means the ability of
neural network to approximate complex functions)?

a. As number of hidden layers increase, model capacity increases


b. As dropout ratio increases, model capacity increases
c. As learning rate increases, model capacity increases
d. None of these

Correct Answer: a

Detailed Solution:

Dropout and learning rate have nothing to do with model capacity. Increasing the number of
hidden layers increases the number of learnable parameters; therefore, model capacity increases.
______________________________________________________________________________

QUESTION 10:
Batch Normalization is helpful because

a. It normalizes all the input before sending it to the next layer


b. It returns back the normalized mean and standard deviation of weights
c. It is a very efficient back-propagation technique
d. None of these

Correct Answer: a

Detailed Solution:
Batch normalization layer normalizes the input.

______________________________________________________________________________

______________________________________________________________________________

************END*******
