DL Assignment Solution 00 To 10
Deep Learning
Assignment- Week 0
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Find df/dx, where f = |x|. (|x| means the absolute value of x.)
a. 1
b. 𝑆𝑖𝑔𝑛(𝑥)
c. 0
d. ∞
Correct Answer: b
Detailed Solution:
df/dx = 1 for x > 0, −1 for x < 0, and 0 at x = 0, i.e. df/dx = sign(x)
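The piecewise result can be checked numerically. The sketch below (plain Python, not part of the original solution) compares a central finite difference of |x| with sign(x) at a few points away from zero.

```python
# Numerical check that d|x|/dx = sign(x) for x != 0 (illustrative sketch only).
def sign(x):
    return (x > 0) - (x < 0)

def numeric_derivative(f, x, h=1e-6):
    # Central finite-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

checks = [abs(numeric_derivative(abs, x) - sign(x)) < 1e-6
          for x in (-3.0, -0.5, 0.7, 4.0)]
```

At x = 0 the derivative does not exist, which is why the check avoids that point.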
______________________________________________________________________________
QUESTION 2:
Find dσ/dx, where σ(x) = 1/(1 + e^(−x))
a. dσ/dx = 1 − σ(x)
b. dσ/dx = 1 + σ(x)
c. dσ/dx = σ(x)(1 − σ(x))
d. dσ/dx = σ(x)(1 + σ(x))
Correct Answer: c
Detailed Solution:
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur
σ(x) = 1/(1 + e^(−x))
dσ/dx = (1 + e^(−x))^(−2) · e^(−x)
dσ/dx = e^(−x)/(1 + e^(−x))² = (1 + e^(−x) − 1)/(1 + e^(−x))² = 1/(1 + e^(−x)) − 1/(1 + e^(−x))² = (1/(1 + e^(−x)))(1 − 1/(1 + e^(−x)))
dσ/dx = σ(x)(1 − σ(x))
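The identity can be sanity-checked numerically; a small Python sketch (not part of the original solution) compares the closed form against a central finite difference.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Closed form derived above: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

h = 1e-6
ok = all(abs(sigmoid_grad(x) - (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)) < 1e-6
         for x in (-2.0, -0.3, 0.0, 1.5))
```

Note that the gradient peaks at x = 0 with value 0.25, which is one reason sigmoid gradients shrink in deep stacks.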
______________________________________________________________________________
QUESTION 3:
There are 5 black and 7 white balls. Two balls are drawn at random, one by one, without replacement. What will be the probability that both balls are black?
a. 20/132
b. 25/144
c. 20/144
d. 25/132
Correct Answer: a
Detailed Solution:
P(first ball black) = 5/12 and P(second ball black | first ball black) = 4/11, so the overall probability of both balls being black = (5/12) × (4/11) = 20/132.
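The product of the two conditional probabilities can be computed exactly with Python's `fractions` module (a verification sketch, not part of the original solution):

```python
from fractions import Fraction

p_first_black = Fraction(5, 12)    # 5 black balls out of 12 total
p_second_black = Fraction(4, 11)   # 4 black balls left out of 11
p_both_black = p_first_black * p_second_black  # 20/132 = 5/33
```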
______________________________________________________________________________
QUESTION 4:
Two dice are rolled together. What will be the probability of getting a 1 and a 4 together?
a. 1/18
b. 1/36
c. 1
d. None of the above
Correct Answer: a
Detailed Solution:
The number of outcomes with 1 & 4 together = 2 (1 on the first die and 4 on the second, or 4 on the first die and 1 on the second). Out of 36 equally likely outcomes, the probability = 2/36 = 1/18.
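Enumerating the sample space makes the count explicit (a Python sketch, not part of the original solution):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))         # 36 ordered (die1, die2) pairs
favourable = [o for o in outcomes if set(o) == {1, 4}]  # (1, 4) and (4, 1)
p = Fraction(len(favourable), len(outcomes))            # 2/36 = 1/18
```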
_____________________________________________________________________________
QUESTION 5:
What will be a possible median of the distribution?
a. 26
b. 34
c. 43
d. 55
Correct Answer: b
Detailed Solution:
So, the median is the average of the (1434/2) = 717th value and the 718th value.
QUESTION 6:
The image shows three normally distributed probability density functions with zero mean and three different variances (𝜎1, 𝜎2, 𝜎3). Which of the following relationships is valid?
a. 𝜎1 > 𝜎2 > 𝜎3
b. 𝜎1 < 𝜎2 < 𝜎3
c. 𝜎1 = 𝜎2 = 𝜎3
d. 𝜎1 > 𝜎2 < 𝜎3
Correct Answer: b
Detailed Solution:
Higher variance means a larger spread of the distribution. So, 𝝈𝟏 < 𝝈𝟐 < 𝝈𝟑.
____________________________________________________________________________
QUESTION 7:
The matrix inverse of a square matrix 𝐴 exists if:
a. Determinant of 𝐴, 𝑑𝑒𝑡(𝐴) = 0
b. Eigen values of 𝐴 are non-zero
c. Sum of eigen values are non-zero
d. None of the above
Correct Answer: b
Detailed Solution:
The matrix inverse exists if 𝒅𝒆𝒕(𝑨) is not equal to zero. Since 𝒅𝒆𝒕(𝑨) equals the product of all the eigenvalues of the square matrix, the inverse exists exactly when all eigenvalues are non-zero.
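The determinant/eigenvalue relationship can be verified numerically. The matrix below is an example chosen for illustration (it is not from the question); NumPy is assumed to be available.

```python
import numpy as np

# An example invertible matrix (an assumption for illustration only).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals = np.linalg.eigvals(A)
det_from_eigs = float(np.prod(eigvals))   # det(A) = product of eigenvalues
det_direct = float(np.linalg.det(A))
invertible = all(abs(v) > 1e-12 for v in eigvals)
```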
_________________________________________________________________________
QUESTION 8:
𝑥1, 𝑥2, 𝑥3 are linearly independent vectors. If 𝑥1 = [1, 3, 0]ᵀ and 𝑥2 = [−2, 4, −5]ᵀ, what is a possible value of 𝑥3?
a. [−1, 7, −5]ᵀ
b. [0, 10, −5]ᵀ
c. [3, 4, 5]ᵀ
d. [5, −5, 10]ᵀ
Correct Answer: c
Detailed Solution:
𝒅𝒆𝒕([[1, −2, 3], [3, 4, 4], [0, −5, 5]]) ≠ 𝟎
We can also verify the linear dependency of options a, b, d:
Option a: 𝒙𝟏 + 𝒙𝟐 = 𝒙𝟑
Option b: 𝟐𝒙𝟏 + 𝒙𝟐 = 𝒙𝟑
Option d: 𝒙𝟏 − 𝟐𝒙𝟐 = 𝒙𝟑
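The determinant test for each option can be run directly (a verification sketch, assuming NumPy is available):

```python
import numpy as np

x1 = np.array([1, 3, 0])
x2 = np.array([-2, 4, -5])
options = {"a": np.array([-1, 7, -5]),
           "b": np.array([0, 10, -5]),
           "c": np.array([3, 4, 5]),
           "d": np.array([5, -5, 10])}

# x1, x2, x3 are linearly independent iff det([x1 x2 x3]) != 0.
independent = [k for k, x3 in options.items()
               if abs(np.linalg.det(np.column_stack([x1, x2, x3]))) > 1e-9]
```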
______________________________________________________________________________
QUESTION 9:
Find the solution of the following system of equations:
𝑥 + 2𝑦 − 𝑧 = 1 ∙∙∙∙∙∙∙∙∙∙ (1)
−2𝑥 − 4𝑦 + 2𝑧 = −2 ∙∙∙∙∙∙∙∙∙∙∙ (2)
𝑧 = 2 ∙∙∙∙∙∙∙∙∙∙ (3)
a. 𝑥 = 0, 𝑦 = 0, 𝑧 = 2
b. 𝑧 = 2 and infinitely possible 𝑥, 𝑦
c. 𝑧 = 2 and no possible 𝑥, 𝑦
d. None of the above
Correct Answer: b
Detailed Solution:
Equation (2) is just −2 times equation (1), so it adds no new constraint. With 𝑧 = 2, equation (1) becomes 𝑥 + 2𝑦 = 3, which has infinitely many solutions (𝑥, 𝑦).
____________________________________________________________________________
QUESTION 10:
What are the eigen values of the matrix A?
𝐴 = [[5, 4], [−3, −2]]
a. 4, −3
b. 5, −2
c. −2, −1
d. 2, 1
Correct Answer: d
Detailed Solution:
𝐝𝐞𝐭(𝝀𝑰 − 𝑨) = 𝟎
or, 𝒅𝒆𝒕([[𝝀 − 𝟓, −𝟒], [𝟑, 𝝀 + 𝟐]]) = 𝟎
or, (𝝀 − 𝟓)(𝝀 + 𝟐) + 𝟏𝟐 = 𝟎
or, 𝝀² − 𝟑𝝀 + 𝟐 = 𝟎
or, 𝝀 = 𝟐, 𝟏
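The characteristic roots can be confirmed numerically (a verification sketch, assuming NumPy is available):

```python
import numpy as np

A = np.array([[5.0, 4.0],
              [-3.0, -2.0]])
eigvals = sorted(np.linalg.eigvals(A).real)   # both roots are real here
```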
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 1
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2= 20
______________________________________________________________________________
QUESTION 1:
Signature descriptor of an unknown shape is given in the figure, can you identify the unknown
shape?
a. Circle
b. Square
c. Straight line
d. Cannot be predicted
Correct Answer: a
Detailed Solution:
The distance from the centroid to the boundary is the same for every value of ϴ. This is true for a circle with radius k.
______________________________________________________________________________
QUESTION 2:
To measure the smoothness, coarseness, and regularity of a region, which transformation do we use to extract features?
a. Gabor Transformation
b. Wavelet Transformation
c. Both Gabor, and Wavelet Transformation.
d. None of the Above.
Correct Answer: c
Detailed Solution:
One of the important approaches to region description is texture content. It provides a measure of some important properties of an image, such as the smoothness, coarseness, and regularity of the region. We use the Gabor filter and the Wavelet transformation to extract texture features.
QUESTION 3:
Suppose the Fourier descriptor of a shape has K coefficients, and we remove the last few coefficients and use only the first m (m < K) coefficients to reconstruct the shape. What will be the effect of using the truncated Fourier descriptor on the reconstructed shape?
a. We will get a smoothed boundary version of the shape.
b. We will get only the fine details of the boundary of the shape.
c. Full shape will be reconstructed without any loss of information.
d. Low frequency component of the boundary will be removed from contour of the
shape.
Correct Answer: a
Detailed Solution:
The low-frequency components of the Fourier descriptor capture the general shape properties of the object, and the high-frequency components capture the finer details. So, if we remove the last few components, the finer details are lost, and the reconstructed shape is a smoothed version of the original. The boundary of the reconstructed shape will be a low-frequency approximation of the original shape boundary.
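This smoothing effect can be demonstrated on a hypothetical boundary. In the sketch below (NumPy assumed; the boundary samples are made up for illustration, not data from the question), a circle with a high-frequency ripple is represented as complex points, and keeping only the low-frequency Fourier coefficients removes the ripple.

```python
import numpy as np

# Hypothetical closed boundary: a unit circle with a high-frequency ripple,
# sampled as complex points z = x + j*y.
N = 128
t = 2 * np.pi * np.arange(N) / N
z = (1.0 + 0.2 * np.cos(20 * t)) * np.exp(1j * t)

a = np.fft.fft(z)                               # Fourier descriptors
a_trunc = np.zeros_like(a)
low = list(range(4)) + list(range(N - 3, N))    # keep only low-frequency terms
a_trunc[low] = a[low]
z_smooth = np.fft.ifft(a_trunc)                 # reconstruct from the kept terms

ripple_before = float(np.max(np.abs(np.abs(z) - 1.0)))        # about 0.2
ripple_after = float(np.max(np.abs(np.abs(z_smooth) - 1.0)))  # ripple removed
```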
______________________________________________________________________________
QUESTION 4:
While computing the polygonal descriptor of an arbitrary shape using the splitting technique, which of the following do we take as the starting guess?
a. The vertex joining the two closest points above a threshold on the boundary.
b. The vertex joining the two farthest points on the boundary.
c. The vertex joining any two arbitrary points on the boundary.
d. None of the above.
Correct Answer: b
Detailed Solution:
_____________________________________________________________________________
QUESTION 5:
Consider a two-class Bayes' Minimum Risk Classifier. The probabilities of classes W1 and W2 are P(ω1) = 0.3 and P(ω2) = 0.7 respectively. P(x) = 0.545, P(x|ω1) = 0.65, P(x|ω2) = 0.5, and the loss matrix is [[𝜆11, 𝜆12], [𝜆21, 𝜆22]].
If the classifier assigns x to class W1, then which one of the following is true?
a. (𝜆21 − 𝜆11)/(𝜆12 − 𝜆22) < 1.79
b. (𝜆21 − 𝜆11)/(𝜆12 − 𝜆22) > 1.79
c. (𝜆21 − 𝜆11)/(𝜆12 − 𝜆22) < 1.09
d. (𝜆21 − 𝜆11)/(𝜆12 − 𝜆22) > 1.09
Correct Answer: b
Detailed Solution:
We assign x to ω1 when R(α1|x) < R(α2|x), i.e. when (𝜆21 − 𝜆11) P(ω1|x) > (𝜆12 − 𝜆22) P(ω2|x). Rearranging,
(𝜆21 − 𝜆11)/(𝜆12 − 𝜆22) > [P(x|ω2) P(ω2)] / [P(x|ω1) P(ω1)] = (0.5 × 0.7)/(0.65 × 0.3) ≈ 1.79
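A quick check of the decision threshold, using the likelihoods and priors from the question (plain Python sketch, not part of the original solution):

```python
# Minimum-risk rule: assign w1 when (lam21 - lam11)/(lam12 - lam22)
# exceeds [P(x|w2) * P(w2)] / [P(x|w1) * P(w1)].
p_w1, p_w2 = 0.3, 0.7          # priors P(w1), P(w2)
p_x_w1, p_x_w2 = 0.65, 0.5     # likelihoods P(x|w1), P(x|w2)

threshold = (p_x_w2 * p_w2) / (p_x_w1 * p_w1)   # approximately 1.79
```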
____________________________________________________________________________
QUESTION 6:
The Fourier transformation of a complex sequence of numbers 𝑠(𝑘), for 𝑘 = 0, …, 𝑁 − 1, is given by:
a. 𝑎(𝑢) = Σ_{k=0}^{N−1} 𝑠(𝑘) 𝑒^{𝑗2𝜋𝑢𝑘/𝑁}
b. 𝑎(𝑢) = Σ_{k=0}^{N} 𝑠(𝑘) 𝑒^{𝑗2𝜋𝑢𝑘/𝑁}
c. 𝑎(𝑢) = Σ_{k=0}^{N−1} 𝑠(𝑘) 𝑒^{−𝑗2𝜋𝑢𝑘/𝑁}
d. 𝑎(𝑢) = Σ_{k=−N/2}^{N/2} 𝑠(𝑘) 𝑒^{−𝑗2𝜋𝑢𝑘/𝑁}
Correct Answer: c
Detailed Solution:
_____________________________________________________________________________
QUESTION 7:
The gray-level co-occurrence matrix C of an unknown image is given below. What is the value of the maximum probability descriptor?
C = [[1, 2, 2], [2, 1, 2], [2, 3, 2]]
Fig 1: C
a. 3/17
b. 1/12
c. 3/16
d. 5/16
Correct Answer: a
Detailed Solution:
Maximum probability = max(cij), where cij is the normalized co-occurrence matrix. The sum of all entries in C is 17 and the largest entry is 3, so the maximum probability descriptor is 3/17.
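The descriptor can be computed directly from the matrix in the question (a Python sketch, not part of the original solution):

```python
from fractions import Fraction

C = [[1, 2, 2],
     [2, 1, 2],
     [2, 3, 2]]
total = sum(sum(row) for row in C)            # normalizing constant: 17
max_entry = max(max(row) for row in C)        # largest co-occurrence count: 3
max_probability = Fraction(max_entry, total)  # maximum probability descriptor
```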
______________________________________________________________________________
QUESTION 8:
Which of the following is not a boundary descriptor?
a. Polygonal Representation
b. Fourier descriptor
c. Signature
d. Histogram.
Correct Answer: d
Detailed Solution:
A histogram is a regional (statistical) descriptor; polygonal representation, Fourier descriptors, and signatures all describe the boundary.
______________________________________________________________________________
QUESTION 9:
b. Texture
c. MFCC
______________________________________________________________________________
QUESTION 10:
If the larger values of the gray-level co-occurrence matrix are concentrated around the main diagonal, then which one of the following will be true?
Correct Answer: a
Detailed Solution:
The options are self-explanatory. We cannot comment on the entropy based only on the diagonal values, because entropy depends on the randomness of the values. However, the element difference moment will be low and the inverse element difference moment will be high.
______________________________________________________________________________
************END***********
Deep Learning
Assignment- Week 2
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2 = 20
______________________________________________________________________________
QUESTION 1:
Suppose you are solving an n-class problem. How many discriminant functions will you need?
a. n-1
b. n
c. n+1
d. n-2
Correct Answer: b
______________________________________________________________________________
QUESTION 2:
If we choose the discriminant function 𝑔𝑖(𝑥) as a function of the posterior probability, i.e. 𝑔𝑖(𝑥) = 𝑓(𝑝(𝑤𝑖|𝑥)), then which of the following cannot be the function 𝑓(·)?
Correct Answer: b
Detailed Solution:
______________________________________________________________________________
QUESTION 3:
What will be the nature of the decision surface when the covariance matrices of the different classes are identical but otherwise arbitrary? (Given that all the classes have equal class probabilities.)
Correct Answer: c
Detailed Solution:
_____________________________________________________________________________
QUESTION 4:
The means and covariances of all the samples of two normally distributed classes ω1 and ω2 are given as
𝜇1 = [3, 6]ᵀ; Σ1 = [[1/2, 0], [0, 2]] and 𝜇2 = [3, −2]ᵀ; Σ2 = [[2, 0], [0, 2]]
What will be the expression for the decision boundary between these two classes if both classes have equal class probability 0.5? For the input sample 𝑥 = [𝑥1, 𝑥2]ᵀ consider
𝑔𝑖(𝑥) = −(1/2) 𝑥ᵀ Σ𝑖⁻¹ 𝑥 + (Σ𝑖⁻¹ 𝜇𝑖)ᵀ 𝑥 − (1/2) 𝜇𝑖ᵀ Σ𝑖⁻¹ 𝜇𝑖 − (1/2) ln|Σ𝑖| + ln 𝑃(𝜔𝑖)
Correct Answer: a
Detailed Solution:
This is the most general case of the discriminant function for a normal density. The inverse matrices are
Σ1⁻¹ = [[2, 0], [0, 1/2]], and Σ2⁻¹ = [[1/2, 0], [0, 1/2]]
Setting 𝑔1(𝑥) = 𝑔2(𝑥), we get the decision boundary 𝑥2 = 3.514 − 1.12𝑥1 + 0.187𝑥1²
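The boundary can be verified numerically by evaluating both discriminants at points on the quoted curve; the small residue comes from the rounded coefficients. (A verification sketch, assuming NumPy is available.)

```python
import numpy as np

mu1, S1 = np.array([3.0, 6.0]), np.diag([0.5, 2.0])
mu2, S2 = np.array([3.0, -2.0]), np.diag([2.0, 2.0])

def g(x, mu, S, prior=0.5):
    # General-case discriminant for a normal density.
    Si = np.linalg.inv(S)
    return (-0.5 * x @ Si @ x + (Si @ mu) @ x
            - 0.5 * mu @ Si @ mu
            - 0.5 * np.log(np.linalg.det(S)) + np.log(prior))

# Points on x2 = 3.514 - 1.12*x1 + 0.187*x1**2 should make g1 and g2 nearly equal.
residuals = [abs(g(np.array([x1, 3.514 - 1.12 * x1 + 0.187 * x1 ** 2]), mu1, S1)
                 - g(np.array([x1, 3.514 - 1.12 * x1 + 0.187 * x1 ** 2]), mu2, S2))
             for x1 in (-1.0, 0.0, 1.0, 2.0)]
```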
QUESTION 5:
For a two-class problem, the linear discriminant function is given by g(x) = aᵀy, where y is the augmented feature vector. What is the update rule for finding the weight vector a?
a. Adding the sum of all misclassified augmented feature vectors, multiplied by the learning rate, to the current weight vector.
b. Subtracting the sum of all misclassified augmented feature vectors, multiplied by the learning rate, from the current weight vector.
c. Adding the sum of all augmented feature vectors belonging to the positive class, multiplied by the learning rate, to the current weight vector.
d. Subtracting the sum of all augmented feature vectors belonging to the negative class, multiplied by the learning rate, from the current weight vector.
Correct Answer: a
Detailed Solution:
𝑎(𝑘 + 1) = 𝑎(𝑘) + 𝜂 Σ_{y misclassified} 𝑦
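The update rule can be illustrated on a tiny made-up dataset. In the sketch below (NumPy assumed; the samples are an assumption for illustration, already augmented and with class-2 vectors negated so that correct classification means aᵀy > 0), the batch perceptron converges after a single update.

```python
import numpy as np

# Hypothetical augmented, sign-normalised samples (not from the question).
Y = np.array([[1.0, 2.0, 2.0],
              [1.0, 1.0, 1.0],
              [-1.0, 2.0, 2.0]])
a = np.zeros(3)
eta = 1.0
for _ in range(100):
    mis = Y[Y @ a <= 0]            # a^T y <= 0  ->  y is misclassified
    if len(mis) == 0:
        break
    a = a + eta * mis.sum(axis=0)  # add the sum of misclassified vectors
separated = bool(np.all(Y @ a > 0))
```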
____________________________________________________________________________
QUESTION 6:
a. All the classes should have identical covariance matrix and diagonal matrix.
b. All the classes should have identical covariance matrix but otherwise arbitrary.
c. All the classes should have equal class probability.
d. None of above.
Correct Answer: c
QUESTION 7:
Which of the following is the updating rule of gradient descent algorithm? Here ∇ is gradient
operator and 𝜂 is learning rate.
a. 𝑎𝑛+1 = 𝑎𝑛 − 𝜂∇𝐹(𝑎𝑛 )
b. 𝑎𝑛+1 = 𝑎𝑛 + 𝜂∇𝐹(𝑎𝑛 )
c. 𝑎𝑛+1 = 𝑎𝑛 − 𝜂∇𝐹(𝑎𝑛−1 )
d. 𝑎𝑛+1 = 𝑎𝑛 + 𝜂∇𝐹(𝑎𝑛−1 )
Correct Answer: a
Detailed Solution:
______________________________________________________________________________
QUESTION 8:
The decision surface between two normally distributed class ω1 and ω2 is shown on the figure.
Can you comment which of the following is true?
a. 𝑝(𝜔1 ) = 𝑝(𝜔2 )
Correct Answer: c
Detailed Solution:
If the prior probabilities are not equal, the optimal boundary hyperplane is shifted away
from the more likely mean.
______________________________________________________________________________
QUESTION 9:
a. Assigning the label which is most frequent among the k nearest training samples.
b. Assigning the unknown object to the class of its nearest neighbour among the training samples.
c. Assigning the label which is most frequent among all training samples except the k farthest neighbours.
d. None of these.
Correct Answer: a
Detailed Solution:
QUESTION 10:
What is the direction of weight vector w.r.t. decision surface for linear classifier?
a. Parallel
b. Normal
c. At an inclination of 45°
d. Arbitrary
Correct Answer: b
Detailed Solution:
The weight vector is normal (perpendicular) to the decision surface: for g(x) = wᵀx + b, any two points x1, x2 on the surface satisfy wᵀ(x1 − x2) = 0.
************END*******
Deep Learning
Assignment- Week 3
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Find the distance of the 3D point, 𝑃 = (−3, 1, 3) from the plane defined by
2𝑥 + 2𝑦 + 5𝑧 + 9 = 0?
a. 3.1
b. 4.6
c. 0
d. ∞ (infinity)
Correct Answer: b
Detailed Solution:
QUESTION 2:
What is the shape of the loss landscape during optimization of SVM?
a. Linear
b. Paraboloid
c. Ellipsoidal
d. Non-convex with multiple possible local minimum
Correct Answer: b
Detailed Solution:
In SVM the objective is to find the maximum-margin hyperplane (W) subject to the margin constraints on the training samples.
This optimization is a quadratic optimization with a paraboloid landscape for the loss function.
______________________________________________________________________________
QUESTION 3:
How many local minimum can be encountered while solving the optimization for maximizing
margin for SVM?
a. 1
b. 2
c. ∞ (infinite)
d. 0
Correct Answer: a
Detailed Solution:
In SVM the objective is to find the maximum-margin hyperplane (W); the optimization is a quadratic optimization with a paraboloid landscape for the loss function. Since the shape is a paraboloid, there can be only one (global) minimum.
______________________________________________________________________________
QUESTION 4:
Which of the following classifiers can be replaced by a linear SVM?
a. Logistic Regression
b. Neural Networks
c. Decision Trees
d. None of the above
Correct Answer: a
Detailed Solution:
Logistic regression framework belongs to the genre of linear classifier which means the
decision boundary can segregate classes only if they are linearly separable. SVM is also
capable of doing so and thus can be used instead of logistic regression classifiers. Neural
networks and decision trees are capable of modeling non-linear decision boundaries which
linear SVM cannot model directly.
______________________________________________________________________________
QUESTION 5:
Find the scalar projection of vector b = <-2, 3> onto vector a = <1, 2>?
a. 0
b. 4/√5
c. 2/√17
d. −2/17
Correct Answer: b
Detailed Solution:
The scalar projection of b onto a is given by (𝒃 ∙ 𝒂)/|𝒂| = (−2 + 6)/√5 = 4/√5.
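The computation in plain Python (a verification sketch, not part of the original solution):

```python
import math

a = (1.0, 2.0)    # vector a = <1, 2>
b = (-2.0, 3.0)   # vector b = <-2, 3>

dot = sum(bi * ai for bi, ai in zip(b, a))     # b . a = -2 + 6 = 4
norm_a = math.sqrt(sum(ai * ai for ai in a))   # |a| = sqrt(5)
scalar_projection = dot / norm_a               # 4 / sqrt(5)
```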
____________________________________________________________________________
QUESTION 6:
For a 2-class problem, what is the minimum possible number of support vectors? Assume there are more than 4 examples from each class.
a. 4
b. 1
c. 2
d. 8
Correct Answer: c
Detailed Solution:
The maximum-margin hyperplane can be supported by as few as one training example from each class, so the minimum possible number of support vectors is 2.
____________________________________________________________________________
QUESTION 7:
Which one of the following is a valid representation of hinge loss (of margin = 1) for a two-class
problem?
Correct Answer: a
Detailed Solution:
Hinge loss yields a value of 0 if the predicted output (p) has the same sign as the class label and satisfies the margin condition |p| > 1. If the signs differ, the loss increases linearly as a function of p. Option (a) satisfies these criteria.
______________________________________________________________________________
QUESTION 8:
Suppose we have one feature x ∈ R and a binary class y. The dataset consists of 3 points: p1: (x1, y1) = (−1, −1), p2: (x2, y2) = (1, 1), p3: (x3, y3) = (3, 1). Which of the following is true with respect to SVM?
a. Maximum margin will increase if we remove the point p2 from the training
set.
b. Maximum margin will increase if we remove the point p3 from the training
set.
c. Maximum margin will remain same if we remove the point p2 from the
training set.
d. None of the above.
Correct Answer: a
Detailed Solution:
Here the point p2 is a support vector; if we remove p2, the maximum margin will increase (the nearest positive example becomes p3).
____________________________________________________________________________
QUESTION 9:
If we employ SVM to realize two input logic gates, then which of the following will be true?
a. The weight vector for AND gate and OR gate will be same.
b. The margin for AND gate and OR gate will be same.
c. Both the margin and weight vector will be same for AND gate and OR
gate.
d. None of the weight vector and margin will be same for AND gate and
OR gate.
Correct Answer: b
Detailed Solution:
As we can see, although the weight vectors are not the same, the margin is the same.
______________________________________________________________________________
QUESTION 10:
What will happen to the margin length of a max-margin linear SVM if one of the non-support-vector training examples is removed?
Correct Answer: c
Detailed Solution:
In max-margin linear SVM, the separating hyper-planes are determined only by the
training examples which are support vectors. The non-support vector training examples do
not influence the geometry of the separating planes. Thus, the margin, in our case, will be
unaltered.
____________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 4
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
A given cost function is of the form J(θ) = θ² − θ + 2. What is the weight update rule for gradient descent optimization at step t + 1? Consider 𝛼 = 0.01 to be the learning rate.
a. 𝜃𝑡+1 = 𝜃𝑡 − 0.01(2𝜃 − 1)
b. 𝜃𝑡+1 = 𝜃𝑡 + 0.01(2𝜃)
c. 𝜃𝑡+1 = 𝜃𝑡 − (2𝜃 − 1)
d. 𝜃𝑡+1 = 𝜃𝑡 − 0.01(𝜃 − 1)
Correct Answer: a
Detailed Solution:
𝜕𝐽(𝜃)/𝜕𝜃 = 2𝜃 − 1
So, the weight update will be
𝜃𝑡+1 = 𝜃𝑡 − 0.01(2𝜃𝑡 − 1)
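Iterating this update converges to the minimizer of J, which sits at θ = 0.5 (a plain-Python sketch, not part of the original solution):

```python
def grad_J(theta):
    # J(theta) = theta**2 - theta + 2  =>  dJ/dtheta = 2*theta - 1
    return 2.0 * theta - 1.0

alpha = 0.01       # learning rate from the question
theta = 0.0        # arbitrary starting point
for _ in range(2000):
    theta -= alpha * grad_J(theta)   # theta_{t+1} = theta_t - alpha*(2*theta_t - 1)
```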
______________________________________________________________________________
QUESTION 2:
Can you identify in which of the following graph gradient descent will not work correctly?
a. First figure
b. Second figure
c. First and second figure
d. Fourth figure
Correct Answer: b
Detailed Solution:
This is a classic example of saddle point problem of gradient descent. In the second graph
gradient descent may get stuck in the saddle point.
______________________________________________________________________________
QUESTION 3:
From the following two figures can you identify which one corresponds to batch gradient
descent and which one to Stochastic gradient descent?
Correct Answer: a
Detailed Solution:
The graph of cost vs. epochs is quite smooth for batch gradient descent because we average the gradients over all the training data for a single step. The average cost over the epochs fluctuates for stochastic gradient descent because we use one example at a time.
______________________________________________________________________________
QUESTION 4:
Suppose for the cost function 𝐽(𝜃) = 0.25𝜃², shown in the graph below, at which point do you feel the magnitude of the weight update will be larger? 𝜃 is plotted along the horizontal axis.
Correct Answer: a
Detailed Solution:
The weight update is directly proportional to the magnitude of the gradient of the cost function. In our case, 𝜕𝐽(𝜃)/𝜕𝜃 = 0.5𝜃, so the weight update will be larger for higher values of 𝜃.
______________________________________________________________________________
QUESTION 5:
Which logic function can be performed using a 2-layered Neural Network?
a. AND
b. OR
c. XOR
d. All
Correct Answer: d
Detailed Solution:
A two layer neural network can be used for any type logic Gate (linear or non linear)
implementation.
____________________________________________________________________________
QUESTION 6:
Let X and Y be two features to discriminate between two classes. The values and class labels of
the features are given hereunder. The minimum number of neuron-layers required to design
the neural network classifier
X Y #Class
0 2 Class-II
1 2 Class-I
2 2 Class-I
1 3 Class-I
1 -3 Class-II
a. 1
b. 2
c. 4
d. 5
Correct Answer: a.
Detailed Solution:
Plot the feature points. They are linearly separable, hence a single layer is able to do the classification task.
____________________________________________________________________________
QUESTION 7:
Which among the following options give the range for a logistic function?
a. -1 to 1
b. -1 to 0
c. 0 to 1
d. 0 to infinity
Correct Answer: c
Detailed Solution:
______________________________________________________________________________
QUESTION 8:
The number of weights (including biases) to be learned by a neural network having 3 inputs, 2 output classes, and a hidden layer with 5 neurons is:
a. 12
b. 15
c. 25
d. 32
Correct Answer: d
Detailed Solution:
With one bias per neuron: (3 + 1) × 5 + (5 + 1) × 2 = 20 + 12 = 32.
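The counting rule generalizes to any fully connected network; a small helper (a Python sketch, not part of the original solution) makes the arithmetic explicit:

```python
def n_params(layer_sizes, bias=True):
    # Each consecutive layer pair contributes (fan_in [+ 1 bias]) * fan_out weights.
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += (fan_in + (1 if bias else 0)) * fan_out
    return total

count_with_bias = n_params([3, 5, 2], bias=True)   # (3+1)*5 + (5+1)*2 = 32
```

The same helper with `bias=False` reproduces the no-bias counts used in later questions.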
______________________________________________________________________________
QUESTION 9:
For an XNOR function as given in the figure below, the activation function of each node is given by:
𝑓(𝑥) = 1 if 𝑥 ≥ 0, and 0 otherwise. Consider 𝑋1 = 1 and 𝑋2 = 0; what will be the output of the above neural network?
a. 1.5
b. 2
c. 0
d. 1
Correct Answer: c
Detailed Solution:
____________________________________________________________________________
QUESTION 10:
Which activation function is more prone to vanishing gradient problem?
a. ReLU
b. Tanh
c. sigmoid
d. Threshold
Correct Answer: b
Detailed Solution:
************END*******
Deep Learning
Assignment- Week 5
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2 = 20
_____________________________________________________________________________
QUESTION 1:
Suppose a fully-connected neural network has a single hidden layer with 30 nodes. The input is
represented by a 3D feature vector and we have a binary classification problem. Calculate the
number of parameters of the network. Consider there are NO bias nodes in the network.
a. 100
b. 120
c. 140
d. 125
Correct Answer: b
Detailed Solution:
With no bias nodes: 3 × 30 + 30 × 1 = 120 parameters.
--------------------------------------------------------------------------------------------------------------------
QUESTION 2:
For a binary classification setting, if the probability of belonging to class= +1 is 0.22, what is the
probability of belonging to class= -1 ?
a. 0
b. 0.22
c. 0.78
d. -0.22
Correct Answer: c
Detailed Solution:
In the binary classification setting we keep a single output node which can denote the probability
(p) of belonging to class= +1. So, probability of belonging to class= -1 is (1 - p) since the 2
classes are mutually exclusive.
______________________________________________________________________________
QUESTION 3:
Input to SoftMax activation function is [2,4,6]. What will be the output?
a. [0.11,0.78,0.11]
b. [0.016,0.117, 0.867]
c. [0.045,0.910,0.045]
d. [0.21, 0.58,0.21]
Correct Answer: b
Detailed Solution:
SoftMax: 𝝈(𝒙𝒋) = 𝒆^{𝒙𝒋} / Σ_{k=1}^{n} 𝒆^{𝒙𝒌}, for j = 1, 2, …, n
Therefore, 𝝈(𝟐) = 𝒆² / (𝒆² + 𝒆⁴ + 𝒆⁶) ≈ 0.016, and similarly for the other values.
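The three values can be checked directly (a plain-Python sketch, not part of the original solution; the naive exponentiation is fine for these small inputs):

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]   # no max-subtraction: inputs are small
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2, 4, 6])             # approximately [0.016, 0.117, 0.867]
```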
______________________________________________________________________________
QUESTION 4:
A 3-input neuron has weights 1, 0.5, 2. The transfer function is linear, with the constant of
proportionality being equal to 2. The inputs are 2, 20, 4 respectively. The output will be:
a. 40
b. 20
c. 80
d. 10
Correct Answer: a
Detailed Solution:
To find the output, we multiply the weights by their respective inputs, add the results, and then multiply by the constant of the transfer function: 2 × (1 × 2 + 0.5 × 20 + 2 × 4) = 2 × 20 = 40.
______________________________________________________________________
QUESTION 5:
Which one of the following activation functions is NOT analytically differentiable for all real
values of the given input?
a. Sigmoid
b. Tanh
c. ReLU
d. None of the above
Correct Answer: c
Detailed Solution:
ReLU is not differentiable at x = 0, where the left and right derivatives differ (0 and 1).
______________________________________________________________________________
QUESTION 6:
Which function does the following perceptron realize? x1 and x2 can take only binary values. h(x) is the activation function, ℎ(𝑥) = 1 if 𝑥 > 0, else 0.
a. NAND
b. NOR
c. AND
d. OR
Correct Answer: b
Detailed Solution:
In the above figure, when either i1 or i2 is 1, the output is 0. When both i1 and i2 are 0, the output is 1. When both i1 and i2 are 1, the output is 0. This is NOR logic.
______________________________________________________________________________
QUESTION 7:
A simple MLP model has 10 neurons in the input layer, 100 neurons in the hidden layer, and 1 neuron in the output layer. What are the sizes of the weight matrices between the hidden and output layer, and between the input and hidden layer?
a. [10x1] , [100 X 2]
b. [100x1] , [ 10 X 1]
c. [100 x 10], [10 x 1]
d. [100x 1] , [10 x 100]
Correct Answer: d
Detailed Solution:
The size of the weight matrix between any layer 1 and layer 2 is given by [nodes in layer 1 × nodes in layer 2].
______________________________________________________________________________
QUESTION 8:
Consider a fully connected neural network with input, one hidden layer, and output layer with
40, 2, 1 nodes respectively in each layer. What is the total number of learnable parameters (no
biases)?
a. 2
b. 82
c. 80
d. 40
Correct Answer: b
Detailed Solution:
The learnable parameters are the weights and biases. Given there are no bias nodes, for a fully connected network the count is (40 × 2) + (2 × 1) = 82.
QUESTION 9:
You want to build a 10-class neural network classifier, given a cat image, you want to classify
which of the 10 cat breeds it belongs to. Which among the 4 options would be an appropriate
loss function to use for this task?
Correct Answer: a
Detailed Solution:
Out of the given options, Cross Entropy Loss is well suited for classification problems which is
the end task given in the question.
______________________________________________________________________________
QUESTION 10:
You’d like to train a fully-connected neural network with 5 hidden layers, each with 10 hidden
units. The input is 20-dimensional and the output is a scalar. What is the total number of
trainable parameters in your network? There is no bias.
a. (20+1)*10 + (10+1)*10*4 + (10+1)*1
b. (20)*10 + (10)*10*4 + (10)*1
c. (20)*10 + (10)*10*5 + (10)*1
d. (20+1)*10 + (10+1)*10*5 + (10+1)*1
Correct Answer: b
Detailed Solution:
With no bias terms, the count is 20×10 (input to first hidden layer) + 10×10×4 (between the five hidden layers) + 10×1 (last hidden layer to output).
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 6
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Suppose a neural network has 3 input nodes, a, b, c. There are 2 neurons, X and Y. X = a+ b and
Y = X * c. What is the gradient of Y with respect to a, b and c? Assume, (a, b, c) = (6, -1, -4).
a. (5, -4, -4)
b. (4, 4, -5)
c. (-4, -4, 5)
d. (3, 3, 4)
Correct Answer: c
Detailed Solution:
𝒀 = 𝑿 · 𝒄, so 𝝏𝒀/𝝏𝒄 = 𝑿 = 𝒂 + 𝒃 = 𝟓
𝒀 = 𝑿 · 𝒄 = (𝒂 + 𝒃) · 𝒄, so 𝝏𝒀/𝝏𝒂 = 𝒄 = −𝟒 and 𝝏𝒀/𝝏𝒃 = 𝒄 = −𝟒
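The forward and backward passes can be written out explicitly (a plain-Python sketch, not part of the original solution):

```python
# Forward pass: X = a + b, Y = X * c; backward pass via the chain rule.
a, b, c = 6.0, -1.0, -4.0
X = a + b            # 5
Y = X * c            # -20

dY_dc = X            # dY/dc = X = 5
dY_dX = c            # dY/dX = c
dY_da = dY_dX * 1.0  # dX/da = 1  ->  dY/da = c = -4
dY_db = dY_dX * 1.0  # dX/db = 1  ->  dY/db = c = -4
grad = (dY_da, dY_db, dY_dc)
```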
______________________________________________________________________________
QUESTION 2:
𝑦 = max(𝑎, 𝑏) and 𝑎 > 𝑏. What are the values of 𝑑𝑦/𝑑𝑎 and 𝑑𝑦/𝑑𝑏?
a. 1, 0
b. 0, 1
c. 0, 0
d. 1, 1
Correct Answer: a
Detailed Solution:
Since 𝑎 > 𝑏, locally 𝑦 = 𝑎, so 𝑑𝑦/𝑑𝑎 = 1 and 𝑑𝑦/𝑑𝑏 = 0 (max routes the gradient to the larger input).
______________________________________________________________________________
QUESTION 3:
PCA reduces the dimension by finding a few________.
Correct Answer: b
Detailed Solution:
______________________________________________________________________________
QUESTION 4:
Consider the four sample points below, 𝑋𝑖 ∈ ℝ2 .
We want to represent the data in 1D using PCA. Compute the unit-length principal component
directions of X, and then choose from the options below which one the PCA algorithm would
choose if you request just one principal component.
a. [1/√2 1/√2]𝑇
b. [1/√2 −1/√2]𝑇
c. [−1/√2 1/√2]𝑇
d. [1/√2 1/√2]𝑇
Correct Answer: d
Detailed Solution:
Centering X,
(1/4) 𝑿𝒄ᵀ𝑿𝒄 = [[10, 6], [6, 10]]
The eigenvector with eigenvalue 16 is [1/√2, 1/√2]ᵀ; the eigenvector with eigenvalue 4 is [1/√2, −1/√2]ᵀ. PCA selects the direction with the larger eigenvalue.
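The eigendecomposition of the quoted scatter matrix can be checked numerically (NumPy assumed; the overall 1/4 factor only rescales eigenvalues, not eigenvectors):

```python
import numpy as np

S = np.array([[10.0, 6.0],
              [6.0, 10.0]])
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
pc1 = eigvecs[:, -1]                   # principal component: largest eigenvalue
expected_component = 1.0 / np.sqrt(2.0)
```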
QUESTION 5:
Correct Answer: b
Detailed Solution:
____________________________________________________________________________
QUESTION 6:
What is true regarding the backpropagation rule?
a. It is a feedback neural network
b. The gradient of the final layer of weights is calculated first, and the gradient of the first layer of weights is calculated last
c. Hidden layers are not important; they are only meant for supporting the input and output layers
d. None of the mentioned
Correct Answer: b
Detailed Solution:
_____________________________________________________________________________
QUESTION 7:
Which of the following is true for PCA? Tick all the options that are correct.
Detailed Solution:
_________________________________________________________________________
QUESTION 8:
A single-hidden-layer, no-bias autoencoder has 100 input neurons and 10 hidden neurons. What will be the number of parameters associated with this autoencoder?
a. 1000
b. 2000
c. 2110
d. 1010
Correct Answer: b
Detailed Solution:
With no bias, the encoder has 100 × 10 = 1000 weights and the decoder has 10 × 100 = 1000 weights, so the total number of parameters is 2000.
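The count is simple enough to verify directly (a sketch, not part of the original solution):

```python
# Parameter count for a single-hidden-layer, no-bias autoencoder
n_input, n_hidden = 100, 10
encoder_weights = n_input * n_hidden  # 100 x 10 = 1000
decoder_weights = n_hidden * n_input  # 10 x 100 = 1000
total = encoder_weights + decoder_weights
print(total)  # 2000
```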
______________________________________________________________________________
QUESTION 9:
Which of the following two vectors can form the first two principal components?
Correct Answer: a
Detailed Solution:
____________________________________________________________________________
QUESTION 10:
Let's say vectors 𝑎⃗ = {2; 4} and 𝑏⃗⃗ = {𝑛; 1} form the first two principal components after
applying PCA. Under such circumstances, which among the following can be a possible value
of n?
a. 2
b. -2
c. 0
d. 1
Correct Answer: b
Detailed Solution:
Principal components are orthogonal, so 𝑎⃗ ∙ 𝑏⃗⃗ = 2n + 4 = 0, which gives n = −2.
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 7
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Select the correct option about sparse autoencoders.
Statement 2: The idea is to encourage network to learn an encoding and decoding which only
relies on activating a small number of neurons
Correct Answer: c
Detailed Solution:
______________________________________________________________________________
QUESTION 2:
Select the correct option about denoising autoencoders.
Statement A: The loss is between the original input and the reconstruction from a noisy version
of the input
Correct Answer: d
Detailed Solution:
For the denoising autoencoder, both statements are true. Thus option (d) is correct.
______________________________________________________________________________
QUESTION 3:
Which of the following autoencoder methods uses corrupted versions of the input?
a. Overcomplete design
b. Undercomplete Design
c. Sparse Design
d. Denoising Design
Correct Answer: d
Detailed Solution:
A denoising autoencoder is trained to reconstruct the clean input from a corrupted version of it.
______________________________________________________________________________
QUESTION 4:
Which of the following autoencoder methods uses a hidden layer with fewer units than the
input layer?
a. Overcomplete design
b. Undercomplete Design
c. Sparse Design
d. Denoising Design
Correct Answer: b
Detailed Solution:
An undercomplete autoencoder uses a hidden layer with fewer units than the input layer, forcing the network to learn a compressed representation.
______________________________________________________________________________
QUESTION 5:
Correct Answer: b
Detailed Solution:
Except option (b), all the other options are true about autoencoders.
____________________________________________________________________________
QUESTION 6:
Find the value of 𝑑(𝑡 − 34) ∗ 𝑥(𝑡 + 56); 𝑑(𝑡) being the delta function and * being the
convolution operation.
a. 𝑥(𝑡 + 56)
b. 𝑥(𝑡 + 32)
c. 𝑥(𝑡 + 22)
d. 𝑥(𝑡 − 56)
Correct Answer: c
Detailed Solution:
Convolution with a shifted delta shifts the signal: 𝑑(𝑡 − 𝑡₀) ∗ 𝑥(𝑡) = 𝑥(𝑡 − 𝑡₀). Hence 𝑑(𝑡 − 34) ∗ 𝑥(𝑡 + 56) = 𝑥(𝑡 + 56 − 34) = 𝑥(𝑡 + 22).
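The same shifting property holds in the discrete case and can be checked numerically (the arrays below are my own illustration, not from the question): convolving with a unit impulse at sample n = 2 delays the signal by 2 samples.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
delta_shifted = np.array([0.0, 0.0, 1.0])  # unit impulse at n = 2

y = np.convolve(x, delta_shifted)  # 'full' discrete convolution
print(y)  # [0. 0. 1. 2. 3. 4.] -- x delayed by 2 samples
```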
_____________________________________________________________________________
QUESTION 7:
Impulse response is the output of ________________system due to impulse input applied at
time=0. Fill in the blanks from the options below.
a. Linear
b. Time Varying
c. Time Invariant
d. Linear And Time Invariant
Correct Answer: d
Detailed Solution:
Impulse response is the output of an LTI system due to an impulse input applied at time t=0 (or n=0).
The behaviour of an LTI system is characterized by its impulse response.
_________________________________________________________________________
QUESTION 8:
Convolution of an input with the system impulse function gives the output of a___ system. Fill
in the blanks.
Correct Answer: a
Detailed Solution:
______________________________________________________________________________
QUESTION 9:
Given the image below where, Row 1: Original Input, Row 2: Noisy input, Row 3: Reconstructed
output. Choose one of the following variants of autoencoder that is most suited to get Row 3
from Row 2.
a. Stacked autoencoder
b. Sparse autoencoder
c. Denoising autoencoder
d. None of the above
Correct Answer: c
Detailed Solution:
Reconstructing the original noise-free data from a noisy input is the task of a denoising
autoencoder.
____________________________________________________________________________
QUESTION 10:
Which of the following is true for Contractive Autoencoders?
a. penalizing instances where a small change in the input leads to a large change in
the encoding space
b. penalizing instances where a large change in the input leads to a small change in
the encoding space
c. penalizing instances where a small change in the input leads to a small change in
the encoding space
d. None of the above
Correct Answer: a
Detailed Solution:
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 8
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Which of the following is false about CNN?
Detailed Solution:
QUESTION 2:
The input image has been converted into a matrix of size 64 X 64 and a kernel/filter of size 5x5
with a stride of 1 and no padding. What will be the size of the convoluted matrix?
a. 5x5
b. 59x59
c. 64x64
d. 60x60
Correct Answer: d
Detailed Solution:
The size of the convoluted matrix is given by CxC where C=((I-F+2P)/S)+1, where C is the
size of the Convoluted matrix, I is the size of the input matrix, F the size of the filter matrix
and P the padding applied to the input matrix. Here P=0, I=64, F=5 and S=1. Therefore,
the answer is 60x60.
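The output-size formula can be wrapped in a small helper (the function name is my own, for illustration) and checked against the convolution questions in this assignment:

```python
def conv_output_size(I, F, P=0, S=1):
    """Spatial size of a convolution output: ((I - F + 2P) / S) + 1."""
    return (I - F + 2 * P) // S + 1

print(conv_output_size(64, 5))    # 60   (this question)
print(conv_output_size(4, 3))     # 2    ('valid' padding, Question 3)
print(conv_output_size(1024, 3))  # 1022 (Question 5)
```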
______________________________________________________________________________
QUESTION 3:
Filter size of 3x3 is convolved with matrix of size 4x4 (stride=1). What will be the size of output
matrix if valid padding is applied:
a. 4x4
b. 3x3
c. 2x2
d. 1x1
Correct Answer: c
Detailed Solution:
Valid padding means no padding is applied (P = 0). The output matrix after convolution has
dimension ((n − f + 2P)/S + 1) × ((n − f + 2P)/S + 1) = ((4 − 3)/1 + 1) × ((4 − 3)/1 + 1) = 2 × 2.
______________________________________________________________________________
QUESTION 4:
Let us consider a Convolutional Neural Network having three different convolutional layers in
its architecture as:
Layer 3 of the above network is followed by a fully connected layer. If we give a 3-D
image input of dimension 39 X 39 to the network, then which of the following is the input
dimension of the fully connected layer.
a. 1960
b. 2200
c. 4563
d. 13690
Correct Answer: a
Detailed Solution:
the input image of dimension 39 X 39 X 3 convolves with 10 filters of size 3 X 3 and takes
the Stride as 1 with no padding. After these operations, we will get an output of 37 X 37 X
10.
______________________________________________________________________________
QUESTION 5:
Suppose you have 40 convolutional kernels of size 3 x 3 with no padding and stride 1 in the first
layer of a convolutional neural network. You pass an input of dimension 1024x1024x3 through
this layer. What are the dimensions of the data which the next layer will receive?
a. 1020x1020x40
b. 1022x1022x40
c. 1021x1021x40
d. 1022x1022x3
Correct Answer: b
Detailed Solution:
Requires four hyperparameters: number of filters K = 40, their spatial extent F = 3, the
stride S = 1, and the amount of padding P = 0. The output spatial size is (1024 − 3 + 0)/1 + 1 = 1022,
so the next layer receives 1022 × 1022 × 40.
____________________________________________________________________________
QUESTION 6:
Consider a CNN model which aims at classifying an image as a rose, a marigold, a lily, or an
orchid (the test image can have only one of the classes at a time). The last (fully-connected)
layer of the CNN outputs a vector of logits, L, that is passed through a ____ activation that
transforms the logits into probabilities, P. These probabilities are the model predictions for
each of the 4 classes. Fill in the blanks with the appropriate option.
a. Leaky ReLU
b. Tanh
c. ReLU
d. Softmax
Correct Answer: d
Detailed Solution:
Softmax works best if there is one true class per example, because it outputs a probability
vector whose entries sum to 1.
____________________________________________________________________________
QUESTION 7:
Suppose your input is a 300 by 300 color (RGB) image, and you use a convolutional layer with
100 filters that are each 5x5. How many parameters does this hidden layer have (without bias)?
a. 2501
b. 2600
c. 7500
d. 7600
Correct Answer: c
Detailed Solution:
Each filter has 5 × 5 × 3 weights, and there are 100 such filters. As there is no bias, the total
number of parameters = 5 × 5 × 3 × 100 = 7500.
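The count generalizes to a small helper (a sketch; the function name is my own). Note that adding one bias per filter would give 7600, which is option (d), so the "without bias" qualifier matters.

```python
def conv_params(filter_size, in_channels, num_filters, bias=False):
    # Each filter spans filter_size x filter_size x in_channels weights
    per_filter = filter_size * filter_size * in_channels
    total = per_filter * num_filters
    if bias:
        total += num_filters  # one bias term per filter
    return total

print(conv_params(5, 3, 100))             # 7500 (option c)
print(conv_params(5, 3, 100, bias=True))  # 7600 (option d, with biases)
```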
______________________________________________________________________________
QUESTION 8:
Which of the following activation functions can lead to vanishing gradients?
a. ReLU
b. Sigmoid
c. Leaky ReLU
d. None of the above
Correct Answer: b
Detailed Solution:
For the sigmoid activation, a large change in the input causes only a small change in the
output, so the derivative is small. When more and more layers use such activations, the
gradient of the loss function becomes very small, making the network difficult to train.
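A quick numerical illustration (plain NumPy; the function names are my own) of why chained sigmoids shrink gradients: the sigmoid derivative never exceeds 0.25, so multiplying it across many layers drives the gradient toward zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at 0.25 (at z = 0) and decays for large |z|
print(sigmoid_grad(0.0))  # 0.25
print(sigmoid_grad(5.0))  # ~0.0066

# Chaining 10 sigmoid layers multiplies these small factors together
print(0.25 ** 10)         # ~9.5e-07 -- the gradient all but vanishes
```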
___________________________________________________________________________
QUESTION 9:
Statement 1: Residual networks can be a solution for vanishing gradient problem
Statement 3: Residual networks can never be a solution for vanishing gradient problem
a. Statement 2 is correct
b. Statement 3 is correct
c. Both Statement 1 and Statement 2 are correct
d. Both Statement 2 and Statement 3 are correct
Correct Answer: c
Detailed Solution:
____________________________________________________________________________
QUESTION 10:
Input to SoftMax activation function is [0.5,0.5,1]. What will be the output?
a. [0.28,0.28,0.44]
b. [0.022,0.956, 0.022]
c. [0.045,0.910,0.045]
d. [0.42, 0.42,0.16]
Correct Answer: a
Detailed Solution:
SoftMax: σ(x_j) = e^{x_j} / Σ_{k=1}^{n} e^{x_k}, for j = 1, 2, …, n
Therefore, σ(0.5) = e^{0.5} / (e^{0.5} + e^{0.5} + e^{1}) ≈ 0.27 and σ(1) ≈ 0.45, which is closest to option (a).
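A numerically stable softmax (a standard sketch, not code from the course) reproduces the probabilities; up to rounding, the result matches option (a):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

p = softmax(np.array([0.5, 0.5, 1.0]))
print(np.round(p, 2))  # [0.27 0.27 0.45] -- closest to option (a)
print(p.sum())         # 1.0 -- softmax outputs always sum to one
```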
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
What can be a possible consequence of choosing a very small learning rate?
a. Slow convergence
b. Overshooting minima
c. Oscillations around the minima
d. All of the above
Correct Answer: a
Detailed Solution:
Choosing a very small learning rate can lead to slower convergence and thus option (a) is
correct.
______________________________________________________________________________
QUESTION 2:
The following is the equation of update vector for momentum optimizer. Which of the
following is true for 𝛾?
𝑉𝑡 = 𝛾𝑉𝑡−1 + 𝜂∇𝜃 𝐽(𝜃)
a. 𝛾 is the momentum term which indicates acceleration
b. 𝛾 is the step size
c. 𝛾 is the first order moment
d. 𝛾 is the second order moment
Correct Answer: a
Detailed Solution:
A fraction of the update vector of the past time step is added to the current update vector. 𝛾 is
that fraction which indicates how much acceleration you want and its value lies between 0 and 1.
______________________________________________________________________________
QUESTION 3:
Which of the following is true about momentum optimizer?
Correct Answer: d
Detailed Solution:
Option (a), (b) and (c) all are true for momentum optimiser. Thus, option (d) is correct.
______________________________________________________________________________
QUESTION 4:
Let J(θ) be the cost function, and let the gradient descent update rule for θ_i be
θ_{i+1} = θ_i + ∇θ_i
What should ∇θ_i be? (α is the learning rate.)
a. −α dJ(θ_i)/dθ_i
b. α dJ(θ_i)/dθ_i
c. −dJ(θ_i)/dθ_{i+1}
d. dJ(θ_i)/dθ_i
Correct Answer: a
Detailed Solution:
Gradient descent update rule for 𝜃𝑖 is,
θ_{i+1} = θ_i − α dJ(θ_i)/dθ_i, where α is the learning rate
______________________________________________________________________________
QUESTION 5:
A given cost function is of the form J(θ) = 6θ² − 6θ + 6. What is the weight update rule for
gradient descent optimization at step t+1? Consider α to be the learning rate.
a. 𝜃𝑡+1 = 𝜃𝑡 − 6𝛼(2𝜃 − 1)
b. 𝜃𝑡+1 = 𝜃𝑡 + 6𝛼(2𝜃)
c. 𝜃𝑡+1 = 𝜃𝑡 − 𝛼(12𝜃 − 6 + 6)
d. 𝜃𝑡+1 = 𝜃𝑡 − 6𝛼(2𝜃 + 1)
Correct Answer: a
Detailed Solution:
∂J(θ)/∂θ = 12θ − 6 = 6(2θ − 1)
So, weight update will be
𝜃𝑡+1 = 𝜃𝑡 − 6𝛼(2𝜃 − 1)
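Running the update rule from option (a) directly shows it converging to the minimizer of this convex cost, θ = 0.5 (an illustrative sketch; the learning rate and iteration count are my own choices):

```python
def grad_J(theta):
    # dJ/dtheta for J(theta) = 6*theta**2 - 6*theta + 6
    return 12 * theta - 6

theta, alpha = 0.0, 0.1
for _ in range(100):
    theta = theta - alpha * grad_J(theta)  # theta <- theta - 6*alpha*(2*theta - 1)

print(round(theta, 4))  # 0.5 -- the minimizer of J
```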
______________________________________________________________________________
QUESTION 6:
If the first few iterations of gradient descent cause the function f(θ0,θ1) to increase rather than
decrease, then what could be the most likely cause for this?
Correct Answer: a
Detailed Solution:
If the learning rate were small enough, gradient descent would take a tiny step downhill and
decrease f(θ0, θ1) at least a little. If gradient descent instead increases the objective value,
the learning rate is too high.
______________________________________________________________________________
QUESTION 7:
For a function f(θ0,θ1), if θ0 and θ1 are initialized at a global minimum, then what should be the
values of θ0 and θ1 after a single iteration of gradient descent?
Correct Answer: b
Detailed Solution:
At a minimum (global or local), the derivative (gradient) is zero, so gradient descent will not
change the parameters.
______________________________________________________________________________
QUESTION 8:
What can be one of the practical problems of exploding gradient?
a. Too large update of weight values leading to unstable network
b. Too small update of weight values inhibiting the network to learn
c. Too large update of weight values leading to faster convergence
d. Too small update of weight values leading to slower convergence
Correct Answer: a
Detailed Solution:
Exploding gradients are a problem where large error gradients accumulate and result in very
large updates to neural network model weights during training. This has the effect of your model
being unstable and unable to learn from your training data.
______________________________________________________________________________
QUESTION 9:
What are the steps for using a gradient descent algorithm?
1. Calculate error between the actual value and the predicted value
2. Update the weights and biases using gradient descent formula
3. Pass an input through the network and get values from output layer
4. Initialize weights and biases of the network with random values
5. Calculate gradient value corresponding to each weight and bias
a. 1, 2, 3, 4, 5
b. 5, 4, 3, 2, 1
c. 3, 2, 1, 5, 4
d. 4, 3, 1, 5, 2
Correct Answer: d
Detailed Solution:
Initialize random weights and biases, then pass input instances through the network, calculate the
error at the output layer, and back-propagate the error through the preceding layers. Then update
the neuron weights using the learning rate and the gradient of the error. Please refer to the lectures of week 4.
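The five steps, in the order 4 → 3 → 1 → 5 → 2, can be sketched for a single linear neuron (the toy data, names, and hyperparameters are my own illustration, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = 2x with a single linear neuron and squared error
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X

w, b = rng.normal(), rng.normal()  # step 4: random initialization
alpha = 0.05

for _ in range(500):
    y_pred = w * X + b             # step 3: forward pass to the output
    err = y_pred - y               # step 1: error vs the actual value
    grad_w = 2 * np.mean(err * X)  # step 5: gradient for each weight
    grad_b = 2 * np.mean(err)      #         and bias
    w -= alpha * grad_w            # step 2: gradient descent update
    b -= alpha * grad_b

print(round(w, 2), round(b, 2))  # w ~ 2.0, b ~ 0.0
```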
______________________________________________________________________________
QUESTION 10:
You run gradient descent for 15 iterations with learning rate 𝜂 = 0.3 and compute error after
each iteration. You find that the value of error decreases very slowly. Based on this, which of
the following conclusions seems most plausible?
Correct Answer: a
Detailed Solution:
The error is decreasing very slowly, so increasing the learning rate is the most plausible
solution.
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 10
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
a. Prevent overfitting
b. Faster convergence
c. Faster inference time
d. Prevent covariate shift
Correct Answer: c
Detailed Solution:
Batch normalization does not make inference faster; the extra normalization computation adds
to the burden, so inference time increases.
____________________________________________________________________________
QUESTION 2:
A neural network has 3 neurons in a hidden layer. The activations of the neurons for three batches
are [1, 2, 3]^T, [0, 2, 5]^T and [6, 9, 2]^T respectively. What will be the value of the mean if we use
batch normalization in this layer?
a. [2.33, 4.33, 3.33]^T
b. [2.00, 2.33, 5.66]^T
c. [1.00, 1.00, 1.00]^T
d. [0.00, 0.00, 0.00]^T
Correct Answer: a
Detailed Solution:
(1/3) × ([1, 2, 3]^T + [0, 2, 5]^T + [6, 9, 2]^T) = [2.33, 4.33, 3.33]^T
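The per-neuron batch mean can be computed with NumPy (a sketch; the rows-as-batches layout is my own convention for this illustration):

```python
import numpy as np

# Rows = the three batches, columns = the 3 hidden neurons
activations = np.array([[1.0, 2.0, 3.0],
                        [0.0, 2.0, 5.0],
                        [6.0, 9.0, 2.0]])

# Batch normalization computes per-neuron statistics across the batch axis
mean = activations.mean(axis=0)
print(np.round(mean, 2))  # [2.33 4.33 3.33]
```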
______________________________________________________________________________
QUESTION 3:
How can we prevent underfitting?
Correct Answer: b
Detailed Solution:
Underfitting happens when the features are not capable enough to capture the data
distribution. We need to increase the feature size so that the data can be fitted well.
______________________________________________________________________________
QUESTION 4:
How do we generally calculate mean and variance during testing?
Correct Answer: c
Detailed Solution:
We generally calculate batch mean and variance statistics during training and use the estimated
batch mean and variance during testing.
______________________________________________________________________________
QUESTION 5:
Which one of the following is not an advantage of dropout?
a. Regularization
b. Prevent Overfitting
c. Improve Accuracy
d. Reduce computational cost during testing
Correct Answer: d
Detailed Solution:
Dropout zeroes out random features during training, but at test time no features are zeroed
out. So there is no reduction of computational cost during testing.
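A minimal inverted-dropout sketch (my own illustration, not course code) makes the point: the masking and rescaling happen only at training time, while the test-time path is the identity and costs nothing extra.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p_drop=0.5, training=True):
    """Inverted dropout: zero features at train time, identity at test time."""
    if not training:
        return x                      # test time: no extra computation
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)  # rescale so the expected value matches

x = np.ones(8)
print(dropout(x, training=True))   # some entries zeroed, survivors scaled to 2.0
print(dropout(x, training=False))  # unchanged: all ones
```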
______________________________________________________________________________
QUESTION 6:
What is the main advantage of layer normalization over batch normalization?
a. Faster convergence
b. Lesser computation
c. Useful in recurrent neural network
d. None of these
Correct Answer: c
Detailed Solution:
Layer normalization normalizes across the features of a single sample, so it does not depend on
the batch and can be applied at every time step of a recurrent neural network. See the
lectures/lecture materials.
______________________________________________________________________________
QUESTION 7:
While training a neural network for image recognition task, we plot the graph of training error
and validation error. Which is the best for early stopping?
a. A
b. B
c. C
d. D
Correct Answer: c
Detailed Solution:
Minimum validation point is the best for early stopping.
______________________________________________________________________________
QUESTION 8:
Which among the following is NOT a data augmentation technique?
Correct Answer: b
Detailed Solution:
A random shuffle of all the pixels of the image will distort the image, and the neural network
will be unable to learn anything from it. So, it is not a data augmentation technique.
______________________________________________________________________________
QUESTION 9:
Which of the following is true about model capacity (where model capacity means the ability of
neural network to approximate complex functions)?
Correct Answer: a
Detailed Solution:
Dropout and the learning rate have nothing to do with model capacity. Increasing the number of
hidden layers increases the number of learnable parameters; therefore, model capacity increases.
______________________________________________________________________________
QUESTION 10:
Batch Normalization is helpful because
Correct Answer: a
Detailed Solution:
Batch normalization layer normalizes the input.
______________________________________________________________________________
************END*******