DL - Assignment 7 Solution


NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 7
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10. Total marks: 10 × 1 = 10
______________________________________________________________________________

QUESTION 1:
Select the correct option.

a. Layer-by-layer autoencoder pretraining reduces GPU/CPU RAM requirements
b. Layer-by-layer autoencoder pretraining alleviates slow convergence
c. Layer-by-layer autoencoder pretraining followed by finetuning converges to better
parameters than end-to-end training of autoencoders
d. All of the above

Correct Answer: d
Detailed Solution:

Layer-by-layer autoencoder training reduces GPU/CPU memory requirements, since only
one layer's activations need to be saved for the backward pass at a time. End-to-end
training converges slowly due to the problems of vanishing gradients and dimensionality
collapse, and layer-by-layer pretraining followed by finetuning generally gives better
results than end-to-end training.
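As an illustration beyond the lecture material, the greedy data flow can be sketched in numpy. This is only a structural skeleton: the layer sizes are invented, and the per-layer "training" step is a stand-in random projection rather than a real reconstruction-loss fit.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))          # toy dataset (assumed sizes)

layer_dims = [64, 32, 16]               # stacked-autoencoder widths (assumed)
codes = X
weights = []
for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:]):
    # "Train" one autoencoder on the codes of the previous layer.
    # A real implementation would minimise reconstruction error here;
    # a random projection stands in for the learned encoder weights.
    W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_out))
    weights.append(W)
    # Only this layer's activations are held at a time, which is why
    # greedy pretraining has a smaller memory footprint than end-to-end.
    codes = np.tanh(codes @ W)

print(codes.shape)  # codes from the deepest layer, ready for finetuning
```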

______________________________________________________________________________

QUESTION 2:
Regularization of Contractive Autoencoder is imposed on

a. Jacobian matrix of encoder activations with respect to the input
b. Weights
c. Inputs
d. Does not use regularization

Correct Answer: a

Detailed Solution:

A contractive autoencoder makes the encoding less sensitive to small variations in its
training data. This is accomplished by adding a regularizer, or penalty term, to
whatever cost or objective function the algorithm is trying to minimize. The end result
is to reduce the learned representation's sensitivity to the training input. This
regularizer is the Frobenius norm of the Jacobian matrix of the encoder activations
with respect to the input.
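For a sigmoid encoder h = σ(Wx), the Jacobian and the contractive penalty can be computed in closed form. A minimal numpy sketch (the weights, input, and sizes below are random toy values, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))   # encoder weights: 6-d input -> 4-d code
x = rng.normal(size=6)

h = sigmoid(W @ x)
# Jacobian of the encoder activations w.r.t. the input:
# dh_i/dx_j = h_i * (1 - h_i) * W_ij for a sigmoid encoder
J = (h * (1.0 - h))[:, None] * W

# Contractive penalty: squared Frobenius norm of the Jacobian
penalty = np.sum(J ** 2)
```

Adding `penalty` (scaled by a hyperparameter) to the reconstruction loss discourages the encoder from reacting to small input perturbations.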

______________________________________________________________________________

QUESTION 3:

Select the true statements about KL divergence.

a. Measures the distance between two probability distributions
b. Has a range from 0 to 1
c. Is symmetric, i.e. KL(P‖Q) = KL(Q‖P)
d. None of the above

Correct Answer: d
Detailed Solution:

KL divergence measures the divergence between two probability distributions. It is not
a distance metric, since a distance must satisfy symmetry, i.e. dist(A, B) = dist(B, A),
and the triangle inequality. KL divergence is asymmetric, KL(P‖Q) ≠ KL(Q‖P), since

KL(P‖Q) = Σ P log P − Σ P log Q

KL(Q‖P) = Σ Q log Q − Σ Q log P

KL divergence ranges from 0 to ∞.
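These properties are easy to confirm numerically. A short numpy check (the two distributions are arbitrary toy values):

```python
import numpy as np

def kl(p, q):
    # KL(P || Q) = sum_i p_i * (log p_i - log q_i)
    return np.sum(p * (np.log(p) - np.log(q)))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

print(kl(p, q), kl(q, p))  # unequal values: KL is asymmetric
# As q_i -> 0 where p_i > 0, KL(P || Q) -> infinity, so the range is [0, inf)
```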

____________________________________________________________________________

QUESTION 4:
An overcomplete autoencoder generally learns the identity function. How can we prevent
such an autoencoder from learning the identity function and make it learn useful
representations?

a. Stacked-autoencoder based layer-wise training
b. Train the autoencoder for a large number of epochs in order to learn a more
useful representation
c. Add noise to the data and train the network to recover noise-free data from noisy data
d. It is not possible to train an overcomplete autoencoder; it always converges to the
identity function.

Correct Answer: c
Detailed Solution:

Training for a greater number of epochs or layer-wise training cannot prevent an
overcomplete autoencoder from learning the identity function. A denoising autoencoder
with an overcomplete representation can extract more representative information from the data.
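The denoising setup amounts to a simple change in how training pairs are built: corrupt the input, keep the clean data as the target. A numpy sketch of that pairing (toy data and noise level assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X_clean = rng.uniform(size=(100, 32))          # toy clean data

# Corrupt the input but keep the clean data as the target, so even an
# overcomplete network cannot solve the task with the identity map.
noise = rng.normal(scale=0.3, size=X_clean.shape)
X_noisy = X_clean + noise

inputs, targets = X_noisy, X_clean
# loss = reconstruction_error(decoder(encoder(inputs)), targets)
```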

____________________________________________________________________________

QUESTION 5:
In which conditions, autoencoder has more powerful generalization than Principal Components
Analysis (PCA) while performing dimensionality reduction?

a. Undercomplete Linear Autoencoder
b. Overcomplete Linear Autoencoder
c. Undercomplete Non-linear Autoencoder
d. Overcomplete Non-Linear Autoencoder

Correct Answer: c
Detailed Solution:

An overcomplete autoencoder cannot be used for dimensionality reduction. For an
undercomplete autoencoder, non-linearity helps achieve more powerful generalization than
PCA. Otherwise, a linear autoencoder gives results equivalent to PCA.
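The PCA baseline the linear autoencoder matches can be written directly via the SVD. A numpy sketch (random data and the rank k are toy choices): the rank-k PCA reconstruction below is the optimum a linear undercomplete autoencoder with squared-error loss converges to.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Xc = X - X.mean(axis=0)            # PCA assumes centered data

k = 3
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
# Project onto the top-k principal directions, then map back.
X_pca = (Xc @ Vt[:k].T) @ Vt[:k]

err = np.mean((Xc - X_pca) ** 2)   # best rank-k reconstruction error
```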

______________________________________________________________________________

QUESTION 6:
An autoencoder consists of 128 input neurons and 32 hidden neurons. If the network weights
are represented using single-precision floating-point numbers (size = 4 bytes), what will
be the size of the weight matrix?

a. 33408 Bytes
b. 16704 Bytes
c. 8352 Bytes
d. 32768 Bytes

Correct Answer: a
Detailed Solution:

Total number of parameters = encoder + decoder weights and biases = (128 × 32) + 32 +
(32 × 128) + 128 = 8352. The single-precision floating-point format occupies 32 bits, or
4 bytes, of computer memory, so the total memory requirement = 4 × 8352 = 33408 Bytes.
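The same arithmetic, written out as a check:

```python
n_in, n_hidden = 128, 32
bytes_per_float32 = 4

encoder = n_in * n_hidden + n_hidden      # weights + biases into the hidden layer
decoder = n_hidden * n_in + n_in          # weights + biases back to the output
total_params = encoder + decoder          # 8352 parameters

print(total_params * bytes_per_float32)   # 33408 bytes
```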

______________________________________________________________________________

QUESTION 7:
Which of the following is used to match a template pattern in a signal?

a. Cross Correlation
b. Convolution
c. Normalized cross correlation
d. None of the above

Correct Answer: c
Detailed Solution:

As covered in Lecture 35 (at 28:49), normalized cross-correlation is used for template
matching.
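A 1-D illustration of why the normalization matters: because each window is mean-subtracted and variance-normalized, the score peaks wherever the template's shape occurs, regardless of the local amplitude. The signal and template below are toy values.

```python
import numpy as np

def ncc(signal, template):
    """Normalized cross-correlation of a 1-D signal with a template."""
    t = (template - template.mean()) / template.std()
    n = len(template)
    scores = []
    for i in range(len(signal) - n + 1):
        w = signal[i:i + n]
        w = (w - w.mean()) / (w.std() + 1e-12)   # epsilon guards flat windows
        scores.append(np.dot(w, t) / n)
    return np.array(scores)

signal = np.array([0., 0., 1., 2., 1., 0., 0., 5., 10., 5., 0.])
template = np.array([1., 2., 1.])
scores = ncc(signal, template)
# Both occurrences of the bump (at offsets 2 and 7) score ~1.0, even though
# the second is five times larger in amplitude.
```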

______________________________________________________________________________

QUESTION 8:
What is the role of sparsity constraint in a sparse autoencoder?

a. Control the number of active nodes in a hidden layer
b. Control the noise level in a hidden layer
c. Control the hidden layer length
d. Not related to sparse autoencoder

Correct Answer: a
Detailed Solution:

Refer to the lecture.
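As a supplement to the lecture, the usual form of the constraint is a KL-divergence penalty that pushes each hidden unit's average activation toward a small target ρ, which keeps most units inactive. A numpy sketch (the target ρ, batch, and activation ranges are invented for illustration):

```python
import numpy as np

def sparsity_penalty(activations, rho=0.05):
    """KL-divergence sparsity penalty on mean hidden activations.

    rho is the target average activation; rho_hat is the measured
    average activation of each hidden unit over the batch.
    """
    rho_hat = np.clip(activations.mean(axis=0), 1e-6, 1 - 1e-6)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rng = np.random.default_rng(0)
dense = rng.uniform(0.4, 0.6, size=(64, 10))   # most units active
sparse = rng.uniform(0.0, 0.1, size=(64, 10))  # few units active

print(sparsity_penalty(dense) > sparsity_penalty(sparse))  # True
```

Adding this penalty to the reconstruction loss is what limits the number of active nodes in the hidden layer (option a).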


______________________________________________________________________________

QUESTION 9:
Which of the following is true about convolution?

a. Convolution is used to compute features from a signal
b. Can be used to compute the cross-correlation between 𝑥(𝑡) and 𝑦(𝑡) if the input
signal 𝑥(𝑡) is transformed to 𝑥(−𝑡) and 𝑦(𝑡) is used as the filter.
c. Both a and b
d. None of the above

Correct Answer: c
Detailed Solution:

Option a is true and was demonstrated in Lecture 34 as an edge detector. Option b is true
from basic signal processing.
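Statement b can be verified numerically. In numpy's discrete conventions, the full cross-correlation of x with y equals the full convolution of the time-reversed x with y, read off in reverse order along the output axis (the sample values below are arbitrary):

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
y = np.array([1., 0., -1.])

# Cross-correlation of x and y computed directly...
corr = np.correlate(x, y, mode="full")

# ...and via convolution: flip x to x(-t), use y(t) as the filter.
conv = np.convolve(x[::-1], y, mode="full")

print(corr, conv[::-1])   # identical up to reversal of the output axis
```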
____________________________________________________________________________

QUESTION 10:
Which of the following is an LTI/LSI system? 𝑦 and 𝑥 are the output and input, respectively.

a. y = m × x + n × x
b. y = m × x + c
c. y = m × x - c
d. y = m × x²

Correct Answer: a
Detailed Solution:

y = m × x + n × x
y = (m + n) × x

which is of the form y = m′x, where m′ = m + n.

Options b and c have a bias/intercept term, which breaks the superposition property (refer
to the slide at 12:19 in Lecture 34), and option d has a quadratic dependence on the input.
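The superposition (additivity) test can be run directly on the candidate systems; the constants m, n, c below are arbitrary toy values:

```python
def sys_a(x, m=2.0, n=3.0):
    return m * x + n * x        # collapses to (m + n) * x: linear

def sys_b(x, m=2.0, c=1.0):
    return m * x + c            # affine: the bias breaks linearity

def superposition_holds(f, x1=1.5, x2=-0.5):
    # Additivity: f(x1 + x2) must equal f(x1) + f(x2)
    return abs(f(x1 + x2) - (f(x1) + f(x2))) < 1e-9

print(superposition_holds(sys_a))   # True
print(superposition_holds(sys_b))   # False: the bias is counted twice on the right
```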

______________________________________________________________________________


************END*******
