DL - Assignment 7 Solution


NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 7
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10. Total marks: 10 × 1 = 10
______________________________________________________________________________

QUESTION 1:
Select the correct option.

a. Layer-by-layer autoencoder pretraining reduces GPU/CPU RAM requirements
b. Layer-by-layer autoencoder pretraining alleviates slow convergence
c. Layer-by-layer autoencoder pretraining followed by finetuning converges to better
parameters than end-to-end training of autoencoders
d. All of the above

Correct Answer: d
Detailed Solution:

Layer-by-layer autoencoder training reduces GPU/CPU memory requirements, since only
one layer's activations need to be saved for the backward pass at a time. End-to-end
training converges slowly due to the problems of vanishing gradients and dimensionality
collapse, and layer-by-layer pretraining followed by finetuning generally gives better
results than end-to-end training.
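As an illustration beyond the lecture material, the greedy data flow can be sketched in numpy. This is only a structural skeleton: the layer sizes are invented, and the per-layer "training" step is a stand-in random projection rather than a real reconstruction-loss fit.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))          # toy dataset (assumed sizes)

layer_dims = [64, 32, 16]               # stacked-autoencoder widths (assumed)
codes = X
weights = []
for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:]):
    # "Train" one autoencoder on the codes of the previous layer.
    # A real implementation would minimise reconstruction error here;
    # a random projection stands in for the learned encoder weights.
    W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_out))
    weights.append(W)
    # Only this layer's activations are held at a time, which is why
    # greedy pretraining has a smaller memory footprint than end-to-end.
    codes = np.tanh(codes @ W)

print(codes.shape)  # codes from the deepest layer, ready for finetuning
```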

______________________________________________________________________________

QUESTION 2:
Regularization of Contractive Autoencoder is imposed on

a. Jacobian matrix of encoder activations with respect to the input
b. Weights
c. Inputs
d. Does not use regularization

Correct Answer: a

Detailed Solution:

A contractive autoencoder makes the encoding less sensitive to small variations in its
training data. This is accomplished by adding a regularizer, or penalty term, to
whatever cost or objective function the algorithm is trying to minimize. The end result
is to reduce the learned representation's sensitivity to the training input. This
regularizer is the Frobenius norm of the Jacobian matrix of the encoder activations
with respect to the input.
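For a sigmoid encoder h = σ(Wx), the Jacobian and the contractive penalty can be computed in closed form. A minimal numpy sketch (the weights, input, and sizes below are random toy values, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))   # encoder weights: 6-d input -> 4-d code
x = rng.normal(size=6)

h = sigmoid(W @ x)
# Jacobian of the encoder activations w.r.t. the input:
# dh_i/dx_j = h_i * (1 - h_i) * W_ij for a sigmoid encoder
J = (h * (1.0 - h))[:, None] * W

# Contractive penalty: squared Frobenius norm of the Jacobian
penalty = np.sum(J ** 2)
```

Adding `penalty` (scaled by a hyperparameter) to the reconstruction loss discourages the encoder from reacting to small input perturbations.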

______________________________________________________________________________

QUESTION 3:

Select the true statements about KL divergence.

a. Measures the distance between two probability distributions
b. Has a range from 0 to 1
c. Is symmetric, i.e. KL(P‖Q) = KL(Q‖P)
d. None of the above

Correct Answer: d
Detailed Solution:

KL divergence measures the divergence between two probability distributions. It is not
a distance metric, since a distance must satisfy symmetry, i.e. dist(A, B) = dist(B, A),
and the triangle inequality. KL divergence is asymmetric, KL(P‖Q) ≠ KL(Q‖P), since

KL(P‖Q) = Σ P log P − Σ P log Q

KL(Q‖P) = Σ Q log Q − Σ Q log P

KL divergence ranges from 0 to ∞.
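These properties are easy to confirm numerically. A short numpy check (the two distributions are arbitrary toy values):

```python
import numpy as np

def kl(p, q):
    # KL(P || Q) = sum_i p_i * (log p_i - log q_i)
    return np.sum(p * (np.log(p) - np.log(q)))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

print(kl(p, q), kl(q, p))  # unequal values: KL is asymmetric
# As q_i -> 0 where p_i > 0, KL(P || Q) -> infinity, so the range is [0, inf)
```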

____________________________________________________________________________

QUESTION 4:
An overcomplete autoencoder generally learns the identity function. How can we prevent
such an autoencoder from learning the identity function and make it learn useful
representations?

a. Stacked-autoencoder based layer-wise training
b. Train the autoencoder for a large number of epochs in order to learn a more
useful representation
c. Add noise to the data and train the network to recover noise-free data from noisy data
d. It is not possible to train an overcomplete autoencoder; it always converges to the
identity function.

Correct Answer: c
Detailed Solution:

Training for a greater number of epochs or layer-wise training cannot prevent an
overcomplete autoencoder from learning the identity function. A denoising autoencoder
with an overcomplete representation can extract more representative information from the data.
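The denoising setup amounts to a simple change in how training pairs are built: corrupt the input, keep the clean data as the target. A numpy sketch of that pairing (toy data and noise level assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X_clean = rng.uniform(size=(100, 32))          # toy clean data

# Corrupt the input but keep the clean data as the target, so even an
# overcomplete network cannot solve the task with the identity map.
noise = rng.normal(scale=0.3, size=X_clean.shape)
X_noisy = X_clean + noise

inputs, targets = X_noisy, X_clean
# loss = reconstruction_error(decoder(encoder(inputs)), targets)
```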

____________________________________________________________________________

QUESTION 5:
In which conditions, autoencoder has more powerful generalization than Principal Components
Analysis (PCA) while performing dimensionality reduction?

a. Undercomplete Linear Autoencoder
b. Overcomplete Linear Autoencoder
c. Undercomplete Non-linear Autoencoder
d. Overcomplete Non-Linear Autoencoder

Correct Answer: c
Detailed Solution:

An overcomplete autoencoder cannot be used for dimensionality reduction. For an
undercomplete autoencoder, non-linearity helps achieve more powerful generalization than
PCA. Otherwise, a linear autoencoder gives results equivalent to PCA.
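The PCA baseline the linear autoencoder matches can be written directly via the SVD. A numpy sketch (random data and the rank k are toy choices): the rank-k PCA reconstruction below is the optimum a linear undercomplete autoencoder with squared-error loss converges to.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Xc = X - X.mean(axis=0)            # PCA assumes centered data

k = 3
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
# Project onto the top-k principal directions, then map back.
X_pca = (Xc @ Vt[:k].T) @ Vt[:k]

err = np.mean((Xc - X_pca) ** 2)   # best rank-k reconstruction error
```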

______________________________________________________________________________

QUESTION 6:
An autoencoder consists of 128 input neurons and 32 hidden neurons. If the network weights
are represented using single-precision floating-point numbers (size = 4 bytes), what will
be the size of the weight matrix?

a. 33408 Bytes
b. 16704 Bytes
c. 8352 Bytes
d. 32768 Bytes

Correct Answer: a
Detailed Solution:

Total number of parameters = encoder + decoder weights and biases = (128 × 32) + 32 +
(32 × 128) + 128 = 8352. The single-precision floating-point format occupies 32 bits, or
4 bytes, of computer memory, so the total memory requirement = 4 × 8352 = 33408 Bytes.
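The same arithmetic, written out as a check:

```python
n_in, n_hidden = 128, 32
bytes_per_float32 = 4

encoder = n_in * n_hidden + n_hidden      # weights + biases into the hidden layer
decoder = n_hidden * n_in + n_in          # weights + biases back to the output
total_params = encoder + decoder          # 8352 parameters

print(total_params * bytes_per_float32)   # 33408 bytes
```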

______________________________________________________________________________

QUESTION 7:
Which of the following is used to match a template pattern in a signal?

a. Cross Correlation
b. Convolution
c. Normalized cross correlation
d. None of the above

Correct Answer: c
Detailed Solution:

As covered in Lecture 35 (at 28:49), normalized cross-correlation is used for template
matching.
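A 1-D illustration of why the normalization matters: because each window is mean-subtracted and variance-normalized, the score peaks wherever the template's shape occurs, regardless of the local amplitude. The signal and template below are toy values.

```python
import numpy as np

def ncc(signal, template):
    """Normalized cross-correlation of a 1-D signal with a template."""
    t = (template - template.mean()) / template.std()
    n = len(template)
    scores = []
    for i in range(len(signal) - n + 1):
        w = signal[i:i + n]
        w = (w - w.mean()) / (w.std() + 1e-12)   # epsilon guards flat windows
        scores.append(np.dot(w, t) / n)
    return np.array(scores)

signal = np.array([0., 0., 1., 2., 1., 0., 0., 5., 10., 5., 0.])
template = np.array([1., 2., 1.])
scores = ncc(signal, template)
# Both occurrences of the bump (at offsets 2 and 7) score ~1.0, even though
# the second is five times larger in amplitude.
```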

______________________________________________________________________________

QUESTION 8:
What is the role of sparsity constraint in a sparse autoencoder?

a. Control the number of active nodes in a hidden layer
b. Control the noise level in a hidden layer
c. Control the hidden layer length
d. Not related to sparse autoencoder

Correct Answer: a
Detailed Solution:

Refer to the lecture.
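As a supplement to the lecture, the usual form of the constraint is a KL-divergence penalty that pushes each hidden unit's average activation toward a small target ρ, which keeps most units inactive. A numpy sketch (the target ρ, batch, and activation ranges are invented for illustration):

```python
import numpy as np

def sparsity_penalty(activations, rho=0.05):
    """KL-divergence sparsity penalty on mean hidden activations.

    rho is the target average activation; rho_hat is the measured
    average activation of each hidden unit over the batch.
    """
    rho_hat = np.clip(activations.mean(axis=0), 1e-6, 1 - 1e-6)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rng = np.random.default_rng(0)
dense = rng.uniform(0.4, 0.6, size=(64, 10))   # most units active
sparse = rng.uniform(0.0, 0.1, size=(64, 10))  # few units active

print(sparsity_penalty(dense) > sparsity_penalty(sparse))  # True
```

Adding this penalty to the reconstruction loss is what limits the number of active nodes in the hidden layer (option a).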


______________________________________________________________________________

QUESTION 9:
Which of the following is true about convolution?

a. Convolution is used to compute features from a signal
b. Can be used to compute the cross-correlation between 𝑥(𝑡) and 𝑦(𝑡) if the input
signal 𝑥(𝑡) is transformed to 𝑥(−𝑡) and 𝑦(𝑡) is used as the filter.
c. Both a and b
d. None of the above

Correct Answer: c
Detailed Solution:

Option a is true and was demonstrated in Lecture 34 as an edge detector. Option b is true
from basic signal processing.
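Statement b can be verified numerically. In numpy's discrete conventions, the full cross-correlation of x with y equals the full convolution of the time-reversed x with y, read off in reverse order along the output axis (the sample values below are arbitrary):

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
y = np.array([1., 0., -1.])

# Cross-correlation of x and y computed directly...
corr = np.correlate(x, y, mode="full")

# ...and via convolution: flip x to x(-t), use y(t) as the filter.
conv = np.convolve(x[::-1], y, mode="full")

print(corr, conv[::-1])   # identical up to reversal of the output axis
```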
____________________________________________________________________________

QUESTION 10:
Which of the following is an LTI/LSI system? 𝑦 and 𝑥 are the output and input, respectively.

a. y = m × x + n × x
b. y = m × x + c
c. y = m × x - c
d. y = m × x²

Correct Answer: a
Detailed Solution:

y = m × x + n × x
y = (m + n) × x

which is of the form y = m′x, where m′ = m + n.

Options b and c have a bias/intercept term, which breaks the superposition property (refer
to the slide at 12:19 in Lecture 34), and option d has a quadratic dependence on the input.
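The superposition (additivity) test can be run directly on the candidate systems; the constants m, n, c below are arbitrary toy values:

```python
def sys_a(x, m=2.0, n=3.0):
    return m * x + n * x        # collapses to (m + n) * x: linear

def sys_b(x, m=2.0, c=1.0):
    return m * x + c            # affine: the bias breaks linearity

def superposition_holds(f, x1=1.5, x2=-0.5):
    # Additivity: f(x1 + x2) must equal f(x1) + f(x2)
    return abs(f(x1 + x2) - (f(x1) + f(x2))) < 1e-9

print(superposition_holds(sys_a))   # True
print(superposition_holds(sys_b))   # False: the bias is counted twice on the right
```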

______________________________________________________________________________


************END*******
