Deep Learning - IIT Ropar - Unit 7 - Week 4

Week 4 : Assignment 4

The due date for submitting this assignment has passed.
Due on 2024-08-21, 23:59 IST.
Assignment submitted on 2024-08-21, 16:36 IST
1) A team has a data set that contains 1000 samples for training a feed-forward neural network. Suppose they decided to use the stochastic gradient descent algorithm to update the weights. How many times do the weights get updated after training the network for 5 epochs? (1 point)

1000
5000
100
5

Yes, the answer is correct.
Score: 1
Accepted Answers:
5000
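
For reference, a minimal sketch of the arithmetic behind this answer, assuming plain SGD with a batch size of 1 (one weight update per training sample):

```python
# Hypothetical numbers taken from the question: 1000 samples, 5 epochs.
num_samples = 1000
num_epochs = 5
batch_size = 1                                   # SGD: one sample per update

updates_per_epoch = num_samples // batch_size    # 1000 updates per epoch
total_updates = updates_per_epoch * num_epochs
print(total_updates)                             # 5000
```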

Recap: 2) What is the primary benefit of using Adagrad compared to other optimization 1 point
Learning algorithms?
Parameters:
Guess Work, It converges faster than other optimization algorithms.
Gradient It is more memory-efficient than other optimization algorithms.
Descent (unit?
unit=59&lesso
It is less sensitive to the choice of hyperparameters(learning rate).
n=60) It is less likely to get stuck in local optima than other optimization algorithms.

Contours Yes, the answer is correct.


Maps (unit? Score: 1
unit=59&lesso Accepted Answers:
n=61) It is less sensitive to the choice of hyperparameters(learning rate).

https://onlinecourses.nptel.ac.in/noc24_cs114/unit?unit=59&assessment=288 1/6
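
As a reference point, a minimal sketch of the Adagrad update in one common formulation (not necessarily the course's exact notation): each parameter accumulates its squared gradients, and the effective step η / √(accumulator) adapts per parameter, which is why the method is comparatively insensitive to the initial learning rate.

```python
import numpy as np

def adagrad_update(w, grad, accum, eta=0.01, eps=1e-8):
    """One Adagrad step: per-parameter steps shrink as squared gradients accumulate."""
    accum = accum + grad ** 2                  # running sum of squared gradients
    w = w - eta * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage on f(w) = w^2 (gradient 2w) with a deliberately large learning rate.
w, accum = np.array([5.0]), np.zeros(1)
for _ in range(100):
    w, accum = adagrad_update(w, 2 * w, accum, eta=1.0)
print(w)  # still moves toward 0 despite the large eta
```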

3) What are the benefits of using stochastic gradient descent compared to vanilla gradient descent? (1 point)

SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
SGD theoretically guarantees that the descent direction is optimal.
SGD experiences less oscillation compared to vanilla gradient descent.

Yes, the answer is correct.
Score: 1
Accepted Answers:
SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
4) Select the true statements about the factor β used in the momentum based gradient descent algorithm. (1 point)

Setting β = 0.1 allows the algorithm to move faster than the vanilla gradient descent algorithm
Setting β = 0 makes it equivalent to the vanilla gradient descent algorithm
Setting β = 1 makes it equivalent to the vanilla gradient descent algorithm
Oscillation around the minimum will be less if we set β = 0.1 than setting β = 0.99

Partially Correct.
Score: 0.67
Accepted Answers:
Setting β = 0.1 allows the algorithm to move faster than the vanilla gradient descent algorithm
Setting β = 0 makes it equivalent to the vanilla gradient descent algorithm
Oscillation around the minimum will be less if we set β = 0.1 than setting β = 0.99
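
A minimal sketch of the momentum update under a common formulation (assumed here, not quoted from the lectures): u_t = β·u_{t−1} + ∇w_t and w_{t+1} = w_t − η·u_t. With β = 0 it reduces to vanilla gradient descent, while a large β such as 0.99 keeps a long gradient history and tends to overshoot and oscillate around the minimum more than a small β such as 0.1.

```python
def momentum_step(w, u_prev, grad, eta=0.1, beta=0.9):
    """One momentum update; beta = 0 recovers vanilla gradient descent."""
    u = beta * u_prev + grad
    return w - eta * u, u

# Toy run on f(w) = w^2 (gradient 2w) for a few beta values.
for beta in (0.0, 0.1, 0.99):
    w, u = 5.0, 0.0
    for _ in range(50):
        w, u = momentum_step(w, u, 2 * w, eta=0.1, beta=beta)
    print(beta, round(w, 4))  # beta = 0.99 overshoots and keeps oscillating on this toy problem
```
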
5) Select the behaviour of the gradient descent algorithm that uses the following update rule, (1 point)

w_{t+1} = w_t − η∇w_t

where w is a weight and η is a learning rate.

The weight update is tiny at a steep loss surface
The weight update is tiny at a gentle loss surface
The weight update is large at a steep loss surface
The weight update is large at a gentle loss surface

Yes, the answer is correct.
Score: 1
Accepted Answers:
The weight update is tiny at a gentle loss surface
The weight update is large at a steep loss surface
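
A one-line illustration of the rule w_{t+1} = w_t − η∇w_t: the step is proportional to the gradient magnitude, so the weight change is large where the loss surface is steep and tiny where it is gentle.

```python
eta = 0.1
for grad in (10.0, 0.01):        # hypothetical gradients: steep vs. gentle region
    print(grad, -eta * grad)     # weight change: -1.0 (large) vs. -0.001 (tiny)
```
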
6) The figure below shows the change in loss value over iterations. (1 point)

[Figure: loss value plotted against iterations, showing oscillations in the loss.]

The oscillation in the loss value might be due to

Mini-batch gradient descent algorithm used for parameter updates
Batch gradient descent with constant learning rate algorithm used for parameter updates
Stochastic gradient descent algorithm used for parameter updates
Batch gradient descent with line search algorithm used for parameter updates

Yes, the answer is correct.
Score: 1
Accepted Answers:
Mini-batch gradient descent algorithm used for parameter updates
Stochastic gradient descent algorithm used for parameter updates
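
A small simulation on assumed toy data of why stochastic (or small mini-batch) updates make the loss curve oscillate while full-batch gradient descent decreases it smoothly:

```python
import numpy as np

# Toy 1-D regression data (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)

def full_loss(w):
    return np.mean((w * x - y) ** 2)

eta = 0.05
w_sgd = w_batch = 0.0
ups_sgd = ups_batch = 0
prev_sgd = prev_batch = full_loss(0.0)
for _ in range(100):
    i = rng.integers(len(x))
    w_sgd -= eta * 2 * (w_sgd * x[i] - y[i]) * x[i]         # noisy single-sample gradient
    w_batch -= eta * np.mean(2 * (w_batch * x - y) * x)     # exact full-batch gradient
    ups_sgd += full_loss(w_sgd) > prev_sgd
    ups_batch += full_loss(w_batch) > prev_batch
    prev_sgd, prev_batch = full_loss(w_sgd), full_loss(w_batch)

# SGD raises the full loss on a number of steps (oscillation); batch GD does not on this toy problem.
print(ups_sgd, ups_batch)
```
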
7) We have the following functions x³, ln(x), eˣ, x and 4. Which of the following functions has the steepest slope at x = 1? (1 point)

x³
ln(x)
eˣ
4

No, the answer is incorrect.
Score: 0
Accepted Answers:
x³
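
Checking the slopes numerically (the derivatives at x = 1 are 3 for x³, 1 for ln x, e ≈ 2.718 for eˣ, 1 for x, and 0 for the constant 4):

```python
import math

# Derivatives of the listed functions evaluated at x = 1.
slopes = {
    "x^3": 3 * 1 ** 2,      # d/dx x^3 = 3x^2
    "ln(x)": 1 / 1,         # d/dx ln(x) = 1/x
    "e^x": math.e,          # d/dx e^x = e^x
    "x": 1,                 # d/dx x = 1
    "4": 0,                 # derivative of a constant is 0
}
print(max(slopes, key=slopes.get), slopes)  # x^3 is steepest: 3 > e ≈ 2.718
```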

8) Which of the following represents the contour plot of the function f(x, y) = x² − y²? (1 point)

[The answer options are contour-plot images, not reproduced here.]

Yes, the answer is correct.
Score: 1
Accepted Answers:
[Contour-plot image, not reproduced here.]
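
If one wants to visualize it, a quick sketch (assuming matplotlib is available): the level sets of f(x, y) = x² − y² are hyperbolas around a saddle point at the origin.

```python
import numpy as np
import matplotlib.pyplot as plt

# Contours of f(x, y) = x^2 - y^2: hyperbolas around a saddle at the origin.
xs = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(xs, xs)
plt.contour(X, Y, X ** 2 - Y ** 2, levels=15)
plt.gca().set_aspect("equal")
plt.title("Contours of x^2 - y^2")
plt.show()
```
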
9) Which of the following are among the disadvantages of Adagrad? (1 point)

It doesn't work well for sparse matrices.
It usually goes past the minima.
It gets stuck before reaching the minima.
Weight updates are very small at the initial stages of the algorithm.

Yes, the answer is correct.
Score: 1
Accepted Answers:
It gets stuck before reaching the minima.
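
A minimal sketch of why Adagrad can get stuck: the accumulated squared gradient only grows, so the effective step η / √(accumulator) keeps shrinking and updates can become negligible before the minimum is reached.

```python
import numpy as np

eta, eps = 0.1, 1e-8
grad = 1.0                                   # pretend the gradient magnitude stays at 1 (hypothetical)
for t in (1, 10, 100, 1000):
    accum = t * grad ** 2                    # accumulator after t such gradients
    step = eta * grad / (np.sqrt(accum) + eps)
    print(t, step)                           # 0.1, ~0.0316, 0.01, ~0.00316: the step keeps shrinking
```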

10) What is the role of activation functions in deep learning? (1 point)

Activation functions transform the output of a neuron into a non-linear function, allowing
the network to learn complex patterns.
Activation functions make the network faster by reducing the number of iterations needed
for training.
Activation functions are used to normalize the input data.
Activation functions are used to compute the loss function.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Activation functions transform the output of a neuron into a non-linear function, allowing the
network to learn complex patterns.
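
A tiny sketch of the non-linearity argument: stacking linear layers without an activation collapses to a single linear map, while inserting even a simple non-linearity (ReLU here) breaks that collapse and lets the network represent more complex functions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two linear layers with no activation are equivalent to one linear layer.
stacked = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(stacked, collapsed))      # True

# With a ReLU in between, the map is no longer a single matrix product.
relu = lambda z: np.maximum(z, 0.0)
print(W2 @ relu(W1 @ x))
```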

