Deep Learning - IIT Ropar - Unit 7 - Week 4
2) What is the primary benefit of using Adagrad compared to other optimization algorithms? (1 point)

It converges faster than other optimization algorithms.
It is more memory-efficient than other optimization algorithms.
It is less sensitive to the choice of hyperparameters (learning rate).
It is less likely to get stuck in local optima than other optimization algorithms.
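For intuition on the learning-rate option, here is a minimal NumPy sketch of the Adagrad update (an illustrative example, not part of the course material): the running sum of squared gradients scales each parameter's step, which is what makes the method comparatively insensitive to the initial learning-rate choice.

```python
import numpy as np

def adagrad_update(w, grad, cache, lr=0.1, eps=1e-8):
    """One Adagrad step: accumulate squared gradients per parameter,
    then divide the step by the root of that accumulator."""
    cache = cache + grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimise f(w) = w0^2 + 10*w1^2, whose two directions have very
# different curvatures, with a single shared learning rate.
w = np.array([5.0, 5.0])
cache = np.zeros_like(w)
for _ in range(100):
    grad = np.array([2 * w[0], 20 * w[1]])
    w, cache = adagrad_update(w, grad, cache)
# Both coordinates make progress without per-coordinate tuning.
print(w)
```

Because the accumulator normalises each coordinate's step by its own gradient history, the same lr works across directions of very different scale.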
https://onlinecourses.nptel.ac.in/noc24_cs114/unit?unit=59&assessment=288
3) What are the benefits of using stochastic gradient descent compared to vanilla gradient descent? (1 point)

SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
SGD theoretically guarantees that the descent direction is optimal.
SGD experiences less oscillation compared to vanilla gradient descent.

Yes, the answer is correct.
Score: 1
Accepted Answers:
SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
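The "computationally efficient for large datasets" point can be seen in a short sketch (illustrative, with made-up synthetic data): each SGD update touches a single example, so the per-step cost does not grow with the dataset size.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D linear-regression data: y ~ 3x + noise, 1000 examples.
X = rng.normal(size=(1000, 1))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=1000)

w, lr = 0.0, 0.01
# SGD: each update uses ONE randomly chosen example, so a step costs
# O(1) in the number of examples, unlike a full-batch gradient.
for _ in range(1000):
    i = rng.integers(len(X))
    grad = 2 * (w * X[i, 0] - y[i]) * X[i, 0]
    w -= lr * grad
print(w)  # close to the true slope, 3
```

A full pass of vanilla gradient descent would cost one gradient over all 1000 examples per update; here 1000 SGD updates cost roughly the same as one such pass, yet already recover the slope.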
4) Select the true statements about the factor β used in the momentum based gradient descent algorithm. (1 point)

Setting β = 0.1 allows the algorithm to move faster than the vanilla gradient descent algorithm
Setting β = 0 makes it equivalent to the vanilla gradient descent algorithm
Setting β = 1 makes it equivalent to the vanilla gradient descent algorithm
Oscillation around the minimum will be less if we set β = 0.1 than setting β = 0.99

Partially Correct.
Score: 0.67
Accepted Answers:
Setting β = 0.1 allows the algorithm to move faster than the vanilla gradient descent algorithm
Setting β = 0 makes it equivalent to the vanilla gradient descent algorithm
Oscillation around the minimum will be less if we set β = 0.1 than setting β = 0.99
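The accepted answers about β can be checked numerically on a toy quadratic (an illustrative sketch, not course code): with β = 0 the update reduces exactly to vanilla gradient descent, and a large β = 0.99 overshoots the minimum and oscillates far more than β = 0.1.

```python
import numpy as np

def momentum_gd(beta, lr=0.1, steps=50):
    """Momentum GD on f(w) = w^2, starting at w = 5; returns the w-trajectory."""
    w, u = 5.0, 0.0
    traj = [w]
    for _ in range(steps):
        grad = 2 * w               # f'(w) = 2w
        u = beta * u + grad        # exponentially weighted gradient history
        w = w - lr * u
        traj.append(w)
    return np.array(traj)

vanilla = momentum_gd(beta=0.0)    # beta = 0 reduces exactly to vanilla GD
low = momentum_gd(beta=0.1)        # mild momentum: faster, little overshoot
high = momentum_gd(beta=0.99)      # heavy momentum: overshoots and oscillates
```

Counting sign changes of w along each trajectory shows the β = 0.99 run repeatedly crossing the minimum at w = 0, while the β = 0.1 run approaches it monotonically.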
5) Select the behaviour of the Gradient descent algorithm that uses the following update rule, (1 point)

w_{t+1} = w_t − η ∇w_t
6) The figure below shows the change in loss value over iterations (1 point)

[Figure: loss value vs. iterations, showing an oscillating loss curve]

The oscillation in the loss value might be due to

Mini-batch gradient descent algorithm used for parameter updates
Batch gradient descent with constant learning rate algorithm used for parameter updates
Stochastic gradient descent algorithm used for parameter updates
Batch gradient descent with line search algorithm used for parameter updates

Yes, the answer is correct.
Score: 1
Accepted Answers:
Mini-batch gradient descent algorithm used for parameter updates
Stochastic gradient descent algorithm used for parameter updates
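Why stochastic and mini-batch updates produce an oscillating loss curve, while full-batch descent does not, can be reproduced in a small experiment (illustrative, on made-up synthetic data): noisy gradient estimates sometimes move the parameters uphill on the full-dataset loss.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=200)

def full_loss(w):
    """Loss over the WHOLE dataset (what the figure would plot)."""
    return np.mean((w * X[:, 0] - y) ** 2)

def run(batch_size, lr=0.05, steps=200):
    w = 0.0
    losses = []
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        grad = np.mean(2 * (w * X[idx, 0] - y[idx]) * X[idx, 0])
        w -= lr * grad
        losses.append(full_loss(w))
    return np.array(losses)

sgd = run(batch_size=1)          # noisy gradient -> oscillating loss curve
batch = run(batch_size=len(X))   # exact gradient -> smooth, monotone decrease
```

The full-batch curve decreases at every step (the learning rate is below the stability threshold for this quadratic), whereas the single-example curve goes up on some steps, which is exactly the oscillation the question points to.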
7) We have the following functions: x³, ln(x), eˣ, x and 4. Which of the following functions has the steepest slope at x = 1? (1 point)

x³
ln(x)
eˣ
4

No, the answer is incorrect.
Score: 0
Accepted Answers:
x³
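The accepted answer follows from the derivatives at x = 1: (x³)′ = 3x² gives 3, (ln x)′ = 1/x gives 1, (eˣ)′ gives e ≈ 2.718, x′ = 1, and the constant 4 has slope 0, so 3 > e makes x³ the steepest. A quick numerical check (illustrative sketch):

```python
import math

def slope(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

funcs = {
    "x^3": lambda x: x ** 3,    # f'(x) = 3x^2 -> 3 at x = 1
    "ln(x)": math.log,          # f'(x) = 1/x  -> 1
    "e^x": math.exp,            # f'(x) = e^x  -> e ~ 2.718
    "x": lambda x: x,           # f'(x) = 1
    "4": lambda x: 4.0,         # f'(x) = 0
}
slopes = {name: slope(f, 1.0) for name, f in funcs.items()}
steepest = max(slopes, key=slopes.get)
print(steepest)  # x^3, since 3 > e ~ 2.718 > 1 > 0
```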
8) Which of the following represents the contour plot of the function f(x, y) = x² − y²? (1 point)

[Answer options shown as contour-plot images]
What is the role of activation functions in a neural network? (1 point)

Activation functions transform the output of a neuron into a non-linear function, allowing the network to learn complex patterns.
Activation functions make the network faster by reducing the number of iterations needed for training.
Activation functions are used to normalize the input data.
Activation functions are used to compute the loss function.
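The first option is the key fact: without a non-linearity, stacked layers collapse into a single linear map. A small NumPy demonstration (illustrative, with hand-picked weights):

```python
import numpy as np

# Two-layer "network" with NO activation: it collapses to a single
# linear map, no matter how many layers are stacked.
W1 = np.array([[1.0, -1.0],
               [2.0,  1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, 2.0])

two_layers = W2 @ (W1 @ x)     # [3.0]
one_layer = (W2 @ W1) @ x      # [3.0] -- the identical map
assert np.allclose(two_layers, one_layer)

# Insert a ReLU between the layers and the map is no longer linear:
relu = lambda z: np.maximum(z, 0.0)
h = lambda v: W2 @ relu(W1 @ v)
print(h(x), h(-x))  # [4.] [1.]  -- h(-x) != -h(x), so h is non-linear
```

Since a composition of linear maps is still linear, only the activation lets depth add representational power, which is what "learn complex patterns" refers to.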