Deep Learning - IIT Ropar - Unit 7 - Week 4

Week 4 : Assignment 4

The due date for submitting this assignment has passed.
Due on 2024-08-21, 23:59 IST.
Assignment submitted on 2024-08-21, 16:36 IST
1) A team has a data set that contains 1000 samples for training a feed-forward neural network. Suppose they decided to use the stochastic gradient descent algorithm to update the weights. How many times do the weights get updated after training the network for 5 epochs? (1 point)

1000
5000
100
5

Yes, the answer is correct.
Score: 1
Accepted Answers:
5000
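
For reference, a minimal sketch of the arithmetic behind this answer, assuming plain SGD with a batch size of 1 (one weight update per training sample):

```python
# Hypothetical numbers taken from the question: 1000 samples, 5 epochs.
num_samples = 1000
num_epochs = 5
batch_size = 1                                   # SGD: one sample per update

updates_per_epoch = num_samples // batch_size    # 1000 updates per epoch
total_updates = updates_per_epoch * num_epochs
print(total_updates)                             # 5000
```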

Recap: 2) What is the primary benefit of using Adagrad compared to other optimization 1 point
Learning algorithms?
Parameters:
Guess Work, It converges faster than other optimization algorithms.
Gradient It is more memory-efficient than other optimization algorithms.
Descent (unit?
unit=59&lesso
It is less sensitive to the choice of hyperparameters(learning rate).
n=60) It is less likely to get stuck in local optima than other optimization algorithms.

Contours Yes, the answer is correct.


Maps (unit? Score: 1
unit=59&lesso Accepted Answers:
n=61) It is less sensitive to the choice of hyperparameters(learning rate).

https://onlinecourses.nptel.ac.in/noc24_cs114/unit?unit=59&assessment=288 1/6
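
As a reference point, a minimal sketch of the Adagrad update in one common formulation (not necessarily the course's exact notation): each parameter accumulates its squared gradients, and the effective step η / √(accumulator) adapts per parameter, which is why the method is comparatively insensitive to the initial learning rate.

```python
import numpy as np

def adagrad_update(w, grad, accum, eta=0.01, eps=1e-8):
    """One Adagrad step: per-parameter steps shrink as squared gradients accumulate."""
    accum = accum + grad ** 2                  # running sum of squared gradients
    w = w - eta * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage on f(w) = w^2 (gradient 2w) with a deliberately large learning rate.
w, accum = np.array([5.0]), np.zeros(1)
for _ in range(100):
    w, accum = adagrad_update(w, 2 * w, accum, eta=1.0)
print(w)  # still moves toward 0 despite the large eta
```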

3) What are the benefits of using stochastic gradient descent compared to vanilla gradient descent? (1 point)

SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
SGD theoretically guarantees that the descent direction is optimal.
SGD experiences less oscillation compared to vanilla gradient descent.

Yes, the answer is correct.
Score: 1
Accepted Answers:
SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
4) Select the true statements about the factor β used in the momentum based gradient descent algorithm. (1 point)

Setting β = 0.1 allows the algorithm to move faster than the vanilla gradient descent algorithm
Setting β = 0 makes it equivalent to the vanilla gradient descent algorithm
Setting β = 1 makes it equivalent to the vanilla gradient descent algorithm
Oscillation around the minimum will be less if we set β = 0.1 than setting β = 0.99

Partially Correct.
Score: 0.67
Accepted Answers:
Setting β = 0.1 allows the algorithm to move faster than the vanilla gradient descent algorithm
Setting β = 0 makes it equivalent to the vanilla gradient descent algorithm
Oscillation around the minimum will be less if we set β = 0.1 than setting β = 0.99
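
A minimal sketch of the momentum update under a common formulation (assumed here, not quoted from the lectures): u_t = β·u_{t−1} + ∇w_t and w_{t+1} = w_t − η·u_t. With β = 0 it reduces to vanilla gradient descent, while a large β such as 0.99 keeps a long gradient history and tends to overshoot and oscillate around the minimum more than a small β such as 0.1.

```python
def momentum_step(w, u_prev, grad, eta=0.1, beta=0.9):
    """One momentum update; beta = 0 recovers vanilla gradient descent."""
    u = beta * u_prev + grad
    return w - eta * u, u

# Toy run on f(w) = w^2 (gradient 2w) for a few beta values.
for beta in (0.0, 0.1, 0.99):
    w, u = 5.0, 0.0
    for _ in range(50):
        w, u = momentum_step(w, u, 2 * w, eta=0.1, beta=beta)
    print(beta, round(w, 4))  # beta = 0.99 overshoots and keeps oscillating on this toy problem
```
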
5) Select the behaviour of the gradient descent algorithm that uses the following update rule, (1 point)

w_{t+1} = w_t − η∇w_t

where w is a weight and η is a learning rate.

The weight update is tiny at a steep loss surface
The weight update is tiny at a gentle loss surface
The weight update is large at a steep loss surface
The weight update is large at a gentle loss surface

Yes, the answer is correct.
Score: 1
Accepted Answers:
The weight update is tiny at a gentle loss surface
The weight update is large at a steep loss surface
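
A one-line illustration of the rule w_{t+1} = w_t − η∇w_t: the step is proportional to the gradient magnitude, so the weight change is large where the loss surface is steep and tiny where it is gentle.

```python
eta = 0.1
for grad in (10.0, 0.01):        # hypothetical gradients: steep vs. gentle region
    print(grad, -eta * grad)     # weight change: -1.0 (large) vs. -0.001 (tiny)
```
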
6) The figure below shows the change in loss value over iterations. (1 point)

[Figure: loss value plotted against iterations, showing oscillations in the loss.]

The oscillation in the loss value might be due to

Mini-batch gradient descent algorithm used for parameter updates
Batch gradient descent with constant learning rate algorithm used for parameter updates
Stochastic gradient descent algorithm used for parameter updates
Batch gradient descent with line search algorithm used for parameter updates

Yes, the answer is correct.
Score: 1
Accepted Answers:
Mini-batch gradient descent algorithm used for parameter updates
Stochastic gradient descent algorithm used for parameter updates
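
A small simulation on assumed toy data of why stochastic (or small mini-batch) updates make the loss curve oscillate while full-batch gradient descent decreases it smoothly:

```python
import numpy as np

# Toy 1-D regression data (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)

def full_loss(w):
    return np.mean((w * x - y) ** 2)

eta = 0.05
w_sgd = w_batch = 0.0
ups_sgd = ups_batch = 0
prev_sgd = prev_batch = full_loss(0.0)
for _ in range(100):
    i = rng.integers(len(x))
    w_sgd -= eta * 2 * (w_sgd * x[i] - y[i]) * x[i]         # noisy single-sample gradient
    w_batch -= eta * np.mean(2 * (w_batch * x - y) * x)     # exact full-batch gradient
    ups_sgd += full_loss(w_sgd) > prev_sgd
    ups_batch += full_loss(w_batch) > prev_batch
    prev_sgd, prev_batch = full_loss(w_sgd), full_loss(w_batch)

# SGD raises the full loss on a number of steps (oscillation); batch GD does not on this toy problem.
print(ups_sgd, ups_batch)
```
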
7) We have the following functions x³, ln(x), eˣ, x and 4. Which of the following functions has the steepest slope at x = 1? (1 point)

x³
ln(x)
eˣ
4

No, the answer is incorrect.
Score: 0
Accepted Answers:
x³
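
Checking the slopes numerically (the derivatives at x = 1 are 3 for x³, 1 for ln x, e ≈ 2.718 for eˣ, 1 for x, and 0 for the constant 4):

```python
import math

# Derivatives of the listed functions evaluated at x = 1.
slopes = {
    "x^3": 3 * 1 ** 2,      # d/dx x^3 = 3x^2
    "ln(x)": 1 / 1,         # d/dx ln(x) = 1/x
    "e^x": math.e,          # d/dx e^x = e^x
    "x": 1,                 # d/dx x = 1
    "4": 0,                 # derivative of a constant is 0
}
print(max(slopes, key=slopes.get), slopes)  # x^3 is steepest: 3 > e ≈ 2.718
```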

8) Which of the following represents the contour plot of the function f(x, y) = x² − y²? (1 point)

[The answer options are contour-plot images, not reproduced here.]

Yes, the answer is correct.
Score: 1
Accepted Answers:
[Contour-plot image, not reproduced here.]
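
If one wants to visualize it, a quick sketch (assuming matplotlib is available): the level sets of f(x, y) = x² − y² are hyperbolas around a saddle point at the origin.

```python
import numpy as np
import matplotlib.pyplot as plt

# Contours of f(x, y) = x^2 - y^2: hyperbolas around a saddle at the origin.
xs = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(xs, xs)
plt.contour(X, Y, X ** 2 - Y ** 2, levels=15)
plt.gca().set_aspect("equal")
plt.title("Contours of x^2 - y^2")
plt.show()
```
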
9) Which of the following are among the disadvantages of Adagrad? (1 point)

It doesn't work well for sparse matrices.
It usually goes past the minima.
It gets stuck before reaching the minima.
Weight updates are very small at the initial stages of the algorithm.

Yes, the answer is correct.
Score: 1
Accepted Answers:
It gets stuck before reaching the minima.
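
A minimal sketch of why Adagrad can get stuck: the accumulated squared gradient only grows, so the effective step η / √(accumulator) keeps shrinking and updates can become negligible before the minimum is reached.

```python
import numpy as np

eta, eps = 0.1, 1e-8
grad = 1.0                                   # pretend the gradient magnitude stays at 1 (hypothetical)
for t in (1, 10, 100, 1000):
    accum = t * grad ** 2                    # accumulator after t such gradients
    step = eta * grad / (np.sqrt(accum) + eps)
    print(t, step)                           # 0.1, ~0.0316, 0.01, ~0.00316: the step keeps shrinking
```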

10) What is the role of activation functions in deep learning? (1 point)

Activation functions transform the output of a neuron into a non-linear function, allowing
the network to learn complex patterns.
Activation functions make the network faster by reducing the number of iterations needed
for training.
Activation functions are used to normalize the input data.
Activation functions are used to compute the loss function.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Activation functions transform the output of a neuron into a non-linear function, allowing the
network to learn complex patterns.
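
A tiny sketch of the non-linearity argument: stacking linear layers without an activation collapses to a single linear map, while inserting even a simple non-linearity (ReLU here) breaks that collapse and lets the network represent more complex functions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two linear layers with no activation are equivalent to one linear layer.
stacked = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(stacked, collapsed))      # True

# With a ReLU in between, the map is no longer a single matrix product.
relu = lambda z: np.maximum(z, 0.0)
print(W2 @ relu(W1 @ x))
```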

