Assignment 4 (Sol.) : Reinforcement Learning
Reinforcement Learning
Prof. B. Ravindran
(f) (Q, O ∧ ¬I, L, −1)
(g) (Q, ¬O ∧ I, Q, +1)
(h) (Q, ¬O ∧ ¬I, L, −1)
2. Based on the above problem description, what advice will you give to At Wit’s End?
(a) if there is laughter, play the organ and do not burn incense; if the room is quiet, play the
organ and burn incense
(b) never play the organ, always burn incense
(c) always play the organ, never burn incense
(d) if there is laughter, play the organ; if the room is quiet, do not play the organ and burn
incense
Sol. (d)
3. If a policy is greedy with respect to its own value function, then it is an optimal policy.
(a) false
(b) true
Sol. (b)
Consider the value function corresponding to an arbitrary policy π. If we derive a policy that is
greedy with respect to this value function, then by the policy improvement theorem we are
guaranteed to get a policy that is at least as good as π. If π is itself greedy with respect to vπ,
no further improvement is possible: vπ(s) = max_a qπ(s, a) for every state s, which is exactly the
Bellman optimality equation, so vπ = v∗. Hence, if a policy is greedy with respect to its own
value function, it is optimal.
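For reference, the policy improvement theorem invoked above can be stated as follows (a standard
statement, added here for completeness, in the notation used elsewhere in this assignment):
$$q_\pi(s, \pi'(s)) \ge v_\pi(s) \ \text{ for all } s \quad \Longrightarrow \quad v_{\pi'}(s) \ge v_\pi(s) \ \text{ for all } s.$$
In particular, a policy π′ that is greedy with respect to vπ satisfies
qπ(s, π′(s)) = max_a qπ(s, a) ≥ vπ(s), so vπ′ ≥ vπ.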
4. Consider a 4 × 4 grid world problem where the goal is to reach either the top left corner or
the bottom right corner. The agent can choose from four actions {up, down, left, right} which
deterministically cause the corresponding state transitions, except that actions that would take
the agent off the grid leave the state unchanged. We model this as an undiscounted, episodic
task, where the reward is -1 for all transitions. Suppose that the agent follows the equiprobable
random policy. Given below is the partial value function for this problem. What, respectively,
are the missing values in the first and second rows? (Hint: the Bellman equation must hold
for every state.)
[Figure: partial value function for the 4 × 4 gridworld]
(a) -20, -14
(b) -14, -20
(c) -14, -18
(d) -20, -18
Sol. (b)
For the value in the first row, we have
$$v_\pi(s) = \sum_{a} \pi(a|s) \sum_{s'} p(s'|s,a)\,[r + v_\pi(s')]$$
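As a cross-check of the Bellman equation above (not part of the original solution), here is a
minimal iterative policy evaluation sketch in Python for the gridworld described in the question;
the 0-indexed grid layout, in-place sweeps, and convergence threshold are my own assumptions.

import numpy as np

# Iterative policy evaluation for the 4 x 4 gridworld of question 4:
# undiscounted, reward -1 on every transition, equiprobable random policy,
# terminal states at the top-left and bottom-right corners.
N = 4
terminals = {(0, 0), (N - 1, N - 1)}
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

V = np.zeros((N, N))
while True:
    delta = 0.0
    for i in range(N):
        for j in range(N):
            if (i, j) in terminals:
                continue
            new_v = 0.0
            for di, dj in actions:
                ni, nj = i + di, j + dj
                if not (0 <= ni < N and 0 <= nj < N):
                    ni, nj = i, j  # off-grid actions leave the state unchanged
                new_v += 0.25 * (-1 + V[ni, nj])  # equiprobable policy, gamma = 1
            delta = max(delta, abs(new_v - V[i, j]))
            V[i, j] = new_v
    if delta < 1e-10:
        break

print(np.round(V))  # first row converges to [0, -14, -20, -22], second row to [-14, -18, -20, -20]

The converged first and second rows contain the values -14 and -20, consistent with option (b).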
5. If π is the equiprobable random policy, what are the respective values of qπ (s1 , down) and qπ (s2 , down)
given that s1 is the last cell in the third row (value -14) and s2 is the last cell in the second
row?
Sol. (a)
For s1, the down action leads to the terminal state (value 0), so
$$q_\pi(s_1, \text{down}) = \sum_{s'} p(s'|s_1, \text{down})\,[r + v_\pi(s')] = -1 + 0 = -1.$$
Similarly, the down action from s2 leads to s1 (value −14), so
$$q_\pi(s_2, \text{down}) = -1 + (-14) = -15.$$
6. In a particular grid-world example, rewards are positive for goals, negative for running into
the edge of the world, and zero the rest of the time. Are the signs of these rewards important,
or only the intervals between them? Prove, using the discounted return equation
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
that adding a constant C to all the rewards adds a constant, K, to the values of all states, and
thus does not affect the relative values of any states under any policies. What is K in terms
of C and γ?
(a) K = 1/(C(1 − γ))
(b) K = C(1/(1 − γ) − 1)
(c) K = C(1/(1 − γ) + 1)
(d) K = C/(1 − γ)
Sol. (d)
Assume that the grid-world problem is a continuing task. For some policy π and state s, the
value function can be given as
$$v_\pi(s) = E_\pi\{G_t \mid s_t = s\}.$$
Using the discounted return equation, we have
$$v_\pi(s) = E_\pi\Big\{\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\Big|\, s_t = s\Big\}.$$
If a constant C is added to every reward, the new value function is
$$v'_\pi(s) = E_\pi\Big\{\sum_{k=0}^{\infty} \gamma^k (R_{t+k+1} + C) \,\Big|\, s_t = s\Big\} = E_\pi\Big\{\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} + C\sum_{k=0}^{\infty} \gamma^k \,\Big|\, s_t = s\Big\} = v_\pi(s) + \frac{C}{1-\gamma}.$$
We see that adding a constant C to all rewards does not affect the relative values of any states
under any policies. Here K = C/(1 − γ).
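As a quick numeric sanity check of this result (not part of the original solution; the discount
factor, constant, and horizon below are arbitrary choices), the return of any long reward sequence
shifts by approximately C/(1 − γ) when C is added to every reward:

import numpy as np

# Compare the discounted return of a reward sequence with and without a
# constant C added to every reward; the difference should be ~ C / (1 - gamma).
rng = np.random.default_rng(0)
gamma, C, T = 0.9, 2.0, 10_000        # discount, constant, long finite horizon
rewards = rng.normal(size=T)          # arbitrary reward sequence
discounts = gamma ** np.arange(T)
G = discounts @ rewards
G_shifted = discounts @ (rewards + C)
print(G_shifted - G, C / (1 - gamma))  # both print ~ 20.0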
7. Given a reinforcement learning problem, algorithm A will return the optimal state value func-
tion for that problem and algorithm B will return the optimal action value function. Your aim
is to use the value function so obtained to behave optimally in the environment. Assuming
that you know the expected rewards but not the transition probabilities corresponding to the
problem in question, which algorithm would you prefer to use for your control task?
(a) algorithm A
(b) algorithm B
Sol. (b)
Since algorithm B returns the optimal action value function, we can use the information
provided by the optimal action value function to control the behaviour of the agent without
knowledge of the transition probabilities of the underlying MDP.
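To make the point concrete, here is a small illustrative sketch (not from the original solution;
the state/action counts and the randomly generated Q, V, R, and P are made-up stand-ins): acting
greedily with an action value function requires no model, whereas acting greedily with a state
value function requires a one-step lookahead through the transition probabilities.

import numpy as np

# Greedy control from Q needs no model; greedy control from V needs P.
nS, nA, gamma = 5, 3, 0.9
rng = np.random.default_rng(1)
Q = rng.normal(size=(nS, nA))                    # stand-in for the optimal action value function
V = Q.max(axis=1)                                # stand-in for the optimal state value function
R = rng.normal(size=(nS, nA))                    # expected rewards (known, per the question)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # transition probabilities (unknown in the question)

def act_from_Q(s):
    # Model-free: simply pick the action with the largest Q-value.
    return int(np.argmax(Q[s]))

def act_from_V(s):
    # Needs the model: argmax_a R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    return int(np.argmax(R[s] + gamma * P[s] @ V))

print(act_from_Q(0), act_from_V(0))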
8. In proving that Lπ is a contraction, we had the expression
$$\gamma \sum_{j} p(j|s)\,[v(j) - u(j)] \;\le\; \gamma\,\|v - u\| \sum_{j} p(j|s)$$
Sol. (d)
9. We defined the operator Lπ : V → V as Lπ v = rπ + γPπ v. Having seen the proof of the
Banach fixed point theorem and assuming that vπ and v∗ have their usual meanings, which
among the following are implications of showing that Lπ is a contraction?
(a) v = v′
(b) v ≠ v′
(c) ||Lπ v − Lπ v′|| ≤ λ||v − v′||, 0 ≤ λ < 1
(d) none of the above
Sol. (c)
The first option may not hold if v ≠ vπ. Similarly, the second option may not hold if v = vπ.
The third option is true because Lπ is a contraction, and in all three possible scenarios
(v ≠ v′ ≠ vπ, v ≠ v′ = vπ, and v = v′ = vπ), the statement holds.
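For context, a standard consequence of Lπ being a contraction (not stated in the original
solution): since (V, ||·||) is complete, the Banach fixed point theorem guarantees that Lπ has a
unique fixed point, which is vπ, and repeated application of Lπ converges to it from any starting
point,
$$\|L_\pi^n v - v_\pi\| \le \lambda^n \|v - v_\pi\| \to 0 \quad \text{as } n \to \infty.$$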