Deep Reinforcement Learning Handout v2.0.docx (1)
Course No(s) 4
Credit Units
Version 1.0
Course Objectives
C01: Understand
a. the conceptual and mathematical foundations of deep reinforcement learning
b. various classic and state-of-the-art deep reinforcement learning algorithms
C02: Implement and evaluate deep reinforcement learning solutions to problems
such as planning, control, and decision making in various domains
C03: Provide conceptual, mathematical, and practical exposure to DRL
a. to understand recent developments in deep reinforcement learning
b. to enable modeling new problems as DRL problems
Text Book(s)
T1 Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto,
Second Edition, MIT Press
1. Introduction: Introducing RL
1.1. Introduction to Reinforcement Learning (RL); Examples; Elements of
Reinforcement Learning (Policy, Reward, Value, Model of the
Environment) and their characteristics; Example: RL for Tic-Tac-Toe;
Historical Background
1.2. Multi-armed Bandit Problem - Motivation and Problem Statement;
Incremental solution to the stationary & non-stationary MAB problems;
Exploration vs. Exploitation tradeoff; Bandit Gradient Algorithm as
Stochastic Gradient Ascent; Associative Search
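Illustration (not from the prescribed text): the incremental solution to the stationary bandit problem listed above updates Q(a) by Q(a) += (r - Q(a)) / N(a) under epsilon-greedy action selection. A minimal sketch; the arm means, step count, and epsilon below are made-up toy values.

```python
import random

def run_bandit(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Incremental sample-average solution to a stationary k-armed bandit.

    With probability epsilon a random arm is explored; otherwise the
    current greedy arm is exploited. Q[a] tracks the sample average of
    rewards from arm a via the incremental update rule.
    """
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k          # action-value estimates
    N = [0] * k            # pull counts
    for _ in range(steps):
        if rng.random() < epsilon:                    # explore
            a = rng.randrange(k)
        else:                                         # exploit
            a = max(range(k), key=lambda i: Q[i])
        r = rng.gauss(true_means[a], 1.0)             # noisy reward
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                     # incremental average
    return Q
```

For the non-stationary variant covered in topic 1.2, the 1/N(a) step size is replaced by a constant alpha, giving an exponentially weighted average that tracks drifting reward distributions.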
2. MDP: Framework
2.1. (Finite) Markov Decision Processes: Modelling Agent-Environment
interaction using MDPs; Examples
2.2. Goals, Rewards, and Returns; Policy and Value Functions;
2.3. Bellman Equation for value functions;
2.4. Optimal Policy and Optimal Value functions;
3. Approaches to Solving Reinforcement Problems
3.1. Dynamic Programming Solutions (Policy Iteration; Value Iteration;
Generalized Policy Iteration; Efficiency of Dynamic Programming)
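Illustration (not from the prescribed text): value iteration, listed in topic 3.1, repeatedly applies the Bellman optimality backup V(s) <- max_a sum_s' P(s'|s,a)[R(s,a) + gamma V(s')] until the values stop changing. A minimal sketch; the two-state MDP at the bottom and all its transition and reward numbers are made-up toy values.

```python
def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Value iteration via the Bellman optimality backup.

    P[s][a] is a dict {next_state: probability}; R[s][a] is the
    expected immediate reward. Sweeps states in place until the
    largest update falls below the tolerance theta.
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = max(
                sum(p * (R[s][a] + gamma * V[s2]) for s2, p in P[s][a].items())
                for a in P[s]
            )
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

# Hypothetical two-state, two-action MDP (names and numbers are illustrative)
P = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}},
}
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}
V = value_iteration(P, R)   # V("s1") approaches 2 / (1 - 0.9) = 20
```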
3.2. Monte Carlo (MC) Methods (MC Prediction, MC Control, Incremental MC)
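Illustration (not from the prescribed text): first-visit MC prediction, from topic 3.2, estimates V_pi(s) as the average of the returns observed after the first visit to s in each sampled episode. A minimal sketch assuming episodes are given as (state, reward) trajectories.

```python
def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo prediction for V_pi.

    episodes: list of trajectories, each a list of (state, reward)
    pairs sampled by following the policy, where reward is received
    on leaving that state. Returns the average first-visit return
    per state, the MC estimate of the state-value function.
    """
    returns = {}                              # state -> sampled returns
    for ep in episodes:
        # compute the return G_t at every step, working backwards
        Gs = [0.0] * len(ep)
        G = 0.0
        for t in range(len(ep) - 1, -1, -1):
            G = gamma * G + ep[t][1]
            Gs[t] = G
        # record the return only at each state's first visit
        seen = set()
        for t, (s, _) in enumerate(ep):
            if s not in seen:
                seen.add(s)
                returns.setdefault(s, []).append(Gs[t])
    return {s: sum(g) / len(g) for s, g in returns.items()}
```

With gamma = 1 and a single episode [("A", 1), ("B", 2)], the estimate is V(A) = 3 and V(B) = 2, since the return from A is the sum of all subsequent rewards.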
[ Mid-Semester Exam ]
7. Model-Based Deep RL
7.1. Upper-Confidence-Bound Action Selection,
7.2. Monte-Carlo tree search,
7.3. AlphaGo Zero, MuZero, PlaNet
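Illustration (not from the prescribed text): the UCB rule in topic 7.1 selects argmax_a [Q(a) + c * sqrt(ln t / N(a))], trading off the value estimate against an uncertainty bonus. A minimal sketch; the exploration constant c = 2 is an arbitrary choice here.

```python
import math

def ucb_action(Q, N, t, c=2.0):
    """Upper-Confidence-Bound action selection.

    Q: list of action-value estimates; N: list of selection counts;
    t: current time step. Untried actions (N[a] == 0) are treated as
    maximally uncertain and selected first.
    """
    for a, n in enumerate(N):
        if n == 0:
            return a
    return max(range(len(Q)),
               key=lambda a: Q[a] + c * math.sqrt(math.log(t) / N[a]))
```

The same bonus term reappears (in modified form) inside the selection step of Monte-Carlo tree search, which is why the two topics are grouped together above.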
8. Imitation Learning
8.1. Introduction to Imitation Learning;
8.2. Imitation Learning via Supervised Learning; Behavior Cloning; Inverse
Reinforcement Learning;
8.3. GAIL; Dataset Aggregation (DAgger);
8.4. Applications in Autonomous Driving, Game Playing, and Robotics;
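Illustration (not from the prescribed text): behavior cloning, from topic 8.2, reduces imitation to supervised learning on expert (state, action) pairs. A minimal sketch using a logistic policy fitted by hand-rolled gradient descent; the demonstration data and hyperparameters below are toy values.

```python
import math

def behavior_cloning(demos, lr=0.5, epochs=200):
    """Behavior cloning: fit a logistic policy pi(a=1|s) = sigmoid(w.s + b)
    to expert (state, action) pairs by minimizing the negative
    log-likelihood with plain gradient descent.
    """
    dim = len(demos[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for s, a in demos:                 # a is the expert action in {0, 1}
            z = sum(wi * si for wi, si in zip(w, s)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - a                      # gradient of the NLL w.r.t. z
            for i in range(dim):
                w[i] -= lr * g * s[i]
            b -= lr * g
    return w, b

def act(w, b, s):
    """Greedy action of the cloned policy."""
    return 1 if sum(wi * si for wi, si in zip(w, s)) + b > 0 else 0
```

DAgger (topic 8.3) addresses the compounding-error weakness of this approach by repeatedly querying the expert on states visited by the cloned policy and retraining on the aggregated dataset.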
Learning Outcomes
After successfully completing this course, the students will be able to
LO-1: understand the fundamental concepts and algorithms of reinforcement learning (RL)
and apply them to solving problems in control, decision making, and planning.
LO-2: implement DRL algorithms and handle training challenges related to stability and
convergence.
LO-3: evaluate the performance of DRL algorithms using metrics such as sample
efficiency, robustness, and generalization.
LO-4: understand the challenges and opportunities of applying DRL to real-world
problems, and model real-life problems as DRL problems.
Part B: Learning Plan
Academic Term
Course No
Lead Instructor
Mid-Semester Exam
3 Deep Q Network NA
4 REINFORCE NA
5 Imitation Learning NA
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No Name Type Duration Weight Schedule Remarks
Note:
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 8
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 16)
Contact sessions: Students should attend the online lectures as per the schedule
provided on the Elearn portal.
Evaluation Guidelines:
1 EC-1 consists of two Quizzes. Students will attempt them through the course pages on
the Elearn portal. Announcements will be made on the portal, in a timely manner.
2 EC-2 consists of either one or two Assignments. Students will attempt them through
the course pages on the Elearn portal. Announcements will be made on the portal,
in a timely manner.
3 For Closed Book tests: No books or reference material of any kind will be permitted.
4 For Open Book exams: Use of books and any printed/written reference material (filed
or bound) is permitted. However, loose sheets of paper will not be allowed. Use of
calculators is permitted in all exams. Laptops/mobiles of any kind are not allowed.
Exchange of any material is not allowed.
5 If a student is unable to appear for the Regular Test/Exam due to genuine exigencies,
the student should follow the procedure to apply for the Make-Up Test/Exam which
will be made available on the Elearn portal. The Make-Up Test/Exam will be
conducted only at selected exam centres on the dates to be announced later.
Plagiarism Policy: