Question 1) Search 10 Marks: Final Term Examination Spring-2020
INSTRUCTIONS:
This exam is “open book,” which means you are permitted to use any materials handed out
in class, your own notes from the course, the textbook, and anything on your VLE.
The exam must be taken completely alone; showing it or discussing it with anyone is
forbidden. You may not consult with any other person regarding the exam, check your
exam answers with any person, or discuss any of the materials or concepts in this class
with any other person.
The answers must be typed and returned in a Word file; no screenshots or images will be
accepted.
The agent wants to find a path between the starting cell and the goal cell at as low a cost as
possible. However, the grid has special cells marked ‘W’ and ‘X’. If your agent moves to a
‘W’ cell, the agent is teleported to a cell (randomly selected) next to the goal state, at a
cost of +7. If your agent moves to an ‘X’ cell, the agent is teleported back to the starting
cell, at a cost of -7.
For example, take a look at the 7 x 8 grid below:

[Grid figure: the agent ‘A’ is at (0,0) and the goal ‘G’ is at (3,7); the grid contains two ‘W’ cells (one of them at (4,2)) and one ‘X’ cell in the goal’s row.]
Here the agent is at position (0,0), and the goal is at position (3,7). If the agent moves to
the ‘W’ cell at (4,2), then the agent will be teleported to either (3,6), (2,7), or (4,7).
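For concreteness, here is a minimal Python sketch of these grid dynamics (an illustration, not part of the exam). It assumes an ordinary move costs 1 (the exam does not state the normal step cost), treats the stated costs as negated rewards (entering ‘W’ yields reward -7, entering ‘X’ yields reward +7), and places only the ‘W’ cell whose coordinates are given; all names are illustrative.

import random

ROWS, COLS = 7, 8
START, GOAL = (0, 0), (3, 7)
W_CELLS = {(4, 2)}   # only this 'W' cell's coordinates are given in the exam
X_CELLS = set()      # the 'X' cell's coordinates are not given

def neighbors(cell):
    """Cells 4-adjacent to `cell` that lie inside the grid."""
    r, c = cell
    return [(r + dr, c + dc)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= r + dr < ROWS and 0 <= c + dc < COLS]

def step(state, action):
    """Apply `action` (a (dr, dc) pair), then the W/X teleport rules."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    if not (0 <= nr < ROWS and 0 <= nc < COLS):
        nr, nc = r, c                      # bumping the boundary: stay put
    if (nr, nc) in W_CELLS:
        # Teleport next to the goal; with GOAL = (3, 7) this picks
        # uniformly among (3, 6), (2, 7), and (4, 7), matching the example.
        return random.choice(neighbors(GOAL)), -7, False
    if (nr, nc) in X_CELLS:
        return START, +7, False            # cost -7, i.e. reward +7
    return (nr, nc), -1, (nr, nc) == GOAL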
For this environment, design a Reinforcement Learning agent (Pacman); the objective of the
agent is to figure out the best action it can take at any given state.
Use Q-Learning to figure out the best action at every state, showing your working for every
iteration of Q-Learning.
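Continuing the sketch above (reusing its step function and START), a hedged outline of tabular Q-Learning follows; the learning rate, discount factor, exploration rate, and episode count are assumed values, not fixed by the exam.

import random
from collections import defaultdict

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # Up, Down, Left, Right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1          # assumed hyperparameters

Q = defaultdict(float)                         # Q[(state, action)], init 0

def greedy(state):
    """Best known action at `state` under the current Q-table."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(500):
    state, done = START, False
    while not done:
        # Epsilon-greedy exploration.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = greedy(state)
        next_state, reward, done = step(state, action)
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward if done else reward + GAMMA * max(
            Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state

After training, the greedy action argmax_a Q(s, a) at each state is the answer the question asks for; a hand-worked solution applies the same one-step update rule for each iteration.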
Question 3) MDP 10 marks
The Cliff Walking environment is a gridworld with a discrete state space and discrete action space. The
agent starts at grid cell S. The agent can move to the four neighboring cells by taking actions Up, Down,
Left, or Right. The Up and Down actions are deterministic, whereas the Left and Right actions are
stochastic, with a probability of 0.7 of being completed and a probability of 0.3 of the agent ending up
moving in a perpendicular direction. Trying to move out of the boundary results in staying in the same
location; for example, trying to move left from a cell in the leftmost column results in no movement at
all and the agent remains where it is. The agent receives -1 reward per step in most states, and -100
reward when falling off the cliff. This is an episodic task; termination occurs when the agent reaches
the goal grid cell G. Falling off the cliff resets the agent to the start state without terminating the episode.
[Grid figure: a gridworld whose bottom row runs from the start cell S, across ‘The Cliff’, to the goal cell G.]
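A minimal Python sketch of these transition dynamics follows, under two assumptions the question leaves open: the grid is the standard 4 x 12 Cliff Walking layout, and a slipped Left/Right action moves the agent Up or Down with equal probability.

import random

ROWS, COLS = 4, 12                      # assumed standard layout
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, c) for c in range(1, 11)}  # bottom row between S and G
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}

def step(state, action):
    """One transition: stochastic Left/Right, walls, cliff reset, goal."""
    if action in ("Left", "Right") and random.random() < 0.3:
        # Slip: the agent ends up moving in a perpendicular direction.
        action = random.choice(["Up", "Down"])
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS):
        r, c = state                    # off the boundary: no movement
    if (r, c) in CLIFF:
        return START, -100, False       # fall: reset to start, no termination
    return (r, c), -1, (r, c) == GOAL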