
Final Term Examination Spring-2020

Subject: Artificial Intelligence          EDP / Section Code:
Facilitator: Dr. Aarij Mahmood Hussaan    Max. Marks: 40
Exam Date: 24-01-2021                     Start Time: 09:00 AM
Submission Deadline: 30-01-2021           End Time: 10:00 AM

INSTRUCTIONS:
 This exam is “open-book,” which means you are permitted to use any materials handed out
in class, your own notes from the course, the textbook, and anything on your VLE.
 The exam must be taken completely alone. Showing it or discussing it with anyone is
forbidden.
 You may not consult with any other person regarding the exam. You may not check your
exam answers with any person. You may not discuss any of the materials or concepts in
this class with any other person.
 No discussion of the exam or anything about the exam is allowed.
 The answers must be typed and returned in a Word file; no screenshots or images will be
accepted.

Question 1) Search 10 marks


Consider a grid with ‘n’ rows and ‘m’ columns. The agent will start at a particular cell and its job
will be to move to a goal cell. The agent will have the following actions available to it:

I. Move Up; the cost of this action is +3.
II. Move Down; the cost of this action is -2.
III. Move Right; the cost of this action is -1.5.
IV. Move Left; the cost of this action is +0.5.

The agent wants to find a path between the starting cell and the goal cell at as low a cost as
possible. However, the grid has special cells marked ‘W’ and ‘X’. If the agent moves to a ‘W’
cell, it is teleported to a randomly selected cell adjacent to the goal, at a cost of +7. If the
agent moves to an ‘X’ cell, it is teleported back to the starting cell, at a cost of -7.
For example, take a look at a 7 x 8 grid below:

[Figure: 7 x 8 grid with agent ‘A’ at (0,0), goal ‘G’ at (3,7), two ‘W’ cells (one at (4,2)), and an ‘X’ cell.]

Here the agent is at position (0,0) and the goal is at position (3,7). If the agent moves to the
‘W’ cell at (4,2), it will be transported to (3,6), (2,7), or (4,7).
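
For reference, the grid and its successor function might be encoded as follows. This is a minimal sketch in Python; the function names, the coordinate convention, and the representation of the random ‘W’ teleport as a set of possible destinations are all assumptions, not part of the question.

```python
# Illustrative sketch only; all names and the coordinate convention
# (row 0 at the bottom, "Up" = +1 row) are assumptions.
ACTIONS = {            # action -> ((row_delta, col_delta), step_cost)
    "Up":    ((+1, 0), +3.0),
    "Down":  ((-1, 0), -2.0),
    "Right": ((0, +1), -1.5),
    "Left":  ((0, -1), +0.5),
}

def neighbors(cell, n, m):
    """Cells orthogonally adjacent to `cell` that lie inside the n x m grid."""
    r, c = cell
    return [(r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < n and 0 <= c + dc < m]

def successors(state, grid, start, goal, n, m):
    """Yield (action, possible_next_states, cost) for each legal move.
    A 'W' move lists every cell adjacent to the goal, since the actual
    destination is chosen at random."""
    r, c = state
    for action, ((dr, dc), cost) in ACTIONS.items():
        nr, nc = r + dr, c + dc
        if not (0 <= nr < n and 0 <= nc < m):
            continue                               # move would leave the grid
        if grid[nr][nc] == "W":                    # teleport beside the goal
            yield action, neighbors(goal, n, m), +7.0
        elif grid[nr][nc] == "X":                  # teleport back to the start
            yield action, [start], -7.0
        else:
            yield action, [(nr, nc)], cost
```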

Given the scenario above, answer the following questions:

I. What is the PEAS description of the agent?
II. Describe its task environment.
III. Model this problem as a search problem.
IV. What is the branching factor of this problem?
V. Can we use BFS or DFS on this problem? Explain your answer.
VI. Given the scenario above, write down the path that will be discovered by UCS (a generic
UCS skeleton is sketched after this list).
VII. If we can run BFS on this problem, will BFS and UCS give the same answer? Explain.
VIII. Will Euclidean distance be an admissible heuristic? Will Manhattan distance be an
admissible heuristic? Prove your answer.
IX. Come up with your own admissible heuristic, and solve the problem above using your
heuristic and A* search.
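
For context on question VI, a generic uniform-cost search skeleton is sketched below. It is illustrative, not a solution: textbook UCS assumes non-negative step costs, which is precisely what questions V-VII ask you to reason about for this grid.

```python
import heapq

def uniform_cost_search(start, is_goal, expand):
    """Generic UCS. `expand(state)` yields (next_state, step_cost) pairs.
    The optimality guarantee relies on non-negative step costs; with the
    negative costs in this grid, that guarantee no longer holds."""
    frontier = [(0.0, start, [start])]          # (path_cost, state, path)
    best_cost = {start: 0.0}
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, cost
        for nxt, step in expand(state):
            new_cost = cost + step
            if nxt not in best_cost or new_cost < best_cost[nxt]:
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None, float("inf")
```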
Question 2) Reinforcement Learning 10 marks
Consider the following PacMan environment:

[Figure: PacMan grid environment with food pellets, ghosts, and walls.]
For this environment, design a reinforcement learning agent (PacMan). The objective of the
agent is to figure out the best action it can take in any given state.

The rules of the game are as follows:

 Every move has a reward of -1.
 Consuming a food pellet has a reward of +10.
 If PacMan collides with a ghost, the reward is -500.
 If PacMan eats all the food pellets without colliding with a ghost, the reward is +500.
 Assume a discount factor of 0.8.
 The action noise is 0.3 (the consequences are the same as in the grid-world example).
 The environment is static, i.e. no ghosts are moving.
 The actions for PacMan are Up, Down, Left, and Right.
 PacMan can pass through walls.

Use Q-Learning to figure out the best action at every state. Show your working for every
iteration of Q-Learning.
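
For reference, each iteration of tabular Q-learning applies the update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A minimal sketch follows; gamma = 0.8 is given above, while the learning rate and the epsilon-greedy exploration scheme are assumptions, since the question leaves them open.

```python
import random
from collections import defaultdict

ALPHA = 0.5                     # learning rate: an assumption
GAMMA = 0.8                     # discount factor: given in the question
ACTIONS = ["Up", "Down", "Left", "Right"]

Q = defaultdict(float)          # Q[(state, action)] -> value, initially 0

def q_update(s, a, r, s_next):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def epsilon_greedy(s, eps=0.1):
    """Behaviour policy: random action with probability eps, else greedy."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])
```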
Question 3) MDP 10 marks
The Cliff Walking environment is a gridworld with a discrete state space and a discrete action space. The
agent starts at grid cell S. The agent can move to the four neighboring cells by taking actions Up, Down,
Left, or Right. The Up and Down actions are deterministic, whereas the Left and Right actions are
stochastic, with a probability of 0.7 of being completed and a probability of 0.3 of the agent ending up in
the perpendicular direction. Trying to move out of the boundary results in staying in the same location.
So, for example, trying to move left from a cell in the leftmost column results in no movement at all, and
the agent remains in the same location. The agent receives -1 reward per step in most states, and -100
reward when falling off the cliff. This is an episodic task; termination occurs when the agent reaches
the goal grid cell G. Falling off the cliff resets the agent to the start state, without termination.

[Figure: cliff gridworld. Start cell S is at the bottom-left, goal cell G is at the bottom-right, and ‘The Cliff’ occupies the bottom-row cells between them.]
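
As a concrete reference for parts I and II, the transition model described above could be encoded as follows. This is a sketch only: the 4 x 12 layout of the classic Cliff Walking grid and the even 0.15/0.15 split of the slip probability between the two perpendicular directions are assumptions.

```python
# Sketch of the Cliff Walking transition model; the 4 x 12 layout is the
# classic one and is an assumption here, as is the 0.15/0.15 slip split.
N_ROWS, N_COLS = 4, 12
MOVES = {"Up": (1, 0), "Down": (-1, 0), "Left": (0, -1), "Right": (0, 1)}
PERP = {"Left": ("Up", "Down"), "Right": ("Up", "Down")}

def step_to(state, move):
    """Apply a move; bumping into the boundary leaves the agent in place."""
    r, c = state
    dr, dc = MOVES[move]
    nr, nc = r + dr, c + dc
    if 0 <= nr < N_ROWS and 0 <= nc < N_COLS:
        return (nr, nc)
    return state

def transitions(state, action):
    """Return [(probability, next_state), ...] for one action."""
    if action in ("Up", "Down"):                # deterministic actions
        return [(1.0, step_to(state, action))]
    # Left/Right succeed with prob. 0.7; with prob. 0.3 the agent slips
    # perpendicular (assumed 0.15 each way).
    return ([(0.7, step_to(state, action))]
            + [(0.15, step_to(state, p)) for p in PERP[action]])
```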

For the problem described above, answer the following questions:

I. Formulate the problem as an MDP.
II. Use policy iteration to find the optimal policy.
III. Suppose that you are not given the transition or reward function, and that you observe
the following (state, action, reward, state’) tuples:
Episode 1:
( (0,0), Up, -1, (1,0) )
( (0,1), Down, -1, (0,0) )
( (0,0), Right, -1, (0,0) )
( (0,0), Left, -1, (0,0) )
( (0,0), Up, -1, (1,0) )
( (0,1), Right, -1, (1,1) )
( (1,1), Right, -1, (1,2) )
( (1,2), Right, -1, (1,3) )
( (1,3), Right, -1, (1,4) )
( (1,4), Down, -1, (0,4) )

Calculate the TD estimates of all the states in Episode 1.
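
For reference, the TD(0) update is V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)), applied once per observed tuple. A minimal sketch that replays Episode 1 is given below; the values of alpha and gamma are assumptions here, since the question does not fix them.

```python
from collections import defaultdict

# TD(0) over the Episode 1 tuples; alpha and gamma are assumptions.
ALPHA, GAMMA = 0.5, 1.0
episode = [
    ((0, 0), "Up",    -1, (1, 0)),
    ((0, 1), "Down",  -1, (0, 0)),
    ((0, 0), "Right", -1, (0, 0)),
    ((0, 0), "Left",  -1, (0, 0)),
    ((0, 0), "Up",    -1, (1, 0)),
    ((0, 1), "Right", -1, (1, 1)),
    ((1, 1), "Right", -1, (1, 2)),
    ((1, 2), "Right", -1, (1, 3)),
    ((1, 3), "Right", -1, (1, 4)),
    ((1, 4), "Down",  -1, (0, 4)),
]

V = defaultdict(float)          # state-value estimates, initially 0
for s, _action, r, s_next in episode:
    # TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
    V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
print(dict(V))
```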


IV. Use the MDP code given to you in your LAB and implement this scenario.
Question 4) Games 10 marks
Consider the game of Monopoly (https://en.wikipedia.org/wiki/Monopoly). For Monopoly, answer the
following questions:

I. Model the Monopoly game as an adversarial search problem.
II. Would you use minimax or expectimax search to solve Monopoly? Explain your reasoning.
III. What will be the complexity of adversarial search for Monopoly?
IV. Draw the search graph of expectimax search for one turn of each player, assuming there are
four players, one of whom is controlled by you (a generic expectimax recursion is sketched
after this list).
V. Will it be feasible to do a vanilla search to solve Monopoly? Why or why not?
VI. Would you design an optimistic agent or a pessimistic agent for Monopoly? Explain your
reasoning.
VII. How would you speed up the decision-making process of your algorithm? Explain the
improvements you would make and why they would be helpful.
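
For question IV, a generic expectimax recursion alternates max nodes (your move) with chance nodes (dice rolls). The sketch below is illustrative only; all of the callbacks (`is_max_node`, `is_terminal`, `evaluate`, `successors`, `roll_outcomes`) are hypothetical game-specific hooks, not a Monopoly API.

```python
def expectimax(state, depth, is_max_node, is_terminal, evaluate,
               successors, roll_outcomes):
    """Generic expectimax over one max player and chance (dice) nodes.
    All callbacks are hypothetical game-specific hooks."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if is_max_node(state):
        # Max node: pick the move with the best expected value.
        return max(expectimax(s, depth - 1, is_max_node, is_terminal,
                              evaluate, successors, roll_outcomes)
                   for s in successors(state))
    # Chance node: expectation over dice outcomes, weighted by probability.
    return sum(p * expectimax(s, depth - 1, is_max_node, is_terminal,
                              evaluate, successors, roll_outcomes)
               for p, s in roll_outcomes(state))
```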
