CPSC-57200 Artificial Intelligence 2
Homework #4
Introduction
For this assignment, you will use the Python programming language to simulate an agent in a
grid environment, based on a policy found by the policy iteration algorithm.
Environment Description (based on AIMA textbook - see pg. 563)
The environment consists of a grid with obstacles placed randomly within it and two terminating
states at predefined locations: one positive with a reward of +1, and one negative with a reward
of -1. The agent can move up, down, left, or right. The agent moves in the intended direction
with 80% probability; in the remaining 20% of cases it moves at right angles to the intended
direction (10% for each perpendicular direction). If the resulting movement would cause a
collision with a wall (an obstacle or the grid boundary), then no movement occurs.
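The transition model above can be sketched as follows (the function and parameter names here are illustrative, not part of the provided base code):

```python
# Sketch of the stochastic transition model: 80% intended direction,
# 10% for each perpendicular direction; a move into an obstacle or
# off the grid leaves the agent in place.

UP, DOWN, LEFT, RIGHT = (0, 1), (0, -1), (-1, 0), (1, 0)

def turn_left(a):  return (-a[1], a[0])   # counterclockwise quarter turn
def turn_right(a): return (a[1], -a[0])   # clockwise quarter turn

def transitions(state, action, x_dim, y_dim, obstacles):
    """Return [(probability, next_state), ...] for taking action in state."""
    def move(s, a):
        nx, ny = s[0] + a[0], s[1] + a[1]
        blocked = not (0 <= nx < x_dim and 0 <= ny < y_dim) or (nx, ny) in obstacles
        return s if blocked else (nx, ny)   # collisions leave the agent in place
    return [(0.8, move(state, action)),
            (0.1, move(state, turn_left(action))),
            (0.1, move(state, turn_right(action)))]
```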
Requirements
1) Use the provided base code to implement an environment simulator for this environment, such
that the specific geography of the environment is easily altered. In particular, you need to create
two functions:
def getMdpEnv(x_dim, y_dim, pos_terminal, neg_terminal, block_prob):
""" Generates a GridMDP of given dimension x_dim by y_dim, with random
obstacles placed with uniform probability block_prob and with two terminating
states with rewards +1/-1 for pos/neg states,respectively. The terminating
state locations are given by tuples pos_terminal and neg_terminal."""
def simulate_agent(env, pi, U):
""" Simulate the agent with the found policy pi on environment env from
each possible starting state for 1000 iterations. Keep track of rewards.
Afterwards, display the average reward received from each starting state. If
agent doesn't reach terminal state after 100 moves, then end."""
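As one possible starting point for getMdpEnv, assuming the base code follows the AIMA convention of describing a grid as a reward matrix with None marking obstacles, the grid-construction step might look like this (a sketch only; the real function should wrap the resulting grid in the base code's GridMDP class):

```python
import random

def get_mdp_grid(x_dim, y_dim, pos_terminal, neg_terminal, block_prob):
    """Build a reward grid: None marks an obstacle, -0.04 is the living
    reward (AIMA's default), and +1/-1 mark the terminal states. Cells are
    indexed as grid[y][x]. Illustrative sketch, not the provided base code."""
    grid = [[None if random.random() < block_prob else -0.04
             for _ in range(x_dim)]
            for _ in range(y_dim)]
    # Terminal cells are never blocked; they overwrite any obstacle.
    for (x, y), reward in ((pos_terminal, +1), (neg_terminal, -1)):
        grid[y][x] = reward
    return grid
```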
2) Run the policy iteration algorithm on random square environments with side lengths of 2, 4,
8, 16, 32, 64, and 128. Run the algorithm 10 times for each size and compute the average
execution time. Repeat for block_prob values of 0, 0.25, and 0.5. Include the results in your
report and answer the question: how does the run time of policy iteration vary with the size of
the environment?
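A timing harness for this experiment could be sketched as below; time_runs is a hypothetical helper, and the commented loop assumes getMdpEnv and a policy_iteration function from your implementation or the base code:

```python
import time
from statistics import mean

def time_runs(fn, n_runs=10):
    """Average wall-clock time of fn() over n_runs calls."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return mean(times)

# Sketch of the experiment loop (getMdpEnv and policy_iteration come from
# your implementation / the base code, so it is left commented out here):
# for block_prob in (0, 0.25, 0.5):
#     for side in (2, 4, 8, 16, 32, 64, 128):
#         env = getMdpEnv(side, side, (side - 1, side - 1),
#                         (side - 1, side - 2), block_prob)
#         print(side, block_prob, time_runs(lambda: policy_iteration(env)))
```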
3) Simulate an agent that uses policy iteration on a random 5x5 grid environment with the
positive and negative terminating states at (4,2) and (4,3), respectively. Measure its
performance in the environment simulator from all possible starting states (run the
simulate_agent function). Compare the average total reward received per run with the utility of
the state, as determined by your algorithm. Make sure to visualize the grid itself, the utilities
determined by policy iteration, and the computed average total rewards received per run (use a
seaborn heatmap in Python), like the image below:
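One way to produce such a heatmap with seaborn is sketched below; the utility array here is random placeholder data and the filename is arbitrary, so substitute your computed utilities and average rewards:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder data; replace with the utilities from policy iteration
# (and, in a second figure, the average total rewards per run).
utilities = np.random.rand(5, 5)

ax = sns.heatmap(utilities, annot=True, fmt=".2f", cmap="viridis",
                 cbar_kws={"label": "utility"})
ax.invert_yaxis()                # put cell (0, 0) in the bottom-left corner
plt.title("Utilities from policy iteration")
plt.savefig("utilities_heatmap.png", bbox_inches="tight")
```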
4) Write a report detailing your implementation and execution. Visualize and discuss the results.
Attach your code along with the PDF of the report.