
CPSC-57200 Artificial Intelligence 2
Homework #4
Introduction
For this assignment, you will use the Python programming language to simulate an agent in a
grid environment, based on a policy found by the policy iteration algorithm.
Environment Description (based on the AIMA textbook, see p. 563)
The environment consists of a grid with obstacles placed randomly within it and two terminating
states at predefined locations: a positive one with a reward of +1 and a negative one with a reward
of -1. The agent can move up, down, left, or right. It moves in the intended direction with 80%
probability and at right angles to the intended direction in the remaining 20% of cases (10% to
each side). If a movement would result in a collision with a wall (an obstacle or the grid
boundary), no movement occurs.
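A minimal sketch of this transition model in Python. It assumes states are (x, y) tuples, with x as the column and y as the row (origin at the bottom left, as in AIMA), and that obstacles are stored in a set. The names ACTIONS, SIDES, move, transitions and sample_next are illustrative only and are not part of the provided base code.

```python
import random

# (dx, dy) per action; "up" increases y, following AIMA's (x, y) convention (assumption).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# The two directions at right angles to each action.
SIDES = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}


def move(state, action, x_dim, y_dim, obstacles):
    """Deterministic move; hitting the boundary or an obstacle means no movement."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    inside = 0 <= nxt[0] < x_dim and 0 <= nxt[1] < y_dim
    return nxt if inside and nxt not in obstacles else state


def transitions(state, action, x_dim, y_dim, obstacles):
    """(probability, next_state) pairs: 0.8 intended, 0.1 for each right angle."""
    side_a, side_b = SIDES[action]
    return [(0.8, move(state, action, x_dim, y_dim, obstacles)),
            (0.1, move(state, side_a, x_dim, y_dim, obstacles)),
            (0.1, move(state, side_b, x_dim, y_dim, obstacles))]


def sample_next(state, action, x_dim, y_dim, obstacles, rng=random):
    """Draw one successor state from the distribution returned by transitions()."""
    r, cumulative = rng.random(), 0.0
    for p, nxt in transitions(state, action, x_dim, y_dim, obstacles):
        cumulative += p
        if r < cumulative:
            return nxt
    return nxt  # only reached through floating-point round-off
```

Here is an example on a 3x3 grid with an obstacle at (0, 1). The call `transitions((0, 0), "up", 3, 3, {(0, 1)})` returns `[(0.8, (0, 0)), (0.1, (0, 0)), (0.1, (1, 0))]`. The intended move hits the obstacle and the leftward slip hits the boundary, so only the rightward slip changes the state.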
Requirements
1) Use the provided base code to implement a simulator for this environment such that its specific
geography is easily altered. In particular, you need to create two functions:
def getMdpEnv(x_dim, y_dim, pos_terminal, neg_terminal, block_prob):
    """Generates a GridMDP of dimension x_dim by y_dim, with random
    obstacles placed with uniform probability block_prob and with two terminating
    states with rewards +1/-1 for the pos/neg states, respectively. The terminating
    state locations are given by the tuples pos_terminal and neg_terminal."""
    ...

def simulate_agent(env, pi, U):
    """Simulate the agent with the found policy pi on environment env from
    each possible starting state for 1000 runs, keeping track of rewards.
    Afterwards, display the average reward received from each starting state. If
    the agent has not reached a terminal state after 100 moves, end the run."""
    ...
2) Run the policy iteration algorithm on random square environments with side lengths 2, 4, 8,
16, 32, 64, and 128. Run the algorithm 10 times for each size and compute the average execution
time. Repeat for each value of block_prob (0, 0.25, and 0.5). Include the results in your report and
answer the question: how does the run time of policy iteration vary with the size of the environment?
3) Simulate an agent that uses policy iteration on a random 5x5 grid environment with positive
and negative terminating states at (4,2) and (4,3), respectively. Measure its performance in the
environment simulator from all possible starting states (run the simulate_agent function).
Compare the average total reward received per run with the utility of each state, as determined by
your algorithm. Make sure to visualize the grid itself, the utilities determined by policy iteration,
and the computed average total rewards received per run (use a seaborn heatmap in Python), as in
the image below:
4) Write a report detailing your implementation and execution. Visualize and discuss the results.
Attach your code along with the PDF of the report.
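For requirement 1, the two functions could look roughly like the sketch below. It does not use the provided base code or aima-python's GridMDP class. Instead, the environment is a plain dictionary, and several details are assumptions:

- the per-step reward of -0.04 for non-terminal states (AIMA's usual value),
- the seed argument,
- the runs/max_moves keyword defaults,
- the helper _step.

simulate_agent prints and returns the undiscounted average total reward per run from each start state.

```python
import random

# (dx, dy) per action; "up" increases y (AIMA-style coordinates, an assumption).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SIDES = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}


def getMdpEnv(x_dim, y_dim, pos_terminal, neg_terminal, block_prob,
              step_reward=-0.04, seed=None):
    """Plain-dict stand-in for a GridMDP (not aima-python's class): each
    non-terminal cell becomes an obstacle with probability block_prob."""
    rng = random.Random(seed)
    terminals = {pos_terminal: 1.0, neg_terminal: -1.0}
    obstacles = {(x, y) for x in range(x_dim) for y in range(y_dim)
                 if (x, y) not in terminals and rng.random() < block_prob}
    states = [(x, y) for x in range(x_dim) for y in range(y_dim)
              if (x, y) not in obstacles]
    return {"x_dim": x_dim, "y_dim": y_dim, "obstacles": obstacles,
            "terminals": terminals, "states": states, "step_reward": step_reward}


def _step(env, state, action, rng):
    """Hypothetical helper: intended direction w.p. 0.8, each right angle w.p. 0.1;
    a move into a wall or obstacle leaves the agent where it is."""
    r = rng.random()
    if r >= 0.8:
        action = SIDES[action][0] if r < 0.9 else SIDES[action][1]
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    inside = 0 <= nxt[0] < env["x_dim"] and 0 <= nxt[1] < env["y_dim"]
    return nxt if inside and nxt not in env["obstacles"] else state


def simulate_agent(env, pi, U, runs=1000, max_moves=100, seed=None):
    """Average total (undiscounted) reward per run from every start state,
    printed next to U[s]; a run is cut off after max_moves moves."""
    rng = random.Random(seed)

    def reward(s):
        return env["terminals"].get(s, env["step_reward"])

    averages = {}
    for start in env["states"]:
        total = 0.0
        for _ in range(runs):
            state, run_reward = start, reward(start)  # count the start state's reward
            for _ in range(max_moves):
                if state in env["terminals"]:
                    break
                state = _step(env, state, pi[state], rng)
                run_reward += reward(state)
            total += run_reward
        averages[start] = total / runs
        print(f"{start}: average reward {averages[start]:+.3f}, U = {U[start]:+.3f}")
    return averages
```

Because the obstacle layout depends only on block_prob and the seed, changing the geography means calling getMdpEnv again with different arguments.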
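For requirement 2, a timing harness might be sketched as follows. The policy_iteration function here is a modified policy iteration in the style of AIMA (k = 20 in-place evaluation sweeps per improvement step, gamma = 0.9), written from scratch rather than taken from the base code. The following choices are assumptions made for illustration:

- the terminal placement in make_grid (positive at the top-right corner, negative just below it),
- the step reward of -0.04,
- the strict-improvement test.

```python
import random
import time

# (dx, dy) per action; "up" increases y (AIMA-style (x, y) coordinates, an assumption).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SIDES = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}


def make_grid(n, block_prob, rng, step_reward=-0.04):
    """Random n x n grid. Terminal placement (+1 top-right, -1 just below) is hypothetical."""
    terminals = {(n - 1, n - 1): 1.0, (n - 1, n - 2): -1.0}
    obstacles = {(x, y) for x in range(n) for y in range(n)
                 if (x, y) not in terminals and rng.random() < block_prob}
    states = [(x, y) for x in range(n) for y in range(n) if (x, y) not in obstacles]
    return {"n": n, "obstacles": obstacles, "terminals": terminals,
            "states": states, "step_reward": step_reward}


def move(env, s, a):
    """Deterministic move; walls and obstacles leave the agent in place."""
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    inside = 0 <= nxt[0] < env["n"] and 0 <= nxt[1] < env["n"]
    return nxt if inside and nxt not in env["obstacles"] else s


def expected_utility(env, s, a, U):
    """Sum of P(s' | s, a) * U[s'] with 0.8 intended / 0.1 per right angle."""
    side_a, side_b = SIDES[a]
    return (0.8 * U[move(env, s, a)] + 0.1 * U[move(env, s, side_a)]
            + 0.1 * U[move(env, s, side_b)])


def policy_iteration(env, gamma=0.9, k=20):
    """Modified policy iteration: k in-place evaluation sweeps, then greedy improvement."""
    def reward(s):
        return env["terminals"].get(s, env["step_reward"])

    U = {s: 0.0 for s in env["states"]}
    pi = {s: "up" for s in env["states"] if s not in env["terminals"]}
    while True:
        for _ in range(k):  # approximate policy evaluation
            for s in env["states"]:
                if s in env["terminals"]:
                    U[s] = reward(s)
                else:
                    U[s] = reward(s) + gamma * expected_utility(env, s, pi[s], U)
        unchanged = True
        for s in pi:  # improvement: switch only on a strict gain, so ties cannot flip-flop
            best = max(ACTIONS, key=lambda a: expected_utility(env, s, a, U))
            if expected_utility(env, s, best, U) > expected_utility(env, s, pi[s], U) + 1e-12:
                pi[s] = best
                unchanged = False
        if unchanged:
            return pi, U


def timing_table(sizes, block_probs, repeats=10, seed=0):
    """Average policy-iteration time in seconds for each (block_prob, side length)."""
    rng = random.Random(seed)
    table = {}
    for block_prob in block_probs:
        for n in sizes:
            elapsed = 0.0
            for _ in range(repeats):
                env = make_grid(n, block_prob, rng)
                start = time.perf_counter()
                policy_iteration(env)
                elapsed += time.perf_counter() - start
            table[(block_prob, n)] = elapsed / repeats
    return table
```

To produce the full table, call `timing_table((2, 4, 8, 16, 32, 64, 128), (0, 0.25, 0.5))`. Every evaluation sweep touches all n² free states, so expect the larger sizes to take considerably longer in pure Python. Switching only on a strict gain is a common safeguard that keeps tied actions from alternating forever. It is a design choice here, not something the handout requires.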
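For requirement 3, the heatmaps need each per-state dictionary laid out as a 2-D grid. The helpers below are a sketch under these assumptions:

- utilities and simulated averages are dicts keyed by (x, y) states,
- row 0 of each matrix is the top of the grid,
- the grid picture uses an illustrative encoding (+1/-1 for the terminals, 0 for free cells, NaN for obstacles).

plot_heatmaps draws the three panels with seaborn.heatmap on matplotlib subplots. If policy iteration discounts with gamma < 1 while the simulated totals are undiscounted, the two maps are not expected to match exactly, so mean_abs_gap is only a rough summary of the comparison.

```python
import math


def to_matrix(values, x_dim, y_dim):
    """Lay out {(x, y): value} as rows for a heatmap; states absent from
    `values` (obstacles) become NaN. Row 0 is the top of the grid (largest y)."""
    return [[values.get((x, y), math.nan) for x in range(x_dim)]
            for y in reversed(range(y_dim))]


def grid_layout(x_dim, y_dim, obstacles, pos_terminal, neg_terminal):
    """Illustrative encoding of the grid itself: +1 / -1 for the terminals,
    0 for free cells, NaN for obstacles."""
    cells = {(x, y): 0.0 for x in range(x_dim) for y in range(y_dim)
             if (x, y) not in obstacles}
    cells[pos_terminal] = 1.0
    cells[neg_terminal] = -1.0
    return to_matrix(cells, x_dim, y_dim)


def mean_abs_gap(U, averages):
    """Mean |simulated average - utility| over states present in both dicts."""
    common = [s for s in U if s in averages]
    return sum(abs(averages[s] - U[s]) for s in common) / len(common)


def plot_heatmaps(grid, utilities, averages):
    """Three side-by-side seaborn heatmaps. Imports are local so the helpers
    above work even where seaborn/matplotlib are not installed."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    panels = [("Grid", grid),
              ("Utilities (policy iteration)", utilities),
              ("Average total reward per run (simulated)", averages)]
    for ax, (title, data) in zip(axes, panels):
        sns.heatmap(data, annot=True, fmt=".2f", cmap="coolwarm", center=0, ax=ax)
        ax.set_title(title)
    fig.tight_layout()
    plt.show()
```

Take the 5x5 grid with the positive terminal at (4, 2) and the negative one at (4, 3). Here `grid_layout(5, 5, obstacles, (4, 2), (4, 3))` puts the +1 cell in row 2 and the -1 cell in row 1 of the last column, because row 0 holds y = 4. Pass `to_matrix(U, 5, 5)` and `to_matrix(averages, 5, 5)` as the other two panels.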
Answered 2 days after Apr 18, 2022

Solution

Sandeep Kumar answered on Apr 20, 2022
The implementation is provided in the code. As for the results:
The graph above shows the simulations of an agent with three different block probabilities; the
configuration with the highest block probability, 0.5, has the highest execution time, while the others are...