Immediate Reward Discount


These labels refer to the parts of the Q-learning update rule: the immediate reward r, the discount factor γ, the learning rate α used during the updating, and the delayed reward – the maximum future reward coming to the agent if it takes action a in state s. The delayed reward is discounted by γ to take into account that it isn't ideal for the agent to wait forever for a future reward – it is best for the agent to aim for the maximum reward in the least period of time.
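The update described above can be sketched in a few lines of tabular Q-learning. The state and action counts, variable names, and hyperparameter values here are illustrative assumptions, not taken from the source text:

```python
import numpy as np

# Illustrative sizes (assumed, not from the source text).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))

alpha = 0.1   # learning rate used during the update (assumed value)
gamma = 0.9   # discount factor applied to the future reward (assumed value)

def q_update(s, a, r, s_next):
    """Move Q(s, a) toward the immediate reward r plus the
    discounted maximum future reward from the next state."""
    target = r + gamma * np.max(Q[s_next])   # delayed reward, discounted by gamma
    Q[s, a] += alpha * (target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)
```

With an all-zero table, a single update moves Q(0, 1) a fraction alpha of the way toward the immediate reward, showing how the learning rate controls the step size.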

A neural network is created which takes the state s as its input, and the network is trained to output appropriate Q(s,a) values for each action in state s.

The action a of the agent can then be chosen by taking the action with the greatest Q(s,a) value.
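A minimal sketch of this idea, using a single linear layer in NumPy as a stand-in for the network (the architecture, dimensions, and weights here are assumptions for illustration, not the source's model):

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions = 4, 3          # assumed sizes
W = rng.normal(size=(n_actions, state_dim))  # assumed (untrained) weights
b = np.zeros(n_actions)

def q_values(state):
    # The "network": state in, one Q(s, a) value per action out.
    return W @ state + b

def greedy_action(state):
    # Choose the action with the greatest Q(s, a) value.
    return int(np.argmax(q_values(state)))

a = greedy_action(np.ones(state_dim))
```

In practice the linear map would be replaced by a trained deep network, but the action-selection step – an argmax over the per-action outputs – is the same.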
