Figure: update equation with annotated terms (immediate reward, discount, learning rate during the updating, delayed reward).
the maximum future reward coming to the agent if it takes action a in state s.
However, this value is discounted by γ to account for the fact that
it is not ideal for the agent to wait forever for a future reward;
it is best for the agent to aim for the maximum reward in the
shortest possible time.
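
To make the effect of the discount concrete, here is a minimal Python sketch. It assumes the standard tabular Q-learning update, which the annotated terms (immediate reward, discount, learning rate, delayed reward) appear to describe. The function names, the dictionary layout of `q`, and all numeric values are illustrative assumptions, not taken from the original.

```python
def discounted_value(reward, gamma, delay):
    """Present value of a reward received `delay` steps in the future."""
    return (gamma ** delay) * reward


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-learning update (assumed standard form).

    q maps each state to a list of action values (hypothetical layout).
    """
    best_next = max(q[next_state])        # maximum future reward from the next state
    target = reward + gamma * best_next   # immediate reward + discounted delayed reward
    q[state][action] += alpha * (target - q[state][action])  # alpha is the learning rate
    return q[state][action]


# The same reward is worth less the longer the agent has to wait for it.
soon = discounted_value(10.0, gamma=0.9, delay=1)
late = discounted_value(10.0, gamma=0.9, delay=5)

# Illustrative two-state, two-action table.
q = {0: [0.0, 0.0], 1: [1.0, 2.0]}
new_value = q_update(q, state=0, action=1, reward=1.0,
                     next_state=1, alpha=0.5, gamma=0.9)
```

With γ = 0.9, a reward of 10 that arrives after one step is valued higher than the same reward after five steps, which is why the agent prefers to reach the maximum reward quickly. A γ closer to 1 makes the agent more patient, while a γ closer to 0 makes it favor immediate reward.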