SARSA Reinforcement Learning Algorithm
WHAT IS SARSA?
SARSA is an on-policy algorithm used in reinforcement learning
to train a Markov decision process model while following the
current policy. At each step, the agent in the current state (S)
takes an action (A), receives a reward (R) and lands in the next
state (S1), where it takes the next action (A1). In other words,
each step of experience is the tuple (S, A, R, S1, A1).
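To make the tuple concrete, here is a minimal Python sketch; the Transition type and the example values are purely illustrative, not from any particular library.

from collections import namedtuple

# One step of experience: the quintuple that gives SARSA its name.
Transition = namedtuple("Transition", ["S", "A", "R", "S1", "A1"])

step = Transition(S=3, A=1, R=-1.0, S1=4, A1=0)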
SARSA Algorithm
The SARSA algorithm differs slightly from Q-learning. In SARSA,
the Q-value is updated using the action, A1, that is actually
performed in the next state, S1. In Q-learning, it is the action
with the highest Q-value in the next state, S1, that is used to
update the Q-table.
A video tutorial on how SARSA works in machine learning. | Video: Pankaj Porwal.
From here, an action (A) is selected in the current state (S)
using the current policy (typically an epsilon-greedy choice over
the Q-table) and taken, and the reward (R) and next state (S1)
are observed. Q(S, A) is then updated, and the next action (A1)
is selected based on the updated Q-values. In this way, the
action-value estimate Q(S, A) is refined for every state-action
pair the agent visits; it estimates the expected reward for
taking a given action in a given state.
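This loop can be sketched in a few lines of Python. The version below is a minimal tabular sketch that assumes a Gymnasium-style environment with discrete observation and action spaces (for example, FrozenLake-v1); the hyperparameter values are illustrative defaults, not tuned settings.

import numpy as np

def epsilon_greedy(Q, state, n_actions, epsilon):
    # Behavior policy: explore with probability epsilon, otherwise
    # take the action with the highest current Q-value.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def sarsa(env, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular SARSA for discrete states and actions.
    n_states, n_actions = env.observation_space.n, env.action_space.n
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, _ = env.reset()
        action = epsilon_greedy(Q, state, n_actions, epsilon)
        done = False
        while not done:
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Select A1 in S1 with the same policy: this is what
            # makes SARSA on-policy.
            next_action = epsilon_greedy(Q, next_state, n_actions, epsilon)
            # TD target uses Q(S1, A1); the future term is dropped
            # once the episode has ended.
            target = reward + gamma * Q[next_state, next_action] * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state, action = next_state, next_action
    return Q

Note that the next action is chosen with the same epsilon-greedy policy the agent will actually follow, which is exactly the on-policy property discussed next.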
1. On policy: The learning agent learns the value function according
to the current action, derived from the policy it is currently using.
2. Off policy: The learning agent learns the value function according
to an action derived from another policy.
Q-learning is an off-policy technique that uses the greedy
approach (the maximum Q-value in the next state) to learn the
Q-value. SARSA, on the other hand, is an on-policy technique that
uses the action actually performed by the current policy to learn
the Q-value.
This difference is visible in the update statements for each
technique:

1. Q-learning: $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$
2. SARSA: $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma Q(s', a') - Q(s, a) \right]$
Here, the update equation for SARSA depends on the current state,
current action, reward obtained, next state and next action. This
observation led to the naming of the technique: SARSA stands for
State-Action-Reward-State-Action, which symbolizes the tuple
(s, a, r, s', a').
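The same contrast can be written as a pair of one-line updates in Python. In this sketch, Q is a NumPy table indexed by state and action, and s, a, r, s1, a1 hold one observed transition; all names are illustrative.

# SARSA: the target uses the action a1 the policy actually took in s1.
Q[s, a] += alpha * (r + gamma * Q[s1, a1] - Q[s, a])

# Q-learning: the target uses the greedy (maximum) Q-value in s1,
# regardless of which action the behavior policy takes next.
Q[s, a] += alpha * (r + gamma * np.max(Q[s1]) - Q[s, a])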