
SARSA Reinforcement Learning Algorithm: A Guide


State-action-reward-state-action (SARSA) is an on-policy reinforcement learning
algorithm used to teach a new Markov decision process policy. Learn how it works
and how to code it.

State-action-reward-state-action (SARSA) is an on-policy algorithm designed to teach a machine learning model a new Markov decision process policy in order to solve reinforcement learning challenges. It’s an algorithm where, in the current state (S), an action (A) is taken, the agent gets a reward (R), ends up in the next state (S1), and takes action (A1) in S1. The tuple (S, A, R, S1, A1) spells out the acronym SARSA.

It’s called an on-policy algorithm because it updates its estimates based on the actions actually taken under the current policy.

WHAT IS SARSA?
SARSA is an on-policy algorithm used in reinforcement learning to train a Markov decision process model on a new policy. It’s an algorithm where, in the current state (S), an action (A) is taken, the agent gets a reward (R), ends up in the next state (S1), and takes action (A1) in S1; in other words, the tuple (S, A, R, S1, A1).

SARSA Algorithm
The SARSA algorithm differs slightly from Q-learning. In SARSA, the Q-value is updated taking into account the action, A1, actually performed in the next state, S1. In Q-learning, the action with the highest Q-value in the next state, S1, is used to update the Q-table instead.
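
As a rough sketch (not taken from the original article), the two update rules can be written side by side in Python. The Q-table sizes, learning rate (alpha) and discount factor (gamma) below are illustrative assumptions:

import numpy as np

n_states, n_actions = 16, 4          # example sizes (assumed, not from the article)
alpha, gamma = 0.1, 0.99             # learning rate and discount factor (assumed)
Q = np.zeros((n_states, n_actions))  # Q-table: one row per state, one column per action

def sarsa_update(s, a, r, s1, a1):
    # SARSA: the target uses the action A1 actually taken in the next state S1.
    Q[s, a] += alpha * (r + gamma * Q[s1, a1] - Q[s, a])

def q_learning_update(s, a, r, s1):
    # Q-learning: the target uses the highest Q-value available in the next state S1.
    Q[s, a] += alpha * (r + gamma * Q[s1].max() - Q[s, a])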

A video tutorial on how SARSA works in machine learning. | Video: Pankaj Porwal.


How Does the SARSA Algorithm Work?
The SARSA algorithm works by carrying out actions based on rewards received from previous actions. To do this, SARSA stores an estimated Q-value for every state (S)-action (A) pair. This table of estimates is known as a Q-table, and its state-action entries are denoted Q(S, A).

The SARSA process starts by initializing Q(S, A) to arbitrary values. In this step, the initial current state (S) is set, and the initial action (A) is selected using an epsilon-greedy policy based on the current Q-values. An epsilon-greedy policy usually selects the action with the highest estimated reward, but with a small probability (epsilon) it picks a random action instead, balancing exploitation and exploration during learning.

Exploitation involves using already known, estimated values to keep collecting the rewards the agent has previously learned how to earn. Exploration involves trying actions whose outcomes are not yet well known; this may result in short-term, sub-optimal actions during learning, but it can yield long-term benefits by uncovering better actions and larger rewards.
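
As a minimal sketch of this trade-off (the function name and the epsilon value are assumptions, not taken from the article), epsilon-greedy action selection can be written in Python as:

import random
import numpy as np

def epsilon_greedy(Q, s, n_actions, epsilon=0.1):
    # Exploration: with probability epsilon, try a random action.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    # Exploitation: otherwise pick the action with the highest current estimate.
    return int(np.argmax(Q[s]))

With a small epsilon, the agent mostly exploits its current estimates but still occasionally explores new actions.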

From here, the selected action is taken, and the reward (R) and
next state (S1) are observed. Q(S, A) is then updated, and the
next action (A1) is selected based on the updated Q-values.
In this way, the action-value estimate for each visited state-action pair is updated, reflecting the expected reward for taking that action in that state.

The steps from observing R through selecting A1 are repeated until the episode ends, that is, until the final (terminal) state is reached. An episode is the sequence of states, actions and rewards experienced along the way, and the state, action and reward experience at each step is used to update the Q(S, A) values.
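
Putting these steps together, a minimal SARSA training loop might look like the sketch below. It assumes a classic Gym-style environment whose reset() returns a state and whose step() returns (next state, reward, done, info); the environment sizes and hyperparameters are illustrative assumptions, not values from the article:

import random
import numpy as np

def train_sarsa(env, n_states, n_actions, episodes=500,
                alpha=0.1, gamma=0.99, epsilon=0.1):
    # env is assumed to follow the classic Gym API (reset/step); adjust as needed.
    # Initialize Q(S, A) to arbitrary values (zeros here).
    Q = np.zeros((n_states, n_actions))

    def select(s):
        # Epsilon-greedy action selection based on the current Q-values.
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()   # set the initial state S
        a = select(s)     # choose the initial action A
        done = False
        while not done:
            s1, r, done, _ = env.step(a)  # take A, observe R and next state S1
            a1 = select(s1)               # choose A1 in S1 with the same policy
            # SARSA update: the target uses the action A1 actually chosen in S1.
            target = r + gamma * Q[s1, a1] * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s1, a1                 # move on to the next step
    return Q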


SARSA vs. Q-learning


The main difference between SARSA and Q-learning is that
SARSA is an on-policy learning algorithm, while Q-learning is an
off-policy learning algorithm.

In reinforcement learning, two different policies are also used for active agents: a behavior policy and a target policy. A behavior policy is used to decide actions in a given state (the behavior the agent is currently using to interact with its environment), while a target policy is used to learn about desired actions and what rewards are received (the ideal policy the agent seeks to use to interact with its environment).

If an algorithm’s behavior policy matches its target policy, it is an on-policy algorithm. If the two policies don’t match, it is an off-policy algorithm.

SARSA operates by choosing an action following the current epsilon-greedy policy and updating its Q-values accordingly. On-policy algorithms like SARSA give non-greedy actions some probability of being selected, providing a balance between exploitation and exploration. Since SARSA’s Q-values are learned using the same epsilon-greedy policy for both behavior and target, it is classified as on-policy.

Q-learning, unlike SARSA, updates toward the greedy action at each step. A greedy action is the one that gives the maximum Q-value for the state under the current estimates, that is, the action a greedy policy would follow. Off-policy algorithms like Q-learning learn a target policy regardless of which actions are selected during exploration. Since Q-learning updates using greedy actions and can learn about a target policy while following a separate behavior policy, it is classified as off-policy.

The SARSA algorithm is a slight variation of the popular Q-learning algorithm. For a learning agent in any reinforcement learning algorithm, its policy can be of two types:

1. On-policy: The learning agent learns the value function according to the current action derived from the policy currently being used.
2. Off-policy: The learning agent learns the value function according to the action derived from another policy.

Q-learning is an off-policy technique and uses the greedy approach to learn the Q-value. SARSA, on the other hand, is on-policy and uses the action performed by the current policy to learn the Q-value. This difference is visible in the update rule for each technique:

SARSA:      Q(s, a) <- Q(s, a) + alpha * [r + gamma * Q(s', a') - Q(s, a)]
Q-learning: Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]

Here, the update equation for SARSA depends on the current state, the current action, the reward obtained, the next state and the next action. This observation led to the naming of the technique: SARSA stands for State-Action-Reward-State-Action, which symbolizes the tuple (s, a, r, s', a').
