Q-Learning in C++

The Q-Learning algorithm was proposed to optimize solutions in Markov decision processes by choosing between immediate and delayed rewards. It works by having an agent take actions in a state, receive rewards that influence the Q-value for that state-action pair, and then transition to a new state. The Q-value is updated using the formula: Q(state, action) = Reward + gamma * Max future Q-value. The algorithm iterates through episodes of state transitions until it converges on optimal actions.


Introduction

The Q-Learning algorithm was proposed as a way to optimize solutions in Markov decision
process problems. The distinctive feature of Q-Learning is its capacity to choose between
immediate rewards and delayed rewards. At each time step, an agent observes the state vector
xt, then chooses and applies an action ut. As the process moves to state xt+1, the agent
receives a reinforcement r(xt, ut). The goal of training is to find the sequence of actions
that maximizes the sum of future reinforcements, thus leading to the shortest path from start
to finish.

The transition rule of Q-Learning is a very simple formula:

Q(state, action) = R(state, action) + gamma * Max[Q(next state, all actions)]

The gamma parameter has a range of 0 to 1 (0 <= gamma < 1) and ensures the convergence of
the sum. If gamma is closer to zero, the agent tends to consider only immediate rewards. If
gamma is closer to one, the agent weights future rewards more heavily and is willing to
delay the reward.
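
For example, using the reward matrix R from the program below, taking action 5 from state 1 yields a reward of 100. With gamma = 0.8 and every entry of Q still zero, the first update of that pair gives Q(1, 5) = 100 + 0.8 * 0 = 100; once the Q-values of state 5 become non-zero, later updates of Q(1, 5) grow accordingly.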

The Q-Learning algorithm goes as follows:

1. Set the gamma parameter, and environment rewards in matrix R.

2. Initialize matrix Q to zero.

3. For each episode:

Select a random initial state.

Do While the goal state hasn't been reached.

Select one among all possible actions for the current state.
Using this possible action, consider going to the next state.
Get maximum Q value for this next state based on all possible actions.
Compute: Q(state, action) = R(state, action) + gamma * Max[Q(next state, all actions)]
Set the next state as the current state.
End Do

End For
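
The following C++ program (by John McCullock) implements these steps for a six-state example environment: the matrix R below encodes the rewards, and the goal is state 5.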

// Author: John McCullock
// Date: 11-05-05
// Description: Q-Learning Example 1.
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>

using namespace std;

const int qSize = 6;
const double gamma = 0.8;
const int iterations = 10;
int initialStates[qSize] = {1, 3, 5, 2, 4, 0};

int R[qSize][qSize] = {{-1, -1, -1, -1, 0, -1},
                       {-1, -1, -1, 0, -1, 100},
                       {-1, -1, -1, 0, -1, -1},
                       {-1, 0, 0, -1, 0, -1},
                       {0, -1, -1, 0, -1, 100},
                       {-1, 0, -1, -1, 0, 100}};

int Q[qSize][qSize];
int currentState;

void episode(int initialState);
void chooseAnAction();
int getRandomAction(int upperBound, int lowerBound);
void initialize();
int maximum(int state, bool returnIndexOnly);
int reward(int action);

int main(){

int newState;

initialize();

//Perform learning trials starting at all initial states.
for(int j = 0; j <= (iterations - 1); j++){
for(int i = 0; i <= (qSize - 1); i++){
episode(initialStates[i]);
} // i
} // j

//Print out Q matrix.
for(int i = 0; i <= (qSize - 1); i++){
for(int j = 0; j <= (qSize - 1); j++){
cout << setw(5) << Q[i][j];
if(j < qSize - 1){
cout << ",";
}
} // j
cout << "\n";
} // i
cout << "\n";

//Perform tests, starting at all initial states.
for(int i = 0; i <= (qSize - 1); i++){
currentState = initialStates[i];
newState = 0;
do {
newState = maximum(currentState, true);
cout << currentState << ", ";
currentState = newState;
} while(currentState < 5);
cout << "5" << endl;
} // i

return 0;
}

void episode(int initialState){

currentState = initialState;

//Travel from state to state until goal state is reached.
do {
chooseAnAction();
} while(currentState != 5);

//When currentState = 5, run through the set once more for convergence.
for(int i = 0; i <= (qSize - 1); i++){
chooseAnAction();
} // i
}

void chooseAnAction(){

int possibleAction;

//Randomly choose a possible action connected to the current state.
possibleAction = getRandomAction(qSize, 0);

if(R[currentState][possibleAction] >= 0){
Q[currentState][possibleAction] = reward(possibleAction);
currentState = possibleAction;
}
}

int getRandomAction(int upperBound, int lowerBound){

int action;
bool choiceIsValid = false;
int range = upperBound - lowerBound;

//Randomly choose a possible action connected to the current state.
do {
//Get a random value between 0 and 5.
action = lowerBound + int(double(rand()) / (RAND_MAX + 1.0) * range);
if(R[currentState][action] > -1){
choiceIsValid = true;
}
} while(choiceIsValid == false);

return action;
}

void initialize(){

srand((unsigned)time(0));

for(int i = 0; i <= (qSize - 1); i++){
for(int j = 0; j <= (qSize - 1); j++){
Q[i][j] = 0;
} // j
} // i
}

int maximum(int state, bool returnIndexOnly){
// if returnIndexOnly = true, a Q matrix index is returned.
// if returnIndexOnly = false, a Q matrix element is returned.

int winner;
bool foundNewWinner;
bool done = false;

winner = 0;

do {
foundNewWinner = false;
for(int i = 0; i <= (qSize - 1); i++){
if((i < winner) || (i > winner)){ //Avoid self-comparison.
if(Q[state][i] > Q[state][winner]){
winner = i;
foundNewWinner = true;
}
}
} // i

if(foundNewWinner == false){
done = true;
}

} while(done == false);

if(returnIndexOnly == true){
return winner;
}else{
return Q[state][winner];
}
}

int reward(int action){
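//Return the Q update for taking 'action' from the current state:
//the immediate reward plus the discounted best Q-value of the resulting state.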

return static_cast<int>(R[currentState][action] + (gamma * maximum(action, false)));
}
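
When compiled and run (for example with g++ -o qlearn qlearn.cpp, assuming the listing is saved as qlearn.cpp), the program prints the learned Q matrix and then, for each initial state, the greedy sequence of states it visits until it reaches the goal state 5.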
