RLtools: The Fastest Deep Reinforcement Learning Library

Paper on arXiv | Live demo (browser) | Documentation | Zoo | Studio

Documentation | Run tutorials on Binder | Run Example on Colab | Join our Discord!

Trained on a 2020 MacBook Pro (M1) using RLtools SAC and TD3 (respectively)

Trained on a 2020 MacBook Pro (M1) using RLtools PPO/Multi-Agent PPO

Trained in 18s on a 2020 MacBook Pro (M1) using RLtools TD3

Benchmarks

Benchmarks of training the Pendulum swing-up using different RL libraries (PPO and SAC respectively)

Benchmarks of training the Pendulum swing-up on different devices (SAC, RLtools)

Benchmarks of the inference frequency for a two-layer [64, 64] fully-connected neural network across different microcontrollers (types and architectures).

Quick Start

Clone this repo, then build a Zoo example:

g++ -std=c++17 -Ofast -I include src/rl/zoo/l2f/sac.cpp

Run it with ./a.out 1337 (the number is the seed), then run python3 -m http.server to serve the results. Open http://localhost:8000 and navigate to the ExTrack UI to watch the quadrotor fly.

  • macOS: Append -framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE for fast training (~4s on M3).
  • Ubuntu: Use apt install libopenblas-dev and append -lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS (~6s on Zen 5).

Algorithms

Algorithm | Example
TD3 | Pendulum, Racing Car, MuJoCo Ant-v4, Acrobot
PPO | Pendulum, Racing Car, MuJoCo Ant-v4 (CPU), MuJoCo Ant-v4 (CUDA)
Multi-Agent PPO | Bottleneck
SAC | Pendulum (CPU), Pendulum (CUDA), Acrobot

Projects Based on RLtools

Getting Started

⚠️ Note: Check out Getting Started in the documentation for a more thorough guide.

A simple example of how to implement your own environment and train a policy using PPO:

Clone and checkout:

git clone https://github.com/rl-tools/example
cd example
git submodule update --init external/rl_tools

Build and run:

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
./my_pendulum

Note that this example has no dependencies and should work on any system with CMake and a C++17 compiler.
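For orientation, the sketch below shows the rough shape of such a custom environment: a parameter struct, a state struct, and functions for sampling an initial state, stepping the dynamics/computing the reward, and producing the observation. The names, signatures, and simplified pendulum dynamics are illustrative assumptions, not the exact RLtools environment API; see the example repository (my_pendulum) for the authoritative version.

// Illustrative sketch only: not the exact RLtools environment interface
#include <cmath>
#include <cstdio>
#include <random>
#include <algorithm>

constexpr double PI = 3.14159265358979323846;

struct MyPendulumParameters{
    double g = 9.81, m = 1.0, l = 1.0, dt = 0.05;
    double max_torque = 2.0, max_speed = 8.0;
};
struct MyPendulumState{
    double theta;     // pole angle [rad], 0 = upright
    double theta_dot; // angular velocity [rad/s]
};

// Sample a random initial state
template <typename RNG>
MyPendulumState initial_state(const MyPendulumParameters&, RNG& rng){
    std::uniform_real_distribution<double> angle(-PI, PI), velocity(-1.0, 1.0);
    return {angle(rng), velocity(rng)};
}

// Advance the dynamics by one step (normalized action in [-1, 1]) and return the reward
double step(const MyPendulumParameters& p, MyPendulumState& s, double action){
    double u = std::clamp(action, -1.0, 1.0) * p.max_torque;
    double cost = s.theta * s.theta + 0.1 * s.theta_dot * s.theta_dot + 0.001 * u * u;
    s.theta_dot += (3.0 * p.g / (2.0 * p.l) * std::sin(s.theta) + 3.0 / (p.m * p.l * p.l) * u) * p.dt;
    s.theta_dot = std::clamp(s.theta_dot, -p.max_speed, p.max_speed);
    s.theta += s.theta_dot * p.dt;
    return -cost;
}

// Produce the observation the policy sees
void observe(const MyPendulumState& s, double observation[3]){
    observation[0] = std::cos(s.theta);
    observation[1] = std::sin(s.theta);
    observation[2] = s.theta_dot;
}

// Smoke test: roll out a random policy for a few steps
int main(){
    std::mt19937 rng(1337);
    MyPendulumParameters params;
    MyPendulumState state = initial_state(params, rng);
    std::uniform_real_distribution<double> random_policy(-1.0, 1.0);
    for(int step_i = 0; step_i < 10; step_i++){
        double reward = step(params, state, random_policy(rng));
        double observation[3];
        observe(state, observation);
        std::printf("step %d reward %f\n", step_i, reward);
    }
    return 0;
}

In the actual example, the corresponding pieces are exposed as free functions that the generic training loop dispatches on (cf. the Multiple Dispatch chapter in the documentation).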

Documentation

The documentation is available at docs.rl.tools and consists of C++ notebooks. You can also run them locally to tinker around:

docker run -p 8888:8888 rltools/documentation

After running the Docker container, open the link that is displayed in the CLI (http://127.0.0.1:8888/...) in your browser and enjoy tinkering!

Chapter | Interactive Notebook
Overview | -
Getting Started | -
Containers | Binder
Multiple Dispatch | Binder
Deep Learning | Binder
CPU Acceleration | Binder
MNIST Classification | Binder
Deep Reinforcement Learning | Binder
The Loop Interface | Binder
Custom Environment | Binder
Python Interface | Run Example on Colab

Repository Structure

To build the examples from source (either in Docker or natively), the repository first needs to be cloned. Instead of cloning all submodules using git clone --recursive, which takes a lot of space and bandwidth, we recommend cloning the main repo (which contains all the standalone code for RLtools) and then cloning the required sets of submodules later:

git clone https://github.com/rl-tools/rl-tools.git rl_tools

Cloning submodules

There are five classes of submodules:

  1. External dependencies (in external/)
    • E.g. HDF5 for checkpointing, Tensorboard for logging, or MuJoCo for the simulation of contact dynamics
  2. Examples/Code for embedded platforms (in embedded_platforms/)
  3. Redistributable dependencies (in redistributable/)
  4. Test dependencies (in tests/lib)
  5. Test data (in tests/data)

These sets of submodules can be cloned incrementally and independently of each other. For most use cases (e.g. most of the Docker examples) you should clone the submodules for external dependencies:

cd rl_tools
git submodule update --init --recursive -- external

The submodules for the embedded platforms, the redistributable binaries, and the test dependencies/data can be cloned in the same fashion (by replacing external with the appropriate folder from the enumeration above). Note: For the redistributable dependencies and the test data, make sure git-lfs is installed (e.g. sudo apt install git-lfs on Ubuntu) and activated (git lfs install); otherwise only the metadata of the blobs is downloaded.

Python Interface

We provide Python bindings that are available as rltools through PyPI (the pip package index). Note that using Python Gym environments can slow down training significantly compared to native RLtools environments.

pip install rltools gymnasium

Usage:

from rltools import SAC
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

seed = 0xf00d
def env_factory():
    env = gym.make("Pendulum-v1")
    env = RescaleAction(env, -1, 1)
    env.reset(seed=seed)
    return env

sac = SAC(env_factory)
state = sac.State(seed)

finished = False
while not finished:
    finished = state.step()

You can find more details in the Python Interface documentation and in the repository rl-tools/python-interface.

Embedded Platforms

Inference & Training

Inference

Naming Convention

We use snake_case for variables/instances, functions, and namespaces, and PascalCase for structs/classes. Furthermore, we use upper-case SNAKE_CASE for compile-time constants.
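For illustration, a small made-up snippet following these conventions (not actual library code):

constexpr int BATCH_SIZE = 32;                    // compile-time constant: SNAKE_CASE

struct ReplayBuffer{                              // struct: PascalCase
    int current_position = 0;                     // variable/member: snake_case
};

namespace my_namespace{                           // namespace: snake_case
    void add_sample(ReplayBuffer& replay_buffer){ // function and instance: snake_case
        replay_buffer.current_position += 1;
    }
}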

Citing

When using RLtools in academic work, please cite our publication using the following BibTeX entry:

@article{eschmann_rltools_2024,
  author  = {Jonas Eschmann and Dario Albani and Giuseppe Loianno},
  title   = {RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {301},
  pages   = {1--19},
  url     = {http://jmlr.org/papers/v25/24-0248.html}
}