This is a collection of reinforcement learning algorithms written entirely in TypeScript for educational purposes.
WORK IN PROGRESS!!
- Multi-Armed Bandit Algorithms: Exploration/Exploitation
- Epsilon-Greedy
- Upper Confidence Bound
- Thompson Sampling
- Dynamic Programming
- Iterative Policy Evaluation
- Policy Improvement
- Policy Iteration
- Truncated Policy Iteration
- Value Iteration
- Monte Carlo Methods
- MC Prediction
- State Values
- Action Values
- MC Control
- Temporal-Difference Methods
- TD Prediction: TD(0)
- TD Prediction
- State Values
- Action Values
- TD Control: Sarsa
- TD Control: Q-Learning
- TD Control: Expected Sarsa
- Value-Based Methods
- Deep Q-Networks (DQN)
- Vanilla DQN
- N-Step DQN
- Double DQN
- Dueling DQN
- DQN with Prioritized Experience Replay (PER)
- DQN with Noisy Networks
- Categorical DQN (C51)
- Quantile Regression DQN
- Rainbow
- Normalized Advantage Functions (NAF)
- Policy-Based Methods
- REINFORCE
- Off-Policy
- Actor-Critic Methods
- Vanilla Actor-Critic
- Advantage Actor-Critic (A2C)
- A2C with Generalized Advantage Estimation (GAE)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Actor-Critic with Experience Replay (ACER)
- Actor-Critic using Kronecker-Factored Trust Region (ACKTR)
- Deep Deterministic Policy Gradient (DDPG)
- DDPG with Hindsight Experience Replay (HER)
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
- Soft Actor-Critic (SAC)
- SAC Discrete
- Multi-Agent Algorithms
- Multi-Agent DDPG (MADDPG)
- Multi-Agent TD3
- Multi-Agent SAC
- Model-Based Algorithms
- TODO
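
To give a flavor of the simplest entry in the list above, here is a minimal sketch of an Epsilon-Greedy agent on a Bernoulli multi-armed bandit. It is an illustration only and is not taken from this repository's code: the names `makeRng`, `EpsilonGreedyAgent`, and `runBandit`, the arm probabilities, and the seeded generator are all assumptions made for the example.

```typescript
// Illustrative sketch only: every name here is hypothetical, not this repo's API.

/** Small seeded PRNG (mulberry32) so runs are reproducible. Returns floats in [0, 1). */
function makeRng(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

class EpsilonGreedyAgent {
  readonly counts: number[]; // number of pulls per arm
  readonly values: number[]; // running mean reward per arm
  private readonly epsilon: number;
  private readonly rng: () => number;

  constructor(numArms: number, epsilon: number, rng: () => number) {
    this.counts = new Array<number>(numArms).fill(0);
    this.values = new Array<number>(numArms).fill(0);
    this.epsilon = epsilon;
    this.rng = rng;
  }

  /** With probability epsilon explore uniformly; otherwise exploit the best estimate. */
  selectArm(): number {
    if (this.rng() < this.epsilon) {
      return Math.floor(this.rng() * this.values.length);
    }
    let best = 0;
    for (let i = 1; i < this.values.length; i++) {
      if (this.values[i] > this.values[best]) best = i;
    }
    return best;
  }

  /** Incremental sample-mean update: Q <- Q + (r - Q) / n. */
  update(arm: number, reward: number): void {
    this.counts[arm] += 1;
    this.values[arm] += (reward - this.values[arm]) / this.counts[arm];
  }
}

/** Runs the agent against Bernoulli arms with the given success probabilities. */
function runBandit(
  armProbs: number[],
  steps: number,
  epsilon: number,
  seed: number
): EpsilonGreedyAgent {
  const rng = makeRng(seed);
  const agent = new EpsilonGreedyAgent(armProbs.length, epsilon, rng);
  for (let t = 0; t < steps; t++) {
    const arm = agent.selectArm();
    const reward = rng() < armProbs[arm] ? 1 : 0;
    agent.update(arm, reward);
  }
  return agent;
}

const agent = runBandit([0.2, 0.5, 0.8], 5000, 0.1, 42);
console.log("pulls per arm:", agent.counts);
console.log("estimated values:", agent.values);
```

The random generator is injected so that a run can be repeated exactly with the same seed, which is convenient when comparing this strategy against Upper Confidence Bound or Thompson Sampling on the same arms.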