Some-RL-Implementation (more to be added)

Policy gradients in TensorFlow for the CartPole environment in OpenAI Gym
Partial implementation of the paper Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning (https://arxiv.org/pdf/1708.02596.pdf)
Implementation of the bit-flipping example in the paper Hindsight Experience Replay (https://arxiv.org/pdf/1707.01495.pdf)
Proximal Policy Optimization (PPO) Algorithms
Conservative Q-Learning for Offline Reinforcement Learning (https://arxiv.org/abs/2006.04779) and Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (https://arxiv.org/abs/2303.05479)
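As a rough illustration of the bit-flipping example listed above, here is a minimal sketch of the environment and of hindsight relabeling with the "final" strategy from the Hindsight Experience Replay paper. It is not taken from this repository: the names `flip`, `reward`, `rollout`, and `relabel_final` are hypothetical, the random policy is only a stand-in for a learned one, and computing the reward on the next state is an assumption of this sketch.

```python
import random
from typing import List, Tuple

Bits = Tuple[int, ...]
# (state, action, reward, next_state, goal)
Transition = Tuple[Bits, int, float, Bits, Bits]


def flip(state: Bits, action: int) -> Bits:
    """Return a copy of `state` with the bit at index `action` flipped."""
    return tuple(b ^ 1 if i == action else b for i, b in enumerate(state))


def reward(state: Bits, goal: Bits) -> float:
    """Sparse reward: 0 when the state matches the goal, otherwise -1."""
    return 0.0 if state == goal else -1.0


def rollout(n_bits: int, rng: random.Random) -> List[Transition]:
    """Run one episode of n_bits steps with a uniformly random policy.

    The random policy is a placeholder (assumption) for a learned one.
    """
    state = tuple(rng.randint(0, 1) for _ in range(n_bits))
    goal = tuple(rng.randint(0, 1) for _ in range(n_bits))
    episode: List[Transition] = []
    for _ in range(n_bits):
        action = rng.randrange(n_bits)
        next_state = flip(state, action)
        episode.append((state, action, reward(next_state, goal), next_state, goal))
        state = next_state
    return episode


def relabel_final(episode: List[Transition]) -> List[Transition]:
    """HER 'final' strategy: replace the goal with the last state reached."""
    achieved = episode[-1][3]
    return [(s, a, reward(s2, achieved), s2, achieved) for s, a, _, s2, _ in episode]


if __name__ == "__main__":
    episode = rollout(8, random.Random(0))
    # Store both the original and the relabeled transitions in the replay buffer.
    replay_buffer = episode + relabel_final(episode)
    print(len(replay_buffer), "transitions stored")
```

The point of the relabeling is that the last relabeled transition always reaches its substituted goal and so receives reward 0, which gives an off-policy learner a useful signal even when the original goal was never reached.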

Repository contents:
CQL
Hindsight Experience Replay
Model_based_RL
Policy gradients CartPole
Utils
options_HRL
ppo
.DS_Store
README.md
