RL PyTexas 2017 PDF
Reinforcement learning in Python
Christine Doig, PyTexas 2017
Goals
• Why reinforcement learning?
• How to get started (if you are interested in learning more about
reinforcement learning after this talk!)
Only Python!
Agenda
• Resources
What is reinforcement learning?
AlphaGo
Oct. 2015 - Beats a human professional Go player (vs. Fan Hui)
"Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.
Reinforcement learning
An area of machine learning inspired by behaviourist psychology,
concerned with how software agents ought to take actions in an
environment so as to maximize some notion of cumulative reward. [1]
[1] https://en.wikipedia.org/wiki/Reinforcement_learning
[2] https://deepmind.com/blog/deep-reinforcement-learning/
Concepts
• machine learning
• agents
• actions
• environment
• reward
• strategies
• trial-and-error
Machine learning
[Diagram: machine learning algorithms; labels: Linear, Logistic, Q-learning]
[Diagram: the Agent performs Actions that affect the Environment; the Environment generates an Observation / State and a Reward for the Agent]
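The loop in the diagram can be sketched in plain Python. This is a minimal illustration, not code from the talk: CoinFlipEnv, random_agent, and run_episode are hypothetical names, and the environment (guess a hidden coin flip, reward 1 for a correct guess) is an assumption chosen only because it is tiny.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the side of a hidden coin flip."""

    def __init__(self, n_steps=10, seed=0):
        self.n_steps = n_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.t = 0
        return self.t  # the observation is just the step counter

    def step(self, action):
        """Apply an action; return (observation, reward, done)."""
        coin = self.rng.randint(0, 1)
        reward = 1 if action == coin else 0
        self.t += 1
        done = self.t >= self.n_steps
        return self.t, reward, done


def random_agent(observation, rng):
    """Ignore the observation and pick action 0 or 1 at random."""
    return rng.randint(0, 1)


def run_episode(env, agent, seed=1):
    """Run one episode and return the total reward collected."""
    rng = random.Random(seed)
    obs, done, total = env.reset(), False, 0
    while not done:
        action = agent(obs, rng)              # the agent performs an action
        obs, reward, done = env.step(action)  # the environment responds
        total += reward
    return total


total_reward = run_episode(CoinFlipEnv(), random_agent)
print("total reward:", total_reward)
```

A learning agent would replace random_agent with a function that uses the observation and past rewards to choose better actions.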
Example: Trading
[Diagram: Agent, Reward, Actions; the environment is a portfolio]
Example: Go
[Diagram: Agent, Reward, Actions; the environment is a Go game]
Reinforcement learning concepts
Goal: Select actions to maximize total future reward
[Diagram: Strategy, Agent, Reward, Actions, Environment. The Reward is observed by the Agent; the Agent performs Actions; Actions affect the Environment, which generates the Reward]
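"Total future reward" can be made concrete with a few lines of Python. The sketch below is an illustration, not code from the talk: future_rewards is a hypothetical helper, and the discount factor gamma is an assumption that the slides do not mention (gamma=1.0 gives the plain sum of future rewards).

```python
def future_rewards(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the total already computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1, 1, 1, 0]  # e.g. a reward of 1 per step until the episode ends
print(future_rewards(rewards))             # [3.0, 2.0, 1.0, 0.0]
print(future_rewards(rewards, gamma=0.5))  # [1.75, 1.5, 1.0, 0.0]
```

With gamma below 1, rewards that arrive sooner count for more than rewards that arrive later.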
Reinforcement learning concepts
[Diagram: Strategy; Agent - Trial-and-error; "affect" and "generates" arrows link the Agent and the Environment]
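One way to see trial-and-error at work is tabular Q-learning on a toy problem. The sketch below is an illustration, not the talk's method: the five-cell corridor, the function names, and the values of alpha, gamma, and epsilon are all assumptions. The agent starts in cell 0, receives a reward of 1 only when it reaches cell 4, and has to discover by exploring that stepping right pays off.

```python
import random

N_STATES = 5      # corridor cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: sometimes explore at random, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by repeated trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward the reward plus
            # the discounted value of the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = [q_table[s].index(max(q_table[s])) for s in range(N_STATES - 1)]
print(greedy_policy)  # 1 means "step right"
```

The table starts at zero, so early episodes are close to a random walk; the reward at the goal then propagates backwards through the table as episodes accumulate.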
Python libraries for reinforcement learning
• OpenAI
  • Gym: toolkit for developing and comparing reinforcement learning algorithms. MIT License. Last commit: November 2017.
  • baselines: high-quality implementations of reinforcement learning algorithms. MIT License. Last commit: November 2017.
• TensorForce: a TensorFlow library for applied reinforcement learning. Apache 2 License. Last commit: November 2017.
• DeepRL: highly modularized implementation of popular deep RL algorithms in PyTorch. Apache 2 License. Last commit: November 2017.
• RLlab: a framework for developing and evaluating reinforcement learning algorithms. MIT License. Last commit: July 2017.
• AgentNet: Python library for deep reinforcement learning using Theano+Lasagne. MIT License. Last commit: August 2017.
• RLPy: the Reinforcement Learning Library for Education and Research. 3-Clause BSD License. Last commit: April 2016.
• PyBrain: the Python Machine Learning Library. 3-Clause BSD License. Last commit: March 2016.
OpenAI libraries: gym and baselines
[Diagram: Agent and Environment, linked by "affect" and "generates" arrows]
Gym: environments
• observation_space: The Space object corresponding to valid observations
• action_space: The Space object corresponding to valid actions
• reward_range: A tuple corresponding to the min and max possible rewards
CartPole example in Python
The CartPole environment
reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
observation_space: x, x_dot, theta, theta_dot
import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
https://gym.openai.com/docs/
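A small step up from random actions is a hand-written rule that reads the observation. The sketch below is an illustration, not code from the talk: lean_policy is a hypothetical function, and it assumes that theta > 0 means the pole leans to the right and that action 0 pushes the cart left while action 1 pushes it right, which is how CartPole-v0 is usually described.

```python
LEFT, RIGHT = 0, 1  # assumed CartPole-v0 action indices


def lean_policy(observation):
    """Push the cart toward the side the pole is leaning.

    observation is (x, x_dot, theta, theta_dot); only theta is used here.
    """
    x, x_dot, theta, theta_dot = observation
    return RIGHT if theta > 0 else LEFT


# Usage with Gym (assumes gym is installed; not executed here):
#   obs = env.reset()
#   obs, reward, done, info = env.step(lean_policy(obs))

print(lean_policy((0.0, 0.0, 0.05, 0.0)))   # pole leans right
print(lean_policy((0.0, 0.0, -0.05, 0.0)))  # pole leans left
```

This rule ignores the cart position and both velocities, so it is only a baseline to compare a learned agent against.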
CartPole agent example
[Diagram: Agent (Baselines) and Environment, linked by an "affect" arrow]
reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
observation_space: x, x_dot, theta, theta_dot
action_space: left, right
CartPole agent example
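import gym
from baselines import deepq

env = gym.make("CartPole-v0")
# Load a previously trained model (the file must already exist)
act = deepq.load("cartpole_model.pkl")
while True:
    obs, done = env.reset(), False
    episode_rew = 0
    while not done:
        env.render()
        # obs[None] adds a batch dimension; [0] takes the single chosen action
        obs, rew, done, _ = env.step(act(obs[None])[0])
        episode_rew += rew
    print("Episode reward", episode_rew)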
https://github.com/openai/baselines/blob/master/baselines/deepq/experiments/enjoy_cartpole.py
Summary
Goals review
• Why reinforcement learning? Python & decision-making applications (Robotics - make a humanoid robot walk, Games - defeat a Go champion, Finance - trading strategies)
• Understand basic concepts intuitively
  • machine learning
  • agents
  • actions
  • environment
  • reward
  • strategies
  • trial-and-error
• How to get started:
  • OpenAI: gym, baselines
  • CartPole example
[Diagram: Strategy; Agent - Trial-and-error; Goal: Select actions to maximize total future reward; Model (of the environment). The Observation / State and Reward are observed by the Agent; the Agent performs Actions; Actions affect the Environment, which generates the Observation / State and Reward]
Resources
• Reinforcement Learning course by David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
• https://blog.acolyer.org/2017/11/17/mastering-the-game-of-go-without-human-knowledge/
• https://keon.io/deep-q-learning/
• https://rishav1.github.io/rein_learning/2017/01/05/simple-swarm-intelligence-optimization-for-cartpole-balancing-problem.html