RL PyTexas 2017 PDF

Reinforcement learning in Python
Christine Doig, PyTexas 2017
Goals
• Why reinforcement learning?
• Understand basic concepts intuitively
• How to get started (if you are interested in learning more about reinforcement learning after this talk!)
Only Python!
Agenda
• What is reinforcement learning?
• Python libraries for reinforcement learning
• CartPole example in Python
• Resources
What is reinforcement learning?
AlphaGo
Oct. 2015 - Beats a human professional Go player (v. Fan)
Mar. 2016 - Beats Lee Sedol (9-dan professional) in a five-game match (v. Lee)
May 2017 - Beats Ke Jie, the world's top Go player (v. Master)
Oct. 2017 - AlphaGo Zero beats AlphaGo (v. Lee) 100-0 with an algorithm based solely on reinforcement learning, without human data.
"Mastering the game of Go without human knowledge". Nature. 19 October 2017. Retrieved 19 October 2017.
Reinforcement learning
An area of machine learning inspired by behaviourist psychology,
concerned with how software agents ought to take actions in an
environment so as to maximize some notion of cumulative reward. [1]
Like a human, our agents learn for themselves to achieve successful
strategies that lead to the greatest long-term rewards. This paradigm
of learning by trial-and-error, solely from rewards or punishments, is
known as reinforcement learning (RL). [2]
[1] https://en.wikipedia.org/wiki/Reinforcement_learning
[2] https://deepmind.com/blog/deep-reinforcement-learning/
Concepts
• machine learning
• agents
• actions
• environment
• reward
• strategies
• trial-and-error
Machine learning: algorithms
• Unsupervised learning (no labels)
  • Clustering: K-means, Hierarchical clustering
  • Dimensionality reduction: PCA, T-SNE
• Supervised learning (labels)
  • Classification (categorical): Logistic Regression, SVM, Decision trees, k-NN
  • Regression (quantitative): Linear Regression, Neural Networks
• Reinforcement learning (reward)
  • Model-free (value-based - policy-based): Q-learning, Policy gradient, REINFORCE
  • Model-based: Dyna, Dynamic programming, MCTS
Machine learning: applications
• Unsupervised learning (no labels): Clustering, Dimensionality reduction
  • Exploring: Market segmentation, Anomaly detection, Summarizing information
• Supervised learning (labels): Classification (categorical), Regression (quantitative)
  • Predicting: Spam detection, Object/face recognition, Recommender systems
• Reinforcement learning (reward): Model-free, Model-based
  • Decision making: Robotics - Make humanoid robot walk, Games - Defeat Go champion, Finance - Trading strategies

Reinforcement learning concepts
• The Agent performs Actions
• Actions affect the Environment
• The Environment generates a Reward and an Observation / State
• The Reward and the Observation / State are observed by the Agent
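The loop in the diagram above can be sketched in plain Python. This is only an illustration, not code from the talk: `CoinFlipEnv`, `RandomAgent` and `run_episode` are hypothetical names, and the task (guess a coin flip, reward 1 for a correct guess) is invented for the example.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a hidden coin flip."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps = 0
        return self.steps  # observation: number of steps taken so far

    def step(self, action):
        """Apply the action and return (observation, reward, done)."""
        coin = self.rng.choice([0, 1])
        reward = 1 if action == coin else 0
        self.steps += 1
        done = self.steps >= self.episode_length
        return self.steps, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the observation and acts at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([0, 1])


def run_episode(env, agent):
    """Run the act -> affect -> generate -> observe loop for one episode."""
    observation = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = agent.act(observation)               # agent performs an action
        observation, reward, done = env.step(action)  # environment generates reward and state
        total_reward += reward
    return total_reward


total = run_episode(CoinFlipEnv(), RandomAgent())
print("total reward:", total)
```

The agent here never learns anything; the sketch only shows which side produces which piece of data at each step.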
Example: Trading
• Actions: buys / sells stocks
• Environment: portfolio
• Reward: win / lose money
Example: Go
• Actions: make a move
• Environment: Go game
• Reward: win / lose game
Strategy
Goal: Select actions to maximize total future reward
• The Agent performs Actions, which affect the Environment
• The Environment generates a Reward, which is observed by the Agent
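One way to make "total future reward" concrete is to add up the rewards an agent collects from a given step onward. The sketch below shows the plain sum and a discounted variant. The discount factor `gamma` is an assumption that does not appear in the slides; it is a common way to weight near-term rewards more heavily. The function names are hypothetical.

```python
def total_reward(rewards):
    """Plain sum of all rewards collected in an episode."""
    return sum(rewards)


def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k.

    gamma is an assumed parameter, not something stated in the talk.
    """
    g = 0.0
    # Work backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1, 1, 1, 0]
print(total_reward(rewards))
print(discounted_return(rewards, gamma=0.5))
```

With `gamma=1.0` the two functions agree; smaller values make the agent care less about distant rewards.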
Strategy
• Agent
  • Trial-and-error
  • Model (of the environment)
• The Agent performs Actions, which affect the Environment
• The Environment generates a Reward
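As a rough illustration of the trial-and-error route, here is a sketch of tabular Q-learning (listed under model-free algorithms earlier in the talk) on an invented five-state line world. The environment, the hyperparameters (`alpha`, `gamma`, `epsilon`) and all function names are assumptions made for this example, not material from the slides.

```python
import random

N_STATES = 5      # states 0..4; reaching state 4 ends the episode
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy dynamics: a deterministic walk along a line."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values purely from experienced rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal state after training.
policy = [q_table[s].index(max(q_table[s])) for s in range(N_STATES - 1)]
print(policy)
```

The agent never sees the rules in `step`; it only observes states and rewards, which is what makes this model-free. A model-based agent would instead use or learn something like `step` to plan ahead.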
Python libraries for reinforcement learning
• OpenAI
  • Gym: Toolkit for developing and comparing reinforcement learning algorithms. MIT License. Last commit: November 2017.
  • baselines: High-quality implementations of reinforcement learning algorithms. MIT License. Last commit: November 2017.
• TensorForce: A TensorFlow library for applied reinforcement learning. Apache 2 License. Last commit: November 2017.
• DeepRL: Highly modularized implementation of popular deep RL algorithms in PyTorch. Apache 2 License. Last commit: November 2017.
• RLlab: A framework for developing and evaluating reinforcement learning algorithms. MIT License. Last commit: July 2017.
• AgentNet: Python library for deep reinforcement learning using Theano+Lasagne. MIT License. Last commit: August 2017.
• RLPy: The Reinforcement Learning Library for Education and Research. 3-Clause BSD License. Last commit: April 2016.
• PyBrain: The Python Machine Learning Library. 3-Clause BSD License. Last commit: March 2016.
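OpenAI libraries: gym and baselines
• Baselines: the Agent
• Gym: the Environment, together with the Reward, State / Observation and Actions exchanged with the Agent
• Actions affect the Environment; the Environment generates the Reward and the State / Observation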
Gym: environments
The Agent (Baselines) performs actions on the Environment and observes what it generates. The Environment exposes:
• observation_space: The Space object corresponding to valid observations
• action_space: The Space object corresponding to valid actions
• reward_range: A tuple corresponding to the min and max possible rewards
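To make the three attributes concrete without installing anything, here is a sketch of a stand-in environment written in plain Python. `ToyDiscrete` and `ToyEnv` are hypothetical classes: they only mimic the attribute names listed above and the `sample()` / `contains()` methods that Gym's space objects provide, and they are not Gym's implementation.

```python
import random


class ToyDiscrete:
    """Hypothetical stand-in for a discrete space holding the values 0..n-1."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        """Return a random valid element of the space."""
        return self._rng.randrange(self.n)

    def contains(self, x):
        """Report whether x is a valid element of the space."""
        return isinstance(x, int) and 0 <= x < self.n


class ToyEnv:
    """Hypothetical environment exposing the three attributes from the slide."""

    def __init__(self):
        self.action_space = ToyDiscrete(2, seed=0)        # valid actions: 0 or 1
        self.observation_space = ToyDiscrete(10, seed=1)  # valid observations: 0..9
        self.reward_range = (0.0, 1.0)                    # (min, max) possible reward


env = ToyEnv()
action = env.action_space.sample()
print(env.action_space.contains(action))
print(env.observation_space.contains(42))
print(env.reward_range)
```

An agent can use these attributes to size its inputs and outputs before it has seen a single observation.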
Cartpole example in Python
The Cartpole environment
Goal: Keep pole vertical

CartPole environment example
• observation_space: x, x_dot, theta, theta_dot
• action_space: left, right
• reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
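The observation, action and reward description above can be turned into a small stand-alone simulation. The sketch below is modelled on the classic cart-pole equations of motion. The physical constants, the failure thresholds, the encoding of actions as 0 = left and 1 = right, and the names `ToyCartPole` and `lean_policy` are all assumptions for illustration, and the code is not Gym's implementation.

```python
import math
import random


class ToyCartPole:
    """Sketch of cart-pole dynamics; constants are assumed, not from the talk."""

    GRAVITY = 9.8
    CART_MASS = 1.0
    POLE_MASS = 0.1
    POLE_HALF_LENGTH = 0.5
    FORCE = 10.0
    DT = 0.02                       # seconds per simulation step
    X_LIMIT = 2.4                   # cart leaves the track beyond this
    THETA_LIMIT = math.radians(12)  # pole counts as fallen beyond this

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        """Start near upright: x, x_dot, theta, theta_dot all close to zero."""
        self.state = tuple(self.rng.uniform(-0.05, 0.05) for _ in range(4))
        return self.state

    def step(self, action):
        """Push left (0) or right (1) and return (observation, reward, done)."""
        x, x_dot, theta, theta_dot = self.state
        force = self.FORCE if action == 1 else -self.FORCE
        total_mass = self.CART_MASS + self.POLE_MASS
        pole_ml = self.POLE_MASS * self.POLE_HALF_LENGTH
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (self.GRAVITY * sin_t - cos_t * temp) / (
            self.POLE_HALF_LENGTH
            * (4.0 / 3.0 - self.POLE_MASS * cos_t ** 2 / total_mass)
        )
        x_acc = temp - pole_ml * theta_acc * cos_t / total_mass
        # Simple Euler integration of positions and velocities.
        x += self.DT * x_dot
        x_dot += self.DT * x_acc
        theta += self.DT * theta_dot
        theta_dot += self.DT * theta_acc
        self.state = (x, x_dot, theta, theta_dot)
        fallen = abs(x) > self.X_LIMIT or abs(theta) > self.THETA_LIMIT
        reward = 0.0 if fallen else 1.0  # 1 while the pole is up, 0 once it has fallen
        return self.state, reward, fallen


def lean_policy(observation):
    """Hand-written rule: push the cart toward the side the pole leans."""
    _, _, theta, _ = observation
    return 1 if theta > 0 else 0


def run_episode(env, policy, max_steps=200):
    """Run one episode and return the total reward collected."""
    obs, total, done, steps = env.reset(), 0.0, False, 0
    while not done and steps < max_steps:
        obs, reward, done = env.step(policy(obs))
        total += reward
        steps += 1
    return total


total = run_episode(ToyCartPole(seed=0), lean_policy)
print("total reward:", total)
```

`lean_policy` is a fixed rule rather than a learned one; a reinforcement learning agent would replace it with a policy improved from the rewards it collects.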

CartPole environment example
    import gym

    env = gym.make('CartPole-v0')
    env.reset()
    for _ in range(1000):
        env.render()  # draw the current state of the cart and pole
        env.step(env.action_space.sample())  # take a random action
    env.close()
https://gym.openai.com/docs/
CartPole agent example
• Agent (Baselines): "The Algorithm", e.g. DeepQ
• Environment (OpenAI Gym): CartPole
  • observation_space: x, x_dot, theta, theta_dot
  • action_space: left, right
  • reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
CartPole agent example
    import gym
    from baselines import deepq

    env = gym.make("CartPole-v0")
    # Load a previously trained DeepQ policy saved as cartpole_model.pkl
    act = deepq.load("cartpole_model.pkl")
    while True:
        obs, done = env.reset(), False
        episode_rew = 0
        while not done:
            env.render()
            # act expects a batch, so add a batch axis and take the first action
            obs, rew, done, _ = env.step(act(obs[None])[0])
            episode_rew += rew
        print("Episode reward", episode_rew)
https://github.com/openai/baselines/blob/master/baselines/deepq/experiments/enjoy_cartpole.py
Summary
Goals review
• Why reinforcement learning? Python & decision making applications (Robotics - Make humanoid robot walk, Games - Defeat Go champion, Finance - Trading strategies)
• Understand basic concepts intuitively
  • machine learning
  • agents
  • actions
  • environment
  • reward
  • strategies
  • trial-and-error
• Strategy: the Agent (trial-and-error, or a Model of the environment) performs Actions that affect the Environment; the Environment generates the Reward and the Observation / State, which is observed by the Agent. Goal: Select actions to maximize total future reward.
• How to get started:
  • OpenAI: gym, baselines
  • Cartpole example
Resources
• Reinforcement Learning course by David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
• https://blog.acolyer.org/2017/11/17/mastering-the-game-of-go-without-human-knowledge/
• https://keon.io/deep-q-learning/
• https://rishav1.github.io/rein_learning/2017/01/05/simple-swarm-intelligence-optimization-for-cartpole-balancing-problem.html
• AlphaGo Zero's win, what it means, Fast Forward Labs: http://blog.fastforwardlabs.com/2017/10/25/alphago-zero.html
Thank you! @ch_doig
Slides at: https://speakerdeck.com/chdoig
