
Reinforcement learning in Python
Christine Doig, PyTexas 2017
Goals
• Why reinforcement learning?

• Understand basic concepts intuitively

• How to get started (if you are interested in learning more about
reinforcement learning after this talk!)

Only Python!
Agenda

• What is Reinforcement Learning?

• Python libraries for Reinforcement learning

• Cartpole example in Python

• Resources
What is reinforcement learning?
AlphaGo
• Oct. 2015 - Beats a human professional Go player (v. Fan)
• Mar. 2016 - Beats Lee Sedol (9-dan professional) in a five-game match (v. Lee)
• May 2017 - Beats Ke Jie, the world's top Go player (v. Master)
• Oct. 2017 - AlphaGo Zero beats AlphaGo (v. Lee) 100-0 with an algorithm based solely on reinforcement learning, without human data.

"Mastering the game of Go without human knowledge". Nature, 19 October 2017.
Reinforcement learning
An area of machine learning inspired by behaviourist psychology,
concerned with how software agents ought to take actions in an
environment so as to maximize some notion of cumulative reward. [1]

"Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL)." [2]

[1] https://en.wikipedia.org/wiki/Reinforcement_learning
[2] https://deepmind.com/blog/deep-reinforcement-learning/
Concepts
• machine learning
• agents
• actions
• environment
• reward
• strategies
• trial-and-error
Machine learning (algorithms)
• Unsupervised learning (no labels)
  - Clustering: K-means, Hierarchical clustering
  - Dimensionality reduction: PCA, T-SNE
• Supervised learning (labels)
  - Classification (categorical) and Regression (quantitative): Logistic Regression, Linear Regression, SVM, Decision trees, k-NN, Neural Networks
• Reinforcement learning (reward)
  - Model-free, value-based: Q-learning
  - Model-free, policy-based: Policy gradient, REINFORCE
  - Model-based: Dyna, Dynamic programming, MCTS
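The taxonomy lists Q-learning as a model-free, value-based method. As a minimal sketch (pure Python, not taken from the talk), the code below runs tabular Q-learning on a hypothetical five-state corridor. The environment, its reward of 1 at the right end, and the hyperparameters `alpha`, `gamma`, `epsilon` and `max_steps` are all illustrative assumptions. The agent never reads the transition rules in `step`; it only moves its estimate Q(s, a) toward r + gamma * max_a' Q(s', a').

```python
import random

# Hypothetical toy environment (an assumption, not from the talk):
# a corridor of states 0..4. Action 0 moves left, action 1 moves right.
# Reaching state 4 ends the episode with reward 1; every other step gives 0.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    # Q-table: one value estimate per (state, action), initialised to 0
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # explore: try a random action
                action = rng.choice(ACTIONS)
            else:
                # exploit: pick a best-valued action, breaking ties at random
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus the discounted value of the best
            # next action (no bootstrapping past a terminal state)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
# Greedy action in each non-terminal state after training
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

After training, the greedy action in every non-terminal state should be 1 (move right), because that is the shortest path to the only reward.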
Machine learning (applications)
• Unsupervised learning (no labels) - Exploring: Market segmentation, Anomaly detection, Summarizing information
• Supervised learning (labels) - Predicting: Spam detection, Object/face recognition, Recommender systems
• Reinforcement learning (reward) - Decision making: Robotics (make a humanoid robot walk), Games (defeat a Go champion), Finance (trading strategies)

Reinforcement learning concepts
• The Agent performs Actions.
• Actions affect the Environment.
• The Environment generates a Reward and an Observation / State.
• The Reward and the Observation / State are observed by the Agent.
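The loop described above can be sketched in a few lines of plain Python. This is a hedged illustration, not code from the talk: `CoinEnv`, `RandomAgent` and `run_episode` are hypothetical names, and the `reset()`/`step()` shape is only loosely modelled on the pattern Gym uses (Gym's `step` also returns an extra `info` value, omitted here).

```python
import random


class CoinEnv:
    """Hypothetical environment: the agent guesses a coin flip each step.

    Observation / State: the number of steps taken so far.
    Reward: 1 for a correct guess, 0 otherwise. Episodes last 10 steps.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        # The action affects the environment, which generates
        # a new observation and a reward
        coin = self.rng.randint(0, 1)
        reward = 1 if action == coin else 0
        self.t += 1
        done = self.t >= 10
        return self.t, reward, done


class RandomAgent:
    """Hypothetical agent that ignores its observation and acts at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randint(0, 1)


def run_episode(env, agent):
    observation, done, total_reward = env.reset(), False, 0
    while not done:
        action = agent.act(observation)               # agent performs an action
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward                        # agent observes the reward
    return total_reward


print(run_episode(CoinEnv(), RandomAgent()))
```

A learning agent would replace `RandomAgent.act` with a rule that uses past observations and rewards; the surrounding loop stays the same.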
Example: Trading
• Agent performs Actions: buys / sells stocks
• Environment: portfolio
• Reward: win / lose money
Example: Go
• Agent performs Actions: make a move
• Environment: Go game
• Reward: win / lose game
Reinforcement learning concepts: Strategy
• Agent goal: select actions to maximize total future reward.
• The Agent performs Actions, which affect the Environment; the Environment generates the Reward, which is observed by the Agent.
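"Total future reward" is commonly computed as a discounted return, G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..., where a discount factor gamma in [0, 1] makes near rewards count more than distant ones. The slide does not specify discounting, so treat `gamma` as an assumption; `discounted_returns` is a hypothetical helper that computes G_t for every step of one finished episode.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([1, 1, 1], gamma=0.5))  # [1.75, 1.5, 1.0]
```

With gamma = 1 this is the plain (undiscounted) sum of remaining rewards.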
Reinforcement learning concepts: Strategy
• Strategy: trial-and-error, possibly with a Model (of the environment)
• The Agent performs Actions, which affect the Environment; the Environment generates the Reward, which is observed by the Agent.
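Trial-and-error without a model can be illustrated with an epsilon-greedy multi-armed bandit: the agent mostly repeats the action that has paid best so far, but occasionally tries a random one. This is a minimal sketch, not from the talk; `epsilon_greedy_bandit`, the hidden payout probabilities and `epsilon` are assumptions for illustration.

```python
import random


def epsilon_greedy_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which arm pays best.

    payout_probs are hidden from the agent; it only sees 0/1 rewards.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms      # how often each arm was tried
    values = [0.0] * n_arms    # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: new = old + (reward - old) / n
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts


values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(counts)), key=lambda a: counts[a])
print(best_arm, [round(v, 2) for v in values])
```

The agent never sees `payout_probs`, yet it ends up pulling the best arm most often. A model-based agent would instead learn (or be given) how the environment behaves and plan with that model.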
Python libraries for reinforcement learning

• OpenAI
  • Gym: toolkit for developing and comparing reinforcement learning algorithms. MIT License, last commit: November 2017.
  • baselines: high-quality implementations of reinforcement learning algorithms. MIT License, last commit: November 2017.

• TensorForce: a TensorFlow library for applied reinforcement learning. Apache 2.0 License, last commit: November 2017.

• DeepRL: highly modularized implementations of popular deep RL algorithms in PyTorch. Apache 2.0 License, last commit: November 2017.

• RLlab: a framework for developing and evaluating reinforcement learning algorithms. MIT License, last commit: July 2017.

• AgentNet: a Python library for deep reinforcement learning using Theano + Lasagne. MIT License, last commit: August 2017.

• RLPy: the Reinforcement Learning Library for Education and Research. 3-Clause BSD License, last commit: April 2016.

• PyBrain: the Python Machine Learning Library. 3-Clause BSD License, last commit: March 2016.
OpenAI libraries: gym and baselines
• Baselines: the Agent, which performs Actions
• Gym: the Environment, which the Actions affect and which generates the Reward and the State / Observation observed by the Agent
Gym: environments
Baselines provides the Agent; Gym provides the Environment and its Reward, State / Observation and Actions interface. Each Gym environment defines:
• action_space: the Space object corresponding to valid actions
• observation_space: the Space object corresponding to valid observations
• reward_range: a tuple corresponding to the min and max possible rewards
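To make "Space object" concrete, here is a hedged, simplified re-implementation in plain Python. It only mirrors the idea behind Gym's `Discrete` and `Box` spaces (a set of valid values you can `sample()` from and test with `contains()`); `DiscreteSpace`, `BoxSpace` and `ToyEnv` are hypothetical names, not Gym's API.

```python
import random


class DiscreteSpace:
    """Hypothetical, simplified stand-in for a discrete Space: integers 0..n-1."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        # Draw a uniformly random valid element
        return self._rng.randrange(self.n)

    def contains(self, x):
        # True if x is a valid element of this space
        return isinstance(x, int) and 0 <= x < self.n


class BoxSpace:
    """Hypothetical, simplified stand-in for a continuous Space: vectors in [low, high]."""

    def __init__(self, low, high, seed=None):
        self.low, self.high = list(low), list(high)
        self._rng = random.Random(seed)

    def sample(self):
        return [self._rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for v, lo, hi in zip(x, self.low, self.high)
        )


class ToyEnv:
    """Hypothetical environment exposing the three attributes listed above."""

    action_space = DiscreteSpace(2, seed=0)  # e.g. 0 = left, 1 = right
    observation_space = BoxSpace([-1.0, -1.0], [1.0, 1.0], seed=0)
    reward_range = (0.0, 1.0)  # (min, max) possible reward


env = ToyEnv()
action = env.action_space.sample()
print(action, env.action_space.contains(action), env.reward_range)
```

Describing actions and observations this way lets generic agent code (such as a random agent calling `sample()`) work with any environment without knowing its details.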
Cartpole example in Python
The Cartpole environment

Goal: keep the pole vertical

CartPole environment example (Gym)
• observation_space: x, x_dot, theta, theta_dot (cart position, cart velocity, pole angle, pole angular velocity)
• action_space: left, right
• reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
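Before any learning, a hand-written policy shows how the four observation values can drive the two actions. This is a hedged sketch, not from the talk: `lean_policy` is a hypothetical name, it assumes Gym's convention that action 0 pushes the cart left and action 1 pushes it right, and the 0.5 weight on `theta_dot` is an arbitrary choice.

```python
def lean_policy(observation):
    """Hand-written (not learned) CartPole policy: push toward the lean.

    observation is (x, x_dot, theta, theta_dot); actions are
    0 = push left, 1 = push right (assumed Gym convention).
    """
    x, x_dot, theta, theta_dot = observation
    # If the pole is tilted, or falling, to the right, move the cart right
    # to get back under it; otherwise move left.
    return 1 if theta + 0.5 * theta_dot > 0 else 0


print(lean_policy((0.0, 0.0, 0.05, 0.0)))  # 1: pole leans right, push right
```

With Gym installed, it can be passed the observation returned by `env.reset()` or `env.step()` in place of `env.action_space.sample()`. A reinforcement learning agent aims to discover a rule like this from rewards alone.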


CartPole environment example

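import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    # Take a random action
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        # The pole fell (or the episode hit its step limit): start a new episode
        env.reset()
env.close()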

https://gym.openai.com/docs/
CartPole agent example
• Agent: "The Algorithm", e.g. DeepQ (Baselines)
• Environment: CartPole (OpenAI Gym)
  - observation_space: x, x_dot, theta, theta_dot
  - action_space: left, right
  - reward_range: 1 - pole hasn't fallen, 0 - pole has fallen
• The Agent's Actions affect the Environment, which returns the Reward and State / Observation.
CartPole agent example

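import gym

from baselines import deepq

env = gym.make("CartPole-v0")
# Load a DeepQ policy previously trained and saved to cartpole_model.pkl
act = deepq.load("cartpole_model.pkl")

while True:
    obs, done = env.reset(), False
    episode_rew = 0
    while not done:
        env.render()
        # act works on a batch of observations: obs[None] makes a batch of one
        obs, rew, done, _ = env.step(act(obs[None])[0])
        episode_rew += rew
    print("Episode reward", episode_rew)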

https://github.com/openai/baselines/blob/master/baselines/deepq/experiments/enjoy_cartpole.py
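The expression `act(obs[None])[0]` is easy to misread. The sketch below shows the shapes involved using NumPy; the linear `W` and this `act` function are hypothetical stand-ins for the trained DeepQ policy, chosen only so the example runs without Baselines.

```python
import numpy as np

# Hypothetical linear Q-function: one column of weights per action (left, right)
W = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [-1.0, 1.0],
              [-0.5, 0.5]])


def act(obs_batch):
    """Hypothetical stand-in for a batched policy: one greedy action per row."""
    q_values = obs_batch @ W              # shape (batch, 2)
    return np.argmax(q_values, axis=1)    # shape (batch,)


obs = np.array([0.01, -0.02, 0.03, 0.04])  # x, x_dot, theta, theta_dot
batch = obs[None]                          # add a batch axis: (4,) -> (1, 4)
action = act(batch)[0]                     # take the action for the only row
print(batch.shape, action)                 # (1, 4) 1
```

Indexing with `None` is the same as `np.newaxis`; policies built on neural networks usually expect this leading batch dimension even for a single observation.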
Summary
Goals review
• Why reinforcement learning? Python & decision-making applications (Robotics - Make a humanoid robot walk, Games - Defeat a Go champion, Finance - Trading strategies)
• Understand basic concepts intuitively
  • machine learning, agents, actions, environment, reward, strategies, trial-and-error
  • The Agent performs Actions that affect the Environment; the Environment generates a Reward and an Observation / State, which are observed by the Agent.
  • Strategy: trial-and-error, possibly with a Model (of the environment). Goal: select actions to maximize total future reward.
• How to get started:
  • OpenAI: gym, baselines
  • Cartpole example
Resources
• Reinforcement Learning course by David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

• https://blog.acolyer.org/2017/11/17/mastering-the-game-of-go-without-human-knowledge/

• https://keon.io/deep-q-learning/

• https://rishav1.github.io/rein_learning/2017/01/05/simple-swarm-intelligence-optimization-for-cartpole-balancing-problem.html

• AlphaGo Zero's win, what it means, Fast Forward Labs: http://blog.fastforwardlabs.com/2017/10/25/alphago-zero.html
Thank you! @ch_doig
Slides at: https://speakerdeck.com/chdoig
