Deep Reinforcement Learning in Unity: With Unity ML Toolkit
Abhilash Majumder
Pune, Maharashtra, India
About the Author
Abhilash Majumder is a natural language processing research engineer for HSBC (UK/India) and a technical mentor for Udacity (ML). He has also been associated with Unity Technologies, was a speaker at Unite India 2019, and has educated close to 1,000 students from EMEA and SEPAC (India) on Unity. He is an ML contributor and curator for open source Google Research, TensorFlow, and Unity ML Agents, and a creator of ML libraries published on the Python Package Index (PyPI). He speaks on NLP and deep learning for PyData Los Angeles, is an online educator for Udemy, and is a deep learning mentor for upGrad. He is a graduate of the National Institute of Technology, Durgapur (NIT-D), where he focused on NLP, machine learning, and applied mathematics. He can be reached via email at debabhi1396@gmail.com.
Abhilash is a former apprentice and student ambassador for Unity Technologies, where he educated corporate employees and students on using Unity for game development. He was a technical mentor (AI programming) for the Unity Ambassadors Community and Content Production, and has been associated with Unity Technologies for general education, with an emphasis on graphics and machine learning. He is a community moderator for machine learning (ML Agents) sessions organized by Unity Technologies (Unity Learn). He is one of the first content creators for Unity Technologies India (since 2017) and is responsible for the growth of the community in India under the guidance of Unity Technologies.
About the Technical Reviewer
Ansh Shah is pursuing an MSc in Physics and a BE in Mechanical Engineering at BITS Pilani, India. By day, he is a student; by night, he is a robotics and machine learning enthusiast. He is a core member of BITS ROBOCON, a technical team at the college, and is currently working on quadcopter and quadruped projects.
Acknowledgments
The amount of dedication and support that I have received in the making of this book
has left me amazed. First, I would like to thank my family, Mr. Abhijit Majumder and
Mrs. Sharbari Majumder, who have been instrumental in supporting me all the way.
I would also like to extend my heartfelt thanks to the entire Apress Team, without
whom this would not have been possible. Special thanks to Mrs. Spandana Chatterjee,
the Acquisition Editor, Mr. Shrikant Vishwakarma, the Coordinating Editor, and Laura
Berendson, the Development Editor, for their constant support and thorough reviews.
Ansh Shah, the Technical Reviewer of this book, has also played an important role and
I extend my thanks to him.
I would also like to share this space in thanking my mentor, Carl Domingo from
Unity Technologies, who has been so instrumental in guiding me from the beginning
of my journey with Unity. The Unity Machine Learning team deserves mention, as this
book would not have been possible without their constant efforts to make the ML Agents
platform amazing. I especially thank Dr. Danny Lange, whose sessions on machine
learning have been instrumental in understanding the framework and the concepts.
I am grateful to everyone who helped in the process of making this book, which I hope will help readers appreciate the beauty of deep reinforcement learning.
Introduction
Machine learning has been instrumental in shaping the scope of technology since its inception. ML has played an important role in the development of fields such as autonomous vehicles and robotics. Deep reinforcement learning is the field of learning in which agents learn with the help of rewards, an idea inspired by nature. Through this book, the author tries to present the diversity of reinforcement learning algorithms in game development as well as in scientific research. Unity, the cross-platform engine used in a plethora of tasks, from visual effects and cinematography to machine learning and high-performance graphics, is the primary tool used in this book. With the power of the Unity ML Agents Toolkit, the deep reinforcement learning framework built by Unity, the author tries to show the vast possibilities of this learning paradigm.
The book starts with an introduction to state-based reinforcement learning, from Markov processes to Bellman equations and Q-learning, which sets the ground for the successive sections. A plethora of diverse pathfinding algorithms, from Dijkstra's algorithm to sophisticated variants of A*, is presented along with simulations in Unity. The book also covers how navigation meshes work for automated pathfinding in Unity. An introduction to the ML Agents Toolkit, from the standard installation process to training an AI agent with a deep reinforcement learning algorithm (proximal policy optimization [PPO]), is provided as a starter. Along the course of this book, there is extensive usage of the TensorFlow framework along with OpenAI Gym environments for proper visualization of complex deep reinforcement learning algorithms in terms of simulations, robotics, and autonomous agents. Successive sections of the book involve an in-depth study of a variety of on- and off-policy algorithms, ranging from discrete SARSA/Q-learning to actor critic variants, deep Q-network variants, and PPO, along with their implementations using the Keras TensorFlow framework on Gym. These sections are instrumental in understanding how different simulations, such as the famous Puppo (Unity Berlin), Tiny Agents, and other ML Agents samples from Unity, are created and built. Sections are provided with detailed descriptions of how to build simulations in Unity using the C# software development kit for ML Agents and how to train them using soft actor critic (SAC), PPO, or imitation learning algorithms such as GAIL.
The latter part of this book provides insight into curriculum learning and adversarial networks, with an analysis of how AI agents are trained in games such as FIFA. Throughout these sections, detailed descriptions of the variants of neural networks (MLPs, convolutional networks, and recurrent networks, including long short-term memory [LSTM] and GRU), along with their implementations and performance, are provided. This is especially helpful, as these networks are used extensively in building the deep learning algorithms. The importance of convolutional networks for image sampling in Atari-style 2D games such as Pong is also covered. Knowledge of computer vision and deep reinforcement learning is combined to produce autonomous vehicles and driverless cars, which is also provided as an example template (game) for readers to build upon.
Finally, this book contains an in-depth review of the Obstacle Tower Challenge, which was organized by Unity Technologies to challenge state-of-the-art deep reinforcement learning algorithms. Sections on certain evolutionary algorithms, along with the Google Dopamine framework, are provided for understanding the vast field of reinforcement learning. Through this book, the author hopes to infuse enthusiasm and foster research among readers in the field of deep reinforcement learning.
CHAPTER 1
Introduction to Reinforcement Learning
Reinforcement learning (RL) is a paradigm of learning algorithms based on rewards and actions. This state-based learning paradigm differs from generic supervised and unsupervised learning, as it does not typically try to find structural inferences in collections of labeled or unlabeled data. Generic RL relies on finite state automata and decision processes that assist in finding an optimized reward-based learning trajectory. The field of RL relies heavily on goal-seeking, stochastic processes and decision-theoretic algorithms, and it remains an area of active research. With developments in higher-order deep learning algorithms, there have been huge advances toward creating self-learning agents that can achieve a goal by using gradient convergence techniques and sophisticated memory-based neural networks. This chapter focuses on the fundamentals of the Markov decision process (MDP), hidden Markov models (HMMs) and dynamic programming for state enumeration, Bellman's iterative algorithms, and a detailed walkthrough of value and policy algorithms. Throughout these sections, there are associated Python notebooks for better understanding of the concepts, as well as simulated games made with Unity (version 2018.x).
The fundamental components of RL are the agent(s) and the environment(s). An agent is an entity that uses learning algorithms to explore rewards step by step. The agent tries to find an optimal path toward a goal that maximizes the rewards and, in the process, tries to avoid punishing states. The environment is everything around an agent; this includes the states, obstacles, and rewards. The environment can be static as well as dynamic. Path convergence in a static environment is faster if the agent has sufficient buffer memory to retain the correct trajectory toward the goal as it explores different states. Dynamic environments pose a stronger challenge for agents, as there is no definite trajectory. The latter case requires deep memory network models, such as bidirectional long short-term memory (LSTM), to retain certain key observations that remain static within the dynamic environment. Generic reinforcement learning can be represented as shown in Figure 1-1.
The interaction between the agent and the environment is governed by the set of variables {state (S), reward (R), action (A)}; a minimal sketch of this interaction loop follows.
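To make this interaction concrete, the following is a minimal, self-contained sketch of the agent-environment loop. The LineWorld environment and the random action choice are illustrative inventions for this sketch, not part of any library:

import random

class LineWorld:
    """A toy environment: the agent walks a line and is rewarded at position 5."""
    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial state

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.position += action
        done = self.position == 5            # goal state reached
        reward = 1.0 if done else -0.1       # goal reward vs. small step penalty
        return self.position, reward, done

env = LineWorld()
state = env.reset()
total_reward = 0.0
for t in range(1000):                        # cap the episode length
    action = random.choice([-1, 1])          # a purely random "agent"
    state, reward, done = env.step(action)
    total_reward += reward                   # accumulate the episode's reward
    if done:
        break
print("Total reward:", total_reward)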
The possible states, rewards, and actions in the CartPole environment, the running example of this chapter, include:
• Termination: if the cart shifts more than 2.4 units from the center or the pendulum inclines more than 15 degrees (a minimal check of this condition is sketched below)
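This termination condition amounts to a simple boundary check. The following is a minimal sketch using the thresholds quoted above; the function and parameter names are illustrative, not Gym's API:

def is_terminated(cart_position, pole_angle_degrees):
    # The episode ends if the cart drifts too far or the pole tilts too much.
    return abs(cart_position) > 2.4 or abs(pole_angle_degrees) > 15.0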
To run the Jupyter notebook, open an Anaconda Prompt or Command Prompt and run the following command:
jupyter notebook
Note TensorFlow has nightly builds that are released daily with a version number, and these can be viewed on the TensorFlow page of the Python Package Index (PyPI). These builds are generally referred to as tf-nightly and may have unstable compatibility with Unity ML Agents. Official releases are recommended for integration with ML Agents, while nightly builds can be used for general deep learning.
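For reference, the official release and the nightly build are installed under different package names. These are the standard pip commands; a specific TensorFlow version may need to be pinned to match the ML Agents release in use:

pip install tensorflow
pip install tf-nightly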
Once the installation is complete, we can dive into the CartPole environment and try
to gain more information on the environment, rewards, states, and actions.
import gym
import numpy as np
import matplotlib.pyplot as plt
from IPython import display as ipythondisplay
The next step involves setting up the dimensions of the display window to visualize
the environment in the Colab notebook. This uses the pyvirtualdisplay library.
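A minimal setup, assuming the pyvirtualdisplay package and a virtual framebuffer such as Xvfb are installed in the Colab runtime, might look like this; the 400×300 window size is an illustrative choice:

from pyvirtualdisplay import Display

# Start a virtual display so that env.render() can draw off-screen in Colab.
display = Display(visible=0, size=(400, 300))
display.start()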
Now, let us load the environment from Gym using the gym.make command and look into the states and the actions. Observation states refer to the environment variables that contain the key factors, such as cart position, cart velocity, pole angle, and pole angular velocity, and they form an array of size 4. The action space is discrete with two actions (pushing the cart left or right). The observation space also contains high and low values as boundary values for the problem.
env = gym.make("CartPole-v0")
#Action space->Agent
print(env.action_space)
#Observation Space->State and Rewards
print(env.observation_space)
print(env.observation_space.high)
print(env.observation_space.low)
After running, the details appear in the console: Discrete(2) for the action space and a Box observation space of shape (4,), together with its high and low boundary values.
Let us try to run the environment for 50 iterations and check the values of the rewards accumulated. This will simulate the environment for 50 iterations and provide insight into how long a randomly acting agent can keep the pole balanced.
env = gym.make("CartPole-v0")
env.reset()
prev_screen = env.render(mode='rgb_array')
plt.imshow(prev_screen)

for i in range(50):
    action = env.action_space.sample()
    # Get rewards and next states
    obs, reward, done, info = env.step(action)
    screen = env.render(mode='rgb_array')
    print(reward)
    plt.imshow(screen)
    ipythondisplay.clear_output(wait=True)
    ipythondisplay.display(plt.gcf())
    if done:
        break

ipythondisplay.clear_output(wait=True)
env.close()
The environment is reset initially with the env.reset() method. For each of the 50 iterations, the env.action_space.sample() method samples a random action (0 or 1) from the action space; no learning takes place in this rollout. A learning agent would instead select actions using tabular, discrete RL algorithms like Q-learning or continuous deep RL algorithms like the deep Q-network (DQN). (A discount factor, which down-weights future rewards, likewise only enters the picture once a learning algorithm is used; this random rollout simply prints each immediate reward.) The env.step(action) method applies the chosen action to the environment and returns the new observation, the reward for that step, a done flag, and diagnostic info. At the end of each action step, the display changes to render the new state of the pole. The loop breaks early if the episode terminates (done is True), or otherwise ends after the 50 iterations have been completed. The env.close() method closes the connection to the Gym environment.
This has helped us to understand how states and rewards affect an agent. Later we will get into an in-depth study of modeling a deep Q-learning algorithm to provide a faster, optimal reward-based solution to the CartPole problem. The environment's observation states can be discretized, after which the problem can be solved using tabular RL algorithms like Markov-based Q-learning or SARSA.
Deep learning removes the need for discretization by operating directly on the continuous state values and applying high-dimensional neural networks to converge the loss function toward a global minimum. This is the approach favored by algorithms like DQN, double deep Q-network (DDQN), dueling DQN, actor critic (AC), proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), trust region policy optimization (TRPO), and soft actor critic (SAC). The latter section of the notebook contains a deep Q-learning implementation of the CartPole problem, which will be explained in later chapters. To highlight certain important aspects of this code: the deep learning layers are made with Keras, and for each iteration the collected state, action, and rewards are stored in a replay memory buffer. Based on the previous states held in the buffer memory and the rewards for the previous steps, the pole agent tries to optimize the Q-learning function over the Keras deep learning layers.
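For illustration, the following is a minimal sketch of a replay memory buffer of the kind described above; the class name, capacity, and batch size are illustrative choices rather than the notebook's exact implementation:

import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer storing (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity=2000):
        self.memory = deque(maxlen=capacity)  # oldest transitions are evicted automatically

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniformly sample a mini-batch of past transitions for training.
        return random.sample(self.memory, min(batch_size, len(self.memory)))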
TensorBoard starts on port 6006. To include the episodes of training data inside the logs, a separate log directory is created at runtime, and a TensorBoard callback is attached to the Keras training call as follows:
tensorboard_callback = TensorBoard(
    log_dir='./log',
    histogram_freq=1,
    write_graph=True,
    write_grads=True,
    batch_size=agent.batch_size,
    write_images=True)

self.model.fit(np.array(x_batch), np.array(y_batch),
               batch_size=len(x_batch),
               verbose=1, callbacks=[tensorboard_callback])
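With the callback in place, TensorBoard can be launched from a terminal against the same log directory (it serves on port 6006 by default):

tensorboard --logdir ./log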