Machine Learning in Finance
Matthew F. Dixon • Igor Halperin • Paul Bilokon
Machine Learning in Finance
From Theory to Practice
Matthew F. Dixon
Department of Applied Mathematics
Illinois Institute of Technology
Chicago, IL, USA
Igor Halperin
Tandon School of Engineering
New York University
Brooklyn, NY, USA
Paul Bilokon
Department of Mathematics
Imperial College London
London, UK
Additional material to this book can be downloaded from http://mypages.iit.edu/~mdixon7/book/ML_Finance_Codes-Book.zip
ISBN 978-3-030-41067-4
ISBN 978-3-030-41068-1 (eBook)
https://doi.org/10.1007/978-3-030-41068-1
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Once you eliminate the impossible, whatever
remains, no matter how improbable, must be
the truth.
—Arthur Conan Doyle
Introduction
Machine learning in finance sits at the intersection of a number of emergent
and established disciplines including pattern recognition, financial econometrics,
statistical computing, probabilistic programming, and dynamic programming. With
the trend towards increasing computational resources and larger datasets, machine
learning has grown into a central computational engineering field, with an emphasis
placed on plug-and-play algorithms made available through open-source machine
learning toolkits. Algorithm-focused areas of finance, such as algorithmic trading,
have been the primary adopters of this technology. But outside of engineering-based
research groups and business activities, much of the field remains a mystery.
A key barrier to understanding machine learning for non-engineering students
and practitioners is the absence of the well-established theories and concepts that
financial time series analysis equips us with. These serve as the basis for the
development of financial modeling intuition and scientific reasoning. Moreover,
machine learning is heavily entrenched in engineering ontology, which makes developments in the field somewhat intellectually inaccessible for students, academics,
and finance practitioners from the quantitative disciplines such as mathematics,
statistics, physics, and economics. Consequently, there are widespread misconceptions about, and limited understanding of, the capabilities of this field. While machine
learning techniques are often effective, they remain poorly understood and are
often mathematically indefensible. How do we place key concepts in the field of
machine learning in the context of more foundational theory in time series analysis,
econometrics, and mathematical statistics? Under which simplifying conditions are
advanced machine learning techniques such as deep neural networks mathematically
equivalent to well-known statistical models such as linear regression? How should
we reason about the perceived benefits of using advanced machine learning methods
over more traditional econometrics methods, for different financial applications?
What theory supports the application of machine learning to problems in financial
modeling? How does reinforcement learning provide a model-free approach to
the Black–Scholes–Merton model for derivative pricing? How does Q-learning
generalize discrete-time stochastic control problems in finance?
This book is written for advanced graduate students and academics in financial
econometrics, management science, and applied statistics, in addition to quants and
data scientists in the field of quantitative finance. We present machine learning
as a non-linear extension of various topics in quantitative economics such as
financial econometrics and dynamic programming, with an emphasis on novel
algorithmic representations of data, regularization, and techniques for controlling
the bias-variance tradeoff leading to improved out-of-sample forecasting. The book
is presented in three parts, each part covering theory and applications. The first
part presents supervised learning for cross-sectional data from both a Bayesian
and frequentist perspective. The more advanced material places a firm emphasis
on neural networks, including deep learning, as well as Gaussian processes, with
examples in investment management and derivatives. The second part covers
supervised learning for time series data, arguably the most common data type
used in finance, with examples in trading, stochastic volatility, and fixed income
modeling. Finally, the third part covers reinforcement learning and its applications
in trading, investment, and wealth management. We provide Python code examples
to support the readers’ understanding of the methodologies and applications. As
a bridge to research in this emergent field, we present the frontiers of machine
learning in finance from a researcher’s perspective, highlighting how many well-known concepts in statistical physics are likely to emerge as research topics for
machine learning in finance.
Prerequisites
This book is targeted at graduate students in data science, mathematical finance,
financial engineering, and operations research seeking a career in quantitative
finance, data science, analytics, and fintech. Students are expected to have completed upper-division undergraduate courses in linear algebra, multivariate calculus,
advanced probability theory and stochastic processes, and statistics for time series
(econometrics), and to have had some basic introduction to numerical optimization and
computational mathematics. Students will find the later chapters of this book,
on reinforcement learning, more accessible with some background in investment
science. Students should also have prior experience with Python programming and
should, ideally, have taken courses in computational finance and introductory machine learning.
The material in this book is more mathematical and less engineering-focused than
most courses on machine learning, and for this reason we recommend the recent book
Linear Algebra and Learning from Data by Gilbert Strang as background reading.
Advantages of the Book
Readers will find this book useful as a bridge from well-established foundational
topics in financial econometrics to applications of machine learning in finance.
Statistical machine learning is presented as a non-parametric extension of financial
econometrics and quantitative finance, with an emphasis on novel algorithmic representations of data, regularization, and model averaging to improve out-of-sample
forecasting. The key distinguishing feature from classical financial econometrics
and dynamic programming is the absence of an assumption on the data generation
process. This has important implications for modeling and performance assessment,
which are emphasized with examples throughout the book. Some of the main
contributions of the book are as follows:
• The textbook market is saturated with excellent books on machine learning.
However, few present the topic from the perspective of financial econometrics
and cast fundamental concepts in machine learning into canonical modeling and
decision frameworks already well established in finance such as financial time
series analysis, investment science, and financial risk management. Only through
the integration of these disciplines can we develop an intuition for how machine
learning theory informs the practice of financial modeling.
• Machine learning is entrenched in engineering ontology, which makes developments in the field somewhat intellectually inaccessible for students, academics,
and finance practitioners from quantitative disciplines such as mathematics,
statistics, physics, and economics. Moreover, financial econometrics has not kept
pace with this transformative field, and there is a need to reconcile various
modeling concepts between these disciplines. This textbook is built around
powerful mathematical ideas that shall serve as the basis for a graduate course for
students with prior training in probability and advanced statistics, linear algebra,
time series analysis, and Python programming.
• This book provides a compact, financial-market-motivated theoretical treatment
of financial modeling with machine learning for the benefit of regulators, wealth
managers, federal research agencies, and professionals in other heavily regulated
business functions in finance who seek a more theoretical exposition to allay
concerns about the “black-box” nature of machine learning.
• Reinforcement learning is presented as a model-free framework for stochastic
control problems in finance, covering portfolio optimization, derivative pricing,
and wealth management applications without assuming a data generation
process. We also provide a model-free approach to problems in market
microstructure, such as optimal execution, with Q-learning. Furthermore,
our book is the first to present methods of inverse reinforcement
learning.
• Multiple-choice questions, numerical examples, and more than 80 end-of-chapter exercises are used throughout the book to reinforce key technical
concepts.
• This book provides Python code demonstrating the application of machine
learning to algorithmic trading and financial modeling in risk management
and equity research. This code makes use of powerful open-source software
toolkits such as Google’s TensorFlow and Pandas, a data processing environment
for Python.
Overview of the Book
Chapter 1
Chapter 1 provides the industry context for machine learning in finance, discussing
the critical events that have shaped the finance industry’s need for machine learning
and the unique barriers to adoption. The finance industry has adopted machine
learning to varying degrees of sophistication. How it has been adopted is heavily
fragmented by the academic disciplines underpinning the applications. We present
some key mathematical examples that demonstrate the nature of machine learning
and how it is used in practice, with a focus on building intuition for more technical
expositions in later chapters. In particular, we begin to address many finance
practitioners’ concerns that neural networks are a “black box” by showing how they
are related to existing well-established techniques such as linear regression, logistic
regression, and autoregressive time series models. Such arguments are developed
further in later chapters.
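The link between a neural network and linear regression can be seen in a few lines. The following is a minimal sketch, assuming NumPy, and is not taken from the book's accompanying notebooks: a feedforward network with no hidden layer and an identity activation, trained by full-batch gradient descent on mean squared error, should converge to the ordinary least squares coefficients. The helper `fit_linear_network`, the learning rate, the number of epochs, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_linear_network(X, y, lr=0.1, epochs=2000):
    """Hypothetical helper: a feedforward network with no hidden layer and an
    identity activation, y_hat = X @ w + b, trained by full-batch gradient
    descent on mean squared error."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y            # prediction error on all samples
        grad_w = 2.0 / n * X.T @ residual   # gradient of MSE w.r.t. weights
        grad_b = 2.0 / n * residual.sum()   # gradient of MSE w.r.t. bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Simulated data (an assumption for illustration): three features, known
# coefficients, an intercept of 0.3, and small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.3 + 0.1 * rng.normal(size=200)

w_nn, b_nn = fit_linear_network(X, y)

# Ordinary least squares with an explicit intercept column, for comparison.
X1 = np.column_stack([np.ones(len(X)), X])
beta_ols, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("network  intercept, weights:", b_nn, w_nn)
print("OLS      intercept, weights:", beta_ols[0], beta_ols[1:])
```

Swapping the identity output for a sigmoid and the squared error for cross-entropy gives logistic regression in the same way; it is the hidden layers with non-linear activations that take the network beyond these classical models.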
Chapter 2
Chapter 2 introduces probabilistic modeling and reviews foundational concepts
in Bayesian econometrics such as Bayesian inference, model selection, online
learning, and Bayesian model averaging. We develop more versatile representations
of complex data with probabilistic graphical models such as mixture models.
Chapter 3
Chapter 3 introduces Bayesian regression and shows how it extends many of
the concepts in the previous chapter. We develop kernel-based machine learning
methods—specifically Gaussian process regression, an important class of Bayesian
machine learning methods—and demonstrate their application to “surrogate” models of derivative prices. This chapter also provides a natural point from which to
develop intuition for the role and functional form of regularization in a frequentist
setting—the subject of subsequent chapters.
Chapter 4
Chapter 4 provides a more in-depth description of supervised learning, deep
learning, and neural networks—presenting the foundational mathematical and statistical learning concepts and explaining how they relate to real-world examples in
trading, risk management, and investment management. These applications present
challenges for forecasting and model design and are presented as a recurring
theme throughout the book. This chapter moves towards a more engineering-style
exposition of neural networks, applying concepts in the previous chapters to
elucidate various model design choices.
Chapter 5
Chapter 5 presents a method for interpreting neural networks which imposes minimal restrictions on the neural network design. The chapter demonstrates techniques
for interpreting a feedforward network, including how to rank the importance of
the features. In particular, an example demonstrates how to apply interpretability
analysis to deep learning models for factor modeling.
Chapter 6
Chapter 6 provides an overview of the most important modeling concepts in
financial econometrics. Such methods form the conceptual basis and performance
baseline for more advanced neural network architectures presented in the next
chapter. In fact, each type of architecture is a generalization of many of the models
presented here. This chapter is especially useful for students from an engineering or
science background, with little exposure to econometrics and time series analysis.
Chapter 7
Chapter 7 presents a powerful class of probabilistic models for financial data.
Many of these models overcome some of the severe stationarity limitations of the
frequentist models in the previous chapters. The fitting procedure demonstrated is
also different—the use of Kalman filtering algorithms for state-space models rather
than maximum likelihood estimation or Bayesian inference. Simple examples of
hidden Markov models and particle filters in finance and various algorithms are
presented.
Chapter 8
Chapter 8 presents various neural network models for financial time series analysis,
providing examples of how they relate to well-known techniques in financial econometrics. Recurrent neural networks (RNNs) are presented as non-linear time series
models and generalize classical linear time series models such as AR(p). They
provide a powerful approach for prediction in financial time series and generalize
to non-stationary data. The chapter also presents convolutional neural networks for
filtering time series data and exploiting different scales in the data. Finally, this
chapter demonstrates how autoencoders are used to compress information and
generalize principal component analysis.
Chapter 9
Chapter 9 introduces Markov decision processes and the classical methods of
dynamic programming, before building familiarity with the ideas of reinforcement
learning and other approximate methods for solving MDPs. After describing Bellman optimality and iterative value and policy updates,
the chapter quickly advances towards a more engineering-style exposition of the
topic, covering key computational concepts such as greediness, batch learning, and
Q-learning. Through a number of mini-case studies, the chapter provides insight
into how RL is applied to optimization problems in asset management and trading.
These examples are each supported with Python notebooks.
Chapter 10
Chapter 10 considers real-world applications of reinforcement learning in finance,
and further develops the theory presented in the previous chapter. We start
with one of the most common problems of quantitative finance, which is the problem
of optimal portfolio trading in discrete time. Many practical problems of trading or
risk management amount to different forms of dynamic portfolio optimization, with
different optimization criteria, portfolio composition, and constraints. The chapter
introduces a reinforcement learning approach to option pricing that generalizes the
classical Black–Scholes model to a data-driven approach using Q-learning. It then
presents a probabilistic extension of Q-learning called G-learning and shows how it
can be used for dynamic portfolio optimization. For certain specifications of reward
functions, G-learning is semi-analytically tractable and amounts to a probabilistic
version of linear quadratic regulators (LQRs). Detailed analyses of such cases are
presented and we show their solutions with examples from problems of dynamic
portfolio optimization and wealth management.
Chapter 11
Chapter 11 provides an overview of the most popular methods of inverse reinforcement learning (IRL) and imitation learning (IL). These methods solve the problem
of optimal control in a data-driven way, similarly to reinforcement learning, but
with the critical difference that rewards are not observed. The problem is instead
to learn the reward function from the observed behavior of an agent. As behavioral
data without rewards are widely available, the problem of learning from such data
is certainly very interesting. The chapter provides a moderate-level description of
the most promising IRL methods, equips the reader with sufficient knowledge to
understand and follow the current literature on IRL, and presents examples that use
simple simulated environments to see how these methods perform when we know
the “ground truth” rewards. We then present use cases for IRL in quantitative finance
that include applications to trading strategy identification, sentiment-based trading,
option pricing, inference of portfolio investors, and market modeling.
Chapter 12
Chapter 12 takes us forward to emerging research topics in quantitative finance
and machine learning. Among many interesting emerging topics, we focus here
on two broad themes. The first deals with the unification of supervised learning
and reinforcement learning as two tasks of perception-action cycles of agents. We
outline some recent research ideas in the literature, in particular information-theory-based versions of reinforcement learning, and discuss their relevance for
financial applications. We explain why these ideas might have interesting practical
implications for RL financial models, where feature selection could be done within
the general task of optimization of a long-term objective, rather than outside of it,
as is usually performed in “alpha-research.”
The second topic presented in this chapter deals with using methods of reinforcement learning to construct models of market dynamics. We also introduce some
advanced physics-based computational approaches for such RL-inspired market
models.
Source Code
Many of the chapters are accompanied by Python notebooks to illustrate some
of the main concepts and demonstrate application of machine learning methods.
Each notebook is lightly annotated. Many of these notebooks use TensorFlow.
We recommend loading these notebooks, together with any accompanying Python
source files and data, in Google Colab. Please see the appendices of each chapter
accompanied by notebooks, and the README.md in the subfolder of each chapter,
for further instructions and details.
Scope
We recognize that the field of machine learning is developing rapidly and that keeping
abreast of research in this field is a challenging pursuit. Machine learning is an
umbrella term for a number of methodology classes, including supervised learning,
unsupervised learning, and reinforcement learning. This book focuses on supervised
learning and reinforcement learning because these are the areas with the most
overlap with econometrics, predictive modeling, and optimal control in finance.
Supervised machine learning can be categorized as either generative or discriminative.
Our focus is on discriminative learners, which attempt to partition the input
space either directly, through affine transformations, or through projections onto
a manifold. Neural networks have been shown to provide a universal approximation
to a wide class of functions. Moreover, they can be shown to reduce to other well-known statistical techniques and are adaptable to time series data.
Extending time series models, a number of chapters in this book are devoted to
an introduction to reinforcement learning (RL) and inverse reinforcement learning
(IRL) that deal with problems of optimal control of such time series and show how
many classical financial problems such as portfolio optimization, option pricing, and
wealth management can naturally be posed as problems for RL and IRL. We present
simple RL methods that can be applied for these problems, as well as explain how
neural networks can be used in these applications.
There are already several excellent textbooks covering other classical machine
learning methods, and we instead choose to focus on how to cast machine learning
into various financial modeling and decision frameworks. We emphasize that much
of this material is not unique to neural networks, but comparisons of alternative
supervised learning approaches, such as random forests, are beyond the scope of
this book.
Multiple-Choice Questions
Multiple-choice questions are included after introducing a key concept. The correct
answers to all questions are provided at the end of each chapter, with selected partial
explanations for some of the more challenging material.
Exercises
The exercises that appear at the end of every chapter form an important component
of the book. Each exercise has been chosen to reinforce concepts explained in the
text, to stimulate the application of machine learning in finance, and to gently bridge
material in other chapters. Each exercise is graded by difficulty, ranging from (*),
which denotes a simple exercise that might take a few minutes to complete,
through to (***), which denotes a significantly more complex exercise. Unless
specified otherwise, all equations referenced in each exercise correspond to those
in the corresponding chapter.
Instructor Materials
The book is supplemented by a separate Instructor’s Manual, which provides worked
solutions to the end-of-chapter questions. Full explanations for the solutions to the
multiple-choice questions are also provided. The manual provides additional notes
and example code solutions for some of the programming exercises in the later
chapters.
Acknowledgements
This book is dedicated to the late Mark Davis (Imperial College) who was an
inspiration in the field of mathematical finance and engineering, and formative in
our careers. Peter Carr, Chair of the Department of Financial Engineering at NYU
Tandon, has been instrumental in supporting the growth of the field of machine
learning in finance. Through providing speaker engagements and machine learning
instructorship positions in the MS in Algorithmic Finance Program, the authors have
been able to write research papers and identify the key areas required by a
textbook. Miquel Alonso (AIFI), Agostino Capponi (Columbia), Rama Cont (Oxford),
Kay Giesecke (Stanford), Ali Hirsa (Columbia), Sebastian Jaimungal (University
of Toronto), Gary Kazantsev (Bloomberg), Morton Lane (UIUC), and Jörg Osterrieder
(ZHAW) have established various academic and joint academic-industry workshops
and community meetings to grow the field and serve as input for this book.
At the same time, there has been growing support for the development of a book
in London, where several SIAM/LMS workshops and practitioner special interest
groups, such as the Thalesians, have identified a number of compelling financial
applications. The material has grown from courses and invited lectures at NYU,
UIUC, Illinois Tech, Imperial College and the 2019 Bootcamp on Machine Learning
in Finance at the Fields Institute, Toronto.
Along the way, we have been fortunate to receive the support of Tomasz Bielecki
(Illinois Tech), Igor Cialenco (Illinois Tech), Ali Hirsa (Columbia University),
and Brian Peterson (DV Trading). Special thanks to research collaborators and
colleagues Kay Giesecke (Stanford University), Diego Klabjan (NWU), Nick
Polson (Chicago Booth), and Harvey Stein (Bloomberg), all of whom have shaped
our understanding of the emerging field of machine learning in finance and the many
practical challenges. We are indebted to Sri Krishnamurthy (QuantUniversity),
Saeed Amen (Cuemacro), Tyler Ward (Google), and Nicole Königstein for their
valuable input on this book. We acknowledge the support of a number of Illinois
Tech graduate students who have contributed to the source code examples and
exercises: Xiwen Jing, Bo Wang, and Siliang Xong. Special thanks to Swaminathan
Sethuraman for his support of the code development, to Volod Chernat and George
Gvishiani who provided support and code development for the course taught at
NYU and Coursera. Finally, we would like to thank the students and especially the
organisers of the MSc Finance and Mathematics course at Imperial College, where
many of the ideas presented in this book have been tested: Damiano Brigo, Antoine
(Jack) Jacquier, Mikko Pakkanen, and Rula Murtada. We would also like to thank
Blanka Horvath for many useful suggestions.
Chicago, IL, USA
Brooklyn, NY, USA
London, UK
December 2019
Matthew F. Dixon
Igor Halperin
Paul Bilokon
Contents
Part I Machine Learning with Cross-Sectional Data
1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.1
Big Data—Big Compute in Finance . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2
Fintech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
Machine Learning and Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1
Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2
Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
Statistical Modeling vs. Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1
Modeling Paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2
Financial Econometrics and Machine Learning . . . . . . . . . . . . . . .
3.3
Over-fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
Examples of Supervised Machine Learning in Practice . . . . . . . . . . . . . .
5.1
Algorithmic Trading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2
High-Frequency Trade Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3
Mortgage Modeling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
3
4
6
8
11
14
16
16
18
21
22
28
29
32
34
40
41
44
2
Probabilistic Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
Bayesian vs. Frequentist Estimation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
Frequentist Inference from Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
Assessing the Quality of Our Estimator: Bias and Variance . . . . . . . . .
5
The Bias–Variance Tradeoff (Dilemma) for Estimators . . . . . . . . . . . . . .
6
Bayesian Inference from Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6.1
A More Informative Prior: The Beta Distribution . . . . . . . . . . . . .
6.2
Sequential Bayesian updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
47
47
48
51
53
55
56
60
61
xvii
xviii
Contents
6.3
Practical Implications of Choosing a Classical
or Bayesian Estimation Framework. . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.1
Bayesian Inference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.2
Model Selection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.3
Model Selection When There Are Many Models . . . . . . . . . . . . .
7.4
Occam’s Razor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.5
Model Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8
Probabilistic Graphical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.1
Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
63
63
64
65
66
69
69
70
72
76
76
80
3 Bayesian Regression and Gaussian Processes . . . . . . . . 81
1 Introduction . . . . . . . . 81
2 Bayesian Inference with Linear Regression . . . . . . . . 82
2.1 Maximum Likelihood Estimation . . . . . . . . 86
2.2 Bayesian Prediction . . . . . . . . 88
2.3 Schur Identity . . . . . . . . 89
3 Gaussian Process Regression . . . . . . . . 91
3.1 Gaussian Processes in Finance . . . . . . . . 92
3.2 Gaussian Processes Regression and Prediction . . . . . . . . 93
3.3 Hyperparameter Tuning . . . . . . . . 94
3.4 Computational Properties . . . . . . . . 96
4 Massively Scalable Gaussian Processes . . . . . . . . 96
4.1 Structured Kernel Interpolation (SKI) . . . . . . . . 97
4.2 Kernel Approximations . . . . . . . . 97
5 Example: Pricing and Greeking with Single-GPs . . . . . . . . 98
5.1 Greeking . . . . . . . . 101
5.2 Mesh-Free GPs . . . . . . . . 101
5.3 Massively Scalable GPs . . . . . . . . 103
6 Multi-response Gaussian Processes . . . . . . . . 103
6.1 Multi-Output Gaussian Process Regression and Prediction . . . . . . . . 104
7 Summary . . . . . . . . 105
8 Exercises . . . . . . . . 106
8.1 Programming Related Questions* . . . . . . . . 107
References . . . . . . . . 108
4 Feedforward Neural Networks . . . . . . . . 111
1 Introduction . . . . . . . . 111
2 Feedforward Architectures . . . . . . . . 112
2.1 Preliminaries . . . . . . . . 112
2.2 Geometric Interpretation of Feedforward Networks . . . . . . . . 114
2.3 Probabilistic Reasoning . . . . . . . . 117
2.4 Function Approximation with Deep Learning* . . . . . . . . 119
2.5 VC Dimension . . . . . . . . 120
2.6 When Is a Neural Network a Spline?* . . . . . . . . 124
2.7 Why Deep Networks? . . . . . . . . 127
3 Convexity and Inequality Constraints . . . . . . . . 132
3.1 Similarity of MLPs with Other Supervised Learners . . . . . . . . 138
4 Training, Validation, and Testing . . . . . . . . 140
5 Stochastic Gradient Descent (SGD) . . . . . . . . 142
5.1 Back-Propagation . . . . . . . . 143
5.2 Momentum . . . . . . . . 146
6 Bayesian Neural Networks* . . . . . . . . 149
7 Summary . . . . . . . . 153
8 Exercises . . . . . . . . 153
8.1 Programming Related Questions* . . . . . . . . 156
References . . . . . . . . 164
5 Interpretability . . . . . . . . 167
1 Introduction . . . . . . . . 167
2 Background on Interpretability . . . . . . . . 168
2.1 Sensitivities . . . . . . . . 168
3 Explanatory Power of Neural Networks . . . . . . . . 169
3.1 Multiple Hidden Layers . . . . . . . . 170
3.2 Example: Step Test . . . . . . . . 170
4 Interaction Effects . . . . . . . . 170
4.1 Example: Friedman Data . . . . . . . . 171
5 Bounds on the Variance of the Jacobian . . . . . . . . 172
5.1 Chernoff Bounds . . . . . . . . 174
5.2 Simulated Example . . . . . . . . 174
6 Factor Modeling . . . . . . . . 177
6.1 Non-linear Factor Models . . . . . . . . 177
6.2 Fundamental Factor Modeling . . . . . . . . 178
7 Summary . . . . . . . . 183
8 Exercises . . . . . . . . 184
8.1 Programming Related Questions* . . . . . . . . 184
References . . . . . . . . 188
Part II Sequential Learning
6 Sequence Modeling . . . . . . . . 191
1 Introduction . . . . . . . . 191
2 Autoregressive Modeling . . . . . . . . 192
2.1 Preliminaries . . . . . . . . 192
2.2 Autoregressive Processes . . . . . . . . 194
2.3 Stability . . . . . . . . 195
2.4 Stationarity . . . . . . . . 195
2.5 Partial Autocorrelations . . . . . . . . 197
2.6 Maximum Likelihood Estimation . . . . . . . . 199
2.7 Heteroscedasticity . . . . . . . . 200
2.8 Moving Average Processes . . . . . . . . 201
2.9 GARCH . . . . . . . . 202
2.10 Exponential Smoothing . . . . . . . . 204
3 Fitting Time Series Models: The Box–Jenkins Approach . . . . . . . . 205
3.1 Stationarity . . . . . . . . 205
3.2 Transformation to Ensure Stationarity . . . . . . . . 206
3.3 Identification . . . . . . . . 206
3.4 Model Diagnostics . . . . . . . . 208
4 Prediction . . . . . . . . 210
4.1 Predicting Events . . . . . . . . 210
4.2 Time Series Cross-Validation . . . . . . . . 213
5 Principal Component Analysis . . . . . . . . 213
5.1 Principal Component Projection . . . . . . . . 215
5.2 Dimensionality Reduction . . . . . . . . 216
6 Summary . . . . . . . . 217
7 Exercises . . . . . . . . 218
Reference . . . . . . . . 220
7 Probabilistic Sequence Modeling . . . . . . . . 221
1 Introduction . . . . . . . . 221
2 Hidden Markov Modeling . . . . . . . . 222
2.1 The Viterbi Algorithm . . . . . . . . 224
2.2 State-Space Models . . . . . . . . 227
3 Particle Filtering . . . . . . . . 227
3.1 Sequential Importance Resampling (SIR) . . . . . . . . 228
3.2 Multinomial Resampling . . . . . . . . 229
3.3 Application: Stochastic Volatility Models . . . . . . . . 230
4 Point Calibration of Stochastic Filters . . . . . . . . 231
5 Bayesian Calibration of Stochastic Filters . . . . . . . . 233
6 Summary . . . . . . . . 235
7 Exercises . . . . . . . . 235
References . . . . . . . . 237
8 Advanced Neural Networks . . . . . . . . 239
1 Introduction . . . . . . . . 239
2 Recurrent Neural Networks . . . . . . . . 240
2.1 RNN Memory: Partial Autocovariance . . . . . . . . 244
2.2 Stability . . . . . . . . 245
2.3 Stationarity . . . . . . . . 246
2.4 Generalized Recurrent Neural Networks (GRNNs) . . . . . . . . 248
3 Gated Recurrent Units . . . . . . . . 249
3.1 α-RNNs . . . . . . . . 249
3.2 Neural Network Exponential Smoothing . . . . . . . . 251
3.3 Long Short-Term Memory (LSTM) . . . . . . . . 254
4 Python Notebook Examples . . . . . . . . 255
4.1 Bitcoin Prediction . . . . . . . . 256
4.2 Predicting from the Limit Order Book . . . . . . . . 256
5 Convolutional Neural Networks . . . . . . . . 257
5.1 Weighted Moving Average Smoothers . . . . . . . . 258
5.2 2D Convolution . . . . . . . . 261
5.3 Pooling . . . . . . . . 263
5.4 Dilated Convolution . . . . . . . . 264
5.5 Python Notebooks . . . . . . . . 265
6 Autoencoders . . . . . . . . 266
6.1 Linear Autoencoders . . . . . . . . 267
6.2 Equivalence of Linear Autoencoders and PCA . . . . . . . . 268
6.3 Deep Autoencoders . . . . . . . . 270
7 Summary . . . . . . . . 271
8 Exercises . . . . . . . . 272
8.1 Programming Related Questions* . . . . . . . . 273
References . . . . . . . . 275
Part III Sequential Data with Decision-Making
9 Introduction to Reinforcement Learning . . . . . . . . 279
1 Introduction . . . . . . . . 279
2 Elements of Reinforcement Learning . . . . . . . . 284
2.1 Rewards . . . . . . . . 284
2.2 Value and Policy Functions . . . . . . . . 286
2.3 Observable Versus Partially Observable Environments . . . . . . . . 286
3 Markov Decision Processes . . . . . . . . 289
3.1 Decision Policies . . . . . . . . 291
3.2 Value Functions and Bellman Equations . . . . . . . . 293
3.3 Optimal Policy and Bellman Optimality . . . . . . . . 296
4 Dynamic Programming Methods . . . . . . . . 299
4.1 Policy Evaluation . . . . . . . . 300
4.2 Policy Iteration . . . . . . . . 302
4.3 Value Iteration . . . . . . . . 303
5 Reinforcement Learning Methods . . . . . . . . 306
5.1 Monte Carlo Methods . . . . . . . . 307
5.2 Policy-Based Learning . . . . . . . . 309
5.3 Temporal Difference Learning . . . . . . . . 311
5.4 SARSA and Q-Learning . . . . . . . . 313
5.5 Stochastic Approximations and Batch-Mode Q-learning . . . . . . . . 316
5.6 Q-learning in a Continuous Space: Function Approximation . . . . . . . . 323
5.7 Batch-Mode Q-Learning . . . . . . . . 327
5.8 Least Squares Policy Iteration . . . . . . . . 331
5.9 Deep Reinforcement Learning . . . . . . . . 335
6 Summary . . . . . . . . 337
7 Exercises . . . . . . . . 337
References . . . . . . . . 345
10 Applications of Reinforcement Learning . . . . . . . . 347
1 Introduction . . . . . . . . 347
2 The QLBS Model for Option Pricing . . . . . . . . 349
3 Discrete-Time Black–Scholes–Merton Model . . . . . . . . 352
3.1 Hedge Portfolio Evaluation . . . . . . . . 352
3.2 Optimal Hedging Strategy . . . . . . . . 354
3.3 Option Pricing in Discrete Time . . . . . . . . 356
3.4 Hedging and Pricing in the BS Limit . . . . . . . . 359
4 The QLBS Model . . . . . . . . 360
4.1 State Variables . . . . . . . . 361
4.2 Bellman Equations . . . . . . . . 362
4.3 Optimal Policy . . . . . . . . 365
4.4 DP Solution: Monte Carlo Implementation . . . . . . . . 368
4.5 RL Solution for QLBS: Fitted Q Iteration . . . . . . . . 370
4.6 Examples . . . . . . . . 373
4.7 Option Portfolios . . . . . . . . 375
4.8 Possible Extensions . . . . . . . . 379
5 G-Learning for Stock Portfolios . . . . . . . . 380
5.1 Introduction . . . . . . . . 380
5.2 Investment Portfolio . . . . . . . . 381
5.3 Terminal Condition . . . . . . . . 382
5.4 Asset Returns Model . . . . . . . . 383
5.5 Signal Dynamics and State Space . . . . . . . . 383
5.6 One-Period Rewards . . . . . . . . 384
5.7 Multi-period Portfolio Optimization . . . . . . . . 386
5.8 Stochastic Policy . . . . . . . . 386
5.9 Reference Policy . . . . . . . . 388
5.10 Bellman Optimality Equation . . . . . . . . 388
5.11 Entropy-Regularized Bellman Optimality Equation . . . . . . . . 389
5.12 G-Function: An Entropy-Regularized Q-Function . . . . . . . . 391
5.13 G-Learning and F-Learning . . . . . . . . 393
5.14 Portfolio Dynamics with Market Impact . . . . . . . . 395
5.15 Zero Friction Limit: LQR with Entropy Regularization . . . . . . . . 396
5.16 Non-zero Market Impact: Non-linear Dynamics . . . . . . . . 400
6 RL for Wealth Management . . . . . . . . 401
6.1 The Merton Consumption Problem . . . . . . . . 401
6.2 Portfolio Optimization for a Defined Contribution Retirement Plan . . . . . . . . 405
6.3 G-Learning for Retirement Plan Optimization . . . . . . . . 408
6.4 Discussion . . . . . . . . 413
7 Summary . . . . . . . . 413
8 Exercises . . . . . . . . 414
References . . . . . . . . 416
11 Inverse Reinforcement Learning and Imitation Learning . . . . . . . . 419
1 Introduction . . . . . . . . 419
2 Inverse Reinforcement Learning . . . . . . . . 423
2.1 RL Versus IRL . . . . . . . . 425
2.2 What Are the Criteria for Success in IRL? . . . . . . . . 426
2.3 Can a Truly Portable Reward Function Be Learned with IRL? . . . . . . . . 427
3 Maximum Entropy Inverse Reinforcement Learning . . . . . . . . 428
3.1 Maximum Entropy Principle . . . . . . . . 430
3.2 Maximum Causal Entropy . . . . . . . . 433
3.3 G-Learning and Soft Q-Learning . . . . . . . . 436
3.4 Maximum Entropy IRL . . . . . . . . 438
3.5 Estimating the Partition Function . . . . . . . . 442
4 Example: MaxEnt IRL for Inference of Customer Preferences . . . . . . . . 443
4.1 IRL and the Problem of Customer Choice . . . . . . . . 444
4.2 Customer Utility Function . . . . . . . . 445
4.3 Maximum Entropy IRL for Customer Utility . . . . . . . . 446
4.4 How Much Data Is Needed? IRL and Observational Noise . . . . . . . . 450
4.5 Counterfactual Simulations . . . . . . . . 452
4.6 Finite-Sample Properties of MLE Estimators . . . . . . . . 454
4.7 Discussion . . . . . . . . 455
5 Adversarial Imitation Learning and IRL . . . . . . . . 457
5.1 Imitation Learning . . . . . . . . 457
5.2 GAIL: Generative Adversarial Imitation Learning . . . . . . . . 459
5.3 GAIL as an Art of Bypassing RL in IRL . . . . . . . . 461
5.4 Practical Regularization in GAIL . . . . . . . . 464
5.5 Adversarial Training in GAIL . . . . . . . . 466
5.6 Other Adversarial Approaches* . . . . . . . . 468
5.7 f-Divergence Training* . . . . . . . . 468
5.8 Wasserstein GAN* . . . . . . . . 469
5.9 Least Squares GAN* . . . . . . . . 471
6 Beyond GAIL: AIRL, f-MAX, FAIRL, RS-GAIL, etc.* . . . . . . . . 471
6.1 AIRL: Adversarial Inverse Reinforcement Learning . . . . . . . . 472
6.2 Forward KL or Backward KL? . . . . . . . . 474
6.3 f-MAX . . . . . . . . 476
6.4 Forward KL: FAIRL . . . . . . . . 477
6.5 Risk-Sensitive GAIL (RS-GAIL) . . . . . . . . 479
6.6 Summary . . . . . . . . 481
7 Gaussian Process Inverse Reinforcement Learning . . . . . . . . 481
7.1 Bayesian IRL . . . . . . . . 482
7.2 Gaussian Process IRL . . . . . . . . 483
8 Can IRL Surpass the Teacher? . . . . . . . . 484
8.1 IRL from Failure . . . . . . . . 485
8.2 Learning Preferences . . . . . . . . 487
8.3 T-REX: Trajectory-Ranked Reward EXtrapolation . . . . . . . . 488
8.4 D-REX: Disturbance-Based Reward EXtrapolation . . . . . . . . 490
9 Let Us Try It Out: IRL for Financial Cliff Walking . . . . . . . . 490
9.1 Max-Causal Entropy IRL . . . . . . . . 491
9.2 IRL from Failure . . . . . . . . 492
9.3 T-REX . . . . . . . . 493
9.4 Summary . . . . . . . . 494
10 Financial Applications of IRL . . . . . . . . 495
10.1 Algorithmic Trading Strategy Identification . . . . . . . . 495
10.2 Inverse Reinforcement Learning for Option Pricing . . . . . . . . 497
10.3 IRL of a Portfolio Investor with G-Learning . . . . . . . . 499
10.4 IRL and Reward Learning for Sentiment-Based Trading Strategies . . . . . . . . 504
10.5 IRL and the “Invisible Hand” Inference . . . . . . . . 505
11 Summary . . . . . . . . 512
12 Exercises . . . . . . . . 513
References . . . . . . . . 515
12 Frontiers of Machine Learning and Finance . . . . . . . . 519
1 Introduction . . . . . . . . 519
2 Market Dynamics, IRL, and Physics . . . . . . . . 521
2.1 “Quantum Equilibrium–Disequilibrium” (QED) Model . . . . . . . . 522
2.2 The Langevin Equation . . . . . . . . 523
2.3 The GBM Model as the Langevin Equation . . . . . . . . 524
2.4 The QED Model as the Langevin Equation . . . . . . . . 525
2.5 Insights for Financial Modeling . . . . . . . . 527
2.6 Insights for Machine Learning . . . . . . . . 528
3 Physics and Machine Learning . . . . . . . . 529
3.1 Hierarchical Representations in Deep Learning and Physics . . . . . . . . 529
3.2 Tensor Networks . . . . . . . . 530
3.3 Bounded-Rational Agents in a Non-equilibrium Environment . . . . . . . . 534
4 A “Grand Unification” of Machine Learning? . . . . . . . . 535
4.1 Perception-Action Cycles . . . . . . . . 537
4.2 Information Theory Meets Reinforcement Learning . . . . . . . . 538
4.3 Reinforcement Learning Meets Supervised Learning: Predictron, MuZero, and Other New Ideas . . . . . . . . 539
References . . . . . . . . 540
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
About the Authors
Matthew F. Dixon is an Assistant Professor of Applied Math at the Illinois Institute
of Technology. His research in computational methods for finance is funded by
Intel. Matthew began his career in structured credit trading at Lehman Brothers
in London before pursuing academia and consulting for financial institutions in
quantitative trading and risk modeling. He holds a Ph.D. in Applied Mathematics
from Imperial College (2007) and has held postdoctoral and visiting professor
appointments at Stanford University and UC Davis, respectively. He has authored
over 20 peer-reviewed publications on machine learning and financial modeling,
has been cited in Bloomberg Markets and the Financial Times as an AI in fintech
expert, and is a frequently invited speaker in Silicon Valley and on Wall Street. He
has published R packages, served as a Google Summer of Code mentor, and is the
co-founder of the Thalesians Ltd.
Igor Halperin is a Research Professor in Financial Engineering at NYU and an
AI Research Associate at Fidelity Investments. He was previously an Executive
Director of Quantitative Research at JPMorgan for nearly 15 years. Igor holds a
Ph.D. in Theoretical Physics from Tel Aviv University (1994). Prior to joining
the financial industry, he held postdoctoral positions in theoretical physics at the
Technion and the University of British Columbia.
Paul Bilokon is CEO and Founder of Thalesians Ltd. and an expert in electronic
and algorithmic trading across multiple asset classes, having helped build such
businesses at Deutsche Bank and Citigroup. Before focusing on electronic trading,
Paul worked on derivatives and has served in quantitative roles at Nomura, Lehman
Brothers, and Morgan Stanley. Paul was educated at Christ Church College,
Oxford, and Imperial College. Apart from mathematical and computational finance,
his academic interests include machine learning and mathematical logic.