QUANTITATIVE ECONOMICS With Python
1 Programming in Python 7
1.1 About Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Setting up Your Python Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3 An Introductory Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.4 Python Essentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.5 Object Oriented Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
1.6 How it Works: Data, Variables and Names . . . . . . . . . . . . . . . . . . . . . . . . . 80
1.7 More Language Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
1.8 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
1.9 SciPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
1.10 Matplotlib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
1.11 Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
1.12 IPython Shell and Notebook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
1.13 The Need for Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
3 Advanced Applications 409
3.1 Continuous State Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
3.2 The Lucas Asset Pricing Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
3.3 The Aiyagari Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
3.4 Modeling Career Choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
3.5 On-the-Job Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
3.6 Search with Offer Distribution Unknown . . . . . . . . . . . . . . . . . . . . . . . . . 463
3.7 Optimal Savings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
3.8 Covariance Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
3.9 Estimation of Spectra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
3.10 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
3.11 Dynamic Stackelberg Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
3.12 Optimal Taxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
3.13 History Dependent Public Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
3.14 Optimal Taxation with State-Contingent Debt . . . . . . . . . . . . . . . . . . . . . . . 596
3.15 Optimal Taxation without State-Contingent Debt . . . . . . . . . . . . . . . . . . . . . 622
3.16 Default Risk and Income Fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
4 Solutions 653
References 659
Note: You are currently viewing an automatically generated PDF version of our online lectures, which are located at
http://quant-econ.net
Please visit the website for more information on the aims and scope of the lectures and the two language options (Julia or Python). This PDF is generated from a set of source files that are oriented towards the website and HTML output. As a result, the presentation quality can be less consistent than on the website.
ONE
PROGRAMMING IN PYTHON
This first part of the course provides a relatively fast-paced introduction to the Python program-
ming language
Contents
• About Python
– Overview
– What’s Python?
– Scientific Programming
– Learn More
Overview
What’s Python?
Common Uses Python is a general purpose language used in almost all application domains
• communications
• web development
• CGI and graphical user interfaces
• games
• multimedia, data processing, security, etc., etc., etc.
Used extensively by Internet service and high tech companies such as
• Google
• Dropbox
• Reddit
• YouTube
Features
• A high level language suitable for rapid development
• Relatively small core language supported by many libraries
• A multiparadigm language, in that multiple programming styles are supported (procedural,
object-oriented, functional, etc.)
• Interpreted rather than compiled
Syntax and Design One nice feature of Python is its elegant syntax — we’ll see many examples
later on
Elegant code might sound superfluous but in fact it’s highly beneficial because it makes the syntax
easy to read and easy to remember
Remembering how to read from files, sort dictionaries and other such routine tasks means that
you don’t need to break your flow of thought in order to hunt down correct syntax on the Internet
Closely related to elegant syntax is elegant design
Features like iterators, generators, decorators, list comprehensions, etc. make Python highly ex-
pressive, allowing you to get more done with less code
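For instance, here's the sum of squares of the even numbers below 10, written first as a plain loop and then as a one-line generator expression (a small illustration, not from the lecture itself)

total = 0
for i in range(10):
    if i % 2 == 0:
        total += i**2

total = sum(i**2 for i in range(10) if i % 2 == 0)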
Namespaces improve productivity by cutting down on bugs and syntax errors
Scientific Programming
Over the last decade, Python has become one of the core languages of scientific computing
It’s now either the dominant player or a major player in
• Machine learning and data science
• Astronomy
• Artificial intelligence
• Chemistry
• Computational biology
• Meteorology
• etc., etc.
This section briefly showcases some examples of Python for scientific programming
• All of these topics will be covered in detail later on
Numerical programming Fundamental matrix and array processing capabilities are provided
by the excellent NumPy library
NumPy provides the basic array data type plus some simple processing operations
For example
In [1]: import numpy as np # Load the library
In [2]: a = np.linspace(-np.pi, np.pi, 100) # Create array (even grid from -pi to pi)
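The lecture continues with some elementwise operations, along these lines

In [3]: b = np.cos(a)    # Apply cosine to each element of a
In [4]: c = np.ones(25)  # An array of 25 ones
In [5]: np.dot(c, c)     # Compute inner product
Out[5]: 25.0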
The SciPy library is built on top of NumPy and provides additional functionality
For example, let's calculate the integral ∫_{−2}^{2} φ(z) dz, where φ is the standard normal density
In [5]: from scipy.stats import norm
In [6]: from scipy.integrate import quad
In [7]: phi = norm()
In [8]: value, error = quad(phi.pdf, -2, 2)  # Integrate using Gaussian quadrature
In [9]: value
Out[9]: 0.9544997361036417
Graphics The most popular and comprehensive Python library for creating figures and graphs
is Matplotlib
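Here's a minimal sketch of Matplotlib in action (the lecture's own example, which displays the resulting figure, differs in detail)

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)   # Grid of 200 points from 0 to 10
plt.plot(x, np.sin(x), 'b-')  # Plot the sine function over the grid
plt.show()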
Symbolic Algebra It's useful to be able to manipulate symbolic expressions, as in Mathematica or Maple, and the SymPy library provides this functionality from within the Python shell

In [10]: from sympy import Symbol
In [11]: x, y = Symbol('x'), Symbol('y')  # Treat 'x' and 'y' as algebraic symbols
In [12]: x + x + x + y
Out[12]: 3*x + y
We can manipulate expressions
In [13]: expression = (x + y)**2
In [14]: expression.expand()
Out[14]: x**2 + 2*x*y + y**2
We can also solve polynomials
In [15]: from sympy import solve
In [16]: solve(x**2 + x + 2)
Out[16]: [-1/2 - sqrt(7)*I/2, -1/2 + sqrt(7)*I/2]
In [17]: from sympy import limit, sin, diff
In [18]: limit(1 / x, x, 0)
Out[18]: oo
In [19]: limit(sin(x) / x, x, 0)
Out[19]: 1
In [20]: diff(sin(x), x)
Out[20]: cos(x)
The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language
Can easily create tables of derivatives, generate LaTeX output, add it to figures, etc., etc.
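As one small sketch, SymPy can emit the LaTeX for a derivative (the exact output string varies across SymPy versions)

In [21]: from sympy import latex
In [22]: latex(diff(sin(x), x))
Out[22]: '\\cos{\\left(x \\right)}'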
Statistics Python’s data manipulation and statistics libraries have improved rapidly over the last
few years
Pandas One of the most popular libraries for working with data is pandas
Pandas is fast, efficient, flexible and well designed
Here’s a simple example
In [21]: import pandas as pd
In [22]: import numpy as np
In [23]: data = np.random.randn(5, 2)  # Create 5x2 matrix of random numbers for toy example
In [24]: dates = pd.date_range('28/12/2010', periods=5)
In [25]: df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
In [26]: print(df)
                price    weight
(five date-indexed rows of the random draws appear here)
In [27]: df.mean()
Out[27]:
price 0.176616
weight 0.344975
Networks and Graphs Python has many libraries for studying graphs
One well-known example is NetworkX
• Standard graph algorithms for analyzing network structure, etc.
• Plotting routines
• etc., etc.
Here’s some example code that generates and plots a random graph, with node color determined
by shortest path length from a central node
"""
Filename: nx_demo.py
Authors: John Stachurski and Thomas J. Sargent
"""
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np

G = nx.random_geometric_graph(200, 0.12)   # Generate random graph
pos = nx.get_node_attributes(G, 'pos')     # Get positions of nodes
# Find node nearest the center point (0.5, 0.5)
dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in list(pos.values())]
ncenter = np.argmin(dists)
p = nx.single_source_shortest_path_length(G, ncenter)  # Path lengths from center
plt.figure()
nx.draw_networkx_edges(G, pos, alpha=0.4)
nx.draw_networkx_nodes(G, pos, nodelist=list(p.keys()), node_size=120, alpha=0.5,
                       node_color=list(p.values()), cmap=plt.cm.jet_r)
plt.show()
Cloud Computing Running your Python code on massive servers in the cloud is becoming eas-
ier and easier
A nice example is Wakari
See also
• Amazon Elastic Compute Cloud
• The Google App Engine (Python, Java, PHP or Go)
• Pythonanywhere
• Sagemath Cloud
Parallel Processing Apart from the cloud computing options listed above, you might like to
consider
• Parallel computing through IPython clusters
• The Starcluster interface to Amazon’s EC2
• GPU programming through PyCuda, PyOpenCL, Theano or similar
Other Developments There are many other interesting developments with scientific program-
ming in Python
Some representative examples include
• Jupyter — Python in your browser with code cells, embedded images, etc.
• Numba — Make Python run at the same speed as native machine code!
• Blaze — a generalization of NumPy
• PyTables — manage large data sets
• CVXPY — convex optimization in Python
Learn More
Contents
• Setting up Your Python Environment
– Overview
– First Steps
– Jupyter
– Additional Software
– Alternatives
– Exercises
Overview
Warning: The core Python package is easy to install, but not what you should choose for these
lectures. The reason is that these lectures require the entire scientific programming ecosystem,
which the core installation doesn’t provide. Please read the following carefully.
First Steps
By far the best approach for our purposes is to install one of the free Python distributions that
contains
1. the core Python language and
2. the most popular scientific libraries
While there are several such distributions, we highly recommend Anaconda
Anaconda is
• very popular
• cross platform
• comprehensive
• completely unrelated to the Nicki Minaj song of the same name
Anaconda also comes with a great package management system to organize your code libraries
All of what follows assumes that you adopt this recommendation!
Installing Anaconda Installing Anaconda is straightforward: download the binary and follow
the instructions
Important points:
• Install the latest version, which is currently Python 3.5
• If you are asked during the installation process whether you’d like to make Anaconda your
default Python installation, say yes
• Otherwise you can accept all of the defaults
What if you have an older version of Anaconda?
For most scientific programmers, the best thing you can do is uninstall (see, e.g., these instructions)
and then install the newest version
Package Management The packages in Anaconda contain the various scientific libraries used in
day to day scientific programming
Anaconda supplies a great tool called conda to keep your packages organized and up to date
One conda command you should execute regularly is the one that updates the whole Anaconda
distribution
As a practice run, please execute the following
1. Open up a terminal
• If you don’t know what a terminal is
– For Mac users, see this guide
– For Windows users, search for the cmd application or see this guide
– Linux users – you already know what a terminal is
2. Type conda update anaconda
(If you’ve already installed Anaconda and it was a little while ago, please make sure you execute
this step)
Another useful command is conda info, which tells you about your installation
For more information on conda
• type conda help in a terminal
• read the documentation online
Get a Modern Browser We’ll be using your browser to interact with Python, so now might be a
good time to
1. update your browser, or
2. install a free modern browser such as Chrome or Firefox
Once you’ve done that we can start having fun
Jupyter
Jupyter notebooks are one of the many possible ways to interact with Python and the scientific
Python stack
• Later we’ll look at others
Jupyter notebooks provide a browser-based interface to Python with
• The ability to write and execute Python commands directly in your browser
• Formatted output also in the browser, including tables, figures, animation, etc.
• The ability to mix in formatted text and mathematical expressions between cells
While Jupyter isn’t always the best way to code in Python, it is a great place to start off
Jupyter is also a powerful tool for organizing and communicating scientific ideas
In fact Jupyter is fast turning into a major player in scientific computing
Starting the Jupyter Notebook To start the Jupyter notebook, open up a terminal (cmd for Win-
dows) and type jupyter notebook
Here’s an example (click to enlarge)
Notebook Basics Let’s start with how to edit code and run simple programs
Running Cells Notice that in the previous figure the cell is surrounded by a green border
This means that the cell is in edit mode
As a result, you can type in Python code and it will appear in the cell
When you’re ready to execute the code in a cell, hit Shift-Enter instead of the usual Enter
(Note: There are also menu and button options for running code in a cell that you can find by
exploring)
Modal Editing The next thing to understand about the Jupyter notebook is that it uses a modal
editing system
This means that the effect of typing at the keyboard depends on which mode you are in
The two modes are
1. Edit mode
• Indicated by a green border around one cell
• Whatever you type appears as is in that cell
2. Command mode
• The green border is replaced by a grey border
• Key strokes are interpreted as commands — for example, typing b adds a new cell
below the current one
• To switch to command mode from edit mode, hit the Esc key or Ctrl-M
• To switch to edit mode from command mode, hit Enter or click in a cell
The modal behavior of the Jupyter notebook is a little tricky at first but very efficient when you
get used to it
As a practice run, let's take the following program, which produces a polar bar chart

import numpy as np
import matplotlib.pyplot as plt

N = 20
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
ax = plt.subplot(111, polar=True)
bars = ax.bar(theta, radii, width=width, bottom=0.0)
plt.show()
Don’t worry about the details for now — let’s just run it and see what happens
The easiest way to run this code is to copy and paste into a cell in the notebook, like so
Now Shift-Enter and a figure should appear looking a bit like this
Notes:
• The details of your figure will be different because the data is random
• The figure might be hidden behind your browser — have a look around your desktop
In-line Figures One nice thing about Jupyter notebooks is that figures can also be displayed
inside the page
To achieve this effect, use the matplotlib inline magic
Here we’ve done this by prepending %matplotlib inline to the cell and executing it again (click
to enlarge)
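That is, the cell now reads

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

N = 20
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
ax = plt.subplot(111, polar=True)
bars = ax.bar(theta, radii, width=width, bottom=0.0)
plt.show()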
Working with the Notebook Let’s run through a few more notebook essentials
Other Content In addition to executing code, the Jupyter notebook allows you to embed text,
equations, figures and even videos in the page
For example, here we enter a mixture of plain text and LaTeX instead of code
Next we Esc to enter command mode and then type m to indicate that we are writing Markdown,
a mark-up language similar to (but simpler than) LaTeX
(You can also use your mouse to select Markdown from the Code drop-down box just below the list
of menu items)
Now we Shift+Enter to produce this
Sharing Notebooks A notebook can easily be saved and shared between users
Notebook files are just text files structured in JSON and typically ending with .ipynb
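A notebook containing a single code cell is stored roughly as follows (a simplified sketch; real files carry more metadata)

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": ["print('hello world')"]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 0
}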
For example, try downloading the notebook we just created by clicking here
Save it somewhere you can navigate to easily
Now you can import it from the dashboard (the first browser page that opens when you start
Jupyter notebook) and run the cells or edit as discussed above
You can also share your notebooks using nbviewer
The notebooks you see there are static html representations
To run one, download it as an ipynb file by clicking on the download icon at the top right of its
page
Once downloaded you can open it as a notebook, as discussed just above
Additional Software
There are some other bits and pieces we need to know about before we can proceed with the
lectures
QuantEcon In these lectures we’ll make extensive use of code from the QuantEcon organization
On the Python side we’ll be using the QuantEcon.py version
Installing QuantEcon.py You can install QuantEcon.py by typing the following into a terminal
(terminal on Mac, cmd on Windows, etc.)
pip install quantecon
More instructions on installing and keeping your code up to date can be found at QuantEcon
Obtaining the GitHub Repo One way to do this is to download the zip file by clicking the
“Download ZIP” button on the main page
(Remember where you unzip the directory, and make it somewhere you can find it easily)
There is another, better way to get a copy of the repo, using a program called Git
We’ll investigate how to do this in Exercise 2
Working with Python Files How does one run a locally saved Python file using the notebook?
Method 1: Copy and Paste Copy and paste isn’t the slickest way to run programs but sometimes
it gets the job done
One option is
1. Navigate to your file with your mouse / trackpad using a file browser
2. Click on your file to open it with a text editor
• e.g., Notepad, TextEdit, TextMate, depending on your OS
3. Copy and paste into a cell and Shift-Enter
Method 2: Run Using the run command is usually faster and easier than copy and paste
• For example, run test.py will run the file test.py
Warning:
• Jupyter only looks for test.py in the present working directory (PWD)
• If test.py isn’t in that directory, you will get an error
Let’s look at a successful example, where we run a file test.py with contents:
for i in range(5):
    print('foobar')
Here
• pwd asks Jupyter to show the PWD
– This is where Jupyter is going to look for files to run
– Your output will look a bit different depending on your OS
• ls asks Jupyter to list files in the PWD
– Note that test.py is there (on our computer, because we saved it there earlier)
• cat test.py asks Jupyter to print the contents of test.py
• run test.py runs the file and prints any output
But file X isn't in my PWD! If you're trying to run a file not in the present working directory,
you'll get an error
To fix this error you need to either
1. Shift the file into the PWD, or
2. Change the PWD to where the file lives
One way to achieve the first option is to use the Upload button
• The button is on the top level dashboard, where Jupyter first opened to
• Look where the pointer is in this picture
Loading Files It’s often convenient to be able to see your code before you run it
For this purpose we can replace run white_noise_plot.py with load white_noise_plot.py
Now the code from the file appears in a cell ready to execute
Alternatives
The preceding discussion covers most of what you need to know to write and run Python code
However, as you start to write longer programs, you might want to experiment with your work-
flow
There are many different options and we cover only a few
Text Editors A text editor is an application that is specifically designed to work with text files —
such as Python programs
Nothing beats the power and efficiency of a good text editor for working with program text
A good text editor will provide
• efficient text editing commands (e.g., copy, paste, search and replace)
• syntax highlighting, etc.
Among the most popular are Sublime Text and Atom
For a top quality open source text editor with a steeper learning curve, try Emacs
If you want an outstanding free text editor and don’t mind a seemingly vertical learning curve
plus long days of pain and suffering while all your neural pathways are rewired, try Vim
Text Editors Plus IPython Shell A text editor is for writing programs
To run them you can continue to use Jupyter as described above
Another option is to use the excellent IPython shell
To use an IPython shell, open up a terminal and type ipython
You should see something like this
The IPython shell has many of the features of the notebook: tab completion, color syntax, etc.
It also has command history through the arrow key
The up arrow key brings previously typed commands back to the prompt
This saves a lot of typing...
Here’s one set up, on a Linux box, with
• a file being edited in Vim
• An IPython shell next to it, to run the file
Exercises
Exercise 1 If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it
Now launch again, but this time using jupyter notebook --no-browser
Exercise 2
Getting the Repo with Git Git is a version control system — a piece of software used to manage
digital projects such as code libraries
In many cases the associated collections of files — called repositories — are stored on GitHub
GitHub is a wonderland of collaborative coding projects
For example, it hosts many of the scientific libraries we’ll be using later on, such as this one
Git is the underlying software used to manage these projects
Git is an extremely powerful tool for distributed collaboration — for example, we use it to share
and synchronize all the source files for these lectures
There are two main flavors of Git
1. the plain vanilla command line Git version
2. the various point-and-click GUI versions
• See, for example, the GitHub version
As an exercise, try
1. Installing Git
2. Getting a copy of QuantEcon.applications using Git
For example, if you’ve installed the command line version, open up a terminal and enter
git clone https://github.com/QuantEcon/QuantEcon.applications
(This is just git clone in front of the URL for the repository)
Even better,
1. Sign up to GitHub
2. Look into ‘forking’ GitHub repositories (forking means making your own copy of a GitHub
repository, stored on GitHub)
3. Fork QuantEcon.applications
4. Clone your fork to some local directory, make edits, commit them, and push them back up
to your forked GitHub repo
For reading on these and other topics, try
• The official Git documentation
• Reading through the docs on GitHub
• Pro Git Book by Scott Chacon and Ben Straub
• One of the thousands of Git tutorials on the Net
Contents
• An Introductory Example
– Overview
– First Example: Plotting a White Noise Process
– Exercises
– Solutions
We’re now ready to start learning the Python language itself, and the next few lectures are devoted
to this task
Our approach is aimed at those who already have at least some knowledge of fundamental pro-
gramming concepts, such as
• variables
• for loops, while loops
• conditionals (if/else)
Don’t give up if you have no programming experience—you are not excluded
You just need to cover some of the fundamentals of programming before returning here
One good reference for first-time programmers is the first 5 or 6 chapters of How to Think Like a
Computer Scientist
Overview
In this lecture we will write and then pick apart small Python programs
The objective is to introduce you to basic Python syntax and data structures
Deeper concepts—how things work—will be covered in later lectures
In reading the following, you should be conscious of the fact that all “first programs” are to some
extent contrived
We try to avoid this, but nonetheless
• Be aware that the programs are written to illustrate certain concepts
• Soon you will be writing the same programs in a rather different—and more efficient—way
In particular, the scientific libraries will allow us to accomplish the same things faster and more
efficiently, once we know how to use them
However, you also need to learn pure Python, the core language
This is the objective of the present lecture, and the next few lectures too
Prerequisites: The lecture on getting started with Python
To begin, suppose we want to simulate and plot the white noise process ε_0, ε_1, . . . , ε_T, where each
draw ε_t is independent standard normal
In other words, we want to generate figures that look something like this:
A program that accomplishes what we want can be found in the file test_program_1.py from the
applications repository
Let’s repeat it here:
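1 from random import normalvariate
2 import matplotlib.pyplot as plt
3 ts_length = 100
4 epsilon_values = []   # An empty list
5 for i in range(ts_length):
6     e = normalvariate(0, 1)
7     epsilon_values.append(e)
8 plt.plot(epsilon_values, 'b-')
9 plt.show()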
In brief,
• Lines 1–2 use the Python import keyword to pull in functionality from external libraries
• Line 3 sets the desired length of the time series
• Line 4 creates an empty list called epsilon_values that will store the et values as we generate
them
• Line 5 tells the Python interpreter that it should cycle through the block of indented lines
(lines 6–7) ts_length times before continuing to line 8
– Lines 6–7 draw a new value et and append it to the end of the list epsilon_values
• Lines 8–9 generate the plot and display it to the user
Let’s now break this down and see how the different parts work
Import Statements First we’ll look at how to import functionality from outside your program,
as in lines 1–2
The program uses the statement from random import normalvariate, which loads only the function normalvariate from the random module
Once imported this way, functions can be called directly

In [4]: from random import normalvariate, uniform
In [5]: normalvariate(0, 1)
Out[5]: -0.38430990243287594
In [6]: uniform(-1, 1)
Out[6]: 0.5492316853602877
Another style of import is to bring in the module itself

In [1]: import random
In [2]: random.normalvariate(0, 1)
Out[2]: -0.12451500570438317
In [3]: random.uniform(-1, 1)
Out[3]: 0.35121616197003336
After importing the module itself, we can access anything defined within via
module_name.attribute_name syntax
Lists Next let’s consider the statement epsilon_values = [], which creates an empty list
Lists are a native Python data structure used to group a collection of objects. For example
In [7]: x = [10, 'foo', False] # We can include heterogeneous data inside a list
In [8]: type(x)
Out[8]: list
Here the first element of x is an integer, the next is a string and the third is a Boolean value
When adding a value to a list, we can use the syntax list_name.append(some_value)
In [9]: x
Out[9]: [10, 'foo', False]
In [10]: x.append(2.5)
In [11]: x
Out[11]: [10, 'foo', False, 2.5]
Here append() is what’s called a method, which is a function “attached to” an object—in this case,
the list x
We’ll learn all about methods later on, but just to give you some idea,
• Python objects such as lists, strings, etc. all have methods that are used to manipulate the
data contained in the object
• String objects have string methods, list objects have list methods, etc.
Another useful list method is pop()
In [12]: x
Out[12]: [10, 'foo', False, 2.5]
In [13]: x.pop()
Out[13]: 2.5
In [14]: x
Out[14]: [10, 'foo', False]
Lists in Python are zero-based, so the first element is retrieved with x[0]
In [16]: x[0]
Out[16]: 10
In [17]: x[1]
Out[17]: 'foo'
In [19]: list(range(5))  # In Python 3, range() returns a range object, so we convert to a list
Out[19]: [0, 1, 2, 3, 4]
The For Loop Now let’s consider the for loop in test_program_1.py, which we repeat here for
convenience, along with the line that follows it
for i in range(ts_length):
    e = normalvariate(0, 1)
    epsilon_values.append(e)
plt.plot(epsilon_values, 'b-')
The for loop causes Python to execute the two indented lines a total of ts_length times before
moving on
These two lines are called a code block, since they comprise the “block” of code that we are
looping over
Unlike most other languages, Python knows the extent of the code block only from indentation
In particular, the fact that indentation decreases after line epsilon_values.append(e) tells Python
that this line marks the lower limit of the code block
More on indentation below—for now let’s look at another example of a for loop
animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")
If you put this in a text file or Jupyter cell and run it you will see
The plural of dog is dogs
The plural of cat is cats
The plural of bird is birds
This example helps to clarify how the for loop works: When we execute a loop of the form

for variable_name in sequence:
    <code block>

the Python interpreter performs the following: for each element of sequence, it "binds" the name variable_name to that element and then executes the code block
Code Blocks and Indentation In discussing the for loop, we explained that the code blocks
being looped over are delimited by indentation
In fact, in Python all code blocks (i.e., those occuring inside loops, if clauses, function definitions,
etc.) are delimited by indentation
Thus, unlike most other languages, whitespace in Python code affects the output of the program
Once you get used to it, this is a good thing: It
• forces clean, consistent indentation, improving readability
• removes clutter, such as the brackets or end statements used in other languages
On the other hand, it takes a bit of care to get right, so please remember:
• The line before the start of a code block always ends in a colon
– for i in range(10):
– if x > y:
– while x < 100:
– etc., etc.
• All lines in a code block must have the same amount of indentation
• The Python standard is 4 spaces, and that’s what you should use
Tabs vs Spaces One small "gotcha" here is the mixing of tabs and spaces, which often leads to
errors
(Important: Within text files, the internal representation of tabs and spaces is not the same)
You can use your Tab key to insert 4 spaces, but you need to make sure it’s configured to do so
If you are using a Jupyter notebook you will have no problems here
Also, good text editors will allow you to configure the Tab key to insert spaces instead of tabs —
try searching online
While Loops The for loop is the most common technique for iteration in Python
But, for the purpose of illustration, let’s modify test_program_1.py to use a while loop instead
In Python, the while loop syntax is as shown in the file test_program_2.py below
1 from random import normalvariate
2 import matplotlib.pyplot as plt
3 ts_length = 100
4 epsilon_values = []
5 i = 0
6 while i < ts_length:
7     e = normalvariate(0, 1)
8     epsilon_values.append(e)
9     i = i + 1
10 plt.plot(epsilon_values, 'b-')
11 plt.show()
User-Defined Functions Now let’s go back to the for loop, but restructure our program to make
the logic clearer
To this end, we will break our program into two parts:
1. A user-defined function that generates a list of random variables
2. The main part of the program that
(a) calls this function to get data
(b) plots the data
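Here is the program

1 from random import normalvariate
2 import matplotlib.pyplot as plt
3
4 def generate_data(n):
5     epsilon_values = []
6     for i in range(n):
7         e = normalvariate(0, 1)
8         epsilon_values.append(e)
9     return epsilon_values
10
11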
12 data = generate_data(100)
13 plt.plot(data, 'b-')
14 plt.show()
Let’s go over this carefully, in case you’re not familiar with functions and how they work
We have defined a function called generate_data(), where the definition spans lines 4–9
• def on line 4 is a Python keyword used to start function definitions
• def generate_data(n): indicates that the function is called generate_data, and that it has
a single argument n
• Lines 5–9 are a code block called the function body—in this case it creates an iid list of random
draws using the same logic as before
• Line 9 indicates that the list epsilon_values is the object that should be returned to the
calling code
This whole function definition is read by the Python interpreter and stored in memory
When the interpreter gets to the expression generate_data(100) in line 12, it executes the function
body (lines 5–9) with n set equal to 100.
The net result is that the name data on the left-hand side of line 12 is set equal to the list
epsilon_values returned by the function
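Our next program, test_program_4.py, modifies generate_data() so that it can produce either uniform or standard normal draws, depending on a flag passed as a second argument
Its opening lines plausibly run as follows (a reconstruction consistent with the numbered fragment below)

1 from random import normalvariate, uniform
2 import matplotlib.pyplot as plt
3
4 def generate_data(n, generator_type):
5     epsilon_values = []
6     for i in range(n):
7         # Draw e from the distribution indicated by generator_type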
8         if generator_type == 'U':
9             e = uniform(0, 1)
10         else:
11             e = normalvariate(0, 1)
12         epsilon_values.append(e)
13     return epsilon_values
14
15 data = generate_data(100, 'U')
16 plt.plot(data, 'b-')
17 plt.show()
Comments:
• Hopefully the syntax of the if/else clause is self-explanatory, with indentation again delim-
iting the extent of the code blocks
• We are passing the argument U as a string, which is why we write it as 'U'
• Notice that equality is tested with the == syntax, not =
– For example, the statement a = 10 assigns the name a to the value 10
– The expression a == 10 evaluates to either True or False, depending on the value of a
Now, there are two ways that we can simplify test_program_4
First, Python accepts the following conditional assignment syntax
In [20]: x = -10
In [21]: s = 'negative' if x < 0 else 'nonnegative'
In [22]: s
Out[22]: 'negative'
Second, and more importantly, we can get rid of the conditionals all together by just passing the
desired generator type as a function
To understand this, consider test_program_6.py
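Here's test_program_6.py (a reconstruction consistent with the line numbers discussed below)

1 from random import uniform
2 import matplotlib.pyplot as plt
3
4 def generate_data(n, generator_type):
5     epsilon_values = []
6     for i in range(n):
7         e = generator_type(0, 1)
8         epsilon_values.append(e)
9     return epsilon_values
10
11 data = generate_data(100, uniform)
12 plt.plot(data, 'b-')
13 plt.show()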
The only lines that have changed here are lines 7 and 11
In line 11, when we call the function generate_data(), we pass uniform as the second argument
The object uniform is in fact a function, defined in the random module
In [23]: from random import uniform
In [24]: uniform(0, 1)
Out[24]: 0.2981045489306786
When the function call generate_data(100, uniform) on line 11 is executed, Python runs the
code block on lines 5–9 with n equal to 100 and the name generator_type “bound” to the function
uniform
• While these lines are executed, the names generator_type and uniform are “synonyms”,
and can be used in identical ways
This principle works more generally—for example, consider the following piece of code
In [25]: max(7, 2, 4) # max() is a built-in Python function
Out[25]: 7
In [26]: m = max
In [27]: m(7, 2, 4)
Out[27]: 7
Here we created another name for the built-in function max(), which could then be used in iden-
tical ways
In the context of our program, the ability to bind new names to functions means that there is no
problem passing a function as an argument to another function—as we do in line 11
List Comprehensions Now is probably a good time to tell you that we can simplify the code for
generating the list of random draws considerably by using something called a list comprehension
List comprehensions are an elegant Python tool for creating lists
Consider the following example, where the list comprehension is on the right-hand side of the
second line
In [28]: animals = ['dog', 'cat', 'bird']
In [29]: plurals = [animal + 's' for animal in animals]
In [30]: plurals
Out[30]: ['dogs', 'cats', 'birds']
Here's another example
In [32]: doubles = [2 * x for x in range(8)]
In [33]: doubles
Out[33]: [0, 2, 4, 6, 8, 10, 12, 14]
With this syntax, we can simplify the for loop that builds epsilon_values in generate_data() into
epsilon_values = [generator_type(0, 1) for i in range(n)]
Using the Scientific Libraries As discussed at the start of the lecture, our example is somewhat
contrived
In practice we would use the scientific libraries, which can generate large arrays of independent
random draws much more efficiently
For example, try
In [34]: from numpy.random import randn
In [35]: epsilon_values = randn(5)  # Generate 5 standard normal draws
In [36]: epsilon_values
Out[36]: array([-0.15591709, -1.42157676, -0.67383208, -0.45932047, -0.17041278])
Exercises

Exercise 1 Recall that n! is read as "n factorial" and defined as n! = n × (n − 1) × · · · × 2 × 1
There are functions to compute this in various modules, but let's write our own version as an
exercise
In particular, write a function factorial such that factorial(n) returns n! for any positive integer
n
Exercise 2 The binomial random variable Y ∼ Bin(n, p) represents the number of successes in n
binary trials, where each trial succeeds with probability p
Without any import besides from random import uniform, write a function binomial_rv such
that binomial_rv(n, p) generates one draw of Y
Hint: If U is uniform on (0, 1) and p ∈ (0, 1), then the expression U < p evaluates to True with
probability p
Exercise 4 Write a program that prints one realization of the following random device:
• Flip an unbiased coin 10 times
• If 3 consecutive heads occur one or more times within this sequence, pay one dollar
• If not, pay nothing
Use no import besides from random import uniform
Exercise 5 Your next task is to simulate and plot the correlated time series
x_{t+1} = α x_t + ε_{t+1}    where x_0 = 0 and t = 0, . . . , T
The sequence of shocks {ε_t} is assumed to be iid and standard normal
Exercise 6 To do the next exercise, you will need to know how to produce a plot legend
The following example should be sufficient to convey the idea
from pylab import plot, show, legend
from random import normalvariate

x = [normalvariate(0, 1) for i in range(100)]
plot(x, 'b-', label="white noise")
legend()
show()
Now, starting with your solution to exercise 5, plot three simulated time series, one for each of the
cases α = 0, α = 0.8 and α = 0.98
In particular, you should produce (modulo randomness) a figure that looks as follows
(The figure nicely illustrates how time series with the same one-step-ahead conditional volatilities,
as these three processes have, can have very different unconditional volatilities.)
In your solution, please restrict your import statements to
import matplotlib.pyplot as plt
from random import normalvariate
• If you call the plot() function multiple times before calling show(), all of the lines you
produce will end up on the same figure
– And if you omit the argument 'b-' to the plot function, Matplotlib will automatically
select different colors for each line
• The expression 'foo' + str(42) evaluates to 'foo42'
Solutions
Solution notebook
Contents
• Python Essentials
– Overview
– Data Types
– Imports
– Input and Output
– Iterating
– Comparisons and Logical Operators
– More Functions
– Coding Style and PEP8
– Exercises
– Solutions
In this lecture we’ll cover features of the language that are essential to reading and writing Python
code
Overview
Topics:
• Data types
• Imports
• Basic file I/O
• The Pythonic approach to iteration
• More on user-defined functions
• Comparisons and logic
• Standard Python style
Data Types
So far we’ve briefly met several common data types, such as strings, integers, floats and lists
Let’s learn a bit more about them
Primitive Data Types A particularly simple data type is Boolean values, which can be either
True or False
In [1]: x = True
In [2]: y = 100 < 10  # Python evaluates the expression on the right and binds y to its value
In [3]: y
Out[3]: False
In [4]: type(y)
Out[4]: bool
Booleans can be used in arithmetic, where True is treated as 1 and False as 0
In [6]: x * y
Out[6]: 0
This is useful for counting
In [8]: bools = [True, True, False, True]  # A list of Boolean values
In [9]: sum(bools)
Out[9]: 3
The two most common data types used to represent numbers are integers and floats
In [1]: a, b = 1, 2
In [2]: c, d = 2.5, 10.0
In [3]: type(a)
Out[3]: int
In [4]: type(c)
Out[4]: float
Computers distinguish between the two because, while floats are more informative, internal arith-
metic operations on integers are more straightforward
Warning: Be careful: If you’re still using Python 2.x, division of two integers returns only the
integer part
To clarify:
In [5]: 1 / 2 # Integer division in Python 2.x
Out[5]: 0
In [10]: x = complex(1, 2)
In [11]: y = complex(2, 1)
In [12]: x * y
Out[12]: 5j
There are several more primitive data types that we’ll introduce as necessary
Containers Python has several basic types for storing collections of (possibly heterogeneous)
data
We have already discussed lists
A related data type is tuples, which are “immutable” lists
In [13]: x = ('a', 'b') # Round brackets instead of the square brackets
In [15]: x
Out[15]: ('a', 'b')
In [16]: type(x)
Out[16]: tuple
In Python, an object is called “immutable” if, once created, the object cannot be changed
Lists are mutable while tuples are not
In [17]: x = [1, 2]  # Lists are mutable
In [18]: x[0] = 10   # Modify the first element
In [19]: x
Out[19]: [10, 2]
In [20]: x = (1, 2)  # Tuples are immutable
In [21]: x[0] = 10   # Raises TypeError: 'tuple' object does not support item assignment
We’ll say more about mutable vs immutable a bit later, and explain why the distinction is impor-
tant
Tuples (and lists) can be “unpacked” as follows
In [21]: integers = (10, 20, 30)
In [22]: x, y, z = integers
In [23]: x
Out[23]: 10
In [24]: y
Out[24]: 20
Slice Notation To access multiple elements of a list or tuple, you can use Python’s slice notation
For example,
In [14]: a = [2, 4, 6, 8]
In [15]: a[1:]
Out[15]: [4, 6, 8]
In [16]: a[1:3]
Out[16]: [4, 6]
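Negative integers index from the end of the sequence; for example

In [17]: a[-2:]  # The last two elements of the list
Out[17]: [6, 8]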
Sets and Dictionaries Two other container types we should mention before moving on are sets
and dictionaries
Dictionaries are much like lists, except that the items are named instead of numbered
In [25]: d = {'name': 'Frodo', 'age': 33}
In [26]: type(d)
Out[26]: dict
In [27]: d['age']
Out[27]: 33
Sets are unordered collections without duplicates, and set methods provide the usual set theoretic
operations
In [28]: s1 = {'a', 'b'}
In [29]: type(s1)
Out[29]: set
In [30]: s2 = {'b', 'c'}
In [31]: s1.issubset(s2)
Out[31]: False
In [32]: s1.intersection(s2)
Out[32]: set(['b'])
In [33]: s3 = set(('foo', 'bar', 'foo'))  # The set() function creates sets from sequences
In [34]: s3
Out[34]: set(['foo', 'bar'])  # Unique elements only
Imports
From the start, Python has been designed around the twin principles of
• a small core language
• extra functionality in separate libraries or modules
For example, if you want to compute the square root of an arbitrary number, there’s no built in
function that will perform this for you
Instead, you need to import the functionality from a module — in this case a natural choice is math
In [1]: import math
In [2]: math.sqrt(4)
Out[2]: 2.0
Alternatively, we can bring everything in math into the current namespace
In [3]: from math import *
In [4]: sqrt(4)
Out[4]: 2.0
Here from math import * pulls all of the functionality of math into the current “namespace” — a
concept we’ll define formally later on
Actually this kind of syntax should be avoided for the most part
In essence the reason is that it pulls in lots of variable names without explicitly listing them — a
potential source of conflicts
Input and Output Let's briefly review reading and writing to text files, starting with writing

In [35]: f = open('newfile.txt', 'w')  # Open 'newfile.txt' for writing
In [36]: f.write('Testing\n')          # Here '\n' means new line
In [37]: f.write('Testing again')
In [38]: f.close()
Here
• The built-in function open() creates a file object for writing to
• Both write() and close() are methods of file objects
Where is this file that we’ve created?
Recall that Python maintains a concept of the present working directory (pwd) that can be located
by
import os
print(os.getcwd())
We can read the contents of newfile.txt as follows

In [39]: f = open('newfile.txt', 'r')
In [40]: out = f.read()
In [41]: out
Out[41]: 'Testing\nTesting again'
In [42]: print(out)
Testing
Testing again
Paths Note that if newfile.txt is not in the present working directory then this call to open()
fails
In this case you can either specify the full path to the file
In [43]: f = open('insert_full_path_to_file/newfile.txt', 'r')
or change the present working directory to the location of the file via os.chdir('path_to_file')
(In IPython, use cd to change directories)
Details are OS specific – a Google search on paths and Python should yield plenty of examples
Iterating
One of the most important tasks in computing is stepping through a sequence of data and per-
forming a given action
One of Python’s strengths is its simple, flexible interface to this kind of iteration via the for loop
Looping over Different Objects Many Python objects are "iterable", in the sense that they can be
looped over
To give an example, consider the file us_cities.txt, which lists US cities and their population
new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229
Suppose that we want to make the information more readable, by capitalizing names and adding
commas to mark thousands
The program us_cities.py reads the data in and makes the conversion:
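A version consistent with the discussion below is

1 f = open('us_cities.txt', 'r')
2 for line in f:
3     city, population = line.split(':')            # Tuple unpacking
4     city = city.title()                           # Capitalize city names
5     population = '{0:,}'.format(int(population))  # Add commas to numbers
6     print(city.ljust(15) + population)
7 f.close()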
Here format() is a string method used for inserting variables into strings
The output is as follows
New York 8,244,910
Los Angeles 3,819,702
Chicago 2,707,120
Houston 2,145,146
Philadelphia 1,536,471
Phoenix 1,469,471
San Antonio 1,359,758
San Diego 1,326,179
Dallas 1,223,229
The reformatting of each line is the result of three different string methods, the details of which
can be left till later
The interesting part of this program for us is line 2, which shows that
1. The file object f is iterable, in the sense that it can be placed to the right of in within a for
loop
2. Iteration steps through each line in the file
This leads to the clean, convenient syntax shown in our program
Many other kinds of objects are iterable, and we’ll discuss some of them later on
Looping without Indices One thing you might have noticed is that Python tends to favor loop-
ing without explicit indexing
For example,
x_values = [1, 2, 3]  # Some iterable x
for x in x_values:
    print(x * x)
is preferred to
for i in range(len(x_values)):
    print(x_values[i] * x_values[i])
When you compare these two alternatives, you can see why the first one is preferred
Python provides some facilities to simplify looping without indices
One is zip(), which is used for stepping through pairs from two sequences
For example, try running the following code
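# Step through two sequences in lockstep with zip()
countries = ('Japan', 'Korea', 'China')
cities = ('Tokyo', 'Seoul', 'Beijing')
for country, city in zip(countries, cities):
    print('The capital of {0} is {1}'.format(country, city))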
The zip() function is also useful for creating dictionaries — for example
In [1]: names = ['Tom', 'John']
In [2]: marks = ['E', 'F']
In [3]: dict(zip(names, marks))
Out[3]: {'John': 'F', 'Tom': 'E'}
If we actually need the index from a list, one option is to use enumerate()
To understand what enumerate() does, consider the following example
letter_list = ['a', 'b', 'c']
for index, letter in enumerate(letter_list):
    print("letter_list[{0}] = '{1}'".format(index, letter))
Comparisons Many different kinds of expressions evaluate to one of the Boolean values (i.e.,
True or False)
A common type is comparisons, such as
In [44]: x, y = 1, 2
In [45]: x < y
Out[45]: True
In [46]: x > y
Out[46]: False
In [49]: x = 1 # Assignment
In [50]: x == 2 # Comparison
Out[50]: False
Note that when testing conditions, we can use any valid Python expression
In [52]: x = 'yes' if 42 else 'no'
In [53]: x
Out[53]: 'yes'
In [54]: x = 'yes' if [] else 'no'  # [] is "falsy", so the else branch is taken
In [55]: x
Out[55]: 'no'
Remember
• P and Q is True if both are True, else False
• P or Q is False if both are False, else True
More Functions
Let’s talk a bit more about functions, which are all-important for good programming style
Python has a number of built-in functions that are available without import
We have already met some
In [61]: max(19, 20)
Out[61]: 20
In [62]: list(range(4))  # In Python 3, range() returns a range object
Out[62]: [0, 1, 2, 3]
In [63]: str(22)
Out[63]: '22'
In [64]: type(22)
Out[64]: int
Why Write Functions? User defined functions are important for improving the clarity of your
code by
• separating different strands of logic
• facilitating code reuse
(Writing the same thing twice is almost always a bad idea)
The basics of user defined functions were discussed here
The Flexibility of Python Functions As we discussed in the previous lecture, Python functions
are very flexible
In particular
• Any number of functions can be defined in a given file
• Any object can be passed to a function as an argument, including other functions
• Functions can be (and often are) defined inside other functions
Functions without a return statement automatically return the special Python object None
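For example, a function whose body only prints returns None (a quick illustration)

In [65]: def f(x): print(x)
In [66]: result = f('hello')
hello
In [67]: print(result)
None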
Docstrings Python has a system for adding comments to functions, modules, etc. called doc-
strings
The nice thing about docstrings is that they are available at run-time
For example, let’s say that this code resides in file temp.py
# Filename: temp.py
def f(x):
    """
    This function squares its argument
    """
    return x**2
After it has been run in the IPython shell, the docstring is available as follows
In [1]: run temp.py
In [2]: f?
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument
In [3]: f??
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
    """
    This function squares its argument
    """
    return x**2
With one question mark we bring up the docstring, and with two we get the source code as well
One-Line Functions: lambda The lambda keyword is used to create simple functions on one line
For example, the definitions
def f(x):
    return x**3
and
f = lambda x: x**3
are entirely equivalent
Here the function created by lambda is said to be anonymous, because it was never given a name
Keyword Arguments If you did the exercises in the previous lecture, you would have come across
the statement
plt.plot(x, 'b-', label="white noise")
In this call to Matplotlib’s plot function, notice that the last argument is passed in name=argument
syntax
This is called a keyword argument, with label being the keyword
Non-keyword arguments are called positional arguments, since their meaning is determined by
order
• plot(x, 'b-', label="white noise") is different from plot('b-', x, label="white noise")
Keyword arguments are particularly useful when a function has a lot of arguments, in which case
it’s hard to remember the right order
You can adopt keyword arguments in user defined functions with no difficulty
The next example illustrates the syntax
def f(x, coefficients=(1, 1)):
    a, b = coefficients
    return a + b * x
Notice that the keyword argument values we supplied in the definition of f become the default
values
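For example, a quick check of how the defaults work (illustrative session)

In [15]: f(2)  # Uses default coefficients (1, 1): returns 1 + 1 * 2
Out[15]: 3
In [16]: f(2, coefficients=(3, 4))  # Returns 3 + 4 * 2
Out[16]: 11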
Coding Style and PEP8 To learn more about the Python programming philosophy type import this at the prompt
Among other things, Python strongly favors consistency in programming style
We’ve all heard the saying about consistency and little minds
In programming, as in mathematics, the opposite is true
• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to
read, even if the author told you so on the first page
In Python, the standard style is set out in PEP8
(Occasionally we’ll deviate from PEP8 in these lectures to better match mathematical notation)
Exercises
Exercise 1 Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute
their inner product using zip()
Part 2: In one line, count the number of even numbers in 0,...,99
• Hint: x % 2 returns 0 if x is even, 1 otherwise
Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of pairs (a, b)
such that both a and b are even
Exercise 2 Consider the polynomial
p(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n    (1.1)
Write a function p such that p(x, coeff) computes the value in (1.1) given a point x and a list
of coefficients coeff
Try to use enumerate() in your loop
Exercise 3 Write a function that takes a string as an argument and returns the number of capital
letters in the string
Hint: ’foo’.upper() returns ’FOO’
Exercise 4 Write a function that takes two sequences seq_a and seq_b as arguments and returns
True if every element in seq_a is also an element of seq_b, else False
• By “sequence” we mean a list, a tuple or a string
• Do the exercise without using sets and set methods
Exercise 5 When we cover the numerical libraries, we will see they include many alternatives
for interpolation and function approximation
Nevertheless, let’s write our own function approximation routine as an exercise
In particular, without using any imports, write a function linapprox that takes as arguments
• A function f mapping some interval [ a, b] into R
• two scalars a and b providing the limits of this interval
• An integer n determining the number of grid points
• A number x satisfying a <= x <= b
and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points a =
point[0] < point[1] < ... < point[n-1] = b
Aim for clarity, not efficiency
Solutions
Solution notebook
Contents
• Object Oriented Programming
– Overview
– About OOP
– Defining Your Own Classes
– Special Methods
– Exercises
– Solutions
Overview
OOP is one of the major paradigms in programming, and nicely supported in Python
OOP has become an important concept in modern software engineering because
• It can help facilitate clean, efficient code (when used well)
• The OOP design pattern fits well with the human brain
OOP is about producing well organized code — an important determinant of productivity
Moreover, OOP is a part of Python, and to progress further it’s necessary to understand the basics
About OOP
Key Concepts The traditional (non-OOP) paradigm is called procedural, and works as follows
• The program has a state that contains the values of its variables
• Functions are called to act on these data according to the task
• Data are passed back and forth via function calls
In contrast, in the OOP paradigm, data and functions are bundled together into “objects”
An example is a Python list, which not only stores data, but also knows how to sort itself, etc.
In [1]: x = [1, 5, 4]
In [2]: x.sort()
In [3]: x
Out[3]: [1, 4, 5]
Standard Terminology A class definition is a blueprint for a particular class of objects (e.g., lists,
strings or complex numbers)
It describes
• What kind of data the class stores
• What methods it has for acting on these data
In Python, the data and methods of an object are collectively referred to as attributes, accessed via "dotted attribute notation"
For example, if x is a list
In [4]: x = [1, 5, 4]
In [5]: x.sort()
In [6]: x.__class__
Out[6]: list
• x is an object or instance, created from the definition for Python lists, but with its own par-
ticular data
• x.sort() and x.__class__ are two attributes of x
• dir(x) can be used to view all the attributes of x
Why is OOP Useful? OOP is useful for the same reason that abstraction is useful: for recogniz-
ing and organizing common structures
• E.g., a general equilibrium theory consists of a commodity space, preferences, technologies,
and a common equilibrium definition
• E.g., a game consists of a list of players, lists of actions available to each player, payoffs for
each player as functions of all players’ actions, and a timing protocol
For an example more relevant to OOP, consider the open windows on your desktop
Windows have common functionality and individual data, which makes them suitable for imple-
menting with OOP
• individual data: contents of specific windows
• common functionality: closing, maximizing, etc.
Your window manager almost certainly uses OOP to generate and manage these windows
• individual windows created as objects / instances from a class definition, with their own
data
• common functionality implemented as methods, which all of these objects share
Consider, for example, a script containing the two imports
from os import path
from sys import path
At this point, both variables have been brought into the global namespace, and the second will
shadow the first
A better idea is to replace the above with
import os
import sys
and then reference the path you want with either os.path or sys.path
In this example, we see that modules provide one means of data encapsulation
As will now become clear, OOP provides another
Defining Your Own Classes As a first step, let's define the simplest possible class, one that does nothing except exist

class Consumer:
    pass

Having run this code, we can create an instance and give it data

In [2]: c1 = Consumer()  # Create an instance c1 of the empty Consumer class
In [3]: c1.wealth = 10
In [4]: c1.wealth
Out[4]: 10
Comments on notation:
• The class keyword indicates that we are building a class
• The pass keyword is used in Python to stand in for an empty code block
Notice the flexibility of Python:
• We don’t actually need to specify what attributes a class will have
Example: Another Consumer Class Let’s build a Consumer class with more structure:
• A wealth attribute that stores the consumer’s wealth (data)
• An earn method, where earn(y) increments the consumer’s wealth by y
• A spend method, where spend(x) either decreases wealth by x or returns an error if insuffi-
cient funds exist
Admittedly a little contrived, this example of a class helps us internalize some new syntax
Here’s one implementation, from file consumer.py in the applications repository
class Consumer:
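
    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def spend(self, x):
        "The consumer spends x dollars if feasible"
        new_wealth = self.wealth - x
        if new_wealth < 0:
            print("Insufficient funds")
        else:
            self.wealth = new_wealth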
Calling __init__ sets up a “namespace” to hold the instance data — more on this soon
We’ll also discuss the role of self just below
Usage Here’s an example of usage, assuming consumer.py is in your present working directory
In [1]: run consumer.py
In [2]: c1 = Consumer(10)  # Create instance with initial wealth 10
In [3]: c1.spend(5)
In [4]: c1.wealth
Out[4]: 5
In [5]: c1.earn(15)
In [6]: c1.spend(100)
Insufficient funds
We can of course create multiple instances each with its own data
In [2]: c1 = Consumer(10)
In [3]: c2 = Consumer(12)
In [4]: c2.spend(4)
In [5]: c2.wealth
Out[5]: 8
In [6]: c1.wealth
Out[6]: 10
Each instance stores its data in a separate namespace dictionary
In [7]: c1.__dict__
Out[7]: {'wealth': 10}
In [8]: c2.__dict__
Out[8]: {'wealth': 8}
When we access or set attributes we’re actually just modifying the dictionary maintained by the
instance
Self If you look at the Consumer class definition again you’ll see the word self throughout the
code
The rules with self are that
• Any instance data should be prepended with self
– e.g., the earn method references self.wealth rather than just wealth
• Any method defined within the class should have self as its first argument
– e.g., def earn(self, y) rather than just def earn(y)
• Any method referenced within the class should be called as self.method_name
There are no examples of the last rule in the preceding code but we will see some shortly
Details In this section we look at some more formal details related to classes and self
• You might wish to skip to the next section on first pass of this lecture
• You can return to these details after you’ve familiarized yourself with more examples
Methods actually live inside a class object formed when the interpreter reads the class definition
In [1]: run consumer.py # Read class def, build class object Consumer
In [2]: print(Consumer.__dict__)  # Show __dict__ attribute of the class object
Note how the three methods __init__, earn and spend are stored in the class object
Consider the following code
In [2]: c1 = Consumer(10)
In [3]: c1.earn(10)
In [4]: c1.wealth
Out[4]: 20
When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argument 10 to
Consumer.earn
In fact the following are equivalent
• c1.earn(10)
• Consumer.earn(c1, 10)
In the function call Consumer.earn(c1, 10) note that c1 is the first argument
Recall that in the definition of the earn method, self is the first parameter
def earn(self, y):
    "The consumer earns y dollars"
    self.wealth += y
The end result is that self is bound to the instance c1 inside the function call
That’s why the statement self.wealth += y inside earn ends up modifying c1.wealth
Example: The Solow Growth Model For our next example, let’s write a simple class to imple-
ment the Solow growth model
The Solow growth model is a neoclassical growth model where the amount of capital stock per
capita k t evolves according to the rule
k_{t+1} = (s z k_t^α + (1 − d) k_t) / (1 + n)    (1.2)
Here
• s is an exogenously given savings rate
• z is a productivity parameter
• α is capital’s share of income
• n is the population growth rate
• d is the depreciation rate
The steady state of the model is the k that solves (1.2) when k t+1 = k t = k
While the QuantEcon.applications package already has some relatively sophisticated code for
dealing with this model, here we’ll create something more basic for illustrative purposes
You can find it file solow.py in the applications repository
"""
Filename: solow.py
Reference: http://quant-econ.net/py/python_oop.html
"""
from __future__ import division # Omit for Python 3.x
import numpy as np
class Solow:
r"""
Implements the Solow growth model with update rule
.. math::
k_{t+1} = \frac{s z k^{\alpha}_t}{1 + n} + k_t \frac{1 + d}{1 + n}
"""
    def h(self, x):
        "Evaluate the h function at x"
        temp = self.s * self.z * x**self.alpha + x * (1 - self.d)
        return temp / (1 + self.n)

    def update(self):
        "Update the current state (i.e., the capital stock)."
        self.k = self.h(self.k)

    def steady_state(self):
        "Compute the steady state value of capital."
        return ((self.s * self.z) / (self.n + self.d))**(1 / (1 - self.alpha))
Example: A Market Next let’s write a class for a simple one good market where agents are price
takers
The market consists of the following objects:
• A linear demand curve Q = a_d − b_d p
• A linear supply curve Q = a_z + b_z (p − t)
Here
• p is price paid by the consumer, Q is quantity, and t is a per unit tax
• Other symbols are demand and supply parameters
The class provides methods to compute various values of interest, including competitive equilib-
rium price and quantity, tax revenue raised, consumer surplus and producer surplus
Here’s our implementation
"""
Filename: market.py
Reference: http://quant-econ.net/py/python_oop.html
"""
from scipy.integrate import quad

class Market:

    def __init__(self, ad, bd, az, bz, tax):
        """
        Set up market parameters. All parameters are scalars. See
        http://quant-econ.net/py/python_oop.html for interpretation.
        """
        self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
        if ad < az:
            raise ValueError('Insufficient demand.')
def price(self):
"Return equilibrium price"
return (self.ad - self.az + self.bz*self.tax)/(self.bd + self.bz)
def quantity(self):
"Compute equilibrium quantity"
return self.ad - self.bd * self.price()
def consumer_surp(self):
"Compute consumer surplus"
# == Compute area under inverse demand function == #
integrand = lambda x: (self.ad/self.bd) - (1/self.bd)* x
area, error = quad(integrand, 0, self.quantity())
return area - self.price() * self.quantity()
def producer_surp(self):
"Compute producer surplus"
# == Compute area above inverse supply curve, excluding tax == #
integrand = lambda x: -(self.az/self.bz) + (1/self.bz) * x
area, error = quad(integrand, 0, self.quantity())
return (self.price() - self.tax) * self.quantity() - area
def taxrev(self):
"Compute tax revenue"
return self.tax * self.quantity()
def inverse_demand(self,x):
"Compute inverse demand"
return self.ad/self.bd - (1/self.bd)* x
def inverse_supply(self,x):
"Compute inverse supply curve"
return -(self.az/self.bz) + (1/self.bz) * x + self.tax
def inverse_supply_no_tax(self,x):
"Compute inverse supply curve without tax"
return -(self.az/self.bz) + (1/self.bz) * x
Here's a short program that uses this class to plot an inverse demand curve together with the inverse supply curves with and without the tax

import matplotlib.pyplot as plt
import numpy as np
from market import Market

baseline_params = 15, .5, -2, .5, 3   # ad, bd, az, bz, tax (illustrative values)
m = Market(*baseline_params)
q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)
fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()
def deadw(m):
"Computes deadweight loss for market m."
# == Create analogous market with no tax == #
m_no_tax = Market(m.ad, m.bd, m.az, m.bz, 0)
# == Compare surplus, return difference == #
surp1 = m_no_tax.consumer_surp() + m_no_tax.producer_surp()
surp2 = m.consumer_surp() + m.producer_surp() + m.taxrev()
return surp1 - surp2
Here it is in action

In [7]: m = Market(*baseline_params)

In [8]: deadw(m)   # Deadweight loss from the per unit tax
Example: Chaos Let's look at one more example, related to chaotic dynamics in nonlinear systems

One simple transition rule that can generate complex dynamics is the logistic map

x_{t+1} = r x_t (1 − x_t)

Let's write a class for generating time series from this model
Here’s one implementation, in file chaos_class.py
"""
Filename: chaos_class.py
Reference: http://quant-econ.net/py/python_oop.html
"""
class Chaos:
"""
Models the dynamical system with :math:`x_{t+1} = r x_t (1 - x_t)`
"""
def __init__(self, x0, r):
"""
Initialize with state x0 and parameter r
"""
self.x, self.r = x0, r
    def update(self):
        "Apply the map to update the state."
        self.x = self.r * self.x * (1 - self.x)

    def generate_sequence(self, ts_length):
        "Generate and return a trajectory of length ts_length."
        path = []
        for i in range(ts_length):
            path.append(self.x)
            self.update()
        return path
Here's an example of usage, which plots a time series of length 250 (assuming the class is saved in chaos_class.py)

import matplotlib.pyplot as plt
from chaos_class import Chaos

ch = Chaos(0.1, 4.0)
ts_length = 250
fig, ax = plt.subplots()
ax.set_xlabel(r'$t$', fontsize=14)
ax.set_ylabel(r'$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label=r'$x_t$')
plt.show()
Next, the following piece of code uses the same class to generate a bifurcation diagram, recording long-run outcomes for a grid of r values

fig, ax = plt.subplots()
ch = Chaos(0.1, 4)
r = 2.5
while r < 4:
ch.r = r
t = ch.generate_sequence(1000)[950:]
ax.plot([r] * len(t), t, 'b.', ms=0.6)
r = r + 0.005
ax.set_xlabel(r'$r$', fontsize=16)
plt.show()
Special Methods
Python provides special methods with which some neat tricks can be performed
For example, recall that lists and tuples have a notion of length, and that this length can be queried
via the len function
In [21]: x = (10, 20)
In [22]: len(x)
Out[22]: 2
If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method
class Foo:
def __len__(self):
return 42
Now we get
In [23]: f = Foo()
In [24]: len(f)
Out[24]: 42
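Another handy special method is __call__, which makes instances callable like functions; here's a minimal sketch (the class and return value are just for illustration):

class Foo:
    def __call__(self, x):
        return x + 42

Now we get

In [25]: f = Foo()

In [26]: f(8)   # Python runs f.__call__(8)
Out[26]: 50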
Exercises

Exercise 1 The empirical cumulative distribution function (ecdf) of a sample reports, for each point x, the fraction of observations less than or equal to x

The task is to implement it as a class called ECDF, whose instances store the sample and are callable, so that usage looks as follows

In [30]: F = ECDF(samples)

In [33]: F(0.5)   # Evaluate the ecdf at x = 0.5
Out[33]: 0.479
This exercise is an extension, where the task is to build a simple class called Polynomial for representing and manipulating polynomial functions such as

p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^{N} a_n x^n    (x ∈ R)    (1.5)

The instance data for the class Polynomial will be the coefficients (in the case of (1.5), the numbers a_0, ..., a_N)
Provide methods that
1. Evaluate the polynomial (1.5), returning p( x ) for any x
2. Differentiate the polynomial, replacing the original coefficients with those of its derivative p'
Avoid using any import statements
Solutions
Solution notebook
1.6 How it Works: Data, Variables and Names

Contents
• How it Works: Data, Variables and Names
– Overview
– Objects
– Iterables and Iterators
– Names and Name Resolution
Overview
The objective of the lecture is to provide a deeper understanding of Python's execution model
Understanding these details is important for writing larger programs
You should feel free to skip this material on first pass and continue on to the applications
We provide this material mainly as a reference, for you to return to occasionally as you build your Python skills
Objects
Objects are usually thought of as instances of some class definition, typically combining both data
and methods (functions)
For example
In [1]: x = ['foo', 'bar']
creates (an instance of) a list, possessing various methods (append, pop, etc.)
In Python everything in memory is treated as an object
This includes not just lists, strings, etc., but also less obvious things, such as
• functions (once they have been read into memory)
• modules (ditto)
• files opened for reading or writing
• integers, etc.
At this point it is helpful to have a clearer idea of what an object is in Python
In Python, an object is a collection of data and instructions held in computer memory that consists
of
1. a type
2. some content
3. a unique identity
4. zero or more methods
These concepts are discussed sequentially in the remainder of this section
Type Python understands and provides for different types of objects, to accommodate different
types of data
The type of an object can be queried via type(object_name)
For example
In [2]: s = 'This is a string'
In [3]: type(s)
Out[3]: str
In [4]: x = 42

In [5]: type(x)
Out[5]: int
For example, consider the expression '300' + 400

Here we are mixing types, and it's unclear to Python whether the user wants to
• convert '300' to an integer and then add it to 400, or
• convert 400 to a string and then concatenate it with '300'
Some languages might try to guess, but Python is strongly typed
• Type is important, and implicit type conversion is rare
• Python will respond instead by raising a TypeError

In [6]: '300' + 400
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-9b7dffd27f2d> in <module>()
----> 1 '300' + 400

TypeError: Can't convert 'int' object to str implicitly
To avoid the error, you need to clarify by changing the relevant type
For example,
In [9]: int('300') + 400 # To add as numbers, change the string to an integer
Out[9]: 700
Content As an example of content, let's create the integer 42 and examine what is stored with it

In [10]: x = 42

In [11]: x
Out[11]: 42

In [12]: x.imag
Out[12]: 0

In [13]: x.__class__
Out[13]: int
When Python creates this integer object, it stores with it various auxiliary information, such as the
imaginary part, and the type
As discussed previously, any name following a dot is called an attribute of the object to the left of
the dot
• For example, imag and __class__ are attributes of x
Identity In Python, each object has a unique identifier, which helps Python (and us) keep track
of the object
The identity of an object can be obtained via the id() function
In [14]: y = 2.5
In [15]: z = 2.5
In [16]: id(y)
Out[16]: 166719660
In [17]: id(z)
Out[17]: 166719740
In this example, y and z happen to have the same value (i.e., 2.5), but they are not the same object
The identity of an object is in fact just the address of the object in memory
Methods As discussed earlier, methods are functions that are bundled with objects
Formally, methods are attributes of objects that are callable (i.e., can be called as functions)
In [18]: x = ['foo', 'bar']
In [19]: callable(x.append)
Out[19]: True
In [20]: callable(x.__doc__)
Out[20]: False
Methods typically act on the data contained in the object they belong to, or combine that data with
other data
In [21]: x = ['a', 'b']
In [22]: x.append('c')
In [24]: s.upper()
Out[24]: 'THIS IS A STRING'
In [25]: s.lower()
Out[25]: 'this is a string'
In [27]: x = ['a', 'b']

In [28]: x[0] = 'aa'   # Item assignment using square bracket notation

In [29]: x
Out[29]: ['aa', 'b']
It doesn’t look like there are any methods used here, but in fact the square bracket assignment
notation is just a convenient interface to a method call
What actually happens is that Python calls the __setitem__ method, as follows
In [30]: x = ['a', 'b']

In [31]: x.__setitem__(0, 'aa')   # Equivalent to x[0] = 'aa'

In [32]: x
Out[32]: ['aa', 'b']
(If you wanted to you could modify the __setitem__ method, so that square bracket assignment
does something totally different)
Everything is an Object Above we said that in Python everything is an object—let’s look at this
again
Consider, for example, functions
When Python reads a function definition, it creates a function object and stores it in memory
The following code illustrates
In [33]: def f(x): return x**2
In [34]: f
Out[34]: <function __main__.f>
In [35]: type(f)
Out[35]: function
In [36]: id(f)
Out[36]: 3074342220L
In [37]: f.__name__
Out[37]: 'f'
We can see that f has type, identity, attributes and so on—just like any other object
Modules loaded into memory are also treated as objects

In [38]: import math

In [39]: id(math)
Out[39]: 3074329380L
This uniform treatment of data in Python (everything is an object) helps keep the language simple
and consistent
Iterables and Iterators One of the most important object types related to loops is the iterator: an object with a __next__ method

File objects provide one example; if f is a file object created by calling open() on a text file of city populations, then

In [2]: f.__next__()
Out[2]: 'new york: 8244910\n'
In [3]: f.__next__()
Out[3]: 'los angeles: 3819702\n'
We see that file objects do indeed have a __next__ method, and that calling this method returns
the next line in the file
The next method can also be accessed via the builtin function next(), which directly calls this
method
In [4]: next(f)
Out[4]: 'chicago: 2707120 \n'
Many objects returned by built-in functions are also iterators; for example, enumerate() returns one

In [43]: e = enumerate(['foo', 'bar'])

In [44]: next(e)
Out[44]: (0, 'foo')
In [45]: next(e)
Out[45]: (1, 'bar')
Objects created by csv.reader are iterators too; here nikkei_data is the result of calling csv.reader on an open CSV file of Nikkei index data

In [49]: next(nikkei_data)
Out[49]: ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
In [50]: next(nikkei_data)
Out[50]: ['2009-05-21', '9280.35', '9286.35', '9189.92', '9264.15', '133200', '9264.15']
Iterators in For Loops All iterators can be placed to the right of the in keyword in for loop
statements
In fact this is how the for loop works: If we write

for x in iterator:
    <code block>

then the interpreter repeatedly calls iterator.__next__(), binding x to the result and executing the code block, until a StopIteration error is raised
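In essence, the loop is shorthand for repeated calls to next(), along these lines (here iterator stands for any iterator object):

while True:
    try:
        x = next(iterator)    # Get the next element
    except StopIteration:     # Raised when the iterator is exhausted
        break
    # the body of the loop runs here, with x bound to the element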
Iterables You already know that we can put a Python list to the right of in in a for loop
for i in range(2):
print('foo')
Lists are one example of an iterable, but they are not themselves iterators: a list has no __next__ method

In [15]: x = ['foo', 'bar']

In [16]: type(x)
Out[16]: list

In [17]: next(x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-5e4e57af3a97> in <module>()
----> 1 next(x)

TypeError: 'list' object is not an iterator
However, any iterable can be converted into an iterator using the built-in function iter()

In [59]: x = ['foo', 'bar']

In [60]: type(x)
Out[60]: list
In [61]: y = iter(x)
In [62]: type(y)
Out[62]: list_iterator
In [63]: next(y)
Out[63]: 'foo'
In [64]: next(y)
Out[64]: 'bar'
In [65]: next(y)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-62-75a92ee8313a> in <module>()
----> 1 next(y)
StopIteration:
One nuance: not every object can be converted, since applying iter() to a non-iterable such as an integer (e.g., iter(42)) raises a TypeError
Iterators and built-ins Some built-in functions that act on sequences also work with iterables
• max(), min(), sum(), all(), any()
For example
In [67]: x = [10, -10]
In [68]: max(x)
Out[68]: 10
In [69]: y = iter(x)
In [70]: type(y)
Out[70]: list_iterator
In [71]: max(y)
Out[71]: 10
One thing to remember about iterators is that they are depleted by use
In [72]: x = [10, -10]
In [73]: y = iter(x)
In [74]: max(y)
Out[74]: 10
In [75]: max(y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-72-1d3b6314f310> in <module>()
----> 1 max(y)

ValueError: max() arg is an empty sequence
Names and Name Resolution Consider once more the statement

x = 42

We now know that when this statement is executed, Python creates an object of type int in your computer's memory, containing
• the value 42
• some associated attributes
But what is x itself?
In Python, x is called a name, and the statement x = 42 binds the name x to the integer object we
have just discussed
Under the hood, this process of binding names to objects is implemented as a dictionary—more about this in a moment
There is no problem binding two or more names to the one object, regardless of what that object is
In [77]: def f(string): # Create a function called f
....: print(string) # that prints any string it's passed
In [78]: g = f

In [79]: g('test')
test
In the first step, a function object is created, and the name f is bound to it
After binding the name g to the same object, we can use it anywhere we would use f
What happens when the number of names bound to an object goes to zero?
Here's an example of this situation, where the name x is first bound to one object and then rebound to another

In [81]: x = 'foo'

In [82]: id(x)
Out[82]: 164994764

In [83]: x = 'bar'   # No names are now bound to the string 'foo'
What happens here is that the first object, with identity 164994764, is garbage collected

In other words, the memory slot that stores that object is deallocated and returned to the operating system
We also mentioned that this process of binding x to the correct object is implemented as a dictionary
This dictionary is called a namespace
Definition: A namespace is a symbol table that maps names to objects in memory
Python uses multiple namespaces, creating them on the fly as necessary
For example, every time we import a module, Python creates a namespace for that module
To see this in action, suppose we write a script math2.py like this
# Filename: math2.py
pi = 'foobar'
Next let's import the math module from the standard library

In [86]: import math

In [87]: import math2

In [88]: math2.pi
Out[88]: 'foobar'
These two different bindings of pi exist in different namespaces, each one implemented as a dictionary
We can look at the dictionary directly, using module_name.__dict__
In [89]: import math
In [90]: math.__dict__
Out[90]: {'pow': <built-in function pow>, ..., 'pi': 3.1415926535897931,...} # Edited output
In [92]: math2.__dict__
Out[92]: {..., '__file__': 'math2.py', 'pi': 'foobar',...} # Edited output
As you know, we access elements of the namespace using the dotted attribute notation
In [93]: math.pi
Out[93]: 3.1415926535897931
Viewing Namespaces As we saw above, the math namespace can be printed by typing
math.__dict__
Another way to see its contents is to type vars(math)
In [95]: vars(math)
Out[95]: {'pow': <built-in function pow>,...
If you want to check the name attached to a module's namespace, it's stored in __name__

In [98]: math.__name__
Out[98]: 'math'
Interactive Sessions In Python, all code executed by the interpreter runs in some module
What about commands typed at the prompt?
These are also regarded as being executed within a module — in this case, a module called
__main__
To check this, we can look at the current module name via the value of __name__ given at the
prompt
In [99]: print(__name__)
__main__
When we run a script using IPython’s run command, the contents of the file are executed as part
of __main__ too
To see this, let’s create a file mod.py that prints its own __name__ attribute
# Filename: mod.py
print(__name__)
Running the file via import mod prints mod, since the code runs inside the module's own namespace; in the second case, when we use run mod.py, the code is executed as part of __main__, so __name__ is equal to __main__
To see the contents of the namespace of __main__ we use vars() rather than vars(__main__)
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has initialized when you started up your session
If you prefer to see only the variables you have initialized, use whos
In [3]: x = 2
In [4]: y = 3
In [6]: whos
Variable Type Data/Info
------------------------------
np module <module 'numpy' from '/us<...>ages/numpy/__init__.pyc'>
x int 2
y int 3
The Global Namespace Python documentation often makes reference to the "global namespace"

The global namespace is the namespace of the module currently being executed

For example, suppose that we start the interpreter and begin making assignments

We are now working in the module __main__, and hence the namespace for __main__ is the global namespace
Next, we import a module called amodule
In [7]: import amodule
At this point, the interpreter creates a namespace for the module amodule and starts executing
commands in the module
While this occurs, the namespace amodule.__dict__ is the global namespace
Once execution of the module finishes, the interpreter returns to the module from where the import statement was made

In this case it's __main__, so the namespace of __main__ again becomes the global namespace

Local Namespaces Important fact: When we call a function, the interpreter creates a local namespace for that function, and registers the variables in that namespace
The reason for this will be explained in just a moment
Variables in the local namespace are called local variables
After the function returns, the namespace is deallocated and lost
While the function is executing, we can view the contents of the local namespace with locals()
For example, consider
In [1]: def f(x):
...: a = 2
...: print(locals())
...: return a * x
...:

In [2]: f(1)
{'a': 2, 'x': 1}
Out[2]: 2
The __builtins__ Namespace We have been using various built-in functions, such as max(),
dir(), str(), list(), len(), range(), type(), etc.
How does access to these names work?
• These definitions are stored in a module called builtins (known as __builtin__ in Python 2)
• They have their own namespace called __builtins__
In [12]: dir()
Out[12]: [..., '__builtins__', '__doc__', ...] # Edited output
In [13]: dir(__builtins__)
Out[13]: [... 'iter', 'len', 'license', 'list', 'locals', ...] # Edited output
But __builtins__ is special, because its names can always be accessed directly as well
In [15]: max
Out[15]: <built-in function max>
Name Resolution Namespaces are great because they help us organize variable names
(Type import this at the prompt and look at the last item that’s printed)
However, we do need to understand how the Python interpreter works with multiple namespaces
At any point of execution, there are in fact at least two namespaces that can be accessed directly
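For example, consider the following sketch, with one function defined inside another (the names and values are just for illustration):

def f():
    a = 2
    def g():
        b = 4
        print(a * b)   # a is found in the enclosing namespace of f
    g()

Calling f() prints 8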
Here f is the enclosing function for g, and each function gets its own namespaces
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is
1. the local namespace (if it exists)
2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace
If the name is not in any of these namespaces, the interpreter raises a NameError
This is called the LEGB rule (local, enclosing, global, builtin)
Here’s an example that helps to illustrate
Consider a script test.py that looks as follows
def g(x):
a = 1
x = x + a
return x
a = 0
y = g(10)
print("a = ", a, "y = ", y)
In [18]: x
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-401b30e3b8b5> in <module>()
----> 1 x
First,
• The global namespace {} is created
• The function object is created, and g is bound to it within the global namespace
• The name a is bound to 0, again in the global namespace
Next g is called via y = g(10), leading to the following sequence of actions
• The local namespace for the function is created
• Local names x and a are bound, so that the local namespace becomes {’x’: 10, ’a’: 1}
• Statement x = x + a uses the local a and local x to compute x + a, and binds local name x
to the result
• This value is returned, and y is bound to it in the global namespace
• Local x and a are discarded (and the local namespace is deallocated)
Note that the global a was not affected by the local a
Mutable Versus Immutable Parameters This is a good time to say a little more about mutable
vs immutable objects
Consider the code segment
def f(x):
x = x + 1
return x
x = 1
print(f(x), x)
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as the
value of x
First f and x are registered in the global namespace
The call f(x) creates a local namespace and adds x to it, bound to 1
Next, this local x is rebound to the new integer object 2, and this value is returned
None of this affects the global x
However, it’s a different story when we use a mutable data type such as a list
def f(x):
x[0] = x[0] + 1
return x
x = [1]
print(f(x), x)

This prints [2] as the value of f(x) and also [2] as the value of x

Here the global x is affected, because the statement x[0] = x[0] + 1 mutates the list object that both the global name x and the local name x are bound to
1.7 More Language Features

Contents
• More Language Features
– Overview
– Handling Errors
– Decorators and Descriptors
– Generators
– Recursive Function Calls
– Exercises
– Solutions
Overview
As with the last lecture, our advice is to skip this lecture on first pass, unless you have a burning desire to read it

It's here mainly as a reference, for you to return to as the need arises
Handling Errors
Assertions A relatively easy way to handle checks is with the assert keyword
For example, pretend for a moment that the np.var function doesn’t exist and we need to write
our own
In [19]: def var(y):
....: n = len(y)
....: assert n > 1, 'Sample size must be greater than one.'
....: return np.sum((y - y.mean())**2) / float(n-1)
....:
If we run this with an array of length one, the program will terminate and print our error message
In [20]: var([1])
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-20-0032ff8a150f> in <module>()
----> 1 var([1])
<ipython-input-19-cefafaec3555> in var(y)
1 def var(y):
2 n = len(y)
----> 3 assert n > 1, 'Sample size must be greater than one.'
4 return np.sum((y - y.mean())**2) / float(n-1)
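For contrast, here's the function succeeding on a valid sample (the array values are illustrative):

In [21]: import numpy as np

In [22]: var(np.array([1, 2, 3]))
Out[22]: 1.0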
Handling Errors During Runtime The approach used above is a bit limited, because it always leads to termination

Sometimes we can handle errors more gracefully, by treating special cases

Let's look at how this is done

First, note that not all errors are alike: since illegal syntax cannot be executed at all, a syntax error terminates the program immediately

Runtime errors are different. Here's one kind, unrelated to syntax
In [44]: 1 / 0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-17-05c9758a9c21> in <module>()
----> 1 1/0
Here’s another
In [45]: x1 = y1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-23-142e0509fbd6> in <module>()
----> 1 x1 = y1
And another
In [46]: 'foo' + 6
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-44bbe7e963e7> in <module>()
----> 1 'foo' + 6
And another
In [47]: X = []
In [48]: x = X[0]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-22-018da6d9fc14> in <module>()
----> 1 x = X[0]
Catching Exceptions We can catch and deal with exceptions using try – except blocks
Here’s a simple example
def f(x):
try:
return 1.0 / x
except ZeroDivisionError:
print('Error: division by zero. Returned None')
return None
In [51]: f(0)
Error: division by zero. Returned None
In [52]: f(0.0)
Error: division by zero. Returned None
If we want to handle more than one error type, we can add further except clauses

def f(x):
try:
return 1.0 / x
except ZeroDivisionError:
print('Error: Division by zero. Returned None')
except TypeError:
print('Error: Unsupported operation. Returned None')
return None
In [55]: f(0)
Error: Division by zero. Returned None
In [56]: f('foo')
Error: Unsupported operation. Returned None
If we feel lazy we can also catch these errors together, using a tuple of exception types

def f(x):
    try:
        return 1.0 / x
    except (TypeError, ZeroDivisionError):
        print('Error: Unsupported operation. Returned None')
    return None

In [59]: f(0)
Error: Unsupported operation. Returned None

In [60]: f('foo')
Error: Unsupported operation. Returned None
Decorators and Descriptors Let's look at some special syntax elements that are routinely used by Python developers
You might not need the following concepts immediately, but you will see them in other people’s
code
Hence you need to understand them at some stage of your Python education
Decorators Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to
be popular
It’s very easy to say what decorators do
On the other hand it takes a bit of effort to explain why you might use them
An Example Suppose we are working on a program that looks something like this
import numpy as np
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
Now suppose there's a problem: occasionally negative numbers get fed to f and g in the calculations that follow
If you try it, you’ll see that when these functions are called with negative numbers they return a
NumPy object called nan
Suppose further that this is not what we want because it causes other problems that are hard to
pick up
Suppose that instead we want the program to terminate whenever this happens with a sensible
error message
This change is easy enough to implement
import numpy as np
def f(x):
assert x >= 0, "Argument must be nonnegative"
return np.log(np.log(x))
def g(x):
assert x >= 0, "Argument must be nonnegative"
return np.sqrt(42 * x)
Notice however that there is some repetition here, in the form of two identical lines of code
Repetition makes our code longer and harder to maintain, and hence is something we try hard to
avoid
Here it’s not a big deal, but imagine now that instead of just f and g, we have 20 such functions
that we need to modify in exactly the same way
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20 times
The situation is still worse if the test logic is longer and more complicated
In this kind of scenario the following approach would be neater
import numpy as np
def check_nonneg(func):
def safe_function(x):
assert x >= 0, "Argument must be nonnegative"
return func(x)
return safe_function
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g
Enter Decorators The last version of our code is still not ideal

For example, if someone is reading our code and wants to know how f works, they will be looking for the function definition, and the definition alone is misleading: f is modified further down by the line f = check_nonneg(f)

Decorator notation puts the modification right next to the definition. That is, we replace

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)

with
@check_nonneg
def f(x):
return np.log(np.log(x))
@check_nonneg
def g(x):
return np.sqrt(42 * x)
These two pieces of code do exactly the same thing: decorating a function with @check_nonneg is just shorthand for defining it and then reassigning f = check_nonneg(f)

Descriptors Descriptors solve a common problem regarding management of variables

Suppose, for example, that we have a class Car whose instances store distance traveled in both miles (car.miles) and kilometers (car.kms)

One potential problem we might have here is that a user alters one of these variables but not the other
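Here's a minimal sketch of such a class (the definition is reconstructed for illustration; the attribute names match the session below):

class Car:

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61   # Kilometers, using 1 mile = 1.61 km

In [2]: car = Car()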
In [3]: car.miles
Out[3]: 1000

In [4]: car.kms
Out[4]: 1610.0

In [5]: car.miles = 6000   # kms is not updated

In [6]: car.kms
Out[6]: 1610.0

In the last two lines we see that miles and kms are out of sync
What we really want is some mechanism whereby each time a user sets one of these variables, the other is automatically updated

A solution is to store the actual values under the private names _miles and _kms, and expose miles and kms via getter methods such as

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms
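For the discussion below to make sense, the class also needs setter methods that keep the two measures synchronized, plus property declarations; a sketch along these lines:

    def set_miles(self, value):
        "Set miles and update kms to match"
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        "Set kms and update miles to match"
        self._kms = value
        self._miles = value / 1.61

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)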
With this in place, updating one variable updates the other

In [8]: car = Car()

In [9]: car.miles
Out[9]: 1000

In [10]: car.miles = 6000

In [11]: car.kms
Out[11]: 9660.0
How it Works The names _miles and _kms are arbitrary names we are using to store the values
of the variables
The objects miles and kms are properties, a common kind of descriptor
The methods get_miles, set_miles, get_kms and set_kms define what happens when you get (i.e.
access) or set (bind) these variables
• So-called “getter” and “setter” methods
The builtin Python function property takes getter and setter methods and creates a property
For example, after car is created as an instance of Car, the object car.miles is a property
Being a property, when we set its value via car.miles = 6000 its setter method is triggered — in
this case set_miles
Decorators and Properties These days it's very common to see the property function used via a decorator
Here’s another version of our Car class that works as before but now uses decorators to set up the
properties
class Car(object):

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61
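A quick sanity check that the two measures now stay in sync (usage sketch):

In [2]: car = Car()

In [3]: car.miles = 6000

In [4]: car.kms
Out[4]: 9660.0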
Generators
Generator Expressions The easiest way to build generators is using generator expressions
Just like a list comprehension, but with round brackets
Here is the list comprehension:

In [1]: singular = ('dog', 'cat', 'bird')

In [2]: type(singular)
Out[2]: tuple

In [3]: plural = [string + 's' for string in singular]   # List comprehension

In [4]: plural
Out[4]: ['dogs', 'cats', 'birds']

In [5]: type(plural)
Out[5]: list
And here is the generator expression: the same thing, but with round brackets

In [7]: plural = (string + 's' for string in singular)

In [8]: type(plural)
Out[8]: generator
In [9]: next(plural)
Out[9]: 'dogs'
In [10]: next(plural)
Out[10]: 'cats'
In [11]: next(plural)
Out[11]: 'birds'
Since sum() accepts any iterator, we can sum the output of a generator expression without ever building a list

In [12]: sum((x * x for x in range(10)))
Out[12]: 285

The function sum() calls next() to get the items, adds successive terms
In fact, we can omit the outer brackets in this case
In [13]: sum(x * x for x in range(10))
Out[13]: 285
Generator Functions The most flexible way to create generator objects is to use generator functions

Let's look at some examples

Example 1 Here's a very simple example of a generator function

def f():
    yield 'start'
    yield 'middle'
    yield 'end'

It looks like a function, but uses a keyword yield that we haven't met before

Let's see how it works after running this code
In [15]: type(f)
Out[15]: function

In [16]: gen = f()

In [17]: gen
Out[17]: <generator object f at 0x3b66a50>
In [18]: next(gen)
Out[18]: 'start'
In [19]: next(gen)
Out[19]: 'middle'
In [20]: next(gen)
Out[20]: 'end'
In [21]: next(gen)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-21-b2c61ce5e131> in <module>()
----> 1 next(gen)
StopIteration:
The generator function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next method
The first call to next(gen)
• Executes code in the body of f() until it meets a yield statement
• Returns that value to the caller of next(gen)
The second call to next(gen) starts executing from the next line
def f():
yield 'start'
yield 'middle' # This line!
yield 'end'
Example 2 Our next example is a generator function that takes an argument and maintains internal state between yields

def g(x):
    while x < 100:
        yield x
        x = x * x

In [25]: gen = g(2)

In [26]: type(gen)
Out[26]: generator

In [27]: next(gen)
Out[27]: 2
In [27]: next(gen)
Out[27]: 2
In [28]: next(gen)
Out[28]: 4
In [29]: next(gen)
Out[29]: 16
In [30]: next(gen)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-32-b2c61ce5e131> in <module>()
----> 1 next(gen)
StopIteration:
To see one advantage of generators, consider counting the number of successes in a large number of binary draws (the setup lines here are a sketch; the values of n are illustrative)

In [31]: import random

In [32]: n = 10000000

In [33]: draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

In [34]: sum(draws)   # Roughly n / 2

But we are creating two huge lists here, range(n) and draws

This uses lots of memory and is very slow

If we make n even bigger then this happens

In [35]: n = 1000000000

In [36]: draws = [random.uniform(0, 1) < 0.5 for i in range(n)]   # Likely exhausts memory

We can avoid these problems using a generator expression, which produces the draws one at a time

In [39]: n = 10000000

In [40]: draws = (random.uniform(0, 1) < 0.5 for i in range(n))

In [41]: draws
Out[41]: <generator object at 0xb7d8b2cc>

In [42]: sum(draws)
Out[42]: 4999141
In summary, iterables
• avoid the need to create big lists/tuples, and
• provide a uniform interface to iteration that can be used transparently in for loops
Recursive Function Calls

This is not something that you will use every day, but it is still useful — you should learn it at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing x_t for some t when

x_{t+1} = 2 x_t,   x_0 = 1    (1.6)

Obviously the answer is 2^t

Here's a recursive function that computes it by calling itself

def x(t):
    if t == 0:
        return 1
    else:
        return 2 * x(t-1)
What happens here is that each successive call uses its own frame in the stack
• a frame is where the local variables of a given function call are held
• stack is memory used to process function calls
  – a First In Last Out (FILO) queue

For example, evaluating x(3) pushes frames for x(3), x(2), x(1) and x(0); x(0) returns 1, and then each pending call returns twice the value below it, giving 8
This example is somewhat contrived, since the first (iterative) solution would usually be preferred
to the recursive solution
We’ll meet less contrived applications of recursion later on
Exercises

Exercise 1 The Fibonacci numbers are defined by

x_{t+1} = x_t + x_{t-1},   x_0 = 0, x_1 = 1    (1.7)
The first few numbers in the sequence are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the t-th Fibonacci number for any t
Exercise 2 Complete the following code, and test it using this csv file, which we assume that
you’ve put in your current working directory
def column_iterator(target_file, column_number):
"""A generator function for CSV files.
When called with a file name target_file (string) and column number
column_number (integer), the generator function returns a generator
that steps through the elements of column column_number in file
target_file.
"""
# put your code here
dates = column_iterator('test_table.csv', 1)
Exercise 3 Suppose we have a text file numbers.txt containing the following lines
prices
3
8
7
21
Using try – except, write a program to read in the contents of the file and sum the numbers,
ignoring lines without numbers
Solutions
Solution notebook
1.8 NumPy
Contents
• NumPy
– Overview
– Introduction to NumPy
– NumPy Arrays
– Operations on Arrays
– Other NumPy Functions
– Exercises
– Solutions
"Let's be clear: the work of science has nothing whatever to do with consensus. Consensus is the business of politics. Science, on the contrary, requires only one investigator who happens to be right, which means that he or she has results that are verifiable by reference to the real world. In science consensus is irrelevant. What is relevant is reproducible results." – Michael Crichton
Overview
References
• The official NumPy documentation
Introduction to NumPy
NumPy Arrays
The most important thing that NumPy defines is an array data type formally called a
numpy.ndarray
For example, the np.zeros function returns a numpy.ndarray of zeros
In [1]: import numpy as np
In [2]: a = np.zeros(3)
In [3]: a
Out[3]: array([ 0., 0., 0.])
In [4]: type(a)
Out[4]: numpy.ndarray
NumPy arrays are somewhat like native Python lists, except that
• Data must be homogeneous (all elements of the same type)
• These types must be one of the data types (dtypes) provided by NumPy
The most important of these dtypes are:
• float64: 64 bit floating point number
• int64: 64 bit integer
• bool: 8 bit True or False

On most machines, the default dtype for arrays is float64

In [8]: type(a[0])
Out[8]: numpy.float64

If an integer dtype is specified instead, the elements are integers (the exact integer type is platform dependent)

In [9]: a = np.zeros(3, dtype=int)

In [10]: type(a[0])
Out[10]: numpy.int32
Shape and Dimension Consider the following assignment

In [11]: z = np.zeros(10)

Here z is a flat array; its dimension is recorded in the shape attribute, which is a tuple

In [12]: z.shape
Out[12]: (10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma)

To give it dimension, we can change the shape attribute

In [13]: z.shape = (10, 1)
In [14]: z
Out[14]:
array([[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.]])
In [15]: z = np.zeros(4)

In [16]: z.shape = (2, 2)

In [17]: z
Out[17]:
array([[ 0., 0.],
[ 0., 0.]])
In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function, as in
z = np.zeros((2, 2))
Creating Arrays As we’ve seen, the np.zeros function creates an array of zeros
You can probably guess what np.ones creates
Related is np.empty, which creates arrays in memory that can later be populated with data
In [18]: z = np.empty(3)

In [19]: z
Out[19]: array([ 8.90030222e-307, 4.94944794e+173, 4.04144187e-262])

(These are just garbage numbers: whatever happened to be in those memory slots)
To create an identity matrix, use either np.identity or np.eye

In [21]: z = np.identity(2)

In [22]: z
Out[22]:
array([[ 1., 0.],
[ 0., 1.]])
In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array
In [23]: z = np.array([10, 20]) # ndarray from Python list
In [24]: z
Out[24]: array([10, 20])
In [25]: type(z)
Out[25]: numpy.ndarray
In [26]: z = np.array((10, 20), dtype=float)   # Here 'float' is equivalent to 'np.float64'

In [27]: z
Out[27]: array([ 10., 20.])

In [28]: z = np.array([[1, 2], [3, 4]])   # 2D array from a list of lists

In [29]: z
Out[29]:
array([[1, 2],
[3, 4]])
See also np.asarray, which performs a similar function, but does not make a distinct copy of data already in a NumPy array

In [10]: na = np.linspace(10, 20, 2)

In [11]: na is np.asarray(na)   # Does not copy NumPy arrays
Out[11]: True

In [12]: na is np.array(na)    # Does make a new copy, perhaps unnecessarily
Out[12]: False
To read in the array data from a text file containing numeric data use np.loadtxt or
np.genfromtxt—see the documentation for details
Array Indexing For a flat array, indexing is the same as Python sequences:
In [30]: z = np.linspace(1, 2, 5)
In [31]: z
Out[31]: array([ 1. , 1.25, 1.5 , 1.75, 2. ])
In [32]: z[0]
Out[32]: 1.0
In [34]: z[-1]
Out[34]: 2.0
For 2D arrays the index syntax is as follows

In [35]: z = np.array([[1, 2], [3, 4]])

In [36]: z
Out[36]:
array([[1, 2],
[3, 4]])
In [37]: z[0, 0]
Out[37]: 1
In [38]: z[0, 1]
Out[38]: 2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
In [39]: z[0,:]
Out[39]: array([1, 2])
In [40]: z[:,1]
Out[40]: array([2, 4])
Arrays can also be indexed with arrays of integers

In [41]: z = np.linspace(2, 4, 5)

In [42]: z
Out[42]: array([ 2. , 2.5, 3. , 3.5, 4. ])

In [43]: indices = np.array((0, 2, 3))   # An array of index positions

In [44]: z[indices]
Out[44]: array([ 2. , 3. , 3.5])
An array of dtype bool can be used to extract elements as well

In [46]: d = np.array([0, 1, 1, 0, 0], dtype=bool)

In [47]: d
Out[47]: array([False, True, True, False, False], dtype=bool)

In [48]: z[d]
Out[48]: array([ 2.5, 3. ])
All elements of an array can be set equal to one number using slice notation

In [49]: z = np.empty(3)

In [50]: z
Out[50]: array([ -1.25236750e-041, 0.00000000e+000, 5.45693855e-313])
In [51]: z[:] = 42
In [52]: z
Out[52]: array([ 42., 42., 42.])
Array Methods Arrays have useful methods, all of which are highly optimized

In [53]: A = np.array((4, 3, 2, 1))

In [54]: A
Out[54]: array([4, 3, 2, 1])

In [55]: A.sort()   # Sorts A in place

In [56]: A
Out[56]: array([1, 2, 3, 4])
For a sorted array z, the method z.searchsorted(a) returns the index of the first element of z that is >= a

In [67]: z = np.linspace(2, 4, 5)

In [68]: z
Out[68]: array([ 2. , 2.5, 3. , 3.5, 4. ])
In [69]: z.searchsorted(2.2)
Out[69]: 1
In [70]: z.searchsorted(2.5)
Out[70]: 1
In [71]: z.searchsorted(2.6)
Out[71]: 2
Many of the methods discussed above have equivalent functions in the NumPy namespace
In [72]: a = np.array((4, 3, 2, 1))
In [73]: np.sum(a)
Out[73]: 10
In [74]: np.mean(a)
Out[74]: 2.5
Operations on Arrays
Algebraic Operations The algebraic operators +, -, *, / and ** all act elementwise on arrays
In [75]: a = np.array([1, 2, 3, 4])

In [76]: b = np.array([5, 6, 7, 8])

In [77]: a + b
Out[77]: array([ 6, 8, 10, 12])
In [78]: a * b
Out[78]: array([ 5, 12, 21, 32])
In [82]: a * 10
Out[82]: array([10, 20, 30, 40])
The same rules apply to two dimensional arrays

In [86]: A = np.ones((2, 2))

In [87]: B = np.ones((2, 2))

In [88]: A + B
Out[88]:
array([[ 2., 2.],
[ 2., 2.]])
In [89]: A + 10
Out[89]:
array([[ 11., 11.],
[ 11., 11.]])
In [90]: A * B
Out[90]:
array([[ 1., 1.],
[ 1., 1.]])
Matrix Multiplication To perform matrix multiplication, one typically uses the np.dot function

In [137]: A = np.ones((2, 2))

In [138]: B = np.ones((2, 2))

In [139]: np.dot(A, B)
Out[139]:
array([[ 2., 2.],
[ 2., 2.]])
With np.dot we can also take the inner product of two flat arrays

In [91]: A = np.array([1, 2])

In [92]: B = np.array([10, 20])

In [93]: np.dot(A, B)   # Returns a scalar in this case
Out[93]: 50

In fact we can use dot when one element is a Python list or tuple
In [94]: A = np.empty((2, 2))

In [95]: A
Out[95]:
array([[ 3.48091887e-262, 1.14802984e-263],
       [ 3.61513512e-313, -1.25232371e-041]])

In [96]: np.dot(A, (0, 1))   # Post-multiply by the tuple (0, 1)
Out[96]: array([  1.14802984e-263,  -1.25232371e-041])
Note: Because np.dot can be inconvenient for expressions involving the multiplication of many matrices, NumPy provides the numpy.matrix class. For instances of this data type, the * operator means matrix (as opposed to elementwise) multiplication. However, it's easy to get mixed up between NumPy arrays and NumPy matrices. For this reason, the numpy.matrix type is avoided by many programmers, including us.
Comparisons As a rule, comparisons on arrays are done elementwise

In [97]: z = np.array([2, 3])

In [98]: y = np.array([2, 3])

In [99]: z == y
Out[99]: array([ True, True], dtype=bool)
In [100]: y[0] = 5
In [101]: z == y
Out[101]: array([False, True], dtype=bool)
In [102]: z != y
Out[102]: array([ True, False], dtype=bool)
We can also compare an array to a scalar

In [103]: z = np.linspace(0, 10, 5)

In [104]: z
Out[104]: array([ 0. , 2.5, 5. , 7.5, 10. ])

In [105]: z > 3
Out[105]: array([False, False, True, True, True], dtype=bool)

This is particularly useful for conditional extraction

In [106]: b = z > 3

In [107]: b
Out[107]: array([False, False, True, True, True], dtype=bool)

In [108]: z[b]
Out[108]: array([ 5. , 7.5, 10. ])
Vectorized Functions NumPy provides versions of the standard functions log, exp, sin, etc. that
act elementwise on arrays
In [110]: z = np.array([1, 2, 3])
In [111]: np.sin(z)
Out[111]: array([ 0.84147098, 0.90929743, 0.14112001])
This eliminates the need for explicit element-by-element loops such as

n = len(z)
y = np.empty(n)
for i in range(n):
    y[i] = np.sin(z[i])
Because they act elementwise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for “universal functions”
As we saw above, the usual arithmetic operations (+, *, etc.) also work elementwise, and combining these with the ufuncs gives a very large set of fast elementwise functions
In [112]: z
Out[112]: array([1, 2, 3])

In [113]: (1 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * z**2)
Out[113]: array([ 0.24197072,  0.05399097,  0.00443185])

Not all user-defined functions act elementwise, however; for example, a function built around an if test on its argument fails when passed an array

In this situation you should use the vectorized NumPy function np.where
In [114]: import numpy as np
In [115]: x = np.random.randn(4)
In [116]: x
Out[116]: array([-0.25521782, 0.38285891, -0.98037787, -0.083662 ])

In [117]: np.where(x > 0, 1, 0)   # Insert 1 if x > 0 true, otherwise 0
Out[117]: array([0, 1, 0, 0])
Although it’s usually better to hand code vectorized functions from vectorized NumPy operations,
at a pinch you can use np.vectorize
In [118]: def f(x): return 1 if x > 0 else 0

In [119]: f = np.vectorize(f)

In [120]: f(x)   # Passing the same vector x as in the previous example
Out[120]: array([0, 1, 0, 0])
Other NumPy Functions NumPy also provides functionality related to scientific programming more broadly, such as routines for generating random draws

In [135]: y = np.random.binomial(10, 0.5, size=1000)   # 1,000 draws from Bin(10, 0.5)

In [136]: y.mean()
Out[136]: 5.0369999999999999
However, all of this functionality is also available in SciPy, a collection of modules that build on
top of NumPy
We’ll cover the SciPy versions in more detail soon
Exercises

Exercise 1 Earlier, you wrote a simple function p(x, coeff) to evaluate (1.8) without considering efficiency

Now write a new function that does the same job, but uses NumPy arrays and array operations for its computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise don’t use
this class)
• Hint: Use np.cumprod()
Exercise 2 Let q be a NumPy array of length n with q.sum() == 1, interpreted as a probability mass function

The code below samples from the distribution q using a uniform draw

from random import uniform

def sample(q):
a = 0.0
U = uniform(0, 1)
for i in range(len(q)):
if a < U <= a + q[i]:
return i
a = a + q[i]
If you can't see how this works, try thinking through the flow for a simple example, such as q = [0.25, 0.75]; it helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops
• Hint: Use np.searchsorted and np.cumsum
If you can, implement the functionality as a class called discreteRV, where
• the data for an instance of the class is the vector of probabilities q
• the class has a draw() method, which returns one draw according to the algorithm described
above
If you can, write the method so that draw(k) returns k draws from q
Solutions
Solution notebook
1.9 SciPy
Contents
• SciPy
– SciPy versus NumPy
– Statistics
– Roots and Fixed Points
– Optimization
– Integration
– Linear Algebra
– Exercises
– Solutions
SciPy builds on top of NumPy to provide common tools for scientific programming, such as
• linear algebra
• numerical integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc
Like NumPy, SciPy is stable, mature and widely used
Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as LAPACK, BLAS, etc.
It’s not really necessary to “learn” SciPy as a whole—a better approach is to learn each relevant
feature as required
You can browse from the top of the documentation tree to see what’s available
In this lecture we aim only to highlight some useful parts of the package
SciPy versus NumPy SciPy is a package that contains various tools that are built on top of NumPy, using its array data type and related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initialization file
# Import numpy symbols to scipy name space
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
Although SciPy imports NumPy, the standard approach is to start scientific programs with

import numpy as np

and then import specific SciPy functionality as required, for example

from scipy.integrate import quad
from scipy.optimize import brentq

This approach helps clarify what functionality belongs to what package, and we will follow it in these lectures
Statistics
Random Variables and Distributions Recall that numpy.random provides functions for generating random variables

In [1]: import numpy as np

In [2]: draws = np.random.beta(5, 5, size=10000)   # 10,000 draws from Beta(5, 5)

This generates draws from the beta distribution, which has density

f(x; a, b) = \frac{x^{a-1} (1 - x)^{b-1}}{\int_0^1 u^{a-1} (1 - u)^{b-1} \, du}    (0 ≤ x ≤ 1)    (1.9)

here with a = b = 5
Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this we can use scipy.stats, which provides all of this functionality as well as random num-
ber generation in a single consistent interface
Here's an example of usage

In [4]: from scipy.stats import beta

In [5]: q = beta(5, 5)   # Beta(a, b), with a = b = 5

In this code we created a so-called rv_frozen object, via the call q = beta(5, 5)
The "frozen" part of the notation relates to the fact that q represents a particular distribution with a particular set of parameters
Once we’ve done so, we can then generate random numbers, evaluate the density, etc., all from
this fixed distribution
In [14]: q.cdf(0.4) # Cumulative distribution function
Out[14]: 0.2665676800000002
In [17]: q.mean()
Out[17]: 0.5
Other Goodies in scipy.stats There are also many statistical functions in scipy.stats
For example, scipy.stats.linregress implements simple linear regression
In [19]: from scipy.stats import linregress

In [20]: x = np.random.randn(200)

In [21]: y = 2 * x + 0.1 * np.random.randn(200)

In [22]: gradient, intercept, r_value, p_value, std_err = linregress(x, y)

Here gradient and intercept will be close to 2 and 0 respectively, since that is how y was constructed
Roots and Fixed Points A root of a real function f on [a, b] is an x ∈ [a, b] satisfying f(x) = 0

As a running example, consider the function

f(x) = sin(4(x − 1/4)) + x + x^{20} − 1    (1.10)

which in Python can be written as

f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1

This function has a unique root in (0, 1), but finding it is nontrivial because the graph is quite irregular

Bisection One of the most common algorithms for numerical root finding is bisection
To understand the idea, recall the well known game where
• Player A thinks of a secret number between 1 and 100
• Player B asks if it’s less than 50
– If yes, B asks if it’s less than 25
– If no, B asks if it’s less than 75
And so on
This is bisection
Here's a fairly simplistic implementation of the algorithm in Python (the code is reproduced in Exercise 1 below)

It works for all sufficiently well behaved increasing continuous functions with f(a) < 0 < f(b)

In fact SciPy provides its own bisection function, which we now test using the function f defined in (1.10)
In [24]: from scipy.optimize import bisect
In [26]: bisect(f, 0, 1)
Out[26]: 0.40829350427936706
The Newton-Raphson Method Another very common root-finding algorithm is the Newton-
Raphson method
In SciPy this algorithm is implemented by scipy.optimize.newton
Unlike bisection, the Newton-Raphson method uses local slope information
This is a double-edged sword:
• When the function is well-behaved, the Newton-Raphson method is faster than bisection
• When the function is less well-behaved, the Newton-Raphson might fail
Let's investigate this using the same function f defined in (1.10), first looking at potential instability

In [27]: from scipy.optimize import newton

In [28]: newton(f, 0.2)   # Starting the search at x = 0.2 quickly locates the root near 0.408

In [29]: newton(f, 0.7)   # Starting at x = 0.7, the iteration goes astray and fails to find the root

The second call fails because the initial guess lies in a region where f is poorly behaved
Hybrid Methods So far we have seen that the Newton-Raphson method is fast but not robust
The bisection algorithm, by contrast, is robust but relatively slow
This illustrates a general principle
• If you have specific knowledge about your function, you might be able to exploit it to generate efficiency
• If not, then algorithm choice involves a trade-off between speed of convergence and robustness
In practice, most default algorithms for root finding, optimization and fixed points use hybrid
methods
These methods typically combine a fast method with a robust method in the following manner:
1. Attempt to use a fast method
2. Check diagnostics
3. If diagnostics are bad, then switch to a more robust algorithm
In scipy.optimize, the function brentq is such a hybrid method, and a good default
In [35]: brentq(f, 0, 1)
Out[35]: 0.40829350427936706
Here the correct solution is found and the speed is almost the same as newton
Fixed Points SciPy has a function for finding (scalar) fixed points too

In [1]: from scipy.optimize import fixed_point

In [2]: fixed_point(lambda x: x**2, 10.0)   # 10.0 is an initial guess
Out[2]: array(1.0)
If you don’t get good results, you can always switch back to the brentq root finder, since the fixed
point of a function f is the root of g( x ) := x − f ( x )
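Here's a minimal sketch of that fallback (the map h below is purely illustrative, with fixed point 1):

from scipy.optimize import brentq

h = lambda x: x**0.5        # An increasing map with fixed point x = 1
g = lambda x: x - h(x)      # Fixed points of h are roots of g
brentq(g, 0.5, 2.0)         # Returns a value close to 1.0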
Optimization
Integration
Most numerical integration methods work by computing the integral of an approximating polynomial
The resulting error depends on how well the polynomial fits the integrand, which in turn depends
on how “regular” the integrand is
In SciPy, the relevant module for numerical integration is scipy.integrate
A good default for univariate integration is quad
In [13]: from scipy.integrate import quad

In [14]: integral, error = quad(lambda x: x**2, 0, 1)

In [15]: integral
Out[15]: 0.33333333333333337
In fact quad is an interface to a very standard numerical integration routine in the Fortran library
QUADPACK
It uses Clenshaw-Curtis quadrature, based on expansion in terms of Chebyshev polynomials
There are other options for univariate integration—a useful one is fixed_quad, which is fast and
hence works well inside for loops
Linear Algebra
We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
The latter is not an exact superset of the former, but overall it has more functionality
We leave you to investigate the set of available routines
Exercises
Exercise 1 Recall that we previously discussed the concept of recursive function calls
Write a recursive implementation of the bisection function described above, which we repeat here
for convenience
def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b
    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        if f(middle) > 0:    # Root lies in [lower, middle]
            upper = middle
        else:                # Root lies in [middle, upper]
            lower = middle
    return 0.5 * (upper + lower)
Solutions
Solution notebook
1.10 Matplotlib
Contents
• Matplotlib
– Overview
– A Simple API
– The Object-Oriented API
– More Features
– Further Reading
Overview
We’ve already generated quite a few figures in these lectures using Matplotlib
Matplotlib is an outstanding graphics library, designed for scientific computing, with
• high quality 2D and 3D plots
• output in all the usual formats (PDF, PNG, etc.)
• LaTeX integration
• animation, etc., etc.
A Simple API
Matplotlib is very easy to get started with, thanks to its simple MATLAB-style API (Application Programming Interface)
Here’s the kind of easy example you might find in introductory treatments
from pylab import * # Deprecated
x = linspace(0, 10, 200)
y = sin(x)
plot(x, y, 'b-', linewidth=2)
show()
The API described above is simple and convenient, but also a bit limited and somewhat un-Pythonic
For example, in the function calls a lot of objects get created and passed around without making
themselves known to the programmer
Python programmers tend to prefer a more explicit style of programming (type import this in
the IPython (or Python) shell and look at the second line)
The Object-Oriented API This leads us to the alternative, object-oriented Matplotlib API

Here's the code corresponding to the preceding figure using the object-oriented API:
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'b-', linewidth=2)
plt.show()
While there’s a bit more typing, the more explicit use of objects gives us more fine-grained control
This will become more clear as we go along
Incidentally, regarding the above lines of code,
• the form of the import statement import matplotlib.pyplot as plt is standard
• Here the call fig, ax = plt.subplots() returns a pair, where
– fig is a Figure instance—like a blank canvas
– ax is an AxesSubplot instance—think of a frame for plotting in
• The plot() function is actually a method of ax
Tweaks Here we’ve changed the line to red and added a legend
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend()
plt.show()
We’ve also used alpha to make the line slightly transparent—which makes it look smoother
Unfortunately the legend is obscuring the line
This can be fixed by replacing ax.legend() with ax.legend(loc='upper center')

If LaTeX is enabled on your system, you can also use mathematical notation in the label, as in label=r'$y=\sin(x)$'

The r in front of the label string tells Python that this is a raw string, so the backslash is not treated as an escape character

The figure now looks as follows
More Features
Matplotlib has a huge array of functions and features, which you can discover over time as you
have need for them
We mention just a few
Multiple Plots on One Axis It's straightforward to generate multiple plots on the same axes
Here’s an example that randomly generates three normal densities and adds a label with their
mean
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from random import uniform
fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
m, s = uniform(-1, 1), uniform(1, 2)
y = norm.pdf(x, loc=m, scale=s)
current_label = r'$\mu = {0:.2f}$'.format(m)
ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
In fact the preceding figure was generated by the code above preceded by three configuration lines that tell Matplotlib to render text with LaTeX, along the lines of

from matplotlib import rc
rc('font', **{'family': 'serif', 'serif': ['Palatino']})
rc('text', usetex=True)

Depending on your LaTeX installation, this may or may not work for you — try experimenting and see how you go
3D Plots Matplotlib can also produce 3D plots via the mplot3d toolkit

Here's an example of a surface plot; the function f and the grid setup at the top are one standard way to complete the listing (the specific surface is illustrative)

from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
import numpy as np

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)
x, y = np.meshgrid(xgrid, xgrid)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
                y,
                f(x, y),
                rstride=2, cstride=2,
                cmap=cm.jet,
                alpha=0.7,
                linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()
A Customizing Function Perhaps you will find a set of customizations that you regularly use
Suppose we usually prefer our axes to go through the origin, and to have a grid
Here’s a nice example from this blog of how the object-oriented API can be used to build a custom
subplots function that implements these changes
Read carefully through the code and see if you can follow what’s going on
import matplotlib.pyplot as plt
import numpy as np
def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()

    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    ax.grid()
    return fig, ax
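Here's a sketch of how the custom function might be used (the plotted curve is illustrative):

fig, ax = subplots()   # Call the local version, not plt.subplots()
x = np.linspace(-2, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='lower right')
plt.show()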
Here’s the figure it produces (note axes through the origin and the grid)
Further Reading
1.11 Pandas
Contents
• Pandas
– Overview
– Series
– DataFrames
– On-Line Data Sources
– Exercises
– Solutions
Overview
Series
Perhaps the two most important data types defined by pandas are the DataFrame and Series types
You can think of a Series as a “column” of data, such as a collection of observations on a single
variable
Here's an example, assuming the conventional imports

In [1]: import pandas as pd

In [2]: import numpy as np

In [4]: s = pd.Series(np.random.randn(4), name='daily returns')
In [5]: s
Out[5]:
0 0.430271
1 0.617328
2 -0.265421
3 -0.836113
Name: daily returns
Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the values
being daily returns on their shares
Pandas Series are built on top of NumPy arrays, and support many similar operations
In [6]: s * 100
Out[6]:
0 43.027108
1 61.732829
2 -26.542104
3 -83.611339
Name: daily returns
In [7]: np.abs(s)
Out[7]:
0 0.430271
1 0.617328
2 0.265421
3 0.836113
Name: daily returns
The index can be given meaningful labels

In [9]: s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']

In [10]: s
Out[10]:
AMZN 0.430271
AAPL 0.617328
MSFT -0.265421
GOOG -0.836113
Name: daily returns
Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction that the
items in the dictionary all have the same type—in this case, floats)
In fact you can use much of the same syntax as Python dictionaries
In [11]: s['AMZN']
Out[11]: 0.43027108469945924
In [12]: s['AMZN'] = 0
In [13]: s
Out[13]:
AMZN 0.000000
AAPL 0.617328
MSFT -0.265421
GOOG -0.836113
Name: daily returns
In [14]: 'AAPL' in s
Out[14]: True
DataFrames
As mentioned above a DataFrame is somewhat like a spreadsheet, or a structure for storing the
data matrix in a regression
While a Series is one individual column of data, a DataFrame is all the columns
Let’s look at an example, reading in data from the CSV file pandas/test_pwt.csv in the applica-
tions repository
Here’s the contents of test_pwt.csv, which is a small excerpt from the Penn World Tables
"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5788042896"
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7200975332"
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.072205773"
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.266688415"
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","11.658954494"
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427","5.7265463933"
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.0324539789"
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","5.108067988"
Here we’re in IPython, so we have access to shell commands such as ls, as well as the usual
Python commands
In [15]: ls data/test_pw*   # Check that the CSV file is in the present working directory
test_pwt.csv
Now let's read the data in using pandas' read_csv function

In [28]: df = pd.read_csv('data/test_pwt.csv')

In [29]: type(df)
Out[29]: pandas.core.frame.DataFrame
In [30]: df
Out[30]:
         country country isocode  year          POP       XRAT           tcgdp         cc         cg
0      Argentina             ARG  2000    37335.653   0.999500   295072.218690  75.716805   5.578804
1      Australia             AUS  2000    19053.186   1.724830   541804.652100  67.759026   6.720098
2          India             IND  2000  1006300.297  44.941600  1728144.374800  64.575551  14.072206
3         Israel             ISR  2000     6114.570   4.077330   129253.894230  64.436451  10.266688
4         Malawi             MWI  2000    11801.505  59.543808     5026.221784  74.707624  11.658954
5   South Africa             ZAF  2000    45064.098   6.939830   227242.369490  72.718710   5.726546
6  United States             USA  2000   282171.957   1.000000  9898700.000000  72.347054   6.032454
7        Uruguay             URY  2000     3219.793  12.099592    25255.961693  78.978740   5.108068
We can select particular rows using standard Python array slicing notation
In [13]: df[2:5]
Out[13]:
country country isocode year POP XRAT tcgdp cc cg
2 India IND 2000 1006300.297 44.941600 1728144.374800 64.575551 14.072206
3 Israel ISR 2000 6114.570 4.077330 129253.894230 64.436451 10.266688
4 Malawi MWI 2000 11801.505 59.543808 5026.221784 74.707624 11.658954
To select columns, we can pass a list containing the names of the desired columns represented as
strings
In [14]: df[['country', 'tcgdp']]
Out[14]:
country tcgdp
0 Argentina 295072.218690
1 Australia 541804.652100
2 India 1728144.374800
3 Israel 129253.894230
4 Malawi 5026.221784
5 South Africa 227242.369490
6 United States 9898700.000000
7 Uruguay 25255.961693
To select both rows and columns, we can combine an integer slice with a list of column names

In [15]: df.ix[2:5, ['country', 'tcgdp']]
Out[15]:
        country           tcgdp
2         India  1728144.374800
3        Israel   129253.894230
4        Malawi     5026.221784
5  South Africa   227242.369490
Let’s imagine that we’re only interested in population and total GDP (tcgdp)
One way to strip the data frame df down to only these variables is as follows
In [31]: keep = ['country', 'POP', 'tcgdp']
In [32]: df = df[keep]
In [33]: df
Out[33]:
country POP tcgdp
0 Argentina 37335.653 295072.218690
1 Australia 19053.186 541804.652100
2 India 1006300.297 1728144.374800
3 Israel 6114.570 129253.894230
4 Malawi 11801.505 5026.221784
5 South Africa 45064.098 227242.369490
6 United States 282171.957 9898700.000000
7 Uruguay 3219.793 25255.961693
Here the index 0, 1,..., 7 is redundant, because we can use the country names as an index
To do this, first let’s pull out the country column using the pop method
In [34]: countries = df.pop('country')
In [35]: type(countries)
Out[35]: pandas.core.series.Series
In [36]: countries
Out[36]:
0 Argentina
1 Australia
2 India
3 Israel
4 Malawi
5 South Africa
6 United States
7 Uruguay
Name: country
In [37]: df
Out[37]:
POP tcgdp
0 37335.653 295072.218690
1 19053.186 541804.652100
2 1006300.297 1728144.374800
3 6114.570 129253.894230
4 11801.505 5026.221784
5 45064.098 227242.369490
6 282171.957 9898700.000000
7 3219.793 25255.961693
and then set this series as the index of the data frame

In [38]: df.index = countries

In [39]: df
Out[39]:
POP tcgdp
country
Argentina 37335.653 295072.218690
Australia 19053.186 541804.652100
India 1006300.297 1728144.374800
Israel 6114.570 129253.894230
Malawi 11801.505 5026.221784
South Africa 45064.098 227242.369490
United States 282171.957 9898700.000000
Uruguay 3219.793 25255.961693
Let's also rename the columns to something more descriptive

In [40]: df.columns = 'population', 'total GDP'

In [41]: df
Out[41]:
population total GDP
country
Argentina 37335.653 295072.218690
Australia 19053.186 541804.652100
India 1006300.297 1728144.374800
Israel 6114.570 129253.894230
Malawi 11801.505 5026.221784
South Africa 45064.098 227242.369490
United States 282171.957 9898700.000000
Uruguay 3219.793 25255.961693
Population is in thousands; let's revert to single units

In [66]: df['population'] = df['population'] * 1e3

In [67]: df
Out[67]:
population total GDP
country
Argentina 37335653 295072.218690
Australia 19053186 541804.652100
India 1006300297 1728144.374800
Israel 6114570 129253.894230
Malawi 11801505 5026.221784
South Africa 45064098 227242.369490
United States 282171957 9898700.000000
Uruguay 3219793 25255.961693
Next we're going to add a column showing real GDP per capita, multiplying by 1,000,000 as we go because total GDP is measured in millions

In [74]: df['GDP percap'] = df['total GDP'] * 1e6 / df['population']
In [75]: df
Out[75]:
population total GDP GDP percap
country
Argentina 37335653 295072.218690 7903.229085
Australia 19053186 541804.652100 28436.433261
India 1006300297 1728144.374800 1717.324719
Israel 6114570 129253.894230 21138.672749
Malawi 11801505 5026.221784 425.896679
South Africa 45064098 227242.369490 5042.647686
United States 282171957 9898700.000000 35080.381854
Uruguay 3219793 25255.961693 7843.970620
One of the nice things about pandas DataFrame and Series objects is that they have methods for
plotting and visualization that work through Matplotlib
For example, we can easily generate a bar plot of GDP per capita
In [76]: df['GDP percap'].plot(kind='bar')
Out[76]: <matplotlib.axes.AxesSubplot at 0x2f22ed0>
In [78]: plt.show()
At the moment the data frame is ordered alphabetically on country — here it has been re-sorted by GDP per capita, in descending order, using the data frame's sort method

In [84]: df
Out[84]:
population total GDP GDP percap
country
United States 282171957 9898700.000000 35080.381854
Australia 19053186 541804.652100 28436.433261
Israel 6114570 129253.894230 21138.672749
Argentina 37335653 295072.218690 7903.229085
Uruguay 3219793 25255.961693 7843.970620
South Africa 45064098 227242.369490 5042.647686
India 1006300297 1728144.374800 1717.324719
Malawi 11801505 5026.221784 425.896679
Accessing Data with urllib.request One option is to use urllib.request, a standard Python li-
brary for requesting data over the Internet
To begin, try the following code on your computer
In [36]: import urllib.request
In [58]: source[0]
Out[58]: 'DATE,VALUE\r\n'
In [59]: source[1]
Out[59]: '1948-01-01,3.4\r\n'
In [60]: source[2]
Out[60]: '1948-02-01,3.8\r\n'
We could now write some additional code to parse this text and store it as an array...
But this is unnecessary — pandas’ read_csv function can handle the task for us
In [69]: source = urllib.request.urlopen(url)

In [70]: data = pd.read_csv(source, index_col=0, parse_dates=True)  # assuming import pandas as pd
The data has been read into a pandas DataFrame called data that we can now manipulate in the
usual way
In [71]: type(data)
Out[71]: pandas.core.frame.DataFrame
Accessing Data with pandas Although it is worth understanding the low level procedures, for
the present case pandas can take care of all these messy details
(pandas puts a simple API (Application Programming Interface) on top of the kind of low level function calls we've just covered)
For example, we can obtain the same unemployment data for the period 2006–2012 inclusive as
follows
In [77]: import pandas.io.data as web

In [78]: import datetime as dt

In [79]: start, end = dt.datetime(2006, 1, 1), dt.datetime(2012, 12, 31)

In [80]: data = web.DataReader('UNRATE', 'fred', start, end)
In [81]: type(data)
Out[81]: pandas.core.frame.DataFrame
In [82]: data.plot()
Out[82]: <matplotlib.axes.AxesSubplot at 0xcf79390>
In [84]: plt.show()
(If you’re working in the IPython notebook, the last two lines can probably be omitted)
The resulting figure looks as follows
Data from the World Bank Let’s look at one more example of downloading and manipulating
data — this time from the World Bank
The World Bank collects and organizes data on a huge range of indicators
For example, here we find data on government debt as a ratio to GDP:
http://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS/countries
If you click on “DOWNLOAD DATA” you will be given the option to download the data as an
Excel file
The next program does this for you, parses the data from the Excel file into a pandas DataFrame, and plots time series for France, Germany, the US and Australia
import sys
import matplotlib.pyplot as plt

# urlretrieve lives in different locations in Python 2 and Python 3
if sys.version_info[0] == 2:
    from urllib import urlretrieve
elif sys.version_info[0] == 3:
    from urllib.request import urlretrieve
Exercises
Exercise 1 Write a program to calculate the percentage price change since the start of the year for
the following shares
ticker_list = {'INTC': 'Intel',
'MSFT': 'Microsoft',
'IBM': 'IBM',
'BHP': 'BHP',
'RSH': 'RadioShack',
'TM': 'Toyota',
'AAPL': 'Apple',
'AMZN': 'Amazon',
'BA': 'Boeing',
'QCOM': 'Qualcomm',
'KO': 'Coca-Cola',
'GOOG': 'Google',
'SNE': 'Sony',
'PTR': 'PetroChina'}
Solutions
Solution notebook
Contents
• IPython Shell and Notebook
– Overview
– IPython Magics
– Debugging
– Python in the Cloud
“Debugging is twice as hard as writing the code in the first place. Therefore, if you
write the code as cleverly as possible, you are, by definition, not smart enough to debug
it.” – Brian Kernighan
Overview
As you know by now, IPython is not really a scientific library — it’s an enhanced Python command
interface oriented towards scientific workflow
We’ve already discussed the IPython notebook and shell, starting in this lecture
Here we briefly review some more of IPython’s features
We will work in the IPython shell, but almost all of the following applies to the notebook too
IPython Magics
Line Magics As you know by now, any Python command can be typed into an IPython shell
In [1]: 'foo' * 2
Out[1]: 'foofoo'
A program foo.py in the current working directory can be executed using run
In [2]: run foo.py
Timing Code For scientific calculations, we often need to know how long certain blocks of code
take to run
For this purpose, IPython includes the timeit magic
Usage is very straightforward — let’s look at an example
In earlier exercises, we wrote two different functions to calculate the value of a polynomial
Let’s put them in a file called temp.py as follows
## Filename: temp.py
import numpy as np
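# The two implementations (a sketch consistent with the discussion below:
# p1 uses pure Python, p2 uses NumPy; both satisfy p1(10, (1, 2)) == 21)

def p1(x, coef):
    # evaluate sum of coef[i] * x**i using a pure Python generator
    return sum(a * x**i for i, a in enumerate(coef))

def p2(x, coef):
    # build the vector (1, x, x**2, ...) and take an inner product
    X = np.ones(len(coef))
    X[1:] = x
    y = np.cumprod(X)   # y = [1, x, x**2, ...]
    return np.dot(coef, y)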
Note that p1 uses pure Python, whereas p2 uses NumPy arrays and should run faster
Here’s how we can test this
In [1]: run temp.py
In [2]: p1(10, (1, 2)) # Let's make sure the function works OK
Out[2]: 21
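Timing each implementation is then just a matter of prefixing the call with the magic; a usage sketch (the argument values behind the timings reported below are assumptions):

In [3]: %timeit p1(10, (1, 2))

In [4]: %timeit p2(10, (1, 2))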
For p1, average execution time was 1.15 milliseconds, while for p2 it was about 10 microseconds
(i.e., millionths of a second) — two orders of magnitude faster
Reloading Modules Here is one very common Python gotcha and a nice solution provided by
IPython
When we work with multiple files, changes in one file are not always visible in our program
To see this, suppose that you are working with files useful_functions.py and main_program.py
As the names suggest, the main program resides in main_program.py but imports functions from
useful_functions.py
You might have noticed that if you make a change to useful_functions.py and then re-run
main_program.py, the effect of that change isn’t always apparent
Here's an example useful_functions.py in the current directory

## Filename: useful_functions.py
def meaning_of_life():
    "Computes the meaning of life"
    return 42

And here's a main program that imports from it

## Filename: main_program.py
from useful_functions import meaning_of_life

x = meaning_of_life()
print("The meaning of life is: {}".format(x))

Running main_program.py prints 42, as expected

Now suppose we edit useful_functions.py, changing the return value

def meaning_of_life():
    "Computes the meaning of life"
    return 43

If we re-run main_program.py from the same IPython session, it may well still print 42
The reason is that useful_functions.py has been compiled to a byte code file, in preparation for
sending its instructions to the Python virtual machine
The byte code file will be called useful_functions.pyc, and live in the same directory as
useful_functions.py
Even though we’ve modified useful_functions.py, this change is not reflected in
useful_functions.pyc
The nicest way to get your dependencies to recompile is to use IPython’s autoreload extension
In [3]: %load_ext autoreload
In [4]: autoreload 2
If you want this behavior to load automatically when you start IPython, add these lines to your
ipython_config.py file
c.InteractiveShellApp.extensions = ['autoreload']
c.InteractiveShellApp.exec_lines = ['%autoreload 2']
Debugging
Are you one of those programmers who fills their code with print statements when trying to
debug their programs?
Hey, it’s OK, we all used to do that
But today might be a good day to turn a new page, and start using a debugger
Debugging is a big topic, but it’s actually very easy to learn the basics
The standard Python debugger is pdb
Here we use one called ipdb that plays well with the IPython shell
Either pdb or ipdb will do the job fine
Let’s look at an example of when and how to use them
The debug Magic Let’s consider a simple (and rather contrived) example, where we have a script
called temp.py with the following contents
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()  # call the function, generating the plot
This code is intended to plot the log function over the interval [1, 2]
But there’s an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suitable for
having two subplots on the same figure)
Here’s what happens when we run the code
In [1]: run temp.py
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname,*where)
176 else:
177 filename = fname
--> 178 __builtin__.execfile(filename,*where)
/home/john/temp/temp.py in <module>()
8 plt.show()
9
---> 10 plot_log()
/home/john/temp/temp.py in plot_log()
5 fig, ax = plt.subplots(2, 1)
6 x = np.linspace(1, 2, 10)
----> 7 ax.plot(x, np.log(x))
8 plt.show()
9
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array has
no plot method
But let’s pretend that we don’t understand this for the moment
We might suspect there’s something wrong with ax, but when we try to investigate this object
In [2]: ax
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-645aedc8a285> in <module>()
----> 1 ax
The problem is that ax was defined inside plot_log(), and the name is lost once that function
terminates
Let’s try doing it a different way
First we run temp.py again, but this time we respond to the exception by typing debug
This will cause us to be dropped into the Python debugger at the point of execution just before the
exception occurs
In [1]: run temp.py
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname,*where)
176 else:
177 filename = fname
--> 178 __builtin__.execfile(filename,*where)
/home/john/temp/temp.py in <module>()
8 plt.show()
9
---> 10 plot_log()
/home/john/temp/temp.py in plot_log()
5 fig, ax = plt.subplots(2, 1)
6 x = np.linspace(1, 2, 10)
----> 7 ax.plot(x, np.log(x))
8 plt.show()
9
In [2]: debug
> /home/john/temp/temp.py(7)plot_log()
6 x = np.linspace(1, 2, 10)
----> 7 ax.plot(x, np.log(x))
8 plt.show()
ipdb>
We’re now at the ipdb> prompt, at which we can investigate the value of our variables at this point
in the program, step forward through the code, etc.
For example, here we simply type the name ax to see what’s happening with this object
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)
It’s now very clear that ax is an array, which clarifies the source of the problem
To find out what else you can do from inside ipdb (or pdb), use the on line help
ipdb> h
Undocumented commands:
======================
retval rv
ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
Setting a Break Point The preceding approach is handy but sometimes insufficient
For example, consider the following modified version of temp.py
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()
Here the original problem is fixed, but we've accidentally written np.logspace(1, 2, 10) instead of np.linspace(1, 2, 10)
Now there won’t be any exception, but the plot will not look right
To use the debugger to investigate, we can add a “break point”, by inserting the line import ipdb;
ipdb.set_trace() in a suitable location
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    import ipdb; ipdb.set_trace()
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()
Now let’s run the script, and investigate via the debugger
In [3]: run temp.py
> /home/john/temp/temp.py(6)plot_log()
ipdb> n
> /home/john/temp/temp.py(7)plot_log()
6 fig, ax = plt.subplots()
----> 7 x = np.logspace(1, 2, 10)
8 ax.plot(x, np.log(x))
ipdb> n
> /home/john/temp/temp.py(8)plot_log()
7 x = np.logspace(1, 2, 10)
----> 8 ax.plot(x, np.log(x))
9 plt.show()
ipdb> x
array([ 10. , 12.91549665, 16.68100537, 21.5443469 ,
27.82559402, 35.93813664, 46.41588834, 59.94842503,
77.42636827, 100. ])
Here we used n twice to step forward through the code (one line at a time), and then printed the
value of x to see what was happening with that variable
If you are working in a cloud-based notebook environment, you can run shell commands such as git from a notebook cell as long as you put a ! in front of the command
For example
!git clone https://github.com/QuantEcon/QuantEcon.py
If this works, you should now have the main repository sitting in your present working directory, and you can cd into it and get programming in the same manner described above
The big difference is that your programs are now running on Amazon’s massive web service
infrastructure!
Contents
• The Need for Speed
– Overview
– Where are the Bottlenecks?
– Vectorization
– Numba
– Cython
– Other Options
– Exercises
– Solutions
Overview
Note: In what follows we often ask you to execute code in an IPython notebook cell. Such code
will not run outside the notebook without modifications. This is because we take advantage of
some IPython line and cell magics
Let's start by trying to understand why high level languages like Python are slower than compiled code

Consider, for example, adding two small integers in Python

In [1]: a, b = 10, 10

In [2]: a + b
Out[2]: 20
Even for this simple operation, the Python interpreter has a fair bit of work to do
For example, in the statement a + b, the interpreter has to know which operation to invoke
If a and b are strings, then a + b requires string concatenation
In [3]: a, b = 'foo', 'bar'
In [4]: a + b
Out[4]: 'foobar'
If a and b are lists, then a + b is list concatenation

In [5]: a, b = ['foo'], ['bar']

In [6]: a + b
Out[6]: ['foo', 'bar']
(We say that the operator + is overloaded — its action depends on the type of the objects on which
it acts)
As a result, Python must check the type of the objects and then call the correct operation
This involves substantial overheads
Static Types Compiled languages avoid these overheads with explicit, static types
For example, consider the following C code, which sums the integers from 1 to 10
#include <stdio.h>

int main(void) {
    int i;
    int sum = 0;
    for (i = 1; i <= 10; i++) {
        sum = sum + i;
    }
    printf("sum = %d\n", sum);
    return 0;
}
Data Access Another drag on speed for high level languages is data access
To illustrate, let’s consider the problem of summing some data — say, a collection of integers
Summing with Compiled Code In C or Fortran, these integers would typically be stored in an
array, which is a simple data structure for storing homogeneous data
Such an array is stored in a single contiguous block of memory
• In modern computers, memory addresses are allocated to each byte (one byte = 8 bits)
• For example, a 64 bit integer is stored in 8 bytes of memory
• An array of n such integers occupies 8n consecutive memory slots
Moreover, the compiler is made aware of the data type by the programmer
• In this case 64 bit integers
Hence each successive data point can be accessed by shifting forward in memory space by a
known and fixed amount
• In this case 8 bytes
Summing in Pure Python Python tries to replicate these ideas to some degree
For example, in the standard Python implementation (CPython), list elements are placed in mem-
ory locations that are in a sense contiguous
However, these list elements are more like pointers to data rather than actual data
Hence there is still overhead involved in accessing the data values themselves
This is a considerable drag on speed
In fact it’s generally true that memory traffic is a major culprit when it comes to slow execution
Let’s look at some ways around these problems
Vectorization
After importing the relevant libraries (import random and import numpy as np), try the following in a notebook cell

%%timeit
n = 100000
sum = 0
for i in range(n):
    x = random.uniform(0, 1)
    sum += x**2
Followed by
%%timeit
n = 100000
x = np.random.uniform(0, 1, n)
np.sum(x**2)
You should find that the second code block — which achieves the same thing as the first — runs
one to two orders of magnitude faster
The reason is that in the second implementation we have broken the loop down into three basic
operations
1. draw n uniforms
2. square them
3. sum them
These are sent as batch operations to optimized machine code
Apart from minor overheads associated with sending data back and forth, the result is C- or
Fortran-like speed
When we run batch operations on arrays like this, we say that the code is vectorized
Although there are exceptions, vectorized code is typically fast and efficient
It is also surprisingly flexible, in the sense that many operations can be vectorized
The next section illustrates this point
Universal Functions Many functions provided by NumPy are so-called universal functions —
also called ufuncs
This means that they
• map scalars into scalars, as expected
• map arrays into arrays, acting elementwise
For example, np.cos is a ufunc:
In [1]: import numpy as np
In [2]: np.cos(1.0)
Out[2]: 0.54030230586813977
Consider, for example, the problem of maximizing a smooth function of two variables over a grid, such as

import numpy as np

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)

Here's a non-vectorized version that maximizes by stepping through the grid with Python loops

m = -np.inf

for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z

print(m)
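The vectorized version, sketched below, evaluates f over all grid points at once using np.meshgrid (this works because f is built from ufuncs, and hence acts elementwise on arrays)

x, y = np.meshgrid(grid, grid)   # 2D arrays holding all (x, y) pairs on the grid
print(np.max(f(x, y)))           # evaluate f on the whole grid and take the max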
In the vectorized version all the looping takes place in compiled code
If you add %%timeit to the top of these code snippets and run them in a notebook cell, you’ll see
that the second version is much faster — about two orders of magnitude
Pros and Cons of Vectorization At its best, vectorization yields fast, simple code
However, it’s not without disadvantages
One issue is that it can be highly memory intensive
For example, the vectorized maximization routine above is far more memory intensive than the
non-vectorized version that preceded it
Another issue is that not all algorithms can be vectorized
In these kinds of settings, we need to go back to loops
Fortunately, there are very nice ways to speed up Python loops
Numba
One of the most exciting developments in recent years in terms of scientific Python is Numba
Numba aims to automatically compile functions to native machine code instructions on the fly
The process isn’t flawless, since Numba needs to infer type information on all variables to generate
pure machine instructions
Such inference isn’t possible in every setting
But for simple routines Numba infers types very well
Moreover, the “hot loops” at the heart of our code that we need to speed up are often such simple
routines
Prerequisites If you followed our set up instructions and installed Anaconda, then you’ll be
ready to use Numba
If not, try import numba
• If you get no complaints then you should be good to go
• If you do experience problems here or below then consider installing Anaconda
If you do have Anaconda installed, now might be a good time to run conda update numba from a
system terminal
As a first example, consider generating the trajectory of the quadratic map

x_{t+1} = 4 x_t (1 - x_t)
Here’s the plot of a typical trajectory, starting from x0 = 0.1, with t on the x-axis
Now here’s a function to generate a trajectory of a given length from a given initial condition
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x
Here the function body is identical to qm — the name has been changed only to aid speed comparisons
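A minimal sketch of the Numba version, assuming Numba is installed (the jit decorator is Numba's standard entry point for on-the-fly compilation):

from numba import jit

@jit
def qm_numba(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x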
Timing the function calls qm(0.1, 100000) and qm_numba(0.1, 100000) gives us a speed-up fac-
tor in the order of 400 times
Your mileage may vary depending on your hardware and version of Numba, but anything in this
neighborhood is remarkable given how trivial the implementation is
How and When it Works Numba attempts to generate fast machine code using the infrastruc-
ture provided by the LLVM Project
It does this by inferring type information on the fly
As you can imagine, this is easier for simple Python objects (simple scalar data types, such as
floats, integers, etc.)
Numba also plays well with NumPy arrays, which it treats as typed memory regions
In an ideal setting, Numba can infer all necessary type information
This allows it to generate native machine code, without having to call the Python runtime envi-
ronment
In such a setting, Numba will be on par with machine code from low level languages
When Numba cannot infer all type information, some Python objects are given generic object
status, and some code is generated using the Python runtime
In this second setting, Numba typically provides only minor speed gains — or none at all
Hence it’s prudent when using Numba to focus on speeding up small, time-critical snippets of
code
This will give you much better performance than blanketing your Python programs with @jit
statements
Cython
Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python
As was the case with Numba, a key problem is the fact that Python is dynamically typed
As you’ll recall, Numba solves this problem (where possible) by inferring type
Cython’s approach is different — programmers add type definitions directly to their “Python”
code
As such, the Cython language can be thought of as Python with type definitions
In addition to a language specification, Cython is also a language translator, transforming Cython
code into optimized C and C++ code
Cython also takes care of building language extensions — the wrapper code that interfaces between the resulting compiled code and Python
As we’ll see, Cython is particularly easy to use from within the IPython notebook
A First Example As a first example, consider computing the sum of the geometric series 1 + α + α² + · · · + αⁿ for given α and n

Here's a pure Python function that performs the calculation

def geo_prog(alpha, n):
    current = 1.0
    sum = current
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum

And here's a C function that does the same thing

double geo_prog(double alpha, int n) {
    double current = 1.0;
    double sum = current;
    int i;
    for (i = 1; i <= n; i++) {
        current = current * alpha;
        sum = sum + current;
    }
    return sum;
}
If you’re not familiar with C, the main thing you should take notice of is the type definitions
• int means integer
• double means double precision floating point number
• the double in double geo_prog(... indicates that the function will return a double
Not surprisingly, the C code is faster than the Python code
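Here's a Cython version of the same function, entered into a notebook cell (a sketch consistent with the discussion that follows; geo_prog_cython is the name referenced below)

%%cython
def geo_prog_cython(double alpha, int n):
    cdef double current = 1.0
    cdef double sum = current
    cdef int i
    for i in range(1, n + 1):
        current = current * alpha
        sum = sum + current
    return sum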
Here cdef is a Cython keyword indicating a variable declaration, and is followed by a type
The %%cython line at the top is not actually Cython code — it’s an IPython cell magic indicating
the start of Cython code
After executing the cell, you can now call the function geo_prog_cython from within Python
What you are in fact calling is compiled C code that runs at about the same speed as our hand-
coded C routine above
Example 2: Cython with NumPy Arrays Let’s go back to the first problem that we worked with:
generating the iterates of the quadratic map
x_{t+1} = 4 x_t (1 - x_t)
The problem of computing iterates and returning a time series requires us to work with arrays
The natural array type to work with is NumPy arrays
Here’s a Cython implemention that initializes, populates and returns a NumPy array
%%cython
import numpy as np
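# A sketch of the first-pass implementation described in the text:
# initialize, populate and return a NumPy array of iterates
def qm_cython_first_pass(x0, n):
    x = np.zeros(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)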
If you run this code and time it, you will see that its performance is disappointing — nothing like the speed gain we got from Numba
• See qm_numba above
The reason is that working with NumPy arrays still incurs substantial Python overheads
We can do better by using Cython’s typed memoryviews, which provide more direct access to
arrays in memory
When using them, the first step is to create a NumPy array
Next, we declare a memoryview and bind it to the NumPy array
Here’s an example:
%%cython
import numpy as np
from numpy cimport float_t
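# A sketch of the memoryview-based implementation described below
def qm_cython(double x0, int n):
    cdef int t
    x_np_array = np.zeros(n+1, dtype=float)   # create a NumPy array
    cdef float_t [:] x = x_np_array           # bind a typed memoryview to it
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)                      # convert back to a NumPy array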
Here
• cimport pulls in some compile-time information from NumPy
• cdef float_t [:] x = x_np_array creates a memoryview on the NumPy array
x_np_array
• the return statement uses np.asarray(x) to convert the memoryview back to a NumPy array
On our hardware, the Cython implementation qm_cython runs at about the same speed as
qm_numba
Summary Cython requires more expertise than Numba, and is a little more fiddly in terms of
getting good performance
In fact it’s surprising how difficult it is to beat the speed improvements provided by Numba
Nonetheless,
• Cython is a very mature, stable and widely used tool
• Cython can be more useful than Numba when working with larger, more sophisticated ap-
plications
Other Options
There are in fact many other approaches to speeding up your Python code
We mention only a few of the most popular methods
Interfacing with Fortran If you are comfortable writing Fortran you will find it very easy to create extension modules from Fortran code using F2Py
F2Py is a Fortran-to-Python interface generator that is particularly simple to use
Robert Johansson provides a very nice introduction to F2Py, among other things
Recently, an IPython cell magic for Fortran has been developed — you might want to give it a try
Parallel and Cloud Computing This is a big topic that we won’t address in detail yet
However, you might find the following links a useful starting point
• IPython for parallel computing
• NumbaPro
• The Starcluster interface to Amazon’s EC2
• Anaconda Accelerate
Exercises
Exercise 1 Later we’ll learn all about finite state Markov chains
For now let’s just concentrate on simulating a very simple example of such a chain
Suppose that the volatility of returns on an asset can be in one of two regimes — high or low
The transition probabilities across states are as follows
[Figure: a two-state diagram in which low transitions to itself with probability 0.9 and to high with probability 0.1, while high transitions to itself with probability 0.8 and to low with probability 0.2]
For example, let the period length be one month, and suppose the current state is high
We see from the graph that the state next month will be
• high with probability 0.8
• low with probability 0.2
Your task is to simulate a sequence of monthly volatility states according to this rule
Set the length of the sequence to n = 100000 and start in the high state
Implement a pure Python version, a Numba version and a Cython version, and compare speeds
To test your code, evaluate the fraction of time that the chain spends in the low state
If your code is correct, it should be about 2/3
Solutions
Solution notebook
Appendix — Other Options There are other important projects aimed at speeding up Python
These include but are not limited to
• Pythran : A Python to C++ compiler
• Parakeet : A runtime compiler aimed at scientific computing in Python
• PyPy : Runtime environment using just-in-time compiler
• Nuitka : Another Python compiler
• Pyston : Under development, sponsored by Dropbox
TWO
INTRODUCTORY APPLICATIONS
Contents
• Linear Algebra
– Overview
– Vectors
– Matrices
– Solving Systems of Equations
– Eigenvalues and Eigenvectors
– Further Topics
Overview
One of the single most useful branches of mathematics you can learn is linear algebra
For example, many applied problems in economics, finance, operations research and other fields
of science require the solution of a linear system of equations, such as
y_1 = a x_1 + b x_2
y_2 = c x_1 + d x_2
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best “approximate” solution?
• If a solution exists, how should we compute it?
These are the kinds of topics addressed by linear algebra
In this lecture we will cover the basics of linear and matrix algebra, treating both theory and
computation
We admit some overlap with this lecture, where operations on NumPy arrays were first explained
Note that this lecture is more theoretical than most, and contains background material that will be
used in applications as we go along
Vectors
A vector of length n is just a sequence (or array, or tuple) of n numbers, which we write as x = (x_1, . . . , x_n) or x = [x_1, . . . , x_n]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish
between the two)
The set of all n-vectors is denoted by Rn
For example, R2 is the plane, and a vector in R2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner
If you’re interested, the Python code for producing this figure is here
Vector Operations The two most common operators for vectors are addition and scalar multi-
plication, which we now describe
As a matter of definition, when we add two vectors, we add them element by element
x + y = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} := \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{bmatrix}
Scalar multiplication is an operation that takes a number γ and a vector x and produces
\gamma x := \begin{bmatrix} \gamma x_1 \\ \vdots \\ \gamma x_n \end{bmatrix}
In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is more
commonly represented as a NumPy array
One advantage of NumPy arrays is that scalar multiplication and addition have very natural syn-
tax
In [1]: import numpy as np

In [2]: x = np.ones(3)             # vector of three ones

In [3]: y = np.array((2, 4, 6))    # converts the tuple (2, 4, 6) into an array
In [4]: x + y
Out[4]: array([ 3., 5., 7.])
In [5]: 4 * x
Out[5]: array([ 4., 4., 4.])
Span Given a set of vectors A := { a1 , . . . , ak } in Rn , it’s natural to think about the new vectors
we can create by performing linear operations
New vectors created in this manner are called linear combinations of A
In particular, y ∈ Rn is a linear combination of A := { a1 , . . . , ak } if
y = β 1 a1 + · · · + β k ak for some scalars β 1 , . . . , β k
In this context, the values β 1 , . . . , β k are called the coefficients of the linear combination
The set of linear combinations of A is called the span of A
The next figure shows the span of A = { a1 , a2 } in R3
The span is a 2 dimensional plane passing through these two points and the origin
The code for producing this figure can be found here
Examples If A contains only one vector a1 ∈ R2 , then its span is just the scalar multiples of a1 ,
which is the unique line passing through both a1 and the origin
If A = {e_1, e_2, e_3} consists of the canonical basis vectors of R³, that is

e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}

then the span of A is all of R³, because any x = (x_1, x_2, x_3) ∈ R³ can be written as

x = x_1 e_1 + x_2 e_2 + x_3 e_3
Linear Independence As we’ll see, it’s often desirable to find families of vectors with relatively
large span, so that many vectors can be described by linear operators on a few vectors
The condition we need for a set of vectors to have a large span is what’s called linear independence
In particular, a collection of vectors A := { a1 , . . . , ak } in Rn is said to be
• linearly dependent if some strict subset of A has the same span as A
• linearly independent if it is not linearly dependent
Put differently, a set of vectors is linearly independent if no vector is redundant to the span, and
linearly dependent otherwise
To illustrate the idea, recall the figure that showed the span of vectors { a1 , a2 } in R3 as a plane
through the origin
If we take a third vector a3 and form the set { a1 , a2 , a3 }, this set will be
• linearly dependent if a3 lies in the plane
• linearly independent otherwise
As another illustration of the concept, since Rn can be spanned by n vectors (see the discussion of
canonical basis vectors above), any collection of m > n vectors in Rn must be linearly dependent
The following statements are equivalent to linear independence of A := { a1 , . . . , ak } ⊂ Rn
1. No vector in A can be formed as a linear combination of the other elements
2. If β_1 a_1 + · · · + β_k a_k = 0 for scalars β_1, . . . , β_k, then β_1 = · · · = β_k = 0
(The zero in the first expression is the origin of Rn )
Unique Representations Another nice thing about sets of linearly independent vectors is that
each element in the span has a unique representation as a linear combination of these vectors
In other words, if A := {a_1, . . . , a_k} ⊂ Rⁿ is linearly independent and

y = β_1 a_1 + · · · + β_k a_k = γ_1 a_1 + · · · + γ_k a_k

then subtracting the two representations gives

(β_1 − γ_1) a_1 + · · · + (β_k − γ_k) a_k = 0

and linear independence implies β_i = γ_i for all i, so the representation is unique
Matrices
Matrices are a neat way of organizing data for use in linear operations
An n × k matrix is a rectangular array A of numbers with n rows and k columns:
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{bmatrix}
Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed
at the start of this lecture
For obvious reasons, the matrix A is also called a vector if either n = 1 or k = 1
In the former case, A is called a row vector, while in the latter it is called a column vector
If n = k, then A is called square
The matrix formed by replacing a_{ij} by a_{ji} for every i and j is called the transpose of A, and denoted A′ or Aᵀ

If A = A′, then A is called symmetric

For a square matrix A, the n elements of the form a_{ii} for i = 1, . . . , n are called the principal diagonal
A is called diagonal if the only nonzero entries are on the principal diagonal
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then A is
called the identity matrix, and denoted by I
Matrix Operations Just as was the case for vectors, a number of algebraic operations are defined
for matrices
Scalar multiplication and addition are immediate generalizations of the vector case:
a11 · · · a1k γa11 · · · γa1k
γA = γ ... .. .. := .. .. ..
. . . . .
an1 · · · ank γan1 · · · γank
and
a11 · · · a1k b11 · · · b1k a11 + b11 · · · a1k + b1k
A + B = ... .. .. + .. .. .. := .. .. ..
. . . . . . . .
an1 · · · ank bn1 · · · bnk an1 + bn1 · · · ank + bnk
In the latter case, the matrices must have the same shape in order for the definition to make sense
We also have a convention for multiplying two matrices
The rule for matrix multiplication generalizes the idea of inner products discussed above, and is
designed to make multiplication play well with basic linear operations
If A and B are two matrices, then their product AB is formed by taking as its i, j-th element the
inner product of the i-th row of A and the j-th column of B
There are many tutorials to help you visualize this operation, such as this one, or the discussion
on the Wikipedia page
If A is n × k and B is j × m, then to multiply A and B we require k = j, and the resulting matrix
AB is n × m
As perhaps the most important special case, consider multiplying n × k matrix A and k × 1 column
vector x
According to the preceding rule, this gives us an n × 1 column vector
a11 · · · a1k x1 a11 x1 + · · · + a1k xk
Ax = ... .. .. .. := .. (2.2)
. . . .
an1 · · · ank xk an1 x1 + · · · + ank xk
Matrices in NumPy NumPy arrays are also used as matrices, and have fast, efficient functions
and methods for all the standard matrix operations 1
You can create them manually from tuples of tuples (or lists of lists) as follows
In [1]: import numpy as np

In [2]: A = ((1, 2), (3, 4))
In [3]: type(A)
Out[3]: tuple
In [4]: A = np.array(A)
In [5]: type(A)
Out[5]: numpy.ndarray
In [6]: A.shape
Out[6]: (2, 2)
The shape attribute is a tuple giving the number of rows and columns — see here for more discus-
sion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenient functions for creating common matrices (matrices of zeros, ones, etc.)
— see here
Since operations are performed elementwise by default, scalar multiplication and addition have
very natural syntax
In [8]: A = np.identity(3)

In [9]: B = np.ones((3, 3))
In [10]: 2 * A
Out[10]:
array([[ 2., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 2.]])
In [11]: A + B
Out[11]:
array([[ 2., 1., 1.],
[ 1., 2., 1.],
[ 1., 1., 2.]])

1 Although there is a specialized matrix data type defined in NumPy, it's more standard to work with ordinary NumPy arrays
Matrices as Maps Each n × k matrix A can be identified with a function f ( x ) = Ax that maps
x ∈ Rk into y = Ax ∈ Rn
These kinds of functions have a special property: they are linear
A function f : Rᵏ → Rⁿ is called linear if, for all x, y ∈ Rᵏ and all scalars α, β, we have

f(αx + βy) = α f(x) + β f(y)
You can check that this holds for the function f ( x ) = Ax + b when b is the zero vector, and fails
when b is nonzero
In fact, it's known that f is linear if and only if there exists a matrix A such that f(x) = Ax for all x

Solving Systems of Equations Consider the system of equations

y = Ax    (2.3)

The problem we face is to determine a vector x ∈ Rᵏ that solves (2.3), taking y and A as given
This is a special case of a more general problem: Find an x such that y = f ( x )
Given an arbitrary function f and a y, is there always an x such that y = f ( x )?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
In the first plot there are multiple solutions, as the function is not one-to-one, while in the second
there are no solutions, since y lies outside the range of f
Can we impose conditions on A in (2.3) that rule out these problems?
In this context, the most important thing to recognize about the expression Ax is that it corresponds to a linear combination of the columns of A

In particular, if a_1, . . . , a_k are the columns of A, then

Ax = x_1 a_1 + · · · + x_k a_k
The n × n Case Let’s discuss some more details, starting with the case where A is n × n
This is the familiar case where the number of unknowns equals the number of equations
For arbitrary y ∈ Rn , we hope to find a unique x ∈ Rn such that y = Ax
In view of the observations immediately above, if the columns of A are linearly independent, then
their span, and hence the range of f ( x ) = Ax, is all of Rn
Hence there always exists an x such that y = Ax
Moreover, the solution is unique
In particular, the following are equivalent
1. The columns of A are linearly independent
2. For any y ∈ Rn , the equation y = Ax has a unique solution
The property of having linearly independent columns is sometimes expressed as having full col-
umn rank
Inverse Matrices Can we give some sort of expression for the solution?
If y and A are scalar with A ≠ 0, then the solution is x = A⁻¹ y
A similar expression is available in the matrix case
In particular, if square matrix A has full column rank, then it possesses a multiplicative inverse
matrix A−1 , with the property that AA−1 = A−1 A = I
As a consequence, if we pre-multiply both sides of y = Ax by A−1 , we get x = A−1 y
This is the solution that we’re looking for
Determinants Another quick comment about square matrices is that to every such matrix we
assign a unique number called the determinant of the matrix — you can find the expression for it
here
If the determinant of A is not zero, then we say that A is nonsingular
Perhaps the most important fact about determinants is that A is nonsingular if and only if A is of
full column rank
This gives us a useful one-number summary of whether or not a square matrix can be inverted
More Columns than Rows This is the n × k case with n < k, so there are fewer equations than
unknowns
In this case there are either no solutions or infinitely many — in other words, uniqueness never
holds
For example, consider the case where k = 3 and n = 2
Thus, the columns of A consists of 3 vectors in R2
This set can never be linearly independent, since 2 vectors are enough to span R2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, let's say that a_1 = α a_2 + β a_3

Then any y = x_1 a_1 + x_2 a_2 + x_3 a_3 can be rewritten as

y = x_1 (α a_2 + β a_3) + x_2 a_2 + x_3 a_3 = (x_1 α + x_2) a_2 + (x_1 β + x_3) a_3

so one of the coefficients is redundant
Linear Equations with SciPy Here’s an illustration of how to solve linear equations with SciPy’s
linalg submodule
All of these routines are Python front ends to time-tested and highly optimized FORTRAN code
In [9]: import numpy as np

In [10]: from scipy.linalg import inv, solve

In [11]: A = ((1, 2), (3, 4))

In [12]: A = np.array(A)

In [13]: y = np.ones((2, 1))  # column vector

In [15]: A_inv = inv(A)  # compute the inverse

In [16]: A_inv
Out[16]:
array([[-2. , 1. ],
[ 1.5, -0.5]])

Observe how we can solve for x = A⁻¹y either via np.dot(inv(A), y) or via solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more stable,
and hence should almost always be preferred
To obtain the least squares solution x̂ = (A′A)⁻¹A′y, use scipy.linalg.lstsq(A, y)
Eigenvalues and Eigenvectors Let A be an n × n square matrix; if λ is a scalar and v is a nonzero vector in Rⁿ such that

Av = λv

then we say that λ is an eigenvalue of A, and v is an eigenvector
The eigenvalue equation is equivalent to ( A − λI )v = 0, and this has a nonzero solution v only
when the columns of A − λI are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for λ such that the determinant of A − λI is zero
This problem can be expressed as one of solving for the roots of a polynomial in λ of degree n
This in turn implies the existence of n solutions in the complex plane, although some might be
repeated
Some nice facts about the eigenvalues of a square matrix A are as follows
1. The determinant of A equals the product of the eigenvalues
2. The trace of A (the sum of the elements on the principal diagonal) equals the sum of the
eigenvalues
3. If A is symmetric, then all of its eigenvalues are real
4. If A is invertible and λ1 , . . . , λn are its eigenvalues, then the eigenvalues of A−1 are
1/λ1 , . . . , 1/λn
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues are
nonzero
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows
In [1]: import numpy as np

In [2]: from scipy.linalg import eig

In [3]: A = ((1, 2), (2, 1))

In [4]: A = np.array(A)

In [5]: evals, evecs = eig(A)

In [6]: evals
Out[6]: array([ 3.+0.j, -1.+0.j])
In [7]: evecs
Out[7]:
array([[ 0.70710678, -0.70710678],
[ 0.70710678, 0.70710678]])
Note that scipy.linalg.eig can also solve the generalized eigenvalue problem

Av = λBv

where B is a second n × n matrix — see the SciPy documentation for details
Further Topics
We round out our discussion by briefly mentioning several other important topics
Series Expansions Recall the usual summation formula for a geometric progression, which states that if |a| < 1, then ∑_{k=0}^{∞} a^k = (1 − a)⁻¹

A generalization of this idea exists in the matrix setting: Neumann's theorem states that if ‖A^k‖ < 1 for some k ∈ N, then I − A is invertible and

(I − A)⁻¹ = ∑_{k=0}^{∞} A^k    (2.4)

Here ‖·‖ is the matrix norm defined by

‖A‖ := max_{‖x‖=1} ‖Ax‖
The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side
is a matrix norm — in this case, the so-called spectral norm
For example, for a square matrix S, the condition ‖S‖ < 1 means that S is contractive, in the sense that it pulls all vectors towards the origin 2
Spectral Radius A result known as Gelfand’s formula tells us that, for any square matrix A,
ρ(A) = lim_{k→∞} ‖A^k‖^{1/k}
Here ρ(A) is the spectral radius, defined as max_i |λ_i|, where {λ_i}_i is the set of eigenvalues of A
As a consequence of Gelfand's formula, if all eigenvalues are strictly less than one in modulus, there exists a k with ‖A^k‖ < 1
In which case (2.4) is valid
Differentiating Linear and Quadratic forms The following formulas are useful in many eco-
nomic contexts. Let
• z, x and a all be n × 1 vectors
• A be an n × n matrix

Then

1. ∂(a′x)/∂x = a
2. ∂(Ax)/∂x = A′
3. ∂(x′Ax)/∂x = (A + A′)x
2 Suppose that ‖S‖ < 1. Take any nonzero vector x, and let r := ‖x‖. We have ‖Sx‖ = r ‖S(x/r)‖ ≤ r ‖S‖ < r = ‖x‖. Hence every point is pulled towards the origin.
As an application, consider the problem

v(x) = max_{y,u} { −y′Py − u′Qu }  subject to  y = Ax + Bu

The associated Lagrangian is

L = −y′Py − u′Qu + λ′[Ax + Bu − y]
Note: If we don't care about the Lagrange multipliers, we can substitute the constraint into the objective function, and then just maximize −(Ax + Bu)′P(Ax + Bu) − u′Qu with respect to u. You can verify that this leads to the same maximizer.
Further Reading The documentation of the scipy.linalg submodule can be found here
Chapters 2 and 3 of the following text contain a discussion of linear algebra along the same lines as above, with solved exercises
If you don’t mind a slightly abstract approach, a nice intermediate-level read on linear algebra is
[Janich94]
Contents
• Finite Markov Chains
– Overview
– Definitions
– Simulation
– Marginal Distributions
– Irreducibility and Aperiodicity
– Stationary Distributions
– Ergodicity
– Computing Expectations
– Exercises
– Solutions
Overview
Markov chains are one of the most useful classes of stochastic processes
Attributes:
• simple, flexible and supported by many elegant theoretical results
• valuable for building intuition about random dynamic models
• very useful in their own right
You will find them in many of the workhorse models of economics and finance
In this lecture we review some of the theory of Markov chains
We will also introduce some of the great routines for working with Markov chains available in
QuantEcon
Prerequisite knowledge is basic probability and linear algebra
Definitions
Stochastic Matrices A stochastic matrix (or Markov matrix) is an n × n square matrix P = P[s, s′] such that

1. each element P[s, s′] is nonnegative, and
2. each row P[s, ·] sums to one
(The square brackets notation for indices is unconventional but ties in well with the code below
and helps us differentiate between indices and time subscripts)
Let S := {0, . . . , n − 1}
Each row P[s, ·] can be regarded as a distribution (probability mass function) on S
It is not difficult to check 3 that if P is a stochastic matrix, then so is the k-th power P^k for all k ∈ N
Markov Chains There is a close connection between stochastic matrices and Markov chains
A Markov chain { Xt } on S is a stochastic process on S that has the Markov property
This means that, for any date t and any state s0 ∈ S,
P { X t +1 = s 0 | X t } = P { X t +1 = s 0 | X t , X t −1 , . . . } (2.5)
In other words, knowing the current state is enough to understand probabilities for future states
In particular, the dynamics of a Markov chain are fully determined by the set of values

P[s, s′] := P{X_{t+1} = s′ | X_t = s}    (s, s′ ∈ S)    (2.6)

By construction,
• P[s, s′] is the probability of going from s to s′ in one unit of time (one step)
• P[s, ·] is the conditional distribution of X_{t+1} given X_t = s
It’s clear that P is a stochastic matrix
Conversely, if we take a stochastic matrix P, we can generate a Markov chain { Xt } as follows:
• draw X0 from some specified distribution
• for t = 0, 1, . . .,
– draw Xt+1 from P[ Xt , ·]
By construction, the resulting process satisfies (2.5)
Example 1 Consider a worker who, at any given time t, is either unemployed (state 0) or em-
ployed (state 1)
Suppose that, over a one month period,
1. An unemployed worker finds a job with probability α ∈ (0, 1)
2. An employed worker loses her job and becomes unemployed with probability β ∈ (0, 1)

In terms of a Markov chain, we have S = {0, 1}, with P[0, 1] = α and P[1, 0] = β
3 Hint: First show that if P and Q are stochastic matrices then so is their product — to check the row sums, try postmultiplying by a column vector of ones. Finally, argue that P^n is a stochastic matrix using induction.
Once we have the values α and β, we can address a range of questions, such as
• What is the average duration of unemployment?
• Over the long-run, what fraction of time does a worker find herself unemployed?
• Conditional on employment, what is the probability of becoming unemployed at least once
over the next 12 months?
• Etc.
We’ll cover such applications below
Example 2 Using US unemployment data, Hamilton [Ham05] estimated the stochastic matrix
P := \begin{bmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{bmatrix}
where
• the frequency is monthly
• the first state represents “normal growth”
• the second state represents “mild recession”
• the third state represents “severe recession”
For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97
In general, large values on the main diagonal indicate persistence in the process { Xt }
This Markov process can also be represented as a directed graph, with edges labeled by transition
probabilities
Simulation
One of the most natural ways to answer questions about Markov chains is to simulate them
(As usual, to approximate the probability of event E, we can simulate many times and count the
fraction of times that E occurs)
Nice functionality for simulating Markov chains exists in QuantEcon
This is probably what you should use in applications, since it's efficient and bundled with lots of other useful routines for handling Markov chains
However, it’s also a good exercise to roll our own routines — let’s do that first and then come back
to the methods in QuantEcon
Rolling our own To simulate a Markov chain, we need its stochastic matrix P and either an initial state or a probability distribution ψ from which the initial state is drawn
The Markov chain is then constructed as discussed above. To repeat:
1. At time t = 0, X_0 is set to some fixed state or chosen from ψ
2. At each subsequent time t, the new state Xt+1 is drawn from P[ Xt , ·]
In order to implement this simulation procedure, we need a method for generating draws from a discrete distribution
For this task we’ll use DiscreteRV from QuantEcon
In [64]: from quantecon import DiscreteRV

In [65]: psi = (0.3, 0.7)            # probabilities over the two states

In [66]: d = DiscreteRV(psi)

In [67]: d.draw(5)                   # five independent draws from psi
We’ll write our code as a function that takes the following three arguments
• A stochastic matrix P
• An initial state init
• A positive integer sample_size representing the length of the time series the function should
return
import numpy as np
import quantecon as qe

def mc_sample_path(P, init=0, sample_size=1000):
    # === make sure P is a NumPy array === #
    P = np.asarray(P)
    # === allocate memory === #
    X = np.empty(sample_size, dtype=int)
    X[0] = init
    # === convert each row of P into a distribution === #
    # In particular, P_dist[i] = the distribution corresponding to P[i,:]
    n = len(P)
    P_dist = [qe.DiscreteRV(P[i,:]) for i in range(n)]
    # === generate the sample path === #
    for t in range(sample_size - 1):
        X[t+1] = P_dist[X[t]].draw()
    return X
As we’ll see later, for a long series drawn from P, the fraction of the sample that takes value 0 will
be about 0.25
If you run the following code you should get roughly that answer
In [5]: P = [[0.4, 0.6], [0.2, 0.8]]

In [6]: X = mc_sample_path(P, sample_size=100000)
In [7]: np.mean(X == 0)
Out[7]: 0.25128
Using QuantEcon's Routines As discussed above, QuantEcon has very nice routines for handling Markov chains, including simulation
Here’s an illustration using the same P as the preceding example
In [6]: import numpy as np

In [7]: import quantecon as qe

In [8]: P = [[0.4, 0.6], [0.2, 0.8]]
In [9]: mc = qe.MarkovChain(P)
In [10]: X = mc.simulate(ts_length=1000000)
In [11]: np.mean(X == 0)
Out[11]: 0.250359
Marginal Distributions
Suppose that
1. { Xt } is a Markov chain with stochastic matrix P
2. the distribution of Xt is known to be ψt
What then is the distribution of Xt+1 , or, more generally, of Xt+m ?
(Motivation for these questions is given below)
In words, to get the probability of being at s′ tomorrow, we account for all ways this can happen and sum their probabilities

Rewriting this statement in terms of marginal and conditional probabilities gives

ψ_{t+1}[s′] = ∑_{s∈S} ψ_t[s] P[s, s′]    (2.7)

If we think of ψ_{t+1} and ψ_t as row vectors, these n equations (one for each s′ ∈ S) can be written as

ψ_{t+1} = ψ_t P    (2.8)

Iterating m times gives

ψ_{t+m} = ψ_t P^m    (2.9)
Multiple Step Transition Probabilities We know that the probability of transitioning from s to s′ in one step is P[s, s′]

It turns out that the probability of transitioning from s to s′ in m steps is P^m[s, s′], the [s, s′]-th element of the m-th power of P
To see why, consider again (2.9), but now with ψ_t putting all probability on state s

If we regard ψ_t as a vector, it is a vector with 1 in the s-th position and zero elsewhere

Inserting this into (2.9), we see that, conditional on X_t = s, the distribution of X_{t+m} is the s-th row of P^m

In particular

P{X_{t+m} = s′ | X_t = s} = P^m[s, s′] = the [s, s′]-th element of P^m
Example: Probability of Recession Recall the stochastic matrix P for recession and growth con-
sidered above
Suppose that the current state is unknown — perhaps statistics are available only at the end of the
current month
We estimate the probability that the economy is in state s to be ψ[s]
The probability of being in recession (state 1 or state 2) in 6 months time is given by the inner product

ψ P^6 · \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
Irreducibility and Aperiodicity Just about every theoretical treatment of Markov chains has some discussion of the concepts of irreducibility and aperiodicity

Let's see what they're about
Irreducibility Consider the following example of transition probabilities for the wealth of a fictitious household

[Figure: a directed graph on the states poor, middle class and rich, with edges labeled by the transition probabilities in the matrix below]
We can translate this into a stochastic matrix, putting zeros where there’s no edge between nodes
P := \begin{bmatrix} 0.9 & 0.1 & 0 \\ 0.4 & 0.4 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}
It’s clear from the graph that this stochastic matrix is irreducible: we can reach any state from any
other state eventually
We can also test this using QuantEcon’s MarkovChain class
In [1]: import quantecon as qe

In [2]: P = [[0.9, 0.1, 0.0],
...: [0.4, 0.4, 0.2],
...: [0.1, 0.1, 0.8]]
In [3]: mc = qe.MarkovChain(P)
In [4]: mc.is_irreducible
Out[4]: True
Here’s a more pessimistic scenario, where the poor are poor forever
[Figure: the same three states, but now poor is absorbing — it transitions to itself with probability 1.0, while the other transition probabilities match the matrix below]
This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor
Let’s confirm this
In [2]: P = [[1.0, 0.0, 0.0],
...: [0.1, 0.8, 0.1],
...: [0.0, 0.2, 0.8]]
In [3]: mc = qe.MarkovChain(P)
In [4]: mc.is_irreducible
Out[4]: False
It might be clear to you already that irreducibility is going to be important in terms of long run
outcomes
For example, poverty is a life sentence in the second graph but not the first
We’ll come back to this a bit later
Aperiodicity Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way and aperiodic otherwise
Here’s a trivial example with three states
[Figure: three states a, b and c arranged in a cycle, with each transition a → b → c → a occurring with probability 1.0]
In [2]: P = [[0, 1, 0],
...: [0, 0, 1],
...: [1, 0, 0]]

In [3]: mc = qe.MarkovChain(P)
In [4]: mc.period
Out[4]: 3
More formally, the period of a state s is the greatest common divisor of the set of integers

D(s) := {j ≥ 1 : P^j[s, s] > 0}

In the last example, D(s) = {3, 6, 9, . . .} for every state s, so the period is 3
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic otherwise
For example, the stochastic matrix associated with the transition probabilities below is periodic
because, for example, state a has period 2
In [3]: mc = qe.MarkovChain(P)
In [4]: mc.period
Out[4]: 2
In [5]: mc.is_aperiodic
Out[5]: False
Stationary Distributions
As seen in (2.7), we can shift probabilities forward one unit of time via postmultiplication by P
Some distributions are invariant under this updating process — for example,
In [2]: P = np.array([[.4, .6], [.2, .8]]) # after import numpy as np

In [3]: psi = (0.25, 0.75)

In [4]: np.dot(psi, P)
Out[4]: array([ 0.25, 0.75])

Such a ψ is called stationary (or invariant) for P: formally, ψ∗ is stationary for P if ψ∗ = ψ∗ P
Example Recall our model of employment / unemployment dynamics for a given worker dis-
cussed above
Assuming α ∈ (0, 1) and β ∈ (0, 1), the uniform ergodicity condition is satisfied
Let ψ∗ = ( p, 1 − p) be the stationary distribution, so that p corresponds to unemployment (state
0)
Using ψ∗ = ψ∗ P and a bit of algebra yields

p = β / (α + β)
This is, in some sense, a steady state probability of unemployment — more on interpretation below
Not surprisingly it tends to zero as β → 0, and to one as α → 0
Calculating Stationary Distributions As discussed above, a given Markov matrix P can have
many stationary distributions
That is, there can be many row vectors ψ such that ψ = ψP
In fact if P has two distinct stationary distributions ψ_1, ψ_2 then it has infinitely many, since in this case, as you can verify,

ψ_3 := λψ_1 + (1 − λ)ψ_2

is a stationary distribution for P for any λ ∈ [0, 1]
If we restrict attention to the case where only one stationary distribution exists, one option for finding it is to try to solve the linear system ψ(I_n − P) = 0 for ψ, where I_n is the n × n identity matrix
But the zero vector solves this equation
Hence we need to impose the restriction that the solution must be a probability distribution
A suitable algorithm is implemented in QuantEcon — the next code block illustrates
In [2]: import quantecon as qe

In [3]: P = [[.4, .6], [.2, .8]]

In [4]: mc = qe.MarkovChain(P)

In [5]: mc.stationary_distributions  # show all stationary distributions
Out[5]: array([[ 0.25, 0.75]])
Convergence to Stationarity Part 2 of the Markov chain convergence theorem stated above tells
us that the distribution of Xt converges to the stationary distribution regardless of where we start
off
This adds considerable weight to our interpretation of ψ∗ as a stochastic steady state
The convergence in the theorem is illustrated in the next figure
Here
• P is the stochastic matrix for recession and growth considered above
• The highest red dot is an arbitrarily chosen initial probability distribution ψ, represented as
a vector in R3
• The other red dots are the distributions ψPt for t = 1, 2, . . .
• The black dot is ψ∗
The code for the figure can be found in the QuantEcon applications library — you might like to
try experimenting with different initial conditions
Ergodicity
Under irreducibility, yet another important result obtains: for all s ∈ S,

(1/n) ∑_{t=1}^{n} 1{X_t = s} → ψ∗[s] as n → ∞    (2.10)
Here
• 1{ Xt = s} = 1 if Xt = s and zero otherwise
• convergence is with probability one
• the result does not depend on the distribution (or value) of X0
The result tells us that the fraction of time the chain spends at state s converges to ψ∗[s] as time goes to infinity

This gives us another way to interpret the stationary distribution — provided that the convergence result in (2.10) is valid
The convergence in (2.10) is a special case of a law of large numbers result for Markov chains —
see EDTC, section 4.3.4 for some additional information
For example, in the employment model discussed above, the fraction of time the worker spends unemployed converges to

p = β / (α + β)

regardless of the worker's initial employment state
Computing Expectations

We are often interested in computing expectations of the form

E[h(X_t)]    (2.11)

and conditional expectations such as

E[h(X_{t+k}) | X_t = s]    (2.12)

where h is a given function mapping S into R

Regarding (2.11), if ψ_t is the distribution of X_t and we think of h as a column vector of values, then E[h(X_t)] is the inner product ψ_t h

Regarding (2.12), the vector P^k h stores the conditional expectation E[h(X_{t+k}) | X_t = s] over all s

Expectations of geometric sums, such as E[∑_t β^t h(X_t)], can be computed using the identity

(I − βP)⁻¹ = I + βP + β²P² + · · ·

Premultiplication by (I − βP)⁻¹ amounts to "applying the resolvent operator"
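As a quick sketch of these operations in NumPy (the matrix P and function values h below are illustrative choices, not taken from the lecture):

import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])        # an illustrative stochastic matrix
h = np.array([1.0, 2.0])          # values h[s] for s = 0, 1
psi = np.array([0.5, 0.5])        # current distribution of X_t

print(np.dot(psi, h))             # E[h(X_t)]
P3 = np.linalg.matrix_power(P, 3)
print(np.dot(P3, h))              # E[h(X_{t+3}) | X_t = s], one entry per s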
Exercises

Exercise 1 According to the discussion above, if a worker's employment dynamics obey the stochastic matrix of the employment example, with α ∈ (0, 1) and β ∈ (0, 1), then the fraction of time spent unemployed converges to p := β/(α + β)

In other words, if {X_t} represents the Markov chain for employment, then X̄_n → p as n → ∞, where

X̄_n := (1/n) ∑_{t=1}^{n} 1{X_t = 0}

Your exercise is to illustrate this convergence by simulating the chain and plotting X̄_n − p against n

(You don't need to add the fancy touches to the graph — see the solution if you're interested)
Exercise 2 A topic of interest for economics and many other disciplines is ranking
Let’s now consider one of the most practical and important ranking problems — the rank assigned
to web pages by search engines
(Although the problem is motivated from outside of economics, there is in fact a deep connection
between search ranking systems and prices in certain competitive equilibria — see [DLP13])
To understand the issue, consider the set of results returned by a query to a web search engine
For the user, it is desirable to
1. receive a large set of accurate matches
2. have the matches returned in order, where the order corresponds to some measure of “im-
portance”
Ranking according to a measure of importance is the problem we now consider
The methodology developed to solve this problem by Google founders Larry Page and Sergey
Brin is known as PageRank
To illustrate the idea, consider the following setup

[Figure: a directed graph of web pages, with arrows indicating links between them]

Let j be a representative page and let r_j denote its ranking; PageRank defines r_j by

r_j = ∑_{i ∈ L_j} r_i / ℓ_i

where
• ℓ_i is the total number of outbound links from i
• L_j is the set of all pages i such that i has a link to j
There is, however, another interpretation, and it brings us back to Markov chains
Let P be the matrix given by P[i, j] = 1{i → j}/ℓ_i where 1{i → j} = 1 if i has a link to j and zero otherwise
The matrix P is a stochastic matrix provided that each page has at least one link
With this definition of P we have
    r_j = ∑_{i ∈ L_j} r_i/ℓ_i = ∑_{all i} 1{i → j} r_i/ℓ_i = ∑_{all i} P[i, j] r_i
When you solve for the ranking, you will find that the highest ranked node is in fact g, while the
lowest is a
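One way to compute the ranking is to treat r as the stationary distribution of the Markov chain with transition matrix P. Here is a minimal sketch for a hypothetical three-page web (the exercise's larger graph works the same way)

import numpy as np
import quantecon as qe

links = {'a': ['b', 'c'], 'b': ['c'], 'c': ['a']}   # hypothetical web: i -> pages i links to
pages = sorted(links.keys())
n = len(pages)
P = np.zeros((n, n))
for i, page in enumerate(pages):
    for target in links[page]:
        P[i, pages.index(target)] = 1 / len(links[page])   # P[i, j] = 1{i -> j} / l_i

mc = qe.MarkovChain(P)
r = mc.stationary_distributions[0]   # the ranking vector
print(dict(zip(pages, r)))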
Exercise 3 Consider the AR(1) process x_{t+1} = ρ x_t + u_{t+1}, where u_t ∼ N(0, σ_u²) is IID and |ρ| < 1; its stationary variance is

    σ_y² := σ_u² / (1 − ρ²)
Tauchen’s method [Tau86] is the most common method for approximating this continuous state
process with a finite state Markov chain
As a first step we choose
• n, the number of states for the discrete approximation
• m, an integer that parameterizes the width of the state space
Next we create a state space { x0 , . . . , xn−1 } ⊂ R and a stochastic n × n matrix P such that
• x0 = −m σy
• xn−1 = m σy
• xi+1 = xi + s where s = ( xn−1 − x0 )/(n − 1)
• P[i, j] represents the probability of transitioning from xi to x j
Let F be the cumulative distribution function of the normal distribution N (0, σu2 )
The values P[i, j] are computed to approximate the AR(1) process — omitting the derivation, the
rules are as follows:
1. If j = 0, then set
P[i, j] = P[i, 0] = F ( x0 − ρxi + s/2)
2. If j = n − 1, then set
P[i, j] = P[i, n − 1] = 1 − F ( xn−1 − ρxi − s/2)
3. Otherwise, set
P[i, j] = F ( x j − ρxi + s/2) − F ( x j − ρxi − s/2)
The exercise is to write a function approx_markov(rho, sigma_u, m=3, n=7) that returns {x_0, . . . , x_{n−1}} ⊂ R and an n × n matrix P as described above
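For reference, here is one minimal sketch of such a function, translating the three rules above directly into code (the solution notebook contains the lecture's own version)

import numpy as np
from scipy.stats import norm

def approx_markov(rho, sigma_u, m=3, n=7):
    "A sketch of Tauchen's method for x' = rho x + u, u ~ N(0, sigma_u^2)."
    F = norm(loc=0, scale=sigma_u).cdf
    sigma_y = np.sqrt(sigma_u**2 / (1 - rho**2))    # stationary std of the AR(1)
    x = np.linspace(-m * sigma_y, m * sigma_y, n)   # state space, evenly spaced
    s = x[1] - x[0]
    P = np.empty((n, n))
    for i in range(n):
        P[i, 0] = F(x[0] - rho * x[i] + s / 2)                # rule 1
        P[i, n - 1] = 1 - F(x[n - 1] - rho * x[i] - s / 2)    # rule 2
        for j in range(1, n - 1):                             # rule 3
            P[i, j] = (F(x[j] - rho * x[i] + s / 2)
                       - F(x[j] - rho * x[i] - s / 2))
    return x, P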
Solutions
Solution notebook
Contents
• Shortest Paths
– Overview
– Outline of the Problem
– Finding Least-Cost Paths
– Solving for J
– Exercises
– Solutions
Overview
The shortest path problem is a classic problem in mathematics and computer science with appli-
cations in
• Economics (sequential decision making, analysis of social networks, etc.)
• Operations research and transportation
• Robotics and artificial intelligence
• Telecommunication network design and routing
• Etc., etc.
For us, the shortest path problem also provides a simple introduction to the logic of dynamic
programming, which is one of our key topics
Variations of the methods we discuss are used millions of times every day, in applications such as
Google Maps
The shortest path problem is one of finding how to traverse a graph from one specified node to
another at minimum cost
Consider the following graph
We wish to travel from node (vertex) A to node G at minimum cost
• Arrows (edges) indicate the movements we can take
• Numbers next to edges indicate the cost of traveling that edge
Possible interpretations of the graph include
• Minimum cost for supplier to reach a destination
• Routing of packets on the internet (minimize time)
• Etc., etc.
For this simple graph, a quick scan of the edges shows that the optimal paths are
• A, C, F, G at cost 8
• A, D, F, G at cost 8
For larger graphs we need a systematic approach. Let J(v) denote the minimum cost-to-go from node v, i.e., the total cost from v if we take the best route. The function J satisfies the Bellman equation

    J(v) = min_{w ∈ F_v} { c(v, w) + J(w) }

where
• F_v is the set of nodes that can be reached from v in one step
• c(v, w) is the cost of traveling from v to w
Solving for J

The standard algorithm for finding J is to start with an initial guess and iterate on the Bellman equation: at each step, replace J(v) with min_{w ∈ F_v} { c(v, w) + J(w) } for every node v, stopping when the values no longer change

The iteration converges to the true cost-to-go function in finitely many steps, after which the least-cost path is found by always moving to a node w that attains the minimum; a minimal sketch is given below
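Here is one such sketch; the edge costs are illustrative assumptions chosen to reproduce the optimal cost of 8 reported above, not data from the lecture

costs = {'A': {'B': 1, 'C': 5, 'D': 3},   # hypothetical edge costs c(v, w)
         'B': {'D': 9, 'E': 6},
         'C': {'F': 2},
         'D': {'F': 4},
         'E': {'G': 4},
         'F': {'G': 1},
         'G': {}}                         # G is the destination

J = {v: 0 for v in costs}                 # initial guess for the cost-to-go
while True:
    next_J = {v: min((c + J[w] for w, c in costs[v].items()), default=0)
              for v in costs}
    if next_J == J:                       # stop when the values no longer change
        break
    J = next_J
print(J['A'])   # minimum cost from A to G (8 for these costs)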
Exercises
Exercise 1 Use the algorithm given above to find the optimal path (and its cost) for this graph
Here the line node0, node1 0.04, node8 11.11, node14 72.21 means that from node0 we can
go to
• node1 at cost 0.04
• node8 at cost 11.11
• node14 at cost 72.21
and so on
According to our calculations, the optimal path and its cost are like this
Your code should replicate this result
Solutions
Solution notebook
Contents
• Schelling’s Segregation Model
– Outline
– The Model
– Results
– Exercises
– Solutions
Outline
In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation [Sch69]
His model studies the dynamics of racially mixed neighborhoods
Like much of Schelling’s work, the model shows how local interactions can lead to surprising
aggregate structure
In particular, it shows that relatively mild preference for neighbors of similar race can lead in
aggregate to the collapse of mixed neighborhoods, and high levels of segregation
In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in Economic
Sciences (joint with Robert Aumann)
In this lecture we (in fact you) will build and run a version of Schelling’s model
The Model
We will cover a variation of Schelling’s model that is easy to program and captures the main idea
Set Up Suppose we have two types of people: orange people and green people
For the purpose of this lecture, we will assume there are 250 of each type
These agents all live on a single unit square
The location of an agent is just a point ( x, y), where 0 < x, y < 1
Preferences We will say that an agent is happy if half or more of her 10 nearest neighbors are of
the same type
Here ‘nearest’ is in terms of Euclidean distance
An agent who is not happy is called unhappy
An important point here is that agents are not averse to living in mixed areas
They are perfectly happy if half their neighbors are of the other color
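To make the mechanics concrete, here is a minimal sketch of the update cycle, under the simple rule that an unhappy agent jumps to a new random location (the details of the lecture's own program differ)

import numpy as np

num_of_type, num_neighbors = 250, 10
locations = np.random.uniform(0, 1, (2 * num_of_type, 2))
types = np.array([0] * num_of_type + [1] * num_of_type)   # 0 = orange, 1 = green

def is_happy(i):
    "True if at least half of agent i's 10 nearest neighbors share its type."
    distances = np.linalg.norm(locations - locations[i], axis=1)
    nearest = np.argsort(distances)[1:num_neighbors + 1]   # exclude the agent itself
    return (types[nearest] == types[i]).sum() >= num_neighbors / 2

cycles = 0
while True:
    unhappy = [i for i in range(len(types)) if not is_happy(i)]
    if not unhappy:
        break
    for i in unhappy:                     # each unhappy agent moves at random
        locations[i] = np.random.uniform(0, 1, 2)
    cycles += 1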
Results
Let’s have a look at the results we got when we coded and ran this model
As discussed above, agents are initially mixed randomly together
But after several cycles they become segregated into distinct regions
In this instance, the program terminated after 4 cycles through the set of agents, indicating that all
agents had reached a state of happiness
What is striking about the pictures is how rapidly racial integration breaks down
This is despite the fact that people in the model don’t actually mind living mixed with the other
type
Even with these preferences, the outcome is a high degree of segregation
Exercises
Rather than show you the program that generated these figures, we’ll now ask you to write your
own version
You can see our program at the end, when you look at the solution
Solutions
Solution notebook
Contents
• LLN and CLT
– Overview
– Relationships
– LLN
– CLT
– Exercises
– Solutions
Overview
This lecture illustrates two of the most important theorems of probability and statistics: The law
of large numbers (LLN) and the central limit theorem (CLT)
These beautiful theorems lie behind many of the most fundamental results in econometrics and
quantitative economic modeling
The lecture is based around simulations that show the LLN and CLT in action
We also demonstrate how the LLN and CLT break down when the assumptions they are based on
do not hold
In addition, we examine several useful extensions of the classical theorems, such as
• The delta method, for smooth functions of random variables
• The multivariate case
Some of these extensions are presented as exercises
Relationships

The CLT refines the LLN: the LLN gives conditions under which sample moments converge to population moments as the sample size grows, while the CLT describes the rate at which they converge, in terms of the distribution of suitably scaled deviations
LLN
We begin with the law of large numbers, which tells us when sample averages will converge to
their population means
The Classical LLN The classical law of large numbers concerns independent and identically
distributed (IID) random variables
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law
Let X1 , . . . , Xn be independent and identically distributed scalar random variables, with common
distribution F
When it exists, let µ denote the common mean of this sample:
    µ := E X = ∫ x F(dx)
In addition, let
    X̄_n := (1/n) ∑_{i=1}^{n} X_i
Kolmogorov's strong law states that, if E|X| is finite, then

    P{X̄_n → µ as n → ∞} = 1    (2.17)
Proof The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of
[Dud02]
On the other hand, we can prove a weaker version of the LLN very easily and still get most of the
intuition
The version we prove is as follows: If X_1, . . . , X_n is IID with E X_i² < ∞, then, for any ε > 0, we have

    P{|X̄_n − µ| ≥ ε} → 0  as n → ∞    (2.18)
(This version is weaker because we claim only convergence in probability rather than almost sure
convergence, and assume a finite second moment)
To see that this is so, fix ε > 0, and let σ² be the variance of each X_i

Recall the Chebyshev inequality, which tells us that

    P{|X̄_n − µ| ≥ ε} ≤ E[(X̄_n − µ)²] / ε²    (2.19)

Now observe that

    E[(X̄_n − µ)²] = E[ ((1/n) ∑_{i=1}^{n} (X_i − µ))² ]
                  = (1/n²) ∑_{i=1}^{n} ∑_{j=1}^{n} E[(X_i − µ)(X_j − µ)]
                  = (1/n²) ∑_{i=1}^{n} E[(X_i − µ)²]
                  = σ²/n
Here the crucial step is at the third equality, which follows from independence
Independence means that if i 6= j, then the covariance term E( Xi − µ)( X j − µ) drops out
As a result, n2 − n terms vanish, leading us to a final expression that goes to zero in n
Combining our last result with (2.19), we come to the estimate

    P{|X̄_n − µ| ≥ ε} ≤ σ² / (nε²)    (2.20)
The claim in (2.18) is now clear
Of course, if the sequence X1 , . . . , Xn is correlated, then the cross-product terms E( Xi − µ)( X j − µ)
are not necessarily zero
While this doesn’t mean that the same line of argument is impossible, it does mean that if we want
a similar result then the covariances should be “almost zero” for “most” of these terms
In a long sequence, this would be true if, for example, E( Xi − µ)( X j − µ) approached zero when
the difference between i and j became large
In other words, the LLN can still work if the sequence X1 , . . . , Xn has a kind of “asymptotic in-
dependence”, in the sense that correlation falls to zero as variables become further apart in the
sequence
This idea is very important in time series analysis, and we’ll come across it again soon enough
Illustration Let’s now illustrate the classical IID law of large numbers using simulation
In particular, we aim to generate some sequences of IID random variables and plot the evolution
of X̄n as n increases
Below is a figure that does just this (as usual, you can click on it to expand it)
It shows IID observations from three different distributions and plots X̄n against n in each case
The dots represent the underlying observations Xi for i = 1, . . . , 100
In each of the three cases, convergence of X̄n to µ occurs as predicted
The figure was produced by illustrates_lln.py, which is shown below (and can be found in the
lln_clt directory of the applications repository)
The three distributions are chosen at random from a selection stored in the dictionary
distributions
"""
Filename: illustrates_lln.py
Authors: John Stachurski and Thomas J. Sargent
import random
import numpy as np
from scipy.stats import t, beta, lognorm, expon, gamma, poisson
import matplotlib.pyplot as plt
n = 100
for ax in axes:
# == Choose a randomly selected distribution == #
name = random.choice(list(distributions.keys()))
distribution = distributions.pop(name)
# == Plot == #
plt.show()
Infinite Mean What happens if the condition E| X | < ∞ in the statement of the LLN is not
satisfied?
This might be the case if the underlying distribution is heavy tailed — the best known example is
the Cauchy distribution, which has density
    f(x) = 1 / (π(1 + x²))    (x ∈ R)
The next figure shows 100 independent draws from this distribution
Notice how extreme observations are far more prevalent here than the previous figure
Let’s now have a look at the behavior of the sample mean
Here we’ve increased n to 1000, but the sequence still shows no sign of converging
Will convergence become visible if we take n even larger?
The answer is no
To see this, recall that the characteristic function of the Cauchy distribution is

    φ(t) = E e^{itX} = ∫ e^{itx} f(x) dx = e^{−|t|}    (2.21)

Using independence, E e^{it X̄_n} = [φ(t/n)]^n = e^{−|t|}, so X̄_n has the same Cauchy distribution as X for every n, and the sample mean never converges
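A quick simulation makes the failure visible; the following sketch (not the lecture's figure code) prints the running sample mean at several horizons

import numpy as np

n = 1000
data = np.random.standard_cauchy(n)
running_mean = np.cumsum(data) / np.arange(1, n + 1)
print(running_mean[[9, 99, 999]])   # wanders instead of settling down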
CLT
Next we turn to the central limit theorem, which tells us about the distribution of the deviation
between sample averages and population means
Statement of the Theorem The central limit theorem is one of the most remarkable results in all
of mathematics
In the classical IID setting, it tells us the following: If the sequence X1 , . . . , Xn is IID, with common
mean µ and common variance σ2 ∈ (0, ∞), then
    √n (X̄_n − µ) →_d N(0, σ²)  as n → ∞    (2.22)
Here →_d N(0, σ²) indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation σ
Intuition The striking implication of the CLT is that for any distribution with finite second mo-
ment, the simple operation of adding independent copies always leads to a Gaussian curve
A relatively simple proof of the central limit theorem can be obtained by working with character-
istic functions (see, e.g., theorem 9.5.6 of [Dud02])
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition
In fact all of the proofs of the CLT that we know are similar in this respect
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating addition of independent Bernoulli random
variables
In particular, let Xi be binary, with P{ Xi = 0} = P{ Xi = 1} = 0.5, and let X1 , . . . , Xn be indepen-
dent
Think of X_i = 1 as a "success", so that Y_n = ∑_{i=1}^{n} X_i is the number of successes in n trials
The next figure plots the probability mass function of Yn for n = 1, 2, 4, 8
When n = 1, the distribution is flat — one success or no successes have the same probability
When n = 2 we can either have 0, 1 or 2 successes
Notice the peak in probability mass at the mid-point k = 1
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then fail”)
than to get zero or two successes
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed then
fail” are just as likely as the outcomes “fail then fail” and “succeed then succeed”
(If there was positive correlation, say, then “succeed then fail” would be less likely than “succeed
then succeed”)
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails
For n = 4 and n = 8 we again get a peak at the “middle” value (halfway between the minimum
and the maximum possible value)
The intuition is the same — there are simply more ways to get these middle outcomes
If we continue, the bell-shaped curve becomes ever more pronounced
We are witnessing the normal approximation of the binomial distribution at work
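The probability mass functions just described are easy to generate; here is a sketch (not the lecture's own figure code) using scipy's binomial distribution

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

fig, axes = plt.subplots(2, 2)
for ax, n in zip(axes.flatten(), (1, 2, 4, 8)):
    k = np.arange(n + 1)
    ax.bar(k, binom(n, 0.5).pmf(k))   # P{Y_n = k} for k successes in n trials
    ax.set_title(r'$n = {}$'.format(n))
plt.show()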
Simulation 1 Since the CLT seems almost magical, running simulations that verify its implica-
tions is one good way to build intuition
To this end, we now perform the following simulation
1. Choose an arbitrary distribution F for the underlying observations Xi
2. Generate independent draws of Y_n := √n (X̄_n − µ)
3. Use these draws to compute some measure of their distribution — such as a histogram
4. Compare the latter to N (0, σ2 )
Here’s some code that does exactly this for the exponential distribution F ( x ) = 1 − e−λx
(Please experiment with other choices of F, but remember that, to conform with the conditions of
the CLT, the distribution must have finite second moment)
"""
Filename: illustrates_clt.py
Authors: John Stachurski and Thomas J. Sargent
# == Set parameters == #
n = 250 # Choice of n
k = 100000 # Number of draws of Y_n
distribution = expon(2) # Exponential distribution, lambda = 1/2
# == Plot == #
fig, ax = plt.subplots()
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, normed=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label=r'$N(0, \sigma^2)$')
ax.legend()
plt.show()
The fit to the normal density is already tight, and can be further improved by increasing n
You can also experiment with other specifications of F
Note: You might need to delete or modify the lines beginning with rc to get this code to run on
your computer
Simulation 2 Our next simulation is somewhat like the first, except that we aim to track the distribution of Y_n := √n (X̄_n − µ) as n increases
In the simulation we’ll be working with random variables having µ = 0
Thus, when n = 1, we have Y1 = X1 , so the first distribution is just the distribution of the under-
lying random variable
√
For n = 2, the distribution of Y2 is that of ( X1 + X2 )/ 2, and so on
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of Yn will smooth out into a bell shaped curve
The next figure shows this process for Xi ∼ f , where f was specified as the convex combination
of three different beta densities
(Taking a convex combination is an easy way to produce an irregular shape for f )
In the figure, the closest density is that of Y1 , while the furthest is that of Y5
The Multivariate Case The law of large numbers and central limit theorem work just as nicely
in multidimensional settings
To state the results, let’s recall some elementary facts about random vectors
A random vector X is just a sequence of k random variables ( X1 , . . . , Xk )
Each realization of X is an element of Rk
A collection of random vectors X1 , . . . , Xn is called independent if, given any n vectors x1 , . . . , xn
in Rk , we have
P{ X1 ≤ x1 , . . . , X n ≤ x n } = P{ X1 ≤ x1 } × · · · × P{ X n ≤ x n }
(The vector inequality X ≤ x means that X j ≤ x j for j = 1, . . . , k)
Let µ j := E[ X j ] for all j = 1, . . . , k
The expectation E[X] of X is defined to be the vector of expectations:
    E[X] := (E[X_1], E[X_2], . . . , E[X_k])′ = (µ_1, µ_2, . . . , µ_k)′ =: µ
The variance-covariance matrix of random vector X is defined as
Var[X] := E[(X − µ)(X − µ)0 ]
Expanding this out, we get
    Var[X] = [ E[(X_1 − µ_1)(X_1 − µ_1)]   · · ·   E[(X_1 − µ_1)(X_k − µ_k)]
               E[(X_2 − µ_2)(X_1 − µ_1)]   · · ·   E[(X_2 − µ_2)(X_k − µ_k)]
                          ⋮                                  ⋮
               E[(X_k − µ_k)(X_1 − µ_1)]   · · ·   E[(X_k − µ_k)(X_k − µ_k)] ]
The j, k-th term is the scalar covariance between X j and Xk
With this notation we can proceed to the multivariate LLN and CLT
Let X1 , . . . , Xn be a sequence of independent and identically distributed random vectors, each one
taking values in Rk
Let µ be the vector E[Xi ], and let Σ be the variance-covariance matrix of Xi
Interpreting vector addition and scalar multiplication in the usual way (i.e., pointwise), let

    X̄_n := (1/n) ∑_{i=1}^{n} X_i

In this setting, the multivariate LLN states that P{X̄_n → µ as n → ∞} = 1, while the multivariate CLT states that √n (X̄_n − µ) →_d N(0, Σ) as n → ∞
Exercises
Exercise 1 One very useful consequence of the central limit theorem is as follows
Assume the conditions of the CLT as stated above
If g : R → R is differentiable at µ and g′(µ) ≠ 0, then

    √n { g(X̄_n) − g(µ) } →_d N(0, g′(µ)² σ²)  as n → ∞    (2.25)
This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators —
many of which can be expressed as functions of sample means
(These kinds of results are often said to use the “delta method”)
The proof is based on a Taylor expansion of g around the point µ
Taking the result as given, let the distribution F of each Xi be uniform on [0, π/2] and let g( x ) =
sin( x )
Derive the asymptotic distribution of √n { g(X̄_n) − g(µ) } and illustrate convergence in the same spirit as the program illustrate_clt.py discussed above
What happens when you replace [0, π/2] with [0, π ]?
What is the source of the problem?
Exercise 2 Here’s a result that’s often used in developing statistical tests, and is connected to the
multivariate central limit theorem
If you study econometric theory, you will see this result used again and again
Assume the setting of the multivariate CLT discussed above, so that
1. X1 , . . . , Xn is a sequence of IID random vectors, each taking values in Rk
2. µ := E[Xi ], and Σ is the variance-covariance matrix of Xi
3. The convergence

    √n (X̄_n − µ) →_d N(0, Σ)    (2.26)

is valid
In a statistical setting, one often wants the right hand side to be standard normal, so that confi-
dence intervals are easily computed
This normalization can be achieved on the basis of three observations
First, if X is a random vector in Rk and A is constant and k × k, then
Var[AX] = A Var[X]A0
Second, by the continuous mapping theorem, if Z_n →_d Z in R^k and A is constant and k × k, then

    A Z_n →_d A Z
Third, if S is a k × k symmetric positive definite matrix, then there exists a symmetric positive
definite matrix Q, called the inverse square root of S, such that
QSQ0 = I
Putting these observations together, let Q be the inverse square root of Σ, set Z_n := √n Q (X̄_n − µ), and let Z ∼ N(0, I). Then Z_n →_d Z, and hence, by the continuous mapping theorem,

    ‖Z_n‖² →_d ‖Z‖²

Since ‖Z‖² is a sum of k squared independent standard normals, the limit is chi-squared with k degrees of freedom
Solutions
Solution notebook
Contents
• Linear State Space Models
– Overview
– The Linear State Space Model
– Distributions and Moments
– Stationarity and Ergodicity
– Noisy Observations
– Prediction
– Code
– Exercises
– Solutions
“We may regard the present state of the universe as the effect of its past and the cause
of its future” – Marquis de Laplace
Overview
Objects in play
• An n × 1 vector xt denoting the state at time t = 0, 1, 2, . . .
• An iid sequence of m × 1 random vectors wt ∼ N (0, I )
• A k × 1 vector yt of observations at time t = 0, 1, 2, . . .
• An n × n matrix A called the transition matrix
• An n × m matrix C called the volatility matrix
• A k × n matrix G sometimes called the output matrix
Here is the linear state-space system:

    x_{t+1} = A x_t + C w_{t+1}
    y_t = G x_t                          (2.28)
    x_0 ∼ N(µ_0, Σ_0)
Martingale difference shocks We’ve made the common assumption that the shocks are inde-
pendent standardized normal vectors
But some of what we say will go through under the assumption that {wt+1 } is a martingale dif-
ference sequence
A martingale difference sequence is a sequence that is zero mean when conditioned on past infor-
mation
In the present case, since { xt } is our state sequence, this means that it satisfies
    E[w_{t+1} | x_t, x_{t−1}, . . . ] = 0
This is a weaker condition than that {wt } is iid with wt+1 ∼ N (0, I )
As an example of the dynamics the matrix A can generate, consider the 4 × 4 cyclic permutation matrix

    A = [ 0 0 0 1
          1 0 0 0
          0 1 0 0
          0 0 1 0 ]

It is easy to check that A⁴ = I, which implies that x_t is strictly periodic with period 4:

    x_{t+4} = x_t
Such an xt process can be used to model deterministic seasonals in quarterly time series.
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations.
Unconditional Moments Using (2.28), it’s easy to obtain expressions for the (unconditional)
means of xt and yt
We’ll explain what unconditional and conditional mean soon
Letting µ_t := E[x_t] and using linearity of expectations, we find that

    µ_{t+1} = A µ_t  with  µ_0 given    (2.34)

Similarly, letting Σ_t := Var[x_t] denote the variance-covariance matrix of x_t, one obtains

    Σ_{t+1} = A Σ_t A′ + CC′  with  Σ_0 given    (2.35)
Distributions In general, knowing the mean and variance-covariance matrix of a random vector
is not quite as good as knowing the full distribution
However, there are some situations where these moments alone tell us all we need to know
One such situation is when the vector in question is Gaussian (i.e., normally distributed)
This is the case here, given
1. our Gaussian assumptions on the primitives
2. the fact that normality is preserved under linear operations
In fact, it’s well-known that
In particular, given our Gaussian assumptions on the primitives and the linearity of (2.28) we can
see immediately that both xt and yt are Gaussian for all t ≥ 0 5
Since xt is Gaussian, to find the distribution, all we need to do is find its mean and variance-
covariance matrix
But in fact we’ve already done this, in (2.34) and (2.35)
5 The correct way to argue this is by induction. Suppose that xt is Gaussian. Then (2.28) and (2.38) imply that xt+1
is Gaussian. Since x0 is assumed to be Gaussian, it follows that every xt is Gaussian. Evidently this implies that each yt
is Gaussian.
xt ∼ N (µt , Σt ) (2.39)
In the right-hand figure, these values are converted into a rotated histogram that shows relative frequencies from our sample of 20 observations of y_T

(The parameters and source code for the figures can be found in the file linear_models/paths_and_hist.py from the applications repository)
Here is another figure, this time with 100 observations
Let’s now try with 500,000 observations, showing only the histogram (without rotation)
The black line is the density of y T calculated analytically, using (2.40)
The histogram and analytical distribution are close, as expected
By looking at the figures and experimenting with parameters, you will gain a feel for how the
distribution depends on the model primitives listed above
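As a quick way to experiment, here is a minimal sketch of simulating such a system with the LinearStateSpace class presented in the Code section below; the matrices are illustrative assumptions

from quantecon import LinearStateSpace

A = [[0.8, 0.1],      # hypothetical transition matrix
     [0.0, 0.9]]
C = [[0.5],           # hypothetical volatility matrix
     [0.2]]
G = [1.0, 0.0]        # observe the first component of the state
ss = LinearStateSpace(A, C, G)
x, y = ss.simulate(ts_length=100)   # one sample path of x_t and y_t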
If we generate I independent sample paths and average the time-T observations across them, the ensemble average

    ȳ_T := (1/I) ∑_{i=1}^{I} y_T^i

approximates the expectation E[y_T] = G µ_T (as implied by the law of large numbers)
Here’s a simulation comparing the ensemble averages and population means at time points t =
0, . . . , 50
The parameters are the same as for the preceding figures, and the sample size is relatively small
(I = 20)
    x̄_T := (1/I) ∑_{i=1}^{I} x_T^i → µ_T    (I → ∞)

and

    (1/I) ∑_{i=1}^{I} (x_T^i − x̄_T)(x_T^i − x̄_T)′ → Σ_T    (I → ∞)
    p(x_{t+1} | x_t) = N(A x_t, CC′)
Autocovariance functions An important object related to the joint distribution is the autocovari-
ance function
    Σ_{t+j,t} := E[(x_{t+j} − µ_{t+j})(x_t − µ_t)′]    (2.41)
Elementary calculations show that
    Σ_{t+j,t} = A^j Σ_t    (2.42)
Notice that Σt+ j,t in general depends on both j, the gap between the two dates, and t, the earlier
date
Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of linear
state space models
Let’s start with the intuition
Visualizing Stability Let’s look at some more time series from the same model that we analyzed
above
This picture shows cross-sectional distributions for y at times T, T′, T″

Note how the time series "settle down" in the sense that the distributions at T′ and T″ are relatively similar to each other — but unlike the distribution at T
Apparently, the distributions of yt converge to a fixed long-run distribution as t → ∞
When such a distribution exists it is called a stationary distribution
Since
1. in the present case all distributions are Gaussian
2. a Gaussian distribution is pinned down by its mean and variance-covariance matrix
we can restate the definition as follows: ψ_∞ is stationary for x_t if

    ψ_∞ = N(µ_∞, Σ_∞)

where µ_∞ and Σ_∞ are fixed points of (2.34) and (2.35) respectively
Covariance Stationary Processes Let’s see what happens to the preceding figure if we start x0 at
the stationary distribution
Now the differences in the observed distributions at T, T′ and T″ come entirely from random fluctuations due to the finite sample size
By construction, the moments then satisfy µ_t = µ_∞ and Σ_t = Σ_∞ for all t, since µ_∞ and Σ_∞ are fixed points of (2.34) and (2.35)
The globally stable case The difference equation µt+1 = Aµt is known to have unique fixed point
µ∞ = 0 if all eigenvalues of A have moduli strictly less than unity
That is, if (np.absolute(np.linalg.eigvals(A)) < 1).all() == True
The difference equation (2.35) also has a unique fixed point in this case, and, moreover
µt → µ∞ = 0 and Σt → Σ∞ as t→∞
This is the globally stable case — see these notes for a more theoretical treatment
However, global stability is more than we need for stationary solutions, and often more than we
want
To illustrate, consider our second order difference equation example
Here the state is x_t = (1, y_t, y_{t−1})′
Because of the constant first component in the state vector, we will never have µt → 0
How can we find stationary solutions that respect a constant state component?
Processes with a constant state component To investigate such a process, suppose that A and C
take the form
    A = [ A_1  a        C = [ C_1
          0    1 ],           0  ]
where
• A1 is an (n − 1) × (n − 1) matrix
• a is an (n − 1) × 1 column vector
Let x_t = (x_{1t}′, 1)′ where x_{1t} is (n − 1) × 1
It follows that

    x_{1,t+1} = A_1 x_{1t} + a + C_1 w_{t+1}

Let µ_{1t} = E[x_{1t}] and take expectations on both sides of this expression to get

    µ_{1,t+1} = A_1 µ_{1t} + a    (2.43)
Assume now that the moduli of the eigenvalues of A1 are all strictly less than one
Then (2.43) has a unique stationary solution, namely,
µ1∞ = ( I − A1 )−1 a
The stationary value of µ_t itself is then µ_∞ := (µ_{1∞}′, 1)′
The stationary values of Σt and Σt+ j,t satisfy
    Σ_∞ = A Σ_∞ A′ + CC′    (2.44)

    Σ_{t+j,t} = A^j Σ_∞
Notice that here Σt+ j,t depends on the time gap j but not on calendar time t
In conclusion, if
• x_0 ∼ N(µ_∞, Σ_∞) and
• the moduli of the eigenvalues of A_1 are all strictly less than unity
then the process {x_t} is covariance stationary, with constant state component
Note: If the eigenvalues of A1 are less than unity in modulus, then (a) starting from any initial
value, the mean and variance-covariance matrix both converge to their stationary values; and (b)
iterations on (2.35) converge to the fixed point of the discrete Lyapunov equation in the first line of
(2.44)
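In practice the fixed point in the first line of (2.44) can be computed directly; here is a minimal sketch using quantecon's solve_discrete_lyapunov (the A and C below are illustrative assumptions)

import numpy as np
from quantecon import solve_discrete_lyapunov

A = np.array([[0.8, 0.1], [0.0, 0.9]])
C = np.array([[0.5], [0.2]])
Sigma_inf = solve_discrete_lyapunov(A, C @ C.T)   # solves S = A S A' + CC'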
Ergodicity Let’s suppose that we’re working with a covariance stationary process
In this case we know that the ensemble mean will converge to µ∞ as the sample size I approaches
infinity
Averages over time Ensemble averages across simulations are interesting theoretically, but in
real life we usually observe only a single realization { xt , yt }tT=0
So now let’s take a single realization and form the time series averages
    x̄_T := (1/T) ∑_{t=1}^{T} x_t    and    ȳ_T := (1/T) ∑_{t=1}^{T} y_t
Do these time series averages converge to something interpretable in terms of our basic state-space
representation?
The answer depends on something called ergodicity
Ergodicity is the property that time series and ensemble averages coincide
More formally, ergodicity implies that time series sample averages converge to their expectation
under the stationary distribution
In particular,
• (1/T) ∑_{t=0}^{T} x_t → µ_∞
• (1/T) ∑_{t=0}^{T} (x_t − x̄_T)(x_t − x̄_T)′ → Σ_∞
• (1/T) ∑_{t=0}^{T} (x_{t+j} − x̄_T)(x_t − x̄_T)′ → A^j Σ_∞
In our linear Gaussian setting, any covariance stationary process is also ergodic
Noisy Observations
In some settings the observation equation yt = Gxt is modified to include an error term
Often this error term represents the idea that the true state can only be observed imperfectly
To include an error term in the observation we introduce
• An iid sequence of ` × 1 random vectors vt ∼ N (0, I )
• A k × ` matrix H
The observation equation then becomes y_t = G x_t + H v_t, and, repeating the reasoning used for the noise-free case,

    y_t ∼ N(G µ_t, G Σ_t G′ + HH′)
Prediction
The theory of prediction for linear state space systems is elegant and simple
Forecasting Formulas – Conditional Means The natural way to predict variables is to use con-
ditional distributions
For example, the optimal forecast of x_{t+1} given information known at time t is

    E_t[x_{t+1}] := E[x_{t+1} | x_t, x_{t−1}, . . . ] = A x_t
In view of the iid property, current and past state values provide no information about future
values of the shock
Hence E_t[w_{t+k}] = E[w_{t+k}] = 0
It now follows from linearity of expectations that the j-step ahead forecast of x is
    E_t[x_{t+j}] = A^j x_t
The j-step ahead forecast of y is therefore

    E_t[y_{t+j}] = G A^j x_t
Covariance of Prediction Errors It is useful to obtain the covariance matrix of the vector of j-
step-ahead prediction errors
    x_{t+j} − E_t[x_{t+j}] = ∑_{s=0}^{j−1} A^s C w_{t−s+j}    (2.48)
Evidently,
    V_j := E_t[(x_{t+j} − E_t[x_{t+j}])(x_{t+j} − E_t[x_{t+j}])′] = ∑_{k=0}^{j−1} A^k CC′ (A^k)′    (2.49)
Vj is the conditional covariance matrix of the errors in forecasting xt+ j , conditioned on time t infor-
mation xt
Under particular conditions, V_j converges to

    V_∞ = CC′ + A V_∞ A′    (2.51)
These objects are important components of some famous and interesting dynamic models; geometric sums of expected future states, treated next, are a leading example

Formulas Fortunately, it is easy to use a little matrix algebra to compute these objects

Suppose that every eigenvalue of A has modulus strictly less than 1/β

It then follows that I + βA + β²A² + ⋯ = (I − βA)^{−1}, and hence

    S_x := E_t [ ∑_{j=0}^{∞} β^j x_{t+j} ] = (I − βA)^{−1} x_t    and    S_y := E_t [ ∑_{j=0}^{∞} β^j y_{t+j} ] = G (I − βA)^{−1} x_t
Code
Our preceding simulations and calculations are based on code in the file lss.py from the QuantE-
con.py package
The code implements a class for handling linear state space models (simulations, calculating mo-
ments, etc.)
We repeat it here for convenience
"""
Filename: lss.py
Reference: http://quant-econ.net/py/linear_models.html
Computes quantities associated with the Gaussian linear state space model.
"""
"""
This is a separate function for simulating a vector linear system of
the form
Parameters
----------
A : array_like or scalar(float)
Should be n x n
x0 : array_like
Should be n x 1. Initial condition
v : np.ndarray
Should be n x ts_length-1. Its t-th column is used as the time t
shock v_t
ts_length : int
The length of the time series
Returns
--------
x : np.ndarray
Time series with ts_length columns, the t-th column being x_t
"""
A = np.asarray(A)
n = A.shape[0]
x = np.empty((n, ts_length))
x[:, 0] = x0
for t in range(ts_length-1):
# x[:, t+1] = A.dot(x[:, t]) + v[:, t]
for i in range(n):
x[i, t+1] = v[i, t] #Shock
for j in range(n):
x[i, t+1] += A[i, j] * x[j, t] #Dot Product
return x
if numba_installed:
simulate_linear_model = jit(simulate_linear_model)
class LinearStateSpace(object):
"""
    A class that describes a Gaussian linear state space model of the
    form:

        x_{t+1} = A x_t + C w_{t+1}

        y_t = G x_t + H v_t
where {w_t} and {v_t} are independent and standard normal with dimensions
k and l respectively. The initial conditions are mu_0 and Sigma_0 for x_0
~ N(mu_0, Sigma_0). When Sigma_0=0, the draw of x_0 is exactly mu_0.
Parameters
----------
A : array_like or scalar(float)
Part of the state transition equation. It should be `n x n`
C : array_like or scalar(float)
Part of the state transition equation. It should be `n x m`
G : array_like or scalar(float)
Part of the observation equation. It should be `k x n`
H : array_like or scalar(float), optional(default=None)
Part of the observation equation. It should be `k x l`
mu_0 : array_like or scalar(float), optional(default=None)
This is the mean of initial draw and is `n x 1`
Sigma_0 : array_like or scalar(float), optional(default=None)
This is the variance of the initial draw and is `n x n` and
also should be positive definite and symmetric
Attributes
----------
A, C, G, H, mu_0, Sigma_0 : see Parameters
n, k, m, l : scalar(int)
The dimensions of x_t, y_t, w_t and v_t respectively
"""
else:
self.Sigma_0 = self.convert(Sigma_0)
def __repr__(self):
return self.__str__()
def __str__(self):
m = """\
Linear Gaussian state space model:
- dimension of state space : {n}
- number of innovations : {m}
- dimension of observation equation : {k}
"""
return dedent(m.format(n=self.n, k=self.k, m=self.m))
"""
return np.atleast_2d(np.asarray(x, dtype='float32'))
    def simulate(self, ts_length=100):
        """
        Simulate a time series of length ts_length, first drawing
        x_0 ~ N(mu_0, Sigma_0)

        Parameters
        ----------
        ts_length : scalar(int), optional(default=100)
            The length of the simulation
Returns
-------
x : array_like(float)
An n x ts_length array, where the t-th column is x_t
y : array_like(float)
A k x ts_length array, where the t-th column is y_t
"""
x0 = multivariate_normal(self.mu_0.flatten(), self.Sigma_0)
w = np.random.randn(self.m, ts_length-1)
v = self.C.dot(w) # Multiply each w_t by C to get v_t = C w_t
# == simulate time series == #
x = simulate_linear_model(self.A, x0, v, ts_length)
y = self.G.dot(x)
return x, y
    def replicate(self, T=10, num_reps=100):
        """
        Simulate num_reps observations of x_T and y_T given
        x_0 ~ N(mu_0, Sigma_0)

        Parameters
        ----------
T : scalar(int), optional(default=10)
The period that we want to replicate values for
num_reps : scalar(int), optional(default=100)
The number of replications that we want
Returns
-------
x : array_like(float)
An n x num_reps array, where the j-th column is the j_th
observation of x_T
y : array_like(float)
A k x num_reps array, where the j-th column is the j_th
observation of y_T
"""
x = np.empty((self.n, num_reps))
for j in range(num_reps):
x_T, _ = self.simulate(ts_length=T+1)
x[:, j] = x_T[:, -1]
if self.H is not None:
v = np.random.randn(self.l, num_reps)
y = self.G.dot(x) + self.H.dot(v)
else:
y = self.G.dot(x)
return x, y
def moment_sequence(self):
"""
Create a generator to calculate the population mean and
variance-convariance matrix for both x_t and y_t, starting at
the initial condition (self.mu_0, self.Sigma_0). Each iteration
produces a 4-tuple of items (mu_x, mu_y, Sigma_x, Sigma_y) for
the next period.
Yields
------
mu_x : array_like(float)
An n x 1 array representing the population mean of x_t
mu_y : array_like(float)
A k x 1 array representing the population mean of y_t
Sigma_x : array_like(float)
An n x n array representing the variance-covariance matrix
of x_t
Sigma_y : array_like(float)
A k x k array representing the variance-covariance matrix
of y_t
"""
# == Simplify names == #
A, C, G, H = self.A, self.C, self.G, self.H
# == Initial moments == #
mu_x, Sigma_x = self.mu_0, self.Sigma_0
while 1:
mu_y = G.dot(mu_x)
if H is None:
Sigma_y = G.dot(Sigma_x).dot(G.T)
else:
Sigma_y = G.dot(Sigma_x).dot(G.T) + H.dot(H.T)
            yield mu_x, mu_y, Sigma_x, Sigma_y
            # == Update moments of x == #
            mu_x = A.dot(mu_x)
            Sigma_x = A.dot(Sigma_x).dot(A.T) + C.dot(C.T)
    def stationary_distributions(self, max_iter=200, tol=1e-5):
        """
        Compute the moments of the stationary distributions of x_t and
        y_t, by iterating on the moment sequence to convergence

        Parameters
        ----------
max_iter : scalar(int), optional(default=200)
The maximum number of iterations allowed
tol : scalar(float), optional(default=1e-5)
The tolerance level that one wishes to achieve
Returns
-------
mu_x_star : array_like(float)
An n x 1 array representing the stationary mean of x_t
mu_y_star : array_like(float)
An k x 1 array representing the stationary mean of y_t
Sigma_x_star : array_like(float)
An n x n array representing the stationary var-cov matrix
of x_t
Sigma_y_star : array_like(float)
An k x k array representing the stationary var-cov matrix
of y_t
"""
# == Initialize iteration == #
m = self.moment_sequence()
mu_x, mu_y, Sigma_x, Sigma_y = next(m)
i = 0
        error = tol + 1
        # == Loop until convergence, fail if no convergence by max_iter == #
        while error > tol:
            if i > max_iter:
fail_message = 'Convergence failed after {} iterations'
raise ValueError(fail_message.format(max_iter))
else:
i += 1
mu_x1, mu_y1, Sigma_x1, Sigma_y1 = next(m)
error_mu = np.max(np.abs(mu_x1 - mu_x))
error_Sigma = np.max(np.abs(Sigma_x1 - Sigma_x))
error = max(error_mu, error_Sigma)
mu_x, Sigma_x = mu_x1, Sigma_x1
    def geometric_sums(self, beta, x_t):
        """
        Forecast the geometric sums

            S_x := E [ sum_{j=0}^{infty} beta^j x_{t+j} | x_t ]
            S_y := E [ sum_{j=0}^{infty} beta^j y_{t+j} | x_t ]

        Parameters
        ----------
beta : scalar(float)
Discount factor, in [0, 1)
        x_t : array_like(float)
            The term x_t for conditioning
Returns
-------
S_x : array_like(float)
Geometric sum as defined above
S_y : array_like(float)
Geometric sum as defined above
"""
        I = np.identity(self.n)
        S_x = np.linalg.solve(I - beta * self.A, x_t)   # S_x = (I - beta A)^{-1} x_t
        S_y = self.G.dot(S_x)
        return S_x, S_y
    def impulse_response(self, j=5):
        """
        Create impulse response coefficient sequences for x and y

        Parameters
        ----------
j : Scalar(int)
Number of coefficients that we want
Returns
-------
xcoef : list(array_like(float, 2))
The coefficients for x
ycoef : list(array_like(float, 2))
The coefficients for y
"""
        # == Pull out matrices == #
        A, C, G, H = self.A, self.C, self.G, self.H
        Apower = np.copy(A)
        # == First coefficients are C and G C; later ones A^i C and G A^i C == #
        xcoef = [C]
        ycoef = [np.dot(G, C)]
        for i in range(j):
            xcoef.append(np.dot(Apower, C))
            ycoef.append(np.dot(G, np.dot(Apower, C)))
            Apower = np.dot(Apower, A)
        return xcoef, ycoef
Exercises
Exercise 1 Replicate this figure using the LinearStateSpace class from lss.py
Exercise 2 Replicate this figure modulo randomness using the same class
Exercise 3 Replicate this figure modulo randomness using the same class
The state space model and parameters are the same as for the preceding exercise
Exercise 4 Replicate this figure modulo randomness using the same class
The state space model and parameters are the same as for the preceding exercise, except that the
initial condition is the stationary distribution
Hint: You can use the stationary_distributions method to get the initial conditions
The number of sample paths is 80, and the time horizon in the figure is 100
Producing the vertical bars and dots is optional, but if you wish to try, the bars are at dates 10, 50
and 75
Solutions
Solution notebook
Contents
• A Lake Model of Employment and Unemployment
– Overview
– Model
– Dynamics of a worker
– Endogenous job finding rate
– Implementation
– Exercises
– Solutions
Overview
Model
Laws of Motion for Stock Variables We begin by constructing laws of motion for the aggregate
variables: Et , Ut , and Nt
Of the mass of workers Et who are employed at date t,
• (1 − d) Et will remain in the labor force
• of these, (1 − α)(1 − d) Et will remain employed
Of the mass of workers U_t who are currently unemployed,
• (1 − d)Ut will remain in the labor force
• of these, λ(1 − d)Ut will become employed
Therefore, the number of workers who will be employed at date t + 1 will be

    E_{t+1} = (1 − d)(1 − α) E_t + (1 − d)λ U_t

Accounting as well for new births b N_t, all of whom enter as unemployed, the mass of unemployed will be

    U_{t+1} = (1 − d)α E_t + (1 − d)(1 − λ) U_t + b(E_t + U_t)

The total labor force then evolves as

    N_{t+1} = (1 + b − d) N_t = (1 + g) N_t  where  g := b − d
Letting X_t := (E_t, U_t)′, the law of motion for X is

    X_{t+1} = A X_t

where

    A := [ (1 − d)(1 − α)      (1 − d)λ
           (1 − d)α + b        (1 − d)(1 − λ) + b ]
Laws of Motion for Rates of Employment and Unemployment The following describes the
laws of motion for the employment and unemployment rates.
    (E_{t+1}/N_{t+1}, U_{t+1}/N_{t+1})′ = (1/(1 + g)) A (E_t/N_t, U_t/N_t)′
Letting

    x_t := (e_t, u_t)′ = (E_t/N_t, U_t/N_t)′
we can also write this as
    x_{t+1} = Â x_t  where  Â := A / (1 + g)
Evidently, et + ut = 1 implies that et+1 + ut+1 = 1
Steady States The aggregates Et and Ut won’t converge to steady states because their sum Et +
Ut grows at gross rate 1 + g
But the vector of employment and unemployment rates xt can be in a steady state x̄ provided that
we can find a solution to the matrix equation
x̄ = Â x̄
where the components satisfy ē + ū = 1 (that is, et + ut = 1 is preserved in steady state)
This equation tells us that a steady state level x̄ is an eigenvector of  associated with a unit
eigenvalue
We also have x_t → x̄ as t → ∞ provided that the remaining eigenvalues of Â are less than 1 in modulus
The figure below illustrates the convergence of the unemployment and employment rate to steady
state levels (dashed red line)
While the rates converge, the stocks grow at a constant rate 1 + g and thus do not converge to any
steady state levels.
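The steady state itself is easy to compute directly; here is a minimal sketch (not the lecture's lake.py) using the baseline parameters listed in the Implementation section below

import numpy as np

alpha, lamb, b, d = 0.013, 0.283, 0.0124, 0.00822   # baseline parameters
g = b - d
A = np.array([[(1 - d) * (1 - alpha), (1 - d) * lamb],
              [(1 - d) * alpha + b,   (1 - d) * (1 - lamb) + b]])
A_hat = A / (1 + g)

eigvals, eigvecs = np.linalg.eig(A_hat)
i = np.argmin(np.abs(eigvals - 1.0))   # locate the unit eigenvalue
x_bar = eigvecs[:, i].real
x_bar = x_bar / x_bar.sum()            # normalize so that e + u = 1
print(x_bar)                           # steady state (employment, unemployment) rates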
Dynamics of a worker
Over an infinite horizon, the fractions of time the worker spends employed and unemployed are

    π̄_e = lim_{T→∞} (1/(T+1)) ∑_{t=0}^{T} 1{s_t = 0}

and

    π̄_u = lim_{T→∞} (1/(T+1)) ∑_{t=0}^{T} 1{s_t = 1}

where s_t denotes the worker's state at time t, with s_t = 0 when employed and s_t = 1 when unemployed
Here
• 1{s_t = 0} is the indicator function that takes on value 1 if s_t = 0 and zero otherwise
• 1{s_t = 1} is the indicator function that takes on value 1 if s_t = 1 and zero otherwise
Because our Markov chain has a unique and ergodic invariant distribution, these time series aver-
ages equal probabilities under the invariant distribution π̄
The invariant distribution satisfies

    π̄′ = π̄′ P

that is, π̄ = (π̄_e, π̄_u)′ is an eigenvector of P′ associated with a unit eigenvalue
Inspection tells us that P′ is exactly Â under the assumption b = d = 0.
Thus, the percentages of time that an infinitely lived worker spends employed and unemployed
equal the fractions of workers employed and unemployed in the steady state distribution
Convergence rate How long does it take for sample averages to converge to long run averages?
Let

    π_{e,T} = (1/(T+1)) ∑_{t=0}^{T} 1{s_t = 0}

and

    π_{u,T} = (1/(T+1)) ∑_{t=0}^{T} 1{s_t = 1}
These are the average amounts of time a worker has spent employed and unemployed, respectively, after T periods
The figure below plots the path of these two objects over 5000 periods
It takes virtually the entire sample for these two objects to converge to the ergodic probabilities (dashed red line).
The code that generates these plots can be found in file lakemodel_example.py from the applica-
tions GitHub repository
The implementation is discussed in more detail below
McCall Search Model The McCall search model helped transform economists’ way of thinking
about labor markets
It did this by casting
• the loss of a job as a capital loss, and
• a spell of unemployment as an investment in searching for an acceptable job
To formulate the model, we follow McCall in using dynamic programming, a powerful technique
that we study in detail in many lectures on this site (see, e.g., the lecture on optimal growth)
You don’t have to know the material in that lecture now – our presentation here is self-contained
The model is about the life of an infinitely lived worker and
• the opportunities he has to work at different wages
• exogenous events that destroy his current job
• his decision making process while unemployed
Here are the details
So here goes
Let Vs be the value of a previously employed worker who enters a period with wage ws
Let U be the value of a worker who is unemployed this period
A little thought will convince you that V_s and U must satisfy the following two equations
Vs = u(ws ) + β [(1 − α)Vs + αU ] (2.52)
and
    U = u(c) + β(1 − γ)U + βγ ∑_s p_s max{ U, V_s }    (2.53)

where p_s is the probability of drawing wage offer w_s
Let’s interpret these two equations in light of the fact that today’s tomorrow is tomorrow’s today
• The left hand sides of equations (2.52) and (2.53) are the values of a worker in a particular
situation today
• The right hand sides of the equations are the discounted (by β) expected values of the possi-
ble situations that worker can be in tomorrow
• But tomorrow the worker can be in only one of the situations whose values today are on the
left sides of our two equations
• Equation (2.53) incorporates the assumption that a currently unemployed worker knows
that if he is lucky enough to receive a wage offer ws next period, he will choose to remain
unemployed unless U < Vs
Equations (2.52) and (2.53) are called Bellman equations after the mathematician Richard Bellman
They can be solved by iterating to convergence on the following sets of equations
    V_s^{(j+1)} = u(w_s) + β [ (1 − α) V_s^{(j)} + α U^{(j)} ]    (2.54)

and

    U^{(j+1)} = u(c) + β(1 − γ) U^{(j)} + βγ ∑_s p_s max{ U^{(j)}, V_s^{(j)} }    (2.55)

starting from initial conditions V_s^{(0)} = 0, U^{(0)} = 0
This procedure is called iterating on Bellman equations
These iterations are guaranteed to converge (see, e.g., the discussion in this lecture)
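As a concrete illustration, here is a minimal sketch of iterating on (2.54)–(2.55); the wage grid, offer probabilities and parameter values are illustrative assumptions, not the lecture's calibration

import numpy as np

alpha, beta, gamma, c = 0.013, 0.99, 0.1, 10.0   # hypothetical parameters
w = np.linspace(5, 50, 60)                       # hypothetical wage grid
p = np.ones(len(w)) / len(w)                     # hypothetical offer distribution
u = np.log                                       # utility function

V = np.zeros(len(w))   # V_s^(0) = 0
U = 0.0                # U^(0) = 0
for j in range(1000):
    V_new = u(w) + beta * ((1 - alpha) * V + alpha * U)
    U_new = u(c) + beta * (1 - gamma) * U + beta * gamma * np.maximum(U, V) @ p
    if max(np.max(np.abs(V_new - V)), abs(U_new - U)) < 1e-8:
        break
    V, U = V_new, U_new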
Linking the McCall search model to the lake model We now suppose that all workers inside a
Lake Model behave according to the McCall search model [McC70]
Their optimal decision rules determine some of the key probabilities that define a Lake Model
• The exogenous probability of leaving employment remains α
• The probability λ of leaving unemployment is now determined by the worker's reservation wage w̄: an unemployed worker accepts an offer only if w_s ≥ w̄, so that λ = γ P{w_s ≥ w̄} = γ ∑_{s : w_s ≥ w̄} p_s
Fiscal Policy We can use the McCall search version of the Lake Model to find an optimal level of
unemployment insurance
We assume that the government sets unemployment compensation ĉ
The government imposes a lump sum tax T sufficient to finance total unemployment payments
To attain a balanced budget at a steady state, taxes, the unemployment rate u, and the unemploy-
ment compensation rate must satisfy
T = uĉ
The lump sum tax applies to everyone, including unemployed workers
Thus, the post-tax income of an employed worker with wage w∗ [s] is w[s] = w∗ [s] − T
The post-tax income of an unemployed worker is ĉ − T.
Taking as given the wage probability distribution ( p, w∗ ), we can solve for the worker’s optimal
reservation wage for a given government policy (ĉ, T ).
This will imply a steady state unemployment rate u(ĉ, T ).
For a given level of unemployment benefit ĉ, we can solve for a tax that balances the budget in the
steady state
T = u(ĉ, T )ĉ
To evaluate alternative government tax-unemployment compensation pairs, we require a welfare
criterion
We use a steady state welfare criterion
For a wage offer distribution parameterized as log(w∗ ) ∼ N (log(20), 1), we plot steady state
welfare, budget balancing tax rate, as well as steady state unemployment for different levels of
unemployment benefit
The unemployment benefit that maximizes steady state welfare is ĉ = 32.7, at which the steady state unemployment rate is 36%
Implementation
The code in the file lake.py from the QuantEcon.applications repo finds the transition matrices
for the lake model and simulates the dynamics of unemployment and employment
You can view the program on GitHub but we repeat it here for convenience
This code computes the steady state and simulates the path of the economy
We take a period to be a month
We set b and d to match monthly birth and death rates, respectively, in the U.S. population
We take the α and λ, the hazard rate of leaving employment and unemployment respectively, from
[DFH06]
• α = 0.013
• λ = 0.283
• b = 0.0124
• d = 0.00822
In the exercises below, these values will be taken as our baseline parameters
# -*- coding: utf-8 -*-
"""
Created on Fri Feb 27 18:08:44 2015
import pandas as pd
pd.set_option('display.mpl_style', 'default') # Make the graphs a bit prettier
#Initialize Parameters
alpha = 0.013
lamb = 0.283#0.2486
b = 0.0124
d = 0.00822
g = b-d
N0 = 150.
e0 = 0.92
u0 = 1-e0
T = 50
LM = LakeModel(lamb,alpha,b,d)
A = LakeModelAgent(lamb,alpha)
pi_bar = A.compute_ergodic().flatten()
T = 5000  # agent-level simulation length (an assumption; the text above uses 5000 periods)
sHist = np.hstack(A.simulate(1, T))
# == fraction of time spent employed (s = 0) / unemployed (s = 1) up to each date == #
pi_e = np.cumsum(sHist == 0) / np.arange(1, T + 1)
pi_u = np.cumsum(sHist == 1) / np.arange(1, T + 1)
plt.figure(figsize=[10,6])
plt.subplot(2,1,1)
plt.plot(range(50,T),pi_e[50:])
plt.hlines(pi_bar[0],0,T,'r','--')
plt.title('Percent of Time Employed')
plt.subplot(2,1,2)
plt.plot(range(50,T),pi_u[50:])
plt.hlines(pi_bar[1],0,T,'r','--')
plt.xlabel('Time')
plt.title('Percent of Time Unemployed')
plt.tight_layout()
plt.savefig('example_averages.png')
#==============================================================================
# Now add McCall Search Model
#==============================================================================
from scipy.stats import norm
logw_dist = norm(np.log(20.),1)
w = np.linspace(0.,175,201)# wage grid
plt.figure(figsize=[10,6])
plt.subplot(221)
plt.plot(cvec,W)
plt.xlabel(r'$c$')
plt.title(r'Welfare' )
axes = plt.gca()
plt.vlines(cvec[i_max],axes.get_ylim()[0],max(W),'k','-.')
plt.subplot(222)
plt.plot(cvec,T)
axes = plt.gca()
plt.vlines(cvec[i_max],axes.get_ylim()[0],T[i_max],'k','-.')
plt.xlabel(r'$c$')
plt.title(r'Taxes' )
plt.subplot(223)
plt.plot(cvec,pi[:,0])
axes = plt.gca()
plt.vlines(cvec[i_max],axes.get_ylim()[0],pi[i_max,0],'k','-.')
plt.xlabel(r'$c$')
plt.title(r'Employment Rate' )
plt.subplot(224)
plt.plot(cvec,pi[:,1])
axes = plt.gca()
plt.vlines(cvec[i_max],axes.get_ylim()[0],pi[i_max,1],'k','-.')
plt.xlabel(r'$c$')
plt.title(r'Unemployment Rate' )
plt.tight_layout()
plt.savefig('welfare_plot.png')
The code that generates the figures for this lecture can be found in file lakemodel_example.py
from the examples section of the GitHub repository
Exercises
Exercise 1 Consider an economy with an initial stock of workers N_0 = 100 at the steady state level of employment in the baseline parameterization. Suppose that in response to new legislation the hiring rate falls to λ = 0.2. Plot the transition dynamics of the unemployment and employment stocks for 50 periods. Plot the transition dynamics for the rates. How long does the economy take to converge to a steady state? What is the new steady state level of employment?
Exercise 2 Consider an economy with an initial stock of workers N_0 = 100 at the steady state level of employment in the baseline parameterization. Suppose that for 20 periods the birth rate was temporarily high (b = 0.0025) and then returned to its original level. Plot the transition dynamics of the unemployment and employment stocks for 50 periods. Plot the transition dynamics for the rates. How long does the economy take to return to its original steady state?
Solutions
Solution notebook
Contents
• A First Look at the Kalman Filter
– Overview
– The Basic Idea
– Convergence
– Implementation
– Exercises
– Solutions
Overview
This lecture provides a simple and intuitive introduction to the Kalman filter, for those who either
• have heard of the Kalman filter but don’t know how it works, or
• know the Kalman filter equations, but don’t know where they come from
For additional (more advanced) reading on the Kalman filter, see
• [LS12], section 2.7.
• [AM05]
The last reference gives a particularly clear and comprehensive treatment of the Kalman filter
Required knowledge: Familiarity with matrix manipulations, multivariate normal distributions,
covariance matrices, etc.
The Kalman filter has many applications in economics, but for now let’s pretend that we are rocket
scientists
A missile has been launched from country Y and our mission is to track it
Let x ∈ R² denote the current location of the missile—a pair indicating latitude-longitude coordinates on a map
At the present moment in time, the precise location x is unknown, but we do have some beliefs
about x
One way to summarize our knowledge is a point prediction x̂
• But what if the President wants to know the probability that the missile is currently over the
Sea of Japan?
• Better to summarize our initial beliefs with a bivariate probability density p
Fig. 2.1: Prior density
The Filtering Step We are now presented with some good news and some bad news
The good news is that the missile has been located by our sensors, which report that the current
location is y = (2.3, −1.9)
The next figure shows the original prior p( x ) and the new reported location y
The bad news is that our sensors are imprecise.
In particular, we should interpret the output of our sensor not as y = x, but rather as
y = Gx + v, where v ∼ N (0, R) (2.58)
Here G and R are 2 × 2 matrices with R positive definite. Both are assumed known, and the noise
term v is assumed to be independent of x
How then should we combine our prior p( x ) = N ( x̂, Σ) and this new information y to improve
our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes’ theorem, which tells us we should update
our prior p( x ) to p( x | y) via
    p(x | y) = p(y | x) p(x) / p(y)

where p(y) = ∫ p(y | x) p(x) dx
In solving for p( x | y), we observe that
• p( x ) = N ( x̂, Σ)
• In view of (2.58), the conditional density p(y | x ) is N ( Gx, R)
• p(y) does not depend on x, and enters into the calculations only as a normalizing constant
Because we are in a linear and Gaussian framework, the updated density can be computed by
calculating population linear regressions.
In particular, the solution is known 6 to be

    p(x | y) = N(x̂^F, Σ^F)

where

    x̂^F := x̂ + Σ G′ (G Σ G′ + R)^{−1} (y − G x̂)    and    Σ^F := Σ − Σ G′ (G Σ G′ + R)^{−1} G Σ    (2.59)
Here ΣG 0 ( GΣG 0 + R)−1 is the matrix of population regression coefficients of the hidden object
x − x̂ on the surprise y − G x̂
6See, for example, page 93 of [Bis06]. To get from his expressions to the ones used above, you will also need to apply
the Woodbury matrix identity.
This new density p( x | y) = N ( x̂ F , Σ F ) is shown in the next figure via contour lines and the color
map
The original density is left in as contour lines for comparison
Our new density twists the prior p( x ) in a direction determined by the new information y − G x̂
In generating the figure, we set G to the identity matrix and R = 0.5Σ for Σ defined in (2.57)
(The code for generating this and the preceding figures can be found in the file gaussian_contours.py from the QuantEcon.applications package)
Suppose now that the missile's location evolves according to the law of motion

    x_{t+1} = A x_t + w_{t+1},  where  w_t ∼ N(0, Q)    (2.60)

Our aim is to combine this law of motion and our current distribution p(x | y) = N(x̂^F, Σ^F) to come up with a new predictive distribution for the location one unit of time hence
In view of (2.60), all we have to do is introduce a random vector x F ∼ N ( x̂ F , Σ F ) and work out the
distribution of Ax F + w where w is independent of x F and has distribution N (0, Q)
Since linear combinations of Gaussians are Gaussian, Ax F + w is Gaussian
Elementary calculations and the expressions in (2.59) tell us that

    E[A x_F + w] = A x̂^F = A x̂ + A Σ G′ (G Σ G′ + R)^{−1} (y − G x̂)

and

    Var[A x_F + w] = A Σ^F A′ + Q
The matrix AΣG 0 ( GΣG 0 + R)−1 is often written as KΣ and called the Kalman gain
• the subscript Σ has been added to remind us that KΣ depends on Σ, but not y or x̂
Using this notation, we can summarize our results as follows: Our updated prediction is the
density N ( x̂new , Σnew ) where
    x̂_new := A x̂ + K_Σ (y − G x̂)    (2.61)

    Σ_new := A Σ A′ − K_Σ G Σ A′ + Q
These are the standard dynamic equations for the Kalman filter. See, for example, [LS12], page 58.
Convergence
Implementation
The class Kalman from the QuantEcon.py package implements the Kalman filter
• Instance data consists of:
– the moments ( x̂t , Σt ) of the current prior
– An instance of the LinearStateSpace class from QuantEcon.py
The latter represents a linear state space model of the form

    x_{t+1} = A x_t + C w_{t+1}
    y_t = G x_t + H v_t

where the shocks w_t and v_t are iid standard normals; to avoid clutter we write

    Q := CC′ and R := HH′
"""
Filename: kalman.py
Reference: http://quant-econ.net/py/kalman.html
Implements the Kalman filter for a linear Gaussian state space model.
"""
from textwrap import dedent
import numpy as np
from numpy import dot
from scipy.linalg import inv
from quantecon.lss import LinearStateSpace
from quantecon.matrix_eqn import solve_discrete_riccati
class Kalman(object):
r"""
    Implements the Kalman filter for the Gaussian state space model

        x_{t+1} = A x_t + C w_{t+1}
        y_t = G x_t + H v_t

    Here x_t is the hidden state and y_t is the measurement. The shocks
w_t and v_t are iid standard normals. Below we use the notation
Q := CC'
R := HH'
Parameters
-----------
ss : instance of LinearStateSpace
An instance of the quantecon.lss.LinearStateSpace class
x_hat : scalar(float) or array_like(float), optional(default=None)
An n x 1 array representing the mean x_hat and covariance
matrix Sigma of the prior/predictive density. Set to zero if
not supplied.
Sigma : scalar(float) or array_like(float), optional(default=None)
An n x n array representing the covariance matrix Sigma of
the prior/predictive density. Must be positive definite.
Set to the identity if not supplied.
Attributes
----------
Sigma, x_hat : as above
Sigma_infinity : array_like or scalar(float)
The infinite limit of Sigma_t
K_infinity : array_like or scalar(float)
The stationary Kalman gain.
References
----------
http://quant-econ.net/py/kalman.html
"""
def __repr__(self):
return self.__str__()
def __str__(self):
m = """\
Kalman filter:
- dimension of state space : {n}
- dimension of observation equation : {k}
"""
return dedent(m.format(n=self.ss.n, k=self.ss.k))
def whitener_lss(self):
r"""
This function takes the linear state space system
that is an input to the Kalman class and it converts
that system to the time-invariant whitener represenation
given by
where
and
\tilde{A} = [A 0 0
KG A-KG KH
0 0 0]
\tilde{C} = [C 0
0 0
0 I]
\tilde{G} = [G -G H]
Returns
-------
whitened_lss : LinearStateSpace
This is the linear state space system that represents
the whitened system
"""
# Check for steady state Sigma and K
if self.K_infinity is None:
Sig, K = self.stationary_values()
self.Sigma_infinity = Sig
self.K_infinity = K
else:
K = self.K_infinity
return whitened_lss
    def prior_to_filtered(self, y):
        r"""
        Updates the moments (x_hat, Sigma) of the time t prior to the
        time t filtering distribution, using current measurement y_t

        Parameters
        ----------
        y : scalar or array_like(float)
            The current measurement
        """
# === simplify notation === #
G, H = self.ss.G, self.ss.H
R = np.dot(H, H.T)
def filtered_to_forecast(self):
"""
Updates the moments of the time t filtering distribution to the
moments of the predictive distribution, which becomes the time
t+1 prior
"""
# === simplify notation === #
        A, C = self.ss.A, self.ss.C
        Q = np.dot(C, C.T)
        # === and update === #
        self.x_hat = np.dot(A, self.x_hat)
        self.Sigma = np.dot(A, np.dot(self.Sigma, A.T)) + Q
    def update(self, y):
        """
        Updates x_hat and Sigma given measurement y: the full update,
        from one period to the next

        Parameters
        ----------
y : np.ndarray
A k x 1 ndarray y representing the current measurement
"""
self.prior_to_filtered(y)
self.filtered_to_forecast()
def stationary_values(self):
"""
        Computes the limit of Sigma_t as t goes to infinity by solving
        the associated discrete Riccati equation with solve_discrete_riccati
Returns
-------
Sigma_infinity : array_like or scalar(float)
The infinite limit of Sigma_t
K_infinity : array_like or scalar(float)
The stationary Kalman gain.
"""
# === simplify notation === #
A, C, G, H = self.ss.A, self.ss.C, self.ss.G, self.ss.H
Q, R = np.dot(C, C.T), np.dot(H, H.T)
    def stationary_coefficients(self, j, coeff_type='ma'):
        """
        Wold representation moving average or VAR coefficients for the
        steady state Kalman filter

        Parameters
        ----------
j : int
The lag length
coeff_type : string, either 'ma' or 'var' (default='ma')
The type of coefficent sequence to compute. Either 'ma' for
moving average or 'var' for VAR.
"""
# == simplify notation == #
A, G = self.ss.A, self.ss.G
K_infinity = self.K_infinity
# == make sure that K_infinity has actually been computed == #
if K_infinity is None:
S, K_infinity = self.stationary_values()
# == compute and return coefficients == #
coeffs = []
i = 1
if coeff_type == 'ma':
coeffs.append(np.identity(self.ss.k))
P_mat = A
P = np.identity(self.ss.n) # Create a copy
elif coeff_type == 'var':
coeffs.append(dot(G, K_infinity))
P_mat = A - dot(K_infinity, G)
P = np.copy(P_mat) # Create a copy
else:
raise ValueError("Unknown coefficient type")
while i <= j:
coeffs.append(dot(dot(G, P), K_infinity))
P = dot(P, P_mat)
i += 1
return coeffs
def stationary_innovation_covar(self):
# == simplify notation == #
H, G = self.ss.H, self.ss.G
R = np.dot(H, H.T)
Sigma_infinity = self.Sigma_infinity
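        # == guard and return (reconstructed): make sure Sigma_infinity
        # has been computed, then form the innovation covariance == #
        if Sigma_infinity is None:
            Sigma_infinity, K = self.stationary_values()
        return dot(G, dot(Sigma_infinity, G.T)) + R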
Exercises
Exercise 1 Consider the following simple application of the Kalman filter, loosely based on
[LS12], section 2.9.2
Suppose that
• all variables are scalars
• the hidden state { xt } is in fact constant, equal to some θ ∈ R unknown to the modeler
State dynamics are therefore given by (2.60) with A = 1, Q = 0 and x0 = θ
The measurement equation is yt = θ + vt where vt is N (0, 1) and iid
The task of this exercise is to simulate the model and, using the code from kalman.py, plot the first
five predictive densities pt ( x ) = N ( x̂t , Σt )
As shown in [LS12], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the
unknown value θ
In the simulation, take θ = 10, x̂0 = 8 and Σ0 = 1
Your figure should – modulo randomness – look something like this
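Here is one way to set up the simulation, as a minimal sketch that uses the Kalman class above. Building the state space model directly from A = 1, C = 0, G = 1, H = 1, the top-level quantecon imports, and the plotting grid are all assumptions:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from quantecon import Kalman, LinearStateSpace

theta = 10                  # Constant hidden state
A, C, G, H = 1, 0, 1, 1     # x' = x (so Q = 0) and y = x + v
ss = LinearStateSpace(A, C, G, H)
kalman = Kalman(ss, x_hat=8, Sigma=1)

xgrid = np.linspace(theta - 5, theta + 2, 200)
fig, ax = plt.subplots()
for t in range(5):
    # Plot the current predictive density p_t = N(x_hat_t, Sigma_t)
    m, v = float(kalman.x_hat), float(kalman.Sigma)
    ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label='t={}'.format(t))
    # Draw a measurement y_t = theta + v_t and update the filter
    y = theta + np.random.randn()
    kalman.update(y)
ax.legend()
plt.show()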
Exercise 2 The preceding figure gives some support to the idea that probability mass converges
to θ
To get a better idea, choose a small ε > 0 and calculate

    z_t := 1 − ∫_{θ−ε}^{θ+ε} pt ( x ) dx

for t = 0, 1, 2, . . . , T
Plot z_t against T, setting ε = 0.1 and T = 600
Your figure should show the error declining erratically, something like this
Exercise 3 As discussed above, if the shock sequence {wt } is not degenerate, then it is not in
general possible to predict xt without error at time t − 1 (and this would be the case even if we
could observe xt−1 )
Let’s now compare the prediction x̂t made by the Kalman filter against a competitor who is al-
lowed to observe xt−1
This competitor will use the conditional expectation E[ xt | xt−1 ], which in this case is Axt−1
The conditional expectation is known to be the optimal prediction method in terms of minimizing
mean squared error
(More precisely, the minimizer of E k xt − g( xt−1 )k2 with respect to g is g∗ ( xt−1 ) := E[ xt | xt−1 ])
Thus we are comparing the Kalman filter against a competitor who has more information (in the
sense of being able to observe the latent state) and behaves optimally in terms of minimizing
squared error
Our horse race will be assessed in terms of squared error
In particular, your task is to generate a graph plotting observations of both k xt − Axt−1 k2 and
k xt − x̂t k2 against t for t = 1, . . . , 50
For the parameters, set G = I, R = 0.5I and Q = 0.3I, where I is the 2 × 2 identity
Set

    A = [0.5  0.4
         0.6  0.3]

To initialize the prior density, set

    Σ0 = [0.9  0.3
          0.3  0.9]
and x̂0 = (8, 8)
Finally, set x0 = (0, 0)
You should end up with a figure similar to the following (modulo randomness)
Observe how, after an initial learning period, the Kalman filter performs quite well, even relative
to the competitor who predicts optimally with knowledge of the latent state
Solutions
Solution notebook
Overview
In this lecture we study a simplified version of an uncertainty traps model of Fajgelbaum, Schaal
and Taschereau-Dumouchel [FSTD15]
The model features self-reinforcing uncertainty that has big impacts on economic activity
In the model,
• Fundamentals vary stochastically and are not fully observable
• At any moment there are both active and inactive entrepreneurs; only active entrepreneurs
produce
• Agents – active and inactive entrepreneurs – have beliefs about the fundamentals, expressed
as probability distributions
• Greater uncertainty means greater dispersions of these distributions
• Entrepreneurs are risk averse and hence less inclined to be active when uncertainty is high
• The output of active entrepreneurs is observable, supplying a noisy signal that helps every-
one inside the model infer fundamentals
• Entrepreneurs update their beliefs about fundamentals using Bayes’ Law, implemented via
Kalman filtering
Uncertainty traps emerge because:
• High uncertainty discourages entrepreneurs from becoming active
• A low level of participation – i.e., a smaller number of active entrepreneurs – diminishes the
flow of information about fundamentals
The Model
The original model described in [FSTD15] has many interesting moving parts
Here we examine a simplified version that nonetheless captures many of the key ideas
The evolution of the fundamental is given by

    θt+1 = ρ θt + σθ wt+1

where
• σθ > 0 and 0 < ρ < 1
• {wt } is IID and standard normal
The random variable θt is not observable at any time
Information and Beliefs All entrepreneurs start with identical beliefs about θ0
Signals are publicly observable and hence all agents have identical beliefs always
Dropping time subscripts, beliefs for current θ are represented by the normal distribution
N (µ, γ−1 )
Here γ is the precision of beliefs; its inverse is the degree of uncertainty
These parameters are updated by Kalman filtering
Let
• M ⊂ {1, . . . , M̄ } denote the set of currently active firms
• M := |M| denote the number of currently active firms
• X be the average output (1/M) ∑_{m∈M} xm of the active firms
With this notation and primes for next period values, we can write the updating of the mean and
precision via
    µ′ = ρ (γµ + Mγx X) / (γ + Mγx)    (2.66)

    γ′ = ( ρ² / (γ + Mγx) + σθ² )⁻¹    (2.67)
These are standard Kalman filtering results applied to the current setting
Exercise 1 provides more details on how (2.66) and (2.67) are derived, and then asks you to fill in
remaining steps
The next figure plots the law of motion for the precision in (2.67) as a 45 degree diagram, with one
curve for each M ∈ {0, . . . , 6}
The other parameter values are ρ = 0.99, γx = 0.5, σθ = 0.5
Points where the curves hit the 45 degree lines are long run steady states for precision for different
values of M
Thus, if one of these values for M remains fixed, a corresponding steady state is the equilibrium
level of precision
• high values of M correspond to greater information about the fundamental, and hence more
precision in steady state
• low values of M correspond to less information and more uncertainty in steady state
In practice, as we’ll see, the number of active firms fluctuates stochastically
Participation Omitting time subscripts once more, entrepreneurs enter the market in the current
period if
E[u( xm − Fm )] > c (2.68)
Here
• the mathematical expectation of xm is based on (2.65) and beliefs N (µ, γ−1 ) for θ
• Fm is a stochastic but previsible fixed cost, independent across time and firms
• c is a constant reflecting opportunity costs
The statement that Fm is previsible means that it is realized at the start of the period and treated as
a constant in (2.68)
The utility function has the constant absolute risk aversion form
    u( x ) = (1/a)(1 − exp(− ax ))    (2.69)
where a is a positive parameter
Combining (2.68) and (2.69), entrepreneur m participates in the market (or is said to be active)
when
    (1/a) { 1 − E[ exp(− a(θ + εm − Fm )) ] } > c
Using standard formulas for expectations of lognormal random variables, this is equivalent to the
condition
    ψ(µ, γ, Fm ) := (1/a) { 1 − exp( −aµ + aFm + (a²/2)(1/γ + 1/γx) ) } − c > 0    (2.70)
Implementation
import numpy as np

class UncertaintyTrapEcon(object):
def __init__(self,
a=1.5, # Risk aversion
gx=0.5, # Production shock precision
rho=0.99, # Correlation coefficient for theta
sig_theta=0.5, # Std dev of theta shock
num_firms=100, # Number of firms
sig_F=1.5, # Std dev of fixed costs
c=-420, # External opportunity cost
mu_init=0, # Initial value for mu
gamma_init=4, # Initial value for gamma
theta_init=0): # Initial value for theta
# == Record values == #
self.a, self.gx, self.rho, self.sig_theta = a, gx, rho, sig_theta
        self.num_firms, self.sig_F, self.c = num_firms, sig_F, c
        self.sd_x = np.sqrt(1 / gx)  # Std dev of the production shock
# == Initialize states == #
self.gamma, self.mu, self.theta = gamma_init, mu_init, theta_init
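    def psi(self, F):
        # Participation criterion psi(mu, gamma, F) from (2.70); firm m
        # is active when this expression is positive. (Method
        # reconstructed -- gen_aggregates below relies on it.)
        temp1 = -self.a * (self.mu - F)
        temp2 = self.a**2 * (1 / self.gamma + 1 / self.gx) / 2
        return (1 / self.a) * (1 - np.exp(temp1 + temp2)) - self.c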
def gen_aggregates(self):
"""
Generate aggregates based on current beliefs (mu, gamma). This
is a simulation step that depends on the draws for F.
"""
F_vals = self.sig_F * np.random.randn(self.num_firms)
M = np.sum(self.psi(F_vals) > 0) # Counts number of active firms
if M > 0:
x_vals = self.theta + self.sd_x * np.random.randn(M)
X = x_vals.mean()
else:
X = 0
return X, M
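Between periods, beliefs and the fundamental must also be advanced. Here is a minimal sketch of such methods, applying (2.66) and (2.67) directly; the method names update_beliefs and update_theta are assumptions:

    def update_beliefs(self, X, M):
        # Update mu and gamma via the Kalman filtering results (2.66)
        # and (2.67), given the aggregates X and M
        gx, rho, sig_theta = self.gx, self.rho, self.sig_theta
        self.mu = rho * (self.gamma * self.mu + M * gx * X) \
            / (self.gamma + M * gx)
        self.gamma = 1 / (rho**2 / (self.gamma + M * gx) + sig_theta**2)

    def update_theta(self, w):
        # Advance the fundamental: theta' = rho * theta + sig_theta * w
        self.theta = self.rho * self.theta + self.sig_theta * w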
In the results below we use this code to simulate time series for the major variables
Results
Let’s look first at the dynamics of µ, which the agents use to track θ
We see that µ tracks θ well when there are sufficient firms in the market
However, there are times when µ tracks θ poorly due to insufficient information
These are episodes where the uncertainty traps take hold
During these episodes
• precision is low and uncertainty is high
• few firms are in the market
To get a clearer idea of the dynamics, let’s look at all the main time series at once, for a given set
of shocks
Notice how the traps only take hold after a sequence of bad draws for the fundamental
Thus, the model gives us a propagation mechanism that maps bad random draws into long down-
turns in economic activity
Exercises
Exercise 1 Fill in the details behind (2.66) and (2.67) based on the following standard result (see,
e.g., p. 24 of [YS05])
Fact Let x = ( x1 , . . . , x M ) be a vector of IID draws from common distribution N (θ, 1/γx ) and let x̄
be the sample mean. If γx is known and the prior for θ is N (µ, 1/γ), then the posterior distribution
of θ given x is
    π(θ | x) = N(µ′, 1/γ′)

where

    µ′ = (µγ + M x̄ γx) / (γ + Mγx)   and   γ′ = γ + Mγx
Solutions
Solution notebook
Contents
• A Simple Optimal Growth Model
– Overview
– The Model
– Dynamic Programming
– Computation
– Exercises
– Solutions
Overview
In this lecture we’re going to study a simple optimal growth model with one agent
The model is a version of the standard one sector infinite horizon growth model studied in
• [SLP89], chapter 2
• [LS12], section 3.1
• EDTC, chapter 1
• [Sun96], chapter 12
The technique we use to solve the model is dynamic programming
• Pervasive in economics, finance and many other fields
• General, powerful and yields both intuition and practical computational methods
The growth model studied below is intentionally simplistic — for now we favor ease of exposition
over realism
If you find the dynamic programming arguments hard going, try re-reading the lecture on shortest
paths
The Model
Consider an agent who owns at time t capital stock k t ∈ R+ := [0, ∞) and produces output
y t : = f ( k t ) ∈ R+
This output can either be consumed or saved as capital for next period
For simplicity we assume that depreciation is total, so that next period capital is just output minus
consumption:
k t +1 = y t − c t (2.71)
Taking k0 as given, we suppose that the agent wishes to maximize
    ∑_{t=0}^∞ β^t u(ct)    (2.72)
The Policy Function Approach As it turns out, we are better off seeking the function σ directly,
rather than the optimal consumption sequence
The main reason is that the functional approach — seeking the optimal policy — translates directly
over to the stochastic case, whereas the sequential approach does not
For this model, we will say that function σ mapping R+ into R+ is a feasible consumption policy if it
satisfies
σ(k ) ≤ f (k ) for all k ∈ R+ (2.73)
The set of all such policies will be denoted by Σ
Using this notation, the agent’s decision problem can be rewritten as
    max_{σ∈Σ} ∑_{t=0}^∞ β^t u(σ(kt))    (2.74)

subject to

    kt+1 = f (kt) − σ(kt),   k0 given    (2.75)
In the next section we discuss how to solve this problem for the maximizing σ
Dynamic Programming
The value function gives the maximal value that can be obtained from state k0 , after considering
all feasible policies
A policy σ ∈ Σ is called optimal if it attains the supremum in (2.77) for all k0 ∈ R+
The Bellman equation for this problem takes the form

    v∗(k) = max_{0≤c≤f(k)} { u(c) + βv∗( f (k) − c) }   for all k ∈ R+    (2.78)
It states that maximal value from a given state can be obtained by trading off
• current reward from a given action (in this case utility from current consumption) vs
• the discounted future value of the state resulting from that action
(If the intuition behind the Bellman equation is not clear to you, try working through this lecture)
As a matter of notation, given a continuous function w on R+ , we say that policy σ ∈ Σ is w-greedy
if σ(k ) is a solution to
    max_{0≤c≤f(k)} { u(c) + βw( f (k) − c) }    (2.79)

for every k ∈ R+
Theoretical Results As with most optimization problems, conditions for existence of a solution
typically require some form of continuity and compactness
In addition, some restrictions are needed to ensure that the sum of discounted utility is always
finite
For example, if we are prepared to assume that f and u are continuous and u is bounded, then
1. The value function v∗ is finite, bounded, continuous and satisfies the Bellman equation
2. At least one optimal policy exists
3. A policy is optimal if and only if it is v∗ -greedy
(For a proof see, for example, proposition 10.1.13 of EDTC)
In view of these results, to find an optimal policy, one option — perhaps the most common — is
to
1. compute v∗
2. solve for a v∗ -greedy policy
The advantage is that, once we get to the second step, we are solving a one-dimensional optimiza-
tion problem — the problem on the right-hand side of (2.78)
This is much easier than an infinite-dimensional optimization problem, which is what we started
out with
(An infinite sequence {ct } is a point in an infinite-dimensional space)
In fact step 2 is almost trivial once v∗ is obtained
For this reason, most of our focus is on the first step — how to obtain the value function
Value Function Iteration The value function v∗ can be obtained by an iterative technique:
• Start with a guess — some initial function w
• successively improve it
The improvement step involves applying an operator (i.e., a map from functions to functions)
called the Bellman operator
The Bellman operator for this problem is denoted T and sends w into Tw via
    Tw(k) := max_{0≤c≤f(k)} { u(c) + βw( f (k) − c) }    (2.80)
Unbounded Utility The theoretical results stated above assume that the utility function is
bounded
In practice economists often work with unbounded utility functions
For utility functions that are bounded below (but possibly unbounded above), a clean and com-
prehensive theory now exists
(Section 12.2 of EDTC provides one exposition)
For utility functions that are unbounded both below and above the situation is more complicated
For recent work on deterministic problems, see, for example, [Kam12] or [MdRV10]
In this lecture we will use both bounded and unbounded utility functions without dwelling on
the theory
Computation
Let’s now look at computing the value function and the optimal policy
Fitted Value Iteration The first step is to compute the value function by iterating with the Bell-
man operator
In theory, the algorithm is as follows
1. Begin with a function w — an initial condition
2. Solving (2.80), obtain the function Tw
3. Unless some stopping condition is satisfied, set w = Tw and go to step 2
However, there is a problem we must confront before we implement this procedure: The iterates
can neither be calculated exactly nor stored on a computer
To see the issue, consider (2.80)
Even if w is a known function, unless Tw can be shown to have some special structure, the only
way to store this function is to record the value Tw(k ) for every k ∈ R+
Clearly this is impossible
What we will do instead is use fitted value function iteration
The procedure is to record the value of the function Tw at only finitely many “grid” points
{k1 , . . . , k I } ⊂ R+ , and reconstruct it from this information when required
Another advantage of piecewise linear interpolation is that it preserves useful shape properties
such as monotonicity and concavity / convexity
A First Pass Implementation Let’s now look at an implementation of fitted value function iter-
ation using Python
In the example below,
• f (k ) = kα with α = 0.65
• u(c) = ln c and β = 0.95
As is well-known (see [LS12], section 3.1.2), for this particular problem an exact analytical solution
is available, with
v∗ (k ) = c1 + c2 ln k (2.81)
for
    c1 := ln(1 − αβ)/(1 − β) + ln(αβ)αβ / ((1 − αβ)(1 − β))   and   c2 := α/(1 − αβ)
At this stage, our only aim is to see if we can replicate this solution numerically, using fitted value
function iteration
Here’s a first-pass solution, the details of which are explained below
The code can be found in file optgrowth/optgrowth_v0.py from the QuantEcon.applications
repository
We repeat it here for convenience
"""
Filename: optgrowth_v0.py
Authors: John Stachurski and Tom Sargent
A first pass at solving the optimal growth problem via value function
iteration. A more general version is provided in optgrowth.py.
"""
from __future__ import division # Not needed for Python 3.x
import matplotlib.pyplot as plt
import numpy as np
from numpy import log
from scipy.optimize import fminbound
from scipy import interp
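# == Primitives and grid (reconstructed; the parameter values follow
# the text, while the grid bounds are assumptions) == #
alpha = 0.65
beta = 0.95
grid_max = 2
grid_size = 150
grid = np.linspace(1e-6, grid_max, grid_size)

# == Exact coefficients of the true value function, from (2.81) == #
ab = alpha * beta
c1 = log(1 - ab) / (1 - beta) + log(ab) * ab / ((1 - ab) * (1 - beta))
c2 = alpha / (1 - ab)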
def v_star(k):
return c1 + c2 * log(k)
def bellman_operator(w):
"""
The approximate Bellman operator, which computes and returns the updated
value function Tw on the grid points.
The vector w represents the value of the input function on the grid
points.
"""
# === Apply linear interpolation to w === #
Aw = lambda x: interp(x, grid, w)
# === set Tw[i] equal to max_c { log(c) + beta w(f(k_i) - c)} === #
Tw = np.empty(grid_size)
for i, k in enumerate(grid):
objective = lambda c: - log(c) - beta * Aw(k**alpha - c)
c_star = fminbound(objective, 1e-6, k**alpha)
Tw[i] = - objective(c_star)
return Tw
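
if __name__ == '__main__':
    # == Iterate with the Bellman operator and plot the iterates
    # against the true value function (main block reconstructed; the
    # iteration count and styling are assumptions) == #
    w = 5 * log(grid) - 25  # Initial condition, discussed below
    fig, ax = plt.subplots()
    ax.plot(grid, v_star(grid), 'k-', lw=2, alpha=0.8,
            label='true value function')
    for i in range(35):
        w = bellman_operator(w)
        ax.plot(grid, w, lw=1, alpha=0.6)
    ax.legend(loc='upper left')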
    plt.show()
Incidentally, it is true that knowledge of the functional form of v∗ for this model has influenced
our choice of the initial condition
w = 5 * log(grid) - 25
In more realistic problems such information is not available, and convergence will probably take
longer
Comments on the Code The function bellman_operator implements steps 2–3 of the fitted
value function algorithm discussed above
Linear interpolation is performed by SciPy’s interp function
Like the rest of SciPy’s numerical solvers, fminbound minimizes its objective, so we use the identity
maxx f ( x ) = − minx − f ( x ) to solve (2.80)
The line if __name__ == ’__main__’: is common and operates as follows
• If the file is run as the result of an import statement in another file, the clause evaluates to
False, and the code block is not executed
• If the file is run directly as a script, the clause evaluates to True, and the code block is exe-
cuted
To see how this trick works, suppose we have a file in our current working directory called
test_file.py that contains the single line
print(__name__)
The Policy Function To compute an approximate optimal policy, we run the fitted value function
algorithm until approximate convergence
Taking the function so produced as an approximation to v∗ , we then compute the (approximate)
v∗ -greedy policy
For this particular problem, the optimal consumption policy has the known analytical solution
σ(k ) = (1 − αβ)kα
The next figure compares the numerical solution to this exact solution
In the three figures, the approximation to v∗ is obtained by running the loop in the fitted value
function algorithm 2, 4 and 6 times respectively
Even with as few as 6 iterates, the numerical result is quite close to the true policy
Exercises
Exercise 2 Once an optimal consumption policy σ is given, the dynamics for the capital stock
follows (2.75)
The next figure shows the first 25 elements of this sequence for three different discount factors
(and hence three different policies)
Solutions
Solution notebook
Contents
• Optimal Growth Part II: Adding Some Bling
– Overview
– Adding a Shock
– Implementation
– Exercises
– Solutions
Overview
Adding a Shock
subject to
y t +1 = f ( y t − c t ) ξ t +1 and 0 ≤ ct ≤ yt for all t
For interpretation see the first optimal growth lecture
The difference is in the timing, as discussed above, and the addition of the expectation term E
Note also that yt is the state variable rather than k t
We seek again a Markov policy, which is a function σ : R+ → R+ that maps states to actions:
Analogous to the first optimal growth lecture, we will call σ a feasible consumption policy if it satisfies
to obtain the total expected present value of following policy σ forever, given initial income y0
The aim is to select a policy that makes this number as large as possible
The next section covers these ideas more formally
Optimality and the Bellman Operator The policy value function vσ associated with a given policy
σ is
    vσ(y0) := E [ ∑_{t=0}^∞ β^t u(σ(yt)) ]    (2.86)
Analogous to the Bellman equation in the first optimal growth lecture, it states that maximal value
from a given state can be obtained by trading off
Greedy and Optimal Policies A policy σ ∈ Σ is called optimal if it attains the supremum in (2.87)
for all y0 ∈ R+
Given a continuous function w on R+ , we say that σ ∈ Σ is w-greedy if σ(y) is a solution to
for every y ∈ R+
A feasible policy is optimal if and only if it is v∗ -greedy (see, e.g., theorem 10.1.11 of EDTC)
Hence, once we have a good approximation to v∗ , we can compute the (approximately) optimal
policy by taking the greedy policy
Implementation
"""
Filename: optgrowth.py

Solving the optimal growth problem via value function iteration. The model is
described in

    http://quant-econ.net/py/optgrowth_2.html
"""
import numpy as np
from scipy.optimize import fminbound
from scipy import interp
# (Signature reconstructed from the compute_fixed_point call below)
def bellman_operator(w, grid, beta, u, f, shocks, Tw=None, compute_policy=False):
    """
    The approximate Bellman operator, which computes and returns the
    updated value function Tw on the grid points.

    Parameters
----------
w : array_like(float, ndim=1)
The value of the input function on different grid points
grid : array_like(float, ndim=1)
The set of grid points
u : function
The utility function
f : function
The production function
shocks : numpy array
An array of draws from the shock, for Monte Carlo integration (to
compute expectations).
beta : scalar
The discount factor
Tw : array_like(float, ndim=1) optional (default=None)
Array to write output values to
compute_policy : Boolean, optional (default=False)
Whether or not to compute policy function
"""
# === Apply linear interpolation to w === #
w_func = lambda x: interp(x, grid, w)
# == Initialize Tw if necessary == #
if Tw is None:
Tw = np.empty(len(w))
if compute_policy:
sigma = np.empty(len(w))
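    # == set Tw[i] = max_c { u(c) + beta * mean(w(f(y_i - c) * shocks)) } == #
    # (Monte Carlo maximization step reconstructed from the discussion
    # of (2.89) below)
    for i, y in enumerate(grid):
        def objective(c):
            return - u(c) - beta * np.mean(w_func(f(y - c) * shocks))
        c_star = fminbound(objective, 1e-10, y)
        if compute_policy:
            sigma[i] = c_star
        Tw[i] = - objective(c_star)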
if compute_policy:
return Tw, sigma
else:
return Tw
This code is from the file optgrowth_2/optgrowth.py from the QuantEcon.applications repository
The arguments to bellman_operator are described in the docstring to the function
Comments:
• We can now pass in any production and utility function
• The expectation in (2.89) is computed via Monte Carlo, using the approximation
    E w( f (y − c)ξ ) ≈ (1/n) ∑_{i=1}^n w( f (y − c)ξ_i )

where {ξ_i}_{i=1}^n are IID draws from the distribution of the shock
Monte Carlo is not always the most efficient way to compute integrals numerically but it does
have some theoretical advantages in the present setting
(For example, it preserves the contraction mapping property of the Bellman operator — see, e.g.,
[PalS13])
Solving Specific Models Now we want to write some code that uses the Bellman operator to
solve specific models
The first model we’ll try out is one with log utility and Cobb–Douglas production, similar to the
first optimal growth lecture
This model is simple and has a closed form solution but is useful as a test case
We assume that shocks are lognormal, with ln ξ t ∼ N (µ, σ2 )
We present the code first, which is file optgrowth_2/log_linear_growth_model.py from the Quan-
tEcon.applications repository
Explanations are given after the code
"""
Filename: log_linear_growth_model.py
Authors: John Stachurski, Thomas Sargent
The log linear growth model, wrapped as classes. For use with the
optgrowth.py module.
"""
import numpy as np
import matplotlib.pyplot as plt
from optgrowth import bellman_operator
from quantecon import compute_fixed_point
from joblib import Memory
memory = Memory(cachedir='./joblib_cache')
@memory.cache
def compute_value_function_cached(grid, beta, alpha, shocks):
"""
Compute the value function by iterating on the Bellman operator.
The work is done by QuantEcon's compute_fixed_point function.
"""
Tw = np.empty(len(grid))
initial_w = 5 * np.log(grid) - 25
v_star = compute_fixed_point(bellman_operator,
initial_w,
1e-4, # error_tol
100, # max_iter
True, # verbose
5, # print_skip
grid,
beta,
np.log,
lambda k: k**alpha,
shocks,
Tw=Tw,
compute_policy=False)
return v_star
class LogLinearGrowthModel:
"""
Stores parameters and computes solutions for the basic log utility / Cobb
Douglas production growth model. Shocks are lognormal.
"""
def __init__(self,
alpha=0.65, # Productivity parameter
beta=0.95, # Discount factor
mu=1, # First parameter in lognorm(mu, sigma)
sigma=0.1, # Second parameter in lognorm(mu, sigma)
grid_max=8,
grid_size=150):
        # == Record parameters and build grid (reconstructed) == #
        self.alpha, self.beta, self.mu, self.sigma = alpha, beta, mu, sigma
        self.grid = np.linspace(1e-6, grid_max, grid_size)

    def compute_value_function(self):
        # Draw shocks and delegate to the cached function above
        # (body reconstructed; 250 draws is an assumption)
        shocks = np.exp(self.mu + self.sigma * np.random.randn(250))
        v_star = compute_value_function_cached(self.grid, self.beta,
                                               self.alpha, shocks)
        return v_star

    def compute_greedy(self, show_plot=False):
        # Compute the approximate v*-greedy policy and optionally plot
        # it against the exact policy (body reconstructed)
        w = self.compute_value_function()
        shocks = np.exp(self.mu + self.sigma * np.random.randn(250))
        Tw, sigma = bellman_operator(w, self.grid, self.beta, np.log,
                                     lambda k: k**self.alpha, shocks,
                                     compute_policy=True)
        if show_plot:
            fig, ax = plt.subplots()
            ax.plot(self.grid, sigma, lw=2, alpha=0.6, label='approximate policy function')
            cstar = (1 - self.alpha * self.beta) * self.grid
            ax.plot(self.grid, cstar, lw=2, alpha=0.6, label='true policy function')
            ax.legend(loc='upper left')
            plt.show()
        return sigma
@memory.cache
We are using the joblib library to cache the result of calling compute_value_function_cached at a
given set of parameters
With the argument cachedir='./joblib_cache', any call to this function results in both the input values
and output values being stored in a subdirectory joblib_cache of the present working directory
• In UNIX shells, . refers to the present working directory
Now if we call the function twice with the same set of parameters, the result will be returned
almost instantaneously
Models as Classes The parameters and the methods that we actually interact with are wrapped
in a class called LogLinearGrowthModel
This keeps our variables organized in one self-contained object
It also makes it easier to evaluate endogenous entities like value functions across a range of pa-
rameters
For example, to compute the value function at α ∈ {0.5, 0.55, 0.6, 0.65} we could use the following
val_functions = []
alphas = [0.5, 0.55, 0.6, 0.65]
for alpha in alphas:
gm = LogLinearGrowthModel(alpha=alpha)
vstar = gm.compute_value_function()
val_functions.append(vstar)
This client code is simple and neat, and hence less error prone than client code involving complex
function calls
A Test For this growth model it is known (see, e.g., chapter 1 of EDTC) that the optimal policy
for consumption is σ(y) = (1 − αβ)y
Let’s see if we match this when we run
In [2]: lg = LogLinearGrowthModel()
In [3]: lg.compute_greedy(show_plot=True)
The approximation is good apart from close to the right hand boundary
Why is the approximation poor at the upper end of the grid?
This is caused by a combination of the function approximation step and the shock component
In particular, with the interp routine that we are using, evaluation at points larger than any grid
point returns the value at the right-most grid point
Points larger than any grid point are encountered in the Bellman operator at the step where we
take expectations
One solution is to take the grid larger than where we wish to compute the policy
Another is to tweak the interp routine, although we won’t pursue that idea here
Exercises
Coming soon.
Solutions
Coming soon.
Contents
• LQ Dynamic Programming Problems
– Overview
– Introduction
– Optimality – Finite Horizon
– Extensions and Comments
– Implementation
– Further Applications
– Exercises
– Solutions
Overview
Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have found
applications in almost every scientific field
This lecture provides an introduction to LQ control and its economic applications
As we will see, LQ systems have a simple structure that makes them an excellent workhorse for a
wide variety of economic problems
Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than it
may appear initially
These themes appear repeatedly below
Mathematically, LQ control problems are closely related to the Kalman filter, although we won’t
pursue the deeper connections in this lecture
In reading what follows, it will be useful to have some familiarity with
• matrix manipulations
• vectors of random variables
• dynamic programming and the Bellman equation (see for example this lecture and this lec-
ture)
For additional reading on LQ control, see, for example,
• [LS12], chapter 5
• [HS08], chapter 4
Introduction
The “linear” part of LQ is a linear law of motion for the state, while the “quadratic” part refers to
preferences
Let’s begin with the former, move on to the latter, and then put them together into an optimization
problem
The Law of Motion Let xt be a vector describing the state of some economic system
Suppose that xt follows a linear law of motion given by

    xt+1 = Axt + But + Cwt+1,   t = 0, 1, 2, . . .    (2.91)
Here
• ut is a “control” vector, incorporating choices available to a decision maker confronting the
current state xt
• {wt } is an uncorrelated zero mean shock process satisfying E wt w′t = I, where the right-hand
side is the identity matrix
Regarding the dimensions
• xt is n × 1, A is n × n
• ut is k × 1, B is n × k
• wt is j × 1, C is n × j
    at+1 + ct = (1 + r ) at + yt
Here at is assets, r is a fixed interest rate, ct is current consumption, and yt is current non-financial
income
If we suppose that {yt } is uncorrelated and N (0, σ2 ), then, taking {wt } to be standard normal, we
can write the system as
at+1 = (1 + r ) at − ct + σwt+1
This is clearly a special case of (2.91), with assets being the state and consumption being the control
Example 2 One unrealistic feature of the previous model is that non-financial income has a zero
mean and is often negative
This can easily be overcome by adding a sufficiently large mean
Hence in this example we take yt = σwt+1 + µ for some positive real number µ
Another alteration that’s useful to introduce (we’ll see why soon) is to change the control variable
from consumption to the deviation of consumption from some “ideal” quantity c̄
(Most parameterizations will be such that c̄ is large relative to the amount of consumption that is
attainable in each period, and hence the household wants to increase consumption)
For this reason, we now take our control to be ut := ct − c̄
In terms of these variables, the budget constraint at+1 = (1 + r ) at − ct + yt becomes

    at+1 = (1 + r ) at − ut − c̄ + σwt+1 + µ    (2.92)
How can we write this new system in the form of equation (2.91)?
If, as in the previous example, we take at as the state, then we run into a problem: the law of
motion contains some constant terms on the right-hand side
This means that we are dealing with an affine function, not a linear one (recall this discussion)
Fortunately, we can easily circumvent this problem by adding an extra state variable
In particular, if we write
    [at+1]   [1 + r   −c̄ + µ] [at]   [−1]        [σ]
    [  1 ] = [  0        1  ] [ 1 ] + [ 0] ut  + [0] wt+1    (2.93)
Preferences In the LQ model, the aim is to minimize a flow of losses, where time-t loss is given
by the quadratic expression
    x′t Rxt + u′t Qut    (2.95)
Here
• R is assumed to be n × n, symmetric and nonnegative definite
• Q is assumed to be k × k, symmetric and positive definite
Note: In fact, for many economic problems, the definiteness conditions on R and Q can be relaxed.
It is sufficient that certain submatrices of R and Q be nonnegative definite. See [HS08] for details
Example 1 A very simple example that satisfies these assumptions is to take R and Q to be
identity matrices, so that current loss is
Thus, for both the state and the control, loss is measured as squared distance from the origin
(In fact the general case (2.95) can also be understood in this way, but with R and Q identifying
other – non-Euclidean – notions of “distance” from the zero vector)
Intuitively, we can often think of the state xt as representing deviation from a target, such as
• deviation of inflation from some target level
• deviation of a firm’s capital stock from some desired quantity
The aim is to put the state close to the target, while using controls parsimoniously
Example 2 In the household problem studied above, setting R = 0 and Q = 1 yields preferences
Under this specification, the household’s current loss is the squared deviation of consumption
from the ideal level c̄
Let’s now be precise about the optimization problem we wish to consider, and look at how to solve
it
The Objective We will begin with the finite horizon case, with terminal time T ∈ N
In this case, the aim is to choose a sequence of controls {u0 , . . . , u T −1 } to minimize the objective
    E { ∑_{t=0}^{T−1} β^t ( x′t Rxt + u′t Qut ) + β^T x′T Rf xT }    (2.96)
Information There’s one constraint we’ve neglected to mention so far, which is that the decision
maker who solves this LQ problem knows only the present and the past, not the future
To clarify this point, consider the sequence of controls {u0 , . . . , u T −1 }
When choosing these controls, the decision maker is permitted to take into account the effects of
the shocks {w1 , . . . , wT } on the system
However, it is typically assumed — and will be assumed here — that the time-t control ut can be
made with knowledge of past and present shocks only
The fancy measure-theoretic way of saying this is that ut must be measurable with respect to the
σ-algebra generated by x0 , w1 , w2 , . . . , wt
This is in fact equivalent to stating that ut can be written in the form ut = gt ( x0 , w1 , w2 , . . . , wt ) for
some Borel measurable function gt
(Just about every function that’s useful for applications is Borel measurable, so, for the purposes
of intuition, you can read that last phrase as “for some function gt ”)
Now note that xt will ultimately depend on the realizations of x0 , w1 , w2 , . . . , wt
In fact it turns out that xt summarizes all the information about these historical shocks that the
decision maker needs to set controls optimally
More precisely, it can be shown that any optimal control ut can always be written as a function of
the current state alone
Hence in what follows we restrict attention to control policies (i.e., functions) of the form ut =
gt ( x t )
Actually, the preceding discussion applies to all standard dynamic programming problems
What’s special about the LQ case is that – as we shall soon see — the optimal ut turns out to be a
linear function of xt
Solution To solve the finite horizon LQ problem we can use a dynamic programming strategy
based on backwards induction that is conceptually similar to the approach adopted in this lecture
For reasons that will soon become clear, we first introduce the notation JT ( x ) := x′Rf x
Now consider the problem of the decision maker in the second to last period
In particular, let the time be T − 1, and suppose that the state is x T −1
The decision maker must trade off current and (discounted) final losses, and hence solves

    JT−1 ( x ) := min_u { x′Rx + u′Qu + β E JT ( Ax + Bu + CwT ) }    (2.97)
The function JT −1 will be called the T − 1 value function, and JT −1 ( x ) can be thought of as repre-
senting total “loss-to-go” from state x at time T − 1 when the decision maker behaves optimally
Letting
    JT−2 ( x ) := min_u { x′Rx + u′Qu + β E JT−1 ( Ax + Bu + CwT−1 ) }
The first equality is the Bellman equation from dynamic programming theory specialized to the
finite horizon LQ problem
Now that we have { J0 , . . . , JT }, we can obtain the optimal controls
As a first step, let’s find out what the value functions look like
It turns out that every Jt has the form Jt ( x ) = x′Pt x + dt where Pt is an n × n matrix and dt is a
constant
We can show this by induction, starting from PT := R f and d T = 0
Using this notation, (2.97) becomes

    JT−1 ( x ) = min_u { x′Rx + u′Qu + β ( Ax + Bu )′PT ( Ax + Bu ) + β trace(C′PT C ) }    (2.98)
To obtain the minimizer, we can take the derivative of the r.h.s. with respect to u and set it equal
to zero
Applying the relevant rules of matrix calculus, this gives

    u = −( Q + βB′PT B )⁻¹ βB′PT Ax    (2.99)

Plugging this expression back in and rearranging yields

    JT−1 ( x ) := x′PT−1 x + dT−1
where
    PT−1 := R − β²A′PT B( Q + βB′PT B )⁻¹ B′PT A + βA′PT A    (2.100)
and
    dT−1 := β trace(C′PT C )    (2.101)
(The algebra is a good exercise — we’ll leave it up to you)
If we continue working backwards in this manner, it soon becomes clear that Jt ( x ) := x′Pt x + dt
as claimed, where { Pt } and {dt } satisfy the recursions

    Pt−1 := R − β²A′Pt B( Q + βB′Pt B )⁻¹ B′Pt A + βA′Pt A   with   PT := Rf    (2.102)

and
    dt−1 := β( dt + trace(C′Pt C ) )   with   dT = 0    (2.103)
Recalling (2.99), the minimizers from these backward steps are

    ut = −Ft xt   where   Ft := ( Q + βB′Pt+1 B )⁻¹ βB′Pt+1 A    (2.104)

Substituting into the law of motion (2.91), the state then evolves according to

    xt+1 = ( A − BFt ) xt + Cwt+1    (2.105)
An Application Early Keynesian models assumed that households have a constant marginal
propensity to consume from current income
Data contradicted the constancy of the marginal propensity to consume
In response, Milton Friedman, Franco Modigliani and many others built models based on a con-
sumer’s preference for a stable consumption stream
(See, for example, [Fri56] or [MB54])
One property of those models is that households purchase and sell financial assets to make con-
sumption streams smoother than income streams
The household savings problem outlined above captures these ideas
The optimization problem for the household is to choose a consumption sequence in order to
minimize

    E { ∑_{t=0}^{T−1} β^t ( ct − c̄ )² + β^T q a²T }    (2.106)
As before we set yt = σwt+1 + µ and ut := ct − c̄, after which the constraint can be written as in
(2.92)
We saw how this constraint could be manipulated into the LQ formulation xt+1 = Axt + But +
Cwt+1 by setting xt = ( at , 1)′ and using the definitions in (2.94)
To match with this state and control, the objective function (2.106) can be written in the form of
(2.96) by choosing
    Q := 1,   R := [0  0      and   Rf := [q  0
                    0  0]                  0  0]
Now that the problem is expressed in LQ form, we can proceed to the solution by applying (2.102)
and (2.104)
After generating shocks w1 , . . . , wT , the dynamics for assets and consumption can be simulated
via (2.105)
We provide code for all these operations below
The following figure was computed using this code, with r = 0.05, β = 1/(1 + r ), c̄ = 2, µ =
1, σ = 0.25, T = 45 and q = 106
The shocks {wt } were taken to be iid and standard normal
The top panel shows the time path of consumption ct and income yt in the simulation
As anticipated by the discussion on consumption smoothing, the time path of consumption is
much smoother than that for income
(But note that consumption becomes more irregular towards the end of life, when the zero final
asset requirement impinges more on consumption choices)
The second panel in the figure shows that the time path of assets at is closely correlated with
cumulative unanticipated income, where the latter is defined as
    zt := ∑_{j=0}^{t} σ wj
A key message is that unanticipated windfall gains are saved rather than consumed, while unan-
ticipated negative shocks are met by reducing assets
(Again, this relationship breaks down towards the end of life due to the zero final asset require-
ment)
These results are relatively robust to changes in parameters
For example, let’s increase β from 1/(1 + r ) ≈ 0.952 to 0.96 while keeping other parameters fixed
This consumer is slightly more patient than the last one, and hence puts relatively more weight
on later consumption values
A simulation is shown below
We now have a slowly rising consumption stream and a hump-shaped build up of assets in the
middle periods to fund rising consumption
However, the essential features are the same: consumption is smooth relative to income, and assets
are strongly positively correlated with cumulative unanticipated income
Let’s now consider a number of standard extensions to the LQ problem treated above
Infinite Horizon Finally, we consider the infinite horizon case, with cross-product term, un-
changed dynamics and objective function given by
    E { ∑_{t=0}^∞ β^t ( x′t Rxt + u′t Qut + 2u′t Nxt ) }    (2.110)
In the infinite horizon case, optimal policies can depend on time only if time itself is a component
of the state vector xt
In other words, there exists a fixed matrix F such that ut = − Fxt for all t
This stationarity is intuitive — after all, the decision maker faces the same infinite horizon at every
stage, with only the current state changing
Not surprisingly, P and d are also constant
The stationary matrix P is given by the fixed point of (2.102)
Equivalently, it is the solution P to the discrete time algebraic Riccati equation

    P = R − ( βB′PA + N )′( Q + βB′PB )⁻¹( βB′PA + N ) + βA′PA    (2.111)
Equation (2.111) is also called the LQ Bellman equation, and the map that sends a given P into the
right-hand side of (2.111) is called the LQ Bellman operator
The stationary optimal policy for this model is

    ut = −Fxt   where   F := ( Q + βB′PB )⁻¹( βB′PA + N )    (2.112)

and the constant term d is given by

    d := β/(1 − β) · trace(C′PC )    (2.113)
Certainty Equivalence Linear quadratic control problems of the class discussed above have the
property of certainty equivalence
By this we mean that the optimal policy F is not affected by the parameters in C, which specify
the shock process
This can be confirmed by inspecting (2.112) or (2.109)
It follows that we can ignore uncertainty when solving for optimal behavior, and plug it back in
when examining optimal state dynamics
Implementation
We have put together some code for solving finite and infinite horizon linear quadratic control
problems
The code can be found in the file lqcontrol.py from the QuantEcon.py package
You can view the program on GitHub but we repeat it here for convenience
"""
Filename: lqcontrol.py
"""
from textwrap import dedent
import numpy as np
from numpy import dot
from scipy.linalg import solve
from .matrix_eqn import solve_discrete_riccati
class LQ(object):
r"""
    This class is for analyzing linear quadratic optimal control
    problems of either the infinite horizon form

        min E sum_{t=0}^{infty} beta^t r(x_t, u_t)

    or the finite horizon form

        min E sum_{t=0}^{T-1} beta^t r(x_t, u_t) + beta^T x_T' R_f x_T

    with

        r(x_t, u_t) := x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t

    subject to the law of motion x_{t+1} = A x_t + B u_t + C w_{t+1}

    For this model, the time t value (i.e., cost-to-go) function V_t
    takes the form

        V_t(x) = x' P_t x + d_t
Parameters
----------
Q : array_like(float)
        Q is the payoff (or cost) matrix that corresponds with the
        control variable u and is k x k. Should be symmetric and
        nonnegative definite
R : array_like(float)
        R is the payoff (or cost) matrix that corresponds with the
        state variable x and is n x n. Should be symmetric and
        nonnegative definite
N : array_like(float)
N is the cross product term in the payoff, as above. It should
be k x n.
A : array_like(float)
A is part of the state transition as described above. It should
be n x n
B : array_like(float)
B is part of the state transition as described above. It should
be n x k
C : array_like(float), optional(default=None)
C is part of the state transition as described above and
corresponds to the random variable today. If the model is
deterministic then C should take default value of None
beta : scalar(float), optional(default=1)
beta is the discount parameter
T : scalar(int), optional(default=None)
T is the number of periods in a finite horizon problem.
Rf : array_like(float), optional(default=None)
        Rf is the final (in a finite horizon model) payoff (or cost)
        matrix that corresponds with the state variable x and is n x n.
        Should be symmetric and nonnegative definite
Attributes
----------
Q, R, N, A, B, C, beta, T, Rf : see Parameters
P : array_like(float)
P is part of the value function representation of V(x) = x'Px + d
d : array_like(float)
d is part of the value function representation of V(x) = x'Px + d
F : array_like(float)
F is the policy rule that determines the choice of control in
each period.
k, n, j : scalar(int)
The dimensions of the matrices as presented above
"""
self.beta = beta
if C is None:
# == If C not given, then model is deterministic. Set C=0. == #
self.j = 1
self.C = np.zeros((self.n, self.j))
else:
self.C = converter(C)
self.j = self.C.shape[1]
        if N is None:
            # == No cross product term in payoff. Set N=0. == #
            self.N = np.zeros((self.k, self.n))
        else:
            self.N = converter(N)
if T:
# == Model is finite horizon == #
self.T = T
self.Rf = np.asarray(Rf, dtype='float32')
self.P = self.Rf
self.d = 0
else:
self.P = None
self.d = None
self.T = None
self.F = None
def __repr__(self):
return self.__str__()
def __str__(self):
m = """\
Linear Quadratic control system
- beta (discount parameter) : {b}
- T (time horizon) : {t}
- n (number of state variables) : {n}
- k (number of control variables) : {k}
- j (number of shocks) : {j}
"""
t = "infinite" if self.T is None else self.T
return dedent(m.format(b=self.beta, n=self.n, k=self.k, j=self.j,
t=t))
def update_values(self):
"""
This method is for updating in the finite horizon case. It
shifts the current value function
"""
# === Simplify notation === #
Q, R, A, B, N, C = self.Q, self.R, self.A, self.B, self.N, self.C
P, d = self.P, self.d
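        # == Body reconstructed from (2.102)-(2.104), with the cross
        # product term N included == #
        S1 = Q + self.beta * dot(B.T, dot(P, B))
        S2 = self.beta * dot(B.T, dot(P, A)) + N
        S3 = self.beta * dot(A.T, dot(P, A))
        # == Compute F as (Q + beta B'PB)^{-1} (beta B'PA + N) == #
        self.F = solve(S1, S2)
        # == Shift the value function back one step in time == #
        new_P = R - dot(S2.T, self.F) + S3
        new_d = self.beta * (d + np.trace(dot(P, dot(C, C.T))))
        # == Set new state == #
        self.P, self.d = new_P, new_d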
def stationary_values(self):
"""
Computes the matrix P and scalar d that represent the value
function
V(x) = x' P x + d
Returns
-------
P : array_like(float)
P is part of the value function representation of
V(x) = xPx + d
F : array_like(float)
F is the policy rule that determines the choice of control
in each period.
d : array_like(float)
d is part of the value function representation of
V(x) = xPx + d
"""
# === simplify notation === #
Q, R, A, B, N, C = self.Q, self.R, self.A, self.B, self.N, self.C
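        # === solve Riccati equation, obtain P (step reconstructed;
        # uses solve_discrete_riccati imported above) === #
        A0, B0 = np.sqrt(self.beta) * A, np.sqrt(self.beta) * B
        P = solve_discrete_riccati(A0, B0, R, Q, N)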
# == Compute F == #
S1 = Q + self.beta * dot(B.T, dot(P, B))
S2 = self.beta * dot(B.T, dot(P, A)) + N
F = solve(S1, S2)
# == Compute d == #
d = self.beta * np.trace(dot(P, dot(C, C.T))) / (1 - self.beta)
        # == Record the stationary values as attributes and return == #
        self.P, self.F, self.d = P, F, d
        return P, F, d
    def compute_sequence(self, x0, ts_length=None):
        """
        Compute and return the optimal state and control sequences,
        assuming that {w_t} is iid and N(0, 1)

        Parameters
===========
x0 : array_like(float)
The initial state, a vector of length n
ts_length : scalar(int)
Length of the simulation -- defaults to T in finite case
Returns
========
x_path : array_like(float)
An n x T matrix, where the t-th column represents x_t
u_path : array_like(float)
A k x T matrix, where the t-th column represents u_t
w_path : array_like(float)
A j x T matrix, where the t-th column represent w_t
"""
        # == Body reconstructed from the docstring above and the
        # backward recursion (2.102)-(2.105) == #
        A, B, C = self.A, self.B, self.C
        if self.T is not None:
            # == Finite horizon case: reset value function == #
            T = self.T if not ts_length else min(ts_length, self.T)
            self.P, self.d = self.Rf, 0
        else:
            # == Infinite horizon case: use stationary policy == #
            T = ts_length if ts_length else 100
            self.stationary_values()
        # == Set up initial condition and arrays to store paths == #
        x0 = np.asarray(x0).reshape(self.n, 1)
        x_path = np.empty((self.n, T + 1))
        u_path = np.empty((self.k, T))
        w_path = dot(C, np.random.randn(self.j, T + 1))
        # == Compute and record the sequence of policies == #
        policies = []
        for t in range(T):
            if self.T is not None:
                self.update_values()
            policies.append(self.F)
        # == Use the policies in reverse order to simulate, following
        # x_{t+1} = (A - B F_t) x_t + C w_{t+1} == #
        F = policies.pop()
        x_path[:, 0] = x0.flatten()
        u_path[:, 0] = -dot(F, x0).flatten()
        for t in range(1, T):
            F = policies.pop()
            Ax, Bu = dot(A, x_path[:, t-1]), dot(B, u_path[:, t-1])
            x_path[:, t] = Ax + Bu + w_path[:, t]
            u_path[:, t] = -dot(F, x_path[:, t])
        Ax, Bu = dot(A, x_path[:, T-1]), dot(B, u_path[:, T-1])
        x_path[:, T] = Ax + Bu + w_path[:, T]
        return x_path, u_path, w_path
In the module, the various updating, simulation and fixed point methods are wrapped in a class
called LQ, which includes
• Instance data:
– The required parameters Q, R, A, B and optional parameters C, beta, T, R_f, N specifying
a given LQ model
# == Imports (reconstructed for this snippet) == #
import numpy as np
import matplotlib.pyplot as plt
from quantecon import LQ

# == Model parameters == #
r = 0.05
beta = 1 / (1 + r)
T = 45
c_bar = 2
sigma = 0.25
mu = 1
q = 1e6
# == Formulate as an LQ problem == #
Q = 1
R = np.zeros((2, 2))
Rf = np.zeros((2, 2))
Rf[0, 0] = q
A = [[1 + r, -c_bar + mu],
[0, 1]]
B = [[-1],
[0]]
C = [[sigma],
[0]]
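# == Solve and simulate (a sketch; x0 = (0, 1) starts with zero assets
# and the constant state equal to one) == #
lq = LQ(Q, R, A, B, C, beta=beta, T=T, Rf=Rf)
x0 = (0, 1)
xp, up, wp = lq.compute_sequence(x0)
# == Convert results back to assets, consumption and income == #
assets = xp[0, :]           # a_t
c = up.flatten() + c_bar    # c_t
income = wp[0, 1:] + mu     # y_t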
# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for i in range(n_rows):
axes[i].grid()
axes[i].set_xlabel(r'Time')
bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}
plt.show()
Further Applications
subject to at+1 = (1 + r ) at − ct + yt , t ≥ 0
For income we now take yt = p(t) + σwt+1 where p(t) := m0 + m1 t + m2 t2
(In the next section we employ some tricks to implement a more sophisticated model)
The coefficients m0 , m1 , m2 are chosen such that p(0) = 0, p( T/2) = µ, and p( T ) = 0
You can confirm that the specification m0 = 0, m1 = Tµ/( T/2)2 , m2 = −µ/( T/2)2 satisfies these
constraints
To put this into an LQ setting, consider the budget constraint, which becomes

    at+1 = (1 + r ) at − ut − c̄ + m1 t + m2 t² + σwt+1    (2.115)
The fact that at+1 is a linear function of ( at , 1, t, t2 ) suggests taking these four variables as the state
vector xt
Once a good choice of state and control (recall ut = ct − c̄) has been made, the remaining specifi-
cations fall into place relatively easily
Thus, for the dynamics we set
    xt := [at]        [1 + r   −c̄   m1   m2]        [−1]        [σ]
          [ 1]   A := [  0      1    0    0]   B := [ 0]   C := [0]
          [ t]        [  0      1    1    0]        [ 0]        [0]
          [t²]        [  0      1    2    1]        [ 0]        [0]    (2.116)
If you expand the expression xt+1 = Axt + But + Cwt+1 using this specification, you will find that
assets follow (2.115) as desired, and that the other state variables also update appropriately
To implement preference specification (2.114) we take
    Q := 1,   R := [0 0 0 0      and   Rf := [q 0 0 0
                    0 0 0 0                   0 0 0 0
                    0 0 0 0                   0 0 0 0
                    0 0 0 0]                  0 0 0 0]    (2.117)
The next figure shows a simulation of consumption and assets computed using the
compute_sequence method of lqcontrol.py with initial assets set to zero
Here
• p(t) := m1 t + m2 t2 with the coefficients m1 , m2 chosen such that p(K ) = µ and p(0) =
p(2K ) = 0
• s is retirement income
We suppose that preferences are unchanged and given by (2.106)
The budget constraint is also unchanged and given by at+1 = (1 + r ) at − ct + yt
Our aim is to solve this problem and simulate paths using the LQ techniques described in this
lecture
In fact this is a nontrivial problem, as the kink in the dynamics (2.118) at K makes it very difficult
to express the law of motion as a fixed-coefficient linear system
However, we can still use our LQ methods here by suitably linking two component LQ problems
These two LQ problems describe the consumer’s behavior during her working life (lq_working)
and retirement (lq_retired)
(This is possible because in the two separate periods of life, the respective income processes [poly-
nomial trend and constant] each fit the LQ framework)
The basic idea is that although the whole problem is not a single time-invariant LQ problem, it is
still a dynamic programming problem, and hence we can use appropriate Bellman equations at
every stage
Based on this logic, we can
1. solve lq_retired by the usual backwards induction procedure, iterating back to the start of
retirement
2. take the start-of-retirement value function generated by this process, and use it as the termi-
nal condition R f to feed into the lq_working specification
3. solve lq_working by backwards induction from this choice of R f , iterating back to the start
of working life
This process gives the entire life-time sequence of value functions and optimal policies
The next figure shows one simulation based on this procedure
The full set of parameters used in the simulation is discussed in Exercise 2, where you are asked to
replicate the figure
Once again, the dominant feature observable in the simulation is consumption smoothing
The asset path fits well with standard life cycle theory, with dissaving early in life followed by
later saving
Application 3: Monopoly with Adjustment Costs Consider a monopolist facing stochastic in-
verse demand function
p t = a0 − a1 q t + d t
Here qt is output, and the demand shock dt follows

    dt+1 = ρ dt + σ wt+1

where {wt } is IID and standard normal. The monopolist maximizes the expected discounted sum
of present and future profits

    E ∑_{t=0}^∞ β^t πt   where   πt := pt qt − cqt − γ( qt+1 − qt )²

Here
• γ(qt+1 − qt )2 represents adjustment costs
• c is average cost of production
This can be formulated as an LQ problem and then solved and simulated, but first let’s study the
problem and try to get some intuition
One way to start thinking about the problem is to consider what would happen if γ = 0
Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose output
to maximize current profit in each period
It’s not difficult to show that profit-maximizing output is
    q̄t := ( a0 − c + dt ) / (2a1 )
In light of this discussion, what we might expect for general γ is that
• if γ is close to zero, then qt will track the time path of q̄t relatively closely
• if γ is larger, then qt will be smoother than q̄t , as the monopolist seeks to avoid adjustment
costs
This intuition turns out to be correct
The following figures show simulations produced by solving the corresponding LQ problem
The only difference in parameters across the figures is the size of γ
It’s now relatively straightforward to find R and Q such that (2.120) can be written as (2.110)
Furthermore, the matrices A, B and C from (2.91) can be found by writing down the dynamics of
each element of the state
Exercise 3 asks you to complete this process, and reproduce the preceding figures
Exercises
For lq_retired, use the same definition of xt and ut , but modify A, B, C to correspond to constant
income yt = s
For lq_retired, set preferences as in (2.117)
For lq_working, preferences are the same, except that R f should be replaced by the final value
function that emerges from iterating lq_retired back to the start of retirement
With some careful footwork, the simulation can be generated by patching together the simulations
from these two separate models
Exercise 3 Reproduce the figures from the monopolist application given above
For parameters, use a0 = 5, a1 = 0.5, σ = 0.15, ρ = 0.9, β = 0.95 and c = 2, while γ varies between
1 and 50 (see figures)
Solutions
Solution notebook
Overview
In this lecture we discuss a family of dynamic programming problems with the following features:
1. a discrete state space
2. discrete choices (actions)
3. an infinite horizon
4. discounted rewards
5. Markov state transitions
We call such problems discrete dynamic programs, or simply discrete DPs
Discrete DPs are the workhorses in much of modern quantitative economics, including
• monetary economics
• search and labor economics
• household savings and consumption theory
• investment theory
• asset pricing
• industrial organization, etc.
When a given model is not inherently discrete, it is common to replace it with a discretized version
in order to use discrete DP techniques
How to Read this Lecture This lecture provides a detailed treatment of discrete dynamic pro-
gramming
If you
• already know discrete dynamic programming, or
• want to move quickly to problem solving
then we suggest you review the notation and then skip immediately to the first example
Alternatively, if you find the treatment of dynamic programming too theoretical, you might want
to review some of the introductory lectures on dynamic programming, such as
• The shortest path lecture
• The lake model lecture
• The optimal growth lecture
References For background reading and additional applications, see, for example,
• [LS12]
• [HLL96], section 3.5
• [Put05]
• [SLP89]
• [Rus96]
• [MF02]
• EDTC, chapter 5
Discrete DPs
Loosely speaking, a discrete DP is a maximization problem with an objective function of the form
    E ∑_{t=0}^∞ β^t r (st , at )    (2.121)
where
• st is the state variable
• at is the action
• β is a discount factor
• r (st , at ) is interpreted as a current reward when the state is st and the action chosen is at
Each pair (st , at ) pins down transition probabilities Q(st , at , st+1 ) for the next period state st+1
Thus, actions influence not only current rewards but also the future time path of the state
The essence of dynamic programming problems is to trade off current rewards vs favorable posi-
tioning of the future state (modulo randomness)
Examples:
• consuming today vs saving and accumulating assets
• accepting a job offer today vs seeking a better one in the future
• exercising an option now vs waiting
Policies The most fruitful way to think about solutions to discrete DP problems is to compare
policies
In general, a policy is a randomized map from past actions and states to current action
In the setting formalized below, it suffices to consider so-called stationary Markov policies, which
consider only the current state
• A stationary Markov policy is a map σ from states to actions, with at = σ(st ) indicating that
at is the action to be taken in state st
• For any arbitrary policy, there exists a stationary Markov policy that dominates it at least
weakly
• See section 5.5 of [Put05] for discussion and proofs
In what follows, stationary Markov policies are referred to simply as policies
The aim is to find an optimal policy, in the sense of one that maximizes (2.121)
Let’s now step through these ideas more carefully
Formal definition Formally, a discrete dynamic program consists of the following components:
1. A finite set of states S = {0, . . . , n − 1}
2. A finite set of feasible actions A(s) for each state s ∈ S, and a corresponding set
SA := {(s, a) | s ∈ S, a ∈ A(s)}
of feasible state-action pairs
3. A reward function r : SA → R
4. A transition probability function Q : SA → ∆(S), where ∆(S) is the set of probability distribu-
tions over S
5. A discount factor β ∈ [0, 1)
We also use the notation A := ∪_{s∈S} A(s) = {0, . . . , m − 1} and call this set the action space
Value and Optimality Let vσ (s) denote the discounted sum of expected reward flows from pol-
icy σ when the initial state is s
To calculate this quantity we pass the expectation through the sum in (2.121) and use (2.122) to get
    vσ (s) = ∑_{t=0}^∞ β^t ( Qσ^t rσ )(s)    (s ∈ S)

where rσ (s) := r (s, σ(s)) and Qσ (s, s′) := Q(s, σ(s), s′)
This function is called the policy value function for the policy σ
The optimal value function, or simply value function, is the function v∗ : S → R defined by

    v∗ (s) = max_{σ∈Σ} vσ (s)    (s ∈ S)

(We can use max rather than sup here because the domain is a finite set)
A policy σ ∈ Σ is called optimal if vσ (s) = v∗ (s) for all s ∈ S
Given any w : S → R, a policy σ ∈ Σ is called w-greedy if
( )
σ(s) ∈ arg max r (s, a) + β ∑
0
w(s0 ) Q(s, a, s0 ) (s ∈ S)
a∈ A(s) s ∈S
As discussed in detail below, optimal policies are precisely those that are v∗ -greedy
Each policy σ also defines an operator Tσ via

    Tσ v = rσ + βQσ v

and the policy value function satisfies vσ = Tσ vσ
The Bellman Equation and the Principle of Optimality The main principle of the theory of
dynamic programming is that
• the optimal value function v∗ is a unique solution to the Bellman equation,
    v(s) = max_{a∈A(s)} { r (s, a) + β ∑_{s′∈S} v(s′) Q(s, a, s′) }    (s ∈ S)
Now that the theory has been set out, let’s turn to solution methods
Code for solving discrete DPs is available in the DiscreteDP class from QuantEcon.py
It implements the three most important solution methods for discrete dynamic programs, namely
• value function iteration
• policy function iteration
• modified policy function iteration
Let’s briefly review these algorithms and their implementation
Value Function Iteration Perhaps the most familiar method for solving all manner of dynamic
programs is value function iteration
This algorithm uses the fact that the Bellman operator T is a contraction mapping with fixed point
v∗
Hence, iterative application of T to any initial function v0 : S → R converges to v∗
The details of the algorithm can be found in the appendix
Policy Function Iteration This routine, also known as Howard’s policy improvement algorithm,
exploits more closely the particular structure of a discrete DP problem
Each iteration consists of
1. A policy evaluation step that computes the value vσ of a policy σ by solving the linear equa-
tion v = Tσ v
2. A policy improvement step that computes a vσ -greedy policy
In the current setting policy iteration computes an exact optimal policy in finitely many iterations
• See theorem 10.2.6 of EDTC for a proof
The details of the algorithm can be found in the appendix
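Policy iteration can be sketched in the same style, reusing the bellman_operator helper from the previous sketch (again, illustrative code rather than the DiscreteDP internals):

def policy_iteration(R, Q, beta, max_iter=1000):
    """Alternate exact policy evaluation and policy improvement steps."""
    n = R.shape[0]
    sigma = R.argmax(axis=1)                 # a feasible initial policy
    states = np.arange(n)
    for _ in range(max_iter):
        # == Policy evaluation: solve v = r_sigma + beta * Q_sigma v == #
        r_sigma = R[states, sigma]
        Q_sigma = Q[states, sigma, :]
        v_sigma = np.linalg.solve(np.identity(n) - beta * Q_sigma, r_sigma)
        # == Policy improvement: compute a v_sigma-greedy policy == #
        _, sigma_new = bellman_operator(v_sigma, R, Q, beta)
        if np.array_equal(sigma_new, sigma):  # exact optimum reached
            break
        sigma = sigma_new
    return v_sigma, sigma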
Modified Policy Function Iteration Modified policy iteration replaces the policy evaluation step
in policy iteration with “partial policy evaluation”
The latter computes an approximation to the value of a policy σ by iterating Tσ for a specified
number of times
This approach can be useful when the state space is very large and the linear system in the policy
evaluation step of policy iteration is correspondingly difficult to solve
The details of the algorithm can be found in the appendix
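The partial policy evaluation step can likewise be sketched as k applications of Tσ in place of the exact linear solve (k is a tuning parameter; names are illustrative):

def partial_policy_evaluation(v, sigma, R, Q, beta, k=20):
    """Approximate v_sigma by applying T_sigma to v a total of k times."""
    states = np.arange(R.shape[0])
    r_sigma = R[states, sigma]
    Q_sigma = Q[states, sigma, :]
    for _ in range(k):
        v = r_sigma + beta * Q_sigma.dot(v)  # one application of T_sigma
    return v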
Next period's stock obeys s′ = a + U where U ∼ U[0, . . . , B]
Discrete DP Representation We want to represent this model in the format of a discrete dynamic
program
To this end, we take
• the state variable to be the stock s
• the state space to be S = {0, . . . , M + B}
– hence n = M + B + 1
• the action to be the storage quantity a
• the set of feasible actions at s to be A(s) = {0, . . . , min{s, M }}
– hence A = {0, . . . , M} and m = M + 1
Defining a DiscreteDP Instance This information will be used to create an instance of Discret-
eDP by passing the following information
1. An n × m reward array R
2. An n × m × n transition probability array Q
3. A discount factor β
For R we set R[s, a] = u(s − a) if a ≤ s and −∞ otherwise
For Q we follow the rule in (2.123)
Note:
• The feasibility constraint is embedded into R by setting R[s, a] = −∞ for a ∉ A(s)
• Probability distributions for (s, a) with a ∉ A(s) can be arbitrary
A simple class that sets up these objects for us in the current application can be found in the
QuantEcon.applications repository
For convenience let’s repeat it here:
"""
A simple optimal growth model, for testing the DiscreteDP class.
Filename: finite_dp_og_example.py
"""
import numpy as np
class SimpleOG(object):
self.populate_Q()
self.populate_R()
def populate_R(self):
"""
Populate the R matrix, with R[s, a] = -np.inf for infeasible
state-action pairs.
"""
for s in range(self.n):
for a in range(self.m):
self.R[s, a] = self.u(s - a) if a <= s else -np.inf
def populate_Q(self):
"""
Populate the Q matrix by setting
for a in range(self.m):
self.Q[:, a, a:(a + self.B + 1)] = 1.0 / (self.B + 1)
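With the class in hand, the model can be built and solved along the following lines (a sketch; DiscreteDP and its solve method come from QuantEcon.py, and 'policy_iteration' is one of its accepted solution methods):

from quantecon.markov import DiscreteDP

g = SimpleOG()  # default parameters
ddp = DiscreteDP(g.R, g.Q, g.beta)
results = ddp.solve(method='policy_iteration')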
(In IPython version 4.0 and above you can also type results. and hit the tab key)
The most important attributes are v, the value function, and sigma, the optimal policy
In [7]: results.v
Out[7]:
array([ 19.01740222, 20.01740222, 20.43161578, 20.74945302,
In [8]: results.sigma
Out[8]: array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5])
Since we’ve used policy iteration, these results will be exact unless we hit the iteration bound
max_iter
Let’s make sure this didn’t happen
In [9]: results.max_iter
Out[9]: 250
In [10]: results.num_iter
Out[10]: 3
Another interesting object is results.mc, which is the controlled chain defined by Qσ∗ , where σ∗
is the optimal policy
In other words, it gives the dynamics of the state when the agent follows the optimal policy
Since this object is an instance of the MarkovChain class from QuantEcon (see this lecture for more
discussion), we can immediately simulate it, compute its stationary distribution and so on
In [11]: results.mc.stationary_distributions
Out[11]:
array([[ 0.01732187, 0.04121063, 0.05773956, 0.07426848, 0.08095823,
0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909,
0.09090909, 0.07358722, 0.04969846, 0.03316953, 0.01664061,
0.00995086]])
Here's the same object after re-solving the model with a more patient agent (a higher discount factor), who stores more of the stock:

In [20]: results.mc.stationary_distributions
Out[20]:
array([[ 0.00546913, 0.02321342, 0.03147788, 0.04800681, 0.05627127,
0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909,
0.09090909, 0.08543996, 0.06769567, 0.05943121, 0.04290228,
0.03463782]])
If we look at the bar graph we can see the rightward shift in probability mass
State-Action Pair Formulation The DiscreteDP class in fact provides a second interface to set-
ting up an instance
One of the advantages of this alternative setup is that it permits the use of a sparse matrix for Q
(An example of using sparse matrices is given in the exercise solution notebook below)
The call signature of the second formulation is DiscreteDP(R, Q, beta, s_indices,
a_indices) where
• s_indices and a_indices are arrays of equal length L enumerating all feasible state-action
pairs
• R is an array of length L giving corresponding rewards
• Q is an L x n transition probability array
Here’s how we could set up these objects for the preceding example
import numpy as np

B, M = 10, 5                  # same primitives as the example above
n = B + M + 1
alpha, beta = 0.5, 0.9
u = lambda c: c**alpha        # assumed utility, matching SimpleOG

s_indices = []
a_indices = []
Q = []
R = []
b = 1.0 / (B + 1)

for s in range(n):
    for a in range(min(M, s) + 1):  # All feasible a at this s
        s_indices.append(s)
        a_indices.append(a)
        q = np.zeros(n)
        q[a:(a + B + 1)] = b        # b on these values, otherwise 0
        Q.append(q)
        R.append(u(s - a))
For larger problems you might need to write this code more efficiently by vectorizing or using
Numba
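Once the lists are built, constructing and solving the model uses the second call signature directly (sketch):

from quantecon.markov import DiscreteDP

ddp_sa = DiscreteDP(R, Q, beta, s_indices, a_indices)
results_sa = ddp_sa.solve(method='policy_iteration')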
Exercises
In the deterministic optimal growth dynamic programming lecture, we solved a benchmark model
that has an analytical solution to check we could replicate it numerically
The exercise is to replicate this solution again using the DiscreteDP class described above
Solutions
Solution notebook
Appendix: Algorithms
This appendix covers the details of the solution algorithms implemented in the DiscreteDP class
We will make use of the following notions of approximate optimality:
• For ε > 0, v is called an ε-approximation of v∗ if ‖v − v∗‖ < ε
• A policy σ ∈ Σ is called ε-optimal if vσ is an ε-approximation of v∗
Contents
• Rational Expectations Equilibrium
– Overview
– Defining Rational Expectations Equilibrium
– Computation of an Equilibrium
– Exercises
– Solutions
Overview
The Big Y, little y trick This widely used method applies in contexts in which a “representative
firm” or agent is a “price taker” operating within a competitive equilibrium
We want to impose that
• The representative firm or individual takes aggregate Y as given when it chooses individual
y, but . . .
• At the end of the day, Y = y, so that the representative firm is indeed representative
The Big Y, little y trick accomplishes these two goals by
• Taking Y as beyond control when posing the choice problem of who chooses y; but . . .
• Imposing Y = y after having solved the individual’s optimization problem
Please watch for how this strategy is applied as the lecture unfolds
We begin by applying the Big Y, little y trick in a very simple static context
A simple static example of the Big Y, little y trick Consider a static model in which a collection
of n firms produce a homogeneous good that is sold in a competitive market
Each of these n firms sells output y
The price p of the good lies on an inverse demand curve
p = a0 − a1 Y (2.124)
where
• ai > 0 for i = 0, 1
• Y = ny is the market-wide level of output
Each firm has total cost function
c(y) = c1 y + 0.5c2 y2 , ci > 0 for i = 1, 2
The profits of a representative firm are py − c(y)
Using (2.124), we can express the problem of the representative firm as
max_y [ (a0 − a1 Y)y − c1 y − 0.5 c2 y² ]    (2.125)

The first-order condition for (2.125) is a0 − a1 Y − c1 − c2 y = 0, and imposing Y = ny afterward pins down the equilibrium output of each firm, as in the sketch below
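Solving the first-order condition with Y = ny imposed is a one-line computation; for instance (with purely illustrative parameter values):

# Solve a0 - c1 - (a1 * n + c2) * y = 0 for y, after imposing Y = n * y
a0, a1, c1, c2, n = 100.0, 0.05, 10.0, 2.0, 50   # illustrative values
y = (a0 - c1) / (a1 * n + c2)
Y = n * y                  # market-wide output
p = a0 - a1 * Y            # equilibrium price from (2.124)
print(y, Y, p)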
Our first illustration of a rational expectations equilibrium involves a market with n firms, each of
which seeks to maximize the discounted present value of profits in the face of adjustment costs
The adjustment costs induce the firms to make gradual adjustments, which in turn requires con-
sideration of future prices
Individual firms understand that, via the inverse demand curve, the price is determined by the
amounts supplied by other firms
Hence each firm wants to forecast future total industry supplies
In our context, a forecast is generated by a belief about the law of motion for the aggregate state
Rational expectations equilibrium prevails when this belief coincides with the actual law of motion
generated by production choices induced by this belief
We formulate a rational expectations equilibrium in terms of a fixed point of an operator that maps
beliefs into optimal beliefs
pt = a0 − a1 Yt (2.128)
where
• ai > 0 for i = 0, 1
• Yt = nyt is the market-wide level of output
The firm maximizes the present discounted value of profits ∑_{t=0}^∞ β^t rt, where

rt := pt yt − γ(yt+1 − yt)²/2,    y0 given    (2.130)
Regarding the parameters,
• β ∈ (0, 1) is a discount factor
• γ > 0 measures the cost of adjusting the rate of output
Regarding timing, the firm observes pt and yt when it chooses yt+1 at time t
To state the firm’s optimization problem completely requires that we specify dynamics for all state
variables
This includes ones that the firm cares about but does not control like pt
We turn to this problem now
Prices and Aggregate Output In view of (2.128), the firm’s incentive to forecast the market price
translates into an incentive to forecast aggregate output Yt
Aggregate output depends on the choices of other firms
We assume that n is such a large number that the output of any single firm has a negligible effect
on aggregate output
That justifies firms in regarding their forecasts of aggregate output as being unaffected by their
own output decisions
The Firm’s Beliefs We suppose the firm believes that market-wide output Yt follows the law of
motion
Yt+1 = H (Yt ) (2.131)
where Y0 is a known initial condition
The belief function H is an equilibrium object, and hence remains to be determined
Optimal Behavior Given Beliefs For now let’s fix a particular belief H in (2.131) and investigate
the firm’s response to it
Let v be the optimal value function for the firm’s problem given H
The value function satisfies the Bellman equation
v(y, Y) = max_{y′} { a0 y − a1 yY − γ(y′ − y)²/2 + βv(y′, H(Y)) }    (2.132)
Let’s denote the firm’s optimal policy function by h, so that
yt+1 = h(yt , Yt ) (2.133)
where

h(y, Y) := arg max_{y′} { a0 y − a1 yY − γ(y′ − y)²/2 + βv(y′, H(Y)) }    (2.134)
Evidently v and h both depend on H
The first-order condition for the maximization on the right side of (2.132) is −γ(y′ − y) + βvy(y′, H(Y)) = 0, while the envelope condition gives

vy(y, Y) = a0 − a1 Y + γ(y′ − y)

Combining the two (shifting the envelope condition forward one period) yields the firm's Euler equation

−γ(yt+1 − yt) + β[a0 − a1 Yt+1 + γ(yt+2 − yt+1)] = 0    (2.136)
The firm optimally sets an output path that satisfies (2.136), taking (2.131) as given, and subject to
• the initial conditions for (y0 , Y0 )
• the terminal condition limt→∞ βt yt vy (yt , Yt ) = 0
This last condition is called the transversality condition, and acts as a first-order necessary condition
“at infinity”
The firm’s decision rule solves the difference equation (2.136) subject to the given initial condition
y0 and the transversality condition
Note that solving the Bellman equation (2.132) for v and then h in (2.134) yields a decision rule
that automatically imposes both the Euler equation (2.136) and the transversality condition
The Actual Law of Motion for {Yt } As we’ve seen, a given belief translates into a particular
decision rule h
Recalling that Yt = nyt , the actual law of motion for market-wide output is then
Thus, when firms believe that the law of motion for market-wide output is (2.131), their optimizing
behavior makes the actual law of motion be (2.137)
Fixed point characterization As we’ve seen, the firm’s optimum problem induces a mapping Φ
from a perceived law of motion H for market-wide output to an actual law of motion Φ( H )
The mapping Φ is the composition of two operations, taking a perceived law of motion into a
decision rule via (2.132)–(2.134), and a decision rule into an actual law via (2.137)
The H component of a rational expectations equilibrium is a fixed point of Φ
Computation of an Equilibrium
Now let’s consider the problem of computing the rational expectations equilibrium
A Planning Problem Approach Our plan of attack is to match the Euler equations of the market
problem with those for a single-agent choice problem
As we’ll see, this planning problem can be solved by LQ control (linear regulator)
The optimal quantities from the planning problem are rational expectations equilibrium quantities
The rational expectations equilibrium price can be obtained as a shadow price in the planning
problem
For convenience, in this section we set n = 1
We first compute a sum of consumer and producer surplus at time t
s(Yt, Yt+1) := ∫₀^{Yt} (a0 − a1 x) dx − γ(Yt+1 − Yt)²/2    (2.138)
The first term is the area under the demand curve, while the second measures the social costs of
changing output
7 A literature that studies whether models populated with agents who learn can converge to rational expectations
equilibria features iterations on a modification of the mapping Φ that can be approximated as γΦ + (1 − γ) I. Here I
is the identity operator and γ ∈ (0, 1) is a relaxation parameter. See [MS89] and [EH01] for statements and applications
of this approach to establish conditions under which collections of adaptive agents who use least squares learning
converge to a rational expectations equilibrium.
Solution of the Planning Problem Evaluating the integral in (2.138) yields the quadratic form
a0 Yt − a1 Yt²/2
As a result, the Bellman equation for the planning problem is
V(Y) = max_{Y′} { a0 Y − (a1/2)Y² − γ(Y′ − Y)²/2 + βV(Y′) }    (2.139)

The first-order condition for the right side is

−γ(Y′ − Y) + βV′(Y′) = 0    (2.140)

and the envelope condition is

V′(Y) = a0 − a1 Y + γ(Y′ − Y)

Substituting this into equation (2.140) and rearranging leads to the Euler equation

β[a0 − a1 Yt+1 + γ(Yt+2 − Yt+1)] − γ(Yt+1 − Yt) = 0    (2.141)
The Key Insight Return to equation (2.136) and set yt = Yt for all t
(Recall that for this section we’ve set n = 1 to simplify the calculations)
A small amount of algebra will convince you that when yt = Yt , equations (2.141) and (2.136) are
identical
Thus, the Euler equation for the planning problem matches the second-order difference equation
that we derived by
1. finding the Euler equation of the representative firm and
2. substituting into it the expression Yt = nyt that “makes the representative firm be represen-
tative”
If it is appropriate to apply the same terminal conditions for these two difference equations, which
it is, then we have verified that a solution of the planning problem is also a rational expectations
equilibrium quantity sequence
It follows that for this example we can compute equilibrium quantities by forming the optimal
linear regulator problem corresponding to the Bellman equation (2.139)
The optimal policy function for the planning problem is the aggregate law of motion H that the
representative firm faces within a rational expectations equilibrium.
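As a sketch of this computation with QuantEcon's LQ class, take the state to be xt = (Yt, 1)′ and the control to be ut = Yt+1 − Yt; the matrices below encode (2.139) as a minimization problem, and the parameter values are purely illustrative:

import numpy as np
import quantecon as qe

a0, a1, beta, gamma = 100.0, 0.05, 0.95, 10.0   # illustrative parameters

# Minimize -(a0*Y - (a1/2)*Y**2 - (gamma/2)*u**2) with x = (Y, 1)'
R = np.array([[a1 / 2, -a0 / 2],
              [-a0 / 2, 0.0]])
Q = gamma / 2
A = np.identity(2)
B = np.array([[1.0], [0.0]])       # Y' = Y + u; the second state stays at 1

lq = qe.LQ(Q, R, A, B, beta=beta)
P, F, d = lq.stationary_values()   # optimal policy is u = -F x
kappa0, kappa1 = -F[0, 1], 1 - F[0, 0]   # Y' = kappa0 + kappa1 * Y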
Structure of the Law of Motion As you are asked to show in the exercises, the fact that the
planner’s problem is an LQ problem implies an optimal policy — and hence aggregate law of
motion — taking the form
Yt+1 = κ0 + κ1 Yt (2.142)
for some parameter pair κ0 , κ1
Now that we know the aggregate law of motion is linear, we can see from the firm’s Bellman
equation (2.132) that the firm’s problem can also be framed as an LQ problem
As you’re asked to show in the exercises, the LQ formulation of the firm’s problem implies a law
of motion that looks as follows
yt+1 = h0 + h1 yt + h2 Yt (2.143)
Hence a rational expectations equilibrium will be defined by the parameters (κ0 , κ1 , h0 , h1 , h2 ) in
(2.142)–(2.143)
Exercises
Exercise 1 Express the solution of the firm's problem in the form (2.143) and give the values for each hj

If there were n identical competitive firms all behaving according to (2.143), what would (2.143) imply for the actual law of motion (2.131) for market supply?
Exercise 2 Consider the following κ0 , κ1 pairs as candidates for the aggregate law of motion
component of a rational expectations equilibrium (see (2.142))
Extending the program that you wrote for exercise 1, determine which if any satisfy the definition
of a rational expectations equilibrium
• (94.0886298678, 0.923409232937)
• (93.2119845412, 0.984323478873)
• (95.0818452486, 0.952459076301)
Exercise 3 Describe an iterative algorithm that uses the program that you wrote for exercise 1 to compute a
rational expectations equilibrium
(You are not being asked actually to use the algorithm you are suggesting)
Exercise 4 A monopolist faces the industry demand curve (2.128) and chooses {Yt } to maximize
∑_{t=0}^∞ β^t rt where

rt = pt Yt − γ(Yt+1 − Yt)²/2
Formulate this problem as an LQ problem
Compute the optimal policy using the same parameters as the previous exercise
In particular, solve for the parameters in
Yt+1 = m0 + m1 Yt
Solutions
Solution notebook
Overview
Background
Example: A duopoly model Two firms are the only producers of a good the demand for which
is governed by a linear inverse demand function
p = a0 − a1 ( q1 + q2 ) (2.144)
Here p = pt is the price of the good, qi = qit is the output of firm i = 1, 2 at time t and a0 > 0, a1 > 0
In (2.144) and what follows,
• the time subscript is suppressed when possible to simplify notation
• x̂ denotes a next period value of variable x
Each firm recognizes that its output affects total output and therefore the market price
The one-period payoff function of firm i is price times quantity minus adjustment costs:

πi = p qi − γ(q̂i − qi)²,    γ > 0    (2.145)

Substituting the inverse demand curve (2.144) into (2.145) lets us express the one-period payoff as

πi(qi, q−i, q̂i) = a0 qi − a1 qi² − a1 qi q−i − γ(q̂i − qi)²    (2.146)
Firm i chooses a decision rule that sets next period quantity q̂i as a function f i of the current state
( qi , q −i )
An essential aspect of a Markov perfect equilibrium is that each firm takes the decision rule of the
other firm as known and given
Given f−i, the Bellman equation of firm i is

vi(qi, q−i) = max_{q̂i} { πi(qi, q−i, q̂i) + βvi(q̂i, f−i(q−i, qi)) }    (2.147)
Definition A Markov perfect equilibrium of the duopoly model is a pair of value functions (v1 , v2 )
and a pair of policy functions ( f 1 , f 2 ) such that, for each i ∈ {1, 2} and each possible state,
• The value function vi satisfies the Bellman equation (2.147)
• The maximizer on the right side of (2.147) is equal to f i (qi , q−i )
The adjective “Markov” denotes that the equilibrium decision rules depend only on the current
values of the state variables, not other parts of their histories
“Perfect” means complete, in the sense that the equilibrium is constructed by backward induction
and hence builds in optimizing behavior for each firm for all possible future states
This includes many states that will not be reached when we iterate forward on the pair of equilib-
rium strategies f i
Computation One strategy for computing a Markov perfect equilibrium is iterating to conver-
gence on pairs of Bellman equations and decision rules
In particular, let vi^j, fi^j be the value function and policy function for firm i at the j-th iteration

Imagine constructing the iterates

vi^{j+1}(qi, q−i) = max_{q̂i} { πi(qi, q−i, q̂i) + βvi^j(q̂i, f−i^j(q−i, qi)) }    (2.148)
As we saw in the duopoly example, the study of Markov perfect equilibria in games with two
players leads us to an interrelated pair of Bellman equations
In linear quadratic dynamic games, these “stacked Bellman equations” become “stacked Riccati
equations” with a tractable mathematical structure
We’ll lay out that structure in a general setup and then apply it to some simple problems
A Coupled Linear Regulator Problem We consider a general linear quadratic regulator game
with two players
For convenience, we’ll start with a finite horizon formulation, where t0 is the initial date and t1 is
the common terminal date
Player i takes the sequence of controls {u−it} chosen by the other player as given and minimizes

∑_{t=t0}^{t1−1} β^{t−t0} { xt′Ri xt + uit′Qi uit + u−it′Si u−it + 2xt′Wi uit + 2u−it′Mi uit }

while the state evolves according to

xt+1 = Axt + B1 u1t + B2 u2t

Here
• xt is an n × 1 state vector and uit is a k i × 1 vector of controls for player i
• Ri is n × n
• Si is k −i × k −i
• Qi is k i × k i
• Wi is n × k i
• Mi is k −i × k i
• A is n × n
• Bi is n × k i
Substituting the feedback rule u2t = −F2t xt of player 2 into player 1's criterion reduces player 1's problem to minimizing

∑_{t=t0}^{t1−1} β^{t−t0} { xt′Π1t xt + u1t′Q1 u1t + 2u1t′Γ1t xt }    (2.151)

subject to

xt+1 = Λ1t xt + B1 u1t,    (2.152)

where

• Λit := A − B−i F−it
• Πit := Ri + F−it′ Si F−it
• Γit := Wi′ − Mi′ F−it
This is an LQ dynamic programming problem that can be solved by working backwards
The policy rule that solves this problem is

F1t = (Q1 + βB1′ P1t+1 B1)⁻¹ (βB1′ P1t+1 Λ1t + Γ1t)    (2.153)

where P1t solves the matrix Riccati difference equation

P1t = Π1t − (βB1′ P1t+1 Λ1t + Γ1t)′ (Q1 + βB1′ P1t+1 B1)⁻¹ (βB1′ P1t+1 Λ1t + Γ1t) + βΛ1t′ P1t+1 Λ1t    (2.154)

Similarly, player 2's policy rule is

F2t = (Q2 + βB2′ P2t+1 B2)⁻¹ (βB2′ P2t+1 Λ2t + Γ2t)    (2.155)

where P2t solves

P2t = Π2t − (βB2′ P2t+1 Λ2t + Γ2t)′ (Q2 + βB2′ P2t+1 B2)⁻¹ (βB2′ P2t+1 Λ2t + Γ2t) + βΛ2t′ P2t+1 Λ2t    (2.156)
Infinite horizon We often want to compute the solutions of such games for infinite horizons, in
the hope that the decision rules Fit settle down to be time invariant as t1 → +∞
In practice, we usually fix t1 and compute the equilibrium of an infinite horizon game by driving
t0 → − ∞
This is the approach we adopt in the next section
Implementation Below we display a function called nnash that computes a Markov perfect equi-
librium of the infinite horizon linear quadratic dynamic game in the manner described above
from __future__ import division, print_function
import numpy as np
from numpy import dot, eye
from scipy.linalg import solve
def nnash(A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
beta=1.0, tol=1e-8, max_iter=1000):
r"""
Compute the limit of a Nash linear quadratic dynamic game. In this
problem, player i minimizes
.. math::
\sum_{t=0}^{\infty}
\left\{
x_t' r_i x_t + 2 x_t' w_i
u_{it} +u_{it}' q_i u_{it} + u_{jt}' s_i u_{jt} + 2 u_{jt}'
m_i u_{it}
\right\}
.. math::
x_{t+1} = A x_t + b_1 u_{1t} + b_2 u_{2t}
and a perceived control law :math:`u_j(t) = - f_j x_t` for the other
player.
Parameters
----------
A : scalar(float) or array_like(float)
Corresponds to the above equation, should be of size (n, n)
B1 : scalar(float) or array_like(float)
As above, size (n, k_1)
B2 : scalar(float) or array_like(float)
As above, size (n, k_2)
R1 : scalar(float) or array_like(float)
As above, size (n, n)
R2 : scalar(float) or array_like(float)
As above, size (n, n)
Q1 : scalar(float) or array_like(float)
As above, size (k_1, k_1)
Q2 : scalar(float) or array_like(float)
As above, size (k_2, k_2)
S1 : scalar(float) or array_like(float)
As above, size (k_1, k_1)
S2 : scalar(float) or array_like(float)
As above, size (k_2, k_2)
W1 : scalar(float) or array_like(float)
As above, size (n, k_1)
W2 : scalar(float) or array_like(float)
As above, size (n, k_2)
M1 : scalar(float) or array_like(float)
As above, size (k_2, k_1)
M2 : scalar(float) or array_like(float)
As above, size (k_1, k_2)
beta : scalar(float), optional(default=1.0)
Discount rate
tol : scalar(float), optional(default=1e-8)
This is the tolerance level for convergence
max_iter : scalar(int), optional(default=1000)
Maximum number of iterations allowed
Returns
-------
F1 : array_like, dtype=float, shape=(k_1, n)
Feedback law for agent 1
F2 : array_like, dtype=float, shape=(k_2, n)
Feedback law for agent 2
P1 : array_like, dtype=float, shape=(n, n)
The steady-state solution to the associated discrete matrix
Riccati equation for agent 1
P2 : array_like, dtype=float, shape=(n, n)
The steady-state solution to the associated discrete matrix
Riccati equation for agent 2
"""
    # == Unload parameters and make sure everything is an array == #
    params = A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2
    params = map(np.asarray, params)
    A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2 = params

    # == Multiply A, B1, B2 by sqrt(beta) to enforce discounting == #
    A, B1, B2 = [np.sqrt(beta) * x for x in (A, B1, B2)]

    n = A.shape[0]

    if B1.ndim == 1:
        k_1 = 1
        B1 = np.reshape(B1, (n, 1))
    else:
        k_1 = B1.shape[1]

    if B2.ndim == 1:
        k_2 = 1
        B2 = np.reshape(B2, (n, 1))
    else:
        k_2 = B2.shape[1]

    v1 = eye(k_1)
    v2 = eye(k_2)
    P1 = np.zeros((n, n))
    P2 = np.zeros((n, n))
    F1 = np.random.randn(k_1, n)
    F2 = np.random.randn(k_2, n)

    for it in range(max_iter):
        # update
        F10 = F1
        F20 = F2

        G2 = solve(dot(B2.T, P2.dot(B2)) + Q2, v2)
        G1 = solve(dot(B1.T, P1.dot(B1)) + Q1, v1)
        H2 = dot(G2, B2.T.dot(P2))
        H1 = dot(G1, B1.T.dot(P1))

        # == Break up the computation of F1, F2 == #
        F1_left = v1 - dot(H1.dot(B2) + G1.dot(M1.T),
                           H2.dot(B1) + G2.dot(M2.T))
        F1_right = H1.dot(A) + G1.dot(W1.T) - \
            dot(H1.dot(B2) + G1.dot(M1.T), H2.dot(A) + G2.dot(W2.T))
        F1 = solve(F1_left, F1_right)
        F2 = H2.dot(A) + G2.dot(W2.T) - dot(H2.dot(B1) + G2.dot(M2.T), F1)

        Lambda1 = A - B2.dot(F2)
        Lambda2 = A - B1.dot(F1)
        Pi1 = R1 + dot(F2.T, S1.dot(F2))
        Pi2 = R2 + dot(F1.T, S2.dot(F1))

        # == Update the Riccati matrices == #
        P1 = dot(Lambda1.T, P1.dot(Lambda1)) + Pi1 - \
            dot(dot(Lambda1.T, P1.dot(B1)) + W1 - F2.T.dot(M1), F1)
        P2 = dot(Lambda2.T, P2.dot(Lambda2)) + Pi2 - \
            dot(dot(Lambda2.T, P2.dot(B2)) + W2 - F1.T.dot(M2), F2)

        dd = np.max(np.abs(F10 - F1)) + np.max(np.abs(F20 - F2))
        if dd < tol:  # == Success! == #
            break
    else:
        msg = 'No convergence: Iteration limit of {0} reached in nnash'
        raise ValueError(msg.format(max_iter))

    return F1, F2, P1, P2
Applications
Let’s use these procedures to treat some applications, starting with the duopoly model
The duopoly case To map the duopoly model into a coupled linear-quadratic dynamic program-
ming problem, define the state and controls as
xt := [1, q1t, q2t]′    and    uit := qi,t+1 − qit,    i = 1, 2

If we write

xt′ Ri xt + uit′ Qi uit

where Q1 = Q2 = γ,

R1 := [[ 0, −a0/2, 0 ],
       [ −a0/2, a1, a1/2 ],
       [ 0, a1/2, 0 ]]

and

R2 := [[ 0, 0, −a0/2 ],
       [ 0, 0, a1/2 ],
       [ −a0/2, a1/2, a1 ]]

then the one-period payoff (2.146) of firm i equals −(xt′ Ri xt + uit′ Qi uit)
The optimal decision rule of firm i will take the form uit = − Fi xt , inducing the following closed
loop system for the evolution of x in the Markov perfect equilibrium:
xt+1 = (A − B1 F1 − B2 F2) xt    (2.157)
Parameters and Solution Consider the previously presented duopoly model with parameter
values of:
• a0 = 10
• a1 = 2
• β = 0.96
• γ = 12
From these we compute the infinite horizon MPE using the preceding code
"""
@authors: Chase Coleman, Thomas Sargent, John Stachurski
# == Parameters == #
a0 = 10.0
a1 = 2.0
beta = 0.96
gamma = 12.0
# == In LQ form == #
A = np.eye(3)
Q1 = Q2 = gamma
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
# == Display policies == #
print("Computed policies for firm 1 and firm 2:\n")
print("F1 = {}".format(F1))
print("F2 = {}".format(F2))
print("\n")
One way to see that F1 is indeed optimal for firm 1 taking F2 as given is to use QuantEcon's LQ class
In particular, let’s take F2 as computed above, plug it into (2.151) and (2.152) to get firm 1’s problem
and solve it using LQ
We hope that the resulting policy will agree with F1 as computed above
In [2]: Lambda1 = A - np.dot(B2, F2)
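The omitted steps can be filled in along these lines (a sketch, assuming import quantecon as qe; with S1 = W1 = M1 = 0 the objective matrices for firm 1 reduce to R1 and Q1, so the LQ class applies directly):

In [3]: lq1 = qe.LQ(Q1, R1, Lambda1, B1, beta=beta)

In [4]: P1_ih, F1_ih, d = lq1.stationary_values()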
In [5]: F1_ih
Out[5]: array([[-0.66846611, 0.29512481, 0.07584666]])
This is close enough for rock and roll, as they say in the trade
Indeed, np.allclose agrees with our assessment
In [6]: np.allclose(F1, F1_ih)
Out[6]: True
Dynamics Let’s now investigate the dynamics of price and output in this simple duopoly model
under the MPE policies
Given our optimal policies F1 and F2, the state evolves according to (2.157)
The following program
• imports F1 and F2 from the previous program along with all parameters
• computes the evolution of xt using (2.157)
• extracts and plots industry output qt = q1t + q2t and price pt = a0 − a1 qt
import numpy as np
import matplotlib.pyplot as plt
from duopoly_mpe import *

AF = A - B1.dot(F1) - B2.dot(F2)
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
    x[:, t+1] = np.dot(AF, x[:, t])
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2       # Total output, MPE
p = a0 - a1 * q   # Price, MPE

# == Plot output and price (minimal plotting sketch) == #
fig, ax = plt.subplots(figsize=(9, 5.8))
ax.plot(q, 'b-', lw=2, alpha=0.75, label='total output')
ax.plot(p, 'g-', lw=2, alpha=0.75, label='price')
ax.set_title('Output and prices, duopoly MPE')
ax.legend(frameon=False)
plt.show()
Note that the initial condition has been set to q10 = q20 = 1.0
The resulting figure looks as follows
To gain some perspective we can compare this to what happens in the monopoly case
The first panel in the next figure compares output of the monopolist and industry output under
the MPE, as a function of time
The second panel shows analogous curves for price
Here parameters are the same as above for both the MPE and monopoly solutions
The monopolist initial condition is q0 = 2.0 to mimic the industry initial condition q10 = q20 = 1.0
in the MPE case
As expected, output is higher and prices are lower under duopoly than monopoly
Exercises
Exercise 1 Replicate the pair of figures showing the comparison of output and prices for the mo-
nopolist and duopoly under MPE
Parameters are as in duopoly_mpe.py and you can use that code to compute MPE policies under
duopoly
The optimal policy in the monopolist case can be computed using QuantEcon’s LQ class
delta = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])
Solutions
Solution notebook
Contents
• Markov Asset Pricing
– Overview
– Pricing Models
– Finite Markov Asset Pricing
– Implementation
– Exercises
– Solutions
“A little knowledge of geometric series goes a long way” – Robert E. Lucas, Jr.
Overview
Pricing Models
We begin with some notation and then proceed to foundational pricing models
In what follows let d0 , d1 , . . . be a stream of dividends
• A time-t cum-dividend asset is a claim to the stream dt , dt+1 , . . .
• A time-t ex-dividend asset is a claim to the stream dt+1 , dt+2 , . . .
Under risk-neutral pricing with discount factor β, the price of a cum-dividend asset satisfies

pt = dt + βE t [pt+1]    (2.158)

while the price of an ex-dividend asset satisfies

pt = βE t [dt+1 + pt+1]    (2.159)
Pricing Under Risk Aversion Let’s now introduce risk aversion by supposing that all agents
evaluate payoffs according to strictly concave period utility function u
In this setting Robert Lucas [Luc78] showed that under certain equilibrium conditions the price of
an ex-dividend asset obeys the famous consumption-based asset pricing equation
pt = E t [ β (u′(dt+1)/u′(dt)) (dt+1 + pt+1) ]    (2.160)
Comparing (2.159) and (2.160), the difference is that β in (2.159) has been replaced by β u′(dt+1)/u′(dt)

A common and convenient specification of the period utility function is the CRRA family

u(c) = c^{1−γ}/(1 − γ) with γ > 0,    or    u(c) = ln c
Example 1: Constant dividends, risk neutral pricing The simplest case is a constant, non-
random dividend stream dt = d > 0
Removing the expectation from (2.158) and iterating forward gives

pt = d + βpt+1 = d + β(d + βpt+2) = · · · = d + βd + β²d + · · · + β^{k−1}d + β^k pt+k

Provided that lim_{k→∞} β^k pt+k = 0, this sequence of equations implies

pt = d/(1 − β)
Example 2: Deterministic growth, risk neutral pricing Suppose instead that dividends grow at a constant rate, with dt+1 = λdt and 0 < βλ < 1. Reasoning as above, the cum-dividend price becomes

pt = dt/(1 − βλ) = λ^t d0/(1 − βλ)    (2.161)

(Hint: Set vt = pt/dt in (2.158) and then vt = vt+1 = v to solve for constant v)

The ex-dividend price is pt = (1 − βλ)⁻¹ βλdt
If, in this example, we take λ = 1 + g and define ρ by β = 1/(1 + ρ), then the ex-dividend price becomes

pt = ((1 + g)/(ρ − g)) dt

This is a version of the well-known Gordon growth formula
Example 3: Markov growth, risk neutral pricing Next we consider a dividend process where
the growth rate is Markovian
In particular,
dt+1 = λt+1 dt    where    P{λt+1 = sj | λt = si} = Pij =: P[i, j]
This notation means that {λt } is an n state Markov chain with transition matrix P and state space
s = { s1 , . . . , s n }
To obtain asset prices under risk neutrality, recall that in (2.161) the price dividend ratio pt /dt is
constant and depends on λ
This encourages us to guess that, in the current case, pt /dt is constant given λt
That is pt = v(λt )dt for some unknown function v on the state space
To simplify notation, let vi := v(si )
For a cum-dividend stock we find that vi = 1 + β ∑_{j=1}^n Pij sj vj

Letting 1 be an n × 1 vector of ones and P̃ij = Pij sj, we can express this in matrix notation as

v = (I − βP̃)⁻¹ 1
Here we are assuming invertibility, which requires that the growth rate of the Markov chain is not
too large relative to β
(In particular, that the eigenvalues of P̃ be strictly less than β−1 in modulus)
Similar reasoning yields the ex-dividend price-dividend ratio w, which satisfies

w = β(I − βP̃)⁻¹ P̃1 = β(I − βP̃)⁻¹ Ps
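Both ratios are easy to compute numerically; here is a minimal sketch using the Markov chain primitives that reappear in the exercises below:

import numpy as np

n = 5
P = 0.0125 * np.ones((n, n)) + np.diag(0.95 - 0.0125 * np.ones(n))
s = np.array([1.05, 1.025, 1.0, 0.975, 0.95])   # state values
beta = 0.94

P_tilde = P * s                    # P_tilde[i, j] = P[i, j] * s[j] via broadcasting
I = np.identity(n)
v = np.linalg.solve(I - beta * P_tilde, np.ones(n))  # cum-dividend ratio
w = beta * np.linalg.solve(I - beta * P_tilde, P_tilde.dot(np.ones(n)))  # ex-dividend ratio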
Example 4: Deterministic dividends, risk averse pricing Our formula for pricing a cum-dividend claim to a non-random stream dt = λ^t d then becomes

pt = dt + βλ^{−γ} pt+1

Guessing again that the price obeys pt = vdt where v is a constant price-dividend ratio, we have vdt = dt + βλ^{−γ} v dt+1, or

v = 1/(1 − βλ^{1−γ})
If u′(c) = 1/c (that is, γ = 1), then the preceding formula for the price-dividend ratio becomes v = 1/(1 − β)
Here the price-dividend ratio is constant and independent of the dividend growth rate λ
For the remainder of the lecture we focus on computing asset prices when
• endowments follow a finite state Markov chain
• agents are risk averse, and prices obey (2.160)
Our finite state Markov setting emulates [MP85]
In particular, we’ll assume that there is an endowment of a consumption good that follows
c t +1 = λ t +1 c t (2.162)
Pricing the Lucas tree Using (2.160), the definition of u and (2.162) leads to

pt = E t [ βλt+1^{−γ} (ct+1 + pt+1) ]    (2.163)

Drawing intuition from our earlier discussion on pricing with Markov growth, we guess a pricing function of the form pt = v(λt)ct where v is yet to be determined

Substituting this guess into (2.163) and dividing through by ct gives

vi = β ∑_{j=1}^n Pij sj^{1−γ} (1 + vj)    (2.164)

Letting P̃ij = Pij sj^{1−γ}, we can write (2.164) as v = βP̃1 + βP̃v

Assuming again that the eigenvalues of P̃ are strictly less than β⁻¹ in modulus, we can solve this to yield

v = β(I − βP̃)⁻¹ P̃1    (2.165)
With log preferences, γ = 1 and hence s^{1−γ} = 1, from which we obtain

v = (β/(1 − β)) 1
Thus, with log preferences, the price-dividend ratio for a Lucas tree is constant
A Risk-Free Consol Consider the same pure exchange representative agent economy
A risk-free consol promises to pay a constant amount ζ > 0 each period
Recycling notation, let pt now be the price of an ex-coupon claim to the consol
An ex-coupon claim to the consol entitles the owner at the end of period t to
• ζ in period t + 1, plus
• the right to sell the claim for pt+1 next period
The price satisfies

u′(ct) pt = βE t [ u′(ct+1)(ζ + pt+1) ]

It follows that

pt = βE t [ λt+1^{−γ} (ζ + pt+1) ]    (2.166)

Guessing a solution of the form pt = p(λt) = pi when λt = si and letting P̌ij = Pij sj^{−γ}, the vector of prices p̄ satisfies p̄ = βP̌(ζ1 + p̄), so that

p̄ = β(I − βP̌)⁻¹ P̌ ζ1    (2.167)
Pricing an Option to Purchase the Consol Let’s now price options of varying maturity that give
the right to purchase a consol at a price pS
An infinite horizon call option We want to price an infinite horizon option to purchase a consol
at a price pS
The option entitles the owner at the beginning of a period either to
1. purchase the bond at price pS now, or
2. to hold the option until next period
Thus, the owner either exercises the option now, or chooses not to exercise and wait until next period
This is termed an infinite-horizon call option with strike price pS
The owner of the option is entitled to purchase the consol at the price pS at the beginning of any
period, after the coupon has been paid to the previous owner of the bond
The economy is identical with the one above
Let w(λt , pS ) be the value of the option when the time t growth state is known to be λt but before
the owner has decided whether or not to exercise the option at time t (i.e., today)
Recalling that p(λt ) is the value of the consol when the initial growth state is λt , the value of the
option satisfies
w(λt, pS) = max { βE t [ (u′(ct+1)/u′(ct)) w(λt+1, pS) ],  p(λt) − pS }
The first term on the right is the value of waiting, while the second is the value of exercising
We can also write this as
w(si, pS) = max { β ∑_{j=1}^n Pij sj^{−γ} w(sj, pS),  p(si) − pS }    (2.168)
Letting P̂ij = Pij sj^{−γ} and wi = w(si, pS), we can express (2.168) as the nonlinear vector equation

w = max{βP̂w, p − pS 1}    (2.169)

To solve (2.169), form the operator T mapping vector w into vector Tw via
Tw = max{ β P̂w, p − pS 1}
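Iterating T to convergence can be sketched as follows (illustrative names; the call_option method displayed below implements the same idea):

import numpy as np

def solve_option_price(P_hat, p, p_s, beta, tol=1e-8, max_iter=10000):
    """Compute the fixed point of Tw = max(beta * P_hat w, p - p_s) from w = 0."""
    w = np.zeros_like(p)
    for _ in range(max_iter):
        w_new = np.maximum(beta * P_hat.dot(w), p - p_s)
        if np.max(np.abs(w_new - w)) < tol:
            break
        w = w_new
    return w_new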
Finite-horizon options Finite horizon options obey functional equations closely related to
(2.168)
A k period option expires after k periods
At time t, a k period option gives the owner the right to exercise the option to purchase the risk-free
consol at the strike price pS at t, t + 1, . . . , t + k − 1
The option expires at time t + k
Thus, for k = 1, 2, . . ., let w(si , k ) be the value of a k-period option
It obeys

w(si, k) = max { β ∑_{j=1}^n Pij sj^{−γ} w(sj, k − 1),  p(si) − pS }
The one-period risk-free interest rate For this economy, the stochastic discount factor is

mt+1 = β (ct+1/ct)^{−γ} = β λt+1^{−γ}

Taking conditional expectations gives

E t mt+1 = β ∑_{j=1}^n Pij sj^{−γ}

or, in vector terms,

m1 = βPs^{−γ}
where the i-th element of m1 is the reciprocal of the one-period gross risk-free interest rate when
λ t = si
j-period risk-free interest rates Let mj be an n × 1 vector whose i-th component is the reciprocal of the j-period gross risk-free interest rate when λt = si

Again, let P̂ij = Pij sj^{−γ}. Iterating the conditioning argument above then gives mj = β^j P̂^j 1
Implementation
The class AssetPrices from the QuantEcon.applications package provides methods for computing
some of the prices described above
We print the code here for convenience
"""
Filename: asset_pricing.py
References
----------
http://quant-econ.net/py/markov_asset.html
"""
from textwrap import dedent
import numpy as np
from numpy.linalg import solve
class AssetPrices(object):
r"""
A class to compute asset prices when the endowment follows a finite
Markov chain.
Parameters
----------
beta : scalar, float
Discount factor
P : array_like(float)
Transition matrix
s : array_like(float)
Growth rate of consumption
gamma : scalar(float)
Coefficient of risk aversion
Attributes
----------
beta, P, s, gamma : see Parameters
n : scalar(int)
The number of rows in P
Examples
--------
>>> n = 5
>>> P = 0.0125 * np.ones((n, n))
>>> P += np.diag(0.95 - 0.0125 * np.ones(5))
>>> s = np.array([1.05, 1.025, 1.0, 0.975, 0.95])
>>> gamma = 2.0
>>> beta = 0.94
>>> ap = AssetPrices(beta, P, s, gamma)
>>> zeta = 1.0
>>> v = ap.tree_price()
>>> print("Lucas Tree Prices: %s" % v)
Lucas Tree Prices: [ 12.72221763 14.72515002 17.57142236
21.93570661 29.47401578]
"""
def __init__(self, beta, P, s, gamma):
self.beta, self.gamma = beta, gamma
self.P, self.s = P, s
self.n = self.P.shape[0]
def __repr__(self):
m = "AssetPrices(beta={b:g}, P='{n:g} by {n:g}', s={s}, gamma={g:g})"
return m.format(b=self.beta, n=self.P.shape[0], s=self.s, g=self.gamma)
    def __str__(self):
        m = """\
        AssetPrices (Mehra and Prescott, 1985):
          - beta (discount factor)                : {b:g}
          - P (transition matrix)                 : {n:g} by {n:g}
          - s (growth rate of consumption)        : {s}
          - gamma (coefficient of risk aversion)  : {g:g}
        """
        return dedent(m.format(b=self.beta, n=self.P.shape[0],
                               s=self.s, g=self.gamma))
@property
def P_tilde(self):
P, s, gamma = self.P, self.s, self.gamma
return P * s**(1.0-gamma) # using broadcasting
@property
def P_check(self):
P, s, gamma = self.P, self.s, self.gamma
return P * s**(-gamma) # using broadcasting
def tree_price(self):
"""
Computes the function v such that the price of the lucas tree is
v(lambda)C_t
Returns
-------
v : array_like(float)
Lucas tree prices
"""
# == Simplify names == #
beta = self.beta
# == Compute v == #
P_tilde = self.P_tilde
I = np.identity(self.n)
O = np.ones(self.n)
v = beta * solve(I - beta * P_tilde, P_tilde.dot(O))
return v
    def consol_price(self, zeta):
        """
        Computes price of a consol bond with payoff zeta

        Parameters
        ----------
        zeta : scalar(float)
            Coupon of the consol

        Returns
        -------
        p_bar : array_like(float)
            Consol bond prices

        """
        # == Simplify names == #
        beta = self.beta

        # == Compute price == #
        P_check = self.P_check
        I = np.identity(self.n)
        O = np.ones(self.n)
        p_bar = beta * solve(I - beta * P_check, P_check.dot(zeta * O))

        return p_bar
    def call_option(self, zeta, p_s, T=[], epsilon=1e-8):
        """
        Computes price of a call option on a consol bond

        Parameters
        ----------
        zeta : scalar(float)
            Coupon of the consol
        p_s : scalar(float)
            Strike price
        T : iterable(integers)
            Length of option in the finite horizon case
        epsilon : scalar(float), optional(default=1e-8)
            Tolerance for infinite horizon problem

        Returns
        -------
        w_bar : array_like(float)
            Infinite horizon call option prices
        w_bars : dict
            A dictionary of key-value pairs {t: vec}, where t is one of
            the dates in the list T and vec is the option prices at that
            date
        """
        # == Simplify names, initialize variables == #
        beta = self.beta
        P_check = self.P_check
        # == Compute consol price == #
        v_bar = self.consol_price(zeta)
        # == Compute option price by iterating to convergence == #
        w_bar = np.zeros(self.n)
        error = epsilon + 1
        t = 0
        w_bars = {}
        while error > epsilon:
            if t in T:
                w_bars[t] = w_bar
            # == Maximize across columns == #
            w_bar_new = np.maximum(beta * P_check.dot(w_bar), v_bar - p_s)
            # == Find maximal difference of each component == #
            error = np.amax(np.abs(w_bar - w_bar_new))
            # == Update == #
            w_bar = w_bar_new
            t += 1
        return w_bar, w_bars
Exercises
Exercise 1 Compute the price of the Lucas tree in an economy with the following primitives
n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(5))
s = np.array([1.05, 1.025, 1.0, 0.975, 0.95]) # state values
gamma = 2.0
beta = 0.94
zeta = 1.0
Using the same set of primitives, compute the price of the risk-free consol when ζ = 1

Do the same for the call option on the consol when pS = 150.0
Compute the value of the option at dates T = [10,20,30]
Solutions
Solution notebook
Contents
• The Permanent Income Model
– Overview
– The Savings Problem
– Alternative Representations
– Two Classic Examples
– Further Reading
– Appendix: The Euler Equation
Overview
This lecture describes a rational expectations version of the famous permanent income model of
Friedman [Fri56]
Hall cast Friedman’s model within a linear-quadratic setting [Hal78]
Like Hall, we formulate an infinite-horizon linear-quadratic savings problem
We use the model as a vehicle for illustrating
• alternative formulations of the state of a dynamic system
• the idea of cointegration
• impulse response functions
• the idea that changes in consumption are useful as predictors of movements in income
In this section we state and solve the savings and consumption problem faced by the consumer
(For example, the random walk Xt+1 = Xt + wt+1, with {wt} iid and zero mean, is a martingale)

Not every martingale arises as a random walk (see, for example, Wald's martingale)
The Decision Problem A consumer has preferences over consumption streams that are ordered
by the utility functional
E 0 [ ∑_{t=0}^∞ β^t u(ct) ]    (2.170)
where
• E t is the mathematical expectation conditioned on the consumer’s time t information
• ct is time t consumption
• u is a strictly concave one-period utility function
• β ∈ (0, 1) is a discount factor
The consumer maximizes (2.170) by choosing a consumption, borrowing plan {ct, bt+1}_{t=0}^∞ subject to the sequence of budget constraints

ct + bt = (1/(1 + r)) bt+1 + yt,    t ≥ 0    (2.171)
Here
• yt is an exogenous endowment process
• r > 0 is the risk-free interest rate
• bt is one-period risk-free debt maturing at t
• b0 is a given initial condition
Assumptions For the remainder of this lecture, we follow Friedman and Hall in assuming that
(1 + r ) −1 = β
Regarding the endowment process, we assume it has the state-space representation

xt+1 = Axt + Cwt+1,    yt = Uxt    (2.172)

where
• {wt } is an iid vector process with E wt = 0 and E wt wt0 = I
• the spectral radius of A satisfies ρ( A) < 1/β
• U is a selection vector that pins down yt as a particular linear combination of the elements
of xt .
The restriction on ρ( A) prevents income from growing so fast that some discounted geometric
sums of some infinite sequences below become infinite
We also impose the no Ponzi scheme condition
E 0 [ ∑_{t=0}^∞ β^t bt² ] < ∞    (2.174)
This condition rules out an always-borrow scheme that would allow the household to enjoy un-
bounded or bliss consumption forever
Regarding preferences, we assume the quadratic utility function

u(c) = −(c − γ)²

where γ is a bliss level of consumption

First Order Conditions First-order conditions for maximizing (2.170) subject to (2.171) are

E t [u′(ct+1)] = u′(ct),    t = 0, 1, . . .    (2.175)

These equations are also known as the Euler equations for the model
If you’re not sure where they come from, you can find a proof sketch in the appendix
With our quadratic preference specification, (2.175) has the striking implication that consumption
follows a martingale:
E t [ c t +1 ] = c t (2.176)
(In fact quadratic preferences are necessary for this conclusion 8 )
One way to interpret (2.176) is that consumption will only change when “new information” about
permanent income is revealed
These ideas will be clarified below
The Optimal Decision Rule The state vector confronting the household at t is (bt, xt)
Here
• xt is an exogenous component, unaffected by household behavior
• bt is an endogenous component (since it depends on the decision rule)
Note that xt contains all variables useful for forecasting the household’s future endowment
Now let’s deduce the optimal decision rule 9
Note: One way to solve the consumer’s problem is to apply dynamic programming as in this lecture.
We do this later. But first we use an alternative approach that is revealing and shows the work
that dynamic programming does for us automatically
8 A linear marginal utility is essential for deriving (2.176) from (2.175). Suppose instead that we had imposed
the following more standard assumptions on the utility function: u′(c) > 0, u″(c) < 0, u‴(c) > 0 and required
that c ≥ 0. The Euler equation remains (2.175). But the fact that u‴ > 0 implies via Jensen's inequality that
E t [u′(ct+1)] > u′(E t [ct+1]). This inequality together with (2.175) implies that E t [ct+1] > ct (consumption is said
to be a 'submartingale'), so that consumption stochastically diverges to +∞. The consumer's savings also diverge to
+∞.
9 An optimal decision rule is a map from current state into current actions—in this case, consumption
We want to solve the system of difference equations formed by (2.171) and (2.176) subject to the
boundary condition (2.174)
To accomplish this, observe first that (2.174) implies limt→∞ βt bt+1 = 0
Using this restriction on the debt path and solving (2.171) forward yields
bt = ∑_{j=0}^∞ β^j (yt+j − ct+j)    (2.177)
Take conditional expectations on both sides of (2.177) and use the law of iterated expectations to deduce

bt = ∑_{j=0}^∞ β^j E t [yt+j] − ct/(1 − β)    (2.178)

Solving (2.178) for ct gives

ct = (1 − β) [ ∑_{j=0}^∞ β^j E t [yt+j] − bt ]    (2.179)
If we define the net rate of interest r by β = 1/(1 + r), we can also express this equation as

ct = (r/(1 + r)) [ ∑_{j=0}^∞ β^j E t [yt+j] − bt ]
These last two equations assert that consumption equals economic income

• financial wealth equals −bt
• non-financial wealth equals ∑_{j=0}^∞ β^j E t [yt+j]
• a marginal propensity to consume out of total wealth equals the interest factor r/(1 + r)
A State-Space Representation The preceding results provide a decision rule and hence the dy-
namics of both state and control variables
First note that equation (2.179) represents ct as a function of the state (bt, xt) confronting the household
If the last statement isn’t clear, recall that E t [yt+ j ] can be expressed as a function of xt , since the
latter contains all information useful for forecasting the household’s endowment process
In fact, from this discussion we see that
∑_{j=0}^∞ β^j E t [yt+j] = E t [ ∑_{j=0}^∞ β^j yt+j ] = U(I − βA)⁻¹ xt
Using this expression, we can obtain a linear state-space system governing consumption, debt and
income:
xt+1 = Axt + Cwt+1    (2.180)
bt+1 = bt + U[(I − βA)⁻¹(A − I)] xt    (2.181)
yt = Uxt    (2.182)
ct = (1 − β)[U(I − βA)⁻¹ xt − bt]    (2.183)
Define

zt = [ xt ; bt ],    Ã = [[ A, 0 ], [ U(I − βA)⁻¹(A − I), 1 ]],    C̃ = [ C ; 0 ]

and

Ũ = [[ U, 0 ], [ (1 − β)U(I − βA)⁻¹, −(1 − β) ]],    ỹt = [ yt ; ct ]
Then we can express equation (2.180) as
zt+1 = Ãzt + C̃wt+1 (2.184)
ỹt = Ũzt (2.185)
We can use the following formulas from our lectures on linear state space models to compute the population mean µt = E zt and covariance Σt := E [(zt − µt)(zt − µt)′]

µt+1 = Ãµt    with µ0 given    (2.186)

Σt+1 = ÃΣtÃ′ + C̃C̃′    with Σ0 given    (2.187)
A Simple Example with iid Income To gain some preliminary intuition on the implications of
(2.180), let’s look at a highly stylized example where income is just iid
(Later examples will investigate more realistic income streams)
In particular, let {wt}_{t=1}^∞ be iid and scalar standard normal, and let

xt = [ x¹t ; 1 ],    A = [[ 0, 0 ], [ 0, 1 ]],    U = [ 1, µ ],    C = [ σ ; 0 ]

Finally, let b0 = x¹0 = 0
Under these assumptions we have yt = µ + σwt ∼ N (µ, σ2 )
Further, if you work through the state space representation, you will see that

bt = −σ ∑_{j=1}^{t−1} wj

ct = µ + (1 − β)σ ∑_{j=1}^{t} wj
Thus income is iid and debt and consumption are both Gaussian random walks
Defining assets as −bt , we see that assets are just the cumulative sum of unanticipated income
prior to the present date
The next figure shows a typical realization with r = 0.05, µ = 1 and σ = 0.15
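A realization of this kind can be generated directly from the closed-form expressions above (a sketch; the plotting details are illustrative):

import numpy as np
import matplotlib.pyplot as plt

r, mu, sigma, T = 0.05, 1.0, 0.15, 60
beta = 1 / (1 + r)

w = np.random.randn(T + 1)
w[0] = 0.0                                  # shocks are w_1, ..., w_T
y = mu + sigma * w                          # income: y_t = mu + sigma * w_t
c = mu + (1 - beta) * sigma * np.cumsum(w)  # consumption random walk
b = np.zeros(T + 1)
b[1:] = -sigma * np.cumsum(w)[:-1]          # debt: b_t = -sigma * sum_{j<t} w_j

fig, ax = plt.subplots()
ax.plot(y, label='income')
ax.plot(c, label='consumption')
ax.plot(-b, label='assets (-b)')            # assets = cumulative unanticipated income
ax.legend()
plt.show()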
Alternative Representations
In this section we shed more light on the evolution of savings, debt and consumption by repre-
senting their dynamics in several different ways
Hall’s Representation Hall [Hal78] suggests a sharp way to summarize the implications of LQ
permanent income theory
First, to represent the solution for bt , shift (2.179) forward one period and eliminate bt+1 by using
(2.171) to obtain
ct+1 = (1 − β) ∑_{j=0}^∞ β^j E t+1 [yt+j+1] − (1 − β)β⁻¹(ct + bt − yt)

If we add and subtract β⁻¹(1 − β) ∑_{j=0}^∞ β^j E t [yt+j] from the right side of the preceding equation and rearrange, we obtain

ct+1 − ct = (1 − β) ∑_{j=0}^∞ β^j { E t+1 [yt+j+1] − E t [yt+j+1] }    (2.189)
The right side is the time t + 1 innovation to the expected present value of the endowment process
{yt }
We can represent the optimal decision rule for ct , bt+1 in the form of (2.189) and (2.178), which is
repeated here:
bt = ∑_{j=0}^∞ β^j E t [yt+j] − ct/(1 − β)    (2.190)
Equation (2.190) asserts that the household’s debt due at t equals the expected present value of its
endowment minus the expected present value of its consumption stream
A high debt thus indicates a large expected present value of surpluses yt − ct
Recalling again our discussion on forecasting geometric sums, we have
E t ∑_{j=0}^∞ β^j yt+j = U(I − βA)⁻¹ xt

E t+1 ∑_{j=0}^∞ β^j yt+j+1 = U(I − βA)⁻¹ xt+1

E t ∑_{j=0}^∞ β^j yt+j+1 = U(I − βA)⁻¹ Axt
Using these formulas together with (2.172) and substituting into (2.189) and (2.190) gives the fol-
lowing representation for the consumer’s optimum decision rule:
ct+1 = ct + (1 − β)U(I − βA)⁻¹ Cwt+1    (2.191)
bt = U(I − βA)⁻¹ xt − ct/(1 − β)    (2.192)
yt = Uxt    (2.193)
xt+1 = Axt + Cwt+1    (2.194)
Cointegration Representation (2.191) reveals that the joint process {ct , bt } possesses the property
that Engle and Granger [EG87] called cointegration
Cointegration is a tool that allows us to apply powerful results from the theory of stationary pro-
cesses to (certain transformations of) nonstationary models
To clarify cointegration in the present context, suppose that xt is asymptotically stationary 10
Despite this, both ct and bt will be non-stationary because they have unit roots (see (2.180) for bt )
Nevertheless, there is a linear combination of ct , bt that is asymptotically stationary
In particular, from the second equality in (2.191) we have

(1 − β)bt + ct = (1 − β) E t ∑_{j=0}^∞ β^j yt+j    (2.196)

Equation (2.196) asserts that the cointegrating residual on the left side equals the conditional expectation of the geometric sum of future incomes on the right 11
Cross-Sectional Implications Consider again (2.191), this time in light of our discussion of dis-
tribution dynamics in the lecture on linear systems
The dynamics of ct are given by

ct+1 = ct + (1 − β)U(I − βA)⁻¹ Cwt+1

or

ct = c0 + ∑_{j=1}^t ŵj    for    ŵt+1 := (1 − β)U(I − βA)⁻¹ Cwt+1
10 This would be the case if, for example, the spectral radius of A is strictly less than one
11 See Campbell and Shiller (1988) and Lettau and Ludvigson (2001, 2004) for interesting applications of related ideas.
The unit root affecting ct causes the time t variance of ct to grow linearly with t

In particular, since {ŵt} is iid, we have

Var[ct] = Var[c0] + t σ̂²    (2.198)

where

σ̂² := (1 − β)² U(I − βA)⁻¹ CC′(I − βA′)⁻¹ U′
Assuming that σ̂ > 0, this means that {ct } has no asymptotic distribution
Let’s consider what this means for a cross-section of ex ante identical households born at time 0
Let the distribution of c0 represent the cross-section of initial consumption values
Equation (2.198) tells us that the distribution of ct spreads out over time at a rate proportional to t
A number of different studies have investigated this prediction (see, e.g., [DP94], [STY04])
Impulse Response Functions Impulse response functions measure the change in a dynamic sys-
tem subject to a given impulse (i.e., temporary shock)
The impulse response function of {ct } to the innovation {wt } is a box
In particular, the response of ct+ j to a unit increase in the innovation wt+1 is (1 − β)U ( I − βA)−1 C
for all j ≥ 1
Moving Average Representation It’s useful to express the innovation to the expected present
value of the endowment process in terms of a moving average representation for income yt
The endowment process defined by (2.172) has the moving average representation
y t +1 = d ( L ) w t +1 (2.199)
where
• d(L) = ∑_{j=0}^∞ dj L^j for some sequence dj, where L is the lag operator 12
We illustrate some of the preceding ideas with the following two examples
In both examples, the endowment follows the process yt = x1t + x2t where
[ x1t+1 ; x2t+1 ] = [[ 1, 0 ], [ 0, 0 ]] [ x1t ; x2t ] + [[ σ1, 0 ], [ 0, σ2 ]] [ w1t+1 ; w2t+1 ]
Here
• wt+1 is an iid 2 × 1 process distributed as N (0, I )
• x1t is a permanent component of yt
• x2t is a purely transitory component
Example 1 Assume as before that the consumer observes the state xt at time t
In view of (2.191) we have

ct+1 − ct = σ1 w1t+1 + (1 − β)σ2 w2t+1    (2.202)
Formula (2.202) shows how an increment σ1 w1t+1 to the permanent component of income x1t+1
leads to
• a permanent one-for-one increase in consumption and
• no increase in savings −bt+1
But the purely transitory component of income σ2 w2t+1 leads to a permanent increment in con-
sumption by a fraction 1 − β of transitory income
The remaining fraction β is saved, leading to a permanent increment in −bt+1
Application of the formula for debt in (2.180) to this example shows that

bt+1 − bt = −σ2 w2t    (2.203)
This confirms that none of σ1 w1t is saved, while all of σ2 w2t is saved
The next figure illustrates these very different reactions to transitory and permanent income
shocks using impulse-response functions
The code for generating this figure is in file perm_income/perm_inc_ir.py from the applications
repository, as shown below
"""
Impulse response functions for the LQ permanent income model permanent and
transitory shocks.
"""
import numpy as np
import matplotlib.pyplot as plt
r = 0.05
beta = 1 / (1 + r)
T = 20 # Time horizon
S = 5 # Impulse date
sigma1 = sigma2 = 0.15
def time_path(permanent=False):
    "Time path of consumption and debt given shock sequence"
    w1 = np.zeros(T+1)
    w2 = np.zeros(T+1)
    b = np.zeros(T+1)
    c = np.zeros(T+1)
    if permanent:
        w1[S+1] = 1.0
    else:
        w2[S+1] = 1.0
    for t in range(1, T):
        b[t+1] = b[t] - sigma2 * w2[t]
        c[t+1] = c[t] + sigma1 * w1[t+1] + (1 - beta) * sigma2 * w2[t+1]
    return b, c

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
plt.subplots_adjust(hspace=0.5)
p_args = {'lw': 2, 'alpha': 0.7}  # common line-style arguments (assumed setup)

L = 0.175

for ax in axes:
    ax.grid(alpha=0.5)
    ax.set_xlabel(r'Time')
    ax.set_ylim(-L, L)
    ax.plot((S, S), (-L, L), 'k-', lw=0.5)
ax = axes[0]
b, c = time_path(permanent=0)
ax.set_title('impulse-response, transitory income shock')
ax.plot(list(range(T+1)), c, 'g-', label="consumption", **p_args)
ax.plot(list(range(T+1)), b, 'b-', label="debt", **p_args)
ax.legend(loc='upper right')
ax = axes[1]
b, c = time_path(permanent=1)
ax.set_title('impulse-response, permanent income shock')
ax.plot(list(range(T+1)), c, 'g-', label="consumption", **p_args)
ax.plot(list(range(T+1)), b, 'b-', label="debt", **p_args)
ax.legend(loc='lower right')
plt.show()
Example 2 Assume now that at time t the consumer observes yt , and its history up to t, but not
xt
Under this assumption, it is appropriate to use an innovation representation to form A, C, U in (2.191)
The discussion in sections 2.9.1 and 2.11.3 of [LS12] shows that the pertinent state space represen-
tation for yt is
[ yt+1 ; at+1 ] = [[ 1, −(1 − K) ], [ 0, 0 ]] [ yt ; at ] + [ 1 ; 1 ] at+1

yt = [ 1, 0 ] [ yt ; at ]
where
• K := the stationary Kalman gain
• a t : = y t − E [ y t | y t −1 , . . . , y 0 ]
In the same discussion in [LS12] it is shown that K ∈ [0, 1] and that K increases as σ1 /σ2 does
In other words, as the ratio of the standard deviation of the permanent shock to that of the transi-
tory shock increases
Applying formulas (2.191) implies

ct+1 − ct = [1 − β(1 − K)] at+1    (2.204)
where the endowment process can now be represented in terms of the univariate innovation to yt
as
y t +1 − y t = a t +1 − (1 − K ) a t (2.205)
Equation (2.205) indicates that the consumer regards
• fraction K of an innovation at+1 to yt+1 as permanent
• fraction 1 − K as purely transitory
The consumer permanently increases his consumption by the full amount of his estimate of the
permanent part of at+1 , but by only (1 − β) times his estimate of the purely transitory part of at+1
Therefore, in total he permanently increments his consumption by a fraction K + (1 − β)(1 − K ) =
1 − β(1 − K ) of at+1
He saves the remaining fraction β(1 − K )
According to equation (2.205), the first difference of income is a first-order moving average
Equation (2.204) asserts that the first difference of consumption is iid
Application of the formula for debt to this example shows that
bt + 1 − bt = ( K − 1 ) a t (2.206)
This indicates how the fraction K of the innovation to yt that is regarded as permanent influences
the fraction of the innovation that is saved
Further Reading
The model described above significantly changed how economists think about consumption
At the same time, it’s generally recognized that Hall’s version of the permanent income hypothesis
fails to capture all aspects of the consumption/savings data
For example, liquidity constraints and buffer stock savings appear to be important
Further discussion can be found in, e.g., [HM82], [Par99], [Dea91], [Car01]
c0 = b1/(1 + r) − b0 + y0    and    c1 = y1 − b1

Here b0 and y0 are given constants

Substituting these constraints into our two-period objective u(c0) + βE 0 [u(c1)] gives

max_{b1} { u(b1/R − b0 + y0) + βE 0 [u(y1 − b1)] }

where R := 1 + r
THREE
ADVANCED APPLICATIONS
This advanced section of the course contains more complex applications, and can be read selec-
tively, according to your interests
Contents
• Continuous State Markov Chains
– Overview
– The Density Case
– Beyond Densities
– Stability
– Exercises
– Solutions
– Appendix
Overview
In a previous lecture we learned about finite Markov chains, a relatively elementary class of
stochastic dynamic models
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov chains
Most stochastic dynamic models studied by economists either fit directly into this class or can be
represented as continuous state Markov chains after minor modifications
In this lecture, our focus will be on continuous Markov models that
• evolve in discrete time
• are often nonlinear
The fact that we accommodate nonlinear models here is significant, because linear stochastic mod-
els have their own highly developed tool set, as we’ll see later on
The question that interests us most is: Given a particular stochastic dynamic model, how will the
state of the system evolve over time?
In particular,
• What happens to the distribution of the state variables?
• Is there anything we can say about the “average behavior” of these variables?
• Is there a notion of “steady state” or “long run equilibrium” that’s applicable to the model?
– If so, how can we compute it?
Answering these questions will lead us to revisit many of the topics that occupied us in the finite
state case, such as simulation, distribution dynamics, stability, ergodicity, etc.
Note: For some people, the term “Markov chain” always refers to a process with a finite or
discrete state space. We follow the mainstream mathematical literature (e.g., [MT09]) in using the
term to refer to any discrete time Markov process
You are probably aware that some distributions can be represented by densities and some cannot
(For example, distributions on the real numbers R that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one step transition
probabilities have density representations
The benefit is that the density case offers a very direct parallel to the finite case in terms of notation
and intuition
Once we’ve built some intuition we’ll cover the general case
Definitions and Basic Properties In our lecture on finite Markov chains, we studied discrete
time Markov chains that evolve on a finite state space S
In this setting, the dynamics of the model are described by a stochastic matrix — a nonnegative
square matrix P = P[i, j] such that each row P[i, ·] sums to one
The interpretation of P is that P[i, j] represents the probability of transitioning from state i to state
j in one unit of time
In symbols,
P{ Xt+1 = j | Xt = i } = P[i, j]
Equivalently,
• P can be thought of as a family of distributions P[i, ·], one for each i ∈ S
• P[i, ·] is the distribution of Xt+1 given Xt = i
(As you probably recall, when using NumPy arrays, P[i, ·] is expressed as P[i,:])
In this section, we’ll allow S to be a subset of R, such as
• R itself
• the positive reals (0, ∞)
• a bounded interval ( a, b)
The family of discrete distributions P[i, ·] will be replaced by a family of densities p( x, ·), one for
each x ∈ S
Analogous to the finite state case, p( x, ·) is to be understood as the distribution (density) of Xt+1
given Xt = x
More formally, a stochastic kernel on S is a function p : S × S → R with the property that
1. p( x, y) ≥ 0 for all x, y ∈ S
2. ∫ p(x, y) dy = 1 for all x ∈ S
(Integrals are over the whole space unless otherwise specified)
For example, let S = R and consider the particular stochastic kernel pw defined by
pw(x, y) := (1/√(2π)) exp{ −(y − x)² / 2 }   (3.1)
Connection to Stochastic Difference Equations In the previous section, we made the connection
between stochastic difference equation (3.2) and stochastic kernel (3.1)
In economics and time series analysis we meet stochastic difference equations of all different
shapes and sizes
It will be useful for us if we have some systematic methods for converting stochastic difference
equations into stochastic kernels
To this end, consider the generic (scalar) stochastic difference equation given by
Xt+1 = µ(Xt) + σ(Xt) ξt+1   (3.3)
where
• {ξt} ∼ φ is an iid shock sequence, where φ is a given density on R
• µ and σ are given functions on S, with σ(x) > 0 for all x
Example 1: The random walk (3.2) is a special case of (3.3), with µ( x ) = x and σ( x ) = 1
Example 2: Consider the ARCH model
Xt+1 = αXt + σt ξt+1,  where σt² = β + γXt²
This is a special case of (3.3) with µ(x) = αx and σ(x) = (β + γx²)^{1/2}
Example 3: With stochastic production and a constant savings rate, the one-sector neoclassical growth model leads to a law of motion for capital per worker such as
kt+1 = s At+1 f(kt) + (1 − δ) kt   (3.5)
Here
• s is the rate of savings
• At+1 is a production shock
– The t + 1 subscript indicates that At+1 is not visible at time t
• δ is a depreciation rate
• f : R+ → R+ is a production function satisfying f (k ) > 0 whenever k > 0
(The fixed savings rate can be rationalized as the optimal policy for a particular set of technologies
and preferences (see [LS12], section 3.1.2), although we omit the details here)
Equation (3.5) is a special case of (3.3) with µ( x ) = (1 − δ) x and σ( x ) = s f ( x )
Now let’s obtain the stochastic kernel corresponding to the generic model (3.3)
To find it, note first that if U is a random variable with density f U , and V = a + bU for some
constants a, b with b > 0, then the density of V is given by
fV(v) = (1/b) fU( (v − a)/b )   (3.6)
(The proof is below. For a multidimensional version see EDTC, theorem 8.1.3)
Taking (3.6) as given for the moment, we can obtain the stochastic kernel p for (3.3) by recalling
that p( x, ·) is the conditional density of Xt+1 given Xt = x
In the present case, this is equivalent to stating that p( x, ·) is the density of Y := µ( x ) + σ( x ) ξ t+1
when ξ t+1 ∼ φ
Hence, by (3.6),
p(x, y) = (1/σ(x)) φ( (y − µ(x)) / σ(x) )   (3.7)
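As a quick illustration, here is a minimal sketch (not part of the original lecture code) that builds the stochastic kernel (3.7) for the growth model in Example 3, assuming a lognormal shock distribution; the particular parameter values in the comments are illustrative assumptions only:
import numpy as np
from scipy.stats import lognorm

# Illustrative primitives for the growth model (3.5); these particular
# numbers are assumptions made for this sketch only
s, delta, a_sigma, alpha = 0.2, 0.1, 0.4, 0.4
phi = lognorm(a_sigma)            # density of the shock A_{t+1}
f = lambda k: k**alpha            # production function

mu = lambda x: (1 - delta) * x    # mu(x) in (3.3)
sigma = lambda x: s * f(x)        # sigma(x) in (3.3)

def p(x, y):
    "Stochastic kernel for the growth model via formula (3.7)."
    # Vectorized in both x and y via NumPy broadcasting
    return phi.pdf((y - mu(x)) / sigma(x)) / sigma(x)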
Distribution Dynamics In this section of our lecture on finite Markov chains, we asked the fol-
lowing question: If
1. { Xt } is a Markov chain with stochastic matrix P
2. the distribution of Xt is known to be ψt
then what is the distribution of Xt+1 ?
Letting ψt+1 denote the distribution of Xt+1, the answer we gave was that
ψt+1[j] = ∑_{i ∈ S} P[i, j] ψt[i]   (3.8)
This intuitive equality states that the probability of being at j tomorrow is the probability of visit-
ing i today and then going on to j, summed over all possible i
In the density case, we just replace the sum with an integral and probability mass functions with densities, yielding
ψt+1(y) = ∫ p(x, y) ψt(x) dx,  ∀y ∈ S   (3.9)
It is convenient to write (3.9) in operator form: the Markov operator P associated with p maps a density ψ into ψP, where
(ψP)(y) := ∫ p(x, y) ψ(x) dx   (3.10)
Note: Unlike most operators, we write P to the right of its argument, instead of to the left (i.e.,
ψP instead of Pψ). This is a common convention, with the intention being to maintain the parallel
with the finite case — see here
With this notation, we can write (3.9) more succinctly as ψt+1 (y) = (ψt P)(y) for all y, or, dropping
the y and letting “=” indicate equality of functions,
ψt+1 = ψt P (3.11)
Equation (3.11) tells us that if we specify a distribution for ψ0 , then the entire sequence of future
distributions can be obtained by iterating with P
Note: Some people might be aware that discrete Markov chains are in fact a special case of the
continuous Markov chains we have just described. The reason is that probability mass functions
are densities with respect to the counting measure.
Computation To learn about the dynamics of a given process, it’s useful to compute and study
the sequences of densities generated by the model
One way to do this is to try to implement the iteration described by (3.10) and (3.11) using numer-
ical integration
However, to produce ψP from ψ via (3.10), you would need to integrate at every y, and there is a
continuum of such y
Another possibility is to discretize the model, but this introduces errors of unknown size
A nicer alternative in the present setting is to combine simulation with an elegant estimator called
the look ahead estimator
Let’s go over the ideas with reference to the growth model discussed above, the dynamics of which we repeat here for convenience:
kt+1 = s At+1 f(kt) + (1 − δ) kt   (3.12)
Our aim is to compute the sequence {ψt } associated with this model and fixed initial condition ψ0
To approximate ψt by simulation, recall that, by definition, ψt is the density of k t given k0 ∼ ψ0
If we wish to generate observations of this random variable, all we need to do is
1. draw k0 from the specified initial condition ψ0
2. draw the shocks A1 , . . . , At from their specified density φ
3. compute k t iteratively via (3.12)
If we repeat this n times, we get n independent observations k_t^1, . . . , k_t^n
With these draws in hand, the next step is to generate some kind of representation of their distri-
bution ψt
A naive approach would be to use a histogram, or perhaps a smoothed histogram using SciPy’s
gaussian_kde function
However, in the present setting there is a much better way to do this, based on the look-ahead
estimator
With this estimator, to construct an estimate of ψt, we actually generate n observations of k_{t−1}, rather than k_t
Now we take these n observations k_{t−1}^1, . . . , k_{t−1}^n and form the estimate
ψ_t^n(y) = (1/n) ∑_{i=1}^n p(k_{t−1}^i, y)   (3.13)
The justification for this estimator is that, by the law of large numbers, as n → ∞,
(1/n) ∑_{i=1}^n p(k_{t−1}^i, y) → E p(k_{t−1}^i, y) = ∫ p(x, y) ψ_{t−1}(x) dx = ψt(y)
Implementation A class called LAE for estimating densities by this technique can be found in
QuantEcon
We repeat it here for convenience
"""
Filename: lae.py
.. math::
This is a density in y.
References
----------
http://quant-econ.net/py/stationary_densities.html
"""
from textwrap import dedent
import numpy as np
class LAE(object):
"""
An instance is a representation of a look ahead estimator associated
with a given stochastic kernel p and a vector of observations X.
Parameters
----------
p : function
The stochastic kernel. A function p(x, y) that is vectorized in
both x and y
X : array_like(float)
A vector containing observations
Attributes
----------
p, X : see Parameters
Examples
--------
>>> psi = LAE(p, X)
>>> y = np.linspace(0, 1, 100)
>>> psi(y) # Evaluate look ahead estimate at grid of points y
"""
def __repr__(self):
return self.__str__()
def __str__(self):
m = """\
Look ahead estimator
- number of observations : {n}
"""
return dedent(m.format(n=self.X.size))
Parameters
----------
y : array_like(float)
A vector of points at which we wish to evaluate the look-
ahead estimator
Returns
-------
psi_vals : array_like(float)
"""
k = len(y)
v = self.p(self.X, y.reshape((1, k)))
psi_vals = np.mean(v, axis=0) # Take mean along each row
return psi_vals.flatten()
Given our use of the __call__ method, an instance of LAE acts as a callable object, which is essen-
tially a function that can store its own data (see this discussion)
This function returns the right-hand side of (3.13) using
• the data and stochastic kernel that it stores as its instance data
• the value y as its argument
The function is vectorized, in the sense that if psi is such an instance and y is an array, then the
call psi(y) acts elementwise
(This is the reason that we reshaped X and y inside the class — to make vectorization work)
Because the implementation is fully vectorized, it is about as efficient as it would be in C or Fortran
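Before turning to the full example file, here is a rough usage sketch that pairs LAE with the kernel p built in the earlier sketch; the initial condition and sample size below are illustrative assumptions:
import numpy as np
from scipy.stats import lognorm

# Assumes p, s, delta, f and a_sigma from the kernel sketch above
n = 10000
k0 = np.random.uniform(1, 2, size=n)      # draws from an assumed psi_0
A1 = lognorm(a_sigma).rvs(n)              # production shocks
k1 = s * A1 * f(k0) + (1 - delta) * k0    # one step of (3.12)

psi2 = LAE(p, k1)                         # look ahead estimate of psi_2
y_grid = np.linspace(0.01, 4, 150)
print(psi2(y_grid))                       # density estimate on the grid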
Example An example of usage for the stochastic growth model described above can be found in
stationary_densities/stochasticgrowth.py
When run, the code produces a figure like this
The figure shows part of the density sequence {ψt }, with each density computed via the look
ahead estimator
Notice that the sequence of densities shown in the figure seems to be converging — more on this
in just a moment
Another quick comment is that each of these distributions could be interpreted as a cross sectional
distribution (recall this discussion)
Beyond Densities
Up until now, we have focused exclusively on continuous state Markov chains where all condi-
tional distributions p( x, ·) are densities
As discussed above, not all distributions can be represented as densities
If the conditional distribution of Xt+1 given Xt = x cannot be represented as a density for some
x ∈ S, then we need a slightly different theory
The most general option is to switch from densities to probability measures, but not all readers will be
familiar with measure theory
We can, however, construct a fairly general theory using distribution functions
Example and Definitions To illustrate the issues, recall that Hopenhayn and Rogerson [HR93]
study a model of firm dynamics where individual firm productivity follows the exogenous process
Xt+1 = a + ρXt + ξt+1,  where {ξt} ∼ N(0, σ²) is iid
Since productivity takes values in S = [0, 1], the state is truncated via the function h(x) := max{0, min{x, 1}}, so that next period productivity is h(a + ρXt + ξt+1)
If you think about it, you will see that for any given x ∈ [0, 1], the conditional distribution of Xt+1
given Xt = x puts positive probability mass on 0 and 1
Hence it cannot be represented as a density
What we can do instead is use cumulative distribution functions (cdfs)
To this end, set
G ( x, y) := P{h( a + ρx + ξ t+1 ) ≤ y} (0 ≤ x, y ≤ 1)
This family of cdfs G ( x, ·) plays a role analogous to the stochastic kernel in the density case
The distribution dynamics in (3.9) are then replaced by
Ft+1(y) = ∫ G(x, y) Ft(dx)   (3.14)
Here Ft and Ft+1 are cdfs representing the distribution of the current state and next period state
The intuition behind (3.14) is essentially the same as for (3.9)
Computation If you wish to compute these cdfs, you cannot use the look-ahead estimator as
before
Indeed, you should not use any density estimator, since the objects you are estimating/computing
are not densities
One good option is simulation as before, combined with the empirical distribution function
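As a minimal sketch of the latter, the empirical distribution function generated by simulated observations of Xt can be coded as follows; the simulation routine sim_X is a hypothetical stand-in for whatever generates the draws:
import numpy as np

def ecdf(observations):
    "Empirical distribution function generated by the observations."
    observations = np.asarray(observations)
    return lambda y: np.mean(observations <= y)

# X_draws = sim_X(n=5000)   # hypothetical routine simulating X_t
# F_t = ecdf(X_draws)       # F_t(y) then estimates P{X_t <= y}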
Stability
In our lecture on finite Markov chains we also studied stationarity, stability and ergodicity
Here we will cover the same topics for the continuous case
We will, however, treat only the density case (as in this section), where the stochastic kernel is a
family of densities
The general case is relatively similar — references are given below
Theoretical Results Analogous to the finite case, given a stochastic kernel p and corresponding
Markov operator as defined in (3.10), a density ψ∗ on S is called stationary for P if it is a fixed point
of the operator P
In other words,
ψ∗(y) = ∫ p(x, y) ψ∗(x) dx,  ∀y ∈ S   (3.15)
As with the finite case, if ψ∗ is stationary for P, and the distribution of X0 is ψ∗ , then, in view of
(3.11), Xt will have this same distribution for all t
Hence ψ∗ is the stochastic equivalent of a steady state
In the finite case, we learned that at least one stationary distribution exists, although there may be
many
When the state space is infinite, the situation is more complicated
Even existence can fail very easily
For example, the random walk model has no stationary density (see, e.g., EDTC, p. 210)
However, there are well-known conditions under which a stationary density ψ∗ exists
With additional conditions, we can also get a unique stationary density (ψ ∈ D and ψ = ψP =⇒ ψ = ψ∗), and also global convergence in the sense that
∫ |(ψPᵗ)(y) − ψ∗(y)| dy → 0  as t → ∞,  for any ψ ∈ D   (3.16)
where D is the set of densities on S
This combination of existence, uniqueness and global convergence in the sense of (3.16) is often
referred to as global stability
Under very similar conditions, we also get ergodicity, meaning that sample averages along a single time series converge to means under the stationary density:
(1/n) ∑_{t=1}^n h(Xt) → ∫ h(x) ψ∗(x) dx  as n → ∞   (3.17)
for any (measurable) function h : S → R such that the right-hand side is finite
Note that the convergence in (3.17) does not depend on the distribution (or value) of X0
This is actually very important for simulation — it means we can learn about ψ∗ (i.e., approximate
the right hand side of (3.17) via the left hand side) without requiring any special knowledge about
what to do with X0
So what are these conditions we require to get global stability and ergodicity?
In essence, it must be the case that
1. Probability mass does not drift off to the “edges” of the state space
2. Sufficient “mixing” obtains
For one such set of conditions see theorem 8.2.14 of EDTC
In addition
• [SLP89] contains a classic (but slightly outdated) treatment of these topics
• From the mathematical literature, [LM94] and [MT09] give outstanding in depth treatments
• Section 8.1.2 of EDTC provides detailed intuition, and section 8.3 gives additional references
• EDTC, section 11.3.4 provides a specific treatment for the growth model we considered in
this lecture
An Example of Stability As stated above, the growth model treated here is stable under mild con-
ditions on the primitives
• See EDTC, section 11.3.4 for more details
We can see this stability in action — in particular, the convergence in (3.16) — by simulating the
path of densities from various initial conditions
Here is such a figure
All sequences are converging towards the same limit, regardless of their initial condition
The details regarding initial conditions and so on are given in this exercise, where you are asked to
replicate the figure
Computing Stationary Densities In the preceding figure, each sequence of densities is converg-
ing towards the unique stationary density ψ∗
Even from this figure we can get a fair idea what ψ∗ looks like, and where its mass is located
However, there is a much more direct way to estimate the stationary density, and it involves only
a slight modification of the look ahead estimator
Let’s say that we have a model of the form (3.3) that is stable and ergodic
Let p be the corresponding stochastic kernel, as given in (3.7)
To approximate the stationary density ψ∗ , we can simply generate a long time series X0 , X1 , . . . , Xn
and estimate ψ∗ via
ψ∗_n(y) = (1/n) ∑_{t=1}^n p(Xt, y)   (3.18)
This is essentially the same as the look ahead estimator (3.13), except that now the observations
we generate are a single time series, rather than a cross section
The justification for (3.18) is that, with probability one as n → ∞,
(1/n) ∑_{t=1}^n p(Xt, y) → ∫ p(x, y) ψ∗(x) dx = ψ∗(y)
where the convergence is by (3.17) and the equality on the right is by (3.15)
The right hand side is exactly what we want to compute
On top of this asymptotic result, it turns out that the rate of convergence for the look ahead esti-
mator is very good
The first exercise helps illustrate this point
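To make (3.18) concrete, here is a minimal sketch (assuming the LAE class and the growth model kernel p from the sketches above, with the same illustrative parameters) that builds the estimate from a single long time series:
import numpy as np
from scipy.stats import lognorm

n = 50000
X = np.empty(n)
X[0] = 1.0                              # arbitrary initial condition
A = lognorm(a_sigma).rvs(n)             # production shocks
for t in range(n - 1):                  # simulate one long series via (3.12)
    X[t+1] = s * A[t+1] * f(X[t]) + (1 - delta) * X[t]

psi_star = LAE(p, X)                    # the estimator (3.18)
y_grid = np.linspace(0.01, 4, 200)
print(psi_star(y_grid))                 # estimated stationary density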
Exercises
Exercise 1 Consider the simple threshold autoregressive model
Xt+1 = θ|Xt| + (1 − θ²)^{1/2} ξt+1,  where {ξt} is iid and standard normal   (3.19)
This is one of those rare nonlinear stochastic models where an analytical expression for the stationary density is available
In particular, provided that |θ | < 1, there is a unique stationary density ψ∗ given by
ψ∗(y) = 2 φ(y) Φ( θy / (1 − θ²)^{1/2} )   (3.20)
Here φ is the standard normal density and Φ is the standard normal cdf
As an exercise, compute the look ahead estimate of ψ∗ , as defined in (3.18), and compare it with
ψ∗ in (3.20) to see whether they are indeed close for large n
In doing so, set θ = 0.8 and n = 500
The next figure shows the result of such a computation
The additional density (black line) is a nonparametric kernel density estimate, added to the solu-
tion for illustration
(You can try to replicate it before looking at the solution if you want to)
As you can see, the look ahead estimator is a much tighter fit than the kernel density estimator
If you repeat the simulation you will see that this is consistently the case
import numpy as np
n = 500
x = np.random.randn(n) # N(0, 1)
x = np.exp(x) # Map x to lognormal
y = np.random.randn(n) + 2.0 # N(2, 1)
z = np.random.randn(n) + 4.0 # N(4, 1)
Each data set is represented by a box, where the top and bottom of the box are the third and first
quartiles of the data, and the red line in the center is the median
Solutions
Solution notebook
Appendix
Here is the proof of (3.6) promised above: if V = a + bU with b > 0, then the distribution function of V satisfies FV(v) = P{a + bU ≤ v} = P{U ≤ (v − a)/b} = FU((v − a)/b); differentiating with respect to v and applying the chain rule gives fV(v) = (1/b) fU((v − a)/b), which is (3.6)
Contents
• The Lucas Asset Pricing Model
– Overview
– The Lucas Model
– Exercises
– Solutions
Overview
Lucas studied a pure exchange economy with a representative consumer (or household), where
• Pure exchange means that all endowments are exogenous
• Representative consumer means that either
– there is a single consumer (sometimes also referred to as a household), or
– all consumers have identical endowments and preferences
Either way, the assumption of a representative agent means that prices adjust to eradicate desires
to trade
This makes it very easy to compute competitive equilibrium prices
Assets There is a single “productive unit” that costlessly generates a sequence of consumption goods {yt}_{t=0}^∞
We will assume that this endowment is Markovian, following the exogenous process
y t +1 = G ( y t , ξ t +1 )
Here {ξ t } is an iid shock sequence with known distribution φ and yt ≥ 0
An asset is a claim on all or part of this endowment stream
The consumption goods {yt}_{t=0}^∞ are nonstorable, so holding assets is the only way to transfer wealth into the future
For the purposes of intuition, it’s common to think of the productive unit as a “tree” that produces
fruit
Based on this idea, a “Lucas tree” is a claim on the consumption endowment
Consumers A representative consumer ranks consumption streams {ct } according to the time
separable utility functional
E ∑_{t=0}^∞ β^t u(ct)   (3.21)
Here
• β ∈ (0, 1) is a fixed discount factor
• u is a strictly increasing, strictly concave, continuously differentiable period utility function
• E is a mathematical expectation
Pricing a Lucas Tree What is an appropriate price for a claim on the consumption endowment?
We’ll price an ex dividend claim, meaning that
• the seller retains this period’s dividend
• the buyer pays pt today to purchase a claim on
– yt+1 and
– the right to sell the claim tomorrow at price pt+1
Since this is a competitive model, the first step is to pin down consumer behavior, taking prices as
given
Next we’ll impose equilibrium constraints and try to back out prices
In the consumer problem, the consumer’s control variable is the share πt of the claim held in each
period
Thus, the consumer problem is to maximize (3.21) subject to
ct + πt+1 pt ≤ πt yt + πt pt
along with ct ≥ 0 and 0 ≤ πt ≤ 1 at each t
The decision to hold share πt is actually made at time t − 1
But this value is inherited as a state variable at time t, which explains the choice of subscript
The dynamic program We can write the consumer problem as a dynamic programming problem
Our first observation is that prices depend on current information, and current information is
really just the endowment process up until the current period
In fact the endowment process is Markovian, so that the only relevant information is the current
state y ∈ R+ (dropping the time subscript)
This leads us to guess an equilibrium where price is a function p of y
Remarks on the solution method
• Since this is a competitive (read: price taking) model, the consumer will take this function p
as given
• In this way we determine consumer behavior given p and then use equilibrium conditions
to recover p
• This is the standard way to solve competitive equilibrium models
Using the assumption that price is a given function p of y, we write the value function and constraint as
v(π, y) = max_{c, π′} { u(c) + β ∫ v(π′, G(y, z)) φ(dz) }
subject to
c + π′ p(y) ≤ πy + π p(y)   (3.22)
We can invoke the fact that utility is increasing to claim equality in (3.22) and hence eliminate the
constraint, obtaining
v(π, y) = max_{π′} { u[π(y + p(y)) − π′ p(y)] + β ∫ v(π′, G(y, z)) φ(dz) }   (3.23)
The solution to this dynamic programming problem is an optimal policy expressing either π 0 or c
as a function of the state (π, y)
• Each one determines the other, since c(π, y) = π (y + p(y)) − π 0 (π, y) p(y)
Equilibrium constraints Since the consumption good is not storable, in equilibrium we must
have ct = yt for all t
In addition, since there is one representative consumer (alternatively, since all consumers are iden-
tical), there should be no trade in equilibrium
In particular, the representative consumer owns the whole tree in every period, so πt = 1 for all t
Prices must adjust to satisfy these two constraints
The equilibrium price function Now observe that the first order condition for (3.23) can be written as
u′(c) p(y) = β ∫ v₁′(π′, G(y, z)) φ(dz)
where v₁′ is the derivative of v with respect to its first argument; differentiating the right-hand side of (3.23) gives v₁′(π, y) = u′(c)(y + p(y))
Next we impose the equilibrium constraints while combining the last two equations to get
p(y) = β ∫ [u′[G(y, z)] / u′(y)] [G(y, z) + p(G(y, z))] φ(dz)   (3.24)
Solving the Model Equation (3.24) is a functional equation in the unknown function p
The solution is an equilibrium price function p∗
Let’s look at how to obtain it
Setting up the problem Instead of solving for it directly we’ll follow Lucas’ indirect approach,
first setting
f(y) := u′(y) p(y)   (3.26)
so that (3.24) becomes
f(y) = h(y) + β ∫ f[G(y, z)] φ(dz)   (3.27)
Here h(y) := β ∫ u′[G(y, z)] G(y, z) φ(dz) is a function that depends only on the primitives
Equation (3.27) is a functional equation in f
The plan is to solve out for f and convert back to p via (3.26)
To solve (3.27) we’ll use a standard method: convert it to a fixed point problem
First we introduce the operator T mapping f into T f as defined by
(T f)(y) = h(y) + β ∫ f[G(y, z)] φ(dz)   (3.28)
The reason we do this is that a solution to (3.27) now corresponds to a function f ∗ satisfying
( T f ∗ )(y) = f ∗ (y) for all y
In other words, a solution is a fixed point of T
This means that we can use fixed point theory to obtain and compute the solution
A little fixed point theory Let cbR+ be the set of continuous bounded functions f : R+ → R+
We now show that
1. T has exactly one fixed point f ∗ in cbR+
2. For any f ∈ cbR+ , the sequence T k f converges uniformly to f ∗
(Note: If you find the mathematics heavy going you can take 1–2 as given and skip to the next
section)
Recall the Banach contraction mapping theorem
It tells us that the previous statements will be true if we can find an α < 1 such that
‖T f − Tg‖ ≤ α ‖f − g‖,  ∀ f, g ∈ cbR+   (3.29)
Here ‖h‖ := sup_{x ∈ R+} |h(x)|
To see that (3.29) is valid, pick any f , g ∈ cbR+ and any y ∈ R+
Observe that, since integrals get larger when absolute values are moved to the inside,
|T f(y) − Tg(y)| = | β ∫ f[G(y, z)] φ(dz) − β ∫ g[G(y, z)] φ(dz) |
  ≤ β ∫ | f[G(y, z)] − g[G(y, z)] | φ(dz)
  ≤ β ∫ ‖f − g‖ φ(dz)
  = β ‖f − g‖
Since the right hand side is an upper bound, taking the sup over all y on the left hand side gives
(3.29) with α := β
Computation – An Example The preceding discussion tells us that we can compute f∗ by picking any arbitrary f ∈ cbR+ and then iterating with T
The equilibrium price function p∗ can then be recovered by p∗ (y) = f ∗ (y)/u0 (y)
Let’s try this when ln yt+1 = α ln yt + σet+1 where {et } is iid and standard normal
Utility will take the isoelastic form u(c) = c1−γ /(1 − γ), where γ > 0 is the coefficient of relative
risk aversion
Some code to implement the iterative computational procedure can be found in lucastree.py from
the QuantEcon.applications repo
We repeat it here for convenience
r"""
Filename: lucastree.py
Solves the price function for the Lucas tree in a continuous state
setting, using piecewise linear approximation for the sequence of
candidate price functions. The consumption endownment follows the log
linear AR(1) process
.. math::
where y' is a next period y and epsilon is an iid standard normal shock.
Hence
.. math::
where
.. math::
.. math::
"""
from __future__ import division # == Omit for Python 3.x == #
from textwrap import dedent
import numpy as np
from scipy import interp
from scipy.stats import lognorm
from scipy.integrate import fixed_quad
from quantecon import compute_fixed_point
class LucasTree(object):
"""
Class to solve for the price of a the Lucas tree in the Lucas
asset pricing model
Parameters
----------
    gamma : scalar(float), optional(default=2)
        The coefficient of risk aversion in the household's CRRA utility
    beta : scalar(float), optional(default=0.95)
        The discount factor
    alpha : scalar(float), optional(default=0.90)
        The correlation coefficient in the shock process
    sigma : scalar(float), optional(default=0.1)
        The volatility of the shock process
    grid : array_like(float), optional(default=None)
        The grid points at which to evaluate prices; if None, a default
        grid is constructed
Attributes
----------
gamma, beta, alpha, sigma, grid : see Parameters
grid_min, grid_max, grid_size : scalar(int)
Properties for grid upon which prices are evaluated
phi : scipy.stats.lognorm
The distribution for the shock process
Examples
--------
>>> tree = LucasTree(gamma=2, beta=0.95, alpha=0.90, sigma=0.1)
>>> grid, price_vals = tree.grid, tree.compute_lt_price()
"""
    def __init__(self, gamma=2, beta=0.95, alpha=0.90, sigma=0.1, grid=None):
        self.gamma, self.beta, self.alpha, self.sigma = gamma, beta, alpha, sigma
        # == The distribution phi of the shock process == #
        self.phi = lognorm(sigma)
        # == Default integration bounds, covering most of the mass of phi == #
        self._int_min, self._int_max = np.exp(-4 * sigma), np.exp(4 * sigma)
        # == set up grid == #
        if grid is None:
            (self.grid, self.grid_min,
             self.grid_max, self.grid_size) = self._new_grid()
        else:
            self.grid = np.asarray(grid)
            self.grid_min = min(grid)
            self.grid_max = max(grid)
            self.grid_size = len(grid)
        # == The function h in the Lucas operator, precomputed on the grid == #
        self.h = self._init_h()
def __repr__(self):
m = "LucasTree(gamma={g}, beta={b}, alpha={a}, sigma={s})"
return m.format(g=self.gamma, b=self.beta, a=self.alpha, s=self.sigma)
def __str__(self):
m = """\
Lucas Pricing Model (Lucas, 1978):
- gamma (coefficient of risk aversion) : {g}
- beta (discount parameter) : {b}
- alpha (correlation coefficient in shock process) : {a}
- sigma (volatility of shock process) : {s}
- grid bounds (bounds for where to compute prices) : ({gl:g}, {gu:g})
- grid points (number of grid points) : {gs}
"""
return dedent(m.format(g=self.gamma, b=self.beta, a=self.alpha,
s=self.sigma, gl=self.grid_min,
gu=self.grid_max, gs=self.grid_size))
    def _init_h(self):
        """
        Compute the function h in the Lucas operator as a vector of
        values on the grid
        """
        # == simplify notation == #
        alpha, gamma, beta = self.alpha, self.gamma, self.beta
        grid, grid_size = self.grid, self.grid_size
        h = np.empty(grid_size)
        for i, y in enumerate(grid):
            # == u'(G(y,z)) G(y,z) == #
            integrand = lambda z: (y**alpha * z)**(1 - gamma)
            h[i] = beta * self.integrate(integrand)
        return h
    def _new_grid(self):
        """
        Construct the default grid for the problem, returning the tuple
        (grid, grid_min, grid_max, grid_size).  The bounds and size used
        here are simple defaults.
        """
        grid_min, grid_max, grid_size = 1e-6, 4, 200
        grid = np.linspace(grid_min, grid_max, grid_size)
        return grid, grid_min, grid_max, grid_size

    def integrate(self, g, int_min=None, int_max=None):
        """
        Integrate the function g(z) * phi(z) from int_min to int_max,
        where phi is the density of the shock process

        Parameters
        ----------
        g : function
            The function which to integrate

        Returns
        -------
        result : scalar(float)
            The result of the integration
        """
        # == Simplify notation == #
        phi = self.phi
        if int_min is None:
            int_min = self._int_min
        if int_max is None:
            int_max = self._int_max
        # == Set up integrand and integrate == #
        integrand = lambda z: g(z) * phi.pdf(z)
        result, error = fixed_quad(integrand, int_min, int_max)
        return result
    def lucas_operator(self, f, Tf=None):
        """
        The approximate Lucas operator, which computes and returns the
        updated function Tf on the grid points, as in (3.28)

        Parameters
        ----------
        f : array_like(float)
            A candidate function on R_+ represented as points on a grid
            and should be flat NumPy array with len(f) = len(grid)
        Tf : array_like(float)
            Optional storage array for the updated function

        Returns
        -------
        Tf : array_like(float)
            The updated function Tf

        Notes
        -----
        The argument `Tf` is optional, but recommended. If it is passed
        into this function, then we do not have to allocate any memory
        for the array here. As this function is often called many times
        in an iterative algorithm, this can save significant computation
        time.
        """
        grid, h = self.grid, self.h
        alpha, beta = self.alpha, self.beta
        if Tf is None:
            Tf = np.empty(len(f))
        # == Apply the T operator, using piecewise linear interpolation of f == #
        Af = lambda x: interp(x, grid, f)
        for i, y in enumerate(grid):
            Tf[i] = h[i] + beta * self.integrate(lambda z: Af(y**alpha * z))
        return Tf
    def compute_lt_price(self, error_tol=1e-3, max_iter=50, verbose=0):
        """
        Compute the equilibrium price function associated with the
        Lucas tree

        Parameters
        ----------
        error_tol, max_iter, verbose
            Arguments to be passed directly to
            `quantecon.compute_fixed_point`. See that docstring for more
            information

        Returns
        -------
        price : array_like(float)
            The prices at the grid points in the attribute `grid` of the
            object
        """
        # == simplify notation == #
        grid, grid_size = self.grid, self.grid_size
        lucas_operator, gamma = self.lucas_operator, self.gamma
        # == Storage array and initial guess for compute_fixed_point == #
        Tf = np.empty(grid_size)
        f_init = np.zeros(grid_size)
        f = compute_fixed_point(lucas_operator, f_init, error_tol,
                                max_iter, verbose, Tf=Tf)
        # == Back out the price function p*(y) = f*(y) / u'(y) == #
        price = f * grid**gamma
        return price
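A minimal usage sketch, following the Examples section of the docstring (the plotting commands are illustrative):
import matplotlib.pyplot as plt
# Assumes the LucasTree class above is in scope
tree = LucasTree(gamma=2, beta=0.95, alpha=0.90, sigma=0.1)
grid, price_vals = tree.grid, tree.compute_lt_price()

plt.plot(grid, price_vals, label='price function')
plt.xlabel('y')
plt.legend()
plt.show()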
The price is increasing, even if we remove all serial correlation from the endowment process
The reason is that a larger current endowment reduces current marginal utility
The price must therefore rise to induce the household to consume the entire endowment (and
hence satisfy the resource constraint)
What happens with a more patient consumer?
Here the blue line corresponds to the previous parameters and the green line is price when β =
0.98
We see that when consumers are more patient the asset becomes more valuable, and the price of
the Lucas tree shifts up
Exercise 1 asks you to replicate this figure
Exercises
Exercise 1 Replicate the figure to show how discount rates affect prices
Solutions
Solution notebook
Overview
In this lecture we describe the structure of a class of models that build on work by Truman Bewley
[Bew77]
We begin by discussing an example of a Bewley model due to Rao Aiyagari
The model features
• Heterogeneous agents
The Economy
Households A typical household maximizes expected discounted utility E ∑_{t=0}^∞ β^t u(ct)
subject to
at+1 + ct ≤ w zt + (1 + r) at,  with ct ≥ 0 and at ≥ −B
where
• ct is current consumption
• at is assets
• zt is an exogenous component of labor income capturing stochastic unemployment risk, etc.
• w is a wage rate
• r is a net interest rate
• B is the borrowing constraint
The exogenous process {zt } follows a finite state Markov chain with given stochastic matrix P
The wage and interest rate are fixed over time
In this simple version of the model, households supply labor inelastically because they do not
value leisure
Firms
Firms produce output by hiring capital and labor; a representative firm produces Y = A K^α N^{1−α}, where A > 0 is a productivity parameter, α ∈ (0, 1), K is aggregate capital and N is total labor supply
From the first-order condition with respect to capital, the firm’s inverse demand for capital is
r = Aα (N/K)^{1−α}   (3.30)
Using this expression and the firm’s first-order condition for labor, we can pin down the equilibrium wage rate as a function w(r) of r; this mapping, labeled (3.31), is implemented as r_to_w in the code below
Equilibrium We construct a stationary rational expectations equilibrium (SREE) by the following steps:
1. fix a proposed value K for aggregate capital
2. determine corresponding prices, with interest rate r determined by (3.30) and a wage rate w(r) as given in (3.31)
3. determine the common optimal savings policy of the households given these prices
4. compute aggregate capital as the mean of steady state capital given this savings policy
If this final quantity agrees with K then we have a SREE
Code
import numpy as np
from numba import jit
class Household(object):
    """
    This class takes the parameters that define a household asset accumulation
    problem and computes the corresponding reward and transition matrices R
    and Q required to generate an instance of DiscreteDP, and thereby solve
    for the optimal policy.
    """

    def __init__(self,
                 r=0.01,                       # interest rate
                 w=1.0,                        # wages
                 beta=0.96,                    # discount factor
                 a_min=1e-10,
                 Pi=[[0.9, 0.1], [0.1, 0.9]],  # Markov chain
                 z_vals=[0.1, 1.0],            # exogenous states
                 a_max=18,
                 a_size=200):
        # == Store values and set up grids over a and z == #
        self.r, self.w, self.beta = r, w, beta
        self.a_min, self.a_max, self.a_size = a_min, a_max, a_size
        self.Pi = np.asarray(Pi)
        self.z_vals = np.asarray(z_vals)
        self.z_size = len(z_vals)
        self.a_vals = np.linspace(a_min, a_max, a_size)
        self.n = a_size * self.z_size
        # == Build the transition and reward arrays == #
        self.Q = np.zeros((self.n, a_size, self.n))
        self.build_Q()
        self.R = np.empty((self.n, a_size))
        self.build_R()

    def set_prices(self, r, w):
        "Reset prices and rebuild the reward array R."
        self.r, self.w = r, w
        self.build_R()

    def build_Q(self):
        populate_Q(self.Q, self.a_size, self.z_size, self.Pi)

    def build_R(self):
        self.R.fill(-np.inf)
        populate_R(self.R, self.a_size, self.z_size,
                   self.a_vals, self.z_vals, self.r, self.w)


@jit(nopython=True)
def populate_R(R, a_size, z_size, a_vals, z_vals, r, w):
    n = a_size * z_size
    for s_i in range(n):
        a_i = s_i // z_size
        z_i = s_i % z_size
        a = a_vals[a_i]
        z = z_vals[z_i]
        for new_a_i in range(a_size):
            a_new = a_vals[new_a_i]
            c = w * z + (1 + r) * a - a_new
            if c > 0:
                R[s_i, new_a_i] = np.log(c)  # Utility


@jit(nopython=True)
def populate_Q(Q, a_size, z_size, Pi):
    n = a_size * z_size
    for s_i in range(n):
        z_i = s_i % z_size
        for a_i in range(a_size):
            for next_z_i in range(z_size):
                Q[s_i, a_i, a_i * z_size + next_z_i] = Pi[z_i, next_z_i]


@jit(nopython=True)
def asset_marginal(s_probs, a_size, z_size):
    a_probs = np.zeros(a_size)
    for a_i in range(a_size):
        for z_i in range(z_size):
            a_probs[a_i] += s_probs[a_i * z_size + z_i]
    return a_probs
In the following examples our import statements assume that this code is stored as
aiyagari_household.py in the present working directory
As a first example of what we can do, let’s plot an optimal accumulation policy at a given interest
rate
"""
Created on Wed Sep 23 17:00:17 EDT 2015
@authors: John Stachurski, Thomas Sargent
"""
import numpy as np
import quantecon as qe
# Example prices
r = 0.03
w = 0.956
# Simplify names
z_size, a_size = am.z_size, am.a_size
z_vals, a_vals = am.z_vals, am.a_vals
n = a_size * z_size
# Get all optimal actions across the set of a indices with z fixed in each row
a_star = np.empty((z_size, a_size))
for s_i in range(n):
a_i = s_i // z_size
z_i = s_i % z_size
a_star[z_i, a_i] = a_vals[results.sigma[s_i]]
plt.show()
"""
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
from numba import jit
from aiyagari_household import Household, asset_marginal
from quantecon.markov import DiscreteDP
A = 2.5
N = 0.05
alpha = 0.33
beta = 0.96
def r_to_w(r):
return A * (1 - alpha) * (alpha / (1 + r))**(alpha / (1 - alpha))
def rd(K):
return A * alpha * (N / K)**(1 - alpha)
def prices_to_capital_stock(am, r):
    """
    Map prices to the induced level of capital stock.

    Parameters
    ----------
    am : Household
        An instance of an aiyagari_household.Household
    r : float
        The interest rate
    """
    w = r_to_w(r)
    am.set_prices(r, w)
    aiyagari_ddp = DiscreteDP(am.R, am.Q, beta)
    # Compute the optimal policy
    results = aiyagari_ddp.solve(method='policy_iteration')
    # Compute the stationary distribution
    stationary_probs = results.mc.stationary_distributions[0]
    # Extract the marginal distribution for assets
    asset_probs = asset_marginal(stationary_probs, am.a_size, am.z_size)
    # Return K
    return np.sum(asset_probs * am.a_vals)


# Create an instance of Household
am = Household(a_max=20)

# Compute supply of capital over a grid of interest rates
num_points = 20
r_vals = np.linspace(0.005, 0.04, num_points)
k_vals = np.empty(num_points)
for i, r in enumerate(r_vals):
    k_vals[i] = prices_to_capital_stock(am, r)

# Plot supply against the demand curve rd from (3.30)
fig, ax = plt.subplots()
ax.plot(k_vals, r_vals, label='supply of capital')
ax.plot(k_vals, rd(k_vals), label='demand for capital')
ax.set_xlabel('capital')
ax.set_ylabel('interest rate')
ax.legend(loc='upper right')
plt.show()
Contents
• Modeling Career Choice
– Overview
– Model
– Implementation: career.py
– Exercises
– Solutions
Overview
Next we study a computational problem concerning career and job choices. The model is origi-
nally due to Derek Neal [Nea99] and this exposition draws on the presentation in [LS12], section
6.5.
Model features
• career and job within career both chosen to maximize expected discounted wage flow
• infinite horizon dynamic programming with two state variables
Model
In what follows we distinguish between a career and a job, where a career is understood to be a general field encompassing many possible jobs, and a job is understood to be a position with a particular firm
Wages are decomposed as wt = θt + et, where θt is the contribution of the worker’s career and et is the contribution of his or her current job
At the start of time t, a worker has three options: retain the current (career, job) pair (θt, et), retain the current career θt but redraw the job et, or redraw both θt and et, with draws θ ∼ F and e ∼ G independent of each other and of past values
Notice that the worker does not have the option to retain a job but redraw a career — starting a new career always requires starting a new job
A young worker aims to maximize the expected sum of discounted wages
E ∑_{t=0}^∞ β^t wt   (3.32)
The value function V satisfies the Bellman equation V(θ, e) = max{I, II, III}, where
I = θ + e + βV(θ, e)   (3.33)
II = θ + ∫ e′ G(de′) + β ∫ V(θ, e′) G(de′)
III = ∫ θ′ F(dθ′) + ∫ e′ G(de′) + β ∫∫ V(θ′, e′) G(de′) F(dθ′)
Evidently I, I I and I I I correspond to “stay put”, “new job” and “new life”, respectively
Parameterization As in [LS12], section 6.5, we will focus on a discrete version of the model,
parameterized as follows:
• both θ and e take values in the set np.linspace(0, B, N) — an even grid of N points be-
tween 0 and B inclusive
• N = 50
• B=5
• β = 0.95
The distributions F and G are discrete distributions generating draws from the grid points
np.linspace(0, B, N)
A very useful family of discrete distributions is the Beta-binomial family, with probability mass
function
p(k | n, a, b) = C(n, k) B(k + a, n − k + b) / B(a, b),   k = 0, . . . , n
where C(n, k) is the binomial coefficient and B is the beta function
Interpretation:
• draw q from a Beta distribution with shape parameters ( a, b)
• run n independent binary trials, each with success probability q
• p(k | n, a, b) is the probability of k successes in these n trials
Nice properties: the shape parameters a and b generate a variety of pmf profiles over the grid, and the family nests the uniform distribution on the grid as the special case a = b = 1 — see the sketch below
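Here is a quick sketch verifying the uniform special case, using the BetaBinomial class from QuantEcon that appears in the code below:
import numpy as np
from quantecon.distributions import BetaBinomial

dist = BetaBinomial(n=49, a=1, b=1)    # 50 mass points, matching N = 50
pmf = dist.pdf()                       # p(k | n, a, b) for k = 0, ..., 49
print(np.allclose(pmf, 1 / 50))        # True: a = b = 1 gives the uniform pmf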
Implementation: career.py
The QuantEcon.applications repo provides some code for solving the DP problem described above
See in particular this file, which is repeated here for convenience
"""
Filename: career.py
A class to solve the career / job choice model due to Derek Neal.
References
----------
http://quant-econ.net/py/career.html
"""
from textwrap import dedent
import numpy as np
from quantecon.distributions import BetaBinomial
class CareerWorkerProblem(object):
"""
An instance of the class is an object with data on a particular
problem of this type, including probabilities, discount factor and
sample space for the variables.
Parameters
----------
beta : scalar(float), optional(default=0.95)
    Discount factor
B : scalar(float), optional(default=5.0)
    Upper bound for both epsilon and theta
N : scalar(int), optional(default=50)
Number of possible realizations for both epsilon and theta
F_a : scalar(int or float), optional(default=1)
Parameter `a` from the career distribution
F_b : scalar(int or float), optional(default=1)
Parameter `b` from the career distribution
G_a : scalar(int or float), optional(default=1)
Parameter `a` from the job distribution
G_b : scalar(int or float), optional(default=1)
Parameter `b` from the job distribution
Attributes
----------
beta, B, N : see Parameters
theta : array_like(float, ndim=1)
A grid of values from 0 to B
epsilon : array_like(float, ndim=1)
A grid of values from 0 to B
F_probs : array_like(float, ndim=1)
The probabilities of different values for F
G_probs : array_like(float, ndim=1)
The probabilities of different values for G
F_mean : scalar(float)
The mean of the distribution for F
G_mean : scalar(float)
The mean of the distribution for G
"""
    def __init__(self, beta=0.95, B=5.0, N=50,
                 F_a=1, F_b=1, G_a=1, G_b=1):
        self.beta, self.N, self.B = beta, N, B
        self.theta = np.linspace(0, B, N)     # set of theta values
        self.epsilon = np.linspace(0, B, N)   # set of epsilon values
        self._F_a, self._F_b = F_a, F_b
        self._G_a, self._G_b = G_a, G_b
        self.F_probs = BetaBinomial(N - 1, F_a, F_b).pdf()
        self.G_probs = BetaBinomial(N - 1, G_a, G_b).pdf()
        self.F_mean = np.sum(self.theta * self.F_probs)
        self.G_mean = np.sum(self.epsilon * self.G_probs)

    def __repr__(self):
m = "CareerWorkerProblem(beta={b:g}, B={B:g}, N={n:g}, F_a={fa:g}, "
m += "F_b={fb:g}, G_a={ga:g}, G_b={gb:g})"
return m.format(b=self.beta, B=self.B, n=self.N, fa=self._F_a,
fb=self._F_b, ga=self._G_a, gb=self._G_b)
def __str__(self):
m = """\
CareerWorkerProblem (Neal, 1999)
- beta (discount factor) : {b:g}
- B (upper bound for epsilon and theta) : {B:g}
- N (number of realizations of epsilon and theta) : {n:g}
- F_a (parameter a from career distribution) : {fa:g}
- F_b (parameter b from career distribution) : {fb:g}
- G_a (parameter a from job distribution) : {ga:g}
- G_b (parameter b from job distribution) : {gb:g}
"""
return dedent(m.format(b=self.beta, B=self.B, n=self.N, fa=self._F_a,
fb=self._F_b, ga=self._G_a, gb=self._G_b))
    def bellman_operator(self, v):
        """
        The Bellman operator for the career / job choice model of Neal.

        Parameters
----------
v : array_like(float)
A 2D NumPy array representing the value function
Interpretation: :math:`v[i, j] = v(\theta_i, \epsilon_j)`
Returns
-------
new_v : array_like(float)
The updated value function Tv as an array of shape v.shape
"""
new_v = np.empty(v.shape)
for i in range(self.N):
for j in range(self.N):
# stay put
v1 = self.theta[i] + self.epsilon[j] + self.beta * v[i, j]
# new job
v2 = (self.theta[i] + self.G_mean + self.beta *
np.dot(v[i, :], self.G_probs))
# new life
v3 = (self.G_mean + self.F_mean + self.beta *
np.dot(self.F_probs, np.dot(v, self.G_probs)))
new_v[i, j] = max(v1, v2, v3)
return new_v
    def get_greedy(self, v):
        """
        Compute optimal actions taking v as the value function.

        Parameters
----------
v : array_like(float)
A 2D NumPy array representing the value function
Interpretation: :math:`v[i, j] = v(\theta_i, \epsilon_j)`
Returns
-------
policy : array_like(float)
A 2D NumPy array, where policy[i, j] is the optimal action
at :math:`(\theta_i, \epsilon_j)`.
"""
policy = np.empty(v.shape, dtype=int)
for i in range(self.N):
for j in range(self.N):
v1 = self.theta[i] + self.epsilon[j] + self.beta * v[i, j]
v2 = (self.theta[i] + self.G_mean + self.beta *
np.dot(v[i, :], self.G_probs))
v3 = (self.G_mean + self.F_mean + self.beta *
np.dot(self.F_probs, np.dot(v, self.G_probs)))
if v1 > max(v2, v3):
action = 1
elif v2 > max(v1, v3):
action = 2
else:
action = 3
policy[i, j] = action
return policy
Hence we can reproduce figures 6.5.1 and 6.5.2 of [LS12], which exhibit the value function and optimal policy respectively
Here’s the value function
Exercises
Exercise 1 Using the default parameterization in the class CareerWorkerProblem, generate and
plot typical sample paths for θ and e when the worker follows the optimal policy
In particular, modulo randomness, reproduce the following figure (where the horizontal axis rep-
resents time)
Hint: To generate the draws from the distributions F and G, use the class DiscreteRV
Exercise 2 Let’s now consider how long it takes for the worker to settle down to a permanent
job, given a starting point of (θ, e) = (0, 0)
In other words, we want to study the distribution of the random variable
T ∗ := the first point in time from which the worker’s job no longer changes
Evidently, the worker’s job becomes permanent if and only if (θt , et ) enters the “stay put” region
of (θ, e) space
Letting S denote this region, T ∗ can be expressed as the first passage time to S under the optimal
policy:
T ∗ := inf{t ≥ 0 | (θt , et ) ∈ S}
Collect 25,000 draws of this random variable and compute the median (which should be about 7)
Repeat the exercise with β = 0.99 and interpret the change
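Here is one possible sketch of the experiment (the loop structure and names below are assumptions about how a solution might be organized; compute_fixed_point and DiscreteRV are from QuantEcon):
import numpy as np
from quantecon import compute_fixed_point, DiscreteRV

# Assumes the CareerWorkerProblem class above is in scope
wp = CareerWorkerProblem()
v = compute_fixed_point(wp.bellman_operator, np.ones((wp.N, wp.N)) * 100)
policy = wp.get_greedy(v)
F, G = DiscreteRV(wp.F_probs), DiscreteRV(wp.G_probs)

def first_passage_time():
    "Time until the job of a worker starting at (0, 0) stops changing."
    t = i = j = 0
    while True:
        if policy[i, j] == 1:            # Stay put
            return t
        elif policy[i, j] == 2:          # New job
            j = G.draw()[0]
        else:                            # New life
            i, j = F.draw()[0], G.draw()[0]
        t += 1

samples = [first_passage_time() for m in range(25000)]
print(np.median(samples))                # Should be about 7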
Exercise 3 As best you can, reproduce the figure showing the optimal policy
Hint: The get_greedy() function returns a representation of the optimal policy where values 1,
2 and 3 correspond to “stay put”, “new job” and “new life” respectively. Use this and contourf
from matplotlib.pyplot to produce the different shadings.
Now set G_a = G_b = 100 and generate a new figure with these parameters. Interpret.
Solutions
Solution notebook
Contents
• On-the-Job Search
– Overview
– Model
– Implementation
– Solving for Policies
– Exercises
– Solutions
Overview
Model features
• job-specific human capital accumulation combined with on-the-job search
• infinite horizon dynamic programming with one state variable and two controls
Model
Let
• xt denote the time-t job-specific human capital of a worker employed at a given firm
• wt denote current wages
Let wt = xt (1 − st − φt ), where
• φt is investment in job-specific human capital for the current role
• st is search effort, devoted to obtaining new offers from other firms.
For as long as the worker remains in the current job, evolution of { xt } is given by xt+1 = G ( xt , φt )
When search effort at t is st , the worker receives a new job offer with probability π (st ) ∈ [0, 1]
Value of offer is Ut+1 , where {Ut } is iid with common distribution F
Worker has the right to reject the current offer and continue with existing job.
In particular, xt+1 = Ut+1 if accepts and xt+1 = G ( xt , φt ) if rejects
Letting bt+1 ∈ {0, 1} be binary with bt+1 = 1 indicating that a new offer arrives at time t + 1, we can write
xt+1 = (1 − bt+1) G(xt, φt) + bt+1 max{G(xt, φt), Ut+1}   (3.34)
Agent’s objective: maximize expected discounted sum of wages via controls {st } and {φt }
Taking the expectation of V ( xt+1 ) and using (3.34), the Bellman equation for this problem can be
written as
V(x) = max_{s + φ ≤ 1} { x(1 − s − φ) + β(1 − π(s)) V[G(x, φ)] + βπ(s) ∫ V[G(x, φ) ∨ u] F(du) }   (3.35)
Here nonnegativity of s and φ is understood, and a ∨ b := max{a, b}
Back-of-the-Envelope Calculations Before we solve the model, let’s make some quick calculations that provide intuition on what the solution should look like.
For these calculations we use the default parameterization of the code below: G(x, φ) = A(xφ)^α with A = 1.4 and α = 0.6, π(s) = √s, and F the Beta(2, 2) distribution (so that EU = 0.5).
To begin, observe that the worker has two instruments to build capital and hence wages:
1. invest in capital specific to the current job via φ
2. search for a new job with better job-specific capital match via s
Since wages are x (1 − s − φ), marginal cost of investment via either φ or s is identical
Our risk neutral worker should focus on whatever instrument has the highest expected return
The relative expected return will depend on x
For example, suppose first that x = 0.05
• If s = 1 and φ = 0, then since G ( x, φ) = 0, taking expectations of (3.34) gives expected next
period capital equal to π (s)EU = EU = 0.5
• If s = 0 and φ = 1, then next period capital is G ( x, φ) = G (0.05, 1) ≈ 0.23
Both rates of return are good, but the return from search is better
Next suppose that x = 0.4
• If s = 1 and φ = 0, then expected next period capital is again 0.5
• If s = 0 and φ = 1, then G ( x, φ) = G (0.4, 1) ≈ 0.8
Return from investment via φ dominates expected return from search
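These magnitudes are easy to verify directly under the parameterization just described:
A, alpha = 1.4, 0.6
G = lambda x, phi: A * (x * phi)**alpha
print(G(0.05, 1))   # approximately 0.23
print(G(0.4, 1))    # approximately 0.8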
Combining these observations gives us two informal predictions:
1. At any given state x, the two controls φ and s will function primarily as substitutes — worker
will focus on whichever instrument has the higher expected return
2. For sufficiently small x, search will be preferable to investment in job-specific human capital.
For larger x, the reverse will be true
Now let’s turn to implementation, and see if we can match our predictions.
Implementation
The QuantEcon package provides some code for solving the DP problem described above
See in particular jv.py, which is repeated here for convenience
"""
Filename: jv.py
References
-----------
http://quant-econ.net/py/jv.html
"""
from textwrap import dedent
import sys
import numpy as np
from scipy.integrate import fixed_quad as integrate
from scipy.optimize import minimize
import scipy.stats as stats
from scipy import interp
# The SLSQP method is faster and more stable, but it didn't give the
# correct answer in python 3. So, if we are in python 2, use SLSQP, otherwise
# use the only other option (to handle constraints): COBYLA
if sys.version_info[0] == 2:
method = "SLSQP"
else:
# python 3
method = "COBYLA"
class JvWorker(object):
r"""
A Jovanovic-type model of employment with on-the-job search. The
value function is given by
.. math::
for
.. math::
Here
* x = human capital
* s = search effort
* :math:`\phi` = investment in human capital
* :math:`\pi(s)` = probability of new offer given search level s
* :math:`x(1 - \phi - s)` = wage
* :math:`G(x, \phi)` = new human capital when current job retained
* U = RV with distribution F -- new draw of human capital
Parameters
----------
A : scalar(float), optional(default=1.4)
Parameter in human capital transition function
alpha : scalar(float), optional(default=0.6)
Parameter in human capital transition function
beta : scalar(float), optional(default=0.96)
Discount factor
grid_size : scalar(int), optional(default=50)
Grid size for discretization
G : function, optional(default=lambda x, phi: A * (x * phi)**alpha)
Transition function for human capital
pi : function, optional(default=sqrt)
Function mapping search effort (:math:`s \in (0,1)`) to
probability of getting new job offer
F : distribution, optional(default=Beta(2,2))
Distribution from which the value of new job offers is drawn
Attributes
----------
A, alpha, beta : see Parameters
x_grid : array_like(float)
The grid over the human capital
"""
    def __init__(self, A=1.4, alpha=0.6, beta=0.96, grid_size=50,
                 G=None, pi=np.sqrt, F=stats.beta(2, 2)):
        self.A, self.alpha, self.beta = A, alpha, beta
        # === set defaults for G, pi and F === #
        self.G = G if G is not None else lambda x, phi: A * (x * phi)**alpha
        self.pi = pi
        self.F = F
        # === Set up grid over the state space === #
        # The grid max is the fixed point of x -> G(x, 1), which bounds
        # the attainable level of human capital
        grid_max = max(A**(1.0 / (1.0 - alpha)), self.F.ppf(1 - 1e-3))
        self.x_grid = np.linspace(1e-6, grid_max, grid_size)

    def __repr__(self):
        m = "JvWorker(A={a:g}, alpha={al:g}, beta={b:g}, grid_size={gs})"
        return m.format(a=self.A, al=self.alpha, b=self.beta,
                        gs=self.x_grid.size)
def __str__(self):
m = """\
Jovanovic worker (on the job search):
- A (parameter in human capital transition function) : {a:g}
- alpha (parameter in human capital transition function) : {al:g}
- beta (discount factor) : {b:g}
- grid_size (number of grid points for human capital) : {gs}
- grid_max (maximum of grid for human capital) : {gm:g}
"""
return dedent(m.format(a=self.A, al=self.alpha, b=self.beta,
gs=self.x_grid.size, gm=self.x_grid.max()))
    def bellman_operator(self, V, brute_force=False, return_policies=False):
        """
        Apply the Bellman operator associated with the model to the
        value function V.

        Parameters
----------
V : array_like(float)
Array representing an approximate value function
brute_force : bool, optional(default=False)
Default is False. If the brute_force flag is True, then grid
search is performed at each maximization step.
return_policies : bool, optional(default=False)
Indicates whether to return just the updated value function
TV or both the greedy policy computed from V and TV
Returns
-------
s_policy : array_like(float)
The greedy policy computed from V. Only returned if
return_policies == True
new_V : array_like(float)
The updated value function Tv, as an array representing the
values TV(x) over x in x_grid.
"""
# === simplify names, set up arrays, etc. === #
G, pi, F, beta = self.G, self.pi, self.F, self.beta
Vf = lambda x: interp(x, self.x_grid, V)
N = len(self.x_grid)
        new_V, s_policy, phi_policy = np.empty(N), np.empty(N), np.empty(N)

        # == Constraints and initial guess for the minimization step == #
        epsilon = 1e-4
        c1 = lambda z: 1.0 - sum(z)      # used to enforce s + phi <= 1
        c2 = lambda z: z[0] - epsilon    # used to enforce s >= epsilon > 0
        c3 = lambda z: z[1] - epsilon    # used to enforce phi >= epsilon > 0
        guess = (0.2, 0.2)
        constraints = [{"type": "ineq", "fun": i} for i in [c1, c2, c3]]

        for i, x in enumerate(self.x_grid):
            # === Set up the objective w(s, phi), as in (3.36) === #
            def w(z):
                s, phi = z
                h = lambda u: Vf(np.maximum(G(x, phi), u)) * F.pdf(u)
                integral, err = integrate(h, 0, 1)
                q = pi(s) * integral + (1.0 - pi(s)) * Vf(G(x, phi))
                return - x * (1.0 - s - phi) - beta * q  # minus: we minimize

            # === Minimize over feasible (s, phi) === #
            # (When brute_force is True one could instead evaluate w on a
            # grid of feasible (s, phi) pairs and take the minimizer.)
            result = minimize(w, guess, constraints=constraints, method=method)
            s_policy[i], phi_policy[i] = result.x[0], result.x[1]
            new_V[i] = -result.fun
if return_policies:
return s_policy, phi_policy
else:
return new_V
Within bellman_operator, for each x in the grid the value update is computed as
TV(x) = − min_{s + φ ≤ 1} w(s, φ)
where
w(s, φ) := − { x(1 − s − φ) + β(1 − π(s)) V[G(x, φ)] + βπ(s) ∫ V[G(x, φ) ∨ u] F(du) }   (3.36)
Here we are minimizing instead of maximizing to fit with SciPy’s optimization routines
When we represent V, it will be with a NumPy array V giving values on grid x_grid
But to evaluate the right-hand side of (3.36), we need a function, so we replace the arrays V and x_grid with a function Vf that gives linear interpolation of V on x_grid
Hence in the preliminaries of bellman_operator
• from the array V we define a linear interpolation Vf of its values
• the constraint functions for the minimization step are built:
– c1 is used to implement the constraint s + φ ≤ 1
– c2 is used to implement s ≥ ε for some small ε > 0, a numerically stable alternative to the true constraint s ≥ 0
– c3 does the same for φ
Inside the for loop, for each x in the grid over the state space, we set up the function w(z) =
w(s, φ) defined in (3.36).
The function is minimized over all feasible (s, φ) pairs, either by
• SciPy’s sequential quadratic programming routine (the SLSQP method of minimize), or
• brute force search over a grid
The former is much faster, but convergence to the global optimum is not guaranteed. Grid search
is a simple way to check results
Let’s plot the optimal policies and see what they look like
The code is in a file jv/jv_test.py from the applications repository and looks as follows
import matplotlib.pyplot as plt
from quantecon import compute_fixed_point
from quantecon.models import JvWorker
wp = JvWorker(grid_size=25)
v_init = wp.x_grid * 0.5
V = compute_fixed_point(wp.bellman_operator, v_init, max_iter=40)
s_policy, phi_policy = wp.bellman_operator(V, return_policies=True)

# === plot the optimal policies === #
plt.plot(wp.x_grid, phi_policy, label='phi')
plt.plot(wp.x_grid, s_policy, label='s')
plt.xlabel('x')
plt.legend()
plt.show()
The horizontal axis is the state x, while the vertical axis gives s( x ) and φ( x )
Overall, the policies match well with our predictions from the back-of-the-envelope calculations above.
• Worker switches from one investment strategy to the other depending on relative return
• For low values of x, the best option is to search for a new job
• Once x is larger, worker does better by investing in human capital specific to the current
position
Exercises
Exercise 1 Let’s look at the dynamics for the state process { xt } associated with these policies.
The dynamics are given by (3.34) when φt and st are chosen according to the optimal policies, and
P { bt + 1 = 1 } = π ( s t ) .
By examining the plot, argue that under the optimal policies, the state xt will converge to a con-
stant value x̄ close to unity
Argue that at the steady state, st ≈ 0 and φt ≈ 0.6.
Exercise 2 In the preceding exercise we found that st converges to zero and φt converges to about
0.6
Since these results were calculated at a value of β close to one, let’s compare them to the best
choice for an infinitely patient worker.
Intuitively, an infinitely patient worker would like to maximize steady state wages, which are a
function of steady state capital.
You can take it as given—it’s certainly true—that the infinitely patient worker does not search in
the long run (i.e., st = 0 for large t)
Thus, given φ, steady state capital is the positive fixed point x ∗ (φ) of the map x 7→ G ( x, φ).
Steady state wages can be written as w∗ (φ) = x ∗ (φ)(1 − φ)
Graph w∗ (φ) with respect to φ, and examine the best choice of φ
Can you give a rough interpretation for the value that you see?
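As a hint, here is a sketch under the default parameterization G(x, φ) = A(xφ)^α with A = 1.4 and α = 0.6, for which the fixed point x∗(φ) has a closed form:
import numpy as np
import matplotlib.pyplot as plt

A, alpha = 1.4, 0.6

def xbar(phi):
    "Positive fixed point of x -> G(x, phi) = A (x phi)**alpha."
    return (A * phi**alpha)**(1 / (1 - alpha))

phi_grid = np.linspace(1e-3, 1, 100)
w_star = xbar(phi_grid) * (1 - phi_grid)   # steady state wages
plt.plot(phi_grid, w_star, label='w*(phi)')
plt.xlabel('phi')
plt.legend()
plt.show()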
Solutions
Solution notebook
Contents
• Search with Offer Distribution Unknown
– Overview
– Model
– Take 1: Solution by VFI
– Take 2: A More Efficient Method
– Exercises
– Solutions
Overview
In this lecture we consider an extension of the job search model developed by John J. McCall
[McC70]
In the McCall model, an unemployed worker decides when to accept a permanent position at a
specified wage, given
• his or her discount rate
• the level of unemployment compensation
• the distribution from which wage offers are drawn
In the version considered below, the wage distribution is unknown and must be learned
• Based on the presentation in [LS12], section 6.6
Model features
• Infinite horizon dynamic programming with two states and one binary control
• Bayesian updating to learn the unknown distribution
Model
Let’s first recall the basic McCall model [McC70] and then add the variation we want to consider
The Basic McCall Model Consider an unemployed worker who is presented in each period with
a permanent job offer at wage wt
At time t, our worker has two choices
1. Accept the offer and work permanently at constant wage wt
2. Reject the offer, receive unemployment compensation c, and reconsider next period
The wage sequence {wt } is iid and generated from known density h
The worker aims to maximize the expected discounted sum of earnings E ∑_{t=0}^∞ β^t yt
Trade-off:
• Waiting too long for a good offer is costly, since the future is discounted
• Accepting too early is costly, since better offers will arrive with probability one
Let V (w) denote the maximal expected discounted sum of earnings that can be obtained by an
unemployed worker who starts with wage offer w in hand
The function V satisfies the recursion
V(w) = max{ w/(1 − β),  c + β ∫ V(w′) h(w′) dw′ }   (3.37)
where the two terms on the r.h.s. are the respective payoffs from accepting and rejecting the
current offer w
The optimal policy is a map from states into actions, and hence a binary function of w
Not surprisingly, it turns out to have the form 1{w ≥ w̄}, where
• w̄ is a constant depending on ( β, h, c) called the reservation wage
• 1{w ≥ w̄} is an indicator function returning 1 if w ≥ w̄ and 0 otherwise
• 1 indicates “accept” and 0 indicates “reject”
For further details see [LS12], section 6.3
Offer Distribution Unknown Now let’s extend the model by considering the variation pre-
sented in [LS12], section 6.6
The model is as above, apart from the fact that
• the density h is unknown
• the worker learns about h by starting with a prior and updating based on wage offers that
he/she observes
The worker knows there are two possible distributions F and G — with densities f and g
At the start of time, “nature” selects h to be either f or g — the wage distribution from which the
entire sequence {wt } will be drawn
This choice is not observed by the worker, who puts prior probability π0 on f being chosen
Update rule: worker’s time t estimate of the distribution is πt f + (1 − πt ) g, where πt updates via
πt+1 = πt f(wt+1) / [ πt f(wt+1) + (1 − πt) g(wt+1) ]   (3.38)
This last expression follows from Bayes’ rule, which tells us that
P{h = f | W = w} = P{W = w | h = f} P{h = f} / P{W = w}
and
P{W = w} = ∑_{ψ ∈ {f, g}} P{W = w | h = ψ} P{h = ψ}
The fact that (3.38) is recursive allows us to progress to a recursive solution method
Letting
hπ(w) := π f(w) + (1 − π) g(w)  and  q(w, π) := π f(w) / [ π f(w) + (1 − π) g(w) ]
we can express the value function for the unemployed worker recursively as follows
V(w, π) = max{ w/(1 − β),  c + β ∫ V(w′, π′) hπ(w′) dw′ }  where π′ = q(w′, π)   (3.39)
Notice that the current guess π is a state variable, since it affects the worker’s perception of prob-
abilities for future rewards
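To get a feel for the update rule, here is a minimal sketch of q(w, π) from (3.38); the Beta densities for f and g below are illustrative stand-ins:
from scipy.stats import beta as beta_distribution

f = beta_distribution(1, 1).pdf      # density f (illustrative parameters)
g = beta_distribution(3, 1.2).pdf    # density g (illustrative parameters)

def q(w, pi):
    "Posterior probability that h = f after observing wage offer w."
    pi_f = pi * f(w)
    return pi_f / (pi_f + (1 - pi) * g(w))

pi = 0.5                  # prior probability that h = f
for w in 0.1, 0.4, 0.9:
    pi = q(w, pi)         # update after each observed offer
    print(pi)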
Looking Forward What kind of optimal policy might result from (3.39) and the parameterization
specified above?
Intuitively, if we accept at wa and wa ≤ wb , then — all other things being given — we should also
accept at wb
This suggests a policy of accepting whenever w exceeds some threshold value w̄
But w̄ should depend on π — in fact it should be decreasing in π because
• f is a less attractive offer distribution than g
Let’s set about solving the model and see how our results match with our intuition
We begin by solving via value function iteration (VFI), which is natural but ultimately turns out
to be second best
VFI is implemented in the file odu/odu.py contained in the QuantEcon.applications repo
The code is as follows
"""
Filename: odu.py
"""
from textwrap import dedent
from scipy.interpolate import LinearNDInterpolator
from scipy.integrate import fixed_quad
from scipy.stats import beta as beta_distribution
from scipy import interp
from numpy import maximum as npmax
import numpy as np
class SearchProblem(object):
"""
A class to store a given parameterization of the "offer distribution
unknown" model.
Parameters
----------
beta : scalar(float), optional(default=0.95)
The discount parameter
c : scalar(float), optional(default=0.6)
The unemployment compensation
F_a : scalar(float), optional(default=1)
First parameter of beta distribution on F
F_b : scalar(float), optional(default=1)
Second parameter of beta distribution on F
G_a : scalar(float), optional(default=3)
    First parameter of beta distribution on G
G_b : scalar(float), optional(default=1.2)
    Second parameter of beta distribution on G
w_max : scalar(float), optional(default=2)
    Maximum wage possible
w_grid_size : scalar(int), optional(default=40)
    Size of the grid on wages
pi_grid_size : scalar(int), optional(default=40)
    Size of the grid on probabilities
Attributes
----------
beta, c, w_max : see Parameters
w_grid : np.ndarray
Grid points over wages, ndim=1
pi_grid : np.ndarray
Grid points over pi, ndim=1
grid_points : np.ndarray
Combined grid points, ndim=2
F : scipy.stats._distn_infrastructure.rv_frozen
Beta distribution with params (F_a, F_b), scaled by w_max
G : scipy.stats._distn_infrastructure.rv_frozen
Beta distribution with params (G_a, G_b), scaled by w_max
f : function
Density of F
g : function
Density of G
pi_min : scalar(float)
Minimum of grid over pi
pi_max : scalar(float)
Maximum of grid over pi
"""
    def __init__(self, beta=0.95, c=0.6, F_a=1, F_b=1, G_a=3, G_b=1.2,
                 w_max=2, w_grid_size=40, pi_grid_size=40):
        self.beta, self.c, self.w_max = beta, c, w_max
        self.F = beta_distribution(F_a, F_b, scale=w_max)
        self.G = beta_distribution(G_a, G_b, scale=w_max)
        self.f, self.g = self.F.pdf, self.G.pdf          # Density functions
        self.pi_min, self.pi_max = 1e-3, 1 - 1e-3        # Avoids instability
        self.w_grid = np.linspace(0, w_max, w_grid_size)
        self.pi_grid = np.linspace(self.pi_min, self.pi_max, pi_grid_size)
        x, y = np.meshgrid(self.w_grid, self.pi_grid)
        self.grid_points = np.column_stack((x.ravel(order='F'),
                                            y.ravel(order='F')))

    def __repr__(self):
m = "SearchProblem(beta={b}, c={c}, F_a={fa}, F_b={fb}, G_a={ga}, "
m += "G_b={gb}, w_max={wu}, w_grid_size={wgs}, pi_grid_size={pgs})"
fa, fb = self.F.args
ga, gb = self.G.args
return m.format(b=self.beta, c=self.c, fa=fa, fb=fb, ga=ga,
gb=gb, wu=self.w_grid.max(),
wgs=self.w_grid.size, pgs=self.pi_grid.size)
def __str__(self):
m = """\
SearchProblem (offer distribution unknown):
- beta (discount factor) : {b:g}
- c (unemployment compensation) : {c}
- F (distribution F) : Beta({fa}, {fb:g})
- G (distribution G) : Beta({ga}, {gb:g})
- w bounds (bounds for wage offers) : ({wl:g}, {wu:g})
- w grid size (number of points in grid for wage) : {wgs}
- pi bounds (bounds for probability of dist f) : ({pl:g}, {pu:g})
- pi grid size (number of points in grid for pi) : {pgs}
"""
fa, fb = self.F.args
ga, gb = self.G.args
return dedent(m.format(b=self.beta, c=self.c, fa=fa, fb=fb, ga=ga,
gb=gb,
wl=self.w_grid.min(), wu=self.w_grid.max(),
wgs=self.w_grid.size,
pl=self.pi_grid.min(), pu=self.pi_grid.max(),
pgs=self.pi_grid.size))
    def q(self, w, pi):
        """
        Updates pi using Bayes' rule and the current wage observation w.

        Returns
        -------
        new_pi : scalar(float)
            The updated probability
        """
        # Bayes' rule, equation (3.38), clamped to [pi_min, pi_max]
        new_pi = pi * self.f(w) / (pi * self.f(w) + (1 - pi) * self.g(w))
        new_pi = min(max(new_pi, self.pi_min), self.pi_max)
        return new_pi
    def bellman_operator(self, v):
        """
        The Bellman operator.

        Parameters
----------
v : array_like(float, ndim=1, length=len(pi_grid))
An approximate value function represented as a
one-dimensional array.
Returns
-------
new_v : array_like(float, ndim=1, length=len(pi_grid))
The updated value function
"""
# == Simplify names == #
f, g, beta, c, q = self.f, self.g, self.beta, self.c, self.q
vf = LinearNDInterpolator(self.grid_points, v)
N = len(v)
new_v = np.empty(N)
for i in range(N):
w, pi = self.grid_points[i, :]
v1 = w / (1 - beta)
integrand = lambda m: vf(m, q(m, pi)) * (pi * f(m)
+ (1 - pi) * g(m))
integral, error = fixed_quad(integrand, 0, self.w_max)
v2 = c + beta * integral
new_v[i] = max(v1, v2)
return new_v
    def get_greedy(self, v):
        """
        Compute optimal actions taking v as the value function.

        Parameters
----------
v : array_like(float, ndim=1, length=len(pi_grid))
An approximate value function represented as a
one-dimensional array.
Returns
-------
policy : array_like(float, ndim=1, length=len(pi_grid))
The decision to accept or reject an offer where 1 indicates
accept and 0 indicates reject
"""
# == Simplify names == #
f, g, beta, c, q = self.f, self.g, self.beta, self.c, self.q
vf = LinearNDInterpolator(self.grid_points, v)
N = len(v)
policy = np.zeros(N, dtype=int)
for i in range(N):
w, pi = self.grid_points[i, :]
            v1 = w / (1 - beta)
            integrand = lambda m: vf(m, q(m, pi)) * (pi * f(m)
                                                     + (1 - pi) * g(m))
            integral, error = fixed_quad(integrand, 0, self.w_max)
            v2 = c + beta * integral
            policy[i] = v1 > v2  # Evaluates to 1 or 0
        return policy
    def res_wage_operator(self, phi):
        """
        Updates the reservation wage function guess phi via the operator
Q.
Parameters
----------
phi : array_like(float, ndim=1, length=len(pi_grid))
This is reservation wage guess
Returns
-------
new_phi : array_like(float, ndim=1, length=len(pi_grid))
The updated reservation wage guess.
"""
# == Simplify names == #
beta, c, f, g, q = self.beta, self.c, self.f, self.g, self.q
# == Turn phi into a function == #
phi_f = lambda p: interp(p, self.pi_grid, phi)
new_phi = np.empty(len(phi))
for i, pi in enumerate(self.pi_grid):
def integrand(x):
"Integral expression on right-hand side of operator"
return npmax(x, phi_f(q(x, pi))) * (pi*f(x) + (1 - pi)*g(x))
integral, error = fixed_quad(integrand, 0, self.w_max)
new_phi[i] = (1 - beta) * c + beta * integral
return new_phi
The class SearchProblem is used to store parameters and methods needed to compute optimal
actions
The Bellman operator is implemented as the method bellman_operator(), while get_greedy()
computes an approximate optimal policy from a guess v of the value function
We will omit a detailed discussion of the code because there is a more efficient solution method
These ideas are implemented in the res_wage_operator method
Before explaining it let’s look quickly at solutions computed from value function iteration
Here’s the value function:
The optimal policy:
Code for producing these figures can be found in file odu/odu_vfi_plots.py from the applications
repository
The code takes several minutes to run
The results fit well with our intuition from the section Looking Forward above
• The black line in the figure above corresponds to the function w̄(π ) introduced there
• decreasing as expected
Another Functional Equation To begin, note that when w = w̄(π ), the worker is indifferent
between accepting and rejecting
Hence the two choices on the right-hand side of (3.39) have equal value:
w̄(π) / (1 − β) = c + β ∫ V(w′, π′) h_π(w′) dw′   (3.40)
Together, (3.39) and (3.40) give
V(w, π) = max { w / (1 − β),  w̄(π) / (1 − β) }   (3.41)
Combining (3.40) and (3.41), we obtain
w̄(π) / (1 − β) = c + β ∫ max { w′ / (1 − β),  w̄(π′) / (1 − β) } h_π(w′) dw′   (3.42)
Equation (3.42) can be understood as a functional equation, where w̄ is the unknown function
• Let’s call it the reservation wage functional equation (RWFE)
• The solution w̄ to the RWFE is the object that we wish to compute
Solving the RWFE To solve the RWFE, we will first show that its solution is the fixed point of a
contraction mapping
To this end, let
• b[0, 1] be the bounded real-valued functions on [0, 1]
• kψk := supx∈[0,1] |ψ( x )|
Consider the operator Q mapping ψ ∈ b[0, 1] into Qψ ∈ b[0, 1] via
(Qψ)(π) = (1 − β)c + β ∫ max { w′, ψ ∘ q(w′, π) } h_π(w′) dw′   (3.43)
Comparing (3.42) and (3.43) (they agree after multiplying (3.42) through by 1 − β), we see that the set of fixed points of Q exactly coincides with the set of solutions to the RWFE
• If Qw̄ = w̄ then w̄ solves (3.42) and vice versa
Moreover, for any ψ, φ ∈ b[0, 1], basic algebra and the triangle inequality for integrals tell us that
|(Qψ)(π) − (Qφ)(π)| ≤ β ∫ | max { w′, ψ ∘ q(w′, π) } − max { w′, φ ∘ q(w′, π) } | h_π(w′) dw′   (3.44)
Working case by case, it is easy to check that for real numbers a, b, c we always have |max{a, b} − max{a, c}| ≤ |b − c|, so that the integrand in (3.44) is dominated by ‖ψ − φ‖; since h_π integrates to one, it follows that ‖Qψ − Qφ‖ ≤ β‖ψ − φ‖
In other words, Q is a contraction of modulus β on the complete metric space (b[0, 1], k · k)
Hence
• A unique solution w̄ to the RWFE exists in b[0, 1]
• Qk ψ → w̄ uniformly as k → ∞, for any ψ ∈ b[0, 1]
Implementation These ideas are implemented in the res_wage_operator method from odu.py
as shown above
The method corresponds to the action of the operator Q
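As a quick sketch of how Q might be iterated to convergence (assuming the SearchProblem class from odu.py above is in scope, and comparing successive iterates in the sup norm):

import numpy as np

sp = SearchProblem(pi_grid_size=50)
phi = np.ones(len(sp.pi_grid))   # Initial guess psi in b[0, 1]
error, tol = 1.0, 1e-5
while error > tol:
    new_phi = sp.res_wage_operator(phi)
    error = np.max(np.abs(new_phi - phi))   # Contraction => geometric decay
    phi = new_phi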
The following exercise asks you to exploit these facts to compute an approximation to w̄
Exercises
Exercise 1 Use the default parameters and the res_wage_operator method to compute an opti-
mal policy
Your result should coincide closely with the figure for the optimal policy shown above
Try experimenting with different parameters, and confirm that the change in the optimal policy
coincides with your intuition
Solutions
Solution notebook
Contents
• Optimal Savings
– Overview
– The Optimal Savings Problem
– Computation
– Exercises
– Solutions
Overview
Next we study the standard optimal savings problem for an infinitely lived consumer—the “com-
mon ancestor” described in [LS12], section 1.3
• Also known as the income fluctuation problem
• An important sub-problem for many representative macroeconomic models
– [Aiy94]
– [Hug93]
– etc.
• Useful references include [Dea91], [DH10], [Kuh13], [Rab02], [Rei09] and [SE77]
Our presentation of the model will be relatively brief
• For further details on economic intuition, implication and models, see [LS12]
• Proofs of all mathematical results stated below can be found in this paper
In this lecture we will explore an alternative to value function iteration (VFI) called policy function
iteration (PFI)
• Based on the Euler equation, and not to be confused with Howard’s policy iteration algo-
rithm
• Globally convergent under mild assumptions, even when utility is unbounded (both above
and below)
• Numerically, turns out to be faster and more efficient than VFI for this model
Model features
• Infinite horizon dynamic programming with two states and one control
Consider a household that chooses a state-contingent consumption plan {ct }t≥0 to maximize
E [ ∑_{t=0}^∞ β^t u(c_t) ]
subject to
c_t + a_{t+1} ≤ R a_t + z_t,   c_t ≥ 0,   a_t ≥ −b,   t = 0, 1, . . .   (3.48)
Here
• β ∈ (0, 1) is the discount factor
• at is asset holdings at time t, with ad-hoc borrowing constraint at ≥ −b
• ct is consumption
• zt is non-capital income (wages, unemployment compensation, etc.)
• R := 1 + r, where r > 0 is the interest rate on savings
Assumptions
1. {zt } is a finite Markov process with Markov matrix Π taking values in Z
2. | Z | < ∞ and Z ⊂ (0, ∞)
3. r > 0 and βR < 1
4. u is smooth, strictly increasing and strictly concave with limc→0 u0 (c) = ∞ and
limc→∞ u0 (c) = 0
The asset space is [−b, ∞) and the state is the pair ( a, z) ∈ S := [−b, ∞) × Z
A feasible consumption path from ( a, z) ∈ S is a consumption sequence {ct } such that {ct } and its
induced asset path { at } satisfy
1. ( a0 , z0 ) = ( a, z)
2. the feasibility constraints in (3.48), and
3. measurability of c_t with respect to the history of outcomes z_1, . . . , z_t
(The last restriction says only that consumption at time t can depend only on outcomes observed up to and including t)
The value of the initial state (a, z) is defined as
V(a, z) := sup E [ ∑_{t=0}^∞ β^t u(c_t) ]   (3.49)
where the supremum is over all feasible consumption paths from (a, z).
An optimal consumption path from ( a, z) is a feasible consumption path from ( a, z) that attains the
supremum in (3.49)
Given our assumptions, it is known that
1. For each ( a, z) ∈ S, a unique optimal consumption path from ( a, z) exists
2. This path is the unique feasible path from ( a, z) satisfying the Euler equality
u′(c_t) = max { βR E_t [u′(c_{t+1})],  u′(R a_t + z_t + b) }   (3.50)
and the transversality condition
lim_{t→∞} β^t E [u′(c_t) a_{t+1}] = 0   (3.51)
Moreover, there exists an optimal consumption function c∗ : S → [0, ∞) such that the path from ( a, z)
generated by
( a0 , z0 ) = ( a, z), zt+1 ∼ Π(zt , dy), ct = c∗ ( at , zt ) and at+1 = Rat + zt − ct
satisfies both (3.50) and (3.51), and hence is the unique optimal path from ( a, z)
In summary, to solve the optimization problem, we need to compute c∗
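Once c∗ (or any approximation to it) is in hand, simulating the induced asset path is straightforward. Here is a minimal sketch, where c_star is a hypothetical callable returning c∗(a, z), z_vals and Pi are the income states and Markov matrix as in the class listing below, and shock indices are drawn by inverse-transform sampling:

import numpy as np

def simulate_assets(c_star, a0, z_idx0, z_vals, Pi, R=1.01, T=200):
    "Simulate {a_t} under a_{t+1} = R a_t + z_t - c_t with c_t = c_star(a_t, z_t)"
    a, i_z = np.empty(T + 1), z_idx0
    a[0] = a0
    for t in range(T):
        z = z_vals[i_z]
        a[t+1] = R * a[t] + z - c_star(a[t], z)
        i_z = np.searchsorted(np.cumsum(Pi[i_z]), np.random.uniform())
    return a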
Computation
There are two standard ways to solve for c∗: value function iteration (VFI), and policy function iteration (PFI) based on the Euler equality (3.50)
PFI iterates the Coleman operator K, which takes a suitable candidate consumption policy c into the updated policy Kc, where Kc(a, z) is defined as the t ∈ J(a, z) that solves
u′(t) = max { βR ∑_{z′ ∈ Z} u′(c(Ra + z − t, z′)) Π(z, z′),  u′(Ra + z + b) }
where
J(a, z) := {t ∈ R : min Z ≤ t ≤ Ra + z + b}   (3.54)
We refer to K as Coleman’s policy function operator [Col90]
It is known that
• K is a contraction mapping on C under the metric
ρ(c, d) := ‖u′ ∘ c − u′ ∘ d‖ := sup_{s ∈ S} | u′(c(s)) − u′(d(s)) |   (c, d ∈ C)
We have to be careful with VFI (i.e., iterating with T) in this setting because u is not assumed to be
bounded
• In fact typically unbounded both above and below — e.g. u(c) = log c
• In which case, the standard DP theory does not apply
• T^n v is not guaranteed to converge to the value function for arbitrary continuous bounded v
Nonetheless, we can always try the strategy “iterate and hope”
• In this case we can check the outcome by comparing with PFI
• The latter is known to converge, as described above
References
----------
http://quant-econ.net/py/ifp.html
"""
from textwrap import dedent
import numpy as np
from scipy.optimize import fminbound, brentq
from scipy import interp
class ConsumerProblem(object):
"""
A class for solving the income fluctuation problem. Iteration with
either the Coleman or Bellman operators from appropriate initial
conditions leads to convergence to the optimal consumption policy.
The income process is a finite state Markov chain. Note that the
Coleman operator is the preferred method, as it is almost always
faster and more accurate. The Bellman operator is only provided for
comparison.
Parameters
----------
r : scalar(float), optional(default=0.01)
A strictly positive scalar giving the interest rate
beta : scalar(float), optional(default=0.96)
The discount factor, must satisfy (1 + r) * beta < 1
Pi : array_like(float), optional(default=((0.60, 0.40), (0.05, 0.95)))
A 2D NumPy array giving the Markov matrix for {z_t}
z_vals : array_like(float), optional(default=(0.5, 0.95))
The state space of {z_t}
b : scalar(float), optional(default=0)
The borrowing constraint
grid_max : scalar(float), optional(default=16)
Max of the grid used to solve the problem
grid_size : scalar(int), optional(default=50)
Number of grid points to solve problem, a grid on [-b, grid_max]
u : callable, optional(default=np.log)
The utility function
Attributes
----------
r, beta, Pi, z_vals, b, u, du : see Parameters
asset_grid : np.ndarray
One dimensional grid for assets
"""
    def __init__(self, r=0.01, beta=0.96, Pi=((0.6, 0.4), (0.05, 0.95)),
                 z_vals=(0.5, 0.95), b=0, grid_max=16, grid_size=50,
                 u=np.log, du=lambda x: 1/x):
        self.u, self.du = u, du
        self.r, self.R = r, 1 + r
        self.beta, self.b = beta, b
        self.Pi, self.z_vals = np.array(Pi), np.asarray(z_vals)
        self.asset_grid = np.linspace(-b, grid_max, grid_size)

    def __repr__(self):
m = "ConsumerProblem(r={r:g}, beta={be:g}, Pi='{n:g} by {n:g}', "
m += "z_vals={z}, b={b:g}, grid_max={gm:g}, grid_size={gs:g}, "
m += "u={u}, du={du})"
return m.format(r=self.r, be=self.beta, n=self.Pi.shape[0],
z=self.z_vals, b=self.b,
gm=self.asset_grid.max(), gs=self.asset_grid.size,
u=self.u, du=self.du)
def __str__(self):
m = """
Consumer Problem (optimal savings):
- r (interest rate) : {r:g}
- beta (discount rate) : {be:g}
- Pi (transition matrix) : {n} by {n}
- z_vals (state space of shocks) : {z}
- b (borrowing constraint) : {b:g}
- grid_max (maximum of asset grid) : {gm:g}
- grid_size (number of points in asset grid) : {gs:g}
- u (utility function) : {u}
- du (marginal utility function) : {du}
"""
return dedent(m.format(r=self.r, be=self.beta, n=self.Pi.shape[0],
z=self.z_vals, b=self.b,
gm=self.asset_grid.max(),
gs=self.asset_grid.size, u=self.u,
du=self.du))
    def bellman_operator(self, V, return_policy=False):
        """
        The approximate Bellman operator, which computes and returns
        the updated value function TV (or the V-greedy policy c if
        return_policy is True).

        Parameters
----------
V : array_like(float)
A NumPy array of dim len(cp.asset_grid) times len(cp.z_vals)
return_policy : bool, optional(default=False)
Indicates whether to return the greed policy given V or the
updated value function TV. Default is TV.
Returns
-------
array_like(float)
Returns either the greed policy given V or the updated value
function TV.
"""
# === Simplify names, set up arrays === #
R, Pi, beta, u, b = self.R, self.Pi, self.beta, self.u, self.b
asset_grid, z_vals = self.asset_grid, self.z_vals
new_V = np.empty(V.shape)
new_c = np.empty(V.shape)
        z_idx = list(range(len(z_vals)))

        # === Linear interpolation of V along the asset grid === #
        vf = lambda a, i_z: interp(a, asset_grid, V[:, i_z])

        # === Solve r.h.s. of Bellman equation === #
        for i_a, a in enumerate(asset_grid):
            for i_z, z in enumerate(z_vals):
                def obj(c):  # objective function to be *minimized*
                    y = sum(vf(R * a + z - c, j) * Pi[i_z, j] for j in z_idx)
                    return - u(c) - beta * y
                c_star = fminbound(obj, np.min(z_vals), R * a + z + b)
                new_V[i_a, i_z] = - obj(c_star)
                new_c[i_a, i_z] = c_star

if return_policy:
return new_c
else:
return new_V
    def coleman_operator(self, c):
        """
        The approximate Coleman operator.

        Parameters
----------
c : array_like(float)
A NumPy array of dim len(cp.asset_grid) times len(cp.z_vals)
Returns
-------
array_like(float)
The updated policy, where updating is by the Coleman
operator. function TV.
"""
# === simplify names, set up arrays === #
R, Pi, beta, du, b = self.R, self.Pi, self.beta, self.du, self.b
asset_grid, z_vals = self.asset_grid, self.z_vals
z_size = len(z_vals)
gamma = R * beta
        vals = np.empty(z_size)

        # === Linear interpolation to get the consumption function === #
        def cf(a):
            "The array (c(a, z))_{z in z_vals}, by linear interpolation"
            for i in range(z_size):
                vals[i] = interp(a, asset_grid, c[:, i])
            return vals

        # === Solve the Euler equation (3.50) at each state pair === #
        Kc = np.empty(c.shape)
        for i_a, a in enumerate(asset_grid):
            for i_z, z in enumerate(z_vals):
                def h(t):
                    expectation = np.dot(du(cf(R * a + z - t)), Pi[i_z, :])
                    return du(t) - max(gamma * expectation, du(R * a + z + b))
                Kc[i_a, i_z] = brentq(h, np.min(z_vals), R * a + z + b)

return Kc
def initialize(self):
"""
Creates a suitable initial conditions V and c for value function
and policy function iteration respectively.
Returns
-------
V : array_like(float)
Initial condition for value function iteration
c : array_like(float)
Initial condition for Coleman operator iteration
"""
# === Simplify names, set up arrays === #
R, beta, u, b = self.R, self.beta, self.u, self.b
        asset_grid, z_vals = self.asset_grid, self.z_vals

        # === Create and populate arrays, using c = c_max as the policy === #
        shape = len(asset_grid), len(z_vals)
        V, c = np.empty(shape), np.empty(shape)
        for i_a, a in enumerate(asset_grid):
            for i_z, z in enumerate(z_vals):
                c_max = R * a + z + b
                c[i_a, i_z] = c_max
                V[i_a, i_z] = u(c_max) / (1 - beta)
return V, c
Exercises
Exercise 1 The first exercise is to replicate the following figure, which compares PFI and VFI as
solution methods
The figure shows consumption policies computed by iteration of K and T respectively
• In the case of iteration with T, the final value function is used to compute the observed policy
Consumption is shown as a function of assets with income z held fixed at its smallest value
The following details are needed to replicate the figure
• The parameters are the default parameters in the definition of ConsumerProblem
• The initial conditions are the default ones from initialize()
• Both operators are iterated 80 times
When you run your code you will observe that iteration with K is faster than iteration with T
In the IPython shell, a comparison of the operators can be made as follows
In [1]: run ifp.py
In [3]: cp = ConsumerProblem()
In [4]: v, c = cp.initialize()
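From here, one way to time a single application of each operator is IPython's %timeit magic (the exact timings will of course depend on your hardware):

In [5]: %timeit cp.bellman_operator(v)

In [6]: %timeit cp.coleman_operator(c)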
Exercise 2 Next let’s consider how the interest rate affects consumption
Reproduce the following figure, which shows (approximately) optimal consumption policies for
different interest rates
• Other than r, all parameters are at their default values
• r steps through np.linspace(0, 0.04, 4)
• Consumption is plotted against assets for income shock fixed at the smallest value
The figure shows that higher interest rates boost savings and hence suppress consumption
Exercise 3 Now let’s consider the long run asset levels held by households
We’ll take r = 0.03 and otherwise use default parameters
The following figure is a 45 degree diagram showing the law of motion for assets when consump-
tion is optimal
The green line and blue line represent the function
a′ = h(a, z) := Ra + z − c∗(a, z)
Exercise 4 Following on from exercises 2 and 3, let’s look at how savings and aggregate asset
holdings vary with the interest rate
• Note: [LS12] section 18.6 can be consulted for more background on the topic treated in this
exercise
For a given parameterization of the model, the mean of the stationary distribution can be inter-
preted as aggregate capital in an economy with a unit mass of ex-ante identical households facing
idiosyncratic shocks
Let’s look at how this measure of aggregate capital varies with the interest rate and borrowing
constraint
The next figure plots aggregate capital against the interest rate for b in (1, 3)
Solutions
Solution notebook
Contents
• Covariance Stationary Processes
– Overview
– Introduction
– Spectral Analysis
– Implementation
Overview
In this lecture we study covariance stationary linear stochastic processes, a class of models rou-
tinely used to study economic and financial time series
This class has the advantage of being
1. simple enough to be described by an elegant and comprehensive theory
2. relatively broad in terms of the kinds of dynamics it can represent
We consider these models in both the time and frequency domain
ARMA Processes We will focus much of our attention on linear covariance stationary models
with a finite number of parameters
In particular, we will study stationary ARMA processes, which form a cornerstone of the standard
theory of time series analysis
It’s well known that every ARMA process can be represented in linear state space form
However, ARMA processes have some important structure that makes it valuable to study them separately
Spectral Analysis Analysis in the frequency domain is also called spectral analysis
In essence, spectral analysis provides an alternative representation of the autocovariance of a co-
variance stationary process
Having a second representation of this important object
• shines new light on the dynamics of the process in question
• allows for a simpler, more tractable representation in certain important cases
The famous Fourier transform and its inverse are used to map between the two representations
Introduction
Example 1: White Noise Perhaps the simplest class of covariance stationary processes is the class of white noise processes
A process {et } is called a white noise process if
1. Eet = 0
2. γ(k) = σ² 1{k = 0} for some σ > 0
(Here 1{k = 0} is defined to be 1 if k = 0 and zero otherwise)
Example 2: General Linear Processes From the simple building block provided by white noise,
we can construct a very flexible family of covariance stationary processes — the general linear
processes
X_t = ∑_{j=0}^∞ ψ_j e_{t−j},   t ∈ Z   (3.56)
where
• {e_t} is white noise
• {ψ_t} is a square summable sequence in R (that is, ∑_{t=0}^∞ ψ_t² < ∞)
In this setting, the autocovariance function of {X_t} is
γ(k) = σ² ∑_{j=0}^∞ ψ_j ψ_{j+k}
By the Cauchy-Schwarz inequality one can show that the last expression is finite. Clearly it does not depend on t
Wold’s Decomposition Remarkably, the class of general linear processes goes a long way to-
wards describing the entire class of zero-mean covariance stationary processes
In particular, Wold’s theorem states that every zero-mean covariance stationary process { Xt } can
be written as
X_t = ∑_{j=0}^∞ ψ_j e_{t−j} + η_t
where
• {et } is white noise
• {ψt } is square summable
• ηt can be expressed as a linear function of Xt−1 , Xt−2 , . . . and is perfectly predictable over
arbitrarily long horizons
For intuition and further discussion, see [Sar87], p. 286
AR and MA General linear processes are a very broad class of processes, and it often pays to
specialize to those for which there exists a representation having only finitely many parameters
(In fact, experience shows that models with a relatively small number of parameters typically
perform better than larger models, especially for forecasting)
One very simple example of such a model is the AR(1) process
X_t = φ X_{t−1} + e_t   where   |φ| < 1   (3.57)
By direct substitution, it is easy to verify that X_t = ∑_{j=0}^∞ φ^j e_{t−j}, so {X_t} is a general linear process with ψ_j = φ^j
Applying (3.57) to the previous expression for X_t, we get the AR(1) autocovariance function
γ(k) = φ^k σ² / (1 − φ²),   k = 0, 1, . . .   (3.59)
The next figure plots this function for φ = 0.8 and φ = −0.8 with σ = 1
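A figure along these lines can be reproduced in a few lines of code; here is a sketch (the grid of lags shown is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

sigma = 1
k = np.arange(15)
fig, axes = plt.subplots(2, 1)
for ax, phi in zip(axes, (0.8, -0.8)):
    gamma = phi**k * sigma**2 / (1 - phi**2)   # Equation (3.59)
    ax.stem(k, gamma)
    ax.set_title(r'autocovariance, $\phi = {}$'.format(phi))
plt.show()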
Another very simple process is the MA(1) process X_t = e_t + θ e_{t−1}
The AR(1) can be generalized to an AR(p) and likewise for the MA(1)
Putting all of this together, we get the ARMA(p, q) process
X_t = φ_1 X_{t−1} + · · · + φ_p X_{t−p} + e_t + θ_1 e_{t−1} + · · · + θ_q e_{t−q}   (3.60)
where {e_t} is white noise
A useful shorthand is obtained via the lag operator L, defined by L^k X_t := X_{t−k}; it turns out that
• algebraic manipulations treating the lag operator as an ordinary scalar often are legitimate
Using L, we can rewrite (3.60) as
L0 Xt − φ1 L1 Xt − · · · − φ p L p Xt = L0 et + θ1 L1 et + · · · + θq Lq et (3.61)
If we let φ(z) and θ(z) be the polynomials
φ(z) := 1 − φ_1 z − · · · − φ_p z^p   and   θ(z) := 1 + θ_1 z + · · · + θ_q z^q   (3.62)
then (3.61) simplifies further to
φ(L) X_t = θ(L) e_t   (3.63)
Provided the roots of φ(z) lie outside the unit circle, {X_t} has a general linear process representation X_t = ψ(L) e_t with ψ(z) := θ(z)/φ(z)
The sequence {ψ_t} can be obtained by a recursive procedure outlined on page 79 of [CC08]
In this context, the function t 7→ ψt is often called the impulse response function
Spectral Analysis
Autocovariance functions provide a great deal of information about covariance stationary pro-
cesses
In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire
joint distribution
Even for non-Gaussian processes, it provides a significant amount of information
It turns out that there is an alternative representation of the autocovariance function of a covari-
ance stationary process, called the spectral density
At times, the spectral density is easier to derive, easier to manipulate and provides additional
intuition
Complex Numbers Before discussing the spectral density, we invite you to recall the main prop-
erties of complex numbers (or skip to the next section)
It can be helpful to remember that, in a formal sense, complex numbers are just points ( x, y) ∈ R2
endowed with a specific notion of multiplication
When ( x, y) is regarded as a complex number, x is called the real part and y is called the imaginary
part
The modulus or absolute value of a complex number z = ( x, y) is just its Euclidean norm in R2 , but
is usually written as |z| instead of kzk
The product of two complex numbers ( x, y) and (u, v) is defined to be ( xu − vy, xv + yu), while
addition is standard pointwise vector addition
When endowed with these notions of multiplication and addition, the set of complex numbers
forms a field — addition and multiplication play well together, just as they do in R
The complex number ( x, y) is often written as x + iy, where i is called the imaginary unit, and is
understood to obey i2 = −1
The x + iy notation can be thought of as an easy way to remember the definition of multiplication
given above, because, proceeding naively,
(x + iy)(u + iv) = xu − yv + i(xv + yu)
Converted back to our first notation, this becomes ( xu − vy, xv + yu), which is the same as the
product of ( x, y) and (u, v) from our previous definition
Complex numbers are also sometimes expressed in their polar form reiω , which should be inter-
preted as
reiω := r (cos(ω ) + i sin(ω ))
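These identities are easy to experiment with, since Python has complex numbers built in. A quick check of the polar representation:

import cmath

z = complex(1, 1)                  # The point (1, 1), i.e., 1 + 1j
r, omega = abs(z), cmath.phase(z)  # Modulus and angle
print(r * cmath.exp(1j * omega))   # Recovers (approximately) 1 + 1j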
Spectral Densities Let {X_t} be a covariance stationary process with autocovariance function γ
The spectral density f of {X_t} is defined as
f(ω) := ∑_{k ∈ Z} γ(k) e^{iωk},   ω ∈ R
(Some authors normalize the expression on the right by constants such as 1/π — the chosen convention makes little difference provided you are consistent)
Using the fact that γ is even, in the sense that γ(t) = γ(−t) for all t, you should be able to show
that
f(ω) = γ(0) + 2 ∑_{k ≥ 1} γ(k) cos(ωk)   (3.64)
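For autocovariance functions that decay quickly, (3.64) can be evaluated to good accuracy by truncating the sum. Here is a sketch, using the AR(1) autocovariances from (3.59):

import numpy as np

def spectral_density(gamma, omegas, K=200):
    "Evaluate f via (3.64), truncating the sum at lag K"
    k = np.arange(1, K)
    return np.array([gamma(0) + 2 * np.sum(gamma(k) * np.cos(w * k))
                     for w in omegas])

phi, sigma = 0.8, 1
gamma = lambda k: phi**k * sigma**2 / (1 - phi**2)   # From (3.59)
omegas = np.linspace(0, np.pi, 100)
f_vals = spectral_density(gamma, omegas)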
Example 1: White Noise Consider a white noise process {et } with standard deviation σ
It is simple to check that in this case we have f(ω) = σ². In particular, f is a constant function
As we will see, this can be interpreted as meaning that “all frequencies are equally present”
(White light has this property when frequency refers to the visible spectrum, a connection that
provides the origins of the term “white noise”)
Example 2: AR, MA and ARMA It is an exercise to show that the MA(1) process
Xt = θet−1 + et has spectral density
f(ω) = σ² (1 + 2θ cos(ω) + θ²)   (3.65)
With a bit more effort, it’s possible to show (see, e.g., p. 261 of [Sar87]) that the spectral density of
the AR(1) process Xt = φXt−1 + et is
f(ω) = σ² / (1 − 2φ cos(ω) + φ²)   (3.66)
More generally, it can be shown that the spectral density of the ARMA process (3.60) is
f(ω) = | θ(e^{iω}) / φ(e^{iω}) |² σ²   (3.67)
where
• σ is the standard deviation of the white noise process {et }
• the polynomials φ(·) and θ (·) are as defined in (3.62)
The derivation of (3.67) uses the fact that convolutions become products under Fourier transfor-
mations
The proof is elegant and can be found in many places — see, for example, [Sar87], chapter 11,
section 4
It’s a nice exercise to verify that (3.65) and (3.66) are indeed special cases of (3.67)
Interpreting the Spectral Density Plotting (3.66) reveals the shape of the spectral density for the
AR(1) model when φ takes the values 0.8 and -0.8 respectively
These spectral densities correspond to the autocovariance functions for the AR(1) process shown
above
Informally, we think of the spectral density as being large at those ω ∈ [0, π ] such that the autoco-
variance function exhibits significant cycles at this “frequency”
To see the idea, let’s consider why, in the lower panel of the preceding figure, the spectral density
for the case φ = −0.8 is large at ω = π
Recall that the spectral density can be expressed as
f(ω) = γ(0) + 2 ∑_{k ≥ 1} γ(k) cos(ωk) = γ(0) + 2γ(0) ∑_{k ≥ 1} (−0.8)^k cos(ωk)   (3.68)
When we evaluate this at ω = π, we get a large number because cos(πk ) is large and positive
when (−0.8)k is positive, and large in absolute value and negative when (−0.8)k is negative
Hence the product is always large and positive, and hence the sum of the products on the right-
hand side of (3.68) is large
These ideas are illustrated in the next figure, which has k on the horizontal axis
On the other hand, if we evaluate f (ω ) at ω = π/3, then the cycles are not matched, the sequence
γ(k ) cos(ωk ) contains both positive and negative terms, and hence the sum of these terms is much
smaller
In summary, the spectral density is large at frequencies ω where the autocovariance function ex-
hibits cycles
Inverting the Transformation We have just seen that the spectral density is useful in the sense
that it provides a frequency-based perspective on the autocovariance structure of a covariance
stationary process
Another reason that the spectral density is useful is that it can be “inverted” to recover the auto-
covariance function via the inverse Fourier transform
In particular, for all k ∈ Z, we have
γ(k) = (1/2π) ∫_{−π}^{π} f(ω) e^{iωk} dω   (3.69)
This is convenient in situations where the spectral density is easier to calculate and manipulate
than the autocovariance function
(For example, the expression (3.67) for the ARMA spectral density is much easier to work with
than the expression for the ARMA autocovariance)
Mathematical Theory This section is loosely based on [Sar87], p. 249-253, and included for those who want a bit more insight into the mathematics behind spectral densities
The key background fact is that, given any orthonormal basis {h_k} of a Hilbert space, every element f of the space can be represented as
f = ∑_k α_k h_k   where   α_k := ⟨f, h_k⟩   (3.70)
In particular, taking h_k(ω) = e^{iωk}/√(2π), the transform T that maps a square summable sequence γ ∈ ℓ² into this basis representation satisfies
Tγ = ∑_{k ∈ Z} γ(k) e^{iωk} / √(2π) = (1/√(2π)) f(ω)   (3.71)
In other words, apart from a scalar multiple, the spectral density is just a transformation of γ ∈ ℓ² under a certain linear isometry — a different way to view γ
Implementation
Most code for working with covariance stationary models deals with ARMA models
Python code for studying ARMA models can be found in the tsa submodule of statsmodels
Since this code doesn’t quite cover our needs — particularly vis-a-vis spectral analysis — we’ve put together the module arma.py, which is part of the QuantEcon.py package.
The module provides functions for mapping ARMA(p, q) models into their
1. impulse response function
2. simulated time series
3. autocovariance function
4. spectral density
In addition to individual plots of these entities, we provide functionality to generate 2x2 plots
containing all this information
In other words, we want to replicate the plots on pages 68–69 of [LS12]
Here’s an example corresponding to the model Xt = 0.5Xt−1 + et − 0.8et−2
"""
Filename: arma.py

Provides functions for working with and visualizing scalar ARMA processes.
"""
import numpy as np
from numpy import conj, pi
import matplotlib.pyplot as plt
from scipy.signal import dimpulse, freqz, dlsim
class ARMA(object):
r"""
This class represents scalar ARMA(p, q) processes.
    .. math::

        X_t = \phi_1 X_{t-1} + ... + \phi_p X_{t-p} +
            \epsilon_t + \theta_1 \epsilon_{t-1} + ... + \theta_q \epsilon_{t-q}

    where :math:`\epsilon_t` is a white noise process with standard
    deviation :math:`\sigma`

    Parameters
----------
phi : scalar or iterable or array_like(float)
Autocorrelation values for the autocorrelated variable.
        See above for explanation.
    theta : scalar or iterable or array_like(float)
        Moving average coefficients.  See above for explanation.
    sigma : scalar(float)
        The standard deviation of the white noise process
Attributes
----------
phi, theta, sigma : see Parameters
ar_poly : array_like(float)
The polynomial form that is needed by scipy.signal to do the
processing we desire. Corresponds with the phi values
ma_poly : array_like(float)
The polynomial form that is needed by scipy.signal to do the
processing we desire. Corresponds with the theta values
"""
def __repr__(self):
m = "ARMA(phi=%s , theta=%s , sigma=%s )"
return m % (self.phi, self.theta, self.sigma)
def __str__(self):
m = "An ARMA({p}, {q}) process"
p = np.asarray(self.phi).size
q = np.asarray(self.theta).size
return m.format(p=p, q=q)
if rhs[0] == "+":
rhs = rhs[1:] # remove initial `+` if phi_1 was positive
@property
def phi(self):
return self._phi
@phi.setter
def phi(self, new_value):
self._phi = new_value
self.set_params()
@property
def theta(self):
return self._theta
@theta.setter
def theta(self, new_value):
self._theta = new_value
self.set_params()
def set_params(self):
r"""
        Internally, scipy.signal works with systems of the form

        .. math::

            ar_poly(L) X_t = ma_poly(L) \epsilon_t

        where L is the lag operator.  To match this, we set

        .. math::

            ar_poly = (1, -\phi_1, -\phi_2, ..., -\phi_p)

            ma_poly = (1, \theta_1, \theta_2, ..., \theta_q)
        """
# === set up ma_poly === #
ma_poly = np.asarray(self._theta)
        self.ma_poly = np.insert(ma_poly, 0, 1)    # The array (1, theta)

        # === set up ar_poly === #
        ar_poly = -np.asarray(self._phi)
        self.ar_poly = np.insert(ar_poly, 0, 1)    # The array (1, -phi)
    def impulse_response(self, impulse_length=30):
        """
        Get the impulse response corresponding to our model.

        Returns
-------
psi : array_like(float)
psi[j] is the response at lag j of the impulse response.
We take psi[0] as unity.
"""
sys = self.ma_poly, self.ar_poly, 1
times, psi = dimpulse(sys, n=impulse_length)
psi = psi[0].flatten() # Simplify return value into flat array
return psi
    def spectral_density(self, two_pi=True, res=1200):
        r"""
        Compute the spectral density function.  The spectral density is
        the discrete time Fourier transform of the autocovariance
        function.  In particular,

        .. math::

            f(w) = \sum_k \gamma(k) \exp(-ikw)

        where gamma is the autocovariance function and the sum is over
        the set of all integers.
Parameters
----------
two_pi : Boolean, optional
Compute the spectral density function over [0, pi] if
two_pi is False and [0, 2 pi] otherwise. Default value is
True
res : scalar or array_like(int), optional(default=1200)
If res is a scalar then the spectral density is computed at
`res` frequencies evenly spaced around the unit circle, but
if res is an array then the function computes the response
at the frequencies given by the array
Returns
-------
w : array_like(float)
The normalized frequencies at which h was computed, in
radians/sample
spect : array_like(float)
The frequency response
"""
w, h = freqz(self.ma_poly, self.ar_poly, worN=res, whole=two_pi)
spect = h * conj(h) * self.sigma**2
return w, spect
    def autocovariance(self, num_autocov=16):
        """
        Compute the autocovariance function over the integers
        range(num_autocov), using the spectral density and the inverse
        Fourier transform.

        Parameters
----------
num_autocov : scalar(int), optional(default=16)
The number of autocovariances to calculate
"""
spect = self.spectral_density()[1]
        acov = np.fft.ifft(spect).real
        # num_autocov should be <= len(acov) / 2
        return acov[:num_autocov]
    def simulation(self, ts_length=90):
        """
        Compute a simulated sample path assuming Gaussian shocks.

        Parameters
----------
ts_length : scalar(int), optional(default=90)
Number of periods to simulate for
Returns
-------
vals : array_like(float)
A simulation of the model that corresponds to this class
"""
sys = self.ma_poly, self.ar_poly, 1
u = np.random.randn(ts_length, 1) * self.sigma
vals = dlsim(sys, u)[1]
return vals.flatten()
    def plot_impulse_response(self, ax=None, show=True):
        if ax is None:
            ax = plt.gca()
        ax.set_title('Impulse response')
        yi = self.impulse_response()
        ax.stem(list(range(len(yi))), yi)
        ax.set_xlabel('time')
ax.set_ylabel('response')
if show:
plt.show()
def quad_plot(self):
"""
Plots the impulse response, spectral_density, autocovariance,
and one realization of the process.
"""
num_rows, num_cols = 2, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 8))
plt.subplots_adjust(hspace=0.4)
plot_functions = [self.plot_impulse_response,
self.plot_spectral_density,
self.plot_autocovariance,
self.plot_simulation]
for plot_func, ax in zip(plot_functions, axes.flatten()):
plot_func(ax, show=False)
plt.show()
Once an instance lp of the ARMA class has been created, the figure shown above is generated via
In [5]: lp.quad_plot()
If phi and theta are arrays or sequences, then the interpretation will be
• phi holds the vector of parameters (φ1 , φ2 , ..., φ p )
• theta holds the vector of parameters (θ1 , θ2 , ..., θq )
The parameter sigma is always a scalar, the standard deviation of the white noise
We also permit phi and theta to be scalars, in which case the model will be interpreted as
Xt = φXt−1 + et + θet−1
The two numerical packages most useful for working with ARMA models are scipy.signal and
numpy.fft
The package scipy.signal expects the parameters to be passed in to its functions in a manner
consistent with the alternative ARMA notation (3.63)
For example, the impulse response sequence {ψt } discussed above can be obtained using
scipy.signal.dimpulse, and the function call should be of the form
times, psi = dimpulse((ma_poly, ar_poly, 1), n=impulse_length)
where ma_poly and ar_poly correspond to the polynomials in (3.62) — that is,
• ma_poly is the vector (1, θ1 , θ2 , . . . , θq )
• ar_poly is the vector (1, −φ1 , −φ2 , . . . , −φ p )
To this end, we also maintain the arrays ma_poly and ar_poly as instance data, with their values
computed automatically from the values of phi and theta supplied by the user
If the user decides to change the value of either phi or theta ex-post by assignments such as
lp.phi = (0.5, 0.2)
lp.theta = (0, -0.1)
then ma_poly and ar_poly should update automatically to reflect these new parameters
This is achieved in our implementation by using descriptors
Computing the Autocovariance Function As discussed above, for ARMA processes the spectral
density has a simple representation that is relatively easy to calculate
Given this fact, the easiest way to obtain the autocovariance function is to recover it from the
spectral density via the inverse Fourier transform
Here we use NumPy’s Fourier transform package np.fft, which wraps a standard Fortran-based
package called FFTPACK
A look at the np.fft documentation shows that the inverse transform np.fft.ifft takes a given se-
quence A0 , A1 , . . . , An−1 and returns the sequence a0 , a1 , . . . , an−1 defined by
a_k = (1/n) ∑_{t=0}^{n−1} A_t e^{ik2πt/n}
Thus, if we set At = f (ωt ), where f is the spectral density and ωt := 2πt/n, then
Thus, if we set A_t = f(ω_t), where f is the spectral density and ω_t := 2πt/n, then
a_k = (1/n) ∑_{t=0}^{n−1} f(ω_t) e^{iω_t k} = (1/2π) (2π/n) ∑_{t=0}^{n−1} f(ω_t) e^{iω_t k},   ω_t := 2πt/n
For n sufficiently large, we then have
a_k ≈ (1/2π) ∫_0^{2π} f(ω) e^{iωk} dω = (1/2π) ∫_{−π}^{π} f(ω) e^{iωk} dω
(You can check the last equality)
In view of (3.69) we have now shown that, for n sufficiently large, ak ≈ γ(k ) — which is exactly
what we want to compute
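A quick numerical check of this claim, comparing the inverse FFT of the AR(1) spectral density (3.66) with the closed-form autocovariances (3.59), might look as follows:

import numpy as np

n, phi, sigma = 2**10, 0.8, 1
omegas = 2 * np.pi * np.arange(n) / n
f = sigma**2 / (1 - 2 * phi * np.cos(omegas) + phi**2)   # Equation (3.66)
a = np.fft.ifft(f).real                                  # a_k approx gamma(k)
gamma = phi**np.arange(5) * sigma**2 / (1 - phi**2)      # Equation (3.59)
print(np.max(np.abs(a[:5] - gamma)))                     # Small for large n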
Contents
• Estimation of Spectra
– Overview
– Periodograms
– Smoothing
– Exercises
– Solutions
Overview
Periodograms
Recall that the spectral density f of a covariance stationary process with autocovariance function
γ can be written as
f(ω) = γ(0) + 2 ∑_{k ≥ 1} γ(k) cos(ωk),   ω ∈ R
Now consider the problem of estimating the spectral density of a given time series, when γ is
unknown
In particular, let X0 , . . . , Xn−1 be n consecutive observations of a single time series that is assumed
to be covariance stationary
The most common estimator of the spectral density of this process is the periodogram of
X0 , . . . , Xn−1 , which is defined as
I(ω) := (1/n) | ∑_{t=0}^{n−1} X_t e^{itω} |²,   ω ∈ R   (3.72)
It is straightforward to show that the function I is even and 2π-periodic (i.e., I (ω ) = I (−ω ) and
I (ω + 2π ) = I (ω ) for all ω ∈ R)
From these two results, you will be able to verify that the values of I on [0, π ] determine the values
of I on all of R
The next section helps to explain the connection between the periodogram and the spectral density
Interpretation To interpret the periodogram, it is convenient to focus on its values at the Fourier
frequencies
ω_j := 2πj / n,   j = 0, . . . , n − 1
In what sense is I (ω j ) an estimate of f (ω j )?
The answer is straightforward, although it does involve some algebra
With a bit of effort one can show that, for any integer j > 0,
∑_{t=0}^{n−1} e^{itω_j} = ∑_{t=0}^{n−1} exp(i2πjt/n) = 0
Now let
γ̂(k) := (1/n) ∑_{t=k}^{n−1} (X_t − X̄)(X_{t−k} − X̄),   k = 0, 1, . . . , n − 1
where X̄ denotes the sample mean (1/n) ∑_{t=0}^{n−1} X_t
This is the sample autocovariance function, the natural “plug-in estimator” of the autocovariance
function γ
(“Plug-in estimator” is an informal term for an estimator found by replacing expectations with
sample means)
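In code, the sample autocovariance function is only a few lines; here is a minimal sketch:

import numpy as np

def sample_autocov(X, k):
    "Sample autocovariance at lag k, as defined above"
    n, X_bar = len(X), X.mean()
    return np.sum((X[k:] - X_bar) * (X[:n-k] - X_bar)) / n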
With this notation, we can now write
I(ω_j) = γ̂(0) + 2 ∑_{k=1}^{n−1} γ̂(k) cos(ω_j k)
Recalling our expression for f given above, we see that I (ω j ) is just a sample analog of f (ω j )
Calculation Let’s now consider how to compute the periodogram as defined in (3.72)
There are already functions available that will do this for us — an example is
statsmodels.tsa.stattools.periodogram in the statsmodels package
However, it is very simple to replicate their results, and this will give us a platform to make useful
extensions
The most common way to calculate the periodogram is via the discrete Fourier transform, which
in turn is implemented through the fast Fourier transform algorithm
In general, given a sequence a0 , . . . , an−1 , the discrete Fourier transform computes the sequence
A_j := ∑_{t=0}^{n−1} a_t exp(i2πtj/n),   j = 0, . . . , n − 1
With numpy.fft.fft imported as fft and a0 , . . . , an−1 stored in NumPy array a, the function call
fft(a) returns the values A0 , . . . , An−1 as a NumPy array
It follows that, when the data X_0, . . . , X_{n−1} are stored in array X, the values I(ω_j) at the Fourier frequencies, which are given by
(1/n) | ∑_{t=0}^{n−1} X_t exp(i2πtj/n) |²,   j = 0, . . . , n − 1
can be computed by np.abs(fft(X))**2 / len(X)
Here is a function that puts all this together, truncating the return values to the interval [0, π]
def periodogram(x):
"Argument x is a NumPy array containing the time series data"
n = len(x)
I_w = np.abs(fft(x))**2 / n
w = 2 * np.pi * np.arange(n) / n # Fourier frequencies
w, I_w = w[:int(n/2)], I_w[:int(n/2)] # Truncate to interval [0, pi]
return w, I_w
Let’s generate some data for this function using the ARMA class from QuantEcon
(See the lecture on linear processes for details on this class)
Here’s a code snippet that, once the preceding code has been run, generates data from the process
X_t = 0.5 X_{t−1} + e_t − 0.8 e_{t−2}   (3.73)
where {e_t} is Gaussian white noise, and plots the resulting periodogram
n = 40 # Data size
phi, theta = 0.5, (0, -0.8) # AR and MA parameters
lp = ARMA(phi, theta)
X = lp.simulation(ts_length=n)
fig, ax = plt.subplots()
x, y = periodogram(X)
ax.plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
This estimate looks rather disappointing, but the data size is only 40, so perhaps it’s not surprising
that the estimate is poor
However, if we try again with n = 1200 the outcome is not much better
The periodogram is far too irregular relative to the underlying spectral density
This brings us to our next topic
Smoothing
The standard method of smoothing a periodogram is to take a local weighted average, replacing the value I(ω_j) with
∑_{ℓ=−p}^{p} w(ℓ) I(ω_{j+ℓ})   (3.74)
which is a weighted average over the "local window" of periodogram values
I(ω_{j−p}), I(ω_{j−p+1}), . . . , I(ω_j), . . . , I(ω_{j+p})
where the weights w(−p), . . . , w(p) are a sequence of 2p + 1 nonnegative values summing to one
In general, larger values of p indicate more smoothing — more on this below
The next figure shows the kind of sequence typically used
Note the smaller weights towards the edges and larger weights in the center, so that more distant
values from I (ω j ) have less weight than closer ones in the sum (3.74)
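Weights with this shape can be generated by standard window functions. Here is a hedged sketch of the local averaging in (3.74), using a Hann window and np.convolve on some placeholder data:

import numpy as np

p = 6
w = np.hanning(2 * p + 1)   # Bell-shaped weights, zero at the endpoints
w = w / w.sum()             # Normalize so the weights sum to one
I_vals = np.random.uniform(size=100)         # Placeholder periodogram values
smoothed = np.convolve(I_vals, w, mode='same')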
Estimation with Smoothing Our next step is to provide code that will not only estimate the
periodogram but also provide smoothing as required
Such functions have been written in estspec.py and are available via QuantEcon.py
The file estspec.py is printed below
"""
Filename: estspec.py
"""
import numpy as np
from numpy.fft import fft
from pandas import ols, Series


def smooth(x, window_len=7, window='hanning'):
    """
    Smooth the data in x using convolution with a window of requested
    size and type.

    Parameters
----------
x : array_like(float)
A flat NumPy array containing the data to smooth
window_len : scalar(int), optional
An odd integer giving the length of the window. Defaults to 7.
window : string
A string giving the window type. Possible values are 'flat',
'hanning', 'hamming', 'bartlett' or 'blackman'
Returns
-------
array_like(float)
The smoothed values
Notes
-----
Application of the smoothing window at the top and bottom of x is
done by reflecting x around these points to extend it sufficiently
in each direction.
"""
if len(x) < window_len:
raise ValueError("Input vector length must be >= window length.")
if window_len < 3:
raise ValueError("Window length must be at least 3.")
    # === Construct the window values === #
    if window == 'flat':   # Moving average
        w = np.ones(window_len)
    else:
        w = getattr(np, window)(window_len)

    # === Reflect x around x[0] and x[-1] prior to convolution === #
    k = int(window_len / 2)
    xb = x[:k]    # First k elements
    xt = x[-k:]   # Last k elements
    s = np.concatenate((xb[::-1], x, xt[::-1]))

    # === Convolve and return the centered portion === #
    return np.convolve(w / w.sum(), s, mode='valid')


def periodogram(x, window=None, window_len=7):
    r"""
    Computes the periodogram

    .. math::

        I(w) = (1 / n) | sum_{t=0}^{n-1} x_t e^{itw} |^2
Parameters
----------
x : array_like(float)
A flat NumPy array containing the data to smooth
window_len : scalar(int), optional(default=7)
An odd integer giving the length of the window. Defaults to 7.
window : string
A string giving the window type. Possible values are 'flat',
'hanning', 'hamming', 'bartlett' or 'blackman'
Returns
-------
w : array_like(float)
Fourier frequences at which periodogram is evaluated
I_w : array_like(float)
Values of periodogram at the Fourier frequences
"""
n = len(x)
I_w = np.abs(fft(x))**2 / n
w = 2 * np.pi * np.arange(n) / n # Fourier frequencies
w, I_w = w[:int(n/2)+1], I_w[:int(n/2)+1] # Take only values on [0, pi]
if window:
I_w = smooth(I_w, window_len=window_len, window=window)
return w, I_w
def ar_periodogram(x, window='hanning', window_len=7):
    """
    Compute periodogram from data x, using prewhitening, smoothing and
    recoloring.  The data is fitted to an AR(1) model for prewhitening,
    and the residuals are used to compute a first-pass periodogram with
    smoothing.  The fitted coefficients are then used for recoloring.

    Parameters
----------
x : array_like(float)
A flat NumPy array containing the data to smooth
window_len : scalar(int), optional
An odd integer giving the length of the window. Defaults to 7.
window : string
A string giving the window type. Possible values are 'flat',
'hanning', 'hamming', 'bartlett' or 'blackman'
Returns
-------
w : array_like(float)
Fourier frequences at which periodogram is evaluated
I_w : array_like(float)
Values of periodogram at the Fourier frequences
"""
# === run regression === #
x_current, x_lagged = x[1:], x[:-1] # x_t and x_{t-1}
x_current, x_lagged = Series(x_current), Series(x_lagged) # pandas series
results = ols(y=x_current, x=x_lagged, intercept=True, nw_lags=1)
e_hat = results.resid.values
    phi = results.beta['x']

    # === compute periodogram on the prewhitened residuals === #
    w, I_w = periodogram(e_hat, window=window, window_len=window_len)

    # === recolor, using the estimated AR(1) transfer function === #
    I_w = I_w / np.abs(1 - phi * np.exp(1j * w))**2

return w, I_w
The listing displays three functions, smooth(), periodogram(), ar_periodogram(). We will dis-
cuss the first two here and the third one below
The periodogram() function returns a periodogram, optionally smoothed via the smooth() func-
tion
Regarding the smooth() function, since smoothing adds a nontrivial amount of computation, we
have applied a fairly terse array-centric method based around np.convolve
Readers are left to either explore or simply use this code according to their interests
The next three figures each show smoothed and unsmoothed periodograms, as well as the true
spectral density
(The model is the same as before — see equation (3.73) — and there are 400 observations)
From top figure to bottom, the window length is varied from small to large
In looking at the figure, we can see that for this model and data size, the window length chosen in
the middle figure provides the best fit
Relative to this value, the first window length provides insufficient smoothing, while the third
gives too much smoothing
Of course in real estimation problems the true spectral density is not visible and the choice of
appropriate smoothing will have to be made based on judgement/priors or some other theory
Pre-Filtering and Smoothing In the code listing above we showed three functions from the file
estspec.py
The third function in the file (ar_periodogram()) adds a pre-processing step to periodogram
smoothing
First we describe the basic idea, and after that we give the code
The essential idea is to
1. Transform the data in order to make estimation of the spectral density more efficient
2. Compute the periodogram associated with the transformed data
3. Reverse the effect of the transformation on the periodogram, so that it now estimates the
spectral density of the original process
Step 1 is called pre-filtering or pre-whitening, while step 3 is called recoloring
The first step is called pre-whitening because the transformation is usually designed to turn the
data into something closer to white noise
Why would this be desirable in terms of spectral density estimation?
The reason is that we are smoothing our estimated periodogram based on estimated values at
nearby points — recall (3.74)
The underlying assumption that makes this a good idea is that the true spectral density is rela-
tively regular — the value of I (ω ) is close to that of I (ω 0 ) when ω is close to ω 0
This will not be true in all cases, but it is certainly true for white noise
For white noise, I is as regular as possible — it is a constant function
In this case, values of I(ω′) at points ω′ near to ω provide the maximum possible amount of
information about the value I(ω)
Another way to put this is that if I is relatively constant, then we can use a large amount of
smoothing without introducing too much bias
The AR(1) Setting Let’s examine this idea more carefully in a particular setting — where the
data is assumed to be AR(1)
(More general ARMA settings can be handled using similar techniques to those described below)
Suppose in particular that {X_t} is covariance stationary and AR(1), with
X_{t+1} = μ + φX_t + e_{t+1}   (3.75)
where μ and φ ∈ (−1, 1) are unknown parameters and {e_t} is white noise
It follows that if we regress Xt+1 on Xt and an intercept, the residuals will approximate white
noise
Let
• g be the spectral density of {et } — a constant function, as discussed above
• I0 be the periodogram estimated from the residuals — an estimate of g
• f be the spectral density of { Xt } — the object we are trying to estimate
In view of an earlier result we obtained while discussing ARMA processes, f and g are related by
f(ω) = | 1 / (1 − φ e^{iω}) |² g(ω)   (3.76)
This suggests that the recoloring step, which constructs an estimate I of f from I0 , should set
I(ω) = | 1 / (1 − φ̂ e^{iω}) |² I_0(ω)
where φ̂ is the OLS estimate of φ
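The estspec.py listing above implements this with pandas' ols. An equivalent sketch using NumPy only, and assuming the periodogram function defined earlier is in scope:

import numpy as np

def ar_periodogram_sketch(x, window='hanning', window_len=7):
    "Prewhiten by an AR(1) regression, smooth, then recolor via (3.76)"
    # Step 1: regress x_t on x_{t-1} and an intercept
    phi, intercept = np.polyfit(x[:-1], x[1:], 1)
    e_hat = x[1:] - intercept - phi * x[:-1]   # Residuals, approx white noise
    # Step 2: smoothed periodogram of the residuals
    w, I0 = periodogram(e_hat, window=window, window_len=window_len)
    # Step 3: recolor using the estimated AR(1) transfer function
    I_w = I0 / np.abs(1 - phi * np.exp(1j * w))**2
    return w, I_w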
The periodograms are calculated from time series drawn from (3.75) with µ = 0 and φ = −0.9
Each time series is of length 150
The difference between the three subfigures is just randomness — each one uses a different draw
of the time series
In all cases, periodograms are fit with the “hamming” window and window length of 65
Overall, the fit of the AR smoothed periodogram is much better, in the sense of being closer to the
true spectral density
Exercises
Exercise 3 To be written. The exercise will be to use the code from this lecture to download FRED
data and generate periodograms for different kinds of macroeconomic data.
Solutions
Solution notebook
3.10 Robustness
Contents
• Robustness
– Overview
– The Model
– Constructing More Robust Policies
– Robustness as Outcome of a Two-Person Zero-Sum Game
– The Stochastic Case
– Implementation
– Application
– Appendix
Overview
This lecture modifies a Bellman equation to express a decision maker’s doubts about transition
dynamics
His specification doubts make the decision maker want a robust decision rule
Robust means insensitive to misspecification of transition dynamics
The decision maker has a single approximating model
He calls it approximating to acknowledge that he doesn’t completely trust it
He fears that outcomes will actually be determined by another model that he cannot describe
explicitly
All that he knows is that the actual data-generating model is in some (uncountable) set of models
that surrounds his approximating model
He quantifies the discrepancy between his approximating model and the genuine data-generating
model by using a quantity called entropy
(We’ll explain what entropy means below)
He wants a decision rule that will work well enough no matter which of those other models actu-
ally governs outcomes
This is what it means for his decision rule to be “robust to misspecification of an approximating
model”
This may sound like too much to ask for, but . . .
. . . a secret weapon is available to design robust decision rules
The secret weapon is max-min control theory
A value-maximizing decision maker enlists the aid of an (imaginary) value-minimizing model
chooser to construct bounds on the value attained by a given decision rule under different models
of the transition dynamics
The original decision maker uses those bounds to construct a decision rule with an assured per-
formance level, no matter which model actually governs outcomes
Note: In reading this lecture, please don’t think that our decision maker is paranoid when he
conducts a worst-case analysis. By designing a rule that works well against a worst-case, his
intention is to construct a rule that will work well across a set of models.
Sets of Models Imply Sets Of Values Our “robust” decision maker wants to know how well a
given rule will work when he does not know a single transition law . . .
. . . he wants to know sets of values that will be attained by a given decision rule F under a set of
transition laws
Ultimately, he wants to design a decision rule F that shapes these sets of values in ways that he
prefers
With this in mind, consider the following graph, which relates to a particular decision problem to
be explained below
The figure shows a value-entropy correspondence for a particular decision rule F
The shaded set is the graph of the correspondence, which maps entropy to a set of values associ-
ated with a set of models that surround the decision maker’s approximating model
Here
• Value refers to a sum of discounted rewards obtained by applying the decision rule F when
the state starts at some fixed initial state x0
• Entropy is a nonnegative number that measures the size of a set of models surrounding the
decision maker’s approximating model
– Entropy is zero when the set includes only the approximating model, indicating that
the decision maker completely trusts the approximating model
– Entropy is bigger, and the set of surrounding models is bigger, the less the decision
maker trusts the approximating model
The shaded region indicates that for all models having entropy less than or equal to the number
on the horizontal axis, the value obtained will be somewhere within the indicated set of values
Now let’s compare sets of values associated with two different decision rules, Fr and Fb
In the next figure,
• The red set shows the value-entropy correspondence for decision rule Fr
• The blue set shows the value-entropy correspondence for decision rule Fb
The blue correspondence is skinnier than the red correspondence
This conveys the sense in which the decision rule Fb is more robust than the decision rule Fr
• more robust means that the set of values is less sensitive to increasing misspecification as mea-
sured by entropy
Notice that the less robust rule Fr promises higher values for small misspecifications (small en-
tropy)
(But it is more fragile in the sense that it is more sensitive to perturbations of the approximating
model)
Below we’ll explain in detail how to construct these sets of values for a given F, but for now . . .
Here is a hint about the secret weapons we’ll use to construct these sets
• We’ll use some min problems to construct the lower bounds
Inspiring Video If you want to understand more about why one serious quantitative researcher
is interested in this approach, we recommend Lars Peter Hansen’s Nobel lecture
The Model
For simplicity, we present ideas in the context of a class of problems with linear transition laws
and quadratic objective functions
To fit in with our earlier lecture on LQ control, we will treat loss minimization rather than value
maximization
To begin, recall the infinite horizon LQ problem, where an agent chooses a sequence of controls {ut }
to minimize
∑_{t=0}^∞ β^t { x_t′ R x_t + u_t′ Q u_t }   (3.77)
subject to
x_{t+1} = A x_t + B u_t + C w_{t+1},   x_0 given
As before,
• xt is n × 1, A is n × n
• ut is k × 1, B is n × k
• wt is j × 1, C is n × j
• R is n × n and Q is k × k
Here xt is the state, ut is the control, and wt is a shock vector.
For now we take {w_t} := {w_t}_{t=1}^∞ to be deterministic — a single fixed sequence
We also allow for model uncertainty on the part of the agent solving this optimization problem
In particular, the agent takes wt = 0 for all t ≥ 0 as a benchmark model, but admits the possibility
that this model might be wrong
As a consequence, she also considers a set of alternative models expressed in terms of sequences
{wt } that are “close” to the zero sequence
She seeks a policy that will do well enough for a set of alternative models whose members are
pinned down by sequences {wt }
Soon we’ll quantify the quality of a model specification in terms of the maximal size of the expression ∑_{t=0}^∞ β^{t+1} w_{t+1}′ w_{t+1}
If our agent takes {w_t} as a given deterministic sequence, then, drawing on intuition from earlier
lectures on dynamic programming, we can anticipate Bellman equations such as
J(x) = min_u max_w { x′Rx + u′Qu + β [ J(Ax + Bu + Cw) − θ w′w ] }   (3.79)
Here the maximizing agent distorts the dynamics through w, and pays a penalty θ w′w for doing so
By raising θ more and more, we more and more limit the ability of maximizing agent to distort
outcomes relative to the approximating model
So bigger θ is implicitly associated with smaller distortion sequences {wt }
Taking J(x) = x′Px for a positive definite matrix P, the inner maximization over w in (3.79) can be solved in closed form:
max_w { (Ax + Bu + Cw)′ P (Ax + Bu + Cw) − θ w′w } = (Ax + Bu)′ D(P) (Ax + Bu)   (3.80)
where
D(P) := P + PC(θI − C′PC)^{−1} C′P   (3.81)
and I is a j × j identity matrix. Substituting this expression for the maximum into (3.79) yields
x′Px = min_u { x′Rx + u′Qu + β (Ax + Bu)′ D(P) (Ax + Bu) }   (3.82)
Using standard results on minimizing quadratics, the solution of this equation can be represented as the fixed point
P = B(D(P))
The operator B is the standard (i.e., non-robust) LQ Bellman operator, and P = B( P) is the stan-
dard matrix Riccati equation coming from the Bellman equation — see this discussion
Under some regularity conditions (see [HS08]), the operator B ◦ D has a unique positive definite
fixed point, which we denote below by P̂
A robust policy, indexed by θ, is u = −F̂x where
F̂ := β (Q + β B′D(P̂)B)^{−1} B′D(P̂)A   (3.83)
We also define
K̂ := (θI − C′P̂C)^{−1} C′P̂ (A − BF̂)   (3.84)
The interpretation of K̂ is that wt+1 = K̂xt on the worst-case path of { xt }, in the sense that this
vector is the maximizer of (3.80) evaluated at the fixed rule u = − F̂x
Note that P̂, F̂, K̂ are all determined by the primitives and θ
Note also that if θ is very large, then D is approximately equal to the identity mapping
Hence, when θ is large, P̂ and F̂ are approximately equal to their standard LQ values
Furthermore, when θ is large, K̂ is approximately equal to zero
Conversely, smaller θ is associated with greater fear of model misspecification, and greater concern
for robustness
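To make the operators concrete, here is a minimal sketch of computing P̂, F̂ and K̂ by iterating B ∘ D directly from (3.81)–(3.84); the function names are mine, and the QuantEcon RBLQ class shown below does this more carefully:

import numpy as np
from numpy import dot
from scipy.linalg import solve

def d_operator(P, C, theta):
    "D(P) = P + P C (theta I - C' P C)^{-1} C' P, from (3.81)"
    I = np.identity(C.shape[1])
    return P + dot(dot(P, C), solve(theta * I - dot(C.T, dot(P, C)),
                                    dot(C.T, P)))

def b_operator(P, A, B, Q, R, beta):
    "Standard LQ Bellman operator applied to P; also returns the minimizing F"
    F = solve(Q + beta * dot(B.T, dot(P, B)), beta * dot(B.T, dot(P, A)))
    ABF = A - dot(B, F)
    return R + dot(F.T, dot(Q, F)) + beta * dot(ABF.T, dot(P, ABF)), F

def robust_rule(A, B, C, Q, R, beta, theta, tol=1e-8):
    "Iterate P <- B(D(P)) to the fixed point P_hat; return F_hat, K_hat, P_hat"
    n, j = A.shape[0], C.shape[1]
    P, error = np.identity(n), tol + 1
    while error > tol:
        new_P, F = b_operator(d_operator(P, C, theta), A, B, Q, R, beta)
        error = np.max(np.abs(new_P - P))
        P = new_P
    K = solve(theta * np.identity(j) - dot(C.T, dot(P, C)),
              dot(C.T, dot(P, A - dot(B, F))))   # Equation (3.84)
    return F, K, P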
Robustness as Outcome of a Two-Person Zero-Sum Game
What we have done above can be interpreted in terms of a two-person zero-sum game in which
F̂, K̂ are Nash equilibrium objects
Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting the
possibility of misspecification
Agent 2 is an imaginary malevolent player
Agent 2’s malevolence helps the original agent to compute bounds on his value function across a
set of models
We begin with agent 2’s problem
Agent 2’s Problem Agent 2 takes a policy F as given, and chooses a shock sequence {w_t} from a set of paths constrained by the entropy bound
β ∑_{t=0}^∞ β^t w_{t+1}′ w_{t+1} ≤ η   (3.85)
Now let F be a fixed policy, and let JF ( x0 , w) be the present-value cost of that policy given sequence
w := {wt } and initial condition x0 ∈ Rn
Substituting − Fxt for ut in (3.77), this value can be written as
J_F(x_0, w) := ∑_{t=0}^∞ β^t x_t′ (R + F′QF) x_t   (3.86)
where
xt+1 = ( A − BF ) xt + Cwt+1 (3.87)
and the initial condition x_0 is as specified in the left side of (3.86)
Agent 2 chooses {w_t} to maximize J_F(x_0, w) subject to (3.85); putting a Lagrange multiplier θ on the constraint, this problem can be expressed as
max_w ∑_{t=0}^∞ β^t { x_t′ (R + F′QF) x_t − βθ w_{t+1}′ w_{t+1} }
or, equivalently,
min_w ∑_{t=0}^∞ β^t { −x_t′ (R + F′QF) x_t + βθ w_{t+1}′ w_{t+1} }   (3.88)
subject to (3.87)
What’s striking about this optimization problem is that it is once again an LQ discounted dynamic
programming problem, with w = {wt } as the sequence of controls
The expression for the optimal policy can be found by applying the usual LQ formula (see here)
We denote it by K ( F, θ ), with the interpretation wt+1 = K ( F, θ ) xt
The remaining step for agent 2’s problem is to set θ to enforce the constraint (3.85), which can be
done by choosing θ = θη such that
β ∑_{t=0}^∞ β^t x_t′ K(F, θ_η)′ K(F, θ_η) x_t = η   (3.89)
The Lower Bound Define the minimized object on the right side of problem (3.88) as Rθ ( x0 , F ).
Because “minimizers minimize” we have
R_θ(x_0, F) ≤ ∑_{t=0}^∞ β^t { −x_t′ (R + F′QF) x_t } + βθ ∑_{t=0}^∞ β^t w_{t+1}′ w_{t+1} = −J_F(x_0, w) + θ ent,
where
ent := β ∑_{t=0}^∞ β^t w_{t+1}′ w_{t+1}
Rearranging, R_θ(x_0, F) − θ ent ≤ −J_F(x_0, w), so that R_θ(x_0, F) − θ ent is a lower bound on the value attained under the policy F for any perturbation w with entropy ent
To construct the lower bound on the set of values associated with all perturbations w satisfying the
entropy constraint (3.85) at a given entropy level, we proceed as follows:
• For a given θ, solve the minimization problem (3.88)
• Compute the minimizer Rθ ( x0 , F ) and the associated entropy using (3.91)
• Compute the lower bound on the value function Rθ ( x0 , F ) − θ ent and plot it against ent
• Repeat the preceding three steps for a range of values of θ to trace out the lower bound
Note: This procedure sweeps out a set of separating hyperplanes indexed by different values for
the Lagrange multiplier θ
The Upper Bound To construct an upper bound we use a very similar procedure
We simply replace the minimization problem (3.88) with the maximization problem
V_θ̃(x_0, F) = max_w ∑_{t=0}^∞ β^t { −x_t′ (R + F′QF) x_t − βθ̃ w_{t+1}′ w_{t+1} }   (3.92)
where
ent ≡ β ∑_{t=0}^∞ β^t w_{t+1}′ w_{t+1}
To construct the upper bound on the set of values associated with all perturbations w with a given
entropy we proceed much as we did for the lower bound
• For a given θ̃, solve the maximization problem (3.92)
• Compute the maximizer Vθ̃ ( x0 , F ) and the associated entropy using (3.94)
• Compute the upper bound on the value function Vθ̃ ( x0 , F ) + θ̃ ent and plot it against ent
• Repeat the preceding three steps for a range of values of θ̃ to trace out the upper bound
Reshaping the set of values Now in the interest of reshaping these sets of values by choosing F,
we turn to agent 1’s problem
Agent 1 takes the distorting dynamics w_{t+1} = Kx_t as given, and chooses {u_t} to minimize
∑_{t=0}^∞ β^t { x_t′ R x_t + u_t′ Q u_t − βθ x_t′ K′K x_t }
subject to
x_{t+1} = (A + CK) x_t + B u_t   (3.97)
Once again, the expression for the optimal policy can be found here — we denote it by F̃
Nash Equilibrium Clearly the F̃ we have obtained depends on K, which, in agent 2’s problem,
depended on an initial policy F
Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where
F̃ = Φ(K ( F, θ ))
As you may have already guessed, the robust policy F̂ defined in (3.83) is a fixed point of the
mapping Φ
In particular, for any given θ,
1. K ( F̂, θ ) = K̂, where K̂ is as given in (3.84)
2. Φ(K̂ ) = F̂
A sketch of the proof is given in the appendix
The Stochastic Case
Now we turn to the stochastic case, where the sequence {w_t} is treated as an iid sequence of
random vectors
In this setting, we suppose that our agent is uncertain about the conditional probability distribution
of wt+1
The agent takes the standard normal distribution N (0, I ) as the baseline conditional distribution,
while admitting the possibility that other “nearby” distributions prevail
These alternative conditional distributions of wt+1 might depend nonlinearly on the history xs , s ≤
t
To implement this idea, we need a notion of what it means for one distribution to be near another
one
Here we adopt a very useful measure of closeness for distributions known as the relative entropy,
or Kullback-Leibler divergence
For densities p, q, the Kullback-Leibler divergence of q from p is defined as
D_KL(p, q) := ∫ ln [ p(x) / q(x) ] p(x) dx
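As a concrete illustration, the divergence of one normal density from another can be computed by numerical integration; a sketch (scipy is assumed available):

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

p, q = norm(0, 1), norm(0.5, 1)   # Two nearby densities
integrand = lambda x: p.pdf(x) * np.log(p.pdf(x) / q.pdf(x))
D_KL, _ = quad(integrand, -10, 10)
print(D_KL)   # For two N(mu, 1) densities this equals (mu_p - mu_q)**2 / 2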
Using this notion of distance, the robust Bellman equation in the stochastic case becomes
J(x) = min_u max_{ψ ∈ P} { x′Rx + u′Qu + β [ ∫ J(Ax + Bu + Cw) ψ(dw) − θ D_KL(ψ, φ) ] }   (3.98)
Here P represents the set of all densities on R^n and φ is the benchmark distribution N(0, I)
The distribution ψ is chosen as the least desirable conditional distribution in terms of next period
outcomes, while taking into account the penalty term θ D_KL(ψ, φ)
This penalty term plays a role analogous to the one played by the deterministic penalty θw0 w in
(3.79), since it discourages large deviations from the benchmark
Solving the Model The maximization problem in (3.98) appears highly nontrivial — after all,
we are maximizing over an infinite dimensional space consisting of the entire set of densities
However, it turns out that the solution is tractable, and in fact also falls within the class of normal
distributions
First, we note that J has the form J ( x ) = x 0 Px + d for some positive definite matrix P and constant
real number d
Moreover, it turns out that if ( I − θ^{−1} C'PC ) is nonsingular, then

    max_{ψ∈P} { ∫ (Ax + Bu + Cw)' P (Ax + Bu + Cw) ψ(dw) − θ D_{KL}(ψ, φ) }
        = (Ax + Bu)' D(P) (Ax + Bu) + κ(θ, P)     (3.99)

where

    κ(θ, P) := θ ln[ det( I − θ^{−1} C'PC )^{−1} ]
and the maximizer is the Gaussian distribution

    ψ = N( (θI − C'PC)^{−1} C'P(Ax + Bu),  ( I − θ^{−1} C'PC )^{−1} )     (3.100)
Substituting the expression for the maximum into Bellman equation (3.98) and using J(x) = x'Px + d gives

    x'Px + d = min_u { x'Rx + u'Qu + β (Ax + Bu)' D(P) (Ax + Bu) } + β [ d + κ(θ, P) ]

Since constant terms do not affect minimizers, the solution is the same as (3.82), leading to

    x'Px + d = x' B(D(P)) x + β [ d + κ(θ, P) ]     (3.101)
To solve this Bellman equation, we take P̂ to be the positive definite fixed point of B ◦ D
In addition, we take d̂ as the real number solving d = β [ d + κ(θ, P̂) ], which is

    d̂ := (β / (1 − β)) κ(θ, P̂)     (3.102)
The robust policy in this stochastic case is the minimizer in (3.101), which is once again u = − F̂x
for F̂ given by (3.83)
Substituting the robust policy into (3.100) we obtain the worst case shock distribution

    w_{t+1} ∼ N( K̂ x_t, ( I − θ^{−1} C'P̂C )^{−1} )

where K̂ is given by (3.84)
Computing Other Quantities Before turning to implementation, we briefly outline how to com-
pute several other quantities of interest
Worst-Case Value of a Policy One thing we will be interested in doing is holding a policy fixed
and computing the discounted loss associated with that policy
So let F be a given policy and let JF ( x ) be the associated loss, which, by analogy with (3.98),
satisfies
    J_F(x) = max_{ψ∈P} { x'(R + F'QF)x + β [ ∫ J_F((A − BF)x + Cw) ψ(dw) − θ D_{KL}(ψ, φ) ] }
Writing J_F(x) = x'P_F x + d_F and applying the same argument used to derive (3.99) we get

    x'P_F x + d_F = x'(R + F'QF)x + β [ x'(A − BF)' D(P_F) (A − BF) x + d_F + κ(θ, P_F) ]

and hence

    P_F = R + F'QF + β(A − BF)' D(P_F) (A − BF)

and

    d_F := (β / (1 − β)) κ(θ, P_F) = (β / (1 − β)) θ ln[ det( I − θ^{−1} C'P_F C )^{−1} ]     (3.103)
If you skip ahead to the appendix, you will be able to verify that − PF is the solution to the Bellman
equation in agent 2’s problem discussed above — we use this in our computations
Implementation
The QuantEcon.py package provides a class called RBLQ for implementation of robust LQ optimal
control
Here’s the relevant code, from file robustlq.py
"""
Filename: robustlq.py
"""
from __future__ import division  # Remove for Python 3.x
from textwrap import dedent
import numpy as np
from .lqcontrol import LQ
from .quadsums import var_quadratic_sum
from numpy import dot, log, sqrt, identity, hstack, vstack, trace
from scipy.linalg import solve, inv, det
from .matrix_eqn import solve_discrete_lyapunov
class RBLQ(object):
r"""
Provides methods for analysing infinite horizon robust LQ control
problems of the form
    .. math::

        \min_u \max_w \sum_{t=0}^{\infty} \beta^t \{x_t' R x_t + u_t' Q u_t
        - \beta \theta w_{t+1}' w_{t+1} \}

    subject to

    .. math::

        x_{t+1} = A x_t + B u_t + C w_{t+1}
Parameters
----------
Q : array_like(float, ndim=2)
The cost(payoff) matrix for the controls. See above for more.
Q should be k x k and symmetric and positive definite
R : array_like(float, ndim=2)
The cost(payoff) matrix for the state. See above for more. R
should be n x n and symmetric and non-negative definite
A : array_like(float, ndim=2)
The matrix that corresponds with the state in the state space
system. A should be n x n
B : array_like(float, ndim=2)
The matrix that corresponds with the control in the state space
system. B should be n x k
C : array_like(float, ndim=2)
The matrix that corresponds with the random process in the
state space system. C should be n x j
beta : scalar(float)
The discount factor in the robust control problem
theta : scalar(float)
The robustness factor in the robust control problem
Attributes
----------
Q, R, A, B, C, beta, theta : see Parameters
k, n, j : scalar(int)
The dimensions of the matrices
"""
def __repr__(self):
return self.__str__()
def __str__(self):
m = """\
Robust linear quadratic control system
- beta (discount parameter) : {b}
- theta (robustness factor) : {th}
- n (number of state variables) : {n}
- k (number of control variables) : {k}
- j (number of shocks) : {j}
"""
return dedent(m.format(b=self.beta, n=self.n, k=self.k, j=self.j,
th=self.theta))
    def d_operator(self, P):
        r"""
        The D operator, mapping P into

        .. math::

            D(P) := P + PC(\theta I - C'PC)^{-1} C'P

        Parameters
        ----------
        P : array_like(float, ndim=2)
            A matrix that should be n x n

        Returns
        -------
        dP : array_like(float, ndim=2)
            The matrix P after applying the D operator

        """
        C, theta = self.C, self.theta
        I = np.identity(self.j)
        S1 = dot(P, C)
        S2 = dot(C.T, S1)
        dP = P + dot(S1, solve(theta * I - S2, S1.T))
        return dP
    def b_operator(self, P):
        r"""
        The B operator, mapping P into

        .. math::

            B(P) := R - \beta^2 A'PB(Q + \beta B'PB)^{-1} B'PA + \beta A'PA

        and also returning

        .. math::

            F := (Q + \beta B'PB)^{-1} \beta B'PA
Parameters
----------
P : array_like(float, ndim=2)
A matrix that should be n x n
Returns
-------
F : array_like(float, ndim=2)
The F matrix as defined above
new_p : array_like(float, ndim=2)
The matrix P after applying the B operator
"""
A, B, Q, R, beta = self.A, self.B, self.Q, self.R, self.beta
S1 = Q + beta * dot(B.T, dot(P, B))
S2 = beta * dot(B.T, dot(P, A))
S3 = beta * dot(A.T, dot(P, A))
F = solve(S1, S2)
new_P = R - dot(S2.T, solve(S1, S2)) + S3
return F, new_P
def robust_rule(self):
"""
This method solves the robust control problem by tricking it
into a stacked LQ problem, as described in chapter 2 of Hansen-
Sargent's text "Robustness." The optimal control with observed
state is
.. math::
u_t = - F x_t
Returns
-------
F : array_like(float, ndim=2)
The optimal control matrix from above
P : array_like(float, ndim=2)
The positive semi-definite matrix defining the value
function
K : array_like(float, ndim=2)
the worst-case shock matrix K, where
:math:`w_{t+1} = K x_t` is the worst case shock
"""
        # == Simplify names == #
        A, B, C, Q, R = self.A, self.B, self.C, self.Q, self.R
        beta, theta = self.beta, self.theta
        k, j = self.k, self.j
        # == Set up the stacked LQ problem == #
        I = identity(j)
        Z = np.zeros((k, j))
        Ba = hstack([B, C])
        Qa = vstack([hstack([Q, Z]), hstack([Z.T, -beta * I * theta])])
        lq = LQ(Qa, R, A, Ba, beta=beta)
        # == Solve and convert back to robust problem == #
        P, f, d = lq.stationary_values()
        F = f[:k, :]
        K = -f[k:, :]
        return F, K, P
    def robust_rule_simple(self, P_init=None, max_iter=80, tol=1e-8):
        """
        A simple algorithm for computing the robust policy F and the
        corresponding value function matrix P, by iterating with the
        composition of the operators B and D.

        Parameters
----------
P_init : array_like(float, ndim=2), optional(default=None)
The initial guess for the value function matrix. It will
be a matrix of zeros if no guess is given
max_iter : scalar(int), optional(default=80)
The maximum number of iterations that are allowed
tol : scalar(float), optional(default=1e-8)
The tolerance for convergence
Returns
-------
F : array_like(float, ndim=2)
The optimal control matrix from above
P : array_like(float, ndim=2)
The positive semi-definite matrix defining the value
function
K : array_like(float, ndim=2)
the worst-case shock matrix K, where
:math:`w_{t+1} = K x_t` is the worst case shock
"""
# == Simplify names == #
A, B, C, Q, R = self.A, self.B, self.C, self.Q, self.R
beta, theta = self.beta, self.theta
        # == Set up loop == #
        P = np.zeros((self.n, self.n)) if P_init is None else P_init
        iterate, e = 0, tol + 1
        while iterate < max_iter and e > tol:
            F, new_P = self.b_operator(self.d_operator(P))
            e = np.sqrt(np.sum((new_P - P)**2))
            iterate += 1
            P = new_P
        # == Recover the worst-case shock matrix K from P and F == #
        I = np.identity(self.j)
        S1 = P.dot(C)
        S2 = C.T.dot(S1)
        K = inv(theta * I - S2).dot(S1.T).dot(A - B.dot(F))
        return F, K, P
    def F_to_K(self, F):
        """
        Compute agent 2's best cost-minimizing response K, given F.

        Parameters
----------
F : array_like(float, ndim=2)
A k x n array
Returns
-------
K : array_like(float, ndim=2)
Agent's best cost minimizing response for a given F
P : array_like(float, ndim=2)
The value function for a given F
"""
Q2 = self.beta * self.theta
R2 = - self.R - dot(F.T, dot(self.Q, F))
A2 = self.A - dot(self.B, F)
B2 = self.C
lq = LQ(Q2, R2, A2, B2, beta=self.beta)
        neg_P, neg_K, d = lq.stationary_values()
        return -neg_K, -neg_P
    def K_to_F(self, K):
        """
        Compute agent 1's best value-maximizing response F, given K.

        Parameters
----------
K : array_like(float, ndim=2)
A j x n array
Returns
-------
F : array_like(float, ndim=2)
The policy function for a given K
P : array_like(float, ndim=2)
The value function for a given K
"""
A1 = self.A + dot(self.C, K)
B1 = self.B
Q1 = self.Q
R1 = self.R - self.beta * self.theta * dot(K.T, K)
lq = LQ(Q1, R1, A1, B1, beta=self.beta)
P, F, d = lq.stationary_values()
return F, P
    def compute_deterministic_entropy(self, F, K, x0):
        """
        Given K and F, compute the value of deterministic entropy, which
        is sum_t beta^t x_t' K'K x_t with x_{t+1} = (A - BF + CK) x_t.

        Parameters
----------
F : array_like(float, ndim=2)
The policy function, a k x n array
K : array_like(float, ndim=2)
The worst case matrix, a j x n array
x0 : array_like(float, ndim=1)
The initial condition for state
Returns
-------
e : scalar(int)
The deterministic entropy
"""
H0 = dot(K.T, K)
C0 = np.zeros((self.n, 1))
A0 = self.A - dot(self.B, F) + dot(self.C, K)
e = var_quadratic_sum(A0, C0, H0, self.beta, x0)
return e
    def evaluate_F(self, F):
        """
        Given a fixed policy F, with the interpretation u = -F x, this
        function computes the matrix P_F and constant d_F associated
        with discounted cost, the worst-case policy, and the matrix and
        constant associated with discounted entropy.

        Parameters
----------
F : array_like(float, ndim=2)
The policy function, a k x n array
Returns
-------
P_F : array_like(float, ndim=2)
Matrix for discounted cost
d_F : scalar(float)
Constant for discounted cost
K_F : array_like(float, ndim=2)
Worst case policy
O_F : array_like(float, ndim=2)
Matrix for discounted entropy
o_F : scalar(float)
Constant for discounted entropy
"""
        # == Simplify names == #
        Q, R, A, B, C = self.Q, self.R, self.A, self.B, self.C
        beta, theta = self.beta, self.theta
        # == Solve for policies and costs using agent 2's problem == #
        K_F, P_F = self.F_to_K(F)
        I = np.identity(self.j)
        H = inv(I - C.T.dot(P_F.dot(C)) / theta)
        ldH = log(det(H))
        d_F = (beta / (1 - beta)) * theta * ldH   # constant from (3.103)
        # == Compute the discounted entropy terms O_F and o_F == #
        AO = sqrt(beta) * (A - dot(B, F) + dot(C, K_F))
        O_F = solve_discrete_lyapunov(AO.T, beta * dot(K_F.T, K_F))
        ho = (trace(H - 1) - ldH) / 2.0
        tr = trace(dot(O_F, C.dot(H.dot(C.T))))
        o_F = (ho + beta * tr) / (1 - beta)
        return K_F, P_F, d_F, O_F, o_F
Application
Let us consider a monopolist similar to this one, but now facing model uncertainty
The inverse demand function is

    p_t = a_0 − a_1 y_t + d_t

where

    d_{t+1} = ρ d_t + σ_d w_{t+1},   {w_t} iid ∼ N(0, 1)
and all parameters are strictly positive
The period return function for the monopolist is

    r_t = p_t y_t − γ (y_{t+1} − y_t)² / 2 − c y_t
Its objective is to maximize expected discounted profits, or, equivalently, to minimize E ∑_{t=0}^∞ β^t (−r_t)
The standard normal distribution for wt is understood as the agent’s baseline, with uncertainty
parameterized by θ
We compute value-entropy correspondences for two policies
1. The no concern for robustness policy F0 , which is the ordinary LQ loss minimizer
2. A “moderate” concern for robustness policy Fb , with θ = 0.02
The code for producing the graph shown above, with blue being for the robust policy, is given in
robustness/robust_monopolist.py
We repeat it here for convenience
"""
Filename: robust_monopolist.py
Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski
The robust control problem for a monopolist with adjustment costs.  The
inverse demand curve is p_t = a_0 - a_1 y_t + d_t,
where d_{t+1} = \rho d_t + \sigma_d w_{t+1} for w_t ~ N(0, 1) and iid.
The period return function for the monopolist is

    r_t = p_t y_t - gamma (y_{t+1} - y_t)^2 / 2 - c y_t

"""
import pandas as pd
import numpy as np
from scipy.linalg import eig
from scipy import interp
import matplotlib.pyplot as plt
import quantecon as qe
# == model parameters == #
a_0 = 100
a_1 = 0.5
rho = 0.9
sigma_d = 0.05
beta = 0.95
c = 2
gamma = 50.0
theta = 0.002
ac = (a_0 - c) / 2.0
# == Define LQ matrices == #
# State is x_t = (1, y_t, d_t)'; x'Rx reproduces p_t y_t - c y_t
R = np.array([[0.,  ac,   0.],
              [ac, -a_1,  0.5],
              [0.,  0.5,  0.]])
R = -R  # For minimization
Q = gamma / 2
A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., rho]])
B = np.array([[0.],
              [1.],
              [0.]])
C = np.array([[0.],
              [0.],
              [sigma_d]])
# -------------------------------------------------------------------------- #
# Functions
# -------------------------------------------------------------------------- #
def value_and_entropy(emax, F, bw, grid_size=1000):
    """
    Compute the value function and entropy levels for a grid of theta
    values, stopping once the target entropy level emax is reached.

    Parameters
    ==========
emax: scalar
The target entropy value
F: array_like
The policy function to be evaluated
bw: str
A string specifying whether the implied shock path follows best
or worst assumptions. The only acceptable values are 'best' and
'worst'.
Returns
=======
df: pd.DataFrame
A pandas DataFrame containing the value function and entropy
values up to the emax parameter. The columns are 'value' and
'entropy'.
"""
    if bw == 'worst':
        thetas = 1 / np.linspace(1e-8, 1000, grid_size)
    else:
        thetas = -1 / np.linspace(1e-8, 1000, grid_size)

    df = pd.DataFrame(index=thetas, columns=('value', 'entropy'))
    for theta in thetas:
        # evaluate_policy(theta, F) returns (value, entropy) for the
        # given theta; it is defined elsewhere in the full file
        df.loc[theta] = evaluate_policy(theta, F)
        if df.loc[theta, 'entropy'] >= emax:
            break

    df = df.dropna(how='any')
    return df
# -------------------------------------------------------------------------- #
# Main
# -------------------------------------------------------------------------- #
emax = 1.6e6

# == Compute the ordinary LQ rule F0 and the robust rule Fb == #
optimal_lq = qe.LQ(Q, R, A, B, C, beta=beta)
Po, Fo, do = optimal_lq.stationary_values()
robust_lq = qe.RBLQ(Q, R, A, B, C, beta, theta)
Fb, Kb, Pb = robust_lq.robust_rule()

# == Trace out the value-entropy correspondences == #
optimal_best_case = value_and_entropy(emax, Fo, 'best')
robust_best_case = value_and_entropy(emax, Fb, 'best')
optimal_worst_case = value_and_entropy(emax, Fo, 'worst')
robust_worst_case = value_and_entropy(emax, Fb, 'worst')

fig, ax = plt.subplots()
ax.set_xlim(0, emax)
ax.set_ylabel("Value")
ax.set_xlabel("Entropy")
ax.grid()

df_pairs = ((optimal_best_case, optimal_worst_case),
            (robust_best_case, robust_worst_case))


class Curve(object):
    "Interpolates a value-entropy curve on a grid of entropy values."
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __call__(self, z):
        return interp(z, self.x, self.y)


egrid = np.linspace(0, emax, 100)
for df_pair in df_pairs:
    for df in df_pair:
        x = np.asarray(df['entropy'], dtype='float')
        y = np.asarray(df['value'], dtype='float')
        ax.plot(egrid, Curve(x, y)(egrid), lw=2, alpha=0.7)

plt.show()
Can you explain the different shape of the value-entropy correspondence for the robust policy?
Appendix
We sketch the proof only of the first claim in this section, which is that, for any given θ, K ( F̂, θ ) = K̂,
where K̂ is as given in (3.84)
This is the content of the next lemma
Lemma. If P̂ is the fixed point of the map B ◦ D and F̂ is the robust policy as given in (3.83), then K(F̂, θ) = K̂, where K̂ is as given in (3.84)
Proof: As a first step, observe that when F = F̂, the Bellman equation associated with the LQ
problem (3.87) – (3.88) is
(revisit this discussion if you don’t know where (3.105) comes from) and the optimal policy is
Using the definition of D , we can rewrite the right-hand side more simply as
    R + F̂'QF̂ + β(A − BF̂)' D(P̂) (A − BF̂)
Although it involves a substantial amount of algebra, it can be shown that the latter is just P̂
(Hint: Use the fact that P̂ = B(D( P̂)))
Contents
• Dynamic Stackelberg Problems
– Overview
– The Stackelberg Problem
– Solving the Stackelberg Problem
– Shadow prices
– A Large Firm With a Competitive Fringe
– Concluding Remarks
– Exercises
Overview
Previous lectures including the LQ dynamic programming, rational expectations equilibrium, and
Markov perfect equilibrium lectures have studied decision problems that are recursive in what we
can call “natural” state variables, such as
• stocks of capital (fiscal, financial and human)
• wealth
• information that helps forecast future prices and quantities that impinge on future payoffs
In problems that are recursive in the natural state variables, optimal decision rules are functions
of the natural state variables
In this lecture, we describe an important class of problems that are not recursive in the natural
state variables
Kydland and Prescott [KP77], [Pre77] and Calvo [Cal78] gave examples of such decision problems,
which have the following features
• The time t ≥ 0 actions of some decision makers depend on the time s ≥ t decisions of
another decision maker called a government or Stackelberg leader
• At time t = 0, the government or Stackelberg leader chooses his actions for all times s ≥ 0
• In choosing actions for all times at time 0, the government or leader is said to commit to a plan
In these problems, variables that encode history dependence appear in optimal decision rules of the
government or leader
Furthermore, the Stackelberg leader has distinct optimal decision rules for time t = 0, on the one
hand, and times t ≥ 1, on the other hand
The Stackelberg leader’s decision rules for t = 0 and t ≥ 1 have distinct state variables
These properties of the Stackelberg leader’s decision rules are symptoms of the time inconsistency
of optimal government plans
An expression of time inconsistency is that optimal decision rules are not recursive in natural state
variables
Examples of time inconsistent optimal rules are those of a large agent (e.g., a government) who
• confronts a competitive market composed of many small private agents, and in which
• the private agents’ decisions at each date are influenced by their forecasts of the government’s
future actions
In such settings, private agents’ stocks of capital and other durable assets at time t are partly
shaped by their past decisions that in turn were influenced by their earlier forecasts of the govern-
ment’s actions
The rational expectations equilibrium concept plays an essential role
Rational expectations implies that in choosing its future actions, the government (or leader)
chooses the private agents’ (or followers’) expectations about them
We use the optimal linear regulator to solve a linear quadratic version of what is known as a
dynamic Stackelberg problem
For now we refer to the Stackelberg leader as the government and the Stackelberg follower as the
representative agent or private sector
Soon we’ll give an application with another interpretation of these two decision makers
Let z_t be an n_z × 1 vector of natural state variables, x_t an n_x × 1 vector of endogenous forward-looking variables that are physically free to jump at t, and u_t a vector of government instruments
The zt vector is inherited physically from the past
But xt is inherited from the past not physically but as a consequence of promises made earlier
Included in xt might be prices and quantities that adjust instantaneously to clear markets at time t
Let y_t = [z_t ; x_t]

Define the government’s one-period loss function 1

    r(y, u) = y'Ry + u'Qu     (3.106)
Subject to an initial condition for z0 , but not for x0 , a government wants to maximize
    − ∑_{t=0}^∞ β^t r(y_t, u_t)     (3.107)
1 The problem assumes that there are no cross products between states and controls in the return function. A simple
transformation converts a problem whose return function has cross products into an equivalent problem that has no
cross products. For example, see [HS08] (chapter 4, pp. 72-73).
We assume that the matrix on the left is invertible, so that we can multiply both sides of the above
equation by its inverse to obtain
    [ z_{t+1} ; x_{t+1} ] = [ A_{11}  A_{12} ; A_{21}  A_{22} ] [ z_t ; x_t ] + B u_t     (3.109)
or
yt+1 = Ayt + But (3.110)
The government maximizes (3.107) by choosing sequences {u_t, x_t, z_{t+1}}_{t=0}^∞ subject to (3.110) and an initial condition for z_0

Note that we have an initial condition for z_0 but not for x_0
x0 is among the variables to be chosen at time 0
The private sector’s behavior is summarized by the second block of equations of (3.109) or (3.110)
These typically include the first-order conditions of private agents’ optimization problem (i.e.,
their Euler equations)
These Euler equations summarize the forward-looking aspect of private agents’ behavior and ex-
press how their time t decisions depend on government actions at times s ≥ t
When combined with a stability condition to be imposed below, these Euler equations summarize
the private sector’s best response to the sequence of actions by the government.
The government uses its understanding of these responses to manipulate private sector actions.
To indicate the features of the problem that make xt a vector of forward-looking variables, write
the second block of system (3.108) as
In choosing ut for t ≥ 1 at time 0, the government takes into account how future z and u affect
earlier x through equation (3.112).
The lecture on history dependent policies analyzes an example about Ramsey taxation in which,
as is typical of such problems, the last n x equations of (3.109) or (3.110) constitute implementability
constraints that are formed by the Euler equations of a competitive fringe or private sector
When combined with a stability condition to be imposed below, these Euler equations summarize
the private sector’s best response to the sequence of actions by the government
A certainty equivalence principle allows us to work with a nonstochastic model
That is, we would attain the same decision rule if we were to replace xt+1 with the forecast Et xt+1
and to add a shock process Cet+1 to the right side of (3.110), where et+1 is an IID random vector
with mean of zero and identity covariance matrix
Let X t denote the history of any variable X from 0 to t
[MS85], [HR85], [PL92], [Sar87], [Pea92], and others have all studied versions of the following
problem:
Problem S: The Stackelberg problem is to maximize (3.107) by choosing an x0 and a sequence of
decision rules, the time t component of which maps the time t history of the natural state zt into
the time t decision ut of the Stackelberg leader
The Stackelberg leader chooses this sequence of decision rules once and for all at time t = 0
Another way to say this is that he commits to this sequence of decision rules at time 0
The maximization is subject to a given initial condition for z0
But x0 is among the objects to be chosen by the Stackelberg leader
The optimal decision rule is history dependent, meaning that ut depends not only on zt but also
on lags of z
History dependence has two sources: (a) the government’s ability to commit 2 to a sequence of
rules at time 0 as in the lecture on history dependent policies, and (b) the forward-looking behavior
of the private sector embedded in the second block of equations (3.109)
Some Basic Notation For any vector a_t, define \vec{a}_t = [a_t, a_{t+1}, . . .].
Define a feasible set of (\vec{y}_1, \vec{u}_0) sequences

    Ω(y_0) = { (\vec{y}_1, \vec{u}_0) : − ∑_{t=0}^∞ β^t r(y_t, u_t) > −∞ and y_{t+1} = Ay_t + Bu_t, ∀t ≥ 0 }
Subproblem 1

    v(y_0) = max_{(\vec{y}_1, \vec{u}_0) ∈ Ω(y_0)} − ∑_{t=0}^∞ β^t r(y_t, u_t)     (3.113)

Subproblem 2

    w(z_0) = max_{x_0} v(y_0)     (3.114)
Subproblem 1 is solved first, once-and-for-all at time 0, tentatively taking the vector of forward-
looking variables x0 as given
Then subproblem 2 is solved for x0
The value function w(z0 ) tells the value of the Stackelberg plan as a function of the vector of
natural state variables
Two Bellman equations We now describe Bellman equations for v(y) and w(z0 )
Subproblem 1 The value function v(y) in subproblem 1 satisfies the Bellman equation

    v(y) = max_{u, y*} { −r(y, u) + βv(y*) }     (3.115)
which, as in the linear regulator lecture, gives rise to an algebraic matrix Riccati equation whose solution we partition as

    P = [ P_{11}  P_{12} ; P_{21}  P_{22} ]
Now choose x_0 by equating to zero the gradient of v(y_0) with respect to x_0:

    −2P_{21} z_0 − 2P_{22} x_0 = 0,

which implies

    x_0 = −P_{22}^{−1} P_{21} z_0     (3.121)
Manifestation of time inconsistency We have seen that for t ≥ 0 the optimal decision rule for
the Stackelberg leader has the form
ut = − Fyt
or
ut = f 11 zt + f 12 xt
where for t ≥ 1, xt is effectively a state variable, albeit not a natural one, inherited from the past
This means that for t ≥ 1, x_t is not a linear function of z_t and that x_t exerts an independent influence
on ut
The situation is different at t = 0
For t = 0, the optimal choice of x_0 = −P_{22}^{−1} P_{21} z_0 described in equation (3.121) implies that

    u_0 = ( f_{11} − f_{12} P_{22}^{−1} P_{21} ) z_0     (3.122)
So for t = 0, u0 is a linear function of the natural state variable z0 only
But for t ≥ 1, x_t ≠ −P_{22}^{−1} P_{21} z_t
Nor does xt equal any other linear combination of zt only for t ≥ 1
This means that xt has an independent role in shaping ut for t ≥ 1
All of this means that the Stackelberg leader’s decision rule at t ≥ 1 differs from its decision rule
at t = 0
As indicated at the beginning of this lecture, this is a symptom of the time inconsistency of the
optimal Stackelberg plan
Shadow prices
The history dependence of the government’s plan can be expressed in the dynamics of Lagrange multipliers µ_x on the last n_x equations of (3.108) or (3.109)
These multipliers measure the cost today of honoring past government promises about current
and future settings of u
Later, we shall show that as a result of optimally choosing x0 , it is appropriate to initialize the
multipliers to zero at time t = 0
This is true because at t = 0, there are no past promises about u to honor
But the multipliers µ x take nonzero values thereafter, reflecting future costs to the government of
adhering to its commitment
From the linear regulator lecture, the formula µ_t = Py_t for the vector of shadow prices on the transition equations is

    µ_t = [ µ_{zt} ; µ_{xt} ]
The shadow price µ_{xt} on the forward-looking variables x_t evidently equals

    µ_{xt} = P_{21} z_t + P_{22} x_t

for t ≥ 1
By making the instrument feed back on itself, the form of decision rule (3.129) potentially allows
for “instrument-smoothing” to emerge as an optimal rule under commitment
As an example, this section studies the equilibrium of an industry with a large firm that acts as a
Stackelberg leader with respect to a competitive fringe
Sometimes the large firm is called ‘the monopolist’ even though there are actually many firms in
the industry
The industry produces a single nonstorable homogeneous good, the quantity of which is chosen
in the previous period
One large firm produces Qt and a representative firm in a competitive fringe produces qt
The representative firm in the competitive fringe acts as a price taker and chooses sequentially
The large firm commits to a policy at time 0, taking into account its ability to manipulate the price
sequence, both directly through the effects of its quantity choices on prices, and indirectly through
the responses of the competitive fringe to its forecasts of prices 3
The costs of production are C_t = eQ_t + .5gQ_t² + .5c(Q_{t+1} − Q_t)² for the large firm and σ_t = dq_t + .5hq_t² + .5c(q_{t+1} − q_t)² for the competitive firm, where d > 0, e > 0, c > 0, g > 0, h > 0 are cost parameters
There is a linear inverse demand curve

    p_t = A_0 − A_1(Q_t + q̄_t) + v_t,     (3.130)

where

    v_{t+1} = ρ v_t + C_ě ě_{t+1},     (3.131)

|ρ| < 1, and ě_{t+1} is an IID sequence of random variables with mean zero and variance 1
In (3.130), q̄_t is equilibrium output of the representative competitive firm

In equilibrium, q̄_t = q_t, but we must distinguish between q_t and q̄_t in posing the optimum problem of a competitive firm
Let it = qt+1 − qt
We regard it as the representative firm’s control at t
The first-order conditions for maximizing (3.132) are the Euler equations (3.133), which hold for t ≥ 0
We appeal to a certainty equivalence principle to justify working with a non-stochastic version of
(3.133) formed by dropping the expectation operator and the random term ět+1 from (3.131)
We use a method of [Sar79] and [Tow83] 4
We shift (3.130) forward one period, replace conditional expectations with realized values, use (3.130) to substitute for p_{t+1} in (3.133), and set q_t = q̄_t and i_t = ī_t for all t ≥ 0 to get

    ī_t = βī_{t+1} − c^{−1}βh q̄_{t+1} + c^{−1}β(A_0 − d) − c^{−1}βA_1 q̄_{t+1} − c^{−1}βA_1 Q_{t+1} + c^{−1}βv_{t+1}     (3.134)
Given sufficiently stable sequences { Qt , vt }, we could solve (3.134) and it = qt+1 − qt to express
the competitive fringe’s output sequence as a function of the (tail of the) monopolist’s output
sequence
The dependence of it on future Qt ‘s opens an avenue for the monopolist to influence current
outcomes by its choice now of its future actions
It is this feature that makes the monopolist’s problem fail to be recursive in the natural state vari-
ables q, Q
The monopolist arrives at period t > 0 facing the constraint that it must confirm the expectations
about its time t decision upon which the competitive fringe based its decisions at dates before t
The monopolist’s problem The monopolist views the sequence of the competitive firm’s Euler
equations as constraints on its own opportunities
They are implementability constraints on the monopolist’s choices
Including the implementability constraints, we can represent the constraints in terms of the tran-
sition law impinging on the monopolist:
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ A_0 - d & 1 & -A_1 & -A_1 - h & c \end{bmatrix}
\begin{bmatrix} 1 \\ v_{t+1} \\ Q_{t+1} \\ \bar{q}_{t+1} \\ \bar{i}_{t+1} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & \rho & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & \frac{c}{\beta} \end{bmatrix}
\begin{bmatrix} 1 \\ v_t \\ Q_t \\ \bar{q}_t \\ \bar{i}_t \end{bmatrix}
+
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} u_t
\qquad (3.135)
Represent (3.135) as
yt+1 = Ayt + But (3.136)
Although we have included the competitive fringe’s choice variable it as a component of the
“state” yt in the monopolist’s transition law (3.136), it is actually a “jump” variable
Nevertheless, the analysis above implies that the solution of the large firm’s problem is encoded
in the Riccati equation associated with (3.136) as the transition law
Let’s decode it
To match our general setup, we partition y_t as y_t' = [z_t'  x_t'] where z_t' = [1  v_t  Q_t  q̄_t] and x_t = ī_t
The monopolist maximizes its discounted profits subject to the given initial condition for z_0, equations (3.130) and (3.134) and ī_t = q̄_{t+1} − q̄_t, as well as the laws of motion of the natural state variables z
Notice that the monopolist in effect chooses the price sequence, as well as the quantity sequence
of the competitive fringe, albeit subject to the restrictions imposed by the behavior of consumers,
as summarized by the demand curve (3.130) and the implementability constraint (3.134) that de-
scribes the best responses of firms in the competitive fringe
By substituting (3.130) into the above objective function, the monopolist’s problem can be ex-
pressed as
    max_{u_t} ∑_{t=0}^∞ β^t { ( A_0 − A_1(q̄_t + Q_t) + v_t ) Q_t − eQ_t − .5gQ_t² − .5cu_t² }     (3.137)

subject to (3.136)
This can be written

    max_{u_t} − ∑_{t=0}^∞ β^t { y_t'Ry_t + u_t'Qu_t }     (3.138)
Under the Stackelberg plan, u_t = −Fy_t, which implies that the evolution of y under the Stackelberg plan is

    y_{t+1} = (A − BF) y_t     (3.139)

where y_t' = [1  v_t  Q_t  q̄_t  ī_t]
Recursive formulation of a follower’s problem We now make use of a “Big K, little k” trick (see
rational expectations equilibrium) to formulate a recursive version of a follower’s problem cast in
terms of an ordinary Bellman equation
The individual firm faces {p_t} as a price taker and believes

    p_t = a_0 − a_1 Q_t − a_1 q̄_t + v_t     (3.140)
        ≡ E_p y_t     (3.141)

where E_p is a selector vector that picks the price out of the aggregate state y_t
From the point of the view of a representative firm in the competitive fringe, {yt } is an exogenous
process
A representative fringe firm wants to forecast y because it wants to forecast what it regards as the
exogenous price process { pt }.
Therefore it wants to forecast the determinants of future prices
• future values of Q and
• future values of q
An individual follower firm confronts state [y_t'  q_t]', where q_t is its current output as opposed to q̄_t within y_t
(This is an application of the “Big K, little k” idea)
The follower faces law of motion

    [ y_{t+1} ; q_{t+1} ] = [ A − BF   0 ; 0   1 ] [ y_t ; q_t ] + [ 0 ; 1 ] i_t     (3.142)
subject to q_0 given, the law of motion (3.139) and the price function (3.140), and where the costs are still σ_t = dq_t + .5hq_t² + .5c(q_{t+1} − q_t)²
The representative firm’s problem is a linear-quadratic dynamic programming problem with ma-
trices As , Bs , Qs , Rs that can be constructed easily from the above information.
The representative firm’s decision rule can be represented as
1
vt
Qt
it = − Fs
q
(3.143)
t
it
qt
Now let’s stare at the decision rule (3.143) for it , apply “Big K, little k” logic, and ask what we want
in order to confirm that we have verified a recursive representation of a representative follower’s
choice problem
We want decision rule (3.143) to have the property that i_t = ī_t when we evaluate the rule at q_t = q̄_t
We inherit these desires from the “Big K, little k” logic
Here we apply a “Big K, little k” logic in two parts to make the “representative firm be representative” after solving the representative firm’s optimization problem — see the sketch following this list

• We want q_t = q̄_t

• We want i_t = ī_t
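A minimal numerical sketch of this check (the values v, Q, q_bar, i_bar and the follower’s policy matrix Fs are hypothetical placeholders; Fs would come from the LQ problem just described, with y_t = (1, v_t, Q_t, q̄_t, ī_t)):

import numpy as np

# Evaluate the follower's rule (3.143) at q_t = qbar_t and check that
# the implied individual choice i_t reproduces the aggregate ibar_t
y = np.array([1.0, v, Q, q_bar, i_bar])       # aggregate state y_t
state = np.concatenate([y, [q_bar]])          # follower's state with q_t = qbar_t
i_individual = float(-Fs @ state)
assert np.isclose(i_individual, i_bar)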
Numerical example We computed the optimal Stackelberg plan for parameter settings

    [A_0, A_1, ρ, C_ě, c, d, e, g, h, β] = [100, 1, .8, .2, 1, 20, 20, .2, .2, .95] 5
For these parameter values the decision rule is

    u_t = (Q_{t+1} − Q_t) = ( 83.98   0.78   −0.95   −1.31   −2.07 ) · [1  v_t  Q_t  q̄_t  ī_t]'

for t ≥ 0
and

    x_0 ≡ ī_0 = ( 31.08   0.29   −0.15   −0.56 ) · [1  v_0  Q_0  q̄_0]'
For this example, starting from z_0 = [1  v_0  Q_0  q̄_0] = [1  0  25  46], the monopolist chooses to set ī_0 = 1.43

That choice implies that

• ī_1 = 0.25, and

• z_1 = [1  v_1  Q_1  q̄_1] = [1  0  21.83  47.43]
A monopolist who started from the initial conditions z̃_0 = z_1 would set ī_0 = 1.10 instead of 0.25, as called for under the original optimal plan

The preceding little calculation reflects the time inconsistency of the monopolist’s optimal plan
5 These calculations were performed by the Python program from QuantEcon.applications in
dyn_stack/oligopoly.py.
The recursive representation of the decision rule for a representative fringe firm is

    i_t = ( 0   0   0   0.34   1   −0.34 ) · [1  v_t  Q_t  q̄_t  ī_t  q_t]'
Concluding Remarks
This lecture is our first encounter with a class of problems in which optimal decision rules are
history dependent 6
We shall encounter another example in the lecture on history dependent policies
There are many more such problems - see chapters 20-24 of [LS12]
Exercises
    m_{t+1} = m_t + u_t     (3.144)

Solve (3.144) “forward” to express p_t as a function of current and future values of m_s. Note how future values of m influence the current price level.

At time 0, a monetary authority chooses (commits to) a possibly history-dependent strategy for setting {u_t}_{t=0}^∞
• Code can be found in the file lqcontrol.py from the QuantEcon.py package that implements
the optimal linear regulator
ct + at+1 = (1 + r ) at + yt − τt (3.148)
where
• at is the household’s holdings of an asset at the beginning of t
• r > 0 is a constant net interest rate satisfying β(1 + r ) < 1
• yt is the consumer’s endowment at t
The consumer’s plan for (c_t, a_{t+1}) has to obey the boundary condition ∑_{t=0}^∞ β^t a_t² < +∞

    y_t = ρ y_{t−1},   t ≥ 1,     (3.149)

over {c_t, τ_t}_{t=0}^∞ subject to the implementability constraints in (3.148) for t ≥ 0 and
λ t = β (1 + r ) λ t +1 (3.151)
for t ≥ 0, where λt ≡ (b − ct )
a. Argue that (3.151) is the Euler equation for a consumer who maximizes (3.147) subject to (3.148),
taking {τt } as a given sequence
b. Formulate the planner’s problem as a Stackelberg problem
c. For β = .95, b = 30, β(1 + r ) = .95, formulate an artificial optimal linear regulator problem and
use it to solve the Stackelberg problem
d. Give a recursive representation of the Stackelberg plan for τt
Contents
• Optimal Taxation
– Overview
– The Ramsey Problem
– Implementation
– Examples
– Exercises
– Solutions
Overview
We want to study the dynamics of tax rates, tax revenues, and government debt under a Ramsey plan
Because the Lucas and Stokey model features state-contingent government debt, the government
debt dynamics differ substantially from those in a model of Robert Barro [Bar79]
The treatment given here closely follows this manuscript, prepared by Thomas J. Sargent and
Francois R. Velde
We cover only the key features of the problem in this lecture, leaving you to refer to that source
for additional results and intuition
Model Features
• Linear quadratic (LQ) model
• Representative household
• Stochastic dynamic programming over an infinite horizon
• Distortionary taxation
We begin by outlining the key assumptions regarding technology, households and the government
sector
Technology Labor can be converted one-for-one into a single, non-storable consumption good
In the usual spirit of the LQ model, the amount of labor supplied in each period is unrestricted
This is unrealistic, but helpful when it comes to solving the model
Realistic labor supply can be induced by suitable parameter values
Households Consider a representative household who chooses a path {`t , ct } for labor and con-
sumption to maximize
    −(1/2) E ∑_{t=0}^∞ β^t [ (c_t − b_t)² + ℓ_t² ]     (3.152)
subject to the budget constraint
    E ∑_{t=0}^∞ β^t p_t^0 [ d_t + (1 − τ_t)ℓ_t + s_t − c_t ] = 0     (3.153)
Here

• β is a discount factor in (0, 1)

• p_t^0 is a scaled Arrow-Debreu state price at time t

• b_t is a stochastic preference parameter

• d_t is an endowment process

• s_t is a promised coupon payment on existing government debt
Government The government imposes a linear tax on labor income, fully committing to a
stochastic path of tax rates at time zero
The government also issues state-contingent debt
Given government tax and borrowing plans, we can construct a competitive equilibrium with
distorting government taxes
Among all such competitive equilibria, the Ramsey plan is the one that maximizes the welfare of
the representative consumer
Output is divided between private consumption and government purchases according to the resource constraint

    c_t + g_t = d_t + ℓ_t     (3.154)
Government budget constraint Where p0t is a scaled Arrow-Debreu price, the time zero govern-
ment budget constraint is
    E ∑_{t=0}^∞ β^t p_t^0 ( s_t + g_t − τ_t ℓ_t ) = 0     (3.155)
Solution Step one is to obtain the first order conditions for the household’s problem, taking taxes
and prices as given
Letting µ be the Lagrange multiplier on (3.153), the first order conditions are p_t = (b_t − c_t)/µ and ℓ_t = (b_t − c_t)(1 − τ_t)
Rearranging and normalizing at µ = b0 − c0 , we can write these conditions as
bt − c t `t
pt = and τt = 1 − (3.156)
b0 − c0 bt − c t
Substituting (3.156) into the government’s budget constraint (3.155) yields
    E ∑_{t=0}^∞ β^t [ (b_t − c_t)(s_t + g_t − ℓ_t) + ℓ_t² ] = 0     (3.157)
The Ramsey problem now amounts to maximizing (3.152) subject to (3.157) and (3.154)
The associated Lagrangian is
    L = E ∑_{t=0}^∞ β^t { −(1/2)[ (c_t − b_t)² + ℓ_t² ] + λ[ (b_t − c_t)(ℓ_t − s_t − g_t) − ℓ_t² ] + µ_t [ d_t + ℓ_t − c_t − g_t ] }     (3.158)
The first order conditions associated with ct and `t are
    −(c_t − b_t) + λ[ −ℓ_t + (g_t + s_t) ] = µ_t

and

    ℓ_t − λ[ (b_t − c_t) − 2ℓ_t ] = µ_t
Combining these last two equalities with (3.154) and working through the algebra, one can show
that
    ℓ_t = ℓ̄_t − νm_t   and   c_t = c̄_t − νm_t     (3.159)
where

• ν := λ / (1 + 2λ)

• ℓ̄_t := (b_t − d_t + g_t) / 2

• c̄_t := (b_t + d_t − g_t) / 2

• m_t := (b_t − d_t − s_t) / 2
Apart from ν, all of these quantities are expressed in terms of exogenous variables
To solve for ν, we can use the government’s budget constraint again
The term inside the brackets in (3.157) is (b_t − c_t)(s_t + g_t) − (b_t − c_t)ℓ_t + ℓ_t²

Using (3.159), the definitions above and the fact that ℓ̄_t = b_t − c̄_t, this term can be rewritten as (b_t − c̄_t)(g_t + s_t) + 2m_t²(ν² − ν), so the budget constraint becomes

    E ∑_{t=0}^∞ β^t { (b_t − c̄_t)(g_t + s_t) } + (ν² − ν) E ∑_{t=0}^∞ β^t { 2m_t² } = 0     (3.160)
Solving the Quadratic Term Let’s consider how to obtain the term ν in (3.160)

If we can solve the two expected geometric sums

    b_0 := E ∑_{t=0}^∞ β^t { (b_t − c̄_t)(g_t + s_t) }   and   a_0 := E ∑_{t=0}^∞ β^t { 2m_t² }     (3.161)

then the problem reduces to solving

    b_0 + a_0 (ν² − ν) = 0

for ν
Provided that 4b0 < a0 , there is a unique solution ν ∈ (0, 1/2), and a unique corresponding λ > 0
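A one-function sketch of this step (assuming a0 and b0 have already been computed as in (3.161)):

import numpy as np

def solve_nu(a0, b0):
    "Solve b0 + a0 * (nu**2 - nu) = 0 for the root nu in (0, 1/2)."
    disc = a0**2 - 4 * a0 * b0
    if disc < 0:
        raise ValueError("Requirement 4 * b0 < a0 is violated")
    nu = 0.5 * (a0 - np.sqrt(disc)) / a0   # take the smaller root
    if not 0 < nu < 0.5:
        raise ValueError("No solution in (0, 1/2)")
    return nu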
Let’s work out how to solve the expectations terms in (3.161)
For the first one, the random variable (b_t − c̄_t)(g_t + s_t) inside the summation can be expressed as

    (1/2) x_t' (S_b − S_d + S_g)' (S_g + S_s) x_t

For the second expectation in (3.161), the random variable 2m_t² can be written as

    (1/2) x_t' (S_b − S_d − S_s)' (S_b − S_d − S_s) x_t
It follows that both of these expectations terms are special cases of the expression
    q(x_0) = E ∑_{t=0}^∞ β^t x_t' H x_t     (3.162)
When the exogenous process is a finite Markov chain with transition matrix P, this expression becomes

    q(x_0) = ∑_{t=0}^∞ β^t (P^t h)[j]     (3.163)

Here

• P^t is the t-th power of the transition matrix P

• h is, with some abuse of notation, the vector (h(x_1), . . . , h(x_N)) with h(x) := x'Hx

• (P^t h)[j] indicates the j-th element of P^t h, where j is the index of the initial state

It can be shown that (3.163) is in fact equal to the j-th element of the vector (I − βP)^{−1}h
This last fact is applied in the calculations below
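A quick numerical check of this fact, using the transition matrix from the discrete example later in this lecture and an arbitrary vector h:

import numpy as np

P = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
h = np.array([1.0, 2.0, 0.5])
beta = 1 / 1.05

# Direct computation of sum_t beta^t (P^t h)[j], truncated at large T
q_direct = sum(beta**t * np.linalg.matrix_power(P, t).dot(h)
               for t in range(2000))
# Closed form: (I - beta P)^{-1} h
q_closed = np.linalg.solve(np.eye(3) - beta * P, h)
print(np.allclose(q_direct, q_closed))  # True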
Other Variables We are interested in tracking several other variables besides the ones described
above
One is the present value of government obligations outstanding at time t, which can be expressed
as
    B_t := E_t ∑_{j=0}^∞ β^j p_{t+j}^t ( τ_{t+j} ℓ_{t+j} − g_{t+j} )     (3.164)
Using our expression for prices and the Ramsey plan, we can also write B_t as

    B_t = E_t ∑_{j=0}^∞ β^j [ (b_{t+j} − c_{t+j})(τ_{t+j} ℓ_{t+j} − g_{t+j}) ] / (b_t − c_t)

Define

    R_{tj}^{−1} := E_t β^j p_{t+j}^t

Here R_{tj} can be thought of as the gross j-period risk-free rate on holding government debt between t and t + j
Furthermore, letting R_t be the one-period risk-free rate, we define

    π_{t+1} := B_{t+1} − R_t [ B_t − (τ_t ℓ_t − g_t) ]

and

    Π_t := ∑_{s=0}^{t} π_s
The term πt+1 is the payout on the public’s portfolio of government debt
As shown in the original manuscript, if we distort one-step-ahead transition probabilities by
the adjustment factor
    ξ_t := p_{t+1}^t / E_t p_{t+1}^t
then Πt is a martingale under the distorted probabilities
See the treatment in the manuscript for more discussion and intuition
For now we will concern ourselves with computation
Implementation
"""
import sys
import numpy as np
from numpy import sqrt, eye, dot, zeros, cumsum
from numpy.random import randn
import scipy.linalg
import matplotlib.pyplot as plt
from collections import namedtuple
from quantecon import nullspace, mc_sample_path, var_quadratic_sum
def compute_paths(T, econ):
    """
    Compute simulated time paths for the exogenous and endogenous
    variables of the Ramsey economy.

    Parameters
    ===========
T: int
Length of the simulation
Returns
========
path: a namedtuple of type 'Path', containing
g - Govt spending
d - Endowment
b - Utility shift parameter
s - Coupon payment on existing debt
c - Consumption
l - Labor
p - Price
tau - Tax rate
rvn - Revenue
B - Govt debt
R - Risk free gross return
pi - One-period risk-free interest rate
Pi - Cumulative rate of return, adjusted
xi - Adjustment factor for Pi
"""
# == Simplify names == #
beta, Sg, Sd, Sb, Ss = econ.beta, econ.Sg, econ.Sd, econ.Sb, econ.Ss
if econ.discrete:
P, x_vals = econ.proc
else:
A, C = econ.proc
if econ.discrete:
state = mc_sample_path(P, init=0, sample_size=T)
x = x_vals[:, state]
else:
# == Generate an initial condition x0 satisfying x0 = A x0 == #
nx, nx = A.shape
x0 = nullspace((eye(nx) - A))
x0 = -x0 if (x0[nx-1] < 0) else x0
x0 = x0 / x0[nx-1]
    # == Assemble the simulated series into a Path namedtuple == #
    path = Path(g=g, d=d, b=b, s=s, c=c, l=l, p=p, tau=tau, rvn=rvn,
                B=B, R=R, pi=pi, Pi=Pi, xi=xi)
    return path
def gen_fig_1(path):
"""
The parameter is the path namedtuple returned by compute_paths(). See
the docstring of that function for details.
"""
T = len(path.c)
# == Prepare axes == #
num_rows, num_cols = 2, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(14, 10))
plt.subplots_adjust(hspace=0.4)
for i in range(num_rows):
for j in range(num_cols):
axes[i, j].grid()
axes[i, j].set_xlabel(r'Time')
bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}
plt.show()
def gen_fig_2(path):
"""
The parameter is the path namedtuple returned by compute_paths(). See
the docstring of that function for details.
"""
T = len(path.c)
# == Prepare axes == #
num_rows, num_cols = 2, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10))
plt.subplots_adjust(hspace=0.5)
bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}
plt.show()
Comments on the Code The function var_quadratic_sum imported from quadsums is for com-
puting the value of (3.162) when the exogenous process { xt } is of the VAR type described above
Below the definition of the function, you will see definitions of two namedtuple objects, Economy
and Path
The first is used to collect all the parameters and primitives of a given LQ economy, while the
second collects output of the computations
In Python, a namedtuple is a popular data type from the collections module of the standard
library that replicates the functionality of a tuple, but also allows you to assign a name to each
tuple element
These elements can then be referenced via dotted attribute notation — see for example the use of path in the function gen_fig_1()
The benefits of using namedtuples:
• Keeps content organized by meaning
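For instance, here is a minimal sketch of how the Economy namedtuple used below can be created and accessed (the field values are placeholders):

from collections import namedtuple

Economy = namedtuple('Economy', ('beta', 'Sg', 'Sd', 'Sb', 'Ss',
                                 'discrete', 'proc'))

econ = Economy(beta=1 / 1.05, Sg=None, Sd=None, Sb=None, Ss=None,
               discrete=False, proc=None)
print(econ.beta)     # fields are accessed via dotted attribute notation
print(econ._fields)  # ('beta', 'Sg', 'Sd', 'Sb', 'Ss', 'discrete', 'proc')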
Examples
The Continuous Case Our first example adopts the VAR specification described above
Regarding the primitives, we set
• β = 1/1.05
• bt = 2.135 and st = dt = 0 for all t
Government spending evolves according to

    g_{t+1} − µ_g = ρ (g_t − µ_g) + C_g w_{g,t+1}

with ρ = 0.7, µ_g = 0.35 and C_g = µ_g √(1 − ρ²) / 10
Here’s the code, from file lqramsey/lqramsey_ar1.py
"""
Filename: lqramsey_ar1.py
Authors: Thomas Sargent, Doc-Jin Jang, Jeong-hun Choi, John Stachurski
"""
import numpy as np
from numpy import array
import lqramsey
# == Parameters == #
beta = 1 / 1.05
rho, mg = .7, .35
A = np.identity(2)
A[0, :] = rho, mg * (1-rho)
C = np.zeros((2, 1))
C[0, 0] = np.sqrt(1 - rho**2) * mg / 10
Sg = array((1, 0)).reshape(1, 2)
Sd = array((0, 0)).reshape(1, 2)
Sb = array((0, 2.135)).reshape(1, 2)
Ss = array((0, 0)).reshape(1, 2)
economy = lqramsey.Economy(beta=beta,
Sg=Sg,
Sd=Sd,
Sb=Sb,
Ss=Ss,
discrete=False,
proc=(A, C))
T = 50
path = lqramsey.compute_paths(T, economy)
lqramsey.gen_fig_1(path)
The Discrete Case Our second example adopts a discrete Markov specification for the exoge-
nous process
Here’s the code, from file lqramsey/lqramsey_discrete.py
"""
Filename: lqramsey_discrete.py
Authors: Thomas Sargent, Doc-Jin Jang, Jeong-hun Choi, John Stachurski
"""
from numpy import array
import lqramsey
# == Parameters == #
beta = 1 / 1.05
P = array([[0.8, 0.2, 0.0],
[0.0, 0.5, 0.5],
[0.0, 0.0, 1.0]])
# == Possible states of the world == #
# Each column is a state of the world. The rows are [g d b s 1]
x_vals = array([[0.5, 0.5, 0.25],
[0.0, 0.0, 0.0],
[2.2, 2.2, 2.2],
[0.0, 0.0, 0.0],
[1.0, 1.0, 1.0]])
Sg = array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = array((0, 1, 0, 0, 0)).reshape(1, 5)
Sb = array((0, 0, 1, 0, 0)).reshape(1, 5)
Ss = array((0, 0, 0, 1, 0)).reshape(1, 5)
economy = lqramsey.Economy(beta=beta,
Sg=Sg,
Sd=Sd,
Sb=Sb,
Ss=Ss,
discrete=True,
proc=(P, x_vals))
T = 15
path = lqramsey.compute_paths(T, economy)
lqramsey.gen_fig_1(path)
Exercises
Solutions
Solution notebook
Contents
• History Dependent Public Policies
– Overview
– Two Sources of History Dependence
– Competitive equilibrium
– Ramsey Problem
– Two Subproblems
– Time Inconsistency
– Credible Policy
– Concluding remarks
Overview
This lecture describes history-dependent public policies and some of their representations
History dependent policies are decision rules that depend on the entire past history of the state
variables
History dependent policies naturally emerge in Ramsey problems
A Ramsey planner (typically interpreted as a government) devises a plan of actions at time t = 0
References The presentation below is based on a recent paper by Evans and Sargent [ES13]
Regarding techniques, we will make use of the methods described in
1. the linear regulator lecture
2. the solving LQ dynamic Stackelberg problems lecture
Ramsey Timing Protocol The first timing protocol models a policy maker who can be said to
‘commit’, choosing a sequence of tax rates once-and-for-all at time 0
7 We could also call a competitive equilibrium a rational expectations equilibrium.
Sequence of Governments Timing Protocol For the second timing protocol we use the notion
of a sustainable plan proposed in [CK90], also referred to as a credible public policy in [Sto89]
A key idea here is that history-dependent policies can be arranged so that, when regarded as a
representative firm’s forecasting functions, they confront policy makers with incentives to confirm
them
We follow Chang [Cha98] in expressing such history-dependent plans recursively
Credibility considerations contribute an additional auxiliary state variable in the form of a
promised value to the planner
It expresses how decisions must unfold to give the government the incentive to confirm private
sector expectations when the government chooses sequentially
Competitive equilibrium
A representative competitive firm, taking the sequences {p_t} and {τ_t} as given,

• chooses {q_{t+1}}_{t=0}^∞ to maximize

    ∑_{t=0}^∞ β^t { p_t q_t − (d/2)(q_{t+1} − q_t)² − τ_t q_t }     (3.166)
Let ut := qt+1 − qt be the firm’s ‘control variable’ at time t
First-order conditions for the representative firm’s problem are
    u_t = (β/d) p_{t+1} + β u_{t+1} − (β/d) τ_{t+1},   t = 0, 1, . . .     (3.167)
Q t +1 = Q t + u t (3.169)
Notation: For any scalar x_t, let \vec{x} = {x_t}_{t=0}^∞
Ramsey Problem
The planner’s objective is cast in terms of consumer surplus net of the firm’s adjustment costs
8 It is important not to set q_t = Q_t prematurely. To make the firm a price taker, this equality should be imposed after and not before solving the firm’s optimization problem.
9 We could instead, perhaps with more accuracy, define a promised marginal value as β(A_0 − A_1 Q_{t+1}) − βτ_{t+1} + u_{t+1}/β, since this is the object to which the firm’s first-order condition instructs it to equate to the marginal cost du_t of u_t = q_{t+1} − q_t. This choice would align better with how Chang [Cha98] chose to express his competitive equilibrium recursively. But given (u_t, Q_t), the representative firm knows (Q_{t+1}, τ_{t+1}), so it is adequate to take u_{t+1} as the intermediate variable that summarizes how \vec{τ}_{t+1} affects the firm’s choice of u_t.
Consumer surplus is
    ∫_0^Q (A_0 − A_1 x) dx = A_0 Q − (A_1/2) Q²
Hence the planner’s one-period return function is
    A_0 Q_t − (A_1/2) Q_t² − (d/2) u_t²     (3.170)
At time t = 0, a Ramsey planner faces the intertemporal budget constraint
    ∑_{t=1}^∞ β^t τ_t Q_t = G_0     (3.171)
subject to (3.171)
Thus, the Ramsey timing protocol is:
1. At time 0, knowing (Q_0, G_0), the Ramsey planner chooses {τ_{t+1}}_{t=0}^∞
Note: In bringing out the timing protocol associated with a Ramsey plan, we run head on into
a set of issues analyzed by Bassetto [Bas05]. This is because our definition of the Ramsey timing
protocol doesn’t completely describe all conceivable actions by the government and firms as time
unfolds. For example, the definition is silent about how the government would respond if firms,
for some unspecified reason, were to choose to deviate from the competitive equilibrium associ-
ated with the Ramsey plan, possibly prompting violation of government budget balance. This is
an example of the issues raised by [Bas05], who identifies a class of government policy problems
whose proper formulation requires supplying a complete and coherent description of all actors’
behavior across all possible histories. Implicitly, we are assuming that a more complete description
of a government strategy could be specified that (a) agrees with ours along the Ramsey outcome,
and (b) suffices uniquely to implement the Ramsey plan by deterring firms from taking actions
that deviate from the Ramsey outcome path.
    ∑_{t=0}^∞ β^t ( A_0 Q_t − (A_1/2) Q_t² − (d/2) u_t² ) + µ ( ∑_{t=0}^∞ β^t τ_t Q_t − G_0 − τ_0 Q_0 )     (3.173)
Two Subproblems
Working backwards, we first present the Bellman equation for the value function that takes both
zt and ut as given. Then we present a value function that takes only z0 as given and is the indirect
utility function that arises from choosing u0 optimally.
Let v( Qt , τt , ut ) be the optimum value function for the time t ≥ 1 government administrator facing
state Qt , τt , ut .
Let w( Q0 ) be the value of the Ramsey plan starting from Q0
Q t +1 = Q t + u t
and
    u_{t+1} = −(A_0/d) + (A_1/d) Q_t + ( A_1/d + 1/β ) u_t + (1/d) τ_{t+1}
Here we regard ut as a state
    w(z_0) = max_{u_0} v(Q_0, 0, u_0)

Define y_t = [z_t ; u_t], where z_t' = [1  Q_t  τ_t] are authentic state variables and u_t is a variable whose time 0 value is a ‘jump’ variable but whose values for dates t ≥ 1 will become state variables that encode history dependence in the Ramsey plan
and where

R = \begin{bmatrix} 0 & -\frac{A_0}{2} & 0 & 0 \\ -\frac{A_0}{2} & \frac{A_1}{2} & -\frac{\mu}{2} & 0 \\ 0 & -\frac{\mu}{2} & 0 & 0 \\ 0 & 0 & 0 & \frac{d}{2} \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ -\frac{A_0}{d} & \frac{A_1}{d} & 0 & \frac{A_1}{d} + \frac{1}{\beta} \end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \frac{1}{d} \end{bmatrix}
where
• P solves the algebraic matrix Riccati equation P = R + βA0 PA − βA0 PB( B0 PB)−1 B0 PA
• the optimal policy function is given by τt+1 = − Fyt for F = ( B0 PB)−1 B0 PA
Recursive Representation An outcome of the preceding results is that the Ramsey plan can be
represented recursively as the choice of an initial marginal utility (or rate of growth of output)
according to a function
u0 = υ ( Q0 | µ ) (3.178)
that obeys (3.177), and the following updating equations for t ≥ 0:

    τ_{t+1} = τ(Q_t, u_t | µ)     (3.179)

    Q_{t+1} = Q_t + u_t     (3.180)

    u_{t+1} = u(Q_t, u_t | µ)     (3.181)
We have conditioned the functions υ, τ, and u by µ to emphasize how the dependence of F on G0
appears indirectly through the Lagrange multiplier µ
An Example Calculation We’ll discuss how to compute µ below but first consider the following
numerical example
We take the parameter set [ A0 , A1 , d, β, Q0 ] = [100, .05, .2, .95, 100] and compute the Ramsey plan
with the following piece of code
import numpy as np
from quantecon import LQ
from quantecon.matrix_eqn import solve_discrete_lyapunov
from scipy.optimize import root
def computeG(A0, A1, d, Q0, tau0, beta, mu):
    """
    Compute the present value of government revenues and the associated
    LQ policy matrices, given the Lagrange multiplier mu.

    Parameters
    ----------
A0 : float
A constant parameter for the inverse demand function
A1 : float
A constant parameter for the inverse demand function
d : float
A constant parameter for quadratic adjustment cost of production
Q0 : float
An initial condition for production
tau0 : float
An initial condition for taxes
beta : float
A constant parameter for discounting
mu : float
Lagrange multiplier
Returns
-------
T0 : array(float)
Present discounted value of government spending
A : array(float)
One of the transition matrices for the states
B : array(float)
Another transition matrix for the states
F : array(float)
Policy rule matrix
P : array(float)
Value function matrix
"""
# Create Matrices for solving Ramsey problem
R = np.array([[0, -A0/2, 0, 0],
[-A0/2, A1/2, -mu/2, 0],
[0, -mu/2, 0, 0],
[0, 0, 0, d/2]])
A = np.array([[1, 0, 0, 0],
[0, 1, 0, 1],
[0, 0, 0, 0],
[-A0/d, A1/d, 0, A1/d+1/beta]])
B = np.array([0, 0, 1, 1/d]).reshape(-1, 1)
    Q = 0

    # == Use the LQ class to solve the planner's problem == #
    lq = LQ(Q, R, A, B, beta=beta)
    P, F, d_lq = lq.stationary_values()

    # == Solve for the optimal jump variable u0 via (3.177) == #
    P21, P22 = P[3, :3], P[3, 3]
    z0 = np.array([1, Q0, tau0]).reshape(-1, 1)
    u0 = float(-P21.dot(z0) / P22)
    y0 = np.vstack([z0, [[u0]]])

    # == Present value of revenues, T0 = y0' Omega y0, using (3.183) == #
    S = np.zeros((4, 4))
    S[1, 2] = 1.0  # y' S y = Q_t * tau_t for y = (1, Q, tau, u)
    AF = A - B.dot(F)
    Omega = solve_discrete_lyapunov(np.sqrt(beta) * AF.T,
                                    beta * AF.T.dot(S).dot(AF))
    T0 = float(y0.T.dot(Omega).dot(y0))

    return T0, A, B, F, P
# == Primitives == #
T = 20
A0 = 100.0
A1 = 0.05
d = 0.20
beta = 0.95
# == Initial conditions == #
mu0 = 0.0025
Q0 = 1000.0
tau0 = 0.0
def gg(mu):
"""
Computes the tax revenues for the government given Lagrangian
multiplier mu.
"""
return computeG(A0, A1, d, Q0, tau0, beta, mu)
# == Initialize vectors == #
y = np.zeros((4, T))
uhat = np.zeros(T)
uhatdif = np.zeros(T)
tauhat = np.zeros(T)
tauhatdif = np.zeros(T-1)
mu = np.zeros(T)
G = np.zeros(T)
GPay = np.zeros(T)
# == Initial conditions == #
G[0] = G0
mu[0] = mu0
uhatdif[0] = 0
uhat[0] = u0
y[:, 0] = np.vstack([z0, u0]).flatten()
for t in range(1, T):
    # update G
    G[t] = (G[t-1] - beta*y[1, t]*y[2, t])/beta
    GPay[t] = beta*y[1, t]*y[2, t]
    # find ff = 0, where ff(mu) is the gap between the revenues that a
    # continuation plan with multiplier mu raises and the required G[t]
    mu[t] = root(ff, mu[t-1]).x
    temp, Atemp, Btemp, Ftemp, Ptemp = gg(mu[t])
if __name__ == '__main__':
print("1 Q tau u")
print(y)
print("-F")
print(-F)
The next figure shows the Ramsey plan for τ and the Ramsey outcome for (Q_t, u_t)
From top to bottom, the panels show Qt , τt and ut := Qt+1 − Qt over t = 0, . . . , 15
The optimal decision rule is τ_{t+1} = −F y_t 10
Notice how the Ramsey plan calls for a high tax at t = 1 followed by a perpetual stream of lower
taxes
Taxing heavily at first and less later expresses the time-inconsistency of the optimal plan for {τ_{t+1}}_{t=0}^∞
10 As promised, τt does not appear in the Ramsey planner’s decision rule for τt+1 .
where T_1 = ∑_{t=2}^∞ β^{t−1} Q_t τ_t
The present values T_0 and T_1 are connected by

    T_0 = βQ_1τ_1 + βT_1

Guess a solution that takes the form T_t = y_t'Ωy_t, then find an Ω that satisfies

    Ω = βA_F' S A_F + βA_F' Ω A_F     (3.183)

where S is a selector matrix satisfying y_t'Sy_t = τ_tQ_t. Equation (3.183) is a discrete Lyapunov equation that can be solved for Ω using QuantEcon’s solve_discrete_lyapunov function
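As a reminder of the convention this function uses, solve_discrete_lyapunov(A, B) returns the X solving X = A X A' + B; a scalar sanity check:

import numpy as np
from quantecon.matrix_eqn import solve_discrete_lyapunov

# X = 0.5 * X * 0.5 + 1  =>  X = 4/3
X = solve_discrete_lyapunov(np.array([[0.5]]), np.array([[1.0]]))
print(X)  # [[1.3333...]]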
The matrix F and therefore the matrix A F = A − BF depend on µ
To find a µ that guarantees that T0 = G0 we proceed as follows:
1. Guess an initial µ, compute a tentative Ramsey plan and the implied T0 = y00 Ω(µ)y0
2. If T_0 > G_0, lower µ; if T_0 < G_0, raise µ

3. Continue iterating on steps 1 and 2 until T_0 = G_0
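A sketch of this search using scipy’s root finder and the computeG function defined above (G0 and mu0 as in the program):

from scipy.optimize import root

def revenue_gap(mu):
    "Gap between the present value of revenues implied by mu and G0."
    T0, A, B, F, P = computeG(A0, A1, d, Q0, tau0, beta, float(mu))
    return float(T0) - G0

mu_star = float(root(revenue_gap, mu0).x)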
Time Inconsistency
    ∑_{t=0}^∞ β^t ( A_0 Q_t − (A_1/2) Q_t² − (d/2) u_t² )
where
• { Qt , ut }∞
t=0 are evaluated under the Ramsey plan whose recursive representation is given by
(3.179), (3.180), (3.181)
• µ0 is the value of the Lagrange multiplier that assures budget balance, computed as de-
scribed above
    w(Q_t, u_t | µ_0) = A_0 Q_t − (A_1/2) Q_t² − (d/2) u_t² + β w(Q_{t+1}, u_{t+1} | µ_0)     (3.185)
for all t ≥ 0, where Qt+1 = Qt + ut
Under the timing protocol affiliated with the Ramsey plan, the planner is committed to the out-
come of iterations on (3.179), (3.180), (3.181)
In particular, when time t comes, the Ramsey planner is committed to the value of u_t implied by the Ramsey plan and receives continuation value w(Q_t, u_t | µ_0)
That the Ramsey plan is time-inconsistent can be seen by subjecting it to the following ‘revolu-
tionary’ test
First, define continuation revenues Gt that the government raises along the original Ramsey out-
come by
    G_t = β^{−t} ( G_0 − ∑_{s=1}^{t} β^s τ_s Q_s )     (3.186)

where {τ_t, Q_t}_{t=0}^∞ is the original Ramsey outcome 11
Then at time t ≥ 1,
1. take ( Qt , Gt ) inherited from the original Ramsey plan as initial conditions
2. invite a brand new Ramsey planner to compute a new Ramsey plan, solving for a new ut , to
be called ǔt , and for a new µ, to be called µ̌t
The revised Lagrange multiplier µ̌t is chosen so that, under the new Ramsey plan, the government
is able to raise enough continuation revenues Gt given by (3.186)
Would this new Ramsey plan be a continuation of the original plan?
The answer is no because along a Ramsey plan, for t ≥ 1, in general it is true that

    w( Q_t, υ(Q_t | µ̌) | µ̌ ) > w( Q_t, u_t | µ_0 )     (3.187)
Inequality (3.187) expresses a continuation Ramsey planner’s incentive to deviate from a time 0
Ramsey plan by
1. resetting ut according to (3.178)
2. adjusting the Lagrange multiplier on the continuation appropriately to account for tax rev-
enues already collected 12
Inequality (3.187) expresses the time-inconsistency of a Ramsey plan
11 The continuation revenues G are the time t present value of revenues that must be raised to satisfy the original
t
time 0 government intertemporal budget constraint, taking into account the revenues already raised from s = 1, . . . , t
under the original Ramsey plan.
12 For example, let the Ramsey plan yield time 1 revenues Q_1τ_1. Then at time 1, a continuation Ramsey planner would want to raise continuation revenues, expressed in units of time 1 goods, of G̃_1 := (G − βQ_1τ_1)/β. To finance the remainder revenues, the continuation Ramsey planner would find a continuation Lagrange multiplier µ by applying the three-step procedure from the previous section to revenue requirements G̃_1.
A Simulation To bring out the time inconsistency of the Ramsey plan, we compare
• the time t values of τt+1 under the original Ramsey plan with
• the value τ̌t+1 associated with a new Ramsey plan begun at time t with initial conditions
( Qt , Gt ) generated by following the original Ramsey plan
Here again G_t := β^{−t} ( G_0 − ∑_{s=1}^{t} β^s τ_s Q_s )
The difference ∆τt := τ̌t − τt is shown in the top panel of the following figure
In the second panel we compare the time t outcome for ut under the original Ramsey plan with
the time t value of this new Ramsey problem starting from ( Qt , Gt )
To compute ut under the new Ramsey plan, we use the following version of formula (3.177):
    ǔ_t = −P_{22}^{−1}(µ̌_t) P_{21}(µ̌_t) z_t

Here z_t is evaluated along the Ramsey outcome path, where we have included µ̌_t to emphasize the dependence of P on the Lagrange multiplier 13

To compute u_t along the Ramsey path, we just iterate the recursions (3.180) and (3.181) starting from the initial Q_0 with u_0 given by formula (3.177)
Thus the second panel indicates how far the reinitialized value ǔ_t departs from the time t outcome along the Ramsey plan
Note that the restarted plan raises the time t + 1 tax and consequently lowers the time t value of
ut
Associated with the new Ramsey plan at t is a value of the Lagrange multiplier on the continuation
government budget constraint
This is the third panel of the figure
The fourth panel plots the required continuation revenues Gt implied by the original Ramsey plan
These figures help us understand the time inconsistency of the Ramsey plan
Further Intuition One feature to note is the large difference between τ̌t+1 and τt+1 in the top
panel of the figure
If the government is able to reset to a new Ramsey plan at time t, it chooses a significantly higher
tax rate than if it were required to maintain the original Ramsey plan
The intuition here is that the government is required to finance a given present value of expendi-
tures with distorting taxes τ
The quadratic adjustment costs prevent firms from reacting strongly to variations in the tax rate
for next period, which tilts a time t Ramsey planner toward using time t + 1 taxes
As was noted before, this is evident in the first figure, where the government taxes the next period
heavily and then falls back to a constant tax from then on
This can also be seen in the third panel of the second figure, where the government pays off a
significant portion of the debt using the first period tax rate
The similarities between the graphs in the last two panels of the second figure reveal that there is a one-to-one mapping between G and µ
The Ramsey plan can then only be time consistent if Gt remains constant over time, which will not
be true in general
Credible Policy
We express the theme of this section in the following: In general, a continuation of a Ramsey plan
is not a Ramsey plan
13 It can be verified that this formula puts non-zero weight only on the components 1 and Qt of zt .
Regard J0 as a discounted present value promised to the Ramsey planner and take it as an
initial condition.
Then after choosing u0 according to
u0 = υ( Q0 , G0 , J0 ), (3.190)
choose subsequent taxes, outputs, and continuation values according to recursions that can be
represented as
τ̂t+1 = τ ( Qt , ut , Gt , Jt ) (3.191)
• Inequality (3.195) expresses that continuation values adjust to deviations in ways that discourage
the government from deviating from the prescribed τ̂t+1
• Inequality (3.195) indicates that two continuation values Jt+1 contribute to sustaining time t
promised value Jt
14 This choice is the key to what [LS12] call ‘dynamic programming squared’.
– Jt+1 (τ̂t+1 , Ĝt+1 ) is the continuation value when the government chooses to confirm the
private sector’s expectation, formed according to the decision rule (3.191) 15
– Jt+1 (τt+1 , Gt+1 ) tells the continuation consequences should the government disappoint
the private sector’s expectations
The internal structure of a credible plan deters deviations from it
That (3.195) maps two continuation values Jt+1 (τt+1 , Gt+1 ) and Jt+1 (τ̂t+1 , Ĝt+1 ) into one promised
value Jt reflects how a credible plan arranges a system of private sector expectations that induces
the government to choose to confirm them
Chang [Cha98] builds on how inequality (3.195) maps two continuation values into one
Remark Let J be the set of values associated with credible plans
Every value J ∈ J can be attained by a credible plan that has a recursive representation of the
form (3.191), (3.192), (3.193)
The set of values can be computed as the largest fixed point of an operator that maps sets of
candidate values into sets of values
Given a value within this set, it is possible to construct a government strategy of the recursive
form (3.191), (3.192), (3.193) that attains that value
In many cases, there is a set of values and associated credible plans
In those cases where the Ramsey outcome is credible, a multiplicity of credible plans is a key part
of the story because, as we have seen earlier, a continuation of a Ramsey plan is not a Ramsey plan
For it to be credible, a Ramsey outcome must be supported by a worse outcome associated with
another plan, the prospect of reversion to which sustains the Ramsey outcome
Concluding remarks
The term ‘optimal policy’, which pervades an important applied monetary economics literature,
means different things under different timing protocols
Under the ‘static’ Ramsey timing protocol (i.e., choose a sequence once-and-for-all), we obtain a
unique plan
Here the phrase ‘optimal policy’ seems to fit well, since the Ramsey planner optimally reaps early
benefits from influencing the private sector’s beliefs about the government’s later actions
When we adopt the sequential timing protocol associated with credible public policies, ‘optimal
policy’ is a more ambiguous description
There is a multiplicity of credible plans
True, the theory explains how it is optimal for the government to confirm the private sector’s
expectations about its actions along a credible plan
But some credible plans have very bad outcomes
15 Note the double role played by (3.191): as a decision rule for the government and as the private sector's rule for
forecasting government actions.
These bad outcomes are central to the theory because it is the presence of bad credible plans that
makes possible better ones by sustaining the low continuation values that appear in the second
line of incentive constraint (3.195)
Recently, many have taken for granted that ‘optimal policy’ means ‘follow the Ramsey plan’ 16
In pursuit of more attractive ways to describe a Ramsey plan when policy making is in practice
done sequentially, some writers have repackaged a Ramsey plan in the following way
• Take a Ramsey outcome - a sequence of endogenous variables under a Ramsey plan - and
reinterpret it (or perhaps only a subset of its variables) as a target path of relationships among
outcome variables to be assigned to a sequence of policy makers 17
• If appropriate (infinite dimensional) invertibility conditions are satisfied, it can happen that
following the Ramsey plan is the only way to hit the target path 18
• The spirit of this work is to say, “in a democracy we are obliged to live with the sequential
timing protocol, so let’s constrain policy makers’ objectives in ways that will force them to
follow a Ramsey plan in spite of their benevolence” 19
• By this sleight of hand, we acquire a theory of an optimal outcome target path
This ‘invertibility’ argument leaves open two important loose ends:
1. implementation, and
2. time consistency
As for (1), repackaging a Ramsey plan (or the tail of a Ramsey plan) as a target outcome sequence
does not confront the delicate issue of how that target path is to be implemented 20
As for (2), it is an interesting question whether the ‘invertibility’ logic can repackage and conceal
a Ramsey plan well enough to make policy makers forget or ignore the benevolent intentions that
give rise to the time inconsistency of a Ramsey plan in the first place
To attain such an optimal output path, policy makers must forget their benevolent intentions be-
cause there will inevitably occur temptations to deviate from that target path, and the implied
relationship among variables like inflation, output, and interest rates along it
Remark The continuation of such an optimal target path is not an optimal target path
16 It is possible to read [Woo03] and [GW10] as making some carefully qualified statements of this type. Some of the
qualifications can be interpreted as advice ‘eventually’ to follow a tail of a Ramsey plan.
17 In our model, the Ramsey outcome would be a path (~p, ~Q).
18 See [GW10].
19 Sometimes the analysis is framed in terms of following the Ramsey plan only from some future date T onwards.
20 See [Bas05] and [ACK10].
Contents
• Optimal Taxation with State-Contingent Debt
– Overview
– A competitive equilibrium with distorting taxes
– Recursive formulation of the Ramsey problem
– Examples
– Implementation
Overview
This lecture describes a celebrated model of optimal fiscal policy by Robert E. Lucas, Jr., and Nancy
Stokey [LS83].
The model revisits classic issues about how to pay for a war.
The model features
• a government that must finance an exogenous stream of government expenditures with
– a flat rate tax on labor
– trades in a full array of Arrow state contingent securities
• a representative consumer who values consumption and leisure
• a linear production function mapping leisure into a single good
• a Ramsey planner who at time t = 0 chooses a plan for taxes and borrowing for all t ≥ 0
After first presenting the model in a space of sequences, we shall reformulate it recursively in
terms of two Bellman equations formulated along lines that we encountered in Dynamic Stackel-
berg models.
As in Dynamic Stackelberg models, to apply dynamic programming we shall define the state
vector artfully
In particular, we shall include forward-looking variables that summarize the optimal responses of
private agents to the Ramsey plan
See also Optimal taxation for an analysis within a linear-quadratic setting
For t ≥ 0, the history st = [st , st−1 , . . . , s0 ] of an exogenous state st has joint probability density
π t ( s t ).
Government purchases g(s) are an exact time-invariant function of s.
Let ct (st ), `t (st ), and nt (st ) denote consumption, leisure, and labor supply, respectively, at history
st
A representative household is endowed with one unit of time that can be divided between leisure
`t and labor nt :
nt (st ) + `t (st ) = 1 (3.197)
Output equals nt (st ) and can be divided between ct (st ) and gt (st )
c t ( s t ) + gt ( s t ) = n t ( s t ) (3.198)
The representative household ranks consumption and leisure streams according to
∑_{t=0}^∞ ∑_{s^t} β^t πt (st ) u[ct (st ), ℓt (st )] (3.199)
where the utility function u is increasing, strictly concave, and three times continuously differen-
tiable in both arguments
The technology pins down a pre-tax wage rate to unity for all t, st
The government imposes a flat rate tax τt (st ) on labor income at time t, history st
There are complete markets in one-period Arrow securities
One unit of an Arrow security issued at time t at history st and promising to pay one unit of time
t + 1 consumption in state st+1 costs pt (st+1 |st )
The government issues one-period Arrow securities each period
The government has a sequence of budget constraints whose time t ≥ 0 component is
gt (st ) = τt (st ) nt (st ) + ∑_{st+1} pt (st+1 |st ) bt+1 (st+1 |st ) − bt (st |st−1 ) (3.200)
where
• pt (st+1 |st ) is the competitive equilibrium price of one-period Arrow state-contingent securi-
ties
• bt (st |st−1 ) is government debt falling due at time t, history st .
Here pt (st+1 |st ) is the price of one unit of consumption at date t + 1 in state st+1 at date t and
history st
The initial government debt b0 (s0 ) is a given initial condition
The representative household has a sequence of budget constraints whose time t ≥ 0 component
is
ct (st ) + ∑_{st+1} pt (st+1 |st ) bt+1 (st+1 |st ) = [1 − τt (st )] nt (st ) + bt (st |st−1 )  ∀ t ≥ 0. (3.201)
The household faces the price system as a price-taker and takes the government policy as given.
A competitive equilibrium with distorting taxes is a feasible allocation, a price system, and a
government policy such that
• Given the price system and the government policy, the allocation solves the household's
optimization problem.
• Given the allocation, government policy, and price system, the government’s budget con-
straint is satisfied for all t, st
Note: There is a large number of competitive equilibria with distorting taxes, indexed by different
government policies.
The Ramsey problem or optimal taxation problem is to choose a competitive equilibrium with
distorting taxes that maximizes (3.199)
Primal approach We apply a popular approach to solving a Ramsey problem, called the primal
approach
The idea is to use first-order conditions for household optimization to eliminate taxes and prices
in favor of quantities, then pose an optimum problem cast entirely in terms of quantities
After Ramsey quantities have been found, taxes and prices can then be unwound from the alloca-
tion
The primal approach uses four steps:
1. Obtain the first-order conditions of the household's problem and solve these conditions for
{q_t^0 (st ), τt (st )}_{t=0}^∞ as functions of the allocation {ct (st ), nt (st )}_{t=0}^∞ .
2. Substitute these expressions for taxes and prices in terms of the allocation into the house-
hold’s present-value budget constraint
• This intertemporal constraint involves only the allocation and is regarded as an imple-
mentability constraint.
3. Find the Ramsey allocation that maximizes the utility of the representative consumer (3.199)
subject to the feasibility constraints (3.197) and (3.198) and the implementability condition
derived in step 2.
• This optimal allocation is called the Ramsey allocation.
4. Use the Ramsey allocation together with the formulas from step 1 to find taxes and prices.
The Arrow-Debreu price system is related to the system of Arrow securities prices through the
recursion:
q_{t+1}^0 (st+1 ) = pt (st+1 |st ) q_t^0 (st ) (3.203)
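To make recursion (3.203) concrete, here is a minimal sketch in Python; the names p (a matrix of Arrow prices p(s′|s), assumed time-invariant for simplicity) and s_path (a realized history of Markov states) are illustrative assumptions, not part of the lecture's code.

import numpy as np

def arrow_debreu_prices(p, s_path):
    # q[t] is the time 0 price of one unit of consumption delivered at
    # time t along the history s_path, built up via recursion (3.203)
    q = np.empty(len(s_path))
    q[0] = 1.0  # the time 0 good is the numeraire
    for t in range(len(s_path) - 1):
        q[t + 1] = p[s_path[t], s_path[t + 1]] * q[t]
    return q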
To approach the Ramsey problem, we study the household’s optimization problem
First-order conditions for the household's problem for ℓt (st ) and bt (st+1 |st ), respectively, imply
(1 − τt (st )) = uℓ (st ) / uc (st ) (3.204)
and
pt+1 (st+1 |st ) = β π(st+1 |st ) uc (st+1 ) / uc (st ) (3.205)
where π(st+1 |st ) is the probability distribution of st+1 conditional on history st
Equation (3.205) implies that the Arrow-Debreu price system satisfies
q_t^0 (st ) = β^t πt (st ) uc (st ) / uc (s0 ) (3.206)
Use the first-order conditions (3.204) and (3.205) to eliminate taxes and prices from (3.202) to derive
the implementability condition
∑_{t=0}^∞ ∑_{s^t} β^t πt (st ) [uc (st ) ct (st ) − uℓ (st ) nt (st )] − uc (s0 ) b0 = 0. (3.207)
The Ramsey problem is to maximize (3.199) subject to (3.207).
To solve it, attach a Lagrange multiplier Φ to the implementability condition (3.207) and form the Lagrangian
J = ∑_{t=0}^∞ ∑_{s^t} β^t πt (st ) { V[ct (st ), nt (st ), Φ] + θt (st ) [nt (st ) − ct (st ) − gt (st )] } − Φ uc (s0 ) b0 , (3.208)
where
V[ct (st ), nt (st ), Φ] = u[ct (st ), 1 − nt (st )] + Φ [uc (st ) ct (st ) − uℓ (st ) nt (st )] (3.209)
and {θt (st ); ∀st }_{t≥0} is a sequence of Lagrange multipliers on the feasibility conditions (3.198)
For given initial government debt b0 , we want to maximize J with respect to {ct (st ), nt (st ); ∀st }t≥0 .
The first-order conditions for the Ramsey problem for periods t ≥ 1 and t = 0, respectively, are
ct (st ): (1 + Φ) uc (st ) + Φ [ucc (st ) ct (st ) − uℓc (st ) nt (st )] − θt (st ) = 0,  t ≥ 1
nt (st ): −(1 + Φ) uℓ (st ) − Φ [ucℓ (st ) ct (st ) − uℓℓ (st ) nt (st )] + θt (st ) = 0,  t ≥ 1 (3.210)
and
c0 (s0 , b0 ): (1 + Φ) uc (s0 ) + Φ [ucc (s0 ) c0 (s0 ) − uℓc (s0 ) n0 (s0 )] − θ0 (s0 ) − Φ ucc (s0 ) b0 = 0
n0 (s0 , b0 ): −(1 + Φ) uℓ (s0 ) − Φ [ucℓ (s0 ) c0 (s0 ) − uℓℓ (s0 ) n0 (s0 )] + θ0 (s0 ) + Φ ucℓ (s0 ) b0 = 0. (3.211)
It is instructive to use the first-order conditions (3.210) for t ≥ 1 to eliminate the multiplier θt (st ).
For convenience, we suppress the time subscript and the index st and obtain a single equation, (3.212), that involves only c, n, and g.
If two histories st and s̃j imply the same current government expenditure, gt (st ) = gj (s̃j ) = g,
then it follows from (3.212) that the optimal choices of consumption and leisure, (ct (st ), ℓt (st )) and
(cj (s̃j ), ℓj (s̃j )), are identical.
The proposition affirms that the optimal allocation is a function of the current realized quantity
of government purchases g only and does not depend on the specific history leading up to that
outcome.
Thus, while b0 influences c0 and n0 , there appears no analogous variable bt that influences ct and
nt for t ≥ 1.
The absence of bt as a determinant of the Ramsey allocation for t ≥ 1 but not for t = 0 is a tell-tale
sign of the time-inconsistency of a Ramsey plan.
Φ has to take a value that assures that the consumer’s and the government’s budget constraints
are both satisfied at a candidate Ramsey allocation and price system associated with that Φ.
Further specialization At this point, it is useful to specialize the model in the following way.
We assume that s is governed by a finite state Markov chain with states s ∈ [1, . . . , S] and transition
matrix Π, where
Π(s0 |s) = Prob(st+1 = s0 |st = s).
Also, assume that government purchases g are an exact time-invariant function g(s) of s.
We maintain these assumptions throughout the remainder of this lecture.
Determining Φ We complete the Ramsey plan by computing the Lagrange multiplier Φ on the
implementability constraint (3.207)
Government budget balance restricts Φ via the following line of reasoning.
The household’s first-order conditions imply
(1 − τt (st )) = ul (st ) / uc (st ) (3.213)
and
pt+1 (st+1 |st ) = β Π(st+1 |st ) uc (st+1 ) / uc (st ). (3.214)
Substituting from (3.213), (3.214), and the feasibility condition (3.198) into the recursive version
(3.201) of the household budget constraint gives
uc (st )[nt (st ) − gt (st )] + β ∑_{st+1} Π(st+1 |st ) uc (st+1 ) bt+1 (st+1 |st ) = ul (st ) nt (st ) + uc (st ) bt (st |st−1 ). (3.215)
Now define xt (st ) ≡ uc (st ) bt (st |st−1 ). Because for t ≥ 1 consumption, labor, and hence uc depend only on the current Markov state, (3.215) can be written as
uc (s)[n(s) − g(s)] + β ∑_{s′} Π(s′|s) x′(s′) = ul (s) n(s) + x(s), (3.216)
where s′ denotes a next period value of s and x′(s′) denotes a next period value of x.
Equation (3.216) is easy to solve for x (s) for s = 1, . . . , S.
If we let ~n, ~g, ~x denote S × 1 vectors whose ith elements are the respective n, g, and x values when
s = i, and let Π be the transition matrix for the Markov state s, then we can express (3.216) as the matrix
equation
~uc (~n − ~g) + βΠ~x = ~ul~n + ~x. (3.217)
This is a system of S linear equations in the S × 1 vector ~x, whose solution is
~x = (I − βΠ)^{−1} [~uc (~n − ~g) − ~ul~n]. (3.218)
In these equations, by ~uc~n, for example, we mean element-by-element multiplication of the two
vectors.
After solving for ~x, we can find b(st |st−1 ) in Markov state st = s from b(s) = x(s)/uc (s), or from the matrix
equation
~b = ~x / ~uc , (3.219)
where division here means element-by-element division of the respective components of the S × 1
vectors ~x and ~uc .
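As a check on the algebra, the following sketch implements (3.217)-(3.219) directly with NumPy. The vector names uc, ul, n, g and the matrix Pi are assumed stand-ins for the S-vectors and transition matrix described above.

import numpy as np

def solve_x_and_b(uc, ul, n, g, Pi, beta):
    # (3.217) rearranged: (I - beta*Pi) x = uc*(n - g) - ul*n,
    # with * denoting element-by-element multiplication
    rhs = uc * (n - g) - ul * n
    x = np.linalg.solve(np.eye(len(n)) - beta * Pi, rhs)   # (3.218)
    b = x / uc                                             # (3.219)
    return x, b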
Here is a computational algorithm:
1. Start with a guess for the value for Φ, then use the first-order conditions and the feasibility
conditions to compute c(st ), n(st ) for s ∈ [1, . . . , S] and c0 (s0 , b0 ) and n0 (s0 , b0 ), given Φ.
These are 2(S + 1) equations in 2(S + 1) unknowns.
2. Solve the S equations (3.218) for the S elements of ~x. These depend on Φ.
3. Find a Φ that satisfies
uc,0 b0 = uc,0 (n0 − g0 ) − ul,0 n0 + β ∑_{s=1}^{S} Π(s|s0 ) x(s) (3.220)
by gradually raising Φ if the left side of (3.220) exceeds the right side and lowering Φ if the
left side is smaller.
4. After computing a Ramsey allocation, we can recover the flat tax rate on labor from (3.204)
and the implied one-period Arrow securities prices from (3.205)
In summary, when gt is a time invariant function of a Markov state st , a Ramsey plan can be
constructed by solving 3S + 3 equations in S components each of ~c, ~n, and ~x together with n0 , c0 ,
and Φ.
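Step 3 of the algorithm can be implemented, for example, with a simple bisection on Φ. In the sketch below, resid is a hypothetical function that carries out steps 1 and 2 for a candidate Φ and returns the left side of (3.220) minus the right side; the bracketing interval is likewise an assumption.

def find_Phi(resid, Phi_lo=0.0, Phi_hi=1.0, tol=1e-10):
    # Bisection: raise Phi when the left side of (3.220) exceeds the
    # right side, lower it otherwise, exactly as described above
    while Phi_hi - Phi_lo > tol:
        Phi = 0.5 * (Phi_lo + Phi_hi)
        if resid(Phi) > 0:
            Phi_lo = Phi
        else:
            Phi_hi = Phi
    return 0.5 * (Phi_lo + Phi_hi)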
Recursive formulation of the Ramsey problem
The variable xt (st ) = uc (st )bt (st |st−1 ) in equation (3.215) appears to be a purely “forward-looking” variable.
But xt (st ) is also a natural candidate for a state variable in a recursive formulation of the Ramsey
problem
Intertemporal delegation To express a Ramsey plan recursively, we imagine that a time 0 Ram-
sey planner is followed by a sequence of continuation Ramsey planners at times t = 1, 2, . . ..
A “continuation Ramsey planner” has a different objective function and faces different constraints
than a Ramsey planner.
A key step in representing a Ramsey plan recursively is to regard the marginal utility scaled gov-
ernment debts xt (st ) = uc (st )bt (st |st−1 ) as predetermined quantities that continuation Ramsey
planners at times t ≥ 1 are obligated to attain.
A time t ≥ 1 continuation Ramsey planner delivers xt by choosing a suitable nt , ct pair and a list
of st+1 -contingent continuation quantities xt+1 to bequeath to a time t + 1 continuation Ramsey
planner.
A time t ≥ 1 continuation Ramsey planner faces xt , st as state variables.
But the time 0 Ramsey planner faces b0 , not x0 , as a state variable.
Furthermore, the Ramsey planner cares about (c0 (s0 ), `0 (s0 )), while continuation Ramsey plan-
ners do not.
The time 0 Ramsey planner hands x1 as a function of s1 to a time 1 continuation Ramsey planner.
These lines of delegated authorities and responsibilities across time express the continuation Ram-
sey planners’ obligations to implement their parts of the original Ramsey plan, designed once-and-
for-all at time 0.
Two Bellman equations After st has been realized at time t ≥ 1, the state variables confronting
the time t continuation Ramsey planner are ( xt , st ).
The continuation Ramsey problem The Bellman equation for a time t ≥ 1 continuation Ramsey
planner is
V(x, s) = max_{n, {x′(s′)}} u(n − g(s), 1 − n) + β ∑_{s′∈S} Π(s′|s) V(x′, s′) (3.221)
where maximization over n and the S elements of x′(s′) is subject to the single implementability
constraint for t ≥ 1
x = uc (n − g(s)) − ul n + β ∑_{s′∈S} Π(s′|s) x′(s′) (3.222)
The Ramsey problem The Bellman equation for the time 0 Ramsey planner is
W(b0 , s0 ) = max_{n0 , {x′(s1 )}} u(n0 − g0 , 1 − n0 ) + β ∑_{s1 ∈S} Π(s1 |s0 ) V(x′(s1 ), s1 ) (3.224)
where the maximization over n0 and the S elements of x′(s1 ) is subject to the time 0 implementability
constraint
uc,0 b0 = uc,0 (n0 − g0 ) − ul,0 n0 + β ∑_{s1 ∈S} Π(s1 |s0 ) x′(s1 ) (3.225)
State variable degeneracy Equations (3.230) and (3.231) imply that Φ0 = Φ1 and that
Vx ( xt , st ) = Φ0 (3.232)
for all t ≥ 1.
When V is concave in x, this implies state-variable degeneracy along a Ramsey plan in the sense that
for t ≥ 1, xt will be a time-invariant function of st .
Given Φ0 , this function mapping st into xt can be expressed as a vector ~x that solves (3.218)
evaluated at the n and c that, as functions of g, are associated with Φ = Φ0 .
Manifestations of time inconsistency While the marginal utility adjusted level of government
debt xt is a key state variable for the continuation Ramsey planners at t ≥ 1, it is not a state
variable at time 0.
The time 0 Ramsey planner faces b0 , not x0 = uc,0 b0 , as a state variable.
The discrepancy in state variables faced by the time 0 Ramsey planner and the time t ≥ 1 con-
tinuation Ramsey planners captures the differing obligations and incentives faced by the time 0
Ramsey planner and the time t ≥ 1 continuation Ramsey planners.
• The time 0 Ramsey planner is obligated to honor government debt b0 measured in time 0
consumption goods
• The time 0 Ramsey planner can manipulate the value of government debt as measured by
uc,0 b0 .
• In contrast, time t ≥ 1 continuation Ramsey planners are obligated not to alter values of
debt, as measured by uc,t bt , that they inherit from an earlier Ramsey planner or continuation
Ramsey planner.
When government expenditures gt are a time invariant function of a Markov state st , a Ramsey
plan and associated Ramsey allocation feature marginal utilities of consumption uc (st ) that, given
Φ, for t ≥ 1 depend only on st , but that for t = 0 depend on b0 as well.
This means that uc (st ) will be a time invariant function of st for t ≥ 1, but except when b0 = 0, a
different function for t = 0.
This in turn means that prices of one period Arrow securities pt (st+1 |st ) = p(st+1 |st ) will be the
same time invariant functions of (st+1 , st ) for t ≥ 1, but a different function p0 (s1 |s0 ) for t = 0,
except when b0 = 0.
The differences between these time 0 and time t ≥ 1 objects reflect the workings of the Ramsey
planner’s incentive to manipulate Arrow security prices and, through them, the value of initial
government debt b0 .
Examples
Anticipated One Period War This example illustrates in a simple setting how a Ramsey planner
manages uncertainty.
Government expenditures are known for sure in all periods except one
• For t < 3 or t > 3 we assume that gt = gl = 0.1.
• At t = 3 a war occurs with probability 0.5.
– If there is war, g3 = gh = 0.2
– If there is no war, g3 = gl = 0.1.
We define the components of the state vector as the following six (t, g) pairs:
(0, gl ), (1, gl ), (2, gl ), (3, gl ), (3, gh ), (t ≥ 4, gl ).
We think of these 6 states as corresponding to s = 1, 2, 3, 4, 5, 6.
We assume the one-period utility function
u(c, n) = c^{1−σ}/(1 − σ) − n^{1+γ}/(1 + γ)
Note: For convenience in terms of matching our computer code, we have expressed utility as a
function of n rather than leisure l
This has the consequence of raising the time t = 0 value of the gross interest rate for risk-free loans
between periods t and t + 1, which equals
Rt = uc,t / (β Et [uc,t+1 ])
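In code, this gross rate can be computed state by state. A minimal sketch, assuming uc_t is uc,t , uc_next holds uc across next-period states, and probs holds the conditional transition probabilities:

import numpy as np

def risk_free_rate(uc_t, uc_next, probs, beta):
    # R_t = u_{c,t} / (beta * E_t[u_{c,t+1}])
    return uc_t / (beta * np.dot(probs, uc_next))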
A tax policy that makes time t = 0 consumption higher than time t = 1 consumption evidently
increases the one-period risk-free interest rate, Rt , at t = 0
Raising the time t = 0 risk-free interest rate makes time t = 0 consumption goods cheaper relative
to consumption goods at later dates, thereby lowering the value uc,0 b0 of initial government debt
b0
We see this in a figure below that plots the time path for the risk free interest rate under both
realizations of the time t = 3 government expenditure shock
The figure illustrates how the government lowers the interest rate at time 0 by raising consump-
tion.
• These purchases are designed in such a way that regardless of whether or not there is a war
at t = 3, the government will begin period t = 4 with the same government debt
• This time t = 4 debt level is one that can be serviced with revenues from the constant tax
rate set at times t ≥ 1
At times t ≥ 4 the government rolls over its debt, knowing that the tax rate is set at level required
to service the interest payments on the debt and government expenditures
Time 0 manipulation of interest rate We have seen that when b0 > 0, the Ramsey plan sets
the time t = 0 tax partly with an eye toward raising a risk-free interest rate for one-period loans
between times t = 0 and t = 1
By raising this interest rate, the plan makes time t = 0 goods cheap relative to consumption goods
at later times
By doing this, it lowers the value of time t = 0 debt that somehow it has inherited and must
finance
Time 0 and Time Inconsistency In the preceding example, the Ramsey tax rate at time 0 differs
from its value at time 1
To explore what is going on here, let’s simplify things by removing the possibility of war at time
t=3
The Ramsey problem then includes no randomness because gt = gl for all t
The figure below plots the Ramsey tax rates at time t = 0 and time t ≥ 1 as functions of the initial
government debt
The figure indicates that if the government enters with positive debt, it sets a tax rate at t = 0 that
is less than all later tax rates
By setting a lower tax rate at t = 0, the government raises consumption, which reduces the value
uc,0 b0 of its initial debt
It does this by increasing c0 and thereby lowering uc,0
Conversely, if b0 < 0, the Ramsey planner sets the tax rate at t = 0 higher than in subsequent
periods.
A side effect of lowering time t = 0 consumption is that it raises the one-period interest rate at
time 0 above that of subsequent periods.
There are only two values of initial government debt at which the tax rate is constant for all t ≥ 0
The first is b0 = 0.
• Here the government can’t use the t = 0 tax rate to alter the value of the initial debt
The second occurs when the government enters with sufficiently large assets that the Ramsey
planner can achieve first best and sets τt = 0 for all t
It is only for these two values of initial government debt that the Ramsey plan is time consistent.
Another way of saying this is that, except for these two values of initial government debt, a con-
tinuation of a Ramsey plan is not a Ramsey plan
To illustrate this, consider a Ramsey planner who starts with an initial government debt b1 associ-
ated with one of the Ramsey plans computed above
Call τ1R the time t = 0 tax rate chosen by the Ramsey planner confronting this value for initial
government debt.
The figure below shows both the tax rate at time 1 chosen by our original Ramsey planner and
what the new Ramsey planner would choose for its time t = 0 tax rate.
The tax rates in the figure are equal for only two values of initial government debt.
Tax Smoothing and non-CES preferences The complete tax smoothing for t ≥ 1 in the preced-
ing example is a consequence of our having assumed CES preferences
To see what is driving this outcome, we begin by noting that the tax rate will be a time invariant
function τ (Φ, g) of the Lagrange multiplier on the implementability constraint and government
expenditures
For CES preferences, we can exploit the relations Ucc c = −σUc and Unn n = γUn to derive
(1 + (1 − σ)Φ) Uc + (1 + (1 + γ)Φ) Un = 0
from the first-order conditions.
This equation immediately implies that the tax rate is constant
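Spelling this out: with the household's first-order condition (3.204) written as 1 − τ = uℓ /uc = −Un /Uc , the relation just displayed (as reconstructed here) pins down the tax rate as a function of Φ alone:

τ = 1 + Un /Uc = 1 − (1 + (1 − σ)Φ)/(1 + (1 + γ)Φ) = (σ + γ)Φ / (1 + (1 + γ)Φ)

Since σ, γ, and Φ are constants across time and states, so is τ.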
For other preferences, the tax rate may not be constant.
For example, let the period utility function depart from the CES form above, and suppose that gt follows a two-state i.i.d. process with equal weights on gl and gh
The figure below plots a sample path of the Ramsey tax rate
We computed the tax rate by using both the sequential and recursive approaches described above
As should be expected, the recursive and sequential solutions produce almost identical allocations
Unlike outcomes with CES preferences, the tax rate is not perfectly smoothed
Instead the government raises the tax rate when gt is high
Implementation
"""
@author: dgevans
"""
import numpy as np
from scipy.optimize import root
from scipy.optimize import fmin_slsqp
from scipy.interpolate import UnivariateSpline
from quantecon import compute_fixed_point
from quantecon.markov import mc_sample_path
class Planners_Allocation_Sequential(object):
    '''
    Class returns planner's allocation as a function of the multiplier on the
    implementability constraint mu
    '''
    def __init__(self, Para):
        '''
        Initializes the class from the calibration Para
        '''
        self.beta = Para.beta
        self.Pi = Para.Pi
        self.G = Para.G
        self.S = len(Para.Pi)  # number of states
        self.Theta = Para.Theta
        self.Para = Para
        # now find the first best allocation
        self.find_first_best()

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        Para = self.Para
        S, Theta, Uc, Un, G = self.S, self.Theta, Para.Uc, Para.Un, self.G

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack(
                [Theta * Uc(c, n) + Un(c, n), Theta * n - c - G]
            )

        res = root(res, 0.5 * np.ones(2 * S))
        if not res.success:
            raise Exception('Could not find first best')
        self.cFB = res.x[:S]
        self.nFB = res.x[S:]
        self.XiFB = Uc(self.cFB, self.nFB)  # multiplier on the resource constraint
        self.zFB = np.hstack([self.cFB, self.nFB, self.XiFB])
    def time1_allocation(self, mu):
        '''
        Computes optimal allocation for time t >= 1 for a given mu
        '''
        Para = self.Para
        S, Theta, G = self.S, self.Theta, self.G
        Uc, Ucc, Un, Unn = Para.Uc, Para.Ucc, Para.Un, Para.Unn

        def FOC(z):
            c = z[:S]
            n = z[S:2*S]
            Xi = z[2*S:]
            return np.hstack([
                Uc(c, n) - mu * (Ucc(c, n) * c + Uc(c, n)) - Xi,          # foc c
                Un(c, n) - mu * (Unn(c, n) * n + Un(c, n)) + Theta * Xi,  # foc n
                Theta * n - c - G                                         # resource constraint
            ])

        # find the root of FOC, starting from the first best allocation
        res = root(FOC, self.zFB)
        if not res.success:
            raise Exception('Could not find LS allocation.')
        c, n, Xi = res.x[:S], res.x[S:2*S], res.x[2*S:]
        # now compute x
        I = Uc(c, n) * c + Un(c, n) * n
        x = np.linalg.solve(np.eye(S) - self.beta * self.Pi, I)
        return c, n, x, Xi
    def time0_allocation(self, B_, s_0):
        '''
        Finds the optimal allocation given initial government debt B_ and state s_0
        '''
        Para, Pi, Theta, G, beta = self.Para, self.Pi, self.Theta, self.G, self.beta
        Uc, Ucc, Un, Unn = Para.Uc, Para.Ucc, Para.Un, Para.Unn

        def FOC(z):
            mu, c, n, Xi = z
            xprime = self.time1_allocation(mu)[2]
            return np.hstack([
                Uc(c, n) * (c - B_) + Un(c, n) * n + beta * Pi[s_0].dot(xprime),
                Uc(c, n) - mu * (Ucc(c, n) * (c - B_) + Uc(c, n)) - Xi,
                Un(c, n) - mu * (Unn(c, n) * n + Un(c, n)) + Theta[s_0] * Xi,
                (Theta * n - c - G)[s_0]
            ])

        # find root
        res = root(FOC, np.array([0., self.cFB[s_0], self.nFB[s_0], self.XiFB[s_0]]))
        if not res.success:
            raise Exception('Could not find time 0 LS allocation.')
        return res.x
    def time1_value(self, mu):
        '''
        Find the value associated with multiplier mu
        '''
        c, n, x, Xi = self.time1_allocation(mu)
        U = self.Para.U(c, n)
        V = np.linalg.solve(np.eye(self.S) - self.beta * self.Pi, U)
        return c, n, x, V

    def Tau(self, c, n):
        '''
        Computes the labor tax rate Tau given c, n from 1 - Tau = -Un/(Theta*Uc)
        '''
        Para = self.Para
        Uc, Un = Para.Uc(c, n), Para.Un(c, n)
        return 1 + Un / (self.Theta * Uc)
    def simulate(self, B_, s_0, T, sHist=None):
        '''
        Simulates the planner's policies for T periods
        '''
        Para, Pi, beta = self.Para, self.Pi, self.beta
        Uc = Para.Uc
        if sHist is None:
            sHist = mc_sample_path(Pi, s_0, T)
        cHist, nHist, Bhist, TauHist, muHist = np.zeros((5, T))
        RHist = np.zeros(T - 1)
        # time 0
        mu, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
        TauHist[0] = self.Tau(cHist[0], nHist[0])[s_0]
        Bhist[0] = B_
        muHist[0] = mu
        # time 1 onward
        for t in range(1, T):
            c, n, x, Xi = self.time1_allocation(mu)
            Tau = self.Tau(c, n)
            u_c = Uc(c, n)
            s = sHist[t]
            Eu_c = Pi[sHist[t - 1]].dot(u_c)
            cHist[t], nHist[t], Bhist[t], TauHist[t] = c[s], n[s], x[s] / u_c[s], Tau[s]
            RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (beta * Eu_c)
            muHist[t] = mu
        return cHist, nHist, Bhist, TauHist, sHist, muHist, RHist
class Planners_Allocation_Bellman(object):
    '''
    Compute the planner's allocation by solving a Bellman equation.
    '''
    def __init__(self, Para, mugrid):
        '''
        Initializes the class from the calibration Para
        '''
        self.beta = Para.beta
        self.Pi = Para.Pi
        self.G = Para.G
        self.S = len(Para.Pi)  # number of states
        self.Theta = Para.Theta
        self.Para = Para
        self.mugrid = mugrid
    def solve_time1_bellman(self):
        '''
        Solve the time 1 Bellman equation for calibration Para and initial grid mugrid0
        '''
        Para, mugrid0 = self.Para, self.mugrid
        S = len(Para.Pi)
        # First get an initial fit from the sequential allocation,
        # evaluated on the grid of multipliers mugrid0
        PP = Planners_Allocation_Sequential(Para)
        c, n, x, V = map(np.vstack,
                         zip(*map(lambda mu: PP.time1_value(mu), mugrid0)))
        Vf, cf, nf, xprimef = {}, {}, {}, {}
        for s in range(2):
            cf[s] = UnivariateSpline(x[:, s], c[:, s])
            nf[s] = UnivariateSpline(x[:, s], n[:, s])
            Vf[s] = UnivariateSpline(x[:, s], V[:, s])
            for sprime in range(S):
                xprimef[s, sprime] = UnivariateSpline(x[:, s], x[:, s])
        policies = [cf, nf, xprimef]
        # create xgrid
        xbar = [x.min(0).max(), x.max(0).min()]
        xgrid = np.linspace(xbar[0], xbar[1], len(mugrid0))
        self.xgrid = xgrid
        # now iterate on the Bellman equation until the value function converges
        T = BellmanEquation(Para, xgrid, policies)
        diff = 1.
        while diff > 1e-5:
            PF = T(Vf)
            Vfnew, policies = self.fit_policy_function(PF)
            diff = 0.
            for s in range(S):
                diff = max(diff,
                           np.abs((Vf[s](xgrid) - Vfnew[s](xgrid)) / Vf[s](xgrid)).max())
            print(diff)
            Vf = Vfnew
        # store the converged objects; the Bellman operator is then set
        # to solve the time 0 problem
        self.Vf = Vf
        self.policies = policies
        self.T = T
        self.T.time_0 = True
    def fit_policy_function(self, PF):
        '''
        Fits the policy functions PF using the points xgrid using UnivariateSpline
        '''
        xgrid, S = self.xgrid, self.S
        Vf, cf, nf, xprimef = {}, {}, {}, {}
        for s in range(S):
            PFvec = np.vstack([PF(x, s) for x in xgrid])
            Vf[s] = UnivariateSpline(xgrid, PFvec[:, 0], s=0)
            cf[s] = UnivariateSpline(xgrid, PFvec[:, 1], s=0, k=1)
            nf[s] = UnivariateSpline(xgrid, PFvec[:, 2], s=0, k=1)
            for sprime in range(S):
                xprimef[s, sprime] = UnivariateSpline(xgrid, PFvec[:, 3 + sprime], s=0, k=1)
        return Vf, [cf, nf, xprimef]
    def Tau(self, c, n):
        '''
        Computes the labor tax rate Tau given c, n
        '''
        Para = self.Para
        Uc, Un = Para.Uc(c, n), Para.Un(c, n)
        return 1 + Un / (self.Theta * Uc)

    def time0_allocation(self, B_, s0):
        '''
        Finds the optimal allocation given initial government debt B_ and state s0
        '''
        PF = self.T(self.Vf)
        z0 = PF(B_, s0)
        c0, n0, xprime0 = z0[1], z0[2], z0[3:]
        return c0, n0, xprime0
    def simulate(self, B_, s_0, T, sHist=None):
        '''
        Simulates the Ramsey plan for T periods
        '''
        Para, Pi = self.Para, self.Pi
        Uc = Para.Uc
        cf, nf, xprimef = self.policies
        if sHist is None:
            sHist = mc_sample_path(Pi, s_0, T)
        cHist, nHist, Bhist, TauHist, muHist = np.zeros((5, T))
        RHist = np.zeros(T - 1)
        # time 0
        cHist[0], nHist[0], xprime = self.time0_allocation(B_, s_0)
        TauHist[0] = self.Tau(cHist[0], nHist[0])[s_0]
        Bhist[0] = B_
        muHist[0] = 0.
        # time 1 onward
        for t in range(1, T):
            s, x = sHist[t], xprime[sHist[t]]
            c, n, xprime = np.empty(self.S), nf[s](x), np.empty(self.S)
            for shat in range(self.S):
                c[shat] = cf[shat](x)
            for sprime in range(self.S):
                xprime[sprime] = xprimef[s, sprime](x)
            Tau = self.Tau(c, n)[s]
            u_c = Uc(c, n)
            Eu_c = Pi[sHist[t - 1]].dot(u_c)
            muHist[t] = self.Vf[s](x, 1)  # derivative of the value function
            RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (self.beta * Eu_c)
            cHist[t], nHist[t], Bhist[t], TauHist[t] = c[s], n, x / u_c[s], Tau
        return cHist, nHist, Bhist, TauHist, sHist, muHist, RHist
class BellmanEquation(object):
    '''
    Bellman equation for the continuation of the Lucas-Stokey problem
    '''
    def __init__(self, Para, xgrid, policies0):
        '''
        Initializes the class from the calibration Para
        '''
        self.beta = Para.beta
        self.Pi = Para.Pi
        self.G = Para.G
        self.S = len(Para.Pi)  # number of states
        self.Theta = Para.Theta
        self.Para = Para
        self.xbar = [min(xgrid), max(xgrid)]
        self.time_0 = False
        self.z0 = {}
        cf, nf, xprimef = policies0
        for s in range(self.S):
            for x in xgrid:
                xprime0 = np.empty(self.S)
                for sprime in range(self.S):
                    xprime0[sprime] = xprimef[s, sprime](x)
                self.z0[x, s] = np.hstack([cf[s](x), nf[s](x), xprime0])
        self.find_first_best()
    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        Para = self.Para
        S, Theta, Uc, Un, G = self.S, self.Theta, Para.Uc, Para.Un, self.G

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack(
                [Theta * Uc(c, n) + Un(c, n), Theta * n - c - G]
            )

        res = root(res, 0.5 * np.ones(2 * S))
        if not res.success:
            raise Exception('Could not find first best')
        self.cFB = res.x[:S]
        self.nFB = res.x[S:]
        IFB = Uc(self.cFB, self.nFB) * self.cFB + Un(self.cFB, self.nFB) * self.nFB
        # the first best x solves x = I + beta * Pi * x
        self.xFB = np.linalg.solve(np.eye(S) - self.beta * self.Pi, IFB)
        self.zFB = {}
        for s in range(S):
            self.zFB[s] = np.hstack([self.cFB[s], self.nFB[s], self.xFB])
    def __call__(self, Vf):
        '''
        Given the continuation value function Vf for next period, return
        the optimal policy function PF for this period
        '''
        if not self.time_0:
            PF = lambda x, s: self.get_policies_time1(x, s, Vf)
        else:
            PF = lambda B_, s0: self.get_policies_time0(B_, s0, Vf)
        return PF

    def get_policies_time1(self, x, s, Vf):
        '''
        Finds the optimal policies
        '''
        Para, beta, Theta, G, S, Pi = self.Para, self.beta, self.Theta, self.G, self.S, self.Pi
        U, Uc, Un = Para.U, Para.Uc, Para.Un

        def objf(z):
            c, n, xprime = z[0], z[1], z[2:]
            Vprime = np.empty(S)
            for sprime in range(S):
                Vprime[sprime] = Vf[sprime](xprime[sprime])
            return -(U(c, n) + beta * Pi[s].dot(Vprime))

        def cons(z):
            c, n, xprime = z[0], z[1], z[2:]
            return np.hstack([
                x - Uc(c, n) * c - Un(c, n) * n - beta * Pi[s].dot(xprime),
                (Theta * n - c - G)[s]
            ])

        out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s], f_eqcons=cons,
                                              bounds=[(0., 100), (0., 100)] + [self.xbar] * S,
                                              full_output=True, iprint=0)
        if imode > 0:
            raise Exception(smode)
        self.z0[x, s] = out
        return np.hstack([-fx, out])
    def get_policies_time0(self, B_, s0, Vf):
        '''
        Finds the optimal policies
        '''
        Para, beta, Theta, G, S, Pi = self.Para, self.beta, self.Theta, self.G, self.S, self.Pi
        U, Uc, Un = Para.U, Para.Uc, Para.Un

        def objf(z):
            c, n, xprime = z[0], z[1], z[2:]
            Vprime = np.empty(S)
            for sprime in range(S):
                Vprime[sprime] = Vf[sprime](xprime[sprime])
            return -(U(c, n) + beta * Pi[s0].dot(Vprime))

        def cons(z):
            c, n, xprime = z[0], z[1], z[2:]
            return np.hstack([
                -Uc(c, n) * (c - B_) - Un(c, n) * n - beta * Pi[s0].dot(xprime),
                (Theta * n - c - G)[s0]
            ])

        out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
                                              bounds=[(0., 100), (0., 100)] + [self.xbar] * S,
                                              full_output=True, iprint=0)
        if imode > 0:
            raise Exception(smode)
        return np.hstack([-fx, out])
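Neither class above constructs the calibration object Para that it consumes. Here is a minimal sketch of such an object, assuming the CES utility u(c, n) = c^{1−σ}/(1−σ) − n^{1+γ}/(1+γ) used in the examples; the attribute names (beta, Pi, G, Theta, U, Uc, Ucc, Un, Unn) match the accesses in the code above, while the numerical values are illustrative assumptions only.

import numpy as np

sigma, gamma = 2.0, 2.0

class CRRAParameters(object):
    beta = 0.9
    Pi = 0.5 * np.ones((2, 2))     # two-state i.i.d. Markov chain
    G = np.array([0.1, 0.2])       # government expenditure in each state
    Theta = np.ones(2)             # labor productivity in each state
    U = staticmethod(lambda c, n: c**(1 - sigma)/(1 - sigma)
                     - n**(1 + gamma)/(1 + gamma))
    Uc = staticmethod(lambda c, n: c**(-sigma))
    Ucc = staticmethod(lambda c, n: -sigma * c**(-sigma - 1))
    Un = staticmethod(lambda c, n: -n**gamma)
    Unn = staticmethod(lambda c, n: -gamma * n**(gamma - 1))

# hypothetical usage: simulate 10 periods from initial debt 0.5 and state 0
PP = Planners_Allocation_Sequential(CRRAParameters)
cHist, nHist, Bhist, TauHist, sHist, muHist, RHist = PP.simulate(0.5, 0, 10)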
Further Comments A related lecture describes an extension of the Lucas-Stokey model by Aiya-
gari, Marcet, Sargent, and Seppälä (2002) [AMSS02]
In their (AMSS) economy, only a risk-free bond is traded
We compare the recursive representation of the Lucas-Stokey model presented in this lecture with
the one in that lecture
By comparing these recursive formulations, we can glean a sense in which the dimension of the
state is lower in the Lucas-Stokey model.
Accompanying that difference in dimension are quite different dynamics of government debt
Overview
In an earlier lecture we described a model of optimal taxation with state-contingent debt due to
Robert Lucas and Nancy Stokey [LS83].
Aiyagari, Marcet, Sargent, and Seppälä [AMSS02] (hereafter, AMSS) adapt this framework to
study optimal taxation without state-contingent debt.
In this lecture we
• describe the assumptions and equilibrium concepts
• solve the model
• implement the model numerically and
• conduct some policy experiments
We begin with an introduction to the model.
Many but not all features of the economy are identical to those of the Lucas-Stokey economy.
Let’s start with the things that are.
For t ≥ 0, the history of the state is represented by st = [st , st−1 , . . . , s0 ] .
Government purchases g(s) are an exact time-invariant function of s.
Let ct (st ), `t (st ), and nt (st ) denote consumption, leisure, and labor supply, respectively, at history
st at time t.
A representative household is endowed with one unit of time that can be divided between leisure
`t and labor nt :
nt (st ) + `t (st ) = 1. (3.233)
Output equals nt (st ) and can be divided between consumption ct (st ) and gt (st )
c t ( s t ) + gt ( s t ) = n t ( s t ). (3.234)
The representative household ranks consumption and leisure streams according to
∑_{t=0}^∞ ∑_{s^t} β^t πt (st ) u[ct (st ), ℓt (st )], (3.235)
where
• πt (st ) is a joint probability distribution over the sequence st , and
• the utility function u is increasing, strictly concave, and three times continuously differen-
tiable in both arguments
The technology pins down a pre-tax wage rate to unity for all t, st
The government imposes a flat rate tax τt (st ) on labor income at time t, history st
Lucas and Stokey assumed that there are complete markets in one-period Arrow securities.
It is at this point that AMSS [AMSS02] modify the Lucas and Stokey economy
In the AMSS economy, the government can purchase or issue only one-period risk-free debt, so its
time t budget constraint at history st becomes
bt (st−1 ) = τtn (st ) nt (st ) − gt (st ) − Tt (st ) + bt+1 (st )/Rt (st )
          ≡ z(st ) + bt+1 (st )/Rt (st ), (3.236)
where Rt (st ) is the gross risk-free rate of interest between t and t + 1 at history st , Tt (st ) are
nonnegative lump-sum transfers, and z(st ) is the net-of-interest government surplus.
The household's first-order condition for holdings of the risk-free bond implies
1/Rt (st ) = ∑_{st+1 |st} β πt+1 (st+1 |st ) uc (st+1 )/uc (st ).
Substituting this expression into the government’s budget constraint (3.236) yields:
bt (st−1 ) = z(st ) + ∑_{st+1 |st} β πt+1 (st+1 |st ) [uc (st+1 )/uc (st )] bt+1 (st ). (3.237)
That bt+1 (st ) is the same for all realizations of st+1 captures its risk-free quality
We will now replace this constant bt+1 (st ) by another expression of the same magnitude.
In fact, we have as many candidate expressions for that magnitude as there are possible states st+1
That is,
• For each state st+1 there is a government budget constraint that is the analogue to expression
(3.237) where the time index is moved one period forward.
• All of those budget constraints have a right side that equals bt+1 (st ).
Instead of picking one of these candidate expressions to replace all occurrences of bt+1 (st ) in equation
(3.237), we replace bt+1 (st ), when the summation index in equation (3.237) is st+1 , by the right side of
next period's budget constraint that is associated with that particular realization st+1 .
21 In an allocation that solves the Ramsey problem and that levies distorting taxes on labor, why would the govern-
ment ever want to hand revenues back to the private sector? Not in an economy with state-contingent debt, since any
such allocation could be improved by lowering distortionary taxes rather than handing out lump-sum transfers. But
without state-contingent debt there can be circumstances when a government would like to make lump-sum transfers
to the private sector.
bt (st−1 ) = z(st ) + ∑_{st+1 |st} β πt+1 (st+1 |st ) [uc (st+1 )/uc (st )] [z(st+1 ) + bt+2 (st+1 )/Rt+1 (st+1 )].
After similar repeated substitutions for all future occurrences of government indebtedness, and
by invoking the natural debt limit, we arrive at:
bt (st−1 ) = ∑_{j=0}^∞ ∑_{st+j |st} β^j πt+j (st+j |st ) [uc (st+j )/uc (st )] z(st+j )
          = Et ∑_{j=0}^∞ β^j [uc (st+j )/uc (st )] z(st+j ). (3.238)
Comparison with Lucas-Stokey economy It is instructive to compare the present economy with-
out state-contingent debt to the Lucas-Stokey economy.
Suppose that the initial government debt in period 0 and state s0 is the same across the two
economies:
b0 (s−1 ) = b0 (s0 |s−1 ).
Evaluated at t = 0, equation (3.238) gives the implementability condition
b0 (s−1 ) = E0 ∑_{j=0}^∞ β^j [uc (sj )/uc (s0 )] z(sj ), (3.239)
which is exactly the same as the implementability condition for the Lucas-Stokey economy.
While (3.239) is the only implementability condition arising from budget constraints in the com-
plete markets economy, many more implementability conditions must be satisfied in the AMSS
economy because there is no state-contingent debt.
Because the beginning-of-period indebtedness is the same across any two histories, for any two
realizations st and s̃t that share the same history until the previous period, i.e., st−1 = s̃t−1 , we
must impose equality across the right sides of their respective budget constraints, as depicted in
expression (3.237).
Ramsey problem without state-contingent debt The Ramsey planner chooses an allocation that
solves
max_{{ct (st ), bt+1 (st )}} E0 ∑_{t=0}^∞ β^t u(ct (st ), 1 − ct (st ) − gt (st ))
where the maximization is subject to
E0 ∑_{j=0}^∞ β^j [uc (sj )/uc (s0 )] z(sj ) ≥ b0 (s−1 ) (3.240)
and
Et ∑_{j=0}^∞ β^j [uc (st+j )/uc (st )] z(st+j ) = bt (st−1 )  ∀ st (3.241)
given b0 (s−1 )
Here we have
• substituted the resource constraint into the utility function, and also
• substituted the resource constraint into the net-of-interest government surplus and
• used the household’s first-order condition, 1 − τtn (st ) = u` (st )/uc (st ), to eliminate the labor
tax rate.
Hence, the net-of-interest government surplus now reads
z(st ) = [1 − uℓ (st )/uc (st )] [ct (st ) + gt (st )] − gt (st ) − Tt (st ).
Attach Lagrange multipliers γt (st ) to the sequence of constraints (3.241). Depending on how the
constraints bind, these multipliers might be positive or negative:
A negative multiplier γt (st ) < 0 means that if we could relax constraint (3.241), we would like to increase
the beginning-of-period indebtedness for that particular realization of history st
That would let us reduce the beginning-of-period indebtedness for some other history 22
These features flow from the fact that the government cannot use state-contingent debt and there-
fore cannot allocate its indebtedness most efficiently across future states.
Apply two transformations to the Lagrangian: multiply constraint (3.240) by uc (s0 ) and the constraints
(3.241) by βt uc (st ).
22 We will soon see from the first-order conditions of the Ramsey problem that there would then exist another realization
s̃t with the same history up until the previous period, i.e., s̃t−1 = st−1 , but where the multiplier on constraint (3.241) takes on
a positive value γt (s̃t ) > 0.
The transformed Lagrangian is
J = E0 ∑_{t=0}^∞ β^t { u(ct (st ), 1 − ct (st ) − gt (st )) + Ψt (st ) uc (st ) z(st ) − γt (st ) uc (st ) bt (st−1 ) }, (3.242)
where
Ψt (st ) = Ψt−1 (st−1 ) + γt (st )  and  Ψ−1 (s−1 ) = 0. (3.243)
In (3.242), the second equality uses the law of iterated expectations and Abel’s summation for-
mula.
The first-order condition with respect to ct (st ) can be expressed as
uc (st ) − uℓ (st ) + Ψt (st ) [(ucc (st ) − ucℓ (st )) z(st ) + uc (st ) zc (st )] = γt (st ) [ucc (st ) − ucℓ (st )] bt (st−1 ). (3.244)
If we substitute the expression for z(st ) given above and its derivative zc (st ) into first-order condition
(3.244), we find only two differences from the corresponding condition for the optimal al-
location in a Lucas-Stokey economy with state-contingent government debt.
1. The term involving bt (st−1 ) in first-order condition (3.244) does not appear in the corre-
sponding expression for the Lucas-Stokey economy
• This term reflects the constraint that beginning-of-period government indebtedness
must be the same across all realizations of next period’s state, a constraint that is not
present if government debt can be state contingent.
2. The Lagrange multiplier Ψt (st ) in first-order condition (3.244) may change over time in re-
sponse to realizations of the state, while the multiplier Φ in the Lucas-Stokey economy is
time invariant.
To analyze the AMSS model, we find it useful to adopt a recursive formulation using tech-
niques like those in the lectures on Dynamic Stackelberg models and optimal taxation with state-
contingent debt. That recursive formulation
• leaves intact the single implementability constraint on allocations from the Lucas-Stokey
economy, but
• adds measurability constraints on functions of tails of the allocations at each time
and history.
These functions represent the present values of government surpluses.
We now explore how these constraints alter the Bellman equations for a time 0 Ramsey planner
and for time t ≥ 1, history st continuation Ramsey planners.
Recasting state variables In the AMSS setting, the government faces a sequence of budget con-
straints
τt (st )nt (st ) + Tt (st ) + bt+1 (st )/Rt (st ) = gt (st ) + bt (st−1 ),
where Rt (st ) is the gross risk-free rate of interest between t and t + 1 at history st and Tt (st ) are
nonnegative transfers.
In most of the remainder of this lecture, we shall set transfers to zero.
In this case the household faces a sequence of budget constraints
bt (st−1 ) + (1 − τt (st ))nt (st ) = ct (st ) + bt+1 (st )/Rt (st ). (3.246)
The household’s first-order conditions are uc,t = βRt E t uc,t+1 and (1 − τt )uc,t = ul,t .
Using these to eliminate Rt and τt from the budget constraint (3.246) gives
bt (st−1 ) + [ul,t (st )/uc,t (st )] nt (st ) = ct (st ) + β(Et uc,t+1 ) bt+1 (st )/uc,t (st ) (3.247)
or
uc,t (st )bt (st−1 ) + ul,t (st )nt (st ) = uc,t (st )ct (st ) + β(E t uc,t+1 )bt+1 (st ) (3.248)
Now define
xt ≡ β bt+1 (st ) Et uc,t+1 = uc,t (st ) bt+1 (st )/Rt (st ) (3.249)
and represent the household’s budget constraint at time t, history st as
uc,t xt−1 / (β Et−1 uc,t ) = uc,t ct − ul,t nt + xt (3.250)
for t ≥ 1.
A continuation Ramsey planner at t ≥ 1 takes (xt−1 , st−1 ) = (x− , s− ) as given and, for each possible
current state s, faces the implementability constraint
uc (s) x− / (β ∑_{s̃} Π(s̃|s− ) uc (s̃)) = uc (s)(n(s) − g(s)) − ul (s) n(s) + x(s). (3.253)
The time 0 Ramsey planner instead faces (b0 , s0 ) and chooses (n0 , x0 ) subject to
uc,0 b0 = uc,0 (n0 − g0 ) − ul,0 n0 + x0 . (3.255)
Let µ(s|s− ) denote the scaled Lagrange multiplier on constraint (3.253) for state s. Conditions for an
optimum imply that the derivative of the continuation value function satisfies
Vx (x− , s− ) = ∑_s Π(s|s− ) µ(s|s− ) uc (s) / (β ∑_{s̃} Π(s̃|s− ) uc (s̃)). (3.257)
It is useful to define the “twisted” transition probabilities
Π̌(s|s− ) ≡ Π(s|s− ) uc (s) / ∑_{s̃} Π(s̃|s− ) uc (s̃).
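A two-line sketch shows that Π̌ is a bona fide transition matrix; here Pi and uc are assumed stand-ins for Π and the vector of marginal utilities uc (s).

import numpy as np

def twisted_transition(Pi, uc):
    weighted = Pi * uc[np.newaxis, :]     # Pi(s|s_-) * u_c(s)
    return weighted / weighted.sum(axis=1, keepdims=True)

Each row sums to one by construction, which is what lets us interpret Vx as a martingale under Π̌.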
Absence of state-variable degeneracy Along a Ramsey plan, the state variable xt = xt (st , b0 )
becomes a function of the history st and also the initial government debt b0 .
In our recursive formulation of the Lucas-Stokey model, we found that
• the counterpart to Vx ( x, s) is time invariant and equal to the Lagrange multiplier on the
Lucas-Stokey implementability constraint.
• the time invariance of Vx (x, s) in the Lucas-Stokey model is the source of the key feature of
the Lucas-Stokey model, namely, state-variable degeneracy (i.e., xt is an exact function of st ).
That Vx ( x, s) varies over time according to a twisted martingale means that there is no state-
variable degeneracy in the AMSS model.
In the AMSS model, both x and s are needed to describe the state.
This property of the AMSS model is what transmits a twisted martingale-like component to con-
sumption, employment, and the tax rate.
That the Ramsey allocation for the AMSS model differs from the Ramsey allocation of the Lucas-
Stokey model is a symptom that the measurability constraints bind.
Following Bhandari, Evans, Golosov, and Sargent [BEGS13] (henceforth BEGS), we now consider
a special case of the AMSS model in which these constraints don’t bind.
Here the AMSS Ramsey planner chooses not to issue state-contingent debt, though he is free to do
so.
The environment is one in which fluctuations in the risk-free interest rate provide the insurance
that the Ramsey planner wants.
Following BEGS, we set S = 2 and assume that the state st is i.i.d., so that the transition matrix
Π(s′|s) = Π(s′) for s = 1, 2.
Following BEGS, it is useful to consider the following special case of the implementability con-
straints evaluated at the constant value of the state variable x− = x (s) = x̌:
uc (s) x̌ / (β ∑_{s̃} Π(s̃) uc (s̃)) = uc (s)(n(s) − g(s)) − ul (s) n(s) + x̌,  s = 1, 2. (3.259)
We guess and verify that the scaled Lagrange multiplier µ(s) = µ is a constant independent of s.
At a fixed x̌, because Vx (x, s) must be independent of s− , equation (3.257) becomes
Vx (x̌) = ∑_s Π(s) [uc (s)/∑_{s̃} Π(s̃) uc (s̃)] Vx (x̌) = Vx (x̌).
The first-order condition for n(s) then implies, for s = 1, 2,
uc (s) − ul (s) + µ × [ (ucc (s) − ucl (s)) x̌/(β ∑_{s̃} Π(s̃) uc (s̃)) − uc (s) − n(s)(ucc (s) − ucl (s)) − ulc (s) n(s) + ul (s) ] = 0. (3.260)
Equations (3.259) and (3.260) are four equations in the four unknowns n(s), s = 1, 2, x̌, and µ.
Under some regularity conditions on the period utility function u(c, l ), BEGS show that these
equations have a unique solution featuring a negative value of x̌.
Consumption c(s) and the flat-rate tax on labor τ (s) can then be constructed as history-
independent functions of s.
In this AMSS economy, x̌ = x(s) = uc (s) bt+1 (s)/Rt (s).
The risk-free interest rate, the tax rate, and the marginal utility of consumption fluctuate with s,
but x does not and neither does µ(s).
The labor tax rate and the allocation depend only on the current value of s.
For this special AMSS economy to be in a steady state from time 0 onward, it is necessary that
initial debt b0 satisfy the time 0 implementability constraint at the value x̌ and the realized value
of s0 .
We can solve for this level of b0 by plugging the n(s0 ) and x̌ that solve our four equation system
into
uc,0 b0 = uc,0 (n0 − g0 ) − ul,0 n0 + x̌
and solving for b0 .
This b0 assures that a steady state x̌ prevails from time 0 on.
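In code this is one line; here uc0, ul0, n0, and g0 are assumed to be the time 0 objects evaluated at the allocation from the four-equation system, and x_check is x̌:

b0 = (uc0 * (n0 - g0) - ul0 * n0 + x_check) / uc0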
Relationship to a Lucas-Stokey economy The constant value of the Lagrange multiplier µ(s) in
the Ramsey plan for our special AMSS economy is a tell-tale sign that the measurability restrictions
imposed on the Ramsey allocation by the requirement that government debt must be risk free are
slack.
When they bind, those measurability restrictions cause the AMSS tax policy and allocation to be
history dependent; that is what activates fluctuations in the risk-adjusted martingale.
Because those measurability conditions are slack in this special AMSS economy, we can also view
this as a Lucas-Stokey economy that starts from a particular initial government debt.
The corresponding Lucas-Stokey economy starts from the setting of b0 that solves its time 0 implementability constraint at this same allocation.
In this Lucas-Stokey economy, although the Ramsey planner is free to issue state-contingent debt,
it chooses not to and instead issues only risk-free debt.
It achieves the desired risk-sharing with the private sector by altering the amounts of one-period
risk-free debt it issues at each current state, while understanding the equilibrium interest rate fluc-
tuations that its tax policy induces.
Convergence to the special case In an i.i.d., S = 2 AMSS economy in which the initial b0 does not
equal the special value described in the previous subsection, x fluctuates and is history-dependent.
The Lagrange multiplier µt (st ) is a nontrivial risk-adjusted martingale, and the allocation and
distorting tax rate are both history dependent, as is true generally in an AMSS economy.
However, BEGS describe conditions under which such an i.i.d., S = 2 AMSS economy in which
the representative agent dislikes consumption risk converges to a Lucas-Stokey economy in the
sense that xt → x̌ as t → ∞.
The following subsection displays a numerical example that exhibits convergence.
Examples
Anticipated One-Period War In our lecture on optimal taxation with state contingent debt we
studied in a simple setting how the government manages uncertainty.
As in that lecture, we assume the one-period utility function
u(c, n) = c^{1−σ}/(1 − σ) − n^{1+γ}/(1 + γ)
Note: For convenience in terms of matching our computer code, we have expressed utility as a
function of n rather than leisure l
Perpetual War Alert The history dependence can be seen more starkly in a case where the gov-
ernment perpetually faces the prospect of war
This case was studied in the final example of the lecture on optimal taxation with state-contingent
debt
There, each period the government faces a constant probability, 0.5, of war.
In addition, this example features BGP preferences, so even with state-contingent debt, tax rates will not be constant.
The figure below plots the optimal tax policies for both the case of state contingent (circles) and
non state contingent debt (triangles).
When the government experiences a prolonged period of peace, it is able to reduce government
debt and permanently lower tax rates.
However, the government must finance a long war by borrowing and raising taxes.
This results in a drift away from the policies with state contingent debt that depends on the history
of shocks received.
This is even more evident in the following figure that plots the evolution of the two policies over
200 periods.
Contents
• Default Risk and Income Fluctuations
– Overview
– Structure
– Equilibrium
– Computation
– Results
– Exercises
– Solutions
Overview
This lecture computes a version of Arellano's model [Are08] of sovereign default, in which a government that cannot commit chooses each period whether to repay or default on its external debt, and default risk shapes the interest rates the country faces.
Structure
Output, Consumption and Debt A small open economy is endowed with an exogenous stochas-
tically fluctuating potential output stream {yt }
Potential output is realized in periods in which the government honors its sovereign debt
The output good can be traded or consumed
The sequence {yt } is described by a Markov process with stochastic density kernel p(y, y′)
Households within the country are identical and rank stochastic consumption streams according
to
E ∑_{t=0}^∞ β^t u(ct ) (3.261)
Here
• 0 < β < 1 is a time discount factor
• u is an increasing and strictly concave utility function
Consumption sequences enjoyed by households are affected by the government’s decision to bor-
row or lend internationally
The government is benevolent in the sense that its aim is to maximize (3.261)
The government is the only domestic actor with access to foreign credit
Because households are averse to consumption fluctuations, the government will try to smooth
consumption by borrowing from (and lending to) foreign creditors
Asset Markets The only credit instrument available to the government is a one-period bond
traded in international credit markets
The bond market has the following features
• The bond matures in one period and is not state contingent
• A purchase of a bond with face value B′ is a claim to B′ units of the consumption good next
period
• The cost of the contract is qB′
– if B′ < 0, then −qB′ units of the good are received in the current period, for a promise
to repay −B′ units next period
– the price q = q(B′, y) will depend on both B′ and y in equilibrium
Earnings on the government portfolio are paid lump sum to households
Conditional on absence of default, the resource constraint for the economy at any point in time is
c = y + B − q(B′, y) B′ (3.262)
Here and below, a prime denotes a next period value or a claim maturing next period
To rule out Ponzi schemes, we also require that B ≥ − Z in every period
• Z is chosen to be sufficiently large that the constraint never binds in equilibrium
Bonds are purchased by competitive, risk-neutral foreign creditors who can also lend at the constant
international interest rate r, so that in equilibrium their expected profits are zero:
qB′ = [(1 − δ)/(1 + r)] B′, (3.263)
where δ denotes the probability of default on the bond.
Equilibrium
Informally, a recursive equilibrium for this economy is a sequence of interest rates on its sovereign
debt, a stochastic sequence of government default decisions and an implied flow of household
consumption such that
1. Consumption satisfies the resource constraint
2. The government maximizes household utility taking into account
• the resource constraint
The government defaults in states where the value of continuing to service its debt falls below the
value of default:
vc (B, y) < vd (y)
and hence the probability of default next period given current holdings B′ is
δ(B′, y) := ∫ 1{vc (B′, y′) < vd (y′)} p(y, y′) dy′ (3.264)
Given zero profits for foreign creditors in equilibrium, we can now pin down the bond price by
combining (3.263) and (3.264) to obtain
q(B′, y) = (1 − δ(B′, y))/(1 + r) (3.265)
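On a discrete grid, (3.264) and (3.265) reduce to a matrix product and an elementwise formula. A minimal sketch, assuming vc (values of repaying, indexed by (y, B′)), vd (values of default, indexed by y), and a transition matrix Py for output:

import numpy as np

def default_prob_and_price(vc, vd, Py, r):
    default_states = vd[:, np.newaxis] > vc   # 1{vc(B', y') < vd(y')}
    delta = Py @ default_states               # integrate over y' given y, as in (3.264)
    q = (1.0 - delta) / (1.0 + r)             # bond price from (3.265)
    return delta, q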
Computation
"""
References
----------
http://quant-econ.net/py/arellano.html
"""
from __future__ import division
import numpy as np
import random
import quantecon as qe
from numba import jit
class Arellano_Economy(object):
    """
    Arellano 2008 deals with a small open economy whose government
    invests in foreign assets in order to smooth the consumption of
    domestic households. Domestic households receive a stochastic
    path of income.

    Parameters
    ----------
    beta : float
        Time discounting parameter
    gamma : float
        Risk-aversion parameter
    r : float
        International lending rate
    rho : float
        Persistence in the income process
    eta : float
        Standard deviation of the income process
    theta : float
        Probability of re-entering financial markets in each period
    ny : int
        Number of points in y grid
    nB : int
        Number of points in B grid
    tol : float
        Error tolerance in iteration
    maxit : int
        Maximum number of iterations
    """
    def __init__(self,
                 beta=.953,      # time discount rate
                 gamma=2.,       # risk aversion
                 r=0.017,        # international interest rate
                 rho=.945,       # persistence in output
                 eta=0.025,      # st dev of output shock
                 theta=0.282,    # prob of regaining access
                 ny=21,          # number of points in y grid
                 nB=251,         # number of points in B grid
                 tol=1e-8,       # error tolerance in iteration
                 maxit=10000):

        # Save parameters
        self.beta, self.gamma, self.r = beta, gamma, r
        self.rho, self.eta, self.theta = rho, eta, theta
        self.ny, self.nB = ny, nB
        self.tol, self.maxit = tol, maxit

        # Create grids and discretize the log AR(1) output process using
        # Tauchen's method. NB: the keyword names follow the quantecon API
        # of the time; adjust if your version uses a different signature
        self.Bgrid = np.linspace(-.45, .45, nB)
        self.mc = qe.markov.tauchen(rho, eta, m=3, n=ny)
        self.ygrid = np.exp(self.mc.state_values)
        self.Py = self.mc.P

        # Output received while in default (capped below potential output)
        ymean = np.mean(self.ygrid)
        self.def_y = np.minimum(0.969 * ymean, self.ygrid)

        # Allocate memory
        self.Vd = np.zeros(ny)
        self.Vc = np.zeros((ny, nB))
        self.V = np.zeros((ny, nB))
        self.Q = np.ones((ny, nB)) * .95  # Initial guess for prices
        self.default_prob = np.empty((ny, nB))
    def solve(self):
        # Iteration bookkeeping
        it, dist = 0, 10.

        # Allocate memory for the next iterate of the value function
        V_upd = np.zeros((self.ny, self.nB))

        # == Main loop == #
        while dist > self.tol and self.maxit > it:
            # Compute expectations used in the inner loop
            EV, EVc, EVd = (np.dot(self.Py, v)
                            for v in (self.V, self.Vc, self.Vd))

            # Update the value functions Vd and Vc in place
            _inner_loop(self.ygrid, self.def_y, self.Bgrid, self.Vd, self.Vc,
                        EVc, EVd, EV, self.Q, self.beta, self.theta, self.gamma)

            # Update prices
            Vd_compat = np.repeat(self.Vd, self.nB).reshape(self.ny, self.nB)
            default_states = Vd_compat > self.Vc
            self.default_prob[:, :] = np.dot(self.Py, default_states)
            self.Q[:, :] = (1 - self.default_prob) / (1 + self.r)

            # Update the value function and measure the distance
            V_upd[:, :] = np.maximum(self.Vc, Vd_compat)
            dist = np.max(np.abs(V_upd - self.V))
            self.V[:, :] = V_upd[:, :]

            it += 1
            if it % 25 == 0:
                print("Running iteration {} with dist of {}".format(it, dist))
        return None
    def compute_savings_policy(self):
        """
        Compute optimal savings B' conditional on not defaulting.
        The policy is recorded as an index value in Bgrid.
        """
        # Allocate memory
        self.next_B_index = np.empty((self.ny, self.nB))
        EV = np.dot(self.Py, self.V)
        _compute_savings_policy(self.ygrid, self.Bgrid, self.Q, EV,
                                self.gamma, self.beta, self.next_B_index)

    def simulate(self, T, y_init=None, B_init=None):
        """
        Simulate time series of length T for output, debt and bond prices.
        """
        # Index of the Bgrid point closest to zero debt
        zero_B_index = np.searchsorted(self.Bgrid, 0)
        if y_init is None:
            # Set to index near the mean of the ygrid
            y_init = np.searchsorted(self.ygrid, self.ygrid.mean())
        if B_init is None:
            B_init = zero_B_index
        # Start off not in default
        in_default = False
        # Initialize Markov chain for output
        mc = qe.markov.MarkovChain(self.Py)
        # Allocate memory for the simulated paths
        y_vec, B_vec, q_vec = np.empty(T), np.empty(T), np.empty(T)
        default_vec = np.zeros(T, dtype=bool)
        for t in range(T-1):
            # Simulation step omitted here: draw y from mc, record y, B and
            # q(B', y), default when Vc < Vd, and re-enter credit markets
            # with probability theta after a default
            pass
        return_vecs = (y_vec, B_vec, q_vec, default_vec)
        return return_vecs
@jit(nopython=True)
def u(c, gamma):
    return c**(1-gamma) / (1-gamma)


@jit(nopython=True)
def _inner_loop(ygrid, def_y, Bgrid, Vd, Vc, EVc,
                EVd, EV, qq, beta, theta, gamma):
    """
    This is a numba version of the inner loop of the solve in the
    Arellano class. It updates Vd and Vc in place.
    """
    ny, nB = len(ygrid), len(Bgrid)
    zero_ind = nB // 2  # Integer division
    for iy in range(ny):
        y = ygrid[iy]  # Pull out current y

        # Compute Vd
        Vd[iy] = u(def_y[iy], gamma) + \
            beta * (theta * EVc[iy, zero_ind] + (1 - theta) * EVd[iy])

        # Compute Vc
        for ib in range(nB):
            B = Bgrid[ib]  # Pull out current B
            current_max = -1e14
            for ib_next in range(nB):
                c = max(y - qq[iy, ib_next] * Bgrid[ib_next] + B, 1e-14)
                m = u(c, gamma) + beta * EV[iy, ib_next]
                if m > current_max:
                    current_max = m
            Vc[iy, ib] = current_max
    return None
@jit(nopython=True)
def _compute_savings_policy(ygrid, Bgrid, Q, EV, gamma, beta, next_B_index):
    # Compute best index in Bgrid given iy, ib
    ny, nB = len(ygrid), len(Bgrid)
    for iy in range(ny):
        y = ygrid[iy]
        for ib in range(nB):
            B = Bgrid[ib]
            current_max = -1e10
            for ib_next in range(nB):
                c = max(y - Q[iy, ib_next] * Bgrid[ib_next] + B, 1e-14)
                m = u(c, gamma) + beta * EV[iy, ib_next]
                if m > current_max:
                    current_max = m
                    current_max_index = ib_next
            next_B_index[iy, ib] = current_max_index
    return None
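The results discussed next can be generated by instantiating and solving the model. A minimal usage sketch, assuming the class and methods above (solution time depends on the grid sizes ny and nB):

ae = Arellano_Economy()      # default Arellano (2008) calibration
ae.solve()                   # value function iteration: fills ae.V, ae.Vc, ae.Vd, ae.Q
ae.compute_savings_policy()  # fills ae.next_B_index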
Results
• Lower income also causes more discounting, as foreign creditors anticipate greater likeli-
hood of default
The next figure plots value functions and replicates the right hand panel of Figure 4 of [Are08]
We can use the results of the computation to study the default probability δ(B′, y) defined in (3.264)
The next plot shows these default probabilities over (B′, y) as a heat map
As anticipated, the probability that the government chooses to default in the following period
increases with indebtedness and falls with income
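A sketch of how such a heat map might be produced, assuming the solved instance ae from the sketch above (default_prob, ygrid and Bgrid are attributes set by the class):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Default probability delta(B', y) over the (B', y) grid
hm = ax.pcolormesh(ae.Bgrid, ae.ygrid, ae.default_prob)
ax.set_xlabel("B'")
ax.set_ylabel("y")
fig.colorbar(hm, ax=ax)
plt.show()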
Next let’s run a time series simulation of {yt}, {Bt} and q(Bt+1, yt)
The grey vertical bars correspond to periods when the economy is in default
One notable feature of the simulated data is the nonlinear response of interest rates
Periods of relative stability are followed by sharp spikes in the discount rate on government debt
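A corresponding sketch for the simulation, assuming simulate returns the output, debt, price and default paths as in the abridged method above:

# Simulate T periods; grey bars in the figures correspond to periods
# where the default indicator is True
y_vec, B_vec, q_vec, default_vec = ae.simulate(T=250)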
Exercises
Exercise 1 To the extent that you can, replicate the figures shown above
• Use the parameter values listed as defaults in the __init__ method of the Arellano_Economy class
• The time series will of course vary depending on the shock draws
Solutions
Solution notebook
FOUR
SOLUTIONS
Each lecture with exercises has a link to solutions immediately after the exercises
The links are to static versions of IPython Notebook files; the originals are located in matching
topic folders contained in the QuantEcon.applications repo
(https://github.com/QuantEcon/QuantEcon.applications). For example, the solutions notebook
for the schelling chapter can be found in the schelling folder of that repo
If you look at a typical solution notebook you’ll see a download icon in the top right
You can download a copy of the ipynb file (the notebook file) using that icon
Now start IPython notebook and navigate to the downloaded ipynb file
Once you open it in IPython notebook it should be running live, allowing you to make changes
FIVE
5.1 FAQs
Run one of these commands in the system terminal (terminal, cmd, etc., depending on your OS)
• python — the basic Python shell (actually, don’t use it, see the next command)
• ipython — a much better Python shell
• ipython notebook — start IPython Notebook on local machine
See here for more details on running Python
5.5 Where do I get all the Python programs from the lectures?
PDF Lectures
[Aiy94] S Rao Aiyagari. Uninsured Idiosyncratic Risk and Aggregate Saving. The Quar-
terly Journal of Economics, 109(3):659–684, 1994.
[AMSS02] S. Rao Aiyagari, Albert Marcet, Thomas J. Sargent, and Juha Seppala. Op-
timal Taxation without State-Contingent Debt. Journal of Political Economy,
110(6):1220–1254, December 2002.
[AM05] D. B. O. Anderson and J. B. Moore. Optimal Filtering. Dover Publications, 2005.
[AHMS96] E. W. Anderson, L. P. Hansen, E. R. McGrattan, and T. J. Sargent. Mechanics of
Forming and Estimating Dynamic Linear Economies. In Handbook of Computational
Economics. Volume 1. Elsevier, 1996.
[Are08] Cristina Arellano. Default risk and income fluctuations in emerging economies.
The American Economic Review, pages 690–712, 2008.
[ACK10] Andrew Atkeson, Varadarajan V Chari, and Patrick J Kehoe. Sophisticated mone-
tary policies. The Quarterly Journal of Economics, 125(1):47–89, 2010.
[Bar79] Robert J Barro. On the Determination of the Public Debt. Journal of Political Econ-
omy, 87(5):940–971, 1979.
[Bas05] Marco Bassetto. Equilibrium and government commitment. Journal of Economic
Theory, 124(1):79–105, 2005.
[BBZ15] Jess Benhabib, Alberto Bisin, and Shenghao Zhu. The wealth distribution in Bew-
ley economies with capital income risk. Journal of Economic Theory, 159:489–515,
2015.
[BS79] L M Benveniste and J A Scheinkman. On the Differentiability of the Value Func-
tion in Dynamic Models of Economics. Econometrica, 47(3):727–732, 1979.
[Bew77] Truman Bewley. The permanent income hypothesis: A theoretical formulation.
Journal of Economic Theory, 16(2):252–292, 1977.
[BEGS13] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J Sargent. Taxes,
debts, and redistributions with aggregate shocks. Technical Report, National Bu-
reau of Economic Research, 2013.
[Bis06] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
659
REFERENCES 660
[GW10] Marc P Giannoni and Michael Woodford. Optimal target criteria for stabilization
policy. Technical Report, National Bureau of Economic Research, 2010.
[Hal78] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hy-
pothesis: Theory and Evidence. Journal of Political Economy, 86(6):971–987, 1978.
[HM82] Robert E Hall and Frederic S Mishkin. The Sensitivity of Consumption to Transi-
tory Income: Estimates from Panel Data on Households. National Bureau of Eco-
nomic Research Working Paper Series, 1982.
[Ham05] James D Hamilton. What’s real about the business cycle? Federal Reserve Bank of
St. Louis Review, pages 435–452, 2005.
[HR85] Dennis Epple, Lars P. Hansen, and Will Roberds. Linear-quadratic duopoly mod-
els of resource depletion. In Energy, Foresight, and Strategy. Resources for the Fu-
ture, 1985.
[HS08] L P Hansen and T J Sargent. Robustness. Princeton University Press, 2008.
[HS13] L P Hansen and T J Sargent. Recursive Models of Dynamic Linear Economies. The
Gorman Lectures in Economics. Princeton University Press, 2013.
[HS00] Lars Peter Hansen and Thomas J Sargent. Wanting robustness in macroeconomics.
Manuscript, Department of Economics, Stanford University, 2000.
[HL96] John Heaton and Deborah J Lucas. Evaluating the effects of incomplete markets
on risk sharing and asset pricing. Journal of Political Economy, pages 443–487, 1996.
[HLL96] O Hernandez-Lerma and J B Lasserre. Discrete-Time Markov Control Processes: Ba-
sic Optimality Criteria. Volume 1 of Applications of Mathematics: Stochastic
Modelling and Applied Probability. Springer, 1996.
[HP92] Hugo A Hopenhayn and Edward C Prescott. Stochastic Monotonicity and Station-
ary Distributions for Dynamic Economies. Econometrica, 60(6):1387–1406, 1992.
[HR93] Hugo A Hopenhayn and Richard Rogerson. Job Turnover and Policy Evaluation:
A General Equilibrium Analysis. Journal of Political Economy, 101(5):915–938, 1993.
[Hug93] Mark Huggett. The risk-free rate in heterogeneous-agent incomplete-insurance
economies. Journal of Economic Dynamics and Control, 17(5-6):953–969, 1993.
[Haggstrom02] Olle Häggström. Finite Markov Chains and Algorithmic Applications. Volume 52.
Cambridge University Press, 2002.
[Jud90] K L Judd. Cournot versus Bertrand: A dynamic resolution. Technical Report,
Hoover Institution, Stanford University, 1990.
[Janich94] K Jänich. Linear Algebra. Springer Undergraduate Texts in Mathematics and Tech-
nology. Springer, 1994.
[Kam12] Takashi Kamihigashi. Elementary results on solutions to the Bellman equation of
dynamic programming: existence, uniqueness, and convergence. Technical Re-
port, Kobe University, 2012.
[Kuh13] Moritz Kuhn. Recursive Equilibria In An Aiyagari-Style Economy With Perma-
nent Income Shocks. International Economic Review, 54:807–835, 2013.
[KP80a] Finn E Kydland and Edward C Prescott. Dynamic optimal taxation, rational ex-
pectations and optimal control. Journal of Economic Dynamics and Control, 2:79–91,
1980.
[KP77] Finn E. Kydland and Edward C. Prescott. Rules rather than discretion: The in-
consistency of optimal plans. Journal of Political Economy, 85(3):473–491, 1977.
[KP80b] Finn E. Kydland and Edward C. Prescott. Time to build and aggregate fluctua-
tions. Econometrica, 50(6):1345–1370, 1982.
[LM94] A Lasota and M C Mackey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynam-
ics. Applied Mathematical Sciences. Springer-Verlag, 1994.
[LM80] David Levhari and Leonard J Mirman. The great fish war: an example using a
dynamic Cournot-Nash solution. The Bell Journal of Economics, pages 322–334, 1980.
[LS12] L Ljungqvist and T J Sargent. Recursive Macroeconomic Theory. MIT Press, 3rd edition,
2012.
[Luc78] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of
the Econometric Society, 46(6):1429–1445, 1978.
[LP71] Robert E Lucas, Jr and Edward C Prescott. Investment under uncertainty. Econo-
metrica: Journal of the Econometric Society, pages 659–681, 1971.
[LS83] Robert E Lucas, Jr and Nancy L Stokey. Optimal Fiscal and Monetary Policy in an
Economy without Capital. Journal of Monetary Economics, 12(3):55–93, 1983.
[MS89] Albert Marcet and Thomas J Sargent. Convergence of Least-Squares Learning in
Environments with Hidden State Variables and Private Information. Journal of Po-
litical Economy, 97(6):1306–1322, 1989.
[MdRV10] V Filipe Martins-da-Rocha and Yiannis Vailakis. Existence and Uniqueness of a
Fixed Point for Local Contractions. Econometrica, 78(3):1127–1141, 2010.
[MCWG95] A Mas-Colell, M D Whinston, and J R Green. Microeconomic Theory. Volume 1.
Oxford University Press, 1995.
[McC70] J J McCall. Economics of Information and Job Search. The Quarterly Journal of Eco-
nomics, 84(1):113–126, 1970.
[MP85] Rajnish Mehra and Edward C Prescott. The equity premium: A puzzle. Journal of
Monetary Economics, 15(2):145–161, 1985.
[MT09] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge Uni-
versity Press, 2009.
[MS85] Marcus Miller and Mark Salmon. Dynamic Games and the Time Inconsistency of
Optimal Policy in Open Economies. Economic Journal, 95:124–137, 1985.
[MF02] Mario J Miranda and P L Fackler. Applied Computational Economics and Finance.
Cambridge: MIT Press, 2002.
[MZ75] Leonard J Mirman and Itzhak Zilcha. On optimal growth under uncertainty. Jour-
nal of Economic Theory, 11(3):329–339, 1975.
[MB54] F. Modigliani and R. Brumberg. Utility analysis and the consumption function:
An interpretation of cross-section data. In K. K. Kurihara, editor, Post-Keynesian
Economics. 1954.
[Nea99] Derek Neal. The Complexity of Job Mobility among Young Men. Journal of Labor
Economics, 17(2):237–261, 1999.
[Par99] Jonathan A Parker. The Reaction of Household Consumption to Predictable
Changes in Social Security Taxes. American Economic Review, 89(4):959–973, 1999.
[PL92] D. A. Currie, J. G. Pearlman, and P. L. Levine. Rational expectations with partial
information. Economic Modelling, 3:90–105, 1992.
[Pea92] J.G. Pearlman. Reputational and nonreputational policies under partial informa-
tion. Journal of Economic Dynamics and Control, 16(2):339–358, 1992.
[Pre77] Edward C. Prescott. Should control theory be used for economic stabilization?
Journal of Monetary Economics, 7:13–38, 1977.
[Put05] Martin L Puterman. Markov decision processes: discrete stochastic dynamic program-
ming. John Wiley & Sons, 2005.
[PalS13] Jenő Pál and John Stachurski. Fitted value function iteration with probability one
contractions. Journal of Economic Dynamics and Control, 37(1):251–264, 2013.
[Rab02] Guillaume Rabault. When do borrowing constraints bind? Some new results
on the income fluctuation problem. Journal of Economic Dynamics and Control,
26(2):217–245, 2002.
[Ram27] F. P. Ramsey. A Contribution to the theory of taxation. Economic Journal,
37(145):47–61, 1927.
[Rei09] Michael Reiter. Solving heterogeneous-agent models by projection and perturba-
tion. Journal of Economic Dynamics and Control, 33(3):649–665, 2009.
[Rus96] John Rust. Numerical dynamic programming in economics. Handbook of computa-
tional economics, 1:619–729, 1996.
[Rya12] Stephen P Ryan. The costs of environmental regulation in a concentrated industry.
Econometrica, 80(3):1019–1061, 2012.
[Sar79] T J Sargent. A note on maximum likelihood estimation of the rational expectations
model of the term structure. Journal of Monetary Economics, 35:245–274, 1979.
[Sar87] T J Sargent. Macroeconomic Theory. Academic Press, 2nd edition, 1987.
[SE77] Jack Schechtman and Vera L S Escudero. Some results on “an income fluctuation
problem”. Journal of Economic Theory, 16(2):151–166, 1977.
[Sch69] Thomas C Schelling. Models of Segregation. American Economic Review,
59(2):488–493, 1969.
[Shi95] A N Shiriaev. Probability. Graduate Texts in Mathematics. Springer, 2nd
edition, 1995.
[SLP89] N L Stokey, R E Lucas, and E C Prescott. Recursive Methods in Economic Dynamics.
Harvard University Press, 1989.
[Sto89] Nancy L Stokey. Reputation and time consistency. The American Economic Review,
pages 134–139, 1989.
[STY04] Kjetil Storesletten, Christopher I Telmer, and Amir Yaron. Consumption and risk
sharing over the life cycle. Journal of Monetary Economics, 51(3):609–633, 2004.
[Sun96] R K Sundaram. A First Course in Optimization Theory. Cambridge University Press,
1996.
[Tau86] George Tauchen. Finite state Markov-chain approximations to univariate and vec-
tor autoregressions. Economics Letters, 20(2):177–181, 1986.
[Tow83] Robert M. Townsend. Forecasting the forecasts of others. Journal of Political Econ-
omy, 91:546–588, 1983.
[VL11] Ngo Van Long. Dynamic games in the economics of natural resources: a survey.
Dynamic Games and Applications, 1(1):115–148, 2011.
[Woo03] Michael Woodford. Interest and Prices: Foundations of a Theory of Monetary Policy.
Princeton University Press, 2003.
[YS05] G Alastair Young and Richard L Smith. Essentials of statistical inference. Cambridge
University Press, 2005.
Acknowledgements: These lectures have benefitted greatly from comments and suggestions from
our colleagues, students and friends. Special thanks go to Anmol Bhandari, Jeong-Hun Choi,
Chase Coleman, David Evans, Chenghan Hou, Doc-Jin Jang, Spencer Lyon, Qingyin Ma, Matthew
McKay, Tomohito Okabe, Alex Olssen, Nathan Palmer and Yixiao Zhou.
Spectra, 506
  Estimation, 506
Spectra, Estimation
  AR(1) Setting, 517
  Fast Fourier Transform, 507
  Pre-Filtering, 515
  Smoothing, 511, 515
Spectral Analysis, 488, 492
Spectral Densities, 493
Spectral Density, 494
  interpretation, 494
  Inverting the Transformation, 496
  Mathematical Theory, 496
Spectral Radius, 196
Static Types, 168
Stationary Distributions, 198, 208
statsmodels, 14
Stochastic Matrices, 199
SymPy, 11
T
Text Editors, 32
U
Unbounded Utility, 304
urllib.request, 154
V
Value Function Iteration, 303
Vectorization, 167, 169
  Operations on Arrays, 169
Vectors, 181, 182
  Inner Product, 184
  Linear Independence, 185
  Norm, 184
  Operations, 183
  Span, 184
W
Wakari, 166
White Noise, 489, 493
Wold’s Decomposition, 490