Spatial Econometrics
James P. LeSage
Department of Economics
University of Toledo
CIRCULATED FOR REVIEW
December, 1998
Preface
This text provides an introduction to spatial econometrics as well as a set of
MATLAB functions that implement a host of spatial econometric estimation
methods. The intended audience is faculty and students involved in modeling spatial data sets using spatial econometric methods. The MATLAB
functions described in this book have been used in my own research as well
as teaching both undergraduate and graduate econometrics courses.
Toolboxes are the name given by the MathWorks to related sets of MATLAB functions aimed at solving a particular class of problems. Toolboxes
of functions useful in signal processing, optimization, statistics, finance
and a host of other areas are available from the MathWorks as add-ons
to the standard MATLAB software distribution. I use the term Econometrics Toolbox to refer to my collection of function libraries described
in a manual entitled Applied Econometrics using MATLAB available at
http://www.econ.utoledo.edu.
The MATLAB spatial econometrics functions used to apply the spatial
econometric models discussed in this text rely on many of the functions in
the Econometrics Toolbox. The spatial econometric functions constitute a
library within the broader set of econometric functions. To use the spatial
econometrics functions library you need to install the entire set of Econometrics Toolbox functions in MATLAB. The spatial econometrics functions
library is part of the Econometrics Toolbox and will be installed and available for use along with the other econometrics functions.
Researchers currently using Gauss, RATS, TSP, or SAS for econometric
programming might find switching to MATLAB advantageous. MATLAB
software has always had excellent numerical algorithms, and has recently
been extended to include sparse matrix algorithms and very good graphical
capabilities. MATLAB software is available on a wide variety of computing
platforms including mainframe, Intel, Apple, and Linux or Unix workstations. A Student Version of MATLAB is available for less than $100. This
version is limited in the size of problems it can solve, but many of the examples in this text rely on a small data sample with 49 observations that
can be used with the Student Version of MATLAB.
The collection of around 450 functions and demonstration programs is
organized into libraries, with approximately 30 spatial econometrics library
functions described in this text. For those interested in other econometric
functions or in adding programs to the spatial econometrics library, see the
manual for the Econometrics Toolbox. The 350 page manual provides many
details regarding programming techniques used to construct the functions
and examples of adding new functions to the Econometrics Toolbox. This
text does not focus on programming methods. The emphasis here is on
applying the existing spatial econometric estimation functions to modeling
spatial data sets.
A consistent design was implemented that provides documentation, example programs, and functions to produce printed as well as graphical presentation of estimation results for all of the econometric functions. This
was accomplished using the structure variables introduced in MATLAB
Version 5. Information from econometric estimation is encapsulated into a
single variable that contains fields for individual parameters and statistics
related to the econometric results. A thoughtful design by the MathWorks
allows these structure variables to contain scalar, vector, matrix, string,
and even multi-dimensional matrices as fields. This allows the econometric
functions to return a single structure that contains all estimation results.
These structures can be passed to other functions that can intelligently decipher the information and provide a printed or graphical presentation of
the results.
The Econometrics Toolbox along with the spatial econometrics library
functions should allow faculty to use MATLAB in undergraduate and graduate level courses with absolutely no programming on the part of students
or faculty.
In addition to providing a set of spatial econometric estimation routines
and documentation, the book has another goal: applied modeling strategies and data analysis. Given the ability to easily implement a host of alternative models and produce estimates rapidly, attention naturally turns to which models and estimates work best to summarize a spatial data sample. Much
of the discussion in this text is on these issues.
This text is provided in Adobe PDF format for online use. It attempts
to draw on the unique aspects of a computer presentation platform. The
ability to present program code, data sets and applied examples in an online
fashion is a relatively recent phenomenon, so issues of how best to accomplish
a useful online presentation are numerous. For the online text the following
features were included in the PDF document.
1. A detailed set of bookmarks that allow the reader to jump to any
section or subsection in the text.
2. A detailed set of bookmarks that allow the reader to jump to any of the examples or figures in the text.
3. A set of bookmarks that allow the reader to view the spatial datasets
and documentation for the datasets using a Web browser.
4. A set of bookmarks that allow the reader to view all of the example
programs using a Web browser.
All of the examples in the text as well as the datasets are also available offline on my Web site: http://www.econ.utoledo.edu under the MATLAB
gallery icon.
Finally, there are obviously omissions, bugs and perhaps programming
errors in the Econometrics Toolbox and the spatial econometrics library
functions. This would likely be the case with any such endeavor. I would be
grateful if users would notify me when they encounter problems. It would
also be helpful if users who produce generally useful functions that extend
the toolbox would submit them for inclusion. Much of the econometric code
I encounter on the internet is simply too specific to a single research problem
to be generally useful in other applications. If econometric researchers are
serious about their newly proposed estimation methods, they should take
the time to craft a generally useful MATLAB function that others could use
in applied research. Inclusion in the spatial econometrics function library
would have the added benefit of introducing new research methods to faculty
and their students.
The latest version of the Econometrics Toolbox functions can be found on
the Internet at: http://www.econ.utoledo.edu under the MATLAB gallery
icon. Instructions for installing these functions are in an Appendix to this
text along with a listing of the functions in the library and a brief description
of each.
Contents

1 Introduction
  1.1 Spatial econometrics
  1.2 Spatial dependence
  1.3 Spatial heterogeneity
  1.4 Quantifying location in our models
      1.4.1 Quantifying spatial contiguity
      1.4.2 Quantifying spatial position
  1.5 The MATLAB spatial econometrics library
      1.5.1 Estimation functions
      1.5.2 Using the results structure
  1.6 Chapter Summary
  1.7 References
Chapter 1
Introduction
This chapter provides an overview of the nature of spatial econometrics.
An applied model-based approach is taken where various spatial econometric methods are introduced in the context of spatial data sets and associated models based on the data. The remaining chapters of the text are
organized along the lines of alternative spatial econometric estimation procedures. Each chapter illustrates applications of a different econometric
estimation method and provides references to the literature regarding these
methods.
Section 1.1 sets forth the nature of spatial econometrics and discusses differences with traditional econometrics. We will see that spatial econometrics
is characterized by: 1) spatial dependence between sample data observations
at various points in the Cartesian plane, and 2) spatial heterogeneity that
arises from relationships or model parameters that vary with our sample
data as we move over the Cartesian plane.
The nature of spatially dependent or spatially correlated data is taken
up in Section 1.2 and spatial heterogeneity is discussed in Section 1.3. Section 1.4 takes up the subject of how we formally incorporate the locational
information from spatial data in econometric models. In addition to the
theoretical discussion of incorporating locational information in econometric models, Section 1.4 provides a preview of alternative spatial econometric
estimation methods that will be covered in Chapters 2 through 4.
Finally, Section 1.5 describes software design issues related to a spatial
econometric function library based on MATLAB software from the MathWorks Inc. Functions are described throughout the text that implement the
spatial econometric estimation methods discussed. These functions provide
a consistent user-interface in terms of documentation and related functions
that provide printed as well as graphical presentation of the estimation results. Section 1.5 introduces the spatial econometrics function library which
is part of a broader collection of econometric estimation functions available
in my public domain Econometrics Toolbox.
1.1 Spatial econometrics
Applied work in regional science relies heavily on sample data that is collected with reference to location measured as points in space. The subject
of how we incorporate the locational aspect of sample data is deferred until Section 1.4. What distinguishes spatial econometrics from traditional
econometrics? Two problems arise when sample data has a locational component: 1) spatial dependence exists between the observations and 2) spatial
heterogeneity occurs in the relationships we are modeling.
Traditional econometrics has largely ignored these two issues that violate the traditional Gauss-Markov assumptions used in regression modeling.
With regard to spatial dependence between the observations, recall that
Gauss-Markov assumes the explanatory variables are fixed in repeated sampling. Spatial dependence violates this assumption, a point that will be
made clear in the next section. This gives rise to the need for alternative
estimation approaches. Similarly, spatial heterogeneity violates the Gauss-Markov assumption that a single linear relationship exists across the sample
data observations. If the relationship varies as we move across the spatial
data sample, alternative estimation procedures are needed to successfully
model this type of variation and draw appropriate inferences.
The subject of this text is alternative estimation approaches that can be
used when dealing with spatial data samples, a subject that is seldom discussed in traditional econometrics textbooks. For example, no discussion of
issues and models related to spatial data samples can be found in Amemiya
(1985), Chow (1983), Dhrymes (1978), Fomby et al. (1984), Green (1997),
Intrilligator (1978), Kelejian and Oates (1989), Kmenta (1986), Maddala
(1977), Pindyck and Rubinfeld (1981), Schmidt (1976), and Vinod and Ullah (1981).
Anselin (1988) provides a complete treatment of many facets of spatial econometrics which this text draws upon. In addition to introducing
ideas set forth in Anselin (1988), this presentation includes some more recent approaches based on Bayesian methods applied to spatial econometric
models. In terms of focus, the materials presented here are more applied
than Anselin (1988), providing program functions and illustrations of hands-on use of the estimation methods.
1.2 Spatial dependence

y_i = f(y_j), \qquad j \neq i \qquad (1.1)
Region                           (1)     (2)     (3)     (4)     (5)     (6)
Labor force by residence       1,000   2,000   3,000   3,000   2,000   1,000
Employment by place of work        -   1,000   3,000   3,000   1,000       -
Measured rate of unemployment   100%     50%      0%      0%     50%    100%
Actual rate of unemployment      25%     25%     25%     25%     25%     25%
1.3 Spatial heterogeneity
1.4 Quantifying location in our models
A first task we must undertake before we can ask questions about spatial
dependence and heterogeneity is quantification of the locational aspects of
our sample data. Given that we can always map a set of spatial data observations, we have two sources of information on which we can draw.
The location in Cartesian space represented by latitude and longitude is
one source of information. This information would also allow us to calculate
distances from any point in space, or the distance of observations located at
distinct points in space to observations at other locations. Spatial dependence should conform to the fundamental theorem of regional science: distance matters. Observations that are near each other should reflect a greater
degree of spatial dependence than those more distant from each other. In
other words, the strength of spatial dependence between observations should
decline with the distance between observations.
The second source of locational information is contiguity, reflecting the
relative position in space of one regional unit of observation to other such
units. Measures of contiguity rely on a knowledge of the size and shape of the
observational units depicted on a map. From this, we can determine which
1.4.1 Quantifying spatial contiguity
Figure 1.2 shows a hypothetical example of five regions as they would appear
on a map. We wish to construct a 5 by 5 binary matrix W containing 25
elements taking values of 0 or 1 that captures the notion of connectiveness
between the five entities depicted in the map configuration. We record in
each row of the matrix W a set of contiguity relations associated with one
of the five regions. For example the matrix element in row 1, column 2
would record the presence (represented by a 1) or absence (denoted by 0) of
a contiguity relationship between regions 1 and 2. As another example, the
row 3, column 4 element would reflect the presence or absence of contiguity
between regions 3 and 4. Of course, a matrix constructed in such fashion
must be symmetric: if regions 3 and 4 are contiguous, so are regions 4 and
3.
It turns out there are an embarrassingly large number of ways to accomplish our task. Below, we enumerate some of the alternative ways we
might define a binary matrix W that represents alternative definitions of the
contiguity relationships between the five entities in Figure 1.3. For the
enumeration below, start with a matrix filled with zeros, then consider the
following alternative ways to define the presence of a contiguity relationship.
Linear contiguity: Define Wij = 1 for entities that share a common
edge to the immediate right or left of the region of interest. For row
1, where we record the relations associated with region 1, we would
have all W1j = 0, j = 1, . . ., 5. On the other hand, for row 5, where we
record relationships involving region 5, we would have W53 = 1 and
Figure 1.3: A map of the five hypothetical regions, labeled (1) through (5).
Double linear contiguity: For two entities to the immediate right or left
of the region of interest, define Wij = 1. This definition would produce
the same results as linear contiguity for the regions in Figure 1.3.
Double rook contiguity: For two entities to the right, left, north and
south of the region of interest define Wij = 1. This would result in the
same matrix W as rook contiguity for the regions shown in Figure 1.3.
Queen contiguity: For entities that share a common side or vertex
with the region of interest define Wij = 1. For region 3 we would
have: W32 = 1, W34 = 1, W35 = 1 and all other row elements zero.
Believe it or not, there are even more ways one could proceed. For a
good discussion of these issues, see Appendix 1 of Kelejian and Robinson
(1995). Note also that the double linear and double rook definitions are
sometimes referred to as second order contiguity, whereas the other definitions are termed first order. More elaborate definitions sometimes rely
on the distance of shared borders. This might impact whether we considered regions (4) and (5) in Figure 1.3 as contiguous or not. They have a
common border, but it is very short. Note that in the case of a vertex, the
rook definition rules out a contiguity relation, whereas the bishop and queen
definitions would record a relationship.
The guiding principle in selecting a definition should be the nature of
the problem being modeled, and perhaps additional non-sample information
that is available. For example, suppose that a major highway connecting
regions (2) and (3) existed and we knew that region (2) was a bedroom
community for persons who work in region (3). Given this non-sample
information, we would not want to rely on the rook definition that would
rule out a contiguity relationship, as there is quite reasonably a large amount
of spatial interaction between these two regions.
We will use the rook definition to define a first-order contiguity matrix
for the five regions in Figure 1.3 as a concrete illustration. This is a definition
that is often used in applied work. Perhaps the motivation for this is that
we simply need to locate all regions on the map that have common borders
with some positive length.
The matrix W reflecting first-order rook contiguity relations for the
five regions in Figure 1.3 is:
W = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0
\end{bmatrix} \qquad (1.3)
Note that the matrix W is symmetric as indicated above, and by convention the matrix always has zeros on the main diagonal. A transformation
often used in applied work is to convert the matrix W to have row-sums of
unity. This is referred to as a standardized first-order contiguity matrix,
which we denote as C:
C = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 1/2 & 1/2 & 0
\end{bmatrix} \qquad (1.4)
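As a concrete illustration of this standardization, a minimal MATLAB sketch follows; it is not code from the spatial econometrics library, and the toy y vector is an assumption added here for illustration:

% Build the first-order rook contiguity matrix W from (1.3) and
% row-standardize it to obtain the matrix C in (1.4).
W = [0 1 0 0 0;
     1 0 0 0 0;
     0 0 0 1 1;
     0 0 1 0 1;
     0 0 1 1 0];
C = diag(1./sum(W,2)) * W;  % divide each row of W by its row sum
y = randn(5,1);             % toy observations for the five regions
ystar = C*y;                % spatial lag of y, as in (1.5)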
\begin{bmatrix} y_1^* \\ y_2^* \\ y_3^* \\ y_4^* \\ y_5^* \end{bmatrix} =
\begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 1/2 & 1/2 & 0
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix} =
\begin{bmatrix} y_2 \\ y_1 \\ 1/2\,y_4 + 1/2\,y_5 \\ 1/2\,y_3 + 1/2\,y_5 \\ 1/2\,y_3 + 1/2\,y_4 \end{bmatrix} \qquad (1.5)
This is one way of quantifying the notion that yi = f(yj), j ≠ i, expressed in (1.1). Consider now a linear relationship that uses the variable y* we constructed in (1.5) as an explanatory variable in a linear regression
relationship to explain variation in y across the spatial sample of observations.
y = \rho C y + \varepsilon \qquad (1.6)
sigma^2        =    95.5032
log-likelihood = -165.41269
Nobs, Nvars    =     49,     3
***************************************************************
Variable        Coefficient     t-statistic    t-probability
constant          45.056251        6.231261         0.000000
income            -1.030641       -3.373768         0.001534
house value       -0.265970       -3.004945         0.004331
rho                0.431381        3.625340         0.000732
For this example, we can calculate the proportion of total variation explained by spatial dependence with a comparison of the fit measured by R² from this model to the fit of a least-squares model that excludes the spatial dependence variable Cy. The least-squares regression for comparison is shown below:
Ordinary Least-squares Estimates
Dependent Variable =      Crime
R-squared          =     0.5521
Rbar-squared       =     0.5327
sigma^2            =   130.8386
Durbin-Watson      =     1.1934
Nobs, Nvars        =     49,     3
***************************************************************
Variable        Coefficient     t-statistic    t-probability
constant          68.609759       14.484270         0.000000
income            -1.596072       -4.776038         0.000019
house value       -0.274079       -2.655006         0.010858
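As a rough gauge of this comparison (using the R-squared of 0.6518 reported for the same spatial model in the printed output shown in Section 1.5.2), the calculation amounts to:

0.6518 - 0.5521 ≈ 0.10

so spatial dependence accounts for roughly ten percent of the variation in crime incidents beyond what household income and house value explain.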
1.4.2 Quantifying spatial position
Associating location in space with observations is essential to modeling relationships that exhibit spatial heterogeneity. Recall this means there is
variation in the relationship being modeled over space. We illustrate two approaches to this: the spatial expansion model and geographically weighted regression.
y = X\beta + \varepsilon
\beta = ZJ\beta_0 \qquad (1.8)

Where:

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_1' & 0 & \cdots & 0 \\ 0 & x_2' & & \\ \vdots & & \ddots & \\ 0 & & & x_n' \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}

Z = \begin{bmatrix} Z_{x_1} \otimes I_k & Z_{y_1} \otimes I_k \\ \vdots & \vdots \\ Z_{x_n} \otimes I_k & Z_{y_n} \otimes I_k \end{bmatrix}, \quad
J = \begin{bmatrix} I_k & 0 \\ 0 & I_k \end{bmatrix}, \quad
\beta_0 = \begin{bmatrix} \beta_x \\ \beta_y \end{bmatrix} \qquad (1.9)
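Substituting the second equation of (1.8) into the first implies that each βi is a linear combination of the base parameters, βi = Zxi βx + Zyi βy, so β0 can be estimated by least squares on an expanded data matrix. A minimal MATLAB sketch of this substitution approach follows; it builds on the structure shown in (1.9) as reconstructed above, it is not the library's expansion estimation function, and the toy data and coordinates are assumptions:

% Sketch: estimate beta_0 = [beta_x; beta_y] by OLS on coordinate-expanded data.
n = 49; k = 3;
X = randn(n,k); y = randn(n,1);               % toy data
xc = rand(n,1); yc = rand(n,1);               % toy latitude-longitude coordinates
Xe = [X .* (xc*ones(1,k)), X .* (yc*ones(1,k))];  % expanded explanatory variables
b0 = (Xe'*Xe) \ (Xe'*y);                      % 2k x 1 vector [beta_x; beta_y]
b1 = xc(1)*b0(1:k) + yc(1)*b0(k+1:2*k);       % implied coefficients for observation 1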
Figure 1.4: Distance expansion estimates: household income and house value coefficients plotted against distance from the central city.
The subscript i on βi indicates that this k x 1 parameter vector is associated with observation i. The GWR model produces n such vectors of parameter estimates, one for each observation. These estimates are produced using:

\hat{\beta}_i = (X' W_i^2 X)^{-1} (X' W_i^2 y) \qquad (1.12)
Note also that Wi X represents a distance-weighted data matrix, not a single observation, and εi represents an n-vector. The precise nature of the distance weighting is taken up in Chapter 4.
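To make (1.12) concrete, a minimal MATLAB sketch for a single observation i is shown below; this is not the library's GWR function, and the toy data, the exponential decay and the bandwidth b are illustrative assumptions (Chapter 4 covers the weighting actually used):

% Sketch of equation (1.12) for one observation i with toy data.
n = 49; k = 3; i = 1; b = 5;                   % toy dimensions and bandwidth
X = randn(n,k); y = randn(n,1);                % toy data
xc = rand(n,1)*10; yc = rand(n,1)*10;          % toy coordinates
di = sqrt((xc - xc(i)).^2 + (yc - yc(i)).^2);  % distances from observation i
Wi = diag(exp(-di/b));                         % diagonal matrix of distance weights
bi = (X'*Wi^2*X) \ (X'*Wi^2*y);                % beta_i from equation (1.12)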
As an applied illustration, we provide a graphical presentation in Figure 1.5 of the estimates produced by the GWR method sorted by distance
from the central city, making these comparable to those already presented
for the Casetti distance expansion model. The two sets of estimates are
quite different, raising the question of which approach provides a better
set of inferences regarding the relationship between the neighborhood crime
rates and the explanatory variables in the model. This topic will be taken
up when we discuss alternative modeling approaches to deal with spatial
dependence and heterogeneity.
Figure 1.5: GWR estimates: household income and house value coefficients plotted against distance from the central city.
Residuals from a homogeneous model should reflect unexplained variation attributable to heterogeneity in the underlying relationship over space. Spatial clustering of the
residuals would occur with positive and negative residuals appearing in distinct regions and patterns on the map. This of course was our motivation
and illustration of spatial dependence as illustrated in Figure 1.2. You might
infer correctly that spatial heterogeneity and dependence are often related
in the context of modeling. An inappropriate model that fails to capture
spatial heterogeneity will result in residuals that exhibit spatial dependence.
This is another topic we discuss in the following chapters of this text.
1.5 The MATLAB spatial econometrics library
Understanding how the spatial econometrics library functions communicate with other functions in the Econometrics Toolbox should allow you to
more effectively use these functions to solve spatial econometric estimation
problems.
The entire Econometrics Toolbox has been included in the internet-based
materials provided here, as well as an online HTML interface to examine
the functions available along with their documentation. All functions have
accompanying demonstration files that illustrate the typical use of the functions with sample data. These demonstration files can be viewed using the
online HTML interface. We have also provided demonstration files for all
of the estimation functions in the spatial econometrics library that can be
viewed online along with their documentation. Examples are provided in
this text and the program files along with the datasets have been included
in the Web-based module.
In designing a spatial econometric library of functions, we need to think
about organizing our functions to present a consistent user-interface that
packages all of our MATLAB functions in a unified way. The advent of
structures in MATLAB version 5 allows us to create a host of alternative
spatial econometric functions that all return results structures.
A structure in MATLAB allows the programmer to create a variable
containing what MATLAB calls fields that can be accessed by referencing
the structure name plus a period and the field name. For example, suppose
we have a MATLAB function to perform ordinary least-squares estimation
named ols that returns a structure. The user can call the function with
input arguments (a dependent variable vector y and explanatory variables
matrix x) and provide a variable name for the structure that the ols function
will return using:
result = ols(y,x);
The structure variable result returned by our ols function might have
fields named rsqr, tstat, beta, etc. These fields might contain the R-squared statistic, t-statistics and the least-squares estimates β̂. One virtue of using the structure to return regression results is that the user can access
individual fields of interest as follows:
bhat = result.beta;
disp('The R-squared is:');
result.rsqr
disp('The 2nd t-statistic is:');
result.tstat(2,1)
There is nothing sacred about the name result used for the returned
structure in the above example; we could have used:
bill_clinton = ols(y,x);
result2      = ols(y,x);
restricted   = ols(y,x);
unrestricted = ols(y,x);
That is, the name of the structure to which the ols function returns its
information is assigned by the user when calling the function.
To examine the nature of the structure in the variable result, we can
simply type the structure name without a semi-colon and MATLAB will
present information about the structure variable as follows:
result =
     meth: 'ols'
        y: [100x1 double]
     nobs: 100.00
     nvar: 3.00
     beta: [3x1 double]
     yhat: [100x1 double]
    resid: [100x1 double]
     sige: 1.01
    tstat: [3x1 double]
     rsqr: 0.74
     rbar: 0.73
       dw: 1.89
Each field of the structure is indicated, and for scalar components the
value of the field is displayed. In the example above, nobs, nvar, sige,
rsqr, rbar, and dw are scalar fields, so their values are displayed. Matrix
or vector fields are not displayed, but the size and type of the matrix or
vector field is indicated. Scalar string arguments are displayed as illustrated
by the meth field which contains the string ols indicating the regression
method that was used to produce the structure. The contents of vector or
matrix strings would not be displayed, just their size and type. Matrix and
vector fields of the structure can be displayed or accessed using the MATLAB
conventions of typing the matrix or vector name without a semi-colon. For
example,
result.resid
result.y
would display the residual vector and the dependent variable vector y in the
MATLAB command window.
Another virtue of using structures to return results from our regression
functions is that we can pass these structures to another related function
that would print or plot the regression results. These related functions can
query the structure they receive and intelligently decipher the meth field
to determine what type of regression results are being printed or plotted.
For example, we could have a function prt that prints regression results and
another plt that plots actual versus fitted and/or residuals. Both these functions take a regression structure as input arguments. Example 1.1 provides
a concrete illustration of these ideas.
% ----- Example 1.1 Demonstrate regression using the ols() function
load y.data;
load x.data;
result = ols(y,x);
prt(result);
plt(result);
The example assumes the existence of functions ols, prt, plt and data
matrices y, x in files y.data and x.data. Given these, we carry out a regression, print results and plot the actual versus predicted as well as residuals
with the MATLAB code shown in example 1.1. We will discuss the prt and
plt functions in Section 1.5.2.
1.5.1 Estimation functions
Now to put these ideas into practice, consider implementing an ols function.
The function code would be stored in a file ols.m whose first line is:
function results=ols(y,x)
The keyword function instructs MATLAB that the code in the file ols.m
represents a callable MATLAB function.
The help portion of the MATLAB ols function is presented below and
follows immediately after the first line as shown. All lines containing the
MATLAB comment symbol % will be displayed in the MATLAB command
window when the user types help ols.
function results=ols(y,x)
% PURPOSE: least-squares regression
%---------------------------------------------------
% USAGE: results = ols(y,x)
% where: y = dependent variable vector (nobs x 1)
%        x = independent variables matrix (nobs x nvar)
%---------------------------------------------------
% RETURNS: a structure
%        results.meth  = ols
%        results.beta  = bhat
%        results.tstat = t-stats
%        results.yhat  = yhat
%        results.resid = residuals
%        results.sige  = e'*e/(n-k)
%        results.rsqr  = rsquared
%        results.rbar  = rbar-squared
%        results.dw    = Durbin-Watson Statistic
%        results.nobs  = nobs
%        results.nvar  = nvars
%        results.y     = y data vector
%---------------------------------------------------
% SEE ALSO: prt(results), plt(results)
%---------------------------------------------------
All functions in the spatial econometrics library present a unified documentation format for the MATLAB help command by adhering to the
convention of sections entitled PURPOSE, USAGE, RETURNS, SEE
ALSO, and perhaps a REFERENCES section, delineated by dashed lines.
The USAGE section describes how the function is used, with each input
argument enumerated along with any default values. A RETURNS section
portrays the structure that is returned by the function and each of its fields.
To keep the help information uncluttered, we assume some knowledge on
the part of the user. For example, we assume the user realizes that the
.residuals field would be an (nobs x 1) vector and the .beta field would
consist of an (nvar x 1) vector.
The SEE ALSO section points the user to related routines that may
be useful. In the case of our ols function, the user might want to rely on
the printing or plotting routines prt and plt, so these are indicated. The
REFERENCES section would be used to provide a literature reference (for
the case of our more exotic spatial estimation procedures) where the user
could read about the details of the estimation methodology.
As an illustration of the consistency in documentation, consider the function sar that provides estimates for the spatial autoregressive model that
we presented in Section 1.4.1. The documentation for this function is shown
below:
PURPOSE: computes spatial autoregressive model estimates
y = p*W*y + X*b + e, using sparse matrix algorithms
---------------------------------------------------
USAGE: results = sar(y,x,W,rmin,rmax,convg,maxit)
where: y = dependent variable vector
x = explanatory variables matrix
The actual execution code to produce least-squares or spatial autoregressive parameter estimates would follow the documentation in the file
discussed above. We do not discuss programming of the spatial econometric
functions in the text, but you can of course examine all of the functions
to see how they work. The manual for the Econometrics Toolbox provides
a great deal of discussion of programming in MATLAB and examples of
how to add new functions to the toolbox or change existing functions in the
toolbox.
1.5.2 Using the results structure
To illustrate the use of the results structure returned by our ols function,
consider the associated function plt_reg which plots actual versus predicted values along with the residuals. The results structure contains everything needed by the plt_reg function to carry out its task. Earlier, we referred to functions plt and prt rather than plt_reg, but prt and plt are wrapper
functions that call the functions prt_reg and plt_reg where the real work of printing and plotting regression results is carried out. The motivation for taking this approach is that separate smaller functions can be devised to print and plot results from all of the spatial econometric procedures, facilitating development. The wrapper functions eliminate the need for the user to learn the names of different printing and plotting functions associated with each group of spatial econometric procedures; all results structures can be printed and plotted by simply invoking the prt and plt functions. Documentation for the plt function, which plots results from all spatial econometrics functions as well as the Econometrics Toolbox, is shown below. This function is the wrapper function that calls an appropriate plotting function, plt_spat, based on the econometric method identified in the results structure meth field argument.
PURPOSE: Plots results structures returned by most functions
         by calling the appropriate plotting function
---------------------------------------------------
USAGE: plt(results,vnames)
Where: results = a structure returned by an econometric function
       vnames  = an optional vector of variable names
                 e.g. vnames = strvcat('y','const','x1','x2');
---------------------------------------------------
NOTES: this is simply a wrapper function that calls another function
---------------------------------------------------
RETURNS: nothing, just plots the results
---------------------------------------------------
SEE ALSO: prt()
---------------------------------------------------
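The dispatch idea behind the wrapper can be sketched as follows. This is an illustration of the design rather than the actual plt code; the list of method names in the case statement and the exact call signatures of plt_spat and plt_reg are assumptions:

function plt(results, vnames)
% Sketch of the wrapper dispatch idea (illustrative only, not the actual plt.m).
if nargin < 2, vnames = []; end
switch results.meth
  case {'far','sar'}              % spatial models handled by the spatial library
    plt_spat(results, vnames);    % spatial econometrics plotting function
  otherwise
    plt_reg(results, vnames);     % Econometrics Toolbox regression plots
end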
A decision was made not to place the pause command in the plt function, but rather let the user place this statement in the calling program or
function. An implication of this is that the user controls viewing regression
plots in for loops or in the case of multiple invocations of the plt function.
For example, only the second plot will be shown in the following code.
result1 = sar(y,x1,W);
plt(result1);
result2 = sar(y,x2,W);
plt(result2);
If the user wishes to see the plots associated with the first spatial autoregression, the code would need to be modified as follows:
result1 = sar(y,x1,W);
plt(result1);
pause;
result2 = sar(y,x2,W);
plt(result2);
The pause statement would force a plot of the results from the first spatial autoregression and wait for the user to strike any key before proceeding
with the second regression and accompanying plot of these results.
A more detailed example of using the results structure is the prt function which produces printed output from all of the functions in the spatial
econometrics library. The printout of estimation results is similar to that
provided by most statistical packages.
The prt function allows the user an option of providing a vector of fixed
width variable name strings that will be used when printing the regression
coefficients. These can be created using the MATLAB strvcat function
that produces a vertical concatenated list of strings with fixed width equal
to the longest string in the list. We can also print results to an indicated file
rather than the MATLAB command window. Three alternative invocations
of the prt function illustrating these options for usage are shown below:
vnames = strvcat('crime','const','income','house value');
res = sar(y,x,W);
prt(res);                    % print with generic variable names
prt(res,vnames);             % print with user-supplied variable names
fid = fopen('sar.out','wr'); % open a file for printing
prt(res,vnames,fid);         % print results to file sar.out
The second use of prt uses the user-supplied variable names. The MATLAB function strvcat carries out a vertical concatenation of strings and
pads the shorter strings in the vnames vector to have a fixed width based
on the longer strings. A fixed width string containing the variable names is
required by the prt function. Note that we could have used:
vnames = ['crime      ',
          'const      ',
          'income     ',
          'house value'];
but this takes up more space and is slightly less convenient, as we have to
provide the padding of strings ourselves. Using the vnames input in the prt
function would result in the following printed to the MATLAB command
window.
Spatial autoregressive Model Estimates
Dependent Variable =      crime
R-squared          =     0.6518
Rbar-squared       =     0.6366
sigma^2            =    95.5033
Nobs, Nvars        =     49,     3
log-likelihood     = -165.41269
# of iterations    =     12
min and max rho    =    -1.5362,    1.0000
***************************************************************
Variable        Coefficient     t-statistic    t-probability
const             45.056481        6.231281         0.000000
income            -1.030647       -3.373784         0.001513
house value       -0.265970       -3.004944         0.004290
rho                0.431377        3.625292         0.000720
The third case specifies an output file opened with the command:
fid = fopen('sar.out','wr');
The file sar.out would contain output identical to that from the second use
of prt. It is the user's responsibility to close the file that was opened using
the MATLAB command:
fclose(fid);
In the following chapters that present various spatial estimation methods we will provide the documentation but not the details concerning implementation of the estimation procedures in MATLAB. A function has been
devised and incorporated in the spatial econometrics library for each of the
estimation procedures that we discuss and illustrate. These functions carry
out estimation and provide printed as well as graphical presentation of the
results using the design framework that was set forth in this section.
1.6 Chapter Summary
This chapter introduced two main features of spatial data sets: spatial dependence and spatial heterogeneity. Spatial dependence refers to the fact
that sample data observations exhibit correlation with reference to points or
location in space. We often observe spatial clustering of sample data observations with respect to map regions. An intuitive motivation for this type of
result is the existence of spatial hierarchical relationships, spatial spillovers
and other types of spatial interactivity studied in regional science.
Spatial heterogeneity refers to the fact that underlying relationships we
wish to study may vary systematically over space. This creates problems
for regression and other econometric methods that do not accommodate
spatial variation in the relationships being modeled. A host of methods
have arisen in spatial econometrics that allow the estimated relationship to
vary systematically over space.
A large part of the chapter was devoted to introducing how locational
information regarding sample data observations is formally incorporated in
spatial econometric models. After introducing the concept of a spatial contiguity matrix, we provided a preview of the spatial autoregressive model
that relies on the contiguity concept. Chapters 2 and 3 cover this spatial
econometric method in detail.
In addition to spatial contiguity, other spatial econometric methods rely
on the latitude-longitude information available for spatial data samples to
allow variation in the relationship being studied over space. Two approaches
to this were introduced, the spatial expansion model and geographically
weighted regression, which are the subject of Chapter 4.
A preview that provided an applied illustration of the methods that will
be detailed in Chapters 2 through 4 was provided using a spatial data set
from Anselin (1988) on crime incidents in 49 neighborhoods in Columbus,
Ohio.
Finally, a software design for implementing the spatial econometric estimation methods discussed in this text was set forth. Our estimation methods
will be implemented using MATLAB software from the MathWorks Inc. A
design based on MATLAB structure variables was set forth. This approach returns estimation results in a single structure variable that can be passed to functions which provide printed and graphical presentation of the results.
1.7 References
Amemiya, T. 1985. Advanced Econometrics, (Cambridge, MA: Harvard University Press).
Anselin, L. 1988. Spatial Econometrics: Methods and Models, (Dordrecht: Kluwer Academic Publishers).
Brundson, C., A. S. Fotheringham, and M. Charlton. 1996. Geographically weighted regression: a method for exploring spatial nonstationarity, Geographical Analysis, Vol. 28, pp. 281-298.
Casetti, E., 1972. Generating Models by the Expansion Method:
Applications to Geographic Research, Geographical Analysis, Vol. 4,
pp. 81-91.
Casetti, E. 1992. Bayesian Regression and the Expansion Method,
Geographical Analysis, Vol. 24, pp. 58-74.
Chow, G. 1983. Econometrics, (New York: McGraw-Hill).
Dhrymes, P. 1981. Distributed Lags: Problems of Estimation and
Formulation, (Amsterdam: North-Holland).
Fomby, T., R. Hill, and S. Johnson. 1984. Advanced Econometric
Methods, (New York: Springer).
Intrilligator, M. 1978. Econometric Models, Techniques, and Applications, (Englewood Cliffs: Prentice-Hall).
Green, W. H. 1997. Econometric Analysis, third edition, (Upper Saddle River, N.J: Prentice Hall).
Kelejian, H. and W. Oates. 1989. Introduction to Econometrics: Principles and Applications, (New York: Harper and Row).
Chapter 2
y = \rho W_1 y + X\beta + u
u = \lambda W_2 u + \varepsilon \qquad (2.1)
\varepsilon \sim N(0, \sigma^2 I_n)
Where y contains an n x 1 vector of cross-sectional dependent variables and X represents an n x k matrix of explanatory variables. W1 and W2 are known n x n spatial weight matrices, usually containing first-order contiguity relations or functions of distance. As explained in Section 1.4.1, a first-order
contiguity matrix has zeros on the main diagonal, rows that contain zeros
in positions associated with non-contiguous observational units and ones in
positions reflecting neighboring units that are (first-order) contiguous based
on one of the contiguity definitions.
From the general model in (2.1) we can derive special models by imposing
restrictions. For example, setting X = 0 and W2 = 0 produces a first-order
spatial autoregressive model shown in (2.2).
y = \rho W_1 y + \varepsilon \qquad (2.2)
\varepsilon \sim N(0, \sigma^2 I_n)
This model attempts to explain variation in y as a linear combination of
contiguous or neighboring units with no other explanatory variables. The
y = \rho W_1 y + X\beta + \varepsilon \qquad (2.3)
\varepsilon \sim N(0, \sigma^2 I_n)
y = X\beta + u \qquad (2.4)
u = \lambda W_2 u + \varepsilon
\varepsilon \sim N(0, \sigma^2 I_n)
This chapter is organized into sections that discuss and illustrate each
of these special cases of the spatial autoregressive model as well as the most
general model form in (2.1). Section 2.1 deals with the first-order spatial autoregressive model presented in (2.2). The mixed regressive-spatial autoregressive model is taken up in Section 2.2. Section 2.3 takes up the regression
model containing spatial autocorrelation in the disturbances and illustrates
various tests for spatial dependence using regression residuals. The most
general model is the focus of Section 2.4. Applied illustrations of all the
models are provided using a variety of spatial data sets. Spatial econometrics library functions that utilize MATLAB sparse matrix algorithms allow
us to estimate models with over 3,000 observations in around 100 seconds
on an inexpensive desktop computer.
2.1 The first-order spatial autoregressive model
This model is seldom used in applied work, but it serves to motivate some
of the ideas that we draw on in later sections of the chapter. The model,
which we label FAR, takes the form:
y = \rho W y + \varepsilon \qquad (2.5)
\varepsilon \sim N(0, \sigma^2 I_n)
Where the spatial contiguity matrix W has been standardized to have row
sums of unity and the variable vector y is expressed in deviations from the
means form to eliminate the constant term in the model.
To illustrate the problem with least-squares estimation of spatial autoregressive models, consider applying least-squares to the model in (2.5) which
would produce an estimate for the single parameter in the model:
\hat{\rho} = (y' W' W y)^{-1} y' W' y \qquad (2.6)
Note that the least-squares estimate is biased, since we cannot show that E(ρ̂) = ρ. The usual argument that the explanatory variables matrix X in least-squares is fixed in repeated sampling allows one to pass the expectation operator over terms like (y'W'Wy)⁻¹y'W' and argue that E(ε) = 0, eliminating the bias term. Here however, because of spatial dependence, we cannot make the case that Wy is fixed in repeated sampling. This also rules out making a case for consistency of the least-squares estimate of ρ, because the probability limit (plim) for the term y'W'ε is not zero. In fact, Anselin (1988) establishes that:

\text{plim } N^{-1}(y'W'\varepsilon) = \text{plim } N^{-1}\varepsilon'W(I - \rho W)^{-1}\varepsilon \qquad (2.8)

This is equal to zero only in the trivial case where ρ equals zero and we have no spatial dependence in the data sample.
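A small simulation makes the point. The following is an illustrative sketch (not from the text) that uses a simple neighbor-to-the-left-and-right weight matrix in place of a map-based contiguity matrix:

% Sketch: bias of the least-squares estimate in (2.6) for a generated FAR sample.
n = 400; rho = 0.7;
W = spdiags(ones(n,2), [-1 1], n, n);   % neighbors: observations i-1 and i+1
d = full(sum(W,2));                     % row sums
W = spdiags(1./d, 0, n, n) * W;         % row-standardize to unit row sums
e = randn(n,1);
y = (speye(n) - rho*W) \ e;             % generate y = rho*W*y + e
Wy = W*y;
rho_ls = (Wy'*Wy) \ (Wy'*y);            % least-squares estimate from (2.6)
% rho_ls typically deviates noticeably from the true value of 0.7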
Given that least-squares will produce biased and inconsistent estimates of the spatial autoregressive parameter ρ in this model, how do we proceed to estimate ρ? The maximum likelihood estimator for ρ requires that we find a value of ρ that maximizes the likelihood function shown in (2.9).
L(y|\rho, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \, |I_n - \rho W| \, \exp\left\{ -\frac{1}{2\sigma^2}(y - \rho W y)'(y - \rho W y) \right\} \qquad (2.9)
In order to simplify the maximization problem, we obtain a concentrated log likelihood function based on eliminating the parameter σ² for the variance of the disturbances. This is accomplished by substituting σ̂² = (1/n)(y − ρWy)'(y − ρWy) into the likelihood. Measures of dispersion for the estimates θ = (ρ, σ²)' are based on the information matrix:

\left[ -E\left( \frac{\partial^2 L}{\partial \theta \, \partial \theta'} \right) \right]^{-1} \qquad (2.11)
2.1.1
Building on the software design set forth in section 1.5 for our spatial
econometrics function library, we have implemented a function far to produce maximum likelihood estimates for the first-order spatial autoregressive
model. We rely on the sparse matrix functionality of MATLAB so large-scale problems can be solved using a minimum of time and computer RAM
memory. We demonstrate this function in action on a data set involving
3,107 U.S. counties.
Estimating the FAR model requires that we find eigenvalues for the large
n by n matrix W , as well as the determinant of the related n by n matrix
(In - ρW). In addition, matrix multiplications involving W and (In - ρW)
are required to compute the information matrix used to produce estimates
of dispersion.
We constructed a function far that can produce estimates for the first-order spatial autoregressive model in a case involving 3,107 observations in
95 seconds on a moderately fast inexpensive desktop computer. The MATLAB algorithms for dealing with sparse matrices make it ideally suited for
spatial modeling because spatial weight matrices are almost always sparse.
Another issue we need to address is computing measures of dispersion
for the estimates ρ and σ² in large estimation problems. As already noted,
we cannot rely on the information matrix approach because this involves
matrix operations on very large matrices. An approach that we take to
produce measures of dispersion is to numerically evaluate the hessian matrix
using the maximum likelihood estimates of ρ and σ². The approach basically
produces a numerical approximation to the expression in (2.11). A key to
using this approach is the ability to evaluate the log likelihood function using
the sparse algorithms to handle large matrices.
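The numerical hessian idea can be sketched as follows. This is not the library's internal code; it assumes the user has written a file nll.m that returns the negative log likelihood at a parameter vector p = [rho; sige], with p holding the maximum likelihood estimates:

% Central-difference approximation to the hessian of the negative
% log-likelihood, evaluated at the ML estimates in p.
k = length(p);
h = 1e-4*max(abs(p),1);          % step sizes for each parameter
H = zeros(k,k);
for i = 1:k
  for j = 1:k
    ei = zeros(k,1); ei(i) = h(i);
    ej = zeros(k,1); ej(j) = h(j);
    H(i,j) = (feval('nll',p+ei+ej) - feval('nll',p+ei-ej) ...
            - feval('nll',p-ei+ej) + feval('nll',p-ei-ej)) / (4*h(i)*h(j));
  end
end
vcov  = inv(H);                  % asymptotic variance-covariance matrix
tstat = p ./ sqrt(diag(vcov));   % traditional t-statistics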
It should be noted that Pace and Barry (1997) when confronted with the
task of providing measures of dispersion for spatial autoregressive estimates
based on sparse algorithms suggest using likelihood ratio tests to determine
the significance of the parameters. The approach taken here may suffer from
some numerical inaccuracy relative to measures of dispersion based on the
theoretical information matrix, but has the advantage that users will be
presented with traditional t-statistics on which they can base inferences.
We will have more to say about how our approach to solving large spatial autoregressive estimation problems using sparse matrix algorithms in
MATLAB compares to one proposed by Pace and Barry (1997), when we
apply the function far to a large data set in the next section.
Documentation for the function far is presented below. This function
was written to perform on both large and small problems. If the problem
is small (involving less than 500 observations), the function far computes
measures of dispersion using the theoretical information matrix. If more observations are involved, the function determines these measures by computing a numerical hessian matrix. (Users may need to decrease the number of
observations to less than 500 if they have computers without a large amount
of RAM memory.)
PURPOSE: computes 1st-order spatial autoregressive estimates
         y = p*W*y + e, using sparse matrix algorithms
---------------------------------------------------
USAGE: results = far(y,W,rmin,rmax,convg,maxit)
where: y     = dependent variable vector
       W     = standardized contiguity matrix
       rmin  = (optional) minimum value of rho to use in search
       rmax  = (optional) maximum value of rho to use in search
       convg = (optional) convergence criterion (default = 1e-8)
       maxit = (optional) maximum # of iterations (default = 500)
---------------------------------------------------
RETURNS: a structure
       results.meth  = far
       results.rho   = rho
       results.tstat = asymptotic t-stat
       results.yhat  = yhat
       results.resid = residuals
       results.sige  = sige = (y-p*W*y)'*(y-p*W*y)/n
       results.rsqr  = rsquared
       results.lik   = -log likelihood
       results.nobs  = nobs
       results.nvar  = nvar = 1
       results.y     = y data vector
       results.iter  = # of iterations taken
       results.romax = 1/max eigenvalue of W (or rmax if input)
       results.romin = 1/min eigenvalue of W (or rmin if input)
---------------------------------------------------
One option we provide allows the user to supply minimum and maximum
values of ρ rather than rely on the eigenvalues of W. This might be used if we wish to avoid the time needed to compute eigenvalues for a very large matrix W, or to restrict the search to a particular range of ρ values.
2.1.2 Applied examples
This example produced the following printed output with the graphical
output presented in Figure 2.1. From the output we would infer that a
distinct spatial dependence among the crime incidents for the sample of 49
neighborhoods exists since the parameter estimate for ρ has a t-statistic of 4.259. We would interpret this statistic in the typical regression fashion to indicate that the estimated ρ lies 4.2 standard deviations away from zero.
We also see that this model explains nearly 44% of the variation in crime
incidents in deviations from the means form.
First-order spatial autoregressive model Estimates
Dependent Variable =      crime
R-squared          =     0.4390
sigma^2            =   153.8452
Nobs, Nvars        =     49,     1
log-likelihood     = -373.44669
# of iterations    =     17
min and max rho    =    -1.5362,    1.0000
***************************************************************
Variable        Coefficient     t-statistic    t-probability
rho                0.669775        4.259172         0.000095
Figure 2.1: Actual versus predicted values and residuals from the first-order spatial autoregressive model.
A sparse matrix can be stored by recording only the non-zero values along with their row and column elements. Continuing with our example, we can store the
first-order contiguity matrix in a single data file containing 12,429 rows with
3 columns that take the form:
row column value
This represents a considerable savings in computational space when compared to storing a matrix containing 9,653,449 elements. A handy utility
function in MATLAB is spy which allows one to produce a specially formatted graph showing the sparsity structure associated with sparse matrices.
We demonstrate by executing spy(W) on our weight matrix W from the
Pace and Barry data set, which produced the graph shown in Figure 2.2.
As we can see from the figure, most of the non-zero elements reside near the
diagonal.
As an example of storing a sparse first-order contiguity matrix, consider
example 2.2 below that reads data from the file ford.dat in sparse format and uses the function sparse to construct a working spatial contiguity
matrix W . The example also produces a graphical display of the sparsity
structure using the MATLAB function spy.
% ----- Example 2.2 Using sparse matrix functions
load ford.dat; % 1st order contiguity matrix
% stored in sparse matrix form
ii = ford(:,1);
jj = ford(:,2);
ss = ford(:,3);
clear ford;
% clear out the matrix to save RAM memory
W = sparse(ii,jj,ss,3107,3107);
clear ii; clear jj; clear ss; % clear out these vectors to save memory
spy(W);
To compare our function far with the approach proposed by Pace and
Barry, we implemented their approach and provide timing results. We take
a more efficient approach to the grid search over values of the parameter ρ than suggested by Pace and Barry. Rather than search over a large number of values for ρ, we based our search on a large increment of 0.1 for an initial grid of values covering ρ from 1/λmin to 1/λmax, where λmin and λmax denote the minimum and maximum eigenvalues of W. Given the determinant of (I - ρW) calculated using sparse matrix algorithms in MATLAB, we evaluated the negative log likelihood function values for this grid of ρ values to find the value of ρ that minimizes the likelihood function. We then make a second pass based on a tighter grid with increments of 0.01 around the optimal value found in the first pass. A third and final pass is based on an even finer grid with increments of 0.001 around the optimal value from the second pass.
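A minimal MATLAB sketch of this coarse-grid evaluation follows. It illustrates the idea rather than reproducing the far function itself, and it assumes the ydev vector and sparse matrix W constructed in example 2.3 are in memory:

% Coarse grid over rho for the negative concentrated log-likelihood.
n = length(ydev);
rgrid = -0.9:0.1:0.9;               % coarse grid; far() uses 1/min to 1/max eigenvalue
nll = zeros(length(rgrid),1);
for i = 1:length(rgrid)
  rho = rgrid(i);
  e = ydev - rho*(W*ydev);          % residuals for this rho
  sige = (e'*e)/n;                  % concentrated-out variance estimate
  [L,U] = lu(speye(n) - rho*W);     % sparse LU factorization
  logdet = sum(log(abs(full(diag(U)))));   % log|I - rho*W|
  nll(i) = (n/2)*log(sige) - logdet;       % negative concentrated log-likelihood
end
[nmin, imin] = min(nll);
rho1 = rgrid(imin);                 % center of the finer 0.01 grid in the second pass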
Figure 2.2: Sparsity structure of the first-order contiguity matrix W for the Pace and Barry data set (nz = 12428).
How does our approach compare to that of Pace and Barry? Example 2.3
shows a program to estimate the same FAR model using our far function.
% ----- Example 2.3 Using the far() function
%
with very large data set from Pace and Barry
load elect.dat;
% load data on votes
y = elect(:,7)./elect(:,8); % proportion of voters casting votes
ydev = y - mean(y);
% deviations from the means form
clear y;
% conserve on RAM memory
clear elect; % conserve on RAM memory
load ford.dat; % 1st order contiguity matrix stored in sparse matrix form
ii = ford(:,1); jj = ford(:,2); ss = ford(:,3);
n = 3107;
clear ford; % clear ford matrix to save RAM memory
W = sparse(ii,jj,ss,n,n);
clear ii; clear jj; clear ss; % conserve on RAM memory
tic; res = far(ydev,W); toc;
prt(res);
In terms of time needed to solve the problem, our use of the simplex
optimization algorithm takes only 10.6 seconds to produce a more accurate
estimate than that based on the grid approach of Pace and Barry. Their
approach which we modified took 30 seconds to solve for a value accurate
to 3 decimal digits. Note also in contrast to Pace and Barry, we compute
a conventional measure of dispersion using the numerical hessian estimates
which takes only 1.76 seconds. The total time required to compute not only
the estimates and measures of dispersion for and , but the Rsquared
statistics and log likelihood function was around 100 seconds.
Many of the ideas developed in this section regarding the use of MATLAB sparse matrix algorithms will apply equally to the estimation procedures we develop in the next three sections for the other members of the
spatial autoregressive model family.
2.2 The spatial autoregressive model
y = ρWy + Xβ + ε    (2.12)
ε ~ N(0, σ²In)

Estimation of this model by maximum likelihood proceeds in a fashion similar to the FAR model: two auxiliary least-squares regressions
are carried out along with a univariate parameter optimization of the concentrated likelihood function over values of the autoregressive parameter ρ.
The individual steps of this procedure are enumerated in Anselin (1988).
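Those steps amount to two auxiliary least-squares regressions followed by a univariate search over ρ, after which β and σ² are recovered. The sketch below illustrates them under simplifying assumptions (y, x and a sparse standardized W of order n in memory, and a simple grid search standing in for the simplex optimizer); it is not the sar library code.

% ----- Sketch: concentrated likelihood steps for the SAR model
b0 = (x'*x)\(x'*y);                   % OLS of y on X
bL = (x'*x)\(x'*(W*y));               % OLS of Wy on X
e0 = y - x*b0;                        % residuals from the two regressions
eL = W*y - x*bL;
rgrid = -0.99:0.01:0.99;              % illustrative search range for rho
nlik  = zeros(length(rgrid),1);
for i = 1:length(rgrid)
   [L,U] = lu(speye(n) - rgrid(i)*W);
   lndet = sum(log(abs(diag(U))));    % log-determinant term
   ehat  = e0 - rgrid(i)*eL;
   nlik(i) = (n/2)*log(ehat'*ehat) - lndet; % negative concentrated log-likelihood
end
[nmin,ind] = min(nlik);
rho  = rgrid(ind);                    % estimate of rho
beta = b0 - rho*bL;                   % back out beta given rho
sige = (e0 - rho*eL)'*(e0 - rho*eL)/n; % and sigma^2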
2.2.1 The sar function
The function sar is fairly similar to our far function, with the documentation
presented below.
PURPOSE: computes spatial autoregressive model estimates
         y = p*W*y + X*b + e, using sparse matrix algorithms
---------------------------------------------------
USAGE: results = sar(y,x,W,rmin,rmax,convg,maxit)
where: y = dependent variable vector
       x = explanatory variables matrix
       W = standardized contiguity matrix
       rmin  = (optional) minimum value of rho to use in search
       rmax  = (optional) maximum value of rho to use in search
       convg = (optional) convergence criterion (default = 1e-8)
       maxit = (optional) maximum # of iterations (default = 500)
---------------------------------------------------
RETURNS: a structure
       results.meth  = sar
       results.beta  = bhat
       results.rho   = rho
       results.tstat = asymp t-stat (last entry is rho)
       results.yhat  = yhat
       results.resid = residuals
       results.sige  = sige = (y-p*W*y-x*b)'*(y-p*W*y-x*b)/n
       results.rsqr  = rsquared
       results.rbar  = rbar-squared
       results.lik   = -log likelihood
       results.nobs  = # of observations
       results.nvar  = # of explanatory variables in x
       results.y     = y data vector
       results.iter  = # of iterations taken
       results.romax = 1/max eigenvalue of W (or rmax if input)
       results.romin = 1/min eigenvalue of W (or rmin if input)
---------------------------------------------------
As in the case of far, we allow the user to provide minimum and maximum values of ρ to use in the search. This may save time in cases where we
wish to restrict our estimate of ρ to the positive range.
The other point to note is that this function also uses the numerical
hessian approach to compute measures of dispersion for large problems involving more than 500 observations.
2.2.2 Applied examples
load ford.dat; % 1st order contiguity matrix stored in sparse matrix form
ii = ford(:,1); jj = ford(:,2); ss = ford(:,3);
n = 3107;
clear ford; % clear ford matrix to save RAM memory
W = sparse(ii,jj,ss,n,n);
clear ii; clear jj; clear ss; % conserve on RAM memory
vnames = strvcat('voters','const','educ','homeowners','income');
to = clock;
res = sar(y,x,W);
etime(clock,to)
prt(res,vnames);
We see from the results that all of the explanatory variables exhibit a
significant effect on the variable we wished to explain. The results also
indicate that the dependent variable y exhibits strong spatial dependence
even after taking the effect of these variables into account, as the estimate
of ρ on the spatially lagged variable is large and significant.
As an illustration of the bias associated with least-squares estimation
of spatial autoregressive models, we present an example based on a spatial
sample of 88 observations for counties in the state of Ohio. A sample of
average housing values for each of 88 counties in Ohio will be related to
population per square mile, the housing density and unemployment rates in
each county. This regression relationship can be written as:
HOUSEi = α + β1 POPi + β2 HDENSITYi + β3 UNEMPLOYi + εi    (2.13)
load Wmat.data;
% load standardized contiguity matrix
% create contiguity matrix from x-y coordinates
[W1 W2 W3] = xy2cont(xc,yc);
% graphically compare the two
spy(W2,'ok'); hold on; spy(Wmat,'+k');
legend('generated','actual');
Figure 2.3: Sparsity pattern of the generated contiguity matrix compared to the actual contiguity matrix (nz = 232).
which divides every element in the column vector ohio2(:,5) containing total
households in each county by every element in the column vector ohio1(:,2),
which contains the population for every county. This produces the number of
households per capita for each county as an explanatory variable measuring
household density.
% ----- Example 2.6 Least-squares bias
%       demonstrated with Ohio county data base
load ohio1.dat; % 88 counties (observations)
% 10 columns
% col1 area in square miles
% col2 total population
% col3 population per square mile
% col4 black population
% col5 blacks as a percentage of population
% col6 number of hospitals
% col7 total crimes
% col8 crime rate per capita
% col9 population that are high school graduates
% col10 population that are college graduates
load ohio2.dat; % 88 counties
% 10 columns
% col1 income per capita
% col2 average family income
% col3 families in poverty
% col4 percent of families in poverty
% col5 total number of households
% col6 average housing value
% col7 unemployment rate
% col8 total manufacturing employment
% col9 manufacturing employment as a percent of total
% col10 total employment
load ohio.xy; % latitude-longitude coordinates of county centroids
[junk W junk2] = xy2cont(ohio(:,1),ohio(:,2)); % make W-matrix
y = log(ohio2(:,6)); n = length(y);
x = [ ones(n,1) ohio1(:,3) ohio2(:,5)./ohio1(:,2) ohio2(:,7) ];
vnames = strvcat('hvalue','constant','popsqm','housedensity','urate');
res = ols(y,x); prt(res,vnames);
res = sar(y,x,W); prt(res,vnames);
The results from these two regressions are shown below. The first point
to note is that the spatial autocorrelation coefficient estimate for the SAR
model is statistically significant, indicating the presence of spatial autocorrelation in the regression relationship. Least-squares ignores this type of
variation producing estimates that lead us to conclude that all three explanatory variables are significant in explaining housing values across the 88
county sample. In contrast, the SAR model does not support this conclusion for all of the explanatory variables once the spatial dependence is taken into account.
A second point is that taking the spatial variation into account improves
the fit of the model, raising the R-squared statistic for the SAR model.
Finally, the magnitudes of the OLS parameter estimates indicate that house
values are more sensitive to the household density and the unemployment
rate variables than the SAR model. For example, the OLS estimates imply
that a one percentage point increase in the unemployment rate leads to
a decrease of 6.8 percent in house values whereas the SAR model places
this at 5.5 percent. Similarly, the OLS estimate for household density is
considerably larger in magnitude than that from the SAR model.
2.3 The spatial errors model
Here we turn attention to the spatial errors model shown in (2.14), where
the disturbances exhibit spatial dependence. Anselin (1988) provides a maximum likelihood method for this model which we label SEM here.
y = Xβ + u    (2.14)
u = λWu + ε
ε ~ N(0, σ²In)
Here y contains an nx1 vector of dependent variables and X represents the usual
nxk data matrix containing explanatory variables. W is a known spatial
weight matrix and the parameter λ is a coefficient on the spatially correlated
errors, analogous to the serial correlation problem in time series models. The
parameters β reflect the influence of the explanatory variables on variation
in the dependent variable y.
We introduce a number of statistical tests that can be used to detect
the presence of spatial autocorrelation in the residuals from a least-squares
model. Use of these tests will be illustrated in the next section.
The first test for spatial dependence in the disturbances of a regression
model is called Moran's I-statistic. Given that this test indicates spatial
correlation in the least-squares residuals, the SEM model would be an appropriate way to proceed.
Moran's I-statistic takes two forms depending on whether the spatial
weight matrix W is standardized or not.
1. W not standardized:
I = (n/s)[e′We]/e′e    (2.15)
2. W standardized:
I = e′We/e′e    (2.16)
where e represents the regression residuals. Cliff and Ord (1972, 1973, 1981)
show that the asymptotic distribution for Moran's I based on least-squares
residuals corresponds to a standard normal distribution after adjusting the
I-statistic by subtracting the mean and dividing by the standard deviation
of the statistic. The adjustment takes two forms depending on whether W
is standardized or not (Anselin, 1988, page 102).
1. W not standardized, where M = (I − X(X′X)⁻¹X′) and tr denotes the trace operator:
E(I) = (n/s)tr(MW)/(n − k)
V(I) = (n/s)²[tr(MWMW′) + tr(MW)² + (tr(MW))²]/d − E(I)²
d = (n − k)(n − k + 2)
ZI = [I − E(I)]/V(I)^(1/2)    (2.17)
2. W standardized:
E(I) = tr(MW)/(n − k)
V(I) = [tr(MWMW′) + tr(MW)² + (tr(MW))²]/d − E(I)²
d = (n − k)(n − k + 2)
ZI = [I − E(I)]/V(I)^(1/2)    (2.18)
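As a worked illustration of (2.16) and (2.18), the sketch below computes Moran's I and its standardized z-value from least-squares residuals for a small problem with a standardized weight matrix. The library's moran function packages this calculation, so the code is purely illustrative and assumes y, x and a dense W of modest size are already in memory.

% ----- Sketch: Moran's I for least-squares residuals, standardized W
[n,k] = size(x);
M  = eye(n) - x*((x'*x)\x');          % M = I - X(X'X)^{-1}X'
e  = M*y;                             % least-squares residuals
mi = (e'*W*e)/(e'*e);                 % Moran's I as in (2.16)
EI = trace(M*W)/(n-k);                % E(I) from (2.18)
d  = (n-k)*(n-k+2);
VI = (trace(M*W*M*W') + trace((M*W)^2) + trace(M*W)^2)/d - EI^2;
zI = (mi - EI)/sqrt(VI);              % standard normal under the null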
A number of other asymptotic approaches exist for testing whether spatial correlation is present in the residuals from a least-squares regression
model. Some of these are the likelihood ratio test, the Wald test and a
Lagrange multiplier test, all of which are based on maximum likelihood estimation of the SEM model.
The likelihood ratio test is based on the difference between the log likelihood from the SEM model and the log likelihood from a least-squares
regression. This quantity represents a statistic that is distributed χ²(1). A
function lratios carries out this test and returns a results structure which
can be passed to the prt function for presentation of the results. Documentation for the function is:
PURPOSE: computes likelihood ratio test for spatial
         correlation in the errors of a regression model
---------------------------------------------------
USAGE: result = lratios(y,x,W)
   or: result = lratios(y,x,W,sem_result);
where: y = dependent variable vector
       x = independent variables matrix
       W = contiguity matrix (standardized or unstandardized)
       sem_result = a results structure from sem()
---------------------------------------------------
RETURNS: a structure variable
       result.meth   = lratios
       result.lratio = likelihood ratio statistic
       result.chi1   = 6.635
       result.prob   = marginal probability
       result.nobs   = # of observations
       result.nvar   = # of variables in x-matrix
---------------------------------------------------
NOTES: lratio > 6.635, => small prob,
       => reject HO: of no spatial correlation
       calling the function with a results structure from sem()
       can save time for large models that have already been estimated
---------------------------------------------------
Note that we allow the user to supply a results structure variable from
the sem estimation function, which would save the time needed to estimate the SEM model if the model has already been estimated. This could
represent a considerable savings for large problems.
Another approach is based on a Wald test for residual spatial autocorrelation. This test statistic, shown in (2.19), is distributed χ²(1) (Anselin,
1988, page 104).
Wλ = λ²[t2 + t3 − (1/n)t1²] ~ χ²(1)    (2.19)
t1 = tr(W·B⁻¹)
t2 = tr(WB⁻¹)²
t3 = tr(WB⁻¹)′(WB⁻¹)
where B = (In − λW). A fourth approach is a Lagrange Multiplier test based on the least-squares residuals, shown in (2.20), which is also distributed χ²(1):
LM = (1/T)[e′We/σ²]² ~ χ²(1)    (2.20)
T = tr[(W + W′)·W]
Finally, a test based on the residuals from the SAR model can be used
to examine whether inclusion of the spatial lag term eliminates spatial dependence in the residuals of the model. This test differs from the four tests
outlined above in that we allow for the presence of the spatially lagged variable Cy in the model. The test for spatial dependence is conditional on
having a parameter ρ not equal to zero in the model, rather than relying on
least-squares residuals as in the case of the other four tests.
One could view this test as based on the following model:
y = ρCy + Xβ + u    (2.21)
u = λWu + ε
ε ~ N(0, σ²In)
Here the focus of the test is on whether the parameter λ = 0. The test
statistic is a Lagrange Multiplier statistic based on (Anselin, 1988,
page 106):
(e′We/σ²)²[T22 − (T21)²·var(ρ)]⁻¹ ~ χ²(1)    (2.22)
T22 = tr(W·W + W′W)
T21 = tr(W·CA⁻¹ + W′CA⁻¹)
where W is the spatial weight matrix shown in (2.21), A = (In − ρC) and
var(ρ) is the maximum likelihood estimate of the variance of the parameter ρ
in the model.
We have implemented this test in a MATLAB function lmsar with the
documentation for the function shown below.
PURPOSE: LM statistic for spatial correlation in the
         residuals of a spatial autoregressive model
---------------------------------------------------
USAGE: result = lmsar(y,x,W1,W2)
where: y  = dependent variable vector
       x  = independent variables matrix
       W1 = contiguity matrix for rho
       W2 = contiguity matrix for lambda
---------------------------------------------------
RETURNS: a structure variable
       result.meth = lmsar
       result.lm   = LM statistic
       result.prob = marginal probability
       result.chi1 = 6.635
       result.nobs = # of observations
       result.nvar = # of variables
---------------------------------------------------
NOTE: lm > 6.635, => small prob,
      => reject HO: of no spatial correlation
---------------------------------------------------
See also: walds, lratios, moran, lmerrors
---------------------------------------------------
It should be noted that a host of other methods to test for spatial dependence in various modeling situations have been proposed. In addition,
the small sample properties of many alternative tests have been compared
in Anselin and Florax (1994) and Anselin and Rey (1991). One point to
consider is that many of the matrix computations required for these tests
cannot be carried out with very large data samples. We discuss this issue
and suggest alternative approaches in the applied examples of Section 2.3.2.
2.3.1 The sem function
2.3.2 Applied examples
Note that we have provided code in the prt function to produce a nicely
formatted printout of the test results from our spatial correlation testing
functions. From the results printed below, we see that the least-squares
residuals exhibit spatial correlation. We infer this from the small marginal
probabilities that indicate significance at the 99% level of confidence. With
regard to the LM error test for spatial correlation in the residuals of the
SAR model implemented in the function lmsar, we see from the marginal
probability of 0.565 that we cannot reject the null hypothesis of no spatial
dependence in the residuals from this model.
Moran I-test for spatial correlation in residuals
Moran I                    0.23610178
Moran I-statistic          2.95890622
Marginal Probability       0.00500909
mean                      -0.03329718
standard deviation         0.09104680

LM error tests for spatial correlation in residuals
LM value                   5.74566426
Marginal Probability       0.01652940
chi(1) .01 value          17.61100000

LR tests for spatial correlation in residuals
LR value                   8.01911539
Marginal Probability       0.00462862
chi-squared(1) value       6.63500000

Wald test for spatial correlation in residuals
Wald value                14.72873758
Marginal Probability       0.00012414
chi(1) .01 value           6.63500000
LM error tests for spatial correlation in SAR model residuals
LM value                   0.302236
Marginal Probability       0.562233
chi(1) .01 value           6.63500000
R-squared       =    0.6570
Rbar-squared    =    0.6566
sigma^2         =    0.0040
log-likelihood  =    5063.9904
Nobs, Nvars     =    3107,     4
# iterations    =    10
min and max lam =   -1.0710,   1.0000
***************************************************************
Variable           Coefficient     t-statistic     t-probability
const                 1.216656       35.780274          0.000000
educ                  0.192118       14.564908          0.000000
homeowners            0.250041       29.083714          0.000000
income               -0.117625       -9.560005          0.000000
lambda                0.659193       43.121763          0.000000
% estimates from optimization approach using semo() function
Spatial error Model Estimates
Dependent Variable =       voters
R-squared       =    0.6570
Rbar-squared    =    0.6566
sigma^2         =    0.0040
log-likelihood  =    5063.9904
Nobs, Nvars     =    3107,     4
# iterations    =    6
min and max lam =   -1.0710,   1.0000
***************************************************************
Variable           Coefficient     t-statistic     t-probability
const                 1.216673       35.780840          0.000000
educ                  0.192125       14.565561          0.000000
homeowners            0.250036       29.083079          0.000000
income               -0.117633       -9.560617          0.000000
lambda                0.659176       43.119452          0.000000
The estimates from this model indicate that after taking into account
the influence of the explanatory variables, we still have spatial correlation in
the residuals of the model that can be modeled successfully with the SEM
model. As a confirmation of this, consider that the LR test implemented
with the function lratios produced the results shown below:
LR tests for spatial correlation in residuals
LR value                1163.01773404
Marginal Probability       0.00000000
chi-squared(1) value       6.63500000
2.4 The general spatial model
A general version of the spatial model includes both the spatial lagged term
as well as a spatially correlated error structure as shown in (2.23).
y = ρW1y + Xβ + u    (2.23)
u = λW2u + ε
ε ~ N(0, σ²In)
One point to note about this model is that W1 can equal W2, but there
may be identification problems in this case. The log likelihood for this
model can be maximized using our general optimization algorithm on a
concentrated version of the likelihood function. The parameters β and σ²
are concentrated out of the likelihood function, leaving the parameters ρ
and λ. This eliminates the ability to use the univariate simplex optimization
algorithm fmin that we used with the other spatial autoregressive models.
We can still produce a sparse matrix algorithm for the log likelihood
function and proceed in a similar fashion to that used for the other spatial
autoregressive models. One difference is that we cannot easily impose restrictions on the parameters ρ and λ to force them to lie within the ranges
defined by the maximum and minimum eigenvalues from their associated
weight matrices W1 and W2.
When might one rely on this model? If there were evidence that spatial dependence existed in the error structure from a spatial autoregressive
(SAR) model, the SAC model is an appropriate approach to modeling this
type of dependence in the errors. Recall, we can use the LM-test implemented in the function lmsar to see if spatial dependence exists in the
residuals of an SAR model.
Another place where one might rely on this model is a case where a
second-order spatial contiguity matrix is used for W2 that corresponds to
a first-order contiguity matrix W1. This type of model would express the belief that the disturbance structure involves higher-order spatial dependence,
perhaps due to second-round effects of the spatial phenomenon being modeled.
A third example of using matrices W1 and W2 might be where W1 represents a first-order contiguity matrix and W2 is constructed as a diagonal
matrix measuring the distance from the central city. This type of configuration of the spatial weight matrices would indicate a belief that contiguity
alone does not suffice to capture the spatial effects at work; the distance
from the central city might also represent an important factor in the phenomenon we are modeling. This raises an identification issue: should we use
the distance weighting matrix in place of W1 and the first-order contiguity
matrix for W2, or rely on the opposite configuration? Of course, comparing likelihood function values along with the statistical significance of the
parameters ρ and λ from models estimated using both configurations might
point to a clear answer.
The log likelihood function for this model is:
L = C − (n/2)ln(σ²) + ln|A| + ln|B| − (1/2σ²)(e′B′Be)    (2.24)
e = (Ay − Xβ)
A = (In − ρW1)
B = (In − λW2)
We concentrate the function using the following expressions for β and σ²:
β = (X′B′BX)⁻¹(X′B′BAy)    (2.25)
e = B(Ay − Xβ)
σ² = (e′e)/n
Given the expressions in (2.25), we can evaluate the log likelihood for given
values of ρ and λ. The values of the other parameters β and σ² can then be
calculated as a function of the ρ, λ parameters and the sample data in y, X.
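To make the concentration concrete, the sketch below evaluates the log likelihood at trial values rho and lam using the expressions in (2.24) and (2.25). It assumes sparse weight matrices W1 and W2 and data y, x of order n are in memory, and is an illustration rather than the internal code of the sac function.

% ----- Sketch: evaluating the concentrated SAC log-likelihood at (rho, lam)
A  = speye(n) - rho*W1;
B  = speye(n) - lam*W2;
Bx = B*x;
By = B*(A*y);
beta = (Bx'*Bx)\(Bx'*By);             % beta from (2.25)
e    = By - Bx*beta;                  % e = B(Ay - X*beta)
sige = (e'*e)/n;                      % sigma^2 from (2.25)
[L1,U1] = lu(A); lndetA = sum(log(abs(diag(U1))));
[L2,U2] = lu(B); lndetB = sum(log(abs(diag(U2))));
logl = -(n/2)*log(sige) + lndetA + lndetB - (e'*e)/(2*sige); % (2.24) up to a constant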
2.4.1 The sac function
Documentation for the MATLAB function sac that carries out the nonlinear optimization of the log likelihood function for this model is shown
below. There are a number of things to note about this function. First, we
provide optimization options for the user in the form of a structure variable
info. These options allow the user to control some aspects of the maxlik
2.4.2 Applied examples
Our first example illustrates the use of the general spatial model with the
Anselin Columbus neighborhood crime data set. We construct a matrix W2
for use in the model based on W2 = W′W. To provide an illustration of
what this matrix inner product involving the first-order contiguity matrix
W represents in terms of spatial structure, we use the spy function to produce comparative graphs of the two contiguity matrices that are shown in
Figure 2.4. As we can see from the graphs, the inner product allows for a
more wide-ranging set of spatial influences reflecting secondary influences
not captured by the first-order contiguity matrix. The sparse matrix literature refers to this phenomenon, where matrix products produce less sparsity, as
fill-in. The motivation for this terminology should be clear from the figure.
For another approach to producing spatial lag matrices, see the function
slag, which represents a more appropriate way to create higher-order spatial
contiguity matrices.
Our example illustrates the point discussed earlier regarding model specification with respect to the use of W and W2 by producing estimates for
two models based on alternative configurations of these two spatial weight
matrices. We also produce estimates based on a model that uses the W
matrix for both the autoregressive lag and the autoregressive error terms.
% ----- Example 2.9 Using the sac function
load Wmat.dat;
% standardized 1st-order contiguity matrix
load anselin.dat; % load Anselin (1988) Columbus neighborhood crime data
y = anselin(:,1); nobs = length(y);
x = [ones(nobs,1) anselin(:,2:3)];
W = Wmat;
vnames = strvcat('crime','const','income','house value');
W2 = W*W;
Figure 2.4: First-order contiguity structure of W compared with the contiguity structure of W2.
subplot(2,1,1), spy(W);
xlabel('First-order contiguity structure');
subplot(2,1,2), spy(W2);
xlabel('contiguity structure of W2');
res1 = sac(y,x,W2,W);% general spatial model W2,W
prt(res1,vnames);
% print the output
res2 = sac(y,x,W,W2);% general spatial model W,W2
prt(res2,vnames);
% print the output
res3 = sac(y,x,W,W); % general spatial model W,W
prt(res3,vnames);
% print the output
The estimation results are shown below for all three versions of the
model. The first model produced the worst fit as indicated by the lower R²
value, but a slightly higher log likelihood function value. In addition, we see
that both the coefficients on ρ and λ are statistically significant, as indicated
by the t-statistics and associated marginal probabilities. The negative spatial lag parameter ρ is problematical, as this type of influence is difficult to
explain intuitively. Negative spatial lags would indicate that higher crime
rates in neighboring regions exerted a negative influence on crime on average
across the entire Columbus neighborhood sample. When considered from
an overall and limiting case viewpoint, this seems a logical impossibility.
General Spatial Model Estimates
Dependent Variable =        crime
R-squared      =    0.5591
Rbar-squared   =    0.5400
sigma^2        =  120.9027
log-likelihood =  -164.88367
Nobs, Nvars    =    49,     3
# iterations   =    8
***************************************************************
Variable           Coefficient     t-statistic     t-probability
const                37.261470        5.472898          0.000002
income               -0.958005       -2.573394          0.013360
house value          -0.219046       -2.125609          0.038937
rho                  -0.713070       -2.070543          0.044041
lambda                0.571698       18.759773          0.000000
General Spatial Model Estimates
Dependent Variable =        crime
R-squared      =    0.6528
Rbar-squared   =    0.6377
sigma^2        =   95.2216
log-likelihood =  -166.12487
Nobs, Nvars    =    49,     3
# iterations   =    6
***************************************************************
Variable           Coefficient     t-statistic     t-probability
const                55.343773        4.628119          0.000030
income               -0.672762       -2.163164          0.035760
house value          -0.320741       -3.552861          0.000894
rho                   0.512647        3.747903          0.000497
lambda                0.064988        0.829892          0.410886
The second model produced the typical positive coefficient on the spatial lag term along with an insignificant estimate for λ. As noted above,
it also produced a better fit to the sample data. Confirmation of these results from the second model is provided by the test for spatial dependence
in the disturbances of the SAR model that we carried out in example 2.7,
which is repeated below. From these results, we see that there is no evidence of spatial dependence in the disturbances of the SAR model. This
explains why the λ parameter in the general spatial version of this model is
not significantly different from zero.
The third model, which relies on the same W matrix for both the spatial
lag and error correlation terms, produced estimates for both ρ and λ that
were insignificant. Since we have statistical evidence of spatial correlation
in the residuals from least-squares as well as the statistical significance of
the ρ parameter in the SAR model, we would reject this model as inferior
to the SAR model results.
By way of summary, I would reject the specification associated with the
first model since it produces counter-intuitive results. An important point to
note regarding any non-linear optimization problem such as that involved
in the SAC model is that the estimates may not reflect a global optimum.
A few different solutions of the optimization problem based on alternative
starting values are usually undertaken to confirm that the estimates do indeed
represent global solutions to the problem. The function sac allows the user
to input alternative starting values, making this relatively easy to do.
Given that we accept the second model as best, and because the λ parameter
is not statistically significant, we would conclude that an SAR model would
suffice for this sample data.
LM error tests for spatial correlation in SAR model residuals
LM value                   0.302236
Marginal Probability       0.562233
chi(1) .01 value           6.63500000
A final example uses the large Pace and Barry data set to illustrate the
sac function in operation on large problems. Example 2.10 turns on the
printing flag so we can observe intermediate results from the optimization
algorithm as it proceeds.
% ----- Example 2.10 Using the sac() function on a large data set
load elect.dat;
% load data on votes in 3,107 counties
y = (elect(:,7)./elect(:,8));
% convert to per capita variables
x1 = log(elect(:,9)./elect(:,8)); % education
x2 = log(elect(:,10)./elect(:,8));% homeownership
x3 = log(elect(:,11)./elect(:,8));% income
n = length(y); x = [ones(n,1) x1 x2 x3];
clear x1; clear x2; clear x3;
clear elect;
% conserve on RAM memory
load ford.dat; % 1st order contiguity matrix stored in sparse matrix form
ii = ford(:,1); jj = ford(:,2); ss = ford(:,3);
n = 3107;
clear ford; % clear ford matrix to save RAM memory
W = sparse(ii,jj,ss,n,n);
clear ii; clear jj; clear ss; % conserve on RAM memory
vnames = strvcat('voters','const','educ','homeowners','income');
to = clock; info.pflag = 1;
res = sac(y,x,W,W*W,info);
etime(clock,to)
prt(res,vnames);
The results including intermediate results that were printed are shown
below. It took 602 seconds to solve this problem involving only 6 iterations
by the maxlik function. This function tends to be faster than the alternative
optimization algorithms available in the Econometrics Toolbox.
==== Iteration ==== 2
log-likelihood   bconvergence   fconvergence
   -5052.2920         1.1386         0.4438
Parameter        Estimates        Gradient
Parameter 1         0.6329        -479.9615
Parameter 2         0.0636       -4927.5720
==== Iteration ==== 3
fconvergence
    0.0031
Parameter        Estimates        Gradient
Parameter 1         0.6644         -18.8294
Parameter 2         0.0185        -793.3583
==== Iteration ==== 4
log-likelihood   bconvergence   fconvergence
   -5068.2786         0.2197         0.0000
Parameter        Estimates        Gradient
Parameter 1         0.6383         -54.3318
Parameter 2         0.0219          44.6226
==== Iteration ==== 5
log-likelihood   bconvergence   fconvergence
   -5068.5622         0.0405         0.0001
Parameter        Estimates        Gradient
Parameter 1         0.6497          47.7892
Parameter 2         0.0223          27.5841
==== Iteration ==== 6
log-likelihood   bconvergence   fconvergence
   -5068.5627         0.0107         0.0000
Parameter        Estimates        Gradient
Parameter 1         0.6500           0.6407
Parameter 2         0.0221          -3.6765
General Spatial Model Estimates
Dependent Variable =       voters
R-squared      =    0.6598
Rbar-squared   =    0.6594
sigma^2        =    0.0040
log-likelihood =    5068.5627
Nobs, Nvars    =    3107,     4
# iterations   =    6
***************************************************************
Variable           Coefficient     t-statistic     t-probability
const                 1.193076       34.732852          0.000000
educ                  0.180548       13.639337          0.000000
homeowners            0.267665       30.881364          0.000000
income               -0.107153       -8.596555          0.000000
rho                   0.649976       39.916262          0.000000
lambda                0.022120        3.011273          0.002623
From the estimation results we see some evidence that a general spatial
2.5 An applied exercise
Belsley, Kuh, and Welsch (1980) used the data to examine the effects of
robust estimation and published the observations in an appendix on pages
244-261. It should be noted that they published data that included various
transformations, so the data in their appendix does not match our data,
which is in the raw untransformed format. Pace (1993), Gilley and Pace
(1996), and Pace and Gilley (1997) have used this data set with spatial
econometric models and longitude-latitude coordinates for the census tracts
have been added to the dataset.
Our regression model will simply relate the median house values to all of
the explanatory variables, simplifying our specification task. We will focus
on alternative spatial specifications and models.
The next task involves some scaling and standardization issues surrounding the data set. Belsley, Kuh and Welsch (1980) used this data set to illustrate numerically ill-conditioned data that contained outliers and influential observations. Poor scaling will adversely impact our numerical hessian
approach to determining the variance-covariance structure of the spatial
autoregressive parameter estimates. Intuitively, the hessian function attempts to compute a numerical derivative by perturbing each parameter in
turn and examining the impact on the likelihood function. If the parameters
vary widely in terms of magnitudes because the data is poorly scaled, this
task will be more difficult and we may calculate negative variances.
Example 2.11 demonstrates the nature of these scaling problems by carrying out a least-squares regression. We see that the coefficient estimates
vary widely in magnitude, so we scale the variables in the model using
a function studentize from the Econometrics Toolbox that subtracts the
means and divides by the standard deviations. Another least-squares regression is then carried out to illustrate the impact of scaling on the model
coefficients.
% ----- Example 2.11 Least-squares on the Boston dataset
load boston.raw; % Harrison-Rubinfeld data
[n k] = size(boston);y = boston(:,k); % median house values
x = [ones(n,1) boston(:,1:k-1)];
% other variables
vnames = strvcat('hprice','constant','crime','zoning','industry', ...
                 'charlesr','noxsq','rooms2','houseage','distance', ...
                 'access','taxrate','pupil/teacher','blackpop','lowclass');
res = ols(y,x); prt(res,vnames);
ys = studentize(y); xs = studentize(x(:,2:k));
res2 = ols(ys,xs);
vnames2 = strvcat('hprice','crime','zoning','industry','charlesr', ...
                  'noxsq','rooms2','houseage','distance','access','taxrate', ...
                  'pupil/teacher','blackpop','lowclass');
prt(res2,vnames2);
% sort actual and predicted by housing values from low to high
yhat = res2.yhat; [ysort yi] = sort(ys); yhats = yhat(yi,1);
tt=1:n; % plot actual vs. predicted
plot(tt,ysort,'ok',tt,yhats,'+k');
ylabel('housing values');
The results indicate that the coefficient estimates based on the unscaled
data vary widely in magnitude from 0.000692 to 36.459, whereas the scaled
variables produce coefficients ranging from 0.0021 to -0.407.
Ordinary Least-squares Estimates (non-scaled variables)
Dependent Variable =       hprice
R-squared      =    0.7406
Rbar-squared   =    0.7338
sigma^2        =   22.5179
Durbin-Watson  =    1.2354
Nobs, Nvars    =    506,    14
***************************************************************
Variable           Coefficient     t-statistic     t-probability
constant             36.459488        7.144074          0.000000
crime                -0.108011       -3.286517          0.001087
zoning                0.046420        3.381576          0.000778
industry              0.020559        0.334310          0.738288
charlesr              2.686734        3.118381          0.001925
noxsq               -17.766611       -4.651257          0.000004
rooms2                3.809865        9.116140          0.000000
houseage              0.000692        0.052402          0.958229
distance             -1.475567       -7.398004          0.000000
access                0.306049        4.612900          0.000005
taxrate              -0.012335       -3.280009          0.001112
pupil/teacher        -0.952747       -7.282511          0.000000
blackpop              0.009312        3.466793          0.000573
lowclass             -0.524758      -10.347146          0.000000
Ordinary Least-squares Estimates (scaled variables)
Dependent Variable =       hprice
R-squared      =    0.7406
Rbar-squared   =    0.7343
sigma^2        =    0.2657
Durbin-Watson  =    1.2354
Nobs, Nvars    =    506,    13
***************************************************************
Variable           Coefficient     t-statistic     t-probability
crime                -0.101017       -3.289855          0.001074
zoning                0.117715        3.385011          0.000769
industry              0.015335        0.334650          0.738032
charlesr              0.074199        3.121548          0.001905
noxsq                -0.223848       -4.655982          0.000004
rooms2                0.291056        9.125400          0.000000
houseage              0.002119        0.052456          0.958187
distance             -0.337836       -7.405518          0.000000
access                0.289749        4.617585          0.000005
taxrate              -0.226032       -3.283341          0.001099
pupil/teacher        -0.224271       -7.289908          0.000000
blackpop              0.092432        3.470314          0.000565
lowclass             -0.407447      -10.357656          0.000000
The program in example 2.11 also produces a plot of the actual versus
predicted values from the model sorted by housing values from low to high.
From this plot (shown in Figure 2.5), we see large predicted errors for the
highest housing values. This suggests a log transformation on the dependent
variable y in the model would be appropriate. The figure also illustrates that
housing values above $50,000 have been censored to a value of $50,000, a
subject we take up in Chapter 5.
Figure 2.5: Actual versus predicted housing values (y-axis: housing values; x-axis: census tract observations, sorted from low to high).
We also carry out a Moran's I test for spatial autocorrelation, which may
or may not work depending on how much RAM memory you have in your
computer. Note that we can directly use the prt function without first
returning the results from moran to a results structure. We rely on our
function xy2cont to generate a spatial contiguity matrix needed by far and
moran to test for spatial autocorrelation.
% ----- Example 2.12 Testing for spatial correlation
load boston.raw; % Harrison-Rubinfeld data
load latitude.data; load longitude.data;
[W1 W W3] = xy2cont(latitude,longitude); % create W-matrix
[n k] = size(boston);y = boston(:,k);
% median house values
x = boston(:,1:k-1);
% other variables
vnames = strvcat('hprice','crime','zoning','industry','charlesr', ...
                 'noxsq','rooms2','houseage','distance','access','taxrate', ...
                 'pupil/teacher','blackpop','lowclass');
ys = studentize(log(y)); xs = studentize(x);
res = ols(ys,xs); prt(res,vnames);
resid = res.resid; % recover residuals
rmin = 0; rmax = 1;
res2 = far(resid,W,rmin,rmax); prt(res2);
prt(moran(ys,xs,W));
The results shown below indicate that we have strong evidence of spatial
autocorrelation in the residuals from the least-squares model. Our FAR
model produced a spatial correlation coefficient estimate of 0.647 with a large
t-statistic, and this indication of spatial autocorrelation in the residuals is
confirmed by the Moran test results.
Ordinary Least-squares Estimates
Dependent Variable =       hprice
R-squared      =    0.7896
Rbar-squared   =    0.7845
sigma^2        =    0.2155
Durbin-Watson  =    1.0926
Nobs, Nvars    =    506,    13
***************************************************************
Variable           Coefficient     t-statistic     t-probability
crime                -0.216146       -7.816230          0.000000
zoning                0.066897        2.136015          0.033170
industry              0.041401        1.003186          0.316263
charlesr              0.062690        2.928450          0.003564
noxsq                -0.220667       -5.096402          0.000000
rooms2                0.156134        5.435516          0.000000
houseage              0.014503        0.398725          0.690269
distance             -0.252873       -6.154893          0.000000
access                0.303919        5.377976          0.000000
taxrate              -0.258015       -4.161602          0.000037
pupil/teacher        -0.202702       -7.316010          0.000000
blackpop              0.092369        3.850718          0.000133
lowclass             -0.507256      -14.318127          0.000000
The results from example 2.13 are presented below. We see that all
three models produced estimates indicating significant spatial autocorrelation. For example, the SAR model produced a coefficient estimate for ρ
equal to 0.4508 with a large t-statistic, and the SEM model produced an
estimate for λ of 0.7576 that was also significant. The SAC model produced
estimates of ρ and λ that were both significant at the 99% level.
Which model is best? The log-likelihood function values are much higher
for the SEM and SAC models, so this would be evidence against the SAR
model. A further test of the SAR model would be to use the function lmsar
that tests for spatial autocorrelation in the residuals of the SAR model. If
we find evidence of residual spatial autocorrelation, it suggests that the SAC
model might be most appropriate. Note that the SAC model exhibits the
best log-likelihood function value.
Spatial autoregressive Model Estimates
Dependent Variable =       hprice
R-squared       =    0.8421
Rbar-squared    =    0.8383
sigma^2         =    0.1576
Nobs, Nvars     =    506,    13
log-likelihood  =  -85.099051
# of iterations =    9
min and max rho =    0.0000,   1.0000
***************************************************************
Variable           Coefficient     t-statistic     t-probability
crime                -0.165349       -6.888522          0.000000
zoning                0.080662        3.009110          0.002754
industry              0.044302        1.255260          0.209979
charlesr              0.017156        0.918665          0.358720
noxsq                -0.129635       -3.433659          0.000646
rooms2                0.160858        6.547560          0.000000
houseage              0.018530        0.595675          0.551666
distance             -0.215249       -6.103520          0.000000
access                0.272237        5.625288          0.000000
taxrate              -0.221229       -4.165999          0.000037
pupil/teacher        -0.102405       -4.088484          0.000051
blackpop              0.077511        3.772044          0.000182
lowclass             -0.337633      -10.149809          0.000000
rho                   0.450871       12.348363          0.000000
Spatial error Model Estimates
Dependent Variable =       hprice
R-squared       =    0.8708
Rbar-squared    =    0.8676
sigma^2         =    0.1290
log-likelihood  =  -58.604971
Nobs, Nvars     =    506,    13
# iterations    =    10
min and max lam =    0.0000,   1.0000
***************************************************************
Variable           Coefficient     t-statistic     t-probability
crime                -0.186710       -8.439402          0.000000
zoning                0.056418        1.820113          0.069348
industry             -0.000172       -0.003579          0.997146
charlesr             -0.014515       -0.678562          0.497734
noxsq                -0.220228       -3.683553          0.000255
rooms2                0.198585        8.325187          0.000000
houseage             -0.065056       -1.744224          0.081743
distance             -0.224595       -3.421361          0.000675
access                0.352244        5.448380          0.000000
taxrate              -0.257567       -4.527055          0.000008
pupil/teacher        -0.122363       -3.839952          0.000139
blackpop              0.129036        4.802657          0.000002
lowclass             -0.380295      -10.625978          0.000000
lambda                0.757669       19.133467          0.000000
The results from the lmsar test shown below indicate the presence of
spatial autocorrelation in the residuals of the SAR model, suggesting that
the SAC model would be appropriate.
LM error tests for spatial correlation in SAR model residuals
LM value                60.37309581
Marginal Probability     0.00000000
chi(1) .01 value         6.63500000
We would conclude that the SAC model is most appropriate here. Regarding inferences, an interesting point is that the Charles River location
The following table reports t-statistics and the time required for the SAR, SEM and SAC models, with (info) and (hess) denoting the information matrix and numerical hessian approaches to the measures of dispersion.

                  SAR(hess)    SEM(info)    SEM(hess)    SAC(info)    SAC(hess)
time              46.1 sec.    79.7 sec.    58.8 sec.   170.1 sec.   114.6 sec.
crime             -6.888522    -8.474183    -8.439402    -8.718561    -8.766862
zoning             3.009110     1.820262     1.820113     3.266408     2.824768
industry           1.255260    -0.003584    -0.003579     0.750766     0.585884
charlesr           0.918665    -0.690453    -0.678562    -0.227615    -0.194727
noxsq             -3.433659    -3.687834    -3.683553    -4.598758    -3.322769
rooms2             6.547560     8.345316     8.325187     8.941594     8.573808
houseage           0.595675    -1.756358    -1.744224    -1.597997    -1.337513
distance          -6.103520    -3.435146    -3.421361    -7.690499    -5.147088
access             5.625288     5.478342     5.448380     6.906236     5.502331
taxrate           -4.165999    -4.527097    -4.527055    -4.985237    -4.533481
pupil/teacher     -4.088484    -3.886484    -3.839952    -4.765931    -3.974717
blackpop           3.772044     4.808276     4.802657     5.971414     4.768082
lowclass         -10.149809   -10.954441   -10.625978   -11.650367   -10.707764
rho/lambda        12.348363    20.890074    19.133467    24.724209     9.519920
lambda (SAC)                                              3.747132     3.059010
The results shown below indicate that the SAC model using a second-order spatial weight matrix produces a slightly lower likelihood function
value than the SAC model in example 2.13, but estimates that are reasonably
similar in all other regards.
General Spatial Model Estimates
Dependent Variable =       hprice
R-squared      =    0.8766
Rbar-squared   =    0.8736
sigma^2        =    0.1231
log-likelihood =  -56.71359
Nobs, Nvars    =    506,    13
# iterations   =    7
***************************************************************
Variable           Coefficient     t-statistic     t-probability
crime                -0.195223       -8.939992          0.000000
zoning                0.085436        2.863017          0.004375
industry              0.018200        0.401430          0.688278
charlesr             -0.008227       -0.399172          0.689939
noxsq                -0.190621       -3.405129          0.000715
rooms2                0.204352        8.727857          0.000000
houseage             -0.056388       -1.551836          0.121343
distance             -0.287894       -5.007506          0.000001
access                0.332325        5.486948          0.000000
taxrate              -0.245156       -4.439223          0.000011
pupil/teacher        -0.109542       -3.609570          0.000338
blackpop              0.127546        4.896422          0.000001
lowclass             -0.363506      -10.368125          0.000000
rho                   0.684424       12.793448          0.000000
lambda                0.208597        2.343469          0.019502
First-order spatial autoregressive model Estimates
R-squared       =    0.0670
sigma^2         =    0.1245
Nobs, Nvars     =    506,     1
log-likelihood  =  -827.5289
# of iterations =    9
min and max rho =    0.0000,   1.0000
***************************************************************
Variable           Coefficient     t-statistic     t-probability
rho                   0.188278        1.420560          0.156062
The residuals from the SAC model in example 2.13 show no significant
second-order spatial autocorrelation, since the FAR model estimate is not
statistically significant.
To further explore this model as an exercise, you might consider replacing
the W2 weight matrix in example 2.14 with a matrix based on distances.
See if this variant of the SAC model is successful. Another exercise would be
to estimate a model using: sac(ys,xs,W2,W), and compare it to the model
in example 2.14.
2.6 Chapter Summary
2.7 References
Anselin, L. 1980. Estimation Methods for Spatial Autoregressive Structures, (New York: Regional Science Dissertation and Monograph Series 8).
Anselin, L. 1988. Spatial Econometrics: Methods and Models, (Dordrecht: Kluwer Academic Publishers).
Anselin, L. and D.A. Griffith. 1988. Do spatial effects really matter
in regression analysis? Papers of the Regional Science Association, 65,
pp. 11-34.
Anselin, L. and R.J.G. Florax. 1994. Small Sample Properties of
Tests for Spatial Dependence in Regression Models: Some Further
Results, Research paper 9414, Regional Research Institute, West Virginia University, Morgantown, West Virginia.
Anselin, L. and S. Rey. 1991. Properties of tests for spatial dependence in linear regression models, Geographical Analysis, Volume 23,
pages 112-31.
Belsley, D. A., E. Kuh, and R. E. Welsch. 1980. Regression Diagnostics: Identifying Influential Data and Source of Collinearity, (John
Wiley: New York).
Cliff, A. and J. Ord, 1972. Testing for spatial autocorrelation among
regression residuals, Geographical Analysis, Vol. 4, pp. 267-84.
Cliff, A. and J. Ord, 1973. Spatial Autocorrelation, (London: Pion)
Cliff, A. and J. Ord, 1981. Spatial Processes, Models and Applications,
(London: Pion)
Gilley, O.W., and R. Kelley Pace. 1996. On the Harrison and Rubinfeld Data, Journal of Environmental Economics and Management,
Vol. 31 pp. 403-405.
Pace, R. K. and R. Barry. 1997. Quick Computation of Spatial
Autoregressive Estimators, forthcoming in Geographical Analysis.
Pace, R. Kelley. 1993. Nonparametric Methods with Applications to
Hedonic Models, Journal of Real Estate Finance and Economics Vol.
7, pp. 185-204.
Pace, R. Kelley, and O.W. Gilley. 1997. Using the Spatial Configuration of the Data to Improve Estimation, Journal of the Real Estate
Finance and Economics Vol. 14 pp. 333-340.
Pace, R. Kelley, and R. Barry. 1998. Simulating mixed regressive
spatially autoregressive estimators, Computational Statistics, Vol. 13
pp. 397-418.
Ripley, Brian D. 1988. Statistical Inference for Spatial Processes,
(Cambridge University Press: Cambridge, U.K.).
Chapter 3
Bayesian spatial autoregressive models
This chapter discusses spatial autoregressive models from a Bayesian perspective. It is well-known that Bayesian regression methods implemented
with diffuse prior information can replicate maximum likelihood estimation
results. We demonstrate this type of application, but focus on some extensions that are available with the Bayesian approach. The maximum likelihood estimation methods set forth in the previous chapter are based on the
presumption that the underlying disturbance process involved in generating
the model is normally distributed.
Further, many of the formal tests for spatial dependence and heterogeneity including those introduced in the previous chapter rely on characteristics
of quadratic forms for normal variates in order to derive the asymptotic distribution of the test statistics.
There is a history of Bayesian literature that deals with heteroscedastic and leptokurtic disturbances, treating these two phenomena in a similar
fashion. Geweke (1993) points out that a non-Bayesian regression methodology introduced by Lange, Little and Taylor (1989), which assumes an independent Student-t distribution for the regression disturbances, is identical
to a Bayesian heteroscedastic linear regression model he introduces. Lange,
Little and Taylor (1989) show that their regression model provides robust
results in a wide range of applied data settings.
Geweke (1993) argues the same for his method and makes a connection to
the Bayesian treatment of symmetric leptokurtic disturbance distributions
through the use of scale mixtures of normal distributions. We adopt the
approach of Geweke (1993) in order to extend the spatial autoregressive
models introduced in the previous chapter. The extended model is shown in (3.1):
y = ρW1y + Xβ + u    (3.1)
u = λW2u + ε
ε ~ N(0, σ²V)
V = diag(v1, v2, . . . , vn)
The change made to the basic model is in the assumption regarding
the disturbances ε. We assume that they exhibit non-constant variance,
taking on different values for every observation. The magnitudes vi, i =
1, . . . , n represent parameters to be estimated. This assumption of inherent
spatial heterogeneity seems more appropriate than the traditional Gauss-Markov assumption that the variance of the disturbance terms is constant
over space.
The first section of this chapter introduces a Bayesian heteroscedastic
regression model and the topic of Gibbs sampling estimation without complications introduced by the spatial autoregressive model. The next section
applies these ideas to the simple FAR model and implements a Gibbs sampling estimation procedure for this model. Following sections deal with the
other spatial autoregressive models that we introduced in the previous chapter.
3.1 The Bayesian heteroscedastic regression model
y = Xβ + ε    (3.2)
ε ~ N(0, σ²V),  V = diag(v1, v2, . . . , vn)
β ~ N(c, T)
p(σ) ∝ (1/σ)
r/vi ~ iid χ²(r)/r
r ~ Γ(m, k)
The conditional posterior for β given σ and V is multivariate normal:
β | (σ, V) ~ N[H(X′V⁻¹y + R′T⁻¹c), σ²H]    (3.3)
H = (X′V⁻¹X + R′T⁻¹R)⁻¹
Note that this is quite analogous to a generalized least-squares (GLS) version of the Theil and Goldberger (1961) estimation formulas, known as
mixed estimation.
The conditional posterior for σ given β and V takes the form of a chi-squared distribution:
Σi (ei²/vi) / σ² | (β, V) ~ χ²(n)    (3.4)
where we let ei = yi − xi′β. This result parallels the simple regression case
where we know that the residuals are χ² distributed. A difference from the
standard case is that we adjust the ei using the relative variance terms vi.
Finally, the conditional distribution for the parameters V represents a χ²
distribution with r + 1 degrees of freedom (see Geweke, 1993):
[(σ⁻²ei² + r)/vi] | (β, σ) ~ χ²(r + 1)    (3.5)
(3.5)
The Gibbs sampler provides a way to sample from a multivariate posterior probability density of the type we encounter in this estimation problem
based only on the densities of subsets of vectors conditional on all others. In
other words, we can use the conditional distributions set forth above to produce estimates for our model, despite the fact that the posterior distribution
was not tractable.
The Gibbs sampling approach that we will use throughout this chapter
to estimate Bayesian variants of the spatial autoregressive models is based
on this simple idea. We specify the conditional distributions for all of the
parameters in the model and proceed to carry out random draws from these
distributions until we collect a large sample of parameter draws. Gelfand
and Smith (1990) demonstrate that Gibbs sampling from the sequence of
complete conditional distributions for all parameters in the model produces
a set of draws that converge in the limit to the true (joint) posterior distribution of the parameters. That is, despite the use of conditional distributions
in our sampling scheme, a large sample of the draws can be used to produce
valid posterior inferences about the mean and moments of the multivariate
posterior parameter distribution.
The method is most easily described by developing and implementing a
Gibbs sampler for our heteroscedastic Bayesian regression model. Given the
three conditional posterior densities in (3.3) through (3.5), we can formulate
a Gibbs sampler for this model using the following steps:
1. Begin with arbitrary values for the parameters σ⁰, β⁰ and vi⁰, which we
designate with the superscript 0.
We rely on the MATLAB functions norm_rnd and chis_rnd to provide the
multivariate normal and chi-squared random draws. These functions are
part of the Econometrics Toolbox and are discussed in Chapter 8 of the
manual. Note also that we omit the first 100 draws at start-up to allow the
Gibbs sampler to achieve a steady state before we begin sampling for the
parameter distributions.
The results are shown below, where we find that it took only 11.5 seconds
to carry out the 1100 draws and produce a sample of 1000 draws on which
we can base our posterior inferences regarding the parameters β and σ.
For comparison purposes, we produced estimates using the theil function
from the Econometrics Toolbox that implements mixed estimation.
Figure 3.1 shows the mean of the 1,000 draws for the parameters vi
plotted for the 100 observation sample. Recall that the last 50 observations
contained a time-trend generated pattern of non-constant variance. This
pattern was detected quite accurately by the estimated vi terms.
One point that should be noted about Gibbs sampling estimation is
that convergence of the sampler needs to be diagnosed by the user. The
Econometrics Toolbox provides a set of convergence diagnostic functions
along with illustrations of their use in Chapter 5 of the manual. Fortunately,
for simple regression models (and spatial autoregressive models) convergence
of the sampler is usually a certainty, and convergence occurs quite rapidly.
A simple approach to testing for convergence is to run the sampler once
to carry out a small number of draws, say 300 to 500, and a second time
to carry out a larger number of draws, say 1000 to 2000. If the means and
variances for the posterior estimates are similar from both runs, convergence
seems assured.
Figure 3.1: Mean of the vi draws across the 100 observations.
3.2 The Bayesian FAR model
y = ρWy + ε    (3.6)
ε ~ N(0, σ²V),  V = diag(v1, v2, . . . , vn)
ρ ~ N(c, T)
r/vi ~ iid χ²(r)/r
r ~ Γ(m, k)
σ ~ Γ(ν0, d0)
As in Chapter 2, the spatial contiguity matrix W has been standardized to have row sums of unity and the variable vector y is expressed
in deviations from the means to eliminate the constant term in the model.
We allow for an informative prior on the spatial autoregressive parameter
ρ, the heteroscedastic control parameter r and the disturbance variance σ.
This is the most general Bayesian model, but practitioners would probably
implement diffuse priors for ρ and σ.
A diffuse prior for ρ would be implemented by setting the prior mean
c to zero and using a large prior variance for T, say 1e+12. To implement
a diffuse prior for σ we would set ν0 = 0, d0 = 0. The prior for r is based
on a Γ(m, k) distribution which has a mean equal to m/k and a variance
equal to m/k². Recall our discussion of the role of the prior hyperparameter
r in allowing the vi estimates to deviate from their prior means of unity.
Small values for r around 2 to 7 allow for non-constant variance and are
associated with a prior belief that outliers or non-constant variance exist.
Large values such as r = 20 or r = 50 would produce vi estimates that are all
close to unity, forcing the model to take on a homoscedastic character and
produce estimates equivalent to those from the maximum likelihood FAR
model discussed in Chapter 2. This would make little sense if we wished
to produce maximum likelihood estimates, it would be much quicker to use
the far function from Chapter 2. In the heteroscedastic regression model
demonstrated in example 3.1, we set r = 4 to allow ample opportunity for
the vi parameters to deviate from unity. Note that in example 3.1, the vi
estimates for the first 50 observations were all close to unity despite our
prior setting of r = 4. We will provide examples that suggests an optimal
strategy for setting r is to use small values in the range from 2 to 7. If the
sample data exhibits homoscedastic disturbances that are free from outliers,
the vi estimates will reflect this fact. On the other hand, if there is evidence
of heterogeneity in the errors, these settings for the hyperparameter r will
allow the vi estimates to deviate substantially from unity. Estimates for the
vi parameters that deviate from unity are needed to produce an adjustment
in the estimated and that take non-constant variance into account or
robustify our estimates in the presence of outliers.
The kernel of the posterior for the parameters ρ and σ takes the form:
p(ρ, σ | y) ∝ |In − ρW| σ^-(n+1) exp{−(1/2σ²)(y − ρWy)′(y − ρWy)}    (3.8)
If we treat ρ as known, the kernel for the conditional posterior (that part
of the distribution that ignores inessential constants) for σ given ρ takes the
form:
p(σ | ρ, y) ∝ σ^-(n+1) exp{−(1/2σ²) ε′ε}    (3.9)
where ε = y − ρWy. It is important to note that by conditioning on ρ
(treating it as known) we can subsume the determinant, |In − ρW|, as part
of the constant of proportionality, leaving us with one of the standard distributional forms. From (3.9) we conclude that ε′ε/σ² ~ χ²(n).
Unfortunately, the conditional distribution of ρ given σ takes the following non-standard form:
p(ρ | σ, y) ∝ |In − ρW| σ^-(n+1) exp{−(1/2σ²)(y − ρWy)′(y − ρWy)}    (3.10)
Figure 3.2: The conditional distribution of ρ compared to a normal distribution and a t-distribution with 3 degrees of freedom.
Another point to note about the example is that we adjust the parameter
c used to produce the random normal candidate values. This is done by
using the initial nomit=100 values of rho to compute a new value for c
based on two standard deviations of the initial draws; a sketch of this adjustment is shown below.
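The snippet that performs this adjustment is not reproduced here, but it might look something like the following sketch, where rsave holds the draws collected so far and c scales the random-walk normal candidates; the variable names are illustrative rather than those used inside the library function.

% ----- Sketch: tuning the candidate spread after the burn-in draws
if iter == nomit                       % once the first nomit draws are in hand
   c = 2*std(rsave(1:nomit,1));        % two standard deviations of the initial rho draws
end
rcand = rho + c*randn(1,1);            % normal random-walk candidate for the next draw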
Consider also that we delay collecting our sample of draws for the parameters ρ and σ until we have executed nomit burn-in draws, which is
100 in this case. This allows the sampler to settle into a steady state, which
might be required if poor values of ρ and σ were used to initialize the sampler. In theory, any arbitrary values can be used, but a choice of good
values will speed up convergence of the sampler. A plot of the first 100
values drawn from this example is shown in Figure 3.3. We used ρ = 0.5
and σ² = 100 as starting values despite our knowledge that the true values
were ρ = 0.7 and σ² = 1. The plots of the first 100 values indicate that even
if we start with very poor values, far from the true values used to generate
the data, only a few iterations are required to reach a steady state. This is
usually true for regression-based Gibbs samplers.
The function c_rho evaluates the conditional distribution of ρ given σ²
at any value of ρ. Of course, we could use sparse matrix algorithms in this
function to handle large data sample problems, which is the way we approach
this task when constructing the function far_g for the spatial econometrics
library.
function yout = c_rho(rho,sige,y,W)
% evaluates conditional distribution of rho
% given sige for the spatial autoregressive model
n = length(y);
IN = eye(n); B = IN - rho*W;
detval = log(det(B));
epe = y'*B'*B*y;
yout = (n/2)*log(sige) + (n/2)*log(epe) - detval;
Finally, we present results from executing the code shown in example 3.2,
where both Gibbs estimates based on the mean of the 1,000 draws for ρ and
σ as well as the standard deviations are shown. For contrast, we present maximum likelihood estimates, which for the case of the homoscedastic Gibbs
sampler implemented here with a diffuse prior on ρ and σ should produce
similar estimates.
Gibbs sampling estimates
hit rate = 0.3561
                mean    std deviation
rho            0.649            0.128
sige           1.348            0.312

Figure 3.3: The first 100 Gibbs draws for rho (top panel) and sige (bottom panel).
The time needed to generate 1,100 draws was around 10 seconds, which
represents 100 draws per second. We will see that similar speed can be
achieved even for large data samples.
From the results we see that the mean and standard deviations from
the Gibbs sampler are close to the maximum likelihood values.
3.2.1 The far_g function
We discuss some of the implementation details concerned with constructing a MATLAB function far_g to produce estimates for the Bayesian FAR
model. This function will rely on a sparse matrix algorithm approach to
handle problems involving large data samples. It will also allow for diffuse
or informative priors and handle the case of heterogeneity in the disturbance
variance.
The first thing we need to consider is that to produce a large number of
draws, say 1,000, we would need to evaluate the conditional distribution of
ρ 2,000 times. (Note that we called this function twice in example 3.2.) Each
evaluation would require that we compute the determinant of the matrix
(I_n − ρW), which we have already seen is a non-trivial task for large data
samples. To avoid this, we rely on the Pace and Barry (1997) approach
discussed in the previous chapter. Recall that they suggested evaluating
this determinant over a grid of values in the feasible range of ρ once at the
outset. Given that we have carried out this evaluation and stored the determinant values along with the associated ρ values, we can simply look up
the appropriate determinant in our function that evaluates the conditional
distribution. That is, the call to the conditional distribution function will
provide a value of ρ for which we need to evaluate the conditional distribution. If we already know the determinant for a grid of all feasible ρ values,
we can simply look up the determinant value closest to the ρ value and use
it during evaluation of the conditional distribution. This saves us the time
involved in computing the determinant twice for each draw of ρ.
Since we need to carry out a large number of draws, this approach works
better than computing determinants for every draw.
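The idea of trading a one-time grid evaluation against repeated determinant calculations can be sketched as follows. This is a simplified illustration of the approach described above, not the actual far_g code; for large samples one would use sparse LU or Cholesky factors as in Pace and Barry (1997), and the grid spacing chosen here is arbitrary.

% --- illustrative log-determinant grid and lookup (simplified sketch) ---
n  = size(W,1);
IN = eye(n);
rgrid  = (0.01:0.01:0.99)';                % grid over the feasible range of rho
detval = zeros(length(rgrid),1);
for i = 1:length(rgrid)
  detval(i) = log(det(IN - rgrid(i)*W));   % one-time evaluation over the grid
end
% later, inside the sampler, look up the stored value closest to a candidate rho
[junk ind] = min(abs(rgrid - rho));
lndet = detval(ind);                       % approximates log|I_n - rho*W|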
3.2.2
Applied examples
The program produced the following output. Note that our printing
function computes means and standard deviations using the draws returned
in the results structure of far_g. We also compute t-statistics and evaluate
the marginal probabilities. This allows us to provide printed output in the
form of a traditional regression model.
% homoscedastic models
First-order spatial autoregressive model Estimates
Dependent Variable =      y-simulated
R-squared          =       0.5908
sigma^2            =      24.9531
Nobs, Nvars        =      49,     1
log-likelihood     =   -285.86289
# of iterations    =       9
min and max rho    =       0.0000,     1.0000
***************************************************************
Variable        Coefficient     t-statistic    t-probability
rho                0.771980        6.319186         0.000000

Gibbs sampling First-order spatial autoregressive model
Dependent Variable =      y-simulated
R-squared          =       0.5805
sigma^2            =      25.2766
r-value            =      30
Nobs, Nvars        =      49,     1
ndraws,nomit       =    1100,   100
acceptance rate    =       0.8886
time in secs       =      13.7984
min and max rho    =       0.0000,     1.0000
***************************************************************
Variable        Coefficient     t-statistic    t-probability
rho                0.739867        8.211548         0.000000
% outlier models
First-order spatial autoregressive model Estimates
Dependent Variable =      y-simulated
The first two sets of output illustrate the point we made regarding
Bayesian analysis implemented with a diffuse prior. The results from the
Gibbs sampling approach are very close to those from maximum likelihood
estimation. Note that the printed output shows the time required to carry
out 1,100 draws along with the acceptance rate. It took only 13.7 seconds
to produce 1,100 draws.
After introducing two outliers, we see that the maximum likelihood estimates produce a poor fit to the data as well as an inflated estimate of σ².
The coefficient estimate for ρ is also affected adversely, deviating from the
true value of 0.75, and the precision of the estimate is degraded. In contrast,
the Gibbs sampled estimate of ρ was closer to the true value and exhibits
greater precision as indicated by the larger t-statistic. The estimate for σ²
is much smaller than that from maximum likelihood. Robust estimates will
generally exhibit a smaller R-squared statistic, as the estimates downweight outliers
rather than try to fit these data observations.
We also produced a plot of the mean of the vi draws, which serves as an
estimate of these relative variance terms. This graph is shown in Figure 3.4,
where we see that the two outliers were identified.
Example 3.4 illustrates the use of the far_g function on the large Pace
and Barry data set. We set an r value of 4, which will capture heterogeneity
if it exists.
[Figure 3.4: mean of the vi draws, observations 1 to 49]
From the results we see that the maximum likelihood and Bayesian robust estimates are very similar, suggesting a lack of heterogeneity. We can
further explore this issue by examining a plot of the mean vi draws, which
serve as estimates for these parameters in the model. Provided we use a
small value of r, the presence of heterogeneity and outliers will be indicated
by large vi estimates that deviate substantially from unity. Figure 3.5 shows
a plot of the mean of the vi draws, confirming that a handful of large vi values exist. Close inspection reveals that only 58 vi values greater than 3 exist.
[Figure 3.5: mean of the vi draws for the Pace and Barry data]
An advantage of Gibbs sampling is that valid estimates of dispersion for
the parameters as well as the entire posterior distribution associated with
the estimated parameters are available. Recall that this presents a problem for large data sets estimated using maximum likelihood methods, which
we solved using a numerical hessian calculation. In the presence of outliers
or non-constant variance the numerical hessian approach may not be valid
because normality in the disturbance generating process might be violated.
In the case of Gibbs sampling, the law of large numbers suggests that we
can compute valid means and measures of dispersion from the sample of
draws. As an illustration, we use the function pltdens from the Econometrics
Toolbox to produce a non-parametric density estimate of the posterior distribution for ρ. Figure 3.6 shows the posterior density that is plotted using
the command:
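The call itself was not preserved in this copy; a call along the following lines would generate the density plot. The field name results.pdraw used to hold the ρ draws is an assumption for illustration, not taken from the far_g documentation shown in this text.

% hypothetical call -- assumes the rho draws are stored in results.pdraw
pltdens(results.pdraw);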
[Figure 3.6: posterior density estimate for rho]

3.3
As the other functions are quite similar, we leave it to the reader to examine the documentation and demonstration files for these functions. One
point to note regarding use of these functions is that two options exist for
specifying a value for the hyperparameter r. The default is to rely on an improper prior based on r = 4. The other option allows a proper Gamma(m,k)
prior to be assigned for r.
The first option has the virtue that convergence will be quicker and fewer
draws are required to produce estimates. The drawback is that the estimates
are conditional on the single value of r set for the hyperparameter. The
second approach produces draws from a Gamma(m,k) distribution for r on
each pass through the sampler. This produces estimates that average over
alternative r values, in essence integrating over this parameter, resulting in
unconditional estimates.
My experience is that estimates produced with a Gamma(8,2) prior having a mean of r = 4 and variance of 2 are quite similar to those based on an
improper prior with r = 4. Use of the Gamma prior tends to require a larger
number of draws based on convergence diagnostics routines implemented by
the function coda from the Econometrics Toolbox described in Chapter 5
of the manual.
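As a sketch of how these two options might be specified when calling one of the Gibbs sampling functions, consider the following. The prior structure fields rval and m, k mirror those described for the bgwr function later in this text; treating them (and the far_g calling sequence shown) as applying here is an assumption made only for illustration.

% option 1: improper prior, a single fixed value for r
prior.rval = 4;
res1 = far_g(y,W,ndraw,nomit,prior);     % calling sequence assumed

% option 2: proper Gamma(m,k) prior for r, here with mean m/k = 4
prior2.m = 8;
prior2.k = 2;
res2 = far_g(y,W,ndraw,nomit,prior2);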
3.4
Applied examples
From the results we see that 1,100 draws took around 13 seconds. The
acceptance rate falls slightly for values of λ near 0.9, which we would expect.
Since this is close to the upper limit of unity, we will see an increase in
rejections of candidate values for λ that lie outside the feasible range.
The estimates are reasonably similar to the maximum likelihood results
even for the relatively small number of draws used. In addition to presenting estimates for λ, we also provide estimates for the β parameters in
the problem.
% sem model demonstration
  ML lam    lam t   Gibbs lam    lam t      time    accept
 -0.6357  -3.1334     -0.5049  -2.5119   12.9968    0.4475
  0.0598   0.3057      0.0881   0.4296   12.8967    0.4898
  0.3195   1.8975      0.3108   1.9403   13.0212    0.4855
  0.2691   1.5397      0.2091   1.1509   12.9347    0.4827
  0.5399   4.0460      0.5141   3.6204   13.1345    0.4770
  0.7914  10.2477      0.7466   7.4634   13.3044    0.4616
  0.5471   4.1417      0.5303   3.8507   13.2014    0.4827
  0.7457   8.3707      0.7093   6.7513   13.5251    0.4609
  0.8829  17.5062      0.8539  14.6300   13.7529    0.4349

   b1 ML    b2 ML     b3 ML   b1 Gibbs  b2 Gibbs  b3 Gibbs
  1.0570   1.0085    0.9988     1.0645    1.0112    0.9980
  1.2913   1.0100    1.0010     1.2684    1.0121    1.0005
  0.7910   1.0298    0.9948     0.7876    1.0310    0.9936
  1.5863   0.9343    1.0141     1.5941    0.9283    1.0157
  1.2081   0.9966    1.0065     1.2250    0.9980    1.0058
  0.3893   1.0005    1.0155     0.3529    1.0005    1.0171
  1.2487   0.9584    1.0021     1.3191    0.9544    1.0029
  1.8094   1.0319    0.9920     1.8366    1.0348    0.9918
 -0.9454   1.0164    0.9925    -0.9783    1.0158    0.9923
The maximum likelihood SAR estimates along with the two sets of
Bayesian model estimates are shown below. We see that the homoscedastic
Gibbs estimates are similar to the maximum likelihood estimates, demonstrating that the Gibbs sampling estimation procedure is not responsible
for the difference in estimates we see between maximum likelihood and the
heteroscedastic Bayesian model. For the case of the heteroscedastic prior
we see much better estimates for both ρ and σ². Note that the R-squared
statistic is lower for the robust estimates, which will be the case because
robustification requires that we not attempt to fit the outlying observations. This will generally lead to a worse fit for models that produce robust
estimates.
Spatial autoregressive Model Estimates
Dependent Variable =      y-simulated
R-squared          =       0.6779
Rbar-squared       =       0.6639
sigma^2            =     680.8004
Nobs, Nvars        =      49,     3
log-likelihood     =   -212.59468
# of iterations    =      14
min and max rho    =      -1.5362,     1.0000
***************************************************************
Variable        Coefficient     t-statistic    t-probability
constant          19.062669        1.219474         0.228880
income            -0.279572       -0.364364         0.717256
house value        1.966962        8.214406         0.000000
rho                0.200210        1.595514         0.117446
One point that may be of practical importance is that using large values for the hyperparameter r eliminates the robustification, producing estimates similar to those from maximum likelihood.
3.5
An applied exercise
Coefficient     t-statistic    t-probability
 -0.165349       -6.888522         0.000000
  0.080662        3.009110         0.002754
  0.044302        1.255260         0.209979
  0.017156        0.918665         0.358720
 -0.129635       -3.433659         0.000646
  0.160858        6.547560         0.000000
  0.018530        0.595675         0.551666
 -0.215249       -6.103520         0.000000
  0.272237        5.625288         0.000000
 -0.221229       -4.165999         0.000037
 -0.102405       -4.088484         0.000051
  0.077511        3.772044         0.000182
 -0.337633      -10.149809         0.000000
  0.450871       12.348363         0.000000

Coefficient     t-statistic    t-probability
  0.227007        2.592018         0.009825
 -0.231393       -4.713901         0.000003
 -0.110537       -3.852098         0.000133
  0.137065        4.555835         0.000007
 -0.293952       -7.201166         0.000000
  0.689346        7.938580         0.000000
  0.107479        1.188044         0.235388
[Figure: Vi estimates plotted against the observations (census tracts)]
[Figure: posterior density for lambda]
t-statistic    t-probability
 -4.096886         0.000049
  1.252200         0.211091
  0.334302         0.738294
  0.360944         0.718296
 -1.915048         0.056065
  7.352177         0.000000
 -2.045083         0.041377
 -3.829066         0.000145
  2.771251         0.005795
 -5.075241         0.000001
 -3.730565         0.000213
  4.462714         0.000010
 -7.136230         0.000000
  7.891604         0.000000
  1.700944         0.089584
The results indicate a probability of 6%, which should not make us nervous about imposing the restriction on λ.
[Figure: posterior density for lambda from the restricted model]
Summarizing, I would use the Gibbs SAC model based on the restriction
that λ > 0. With the exception of λ, this would produce the same inferences
regarding all parameters as the model without this restriction. It would also
produce the same inferences regarding the β parameters as the SEM model,
which is comforting.
3.6
Chapter Summary
3.7
References
Anselin, L. 1988. Spatial Econometrics: Methods and Models, (Dordrecht: Kluwer Academic Publishers).
Casella, G. and E.I. George. 1992. Explaining the Gibbs Sampler,
American Statistician, Vol. 46, pp. 167-174.
Gelfand, Alan E., and A.F.M. Smith. 1990. Sampling-Based Approaches to Calculating Marginal Densities, Journal of the American
Statistical Association, Vol. 85, pp. 398-409.
Gelfand, Alan E., Susan E. Hills, Amy Racine-Poon and Adrian F.M.
Smith. 1990. Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling, Journal of the American Statistical Association, Vol. 85, pp. 972-985.
Chapter 4
4.1
Spatial expansion
The first model of this type was introduced by Casetti (1972) and labeled
a spatial expansion model. The model is shown in (4.1), where y denotes
an nx1 dependent variable vector associated with spatial observations and
X is an nxnk matrix consisting of terms x_i representing kx1 explanatory
variable vectors, as shown in (4.2). The locational information is recorded
in the matrix Z which has elements Z_xi, Z_yi, i = 1,...,n, that represent
latitude and longitude coordinates of each observation as shown in (4.3).
The model posits that the parameters vary as a function of the latitude
and longitude coordinates. The only parameters that need be estimated
are the parameters in β₀ that we denote β_x, β_y. These represent a set of
2k parameters.
y = X\beta + \varepsilon
\beta = ZJ\beta_0                                                    (4.1)

Where:

y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} x_1' & 0 & \cdots & 0 \\ 0 & x_2' & & \\ \vdots & & \ddots & \\ 0 & & & x_n' \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}            (4.2)

Z = \begin{pmatrix} Z_{x1} I_k & Z_{y1} I_k & & 0 \\ & & \ddots & \\ 0 & & Z_{xn} I_k & Z_{yn} I_k \end{pmatrix}, \quad
J = \begin{pmatrix} I_k & 0 \\ 0 & I_k \\ \vdots & \vdots \\ I_k & 0 \\ 0 & I_k \end{pmatrix}, \quad
\beta_0 = \begin{pmatrix} \beta_x \\ \beta_y \end{pmatrix}            (4.3)
The expansion relation in (4.1) ensures that nearby observations measured by latitude-longitude coordinates take on similar parameter values. As the location varies, the regression relationship changes
to accommodate a locally linear fit through clusters of observations in close
proximity to one another.
Another way to implement this model is to rely on a vector of distances
rather than the latitude-longitude coordinates. This implementation defines
the distance from a central observation as:

d_i = \sqrt{(Z_{xi} - Z_{xc})^2 + (Z_{yi} - Z_{yc})^2}               (4.4)

Using these distances, the distance expansion model is:

y = X\beta + \varepsilon
\beta = DJ\beta_0                                                    (4.5)
4.1.1
Estimating this model is relatively straightforward as we can rely on leastsquares. One issue is that there are a number of alternative expansion
specifications. For example, one approach would be to construct a model
that includes the base k explanatory variables in the matrix X estimated
with fixed parameters, plus an additional 2k expansion variables based on
the latitude-longitude expansion. Another approach would be to include the
base k variables in the matrix X and only 2(k−1) variables in expansion form
by excluding the constant term from the expansion process. Yet another
approach would be to rely on a simple expansion of all variables as was
illustrated in (4.1).
The function allows the user to specify an option for distance expansion based on a particular point in the spatial data sample or the latitude-longitude expansion. In the case of the distance expansion, the k explanatory
variables in the matrix X are used as non-expansion variables estimated with
fixed parameters and the k−1 variables excluding the constant are included
as distance-expanded variables. This version of the model can be written
as:

y = \alpha + X\beta + XD\beta_0 + \varepsilon                        (4.7)
In (4.4), Z_xc, Z_yc denote the latitude-longitude
coordinates of the centrally located observation and Z_xi, Z_yi denote the coordinates for observation i in the data sample. The distance of the central
point is zero of course.
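A minimal sketch of computing this distance vector from the latitude-longitude coordinates, using the Anselin data coordinates xc, yc from the earlier examples and an assumed central observation index, is:

% distances of each observation from the central observation (equation 4.4)
ctr = 20;                                    % index of the central observation
d = sqrt((xc - xc(ctr)).^2 + (yc - yc(ctr)).^2);
d(ctr)                                       % equals zero for the central point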
An optional input is provided to carry out isotropic normalization of the
x-y coordinates which essentially puts the coordinates in deviations from
the means form then standardizes by dividing by the square root of the sum
of the variances in the x-y directions. That is:
x^{\star} = (x - \bar{x}) / \sqrt{\sigma_x^2 + \sigma_y^2}
y^{\star} = (y - \bar{y}) / \sqrt{\sigma_x^2 + \sigma_y^2}           (4.8)
This normalization is carried out by a function normxy in the spatial econometrics library.
This normalization should make the center points xc, yc close to zero and
produces a situation where the coefficients for the base model represent an
observation located near the center of the spatial sample.
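A sketch of the isotropic normalization in (4.8), along the lines of what a function like normxy would need to do (this is not the library code itself), is:

% isotropic normalization of x-y coordinates as in equation (4.8)
xm = mean(xc);  ym = mean(yc);
sv = sqrt(var(xc) + var(yc));               % square root of the summed variances
xs = (xc - xm)/sv;                          % normalized x-coordinates
ys = (yc - ym)/sv;                          % normalized y-coordinates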
Given the exclusion of the constant term from the spatial expansion
formulation, we need to impose that the user place the constant term vector
in the first column of the explanatory variables matrix X used as an input
argument to the function.
Of course, we have an associated function to print the results structure and another to provide graphical presentation of the estimation results.
Printing these estimation results is a bit challenging because of the large
number of parameter estimates that we produce using this method. Graphical presentation may provide a clearer picture of the variation in coefficients
over space. A call to plt using the results structure variable will produce
plots of the coefficients in both the x- and y-directions for the case of the
latitude-longitude expansion, where we sort the x-direction from left to right
and the y-direction from left to right. This provides a visual picture of how
the coefficients vary over space. If the x-coordinates are largest for the east
and smallest for the west, the plot will show coefficient variation from west
to east as in map space. Similarly, if the y-coordinates are smallest for the
south and largest in the north, the plot will present coefficient variation
from south to north. (Note that if you enter Western hemisphere latitude-longitude coordinates, the x-direction plots will be from east to west, but
the y-direction plots will be south to north.)
For the case of distance expansion estimates, the plots present coefficients sorted by distance from the central point, provided by the user in the
structure field option.ctr. The central observation (smallest distance) will
be on the left of the graph and the largest distance on the right.
Another point to note regarding the graphical presentation of the estimates relates to the fact that we present the coefficients in terms of the
individual variables' total impact on the dependent variable y. It was felt
that users would usually be concerned with the total impact of a particular
variable on the dependent variable as well as the decomposition of impacts
into spatial and non-spatial effects. The printed output provides the coefficient estimates for the base model as well as the expansion coefficients that
can be used to analyze the marginal effects from the spatial and non-spatial
decomposition. To provide another view of the impact of the explanatory
variables in the model on the dependent variable, the graphical presentation
plots the coefficient estimates in a form representing their total impact on
the dependent variable. That is, we graph:

\beta_{xi} = \hat{\beta}_i + Z_{xi}\hat{\beta}_x
\beta_{yi} = \hat{\beta}_i + Z_{yi}\hat{\beta}_y
\beta_{di} = \hat{\beta}_i + D_i\hat{\beta}_0                        (4.9)

Where β_x, β_y are plotted for the x-y expansion and β_d is graphed for the
distance expansion. This should provide a feel for the total impact of variable
i on the dependent variable, since it takes into account the non-spatial impact
attributed to β_i as well as the spatially varying impacts in the x-y direction
or with respect to distance.
An illustration in the next section will pursue this point in more detail.
4.1.2
Applied examples
x-income  x-hse value   y-income  y-hse value
 -1.9028      1.1043      3.7525     -1.5019
 -1.8130      1.0522      3.9929     -1.5982
 -1.8296      1.0618      3.7209     -1.4893
 -1.7637      1.0236      3.6857     -1.4752
 -1.6859      0.9784      3.8970     -1.5598
 -1.7319      1.0051      4.1387     -1.6565
 -1.7103      0.9926      4.3864     -1.7556
 -1.7434      1.0118      4.4083     -1.7644
 -1.6559      0.9610      4.4204     -1.7693
 -1.6453      0.9549      4.3233     -1.7304
 -1.6472      0.9559      4.2091     -1.6847
 -1.6651      0.9664      4.1192     -1.6487
 -1.5698      0.9110      3.6942     -1.4786
 -1.3966      0.8105      3.4319     -1.3736
 -1.2870      0.7469      3.6250     -1.4509
 -1.2561      0.7290      3.4258     -1.3712
 -1.1170      0.6482      3.2412     -1.2973
 -1.1732      0.6809      3.1222     -1.2497
 -1.3367      0.7758      3.2279     -1.2919
d-income  d-hse value
 -0.4082     -0.1137
 -0.4586     -0.1277
 -0.5495     -0.1530
 -0.6034     -0.1681
 -0.4314     -0.1201
 -0.2755     -0.0767
 -0.1737     -0.0484
 -0.1087     -0.0303
 -0.0000     -0.0000
 -0.1542     -0.0429
 -0.1942     -0.0541
 -0.2269     -0.0632
 -0.2096     -0.0584
 -0.2978     -0.0829
 -0.4000     -0.1114
 -0.3479     -0.0969
 -0.3610     -0.1005
 -0.2715     -0.0756
 -0.2477     -0.0690
 -0.1080     -0.0301
 -0.0860     -0.0239
 -0.1379     -0.0384
 -0.1913     -0.0533
 -0.2235     -0.0622
 -0.1755     -0.0489
 -0.2397     -0.0668
 -0.2189     -0.0610
 -0.2941     -0.0819
 -0.2856     -0.0795
 -0.2682     -0.0747
 -0.2422     -0.0675
 -0.3631     -0.1011
 -0.5700     -0.1587
 -0.6533     -0.1820
 -0.7069     -0.1969
 -0.8684     -0.2419
 -0.8330     -0.2320
 -0.6619     -0.1843
Nobs, Nvars        =      49,     3
central obs        =      20
***************************************************************
Base centroid estimates
Variable         Coefficient     t-statistic    t-probability
const              62.349645       12.794160         0.000000
income             -0.855052       -1.048703         0.299794
hse value          -0.138951       -0.520305         0.605346
d-income           -0.048056       -0.613545         0.542538
d-hse value        -0.013384       -0.473999         0.637743
[Figures: plots of the x-income, x-house value, y-income, and y-house value estimates, sorted by the x-direction and y-direction]
Anselin (1988), in analyzing the x-y model, shows that heteroscedastic disturbances produce problems that
plague inferences for the model. Adjusting for the heteroscedastic disturbances dramatically alters the inferences. We turn attention to this issue
when we discuss the DARP version of this model in Section 4.2.
The plots of the coefficient estimates provide an important source of
information about the nature of coefficient variation over space, but you
should keep in mind that they do not indicate levels of significance, simply
point estimates.
Our plt wrapper function works to call the appropriate function plt_cas
that provides individual graphs of each coefficient in the model as well as
a two-part graph showing actual versus predicted and residuals. Figure 4.4
shows the actual versus predicted and residuals from the distance expansion
model. This plot is produced by the plt function when given a results
structure from the casetti function.
[Figure: distance expansion estimates for d-income and d-house value, plotted by distance from the center (left = center)]
[Figure 4.4: actual versus predicted and residuals from the distance expansion model]
4.2
DARP models
A problem with the spatial expansion model is that heteroscedasticity is inherent in the way the model is constructed. To see this, consider the slightly
altered version of the distance expansion model shown in (4.10), where we
have added a stochastic term u to reflect some error in the expansion relationship.

y = X\beta + \varepsilon
\beta = DJ\beta_0 + u                                                (4.10)

Now consider substituting the second equation from (4.10) into the first,
producing:

y = XDJ\beta_0 + Xu + \varepsilon                                    (4.11)

It should be clear that the new composite disturbance term Xu + ε will reflect
heteroscedasticity unless the expansion relationship is exact and u = 0.
Casetti (1982) and Casetti and Can (1998) propose a model they label
DARP, an acronym for Drift Analysis of Regression Parameters, that aims at
solving this problem. This model can be viewed as an extended expansion
model taking the form:
y = X\beta + \varepsilon
\beta = f(Z,\beta_0) + u                                             (4.12)

\sigma_i^2 = \sigma^2 \exp(\gamma_1 d_i) = \exp(\gamma_0 + \gamma_1 d_i)   (4.13)

Where d_i denotes the squared distance between the ith observation and the
central point, and σ², γ₀, γ₁ are parameters to be estimated. Of course, a
more general statement of the model would be that σ²_i = g(h_i, γ), indicating
that any functional form g involving some known variable h_i and associated
parameters γ could be employed to specify the non-constant variance over
space.
Note that the alternative specifications in (4.13) imply: σ² = exp(γ₀),
the constant scalar component of the variance structure, while exp(γ₁ d_i) reflects the non-constant component which is modeled as a function of distance
from the central point.
Casetti and Can (1998) argue that the estimate γ̂₁ from this procedure
would be consistent. Given this estimate and our knowledge of the distances d_i,
we can construct an estimate of the diagonal variance matrix, which we denote Φ̂.
Using Φ̂, the feasible generalized least-squares estimates are:

\hat{\beta}_{FGLS} = (X'\hat{\Phi}^{-1}X)^{-1} X'\hat{\Phi}^{-1} y                                    (4.15)

\hat{\sigma}^2_{FGLS} = (y - X\hat{\beta}_{FGLS})'\hat{\Phi}^{-1}(y - X\hat{\beta}_{FGLS})/(n - k)    (4.16)
The asymptotic variance-covariance matrix for the parameters (γ₀, γ₁) is given by:

var{-}cov(\gamma_0, \gamma_1) = 2(D'D)^{-1}
D = (\iota, d)                                                        (4.17)
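The two-step procedure can be sketched as follows, assuming the heteroscedasticity takes the exp(γ₀ + γ₁ d_i) form; this is only an illustration of one way to produce the FGLS estimates in (4.15) and (4.16), not the darp function itself.

% --- sketch of a feasible GLS step for the distance-based DARP variance model ---
% assumes y, x and the distance vector d are in memory
n = length(y); k = size(x,2);
bols = x\y;                                  % first-step least-squares estimates
e    = y - x*bols;
D    = [ones(n,1) d];                        % constant term and distances
gam  = D\log(e.^2);                          % crude estimates of (gamma0, gamma1)
phi  = exp(D*gam);                           % estimated variance for each observation
Pinv = diag(1./phi);                         % inverse of the diagonal variance matrix
bfgls = (x'*Pinv*x)\(x'*Pinv*y);             % equation (4.15)
efgls = y - x*bfgls;
sfgls = (efgls'*Pinv*efgls)/(n-k);           % equation (4.16)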
results.nvar = # of variables in x
results.y
= y data vector
results.xc
= xc
results.yc
= yc
results.ctr
= ctr (if input)
results.dist = distance vector
results.exp
= exp input option
results.norm = norm input option
results.iter = # of maximum likelihood iterations
-------------------------------------------------NOTE: assumes x(:,1) contains a constant term
--------------------------------------------------
The printed results are shown below, where we report not only the estimates for β₀ and the expansion estimates, but estimates for the γ parameters
as well. A chi-squared statistic to test the null hypothesis that γ₁ = 0
is provided, as well as a marginal probability level. For the case of the x-y
expansion, we see that the γ₁ parameter is negative and significant by virtue
of the large chi-squared statistic and associated marginal probability level
of 0.0121. The inference we would draw is that performance drift occurs in
the south-north direction. For the γ₂ parameter, we find a positive value
that is not significantly different from zero because of the marginal probability level of 0.8974. This indicates that the simple deterministic expansion
relationship is working well in the west-east direction. Note that these results conform to those found with the spatial expansion model, where we
indicated that parameter variation in the west-east direction was significant,
but not for the south-north.
DARP X-Y Spatial Expansion Estimates
Dependent Variable =        crime
R-squared          =       0.6180
Rbar-squared       =       0.5634
sige               =     122.2255
gamma1,gamma2      =      -0.0807,     0.0046
gamma1, prob       =       6.2924,     0.0121
gamma2, prob       =       0.0166,     0.8974
# of iterations    =      16
log-likelihood     =    -181.3901
Nobs, Nvars        =      49,     3
***************************************************************
Base x-y estimates
Variable        Coefficient     t-statistic    t-probability
const             66.783527        6.024676         0.000000
income            -2.639184       -0.399136         0.691640
hse value          0.249214        0.095822         0.924078
x-income          -0.048337       -0.537889         0.593247
x-hse value        0.021506        0.640820         0.524819
y-income           0.084877        0.564810         0.574947
y-hse value       -0.037460       -0.619817         0.538436
***************************************************************
Expansion estimates
Obs#   x-income  x-hse value   y-income  y-hse value
  1     -1.3454      0.6595      4.0747     -1.6828
  2     -1.3786      0.6758      3.8959     -1.6089
  3     -1.3865      0.6796      3.7218     -1.5370
  4     -1.2600      0.6176      3.6930     -1.5251
  5     -1.4655      0.7183      4.2372     -1.7499
  6     -1.5040      0.7372      3.9593     -1.6351
  7     -1.5112      0.7407      3.6536     -1.5088
  8     -1.6524      0.8100      3.7766     -1.5597
  9     -1.4961      0.7333      3.3565     -1.3862
 10     -1.7982      0.8814      3.5017     -1.4461
 11     -1.8349      0.8994      3.3132     -1.3683
 12     -1.8738      0.9185      3.1392     -1.2964
 13     -1.8926      0.9277      2.8757     -1.1876
 14     -1.9353      0.9487      2.6729     -1.1038
 15     -1.9221      0.9422      2.4267     -1.0022
 16     -1.8296      0.8968      2.6854     -1.1090
 17     -1.7650      0.8652      3.0680     -1.2670
 18     -1.6407      0.8042      3.4536     -1.4263
 19     -1.6381      0.8029      3.2171     -1.3286
 20     -1.5535      0.7615      3.1863     -1.3159
 21     -1.6600      0.8137      3.0392     -1.2551
 22     -1.6657      0.8165      2.9229     -1.2071
 23     -1.6505      0.8091      2.8056     -1.1586
 24     -1.5501      0.7598      2.7671     -1.1428
 25     -1.6328      0.8004      2.6258     -1.0844
 26     -1.6116      0.7900      2.3998     -0.9911
 27     -1.5565      0.7630      2.4902     -1.0284
 28     -1.4851      0.7280      2.4854     -1.0264
 29     -1.5520      0.7607      2.6431     -1.0915
 30     -1.4473      0.7095      2.7709     -1.1443
 31     -1.5603      0.7648      2.9709     -1.2269
 32     -1.4866      0.7287      3.1613     -1.3055
 33     -1.5002      0.7354      2.9459     -1.2166
 34     -1.4462      0.7089      2.9181     -1.2051
 35     -1.3824      0.6776      3.0853     -1.2742
 36     -1.4201      0.6961      3.2767     -1.3532
 37     -1.4024      0.6874      3.4728     -1.4342
 38     -1.4296      0.7008      3.4901     -1.4413
 39     -1.3578      0.6656      3.4997     -1.4453
 40     -1.3491      0.6613      3.4228     -1.4136
 41     -1.3507      0.6621      3.3324     -1.3762
 42     -1.3654      0.6693      3.2613     -1.3468
 43     -1.2872      0.6310      2.9248     -1.2079
 44     -1.1452      0.5613      2.7171     -1.1221
 45     -1.0553      0.5173      2.8700     -1.1852
 46     -1.0300      0.5049      2.7123     -1.1201
 47     -0.9159      0.4490      2.5662     -1.0598
 48     -0.9620      0.4715      2.4719     -1.0209
 49     -1.0961      0.5373      2.5556     -1.0554
Expansion estimates (final observations)
 d-income   d-hse value
  -0.0552       0.0008
  -0.0753       0.0010
  -0.0465       0.0006
  -0.0867       0.0012
  -0.0723       0.0010
  -0.1305       0.0018
  -0.1230       0.0017
  -0.1085       0.0015
  -0.0885       0.0012
  -0.1989       0.0027
  -0.4900       0.0067
  -0.6437       0.0088
  -0.7538       0.0103
  -1.1374       0.0156
  -1.0465       0.0143
  -0.6607       0.0090
For the case of the distance expansion we find a single γ parameter that
is negative but not significant. This would be interpreted to mean that the
deterministic expansion relationship is performing adequately over space.
A comparison of the base model estimates from the x-y darp model
versus those from casetti shows relatively similar coefficient estimates, as we
would expect. In the case of the x-y expansion all of the signs are the same
for the spatial expansion and darp models. The distance expansion version of
the model exhibits a sign change for the coefficient on income, which goes
from positive in the expansion model to negative in the darp model. Correcting for the heteroscedastic character of the estimation problem produces
dramatic changes in the statistical significance found for the base model
estimates. They all become insignificant, a finding consistent with results
reported by Anselin (1988) based on a jacknife approach to correcting for
heteroscedasticity in this model. (Anselin finds that the income coefficient
is still marginally significant after the correction.)
One approach to using this model is to expand around every point in
space and examine the γ parameters for evidence indicating where the model
is suffering from performance or parameter drift. Example 4.3 shows how
this might be accomplished using a for loop over all observations (a sketch of
the loop follows the code fragment below). For this purpose, we wish only to
recover the estimated value for the γ parameter along with the marginal probability level.
% ----- example 4.3 Using darp() over space
% load Anselin (1988) Columbus neighborhood crime data
load anselin.dat; y = anselin(:,1); n = length(y);
x = [ones(n,1) anselin(:,2:3)];
xc = anselin(:,4); yc = anselin(:,5); % Anselin x-y coordinates
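The remainder of this example did not survive in this copy; the following sketch illustrates the loop it describes. The calling sequence for darp, the option field names (option.exp, option.ctr) and the results fields used to hold the γ estimate and its marginal probability are assumptions made only for illustration.

% hypothetical completion of the loop over all observations (names assumed)
gout = zeros(n,2);
tic;
for i = 1:n
  option.exp = 1;                 % distance expansion (option value assumed)
  option.ctr = i;                 % expand around observation i
  res = darp(y,x,xc,yc,option);   % calling sequence assumed
  gout(i,1) = res.gamma(1);       % estimated gamma (field name assumed)
  gout(i,2) = res.prob(1);        % marginal probability (field name assumed)
end
toc;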
We use the MATLAB tic and toc commands to time the operation
of producing these maximum likelihood estimates across the entire sample.
The results are shown below, where we find that it took around 70 seconds
to solve the maximum likelihood estimation problem, calculate expansion
estimates and produce all of the ancillary statistics 49 times, once for each
observation in the data set.
elapsed_time = 69.2673
Obs#   gamma estimate   marginal probability
  1        -0.2198           0.0714
  2        -0.2718           0.0494
  3        -0.3449           0.0255
  4        -0.4091           0.0033
  5        -0.2223           0.0532
  6        -0.3040           0.0266
  7        -0.4154           0.0126
  8        -0.2071           0.1477
  9        -0.5773           0.0030
 10         0.1788           0.1843
 11         0.1896           0.1526
 12         0.1765           0.1621
 13         0.1544           0.1999
 14         0.1334           0.2214
 15         0.1147           0.2708
 16         0.1429           0.2615
 17         0.1924           0.2023
 18        -0.1720           0.3112
 19         0.1589           0.3825
 20        -0.3471           0.0810
 21         0.2020           0.2546
 22         0.1862           0.2809
 23         0.1645           0.3334
 24         0.0904           0.6219
 25         0.1341           0.4026
 26         0.1142           0.4264
 27         0.1150           0.4618
 28         0.0925           0.5584
 29         0.1070           0.5329
 30        -0.2765           0.1349
 31         0.0453           0.8168
 32        -0.6580           0.0012
 33        -0.3293           0.0987
 34        -0.5949           0.0024
 35        -0.8133           0.0000
 36        -0.5931           0.0023
 37        -0.4853           0.0066
 38        -0.4523           0.0121
 39        -0.5355           0.0016
 40        -0.6050           0.0005
 41        -0.6804           0.0001
 42        -0.7257           0.0001
 43        -0.7701           0.0000
 44        -0.5150           0.0001
 45        -0.3997           0.0005
 46        -0.3923           0.0003
 47        -0.3214           0.0004
 48        -0.3586           0.0001
 49        -0.4668           0.0000
From the estimated values of γ and the associated marginal probabilities, we infer that the model suffers from performance drift over the initial 9
observations and sample observations from 32 to 49. We draw this inference
from the negative γ estimates that are statistically significant for these observations. (Note that observation #8 is only marginally significant.) Over
the middle range of the sample, from observations 10 to 31, we find that
the deterministic distance expansion relationship works well. This inference
arises from the fact that none of the estimated γ parameters are significantly
different from zero.
In Section 4.4 we provide evidence that observations 2 and 4 represent
outliers that impact on estimates for neighboring observations 1 to 9. We
also show that this is true of observation 34, which influences observations
31 to 44. This suggests that the DARP model is working correctly to spot
places where the model encounters problems.
4.3
The Gaussian variant of the weighting function relies on the standard normal density,
where φ denotes the standard normal density and σ represents the standard
deviation of the distance vector d_i. The distance vector d_i contains the
distances between observation i and all other observations in the sample:

d_{ij} = \sqrt{(Z_{xi} - Z_{xj})^2 + (Z_{yi} - Z_{yj})^2}            (4.22)

The bandwidth used in the weighting function is chosen with a cross-validation
score function of the form:

CV = \sum_{i=1}^{n} [y_i - \hat{y}_{\neq i}]^2                        (4.23)

where ŷ_{≠i} denotes the fitted value of y_i with the observations for
point i omitted from the calibration process. A value of the bandwidth that minimizes
this score function is used as the distance-weighting bandwidth to produce
GWR estimates.
The non-parametric GWR model relies on a sequence of locally linear
regressions to produce estimates for every point in space by using a subsample of data information from nearby observations. Let y denote an nx1
vector of dependent variable observations collected at n points in space, X
an nxk matrix of explanatory variables, and ε an nx1 vector of normally
distributed, constant variance disturbances. Letting W_i represent an nxn
diagonal matrix containing distance-based weights for observation i that
reflect the distance between observation i and all other observations, we can
write the GWR model as:

W_i^{1/2} y = W_i^{1/2} X \beta_i + \varepsilon_i                    (4.24)

The subscript i on β_i indicates that this kx1 parameter vector is associated with observation i. The GWR model produces n such vectors of
parameter estimates, one for each observation. These estimates are produced using:

\hat{\beta}_i = (X'W_iX)^{-1}(X'W_iy)                                 (4.25)
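A minimal sketch of producing the GWR estimates in (4.25) for every observation, using the exponential distance weights and an assumed bandwidth b, follows. This illustrates the formulas only; it is not the gwr function itself.

% --- illustrative GWR estimation loop based on equation (4.25) ---
% assumes y, x and coordinates east, north are in memory, and a bandwidth b
[n k] = size(x);
bmat = zeros(n,k);
for i = 1:n
  d  = sqrt((east - east(i)).^2 + (north - north(i)).^2);  % distances to obs i
  wi = exp(-d/b);                                          % exponential weights
  Wi = diag(wi);
  bmat(i,:) = ((x'*Wi*x)\(x'*Wi*y))';                      % beta_i from (4.25)
end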
4.3.1
Implementing GWR
results.nobs = nobs
results.nvar = nvars
results.bwidth = bandwidth
results.dtype
= input string for Gaussian, exponential weights
results.iter
= # of simplex iterations for cv
results.north = north (y-coordinates)
results.east = east (x-coordinates)
results.y
= y data vector
--------------------------------------------------NOTES: uses auxiliary function scoref for cross-validation
---------------------------------------------------
The following program illustrates using the gwr function on the Anselin
(1988) neighborhood crime data set to produce estimates based on both
Gaussian and exponential weighting functions. Figure 4.5 shows a graph of
these two sets of estimates, indicating that they are not very different.
% ----- example 4.4 Using the gwr() function
% load the Anselin data set
load anselin.dat; y = anselin(:,1); nobs = length(y);
x = [ones(nobs,1) anselin(:,2:3)]; tt=1:nobs;
north = anselin(:,4); east = anselin(:,5);
info.dtype = 'gaussian';     % Gaussian weighting function
res1 = gwr(y,x,east,north,info);
info.dtype = 'exponential';  % Exponential weighting function
res2 = gwr(y,x,east,north,info);
subplot(3,1,1), plot(tt,res1.beta(:,1),tt,res2.beta(:,1),'--');
legend('Gaussian','Exponential'); ylabel('Constant term');
subplot(3,1,2), plot(tt,res1.beta(:,2),tt,res2.beta(:,2),'--');
legend('Gaussian','Exponential'); ylabel('Household income');
subplot(3,1,3), plot(tt,res1.beta(:,3),tt,res2.beta(:,3),'--');
legend('Gaussian','Exponential'); ylabel('House value');
Obs =      1, x-coordinate= ..., y-coordinate= ..., sige=    1.1144
Variable        Coefficient     t-statistic    t-probability
const             51.198618       16.121809         0.000000
income            -0.461074       -2.938009         0.005024
hse value         -0.434240       -6.463775         0.000000
Obs =      2, x-coordinate= 40.5200, y-coordinate= 36.5000, sige=    2.7690
Variable        Coefficient     t-statistic    t-probability
const             63.563830       15.583144         0.000000
income            -0.369869       -1.551568         0.127201
hse value         -0.683562       -7.288304         0.000000
[Figure 4.5: GWR estimates based on Gaussian and exponential weighting functions — constant term, household income, and house value]
4.4
The Bayesian approach set forth here overcomes some problems with the non-parametric GWR estimates and has some advantages over the non-parametric approach
discussed in Section 4.3.
One problem with the non-parametric approach is that valid inferences
cannot be drawn for the GWR regression parameters. To see this, consider
that the locally linear estimates use the same sample data observations (with
different weights) to produce a sequence of estimates for all points in space.
Given a lack of independence between estimates for each location, conventional measures of dispersion for the estimates will likely be incorrect. These
(invalid) conventional measures are what we report in the results structure
from gwr, as this is the approach taken by Brunsdon, Fotheringham and
Charlton (1996).
Another problem is that non-constant variance over space, aberrant observations due to spatial enclave effects, or shifts in regime can exert undue
influence on locally linear estimates. Consider that all nearby observations
in a sub-sequence of the series of locally linear estimates may be contaminated by an outlier at a single point in space. The Bayesian approach
introduced here solves this problem by robustifying against aberrant observations. Aberrant observations are automatically detected and downweighted to lessen their influence on the estimates. The non-parametric
implementation of the GWR model assumed no outliers.
A third problem is that the non-parametric estimates may suffer from
weak data problems because they are based on a distance weighted subsample of observations. The effective number of observations used to produce estimates for some points in space may be very small. This problem
can be solved with the Bayesian approach by incorporating subjective prior
information during estimation. Use of subjective prior information is a wellknown approach for overcoming weak data problems.
In addition to overcoming these problems, the Bayesian formulation introduced here specifies the relationship that is used to smooth parameters
over space. This allows us to subsume the non-parametric GWR method as
part of a much broader class of spatial econometric models. For example,
the Bayesian GWR can be implemented with parameter smoothing relationships that result in: 1) a locally linear variant of the spatial expansion
methods discussed in section 4.1, 2) a parameter smoothing relation appropriate for monocentric city models where parameters vary systematically
with distance from the center of the city, 3) a parameter smoothing scheme
based on contiguity relationships, and 4) a parameter smoothing scheme
based on distance decay.
The Bayesian approach, which we label BGWR, is best described using the
matrix expressions shown in (4.26) and (4.27). First, note that (4.26) is the
same expression as the GWR model, while (4.27) adds a parameter smoothing
relationship that relates β_i to a distance-weighted linear combination of the
parameters from all n observations.

W_i^{1/2} y = W_i^{1/2} X \beta_i + \varepsilon_i                    (4.26)

\beta_i = \begin{pmatrix} w_{i1} I_k & \ldots & w_{in} I_k \end{pmatrix}
          \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} + u_i    (4.27)

The terms w_{ij} represent a normalized distance-based weight so the row-vector (w_{i1},...,w_{in}) sums to unity, and we set w_{ii} = 0. That is,
w_{ij} = \exp(-d_{ij}/\theta) / \sum_{j=1}^{n} \exp(-d_{ij}/\theta).
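As a small illustration of this weight normalization, the row of weights for observation i could be constructed as follows; the decay parameter theta, which plays the role of the bandwidth, and the use of the east/north coordinates are assumptions for the sketch.

% normalized distance-based weights for observation i, with w(i,i) = 0
d     = sqrt((east - east(i)).^2 + (north - north(i)).^2);
wi    = exp(-d/theta);
wi(i) = 0;                        % exclude the own observation
wi    = wi/sum(wi);               % normalize so the weights sum to unity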
To complete our model specification, we add distributions for the terms
i and ui :
i N [0, 2Vi ],
Vi = diag(v1 , v2 , . . . , vn )
0
ui N [0, (X Wi X)
2 2
)]
(4.28)
(4.29)
The V_i = diag(v_1, v_2, ..., v_n) represent our n variance scaling parameters from Chapter 3. These allow for non-constant variance as we move
across space. One point to keep in mind is that here we have n² terms to estimate, reflecting n different V_i vectors, one for each of the n observations. We
will use the same assumption as in Chapter 3 regarding the V_i parameters.
All n² parameters are assumed to be i.i.d. χ²(r) distributed, where r is our
hyperparameter that controls the amount of dispersion in the V_i estimates
across observations. As in Chapter 3, we introduce a single hyperparameter
r to the estimation problem and receive in return n² parameter estimates.
Consider that as r becomes very large, the prior imposes homoscedasticity
on the BGWR model and the disturbance variance becomes σ²I_n for all
observations i.
The distribution for u_i in the parameter smoothing relationship is normally distributed with mean zero and a variance based on Zellner's (1971)
g-prior. This prior variance is proportional to the parameter variance-covariance matrix, σ²(X'W_iX)⁻¹, with δ² acting as the scale factor. The
use of this prior specification allows individual parameters β_i to vary by
different amounts depending on their magnitude.
The parameter δ acts as a scale factor to impose tight or loose adherence to the parameter smoothing specification. Consider a case where δ
is very small; then the smoothing restriction would force β_i to look like
a distance-weighted linear combination of the β_j from neighboring observations. On the other hand, as δ → ∞ (and V_i = I_n) we produce the
non-parametric GWR estimates. To see this, we rewrite the BGWR model
in a more compact form:

\tilde{y}_i = \tilde{X}_i \beta_i + \varepsilon_i                    (4.30)

\beta_i = J_i \gamma + u_i                                           (4.31)

Where the definitions of the matrix expressions are:

\tilde{y}_i = W_i^{1/2} y
\tilde{X}_i = W_i^{1/2} X
J_i = \begin{pmatrix} w_{i1} I_k & \ldots & w_{in} I_k \end{pmatrix}
\gamma = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}

As indicated earlier, the notation is somewhat confusing in that ỹ_i denotes an n-vector, not a scalar magnitude. Similarly, ε_i is an n-vector and
X̃_i is an n by k matrix. Note that (4.30) can be written in the form of a
Theil and Goldberger (1961) estimation problem as shown in (4.32).

\begin{pmatrix} \tilde{y}_i \\ J_i\gamma \end{pmatrix} =
\begin{pmatrix} \tilde{X}_i \\ I_k \end{pmatrix} \beta_i +
\begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix}                   (4.32)
The estimates for β_i from this Theil-Goldberger formulation take the form:

\hat{\beta}_i = R(\tilde{X}_i'\tilde{y}_i + \tilde{X}_i'\tilde{X}_i J_i \gamma / \delta^2)
R = (\tilde{X}_i'\tilde{X}_i + \tilde{X}_i'\tilde{X}_i/\delta^2)^{-1}

As δ approaches infinity, the terms associated with the
stochastic restriction, X̃_i'X̃_iJ_iγ/δ² and X̃_i'X̃_i/δ², become zero, and we have
the GWR estimates:

\hat{\beta}_i = (\tilde{X}_i'\tilde{X}_i)^{-1}(\tilde{X}_i'\tilde{y}_i)      (4.33)
In practice, we can use a diffuse prior for δ which allows the amount of
parameter smoothing to be estimated from sample data information, rather
than by subjective prior information.
Details concerning estimation of the parameters in the BGWR model
are taken up in the next section. Before turning to these issues, we consider
some alternative spatial parameter smoothing relationships that might be
used in lieu of (4.27) in the BGWR model.
One alternative smoothing specification would be the monocentric city
smoothing set forth in (4.34). This relation assumes that the data observations have been ordered by distance from the center of the spatial sample.

\beta_i = \beta_{i-1} + u_i
u_i \sim N[0, \delta^2\sigma^2(X'W_iX)^{-1}]                          (4.34)

Given that the observations are ordered by distance from the center, the
smoothing relation indicates that β_i should be similar to the coefficient β_{i−1}
from a neighboring concentric ring. Note that we rely on the same GWR
distance-weighted data sub-samples, created by transforming the data using
W_i^{1/2}y, W_i^{1/2}X. This means that the estimates still have a locally linear interpretation as in the GWR. We rely on the same distributional assumption
for the term u_i from the BGWR, which allows us to estimate the parameters from this model by making minor changes to the approach used for the
BGWR.
Another alternative is a spatial expansion smoothing based on the
ideas introduced by Casetti (1972). This is shown in (4.35), where Z_{xi}, Z_{yi}
denote latitude-longitude coordinates associated with observation i.

\beta_i = \begin{pmatrix} Z_{xi} I_k & Z_{yi} I_k \end{pmatrix}
          \begin{pmatrix} \beta_x \\ \beta_y \end{pmatrix} + u_i
u_i \sim N[0, \delta^2\sigma^2(X'W_iX)^{-1}]                          (4.35)

This parameter smoothing relation creates a locally linear combination
based on the latitude-longitude coordinates of each observation. As in the
case of the monocentric city specification, we retain the same assumptions
regarding the stochastic term u_i, making this model simple to estimate with
minor changes to the BGWR methodology.
Finally, a parameter smoothing scheme based on contiguity relationships takes
the form shown in (4.36), where the terms c_{ij} represent normalized contiguity
weights for observation i:

\beta_i = \begin{pmatrix} c_{i1} I_k & \ldots & c_{in} I_k \end{pmatrix}
          \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} + u_i    (4.36)
4.4.1 Estimation of the BGWR model
Estimation proceeds via Gibbs sampling based on the conditional distributions
for the parameters β_i, σ, δ and V_i. The conditional posterior distribution of
β_i given the other parameters is a multivariate normal:

p(\beta_i \,|\, \ldots) \propto N(\hat{\beta}_i, \sigma^2 R)          (4.37)

Where:

\hat{\beta}_i = R(\tilde{X}_i'V_i^{-1}\tilde{y}_i + \tilde{X}_i'\tilde{X}_i J_i \gamma / \delta^2)
R = (\tilde{X}_i'V_i^{-1}\tilde{X}_i + \tilde{X}_i'\tilde{X}_i/\delta^2)^{-1}    (4.38)

This result follows from the assumed variance-covariance structures for
ε_i, u_i and the Theil-Goldberger (1961) representation shown in (4.32).
The conditional posterior distribution for σ is a χ²(m = n²) distribution
shown in (4.39).

p(\sigma \,|\, \ldots) \propto \sigma^{-(n^2+1)}
   \exp\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (\varepsilon_i' V_i^{-1} \varepsilon_i) \}    (4.39)
The sum in (4.39) extends over the subscript i to indicate that the n-vector
of squared residuals (deflated by the n individual V_i terms) from
each sub-sample of n observations is summed, and then these n sums are
summed as well.
The conditional posterior distribution for V_i is shown in (4.40), which indicates that we draw an n-vector based on a χ²(r+1) distribution. Note that
the individual elements of the matrix V_i act on the spatial weighting scheme
because the estimates involve terms like X̃_i'V_i^{-1}X̃_i = X'W_i^{1/2}V_i^{-1}W_i^{1/2}X. The
distance-based weights from the weighting scheme will be adjusted by
the V_i estimates, which are large for aberrant observations or outliers. In
the event of an outlier, observation i will receive less weight when the spatial
distance-based weight is divided by a large V_i value.

p\{ [(e_i^2/\sigma^2) + r]/V_i \,|\, \ldots \} \sim \chi^2(r + 1)     (4.40)
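A sketch of the V_i draw implied by (4.40), generating the χ²(r+1) variates from sums of squared standard normal draws so that no additional toolbox functions are needed, is shown below; the variable names are assumptions and r is taken to be an integer.

% draw the n-vector of vi terms from the conditional in equation (4.40)
% e is the n-vector of residuals for observation i's subsample, sige = sigma^2
chi = sum(randn(r+1,n).^2,1)';       % chi-squared(r+1) random deviates
vi  = ((e.*e)/sige + r)./chi;        % vi draws; large vi downweight outliers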
The conditional posterior distribution for the smoothing parameter δ takes the form:

p(\delta \,|\, \ldots) \propto \delta^{-nk}
   \exp\{ -\frac{1}{2\sigma^2\delta^2} \sum_{i=1}^{n} (\beta_i - J_i\gamma)'(\tilde{X}_i'\tilde{X}_i)(\beta_i - J_i\gamma) \}    (4.41)

For the case of the monocentric city smoothing relationship in (4.34), the
conditional distribution for β_i relies on:

\hat{\beta}_i = R(\tilde{X}_i'V_i^{-1}\tilde{y}_i + \tilde{X}_i'\tilde{X}_i \beta_{i-1}/\delta^2)
R = (\tilde{X}_i'V_i^{-1}\tilde{X}_i + \tilde{X}_i'\tilde{X}_i/\delta^2)^{-1}    (4.42)

and the conditional for δ becomes:

p(\delta \,|\, \ldots) \propto \delta^{-nk}
   \exp\{ -\frac{1}{2\sigma^2\delta^2} \sum_{i=1}^{n} (\beta_i - \beta_{i-1})'(\tilde{X}_i'\tilde{X}_i)(\beta_i - \beta_{i-1}) \}    (4.43)
For the case of the spatial expansion and contiguity smoothing relationships, we can maintain the conditional expressions for β_i and δ from the case
of the monocentric city model, substituting the appropriate parameter smoothing
relation in place of β_{i−1}.

4.4.2 Informative priors
Implementing the BGWR model with diffuse priors on δ may lead to large δ
values that essentially eliminate the parameter smoothing relationship from
the model. The BGWR estimates will then collapse on the GWR estimates
(in the case of a large value for the hyperparameter r that leads to V_i = I_n).
In cases where the sample data is weak or objective prior information suggests spatial parameter smoothing should follow a particular specification,
we can use an informative prior for the parameter δ. A Gamma(a,b) prior
distribution, which has a mean of a/b and variance of a/b², seems appropriate. Given this prior, we could eliminate the conditional density for δ and
replace it with a random draw from the Gamma(a,b) distribution.
In order to devise an appropriate prior setting for δ, consider that the
GWR variance-covariance matrix is σ²(\tilde{X}'\tilde{X})^{-1}, so setting values of δ > 1
would represent a relatively loose imposition of the parameter smoothing
relationship. Values of δ < 1 would impose the parameter smoothing prior
more tightly.
A similar approach can be taken for the hyperparameter r. Using a
Gamma prior distribution with a = 8, b = 2 that indicates small values of r
around 4 should provide a fair amount of robustification if there is spatial
heterogeneity. In the absence of heterogeneity, the resulting V_i estimates
will be near unity so the BGWR distance weights will be similar to those
from GWR, even with a small value of r.
Additionally, a χ²(c,d) natural conjugate prior for the parameter σ could
be used in place of the diffuse prior set forth here. This would affect the
conditional distribution used during Gibbs sampling in only a minor way.
Some other alternatives offer additional flexibility when implementing
the BGWR model. For example, one can restrict specific parameters to
exhibit no variation over the spatial sample observations. This might be
useful when we wish to hold some coefficients fixed while allowing others to
vary over space. For example, assuming the constant term is in the first column of the matrix
X̃_i, setting the first row and column elements of (X̃_i'X̃_i)^{-1} to zero would
restrict the intercept term to remain constant over all observations.
4.4.3
Implementation details
prior.k,
(default: not used)
prior.dval, improper delta value (default=diffuse)
prior.s,
informative Gamma(s,t) prior on delta
prior.t,
(default: not used)
prior.ptype, concentric for concentric city smoothing
distance for distance based smoothing (default)
contiguity for contiguity smoothing
casetti
for casetti smoothing (not implemented)
prior.ctr, observation # of central point (for concentric prior)
prior.W,
(optional) prior weight matrix (for contiguity prior)
ndraw = # of draws
nomit = # of initial draws omitted for burn-in
--------------------------------------------------RETURNS: a results structure
results.meth
= bgwr
results.bdraw
= beta draws (ndraw-nomitxnobsxnvar) (3-d matrix)
results.sdraw
= sige draws (ndraw-nomit x 1)
results.vmean
= mean of vi draws (1 x nobs)
results.rdraw
= r-value draws (ndraw-nomit x 1)
results.ddraw
= delta draws (if diffuse prior used)
results.r
= value of hyperparameter r (if input)
results.d
= value of hyperparameter delta (if input)
results.m
= m prior parameter (if input)
results.k
= k prior parameter (if input)
results.s
= s prior parameter (if input)
results.t
= t prior parameter (if input)
results.nobs
= nobs
results.nvar
= nvars
results.ptype
= input string for parameter smoothing relation
results.xcoord = x-coordinates
results.ycoord = y-coordinates
results.ctr
= central point observation # (if concentric prior)
results.dist
= distance vector (if ptype = 0)
results.y
= y data vector
results.logpost = (nobs x 1) vector of approximate log posterior
results.time
= time taken for sampling
--------------------------------------------------NOTE: use either improper prior.rval
or informative Gamma prior.m, prior.k, not both of them
uses exponential distance weighting function exp(-d/bwidth)
where d = distance and bwidth = gwr c.v. determined bandwidth
The user also has control over options for assigning a prior to the hyperparameter r that robustifies with respect to outliers and accommodates
non-constant variance. Either an improper prior value can be set (as a rule-of-thumb I recommend r = 4), or a proper prior based on a Gamma(m,k)
distribution can be used. Here, one would try to rely on a prior in the range
of 4 to 10, because larger values produce estimates that are not robust to
heteroscedasticity or outliers. As an example, m = 8, k = 2 would implement a prior with a mean of r = 4 and a variance of r = 2, since the
mean of the Gamma distribution is m/k and the variance is m/k².
The hyperparameter δ can be handled in three ways: 1) we can simply assign an improper prior value using say, prior.dval=20 as an input
option, 2) we can input nothing about this parameter, producing a default
implementation based on a diffuse prior where δ will be estimated, and 3)
we can assign a Gamma(s,t) prior as in the case of the hyperparameter r.
Implementation with a diffuse prior for δ and a large value for the hyperparameter r will most likely reproduce the non-parametric GWR estimates,
and this approach to producing those estimates requires more computing
time. It is possible (but not likely) that a model and the sample data are
very consistent with the parameter smoothing relationship. If this occurs,
a diffuse prior for δ will produce a relatively small δ value as the posterior
estimate. In the most likely cases encountered in practice, small deviations
of the parameters from the smoothing relationship will lead to very large
estimates for δ, producing BGWR parameter estimates that come very close
to those from the non-parametric GWR model.
The value of the Bayesian approach outlined here lies in the ability to robustify against outliers, so a default value of r = 4 has been implemented
if the user enters no information regarding the hyperparameter r.
Consider how the following alternative implementations of the various
prior settings could be used to shed light on the nature of parameter variation over space. We can compare the results from a GWR model to a
BGWR model implemented with r = 4 and either a diffuse prior for δ or
an improper prior based on a large δ value. This comparison should show the
impact of robustification on the estimates, and a plot of the V_i estimates
can be used to detect outliers. Another model based on r = 4 along with
an informative prior for δ near 1 that places some weight on the parameter
smoothing restrictions can be used to see how this alters the estimates when
compared to the robust BGWR estimates. A dramatic difference between
the robust BGWR estimates and those based on the informative prior for
δ near 1 indicates that the parameter smoothing relation is inconsistent with
the sample data.
It may be necessary to experiment with alternative values of δ because
the scale is unclear in any given problem. One way to deal with the scale issue is to calibrate δ based on the diffuse estimate. As an example, if the diffuse
prior produces an estimate of δ = 10, then a value of δ < 10 will impose the parameter smoothing restriction more
tightly. To see this, consider that the GWR variance-covariance matrix is
σ²(\tilde{X}'\tilde{X})^{-1}, so using δ < 1 moves in the direction of tightening the parameter smoothing prior.
4.5
An applied exercise
The program in example 4.5 shows how to use the bgwr function to produce
estimates for the Anselin neighborhood crime data set. We begin by using
an improper prior of δ = 1000000 and setting r = 30 to demonstrate that
the BGWR model can replicate the estimates from the GWR model. The
program plots these estimates for comparison. We produce estimates for
all three parameter smoothing priors, but given the large δ = 1000000,
these relationships should not effectively enter the model. This should result in
all three sets of estimates being identical to those from the GWR. We did this to
illustrate that the BGWR can replicate the GWR estimates with appropriate
settings for the hyperparameters.
% ----- example 4.5 Using the bgwr() function
load anselin.dat; % load the Anselin data set
y = anselin(:,1); nobs = length(y); x = [ones(nobs,1) anselin(:,2:3)];
east = anselin(:,4); north = anselin(:,5); tt=1:nobs;
ndraw = 250; nomit = 50;
prior.ptype = 'contiguity'; prior.rval = 30; prior.dval = 1000000;
tic; r1 = bgwr(y,x,east,north,ndraw,nomit,prior); toc;
prior.ptype = 'concentric'; prior.ctr = 20;
tic; r2 = bgwr(y,x,east,north,ndraw,nomit,prior); toc;
dist = r2.dist; [dists di] = sort(dist); % recover distance vector
prior.ptype = 'distance';
tic; r3 = bgwr(y,x,east,north,ndraw,nomit,prior); toc;
vnames = strvcat('crime','constant','income','hvalue');
% compare gwr estimates with posterior means
info2.dtype = 'exponential';
result = gwr(y,x,east,north,info2);
bgwr = result.beta(di,:);
b1 = r1.bdraw(:,di,1); b2 = r1.bdraw(:,di,2); b3 = r1.bdraw(:,di,3);
As we can see from the graph of the GWR and BGWR estimates shown
in Figure 4.6, the three sets of BGWR estimates are nearly identical to the
GWR. Keep in mind that we don't recommend using the BGWR model
to replicate GWR estimates, as this problem involving 250 draws took 193
seconds for the contiguity prior, 185 seconds for the concentric city prior and 190
seconds for the distance prior.
Example 4.6 produces estimates based on the contiguity smoothing relationship with a value of r = 4 for this hyperparameter to indicate a prior
belief in heteroscedasticity or outliers. We keep the parameter smoothing
relationship from entering the model by setting an improper prior based
on δ = 1000000, so there is no need to run more than a single parameter
smoothing model. Estimates based on any of the three parameter smoothing
relationships would produce the same results because the large value for δ
keeps this relation from entering the model. The focus here is on the impact
of outliers in the sample data and how BGWR robust estimates compare
to the GWR estimates based on the assumption of homoscedasticity. We
specify 250 draws with the first 50 to be discarded for burn-in of the Gibbs
sampler. Figure 4.7 shows a graph of the mean of the 200 draws, which represent the posterior parameter estimates, compared to the non-parametric
estimates.
% ----- example 4.6 Producing robust BGWR estimates
% load the Anselin data set
load anselin.dat;
y = anselin(:,1); nobs = length(y);
x = [ones(nobs,1) anselin(:,2:3)];
east = anselin(:,4); north = anselin(:,5);
ndraw = 250; nomit = 50;
prior.ptype = 'contiguity'; prior.rval = 4; prior.dval = 1000000;
% large improper dval keeps the parameter smoothing relation from entering the model
result = bgwr(y,x,east,north,ndraw,nomit,prior);
vnames = strvcat('crime','constant','income','hvalue');
info.dtype = 'exponential';
result2 = gwr(y,x,east,north,info);
% compare gwr and bgwr estimates
b1 = result.bdraw(:,:,1); b1mean = mean(b1);
b2 = result.bdraw(:,:,2); b2mean = mean(b2);
b3 = result.bdraw(:,:,3); b3mean = mean(b3);
betagwr = result2.beta;
tt=1:nobs;
subplot(3,1,1),
plot(tt,betagwr(:,1),'-k',tt,b1mean,'--k');
[Figure 4.6: GWR estimates compared with BGWR estimates based on the contiguity, concentric and distance priors — b1, b2 and b3 parameters]
legend('gwr','bgwr');
xlabel('b1 parameter');
subplot(3,1,2),
plot(tt,betagwr(:,2),'-k',tt,b2mean,'--k');
xlabel('b2 parameter');
subplot(3,1,3),
plot(tt,betagwr(:,3),'-k',tt,b3mean,'--k');
xlabel('b3 parameter');
pause;
plot(result.vmean);
xlabel('Observations');
ylabel('V_{i} estimates');
[Figure 4.7: GWR versus robust BGWR estimates — b1, b2 and b3 parameters]
[Figure: BGWR versus GWR weights used for observations 30 through 35]
Ultimately, the role of the parameters Vi in the model and the prior
assigned to these parameters reflects our prior knowledge that distance alone
may not be reliable as the basis for spatial relationships between variables.
If distance-based weights are used in the presence of aberrant observations,
inferences will be contaminated for whole neighborhoods and regions in our
analysis. Incorporating this prior knowledge turns out to be relatively simple
in the Bayesian framework, and it appears to effectively robustify estimates
against the presence of spatial outliers.
[Figure: mean of the V_i draws plotted over the 49 neighborhoods]
r-value          =      4.0000
delta-value      =     25.0776
gam(m,k) d-prior =     50,     2
time in secs     =    289.2755
prior type       =    distance
***************************************************************
Obs =      1, x-coordinate= 35.6200, y-coordinate= 42.3800
Variable        Coefficient     t-statistic    t-probability
constant          74.130145       15.143596         0.000000
income            -2.208382       -9.378908         0.000000
hvalue            -0.197050       -5.166565         0.000004
Obs =      2, x-coordinate= 36.5000, y-coordinate= 40.5200
Variable        Coefficient     t-statistic    t-probability
constant          82.308344       20.600005         0.000000
income            -2.559334      -11.983369         0.000000
hvalue            -0.208478       -4.682454         0.000023
The next task was to implement the BGWR model with a diffuse prior
on the parameter δ. The results indicated that the mean of the draws for
δ was around 32 for the contiguity prior, 43 for the concentric prior and 33
for the distance prior. The BGWR estimates were almost identical to those
from the improper prior δ = 1000000, so we do not present these graphically.
Given these estimates for δ with a diffuse prior, we can impose the parameter restrictions by setting smaller values for δ, say in the range of 1
to 10. This should produce differing estimates that rely on the alternative
parameter smoothing relationships. We used δ = 1 to impose the parameter
smoothing relationships fairly tightly. The resulting parameter estimates
are shown in Figure 4.10.
Here we see some departure between the estimates based on the alternative smoothing relationships. This raises the question of which smoothing
relationship is most consistent with the data.
We can compute posterior probabilities for each of the three models
based on alternative parameter smoothing relationships using an approximation from Leamer (1983). Since this is a generally useful way of comparing
alternative Bayesian models, the bgwr() function returns a vector of the
approximate log posterior for each observation. The nature of this approximation as well as the computation is beyond the scope of our discussion
here. The program in example 4.7 demonstrates how to use the vector of
log posterior magnitudes to compute posterior probabilities for each model.
We set the parameter δ = 0.5 to impose the three alternative parameter
smoothing relationships even tighter than the hyperparameter value of δ = 1
used to generate the estimates in Figure 4.10. A successive tightening of the
parameter smoothing relationships will show which relationship is most consistent with the sample data and which relationship is rejected by the sample
data. A fourth model based on δ = 1000 is also estimated to test whether
the sample data rejects all three parameter smoothing relationships.
% ----- example 4.7 Posterior probabilities for models
% load the Anselin data set
load anselin.data; y = anselin(:,1); nobs = length(y);
x = [ones(nobs,1) anselin(:,2:3)]; [junk nvar] = size(x);
east = anselin(:,4); north = anselin(:,5);
ndraw = 550; nomit = 50; % estimate all three models
prior.ptype = 'contiguity';
prior.rval = 4; prior.dval = 0.5;
res1 = bgwr(y,x,east,north,ndraw,nomit,prior);
prior2.ptype = 'concentric';
prior2.ctr = 20; prior2.rval = 4; prior2.dval = 0.5;
res2 = bgwr(y,x,east,north,ndraw,nomit,prior2);
prior3.ptype = 'distance';
prior3.rval = 4; prior3.dval = 0.5;
in.fmt = strvcat('%4d','%8.2f','%8.2f','%8.2f','%8.2f','%8.2f');
in.cnames = strvcat('Obs','avg','contiguity','concentric',...
                    'distance','diffuse');
fprintf(1,'constant term parameter \n');
mprint(b1out,in);
b2out = [ttp bavg(:,2) bb(:,2) bb(:,5) bb(:,8) bb(:,11)];
fprintf(1,'household income parameter \n');
mprint(b2out,in);
b3out = [ttp bavg(:,3) bb(:,3) bb(:,6) bb(:,9) bb(:,12)];
in.fmt = strvcat('%4d','%8.3f','%8.3f','%8.3f','%8.3f','%8.3f');
fprintf(1,'house value parameter \n');
mprint(b3out,in);
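A minimal sketch of the probability calculation used in example 4.7 appears below; the field name lpost and the fourth, diffuse-prior result res4 are assumptions introduced only for illustration.

% Sketch: convert approximate log posteriors from each bgwr() result into
% posterior model probabilities for every observation.
% (lpost as the field name and res4 as a fourth, diffuse-prior model are assumed.)
lp = [res1.lpost res2.lpost res3.lpost res4.lpost]; % nobs x 4 matrix of log posteriors
lp = lp - max(lp,[],2)*ones(1,4);                   % scale by the row maximum to avoid overflow
prob = exp(lp) ./ (sum(exp(lp),2)*ones(1,4));       % normalize so each row sums to one
plot(prob);                                         % compare the four model probabilities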
The graphical display of the posterior probabilities produced by the program in example 4.7 is shown in Figure 4.11 and the parameter estimates
are shown in Figure 4.12. We see some divergence between the parameter estimates produced by the alternative spatial smoothing priors and the model
with no smoothing prior, whose estimates are graphed using '+' symbols,
but not a dramatic amount. Given the similarity of the parameter estimates, we would expect relatively uniform posterior probabilities for the
four models.
In Figure 4.11 the posterior probabilities for the model with no parameter smoothing are graphed as a line to make comparison with the three models
that impose parameter smoothing easy. We see certain sample observations
where the parameter smoothing relationships produce lower model probabilities than the model without parameter smoothing. For most observations,
however, the parameter smoothing relationships are relatively consistent
with the sample data, producing posterior probabilities above the model
with no smoothing relationship. As we would expect, none of the models
dominates.
A Bayesian solution to the problem of model specification and choice is
to produce a mixed or averaged model that relies on the posterior probabilities as weights. We would simply multiply the four sets of coefficient
estimates by the four probability vectors to produce a Bayesian model averaging solution to the problem of which estimates are best.
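As a small illustration, the weighted average for a single parameter could be computed as sketched below, where bmat holds that parameter's estimates from the four models and prob the four posterior probability vectors; both variable names are assumptions used only for illustration.

% Sketch: Bayesian model averaging of one parameter's estimates
% bmat is nobs x 4 (one column per model), prob is nobs x 4 (posterior probabilities)
bavg = sum(bmat .* prob, 2);   % probability-weighted average for each observation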
A tabular presentation of this type of result is shown below. The averaged results have the virtue that a single set of estimates is available from
which to draw inferences, and one can feel comfortable that the inferences
are valid for a wide variety of model specifications.
Figure 4.11: Posterior probabilities for the contiguity, concentric, distance and diffuse models plotted over the 49 observations

Figure 4.12: b1, b2 and b3 parameter estimates, + = diffuse, - = contiguity, -- = concentric, -. = distance

constant term parameter
 Obs      avg  contiguity  concentric  distance  diffuse
   1    65.92       62.72       63.53     77.64    57.16
 ...

household income parameter
 Obs      avg  contiguity  concentric  distance  diffuse
 ...

house value parameter
 Obs      avg  contiguity  concentric  distance  diffuse
 ...
4.6 Chapter Summary
We have seen that locally linear regression models can be estimated using distance weighted sub-samples of the observations to produce different
estimates for every point in space. This approach can deal with spatial heterogeneity and provide some feel for parameter variation over space in the
relationships being explored.
Some problems arise in using spatial expansion models because they tend
to produce heteroscedastic disturbances by construction. This problem is
overcome to some extent with the DARP model approach.
The non-parametric locally linear regression models produce problems
with respect to inferences about the parameters as they vary over space.
We saw how a Bayesian approach can provide valid posterior inferences
that overcome these problems.
The Bayesian GWR model also solves some problems with the non-parametric implementation of the GWR regarding non-constant variance
over space or outliers. Given the locally linear nature of the GWR estimates, aberrant observations tend to contaminate whole subsequences of
the estimates. The BGWR model robustifies against these observations by
automatically detecting and downweighting their influence on the estimates.
A further advantage of this approach is that a diagnostic plot can be used
to identify observations associated with regions of non-constant variance or
spatial outliers.
Finally, an advantage of the Bayesian approach is that it subsumes the
spatial expansion, DARP and GWR models as special cases and provides
a more flexible implementation by explicitly adding to the model a relationship that
describes parameter smoothing over space. A diffuse implementation of the parameter smoothing specification leads to the non-parametric
GWR model. In addition to replicating the GWR estimates, the Bayesian
model presented here can produce estimates based on parameter smoothing
specifications that rely on: distance decay relationships, contiguity relationships, monocentric distance from a central point, or the latitude-longitude
locations proposed by Casetti (1972).
4.7 References
Brunsdon, C., A. S. Fotheringham, and M.E. Charlton. (1996) "Geographically weighted regression: a method for exploring spatial nonstationarity," Geographical Analysis, Volume 28, pp. 281-298.

Casetti, E. (1972) "Generating Models by the Expansion Method: Applications to Geographic Research," Geographical Analysis, December, Volume 4, pp. 81-91.

Casetti, E. (1992) "Bayesian Regression and the Expansion Method," Geographical Analysis, January, Volume 24, pp. 58-74.

Casetti, E. (1982) "Drift Analysis of Regression Parameters: An Application to the Investigation of Fertility Development Relations," Modeling and Simulation 13, Part 3, pp. 961-66.

Casetti, E. and A. Can (1998) "The Econometric estimation and testing of DARP models." Paper presented at the RSAI meetings, Santa Fe, New Mexico.

Leamer, Edward (1983) "Model Choice and Specification Analysis," in Handbook of Econometrics, Volume 1, Chapter 5, Zvi Griliches and Michael Intriligator (eds.) (Amsterdam: North-Holland).

LeSage, James P. (1997) "Bayesian Estimation of Spatial Autoregressive Models," International Regional Science Review, Volume 20, number 1&2, pp. 113-129.

McMillen, Daniel P. (1996) "One Hundred Fifty Years of Land Values in Chicago: A Nonparametric Approach," Journal of Urban Economics, Volume 40, pp. 100-124.
Chapter 5
Two distributions that have traditionally been used to produce this type
of outcome in the case of regression models (ensuring predicted values
between zero and one) are the logistic and normal distributions, resulting
in the logit model shown in (5.3) and the probit model shown in (5.4), where
Φ denotes the cumulative normal probability function.

Prob(y = 1) = e^{Xβ}/(1 + e^{Xβ})        (5.3)

Prob(y = 1) = Φ(Xβ)                      (5.4)

The logistic distribution is similar to the normal except in the tails where
it is fatter, resembling a Student t-distribution. Green (1997) and others
indicate that the logistic distribution resembles a t-distribution with seven
degrees of freedom.
5.1 Introduction
McMillen (1992) proposed methods for estimating SAR and SEM probit
models containing spatial heteroscedasticity that rely on the EM algorithm.
Aside from McMillen (1992), very little work has appeared regarding spatial autoregressive models that contain binary or polychotomous dependent
variables.
McMillen (1995) investigates the impact of heteroscedasticity that is
often present in spatial models on probit estimation for non-spatial autoregressive models. McMillen and McDonald (1998) propose a non-parametric
locally linear probit method for GWR models of the type discussed in Chapter 4.
Bayesian estimation of logit/probit and tobit variants of spatial autoregressive models that exhibit heteroscedasticity is developed in this chapter.
The approach taken draws on work by Chib (1992) and Albert and Chib
(1993) as well as the Bayesian estimation of spatial autoregressive models
set forth in Chapter 3.
Accounting for heteroscedasticity in logit/probit and tobit models is important because estimates based on the assumption of homoscedasticity in
arise from the methods in Chapter 3 to deal with spatial heterogeneity and
spatial outliers. Recall that models involving binary data can rely on any
continuous probability function as the probability rule linking fitted probabilities with the binary observations. Probit models arise from a normal
probability rule and logit models from a logistic probability rule. When
one introduces the latent variables z_i in the probit model to reflect unobserved values based on the binary dependent variables y_i, we have an underlying conditional regression involving z and the usual spatial regression
model variables X, W, where X represents the explanatory variables and
W denotes the row-standardized spatial weight matrix. The heteroscedastic
spatial autoregressive model introduced in Chapter 3 can be viewed in the
case of binary dependent variables as a probability rule based on a family
of t-distributions that represent a mixture of the underlying normal distribution used in the probit regression. (It is well known that the t-distribution
can be modeled as a scale mixture of normal distributions, see Albert and
Chib, 1993.)
The most popular choice of probability rule to relate fitted probabilities
with binary data is the logit function, corresponding to a logistic distribution
for the cumulative distribution function. Albert and Chib (1993) show that the
quantiles of the logistic distribution correspond to a t-distribution with around
7 or 8 degrees of freedom. We also know that the normal probability density is similar to a t-distribution when the degrees of freedom are large.
This allows us to view both the probit and logit models as special cases of
the family of models introduced here, using a chi-squared prior based on a
hyperparameter specifying alternative degrees of freedom to model spatial
heterogeneity and outliers.
By using alternative values for the prior hyperparameter that we labeled
r in Chapter 3, one can test the sensitivity of the fitted probabilities to
alternative distributional choices for the regression model. For example, if
we rely on a value of r near 7 or 8, the estimates resulting from the Bayesian
version of the heteroscedastic probit model correspond to those one would
achieve using a logit model. On the other hand, using a large degrees of
freedom parameter, say r = 50, would lead to estimates that produce fitted
probabilities based on the probit model choice of a normal probability rule.
The implication of this is that the heteroscedastic spatial probit model we
introduce here represents a more general model than either probit or logit.
The generality derives from the family of t-distributions associated with
alternative values of the hyperparameter r in the model.
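For example, using the data and settings of example 5.1 below, the two probability rules can be compared simply by changing the hyperparameter, as in this sketch.

% Sketch: logit-like versus probit-like estimates via the r hyperparameter
prior.rval = 7;                                % t-distribution with 7 dof, logit-like rule
res1 = sarp_g(yc,x,W,ndraw,nomit,prior);
prior.rval = 50;                               % large dof, normal (probit-like) rule
res2 = sarp_g(yc,x,W,ndraw,nomit,prior);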
A final advantage of the method described here is that estimates of the
non-constant variance for each point in space are provided, and the practitioner need not specify a functional form for the non-constant variance.
Spatial outliers or aberrant observations as well as patterns of spatial heterogeneity will be identified in the normal course of estimation. This represents a considerable improvement over the approach described by McMillen
(1992), where a separate model for the non-constant variance needs to be
specified.
5.2
where ỹ_i denotes the predicted value based on the elements ω_ij of
(I_n − ρW)^{−1} for both the SAR and SEM models. The pdf of the latent
variables z_i is then:

f(z_i | ρ, β, σ) = [1 − Φ(ỹ_i/σ_ti)]^{−1} exp[−(z_i − ỹ_i)²/(2σ²_ti)],  if z_i ≤ 0
                 = 0,                                                    if z_i > 0        (5.5)

Similarly, for the case of probit, the conditional distribution of z_i given
all other parameters is:

f(z_i | ρ, β, σ) ∝ N(ỹ_i, σ²_pi) truncated at the left by 0,   if y_i = 1
                   N(ỹ_i, σ²_pi) truncated at the right by 0,  if y_i = 0        (5.6)

where σ²_pi = Σ_j ω²_ij σ². Because the probit model is unable to identify both
σ² and β, we scale the problem so that σ² equals unity. The predicted
value ỹ_i takes the same form for the SAR and SEM models as described
above for the case of tobit.
The tobit expression (5.5) indicates that we rely on the actual observed
y values for non-censored observations and use the sampled latent variables
for the unobserved values of y. For the case of the probit model, we replace
values of y_i = 1 with sampled normals truncated at the left by 0 and
values of y_i = 0 with sampled normals truncated at the right by 0.
Given these sampled continuous variables from the conditional distribution of z_i, we can implement the remaining steps of the Gibbs sampler
described in Chapter 3 to determine draws from the conditional distributions for ρ, β (and σ in the case of tobit), using the sampled z_i values in
place of the censored variables y_i.
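A minimal sketch of this sampling step for the probit case is shown below. It draws each latent z_i from a normal distribution centered on the predicted value and truncated at zero, using inverse-CDF draws written with the MATLAB erf and erfinv functions; the names y, yhat and z are illustrative only.

% Sketch: one Gibbs pass drawing the latent variables z for the probit model.
% y is the 0,1 vector, yhat holds the current predicted values, sigma^2 is scaled to one.
n = length(y); z = zeros(n,1);
for i=1:n
 p0 = 0.5*(1+erf((0-yhat(i))/sqrt(2)));   % Prob(z(i) <= 0) under N(yhat(i),1)
 if y(i) == 1                             % truncate at the left by 0
  u = p0 + (1-p0)*rand(1,1);
 else                                     % truncate at the right by 0
  u = p0*rand(1,1);
 end;
 z(i) = yhat(i) + sqrt(2)*erfinv(2*u-1);  % inverse-cdf draw from the truncated normal
end;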
5.3 Heteroscedastic models
The models described in this section can be expressed in the form shown
in (5.7), where we relax the usual assumption of homogeneity for the disturbances used in SAR, SEM and SAC modeling. Given the discussion in
Section 5.2, we can proceed as if we had non-censored observations on the
dependent variable y, because censored observations can be replaced with
sampled values z as motivated in the previous section.

y = ρW1 y + Xβ + u
u = λW2 u + ε
ε ~ N(0, σ²V),   V = diag(v1, v2, . . ., vn)        (5.7)
5.4
We have functions sarp_g and sart_g that carry out Gibbs sampling estimation of the probit and tobit spatial autoregressive models. The documentation for sarp_g is:
PURPOSE: Gibbs sampling spatial autoregressive Probit model
y = p*Wy + Xb + e, e is N(0,sige*V)
y is a 0,1 vector
V = diag(v1,v2,...vn), r/vi = ID chi(r)/r, r = Gamma(m,k)
B = N(c,T), sige = gamma(nu,d0), p = diffuse prior
---------------------------------------------------
USAGE: results = sarp_g(y,x,W,ndraw,nomit,prior,start)
where: y = dependent variable vector (nobs x 1)
x = independent variables matrix (nobs x nvar)
W = 1st order contiguity matrix (standardized, row-sums = 1)
ndraw = # of draws
nomit = # of initial draws omitted for burn-in
prior = a structure for: B = N(c,T), sige = gamma(nu,d0)
prior.beta, prior means for beta,
c above (default 0)
prior.bcov, prior beta covariance , T above (default 1e+12)
prior.rval, r prior hyperparameter, default=4
prior.m,
informative Gamma(m,k) prior on r
prior.k,
(default: not used)
prior.nu,
a prior parameter for sige
prior.d0,
(default: diffuse prior for sige)
prior.rmin = (optional) min rho used in sampling
prior.rmax = (optional) max rho used in sampling
start = (optional) structure containing starting values:
defaults: beta=1,sige=1,rho=0.5, V= ones(n,1)
start.b
= beta starting values (nvar x 1)
start.p
= rho starting value
(scalar)
start.sig = sige starting value (scalar)
start.V
= V starting values (n x 1)
---------------------------------------------------
RETURNS: a structure:
results.meth = sarp_g
results.bdraw = bhat draws (ndraw-nomit x nvar)
results.sdraw = sige draws (ndraw-nomit x 1)
results.vmean = mean of vi draws (1 x nobs)
results.ymean = mean of y draws (1 x nobs)
results.rdraw = r draws (ndraw-nomit x 1) (if m,k input)
results.pdraw = p draws
(ndraw-nomit x 1)
results.pmean = b prior means, prior.beta from input
results.pstd = b prior std deviations sqrt(diag(T))
results.r
= value of hyperparameter r (if input)
results.r2mf = McFadden R-squared
results.rsqr = Estrella R-squared
196
results.nobs = # of observations
results.nvar = # of variables in x-matrix
results.zip
= # of zero y-values
results.ndraw = # of draws
results.nomit = # of initial draws omitted
results.y
= actual observations (nobs x 1)
results.yhat = predicted values
results.nu
= nu prior parameter
results.d0
= d0 prior parameter
results.time = time taken for sampling
results.accept= acceptance rate
results.rmax = 1/max eigenvalue of W (or rmax if input)
results.rmin = 1/min eigenvalue of W (or rmin if input)
This function documentation and use is very similar to the sar_g function
from Chapter 3. One difference is in the measures of fit calculated. These
are R-squared measures that are traditionally used for limited dependent
variable models.
Following an example provided by McMillen (1992) for his EM algorithm
approach to estimating SAR and SEM probit models, we employ the data
set from Anselin (1988) on crime in Columbus, Ohio. McMillen censored the
dependent variable on crime such that y_i = 1 for values of crime greater than
40 and y_i = 0 for values of crime less than or equal to 40. The explanatory
variables in the model are neighborhood housing values and neighborhood
income. Example 5.1 demonstrates how to implement a spatial probit model
Gibbs sampler using the sarp_g function.
% ----- Example 5.1 SAR Probit Model
load anselin.data;
y = anselin(:,1); [n junk] = size(y);
x = [ones(n,1) anselin(:,2:3)];
vnames = strvcat('crime','constant','income','hvalue');
load Wmat.data; W = Wmat;
yc = zeros(n,1);
% now convert the data to 0,1 values
for i=1:n
 if y(i,1) > 40.0
  yc(i,1) = 1;
 end;
end;
ndraw = 1100; nomit = 100;
prior.rval = 4; prior.rmin = 0; prior.rmax = 1;
result = sarp_g(yc,x,W,ndraw,nomit,prior);
prt(result,vnames);
plt(result,vnames);
The printed results are shown below and the graphical results provided
by the plt function are shown in Figure 5.1. For comparison we also present
the results from ignoring the limited dependent variable nature of the y
variable in this model and using the sar function to produce maximum
likelihood estimates.
Gibbs sampling spatial autoregressive Probit model
Dependent Variable =      crime
McFadden R^2    =    0.4122
Estrella R^2    =    0.5082
sigma^2         =    2.4771
r-value         =         4
Nobs, Nvars     =     49,     3
# 0, 1 y-values =     30,    19
ndraws,nomit    =   1100,   100
acceptance rate =    0.7836
time in secs    =   69.0622
min and max rho =    0.0000,    1.0000
***************************************************************
Variable        Prior Mean      Std Deviation
constant          0.000000     1000000.000000
income            0.000000     1000000.000000
hvalue            0.000000     1000000.000000
***************************************************************
      Posterior Estimates
Variable       Coefficient     t-statistic     t-probability
constant          3.616236        2.408302          0.020092
income           -0.197988       -1.698013          0.096261
hvalue           -0.036941       -1.428174          0.159996
rho               0.322851        2.200875          0.032803

Spatial autoregressive Model Estimates
R-squared       =    0.5216
Rbar-squared    =    0.5008
sigma^2         =    0.1136
Nobs, Nvars     =     49,     3
log-likelihood  =  -1.1890467
# of iterations =        13
min and max rho =   -1.5362,    1.0000
***************************************************************
Variable       Coefficient     t-statistic     t-probability
variable 1        0.679810        2.930130          0.005259
variable 2       -0.019912       -1.779421          0.081778
variable 3       -0.005525       -1.804313          0.077732
rho               0.539201        2.193862          0.033336

Gibbs sampling spatial autoregressive Probit model
Dependent Variable =      crime
McFadden R^2    =    0.3706
Estrella R^2    =    0.4611
sigma^2         =    2.2429
r-value         =        40
Nobs, Nvars     =     49,     3
# 0, 1 y-values =     30,    19
ndraws,nomit    =   1100,   100
acceptance rate =    0.9616
time in secs    =   70.3423
min and max rho =    0.0000,    1.0000
***************************************************************
Variable        Prior Mean      Std Deviation
constant          0.000000     1000000.000000
income            0.000000     1000000.000000
hvalue            0.000000     1000000.000000
***************************************************************
      Posterior Estimates
Variable       Coefficient     t-statistic     t-probability
constant          4.976554        2.408581          0.020078
income           -0.259836       -2.047436          0.046351
hvalue           -0.053615       -1.983686          0.053278
rho               0.411042        3.285293          0.001954
There is a remarkable difference between the maximum likelihood estimates, which ignore the limited dependent nature of the variable y, and the
Gibbs probit estimates, as we would expect. To explore the difference between
logit and probit estimates, we produced a set of Bayesian estimates based on a hyperparameter
value of r = 40, which would correspond to the probit model. Green (1997)
states that the issue of which distributional form should be used on applied
econometric problems is unresolved. He further indicates that inferences
from either logit or probit models are often the same. This does not appear
to be the case for the data set in example 5.1, where we see a difference
in both the magnitude and significance of the parameters from the models
based on r = 4 and r = 40.
A final point with respect to interpreting the estimates is that the marginal
impacts of the variables on the fitted probabilities are usually the inference
aim of these models. The estimates would need to be converted according
to the probability rule implied by the alternative underlying t-distributions
associated with the r value employed. These marginal impacts would be
very similar (as in the case of non-spatial maximum likelihood logit versus
probit marginal impacts) despite the apparent differences in the coefficients.
For the purpose of computing marginal impacts, one needs to evaluate the
posterior density of p_k, which we denote φ(p_k), for k ranging over all sample
observations. (It is conventional to assess marginal impacts across all observations.)
Figure 5.1: Actual versus predicted values, residuals, and the mean of the V_i draws from the SAR probit model
φ(p_k) = (1/m) Σ_{i=1}^{m} N[ ỹ_{ki}, (1/v_{ki}) x'_k (X'V_i^{−1}X)^{−1} x_k ] / N[0, 1]        (5.8)
estimates are reported for both r = 4 and r = 40. Results for two values of r
were reported because the inferences are different for these two models. It
should be noted that there is a tendency of the consistent EM estimates of
dispersion to overstate the precision, producing generally larger t-statistics
for the EM versus Gibbs estimates.
Table 5.1: EM versus Gibbs estimates

                EM       Gibbs       Gibbs        EM       Gibbs       Gibbs
               SAR   SAR r = 4  SAR r = 40       SEM   SEM r = 4  SEM r = 40
CONSTANT     2.587       3.758       4.976     2.227       3.925       2.710
t-value      2.912       2.150       2.408     3.115       1.936       2.168
INCOME      -0.128      -0.213      -0.259    -0.123      -0.214      -0.143
t-value     -2.137      -1.583      -2.047    -2.422      -1.636      -1.719
HOUSING     -0.029      -0.037      -0.053    -0.025      -0.048      -0.032
t-value     -1.617      -1.416      -1.983    -1.586      -1.439      -1.791
rho          0.429       0.325       0.411     0.279       0.311       0.315
t-value                  2.175       3.285                 1.766       1.796
Another reason why the EM and Gibbs estimates may be different is that
McMillen's approach requires that a model for the non-constant variance be
specified. The specification used by McMillen was: v_i = 0.0007*INCOME² +
0.0004*HOUSING². This is quite different from the approach taken in the
Bayesian model, which relies on v_i estimates from Gibbs sampling.
There are functions for carrying out Bayesian probit and tobit versions
of all spatial autoregressive models. SAR models are implemented by sarp_g
and sart_g, SEM models by semp_g and semt_g, and SAC models by sacp_g
and sact_g.
Tobit models can involve either left or right truncation. That is, censored
observations can be y values that fall below a limiting value, or censoring can
take place above a limit value. The functions sart_g, sact_g and semt_g
allow the user to specify the type of censoring and a limit value. This
defaults to the typical case of left censoring at zero. The documentation for
the function sart_g is:
PURPOSE: Gibbs sampling spatial autoregressive Tobit model
 y = p*Wy + Xb + e, e is N(0,sige*V)
 y is a censored vector (assumed to be at zero)
 V = diag(v1,v2,...vn), r/vi = ID chi(r)/r, r = Gamma(m,k)
 B = N(c,T), sige = gamma(nu,d0), p = diffuse prior
---------------------------------------------------
USAGE: results = sart_g(y,x,W,prior,ndraw,nomit,start)
where: y = dependent variable vector (nobs x 1)
The printed results for an SAR model based on the uncensored data as
well as a set of estimates that ignore the sample censoring and the Tobit
version of the SAR model are presented below.
Spatial autoregressive Model Estimates
Dependent Variable =      crime
R-squared       =    0.7344
Rbar-squared    =    0.7229
sigma^2         =    5.6564
Nobs, Nvars     =     49,     3
log-likelihood  =  -99.041858
# of iterations =        13
min and max rho =   -1.5362,    1.0000
***************************************************************
Variable       Coefficient     t-statistic     t-probability
constant          0.617226        1.328646          0.190519
income            1.064967        2.271643          0.027831
hvalue            0.897449        2.254232          0.028988
rho               0.724180        5.068285          0.000007

Gibbs sampling spatial autoregressive model
Dependent Variable =      crime
R-squared       =    0.7013
sigma^2         =    3.4570
r-value         =        30
Nobs, Nvars     =     49,     3
ndraws,nomit    =    600,   100
acceptance rate =    0.9662
time in secs    =   14.4129
min and max rho =   -1.5362,    1.0000
***************************************************************
Variable        Prior Mean      Std Deviation
constant          0.000000     1000000.000000
income            0.000000     1000000.000000
hvalue            0.000000     1000000.000000
***************************************************************
      Posterior Estimates
Variable       Coefficient     t-statistic     t-probability
constant          1.412658        3.037593          0.003922
income            1.253225        3.479454          0.001111
hvalue            0.364549        1.267454          0.211372
rho               0.599010        6.097006          0.000000

Gibbs sampling spatial autoregressive Tobit model
Dependent Variable =      crime
R-squared         =    0.6977
sigma^2           =    6.9121
r-value           =        30
Nobs, Nvars       =     49,     3
# censored values =        16
ndraws,nomit      =    600,   100
acceptance rate   =    0.9120
time in secs      =   19.7692
min and max rho   =   -1.5362,    1.0000
***************************************************************
Variable        Prior Mean      Std Deviation
constant          0.000000     1000000.000000
income            0.000000     1000000.000000
hvalue            0.000000     1000000.000000
***************************************************************
      Posterior Estimates
Variable       Coefficient     t-statistic     t-probability
constant          0.936753        1.847863          0.071058
income            1.479955        2.844135          0.006624
hvalue            0.544580        1.320079          0.193340
rho               0.629394        6.446799          0.000000
5.5 An applied example
The Harrison and Rubinfeld Boston data set used in Chapter 3 to illustrate
Gibbs sampling Bayesian spatial autoregressive models contains censored
values. Median house values greater than $50,000 were set equal to 50,000
for 16 of the 506 sample observations (see Gilley and Pace, 1995). This
provides an opportunity to see if using tobit estimation to take the sample
truncation into account produces different parameter estimates.
Example 5.3 reads the Boston data set and sorts by median housing
values. Note that we must also sort the explanatory variables using the
index vector yind returned by the sort for the y values as well as the latitude
and longitude vectors. After carrying out Gibbs sampling estimation for the
SAR, SEM and SAC models, we add to the prior structure variable a field
for right-truncation and we supply the limit value which is the log of 50,000
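A sketch of the kind of call made in example 5.3 is shown below; the field names prior.trunc and prior.limit, as well as the names ys and xs for the sorted data, are assumptions used only to illustrate the idea and should be checked against the sart_g documentation.

% Sketch: right-censored tobit estimation for the sorted Boston sample
% (the truncation field names and the sorted-data names are assumed for illustration)
prior.rval = 4;
prior.trunc = 'right';        % censoring occurs above a limit value
prior.limit = log(50000);     % censoring point, the log of the $50,000 cap
result = sart_g(ys,xs,W,prior,ndraw,nomit);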
Figure: Actual values of crime and the mean of the Gibbs sampled y values plotted over the observations
   Coefficient     t-statistic     t-probability
     -0.089610       -1.803644          0.071899
      0.267168        5.731838          0.000000
     -0.036438       -0.878105          0.380315
     -0.178140       -5.673916          0.000000
      0.188203        2.713291          0.006895
     -0.212748       -5.525728          0.000000
     -0.117601       -3.980184          0.000079
      0.107424        3.873596          0.000122
     -0.313225       -7.068474          0.000000
      0.314435        3.212364          0.001403

      0.728140       12.943930          0.000000
   Coefficient     t-statistic     t-probability
     -0.040444       -0.967258          0.333890
     -0.155084       -4.123056          0.000044
      0.045257        1.152237          0.249783
      0.005911        0.111250          0.911463
     -0.003574       -0.144284          0.885335
     -0.106601       -1.739330          0.082602
      0.292600        6.640997          0.000000
     -0.104126       -2.252888          0.024706
     -0.173827       -3.428468          0.000658
      0.214266        2.627556          0.008869
     -0.239525       -5.153612          0.000000
     -0.110112       -4.067087          0.000055
      0.131408        4.760042          0.000003
     -0.283951       -6.968683          0.000000
      0.666444        9.336428          0.000000
      0.100070        1.795139          0.073245
Contrary to our expectation that these two sets of estimates would produce identical inferences, an interesting and perhaps substantive conclusion
arises. In comparing the estimates that ignore sample censoring to the tobit
estimates we find further evidence regarding the noxsq air pollution variable. For all of the tobit models, this variable is less significant than for
the non-tobit models. Recall that in Chapter 3 we found that maximum
likelihood estimates produced estimates for this variable that were significantly different from zero for all three spatial autoregressive models at the
traditional 5% level. After introducing the Bayesian heteroscedastic spatial
models we found estimates that were not significantly different from zero
at the 5% level, but still significant at the 10% level. After introducing tobit variants of the Bayesian heteroscedastic spatial models we find further
movement away from significance for this variable. In the case of the SEM
model we find that noxsq has a marginal probability of 0.16. Recall that
the SEM and SAC models were judged to be most appropriate for this data
set.
It is especially interesting that none of the other variables in the models
change in significance or magnitude by very much. This is as we would
expect, given the small number (16) of censored observations in a relatively
large sample (506).
5.6 Chapter Summary
A Gibbs sampling approach to estimating heteroscedastic spatial autoregressive and spatial error probit and tobit models was presented. With the
5.7 References
Albert, James H. and Siddhartha Chib (1993) "Bayesian Analysis of Binary and Polychotomous Response Data," Journal of the American Statistical Association, Volume 88, number 422, pp. 669-679.

Anselin, Luc (1988) Spatial Econometrics: Methods and Models, (Dordrecht: Kluwer Academic Publishers).
Chapter 6
6.1 VAR models
[ y_1t ]   [ A_11(ℓ)  . . .  A_1n(ℓ) ] [ y_1t ]   [ C_1 ]   [ ε_1t ]
[ y_2t ] = [   .        .       .    ] [ y_2t ] + [ C_2 ] + [ ε_2t ]        (6.1)
[  ..  ]   [   .          .     .    ] [  ..  ]   [  ..  ]  [  ..  ]
[ y_nt ]   [ A_n1(ℓ)  . . .  A_nn(ℓ) ] [ y_nt ]   [ C_n ]   [ ε_nt ]

where the A_ij(ℓ) are polynomials in the lag operator ℓ, the C_i are constant terms and the ε_it are disturbances.
The VAR model posits a set of relationships between past lagged values
of all variables in the model and the current value of each variable in the
model. For example, if the y_it represent employment in state i at time
t, the VAR model structure allows employment variation in each state to
be explained by past employment variation in the state itself, y_{i,t-k}, k =
1, . . . , m, as well as past employment variation in other states, y_{j,t-k}, k =
1, . . . , m, j ≠ i. This is attractive since regional or state differences in
business cycle activity suggest lead/lag relationships in employment of the
type set forth by the VAR model structure.
The model is estimated using ordinary least-squares, so we can draw on
our ols routine from the Econometrics Toolbox. A function var produces
estimates for the coefficients in the VAR model as well as related regression
statistics and Granger-causality test statistics.
The documentation for the var function is:
PURPOSE: performs vector autoregressive estimation
---------------------------------------------------
USAGE: result = var(y,nlag,x)
where:
y
= an (nobs x neqs) matrix of y-vectors
nlag = the lag length
x
= optional matrix of variables (nobs x nx)
NOTE:
constant vector automatically included
---------------------------------------------------
RETURNS a structure
results.meth = var
results.nobs = nobs, # of observations
results.neqs = neqs, # of equations
results.nlag = nlag, # of lags
results.nvar = nlag*neqs+nx+1, # of variables per equation
--- the following are referenced by equation # ---
results(eq).beta = bhat for equation eq
results(eq).tstat = t-statistics
results(eq).tprob = t-probabilities
results(eq).resid = residuals
results(eq).yhat = predicted values
results(eq).y
= actual values
results(eq).sige = ee/(n-k)
results(eq).rsqr = r-squared
215
This function utilizes a new aspect of MATLAB structure variables: arrays that can store information for each equation in the VAR model. Estimates of the parameters for the first equation can be accessed from
the results structure using result(1).beta, as can other results that are
equation-specific.
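For instance, equation-specific fields can be pulled from the structure array in a loop, as in the brief sketch below (which follows example 6.1's two-equation model).

% Sketch: loop over the equations to extract an equation-specific field
for j=1:2
 fprintf(1,'equation %d R-squared = %8.4f\n',j,result(j).rsqr);
end;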
In most applications, the user would simply pass the results structure
on to the prt function that will provide an organized printout of the regression results for each equation. Here is an example of a typical program to
estimate a VAR model.
% ----- Example 6.1 Using the var() function
dates = cal(1982,1,12);
load test.dat;               % monthly mining employment in 8 states
y = growthr(test(:,1:2),12); % (use only two states) convert to growth-rates
yt = trimr(y,dates.freq,0);  % truncate to account for lags in growth-rates
dates = cal(1983,1,1);       % redefine calendar for truncation
vnames = strvcat('illinos','indiana');
nlag = 2;
result = var(yt,nlag);       % estimate 2-lag VAR model
prt(result,vnames);          % printout results
****** Granger Causality Tests *******
Variable        F-value           Probability
illinos         363.613553        0.000000
indiana           7.422536        0.000837

Dependent Variable =     indiana
R-squared     =   0.8236
Rbar-squared  =   0.8191
sige          =   6.5582
Q-statistic   =   0.0392
Nobs, Nvars   =    159,     5
******************************************************************
Variable        Coefficient     t-statistic     t-probability
illinos lag1       0.258853        2.334597          0.020856
illinos lag2      -0.195181       -1.795376          0.074555
indiana lag1       0.882544       10.493894          0.000000
indiana lag2      -0.029384       -0.349217          0.727403
constant          -0.129405       -0.487170          0.626830
****** Granger Causality Tests *******
Variable        F-value           Probability
illinos           2.988892        0.053272
indiana         170.063761        0.000000
There are two utility functions that help analyze VAR model Granger-causality output. The first is pgranger, which prints a matrix of the
marginal probabilities associated with the Granger-causality tests in a convenient format for the purpose of inference. The documentation is:
PURPOSE: prints VAR model Granger-causality results
--------------------------------------------------
USAGE: pgranger(results,varargin);
where: results = a structure returned by var(), ecm()
       varargin = a variable input list containing
       vnames = an optional variable name vector
       cutoff = probability cutoff used when printing
usage example 1: pgranger(result,0.05);
      example 2: pgranger(result,vnames);
      example 3: pgranger(result,vnames,0.01);
      example 4: pgranger(result,0.05,vnames);
--------------------------------------------------
e.g. cutoff = 0.05 would only print
marginal probabilities < 0.05
--------------------------------------------------
NOTES: constant term is added automatically to vnames list
       you need only enter VAR variable names plus deterministic
--------------------------------------------------
As an example of using this function, consider our previous program to estimate the VAR model for monthly mining employment in eight states. Rather
than print out the detailed VAR estimation results, we might be interested
in drawing inferences regarding Granger-causality from the marginal probabilities. The following program would produce a printout of just these probabilities. It utilizes an option to suppress printing of probabilities greater
than 0.1, so that our inferences would be drawn on the basis of a 90% confidence level.
% ----- Example 6.2 Using the pgranger() function
dates = cal(1982,1,12);      % monthly data starts in 82,1
load test.dat;               % monthly mining employment in 8 states
y = growthr(test,12);        % convert to growth-rates
yt = trimr(y,dates.freq,0);  % truncate
dates = cal(1983,1,1);       % redefine the calendar for truncation
vname = strvcat('il','in','ky','mi','oh','pa','tn','wv');
nlag = 12;
res = var(yt,nlag);          % estimate 12-lag VAR model
cutoff = 0.1;                % examine Granger-causality at 90% level
pgranger(res,vname,cutoff);  % print Granger probabilities
       pa     tn     wv
il   0.04   0.09   0.02
in    NaN    NaN    NaN
ky    NaN   0.07    NaN
mi    NaN    NaN    NaN
oh   0.01    NaN   0.01
pa   0.00    NaN   0.06
tn   0.09   0.00    NaN
wv   0.00    NaN   0.03
The format of the output is such that the columns reflect the Granger-causal impact of the column-variable on the row-variable. That is, Indiana,
Kentucky, Pennsylvania, Tennessee and West Virginia exert a significant
Granger-causal impact on Illinois employment whereas Michigan and Ohio
do not. Indiana exerts the most impact, affecting Illinois, Michigan, Ohio,
Tennessee, and West Virginia.
The second utility is a function pftest that prints just the Granger-causality joint F-tests from the VAR model. Use of this function is similar
to pgranger; we simply call the function with the results structure returned
by the var function, e.g., pftest(result,vnames), where the vnames argument is an optional string-vector of variable names. This function would
produce the following output for each equation of a VAR model based on
all eight states:
****** Granger Causality Tests *******
Equation illinois     F-value      F-probability
illinois             395.4534         0.0000
indiana                3.3255         0.0386
kentucky               0.4467         0.6406
michigan               0.6740         0.5112
ohio                   2.9820         0.0536
pennsylvania           6.6383         0.0017
tennessee              0.9823         0.3768
west virginia          3.0467         0.0504
A handy option on the prt function is the ability to print the VAR model
estimation results to an output file. Because these results are quite large,
they can be difficult to examine in the MATLAB command window.
In addition to the prt function, there is a plt function that produces graphs of the
actual versus predicted values and residuals for these models.
One final issue associated with specifying a VAR model is the lag length
to employ. A commonly used approach to determining the lag length is to
perform statistical tests of models with longer lags versus shorter lag lengths.
We view the longer lag model as an unrestricted model and the shorter lag
version as a restricted model, and construct a likelihood ratio statistic
to test for the significance of imposing the restrictions. If the restrictions
are associated with a statistically significant degradation in model fit, we
conclude that the longer lag length model is more appropriate, rejecting the
shorter lag model.
Specifically, the chi-squared distributed test statistic, which has degrees
of freedom equal to the number of restrictions imposed, is:

LR = (T − c)(log|Σ_r| − log|Σ_u|)        (6.2)

where T is the number of observations, c is a degrees of freedom correction, and Σ_r, Σ_u denote the residual covariance matrices from the restricted and unrestricted models.
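The toolbox function lrratio, referred to in the next section, automates this sequence of tests; a sketch of a call is shown below, with the argument order assumed for illustration only.

% Sketch: likelihood-ratio tests of VAR lag length (argument order assumed)
maxlag = 12; minlag = 3; sims = 1;   % sims = 1 requests the Sims small-sample correction
lrratio(yt,maxlag,minlag,sims);      % prints LR statistics and marginal probabilities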
with Sims correction
LR statistic =  75.6240, probability = 0.1517
LR statistic =  89.9364, probability = 0.01798
LR statistic =  91.7983, probability = 0.01294
LR statistic = 108.8114, probability = 0.0004052
LR statistic = 125.7240, probability = 6.573e-06
LR statistic = 114.2624, probability = 0.0001146
6.2
This would be used to test a time-series vector for I(1) or I(0) status.
Allowance is made for polynomial time trends as well as constant terms in
the function, and a set of critical values are returned in a structure by the
function. A function prt_coint (as well as prt) can be used to print output
from adf, cadf and johansen, saving users the work of formatting and
printing the result structure output.
The function cadf is used for the case of two variables, y_t, x_t, where we
wish to test whether the condition y_t = βx_t can be interpreted as an equilibrium relationship between the two series. The function documentation
is:
PURPOSE: compute augmented Dickey-Fuller statistic for residuals
         from a cointegrating regression, allowing for deterministic
         polynomial trends
------------------------------------------------------------
USAGE: results = cadf(y,x,p,nlag)
where: y = dependent variable time-series vector
       x = explanatory variables matrix
       p = order of time polynomial in the null-hypothesis
           p = -1, no deterministic part
           p = 0, for constant term
           p = 1, for constant plus time-trend
           p > 1, for higher order polynomial
       nlag = # of lagged changes of the residuals to include in regression
------------------------------------------------------------
RETURNS: results structure
         results.meth  = cadf
         results.alpha = autoregressive parameter estimate
         results.adf   = ADF t-statistic
         results.crit  = (6 x 1) vector of critical values
                         [1% 5% 10% 90% 95% 99%] quantiles
         results.nvar  = cols(x)
         results.nlag  = nlag
load test.dat;
y = test(:,1:2); % use only two series
vnames = strvcat('illinois','indiana');
% test Illinois for I(1) status
nlags = 6;
for i=1:nlags;
 res = adf(y(:,1),0,i);
 prt(res,vnames(1,:));
end;
% test Indiana for I(1) status
nlags = 6;
for i=1:nlags;
 res = adf(y(:,2),0,i);
 prt(res,vnames(2,:));
end;
% test if Illinois and Indiana are co-integrated
for i=1:nlags;
 res = cadf(y(:,1),y(:,2),0,i);
 prt(res,vnames);
end;
We see from the adf function results that both Illinois and Indiana are
I(1) variables. We reject the augmented Dickey-Fuller hypothesis of I(0)
because our t-statistics for both Illinois and Indiana are less than (in absolute
value terms) the critical value of -2.588 at the 90% level.
From the results of cadf we find that Illinois and Indiana mining employment are not co-integrated, again because the t-statistic of -1.67 does not
exceed the 90% critical value of -3.08 (in absolute value terms). We would
conclude that an EC model is not appropriate for these two time-series.
For most EC models, more than two variables are involved, so the Engle
and Granger two-step procedure needs to be generalized. Johansen (1988)
provides this generalization, which takes the form of a likelihood-ratio test.
We implement this test in the function johansen. The Johansen procedure
provides a test statistic for determining r, the number of co-integrating
relationships between the n variables in y_t, as well as a set of r co-integrating
vectors that can be used to construct error correction variables for the EC
model.
As a brief motivation for the work carried out by the johansen function,
we start with a reparameterization of the EC model:

Δy_t = Γ_1 Δy_{t−1} + . . . + Γ_{k−1} Δy_{t−k+1} + Π y_{t−k} + ε_t        (6.5)
our EC model. In practice, the function ecm does this for you, so you need
not worry about the details.
The documentation for johansen is shown below. A few points to note:
the published literature contains critical values for the trace statistic for
VAR models with up to 12 variables in Johansen (1995), and for the maximal
eigenvalue statistic, Johansen and Juselius (1988) present critical values for
VAR models containing up to 5 variables. To extend the number of variables
for which critical values are available, a procedure by MacKinnon (1996) was
used to generate critical values for both the trace and maximal eigenvalue
statistics for models with up to 12 variables. MacKinnon's method is an
approximation, but it produces values close to those in Johansen (1995). The
critical values for the trace statistic have been entered in a function c_sjt
and those for the maximal eigenvalue statistic are in c_sja. The function
johansen calls these two functions to obtain the necessary critical values. In
cases where the VAR model has more than 12 variables, zeros are returned
as critical values in the structure field result.cvt for the trace statistic and
the result.cvm field for the maximal eigenvalue.
Another less serious limitation is that the critical values for these statistics are only available for trend transformations where −1 ≤ p ≤ 1. This
should not present a problem in most applications where p will take on
values of -1, 0 or 1.
PURPOSE: perform Johansen cointegration tests
-------------------------------------------------------
USAGE: result = johansen(x,p,k)
where:
x = input matrix of time-series in levels, (nobs x m)
p = order of time polynomial in the null-hypothesis
p = -1, no deterministic part
p = 0, for constant term
p = 1, for constant plus time-trend
p > 1, for higher order polynomial
k = number of lagged difference terms used when
computing the estimator
-------------------------------------------------------
RETURNS: a results structure:
result.eig = eigenvalues (m x 1)
result.evec = eigenvectors (m x m), where first
r columns are normalized coint vectors
result.lr1 = likelihood ratio trace statistic for r=0 to m-1
(m x 1) vector
result.lr2 = maximum eigenvalue statistic for r=0 to m-1
(m x 1) vector
result.cvt = critical values for trace statistic
(m x 3) vector [90% 95% 99%]
226
The johansen function is called with the y matrix of time-series variables for the eight states, a value of p = 0 indicating we have a constant
term in the model, and 9 lags. (We want p = 0 because the constant term
is necessary where the levels of employment in the states differ.) The lag of
9 was determined to be optimal using the lrratio function in the previous
section.
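Based on the documentation above, the call takes the form sketched here.

% Johansen test for the eight-state employment matrix: constant term (p = 0), 9 lags
vname = strvcat('il','in','ky','mi','oh','pa','tn','wv');
result = johansen(y,0,9);
prt(result,vname);           % print trace and maximal eigenvalue statistics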
The johansen function will return results for a sequence of tests against
alternative numbers of co-integrating relationships ranging from r 0 up
to r m 1, where m is the number of variables in the matrix y.
The function prt provides a printout of the trace and maximal eigenvalue
statistics as well as the critical values returned in the johansen results
structure.
Johansen MLE estimates
NULL:                    Trace Statistic    Crit 90%    Crit 95%    Crit 99%
r <= 0   illinos                 307.689     153.634     159.529     171.090
r <= 1   indiana                 205.384     120.367     125.618     135.982
r <= 2   kentucky                129.133      91.109      95.754     104.964
r <= 3   ohio                     83.310      65.820      69.819      77.820
r <= 4   pennsylvania             52.520      44.493      47.855      54.681
r <= 5   tennessee                30.200      27.067      29.796      35.463
r <= 6   west virginia            13.842      13.429      15.494      19.935
r <= 7   michigan                  0.412       2.705       3.841       6.635

NULL:                      Crit 95%    Crit 99%
r <= 0   illinos             52.362      58.663
r <= 1   indiana             46.230      52.307
r <= 2   kentucky            40.076      45.866
r <= 3   ohio                33.878      39.369
r <= 4   pennsylvania        27.586      32.717
r <= 5   tennessee           21.131      25.865
r <= 6   west virginia       14.264      18.520
r <= 7   michigan             3.841       6.635
The printout does not present the eigenvalues and eigenvectors, but they
are available in the results structure returned by johansen, as they are
needed to form the co-integrating variables for the EC model. The focus of
co-integration testing would be the trace and maximal eigenvalue statistics
along with the critical values. For this example, we find (using the 95%
level of significance): the trace statistic rejects r ≤ 0 because the statistic of
307.689 is greater than the critical value of 159.529; it also rejects r ≤ 1,
r ≤ 2, r ≤ 3, r ≤ 4, and r ≤ 5 because these trace statistics exceed the
associated critical values; for r ≤ 6 we cannot reject H0, so we conclude that
r = 6. Note that using the 99% level, we would conclude r = 4 as the trace
statistic of 52.520 associated with r ≤ 4 does not exceed the 99% critical
value of 54.681.
We find a different inference using the maximal eigenvalue statistic. This
statistic allows us to reject r ≤ 0 as well as r ≤ 1 and r ≤ 2 at the 95% level.
We cannot reject r ≤ 3, because the maximal eigenvalue statistic of 30.791
does not exceed the critical value of 33.878 associated with the 95% level.
This would lead to the inference that r = 3, in contrast to r = 6 indicated
by the trace statistic. Using similar reasoning at the 99% level, we would
infer r = 2 from the maximal eigenvalue statistics.
After the johansen test determines the number of co-integrating relationships, we can use these results along with the eigenvectors returned by
the johansen function to form a set of error correction variables. These are
constructed using y_{t−1} (the levels of y lagged one period) multiplied by the
r eigenvectors associated with the co-integrating relationships to form r co-integrating variables. This is carried out by the ecm function, documented
below.

PURPOSE: performs error correction model estimation
---------------------------------------------------
USAGE: result = ecm(y,nlag,r)
where: y = an (nobs x neqs) matrix of y-vectors in levels
The printed output is shown below for a single state indicating the presence of two co-integrating relationships involving the states of Illinois and
Indiana. The estimates for the error correction variables are labeled as such
in the printout. Granger causality tests are printed, and these would form
the basis for valid causality inferences in the case where co-integrating relationships existed among the variables in the VAR model.
Dependent Variable =      wv
R-squared     =   0.1975
Rbar-squared  =   0.1018
sige          = 341.6896
Nobs, Nvars   =    170,    19
******************************************************************
Variable        Coefficient     t-statistic     t-probability
il lag1            0.141055        0.261353          0.794176
il lag2            0.234429        0.445400          0.656669
in lag1            1.630666        1.517740          0.131171
in lag2           -1.647557       -1.455714          0.147548
ky lag1            0.378668        1.350430          0.178899
ky lag2            0.176312        0.631297          0.528801
mi lag1            0.053280        0.142198          0.887113
mi lag2            0.273078        0.725186          0.469460
oh lag1           -0.810631       -1.449055          0.149396
oh lag2            0.464429        0.882730          0.378785
pa lag1           -0.597630       -2.158357          0.032480
pa lag2           -0.011435       -0.038014          0.969727
tn lag1           -0.049296       -0.045237          0.963978
tn lag2            0.666889        0.618039          0.537480
wv lag1           -0.004150       -0.033183          0.973572
wv lag2           -0.112727       -0.921061          0.358488
The results indicate that, given the two lag model, two co-integrating
relationships were found, leading to the inclusion of two error correction
variables in the model. The co-integrating relationships are based on the
trace statistics compared to the critical values at the 95% level. From the
trace statistics in the printed output we see that H0: r ≤ 2 could not be rejected
at the 95% level because the trace statistic of 90.363 is less than the associated critical value of 95.754. Keep in mind that the user has the option
of specifying the number of co-integrating relations to be used in the ecm
function as an optional argument. If you wish to work at the 90% level of
significance, we would conclude from the johansen results that r = 4 co-integrating relationships exist. To estimate an ecm model based on r = 4
we need simply call the ecm function with:
% estimate the model, using 4 co-integrating vectors
result = ecm(y,nlag,4);
6.3 Bayesian variants
β_i ~ N(1, σ²_i),    β_j ~ N(0, σ²_j)        (6.6)

where β_i denotes the coefficients associated with the lagged dependent variable in each equation of the VAR and β_j represents any other coefficient.
The prior means for lagged dependent variables are set to unity in the belief
that these are important explanatory variables. On the other hand, a prior
mean of zero is assigned to all other coefficients in the equation, β_j in (6.6),
indicating that these variables are viewed as less important in the model.
The prior variances, σ²_i, specify uncertainty about the prior means β_i =
1, and σ²_j indicates uncertainty regarding the means β_j = 0. Because
the VAR model contains a large number of parameters, Doan, Litterman
and Sims (1984) suggested a formula to generate the standard deviations
as a function of a small number of hyperparameters: θ, φ and a weighting
matrix w(i, j). This approach allows a practitioner to specify individual
prior variances for a large number of coefficients in the model using only a
few parameters that are labeled hyperparameters. The specification of the
standard deviation of the prior imposed on variable j in equation i at lag k
is:

σ_ijk = θ w(i, j) k^(−φ) (σ̂_uj / σ̂_ui)        (6.7)
where σ̂_ui is the estimated standard error from a univariate autoregression
involving variable i, so that (σ̂_uj/σ̂_ui) is a scaling factor that adjusts for varying magnitudes of the variables across equations i and j. Doan, Litterman
and Sims (1984) labeled the parameter θ the overall tightness, reflecting the
standard deviation of the prior on the first lag of the dependent variable.
The term k^(−φ) is a lag decay function with 0 ≤ φ ≤ 1 reflecting the decay
rate, a shrinkage of the standard deviation with increasing lag length. This
has the effect of imposing the prior means of zero more tightly as the lag
length increases, based on the belief that more distant lags represent less
important variables in the model. The function w(i, j) specifies the tightness of the prior for variable j in equation i relative to the tightness of the
own-lags of variable i in equation i.
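As a small worked example, under the standard values reported next (θ = 0.1, φ = 1.0 and off-diagonal weights of 0.5), the prior standard deviation on variable j in equation i at lag 2 can be computed as in the sketch below, where the unit variance ratio is assumed purely for illustration.

% Worked example: prior standard deviation for an off-diagonal coefficient at lag k = 2
theta = 0.1; phi = 1.0; w_ij = 0.5; k = 2;
sig_ratio = 1;                                   % sigma_uj/sigma_ui, set to one for illustration
sig_ijk = theta * w_ij * k^(-phi) * sig_ratio    % = 0.025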
The overall tightness and lag decay hyperparameters used in the standard Minnesota prior have values θ = 0.1 and φ = 1.0. The weighting matrix used is:
W = [ 1.0  0.5  ...  0.5
      0.5  1.0  ...  0.5
       :    :   ...   :
      0.5  0.5  ...  1.0 ]                                           (6.8)
This weighting matrix imposes β_i = 1 loosely, because the lagged dependent variable in each equation is felt to be an important variable. The weighting matrix also imposes the prior mean of zero for coefficients on other variables in each equation more tightly, since the β_j coefficients are associated with variables considered less important in the model.
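To make (6.7) and (6.8) concrete, the fragment below is a small sketch (not a toolbox function; the variable names are illustrative, and the univariate standard errors are set to ones only to keep the sketch self-contained) showing how the full set of prior standard deviations could be tabulated:

% sketch: tabulate the Minnesota prior standard deviations of (6.7)
neqs = 8; nlag = 2;                        % equations and lags
theta = 0.1; phi = 1.0;                    % overall tightness and lag decay
w = 0.5*ones(neqs,neqs) + 0.5*eye(neqs);   % standard weight matrix (6.8)
su = ones(neqs,1);                         % std errors from univariate AR models (illustrative values)
S = zeros(neqs,neqs,nlag);                 % S(i,j,k) = prior std, variable j, equation i, lag k
for i = 1:neqs
  for j = 1:neqs
    for k = 1:nlag
      S(i,j,k) = theta*w(i,j)*(k^(-phi))*(su(j)/su(i));
    end
  end
end

With these settings the own-lag coefficients receive the loosest prior (standard deviation 0.1 at lag one), while coefficients on other variables and on longer lags are shrunk more heavily toward zero.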
A function bvar will provide estimates for this model. The function
documentation is:
PURPOSE: Performs a Bayesian vector autoregression of order n
---------------------------------------------------
USAGE: result = bvar(y,nlag,tight,weight,decay,x)
where:    y      = an (nobs x neqs) matrix of y-vectors
          nlag   = the lag length
          tight  = Litterman's tightness hyperparameter
          weight = Litterman's weight (matrix or scalar)
          decay  = Litterman's lag decay = lag^(-decay)
          x      = an optional (nobs x nx) matrix of variables
NOTE: constant vector automatically included
---------------------------------------------------
RETURNS: a structure:
results.meth      = bvar
results.nobs      = nobs, # of observations
results.neqs      = neqs, # of equations
results.nlag      = nlag, # of lags
results.nvar      = nlag*neqs+1+nx, # of variables per equation
results.tight     = overall tightness hyperparameter
results.weight    = weight scalar or matrix hyperparameter
results.decay     = lag decay hyperparameter
--- the following are referenced by equation # ---
results(eq).beta  = bhat for equation eq
results(eq).tstat = t-statistics
results(eq).tprob = t-probabilities
results(eq).resid = residuals
results(eq).yhat  = predicted values
results(eq).y     = actual values
results(eq).sige  = e'e/(n-k)
results(eq).rsqr  = r-squared
results(eq).rbar  = r-squared adjusted
---------------------------------------------------
SEE ALSO: bvarf, var, ecm, rvar, plt_var, prt_var
---------------------------------------------------
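A call along the following lines (a sketch rather than one of the text's numbered examples, using the same eight-state mining employment data set and the standard Minnesota hyperparameter values) produces estimates that can then be printed with prt:

% sketch: BVAR(2) with the Minnesota prior for the mining employment data
vnames = strvcat('il','in','ky','mi','oh','pa','tn','wv');
y = load('test.dat');       % monthly mining employment for the 8 states
nlag   = 2;                 % lag length
tight  = 0.1;               % overall tightness hyperparameter
weight = 0.5;               % scalar weight applied to other variables
decay  = 1.0;               % lag decay hyperparameter
result = bvar(y,nlag,tight,weight,decay);
prt(result,vnames);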
The printout shows the hyperparameter values associated with the prior.
It does not provide Granger-causality test results as these are invalid given
the Bayesian prior applied to the model. Results for a single equation of the
mining employment example are shown below.
***** Bayesian Vector Autoregressive Model *****
*****       Minnesota type Prior           *****
0.50
Dependent Variable =       il
R-squared     =    0.9942
Rbar-squared  =    0.9936
sige          =   12.8634
Nobs, Nvars   =    171,    17
******************************************************************
Variable        Coefficient     t-statistic    t-probability
il lag1            1.134855       11.535932         0.000000
il lag2           -0.161258       -1.677089         0.095363
in lag1            0.390429        1.880834         0.061705
in lag2           -0.503872       -2.596937         0.010230
ky lag1            0.049429        0.898347         0.370271
ky lag2           -0.026436       -0.515639         0.606776
mi lag1           -0.037327       -0.497504         0.619476
mi lag2           -0.026391       -0.377058         0.706601
oh lag1           -0.159669       -1.673863         0.095996
oh lag2            0.191425        2.063498         0.040585
pa lag1            0.179610        3.524719         0.000545
pa lag2           -0.122678       -2.520538         0.012639
tn lag1            0.156344        0.773333         0.440399
tn lag2           -0.288358       -1.437796         0.152330
wv lag1           -0.046808       -2.072769         0.039703
wv lag2            0.014753        0.681126         0.496719
constant           9.454700        2.275103         0.024149
There have been a number of attempts to alter the fact that the Minnesota prior treats all variables in the VAR model except the lagged dependent variable in an identical fashion. Some of the modifications suggested have focused entirely on alternative specifications for the prior variance. Usually this involves a different (non-symmetric) weight matrix W and a larger value of θ = 0.2 for the overall tightness hyperparameter in place of the value θ = 0.1 used in the Minnesota prior. The larger overall tightness hyperparameter setting allows for more influence from other variables in the model. For example, LeSage and Pan (1995) constructed a weight matrix based on first-order spatial contiguity to emphasize variables from neighboring states in a multi-state agricultural output forecasting model. LeSage and Magura (1991) employed interindustry input-output weights to place more emphasis on related industries in a multi-industry employment forecasting model.
These approaches can be implemented using the bvar function by constructing an appropriate weight matrix. For example, the first-order contiguity structure for the eight states in our mining employment example can be
be converted to a set of prior weights by placing values of unity on the main
diagonal of the weight matrix, and in positions that represent contiguous
entities. An example is shown in (6.9), where row 1 of the weight matrix
is associated with the time-series for the state of Illinois. We place a value
of unity on the main diagonal to indicate that autoregressive values from
Illinois are considered important variables. We also place values of one in
columns 2 and 3, reflecting the fact that Indiana (variable 2) and Kentucky
(variable 3) are states that have borders touching Illinois. For other states
that are not neighbors to Illinois, we use a weight of 0.1 to downweight
their influence in the BVAR model equation for Illinois. A similar scheme
is used to specify weights for the other seven states based on neighbors and
non-neighbors.
W = [ 1.0  1.0  1.0  0.1  0.1  0.1  0.1  0.1
      1.0  1.0  1.0  1.0  1.0  0.1  0.1  0.1
      1.0  1.0  1.0  0.1  1.0  0.1  1.0  1.0
      0.1  1.0  0.1  1.0  1.0  0.1  0.1  0.1
      0.1  1.0  1.0  1.0  1.0  1.0  0.1  1.0
      0.1  0.1  0.1  0.1  1.0  1.0  0.1  1.0
      0.1  0.1  1.0  0.1  0.1  0.1  1.0  0.1
      0.1  0.1  1.0  0.1  1.0  1.0  0.1  1.0 ]                       (6.9)
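As a rough sketch of the conversion just described (the binary contiguity matrix cmat is an assumption here, not a variable from the text's example programs), the prior weights in (6.9) could be generated with:

% sketch: build the prior weight matrix of (6.9) from a binary 1st-order
% contiguity matrix cmat (8 x 8, cmat(i,j) = 1 if states i and j touch)
neqs = 8;
w = 0.1*ones(neqs,neqs);        % downweight non-neighboring states
w(cmat == 1) = 1.0;             % full weight for contiguous states
w(logical(eye(neqs))) = 1.0;    % full weight for each state's own lags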
The intuition behind this set of weights is that we really don't believe the prior means of zero placed on the coefficients for mining employment in neighboring states. Rather, we believe these variables should exert an important influence. To express our lack of faith in these prior means, we assign a large prior variance to the zero prior means for these states by increasing the weight values. This allows the coefficients for these time-series variables to be determined by placing more emphasis on the sample data and less emphasis on the prior.

This could of course be implemented using bvar with a weight matrix specified, e.g.,
% ----- Example 6.9 Using bvar() with general weights
vnames = strvcat('il','in','ky','mi','oh','pa','tn','wv');
dates = cal(1982,1,12);
y = load('test.dat'); % use all eight states
nlag = 2;
tight = 0.1;
decay = 1.0;
w = [1.0 1.0 1.0 0.1 0.1 0.1 0.1 0.1
     1.0 1.0 1.0 1.0 1.0 0.1 0.1 0.1
     1.0 1.0 1.0 0.1 1.0 0.1 1.0 1.0
     0.1 1.0 0.1 1.0 1.0 0.1 0.1 0.1
     0.1 1.0 1.0 1.0 1.0 1.0 0.1 1.0
     0.1 0.1 0.1 0.1 1.0 1.0 0.1 1.0
     0.1 0.1 1.0 0.1 0.1 0.1 1.0 0.1
     0.1 0.1 1.0 0.1 1.0 1.0 0.1 1.0];
result = bvar(y,nlag,tight,w,decay);
prt(result,vnames);
W = [ 0  1  1  0  0
      1  0  1  0  0
      1  1  1  0  0
      0  0  1  0  1
      0  0  0  1  1 ]                                                (6.10)
C = [ 0     0.5   0.5   0     0
      0.5   0     0.5   0     0
      0.33  0.33  0.33  0     0
      0     0     0.5   0     0.5
      0     0     0     0.5   0.5 ]                                  (6.11)
y_it = α_i + Σ_{j=1}^{n} C_ij y_j,t-1 + u_it                         (6.12)
y_1t = α_1 + 0.5 y_2,t-1 + 0.5 y_3,t-1 + u_1t
y_2t = α_2 + 0.5 y_1,t-1 + 0.5 y_3,t-1 + u_2t
y_3t = α_3 + 0.33 y_1,t-1 + 0.33 y_2,t-1 + 0.33 y_3,t-1 + u_3t
y_4t = α_4 + 0.5 y_3,t-1 + 0.5 y_5,t-1 + u_4t
y_5t = α_5 + 0.5 y_4,t-1 + 0.5 y_5,t-1 + u_5t                        (6.13)
This suggests a prior mean for the VAR model coefficients on variables associated with the first own-lag of important variables equal to 1/c_i, where c_i is the number of important variables in each equation i of the model. In the example shown in (6.13), the prior means for the first own-lag of the important variables y_2,t-1 and y_3,t-1 in the y_1t equation of the VAR would equal 0.5. The prior means for the unimportant variables, y_1,t-1, y_4,t-1 and y_5,t-1, in this equation would be zero.
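As a small sketch (not taken from the text), the row-standardization that turns the importance matrix in (6.10) into the matrix C of (6.11), and hence into the prior means 1/c_i, can be written as:

% sketch: row-standardize the importance matrix W of (6.10); row i of C
% then holds the prior means (1/ci) for first lags of important variables
W = [0 1 1 0 0
     1 0 1 0 0
     1 1 1 0 0
     0 0 1 0 1
     0 0 0 1 1];
C = W ./ (sum(W,2)*ones(1,size(W,2)));   % divide each row by its row-sum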
This prior is quite different from the Minnesota prior in that it may downweight the lagged dependent variable using a zero prior mean to discount the autoregressive influence of past values of this variable. In contrast, the Minnesota prior emphasizes a random-walk with drift model that relies on prior means centered on the model: y_it = α_i + y_i,t-1 + u_it, where the intercept term α_i reflects drift in the random-walk model and is estimated using a diffuse prior. The random-walk averaging prior is centered on a random-walk model that averages over important variables in each equation of the model and allows for drift as well. As in the case of the Minnesota prior, the drift parameters α_i are estimated using a diffuse prior.
Consistent with the Minnesota prior, LeSage and Krivelyova use zero
as a prior mean for coefficients on all lags other than first lags. Litterman
(1986) motivates reliance on zero prior means for many of the parameters
of the VAR model by appealing to ridge regression. Recall, ridge regression
can be interpreted as a Bayesian model that specifies prior means of zero
for all coefficients, and as we saw in Chapter 3 can be used to overcome
collinearity problems in regression models.
One point to note about the random-walk averaging approach to specifying prior means is that the time series for the variables in the model need to be scaled or transformed to have similar magnitudes. If this is not the case, it would make little sense to indicate that the value of a time series observation at time t was equal to the average of values from important related variables at time t-1. This should present no problem, as time series data can always be expressed in percentage change form or annualized growth rates, which meets our requirement that the time series have similar magnitudes.
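For example, one simple way to carry out such a transformation (a sketch assuming y holds monthly levels in an nobs x neqs matrix) is:

% sketch: express monthly levels as year-over-year percentage growth rates
% so that all series in the model have similar magnitudes
[nobs,neqs] = size(y);
freq = 12;                              % monthly data
ygrowth = 100*(y(freq+1:nobs,:) - y(1:nobs-freq,:)) ./ y(1:nobs-freq,:);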
The prior variances LeSage and Krivelyova specify for the parameters in
the model differ according to whether the coefficients are associated with
variables that are classified as important or unimportant as well as the lag
length. Like the Minnesota prior, they impose lag decay to reflect a prior
belief that time series observations from the more distant past exert a smaller
influence on the current value of the time series we are modeling. Viewing
variables in the model as important versus unimportant suggests that the
prior variance (uncertainty) specification should reflect the following ideas:
1. Parameters associated with unimportant variables should be assigned
a smaller prior variance, so the zero prior means are imposed more
tightly or with more certainty.
2. First own-lags of important variables are given a smaller prior variance, so the prior means force averaging over the first own-lags of important variables.
3. Parameters associated with unimportant variables at lags greater than
one will be given a prior variance that becomes smaller as the lag length
increases to reflect our belief that influence decays with time.
4. Parameters associated with lags other than first own-lag of important variables will have a larger prior variance, so the prior means of zero are imposed loosely. This is motivated by the fact that we don't really have a great deal of confidence in the zero prior mean specification for longer lags of important variables. We think they should exert some influence, making the prior mean of zero somewhat inappropriate. We still impose lag decay on longer lags of important variables by decreasing our prior variance with increasing lag length. This reflects the idea that influence decays over time for important as well as unimportant variables.
It should be noted that the prior relies on inappropriate zero prior means
for the important variables at lags greater than one for two reasons. First, it
is difficult to specify a reasonable alternative prior mean for these variables
that would have universal applicability in a large number of VAR model
applications. The difficulty of assigning meaningful prior means that have
universal appeal is most likely the reason that past studies relied on the
Minnesota prior means while simply altering the prior variances. A prior
mean that averages over previous period values of the important variables
has universal appeal and widespread applicability in VAR modeling. The
second motivation for relying on inappropriate zero prior means for longer
lags of the important variables is that overparameterization and collinearity
problems that plague the VAR model are best overcome by relying on a
parsimonious representation. Zero prior means for the majority of the large
number of coefficients in the VAR model are consistent with this goal of parsimony and have been demonstrated to produce improved forecast accuracy
in a wide variety of applications of the Minnesota prior.
A flexible form with which to state prior standard deviations for variable j in equation i at lag length k is shown in (6.14):

π(a_ijk) = N(1/c_i, σ_c),     j ∈ C,  k = 1,          i, j = 1, ..., n
π(a_ijk) = N(0, τσ_c/k),      j ∈ C,  k = 2, ..., m,  i, j = 1, ..., n
π(a_ijk) = N(0, θσ_c/k),      j ∉ C,  k = 1, ..., m,  i, j = 1, ..., n        (6.14)
where:

0 < σ_c < 1                                                          (6.15)
τ > 1                                                                (6.16)
0 < θ < 1                                                            (6.17)
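As a sketch of how (6.14) translates into numbers (the hyperparameter names sigc, tau and theta and the illustrative values used below are assumptions, not the toolbox's defaults), the prior means and standard deviations for a single equation could be tabulated as:

% sketch: prior means and standard deviations implied by (6.14) for
% equation i, given the importance matrix W of (6.10)
n = size(W,1);  m = 2;                   % variables and lag length
sigc = 0.1;  tau = 2;  theta = 0.1;      % illustrative hyperparameter values
i = 1;                                   % equation of interest
ci = sum(W(i,:));                        % number of important variables in equation i
pmean = zeros(n,m);  pstd = zeros(n,m);  % prior means and standard deviations
for j = 1:n
  for k = 1:m
    if W(i,j) == 1 & k == 1              % first lags of important variables
      pmean(j,k) = 1/ci;  pstd(j,k) = sigc;
    elseif W(i,j) == 1                   % longer lags of important variables
      pstd(j,k) = tau*sigc/k;
    else                                 % unimportant variables, all lags
      pstd(j,k) = theta*sigc/k;
    end
  end
end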
[Two figures omitted: each plots a prior mean together with its upper and lower limits against lag length (lags 1 through 12); the first has a vertical range of roughly -0.5 to 0.5 and the second roughly -0.2 to 0.2.]
6.4 Forecasting the VAR model
A set of forecasting functions is available that follows the format of the var, bvar, rvar, ecm, becm, recm functions, named varf, bvarf, rvarf, ecmf, becmf, recmf. These functions all produce forecasts of the time-series levels to simplify accuracy analysis and forecast comparison across the alternative models. They all take time-series levels arguments as inputs and carry out the necessary transformations. As an example, the varf documentation is:
PURPOSE: estimates a vector autoregression of order n
         and produces f-step-ahead forecasts
-------------------------------------------------------------
USAGE: yfor = varf(y,nlag,nfor,begf,x,transf)
where:    y      = an (nobs * neqs) matrix of y-vectors in levels
          nlag   = the lag length
          nfor   = the forecast horizon
          begf   = the beginning date of the forecast
                   (defaults to length(x) + 1)
          x      = an optional vector or matrix of deterministic
                   variables (not affected by data transformation)
          transf = 0, no data transformation
                 = 1, 1st differences used to estimate the model
                 = freq, seasonal differences used to estimate
                 = cal-structure, growth rates used to estimate
                   e.g., cal(1982,1,12) [see cal() function]
-------------------------------------------------------------
NOTE: constant term included automatically
-------------------------------------------------------------
RETURNS:
yfor = an nfor x neqs matrix of level forecasts for each equation
-------------------------------------------------------------
Note that you input the variables y in levels form and indicate any of the four data transformations to be used when estimating the VAR model; the function varf will carry out this transformation, produce estimates, and return forecasts that are converted back to levels form. This greatly simplifies the task of producing and comparing forecasts based on alternative data transformations.
For the case of a growth-rates transformation, a calendar structure variable is required. These are produced with the cal function that is part of the Econometrics Toolbox discussed in Chapter 3 of the manual. In brief, using cstruct = cal(1982,1,12) would set up the necessary calendar structure if the data being used were monthly time series that began in 1982.
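For instance, the following call (a sketch based on the test data used in the other examples) would estimate the model in growth-rates form and return twelve months of level forecasts:

% sketch: growth-rates transformation via a calendar structure argument
dates = cal(1982,1,12);                   % monthly data beginning 1982,1
fcast = varf(y,6,12,ical(1995,1,dates),[],dates);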
Of course, if you desire a transformation other than the four provided, such as logs, you can transform the variables y prior to calling the function and specify transf=0. In this case, the function does not provide levels forecasts; rather, forecasts of the logged levels will be returned. Setting transf=0 produces estimates based on the data input and returns forecasts based on this data.
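A sketch of this approach (assuming y, nlag, nfor and begf are defined as in the surrounding examples) is:

% sketch: forecast in logs with transf = 0, then convert back to levels
ylog = log(y);
fcast_log = varf(ylog,nlag,nfor,begf,[],0);   % forecasts of the logged levels
fcast = exp(fcast_log);                       % convert back to levels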
As an example of comparing alternative VAR model forecasts based on
two of the four alternative transformations, consider the program in example
6.11.
% ----- Example 6.11 Forecasting VAR models
y = load('test.dat');       % a test data set containing
                            % monthly mining employment for
                            % il,in,ky,mi,oh,pa,tn,wv
dates = cal(1982,1,12);     % data covers 1982,1 to 1996,5
nfor = 12;                  % number of forecast periods
nlag = 6;                   % number of lags in var-model
begf = ical(1995,1,dates);  % beginning forecast period
endf = ical(1995,12,dates); % ending forecast period
% no data transformation example
fcast1 = varf(y,nlag,nfor,begf);
% seasonal differences data transformation example
freq = 12;                  % set frequency of the data to monthly
fcast2 = varf(y,nlag,nfor,begf,[],freq);
% compute percentage forecast errors
actual = y(begf:endf,:);
error1 = (actual-fcast1)./actual;
error2 = (actual-fcast2)./actual;
vnames = strvcat('il','in','ky','mi','oh','pa','tn','wv');
fdates = cal(1995,1,12);
fprintf(1,'VAR model in levels percentage errors \n');
tsprint(error1*100,fdates,vnames,'%7.2f');
fprintf(1,'VAR - seasonally differenced data percentage errors \n');
tsprint(error2*100,fdates,vnames,'%7.2f');
[Printed output omitted: two tables of monthly percentage forecast errors for Jan95 through Dec95 for the eight states, the first from the VAR model estimated in levels and the second from the VAR model estimated on seasonally differenced data.]
[Printed output omitted: additional forecast error results by forecast step for the pa, tn and wv series.]
The program code stores the individual MAPE forecast errors in a structure variable using: err(cnt).rm2 = abs((actual-frm2)./actual);, which will have fields for the errors from all five models. These fields are matrices of dimension 12 x 12, containing MAPE errors for each of the 12-step-ahead forecasts for time cnt and for each of the 12 industries. We are not really interested in these individual results, but present this as an illustration. As part of the illustration, we show how to access the individual results to compute the average MAPE errors for each horizon and industry. If you wished to access industry number 2's forecast errors based on the model using r co-integrating relations, for the first experimental forecast period you would use: err(1).rm(:,2). The results from our experiment are shown below. These results represent an average over a total of 312 twelve-step-ahead forecasts. Our simple MATLAB program produced a total of 224,640 forecasts, based on 312 twelve-step-ahead forecasts, for 12 industries, times 5 models!
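For instance, the average MAPE errors for the model using r co-integrating relations could be computed with a loop along these lines (a sketch; only the field name rm and the number of forecast periods come from the discussion above):

% sketch: average the 12 x 12 MAPE matrices over all experimental periods
nper = 312;                        % number of twelve-step-ahead forecasts
avg_rm = zeros(12,12);
for cnt = 1:nper
  avg_rm = avg_rm + err(cnt).rm;   % accumulate errors for the r-relation model
end
avg_rm = avg_rm/nper;              % average MAPE by forecast step and industry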
Our experiment indicates that using more than the r co-integrating relationships determined by the Johansen likelihood trace statistic degrades the forecast accuracy. This is clear from the large number of forecast error ratios greater than unity for the two models based on r + 1 and r + 2 versus those from the model based on r. On the other hand, using a smaller number of co-integrating relationships than indicated by the Johansen trace statistic seems to improve forecast accuracy. In a large number of industries at many of the twelve forecast horizons, we see comparison ratios less than unity. Further, the forecast errors associated with r - 2 are superior to those from r - 1, producing smaller comparison ratios in 9 of the 12 industries.
[Printed output omitted: tables of forecast error ratios by industry (I1 through I12) and forecast step (1 through 12) for the models based on r - 1, r - 2, r + 1 and r + 2 co-integrating relationships, each measured relative to the model based on r co-integrating relationships.]
6.5 An applied exercise
To illustrate using vector autoregressive modeling in a regional data example, we create a model that links national and state employment models. Monthly employment time series for ten national 2-digit industry categories are used in a Bayesian vector autoregressive model to estimate and forecast employment one year ahead over the period 1994,1 to 1994,12. We then use the historical national data as well as the 12 months of forecasted values as deterministic variables in a regional employment Bayesian vector autoregressive model for monthly employment in eight contiguous states in the midwest.

Forecast errors are computed for the 12-month forecast and compared to the errors made by a Bayesian vector autoregressive model that is identical except that it omits the national employment information as deterministic variables. This should provide a test of the forecasting value of national information in the regional model.

Example 6.14 shows the program used to carry out the forecasting experiment. The program estimates and forecasts the ten-industry national model using no data transformations. That is, the data are used in levels form. For the regional model a first-difference transformation is used. This transformation will be applied to the vector autoregressive variables, but not to the deterministic national variables in the model.
% ----- Example 6.14 Linking national and regional models
% load 10 sic national manufacturing employment
% time-series covering the period 1947,1 to 1996,12
% from the MATLAB data file level.mat
dates = cal(1947,1,12);
load level.mat;
% the data is in a matrix variable named level
% which we used to create the file (see make_level.m)
% 20 Food and kindred products
% 21 Tobacco manufactures
% 22 Textile, fabrics, yarn and thread mills
% 23 Miscellaneous fabricated textile products
% 24 Lumber and wood products
% 25 Furniture and fixtures
% 26 Paper and allied products
% 27 Printing and publishing
% 28 Chemicals, plastics, and synthetic materials
% 29 Petroleum refining and related industries
% produce a 12-month-ahead national forecast using a bvar model
begf = ical(1994,1,dates);
begs = ical(1982,1,dates);
[Printed output omitted: forecasted employment levels, actual values, and percentage forecast errors for the eight states from example 6.14.]
A complication in the program results from the fact that the national data sample begins in January, 1947 whereas the state data used in the regional model begins in January, 1982. This required the definitions begf1 and begf2 and a beginning date to pull out appropriate vectors of national variables for use in the regional model.

The mean absolute percentage forecast errors produced by the program are shown below.
[Printed output omitted: mean absolute percentage forecast errors by forecast horizon (step 1 through step 12) for each of the eight states, for the regional model with the national employment variables and for the model without them.]
From these results we see the expected pattern where longer-horizon forecasts exhibit larger mean absolute percentage errors. The results from our single forecast experiment are not consistent with those from this experiment involving sequential forecasts over a period of five years. The inclusion of national variables (and forecasts) leads to less accurate forecasting performance for our regional model. The decrease in forecast accuracy is particularly noticeable at the longer forecast horizons, which is probably indicative of poor national forecasts.
As an exercise, you might examine the accuracy of the national forecasts
and try to improve on that model.
6.6 Chapter Summary
6.7 References

Dickey, David A., Dennis W. Jansen and Daniel L. Thornton. 1991. A primer on cointegration with an application to money and income, Federal Reserve Bank of St. Louis Review, March/April, pp. 58-78.

Doan, Thomas, Robert B. Litterman, and Christopher A. Sims. 1984. Forecasting and conditional projections using realistic prior distributions, Econometric Reviews, Vol. 3, pp. 1-100.

Engle, Robert F. and Clive W.J. Granger. 1987. Co-integration and Error Correction: Representation, Estimation and Testing, Econometrica, Vol. 55, pp. 251-276.

Johansen, Soren. 1988. Statistical Analysis of Co-integration Vectors, Journal of Economic Dynamics and Control, Vol. 12, pp. 231-254.

Johansen, Soren. 1995. Likelihood-based Inference in Cointegrated Vector Autoregressive Models, Oxford: Oxford University Press.

Johansen, Soren and Katarina Juselius. 1990. Maximum likelihood estimation and inference on cointegration - with applications to the demand for money, Oxford Bulletin of Economics and Statistics, Vol. 52, pp. 169-210.

LeSage, James P. 1990. A Comparison of the Forecasting Ability of ECM and VAR Models, Review of Economics and Statistics, Vol. 72, pp. 664-671.

LeSage, James P. and Anna Krivelyova. 1997. A Spatial Prior for Bayesian Vector Autoregressive Models, forthcoming in Journal of Regional Science, also available at: www.econ.utoledo.edu.

LeSage, James P. and Michael Magura. 1991. Using interindustry input-output relations as a Bayesian prior in employment forecasting models, International Journal of Forecasting, Vol. 7, pp. 231-238.
[Appendix listing omitted: a catalog of the Econometrics Toolbox function libraries and demonstration programs, covering regression support and diagnostic routines, utility functions (cal, ical, tsdate, tsprint, lprint, mprint), the vector autoregressive estimation, forecasting and printing functions (var, bvar, rvar, ecm, becm, recm and their forecasting counterparts), Gibbs sampling convergence diagnostics (apm, coda, momentg, raftery), statistical distribution functions, optimization routines (dfp, frpr, pow), and the spatial econometrics library functions with their log-likelihoods, the Pace and Barry and Anselin (1988) data sets, and printing support for the spatial models.]