CLIMATE RESEARCH
Clim Res, Vol. 21: 127–140, 2002. Published June 14

Long memory in surface air temperature: detection, modeling, and application to weather derivative valuation

Rodrigo Caballero¹·*, Stephen Jewson², Anders Brix²

¹Danish Center for Earth System Science, University of Copenhagen, Juliane Maries Vej 30, 2100 Copenhagen Ø, Denmark
²Risk Management Solutions, London, United Kingdom

ABSTRACT: Three multidecadal daily time series of mid-latitude near-surface air temperature are analysed. Long-range dependence can be detected in all 3 time series with 95% statistical significance. It is shown that fractionally integrated time-series models can accurately and parsimoniously reproduce the autocovariance structure of the observed data. The concept of weather derivatives is introduced and problems surrounding their pricing are discussed. It is shown that the fractionally integrated time-series models provide much more accurate pricing as compared with traditional autoregressive models employing a similar number of parameters. Finally, it is suggested that a simple explanation for the presence of long memory in the time series may be given in terms of aggregation of several short-memory processes.

KEY WORDS: Surface temperature · Long memory · Weather derivative


Resale or republication not permitted without written consent of the publisher

1. INTRODUCTION

The daily-mean near-surface air temperature (SAT) measured at a given point fluctuates randomly from day to day. In the mid-latitudes, this is largely due to the ceaseless passage of high- and low-pressure systems, whose associated horizontal and vertical winds bring and remove heat, moisture and clouds to and from the measurement point. These traveling perturbations generally exhibit a characteristic life cycle featuring rapid initial growth fueled by baroclinic energy conversion followed by slower barotropic decay (Simmons & Hoskins 1978). On this basis, one might heuristically model SAT as a simple Ornstein-Uhlenbeck process:

    dx(t)/dt = −γx(t) + ση(t)    (1)

where x is the SAT anomaly, η is Gaussian white noise with unit variance and γ and σ are constants. The white-noise term would represent the rapid excitation of the perturbations, while the first term on the right-hand side assumes that the perturbations are linearly damped with a time constant, γ⁻¹. A reasonable value for this time scale may be obtained by noting that the average eddy kinetic energy of the atmosphere is around 10⁶ J m⁻², while the mean potential-to-kinetic energy conversion rate, which must match the dissipation rate, is about 2 W m⁻² (Peixoto & Oort 1992); thus γ⁻¹ ≈ 5 × 10⁵ s ≈ 6 d. For processes in discrete time, Eq. (1) may be discretised to give a first-order autoregressive or AR(1) process:

    x_i = αx_{i−1} + ε_i    (2)

with α = 1 − γ∆t, where ∆t is the discretisation interval and ε_i is a Gaussian-white-noise process with variance σ²∆t.

The power spectral density (S) of the process described by Eq. (1) takes the form:

    S(ω) = σ² / [2π(ω² + γ²)]    (3)

where ω is the angular frequency. We thus expect the SAT spectrum to be almost white (constant) at low frequencies and red (negatively sloping) with slope −2 on a log-log plot at high frequencies. The crossover between these 2 regimes should be at ω ≈ γ, that is, around ω = 1/6 rad d⁻¹. How successful is this prediction?

A glance at some observed spectra (Fig. 1; see Section 2 for a description of the data) shows partial success. At high frequencies, the spectra are indeed well approximated by a straight line of slope −2. Further, all spectra show a shoulder or crossover region centered around 6 d. However, the low-frequency region is not white, as expected, but weakly red. There are no statistically significant peaks, and at frequencies below 10⁻¹ rad d⁻¹ the spectra are well fitted by a straight line with a negative slope.

*E-mail: rca@dcess.ku.dk
© Inter-Research 2002 · www.int-res.com
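The AR(1) benchmark of Eq. (2) is easy to simulate and shows exactly the short-memory behaviour described above. A minimal sketch, assuming the illustrative γ⁻¹ = 6 d damping time from the text, ∆t = 1 d and unit noise variance (the seed and sample size are arbitrary choices):

```python
import numpy as np

# Discretised Ornstein-Uhlenbeck process, Eq. (2): x_i = alpha*x_{i-1} + eps_i,
# with alpha = 1 - gamma*dt. gamma^-1 = 6 d as estimated in the text; the
# noise amplitude is an arbitrary illustrative choice.
rng = np.random.default_rng(0)
dt = 1.0                 # time step: 1 d
gamma = 1.0 / 6.0        # damping rate, gamma^-1 = 6 d
alpha = 1.0 - gamma * dt

n = 200_000
eps = rng.normal(scale=np.sqrt(dt), size=n)
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):
    x[i] = alpha * x[i - 1] + eps[i]

def acf(series, lag):
    """Sample autocorrelation at a given lag."""
    s = series - series.mean()
    return float(np.dot(s[:-lag], s[lag:]) / np.dot(s, s))

# Short memory: the autocorrelation decays exponentially, rho_k = alpha^k.
for lag in (1, 6, 12):
    print(lag, round(acf(x, lag), 3), round(alpha ** lag, 3))
```

The periodogram of such a series flattens below ω ≈ γ, which is precisely the white low-frequency regime that the observed spectra in Fig. 1 fail to show.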
Processes showing the kind of power law low-fre-
quency behaviour suggested in Fig. 1 are collectively
referred to as having ‘long memory’. More precisely, a
stationary stochastic process is said to exhibit long
memory or long-range dependence if there exists a
real number, d ∈ (0, 1⁄2), known as the ‘intensity’ of the
long memory, such that:
    S(ω) ≈ ω^(−2d) as ω → 0    (4)
(Beran 1994). If Eq. (4) holds, then the autocorrelation
function ρ(τ) will behave as:
    ρ(τ) ≈ τ^(2d−1) as τ → ∞
where τ is the lag; that is, it will decay as a power law,
meaning that highly persistent serial correlations will
be present in the data (hence the term ‘long memory’).
By contrast, when d = 0 the autocorrelation decays
exponentially and the process is said to have short
memory; this is the case for the Ornstein-Uhlenbeck
process above. Long memory was first empirically
detected by Hurst (1951), who studied Nile river-level
data. Since then it has been detected in a host of envi-
ronmental data series, with examples from geophysics
(Mandelbrot & Wallis 1969), hydrology (Montanari et
al. 1996), surface winds (Haslett & Raftery 1989), SAT
(Bloomfield 1992, Koscielny-Bunde et al. 1996, Pel-
letier 1997, Syroka & Toumi 2001), mid-tropospheric
geopotential heights (Tsonis et al. 1999) and the North
Atlantic Oscillation (Stephenson et al. 2000).
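The practical difference between the power-law decay above and the exponential decay of a short-memory process is dramatic at seasonal lags. A small numerical illustration, assuming d = 0.2 (of the same order as published SAT estimates) and the 6 d damping time scale of Section 1:

```python
import math

d = 0.2      # assumed long-memory intensity (illustrative)
tau = 90     # lag of roughly one season, in days

# Long memory: rho(tau) ~ tau^(2d - 1)
rho_long = tau ** (2.0 * d - 1.0)

# Short memory (Ornstein-Uhlenbeck with gamma^-1 = 6 d): rho(tau) = exp(-tau/6)
rho_short = math.exp(-tau / 6.0)

print(f"power law: {rho_long:.3f}, exponential: {rho_short:.1e}")
```

Even after a full season the long-memory correlation remains at the percent level, while the exponential correlation is utterly negligible; this is what 'highly persistent serial correlations' means in practice.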
A characteristic trait of long-memory processes is
that the variance of an N-member sample mean
decreases more slowly than N –1 (Beran 1989). This can
have important consequences for applications. For instance, incorrectly assuming short memory can lead to exaggeratedly narrow confidence intervals for the mean. That was the problem tackled by Haslett & Raftery (1989) in their study of the mean power obtainable from a wind turbine.

Fig. 1. Power spectral density estimates for SAT measured in (a) Central England, (b) Chicago and (c) Los Angeles. The spectra are smoothed using a Tukey window giving 19 degrees of freedom in all cases. At frequencies above 10⁻¹ rad d⁻¹, the spectra are further smoothed by averaging within 20 logarithmically equispaced bins. The data were detrended and deseasonalised in the mean before computing the spectral estimator. The straight lines at low frequencies show a least-squares fit (with the slope indicated on each panel) and an approximate 95% confidence envelope. The line at high frequencies, with slope −2, is shown for comparison.

In the present paper, we address a similar problem but in the context of a different application, namely weather derivatives. These, as discussed in greater
detail in Section 5, are essentially a form of insurance against fluctuations in weather, of interest to companies whose business is affected by such fluctuations.¹ In order to set the insurance premium (i.e. to value the weather derivative), it is necessary to know the probability distribution of a weather index, typically a seasonal-mean temperature measured at a specified station. One can of course simply look at historical data and obtain suitable estimates. For a number of reasons, however, it is desirable to have a stochastic time-series model which permits the use of Monte Carlo methods.

The classical approach to time-series modeling involves fitting a model of the Box-Jenkins type (see Section 4) using as few parameters as possible. It is well known, however, that these models are subject to the 'overdispersion' problem, i.e. they will underestimate the variance of seasonal means (Shea & Madden 1990, Katz & Parlange 1998). Here, we show that the problem may be overcome (at least for our particular application) by employing a generalisation of Box-Jenkins models, known as fractional ARIMA or ARFIMA (Granger & Joyeux 1980, Hosking 1981), which incorporates long memory in a natural way, using the single parameter d. These models offer particular flexibility in that they can capture both high-frequency short-memory behaviour and the long-memory tail using a minimum of parameters.

Our aims in the present study are 3-fold. Firstly, we wish to show that long memory can indeed be detected in SAT time series. We have already presented some evidence for this in Fig. 1, and more is given in Section 3. Secondly, we show that ARFIMA models can accurately and parsimoniously capture the autocorrelation structure of observed SAT time series; this is done in Section 4. Our third aim is to present an example of the application of time-series modeling to weather derivative pricing and compare the performance of ARMA and ARFIMA models having the same number of parameters (Section 5). We comment also on the fact that the autocorrelation structure contains seasonal dependency and outline its effects on the performance of the time-series models. In Section 6, we offer some suggestions as to the possible physical origin of long memory. A summary and conclusions are given in Section 7.

¹The term 'derivative' is used in finance to indicate a security whose value is based on that of another security, called the 'underlying': an example of a derivative is an 'option', whose value is based on an underlying stock price. An introduction to financial derivatives may be found in Hull (1998). In the case of weather derivatives, the underlying is a temperature or other weather index (see Section 5).

2. DATA AND PREPROCESSING

This study is based on 2 data sets. The first is the daily Central England temperature time series (Parker et al. 1992). It is representative of a roughly triangular area of the United Kingdom enclosed by Preston, London and Bristol. The time series, beginning in 1772 and ending in 1991 (222 yr, 81 084 d), is one of the longest available instrumental records of daily temperature in the world and is thus particularly suitable for examining the asymptotic properties of interest here. The second data set, prepared by Risk Management Solutions on the basis of data provided by NOAA (US National Oceanic and Atmospheric Administration), consists of daily temperature records for the last 50 yr (18 262 d) at 200 weather stations in the US. We focus on 2 stations, Chicago and Los Angeles, which are representative of stations having respectively the minimum and maximum long-memory intensity within the dataset.

In the following sections, we will apply a suite of tests to these time series designed to detect the presence of long memory. From a physical point of view, detecting long memory can only be considered interesting if it reveals something about the 'internal' workings of the climate system. Thus, before applying any tests, we must try to rid the time series of the signature of processes which are 'external' to the climate system. Processes which induce nonstationarity in the mean are particularly problematic, since these are most likely to lead to spurious detection of long memory. There are several such processes:
• Seasonality, which depends on changes in solar forcing and is therefore not internal to the climate system, will give a periodic signal in both mean and variance. It may be trivially removed by Fourier-transforming the raw SAT time series and estimating the seasonal cycle using the amplitude and phase of the Fourier coefficients corresponding to annual and sub-annual periodicities (the only harmonics that we found to give well-defined peaks in the spectrum). A similar procedure was applied to the squared SAT anomalies to estimate and remove the seasonal cycle in the variance.
• Urbanization, whereby weather stations originally in open country are gradually engulfed by nearby towns or cities and their accompanying heat island (Cotton & Pielke 1995). This leads to a strong upward trend in the time series, on the order of several degrees centigrade over the last 50 yr. It is difficult to model the exact form of the trend, which will vary from station to station. Lacking the information to do otherwise, we make the simplest choice, which is to detrend the time series using a linear least-squares fit.
• Shifts in station position or instrumentation can lead to abrupt changes in the mean of the time series; a clear example of this (and of the urbanization trend) is discussed in von Storch & Zwiers (1999, p. 9). Much effort has been devoted to eliminating these inhomogeneities in the US dataset. Where possible, station records have been consulted to establish the date of position or instrumentation changes. Unfortunately such records are often incomplete or unavailable, and in those cases a statistical procedure is adopted which attempts to locate the breaks. We did not apply this procedure to the Central England dataset.

In addition, there are smaller effects due to global warming, volcanism and changes in solar activity and stratospheric ozone which will also give secular trends; we assume (again for want of better knowledge) that their aggregate effect is also linear and is removed together with the urbanization trend.

At this point we could in principle also remove the signatures of such well-known phenomena as the El Niño/Southern Oscillation, the North Atlantic Oscillation and so on. We do not do this for 2 reasons. One is that it is technically difficult to do so in a meaningful way — for instance, one could build a regression model between an El Niño index and the SAT time series, but questions arise as to which index is the most suitable and as to the appropriateness of linear regression. The main reason, however, is that these are modes of variability 'internal' to the climate system, and are thus part and parcel of the 'interesting' signal. Some further remarks on the relation between these low-frequency forms of variability and higher frequencies are offered in Sections 6 and 7.

3. DETECTION OF LONG MEMORY: HEURISTIC METHODS

Long memory was first detected using the rescaled range or R/S statistic (Hurst 1951). Since then, a number of similar methods have been developed to detect long memory in time series (see Taqqu et al. 1995 for a list). Each method produces an estimate of d; if d > 0, we say the time series has long memory. The methods are intuitive and easy to apply, but they are 'heuristic' in that they generally do not permit confidence interval estimation (though the periodogram method, see below, is an exception). We consider 3 of these methods here; motivation for these particular choices is given below.

Table 1. Estimates of d ('intensity' of the long memory) obtained by various methods: least-squares regression on the periodogram (P), aggregated variance (AV), differenced variance (DV) and maximum likelihood estimation of an ARFIMA(1,d,1) model (ML). Where available, 95% confidence intervals are shown in parentheses

                  P             AV    DV    ML
Central England   0.16 (±0.04)  0.12  0.15  0.20 (±0.02)
Chicago           0.12 (±0.08)  0.09  0.19  0.13 (±0.04)
Los Angeles       0.29 (±0.08)  0.24  0.31  0.23 (±0.02)

3.1. Periodogram method

In Eq. (4) we defined long memory as an asymptotic property of the spectral density. Thus, a direct way to test for long memory is to compute a spectral estimator and fit a straight line (on a log-log plot) to the low-frequency end; we then have d = −b/2, where b is the slope of the line. We select this method because it is exceptional in that a procedure for confidence interval estimation is known (Beran 1994, p. 98), provided the unsmoothed periodogram is used (note that a smoothed periodogram, which is more suitable for display purposes, was used in Fig. 1). We applied the method to our 3 time series using classical least-squares regression in the frequency interval 10⁻⁴ to 10⁻¹ rad d⁻¹. Results are reported in Table 1. We can infer with 95% confidence that all 3 series display long memory. Changing the regression interval changes the numerical values somewhat but does not change this qualitative conclusion.

3.2. Aggregated variance and differenced variance methods

As mentioned in Section 1, a key property of long-memory processes is that the variance of sample means decreases slowly with sample size. In fact, it can be shown (Beran 1989) that given N data points, X_i, i = 1, …, N:

    Var[(1/N) Σ_{i=1}^{N} X_i] ≈ N^(2d−1) as N → ∞

This suggests the following method for estimating d. Divide the series into N_m blocks of size m and compute the mean for each block:

    x̄_k(m) = (1/m) Σ_{i=(k−1)m+1}^{km} X_i,   k = 1, …, N_m

and the variance of the block means:

    s²(m) = [1/(N_m − 1)] Σ_{k=1}^{N_m} [x̄_k(m) − x̄]²
where x̄ indicates an overall mean. Now a log-log plot of s²(m) against m should yield a straight line with slope 2d − 1. This is known as the aggregated variance method.

A shortcoming of this and other heuristic estimators is that inhomogeneity in the data (see Section 2) can produce a positive value of d even in the absence of long memory. A modification of the above method, known as the differenced variance method, avoids this problem. The idea is to study the first-order difference of the above variances:

    ∆s²(m) = s²(m + 1) − s²(m)

It can be shown (Teverovsky & Taqqu 1997) that a log-log plot of this quantity against m will again asymptotically produce a straight line with slope 2d − 1, but this time the value of d will be free from the effects of inhomogeneity in the mean.²

We have applied both of the above procedures to our 3 time series using 50 logarithmically equispaced values of m. The results are plotted in Fig. 2. Given the considerable scatter in the points at high m, we use robust fitting in the interval 30 < m < 5000 d to estimate d. Estimated values are given in Table 1. We note that the values of d estimated with the differenced variance method are all greater than those obtained with the aggregated variance, so that inhomogeneity does not seem to play a role. We note also that these estimates are all consistent (within the 95% confidence interval) with the periodogram estimates. Again, the values obtained here are sensitive to the choice of regression interval, but not so much as to invalidate the above qualitative conclusions.

²There is an interesting parallel between the problem addressed here and those faced in the study of long-range correlations in DNA nucleotides. These were originally studied using the 'fluctuation analysis' method (Peng et al. 1992), which is almost identical to the aggregated variance method. It was later objected that the long-range correlations detected could be trivial due to the well-known 'patchiness' or inhomogeneity of DNA. To get around this, 'detrended fluctuation analysis' was devised (Peng et al. 1994), which is insensitive to the inhomogeneity and is in this sense analogous (though quite different in detail) to the differenced variance method.

4. MODELING SAT TIME SERIES

The classical approach to modeling discrete time series is through the use of autoregressive integrated moving average (ARIMA) models, popularised by Box & Jenkins (1970). In their notation, they take the general form:

    φ(B)(1 − B)^d X_i = ψ(B)ε_i    (5)

Here, ε is a white noise process and B is the backstep operator:

    BX_i = X_{i−1}

while φ(B) and ψ(B) are polynomials in B:

    φ(B) = a_0 + a_1 B + a_2 B² + …
    ψ(B) = b_0 + b_1 B + b_2 B² + …

which represent the autoregressive and moving average parts respectively. The term (1 − B)^d, where d is zero or a positive integer, represents a finite-difference time derivative of order d; the interpretation is that one should first compute the process with d = 0, and then sum (integrate) it d times to obtain the full process. ARIMA processes are non-stationary for d ≥ 1. Since the processes we are dealing with here are stationary, we need only consider the case d = 0. Then, if φ and ψ are of degree p and q respectively, the process is denoted ARMA(p,q), or AR(p) if q = 0.

The autocorrelation function of an ARMA(p,q) process will decay exponentially to zero for lags greater than max(p,q). For this reason, they are unsuitable for modeling long-memory time series; in order to obtain appreciable correlation at, say, Lag 100, it is necessary to use an ARMA process of order 100, which means fitting 100 parameters. This contravenes the general principle of parsimony (Box & Jenkins 1970), according to which one should always seek a model which will adequately fit the data with the bare minimum of parameters.

It turns out that daily temperature time series can be accurately and parsimoniously fitted using a more general class of stochastic process known as fractional ARIMA or ARFIMA models. They are defined exactly as in Eq. (5), except that now 0 < d < 1/2; it can be shown (Granger & Joyeux 1980) that in this range the models are stationary and have the long-memory property with intensity d. For d ≥ 1/2, the models are non-stationary. To make sense of the fractional differencing, the operator is formally expanded as a power series:

    (1 − B)^d = Σ_{k=0}^{∞} [Γ(d + 1) / (Γ(k + 1)Γ(d − k + 1))] (−1)^k B^k    (6)

Thus, an ARFIMA(p,d,q) model is equivalent to an ARMA(∞,q) model while using only p + q + 1 parameters. The presence of autoregressive and moving-average components allows these models to capture the short-memory high-frequency behaviour, while the slow decay of the coefficients in Eq. (6) controls the long-term behaviour. This allows ARFIMA time series to accurately model long-memory processes while using only a small number of parameters.
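The slow decay of the coefficients in Eq. (6) can be checked directly. A sketch using the standard recursion for the binomial coefficients of (1 − B)^d; the value d = 0.2 is an illustrative choice, not one of the paper's fitted values:

```python
# Coefficients pi_k in (1 - B)^d = sum_k pi_k B^k, via the recursion
# pi_k = pi_{k-1} * (k - 1 - d) / k, which is equivalent to the
# Gamma-function form of Eq. (6).
d = 0.2          # illustrative value in the stationary range 0 < d < 1/2
K = 4000

pi = [1.0]
for k in range(1, K + 1):
    pi.append(pi[-1] * (k - 1 - d) / k)

# The coefficients decay algebraically, |pi_k| ~ k^(-1-d): doubling the lag
# shrinks the coefficient by a fixed factor 2^(-1-d), not by an e-folding.
ratio = pi[2000] / pi[1000]
print(ratio, 2.0 ** (-1.0 - d))
```

Truncating this expansion at a few thousand lags is what makes an ARFIMA(p,d,q) model behave like a very high order ARMA model while carrying only p + q + 1 fitted parameters.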
Fig. 2. Aggregated variance and differenced variance for the 3 datasets (detrended and deseasonalised in mean and variance) as a function of the sample size m (see Section 3). The thick solid line shows a robust fit to the data points. The thin line, with slope −1, is shown for comparison.

Fitting an ARFIMA model to data is somewhat more involved than with ARMA models. Exact maximum likelihood (ML) estimation is possible but prohibitively expensive from a computational viewpoint for the length of the time series and practical applications we are considering here. Fortunately, a number of efficient approximate ML methods have been developed (see review by Beran 1994). ML methods naturally permit estimation of confidence intervals for the fitted parameters, and hence provide an alternative and more rigorous method for detection of long-range dependence than the heuristic methods discussed in Section 3.

Here we use the approximate ML method proposed by Haslett & Raftery (1989) and available in the 'fracdiff' package of the 'R' statistical computing environment. We used these routines to fit an ARFIMA(1,d,1) model to our 3 data series. This particular ARFIMA model was selected by trial and error. Models with fewer than 2 ARMA parameters all give a considerably worse fit (as assessed by comparing the model and data autocorrelation functions), while those with more than 2 give no significant improvement in the fit; amongst those with 2 ARMA parameters, ARFIMA(1,d,1) gave the best fit in the 3 cases studied here. Results are shown in Table 1. The results again indicate that long memory is present in all 3 series at the 95% confidence level. We note also that the ML values are compatible with those obtained using the periodogram, within the respective uncertainties.

To show how significant an improvement ARFIMA is over ARMA, we compare the autocorrelation functions of the historical data to those of ARFIMA(1,d,1), AR(3) and AR(20) models fitted to the time series (note that AR models fitted to a given series generally have autocorrelations which decay more slowly than MA or ARMA models with the same number of parameters and are hence better suited to time series with persistent correlations). Results are shown in Fig. 3. In all 3 time series, the ARFIMA model gives a much better fit to the autocorrelation structure than the other models, even at high lags. There is of course some scatter in the observed autocorrelation, but the ARFIMA model appears to give a more-or-less unbiased fit (though with some overestimation in the Central England case). The AR(3) model, on the other hand, quickly drops to zero and substantially underestimates the correlation at lags higher than a few days. The AR(20) model closely follows the observed autocorrelation function up to a lag of 20 d, as expected, but then drops off rapidly, again consistently underestimating persistence at higher lags.
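The contrast seen in Fig. 3 can be reproduced with the exact autocorrelation of a pure fractionally integrated process, which obeys ρ_k = ρ_{k−1}(k − 1 + d)/(k − d) for ARFIMA(0,d,0), compared against an AR(1) model matched to the same lag-1 correlation. Here d = 0.2 is an illustrative value of the same order as the Table 1 estimates, not one of the paper's fitted models:

```python
d = 0.2     # illustrative long-memory intensity
K = 100

# Exact ACF of ARFIMA(0,d,0): rho_0 = 1, rho_k = rho_{k-1} * (k-1+d)/(k-d)
rho_frac = [1.0]
for k in range(1, K + 1):
    rho_frac.append(rho_frac[-1] * (k - 1 + d) / (k - d))

# AR(1) model with the same lag-1 correlation
phi = rho_frac[1]
rho_ar1 = [phi ** k for k in range(K + 1)]

# At lag 100 the fractional process retains percent-level correlation,
# while the matched AR(1) correlation is numerically indistinguishable from 0.
print(rho_frac[100], rho_ar1[100])
```

This is the quantitative content of the AR(3)/AR(20) comparison above: any finite-order AR fit eventually decays exponentially and so must underestimate persistence at seasonal lags.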
Fig. 3. Autocorrelation functions for the (a) Central England, (b) Chicago and (c) Los Angeles SAT anomaly time series, showing the observed data and fits for the ARFIMA(1,d,1), AR(3) and AR(20) models.

5. PRICING WEATHER DERIVATIVES: ARMA VERSUS ARFIMA

5.1. What is a weather derivative?

As noted in Section 1, weather derivatives are a form of insurance, or in financial terms an instrument which a company can use to hedge its exposure to fluctuations in weather. They are a rather recent development, dating back only to 1997, and are not yet widely known in the meteorological literature (an exception being Zeng 2000).

The basic concept may be understood through a simple example. A company selling gas for heating may want to protect itself against losses due to anomalously mild conditions the following winter. It can then buy an insurance policy (the weather derivative) from a financial institution which agrees to pay out a certain amount if the winter should indeed turn out to be mild. Note that the company thereby reduces its losses in adverse circumstances, but incurs the cost of the insurance premium, thus reducing its gains if the winter turns out to be harsh. The purpose of weather derivatives is to smooth out the temporal fluctuations in the company's revenues. There are a number of financial and commercial reasons why this is beneficial.

Two ingredients are necessary in order to structure a weather derivative contract such as the above: a weather index and a payout function. The weather index, denoted I, characterises the weather over a certain period, called the 'strike' period. It is most commonly based on temperature (though rain- and snowfall and hours of sunshine have also been used). Rather than use raw temperature, it is customary to use heating or cooling degree days (HDDs or CDDs), defined as:

    HDD_i = max(T* − T_i, 0)    (7)
    CDD_i = max(T_i − T*, 0)    (8)

where the subscript i indicates a specific day, T_i is the average temperature measured on that day, and T* is a fixed reference temperature (65°F in the US, 18°C in Europe). In the example above, the index might be defined as the total number of HDDs over the winter season:

    I = Σ_{i=i_f}^{i_f+N−1} HDD_i

where i_f is the first day of the season and N is its length.

The payout function, Q(I), determines how much the financial institution will pay out for a given index outcome. Referring to the example given previously, we expect no payout if the winter is harsh (high I) and an increasing payout the milder the winter (low I). A typical payout structure has the form shown in Fig. 4, featuring a zero-payout threshold and a linear increase in payout below the threshold. The position of the threshold and the slope of the linear part must be agreed on by the 2 parties entering the contract.

Once the definition of the index and the form of the payout function have been stipulated, the actual payout from the contract depends only on the final index value, which itself depends on the weather. What is less clear is how much the financial institution should charge the company, i.e. how to price the weather derivative. The subject of derivative pricing is a complex one and has received much attention over the past decades. For reasons outside the scope of this paper, the usual way to price a weather derivative is to set:

    S = E[Q] + R    (9)
where S is the price of the derivative. The first term is simply the mean payout:

    E[Q] = ∫₀^∞ Q(I)P(I) dI

P(I) being the probability distribution function (pdf) of the index. The second term, R, is a 'risk premium' — a positive correction which compensates the financial institution for the risk taken in selling the derivative. We will not dwell on its specific form here, but note that it too in general will depend on P(I).

Fig. 4. Plot of a typical payout function Q(I). Overlaid is a hypothetical Gaussian index probability distribution function (pdf), P(I).

5.2. Why use a time-series model?

We saw above that to price a weather derivative accurately, it is essential to have a good estimate of P(I), or at least of E[Q]. There are several possible approaches:
• 'Burn' analysis, which involves evaluating Q(I) for historical values of I and directly computing E[Q].
• Direct modeling of the index distribution, which involves fitting a parametric or non-parametric distribution to the historical values of I.
• Daily modeling (the subject of this paper), which involves fitting a time-series model to the daily data and then using the model to generate a very large sample of synthetic I values.

In current practice, the first 2 methods are by far the most commonly used. Indeed, the daily modeling approach is somewhat more complicated to implement, and there is widespread mistrust of it because of the overdispersion problem. Why then bother with the daily model at all? Firstly, we note that there is no a priori reason why a 'good' daily model (i.e. one free from the overdispersion problem) should be any less accurate than the other methods. In fact, in cases where the daily temperature often falls below or above the degree-day threshold, T* (see Eqs. 7 & 8), the daily model will make more efficient use of the available data and is likely to be more accurate.

Further, there is at least 1 situation in which daily modeling is clearly the method of choice: the pricing of derivatives within the strike period, a procedure known in financial jargon as 'marking to market'. During the strike period, a new temperature observation becomes available each day, and one can use this information to make a better estimate of the final value of I. One can also make use of deterministic weather forecasts to project information up to 10 d into the future (see, for instance, Jewson 2000). Since SAT time series are highly autocorrelated, it clearly makes sense to use an appropriate time series model to project available information even further into the future, out to the end of the strike period. Accurate marking to market is important because many derivatives are actually most heavily traded during the strike period, and all parties involved in derivatives trading must keep track on a day-by-day basis of their total exposure and risk.

5.3. Factors influencing the price of the derivative

The gross features of P(I), namely its location and width, are captured by its first 2 moments, the mean µ_I and the variance σ_I². It is clear from Fig. 4 that, if either the mean is overestimated or the variance underestimated, the derivative will be underpriced. A financial institution trading contracts with such incorrect pricing will, over time, lose money.

Let us then consider what aspects of daily temperature variability influence the values of µ_I and σ_I². For the sake of this discussion, we assume that the temperature over the winter period is always below T*. We can then write:

    HDD_i = T* − T_i = T* − E[T_i] − T'_i

where E[·] indicates an expected value and we define the temperature anomalies T'_i ≡ T_i − E[T_i]. Then:

    µ_I = E[I] = NT* − Σ_{i=i_f}^{i_f+N−1} E[T_i]    (10)

Thus, to obtain an accurate estimate of the mean index value, µ_I, we need only an accurate estimate of the seasonal cycle, E[T_i].

The situation is more complicated for the variance. We have:
    σI² = E[(I − µI)²] = E[( Σ_{i=i_f}^{i_f+N−1} T'i )²] = σT² Σ_{i,j=i_f}^{i_f+N−1} ρi,j

where σT² is the variance and ρi,j the autocorrelation function of the temperature anomalies. Assuming the process is stationary, the autocorrelation will only depend on the lag k; we then have:

    σI² = σT² [ N + 2 Σ_{k=1}^{N} (N − k) ρk ]    (11)

Thus, to estimate the variance of I accurately, we need to estimate not only the variance of the temperature anomalies, but also their autocorrelation structure up to lag N. If the autocorrelation is underestimated, so too will the index variance be. In summary, the minimum requirements for a suitable time-series model are that it should correctly capture the seasonal cycle, the anomaly variance, and the anomaly autocorrelation structure out to lags of a season.

5.4. Comparing the performance of ARMA and ARFIMA

Here we compare the skill of best-fit ARFIMA(1,d,1) and AR(3) models in simulating the index pdf; this is essentially a test of how well they will fare in pricing a derivative based on this index. The comparison is fair in the sense that the models use the same number of parameters. We approach the problem by making the null hypothesis that the model is perfect (that is, that the historical data are actually generated by the model itself) and then trying to reject the hypothesis. We will see that the hypothesis is much more easily rejected for AR(3) than for ARFIMA(1,d,1).

In practice, we proceed as follows: The historical data provide us with 50 values of I; we use these to obtain the historical sample mean, µ̂HI. We now generate a very large number of index values using the fitted model and compute a close approximation to µMI, the model population mean. Let us set ∆ = µMI − µ̂HI. If ∆ is large, then we should reject the model. To establish a rejection threshold, we need an estimate of the sampling fluctuations. This we obtain by dividing up the large model sample into 50-member sub-samples. Each sub-sample provides a sample mean, µ̂MI, and we can use these to compute a value, ∆99, such that only 1% of the sub-samples gives |µMI − µ̂MI| > ∆99. Thus, we can reject the model with 99% confidence if ∆ > ∆99. We can proceed analogously for σI² and indeed for any parameter we wish.

We restrict the test to the Chicago and Los Angeles stations, which are relevant for real-world derivative pricing. For Chicago, we consider a 5 mo HDD contract spanning November−March; for Los Angeles, we consider a 5 mo CDD contract spanning May−September. We generated 2 × 10^5 index values to perform the test. Note that the models are fitted to deseasonalised data; before computing synthetic index values, we add the estimated seasonal cycle in mean and variance to the model output.

Results are reported in Table 2. In the case of µI, modeled and historical values are in good agreement at both stations for both models. In view of Eq. (10), this essentially means that the seasonal cycle is well estimated. Note that ∆ values are almost identical for both models, as they should be. Neither model can be rejected at either of the stations.

Results for σI² are much more critical. The AR model can be rejected at both stations. Note also that ∆ is always negative, which means that the model is underestimating the index variance; this is what we expect from Eq. (11), given the systematic underestimate of the autocorrelation (Fig. 3). The ARFIMA model fares much better at Chicago. It also performs better in Los Angeles (it gives a smaller ∆), but it can be rejected at the 99% confidence level. The reasons for this are discussed below.
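The role of the autocorrelation in Eq. (11) can be illustrated with a short numerical sketch. This is our own illustration, not part of the paper's analysis: we take a plain AR(1) anomaly process with an arbitrarily chosen coefficient and check the analytic index variance of Eq. (11) against direct simulation.

```python
import numpy as np

# Illustration of Eq. (11): the variance of a summed degree-day index
# depends on the anomaly autocorrelation out to lag N. All parameter
# choices here (phi, N, number of seasons) are ours, for illustration only.
rng = np.random.default_rng(0)

phi = 0.8        # hypothetical AR(1) coefficient for daily anomalies
N = 90           # days in the contract period
sigma_T2 = 1.0   # anomaly variance (normalised to 1)

# Analytic index variance, Eq. (11): sigma_I^2 = sigma_T^2 [N + 2 sum_k (N - k) rho_k]
k = np.arange(1, N)
rho_k = phi ** k                      # AR(1): rho_k = phi^k
var_I_eq11 = sigma_T2 * (N + 2.0 * np.sum((N - k) * rho_k))

# Direct check: simulate many independent N-day seasons of the same AR(1)
# process (started from its stationary distribution) and measure the
# variance of the summed anomalies.
M = 20000
eps_sd = np.sqrt(1.0 - phi ** 2)      # innovation s.d. giving unit anomaly variance
x = rng.normal(size=M)                # stationary initial values, variance 1
sums = np.zeros(M)
for t in range(N):
    sums += x
    x = phi * x + rng.normal(scale=eps_sd, size=M)
var_I_sim = sums.var()

print(var_I_eq11, var_I_sim)
# Ignoring the autocorrelation entirely would give sigma_T^2 * N = 90,
# a severe underestimate -- the same failure mode as an underdispersive model.
```

The two estimates agree closely, and both are far above the value obtained by treating the daily anomalies as independent, which is the sense in which an underestimated autocorrelation translates directly into an underpriced contract.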

Table 2. Mean index value, µI, computed from observations and from 2 models. For Chicago (Los Angeles), a 5 mo HDD (CDD) contract covering the period November−March (May−September) is considered. Units everywhere are Fahrenheit degree-days. ∆ indicates the difference between the historical and modeled value, as a percentage of the latter. ∆99 indicates a 99% confidence level for ∆ (that is, only 1% of model-produced 50-member samples will give |∆| > ∆99)

                 Historical   ARFIMA(1,d,1)           AR(3)
                 µ̂HI          µMI     ∆      ∆99      µMI     ∆      ∆99
Chicago          5032         5036    −0.1   2.6      5036    −0.1   2.1
Los Angeles      438          457     −4.1   10.3     457     −4.0   6.3

                 σ̂HI          σMI     ∆      ∆99      σMI     ∆      ∆99
Chicago          359          360     −1     24       280     −270   26
Los Angeles      166          126     −32    20       80      −107   25
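The sub-sampling rejection test reported in Table 2 can be sketched in a few lines. This is a minimal stand-in version: synthetic Gaussian index values play the role of model output, and every number below is our own illustrative choice rather than a value from the paper.

```python
import numpy as np

# Sketch of the sub-sampling test behind Table 2, with synthetic Gaussian
# index values standing in for model output (all numbers are illustrative).
rng = np.random.default_rng(1)

mu_model, sd_model = 5000.0, 350.0
big_sample = rng.normal(mu_model, sd_model, size=200000)   # "2 x 10^5 index values"
mu_M = big_sample.mean()                                   # ~ model population mean

# Split into 50-member sub-samples and find Delta_99: the threshold
# exceeded by only 1% of sub-sample means.
sub_means = big_sample.reshape(-1, 50).mean(axis=1)
delta_99 = np.quantile(np.abs(sub_means - mu_M), 0.99)

def rejected(historical_sample):
    """Reject the model at 99% confidence if |mu_M - historical mean| > Delta_99."""
    return abs(mu_M - np.mean(historical_sample)) > delta_99

# A 50-value 'historical' record actually drawn from the model should
# almost always pass; one drawn from a shifted distribution should fail.
consistent = rng.normal(mu_model, sd_model, size=50)
shifted = rng.normal(mu_model + 500.0, sd_model, size=50)
print(rejected(consistent), rejected(shifted))
```

The same machinery applies unchanged to the variance or any other parameter: replace the sub-sample means with sub-sample variances and recompute the threshold.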

Fig. 5. q-q plots of the observed against the modeled index quantiles (dots) for (a,b) Chicago and (c,d) Los Angeles using the
ARFIMA(1,d,1) model (a,c) and the AR(3) model (b,d). Solid (dotted) curves show 99% (95%) confidence envelopes. The central
solid line is a diagonal for comparison. Units are degree days in Fahrenheit
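The pointwise confidence envelopes of Fig. 5 can be sketched non-parametrically as follows. Again this is our own stand-in illustration with synthetic standard-normal data; the quantile levels and sample sizes are arbitrary choices.

```python
import numpy as np

# Sketch of the non-parametric version of the test (cf. Fig. 5): quantiles
# of 50-member model sub-samples give pointwise confidence envelopes
# against which the historical quantiles are compared. Synthetic data.
rng = np.random.default_rng(2)

model_sample = rng.normal(0.0, 1.0, size=100000)
sub_samples = model_sample.reshape(-1, 50)          # 2000 sub-samples of 50

probs = np.linspace(0.05, 0.95, 19)                 # quantile levels to compare
sub_q = np.quantile(sub_samples, probs, axis=1)     # shape (19, 2000)

# Model population quantiles (x-axis) and 99% envelope at each level
pop_q = np.quantile(model_sample, probs)
lo, hi = np.quantile(sub_q, [0.005, 0.995], axis=1)

historical = rng.normal(0.0, 1.0, size=50)          # a consistent 'historical' record
hist_q = np.quantile(historical, probs)

outside = np.sum((hist_q < lo) | (hist_q > hi))
print(outside)   # few or no quantiles should leave the 99% envelope
```

Plotting hist_q against pop_q, together with lo and hi, reproduces the structure of the q-q panels: points hugging the diagonal indicate agreement, while systematic excursions beyond the envelope indicate model failure.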

The Monte Carlo test can also be applied non-parametrically by computing a sample cumulative distribution function (CDF) for each of the 50-member sub-samples and estimating confidence intervals for each of the CDF quantiles. The results of this procedure are displayed in q-q form in Fig. 5. In this figure, the dots indicate (x,y) pairs, where x is the model population-mean quantile and y the historical data sample quantile. If the 2 CDFs were identical, the dots would lie along the diagonal. The solid (dotted) lines above and below the diagonal indicate 99% (95%) confidence intervals for each quantile. The figure confirms the results obtained above. The ARFIMA model generally does better than the AR at both stations, the dots being more closely aligned with the diagonal. However, both models fail in Los Angeles, with several dots exiting the 99% confidence envelope and a pronounced systematic difference between the modeled and observed CDFs.

5.5. Seasonality in the autocorrelation structure and its effects

We saw in the previous section that while the ARFIMA(1,d,1) model generally performs better than the AR(3), it still fails in Los Angeles, despite the close fit to the observed autocorrelation function displayed in Fig. 3b. It turns out that one major reason for this is that the model was fitted to the entire data series. This is only correct if the autocorrelation of the data is stationary. As Fig. 6 clearly shows, this is not the case. The curves in the figure were obtained by computing the autocorrelation function separately for each summer (winter) and then averaging over all summers (winters). There is clearly much greater persistence during summer than winter. The physical reasons for this are not clear; we may speculate that during summer the high-frequency 'weather' activity is much lower, so that the low-frequency variability (plausibly of oceanic origin,
given the coastal location of this station) accounts for a greater fraction of the total variance.

From our applied point of view, the consequence is that if we fit the model to the 'all year' data set, we will considerably underestimate the autocorrelation during summer and hence, by Eq. (11), the variance of a summer-based index. This is exactly what is observed in the results of the previous section. We note that in Chicago the autocorrelation structure varies much less from season to season (not shown), so the issue is not so critical there.

Fig. 6. Seasonal variation of the autocorrelation structure of observed SAT in Los Angeles: summer and winter average autocorrelations. The all-year autocorrelation function is replotted from Fig. 3b

A simple 'fudge' to correct matters is to fit the model separately to summer data. This was done by extracting summer (May−September) data from each year, and then fitting the model to the single time series obtained by juxtaposing all the summers. The results of this operation are displayed in Fig. 7. We see that the ARFIMA model now performs reasonably well, with only 1 point in the tail of the distribution falling outside the 99% confidence envelope. The AR(3) model, on the other hand, continues to perform poorly, giving a significant underestimate of the index variance. In any event, we stress that this is only a temporary solution; a more satisfactory one would be to fit the entire data series using seasonally varying parameters. We are currently investigating techniques for doing this.

Fig. 7. q-q plots of the observed against the modeled index quantiles (dots) for Los Angeles using the (a) ARFIMA(1,d,1) and (b) AR(3) models fitted to summer data only

6. SOME REMARKS ON THE ORIGIN OF LONG MEMORY IN SAT

In Section 1, we saw that simple physical considerations based on the behaviour of mid-latitude synoptic-scale eddies led to the Ornstein-Uhlenbeck process, which successfully accounts for the high-frequency behaviour of SAT but underestimates the low-frequency variance. We then showed (Section 4) that ARFIMA models can successfully capture both the high- and low-frequency behaviour. However, ARFIMA models are mathematical tools which, though useful for applications, have no immediate physical interpretation. That is, they do not provide a direct
answer to the basic physical question of why there is long memory in SAT time series.

The adequacy of the Ornstein-Uhlenbeck process at high frequency suggests we are on the right track in modeling the effect of high-frequency transients. It is then plausible to attribute the 'extra' variance at low frequency to other, slower processes which also affect SAT. The identification and analysis of mechanisms for the low-frequency atmospheric variability has been a central theme of climate research over the past 2 decades, and is still ongoing. A few of the candidates are regime-like behaviour in the atmosphere (Hansen & Sutera 1995), local or remote oceanic forcing (Wallace & Gutzler 1981), and atmosphere-land surface interaction (Manabe & Stouffer 1996).

How can we incorporate the slow mechanisms into the model? The simplest option (Granger 1980) is to assume that each mechanism affecting SAT can be modeled as an AR(1) process with a certain value of α and that all mechanisms act independently of each other. The final SAT process is then just the sum of a certain number N of AR(1) processes:

    x_i = Σ_{n=1}^{N} y_i^(n)    (12)

where y_i^(n) are the individual AR(1) processes. It can be shown that, if the coefficients of the AR(1) processes are appropriately chosen, then as N → ∞ the aggregate process converges to one having the long-memory property (Eq. 4).

In the real atmosphere there is presumably only a finite number of relevant processes affecting SAT. Thus, N < ∞, and the aggregate time series will not strictly speaking have the long-memory property: the spectrum will not behave as a power law all the way to the origin but will eventually flatten out to a constant. However, the cross-over may occur at very low frequency and may not be detectable given the finite-length data series available. We illustrate this point with a specific example. We consider 3 independent AR(1) processes with parameter values 0.82, 0.95 and 0.999 (corresponding to time scales of 6, 20 and 1000 d) and noise variances 1, 0.3 and 0.01 respectively. We generate 10^5 d long series with each model, compute the aggregate series (12) and take its power spectrum. The result, shown in Fig. 8a, has the same qualitative features as the observed spectra reported in Fig. 1: a high-frequency part with slope close to −2 crossing over to a shallower slope at low frequencies. The periodogram test (Section 3.1) applied to this series indicates that long memory is present with intensity d = 0.12 ± 0.07. Note that the process parameters employed here have been selected arbitrarily for illustrative purposes only. It may be possible to devise an appropriate parameter-selection algorithm giving an optimal fit to any given time series, but we do not pursue this issue here.

One might argue that it is unrealistic to assume that the various processes affecting SAT occur independently of each other. In fact, a large part of the variability is actually generated by the coupling of the various parts of the climate system. For instance, much of the variability in mid-latitude oceanic temperatures can be attributed to stochastic forcing by atmospheric transients (Frankignoul 1995). The simplest way to model such interacting processes is with a multivariate AR(1) model:

Fig. 8. Power spectral density estimates for synthetic data produced by (a) the aggregation of 3 independent AR(1) processes and (b) a trivariate AR(1) process. See Fig. 1 and Section 6 for details
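Before turning to the coupled case, the independent-aggregation experiment just described can be sketched as follows. We reuse the parameter values quoted above; the lag-100 autocorrelation comparison, however, is our own way of exposing the slow decay, not the paper's periodogram test.

```python
import numpy as np

# Sketch of the aggregation argument (Eq. 12): the sum of 3 short-memory
# AR(1) processes, with the parameter values quoted in the text
# (phi = 0.82, 0.95, 0.999; noise variances 1, 0.3, 0.01), retains
# substantial autocorrelation at lags where the fastest component has
# none, mimicking long memory over the observable range.
rng = np.random.default_rng(4)

n = 100000                       # 10^5 d, as in the text
phis = [0.82, 0.95, 0.999]
noise_sd = [1.0, np.sqrt(0.3), np.sqrt(0.01)]

components = []
for phi, sd in zip(phis, noise_sd):
    y = np.zeros(n)
    eps = rng.normal(scale=sd, size=n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + eps[t]
    components.append(y)
aggregate = np.sum(components, axis=0)   # Eq. (12)

def acf(x, lag):
    """Biased sample autocorrelation at a single lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# At lag 100 the phi = 0.82 component has forgotten its past
# (0.82^100 is of order 10^-9), but the aggregate has not.
print(acf(components[0], 100), acf(aggregate, 100))
```

Taking the periodogram of `aggregate` would reproduce the qualitative shape of Fig. 8a: a steep high-frequency part inherited from the fast component, flattening toward a shallower slope at low frequencies where the slow components dominate.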
    x_i = A x_{i−1} + B e_i    (13)

where x_i = (x_{1i}, x_{2i}, …, x_{ni}), e_i = (ε_{1i}, ε_{2i}, …, ε_{ni}) is a vector of n independent unit-variance white noise processes and A and B are n × n matrices. Let us again consider a specific example. We take:

    A = ( 0.82   0.25   0.085 )      B = ( 1  0  0 )
        ( 0.03   0.9    0.0   )          ( 0  0  0 )    (14)
        ( 0.001  0.0    0.998 )          ( 0  0  0 )

Only x_1, which we may think of as representing high-frequency atmospheric transients, is directly forced by external white noise. Components x_2 and x_3 are essentially AR(1) processes with long decay time scales, stochastically forced by the 'atmosphere' x_1; we may think of them as representing slow components of the climate system such as the land surface and the ocean. These slow components in turn feed back onto the atmosphere through the top-row elements in A. The particular values of the entries in A are entirely arbitrary and serve only for illustration. We generate a 10^5 d long run of the process and compute the power spectrum of the atmospheric component only. The result (Fig. 8b) again looks quite similar to observations. The periodogram test gives d = 0.23 ± 0.07. Again, any value of d is obtainable by manipulating A.

7. SUMMARY AND DISCUSSION

We have analysed 3 multidecadal daily SAT time series representative of conditions in the northern mid-latitudes. We have applied a number of tests to detect the presence of long memory; these indicate that long memory is indeed present in all 3 time series. We have shown that an ARFIMA(1,d,1) model gives a qualitatively unbiased fit to the autocorrelation structure of the data out to lags of a season or more, while AR models even of high order give a considerable underestimate of the high-lag autocovariance. For the reasons outlined in Section 5.3, ARFIMA models are therefore much better suited to weather-derivative pricing than AR models employing a similar number of parameters.

Finally, we have shown (Section 6) how spectra with the same qualitative features as those of observed SAT (Fig. 1) can be generated by simple aggregation of several short-memory processes which may be independent or coupled. We note that, in the coupled case, our argument is very close to the stochastic climate model of Hasselmann (1976). The reasoning behind the stochastic climate model is as follows: The atmosphere, if left to its own devices, will produce a spectrum which is white for time scales longer than about 10 d. Atmospheric variability will act as a source of stochastic forcing on the ocean, due to fluctuations in wind stress and heat flux. Sea-surface-temperature (SST) perturbations have a typical decay rate on the order of several months. This scale separation allows the atmospheric forcing of the ocean to be modeled as a white noise. Thus the SST can be modeled as an Ornstein-Uhlenbeck process with an intrinsic time scale of several months. In Section 6, we suggested that the variability induced in the ocean may in turn feed back onto the atmosphere and contribute to its low-frequency variability, thus reddening the low-frequency tail of the atmospheric spectra. While this is quite plausible in the Tropics, where atmosphere-ocean coupling is strong, it is less obvious in the mid-latitudes, where the coupling is much weaker. Feedback of mid-latitude SST variability onto the atmosphere has, however, been documented in a GCM (general circulation model) by Rodwell et al. (1999). This explanation for the appearance of long memory in atmospheric time series is similar to that suggested in Tsonis et al. (1999). Other explanations, relying on internal dynamics of the atmospheric boundary layer, have been suggested in the literature (Jánosi & Vattay 1992, Pelletier 1997). Further work is needed to decide among these alternatives.

We summarise our main conclusions as follows:
• Long memory of moderate intensity (d ~ 0.1 to 0.25) can be detected in the 3 mid-latitude SAT time series studied here with 95% statistical confidence;
• A simple explanation for the apparent presence of long memory in these time series is that SAT is simultaneously affected by a number of physical processes, each of short memory but with widely disparate intrinsic time scales;
• ARFIMA models with only 3 parameters give an excellent fit to SAT time series;
• ARFIMA models are suitable for pricing weather derivatives, provided care is taken to account for seasonality in the autocovariance structure.

Acknowledgements. R.C. was supported by Dansk Grundforskningsfond. The Central England temperature data were obtained through the Data Support Section of the National Center for Atmospheric Research (http://dss.ucar.edu/datasets/ds825.0/data). The 'R' environment is freely available at http://cran.stat.wisc.edu. All figures were prepared using Ferret software, freely available at: http://ferret.wrc.noaa.gov.

LITERATURE CITED

Beran J (1989) A test of location for data with slowly decaying serial correlations. Biometrika 76:261–269
Beran J (1994) Statistics for long-memory processes. No. 61, Monographs on Statistics and Applied Probability. Chapman & Hall, New York, CRC, Boca Raton, FL
Bloomfield P (1992) Trends in global temperature. Clim Change 21:1–16
Box GEP, Jenkins GM (1970) Time series analysis, forecasting and control. Holden-Day, San Francisco
Cotton WR, Pielke RA (1995) Human impacts on weather and climate. Cambridge University Press, Cambridge
Frankignoul C (1995) Climate spectra and stochastic climate models. In: von Storch H, Navarra A (eds) Analysis of climate variability: applications of statistical techniques. Springer Verlag, Berlin, p 139–157
Granger CWJ (1980) Long memory relationships and the aggregation of dynamical models. J Econometr 14:227–238
Granger CWJ, Joyeux R (1980) An introduction to long-range time series models and fractional differencing. J Time Ser Anal 1:15–30
Hansen AR, Sutera A (1995) The probability density distribution of planetary-scale atmospheric wave amplitude revisited. J Atmos Sci 52:2463–2472
Haslett J, Raftery AE (1989) Space-time modelling with long memory dependence: assessing Ireland's wind power resource. Appl Stat 38:1–50
Hasselmann KF (1976) Stochastic climate models. Part I: Theory. Tellus 28:473–484
Hosking JRM (1981) Fractional differencing. Biometrika 68:165–176
Hull JC (1998) Introduction to futures and options markets, 3rd edn. Prentice-Hall International, New York
Hurst HE (1951) Long-term storage capacity of reservoirs. Trans Am Soc Civil Eng 116:770–799
Jánosi IM, Vattay G (1992) Soft turbulent state of the atmospheric boundary layer. Phys Rev A 46:6386–6389
Jewson S (2000) Use of GCM forecasts in financial-meteorological models. In: Proceedings of the 25th Annual Climate Diagnostics and Prediction Workshop. US Department of Commerce, Washington, DC
Katz RW, Parlange MB (1998) Overdispersion phenomenon in stochastic modeling of precipitation. J Clim 11:591–601
Koscielny-Bunde E, Bunde A, Havlin S, Goldreich Y (1996) Analysis of daily temperature fluctuations. Physica A 231:393–396
Manabe S, Stouffer RJ (1996) Low-frequency variability of surface air temperature in a 1000-year integration of a coupled atmosphere-ocean-land surface model. J Clim 9:376–393
Mandelbrot BB, Wallis JR (1969) Some long-run properties of geophysical records. Water Resour Res 5:321–340
Montanari A, Rosso R, Taqqu MS (1996) Some long-run properties of rainfall records in Italy. J Geophys Res 101:29431–29438
Parker DE, Legg TP, Folland CK (1992) A new daily Central England temperature time series, 1772–1991. Int J Climatol 12:585–596
Peixoto JP, Oort AH (1992) Physics of climate. American Institute of Physics, New York
Pelletier JD (1997) Analysis and modeling of the natural variability of climate. J Clim 10:1331–1342
Peng CK, Buldyrev SV, Goldberger AL, Havlin S, Sciortino S, Simons M, Stanley HE (1992) Long-range correlations in nucleotide sequences. Nature 356:168–170
Peng CK, Buldyrev SV, Havlin S, Simons M, Stanley HE, Goldberger AL (1994) Mosaic organization of DNA nucleotides. Phys Rev E 49:1685–1689
Rodwell MJ, Rowell DP, Folland CK (1999) Oceanic forcing of the wintertime North Atlantic oscillation and European climate. Nature 398:320–323
Shea DJ, Madden RA (1990) Potential for long-range predictability of monthly-mean surface temperatures over North America. J Clim 3:1444–1451
Simmons AJ, Hoskins BJ (1978) The life cycles of some nonlinear baroclinic waves. J Atmos Sci 35:414–432
Stephenson DB, Pavan V, Bojariu R (2000) Is the North Atlantic Oscillation a random walk? Int J Climatol 20:1–18
Syroka J, Toumi R (2001) Scaling and persistence in observed and modeled surface temperature. Geophys Res Lett 28:3255–3258
Taqqu MS, Teverovsky V, Willinger W (1995) Estimators for long-range dependence: an empirical study. Fractals 3:785–798
Teverovsky V, Taqqu MS (1997) Testing for long-range dependence in the presence of shifting means or a slowly declining trend using a variance type estimator. J Time Ser Anal 18:279–304
Tsonis AA, Roebber PJ, Elsner JB (1999) Long-range correlations in the extratropical atmospheric circulation: origins and implications. J Clim 12:1534–1541
von Storch H, Zwiers FW (1999) Statistical analysis in climate research. Cambridge University Press, Cambridge
Wallace JM, Gutzler DS (1981) Teleconnections in the geopotential height field during the northern hemisphere winter. Mon Weather Rev 109:784–812
Zeng L (2000) Weather derivatives and weather insurance: concept, application and analysis. Bull Am Meteorol Soc 81:2075–2082

Editorial responsibility: Hans von Storch, Geesthacht, Germany
Submitted: May 24, 2001; Accepted: October 21, 2001
Proofs received from author(s): May 15, 2002
