
Lag length and mean break in stationary VAR models

We consider three approaches to determine the lag length of a stationary vector autoregression model and the presence of a mean break. The first approach, commonly used in practice, uses a break test as a specification check after the lag length is selected by an information criterion. The second performs the break test prior to estimating the lag length. The third simultaneously selects both the lag length and the break by some information criterion. While the latter two approaches are consistent for the true lag order, we justify the validity of the first approach by showing that the lag length estimator based on specific information criteria is at worst biased upwards asymptotically when the mean break is ignored. Thus, conditional on the estimated lag length, the break test retains its asymptotic power properties. Finite-sample simulation results show that the second approach tends to have the most stable performance. The results also indicate that the best strategy for short-run forecasting does not necessarily coincide with the best strategy for finding the correct model.

INTRODUCTION

In constructing a stationary vector autoregression (VAR) model, the determination of its lag length and the examination of its parameter stability are two important issues. When there is no structural break or a break is known, the lag length of the VAR model is usually estimated by minimizing an information criterion. Popular information criteria include Akaike (1969), Schwarz (1978) and Hannan and Quinn (1979), which we label as AIC, SC and HQ, respectively. On the other hand, when the lag length is known, the parameter stability can be readily tested by the procedures of Andrews (1993) and Andrews and Ploberger (1994).

We consider the cases where neither the lag length nor the break is known and investigate how the lag length and the presence or absence of a break in the mean can be determined. We focus on the break in the mean mainly because, in addition to its simplicity, it tends to have the most severe impact on the forecast performance of the stationary VAR model; see Clements and Hendry (2000). As one could proceed to estimate the lag length first, check parameter stability first, or determine both simultaneously, the following three approaches are examined.

The first approach, closely related to the Box-Jenkins methodology, treats break tests as specification checks on an estimated model. For the VAR model, the approach amounts to first estimating the lag length ignoring the possible break and then performing the break tests based on the estimated lag length. Since the possible break is ignored at the time of selecting the lag length, it is desirable to identify the effect of ignoring the break on the estimated lag length. It is also desirable to know how the 'contaminated' lag length estimator affects the subsequent break test, the chance of getting the correct model, and the forecast performance of the resultant model. While this approach is commonly used in practice, see Lütkepohl (1993, Ch. 4), Candelon and Lütkepohl (2001), Boivin (1999) and Olekalns (2001), its asymptotic behavior has not been considered in the literature. In this paper, we justify the asymptotic validity of this approach by showing that the lag length estimated by SC or HQ is consistent when the ignored mean break is 'small' and is asymptotically biased upwards when the ignored mean break is 'large'. Hence the large-sample power properties of the break test will not be affected as the test is conducted conditional on a lag length that tends to be longer than the correct length at worst, see Pötscher (1991, Lemma 2).

The second approach is to perform the break test first on the VAR with a maximal lag length that is known to the investigator and then estimate the lag length based on the result of the break test (a break dummy may be included in the model). For this approach, as the break test is carried out on a correct (but possibly over-parameterized) model, its asymptotic power properties are retained and the break point (fraction) is consistently estimated as shown by Bai (1997). Consequently, the lag length estimated with SC or HQ will be consistent when the mean break is present. The asymptotic probability that the approach selects the correct model depends on the nominal size of the break test and the presence or absence of the mean break.

The third approach is to determine both the break and lag length simultaneously using an information criterion. This approach differs from the previous two in that hypothesis testing is not involved. Given the result of Bai (1997) that the break point (fraction) of a 'large' break is consistently estimable, this approach can also be justified as being consistent as the general consistency result of SC and HQ criteria applies in this context. The amount of computation involved in this approach is much larger than required by the previous two approaches since a two-dimensional search (for both lag length and break point) has to be carried out.

Given that the large-sample properties can be poor descriptions of the finite-sample behavior of the aforementioned three approaches, it is interesting to investigate their finite-sample performance. Monte Carlo experiments based on a VAR model with two variables are carried out for this purpose. For each approach, three information criteria (AIC, SC and HQ) are considered, resulting in nine different methods. The performance of each method is assessed by the probability of finding the correct model and the mean square error (MSE) of one-step-ahead forecasts. While forecast performance is often regarded as an important measure for validating different modelling approaches, the probability of finding the correct model is also desirable for constructing structural VAR models, where the dynamics of the model are important. We note that these two measures do not always agree in terms of ranking models.

Subject to the usual caveats of simulation studies, the simulation results may be summarized as follows. The third approach with AIC or HQ tends to detect a spurious break too often when there is no break in the data generating process (DGP) and the first approach appears to be sensitive to the choice of DGP parameters. Over all experiments, the second approach tends to exhibit satisfactory and stable performance both in forecasting and identifying the correct model. The results indicate that the best strategy for short-run forecasting does not always coincide with the best strategy for finding the correct model. In the literature, interestingly, there are similar observations that a mis-specified parsimonious model can outperform the correct model in terms of the forecast MSE, see Hendry and Clements (2001), or impulse response MSEs, see Ivanov and Kilian (2001).

Recently, Kiviet and Dufour (1997) considered an exact joint testing procedure to determine the lag length and structural change in the framework of single-equation autoregressive distributed lag models. Based on the generalized method of moments, Rossi (2000) proposed a joint testing procedure for selecting nested models in the presence of parameter instability. Although these procedures can potentially be modified for VAR models and merit future research, they are not included in our comparison as such modifications are beyond the scope of this paper.

The paper is organized as follows. Section 2 lays out the framework and asymptotic considerations. The simulation design and results are given in Section 3. Section 4 concludes the paper. The proof is contained in the appendix.

MODEL AND LARGE SAMPLE CONSIDERATION

Model

Suppose that we observe the p-dimensional time series {y_t}_{t=-K+1}^{n}, which can be decomposed into two unobservable parts

y_t = y_t^o + δ_n 1_t(τ_0),    (1)

where n and K are respectively the effective sample size and the number of reserved observations. Here K is sufficiently large to accommodate the lags of y_t, so that the effective sample for a VAR model always begins at t = 1 and ends at t = n regardless of the number of lags included.

The first part y_t^o in (1) is generated by the stationary VAR process

φ(L) y_t^o = c + ε_t,    (2)

where φ(L) = I_p − φ_1 L − ... − φ_k L^k, with L being the lag operator and k standing for the lag length; the random error term ε_t is i.i.d. N(0, Σ) with positive definite variance matrix Σ. The second part δ_n 1_t(τ_0) in (1) represents a possible break in the mean of the series, where the indicator 1_t(τ_0) takes the value one if t ≥ τ_0 and zero otherwise. Additionally, we make the following assumptions.

Part (a) of the assumptions requires y_t^o to be stationary, so that y_t itself is piecewise stationary. Part (b) is a standard assumption used to derive the asymptotic properties of break tests, where the interval is usually chosen to be [0.15, 0.85]; see Andrews (1993). An implication of this assumption is that the subsample prior to the break point cannot be ignored even as the sample size n tends to infinity. Part (c) states that the change in the constant term, δ_n = δ/n^α, may shrink towards zero as the sample size increases. It defines the 'small break' (α ≥ 1/2) that is difficult to detect in both small and large samples. Part (e) is also a standard assumption in the literature; see Lütkepohl (1993, p. 131).
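As an illustration of the decomposition in (1) and (2), the following sketch simulates a stationary VAR and adds a mean break to the observed series. The coefficient matrices, the identity error covariance and the break size are hypothetical choices for the example, not the paper's DGPs.

```python
import numpy as np

def simulate_var_with_break(n, phi, delta, tau, burn=100, seed=0):
    """Simulate y_t = y_t^o + delta_n * 1_t(tau), where y_t^o follows a
    stationary VAR(k) driven by i.i.d. N(0, I) errors (an illustrative
    choice of error covariance)."""
    rng = np.random.default_rng(seed)
    p, k = phi[0].shape[0], len(phi)
    yo = np.zeros((burn + n, p))
    for t in range(k, burn + n):
        yo[t] = sum(phi[j] @ yo[t - 1 - j] for j in range(k)) \
                + rng.standard_normal(p)
    yo = yo[burn:]                                  # y_t^o for t = 1..n
    step = (np.arange(1, n + 1) >= tau)[:, None] * delta  # delta_n * 1_t(tau)
    return yo + step                                # observed series y_t

# Hypothetical stationary VAR(2) coefficients (not the paper's DGP1/DGP2)
phi = [np.array([[0.5, 0.1], [0.0, 0.4]]),
       np.array([[0.2, 0.0], [0.1, 0.1]])]
y = simulate_var_with_break(n=200, phi=phi,
                            delta=np.array([2.0, 2.0]), tau=140)
```

The break enters additively on the observed series, matching the decomposition in (1) rather than a shift in the VAR intercept.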

Under Assumptions (a)-(e), we can write the DGP (1) and (2) as

where

However, the small episode of transitional dynamics in b_t(τ_0) will be replaced by a single dummy variable in the following model, which is used for statistical inference,

where

The hypotheses regarding the mean break in (4) are stated as H_0: d = 0 and H_1: d ≠ 0.

Model selection

When there is no break (δ_n = 0) or the break point τ_0 is known, the lag length k_0 may be estimated by minimizing certain information criteria. We shall consider the criteria of Akaike (1969), Schwarz (1978) and Hannan and Quinn (1979), denoted as AIC, SC and HQ respectively. The estimators of k_0 are the minimizers of

AIC(k) = ln|Σ̂_k| + 2p(pk + i)/n,
SC(k) = ln|Σ̂_k| + p(pk + i) ln(n)/n,
HQ(k) = ln|Σ̂_k| + h p(pk + i) ln(ln n)/n,

respectively, where ln is the natural logarithm; Σ̂_k is the maximum likelihood estimator of Σ with lag length k; h > 2 is a constant (h = 2.02 is used in the simulation in Section 3); and i = 2 if a break-dummy variable is included and i = 1 otherwise. For the third approach A3 below, the break point is treated as a parameter and the number of parameters in the criteria becomes p(pk + 2) + 1 if the break dummy is included and p(pk + 1) otherwise. It is known that, when there is no break or the break point is known, both the SC and HQ estimators are strongly consistent for k_0; see Hannan and Quinn (1979). We note that HQ, based on the law of the iterated logarithm, imposes the smallest penalty that still ensures consistency of the resultant estimator.
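The selection rule described above can be sketched as follows. The OLS fit (equivalent to ML for this Gaussian model), the per-k effective sample, and the illustrative data-generating coefficients are simplifying assumptions of the sketch, not the paper's exact setup.

```python
import numpy as np

def sigma_hat(y, k):
    """ML residual covariance of a VAR(k) with intercept, fitted by OLS."""
    n, p = y.shape
    X = np.hstack([np.ones((n - k, 1))] +
                  [y[k - 1 - j : n - 1 - j] for j in range(k)])
    Y = y[k:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ B
    return (U.T @ U) / (n - k)

def ic(y, k, crit, h=2.02, i=1):
    """AIC/SC/HQ: ln|Sigma_hat_k| plus a penalty on the p(pk + i)
    coefficients (i = 2 would correspond to an added break dummy)."""
    n, p = y.shape
    ne = n - k                        # sketch: effective sample varies with k
    m = p * (p * k + i)
    penalty = {"AIC": 2.0 * m / ne,
               "SC": m * np.log(ne) / ne,
               "HQ": h * m * np.log(np.log(ne)) / ne}[crit]
    return float(np.log(np.linalg.det(sigma_hat(y, k))) + penalty)

# Illustrative data: stationary bivariate VAR(1), hypothetical coefficients
rng = np.random.default_rng(1)
A = np.array([[0.6, 0.1], [0.0, 0.5]])
y = np.zeros((300, 2))
for t in range(1, 300):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

k_hat = min(range(1, 6), key=lambda k: ic(y, k, "SC"))
```

The estimated lag length is the minimizer of the chosen criterion over k = 1, ..., k_m, here with k_m = 5.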

The fact that AIC is not consistent for k 0 , see Shibata (1976), does not rule out the possibility that the AIC estimator might outperform SC or HQ in small samples.

On the other hand, if the true lag length k_0 is known, the break test supW documented in Andrews (1993) can be used, where

supW = max_{[nr_1] ≤ τ ≤ [nr_2]} W_τ

and W_τ is the usual LM (or LR or Wald) statistic for testing the null of d = 0 against the alternative of d ≠ 0 at a chosen break point τ in (4). Here [x] stands for the integer part (floor) of x. In addition to its asymptotic admissibility, the supW test has the advantage that the break point (or fraction) is estimated at the time of computing supW. Obviously, the optimal tests of Andrews and Ploberger (1994) can also be used in this context. Diebold and Chen (1996) show that supW-type tests have large size distortions in small samples from zero-mean AR(1) DGPs. Indeed, size distortions are also observed in our simulation experiments with VAR models. In this sense, the approaches A1 and A2 listed below are biased in small samples in terms of identifying the break and lag length.
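A minimal sketch of the sup-type break test is given below. It uses the Wald rather than LM form of W_τ, OLS estimation, and an illustrative DGP, so it should be read as a toy version of the procedure rather than the paper's implementation.

```python
import numpy as np

def sup_wald(y, k, r1=0.15, r2=0.85):
    """supW = max of W_tau over tau in [[n r1], [n r2]], where W_tau is a
    Wald statistic for d = 0 on a break dummy in a VAR(k) with intercept."""
    n, p = y.shape
    Y = y[k:]
    base = [np.ones((n - k, 1))] + [y[k - 1 - j : n - 1 - j] for j in range(k)]
    best_w, best_tau = -np.inf, None
    for tau in range(int(n * r1), int(n * r2) + 1):
        dummy = (np.arange(k + 1, n + 1) >= tau).astype(float)[:, None]
        X = np.hstack(base + [dummy])
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        U = Y - X @ B
        S = (U.T @ U) / (n - k)                  # ML error covariance
        v = np.linalg.inv(X.T @ X)[-1, -1]       # (X'X)^{-1} entry of the dummy
        d = B[-1]                                # estimated break coefficients
        w = float(d @ np.linalg.inv(v * S) @ d)  # Wald form d'[v Sigma]^{-1} d
        if w > best_w:
            best_w, best_tau = w, tau
    return best_w, best_tau

# Illustrative DGP: bivariate VAR(1) plus a mean break of size 2 at t = 141
rng = np.random.default_rng(2)
y = np.zeros((200, 2))
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal(2)
y[140:] += 2.0

supw, tau_hat = sup_wald(y, k=1)
```

As noted in the text, the maximizing date tau_hat is obtained as a by-product of computing supW.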

In the cases where neither τ_0 nor k_0 is known, both the lag length and the break point need to be determined statistically. We consider the following three approaches, where knowledge of the maximal lag length k_m is required.

A1: Determine the lag length first by minimizing one of the criteria, ignoring the possible break. Then perform the break test as a specification check, using the estimated lag length.

A2: Use k_m to perform the break test (a dummy variable is included in the model if a break is detected). Then determine the lag length by minimizing one of the criteria.

A3: Estimate k_0 and τ_0 simultaneously by minimizing one of the criteria, searching the two-dimensional grid on the lag length and break point (k, τ) ∈ {1, ..., k_m} × {t_1, ..., t_2} when the break dummy is included and the one-dimensional grid on k when the break dummy is excluded. Here t_i = [nr_i] is the floor of nr_i for i = 1, 2.
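Approach A3's two-dimensional search can be sketched as follows, using SC. The dummy-inclusive parameter count p(pk + 2) + 1 follows the text (the break date is counted as a parameter), while the data and estimation shortcuts (OLS, per-k effective sample) are assumptions of the sketch.

```python
import numpy as np

def sc(y, k, tau=None):
    """SC for a VAR(k) with intercept and optional break dummy at date tau.
    Parameter count: p(pk + 2) + 1 with the dummy, p(pk + 1) without."""
    n, p = y.shape
    cols = [np.ones((n - k, 1))] + [y[k - 1 - j : n - 1 - j] for j in range(k)]
    m = p * (p * k + 1)
    if tau is not None:
        cols.append((np.arange(k + 1, n + 1) >= tau).astype(float)[:, None])
        m = p * (p * k + 2) + 1
    X = np.hstack(cols)
    Y = y[k:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ B
    S = (U.T @ U) / (n - k)
    return float(np.log(np.linalg.det(S)) + m * np.log(n - k) / (n - k))

def a3(y, k_max=5, r1=0.15, r2=0.85):
    """Approach A3: minimize SC over all (k, tau) pairs and over the
    no-break models (tau = None)."""
    n = len(y)
    grid = [(k, None) for k in range(1, k_max + 1)]
    grid += [(k, tau) for k in range(1, k_max + 1)
             for tau in range(int(n * r1), int(n * r2) + 1)]
    return min(grid, key=lambda c: sc(y, c[0], c[1]))

# Illustrative no-break data from a stationary bivariate VAR(1)
rng = np.random.default_rng(3)
y = np.zeros((120, 2))
for t in range(1, 120):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal(2)

k_hat, tau_hat = a3(y, k_max=3)
```

The nested minimization makes explicit why the computational burden of A3 grows with both k_m and the width of the break-point interval.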

Large sample consideration

For the approach A1, a possibly mis-specified model is used to obtain the lag length estimator, which may therefore be inconsistent. It is critical to clarify how the contamination caused by the ignored break affects the subsequent break test, at least in large samples. Intuition suggests that the estimated lag length will be biased in a direction that makes the break more difficult to detect. However, as long as the lag length estimator is at worst biased upwards (as we show in the theorem below), the large-sample power properties of the break test will not be affected.

For A2, the model is correctly specified for the purpose of testing parameter stability, although its lag length may be longer than the true lag length. For large samples, since the presence or absence of a break with 0 ≤ α < 1/2 is correctly detected by the break test and the break fraction is consistently estimated, see Bai (1997), the lag length subsequently determined by SC or HQ is consistent. The asymptotic probability that the approach finds the correct model depends on the nominal size of the break test and the presence or absence of the break. Specifically, the asymptotic probability of obtaining the correct model is one minus the nominal size of the break test when there is no break, and is one when a break exists and the break test is consistent.

The approach A3 differs from the first two in that hypothesis testing is not involved and the amount of required computation is significantly larger. Given the fact that the break fraction r_0 and other parameters in the model can be consistently estimated for 0 ≤ α < 1/2, see Bai (1997), the (weak) consistency of the lag length estimators from SC(k) and HQ(k) follows because every possible combination of the lag length and break point is searched.

The approaches A2 and A3 are consistent in the sense that the probability of correctly identifying the true model tends to one (or, for A2, one minus the nominal size of the break test if there is no break) as the sample size goes to infinity. However, the asymptotic behavior of the first approach has not been considered in the literature. Therefore, the theorem below may serve as an asymptotic justification for the first approach.

Theorem 1. Suppose the break in the mean is ignored when the lag length is estimated. Let k̂_B and k̂_H be the minimizers of SC(k) and HQ(k) respectively. Under Assumptions (a)-(e),

(1) if α ≥ 1/2, then k̂_B and k̂_H are consistent for k_0;

(2) if 0 ≤ α < 1/2 and q(α) ≠ 0, then the probability that k̂_B or k̂_H is greater than k_0 tends to 1 as n → ∞, where q(α) is defined in the proof;

(3) for any α ≥ 0, the probability that k̂_B or k̂_H is smaller than k_0 tends to 0 as n → ∞.

Proof. See appendix.


When the magnitude of the break δ_n = δ/n^α is 'small' (α ≥ 1/2), the consistency of the SC or HQ lag length estimator is retained. When the break magnitude is 'large' (0 ≤ α < 1/2), the lag length estimator is at worst asymptotically biased upwards. Since the probability that the lag length estimator is less than k_0 is negligible for large samples, the break test conditional on the estimated lag length retains the desired large-sample properties; see Pötscher (1991, Lemma 2). In the sense that the large break can be consistently detected, the first approach can be regarded as asymptotically valid.

SIMULATION COMPARISON

Data generating process

The DGP in (1) and (2) is used for the simulation experiments with p = 2, k_0 = 2 and c = 0. In the simulation study we consider a maximum lag length of k_m = 5, a true break fraction of r_0 = 0.7 and effective sample sizes of n = 100, 200, 300. The square root of the variance matrix of the disturbance vector ε_t and the break magnitude δ_n = θF are chosen such that the scalar θ takes the values 0, 0.5, 1 and the vector F contains the diagonal elements of [φ(1)^{-1} Σ (φ(1)^{-1})′]^{1/2}. In other words, the break magnitude is proportional to the square root of the long-run variance of y_t^o. We consider two stationary VAR(2) models as DGPs. The first, DGP1, is defined by

and the second, DGP2, is defined by

The dominant autoregressive root for DGP1 is 0.838 or 1/1.193 and the serial correlations for DGP1 are positive. The dominant autoregressive roots for DGP2 are 0.450 ± i0.312 or 1/(1.500 ± i1.041) and the serial correlations for DGP2 can be negative. Series generated by DGP2 tend to oscillate more than those by DGP1.

Simulation

For each n, (n + 100) observations are drawn from DGP1 (or DGP2) with zero initial values, of which the first 95 observations are discarded and the last (n + 5) observations constitute a sample for inference. A constant term is always included in estimation. For each sample, the three approaches (A1-A3) defined in Section 2.2 are used to determine the lag length and the break. For each approach, the three information criteria (AIC, SC, HQ) are utilized. For A3, the presence of a break is also determined by the information criteria. For the approaches A1 and A2, the supW test on the space Π = [0.15, 0.85] at the 5% level is used to check the presence of the break, where W_τ is the Lagrange multiplier (LM) statistic. The finite-sample adjustment factor (n − ν)/n is applied to the supW statistic, where ν is the number of coefficients in a single equation of the VAR. The asymptotic critical value (11.79), taken from Andrews (1993), is used for the case with θ = 0 (no break), from which the empirical critical value is computed and subsequently used for the cases with θ ≠ 0. In the case that a break is detected, a corresponding dummy variable is included in the model. The number of Monte Carlo replications for each specification of the DGPs is 5000.
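The sampling scheme just described can be sketched as follows. The VAR(2) coefficient matrices and the unit break direction are placeholders, since the paper's DGP1/DGP2 matrices are not reproduced here.

```python
import numpy as np

def draw_sample(n, k_m=5, theta=1.0, r0=0.7, seed=0):
    """One Monte Carlo draw following the design in the text: generate n + 100
    observations from a bivariate VAR(2) with zero initial values, discard the
    first 95, keep the last n + 5 (n effective observations plus k_m = 5
    reserved lags), and add a mean break of size theta at fraction r0 of the
    effective sample. Coefficients and break direction are illustrative."""
    rng = np.random.default_rng(seed)
    phi1 = np.array([[0.5, 0.1], [0.0, 0.4]])
    phi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
    yo = np.zeros((n + 100, 2))
    for t in range(2, n + 100):
        yo[t] = phi1 @ yo[t - 1] + phi2 @ yo[t - 2] + rng.standard_normal(2)
    sample = yo[95:]                          # last n + 5 observations
    t_index = np.arange(1 - k_m, n + 1)       # effective sample is t = 1..n
    step = (t_index >= int(r0 * n))[:, None] * theta
    return sample + step

sample = draw_sample(n=100)
```

Keeping the extra k_m observations means the effective sample is t = 1, ..., n for every candidate lag length, as required by the design in Section 2.1.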

Results and comparison

Simulation results are given in Tables 1 and 2 (for DGP1 and DGP2 respectively). There are nine different methods to identify a model (three approaches and three information criteria). For each parameter combination panel, the tables report the following four block-columns: the relative frequencies of detected breaks, correct lags and correct models (i.e. both the lag length and break are correct), and the MSEs of one-step-ahead forecasts. For A1 and A2, the relative frequency of detected breaks (the first block-column) is the size of the break test for the case with θ = 0 and the size-adjusted power for the cases with θ ≠ 0.

Table 1. Simulation results for DGP1.

Table 2. Simulation results for DGP2.

Subject to the usual caveats of simulation studies, the simulation results may be summarized as follows.

(i) A3 with AIC or HQ tends to detect a spurious break too often when there is no break.

When there is a large break, A1 appears to have much larger lag-length biases for DGP2 than for DGP1.

(ii) A2 seems to have the most stable performance across all experiments. When n = 100 and 200, A2 with AIC or HQ tends to find the correct model more often than the other methods considered. The advantage of AIC in identifying the correct model in small samples accords with the finding of Kilian (2001).

(iii) The approach that has the best chance of finding the correct model does not necessarily deliver the best short-run forecast performance.

(iv) The upward size distortion of the 5% supW test for DGP1 with θ = 0 and n = 100 is not negligible.

In the experiments, the break point is estimated from the information criteria or as a by-product of computing supW (not included in the tables). The mean break fractions from all estimation methods are very close to one another. The estimated break fractions are biased downward. For example, for DGP1 with n = 100, θ = 1 and r_0 = 0.7, the typical mean, median and standard deviation of the estimated break fraction are 0.684, 0.700 and 0.060 respectively. In a different context, Lee and Strazicich (2001) find that endogenous-break unit root tests tend to produce downward-biased estimates of the break point and that SC produces more accurate estimates of the break point.

To assess the sensitivity to the true break fraction r_0, experiments are carried out for r_0 = 0.5. The results are qualitatively similar to those reported in Tables 1 and 2 and are not included.

Since there are many different choices for the VAR parameters and the simulation results depend on the specific parameters chosen for the simulation, the above results need to be interpreted with caution. We also note that, for forecasting purposes, some recently introduced forecasting techniques such as intercept corrections, see Clements and Hendry (1998, Ch. 8), which are not implemented in this simulation study, may be used to improve forecasts when a break occurs at or near the forecast origin.


© Royal Economic Society 2002. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA, 02148, USA.

See Hendry and Clements (2001) for a discussion of the inadequacy of using forecast performance as the sole criterion of model comparison.


The optimal tests, aveW and expW, of Andrews and Ploberger (1994) are included in the simulation but not reported. In most cases expW and supW have similar size and power properties, while aveW is less powerful than expW and supW.

Some preliminary experiments indicate that, without sacrificing its power, the LM version of the supW test has the smallest size distortion in comparison with the Wald and LR versions.

The correct model here is defined as a model having the correct lag length and correct inclusion (or exclusion) of a break dummy.