Tema05 Box Jenkins1
Tema05 Box Jenkins1
Xavier Barber
Departamento de Estadística, Matemáticas e Informática
I. Centro de Investigación Operativa
Universidad Miguel Hernández de Elche
09/Apr/2019
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 1 / 96
Licencia del Material
Material adaptado de la Profesora Melody Ghahramanj (University of
Winnipeg) http://www.ghahramani.ca/
Creando el objeto ts
co2 = ts(co2dat$interpolated, freq = 12, start = c(1958, 3))
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
1958 315.71 317.45 317.50 317.10 315.86 314.93 313.20 312.66
1959 315.62 316.38 316.71 317.72 318.29 318.15 316.54 314.80 313.84 313.26
1960 316.43 316.97 317.58 319.02 320.03 319.59 318.18 315.91 314.16 313.83
1961 316.93 317.70 318.54 319.48 320.58 319.77 318.57 316.79 314.80 315.38
1962 317.94 318.56 319.68 320.63 321.01 320.55 319.58 317.40 316.26 315.42
1963 318.74 319.08 319.86 321.39 322.25 321.47 319.74 317.77 316.21 315.99
1964 319.57 320.07 320.73 321.77 322.25 321.89 320.44 318.70 316.70 316.79
1965 319.44 320.44 320.89 322.13 322.16 321.87 321.39 318.81 317.81 317.30
1966 320.62 321.59 322.39 323.87 324.01 323.75 322.39 320.37 318.64 318.10
1967 322.07 322.50 323.04 324.42 325.00 324.09 322.55 320.92 319.31 319.31
1968 322.57 323.15 323.89 325.02 325.57 325.36 324.14 322.03 320.41 320.25
1969 324.00 324.42 325.64 326.66 327.34 326.76 325.88 323.67 322.38 321.78
1970 325.03 325.99 326.87 328.13 328.07 327.66 326.35 324.69 323.10 323.16
1971 326.17 326.68 327.18 327.78 328.92 328.57 327.34 325.46 323.36 323.57
1972 326.77 327.63 327.75 329.72 330.07 329.09 328.05 326.32 324.93 325.06
1973 328.54 329.56 330.30 331.50 332.48 332.07 330.87 329.31 327.51 327.18
1974 329.35 330.71 331.48 332.65 333.20 332.16 331.07 329.12 327.32 327.28
1975 330.73 331.46 331.90 333.17 333.94 333.45 331.98 329.95 328.50 328.34
1976 331.59 332.75 333.52 334.64 334.77 334.00 333.06 330.68 328.95 328.75
1977 332.66 333.13 334.95 336.13 336.93 336.17 334.88 332.56 331.29 331.27
1978 334.95 335.25 336.66 337.69 338.03 338.01 336.41 334.41 332.37 332.41
1979 336.14 336.69 338.27 338.95 339.21 339.26 337.54 335.75 333.98 334.19
Xavier Barber
1980 337.90 (@umh1465)
338.34 340.01 340.93 ×
ARIMA(p,d,q) SARIMA(P,D,Q)(P,
341.48 D, Q)s
341.33 339.40 337.70 09/Apr/2019
336.19 336.15 7 / 96
Leyendo la Serie temporal
Creando el objeto ts
A veces los datos en formato serie temporal se
registran de formar regular, y esto es muy
importante que lo indiquemos en el commando ts.
Recordad:
Semanal= 52
Mensual= 12
Trimestral= 4
Semestral= 6
También se puede especificar el año/periodo de
inicio:
freq=12, start(1995, 2) # febrero de 1995
freq=4, start(1995,2) # 2º trimestre de 1995
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 8 / 96
Leyendo la Serie temporal
390
Partes por millon
360
330
0
−2
0 20 40 60 80 100
Time
0
−1
−2
0 20 40 60 80 100
Time
−1
−2
−3
0 20 40 60 80 100
Time
Modelo ARMA(p,q)
# install.packages('astsa')
library(astsa)
acf2(out1, 48) #prints values and plots
0 10 20 30 40
LAG
0.6
PACF
0.2 −0.2
0 10 20 30 40
LAG
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 17 / 96
Modelo ARMA(p,q)
0 10 20 30 40
LAG
0.0 0.2
PACF
−0.4
0 10 20 30 40
LAG
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 18 / 96
Modelo ARMA(p,q)
0 10 20 30 40
LAG
0.4
0.2
PACF
−0.2
0 10 20 30 40
LAG
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 19 / 96
Modelo ARMA(p,q)
Trasnformando ts en R
Los modelos ARMA asumen que el proceso es de
estacionariedad “débil”.
Series NO estacionarias
Trasnformando ts en R
Tendencias lineales
Tomar la primera diferencia: wt = 5Yt = yt − yt−1 .
Y entonces ajustamos un ARMA al modelo wt
Ajustar el modelo yt = βo + β1 × t + at . Y
entonces ajustar los residuos utilizando un modelo
ARMA(p,q).
Diferenciando
Trasnformando ts en R
Tendencia cuadráticas
Tomar la segunda diferencia:
1000
500
Time
1.0
0.8
0.8
0.6
0.6
ACF
ACF
0.4
0.4
0.2
0.2
0.0
0.0
−0.2
−0.2
0 1 2 3 4 5 6 0 1 2 3 4 5 6
Lag Lag
1.0
0.8
0.8
0.6
0.6
ACF
ACF
0.4
0.4
0.2
0.2
0.0
0.0
−0.2
−0.2
0 5 10 15 0 5 10 15
Lag Lag
0.4
0.2
0.0
−0.2
4e+07
0e+00
0 50 100 150
Time
16
15
14
0 50 100 150
Time
0.8
0.8
0.6
0.4
ACF
ACF
0.4
0.2
0.0
0.0
0 5 10 15 20 −0.4 0 5 10 15 20
Lag Lag
Modelos ARIMA
ARIMA(p,d,q)
arima(x, order = c(0L, 0L, 0L),
seasonal = list(order = c(0L, 0L, 0L),
period = NA),
xreg = NULL, include.mean = TRUE,
transform.pars = TRUE,
fixed = NULL, init = NULL,
method = c("CSS-ML", "ML", "CSS"), n.cond,
SSinit = c("Gardner1980", "Rossignol2011"),
optim.method = "BFGS",
optim.control = list(), kappa = 1e6)
Se recomienda el uso de paquetes como sarima o astsa que pueden facilitar los
gráficos y otros aspectos del análisis y ajuste.
ARIMA(p,d,q)
Series de capturas de pescado
100
80
60
rec
40
20
0
Time
ARIMA(p,d,q)
Series: as.vector(rec)
1.0
0.5
ACF
0.0
−0.5
0 10 20 30 40
LAG
1.0
0.5
PACF
0.0 −0.5
0 10 20 30 40
LAG
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 36 / 96
Modelos ARIMA
ARIMA(p,d,q)
fit1 <- arima(rec, order = c(2, 0, 0))
fit1
##
## Call:
## arima(x = rec, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 1.3512 -0.4612 61.8585
## s.e. 0.0416 0.0417 4.0039
##
## sigma^2 estimated as 89.33: log likelihood = -1661.51, aic = 3331.02
ARIMA(p,d,q)
El “intercepto” en el arima es la estimación de la
media
El modelo ajustado será pues:
Estacionalidad
SARIMA(P,D,Q)
Cuando los datos muestran un comportamiento
estacional y esto queda patente también en el ACF,
entonces incorporaremos al modelo ajustado esta
componente utilizando los miodelos SARIMA(P,D,Q)
ARIMA(p,d,q) y SARIMA(P,D,Q)
sarima(xdata, p, d, q, P = 0, D = 0, Q = 0, S = -1,
details = TRUE, xreg=NULL, Model=TRUE,
tol = sqrt(.Machine$double.eps),
no.constant = FALSE)
SARIMA(P,D,Q)
La opción nop.constant:
Controla si el modelo SARIMA(P,D,Q) incluye la
constante en el modelo
En particular, si no hay difefrenciación (d=0 and
D=0) se obtiene al estimaciónd de la media.
Si hay diferenciación de orden 1 (d=1 o D=1, pero
no ambas), se incluye un término constante en
modelo
lo dicho anteriomente se obvia si el modelo inclute
no.constant=TRUE.
SARIMA(P,D,Q)
La idea es que la diferenciación total sea de primer
orden (d+D≤ 1)
Trabajaremos pues con modelos del estilo:
ARIMA(0,0,1) × SARIMA(0,1,1)1 2
Test de estacionariedad
La Prueba de Dickey-Fuller busca determinar la existencia o no de raíces
unitarias en una serie temporal.
H0 : No hay estacionariedad
Ha : Existe estacionariedad
##
## Augmented Dickey-Fuller Test
##
## data: rec
## Dickey-Fuller = -4.0878, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
Test de Estacionariedad-Tendencia
H0 : Serie estacionaria con una tendencia o nivel
##
## KPSS Test for Trend Stationarity
##
## data: serie
## KPSS Trend = 0.46997, Truncation lag parameter = 4, p-value = 0.01
Ejemplo SARIMA
modelo <- sarima(rec, 2, 0, 0)
Model: (2,0,0) Standardized Residuals
4
2
0
−2
−4
Time
Sample Quantiles
2
0.2
ACF
0
−2
0.0
−4
−0.2
Ejemplo SARIMA
modelo <- sarima(rec, 1, 1, 0, 0, 0, 2, 12)
Model: (1,1,0) (0,0,2) [12] Standardized Residuals
4
2
0
−2
−4
Time
4
0.4
Sample Quantiles
2
0.2
ACF
0
0.0
−2
−0.2
−4
Box-Ljung test
data: rec
X-squared = 1245.2, df = 25, p-value < 2.2e-16
Predicción
# modelo2<-auto.arima(rec)
sarima.for(rec, 12, 1, 1, 0, 0, 0, 2, 12, plot.all = FALSE)
100
50
rec
0−50
−100
Xavier Barber1980
(@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P,
1982 1984 D, Q)s1986 09/Apr/2019
1988 49 / 96
Estacionalidad
Predicción
# modelo2<-auto.arima(rec)
sarima.for(rec, 12, 1, 1, 0, 0, 0, 2, 12)
−100 100
rec
## $pred
## Jan Feb Mar Apr May Jun Jul
## 1987
## 1988 15.629090 17.139372 16.447856 13.995275 10.953905 9.175640 4.938035
## Aug Sep Oct Nov Dec
## 1987 15.486908 14.996399 13.484424
## 1988 3.129710 4.709371
##
## $se
## Jan Feb Mar Apr May Jun Jul
## 1987
## 1988 27.094743 31.283351 35.010316 38.389846 41.499602 44.393685 47.110906
## Aug Sep Oct Nov Dec
## 1987 9.709541 16.580948 22.275078
## 1988 49.679947 52.122597
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 50 / 96
Ejemplo modelo ARIMA
US GNP Series:
Análisis del producto interior bruto de Estados Unidos
library(tseries)
library(astsa)
data(gnp)
plot(gnp)
title(’Quarterly U.S. GNP from 1947(1) to 1991(1)’)
acf2(as.vector(gnp), 50)
#estacionariedad
adf.test(rec, alternative="stationary", k=0)
# diferenciación de orden 1 (por no estacionaria)
plot(diff(gnp))
title(’First Difference of U.S. GNP from 1947(1) to 1991(1)’)
# diferenciación de orden 1 y logaritmo (por varianza no cons
gnpgr = diff(log(gnp)) # growth rate
plot(gnpgr)
title(’First difference of the U.S. log(GNP) data’)
acf2(as.vector(gnpgr),
Xavier Barber (@umh1465) 24)× SARIMA(P,D,Q)(P, D, Q)s
ARIMA(p,d,q) 09/Apr/2019 52 / 96
Ejemplo modelo ARIMA
US GNP Series:
Quarterly U.S. GNP from 1947(1) to 1991(1)
8000
6000
gnp
4000
2000
Time
0 10 20 30 40 50
LAG
0.8
PACF
0.4
0.0
0 10 20 30 40 50
LAG
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 54 / 96
Ejemplo modelo ARIMA
data: gnp
Dickey-Fuller = 0.069639, Lag order = 0, p-value = 0.99
alternative hypothesis: stationary
Claramente no es estacionaria.
US GNP: test de
estacionariedad-Tendencia
kpss.test(gnp, null = "Trend")
data: gnp
KPSS Trend = 0.94871, Truncation lag parameter = 4, p-value = 0.01
50
0
−50
−100
Time
data: diff(gnp)
Dickey-Fuller = -10.64, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary
0.00
−0.02
Time
data: gnpgr
Dickey-Fuller = -10.341, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary
data: gnpgr
KPSS Trend = 0.024163, Truncation lag parameter = 4, p-value = 0.1
5 10 15 20
LAG
0.4
0.2
PACF
0.0
−0.2
5 10 15 20
LAG
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 62 / 96
Ejemplo modelo ARIMA
AUTO.ARIMA
Este comando encuentra el “mejor” modelo que se
adapta a los datos, sea o no estacional.
install.packages("forecast")
library(forecast)
auto.arima(x, d=NA, D=NA, max.p=5, max.q=5,
max.P=2, max.Q=2, max.order=5, start.p=2,
start.q=2, start.P=1, start.Q=1,
stationary=FALSE,
seasonal=TRUE,ic=c("aicc","aic", "bic"),
stepwise=TRUE, trace=FALSE,
approximation=(length(x)>100 | frequency(x)>12),
xreg=NULL,test=c("kpss","adf","pp"),
seasonal.test=c("ocsb","ch"),allowdrift=TRUE,
lambda=NULL, parallel=FALSE, num.cores=NULL)
# library(forecast) # libro: https://otexts.com/fpp2/
ar.mod = Arima(log(gnp), order = c(1, 1, 0), sea = c(0, 0, 0))
auto.AR.mod = auto.arima(log(gnp), d = 1, D = 0, sea = FALSE)
Series: log(gnp)
ARIMA(1,1,0) with drift
Coefficients:
ar1 drift
0.3467 0.0083
s.e. 0.0627 0.0010
40
0.10
30
0.05
count
ACF
0.00 20
−0.05 10
−0.10
0
4 8 12 16 20 24 −0.02 0.00 0.02 0.04
Lag residuals
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0) with drift
## Q* = 9.8183, df = 6, p-value = 0.1325
##
## Model df: 2. Total lags used: 8
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 65 / 96
Ejemplo modelo ARIMA
Series: log(gnp)
ARIMA(0,1,1) with drift
Coefficients:
ma1 drift
0.2719 0.0083
s.e. 0.0549 0.0008
0.02
0.00
−0.02
0.2 40
0.1 30
count
ACF
20
0.0
10
−0.1
0
4 8 12 16 20 24 −0.02 0.00 0.02 0.04
Lag residuals
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 17.439, df = 6, p-value = 0.007797
## Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q) 09/Apr/2019 67 / 96
s
Ejemplo modelo ARIMA
Series: log(gnp)
ARIMA(1,1,1) with drift
Coefficients:
ar1 ma1 drift
0.4632 -0.1306 0.0083
s.e. 0.1272 0.1350 0.0010
0.02
0.00
−0.02
40
0.10
30
0.05
count
ACF
0.00 20
−0.05 10
−0.10
0
4 8 12 16 20 24 −0.02 0.00 0.02 0.04
Lag residuals
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1) with drift
## Q* = 9.0298, df = 5, p-value = 0.1079
## Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q) 09/Apr/2019 69 / 96
s
Ejemplo modelo ARIMA
Series: log(gnp)
ARIMA(3,1,1) with drift
Coefficients:
ar1 ar2 ar3 ma1 drift
1.0905 -0.1251 -0.1739 -0.7881 0.0084
s.e. 0.3027 0.1529 0.0777 0.3112 0.0006
0.02
0.00
−0.02
40
0.10
0.05 30
count
ACF
0.00 20
−0.05
10
−0.10
0
4 8 12 16 20 24 −0.02 0.00 0.02 0.04
Lag residuals
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1) with drift
## Q* = 1.3307, df = 3, p-value = 0.7218
##
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 71 / 96
Ejemplo modelo ARIMA
Predicción
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2002 Q4 9.166345 9.154251 9.178440 9.147848 9.184842
2003 Q1 9.177074 9.157215 9.196933 9.146703 9.207446
2003 Q2 9.187448 9.160494 9.214403 9.146225 9.228671
2003 Q3 9.197489 9.164874 9.230104 9.147609 9.247369
2003 Q4 9.207020 9.169975 9.244065 9.150364 9.263675
2004 Q1 9.216098 9.175644 9.256551 9.154229 9.277966
2004 Q2 9.224803 9.181695 9.267912 9.158875 9.290732
2004 Q3 9.233249 9.188011 9.278487 9.164063 9.302434
2004 Q4 9.241535 9.194513 9.288557 9.169621 9.313449
2005 Q1 9.249746 9.201156 9.298337 9.175433 9.324059
2005 Q2 9.257940 9.207906 9.307973 9.181420 9.334459
2005 Q3 9.266151 9.214744 9.317558 9.187531 9.344772
autoPred$mean = exp(autoPred$mean)
autoPred$lower = exp(autoPred$lower)
autoPred$upper = exp(autoPred$upper)
autoPred$x = exp(autoPred$x)
autoPred$fitted = exp(autoPred$fitted)
autoPred$residuals = exp(autoPred$residuals)
plot(autoPred)
AirPassengers
library(tseries) # for ADF test
library(forecast) # auto.arima
data(AirPassengers) # load
class(AirPassengers)
## [1] "ts"
AirPassengers
autoplot(AirPassengers)
600
AirPassengers
400
200
AirPassengers
summary(AirPassengers)
AirPassengers
start(AirPassengers)
## [1] 1949 1
end(AirPassengers)
## [1] 1960 12
## [1] 12
AirPassengers
plot(AirPassengers)
linearModel = lm(AirPassengers ~ time(AirPassengers))
abline(reg = linearModel) # Probando un modelo lineal
500
AirPassengers
300
100
Time
AirPassengers: ACF
apNum = as.numeric(AirPassengers)
apDiff = diff(AirPassengers, differences = 1)
op = par(mfrow = c(1, 2)) # start multi plot
acf(apDiff, plot = T)
pacf(apDiff, plot = T)
AirPassengers: ACF
Series apDiff Series apDiff
0.6
1.0
0.4
0.8
0.6
0.2
Partial ACF
0.4
ACF
0.0
0.2
−0.2
0.0
−0.4
−0.2
Lag Lag
##
## Augmented Dickey-Fuller Test
##
## data: diff(log(AirPassengers))
## Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
0.1
0.0
−0.1
−0.2
acf(diff(log(apNum)))
Series diff(log(apNum))
1.0
0.6
ACF
0.2
−0.2
0 5 10 15 20
Lag
pacf(diff(log(apNum)))
Series diff(log(apNum))
0.4
Partial ACF
0.0
−0.4
5 10 15 20
Lag
## Series: AirPassengers
## ARIMA(2,1,1)(0,1,0)[12]
##
## Coefficients:
## ar1 ar2 ma1
## 0.5960 0.2143 -0.9819
## s.e. 0.0888 0.0880 0.0292
##
## sigma^2 estimated as 132.3: log likelihood=-504.92
## AIC=1017.85 AICc=1018.17 BIC=1029.35
## Series: log(AirPassengers)
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 estimated as 0.001371: log likelihood=244.7
## AIC=-483.4 AICc=-483.21 BIC=-474.77
AirPassengers: checkresiduals
checkresiduals(autoArimaModelLog)
30
0.2
0.1 20
count
ACF
0.0
10
−0.1
0
12 24 36 −0.10 −0.05 0.00 0.05 0.10
Lag residuals
##
## Ljung-Box test
##Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 90 / 96
Ejemplo serie con estacionalidad
##
## Call:
## arima(x = log(AirPassengers), order = pdqParam, seasonal = list(
## period = 12))
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 estimated as 0.001348: log likelihood = 244.7, aic = -
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 91 / 96
Ejemplo serie con estacionalidad
AirPassengers: checkresiduals
checkresiduals(manualFit)
30
0.2
0.1 20
count
ACF
0.0
10
−0.1
0
12 24 36 −0.10 −0.05 0.00 0.05 0.10
Lag residuals
##
## Ljung-Box test
##
Xavier Barber (@umh1465) ARIMA(p,d,q) × SARIMA(P,D,Q)(P, D, Q)s 09/Apr/2019 92 / 96
Ejemplo serie con estacionalidad
AirPassengers: Prediciendo
autoPred = forecast(autoArimaModel, h = 25)
plot(autoPred)
Time