Intro to ACF and PACF
ARIMA MODELS IN PYTHON
James Fulton
Climate informatics researcher
Motivation
ACF and PACF
ACF - Autocorrelation Function
PACF - Partial autocorrelation function
What is the ACF
lag-1 autocorrelation → $\mathrm{corr}(y_t, y_{t-1})$
lag-2 autocorrelation → $\mathrm{corr}(y_t, y_{t-2})$
...
lag-n autocorrelation → $\mathrm{corr}(y_t, y_{t-n})$
What is the PACF
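The PACF at lag n is the correlation between $y_t$ and $y_{t-n}$ that remains after removing the effect of the intermediate lags $y_{t-1}, \dots, y_{t-n+1}$. One common way to estimate it (the idea behind statsmodels' `pacf(..., method="ols")` option) is to regress $y_t$ on its first n lags and keep the coefficient on the last one. The sketch below is a simplified version of that idea; the helper name and the simulated AR(1) data are assumptions. For an AR(1) series with coefficient 0.7, the lag-1 PACF should be close to 0.7 and the higher lags close to zero.

```python
import numpy as np


def pacf_by_regression(y, n):
    """Lag-n partial autocorrelation via OLS (hypothetical helper, a sketch).

    Regress y_t on an intercept and y_{t-1}, ..., y_{t-n}; the coefficient
    on y_{t-n} is the lag-n PACF.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Rows correspond to t = n, ..., T-1; the k-th lag column holds y_{t-k}
    lagged = [y[n - k:T - k] for k in range(1, n + 1)]
    X = np.column_stack([np.ones(T - n)] + lagged)
    coefs, *_ = np.linalg.lstsq(X, y[n:], rcond=None)
    return coefs[-1]  # coefficient on the longest lag, y_{t-n}


# Illustrative AR(1) data with coefficient 0.7 (an assumption)
rng = np.random.default_rng(1)
noise = rng.normal(size=5000)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.7 * y[t - 1] + noise[t]

for n in range(1, 4):
    print(n, round(pacf_by_regression(y, n), 3))
```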
Using ACF and PACF to choose model order
AR(2) model → ACF tails off gradually; PACF cuts off after lag 2
MA(2) model → ACF cuts off after lag 2; PACF tails off gradually
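For an AR(p) process the PACF cuts off after lag p while the ACF tails off; for an MA(q) process it is the other way round. These patterns can be checked on theoretical values rather than a noisy sample. The sketch below uses statsmodels' `ArmaProcess`, whose `acf` and `pacf` methods return the theoretical functions (index = lag). The coefficients 0.5/0.3 and 0.6/0.3 are arbitrary illustrative choices.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# ArmaProcess takes lag-polynomial coefficients including the lag-0 term,
# with AR signs negated: y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + e_t
ar2 = ArmaProcess(ar=[1, -0.5, -0.3], ma=[1])
# y_t = e_t + 0.6 e_{t-1} + 0.3 e_{t-2}
ma2 = ArmaProcess(ar=[1], ma=[1, 0.6, 0.3])

# Theoretical values for lags 0..5
for name, proc in [('AR(2)', ar2), ('MA(2)', ma2)]:
    print(name, 'ACF ', np.round(proc.acf(lags=6), 3))
    print(name, 'PACF', np.round(proc.pacf(lags=6), 3))
```

In the printed output, the AR(2) PACF is zero from lag 3 onward while its ACF is not, and the MA(2) ACF is zero from lag 3 onward while its PACF is not.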
Implementation in Python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Create figure with two stacked axes
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 8))
# Make ACF plot (zero=False omits the lag-0 value, which is always 1)
plot_acf(df, lags=10, zero=False, ax=ax1)
# Make PACF plot
plot_pacf(df, lags=10, zero=False, ax=ax2)
plt.show()
Over/under differencing and ACF and PACF
Let's practice!
AIC and BIC
AIC - Akaike information criterion
Lower AIC indicates a better model
AIC penalizes extra parameters, so it tends to choose simpler, lower-order models
BIC - Bayesian information criterion
Very similar to AIC
Lower BIC indicates a better model
BIC also penalizes extra parameters, so it tends to choose simpler, lower-order models
AIC vs BIC
BIC favors simpler models than AIC, because its penalty per parameter grows with the number of observations
AIC is better at choosing predictive models
BIC is better at choosing a good explanatory model
AIC and BIC in statsmodels
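from statsmodels.tsa.statespace.sarimax import SARIMAX
# Create model (order matches the SARIMAX(2, 0, 0) summary below)
model = SARIMAX(df, order=(2,0,0))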
# Fit model
results = model.fit()
# Print fit summary
print(results.summary())
Statespace Model Results
==============================================================================
Dep. Variable: y No. Observations: 1000
Model: SARIMAX(2, 0, 0) Log Likelihood -1399.704
Date: Fri, 10 May 2019 AIC 2805.407
Time: 01:06:11 BIC 2820.131
Sample: 01-01-2013 HQIC 2811.003
- 09-27-2015
Covariance Type: opg
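The AIC and BIC in the summary above can be reproduced from the log-likelihood with the standard formulas AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters and n the number of observations. This sketch assumes k = 3 (two AR coefficients plus the noise variance), which is consistent with the printed numbers. It also shows why BIC favors simpler models here: each parameter costs ln(1000), about 6.91, under BIC versus 2 under AIC.

```python
import math


def aic(llf, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * llf


def bic(llf, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * llf


# Log-likelihood and sample size from the SARIMAX(2, 0, 0) summary
llf, n = -1399.704, 1000
# Assumed parameter count: ar.L1, ar.L2 and the noise variance sigma2
k = 3

print(f'AIC: {aic(llf, k):.3f}')
print(f'BIC: {bic(llf, k, n):.3f}')
# Per-parameter penalty: 2 for AIC vs ln(n) for BIC
print(f'AIC penalty per parameter: 2, BIC penalty per parameter: {math.log(n):.2f}')
```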
# Create model
model = SARIMAX(df, order=(1,0,1))
# Fit model
results = model.fit()
# Print AIC and BIC
print('AIC:', results.aic)
print('BIC:', results.bic)
AIC: 2806.36
BIC: 2821.09
Searching over AIC and BIC
# Loop over AR order
for p in range(3):
    # Loop over MA order
    for q in range(3):
        # Fit model
        model = SARIMAX(df, order=(p,0,q))
        results = model.fit()
        # Print the model order and the AIC/BIC values
        print(p, q, results.aic, results.bic)
0 0 2900.13 2905.04
0 1 2828.70 2838.52
0 2 2806.69 2821.42
1 0 2810.25 2820.06
1 1 2806.37 2821.09
1 2 2807.52 2827.15
...
import pandas as pd
order_aic_bic = []
# Loop over AR order
for p in range(3):
    # Loop over MA order
    for q in range(3):
        # Fit model
        model = SARIMAX(df, order=(p,0,q))
        results = model.fit()
        # Add order and scores to list
        order_aic_bic.append((p, q, results.aic, results.bic))
# Make DataFrame of model order and AIC/BIC scores
order_df = pd.DataFrame(order_aic_bic, columns=['p','q', 'aic', 'bic'])
# Sort by AIC
print(order_df.sort_values('aic'))
   p  q      aic      bic
7  2  1  2804.54  2824.17
6  2  0  2805.41  2820.13
4  1  1  2806.37  2821.09
2  0  2  2806.69  2821.42
...
# Sort by BIC
print(order_df.sort_values('bic'))
   p  q      aic      bic
3  1  0  2810.25  2820.06
6  2  0  2805.41  2820.13
4  1  1  2806.37  2821.09
2  0  2  2806.69  2821.42
...
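Instead of reading the sorted tables, the best order under each criterion can be pulled out with pandas' `idxmin`. This sketch rebuilds only the rows printed above (the other grid combinations are elided in the source), so it illustrates the lookup rather than repeating the search. With these rows AIC and BIC disagree, picking (2, 1) and (1, 0) respectively, in line with BIC's preference for smaller models.

```python
import pandas as pd

# The rows printed above (five of the nine (p, q) combinations)
order_df = pd.DataFrame(
    [(0, 2, 2806.69, 2821.42),
     (1, 0, 2810.25, 2820.06),
     (1, 1, 2806.37, 2821.09),
     (2, 0, 2805.41, 2820.13),
     (2, 1, 2804.54, 2824.17)],
    columns=['p', 'q', 'aic', 'bic'])

# idxmin returns the index label of the lowest score; .loc fetches that row
best_aic = order_df.loc[order_df['aic'].idxmin(), ['p', 'q']]
best_bic = order_df.loc[order_df['bic'].idxmin(), ['p', 'q']]
print('Best order by AIC: p=%d, q=%d' % (best_aic['p'], best_aic['q']))
print('Best order by BIC: p=%d, q=%d' % (best_bic['p'], best_bic['q']))
```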
Non-stationary model orders
# Fit model
model = SARIMAX(df, order=(2,0,1))
results = model.fit()
ValueError: Non-stationary starting autoregressive parameters
found with `enforce_stationarity` set to True.
When certain orders don't work
# Loop over AR order
for p in range(3):
    # Loop over MA order
    for q in range(3):
        # Fit model
        model = SARIMAX(df, order=(p,0,q))
        results = model.fit()
        # Print the model order and the AIC/BIC values
        print(p, q, results.aic, results.bic)
# Loop over AR order
for p in range(3):
    # Loop over MA order
    for q in range(3):
        try:
            # Fit model
            model = SARIMAX(df, order=(p,0,q))
            results = model.fit()
            # Print the model order and the AIC/BIC values
            print(p, q, results.aic, results.bic)
        except Exception:
            # Print AIC and BIC as None when the fit fails
            print(p, q, None, None)
Let's practice!
Model diagnostics
Introduction to model diagnostics
How good is the final model?
Residuals
# Fit model (p, d, q stand for the chosen model order)
model = SARIMAX(df, order=(p,d,q))
results = model.fit()
# Assign residuals to variable
residuals = results.resid
2013-01-23 1.013129
2013-01-24 0.114055
2013-01-25 0.430698
2013-01-26 -1.247046
2013-01-27 -0.499565
... ...
Mean absolute error
How far are the predictions from the real values?
import numpy as np
mae = np.mean(np.abs(residuals))
Plot diagnostics
If the model fits well, the residuals will be white Gaussian noise
# Create the 4 diagnostics plots
results.plot_diagnostics()
plt.show()
Residuals plot
Histogram plus estimated density
Normal Q-Q
Correlogram
Summary statistics
print(results.summary())
...
===================================================================================
Ljung-Box (Q): 32.10 Jarque-Bera (JB): 0.02
Prob(Q): 0.81 Prob(JB): 0.99
Heteroskedasticity (H): 1.28 Skew: -0.02
Prob(H) (two-sided): 0.21 Kurtosis: 2.98
===================================================================================
Prob(Q) - p-value for null hypothesis that residuals are uncorrelated
Prob(JB) - p-value for null hypothesis that residuals are normal
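The Ljung-Box statistic behind Prob(Q) is $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$, compared against a chi-squared distribution with h degrees of freedom. Below is a minimal sketch; the helper name, the lag count h = 10 and the simulated series are assumptions, and this version does not reduce the degrees of freedom for fitted ARMA parameters. statsmodels ships a full implementation as `acorr_ljungbox`.

```python
import numpy as np
from scipy.stats import chi2


def ljung_box(resid, lags=10):
    """Return (Q, p-value) for H0: no autocorrelation up to `lags` (hypothetical helper)."""
    x = np.asarray(resid, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.sum(d * d)
    ks = np.arange(1, lags + 1)
    # Sample autocorrelations at lags 1..lags
    rho = np.array([np.sum(d[k:] * d[:-k]) / denom for k in ks])
    q = n * (n + 2) * np.sum(rho ** 2 / (n - ks))
    # Upper-tail chi-squared probability is the p-value
    return q, chi2.sf(q, df=lags)


rng = np.random.default_rng(0)
white = rng.normal(size=1000)  # uncorrelated, so H0 is true
ar1 = np.zeros(1000)           # strongly autocorrelated, so H0 is false
for t in range(1, 1000):
    ar1[t] = 0.9 * ar1[t - 1] + white[t]

for name, series in [('white noise', white), ('AR(1)', ar1)]:
    q, p = ljung_box(series)
    print(f'{name}: Q={q:.2f}, Prob(Q)={p:.3g}')
```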
Let's practice!
Box-Jenkins method
The Box-Jenkins method
From raw data → production model
identification
estimation
model diagnostics
Identification
Is the time series stationary?
What differencing will make it stationary?
What transforms will make it stationary?
What values of p and q are most promising?
Identification tools
Plot the time series
df.plot()
Use the augmented Dickey-Fuller test
adfuller()
Use transforms and/or differencing
df.diff() , np.log() , np.sqrt()
Plot ACF/PACF
plot_acf() , plot_pacf()
Estimation
Use the data to train the model coefficients
Done for us using model.fit()
Choose between models using AIC and BIC
results.aic , results.bic
Model diagnostics
Are the residuals uncorrelated?
Are the residuals normally distributed?
results.plot_diagnostics()
results.summary()
Decision
Repeat
We go through the process again with more information
Find a better model
Production
Ready to make forecasts
results.get_forecast()
Box-Jenkins
Let's practice!