Applications#

import numpy as np
import pandas as pd
import pandas_datareader as pdr

import matplotlib.pyplot as plt
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa import stattools as st
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.ar_model import AutoReg, ar_select_order
from statsmodels.tsa.arima.model import ARIMA

# Suppress all warnings about future changes to syntax
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)

Unemployment#

This application follows example in Tsay (2018), chapter 1. We’ll begin by downloading monthly unemployment data. This series, calculated by the Bureau of Labor Statistics, is seasonally adjusted and begins in 1948.

ur = pdr.get_data_fred('UNRATE', 1948)

We’ll begin by calculating the average rate on a quarterly basis. This is simple to do using the resample method in pandas, which is possible since we have a DatetimeIndex.

ur.index
DatetimeIndex(['1948-01-01', '1948-02-01', '1948-03-01', '1948-04-01',
               '1948-05-01', '1948-06-01', '1948-07-01', '1948-08-01',
               '1948-09-01', '1948-10-01',
               ...
               '2025-05-01', '2025-06-01', '2025-07-01', '2025-08-01',
               '2025-09-01', '2025-10-01', '2025-11-01', '2025-12-01',
               '2026-01-01', '2026-02-01'],
              dtype='datetime64[ns]', name='DATE', length=938, freq=None)
urq = ur.resample('Q')['UNRATE'].mean()
urq
DATE
1948-03-31    3.733333
1948-06-30    3.666667
1948-09-30    3.766667
1948-12-31    3.833333
1949-03-31    4.666667
                ...   
2025-03-31    4.133333
2025-06-30    4.200000
2025-09-30    4.333333
2025-12-31    4.450000
2026-03-31    4.350000
Freq: QE-DEC, Name: UNRATE, Length: 313, dtype: float64
urq.plot(marker='.', xlabel='', grid=True);
_images/755d9a8e347493a142055059c2f5ad8492a0f053ec0f1cfa5f80ff49dbfa4c66.png
urqd = urq.diff().dropna()
urqd.plot(marker='.', grid=True);
_images/057cce02d452824fc826691ae192379cf284e13cc3174898e8887e8fbc401a21.png

To simplify the analysis, we’ll drop the period at the end of the sample, when the pandemic caused a massive, never-before-seen change in unemployment.

urqd = urqd.loc[:'2019']

urqd.plot(marker='.', grid=True);
_images/2b7dc4e306325bdbc31f232ced02fbe9476eb06177ba650830be0a42ed13a32e.png

Model selection#

modsel = ar_select_order(urqd, maxlag=18, old_names=False)
# pass modsel.aic to pd.Series to get a nice table of AIC values

pd.Series(modsel.aic).round(4)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)                         54.3214
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)                             54.4209
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)                     55.8434
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)         57.2608
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)     57.6978
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)                 57.7964
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)             59.4825
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)                                 61.8631
(1, 2, 3, 4, 5, 6, 7, 8, 9)                                         63.4146
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)                                     65.3997
(1, 2, 3, 4, 5, 6, 7, 8)                                            71.0780
(1, 2, 3, 4, 5, 6, 7)                                               77.3457
(1, 2, 3, 4, 5)                                                     77.6608
(1, 2, 3, 4)                                                        78.0792
(1, 2, 3, 4, 5, 6)                                                  78.5765
(1, 2, 3)                                                           79.5936
(1, 2)                                                              81.4666
(1,)                                                                84.6806
0                                                                  222.6336
dtype: float64
pd.Series(modsel.bic).round(4)
(1,)                                                                91.8701
(1, 2)                                                              92.2507
(1, 2, 3)                                                           93.9724
(1, 2, 3, 4)                                                        96.0527
(1, 2, 3, 4, 5)                                                     99.2291
(1, 2, 3, 4, 5, 6, 7, 8, 9)                                         99.3617
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)                            101.1522
(1, 2, 3, 4, 5, 6, 7, 8)                                           103.4304
(1, 2, 3, 4, 5, 6)                                                 103.7395
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)                        104.6474
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)                                    104.9415
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)                                104.9996
(1, 2, 3, 4, 5, 6, 7)                                              106.1034
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)                    109.7641
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)                115.3118
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)            120.5926
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)        121.9656
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)    125.9974
0                                                                  226.2283
dtype: float64
ar_select_order(urqd, maxlag=18, old_names=False, ic='aic').ar_lags
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
ar_select_order(urqd, maxlag=18, old_names=False, ic='bic').ar_lags
[1]

Using BIC would choose \(p=1\), but AIC indicates \(p=13\).

Comparing these two models using both AIC and BIC suggests \(p=13\) is a good model.

ar1 = AutoReg(urqd, lags=1).fit()
print(ar1.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                 UNRATE   No. Observations:                  287
Model:                     AutoReg(1)   Log Likelihood                 -51.751
Method:               Conditional MLE   S.D. of innovations              0.290
Date:                Mon, 30 Mar 2026   AIC                            109.501
Time:                        11:27:01   BIC                            120.469
Sample:                    09-30-1948   HQIC                           113.898
                         - 12-31-2019                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const       6.993e-05      0.017      0.004      0.997      -0.034       0.034
UNRATE.L1      0.6500      0.045     14.466      0.000       0.562       0.738
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.5385           +0.0000j            1.5385            0.0000
-----------------------------------------------------------------------------
ar13 = AutoReg(urqd, lags=13).fit()
print(ar13.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                 UNRATE   No. Observations:                  287
Model:                    AutoReg(13)   Log Likelihood                 -16.418
Method:               Conditional MLE   S.D. of innovations              0.257
Date:                Mon, 30 Mar 2026   AIC                             62.836
Time:                        11:27:05   BIC                            117.033
Sample:                    09-30-1951   HQIC                            84.590
                         - 12-31-2019                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0010      0.016      0.061      0.951      -0.029       0.031
UNRATE.L1      0.7109      0.060     11.827      0.000       0.593       0.829
UNRATE.L2     -0.0097      0.072     -0.135      0.893      -0.151       0.132
UNRATE.L3     -0.0635      0.072     -0.881      0.378      -0.205       0.078
UNRATE.L4     -0.2520      0.072     -3.506      0.000      -0.393      -0.111
UNRATE.L5      0.0935      0.072      1.290      0.197      -0.049       0.235
UNRATE.L6      0.1542      0.069      2.220      0.026       0.018       0.290
UNRATE.L7      0.0011      0.070      0.016      0.987      -0.136       0.138
UNRATE.L8     -0.3298      0.069     -4.751      0.000      -0.466      -0.194
UNRATE.L9      0.1800      0.072      2.509      0.012       0.039       0.321
UNRATE.L10     0.0809      0.071      1.136      0.256      -0.059       0.220
UNRATE.L11    -0.0135      0.071     -0.190      0.849      -0.153       0.126
UNRATE.L12    -0.2278      0.071     -3.209      0.001      -0.367      -0.089
UNRATE.L13     0.0949      0.058      1.639      0.101      -0.019       0.208
                                    Roots                                     
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            -1.0734           -0.3600j            1.1321           -0.4485
AR.2            -1.0734           +0.3600j            1.1321            0.4485
AR.3            -0.8641           -0.8669j            1.2240           -0.3747
AR.4            -0.8641           +0.8669j            1.2240            0.3747
AR.5            -0.3644           -1.0261j            1.0889           -0.3043
AR.6            -0.3644           +1.0261j            1.0889            0.3043
AR.7             0.4348           -1.0201j            1.1089           -0.1859
AR.8             0.4348           +1.0201j            1.1089            0.1859
AR.9             0.8500           -0.7549j            1.1369           -0.1156
AR.10            0.8500           +0.7549j            1.1369            0.1156
AR.11            1.0908           -0.3209j            1.1371           -0.0455
AR.12            1.0908           +0.3209j            1.1371            0.0455
AR.13            2.2514           -0.0000j            2.2514           -0.0000
------------------------------------------------------------------------------

Let’s prune the model to keep only coefficients that are statistically significant.

ar13.tvalues
const          0.061384
UNRATE.L1     11.827093
UNRATE.L2     -0.134720
UNRATE.L3     -0.881150
UNRATE.L4     -3.506309
UNRATE.L5      1.289900
UNRATE.L6      2.220079
UNRATE.L7      0.016272
UNRATE.L8     -4.750995
UNRATE.L9      2.509202
UNRATE.L10     1.136459
UNRATE.L11    -0.190377
UNRATE.L12    -3.208969
UNRATE.L13     1.639165
dtype: float64
# 5% significance level

ar13.tvalues[ar13.tvalues.abs() > 1.96]
UNRATE.L1     11.827093
UNRATE.L4     -3.506309
UNRATE.L6      2.220079
UNRATE.L8     -4.750995
UNRATE.L9      2.509202
UNRATE.L12    -3.208969
dtype: float64
# 1% significance level

ar13.pvalues[ar13.pvalues < 0.01]
UNRATE.L1     2.827644e-32
UNRATE.L4     4.543675e-04
UNRATE.L8     2.024185e-06
UNRATE.L12    1.332118e-03
dtype: float64
ar_v2 = AutoReg(urqd, lags=[1, 4, 8, 12]).fit()

print(ar_v2.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                 UNRATE   No. Observations:                  287
Model:             Restr. AutoReg(12)   Log Likelihood                 -29.081
Method:               Conditional MLE   S.D. of innovations              0.269
Date:                Mon, 30 Mar 2026   AIC                             70.161
Time:                        11:27:09   BIC                             91.862
Sample:                    06-30-1951   HQIC                            78.870
                         - 12-31-2019                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0004      0.016      0.025      0.980      -0.031       0.032
UNRATE.L1      0.6276      0.045     14.010      0.000       0.540       0.715
UNRATE.L4     -0.1772      0.044     -3.987      0.000      -0.264      -0.090
UNRATE.L8     -0.1174      0.043     -2.733      0.006      -0.202      -0.033
UNRATE.L12    -0.1222      0.043     -2.856      0.004      -0.206      -0.038
                                    Roots                                     
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1             1.0301           -0.3268j            1.0807           -0.0489
AR.2             1.0301           +0.3268j            1.0807            0.0489
AR.3             0.8843           -0.7578j            1.1645           -0.1128
AR.4             0.8843           +0.7578j            1.1645            0.1128
AR.5             0.4442           -1.1222j            1.2069           -0.1900
AR.6             0.4442           +1.1222j            1.2069            0.1900
AR.7            -0.3140           -1.1605j            1.2023           -0.2921
AR.8            -0.3140           +1.1605j            1.2023            0.2921
AR.9            -1.1815           -0.3878j            1.2435           -0.4495
AR.10           -1.1815           +0.3878j            1.2435            0.4495
AR.11           -0.8630           -0.9177j            1.2597           -0.3701
AR.12           -0.8630           +0.9177j            1.2597            0.3701
------------------------------------------------------------------------------
print(AutoReg(urqd, lags=[1,  4,  6,  8,  9, 12], trend='n').fit().summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                 UNRATE   No. Observations:                  287
Model:             Restr. AutoReg(12)   Log Likelihood                 -19.055
Method:               Conditional MLE   S.D. of innovations              0.259
Date:                Mon, 30 Mar 2026   AIC                             52.110
Time:                        11:27:10   BIC                             77.427
Sample:                    06-30-1951   HQIC                            62.270
                         - 12-31-2019                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
UNRATE.L1      0.6668      0.044     15.141      0.000       0.580       0.753
UNRATE.L4     -0.2399      0.047     -5.126      0.000      -0.332      -0.148
UNRATE.L6      0.1848      0.049      3.745      0.000       0.088       0.282
UNRATE.L8     -0.3058      0.061     -5.047      0.000      -0.425      -0.187
UNRATE.L9      0.1861      0.056      3.323      0.001       0.076       0.296
UNRATE.L12    -0.1472      0.042     -3.512      0.000      -0.229      -0.065
                                    Roots                                     
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1             1.0569           -0.2923j            1.0966           -0.0429
AR.2             1.0569           +0.2923j            1.0966            0.0429
AR.3             0.8633           -0.7253j            1.1275           -0.1112
AR.4             0.8633           +0.7253j            1.1275            0.1112
AR.5             0.4598           -1.0438j            1.1406           -0.1840
AR.6             0.4598           +1.0438j            1.1406            0.1840
AR.7            -1.0861           -0.3895j            1.1538           -0.4452
AR.8            -1.0861           +0.3895j            1.1538            0.4452
AR.9            -0.4159           -1.0497j            1.1290           -0.3100
AR.10           -0.4159           +1.0497j            1.1290            0.3100
AR.11           -0.8780           -1.1148j            1.4190           -0.3562
AR.12           -0.8780           +1.1148j            1.4190            0.3562
------------------------------------------------------------------------------

Using glob=True performs a global search over all \(2^{12}=4096\) possible models. Clearly trying all possible models for a large number of lags would eventually become infeasible; using 20 lags requires testing over 1 million models, for example.

sel = ar_select_order(urqd, 12, old_names=False, glob=True)
sel.ar_lags
[1, 4, 6, 8, 9, 12]

Residuals#

The standardized residuals are calculated by scaling the residuals from their estimated standard deviation, \(\epsilon_t / \sigma.\)

std_resid = ar_v2.resid / np.sqrt(ar_v2.sigma2)

Hide code cell source

fig, (ax1, ax2, ax3) = plt.subplots(3,1, figsize=(16,12))
plt.subplots_adjust(hspace=0.25)

std_resid.plot(ax=ax1, xlabel='', title='Standardized Residuals')
ax1.grid(alpha=0.3)

fig = sm.graphics.tsa.plot_acf(std_resid.values, lags=30, ax=ax2,
                               zero=False, auto_ylims=True)
ax2.set_title('ACF')

acf, qstats, pvals = st.acf(std_resid, nlags=30, qstat=True)                           
ax3.plot(np.r_[np.nan, pvals], 'o')
ax3.set_ylim((0,1))
ax3.set_title('Box-Ljung statistic $p$-values')

plt.show()
_images/2946539e4df6ce67b4205d283bbd98356bf0f620027dda4338fc601d888d2403.png

We can use kernel density estimation to get an empirical estimate of the PDF.

ax = std_resid.plot.kde(xlim=(-3,3), bw_method=0.25)
ax.grid(alpha=0.25)
_images/75d8c67da3288e21586ad8c9efba5f46e6032d9e02e5310e03b7e884c64e2c4d.png
std_resid.skew(), std_resid.kurtosis()
(0.5712214353027926, 1.8777501533375105)

There are several statistical tests we can use to test the null hypothesis that data are drawn from a normal distribution. Two that are available in scipy are due to D’Agostino and Pearson (1973) and the Shapiro and Wilk (1965). The former combines information about the skewness and kurtosis of a series; the latter is a more complicated test that uses information about order statistics.

from scipy.stats import normaltest, shapiro
normaltest(std_resid)
NormaltestResult(statistic=28.285962906502313, pvalue=7.207442680254298e-07)
shapiro(std_resid)
ShapiroResult(statistic=0.958171619651281, pvalue=4.0510448979408166e-07)

Oil prices#

Next we’ll use the Crude Oil Price series for West Texas Intermediate at Cushing, Oklahoma (WTI), available from FRED.

oil = pdr.get_data_fred('WCOILWTICO', 2000, 2025)
oil.index
DatetimeIndex(['2000-01-07', '2000-01-14', '2000-01-21', '2000-01-28',
               '2000-02-04', '2000-02-11', '2000-02-18', '2000-02-25',
               '2000-03-03', '2000-03-10',
               ...
               '2024-10-25', '2024-11-01', '2024-11-08', '2024-11-15',
               '2024-11-22', '2024-11-29', '2024-12-06', '2024-12-13',
               '2024-12-20', '2024-12-27'],
              dtype='datetime64[ns]', name='DATE', length=1304, freq=None)
# specify a frequency for the series
oil.index = pd.DatetimeIndex(oil.index, freq='W-FRI')
ax = oil.plot(xlabel='')
ax.grid(alpha=0.3)
_images/56fcfbbbecf0a403c0b997528ce283f7b054602d052409004d2020e4c5ef0b45.png
oild = oil.diff().dropna()
oild.plot(lw=1, xlabel='');
_images/240ded594e65d367ee29f4366b46d0edfe1670b50b506e34bab95edddf704989.png
ar_select_order(oild, maxlag=18, old_names=False, ic='aic').ar_lags
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
ar = AutoReg(oild, lags=17, old_names=False).fit()
print(ar.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:             WCOILWTICO   No. Observations:                 1303
Model:                    AutoReg(17)   Log Likelihood               -3096.451
Method:               Conditional MLE   S.D. of innovations              2.688
Date:                Mon, 30 Mar 2026   AIC                           6230.902
Time:                        12:10:56   BIC                           6328.929
Sample:                    05-12-2000   HQIC                          6267.702
                         - 12-27-2024                                         
==================================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const              0.0246      0.075      0.329      0.743      -0.122       0.172
WCOILWTICO.L1      0.1428      0.028      5.128      0.000       0.088       0.197
WCOILWTICO.L2     -0.0012      0.028     -0.043      0.965      -0.056       0.054
WCOILWTICO.L3      0.0630      0.028      2.248      0.025       0.008       0.118
WCOILWTICO.L4     -0.0087      0.028     -0.308      0.758      -0.064       0.046
WCOILWTICO.L5      0.0439      0.028      1.569      0.117      -0.011       0.099
WCOILWTICO.L6      0.0337      0.028      1.200      0.230      -0.021       0.089
WCOILWTICO.L7     -0.0216      0.028     -0.770      0.441      -0.077       0.033
WCOILWTICO.L8      0.1108      0.028      3.942      0.000       0.056       0.166
WCOILWTICO.L9     -0.0605      0.028     -2.140      0.032      -0.116      -0.005
WCOILWTICO.L10     0.0056      0.028      0.199      0.842      -0.050       0.061
WCOILWTICO.L11    -0.0049      0.028     -0.173      0.862      -0.060       0.050
WCOILWTICO.L12     0.0184      0.028      0.654      0.513      -0.037       0.074
WCOILWTICO.L13    -0.0640      0.028     -2.272      0.023      -0.119      -0.009
WCOILWTICO.L14     0.0492      0.028      1.744      0.081      -0.006       0.105
WCOILWTICO.L15    -0.0174      0.028     -0.616      0.538      -0.073       0.038
WCOILWTICO.L16    -0.0737      0.028     -2.613      0.009      -0.129      -0.018
WCOILWTICO.L17     0.0593      0.028      2.116      0.034       0.004       0.114
                                    Roots                                     
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            -1.1847           -0.1382j            1.1927           -0.4815
AR.2            -1.1847           +0.1382j            1.1927            0.4815
AR.3            -0.9510           -0.6157j            1.1329           -0.4086
AR.4            -0.9510           +0.6157j            1.1329            0.4086
AR.5            -0.6576           -0.8816j            1.0998           -0.3520
AR.6            -0.6576           +0.8816j            1.0998            0.3520
AR.7            -0.2201           -1.1087j            1.1303           -0.2812
AR.8            -0.2201           +1.1087j            1.1303            0.2812
AR.9             0.1814           -1.1568j            1.1709           -0.2252
AR.10            0.1814           +1.1568j            1.1709            0.2252
AR.11            0.6172           -1.0118j            1.1852           -0.1628
AR.12            0.6172           +1.0118j            1.1852            0.1628
AR.13            0.9385           -0.7602j            1.2078           -0.1084
AR.14            0.9385           +0.7602j            1.2078            0.1084
AR.15            1.2165           -0.2853j            1.2495           -0.0367
AR.16            1.2165           +0.2853j            1.2495            0.0367
AR.17            1.3631           -0.0000j            1.3631           -0.0000
------------------------------------------------------------------------------
# 1% significance level

ar.tvalues[ar.tvalues.abs() > 2.326]
WCOILWTICO.L1     5.128402
WCOILWTICO.L8     3.942358
WCOILWTICO.L16   -2.613228
dtype: float64
ar_b = AutoReg(oild, lags=[1,8,16], old_names=False).fit()
print(ar_b.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:             WCOILWTICO   No. Observations:                 1303
Model:             Restr. AutoReg(16)   Log Likelihood               -3111.642
Method:               Conditional MLE   S.D. of innovations              2.715
Date:                Mon, 30 Mar 2026   AIC                           6233.284
Time:                        12:12:39   BIC                           6259.085
Sample:                    05-05-2000   HQIC                          6242.970
                         - 12-27-2024                                         
==================================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const              0.0296      0.076      0.392      0.695      -0.119       0.178
WCOILWTICO.L1      0.1317      0.027      4.803      0.000       0.078       0.185
WCOILWTICO.L8      0.1056      0.028      3.829      0.000       0.052       0.160
WCOILWTICO.L16    -0.0735      0.028     -2.659      0.008      -0.128      -0.019
                                    Roots                                     
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1             1.1493           -0.1948j            1.1657           -0.0267
AR.2             1.1493           +0.1948j            1.1657            0.0267
AR.3             0.9566           -0.6670j            1.1662           -0.0969
AR.4             0.9566           +0.6670j            1.1662            0.0969
AR.5             0.6846           -0.9526j            1.1730           -0.1508
AR.6             0.6846           +0.9526j            1.1730            0.1508
AR.7             0.2118           -1.1545j            1.1737           -0.2211
AR.8             0.2118           +1.1545j            1.1737            0.2211
AR.9            -0.1919           -1.1663j            1.1820           -0.2760
AR.10           -0.1919           +1.1663j            1.1820            0.2760
AR.11           -0.6725           -0.9724j            1.1823           -0.3463
AR.12           -0.6725           +0.9724j            1.1823            0.3463
AR.13           -0.9684           -0.6871j            1.1874           -0.4018
AR.14           -0.9684           +0.6871j            1.1874            0.4018
AR.15           -1.1693           -0.2068j            1.1875           -0.4721
AR.16           -1.1693           +0.2068j            1.1875            0.4721
------------------------------------------------------------------------------
std_resid = ar_b.resid / np.sqrt(ar_b.sigma2)

Hide code cell source

fig, (ax1, ax2, ax3) = plt.subplots(3,1, figsize=(16,12))
plt.subplots_adjust(hspace=0.25)

std_resid.plot(ax=ax1, xlabel='', title='Standardized Residuals')

fig = sm.graphics.tsa.plot_acf(std_resid.values, lags=30, zero=False, ax=ax2)
ax2.set_title('Standardized Residuals')
ax2.set_ylabel('ACF')

fig = sm.graphics.tsa.plot_acf(std_resid.values**2, lags=30, zero=False, ax=ax3)
ax3.set_title('Squared Standardized Residuals')
ax3.set_ylabel('ACF')

plt.show()
_images/ad51340695ebd1c92b41b592f9f7cceb47ae1883da3c85c3b1fba23c2d099bbd.png

Hide code cell source

ax = std_resid.plot.kde(xlim=(-3,3), bw_method=0.25, label='Standardized residuals')

# Add N(0,1) plot
from scipy.stats import norm
x = np.linspace(-3, 3, 100)
pdf = norm(0,1).pdf(x)
ax.plot(x, pdf, 'k--', linewidth=1, label='N(0,1)')

ax.legend(loc='upper left')
plt.show()
_images/9959a1bce46c13f192f2ca964a9550012b7ee07bd486c16ad5408f7ade0a492c.png
std_resid.skew(), std_resid.kurtosis()
(-0.2535073744148516, 5.4491566084082415)
normaltest(std_resid)
NormaltestResult(statistic=171.8329216954173, pvalue=4.863569756405212e-38)

Housing starts#

Next we’ll examine housing starts using the series Housing Starts: Total: New Privately Owned Housing Units Started (HOUSTNSA).

hstarts = pdr.get_data_fred('HOUSTNSA', 1959, 2025).asfreq('MS').squeeze()

Hide code cell source

fig, (ax1, ax2) = plt.subplots(2, figsize=(12,9))

hstarts.plot(ax=ax1, xlabel='')

hstartsd = hstarts.pct_change().dropna() * 100
hstartsd.plot(ax=ax2, xlabel='')
plt.show()
_images/f91dafaa3f353acc567a731e4916bb492c69e1b8d2590377b19c6ca4129d700a.png
sel = ar_select_order(hstartsd, 24, old_names=False)

# fit selected model
res = sel.model.fit()

print(res.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  792
Model:                    AutoReg(24)   Log Likelihood               -2848.436
Method:               Conditional MLE   S.D. of innovations              9.875
Date:                Mon, 30 Mar 2026   AIC                           5748.872
Time:                        12:15:21   BIC                           5869.610
Sample:                    02-01-1961   HQIC                          5795.343
                         - 01-01-2025                                         
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
const            1.5393      0.461      3.341      0.001       0.636       2.442
HOUSTNSA.L1     -0.2832      0.035     -8.062      0.000      -0.352      -0.214
HOUSTNSA.L2     -0.1187      0.036     -3.254      0.001      -0.190      -0.047
HOUSTNSA.L3     -0.0450      0.037     -1.225      0.220      -0.117       0.027
HOUSTNSA.L4     -0.0753      0.037     -2.057      0.040      -0.147      -0.004
HOUSTNSA.L5     -0.0107      0.036     -0.293      0.769      -0.082       0.061
HOUSTNSA.L6     -0.0470      0.036     -1.289      0.197      -0.119       0.024
HOUSTNSA.L7     -0.0481      0.036     -1.321      0.186      -0.119       0.023
HOUSTNSA.L8     -0.0775      0.036     -2.130      0.033      -0.149      -0.006
HOUSTNSA.L9     -0.0279      0.036     -0.769      0.442      -0.099       0.043
HOUSTNSA.L10    -0.1118      0.036     -3.087      0.002      -0.183      -0.041
HOUSTNSA.L11     0.0277      0.036      0.762      0.446      -0.044       0.099
HOUSTNSA.L12     0.3472      0.035      9.879      0.000       0.278       0.416
HOUSTNSA.L13     0.2475      0.035      7.042      0.000       0.179       0.316
HOUSTNSA.L14     0.0466      0.036      1.287      0.198      -0.024       0.118
HOUSTNSA.L15    -0.0273      0.036     -0.757      0.449      -0.098       0.043
HOUSTNSA.L16    -0.1371      0.036     -3.803      0.000      -0.208      -0.066
HOUSTNSA.L17    -0.0259      0.036     -0.714      0.475      -0.097       0.045
HOUSTNSA.L18    -0.0848      0.036     -2.340      0.019      -0.156      -0.014
HOUSTNSA.L19     0.0015      0.036      0.042      0.966      -0.070       0.073
HOUSTNSA.L20    -0.0930      0.036     -2.559      0.010      -0.164      -0.022
HOUSTNSA.L21    -0.0929      0.036     -2.553      0.011      -0.164      -0.022
HOUSTNSA.L22    -0.0094      0.037     -0.258      0.797      -0.081       0.062
HOUSTNSA.L23     0.0607      0.036      1.675      0.094      -0.010       0.132
HOUSTNSA.L24     0.2272      0.035      6.525      0.000       0.159       0.295
                                    Roots                                     
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            -1.1039           -0.0000j            1.1039           -0.5000
AR.2            -1.0266           -0.2516j            1.0570           -0.4617
AR.3            -1.0266           +0.2516j            1.0570            0.4617
AR.4            -0.8897           -0.5143j            1.0276           -0.4166
AR.5            -0.8897           +0.5143j            1.0276            0.4166
AR.6            -0.7635           -0.7742j            1.0874           -0.3739
AR.7            -0.7635           +0.7742j            1.0874            0.3739
AR.8            -0.5372           -0.9002j            1.0483           -0.3356
AR.9            -0.5372           +0.9002j            1.0483            0.3356
AR.10           -0.3285           -1.0522j            1.1023           -0.2982
AR.11           -0.3285           +1.0522j            1.1023            0.2982
AR.12           -0.0143           -1.0421j            1.0422           -0.2522
AR.13           -0.0143           +1.0421j            1.0422            0.2522
AR.14            0.2139           -1.0530j            1.0745           -0.2181
AR.15            0.2139           +1.0530j            1.0745            0.2181
AR.16            0.5019           -0.8717j            1.0059           -0.1669
AR.17            0.5019           +0.8717j            1.0059            0.1669
AR.18            0.7584           -0.8062j            1.1068           -0.1299
AR.19            0.7584           +0.8062j            1.1068            0.1299
AR.20            0.8700           -0.5015j            1.0042           -0.0832
AR.21            0.8700           +0.5015j            1.0042            0.0832
AR.22            1.0827           -0.2575j            1.1129           -0.0372
AR.23            1.0827           +0.2575j            1.1129            0.0372
AR.24            1.1026           -0.0000j            1.1026           -0.0000
------------------------------------------------------------------------------

Removing seasonality#

By allowing each month of the year to have its own intercept — or “dummy variable” — we can remove the seasonal variation in the series.

hstartsd.groupby(hstartsd.index.month).mean()
DATE
1     -4.269131
2      4.160395
3     31.359965
4     13.224910
5      4.059291
6      0.538924
7     -3.537971
8     -1.794444
9     -4.022403
10     2.524579
11   -15.054469
12   -13.934532
Name: HOUSTNSA, dtype: float64

When estimating an autoregression or running order selection, we can include seasonal dummies. Here’s we’ll exclude the main intercept and instead have 12 dummies (one for each month).

sel = ar_select_order(hstartsd, 24, seasonal=True, trend='n', old_names=False)
sel.ar_lags
[1]

The initial data begin in January, but when we difference the data we end up with a series that starts in February. Therefore, the s(1,12) coefficient is automatically defined as the February dummy.

res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  792
Model:               Seas. AutoReg(1)   Log Likelihood               -2932.672
Method:               Conditional MLE   S.D. of innovations              9.861
Date:                Mon, 30 Mar 2026   AIC                           5893.345
Time:                        12:15:36   BIC                           5958.771
Sample:                    03-01-1959   HQIC                          5918.492
                         - 01-01-2025                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
s(1,12)         3.2422      1.232      2.633      0.008       0.828       5.656
s(2,12)        32.2999      1.222     26.424      0.000      29.904      34.696
s(3,12)        20.3096      1.629     12.466      0.000      17.117      23.503
s(4,12)         7.0470      1.297      5.431      0.000       4.504       9.590
s(5,12)         1.4560      1.222      1.192      0.233      -0.939       3.851
s(6,12)        -3.4162      1.214     -2.814      0.005      -5.796      -1.037
s(7,12)        -2.5937      1.220     -2.126      0.034      -4.985      -0.203
s(8,12)        -4.4278      1.215     -3.643      0.000      -6.810      -2.046
s(9,12)         1.6159      1.222      1.323      0.186      -0.779       4.011
s(10,12)      -14.4841      1.217    -11.902      0.000     -16.869     -12.099
s(11,12)      -17.3356      1.321    -13.121      0.000     -19.925     -14.746
s(12,12)       -7.4172      1.306     -5.678      0.000      -9.978      -4.857
HOUSTNSA.L1    -0.2259      0.035     -6.520      0.000      -0.294      -0.158
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1           -4.4264           +0.0000j            4.4264            0.5000
-----------------------------------------------------------------------------
fig = plt.figure(figsize=(16,9))
fig = res.plot_diagnostics(fig=fig, lags=30)
_images/585dc2c2281d20324fc686d1a10efe73a6754f82c243ffc87643a4db19b05a64.png

Inflation#

Consumer Price Index for All Urban Consumers: All Items in U.S. City Average (CPIAUCNS)

cpi = pdr.get_data_fred('CPIAUCNS', start=1970).squeeze()

cpi.index = pd.DatetimeIndex(cpi.index, freq='MS')
cpi.tail(10)
DATE
2025-05-01    321.465
2025-06-01    322.561
2025-07-01    323.048
2025-08-01    323.976
2025-09-01    324.800
2025-10-01        NaN
2025-11-01    324.122
2025-12-01    324.054
2026-01-01    325.252
2026-02-01    326.785
Freq: MS, Name: CPIAUCNS, dtype: float64
# calculating an annualized inflation rate in percent
inf = np.log(cpi).resample('QS').mean().diff()[1:] * 400
inf.tail(10)
DATE
2023-10-01    0.419097
2024-01-01    4.143722
2024-04-01    4.582997
2024-07-01    1.206464
2024-10-01    0.898997
2025-01-01    4.113436
2025-04-01    3.443490
2025-07-01    2.892898
2025-10-01    0.182036
2026-01-01    2.374514
Freq: QS-JAN, Name: CPIAUCNS, dtype: float64

Hide code cell source

fig, ax = plt.subplots()
ax.plot(inf.index, inf, '.', label=r'$\Delta \log CPI$')
ax.legend(loc='lower left')
ax.grid(alpha=0.3)
plt.show()
_images/9a5937b56355b866a0caa31bc9e8158a2555d3cd849fa24b99373e3881062562.png

Another flexible model for estimation are seasonal autoregressive integrated moving average models, or SARIMAX. These models are a class of state space models,

\[\begin{equation*} \begin{split}y_t & = Z_t \alpha_t + d_t + \varepsilon_t, \\ \alpha_{t+1} & = T_t \alpha_t + c_t + R_t \eta_t. \\\end{split} \end{equation*}\]

Here, we observe the time series \(y_t\), but it depends on the state variable, \(\alpha_{t}\), which is unobservable. The \(\varepsilon\) and \(\eta\) terms are both noise processes. (Uppercase letters denote matrices, while lowercase letters are vectors.) The first equation is called the observation equation while the second set of equations are the state equations.

This class of models nests ARMA models as well as other types of models. For example, we can write the ARMA(1,1) model

\[y_t = \phi y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}, \qquad \varepsilon_t \sim N(0, \sigma^2)\]

as a state-space model:

\[\begin{align*} y_t & = \underbrace{\begin{bmatrix} 1 & \theta_1 \end{bmatrix}}_{Z} \underbrace{\begin{bmatrix} \alpha_{1,t} \\ \alpha_{2,t} \end{bmatrix}}_{\alpha_t} \\ \begin{bmatrix} \alpha_{1,t+1} \\ \alpha_{2,t+1} \end{bmatrix} & = \underbrace{\begin{bmatrix} \phi & 0 \\ 1 & 0 \\ \end{bmatrix}}_{T} \begin{bmatrix} \alpha_{1,t} \\ \alpha_{2,t} \end{bmatrix} + \underbrace{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}_{R} \underbrace{\varepsilon_{t+1}}_{\eta_t}. \end{align*}\]

To see this, note that the observation equation says

\[y_t = \alpha_{1,t} + \theta_1 \alpha_{2,t},\]

which implies that

(17)#\[\begin{align} y_t - \phi y_{t-1} &= \alpha_{1,t} + \theta_1 \alpha_{2,t} - \phi(\alpha_{1,t-1} + \theta_1 \alpha_{2,t-1}) \\ &= \alpha_{1,t} - \phi\alpha_{1,t-1} + \theta_1(\alpha_{2,t} - \phi \alpha_{2,t-1}). \end{align}\]

The two state equations imply that

\[\alpha_{1,t+1} = \phi\alpha_{1,t} + \varepsilon_{t+1} \Longrightarrow \alpha_{1,t} - \phi\alpha_{1,t-1} = \varepsilon_{t}\]

and

\[\alpha_{2,t+1} = \alpha_{1,t} \Longrightarrow \alpha_{2,t} = \alpha_{1,t-1}.\]

Substituting these into (15), we get

\[y_t - \phi y_{t-1} = \varepsilon_{t} + \theta_1\varepsilon_{t-1},\]

which is the desired ARMA(1,1) model.

The following examples are based on those of Chad Fulton, an economist at the Federal Reserve.

# Estimate ARMA(1,1) model with SARIMAX
inf_model2 = sm.tsa.SARIMAX(inf, order=(1, 0, 1))
inf_results2 = inf_model2.fit()

print(inf_results2.summary())
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            3     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.40150D+00    |proj g|=  1.02415D-01

At iterate    5    f=  2.36720D+00    |proj g|=  6.26196D-02

At iterate   10    f=  2.35807D+00    |proj g|=  1.00190D-03

At iterate   15    f=  2.35804D+00    |proj g|=  1.58860D-06

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    3     15     17      1     0     0   1.589D-06   2.358D+00
  F =   2.3580439428985573     

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            
                               SARIMAX Results                                
==============================================================================
Dep. Variable:               CPIAUCNS   No. Observations:                  224
Model:               SARIMAX(1, 0, 1)   Log Likelihood                -528.202
Date:                Mon, 30 Mar 2026   AIC                           1062.404
Time:                        12:17:59   BIC                           1072.639
Sample:                    04-01-1970   HQIC                          1066.535
                         - 01-01-2026                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.9771      0.015     64.390      0.000       0.947       1.007
ma.L1         -0.6176      0.056    -11.014      0.000      -0.727      -0.508
sigma2         6.4916      0.317     20.457      0.000       5.870       7.114
===================================================================================
Ljung-Box (L1) (Q):                   4.32   Jarque-Bera (JB):               558.70
Prob(Q):                              0.04   Prob(JB):                         0.00
Heteroskedasticity (H):               1.82   Skew:                            -1.17
Prob(H) (two-sided):                  0.01   Kurtosis:                        10.37
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
 This problem is unconstrained.

Notice that the observation from October 2008 is quite different from all the other data. This outlier might be having a out-sized impact on our results, so we can remove its effect by letting it have its own intercept. We do this by including an additional exogenous variable in the regression using the exog argument; in this case the variable is a vector that is zero for every observation except the one for October 2008.

outlier = pd.Series(np.zeros(len(inf)), index=inf.index, name='outlier')
outlier['2008-10-01'] = 1
inf_model3 = sm.tsa.SARIMAX(inf, order=(1, 0, 1), exog=outlier)
inf_results3 = inf_model3.fit()

print(inf_results3.summary())
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            4     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.30107D+00    |proj g|=  1.14907D-01

At iterate    5    f=  2.26805D+00    |proj g|=  3.18944D-02

At iterate   10    f=  2.26137D+00    |proj g|=  1.75196D-03

At iterate   15    f=  2.25957D+00    |proj g|=  1.20166D-03

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    4     18     21      1     0     0   8.724D-06   2.260D+00
  F =   2.2595629677557780     

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            
                               SARIMAX Results                                
==============================================================================
Dep. Variable:               CPIAUCNS   No. Observations:                  224
Model:               SARIMAX(1, 0, 1)   Log Likelihood                -506.142
Date:                Mon, 30 Mar 2026   AIC                           1020.284
Time:                        12:18:01   BIC                           1033.931
Sample:                    04-01-1970   HQIC                          1025.793
                         - 01-01-2026                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
outlier      -14.5575      1.717     -8.480      0.000     -17.922     -11.193
ar.L1          0.9772      0.014     71.008      0.000       0.950       1.004
ma.L1         -0.5745      0.057    -10.092      0.000      -0.686      -0.463
sigma2         5.3281      0.474     11.231      0.000       4.398       6.258
===================================================================================
Ljung-Box (L1) (Q):                   2.61   Jarque-Bera (JB):                 2.16
Prob(Q):                              0.11   Prob(JB):                         0.34
Heteroskedasticity (H):               1.28   Skew:                            -0.09
Prob(H) (two-sided):                  0.28   Kurtosis:                         3.45
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
 This problem is unconstrained.

We can also add additional structure to the model to remove seasonality using the by specifying values for the seasonal_order(p,d,q,s) argument. The final s parameter indicates the periodicity of the time series; the first three parameters are analogous to the parameters to the order arguement, but are applied with lags defined by the periodicity.

For example, since we’re working with quarterly data, it is natural to specify s=4. Setting p=1 and s=1 therefore takes 4-period differences of the data and includes an additional autoregressive term at a lag of 4.

inf_model3 = sm.tsa.SARIMAX(inf, order=(3, 0, 0), seasonal_order=(1, 1, 0, 4), exog=outlier)
inf_results3 = inf_model3.fit()

print(inf_results3.summary())
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            6     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.26680D+00    |proj g|=  1.47627D-01

At iterate    5    f=  2.24409D+00    |proj g|=  3.24593D-03

At iterate   10    f=  2.24282D+00    |proj g|=  2.24330D-02

At iterate   15    f=  2.24052D+00    |proj g|=  1.48140D-03

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    6     18     21      1     0     0   3.794D-06   2.241D+00
  F =   2.2405167990716035     

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            
                                     SARIMAX Results                                     
=========================================================================================
Dep. Variable:                          CPIAUCNS   No. Observations:                  216
Model:             SARIMAX(3, 0, 0)x(1, 1, 0, 4)   Log Likelihood                -483.952
Date:                           Mon, 04 Mar 2024   AIC                            979.903
Time:                                   20:58:09   BIC                           1000.043
Sample:                               04-01-1970   HQIC                           988.043
                                    - 01-01-2024                                         
Covariance Type:                             opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
outlier      -11.1106      1.449     -7.670      0.000     -13.950      -8.271
ar.L1          0.4959      0.059      8.400      0.000       0.380       0.612
ar.L2         -0.0471      0.068     -0.697      0.486      -0.179       0.085
ar.L3          0.1884      0.064      2.967      0.003       0.064       0.313
ar.S.L4       -0.3854      0.059     -6.546      0.000      -0.501      -0.270
sigma2         5.6065      0.560     10.020      0.000       4.510       6.703
===================================================================================
Ljung-Box (L1) (Q):                   0.47   Jarque-Bera (JB):                 0.56
Prob(Q):                              0.49   Prob(JB):                         0.76
Heteroskedasticity (H):               1.14   Skew:                            -0.10
Prob(H) (two-sided):                  0.59   Kurtosis:                         3.14
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
 This problem is unconstrained.
inf_results3.plot_diagnostics(figsize=(16,9));
_images/a9014b54218a70bca23862900117a25c61ade15efacabfebe9ad53ba7dcc34d9.png
inf_results3.forecast(4, exog=np.zeros(4))
2024-04-01    4.967205
2024-07-01    2.598361
2024-10-01   -0.654048
2025-01-01    1.485415
Freq: QS-JAN, Name: predicted_mean, dtype: float64