Shiller’s CAPE

Shiller’s CAPE#

Bob Shiller is a professor of economics at Yale and winner of the 2013 Nobel Prize in economics for his empirical analysis of asset prices.

Shiller developed the Cyclically Adjusted Price-to-Earnings (CAPE) Ratio as a valuation metric for stock markets. CAPE is measured as the ratio of the current price of a market index (e.g. the S&P 500) to the average real earnings over the past ten years for stocks in the index.

Let’s calculate CAPE using data from Shiller’s website.

The URL for current file is:

https://img1.wsimg.com/blobby/go/e5e77e0b-59d1-44d9-ab25-4763ac982e53/downloads/25d6827d-c04b-447a-bb6d-918d5d88be49/ie_data.xls?ver=1770307872442.

I have saved this URL in a variable called file_loc.

shiller = pd.read_excel(file_loc,
                        sheet_name='Data', skiprows=range(7),
                        skipfooter=1, usecols='A:E',
                        dtype={'Date':str})
shiller
Date P D E CPI
0 1871.01 4.440000 0.26000 0.4 12.464061
1 1871.02 4.500000 0.26000 0.4 12.844641
2 1871.03 4.610000 0.26000 0.4 13.034972
3 1871.04 4.740000 0.26000 0.4 12.559226
4 1871.05 4.860000 0.26000 0.4 12.273812
... ... ... ... ... ...
1857 2025.1 6735.691739 78.62672 NaN 325.212000
1858 2025.11 6740.885789 78.77344 NaN 324.122000
1859 2025.12 6853.025455 78.92016 NaN 324.054000
1860 2026.01 6929.122000 NaN NaN 324.020000
1861 2026.02 6882.720000 NaN NaN 324.003000

1862 rows × 5 columns

We’ll use the Date column to add a ‘DatetimeIndex’.

shiller['Date'] = pd.date_range(start='1871-01-31', periods=len(shiller), freq='ME')

shiller = shiller.set_index('Date')
shiller.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1862 entries, 1871-01-31 to 2026-02-28
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   P       1862 non-null   float64
 1   D       1860 non-null   float64
 2   E       1857 non-null   float64
 3   CPI     1862 non-null   float64
dtypes: float64(4)
memory usage: 72.7 KB
shiller
P D E CPI
Date
1871-01-31 4.440000 0.26000 0.4 12.464061
1871-02-28 4.500000 0.26000 0.4 12.844641
1871-03-31 4.610000 0.26000 0.4 13.034972
1871-04-30 4.740000 0.26000 0.4 12.559226
1871-05-31 4.860000 0.26000 0.4 12.273812
... ... ... ... ...
2025-10-31 6735.691739 78.62672 NaN 325.212000
2025-11-30 6740.885789 78.77344 NaN 324.122000
2025-12-31 6853.025455 78.92016 NaN 324.054000
2026-01-31 6929.122000 NaN NaN 324.020000
2026-02-28 6882.720000 NaN NaN 324.003000

1862 rows × 4 columns

To convert all the data to “real” terms, we need to adjust for inflation. We do this by calculating how much the CPI has changed from each year until the last year in the sample. (Shiller uses the non-seasonally adjusted CPI data, which starts in 1913, along with another price series that goes earlier.)

Exercise

Use the data to calculate the total inflation rate during 2020–2025.

We’ll create a CPI factor using the number from a few months before the end of the series because the most recent values are estimates.

shiller['CPI_factor'] = shiller['CPI'] / shiller['CPI'].iloc[-3]
shiller.head()
P D E CPI CPI_factor
Date
1871-01-31 4.44 0.26 0.4 12.464061 0.038463
1871-02-28 4.50 0.26 0.4 12.844641 0.039637
1871-03-31 4.61 0.26 0.4 13.034972 0.040225
1871-04-30 4.74 0.26 0.4 12.559226 0.038757
1871-05-31 4.86 0.26 0.4 12.273812 0.037876
shiller.tail(12)
P D E CPI CPI_factor
Date
2025-03-31 5683.983333 76.145301 216.690000 319.799 0.986869
2025-04-30 5369.495714 76.546867 218.636667 320.795 0.989943
2025-05-31 5810.919524 76.948434 220.583333 321.465 0.992011
2025-06-30 6029.951500 77.350000 222.530000 322.561 0.995393
2025-07-31 6296.498182 77.726667 226.770000 323.048 0.996896
2025-08-31 6408.949524 78.103333 231.010000 323.976 0.999759
2025-09-30 6584.018095 78.480000 235.250000 324.800 1.002302
2025-10-31 6735.691739 78.626720 NaN 325.212 1.003573
2025-11-30 6740.885789 78.773440 NaN 324.122 1.000210
2025-12-31 6853.025455 78.920160 NaN 324.054 1.000000
2026-01-31 6929.122000 NaN NaN 324.020 0.999895
2026-02-28 6882.720000 NaN NaN 324.003 0.999843
shiller['p_real'] = shiller['P'] / shiller['CPI_factor']
shiller['d_real'] = shiller['D'] / shiller['CPI_factor']
shiller['e_real'] = shiller['E'] / shiller['CPI_factor']
shiller = shiller.reindex(columns=['p_real', 'd_real', 'e_real'])
shiller.head()
p_real d_real e_real
Date
1871-01-31 115.435871 6.759758 10.399628
1871-02-28 113.529289 6.559470 10.091492
1871-03-31 114.606226 6.463692 9.944141
1871-04-30 122.301797 6.708537 10.320827
1871-05-31 128.314047 6.864538 10.560827
shiller.tail(15)
p_real d_real e_real
Date
2024-12-31 6171.825434 76.835574 215.796420
2025-01-31 6099.662594 76.782344 216.609972
2025-02-28 6132.786084 76.887310 217.859309
2025-03-31 5759.610052 77.158432 219.573111
2025-04-30 5424.045151 77.324517 220.857826
2025-05-31 5857.719240 77.568157 222.359857
2025-06-30 6057.861624 77.708021 223.559998
2025-07-31 6316.106033 77.968714 227.476182
2025-08-31 6410.492533 78.122137 231.065618
2025-09-30 6568.895935 78.299747 234.709678
2025-10-31 6711.707596 78.346750 NaN
2025-11-30 6739.471568 78.756914 NaN
2025-12-31 6853.025455 78.920160 NaN
2026-01-31 6929.849085 NaN NaN
2026-02-28 6883.803381 NaN NaN

Next we’ll calculate a rolling 10-year average of earnings.

earn_ma = shiller['e_real'].rolling(window=120, min_periods=100).mean()

earn_ma.iloc[95:105]
Date
1878-12-31          NaN
1879-01-31          NaN
1879-02-28          NaN
1879-03-31          NaN
1879-04-30    10.955256
1879-05-31    10.979792
1879-06-30    11.007675
1879-07-31    11.035686
1879-08-31    11.065405
1879-09-30    11.092137
Name: e_real, dtype: float64

The value of CAPE is the current price value divided by the rolling average lagged one period. We do this by calling .shift to shift the time series one period.

shiller['CAPE'] = shiller['p_real'] / earn_ma.shift()

shiller.tail(12)
p_real d_real e_real CAPE
Date
2025-03-31 5759.610052 77.158432 219.573111 34.784750
2025-04-30 5424.045151 77.324517 220.857826 32.621270
2025-05-31 5857.719240 77.568157 222.359857 35.076697
2025-06-30 6057.861624 77.708021 223.559998 36.111092
2025-07-31 6316.106033 77.968714 227.476182 37.474246
2025-08-31 6410.492533 78.122137 231.065618 37.846145
2025-09-30 6568.895935 78.299747 234.709678 38.580387
2025-10-31 6711.707596 78.346750 NaN 39.205692
2025-11-30 6739.471568 78.756914 NaN 39.272363
2025-12-31 6853.025455 78.920160 NaN 39.832601
2026-01-31 6929.849085 NaN NaN 40.172372
2026-02-28 6883.803381 NaN NaN 39.797673

Check your understanding

Check your understanding. Why is CAPE not missing at the end of the series?

Next, we’ll add a recession indicator variable (1 or 0) by accessing data from the St. Louis Federal Reserve data service, FRED using the pandas_datareader.

The particular series we want is at https://fred.stlouisfed.org/series/USREC.

import pandas_datareader as pdr
nber = pdr.get_data_fred('USREC', '1870')
nber.head()
USREC
DATE
1870-01-01 1
1870-02-01 1
1870-03-01 1
1870-04-01 1
1870-05-01 1
nber.mean()
USREC    0.274426
dtype: float64

The easiest way to merge these two dataframes is to adjust the NBER data’s index so that its dates are at the end of the month, like they are in the Shiller data. We do this using the pandas MonthEnd offset function. Since we want to adjust the date to be the end of the current month, we add zero months.

from pandas.tseries.offsets import MonthEnd

nber.index = nber.index + MonthEnd(0)
shiller = shiller.join(nber).dropna()

shiller.head()
p_real d_real e_real CAPE USREC
Date
1879-05-31 156.036733 7.457289 13.433416 14.243094 0.0
1879-06-30 158.673595 7.613127 13.823836 14.451421 0.0
1879-07-31 159.997056 7.591940 13.892814 14.535046 0.0
1879-08-31 161.185153 7.655305 14.126473 14.605812 0.0
1879-09-30 161.492602 7.462336 13.872291 14.594369 0.0

Check your understanding

Why does this joined data series start in May of 1879?

Plotting#

Now we can plot the series and add some other bells and whistles.

import matplotlib.pyplot as plt
from matplotlib.dates import YearLocator

# main plot
ax = shiller['CAPE'].plot(figsize=(14,9), ylim=(0,45), legend=True, x_compat=True, c='b', lw=1)

# create Series with means and current value repeated
avg = pd.Series(shiller['CAPE'].mean(), index=shiller.index)
avg47 = pd.Series(shiller.loc['1947':,'CAPE'].mean(), index=shiller.loc['1947':].index)
curentval = pd.Series(shiller['CAPE'].iloc[-1], index=shiller.index)

# add lines for means and current value
curentval.plot(ax=ax, label='Current level', legend=True, c='k', ls=':', lw=1)
avg.plot(ax=ax, label='Avg', legend=True, c='r', ls='--', lw=1)
avg47.plot(ax=ax, label='Avg, post-war', legend=True, c='g', ls='--', lw=1)
plt.legend(loc='upper left', fontsize='medium')

# add shaded areas for recessions
ax.fill_between(shiller['CAPE'].index, 0, 45, where=shiller['USREC']==1, 
                facecolor='lightgrey', edgecolor='lightgrey', alpha=0.5)

# change year locations
yrs10 = YearLocator(10)
yrs5 = YearLocator(5)
ax.xaxis.set_major_locator(yrs10)
ax.xaxis.set_minor_locator(yrs5)
ax.set_xlim(left=shiller.index[0], right=shiller.index[-1])
ax.set_xlabel('')

plt.show()
_images/471b7eca73c32f079f4b64b76b1f1c11e1616f07b6a169669d3ad9f8f44ed2f3.png

Correlations#

Let’s calculate future 2- and 5-year returns. We do this by dividing the price in 2 or 5 years by the price “today”.

The forward returns is

\[R_{t,t+k} = \frac{P_{t+k}}{P_t} - 1.\]

Note that we cannot calculate this using pct_change.

shiller['ret2yr'] = shiller['p_real'].shift(-2*12) / shiller['p_real'] - 1
shiller['ret5yr'] = shiller['p_real'].shift(-5*12) / shiller['p_real'] - 1
shiller.dropna()
p_real d_real e_real CAPE USREC ret2yr ret5yr
Date
1879-05-31 156.036733 7.457289 13.433416 14.243094 0.0 0.418777 0.091373
1879-06-30 158.673595 7.613127 13.823836 14.451421 0.0 0.412371 0.029384
1879-07-31 159.997056 7.591940 13.892814 14.535046 0.0 0.338353 0.031959
1879-08-31 161.185153 7.655305 14.126473 14.605812 0.0 0.271915 0.088661
1879-09-30 161.492602 7.462336 13.872291 14.594369 0.0 0.231891 0.063768
... ... ... ... ... ... ... ...
2020-05-31 3690.074336 75.387123 132.620054 27.328481 0.0 0.213890 0.587426
2020-06-30 3902.596951 75.018494 124.733331 28.838316 0.0 0.092605 0.552264
2020-07-31 4011.724373 74.294919 123.684510 29.599195 0.0 0.066495 0.574412
2020-08-31 4228.630539 73.716452 122.875992 31.158209 0.0 0.076015 0.515974
2020-09-30 4190.138074 73.269471 122.285938 30.839426 0.0 0.003305 0.567704

1697 rows × 7 columns

shiller[['CAPE', 'ret2yr', 'ret5yr']].corr()
CAPE ret2yr ret5yr
CAPE 1.000000 -0.154628 -0.214495
ret2yr -0.154628 1.000000 0.581104
ret5yr -0.214495 0.581104 1.000000
shiller.loc['1980':, ['CAPE', 'ret2yr', 'ret5yr']].corr()
CAPE ret2yr ret5yr
CAPE 1.000000 -0.270159 -0.463946
ret2yr -0.270159 1.000000 0.590445
ret5yr -0.463946 0.590445 1.000000

Discussion questions#

  1. How would you interpret these correlation results?

  2. How confident are you that we should believe these correlations? Should you use this result to make investment decisions?

  3. How might changing macroeconomic trends affect the relevance of the CAPE ratio?