Introduction
In this blog, we will explore how to set up cointegration tests and interpret their results using a real-world time series example. We will cover both the case with no structural breaks and the case with one unknown structural break, using tools from the GAUSS tspdlib library.
Dataset
In this blog, we will use the famous Nelson-Plosser time series data. The dataset contains macroeconomic fundamentals for the United States.
We will be using three of these fundamentals:
- M2 money stock.
- Bond yield (measured by the basic yields of 30-year corporate bonds).
- S&P 500 index stock prices.
The data are annual and cover the years 1900 to 1970.
Preparing for Cointegration
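To prepare for cointegration testing, we will take three preliminary time series modeling steps. We will:
- Establish an underlying theory.
- Visualize our time series data.
- Test each series for unit roots, with and without a structural break.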
Establishing an Underlying Theory
In this example, we will examine the macroeconomic question of whether stock prices are linked to macroeconomic indicators. In particular, we will examine whether there is a cointegrated, long-run relationship between the S&P 500 price index and two monetary policy indicators: the M2 money stock and the bond yield.
Mathematically we will consider the cointegrated relationship:
$$y_{sp, t} = c + \beta_1 y_{money, t} + \beta_2y_{bond, t} + u_t$$
Time Series Visualization
When visualizing time series data, we look for visual evidence of:
- The comovements between our variables.
- The presence of deterministic components such as constants and time trends.
- Potential structural breaks.
Our time series plots give us some important considerations for our testing, providing visual evidence to support:
- Comovements between the variables.
- At least one structural break in the time series dynamics of all three of our variables.
- A potential time trend in the datasets, especially in the later years of the sample.
Unit Root Testing
Before testing for cointegration, we should check each of our series for unit roots. We will do this using the adf procedure in the tspdlib library, which conducts the Augmented Dickey-Fuller (ADF) unit root test.
Variable | Test Statistic | 1% Critical Value | 5% Critical Value | 10% Critical Value | Conclusion |
---|---|---|---|---|---|
Money | 1.621 | -4.04 | -3.45 | -3.15 | Cannot reject the null |
Bond yield | -1.360 | -4.04 | -3.45 | -3.15 | Cannot reject the null |
S&P 500 | -0.3842 | -4.04 | -3.45 | -3.15 | Cannot reject the null |
Our ADF test statistics are greater than the 10% critical value for all of our time series. This implies that we cannot reject the null hypothesis of a unit root for any of our time series data.
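The decision rule behind these tables is mechanical. For a left-tailed test such as the ADF, we reject the null at a given level only when the statistic is below (more negative than) that level's critical value. The helper below is a minimal sketch of that rule in Python rather than GAUSS. It is not part of tspdlib, and the name unit_root_decision is hypothetical. It applies the rule to the statistics and critical values from the table above.

```python
def unit_root_decision(stat, critical_values):
    """Return the significance levels at which a left-tailed test rejects H0.

    critical_values maps a significance level (e.g. 0.05) to its critical value.
    The ADF null of a unit root is rejected when stat < critical value.
    """
    return sorted(level for level, cv in critical_values.items() if stat < cv)


# Critical values from the ADF table above (constant and trend).
adf_cv = {0.01: -4.04, 0.05: -3.45, 0.10: -3.15}

for name, stat in [("Money", 1.621), ("Bond yield", -1.360), ("S&P 500", -0.3842)]:
    levels = unit_root_decision(stat, adf_cv)
    verdict = f"reject at {levels}" if levels else "cannot reject the null"
    print(f"{name}: {verdict}")
```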
Unit Root Testing with Structural Breaks
What about the potential structural break that we see in our time series data? Does this have an impact on our unit root testing?
Using the adf_1break procedure in the tspdlib library to test for unit roots with a single structural break in the trend and constant, we get the following results:
Variable | Test Statistic | Break Date | 1% Critical Value | 5% Critical Value | 10% Critical Value | Conclusion |
---|---|---|---|---|---|---|
Money | -4.844 | 1948 | -5.57 | -5.08 | -4.82 | Reject the null at 10%; cannot reject at 5% |
Bond yield | -3.226 | 1963 | -5.57 | -5.08 | -4.82 | Cannot reject the null |
S&P 500 | -4.639 | 1945 | -5.57 | -5.08 | -4.82 | Cannot reject the null |
Even when accounting for the structural break, we cannot reject the null hypothesis of a unit root at the 5% level for any of our series. Only the money stock rejects at the 10% level, and only marginally (-4.844 against a critical value of -4.82).
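The sketch below makes "a break in the trend and constant" concrete. It builds the two extra regressors that a Zivot-Andrews-style test adds for a break after observation tb: a level-shift dummy and a trend-shift term. This is an assumption about the general form of such tests, not a description of how adf_1break works internally. The Python function name break_regressors is hypothetical.

```python
def break_regressors(n, tb):
    """Level- and trend-shift regressors for a break after observation tb.

    Observations are indexed t = 1..n. Following the Zivot-Andrews convention:
      DU_t = 1 if t > tb else 0         (shift in the constant)
      DT_t = t - tb if t > tb else 0    (shift in the trend slope)
    """
    du = [1 if t > tb else 0 for t in range(1, n + 1)]
    dt = [t - tb if t > tb else 0 for t in range(1, n + 1)]
    return du, dt


du, dt = break_regressors(n=6, tb=3)
print(du)  # [0, 0, 0, 1, 1, 1]
print(dt)  # [0, 0, 0, 1, 2, 3]
```

In the Zivot-Andrews approach, these columns are added to the usual ADF regression for every candidate break date. The reported break date is the one that produces the most negative ADF statistic.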
Conducting our Cointegration Tests
Having concluded that there is evidence for unit roots in our data, we can now run our cointegration tests.
When setting up cointegration tests, there are a number of assumptions that we must specify:
- Which normalization we want to use.
- The deterministic components to include in our model.
- The maximum number of lags to allow in our test.
- The information criterion to use to select the optimal number of lags.
To better understand these general assumptions, let’s look at the simplest of our tests, the Engle-Granger cointegration test.
Normalization
In the two-stage, residual-based cointegration tests which we will consider today, normalization amounts to deciding which variable is our dependent variable and which variables are our independent variables in the cointegration regression.
We will choose our normalization to reflect our theoretical question of whether the S&P 500 index is cointegrated with the money stock and the bond yield. As we mentioned earlier, this means we will consider the cointegrated relationship:
$$y_{sp, t} = c + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
// Set fname to name of dataset
fname = "nelsonplosser.dta";
// Load three variables from the dataset
// and remove rows with missing values
coint_data = packr(loadd(fname, "sp500 + m + bnd"));
// Define y and x matrix
y = coint_data[., 1];
x = coint_data[., 2 3];
The Deterministic Component
The second assumption we must make about our Engle-Granger test is which model we wish to use. To understand how to make this decision, let's look closer at what this input means.
The Engle-Granger test is a two-step test:
- Estimate the cointegration regression.
- Test for stationarity in the residuals using the ADF unit root test.
The model we specify affects two things:
- The deterministic components which are used in the first-stage cointegration regression.
- The distribution of the test statistic.
There are three options to choose from:
- No constant or trend (model = 0):
$$y_{sp, t} = \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
- Constant (model = 1):
$$y_{sp, t} = \alpha + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
- Constant and trend (model = 2):
$$y_{sp, t} = \alpha + \delta t + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
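The three options differ only in the deterministic columns placed next to the money and bond series in the first-stage regression. The Python sketch below shows that layout for a few hypothetical (money, bond) observations. The names deterministic_terms and design_matrix are illustrative, not tspdlib functions. The model choice also changes the critical values, and this sketch does not cover that part.

```python
def deterministic_terms(n, model):
    """Deterministic regressors for the first-stage Engle-Granger regression.

    model = 0: none; model = 1: constant; model = 2: constant and linear trend.
    Returns one row per observation t = 1..n.
    """
    if model == 0:
        return [[] for _ in range(n)]
    if model == 1:
        return [[1.0] for _ in range(n)]
    if model == 2:
        return [[1.0, float(t)] for t in range(1, n + 1)]
    raise ValueError("model must be 0, 1, or 2")


def design_matrix(x_rows, model):
    """Append the stochastic regressors (money, bond) to the deterministic terms."""
    det = deterministic_terms(len(x_rows), model)
    return [d + list(x) for d, x in zip(det, x_rows)]


x_rows = [(10.0, 3.1), (11.0, 3.0), (12.5, 2.9)]  # hypothetical (money, bond) values
print(design_matrix(x_rows, model=2))
# [[1.0, 1.0, 10.0, 3.1], [1.0, 2.0, 11.0, 3.0], [1.0, 3.0, 12.5, 2.9]]
```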
For our example, we will include a constant and trend in our first-stage cointegration regression by setting:
// Select model with constant and trend
model = 2;
The Lag Specifications
In the second-stage ADF residual unit root test, the error terms should be serially independent. To account for possible autocorrelation, lags of the first differences of the residuals can be included in the ADF test regression.
The GAUSS coint_egranger procedure automatically determines the optimal number of lags to include in the second-stage regression based on two user inputs:
- The maximum number of lags to allow.
- The criterion to use to determine the optimal number of lags:
  - The Akaike information criterion (AIC) [ic = 1]
  - The Schwarz information criterion (SIC) [ic = 2]
  - The t-stat significance criterion [ic = 3]
For this example, we use the Schwarz criterion and allow up to 12 lags:
/*
** Information Criterion:
** 1=Akaike;
** 2=Schwarz;
** 3=t-stat sign.
*/
ic = 2;
// Maximum number of lags
pmax = 12;
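Information-criterion lag selection fits the test regression for every lag order from 0 to pmax on a common sample. It then keeps the order with the smallest criterion. The Python sketch below uses the textbook definitions AIC = ln(RSS/n) + 2k/n and SIC = ln(RSS/n) + k ln(n)/n, where k counts regressors. tspdlib's exact formulas, and its t-stat rule, may differ. The residual sums of squares in the example are made up for illustration, and select_lag is a hypothetical name.

```python
import math


def select_lag(rss_by_lag, n, k_base, criterion="sic"):
    """Return the lag order p that minimizes AIC or SIC.

    rss_by_lag[p] is the residual sum of squares of the ADF test regression
    with p lagged differences, all fitted on the same n observations.
    k_base is the number of regressors when p = 0.
    """
    if criterion not in ("aic", "sic"):
        raise ValueError("criterion must be 'aic' or 'sic'")
    best_p, best_ic = None, math.inf
    for p, rss in enumerate(rss_by_lag):
        k = k_base + p
        if criterion == "aic":
            penalty = 2 * k / n
        else:
            penalty = k * math.log(n) / n
        ic = math.log(rss / n) + penalty
        if ic < best_ic:
            best_p, best_ic = p, ic
    return best_p


# Hypothetical RSS values for p = 0, 1, 2, 3 on a common sample of 60 observations.
rss = [50.0, 40.0, 38.0, 37.9]
print(select_lag(rss, n=60, k_base=1, criterion="aic"))  # 2
print(select_lag(rss, n=60, k_base=1, criterion="sic"))  # 1
```

With these numbers, AIC picks two lags, while SIC picks one. SIC penalizes each extra parameter more heavily whenever ln(n) > 2.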
Calling our Cointegration Test
Now that we have loaded our data and chosen the test settings, we can call the coint_egranger procedure:
// Perform Engle-Granger Cointegration Test
{ tau_eg, cvADF_eg } = coint_egranger(y, x, model, pmax, ic);
Interpreting Our Cointegration Results
In order to interpret our cointegration results, let's revisit the two steps of the Engle-Granger test:
- Estimate the cointegration regression.
- Test the residuals from the cointegration regression for unit roots.
The Engle-Granger test statistic for cointegration reduces to an ADF unit root test of the residuals of the cointegration regression:
- If the residuals contain a unit root, then there is no cointegration.
- The null hypothesis of the ADF test is that the residuals have a unit root. Therefore, the Engle-Granger test considers the null hypothesis that there is no cointegration.
- As the Engle-Granger test statistic decreases:
- We are more likely to reject the null hypothesis of no cointegration.
- We have stronger evidence that the variables are cointegrated.
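Both steps can be sketched in a few lines of Python. The example is an illustration, not the coint_egranger implementation, and ols and engle_granger_stat are hypothetical helper names. It runs the two steps:
- Regress y on a constant, a trend, and two simulated random walks.
- Run a Dickey-Fuller regression on the residuals, without lags or deterministic terms.

The data are simulated so that y is cointegrated with the regressors by construction. Because the residuals come from an estimated regression, the statistic must be compared with Engle-Granger critical values, such as those reported below, rather than ordinary ADF critical values.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y.

    Solved by Gauss-Jordan elimination with partial pivoting; adequate for
    the handful of regressors used here.
    """
    k = len(X[0])
    A = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        A.append(row)
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def engle_granger_stat(y, x_rows):
    """Two-step Engle-Granger statistic: constant and trend, no ADF lags."""
    n = len(y)
    # Step 1: cointegration regression y_t = a + d*t + b'x_t + u_t.
    X = [[1.0, float(t)] + list(x) for t, x in enumerate(x_rows, start=1)]
    b = ols(X, y)
    u = [yi - sum(bj * xj for bj, xj in zip(b, row)) for yi, row in zip(y, X)]
    # Step 2: Dickey-Fuller regression du_t = rho * u_{t-1} + e_t on the residuals.
    lag = u[:-1]
    du = [u[t] - u[t - 1] for t in range(1, n)]
    sxx = sum(v * v for v in lag)
    rho = sum(a * c for a, c in zip(lag, du)) / sxx
    rss = sum((c - rho * a) ** 2 for a, c in zip(lag, du))
    se = (rss / (len(du) - 1) / sxx) ** 0.5
    return rho / se  # t-statistic on rho


# Simulated data: two independent random walks and a y series that is
# cointegrated with them by construction (its deviation is white noise).
random.seed(1)
n = 200
money, bond = [0.0], [0.0]
for _ in range(n - 1):
    money.append(money[-1] + random.gauss(0, 1))
    bond.append(bond[-1] + random.gauss(0, 1))
y = [2 + 0.8 * m - 0.5 * b + random.gauss(0, 1) for m, b in zip(money, bond)]

tau = engle_granger_stat(y, list(zip(money, bond)))
print(round(tau, 2))
```

Suppose y is replaced by a third independent random walk. The statistic will then typically sit above the Engle-Granger critical values, though, as with any test, not in every sample.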
After running our cointegration test we obtain the following results:
Engle-Granger test, constant and trend. H0: no cointegration (EG, 1987 & PO, 1990).
Test | Test Statistic | 1% Critical Value | 5% Critical Value | 10% Critical Value |
---|---|---|---|---|
EG_ADF | -2.105 | -4.645 | -4.157 | -3.843 |
We can see that:
- Our test statistic of -2.105 is larger than the critical values at the 1%, 5%, and 10% levels.
- We cannot reject the null hypothesis of no cointegration.
- We do not find evidence in support of the cointegration of the S&P 500 with the U.S. money stock and bond yield.
Conducting our Cointegration Tests with One Structural Break
Earlier we saw that the potential structural break in our data did not change our unit root test conclusion. We should also see if the structural break has an impact on our cointegration testing.
To do this, we will use the Gregory-Hansen cointegration test, which can be implemented using the coint_ghansen procedure in the tspdlib library.
We can carry over all of our coint_egranger testing specifications, except our model specification.
The Model Specification
When implementing the Gregory-Hansen test, we must decide on a model which specifies:
- Which deterministic components are present in the cointegration regression.
- How the structural break affects the cointegration regression.
There are four modeling options to choose from:
- The level shift [model = 1]:
$$y_{sp, t} = \mu_1(1 - d_{\tau}) + \mu_{1,\tau} d_{\tau} + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
In this model, there is a structural break at time $\tau$, and $d_{\tau}$ is an indicator variable equal to 1 when $t \geq \tau$ and 0 otherwise. The constant before the structural break is $\mu_1$ and the constant after the structural break is $\mu_{1,\tau}$.
- The level shift with trend [model = 2]:
$$y_{sp, t} = \mu_1(1 - d_{\tau}) + \mu_{1,\tau} d_{\tau} + \delta t + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
In this model, the structural break again affects only the constant, but the model also includes a time trend.
- The regime shift [model = 3]:
$$y_{sp, t} = \mu_1(1 - d_{\tau}) + \mu_{1,\tau} d_{\tau} + \beta_1(1 - d_{\tau})y_{money, t} + \beta_{1,\tau}d_{\tau}y_{money, t} + \beta_2(1 - d_{\tau}) y_{bond, t} + \beta_{2,\tau}d_{\tau}y_{bond, t} + u_t$$
In this model, the structural break affects both the constant and the regression coefficients.
- The regime and trend shift [model = 4]:
$$y_{sp, t} = \mu_1(1 - d_{\tau}) + \mu_{1,\tau} d_{\tau} + \delta_1(1 - d_{\tau}) t + \delta_{1,\tau}d_{\tau}t + \beta_1(1 - d_{\tau})y_{money, t} + \beta_{1,\tau}d_{\tau}y_{money, t} + \beta_2(1 - d_{\tau}) y_{bond, t} + \beta_{2,\tau}d_{\tau}y_{bond, t} + u_t$$
In this model, the structural break affects the constant, the regression coefficients, and the trend.
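The four equations differ only in which regressors are split by the indicator $d_{\tau}$. The Python sketch below builds one row of regressors for each model, with $d_{\tau}$ = 1 when $t \geq \tau$ as in the text. The name gh_regressors is hypothetical, and this is not how coint_ghansen is implemented internally. The before-break and after-break constants enter as the columns 1 - d and d, so no separate intercept is needed.

```python
def gh_regressors(t, x, tau, model):
    """Regressors for one observation of the Gregory-Hansen regressions.

    t: time index; x: tuple of stochastic regressors (money, bond);
    tau: break date; model: 1 = level shift, 2 = level shift with trend,
    3 = regime shift, 4 = regime and trend shift.
    d = 1 when t >= tau, matching the indicator d_tau in the text.
    """
    d = 1.0 if t >= tau else 0.0
    row = [1.0 - d, d]                   # mu_1 (before) and mu_{1,tau} (after)
    if model == 2:
        row.append(float(t))             # single trend: delta * t
    if model == 4:
        row += [(1.0 - d) * t, d * t]    # delta_1 before, delta_{1,tau} after
    if model in (1, 2):
        row += list(x)                   # beta_1, beta_2 unchanged by the break
    elif model in (3, 4):
        for xj in x:
            row += [(1.0 - d) * xj, d * xj]  # beta_j before, beta_{j,tau} after
    else:
        raise ValueError("model must be 1, 2, 3, or 4")
    return row


print(gh_regressors(5, (10.0, 3.0), tau=4, model=3))
# [0.0, 1.0, 0.0, 10.0, 0.0, 3.0]
```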
For example, let's consider the last case, where the constant, coefficients, and trend are all impacted by the structural break:
// Set fname to name of dataset
fname = "nelsonplosser.dta";
// Load three variables from the dataset
// and remove rows with missing values
coint_data = packr(loadd(fname, "sp500 + m + bnd"));
// Define y and x matrix
y = coint_data[., 1];
x = coint_data[., 2 3];
// Regime and trend shift
model = 4;
/*
** Information Criterion:
** 1=Akaike;
** 2=Schwarz;
** 3=t-stat sign.
*/
ic = 2;
// Maximum number of lags
pmax = 12;
/*
** Long run variance computation
** 1 = iid
** 2 = Bartlett
** 3 = Quadratic Spectral (QS);
** 4 = SPC with Bartlett /see (Sul, Phillips & Choi, 2005)
** 5 = SPC with QS;
** 6 = Kurozumi with Bartlett
** 7 = Kurozumi with QS
*/
varm = 1;
// Bandwidth for variance
bwl=1;
// Data trimming
trimm=0.1;
// Perform cointegration test
{ ADF_min_gh, TBadf_gh, Zt_min_gh, TBzt_gh, Za_min_gh, TBza_gh, cvADFZt_gh, cvZa_gh } =
coint_ghansen(y, x, model, bwl, ic, pmax, varm, trimm);
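The varm and bwl settings control how the long-run variance of the residuals is estimated. That estimate corrects the Phillips-Perron-type $Z_t$ and $Z_{\alpha}$ statistics for serial correlation. To illustrate the Bartlett option, the Python sketch below implements the textbook Bartlett-kernel (Newey-West) estimator, with the truncation lag equal to the bandwidth. Treating bwl as exactly this truncation lag is an assumption, and bartlett_lrv is a hypothetical name.

```python
def bartlett_lrv(u, bandwidth):
    """Bartlett-kernel (Newey-West) long-run variance of a zero-mean series u.

    omega^2 = gamma_0 + 2 * sum_{j=1}^{l} (1 - j / (l + 1)) * gamma_j,
    where gamma_j = (1/n) * sum_{t=j+1}^{n} u_t * u_{t-j} and l is the bandwidth.
    """
    n = len(u)

    def gamma(j):
        return sum(u[t] * u[t - j] for t in range(j, n)) / n

    omega2 = gamma(0)
    for j in range(1, bandwidth + 1):
        omega2 += 2 * (1 - j / (bandwidth + 1)) * gamma(j)
    return omega2


print(bartlett_lrv([1.0, -1.0, 1.0, -1.0], bandwidth=0))  # 1.0
print(bartlett_lrv([1.0, -1.0, 1.0, -1.0], bandwidth=1))  # 0.25
```

With bandwidth 0, the estimate collapses to the sample variance $\gamma_0$, the natural counterpart of treating the residuals as iid.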
Interpreting Our Cointegration Results with One Structural Break
The coint_ghansen procedure provides more extensive results than the coint_egranger procedure. In particular, the Gregory-Hansen test:
- Performs Augmented Dickey-Fuller testing on the residuals from the cointegration regression.
- Performs Phillips-Perron testing on the residuals from the cointegration regression.
- Estimates the structural break date for each test statistic.
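Break-date estimation in the Gregory-Hansen test works by search. The statistic is computed for every candidate break date in the trimmed sample, and the smallest (most negative) value is reported together with its date. The $ADF$, $Z_t$, and $Z_{\alpha}$ statistics are minimized separately, so they can pick different dates, as the break-date table shows. The Python sketch below captures only that search, under three assumptions:
- stat_at is a hypothetical placeholder for "estimate the chosen model at this break date and test its residuals".
- The toy statistic is made up.
- The rounding of the trimmed range is a guess and may not match tspdlib.

```python
import math


def candidate_breaks(n, trimm):
    """Candidate break dates t = lo..hi after trimming a share trimm from each end.

    Assumption: both ends are rounded down; tspdlib may round differently.
    """
    lo = max(1, math.floor(trimm * n))
    hi = math.floor((1 - trimm) * n)
    return range(lo, hi + 1)


def min_over_breaks(stat_at, n, trimm):
    """Return (smallest statistic, break date that produced it)."""
    return min((stat_at(tb), tb) for tb in candidate_breaks(n, trimm))


def toy_stat(tb):
    """Made-up statistic with its minimum at tb = 30 (placeholder for a real test)."""
    return (tb - 30) ** 2 / 100 - 5


# A sample of 71 observations (1900 to 1970 inclusive) with trimm = 0.1.
print(min_over_breaks(toy_stat, n=71, trimm=0.1))  # (-5.0, 30)
```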
Cointegration test results
After calling the coint_ghansen procedure for each of the four models, we obtain the following test statistics:
Test | $ADF$ Test Statistic | $Z_t$ Test Statistic | $Z_{\alpha}$ Test Statistic | 10% Critical Value $ADF$,$Z_t$ | 10% Critical Value $Z_{\alpha}$ | Conclusion |
---|---|---|---|---|---|---|
Gregory-Hansen, Level shift | -4.004 | -3.819 | -27.858 | -4.690 | -42.490 | Cannot reject the null of no cointegration for $ADF$, $Z_t$, or $Z_{\alpha}$. |
Gregory-Hansen, Level shift with trend | -3.889 | -3.751 | -27.618 | -5.030 | -48.94 | Cannot reject the null of no cointegration for $ADF$, $Z_t$, or $Z_{\alpha}$. |
Gregory-Hansen, Regime change | -4.658 | -4.539 | -32.766 | -5.23 | -52.85 | Cannot reject the null of no cointegration for $ADF$, $Z_t$, or $Z_{\alpha}$. |
Gregory-Hansen, Regime change with trend | -5.834 | -4.484 | -32.411 | -5.72 | -63.10 | Reject the null of no cointegration at 10% for $ADF$; cannot reject for $Z_t$ or $Z_{\alpha}$. |
As we can see from these results, there is little evidence that our S&P 500 Index is cointegrated with the money stock and bond yield: only one of the twelve statistics falls below its 10% critical value.
Structural break results
The coint_ghansen procedure also returns estimates for break dates based on the $ADF$, $Z_t$, and $Z_{\alpha}$ tests:
Test | $ADF$ Break Date | $Z_t$ Break Date | $Z_{\alpha}$ Break Date |
---|---|---|---|
Gregory-Hansen, Level shift | 1958 | 1956 | 1956 |
Gregory-Hansen, Level shift with trend | 1958 | 1956 | 1956 |
Gregory-Hansen, Regime change | 1955 | 1955 | 1955 |
Gregory-Hansen, Regime change with trend | 1951 | 1953 | 1947 |
What can we Conclude from the Gregory-Hansen Cointegration Test?
The results from our Gregory-Hansen cointegration test provide some important conclusions:
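- There is little support for cointegration: only the $ADF$ statistic for the regime change with trend model falls below its 10% critical value, and none of the $Z_t$ or $Z_{\alpha}$ statistics do.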
- Incorporating a structural break does NOT change our conclusion that there is no cointegration.
Note that while the Gregory-Hansen test does estimate break dates, it does not provide the statistical evidence to conclude whether these are statistically significant break dates or not.
Conclusion
Today's blog took a closer look at the Engle-Granger and Gregory-Hansen residual-based cointegration tests. By building a better understanding of how the tests work and what assumptions we make when running them, you will be better equipped to interpret the test results.
In particular, today we learned:
- How to prepare for cointegration testing.
- How to set up the specifications for cointegration tests.
- How to interpret the results from the Engle-Granger and Gregory-Hansen cointegration tests.
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.
Nice post, very pedagogical. These three parameters need to be specified:
// To be specified
bwl=1;
trimm=0.1;
varm=1;
Best,
JS
Hello Jamel,
Thank you for your comment! I've updated the blog to reflect this.
Also, note that since the last update of TSPDLIB, the bwl, ic, pmax, varm, and trimm arguments are all optional. This allows you to call coint_ghansen using internal defaults for these parameters.
More information about the default values can be found in the TSPDLIB documentation.
Best,
Erica