Introduction
Multivariate time series analysis turns to vector autoregressive models not only for understanding the relationships between variables but also for forecasting. In today’s blog, we look at how to improve VAR model selection and achieve better forecasts using Granger causality.
Today’s blog explores the questions:
- What is Granger causality?
- When do we use Granger causality?
- How do we test for Granger causality?
What is Granger causality?
If you’ve explored the vector autoregressive literature, you have likely come across the term Granger causality. Granger causality is an econometric concept, with an associated statistical test, used to determine whether one variable is useful for forecasting another.
A variable is said to:
- Granger-cause another variable if it is helpful for forecasting the other variable.
- Fail to Granger-cause if it is not helpful for forecasting the other variable.
At this point, you may be asking yourself what it means for a variable to be “helpful” in forecasting. In simple terms, a variable is “helpful” for forecasting if adding it to the forecasting model reduces the forecast error.
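A compact way to state this idea, roughly following the mean-squared-error formulation in Granger (1969), is shown below. Here $\Omega_t$ denotes all information available at time $t$, and $\sigma^2(\cdot \mid \cdot)$ is the mean squared error of the optimal least-squares one-step-ahead forecast given that information. Both symbols are introduced here for illustration. $X$ Granger-causes $Y$ if

$$\sigma^2\left(y_{t+1} \mid \Omega_t\right) < \sigma^2\left(y_{t+1} \mid \Omega_t \setminus \{x_t, x_{t-1}, \dots\}\right)$$

If removing the history of $X$ from the information set leaves the forecast error unchanged, then $X$ fails to Granger-cause $Y$.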
In the context of vector autoregressive models, a variable fails to Granger-cause another variable if its:
- Lags are not statistically significant in the equation for the other variable.
- Past values are not significant in predicting the future values of the other variable.
Example applications of Granger causality. |
---|
Do sunspots help forecast real GDP growth? |
Does the price of Amazon stock help forecast UPS stock prices? |
What is the functional connectivity of brain structure to underlying perception, cognition, and behavior? |
When do we use Granger causality?
To understand when to use Granger causality testing, it helps to consider what Granger causality doesn’t tell us. Granger causality only provides information about forecasting ability. It does not provide insight into the true causal relationship between two variables.
This limitation should be weighed together with the statistical requirements for Granger causality testing.
In particular, we should use Granger causality testing when:
- We are interested in forecasting performance, not the theoretical model behind the forecast.
- Our data is stationary.
How do we test for Granger causality?
Testing for Granger causality is relatively simple, though it is important to consider a few nuances.
Bivariate system
To start, let’s consider the simple case in which we have two time series, $X$ and $Y$, modeled as a VAR(3) system.
The VAR(3) model is made up of two equations: $$x_t = c_1 + \sum_{i=1}^3 \alpha_{1,i} y_{t-i} + \sum_{i=1}^3 \beta_{1,i} x_{t-i} + \epsilon_{x,t}$$ $$y_t = c_2 + \sum_{i=1}^3 \alpha_{2,i} y_{t-i} + \sum_{i=1}^3 \beta_{2,i} x_{t-i} + \epsilon_{y,t}$$
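Stacking the two equations in matrix form shows which coefficients a Granger causality test targets. The following is a restatement of the system above in this notation:

$$\begin{pmatrix} x_t \\ y_t \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \sum_{i=1}^3 \begin{pmatrix} \beta_{1,i} & \alpha_{1,i} \\ \beta_{2,i} & \alpha_{2,i} \end{pmatrix} \begin{pmatrix} x_{t-i} \\ y_{t-i} \end{pmatrix} + \begin{pmatrix} \epsilon_{x,t} \\ \epsilon_{y,t} \end{pmatrix}$$

$X$ fails to Granger-cause $Y$ when the lower-left element of every coefficient matrix is zero, $\beta_{2,1} = \beta_{2,2} = \beta_{2,3} = 0$. Likewise, $Y$ fails to Granger-cause $X$ when every upper-right element is zero, $\alpha_{1,1} = \alpha_{1,2} = \alpha_{1,3} = 0$.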
To test whether $X$ Granger-causes $Y$, we need to determine whether the lags of $X$ are jointly statistically significant in the equation for $Y$. We can do this using a Wald test of linear restrictions.
The Wald test is based on the simple premise of comparing a restricted model for $Y$, which excludes the lags of $X$, against an unrestricted model for $Y$, which includes them.
Granger causality comparisons
Model | Regression | $X$ coefficients | Wald test |
---|---|---|---|
Restricted | $y_t = c_2 + \sum_{i=1}^3 \alpha_{2,i} y_{t-i} + \epsilon_{y,t}$ | $\beta_{2,1} = \beta_{2,2} = \beta_{2,3} = 0$ | Null hypothesis |
Unrestricted | $y_t = c_2 + \sum_{i=1}^3 \alpha_{2,i} y_{t-i} + \sum_{i=1}^3 \beta_{2,i} x_{t-i} + \epsilon_{y,t}$ | At least one of $\beta_{2,1}, \beta_{2,2}, \beta_{2,3} \neq 0$ | Alternative hypothesis |
When testing for Granger causality:
- We test the null hypothesis of non-causality $(H_0: \beta_{2,1} = \beta_{2,2} = \beta_{2,3} = 0)$.
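- Under the null hypothesis, the Wald test statistic asymptotically follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions (three in this case).
- The larger the test statistic, the stronger the evidence against the null hypothesis of non-causality.
- We should test both directions, $X \Rightarrow Y$ and $Y \Rightarrow X$.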
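The restricted-versus-unrestricted comparison can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the GAUSS granger procedure used later in this post. It simulates a hypothetical bivariate system in which lagged $x$ enters the $y$ equation but lagged $y$ does not enter the $x$ equation. It then estimates both regressions from the table by OLS and computes $W = (RSS_r - RSS_u)/s^2$, where $s^2 = RSS_u/(T-k)$. With homoskedastic errors, this equals the OLS Wald statistic for the exclusion restrictions and is asymptotically $\chi^2(p)$. The helper names (solve, ols_rss, chi2_sf, granger_wald) and the simulated coefficients are illustrative.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(rows, y):
    """Residual sum of squares from an OLS regression of y on the given rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def chi2_sf(x, df):
    """Upper-tail probability P(chi2(df) > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus the finite series chi + chi^3/3 + chi^5/15 + ...
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def granger_wald(cause, effect, p):
    """Wald test of H0: lags 1..p of `cause` can be dropped from the `effect` equation."""
    target = effect[p:]
    # Restricted model: constant plus p own lags of the effect variable.
    restricted = [[1.0] + [effect[t - i] for i in range(1, p + 1)]
                  for t in range(p, len(effect))]
    # Unrestricted model: additionally includes p lags of the cause variable.
    unrestricted = [row + [cause[t - i] for i in range(1, p + 1)]
                    for row, t in zip(restricted, range(p, len(effect)))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    s2 = rss_u / (len(target) - len(unrestricted[0]))
    wald = (rss_r - rss_u) / s2
    return wald, chi2_sf(wald, p)


# Hypothetical simulated data: lagged x drives y, but lagged y does not drive x.
rng = random.Random(0)
n = 400
x, y = [0.0] * n, [0.0] * n
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.gauss(0, 1)
    y[t] = 0.3 * y[t - 1] + 0.4 * x[t - 1] + rng.gauss(0, 1)

w_xy, p_xy = granger_wald(x, y, 3)  # H0: X does not Granger-cause Y
w_yx, p_yx = granger_wald(y, x, 3)  # H0: Y does not Granger-cause X
print(f"x => y: Wald = {w_xy:.2f}, p = {p_xy:.4f}")
print(f"y => x: Wald = {w_yx:.2f}, p = {p_yx:.4f}")
```

Because the simulated $y$ equation depends on lagged $x$, the first test should reject non-causality decisively. The reverse test should typically fail to reject, since the $x$ equation contains no lags of $y$.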
Multivariate system
Now let’s consider a system with three variables, $X$, $Y$, and $Z$. Testing for Granger causality is more complicated in this setting.
Suppose we are modeling this system as a VAR(2) model such that: $$x_t = c_1 + \sum_{i=1}^2 \alpha_{1,i} y_{t-i} + \sum_{i=1}^2 \beta_{1,i} x_{t-i} + \sum_{i=1}^2 \gamma_{1,i} z_{t-i} + \epsilon_{x,t}$$ $$y_t = c_2 + \sum_{i=1}^2 \alpha_{2,i} y_{t-i} + \sum_{i=1}^2 \beta_{2,i} x_{t-i} + \sum_{i=1}^2 \gamma_{2,i} z_{t-i} + \epsilon_{y,t}$$ $$z_t = c_3 + \sum_{i=1}^2 \alpha_{3,i} y_{t-i} + \sum_{i=1}^2 \beta_{3,i} x_{t-i} + \sum_{i=1}^2 \gamma_{3,i} z_{t-i} + \epsilon_{z,t}$$
We can again test if $X$ Granger-causes $Y$ by testing the hypothesis that $\beta_{2,1} = \beta_{2,2} = 0$. Many researchers will report the results of this test.
However, this may not give a complete picture of causality, because it only accounts for the direct effect of $X$ on $Y$. It ignores any indirect effect that $X$ may have on $Y$ through its impact on $Z$.
One proposed solution to this issue is to consider the impact of $X$ on $Y$ and $Z$ combined. Broadly, this is done by treating $W = \{Y, Z\}$ as a single (vector) variable and testing whether $X$ Granger-causes $W$.
In our system, this is the same as testing the null hypothesis $(H_0: \beta_{2,1} = \beta_{2,2} = \beta_{3,1} = \beta_{3,2} = 0)$.
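Because this joint null hypothesis spans two equations, its Wald statistic has to use the coefficient estimates of the whole system. The following is a sketch under standard assumptions. Let $\hat{\boldsymbol\theta}$ stack the estimated coefficients of the three equations one after another (21 coefficients: a constant plus two lags of each variable per equation). Let $R$ be a $4 \times 21$ selection matrix with a single 1 in each row picking out $\beta_{2,1}$, $\beta_{2,2}$, $\beta_{3,1}$, and $\beta_{3,2}$. Then

$$W = \left(R\hat{\boldsymbol\theta}\right)' \left[R\,\widehat{\mathrm{Var}}(\hat{\boldsymbol\theta})\,R'\right]^{-1} \left(R\hat{\boldsymbol\theta}\right) \;\overset{a}{\sim}\; \chi^2(4) \quad \text{under } H_0$$

Suppose each equation is estimated by OLS on the common regressor matrix $\mathbf{X}$. Then $\widehat{\mathrm{Var}}(\hat{\boldsymbol\theta}) = \hat\Sigma \otimes (\mathbf{X}'\mathbf{X})^{-1}$, where $\hat\Sigma$ is the residual covariance matrix. Because the $y$ and $z$ residuals are generally correlated, the off-diagonal blocks of this matrix matter. Simply adding two single-equation Wald statistics would ignore them.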
Example
Let's look at a simple example to help solidify some of these concepts. In this example, we will look at the relationship between West Texas Intermediate oil prices and gold prices.
In this example, we walk through all the steps of testing Granger causality including:
- Viewing the time series plot of our data.
- Checking for stationarity.
- Testing for Granger causality using the granger procedure in GAUSS.
Data information
Series | Units | Dates | Source |
---|---|---|---|
West Texas Intermediate (WTI) oil prices | USD per barrel | 2016-06 through 2021-06 | FRED DCOILWTICO |
Gold fixing price at 10:30 A.M. (London time) in the London Bullion Market | USD per troy ounce | 2016-06 through 2021-06 | FRED GOLDAMGBD228NLBM |
Time series plot
Before any time series modeling, it is generally helpful to plot your data. The time series plot of our data reveals a few useful features:
- Both of our series have non-zero means, so we should include a constant in our model.
- Neither series appears to have a time trend.
- Both series appear to have structural breaks, which for the sake of simplicity we will ignore in this post.
Checking for stationarity
To test for stationarity we will use two fundamental tests:
- Augmented Dickey Fuller (ADF) test for unit roots.
- KPSS test for stationarity.
We'll use the adf and lmkpss procedures from the free GAUSS library tspdlib to test for unit roots and stationarity.
library tspdlib;
// Load data
price_data = loadd( "price_data.xls", "date($observation_date) +
price_gold + price_oil");
// Set model to include constant
model = 1;
// Call ADF unit root test
call adf(price_data[., "price_gold"], model);
call adf(price_data[., "price_oil"], model);
// Call KPSS stationarity test
call lmkpss(price_data[., "price_gold"], model);
call lmkpss(price_data[., "price_oil"], model);
The results of these tests suggest:
- Our data does not meet the stationarity requirements for Granger causality testing.
- We need to transform our data using first differences prior to testing.
Testing for stationarity
Test | Series | Statistic | Conclusion |
---|---|---|---|
ADF | Oil | -1.953 | Cannot reject the null hypothesis of a unit root. |
KPSS | Oil | 12.037 | Reject the null hypothesis of stationarity at the 1% level. |
ADF | Gold | -0.343 | Cannot reject the null hypothesis of a unit root. |
KPSS | Gold | 101.374 | Reject the null hypothesis of stationarity at the 1% level. |
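The transformation these tests point to is easy to reproduce outside GAUSS. The following minimal Python sketch computes $\Delta x_t = x_t - x_{t-1}$ and shows that differencing a random walk (a unit-root series) recovers its stationary shocks, at the cost of one observation. The helper name first_difference and the simulated random walk are illustrative assumptions, not the post's oil and gold data.

```python
import random


def first_difference(series):
    """Return the changes x_t - x_{t-1}; the result is one element shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical random walk: each value is the previous value plus a shock,
# so the level has a unit root while the shocks themselves are stationary.
rng = random.Random(1)
shocks = [rng.gauss(0, 1) for _ in range(200)]
walk = [100.0]
for e in shocks:
    walk.append(walk[-1] + e)

diffs = first_difference(walk)
print(len(walk), len(diffs))  # 201 200: differencing drops one observation
```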
Testing for Granger causality
We will again turn to the tspdlib library to test for Granger causality, this time using the granger procedure. This procedure requires two inputs:
- data: Matrix or dataframe, the data to be tested.
- test: Scalar, the type of Granger causality test to use:
  - 0: Granger causality (Granger, 1969)
  - 1: Toda & Yamamoto (Toda & Yamamoto, 1995)
  - 2: Single Fourier-frequency Granger causality (Enders & Jones, 2016)
  - 3: Single Fourier-frequency Toda & Yamamoto (Nazlioglu et al., 2019)
  - 4: Cumulative Fourier-frequency Granger causality (Enders & Jones, 2016)
  - 5: Cumulative Fourier-frequency Toda & Yamamoto (Nazlioglu et al., 2019)
There are some helpful things to note about this procedure:
- It offers a number of advanced causality testing options. These are beyond the scope of this blog and we will just stick to standard Granger causality testing.
- The procedure tests for Granger causality across all columns in both directions.
- For test options 0, 2, and 4, the data is first-differenced before testing. This means we don't have to take any additional steps to deal with the non-stationarity of our data.
Continuing with the price_data dataframe from earlier:
/*
** Granger causality test
*/
// This specifies to use
// the standard Granger causality test.
// Note that data will be tested in
// differences.
test = 0;
// Run test
call granger(price_data[., "price_gold" "price_oil"], test);
The procedure prints the Wald statistic for each direction, along with its bootstrap p-value:
Standard Granger Causality Test
------------------------------------------------------------
Direction                   Wald    Bootstrap p-val
price_oil => price_gold    6.860              0.093
price_gold => price_oil    5.982              0.109
Our results suggest that we:
- Reject the null hypothesis that oil prices fail to Granger-cause gold prices at the 10% level.
- Cannot reject the null hypothesis that gold prices fail to Granger-cause oil prices at the 10% level.
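These conclusions follow from comparing each bootstrap p-value with the chosen significance level. The minimal Python sketch below makes that rule explicit. The statistics are copied from the GAUSS output above, while the results dictionary and the decide helper are illustrative.

```python
ALPHA = 0.10  # significance level used in the discussion above

# Wald statistics and bootstrap p-values copied from the GAUSS output.
results = {
    "price_oil => price_gold": (6.860, 0.093),
    "price_gold => price_oil": (5.982, 0.109),
}


def decide(p_value, alpha=ALPHA):
    """Reject the null of non-causality when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


decisions = {direction: decide(p) for direction, (_, p) in results.items()}
for direction, (wald, p_value) in results.items():
    print(f"{direction}: Wald = {wald:.3f}, p = {p_value:.3f} -> {decisions[direction]}")
```

At the stricter 5% level, neither direction would be rejected, since both p-values exceed 0.05.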
Conclusion
In today’s blog, we explored how to improve model selection using Granger causality. Proper model selection upfront can:
- Reduce time spent running invalid, computationally expensive models.
- Improve model reliability.
- Improve forecasting.
After today's blog, you should have a better understanding of what Granger causality is and how to use it.
The code and data used in this blog can be downloaded from the Aptech GitHub repository.
Further reading
- Introduction to the Fundamentals of Autoregressive Models
- Introduction to the Fundamentals of Time Series Data and Analysis
- How to Conduct Unit Root Tests in GAUSS
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.
Thank you very much for this fantastic blog, could you indicate the references for the causality tests:
4 Cumulative Fourier-frequency Granger causality (Enders & Jones, 2019)
5 Cumulative Fourier-frequency Toda & Yamamoto (Nazlioglu et al., 2019)
Best regards,
JS
Hi Jamel,
Thank you for your kind comment! I am glad you enjoyed the blog on Granger causality. I do have the full references for the tests you are inquiring about. (It appears that the Enders & Jones paper is actually a 2016 paper):
Enders, W., & Jones, P. (2016). Grain prices, oil prices, and multiple smooth breaks in a VAR. Studies in Nonlinear Dynamics & Econometrics, 20(4), 399-419.
Nazlioglu, S., Soytas, U., & Gormus, A. (2019). Oil prices and monetary policy in emerging markets: Structural shifts in causal linkages. Emerging Markets Finance and Trade, 55(1), 105-117.
I hope this helps!
Erica
Hi Erica, it helps a lot! Thank you very much.
Jamel
Hello Eric,
You have done amazing work, and thank you very much for that.
May I ask: if I have 3 variables (commodity prices) and I want to explore whether each one Granger-causes the others, and in the opposite direction, should I estimate a trivariate VAR model or every possible pairwise combination, and then apply the causality test to the derived residuals? I plan to use the nonparametric test of Diks and Panchenko (2006), which requires the data to be stationary.
Can this causality test be conducted on gauss?
Thank you very much and congratulations again on your work!