Introduction to Granger Causality

by Eric · Published June 29, 2021 · Updated October 4, 2021

Introduction

Multivariate time series analysis turns to vector autoregressive models not only for understanding the relationships between variables but also for forecasting. In today’s blog, we look at how to improve VAR model selection and achieve better forecasts using Granger causality.

Today’s blog explores the questions:

What is Granger causality?
When to use Granger causality?
How to use Granger causality?

What is Granger causality?

If you’ve explored the vector autoregressive literature, it is likely that you have come across the term Granger causality. Granger causality is an econometric test used to verify the usefulness of one variable to forecast another.

A variable is said to:

Granger-cause another variable if it is helpful for forecasting the other variable.
Fail to Granger-cause if it is not helpful for forecasting the other variable.

At this point, you may be asking yourself what it means for a variable to be “helpful” in forecasting. In simple terms, a variable is “helpful” for forecasting if, when added to the forecast model, it reduces the forecasting error.
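
One common way to state this formally is sketched here, using notation that is not part of the original post. Let $Y$ be the variable being forecast and $X$ the candidate predictor, let $\Omega_t$ denote all the information available at time $t$, and let $\Omega_t \setminus X_t$ denote the same information with the history of $X$ removed. Then $X$ is said to Granger-cause $Y$ if

$\sigma^2(y_{t+1} \mid \Omega_t) < \sigma^2(y_{t+1} \mid \Omega_t \setminus X_t)$

where $\sigma^2(\cdot)$ is the mean squared error of the optimal one-step-ahead forecast of $y_{t+1}$ given the stated information set.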

In the context of the vector autoregressive models, a variable fails to Granger-cause another variable if its:

Lags are not statistically significant in the equation for another variable.
Past values aren’t significant in predicting the future values of another.

Example applications of Granger causality.
Do sunspots help forecast real GDP growth?
Does the price of Amazon stock help forecast UPS stock prices?
What is the functional connectivity of brain structure to underlying perception, cognition, and behavior?

When do we use Granger causality?

To understand when to use Granger causality testing, it helps to consider what Granger causality doesn’t tell us. Granger causality only provides information about forecasting ability; it does not provide insight into the true causal relationship between two variables.

This should be considered in conjunction with some of the statistical requirements for using Granger causality testing.

In particular, we should use Granger causality testing when:

We are interested in forecasting performance, not the theoretical model behind the forecast.
Our data is stationary.

The stationarity requirement implies that stationarity and cointegration testing should be performed prior to Granger-causality testing. For an introduction to these concepts, I suggest reviewing our earlier blogs, How to Conduct Unit Root Tests in GAUSS and A Guide to Conducting Cointegration Tests.

How do we test for Granger causality?

Testing for Granger causality is relatively simple, though it is important to consider a few nuances.

Bivariate system

To start, let’s consider the simple case in which we have two time series, $X$ and $Y$, and are modeling them in a VAR(3) system.

The VAR(3) model is made up of two equations:

$x_t = c_1 + \sum_{i=1}^3 \alpha_{1,i} y_{t-i} + \sum_{i=1}^3 \beta_{1,i} x_{t-i} + \epsilon_{x,t}$

$y_t = c_2 + \sum_{i=1}^3 \alpha_{2,i} y_{t-i} + \sum_{i=1}^3 \beta_{2,i} x_{t-i} + \epsilon_{y,t}$

To test if $X$ Granger-causes $Y$, we need to determine if any lags of $X$ are statistically significant in our model. We can do this using a Wald test for linear restrictions.

The Wald test is based on the fairly simple premise that we wish to compare the performance of a restricted model for $Y$, which excludes $X$, against an unrestricted model for $Y$, which includes $X$.

Granger causality comparisons
Model	Regression	$X$ Coefficients	Wald test
Restricted	$y_t = c_2 + \sum_{i=1}^3 \alpha_{2,i} y_{t-i} + \epsilon_{y,t}$	$\beta_{2,1} = \beta_{2,2} = \beta_{2,3} = 0$	Null hypothesis
Unrestricted	$y_t = c_2 + \sum_{i=1}^3 \alpha_{2,i} y_{t-i} + \sum_{i=1}^3 \beta_{2,i} x_{t-i} + \epsilon_{y,t}$	At least one of $\beta_{2,1}, \beta_{2,2}, \beta_{2,3} \neq 0$	Alternative hypothesis

When testing for Granger causality:

We test the null hypothesis of non-causality $(H_0: \beta_{2,1} = \beta_{2,2} = \beta_{2,3} = 0)$.
Under the null hypothesis, the Wald test statistic asymptotically follows a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions (three in this example).
We are more likely to reject the null hypothesis of non-causality as the test statistic gets larger.
We should test both directions, $X \Rightarrow Y$ and $X \Leftarrow Y$.
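
The Wald comparison described above can be sketched in a few lines of GAUSS. This is a minimal illustration under stated assumptions, not the internals of any library procedure: the data are simulated so that lagged $X$ helps predict $Y$, the variable names (x, y, rhs, wald, and so on) are hypothetical, and the p-value comes from the asymptotic $\chi^2$ distribution with three degrees of freedom.

```gauss
// Hypothetical simulated data: lagged x helps predict y by construction
rndseed 4321;
n = 500;
x = rndn(n, 1);
e = rndn(n, 1);
y = zeros(n, 1);

for t (2, n, 1);
    y[t, 1] = 0.3 * y[t-1, 1] + 0.5 * x[t-1, 1] + e[t, 1];
endfor;

// Collect y_t, three lags of y, and three lags of x.
// lagn pads the first rows with missing values, which packr removes.
data = y ~ lagn(y, 1) ~ lagn(y, 2) ~ lagn(y, 3)
         ~ lagn(x, 1) ~ lagn(x, 2) ~ lagn(x, 3);
data = packr(data);

dep = data[., 1];
rhs = ones(rows(data), 1) ~ data[., 2:7];

// Unrestricted OLS estimates and their covariance matrix
xtx_inv = invpd(rhs' * rhs);
b = xtx_inv * (rhs' * dep);
u = dep - rhs * b;
s2 = (u' * u) / (rows(rhs) - cols(rhs));
vc = s2 * xtx_inv;

// Restriction: the three coefficients on lagged x (positions 5 to 7) are zero
b_x = b[5:7, 1];
vc_x = vc[5:7, 5:7];

// Wald statistic and its asymptotic chi-square p-value (3 restrictions)
wald = b_x' * invpd(vc_x) * b_x;
p_val = cdfChic(wald, 3);

print "Wald statistic:     " wald;
print "Asymptotic p-value: " p_val;
```

Because the simulated coefficient on $x_{t-1}$ is nonzero, the statistic should be large and the p-value small. Swapping the roles of x and y in the regression tests the opposite direction.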

Multivariate system

Now let’s consider a system with three variables, $X$, $Y$, and $Z$. Testing for Granger causality is more complicated in this model.

Suppose we are modeling this system as a VAR(2) model such that:

$x_t = c_1 + \sum_{i=1}^2 \alpha_{1,i} y_{t-i} + \sum_{i=1}^2 \beta_{1,i} x_{t-i} + \sum_{i=1}^2 \gamma_{1,i} z_{t-i} + \epsilon_{x,t}$

$y_t = c_2 + \sum_{i=1}^2 \alpha_{2,i} y_{t-i} + \sum_{i=1}^2 \beta_{2,i} x_{t-i} + \sum_{i=1}^2 \gamma_{2,i} z_{t-i} + \epsilon_{y,t}$

$z_t = c_3 + \sum_{i=1}^2 \alpha_{3,i} y_{t-i} + \sum_{i=1}^2 \beta_{3,i} x_{t-i} + \sum_{i=1}^2 \gamma_{3,i} z_{t-i} + \epsilon_{z,t}$

We can again test if $X$ Granger-causes $Y$ by testing the hypothesis that $\beta_{2,1} = \beta_{2,2} = 0$. Many researchers will report the results of this test.

However, this may not give a complete picture regarding causality, because it only accounts for direct causality but does not acknowledge the indirect causality that $X$ may have on $Y$ through its impacts on $Z$.

One solution proposed for this issue is to consider the impact of $X$ on $Y$ and $Z$ combined. Very generally, this is done by considering the "variable" $W = \{Y, Z\}$ and testing whether $X$ Granger-causes $W$.

In our system, this is the same as testing the null hypothesis $(H_0: \beta_{2,1} = \beta_{2,2} = \beta_{3,1} = \beta_{3,2} = 0)$.
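
As a sketch of how such a joint restriction is typically tested (the notation here is an assumption for illustration, not taken from the original post): stack all of the VAR coefficients into a vector $\hat{\theta}$ with estimated covariance matrix $\hat{V}$, and let $C$ be the selection matrix with four rows that picks out $\beta_{2,1}$, $\beta_{2,2}$, $\beta_{3,1}$, and $\beta_{3,2}$. The Wald statistic is then

$W = (C\hat{\theta})' \left[ C \hat{V} C' \right]^{-1} (C\hat{\theta})$

Under the null hypothesis, $W$ asymptotically follows a $\chi^2$ distribution with four degrees of freedom, one for each restriction.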

Example:

Let's look at a simple example to help solidify some of these concepts. In this example, we will look at the relationship between West Texas Intermediate oil prices and gold prices.

In this example, we walk through all the steps of testing Granger causality including:

Viewing the time series plot of our data.
Checking for stationarity.
Testing for Granger causality using the granger procedure in GAUSS.

Data information
Series	Units	Dates	Source
West Texas Intermediate oil prices	USD per barrel	2016-06 through 2021-06	FRED DCOILWTICO
Gold Fixing Price 10:30 A.M. (London time) in London Bullion Market	USD per troy ounce	2016-06 through 2021-06	FRED GOLDAMGBD228NLBM

Time series plot

Before any time series modeling, it is generally helpful to plot your data. The time series plot of our two price series provides some useful insights:

Both of our series have non-zero means so we should include a constant in our model.
Neither series appears to have a time trend.
Both series appear to have structural breaks, which for the sake of simplicity we will ignore in this post.

Checking for stationarity

To test for stationarity we will use two fundamental tests:

Augmented Dickey Fuller (ADF) test for unit roots.
KPSS test for stationarity.

For simplicity, we again ignore the structural breaks when checking for stationarity. For a more comprehensive treatment of stationarity testing with structural breaks, see the earlier blog "Unit Root Tests with Structural Breaks".

We'll use the adf and lmkpss procedures from the free GAUSS library tspdlib to test for unit roots and stationarity.


library tspdlib;
 
// Load data
price_data = loadd( "price_data.xls", "date($observation_date) + 
                                       price_gold + price_oil");
 
// Set model to include constant
model = 1;
 
// Call ADF unit root test
call adf(price_data[., "price_gold"], model);
call adf(price_data[., "price_oil"], model);
 
// Call KPSS stationarity test
call lmkpss(price_data[., "price_gold"], model);
call lmkpss(price_data[., "price_oil"], model);

The results of these tests suggest:

Our data does not meet the stationarity requirements for Granger causality testing.
We need to transform our data using first differences prior to testing.

Testing for stationarity
Test	Series	Statistic	Conclusion
ADF	Oil	-1.953	Cannot reject the null hypothesis of a unit root.
KPSS	Oil	12.037	Reject the null hypothesis of stationarity at the 1% level.
ADF	Gold	-0.343	Cannot reject the null hypothesis of a unit root.
KPSS	Gold	101.374	Reject the null hypothesis of stationarity at the 1% level.
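
The first-difference transformation called for above can be sketched as follows. This is a minimal illustration that assumes a simulated random walk rather than the oil and gold data, and the variable names (shocks, level, d_level) are hypothetical.

```gauss
// Hypothetical simulated series: a random walk, which has a unit root
rndseed 8675;
n = 200;
shocks = rndn(n, 1);
level = cumsumc(shocks);

// First difference: each observation minus the one before it
d_level = trimr(level, 1, 0) - trimr(level, 0, 1);

// Differencing removes one observation
print "Observations in levels:      " rows(level);
print "Observations in differences: " rows(d_level);

// The differences recover the stationary shocks (from the second one onward),
// so this gap should be zero up to rounding error
print "Max abs. gap from shocks:    " maxc(abs(d_level - trimr(shocks, 1, 0)));
```

Here the differenced series is simply the underlying white-noise shocks, which is why differencing a unit root process yields a stationary series.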

Testing for Granger causality

We will again turn to the tspdlib library to test for Granger causality using the granger procedure. This built-in procedure requires two inputs:

data

Matrix or dataframe, data to be tested.

test

Scalar, type of Granger causality test to use.

0	Granger causality (Granger, 1969)
1	Toda & Yamamoto (Toda & Yamamoto, 1995)
2	Single Fourier-frequency Granger causality (Enders & Jones, 2016)
3	Single Fourier-frequency Toda & Yamamoto (Nazlioglu et al., 2019)
4	Cumulative Fourier-frequency Granger causality (Enders & Jones, 2016)
5	Cumulative Fourier-frequency Toda & Yamamoto (Nazlioglu et al., 2019)

There are some helpful things to note about this procedure:

It offers a number of advanced causality testing options. These are beyond the scope of this blog and we will just stick to standard Granger causality testing.
The procedure tests for Granger causality across all columns in both directions.
For test options 0, 2, and 4, the data is first-differenced before testing. This means we don't have to take any additional steps to deal with the non-stationarity of our data.

Continuing with our price_data data from earlier:


/*
** Granger causality test
*/
 
// This specifies to use
// the standard Granger causality test.
// Note that data will be tested in 
// differences.
test = 0;
 
// Run test
call granger(price_data[., "price_gold" "price_oil"], test);

The procedure prints the Wald statistic, along with its respective p-value:

   Standard Granger Causality Test
------------------------------------------------------------
Direction                   Wald         Bootstrap p-val
price_oil => price_gold    6.860               0.093
price_gold => price_oil    5.982               0.109

Our results suggest that we:

Reject the null hypothesis that oil prices fail to Granger-cause gold prices at the 10% level (bootstrap p-value of 0.093).
Cannot reject the null hypothesis that gold prices fail to Granger-cause oil prices at the 10% level (bootstrap p-value of 0.109).

Conclusion

In today’s blog, we explored how to improve model selection using Granger causality. Proper model selection upfront can:

Reduce the time spent running invalid, computationally expensive models.
Improve model reliability.
Improve forecasting.

After today's blog, you should have a better understanding of what Granger causality is and how to use it.

The code and data used in this blog can be downloaded from the Aptech GitHub repository.

4 thoughts on “Introduction to Granger Causality”

jamels July 6, 2021 at 12:15 am

Thank you very much for this fantastic blog, could you indicate the references for the causality tests:
4 Cumulative Fourier-frequency Granger causality (Enders & Jones, 2019)
5 Cumulative Fourier-frequency Toda & Yamamoto (Nazlioglu et al., 2019)
Best regards,
JS

Erica (post author) July 7, 2021 at 8:21 am

Hi Jamel,

Thank you for your kind comment! I am glad you enjoyed the blog on Granger Causality. I do have the full references for the tests you are inquiring about. (It appears that the Enders & Jones paper is actually a 2016 paper.)

Enders, W., & Jones, P. (2016). Grain prices, oil prices, and multiple smooth breaks in a VAR. Studies in Nonlinear Dynamics & Econometrics, 20(4), 399-419.

Nazlioglu, S., Soytas, U., & Gormus, A. (2019). Oil prices and monetary policy in emerging markets: Structural shifts in causal linkages. Emerging Markets Finance and Trade, 55(1), 105-117.

I hope this helps!

Erica

jamels July 7, 2021 at 3:44 pm

Hi Erica, it helps a lot! Thank you very much.

Jamel

rant June 12, 2022 at 9:15 am

Hello Eric,
You have done an amazing work and thank you very much for that.
May I ask: if I have 3 variables (commodity prices) and want to explore whether each one Granger-causes the others, in both directions, should I estimate a trivariate VAR model or every possible pairwise combination, and then apply the causality test to the resulting residuals? I plan to use the nonparametric test of Diks and Panchenko (2006), which needs the data to be stationary.
Can this causality test be conducted on gauss?
Thank you very much and congratulations again on your work!
