Introduction
If you're an applied researcher, chances are you've used hypothesis testing before. It's an essential tool in practical applications — whether you're validating economic models, assessing policy impacts, or making data-driven business and financial decisions.
The power of hypothesis testing lies in its ability to provide a structured framework for making objective decisions based on data rather than intuition or anecdotal evidence. It allows us to systematically check the validity of our assumptions and models. The idea is simple — by formulating null and alternative hypotheses, we can determine whether observed relationships between variables are statistically significant or simply due to chance.
In today’s blog, we’ll take a closer look at the statistical intuition behind hypothesis testing using the Wald Test and provide a step-by-step guide for implementing hypothesis testing in GAUSS.
Understanding the Intuition of Hypothesis Testing
We don’t need to completely understand the mathematical background of hypothesis testing with the Wald Test to use it effectively. However, having some background will help ensure correct implementation and interpretation.
The Null Hypothesis
At the heart of hypothesis testing is the null hypothesis. It formally represents the assumptions we want to test.
In mathematical terms, it is constructed as a set of linear restrictions on our parameters and is given by:
$$ H_0: R\beta = q $$
where:
- $R$ is a matrix specifying the linear constraints on the parameters.
- $q$ is a vector of hypothesized values.
- $\beta$ is the vector of model parameters.
The null hypothesis captures two key pieces of information:
- Information from our observed data, reflected in the estimated model parameters.
- The assumptions we are testing, represented by the linear constraints and hypothesized values.
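For example, in a model with parameters $\beta = (\beta_0, \beta_1, \beta_2)'$, testing whether $\beta_1$ equals $\beta_2$ corresponds to:
$$ R = \begin{pmatrix} 0 & 1 & -1 \end{pmatrix}, \quad q = 0, \quad R\beta = \beta_1 - \beta_2 = 0. $$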
The Wald Test Statistic
After formulating the null hypothesis, the Wald Test Statistic is computed as:
$$ W = (R\hat{\beta} - q)' (R\hat{V}R')^{-1} (R\hat{\beta} - q) $$
where $\hat{V}$ is the estimated variance-covariance matrix.
The Intuition of the Wald Test Statistic
Let's take a closer look at the components of the test statistic.
The first component of the test statistic, $(R\hat{\beta} - q)$, measures how much the observed parameters differ from the null hypothesis:
- If our constraints hold exactly, $R\hat{\beta} = q$, and the test statistic is zero.
- Because the test statistic squares the deviation, it captures differences in either direction.
- The larger this component, the farther the observed data are from the null hypothesis.
- A larger deviation leads to a larger test statistic.
The second component of the test statistic, $(R\hat{V}R')^{-1}$, accounts for the variability in our data:
- As the variability of our data increases, $(R\hat{V}R')$ increases.
- Since the squared deviation is divided by this component, an increase in variability leads to a lower test statistic. Intuitively, high variability implies that even a large deviation from the null hypothesis might not be statistically significant.
- Scaling by variability prevents us from rejecting the null hypothesis due to high uncertainty in the estimates.
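To make this concrete, here is a minimal sketch of computing $W$ by hand. The estimates bhat and covariance matrix vhat below are hypothetical values chosen for illustration, not output from a real model:
// Hypothetical parameter estimates and covariance matrix
bhat = { 1.2, 0.8, 0.5 };
vhat = { 0.040 0.010 0.000,
         0.010 0.090 0.020,
         0.000 0.020 0.160 };
// Restriction: the second and third parameters are equal
R = { 0 1 -1 };
q = 0;
// Deviation from the null, scaled by its variance
dev = R*bhat - q;
W = dev' * invpd(R*vhat*R') * dev;
print W;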
The GAUSS waldTest procedure uses the F-test alternative to the Wald Test, which scales the Wald statistic by the number of restrictions.
Interpreting the Wald Test Statistic
Understanding the Wald Test can help us better interpret its results. In general, the larger the Wald Test statistic:
- The further our observed data deviates from $H_0$.
- The less likely our observed data are under $H_0$.
- The more likely we are to reject $H_0$.
To make more specific conclusions, we can use the p-value of our test statistic. The F-test alternative used by the GAUSS waldTest procedure follows an F distribution:
$$ F \sim F(q, d) $$
where:
- $q$ is the number of constraints.
- $d$ is the residual degrees of freedom.
The p-value, compared to a chosen significance level $\alpha$, helps us determine whether to reject the null hypothesis. It represents the probability of observing a test statistic as extreme as (or more extreme than) the calculated Wald Test statistic, assuming the null hypothesis is true.
Thus:
- If $p \leq \alpha$, we reject $H_0$.
- If $p > \alpha$, we fail to reject $H_0$.
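As a quick sketch of this decision rule, we can compute the upper-tail F probability with the GAUSS cdfFc function. The statistic, number of restrictions, and degrees of freedom below are illustrative values:
// Illustrative Wald F statistic with 2 restrictions
// and 46 residual degrees of freedom
F_stat = 0.50;
p_value = cdfFc(F_stat, 2, 46);
// Compare the p-value to a 5% significance level
alpha = 0.05;
if p_value <= alpha;
    print "Reject the null hypothesis.";
else;
    print "Fail to reject the null hypothesis.";
endif;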
The GAUSS waldTest Procedure
In GAUSS, hypothesis testing can be performed using the waldTest procedure, introduced in GAUSS 25.
The waldTest procedure can be used in two ways:
- Post-estimation, with a filled output structure from olsmt, gmmFit, glm, or quantileFit.
- Directly, using an estimated parameter vector and variance matrix.
Post-estimation Usage
If used post-estimation, the waldTest procedure has one required input and four optional inputs:
{ waldtest, p_value } = waldTest(out [, R, q, tau, joint])
- out: Post-estimation filled output structure. Valid structure types include olsmtOut, gmmOut, glmOut, and qfitOut.
- R: Optional, the LHS of the null hypothesis. Should be specified in terms of the model variables, with a separate row for each hypothesis. The function accepts linear combinations of the model variables.
- q: Optional, the RHS of the null hypothesis. Must be a numeric vector.
- tau: Optional, the tau level corresponding to the hypothesis being tested. The default is to test jointly across all tau values. Only valid for the qfitOut structure.
- joint: Optional, specification to test quantileFit hypotheses jointly across all coefficients for the qfitOut structure.
Data Matrices
If data matrices are used, the waldTest procedure has two required inputs and four optional inputs:
{ waldtest, p_value } = waldTest(sigma, params [, R, q, df_residuals, varnames])
- sigma: Estimated variance-covariance matrix of the parameters.
- params: Parameter estimates.
- R: Optional, the LHS of the null hypothesis. Should be specified in terms of the model variables, with a separate row for each hypothesis. The function accepts linear combinations of the model variables.
- q: Optional, the RHS of the null hypothesis. Must be a numeric vector.
- df_residuals: Optional, the residual degrees of freedom for the F-test.
- varnames: Optional, variable names.
Specifying The Null Hypothesis for Testing
By default, the waldTest procedure tests whether all estimated parameters jointly equal zero. This provides a quick way to assess the overall explanatory power of a model. However, the true strength of the waldTest procedure lies in its ability to test any linear combination of the estimated parameters.
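For instance, the default joint test requires no inputs beyond the output structure. This sketch assumes a filled olsmtOut structure named ols_out, like the one estimated in Example 1 below:
// Default behavior: jointly test whether all
// estimated coefficients are equal to zero
call waldTest(ols_out);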
Specifying the hypothesis for testing is intuitive and can be done using variable names instead of manually constructing constraint matrices. This user-friendly approach:
- Reduces errors.
- Speeds up workflow.
- Allows us to focus on interpreting results rather than setting up complex computations.
Now, let's take a closer look at the two inputs used to specify the null hypothesis: the R and q inputs.
The R Restriction Input
The optional R input specifies the restrictions to be tested. This input:
- Must be a string array.
- Should use your model variable names.
- Can include any linear combination of the model variables.
- Should have one row for every hypothesis to be jointly tested.
For example, suppose we estimate the model:
$$ \hat{mpg} = \beta_0 + \beta_1 \cdot weight + \beta_2 \cdot axles $$
and want to test whether the coefficients on weight and axles are equal.
To specify this restriction, we define R as follows:
// Set R to test
// if the coefficient on weight
// and axles are equal (weight - axles = 0)
R = "weight - axles";
The q Input
The optional q input specifies the right-hand side (RHS) of the null hypothesis. By default, it tests whether all hypotheses have a value of 0.
To test hypothesized values other than zero, we must specify the q input.
The q input must:
- Be a numerical vector.
- Have one row for every hypothesis to be jointly tested.
Continuing our previous example, suppose we want to test whether the coefficient on weight equals 2.
// Set R to test
// coefficient on weight = 2
R = "weight";
// Set hypothesized value
// using q
q = 2;
The waldTest Procedure in Action
The best way to familiarize ourselves with the waldTest procedure is through hands-on examples. Throughout these examples, we will use a hypothetical dataset containing four variables: income, education, experience, and hours.
You can download the dataset here.
Let's start by loading the data into GAUSS.
// Load data into GAUSS
data = loadd("waldtest_data.csv");
// Preview data
head(data);
           income   education  experience       hours
        45795.000   19.000000   24.000000   64.000000
        30860.000   14.000000   26.000000   30.000000
        106820.00   11.000000   25.000000   64.000000
        84886.000   13.000000   28.000000   66.000000
        36265.000   21.000000   28.000000   76.000000
Example 1: Testing a Single Hypothesis After OLS
In our first example, we will estimate an ordinary least squares (OLS) model:
$$ income = \beta_0 + \beta_1 \cdot education + \beta_2 \cdot experience + \beta_3 \cdot hours $$
and test the null hypothesis that the estimated coefficient on education is equal to the estimated coefficient on experience:
$$ H_0: \beta_1 - \beta_2 = 0. $$
First, we estimate the OLS model using olsmt:
// Estimate ols model
// Store results in the
// olsOut structure
struct olsmtOut ols_out;
ols_out = olsmt(data, "income ~ education + experience + hours");
Ordinary Least Squares
====================================================================================
Valid cases:                 50          Dependent variable:              income
Missing cases:                0          Deletion method:                   None
Total SS:              4.19e+10          Degrees of freedom:                  46
R-squared:               0.0352          Rbar-squared:                   -0.0277
Residual SS:           4.04e+10          Std. err of est:               2.96e+04
F(3,46):                  0.559          Probability of F:                 0.645
====================================================================================
                                 Standard                 Prob      Lower      Upper
Variable           Estimate         Error    t-value     >|t|      Bound      Bound
------------------------------------------------------------------------------------
CONSTANT              51456         26566     1.9369   0.058913   -613.63  1.0352e+05
education            397.36        919.54    0.43213    0.66767   -1404.9     2199.7
experience           77.251        453.39    0.17038    0.86546   -811.39     965.89
hours                384.83        302.48     1.2723    0.20967   -208.02     977.68
====================================================================================
Next, we use waldTest to test our hypothesis:
// Test if coefficients for education and experience are equal
R = "education - experience";
call waldTest(ols_out, R);
===================================
Wald test of null joint hypothesis:
  education - experience = 0
-----------------------------------
F( 1, 46 ):                  0.0978
Prob > F :                   0.7559
===================================
Since the test statistic is 0.0978 and the p-value is 0.756, we fail to reject the null hypothesis, suggesting that the coefficients are not significantly different.
Example 2: Testing Multiple Hypotheses After GLM
In our second example, let's use waldTest to test multiple hypotheses jointly after using glm. We will estimate the same model as in our first example. However, this time we will use the waldTest procedure to jointly test two hypotheses:
$$ \begin{aligned} H_0: & \quad \beta_1 - \beta_2 = 0 \\ & \quad \beta_1 + \beta_2 = 1 \end{aligned} $$
First, we estimate the GLM model:
// Run GLM estimation with normal family (equivalent to OLS)
struct glmOut glm_out;
glm_out = glm(data, "income ~ education + experience + hours", "normal");
Generalized Linear Model
===================================================================
Valid cases:                 50   Dependent variable:        income
Degrees of freedom:          46   Distribution:              normal
Deviance:              4.04e+10   Link function:           identity
Pearson Chi-square:    4.04e+10   AIC:                     1177.405
Log likelihood:            -584   BIC:                     1186.965
Dispersion:           878391845   Iterations:                  1186
Number of vars:               4
===================================================================
                                 Standard                 Prob
Variable           Estimate         Error    t-value     >|t|
-------------------------------------------------------------------
CONSTANT              51456         26566     1.9369   0.058913
education            397.36        919.54    0.43213    0.66767
experience           77.251        453.39    0.17038    0.86546
hours                384.83        302.48     1.2723    0.20967
===================================================================
Next, we test our joint hypothesis. For this test, keep in mind:
- We must specify a q input because one of our hypothesized values is different from zero.
- Our R and q inputs will each have two rows because we are jointly testing two hypotheses.
// Define multiple hypotheses:
// 1. education - experience = 0
// 2. education + experience = 1
R = "education - experience" $| "education + experience";
q = 0 | 1;
// Perform Wald test for joint hypotheses
call waldTest(glm_out, R, q);
===================================
Wald test of null joint hypothesis:
  education - experience = 0
  education + experience = 1
-----------------------------------
F( 2, 46 ):                  0.5001
Prob > F :                   0.6097
===================================
Since the test statistic is 0.5001 and the p-value is 0.6097:
- We fail to reject the null hypothesis, indicating that the constraints hold within the limits of statistical significance.
- Our observed data do not provide statistical evidence to conclude that either restriction is violated.
Example 3: Using Data Matrices
While waldTest is convenient for use after GAUSS estimation procedures, there may be cases where we need to apply it after manual parameter computations. In such cases, we can input our estimated parameters and covariance matrix directly using data matrices.
Let's repeat the first example but manually compute our OLS estimation:
// Manually compute the OLS coefficients and
// their variance-covariance matrix
X = ones(rows(data), 1) ~ data[., "education" "experience" "hours"];
y = data[., "income"];
// Compute beta manually
params = invpd(X'X) * X'y;
// Compute residuals and sigma
residuals = y - X * params;
n = rows(y);
k = cols(X);
sigma = (residuals'residuals) / (n - k) * invpd(X'X);
We can now use the manually computed params and sigma with waldTest. However, we must also provide the following additional information:
- The residual degrees of freedom.
- The variable names.
// Define hypothesis: education - experience = 0
R = "education - experience";
q = 0;
// Find degrees of freedom
df_residuals = n - k;
// Specify variable names in the same
// order as the columns of X
varnames = "CONSTANT"$|"education"$|"experience"$|"hours";
// Perform Wald test
call waldTest(sigma, params, R, q, df_residuals, varnames);
===================================
Wald test of null joint hypothesis:
  education - experience = 0
-----------------------------------
F( 1, 46 ):                  0.0978
Prob > F :                   0.7559
===================================
"X1, X2, ..., XK"
.Conclusion
In today’s blog, we explored the intuition behind hypothesis testing and demonstrated how to implement the Wald Test in GAUSS using the waldTest procedure.
We covered:
- What the Wald Test is and why it matters in statistical modeling.
- Key features of the waldTest procedure.
- Step-by-step examples of applying waldTest after different estimation methods.
The code and data from this blog can be found here.
Further Reading
- More Research, Less Effort with GAUSS 25!
- Exploring and Cleaning Panel Data with GAUSS 25.
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.