Hypothesis Testing In GAUSS

Introduction

If you're an applied researcher, chances are you've used hypothesis testing before. It's an essential tool in practical applications — whether you're validating economic models, assessing policy impacts, or making data-driven business and financial decisions.

The power of hypothesis testing lies in its ability to provide a structured framework for making objective decisions based on data rather than intuition or anecdotal evidence. It allows us to systematically check the validity of our assumptions and models. The idea is simple — by formulating null and alternative hypotheses, we can determine whether observed relationships between variables are statistically significant or simply due to chance.

In today’s blog, we’ll take a closer look at the statistical intuition behind hypothesis testing using the Wald Test and provide a step-by-step guide for implementing hypothesis testing in GAUSS.

Understanding the Intuition of Hypothesis Testing

We don’t need a complete grasp of the mathematics behind the Wald Test to use it effectively. However, some background helps ensure correct implementation and interpretation.

The Null Hypothesis

At the heart of hypothesis testing is the null hypothesis. It formally represents the assumptions we want to test.

In mathematical terms, it is constructed as a set of linear restrictions on our parameters and is given by:

$$ H_0: R\beta = q $$

where:

  • $R$ is a matrix specifying the linear constraints on the parameters.
  • $q$ is a vector of hypothesized values.
  • $\beta$ is the vector of model parameters.
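
For example, suppose $\beta = (\beta_0, \beta_1, \beta_2)'$ and we want to test the single restriction that $\beta_1$ and $\beta_2$ are equal, $\beta_1 - \beta_2 = 0$. This corresponds to:

$$ R = \begin{bmatrix} 0 & 1 & -1 \end{bmatrix}, \quad q = 0 $$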

The null hypothesis captures two key pieces of information:

  • Information from our observed data, reflected in the estimated model parameters.
  • The assumptions we are testing, represented by the linear constraints and hypothesized values.

The Wald Test Statistic

After formulating the null hypothesis, the Wald Test Statistic is computed as:

$$ W = (R\hat{\beta} - q)' (R\hat{V}R')^{-1} (R\hat{\beta} - q) $$

where $\hat{V}$ is the estimated variance-covariance matrix.
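
As a rough sketch of this formula (not the internal waldTest implementation), the statistic can be computed directly in GAUSS. The parameter estimates, covariance matrix, and restriction below are hypothetical values chosen purely for illustration:

// Hypothetical estimates for two parameters
b_hat = { 1.2, 0.8 };

// Hypothetical variance-covariance matrix of the estimates
V_hat = { 0.04 0.01,
          0.01 0.09 };

// Single restriction H0: b1 - b2 = 0
R = { 1 -1 };
q = 0;

// Deviation of the estimates from the null hypothesis
dev = R*b_hat - q;

// Wald statistic: squared deviation, scaled by its variability
W = dev' * invpd(R*V_hat*R') * dev;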

The Intuition of the Wald Test Statistic

[Figure: factors leading to the rejection of the null hypothesis.]

Let's take a closer look at the components of the test statistic.

The first component of the test statistic, $(R\hat{\beta} - q)$, measures how much the observed parameters differ from the null hypothesis:

  • If our constraints hold exactly, $R\hat{\beta} = q$, and the test statistic is zero.
  • Because the test statistic squares the deviation, it captures differences in either direction.
  • The larger this component, the farther the observed data are from the null hypothesis.
  • A larger deviation leads to a larger test statistic.

The second component of the test statistic, $(R\hat{V}R')^{-1}$, accounts for the variability in our data:

  • As the variability of our data increases, $(R\hat{V}R')$ increases.
  • Because the squared deviation is weighted by the inverse of $(R\hat{V}R')$, an increase in variability leads to a smaller test statistic. Intuitively, high variability means that even a large deviation from the null hypothesis might not be statistically significant.
  • Scaling by variability prevents us from rejecting the null hypothesis due to high uncertainty in the estimates.

Interpreting the Wald Test Statistic

Understanding the Wald Test can help us better interpret its results. In general, the larger the Wald Test statistic:

  • The further our observed data deviate from $H_0$.
  • The less likely our observed data are under $H_0$.
  • The more likely we are to reject $H_0$.

To draw more specific conclusions, we can use the p-value of our test statistic. The GAUSS waldTest procedure reports an F-test version of the Wald statistic, which follows an F distribution:

$$ F \sim F(q, d) $$

where:

  • $q$ is the number of restrictions (the number of rows in $R$).
  • $d$ is the residual degrees of freedom.

The p-value, compared to a chosen significance level $\alpha$, helps us determine whether to reject the null hypothesis. It represents the probability of observing a test statistic as extreme as (or more extreme than) the calculated Wald Test statistic, assuming the null hypothesis is true.

Thus:

  • If $p \leq \alpha$, we reject $H_0$.
  • If $p > \alpha$, we fail to reject $H_0$.
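
In GAUSS, this decision rule can be sketched using the cdfFc procedure, which returns the upper-tail probability of the F distribution. The test statistic and degrees of freedom below are hypothetical values for illustration:

// Hypothetical F statistic, number of restrictions,
// and residual degrees of freedom
f_stat = 0.50;
n_restrict = 2;
df_resid = 46;

// p-value: upper-tail probability of the F distribution
p_value = cdfFc(f_stat, n_restrict, df_resid);

// Compare against a 5% significance level
alpha = 0.05;
if p_value <= alpha;
    print "Reject the null hypothesis.";
else;
    print "Fail to reject the null hypothesis.";
endif;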


The GAUSS waldTest Procedure

In GAUSS, hypothesis testing can be performed using the waldTest procedure, introduced in GAUSS 25.

The waldTest procedure can be used in two ways:

  • Post-estimation with a filled output structure after estimation using olsmt, gmmfit, glm, or quantilefit.
  • Directly, using an estimated parameter vector and variance matrix.

Post-estimation Usage

If used post-estimation, the waldTest procedure has one required input and four optional inputs:

{ waldtest, p_value } = waldTest(out [, R, q, tau, joint])

out
Post-estimation filled output structure. Valid structure types include: olsmtOut, gmmOut, glmOut, and qfitOut.
R
Optional, LHS of the null hypothesis. Should be specified in terms of the model variables, with a separate row for each hypothesis. The function accepts linear combinations of the model variables.
q
Optional, RHS of the null hypothesis. Must be a numeric vector.
tau
Optional, tau level corresponding to the hypothesis being tested. The default is to test jointly across all tau values. Only valid for the qfitOut structure.
joint
Optional, specification to test quantileFit hypotheses jointly across all coefficients. Only valid for the qfitOut structure.
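
For example, assuming ols_out is a filled olsmtOut structure from a prior olsmt estimation and that education and experience are variables in the model, both returns can be captured directly:

// Test a single restriction and keep
// both the statistic and the p-value
{ w_stat, p_value } = waldTest(ols_out, "education - experience");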

Data Matrices

If data matrices are used, the waldTest procedure has two required inputs and four optional inputs:

{ waldtest, p_value } = waldTest(sigma, params [, R, q, df_residuals, varnames])

sigma
Estimated parameter variance-covariance matrix.
params
Parameter estimates.
R
Optional, LHS of the null hypothesis. Should be specified in terms of the model variables, with a separate row for each hypothesis. The function accepts linear combinations of the model variables.
q
Optional, RHS of the null hypothesis. Must be a numeric vector.
df_residuals
Optional, residual degrees of freedom for the F-test.
varnames
Optional, variable names.

Specifying The Null Hypothesis for Testing

By default, the waldTest procedure tests whether all estimated parameters jointly equal zero. This provides a quick way to assess the overall explanatory power of a model. However, the true strength of the waldTest procedure lies in its ability to test any linear combination of estimated parameters.
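
For example, the default joint significance test requires nothing beyond a filled output structure (ols_out below is assumed to come from a prior olsmt estimation):

// Jointly test whether all estimated
// coefficients are equal to zero
call waldTest(ols_out);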

Specifying the hypothesis for testing is intuitive and can be done using variable names instead of manually constructing constraint matrices. This user-friendly approach:

  • Reduces errors.
  • Speeds up workflow.
  • Allows us to focus on interpreting results rather than setting up complex computations.

Now, let's take a closer look at the two inputs used to specify the null hypothesis: the R and q inputs.

The R Restriction Input

The optional R input specifies the restrictions to be tested. This input:

  • Must be a string array.
  • Should use your model variable names.
  • Can include any linear combination of the model variables.
  • Should have one row for every hypothesis to be jointly tested.

For example, suppose we estimate the model:

$$ \hat{mpg} = \beta_0 + \beta_1 \cdot weight + \beta_2 \cdot axles $$

and want to test whether the coefficients on weight and axles are equal.

To specify this restriction, we define R as follows:

// Set R to test
// if the coefficient on weight 
// and axles are equal (weight - axles = 0)
R = "weight - axles";

The q Input

The optional q input specifies the right-hand side (RHS) of the null hypothesis. By default, it tests whether all hypotheses have a value of 0.

To test hypothesized values other than zero, we must specify the q input.

The q input must:

  • Be a numerical vector.
  • Have one row for every hypothesis to be jointly tested.

Continuing our previous example, suppose we want to test whether the coefficient on weight equals 2.

// Set R to test 
// coefficient on weight = 2
R = "weight";

// Set hypothesized value
// using q
q = 2;

The waldTest Procedure in Action

The best way to familiarize ourselves with the waldTest procedure is through hands-on examples. Throughout these examples, we will use a hypothetical dataset containing four variables: income, education, experience, and hours.

You can download the dataset here.

Let's start by loading the data into GAUSS.

// Load data into GAUSS 
data  = loadd("waldtest_data.csv");

// Preview data
head(data);
          income        education       experience            hours
       45795.000        19.000000        24.000000        64.000000
       30860.000        14.000000        26.000000        30.000000
       106820.00        11.000000        25.000000        64.000000
       84886.000        13.000000        28.000000        66.000000
       36265.000        21.000000        28.000000        76.000000 

Example 1: Testing a Single Hypothesis After OLS

In our first example, we will estimate an ordinary least squares (OLS) model:

$$ income = \beta_0 + \beta_1 \cdot education + \beta_2 \cdot experience + \beta_3 \cdot hours $$

and test the null hypothesis that the estimated coefficient on education is equal to the estimated coefficient on experience:

$$ H_0: \beta_1 - \beta_2 = 0. $$

First, we estimate the OLS model using olsmt:

// Estimate ols model 
// Store results in the
// olsmtOut structure
struct olsmtOut ols_out;
ols_out = olsmt(data, "income ~ education + experience + hours");
Ordinary Least Squares
====================================================================================
Valid cases:                       50          Dependent variable:            income
Missing cases:                      0          Deletion method:                 None
Total SS:                    4.19e+10          Degrees of freedom:                46
R-squared:                     0.0352          Rbar-squared:                 -0.0277
Residual SS:                 4.04e+10          Std. err of est:             2.96e+04
F(3,46):                        0.559          Probability of F:               0.645
====================================================================================
                            Standard                    Prob       Lower       Upper
Variable        Estimate       Error     t-value        >|t|       Bound       Bound
------------------------------------------------------------------------------------

CONSTANT           51456       26566      1.9369    0.058913     -613.63  1.0352e+05
education         397.36      919.54     0.43213     0.66767     -1404.9      2199.7
experience        77.251      453.39     0.17038     0.86546     -811.39      965.89
hours             384.83      302.48      1.2723     0.20967     -208.02      977.68
====================================================================================

Next, we use waldtest to test our hypothesis:

// Test if coefficients for education and experience are equal
R = "education - experience";
call waldTest(ols_out, R);
===================================
Wald test of null joint hypothesis:
education - experience =  0
-----------------------------------
F( 1, 46 ):                  0.0978
Prob > F :                   0.7559
===================================

Since the test statistic is 0.0978 and the p-value is 0.756, we fail to reject the null hypothesis, suggesting that the coefficients are not significantly different.



Example 2: Testing Multiple Hypotheses After GLM

In our second example, let's use waldTest to test multiple hypotheses jointly after using glm. We will estimate the same model as in our first example. However, this time we will use the waldTest procedure to jointly test two hypotheses:

$$ \begin{aligned} H_0: & \quad \beta_1 - \beta_2 = 0 \\ & \quad \beta_1 + \beta_2 = 1 \end{aligned} $$
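
In matrix form, with $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)'$, these two restrictions correspond to:

$$ R = \begin{bmatrix} 0 & 1 & -1 & 0 \\ 0 & 1 & 1 & 0 \end{bmatrix}, \quad q = \begin{bmatrix} 0 \\ 1 \end{bmatrix} $$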

First, we estimate the GLM model:

// Run GLM estimation with normal family (equivalent to OLS)
struct glmOut glm_out;
glm_out = glm(data, "income ~ education + experience + hours", "normal");
Generalized Linear Model
===================================================================
Valid cases:              50           Dependent variable:   income
Degrees of freedom:       46           Distribution          normal
Deviance:           4.04e+10           Link function:      identity
Pearson Chi-square: 4.04e+10           AIC:                1177.405
Log likelihood:         -584           BIC:                1186.965
Dispersion:        878391845           Iterations:             1186
Number of vars:            4
===================================================================
                            Standard                    Prob
Variable        Estimate       Error     t-value        >|t|
-------------------------------------------------------------------
CONSTANT           51456       26566      1.9369    0.058913
education         397.36      919.54     0.43213     0.66767
experience        77.251      453.39     0.17038     0.86546
hours             384.83      302.48      1.2723     0.20967
===================================================================

Next, we test our joint hypothesis. For this test, keep in mind:

  • We must specify a q input because one of our hypothesized values is different from zero.
  • Our R and q inputs will each have two rows because we are jointly testing two hypotheses.

// Define multiple hypotheses:
// 1. education - experience = 0
// 2. education + experience = 1
R = "education - experience" $| "education + experience";
q = 0 | 1; 

// Perform Wald test for joint hypotheses
call waldTest(glm_out, R, q);
===================================
Wald test of null joint hypothesis:

education - experience =  0
education + experience =  1
-----------------------------------
F( 2, 46 ):                  0.5001
Prob > F :                   0.6097
===================================

Since the test statistic is 0.5001 and the p-value is 0.6097:

  • We fail to reject the null hypothesis, indicating that the constraints hold within the limits of statistical significance.
  • Our observed data do not provide statistical evidence to conclude that either restriction is violated.

Example 3: Using Data Matrices

While waldTest is convenient for use after GAUSS estimation procedures, there may be cases where we need to apply it after manual parameter computations. In such cases, we can input our estimated parameters and covariance matrix directly using data matrices.

Let's repeat the first example but manually compute our OLS estimation:

// Manually compute the OLS parameter estimates and covariance matrix
X = ones(rows(data), 1) ~ data[., "education" "experience" "hours"];
y = data[., "income"];

// Compute beta manually
params = invpd(X'X) * X'y;

// Compute residuals and sigma
residuals = y - X * params;
n = rows(y);
k = cols(X);
sigma = (residuals'residuals) / (n - k) * invpd(X'X);

We can now use the manually computed params and sigma with waldTest. However, we must also provide the following additional information:

  • The residual degrees of freedom.
  • The variable names.

// Define hypothesis: education - experience = 0
R = "education - experience";
q = 0;

// Find degrees of freedom 
df_residuals = n - k;

// Specify variable names
varnames = "CONSTANT"$|"education"$|"experience"$|"hours";

// Perform Wald test
call waldTest(sigma, params, R, q, df_residuals, varnames);
===================================
Wald test of null joint hypothesis:
education - experience =  0
-----------------------------------
F( 1, 46 ):                  0.0978
Prob > F :                   0.7559
===================================

Conclusion

In today’s blog, we explored the intuition behind hypothesis testing and demonstrated how to implement the Wald Test in GAUSS using the waldTest procedure.

We covered:

  • What the Wald Test is and why it matters in statistical modeling.
  • Key features of the waldTest procedure.
  • Step-by-step examples of applying waldTest after different estimation methods.

The code and data from this blog can be found here.

Further Reading

  1. More Research, Less Effort with GAUSS 25!
  2. Exploring and Cleaning Panel Data with GAUSS 25.