Introduction
Real-world data analysis is rarely as clean and tidy as the textbooks suggest. Consider linear regression -- data rarely meets the stringent assumptions required for OLS. Failing to recognize this and applying OLS anyway can lead to embarrassing, inaccurate conclusions.
In today's blog, we'll look at how to use feasible generalized least squares to deal with data that does not meet the OLS assumption of Independent and Identically Distributed (IID) error terms.
What Is Feasible Generalized Least Squares (FGLS)?
FGLS is a flexible and powerful tool that provides a reliable approach for regression analysis in the presence of non-constant variances and correlated errors.
Feasible Generalized Least Squares (FGLS):
- Is an extension of the traditional Ordinary Least Squares (OLS) regression method.
- Accommodates heteroscedasticity and serial correlation in data.
- Allows for more robust parameter estimates by considering the structure of the error terms.
Why is this important?
Recall the fundamental OLS IID assumption, which implies that the error terms have constant variance and are uncorrelated. When this assumption is violated:
- OLS estimators are no longer efficient.
- The usual estimator of the coefficient covariance matrix is inconsistent.
- Standard inference will be incorrect.
Unfortunately, many real-world cross-sectional, panel data, and time series datasets do violate this fundamental assumption.
FGLS allows for a more accurate modeling of complex and realistic data structures by accommodating the heteroscedasticity and autocorrelation in the error terms.
How Does FGLS Work?
FGLS uses a weighting matrix that captures the structure of the variance-covariance matrix of the errors.
This allows FGLS to:
- Give more weight to observations with smaller variances.
- Account for correlations.
- Provide more efficient estimates and valid inference in the presence of non-constant variance and serial correlation.
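To see why weighting helps, here is a small simulation, sketched in Python/numpy for illustration rather than GAUSS. It uses synthetic data (not the FRED series) where the error standard deviation is known to grow with the regressor, and compares plain OLS with weighted least squares:

```python
import numpy as np

# Synthetic heteroscedastic data (illustrative only):
# the error standard deviation grows with the regressor
rng = np.random.default_rng(7)
n = 4000
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
sigma = 0.5 * x
y = 1.0 + 2.0 * x + rng.normal(size=n) * sigma

# OLS treats every observation equally
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Weighted LS divides each row by its error sd, down-weighting
# the noisy observations; this is the GLS idea with known weights
w = 1.0 / sigma
b_wls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
```

Dividing each row by its error standard deviation "whitens" the errors back to constant variance. FGLS does exactly this, except the weighting matrix is estimated from the residuals rather than known in advance.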
The method uses a relatively simple iterative process:
1. Pick a method for estimating the covariance matrix based on the believed data structure.
2. Compute initial OLS parameter estimates.
3. Use the OLS residuals and the chosen method to estimate an initial covariance matrix.
4. Compute FGLS estimates using the estimated covariance matrix for weighting.
5. Calculate residuals and refine the weighting matrix.
6. Repeat steps 3, 4, and 5 until convergence.
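For intuition, here is a minimal numpy sketch of that loop for an AR(1) error structure (a Cochrane-Orcutt-style iteration). The helper name is purely illustrative -- the GAUSS `fgls` procedure handles all of this internally:

```python
import numpy as np

def fgls_ar1(y, X, max_iter=20, tol=1e-8):
    """Illustrative iterative FGLS assuming AR(1) errors."""
    # Step 2: initial OLS estimates
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rho = 0.0
    for _ in range(max_iter):
        # Steps 3/5: estimate the AR(1) coefficient from residuals
        e = y - X @ beta
        rho_new = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
        # Step 4: quasi-difference (whiten) the data and re-estimate
        y_star = y[1:] - rho_new * y[:-1]
        X_star = X[1:] - rho_new * X[:-1]
        beta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
        # Step 6: stop once rho has converged
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return beta, rho
```

The quasi-differencing step transforms the model so its errors are (approximately) IID, at which point OLS on the transformed data is efficient again.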
How Do I Know If I Should Use FGLS?
We've already noted that you should use FGLS when you encounter heteroscedasticity and/or autocorrelation. That's easy to say, but how do you identify these problems in practice?
There are a number of tools that can help.
| Tool | Description | Used to Identify |
|---|---|---|
| Scatter plots | Plot the dependent variable against the independent variables and look for patterns in the spread of the observations. | Heteroscedasticity and autocorrelation. |
| Residual plot | Plot the residuals against the fitted values, the independent variables, or time and look for fan shapes, cycles, or trends. | Heteroscedasticity and autocorrelation. |
| Histogram of residuals | Plot a histogram of the residuals. If the histogram is skewed or has unequal spread, it could suggest heteroscedasticity or a non-normal distribution. | Heteroscedasticity. |
| Durbin-Watson statistic | The Durbin-Watson statistic tests for first-order autocorrelation in the residuals. The test statistic ranges from 0 to 4, with values around 2 indicating no autocorrelation. | Autocorrelation. |
| Breusch-Pagan test | The Breusch-Pagan test considers the null hypothesis of homoscedasticity against the alternative of heteroscedasticity. | Heteroscedasticity. |
| Breusch-Godfrey test | The Breusch-Godfrey test extends the Durbin-Watson test to higher-order autocorrelation. The test assesses whether longer lags of the residuals and independent variables help explain the current residuals. | Autocorrelation. |
| White test | Similar to the Breusch-Pagan test, the White test considers the null hypothesis of homoscedasticity. | Heteroscedasticity. |
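Two of these diagnostics are easy to compute by hand. Below is a minimal numpy sketch of the Durbin-Watson statistic and the Breusch-Pagan LM statistic (both are available as built-ins in GAUSS and most statistics packages; the helper names here are illustrative):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: values near 2 suggest no
    first-order autocorrelation."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def breusch_pagan_lm(e, X):
    """Breusch-Pagan LM statistic: n * R^2 from regressing e^2 on X.

    Compare against a chi-square critical value with (k - 1) degrees
    of freedom, where k counts the columns of X (incl. the constant).
    """
    e2 = e ** 2
    b = np.linalg.lstsq(X, e2, rcond=None)[0]
    fitted = X @ b
    r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return len(e) * r2
```

With one regressor plus a constant, the Breusch-Pagan statistic is compared against the chi-square(1) 5% critical value of about 3.84; larger values reject homoscedasticity.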
Example One: US Consumer Price Index (CPI)
Let's get a better feel for FGLS using real-world data. In this application, we will:
- Find OLS estimates and examine the results for signs of heteroscedasticity and autocorrelation.
- Compute FGLS estimates and discuss results.
Data
For this example, we will use publicly available FRED time series data:
- Consumer Price Index for All Urban Consumers: All Items in U.S. City Average (CPIAUCSL), seasonally adjusted.
- Compensation of employees, paid (COE), seasonally adjusted.
Both variables are quarterly, continuously compounded rates of change spanning from 1947Q2 to 2023Q3.
// Load data
fred_fgls = loadd("fred_fgls.gdat");
// Preview data
head(fred_fgls);
tail(fred_fgls);
            date            COE       CPIAUCSL
      1947-04-01    0.013842900    0.014184600
      1947-07-01    0.015131100    0.021573900
      1947-10-01    0.030381600    0.027915600
      1948-01-01    0.025448400    0.020966300
      1948-04-01    0.011788800    0.015823300

            date            COE       CPIAUCSL
      2022-07-01    0.023345300    0.013491800
      2022-10-01   0.0048207000    0.010199500
      2023-01-01    0.021001700   0.0093545000
      2023-04-01    0.013436000   0.0066823000
      2023-07-01    0.013337800   0.0088013000
OLS Estimation
Let's start by using OLS to examine the relationship between COE and CPI returns. We'll be sure to have GAUSS save our residuals so we can use them to evaluate OLS performance.
// Declare 'ols_ctl' to be an olsmtControl structure
// and fill with default settings
struct olsmtControl ols_ctl;
ols_ctl = olsmtControlCreate();
// Set the 'res' member of the olsmtControl structure
// so that 'olsmt' will compute residuals and the
// Durbin-Watson statistic
ols_ctl.res = 1;
// Declare 'ols_out' to be an olsmtOut structure
// to hold the results of the computations
struct olsmtOut ols_out;
// Perform estimation, using settings in the 'ols_ctl'
// control structure and store the results in 'ols_out'
ols_out = olsmt(fred_fgls, "CPIAUCSL ~ COE", ols_ctl);
Valid cases:                   306      Dependent variable:       CPIAUCSL
Missing cases:                   0      Deletion method:              None
Total SS:                    0.019      Degrees of freedom:            304
R-squared:                   0.197      Rbar-squared:                0.195
Residual SS:                 0.016      Std error of est:            0.007
F(1,304):                   74.673      Probability of F:            0.000
Durbin-Watson:               0.773

                            Standard                Prob   Standardized  Cor with
Variable        Estimate       Error     t-value    >|t|     Estimate    Dep Var
-------------------------------------------------------------------------------
CONSTANT      0.00397578  0.000678566    5.85909    0.000       ---         ---
COE             0.303476    0.0351191    8.64133    0.000     0.444067   0.444067
Evaluating the OLS Results
Taken at face value, these results look good. The standard errors on both estimates are small and both variables are statistically significant. We may be tempted to stop there. However, let's look more closely using some of the tools mentioned earlier.
Checking For Heteroscedasticity
First, let's create some plots using our residuals to check for heteroscedasticity. We will look at:
- A histogram of the residuals.
- The residuals versus the independent variable.
/*
** Plot a histogram of the residuals
** Check for skewed distribution
*/
plotHist(ols_out.resid, 50);
Our histogram indicates that the residuals from our OLS regression are asymmetric and slightly skewed. While the results aren't dramatic, they warrant further exploration to check for heteroscedasticity.
/*
** Plot residuals against COE
** Check for increasing or decreasing variance
** as the independent variable changes.
*/
plotScatter(fred_fgls[., "COE"], ols_out.resid);
It's hard to determine if these results are indicative of heteroscedasticity or not. Let's add random normal observations to our scatter plot and see how they compare.
// Add random normal observations to our scatter plot
// scale by 100 to put on same scale as residuals
rndseed 897680;
plotAddScatter(fred_fgls[., "COE"], rndn(rows(ols_out.resid), 1)/100);
Our residual plot doesn't vary substantially from the random normal observations and there isn't strong visual evidence of heteroscedasticity.
If we did have heteroscedasticity, our residuals would exhibit a fan-like shape, indicating a change in the spread between residuals as our observed data changes. For example, consider this plot of hypothetical residuals against COE:
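You can reproduce that fan shape with a quick simulation. The sketch below uses hypothetical data (not the FRED series), drawing residuals whose spread grows with the regressor:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical regressor standing in for COE (illustrative values only)
x = np.sort(rng.uniform(0.0, 0.05, 300))

# Heteroscedastic "residuals": the spread grows with x, which
# produces the fan shape in a scatter plot of residuals vs. x
resid = rng.normal(size=300) * (0.002 + 0.2 * x)

# The spread in the upper half of x is clearly larger than in the lower half
spread_low, spread_high = resid[:150].std(), resid[150:].std()
```

Plotting `resid` against `x` here shows the fan: a narrow band of residuals at small `x` that widens steadily as `x` grows.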
Checking For Autocorrelation
As you may have noticed, we don't have to look further than our OLS results for signs of autocorrelation. The olsmt procedure reports the Durbin-Watson statistic as part of the printed output. For this regression, the Durbin-Watson statistic is 0.773, which is significantly below 2, suggesting positive autocorrelation.
We can find further support for this conclusion by inspecting our residual plots, starting with a plot of the residuals against time.
// Checking for autocorrelation
/*
** Plot the residuals over time and
** look for cycles or trends to
** check for autocorrelation.
*/
plotXY(fred_fgls[., "date"], ols_out.resid);
Our time plot of residuals:
- Has extended periods of large residuals (roughly 1970-1977, 1979-1985, and 2020-2022).
- Suggests positive autocorrelation.
Now let's examine the plot of our residuals against lagged residuals:
/*
** Plot residuals against lagged residuals
** look for relationships and trends
*/
// Lag residuals
lagged_res = lagn(ols_out.resid, 1);

// Trim the first observation (its lag is missing) and
// plot residuals against lagged residuals
plotScatter(trimr(lagged_res, 1, 0), trimr(ols_out.resid, 1, 0));
This plot gives an even clearer picture of our autocorrelation issue, demonstrating:
- A clear linear relationship between the residuals and their lags.
- Larger residuals in the previous period lead to larger residuals in the current period.
FGLS Estimation
After examining the OLS estimation results more closely, we have clear support for using FGLS. We can do this using the fgls procedure, introduced in GAUSS 24.
The GAUSS fgls Procedure
The fgls procedure allows for model specification in one of two styles. The first style requires a dataframe input and a formula string:
// Calling fgls using a dataframe and formula string
out = fgls(data, formula);
The second option requires an input matrix or dataframe containing the dependent variable and an input matrix or dataframe containing the independent variables:
// Calling fgls using dependent variable
// and independent variable inputs
out = fgls(depvar, indvars);
Both options also allow for:
- An optional input specifying the computation method for the weighting matrix. GAUSS includes 7 pre-programmed options for the weighting matrix or allows for a user-specified weighting matrix.
- An optional fglsControl structure input for advanced estimation settings.
out = fgls(data, formula [, method, ctl])
The results from the FGLS estimation are stored in an fglsOut structure containing the following members:

| Member | Description |
|---|---|
| out.beta_fgls | The feasible generalized least squares estimates of the parameters. |
| out.sigma_fgls | Covariance matrix of the estimated parameters. |
| out.se_fgls | Standard errors of the estimated parameters. |
| out.ci | Confidence intervals of the estimated parameters. |
| out.t_stats | The t-statistics of the estimated parameters. |
| out.pvts | The p-values of the t-statistics of the estimated parameters. |
| out.resid | The estimated residuals. |
| out.df | Degrees of freedom. |
| out.sse | Sum of squared errors. |
| out.sst | Total sum of squares. |
| out.std_est | Standard deviation of the residuals. |
| out.fstat | Model F-statistic. |
| out.pvf | P-value of the model F-statistic. |
| out.rsq | R-squared. |
| out.dw | Durbin-Watson statistic. |
Running FGLS
Let's use FGLS and see if it helps with autocorrelation. We'll start with the default weighting matrix, which is an AR(1) structure.
// Estimate FGLS parameters using
// default setting
struct fglsOut fOut;
fOut = fgls(fred_fgls, "CPIAUCSL ~ COE");
Valid cases:                   306      Dependent variable:            COE
Total SS:                    0.019      Degrees of freedom:            304
R-squared:                   0.140      Rbar-squared:                0.137
Residual SS:                 0.017      Std error of est:            0.007
F(1,304)                    49.511      Probability of F:            0.000
Durbin-Watson                0.614

--------------------------------------------------------------------------------
                         Standard                Prob
Variable     Estimates      Error     t-value    >|t|    [95% Conf. Interval]
--------------------------------------------------------------------------------
Constant       0.00652   0.000908        7.19    0.000     0.00474     0.0083
CPIAUCSL          0.14     0.0286         4.9    0.000       0.084      0.196
The FGLS estimates using the AR(1) weighting matrix differ from our OLS estimates in both the coefficients and standard errors.
Let's look at a plot of our residuals:
// Plot FGLS residual
lagged_resid = lagn(fOut.resid, 1);
plotScatter(lagged_resid, fOut.resid);
Our residuals suggest that FGLS hasn't fully addressed our autocorrelation. What should we take from this?
This likely means that we need to consider higher-order autocorrelation. We may want to extend this analysis by:
- Running the Breusch-Godfrey test to check for higher-order autocorrelation.
- Examining the autocorrelation function (ACF) and partial autocorrelation functions (PACF).
- Estimating an alternative time series model, such as an ARIMA model.
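As a starting point for that follow-up work, the sample ACF is straightforward to compute. Here is a minimal numpy sketch (the helper name is illustrative; the Further Reading link on the PACF and ACF covers the GAUSS tools):

```python
import numpy as np

def sample_acf(e, max_lag=8):
    """Sample autocorrelation of residuals at lags 1..max_lag.

    Illustrative helper: for AR(1) residuals with coefficient rho,
    the ACF decays geometrically as rho, rho^2, rho^3, ...
    """
    e = e - e.mean()
    denom = np.sum(e ** 2)
    return np.array([np.sum(e[k:] * e[:-k]) / denom
                     for k in range(1, max_lag + 1)])
```

If the ACF of the FGLS residuals dies off slowly or shows spikes beyond lag one, that supports moving to a higher-order error structure or a full time series model.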
Ready to give FGLS a try? Get started with a GAUSS demo today!
Example Two: American Community Survey
Now let's consider a second example using a subset of data from the 2019 American Community Survey (ACS).
Data
The 2019 ACS data subset was cleaned and provided by the Social Science Computing Cooperative at the University of Wisconsin-Madison.
The survey data subset contains 5000 observations of the following variables:
| Variable | Census Codebook Name | Description |
|---|---|---|
| household | SERIALNO | Housing unit identifier. |
| person | SPORDER | Person number. |
| state | ST | State. |
| age | AGEP | Age in years. |
| other_languages | LANX | Another language is spoken at home. |
| english | ENG | Self-rated ability to speak English, if another language is spoken. |
| commute_time | JWMNP | Travel time to work in minutes, top-coded at 200. |
| marital_status | MAR | Marital status. |
| education | SCHL | Educational attainment, collapsed into categories. |
| sex | SEX | Sex (male or female). |
| hours_worked | WKHP | Usual hours worked per week in the past 12 months. |
| weeks_worked | WKHN | Weeks worked per year in the past 12 months. |
| race | RAC1P | Race. |
| income | PINCP | Total income in current dollars, rounded. |
Let's run a naive model of income against two independent variables, age and hours_worked.
Our first step is loading our data:
/*
** Step One: Data Loading
** Using the 2019 ACS
*/
// Load data
acs_fgls = loadd("acs2019sample.dta", "income + age + hours_worked");
// Review the summary statistics
dstatmt(acs_fgls);
---------------------------------------------------------------------------------------------
Variable             Mean     Std Dev      Variance     Minimum     Maximum    Valid  Missing
---------------------------------------------------------------------------------------------
income          4.062e+04   5.133e+04     2.634e+09       -8800   6.887e+05     4205      795
age                 43.38       24.17           584           0          94     5000        0
hours_worked        38.09       13.91         193.5           1          99     2761     2239
Based on our descriptive statistics there are a few data cleaning steps that will help our model:
- Remove missing values using the packr procedure.
- Transform income to thousands of dollars to improve data scaling.
- Remove cases with negative incomes.
// Remove missing values
acs_fgls = packr(acs_fgls);
// Transform income
acs_fgls[., "income"] = acs_fgls[., "income"]/1000;
// Filter out cases with negative incomes
acs_fgls = delif(acs_fgls, acs_fgls[., "income"] .< 0);
OLS Estimation
Now we're ready to run a preliminary OLS estimation.
// Declare 'ols_ctl' to be an olsmtControl structure
// and fill with default settings
struct olsmtControl ols_ctl;
ols_ctl = olsmtControlCreate();
// Set the 'res' member of the olsmtControl structure
// so that 'olsmt' will compute residuals and the Durbin-Watson statistic
ols_ctl.res = 1;
// Declare 'ols_out' to be an olsmtOut structure
// to hold the results of the computations
struct olsmtOut ols_out;
// Perform estimation, using settings in the 'ols_ctl'
// control structure and store the results in 'ols_out'
ols_out = olsmt(acs_fgls, "income ~ age + hours_worked", ols_ctl);
Valid cases:                  2758      Dependent variable:         income
Missing cases:                   0      Deletion method:              None
Total SS:              8771535.780      Degrees of freedom:           2755
R-squared:                   0.147      Rbar-squared:                0.146
Residual SS:           7481437.527      Std error of est:           52.111
F(2,2755):                 237.536      Probability of F:            0.000
Durbin-Watson:               1.932

                              Standard                Prob   Standardized  Cor with
Variable          Estimate       Error     t-value    >|t|     Estimate    Dep Var
-----------------------------------------------------------------------------------
CONSTANT          -31.0341     3.91814   -7.92062     0.000       ---         ---
age               0.762573   0.0620066    12.2983     0.000     0.216528   0.227563
hours_worked       1.25521   0.0715453    17.5443     0.000     0.308893   0.316628
Our results make intuitive sense and suggest that:
- Both age and hours_worked are statistically significant.
- Increases in age lead to increases in income.
- Increases in hours_worked lead to increases in income.
Evaluating the OLS Results
As we know from our previous example, we need to look beyond the estimated coefficients and standard errors when evaluating our model results. Let's start with the histogram of our residuals:
/*
** Plot a histogram of the residuals
** Check for skewed distribution
*/
plotHist(ols_out.resid, 50);
The histogram of our residuals is right-skewed, with a long tail of large positive values.
However, because our initial data is truncated, residual scatter plots will be more useful for checking for heteroscedasticity.
/*
** Plot residuals against independent variables
** Check for increasing or decreasing variance
** as the independent variable changes.
*/
plotScatter(acs_fgls[., "age"], ols_out.resid);
// Open second plot window
plotOpenWindow();
plotScatter(acs_fgls[., "hours_worked"], ols_out.resid);
Both plots show signs of heteroscedasticity:
- The age scatter plot demonstrates the tell-tale fan-shaped relationship with residuals. This indicates that variance in residuals increases as age increases.
- The hours_worked scatter plot is less obvious but does seem to indicate higher variance in the residuals at the middle ranges (40-60) than the lower and higher ends.
FGLS estimation
To address the issues of heteroscedasticity, let's use FGLS. This time we'll use the "HC0" weighting matrix (White, 1980).
// Estimate FGLS parameters
// using the HC0 weighting matrix
struct fglsOut fOut;
fOut = fgls(acs_fgls, "income ~ age + hours_worked", "HC0");
Valid cases:                  2758      Dependent variable:            age
Total SS:              8771535.780      Degrees of freedom:           2755
R-squared:                   0.147      Rbar-squared:                0.146
Residual SS:           7481440.027      Std error of est:           52.111
F(2,2755)                  237.535      Probability of F:            0.000
Durbin-Watson                1.932

-------------------------------------------------------------------------------------
                           Standard                Prob
Variable       Estimates      Error     t-value    >|t|    [95% Conf. Interval]
-------------------------------------------------------------------------------------
Constant           -30.9     0.0743        -416    0.000       -31.1      -30.8
hours_worked       0.762   0.000475    1.61e+03    0.000       0.761      0.763
income              1.25    0.00181         694    0.000        1.25       1.26
While using FGLS results in slightly different coefficient estimates, it has a big impact on the estimated standard errors. In this case, these changes don't alter our inferences -- all of our regressors are still statistically significant.
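For intuition about where heteroscedasticity-robust standard errors come from, White's (1980) HC0 estimator, the idea behind this weighting option, replaces the classical covariance s²(X'X)⁻¹ with a "sandwich" form built from the squared OLS residuals. A minimal numpy sketch (an illustrative helper, not the GAUSS implementation):

```python
import numpy as np

def ols_with_hc0(y, X):
    """OLS coefficients with White (1980) HC0 robust standard errors.

    Sandwich form: (X'X)^-1 (X' diag(e^2) X) (X'X)^-1,
    where e are the OLS residuals.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (e ** 2)[:, None])      # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```

Under homoscedastic errors the sandwich collapses to approximately the classical formula; under heteroscedasticity it stays consistent where the classical formula does not, which is why the robust standard errors can differ sharply from the OLS ones.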
Conclusion
Today we've seen how FGLS offers a potential solution for data that doesn't satisfy the restrictive IID assumption of OLS.
After today, you should have a better understanding of how to:
- Identify heteroscedasticity and autocorrelation.
- Compute OLS and FGLS estimates using GAUSS.
Further Reading
- Introduction to the Fundamentals of Time Series Data and Analysis.
- Finding the PACF and ACF.
- Generating and Visualizing Regression Residuals.
- OLS diagnostics: Heteroscedasticity.
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.