Introduction
In this blog, we examine one of the fundamentals of panel data analysis, the one-way error component model. Today we will:
- Explain the theoretical one-way error component model.
- Consider fixed effects vs. random effects.
- Estimate models using an empirical example.
The theoretical one-way error component model
The one-way error component model is a panel data model which allows for individual-specific or time-specific error components:
$$ \begin{equation}y_{it} = \alpha + X_{it} \beta + u_{it} \label{OWEM}\end{equation}$$ $$ u_{it} = \mu_{i} + \nu_{it} $$
where the subscript i indicates cross-sections of households, individuals, firms, countries, etc. and the subscript t indicates time periods.
In this model, the individual-specific error component, $\mu_{i}$, captures any unobserved effects that are different across individuals but fixed across time.
The one-way error component model

Term | Description
---|---
$\alpha$ | Common intercept that is constant across all individuals and time periods.
$\beta$ | Parameter of interest which measures the effect of $X$ on $y$. It is constant across all individuals and time periods.
$\mu_i$ | Individual-specific variation in $y$ which stays constant across time for each individual. In the fixed effects model this is an individual-specific effect to be estimated. In the random effects model it follows a random distribution with parameters that must be estimated.
$\nu_{it}$ | Usual stochastic regression disturbance which varies across time and individuals.
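To make the error decomposition concrete, consider a single observation on individual $i = 2$ in period $t = 3$: its disturbance splits into a component shared by every observation on that individual and a component unique to that period,
$$ y_{23} = \alpha + X_{23}\beta + \underbrace{\mu_{2}}_{\text{fixed across } t} + \underbrace{\nu_{23}}_{\text{varies with } t} . $$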
Fixed effects vs. random effects
The two most common approaches to modeling individual-specific error components are the fixed effects model and the random effects model.
The key difference between these two approaches is how we believe the individual error component behaves.
The fixed effects model
In the fixed effects model the individual error component:
- Can be thought of as an individual-specific intercept term.
- Captures any omitted variables that are not included in the regression.
- Is allowed to be correlated with the other explanatory variables included in the model.
Given these assumptions, the fixed effects model can be thought of as a pooled OLS model with individual specific intercepts:
$$\begin{equation}y_{it} = \delta_{i} + X_{it} \beta + \nu_{it}\label{FEM}\end{equation}$$
The intercept term, $\delta_i$, varies across individuals but is constant across time for each individual. This term is composed of the constant intercept, $\alpha$, and the individual-specific error component, $\mu_i$.
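Concretely, the individual intercept is simply
$$ \delta_{i} = \alpha + \mu_{i} . $$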
The distinguishing feature of the fixed effects model is that each $\delta_i$ has a true, but unobservable, value which we must estimate.
The random effects model
In the random effects model the individual-specific error component, $\mu_i$:
- Is distributed randomly and is independent of $\nu_{it}$.
- Occurs in cases where individuals are drawn randomly from a large population, such as household studies (Baltagi, 2008).
- Is assumed to be uncorrelated with all other variables in the model.
- Impacts the model through the covariance structure of the error term.
For example, consider the total error disturbance in the model, $ u_{it} = \mu_{i} + \nu_{it} $. The covariance of the errors at times $t$ and $s$ depends on the variances of $\mu_{i}$ and $\nu_{it}$:
$$\begin{equation}cov(u_{it}, u_{is}) = \left\{ \begin{array}{ll} \sigma_{\mu}^2 & \text{for } t \neq s \\ \sigma_{\mu}^2 + \sigma_{\nu}^2 & \text{for } t = s \\ \end{array} \right. \label{REM}\end{equation} $$
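For instance, with $T = 3$ time periods this covariance pattern implies that the error covariance matrix for each individual takes the block form
$$ E[u_{i}u_{i}'] = \begin{bmatrix} \sigma_{\mu}^2 + \sigma_{\nu}^2 & \sigma_{\mu}^2 & \sigma_{\mu}^2 \\ \sigma_{\mu}^2 & \sigma_{\mu}^2 + \sigma_{\nu}^2 & \sigma_{\mu}^2 \\ \sigma_{\mu}^2 & \sigma_{\mu}^2 & \sigma_{\mu}^2 + \sigma_{\nu}^2 \end{bmatrix}, $$
so all observations on the same individual are equally correlated with one another.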
The distinguishing feature of the random effects model is that $\mu_i$ does not have a true value but rather follows a random distribution with parameters that we must estimate.
Estimation
The fixed effects model
In the fixed effects model, the individual effects introduce endogeneity that will result in biased estimates if not properly accounted for.
Fortunately, we can obtain consistent estimates using one of three estimation techniques:
- Within-group estimation
- First differences estimation
- Least squares dummy variable (LSDV) estimation
The first two of these techniques focus on eliminating the individual effects before estimation. The LSDV method directly incorporates these effects using dummy variables (a minimal sketch follows the table below).
 | Within-group estimator | LSDV estimator | First differences estimator
---|---|---|---
Data transformation | Demean the data. | Use dummy variables. | Difference the data.
Regression equation | $$\widetilde{Y_i} = \widetilde{X_i} \beta_{fe} + \widetilde{\nu_i}$$ | $$Y_{it} = X_{it} \beta_{fe} + \alpha D_{i} + \nu_{it}$$ | $$\Delta{Y}_{it} = \Delta{X}_{it} \beta_{fe} + \Delta{\nu}_{it}$$
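Since the within-group method is worked through step by step below, here is one possible LSDV sketch for comparison. It uses only base GAUSS routines (design to build the individual dummies and olsqr for the regression) and mirrors the variables created later from simple_data.dat; it is an illustrative sketch rather than part of pdlib:
// Load the example panel: individual, time period, Y, X
data = loadd("simple_data.dat");
grps = data[., 1];
y = data[., 3];
x = data[., 4];
// Build one dummy column per individual from the integer group ids 1, 2, 3
d = design(grps);
// Regress y on x and the full set of individual dummies (no common constant)
b_lsdv = olsqr(y, x ~ d);
// The first element is the slope on x and matches the within-group estimate
print "LSDV slope estimate:";
b_lsdv[1];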
Let's consider an example panel dataset with three individuals and three time periods shown in the table below.
Individual | Time Period | $Y_{it}$ | Within-Group Avg. $\bar{Y}_i$ | $X_{it}$ | Within-Group Avg. $\bar{X}_i$
---|---|---|---|---|---
1 | 1 | 3.901 | 2.744 | 0.978 | 1.174 |
1 | 2 | 2.345 | 2.744 | 1.798 | 1.174 |
1 | 3 | 1.987 | 2.744 | 0.745 | 1.174 |
2 | 1 | 1.250 | 1.715 | 1.652 | 1.425 |
2 | 2 | 0.654 | 1.715 | 0.438 | 1.425 |
2 | 3 | 3.240 | 1.715 | 2.185 | 1.425 |
3 | 1 | 0.901 | 2.077 | 2.119 | 1.653 |
3 | 2 | 1.341 | 2.077 | 1.516 | 1.653 |
3 | 3 | 3.989 | 2.077 | 1.324 | 1.653 |
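If you would like to follow along without the data file, the same example panel can be entered directly as a GAUSS matrix. The column order below (individual, time period, Y, X) is assumed to match simple_data.dat used in the code that follows:
// Columns: individual ~ time period ~ Y ~ X
data = { 1 1 3.901 0.978,
         1 2 2.345 1.798,
         1 3 1.987 0.745,
         2 1 1.250 1.652,
         2 2 0.654 0.438,
         2 3 3.240 2.185,
         3 1 0.901 2.119,
         3 2 1.341 1.516,
         3 3 3.989 1.324 };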
Example within-group estimation
We will estimate the fixed effects model using the within-group method. This can be done in three steps:
- Find the within-subject means.
- Demean the dependent and independent variables using the within-subject means.
- Run a linear regression using the demeaned variables.
Finding the within-subject means
To find the within-subject mean of Y for individual one we compute:
$$ \bar{Y_{1}} = \frac{(3.901 + 2.345 + 1.987)}{3} = 2.7443 .$$
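As a quick check, the same mean can be computed in base GAUSS with the meanc function, which returns column means:
// Stack individual one's Y values into a column vector
y1 = 3.901 | 2.345 | 1.987;
// Prints the within-subject mean, 2.7443
print meanc(y1);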
We can find the within-subject means using the withinMeans procedure from the pdlib library. The withinMeans procedure requires two inputs:
- grps: (T*N) x 1 matrix, group identifier.
- data: (T*N) x k matrix, panel data.
Using our sample data stored in the GAUSS data file simple_data.dat:
// Load data
data = loadd("simple_data.dat");
// Assign groups variable
grps = data[., 1];
// Assign y~x matrix
reg_data = data[.,3:4];
// Find group means
grp_means = withinMeans(grps, reg_data);
print "Group means for Y and X:";
grp_means;
Our output reads:
Group means for Y and X:
      2.7443       1.1737
      1.7147       1.4250
      2.0770       1.6530
Demeaning the data
The next step is to demean the data, which removes any time-invariant effects. Using the within-subject means found above, the Y observations for individual one are demeaned as follows:
$$ \widetilde{Y}_{1t} = Y_{1t} - \bar{Y}_{1} = \left\{ \begin{array}{l} 3.901 - 2.744 = 1.157 \\ 2.345 - 2.744 = -0.399 \\ 1.987 - 2.744 = -0.757 \end{array} \right. $$
In GAUSS we can demean data using the demeanData procedure from the pdlib library. The demeanData procedure requires two inputs:
- grps: (T*N) x 1 matrix, group identifier.
- data: (T*N) x k matrix, panel data.
The demeanData procedure internally computes the within-subject means and requires just the reg_data and grps variables that we created in the first step:
// Remove time-invariant group means
data_tilde = demeanData(grps, reg_data);
print "Demeaned data:";
data_tilde;
print;
Our demeaned data is printed in the output:
Demeaned data:
      1.1567      -0.1957
     -0.3993       0.6243
     -0.7573      -0.4287
     -0.4647       0.2270
     -1.0607      -0.9870
      1.5253       0.7600
     -1.1760       0.4660
     -0.7360      -0.1370
      1.9120      -0.3290
Performing the regression
Once we have transformed our x and y data we are ready to estimate the parameters of the fixed effects regression model:
$$\widetilde{Y_i} = \widetilde{X_i} \beta_{fe} + \widetilde{\nu_i} $$
where
$$\widehat{\beta}_{fe} = (\widetilde{X_i}'\widetilde{X_i})^{-1}(\widetilde{X_i}'\widetilde{Y_i}) .$$
Using the data we previously demeaned:
// Extract variables
y_tilde = data_tilde[., 1];
x_tilde = data_tilde[., 2];
// Regress the dependent variable on the independent variable
coeff = inv(x_tilde'x_tilde)*(x_tilde'y_tilde);
// Print the fixed effects coefficient
print "Fixed effects coefficient:";
coeff;
The result reads:
Fixed effects coefficient: 0.3413
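If the individual intercepts are themselves of interest, they can be recovered after the within-group regression from the group means of Y and X. This is a standard identity for the within-group estimator rather than a pdlib call:
$$ \widehat{\delta}_{i} = \bar{Y}_{i} - \bar{X}_{i}\widehat{\beta}_{fe} . $$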
Using the fixedEffects procedure
As an alternative to computing these three steps separately, we can use the fixedEffects procedure from the GAUSS panel data library, pdlib. This procedure runs all three steps in a single call. The fixedEffects procedure takes four inputs:
- y: (T*N) x 1 matrix, the panel of stacked dependent variables.
- x: (T*N) x k matrix, the panel of stacked independent variables.
- grps: (T*N) x 1 matrix, group identifier.
- robust: Scalar, an indicator of whether to use robust standard errors.
// Use fixedEffects procedure
call fixedEffects(reg_data[.,1], reg_data[.,2], grps, 1);
This prints:
------------------- FIXED EFFECTS (WITHIN) RESULTS -------------------
Observations          :        9
Number of Groups      :        3
Degrees of freedom    :        2
R-squared             :    0.026
Adj. R-squared        :   -0.558
Residual SS           :   11.021
Std error of est      :    1.485
Total SS (corrected)  :   11.319

F = 0.054 with 1,2 degrees of freedom    P-value = 0.838

Variable         Coef.    Std. Error        t-Stat       P-Value
----------------------------------------------------------------------
X1            0.341276      1.011041      0.337549         0.768
The random effects model
The covariance structure of the random effects model means that pooled OLS will result in inefficient estimates. Instead, the random effects model is estimated using pooled feasible generalized least squares.
The pooled FGLS method estimates the model
$$\widetilde{Y_i} = \widetilde{W_i} \delta_{re} + \widetilde{\epsilon_i}$$
where the data is transformed using $\Omega = E[\epsilon_i \epsilon_i']$
$$\widetilde{Y_i} = \Omega^{-\frac{1}{2}}Y_{i},$$ $$\widetilde{W_i} = \Omega^{-\frac{1}{2}}W_{i},$$ $$\widetilde{\epsilon_i} = \Omega^{-\frac{1}{2}}\epsilon_{i},$$
and
$$W_i = [1, X_i],$$ $$\delta = [\alpha, \beta']',$$ $$\epsilon_i = \mu_i i_T + \nu_i .$$
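In the balanced one-way model, a standard result (see Baltagi, 2008) is that premultiplying by $\Omega^{-\frac{1}{2}}$ amounts, up to a constant scale factor, to quasi-demeaning each series, that is, subtracting only a fraction $\theta$ of its group mean:
$$ \widetilde{y}_{it} = y_{it} - \theta \bar{y}_{i}, \qquad \theta = 1 - \sqrt{\frac{\sigma_{\nu}^2}{\sigma_{\nu}^2 + T\sigma_{\mu}^2}} . $$
When $\sigma_{\mu}^2 = 0$ this collapses to pooled OLS, and as $T\sigma_{\mu}^2$ grows it approaches the within-group (fixed effects) transformation.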
The most difficult part of estimating this model is estimating $\Omega$, and a number of different methods have been proposed.
Example random effects estimation
One of the most common approaches to estimating the random effects model:
- Estimates the between-group regression to obtain $\sigma_{\mu}^2$.
- Estimates the within-group regression to obtain $\sigma_{\nu}^2$.
- Transforms the data using $\sigma_{\mu}^2$ and $\sigma_{\nu}^2$.
- Finds the pooled OLS estimator using the transformed data.
We can perform these steps in one procedure call using the randomEffects procedure from the pdlib GAUSS library.
Using the randomEffects procedure
The randomEffects procedure takes four inputs:
- y: (T*N) x 1 matrix, the panel of stacked dependent variables.
- x: (T*N) x k matrix, the panel of stacked independent variables.
- grps: (T*N) x 1 matrix, group identifier.
- robust: Scalar, an indicator of whether to use robust standard errors.
Continuing with our fixed effects example, we will use the reg_data and grps variables created earlier from the GAUSS data file simple_data.dat.
// Use randomEffects procedure
call randomEffects(reg_data[., 1], reg_data[., 2], grps, 1);
---------------------- GLS RANDOM EFFECTS RESULTS ----------------------
Observations          :        9
Number of Groups      :        3
Degrees of freedom    :        2
R-squared             :    0.004
Adj. R-squared        :   -2.985
Residual SS           :   12.907
Std error of est      :    1.358
Total SS (corrected)  :   12.956

F = 3.314 with 2,2 degrees of freedom    P-value = 0.232

Variable         Coef.    Std. Error        t-Stat       P-Value
----------------------------------------------------------------------
CONSTANT      1.994513      1.720996      1.158930         0.366
X1            0.129940      1.053423      0.123350         0.913
Conclusion
In today's blog we have covered the fundamentals of the one-way error component model:
- The theoretical one-way error component model.
- Fixed effects vs. random effects.
- Estimating fixed effects and random effects.
The code and data for this blog can be found at our Aptech Blog GitHub code repository.
Further Reading
- Getting Started with Panel Data in GAUSS
- Panel data, structural breaks and unit root testing
- How to Aggregate Panel Data in GAUSS
- Introduction to the Fundamentals of Panel Data
- Panel Data Stationarity Test With Structural Breaks
- Transforming Panel Data to Long Form in GAUSS
References
Baltagi, B. (2008). Econometric Analysis of Panel Data. John Wiley & Sons.