Introduction
Maximum likelihood is a fundamental workhorse for estimating model parameters with applications ranging from simple linear regression to advanced discrete choice models. Today we examine how to implement this technique in GAUSS using the Maximum Likelihood MT library.
Maximum Likelihood MT provides a number of useful, pre-built tools that we will demonstrate today using a simple linear model. We'll show all the fundamentals you need to get started with maximum likelihood estimation in GAUSS, including:
- How to create a likelihood function.
- How to call the maxlikmt procedure to estimate parameters.
- How to interpret the results from maxlikmt.
Maximum Likelihood Estimation in GAUSS
The Maximum Likelihood MT library provides a full suite of tools for easily and efficiently tackling maximum likelihood estimation.
Today we will use the maxlikmt procedure to estimate two unknown parameters, $\hat{\beta}$ and $\hat{\sigma^2}$. The maxlikmt procedure requires three inputs: a pointer to a likelihood function, a vector of starting parameter values, and the response data.
It also accepts any optional inputs needed for the likelihood function and an optional control structure for fine-tuning optimization.
out = maxlikmt(&lfn, par_start, y [,..., ctl]);
- &lfn: A pointer to a procedure that returns either the scalar log-likelihood, a vector of log-likelihoods, or the weighted log-likelihood.
- par_start: Vector, starting parameter values.
- y: Vector, the response data.
- ...: Optional, additional inputs required for the likelihood function. These are passed to the likelihood function in the order and form provided.
- ctl: Optional, an instance of the maxlikmtControl structure used to control features of the optimization, including the algorithm, bounds, etc.
Maximum Likelihood Estimation Linear Model Example
Let's start with a simple linear regression example.
In linear regression, we assume that the model residuals are independently and identically normally distributed:
$$\epsilon = y - \beta x \sim N(0, \sigma^2)$$
Based on this assumption, the log-likelihood function for the unknown parameter vector, $\theta = \{\beta, \sigma^2\}$, conditional on the observed data, $y$ and $x$, is given by:
$$L(\theta|y, x) = -\frac{1}{2}\sum_{i=1}^n \Big[ \ln \sigma^2 + \ln (2\pi) + \frac{(y_i-\beta x_i)^2}{\sigma^2} \Big] $$
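For reference, this expression follows from summing the log of the normal density of each error term:
$$L(\theta|y, x) = \sum_{i=1}^n \ln \Big[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(y_i - \beta x_i)^2}{2\sigma^2} \Big) \Big] = -\frac{1}{2}\sum_{i=1}^n \Big[ \ln \sigma^2 + \ln (2\pi) + \frac{(y_i - \beta x_i)^2}{\sigma^2} \Big]$$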
For today's example, we will simulate 800 observations of linear data using randomly generated $x$ values and true parameter values of $\beta = 1.2$ and $\sigma^2 = 4$.
The code to generate our dataset can be found here.
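If you'd like to follow along without downloading the linked code, the sketch below generates a comparable dataset. The random seed and variable names are our own choices, so it will not reproduce the exact estimates reported later.
// Simulate data for the linear model example
rndseed 97841;
// Number of observations
n = 800;
// True parameter values
beta_true = 1.2;
sigma2_true = 4;
// Randomly generated x values
x = rndn(n, 1);
// Response data with normally distributed errors
y = beta_true * x + sqrt(sigma2_true) * rndn(n, 1);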
The Linear Model Likelihood Procedure
Our first step is to create the procedure to compute the log-likelihood.
The log-likelihood procedure will be called by maxlikmt, so we have to set it up in the way that maxlikmt expects. The inputs to the log-likelihood procedure are:
- parms: Vector, the current parameter vector.
- y: Vector, the response data.
- ...: Optional, additional inputs required for the likelihood function.
- ind: 3x1 vector created by maxlikmt, specifying whether the gradient or Hessian should be computed. Your procedure can ignore this input if you choose to have maxlikmt compute a numerical gradient and/or Hessian.
The log-likelihood procedure must return a modelResults structure with the function member containing the value of the log-likelihood at the current parameter values.
proc (1) = lfn(theta, y, x, ind);
local beta_est, sigma2;
// Extract parameters
beta_est = theta[1];
sigma2 = theta[2];
// Declare the modelResults structure
struct modelResults mm;
// Set the modelResults structure member, 'function',
// equal to the log-likelihood function value
mm.function = -1/2 * (ln(sigma2) + ln(2*pi) + (y - x*beta_est)^2 / sigma2);
retp(mm);
endp;
After looking at the above procedure, you are probably wondering:
- Why do we need to return the function value in a modelResults structure?
- What is ind?
Hold those questions for now. We will answer them in our second example when we add an analytical gradient.
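Before moving on, it can be helpful to sanity check the procedure by calling it directly. The snippet below is a quick sketch; the trial parameter values and the dummy ind vector are our own choices (this version of the procedure ignores ind).
// Trial parameter values: beta = 0.5, sigma2 = 0.5
theta_trial = 0.5|0.5;
// Dummy indicator vector, ignored by this version of 'lfn'
ind_trial = 1|0|0;
// Evaluate the log-likelihood procedure directly
struct modelResults chk;
chk = lfn(theta_trial, y, x, ind_trial);
// 'chk.function' holds one log-likelihood per observation,
// so the total log-likelihood is the column sum
print "Log-likelihood at trial values:" sumc(chk.function);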
Calling the maxlikmt Procedure
Once the log-likelihood procedure has been created, we are ready to call maxlikmt. We first specify our starting values:
// Starting values
theta_start = { 0.5, 0.5 };
Finally, we call maxlikmt and print the results using maxlikmtprt:
// Perform estimation and print report
call maxlikmtprt(maxlikmt(&lfn, theta_start, y, x));
Note that we pass the extra input, x, to maxlikmt in the same order as it appears in the likelihood function's input list.
Maximum Likelihood Estimate Results
GAUSS prints the following results to the input/output window:
return code =    0
normal convergence

Log-likelihood      -1686.04
Number of cases          800

Covariance of the parameters computed by the following method:
ML covariance matrix

Parameters    Estimates    Std. err    Est./se      Prob    Gradient
------------------------------------------------------------------
x[1,1]           1.1486      0.0710     16.177    0.0000      0.0000
x[2,1]           3.9638      0.1982     20.001    0.0000      0.0028
The first thing to note is that GAUSS reports that the optimization converged normally. If the optimization had failed to converge, GAUSS would report this along with the reason for the failure.
The GAUSS maximum likelihood estimates are $\hat{\beta}_{MLE} = 1.1486$ and $\hat{\sigma^2} = 3.9638$.
In addition to the parameter estimates, GAUSS provides confidence intervals around the estimates:
Wald Confidence Limits

                          0.95 confidence limits
Parameters    Estimates    Lower Limit    Upper Limit    Gradient
----------------------------------------------------------------
x[1,1]           1.1486         1.0093         1.2880      0.0000
x[2,1]           3.9638         3.5747         4.3528      0.0028
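As a quick check, these limits agree, up to rounding in the displayed estimates, with the usual Wald interval of the estimate plus or minus 1.96 standard errors, for example:
$$1.1486 \pm 1.96 \times 0.0710 \approx (1.0094,\ 1.2878)$$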
Storing Estimation Results in the maxlikmtResults Output Structure
Our first example used the call keyword to discard the return from maxlikmt. Alternatively, you can store the estimation results in a maxlikmtResults structure.
Some of the most useful members of the maxlikmtResults structure include:
- m_out.par.obj.m: Vector, estimated parameter values. These are in the same order as the parameter starting values input.
- m_out.returnDescription: String, describes whether the estimation converged normally or reports any errors that occurred.
- m_out.covPar: Matrix, covariance matrix of the parameter estimates. This is computed as the Moore-Penrose inverse of the Hessian at the final parameter estimates.
Here is how to modify our earlier code to save the results.
// Declare 'm_out' to be a maxlikmtResults structure
struct maxlikmtResults m_out;
// Perform estimation and store results in 'm_out'
m_out = maxlikmt(&lfn, theta_start, y, x);
// Print the parameter estimates
print m_out.par.obj.m;
// Print the return description
print m_out.returnDescription;
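Since m_out.covPar holds the covariance matrix of the parameter estimates, the standard errors reported by maxlikmtprt can be recovered as the square roots of its diagonal. A minimal sketch:
// Standard errors: square roots of the diagonal
// of the parameter covariance matrix
std_err = sqrt(diag(m_out.covPar));
print "Standard errors:" std_err;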
Specifying the Analytical Gradient
For our linear model, it is quite feasible to derive the analytical first derivatives:
$$\frac{\partial L(\theta|y, x)}{\partial \beta} = \frac{1}{\sigma^2}\Big[x(y - \beta x)\Big]$$ $$\frac{\partial L(\theta|y, x)}{\partial \sigma^2} = -\frac{1}{2}\Big[\frac{1}{\sigma^2} - \frac{(y - \beta x)^2}{(\sigma^2)^2}\Big]$$
We can tell GAUSS to use these analytical first derivatives by computing them in our likelihood procedure and assigning the results to the gradient member of the modelResults structure, as shown below:
// Write likelihood function
// with analytical derivatives
proc (1) = lfn(theta, y, x, ind);
local beta_est, sigma2, g1, g2;
beta_est = theta[1];
sigma2 = theta[2];
struct modelResults mm;
// Specify likelihood function
if ind[1];
mm.function = -1/2*(ln(sigma2) + ln(2*pi) + (y - x*beta_est)^2 / sigma2);
endif;
// Include gradients
if ind[2];
g1 = 1/sigma2 * ((y - x*beta_est) .* x);
g2 = -1/2 * ((1/sigma2) - (y - x*beta_est)^2 / sigma2^2);
// Concatenate into an n x 2 gradient matrix
mm.gradient = g1 ~ g2;
endif;
retp(mm);
endp;
What is ind?
ind is a 3x1 vector created by maxlikmt which tells your likelihood procedure whether it needs to compute the gradient or Hessian in addition to the function value. As we can see in the above procedure, if the second element of ind is nonzero, maxlikmt is asking for a gradient calculation.
Note that:
- You do not have to use or check for ind in your likelihood procedure if you are using numerical derivatives.
- maxlikmt creates ind internally and passes it to the likelihood procedure. You do not have to declare or create it.
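To see ind in action, we can call the likelihood procedure ourselves with the gradient element switched on and compare the analytical gradient against a simple central-difference approximation. This is only a sanity-check sketch; the step size, dummy ind vectors, and variable names are our own.
// Request the function (ind[1]) and gradient (ind[2]), or the function only
ind_all = 1|1|0;
ind_fn = 1|0|0;
struct modelResults r0;
r0 = lfn(theta_start, y, x, ind_all);
// Analytical gradient of the total log-likelihood
g_analytic = sumc(r0.gradient);
// Central-difference approximation, one parameter at a time
h = 1e-5;
g_numeric = zeros(2, 1);
struct modelResults rp;
struct modelResults rm;
for i(1, 2, 1);
    tp = theta_start;
    tm = theta_start;
    tp[i] = tp[i] + h;
    tm[i] = tm[i] - h;
    rp = lfn(tp, y, x, ind_fn);
    rm = lfn(tm, y, x, ind_fn);
    g_numeric[i] = (sumc(rp.function) - sumc(rm.function)) / (2*h);
endfor;
print "Analytical gradient:" g_analytic';
print "Numerical gradient :" g_numeric';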
How ind Can Speed Up Your Modeling
The advantage of using the ind input rather than a separate gradient or Hessian procedure is that many of the calculations needed by the function evaluation are often also needed by the gradient. Combining everything in one procedure gives us the opportunity to compute those operations only once.
For example, if we needed to speed up our model, we could modify our likelihood procedure to compute the residuals just once near the top of the code and use the newly created residuals and residuals2 variables in both the function and gradient calculations.
proc (1) = lfn(theta, y, x, ind);
local beta_est, sigma2, g1, g2, residuals, residuals2;
beta_est = theta[1];
sigma2 = theta[2];
// Operations common to likelihood and gradient
residuals = y - x*beta_est;
residuals2 = residuals^2;
struct modelResults mm;
// Specify likelihood function
if ind[1];
mm.function = -1/2*(ln(sigma2) + ln(2*pi) + residuals2 / sigma2);
endif;
// Include gradients
if ind[2];
g1 = 1/sigma2 * (residuals .* x);
g2 = -1/2 * ((1/sigma2) - residuals2 / sigma2^2);
// Concatenate into an n x 2 gradient matrix
mm.gradient = g1 ~ g2;
endif;
retp(mm);
endp;
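The call to maxlikmt is unchanged; the analytical gradient is used automatically whenever the likelihood procedure fills in the gradient member:
// Same call as before; the analytical gradient is picked up
// from the modelResults structure returned by 'lfn'
call maxlikmtprt(maxlikmt(&lfn, theta_start, y, x));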
Conclusions
Congratulations! After today's blog, you should have a better understanding of how to implement maximum likelihood in GAUSS.
We've covered the components of the maxlikmt procedure, including:
- The maxlikmt inputs.
- The log-likelihood procedure and the modelResults structure.
- The maxlikmtResults output structure.
- Specifying analytical gradients in the log-likelihood function.
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.