Example introduction
Suppose we have a dependent variable that is observed in several ordered categories. We might estimate coefficients of a regression on this variable using the ordered probit model
$$y^* = x\beta + \epsilon$$
where
$$ y = \begin{cases} 1 \text{ if } -\infty \lt y^* \lt \tau_2\\ 2 \text{ if } \tau_2 \le y^* \lt \tau_3\\ \ \ \ \ \vdots\\ K \text{ if } \tau_K \le y^* \lt +\infty\\ \end{cases} $$ and $$-\infty \lt \tau_2 \lt \tau_3 \lt \cdots \lt \tau_K \lt +\infty$$
Assuming
$$\epsilon \sim N(0,1)$$
we have
$$P(y = k) = \Phi(\tau_k - x\beta) - \Phi(\tau_{k-1} - x\beta)$$
where $\Phi$ represents the normal cumulative distribution function. The log-likelihood for this model is
$$lnL(\beta,\tau) = \sum_{k=1}^{K} \sum_{y_1=k} ln[\Phi(\tau_k - x_i\beta) - \Phi(\tau_{k-1} - x_i\beta)]$$
Step 1: Load the CMLMT library
// Make cmlmt library functions available
library cmlmt;
Step 2: Load data
Our data source is a CSV file, containing four variables. The first column is the dependent variable and the remaining three will be the independent variables.
// Load all four variables into one matrix
data = loadd(getGAUSSHome() $+ "pkgs/cmlmt/examples/cmlmt_ordered_probit.csv");
// Create separate matrices for dependent
// and independent variables
y = data[., 1];
X = data[., 2:4];
Step 3: Starting parameters
Our model has two characteristics which make it less convenient to store our parameters as a single vector.
- Our log-likelihood function has two vectors which we will want to reference separately.
- We do not want to estimate the first and last elements of $\tau$.
In order to handle this, we will place our parameters inside of a PV structure. The PV structure allows us to 'pack' our vectors into the structure and retrieve them by name. It also allows us to apply a mask so that some of the parameters will not be estimated.
// Declare 'p' to be a PV structure
// and initialize it
struct PV p;
p = pvCreate();
// Pack the starting vectors for 'beta'
// into 'p' so that we can reference it by name
beta_strt = { 0.5, 0.5, 0.5 };
p = pvPack(p, beta_strt, "beta");
// Pack the starting values for 'tau'
// with a mask, telling CMLMT not to
// estimate the first and last elements of 'tau'
// Note the use of 'pvPackM' instead of 'pvPack'
mask = { 0, 1, 1, 1, 0 };
tau_strt = { -30, -1, 0, 1, 30 };
p = pvPackM(p, tau_strt, "tau", mask);
Step 4: The log-likelihood procedure
Now that we have set up our parameters, it is time to look at how we will use them in the log-likelihood procedure. The log-likelihood procedure has three sets of inputs
- The first input is the parameter vector or PV structure containing the model parameters.
- Next comes any data your function needs other than the parameters.
- The final input is an indicator vector which CMLMT uses to tell your function whether to compute the gradient or Hessian. For this example, we will ignore the indicator vector.
// The log-likelihood procedure
proc (1) = orderedProbit(p, y, X, ind);
local mu, tau_, beta_, emu, eml;
// Declare 'mm' to be a modelResults
// structure to hold the function value
struct modelResults mm;
// Extract the current 'tau'
// and 'beta' vectors
tau_ = pvUnpack(p, "tau");
beta_ = pvUnpack(p, "beta");
mu = X * beta_;
eml = tau_[y] - mu;
emu = tau_[y+1] - mu;
// Assign the log-likelihood value to the
// 'function' member of the modelResults structure
mm.function = ln(cdfn(emu) - cdfn(eml));
// Return the model results structure
retp(mm);
endp;
Step 5: Apply the inequality constraints
In order to enforce the inequality
$$-\infty \lt \tau_2 \lt \tau_3 \lt \cdots \lt \tau_K \lt +\infty$$
we will set this inequality constraint equation
$$ -1\tau_2 + 1\tau_3 + 0\tau_4 \ge 0\\ 0\tau_2 + -1\tau_3 + 1\tau_4 \ge 0$$
// Declare 'ctl' to be a cmlmtControl structure
// and fill with default values
struct cmlmtControl ctl;
ctl = cmlmtControlCreate();
/*
** The dot product of each row of ctl.C
** with the parameter vector must be
** greater than or equal to the corresponding
** row of ctl.D
**
** Note that the first 3 elements of our parameter
** vector are 'beta' and the final 3 are the middle
** 3 elements of 'tau', because the mask keeps
** the first and last elements of 'tau' from
** the estimation.
*/
ctl.C = { 0 0 0 -1 1 0,
0 0 0 0 -1 1 };
ctl.D = { 0,
0 };
Step 6: Run the estimation and print results
The cmlmt
procedure will take five inputs
- A pointer to the log-likelihood procedure.
- The PV structure containing the model parameters.
3-4. The
y
vector andX
matrix. - The
cmlmtControl
structure, containing the inequality constraints.
// Declare 'out' to be a cmlmtResults structure
// to hold the results of the estimation
struct cmlmtResults out;
// Perform estimation and print results
out = cmlmt(&orderedProbit, p, y, X, ctl);
call cmlmtPrt(out);
Results
After running all of the above code, the following report will be printed
=============================================================================== CMLMT Version 3.0.0
=============================================================================== return code = 0 normal convergence Log-likelihood -14.1623 Number of cases 200 Covariance of the parameters computed by the following method: ML covariance matrix Parameters Estimates Std. err. Est./s.e. Prob. Gradient --------------------------------------------------------------------- beta[1,1] 4.2996 0.2816 15.268 0.0000 0.0000 beta[2,1] 5.1704 0.3275 15.787 0.0000 0.0000 beta[3,1] 6.1077 0.2931 20.838 0.0000 0.0000 tau[2,1] -10.3687 0.6380 -16.251 0.0000 0.0000 tau[3,1] 0.2012 0.2833 0.710 0.4776 0.0000 tau[4,1] 10.2323 0.6439 15.890 0.0000 0.0000 Correlation matrix of the parameters 1 0.5337 0.37587 -0.56933 0.12117 0.55204 0.53365 1 0.56732 -0.71255 -0.029152 0.66281 0.37587 0.56732 1 -0.62297 -0.010140 0.61754 -0.56933 -0.71255 -0.62297 1 -0.013120 -0.59049 0.12117 -0.029152 -0.010140 -0.013120 1 0.016093 0.55204 0.66281 0.61754 -0.59049 0.016093 1 Wald Confidence Limits 0.95 confidence limits Parameters Estimates Lower Limit Upper Limit Gradient ---------------------------------------------------------------------- beta[1,1] 4.2996 3.7442 4.8551 0.0000 beta[2,1] 5.1704 4.5245 5.8164 0.0000 beta[3,1] 6.1077 5.5296 6.6858 0.0000 tau[2,1] -10.3687 -11.6271 -9.1104 0.0000 tau[3,1] 0.2012 -0.3575 0.7599 0.0000 tau[4,1] 10.2323 8.9623 11.5023 0.0000