Logit, Tobit, Probit 3-1

Revised Chapter 3 in Specifying and Diagnostically Testing Econometric Models (Edition 3)

© by Houston H. Stokes Revised 14 May 2011. All rights reserved. Preliminary Draft

Chapter 3

Logit, Tobit, Probit

3.0 Introduction

3.1 Probit Models

3.2 Multinomial Probit Models

3.3 Logit Models

3.4 Multinomial Logit Models

3.5 Tobit Models

Table 3.1 B34S Program to Calculate TOBIT Values

3.6 Examples

Table 3.2 Examples Using McManus Data

3.7 Poisson Models

Table 3.3 Subroutine poiss_se used to calculate alternative Poisson Covariance Matrices

Table 3.4 Oster and Hilbe Poisson Test Data

Table 3.5 An Extended Test Case of a Poisson Model Discussed in Greene

3.8 Conclusion

Logit, Tobit, Probit

3.0 Introduction

The econometric analysis discussed in this chapter concerns models with restrictions on the range of the dependent variable. The B34S commands discussed in this chapter include the following:

- probit: Runs a single-equation probit model for a 0-1 dependent variable.

- mprobit: Runs a single-equation probit model for a dependent variable with up to ten ordered states.

- loglin: Runs up to a four-equation logit model for 0-1 dependent variables.

- mloglin: Runs up to a 69-equation logit model with each dependent variable having up to 20 states.

- tobit: Runs a single-equation tobit model in which the dependent variable is bounded from above or below.

In addition, the Poisson model is discussed and estimated with the matrix command. The B34S mprobit command includes the probit command as a special case. Likewise, the loglin command is a special case of the mloglin command. If only the probit or loglin command is needed, substantial time will be saved by using these commands in place of the more general mprobit and mloglin commands. These commands will be discussed in turn. Good general references for this chapter are Maddala (1983) and Daganzo (1979). Promising non-parametric methods such as the Random Forest Model are discussed in Chapter 17.

3.1 Probit Models

The B34S probit command is based on heavily modified code originally obtained from Mathematica Policy Research. The discussion of its use is based on undated manuals that were obtained with the original code.[1] Probit analysis is used in estimating models of the form

y_i = β'X_i + u_i    (3.1-1)

where y_i takes the value 0 or 1 and X_i can contain continuous or discrete variables. If equation (3.1-1) is estimated with OLS, some of the estimated values of y_i will lie outside the range 0-1 and the residuals will be heteroskedastic. A better way to think of the problem is to consider a probit model that specifies a random process for the determination of the dependent variable. Whether or not the response is observed depends on the relationship between the stimulus S_i and a random critical level of the stimulus S*_i. In terms of the model

y_i = 1 if S_i ≥ S*_i,  y_i = 0 otherwise    (3.1-2)

where S_i = β'X_i and it is assumed that S*_i ~ N(0,1). The assumption of zero mean is handled by estimating a constant in equation (3.1-1), while the assumption of a unit variance obviates the necessity of estimating a variance for S*_i.

S*_i takes the place of the random residual in the OLS model. The distribution of y_i in the probit model is as follows:

P(y_i = 1) = P(S*_i ≤ S_i) = F(S_i)    (3.1-3)

P(y_i = 0) = 1 - F(S_i)    (3.1-4)

where F(S_i) is the cumulative normal distribution evaluated at S_i. We can define f( ) as the standard normal density and F( ) as its cumulative integral. At a given data point we calculate

f(S_i) = (2π)^(-1/2) exp(-S_i²/2)    (3.1-5)

F(S_i) = ∫_{-∞}^{S_i} f(t) dt    (3.1-6)

Equations (3.1-3) and (3.1-4) can be thought of as conditional probabilities. The expected value of the dependent variable is obtained by weighting the outcomes by their respective probabilities. This is shown in equations (3.1-7) and (3.1-8).

E(y_i) = 0·[1 - F(S_i)] + 1·F(S_i)    (3.1-7)

E(y_i) = F(S_i)    (3.1-8)

For each observation, equation (3.1-8) gives the expected probability (on the unit interval) that y_i = 1. This probability is the value of the cumulative normal at S_i.

In OLS models, the partial derivative of E(y) with respect to one of the independent variables is the estimated coefficient. In probit models, this partial derivative changes, depending on the levels of the independent variables, and is equal to the estimated coefficient weighted by the normal density evaluated at S_i.

∂E(y_i)/∂x_k = β_k f(S_i)    (3.1-9)

As S_i runs from large negative values to large positive values, F(S_i) runs from zero to one. f(S_i) converges to zero for very large or small values of S_i and is at its maximum (.3989) when S_i is near zero.
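The relationship among the index S, the cumulative normal F(S), the density f(S), and the marginal effect in (3.1-9) can be sketched in Python. This is an illustration only; the coefficient vector and data point below are hypothetical, not B34S output.

```python
import math

def normal_pdf(s):
    # f(S): standard normal density, maximum .3989 at S = 0
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def normal_cdf(s):
    # F(S): cumulative standard normal, via the error function
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

def probit_effects(beta, x):
    # S = beta'x; E(y) = F(S); dE(y)/dx_k = beta_k * f(S)  (eqs. 3.1-8, 3.1-9)
    s = sum(b * xi for b, xi in zip(beta, x))
    return normal_cdf(s), [b * normal_pdf(s) for b in beta]

# hypothetical coefficients (constant and one slope) and data point
ey, margins = probit_effects([0.5, -0.3], [1.0, 2.0])
```

Note that, unlike an OLS slope, the marginal effects returned here change with the data point at which they are evaluated.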

The probit command prints out:

- Independent variable summations and means.

- The log of the likelihood function after each iteration (iitlk option).

- The estimates of the parameters after each iteration (iiest option).

- The second-derivatives matrix after each iteration (isecd option).

After the final iteration, the following output is available:

- The negative inverse of the second derivatives matrix (variance-covariance matrix of the estimated coefficients).

- The final maximum likelihood estimates of the parameters, their standard errors and the ratio of the coefficients to their standard errors.

- The partial derivatives of E(y), with respect to each xi, for the mean and the maximum value of each xi.

- The number of limit and nonlimit observations.

- Minus two times the log of the likelihood ratio λ. For a model with q right-hand-side variables, -2 log_e λ is distributed as chi-square with q degrees of freedom, where

λ = (max. likelihood with q constraints) / (max. likelihood without constraints)    (3.1-10)

λ is bounded at 1 for a situation in which the constraints do not matter. A larger value of -2 log_e λ indicates a more significant regression result, since the imposition of the constraints has significantly reduced the likelihood of the sample. By use of the nstrt and nstop parameters, the user can output the actual and calculated values of y_i as well as the density of the normal distribution, f(S_i), at that point. Although many users run probit models with 0-1 right-hand-side variables, such as sex and race dummies, it makes little sense to evaluate the partial derivatives at the means of these 0-1 variables because it is very hard to interpret the results. The B34S probit nadj option allows the user to input up to 20 values of the explanatory variables and calculate the partial derivatives for these specific values. These partial derivatives are much easier to interpret since they give an indication of the change in the probability that y_i = 1 as the explanatory variables change. A sample setup for the probit command is given below.

b34sexec probit $

model y= x h q $ b34seend$

b34sexec probit nadj=3 $

model y= x h q $

means x(.5, .88, 1.03) H(1.0,2.0,1.0)$ b34seend$

In the first example, the model y = f(X, Q, H) is given. The second example shows the use of the optional means sentence, in which three sets of Z scores and their associated partial derivatives are calculated. In calculating the Z scores, the sample mean values are used for Q; the X values used are .5, .88, and 1.03; the H values used are 1.0, 2.0, and 1.0; and a constant is included. The means sentence outputs the Z scores and the partial derivatives evaluated at each of the three sets of values.

The probit command supports up to 97 variables on the right-hand side and has no limit on the number of observations other than the available disk space. This command has routinely been run with 35 variables on the right-hand side and 250,000 observations.[2]
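The likelihood ratio test in (3.1-10) reduces to simple arithmetic once the two maximized log likelihoods are in hand. A hedged sketch follows; the log likelihood values are hypothetical, not output from any actual run.

```python
# Hypothetical maximized log likelihoods: the constrained model
# sets the q slope coefficients to zero.
loglik_constrained = -135.2
loglik_unconstrained = -120.7
q = 4  # number of right-hand-side variables (degrees of freedom)

# -2 log(lambda) = -2 * (log L_constrained - log L_unconstrained),
# distributed chi-square with q d.f. under the null hypothesis.
lr_stat = -2.0 * (loglik_constrained - loglik_unconstrained)
# compare lr_stat against the chi-square critical value for q d.f.
```

Because the constrained likelihood can be no larger than the unconstrained one, the statistic is always nonnegative.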

3.2 Multinomial Probit Models

The mprobit command is based on code initially obtained from McKelvey and Zavoina (1971, 1975) and will perform multinomial probit analysis with up to ten ordered categories for the left-hand-side variable and up to 97 variables on the right-hand side. If the left-hand-side variable has only two categories (0-1), the mprobit command gives the same results as the probit command. mprobit estimates a model of the form

P_jk = F(μ_k - Z_j) - F(μ_{k-1} - Z_j)    (3.2-1)

where P_jk is the probability that, for the jth observation, the dependent variable is in the kth category. Z_j is the product of the jth observation vector X_j and the estimated coefficient vector β.

F( ) calculates the probabilities of the normal distribution function, and the μ's are estimates of the threshold parameters, which have associated significance values similar to the estimated standard errors of the estimated coefficient vector β. μ_1 is assumed to equal 0.0. Thus, when there are only two categories, the model reduces to the probit model of section 3.1. After mprobit has been run, it is often desirable to be able to calculate the probability of being in each category. As an example, assume a five-category problem with three right-hand-side variables CONSTANT, X1 and X2 with coefficient estimates of 10., 1.2 and 2.2, respectively, and estimated μ's of 0.0, 1.9561290, 5.5418716 and 8.6592352. The B34S commands listed below will calculate the probability of being in the five categories (PZ1, ..., PZ5).

b34sexec data$

input x1 x2$

build xb pz1 pz2 pz3 pz4 pz5$

gen xb = 10.0 + 1.2*x1 + 2.2*x2 $

* b1 = 10. $

* b2 = 1.2 $

* b3 = 2.2 $

* MU1=0.0$

* MU2=1.9561290$

* MU3=5.5418716$

* MU4=8.6592352$

gen pz1 =probnorm(0.0 - xb)$

gen pz2 =probnorm(1.9561290 - xb) - probnorm(0.0 - xb) $

gen pz3 =probnorm(5.5418716 - xb) - probnorm(1.9561290 - xb)$

gen pz4 =probnorm(8.6592352 - xb) - probnorm(5.5418716 - xb)$

gen pz5 = 1.0 - pz1 - pz2 - pz3 - pz4$

datacards$

(x1 & x2 values here)

b34sreturn$

b34srun$

b34sexec list$ b34srun$
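The same category-probability calculation can be mirrored in Python. This is a sketch only; probnorm below is a stand-in for the B34S function of the same name, and the x1, x2 values passed at the end are hypothetical data.

```python
import math

def probnorm(z):
    # cumulative standard normal, the analogue of B34S probnorm()
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# estimated thresholds from the text: mu_1 = 0 by assumption
mu = [0.0, 1.9561290, 5.5418716, 8.6592352]

def category_probs(x1, x2):
    # XB uses the coefficient estimates 10., 1.2 and 2.2 from the text
    xb = 10.0 + 1.2 * x1 + 2.2 * x2
    pz = [probnorm(mu[0] - xb)]                    # PZ1
    for lo, hi in zip(mu, mu[1:]):                 # PZ2 .. PZ4
        pz.append(probnorm(hi - xb) - probnorm(lo - xb))
    pz.append(1.0 - sum(pz))                       # PZ5
    return pz

probs = category_probs(-3.0, -1.0)   # hypothetical observation
```

By construction the five probabilities are nonnegative and sum to one.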

The mprobit command calculates and outputs -2 log_e L, a table showing how the log likelihood changes at each iteration, and estimates of the explained sum of squares, the residual sum of squares, the total sum of squares and the R2, following the approach outlined in McKelvey and Zavoina (1975). Since the multinomial probit model is quite different from the OLS model, the assumptions used to get these estimates follow.

We first assume that the dependent variable on its underlying interval scale satisfies a regression model. Because there is no way of knowing its variance, we normalize it so that its variance around the regression line, σ², is unity. We next define the residual sum of squares (RSS) to be

RSS = Σ_{j=1}^{N} σ² = N.    (3.2-2)

The total sum of squares (TSS) becomes

TSS = Σ_{j=1}^{N} (Ẑ_j - Z̄)² + N    (3.2-3)

and the explained sum of squares (ESS) is

ESS=TSS - RSS (3.2-4)

from which it is possible to calculate the R2 as ESS / TSS. McKelvey and Zavoina (1975) caution that these values are only estimates. Unlike OLS models, we cannot observe the residuals about the regression plane or the deviations of the dependent variable about its mean. In addition, equation (3.2-3) assumes that the estimated β's are OLS estimates, not probit estimates. This latter problem is minimized in large samples.
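Under these assumptions the McKelvey and Zavoina pseudo-R2 can be sketched as follows; the fitted index values below are illustrative, not output from an actual model.

```python
# Hypothetical fitted index values Z_j = X_j * beta for N = 5 observations
z_hat = [0.4, -1.1, 2.3, 0.0, 1.5]
n = len(z_hat)

z_bar = sum(z_hat) / n
# explained variation comes from the fitted indexes (eq. 3.2-4)
ess = sum((z - z_bar) ** 2 for z in z_hat)
# with the latent error variance normalized to one, each observation
# contributes unit residual variance, so RSS = N (eq. 3.2-2)
rss = float(n)
tss = ess + rss              # eq. (3.2-3)
r2 = ess / tss
```

Because RSS is fixed at N, this R2 depends only on the dispersion of the fitted indexes, which is why it is only an estimate of what an OLS R2 would show.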

Additional diagnostic output includes the percent predicted correctly and the rank-order correlation, i.e., the predicted vs. the actual values. The purpose of these statistics is to give the user some comparison with what an OLS model would show if y were a continuous variable.

In survey work, the researcher often wants to ask questions with more than two possible choices. If the choices are known to be ordered, but the degree of difference between the choices is not known, then the B34S mprobit command provides an alternative to forcing the problem into a standard 0-1 probit model. A study of the significance values for the μ's will determine whether there really are significant differences between the choices. Use of the vvalues sentence, which will be discussed, facilitates this study. As an example of the mprobit command, consider a problem in which the dependent variable y is coded 1, 2, 3, 4, 5. The commands

b34sexec mprobit$

vvalues=(1.0,2.0,3.0,4.0,5.0)$

model y =x1 x2 x3 x4 $ b34seend$

will test if x1, x2, x3, and x4 significantly predict y and if there really is a distinction between the cases. If μ_3 was found not to be significantly different from μ_2, this would suggest that categories 2 and 3 are, in fact, the same category.

Next, assume that the researcher wishes only to look at the first, third and fourth categories. The vvalues sentence is used to define the problem so that only these categories are valid values for y. Observations coded y = 2 and y = 5 are dropped from the problem.

b34sexec mprobit$

vvalues=(1.0,3.0,4.0)$

model y =x1 x2 x3 x4 $ b34seend$

The above discussion of the mprobit command indicates that it is a superset of the probit command. In problems in which there are only two categories, the probit command will be faster than the mprobit command.

3.3 Logit Models

The B34S loglin command is based on code developed by Nerlove and Press (1973). This command allows estimation of single-equation logit models. It can estimate more general multivariate models in which there are up to four jointly polytomous variables. The loglin command is a subset of the more general mloglin command, which will be discussed next. While the loglin command handles up to four 0-1 problems, the mloglin command handles up to 69 jointly determined variables with up to 20 states. The advantage of the loglin command is that it does not load the data into core and thus can work with substantially larger data sets than the mloglin command. The binomial logit model estimated with the loglin command assumes

P(y_i = 1) = 1 / (1 + exp(-2β'X_i))    (3.3-1)

and because of the scalar 2 will produce estimates that are half the usual formulation, which does not use such a scalar. Estimated asymptotic t scores are not affected by the scaling. The logistic function, in contrast with the probit function, has slightly fatter tails. If we define P_i as the probability that y_i = 1, then equation (3.3-1) can be transformed to

ln[P_i / (1 - P_i)] = 2β'X_i    (3.3-2)

where the dependent variable is the log of the odds that y_i = 1. The slope of the cumulative logistic distribution is largest at P_i = .5. In an OLS model, the effect of a change in any independent variable on the dependent variable is always the estimated coefficient. In the probit and logit models, the effect varies, depending on the value of β'X_i. Since the loglin procedure is a special case of the more general mloglin procedure, a detailed discussion of the econometric theory is left to the next section, in which the mloglin command is discussed.
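The effect of the scalar 2 in (3.3-1) can be checked numerically: coefficients half the size of the usual logit formulation produce identical fitted probabilities. The coefficient and data values below are illustrative only.

```python
import math

def logistic_usual(b, x):
    # usual formulation: P = 1 / (1 + exp(-b'x))
    return 1.0 / (1.0 + math.exp(-sum(bi * xi for bi, xi in zip(b, x))))

def logistic_scaled(b, x):
    # eq. (3.3-1): the scalar 2 is built into the exponent
    return 1.0 / (1.0 + math.exp(-2.0 * sum(bi * xi for bi, xi in zip(b, x))))

b_usual = [0.8, -0.4]                 # hypothetical usual-logit coefficients
b_half = [bi / 2.0 for bi in b_usual] # half-size coefficients for (3.3-1)
x = [1.0, 2.5]

p1 = logistic_usual(b_usual, x)
p2 = logistic_scaled(b_half, x)       # identical probability
```

Since only the scale of the coefficients changes, the asymptotic t scores are unaffected, as noted above.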

Output available from the loglin command includes the following:

- OLS starting values for the coefficients.

- Estimated coefficients, their asymptotic standard errors, t-ratios and significance and the gradient associated with each coefficient. The gradient is the first partial derivative of F( ) with respect to each coefficient and should be small for the final values of the coefficients.

- The log of the likelihood function.

Optional output includes the following:

- Routines to test the data (KTEST option).

- Output at each iteration (KCHCK option).

- Options to suppress bivariate interaction terms (iallb option), suppress trivariate interactions (iallt option), and options to suppress all four-way interaction terms (iall4 option). These options are only useful if there is more than one dependent variable in the problem.

- Options to control the tolerance for convergence of the likelihood function (itol1 parameter), and the tolerance for the coefficients (itol2 parameter), and limits on the iterations (limit parameter).

The setup for a simple loglin run of one equation would be

b34sexec loglin$ model y1 = x1 x2 x3 $ b34seend$

A simple multiple dependent variable setup would be

b34sexec loglin$

model y1 y2 y3 y4 = x1 x2 x3 x4 x5$ b34seend$

The loglin command has options whereby selective bivariate interactions can be suppressed (bsupp sentence), selective trivariate interaction terms are suppressed (tsupp sentence), and exogenous variables can be selectively suppressed from some equations (ssupp sentence). The following setup illustrates these options.

b34sexec loglin$

model y1 y2 y3 y4 = x1 x2 x3 x4 x5 x6$

bsupp y1 y2$

tsupp y1 y2 y4$

ssupp y2 x1 y3 x5$

b34seend$

Here the bivariate interaction y1 y2 is suppressed, the trivariate interaction y1 y2 y4 is suppressed, the exogenous variable x1 is suppressed from the equation for y2, and the exogenous variable x5 is suppressed from the equation for y3.

3.4 Multinomial Logit Models

The multinomial logit code, called with the mloglin command, was developed by Kawasaki (1978, 1979) as an extension of the work of Nerlove and Press (1973, 1976). The code was subsequently extended by Lehrer and Stokes (1985) and further refined by Klein and Klein (1988) and Klein (1988). Subsequent improvements were made to the Klein code by Stokes. Currently, only the new version of mloglin is available. The function used with the mloglin command is

P(y_i = 1) = 1 / (1 + exp(β'X_i))    (3.4-1)

which implies that the estimated coefficients for the same problem run with the loglin and mloglin commands will differ in sign and magnitude, with the mloglin coefficients being twice the loglin coefficients. LIMDEP (Greene 1995) uses the formulation P(y_i = 1) = 1 / (1 + exp(-β'X_i)) and produces coefficients that have the same absolute value as mloglin but differ in sign. LIMDEP's coefficients have the same sign as those of loglin, but twice the value. The absolute values of the asymptotic t scores are the same for all three programs.[3]
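The coefficient relationships just described can be verified numerically. The three formulations below are assumptions inferred from the discussion above (LIMDEP: P = 1/(1+exp(-bx)); loglin: P = 1/(1+exp(-2bx)); mloglin: P = 1/(1+exp(bx))), and the coefficient value is illustrative.

```python
import math

def p_limdep(b, x):
    return 1.0 / (1.0 + math.exp(-b * x))

def p_loglin(b, x):
    # scalar 2 in the exponent, as in eq. (3.3-1)
    return 1.0 / (1.0 + math.exp(-2.0 * b * x))

def p_mloglin(b, x):
    # assumed mloglin formulation, as in eq. (3.4-1)
    return 1.0 / (1.0 + math.exp(b * x))

b_limdep = 0.9           # hypothetical LIMDEP coefficient
b_loglin = b_limdep / 2  # half the value, same sign
b_mloglin = -b_limdep    # same magnitude, opposite sign
x = 1.7

pa = p_limdep(b_limdep, x)
pb = p_loglin(b_loglin, x)
pc = p_mloglin(b_mloglin, x)  # all three fitted probabilities agree
```

Equal fitted probabilities with rescaled or sign-flipped coefficients is exactly why the absolute t scores match across the three programs.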

Assume a problem in which A is a trichotomous endogenous variable and B is a dichotomous endogenous variable. We define indexes i = 1, 2, 3 for A and j = 1, 2 for B. If we assume for the sake of this example that A and B are jointly dependent on one continuous variable, x, the log-linear probability model may be written as follows:

P(A = i, B = j) = exp(a_i + b_j + c_{ij} + f_i x + g_j x) / Σ_{i'} Σ_{j'} exp(a_{i'} + b_{j'} + c_{i'j'} + f_{i'} x + g_{j'} x)    (3.4-2)

Identification restrictions are necessary. Nerlove and Press (1973, 1976) impose the constraints that if any one of the effects is summed over all the values of one of the indexes on which it depends, the sum should equal zero. Because it facilitates the interpretation of the results, we make the following alternative assumptions:


a_3 = 0,  b_2 = 0,  c_{3j} = c_{i2} = 0,  f_3 = 0,  g_2 = 0    (3.4-3)

The justification for these alternative assumptions follows. Assume an alternative model with one trichotomous variable, A, as a function of one continuous variable, x. The log-linear probability model may be written as follows:

P(A = i) = exp(a_i + f_i x) / Σ_{k=1}^{3} exp(a_k + f_k x),  i = 1, 2, 3    (3.4-4)

If we impose the constraints that Σ_i a_i = 0 and Σ_i f_i = 0, we may write

ln[P(A = 1)/G] = a_1 + f_1 x
ln[P(A = 2)/G] = a_2 + f_2 x
ln[P(A = 3)/G] = a_3 + f_3 x,  where G = [P(A = 1)P(A = 2)P(A = 3)]^(1/3)    (3.4-5)

Differentiation of the first equation in (3.4-5) with respect to x indicates that f1 represents the change in the log odds that A will take the value 1, relative to the geometric average of the probabilities associated with a unit change in x. Similar statements may be made about f2 and f3.

If the constraints that a_3 = 0 and f_3 = 0 are imposed instead, from equation (3.4-4) we obtain

ln[P(A = 1)/P(A = 3)] = a_1 + f_1 x
ln[P(A = 2)/P(A = 3)] = a_2 + f_2 x    (3.4-6)

In this case, f_1 represents the change in the log odds that A will take the value 1 rather than 3 associated with a unit change in x, and f_2 may be interpreted analogously.
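The second normalization can be illustrated numerically: under a_3 = f_3 = 0, the log odds of category 1 versus category 3 equal a_1 + f_1 x, and a unit change in x moves those log odds by exactly f_1. The coefficient values are illustrative only.

```python
import math

# hypothetical effects for a trichotomous A, with a_3 = f_3 = 0
a = [0.5, -0.2, 0.0]
f = [1.1, 0.4, 0.0]

def probs(x):
    # log-linear probability model of eq. (3.4-4)
    w = [math.exp(ai + fi * x) for ai, fi in zip(a, f)]
    s = sum(w)
    return [wi / s for wi in w]

x = 0.8
p = probs(x)
log_odds_13 = math.log(p[0] / p[2])  # equals a_1 + f_1 * x  (eq. 3.4-6)

# a unit change in x shifts the log odds by f_1
delta = math.log(probs(x + 1.0)[0] / probs(x + 1.0)[2]) - log_odds_13
```

The normalizing denominator cancels out of the odds ratio, which is what makes the coefficients under this constraint so easy to interpret.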