Appendix 01-1, ver. 1.3 (May 10, 2001) 1-151
Appendix on Censored Data Analysis
Timothy S. Thomas[*]
World Bank, Washington, DC, USA
A.1 Doubly Truncated Normal Distribution
In this section, I present some results that make working with the doubly truncated normal distribution easier.
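Definition A.1.1 If y has mean $\mu$ and variance $\sigma^2$, then define the normal probability density function (pdf) as
$$\phi(y; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right).$$
Definition A.1.2 Define the standard normal pdf as the normal pdf with $\mu = 0$ and $\sigma = 1$. For notational simplicity, when the normal pdf has only one parameter in parentheses, it will mean that it is the standard normal pdf. That is,
$$\phi(y) = \phi(y; 0, 1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right).$$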
Definition A.1.3 If y has mean $\mu$ and variance $\sigma^2$, define the normal cumulative distribution function (cdf) as
$$\Phi(y; \mu, \sigma) = \int_{-\infty}^{y} \phi(z; \mu, \sigma)\,dz.$$
Definition A.1.4 As we did for the standard normal pdf, define the standard normal cdf as the normal cdf with $\mu = 0$ and $\sigma = 1$. For simplicity, when the normal cdf has only one parameter in parentheses, it will mean that it is the standard normal cdf. That is,
$$\Phi(y) = \Phi(y; 0, 1) = \int_{-\infty}^{y} \phi(w)\,dw.$$
Definition A.1.5. Define the normalization transformation function as
$$\zeta(y; \mu, \sigma) = \frac{y - \mu}{\sigma}.$$
When our attention is only on one distribution, so that $\mu$ and $\sigma$ are identical for all stochastic terms, we will use the shorthand notation of $\zeta(y)$, or even shorter, $\zeta_y$. When we evaluate a standard normal cdf or pdf at $\zeta(y)$, we will abbreviate them as $\Phi_y$ and $\phi_y$. These might be termed the standardized normal cdf and pdf.
Note the following about the transformations:
- , which says that .
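Lemma A.1.1 $\phi(y; \mu, \sigma) = \phi\left(\frac{y - \mu}{\sigma}\right) / \sigma$, or more simply $\phi(y; \mu, \sigma) = \phi_y / \sigma$.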
Proof Use Definitions A.1.1, A.1.2, and A.1.5 to write each side of the lemma. The proof is by inspection of the result.
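Lemma A.1.2 $\Phi(y; \mu, \sigma) = \Phi\left(\frac{y - \mu}{\sigma}\right)$, or more simply $\Phi(y; \mu, \sigma) = \Phi_y$.
Proof We will use the change of variables method. Let $w = (z - \mu) / \sigma$. Then $z = \mu + \sigma w$, and $dz = \sigma\,dw$. Also, where $z = y$, $w = (y - \mu) / \sigma$. Then we may write
$$\Phi(y; \mu, \sigma) = \int_{-\infty}^{y} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(z - \mu)^2}{2\sigma^2}\right) dz = \int_{-\infty}^{(y - \mu)/\sigma} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right) dw = \Phi\left(\frac{y - \mu}{\sigma}\right),$$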
where the first and last equalities are by Definitions A.1.3 and A.1.4, and the middle equality is simply from the change of variables. Then apply Definition A.1.5, and the proof is done.
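As a quick numerical sanity check of Lemmas A.1.1 and A.1.2, the following Python sketch compares the general normal pdf and cdf with their standardized counterparts. It relies only on `statistics.NormalDist` from the standard library; the helper names `zeta` and `lemma_gaps` and the parameter values are illustrative assumptions, not part of the original appendix.

```python
from statistics import NormalDist


def zeta(y, mu, sigma):
    """Normalization transformation (y - mu) / sigma (helper name is assumed)."""
    return (y - mu) / sigma


def lemma_gaps(y, mu, sigma):
    """Return |pdf - phi(zeta)/sigma| and |cdf - Phi(zeta)| at the point y."""
    general = NormalDist(mu, sigma)   # normal with mean mu, std. dev. sigma
    standard = NormalDist()           # standard normal
    z = zeta(y, mu, sigma)
    pdf_gap = abs(general.pdf(y) - standard.pdf(z) / sigma)   # Lemma A.1.1
    cdf_gap = abs(general.cdf(y) - standard.cdf(z))           # Lemma A.1.2
    return pdf_gap, cdf_gap


# Illustrative values only; both gaps should be at rounding-error level.
print(lemma_gaps(0.3, 1.5, 2.0))
```

Both gaps should differ from zero only by floating-point rounding, for any y and any positive sigma.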
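Theorem A.1.1 (Leibniz' Rule) If $f(x, \theta)$, $a(\theta)$, and $b(\theta)$ are differentiable with respect to $\theta$, then
$$\frac{d}{d\theta} \int_{a(\theta)}^{b(\theta)} f(x, \theta)\,dx = \int_{a(\theta)}^{b(\theta)} \frac{\partial f(x, \theta)}{\partial \theta}\,dx + f(b(\theta), \theta)\,b'(\theta) - f(a(\theta), \theta)\,a'(\theta).$$
The rule is stated without proof.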
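Lemma A.1.3
(a) $\frac{\partial \Phi(y; \mu, \sigma)}{\partial y} = \phi(y; \mu, \sigma)$ (b) $\frac{\partial \phi(y; \mu, \sigma)}{\partial y} = -\frac{y - \mu}{\sigma^2}\,\phi(y; \mu, \sigma)$
Note that in the case of the standard normal distribution, this reduces to
(a) $\frac{d\Phi(y)}{dy} = \phi(y)$ (b) $\frac{d\phi(y)}{dy} = -y\,\phi(y)$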
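Proof To prove (a), we use Lemma A.1.2, Definition A.1.4, Leibniz' Rule (Theorem A.1.1; note that only the term for the upper limit of integration applies here), and Lemma A.1.1, to simply write out
$$\frac{\partial \Phi(y; \mu, \sigma)}{\partial y} = \frac{\partial}{\partial y} \Phi\left(\frac{y - \mu}{\sigma}\right) = \frac{\partial}{\partial y} \int_{-\infty}^{(y - \mu)/\sigma} \phi(w)\,dw = \frac{1}{\sigma}\,\phi\left(\frac{y - \mu}{\sigma}\right) = \phi(y; \mu, \sigma).$$
To prove (b), we use Lemma A.1.1 and Definition A.1.2, then differentiate directly and apply Definition A.1.2 and Lemma A.1.1 in reverse. The steps are
$$\frac{\partial \phi(y; \mu, \sigma)}{\partial y} = \frac{\partial}{\partial y} \left[\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right)\right] = -\frac{y - \mu}{\sigma^2} \cdot \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right) = -\frac{y - \mu}{\sigma^2}\,\phi(y; \mu, \sigma).$$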
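The two derivatives in Lemma A.1.3 can be checked numerically with central finite differences. The sketch below assumes the lemma states that the derivative of the normal cdf is the normal pdf, and that the derivative of the normal pdf is -(y - mu)/sigma^2 times the pdf; the function names, the step size, and the evaluation point are illustrative choices.

```python
from statistics import NormalDist


def central_diff(f, x, h=1e-5):
    """Two-sided finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def derivative_gaps(y, mu, sigma):
    """Gaps between finite-difference derivatives and the closed forms."""
    dist = NormalDist(mu, sigma)
    # (a): d cdf / dy should equal the pdf.
    gap_a = abs(central_diff(dist.cdf, y) - dist.pdf(y))
    # (b): d pdf / dy should equal -(y - mu) / sigma^2 * pdf.
    gap_b = abs(central_diff(dist.pdf, y) + (y - mu) / sigma**2 * dist.pdf(y))
    return gap_a, gap_b


print(derivative_gaps(0.3, 1.5, 2.0))
```

The gaps are limited by the truncation error of the finite-difference scheme rather than by the lemma itself, so they are small but not exactly zero.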
In order to make this appendix useful in a wider range of situations, I will produce versions of the lemmas, theorems, and definitions developed for double truncation and censoring for the cases of single truncation or censoring (both lower and upper), and when appropriate, for the special case of lower censoring at 0. In order to do this, it is helpful to note four limits, which will be stated without proof because they result from the very definitions of pdfs and cdfs:
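$\lim_{x \to -\infty} \phi(x) = 0$, $\lim_{x \to \infty} \phi(x) = 0$, $\lim_{x \to -\infty} \Phi(x) = 0$, and $\lim_{x \to \infty} \Phi(x) = 1$.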
It will also be helpful to develop additional limits, which will require the use of L'Hospital's Rule.
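Theorem A.1.2 (L'Hospital's Rule) If $\lim_{x \to c} f(x) = 0$ and $\lim_{x \to c} g(x) = 0$, then
$$\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}.$$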
This rule also holds for the case where the limits of f and g are increasing or decreasing without bound. It is stated without proof.
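Lemma A.1.4
(a) $\lim_{x \to \infty} x\,\phi(x) = 0$ and $\lim_{x \to -\infty} x\,\phi(x) = 0$
(b) $\lim_{x \to \infty} x^2 \phi(x) = 0$ and $\lim_{x \to -\infty} x^2 \phi(x) = 0$
(c) $\lim_{x \to \infty} x\,\Phi(x) = \infty$
(d) $\lim_{x \to -\infty} x\,\Phi(x) = 0$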
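Proof Each limit, except (c), requires L'Hospital's Rule (Theorem A.1.2). We will begin with the first limit in (a).
$$\lim_{x \to \infty} x\,\phi(x) = \lim_{x \to \infty} \frac{x}{1/\phi(x)} = \lim_{x \to \infty} \frac{1}{x\,\phi(x)/\phi(x)^2} = \lim_{x \to \infty} \frac{1}{x \cdot \left(1/\phi(x)\right)} = 0.$$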
The first equality is simply from rewriting the equation in a form in which L'Hospital's Rule may be applied. The second equality is from L'Hospital's Rule, and the third equality is from cancelling terms. The limit being equal to zero is from noting that both terms in the denominator increase without bound.
The second limit in (a) results from noting that the effect of the limit decreasing without bound is to reverse the sign on the first limit of (a). Since the pdf is symmetric, that is, $\phi(x) = \phi(-x)$, the limit in this second equation in (a) is just the negative of the limit of the first equation, which is 0.
The first limit in (b) can be solved as follows.
The first equality is from rearranging terms; the second from L'Hospital's Rule; the third from cancelling terms in the denominator; and the fourth from applying L'Hospital's Rule a second time. The limit being 0 is from the fact that each of the three terms in the denominator increases without bound. The second limit in (b) being equal to 0 follows the same line of reasoning used to explain why the second limit in (a) was equal to 0.
The limit in (c) is explained by observing that the cdf approaches 1 as x increases without bound, so its product with x naturally increases without bound.
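The limit in (d) is computed as follows.
$$\lim_{x \to -\infty} x\,\Phi(x) = \lim_{x \to -\infty} \frac{\Phi(x)}{1/x} = \lim_{x \to -\infty} \frac{\phi(x)}{-1/x^2} = \lim_{x \to -\infty} -x^2 \phi(x) = 0.$$
The first equality is by rearranging terms; the second by L'Hospital's Rule; the third by rearranging terms; and the last by using the second limit of (b).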
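The limits discussed in this proof can be illustrated numerically. The sketch below assumes the four quantities of interest are x times the standard normal pdf, x squared times the pdf, and x times the standard normal cdf in each tail; this reading is inferred from the proof text. The cdf is computed with `math.erfc` so that the far lower tail does not round to zero; the function names are illustrative.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf; erfc keeps relative precision in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def tail_terms(x):
    """For x > 0 return x*phi(x), x^2*phi(x), x*Phi(x), and (-x)*Phi(-x)."""
    return x * phi(x), x * x * phi(x), x * Phi(x), -x * Phi(-x)


# The first, second, and fourth entries shrink toward 0 as x grows,
# while the third grows roughly like x itself.
for x in (2.0, 5.0, 10.0):
    print(x, tail_terms(x))
```

A numerical table of this kind is only suggestive; the limits themselves rest on the L'Hospital arguments given in the proof.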
Definition A.1.5 Define the doubly truncated normal pdf as a pdf that is normally distributed on the interval from a to b; and takes on a value of 0 elsewhere. Since a valid pdf must integrate to 1 over its range (see, for example, Casella and Berger), the doubly truncated normal pdf must be normalized, and is defined as
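(a) $\phi(y; \mu, \sigma, a, b) = \frac{\phi(y; \mu, \sigma)}{\Phi(b; \mu, \sigma) - \Phi(a; \mu, \sigma)}$ for $a < y < b$; 0, elsewhere.
A truncated normal pdf (i.e., one that is not doubly truncated) is one that either has $b = \infty$ or $a = -\infty$. In such cases,
(b) $\phi(y; \mu, \sigma, a, \infty) = \frac{\phi(y; \mu, \sigma)}{1 - \Phi(a; \mu, \sigma)}$ for $a < y < \infty$; 0, elsewhere.
(c) $\phi(y; \mu, \sigma, -\infty, b) = \frac{\phi(y; \mu, \sigma)}{\Phi(b; \mu, \sigma)}$ for $-\infty < y < b$; 0, elsewhere.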
Definition A.1.6 Similarly, the doubly truncated normal cdf is defined as
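(a) $\Phi(y; \mu, \sigma, a, b) = \frac{\Phi(h; \mu, \sigma) - \Phi(a; \mu, \sigma)}{\Phi(b; \mu, \sigma) - \Phi(a; \mu, \sigma)}$
where $h = a$ if $y \le a$, $h = y$ if $a < y < b$, and $h = b$ if $b \le y$.
The special versions for lower and upper truncation are
(b) $\Phi(y; \mu, \sigma, a, \infty) = \frac{\Phi(h; \mu, \sigma) - \Phi(a; \mu, \sigma)}{1 - \Phi(a; \mu, \sigma)}$
where $h = a$ if $y \le a$, and $h = y$ if $a < y$; and
(c) $\Phi(y; \mu, \sigma, -\infty, b) = \frac{\Phi(h; \mu, \sigma)}{\Phi(b; \mu, \sigma)}$
where $h = y$ if $y < b$, and $h = b$ if $b \le y$.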
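The truncated pdf and cdf described above can be sketched in a few lines of Python. The sketch assumes the usual normalization, in which the normal pdf is divided by the probability mass between a and b, and in which the cdf is evaluated at a point h clamped to the interval from a to b. The names `trunc_pdf` and `trunc_cdf` are hypothetical, and single truncation is handled by passing an infinite bound.

```python
import math
from statistics import NormalDist


def trunc_pdf(y, mu, sigma, a, b):
    """Normal pdf renormalized to (a, b); zero outside that interval."""
    if not a < y < b:
        return 0.0
    dist = NormalDist(mu, sigma)
    return dist.pdf(y) / (dist.cdf(b) - dist.cdf(a))


def trunc_cdf(y, mu, sigma, a, b):
    """Truncated cdf, using h = a below the range, y inside, b above."""
    dist = NormalDist(mu, sigma)
    h = min(max(y, a), b)
    return (dist.cdf(h) - dist.cdf(a)) / (dist.cdf(b) - dist.cdf(a))


# Double truncation to (-1, 4), then upper truncation only at 4.
print(trunc_pdf(0.5, 1.0, 2.0, -1.0, 4.0), trunc_cdf(0.5, 1.0, 2.0, -1.0, 4.0))
print(trunc_cdf(0.5, 1.0, 2.0, -math.inf, 4.0))
```

Clamping y into the interval reproduces the three cases for h in a single expression, and an infinite bound makes the corresponding cdf term equal to 0 or 1.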
Lemma A.1.5
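(a) $\int_a^b y\,\phi(y; \mu, \sigma)\,dy = (\Phi_b - \Phi_a)(\mu + \sigma\lambda)$
where we define $\lambda = \frac{\phi_a - \phi_b}{\Phi_b - \Phi_a}$, with $\phi_a = \phi\left(\frac{a - \mu}{\sigma}\right)$, $\Phi_a = \Phi\left(\frac{a - \mu}{\sigma}\right)$, and likewise for b.
For lower and upper truncation this becomes
(b) $\int_a^{\infty} y\,\phi(y; \mu, \sigma)\,dy = (1 - \Phi_a)(\mu + \sigma\lambda)$ with $\lambda = \frac{\phi_a}{1 - \Phi_a}$
(c) $\int_{-\infty}^b y\,\phi(y; \mu, \sigma)\,dy = \Phi_b\,(\mu + \sigma\lambda)$ with $\lambda = -\frac{\phi_b}{\Phi_b}$.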
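Proof We will use the change of variables method, using $z = \zeta(y) = (y - \mu)/\sigma$, so that $y = \mu + \sigma z$ and $dy = \sigma\,dz$.
$$\int_a^b y\,\phi(y; \mu, \sigma)\,dy = \int_{(a - \mu)/\sigma}^{(b - \mu)/\sigma} (\mu + \sigma z)\,\phi(z)\,dz = \mu \int_{(a - \mu)/\sigma}^{(b - \mu)/\sigma} \phi(z)\,dz + \sigma \int_{(a - \mu)/\sigma}^{(b - \mu)/\sigma} z\,\phi(z)\,dz = \mu(\Phi_b - \Phi_a) - \sigma(\phi_b - \phi_a).$$
After transforming the variables, we note that the first integral is the cdf from Definition A.1.4, and the second integral can be evaluated directly (or apply Lemma A.1.3.b). The next step reverses the change of variables.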
Theorem A.1.3 The mean of a doubly truncated normal pdf is
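$E(Y; \mu, \sigma, a, b) = \mu + \sigma\lambda$, where $\lambda = \frac{\phi_a - \phi_b}{\Phi_b - \Phi_a}$ as in Lemma A.1.5.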
Proof By definition, the mean is the expected value of a variable evaluated over the range of its distribution. So we write
$$E(Y; \mu, \sigma, a, b) = \int_a^b y\,\frac{\phi(y; \mu, \sigma)}{\Phi_b - \Phi_a}\,dy.$$
Applying Lemma A.1.5 to the integral, we complete the proof.
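The closed form for the truncated mean can be compared with direct numerical integration. The sketch below assumes the mean is mu plus sigma times a ratio (called `lam` here) equal to the difference of the standardized pdf at a and b divided by the difference of the standardized cdf at b and a. The function names, the composite Simpson rule, and the parameter values are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def trunc_mean(mu, sigma, a, b):
    """Closed-form mean of N(mu, sigma^2) truncated to (a, b)."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    lam = (STD.pdf(za) - STD.pdf(zb)) / (STD.cdf(zb) - STD.cdf(za))
    return mu + sigma * lam


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def trunc_mean_numeric(mu, sigma, a, b):
    """Integrate y * pdf over (a, b) and divide by the mass in (a, b)."""
    dist = NormalDist(mu, sigma)
    mass = dist.cdf(b) - dist.cdf(a)
    return simpson(lambda y: y * dist.pdf(y), a, b) / mass


print(trunc_mean(1.0, 2.0, -1.0, 4.0), trunc_mean_numeric(1.0, 2.0, -1.0, 4.0))
```

The two numbers should agree to many decimal places, and the truncated mean always lies strictly between a and b.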
Lemma A.1.6
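(a) $\int_a^b \Phi(y; \mu, \sigma)\,dy = (b - \mu)\Phi_b - (a - \mu)\Phi_a - \sigma\lambda(\Phi_b - \Phi_a)$
For lower and upper truncation, this becomes
(b) $\int_a^{\infty} \Phi(y; \mu, \sigma)\,dy = \infty$, since $y\,\Phi_y$ increases without bound (Lemma A.1.4.c)
(c) $\int_{-\infty}^b \Phi(y; \mu, \sigma)\,dy = \Phi_b\,(b - \mu - \sigma\lambda)$
where each $\lambda$ is defined as in Lemma A.1.5.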
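Proof Use integration by parts. Let $u = \Phi(y; \mu, \sigma)$, which equals $\Phi(\zeta(y))$, by Lemma A.1.2. We also have $dv = dy$, then $du = (\phi(\zeta(y)) / \sigma)\,dy$ and $v = y$. Then we have
$$\int_a^b \Phi(y; \mu, \sigma)\,dy = \Big[y\,\Phi(\zeta(y))\Big]_a^b - \int_a^b y\,\frac{\phi(\zeta(y))}{\sigma}\,dy = b\,\Phi_b - a\,\Phi_a - \left[\mu(\Phi_b - \Phi_a) - \sigma(\phi_b - \phi_a)\right],$$
where the integral on the right hand side is evaluated after noting Lemmas A.1.1 and A.1.5. By combining like terms, the proof is completed.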
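The integration-by-parts result can also be checked numerically. The sketch below assumes that, after combining like terms, the integral of the normal cdf from a to b equals (b - mu) times the cdf at b, minus (a - mu) times the cdf at a, plus sigma times the difference of the standardized pdf at b and a. The helper names and the Simpson rule are illustrative.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def integral_of_cdf(mu, sigma, a, b):
    """Closed form for the integral of the N(mu, sigma^2) cdf over (a, b)."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    return ((b - mu) * STD.cdf(zb) - (a - mu) * STD.cdf(za)
            + sigma * (STD.pdf(zb) - STD.pdf(za)))


mu, sigma, a, b = 1.0, 2.0, -1.0, 4.0
print(integral_of_cdf(mu, sigma, a, b), simpson(NormalDist(mu, sigma).cdf, a, b))
```

The check only covers finite a and b; with b increasing without bound the integral of a cdf diverges.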
Lemma A.1.7
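(a) $\int_a^b y^2 \phi(y; \mu, \sigma)\,dy = (\Phi_b - \Phi_a)(\mu^2 + \sigma^2 + 2\mu\sigma\lambda) - \sigma\left[(b - \mu)\phi_b - (a - \mu)\phi_a\right]$
For lower and upper truncation, this becomes
(b) $\int_a^{\infty} y^2 \phi(y; \mu, \sigma)\,dy = (1 - \Phi_a)(\mu^2 + \sigma^2 + 2\mu\sigma\lambda) + \sigma(a - \mu)\phi_a$
(c) $\int_{-\infty}^b y^2 \phi(y; \mu, \sigma)\,dy = \Phi_b\,(\mu^2 + \sigma^2 + 2\mu\sigma\lambda) - \sigma(b - \mu)\phi_b$
where each $\lambda$ is defined as in Lemma A.1.5.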
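Proof Note first of all that $\phi(y; \mu, \sigma) = \phi(\zeta(y)) / \sigma$ by Lemma A.1.1. Then use integration by parts. Recall that if u and v are continuous and differentiable functions of y over the range of interest, then
$$\int_a^b u\,dv = \Big[u v\Big]_a^b - \int_a^b v\,du.$$
Let u represent y and dv represent $y\,\phi(y; \mu, \sigma)\,dy$, then $du = dy$ and $v = \mu\Phi(\zeta(y)) - \sigma\phi(\zeta(y))$, by Lemma A.1.5. This gives us
$$\int_a^b y^2 \phi(y; \mu, \sigma)\,dy = \Big[y\left(\mu\Phi(\zeta(y)) - \sigma\phi(\zeta(y))\right)\Big]_a^b - \mu \int_a^b \Phi(\zeta(y))\,dy + \sigma \int_a^b \phi(\zeta(y))\,dy. \qquad \text{(A.1.1)}$$
The first integral on the right hand side is solved in Lemma A.1.6. The second integral on the right hand side can be solved by using Lemmas A.1.1 and A.1.3, and is equal to $\sigma^2(\Phi_b - \Phi_a)$ once multiplied by its leading $\sigma$.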
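However, the results are more useful to us if we note the following from the proof of Lemma A.1.6
$$\int \Phi(\zeta(y))\,dy = (y - \mu)\Phi_y + \sigma\phi_y$$
and from the proof of Lemma A.1.3
$$\int \phi(\zeta(y))\,dy = \sigma\Phi_y.$$
Combining, we get
$$\int_a^b y^2 \phi(y; \mu, \sigma)\,dy = \Big[(\mu^2 + \sigma^2)\Phi_y - \sigma(y + \mu)\phi_y\Big]_a^b.$$
If we make use of $y + \mu = y - \mu + 2\mu$, and substitute in a and b, the proof of the lemma is completed.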
Theorem A.1.4 The variance of a doubly truncated normal pdf is
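(a) $Var(Y; \mu, \sigma, a, b) = \sigma^2 \left[1 - \frac{\zeta_b\phi_b - \zeta_a\phi_a}{\Phi_b - \Phi_a} - \lambda^2\right]$, with $\zeta_a = (a - \mu)/\sigma$ and $\zeta_b = (b - \mu)/\sigma$
For lower and upper truncation, this becomes
(b) $Var(Y; \mu, \sigma, a, \infty) = \sigma^2 \left[1 + \zeta_a\lambda - \lambda^2\right]$
(c) $Var(Y; \mu, \sigma, -\infty, b) = \sigma^2 \left[1 + \zeta_b\lambda - \lambda^2\right]$
where each $\lambda$ is defined as in Lemma A.1.5.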
Proof
$$Var(Y) = E(Y^2) - [E(Y)]^2 = \frac{1}{\Phi_b - \Phi_a} \int_a^b y^2 \phi(y; \mu, \sigma)\,dy - (\mu + \sigma\lambda)^2.$$
The first equality is by definition of the variance. The second equality is by noting that $E(Y^2; \mu, \sigma)$ is equal to the integral in Lemma A.1.7, divided by $\Phi_b - \Phi_a$, because we are using the truncated normal distribution. The last term is from squaring E(Y) from Theorem A.1.3. Subtracting one from the other completes the proof.
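The variance result can be checked against numerical integration of the first two moments. The sketch below assumes the closed form is sigma squared times one, minus a boundary term built from the standardized pdf and cdf at a and b, minus the square of the ratio called `lam` here; that form follows from subtracting the squared truncated mean from the truncated second moment. Names and parameter values are illustrative.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def trunc_moments(mu, sigma, a, b):
    """Closed-form mean and variance of N(mu, sigma^2) truncated to (a, b)."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    mass = STD.cdf(zb) - STD.cdf(za)
    lam = (STD.pdf(za) - STD.pdf(zb)) / mass
    mean = mu + sigma * lam
    boundary = (zb * STD.pdf(zb) - za * STD.pdf(za)) / mass
    return mean, sigma**2 * (1.0 - boundary - lam**2)


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def trunc_moments_numeric(mu, sigma, a, b):
    """Mean and variance from numerically integrated first two moments."""
    dist = NormalDist(mu, sigma)
    mass = dist.cdf(b) - dist.cdf(a)
    m1 = simpson(lambda y: y * dist.pdf(y), a, b) / mass
    m2 = simpson(lambda y: y * y * dist.pdf(y), a, b) / mass
    return m1, m2 - m1 * m1


print(trunc_moments(1.0, 2.0, -1.0, 4.0))
print(trunc_moments_numeric(1.0, 2.0, -1.0, 4.0))
```

Truncation removes the tails, so the truncated variance is smaller than the untruncated variance sigma squared.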
A.2 Doubly Censored Normal Distribution
Definition A.2.1 A doubly censored normal distribution is one that is normally distributed between a and b; takes on the value of a for values that would have otherwise been less than a; and takes on a value of b for values that would have otherwise been greater than b.
The standard way to present this idea for the multivariate regression is to let the unobservable variable be described by
$y_i^* = X_i\beta + \varepsilon_i$, where $\varepsilon_i \sim N(0, \sigma^2)$.
What we observe is a variable y which is equal to y* for y* between a and b; equal to a for y* < a; and equal to b for y* > b. Note that in the preceding and following definitions, lemmas, and theorems, we may substitute $X_i\beta$ for $\mu$, and use the results.
Theorem A.2.1 The mean and variance of a censored variable are
(a) $E(Y) = a\,\Phi_a + b\,(1 - \Phi_b) + (\Phi_b - \Phi_a)(\mu + \sigma\lambda)$
(b)
where
For lower and upper censoring, this becomes
(c) $E(Y) = a\,\Phi_a + (1 - \Phi_a)(\mu + \sigma\lambda)$
(d)
where
(e) $E(Y) = b\,(1 - \Phi_b) + \Phi_b\,(\mu + \sigma\lambda)$
(f)
where
For the special case of lower censoring at 0, we get
(g) $E(Y) = (1 - \Phi_0)(\mu + \sigma\lambda)$, with $\Phi_0 = \Phi(-\mu/\sigma)$
(h)
where
In each case above, $\lambda$ is defined as in Lemma A.1.5. For (g) and (h), use the value of $\lambda$ from the lower censoring case.
Proof The first part is true since EY = E[E(Y|X)] (see Theorem A.2.5 from the Appendix on Conditional Distributions). That is, the probability that y = a is given by $\Phi_a$; the probability that y = b is given by $(1 - \Phi_b)$; and the probability that y is between a and b is given by $(\Phi_b - \Phi_a)$; we can simply write down the mean of y as the sum of the products of the probabilities in each interval times the expected values in those intervals. We used Theorem A.1.3 to provide the expected value for the middle range, and the end ranges, of course, have expected values of a and b. This completes the proof of (a).
To prove the first equality in (b), use Theorem A.2.7 from the Appendix on Conditional Distributions (which is from Casella and Berger pp 167-168), which says $VarY = Var[E(Y|X)] + E[Var(Y|X)] = E\{[E(Y|X) - EY]^2\} + E\{[Y - E(Y|X)]^2\}$. Note that E[Var(Y|y=a)] = E[Var(Y|y=b)] = 0, and Theorem A.1.4 gives us E[Var(Y|a<y<b)]. It is easier from an applied perspective not to write out EY, and not to solve all of the square terms.
The second equality in (b) is shown by substituting the definition of into the second equation, and doing the algebra to see that it is the same as the first equation.
The third equality is true by Theorem A.2.6 from the Appendix on Conditional Distributions. It is most easily seen by noting that $VarY = E(Y^2) - (EY)^2$. We know the latter right hand side term from the proof of (a). The former right hand side term uses the same theorem used in (a), but applied to $Y^2$. That is, $E(Y^2) = E[E(Y^2|X)]$. We use Lemma A.1.7 above to provide the expected value of the square in the uncensored range. This completes the proof.
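The decomposition used in this proof can be illustrated by simulation. The sketch below assumes the censored mean is the probability-weighted sum of a, b, and the contribution of the middle range, and it obtains the variance as the second moment minus the squared mean, with the middle-range second moment taken from the integral of y squared times the normal pdf. The function names, seed, sample size, and parameter values are illustrative, and the sketch assumes finite a and b.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal


def censored_moments(mu, sigma, a, b):
    """Closed-form mean and variance of N(mu, sigma^2) censored at a and b."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    Pa, Pb = STD.cdf(za), STD.cdf(zb)
    pa, pb = STD.pdf(za), STD.pdf(zb)
    # Point masses at a and b plus the contribution of the uncensored range.
    mean = a * Pa + b * (1.0 - Pb) + mu * (Pb - Pa) + sigma * (pa - pb)
    # Integral of y^2 * pdf over (a, b).
    mid2 = (mu**2 + sigma**2) * (Pb - Pa) - sigma * ((b + mu) * pb - (a + mu) * pa)
    second = a * a * Pa + b * b * (1.0 - Pb) + mid2
    return mean, second - mean**2


def simulate_censored(mu, sigma, a, b, n=200_000, seed=0):
    """Draw latent normals, censor them to [a, b], return sample moments."""
    rng = random.Random(seed)
    draws = [min(max(rng.gauss(mu, sigma), a), b) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    return mean, var


print(censored_moments(1.0, 2.0, -1.0, 4.0))
print(simulate_censored(1.0, 2.0, -1.0, 4.0))
```

The simulated moments should match the closed forms up to sampling error, which shrinks as the number of draws grows.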
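Theorem A.2.2 The marginal effect of a change in an independent variable x on the conditional mean is $\frac{\partial E(y|x)}{\partial x} = \mu'(\Phi_b - \Phi_a) + \sigma'(\phi_a - \phi_b)$, where $\mu'$ and $\sigma'$ are shorthand for partial derivatives with respect to x.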
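Proof Note that both $\mu$ and $\sigma$ could be functions of x. Theorem A.2.1 tells us that the conditional mean is
$$E(y|x) = a\,\Phi_a + b\,(1 - \Phi_b) + \mu(\Phi_b - \Phi_a) + \sigma(\phi_a - \phi_b). \qquad \text{(A.2.1)}$$
The derivative then is given by
$$\frac{\partial E(y|x)}{\partial x} = a\,\phi_a\zeta_a' - b\,\phi_b\zeta_b' + \mu'(\Phi_b - \Phi_a) + \mu(\phi_b\zeta_b' - \phi_a\zeta_a') + \sigma'(\phi_a - \phi_b) + \sigma(\zeta_b\phi_b\zeta_b' - \zeta_a\phi_a\zeta_a'),$$
where $\zeta_a = (a - \mu)/\sigma$, $\zeta_b = (b - \mu)/\sigma$, and $\zeta_a'$ and $\zeta_b'$ are their partial derivatives with respect to x. If we gather all the terms multiplied by
$$\phi_a\zeta_a'$$
and
$$\phi_b\zeta_b'$$
we get $a - \mu - \sigma\zeta_a = 0$ and $-(b - \mu - \sigma\zeta_b) = 0$. After cancelling these terms, the only thing that remains completes the proof.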
Note that one implication of heteroscedasticity is that one can no longer use one's intuition in interpreting the meaning of a positive derivative of the mean with respect to x, if x is also a determinant of the variance. The variance can amplify this effect, but it can also reverse the effect, making interpretation more difficult.
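The sign reversal described above can be seen in a small numerical sketch. It assumes the marginal effect has the form mu' times the probability of the uncensored range plus sigma' times the difference of the standardized pdf at a and b, and it checks that form against a finite difference of the censored mean. The specification mu(x) = B0 + B1 x and sigma(x) = exp(G0 + G1 x), the coefficient values, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal
B0, B1 = 0.0, 1.0    # hypothetical mean coefficients: mu(x) = B0 + B1 * x
G0, G1 = 0.0, -2.0   # hypothetical scale coefficients: sigma(x) = exp(G0 + G1 * x)


def censored_mean(mu, sigma, a, b):
    """Mean of N(mu, sigma^2) censored at a and b."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    return (a * STD.cdf(za) + b * (1.0 - STD.cdf(zb))
            + mu * (STD.cdf(zb) - STD.cdf(za))
            + sigma * (STD.pdf(za) - STD.pdf(zb)))


def effect_at(x, a=0.0, b=10.0):
    """Closed-form marginal effect of x on the censored mean."""
    mu, sigma = B0 + B1 * x, math.exp(G0 + G1 * x)
    dmu, dsigma = B1, G1 * sigma
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    return dmu * (STD.cdf(zb) - STD.cdf(za)) + dsigma * (STD.pdf(za) - STD.pdf(zb))


def numeric_effect_at(x, a=0.0, b=10.0, h=1e-5):
    """Central finite difference of the censored mean with respect to x."""
    up = censored_mean(B0 + B1 * (x + h), math.exp(G0 + G1 * (x + h)), a, b)
    dn = censored_mean(B0 + B1 * (x - h), math.exp(G0 + G1 * (x - h)), a, b)
    return (up - dn) / (2.0 * h)


# The latent mean rises with x (B1 > 0), yet the censored mean falls at x = 0.
print(effect_at(0.0), numeric_effect_at(0.0))
```

In this hypothetical case the shrinking variance pulls mass away from the uncensored range faster than the rising latent mean adds to it, so the overall effect is negative.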
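Theorem A.2.3 The log likelihood of a doubly censored distribution is given by
$$\ln L = \sum_{y_i = a} \ln \Phi\left(\frac{a - \mu_i}{\sigma_i}\right) + \sum_{a < y_i < b} \ln\left[\frac{1}{\sigma_i}\,\phi\left(\frac{y_i - \mu_i}{\sigma_i}\right)\right] + \sum_{y_i = b} \ln\left[1 - \Phi\left(\frac{b - \mu_i}{\sigma_i}\right)\right]$$
where $\mu_i$ is a possibly non-linear function of parameter vector B and variable vector $X_i$, and $\sigma_i$ is a possibly non-linear function of parameter vector A and variable vector $X_i$, and where some of the parameters in A or B might possibly be restricted equal to zero. That is, some (or all) of the variables in vector $X_i$ may be in $\mu_i$, some (or all) may be in $\sigma_i$, and some (or all) may be in both.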
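A minimal sketch of such a log likelihood follows, assuming the standard two-limit form: observations at a contribute the log probability of falling at or below a, observations at b contribute the log probability of falling at or above b, and interior observations contribute the log of the normal density. The function name `tobit_loglik` and the sample values are hypothetical; the per-observation mean and scale are passed in directly, so any functional form for them can be evaluated beforehand.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def tobit_loglik(y, mu, sigma, a, b):
    """Log likelihood for doubly censored data.

    y, mu, and sigma are equal-length sequences holding the observed value,
    latent mean, and latent standard deviation of each observation.
    """
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if yi <= a:      # censored from below
            total += math.log(STD.cdf((a - mi) / si))
        elif yi >= b:    # censored from above
            total += math.log(1.0 - STD.cdf((b - mi) / si))
        else:            # uncensored: density of the latent normal
            total += math.log(STD.pdf((yi - mi) / si) / si)
    return total


# Three hypothetical observations: one at each limit and one interior.
print(tobit_loglik([0.0, 1.0, 5.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], 0.0, 5.0))
```

In practice this function would be handed to a numerical optimizer over the parameters that generate the means and scales; extreme standardized values can push the cdf to exactly 0 or 1, so a production version would need a more careful tail computation.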
Definition A.2.2 Define the expected value Tobit for a doubly censored dependent variable to be
$y_i = E(y_i|X_i) + u_i$, where $u_i \sim N(0, Var(y_i|X_i))$,
and where E(yi|Xi) and Var(yi|Xi) are defined as in Theorem A.2.1.
This approach was possibly first suggested in Maddala and Nelson, and later amplified in Maddala (pp 182-185). Ruud (ch. 28) discusses the approach more completely, making several important points, such as showing that the maximum likelihood (the traditional Tobit) is more efficient than the expected value Tobit (FWNLS), and that the distributional specification is critical in the FWNLS approach. He suggests that the logistic or Laplace may be a better distribution to use with the FWNLS (because their tails are less flat than those for the normal, and the expectations for values in the tails will fit the data better), and points out that in addition to these distributions, we could use the Weibull, uniform, and a host of other distributions which are log-concave (Ruud, p. 814, cites Karlin's list).
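Theorem A.2.4 The log likelihood for the expected value Tobit is just that of a non-linear least squares with heteroscedasticity, and is given by
$$\ln L = -\frac{1}{2} \sum_i \left[\ln(2\pi) + \ln Var(y_i|X_i) + \frac{\left(y_i - E(y_i|X_i)\right)^2}{Var(y_i|X_i)}\right].$$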
Here, X represents all explanatory variables, regardless of whether the variable is used to explain $\mu$ or $\sigma$ or both.
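A sketch of how this objective might be evaluated follows. It assumes a Gaussian heteroscedastic least squares form, in which each observation contributes the log of a normal density centered at the censored mean with the censored variance; that Gaussian form is an assumption made for illustration, as are the function names and sample values. The censored moments are computed from the latent mean and scale as the probability-weighted mean and the second moment minus the squared mean.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def censored_moments(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) censored at finite a and b."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    Pa, Pb = STD.cdf(za), STD.cdf(zb)
    pa, pb = STD.pdf(za), STD.pdf(zb)
    mean = a * Pa + b * (1.0 - Pb) + mu * (Pb - Pa) + sigma * (pa - pb)
    mid2 = (mu**2 + sigma**2) * (Pb - Pa) - sigma * ((b + mu) * pb - (a + mu) * pa)
    second = a * a * Pa + b * b * (1.0 - Pb) + mid2
    return mean, second - mean**2


def ev_tobit_loglik(y, mu, sigma, a, b):
    """Assumed Gaussian log likelihood built on the censored mean and variance."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        m, v = censored_moments(mi, si, a, b)
        total += -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
    return total


# Two hypothetical observations with different latent means and scales.
print(ev_tobit_loglik([0.0, 2.0], [1.0, 1.5], [2.0, 1.0], 0.0, 5.0))
```

Because the weights depend on the same parameters as the mean, a feasible weighted procedure would typically iterate between estimating the variance and re-fitting the mean rather than maximizing this function in one pass.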
References
Casella, George and Roger L. Berger. 1990. Statistical Inference. Belmont, CA: Duxbury.
Greene, William. 2d ed.
Karlin, S. P. 1982. "Some results on optimal partitioning of variance and monotonicity with truncation level," in Statistics and Probability: Essays in Honor of C. R. Rao, ed. by G. Kalianpur, P. R. Krishnaiah, and J. K. Ghosh. Amsterdam: North-Holland, 375-382.
Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. New York: Cambridge University Press.
Maddala, G. S. and F. Nelson. 1975. "Switching Regression Models with Exogenous and Endogenous Switching," in Proceedings of the American Statistical Association (Business and Economics Section), pp. 423-426.
Ruud, Paul A. 2000. An Introduction to Classical Econometric Theory. New York: Oxford University Press.
[*] Tel: +1-410-751-6025, Email: . The findings, interpretations, and conclusions expressed in this paper are entirely those of the author. They do not necessarily represent the view of the World Bank, its Executive Directors, or the countries they represent.