Selection Bias Models

  1. Selectivity Bias

Informative sampling: the real problem is the possibility that our sample on the dependent variable is not random; the mere observability of the dependent variable carries information about the regression residual.

E.g., a wage equation estimated on a sample of workers with positive working hours.

We do not always observe y1. Suppose

y1 = xβ + u1.

Let y2 be the index function, i.e.,

y2 = 1 if y2* > 0, and y2 = 0 otherwise.

We only observe y1 if y2 = 1.

Or, we observe y1 if y1 ≥ y2*, where

y2* = c0 + c1x1 + u2.

If the threshold were fixed, consider the simple case y2* = d0.

At large x (where most of the distribution of y1 lies above d0), we sample from almost the whole distribution.

=> There the observed mean is very close to the population mean, while at small x we observe only the upper tail, so the observed mean lies above the population mean.

When we try to fit a linear regression to the observed sample, we will under-estimate the effect of x on y1.

When y2* is stochastic, say y2* = d0 + u2:

Suppose u1 and u2 are positively correlated and the variance of u1 is small relative to that of u2.

=> Because of the small variance of u1, we will tend not to observe y1 in the upper tail.

=> In this case, we will over-estimate the slope.

If u1 and u2 are negatively correlated, we will tend to under-estimate the slope.

=> There is no a priori supposition that we will under-estimate or over-estimate the slope.

Note: The basic problem in this kind of model is that the sample mean does not represent the population mean, as the simulation sketch below illustrates.
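As a concrete illustration of the fixed-threshold case, here is a small Monte Carlo sketch; the parameter values and variable names are illustrative choices of mine, not from the notes.

```python
import numpy as np

# Fixed-threshold selection: y1 = b1*x + u1, observed only if y1 >= d0.
# All parameter values are illustrative.
rng = np.random.default_rng(0)
n, b1, d0 = 100_000, 1.0, 1.0
x = rng.uniform(0.0, 3.0, n)
y1 = b1 * x + rng.normal(size=n)
obs = y1 >= d0                      # fixed censoring threshold

# OLS slope on the selected sample vs. the truth
slope = np.polyfit(x[obs], y1[obs], 1)[0]
print(f"true slope = {b1}, selected-sample OLS slope = {slope:.3f}")
# At small x only the upper tail of y1 clears d0, pulling the observed
# mean up there, so the fitted slope comes out well below the true 1.0.
```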

The selection bias problem can be treated as a complicated Tobit model with a stochastic censoring point.

We select our sample from a random sample representing the population, i.e.,

y1 = xβ + u1,

and we observe y1 if u2 < xγ.

=> Pr(y1 is observed | x) = Pr(u2 < xγ).

If u1 and u2 are uncorrelated => no harm, since E(u1|u2) = E(u1), or f(u1|u2) = f(u1). The problem comes when u1 and u2 are correlated.

If u1 and u2 are positively correlated, then when u2 is negative (smaller than its mean) it is more likely that we observe y1; with γ positive, the selection binds hardest at small x, pulling the observed mean of y1 down there => we will over-estimate the slope.

If γ is negative and u1 and u2 are positively correlated, the situation is reversed.

=> The effect is as if γ were positive and u1 and u2 were negatively correlated.

=> If γ is negative, higher x reduces the prob. of observing y1.

=> We will under-estimate the slope.

So it is impossible to sign the selectivity bias a priori. The bias depends on the correlation between u1 and u2 and on the sign of γ, i.e., on how the probability of observing y1 changes as x changes.
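A companion sketch for the stochastic-selection case (again with illustrative numbers of my own): flipping the sign of corr(u1, u2) flips the direction of the slope bias, as the argument above suggests.

```python
import numpy as np

# y1 = beta*x + u1 is observed when u2 < gamma*x, corr(u1, u2) = rho.
rng = np.random.default_rng(1)
n, beta, gamma = 200_000, 1.0, 1.0

def selected_slope(rho):
    x = rng.uniform(0.0, 2.0, n)
    u1 = rng.normal(size=n)
    u2 = rho * u1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    obs = u2 < gamma * x            # selection rule
    return np.polyfit(x[obs], (beta * x + u1)[obs], 1)[0]

for rho in (0.8, -0.8):
    print(f"rho = {rho:+.1f}: selected-sample slope = {selected_slope(rho):.3f}")
# With gamma > 0, rho > 0 biases the slope upward and rho < 0 downward;
# the true slope is 1.0 in both cases.
```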

  2. Estimation

Given only the sample mean of the observed y1, we cannot make inferences about the population mean.

=> In the selection bias model, we need additional information for identification.

(1) Maximum Likelihood

Full model:

y1 = xβ + u1, observed if y2* = xγ + u2 > 0,

where u1 and u2 are bivariate normal with correlation ρ, the standard deviation of u1 is σ, and the standard deviation of u2 is normalized to one.

=> The likelihood function is

L = ∏_{y2=0} [1 − Φ(xγ)] × ∏_{y2=1} ∫_{−xγ}^{∞} f(u1, u2) du2, evaluated at u1 = y1 − xβ.

If we take the second term and express the joint pdf as a marginal times a conditional and make a change of variable, i.e.,

L = ∏_{y2=0} [1 − Φ(xγ)] × ∏_{y2=1} (1/σ) φ((y1 − xβ)/σ) · Φ( (xγ + ρ(y1 − xβ)/σ) / √(1 − ρ²) ).

Recall: If x and y are bivariate normal, then

y | x ~ N( μy + ρ(σy/σx)(x − μx), σy²(1 − ρ²) ).

Note: When ρ = 0, the likelihood function reduces to

L = ∏_{y2=0} [1 − Φ(xγ)] ∏_{y2=1} Φ(xγ) × ∏_{y2=1} (1/σ) φ((y1 − xβ)/σ),

i.e. the probit likelihood times an ordinary regression likelihood.

=> OLS of y1 on x then yields unbiased β, since the probit part is a nuisance factor that does not involve β.

Note: The likelihood is expressed as the marginal pdf of the regression error (u1) times the conditional probability of the indicator variable given the regression outcome (u2 | u1) because this is computationally convenient. In describing the selection phenomenon we usually speak instead of the marginal for the probit times the conditional of the regression given the probit outcome.

Note: The likelihood is not globally concave, so it is possible that there exist multiple local maxima.

ln L at ρ = 0 is not concave => problems occur when doing numerical MLE.

In order to make the likelihood function well-behaved, we can reparameterize β and σ, e.g.

β* = β/σ, h = 1/σ,

so that the likelihood function in the new parameter space is more stable.
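A minimal numerical sketch of this MLE, using the reparameterization above plus a log/atanh transform to keep σ > 0 and |ρ| < 1 during the search; the simulated data, starting values, and optimizer choice are all illustrative assumptions, not part of the notes.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Selection-model MLE with theta = (gamma, beta/sigma, ln(1/sigma), atanh(rho)).
rng = np.random.default_rng(2)
n, beta, gamma, sigma, rho = 5000, 1.0, 0.5, 2.0, 0.6
x = rng.normal(size=n)
u1 = sigma * rng.normal(size=n)
u2 = (rho / sigma) * u1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
y2 = gamma * x + u2 > 0                 # selection indicator
y1 = beta * x + u1                      # used only where y2 is True

def negloglik(theta):
    g, bstar, lnh, arho = theta
    h, r = np.exp(lnh), np.tanh(arho)
    e = h * y1[y2] - bstar * x[y2]      # standardized residual (u1/sigma)
    # observed part: marginal density of y1 times conditional probit probability
    ll_obs = norm.logpdf(e) + lnh + norm.logcdf(
        (g * x[y2] + r * e) / np.sqrt(1.0 - r**2))
    # unobserved part: probit probability of non-selection
    ll_mis = norm.logcdf(-g * x[~y2])
    return -(ll_obs.sum() + ll_mis.sum())

res = minimize(negloglik, x0=np.zeros(4), method="BFGS")
g, bstar, lnh, arho = res.x
print("beta:", bstar * np.exp(-lnh), "sigma:", np.exp(-lnh), "rho:", np.tanh(arho))
```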

(2) Two-step method of estimating the selection bias model

Heckman (1979):

Consider the model

Y1i = X1iβ1 + u1i   (wage offer)      (1)
Y2i = X2iβ2 + u2i   (hours of work)   (2)

where E(uji) = 0 and E(uji uj'i') = σjj' if i = i', = 0 otherwise.

Suppose we wish to estimate (1), but data on Y1 are missing for certain observations.

=> population regression function: E(Y1i|X1i)= X1iβ1

The regression function for the subsample of available data is

E(Y1i|X1i, sample selection rule) = X1iβ1 + E(u1i|sample selection rule)

Note: If E(u1i|sample selection rule) = 0, the OLS estimators of β1 will be unbiased; we will only have a loss in efficiency.

Suppose Y1i is observed if Y2i≥0

Y1i is not available for Y2i < 0.

E(u1i|X1i, sample selection rule)

=E(u1i|X1i,Y2i≥0)

=E(u1i|X1i,u2i≥-X2iβ2)

If u1i and u2i are independent, then E(u1i|X1i,u2i≥-X2iβ2)=0

=> OLS will be fine.

In general, E(Y1i|X1i,Y2i≥0)=X1iβ1 + E(u1i|u2i≥-X2iβ2)

The bias that results from using non-randomly selected samples to estimate behavioral relationships is seen to arise from the ordinary problem of omitted variables.

Heckman suggests a simple estimator for the case where u1i and u2i are bivariate normally distributed.

Note: Two well-known results from bivariate normality:

E(u1i | u2i ≥ −X2iβ2) = (σ12/√σ22) λi,
E(u2i | u2i ≥ −X2iβ2) = (σ22/√σ22) λi,

where λi = φ(zi)/[1 − Φ(zi)] and zi = −X2iβ2/√σ22 (λi is the inverse Mills ratio).

Based on these results, we have

E(Y1i | X1i, Y2i ≥ 0) = X1iβ1 + (σ12/√σ22) λi.
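A quick Monte Carlo check of the first result (the numbers are illustrative, and Var(u1) is normalized to one for simplicity):

```python
import numpy as np
from scipy.stats import norm

# Verify E(u1 | u2 >= -c) = (sigma12/sqrt(sigma22)) * phi(z)/(1 - Phi(z)),
# where z = -c/sqrt(sigma22). Illustrative values; Var(u1) set to 1.
rng = np.random.default_rng(3)
n, sigma12, sigma22, c = 2_000_000, 0.5, 1.0, 0.3
u2 = np.sqrt(sigma22) * rng.normal(size=n)
u1 = (sigma12 / sigma22) * u2 + np.sqrt(1.0 - sigma12**2 / sigma22) * rng.normal(size=n)
z = -c / np.sqrt(sigma22)
lam = norm.pdf(z) / (1.0 - norm.cdf(z))             # inverse Mills ratio
print("simulated:", u1[u2 >= -c].mean())
print("formula:  ", sigma12 / np.sqrt(sigma22) * lam)
```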

If we knew zi, and hence λi, we could enter λi as a regressor and estimate the equation by OLS.

In practice, we do not know λi. If we have a censored sample for Y2, i.e. we observe X2i even for observations with Y2i < 0, then we can use two-step estimation.

1st step: Estimate Pr(Y2i ≥ 0) by probit to obtain a consistent estimator of β2/√σ22. Then estimate zi and λi from these estimates.

2nd step: The estimated λi may be used as a regressor. OLS then yields consistent estimators of β1 and of the coefficient on λi, σ12/√σ22.
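A minimal sketch of the two-step procedure on simulated data; the parameter values, variable names, and the use of statsmodels are my own choices for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Heckman two-step on simulated data: Y1 = b1*X1 + u1 observed when
# Y2 = b2*X2 + u2 >= 0, with corr(u1, u2) > 0. Illustrative numbers.
rng = np.random.default_rng(4)
n, b1, b2 = 5000, 1.0, 0.8
x1, x2 = rng.normal(size=n), rng.normal(size=n)
u2 = rng.normal(size=n)
u1 = 0.5 * u2 + 0.8 * rng.normal(size=n)
sel = b2 * x2 + u2 >= 0                       # selection rule
y1 = b1 * x1 + u1                             # observed only when sel

# 1st step: probit of the selection indicator on X2, full sample
X2 = sm.add_constant(x2)
probit = sm.Probit(sel.astype(float), X2).fit(disp=0)
xb = X2 @ probit.params
lam = norm.pdf(xb) / norm.cdf(xb)             # inverse Mills ratio

# 2nd step: OLS of Y1 on X1 and the estimated lambda, selected sample
X1 = sm.add_constant(np.column_stack([x1[sel], lam[sel]]))
print(sm.OLS(y1[sel], X1).fit().params)       # slope on x1 near b1 = 1.0
```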

(3) Olsen (1980):

yi = xiβ + ui, observed if vi ≤ ziγ.

Assume E(ui) = 0, E(vi) = μv, x and z exogenous,

E(uiuj) = σu²  if i = j,
        = 0    otherwise,

and that the conditional expectation of u given v is linear, say E(ui|vi) = (σuv/σv²)(vi − μv).

If we assume further v ~ N(0, 1), then

E(yi | xi, vi ≤ ziγ) = xiβ − σuv φ(ziγ)/Φ(ziγ),

i.e. the selected-sample regression is xiβ plus an inverse-Mills-ratio term.

=> Without assuming normality for the u's, we have arrived at Heckman's specification.

So Heckman’s result does not require bivariate normality, only the normality of the v’s and the linearity of the conditional expectation of u given v.

Bivariate normality is a sufficient condition, but not necessary.
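The point can be checked numerically. In this sketch (my own illustrative setup) u is deliberately non-normal, but v is standard normal and E(u|v) is linear in v, and the Mills-ratio correction still recovers the slope; for simplicity the true γ is used rather than a first-stage estimate.

```python
import numpy as np
from scipy.stats import norm

# Olsen's point: u is NOT normal, but v ~ N(0,1) and E(u|v) is linear,
# so the Mills-ratio correction still works. Illustrative numbers.
rng = np.random.default_rng(5)
n, beta, gamma, suv = 200_000, 1.0, 1.0, 0.6
x = rng.uniform(-1.0, 2.0, n)
v = rng.normal(size=n)
w = rng.exponential(1.0, n) - 1.0   # centered, non-normal, independent of v
u = suv * v + w                     # E(u|v) = suv * v is linear
y = beta * x + u
obs = v <= gamma * x                # selection rule

# regress observed y on x and the correction term -phi(c)/Phi(c), c = gamma*x
c = gamma * x[obs]
mills = -norm.pdf(c) / norm.cdf(c)
X = np.column_stack([np.ones(obs.sum()), x[obs], mills])
coef, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
print("slope estimate:", coef[1], "(true beta = 1.0)")
```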
