Outline for Class Meeting 7 (Chapter 3 (3.2,3.4), Lohr, 2/8/06)

Model Based sampling for auxiliary data, Intro to Stratified sampling

I.  Ratio estimation can be justified from a model-based point of view

A. Consider the following model. The population is a realization from a model of the form

$$Y_i = \beta x_i + \varepsilon_i,$$

where the $\varepsilon_i \sim (0, \sigma^2 x_i)$ are independent. Under this model, $T_y = \sum_{i=1}^{N} Y_i$ is a r.v. and the parameter of interest, $t_y$, is one realization of this r.v. Thus our original estimation problem becomes a prediction problem.

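B. A reasonable predictor of $t_y$ is

$$\hat T_y = \sum_{i \in S} y_i + \hat\beta \sum_{i \notin S} x_i,$$

where $\hat\beta$ is the (weighted) least squares estimator of $\beta$. Observe that

$$\hat\beta = \frac{\sum_{i \in S} y_i}{\sum_{i \in S} x_i} = \frac{\bar y}{\bar x}.$$

Thus

$$\hat T_y = n\bar y + \frac{\bar y}{\bar x} \sum_{i \notin S} x_i = \frac{\bar y}{\bar x} \Big( \sum_{i \in S} x_i + \sum_{i \notin S} x_i \Big) = \frac{\bar y}{\bar x}\, t_x.$$

So the model-based predictor is the same as the randomization-based ratio estimator.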
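The equivalence in B. can be checked numerically. The sketch below assumes the predictor has the usual form (observed sample total plus the fitted slope times the sum of x over the unsampled units) with slope equal to the ratio of sample means. The function names and the simulated population are hypothetical and purely illustrative.

```python
import random

def ratio_total(y_s, x_s, t_x):
    """Randomization-based ratio estimator of the total: (ybar / xbar) * t_x."""
    return sum(y_s) / sum(x_s) * t_x

def model_predictor(y_s, x_s, x_unsampled):
    """Model-based predictor: observed y total plus beta_hat times unsampled x total."""
    beta_hat = sum(y_s) / sum(x_s)  # assumed slope estimate: ybar / xbar
    return sum(y_s) + beta_hat * sum(x_unsampled)

# Simulated population (illustrative only): y roughly proportional to x,
# with error variance proportional to x.
random.seed(1)
N, n = 50, 10
x = [random.uniform(1, 20) for _ in range(N)]
y = [2.5 * xi + random.gauss(0, xi ** 0.5) for xi in x]

# Simple random sample of n unit indices
sample_idx = sorted(random.sample(range(N), n))
in_sample = set(sample_idx)
y_s = [y[i] for i in sample_idx]
x_s = [x[i] for i in sample_idx]
x_out = [x[i] for i in range(N) if i not in in_sample]

t_hat_ratio = ratio_total(y_s, x_s, sum(x))
t_hat_model = model_predictor(y_s, x_s, x_out)
print(t_hat_ratio, t_hat_model)  # the two values agree up to rounding error
```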

C. $E_M[\hat T_y - T_y] = 0$. The estimator is model-unbiased (even though it is not randomization-unbiased).

D. The model-based variance can be shown (see p. 82) to be

$$V_M[\hat T_y - T_y] = \sigma^2\, \frac{\sum_{i \notin S} x_i}{\sum_{i \in S} x_i}\, t_x .$$

This is different from the randomization based variance.
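A sketch of where the model-based variance comes from, assuming the model $Y_i = \beta x_i + \varepsilon_i$ with independent errors of variance $\sigma^2 x_i$ and the slope estimate $\hat\beta = \sum_{i \in S} Y_i / \sum_{i \in S} x_i$ (the notation here is an assumption, chosen to follow the usual textbook treatment). The prediction error involves sampled and unsampled units separately, and the two pieces are independent under the model:

$$\hat T_y - T_y = \hat\beta \sum_{i \notin S} x_i - \sum_{i \notin S} Y_i, \qquad V_M[\hat\beta] = \frac{\sigma^2}{\sum_{i \in S} x_i},$$

$$V_M[\hat T_y - T_y] = \frac{\sigma^2 \big(\sum_{i \notin S} x_i\big)^2}{\sum_{i \in S} x_i} + \sigma^2 \sum_{i \notin S} x_i = \sigma^2\, \frac{\sum_{i \notin S} x_i}{\sum_{i \in S} x_i}\, t_x .$$

The variance depends on which x values happen to be in the sample, not on an average over all possible samples.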

II. Other ways to make use of auxiliary data

A. Regression estimator

Suppose that the best model for the data is not that shown in I., but rather

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$

where the $\varepsilon_i \sim (0, \sigma^2)$ are independent. Then prediction as before, using the least squares estimators of the parameters $\beta_0$ and $\beta_1$, leads to the regression estimator

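$$\hat{\bar y}_{reg} = \hat B_0 + \hat B_1 \bar x_U = \bar y + \hat B_1 (\bar x_U - \bar x)$$

of the population mean, where $\bar x_U$ is the population mean of x and

$$\hat B_1 = \frac{\sum_{i \in S} (x_i - \bar x)(y_i - \bar y)}{\sum_{i \in S} (x_i - \bar x)^2}, \qquad \hat B_0 = \bar y - \hat B_1 \bar x .$$

1. From a randomization-based point of view, this estimator is biased in small samples, and an estimate of its approximate variance is

$$\hat V(\hat{\bar y}_{reg}) = \Big(1 - \frac{n}{N}\Big) \frac{s_e^2}{n},$$

where $s_e^2$ is the sample variance of the residuals $e_i = y_i - (\hat B_0 + \hat B_1 x_i)$.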

2. From a model-based point of view, this estimator is unbiased, with a variance that looks like the variance of a predicted value in ordinary simple linear regression. (See p. 86).
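A minimal sketch of the regression estimator of the population mean and its randomization-based standard error, assuming the usual form ybar + B1 * (xbar_U - xbar) with ordinary least squares coefficients. The function names and the small data set are hypothetical. The residual variance here uses an n - 1 denominator, which is an assumption (regression software typically divides by n - 2).

```python
def regression_mean(y_s, x_s, xbar_U):
    """Regression estimator of the population mean, with the fitted coefficients."""
    n = len(y_s)
    xbar = sum(x_s) / n
    ybar = sum(y_s) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x_s, y_s))
    sxx = sum((xi - xbar) ** 2 for xi in x_s)
    b1 = sxy / sxx           # least squares slope
    b0 = ybar - b1 * xbar    # least squares intercept
    return b0 + b1 * xbar_U, b0, b1

def regression_se(y_s, x_s, b0, b1, N):
    """Approximate standard error: sqrt((1 - n/N) * s_e^2 / n)."""
    n = len(y_s)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x_s, y_s)]
    s2_e = sum(e ** 2 for e in resid) / (n - 1)  # assumption: n - 1 denominator
    return ((1 - n / N) * s2_e / n) ** 0.5

# Illustrative sample of n = 4 from a population of N = 40 with known xbar_U = 3
x_s = [1.0, 2.0, 3.0, 4.0]
y_s = [3.1, 4.9, 7.2, 8.8]
est, b0, b1 = regression_mean(y_s, x_s, xbar_U=3.0)
se = regression_se(y_s, x_s, b0, b1, N=40)
print(est, se)
```

Because the sample mean of x (2.5) is below the known population mean (3), the estimator adjusts the sample mean of y upward along the fitted line.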

B. Difference estimator

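The difference estimator is often used in accounting populations. It is

$$\hat t_{y,diff} = t_x + N(\bar y - \bar x) = t_x + N \bar d .$$

This is an unbiased estimator of $t_y$ and its variance is

$$V(\hat t_{y,diff}) = N^2 \Big(1 - \frac{n}{N}\Big) \frac{S_d^2}{n},$$

where $S_d^2$ is the population variance of the differences $d_i = y_i - x_i$. Under what model is this estimator the best linear unbiased predictor?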
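As an illustration of the difference estimator in an audit setting, the sketch below assumes the form t_x + N * dbar, with the variance estimated by plugging the sample variance of the differences into the SRS variance formula. The function name and the book/audited values are hypothetical.

```python
def difference_total(y_s, x_s, t_x, N):
    """Difference estimator of the total and its estimated variance under SRS."""
    n = len(y_s)
    d = [yi - xi for xi, yi in zip(x_s, y_s)]  # d_i = y_i - x_i
    dbar = sum(d) / n
    t_hat = t_x + N * dbar
    s2_d = sum((di - dbar) ** 2 for di in d) / (n - 1)  # sample variance of d
    var_hat = N ** 2 * (1 - n / N) * s2_d / n
    return t_hat, var_hat

# Hypothetical audit: N = 100 accounts with total book value t_x = 50000;
# x_s are book values and y_s are audited values for a sample of 4 accounts.
x_s = [500.0, 420.0, 610.0, 480.0]
y_s = [500.0, 400.0, 610.0, 470.0]
t_hat, var_hat = difference_total(y_s, x_s, t_x=50000.0, N=100)
print(t_hat, var_hat ** 0.5)
```

When most audited values equal their book values, the differences are mostly zero, so their variance is small and the estimator is precise.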

III. Stratified sampling

When separate samples are selected from each of several subsets of the population (called strata, and defined ahead of time), the sample is said to be a stratified sample. If the sample from each stratum is an SRS, the design is said to be a stratified random sample.

A. Estimators

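1.  Denote by $t_h$ the total for the $h$th stratum. Likewise, all other notation is subscripted by $h$ to indicate that it is for the $h$th of $H$ strata. Thus an unbiased estimator of the population total from a stratified random sample is

$$\hat t_{str} = \sum_{h=1}^{H} \hat t_h = \sum_{h=1}^{H} N_h \bar y_h,$$

and the corresponding estimator of the population mean is

$$\bar y_{str} = \frac{\hat t_{str}}{N} = \sum_{h=1}^{H} \frac{N_h}{N} \bar y_h .$$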

2.  The variance of the stratified estimator is obtained as the sum of the variances across the strata, since sampling is independent from one stratum to the next. Likewise the estimate of the variance is obtained as the sum of the variance estimates across the strata.

3.  A confidence interval for the mean or total can be constructed based on the normal approximation if either the sample sizes within each stratum are large or there are many strata.
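The three points under A. can be sketched together: the stratified total is the sum of the stratum estimates N_h * ybar_h, its variance estimate is the sum of the within-stratum SRS variance estimates, and the interval uses the normal approximation. The function name and data are hypothetical. With only two strata and four units per stratum, as here, the normal approximation would not be justified in practice, so the numbers are purely illustrative.

```python
from statistics import NormalDist, mean, variance

def stratified_total(strata, level=0.95):
    """strata: list of (N_h, sample_values) pairs, one SRS per stratum.

    Returns the estimated total, its estimated variance, and a
    normal-approximation confidence interval.
    """
    t_hat = 0.0
    var_hat = 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        t_hat += N_h * mean(ys)  # stratum total estimate
        # stratum variance estimate with finite population correction
        var_hat += N_h ** 2 * (1 - n_h / N_h) * variance(ys) / n_h
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * var_hat ** 0.5
    return t_hat, var_hat, (t_hat - half_width, t_hat + half_width)

# Two hypothetical strata of sizes 100 and 300
strata = [
    (100, [10, 12, 14, 12]),
    (300, [20, 25, 30, 25]),
]
t_hat, var_hat, ci = stratified_total(strata)
print(t_hat, var_hat, ci)
```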