Outline for Class Meeting 7 (Chapter 3 (3.2,3.4), Lohr, 2/8/06)
Model Based sampling for auxiliary data, Intro to Stratified sampling
I. Ratio estimation can be justified from a model-based point of view
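A. Consider the following model. The population is a realization from a model of the form
$$Y_i = \beta x_i + \varepsilon_i, \quad i = 1, \dots, N,$$
where $\varepsilon_i \sim (0, x_i\sigma^2)$ and the $\varepsilon_i$ are independent. Under this model, $T_y = \sum_{i=1}^{N} Y_i$ is a r.v. and the parameter of interest $t_y$ is one realization of this r.v. Thus our original estimation problem becomes a prediction problem.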
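B. A reasonable predictor of $t_y$ is
$$\hat{T}_y = \sum_{i \in S} y_i + \hat{\beta} \sum_{i \notin S} x_i,$$
where $S$ is the sample and $\hat{\beta}$ is the weighted least squares estimator of $\beta$ (weights $1/x_i$, since the error variance is proportional to $x_i$). Observe that
$$\hat{\beta} = \frac{\sum_{i \in S} y_i}{\sum_{i \in S} x_i} = \frac{\bar{y}}{\bar{x}}.$$
Thus
$$\hat{T}_y = \frac{\bar{y}}{\bar{x}} \left( \sum_{i \in S} x_i + \sum_{i \notin S} x_i \right) = \frac{\bar{y}}{\bar{x}}\, t_x.$$
So the estimator is the same as the randomization-based ratio estimator.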
C. Under the model, the predictor $\hat{T}_y$ of B satisfies $E_M[\hat{T}_y - T_y] = 0$, so the estimator is model-unbiased (even though it is not randomization-unbiased).
D. The model-based variance can be shown (see p. 82) to be
$$V_M[\hat{T}_y - T_y] = \sigma^2\, \frac{t_x \sum_{i \notin S} x_i}{\sum_{i \in S} x_i}.$$
This is different from the randomization-based variance.
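As a rough numerical illustration of section I, the sketch below computes the model-based predictor (observed total plus predicted total for the non-sampled units) and the usual ratio estimator $(\bar{y}/\bar{x})\,t_x$ on a small made-up population, and evaluates the model variance expression $\sigma^2\, t_x \sum_{i \notin S} x_i / \sum_{i \in S} x_i$. The data values, the function names, and the sample are all hypothetical, and $\sigma^2$ is treated as known purely for illustration.

```python
def ratio_predictor(x_pop, y_sample, sample_idx):
    """Observed y total plus beta_hat times the x total of non-sampled units."""
    in_sample = set(sample_idx)
    x_s = sum(x_pop[i] for i in sample_idx)  # x total over the sample
    x_ns = sum(x for i, x in enumerate(x_pop) if i not in in_sample)
    y_s = sum(y_sample)
    beta_hat = y_s / x_s  # least squares with weights 1/x_i
    return y_s + beta_hat * x_ns


def ratio_estimator(x_pop, y_sample, sample_idx):
    """Randomization-based ratio estimator (y_bar / x_bar) * t_x."""
    n = len(sample_idx)
    y_bar = sum(y_sample) / n
    x_bar = sum(x_pop[i] for i in sample_idx) / n
    return y_bar / x_bar * sum(x_pop)


def model_variance(x_pop, sample_idx, sigma2):
    """sigma^2 * t_x * (non-sampled x total) / (sampled x total)."""
    x_s = sum(x_pop[i] for i in sample_idx)
    t_x = sum(x_pop)
    return sigma2 * t_x * (t_x - x_s) / x_s


# Hypothetical population of N = 5 units; x is known for every unit,
# y is observed only for the sampled units 0, 2 and 4.
x_pop = [2.0, 4.0, 6.0, 8.0, 10.0]
sample_idx = [0, 2, 4]
y_sample = [5.0, 11.0, 20.0]

pred = ratio_predictor(x_pop, y_sample, sample_idx)
ratio = ratio_estimator(x_pop, y_sample, sample_idx)
var_m = model_variance(x_pop, sample_idx, sigma2=1.5)  # sigma2 assumed known
print(pred, ratio, var_m)
```

The two estimates agree for any sample, which is the point of I.B; in practice $\sigma^2$ would have to be estimated from the sample.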
II. Other ways to make use of auxiliary data
A. Regression estimator
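Suppose that the best model for the data is not that shown in I., but rather
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
where $\varepsilon_i \sim (0, \sigma^2)$ and the $\varepsilon_i$ are independent. Then prediction as before, using the least squares estimators of the parameters $\beta_0$ and $\beta_1$, leads to the regression estimator
$$\hat{t}_{y,\mathrm{reg}} = N\left[\bar{y} + \hat{\beta}_1 (\bar{x}_U - \bar{x})\right],$$
where $\bar{x}_U = t_x / N$ is the population mean of $x$ and
$$\hat{\beta}_1 = \frac{\sum_{i \in S} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i \in S} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$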
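1. From a randomization-based point of view, this estimator is biased in small samples, and an estimate of its approximate variance is
$$\hat{V}(\hat{t}_{y,\mathrm{reg}}) = N^2 \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n},$$
where $s_e^2$ is the sample variance of the residuals $e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$ from the least squares fit.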
2. From a model-based point of view, this estimator is unbiased, with a variance of the same form as the prediction variance in ordinary simple linear regression (see p. 86).
B. Difference estimator
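The difference estimator is often used in accounting populations. It is
$$\hat{t}_{y,\mathrm{diff}} = t_x + N\bar{d} = t_x + N(\bar{y} - \bar{x}).$$
This is an unbiased estimator of $t_y$ and its variance is
$$V(\hat{t}_{y,\mathrm{diff}}) = N^2 \left(1 - \frac{n}{N}\right) \frac{S_d^2}{n},$$
where $d_i = y_i - x_i$, $\bar{d}$ is the sample mean of the $d_i$, and $S_d^2$ is their population variance. Under what model is this estimator the best linear unbiased predictor?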
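The two estimators of this section can be sketched in a few lines. The code below assumes the regression estimator of the total takes the standard form $N[\bar{y} + \hat{\beta}_1(\bar{x}_U - \bar{x})]$ with $\hat{\beta}_1$ the least squares slope, and the difference estimator the form $t_x + N\bar{d}$ with $d_i = y_i - x_i$. The function names and the data are hypothetical and serve only as an illustration.

```python
def regression_estimator(x_sample, y_sample, t_x, N):
    """Regression estimator of the total: N * [y_bar + b1 * (x_bar_U - x_bar)]."""
    n = len(x_sample)
    x_bar = sum(x_sample) / n
    y_bar = sum(y_sample) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_sample, y_sample))
    sxx = sum((x - x_bar) ** 2 for x in x_sample)
    b1 = sxy / sxx  # least squares slope
    return N * (y_bar + b1 * (t_x / N - x_bar))


def difference_estimator(x_sample, y_sample, t_x, N):
    """Difference estimator of the total: t_x + N * d_bar, with d_i = y_i - x_i."""
    n = len(x_sample)
    d_bar = sum(y - x for x, y in zip(x_sample, y_sample)) / n
    return t_x + N * d_bar


# Hypothetical SRS of n = 4 units from a population of N = 10
# whose auxiliary total t_x = 60 is known.
N = 10
t_x = 60.0
x_sample = [2.0, 4.0, 6.0, 8.0]
y_sample = [3.0, 5.0, 8.0, 10.0]

t_reg = regression_estimator(x_sample, y_sample, t_x, N)
t_diff = difference_estimator(x_sample, y_sample, t_x, N)
print(t_reg, t_diff)
```

The difference estimator is the regression estimator with the slope fixed at 1 rather than estimated, which is why the two results differ whenever the fitted slope is not 1.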
III. Stratified sampling
When separate samples are selected from each of several subsets of the population (defined ahead of time, called strata), the sample is said to be a stratified sample. If the sample from each stratum is an SRS, then the design is said to be a stratified random sample.
A. Estimators
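1. Denote by $t_h$ the total for the hth stratum. Likewise, all other notation is subscripted by h to indicate that it is for the hth of H strata. Thus, with $N_h$ the number of units in stratum h and $\bar{y}_h$ the sample mean there, an unbiased estimator of the population total from a stratified random sample is
$$\hat{t}_{\mathrm{str}} = \sum_{h=1}^{H} \hat{t}_h = \sum_{h=1}^{H} N_h \bar{y}_h$$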
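and the corresponding estimator of the population mean is
$$\bar{y}_{\mathrm{str}} = \sum_{h=1}^{H} \frac{N_h}{N}\, \bar{y}_h, \qquad N = \sum_{h=1}^{H} N_h.$$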
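A minimal sketch of the stratified estimators, assuming the total is estimated by $\sum_h N_h \bar{y}_h$ and the mean by dividing that total by $N = \sum_h N_h$. The stratum sizes, the sample values, and the function names are hypothetical.

```python
def stratified_total(strata):
    """Sum over strata of N_h times the stratum sample mean.

    strata: list of (N_h, sample_values) pairs, one per stratum.
    """
    return sum(N_h * sum(ys) / len(ys) for N_h, ys in strata)


def stratified_mean(strata):
    """Estimated population mean: stratified total divided by N = sum of N_h."""
    N = sum(N_h for N_h, _ in strata)
    return stratified_total(strata) / N


# Two hypothetical strata: (stratum size N_h, SRS of y values from that stratum).
strata = [
    (100, [2.0, 4.0, 6.0]),
    (50, [10.0, 14.0]),
]

t_str = stratified_total(strata)
y_str = stratified_mean(strata)
print(t_str, y_str)
```

Each stratum mean is weighted by its population share $N_h/N$ rather than its sample share, so a stratum that is over- or under-sampled does not distort the estimate.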