Outline for Class meeting 8 (Chapter 4, Lohr, 2/13/06)
Stratified Sampling: Allocation
I. How do you decide how many to sample from each stratum? Two methods are:
- Proportional allocation
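1. In this method, you sample at the same rate from each stratum. Let $n_h = n N_h / N$, so that $n_h / N_h = n / N$ in every stratum. This leads to the estimator $\bar{y}_{str} = \sum_h (N_h/N) \bar{y}_h = \frac{1}{n} \sum_h \sum_{j \in \mathcal{S}_h} y_{hj}$. This estimator is called “self-weighting” because the calculation is done just as for an srs: every sampled unit carries the same weight $N/n$.
2. This leads to variance
$$V_{prop}(\bar{y}_{str}) = \left(1 - \frac{n}{N}\right) \frac{1}{n} \sum_h \frac{N_h}{N} S_h^2 ,$$
where $S_h^2$ is the population variance within stratum $h$.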
- Optimal allocation
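1. In this method, you allocate so as to minimize variance subject to a fixed cost. For a linear cost function ($C = c_0 + \sum_h c_h n_h$), this leads to (hw problem)
$$n_h = n \, \frac{N_h S_h / \sqrt{c_h}}{\sum_l N_l S_l / \sqrt{c_l}} .$$
2. If the costs are the same in every stratum, then
$$n_h = n \, \frac{N_h S_h}{\sum_l N_l S_l} .$$
This is called Neyman allocation, and leads to variance
$$V_{Ney}(\bar{y}_{str}) = \frac{1}{n} \left( \sum_h W_h S_h \right)^2 - \frac{1}{N} \sum_h W_h S_h^2 , \quad W_h = N_h / N .$$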
3. If $S_h$ is also the same in every stratum (with equal costs), then $n_h \propto N_h$, so proportional allocation is optimal.
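The allocation rules in section I can be sketched in Python. This is a minimal illustration, not code from the course: the function names and the three-stratum numbers are hypothetical, the $S_h$ are assumed known, and allocations are left as real numbers (no rounding, no check that $n_h \le N_h$).

```python
import math

def proportional_allocation(n, N_h):
    """n_h = n * N_h / N: the same sampling rate in every stratum."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]

def optimal_allocation(n, N_h, S_h, c_h=None):
    """n_h proportional to N_h * S_h / sqrt(c_h).

    With equal costs (c_h omitted) this is Neyman allocation.
    """
    if c_h is None:
        c_h = [1.0] * len(N_h)
    w = [Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]
    total = sum(w)
    return [n * wi / total for wi in w]

def var_stratified_mean(n_h, N_h, S_h):
    """V(ybar_str) = sum_h (1 - n_h/N_h) * (N_h/N)^2 * S_h^2 / n_h."""
    N = sum(N_h)
    return sum((1 - nh / Nh) * (Nh / N) ** 2 * Sh ** 2 / nh
               for nh, Nh, Sh in zip(n_h, N_h, S_h))

# Hypothetical strata, made up for illustration (not from the notes).
N_h = [400, 300, 300]
S_h = [5.0, 10.0, 40.0]
n = 100

n_prop = proportional_allocation(n, N_h)
n_ney = optimal_allocation(n, N_h, S_h)
v_prop = var_stratified_mean(n_prop, N_h, S_h)
v_ney = var_stratified_mean(n_ney, N_h, S_h)

print("proportional:", n_prop, "variance:", round(v_prop, 4))
print("Neyman:      ", [round(x, 1) for x in n_ney], "variance:", round(v_ney, 4))
```

Passing equal values for `S_h` makes the two allocations coincide, which is item 3 above.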
II. How effective is allocation in improving precision?
- When allocation is proportional
- When allocation is optimal
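III. How much is gained by stratification?
A. ANOVA identity is
$$(N-1) S^2 = \sum_h (N_h - 1) S_h^2 + \sum_h N_h (\bar{y}_{hU} - \bar{y}_U)^2 ,$$
where $\bar{y}_{hU}$ is the population mean of stratum $h$ and $\bar{y}_U$ is the overall population mean.
When the $N_h$'s are large, this means that
$$S^2 \approx \sum_h W_h S_h^2 + \sum_h W_h (\bar{y}_{hU} - \bar{y}_U)^2 , \quad W_h = N_h / N .$$
B. Gain from proportional allocation
$$V_{srs}(\bar{y}) - V_{prop}(\bar{y}_{str}) \approx \left(1 - \frac{n}{N}\right) \frac{1}{n} \sum_h W_h (\bar{y}_{hU} - \bar{y}_U)^2 .$$
So the more diverse are the stratum means, the greater the gain of proportional allocation over srs.
C. 1. Gain from optimal allocation (equal costs, i.e. Neyman allocation)
$$V_{prop}(\bar{y}_{str}) - V_{Ney}(\bar{y}_{str}) = \frac{1}{n} \sum_h W_h (S_h - \bar{S})^2 ,$$
where $\bar{S} = \sum_h W_h S_h$. So the more diverse are the stratum variances, the greater the gain of optimal allocation over proportional allocation.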
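2. Optimal allocation is most often worth the extra effort and information required to execute it when sampling institutions or audit populations, where units differ greatly in size and so the stratum variances differ greatly. It is rarely worth it for demographic surveys.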
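The ANOVA identity and the two gain formulas can be checked numerically. The sketch below uses a made-up toy population (three small strata, hypothetical values) and the variance formulas for the stratified mean. It only verifies the algebra and is not a practical design: with such small strata the Neyman $n_h$ are not realistic sample sizes.

```python
import math
import statistics as st

# Hypothetical toy population: three strata, values made up for illustration.
strata = [
    [2, 4, 6, 8, 10, 12],
    [20, 22, 24, 26],
    [50, 90, 130, 170, 210],
]
n = 5  # hypothetical total sample size

N_h = [len(s) for s in strata]
N = sum(N_h)
W_h = [Nh / N for Nh in N_h]
mean_h = [st.mean(s) for s in strata]
S2_h = [st.variance(s) for s in strata]   # divisor N_h - 1
S_h = [math.sqrt(v) for v in S2_h]

everyone = [y for s in strata for y in s]
mean_U = st.mean(everyone)
S2 = st.variance(everyone)                # divisor N - 1

# ANOVA identity: (N-1) S^2 = within SS + between SS
within = sum((Nh - 1) * v for Nh, v in zip(N_h, S2_h))
between = sum(Nh * (m - mean_U) ** 2 for Nh, m in zip(N_h, mean_h))
anova_gap = (N - 1) * S2 - (within + between)

# Variances of the estimated mean under the three designs
fpc = 1 - n / N
v_srs = fpc * S2 / n
v_prop = fpc / n * sum(W * v for W, v in zip(W_h, S2_h))
S_bar = sum(W * S for W, S in zip(W_h, S_h))
v_ney = S_bar ** 2 / n - sum(W * v for W, v in zip(W_h, S2_h)) / N

# Gain of Neyman over proportional: (1/n) * sum W_h (S_h - S_bar)^2
gain_opt = sum(W * (S - S_bar) ** 2 for W, S in zip(W_h, S_h)) / n

print("ANOVA identity gap:", anova_gap)
print("V_srs, V_prop, V_Ney:", v_srs, v_prop, v_ney)
print("V_prop - V_Ney:", v_prop - v_ney, "formula:", gain_opt)
```

The gain of proportional allocation over srs is only approximate for small $N_h$, so the sketch compares the two variances directly rather than testing that formula for equality.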
D. Examples
1. Example: Suppose we want to estimate the number of churches that have at least one member already volunteering for VNA. We decide to select a stratified sample with two strata: denominations that VNA has marketed to (30%) and those it has not (70%). We have a budget to sample 200 churches.
a. How many should we sample from each stratum?
b. If the proportions preferring TAS in the two strata are 15% and 75%, how much can we expect stratified sampling to help? What is the most it could help?
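One way to work parts a and b numerically is sketched below. Several assumptions are mine, not the notes': the 15% and 75% are taken in the order the strata are listed, the finite population correction is ignored because the number of churches is not given, and $S_h^2$ is approximated by $p_h(1 - p_h)$. The variances are for the estimated proportion; ratios are the same for the estimated total.

```python
import math

W_h = [0.30, 0.70]   # stratum shares from the example
p_h = [0.15, 0.75]   # ASSUMED order: marketed-to, then not marketed-to
n = 200              # budgeted sample size from the example

# a. Proportional allocation: same sampling rate in each stratum
n_prop = [n * W for W in W_h]

# Neyman allocation (equal costs), with S_h ~ sqrt(p_h (1 - p_h))
S_h = [math.sqrt(p * (1 - p)) for p in p_h]
S_bar = sum(W * S for W, S in zip(W_h, S_h))
n_ney = [n * W * S / S_bar for W, S in zip(W_h, S_h)]

# b. Variances of the estimated proportion, fpc ignored
p = sum(W * ph for W, ph in zip(W_h, p_h))
v_srs = p * (1 - p) / n
v_prop = sum(W * S ** 2 for W, S in zip(W_h, S_h)) / n
v_ney = S_bar ** 2 / n

print("proportional n_h:", [round(x, 1) for x in n_prop])
print("Neyman n_h:      ", [round(x, 1) for x in n_ney])
print("V_prop / V_srs:", round(v_prop / v_srs, 3))
print("V_Ney  / V_srs:", round(v_ney / v_srs, 3))
```

Under these assumptions most of the gain comes from the difference in stratum proportions. Neyman allocation adds very little because the two stratum standard deviations are close.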
2. Library data.
Find optimal allocation.
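Library Data

------------------------------- STRAT=a -------------------------------
Variable      N          Mean         Std Dev      Minimum       Maximum
------------------------------------------------------------------------
CIRC        241      17830.93        13053.11            0      49825.00
INQ         241       1091.43         1705.98            0      12845.00
------------------------------------------------------------------------

------------------------------- STRAT=b -------------------------------
Variable      N          Mean         Std Dev      Minimum       Maximum
------------------------------------------------------------------------
CIRC         88     101312.30        39969.69     51409.00     198416.00
INQ          88       6912.24         6657.50            0      31549.00
------------------------------------------------------------------------

------------------------------- STRAT=c -------------------------------
Variable      N          Mean         Std Dev      Minimum       Maximum
------------------------------------------------------------------------
CIRC         40     849420.50      1199338.52    203392.00    6384212.00
INQ          40     245834.30       654235.64      7643.00    3278281.00
------------------------------------------------------------------------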
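A possible way to compute the Neyman allocation for total inquiries (INQ) from the stratum summaries above is sketched here. It assumes that the N column gives the stratum population sizes $N_h$, that the reported standard deviations can stand in for $S_h$, and that costs are equal. The total sample size `n = 50` is hypothetical, since the notes do not state one. Because the raw formula can ask for more units than a stratum contains, the sketch takes every unit in such a stratum and reallocates the remainder, which is a common remedy rather than something prescribed in the notes.

```python
def neyman_allocation(n, N_h, S_h):
    """Neyman allocation n_h proportional to N_h * S_h, with take-all strata.

    If the formula gives n_h > N_h, that stratum is sampled completely and
    the remaining sample is reallocated among the other strata.
    Assumes n <= sum(N_h) and every S_h > 0.
    """
    alloc = [0.0] * len(N_h)
    active = list(range(len(N_h)))
    remaining = n
    while active:
        total = sum(N_h[h] * S_h[h] for h in active)
        trial = {h: remaining * N_h[h] * S_h[h] / total for h in active}
        over = [h for h in active if trial[h] > N_h[h]]
        if not over:
            for h in active:
                alloc[h] = trial[h]
            break
        for h in over:
            alloc[h] = float(N_h[h])
            remaining -= N_h[h]
            active.remove(h)
    return alloc

# Stratum summaries for INQ, copied from the Library Data table
N_h = [241, 88, 40]
S_h = [1705.98, 6657.50, 654235.64]

# Uncapped Neyman shares of the sample, N_h S_h / sum(N_l S_l)
weights = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]
shares = [w / sum(weights) for w in weights]

n = 50  # HYPOTHETICAL total sample size
alloc = neyman_allocation(n, N_h, S_h)

print("Neyman shares:", [round(s, 4) for s in shares])
print("allocation for n = 50:", [round(a, 2) for a in alloc])
```

Stratum c dominates the allocation because its standard deviation is roughly a hundred times that of the other two strata.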
Univariate Procedure
Variable=TOTINQ (Estimate of Total Inquiries from proportionately stratified design)
                    Moments                                  Quantiles(Def=5)
N               1000   Sum Wgts      1000     100% Max   53292479   99%   49486812
Mean        10722503   Sum       1.072E10      75% Q3    11323522   95%   30907707
Std Dev     10933341   Variance  1.195E14      50% Med    5211784   90%   29226820
Skewness    1.412842   Kurtosis  0.945901      25% Q1     3797284   10%    2554879
USS         2.344E17   CSS       1.194E17       0% Min    1437769    5%    2237295
CV          101.9663   Std Mean  345742.6                            1%    1783554
T:Mean=0    31.01296   Pr>|T|      0.0001     Range      51854710
Num ^= 0        1000   Num > 0       1000     Q3-Q1       7526238
M(Sign)          500   Pr>=|M|     0.0001     Mode        1437769
Sgn Rank      250250   Pr>=|S|     0.0001
Variable=TOTINQ
        Histogram                                           #   Boxplot
  5.3E7+*                                                   3      *
       .*                                                   5      *
       .*                                                   3      *
       .
       .
       .
       .
       .
       .*                                                   2      *
       .*                                                   5      *
       .**                                                 12      0
       .********                                           48      0
       .**********                                         57      0
  2.7E7+******                                             33      0
       .**********                                         55      0
       .****                                               19      0
       .
       .
       .
       .
       .*                                                   4      |
       .****                                               19   +--+--+
       .********                                           47   |     |
       .*******************                               113   |     |
       .***********************************************   280   *-----*
       .**********************************************    276   +-----+
1000000+****                                               19      |
        ----+----+----+----+----+----+----+----+----+--
        * may represent up to 6 counts
Univariate Procedure
Variable=TOTINQ (Estimate of total inquiries from “optimally” stratified design)
                    Moments                                  Quantiles(Def=5)
N               1000   Sum Wgts      1000     100% Max   11829638   99%   11591298
Mean        10716359   Sum       1.072E10      75% Q3    10907715   95%   11280504
Std Dev     315980.1   Variance  9.984E10      50% Med   10673863   90%   11138286
Skewness    0.625536   Kurtosis  0.242308      25% Q1    10486892   10%   10340441
USS         1.149E17   CSS       9.974E13       0% Min   10066390    5%   10278613
CV          2.948577   Std Mean  9992.169                            1%   10151051
T:Mean=0    1072.476   Pr>|T|      0.0001     Range       1763248
Num ^= 0        1000   Num > 0       1000     Q3-Q1      420823.3
M(Sign)          500   Pr>=|M|     0.0001     Mode       10066390
Sgn Rank      250250   Pr>=|S|     0.0001
        Histogram                                           #   Boxplot
1.185E7+*                                                   1      0
       .**                                                  6      0
       .*                                                   2      0
       .***                                                 7      0
       .******                                             17      |
       .*****                                              13      |
       .*********                                          25      |
       .****************                                   48      |
       .**********************                             66      |
1.095E7+*************************                          75   +-----+
       .*************************************             109   |     |
       .************************************              106   |  +  |
       .***************************************           115   *-----*
       .***********************************************   141   |     |
       .************************************              106   +-----+
       .*******************************                    93      |
       .****************                                   47      |
       .******                                             18      |
1.005E7+**                                                  5      |
        ----+----+----+----+----+----+----+----+----+--
        * may represent up to 3 counts
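The two sets of simulation moments can be compared directly. The short sketch below only re-uses the means and standard deviations printed in the two Univariate outputs; the variable names are mine. It recomputes each coefficient of variation and the ratio of the empirical variances of the 1000 estimated totals under the two designs.

```python
# Moments copied from the two Univariate Procedure outputs (1000 replicates each)
mean_prop, sd_prop = 10722503.0, 10933341.0   # proportionately stratified design
mean_opt, sd_opt = 10716359.0, 315980.1       # "optimally" stratified design

# Coefficient of variation, in percent, as SAS reports it
cv_prop = 100 * sd_prop / mean_prop
cv_opt = 100 * sd_opt / mean_opt

# Ratio of empirical variances of the estimated total
var_ratio = (sd_prop / sd_opt) ** 2

print("CV proportional (%):", round(cv_prop, 4))
print("CV optimal (%):     ", round(cv_opt, 4))
print("variance ratio, proportional / optimal:", round(var_ratio, 1))
```

The two means are nearly identical, so the designs differ in spread rather than in centre, which is also what the two histograms show.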