STT 825F 02EXAM 1SOLUTIONS
1. (32 points) We want to estimate the mean hourly wage for electricians working in Ingham County. We have a list (available from the State of Michigan) of the 540 licensed electricians with addresses in Ingham Country, and will take a simple random sample from this list.
a. Suppose we know the wages are between $20.00 and $50.00 per hour.
Assume a skewed (right triangle- J shape) for the distribution, and find S for planning the sample size.
S2 = 302/18 = 50, S = 7.07
b. We decided to use an old study value of S = $5.00 for planning. How large a sample is needed to estimate the mean wage to within .75 with 95% confidence?
no = 170.73 = (1.96)2 52/ (.75)2
n* = 129.7, recommend n* = 130
c. If, based on past experience, we think we will get 50% nonresponses, how many questionnaires should we send out to get the precision we want in (b)?
2 (your answer in (b)) = 260
d. Some electricians are not licensed and thus do not appear on the original list. This would be a source of
[sampling errorselection biasmeasurement error ] in our study.
e. Some licensed electricians charge different hourly rates depending on whom they are working for. This would be a source of
[sampling errorselection biasmeasurement error ] in our study.
f. Some of the electricians selected for the sample refused to respond to the survey. This would be a source of
[sampling errorselection biasmeasurement error ] in our study.
g. Some electricians work in Ingham County but don’t live there. This would be a source of
[sampling errorselection biasmeasurement error ] in our study.
h. Samples were taken by two different research firms and they then reported confidence intervals that weren’t the same. This is due to
[sampling error, selection bias, measurement error ].
2. (55 points) Twelve adults signed up for a special fitness class.
Adult Label123456789101112
Age195634293540283325373142
Note summary characteristics for these 12 adults.
Variable N Mean Median TrMean StDev SE Mean
age 12 34.08 33.50 33.40 9.39 2.71
We will take samples of size 4 from this population of 12 adults.
a. How many samples of size 4 are possible from this population of 12 adults?
12C4 = 495
b. Consider the sampling plan to the right:SiP(Si)
1,3,4,101/8****
3,4,7,81/8 31.0
1,2,5,81/435.5
6,9,11,121/234.5
i. Find the probability that unit #8 is selected via this sampling plan.
1/8 + 1/4= 3/8
ii. Find for sample S1.
(19 + 34 + 29 + 37)/4 = 29.75
iii. Find E().
(1/8)(29.75) + 1/8(31) + ¼(34.5) + ½(34.5) = 33.719
iv. Is biased or unbiased under this sampling plan?
biased
#2 continued.
c. Describe the sampling plan which is systematic sampling, samples of size 4, from this population of size 12. (Give all the Si and their P(Si).
SiP(Si)
1,4,7,101/3
2,5,8,111/3
3,6,9,121/3
d. Suppose we take a simple random sample of size 4 from this population of 12 adults.
i. What is the probability of selecting sample {1,2,3,4}?
1/495
ii. Find Var() for the simple random sampling plan. (Note population characteristics on the previous page.)
(1-n/N) S2/n= (1 - 4/12) (9.39)2/4 = 14.695
e. We are thinking of starting the special fitness class for a local athletic club, and want to estimate the mean age of their 500 adult members. Use the characteristics from the population of size 12 given below for planning:
Variable N Mean Median TrMean StDev SE Mean
age 12 34.08 33.50 33.40 9.39 2.71
If we want to estimate the mean age to within 6% of the actual mean age with 95% confidence, how large a sample should we take?
Relative error e = .06CV = 9.39/34.08 = .2755
no = (1.96)2(.2755)2/(.06)2 = 81.01
n* = 69.71, report n*=70.
(iteration with t is also accepted)
f. Your answer in (e) is for a population of size 500. For larger populations (planning with the same parameters values and for the same precision), we need a larger sample size. No matter how large the population, the needed sample size never goes above what value? (Give one number.)
no = 81
3. (57 points) A local green house has 2000 flats with up to 6 plants in each flat. To monitor insect infestation, a simple random sample of 40 flats is taken, the plants are examined, and the number of insects is counted for each plant then totaled to get a count yi for each sample flat.
Sample Statistics:MeanStandard deviation
Number of insects 14.03.2
Number of plants5.40.7
a. What is the sampling unit?
Flat
b. What is the observation unit?
Plant
c. The flats are labeled 1,2,…,2000. Use the portion of the random number table given below to get the first THREE flat labels in the SRS.
23481 41279 04190 82147 59171 90812 75215 57214 59832 14879 64152 51087 23410 89743
1412, 1908, 1275
d. Note the sample statistics above.
Compute a 95% t-confidence interval for the mean number of insects per flat.
14 2.021 (.50) which gives 14 1.00
e. Note the sample statistics given above.
Estimate the mean number of insects per plant (do not find the standard error).
estimate a ratio by 14/5.4 = 2.6
#3 continued.
f. When doing the experiment, a simple random sample of 40 labels was taken from labels {1,2,…,2000}.
Flat #587 was selected, but was damaged and not usable in the experiment. To replace #587,
i. the technician just took Flat #588 instead. Does this produce an SRS?
YESNO
ii. the technician flipped a coin to decide between putting Flat #588 and Flat #586 in the sample.
Does this produce an SRS?
YESNO
iii. the technician randomly selected a flat from those not yet selected. Does this produce an SRS?
YESNO
g. Suppose we sampled every 50th flat in the list of 2000, but started the selection at random. Does this produce an SRS?
YESNO
h. Suppose we decided to sample plants rather than flats.
i. Select a simple random sample of 100 flats, then select 2 plants at random from each selected flat.
Does this produce an SRS of plants?
YESNO
ii. At each stage, select a flat at random, then select a plant at random from those not yet selected in the flat (note that if m plants remain unselected in the flat, each has chance 1/m of being selected at that stage). If all the plants have been selected in a flat, then select another flat at random. Does this produce an SRS of plants?
YESNO
iii. At each stage, select a flat at random, then select a plant position at random from 1-6. If that position is empty or that plant has been selected, then begin selecting from all flats again. Does this produce an SRS of plants?
YESNO
iv. Consider sampling frame _ _ _ _/_ where the first 4 digits identify the flat, and the last digit identifies the plant in the flat. If we take an SRS from this frame, will we get an SRS of plants?
YESNO
======
4. (30 points) We took a simple random sample of 50 stores from a list of 370 stores in the area. Data and summary statistics are reported under Problem #4 on the gold sheet. Sales are in $1000 units.
a. For an SRS of size 50, what is the probability that store #206 would be selected?
SRS is an epsem scheme, and thus, the probability of any single unit being selected = sampling fraction = n/N = 50/370 = .135
b. A 95% confidence interval for the mean sales is given by 53.56 8.88.
i. Use this to get a 95% confidence interval for the total sales.
370 ( CI above) gives 19,817.2 3285.6
ii. The confidence interval in (i) requires approximate normality of the sample mean. Note the histogram of the data. Would you say that n=50 is large enough to use the normal?
YESNO
d. The number of employees was also recorded for each store. Note the scatter diagram and summary statistics given under #4 on the gold sheet.
i. Looking at the scatter diagram for (x,y), which estimator of the total sales would you recommend?
[ ratioregression ]
ii. Compute the ratio estimate of the total sales (note that the average number of employees for the 370 stores is 8.7 per store). Do NOT compute the standard error.
53.56 x (370 x 8.7)/10.34 = 16,674.05 Note that you had to get tx = 370 x 8.7 to use for estimating ty.
iii. Note observation #1 on your gold sheet, and find ei to use if you were computing the standard error of your ratio estimator. Find the ei for only the FIRST observation, not the others!
ei = 55 - 5.18 (14) = - 17.52
======
5. (26 points) We want to estimate the proportion of 370 stores that stay open at night.
a. Define the measurement yi.
= 1 if open at night
= 0 if not open at night
b. Data gave 46 stores out of 50 staying open at night. Compute a 95% confidence interval for the proportion staying open at night. Use the Z-distribution.
sample proportion = 46/50 = .92
.92 (1 - 50/370)1/2[.92(.08)/49]1/2
.92 1.96 (.93) (.0388)
.92 1.96 (.036)
.92 .0707
c. Is the confidence interval in (b) valid? (That is, do the needed assumptions approximately hold?)
Why or why not?
No, n (q-hat) = 50 (.08) = 4 < 5 required to use the normal.
d. We want to plan a similar study about stores in a larger area.
i. If we assume that the proportion staying open at night is more than 85%, what value of p should we plan with?
.85
ii. If we assume that the proportion staying open at nights is below 90%, what value of p should we plan with?
.5
PROBLEM #4
Variable N Mean Median TrMean StDev SE Mean
sales 50 53.56 53.50 53.36 17.67 2.50
number 50 10.340 11.000 10.364 5.542 0.784
Data Display
Obs number sales
1 14 55
2 15 63
3 13 64
4 16 85
5 7 40
6 14 74
etc. up to
50 observations
HISTOGRAM OF SALES DATA
SCATTER DIAGRAM (x=number of employees, y=sales)
1