Math 1680.010
Chapter 21: The Accuracy of Percentages
In the previous chapter, we had the following set-up:
- We had a population with known parameters (e.g., the percentage of voters who voted for Ventura).
- A simple random sample was selected from this population.
- We found the expected percentage and the SE for the sample percentage.
- We used the normal curve to find the chance that the sample percentage is in a prescribed range.
In most polling examples, one critical assumption is violated: we usually do not know the contents of the population. In other words, we do not know how many “1” tickets and how many “0” tickets to place in the box.
Ex. #1: The Sunday before the 1998 Maryland gubernatorial election, a poll showed Gov. Glendening leading – 1,370 of 2,500 supported him. Will the governor be re-elected?
Let’s observe that the population is unknown (we don’t know what every voter in Maryland thinks); only the sample is known.
Wrong Answer: As of Sunday, according to the poll, Glendening had the support of a majority of the voters, so he will win.
Let’s think about what is wrong with this simplistic analysis:
Box Model: ??? 1s and ??? 0s (1 = supports Glendening, 0 = does not); 2,500 draws.
As you can see, we don’t know the fractions of 1s and 0s in the box. Thus, we are not able to find the SD of the box (or the average, for that matter).
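Bootstrap Estimation: We estimate the SD of the box using the sample. In the sample, 1,370/2,500 = 54.8% of the draws are 1s and 45.2% are 0s, so we use these fractions in place of the unknown fractions in the box:
SD of box $\approx \sqrt{0.548 \times 0.452} \approx 0.50$
SE of the number of Glendening voters in the sample $\approx \sqrt{2{,}500} \times 0.50 = 25$
SE of the sample percentage $\approx \frac{25}{2{,}500} \times 100\% = 1\%$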
Thus, the estimate of 54.8% is likely to be off by only 1 percentage point or so, and it is very unlikely to be off by as many as 5 percentage points, which is 5 SEs (how unlikely do you think?). It is a safe bet that the governor will be re-elected.
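The bootstrap steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course materials; the function name `bootstrap_percent_se` is hypothetical. It substitutes the sample fraction of 1s for the unknown fraction in the box, then applies the square root law.

```python
import math

def bootstrap_percent_se(ones: int, draws: int) -> tuple[float, float, float]:
    """Return (sample percentage, estimated SD of box, estimated SE of percentage).

    Bootstrap idea: the box's unknown fraction of 1s is replaced by the
    fraction of 1s observed in the sample.
    """
    p = ones / draws                      # sample fraction of 1s
    sd_box = math.sqrt(p * (1 - p))       # SD of a 0-1 box with fraction p of 1s
    se_count = math.sqrt(draws) * sd_box  # SE for the number of 1s in the sample
    se_percent = se_count / draws * 100   # convert the count SE to percentage points
    return p * 100, sd_box, se_percent

pct, sd, se = bootstrap_percent_se(1370, 2500)
print(f"{pct:.1f}% give or take {se:.2f} percentage points (SD of box ~ {sd:.3f})")
```

The unrounded SE comes out just under 1 percentage point, matching the hand calculation once the SD of the box is rounded to 0.50.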
Ex. #2: A coin is flipped 100 times; it lands heads 57 times. Estimate the percentage of times that the coin will land heads, and attach a standard error to the estimate.
The coin will land heads about ______ of the time, give or take ______ or so.
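One way to get a feel for the "give or take" figure is to simulate. The sketch below assumes, purely for illustration, that the coin's true chance of heads is 0.57 (the sample value). It repeats the 100-flip experiment many times and measures how much the sample percentage varies. The function name and the seed are hypothetical choices.

```python
import random
import statistics

def simulate_sample_percentages(p: float, flips: int, repetitions: int, seed: int = 0) -> list[float]:
    """Flip a coin with heads-chance p `flips` times, and repeat that experiment
    `repetitions` times; return the percentage of heads from each experiment."""
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        heads = sum(1 for _ in range(flips) if rng.random() < p)
        results.append(100 * heads / flips)
    return results

percents = simulate_sample_percentages(p=0.57, flips=100, repetitions=10_000)
print(f"average {statistics.mean(percents):.2f}%, spread {statistics.pstdev(percents):.2f} points")
```

The spread of the simulated percentages should come out close to the SE of about 5 percentage points given by the square root law.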
Ex. #3: A simple random sample of size 400 is taken from all manufacturing establishments in a state. Of these, 16 had 250 employees or more. Estimate the percentage of manufacturing establishments with 250 employees or more. Attach a standard error to this estimate.
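A minimal sketch of the arithmetic for Ex. #3, assuming the same bootstrap approach. It uses the shortcut SE for the percentage = (SD of box / √n) × 100%, which equals √n × SD / n × 100%. The function name `estimate_with_se` is hypothetical.

```python
import math

def estimate_with_se(count: int, n: int) -> tuple[float, float]:
    """Estimate a population percentage from a simple random sample of size n
    in which `count` draws are 1s; the SE uses the bootstrap SD of the box."""
    p = count / n
    se_percent = math.sqrt(p * (1 - p)) / math.sqrt(n) * 100
    return 100 * p, se_percent

est, se = estimate_with_se(16, 400)
print(f"estimate {est:.1f}%, SE about {se:.2f} percentage points")
```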
Ex. #4: A statistician chooses a simple random sample of size 1,000 to determine the popularity of a certain television show. Of these, 308 saw the show. Complete the following table (use NA where appropriate).
Quantity          Known to be     Estimate from the data is
Observed value    ______          ______
Expected value    ______          ______
SE                ______          ______
SD of box         ______          ______
Number of draws   ______          ______
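The entries that can be computed for Ex. #4 can be checked with a short script. This is one possible reading of the table: it assumes the rows refer to the sample percentage (the notes do not say whether the count or the percentage is meant), and it uses `None` for "NA". All variable names are hypothetical.

```python
import math

def fmt(value):
    """Render a table cell: None means "NA"."""
    return "NA" if value is None else f"{value:.2f}"

n, viewers = 1000, 308                         # number of draws, 1s (viewers) in the sample
p_hat = viewers / n                            # sample fraction of viewers
sd_box_est = math.sqrt(p_hat * (1 - p_hat))    # bootstrap estimate of the SD of the box
se_est = sd_box_est / math.sqrt(n) * 100       # SE for the sample percentage, in points

# Each row: (known to be, estimate from the data); None plays the role of "NA".
table = {
    "Observed value": (100 * p_hat, None),
    "Expected value": (None, 100 * p_hat),
    "SE": (None, se_est),
    "SD of box": (None, sd_box_est),
    "Number of draws": (n, None),
}
for row, (known, estimate) in table.items():
    print(f"{row:<16} {fmt(known):>8} {fmt(estimate):>8}")
```

The design choice here is that only quantities read directly off the sample (the observed value and the number of draws) are "known"; everything that describes the box has to be estimated.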
CONFIDENCE INTERVALS
Ex. #5: Earlier we saw that, if a coin is flipped 100 times and lands heads 57 times, the estimated percentage of heads is 57%, with an SE of about 4.95%.
So:
- The interval from 52.05% to 61.95% (57% ± 1 SE) is a “68%-confidence interval” for the chance of the coin landing heads.
- The interval from 47.1% to 66.9% (57% ± 2 SEs) is a “95%-confidence interval” for the chance of the coin landing heads.
- The interval from 42.15% to 71.85% (57% ± 3 SEs) is a “99.7%-confidence interval” for the chance of the coin landing heads.
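The three intervals above all follow one pattern: estimate ± 1, 2, or 3 SEs, labeled with the normal-curve confidence levels 68%, 95%, and 99.7%. A minimal sketch (the function name is hypothetical):

```python
def confidence_intervals(estimate: float, se: float) -> dict[str, tuple[float, float]]:
    """Intervals estimate ± k SE for k = 1, 2, 3, labelled with the
    approximate normal-curve confidence levels used in these notes."""
    levels = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (estimate - k * se, estimate + k * se) for k, label in levels.items()}

for label, (low, high) in confidence_intervals(57, 4.95).items():
    print(f"{label}-confidence interval: {low:.2f}% to {high:.2f}%")
```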
Observations:
1. We do NOT say, “There is a 95% chance that the population parameter lies between 47.1 and 66.9%.” Whatever the population parameter is (currently unknown), it is fixed: either it lies in this range or it does not. Hence the word “confidence” instead of “chance.”
2. The correct interpretation is as follows: if many people each run this experiment and each compute a 95%-confidence interval, then about 95% of these intervals will contain the true population parameter.
3. The normal approximation has been used. As discussed in Chapter 18, a large number of draws is required for this approximation to hold. How many draws are necessary? It depends: if the sample percentage is near 50%, then 100 or so draws will be sufficient; more are needed if the sample percentage is close to 0% or 100%.
4. There is no such thing as a 100%-confidence interval.
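Observations 2 and 3 can both be illustrated by simulation. The sketch below assumes hypothetical true fractions of 0.50 and 0.02, draws 100 tickets many times, builds the interval "sample fraction ± 2 SEs" each time (with the SE estimated from that sample), and counts how often the interval covers the true fraction. The function name and seed are illustrative choices.

```python
import math
import random

def coverage_rate(p: float, draws: int, repetitions: int, seed: int = 1) -> float:
    """Fraction of simulated 95%-confidence intervals (estimate ± 2 SE,
    with the SE estimated from each sample) that cover the true fraction p."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(repetitions):
        ones = sum(rng.random() < p for _ in range(draws))   # one sample of 0-1 draws
        p_hat = ones / draws
        se = math.sqrt(p_hat * (1 - p_hat) / draws)           # bootstrap SE, as a fraction
        if p_hat - 2 * se <= p <= p_hat + 2 * se:
            covered += 1
    return covered / repetitions

print(coverage_rate(0.50, draws=100, repetitions=10_000))  # near 50%: close to 95%
print(coverage_rate(0.02, draws=100, repetitions=10_000))  # near 0%: noticeably lower
```

With a true fraction of 0.02 and only 100 draws, a sample with no 1s at all gives an SE of 0 and an interval that cannot cover the truth, which is one reason more draws are needed near 0% or 100%.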
Ex. #6: Find a 95%-confidence interval for the percentage of Glendening voters in the previously discussed Maryland election poll.
The 95%-confidence interval for the percentage of Glendening voters is from ______ to ______.
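A minimal sketch for checking the Ex. #6 arithmetic, assuming the bootstrap SE from Ex. #1 and the rule that 95% corresponds to about 2 SEs. The function name `percent_ci` is hypothetical.

```python
import math

def percent_ci(ones: int, draws: int, k_se: float) -> tuple[float, float]:
    """Sample percentage ± k_se SEs, with the SE estimated by the bootstrap."""
    p_hat = ones / draws
    se = math.sqrt(p_hat * (1 - p_hat) / draws) * 100   # SE in percentage points
    return 100 * p_hat - k_se * se, 100 * p_hat + k_se * se

low, high = percent_ci(1370, 2500, k_se=2)   # 95% ≈ 2 SEs
print(f"{low:.1f}% to {high:.1f}%")
```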
Ex. #7: The Postmaster of Atlanta found that 44 packages out of a random sample of 650 had insufficient postage. Find a 68%-confidence interval for the probability that a randomly selected package has insufficient postage.
The 68%-confidence interval for the chance of insufficient postage is from ______ to ______.
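For Ex. #7, a 68%-confidence interval is the estimate ± 1 SE. The short script below is an illustrative check under the usual bootstrap assumption; the variable names are hypothetical.

```python
import math

n, short = 650, 44                         # packages sampled, packages with insufficient postage
p_hat = short / n                          # sample fraction with insufficient postage
se_pct = math.sqrt(p_hat * (1 - p_hat) / n) * 100        # SE in percentage points
low, high = 100 * p_hat - se_pct, 100 * p_hat + se_pct   # 68% ≈ 1 SE
print(f"{100 * p_hat:.2f}% ± {se_pct:.2f}: {low:.2f}% to {high:.2f}%")
```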
Ex. #8: Find a 99.7%-confidence interval for the percentage of people who watched the television show in Example #4.
The 99.7%-confidence interval for the percentage of people who watched the show is from ______ to ______.
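For Ex. #8, the 99.7% level corresponds to about 3 SEs. An illustrative check using the Ex. #4 data (variable names are hypothetical):

```python
import math

n, viewers = 1000, 308                     # sample size, number who saw the show
p_hat = viewers / n
se_pct = math.sqrt(p_hat * (1 - p_hat) / n) * 100                # SE in percentage points
low, high = 100 * p_hat - 3 * se_pct, 100 * p_hat + 3 * se_pct   # 99.7% ≈ 3 SEs
print(f"{low:.2f}% to {high:.2f}%")
```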
Ex. #9: In a survey of 500 households, 498 had refrigerators. Find the percentage of households with refrigerators, and find an SE for this percentage. Can we find a 95%-confidence interval for this percentage?
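The computation for Ex. #9 goes through mechanically, but the result is worth inspecting. The sketch below (hypothetical variable names) computes the estimate, the bootstrap SE, and the naive interval "estimate ± 2 SEs", then flags the case where that interval runs past 100%. That is a symptom of the issue in Observation 3: with only 2 households lacking refrigerators, the sample percentage is too close to 100% for the normal approximation to be trusted.

```python
import math

n, with_fridge = 500, 498
p_hat = with_fridge / n
se_pct = math.sqrt(p_hat * (1 - p_hat) / n) * 100                # SE in percentage points
low, high = 100 * p_hat - 2 * se_pct, 100 * p_hat + 2 * se_pct   # naive "95%" interval
print(f"estimate {100 * p_hat:.1f}%, SE {se_pct:.2f}; naive interval {low:.2f}% to {high:.2f}%")
if high > 100:
    print("Upper end exceeds 100%: the normal approximation is not trustworthy here.")
```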
Ex. #10: A simple random sample of 1,000 persons is taken to estimate the percentage of Democrats in a large population. It turns out 543 of these people are Democrats.
A) Find an estimate for the percentage of Democrats in the population and attach a standard error to this estimate.
B) True or false:
i) 54.3% ± 3.2% is a 95%-confidence interval for this population percentage.
ii) 54.3% ± 3.2% is a 95%-confidence interval for the sample percentage.
iii) There is a 95% chance for the percentage of Democrats in the population to be in the range 54.3% ± 3.2%.
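The arithmetic behind part A, and the ± 3.2% that appears in part B (2 SEs), can be checked with a few lines. This is an illustrative sketch with hypothetical variable names, using the bootstrap SE.

```python
import math

n, democrats = 1000, 543
p_hat = democrats / n
se_pct = math.sqrt(p_hat * (1 - p_hat) / n) * 100   # SE in percentage points
print(f"estimate {100 * p_hat:.1f}%, SE about {se_pct:.1f} points; 2 SEs about {2 * se_pct:.1f} points")
```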
Ex. #11: Fill in the blanks with either: box or draws.
Probabilities are used when reasoning from the ______ to the ______. Confidence levels are used when reasoning from the ______ to the ______.
Ex. #12: Fill in the blank with either: observed or expected.
The chance error is in the ______ value.
Ex. #13: Fill in the blank with either: sample or population.
The confidence level is for the ______ percentage.
Definition: Margin of Error. In an election poll, the margin of error is twice the SE for the percentage.
Ex. #14: A Gallup poll pre-election survey based on 1,000 people estimates that Bush will win 72% of the vote. Find the margin of error in this survey.
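The definition of the margin of error translates directly into code. This is a minimal sketch (the function name is hypothetical) that assumes a simple random sample and estimates the SD of the box from the sample percentage; it does not reflect the more complex design of the actual Gallup poll.

```python
import math

def margin_of_error(percent: float, n: int) -> float:
    """Margin of error (twice the SE for the percentage), in percentage points,
    for a simple random sample of size n; the SD of the box is estimated
    from the sample percentage."""
    p = percent / 100
    se_pct = math.sqrt(p * (1 - p) / n) * 100
    return 2 * se_pct

print(f"{margin_of_error(72, 1000):.1f} percentage points")
```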
Observation: This method of calculating the margin of error only applies to simple random samples. More complicated formulas (beyond the scope of this class) exist for the stratified methods of the Gallup poll.