1. The Empirical Rule
a. Suppose that the mean SATM score for seniors in Georgia was 550 with a standard deviation of 50 points. Consider a simple random sample of 100 Georgia seniors who take the SAT. Describe the distribution of the sample mean scores.
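Solution: Because the sample size (100) is large, the distribution of the sample means should be approximately normal. In addition, more than 1000 seniors take the SAT, so the population is at least 10 times the sample size, which justifies the standard deviation formula used in part (b).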
b. What are the mean and standard deviation of this sampling distribution?
Solution: The mean is 550 and the standard deviation is 50/sqrt(100) = 5.
c. Use the Empirical Rule to determine between what two scores 68% of the data falls, 95% of the data falls, and 99.7% of the data falls.
Solutions:
- 68% of the data would be one standard deviation on either side of the mean, so (550 – 5, 550 + 5) = (545, 555)
- 95% of the data would be within two standard deviations of the mean or (540, 560).
- 99.7% of the data is within three standard deviations or (535, 565).
For the 95% interval, this means that in 95% of all samples of 100 students from this population, the mean score for the sample will fall within ___ standard deviations of the true population mean or ____ points from the mean.
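The arithmetic in parts (b) and (c) can be checked with a short Python sketch. The function names below are illustrative and are not part of the activity; the numbers are the ones given in the problem.

```python
import math

def sampling_sd(population_sd, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return population_sd / math.sqrt(n)

def empirical_intervals(mean, sd):
    """Intervals within 1, 2, and 3 standard deviations of the mean."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%"))}

sd_xbar = sampling_sd(50, 100)            # 50 / 10 = 5 points
intervals = empirical_intervals(550, sd_xbar)
for pct, (low, high) in intervals.items():
    print(f"{pct}: ({low:g}, {high:g})")
```

Changing the sample size in `sampling_sd` shows how larger samples narrow every interval.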
2. Confidence Intervals
In the above problem, we took the mean and added/subtracted a certain number of standard deviations. That is, we calculated 550 ± 2(5) for the 95% interval and 550 ± 3(5) for the 99.7% interval.
The interval of numbers found, i.e., (540, 560), is called a 95% confidence interval for the population mean.
Above, we knew the population mean, but in practice, we often do not. So we take samples and create confidence intervals as a method of estimating the true value of the parameter. When we find a 95% confidence interval, we believe with 95% confidence that the true parameter falls within our interval. However, we must accept that 5% of all samples will give intervals which do not include the parameter.
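The claim that about 5% of samples give intervals that miss the parameter can be explored with a simulation. The sketch below assumes, for illustration only, that individual scores are normally distributed with mean 550 and standard deviation 50; the number of samples and the random seed are arbitrary choices.

```python
import random

MU, SIGMA, N = 550, 50, 100      # population values from Problem 1
SD_XBAR = SIGMA / N ** 0.5       # standard deviation of the sample mean (5)

def coverage(num_samples=2000, seed=1):
    """Fraction of samples whose interval (xbar - 2 SD, xbar + 2 SD) contains MU."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # Assumption: each score is drawn from a normal population.
        xbar = sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
        if xbar - 2 * SD_XBAR <= MU <= xbar + 2 * SD_XBAR:
            hits += 1
    return hits / num_samples

rate = coverage()
print(f"{rate:.1%} of intervals captured the population mean")
```

The fraction should land near 95%, and the remaining samples are the ones whose intervals miss the true mean.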
Every confidence interval takes the same shape: estimate ± margin of error. In the interval (540, 560), the margin of error is 10.
The margin of error has two main components: the number of standard deviations from the mean (i.e., the z-score) and the standard deviation. (Margin of error = z · σ.)
Because we do not usually know the details of a population parameter (e.g., mean and standard deviation), we must use estimates of these values. So our margin of error becomes m = z(σ_estimate), where σ_estimate is the standard deviation of the estimate. Therefore, the confidence interval becomes
estimate ± margin of error = estimate ± z(σ_estimate).
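As a quick illustration of this form, the following Python sketch builds an interval from an estimate, a z-score, and the standard deviation of the estimate. The function name is hypothetical; the example values are taken from Problem 1.

```python
def confidence_interval(estimate, z, sd_of_estimate):
    """Return (lower, upper, margin) for estimate ± z * sd_of_estimate."""
    margin = z * sd_of_estimate
    return estimate - margin, estimate + margin, margin

# Values from Problem 1: estimate 550, z = 2, standard deviation 5.
low, high, margin = confidence_interval(550, 2, 5)
print(f"({low}, {high}) with margin of error {margin}")
```

With these inputs the function reproduces the interval (540, 560) and the margin of error of 10.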
The z-score used in the confidence interval depends on how confident one wants to be. There are a few common levels of confidence used in practice: 90%, 95%, and 99%.
The Empirical Rule provides estimates for the amount of data within specified numbers of standard deviations, and therefore, can help us find approximate intervals for being 68%, 95%, and 99.7% confident that we have included the true population parameter. Let’s find closer estimates for the number of standard deviations from the mean within which certain percentages of data lie.
- Within how many standard deviations of the mean would one locate the middle 95% of the data? (Hint: Draw a picture and use the normal table or invNorm on your calculator.)
Solution: This is a question about z-scores. With either method, reason as follows: If 95% is in the middle, then 2.5% is on each end, so the area under each tail is 0.025. In the standard normal table, an area of 0.025 corresponds to a z-score of -1.96. Because the curve is symmetric, the upper z-score is 1.96. To use the invNorm feature, first determine the lower tail area as above (0.025), then input invNorm(.025). The output is approximately -1.96. One could also input invNorm(.975) to get the upper z-score, but this is not necessary because of the symmetry of the curve.
- Within how many standard deviations of the mean would one locate the middle 90% of the data? (Hint: Draw a picture and use the normal table or invNorm on your calculator.)
Solution: If 90% is in the middle, then 5% is on each end, so the area under each tail is 0.05. In the standard normal table, an area of 0.05 is halfway between the area of 0.0495 (z-score of -1.65) and the area of 0.0505 (z-score of -1.64). We can reason that the appropriate lower z-score is -1.645. Because the curve is symmetric, the upper z-score is 1.645. To use the invNorm feature, first determine the lower tail area as above (0.05), then input invNorm(.05). The output is approximately -1.645. One could also input invNorm(.95) to get the upper z-score, but this is not necessary because of the symmetry of the curve.
- For the common confidence levels, then, we have the following z-scores, called z*. Complete the following table using your answers above.
Confidence level / z*
90% / ____
95% / ____
99% / 2.576
Solutions: 1.645; 1.96
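The table values can also be checked outside the calculator. The sketch below uses Python's standard-library `statistics.NormalDist().inv_cdf`, which plays the same role as invNorm for the standard normal curve; the helper name `z_star` is illustrative.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """z-score that leaves (1 - confidence_level) / 2 in each tail."""
    tail = (1 - confidence_level) / 2
    # inv_cdf(tail) gives the lower (negative) z-score; negate it by symmetry.
    return -NormalDist().inv_cdf(tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
```

Rounded to three decimal places, the results agree with the 1.645, 1.96, and 2.576 used in the table.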
For any confidence intervals you are expected to compute by hand in Math 4, you will use these z* values. Thus, our final form of the confidence interval is estimate ± z*(σ_estimate).
You will continue investigating confidence intervals and, specifically, margin of error, through the next two activities.