1. The Empirical Rule

a. Suppose that the mean SATM score for seniors in Georgia is 550 with a standard deviation of 50 points. Consider a simple random sample of 100 Georgia seniors who take the SAT. Describe the distribution of the sample mean scores.

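Solution: Because the sample size (n = 100) is large, the Central Limit Theorem tells us that the distribution of sample mean scores is approximately normal. Because far more than 10 × 100 = 1000 Georgia seniors take the SAT, the scores in the sample can be treated as independent.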

b. What are the mean and standard deviation of this sampling distribution?

Solution: The mean is 550, and the standard deviation is 50/√100 = 50/10 = 5.
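The calculation in part (b) can be checked with a minimal Python sketch. It assumes only the values from part (a) (population mean 550, standard deviation 50, sample size 100); the function name `sampling_distribution` is hypothetical.

```python
import math


def sampling_distribution(mu: float, sigma: float, n: int) -> tuple[float, float]:
    """Return (mean, standard deviation) of the sampling distribution of the sample mean."""
    # The sample mean is centered at the population mean.
    mean = mu
    # Its spread shrinks by a factor of sqrt(n).
    sd = sigma / math.sqrt(n)
    return mean, sd


mean, sd = sampling_distribution(550, 50, 100)
print(mean, sd)  # 550 5.0
```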

c. Use the Empirical Rule to determine between which two scores 68%, 95%, and 99.7% of the sample means fall.

Solutions:

68% of the sample means lie within one standard deviation of the mean: (550 - 5, 550 + 5) = (545, 555).
95% of the sample means lie within two standard deviations of the mean: (550 - 10, 550 + 10) = (540, 560).
99.7% of the sample means lie within three standard deviations of the mean: (550 - 15, 550 + 15) = (535, 565).
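The three intervals follow one pattern (mean ± k standard deviations for k = 1, 2, 3), which this short Python sketch makes explicit. The function name `empirical_rule_intervals` is hypothetical, and the inputs are the mean and standard deviation from part (b).

```python
def empirical_rule_intervals(mean: float, sd: float) -> dict[str, tuple[float, float]]:
    """Intervals within 1, 2, and 3 standard deviations of the mean, keyed by Empirical Rule coverage."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (mean - k * sd, mean + k * sd) for k, label in coverage.items()}


for label, (lo, hi) in empirical_rule_intervals(550, 5).items():
    print(f"{label}: ({lo}, {hi})")
# 68%: (545, 555)
# 95%: (540, 560)
# 99.7%: (535, 565)
```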

For the 95% interval, this means that in 95% of all samples of 100 students from this population, the mean score for the sample will fall within ___ standard deviations of the true population mean or ____ points from the mean.

2. Confidence Intervals

In the problem above, we took the mean and added and subtracted a certain number of standard deviations. That is, we calculated 550 ± 2(5) for the 95% interval and 550 ± 3(5) for the 99.7% interval.

The interval found, (540, 560), is called a 95% confidence interval for the population mean.

Above, we knew the population mean, but in practice we usually do not. Instead, we take a sample and construct a confidence interval to estimate the true value of the parameter. A 95% confidence level means that the method captures the true parameter in about 95% of all possible samples, so we are 95% confident that our particular interval contains it. We must also accept that about 5% of all samples give intervals that do not include the parameter.
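To make the "about 95% of all samples" idea concrete, here is a small Python simulation. It is a sketch under stated assumptions: the population is taken to be normal with mean 550 and standard deviation 50 (matching Part 1), the population standard deviation is treated as known, and each interval is the sample mean ± 2 standard deviations of the sample mean (2 × 5 = 10 points), as in the Empirical Rule interval above. The function name `coverage_rate`, the number of trials, and the seed are illustrative choices.

```python
import math
import random


def coverage_rate(mu=550, sigma=50, n=100, z=2, trials=2000, seed=1):
    """Fraction of simulated samples whose interval xbar ± z*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    margin = z * sigma / math.sqrt(n)  # 2 * 50 / 10 = 10 points
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


print(coverage_rate())  # close to 0.95
```

Because 2 is slightly larger than the exact 95% value, the long-run capture rate with z = 2 is about 95.4%, so a simulated rate a little above 0.95 is expected.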

Every confidence interval has the same form: estimate ± margin of error. In the interval (540, 560), the estimate is 550 and the margin of error is 10.

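The margin of error has two main components: the number of standard deviations from the mean (the z-score) and the standard deviation of the sampling distribution, $\sigma/\sqrt{n}$. That is, $m = z \cdot \frac{\sigma}{\sqrt{n}}$; in the example above, $m = 2 \cdot 5 = 10$.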
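The margin-of-error formula for a sample mean with a known population standard deviation translates directly into code. This is a minimal Python sketch; the function name `margin_of_error` is hypothetical, and the example uses the Part 1 values (z = 2, σ = 50, n = 100).

```python
import math


def margin_of_error(z: float, sigma: float, n: int) -> float:
    """Margin of error m = z * sigma / sqrt(n) for a sample mean with known sigma."""
    return z * sigma / math.sqrt(n)


# Example from Part 1: z = 2, sigma = 50, n = 100.
print(margin_of_error(2, 50, 100))  # 10.0
```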

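Because we usually do not know the population parameters (e.g., the mean and standard deviation), we must use estimates of these values. Our margin of error then becomes $m = z \cdot \sigma_{\text{estimate}}$, where $\sigma_{\text{estimate}}$ is the estimated standard deviation of the estimate (for a sample mean, $s/\sqrt{n}$, where $s$ is the sample standard deviation). Therefore, the confidence interval becomes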

estimate margin of error  estimate  z(estimate).

The z-score used in the confidence interval depends on how confident one wants to be. There are a few common levels of confidence used in practice: 90%, 95%, and 99%.

The Empirical Rule gives approximate percentages of data within 1, 2, and 3 standard deviations of the mean, so it can help us find approximate intervals for being 68%, 95%, and 99.7% confident that we have captured the true population parameter. Let's find more precise values for the number of standard deviations from the mean within which certain percentages of the data lie.

Within how many standard deviations of the mean would one locate the middle 95% of the data? (Hint: Draw a picture and use the normal table or invNorm on your calculator.)

Solution: This is a question about z-scores. With either tool, start from the following: if 95% of the area is in the middle, then 2.5% is in each tail, so the area in each tail is 0.025. In the standard normal table, an area of .025 corresponds exactly to the z-score -1.96. Because the curve is symmetric, the upper z-score is 1.96. With invNorm, first determine the lower-tail area as above (0.025), then enter invNorm(.025). The output is approximately -1.96. One could also enter invNorm(.975) to get the upper z-score; however, this is not necessary because the curve is symmetric. So the middle 95% of the data lies within 1.96 standard deviations of the mean.
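Outside a TI calculator, Python's standard library offers a counterpart to invNorm: `statistics.NormalDist().inv_cdf` (Python 3.8 or later). This sketch swaps in that function for the calculator and checks that the two z-scores enclose the middle 95% of the area.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal curve

lower = std_normal.inv_cdf(0.025)  # like invNorm(.025)
upper = std_normal.inv_cdf(0.975)  # like invNorm(.975)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96

# Check: the area between the two z-scores is the middle 95%.
print(round(std_normal.cdf(upper) - std_normal.cdf(lower), 4))  # 0.95
```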

Within how many standard deviations of the mean would one locate the middle 90% of the data? (Hint: Draw a picture and use the normal table or invNorm on your calculator.)

Solution: If 90% of the area is in the middle, then 5% is in each tail, so the area in each tail is 0.05. In the standard normal table, an area of .05 is halfway between the area .0495 (z-score -1.65) and the area .0505 (z-score -1.64). We can reason that the appropriate lower z-score is -1.645. Because the curve is symmetric, the upper z-score is 1.645. With invNorm, first determine the lower-tail area as above (0.05), then enter invNorm(.05). The output is approximately -1.645. One could also enter invNorm(.95) to get the upper z-score; however, this is not necessary because the curve is symmetric. So the middle 90% of the data lies within 1.645 standard deviations of the mean.
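The "halfway" reasoning is linear interpolation between two table rows. This Python sketch reproduces it using table areas rounded to four places (as in a printed table) and compares the result with `statistics.NormalDist().inv_cdf`, used here as a stand-in for invNorm. The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Table areas bracketing a lower-tail area of 0.05
print(round(std_normal.cdf(-1.65), 4))  # 0.0495
print(round(std_normal.cdf(-1.64), 4))  # 0.0505

# Linear interpolation between the two rows (areas rounded to 4 places)
z1, a1 = -1.65, 0.0495
z2, a2 = -1.64, 0.0505
z_interp = z1 + (0.05 - a1) * (z2 - z1) / (a2 - a1)
print(round(z_interp, 3))  # -1.645

# Equivalent of invNorm(.05)
print(round(std_normal.inv_cdf(0.05), 3))  # -1.645
```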

For the common confidence levels, then, we have the following z-scores, called z*. Complete the following table using your answers above.

Confidence level / z*
90% / ____
95% / ____
99% / 2.576

Solutions: 90%: z* = 1.645; 95%: z* = 1.96
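The whole table can be generated from one rule: put half of the leftover area, (1 - confidence)/2, in each tail and look up the upper z-score. This Python sketch assumes `statistics.NormalDist` (Python 3.8 or later) in place of a table or invNorm; the function name `z_star` is hypothetical.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Critical value z* with `confidence` of the standard normal area between -z* and z*."""
    upper_tail = (1 - confidence) / 2  # area in each tail
    return NormalDist().inv_cdf(1 - upper_tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```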

For any confidence interval you are expected to compute by hand in Math 4, you will use these z* values. Thus, the final form of the confidence interval is $\text{estimate} \pm z^* \cdot \sigma_{\text{estimate}}$.

You will continue investigating confidence intervals and, specifically, margin of error, through the next two activities.