Slide 1
Welcome to unit six, lesson one: confidence intervals. Using techniques of statistical inference, we draw conclusions about a population based on our sample data. So we are drawing inferences about our population using our sample. Statistical inferences provide a statement of how much confidence we can place in our conclusions. In this lesson we are going to examine one type of statistical inference referred to as confidence intervals. In the next lesson, we will examine tests of significance. A confidence interval is a good descriptive tool to present along with the results of a test, so the two go hand in hand.

Slide 2

Suppose we are interested in estimating the mean diastolic blood pressure for women between the ages of 50 and 60 living in North Carolina. We’d take a simple random sample of, say, 200 women and measure their blood pressure. Suppose the average diastolic blood pressure of these women was 117 mm of mercury. Now what can we say about the diastolic blood pressure of the population? Remember, we’ve only measured one random sample, but we’d like to make an inference about the population. We know that X bar, the sample mean, is an unbiased estimate of mu, the population mean. But what do we know about its reliability? How confident are we that this value of 117 is close to the true population mean? A second sample might produce a mean of 115 or 120 millimeters of mercury.

Slide 3

Questions about variation are answered by examining the spread of the sample means. The population has a mean of mu and a standard deviation of sigma. The sample mean follows a distribution with a mean of mu and a standard deviation of sigma divided by the square root of n. If the population of diastolic blood pressure for women 50 to 60 years of age has mean mu and standard deviation sigma, then in repeated samples of size 200 the sample mean has a normal distribution with mean mu and standard deviation sigma divided by the square root of 200. So sigma is the standard deviation of the observations within a population, and sigma divided by the square root of n is the standard deviation of a group of means. We can estimate sigma by finding the standard deviation of the observations within a sample, and we can estimate sigma divided by the square root of n by finding the standard deviation of a group of sample means.
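
To see this relationship in action, here is a minimal simulation sketch using only Python's standard library. The population values are made up for illustration (in practice mu is unknown), and the function name sd_of_sample_means is just a label chosen for this sketch.

```python
import math
import random
import statistics

def sd_of_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n from Normal(mu, sigma) and
    return the standard deviation of their sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

# Hypothetical population values, chosen only for illustration.
mu, sigma, n = 117.0, 50.0, 200
observed = sd_of_sample_means(mu, sigma, n, num_samples=2000)
theoretical = sigma / math.sqrt(n)
print(f"SD of sample means: {observed:.2f}")
print(f"sigma / sqrt(n):    {theoretical:.2f}")
```

The two printed numbers should be close to each other, which is the point: the spread of a group of sample means is much smaller than the spread of individual observations.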

Slide 4

Now suppose we know sigma, the population standard deviation. This might not be realistic, but later lessons will address this problem. Suppose sigma equals 50 millimeters of mercury. Since the sample size is 200, the standard deviation of the mean is 50 divided by the square root of 200, which is about 3.5 millimeters of mercury. The standard deviation of the mean is frequently referred to as the standard error of the mean, or just the standard error.
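
As a quick check of that arithmetic, a short sketch in Python, using the sigma of 50 and sample size of 200 from the blood pressure example:

```python
import math

sigma = 50.0   # population standard deviation assumed in the lecture (mm Hg)
n = 200        # sample size from the blood pressure example

standard_error = sigma / math.sqrt(n)
print(f"standard error = {standard_error:.2f} mm Hg")  # about 3.54
```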

Slide 5

Remember that in the normal distribution about 95 percent of the probability lies within two standard deviations of the mean. Since the standard error of the mean is 3.5, twice the standard error of the mean is 7, two times 3.5. To say that the sample mean lies within 7 units of the population mean is the same as saying that the population mean lies within 7 units of the sample mean. So about 95 percent of all samples will capture the true population mean in the interval from the sample mean minus 7 to the sample mean plus 7, which for our sample is 117 minus 7 to 117 plus 7.

Slide 6

Now we’re ready to construct a confidence interval. We say that we are 95 percent confident that the unknown mean diastolic blood pressure for 50 to 60 year old female North Carolina residents is between 117 minus twice the standard error of the mean and 117 plus twice the standard error of the mean. Notice the third set of parentheses here on your slide. This is the notation we use most often when displaying confidence intervals: it presents the low number, then the high number. We interpret the interval as saying we are 95 percent confident that the unknown mean diastolic blood pressure lies between 110 and 124 millimeters of mercury. In other words, if we repeated the sampling procedure many times, we would expect 95 percent of the resulting intervals to contain the true mean.
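
The "repeated sampling" interpretation can be illustrated with a rough simulation sketch. This assumes a normally distributed population with a made-up true mean of 117 (in real life the true mean is unknown), and the function name coverage is only a label for this example.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, z, num_samples, seed=1):
    """Fraction of intervals x_bar +/- z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / num_samples

# mu is a made-up "true" mean; in practice it is unknown.
rate = coverage(mu=117.0, sigma=50.0, n=200, z=2.0, num_samples=2000)
print(f"Intervals containing the true mean: {rate:.1%}")
```

Each simulated sample gets its own interval, and roughly 95 percent of those intervals should contain the true mean. Any single interval either contains it or does not.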

Slide 7

A confidence interval has the form of an estimate plus or minus a margin of error. The margin of error for a 95 percent confidence interval, in this case, is twice the standard error. It shows how accurate we believe our guess is, based on the variability of the estimate. The plus and minus symbol is a shorthand way of indicating that we subtract the margin of error from our estimate to obtain the lower bound for our confidence interval, and then we add the margin of error to our estimate to obtain the upper bound.

Slide 8

In the previous example, we used the fact that the sampling distribution of the sample mean X bar is approximately normal, with a mean of mu and a standard deviation of sigma divided by the square root of n, where n is the sample size. The mean of diastolic blood pressure is exactly normally distributed if diastolic blood pressure itself is normally distributed. So if the data are distributed normally, then the distribution of the means from that data is also normal. Even if the distribution of diastolic blood pressure is not normal, the central limit theorem from unit five states that with large samples the mean is approximately normally distributed. By large sample we mean about 30 or more. We will see a demonstration of the central limit theorem in the activity for this lesson. We need our estimates to follow a normal distribution so that we can easily compute confidence intervals. So it’s nice to have data that has a normal distribution.

Slide 9

In constructing the confidence interval for the mean diastolic blood pressure, we used the fact that about 95 percent of the area under the normal curve is located within plus and minus two standard deviations of the mean. This was an approximation. Above is the standard normal curve, and we want area C to equal 95 percent, or .95. The curve is symmetric, so if you fold it in half it’s a mirror image on either side, and the total area under the curve must equal one. So we know we have .025 area to the right of C and .025 area to the left of C: if you have 95 percent in the middle, you have 5 percent left over, with half of it, 2.5 percent, in one tail and the other half in the other tail. Notice that the area to the left of C equals one minus C, all divided by two, and one minus .95, divided by two, equals .025. We are going to make use of this relationship very often.

Slide 10

Now we find the value from the standard normal distribution with .025 area to the left of it. We look at table T-2 in your text. Part of this table is reproduced in the slide here. Look for the value .025 in the body of the table. This area is associated with a Z value of minus 1.96. We read the ones place and the tenths place of the Z statistic from the leftmost column, so you get the minus 1.9 from that first column. Then you read across, and when you read across you’re reading the hundredths place of the Z statistic from the top row, so you go to the .06 column. Since the distribution is symmetric, the matching value on the right side is Z equals 1.96. So 95 percent of the area under the standard normal curve is between plus and minus 1.96 standard deviations of the mean. The table here is telling you that a Z value of negative 1.96 has two and a half percent, or .025, to the left of it. And that’s how much we want in our little left tail if we are going to have 95 percent in the middle.

Slide 11

We’re often interested in having area C equal to 95 percent. However, we need to be able to determine the Z value associated with any area, not just 95 percent. To do this you can use the method described in the previous slide. Here are some of the most commonly used values. If you want to be 90 percent confident, that is, if you want to have 90 percent in the middle of your curve, that means you have ten percent left over and you have five percent in each tail. A Z value with five percent to the right of it is 1.645, and the one with five percent to the left of it is minus 1.645. As you saw before, when you want 95 percent confidence, with 95 percent of the area in the middle and two and a half percent left in each tail, that is a Z of 1.96, and if you want to be 99 percent confident your Z will be 2.576. So rather than looking them up every time, it’s kind of nice to memorize those or have those values written down, because those are the ones that are most commonly used.
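
If you have software handy instead of the printed table, the same lookup can be sketched with Python's standard library. The helper name z_star is just a label for this example; it converts a confidence level to the tail area of one minus C divided by two and then asks the standard normal distribution for the matching Z value.

```python
from statistics import NormalDist

def z_star(confidence):
    """Z value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_star(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```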

Slide 12

Now we can state the general strategy for finding a confidence interval for a population mean. Choose a simple random sample of size n from a population having an unknown mean mu and a known standard deviation sigma. A level C confidence interval for mu is, as you see in the formula on your slide, X bar plus or minus Z times sigma divided by the square root of n, where Z is the value that has area C between minus Z and plus Z under the standard normal curve. This interval is exact when the population distribution is normal and is approximately correct for large n in other cases, that is, if you had data from other distributions. As long as you have that larger sample size, 30 or more, this equation will work fine for you.
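
Written out in symbols, the interval just described looks like the following. The z-star notation for the critical value is an assumption here, since the slide itself is not reproduced in this transcript.

```latex
\bar{x} \;\pm\; z^{*}\,\frac{\sigma}{\sqrt{n}}
\qquad\text{that is,}\qquad
\left(\bar{x} - z^{*}\frac{\sigma}{\sqrt{n}},\;\; \bar{x} + z^{*}\frac{\sigma}{\sqrt{n}}\right)
```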

Slide 13

Now let’s see an example. Suppose you want a 90 percent confidence interval for the mean height of tenth grade males in Chapel Hill High School. You take a simple random sample of 30 tenth grade males. We assume the population standard deviation is known to be 5 inches. The sample mean is 64 inches. Using the methods discussed previously, you can determine that the Z value associated with 90 percent confidence is 1.645. Now you can substitute the numbers into the equation: the margin of error is 1.645 times 5 divided by the square root of 30, which is about 1.5. The 90 percent confidence interval is 62.5 inches to 65.5 inches. In other words, we are 90 percent confident that the true mean height of tenth grade males in Chapel Hill High School is between 62.5 inches and 65.5 inches.
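
The substitution can be sketched in a few lines of Python. The function name z_confidence_interval is made up for this illustration, and it assumes, as the lecture does, that sigma is known.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Height example: sample mean 64 in, sigma 5 in, n = 30, 90% confidence.
low, high = z_confidence_interval(x_bar=64.0, sigma=5.0, n=30, confidence=0.90)
print(f"90% CI: ({low:.1f}, {high:.1f}) inches")  # (62.5, 65.5)
```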

Slide 14

It is desirable to have a narrow confidence interval because that means our estimate is more precise; there is less variability in it. There are three ways you can decrease the width of the confidence interval. You can lower the level of confidence, or you can increase the sample size. Those are the first two ways. You can also reduce the population variability, but it’s less obvious how you can do that; usually it’s just given to be whatever it is. This can be achieved by carefully controlling the measurement process or by restricting our attention to only part of a large population. A certain part of your population might have less variability in height or IQ or something. If you concentrate on just those types of people and make that your population, then your variance will be smaller. But that may not work for your experiment, so it depends on what works. Usually you either lower the level of confidence or increase the sample size in order to end up with a narrower confidence interval.

Slide 15

Here we use the example of the mean height of tenth graders to illustrate the effect the confidence level has on the size of the confidence interval. So we’re going to see how changing your confidence level will change the width of your confidence interval. At 90 percent confidence, the confidence interval is 62.5 to 65.5 inches, as we saw in a previous slide. Often we use the margin of error to describe the size of a confidence interval. This margin of error is half the width of the interval. You can also think of your margin of error as the Z value times the standard error. Since you add that value to your sample mean and also subtract it from your sample mean, it is half the width of your confidence interval. So these are two different ways of describing the same thing. At 90 percent confidence the margin of error is 1.5, at 95 percent confidence it is 1.79, and at 99 percent confidence it is 2.35. So you can see that as you want to be more confident, the margin of error gets bigger and the confidence interval gets wider. With a wider interval you cover more possible values, so you can be more confident that the true mean is in that interval.
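
Those three margins of error can be reproduced with a short sketch, assuming the height example's sigma of 5 inches and sample size of 30. The helper name margin_of_error is just a label for this example.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """Z value times the standard error, i.e. half the interval width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

# Same sample (sigma = 5, n = 30), three confidence levels.
margins = {c: margin_of_error(5.0, 30, c) for c in (0.90, 0.95, 0.99)}
for level, m in margins.items():
    print(f"{level:.0%}: margin = {m:.2f}, width = {2 * m:.2f}")
```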

Slide 16

Here we use the example of the mean height of tenth graders to illustrate the effect sample size has on the size of the confidence interval. For each example the confidence level has been fixed at 90 percent. So we are 90 percent confident for each one, but how many males do we have in our sample? As the sample size increases, we see that the margin of error decreases. For a sample size of 30, the margin of error is 1.5; this reduces to 1.1 with a sample size of 60, and to .8 with a sample size of 100. So what you are seeing here is that if you can increase your sample size, this will reduce the width of your confidence interval. Of course the trade-off is that the larger the sample, the more costly your experiment.
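
A small sketch of the same comparison, holding the confidence level at 90 percent (Z of 1.645) and sigma at 5 inches. The last line adds one extra observation that follows from the square root in the formula: quadrupling the sample size cuts the margin of error in half.

```python
import math

Z_90 = 1.645   # Z value for 90 percent confidence, from the lecture
SIGMA = 5.0    # population standard deviation in inches (assumed known)

def margin_for_n(n):
    """Margin of error at 90 percent confidence for a sample of size n."""
    return Z_90 * SIGMA / math.sqrt(n)

for n in (30, 60, 100):
    print(f"n = {n:3d}: margin = {margin_for_n(n):.1f}")
# n =  30: margin = 1.5
# n =  60: margin = 1.1
# n = 100: margin = 0.8

# Because of the square root, 4 times the sample gives half the margin.
ratio = margin_for_n(120) / margin_for_n(30)
```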

Slide 17

The first equation on the slide is the formula for the confidence interval. And Z times sigma divided by the square root of n is the margin of error. If we call this value m and solve for the sample size then we can determine the sample size needed to achieve a particular margin of error. So the bottom equation here is the formula for the sample size and will be illustrated with the example in the next slide. But basically we are saying if we state what the margin of error is, then we can determine what size of sample we need to collect to be able to have that size of margin of error.
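
For reference, the algebra behind that rearrangement can be written out as follows. The z-star symbol for the critical value is an assumption here, since the slide's own notation is not reproduced in this transcript.

```latex
m = z^{*}\,\frac{\sigma}{\sqrt{n}}
\;\;\Longrightarrow\;\;
\sqrt{n} = \frac{z^{*}\sigma}{m}
\;\;\Longrightarrow\;\;
n = \left(\frac{z^{*}\sigma}{m}\right)^{2}
```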

Slide 18

Let’s return to the problem of estimating the diastolic blood pressure. We know the population standard deviation is 50 millimeters of mercury. Suppose we would like to estimate the mean with a margin of error of 5 at 95 percent confidence. To determine the sample size, we use the sample size formula. Once we plug all the numbers in, 1.96 times 50 divided by 5 is 19.6, and 19.6 squared is 384.16. So the answer is 384 and some fractional part. Since we cannot sample a fractional part, and rounding down would leave the margin of error slightly larger than 5, we round the number up to 385. So we need a sample size of 385 to be able to estimate our population mean with a margin of error of 5.
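
The arithmetic can be checked with a short sketch. The function name required_sample_size is made up for this example; it squares Z times sigma over the margin of error and always rounds up to the next whole number. With Z of 1.96, sigma of 50, and a margin of 5, that works out to 19.6 squared, or 384.16, which rounds up to 385.

```python
import math

def required_sample_size(z, sigma, margin):
    """Smallest whole n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

n_exact = (1.96 * 50 / 5) ** 2
n_needed = required_sample_size(z=1.96, sigma=50, margin=5)
print(f"{n_exact:.2f} -> {n_needed}")  # 384.16 -> 385
```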

Slide 19

Now you can take the quiz to check your understanding of this material. This concludes the tutorial for unit six, lesson one.