ECON 309
Lecture 7A: Confidence Intervals
I. The Central Limit Theorem
The Central Limit Theorem is a convenient result from probability theory that allows us to use the normal distribution even in many cases where variables are not normally distributed.
Suppose we take a 6-sided die, and we throw it n times. We know that the probability distribution for a die is not normal – it is discrete and uniform. But what is the distribution of the sample mean? Notice that you’re much more likely to get a sample mean close to true mean (3.5) than a sample mean that is near 1 or 6. For example, what if we had a small sample of n = 2? We could get a sample mean of 1 only by throwing two 1’s. We could get a sample mean of 6 only by throwing two 6’s. But we could get a sample mean of 3.5 by throwing (1 and 6), (6 and 1), (2 and 5), (5 and 2), (3 and 4), or (4 and 3). So there are many more ways to get a sample mean in the center than at the edges. And that’s for a tiny sample of size n = 2. As n gets larger, it becomes even less likely that you’ll get an average far from the center.
To understand the CLT, you need to understand that a sample mean is a random variable; it just happens to be a random variable that is constructed from n other random variables (the observations). And like any random variable, we can talk about its distribution.
The CLT says that as n gets larger, the distribution of the sample mean gets more and more like the normal distribution.
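You can watch the CLT at work with a short simulation. Here is a minimal Python sketch (assuming the numpy package is available) that draws many samples of n die rolls and looks at how the sample means spread out:

    import numpy as np

    rng = np.random.default_rng(0)  # fixed seed so the results are reproducible

    for n in [2, 10, 50]:
        # 10,000 samples, each consisting of n fair die rolls; average each sample
        rolls = rng.integers(1, 7, size=(10_000, n))
        means = rolls.mean(axis=1)
        print(f"n = {n:2d}: mean of sample means = {means.mean():.2f}, "
              f"std of sample means = {means.std():.3f}")

The average of the sample means stays near 3.5 at every n, while their spread shrinks as n grows; a histogram of the means (try matplotlib) looks more and more bell-shaped.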
II. Standard Error of the Mean
If the sample mean is a random variable, we can talk about its mean and standard deviation. Not surprisingly, the mean of the sample mean is the same as the mean of the population the sample is taken from.
What about the standard deviation of the sample mean? It is not the same as the standard deviation of the population the sample is taken from, but it is related to it like so:
σx̄ = σ/√n
The symbol on the left is the standard deviation of the sample mean, which is formally known as the standard error of the mean. Notice that it gets smaller as the sample size (n) grows. This makes sense. The bigger your sample is, the more likely its sample mean is to be close to the true mean. If you had a sample size equal to the whole population, it would be impossible for the sample mean to be anything other than the population mean, so the standard error of the mean would be zero.
The CLT tells us that, for a large enough sample, the sample mean is normally distributed with a mean of μ and a standard deviation (standard error) of σ/√n. [Show this with bell curve; show curve getting tighter and taller as n gets larger.]
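Here is a minimal sketch (numpy assumed again) that checks the σ/√n formula against simulation, using the fair die from before, whose population standard deviation is σ = √(35/12) ≈ 1.708:

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = np.sqrt(35 / 12)  # population std dev of a fair die (about 1.708)

    for n in [4, 16, 64]:
        # 100,000 samples of n die rolls each; compare the simulated
        # standard deviation of the sample means to sigma/sqrt(n)
        means = rng.integers(1, 7, size=(100_000, n)).mean(axis=1)
        print(f"n = {n:2d}: simulated SE = {means.std():.4f}, "
              f"sigma/sqrt(n) = {sigma / np.sqrt(n):.4f}")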
III. Forming a Confidence Interval if σ Is Known
A confidence interval (CI) is a range of values that you believe is likely to contain the true population mean.
To create a CI, you start with your sample mean. This is your best guess as to the value of the population mean; we sometimes call it your “point estimate.” But it is only a guess. We would like to give a range of values instead of a point, and then state how confident we are that the range includes the population mean.
So you take your sample mean, and you add and subtract something from it to get the endpoints of your range. If you happen to know the population variance, then you also know the standard error of the mean, and you can use the following formula for your confidence interval:
x̄ ± zc(σ/√n)
We will discuss what to use for zc in a moment. For now, notice you’re multiplying it by the standard error of the mean, which gets smaller as the sample gets larger. So our CI will shrink as our sample gets larger, because we are more confident in our point estimate.
So what is zc? This is “z-critical,” a z-score that represents how confident we want to be. If we want to be very confident our CI includes the population mean, then we’ll want a larger z-critical; if we don’t need to be that confident, we can have a smaller z-critical.
People often want to have a 90% CI. This means we want to construct our CIs so that 90% of the time they will include the population mean – or, equivalently, that our CIs will fail to include the population mean only 10% of the time.
Notice that there are two ways the population mean can fall outside the CI: by being below the lowest value, or by being above the highest value. So the 10% chance of the CI not including the population mean needs to be split in two: 5% of the time our CI will be too high, and 5% of the time it will be too low.
To find our z-critical, then, we need a value of z from Table 3 such that 0.95 of the weight will be to the left of it. Looking at the values inside the table, we find 0.9495 at z = 1.64, and we can’t get any closer to 0.95 than that. We set z-critical equal to 1.64 in the CI formula above. This means that a weight of 0.05 falls to the right of z = 1.64, another 0.05 falls to the left of z = -1.64, and 0.90 falls in between -1.64 and 1.64.
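If you have statistical software at hand, you can skip the table lookup. A minimal sketch, assuming the scipy package (its norm.ppf is the inverse of the cumulative normal distribution):

    from scipy.stats import norm

    confidence = 0.90
    alpha = 1 - confidence
    # We want alpha/2 = 0.05 in the right tail, i.e. 0.95 of the
    # weight to the left of z-critical.
    z_crit = norm.ppf(1 - alpha / 2)
    print(z_crit)  # about 1.645; Table 3 rounds this to 1.64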
Example: You want to find out CSUN students’ mean income. You take a sample of n = 49 students and find the sample mean is $26,000. You happen to know the standard deviation is $7,000. You want a 90% CI for the mean income. It is:
26,000 ± (1.64)(7000/√49)
26,000 ± (1.64)(1000)
26,000 ± 1640
[$24,360, $27,640]
What if you’d wanted to be more confident? We could have constructed a 95% CI by finding a different z-critical. You want a value of z from Table 3 such that 0.975 of the weight falls to the left of it (that is, we want 0.025 in each tail). That z-critical is 1.96.
26,000 ± (1.96)(7000/√49)
26,000 ± (1.96)(1000)
26,000 ± 1960
[$24,040, $27,960]
Notice the CI is wider now, because you wanted to be more certain you wouldn’t leave out the true mean.
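Both intervals are easy to reproduce in code. A minimal sketch using only Python’s standard math module:

    import math

    x_bar, sigma, n = 26_000, 7_000, 49
    se = sigma / math.sqrt(n)  # standard error of the mean = 1000

    for z_crit, level in [(1.64, "90%"), (1.96, "95%")]:
        margin = z_crit * se
        print(f"{level} CI: [{x_bar - margin:,.0f}, {x_bar + margin:,.0f}]")

This prints [24,360, 27,640] for 90% confidence and [24,040, 27,960] for 95%, matching the hand calculations above.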
IV. About Confidence Levels
We will sometimes call the probability that a CI will not include the population mean the “significance level” or α. When we used a 90% confidence level, we had α = 0.10. When we used a 95% confidence level, we had α = 0.05.
There is nothing magic about significance levels! You’ve probably had 5% or α = 0.05 drilled into your head in previous stats classes. Maybe sometimes you used α = 0.10 or α = 0.01. But there’s no reason we have to use any of those. We could have picked any α we wanted. It’s all a question of how confident we want to be.
Scientists tend to pick a rather small α. This means they want to be very confident in their predictions. But those small α’s may not be appropriate for other contexts, such as business. We will discuss this more later, when we’re doing significance tests.
V. About the Population Standard Deviation
In the discussion of CIs above, we assumed we knew the population’s standard deviation. What a bizarre assumption! Why would we already know the population’s standard deviation but not know the population mean? Most of the time, you won’t.
So what can you do? When you have a large enough sample size (the rule of thumb is n ≥ 30), you can do the exact same procedure, but use the sample standard deviation in place of the population standard deviation.
In other words, use this:
x̄ ± zc(s/√n)
(The fraction there on the right is called the estimated standard error of the mean.) Notice that we are still using a z-critical value. This is because, when n is large enough, the CLT gives us relatively great confidence that our sample means are normally distributed.
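In code, the only change from the known-σ case is substituting the sample standard deviation s. A minimal sketch with hypothetical numbers (a made-up sample of n = 100 with s = 7,200, just for illustration):

    import math

    # Hypothetical numbers: n = 100 students, sample mean $26,000,
    # sample standard deviation s = $7,200 (sigma itself is unknown)
    x_bar, s, n = 26_000, 7_200, 100
    est_se = s / math.sqrt(n)  # estimated standard error of the mean = 720

    z_crit = 1.64  # 90% confidence, same z-critical as before
    margin = z_crit * est_se
    print(f"90% CI: [{x_bar - margin:,.0f}, {x_bar + margin:,.0f}]")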
But what if we don’t have a large enough sample? In that case, we can no longer use the normal distribution (and corresponding table). Instead, we use the Student’s t distribution (and corresponding table – in our book, Table 4). The t-distribution is very similar to the normal distribution, but it’s flatter, with heavier tails. This reflects the fact that when your sample is small, it’s more likely that you’ll get sample means far away from the population mean. Our new CI formula is like so:
x̄ ± tc(s/√n)
Same as the above, except with t-critical replacing z-critical. So now all we need to do is find the t-critical. We do this by looking at Table 4.
We look for the row with degrees of freedom = n – 1. (Degrees of freedom is a complicated concept, but it will always be equal to your sample size minus the number of things you’re trying to estimate. In this case, you’re only estimating the mean, so you subtract 1. When we get to regressions, we’ll be estimating more things and you’ll have to subtract more.)
We look for the column that corresponds to the confidence level we want. This is the second row of boldfaced column headings. Say we want 95% confidence; we choose the 5th column from the right, with the column headings 0.0250 and 0.9500. (The first column heading is the area in the right tail; the second column heading is the confidence level. The reason these don’t add up to 1 is that we want 0.0250 in both the left and right tails.)
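Software can replace Table 4, too. A minimal sketch, again assuming scipy (t.ppf is the inverse CDF of the t-distribution):

    from scipy.stats import t

    df = 15               # e.g. a sample of n = 16, so df = n - 1
    confidence = 0.95
    alpha = 1 - confidence
    # Put alpha/2 = 0.025 in the right tail, matching the 0.0250 column
    t_crit = t.ppf(1 - alpha / 2, df)
    print(t_crit)  # about 2.131 for df = 15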
If we can use the t-distribution for small samples, why not for big samples? Well, that’s actually what economists and statisticians do now. The only reason we still teach the z-distribution method is history. The t-table consumes a lot of space (you need a row for every different sample size), and the z- and t-distributions are similar enough when n is large that using the z-table instead was good enough. But now we have computers with enough memory to hold complete t-tables. So when you have Excel or other statistical programs construct a CI, they will always use the t-distribution.
(So why do I teach the z-distribution? First, to be consistent with everyone else. Second, so I can tell you what I just told you. Third, because this way you can use the sample problems in the book for practice; some use z and some use t, but all rely on pretty much the same method.)
Example: CSUN students’ incomes again, but now suppose you only have a sample of 16. The sample mean is $26,000 (like before), and you don’t know the true standard deviation. Your sample standard deviation is s = 6800. To find t-critical, note that df = 16 – 1 = 15, and suppose we want a 90% CI; this gives us t-critical = 1.75. The CI is:
26,000 ± (1.75)(6800/√16)
26,000 ± (1.75)(1700)
26,000 ± 2975
[$23,025, $28,975]
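A minimal sketch reproducing this interval (scipy assumed, so we can pull the exact t-critical rather than the table’s rounded 1.75):

    import math
    from scipy.stats import t

    x_bar, s, n = 26_000, 6_800, 16
    df = n - 1
    t_crit = t.ppf(0.95, df)  # 90% CI: 0.05 in each tail; about 1.753
    margin = t_crit * s / math.sqrt(n)
    print(f"90% CI: [{x_bar - margin:,.0f}, {x_bar + margin:,.0f}]")

The result is a hair wider than the hand calculation above, because the exact t-critical (about 1.753) is slightly larger than the table’s rounded 1.75.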