Normal Approximation to the Binomial Distribution

Confidence Interval Estimation of a Population Proportion, an Example

Assume that I select a random sample of size n from a population, where n is “large.” For each member of the sample, I want the answer to a yes-or-no question. For sample member i, let $X_i = 1$ if the person says, “Yes,” or $X_i = 0$ if the person says, “No.” Assume that the fraction of members of the population who would give “Yes” answers is p. Then the sampling procedure is a binomial experiment, because:

i) There is a fixed number, n, of trials.

ii) The trials are identical to each other, since each one consists of randomly selecting a member of the population and asking the yes-or-no question.

iii) The trials are independent of each other, due to random sampling.

iv) Each trial has two possible outcomes: Success = {person says, “Yes”} or Failure = {person says, “No”}.

v) P(Success) = p for each trial, due to random sampling.

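Let $Y = \sum_{i=1}^{n} X_i$, the number of “Yes” answers in the sample. Then $Y \sim \text{Binomial}(n, p)$. The mean of this distribution is $\mu_Y = np$, and the standard deviation is $\sigma_Y = \sqrt{np(1-p)}$.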
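As a numerical check on the mean np and standard deviation sqrt(np(1 - p)) of a Binomial(n, p) distribution, the sketch below computes both quantities directly from the binomial probabilities and prints them next to the formulas. The values n = 20 and p = 0.3 and the function names are illustrative assumptions, not part of these notes.

```python
import math

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_moments(n, p):
    """Mean and standard deviation computed directly from the pmf."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, math.sqrt(var)

# Illustrative values (assumed, not taken from the notes)
n, p = 20, 0.3
mean, sd = binomial_moments(n, p)
print("mean from pmf:", mean, " formula np:", n * p)
print("sd from pmf:  ", sd, " formula sqrt(np(1-p)):", math.sqrt(n * p * (1 - p)))
```

The two printed pairs should agree up to floating-point rounding.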

Now consider the answers to the question as a random sample of size n from a distribution that is Binomial(1, p). The mean of this distribution is $\mu = p$, and the standard deviation is $\sigma = \sqrt{p(1-p)}$. The sample mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \hat{p}$, where $\hat{p}$ is the fraction of “Yes” answers from the members of the sample. If the sample size is large, then the Central Limit Theorem says that $\hat{p}$ will have an approximate normal distribution with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. Then, because $Y = n\hat{p}$, for large samples we can calculate approximate binomial probabilities using the Central Limit Theorem.
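To see how well the normal approximation works, the sketch below compares an exact binomial probability P(Y <= k) with the value from a normal distribution that has mean np and standard deviation sqrt(np(1 - p)). The values n = 100, p = 0.5, k = 55 and the function names are illustrative assumptions. The 0.5 continuity correction is a common refinement that these notes do not mention.

```python
import math
from statistics import NormalDist

def exact_binomial_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(Y <= k) with a Normal(np, sqrt(np(1-p))) distribution.

    Adding 0.5 to k is the usual continuity correction (an assumption
    added here, not something stated in the notes).
    """
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)

# Illustrative values (assumed, not taken from the notes)
n, p, k = 100, 0.5, 55
print("exact: ", exact_binomial_cdf(k, n, p))
print("normal:", normal_approx_cdf(k, n, p))
```

For these values the two probabilities should differ only in the third or fourth decimal place; the agreement generally gets worse when n is small or p is close to 0 or 1.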

Example: On October 27–30, 2008, ABC News/Washington Post conducted a poll about the outcome of the Presidential election. A national random sample of 1,580 likely voters was asked whom they supported for President. There were 837 members of the sample who said that they supported Senator Barack Obama for President.

i) The experiment consists of a fixed number (n = 1,580) of trials.

ii) The trials are identical to each other, since each trial consists of randomly selecting a person from the population of likely voters and asking the person, “Do you intend to vote for Sen. Barack Obama for President?”

iii) The trials are independent of each other, due to random sampling.

iv) Each trial has two possible outcomes: Success = {person says, “Yes.”} or Failure = {person says, “No.”}.

v) P(Success) is the same for each trial, due to random sampling.

In this situation, we do not know the value of p, the Senator’s level of support in the population; the purpose of the experiment is to estimate p.

Go to STAT, TESTS, 1-PropZInt. For x, we will enter 837, the number of successes in our binomial experiment (the number of people in the sample who said, “Yes”). For n, we will enter 1,580, the size of the sample. For C-Level, we will enter 0.95, the desired level of confidence for our interval estimate.

We are 95% confident that the proportion of the population of likely voters who support Sen. Obama for President is between 50.514% and 55.436%. In particular, since the lower confidence limit is greater than 50%, we predict that Sen. Obama will win the election.
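For readers without the calculator, the interval can be checked by hand with the usual large-sample formula, sample proportion plus or minus z* times sqrt(p-hat(1 - p-hat)/n). The sketch below assumes this is the formula behind 1-PropZInt; the function name one_prop_z_interval is hypothetical.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, conf_level):
    """Large-sample z interval for a population proportion.

    x: number of successes, n: sample size, conf_level: e.g. 0.95.
    """
    p_hat = x / n
    # Critical value z* that leaves (1 - conf_level)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Poll data from the example: 837 "Yes" answers out of 1,580 likely voters
lower, upper = one_prop_z_interval(837, 1580, 0.95)
print(f"{lower:.5f} to {upper:.5f}")
```

The printed limits should agree with the 50.514% and 55.436% reported above, up to rounding.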