P201 Lecture Notes

Chapter 6: Statistical Inference and Estimation

The two processes that make up statistical inference.

I. Estimation.

Question asked: What is the value of the population parameter?

Example: What % of U.S. citizens believe that Congress should raise the debt ceiling?

What % of persons in Chattanooga believe that kids under 18 should be allowed in Coolidge Park only under adult supervision?

What % of persons using the Riverwalk believe that dogs should be prohibited?

Answers: A number, called a point estimate, or an interval, called a confidence interval. The interval is one that has a prespecified probability (usually .95) of surrounding the population parameter.

So the answer to the first example might be reported as “37% with a 5% margin of error.”

This is a combination of a point estimate (the 37%) and an interval estimate (from 37-5 to 37+5, or from 32% to 42%).

II. Hypothesis Testing.

A. With one population. . .

Deciding whether a particular population parameter (usually the mean) equals a value specified by prior research or other considerations.

Example: A light bulb is advertised as having an “average lifetime” of 5000 hours.

Question: Is the mean of the population of lifetimes of bulbs produced by the manufacturer equal to 5000 or not?

B. With two populations. . .

Deciding whether corresponding parameters (usually means) of the two populations are equal or not.

Example: Statistics taught with lab vs. Statistics taught without lab.

Question: Is the mean amount learned by the population of students taught with a lab equal to the mean amount learned by the population of students taught without a lab?

C. Three populations. . .

Deciding whether corresponding parameters (usually means) of the three populations are equal or not. And on and on and on.

Estimation – not covered in Corty

Estimation is using information from samples to guess the value of a population parameter or difference between parameters. A lot of this goes on during an election season.

Point Estimate

A single value which represents our single best estimate of the value of a population parameter.

Interval Estimate (usually reported as “Margin of error”)

An interval which has a prespecified probability of surrounding the unknown parameter.

This interval estimate is called a confidence interval (CI).

Typically the interval estimate is centered around the point estimate.

The sample statistic most often used as a point estimate is the Sample Mean.

Reporting the result of estimation: (Hypothetical data) . . .

“From the result of the XYZ poll, it is estimated that 17% of adult residents of the U.S. have tried gluten free diets, with margin of error equal to 3%.”

This means that the pollster is 95% confident that the actual population percentage of persons trying gluten free diets is between 14% and 20%.
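The margin-of-error arithmetic behind a report like this can be sketched in Python. The sample size n = 1000 below is an assumption for illustration (the notes do not report one), which is why this gives a margin near 2.3% rather than the 3% in the quote; 1.96 is the Z value that goes with 95% confidence.

```python
import math

# n = 1000 is an assumed sample size, not from the notes.
p_hat = 0.17   # point estimate: 17% tried gluten-free diets
n = 1000

# 95% margin of error for a proportion: 1.96 * sqrt(p(1-p)/n)
moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - moe, p_hat + moe

print(f"{p_hat:.0%} with margin of error {moe:.1%}")
print(f"95% CI: {lower:.1%} to {upper:.1%}")
```

Notice that the interval is centered on the point estimate, exactly as described above.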

Introduction to Hypothesis Testing: The mean of a population

Suppose we have a single population, say the population of light bulbs mentioned above.

Suppose we want to determine whether or not the mean of the population equals 5000.

You might wonder: What does it matter whether the mean is 5000?

Answer: The manufacturer might be selling light bulbs whose average lifetime is only 2000 hours and may be counting on the fact that no one really pays attention.

But the difference between 2000 hours and 5000 hours will add up over multiple purchases. In times when money is tight, those small differences may combine to make a large overall difference.

Two possibilities

H0. The population mean equals 5000.

H1. The population mean does not equal 5000.

These possibilities are called hypotheses.

The first, the hypothesis of no difference is called the Null Hypothesis. (H0)

The second, the hypothesis of a difference is called the Alternative Hypothesis. (H1)

Our task is to decide which of the two hypotheses is true.

“I reject the null”

vs

“I fail to reject the null” (I will sometimes say, “I retain the null.”)

Why can’t we just know about the population?

Light bulbs – The manufacturer may have simply made up the numbers on the package. They may have made a mistake in estimation.

Treatment for C.Diff – One doctor says take antibiotics for 2 weeks. He/she believes that the mean number of C.Diff bacteria will be essentially 0 after 2 weeks.

A 2nd doctor says take them for 6 weeks. He/she believes that the mean number of C.Diff bacteria will not be zero until 6 weeks.

The point is that no one “knows” the correct value. If we have a belief about a specific population value, we have to test that belief using hypothesis testing procedures.

Two general approaches.

1. The Bill Gates (Warren Buffett) approach.

Purchase ALL the light bulbs in the population. Measure the lifetime of each bulb. Compute the mean. If the mean equals 5000, retain the null. If the mean does not equal 5000, reject the null.

Problem: Too many bulbs.

2. The Plan B approach.

Take a sample of light bulbs.

Compute the mean of the sample.

From our study of sampling distributions, we know that the sample mean will not exactly equal the population mean. But we also know that it’ll be close to the population mean. If the population mean is 5000, then the sample mean should be close to 5000. So . . .

An intuitively reasonable decision rule

If the value of the sample mean is “close” to 5000, decide that the null must be true.

If the value of the sample mean is “far” from 5000, decide that the null must be false.

But how close is “close”? How far is “far”?

What if the mean of the lifetimes of 25 bulbs were 4999.99? Most rational people would retain the null.

What if the mean of the lifetimes of 25 bulbs were 1003.23? Most rational people would reject the null.

What if the mean of the lifetimes of 25 bulbs were 4876.44? Hmm. A gray area.

Clearly we need some rules, or perhaps we should say that we need an operational definition of “close” and of “far”.

Close and Far as Probabilities – not covered in Corty

Recall from sampling distributions that the means of samples from a population vary around the mean of that population. This variation is due to sampling error.

When we take samples from a population whose mean is 5000, for example, the means most likely to occur will be those close to 5000.

The means least likely to occur will be those far from 5000.

This suggests that we could say that a mean is “close” to the hypothesized mean if it is one of those means that would be likely to be obtained from a population whose mean was the hypothesized value.

And we could say that a mean is “far” from the hypothesized mean if it is one of those that would be unlikely to be obtained from a population whose mean was the hypothesized value.
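A short simulation makes this concrete: means of repeated samples from a population whose mean is 5000 pile up near 5000 and rarely land far away. The population here is made up (normal, SD 300, matching the later worked example), and the sample size of 36 is arbitrary.

```python
import random
import statistics

# Simulated population of bulb lifetimes: mean 5000, SD 300 (made up).
random.seed(1)

sample_means = []
for _ in range(1000):
    sample = [random.gauss(5000, 300) for _ in range(36)]
    sample_means.append(statistics.mean(sample))

# Sample means cluster around 5000, varying by about 300/sqrt(36) = 50.
print(round(statistics.mean(sample_means)))   # near 5000
print(round(statistics.stdev(sample_means)))  # near 50
```

Means within a standard error or two of 5000 are the “likely” (close) ones; means several standard errors away are the “unlikely” (far) ones.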
Our Decision Rule Stated in terms of Probabilities

So we can describe our decision rule (“close” vs. “far”) in terms of probabilities . . .

If the sample mean is one of those that would have high probability of occurring if the population mean were 5000, then we’ll conclude that the population mean must be 5000.

If the sample mean is one of those that would have low probability of occurring if the population mean were 5000, then we’ll conclude that the population mean must not be 5000.

The p-value.

Statisticians state the decision rule in terms of probabilities.

They formalize the process by computing a special probability called the p-value.

The p-value is the probability of an outcome (e.g., sample mean value) as extreme as the obtained outcome if the null hypothesis is true.

Statisticians base their decision on the p-value.

Our Decision Rule described in terms of the p-value

Statisticians have agreed that if the p-value is smaller than or equal to an agreed-upon criterion value, then the null hypothesis is to be rejected.

But if the p-value is larger than the agreed-upon criterion value, the null hypothesis will not be rejected.

Close = High probability = large p-value. Far = Low probability = small p-value.

Significance Level

The criterion against which the p-value is compared is called the significance level.

Typically, the significance level is (arbitrarily) set at .05.

Our Final Description of the process in terms of p-value and significance level . . .

If the p-value is larger than the significance level, then the null hypothesis is not rejected.

If the p-value is less than or equal to the significance level, then the null hypothesis is rejected.
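Stated as code, the rule is just a comparison. This is a sketch; the function name is mine, not Corty's.

```python
# Compare the p-value to the significance level (alpha) and decide.
def decide(p_value, alpha=0.05):
    return "reject the null" if p_value <= alpha else "fail to reject the null"

print(decide(0.0308))  # smaller than .05, so reject
print(decide(0.6170))  # larger than .05, so fail to reject
```

Note that a p-value exactly equal to the significance level still leads to rejection.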

Corty’s Steps in Hypothesis Testing

Every hypothesis test involves a set of steps.

Every statistical text has a variation on the following list of steps.

These steps are always carried out, regardless of the type of hypothesis being tested.

Step 1. Pick the statistical test appropriate for your hypothesis.

Right now, you don’t know of any statistical tests. That will soon change.

Step 2. Make sure your data meet the assumptions of the test, e.g., unimodality, symmetry, near normality, no outliers.

Create a frequency distribution with a Normal Curve overlay. Look for outliers – extremely positive or negative values. Look at the skewness value – it should satisfy |skewness| <= 1.5.
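The skewness check can be sketched in plain Python. The data below are made up for illustration, and the formula is the common bias-adjusted sample skewness (the one Excel's SKEW function uses; your stats package reports an equivalent value).

```python
import statistics

# Made-up sample of 10 scores, just to illustrate the check.
data = [4, 5, 5, 6, 6, 6, 7, 7, 8, 9]

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)

# Bias-adjusted sample skewness.
skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)

print(abs(skew) <= 1.5)  # True means |skewness| is within the rule of thumb
```

In practice you would read this value off the software's output rather than compute it by hand.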

Step 3. List the null and the alternative hypotheses

See my example.

Step 4. Set the significance level of the test. (Corresponds to Corty’s “critical value” step.)

Significance level will always be .05 for this course. See below for critical values.

Step 5. Compute the value of the test statistic. Also compute its p-value.

See example.

Step 6. Compare the p-value with the significance level and make your decision. Then interpret the result.

You should memorize these steps for the next exam.
Worked Out Example – The light bulb example

Suppose that purchasers from across the country were contacted and instructed to go to the nearest hardware/building supply store and purchase a packet of the bulbs and then to select one bulb randomly from the packet. Those bulbs were then packed in foam and sent to a centralized testing facility where 100 of them were randomly selected from the nearly 500 shipped from across the country. Those 100 were plugged into standard sockets and power was applied. They were allowed to burn continuously until they failed and the time to failure of each bulb in the sample was recorded.

Suppose that the manufacturer had substantial evidence that the standard deviation of the population was equal to 300 and that this value was not related to the mean of the population. We couldn’t actually know this in real life.

Step 1: Test Statistic: A test statistic appropriate for testing a hypothesis about the mean of a single population is the Z test. The test is called the One Population Z Test.

Step 2: Assumptions. We’ll assume that the data are essentially unimodal and symmetric, pretty nearly normally distributed. We’ll assume that there are no outliers.

Step 3: Null hypothesis and Alternative hypothesis.

We want to determine whether the manufacturer’s claim that the “average” lifetime in the population is 5000 is true or not. This suggests the following . . .

Null Hypothesis: The mean of the population equals 5000. µ = 5000.

Alternative Hypothesis: The mean of the population does not equal 5000. µ ≠ 5000.

Step 4: Significance Level

This one's easy. It's quite common to use .05 as the criterion for the probability of the observed outcome. If that probability is less than or equal to .05, we reject. If that probability is larger than .05, we retain the null. So let the significance level be .05.

As we’ll see, a significance level of .05 corresponds to a critical Z value of + or – 1.96.

Step 5: Computed value of test statistic and the p-value

Suppose the mean of the sample of 100 lifetimes was 4935.33 with sample standard deviation equal to 305.4.

Then

Z = (X-bar – Hypothesized mean) / (Standard error of the mean)

Z = (4935.33 – 5000) / (300/10) = –64.67 / 30 = –2.16

The p-value for a Z of -2.16 is computed as a Normal Distribution Problem.

Recall that the p-value is the probability of a value as extreme as the obtained value of Z.

The obtained Z is –2.16, so any Z as positive as +2.16 or more positive is “as extreme as” the obtained Z.

But note that any Z as negative as –2.16 or more negative is also “as extreme as” the obtained Z.

So we want the probability of a Z as positive as +2.16 + probability of a Z as negative as -2.16.

To solve it, we get the two areas beyond 2.16 in either direction – the area to the left of –2.16 and the area to the right of +2.16.

The tail area beyond Z = ±2.16 is .0154 in each direction, so the

p-value = .0154 + .0154 = .0308

Step 6: Compare the p-value with the significance level.

.0308 is less than .0500, so we’ll reject the null hypothesis. Our conclusion is that the population mean is NOT equal to 5000.

Critical Values of the Z statistic

The obtained Z value for the above example was –2.16. The p-value was .0308.

Here are some other Z values that we could have obtained and the p-value for each of them.

Z value we could have obtained    p-value    Decision
0.50                              .6170      Do not reject
1.00                              .3174      Do not reject
1.50                              .1336      Do not reject
1.70                              .0892      Do not reject
1.80                              .0718      Do not reject
1.96                              .0500      Reject
2.16 (our Z)                      .0308      Reject
2.50                              .0124      Reject
3.00                              .0027      Reject
3.50                              .0005      Reject

Note the pattern. As Zs get farther from 0, the p-values get smaller and smaller.

Note that there is one “special” Z value whose p is exactly 0.0500. That Z is called the critical Z. Its value is 1.96. If your Z is -1.96 or +1.96, you know that your p-value is exactly .05. Any Z larger in absolute value than the critical value will have a p smaller than .05.

This Z will always be the value that divides “Do not reject” from “Reject”, whenever you do a Z test.

This means that knowledgeable data analysts don’t even bother to compute p-values when they do a Z test. They remember that the Critical Z is 1.96 and after conducting their research, if their obtained Z is equal to or more negative than -1.96 or equal to or more positive than + 1.96, they reject.
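The pattern in the table above is easy to regenerate. This is a sketch with my own function name; it computes the two-tailed p-value for each Z and applies the .05 rule.

```python
import math

def two_tailed_p(z):
    # Area beyond |z| in both tails of the standard normal curve.
    tail = 1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * tail

# Larger |Z| means a smaller p, with 1.96 sitting right at p = .05.
for z in [0.50, 1.00, 1.50, 1.96, 2.16, 2.50, 3.00]:
    p = two_tailed_p(z)
    decision = "Reject" if p <= 0.05 else "Do not reject"
    print(f"Z = {z:4.2f}   p = {p:.4f}   {decision}")
```

Running it reproduces the Do-not-reject/Reject split at the critical Z of 1.96.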

We will use p-values, not critical values.

If all of our hypothesis tests were Z tests, then we would not bother computing p-values for any of them. If that were the case, we’d simply compare our obtained Z with 1.96 and base our decision on the result of that comparison.

But 99+% of our statistical tests will NOT be Z tests.

And computation of critical values for the other types of tests is cumbersome.

Luckily, our computer program will compute the p-value for each of the other tests that we’ll conduct. So we don’t have to deal with critical values of any of the statistical tests that follow.

(I may mention them in passing, but we’ll let the computer do that work for us.)

Another example.

A drug manufacturer has developed a drug that it believes will affect the duration of cold symptoms. Suppose that prior careful measurement of colds symptoms has determined that the population average duration of symptoms without treatment is 8 days with population standard deviation equal to 2 days.

The manufacturer recruits a sample of persons who sign a waiver allowing the representatives of the manufacturer to “give them” colds by swabbing their nasal passages with a fluid containing the cold virus. The time of swab is time 0.

The number of days until a carefully calibrated measure indicated the absence of cold symptoms was recorded. For 25 persons, the values are listed below . . .

days

                 Frequency    Percent    Valid Percent    Cumulative Percent
Valid     3          3          12.0          12.0               12.0
          4          1           4.0           4.0               16.0
          5          4          16.0          16.0               32.0
          6          7          28.0          28.0               60.0
          7          4          16.0          16.0               76.0
          8          2           8.0           8.0               84.0
          9          1           4.0           4.0               88.0
         10          3          12.0          12.0              100.0
     Total          25         100.0         100.0
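As a quick check, the sample mean of 6.32 used in Step 5 follows directly from this frequency table:

```python
# Each duration (days) maps to how many of the 25 persons had it.
days = {3: 3, 4: 1, 5: 4, 6: 7, 7: 4, 8: 2, 9: 1, 10: 3}

n = sum(days.values())
mean = sum(value * freq for value, freq in days.items()) / n

print(n)     # 25
print(mean)  # 6.32
```

This is just the weighted-mean arithmetic you would do by hand with a frequency table.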

Step 1: Test Statistic: A test statistic appropriate for testing a hypothesis about the mean of a single population is the Z test.

Step 2: Assumptions.

Check these

Step 3: Null hypothesis and Alternative hypothesis.

We want to determine whether the mean duration of symptoms in the population of treated persons equals 8 (the untreated average) or not. This suggests the following . . .

Null Hypothesis: The mean of the population equals 8. µ = 8.

Alternative Hypothesis: The mean of the population does not equal 8. µ ≠ 8.

Step 4: Significance Level

This one's easy. It's quite common to use .05 as the criterion for the probability of the observed outcome. If that probability is less than or equal to .05, we reject. If that probability is larger than .05, we retain the null. So let the significance level be .05.

As we’ll see, a significance level of .05 corresponds to a critical Z value of + or – 1.96.

Step 5: Computed value of test statistic and the p-value

Z = (X-bar – Hypothesized mean) / (Standard error of the mean)

Z = (6.32 – 8) / (2/5) = –1.68 / 0.4 = –4.2

The p-value for a Z of -4.20 is computed as a Normal Distribution Problem.

Recall that the p-value is the probability of a value as extreme as the obtained value of Z.

To solve it, we get the two areas beyond 4.2 in either direction – the area to the left of –4.2 and the area to the right of +4.2.

The tail area beyond Z = ±4.2 is about .0000133 in each direction, so the

p-value = .0000133 + .0000133 ≈ .0000
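As a check, a couple of lines of Python reproduce this Z and show how tiny the two-tailed p-value really is, about 2.7 × 10⁻⁵, which rounds to .0000.

```python
import math

# Cold-symptom example: sample mean 6.32, hypothesized mean 8,
# population SD 2, n = 25.
z = (6.32 - 8) / (2 / math.sqrt(25))
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(round(z, 2))  # -4.2
print(p_value)      # roughly 2.7e-05
```

A Z this far from 0 is far outside the critical value of 1.96, so the decision is the same whichever way you state the rule.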

Step 6: Compare the p-value with the significance level.

.0000 is less than .0500, so we’ll reject the null hypothesis. Our conclusion is that the population mean is NOT equal to 8.

Possible results of the Hypothesis Testing Process

                                     State of World
                          Null True, µ = 5000          Null False, µ ≠ 5000
Decision
  Fail to reject Null     Correct Failure to Reject    Incorrect Failure to Reject
                                                       (Type II Error)
  Reject Null             Incorrect Rejection          Correct Rejection
                          (Type I Error)

Correct Failure to reject – a good outcome

The null is true. µ really does equal 5000. The manufacturer’s claim is true.

We do not reject the null but instead conclude that the null is true. We make a correct decision.

Correct Rejection – another good outcome

The null is false. µ really does not equal 5000. The manufacturer’s claim is wrong.

We "detected" the difference between the actual population mean and the manufacturer’s claim.

Incorrect Rejection: Type I Error

The Null is true. µ really does equal 5000. The manufacturer’s claim is true.

But unbeknownst to us, because of a random accumulation of factors, our outcome was one which seemed inconsistent with the null. So we rejected it and incorrectly accused the manufacturer of lying on its packaging.

Controlling P(Type I Error): The significance level.

Because the significance level is exactly the probability of rejecting a true null, setting it at .05 means that in most research the probability of a Type I error is .05.
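A Monte Carlo sketch shows what a Type I error probability of .05 means in practice: when the null really is true, a Z test at the .05 level rejects about 5% of the time. The simulation draws each sample mean directly from its sampling distribution, N(5000, 300/√100) = N(5000, 30), using the light bulb numbers from earlier.

```python
import random

# Null is true: mu really is 5000; se = 300/sqrt(100) = 30.
random.seed(42)
mu0, se = 5000, 30

trials = 100_000
rejections = sum(
    abs((random.gauss(mu0, se) - mu0) / se) >= 1.96  # |Z| at or past critical
    for _ in range(trials)
)

print(rejections / trials)  # close to .05
```

Every one of those rejections is a Type I error, since the null was true in every trial.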