YALE School of Management
EMBA MGT511 - HYPOTHESIS TESTING AND REGRESSION

K. Sudhir

Sessions 1 and 2
Hypothesis Testing

1. Introduction to Hypothesis Testing
A hypothesis is a statement about the population; the statement may be true or false.
Examples of hypotheses:
1. The average salary of SOM MBA students who finished their MBAs in 2001 is $110,000.

2. The proportion of 2001 SOM MBA graduates who had jobs at the time of graduation is 0.9.

For these hypotheses about the population (SOM MBA students who finished their MBAs in 2001), it is easy to verify whether the statement is true by asking every student who is graduating (i.e., take a census) their starting salaries as well as whether they had a job at the time of graduation. In that case, we can categorically say whether the hypothesis is true or false.
In most practical situations, however, it is not possible to conduct a census. Suppose we had not collected the data from the students at the time of graduation, but now need this information. We could send a survey to these alumni and ask them. It is likely that only a fraction of the alumni would respond. Assuming we obtained a representative set of responses, we could use this sample to still assess whether our statement about the population is true or false. However, we have to recognize that there is likely to be some probability with which we could make errors, because the sample mean would be different from the population mean.
The approach of using sample data to assess whether a hypothesis is true or false is the essence of hypothesis testing.
There are two types of hypotheses: the null and alternative (research) hypothesis.
The null hypothesis is usually the default belief about the population parameter.
The alternative hypothesis reflects a research claim.
Consider the following null and alternative hypothesis.

Null / Alternative
Defendant is innocent / Defendant is guilty
Machine is “in control” (working according to specs) / Machine is “out of control”
The new drug is no better than a placebo / The new drug is better than a placebo
The portfolio manager’s performance is equal to the S&P 500 performance / The portfolio manager’s performance exceeds the S&P 500 performance

In hypothesis testing, the null hypothesis is assumed as the default unless the evidence is strong enough to claim support for the alternative. It is a method of “proof by contradiction.” More precisely, it offers proof of the alternative hypothesis by contradicting the null hypothesis.

1. Defendant is assumed innocent until proven guilty

2. Machine is assumed to be “in control” until the evidence suggests otherwise.

3. A new drug is assumed to be ineffective unless it is shown to be better than a placebo (or some other benchmark).

  4. A portfolio manager is assumed to perform no better than the S&P 500 unless…

We use the weight of the evidence from the sample to see if we can reject the null hypothesis. If the weight of the sample evidence is such that the null hypothesis is unlikely, we reject the null.

Type I and Type II Errors

Since we use sample evidence to reject or not reject the null hypothesis, we face the possibility that there will be some errors in the decisions due to sampling variation. There are two types of errors: Type I and Type II Errors.

A Type I error occurs if the null hypothesis is true, but it is rejected. This is akin to convicting an innocent defendant.

A Type II error occurs if the null hypothesis is false, but it is not rejected. This is akin to acquitting a guilty defendant. These errors are well summarized in the table below:

Decision / Ho is true / Ho is false
Reject Ho / Type I error / Correct decision
Do not reject Ho / Correct decision / Type II error

We want to keep both types of errors to a minimum. However, for a fixed sample size, the two errors trade off against each other: reducing one increases the likelihood of the other. Carrying forward the innocent-guilty example: a low probability of convicting someone who is innocent (Type I error) implies a very high threshold for conviction of any defendant. But such a high threshold means you are likely to acquit a guilty defendant (Type II error). So when you set the acceptable level of one type of error, you automatically set the level of the other type of error, unless you change the sample size.

In practice, we typically control for Type I error. We often allow for a 5% level of Type I error. But the level of error is in fact a managerial decision when doing hypothesis testing. If Type I error is more costly than Type II error, then managers will want to keep Type I error to a minimum. If Type II error is more costly than Type I error, then managers may accept higher Type I errors.

Practice Exercise: Think of situations where Type I errors may be costlier than Type II errors and vice versa.
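The meaning of a 5% Type I error rate can be illustrated with a small Monte Carlo simulation (our own sketch, not part of the course notes; the parameters μ = 12, σ = 0.1, n = 100 anticipate the bottling example discussed later):

```python
import random

# Illustrative simulation: when Ho is true, a test calibrated at
# alpha = 0.05 rejects Ho in roughly 5% of samples.
# Under Ho, the sample mean of n = 100 bottles is Normal with
# mean 12 and standard deviation 0.1 / sqrt(100) = 0.01.
random.seed(1)
trials = 20000
rejections = 0
for _ in range(trials):
    ybar = random.gauss(12.0, 0.01)      # draw a sample mean under Ho
    z = (ybar - 12.0) / 0.01             # standardized test statistic
    if z < -1.96 or z > 1.96:            # two-tailed rejection region
        rejections += 1
rate = rejections / trials
print(rate)                              # close to 0.05 by construction
```

The simulated rejection rate hovers around 0.05, which is exactly the Type I error probability we chose to tolerate.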

2. The Hypothesis Testing Process

Hypothesis testing involves a series of steps as shown by the following example problem.

Step 1: Defining the Null and Alternative Hypotheses

A machine is designed to fill bottles with an average content of 12 oz. and a standard deviation of 0.1 oz. Periodically, random samples are taken to determine if the machine might be “out of control”. Define the null and alternative hypotheses for this problem using symbols and in words.

Note: When defining the hypotheses, it is critical to define the population precisely.

Null Hypothesis: Ho: μ = 12 oz.

The true average content of all bottles filled by the machine during the time period of sampling is 12 oz. (note the precise definition of the population of interest)

Alternative Hypothesis: HA: μ ≠ 12 oz.

The true average content of all bottles filled by the machine during the time period of sampling is not 12 oz.

Step 2: Specify the appropriate probability of Type I Error (alpha)

Since we use sample evidence to reject or not reject the null hypothesis, we face the possibility of errors in the decisions due to sampling variation. There are two types of errors: Type I and Type II Errors. Recall that, for any given sample size, reducing the probability of Type I error increases the likelihood of Type II error. We typically control (minimize) the probability of Type I error.

Decision / Ho is true / Ho is false
Reject Ho / Type I error / Correct decision
Do not reject Ho / Correct decision / Type II error

The implication of specifying a 5% Type I error probability (alpha = 0.05) is that we will reject the null hypothesis (Ho) even when it is true 5% of the time. Thus, the sample outcomes for which we reject Ho are determined under the assumption that Ho is true. We make our decision to reject or not reject the null as follows: if the sample outcome is among the 5% least likely outcomes assuming that the null hypothesis is true, then we reject the null. The basic logic of this decision is that the 5% least likely outcomes under the null hypothesis are more likely to occur if Ho is false.

Note the similarity with a court procedure. A defendant is charged with a crime. The judge and jury assume the defendant is innocent until proven otherwise. If the sample evidence presented by the prosecutor is very unlikely to occur under the assumption that the defendant is innocent, the decision of guilt is favored. Thus, the decision is based on inconsistency (highly unlikely evidence to occur under the assumption of innocence) between the evidence and the assumption of innocence.

How do we know which outcomes are among the 5% least likely? The sampling distribution of the sample statistic of interest helps us identify the least likely (most extreme) outcomes. So the next step is to set up an appropriate test statistic for this problem and identify which values of the statistic cause us to reject the null hypothesis.

Step 3: Defining and Justifying the Relevant Test Statistic

In the above problem, we know the population standard deviation (σ). Given random sampling, the sample mean Ȳ will tend to follow a normal distribution with mean equal to the population mean (μ) and standard deviation σ/√n. It is conceivable that the content of individual bottles is actually normally distributed. But if it is not, the Central Limit Theorem allows us to claim that the theoretical distribution of all possible sample means, for a given sample size, will tend to a bell-shaped curve (as the sample size increases). Given this knowledge of the distribution, we can find out what the 5% least likely values of Ȳ are. However, it is conventional to specify the rejection region in terms of extreme values of the standardized test statistic. If the null hypothesis is true, then the population mean should be μ0 = 12 oz. So we specify the rejection region in standardized units, i.e., we follow the usual procedure of standardizing the variable of interest when we need to compute probabilities.

Z = (Ȳ − μ0) / (σ/√n)

Step 4: Determining the Rejection Region

Having defined the test statistic, we now decide for what values of the test-statistic we reject the null hypothesis. The selection of the rejection region depends on whether the hypotheses are stated as one- or two-tailed tests.

For a two-tailed test, the five percent least likely values are split equally between the two tails. So Z < −1.96 and Z > 1.96 are the 5% least likely values if the null hypothesis is true. Therefore, we will reject Ho if the computed Z-value based on the sample data is in this rejection region.


Steps 5 and 6: Computing the test statistic and drawing statistical and managerial conclusions

Exercise: For the above bottling machine problem, suppose a simple random sample of 100 bottles is taken and the sample mean is 11.982 oz. Is this sample result among the five percent least likely to occur under the null hypothesis (Ho)?

From the null hypothesis, μ0 = 12.

Z = (11.982 − 12) / (0.1/√100) = −0.018/0.01 = −1.8.

Since Z < −1.96 or Z > 1.96 is the rejection region with the 5% least likely values, the computed value of −1.8 does not fall in the rejection region. Hence we cannot reject the null hypothesis (statistical conclusion).

Strictly speaking, we conclude that the machine is functioning according to bottling content specifications at the time the sample was taken (managerial conclusion). However, one might argue that it is possible for the machine to be “out of control” but that we have insufficient evidence to reject Ho at the 5% level. Compare a jury’s decision that the defendant is not guilty.
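The two-tailed test above can be sketched in Python (a minimal illustration; the helper name z_statistic is ours, not part of the course material):

```python
from math import sqrt

def z_statistic(ybar, mu0, sigma, n):
    """Standardized test statistic Z = (ybar - mu0) / (sigma / sqrt(n))."""
    return (ybar - mu0) / (sigma / sqrt(n))

# Bottling example: mu0 = 12 oz, sigma = 0.1 oz, n = 100, sample mean 11.982 oz.
z = z_statistic(11.982, 12.0, 0.1, 100)   # -0.018 / 0.01 = -1.8
# Two-tailed test at alpha = 0.05: reject Ho if |Z| > 1.96.
reject = abs(z) > 1.96
print(round(z, 4), reject)                # -1.8 False -> cannot reject Ho
```

Since |−1.8| < 1.96, the code reaches the same statistical conclusion as the hand calculation: do not reject Ho.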

Summary of Steps in Hypothesis Testing

  1. Identify the appropriate null and alternative hypotheses (Ho and HA). Be precise about the interpretation of hypotheses, by carefully identifying the population of interest.
  2. Choose an acceptable probability of Type I error (alpha).
  3. Define a relevant test statistic. Justify it.
  4. Determine the rejection region (for what values of the test statistic will Ho be rejected?).
  5. Collect the data (in practice, we need to consider the proper sample size so that the probability of a type II error is controlled), compute sample results and calculate the test statistic value.
  6. Draw statistical and managerial conclusions.

P-Values

An interesting issue with deciding on the level of Type I error is: who should decide what type I error probability is tolerable? What if the person conducting the test does not know the decision maker’s tolerance?

One solution to this problem is the following. Instead of testing H0 at a specified Type I error probability (alpha), we can report the probability of a Type I error if H0 is rejected. This probability is called the p-value.

In the example above, what is the probability of Type I error if the Ho is rejected?

We can answer this directly by looking at what fraction of the values of Z lies below −1.8 together with the fraction above +1.8.

We can do this either by looking at the normal tables in a statistics textbook or using Excel.

In Excel, the function =NORMSDIST(Z) (Hint: this is short for the Standard Normal Distribution) can be used to find P(X<Z). Plugging this function into Excel tells us that P(X<−1.8) is 0.036; i.e., 3.6% of the values lie below −1.8.

Since the rejection region includes both P(X<-1.8) and P(X>1.8), the probability of Type I error will be 0.036*2=0.072. Hence the p-value is 0.072.

Question: What should we do to compute p-value when Z is positive?

When Z is positive we need to compute P(X>Z). Since P(X>Z) = 1 − P(X<Z), simply compute 1 − NORMSDIST(Z) when Z is positive, and get the p-value by multiplying that number by 2.
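The same p-value can be computed without Excel, using the error function in Python's standard library (the helper names are ours; the identity Φ(z) = (1 + erf(z/√2))/2 gives the standard normal CDF exactly):

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(X < z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_tailed_p_value(z):
    """p-value for a two-tailed Z test: 2 * P(X < -|z|)."""
    return 2.0 * normal_cdf(-abs(z))

p = two_tailed_p_value(-1.8)
print(round(p, 3))   # 0.072, matching the NORMSDIST calculation in the text
```

Taking −|z| inside the helper handles positive and negative Z in one formula, so no separate case is needed when Z is positive.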

3. One-tailed versus Two-tailed tests

Two-tailed test:

Recall: In the example in the previous section, we conducted a two-tailed test. The null and the alternative hypotheses were:

Null Hypothesis: Ho: μ = 12 oz.

Alternative Hypothesis: HA: μ ≠ 12 oz.

We rejected the null if Z < −1.96 or Z > 1.96. This implies that the person conducting the test wants to stop the machine if it is out of control in either direction. That is, there is a cost associated with having an excessive amount of liquid as well as with an insufficient amount in the bottles.

One-tailed test:

Suppose in the machine-bottling problem, we take the perspective of a distributor, who does not care if the bottles truly have more than 12 oz. on average (or the bottles have a maximum capacity of 12 oz). That is, the distributor is only concerned about insufficient content on average. So the distributor takes samples out of batches of items and returns the entire batch if the hypothesis test shows that the contents are less than 12 oz on average. Now the null and the alternative hypotheses are:

Null Hypothesis: Ho: μ ≥ 12 oz.

Alternative Hypothesis: HA: μ < 12 oz.

For a one-tailed test, the alternative hypothesis expresses the values of the parameter for which we want to reject the null hypothesis. If we create mutually exclusive and collectively exhaustive hypotheses, the null hypothesis must then be the complement of the alternative. Note that it is usually easier, in practice, to start with the alternative hypothesis: it represents what management is concerned about or what a researcher might believe based on theory. The test proceeds under the assumption that the equality case under the null hypothesis applies. In other words, the machine is assumed to be in control, i.e., the true average is assumed to be 12 oz. But now only the 5% extreme cases in the left tail will result in rejection of the null hypothesis.

As before, we use the Z statistic because we know the population standard deviation. As argued above, the Z-statistic is still computed at the boundary of the null hypothesis (12 oz).

Z = (11.982 − 12) / (0.1/√100) = −1.8.

Since now all of the extreme values for rejection are concentrated in the left tail, the 5% rejection region is computed Z < −1.645. (See the figures on the next page.)

The interesting finding is that for the same test result, we cannot reject the null hypothesis at the 5% level if the test is two-tailed but we can if it is one-tailed.

Therefore, for the one-tailed test, we reject the null hypothesis. (Statistical Conclusion)

The batch of bottles received by the distributor from this manufacturer has an average content lower than the specified 12 oz and therefore must be returned to the manufacturer. (Managerial Conclusion)

Note that the two-tailed test is more conservative in rejecting the null hypothesis. While the z-score needs to be below −1.96 to reject the null in the two-tailed test, it needs to be just below −1.645 to reject the null in the one-tailed test. In this case, the distributor thus rejects the null hypothesis with a lower threshold of evidence than the manufacturer.
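The contrast between the one- and two-tailed decisions can be sketched in a few lines of Python (the critical values 1.96 and 1.645 are the standard normal cutoffs for alpha = 0.05):

```python
# Same computed test statistic from the bottling sample.
z = -1.8

# Two-tailed test at alpha = 0.05: critical values are -1.96 and +1.96.
reject_two_tailed = z < -1.96 or z > 1.96    # False: -1.8 is inside (-1.96, 1.96)

# One-tailed (left-tail) test at alpha = 0.05: critical value is -1.645.
reject_one_tailed = z < -1.645               # True: -1.8 < -1.645

print(reject_two_tailed, reject_one_tailed)  # False True
```

The same sample evidence fails to reject Ho in the two-tailed test but rejects it in the one-tailed test, exactly the finding noted above.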



4. What are appropriate test-statistics for hypothesis testing?

The choice of the appropriate test statistic is critical in hypothesis testing. In the example above we were testing a hypothesis about a population mean. We used a Z statistic, because we knew the population standard deviation and we knew Y was normally distributed. Even if Y is not normally distributed, we know by the Central Limit Theorem that Ȳ will be normally distributed for sufficiently large samples. Therefore we can use the Z statistic. However, if the population standard deviation is not known, we need to use a t statistic instead to compensate for the additional uncertainty due to using the sample standard deviation in place of the population value. Note that we still require the theoretical distribution of all possible sample means to be normal.

We now discuss appropriate test statistics for means and proportions under different conditions.

Test of One Mean

Population Standard Deviation is known:

H0: μ = μ0

HA: μ ≠ μ0

Condition: If (1) simple random sampling,

(2) Ȳ is normally distributed (because Y is normal or the Central Limit Theorem applies), and

(3) σ is known,

then use the Z statistic:

Z = (Ȳ − μ0) / σȲ where: σȲ = σ/√n

(assuming N is large so that we can ignore the finite population correction factor)

Population Standard Deviation is unknown:

Condition: If (1) simple random sampling,

(2) Ȳ is normally distributed (because Y is normal or the Central Limit Theorem applies), and

(3) σ is unknown,

then use the t statistic:

t = (Ȳ − μ0) / sȲ where: sȲ = s/√n

with (n−1) df (assuming N is large; re: finite population correction)
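A minimal sketch of the t statistic in Python, using a hypothetical sample of 10 bottle contents (the data are invented for illustration; 2.262 is the tabulated two-tailed 5% critical value for 9 df):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 10 bottle contents (oz); the population sigma is
# treated as unknown, so the sample standard deviation s replaces it.
sample = [11.9, 12.1, 12.0, 11.8, 12.2, 11.9, 12.0, 12.1, 11.8, 12.0]
n = len(sample)
ybar = mean(sample)                # sample mean
s = stdev(sample)                  # sample standard deviation (n-1 divisor)
t = (ybar - 12.0) / (s / sqrt(n))  # t = (ybar - mu0) / (s / sqrt(n))

# Compare |t| against the tabulated two-tailed 5% critical value for
# n - 1 = 9 degrees of freedom, which is 2.262.
reject = abs(t) > 2.262
print(round(t, 2), reject)
```

Note that statistics.stdev already uses the (n − 1) divisor, matching the degrees of freedom of the t statistic.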

Test of one proportion

H0: π = π0

HA: π ≠ π0

Condition: If (1) simple random sampling, and

(2) the sample proportion p is approximately normally distributed (for large n),

then we can use the Z statistic:

Z = (p − π0) / σp where: σp = √(π0(1 − π0)/n)

However, this statistic can be used only if the following conditions are satisfied:
n·π0 > 5 and n·(1 − π0) > 5

Example:

John Rowland and Bill Curry are candidates for CT Governor. We are interested in knowing who is likely to win the election on Nov 5, 2002. A survey of 900 CT “likely voters” on October 30, 2002 asked whom they intended to vote for on election day. 56% of respondents said they intended to vote for Rowland and 44% for Bill Curry. Test the hypothesis that one of the candidates is more likely to win the election.

Let π be the proportion of likely voters who intend to vote for Bill Curry on Nov 5, as of October 30. (You could just as well write the hypotheses in terms of voters intending to vote for Rowland and do the test that way.)
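A sketch of how this test could proceed in Python, assuming the natural null Ho: π = 0.5 (neither candidate leads among likely voters; this choice of null is our reading of the problem, not stated in the text):

```python
from math import sqrt

# One-proportion Z test for the election example.
n = 900
p_hat = 0.44          # sample proportion intending to vote for Curry
pi0 = 0.5             # assumed null: the candidates are tied

# Check the normal-approximation conditions: n*pi0 > 5 and n*(1 - pi0) > 5.
assert n * pi0 > 5 and n * (1 - pi0) > 5

sigma_p = sqrt(pi0 * (1 - pi0) / n)   # standard error under Ho: 0.5 / 30
z = (p_hat - pi0) / sigma_p           # -0.06 / 0.0167 = -3.6
reject = abs(z) > 1.96                # two-tailed test at alpha = 0.05
print(round(z, 2), reject)            # -3.6 True -> reject the tie
```

Since |−3.6| far exceeds 1.96, the sample evidence rejects the hypothesis of a tie; the data favor Rowland.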