CHAPTER 15
Section 15.1 The Chi-square Test for Two-Way Tables
- Conducting a Chi-square test
The Chi-square test, for either independence or homogeneity, is conducted in the context of having information on two categorical variables. The information can consist of raw data or of data already organized in a table.
In Chapter 6 we saw how to conduct the test when raw data are available. In that case the first step was to tabulate the observations in a two-way table, using Stat > Tables > Cross Tabulation and Chi-Square. In the Cross Tabulation and Chi-Square window, the appropriate columns of the raw data were selected for the Categorical variables: dialog boxes. Next, the Chi-Square analysis button was selected to perform the Chi-square test, and the option Display Counts was selected to produce the counts (number of observations) in each cell of the table.
In Example 15.1, the raw data are not available; the data have already been tabulated. We need to type the table into a Minitab worksheet in order to perform the analysis.
Use File > New to create a new worksheet and type in the data as shown in the worksheet below:
Save the worksheet with a name related to the problem to make it easy to remember. To save the file, use File > Save Current Worksheet As, then type the name of the file in the File name dialog box.
To perform the Chi-square test, use Stat > Tables > Chi-Square Test (Two-Way Table in Worksheet) and then select the columns that contain the counts of the table:
After clicking OK the following output is obtained:
Chi-Square Test: No Infection, Yes Infection
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
       No Infection  Yes Infection  Total
    1           129             49    178
             138.93          39.07
              0.709          2.522
    2           150             29    179
             139.71          39.29
              0.758          2.696
    3           137             39    176
             137.37          38.63
              0.001          0.003
Total           416            117    533
Chi-Sq = 6.690, DF = 2, P-Value = 0.035
This is the same display you see in the book for Example 15.1 continued. The treatments (Placebo, Xyl Gum and Xyl Loz) are listed as 1, 2 and 3.
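For readers who want to check the numbers in this display outside Minitab, the following is a minimal Python sketch (Python is not part of the Minitab workflow described here, and the variable names are illustrative assumptions). It computes the expected counts, the cell contributions, the chi-square statistic and the degrees of freedom from the observed counts above; finding the p-value is left out of this sketch.

```python
# Observed counts from the table above: rows are treatments 1-3,
# columns are (No Infection, Yes Infection).
observed = [
    [129, 49],
    [150, 29],
    [137, 39],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for a cell = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of a cell = (observed - expected)^2 / expected
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi_sq = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

for exp_row, con_row in zip(expected, contributions):
    print([round(e, 2) for e in exp_row], [round(c, 3) for c in con_row])
print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}")
```

The printed expected counts and contributions can be compared line by line with the second and third rows of each cell in the Minitab output.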
- Finding a p-value given the chi-square value and degrees of freedom.
The test statistic of the Chi-square test is not difficult to calculate: χ² = Σ (observed − expected)² / expected, where the sum is taken over all cells of the table. If you calculated the statistic manually and want to find the corresponding p-value, from the menu select
Calc > Probability Distributions > Chi-Square
We first need to find P(X ≤ 6.69): select Cumulative probability, enter 2 for the Degrees of freedom and enter 6.69 (the value of the Chi-square statistic) in the Input constant dialog box.
The output is:
Cumulative Distribution Function
Chi-Square with 2 DF
x P(X<=x)
6.69 0.964740
Hence, the p-value is P(X > 6.69) = 1 – P(X ≤ 6.69) = 1 - 0.96474 = 0.03526.
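The same upper-tail probability can be sketched in Python using only the standard library. The function name chi_square_p_value is a hypothetical helper, not a Minitab or library routine; it evaluates the cumulative probability with the standard power series for the regularized incomplete gamma function and then subtracts from 1, mirroring the two steps above.

```python
import math

def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X > statistic) for a chi-square variable."""
    a = df / 2.0
    z = statistic / 2.0
    if z <= 0:
        return 1.0
    # Power series for the regularized lower incomplete gamma function,
    # which equals the cumulative probability P(X <= statistic).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    cumulative = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - cumulative

p_value = chi_square_p_value(6.69, 2)
print(f"P(X > 6.69) = {p_value:.5f}")
```

Because the result is obtained as 1 minus a cumulative probability, this sketch loses accuracy for extremely small p-values; for values of the size seen in this chapter that is not a concern.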
(Alternatively) Use Graph > Probability Distribution Plot > View Probability
Select the Chi-Square distribution with 2 degrees of freedom, as shown below.
Select Shaded Area, X Value, Right Tail and enter 6.69 in the dialog box.
Clicking OK produces the desired result.
Section 15.2 Fisher’s Exact test
Fisher’s exact test for 2 × 2 tables, presented in Section 15.2 of the textbook, will be illustrated using the following example (this is not the example in Figure 15.6): ‘1 in 10 people taking Echinacea get a cold during the study while 4 of 10 taking a placebo get a cold.’ These data can be displayed as a two-way table, as seen in the following worksheet.
As described in the book, ‘given that 5 out of the 20 participants get a cold, the probability that only 1 or fewer are in the Echinacea group’ is the p-value for a one-sided Fisher’s Exact Test. Version 13 of Minitab and earlier versions do not have Fisher’s exact test as a menu option; however, the p-value can be calculated using the hypergeometric distribution. From the menu, select Calc > Probability Distributions > Hypergeometric and remember that, given there are 5 ‘successes’ (colds) in the total group of 20 individuals, we want to calculate the probability that 1 or fewer are in the group of 10 who received Echinacea. Fill in the dialog boxes as shown below.
The output is:
Cumulative Distribution Function
Hypergeometric with N = 20, X = 5, and n = 10
x P( X <= x )
1.00 0.1517
The experimental data do not provide enough evidence (p-value = 0.1517, rounded in the book to 0.152) that taking Echinacea reduces the risk of getting a cold.
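The hypergeometric calculation can also be written out directly from binomial coefficients. The Python sketch below is only an illustration; hypergeometric_cdf and its parameter names are hypothetical, chosen to match the description above (5 colds among 20 participants, 10 of whom took Echinacea).

```python
from math import comb

def hypergeometric_cdf(x, population, successes, sample):
    """P(X <= x), where X counts the successes in a sample drawn without
    replacement from a population containing `successes` successes."""
    total_ways = comb(population, sample)
    upper = min(x, successes, sample)
    favorable = sum(
        comb(successes, k) * comb(population - successes, sample - k)
        for k in range(0, upper + 1)
    )
    return favorable / total_ways

# 5 colds among 20 participants; 10 participants took Echinacea.
p_one_sided = hypergeometric_cdf(1, population=20, successes=5, sample=10)
print(f"P(X <= 1) = {p_one_sided:.4f}")
```

The two terms in the sum correspond to 0 colds and 1 cold in the Echinacea group.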
Note: Minitab 15 includes Fisher’s Exact Test for the two-sided alternative. Create a new worksheet and enter the data as shown below.
Select Stat > Tables > Cross Tabulation and Chi-Square and fill in the dialog boxes as shown below.
Select Other Stats… and check Fisher’s exact test for 2 x 2 tables.
The output is
Tabulated statistics: Treatment, Cold
Using frequencies in Count
Rows: Treatment Columns: Cold
           No  Yes  All
Echinacea   9    1   10
Placebo     6    4   10
All        15    5   20
Cell Contents: Count
Fisher's exact test: P-Value = 0.303406
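Note that the p-value (0.303406) for the two-sided alternative is equal to 2 times the p-value (0.1517) found previously for the one-sided alternative. This doubling holds here because the two treatment groups are the same size, which makes the hypergeometric distribution symmetric; it does not hold in general.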
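As a cross-check on the two-sided value, here is a minimal Python sketch. It assumes one common definition of the two-sided p-value: the total probability of all tables with the same margins that are no more probable than the observed table. The source does not state which definition Minitab uses, but this definition reproduces the value above for these data. The function name fisher_two_sided is hypothetical.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def weight(k):
        # Number of arrangements with k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed_weight = weight(a)
    # Integer weights are compared, so ties are detected exactly.
    extreme = sum(
        weight(k)
        for k in range(lowest, highest + 1)
        if weight(k) <= observed_weight
    )
    return extreme / total_tables

# Rows: Echinacea, Placebo; columns: No cold, Yes cold
p_two_sided = fisher_two_sided(9, 1, 6, 4)
print(f"Fisher's exact test: P-Value = {p_two_sided:.6f}")
```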
Section 15.3 Chi-square goodness of fit test
The question in Example 15.8 is whether the digits 0, 1, 2, …, 9 are equally likely to occur in the lottery. Select File > New to create a new worksheet and type in the data. Since the null hypothesis says that all digits are equally likely, the expected count for each digit is the sum of the observed counts (500) divided by the number of digits (10), that is, 50.
Use Calc > Calculator to form the new column: (observed − expected)²/expected.
Now, use Calc > Column Statistics to calculate the sum of the values in C4. This sum is the chi-square statistic.
The output is: Sum of C4 = 6.0400
To find the p-value corresponding to the value of the chi-square statistic (6.04), from the menu select Calc > Probability Distributions > Chi-Square, enter 9 (the number of digits minus 1, that is, 10 − 1) for the Degrees of freedom and 6.04 (the chi-square statistic) in the Input constant dialog box.
The output is:
Cumulative Distribution Function
Chi-Square with 9 DF
x P(X<=x)
6.04 0.264092
Therefore the p-value is 1 − 0.264092 = 0.735908 and the null hypothesis is not rejected (the data are consistent with the 10 digits being equally likely).
(Alternatively) Based on the above worksheet with columns C1 (digit) and C2 (observed), select Stat > Tables > Chi-Square Goodness-of-Fit Test and fill in the dialog boxes as shown below. Since we are testing H0: p = 1/10 for each of the 10 possible digits in the first container, we select Equal proportions.
Clicking OK gives the following. The graph below shows that the greatest difference between the observed and expected counts occurs for the digit 5.
Chi-Square Goodness-of-Fit Test for Observed Counts in Variable: observed
Using category names in digit
                          Test            Contribution
Category  Observed  Proportion  Expected     to Chi-Sq
0               47         0.1        50          0.18
1               50         0.1        50          0.00
2               55         0.1        50          0.50
3               46         0.1        50          0.32
4               53         0.1        50          0.18
5               39         0.1        50          2.42
6               55         0.1        50          0.50
7               55         0.1        50          0.50
8               44         0.1        50          0.72
9               56         0.1        50          0.72

  N  DF  Chi-Sq  P-Value
500   9    6.04    0.736
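The goodness-of-fit calculation in this section can be sketched in Python with only the standard library. This is an illustration under stated assumptions rather than part of the Minitab procedure: the observed counts are copied from the output above, and chi_square_upper_tail is a hypothetical helper that evaluates the chi-square cumulative probability by a power series.

```python
import math

observed = [47, 50, 55, 46, 53, 39, 55, 55, 44, 56]  # counts for digits 0-9
expected = sum(observed) / len(observed)              # 500 / 10 under H0

# Contribution of each digit = (observed - expected)^2 / expected
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_sq = sum(contributions)
df = len(observed) - 1

def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    # Series for the cumulative probability P(X <= x)
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a))

p_value = chi_square_upper_tail(chi_sq, df)
largest = contributions.index(max(contributions))

print(f"Chi-Sq = {chi_sq:.2f}, DF = {df}, P-Value = {p_value:.3f}")
print(f"Largest contribution comes from digit {largest}")
```

The list index doubles as the digit, so the position of the largest contribution identifies the digit whose observed count is furthest from the expected count.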