13.4 Test of Independence: Contingency Tables

Motivating Example:

Objective:

We want to determine whether beer preference is independent of the gender of the beer drinker.

We want to test

$H_0$: Beer preference is independent of gender

vs.

$H_a$: Beer preference is not independent of gender

with $\alpha = 0.05$.

We have the following data:

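Beer Preference (columns) by Gender (rows)
Gender / Light / Regular / Dark / Total / Proportion
Male / 20 / 40 / 20 / 80 / $\frac{80}{150}$
Female / 30 / 30 / 10 / 70 / $\frac{70}{150}$
Total / 50 / 70 / 30 / 150 / 1
Proportion / $\frac{50}{150}$ / $\frac{70}{150}$ / $\frac{30}{150}$ / 1 /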

The above table is called a contingency table.

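If $H_0$ is true, then the expected numbers under $H_0$ are (sample size) $\times$ (row proportion) $\times$ (column proportion):

$e_{11} = 150 \cdot \frac{80}{150} \cdot \frac{50}{150} = 26.67$, $e_{12} = 150 \cdot \frac{80}{150} \cdot \frac{70}{150} = 37.33$, $e_{13} = 150 \cdot \frac{80}{150} \cdot \frac{30}{150} = 16$,

$e_{21} = 150 \cdot \frac{70}{150} \cdot \frac{50}{150} = 23.33$, $e_{22} = 150 \cdot \frac{70}{150} \cdot \frac{70}{150} = 32.67$, $e_{23} = 150 \cdot \frac{70}{150} \cdot \frac{30}{150} = 14$.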

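The expected numbers under $H_0$ can be summarized by

Beer Preference (columns) by Gender (rows)
Gender / Light / Regular / Dark / Proportion
Male / 26.67 / 37.33 / 16 / $\frac{80}{150}$
Female / 23.33 / 32.67 / 14 / $\frac{70}{150}$
Proportion / $\frac{50}{150}$ / $\frac{70}{150}$ / $\frac{30}{150}$ / 1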
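The expected numbers can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it relies on the fact that (sample size) x (row proportion) x (column proportion) simplifies to (row total) x (column total) / (sample size).

```python
# Observed counts from the beer table: rows = gender, columns = light, regular, dark.
observed = [
    [20, 40, 20],  # male
    [30, 30, 10],  # female
]

n = sum(sum(row) for row in observed)               # sample size
row_totals = [sum(row) for row in observed]         # one total per gender
col_totals = [sum(col) for col in zip(*observed)]   # one total per beer type

# Under independence: expected count = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 2) for e in row])
# Male [26.67, 37.33, 16.0]
# Female [23.33, 32.67, 14.0]
```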

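Intuitively, if the differences between the observed numbers and the expected numbers (under $H_0$), $f_{ij} - e_{ij}$, are small, that suggests $H_0$ is true, since the observed numbers are then close to what $H_0$ predicts. Here $f_{ij}$ is the observed number and $e_{ij}$ the expected number in row $i$ and column $j$. The following statistic can be used to reflect the difference between the observed numbers and the expected numbers:

$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{3} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} = \frac{(20-26.67)^2}{26.67} + \frac{(40-37.33)^2}{37.33} + \frac{(20-16)^2}{16} + \frac{(30-23.33)^2}{23.33} + \frac{(30-32.67)^2}{32.67} + \frac{(10-14)^2}{14} = 6.12$.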

General Case:

Suppose there are two variables: a column variable (with m categories) and a row variable (with p categories). We want to test the hypothesis

$H_0$: Row variable is independent of column variable

vs.

$H_a$: Row variable is not independent of column variable.

Suppose the sample size is n. The contingency table is

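Column Variable (m columns)
Row Variable (p rows) / 1 / ... / j / ... / m / proportions
1 / $f_{11}$ / ... / $f_{1j}$ / ... / $f_{1m}$ / $\hat{p}_{1\cdot}$
i / $f_{i1}$ / ... / $f_{ij}$ / ... / $f_{im}$ / $\hat{p}_{i\cdot}$
p / $f_{p1}$ / ... / $f_{pj}$ / ... / $f_{pm}$ / $\hat{p}_{p\cdot}$
proportions / $\hat{p}_{\cdot 1}$ / ... / $\hat{p}_{\cdot j}$ / ... / $\hat{p}_{\cdot m}$ / 1

Here $f_{ij}$ is the observed number in row $i$ and column $j$, while $\hat{p}_{i\cdot}$ and $\hat{p}_{\cdot j}$ are the sample proportions of row $i$ and column $j$.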

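If $H_0$ is true, then the expected numbers under $H_0$ are

Column Variable (m columns)
Row Variable (p rows) / 1 / ... / j / ... / m / proportions
1 / $e_{11}$ / ... / $e_{1j}$ / ... / $e_{1m}$ / $\hat{p}_{1\cdot}$
i / $e_{i1}$ / ... / $e_{ij}$ / ... / $e_{im}$ / $\hat{p}_{i\cdot}$
p / $e_{p1}$ / ... / $e_{pj}$ / ... / $e_{pm}$ / $\hat{p}_{p\cdot}$
proportions / $\hat{p}_{\cdot 1}$ / ... / $\hat{p}_{\cdot j}$ / ... / $\hat{p}_{\cdot m}$ / 1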

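Note:

$e_{ij} = n \, \hat{p}_{i\cdot} \, \hat{p}_{\cdot j} = \frac{f_{i\cdot} \, f_{\cdot j}}{n}$,

where

$\hat{p}_{i\cdot} = \frac{f_{i\cdot}}{n} = \frac{1}{n} \sum_{j=1}^{m} f_{ij}$

and

$\hat{p}_{\cdot j} = \frac{f_{\cdot j}}{n} = \frac{1}{n} \sum_{i=1}^{p} f_{ij}$.

That is, $f_{i\cdot}$ and $f_{\cdot j}$ are the totals of row $i$ and column $j$ of the observed numbers $f_{ij}$.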

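Thus, the chi-square statistic used to reflect the difference between the observed numbers and the expected numbers is

$\chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$,

where $f_{ij}$ is the observed number and $e_{ij}$ the expected number (under independence) in row $i$ and column $j$.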
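As a rough illustration, the statistic for a general table with p rows and m columns can be computed as follows. The function name `chi_square_statistic` is a hypothetical helper introduced here, not something from the notes, and the sketch assumes every row total and column total is positive so that no expected number is zero.

```python
def chi_square_statistic(observed):
    """Return (statistic, expected) for a p x m table of observed counts.

    Assumes every row and column total is positive, so no expected
    count is zero and the division below is safe.
    """
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all p * m cells.
    stat = sum(
        (f - e) ** 2 / e
        for f_row, e_row in zip(observed, expected)
        for f, e in zip(f_row, e_row)
    )
    return stat, expected


# Beer preference table from the motivating example.
beer = [[20, 40, 20], [30, 30, 10]]
stat, expected = chi_square_statistic(beer)
print(round(stat, 2))  # 6.12
```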

Next question: how large must $\chi^2$ be to reject $H_0$?

Chi-Square Test:

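Let

$\chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$,

where $f_{ij}$ and $e_{ij}$ are the observed and expected numbers in row $i$ and column $j$.

Provided $e_{ij} \ge 5$ for every $i$ and $j$, the chi-square test with level of significance $\alpha$ for

$H_0$: Row variable is independent of column variable

vs.

$H_a$: Row variable is not independent of column variable

is to

reject $H_0$ if $\chi^2 \ge \chi^2_{(p-1)(m-1),\alpha}$,

where $\chi^2_{(p-1)(m-1),\alpha}$ can be obtained by

$P\left(X \ge \chi^2_{(p-1)(m-1),\alpha}\right) = \alpha$, with $X$ a chi-square random variable with $(p-1)(m-1)$ degrees of freedom.

In addition,

p-value $= P\left(X \ge \chi^2\right)$.

Note: when $H_0$ is true, the random variable with sample value $\chi^2$ is approximately chi-square distributed with $(p-1)(m-1)$ degrees of freedom.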
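The critical value and the p-value are normally read from a chi-square table or obtained from statistical software. As a minimal standard-library sketch, the upper-tail probability has a closed form when the degrees of freedom are even: with df = 2k, it equals $e^{-x/2} \sum_{i=0}^{k-1} (x/2)^i / i!$. The helper names `chi2_sf_even_df` and `chi2_critical_df2` below are hypothetical, and the sketch deliberately does not cover odd degrees of freedom. The value 6.12 is what the statistic gives for the 2 x 3 beer table.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X >= x) for X chi-square distributed with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers even, positive df")
    half = x / 2.0
    k = df // 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


def chi2_critical_df2(alpha):
    """Upper-alpha critical value for df = 2, solving exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)


p, m = 2, 3                      # rows and columns of the table
df = (p - 1) * (m - 1)           # degrees of freedom, here 2
alpha = 0.05
observed_stat = 6.12             # chi-square statistic of the beer table

critical = chi2_critical_df2(alpha)
p_value = chi2_sf_even_df(observed_stat, df)

print(round(critical, 3))        # 5.991
print(round(p_value, 4))         # 0.0469
print(observed_stat >= critical) # True, so H0 is rejected at this alpha
```

Both rules agree by construction: the statistic exceeds the critical value exactly when the p-value is at most $\alpha$.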

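Example (continued)

Since $\chi^2 = 6.12$ and $\chi^2_{(2-1)(3-1),0.05} = \chi^2_{2,0.05} = 5.991$, we have $\chi^2 > \chi^2_{2,0.05}$ and thus we reject $H_0$. Also, since

p-value $= P(X \ge 6.12) \approx 0.047 < 0.05 = \alpha$, where $X$ is a chi-square random variable with 2 degrees of freedom,

we also reject $H_0$ based on the p-value. Therefore, we conclude that beer preference is not independent of the gender of the beer drinker.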

Example:

The following data are the number of people who are in favor of, are not in favor of, and have no comment on, some proposal:

Favor / Not Favor / No Comment
Male / 252 / 145 / 203
Female / 148 / 105 / 147

Please test whether females and males differ in their opinions about the proposal with $\alpha = 0.05$.

[solution:]

The column totals are 400, 250, and 350, while the row totals are 600 and 400. In addition, the total number is 1000.

The table for the expected numbers is

Gender / Favor / Not Favor / No Comment / Row Total
Male / 240 / 150 / 210 / 600
Female / 160 / 100 / 140 / 400
Column Total / 400 / 250 / 350 / 1000

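Thus,

$\chi^2 = \frac{(252-240)^2}{240} + \frac{(145-150)^2}{150} + \frac{(203-210)^2}{210} + \frac{(148-160)^2}{160} + \frac{(105-100)^2}{100} + \frac{(147-140)^2}{140} = 2.5$.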

Since $\chi^2 = 2.5 < 5.991 = \chi^2_{2,0.05}$, we do not reject $H_0$ at $\alpha = 0.05$.
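The solution can be double-checked numerically. The sketch below takes $\alpha = 0.05$ as an assumption and uses the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exactly $e^{-x/2}$. It also reports a p-value, which the solution above does not use.

```python
import math

# Observed counts: columns are favor, not favor, no comment.
observed = [
    [252, 145, 203],  # male
    [148, 105, 147],  # female
]

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Accumulate (observed - expected)^2 / expected over all cells.
stat = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / n
        stat += (observed[i][j] - e) ** 2 / e

alpha = 0.05                                   # assumed significance level
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = -2.0 * math.log(alpha)              # exact only because df == 2
p_value = math.exp(-stat / 2.0)                # exact only because df == 2
reject = stat >= critical

print(round(stat, 2), df, round(p_value, 4), reject)
# 2.5 2 0.2865 False
```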

Online Exercise:

Exercise 13.4.1

Exercise 13.4.2
