# Inference on Several Proportions: The Chi-Square Test (Large Sample)

AMS 572 Lecture Notes 7

Nov. 16th, 2009

**Inference on several proportions—the Chi-square test (large sample)**

Def. Multinomial Experiment.

We have a total of $n$ trials (sample size $= n$):

① Each trial results in exactly 1 of $k$ possible outcomes.

② The probability of getting outcome $i$ is $p_i$, and $\sum_{i=1}^{k} p_i = 1$.

③ The trials are independent.

e.g. Previous experience indicates that the probability of obtaining 1 healthy calf from a mating is 0.83. Similarly, the probabilities of obtaining 0 and 2 healthy calves are 0.15 and 0.02, respectively. If the farmer breeds 3 dams from the herd, find the probability of getting exactly 3 healthy calves.

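Def. Multinomial Distribution

Let $X_i$ be the number of trials that result in the $i$-th category out of a total of $n$ trials, and let $p_i$ be the probability of getting an $i$-th category outcome. Then

$$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}, \qquad \sum_{i=1}^{k} x_i = n.$$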

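Solution:

Let $X_0, X_1, X_2$ be the numbers of the 3 dams producing 0, 1, and 2 healthy calves. Exactly 3 healthy calves occur when $(X_0, X_1, X_2) = (1, 1, 1)$ or $(0, 3, 0)$, so

$$P(\text{exactly 3 healthy calves}) = \frac{3!}{1!\,1!\,1!}(0.15)(0.83)(0.02) + \frac{3!}{0!\,3!\,0!}(0.83)^3$$

$$= 0.015 + 0.572 \approx 0.59$$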
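As a check on the arithmetic above, here is a minimal Python sketch of the multinomial pmf. The function name `multinomial_pmf` is our own, not from the notes; the loop enumerates every split of the 3 dams into the categories "0, 1, 2 healthy calves" and keeps those with exactly 3 healthy calves in total.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_k = x_k) for a multinomial with n = sum(counts) trials."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(x) for x in counts)
    return coef * prod(p ** x for p, x in zip(probs, counts))

# Categories: a mating yields 0, 1, or 2 healthy calves.
probs = (0.15, 0.83, 0.02)
dams = 3

# Keep the splits (x0, x1, x2) of the 3 dams with x1 + 2*x2 == 3 healthy calves.
total = 0.0
for x0 in range(dams + 1):
    for x1 in range(dams + 1 - x0):
        x2 = dams - x0 - x1
        if x1 + 2 * x2 == 3:
            total += multinomial_pmf((x0, x1, x2), probs)

print(round(total, 3))  # 0.587
```

Only the splits $(1, 1, 1)$ and $(0, 3, 0)$ pass the filter, matching the two terms in the solution.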

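*Relation to the binomial distribution ($k = 2$):

| Category | 1 | 2 |
|---|---|---|
| Probability | $p_1 = p$ | $p_2 = 1 - p$ |
| # trials | $X_1 = x$ | $X_2 = n - x$ |

With $k = 2$ the multinomial pmf reduces to $P(X_1 = x) = \frac{n!}{x!\,(n-x)!}\,p^x(1-p)^{n-x}$, i.e., $X_1 \sim \text{Bin}(n, p)$.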

**Chi-square goodness of fit test**

e.g1. Gregor Mendel (1822-1884) was an Austrian monk whose genetic theory is one of the greatest scientific discoveries of all time. In his famous experiments with garden peas, he proposed a genetic model to explain inheritance. In particular, he studied how the shape (smooth or wrinkled) and color (yellow or green) of pea seeds are transmitted through generations. His model predicts that the second generation of peas from a certain ancestry should have the following distribution.

| | wrinkled-green | wrinkled-yellow | smooth-green | smooth-yellow |
|---|---|---|---|---|
| Theoretical probability | $p_{10} = 1/16$ | $p_{20} = 3/16$ | $p_{30} = 3/16$ | $p_{40} = 9/16$ |

n=556

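General test:

$H_0: p_i = p_{i0}$ for all $i = 1, \dots, k$ vs. $H_a$: $p_i \ne p_{i0}$ for at least one $i$ (i.e., test whether the theoretical probabilities are correct).

T.S. $\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \;\overset{H_0}{\approx}\; \chi^2_{k-1}$ (large sample),

where $O_i$ is the observed number of observations in category $i$, and

$E_i = n p_{i0}$ is the expected count of the $i$-th category under $H_0$.

At significance level $\alpha$, reject $H_0$ iff $\chi_0^2 \ge \chi^2_{k-1,\alpha}$.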


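Solution:

| | wrinkled-green | wrinkled-yellow | smooth-green | smooth-yellow |
|---|---|---|---|---|
| Theoretical probability | 1/16 | 3/16 | 3/16 | 9/16 |
| Observed count (out of 556) | $O_1 = 31$ | $O_2 = 102$ | $O_3 = 108$ | $O_4 = 315$ |
| Expected count | $E_1 = 34.75$ | $E_2 = 104.25$ | $E_3 = 104.25$ | $E_4 = 312.75$ |

T.S. $\chi_0^2 = \frac{(31-34.75)^2}{34.75} + \frac{(102-104.25)^2}{104.25} + \frac{(108-104.25)^2}{104.25} + \frac{(315-312.75)^2}{312.75} \approx 0.604$

$\chi_0^2 \approx 0.604 < \chi^2_{3,0.05} = 7.815$

At significance level 0.05, we cannot reject $H_0$: the data are consistent with Mendel's model.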
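A minimal Python sketch of the goodness-of-fit statistic applied to Mendel's counts. The helper name `chi_square_gof` is ours; the critical value 7.815 is taken from the notes rather than computed, so no statistics library is needed.

```python
def chi_square_gof(observed, probs):
    """Pearson statistic sum (O_i - E_i)^2 / E_i with E_i = n * p_i0."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

# Categories: wrinkled-green, wrinkled-yellow, smooth-green, smooth-yellow.
observed = [31, 102, 108, 315]
probs = [1 / 16, 3 / 16, 3 / 16, 9 / 16]

stat, expected = chi_square_gof(observed, probs)
critical = 7.815  # chi-square upper 0.05 point with k - 1 = 3 df (from the notes)

print(expected)          # [34.75, 104.25, 104.25, 312.75]
print(round(stat, 3))    # 0.604
print(stat >= critical)  # False, so H0 is not rejected
```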

e.g2. A classic tale involves four car-pooling students who missed a test and gave a flat tire as their excuse. On the make-up test, the professor asked the students to identify the particular tire that went flat. If they really did not have a flat tire, would they be able to identify the same tire?

To mimic this situation, 40 other students were asked to identify the tire they would select.

The data are:

| Tire | Left front | Right front | Left rear | Right rear |
|---|---|---|---|---|
| Frequency | 11 | 15 | 8 | 6 |

At $\alpha = 0.05$, test whether each tire has the same chance of being selected.

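Solution:

$H_0: p_1 = p_2 = p_3 = p_4 = 1/4$ vs. $H_a$: at least one $p_i \ne 1/4$.

$n = 40$, $E_i = n p_{i0} = 40 \times \frac{1}{4} = 10$ for every tire.

$\chi_0^2 = \frac{(11-10)^2 + (15-10)^2 + (8-10)^2 + (6-10)^2}{10} = \frac{46}{10} = 4.6 < \chi^2_{3,0.05} = 7.815$

Fail to reject $H_0$.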
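The same computation for the tire data, as a short Python sketch. Under $H_0$ every expected count is $n/4$; the critical value 7.815 again comes from the notes ($k - 1 = 3$ df).

```python
observed = {"left front": 11, "right front": 15, "left rear": 8, "right rear": 6}

n = sum(observed.values())      # 40 students
expected = n / len(observed)    # E_i = n * (1/4) = 10 under H0
stat = sum((o - expected) ** 2 / expected for o in observed.values())
critical = 7.815                # chi-square upper 0.05 point, 3 df

print(n, expected, round(stat, 2))  # 40 10.0 4.6
print("reject H0" if stat >= critical else "fail to reject H0")
```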

**The chi-square goodness of fit test is an extension of the Z-test for one population proportion.**

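Data: sample size $n$; $x$ successes, each trial a success with probability $p$;

$n - x$ failures, each with probability $1 - p$.

$H_0: p = p_0$ vs. $H_a: p \ne p_0$

T.S. $Z_0 = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} \;\overset{H_0}{\approx}\; N(0,1)$, where $\hat p = x/n$

At $\alpha$, reject $H_0$ iff $|Z_0| \ge z_{\alpha/2}$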

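| | Success | Failure |
|---|---|---|
| Expected | $E_1 = np_0$ | $E_2 = n(1-p_0)$ |
| Observed | $O_1 = x$ | $O_2 = n - x$ |

$\chi_0^2 = \frac{(x - np_0)^2}{np_0} + \frac{[(n - x) - n(1 - p_0)]^2}{n(1 - p_0)} = (x - np_0)^2\left[\frac{1}{np_0} + \frac{1}{n(1-p_0)}\right]$

$= \frac{(x - np_0)^2}{np_0(1 - p_0)} = \left(\frac{x/n - p_0}{\sqrt{p_0(1-p_0)/n}}\right)^2 = Z_0^2$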

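Recall: if $Z \sim N(0,1)$, then $Z^2 \sim \chi^2_1$.

When $k = 2$ (so $k - 1 = 1$ degree of freedom), $\chi_0^2 = Z_0^2 \;\overset{H_0}{\approx}\; \chi^2_1$.

Let $Z \sim N(0,1)$; then $\alpha = P(|Z| \ge z_{\alpha/2}) = P(Z^2 \ge z_{\alpha/2}^2)$, so

$\chi^2_{1,\alpha} = z_{\alpha/2}^2$ (e.g., $1.96^2 \approx 3.841 = \chi^2_{1,0.05}$).

Hence $\chi_0^2 \ge \chi^2_{1,\alpha} \iff |Z_0| \ge z_{\alpha/2}$: the two tests are identical.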
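A quick numerical check of the identity $\chi_0^2 = Z_0^2$ in Python. The data ($x = 58$ successes in $n = 100$ trials, $p_0 = 0.5$) are made up purely for illustration, and the function names are ours.

```python
from math import sqrt

def z_stat(x, n, p0):
    """One-proportion statistic (p_hat - p0) / sqrt(p0 (1 - p0) / n)."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def chi_square_two_cells(x, n, p0):
    """Goodness-of-fit statistic with k = 2 cells (successes, failures)."""
    observed = (x, n - x)
    expected = (n * p0, n * (1 - p0))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 58 successes in 100 trials, H0: p = 0.5.
x, n, p0 = 58, 100, 0.5
z = z_stat(x, n, p0)
chi2 = chi_square_two_cells(x, n, p0)
print(z ** 2, chi2)  # both equal 2.56 up to floating-point rounding
```

Here $Z_0 = 1.6$, so $|Z_0| < 1.96$ and $\chi_0^2 = 2.56 < 3.841$: both tests fail to reject at $\alpha = 0.05$, as the identity requires.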

Nov. 23rd, 2009

- Fisher’s exact test:

**Inference on 2 population proportions, 2 independent samples**

e.g1. The results of a randomized clinical trial comparing Prednisone (Pred) and Prednisone + VCR (PVCR) are summarized below. Test whether the success and failure probabilities are the same for the two drugs.

| Drug | Success | Failure | Row total |
|---|---|---|---|
| Pred | 14 | 7 | $n_1 = 21$ |
| PVCR | 38 | 4 | $n_2 = 42$ |
| Column total | $m = 52$ | $n - m = 11$ | $n = 63$ |

General setting:

| | "S" | "F" | Total |
|---|---|---|---|
| Sample 1 | $x$ | $n_1 - x$ | $n_1$ |
| Sample 2 | $y$ | $n_2 - y$ | $n_2$ |
| Total | $m = x + y$ | $n - m$ | $n$ |

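Solution:

$H_0: p_1 = p_2$ vs. $H_a: p_1 \ne p_2$. Conditional on the margins, under $H_0$ the number of successes $X$ in sample 1 is hypergeometric:

$$P(X = x) = \frac{\binom{n_1}{x}\binom{n_2}{m - x}}{\binom{n}{m}}$$

The p-value sums these probabilities over all tables with the same margins that are at least as extreme as the observed one ($x = 14$): p-value $< 0.05$.

Reject $H_0$.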
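A minimal Python sketch of the exact computation, assuming the common two-sided convention that "at least as extreme" means "no more probable than the observed table". The function names are ours, and the small relative tolerance guards against floating-point ties.

```python
from math import comb

def hypergeom_pmf(x, n1, n2, m):
    """P(X = x): successes in sample 1 given row totals n1, n2 and m total successes."""
    return comb(n1, x) * comb(n2, m - x) / comb(n1 + n2, m)

def fisher_exact_two_sided(x, n1, n2, m):
    """Sum P(X = k) over all feasible k whose probability is <= P(X = x)."""
    lo, hi = max(0, m - n2), min(n1, m)
    p_obs = hypergeom_pmf(x, n1, n2, m)
    return sum(
        hypergeom_pmf(k, n1, n2, m)
        for k in range(lo, hi + 1)
        if hypergeom_pmf(k, n1, n2, m) <= p_obs * (1 + 1e-7)
    )

# Pred: 14 successes out of 21; PVCR: 38 successes out of 42; m = 52 successes.
p_value = fisher_exact_two_sided(x=14, n1=21, n2=42, m=52)
print(round(p_value, 4))  # 0.0323, below 0.05
```

Under this convention the included tables are $x = 10, \dots, 14$ and $x = 21$.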

SAS code:

```sas
data trial;
   input drug $ outcome $ count;
   datalines;
pred S 14
pred F 7
PVCR S 38
PVCR F 4
;
run;

/* For a 2x2 table, the CHISQ option also prints Fisher's exact test */
proc freq data=trial;
   tables drug*outcome / chisq;
   weight count;
run;
```

- McNemar’s test

**Inference on 2 population proportions: paired samples**

e.g2. A preference poll of a panel of 75 voters was conducted before and after a TV debate during the campaign for the 1980 presidential election between Jimmy Carter and Ronald Reagan. Test whether there was a significant shift from Carter as a result of the TV debate.

| Preference before \ Preference after | Carter | Reagan |
|---|---|---|
| Carter | 28 | 13 |
| Reagan | 7 | 27 |

**General setting:**

| Condition 1 response \ Condition 2 response | Yes | No |
|---|---|---|
| Yes | $A = a$, $p_{11}$ | $B = b$, $p_{12}$ |
| No | $C = c$, $p_{21}$ | $D = d$, $p_{22}$ |

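$p_{11} + p_{12} + p_{21} + p_{22} = 1$, $A + B + C + D = n$, $(A, B, C, D) \sim \text{Multinomial}(n;\, p_{11}, p_{12}, p_{21}, p_{22})$

$H_0: p_{11} + p_{12} = p_{11} + p_{21}$ (same probability of "Yes" under both conditions), i.e., $H_0: p_{12} = p_{21}$

$B \mid B + C = m \;\sim\; \text{Bin}\!\left(m,\ p = \frac{p_{12}}{p_{12} + p_{21}}\right)$

Under $H_0$, $B \mid B + C = m \;\sim\; \text{Bin}(m,\ p = 1/2)$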

①

②

③

Solution:

SAS code:

```sas
data election;
   input before $ after $ count;
   datalines;
Carter Carter 28
Carter Reagan 13
Reagan Reagan 27
Reagan Carter 7
;
run;

/* For a 2x2 table, the AGREE option prints McNemar's test */
proc freq data=election;
   tables before*after / agree;
   weight count;
run;
```
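A hand computation for the election data, sketched in Python. Only the discordant cells matter: $b = 13$ voters switched Carter to Reagan and $c = 7$ switched Reagan to Carter. We assume a one-sided alternative for "shift away from Carter" ($p_{12} > p_{21}$), so large $b$ is extreme under $B \mid B + C = m \sim \text{Bin}(m, 1/2)$. The large-sample McNemar statistic $(b - c)^2/(b + c)$ is compared with $\chi^2_{1,0.05} = 3.841$. Variable names are ours.

```python
from math import comb

b = 13  # Carter before, Reagan after
c = 7   # Reagan before, Carter after
m = b + c

# Exact one-sided p-value: P(B >= b) with B ~ Bin(m, 1/2) under H0.
p_exact = sum(comb(m, k) for k in range(b, m + 1)) / 2 ** m

# Large-sample McNemar statistic, approximately chi-square(1) under H0.
mcnemar = (b - c) ** 2 / (b + c)

print(round(p_exact, 4))  # 0.1316
print(mcnemar)            # 1.8
```

Both versions point the same way: the p-value exceeds 0.05 and $1.8 < 3.841$, so this sketch does not detect a significant shift away from Carter.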