Ch8. Inference on Two Populations

AMS 572 Lecture Notes 6

Oct. 14th, 2008


I. Paired samples:

Compute the paired differences $D_i = X_i - Y_i$. The problem then becomes a one-population problem on the $D_i$.
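In symbols, assuming the differences are i.i.d. normal, the one-population t procedures are applied to the $D_i$. This is a sketch of the standard result, not a new method.

```latex
D_i = X_i - Y_i,\quad i = 1,\dots,n, \qquad D_1,\dots,D_n \overset{iid}{\sim} N(\mu_D,\sigma_D^2), \qquad \mu_D = \mu_1 - \mu_2

T = \frac{\bar D - \mu_D}{S_D/\sqrt{n}} \sim t_{n-1}
\qquad\Longrightarrow\qquad
\bar D \pm t_{n-1,\alpha/2}\,\frac{S_D}{\sqrt{n}} \ \text{is a } 100(1-\alpha)\% \text{ C.I. for } \mu_D
```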

II. Independent samples:

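1. Normal populations with unknown population variances

However, we know that the two variances are equal: $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

Data: $X_{11}, \dots, X_{1n_1} \overset{iid}{\sim} N(\mu_1, \sigma^2)$ and $X_{21}, \dots, X_{2n_2} \overset{iid}{\sim} N(\mu_2, \sigma^2)$; the two samples are independent.

① Point estimator: $\bar X_1 - \bar X_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$, so

$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1)$

$Z$ is not the PQ, since we don't know $\sigma$.

($\bar X_1$, $\bar X_2$, $S_1^2$, $S_2^2$ are independent.)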

Definition: If $Z_1, \dots, Z_k \overset{iid}{\sim} N(0,1)$, then $W = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$.

Definition: $\chi^2_k$ is a special gamma random variable: $\chi^2_k = \text{Gamma}(\alpha = k/2,\ \beta = 2)$.
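For reference, here are the two densities side by side. This assumes the shape-scale parametrization $\text{Gamma}(\alpha, \beta)$ with mean $\alpha\beta$; that the notes use this convention is an assumption.

```latex
f_{\text{Gamma}(\alpha,\beta)}(w) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, w^{\alpha-1} e^{-w/\beta},\ w > 0
\qquad\Longrightarrow\qquad
f_{\chi^2_k}(w) = \frac{1}{\Gamma(k/2)\,2^{k/2}}\, w^{k/2-1} e^{-w/2}

E(W) = \alpha\beta = k, \qquad \operatorname{Var}(W) = \alpha\beta^{2} = 2k
```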

Definition (t-distribution): $T = \frac{Z}{\sqrt{W/k}} \sim t_k$, when $Z \sim N(0,1)$, $W \sim \chi^2_k$, and $Z$ & $W$ are independent.

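Let $W = \frac{(n_1-1)S_1^2}{\sigma^2} + \frac{(n_2-1)S_2^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}$. This holds because $\frac{(n_i-1)S_i^2}{\sigma^2} \sim \chi^2_{n_i-1}$ for $i = 1, 2$, the two terms are independent, and the degrees of freedom add for a sum of independent chi-square random variables.

$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}}$ and $W$ are independent, so

$T = \frac{Z}{\sqrt{W/(n_1+n_2-2)}} = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2}$

where $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ is the pooled variance; the unknown $\sigma$ cancels.

This is the PQ for inference on the parameter of interest, $\mu_1 - \mu_2$.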
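A minimal Python sketch of this PQ, evaluated at a hypothesized difference $\delta_0$. The function name `pooled_t` and the data are hypothetical and for illustration only; the course itself uses SAS.

```python
import math
from statistics import mean, variance


def pooled_t(x1, x2, delta0=0.0):
    """Pooled-variance two-sample t statistic for H0: mu1 - mu2 = delta0.

    Assumes normal populations with a common unknown variance.
    Returns (T, Sp^2, degrees of freedom).
    """
    n1, n2 = len(x1), len(x2)
    # statistics.variance uses the n - 1 divisor, i.e. the sample variance S^2
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(x1) - mean(x2) - delta0) / se
    return t, sp2, n1 + n2 - 2


# Hypothetical samples: S1^2 = 2.5, S2^2 = 1, so Sp^2 = (4*2.5 + 2*1)/6 = 2
t, sp2, df = pooled_t([1, 2, 3, 4, 5], [4, 5, 6])
print(t, sp2, df)
```

Here $T = -2/\sqrt{2(1/5 + 1/3)} = -\sqrt{15}/2 \approx -1.936$ with 6 degrees of freedom.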

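② Confidence interval for $\mu_1 - \mu_2$

Write $t_{k,\alpha}$ for the upper-$\alpha$ critical value of $t_k$. Since the PQ $T \sim t_{n_1+n_2-2}$, we have $P\left(-t_{n_1+n_2-2,\alpha/2} \le T \le t_{n_1+n_2-2,\alpha/2}\right) = 1 - \alpha$. Solving the inequalities for $\mu_1 - \mu_2$ gives

$(\bar X_1 - \bar X_2) \pm t_{n_1+n_2-2,\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

This is the $100(1-\alpha)\%$ C.I. for $\mu_1 - \mu_2$.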
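A small Python sketch of this interval. The function name `pooled_t_ci` is hypothetical. The critical value $t_{n_1+n_2-2,\alpha/2}$ is passed in (for example, read from a t table) because Python's standard library has no t quantile function.

```python
import math
from statistics import mean, variance


def pooled_t_ci(x1, x2, t_crit):
    """100(1 - alpha)% CI for mu1 - mu2 (normal populations, equal variances).

    t_crit is the upper alpha/2 critical value t_{n1+n2-2, alpha/2}.
    """
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    half_width = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(x1) - mean(x2)
    return diff - half_width, diff + half_width


# Hypothetical samples with df = 6; t_{6, 0.025} is about 2.447
lo, hi = pooled_t_ci([1, 2, 3, 4, 5], [4, 5, 6], t_crit=2.447)
print(lo, hi)
```

The interval is roughly $(-4.53, 0.53)$. It contains 0, which is consistent with not rejecting $H_0: \mu_1 = \mu_2$ at $\alpha = 0.05$ for these data.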

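③ Test: $H_0: \mu_1 - \mu_2 = \delta_0$ versus $H_a: \mu_1 - \mu_2 > \delta_0$ (most often $\delta_0 = 0$).

Test statistic: $T_0 = \frac{(\bar X_1 - \bar X_2) - \delta_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \overset{H_0}{\sim} t_{n_1+n_2-2}$

At the significance level $\alpha$, if $T_0 \ge t_{n_1+n_2-2,\alpha}$, reject $H_0$ in favor of $H_a$.

For the two-sided alternative $H_a: \mu_1 - \mu_2 \neq \delta_0$: if $|T_0| \ge t_{n_1+n_2-2,\alpha/2}$, reject $H_0$.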

Oct. 16th, 2008

e.g. In a study, 5 male volunteers participated in a two-phase experimental session. In the first phase, respiration was measured while the subject was awake and at rest. In the second phase, the subject was told to imagine that he was performing muscular work, and respiration was measured again. The following table shows the total ventilation (liters of air per minute per square meter of body area) for the 5 subjects.

Subject / Rest / Work / Diff (= Rest - Work)
1 / 6 / 6 / 0
2 / 7 / 9 / -2
3 / 8 / 9 / -1
4 / 7 / 10 / -3
5 / 6 / 7 / -1

(1) Use a suitable test to investigate whether there is any difference between the two phases in terms of total ventilation. State the assumptions of the test and report the p-value. At the 0.05 significance level, what is your conclusion?

(2) Write the entire SAS program needed to answer the questions in (1), including the data step as well as the tests of the two assumptions.

Solution:

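(1) Assumption: the paired differences $d = \text{rest} - \text{work}$ are a random sample from a normal distribution. We test $H_0: \mu_d = 0$ versus $H_a: \mu_d \neq 0$ with the paired t-test.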

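Test statistic: with $n = 5$, $\bar d = -1.4$ and $s_d = \sqrt{1.3} \approx 1.140$,

$t_0 = \frac{\bar d - 0}{s_d/\sqrt{n}} = \frac{-1.4}{1.140/\sqrt{5}} \approx -2.746$, which follows $t_4$ under $H_0$.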

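P-value $= 2P(T_4 < -2.746) \approx 2 \times 0.026 = 0.052 > 0.05$, so we fail to reject $H_0$: at the 0.05 level, there is not enough evidence of a difference in total ventilation between the two phases.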
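As a check on the arithmetic in (1), this small Python sketch (not part of the course's SAS workflow) recomputes $t_0$ from the table and the two-sided p-value. Python's standard library has no t-distribution, so the hypothetical helper `t_two_sided_p` integrates the $t_{n-1}$ density numerically with Simpson's rule. In practice a statistics package would be used instead.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T_df| >= |t|) via Simpson's rule on the t density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def f(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    s = f(0) + f(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    central = s * h / 3                # P(0 <= T <= |t|)
    return 1 - 2 * central


rest = [6, 7, 8, 7, 6]
work = [6, 9, 9, 10, 7]
d = [r - w for r, w in zip(rest, work)]    # paired differences
n = len(d)
t0 = mean(d) / (stdev(d) / math.sqrt(n))   # H0: mu_d = 0
p = t_two_sided_p(t0, n - 1)
print(f"t0 = {t0:.4f}, p-value = {p:.4f}")  # t0 = -2.7456, p-value = 0.0516
```

The result agrees with the Student's t line of the SAS output below (t = -2.74563, Pr > |t| = 0.0516).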

(2) SAS Code:

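data vent;
  input subject rest work;
  diff = rest - work;    /* paired difference */
  datalines;
1 6 6
2 7 9
3 8 9
4 7 10
5 6 7
;
run;

/* NORMAL adds the Tests for Normality; the Tests for Location table
   gives the paired t-test, sign test, and signed rank test on diff */
proc univariate data=vent normal;
  var diff;
run;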

Note: Check the Shapiro-Wilk test. If its p-value > 0.05, normality is not rejected at the 0.05 significance level, and we can use the t-test. If the p-value ≤ 0.05, normality is rejected, and we use a nonparametric test (the sign test or the Wilcoxon signed rank test).

Result:

Tests for Location: Mu0=0
Test           Statistic          p Value
Student's t    t   -2.74563       Pr > |t|     0.0516
Sign           M   -2             Pr >= |M|    0.1250
Signed Rank    S   -5             Pr >= |S|    0.1250

Tests for Normality
Test                  Statistic            p Value
Shapiro-Wilk          W      0.960859      Pr < W       0.8140
Kolmogorov-Smirnov    D      0.23714       Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.03991       Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.231804      Pr > A-Sq   >0.2500

Shapiro-Wilk p-value = 0.8140 > 0.05: normality is not rejected, so the t-test (p-value 0.0516) is appropriate.
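The Sign and Signed Rank rows in the location table are exact small-sample tests. The Python sketch below reproduces both p-values by brute force. The function names are hypothetical. Zero differences are dropped and tied |d| values get average ranks; that this matches SAS's tie handling is an assumption, though the p-values do agree here.

```python
from itertools import product
from math import comb


def sign_test_p(d):
    """Exact two-sided sign test of H0: median difference = 0 (zeros dropped)."""
    nz = [x for x in d if x != 0]
    n = len(nz)
    k = min(sum(x > 0 for x in nz), sum(x < 0 for x in nz))
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n   # P(Bin(n, 1/2) <= k)
    return min(1.0, 2 * tail)


def midranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of positions i+1 .. j+1
        i = j + 1
    return ranks


def signed_rank_test_p(d):
    """Exact two-sided Wilcoxon signed rank test (zeros dropped, midranks for ties).

    Enumerates all 2^n sign assignments, so it is only practical for small n.
    """
    nz = [x for x in d if x != 0]
    r = midranks([abs(x) for x in nz])
    t_plus = sum(ri for ri, x in zip(r, nz) if x > 0)
    dist = [sum(ri for ri, s in zip(r, signs) if s)
            for signs in product([0, 1], repeat=len(r))]
    lower = sum(v <= t_plus for v in dist) / len(dist)
    upper = sum(v >= t_plus for v in dist) / len(dist)
    return min(1.0, 2 * min(lower, upper))


diff = [0, -2, -1, -3, -1]   # rest - work, from the table above
print(sign_test_p(diff), signed_rank_test_p(diff))   # 0.125 0.125
```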

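F-test (comparing two population variances):

Assumptions: both populations are normal, and the two random samples are independent.

Data: $X_{11}, \dots, X_{1n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$ and $X_{21}, \dots, X_{2n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$.

① Point estimator: $S_1^2/S_2^2$ for $\sigma_1^2/\sigma_2^2$.

② P.Q.:

Definition (F-distribution): Let $W_1 \sim \chi^2_{k_1}$ and $W_2 \sim \chi^2_{k_2}$, with $W_1$ and $W_2$ independent. Then $F = \frac{W_1/k_1}{W_2/k_2} \sim F_{k_1,k_2}$.

Since $\frac{(n_i-1)S_i^2}{\sigma_i^2} \sim \chi^2_{n_i-1}$ for $i = 1, 2$, and the two samples are independent, the pivotal quantity is

$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$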

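Test statistic: under $H_0: \sigma_1^2 = \sigma_2^2$, $F_0 = \frac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}$. Here $F_{k_1,k_2,\alpha}$ denotes the upper-$\alpha$ critical value.

At significance level $\alpha$, we reject $H_0$ in favor of $H_a: \sigma_1^2 \neq \sigma_2^2$ iff $F_0 \ge F_{n_1-1,n_2-1,\alpha/2}$ or $F_0 \le F_{n_1-1,n_2-1,1-\alpha/2}$.

Reject $H_0$ in favor of $H_a: \sigma_1^2 > \sigma_2^2$ iff $F_0 \ge F_{n_1-1,n_2-1,\alpha}$.

Reject $H_0$ in favor of $H_a: \sigma_1^2 < \sigma_2^2$ iff $F_0 \le F_{n_1-1,n_2-1,1-\alpha}$.

P-value (two-sided) $= 2 \times$ the tail area of $F_{n_1-1,n_2-1}$ bounded by $F_0$ (the smaller of the two tails).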
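A minimal Python sketch of the two-sided decision rule. The function name and data are hypothetical, and the critical values are passed in from an F table. It uses the standard identity $F_{k_1,k_2,1-\alpha/2} = 1/F_{k_2,k_1,\alpha/2}$, so only upper critical values are needed.

```python
from statistics import variance


def f_test_two_sided(x1, x2, f_upper, f_upper_swapped):
    """Two-sided F-test of H0: sigma1^2 = sigma2^2.

    f_upper         = F_{n1-1, n2-1, alpha/2}
    f_upper_swapped = F_{n2-1, n1-1, alpha/2}; the lower critical value is
                      F_{n1-1, n2-1, 1-alpha/2} = 1 / f_upper_swapped.
    Returns (F0, lower critical value, reject?).
    """
    f0 = variance(x1) / variance(x2)   # S1^2 / S2^2
    f_lower = 1 / f_upper_swapped
    reject = f0 >= f_upper or f0 <= f_lower
    return f0, f_lower, reject


# Hypothetical data with df = (4, 4); F_{4,4,0.025} is about 9.605
f0, f_lower, reject = f_test_two_sided([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], 9.605, 9.605)
print(f0, f_lower, reject)
```

Here $F_0 = 2.5/10 = 0.25$, which lies between $1/9.605 \approx 0.104$ and $9.605$, so $H_0$ is not rejected at $\alpha = 0.05$.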

Oct. 21st, 2008

Homework #3, Problem 8.5

SAS code (data step):

data clouds;
  input group rainfall;   /* group 1 = unseeded, group 2 = seeded (per the paired version below) */
  datalines;
1 1230
1 830
…
1 1
2 2746
2 1698
…
2 8
2 4
;
run;

1. First, check the normality of both populations.

Shapiro-Wilk test: $H_0$: the population is normal versus $H_a$: the population is not normal.

If p-value $< \alpha$, reject $H_0$, which means the population is not normal.

SAS code:

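proc univariate data=clouds normal plot;   /* NORMAL: normality tests; PLOT: normal probability plot */
  class group;                              /* separate analysis for each group */
  var rainfall;
run;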

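2. If both populations are normal, use the two-sample t-test. First apply the F-test ($H_0: \sigma_1^2 = \sigma_2^2$) to decide which version to use.

Pooled variance (equal variances not rejected): $T_0 = \frac{\bar X_1 - \bar X_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \overset{H_0}{\sim} t_{n_1+n_2-2}$

Unequal variances (Satterthwaite): $T_0 = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$, approximately $t_\nu$ under $H_0$.

Welch-Satterthwaite method: $\nu = \frac{(w_1 + w_2)^2}{\frac{w_1^2}{n_1-1} + \frac{w_2^2}{n_2-1}}$

where $w_1 = S_1^2/n_1$ and $w_2 = S_2^2/n_2$,

or another way to find df (less accurate and more conservative): $\nu = \min(n_1 - 1,\ n_2 - 1)$.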

SAS code:

proc ttest data=clouds;   /* pooled and Satterthwaite t-tests, plus the folded F-test for equal variances */
  class group;
  var rainfall;
run;
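A small Python sketch of the unequal-variance statistic and its degrees of freedom from summary statistics. The function names and numbers are hypothetical. It returns both the Welch-Satterthwaite $\nu$ and the conservative $\min(n_1 - 1, n_2 - 1)$ alternative.

```python
import math


def welch_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite approximate df and the conservative min(n1-1, n2-1)."""
    w1, w2 = s1_sq / n1, s2_sq / n2
    nu = (w1 + w2) ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))
    return nu, min(n1 - 1, n2 - 1)


def welch_t(mean1, s1_sq, n1, mean2, s2_sq, n2):
    """Unequal-variance t statistic for H0: mu1 = mu2."""
    return (mean1 - mean2) / math.sqrt(s1_sq / n1 + s2_sq / n2)


# Hypothetical summaries: S1^2 = 4 (n1 = 10), S2^2 = 9 (n2 = 15)
nu, nu_conservative = welch_df(4.0, 10, 9.0, 15)
t_w = welch_t(10.0, 4.0, 10, 8.0, 9.0, 15)
print(nu, nu_conservative, t_w)
```

Here $\nu = 126/5.48 \approx 22.99$. This always lies between $\min(n_1-1, n_2-1) = 9$ and $n_1 + n_2 - 2 = 23$, which is why the minimum gives a conservative (wider) reference distribution.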

3. If at least one of the two populations is not normal, use a nonparametric test that compares the centers (medians) of the two populations based on two independent samples: the Wilcoxon rank sum test.

SAS code:

proc npar1way data=clouds wilcoxon;   /* WILCOXON requests the rank sum test */
  class group;
  var rainfall;
run;
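To show what the rank sum statistic is, here is a Python sketch of the normal-approximation version. The function name and data are hypothetical. $W$ is the sum of the average ranks of the first sample in the pooled sample, and the variance includes the usual tie correction. This sketch uses no continuity correction. PROC NPAR1WAY applies one by default (see its CORRECT= option), so its Z will differ slightly.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank sum test, normal approximation without continuity correction.

    Returns (W, z, two-sided p), where W is the rank sum of sample x.
    """
    pooled = sorted(x + y)
    # average rank of each distinct value: positions first+1 .. last+1
    avg_rank = {}
    for v in set(pooled):
        first = pooled.index(v)
        last = len(pooled) - 1 - pooled[::-1].index(v)
        avg_rank[v] = (first + last) / 2 + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(avg_rank[v] for v in x)
    ties = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    var_w = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (w - n1 * (n + 1) / 2) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))   # = 2 * P(Z > |z|)
    return w, z, p


w, z, p = rank_sum_test([1, 2, 3], [4, 5, 6])   # hypothetical data
print(w, z, p)
```

With no ties, $E(W) = n_1(n+1)/2 = 10.5$ and $\operatorname{Var}(W) = n_1 n_2 (n+1)/12 = 5.25$, so $z = (6 - 10.5)/\sqrt{5.25} \approx -1.964$.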

4. Result:

group = 1

Tests for Normality
Test                  Statistic            p Value
Shapiro-Wilk          W      0.597622      Pr < W      <0.0001
Kolmogorov-Smirnov    D      0.291111      Pr > D      <0.0100
Cramer-von Mises      W-Sq   0.713179      Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq   3.835941      Pr > A-Sq   <0.0050

[Normal probability plot of rainfall for group 1: vertical axis about 100 to 1300, horizontal axis normal quantiles from -2 to +2]

Shapiro-Wilk p-value < 0.0001 < 0.05: reject normality; group 1 is not normal.

group = 2

Tests for Normality
Test                  Statistic            p Value
Shapiro-Wilk          W      0.656238      Pr < W      <0.0001
Kolmogorov-Smirnov    D      0.296716      Pr > D      <0.0100
Cramer-von Mises      W-Sq   0.646113      Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq   3.391083      Pr > A-Sq   <0.0050

[Normal probability plot of rainfall for group 2: vertical axis about 250 to 2750, horizontal axis normal quantiles from -2 to +2]

Shapiro-Wilk p-value < 0.0001 < 0.05: reject normality; group 2 is not normal.

Wilcoxon Two-Sample Test
Statistic                 553.5000
t Approximation
One-Sided Pr < Z          0.0084
Two-Sided Pr > |Z|        0.0169

The two-sided p-value is 0.0169 < 0.05, so we reject $H_0$ at the 0.05 level: the median rainfall differs between the two groups.

*Suppose instead that the data in Problem 8.5 were paired:

SAS code:

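data pairclod;
  input seed unseed;
  diff = seed - unseed;   /* paired difference */
  datalines;
2746 1203
1698 830
...
8 5
4 1
;
run;

/* Check normality of diff; the Tests for Location table then gives the
   paired t-test (if normal) or the signed rank test (if not) */
proc univariate data=pairclod normal plot;
  var diff;
run;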