AMS 572 Lecture Notes 6
Oct. 14th, 2008
Ch8. Inference on Two Populations
I. Paired samples:
Take the paired differences; the problem then reduces to a one-population problem.
II. Independent samples:
1. Normal populations with unknown population variances
However, we know that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
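Data: $X_1, \dots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma^2)$ and $Y_1, \dots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma^2)$, with the two samples independent.
①Point estimator: $\hat{\mu}_1 - \hat{\mu}_2 = \bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$
$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1)$ is not the PQ, since we don’t know $\sigma^2$.
$\frac{(n_1 - 1)S_1^2}{\sigma^2} \sim \chi^2_{n_1 - 1}$, $\frac{(n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_2 - 1}$
($S_1^2$ and $S_2^2$ are independent)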
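Definition: $W = Z_1^2 + \dots + Z_k^2$, when $Z_1, \dots, Z_k \overset{iid}{\sim} N(0, 1)$, then $W \sim \chi^2_k$.
Definition: $\chi^2_k$ is a special gamma random variable (shape $k/2$, scale 2).
Definition: t-distribution: $T = \frac{Z}{\sqrt{W/k}} \sim t_k$, when $Z \sim N(0, 1)$, $W \sim \chi^2_k$, and Z & W are independent.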
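Let $W = \frac{(n_1 - 1)S_1^2}{\sigma^2} + \frac{(n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$, since a sum of independent chi-square variables is chi-square with the degrees of freedom added.
$Z$ (the standardized $\bar{X} - \bar{Y}$) and $W$ are independent, so
$T = \frac{Z}{\sqrt{W/(n_1 + n_2 - 2)}} = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$,
where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ is the pooled variance.
This is the PQ for inference on the parameter of interest, $\mu_1 - \mu_2$.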
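②Confidence Interval for $\mu_1 - \mu_2$
$P\left(-t_{n_1 + n_2 - 2,\,\alpha/2} \le T \le t_{n_1 + n_2 - 2,\,\alpha/2}\right) = 1 - \alpha$
$(\bar{X} - \bar{Y}) \pm t_{n_1 + n_2 - 2,\,\alpha/2}\, S_p\sqrt{1/n_1 + 1/n_2}$
This is the $100(1 - \alpha)\%$ C.I. for $\mu_1 - \mu_2$.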
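③Test: $H_0: \mu_1 - \mu_2 = c_0$ versus $H_a: \mu_1 - \mu_2 > c_0$
Test statistic: $T_0 = \frac{(\bar{X} - \bar{Y}) - c_0}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$ under $H_0$
At the significance level $\alpha$, if $T_0 \ge t_{n_1 + n_2 - 2,\,\alpha}$, reject $H_0$ in favor of $H_a$.
If p-value $< \alpha$, reject $H_0$.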
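The pooled-variance computation can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not part of the course's SAS workflow: the function names `pooled_t` and `pooled_ci` and the two small samples are hypothetical, and the critical value is assumed to be looked up in a t-table by the caller.

```python
import math
from statistics import mean, variance

def pooled_t(x, y, c0=0.0):
    """Return (T0, df, Sp^2, standard error) for H0: mu1 - mu2 = c0,
    assuming normal populations with equal variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t0 = (mean(x) - mean(y) - c0) / se
    return t0, df, sp2, se

def pooled_ci(x, y, t_crit):
    """Interval (xbar - ybar) -/+ t_crit * Sp * sqrt(1/n1 + 1/n2).
    t_crit is the upper alpha/2 point of t with n1 + n2 - 2 df."""
    _, _, _, se = pooled_t(x, y)
    diff = mean(x) - mean(y)
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical data, for illustration only
x = [5, 7, 9]
y = [1, 2, 3, 4, 5]
t0, df, sp2, se = pooled_t(x, y)
print(t0, df, sp2)
# 2.447 is the tabulated upper 0.025 point of t with 6 df
print(pooled_ci(x, y, t_crit=2.447))
```

The critical value is passed in rather than computed because the Python standard library has no t quantile function.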
Oct. 16th, 2008
e.g. In a study, 5 male volunteers participated in a two-phase experimental session. In the first phase, respiration was measured while the subject was awake and at rest. In the second phase, the subject was told to imagine that he was performing muscular work, and respiration was measured again. The following table shows the measurements of total ventilation (liters of air per minute per square meter of body area) for the 5 subjects.
subject / rest / work / Diff
1 / 6 / 6 / 0
2 / 7 / 9 / -2
3 / 8 / 9 / -1
4 / 7 / 10 / -3
5 / 6 / 7 / -1
(1) Use a suitable test to investigate whether there is any difference between the two phases in terms of total ventilation. Please state the assumptions of the test and report the p-value. At the significance level of 0.05, what is your conclusion?
(2) Please write up the entire SAS program necessary to answer the questions in (1), including the data step as well as the tests for checking the assumptions.
Solution:
(1) Assumption: the distribution of the difference is normal.
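$H_0: \mu_d = 0$ versus $H_a: \mu_d \ne 0$, where $\mu_d$ is the mean of the difference (rest - work).
Test statistic: $T_0 = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-1.4}{1.14/\sqrt{5}} \approx -2.75 \sim t_4$ under $H_0$
P-value $= 2P(T < -2.75) = 2 \times 0.026 = 0.052 > 0.05$, fail to reject $H_0$.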
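As a cross-check on the hand calculation, the paired t statistic and its two-sided p-value can be recomputed from the five differences. The sketch below is only an illustration in Python (standard library only); the helpers `t_pdf` and `two_sided_p` are hypothetical names, and the p-value is obtained by Simpson's rule on the t density rather than from a table.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t0, df, steps=2000):
    """Two-sided p-value 2*P(T >= |t0|), using Simpson's rule on [0, |t0|].
    steps must be even."""
    b = abs(t0)
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = s * h / 3          # P(0 <= T <= |t0|)
    return 1 - 2 * central

# Data from the table in the example
rest = [6, 7, 8, 7, 6]
work = [6, 9, 9, 10, 7]
d = [r - w for r, w in zip(rest, work)]   # diff = rest - work
n = len(d)
t0 = mean(d) / (stdev(d) / math.sqrt(n))  # paired t statistic, n - 1 df
p = two_sided_p(t0, n - 1)
print(round(t0, 4), round(p, 4))
```

The printed values should agree, up to rounding, with the Student's t line of the SAS output in part (2).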
(2) SAS Code:
data vent;
input subject rest work;
diff=rest-work;
datalines;
1 6 6
2 7 9
3 8 9
4 7 10
5 6 7
;
run;
proc univariate data=vent normal;
var diff;
run;
Note: Check the Shapiro-Wilk test. If the p-value > 0.05, then at the 0.05 significance level we do not reject normality of the differences, and we can use the t-test. If the p-value <= 0.05, we use a non-parametric test (sign test or signed rank test) instead.
Result:
Test                  -Statistic-         -----p Value------
Student's t           t     -2.74563      Pr > |t|     0.0516
Sign                  M     -2            Pr >= |M|    0.1250
Signed Rank           S     -5            Pr >= |S|    0.1250
Tests for Normality
Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W     0.960859      Pr < W       0.8140
Kolmogorov-Smirnov    D     0.23714       Pr > D      >0.1500
Cramer-von Mises      W-Sq  0.03991       Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq  0.231804      Pr > A-Sq   >0.2500
F-test:
Both populations are normal, two independent random samples.
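Data: $X_1, \dots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$, $Y_1, \dots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$
①Point estimator: $\widehat{\sigma_1^2/\sigma_2^2} = S_1^2/S_2^2$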
②P.Q:
Definition: F-Distribution
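Let $W_1 \sim \chi^2_{k_1}$, $W_2 \sim \chi^2_{k_2}$, and $W_1$, $W_2$ are independent. Then $F = \frac{W_1/k_1}{W_2/k_2} \sim F_{k_1, k_2}$.
Pivotal Quantity: $F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$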
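Test: $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \ne \sigma_2^2$
Test Statistic: $F_0 = S_1^2/S_2^2 \sim F_{n_1 - 1,\, n_2 - 1}$ under $H_0$
At significance level $\alpha$, we will reject $H_0$ in favor of $H_a$ iff $F_0 \ge F_{n_1 - 1,\, n_2 - 1,\, \alpha/2}$ or $F_0 \le F_{n_1 - 1,\, n_2 - 1,\, 1 - \alpha/2}$, where $F_{\nu_1, \nu_2, \alpha}$ denotes the upper-$\alpha$ critical value.
For $H_a: \sigma_1^2 > \sigma_2^2$: reject $H_0$ iff $F_0 \ge F_{n_1 - 1,\, n_2 - 1,\, \alpha}$
For $H_a: \sigma_1^2 < \sigma_2^2$: reject $H_0$ iff $F_0 \le F_{n_1 - 1,\, n_2 - 1,\, 1 - \alpha}$
P-value (two-sided) = 2 * the smaller tail area bounded by $F_0$.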
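For reference, the step from the F-distribution definition to the pivotal quantity can be written out as follows. This is a standard derivation under the stated assumptions (two independent normal samples); the symbols $W_1$, $W_2$ are introduced here only for the derivation.

```latex
\begin{aligned}
W_1 &= \frac{(n_1 - 1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1 - 1}, \qquad
W_2 = \frac{(n_2 - 1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2 - 1}, \qquad
W_1 \text{ and } W_2 \text{ independent}, \\
F &= \frac{W_1/(n_1 - 1)}{W_2/(n_2 - 1)}
   = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}
   \sim F_{n_1 - 1,\, n_2 - 1}, \\
\text{under } H_0: \sigma_1^2 = \sigma_2^2: \quad
F_0 &= \frac{S_1^2}{S_2^2} \sim F_{n_1 - 1,\, n_2 - 1}.
\end{aligned}
```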
Oct. 21st, 2008
Homework #3, Problem 8.5
SAS code (data step):
data clouds;
input group rainfall;
datalines;
1 1230
1 830
…
1 1
2 2746
2 1698
…
2 8
2 4
;
run;
1. First check the normality of both populations
Shapiro-Wilk test: $H_0$: the population is normal versus $H_a$: the population is not normal.
If p-value $< \alpha$, reject $H_0$, which means the population is not normal.
SAS code:
proc univariate data=clouds normal plot;
class group;
var rainfall;
run;
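2. If both populations are normal, use the t-test.
F-test: first test $H_0: \sigma_1^2 = \sigma_2^2$ to choose between the pooled-variance and the unequal-variance t-test.
Pooled-variance: $T_0 = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$ under $H_0: \mu_1 = \mu_2$
Unequal-variance (Satterthwaite): $T_0 = \frac{\bar{X} - \bar{Y}}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} \approx t_{df}$ under $H_0: \mu_1 = \mu_2$
Welch-Satterthwaite method: $df = \frac{(w_1 + w_2)^2}{w_1^2/(n_1 - 1) + w_2^2/(n_2 - 1)}$
where $w_1 = S_1^2/n_1$ and $w_2 = S_2^2/n_2$,
or another way to find df (less accurate and more conservative): $df = \min(n_1 - 1,\ n_2 - 1)$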
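The two ways of finding the degrees of freedom can be compared numerically. The sketch below uses the standard Welch-Satterthwaite formula, $df = (w_1 + w_2)^2 / \left(w_1^2/(n_1 - 1) + w_2^2/(n_2 - 1)\right)$ with $w_i = S_i^2/n_i$, and the conservative choice $\min(n_1 - 1, n_2 - 1)$. The function names and the summary statistics are hypothetical, chosen only for illustration.

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    w1, w2 = s1_sq / n1, s2_sq / n2
    return (w1 + w2) ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """Simpler, more conservative choice of degrees of freedom."""
    return min(n1 - 1, n2 - 1)

# Hypothetical summary statistics: sample variances and sample sizes
s1_sq, n1 = 4.0, 10
s2_sq, n2 = 9.0, 15

print(welch_df(s1_sq, n1, s2_sq, n2))   # approximate df, usually not an integer
print(conservative_df(n1, n2))          # never larger than the Welch df
print(n1 + n2 - 2)                      # pooled-variance df, for comparison
```

The Welch value always lies between the conservative df and the pooled df $n_1 + n_2 - 2$, which is why the simpler rule gives a wider interval and a larger p-value.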
SAS code:
proc ttest data=clouds;
class group;
var rainfall;
run;
3. If at least one of the two populations is not normal, we use a nonparametric test comparing the “means” (medians) based on two independent samples: the Wilcoxon rank sum test.
SAS code:
proc npar1way data=clouds wilcoxon;
class group;
var rainfall;
run;
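To make the rank sum statistic concrete, here is a small Python sketch (standard library only) that pools two samples, assigns midranks to ties, and sums the ranks of the first sample. The function names and the tiny data sets are hypothetical; the sketch computes only the statistic, not the p-value approximations that SAS reports.

```python
def midranks(values):
    """Ranks 1..N of the pooled values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum(x, y):
    """Sum of the pooled-sample ranks belonging to the first sample x."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[:len(x)])

# Hypothetical data, for illustration only
print(rank_sum([1, 3, 5], [2, 4, 6, 8]))   # no ties
print(rank_sum([1, 2], [2, 3]))            # the tied 2s share rank 2.5
```

A rank sum far from its null expectation $n_1(n_1 + n_2 + 1)/2$ is evidence that the two populations differ in location.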
4. Result:
group = 1
Tests for Normality
Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W     0.597622      Pr < W      <0.0001
Kolmogorov-Smirnov    D     0.291111      Pr > D      <0.0100
Cramer-von Mises      W-Sq  0.713179      Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq  3.835941      Pr > A-Sq   <0.0050
[Normal probability plot, group 1: rainfall (vertical axis, about 100 to 1300) against normal quantiles (horizontal axis, -2 to +2).]
Shapiro-Wilk p-value < 0.0001: group 1 is not normal.
Group = 2
Tests for Normality
Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W     0.656238      Pr < W      <0.0001
Kolmogorov-Smirnov    D     0.296716      Pr > D      <0.0100
Cramer-von Mises      W-Sq  0.646113      Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq  3.391083      Pr > A-Sq   <0.0050
[Normal probability plot, group 2: rainfall (vertical axis, about 250 to 2750) against normal quantiles (horizontal axis, -2 to +2).]
Shapiro-Wilk p-value < 0.0001: group 2 is not normal.
Wilcoxon Two-Sample Test
Statistic 553.5000
t Approximation
One-Sided Pr < Z 0.0084
Two-Sided Pr > |Z| 0.0169
The two-sided p-value is 0.0169 < 0.05, so we reject $H_0$ at the 0.05 significance level: the rainfall of the two groups differs.
* Suppose Problem 8.5 had paired data instead:
SAS code:
data pairclod;
input seed unseed;
diff=seed-unseed;
datalines;
2746 1203
1698 830
...
8 5
4 1
;
run;
proc univariate data=pairclod normal plot;
var diff;
run;