AMS572.01 Final Exam Fall, 2010

Name ______ID ______Signature______AMS Major? ______

Instructions: This is a closed-book exam. Anyone who cheats on the exam shall receive a grade of F. Please enter “Yes” or “No” for “AMS Major”. Please provide complete solutions for full credit. The exam runs from 2:15 to 4:45 pm. Good luck!

1.  (for all) The following data set from a study by the well-known chemist and Nobel Laureate Linus Pauling gives the incidence of cold among 279 French skiers who were randomized to the Vitamin C and Placebo groups.

Group / Cold: Yes / Cold: No
Vitamin C / 17 / 122
Placebo / 31 / 109

(a)  Construct a 95% confidence interval for the difference between the two incidence rates;

(b)  Please test whether the incidence rate for the Placebo group is significantly higher than that of the Vitamin C group at the 5% level of significance. Please report the p-value of your test.

(c)  Please write up the entire SAS program necessary to answer the question raised in (b), including the data step.

Answer:

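(a)  VC: $n_1 = 139$, $\hat{p}_1 = 17/139 = 0.122$; Placebo: $n_2 = 140$, $\hat{p}_2 = 31/140 = 0.221$.

The 100(1-α)% confidence interval for (p1 - p2) is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

After plugging in $z_{0.025} = 1.96$ etc., we found the 95% CI to be [-0.187, -0.011].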
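The interval can be checked numerically. The following is a minimal sketch in Python (standard library only) that assumes the large-sample interval with the unpooled standard error; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts taken from the table in Problem 1
x1, n1 = 17, 139    # Vitamin C: skiers with a cold, group size
x2, n2 = 31, 140    # Placebo: skiers with a cold, group size

p1, p2 = x1 / n1, x2 / n2
z = NormalDist().inv_cdf(0.975)      # upper 2.5% point of N(0, 1), about 1.96

# Unpooled (Wald) standard error of p1_hat - p2_hat
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

ci_low = (p1 - p2) - z * se
ci_high = (p1 - p2) + z * se
print(f"95% CI for p1 - p2: [{ci_low:.3f}, {ci_high:.3f}]")
```

Because the whole interval lies below zero, it already suggests a lower incidence rate in the Vitamin C group.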

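(b) This is Problem 9.12 in our textbook. We test $H_0: p_1 = p_2$ versus $H_a: p_1 < p_2$, where $p_1$ is the incidence rate for the Vitamin C group and $p_2$ that for the Placebo group. Using the unpooled standard error from part (a), $z_0 = (\hat{p}_1 - \hat{p}_2)/\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2} = -0.0991/0.0448 = -2.21$, so the p-value is $P(Z \le -2.21) \approx 0.013 < 0.05$ and we reject $H_0$. (*It is also OK, in fact better, if they used the pooled proportion in the denominator. *It is also OK if they did a 2-sided test.)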
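As a rough check of part (b), the sketch below computes the one-sided large-sample z-test both ways mentioned in the note: with the unpooled standard error and with the pooled proportion. It is an illustration under the normal approximation, not the textbook's printed solution, and the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 17, 139    # Vitamin C
x2, n2 = 31, 140    # Placebo
p1, p2 = x1 / n1, x2 / n2

# H0: p1 = p2 versus Ha: p1 < p2 (Placebo rate is higher)
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_unpooled = (p1 - p2) / se_unpooled

p_pool = (x1 + x2) / (n1 + n2)                       # pooled proportion under H0
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_pooled = (p1 - p2) / se_pooled

# Lower-tail p-values, because the alternative is p1 < p2
pval_unpooled = NormalDist().cdf(z_unpooled)
pval_pooled = NormalDist().cdf(z_pooled)

print(f"unpooled: z = {z_unpooled:.3f}, p-value = {pval_unpooled:.4f}")
print(f"pooled:   z = {z_pooled:.3f}, p-value = {pval_pooled:.4f}")
```

Both versions give a p-value below 0.05, so the conclusion does not depend on which standard error is used.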

(c) SAS code:

Data cold;

Input group $ outcome $ count;

Datalines;

VC yes 17

VC no 122

Placebo yes 31

Placebo no 109

;

Run;

Proc freq data=cold;

Tables group*outcome/chisq;

Weight count;

Run;

2.  (for all) People at high risk of sudden cardiac death can be identified using the change in a signal averaged electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The method was modified in the hope of improving its accuracy. The new method was tested on 50 patients and gave correct results for 46 of them.

(a)  Is this convincing evidence that the new method is more accurate? Please test at α =.05.

(b)  If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the new method is better at α =.05?

(c)  How many patients should be tested in order for this power to be at least 0.75?

Answer: These are Problems 9.7 and 9.8 in our textbook.
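For readers without the textbook at hand, the three parts can be sketched numerically. The code below assumes the large-sample z-test for one proportion with no continuity correction; the textbook's printed answers may differ slightly if it uses a different approximation.

```python
from math import ceil, sqrt
from statistics import NormalDist

N01 = NormalDist()
p0, p1 = 0.80, 0.90      # accuracy under H0 and the assumed true accuracy
n, correct = 50, 46
alpha = 0.05

# (a) H0: p = 0.8 versus Ha: p > 0.8
p_hat = correct / n
z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - N01.cdf(z0)

# (b) Power at p = 0.9: probability that p_hat exceeds the rejection cutoff
z_alpha = N01.inv_cdf(1 - alpha)
cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
power = 1 - N01.cdf((cutoff - p1) / sqrt(p1 * (1 - p1) / n))

# (c) Smallest n giving power of at least 0.75 (so beta = 0.25)
z_beta = N01.inv_cdf(0.75)
n_required = ceil(((z_alpha * sqrt(p0 * (1 - p0))
                    + z_beta * sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2)

print(f"(a) z0 = {z0:.3f}, p-value = {p_value:.4f}")
print(f"(b) power = {power:.3f}")
print(f"(c) n required = {n_required}")
```

Under these assumptions the test in (a) rejects at the 5% level, the power with 50 patients is a little over one half, and about 75 patients are needed for 75% power.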

3.  (for all) A classic tale involves four car-pooling students who missed a test and gave a flat tire as their excuse. On the make-up test, the professor asked the students to identify the particular tire that went flat. If they really did not have a flat tire, would they be able to identify the same tire? To mimic this situation, 40 other students were asked to identify the tire they would select. The data are:

Tire / Left front / Right front / Left rear / Right rear
Frequency / 11 / 15 / 8 / 6

(a)  At α=0.05, please test whether each tire has the same chance to be selected.

(b) Please write up the entire SAS program necessary to answer the question raised in (a), including the data step.

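Answer. This is a problem from our Lecture Notes 12.

(a)  $H_0: p_1 = p_2 = p_3 = p_4 = 1/4$ versus $H_a$: not all $p_i$ are equal to $1/4$.

$n = 40$, $e_i = n p_{i0} = 10$

$$\chi_0^2 = \sum_{i=1}^{4} \frac{(n_i - e_i)^2}{e_i} = \frac{1^2 + 5^2 + (-2)^2 + (-4)^2}{10} = 4.6 < \chi^2_{3,\,0.05} = 7.815$$

Fail to reject $H_0$.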
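The goodness-of-fit statistic can be verified with a short Python sketch. The critical value 7.815 is the usual table value for 3 degrees of freedom; the p-value uses the closed-form survival function of the chi-square distribution with 3 degrees of freedom, so no external library is needed.

```python
from math import erfc, exp, pi, sqrt

observed = {"Left front": 11, "Right front": 15, "Left rear": 8, "Right rear": 6}

n = sum(observed.values())
expected = n / len(observed)          # 10 per tire under H0: all p_i = 1/4

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
critical = 7.815                      # table value: chi-square, 3 df, upper 5%

# Survival function of chi-square with 3 df (closed form, valid for df = 3 only)
p_value = erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)

reject = chi2 > critical
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

The statistic is well below the critical value, which agrees with the decision not to reject the hypothesis of equal selection probabilities.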

(b)  DATA TIRE;

INPUT location $ NUMBER;

DATALINES;

LF 11

RF 15

LR 8

RR 6

;

* HYPOTHESIZING A 1:1:1:1 RATIO;

PROC FREQ DATA=TIRE ORDER=DATA; WEIGHT NUMBER;

TITLE3 'GOODNESS OF FIT ANALYSIS';

TABLES location / CHISQ NOCUM TESTP=(0.25 0.25 0.25 0.25);

RUN;

4.  (for all) The effect of caffeine levels on performing a simple finger tapping task was investigated in a double blind study. Thirty male college students were trained in finger tapping and randomly assigned to receive three different doses of caffeine (0, 100, or 200 mg) with 10 students per dose group. Two hours following the caffeine treatment, students were asked to finger tap and the numbers of taps per minute were counted. The data are tabulated below.

Caffeine Dose / Finger Taps per Minute
0 mg / 242 / 245 / 244 / 248 / 247 / 248 / 242 / 244 / 246 / 242
100 mg / 248 / 246 / 245 / 247 / 248 / 250 / 247 / 246 / 243 / 244
200 mg / 246 / 248 / 250 / 252 / 248 / 250 / 246 / 248 / 245 / 250

(a)  Construct an ANOVA table and test if there are significant differences in finger tapping between the groups at α =.05.

(b)  Compare the finger tapping speed between the 0 mg and the 200 mg groups at α =.05. List assumptions necessary – and, please perform tests for the assumptions that you can test in an exam setting.

(c)  Please write up the entire SAS program necessary to answer question raised in (a), including the data step.

(d)  Please write up the entire SAS program necessary to answer question raised in (b), including the data step, and the tests for all assumptions necessary.

Answer:

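(a)  This is Problem 12.2(b) in our textbook, one-way ANOVA. We are testing whether the mean tapping speeds in the three groups are equal or not. That is: $H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_a$: not all three means are equal.

Source / SS / df / MS / F
Caffeine dose / 61.4 / 2 / 30.7 / 6.18
Error / 134.1 / 27 / 4.967 /
Total / 195.5 / 29 / /

Since $F_0 = 6.18 > F_{2,27,0.05} = 3.35$, we reject $H_0$ and conclude that the mean tapping speeds differ significantly among the three dose groups at α = .05.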
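The sums of squares for the one-way ANOVA can be computed directly from the data table. The sketch below uses only the Python standard library; the critical value 3.35 for F with (2, 27) degrees of freedom is taken from a standard table and is hard-coded as an assumption.

```python
from statistics import mean

# Finger taps per minute, from the data table in Problem 4
groups = {
    "0 mg":   [242, 245, 244, 248, 247, 248, 242, 244, 246, 242],
    "100 mg": [248, 246, 245, 247, 248, 250, 247, 246, 243, 244],
    "200 mg": [246, 248, 250, 252, 248, 250, 246, 248, 245, 250],
}

all_obs = [x for g in groups.values() for x in g]
grand_mean = mean(all_obs)
k, n_total = len(groups), len(all_obs)

# Between-group (treatment) and within-group (error) sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within
f_critical = 3.35                    # table value: F(2, 27), upper 5%

print("Source   SS      df   MS      F")
print(f"Dose     {ss_between:6.1f}  {k - 1:2d}  {ms_between:6.3f}  {f_stat:.3f}")
print(f"Error    {ss_within:6.1f}  {n_total - k:2d}  {ms_within:6.3f}")
print(f"Total    {ss_between + ss_within:6.1f}  {n_total - 1:2d}")
print("Reject H0:", f_stat > f_critical)
```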

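(b)  This is inference on two population means, independent samples. The first assumption is that both populations are normal. The second is the equal variance assumption, which we can test in the exam setting as follows.

Group 1 (dose 0 mg): $n_1 = 10$, $\bar{x}_1 = 244.8$, $s_1^2 = 5.73$

Group 2 (dose 200 mg): $n_2 = 10$, $\bar{x}_2 = 248.3$, $s_2^2 = 4.90$

Under the normality assumption, we first test if the two population variances are equal. That is, $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$. The test statistic is

$F_0 = s_1^2 / s_2^2 = 5.73/4.90 = 1.17$, $F_{9,9,0.05} = 3.18$.

Since F0 < 3.18, we cannot reject H0. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.

Next we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$. With $s_p^2 = (9 s_1^2 + 9 s_2^2)/18 = 5.32$, the test statistic is

$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} = \frac{244.8 - 248.3}{2.306\sqrt{0.2}} = -3.39$$

Since $t_0 = -3.39$ is smaller than $-t_{18,0.025} = -2.101$, we reject H0 and claim that the finger tapping speeds are significantly different between the two groups at the significance level of 0.05.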
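The variance-ratio test and the pooled-variance t statistic for the 0 mg and 200 mg groups can be reproduced as follows. This is a minimal sketch; the critical values 3.18 (F with 9 and 9 degrees of freedom) and 2.101 (t with 18 degrees of freedom, upper 2.5%) are table values entered by hand.

```python
from math import sqrt
from statistics import mean, variance

g0 = [242, 245, 244, 248, 247, 248, 242, 244, 246, 242]      # 0 mg
g200 = [246, 248, 250, 252, 248, 250, 246, 248, 245, 250]    # 200 mg

n1, n2 = len(g0), len(g200)
s1_sq, s2_sq = variance(g0), variance(g200)    # sample variances (n - 1 divisor)

# Equal-variance check: larger sample variance over the smaller one
f0 = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
f_critical = 3.18                              # table value used in the answer

# Pooled-variance t statistic for H0: mu1 = mu2
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t0 = (mean(g0) - mean(g200)) / sqrt(sp_sq * (1 / n1 + 1 / n2))
t_critical = 2.101                             # table value: t, 18 df, upper 2.5%

print(f"s1^2 = {s1_sq:.3f}, s2^2 = {s2_sq:.3f}, F0 = {f0:.3f}")
print(f"sp^2 = {sp_sq:.3f}, t0 = {t0:.3f}")
print("Equal variances plausible:", f0 < f_critical)
print("Reject H0 (two-sided):", abs(t0) > t_critical)
```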

(c)  data finger;

input group taps @@;

datalines;

0 242 0 245 0 244 0 248 0 247 0 248 0 242 0 244 0 246 0 242

1 248 1 246 1 245 1 247 1 248 1 250 1 247 1 246 1 243 1 244

2 246 2 248 2 250 2 252 2 248 2 250 2 246 2 248 2 245 2 250

;

run;

proc anova data = finger;

class group;

model taps = group;

means group/tukey;

run;

/*the means step is not necessary for the given problem.*/

(d)  data finger2;

set finger;

where group ne 1;

run;

proc univariate data = finger2 normal;

class group;

var taps;

run;

proc ttest data = finger2;

class group;

var taps;

run;

proc npar1way data = finger2;

class group;

var taps;

run;

/* The data step in part (d) must run immediately after the one in part (c). */

/* Alternatively, one can save the data set finger as a permanent SAS data set and reuse it later. */

5A. (for AMS majors) Suppose we have two independent random samples from two normal populations: $X_1, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)$, and $Y_1, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)$, where the common variance $\sigma^2$ is unknown.

(a)  At the significance level α, please construct a test using the pivotal quantity approach to test whether $\mu_1 = \mu_2$ or not. (*Please include the derivation of the pivotal quantity, the proof of its distribution, and the derivation of the rejection region for full credit.)

(b)  At the significance level α, please derive the likelihood ratio test for testing whether $\mu_1 = \mu_2$ or not. Subsequently, please show whether this test is equivalent to the one derived in part (a).

Answer:

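(a)  Here is a simple outline of the derivation of the test of $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$ using the pivotal quantity approach.

[1]. We start with the point estimator for the parameter of interest $\mu_1 - \mu_2$: $\bar{X} - \bar{Y}$. Its distribution is $N\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$, using the mgf for $N(\mu, \sigma^2)$, which is $M(t) = \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$, and the independence properties of the random samples. From this we have $Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1)$. Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.

[2]. We next look for a way to get rid of the unknown σ, following the approach used in the construction of the pooled-variance t-statistic. We found that $W = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$, using the mgf for $\chi^2_k$, which is $M(t) = (1 - 2t)^{-k/2}$, and the independence properties of the random samples.

[3]. Then we found, from the theorem of sampling from the normal population and the independence properties of the random samples, that Z and W are independent, and therefore, by the definition of the t-distribution, we have obtained our pivotal quantity: $T = \frac{Z}{\sqrt{W/(n_1 + n_2 - 2)}} = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$, where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ is the pooled sample variance.

[4]. The rejection region is derived from $P(|T_0| \ge c \mid H_0) = \alpha$, where $T_0 = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$ under $H_0$. Thus $c = t_{n_1 + n_2 - 2,\ \alpha/2}$. Therefore at the significance level of α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \ge t_{n_1 + n_2 - 2,\ \alpha/2}$.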

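(b)  We have two independent random samples from two normal populations with equal but unknown variances. Now we derive the likelihood ratio test for $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$.

Let $\mu_1 = \mu_2 = \mu$ under $H_0$, then,

$\omega = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ 0 < \sigma^2 < \infty\}$, $\Omega = \{(\mu_1, \mu_2, \sigma^2): -\infty < \mu_1, \mu_2 < \infty,\ 0 < \sigma^2 < \infty\}$,

$$L(\omega) = (2\pi\sigma^2)^{-\frac{n_1 + n_2}{2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right]\right\}$$, and there are two parameters, $\mu$ and $\sigma^2$.

$\ln L(\omega) = -\frac{n_1 + n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left[\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right]$; since it contains two parameters, we take the partial derivatives with respect to $\mu$ and $\sigma^2$ and set them equal to 0. Then we have:

$$\hat{\mu} = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}, \qquad \hat{\sigma}_\omega^2 = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2\right]$$

$$L(\Omega) = (2\pi\sigma^2)^{-\frac{n_1 + n_2}{2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right]\right\}$$, and there are three parameters, $\mu_1$, $\mu_2$ and $\sigma^2$.

We take the partial derivatives of $\ln L(\Omega)$ with respect to $\mu_1$, $\mu_2$ and $\sigma^2$ and set them all equal to 0. Then we have:

$$\hat{\mu}_1 = \bar{x}, \qquad \hat{\mu}_2 = \bar{y}, \qquad \hat{\sigma}_\Omega^2 = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2\right]$$

At this point, we have estimated all the parameters. Then, after some cancellations/simplifications, we have:

$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \left(\frac{\hat{\sigma}_\Omega^2}{\hat{\sigma}_\omega^2}\right)^{\frac{n_1 + n_2}{2}} = \left[1 + \frac{t_0^2}{n_1 + n_2 - 2}\right]^{-\frac{n_1 + n_2}{2}}$$

where $t_0$ is the test statistic in the pooled variance t-test. Therefore, $\lambda \le \lambda^*$ is equivalent to $|t_0| \ge c$. Thus at the significance level α, we reject the null hypothesis in favor of the alternative when $|t_0| \ge c = t_{n_1 + n_2 - 2,\ \alpha/2}$. This shows that the pivotal quantity approach and the likelihood ratio test approach are equivalent in this case.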
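The key identity behind the equivalence, namely that the likelihood ratio equals $[1 + t_0^2/(n_1 + n_2 - 2)]^{-(n_1 + n_2)/2}$ with $t_0$ the pooled-variance t statistic, can be illustrated numerically. The sketch below is not a proof; it simply evaluates both sides on the 0 mg and 200 mg groups from Problem 4, which are used here only as convenient sample data.

```python
from math import isclose, sqrt
from statistics import mean

x = [242, 245, 244, 248, 247, 248, 242, 244, 246, 242]
y = [246, 248, 250, 252, 248, 250, 246, 248, 245, 250]
n1, n2 = len(x), len(y)
n = n1 + n2

# MLEs under H0 (common mean) and under the full model (separate means)
mu_hat = (sum(x) + sum(y)) / n
sigma2_null = (sum((v - mu_hat) ** 2 for v in x)
               + sum((v - mu_hat) ** 2 for v in y)) / n
sse = sum((v - mean(x)) ** 2 for v in x) + sum((v - mean(y)) ** 2 for v in y)
sigma2_full = sse / n

# Likelihood ratio computed directly from the two variance MLEs
lam_direct = (sigma2_full / sigma2_null) ** (n / 2)

# The same quantity written as a function of the pooled t statistic
sp_sq = sse / (n - 2)
t0 = (mean(x) - mean(y)) / sqrt(sp_sq * (1 / n1 + 1 / n2))
lam_from_t = (1 + t0 ** 2 / (n - 2)) ** (-n / 2)

print(f"lambda (direct) = {lam_direct:.6g}, lambda (from t0) = {lam_from_t:.6g}")
print("Agree:", isclose(lam_direct, lam_from_t, rel_tol=1e-9))
```

Because the likelihood ratio is a decreasing function of $|t_0|$, small values of the ratio correspond exactly to large values of $|t_0|$.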

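5B. (for non AMS majors) We have two independent samples $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$, where $X_i \sim N(\mu_1, \sigma^2)$ and $Y_j \sim N(\mu_2, \sigma^2)$. For the hypothesis of $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 > \mu_2$

(a)  Please derive the general formula for power calculation for the pooled variance t-test based on an effect size of EFF at the significance level of α.

Recall - Definition: Effect size = EFF = $|\mu_1 - \mu_2|/\sigma$ (e.g. Eff=1)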

(b)  With a sample size of 20 per group, α = 0.05, and an estimated effect size ranging from 0.8 to 1.2, please calculate the power of your pooled variance t-test.

Answer:

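(a)  T.S.: $T_0 = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$ under $H_0$

At the significance level α (e.g. α=0.05), reject $H_0$ in favor of $H_a$ iff $T_0 \ge t_{n_1 + n_2 - 2,\ \alpha}$

Power = 1-β = P(reject $H_0$ | $H_a$) = $P\left(T_0 \ge t_{n_1 + n_2 - 2,\ \alpha} \mid \mu_1 - \mu_2 = \Delta > 0\right)$

= $P\left(\frac{\bar{X} - \bar{Y} - \Delta}{S_p\sqrt{1/n_1 + 1/n_2}} \ge t_{n_1 + n_2 - 2,\ \alpha} - \frac{\Delta}{S_p\sqrt{1/n_1 + 1/n_2}} \ \Big|\ \mu_1 - \mu_2 = \Delta\right)$

= $P\left(T \ge t_{n_1 + n_2 - 2,\ \alpha} - \frac{\Delta}{S_p\sqrt{1/n_1 + 1/n_2}}\right)$, where $T \sim t_{n_1 + n_2 - 2}$

≈ $P\left(T \ge t_{n_1 + n_2 - 2,\ \alpha} - \frac{\mathrm{EFF}}{\sqrt{1/n_1 + 1/n_2}}\right)$ (Effect size = $\Delta/\sigma$, with $S_p$ replaced by $\sigma$)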

(b)  With n = 20, α = 0.05, Eff = 0.8 to 1.2, the power is calculated as follows:

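Power (Eff = 0.8) = $P\left(T \ge 1.686 - \frac{0.8}{\sqrt{1/20 + 1/20}}\right) = P(T \ge -0.844) \approx 0.80$

Power (Eff = 1.2) = $P\left(T \ge 1.686 - \frac{1.2}{\sqrt{1/20 + 1/20}}\right) = P(T \ge -2.109) \approx 0.98$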

Note: the T statistic above follows a t-distribution with 38 (=20+20-2) degrees of freedom.

Therefore we conclude that the power will range from 80% to 98% for a given effect size of 0.8 to 1.2.
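The two power values can be checked with a short Python sketch. It assumes the one-sided test and the approximation Power ≈ P(T ≥ t_{38,0.05} − EFF/√(1/n1 + 1/n2)) with T following a central t-distribution with 38 degrees of freedom. Since the standard library has no t-distribution, the CDF is obtained by Simpson's rule on the t density; the helper names `t_pdf`, `t_cdf` and `approx_power` are illustrative, and 1.686 is the table value of t_{38,0.05}.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via composite Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                       # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def approx_power(eff, n_per_group, t_crit):
    """Approximate power of the one-sided pooled t-test for a given effect size."""
    df = 2 * n_per_group - 2
    shift = eff / sqrt(2 / n_per_group)
    return 1 - t_cdf(t_crit - shift, df)

t_crit = 1.686                          # table value: t, 38 df, upper 5%
power_08 = approx_power(0.8, 20, t_crit)
power_12 = approx_power(1.2, 20, t_crit)
print(f"Power (Eff = 0.8) = {power_08:.3f}")
print(f"Power (Eff = 1.2) = {power_12:.3f}")
```

An exact calculation would use the noncentral t-distribution; the approximation above is the one that matches the formula derived in part (a).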

6.  (extra credit for all students) Suppose we have two independent random samples from two normal populations, i.e., $X_1, \ldots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$, and $Y_1, \ldots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$. Furthermore, suppose $\sigma_1^2 = \sigma_2^2 = \sigma^2$, although their values are unknown. Please prove whether the one-way ANOVA F-test is equivalent to the pooled variance t-test (2-sided) or not.

Answer:
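No worked answer is recorded here. As a numerical illustration only (not a proof), the sketch below computes the one-way ANOVA F statistic with two groups and the pooled-variance t statistic on the 0 mg and 200 mg data from Problem 4 and compares F with the square of t. The choice of data set is an assumption made purely for illustration.

```python
from math import isclose, sqrt
from statistics import mean

x = [242, 245, 244, 248, 247, 248, 242, 244, 246, 242]
y = [246, 248, 250, 252, 248, 250, 246, 248, 245, 250]
n1, n2 = len(x), len(y)
grand_mean = mean(x + y)

# One-way ANOVA with k = 2 groups
ss_between = n1 * (mean(x) - grand_mean) ** 2 + n2 * (mean(y) - grand_mean) ** 2
ss_within = sum((v - mean(x)) ** 2 for v in x) + sum((v - mean(y)) ** 2 for v in y)
f_stat = (ss_between / 1) / (ss_within / (n1 + n2 - 2))

# Pooled-variance t statistic; the ANOVA mean square error equals sp^2
sp_sq = ss_within / (n1 + n2 - 2)
t_stat = (mean(x) - mean(y)) / sqrt(sp_sq * (1 / n1 + 1 / n2))

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
print("F equals t^2:", isclose(f_stat, t_stat ** 2, rel_tol=1e-9))
```

The agreement reflects the algebraic facts that, with two groups, the between-group sum of squares reduces to $\frac{n_1 n_2}{n_1 + n_2}(\bar{x} - \bar{y})^2$ and that an $F_{1,\nu}$ variable is the square of a $t_\nu$ variable, so the two-sided t-test and the F-test reject for the same samples.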

That’s all, class; I wish you a very happy holiday season and winter vacation!
