
Stat 502

Homework 2

Assigned 10/11/07

Due 10/18/07

1. Sample size: A researcher needs to estimate the mean circumference of a population of small trees in the Cascade mountains. The researcher would like to construct a confidence interval for the mean, and wants this interval to contain the true mean with probability at least .95 and would like to estimate the mean with a precision of about 2 cm, i.e. the length of the confidence interval should be no more than 2 cm. Studies from other regions suggest the standard deviation in circumference is about 6.5 cm. How many trees do you recommend the researcher sample? What assumptions are you making for this recommendation? Describe mathematically the relation between the number of samples required and the precision.
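
One way to turn the precision requirement into a number is the normal-approximation interval with a known standard deviation: its length is $2 z \sigma / \sqrt{n}$ with $z$ the upper $(1 - .95)/2$ normal quantile, so a length of at most $L$ requires $n \ge (2 z \sigma / L)^2$. The sketch below evaluates that bound. The function name and the choice to treat 6.5 cm as a known $\sigma$ are assumptions made for illustration; a t-based interval with an estimated standard deviation would call for somewhat more trees.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, total_length, confidence=0.95):
    """Smallest n whose normal-theory interval is no longer than total_length.

    Assumes sigma is known and the sample mean is approximately normal.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((2 * z * sigma / total_length) ** 2)


# Values taken from the problem statement: sigma = 6.5 cm, length <= 2 cm.
print(required_sample_size(sigma=6.5, total_length=2.0))
# Halving the allowed length, to show how n responds to the precision.
print(required_sample_size(sigma=6.5, total_length=1.0))
```

Since $n$ is proportional to $1/L^2$ under these assumptions, halving the allowed length roughly quadruples the required sample size.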

2(*). Null distribution of p-values: Recall that a p-value is a function of the data, and so before the experiment is run it is a random variable.

(a) Consider the one-sample t-test for evaluating evidence against $H_0: \mu = \mu_0$. Derive the distribution of the p-value under the null hypothesis. Using the result, show that the type I error of a level-$\alpha$ test is $\alpha$.

(b**) Find the null distribution of the p-value for a generic hypothesis test.

(c*) You expect a small p-value when the null hypothesis is false. What does this tell you about the shape of the distribution of the p-value when the null hypothesis is false, in comparison with your result for part (a)?
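
A simulation can serve as a sanity check on the derivations in problem 2. The sketch below is not the t-test of part (a): to stay within the Python standard library it swaps in a two-sided z-test with known variance, which is one instance of the generic test in part (b). The function name, sample size, number of replications, and the alternative mean 0.5 are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def simulate_p_values(true_mu=0.0, mu0=0.0, sigma=1.0, n=20, reps=20000, seed=1):
    """Two-sided z-test p-values for H0: mu = mu0 over repeated normal samples."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


null_p = simulate_p_values()            # data generated under H0
alt_p = simulate_p_values(true_mu=0.5)  # data generated under an alternative

for alpha in (0.01, 0.05, 0.10, 0.50):
    null_frac = sum(p <= alpha for p in null_p) / len(null_p)
    alt_frac = sum(p <= alpha for p in alt_p) / len(alt_p)
    print(f"alpha={alpha:.2f}  P(p<=alpha) under H0: {null_frac:.3f}  under H1: {alt_frac:.3f}")
```

The fraction of null p-values at or below each $\alpha$ can be compared with $\alpha$ itself, and the alternative column gives a feel for how the distribution shifts when the null hypothesis is false.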

3. Randomization test versus t-test: A chemist is testing the reaction time of a set of cells to two different but similar chemical compounds A and B. A set of 16 cell cultures is randomized so that 8 receive A and 8 receive B. The reaction times in seconds are recorded.

(a) Make a histogram and boxplots for each of the two groups. Comment on the differences.

(b) Compute the means and medians of each group. Comment on the differences.

(c) Plot the density of the appropriate t-distribution if one were to use the ordinary two-sample t-test to evaluate differences between the groups. Obtain the corresponding p-value. Write down the assumptions which validate the use of this p-value, and comment on whether or not they are met for these data.

(d) Make a histogram of the randomization distribution of the t-statistic, and compute the corresponding p-value. Write down the assumptions which validate the use of this p-value, and comment on whether or not they are met for these data.
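
For part (d), the randomization distribution can be built by recomputing the t-statistic under every way of assigning 8 of the 16 cultures to A, which is feasible here because there are only 12870 such assignments. The sketch below does this with the pooled-variance t-statistic. The numbers in `a` and `b` are made-up placeholders, not the assignment's reaction times, and the function names are hypothetical.

```python
from itertools import combinations


def pooled_t(x, y):
    """Two-sample t-statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / (sp2 * (1 / nx + 1 / ny)) ** 0.5


def randomization_test(x, y):
    """Return the observed t, all re-randomized t values, and a two-sided p-value."""
    pooled = list(x) + list(y)
    t_obs = pooled_t(x, y)
    t_values = []
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        t_values.append(pooled_t(group_x, group_y))
    # Small tolerance so assignments that tie with t_obs are counted despite rounding.
    extreme = sum(abs(t) >= abs(t_obs) - 1e-12 for t in t_values)
    return t_obs, t_values, extreme / len(t_values)


# Placeholder data only: substitute the recorded reaction times.
a = [9.1, 10.4, 11.2, 12.0, 12.7, 13.5, 14.1, 20.3]
b = [8.0, 8.9, 9.6, 10.1, 10.8, 11.5, 12.2, 12.9]

t_obs, t_values, p_value = randomization_test(a, b)
print(f"observed t = {t_obs:.3f}, assignments = {len(t_values)}, p = {p_value:.4f}")
```

The list `t_values` is what the histogram in part (d) would display; with larger groups, a random sample of assignments would replace the full enumeration.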

4. Power calculation: Researchers are planning a clinical trial for testing a drug to treat ALS (Amyotrophic Lateral Sclerosis). They will randomize ALS patients to the standard treatment (A) and patients to the new drug (B). The response will be the change in muscle score from time of enrollment into the study to one year post-enrollment. Previous studies suggest that = −1.3 and = .89, indicating that on average for these patients, muscle score declines.

(a) It is thought that the new drug will give a mean response ranging from −.5 to .5, i.e. a smaller decline in muscle score. Compute a power curve, as a function of , for each of using a type I error rate of $\alpha = 0.05$. Find the sample size for each value so that the power is 0.75.

(b) Suppose that, unknown to the researchers, the variances are unequal, and that. Via simulation, normal approximation or otherwise, compute the actual power of the test under the three sample sizes and values of you found in (a), assuming that the researchers use the standard two-sample t-test.
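
For part (a), a normal approximation gives a quick power curve for the two-sided two-sample test with equal group sizes. The sketch below reads −1.3 as the mean response under the standard treatment and .89 as a standard deviation common to both groups; both readings are assumptions, as are the function names and the three new-drug means used in the loop. The exact t-based power would be slightly lower for small samples.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(mu_a, mu_b, sigma, n, alpha=0.05):
    """Normal-approximation power of the two-sided test with n patients per arm."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (mu_b - mu_a) / (sigma * (2 / n) ** 0.5)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def smallest_n(mu_a, mu_b, sigma, target=0.75, alpha=0.05, max_n=100000):
    """Smallest per-arm sample size whose approximate power reaches target."""
    for n in range(2, max_n + 1):
        if approx_power(mu_a, mu_b, sigma, n, alpha) >= target:
            return n
    raise ValueError("target power not reached within max_n")


# Assumed reading of the problem: mu_A = -1.3, common sigma = 0.89.
for mu_b in (-0.5, 0.0, 0.5):
    n = smallest_n(-1.3, mu_b, 0.89)
    print(f"mu_B = {mu_b:+.1f}: n per arm = {n}, power = {approx_power(-1.3, mu_b, 0.89, n):.3f}")
```

Evaluating `approx_power` over a grid of `mu_b` values for a fixed `n` traces out one power curve; part (b) would need a simulation or a modified variance term, which this sketch does not attempt.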