DPHS 568, Summer 2006, Homework #2 Solutions

a) The bell-shaped histogram is the probability histogram for the averages (central limit theorem)

b) The histogram with the two equal-height bars is the contents of the urn (equal numbers of 1's and 4's).

c) The remaining histogram, with three unequal bars, portrays the distribution of the actual sample of 50 draws. We would expect this to be close to the b) histogram, but perhaps not exactly the same because of random sampling variation.

mean = 9.667 days, median = 8 days, standard deviation = 6.88 days, range = 27 days (or 3 to 30 days).

  1. If there really is no true effect, then each hypothesis test has a 5% chance of labeling a difference “statistically significant”. Thus, treating the five tests as independent, the probability of at least one significant result can be found by using a binomial distribution with n=5, p=0.05. P(X > 0) = 1 – P(X = 0) = 1 – 0.7738 = 0.2262.
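The binomial calculation above can be checked in a few lines. This is only a sketch: the function name is mine, and it assumes the five tests are independent, as the binomial model requires.

```python
from math import comb

def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """P(X >= 1) for X ~ Binomial(n_tests, alpha), via the complement P(X = 0)."""
    # P(X = 0) = C(n, 0) * alpha^0 * (1 - alpha)^n
    p_none = comb(n_tests, 0) * (1 - alpha) ** n_tests
    return 1 - p_none

# Five tests, each with a 5% false-positive rate when there is no true effect.
p_any = prob_at_least_one(5, 0.05)
print(f"{p_any:.4f}")  # 0.2262
```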

a) The best estimate of the probability of the drug being effective is the sample proportion, $\hat{p}$.

b) The 95% confidence interval for the true effectiveness rate is $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ = (0.102, 0.273) = (10.2%, 27.3%). We can be fairly certain (95% confident) that the true effectiveness rate is inside this interval. Since the entire interval is above 10% (barely), we can be fairly certain that the true effectiveness rate is greater than 10% and thus more effective than placebo.
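For anyone who wants to reproduce the interval, here is a rough sketch of the normal-approximation (Wald) interval. The counts did not survive in this write-up, so the 15 successes out of 80 used below are an assumption, chosen because they give the midpoint 0.1875 and reproduce the stated bounds; the function name is also mine.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes: int, n: int, level: float = 0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# ASSUMPTION: 15 of 80 subjects responded (not stated in the text above;
# back-inferred from the reported interval).
low, high = wald_interval(15, 80)
print(round(low, 3), round(high, 3))  # 0.102 0.273
```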

a) The standard error of the mean is estimated by $s/\sqrt{n}$.

b) The 95% confidence interval for the mean is given by

$\bar{x} \pm t_{79,\,.975}\; s/\sqrt{n}$.

c)We want to test a one-sided hypothesis

H0: μ = 0, versus H1: μ < 0,

where μ is the mean difference in blood pressure.

Use the statistic $t = \dfrac{\bar{x}}{s/\sqrt{n}} = -3.96$.

Because we are testing a one-sided hypothesis where the alternative hypothesis is on the low side, we want to compare –3.96 to t79, .05 = –1.66. Since –3.96 < –1.66, we reject H0 at the α=.05 significance level. The one-sided p-value is P(t79 < –3.96) = P(t79 > 3.96) = 8.15E-5 ≈ 0.00008 (via Excel). Using the coursepack tables you could note that 3.96 is greater than 3.44, the 99.95th percentile of a t70 distribution, which in turn is greater than the 99.95th percentile of a t79 distribution. From this we would know that p < .0005, since P(t79 > 3.96) < P(t70 > 3.96) < P(t70 > 3.44) = 1 – .9995 = .0005.
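If Excel is not at hand, the tail probability can be approximated directly from the t density. The sketch below is not how Excel does it; it simply integrates the density with Simpson's rule, and the function names and step count are my own choices.

```python
from math import exp, lgamma, log, pi

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: one half minus the Simpson's-rule area on [0, t]."""
    # steps must be even for Simpson's rule
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# By symmetry, P(t79 < -3.96) = P(t79 > 3.96).
p_one_sided = t_upper_tail(3.96, 79)
print(f"{p_one_sided:.2e}")
```

The printed value should land close to the 8.15E-5 reported from Excel and well below the .0005 table bound. Doubling the result of `t_upper_tail` gives the two-sided p-value for tests with a two-sided alternative.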

d) The hypothesis test tells us that it is very unlikely we would have seen this large a difference in the sample if there really were no difference. The confidence interval gives us an estimate of what the true difference may be.

e) We would want to use a two-sample, independent-samples t-test to compare the means. The equal-variances assumption would be reasonable. The hypotheses to be tested are

H0: μ1- μ2 = 0, versus H1: μ1- μ2 ≠ 0.

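The test statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} = 2.56$, with $n_1 + n_2 - 2 = 158$ degrees of freedom.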

P-value = P(|t158 |> 2.56) = 2 ∙P(t158 > 2.56) < .02, since 2.56 is greater than 2.35, the 99th percentile of the t158 distribution. Also note the p-value is greater than .01 since 2.56 < 2.60, which is less than the 99.5th percentile of the t158 distribution. From Excel the p-value = 0.0114.

  1. To test the hypothesis that the high-fiber diet has an effect on the cholesterol levels as compared with baseline, we would focus on the average differences of high-fiber cholesterols minus baseline cholesterols. The null hypothesis of no effect can be thought of as the mean of the differences is zero. To evaluate this hypothesis test we can use the hypothesis test-confidence interval relationship.

a) The second-to-last column in Table 1 presents the mean and 95% confidence interval for the within-individual differences in cholesterol level (after the high-fiber diet minus baseline). The mean difference was –14 mg/dL and the 95% confidence interval was (-21, -7) mg/dL. Zero is not contained in the confidence interval, which implies that the test with null hypothesis H0: μ = 0 would be rejected at the α=.05 significance level. This, in turn, implies that the p-value would be less than .05.

b)When comparing the low-fiber diet levels to baseline levels, the confidence interval is (-20, -6) mg/dL. Again, the 95% confidence interval does not contain zero, so the null hypothesis of no (zero) difference would be rejected with p < .05.

c) When comparing high-fiber to low-fiber, the 95% confidence interval for the mean difference in cholesterol levels is (-8, +7) mg/dL, which does contain zero. Thus, a hypothesis test at α=.05 would not reject the null hypothesis of zero difference. Equivalently, p > .05.
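The reasoning in parts a) through c) is the same check applied three times: a two-sided test at α=.05 rejects H0: μ = 0 exactly when the 95% confidence interval excludes zero. A small sketch of that check, using the three intervals quoted above (the function and dictionary names are mine):

```python
def rejects_zero(ci: tuple) -> bool:
    """True when the interval excludes 0, i.e. the two-sided test rejects H0: mu = 0."""
    low, high = ci
    return not (low <= 0 <= high)

# 95% confidence intervals (mg/dL) quoted in parts a), b) and c).
intervals = {
    "high-fiber vs baseline": (-21, -7),
    "low-fiber vs baseline": (-20, -6),
    "high-fiber vs low-fiber": (-8, 7),
}

decisions = {name: rejects_zero(ci) for name, ci in intervals.items()}
for name, reject in decisions.items():
    print(f"{name}: {'reject H0 (p < .05)' if reject else 'do not reject H0 (p > .05)'}")
```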

d) To estimate the standard error, one must use the formula for computing the confidence interval bounds. We know the upper bound of the confidence interval is given by $\bar{x} + t_{n-1,\,.975}\cdot SE$. We are given $\bar{x}$ (the mean difference reported in Table 1), and the upper bound is +7. Note that the standard error is $SE = s/\sqrt{n}$, and $t_{19,\,.975} = 2.093$. Thus $SE = (7 - \bar{x})/2.093$. Note that since n = 20, $s = SE\cdot\sqrt{20}$, which gives s ≈ 17. Also note that if you used the lower bound instead you would get a slightly different answer because this confidence interval, as stated, is not symmetric about the mean as it should be. Unless the confidence intervals were computed incorrectly, the reason this one appears non-symmetric is probably rounding away too many decimals.
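To see how much the choice of bound matters, here is a rough sketch of the back-calculation. The mean difference is not legible in this write-up, so the value of –1 mg/dL below is an assumption, back-inferred from the stated s ≈ 17; the critical value 2.093 is the 97.5th percentile of t19 from a standard t table, and the function name is mine.

```python
from math import sqrt

T_19_975 = 2.093   # 97.5th percentile of t with 19 df (standard t table)
N = 20             # subjects, as stated in part d)
MEAN_DIFF = -1.0   # ASSUMPTION: mean difference in mg/dL, inferred from s of about 17

def sd_from_bound(bound: float, mean: float, t_crit: float, n: int) -> float:
    """Back out s from one CI bound: |bound - mean| = t_crit * s / sqrt(n)."""
    se = abs(bound - mean) / t_crit
    return se * sqrt(n)

s_upper = sd_from_bound(7, MEAN_DIFF, T_19_975, N)    # from the upper bound, +7
s_lower = sd_from_bound(-8, MEAN_DIFF, T_19_975, N)   # from the lower bound, -8
print(round(s_upper, 1), round(s_lower, 1))
```

Under that assumed mean, the upper bound gives s of about 17 and the lower bound gives a somewhat smaller value, which is the asymmetry described in part d).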

e) This question requires estimating the minimum sample size. Using s = 17 mg/dL as an estimate for σ, we use the sample-size formula from chapter 10 to get

$n = \left[\dfrac{(Z_{.975} + Z_{.90})\,\sigma}{\Delta}\right]^2 = \left[\dfrac{(1.960 + 1.282)(17)}{7}\right]^2 \approx 62.$

Note Z.90 = 1.282. So we would need 62 subjects to have 90% power to find a difference if the true difference were 7 mg/dL (or more).
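The sample-size arithmetic can be checked with a short sketch. It assumes the usual one-sample form n = ((Z.975 + Z.90)·σ/Δ)² with a two-sided α = .05; that form is my assumption about the chapter 10 formula, though it reproduces the 62 subjects stated above. The function name is mine.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Smallest n giving the requested power to detect a mean difference of delta."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.960 for two-sided alpha = .05
    z_power = z.inv_cdf(power)          # about 1.282 for 90% power
    return ceil(((z_alpha + z_power) * sigma / delta) ** 2)

# sigma estimated by s = 17 mg/dL, target difference 7 mg/dL
n_needed = sample_size(sigma=17, delta=7)
print(n_needed)  # 62
```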

  1. To compare fish consumption levels between cases and controls, we would want to use a two-sample, independent-samples t-test to compare the means. From homework 1 we know that the means of cases and controls are 5.4 and 3.6, respectively. The sample standard deviations are 2.4 and 2.1. The equal-variances assumption should be reasonable. For the remainder of this explanation I’ll refer to the cases as group 1 and the controls as group 2. The hypotheses to be tested should be

H0: μ1- μ2 = 0, versus H1: μ1- μ2 ≠ 0.

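The test statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} = \dfrac{5.4 - 3.6}{s_p\sqrt{1/n_1 + 1/n_2}} = 1.67$, with 16 degrees of freedom.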

P-value = P(|t16 |> 1.67) = 2 ∙P(t16 > 1.67) > .10, since 1.67 is less than 1.75, the 95th percentile of the t16 distribution. Also note the p-value is less than .20 since 1.67 > 1.34, the 90th percentile. From Excel the p-value = 0.114.

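The 95% confidence interval is $(\bar{x}_1 - \bar{x}_2) \pm t_{16,\,.975}\; s_p\sqrt{1/n_1 + 1/n_2}$, which contains zero because the two-sided p-value exceeds .05.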

We conclude that there is not good evidence that the means are different.

Here’s the output utilizing SPSS. Note the differences due to my rounding of the initial values in the above calculations.