Statistics 215 Lab 3 Solutions

1. Doritos

(a) Do these data satisfy the assumptions for inference? The inference here concerns an unknown mean. We assume the 6 bags constitute a random sample. The 10% condition is clearly satisfied. And to assess whether the data are nearly normal, we consider the stemplot.

Stem-and-Leaf Plot

Stem & Leaf

28 .

28 . 579

29 . 1

29 . 59

The data are reasonably symmetric with no outliers.

(b) From the SPSS output, the mean is 28.98 grams; the standard deviation is 0.36 grams.

(c) The confidence intervals are computed using t-procedures. The critical values use 6 - 1 = 5 degrees of freedom. From the table, these values are 2.571 for a 95% confidence level and 2.015 for a 90% confidence level. The standard error of the sample mean is 0.36/√6 = 0.147. Thus for the 95% confidence interval we have

28.98 ± 2.571(.147) = 28.98 ± .38, giving (28.60, 29.36). The 90% confidence interval is

28.98 ± 2.015(.147) = 28.98 ± .30, giving (28.68, 29.28).
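The interval arithmetic in part (c) can be checked with a short sketch. It uses only the summary statistics from part (b) and the table critical values quoted above; the function name `t_interval` is illustrative and is not part of the lab or of SPSS.

```python
import math

def t_interval(mean, sd, n, t_star):
    """Return (lower, upper) for mean +/- t_star * sd / sqrt(n)."""
    se = sd / math.sqrt(n)      # standard error of the sample mean
    margin = t_star * se        # margin of error
    return mean - margin, mean + margin

# Summary statistics from part (b); critical values from the t-table (5 df).
mean, sd, n = 28.98, 0.36, 6
ci95 = t_interval(mean, sd, n, 2.571)
ci90 = t_interval(mean, sd, n, 2.015)

print(f"SE = {sd / math.sqrt(n):.3f}")
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"90% CI: ({ci90[0]:.2f}, {ci90[1]:.2f})")
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; with one available, the critical value could be looked up from the t-distribution instead of the table.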

(d) We are 95% confident that the mean weight of the Doritos bags is between 28.60 and 29.36 grams. We are 90% confident that the mean weight is between 28.68 and 29.28 grams.

(e) To comment on the company’s stated weight, we conduct a hypothesis test:

Ho: µ = 28.3 versus Ha: µ ≠ 28.3. Here is the SPSS output from a one-sample t-test.

The test statistic value is t = 4.648. The P-value for the (two-sided) test is 0.006. This is strong evidence against the null hypothesis. The company appears to be erring on the safe side, putting in slightly more chips than stated.
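As a rough check on the SPSS figures, the sketch below recomputes the test statistic from the rounded summary statistics and approximates the two-sided P-value by numerically integrating the t density with 5 degrees of freedom. The helper names (`t_pdf`, `two_sided_p`) and the choice of Simpson's rule are assumptions made for illustration; SPSS works from the unrounded data, so the statistic computed here from rounded inputs comes out slightly below the reported 4.648.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 1 - 2 * (area under the density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Rounded summary statistics from part (b) and the null value from part (e).
mean, sd, n, mu0 = 28.98, 0.36, 6, 28.3
t_rounded = (mean - mu0) / (sd / math.sqrt(n))

# P-value for the statistic that SPSS reports (t = 4.648, 5 df).
p_reported = two_sided_p(4.648, n - 1)

print(f"t from rounded summaries: {t_rounded:.3f}")
print(f"two-sided P-value for t = 4.648: {p_reported:.4f}")
```

The same function gives a quick sanity check on the table: `two_sided_p(2.571, 5)` should come out very close to 0.05.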

2. On-line testing

For the on-line testing problem we are in the matched pairs setting. The two sets of test scores are not independent, since both tests were given to the same people. We assume that the subjects are a representative sample. To check the nearly normal condition we need to look at the distribution of the differences in test scores. For (a), consider the histogram of the differences between the Test A and Test B scores.

We see that the data are roughly symmetric and unimodal with no outliers. The hypothesis test is Ho: µd = 0 versus Ha: µd ≠ 0, where µd is the mean of the population difference in the Test A and Test B scores. We perform the matched pairs test in SPSS and obtain this output.

The P-value for the test is 0.708, which is much too large to reject Ho. There is no evidence that the tests differ in mean difficulty.
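The matched pairs test is a one-sample t-test applied to the within-subject differences. The sketch below illustrates the computation. The scores are hypothetical values invented for the example; they are not the lab data, and the function name `paired_t` is likewise illustrative.

```python
import math
import statistics

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for Ho: mean difference = 0."""
    diffs = [a - b for a, b in zip(x, y)]   # one difference per subject
    n = len(diffs)
    d_bar = statistics.mean(diffs)          # mean difference
    s_d = statistics.stdev(diffs)           # sample SD of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1

# HYPOTHETICAL scores for five subjects (not the lab data).
test_a = [82, 75, 90, 68, 77]
test_b = [80, 78, 88, 70, 75]

t_stat, df = paired_t(test_a, test_b)
print(f"t = {t_stat:.3f} on {df} df")
```

Because the test works on the differences, the near-normality check also applies to the differences rather than to the two sets of scores separately.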

(b) For testing environment (paper versus online) we conduct the same test but with the data paired appropriately. The distribution of differences is described by the histogram.

There is no apparent violation of the near-normality condition. Note that the boxplot for these data shows three outliers, but they are somewhat symmetric. Letting µd represent the mean (population) difference between paper and online test scores, and again using the hypotheses Ho: µd = 0 versus Ha: µd ≠ 0, SPSS gives a P-value of 0.803, showing no evidence of a difference in mean scores. Just to make sure, we redo the test with the three outliers removed, giving

The P-value of 0.256 still shows no evidence of a significant difference in means. We conclude that the testing environment does not affect mean score.

3. Of mice and iron

For this study we are in the two-sample framework for inference involving means. The data are not matched. The study is based on random selection of mice, and it is reasonable to assume that the two groups of mice were selected independently of each other. Thus we want to conduct an independent two-sample t-test. We check the near-normality condition on the two groups, looking at histograms and boxplots.

The data are skewed and have outliers. Each group has size n = 18, so we are on the borderline for relying on the t-procedures with data like these.

We will conduct the hypothesis test both with and without the outliers. Let µ3 denote the mean percentage of iron retained in the population of mice that would take Fe3+, and let µ4 be defined similarly. We test Ho: µ3 - µ4 = 0 versus Ha: µ3 - µ4 ≠ 0. The SPSS output for the independent two-sample t-test shows

The P-value is 1%, showing that the results are significant. Since the mean difference is negative, we conclude that there is evidence that mice have higher retention of Fe4+ than Fe3+. Note that SPSS also gives a 95% confidence interval for this mean difference, (-3.90, -0.58), which measures the magnitude of the difference.

Rerunning the test without the three outliers gives

The test results are still significant at the 1% level, giving further support to our conclusions.

4. Highway fatalities

For the highway safety problem we are in the two-sample setting, making an inference on the mean difference in fatality rates between those states that increased their speed limits and those that did not. The histograms of the two groups show that the nearly normal condition is satisfied.

The boxplot indicates that Washington, D.C. is an outlier.

This is troubling for the analysis because D.C. is an urban area and has different characteristics from the 50 states. Further, the presence of the outlier could tilt the findings in the direction of showing a significant difference in the two groups. So we will proceed with the test but repeat the analysis with the outlier removed. Here the hypothesis test is Ho: µYes - µNo = 0 versus Ha: µYes - µNo > 0. Note that the one-sided form of the alternative is due to the researchers’ question of whether there was an increase in fatalities for those states that increased their speed limits. The SPSS output for the independent two-sample test is

The P-value is .002 (half of the stated P-value since the test is one-sided). There is strong evidence that the increase in fatality rate is “real” for those states that increased the speed limits. The reported 95% confidence interval of (5.83, 38.80) gives a measure of the magnitude of the increase in the fatality rate.
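The halving step used here can be sketched as a small helper. Halving is only valid when the observed difference lies in the direction of the alternative; otherwise the one-sided P-value is one minus half the two-sided value. The function name `one_sided_p` is illustrative. The two-sided value 0.004 is an assumption inferred from the statement that .002 is half of the stated P-value, and the point estimate is taken as the midpoint of the reported confidence interval.

```python
def one_sided_p(p_two_sided, estimate, alternative="greater"):
    """Convert a two-sided t-test P-value to a one-sided one.

    estimate    : observed difference (only its sign is used)
    alternative : "greater" for Ha: difference > 0, "less" for Ha: difference < 0
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    in_direction = estimate > 0 if alternative == "greater" else estimate < 0
    half = p_two_sided / 2
    # Halve only when the data point in the direction of the alternative.
    return half if in_direction else 1 - half

# Midpoint of the reported 95% interval (5.83, 38.80) as the point estimate.
estimate = (5.83 + 38.80) / 2

# ASSUMED two-sided P-value, implied by ".002 is half of the stated P-value".
p_one = one_sided_p(0.004, estimate, alternative="greater")
print(f"one-sided P-value: {p_one:.3f}")
```

Had the observed difference been negative, the same call would return a value near 1, reflecting no support for an increase.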

With D.C. removed we obtain

The resulting P-value is 0.005, still highly significant. The confidence interval changes slightly to (3.45, 33.25).