Seeing Statistics Sols, 6E

SEEING STATISTICS APPLET EXERCISES

Applet 1: Influence of a Single Observation on the Median

1.1 When we click on the right-most green dot and drag it so that the speed of the ninth car is 40 mph, the median will be 30.0.

1.2 When we click on the right-most green dot and drag it so that the speed of the ninth car increases from 40 mph to 100 mph, the median will continue to be 30.0.

1.3 The highest possible value for the median is 30.0 mph. The lowest possible median speed is 29.0 mph.

1.4 Even if the ninth car were a jet-powered supersonic vehicle moving at 1000 mph, the median would remain at its maximum possible value: 30.0 mph. On the other hand, the approximate value of the mean would be (1000 + the speeds of the other eight cars)/9, or nearly 140 mph! Of these two sample descriptors, the mean has been more greatly affected by this extreme value in the data.
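
The contrast between the two descriptors can be checked with a short sketch. The eight fixed speeds below are hypothetical (the applet's actual values are not listed in these solutions); they were chosen only so that the fourth and fifth sorted values are 29 and 30, which reproduces the median limits stated in 1.3.

```python
from statistics import mean, median

# Hypothetical speeds (mph) of the other eight cars. Assumption: the applet's
# real values are not given here; these only mimic the 29.0 and 30.0 limits.
other_eight = [25, 27, 28, 29, 30, 31, 32, 34]


def summarize(ninth_speed):
    """Return (median, mean) of all nine speeds for a given ninth-car speed."""
    speeds = other_eight + [ninth_speed]
    return median(speeds), mean(speeds)


for ninth in (0, 40, 100, 1000):
    med, avg = summarize(ninth)
    print(f"ninth car = {ninth:>4} mph   median = {med:.1f}   mean = {avg:.1f}")
```

With these assumed speeds the median never leaves the 29 to 30 range, while the mean climbs well above 100 mph once the ninth car reaches 1000 mph.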

Applet 2: Scatter Diagrams and Correlation

2.1 With the slider in the center position, the best-fit straight line will be horizontal and the coefficient of correlation will be zero.

2.2 When the slider is at the extreme right position, the coefficient of correlation will be +1.0. The best-fit straight line will slope upward and all of the points will fall on the line.

2.3 We position the slider to the right of the center position. Some fine adjustment may be necessary to reach the r = +0.60 value or very close to it. The best-fit line will slope upward and there will be more “scatter” of points above and below the line.

2.4 Clicking the “Switch Sign” button will cause the best-fit line to slope downward, and the value of r will become negative. The amount of scatter of points about the line will be the same as in Applet Exercise 2.3.

2.5 With the slider positioned at the far left, r = -1.0, the best-fit line slopes downward, and all of the points are on the line. As the slider is gradually moved to the far right, the best-fit line “rotates” in a counter-clockwise direction. The value of r approaches 0, becomes positive, and will be +1.0 when the slider arrives at the far-right position. The amount of scatter increases, then decreases along the way.
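
The sign-switch behavior in 2.4 and the r = +1.0 case in 2.2 can be illustrated with a small sketch. The data points are hypothetical and the helper name pearson_r is ours, not something taken from the applet.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scatter of points with a moderate upward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 1.5, 3.5, 3.0, 5.5, 4.5]

r = pearson_r(xs, ys)
r_flipped = pearson_r(xs, [-y for y in ys])        # mirror the points: "Switch Sign"
r_perfect = pearson_r(xs, [2 * x + 1 for x in xs])  # every point on an upward line

print(f"r = {r:+.3f}, after sign switch = {r_flipped:+.3f}, perfect line = {r_perfect:+.3f}")
```

Mirroring the points changes only the sign of r, which is why the amount of scatter about the line is unchanged.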

Applet 3: Sampling

3.1 Results will vary. For example, the sample proportion may have been less than the actual population value 8 times, equal to it 1 time, and greater than it 11 times.

3.2 Again, keep in mind that the results of Applet Exercise 3.1 will vary. Although the average difference would theoretically be expected to be zero, the actual sampling experience may lead to an average difference that is slightly greater than or slightly less than zero.
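
A rough simulation of this kind of repeated sampling is sketched below. The population proportion (0.5) and sample size (100) are assumptions made for illustration, since the applet's settings are not stated here; the count of 20 samples matches the 8 + 1 + 11 tally in the example above.

```python
import random

rng = random.Random(1)   # fixed seed so that a rerun gives the same tallies
true_p = 0.5             # assumed population proportion (hypothetical)
n = 100                  # assumed sample size (hypothetical)
num_samples = 20         # same total as the 8 + 1 + 11 example

diffs = []
for _ in range(num_samples):
    successes = sum(rng.random() < true_p for _ in range(n))
    diffs.append(successes / n - true_p)   # sample proportion minus true value

below = sum(d < 0 for d in diffs)
equal = sum(d == 0 for d in diffs)
above = sum(d > 0 for d in diffs)
avg_diff = sum(diffs) / num_samples

print(f"below: {below}, equal: {equal}, above: {above}, average difference: {avg_diff:+.4f}")
```

Changing the seed changes the tallies, but the average difference should stay close to zero.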

Applet 4: Size and Shape of Normal Distribution

4.1 When we position the top slider at the far left, then gradually move it to the far right, the red distribution moves to the right and its mean gradually increases from approximately -10.0 to approximately +8.0.

4.2 When we use the bottom slider to change the standard deviation so that it is greater than 1, the red curve “flattens out” compared to that of the blue curve.

4.3 Positioning both sliders to the far left, the red curve will move to the extreme left and it will become very narrow. Its approximate mean and standard deviation will be -10.0 and 0.5, respectively, compared to the 0 and 1 values for the standard normal distribution.

Applet 5: Normal Distribution Areas

5.1 With the mean set to 130 hours and the standard deviation to 30 hours, and with the left boundary moved to 70 and the right boundary to 190, the probability in the top row will be 0.95. This is the probability that a randomly selected plane will have flown between 70 and 190 hours during the year. In the second row of the display, the values of z corresponding to 70 hours and 190 hours will be z = -2.0 and z = +2.0, respectively.

5.2 With the mean and standard deviation set at 130 and 30, respectively, and the left and right boundaries of the shaded area at 130 and 190, the upper line in the display shows 0.48 as the probability that a randomly selected plane will have flown between 130 and 190 hours during the year. The corresponding values of z are z = 0.0 and z = 2.0.

5.3 With the left and right boundaries at 140 and 170, respectively, the probability that a randomly selected plane will have flown between 140 and 170 hours during the year is shown in the top line of the display as 0.28. The corresponding values of z are z = 0.32 and z = 1.33.

5.4 With the left and right boundaries corresponding to z = 0 and z = +1.0 (or as close to 1.0 as possible), the probability associated with the shaded area is shown as 0.34.
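
The areas reported in 5.1 through 5.4 can be reproduced approximately with Python's standard library. This is only a sketch: it uses the exact boundaries named in the exercises, so a z value may differ in the last digit from what the applet displays when a slider cannot be set exactly on a boundary.

```python
from statistics import NormalDist

hours = NormalDist(mu=130, sigma=30)   # flying hours per year, as in the exercises


def area_between(left, right):
    """Probability that a normal(130, 30) value falls between left and right."""
    return hours.cdf(right) - hours.cdf(left)


def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - hours.mean) / hours.stdev


for left, right in [(70, 190), (130, 190), (140, 170)]:
    print(f"{left} to {right} hours: z = {z_score(left):+.2f} to {z_score(right):+.2f}, "
          f"area = {area_between(left, right):.2f}")

# Exercise 5.4: area between z = 0 and z = +1 on the standard normal curve.
standard = NormalDist()
area_0_to_1 = standard.cdf(1.0) - standard.cdf(0.0)
print(f"z = 0 to z = +1: area = {area_0_to_1:.2f}")
```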

Applet 6: Normal Approximation to Binomial Distribution

6.1 With n = 15 and π = 0.6, the probability that there are no more than k = 9 females in the sample of 15 walkers is shown in the “Prob” box as 0.5968. The corresponding probability using the normal approximation to the binomial distribution is displayed as 0.6039, a difference of just 0.0071.

6.2 With n = 5 and π = 0.6, we find the actual binomial probability that there are no more than k = 3 females in the sample of 5 to be 0.663. The corresponding probability using the normal approximation is 0.676, a difference of 0.013. With the smaller sample size, the normal approximation is a little less close.

6.3 With n = 100 and π = 0.6, we find the actual binomial probability that there are no more than k = 60 females in the sample of 100 to be 0.5379. The corresponding probability using the normal approximation is 0.5406, a difference of 0.0027. With the larger sample size, the normal approximation has become a little closer.

6.4 Repeating Applet Exercise 6.1 for k values of 6 through 12, in each case identifying the actual binomial probability that there will be no more than k females in the sample of n = 15:

k / binomial probability that x ≤ k / normal approximation probability that x ≤ k / difference, binomial - normal approximation
6 / 0.0950 / 0.0938 / 0.0012
7 / 0.2131 / 0.2146 / -0.0015
8 / 0.3902 / 0.3961 / -0.0059
9 / 0.5968 / 0.6039 / -0.0071
10 / 0.7827 / 0.7854 / -0.0027
11 / 0.9095 / 0.9062 / 0.0033
12 / 0.9729 / 0.9675 / 0.0054
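
The table can be regenerated with the sketch below. It assumes that the applet's normal approximation applies a continuity correction (evaluating the normal curve at k + 0.5); that detail is not stated in these solutions, but it reproduces the displayed values to four decimal places. The function names are ours.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact binomial probability that x <= k."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(x <= k), with an assumed continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


n, p = 15, 0.6
for k in range(6, 13):
    exact = binom_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(f"{k:>2} / {exact:.4f} / {approx:.4f} / {exact - approx:+.4f}")
```

Calling the same two functions with n = 5 or n = 100 gives the comparisons described in 6.2 and 6.3.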

Applet 7: Distribution of Means: Fair Dice

7.1 Generating 3000 rolls of a single die, using the “Sample Size = 1” applet version. The heights of the six bars will be fairly even, though not perfectly so. With a fair die, such a shape is close to the level (uniform) distribution we would expect for the distribution of the “means” when each sample is just 1 die.

7.2 Generating 3000 samples, each one representing a set of three dice. The distribution of sample means will now take on more of a symmetric bell shape with a mean that is close to the expected value of 3.5 for samples involving fair dice.

7.3 Generating 3000 samples, each one representing a set of twelve dice. The distribution of sample means will tend to become even narrower and more symmetrical, again having a mean that is very close to the expected value of 3.5 for samples involving fair dice.

7.4 If there were an additional applet version that allowed each sample to consist of 100 dice, and we generated 2000 of these samples, the distribution of sample means would take on a distribution very close to the normal distribution. According to the central limit theorem, as the sample sizes become larger, the distribution of sample means will approach the normal distribution.
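
The narrowing described in 7.1 through 7.3 can be imitated with a simple simulation. This is a sketch using Python's pseudo-random generator with a fixed seed, not a reproduction of the applet's own random draws.

```python
import random
from statistics import mean, stdev


def sample_means(dice_per_sample, num_samples, rng):
    """Mean of each simulated sample of fair dice."""
    means = []
    for _ in range(num_samples):
        rolls = [rng.randint(1, 6) for _ in range(dice_per_sample)]
        means.append(mean(rolls))
    return means


rng = random.Random(7)   # fixed seed so that the run is repeatable
results = {size: sample_means(size, 3000, rng) for size in (1, 3, 12)}

for size, means in results.items():
    print(f"{size:>2} dice per sample: mean of means = {mean(means):.3f}, "
          f"standard deviation of means = {stdev(means):.3f}")
```

The mean of the sample means stays near 3.5 for every sample size, while their standard deviation shrinks roughly in proportion to one over the square root of the number of dice.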

Applet 8: Distribution of Means: Loaded Dice

8.1 Generating 3000 rolls of a single loaded die, using the “Sample Size = 1” applet version. The heights of the six bars will be very uneven, as one might expect when the die is weighted to favor one or more of the sides.

8.2 Generating 3000 samples, each one representing a set of three loaded dice. The distribution of sample means will tend to be strongly skewed and will not have a mean that is close to 3.5, the expected mean for samples involving fair dice.

8.3 Generating 3000 samples, each one representing a set of twelve loaded dice. The distribution of sample means will become narrower, and it will tend to become a little more symmetrical. However, it will continue to have a mean that is not very close to 3.5, the expected mean for samples involving fair dice.

8.4 If there were an additional applet version that allowed each sample to consist of 100 dice, and we generated 2000 of these samples, the distribution of sample means would tend to be relatively normally distributed, even though the underlying distribution of values is decidedly non-normal.
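
Why the skew fades as the sample size grows can be shown without simulation. For the mean of n independent rolls, the skewness equals the single-die skewness divided by the square root of n. The face probabilities below are hypothetical, since the applet's actual loading is not given in these solutions.

```python
from math import sqrt

faces = [1, 2, 3, 4, 5, 6]
# Hypothetical loading that favors the six. Assumption: not the applet's weights.
weights = [0.05, 0.05, 0.10, 0.10, 0.20, 0.50]

mu = sum(f * w for f, w in zip(faces, weights))
var = sum(w * (f - mu) ** 2 for f, w in zip(faces, weights))
skew = sum(w * (f - mu) ** 3 for f, w in zip(faces, weights)) / var**1.5


def mean_distribution_summary(n):
    """Mean, standard deviation and skewness of the mean of n loaded dice."""
    return mu, sqrt(var / n), skew / sqrt(n)


for n in (1, 3, 12, 100):
    m, sd, sk = mean_distribution_summary(n)
    print(f"n = {n:>3}: mean = {m:.2f}, standard deviation = {sd:.3f}, skewness = {sk:+.3f}")
```

With this assumed loading the mean stays at 4.85 rather than 3.5 for every sample size, while the skewness moves toward zero as n increases.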

Applet 9: Confidence Interval Size

9.1 With the slider positioned so as to specify a 95% confidence interval for μ, the lower and upper confidence limits are displayed as 1.381 and 1.419, respectively.

9.2 When we move the slider so that the confidence interval is 99%, the confidence interval is now wider. The lower and upper confidence limits are displayed as 1.3751 and 1.4249, respectively.

9.3 When we move the slider so that the confidence interval is 80%, the confidence interval becomes narrower.

9.4 As we gradually move the slider from the extreme left position to the extreme right position, both the confidence level and the width of the confidence interval increase.
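
The link between confidence level and interval width can be sketched as follows. Two things are assumed rather than taken from the applet: that the interval is a z-interval, and that the standard error can be inferred from the displayed 99% limits (1.3751 and 1.4249), since neither the sample size nor the standard deviation is stated in these solutions.

```python
from statistics import NormalDist

CENTER = 1.400   # midpoint of the displayed confidence limits

# Assumed standard error, backed out of the displayed 99% half-width of 0.0249.
STD_ERROR = 0.0249 / NormalDist().inv_cdf(0.995)


def z_interval(confidence):
    """Lower and upper limits of a z-interval at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * STD_ERROR
    return CENTER - half_width, CENTER + half_width


for level in (0.80, 0.95, 0.99):
    lower, upper = z_interval(level)
    print(f"{level:.0%} interval: {lower:.4f} to {upper:.4f} (width {upper - lower:.4f})")
```

Under these assumptions the 95% limits round to 1.381 and 1.419, in line with 9.1, and the width grows steadily with the confidence level.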

Applet 10: Comparing the Normal and Student t Distributions

10.1 With the slider positioned at df = 5, the shape of the t distribution is flatter and wider than that of the standard normal distribution.

10.2 When the slider is moved downward so that df = 2, the t distribution becomes even flatter and wider compared to the shape of the standard normal distribution.

10.3 When the slider is moved upward so that df increases from 2 to 10, the t distribution becomes less flat and less wide, more closely approaching the shape of the standard normal distribution.

10.4 As the slider is moved upward from df = 2 to df = 100, the t distribution becomes less flat and less wide, and it becomes more and more difficult to differentiate between the two distributions. At the df = 100 value, the two curves are practically identical.
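
The "flatter and wider" description can be checked numerically by comparing curve heights at the center (t = 0) and in the tail (t = 3). The sketch below uses the standard formula for the Student t density; the function names are ours.

```python
from math import exp, gamma, pi, sqrt


def t_pdf(t, df):
    """Height of the Student t curve with df degrees of freedom at t."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def normal_pdf(z):
    """Height of the standard normal curve at z."""
    return exp(-z * z / 2) / sqrt(2 * pi)


for df in (2, 5, 10, 100):
    print(f"df = {df:>3}: height at 0 = {t_pdf(0, df):.4f}, height at 3 = {t_pdf(3, df):.4f}")
print(f"normal  : height at 0 = {normal_pdf(0):.4f}, height at 3 = {normal_pdf(3):.4f}")
```

A lower peak together with a higher tail is what makes the small-df curves look flatter and wider; by df = 100 the heights are nearly the same as those of the normal curve.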

Applet 11: Student t Distribution Areas

11.1 With the slider set so that df = 9 and the left text box containing t = 3.25, the area beneath the curve between t = -3.25 and t = +3.25 is 1.00 - 0.01, or 0.99.

11.2 Gradually moving the slider upward until df = 89, we see that the t value shown in the text box decreases from 3.25 to 2.63.

11.3 We position the slider so that df = 2, then gradually move it upward until df = 100. The t value decreases and the curve becomes taller and narrower, approaching the shape of the normal distribution.

11.4 We position the slider so that df = 9, then enter 0.10 into the two-tail probability text box at the right. The t value in the left text box becomes t = 1.83. This corresponds to a right-tail area of 0.05. Referring to the t table that precedes the inside back cover of the book, the corresponding table value of t is t = 1.833.
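
The areas quoted in 11.1, 11.2, and 11.4 can be approximated by integrating the t density numerically. The sketch below uses Simpson's rule, which is our choice for illustration and not necessarily how the applet computes its areas.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Height of the Student t curve with df degrees of freedom at t."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def central_area(t, df, steps=2000):
    """Area under the t curve between -t and +t (Simpson's rule, even steps)."""
    h = 2 * t / steps
    total = t_pdf(-t, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-t + i * h, df)
    return total * h / 3


for t, df in [(3.25, 9), (2.63, 89), (1.833, 9)]:
    area = central_area(t, df)
    print(f"df = {df:>2}, t = {t}: central area = {area:.3f}, two-tail area = {1 - area:.3f}")
```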

Applet 12: z-Interval and Hypothesis Testing

12.1 When the applet initially loads, the sample mean is displayed as 1.3229 and the 95% confidence interval limits for μ are displayed as 1.3142 and 1.3316. Based on this confidence interval, it would seem believable that the true population mean might be 1.325 minutes, because this value falls within the limits.

12.2 When we use the slider to increase the sample mean to approximately 1.330 minutes, the 95% confidence interval limits for μ are displayed as 1.3213 and 1.3386. Based on this confidence interval, it would seem believable that the true population mean might be 1.325 minutes, because this value falls within the limits.

12.3 When we use the slider to decrease the sample mean to approximately 1.310 minutes (e.g., 1.3099 may be the closest you can get), the 95% confidence interval limits for μ are displayed as 1.3013 and 1.3186. Based on this confidence interval, it would not seem believable that the true population mean might be 1.325 minutes, because this value does not fall within the limits.
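
The reasoning in 12.1 through 12.3 amounts to checking whether 1.325 lies inside the interval around each sample mean. In the sketch below the half-width of 0.0087 is inferred from the displayed limits, e.g. (1.3316 - 1.3142)/2, and is not a setting read from the applet, so a computed limit may differ from the display in the last digit.

```python
HALF_WIDTH = 0.0087   # inferred from the displayed 95% limits (assumption)
MU_0 = 1.325          # hypothesized population mean, in minutes


def interval(sample_mean):
    """Approximate 95% confidence limits around a sample mean."""
    return sample_mean - HALF_WIDTH, sample_mean + HALF_WIDTH


def believable(sample_mean):
    """True if the hypothesized mean falls within the confidence limits."""
    lower, upper = interval(sample_mean)
    return lower <= MU_0 <= upper


for x_bar in (1.3229, 1.3300, 1.3099):
    lower, upper = interval(x_bar)
    print(f"sample mean {x_bar:.4f}: limits {lower:.4f} to {upper:.4f}, "
          f"1.325 believable: {believable(x_bar)}")
```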

Applet 13: Statistical Power of a Test

13.1 With the left and right sliders set so that α = 0.10 and n = 20, we move the bottom slider so that the actual μ is as close as possible to 10 without being equal to 10. For example, for an actual μ = 10.01, the power of the test is 0.10. This is the value we would expect, since α = 0.10.

13.2 With the left, right, and bottom sliders set so that α = 0.05, n = 15, and μ = the same 10.01 value we selected in Applet Exercise 13.1, we find that moving the left slider upward and downward results in α and the power of the test increasing and decreasing together, and that they continue to have the same numerical value.

13.3 With the left and bottom sliders set so that α = 0.05 and μ = 11.2, we move the right slider upward and downward to change the sample size for the test. The powers of the test corresponding to the selected values of n (2, 10, 20, 40, 60, 80, and 100) are displayed as 0.12, 0.39, 0.67, 0.92, 0.99, 1.0, and 1.0, respectively.

13.4 With the right and bottom sliders set so that n = 20 and μ = 11.2, we move the left slider upward and downward to change the α level for the test. The powers of the test corresponding to the selected values of α (0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.40, and 0.50) are displayed as 0.43, 0.53, 0.67, 0.77, 0.87, 0.91, 0.94, and 0.96, respectively.
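
The patterns in 13.1 through 13.4 can be explored with a rough power calculation. Several things here are assumptions, because these solutions do not state them: that the test is a two-tail z-test of H0: μ = 10, and that the population standard deviation is about 2.24. That value was picked only because it gives powers in the same range as those displayed; the output should not be expected to match the applet digit for digit.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 10.0    # hypothesized mean (implied by Exercise 13.1)
SIGMA = 2.24   # ASSUMED population standard deviation; not given in the text


def power_two_tail(mu_actual, n, alpha):
    """Power of an assumed two-tail z-test when the true mean is mu_actual."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (mu_actual - MU_0) * sqrt(n) / SIGMA
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print("power when mu is barely off 10:", round(power_two_tail(10.01, 20, 0.10), 2))

by_n = [power_two_tail(11.2, n, 0.05) for n in (2, 10, 20, 40, 60, 80, 100)]
print("power by n:    ", [round(p, 2) for p in by_n])

by_alpha = [power_two_tail(11.2, 20, a)
            for a in (0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50)]
print("power by alpha:", [round(p, 2) for p in by_alpha])
```

Whatever value is assumed for the standard deviation, the qualitative behavior is the same: power is essentially α when μ is barely different from 10, and it rises with both n and α.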

Applet 14: Distribution of Difference Between Sample Means

14.1 Setting the top slider so that the difference between the population means is -3.0, the slider at the upper right so that the standard deviation of each population is 2.5, and the slider at the lower right so that n1 = n2 = 20:

a. There is quite a lot of overlap between the two population curves at the top of the applet.

b. There is practically no overlap at all between the two sampling distribution curves in the center part of the applet.

c. Viewing the bottom portion of the applet, and assuming that a sample is going to be taken from each of the two populations, it seems very unlikely that (x̄1 - x̄2) will be > 0. Note: In the applet, the red curves represent population 2 (top section) and the sampling distribution of the means from population 2 (center section), respectively. The green curve at the bottom represents the sampling distribution of (x̄1 - x̄2).

14.2 Repeating Applet Exercise 14.1, but with the top slider set so the difference between the population means is +0.5:

a. There is so much overlap between the two population curves at the top of the applet that they nearly coincide.

b. There is also a considerable amount of overlap between the two sampling distribution curves in the center part of the applet.

c. Viewing the bottom portion of the applet, and assuming that a sample is going to be taken from each of the two populations, it is quite possible that (x̄1 - x̄2) will be > 0.

14.3 Repeating Applet Exercise 14.1, but with the top slider set so the difference between the population means is 3.0:

a. There is quite a lot of overlap between the two population curves at the top of the applet.

b. There is practically no overlap at all between the two sampling distribution curves in the center part of the applet.

c. Viewing the bottom portion of the applet, and assuming that a sample is going to be taken from each of the two populations, it seems extremely likely that (x̄1 - x̄2) will be > 0.

14.4 Using the top slider to gradually change the difference between the population means from -0.5 to 4.5: In the set of curves at the top, the underlying population represented by the red curve shifts to the right. In the sampling distributions in the center section, the sampling distribution of means from the red-curve population shifts to the right. In the bottom of the applet, the green curve representing the sampling distribution of (x̄1 - x̄2) also shifts to the right.

14.5 Using the upper right slider to gradually increase the population standard deviations from 1.1 to 3.0: This tends to spread out and increase the amount of overlap between the population curves in the top of the applet, to spread out and increase the amount of overlap between the sampling distribution curves in the center of the applet, and to spread out the sampling distribution of (x̄1 - x̄2) in the bottom of the applet.

14.6 Using the lower right slider to gradually increase the sample sizes from 2 to 20: There is no change in the underlying populations in the upper portion of the applet. Each of the sampling distribution curves in the center portion of the applet becomes narrower, and the amount of overlap between the two curves tends to decrease. In the bottom portion of the applet, the sampling distribution of (x̄1 - x̄2) becomes narrower.
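
The "very unlikely", "quite possible", and "extremely likely" judgments in 14.1 through 14.3 can be put into numbers. The sketch assumes normal populations with equal standard deviations and equal sample sizes, and takes the slider's difference to mean (mean of population 1) minus (mean of population 2); the function name is ours.

```python
from math import sqrt
from statistics import NormalDist


def prob_diff_positive(mean_diff, sigma, n):
    """P(x-bar1 - x-bar2 > 0) when both samples have size n and std. dev. sigma."""
    std_error = sigma * sqrt(2 / n)   # standard error of the difference in means
    return 1 - NormalDist(mean_diff, std_error).cdf(0.0)


for diff in (-3.0, 0.5, 3.0):
    p = prob_diff_positive(diff, sigma=2.5, n=20)
    print(f"difference between population means = {diff:+.1f}: "
          f"P(difference in sample means > 0) = {p:.4f}")
```

Raising sigma or lowering n enlarges the standard error, which is the spreading-out described in 14.5 and 14.6.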

Applet 15: F Distribution and ANOVA

15.1 When we use the left slider to increase the number of degrees of freedom for the numerator of the F ratio to df1 = 5, then to df1 = 10, the F distribution curve tends to flatten out slightly and extend further to the right.

15.2 After using the left and right sliders to set the degrees of freedom back to df1 = 2 and df2 = 7, we use the right slider to set the number of degrees of freedom for the denominator to df2 = 10, then to df2 = 15. The F distribution curve tends to flatten out and extend further to the right.

15.3 After using the left and right sliders to set the degrees of freedom back to df1 = 2 and df2 = 7, we use the left text box to increase the F value to 9.55. (Be sure to press the Enter or Return key after changing the text box entry.) The probability changes to 0.01.

15.4 After using the left and right sliders to set the degrees of freedom back to df1 = 2 and df2 = 7, we use the left text box to return the F value to 6.54, then the right text box to change the probability to 0.01. The F value is now displayed as 9.57. The sharp-eyed reader will note that the 9.57 F value in this solution differs slightly from the 9.55 F value that was entered in Applet Exercise 15.3. Both correspond to a probability of 0.01 (with the probability rounded to two decimal places). In Applet Exercise 15.3, we entered the exact value F = 9.55 and got a rounded probability of 0.01. In this exercise, we entered the exact probability 0.01 and got a rounded F value of 9.57.
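
The rounding point in 15.4 can be checked directly. When the numerator has exactly 2 degrees of freedom, the right-tail area of the F distribution has a simple closed form, (1 + 2F/df2) raised to the power -df2/2. The sketch below relies on that special case and does not apply to other values of df1; the function name is ours.

```python
def f_tail_df1_2(f_value, df2):
    """Right-tail area P(F > f_value) for an F distribution with df1 = 2."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


for f_value in (6.54, 9.55, 9.57):
    p = f_tail_df1_2(f_value, df2=7)
    print(f"F = {f_value}: tail probability = {p:.5f} (rounded: {p:.2f})")
```

The tail areas for F = 9.55 and F = 9.57 are not identical, but both round to 0.01 when shown to two decimal places.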

Applet 16: Interaction Graph in Two-Way ANOVA

16.1 We center all three sliders so that the value at the far right of each slider scale is a zero. Next, we slide the top slider to the right and set it at +10 to increase the difference between the row means. The two lines representing the row 1 and row 2 effects are parallel, with the “R2” line 10 points above 100 and the “R1” line 10 points below 100. When we slide the top slider further to the right, to +20, the lines remain parallel but the “R2” and “R1” are now at the 120 and 80 levels, respectively. We have made the row 2 effect stronger and the row 1 effect weaker.

16.2 We center all three sliders so that the value at the far right of each slider scale is a zero. Next, we slide the middle slider to the right and set it at +10 to increase the difference between the column means. The line in the graph slopes upward, as we have increased the effect of column 2 and decreased the effect of column 1. When we move this slider further to the right, to +20, the column-effect advantage of column 2 becomes more pronounced, and the line gets even steeper.