SOLUTIONS TO ACTIVITY SET 1

Activity 1.1 Suppose a medical researcher compares the average blood pressures of women who take oral contraceptives to the blood pressures of women who do not.

(a) Is this an observational study or a randomized experiment? Explain.

Observational. Almost certainly, the researchers only surveyed women about their oral contraceptive use and blood pressure; they did not assign women to use or not use oral contraceptives.

(b) In the relationship between blood pressure and oral contraceptive use (or not) which of the two variables is the “response variable” and which is the “explanatory variable?”

Response = blood pressure. Explanatory = Oral contraceptive use.

(c) Is blood pressure a categorical variable or a quantitative variable?

Quantitative

(d) Is oral contraceptive use (or not) a categorical variable or a quantitative variable?

Categorical

(e) What variables that affect blood pressure might confuse the comparison of average blood pressures for users and nonusers? That is, what factors affecting blood pressure might differ for users and nonusers. Explain.

Answers may vary. One possibility is age. Age affects blood pressure, and there may also be age differences between users and nonusers of oral contraceptives.
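To see how a variable like age could confuse the comparison, here is a small sketch with made-up numbers. It assumes, purely for illustration, that blood pressure depends only on age (through the hypothetical rule 100 + 0.5 * age) and not at all on oral contraceptive use, and that users happen to be younger than nonusers. None of these values come from a real study.

```python
from statistics import mean

# Hypothetical ages (years); users are assumed to be younger on average.
user_ages = [20, 22, 25, 28, 30]
nonuser_ages = [35, 40, 45, 50, 55]


def blood_pressure(age):
    """Made-up rule: blood pressure depends on age only, not on pill use."""
    return 100 + 0.5 * age


user_bp = [blood_pressure(a) for a in user_ages]
nonuser_bp = [blood_pressure(a) for a in nonuser_ages]

# The group averages differ even though pill use has no effect in this setup.
difference = mean(nonuser_bp) - mean(user_bp)
print(f"Mean BP, users:    {mean(user_bp):.1f}")
print(f"Mean BP, nonusers: {mean(nonuser_bp):.1f}")
print(f"Difference:        {difference:.1f}")
```

In this sketch the whole difference in average blood pressure comes from the age gap between the groups, which is the kind of confusion part (e) asks about.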

Activity 1.2 A statistics class at UC Davis was asked “About how many hours do you watch television per week?” A five-number summary of the responses from 173 students follows.

Median 6

Quartiles 2 12.5

Extremes 0 100

(a) What was the median hours of weekly television watching? In the context of this situation, write a sentence that interprets the median.

Median = 6. About 50% of the class watched less than 6 hours per week and about 50% watched more.

(b) Give the value that completes the following sentence. About 1/4 of the students watch less than ___ hours of television per week.

2 (lower quartile) or Q1

(c) Give the value that completes the following sentence. About 1/4 of the students watch more than ___ hours of television per week.

12.5 (upper quartile) or Q3

(d) What is an interval that describes the middle 1/2 of the students’ television watching amounts?

2 to 12.5 hours (between the quartiles, from Q1 to Q3)
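As a rough illustration of parts (a) through (d), the sketch below computes a five-number summary and the middle-half interval for a small set of hypothetical hours (not the class data). It uses Python's statistics.quantiles with the "inclusive" method; this is only one of several quartile conventions, so Minitab or a hand calculation may give slightly different quartiles for the same data.

```python
from statistics import quantiles

# Hypothetical weekly TV hours for 9 students (not the class data).
hours = [0, 1, 2, 4, 6, 8, 10, 14, 30]

# n=4 asks for the three cut points that split the data into quarters.
# The "inclusive" method is one convention; others can differ slightly.
q1, median, q3 = quantiles(hours, n=4, method="inclusive")

middle_half = (q1, q3)  # interval holding about the middle 50% of values
iqr = q3 - q1           # width of that interval (interquartile range)

print("Five-number summary:", min(hours), q1, median, q3, max(hours))
print("Middle half runs from", q1, "to", q3, "with IQR", iqr)
```

Note the distinction between the interval (Q1 to Q3) and its width (Q3 minus Q1): part (d) asks for the interval.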

(e) The mean for these data is 8.9 hours per week.

How do you think the mean is calculated? Sum of all values divided by the total number of values.

Why do you think it is larger than the median in this instance? There may be an outlier (the maximum is 100 hours), and in general the data are stretched toward high values, which pulls the mean above the median.
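A quick way to check this reasoning is to compare the mean and median with and without one very large value. The hours below are hypothetical (only the 100 echoes the maximum reported in the five-number summary), so the numbers will not match the class results.

```python
from statistics import mean, median

# Hypothetical weekly TV hours; the 100 mimics the reported maximum.
hours = [0, 2, 3, 5, 6, 7, 10, 13, 100]

with_outlier = (mean(hours), median(hours))
without_outlier = (mean(hours[:-1]), median(hours[:-1]))

print("With 100 included: mean = %.2f, median = %s" % with_outlier)
print("With 100 left out: mean = %.2f, median = %s" % without_outlier)
```

The single large value moves the mean a great deal while the median barely changes, which is why the mean sits above the median when data are stretched toward high values.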

Activity 1.3 At the course website, access the Datasets folder. Within this folder, click on the link for the data set named U.S. Smoking (Minitab file). This should cause a program named Minitab to open, with the data in place. The data are estimates of the percentage of adults who smoke in each state of the U.S. (and also the District of Columbia).

a. In Minitab, use Graph>Stem-and-Leaf to create a stemplot of the percents that smoke in the 50 states and Washington D.C. In the dialog box, double click on the name of the second column to enter it as the variable you want to plot.

    1   1  2
    1   1
    2   1  6
    5   1  999
   15   2  0001111111
 (18)   2  222222222233333333
   18   2  44444
   13   2  6666666777
    3   2  89
    1   3
    1   3  2

About where do most states fall, in terms of percent smoking? In the 22 to 23% smoking range

About what is the lowest percent in the dataset? From the stemplot it is 12%, corresponding to the 12.7% for UTAH

About what is the highest percent in the dataset? 32% corresponding to the 32.6% for KENTUCKY

What do you notice about the values in the worksheet and the values displayed in the stemplot? The values in the stemplot have been truncated. That is, the decimal place has been dropped from the data in the worksheet when creating the stemplot.

How would you describe the shape of this data? Roughly symmetric
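The truncation noted in part a can be sketched in a few lines. This assumes a leaf unit of 1, so the tens digit is the stem and the ones digit is the leaf; the helper name stem_and_leaf is made up for illustration and is not how Minitab itself is implemented.

```python
def stem_and_leaf(percent):
    """Split a percent into (stem, leaf) after dropping the decimal part."""
    whole = int(percent)      # truncation: 12.7 becomes 12, not rounded to 13
    return divmod(whole, 10)  # tens digit is the stem, ones digit is the leaf


# The two extreme values mentioned in part a.
for state, percent in [("Utah", 12.7), ("Kentucky", 32.6)]:
    stem, leaf = stem_and_leaf(percent)
    print(f"{state}: {percent} -> stem {stem}, leaf {leaf}")
```

Because the decimal is dropped rather than rounded, 12.7 appears in the plot as stem 1, leaf 2, and 32.6 as stem 3, leaf 2.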

b. Use Stat>Basic Statistics>Display Descriptive Statistics to determine summary statistics for the percents. As in part a, double click the name of the second column to list it as the variable we’re analyzing. Inspect the output to find these values:

Mean Percent = 23.353

Standard Deviation = 3.327

Median percent = 23.100

lower quartile (denoted by Q1) = 21.500

upper quartile (denoted by Q3) = 26.000

c. Write a sentence that interprets the median in the context of this situation.

One interpretation: The smoking percent is less than 23.1% in about one-half of the states and greater than 23.1% in the other half.

d. What value completes the following sentence? In about 1/4 of the states, the percent that smokes is less than __21.5% (Q1)__.

e. What interval includes the middle 1/2 of the values of the state smoking percentages?

21.5% to 26% (between Q1 and Q3)

f. Use Calc>Calculator to manipulate the data in column 2. In the Store Result window type ‘Plus10’, and in the Expression window double click on the name of the second column and use the calculator pad to add 10 (click the ‘+’ and 1 then 0), and click OK. Repeat this step, but in the Store Result window enter ‘Times10’ and in the Expression window change the ‘+’ to ‘*’. Repeat part b above to find the mean and standard deviation for the original data (column 2) as well as the new data in columns 3 and 4. You can simply double click on all three columns or highlight all three and click the ‘Select’ button.

What do you notice about the changes in the mean and standard deviation from the original to the new data?

Variable            Mean    StDev
Percentage of Ad  23.353    3.327
Plus 10           33.353    3.327
Times 10          233.53    33.27

Note that when you add a constant to every value (in this case, 10), the mean increases by that constant but the standard deviation remains the same. This is because adding a constant only shifts the location of the center (the mean); it does not change how spread out the values are. But when you multiply all values by a constant, both the mean and the standard deviation are multiplied by that constant (here, both become 10 times as large).
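The same pattern can be checked outside Minitab. The sketch below uses five hypothetical smoking percents (not the worksheet data) and mirrors the ‘Plus10’ and ‘Times10’ columns; stdev here is the sample standard deviation from Python's statistics module.

```python
from statistics import mean, stdev

# Hypothetical smoking percents for five states (not the worksheet data).
percents = [19.0, 21.5, 23.0, 26.0, 28.5]

plus10 = [x + 10 for x in percents]   # shift every value up by 10
times10 = [x * 10 for x in percents]  # rescale every value by a factor of 10

columns = [("Original", percents), ("Plus10", plus10), ("Times10", times10)]
for label, values in columns:
    print(f"{label:<9} mean = {mean(values):8.3f}   StDev = {stdev(values):7.3f}")
```

Adding 10 moves only the mean, while multiplying by 10 scales both the mean and the standard deviation, matching the Minitab output above.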