Displaying Data
1 Suppose a medical researcher compares the average blood pressure of women who take oral contraceptives with the average blood pressure of women who do not.
a. Is blood pressure a categorical variable or a quantitative variable?
b. Is oral contraceptive use (or not) a categorical variable or a quantitative variable?
c. What variables that affect blood pressure might confuse the comparison of average blood pressures for users and nonusers? That is, what factors affecting blood pressure might differ between users and nonusers? Explain.
2 A statistics class at UC Davis was asked, “About how many hours do you watch television per week?” A five-number summary of the responses from 173 students follows.
Median 6
Quartiles 2 12.5
Extremes 0 100
a. What was the median number of hours of weekly television watching? In the context of this situation, write a sentence that interprets the median.
b. Give the value that completes the following sentence. About 1/4 of the students watch less than ___ hours of television per week.
c. Give the value that completes the following sentence. About 1/4 of the students watch more than ___ hours of television per week.
d. What interval describes the middle 1/2 of the students’ television watching amounts?
e. The mean for these data is 8.9 hours per week.
How do you think the mean is calculated?
Why do you think it is larger than the median in this instance?
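A few extreme answers can pull the mean above the median. The minimal Python sketch below shows this with a small, made-up list of weekly hours. These are not the UC Davis responses.

```python
from statistics import mean, median

# Hypothetical weekly TV hours for 9 students (not the UC Davis data).
hours = [0, 2, 3, 5, 6, 8, 10, 12, 100]

# The mean adds every value and divides by the count,
# so the single extreme value (100) pulls it upward.
print("mean:", mean(hours))
# The median is the middle value of the sorted list; it does not
# depend on how far out the largest value is.
print("median:", median(hours))
```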
3 In ANGEL on the Lessons page, access the Datasets folder. Within this folder, click the link for the data set named U.S. Smoking (Minitab file). This should open a program named Minitab with the data in place. The data are estimates of the percentage of adults who smoke in each state of the U.S. (and the District of Columbia).
a. In Minitab, use Graph>Stem-and-Leaf to create a stemplot of the percentages of adults who smoke in the 50 states and Washington D.C. In the dialog box, double-click the name of the second column to enter it as the variable you want to plot.
About where do most states fall, in terms of percent smoking?
About what is the lowest percent in the dataset?
About what is the highest percent in the dataset?
What do you notice about the values in the worksheet and the values displayed in the stemplot?
How would you describe the shape of this data?
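A stemplot takes only a few lines of Python. This simplified sketch makes four assumptions:

* Values are non-negative.
* The stem is the tens digit and the leaf is the units digit.
* Decimals are truncated, not rounded.
* The percentages are hypothetical.

Minitab's stemplot also adds a depth column and may split stems, so its layout will differ.

```python
from collections import defaultdict

def stemplot(values):
    """Return a basic stem-and-leaf plot as text: stem = tens digit, leaf = units digit.
    Decimals are truncated (not rounded), so 23.9 appears as stem 2, leaf 3."""
    stems = defaultdict(list)
    for v in sorted(values):
        whole = int(v)                  # drop the decimal part (values assumed non-negative)
        stems[whole // 10].append(whole % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

# Hypothetical smoking percentages (not the U.S. Smoking data set).
sample = [9.8, 13.4, 17.2, 18.9, 19.5, 21.7, 22.1, 26.0]
print(stemplot(sample))
```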
b. Use Stat>Basic Statistics>Display Descriptive Statistics to determine summary statistics for the percents. As in part a, double-click the name of the second column to list it as the variable we’re analyzing. Inspect the output to find these values:
Mean Percent =
Standard Deviation =
Median percent =
lower quartile (denoted by Q1) =
upper quartile (denoted by Q3) =
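Python's standard `statistics` module can produce similar summaries for comparison. This sketch uses hypothetical data. The quartiles come from `statistics.quantiles` with its default `method="exclusive"`, which places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4 in the sorted data. We believe this matches Minitab's convention, but other software may report slightly different quartiles.

```python
from statistics import mean, median, quantiles, stdev

def describe(values):
    """Return summary statistics similar to Minitab's Display Descriptive Statistics."""
    q1, _, q3 = quantiles(values, n=4)   # default method="exclusive": (n + 1)p positions
    return {
        "N": len(values),
        "Mean": mean(values),
        "StDev": stdev(values),          # sample standard deviation (divides by n - 1)
        "Q1": q1,
        "Median": median(values),
        "Q3": q3,
    }

# Hypothetical percentages, not the U.S. Smoking data.
stats = describe([12, 15, 17, 18, 20, 21, 25])
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```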
c. Write a sentence that interprets the median in the context of this situation.
d. What value completes the following sentence? In about 1/4 of the states, the percent that smokes is less than _____.
e. What interval includes the middle 1/2 of the values of the state smoking percentages?
f. Use Calc>Calculator to transform the data in column 2.

* In the Store result in variable box, type ‘Plus10’.
* In the Expression box, double-click the name of the second column and use the calculator pad to add 10 (click ‘+’, then 1, then 0). Click OK.
* Repeat this step, but enter ‘Times10’ in the Store result in variable box and change the ‘+’ to ‘*’ in the Expression box.

Then use Display Descriptive Statistics again to get the mean and standard deviation for the original data (column 2) and for the new data in columns 3 and 4. For ease, enter all three variables into the Variables box at once.
What do you notice about the changes in the mean and standard deviation from the original to the new data?
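This Python sketch offers a second way to check your Minitab output. It builds Plus10 and Times10 columns from a few hypothetical values and prints the mean and standard deviation of each.

```python
from statistics import mean, stdev

# Hypothetical values standing in for column 2.
original = [12.0, 15.0, 17.0, 18.0, 20.0]
plus10 = [x + 10 for x in original]    # like the Plus10 column
times10 = [x * 10 for x in original]   # like the Times10 column

for name, col in [("original", original), ("Plus10", plus10), ("Times10", times10)]:
    print(f"{name:>8}: mean = {mean(col):.2f}, stdev = {stdev(col):.2f}")
```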
4 Car and truck speeds at a particular location have an approximately bell-shaped distribution with mean = 65 mph and standard deviation = 5 mph. [Recall from the notes/text that for any bell-shaped curve, roughly 68% of the observations fall within +/- one standard deviation of the mean, 95% fall within +/- two standard deviations, and 99.7% fall within +/- three standard deviations of the mean.]
a. About 68% of cars and trucks travel between ______and ______at this location.
b. About 95% of cars and trucks travel between ______and ______at this location.
c. About 99.7% of cars and trucks travel between ______and ______at this location.
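The three intervals are the mean plus or minus 1, 2, and 3 standard deviations. This Python sketch computes them for a hypothetical distribution with mean 100 and standard deviation 15. The same arithmetic applies to the speed data.

```python
def empirical_rule(mean, sd):
    """Return the intervals mean +/- k*sd for k = 1, 2, 3 (the 68-95-99.7 rule)."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {coverage[k]: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Hypothetical bell-shaped distribution: mean 100, standard deviation 15.
for pct, (low, high) in empirical_rule(100, 15).items():
    print(f"About {pct} fall between {low} and {high}")
```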
d. A z-score is a measure of how many standard deviations a value is from the mean. Later in the course, we will see that it is an important measure of the size of a value.
The formula is z = (Observed Value − Mean) / Standard Deviation.
Determine a z-score for a vehicle speed of 72 mph.
e. What vehicle speed has a z-score = −1? Said another way, what vehicle speed is one standard deviation below the mean? (You will need to do some algebra to solve for Observed Value)
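The z-score formula has a rearranged form: Observed Value = Mean + z × Standard Deviation. The Python sketch below writes both as small functions. The exam-score numbers in the example are hypothetical.

```python
def z_score(observed, mean, sd):
    """How many standard deviations the observed value lies from the mean."""
    return (observed - mean) / sd

def value_from_z(z, mean, sd):
    """Solve z = (observed - mean) / sd for the observed value."""
    return mean + z * sd

# Hypothetical example: exam scores with mean 70 and standard deviation 8.
print(z_score(82, 70, 8))       # 1.5
print(value_from_z(-1, 70, 8))  # 62
```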
5 Open the Class Survey (Minitab file) data file from the Datasets folder in ANGEL on the Lessons page. These data are from a survey given to students in my Stat200 courses last semester. You are a researcher who wants to use this class survey data to study how PSU undergraduate students compare to these national averages.
a. The purpose of most statistical studies is to use the sample data to generalize to a larger group. What do you think are the weaknesses of using this class survey data for generalizing to all PSU undergraduate students?
b. (Importance of checking data) Compute the descriptive statistics for SATM (C16) and SATV (C17). Note the minimum and maximum values for each. (REMEMBER: Stat > Basic Statistics > Display Descriptive Statistics. Enter SATM and SATV together in the Variables window.)
i. From the output, what does the * represent?
ii. How many students answered the question regarding their SATM and SATV scores?
SATM______SATV______
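Data can also be checked programmatically. This Python sketch makes the following assumptions:

* The function name `check_scores` and the responses are hypothetical.
* `None` stands for a blank (missing) answer.

The function counts answered and missing responses. It also flags values outside 200 to 800, the score range for a single SAT section.

```python
def check_scores(responses, low=200, high=800):
    """Count answered/missing responses and list values outside the valid SAT section range."""
    answered = [x for x in responses if x is not None]
    missing = len(responses) - len(answered)
    out_of_range = [x for x in answered if not low <= x <= high]
    return len(answered), missing, out_of_range

# Hypothetical responses; None stands for a blank (missing) answer.
satm = [620, None, 540, 7000, None, 580]
print(check_scores(satm))  # (4, 2, [7000])
```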
c. Now find the descriptive statistics for SATM and SATV by gender (repeat what you did for part b, but now enter Gender in the By Variable window) and use the output to answer the following:
Female SATM: Q1 ______Q3 ______IQR ______
Female SATV: Q1 ______Q3 ______IQR ______
Male SATM: Q1 ______Q3 ______IQR ______
Male SATV: Q1 ______Q3 ______IQR ______
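Minitab's By Variable option computes the statistics separately for each group. A rough Python equivalent groups the scores by gender, then takes quartiles within each group. The data and the helper name `quartiles_by_group` are hypothetical. Quartiles come from `statistics.quantiles` with its default method.

```python
from collections import defaultdict
from statistics import quantiles

def quartiles_by_group(groups, values):
    """Return {group: (Q1, Q3, IQR)}, similar to using Gender as a By variable."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        if v is not None:               # skip missing responses
            by_group[g].append(v)
    result = {}
    for g, vals in by_group.items():
        q1, _, q3 = quantiles(vals, n=4)
        result[g] = (q1, q3, q3 - q1)
    return result

# Hypothetical data, not the Class Survey file.
gender = ["F", "M", "F", "M", "F", "M", "F", "M", "F", "M"]
satm = [500, 620, 560, 650, 580, None, 610, 700, 640, 720]
print(quartiles_by_group(gender, satm))
```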
d. On a boxplot built from the 5-number summary, a data point is considered an outlier if it is either larger than Q3 + (1.5 × IQR) or smaller than Q1 − (1.5 × IQR). Calculate these limits and identify any outliers for the Female group.
SATM: Calculate the value of Q3 + (1.5 × IQR) =
SATM: Calculate the value of Q1 − (1.5 × IQR) =
SATV: Calculate the value of Q3 + (1.5 × IQR) =
SATV: Calculate the value of Q1 − (1.5 × IQR) =
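The fence calculation is simple arithmetic. The Python sketch below uses hypothetical quartiles (Q1 = 500, Q3 = 600) rather than the survey results.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) boxplot fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """List the values that fall outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical quartiles and scores, not the survey results.
print(outlier_fences(500, 600))                       # (350.0, 750.0)
print(find_outliers([340, 520, 580, 760], 500, 600))  # [340, 760]
```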
e. Use the descriptive statistics you calculated by gender to answer the following:
· How do the SAT scores from our survey compare across genders? Do you believe that any differences are significant? That is, do you think these differences are large enough that they are statistically different?
6 Continue with the Class Survey (Minitab file). Column C20, Book Cost, contains the responses to how much students expected to pay for books that semester.
a. Use Graph>Histogram, click Simple, and then enter Book Cost in the Variables box to draw a histogram. Hover the mouse pointer over the bars in the histogram to see the characteristics of each bar. Use this information to complete the following sentences.
The most frequently reported amount spent was between ___ and ___. Of the 226 students, ___ students said they spent that much. [HINT: place your mouse pointer over the tallest bar.]
The second most frequently reported amount spent was between ____ and ____.
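A histogram shows how many values fall in each of a set of equal-width bins. This Python sketch counts hypothetical book costs in $100 bins and reports the tallest bar. Minitab chooses its own bin width and boundaries, so its bars may not line up with these.

```python
from collections import Counter

def bin_counts(values, width, start=0):
    """Count how many values fall in each bin [low, low + width)."""
    counts = Counter((v - start) // width for v in values)
    return {(start + k * width, start + (k + 1) * width): n
            for k, n in sorted(counts.items())}

# Hypothetical book costs in dollars, not the Class Survey data.
costs = [120, 150, 180, 210, 250, 260, 290, 310, 480]
bins = bin_counts(costs, width=100)
tallest = max(bins, key=bins.get)       # bin with the largest count
print(bins)
print("Most frequent bin:", tallest, "with", bins[tallest], "students")
```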
b. Using Minitab, draw a boxplot of Book Cost (use Graph>Boxplot, select “Simple,” and then enter Book Cost in the Variables window). The boxplot provides a graph of the 5-number summary for a set of data. Placing the mouse pointer over the “box” displays a pop-up showing part of the 5-number summary.
· What does the * represent in a boxplot?
· How many * are there for the variable Book Cost?
· What are the outlier values? [place your mouse over the * to see the value]
· What is the 5-number summary for Book Cost?
· The shape of the data represented by the boxplot can be judged from two features:
o The location of the median line in the box.
o The lengths of the two “whiskers” (the lines that extend from either end of the box).
Use these rules:
o If the median is in the center of the box and the whiskers are roughly equal in length, the data are symmetric.
o If the median is near the bottom of the box and the top whisker is longer, the distribution is said to be skewed to the right, or positively skewed.
o If the median is near the top of the box and the bottom whisker is longer, the distribution is skewed to the left, or negatively skewed.
What is the shape of Book Cost based on the boxplot? Does this agree with how you would interpret the histogram?
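The boxplot shape rule can be turned into a rough Python check. The text gives no numeric cutoff for "roughly equal." This sketch therefore assumes the following, all hypothetical:

* The `tolerance` parameter is 10% of the whisker-to-whisker span.
* The function names are illustrative.
* The example summaries are made up.

The whisker ends are passed in directly. As far as we know, on a Minitab boxplot they stop at the most extreme values not flagged as outliers.

```python
def _direction(diff, cutoff):
    """+1 if diff is clearly positive, -1 if clearly negative, 0 if roughly zero."""
    if diff > cutoff:
        return 1
    if diff < -cutoff:
        return -1
    return 0

def boxplot_shape(low_whisker, q1, median, q3, high_whisker, tolerance=0.1):
    """Rough shape reading that follows the median-position and whisker-length rule."""
    cutoff = tolerance * (high_whisker - low_whisker)      # hypothetical "roughly equal" cutoff
    box = _direction((q3 - median) - (median - q1), cutoff)              # +1: median near bottom of box
    whiskers = _direction((high_whisker - q3) - (q1 - low_whisker), cutoff)  # +1: top whisker longer
    total = box + whiskers
    if total > 0:
        return "skewed right (positively skewed)"
    if total < 0:
        return "skewed left (negatively skewed)"
    if box == 0 and whiskers == 0:
        return "roughly symmetric"
    return "mixed signals: compare with the histogram"

# Hypothetical summaries: (lower whisker end, Q1, median, Q3, upper whisker end).
print(boxplot_shape(50, 100, 130, 220, 400))
```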