Stat 280: Elementary Applied Statistics

Answer Key to Test I

Question / 1 / 2 / 3 / 4 / 5 / Total
Points / 15 / 15 / 25 / 20 / 25 / 100

Instructions:

  1. Please write your answers in the space provided below. Use the back pages if extra space is needed.
  2. You may use a calculator and the normal distribution table inside the front cover of your textbook. However, you may not consult the text of the book itself.
  3. Show work in detail. Partial credit may be given for the right steps even if your final answer is not exactly correct.

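Some “hard to remember” formulas:

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$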

Question 1 [15 points]

For each of the following variables, state whether it is categorical (C) or quantitative (Q). Then suggest one type of graphical display suitable for showing the distribution of, say, 100 observations of the variable.

  • Height in inches: Q, Stem-leaf Plot/ Histogram/ Box-Plot
  • Eye color: C, Bar Chart/ Pie Chart
  • Area code in phone numbers: C, Bar Chart/ Pie Chart
  • SAT scores: Q, Histogram/ Box-Plot
  • Number of letters in your last name: Q, Histogram/ Box-Plot

(3 points each)

Note that stem-leaf plots do not work well for SAT scores or the number of letters in a last name, because each stem would have either too few or too many leaves.

Question 2 [15 points]

(a) What is the five-number summary (also called the 5-point summary) of a set of data?

The five-number summary consists of the minimum (min.), first quartile (Q1), median (M), third quartile (Q3), and maximum (max.).

(4 points)

(b) Write down the five-number summary for the following set of temperatures measured in °F. Also write down the interquartile range (IQR).

52, 57, 18, 82, 65, 65

The five number summary for this set of data is

min.=18, Q1=52, M=61, Q3=65, max.=82.

The interquartile range is IQR = Q3 - Q1 = 65 - 52 = 13.
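The summary above can be checked with a short sketch. Quartile conventions differ between textbooks and software; this sketch assumes the rule used in the answer, where Q1 and Q3 are the medians of the lower and upper halves of the sorted data (with the median itself left out when n is odd). The function name `five_number_summary` is illustrative only.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, M, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # observations below the median position
    upper = xs[half + n % 2:]   # observations above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

temps_f = [52, 57, 18, 82, 65, 65]
summary = five_number_summary(temps_f)
iqr = summary[3] - summary[1]   # Q3 - Q1
print(summary, iqr)
```

Other quartile rules (for example, interpolation-based percentiles) can give slightly different values of Q1 and Q3 for small data sets.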

(4 points)

(c) If we use the 1.5 x IQR criterion for outliers, write down the outliers in this set of data if there are any.

1.5 x IQR = 1.5 x 13 = 19.5

upper cut off point = Q3 + 1.5 x IQR = 65 + 19.5 = 84.5

lower cut off point = Q1 - 1.5 x IQR = 52 - 19.5 = 32.5

Any observation below the lower cut off or above the upper cut off point is an outlier. By this criterion, 18 is the only outlier in the data set.
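The same criterion can be written as a small sketch. The helper name `iqr_outliers` is hypothetical, and the quartiles Q1 = 52 and Q3 = 65 are taken from part (b) rather than recomputed.

```python
def iqr_outliers(data, q1, q3):
    """Return the cut off points and the observations flagged by the 1.5 x IQR rule."""
    iqr = q3 - q1
    lower_cut = q1 - 1.5 * iqr
    upper_cut = q3 + 1.5 * iqr
    flagged = [x for x in data if x < lower_cut or x > upper_cut]
    return lower_cut, upper_cut, flagged

temps_f = [52, 57, 18, 82, 65, 65]
lower_cut, upper_cut, outliers = iqr_outliers(temps_f, q1=52, q3=65)
print(lower_cut, upper_cut, outliers)
```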

(7 points)

Question 3 [25 points]

(a) Find the mean ($\bar{x}$), variance ($s^2$), and standard deviation (s) of the set of data in Question 2. What will be the mean and standard deviation of these temperatures in °C? [The Celsius and Fahrenheit temperature scales are related by C = (5/9)(F - 32).]
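A minimal sketch of the computation for part (a), assuming the sample variance with divisor n - 1 (the "adjusted average" described in part (b)). The variable names are illustrative.

```python
from statistics import mean, variance, stdev

temps_f = [52, 57, 18, 82, 65, 65]

mean_f = mean(temps_f)      # sum of observations divided by n
var_f = variance(temps_f)   # sample variance: squared deviations summed, divided by n - 1
sd_f = stdev(temps_f)       # square root of the sample variance

# C = (5/9)(F - 32): the shift by 32 moves the mean but not the spread,
# while the factor 5/9 rescales both.
mean_c = (5 / 9) * (mean_f - 32)
sd_c = (5 / 9) * sd_f

print(mean_f, var_f, round(sd_f, 2), round(mean_c, 2), round(sd_c, 2))
```

With these data the mean is 56.5 °F, the variance is 459.5, and s is about 21.44 °F. In Celsius the mean is about 13.61 °C and the standard deviation about 11.91 °C.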

(10 points)

(b) Briefly explain in a few English sentences how $\bar{x}$, $s^2$, and s are interpreted and what characteristics of the distribution these quantities measure.

The mean $\bar{x}$ is the average value of the observations. It is a measure of the center of the distribution.

The variance $s^2$ is an adjusted average of the squared deviations from the mean: the sum of squared deviations is divided by n - 1 rather than n. It is measured in squared units. The square root of $s^2$ is the standard deviation s, which is measured in the original units. Both $s^2$ and s measure the spread of the distribution.

(10 points)

(c) If a set of data has standard deviation 0, what can you say about it?

If a set of data has standard deviation 0, all of the observations must be identical.

(5 points)

Question 4 [20 points]

(a) What is the 68-95-99.7 rule?

The 68-95-99.7 rule states that in a normal distribution with mean $\mu$ and standard deviation $\sigma$, about 68% of observations fall within $\sigma$ of $\mu$, about 95% of observations fall within $2\sigma$ of $\mu$, and about 99.7% of observations fall within $3\sigma$ of $\mu$.
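The three percentages are rounded values of standard normal probabilities. A brief sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) shows where they come from.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```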

(5 points)

(b) If height is normally distributed with mean 65 inches and standard deviation 3 inches, what percentage of people are taller than 62 inches?

A height of 62 inches is one standard deviation below the mean. By the 68-95-99.7 rule and the symmetry of the normal distribution, about 34% of the people are between 62 and 65 inches, and 50% of the people are taller than 65 inches. So, in total, about 34% + 50% = 84% of people are taller than 62 inches.

(5 points)

(c) What percentage of people are shorter than 60 inches?

The standardized z-score corresponding to the height of 60 inches is

$z = \frac{60 - 65}{3} \approx -1.67$.

From the standard normal distribution table, the percentage of observations below this z-score is 4.75%.
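Parts (b) and (c) can be checked numerically with `statistics.NormalDist`. In this sketch the z-score is rounded to two decimals before the lookup, on the assumption that the printed table is indexed that way.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=3)

# Part (b): proportion taller than 62 inches (one standard deviation below the mean).
taller_than_62 = 1 - heights.cdf(62)

# Part (c): standardize 60 inches, round as a table lookup would, then find the area below.
z_60 = (60 - 65) / 3
shorter_than_60 = NormalDist().cdf(round(z_60, 2))

print(f"{taller_than_62:.4f}", round(z_60, 2), f"{shorter_than_60:.4f}")
```

Using the unrounded z-score instead gives a slightly larger area (about 4.8%), which is the usual small discrepancy between table lookups and software.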

(10 points)

Question 5 [25 points]

The following are the heights (in inches) of five women and their dates on Valentine’s Day.

Women / 66 / 64 / 66 / 65 / 70
Men / 72 / 68 / 70 / 68 / 71

(a) Find the correlation between the heights of the women and the heights of their dates. How would you describe the direction and strength of the relationship between the height of women and their dates?

The correlation r can be computed directly from the formula (given on the front page) or with the built-in functions of a statistical calculator. For the above set of data, the correlation is 0.6251. The association is positive but only moderate in strength, because the correlation is not very close to 1.
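As a check on the value of r, here is a minimal sketch of the direct calculation. It uses the sum-of-products form of the correlation, which is algebraically equivalent to the average product of standardized values; the function name `correlation` is illustrative.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

women = [66, 64, 66, 65, 70]
men = [72, 68, 70, 68, 71]
r = correlation(women, men)
print(round(r, 4))
```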

(5 points)

(b) Determine the equation of the least-squares regression line.

Taking the explanatory variable x to be the woman’s height and the response variable y to be the height of her date, the least-squares regression line is $\hat{y} = a + bx$, where $b = r \frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.

For the data above, we obtain $\bar{x} = 66.2$, $\bar{y} = 69.8$, $s_x = 2.280$, and $s_y = 1.789$, giving b = 0.49 and a = 37.34. The equation of the least-squares regression line is $\hat{y} = 37.34 + 0.49x$.

(5 points)

(c) What would be the predicted height of the date of a woman 67 inches tall? A woman 58 inches tall? Comment on the accuracy of the predictions.

Using the regression equation obtained in part (b), the predicted height of the date of a woman 67 inches tall is 70.2 inches, and that of a woman 58 inches tall is 65.8 inches.

Since the correlation r is only moderate, we cannot expect the predictions to be very accurate. Moreover, the prediction for x = 58 is an extrapolation, because 58 inches lies well below the observed range of women’s heights (64 to 70 inches); it should be used with caution.
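The fit and the two predictions can be reproduced with a short sketch. The slope is computed here as the ratio of the sum of cross-products to the sum of squared x-deviations, which equals r times s_y / s_x; the helper name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

women = [66, 64, 66, 65, 70]
men = [72, 68, 70, 68, 71]
a, b = least_squares_line(women, men)

pred_67 = a + b * 67
pred_58 = a + b * 58
# A prediction is an extrapolation when x falls outside the observed range of x.
is_extrapolation_58 = not (min(women) <= 58 <= max(women))

print(round(a, 2), round(b, 2), round(pred_67, 1), round(pred_58, 1), is_extrapolation_58)
```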

(7 points)

(d) If heights were measured in centimeters rather than inches, how would the correlation change? (There are 2.54 centimeters in an inch.)

There will be no change in correlation, because correlation is not affected by a change in the units of measurement.

(3 points)

(e) If every woman dated a man exactly 3 inches taller than herself, what would be the correlation between female and male heights?

In this case, r = 1 because there will be a perfect positive correlation between the height of women and their dates. All points in the scatter plot lie exactly on a straight line with positive slope.
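The claims in parts (d) and (e) can be checked numerically. This sketch reuses the women's heights from the table; the list `men_plus3` is hypothetical data constructed for part (e), and the function name `correlation` is illustrative.

```python
from math import sqrt, isclose

def correlation(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

women_in = [66, 64, 66, 65, 70]
men_in = [72, 68, 70, 68, 71]

# Part (d): convert every height from inches to centimeters.
women_cm = [2.54 * h for h in women_in]
men_cm = [2.54 * h for h in men_in]
r_in = correlation(women_in, men_in)
r_cm = correlation(women_cm, men_cm)

# Part (e): hypothetical dates, each exactly 3 inches taller than the woman.
men_plus3 = [h + 3 for h in women_in]
r_plus3 = correlation(women_in, men_plus3)

print(isclose(r_in, r_cm), isclose(r_plus3, 1.0))
```

Comparisons use `isclose` rather than `==` because floating-point rounding can leave differences in the last few digits.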

(5 points)