Descriptive Statistics: Numerical Methods

Chapter 3

Learning Objectives

1. Understand the purpose of measures of location.

2. Be able to compute the mean, median, mode, quartiles, and various percentiles.

3. Understand the purpose of measures of variability.

4. Be able to compute the range, interquartile range, variance, standard deviation, and coefficient of variation.

5. Understand skewness as a measure of the shape of a data distribution. Learn how to recognize when a data distribution is negatively skewed, roughly symmetric, and positively skewed.

6. Understand how z scores are computed and how they are used as a measure of relative location of a data value.

7. Know how Chebyshev’s theorem and the empirical rule can be used to determine the percentage of the data within a specified number of standard deviations from the mean.

8. Learn how to construct a 5-number summary and a box plot.

9. Be able to compute and interpret covariance and correlation as measures of association between two variables.

10. Be able to compute a weighted mean.

Solutions:

1.

10, 12, 16, 17, 20

Median = 16 (middle value)

2.

10, 12, 16, 17, 20, 21

Median = (16 + 17)/2 = 16.5

3. 15, 20, 25, 25, 27, 28, 30, 34

2nd position = 20

6th position = 28
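The positions quoted above are consistent with the index rule used throughout these solutions: compute i = (p/100)n; if i is not an integer, round up and take the value in that position; if i is an integer, average the values in positions i and i + 1. A minimal Python sketch of that rule follows. The helper name `percentile` is hypothetical (it does not come from the text), and the input is the problem 3 data.

```python
def percentile(data, p):
    """p-th percentile (p an integer from 1 to 99) by the index rule i = (p/100)n."""
    values = sorted(data)
    n = len(values)
    # Keep p*n in integer arithmetic so the "is i an integer?" test is exact.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1 (1-based).
        return (values[whole - 1] + values[whole]) / 2
    # i is not an integer: round up to the next position (1-based).
    return values[whole]


data = [15, 20, 25, 25, 27, 28, 30, 34]
print(percentile(data, 25))  # first quartile
print(percentile(data, 75))  # third quartile
```

Under this rule the quartiles come out as 22.5 and 29, the same values used for the interquartile range in solution 15. Library routines such as NumPy's `percentile` interpolate by default, so they can return slightly different values from this position rule.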

4.

Median = 57 (6th item)

Mode = 53 (it appears 3 times)

5. a.

b. Median: average of the 10th and 11th values

10th $160 Los Angeles

11th $162 Seattle

Median = (160 + 162)/2 = $161

c. Mode = $167 San Francisco and New Orleans

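d. Q1: i = (25/100)20 = 5. Since i is an integer, Q1 is the average of the 5th and 6th values.

5th $134

6th $139

Q1 = (134 + 139)/2 = $136.50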

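e. Q3: i = (75/100)20 = 15. Since i is an integer, Q3 is the average of the 15th and 16th values.

15th $167

16th $173

Q3 = (167 + 173)/2 = $170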

6. a. Mean = 422 minutes

b. Median

8th position: 380 minutes

c. 85th percentile: i = (85/100)15 = 12.75, so round up to the 13th position

13th position: 690 minutes

d. Using the mean = 422, cell-phone subscribers are using 422/750 = 56% of the capacity of their plans. Part (c) shows 85% of the subscribers are using 690 minutes or less. In general, cell-phone users are not coming close to using the 750-minute capacity of their plans.

7. a.

Median is average of 10th and 11th values after arranging in ascending order.

Data are multimodal

b.

Data are bimodal: 19.95 (3 brokers), 29.95 (3 brokers)

c. Comparing the measures of central location, we conclude that it costs more to trade 100 shares in a broker assisted trade than 500 shares online.

d. From the data we have here, the cost of a trade is more closely related to whether the trade is broker-assisted or online than to the amount of the transaction. The online transaction is 5 times as large, but it costs less.

However, if the comparison were restricted to broker-assisted trades only, or to online trades only, we would probably find that larger transactions cost more.

8. a. Sum of the ages = 695, so the mean = 695/20 = 34.75

The modal age is 25; it appears 3 times.

b. Median is average of 10th and 11th items.

Data suggest at-home workers are slightly younger.

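c. For Q1, i = (25/100)20 = 5

Since i is an integer, Q1 is the average of the 5th and 6th values.

For Q3, i = (75/100)20 = 15

Since i is an integer, Q3 is the average of the 15th and 16th values.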

d. i = (32/100)20 = 6.4

Since i is not an integer, we round up to the 7th position.

32nd percentile = 27

9. a. Median (Position 13) = 8296

b. The median would be the better measure because a few very large data values pull the mean upward.

c. i = (25 / 100) 25 = 6.25

Q1 (Position 7) = 5984

i = (75 / 100) 25 = 18.75

Q3 (Position 19) = 14,330

d. i = (85/100) 25 = 21.25

85th percentile (position 22) = 15,593. Approximately 85% of the web sites have fewer than 15,593 unique visitors.

10. a. minutes

Median: average of the 7th and 8th values

7th 73

8th 79

Median = (73 + 79)/2 = 76 minutes

b. minutes

Median: average of the 6th and 7th values

6th 37

7th 38

Median = (37 + 38)/2 = 37.5 minutes

c. The mean emergency room waiting time when operating at full capacity is more than an hour (1 hour, 16 minutes). Roughly 50% of the patients wait longer than 76 minutes. This average waiting time is approximately twice as long as the waiting time for hospitals not operating at capacity. The waiting times are a serious problem for hospitals operating at capacity.

11. Using the mean we get city mean = 15.58, country mean = 18.92

For the samples we see that the mean mileage is better in the country than in the city.

City

13.2 14.4 15.2 15.3 15.3 15.3 15.9 16.0 16.1 16.2 16.2 16.7 16.8

Median = 15.9 (7th value)

Mode: 15.3

Country

17.2 17.4 18.3 18.5 18.6 18.6 18.7 19.0 19.2 19.4 19.4 20.6 21.1

Median = 18.7 (7th value)

Mode: 18.6, 19.4

The median and modal mileages are also better in the country than in the city.
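The three measures of location in this solution can be checked with Python's standard `statistics` module. This is only a sketch: the helper name `summarize` is hypothetical, and `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

city = [13.2, 14.4, 15.2, 15.3, 15.3, 15.3, 15.9, 16.0, 16.1, 16.2, 16.2, 16.7, 16.8]
country = [17.2, 17.4, 18.3, 18.5, 18.6, 18.6, 18.7, 19.0, 19.2, 19.4, 19.4, 20.6, 21.1]


def summarize(values):
    """Return (mean rounded to 2 decimals, median, list of all modes)."""
    return round(mean(values), 2), median(values), multimode(values)


city_summary = summarize(city)
country_summary = summarize(country)
print("City:   ", city_summary)
print("Country:", country_summary)
```

`multimode` returns every value tied for the highest count, which is why the country sample reports two modes.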

12. a.

b. pictures

c. minutes

13. Range = 20 - 10 = 10

10, 12, 16, 17, 20

Q1 (2nd position) = 12

Q3 (4th position) = 17

IQR = Q3 - Q1 = 17 - 12 = 5

14.

15. 15, 20, 25, 25, 27, 28, 30, 34

Range = 34 - 15 = 19

IQR = Q3 - Q1 = 29 - 22.5 = 6.5

16. a. Range = 190 - 168 = 22

b.

c.

d.

17. a. With DVD: mean = $410

Without DVD: mean = $310

With DVD: $410 - $310 = $100 more expensive

b. With DVD Range = 500 - 300 = 200

Without DVD Range = 360 - 290 = 70

Models with DVD players have the greater variation in prices. The price range is $300 to $500. Models without a DVD player have less variation in prices. The price range is $290 to $360.

18. a. per day

b. The mean car-rental rate per day is $38 for both Eastern and Western cities. However, Eastern cities show a greater variation in rates per day. This greater variation is most likely due to the inclusion of the most expensive city (New York) in the Eastern city sample.

19. a. Range = 60 - 28 = 32

IQR = Q3 - Q1 = 55 - 45 = 10

b.

c. The average air quality is about the same. But, the variability is greater in Anaheim.

20. Dawson Supply: Range = 11 - 9 = 2

J.C. Clark: Range = 15 - 7 = 8

21. a. Cities:

Retirement Areas:

b. Mean cost of the market basket is roughly the same with the retirement areas sample mean $1 less. However, there is more variation in the cost in cities than in retirement areas.

22. a. 100 Shares at $50 (Broker-assisted)

Min Value = 9.95 Max Value = 55.00

Range = 55.00 - 9.95 = 45.05

Interquartile range = 48.975 - 24.995 = 23.98

500 Shares at $50 (Online)

Min Value = 5.00 Max Value = 62.50

Range = 62.50 - 5.00 = 57.50

Interquartile range = 24.950 - 13.475 = 11.475

b. 100 Shares at $50 (Broker-assisted)

500 Shares at $50 (Online)

c. 100 Shares at $50 (Broker-assisted)

Coefficient of Variation =

500 Shares at $50 (Online)

Coefficient of Variation =

d. Using the standard deviation as a measure, the variability seems to be greater for the broker-assisted trades. But, using the coefficient of variation as a measure, we see that the relative variability is greater for the online trades.

23. Range = 92 - 67 = 25

IQR = Q3 - Q1 = 80 - 77 = 3

= 78.47

24. Quarter milers

s = 0.0564

Coefficient of Variation = (s/x̄)100% = (0.0564/0.966)100% = 5.8%

Milers

s = 0.1295

Coefficient of Variation = (s/x̄)100% = (0.1295/4.534)100% = 2.9%

Yes; the coefficient of variation shows that as a percentage of the mean the quarter milers’ times show more variability.
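The comparison above rests on dividing each standard deviation by its own mean. A short sketch of that arithmetic, using the values quoted in the solution (the function name `coefficient_of_variation` is only illustrative):

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


quarter_milers = coefficient_of_variation(0.0564, 0.966)
milers = coefficient_of_variation(0.1295, 4.534)
print(f"Quarter milers: {quarter_milers:.1f}%")
print(f"Milers: {milers:.1f}%")
```

Although the milers have the larger standard deviation in absolute terms, their times are also much longer, so their relative variability is smaller.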

25.

10

20

12

17

16

26.

27. a. At least 75%

b. At least 89%

c. At least 61%

d. At least 83%

e. At least 92%

28. a. Approximately 95%

b. Almost all

c. Approximately 68%

29. a. This is from 2 standard deviations below the mean to 2 standard deviations above the mean.

With z = 2, Chebyshev’s theorem gives:

1 - 1/z^2 = 1 - 1/(2)^2 = 1 - 1/4 = 0.75

Therefore, at least 75% of adults sleep between 4.5 and 9.3 hours per day.

b. This is from 2.5 standard deviations below the mean to 2.5 standard deviations above the mean.

With z = 2.5, Chebyshev’s theorem gives:

1 - 1/z^2 = 1 - 1/(2.5)^2 = 1 - 1/6.25 = 0.84

Therefore, at least 84% of adults sleep between 3.9 and 9.9 hours per day.

c. With z = 2, the empirical rule suggests that 95% of adults sleep between 4.5 and 9.3 hours per day. The percentage obtained using the empirical rule is greater than the percentage obtained using Chebyshev’s theorem.
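The Chebyshev percentages in parts (a) and (b) come from the bound 1 - 1/z^2. The sketch below evaluates that bound and rebuilds the two intervals. The mean of 6.9 hours and standard deviation of 1.2 hours are assumptions inferred from the intervals quoted in the solution; they are not stated in it.

```python
def chebyshev_bound(z):
    """Minimum fraction of data within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem is informative only for z > 1")
    return 1 - 1 / z**2


# Assumed values, inferred from the intervals 4.5 to 9.3 and 3.9 to 9.9 hours.
MEAN_HOURS, STD_HOURS = 6.9, 1.2

for z in (2, 2.5):
    low = MEAN_HOURS - z * STD_HOURS
    high = MEAN_HOURS + z * STD_HOURS
    print(f"z = {z}: at least {chebyshev_bound(z):.0%} between {low:.1f} and {high:.1f} hours")
```

Because the bound holds for any distribution, it is lower than the 95% the empirical rule gives for bell-shaped data at z = 2.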

30. a. $1.39 is one standard deviation below the mean and $1.55 is one standard deviation above the mean. The empirical rule says that approximately 68% of gasoline sales should be in the price range.

b. Part (a) shows that approximately 68% of the gasoline sales are between $1.39 and $1.55. Since the bell-shaped distribution is symmetric, approximately half of 68%, or 34%, of the gasoline sales should be between $1.39 and the mean price of $1.47. $1.63 is two standard deviations above the mean price of $1.47. The empirical rule says that approximately 95% of the gasoline sales should be within two standard deviations of the mean. Thus, approximately half of 95%, or 47.5%, of the gasoline sales should be between the mean price of $1.47 and $1.63. The percentage of gasoline sales between $1.39 and $1.63 should be approximately 34% + 47.5% = 81.5%.

c. $1.63 is two standard deviations above the mean and the empirical rule says that approximately 95% of the gasoline sales should be within two standard deviations of the mean. Thus, 100% - 95% = 5% of the gasoline sales should be more than two standard deviations from the mean. Since the bell-shaped distribution is symmetric, we expect that half of 5%, or 2.5%, would be more than $1.63.

31. a. 607 is one standard deviation above the mean. Approximately 68% of the scores are between 407 and 607 with half of 68%, or 34%, of the scores between the mean of 507 and 607. Also, since the distribution is symmetric, 50% of the scores are above the mean of 507. With 50% of the scores above 507 and with 34% of the scores between 507 and 607, 50% - 34% = 16% of the scores are above 607.

b. 707 is two standard deviations above the mean. Approximately 95% of the scores are between 307 and 707 with half of 95%, or 47.5%, of the scores between the mean of 507 and 707. Also, since the distribution is symmetric, 50% of the scores are above the mean of 507. With 50% of the scores above 507 and with 47.5% of the scores between 507 and 707, 50% - 47.5% = 2.5% of the scores are above 707.

c. Approximately 68% of the scores are between 407 and 607 with half of 68%, or 34%, of the scores between 407 and the mean of 507.

d. Approximately 95% of the scores are between 307 and 707 with half of 95%, or 47.5%, of the scores between 307 and the mean of 507. Approximately 68% of the scores are between 407 and 607 with half of 68%, or 34%, of the scores between the mean of 507 and 607. Thus, 47.5% + 34% = 81.5% of the scores are between 307 and 607.
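Each part of this solution splits the empirical-rule percentages in half by symmetry and then adds or subtracts the pieces. A small sketch of that bookkeeping, with hypothetical helper names and only the 68% and 95% figures used above:

```python
# Approximate percentage of a bell-shaped distribution within +/- z standard
# deviations of the mean, per the empirical rule.
WITHIN = {1: 68.0, 2: 95.0}


def percent_between_mean_and(z):
    """Approximate percentage between the mean and z standard deviations on one side."""
    return WITHIN[abs(z)] / 2


def percent_above(z):
    """Approximate percentage more than z standard deviations above the mean (z > 0)."""
    return 50.0 - percent_between_mean_and(z)


def percent_between(z_low, z_high):
    """Approximate percentage between z_low < 0 and z_high > 0."""
    return percent_between_mean_and(z_low) + percent_between_mean_and(z_high)


print(percent_above(1))        # scores above 607, part (a)
print(percent_above(2))        # scores above 707, part (b)
print(percent_between(-2, 1))  # scores between 307 and 607, part (d)
```

The helpers accept only z = 1 or z = 2 in absolute value, which is all the solution needs.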

32. a.

b.

c. $2300 is .67 standard deviations below the mean. $4900 is 1.50 standard deviations above the mean. Neither is an outlier.

d.

$13,000 is 8.25 standard deviations above the mean. This cost is an outlier.
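The z-scores quoted in parts (c) and (d) can be reproduced with z = (x - mean)/s. In the sketch below the mean of $3100 and standard deviation of $1200 are assumptions: they are consistent with the three z-scores quoted, but they do not appear in the surviving text. The cutoff of 3 standard deviations for an outlier follows the rule used elsewhere in these solutions.

```python
# Assumed values: chosen because they reproduce the quoted z-scores.
MEAN_COST = 3100
STD_COST = 1200


def z_score(x, mean=MEAN_COST, std_dev=STD_COST):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev


def is_outlier(x, threshold=3):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(z_score(x)) > threshold


for cost in (2300, 4900, 13000):
    print(cost, round(z_score(cost), 2), is_outlier(cost))
```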

33. a. x̄ is approximately 63 or $63,000, and s is 4 or $4000

b. This is from 2 standard deviations below the mean to 2 standard deviations above the mean.

With z = 2, Chebyshev’s theorem gives:

1 - 1/z^2 = 1 - 1/(2)^2 = 1 - 1/4 = 0.75

Therefore, at least 75% of benefits managers have an annual salary between $55,000 and $71,000.

c. The histogram of the salary data is shown below:

Visual inspection of the histogram and the skewness measure of .97 indicate that it is moderately skewed to the right. Although the distribution is not perfectly bell shaped, it does appear that the distribution of annual salaries for benefits managers is roughly symmetric and could be approximated by a bell-shaped distribution.

d. With z = 2, the empirical rule suggests that 95% of benefits managers have an annual salary between $55,000 and $71,000. The percentage is much higher than obtained using Chebyshev’s theorem, but requires the assumption that the distribution of annual salary is bell shaped.

e. There are no outliers because all the observations are within 3 standard deviations of the mean.

34. a.

b.

Approximately one standard deviation above the mean. Approximately 68% of the scores are within one standard deviation. Thus, half of (100-68), or 16%, of the games should have a winning score of 84 or more points.

Approximately two standard deviations above the mean. Approximately 95% of the scores are within two standard deviations. Thus, half of (100-95), or 2.5%, of the games should have a winning score of more than 90 points.

c.

Largest margin 24: . No outliers.

35. a.

Median = (average of 10th and 11th values)

b. Q1 = 4.00 (average of 5th and 6th values)

Q3 = 4.50 (average of 15th and 16th values)

c.

d. The distribution is significantly skewed to the left.

e. Allison One:

Omni Audio SA 12.3:

f. The lowest rating is for the Bose 501 Series. Its z-score is:

This is not an outlier so there are no outliers.

36. 15, 20, 25, 25, 27, 28, 30, 34

Smallest = 15

Q1 = (20 + 25)/2 = 22.5

Median = (25 + 27)/2 = 26

Q3 = (28 + 30)/2 = 29

Largest = 34

37.

38. 5, 6, 8, 10, 10, 12, 15, 16, 18

Smallest = 5

Q1 = 8 (3rd position)

Median = 10

Q3 = 15 (7th position)

Largest = 18
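The five-number summary above can be assembled from the same position rule used for percentiles in this chapter (round i = (p/100)n up when it is not an integer, average two adjacent values when it is). A minimal sketch with hypothetical helper names:

```python
def position_percentile(values, p):
    """p-th percentile of already-sorted values by the index rule i = (p/100)n."""
    whole, remainder = divmod(p * len(values), 100)
    if remainder == 0:
        # i is an integer: average positions i and i + 1 (1-based).
        return (values[whole - 1] + values[whole]) / 2
    # i is not an integer: round up to the next position (1-based).
    return values[whole]


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    values = sorted(data)
    return (
        values[0],
        position_percentile(values, 25),
        position_percentile(values, 50),
        position_percentile(values, 75),
        values[-1],
    )


summary = five_number_summary([5, 6, 8, 10, 10, 12, 15, 16, 18])
print(summary)
```

With n = 9, none of the three indexes is an integer, so Q1, the median, and Q3 land on the 3rd, 5th, and 7th positions.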

39. IQR = 50 - 42 = 8

Lower Limit: Q1 - 1.5 IQR = 42 - 12 = 30

Upper Limit: Q3 + 1.5 IQR = 50 + 12 = 62

65 is an outlier
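The limits in this solution sit 1.5 interquartile ranges outside the quartiles. A brief sketch of the check, where `outlier_limits` is an illustrative name and the factor 1.5 is the one used above:

```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lower limit, upper limit) placed k IQRs outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = outlier_limits(42, 50)
value = 65
flagged = value < lower or value > upper
print(lower, upper, flagged)
```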

40. a. Smallest 619

Q1: i = (25/100)22 = 5.5

6th position: Q1 = 725

Median: halfway between the 11th and 12th values

Median = 1016

Q3: i = (75/100)22 = 16.5

17th position: Q3 = 1699

Largest 4450

619, 725, 1016, 1699, 4450

b. IQR = 1699 - 725 = 974

Lower Limit = Q1 - 1.5(IQR)

= 725 - 1.5(974) = -736

Use Lower Limit = 0

Upper Limit = Q3 + 1.5(IQR)

= 1699 + 1.5(974) = 3160

c. Yes; larger than upper limit of $3,160,000