Solutions to Autumn 2001 exam

Business Statistics

Question 1

a.

Scale 14 = 14

14

25

3

48 8

51 2 2 4 5 6 8

61 2 4 5 7 8 8

70 0 1 3 4 4 5 7 7

83 4 5 8

90 6

b.

median = 17th data value = 68

range = 96 - 14 = 82
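
The median and range can be checked by expanding the stem-and-leaf display back into raw scores. The following is a minimal Python sketch; the list `scores` is transcribed from the display, and the variable names are illustrative only.

```python
import statistics

# Scores read off the stem-and-leaf display (stem = tens, leaf = units).
scores = [
    14, 25, 48, 48,
    51, 52, 52, 54, 55, 56, 58,
    61, 62, 64, 65, 67, 68, 68,
    70, 70, 71, 73, 74, 74, 75, 77, 77,
    83, 84, 85, 88,
    90, 96,
]

n = len(scores)                         # number of observations
median_position = (n + 1) // 2          # position of the middle ordered value
median = statistics.median(scores)      # middle value of the sorted data
data_range = max(scores) - min(scores)  # largest score minus smallest score

print(n, median_position, median, data_range)
```

With 33 observations the middle value is the 17th ordered score, which is why the median is read directly from the display rather than averaged from two values.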

c. The scores on exam 1 are more consistent.

  • Range for exam 1 = 57, which is much smaller than the other two, i.e. 81 and 82.
  • Standard deviation for exam 1 is 13.79. This is much smaller than the other two, i.e. 19.96 and 18.81.
  • Variance for exam 1 is 190.13. This is much smaller than the other two, i.e. 398.51 and 328.5.
  • Coefficient of variation for exam 1 is 0.18. This is much smaller than the other two, i.e. 0.30 and 0.28.
  • Boxplot for exam 1 shows a smaller spread of marks.

d. All three exams would be considered ‘fair’. This can be verified by examining the boxplots. In all three boxplots the lower quartile exceeds the pass mark of 50%. The lower quartile is approximately 62 in exam 1, 52 in exam 2 and 55 in exam 3.

e. Exams 2 and 3. This is because the scores in both these exams are reasonably skewed. When data are skewed, the median provides a better indicator of the centre than the mean.

Question 2

a. Let F = event mechanism is faulty

$\bar{F}$ = event mechanism is not faulty

ii.

b. We must first determine the type of probability distribution we have. Ask yourself the following questions.

Does the question say that the number of customers with claims is normally distributed? No

Does the question include information on the mean and standard deviation number of customers? No

Therefore the number of customers with claims is not normally distributed.

Can only two things happen? Yes

Are there a fixed number of trials? Yes

Does the probability a customer will have a claim remain constant? Yes

Therefore the number of customers with claims follows a binomial distribution where

X = number of customers who have claims

= 0, 1, 2, 3, ..., 25

n = 25

p = probability a customer has a claim

= 0.1


ii

iii.

 = =
= 1.5 claims
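
The figure of 1.5 claims is the standard deviation of the binomial distribution defined above. As a rough check, the sketch below computes the mean and standard deviation from the shortcut formulas and again from the full probability distribution. The function name `binomial_pmf` is illustrative and not taken from the exam.

```python
from math import comb, sqrt

n, p = 25, 0.1  # number of customers and probability of a claim, as defined above

def binomial_pmf(k: int) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

mean = n * p                      # expected number of claims
std_dev = sqrt(n * p * (1 - p))   # standard deviation of the number of claims

# Cross-check the shortcut formulas against the full distribution.
mean_from_pmf = sum(k * binomial_pmf(k) for k in range(n + 1))
var_from_pmf = sum((k - mean) ** 2 * binomial_pmf(k) for k in range(n + 1))

print(mean, std_dev, mean_from_pmf, sqrt(var_from_pmf))
```

Both routes give a mean of 2.5 claims and a standard deviation of 1.5 claims, up to floating-point rounding.
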
c.i.
ii. Reading the counts directly from the contingency table we get

or using the formula for the union of two events we get

Either method gives the same probability.
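
To illustrate why the two methods agree, here is a small Python sketch. The counts in the table are hypothetical, invented only for this illustration, and are not the exam's data. The event labels A and B are placeholders as well.

```python
from fractions import Fraction

# Hypothetical 2 x 2 contingency table of counts (NOT the exam's data).
# Keys are (row event, column event).
counts = {
    ("A", "B"): 30, ("A", "not B"): 20,
    ("not A", "B"): 10, ("not A", "not B"): 40,
}
total = sum(counts.values())

# Method 1: add the counts of every cell that lies in A or in B.
direct = Fraction(
    sum(c for (a, b), c in counts.items() if a == "A" or b == "B"), total
)

# Method 2: addition rule P(A or B) = P(A) + P(B) - P(A and B).
p_a = Fraction(sum(c for (a, _), c in counts.items() if a == "A"), total)
p_b = Fraction(sum(c for (_, b), c in counts.items() if b == "B"), total)
p_a_and_b = Fraction(counts[("A", "B")], total)
by_formula = p_a + p_b - p_a_and_b

print(direct, by_formula)
```

Subtracting P(A and B) removes the cell that would otherwise be counted twice, so both methods give the same fraction.
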
Question 3

ii.

iii.

b.

Therefore the 95% confidence interval estimate of the average credit balance is between $3055.52 and $3344.48.

c.i.

therefore the sample proportion will be approximately normally distributed.

Therefore the proportion of bulbs functioning properly is between 89.8% and 94.2%.

Therefore 1225 bulbs should be sampled.

Question 4

a.

Step 1

Since $\sigma$ is unknown and being estimated by s, we use the t test statistic.

Step 2

Step 3

Step 4

Reject $H_0$ if t < -2.403

Step 5

Step 6

Since -3.54 < -2.403 we reject $H_0$.

There is sufficient evidence, at $\alpha$ = 0.01, to support a claim by the consumer group that the bottles are being under filled.

b.i.

  1. This condition can be verified by examining the histogram of the residuals and observing the shape of the histogram. The histogram should have an approximate bell shape. Since regression analysis is robust to slight departures from normality, we would only be concerned if this histogram had an extremely non-normal shape. The histogram presented here shows an approximate bell shape, therefore there is no reason to believe the condition has been violated.


Reject $H_0$ if p-value < 0.01

Since the p-value < 0.01 we reject $H_0$.
Therefore there is sufficient evidence at $\alpha$ = 0.01 to conclude that sales and advertising expenditure are linearly related.
We can test the same hypotheses using the t statistic instead of the p-value. The solution using this method follows. Only one or the other is necessary when testing these hypotheses.
Step 1

Step 2

Step 3

Step 4
Reject $H_0$ if t < -2.845 or t > 2.845
Step 5

Step 6
Since 5.78 > 2.845 we reject $H_0$.
Therefore there is sufficient evidence at $\alpha$ = 0.01 to conclude that sales and advertising expenditure are linearly related.

This means that 62.6% of the variation in sales is explained by the variation in advertising expenditure. The remaining 37.4% is unexplained and is due to other factors.
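
The unexplained share is simply one minus the coefficient of determination. The sketch below also cross-checks 62.6% against the t statistic of 5.78, using the simple linear regression identity r² = t² / (t² + df). The value df = 20 is an assumption, inferred from the critical value 2.845, and does not come from the exam output itself.

```python
r_squared = 0.626              # coefficient of determination quoted above
unexplained = 1 - r_squared    # share of variation not explained by the model

# Consistency check (ASSUMPTION: 20 degrees of freedom, inferred from the
# critical value 2.845; the regression output is not reproduced here).
t_stat = 5.78
df = 20
r_squared_from_t = t_stat**2 / (t_stat**2 + df)

print(round(unexplained, 3), round(r_squared_from_t, 3))
```

Under that assumption the two routes agree to three decimal places.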

Part B

1. B.

The data recorded for the size of fries ordered would be categorical. The category recorded would be either regular, medium, large or extra large, and the categories are ordered by size. Regular < medium < large < extra large. Therefore the data are qualitative (categorical) and ordinal (ordered categories).

2. D.

The heights of the five students were recorded and the mean calculated. The sample mean was then used to infer that the mean height of all students at the university is 170cm.

3. E.

The labelling used along the x-axis in graph A. is inappropriate. This method should only be used to label a bar graph.

The labelling used along the x-axis in graph B. is incorrect. The upper limit of each class has been plotted at the midpoint of each column.

Graphs C. and D. are bar charts as there are gaps between the columns.

Graph E. is correct as it is a histogram with the midpoint of each class plotted at the midpoint of each column.

4. D.

A. is incorrect. If the distribution is skewed to the left, the mean will be influenced by some unusually small values. These small values cause the mean to underestimate the centre and hence the mean < median.

B. is incorrect. If the distribution is skewed to the right, the mean will be influenced by some unusually large values. These large values cause the mean to overestimate the centre and hence the mean > median.

C. is incorrect for the same reason as A.

D. is correct as when we have a perfectly symmetric and unimodal distribution, mean = median = mode.

E. is incorrect. When a distribution is bimodal and symmetric, it has two modes. Therefore the mode cannot be equal to the mean and the median as these are each single values.
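
The relationships between the mean and the median described in A. to D. can be seen on small data sets. The three lists below are hypothetical, invented purely for illustration. The helper `centre` is an illustrative name, not something from the exam.

```python
import statistics

# Hypothetical data sets, invented purely for illustration.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]        # one unusually large value
left_skewed = [1, 17, 17, 18, 18, 18, 19, 19, 20]  # one unusually small value
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]            # unimodal and symmetric

def centre(data):
    """Return (mean, median, mode) of a data set."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

for name, data in [("right", right_skewed), ("left", left_skewed),
                   ("symmetric", symmetric)]:
    print(name, centre(data))
```

In the right-skewed list the mean exceeds the median, in the left-skewed list it falls below the median, and in the symmetric unimodal list all three measures coincide.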

5. E.

Categorical data can only be represented graphically by either a pie chart or a bar chart. Consequently a grouped bar chart would be the best type of chart from those suggested.

6. A.

[Diagram: standard normal curve with 0 and 1.8 marked on the z-axis]

7. D.

Events which are mutually exclusive have no elements in common.

Therefore

For questions 8 and 9, we must first determine the type of distribution we have.

Ask yourself the following questions when trying to determine what type of probability distribution we have here.

Does the question say that the number of computer malfunctions is normally distributed? No

Does the question include information on the mean and standard deviation number of malfunctions? No

Therefore the number of computer malfunctions is not normally distributed.

Can only two things happen? No

Are there a fixed number of trials? No

Does the probability of success remain constant from trial to trial? No

Therefore the number of computer malfunctions does not follow a binomial distribution.

Does the question provide an average rate, i.e. the average number of computer malfunctions within a given time? Yes

"an average rate of 3 malfunctions per month"

Therefore the number of computer malfunctions is Poisson distributed where:

X = no. malfunctions = 0, 1, 2, 3, . . .

$\lambda$ = 3 malfunctions per month

Wherever possible, Poisson probabilities should be determined from the Poisson tables in the Appendix at the rear of the text. We use Excel for determining Poisson probabilities not found in these tables.

8. B.

9. C.

Here we have been asked to find the probability of fewer than four malfunctions in the next two months.

Therefore $\lambda$ = 3 × 2 = 6 malfunctions per 2 months
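
As a check on the table lookup, the probability for question 9 can be computed directly from the Poisson formula once the rate has been rescaled to two months. This is a minimal sketch; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

rate_per_month = 3
months = 2
lam = rate_per_month * months   # 6 malfunctions per 2 months

# "Less than four" means X = 0, 1, 2 or 3.
p_less_than_four = sum(poisson_pmf(k, lam) for k in range(4))

print(round(p_less_than_four, 4))
```

The sum comes to about 0.1512, which should match the value obtained by adding the four table entries for a mean of 6.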

10. A.

In order to solve this problem we must draw a diagram to represent the question, i.e.

[Diagram: standard normal curve with -c, 0 and c marked on the z-axis]

We now look in the body of the z tables for a probability as close to 0.4332 as we can find and read off the corresponding z value.

Therefore c = 1.52

11. A.

We do not have so we must find this first.

Therefore

12. D.

If the conclusion is to reject $H_0$, we reject the null in favour of the alternative hypothesis. This means that the home owner is incorrect and the house is worth more than $250 000.

13. C.

When testing a hypothesis, the decision rule takes the form

'reject $H_0$ if the p-value < $\alpha$'

Therefore if the p-value = 0.02 we would reject $H_0$ for all values of $\alpha$ which are greater than 0.02.
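
The decision rule can be applied mechanically to several significance levels. The sketch below uses the p-value of 0.02 from the question; the particular levels tried and the function name `reject_null` are chosen here only for illustration.

```python
p_value = 0.02

def reject_null(p_value: float, alpha: float) -> bool:
    """Decision rule: reject the null hypothesis when p-value < alpha."""
    return p_value < alpha

# A few commonly used significance levels, chosen here for illustration.
decisions = {alpha: reject_null(p_value, alpha)
             for alpha in (0.01, 0.02, 0.05, 0.10)}

print(decisions)
```

Only the levels strictly greater than 0.02 lead to rejection, so the null hypothesis is rejected at 0.05 and 0.10 but not at 0.01 or 0.02.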

14. C.

We are testing a hypothesis about the mean. We do not know $\sigma$ and are estimating it by s. Therefore, we must use the t test statistic.

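15. B.

Since the alternative hypothesis is two-sided ($\neq$), we have a 2-tailed test with two areas of rejection.

Therefore the decision rule would be

reject $H_0$ if $|t| > 2.009$, i.e. if t < -2.009 or t > 2.009

[Diagram: t distribution with rejection regions below -2.009 and above 2.009]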

16. E.

The standard error is a measure of the variability of the data points about the fitted regression line. Therefore, if all the data points fitted perfectly along the regression line, the standard error would be zero.

17. E.

All of the statistics and procedures in responses A. through D. are used to determine whether a linear regression model is appropriate.

18. B.

When the correlation coefficient is one, then $r^2 = 1$. Now $r^2$ measures the proportion of variation in y that is explained by the regression model. Therefore when $r^2 = 1$, 100% of the variation in y is explained by the regression model, and hence there is no unexplained variation.

19. C.

The least squares regression line takes the form

$\hat{y} = b_0 + b_1 x$

where $b_0$ is the y intercept and $b_1$ is the slope coefficient.

Therefore in the regression line $\hat{y} = 5 - 2x$ the y-intercept is 5 and the slope is -2. Since the slope is negative, this implies that the relationship between x and y is negative.
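
A quick way to see the negative relationship is to evaluate the fitted line at a few x values. This is a minimal sketch; the x values and the function name `predict` are arbitrary choices for illustration.

```python
b0, b1 = 5, -2   # y-intercept and slope read from the regression line

def predict(x: float) -> float:
    """Fitted value from the line y-hat = b0 + b1 * x."""
    return b0 + b1 * x

# Arbitrary x values, chosen only to show the direction of the relationship.
fitted = [predict(x) for x in (0, 1, 2, 3)]

print(fitted)
```

The fitted value is 5 at x = 0 and falls by 2 for each unit increase in x.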

20. E.

The higher the magnitude of the correlation coefficient, the stronger the correlation. The sign has no effect on the strength of the relationship; it simply indicates whether the relationship between the two variables is negative or positive. Therefore both 0.7 and -0.75 indicate stronger correlations than 0.65.
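
The comparison can be sketched in Python by ranking the three coefficients on their absolute values. The variable names are illustrative only.

```python
coefficients = [0.65, 0.7, -0.75]

# Strength depends on the magnitude |r|; the sign only gives the direction.
ranked = sorted(coefficients, key=abs, reverse=True)
stronger_than_065 = [r for r in coefficients if abs(r) > abs(0.65)]

print(ranked, stronger_than_065)
```

Ranked by magnitude, -0.75 is the strongest correlation, followed by 0.7 and then 0.65.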