4/27/00 252y0032 ECO252 QBA2 Name: key

THIRD HOUR EXAM Hour of Class Registered (Circle)

April 19, 2000 MWF TR 10 12 12:30 2:00


I. (10+ points) Do all the following:

1. Hand in your computer printouts for problems 1, 2 and 3. (5 points – 3 point penalty for not handing in)

2. Do not do the following unless you handed in at least two outputs.

On the next few pages there are problems very much like the ones you did.

a. An oil price report reveals that the mean national price for gas was $1.46 last week. A random sample of gas stations (data is "gaspr2") in Southeastern Pennsylvania reveals that the mean price was $1.44. Using the tests with p-values in Minitab Problem 1 below, can we conclude that the price in Southeastern Pennsylvania is lower than it is nationwide? Do not answer without citing which of the tests and p-values you used. (2)

b. The regression in Problem 2 relates sales of a product ('sales1') to shelf space ('shsp') in a random sample of grocery stores. Identify the slope of the equation and explain whether it is significant at the 2% significance level. (2) Add a regression line to the graph. (1) Do a 95% confidence interval for the slope. (2)

Remember: As I have said innumerable times, your null hypothesis is usually 'no significance.'

The rule on p-value:

If the p-value is less than the significance level (alpha) reject the null hypothesis; if the p-value is greater than or equal to the significance level, do not reject the null hypothesis.

Since a p-value is a probability, there is no reason in the universe why you would compare it with anything but a probability. (The significance level is also a probability.)

See next few pages for the answer to question 2.

4/24/00 252y0032

Problem1:

MTB > RETR 'C:\MINITAB\2X0031-3.MTW'.

Retrieving worksheet from file: C:\MINITAB\2X0031-3.MTW

Worksheet was saved on 4/11/2000

MTB > ttest mu=1.46 'gaspr2'

T-Test of the Mean

Test of mu = 1.46000 vs mu not = 1.46000

Variable N Mean StDev SE Mean T P-Value

gaspr2 35 1.44473 0.04962 0.00839 -1.82 0.077

MTB > ttest mu=1.46 'gaspr2';

SUBC> alt =1.

T-Test of the Mean

Test of mu = 1.46000 vs mu > 1.46000

Variable N Mean StDev SE Mean T P-Value

gaspr2 35 1.44473 0.04962 0.00839 -1.82 0.96

MTB > ttest mu=1.46 'gaspr2';

SUBC> alt=-1.

T-Test of the Mean*********

Test of mu = 1.46000 vs mu < 1.46000

Variable N Mean StDev SE Mean T P-Value

gaspr2 35 1.44473 0.04962 0.00839 -1.82 0.039

MTB > Stop.

Solution to 2a: The only part of the printout that is relevant is the starred section, which tests $H_0: \mu \ge 1.46$ against $H_1: \mu < 1.46$. Since the p-value is .039, which is below .05, we reject the null hypothesis at the 5% significance level (but not at the 1% level, since .039 is above .01). If we use the 5% significance level, we conclude that the mean Southeastern PA price is below the mean nationwide price. Testing $H_1: \mu \ne 1.46$ is not the same as testing $H_1: \mu < 1.46$.
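As a rough check on the printout, the standard error and t-ratio can be recomputed from the summary statistics Minitab reports. The sketch below copies the p-values from the printout rather than recomputing them, and the helper name `reject` is only illustrative.

```python
import math

# Summary statistics copied from the Minitab printout for 'gaspr2'.
n, mean, stdev, mu0 = 35, 1.44473, 0.04962, 1.46

se_mean = stdev / math.sqrt(n)    # standard error of the mean
t_stat = (mean - mu0) / se_mean   # t-ratio, same in all three tests

# p-values as printed by Minitab (not recomputed here).
p_two_sided, p_lower = 0.077, 0.039


def reject(p_value, alpha):
    """Reject H0 only when the p-value is below the significance level."""
    return p_value < alpha


print(round(se_mean, 5), round(t_stat, 2))
print(reject(p_lower, 0.05), reject(p_lower, 0.01), reject(p_two_sided, 0.05))
```

The lower-tail test rejects at 5% but not at 1%, while the two-sided test does not reject at 5%, which is why the choice of alternative hypothesis matters here.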

Problem 2:

Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\2X0031-2.MTW'.
Retrieving worksheet from file: C:\MINITAB\2X0031-2.MTW
Worksheet was saved on 4/13/2000
MTB > print 'sales1''shsp'
Data Display

Row sales1 shsp

1 2.6 5

2 3.2 5

3 2.4 5

4 2.9 10

5 3.4 10

6 3.6 10
7 3.3 15
8 3.7 15
9 3.8 15
10 3.6 20

11 4.0 20

12 4.3 20


MTB > regress 'sales1' on 1 'shsp''resid''pred'

Regression Analysis

The regression equation is

sales1 = 2.40 + 0.0800 shsp

Predictor Coef Stdev t-ratio p

Constant 2.4000 0.2280 10.52 0.000

shsp 0.08000 0.01665 4.80 0.000

s = 0.3225 R-sq = 69.8% R-sq(adj) = 66.7%

Analysis of Variance

SOURCE DF SS MS F p

Regression 1 2.4000 2.4000 23.08 0.000

Error 10 1.0400 0.1040

Total 11 3.4400

MTB > plot 'pred'*'shsp'

MTB > plot 'sales1'*'shsp'

MTB > plot 'pred'*'shsp' 'sales1'*'shsp';

SUBC> symbol;

SUBC> type 3 1;

SUBC> color 8 9;

SUBC> overlay.

MTB > Save 'C:\MINITAB\2X0031-2.MTW';

SUBC> Replace.

Saving worksheet in file: C:\MINITAB\2X0031-2.MTW

* NOTE * Existing file replaced.

MTB >

Answer to 2b: (i) According to the printout, the slope of the regression equation is $b_1 = 0.0800$. (Its standard deviation is $s_{b_1} = 0.01665$.) If we assume a significance level of 2% and test $H_0: \beta_1 = 0$, we find a p-value of 0.000, which is below .02 (=2%), and reject the null hypothesis. We thus conclude that the slope is significant. The p-value from the ANOVA has the same effect. (ii) Just connect the x's. (iii) From the outline: $\beta_1 = b_1 \pm t_{\alpha/2}^{(n-2)} s_{b_1}$. Here $t_{.025}^{(10)} = 2.228$, so $\beta_1 = 0.0800 \pm 2.228(0.01665) = 0.0800 \pm 0.0371$.
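The slope, its standard error and the 95% interval can be reproduced from the twelve data points in the printout. This is a minimal sketch. The critical value 2.228 is the usual t-table entry for 10 degrees of freedom and is entered by hand, and all variable names are illustrative.

```python
import math

# Data copied from the Minitab 'Data Display' for Problem 2.
shsp = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
sales1 = [2.6, 3.2, 2.4, 2.9, 3.4, 3.6, 3.3, 3.7, 3.8, 3.6, 4.0, 4.3]
n = len(shsp)

x_bar = sum(shsp) / n
y_bar = sum(sales1) / n
ss_x = sum((x - x_bar) ** 2 for x in shsp)
ss_y = sum((y - y_bar) ** 2 for y in sales1)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(shsp, sales1))

b1 = s_xy / ss_x                  # slope
b0 = y_bar - b1 * x_bar           # intercept
sse = ss_y - b1 * s_xy            # error sum of squares
s_e = math.sqrt(sse / (n - 2))    # standard error of the estimate
s_b1 = s_e / math.sqrt(ss_x)      # standard deviation of the slope
t_ratio = b1 / s_b1

T_025_10 = 2.228                  # t table, 10 df, two-sided 95% (entered by hand)
half_width = T_025_10 * s_b1
ci = (b1 - half_width, b1 + half_width)

print(round(b0, 4), round(b1, 4), round(s_b1, 5), round(t_ratio, 2))
print(tuple(round(v, 4) for v in ci))
```

Because the interval does not contain zero, it agrees with the significance test on the slope.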


Problem 3:

Worksheet size: 100000 cells

MTB > RETR 'C:\MINITAB\2X0031-1.MTW'.

Retrieving worksheet from file: C:\MINITAB\2X0031-1.MTW

Worksheet was saved on 4/13/2000

MTB > print 'br st' 'op''mach'

Data Display

Row br st op mach

1 116 1 1

2 116 1 1

3 120 1 1

4 112 1 2

5 109 1 2

6 115 1 2

7 110 1 3

8 111 1 3

9 108 1 3

10 119 2 1

11 116 2 1

12 116 2 1

13 106 2 2

14 103 2 2

15 107 2 2

16 111 2 3

17 114 2 3

18 115 2 3

19 110 3 1

20 111 3 1

21 107 3 1

22 100 3 2

23 102 3 2

24 100 3 2

25 104 3 3

26 103 3 3

27 106 3 3

28 113 4 1

29 116 4 1

30 112 4 1

31 106 4 2

32 108 4 2

33 108 4 2

34 109 4 3

35 112 4 3

36 111 4 3

MTB > table'op''mach'


Tabulated Statistics

ROWS: op COLUMNS: mach

1 2 3 ALL

1 3 3 3 9

2 3 3 3 9

3 3 3 3 9

4 3 3 3 9

ALL 12 12 12 36

CELL CONTENTS --

COUNT

MTB > table 'op''mach';

SUBC> data 'br st'.

Tabulated Statistics

ROWS: op COLUMNS: mach

1 2 3

1 116.00 112.00 110.00

116.00 109.00 111.00

120.00 115.00 108.00

2 119.00 106.00 111.00

116.00 103.00 114.00

116.00 107.00 115.00

3 110.00 100.00 104.00

111.00 102.00 103.00

107.00 100.00 106.00

4 113.00 106.00 109.00

116.00 108.00 112.00

112.00 108.00 111.00

CELL CONTENTS --

br st:DATA


MTB > table 'op''mach';

SUBC> mean 'br st'.

Tabulated Statistics

ROWS: op COLUMNS: mach

1 2 3 ALL

1 117.33 112.00 109.67 113.00

2 117.00 105.33 113.33 111.89

3 109.33 100.67 104.33 104.78

4 113.67 107.33 110.67 110.56

ALL 114.33 106.33 109.50 110.06

CELL CONTENTS --

br st:MEAN

MTB > twoway 'br st''op''mach'

Two-way Analysis of Variance

Analysis of Variance for br st

Source DF SS MS

op 3 361.22 120.41

mach 2 389.56 194.78

Interaction 6 90.44 15.07

Error 24 88.67 3.69

Total 35 929.89

MTB > Save 'C:\MINITAB\2X0031-1.MTW';

SUBC> Replace.

Saving worksheet in file: C:\MINITAB\2X0031-1.MTW

* NOTE * Existing file replaced.

MTB >


III. Do at least 4 of the following 6 Problems (at least 10 each) (or do sections adding to at least 40 points - Anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where applicable.

1. In Problem 3 in the computer output above we are looking at the effect of operator ('op') and machine ('mach') on the breaking strength ('br st') of a product.

a. Complete the table in the ANOVA by computing all the Fs and finding the relevant Fs for comparison on your F table. Is there a significant difference between mean breaking strength for the machines at the 1% significance level? Show what numbers brought you to your conclusion. (4)

b. Test for significant interaction – explain your conclusion. Use a 99% confidence level. (1)

c. Do a 99% confidence interval for the mean of machine 2. (2)

d. Do a 99% confidence interval for the difference between the means for machines 2 and 3 that is

(i) Valid when used alone. (1)

(ii) Valid when used with other possible differences between means. (2)

e. I have not asked you any questions about operators. If I were not interested in the effect of operators, why would I have done a 2-way analysis of variance instead of a 1-way ANOVA? (1)

f. In a study of the time necessary to repair a VCR, 5 brands (Factor A), 3 service centers (Factor B) and two types (standard or deluxe) of product (Factor C) are distinguished. There are four measurements (replications) per cell. Generate an ANOVA table showing all possible interactions, using the following data. SSA = 59, SSB = 950, SSC = 38, SSAB = 405, SSAC = 15, SSBC = 20, SSW = 245, SST = 1900. Using a 5% significance level, which of the differences and interactions are significant? (7)

Solution: a. The printout says (with an F column added):

Analysis of Variance for br st

Source DF SS MS F

op 3 361.22 120.41 32.63

mach 2 389.56 194.78 52.79

Interaction 6 90.44 15.07 4.08

Error 24 88.67 3.69

Total 35 929.89

For the 'mach' line the null hypothesis is 'machine means equal.' We have computed $F = \frac{194.78}{3.69} = 52.79$. Because this F is larger than $F_{.01}^{(2,24)} = 5.61$ from the F table, we reject the null hypothesis and conclude that there is a significant difference between the mean breaking strengths for the machines.

b. For the interaction the null hypothesis is 'no interaction', and $F = \frac{15.07}{3.69} = 4.08$ is larger than $F_{.01}^{(6,24)} = 3.67$, so we reject the null hypothesis and conclude that there is significant interaction.
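The sums of squares in the Minitab table can be rebuilt from the 36 observations, which is a useful check on the SS column. The sketch below assumes a balanced design with 3 replications per cell, as shown in the count table. The dictionary layout and names are illustrative only.

```python
# br st values copied from the printout: data[op][mach] -> 3 replications.
data = {
    1: {1: [116, 116, 120], 2: [112, 109, 115], 3: [110, 111, 108]},
    2: {1: [119, 116, 116], 2: [106, 103, 107], 3: [111, 114, 115]},
    3: {1: [110, 111, 107], 2: [100, 102, 100], 3: [104, 103, 106]},
    4: {1: [113, 116, 112], 2: [106, 108, 108], 3: [109, 112, 111]},
}
ops, machs, reps = sorted(data), sorted(data[1]), 3


def mean(values):
    return sum(values) / len(values)


all_vals = [v for o in ops for m in machs for v in data[o][m]]
grand = mean(all_vals)
op_means = [mean([v for m in machs for v in data[o][m]]) for o in ops]
mach_means = [mean([v for o in ops for v in data[o][m]]) for m in machs]

ss_total = sum((v - grand) ** 2 for v in all_vals)
ss_op = len(machs) * reps * sum((mu - grand) ** 2 for mu in op_means)
ss_mach = len(ops) * reps * sum((mu - grand) ** 2 for mu in mach_means)
ss_error = sum((v - mean(data[o][m])) ** 2
               for o in ops for m in machs for v in data[o][m])
ss_inter = ss_total - ss_op - ss_mach - ss_error

df_op, df_mach = len(ops) - 1, len(machs) - 1
df_inter = df_op * df_mach
df_error = len(ops) * len(machs) * (reps - 1)
ms_error = ss_error / df_error

f_op = (ss_op / df_op) / ms_error
f_mach = (ss_mach / df_mach) / ms_error
f_inter = (ss_inter / df_inter) / ms_error

for name, ss_val in [("op", ss_op), ("mach", ss_mach),
                     ("Interaction", ss_inter), ("Error", ss_error)]:
    print(name, round(ss_val, 2))
print(round(f_op, 2), round(f_mach, 2), round(f_inter, 2))
```

The F ratios computed this way use the unrounded error mean square, so they can differ in the second decimal place from ratios formed with the rounded 3.69.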

c. The table of means in the printout says:

ROWS: op COLUMNS: mach

1 2 3 ALL

1 117.33 112.00 109.67 113.00

2 117.00 105.33 113.33 111.89

3 109.33 100.67 104.33 104.78

4 113.67 107.33 110.67 110.56

ALL 114.33 106.33 109.50 110.06

CELL CONTENTS --

br st:MEAN


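From this we conclude that $\bar{x}_{\cdot 2} = 106.33$ and $\bar{x}_{\cdot 3} = 109.50$. Note that each machine (column) mean is based on $PR = 3(4) = 12$ observations.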

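The outline says:

i. A Single Confidence Interval

If we desire a single interval we use the formula for a Bonferroni Confidence Interval below with $m = 1$.

ii. Scheffé Confidence Interval

For column means, use $\mu_{\cdot 1} - \mu_{\cdot 2} = (\bar{x}_{\cdot 1} - \bar{x}_{\cdot 2}) \pm \sqrt{(C-1) F_{\alpha}^{(C-1,\, RC(P-1))}} \sqrt{\frac{2\,MSW}{PR}}$.

iii. Bonferroni Confidence Interval

Use for column means $\mu_{\cdot 1} - \mu_{\cdot 2} = (\bar{x}_{\cdot 1} - \bar{x}_{\cdot 2}) \pm t_{\alpha/(2m)}^{(RC(P-1))} \sqrt{\frac{2\,MSW}{PR}}$.

Here $R$ and $C$ are the numbers of rows and columns, $P$ is the number of measurements per cell and $m$ is the number of intervals.

You were also told to use the formulas without the second mean if you wanted intervals for one mean. (See problems.) The ANOVA table tells us that the error mean square term has 24 degrees of freedom so that $t_{.005}^{(24)} = 2.797$. Under these circumstances, the Bonferroni formula becomes $\mu_{\cdot 2} = \bar{x}_{\cdot 2} \pm t_{.005}^{(24)} \sqrt{\frac{MSW}{PR}} = 106.33 \pm 2.797 \sqrt{\frac{3.69}{12}} = 106.33 \pm 1.55$.

d. (i) Using the advice above, the Bonferroni formula becomes $\mu_{\cdot 2} - \mu_{\cdot 3} = (106.33 - 109.50) \pm 2.797 \sqrt{\frac{2(3.69)}{12}} = -3.17 \pm 2.19$.

(ii) From the F table, $F_{.01}^{(2,24)} = 5.61$. So the Scheffé interval becomes $\mu_{\cdot 2} - \mu_{\cdot 3} = -3.17 \pm \sqrt{2(5.61)} \sqrt{\frac{2(3.69)}{12}} = -3.17 \pm 2.63$.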
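The three 99% intervals for the machine means can be sketched numerically as follows. The means and error mean square come from the printout. The critical values 2.797 (t, 24 df, .005 in each tail) and 5.61 (F with 2 and 24 df at the 1% level) are the usual table entries, entered by hand as assumptions. Variable names are illustrative.

```python
import math

msw = 3.69                       # error mean square from the ANOVA table
obs_per_machine = 12             # 4 operators x 3 replications
n_machines = 3
mean_2, mean_3 = 106.33, 109.50  # machine means from the table of means

T_005_24 = 2.797                 # t table value (assumed, entered by hand)
F_01_2_24 = 5.61                 # F table value (assumed, entered by hand)

# c. Single interval for the mean of machine 2.
half_single = T_005_24 * math.sqrt(msw / obs_per_machine)

# d(i). Interval for the difference, valid when used alone.
diff = mean_2 - mean_3
se_diff = math.sqrt(2 * msw / obs_per_machine)
half_alone = T_005_24 * se_diff

# d(ii). Scheffe interval, valid together with other differences.
half_scheffe = math.sqrt((n_machines - 1) * F_01_2_24) * se_diff

print(f"machine 2: {mean_2} +/- {half_single:.2f}")
print(f"alone:     {diff:.2f} +/- {half_alone:.2f}")
print(f"Scheffe:   {diff:.2f} +/- {half_scheffe:.2f}")
```

Both difference intervals exclude zero, and the Scheffé interval is wider, which is the price of being valid simultaneously with other comparisons.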

e. The columns are still not random samples and it is important to eliminate any effects due to the operators before we look at the machines.

f. There are $5(3)(2) = 30$ groups with 4 observations in each group, so $n = 120$.

's' means 'significant difference' ($H_0$ rejected), 'ns' means 'no significant difference' ($H_0$ accepted).

Source / SS / DF / MS / F /
Factor A / 59 / 4 / 14.75 / 5.42 / s
Factor B / 950 / 2 / 475 / 174.49 / s
Factor C / 38 / 1 / 38 / 13.96 / s
Interaction AB / 405 / 8 / 50.625 / 18.59 / s
Interaction AC / 15 / 4 / 3.75 / 1.38 / ns
Interaction BC / 20 / 2 / 10 / 3.67 / s
Interaction ABC / 168 / 8 / 21 / 7.71 / s
Error (Within) / 245 / 90 / 2.722222
Total / 1900 / 119
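The degrees of freedom, mean squares and F ratios in this table follow mechanically from the given sums of squares. A minimal sketch, with hypothetical names, that also recovers the three-way interaction SS by subtraction:

```python
# Levels of each factor and replications per cell, from the problem statement.
levels = {"A": 5, "B": 3, "C": 2}
reps = 4

# Sums of squares given in the problem.
ss = {"A": 59, "B": 950, "C": 38, "AB": 405, "AC": 15, "BC": 20}
ss_within, ss_total = 245, 1900

# The three-way interaction is whatever is left over.
ss["ABC"] = ss_total - ss_within - sum(ss.values())


def degrees_of_freedom(term):
    """Product of (levels - 1) over the factors named in the term."""
    df = 1
    for factor in term:
        df *= levels[factor] - 1
    return df


n_cells = levels["A"] * levels["B"] * levels["C"]
df_within = n_cells * (reps - 1)
ms_within = ss_within / df_within

table = {}
for term, ss_term in ss.items():
    df = degrees_of_freedom(term)
    ms = ss_term / df
    table[term] = (ss_term, df, ms, ms / ms_within)

for term, (ss_term, df, ms, f_ratio) in table.items():
    print(f"{term:4s} {ss_term:5d} {df:3d} {ms:9.3f} {f_ratio:8.2f}")
print(f"W    {ss_within:5d} {df_within:3d} {ms_within:9.4f}")
```

Each F ratio is then compared with the 5% F table value for its own numerator degrees of freedom and 90 denominator degrees of freedom.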


2. A hospital administrator wishes to compare the distribution of unoccupied beds in three hospitals. Because she believes that the distributions are quite badly skewed she does not use analysis of variance but instead ranks the entire sample and uses a test based on these ranks. Data is below. Numbers in boldface were added in the process of solving the problem.

Day / Hospital 1 Beds / Rank / Hospital 2 Beds / Rank / Hospital 3 Beds / Rank / Ranks within day
1 / 6 / 5 / 34 / 25 / 13 / 9.5 / 1 3 2
2 / 38 / 27 / 28 / 19 / 35 / 26 / 3 1 2
3 / 3 / 2 / 42 / 30 / 19 / 15 / 1 3 2
4 / 17 / 13 / 13 / 9.5 / 4 / 3 / 3 2 1
5 / 11 / 8 / 40 / 29 / 29 / 20 / 1 3 2
6 / 30 / 21 / 31 / 22 / 0 / 1 / 2 3 1
7 / 15 / 11 / 9 / 7 / 7 / 6 / 3 2 1
8 / 16 / 12 / 32 / 23 / 33 / 24 / 1 2 3
9 / 25 / 17 / 39 / 28 / 18 / 14 / 2 3 1
10 / 5 / 4 / 27 / 18 / 24 / 16 / 1 3 2
Sum of ranks / / 120 / / 210.5 / / 134.5 / 18 25 17

a. On the basis of the rank sums test the hypothesis that the median number of beds unoccupied differs for the three hospitals. (5)

b. A statistician claims that her method is inappropriate because she has ignored the fact that each row corresponds to a specific day, even if the days were chosen randomly. Rerank the data to account for the fact that it is cross-classified and repeat the analysis. (9)

Solution:

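a. $H_0$: Columns come from same distribution or medians equal.

Sums of ranks are added above in boldface. To check the ranking, note that the sum of the three rank sums is 120 + 210.5 + 134.5 = 465, and that the sum of the first $n = 30$ numbers is $\frac{n(n+1)}{2} = \frac{30(31)}{2} = 465$.

Now, compute the Kruskal-Wallis statistic $H = \frac{12}{n(n+1)} \sum_i \frac{SR_i^2}{n_i} - 3(n+1) = \frac{12}{30(31)} \left[ \frac{120^2 + 210.5^2 + 134.5^2}{10} \right] - 3(31) = 99.097 - 93 = 6.097$. If we try to look up this result in the (10,10,10) section of the Kruskal-Wallis table (Table 9), we find that the problem is too large for the table. Thus we must use the chi-squared table with 2 degrees of freedom. Since $H = 6.097 > \chi_{.05}^{2(2)} = 5.9915$, reject $H_0$.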
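The overall ranking and the Kruskal-Wallis statistic can be checked with a short script. This sketch averages the ranks of tied values, applies no tie correction to H, and takes 5.9915 as the 5% chi-squared table value for 2 degrees of freedom. Function and variable names are illustrative.

```python
# Unoccupied beds for the three hospitals over the ten days.
beds = [
    [6, 38, 3, 17, 11, 30, 15, 16, 25, 5],
    [34, 28, 42, 13, 40, 31, 9, 32, 39, 27],
    [13, 35, 19, 4, 29, 0, 7, 33, 18, 24],
]


def average_ranks(values):
    """Rank from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


pooled = [x for column in beds for x in column]
ranks = average_ranks(pooled)
n = len(pooled)

# Split the pooled ranks back into the three hospitals.
rank_sums, start = [], 0
for column in beds:
    rank_sums.append(sum(ranks[start:start + len(column)]))
    start += len(column)

h_stat = (12 / (n * (n + 1))
          * sum(sr ** 2 / len(col) for sr, col in zip(rank_sums, beds))
          - 3 * (n + 1))

CHI2_05_2DF = 5.9915                    # chi-squared table value, entered by hand
print(rank_sums, round(h_stat, 3), h_stat > CHI2_05_2DF)
```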


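b. $H_0$: Columns come from same distribution or medians equal.

Items were ranked within the rows. To check the ranking, note that the sum of the three rank sums is 18 + 25 + 17 = 60, and that the sum of the numbers in a row is $\frac{c(c+1)}{2} = \frac{3(4)}{2} = 6$. However, there are $r = 10$ rows, so we must multiply the expression by $r$. So we have $\sum SR_i = r\frac{c(c+1)}{2} = 10(6) = 60$.

Now compute the Friedman statistic $\chi_F^2 = \frac{12}{rc(c+1)} \sum_i SR_i^2 - 3r(c+1) = \frac{12}{10(3)(4)} \left( 18^2 + 25^2 + 17^2 \right) - 3(10)(4) = 123.8 - 120 = 3.8$. If we try to find the place on the Friedman Table (Table 8) for 3 columns and 10 rows, we find that the problem is too big for the table. We thus compare our Friedman statistic with the $\chi^2$ distribution, with $df = c - 1 = 2$, where $c$ is the number of columns. Since $\chi_F^2 = 3.8$ is not larger than $\chi_{.05}^{2(2)} = 5.9915$, do not reject the null hypothesis.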
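The within-day ranking and the Friedman statistic can be sketched in a few lines. The code assumes there are no ties within a day, which holds for these data, and takes 5.9915 as the 5% chi-squared table value for 2 degrees of freedom. Names are illustrative.

```python
# One row per day: beds unoccupied in hospitals 1, 2 and 3.
days = [
    (6, 34, 13), (38, 28, 35), (3, 42, 19), (17, 13, 4), (11, 40, 29),
    (30, 31, 0), (15, 9, 7), (16, 32, 33), (25, 39, 18), (5, 27, 24),
]
r, c = len(days), len(days[0])


def ranks_within_row(row):
    """Rank the values in one row from 1 (smallest) upward; assumes no ties."""
    ordered = sorted(row)
    return [ordered.index(x) + 1 for x in row]


row_ranks = [ranks_within_row(row) for row in days]
rank_sums = [sum(rk[j] for rk in row_ranks) for j in range(c)]

friedman = (12 / (r * c * (c + 1)) * sum(sr ** 2 for sr in rank_sums)
            - 3 * r * (c + 1))

CHI2_05_2DF = 5.9915                    # chi-squared table value, entered by hand
print(rank_sums, round(friedman, 2), friedman > CHI2_05_2DF)
```

Blocking by day removes the day-to-day variation from the comparison, and with it the apparent difference between hospitals is no longer significant at the 5% level.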


3. A law enforcement agency has come up with three methods of publicizing burglary-prevention methods to use during summer vacations. Three communities are selected and the numbers of burglaries are listed below. You may do this as a 2-way or 1-way ANOVA.

a. In each case state your null hypothesis about the publicity methods and the results (8)

b. Do a confidence interval for the mean for method 3 based on your computations in a. (2)

c. If you did two-way ANOVA explain what else it showed and why this is important if our major interest is what publicity method to use. (5)

/ Method 1 / Method 2 / Method 3
Community 1 / 15 / 13 / 8
Community 2 / 17 / 25 / 13
Community 3 / 27 / 24 / 17

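Solution: a) 2-way ANOVA. 's' indicates that the null hypothesis is rejected.

Community / Method 1 / Method 2 / Method 3 / Sum / $n_{i\cdot}$ / $\bar{x}_{i\cdot}$ / SS / $\bar{x}_{i\cdot}^2$
1 / 15 / 13 / 8 / 36 / 3 / 12.000 / 458 / 144.000
2 / 17 / 25 / 13 / 55 / 3 / 18.333 / 1083 / 336.111
3 / 27 / 24 / 17 / 68 / 3 / 22.667 / 1594 / 513.778
Sum / 59 / + 62 / + 38 / = 159 / 9 / 17.667 / 3135 / 993.889
$n_{\cdot j}$ / 3 / + 3 / + 3 / = 9
$\bar{x}_{\cdot j}$ / 19.6667 / 20.6667 / 12.6667 / 17.667
SS / 1243 / + 1370 / + 522 / = 3135
$\bar{x}_{\cdot j}^2$ / 386.778 / + 427.111 / + 160.444 / = 974.333

Note that $\bar{x} = 17.667$ is not a sum, but is $\frac{\sum x}{n} = \frac{159}{9}$.

$SST = \sum x^2 - n\bar{x}^2 = 3135 - 9(17.6667)^2 = 326.000$, $SSR = C \sum \bar{x}_{i\cdot}^2 - n\bar{x}^2 = 3(993.889) - 2809 = 172.667$ and $SSC = R \sum \bar{x}_{\cdot j}^2 - n\bar{x}^2 = 3(974.333) - 2809 = 114.000$. $SSC$ is the between-groups sum of squares in a one way ANOVA.

($F_{.05}^{(2,4)} = 6.94$)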

Source / SS / DF / MS / F / Result / $H_0$
Rows (Community) / 172.667 / 2 / 86.333 / 8.78 / s / Row means equal
Columns (Method) / 114.000 / 2 / 57.000 / 5.80 / ns / Column means equal
Within (Error) / 39.333 / 4 / 9.8333
Total / 326.000 / 8

One way ANOVA (Not blocked by community) ($F_{.05}^{(2,6)} = 5.14$)

Source / SS / DF / MS / F / Result / $H_0$
Columns (Method) / 114.000 / 2 / 57.000 / 1.61 / ns / Column means equal
Within (Error) / 212.000 / 6 / 35.333
Total / 326.000 / 8

Because the computed $F$ is less than the table $F$, there is no significant difference between the methods.
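Both ANOVA tables can be rebuilt from the nine burglary counts. This is a minimal sketch with illustrative names. It treats communities as rows (blocks) and methods as columns, with one observation per cell.

```python
# Rows are communities 1-3, columns are publicity methods 1-3.
burglaries = [
    [15, 13, 8],
    [17, 25, 13],
    [27, 24, 17],
]
r, c = len(burglaries), len(burglaries[0])
n = r * c

grand = sum(map(sum, burglaries)) / n
row_means = [sum(row) / c for row in burglaries]
col_means = [sum(row[j] for row in burglaries) / r for j in range(c)]

sst = sum((x - grand) ** 2 for row in burglaries for x in row)
ssr = c * sum((m - grand) ** 2 for m in row_means)   # communities
ssc = r * sum((m - grand) ** 2 for m in col_means)   # methods

# Two-way ANOVA: community variation is removed from the error term.
ssw_two_way = sst - ssr - ssc
msw_two_way = ssw_two_way / ((r - 1) * (c - 1))
f_rows = (ssr / (r - 1)) / msw_two_way
f_cols_two_way = (ssc / (c - 1)) / msw_two_way

# One-way ANOVA: community variation stays in the error term.
ssw_one_way = sst - ssc
msw_one_way = ssw_one_way / (n - c)
f_cols_one_way = (ssc / (c - 1)) / msw_one_way

print(round(sst, 3), round(ssr, 3), round(ssc, 3), round(ssw_two_way, 3))
print(round(f_rows, 2), round(f_cols_two_way, 2), round(f_cols_one_way, 2))
```

The method F ratio rises from about 1.61 to about 5.80 once community differences are taken out of the error term, although it still falls short of the 5% table value.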


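b) Using the formulas in the outline for one mean (and $m = 1$). In any case $\bar{x}_{\cdot 3} = 12.6667$. For 1-way ANOVA, $\mu_3 = \bar{x}_{\cdot 3} \pm t_{\alpha/2}^{(n-m)} s \sqrt{\frac{1}{n_3}}$, where $s^2 = MSW = 35.333$ with 6 degrees of freedom and $n_3 = 3$.

For column means in 2-way ANOVA with $P = 1$, use $\mu_{\cdot 3} = \bar{x}_{\cdot 3} \pm t_{\alpha/2}^{((R-1)(C-1))} \sqrt{\frac{MSW}{PR}}$, where $MSW = 9.8333$ with 4 degrees of freedom and $PR = 3$.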
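To put numbers on these two intervals, the sketch below assumes a 95% confidence level, which the problem does not state. The t values 2.447 (6 df) and 2.776 (4 df) are the usual two-sided 95% table entries, entered by hand. The error mean squares are copied from the two ANOVA tables.

```python
import math

mean_method_3 = 38 / 3       # column mean for method 3
n_3 = 3                      # observations on method 3

# ASSUMPTION: 95% confidence level; t table values entered by hand.
T_025_6 = 2.447              # one-way ANOVA, 6 error df
T_025_4 = 2.776              # two-way ANOVA, 4 error df

msw_one_way = 35.333         # from the one-way ANOVA table
msw_two_way = 9.8333         # from the two-way ANOVA table

half_one_way = T_025_6 * math.sqrt(msw_one_way / n_3)
half_two_way = T_025_4 * math.sqrt(msw_two_way / n_3)

print(f"1-way: {mean_method_3:.3f} +/- {half_one_way:.2f}")
print(f"2-way: {mean_method_3:.3f} +/- {half_two_way:.2f}")
```

Under these assumptions the two-way interval is noticeably narrower, because blocking by community shrinks the error mean square even though it costs two degrees of freedom.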

c) We do find that there is a significant difference between the community means. This could cause us to believe that there is more difference between the methods than there really is.


4. a. To test the response of an 800 number, I make 20 attempts to reach the number, continuing to call until I get through. I hypothesize that the results follow a Poisson distribution with a mean of 2.0. Test this – data are below. (6)

Number of Unsuccessful Tries before success / Observed Frequency
0 / 2
1 / 2
2 / 4
3 / 6
4 / 4
5 / 2
Total / 20

b. A Service station reports the following sales:

12.78 8.89 10.09 10.64 16.98 26.99 (With a sample mean of 14.395 and a sample standard deviation of 6.795.) Do these data follow a normal distribution? (7)

Solution: a. Because of the small size of the sample, this is best done by the Kolmogorov-Smirnov method.

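$x$ / $O$ / $O/n$ / $F_o$ / $F_e$ / $D = |F_o - F_e|$
0 / 2 / .10 / .10 / .13534 / .03534
1 / 2 / .10 / .20 / .40601 / .20601
2 / 4 / .20 / .40 / .67668 / .27668
3 / 6 / .30 / .70 / .85713 / .15713
4 / 4 / .20 / .90 / .94735 / .04735
5 / 2 / .10 / 1.00 / .98344 / .01656
Total / 20 / 1.00

The maximum difference is $D = .27668$, which must be checked against the Kolmogorov-Smirnov table for $n = 20$. According to the table, for $n = 20$, the critical value for $\alpha = .05$ is .294. Since $.27668$ is less than .294, do not reject the null hypothesis.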
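The cumulative columns of the Kolmogorov-Smirnov table can be generated directly from the Poisson formula. A minimal sketch with illustrative names. The critical value .294 is the table entry quoted in the solution.

```python
import math

observed = {0: 2, 1: 2, 2: 4, 3: 6, 4: 4, 5: 2}   # tries -> frequency
poisson_mean = 2.0
n = sum(observed.values())


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


rows = []
cum_observed = 0.0
cum_expected = 0.0
for x in sorted(observed):
    cum_observed += observed[x] / n               # Fo, cumulative O/n
    cum_expected += poisson_pmf(x, poisson_mean)  # Fe, Poisson cdf
    rows.append((x, cum_observed, cum_expected, abs(cum_observed - cum_expected)))

max_d = max(d for _, _, _, d in rows)
KS_CRITICAL_05_N20 = 0.294                        # K-S table value from the solution

for x, fo, fe, d in rows:
    print(x, f"{fo:.2f}", f"{fe:.5f}", f"{d:.5f}")
print(round(max_d, 5), max_d > KS_CRITICAL_05_N20)
```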

Of course, most of you wanted to do this problem by the chi-squared method.

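$x$ / $O$ / $p$ / $E = np$
0 / 2 / .1353 / 2.706*
1 / 2 / .2707 / 5.414
2 / 4 / .2707 / 5.414
3 / 6 / .1804 / 3.608*
4 / 4 / .0902 / 1.804*
5+ / 2 / .0527 / 1.054*
Total / 20 / 1.0000 / 20.000

* Indicates groups with $E < 5$ that had to be merged.

After merging:

$x$ / $O$ / $E$ / $O^2/E$
0-1 / 4 / 8.120 / 1.9704
2 / 4 / 5.414 / 2.9553
3+ / 12 / 6.466 / 22.2703
Total / 20 / 20.000 / 27.1960

$\chi^2 = \sum \frac{O^2}{E} - n = 27.1960 - 20.0000 = 7.1960$

Since $\chi_{.05}^{2(2)} = 5.9915$ is less than our computed $\chi^2 = 7.1960$, reject the null hypothesis.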
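The chi-squared version can be sketched the same way. The grouping into 0-1, 2 and 3+ is written out explicitly rather than found automatically, and 5.9915 is the 5% chi-squared table value for 2 degrees of freedom. Names are illustrative.

```python
import math

poisson_mean = 2.0
n = 20


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Merged classes chosen so that every expected frequency is at least 5.
p_0_1 = poisson_pmf(0, poisson_mean) + poisson_pmf(1, poisson_mean)
p_2 = poisson_pmf(2, poisson_mean)
p_3_plus = 1.0 - p_0_1 - p_2

observed = [2 + 2, 4, 6 + 4 + 2]          # classes 0-1, 2, 3+
expected = [n * p for p in (p_0_1, p_2, p_3_plus)]

# Shortcut formula: chi-squared = sum(O^2 / E) - n.
chi_sq = sum(o ** 2 / e for o, e in zip(observed, expected)) - n

CHI2_05_2DF = 5.9915                      # chi-squared table value, entered by hand
print([round(e, 3) for e in expected], round(chi_sq, 3), chi_sq > CHI2_05_2DF)
```

The statistic computed from unrounded probabilities differs slightly in the third decimal place from a hand computation that rounds the Poisson probabilities to four places.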


b. $H_0$: Normal; $H_1$: Not Normal

Because the population mean and standard deviation are unknown and the sample is small, this is a Lilliefors problem. The $x$ values must be in order. From the data we find that $\bar{x} = 14.395$ and $s = 6.795$, so $t = \frac{x - \bar{x}}{s}$. This is often called $z$, as in a K-S problem, and $F_e$ is the cumulative Normal probability $P(z \le t)$.
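The Lilliefors computation described above can be sketched as follows. The cumulative Normal probability is computed with `math.erf`. The comparison uses $F_o = i/n$ at each ordered point, in the same style as the K-S table for part a, which is an assumption about the intended layout. The resulting maximum difference would then be compared with a Lilliefors table for $n = 6$.

```python
import math

sales = [12.78, 8.89, 10.09, 10.64, 16.98, 26.99]
x_bar, s = 14.395, 6.795          # sample mean and standard deviation as given
n = len(sales)


def normal_cdf(z):
    """Cumulative standard Normal probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


rows = []
for i, x in enumerate(sorted(sales), start=1):
    t = (x - x_bar) / s            # standardized value
    f_expected = normal_cdf(t)     # Fe
    f_observed = i / n             # Fo
    rows.append((x, t, f_observed, f_expected, abs(f_observed - f_expected)))

max_d = max(row[4] for row in rows)

for x, t, fo, fe, d in rows:
    print(f"{x:6.2f} {t:7.3f} {fo:.4f} {fe:.4f} {d:.4f}")
print(round(max_d, 3))
```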


5. A firm wishes to explain the volume of office sales (in millions of dollars) over a year as a function of number of customers (in thousands). It collects data for a random sample of nine offices as follows:

Observation cust sales xy (The xy column is added here.)

1 10.1 13.4 135.34

2 10.3 13.0 133.90

3 6.1 8.6 52.46

4 8.4 11.2 94.08

5 8.9 11.4 101.46

6 9.9 12.1 119.79

7 9.7 11.4 110.58

8 6.1 8.7 53.07

9 6.3 9.2 57.96

858.64

For your convenience the following values are given:

a. Compute the regression equation to predict sales. (6)
b. On the basis of your regression, how many millions of dollars of sales do you expect from an office that has nine thousand customers. (1)

c. Compute $R^2$. (4)

Solution:

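Spare Parts Computation: $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$, $S_{xy} = \sum xy - n\bar{x}\bar{y}$, $SS_x = \sum x^2 - n\bar{x}^2$ and $SS_y = \sum y^2 - n\bar{y}^2$.

a. $b_1 = \frac{S_{xy}}{SS_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$, so the regression equation is $\hat{Y} = b_0 + b_1 x$.

b. $\hat{Y} = b_0 + b_1 x$ becomes $\hat{Y} = b_0 + b_1 (9)$, and is the number of millions of dollars that we forecast.

c. $R^2 = \frac{b_1 S_{xy}}{SS_y}$ or $R^2 = \frac{S_{xy}^2}{SS_x SS_y}$

($0 \le R^2 \le 1$ always!)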
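The regression quantities can be computed directly from the nine observations. This is a sketch with illustrative names. It works from the raw data rather than from the sums given on the exam, so the last digits may differ slightly from a hand computation based on those sums.

```python
# Customers (thousands) and sales (millions of dollars) for the nine offices.
cust = [10.1, 10.3, 6.1, 8.4, 8.9, 9.9, 9.7, 6.1, 6.3]
sales = [13.4, 13.0, 8.6, 11.2, 11.4, 12.1, 11.4, 8.7, 9.2]
n = len(cust)

x_bar = sum(cust) / n
y_bar = sum(sales) / n

# "Spare parts": corrected sums of squares and cross products.
s_xy = sum(x * y for x, y in zip(cust, sales)) - n * x_bar * y_bar
ss_x = sum(x * x for x in cust) - n * x_bar ** 2
ss_y = sum(y * y for y in sales) - n * y_bar ** 2

b1 = s_xy / ss_x                  # slope
b0 = y_bar - b1 * x_bar           # intercept
forecast_9 = b0 + b1 * 9          # part b: office with nine thousand customers
r_squared = b1 * s_xy / ss_y      # part c

print(round(b0, 3), round(b1, 4), round(forecast_9, 2), round(r_squared, 3))
```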


6. Continuing the previous problem.

a. Compute $s_e$. (3)
b. Compute $s_{b_1}$ and do a significance test on $b_1$. (4)

c. Do a confidence interval for (3)

d. Using your SST etc., put together the ANOVA table (6)

Solution:

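a. $s_e^2 = \frac{SSE}{n-2} = \frac{SS_y - b_1 S_{xy}}{n-2}$ or $s_e^2 = \frac{SS_y - b_1^2 SS_x}{n-2}$ or $s_e^2 = \frac{(1 - R^2) SS_y}{n-2}$, so $s_e = \sqrt{s_e^2}$. (The computer has 0.4763) ($s_e$ is always positive!)

b. $s_{b_1}^2 = \frac{s_e^2}{SS_x}$ and $t = \frac{b_1 - 0}{s_{b_1}}$. Since this is not between $\pm t_{\alpha/2}^{(n-2)}$ (with $n - 2 = 7$ degrees of freedom), reject $H_0: \beta_1 = 0$ and thus conclude that $b_1$ is significant.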

c. .

So

d. From the previous page and above, $SST = SS_y$, $SSR = b_1 S_{xy}$ and $SSE = SST - SSR$, and $H_0$ is that there is no relation between $x$ and $y$.

Source / SS / DF / MS / F /
Regression / 24.0327 / 1 / 24.0327 / 107.34 / s
Error (Within) / 1.5673 / 7 / 0.2239
Total / 25.6000 / 8

Since the table $F$ is less than the computed $F$, reject $H_0$.
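The standard error, the t test on the slope and the ANOVA sums of squares can all be checked from the raw data. This is a sketch with illustrative names. The value 2.365 is the usual two-sided 5% t-table entry for 7 degrees of freedom and is an assumption about the significance level. Because the script starts from the raw observations, its sums of squares differ slightly in the second decimal place from the rounded values in the table.

```python
import math

cust = [10.1, 10.3, 6.1, 8.4, 8.9, 9.9, 9.7, 6.1, 6.3]
sales = [13.4, 13.0, 8.6, 11.2, 11.4, 12.1, 11.4, 8.7, 9.2]
n = len(cust)

x_bar = sum(cust) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cust, sales))
ss_x = sum((x - x_bar) ** 2 for x in cust)
ss_y = sum((y - y_bar) ** 2 for y in sales)
b1 = s_xy / ss_x

# ANOVA decomposition: SST = SSR + SSE.
sst = ss_y
ssr = b1 * s_xy
sse = sst - ssr
mse = sse / (n - 2)

s_e = math.sqrt(mse)              # standard error of the estimate
s_b1 = s_e / math.sqrt(ss_x)      # standard deviation of the slope
t_ratio = b1 / s_b1
f_ratio = ssr / mse               # equals t_ratio squared in simple regression

T_025_7 = 2.365                   # t table value, 7 df (assumed 5% level)
print(round(s_e, 4), round(s_b1, 4), round(t_ratio, 2), round(f_ratio, 1))
print(abs(t_ratio) > T_025_7)
```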
