252y0421 3/29/04 ECO252 QBA2    Name:

(Page layout view!) SECOND HOUR EXAM    Hour of Class Registered

March 24, 2004    Circle one: 10am 11am

Show your work! Make diagrams! The exam is normed on 50 points. Answers without reasons are not usually acceptable. Note that some equations have been squashed by Word. If you click on them or print them out, they should be fine.

I. (8 points) Do all the following.

- If you are not using the supplement table, make sure that I know it.

1. P(-3.32 < z < -0.22)

Make a diagram! For P(-3.32 < z < -0.22), draw a Normal curve with zero in the middle. Shade the area between -3.32 and -0.22 and note that it is all on one side of the mean, so that you subtract the area between -0.22 and zero from the area between -3.32 and zero: P(-3.32 < z < -0.22) = .4995 - .0871 = .4124.

2. P(-0.11 < z < 1.56)

Make a diagram! For P(-0.11 < z < 1.56), draw a Normal curve with zero in the middle. Shade the area between -0.11 and 1.56 and note that it is on both sides of the mean, so that you add the area between -0.11 and zero to the area between zero and 1.56: P(-0.11 < z < 1.56) = .0438 + .4406 = .4844.

3. P(z < 1.56) (the cumulative probability up to 1.56)

Make a diagram! For P(z < 1.56), draw a Normal curve with zero in the middle. Shade the entire area below 1.56 and note that it is on both sides of the mean, so that you add the area below zero to the area between zero and 1.56: P(z < 1.56) = .5000 + .4406 = .9406.

4. z_.115. First we must find z_.115. This is the value of z that has P(z > z_.115) = .1150, or P(0 < z < z_.115) = .5000 - .1150 = .3850. On the Normal table, the closest we can find to .3850 is P(0 < z < 1.20) = .3849. So z_.115 is approximately 1.20.

Check: P(z > 1.20) = .5000 - .3849 = .1151, close to .1150.

Make a diagram! Draw a Normal curve with zero in the middle. Divide the area above zero into 11.5% above z_.115 and 50% - 11.5% = 38.5% below z_.115.
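(A quick check of the four table lookups in Part I. Python's math.erf is not part of the exam, Minitab or the Normal table is what you would actually use, but the sketch below confirms the table arithmetic.)

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# 1. Area between -3.32 and -0.22 (both on the same side of the mean)
p1 = phi(-0.22) - phi(-3.32)

# 2. Area between -0.11 and 1.56 (on both sides of the mean)
p2 = phi(1.56) - phi(-0.11)

# 3. Entire area below 1.56 (cumulative probability)
p3 = phi(1.56)

print(round(p1, 4), round(p2, 4), round(p3, 4))
```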


II. (24+ points) Do all the following. (2 points each unless noted otherwise.)

Questions 1-6 refer to Exhibit 1.

Exhibit 1: (Edited from problems presented by Samuel Wathen for Lind et al. 2002, with one small error)

The first two columns below are evaluations of a sample of five products, first at FIFO and, second, at LIFO. Based on the results shown, is LIFO more effective than FIFO in keeping the value of inventory lower? (Assume that the underlying distribution is Normal.)

Product / x1 (FIFO) / x2 (LIFO) / d = x1 - x2 / x1² / x2² / d²

1 / 225 / 221 / 4 / 50625 / 48841 / 16

2 / 119 / 100 / 19 / 14161 / 10000 / 361

3 / 100 / 113 / -13 / 10000 / 12769 / 169

4 / 212 / 200 / 12 / 44944 / 40000 / 144

5 / 248 / 245 / 3 / 61504 / 60025 / 9

Total / 904 / 879 / 25 / 181234 / 171635 / 699

Minitab calculated the following sample statistics:

Variable N Mean Median StDev SE Mean

x1 5 180.8 212.0 66.7 29.8

x2 5 175.8 200.0 ____ ____

d 5 5.00 4.00 11.98 5.36

1. Compute the standard deviation of x2. You may use any of the material given in Exhibit 1.

Solution: s2² = (Σx2² - n(x̄2)²)/(n - 1) = (171635 - 5(175.8)²)/4 = 17106.8/4 = 4276.7, so s2 = √4276.7 = 65.40.

Note: If you wasted your time using the definitional formula, see the end of Part II.
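(A sketch, outside the exam itself, of the shortcut-formula computation using only the sums in Exhibit 1:)

```python
from math import sqrt

# Sums taken from Exhibit 1: n = 5, sum of x2 = 879, sum of x2 squared = 171635
n, sum_x2, sum_x2sq = 5, 879, 171635

xbar2 = sum_x2 / n                              # 175.8
var2 = (sum_x2sq - n * xbar2 ** 2) / (n - 1)    # shortcut (computational) formula
s2 = sqrt(var2)

print(round(xbar2, 1), round(var2, 1), round(s2, 2))
```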

2. What is the null hypothesis?

a)

b)

c)*

d)

e)None of the above.

Explanation: The question seems to be asking if μ2 < μ1. This is the same as μd > 0 (where d = x1 - x2), which cannot be a null hypothesis because it does not contain an equality. It must be an alternate hypothesis, so that the null hypothesis is its opposite, μd ≤ 0.



3. What is (are) the degrees of freedom?

a)* 4. Since each line represents one product, this is paired data.

b)5

c)8

d)15

e)10

Explanation: Since each line represents one product, this is paired data. Our variable is thus really d = x1 - x2, which contains only 5 numbers.

4. If you used the 5% level of significance, what is the appropriate t or z value from the tables?

a)

b)

c)

d)

e)

f)

g)*None of the above.

This is a one-sided 5% test, and the alternate hypothesis, μd > 0, is the same as μ2 < μ1, where d = x1 - x2. The appropriate table value is t_.05(4) = 2.132, and the calculated t must be larger than 2.132.

5. What is the value of your calculated t or z?

a)* t = 5.00/5.36 = 0.933 (the mean of d divided by its standard error)

b)

c)

d)

e) None of the above.
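(The paired-t arithmetic can be checked from the d column of Exhibit 1; this Python sketch is not part of the exam:)

```python
from math import sqrt

# d = x1 - x2 for the five products in Exhibit 1
d = [4, 19, -13, 12, 3]
n = len(d)

dbar = sum(d) / n                                      # 5.00
s_d = sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))  # 11.98
se = s_d / sqrt(n)                                     # 5.36
t = dbar / se                                          # about 0.933

print(round(dbar, 2), round(s_d, 2), round(se, 2), round(t, 3))
```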


6. What is your decision at the 5% significance level?

a)Do not reject the null hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.

b)Reject the null hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.

c)Reject the alternative hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.

d)*Do not reject the null hypothesis and conclude that LIFO is not more effective in keeping the value of the inventory lower.

e)None of the above.

7. Find an approximate p-value for the null hypothesis that you tested. Please explain your result!

Solution: We need P(t > 0.933) for 4 degrees of freedom. If we check the t table line for 4 degrees of freedom, we find t_.25(4) = 0.741 and t_.20(4) = 0.941, which means that P(t > 0.741) = .25 and P(t > 0.941) = .20. Since 0.933 lies between these values, it must be true that .20 < p-value < .25. There is some flexibility here depending on your answer to Question 5.

8. A manufacturer revises a manufacturing process and finds a fall in the defect rate of

a) The fall in defects is statistically significant because 5% is larger than 4%.

b) The fall in defects is statistically significant because the confidence interval supports H0.

c)* The fall in defects is not statistically significant because 4% is smaller than 5%.

d) The fall in defects is not statistically significant because the confidence interval would lead us to reject H0.

Questions 9-11 refer to Exhibit 2.

Exhibit 2:(Edited from problems presented by Samuel Wathen) A group of adults and a group of children both tried Wow! Cereal. Was there a difference in how adults and kids responded to it?

Group / Number in Sample / Number who liked it / Fraction of sample who liked it
Adults (Group 1) / 250 / 187 / .748
Children (Group 2) / 100 / 66 / .660
Total / 350 / 253 / .723


9. What is the null hypothesis H0?

a) There is no reasonable way to define a mean here.

b)

c)

d)* p1 = p2. There is no reason to assume that one fraction is larger than the other before we look at the data. Of course b), c), e) and f) do not contain equalities and cannot be null hypotheses.

e)

f)

g) None of the above.

10. Calculate a 99% confidence interval for the difference between the fraction of adults and the fraction of kids that liked Wow! Explain why you reject or do not reject the null hypothesis. (4)

Solution: The outline says Δp = p1 - p2 ± z s_Δp, where s_Δp = √(p1q1/n1 + p2q2/n2). Here p1 - p2 = .748 - .660 = .088 and s_Δp = √(.748(.252)/250 + .660(.340)/100) = √(.000754 + .002244) = √.002998 = .0548. The tables say z_.005 = 2.576, so the interval is .088 ± 2.576(.0548), or .088 ± .141 = (-.053, .229). Since this interval includes zero, do not reject the null hypothesis.
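(A Python check of this interval, using statistics.NormalDist for the table value; not part of the original solution:)

```python
from math import sqrt
from statistics import NormalDist

# Exhibit 2: 187 of 250 adults and 66 of 100 children liked the cereal
n1, a1 = 250, 187
n2, a2 = 100, 66

p1, p2 = a1 / n1, a2 / n2            # .748 and .660
diff = p1 - p2                       # .088
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = NormalDist().inv_cdf(1 - 0.01 / 2)   # z for .005 in the upper tail, about 2.576
lo, hi = diff - z * se, diff + z * se

print(round(diff, 3), round(se, 4), round(lo, 3), round(hi, 3))
```

The interval straddles zero, which is why the null hypothesis of equal proportions is not rejected at the 1% level.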

11. (Extra Credit) Calculate a 77% confidence interval for the difference between the fraction of adults and the fraction of kids that liked Wow! (2)

On page 1, we found z_.115 = 1.20. Since the confidence level is 77%, the significance level is 23% and half the significance level is 11.5%. The interval is thus .088 ± 1.20(.0548) = .088 ± .066, or (.022, .154).

Questions 12-14 refer to Exhibit 3.

Exhibit 3:(Edited from problems presented by Samuel Wathen)

A survey was taken among a randomly selected 100 property owners to see if opinion about a street widening was related to the distance of front footage they owned. The results appear below.

Opinion
Front-Footage / For / Undecided / Against
Under 45 feet / 12 / 4 / 4
45-120 feet / 35 / 5 / 30
Over 120 feet / 3 / 2 / 5
12. How many degrees of freedom are there?

a)2

b)3

c)*4

d)5

e)9

f)None of the above.


13. What is the value of the expected frequency E for people in favor of the project who own less than 45 feet of frontage?

a)*10

b)12

c)35

d)50

e)None of the above.

Front-Footage / For / Undecided / Against / Total / Proportion
Under 45 feet / 12 / 4 / 4 / 20 / .20
45-120 feet / 35 / 5 / 30 / 70 / .70
Over 120 feet / 3 / 2 / 5 / 10 / .10
Total / 50 / 11 / 39 / 100
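(The expected frequencies under independence come from E = (row total)(column total)/n; a quick sketch, not part of the exam:)

```python
# Observed counts from Exhibit 3 (rows: frontage class; columns: For/Undecided/Against)
observed = [
    [12, 4, 4],
    [35, 5, 30],
    [3, 2, 5],
]

n = sum(sum(row) for row in observed)               # 100
row_totals = [sum(row) for row in observed]         # 20, 70, 10
col_totals = [sum(col) for col in zip(*observed)]   # 50, 11, 39

# Expected count under independence: E = (row total)(column total)/n
expected = [[r * c / n for c in col_totals] for r in row_totals]

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (3-1)(3-1) = 4

print(expected[0][0], df)
```

The upper-left cell, people in favor with under 45 feet of frontage, gets E = 20(50)/100 = 10, which is answer a) in Question 13.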
14. Assume that the computed value of chi-square is 8.5.

a) What is the null hypothesis that you are testing? (2)

Solution: Opinions and Front footage are independent.

b) What is your conclusion? Why? (3)

Solution: We do not reject the null hypothesis at the 5% level because 8.5 is below chi-square_.05(4) = 9.4877.

15. Turn in your computer output from computer problem 1 only, tucked inside this exam paper. (3 points - 2 point penalty for not handing this in.)
16. The following output is from a computer problem very much like the one you did to compare two sets of data. Two production processes are in use. I wish to compare numbers of defects in Process A and Process B to test the statement “The number of defects in process A is significantly lower than in process B.” Three tests are done. Assume that the underlying distribution is Normal. a) Which of the three tests should we use? b) What is the null hypothesis as we use it? c) Should we reject the null hypothesis? Why?

Test 1:

MTB > twosamplet 'A' 'B'

Two-Sample T-Test and CI: A, B

Two-sample T for A vs B

N Mean StDev SE Mean

A 90 220.5 34.7 3.7

B 110 300.5 82.7 7.9

Difference = mu A - mu B

Estimate for difference: -79.98

95% CI for difference: (-97.15, -62.81)

T-Test of difference = 0 (vs not =): T-Value = -9.20 P-Value = 0.000 DF = 152


Test 2:

MTB > twosamplet 'A' 'B';

SUBC> alter 1.

Two-Sample T-Test and CI: A, B

Two-sample T for A vs B

N Mean StDev SE Mean

A 90 220.5 34.7 3.7

B 110 300.5 82.7 7.9

Difference = mu A - mu B

Estimate for difference: -79.98

95% lower bound for difference: -94.36

T-Test of difference = 0 (vs >): T-Value = -9.20 P-Value = 1.000 DF = 152

Test 3:

MTB > Twosamplet 'A' 'B';

SUBC> alter -1.

Two-Sample T-Test and CI: A, B

Two-sample T for A vs B

N Mean StDev SE Mean

A 90 220.5 34.7 3.7

B 110 300.5 82.7 7.9

Difference = mu A - mu B

Estimate for difference: -79.98

95% upper bound for difference: -65.59

T-Test of difference = 0 (vs <): T-Value = -9.20 P-Value = 0.000 DF = 152

Solution: a), b) Test 3 is appropriate because it tests H0: μA - μB ≥ 0 against H1: μA - μB < 0, and H1 is equivalent to the statement that the number of defects in Process A is lower than in Process B.

c) Since the p-value is below any significance level we might use, we reject the null hypothesis.
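(The T-Value and DF in the Minitab output can be reproduced from the summary statistics with the unequal-variance, Satterthwaite, formulas; a Python sketch, not part of the exam:)

```python
from math import sqrt

# Summary statistics from the Minitab output
n_a, mean_a, sd_a = 90, 220.5, 34.7
n_b, mean_b, sd_b = 110, 300.5, 82.7

va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
t = (mean_a - mean_b) / sqrt(va + vb)

# Satterthwaite approximation for the degrees of freedom
df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))

print(round(t, 2), int(df))
```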

17. (Extra credit) My boss objects that he thinks that the variances are equal, so that I used the wrong test. I go back to the computer and do the following. (The null hypothesis is equal variances.) Was I right? Why?

MTB > %VarTest c3 c4;

SUBC> Unstacked.

Test for Equal Variances

F-Test (normal distribution)

Test Statistic: 0.176

P-Value : 0.000

Solution: I was right. The null hypothesis is equal variances, and the p-value is below any significance level that I would use, so we reject the null hypothesis of equal variances.


18. (Extra Credit) Now my beloved boss says that maybe the underlying distribution is not Normal. I go back to the computer and run the following. Process A results are in C3. Process B results are in C4. Remember that there are 90 data items for process A and 110 for process B. What are our hypotheses and results?

MTB > Stack c3 c4 c5;    This stacks the 2 sets of results together so they can be ranked.

SUBC> Subscripts c6;

SUBC> UseNames.

MTB > Rank c5 c7.    C7 now contains the ranks.

MTB > Unstack (c7);

SUBC> Subscripts c6;    Ranks for A are now in C7_A. Ranks for B are now in C7_B.

SUBC> After;

SUBC> VarNames.

MTB > sum c8

Sum of C7_A

Sum of C7_A = 6008.0

MTB > sum c9

Sum of C7_B

Sum of C7_B = 14092

Solution: We use the Wilcoxon-Mann-Whitney test for two independent samples to compare the medians.

According to the outline, for values of n1 and n2 that are too large for the tables, the rank sum W has the Normal distribution with mean μW = n1(n1 + n2 + 1)/2 and variance σW² = n1n2(n1 + n2 + 1)/12. If the significance level is 5% and the test is one-sided, we reject our null hypothesis if z lies below -1.645.

So n1 = 90, n2 = 110, and W = 6008 is the smaller of the rank sums.

μW = 90(201)/2 = 9045 and σW² = 90(110)(201)/12 = 165825, so σW = 407.22.

So z = (6008 - 9045)/407.22 = -7.458. Since the p-value would be P(z < -7.458), which is essentially zero, we would reject the null hypothesis of equal medians at any significance level.
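(A Python check of the normal approximation; not part of the original solution:)

```python
from math import sqrt

# Rank sums from the Minitab run: A (n1 = 90) has the smaller sum, 6008
n1, n2 = 90, 110
W = 6008.0

# Normal approximation for large samples
mu_W = n1 * (n1 + n2 + 1) / 2            # 9045
var_W = n1 * n2 * (n1 + n2 + 1) / 12     # 165825
z = (W - mu_W) / sqrt(var_W)

print(round(mu_W), round(var_W), round(z, 3))
```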


Questions 19-22 refer to Exhibit 4.

Exhibit 4:(Edited from problems presented by Samuel Wathen)

A professor asserts that she uses a Normal curve with a mean of 75 and a standard deviation of 10 to grade students. Last year’s grades are below. Test to see if the professor’s assertions are correct at the 99% confidence level.

Row / Grade / Interval / E / O / O²/E

1 / A / 90+ / 7.6820 / 15 / 29.2892

2 / B / 80-90 / 27.7955 / 20 / 14.3908

3 / C / 70-80 / 44.0450 / 40 / 36.3265

4 / D / 60-70 / 27.7955 / 30 / 32.3793

5 / F / Below 60 / 7.6820 / 10 / 13.0174

Total / / / 115.0000 / 115 / 125.4032

19. Show the calculations necessary to get the number that were expected to get B's.

Solution: P(80 ≤ x < 90) = P((80 - 75)/10 ≤ z < (90 - 75)/10) = P(0.5 ≤ z < 1.5) = .4332 - .1915 = .2417

and E = 115(.2417) = 27.7955.


20. What table value of chi-square would you use to test the professor's assertion?

Solution: There are 5 categories and no parameters were estimated from the data, so there are 5 - 1 = 4 degrees of freedom. At the 1% significance level, use chi-square_.01(4) = 13.2767.

21. What is the calculated value of chi-square?

Solution: chi-square = Σ(O²/E) - n = 125.4032 - 115 = 10.4032.

22. Explain your conclusion.

Solution: Since the calculated chi-square (10.4032) is smaller than the table chi-square (13.2767), do not reject the null hypothesis that the grades follow a Normal distribution with a mean of 75 and a standard deviation of 10.

Answer to Question 1 using the definitional formula (x̄2 = 175.8):

Row / x2 / x2 - x̄2 / (x2 - x̄2)²

1 / 221 / 45.2 / 2043.04

2 / 100 / -75.8 / 5745.64

3 / 113 / -62.8 / 3943.84

4 / 200 / 24.2 / 585.64

5 / 245 / 69.2 / 4788.64

Total / 879 / 0.0 / 17106.80

So s2² = 17106.80/4 = 4276.7 and s2 = 65.40.

Paired Samples / Independent Samples
Location - Normal distribution.
Compare means. / Method D4 / Methods D1- D3
Location - Distribution not Normal. Compare medians. / Method D5b / Method D5a
Proportions / Method D6
Variability - Normal distribution. Compare variances. / Method D7

252y0421 3/17/04 ECO252 QBA2

SECOND EXAM

March 24, 2004

TAKE HOME SECTION


Name: ______

Student Number: ______

III. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points)

1) Chi-squared and Related Tests (Bassett et al.) To personalize the data below, change the number of stations reporting 4 thunderstorms to the second-to-last digit of your student number. This will change the total number of stations reporting. For example, Seymour Butz's student number is 976500, so he will change the number of stations reporting 4 thunderstorms to zero and the total number of stations reporting will be 22 + 37 + 20 + 13 + 0 + 2 = 94.

a) 100 weather stations reported the following in August 2003:

Number of Thunderstorms / 0 / 1 / 2 / 3 / 4 / 5
Number of stations reporting thunderstorms / 22 / 37 / 20 / 13 / 6 / 2

In the region in question, the number of thunderstorms per month is believed to have a Poisson distribution with a mean of 1. Test to see if this is appropriate using a chi-squared method. For example, if 5 stations reported 2 thunderstorms and 5 stations reported 3 thunderstorms and there were only 10 stations, the total number of storms reported would be 5(2) + 5(3) = 25, and the average number of storms reported would be 25/10 = 2.5. (4)

b) Repeat the test using the Kolmogorov-Smirnov method. (3)

c) Find the average number of storms per station and use it to generate a Poisson table on Minitab. To do so, follow the example below, replacing 0.732 with your mean (a number like 1.723). Head column 1 (C1) k, column 2 P(k) and column 3 P(x le k), or something similar. (le stands for 'less than or equal to.') In column 1 place the numbers 0 through 10.

MTB > PDF c1 c2;

SUBC> Poisson 0.732.

MTB > CDF c1 c3;

SUBC> Poisson 0.732.

MTB > print c1 - c3

Data Display

Row k P(k) P(x le k)

1 0 0.480946 0.48095

2 1 0.352053 0.83300

3 2 0.128851 0.96185

4 3 0.031440 0.99329

5 4 0.005753 0.99904

6 5 0.000842 0.99989

7 6 0.000103 0.99999

8 7 0.000011 1.00000

9 8 0.000001 1.00000

10 9 0.000000 1.00000

11 10 0.000000 1.00000


This table tells us that, for a Poisson distribution with a mean of 0.732, P(x le 4) = 0.99904 and the probabilities for k = 5 through 10 are tiny. To keep the numbers correct, you could merge the data for k = 5 to 10 into a category of '5 or more storms.' Decide whether a chi-squared or K-S method is appropriate (Only one method is!) and test for a Poisson distribution with your mean, remembering that you estimated the mean from your data. (4)

d) (Extra Credit) Two dice were thrown 180 times with the results below. Test the hypothesis that the distribution follows the binomial distribution with and . (2)

Number of Sixes / 0 / 1 / 2
Frequency / 105 / 70 / 5

e) (Extra extra credit) Test the data in d) for a binomial distribution in general by using

(2)

Solution: I will use Seymour’s version here, and try to put the others into an appendix.

a) 100 weather stations reported the following in August 2003:

Number of Thunderstorms / 0 / 1 / 2 / 3 / 4 / 5
Number of stations reporting thunderstorms / 22 / 37 / 20 / 13 / 0 / 2

In the region in question, the number of thunderstorms per month is believed to have a Poisson distribution with a mean of 1. Test to see if this is appropriate using a chi-squared method. (4)

This is the data from the Supplementary Materials book.

k P(k) P(x le k)

0 0.367879 0.36788

1 0.367879 0.73576

2 0.183940 0.91970

3 0.061313 0.98101

4 0.015328 0.99634

5 0.003066 0.99941

6 0.000511 0.99992

7 0.000073 0.99999

8 0.000009 1.00000

So we need to put together a table of expected frequencies E. Note that the P(k) column adds to 1. So if we take P(k) and multiply it by 94, we get (34.5806, 34.5806, 17.2904, 5.7634, 1.4408, 0.2882, 0.0480, 0.0069, 0.0008). The last three, at least, are too small to use, so we combine the small cells to get the table below.

Row O E E-O (O-E)² (O-E)²/E O²/E

1 22 34.5806 12.5806 158.272 4.57690 13.9963

2 37 34.5806 -2.4194 5.853 0.16927 39.5886

3 20 17.2904 -2.7096 7.342 0.42464 23.1343

4 13 5.7634 -7.2366 52.368 9.08628 29.3229

5 2 1.7847 -0.2153 0.046 0.02597 2.2413

94 93.9997 -0.0003 14.2833 108.2831

Depending on which method you used, chi-square = Σ(O - E)²/E = 14.2833 or chi-square = Σ(O²/E) - n = 108.2831 - 94 = 14.2831. These are both above chi-square_.05(4) = 9.4877, so reject the null hypothesis.
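(A Python check of part a); the tiny difference in the last decimal comes from rounding in the merged cell. Not part of the original solution:)

```python
from math import exp, factorial

# Seymour's observed counts for 0, 1, 2, 3, and 4-or-more storms
observed = [22, 37, 20, 13, 2]
n = sum(observed)                        # 94

# Poisson(1) probabilities for k = 0..3, plus everything above 3 in one cell
p = [exp(-1.0) / factorial(k) for k in range(4)]
p.append(1.0 - sum(p))

expected = [n * pk for pk in p]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi_sq, 4))   # compare with chi-square_.05(4) = 9.4877
```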

b) Repeat the test using the Kolmogorov-Smirnov method. (3) First, take O, divide it by n = 94 and make the result into a cumulative distribution F_O.

252y0421 3/17/04

Row O O/n F_O

1 22 0.234043 0.23404

2 37 0.393617 0.62766

3 20 0.212766 0.84043

4 13 0.138298 0.97872

5 0 0.000000 0.97872

6 2 0.021277 1.00000

Copy the cumulative Poisson column, label it F_E, compute |F_O - F_E|, and find the maximum difference, which is .133837. According to the K-S table, this should be compared with 1.36/√94 = .1403. Because the maximum deviation is not above .1403, do not reject the null hypothesis.

Row F_O F_E |F_O - F_E|

1 0.23404 0.36788 0.133837

2 0.62766 0.73576 0.108100

3 0.84043 0.91970 0.079274

4 0.97872 0.98101 0.002287

5 0.97872 0.99634 0.017617

6 1.00000 0.99941 0.000590

7 0.99992

8 0.99999

9 1.00000
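(A Python check of the K-S computation; not part of the original solution:)

```python
from math import exp, factorial, sqrt

observed = [22, 37, 20, 13, 0, 2]       # stations reporting 0..5 storms
n = sum(observed)                       # 94

# Observed cumulative distribution
F_obs, running = [], 0.0
for o in observed:
    running += o / n
    F_obs.append(running)

# Poisson(1) cumulative distribution for k = 0..5
F_exp, running = [], 0.0
for k in range(6):
    running += exp(-1.0) / factorial(k)
    F_exp.append(running)

D = max(abs(fo - fe) for fo, fe in zip(F_obs, F_exp))
critical = 1.36 / sqrt(n)               # large-sample 5% critical value

print(round(D, 6), round(critical, 4))
```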

c) Find the average number of storms per station and use it to generate a Poisson table on Minitab. Decide whether a chi-squared or K-S method is appropriate (Only one method is!) and test for a Poisson distribution with your mean, remembering that you estimated the mean from your data. (4) We multiply the number of storms by the number of stations reporting and get 22(0) + 37(1) + 20(2) + 13(3) + 0(4) + 2(5) = 0 + 37 + 40 + 39 + 0 + 10 = 126 storms in total, so the mean is 126/94 = 1.34. We generate the part of the Poisson table that we need, multiply it by 94, and use a chi-square method. We compare our computed chi-square of 5.9364 to chi-square_.05(4) = 9.4877, and do not reject the null hypothesis that the distribution is Poisson. Note that we have lost a degree of freedom by computing the mean from the data, which is why we can't use the K-S method.

Row k P(k)

1 0 0.261846

2 1 0.350873

3 2 0.235085

4 3 0.105005

5 4 0.035177

6 5 0.009427

7 6 0.002105

8 7 0.000403

9 8 0.000068

10 9 0.000010

11 10 0.000001

Row O E E-O (O-E)² (O-E)²/E O²/E

1 22 24.6135 2.61349 6.8303 0.27750 19.6640

2 37 32.9821 -4.01792 16.1437 0.48947 41.5074

3 20 22.0980 2.09799 4.4016 0.19918 18.1012

4 13 9.8704 -3.12956 9.7942 0.99227 17.1218

5 0 3.3066 3.30660 10.9336 3.30660 0.0000

6 2 1.1293 -0.87070 0.7581 0.67132 3.5420

94 93.9999 -0.00009 5.93644 99.9364


d) (Extra Credit) Two dice were thrown 180 times with the results below. Test the hypothesis that the distribution follows the binomial distribution with and . (2)

Number of Sixes / 0 / 1 / 2
Frequency / 105 / 70 / 5

e) (Extra extra credit) Test the data in d) for a binomial distribution in general by using

I did these together. Since the total number of sixes was 1(70) + 2(5) = 80, the mean number of sixes per throw was 80/180 = .4444.

My table for .4444 could have been generated by the binomial formula with n = 2 and p = .4444, but I used

MTB > cdf c7 c10;

SUBC> binomial 2 .4444.

MTB > pdf c7 c11;

SUBC> binomial 2 .4444.

The table for .15 could have been done the same way or with the formula. I used the table in the Supplement and then took the difference between the cumulative numbers. I got the E column by multiplying by 180.

Row k F(k) (p = .15) P(k) (p = .15) F(k) (p = .4444) P(k) (p = .4444)

1 0 0.7225 0.7225 0.30869 0.308691

2 1 0.9775 0.2550 0.80251 0.493817

3 2 1.0000 0.0225 1.00000 0.197491

Only one method was needed in each of d) and e). If you used chi-squared, you should have gotten the following.

Row k O E E-O (O-E)² (O-E)²/E O²/E

1 0 105 130.05 25.05 627.503 4.8251 84.775

2 1 70 45.90 -24.10 580.810 12.6538 106.754

3 2 5 4.05 -0.95 0.903 0.2228 6.173

180 180.00 0.00 17.7017 197.702

Since chi-square_.05(2) = 5.9915 (3 cells - 1 = 2 degrees of freedom, because p was given) and our computed chi-square of 17.7017 is larger, we reject the null hypothesis.

The K-S method is probably easier. I got the following.

Row k O E F_O F_E |F_O - F_E| O/n

1 0 105 130.05 0.58333 0.7225 0.139167 0.583333

2 1 70 45.90 0.97222 0.9775 0.005278 0.388889

3 2 5 4.05 1.00000 1.0000 0.000000 0.027778

The maximum is .139167. According to the K-S table, this should be compared with 1.36/√180 = .1014. Because the maximum deviation is above .1014, reject the null hypothesis.

e) In this section, we have lost a degree of freedom because p was estimated from the data, so there are 3 - 1 - 1 = 1 degrees of freedom. Since chi-square_.05(1) = 3.8415 is way below our chi-square of 74.248, we reject the null hypothesis.

Row k O E E-O (O-E)² (O-E)²/E O²/E

1 0 105 55.5644 -49.4356 2443.87 43.9827 198.418

2 1 70 88.8871 18.8871 356.72 4.0132 55.126

3 2 5 35.5484 30.5484 933.21 26.2517 0.703

180 180.000 -0.0000 74.2476 254.248
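(A Python check of the part e) chi-square, following the document's choice of plugging p = 80/180 = .4444 into a binomial with n = 2; not part of the original solution:)

```python
# Part e): binomial fit with the success probability estimated from the data
observed = [105, 70, 5]                 # throws with 0, 1, 2 sixes
throws = sum(observed)                  # 180

p = (1 * 70 + 2 * 5) / throws           # 80/180 = 0.4444...
q = 1 - p

probs = [q * q, 2 * p * q, p * p]       # binomial(2, p) pmf
expected = [throws * pr for pr in probs]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = 3 - 1 - 1                          # one df lost for the estimated p

print(round(chi_sq, 3), df)   # compare with chi-square_.05(1) = 3.8415
```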


2) (Meyer and Krueger) WEFA compiled the following random samples of single-family home prices in the eastern and western parts of the US (in $ thousands). (Note – in this problem it is OK to use Excel or Minitab as a help – but you must fool me into believing that you did it by hand.)

Row City-E x1 City-W x2 City No.

1 Albany NY 108.607 Bakersfield CA 137.171 1

2 Allentown PA 85.250 Fresno CA 107.627 2

3 Baltimore MD 112.747 Orange C. CA 204.862 3

4 Bergen NJ 195.232 Portland OR 123.605 4

5 Boston MA 180.865 Riverside CA 123.836 5

6 Buffalo NY 83.122 Sacramento CA 120.232 6

7 Charlestown SC 92.840 San Diego CA 172.601 7

8 Charlotte NC 104.433 San Francisco CA 220.067 8

9 Greensboro NC 97.638 San Jose CA 224.828 9

10 Greenville SC 88.355 Seattle WA 147.854 10

11 Harrisburg PA 79.846 Stockton CA 98.440 11

12 Hartford CT 129.130 Tacoma WA 119.884 12

13 Middlesex NJ 169.540

14 Monmouth NJ 137.859

15 New Haven CT 134.856

16 New York NY 170.830

17 Newark NJ 187.128

18 Philadelphia PA 114.553

19 Raleigh/Durham NC 119.355

20 Rochester NY 85.043

21 Springfield MA 102.678

22 Syracuse NY 82.372

23 Washington DC 155.176

These are available on the website in Minitab. Minitab reports the following sample statistics.

Variable n Mean Median StDev

x1 23 122.50 112.75 37.20

x2 12 150.10 130.50 44.50

You may use the statistics given for x1, but personalize the data for Western cities as follows: Use the fourth digit of your student number to pick the first city to be eliminated and then eliminate the third city after that. (You may, if you wish, drop the last two digits of the prices in the Western Cities.)For example,Seymour Butz’s student number is 976500, so he will use the number 5 to eliminate cities 5 (Riverside) and 8 (San Francisco). If the fourth digit of your student number is zero, eliminate cities 10 and 1. You will thus have only 10 cities in your second sample.