4/25/03 252x0342 ECO252 QBA2 Name
FINAL EXAM Hour of Class Registered (Circle) May 7, 2003
I. (18 points) Do all the following. Note that answers without reasons receive no credit.
A researcher wishes to explain the selling price of a house in thousands on the basis of its assessed valuation, whether it was new, and the time period. New is 1 if the house is new construction, zero otherwise. The researcher assembles the following data for a random sample of 30 home sales. Use in this problem.
————— 4/25/2003 9:58:00 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE\My Documents\Drive D\MINITAB\2x0342-1.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE\My Documents\Drive D\MINITAB\2x0342-1.MTW
# Worksheet was saved on Fri Apr 25 2003
Results for: 2x0342-1.MTW
MTB > print c1 - c4
Data Display
Row Price Value New Time
1 69.00 66.28 0 1
2 115.50 86.31 0 2
3 100.80 84.78 1 2
4 96.90 79.74 1 3
5 72.00 65.54 0 4
6 61.90 59.93 0 4
7 97.00 79.98 1 4
8 87.50 75.22 0 5
9 96.90 81.88 1 5
10 81.50 72.94 0 5
11 69.34 60.80 0 6
12 97.90 81.61 1 6
13 96.00 79.11 0 7
14 92.00 77.96 0 9
15 94.10 78.17 1 10
16 101.90 80.24 1 10
17 109.50 85.88 1 10
18 88.65 74.03 0 11
19 93.00 75.27 0 11
20 83.00 74.31 0 11
21 106.70 84.36 0 12
22 97.90 77.90 1 12
23 97.30 79.85 1 12
24 90.50 74.92 0 12
25 95.90 79.07 1 12
26 113.90 85.61 0 13
27 94.50 76.50 1 14
28 86.50 72.78 0 14
29 91.50 72.43 0 17
30 93.75 76.64 0 17
1. Looking for a place to start, the researcher does individual regressions of price against the individual independent variables.
a. Explain why the researcher concludes from the regressions that valuation (‘value’) is the most important independent variable. Consider the values of R-squared and the significance tests on the slope of the equation. (2)
b. What kind of variable is ‘new’? Explain why the regression of ‘price’ against ‘new’ is equivalent to a test of the equality of 2 sample means, and what the conclusion would be. (2)
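As a rough illustration of the idea behind part b, the sketch below fits price on the 0/1 variable ‘new’ by ordinary least squares, using the Price and New columns from the data display above, and compares the coefficients with the two group means. The variable names are illustrative only and do not come from the exam.

```python
# Price and New columns copied from the Minitab data display (rows 1-30).
price = [69.00, 115.50, 100.80, 96.90, 72.00, 61.90, 97.00, 87.50, 96.90, 81.50,
         69.34, 97.90, 96.00, 92.00, 94.10, 101.90, 109.50, 88.65, 93.00, 83.00,
         106.70, 97.90, 97.30, 90.50, 95.90, 113.90, 94.50, 86.50, 91.50, 93.75]
new = [0, 0, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 1, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 1, 0, 0, 0]

n = len(price)
x_bar = sum(new) / n
y_bar = sum(price) / n

# Least-squares slope and intercept for price = b0 + b1 * new.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(new, price))
sxx = sum((x - x_bar) ** 2 for x in new)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Group means for old (new = 0) and new (new = 1) houses.
old_prices = [y for x, y in zip(new, price) if x == 0]
new_prices = [y for x, y in zip(new, price) if x == 1]
mean_old = sum(old_prices) / len(old_prices)
mean_new = sum(new_prices) / len(new_prices)

print(f"intercept = {intercept:.3f}, mean of old houses = {mean_old:.3f}")
print(f"slope = {slope:.3f}, difference in means = {mean_new - mean_old:.3f}")
```

The intercept reproduces the mean price of old houses and the slope reproduces the difference between the two means, so the t test on the slope is a test of whether that difference is zero.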
MTB > regress c1 1 c2
Regression Analysis: Price versus Value
The regression equation is
Price = - 44.2 + 1.78 Value
Predictor Coef SE Coef T P
Constant -44.172 7.346 -6.01 0.000
Value 1.78171 0.09546 18.66 0.000
S = 3.475 R-Sq = 92.6% R-Sq(adj) = 92.3%
Analysis of Variance
Source DF SS MS F P
Regression 1 4206.7 4206.7 348.37 0.000
Residual Error 28 338.1 12.1
Total 29 4544.8
Unusual Observations
Obs Value Price Fit SE Fit Residual St Resid
6 59.9 61.900 62.606 1.719 -0.706 -0.23 X
11 60.8 69.340 64.156 1.642 5.184 1.69 X
X denotes an observation whose X value gives it large influence.
MTB > regress c1 1 c3
Regression Analysis: Price versus New
The regression equation is
Price = 88.5 + 9.93 New
Predictor Coef SE Coef T P
Constant 88.458 2.759 32.07 0.000
New 9.926 4.362 2.28 0.031
S = 11.70 R-Sq = 15.6% R-Sq(adj) = 12.6%
Analysis of Variance
Source DF SS MS F P
Regression 1 709.3 709.3 5.18 0.031
Residual Error 28 3835.5 137.0
Total 29 4544.8
Unusual Observations
Obs New Price Fit SE Fit Residual St Resid
2 0.00 115.50 88.46 2.76 27.04 2.38R
6 0.00 61.90 88.46 2.76 -26.56 -2.33R
26 0.00 113.90 88.46 2.76 25.44 2.24R
R denotes an observation with a large standardized residual
MTB > regress c1 1 c4
Regression Analysis: Price versus Time
The regression equation is
Price = 86.4 + 0.698 Time
Predictor Coef SE Coef T P
Constant 86.355 4.942 17.47 0.000
Time 0.6980 0.5057 1.38 0.178
S = 12.33 R-Sq = 6.4% R-Sq(adj) = 3.0%
Analysis of Variance
Source DF SS MS F P
Regression 1 289.6 289.6 1.91 0.178
Residual Error 28 4255.2 152.0
Total 29 4544.8
Unusual Observations
Obs Time Price Fit SE Fit Residual St Resid
2 2.0 115.50 87.75 4.07 27.75 2.38R
6 4.0 61.90 89.15 3.27 -27.25 -2.29R
R denotes an observation with a large standardized residual
MTB > regress c1 2 c2 c4;
SUBC> dw;
SUBC> vif.
2. The researcher now adds time. Compare this regression with the regression with Value alone. Are the coefficients significant? Does this explain the variation in price better than the regression with Value alone? What would the predicted selling price be for an old house with a valuation of 80 in time 17? (3)
Regression Analysis: Price versus Value, Time
The regression equation is
Price = - 45.0 + 1.75 Value + 0.368 Time
Predictor Coef SE Coef T P VIF
Constant -44.988 6.553 -6.87 0.000
Value 1.75060 0.08576 20.41 0.000 1.0
Time 0.3680 0.1281 2.87 0.008 1.0
S = 3.097 R-Sq = 94.3% R-Sq(adj) = 93.9%
Analysis of Variance
Source DF SS MS F P
Regression 2 4285.8 2142.9 223.46 0.000
Residual Error 27 258.9 9.6
Total 29 4544.8
Source DF Seq SS
Value 1 4206.7
Time 1 79.2
Unusual Observations
Obs Value Price Fit SE Fit Residual St Resid
2 86.3 115.500 106.842 1.385 8.658 3.13R
11 60.8 69.340 63.656 1.474 5.684 2.09R
20 74.3 83.000 89.146 0.680 -6.146 -2.03R
R denotes an observation with a large standardized residual
Durbin-Watson statistic = 2.73
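A minimal sketch of how a fitted value is obtained from the Value-and-Time equation printed above. The function name is hypothetical; the coefficients are the ones in the Coef column. As a check, the sketch also reproduces the Fit that Minitab reports for observation 2 (Value 86.31, Time 2).

```python
def predict_price(value, time):
    """Fitted price (in thousands) from the Value-and-Time regression above."""
    return -44.988 + 1.75060 * value + 0.3680 * time

# Check against Minitab's reported Fit of 106.842 for observation 2.
fit_obs2 = predict_price(86.31, 2)

# Fitted price for a house valued at 80 sold in time period 17.
fit = predict_price(80, 17)

print(f"observation 2 fit = {fit_obs2:.3f}")
print(f"fit at value 80, time 17 = {fit:.3f}")
```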
3. The researcher now adds the variable ‘new’. Remember that there is nothing wrong with a negative coefficient unless there is some reason why it should not be negative.
a. What two reasons would I find to doubt that this regression is an improvement on the regression with just value and time by just looking at the t tests and the sign of the coefficients? What does the change in adjusted R-squared tell me about this regression? (3)
b. We have done 5 ANOVA’s so far. What was the null hypothesis in these ANOVA’s and what does the one where the null hypothesis was accepted tell us? (2)
c. What selling price does this equation predict for an old home with a valuation of 80 in time 17? What percentage difference is this from the selling price predicted in the regression with just time and value? (2)
d. The last two regressions have a Durbin-Watson statistic computed. What did this test for, what should our conclusion be, and why is it important? (3)
e. The column marked VIF (variance inflation factor) is a test for (multi)collinearity. The rule of thumb is that if any of these exceeds 5, we have a multicollinearity problem. None does. What is multicollinearity and why am I worried about it? (2)
f. Do an F test to show whether the regression with ‘value’, ‘time’ and ‘new’ is an improvement over the regression with ‘value’ alone. (3)
MTB > regress c1 3 c2 c4 c3;
SUBC> dw;
SUBC> vif.
Regression Analysis: Price versus Value, Time, New
The regression equation is
Price = - 47.7 + 1.79 Value + 0.351 Time - 1.22 New
Predictor Coef SE Coef T P VIF
Constant -47.675 7.190 -6.63 0.000
Value 1.79394 0.09804 18.30 0.000 1.3
Time 0.3508 0.1298 2.70 0.012 1.0
New -1.218 1.322 -0.92 0.366 1.3
S = 3.105 R-Sq = 94.5% R-Sq(adj) = 93.8%
Analysis of Variance
Source DF SS MS F P
Regression 3 4294.0 1431.3 148.42 0.000
Residual Error 26 250.7 9.6
Total 29 4544.8
Source DF Seq SS
Value 1 4206.7
Time 1 79.2
New 1 8.2
Unusual Observations
Obs Value Price Fit SE Fit Residual St Resid
2 86.3 115.500 107.862 1.777 7.638 3.00R
11 60.8 69.340 63.502 1.487 5.838 2.14R
20 74.3 83.000 89.492 0.778 -6.492 -2.16R
R denotes an observation with a large standardized residual
Durbin-Watson statistic = 2.60
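One way the F test in part f could be set up is sketched below, assuming the usual comparison of a full and a reduced regression: the extra regression sum of squares per added variable is divided by the error mean square of the full model. All numbers are read from the ANOVA tables above; the variable names are illustrative.

```python
# Sums of squares read from the Minitab ANOVA tables above.
ssr_full = 4294.0      # regression SS with Value, Time and New
sse_full = 250.7       # residual SS for the same regression
df_error_full = 26     # residual degrees of freedom for the full model
ssr_reduced = 4206.7   # regression SS with Value alone
extra_terms = 2        # Time and New are the added variables

# F = (extra explained SS per added variable) / (error mean square of full model)
extra_ss = ssr_full - ssr_reduced
f_stat = (extra_ss / extra_terms) / (sse_full / df_error_full)

print(f"extra SS = {extra_ss:.1f}")
print(f"F = {f_stat:.3f} with {extra_terms} and {df_error_full} degrees of freedom")
```

The computed F would then be compared with the tabled F value for 2 and 26 degrees of freedom at the chosen significance level.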
MTB >
II. Do at least 4 of the following 7 Problems (at least 15 each) (or do sections adding to at least 60 points - Anything extra you do helps, and grades wrap around). Show your work! State H0 and H1 where applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing appropriate statistical tests.
1. (Berenson et al. 1220) A firm believes that less than 15% of people remember their ads. A survey is taken to see what recall occurs with the following results (In these problems calculating proportions won’t help you unless you do a statistical test):
                 Medium
             Mag    TV  Radio  Total
Remembered    25    10      8     43
Forgot        73    93    107    273
Total         98   103    115    316
a. Test the hypothesis that the recall rate is less than 15% by using proportions calculated from the ‘Total’ column. Find a p-value for this result. (5)
b. Test the hypothesis that the proportion recalling was lower for Radio than TV. (4)
c. Test to see if there is a significant difference in the proportion that remembered according to the medium. (6)
d. The Marascuilo procedure says that if (i) equality is rejected in c) and
(ii) $|p_{TV} - p_{Radio}| > \sqrt{\chi^2}\, s_{\Delta p}$, where the chi-squared is what you used in c) and the standard deviation is what you would use in a confidence interval solution to b), you can say that you have a significant difference between TV and Radio. Try it! (5)
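The sketch below shows one possible setup for parts a and c, assuming a one-sample z test of the overall recall proportion against 15% and a chi-squared test of homogeneity across the three media. Counts are taken from the table above; the dictionary and variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the recall table above.
remembered = {"Mag": 25, "TV": 10, "Radio": 8}
forgot = {"Mag": 73, "TV": 93, "Radio": 107}

total_remembered = sum(remembered.values())
total_forgot = sum(forgot.values())
total_n = total_remembered + total_forgot

# Part a: H0: p >= .15 against H1: p < .15, using the Total column.
p0 = 0.15
p_hat = total_remembered / total_n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / total_n)
p_value = NormalDist().cdf(z)  # lower-tail p-value for a left-sided test

# Part c: chi-squared statistic, expected count = column total * row total / n.
chi_sq = 0.0
for medium in remembered:
    column_total = remembered[medium] + forgot[medium]
    for observed, row_total in ((remembered[medium], total_remembered),
                                (forgot[medium], total_forgot)):
        expected = column_total * row_total / total_n
        chi_sq += (observed - expected) ** 2 / expected

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.3f}")
print(f"chi-squared = {chi_sq:.2f} with 2 degrees of freedom")
```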
2. (Berenson et al. 1142) A manager is inspecting a new type of battery. These are subjected to 4 different pressure levels and their time to failure is recorded. The manager knows from experience that such data is not normally distributed. Ranks are provided.
                     PRESSURE
Use    low  rank  normal  rank  high  rank  whee!  rank
1      8.2    11     7.9     9   6.2     4    5.3     1
2      8.3    12     8.4    13   6.5     5    5.8     2
3      9.4    15    10.0    17   7.3     7    6.1     3
4      9.6    16    11.1    18   7.8     8    6.9     6
5     11.9    19    12.5    20   9.1    14    8.0    10
a. At the 5% level analyze the data on the assumption that each column represents a random sample. Do the column medians differ? (5)
b. Rerank the data appropriately and repeat a) on the assumption that the data is non-normal but cross classified by use. (5)
c. This time I want to compare high pressure (H) against low - moderate pressure (L). I will write out the numbers 1-20 and label them according to pressure.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
H H H H H H H H L H L L L H L L L L L L
Do a runs test to see if the H’s and L’s appear randomly. This is called a Wald-Wolfowitz test for the equality of means in two nonnormal samples. The null hypothesis is that the sequence is random and the means are equal. What is your conclusion? (5)
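A rough sketch of the arithmetic behind parts a and c, assuming the Kruskal-Wallis statistic for the ranked columns and the large-sample normal approximation for the runs test. With samples this small, a table of critical values for the number of runs may be what the course intends, so the z value is only indicative. Names are illustrative.

```python
from math import sqrt

# Part a: Kruskal-Wallis H from the ranks given in the table above.
ranks = {
    "low": [11, 12, 15, 16, 19],
    "normal": [9, 13, 17, 18, 20],
    "high": [4, 5, 7, 8, 14],
    "whee": [1, 2, 3, 6, 10],
}
n_total = sum(len(r) for r in ranks.values())
h_stat = (12 / (n_total * (n_total + 1))
          * sum(sum(r) ** 2 / len(r) for r in ranks.values())
          - 3 * (n_total + 1))

# Part c: runs in the H/L sequence for ranks 1 through 20.
labels = "HHHHHHHHLHLLLHLLLLLL"
runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
n_h = labels.count("H")
n_l = labels.count("L")
n = n_h + n_l

# Mean and variance of the number of runs under randomness.
mean_runs = 2 * n_h * n_l / n + 1
var_runs = (mean_runs - 1) * (mean_runs - 2) / (n - 1)
z = (runs - mean_runs) / sqrt(var_runs)

print(f"Kruskal-Wallis H = {h_stat:.3f} with 3 degrees of freedom")
print(f"runs = {runs}, expected = {mean_runs:.1f}, z = {z:.3f}")
```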
3. A researcher studies the relationship of numbers of subsidiaries and numbers of parent companies in 11 metropolitan areas and finds the following:
Area  parents  subsidiaries  parents^2  parents*subsidiaries  subsidiaries^2
1     658      2602          432964     1712116               6770404
2     396      1709          156816     676764                2920681
3     357      1852          127449     661164                3429904
4     266      1223          70756      325318                1495729
5     231      875           53361      202125                765625
6     223      666           49729      148518                443556
7     207      1519          42849      314433                2307361
8     156      884           24336      137904                781456
9     146      477           21316      69642                 227529
10    143      564           20449      80652                 318096
11    139      657           19321      91323                 431649
Sum   2922     13028         1019346    4419959               19891990
a. Do Spearman’s rank correlation between parents and subsidiaries and test it for significance. (6)
b. Compute the sample correlation between parents and subsidiaries and test it for significance. (6)
c. Compute the sample standard deviation of and test to see if it equals 200 (4)
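A minimal sketch of the rank-correlation arithmetic for part a, assuming the usual formula 1 - 6(sum of squared rank differences)/(n(n^2 - 1)), which applies here because neither column has ties. The helper name `ranks` is hypothetical.

```python
# Parents and subsidiaries for the 11 areas, from the table above.
parents = [658, 396, 357, 266, 231, 223, 207, 156, 146, 143, 139]
subsidiaries = [2602, 1709, 1852, 1223, 875, 666, 1519, 884, 477, 564, 657]

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no ties, which holds for these data."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

rank_parents = ranks(parents)
rank_subs = ranks(subsidiaries)

n = len(parents)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_parents, rank_subs))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"sum of squared rank differences = {sum_d2}")
print(f"Spearman rank correlation = {r_s:.4f}")
```

The resulting coefficient would then be compared with a table of critical values for Spearman's rank correlation with n = 11.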
4. This problem uses the data on parents and subsidiaries given in problem 3.
a. Test the hypothesis that the correlation between parents and subsidiaries is .7 (5)
b. Test the hypothesis that has the Normal distribution. (9)
c. Test the hypothesis that parents and subsidiaries have equal variances. (4)
5. This problem uses the data on parents and subsidiaries given in problem 3.
a. Compute a simple regression of subsidiaries against parents as the independent variable. (5)
b. Compute . (3)
c. Predict how many subsidiaries will appear in a city with 60 parent corporations. (1)
d. Make your prediction in c) into a confidence interval. (3)
e. Compute and make it into a confidence interval for . (3)
f. Do an ANOVA for this regression and explain what it says about . (3)
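The simple regression in part a can be computed from the column totals given with the parents and subsidiaries data. The sketch below assumes that the three unlabelled columns are parents squared, parents times subsidiaries, and subsidiaries squared (the first row, 658 and 2602, is consistent with that reading), and it takes parents as x and subsidiaries as y. Variable names are illustrative.

```python
# Column totals for the 11 areas (x = parents, y = subsidiaries).
n = 11
sum_x, sum_y = 2922, 13028
sum_x2, sum_xy, sum_y2 = 1019346, 4419959, 19891990

# Corrected sums of squares and cross-products.
sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
syy = sum_y2 - sum_y ** 2 / n

# Least-squares slope and intercept, and the coefficient of determination.
b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n
r_squared = sxy ** 2 / (sxx * syy)

# Point prediction for a city with 60 parent corporations.
pred_60 = b0 + b1 * 60

print(f"subsidiaries = {b0:.2f} + {b1:.4f} parents")
print(f"R-squared = {r_squared:.3f}")
print(f"prediction at 60 parents = {pred_60:.1f}")
```

Note that 60 parents lies well below the smallest observed value of 139, so the prediction is an extrapolation.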
6. A chain has the following data on prices, promotion expenses and sales of one product. (You can do ):
Store  sales  price  promotion  price^2
1      3842   59     200        3481
2      3754   59     400        3481
3      5000   59     600        3481
4      1916   79     200        6241
5      3224   79     200        6241
6      2618   79     400        6241
7      3746   79     600        6241
8      3825   79     600        6241
9      1096   99     200        9801
10     1882   99     400        9801
11     2159   99     400        9801
12     2927   99     600        9801
Sum    35989  968    4800       80852

Store  promotion^2  sales^2    price*sales  promotion*sales
1      40000        14760964   226678       768400
2      160000       14092516   221486       1501600
3      360000       25000000   295000       3000000
4      40000        3671056    151364       383200
5      40000        10394176   254696       644800
6      160000       6853924    206822       1047200
7      360000       14032516   295934       2247600
8      360000       14630625   302175       2295000
9      40000        1201216    108504       219200
10     160000       3541924    186318       752800
11     160000       4661281    213741       863600
12     360000       8567329    289773       1756200
Sum    2240000      121407527  2752491      15479600
The means are 2999.08 for sales, 80.6667 for price and 400.000 for promotion.
a. Do a multiple regression of sales against price and promotion. (10)
b. Compute R-squared and R-squared adjusted for degrees of freedom. Use a regression ANOVA to test the usefulness of this regression. (6)
d. Use your regression to predict sales when price is 79 cents and promotion expenses are $200. (2)
e. Use the directions in the outline to make this estimate into a confidence interval and a prediction interval. (4)
f. If the regression on price alone had the following output:
The regression equation is
sales = 7391 - 54.4 price
Predictor Coef SE Coef T P
Constant 7391 1133 6.52 0.000
price -54.44 13.81 -3.94 0.003
S = 726.2 R-Sq = 60.9% R-Sq(adj) = 56.9%
Analysis of Variance
Source DF SS MS F P
Regression 1 8200079 8200079 15.55 0.003
Residual Error 10 5273437 527344
Total 11 13473517
Do an F-test to see if adding promotion helped. (4) The next page is blank; please show your work.
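A sketch of the two-variable regression in part a and the F test in part f, assuming the normal equations in deviation form are solved by Cramer's rule. The helper names `mean` and `cross` are hypothetical; the data are copied from the store table above.

```python
# Sales, price and promotion for the 12 stores, from the table above.
sales = [3842, 3754, 5000, 1916, 3224, 2618, 3746, 3825, 1096, 1882, 2159, 2927]
price = [59, 59, 59, 79, 79, 79, 79, 79, 99, 99, 99, 99]
promotion = [200, 400, 600, 200, 200, 400, 600, 600, 200, 400, 400, 600]
n = len(sales)

def mean(values):
    return sum(values) / len(values)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

s11, s22, s12 = cross(price, price), cross(promotion, promotion), cross(price, promotion)
s1y, s2y, syy = cross(price, sales), cross(promotion, sales), cross(sales, sales)

# Normal equations: s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
b0 = mean(sales) - b1 * mean(price) - b2 * mean(promotion)

# Part f: extra regression SS from adding promotion to price alone.
ssr_full = b1 * s1y + b2 * s2y
ssr_price = s1y ** 2 / s11
sse_full = syy - ssr_full
f_add = (ssr_full - ssr_price) / (sse_full / (n - 3))

print(f"sales = {b0:.1f} + {b1:.2f} price + {b2:.4f} promotion")
print(f"F for adding promotion = {f_add:.2f} with 1 and {n - 3} degrees of freedom")
```

In these data price and promotion have zero sample covariance, which is why the price coefficient here matches the -54.44 in the price-only output.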
7. The Lees present the following data on college students’ summer wages vs. years of work experience blocked by location.
Years of Work Experience
Region / 1 / 2 / 3
1 / 16 / 19 / 24
2 / 21 / 20 / 21
3 / 18 / 21 / 22
4 / 14 / 21 / 25
a. Do a 2-way ANOVA on these data and explain what hypotheses you test and what the conclusions are. (9) (Or do a 1-way ANOVA for 6 points.) The following column sums are done for you for columns 1 and 2: sums 69 and 81, counts 4 and 4, and sums of squares 1217 and 1643. So the column means are 17.25 and 20.25.
b. Do a test of the equality of the means in columns 2 and 3 assuming that the columns are random samples from Normal populations with equal variances (4).
c. Assume that columns 2 and 3 do not come from a Normal distribution and are not paired data and do a test for equal medians. (4)
d. Test the following data for uniformity.
Category 1 2 3 4 5
Numbers 0 2 0 10 8
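Two possible statistics for part d are sketched below, reading the counts as 0, 2, 0, 10 and 8 over five equally likely categories. With only 20 observations the expected count per category is 4, which is below the common rule of thumb of 5 for the chi-squared approximation, so a Kolmogorov-Smirnov comparison of cumulative proportions may be the intended method. Both are shown, and the variable names are illustrative.

```python
# Observed counts in categories 1 through 5.
observed = [0, 2, 0, 10, 8]
n = sum(observed)
k = len(observed)
expected = n / k  # equal expected count per category under uniformity

# Chi-squared goodness-of-fit statistic (k - 1 degrees of freedom).
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

# Kolmogorov-Smirnov statistic: largest gap between the observed and the
# uniform cumulative proportions.
cum_observed = 0.0
d_max = 0.0
for i, o in enumerate(observed, start=1):
    cum_observed += o / n
    d_max = max(d_max, abs(cum_observed - i / k))

print(f"n = {n}, expected per category = {expected}")
print(f"chi-squared = {chi_sq:.1f}, K-S D = {d_max:.2f}")
```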