ECONOMETRICS I
Assignment 4
Regression Modeling
Professor William Greene Phone: 212.998.0876
Office: KMC 7-78 Home page: www.stern.nyu.edu/~wgreene
Office Hours: Open Email:
URL for course web page:
www.stern.nyu.edu/~wgreene/Econometrics/Econometrics.htm
Your professor (once) drove to work each morning on the dreaded Long Island Distressway, from a faraway suburb (that sits high on a hill, overlooking things green and beautiful…..) The amount of time the drive took each day was relatively constant, but it did vary systematically with some obvious factors and with some less obvious factors. These included:
a. Time of departure from home – there was a distinct peaking effect.
b. Day of the week. Monday seemed to be the peak day, Friday was less crowded.
c. Holiday. Holidays were special.
d. Rain. Actual occurrence of rain had an obvious effect. 99.9% of New Yorkers cannot drive in the
rain. Your professor could. So could all drivers of SUV's. They were always willing to
demonstrate this, at high speed.
e. Snow. Snow on the road had a surprising effect on drive time. Many people stayed home. New
Yorkers are also unable to drive in snow. However, unlike rain, they do not like even to try to
drive in snow.
f. The idiot effect. An idiot in an SUV who weaves their way into (up) a tree or into some other car
affects everyone’s time for most of the morning.
The data set commute.dat, which is posted in the Problem Sets section of our course website, contains a large sample of observations on these effects and the drive time. You can download it from
http://www.stern.nyu.edu/~wgreene/Econometrics/Commute.dat
Your assignment is to build a regression model for drive time which incorporates these observed effects. Some notes about the data set:
1. Drive time is coded in minutes
2. Exit time is coded in minutes past 6:30 in the morning. E.g., 5.0 means 6:35.
3. Day is coded 1 for Mon., 2 for Tues, etc.
4. Holiday, Rain, Snow, and Idiot are ordinary dummy variables.
A hint: There is a strong peaking effect with respect to time of day. The peak comes at 7:00 AM. After that, the expected drive time goes down a little. Before 7:00, expected drive time goes up steadily. You will have to think about this one a bit, and decide how you want departure time to enter your equation. Note, it is not a simple linear term.
a. Estimate the coefficients of your model.
I started by normalizing the exit time to 7:00, since I think that is where the peak is. I did this by subtracting 30 from EXIT to create DEPART. Thus, DEPART=0 means a 7:00 departure time. I expanded the day variable into 5 weekday dummy variables. Regression results are below. Note that there is no overall constant, but the 5 dummy variables sum to 1.0, so the model is the same as if I had an overall constant and dropped one of the day dummy variables. NLOGIT does not recognize this, so it issues its standard warning about R2. Notice that the 5 dummy variables display an unexpected weekday pattern, rising until Wednesday, then declining on Thursday and Friday. This shows that, everything else held constant, there is a day of week cycle, but Monday does not seem to be the peak day, as was suggested in the problem.
------
Ordinary least squares regression ......
LHS=TIME Mean = 102.26667
Standard deviation = 28.16665
Number of observs. = 75
Model size Parameters = 11
Degrees of freedom = 64
Residuals Sum of squares = 12635.31941
Standard error of e = 14.05087
Fit R-squared = .78478
Adjusted R-squared = .75115
Model test F[ 10, 64] (prob) = 23.3(.0000)
Diagnostic Log likelihood = -298.67401
Restricted(b=0) = -356.27744
Chi-sq [ 10] (prob) = 115.2( .0000)
Info criter. LogAmemiya Prd. Crt. = 5.42223
Akaike Info. Criter. = 5.42010
Bayes Info. Criter. = 5.75999
Not using OLS or no constant. Rsqrd & F may be < 0
Model was estimated on Nov 03, 2010 at 00:11:51 PM
--------+--------------------------------------------------------
        |                  Standard             Prob.        Mean
    TIME| Coefficient         Error        t    t>|T|        of X
--------+--------------------------------------------------------
  DEPART|   -1.31547***      .19446    -6.76    .0000    -5.65333
DEPARTSQ|    -.07054***      .01096    -6.44    .0000     198.587
     MON|    101.039***     3.94255    25.63    .0000      .20000
    TUES|    104.113***     4.43304    23.49    .0000      .20000
     WED|    118.491***     4.02846    29.41    .0000      .20000
   THURS|    104.301***     4.54932    22.93    .0000      .20000
     FRI|    93.3165***     4.14063    22.54    .0000      .20000
 HOLIDAY|    1.01826        8.59433      .12    .9061      .04000
    RAIN|    56.5480***     5.84818     9.67    .0000      .10667
    SNOW|   -20.1722***     6.77551    -2.98    .0041      .08000
   IDIOT|    9.54156       15.42995      .62    .5385      .01333
--------+--------------------------------------------------------
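The equivalence between five day dummies with no constant and a constant plus four day dummies can be checked by simple arithmetic. The sketch below is not part of the NLOGIT session; it is a small Python illustration that copies the five day coefficients from the output above and re-expresses them as a constant (the coefficient of a chosen base day) plus differences from that base. The function name `constant_plus_contrasts` and the choice of Monday as the base day are assumptions made for illustration.

```python
# Day-dummy coefficients copied from the regression output above.
day_coef = {
    "MON": 101.039,
    "TUES": 104.113,
    "WED": 118.491,
    "THURS": 104.301,
    "FRI": 93.3165,
}


def constant_plus_contrasts(coefs, base):
    """Re-express a full set of dummy coefficients as a constant (the base
    category's coefficient) plus each other category's difference from it."""
    constant = coefs[base]
    contrasts = {
        day: round(value - constant, 4)
        for day, value in coefs.items()
        if day != base
    }
    return constant, contrasts


# Hypothetical choice: treat Monday as the omitted (base) day.
constant, contrasts = constant_plus_contrasts(day_coef, "MON")
peak_day = max(day_coef, key=day_coef.get)

print("constant:", constant)
for day, diff in contrasts.items():
    print(f"{day:>5}: {diff:+.4f} minutes relative to Monday")
print("day with the largest coefficient:", peak_day)
```

Either parameterization gives the same fitted values and residuals; only the interpretation of the individual coefficients changes (levels in one case, differences from the base day in the other).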
To explore what the model is suggesting about drive time, I plotted its predictions as a function of departure time for two specific days, a snowy Friday and a rainy Wednesday, neither a holiday, both idiot free. The fitted quadratic peaks at DEPART = -b(1)/(2*b(2)), or about -9.3, so my model suggests that the driving time will be greatest if I leave at about 10 minutes before 7:00.
create;depart=exit-30$
create;departsq=depart^2$
create;mon=day=1;tues=day=2;wed=day=3;thurs=day=4
;fri=day=5$
regress;lhs=time;rhs=depart,departsq,
mon,tues,wed,thurs,fri
,holiday,rain,snow,idiot
;keep=predtime;list$
create;exittime=trn(-30,.5)$
create;yf_SF=b(1)*exittime + b(2)*exittime^2
+b(7)+b(10)$ Friday with snow
create;yf_RW=b(1)*exittime + b(2)*exittime^2
+b(5)+b(9)$ Wednesday with rain
plot;lhs=exittime;rhs=yf_SF,yf_RW; grid ; spikes=0; fill
;title=Drive Time as Function of Exit Time$
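The location of the peak can be rechecked outside NLOGIT. The following is a minimal Python sketch, assuming only the coefficient estimates reported in the output above; the function name `predicted_time` is hypothetical, and the grid of 75 points starting at -30 in steps of 0.5 is meant to mirror what the trn(-30,.5) command produces for this sample.

```python
# Coefficient estimates copied from the regression output above.
B_DEPART, B_DEPARTSQ = -1.31547, -0.07054
B_WED, B_FRI = 118.491, 93.3165
B_RAIN, B_SNOW = 56.5480, -20.1722


def predicted_time(depart, day_coef, weather_coef):
    """Fitted drive time (minutes) for a non-holiday, idiot-free day.
    depart is measured in minutes relative to 7:00."""
    return day_coef + weather_coef + B_DEPART * depart + B_DEPARTSQ * depart ** 2


# Analytic maximum of the quadratic b1*d + b2*d^2 (valid because b2 < 0).
peak_depart = -B_DEPART / (2 * B_DEPARTSQ)

# Grid of departure times: -30.0, -29.5, ..., 7.0 (75 points).
grid = [-30 + 0.5 * i for i in range(75)]
snowy_friday = [predicted_time(d, B_FRI, B_SNOW) for d in grid]
rainy_wednesday = [predicted_time(d, B_WED, B_RAIN) for d in grid]

# Grid point with the highest fitted time; it is the same for both days
# because the two curves differ only by a constant shift.
grid_peak = grid[rainy_wednesday.index(max(rainy_wednesday))]
shift = rainy_wednesday[0] - snowy_friday[0]

print(f"analytic peak at DEPART = {peak_depart:.2f}")
print(f"grid peak at DEPART = {grid_peak}")
print(f"rainy Wednesday minus snowy Friday = {shift:.2f} minutes")
```

Because the day and weather effects enter additively, they move the whole curve up or down without changing where it peaks.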
b. Identify which are the most significant effects on drive time, and which are the least important.
Departure time relative to 7:00 does seem to matter. Weekday matters, of course, but to know if the weekdays are different, we would need to test the hypothesis that the differences are zero, not that the coefficients are different from zero. Factors that seem not to matter are HOLIDAY and IDIOT. The weather definitely matters.
c. Compute the predicted drive times based on your equation.
Observation Observed Y Predicted Y Residual 95% Forecast Interval
1 101.00000 89.468190 11.531810 59.708158 119.22822
2 89.000000 96.628042 -7.6280418 67.292425 125.96366
3 133.00000 142.79615 -9.7961456 111.78719 173.80510
4 121.00000 124.11852 -3.1185153 94.787899 153.44913
5 81.000000 80.091778 .9082216 49.460445 110.72311
6 104.00000 100.56866 3.4313409 71.293956 129.84336
7 92.000000 107.41108 -15.411081 74.431307 140.39086
8 97.000000 94.648869 2.3511311 65.505590 123.79215
9 186.00000 166.02896 19.971037 134.90366 197.15426
10 158.00000 160.00019 -2.0001916 129.51233 190.48806
11 78.000000 64.294156 13.705844 34.720121 93.868191
12 131.00000 107.42462 23.575379 78.016107 136.83314
13 87.000000 78.357615 8.6423853 48.306774 108.40846
14 109.00000 123.08141 -14.081412 93.744460 152.41836
15 104.00000 103.45220 .5478023 74.215828 132.68857
16 144.00000 142.79615 1.2038544 111.78719 173.80510
17 94.000000 102.72704 -8.7270422 73.284419 132.16967
18 78.000000 86.477594 -8.4775937 55.547061 117.40813
19 92.000000 92.698540 -.6985404 63.542042 121.85504
20 104.00000 85.643405 18.356595 55.584844 115.70197
21 82.000000 88.115812 -6.1158121 58.653383 117.57824
22 96.000000 86.257011 9.7429886 56.718753 115.79527
23 121.00000 107.14019 13.859813 77.934348 136.34603
24 130.00000 109.46648 20.533523 80.098746 138.83421
25 99.000000 108.70396 -9.7039577 79.544649 137.86327
26 157.00000 166.41293 -9.4129323 135.18444 197.64143
27 58.000000 75.763520 -17.763520 46.548983 104.97806
28 85.000000 74.846825 10.153175 44.528002 105.16565
29 72.000000 100.63011 -28.630113 71.481818 129.77841
30 89.000000 90.607133 -1.6071331 61.427108 119.78716
31 99.000000 104.02888 -5.0288782 74.826022 133.23173
32 110.00000 98.669898 11.330102 69.289766 128.05003
33 111.00000 103.38818 7.6118164 74.213185 132.56318
34 116.00000 111.25418 4.7458213 78.117211 144.39115
35 164.00000 164.00000 .000000 124.30326 203.69674
36 91.000000 99.251415 -8.2514147 69.877535 128.62529
37 65.000000 84.975622 -19.975622 55.818329 114.13292
38 110.00000 94.561410 15.438590 65.272166 123.85065
39 130.00000 113.28985 16.710154 83.646100 142.93359
40 81.000000 91.410177 -10.410177 62.043538 120.77682
41 109.00000 121.47999 -12.479994 92.123587 150.83640
42 104.00000 101.38787 2.6121296 71.898106 130.87764
43 105.00000 94.334740 10.665260 61.306804 127.36268
44 123.00000 157.11665 -34.116653 125.77664 188.45666
45 108.00000 109.48097 -1.4809688 79.881931 139.08001
46 116.00000 109.74106 6.2589392 80.524571 138.95755
47 59.000000 64.294156 -5.2941559 34.720121 93.868191
48 99.000000 103.64232 -4.6423202 74.544416 132.74022
49 99.000000 108.92694 -9.9269375 79.542960 138.31092
50 139.00000 123.84393 15.156068 94.542683 153.14518
51 96.000000 88.719707 7.2802932 57.959506 119.47991
52 72.000000 99.100312 -27.100312 69.141648 129.05898
53 125.00000 106.66740 18.332600 77.469735 135.86506
54 100.00000 102.39468 -2.3946763 72.603189 132.18616
55 85.000000 82.884215 2.1157851 53.731147 112.03728
56 80.000000 62.765960 17.234040 31.015836 94.516084
57 71.000000 75.096033 -4.0960334 45.418421 104.77365
58 89.000000 79.864470 9.1355301 49.053247 110.67569
59 66.000000 75.278655 -9.2786554 45.055565 105.50175
60 139.00000 124.61608 14.383922 95.292419 153.93974
61 59.000000 83.658069 -24.658069 52.996777 114.31936
62 79.000000 70.458545 8.5414554 39.131887 101.78520
63 60.000000 57.713001 2.2869992 27.784229 87.641772
64 91.000000 88.374647 2.6253530 59.158488 117.59081
65 108.00000 124.59130 -16.591303 95.264375 153.91823
66 89.000000 97.722530 -8.7225300 68.233163 127.21190
67 120.00000 103.38818 16.611816 74.213185 132.56318
68 120.00000 118.49051 1.5094902 89.289784 147.69124
69 64.000000 63.920126 .0798744 32.895125 94.945126
70 195.00000 160.84897 34.151031 129.94154 191.75640
71 80.000000 84.092297 -4.0922967 54.300319 113.88427
72 88.000000 95.485030 -7.4850295 66.010313 124.95975
73 123.00000 118.49051 4.5094902 89.289784 147.69124
74 73.000000 86.001082 -13.001082 56.733710 115.26845
75 88.000000 103.64232 -15.642320 74.544416 132.74022
d. Note that only one day was affected by the idiot effect. Compare the actual and fitted values for
this day, and comment.
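This was observation 35. The model predicts this day perfectly (fitted value 164.0, residual zero), as shown in the table above. This is a mechanical result, not evidence of a good fit: IDIOT equals 1 for this observation only, so least squares sets its coefficient to whatever value makes that one residual exactly zero.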
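The zero residual for a one-observation dummy can be reproduced with a toy example. The sketch below is in Python rather than NLOGIT and uses made-up numbers, NOT the commute data; the helper `ols` is a hypothetical name for a small normal-equations solver written only for this illustration.

```python
def ols(X, y):
    """Least squares coefficients from the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up numbers for illustration only; these are NOT the commute data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0, 12.5, 13.0, 17.0, 40.0, 20.5]
flag = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # dummy equal to 1 for one observation

X = [[1.0, xi, fi] for xi, fi in zip(x, flag)]
b = ols(X, y)
residuals = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]

print("coefficients:", [round(v, 4) for v in b])
print("residual for the flagged observation:", round(residuals[4], 10))
```

In this toy regression the constant and slope are the same as if the flagged observation had simply been deleted, and the dummy coefficient is that observation's prediction error from the line fit to the remaining data.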
e. Compare the day of the week effects. Is the drive typically worse on Tuesday than on Thursday?
Tuesday and Thursday seem to be identical. To find out, I imposed the constraint that these two effects were equal, and used an F test. The F statistic is less than 0.1. There seems to be no systematic difference between Tuesday and Thursday.
regress;lhs=time;rhs=depart,departsq,
mon,tues,wed,thurs,fri
,holiday,rain,snow,idiot
;cls:b(4)-b(6)=0$
f. Test the hypothesis that my guess about weekday effects is incorrect. That is, test the hypothesis
that variation in drive time is not explained by day of the week.
regress;lhs=time;rhs=depart,departsq,
one,holiday,rain,snow,idiot$
When all 5 weekday dummy variables are in the equation, R2 = .78478. When only an overall constant is included, R2 falls to .70566. F for the 4 restrictions is [(.78478-.70566)/4]/[(1-.78478)/(75-11)] = 5.88.
The 5% critical value for 4 and 64 degrees of freedom is about 2.5, so the hypothesis is rejected.
------
Ordinary least squares regression ......
LHS=TIME Mean = 102.26667
Standard deviation = 28.16665
Number of observs. = 75
Model size Parameters = 7
Degrees of freedom = 68
Residuals Sum of squares = 17280.43531
Standard error of e = 15.94127
Fit R-squared = .70566
Adjusted R-squared = .67969
Model test F[ 6, 68] (prob) = 27.2(.0000)
Diagnostic Log likelihood = -310.41447
Restricted(b=0) = -356.27744
Chi-sq [ 6] (prob) = 91.7( .0000)
Info criter. LogAmemiya Prd. Crt. = 5.62705
Akaike Info. Criter. = 5.62651
Bayes Info. Criter. = 5.84281
Model was estimated on Nov 03, 2010 at 00:29:27 PM
--------+--------------------------------------------------------
        |                  Standard             Prob.        Mean
    TIME| Coefficient         Error        t    t>|T|        of X
--------+--------------------------------------------------------
  DEPART|   -1.34552***      .21359    -6.30    .0000    -5.65333
DEPARTSQ|    -.07513***      .01226    -6.13    .0000     198.587
Constant|    104.556***     2.73499    38.23    .0000
 HOLIDAY|   -3.91592        9.60754     -.41    .6849      .04000
    RAIN|    62.5470***     6.42709     9.73    .0000      .10667
    SNOW|   -19.2242***     6.97695    -2.76    .0075      .08000
   IDIOT|    3.48077       17.07950      .20    .8391      .01333
--------+--------------------------------------------------------
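The F statistic for the day of week test can be rechecked from the two sets of regression output, using either the R2 values or the residual sums of squares. This is a small Python sketch of the arithmetic, not an NLOGIT command; the function names `f_from_r2` and `f_from_ssr` are hypothetical. The numbers are copied from the unrestricted output (5 day dummies) and the restricted output (overall constant only).

```python
def f_from_r2(r2_unrestricted, r2_restricted, n_restrictions, n_obs, n_params):
    """F statistic for linear restrictions, computed from the two R-squareds.
    n_params is the number of coefficients in the UNRESTRICTED model."""
    numerator = (r2_unrestricted - r2_restricted) / n_restrictions
    denominator = (1.0 - r2_unrestricted) / (n_obs - n_params)
    return numerator / denominator


def f_from_ssr(ssr_unrestricted, ssr_restricted, n_restrictions, n_obs, n_params):
    """The same F statistic computed from the residual sums of squares."""
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / (n_obs - n_params)
    return numerator / denominator


# Values copied from the two regression outputs.
f_r2 = f_from_r2(0.78478, 0.70566, n_restrictions=4, n_obs=75, n_params=11)
f_ssr = f_from_ssr(12635.31941, 17280.43531, n_restrictions=4, n_obs=75, n_params=11)
denominator_df = 75 - 11

print(f"F from R2  = {f_r2:.2f}")
print(f"F from SSR = {f_ssr:.2f}")
print(f"degrees of freedom = 4 and {denominator_df}")
```

The two versions agree because both regressions have the same dependent variable and therefore the same total sum of squares. The denominator degrees of freedom are those of the unrestricted model, 75 - 11 = 64.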
g. Test the hypothesis that the expected drive time on a rainy holiday is the same as on an ordinary
day on which it does not rain.
This question is ambiguous. If both coefficients, on HOLIDAY and RAIN, are zero, then the hypothesis is true. When both variables are dropped, the R2 falls from .78478 to .46998, which produces an F of 46.8, which is certainly significant for two restrictions. However, the hypothesis is also true if the holiday and rain effects cancel each other out, which would mean that the coefficients are equal in magnitude but opposite in sign. The F statistic for testing whether the two coefficients sum to 0.0 is 29.4, which is also highly significant. Therefore, either way, the hypothesis is rejected.
regress;lhs=time;rhs=depart,departsq,
mon,tues,wed,thurs,fri
,snow,idiot$
regress;lhs=time;rhs=depart,departsq,
mon,tues,wed,thurs,fri
,holiday,rain,snow,idiot
;cls:b(8)=0,b(9)=0$
regress;lhs=time;rhs=depart,departsq,
mon,tues,wed,thurs,fri
,holiday,rain,snow,idiot
;cls:b(8)+b(9)=0 $
h. My worst day took a miserable 195 minutes to get to work. What were the conditions that day?
What drive time would your regression model predict for that day?
That was day 70, a rainy Thursday. It was not a holiday, and not the day that the idiot drove up the tree. My model predicted 160.8 minutes. That was the 4th highest prediction my model made, but it was not all that accurate: the residual is 34.2 minutes, and the observed 195 lies above the 95% forecast interval of 129.9 to 191.8 shown in the table. It looks like day 70 was an outlier. (There are many explanations for such an observation that are not contained in the model. Some would be one time events such as the idiot effect, or such as the afternoon drive that took 6 hours because a gasoline truck exploded on the freeway, 40 miles east of New York, and closed the highway for more than 40 miles in both directions. This is the stuff that outliers, large random draws, are made of.)
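The prediction for day 70 can be rebuilt from the estimated coefficients. The Python sketch below is an illustration, not output from NLOGIT. It assumes a departure at exactly 7:00 (DEPART = 0). That value is not stated in the text; it is inferred from the fact that the tabulated prediction, 160.84897, equals the THURS coefficient plus the RAIN coefficient. The function name `predict` is hypothetical.

```python
# Coefficient estimates copied from the first regression output.
COEF = {
    "DEPART": -1.31547, "DEPARTSQ": -0.07054,
    "MON": 101.039, "TUES": 104.113, "WED": 118.491,
    "THURS": 104.301, "FRI": 93.3165,
    "HOLIDAY": 1.01826, "RAIN": 56.5480, "SNOW": -20.1722, "IDIOT": 9.54156,
}
STD_ERROR_OF_E = 14.05087  # "Standard error of e" from the same output


def predict(depart, day, holiday=0, rain=0, snow=0, idiot=0):
    """Fitted drive time in minutes; depart is minutes relative to 7:00."""
    return (
        COEF["DEPART"] * depart
        + COEF["DEPARTSQ"] * depart ** 2
        + COEF[day]
        + COEF["HOLIDAY"] * holiday
        + COEF["RAIN"] * rain
        + COEF["SNOW"] * snow
        + COEF["IDIOT"] * idiot
    )


# Day 70: rainy Thursday, no holiday, no idiot.
# ASSUMPTION: depart = 0.0 (a 7:00 departure), inferred as explained above.
predicted = predict(0.0, "THURS", rain=1)
residual = 195.0 - predicted
residual_in_se_units = residual / STD_ERROR_OF_E

print(f"predicted = {predicted:.1f} minutes")
print(f"residual  = {residual:.1f} minutes")
print(f"residual / standard error of e = {residual_in_se_units:.2f}")
```

The ratio in the last line is only a rough gauge of how unusual the day was, since it ignores the sampling variability of the coefficients that the forecast interval in the table accounts for.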