ECONOMETRICS I

Assignment 4

Regression Modeling

Professor William Greene Phone: 212.998.0876

Office: KMC 7-78 Home page: www.stern.nyu.edu/~wgreene

Office Hours: Open Email:

URL for course web page:

www.stern.nyu.edu/~wgreene/Econometrics/Econometrics.htm

Your professor (once) drove to work each morning on the dreaded Long Island Distressway, from a faraway suburb (that sits high on a hill, overlooking things green and beautiful...). The amount of time the drive took each day was relatively constant, but it did vary systematically with some obvious factors and with some less obvious factors. These included:

a.  Time of departure from home – there was a distinct peaking effect.

b.  Day of the week. Monday seemed to be the peak day, Friday was less crowded.

c.  Holiday. Holidays were special.

d.  Rain. Actual occurrence of rain had an obvious effect. 99.9% of New Yorkers cannot drive in the rain. Your professor could. So could all drivers of SUVs. They were always willing to demonstrate this, at high speed.

e.  Snow. Snow on the road had a surprising effect on drive time. Many people stayed home. New Yorkers are also unable to drive in snow. However, unlike rain, they do not like even to try to drive in snow.

f.  The idiot effect. An idiot in an SUV who weaves their way into (up) a tree or into some other car affects everyone’s time for most of the morning.

The data set commute.dat, which is posted in the Problem Sets section of our course website, contains a large sample of observations on these effects and the drive time. You can download it from

http://www.stern.nyu.edu/~wgreene/Econometrics/Commute.dat

Your assignment is to build a regression model for drive time which incorporates these observed effects. Some notes about the data set:

1.  Drive time is coded in minutes

2.  Exit time is coded in minutes past 6:30 in the morning. E.g., 5.0 means 6:35.

3.  Day is coded 1 for Mon., 2 for Tues, etc.

4.  Holiday, Rain, Snow, and Idiot are ordinary dummy variables.

A hint: There is a strong peaking effect with respect to time of day. The peak comes at 7:00 AM. After that, the expected drive time goes down a little. Before 7:00, expected drive time goes up steadily. You will have to think about this one a bit, and decide how you want departure time to enter your equation. Note, it is not a simple linear term.


a. Estimate the coefficients of your model.

I started by normalizing the exit time to 7:00, since I think that is where the peak time is. I did this by subtracting 30 from EXIT. Thus, DEPART=0 means a 7:00 departure time. I expanded the day variable into 5 weekday dummy variables. Regression results are below. Note that there is no overall constant, but there are 5 dummy variables that sum to 1.0, so that is the same as if I had an overall constant and dropped one of the day dummy variables. NLOGIT does not realize this, so it issues its standard warning about R-squared. Notice that the 5 dummy variables display an unexpected weekday pattern, rising until Wednesday, then declining on Thursday and Friday. This shows that, with everything else held constant, we have a day of week cycle effect, but Monday does not seem to be the peak day, as suggested in the problem.

------

Ordinary least squares regression ......

LHS=TIME Mean = 102.26667

Standard deviation = 28.16665

Number of observs. = 75

Model size Parameters = 11

Degrees of freedom = 64

Residuals Sum of squares = 12635.31941

Standard error of e = 14.05087

Fit R-squared = .78478

Adjusted R-squared = .75115

Model test F[ 10, 64] (prob) = 23.3(.0000)

Diagnostic Log likelihood = -298.67401

Restricted(b=0) = -356.27744

Chi-sq [ 10] (prob) = 115.2( .0000)

Info criter. LogAmemiya Prd. Crt. = 5.42223

Akaike Info. Criter. = 5.42010

Bayes Info. Criter. = 5.75999

Not using OLS or no constant. Rsqrd & F may be < 0

Model was estimated on Nov 03, 2010 at 00:11:51 PM

--------+-----------------------------------------------------
        |                 Standard           Prob.        Mean
    TIME|Coefficient         Error       t   t>|T|        of X
--------+-----------------------------------------------------
  DEPART|   -1.31547***     .19446   -6.76   .0000    -5.65333
DEPARTSQ|    -.07054***     .01096   -6.44   .0000     198.587
     MON|    101.039***    3.94255   25.63   .0000      .20000
    TUES|    104.113***    4.43304   23.49   .0000      .20000
     WED|    118.491***    4.02846   29.41   .0000      .20000
   THURS|    104.301***    4.54932   22.93   .0000      .20000
     FRI|    93.3165***    4.14063   22.54   .0000      .20000
 HOLIDAY|    1.01826       8.59433     .12   .9061      .04000
    RAIN|    56.5480***    5.84818    9.67   .0000      .10667
    SNOW|   -20.1722***    6.77551   -2.98   .0041      .08000
   IDIOT|    9.54156      15.42995     .62   .5385      .01333
--------+-----------------------------------------------------
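The equivalence between five day dummies with no constant and a constant plus four day dummies is only a reparameterization. As a rough check outside NLOGIT, the Python sketch below converts the reported day coefficients into the constant-plus-contrasts form. Monday is taken as the omitted day, which is an arbitrary choice made here for illustration. The numbers are copied from the output above; nothing is re-estimated.

```python
# Day-of-week coefficients as reported in the regression output (no overall constant).
day_coefs = {"MON": 101.039, "TUES": 104.113, "WED": 118.491,
             "THURS": 104.301, "FRI": 93.3165}

# Equivalent parameterization: constant = omitted day's coefficient,
# each remaining dummy = that day's coefficient minus the omitted day's.
base = "MON"  # assumption: Monday chosen as the omitted category
constant = day_coefs[base]
contrasts = {d: round(c - constant, 4) for d, c in day_coefs.items() if d != base}

print("constant:", constant)
for day, diff in contrasts.items():
    print(f"{day:>5}: {diff:+.4f}")

# The day with the largest coefficient is the peak day, other things equal.
peak_day = max(day_coefs, key=day_coefs.get)
print("largest day effect:", peak_day)
```

Read this way, the contrasts show each day's expected drive time relative to Monday, which makes the Wednesday peak easy to see.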

To explore what the model is suggesting about drive time, I plotted its predictions as a function of departure time for two specific days, a snowy Friday and a rainy Wednesday, neither a holiday, both idiot free. My model suggests that the driving time will be greatest if I leave at about 10 minutes until 7:00.

create;depart=exit-30$
create;departsq=depart^2$
create;mon=day=1;tues=day=2;wed=day=3;thurs=day=4;fri=day=5$
regress;lhs=time;rhs=depart,departsq,mon,tues,wed,thurs,fri,holiday,rain,snow,idiot
       ;keep=predtime;list$
create;exittime=trn(-30,.5)$
? Friday with snow
create;yf_SF=b(1)*exittime + b(2)*exittime^2 + b(7) + b(10)$
? Wednesday with rain
create;yf_RW=b(1)*exittime + b(2)*exittime^2 + b(5) + b(9)$
plot;lhs=exittime;rhs=yf_SF,yf_RW;grid;spikes=0;fill
    ;title=Drive Time as Function of Exit Time$
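The peak departure time quoted above follows from the fitted quadratic: predicted drive time is largest where b(1) + 2*b(2)*DEPART = 0. The Python sketch below is a cross-check outside NLOGIT that uses only the coefficients reported in the regression output. It computes that turning point and evaluates the two plotted scenarios there. The function name predicted_time is introduced here for illustration.

```python
# Coefficients copied from the regression output.
b_depart, b_departsq = -1.31547, -0.07054
b_wed, b_fri = 118.491, 93.3165
b_rain, b_snow = 56.5480, -20.1722

def predicted_time(depart, day_effect, weather_effect):
    """Fitted drive time for a non-holiday, idiot-free day.

    depart is measured in minutes after 7:00 (negative = before 7:00).
    """
    return b_depart * depart + b_departsq * depart ** 2 + day_effect + weather_effect

# Turning point of the quadratic: b_depart + 2 * b_departsq * depart = 0.
peak_depart = -b_depart / (2 * b_departsq)

rainy_wed_peak = predicted_time(peak_depart, b_wed, b_rain)
snowy_fri_peak = predicted_time(peak_depart, b_fri, b_snow)

print(f"peak departure: {peak_depart:.2f} minutes relative to 7:00")
print(f"rainy Wednesday at peak: {rainy_wed_peak:.1f} minutes")
print(f"snowy Friday at peak:    {snowy_fri_peak:.1f} minutes")
```

The turning point comes out a little more than 9 minutes before 7:00, which is consistent with the "about 10 minutes until 7:00" reading of the plot.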

b. Identify which are the most significant effects on drive time, and which are the least important.

Departure time relative to 7:00 does seem to matter. Weekday matters, of course, but to know if the weekdays are different, we would need to test the hypothesis that the differences are zero, not that the coefficients are different from zero. Factors that seem not to matter are HOLIDAY and IDIOT. The weather definitely matters.
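One rough way to order the non-day effects is by the absolute value of their t ratios. The Python sketch below recomputes those ratios from the coefficients and standard errors in the regression output and sorts them. The day dummies are left out on purpose, because their t ratios test whether each day's level is zero, not whether the days differ from each other.

```python
# (coefficient, standard error) pairs copied from the regression output.
estimates = {
    "DEPART": (-1.31547, 0.19446),
    "DEPARTSQ": (-0.07054, 0.01096),
    "HOLIDAY": (1.01826, 8.59433),
    "RAIN": (56.5480, 5.84818),
    "SNOW": (-20.1722, 6.77551),
    "IDIOT": (9.54156, 15.42995),
}

# t ratio = coefficient / standard error.
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}

# Rank from most to least significant by |t|.
ranked = sorted(t_ratios, key=lambda name: abs(t_ratios[name]), reverse=True)
for name in ranked:
    print(f"{name:>8}: t = {t_ratios[name]:6.2f}")
```

Statistical significance is not the same as size of effect: RAIN is both large and precisely estimated, while IDIOT has a sizable point estimate that rests on a single observation.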


c. Compute the predicted drive times based on your equation.

Observation Observed Y Predicted Y Residual 95% Forecast Interval

1 101.00000 89.468190 11.531810 59.708158 119.22822

2 89.000000 96.628042 -7.6280418 67.292425 125.96366

3 133.00000 142.79615 -9.7961456 111.78719 173.80510

4 121.00000 124.11852 -3.1185153 94.787899 153.44913

5 81.000000 80.091778 .9082216 49.460445 110.72311

6 104.00000 100.56866 3.4313409 71.293956 129.84336

7 92.000000 107.41108 -15.411081 74.431307 140.39086

8 97.000000 94.648869 2.3511311 65.505590 123.79215

9 186.00000 166.02896 19.971037 134.90366 197.15426

10 158.00000 160.00019 -2.0001916 129.51233 190.48806

11 78.000000 64.294156 13.705844 34.720121 93.868191

12 131.00000 107.42462 23.575379 78.016107 136.83314

13 87.000000 78.357615 8.6423853 48.306774 108.40846

14 109.00000 123.08141 -14.081412 93.744460 152.41836

15 104.00000 103.45220 .5478023 74.215828 132.68857

16 144.00000 142.79615 1.2038544 111.78719 173.80510

17 94.000000 102.72704 -8.7270422 73.284419 132.16967

18 78.000000 86.477594 -8.4775937 55.547061 117.40813

19 92.000000 92.698540 -.6985404 63.542042 121.85504

20 104.00000 85.643405 18.356595 55.584844 115.70197

21 82.000000 88.115812 -6.1158121 58.653383 117.57824

22 96.000000 86.257011 9.7429886 56.718753 115.79527

23 121.00000 107.14019 13.859813 77.934348 136.34603

24 130.00000 109.46648 20.533523 80.098746 138.83421

25 99.000000 108.70396 -9.7039577 79.544649 137.86327

26 157.00000 166.41293 -9.4129323 135.18444 197.64143

27 58.000000 75.763520 -17.763520 46.548983 104.97806

28 85.000000 74.846825 10.153175 44.528002 105.16565

29 72.000000 100.63011 -28.630113 71.481818 129.77841

30 89.000000 90.607133 -1.6071331 61.427108 119.78716

31 99.000000 104.02888 -5.0288782 74.826022 133.23173

32 110.00000 98.669898 11.330102 69.289766 128.05003

33 111.00000 103.38818 7.6118164 74.213185 132.56318

34 116.00000 111.25418 4.7458213 78.117211 144.39115

35 164.00000 164.00000 .000000 124.30326 203.69674

36 91.000000 99.251415 -8.2514147 69.877535 128.62529

37 65.000000 84.975622 -19.975622 55.818329 114.13292

38 110.00000 94.561410 15.438590 65.272166 123.85065

39 130.00000 113.28985 16.710154 83.646100 142.93359

40 81.000000 91.410177 -10.410177 62.043538 120.77682

41 109.00000 121.47999 -12.479994 92.123587 150.83640

42 104.00000 101.38787 2.6121296 71.898106 130.87764

43 105.00000 94.334740 10.665260 61.306804 127.36268

44 123.00000 157.11665 -34.116653 125.77664 188.45666

45 108.00000 109.48097 -1.4809688 79.881931 139.08001

46 116.00000 109.74106 6.2589392 80.524571 138.95755

47 59.000000 64.294156 -5.2941559 34.720121 93.868191

48 99.000000 103.64232 -4.6423202 74.544416 132.74022

49 99.000000 108.92694 -9.9269375 79.542960 138.31092

50 139.00000 123.84393 15.156068 94.542683 153.14518

51 96.000000 88.719707 7.2802932 57.959506 119.47991

52 72.000000 99.100312 -27.100312 69.141648 129.05898

53 125.00000 106.66740 18.332600 77.469735 135.86506

54 100.00000 102.39468 -2.3946763 72.603189 132.18616

55 85.000000 82.884215 2.1157851 53.731147 112.03728

56 80.000000 62.765960 17.234040 31.015836 94.516084

57 71.000000 75.096033 -4.0960334 45.418421 104.77365

58 89.000000 79.864470 9.1355301 49.053247 110.67569

59 66.000000 75.278655 -9.2786554 45.055565 105.50175

60 139.00000 124.61608 14.383922 95.292419 153.93974

61 59.000000 83.658069 -24.658069 52.996777 114.31936

62 79.000000 70.458545 8.5414554 39.131887 101.78520

63 60.000000 57.713001 2.2869992 27.784229 87.641772

64 91.000000 88.374647 2.6253530 59.158488 117.59081

65 108.00000 124.59130 -16.591303 95.264375 153.91823

66 89.000000 97.722530 -8.7225300 68.233163 127.21190

67 120.00000 103.38818 16.611816 74.213185 132.56318

68 120.00000 118.49051 1.5094902 89.289784 147.69124

69 64.000000 63.920126 .0798744 32.895125 94.945126

70 195.00000 160.84897 34.151031 129.94154 191.75640

71 80.000000 84.092297 -4.0922967 54.300319 113.88427

72 88.000000 95.485030 -7.4850295 66.010313 124.95975

73 123.00000 118.49051 4.5094902 89.289784 147.69124

74 73.000000 86.001082 -13.001082 56.733710 115.26845

75 88.000000 103.64232 -15.642320 74.544416 132.74022

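d. Note that only one day was affected by the idiot effect. Compare the actual and fitted values for this day, and comment.

This was observation 35. The model predicts this day perfectly, as shown in the table above: the observed and predicted values are both 164.0 and the residual is zero. This is automatic, not evidence of a good model. IDIOT equals 1 for this observation only, so its coefficient absorbs that day's entire residual.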
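A dummy variable that equals 1 for exactly one observation always yields a zero least squares residual for that observation. The Python sketch below illustrates this on a small made-up data set. The data are hypothetical and are not taken from commute.dat, and the helper ols is a minimal normal-equations solver written for this illustration, not the routine NLOGIT uses.

```python
def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: departure time, drive time, and a one-day dummy.
depart = [-5.0, 0.0, 5.0, -10.0]
drive_time = [100.0, 90.0, 110.0, 164.0]
dummy = [0.0, 0.0, 0.0, 1.0]          # equals 1 for the last observation only

X = [[1.0, d, z] for d, z in zip(depart, dummy)]   # constant, depart, dummy
b = ols(X, drive_time)
fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(drive_time, fitted)]

print("coefficients:", [round(v, 4) for v in b])
print("residuals:   ", [round(v, 4) for v in residuals])
```

The last residual is zero up to rounding while the others are not, because the dummy coefficient is chosen to fit that one observation exactly.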

e. Compare the day of the week effects. Is the drive typically worse on Tuesday than on Thursday?

Tuesday and Thursday seem to be nearly identical; the point estimates are 104.113 and 104.301. To find out, I imposed the constraint that these two effects are equal, and used an F test. The F statistic is less than 0.1, so there seems to be no systematic difference between Tuesday and Thursday.

regress;lhs=time;rhs=depart,departsq,mon,tues,wed,thurs,fri,holiday,rain,snow,idiot
       ;cls:b(4)-b(6)=0$

f. Test the hypothesis that my guess about weekday effects is incorrect. That is, test the hypothesis that variation in drive time is not explained by day of the week.

regress;lhs=time;rhs=depart,departsq,one,holiday,rain,snow,idiot$

When all 5 weekday dummy variables are in the equation, R-squared = .78478. When only an overall constant is included, R-squared falls to .70566. The F statistic for the 4 restrictions is [(.78478-.70566)/4]/[(1-.78478)/(75-11)] = 5.88.

The 95% critical value for F with 4 and 64 degrees of freedom is about 2.52, so the hypothesis is rejected.
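The F statistic can be reproduced two ways from the printed output: from the two R-squared values, as in the text, or from the two residual sums of squares. The Python sketch below does both as a check on the arithmetic. The function names are introduced here for illustration. The numerator has 4 degrees of freedom (the number of restrictions) and the denominator has 75 - 11 = 64.

```python
def f_from_r2(r2_unrestricted, r2_restricted, n_restrictions, n_obs, n_params):
    """F statistic for linear restrictions, computed from R-squared values."""
    numerator = (r2_unrestricted - r2_restricted) / n_restrictions
    denominator = (1.0 - r2_unrestricted) / (n_obs - n_params)
    return numerator / denominator

def f_from_ssr(ssr_unrestricted, ssr_restricted, n_restrictions, n_obs, n_params):
    """The same F statistic, computed from residual sums of squares."""
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / (n_obs - n_params)
    return numerator / denominator

n_obs, n_params, n_restrictions = 75, 11, 4

# Values copied from the two regression outputs.
f_r2 = f_from_r2(0.78478, 0.70566, n_restrictions, n_obs, n_params)
f_ssr = f_from_ssr(12635.31941, 17280.43531, n_restrictions, n_obs, n_params)

print(f"F from R-squared: {f_r2:.2f}")
print(f"F from SSR:       {f_ssr:.2f}")
print(f"degrees of freedom: {n_restrictions} and {n_obs - n_params}")
```

The two versions agree because both regressions have the same dependent variable, and the restricted model (a constant in place of the five day dummies) is nested in the unrestricted one.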

------

Ordinary least squares regression ......

LHS=TIME Mean = 102.26667

Standard deviation = 28.16665

Number of observs. = 75

Model size Parameters = 7

Degrees of freedom = 68

Residuals Sum of squares = 17280.43531

Standard error of e = 15.94127

Fit R-squared = .70566

Adjusted R-squared = .67969

Model test F[ 6, 68] (prob) = 27.2(.0000)

Diagnostic Log likelihood = -310.41447

Restricted(b=0) = -356.27744

Chi-sq [ 6] (prob) = 91.7( .0000)

Info criter. LogAmemiya Prd. Crt. = 5.62705

Akaike Info. Criter. = 5.62651

Bayes Info. Criter. = 5.84281

Model was estimated on Nov 03, 2010 at 00:29:27 PM

--------+-----------------------------------------------------
        |                 Standard           Prob.        Mean
    TIME|Coefficient         Error       t   t>|T|        of X
--------+-----------------------------------------------------
  DEPART|   -1.34552***     .21359   -6.30   .0000    -5.65333
DEPARTSQ|    -.07513***     .01226   -6.13   .0000     198.587
Constant|    104.556***    2.73499   38.23   .0000
 HOLIDAY|   -3.91592       9.60754    -.41   .6849      .04000
    RAIN|    62.5470***    6.42709    9.73   .0000      .10667
    SNOW|   -19.2242***    6.97695   -2.76   .0075      .08000
   IDIOT|    3.48077      17.07950     .20   .8391      .01333
--------+-----------------------------------------------------


g. Test the hypothesis that the expected drive time on a rainy holiday is the same as on an ordinary day on which it does not rain.

This question is ambiguous. If both coefficients, HOLIDAY and RAIN, are zero, then the hypothesis is true. Dropping both variables reduces R-squared from .78478 to .46998, which produces an F of 46.8, certainly significant for two restrictions. However, the hypothesis is also true if the holiday and rain effects cancel each other out, which would mean that the coefficients are equal in magnitude but opposite in sign. The F statistic for testing whether the two coefficients sum to 0.0 is 29.4, which is also highly significant. Therefore, either way, the hypothesis is rejected.

regress;lhs=time;rhs=depart,departsq,mon,tues,wed,thurs,fri,snow,idiot$

regress;lhs=time;rhs=depart,departsq,mon,tues,wed,thurs,fri,holiday,rain,snow,idiot
       ;cls:b(8)=0,b(9)=0$

regress;lhs=time;rhs=depart,departsq,mon,tues,wed,thurs,fri,holiday,rain,snow,idiot
       ;cls:b(8)+b(9)=0$
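The two readings of the hypothesis can be checked against the printed numbers. The Python sketch below, a cross-check outside NLOGIT, recomputes the joint F statistic from the R-squared values and forms the estimated rainy-holiday difference b(8) + b(9). For a single restriction F equals t squared, so the reported F of 29.4 also implies a t ratio and a standard error for that sum. The implied standard error is backed out from the reported F, not taken from NLOGIT output.

```python
import math

# Coefficients copied from the unrestricted regression output.
b_holiday, b_rain = 1.01826, 56.5480

# Estimated difference: rainy holiday versus an ordinary dry day, other things equal.
effect = b_holiday + b_rain

# Joint test that both coefficients are zero, from the R-squared values in the text.
r2_full, r2_dropped = 0.78478, 0.46998
n_obs, n_params, n_restrictions = 75, 11, 2
f_joint = ((r2_full - r2_dropped) / n_restrictions) / ((1.0 - r2_full) / (n_obs - n_params))

# Single restriction b(8) + b(9) = 0: F = t^2, so the reported F implies t and a standard error.
f_sum = 29.4                      # reported in the text
t_sum = math.sqrt(f_sum)
implied_se = effect / t_sum       # backed out, not printed by NLOGIT

print(f"joint F for HOLIDAY = RAIN = 0: {f_joint:.1f}")
print(f"estimated difference: {effect:.2f} minutes")
print(f"implied t ratio: {t_sum:.2f}, implied standard error: {implied_se:.2f}")
```

Either way the conclusion is the same: the estimated difference of roughly 58 minutes is driven almost entirely by RAIN, since the HOLIDAY coefficient is close to zero.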

h. My worst day took a miserable 195 minutes to get to work. What were the conditions that day? What drive time would your regression model predict for that day?

That was day 70, a rainy Thursday. Not a holiday, and not the day that the idiot drove up the tree. My model predicted 160.8 minutes. That was the 4th highest prediction my model made, but not actually all that accurate: it misses by about 34 minutes. It looks like day 70 was an outlier. (There are many explanations for such an observation that are not contained in the model. Some would be one-time events such as the idiot effect, or the afternoon drive that took 6 hours because a gasoline truck exploded on the freeway, 40 miles east of New York, and closed the highway for more than 40 miles in both directions. This is the stuff that outliers, large random draws, are made of.)
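How unusual day 70 was can be gauged from numbers already printed: its row in the prediction table and the standard error of e from the regression output. The Python sketch below expresses the residual in units of that standard error and checks whether the observed time falls inside the 95% forecast interval. It is a rough diagnostic, not a formal outlier test.

```python
# Observation 70 from the prediction table.
observed, predicted = 195.0, 160.84897
lower, upper = 129.94154, 191.75640      # 95% forecast interval for this observation

# Standard error of e from the regression output.
s = 14.05087

residual = observed - predicted
residual_in_s = residual / s             # residual measured in standard errors of e
outside_interval = not (lower <= observed <= upper)

print(f"residual: {residual:.2f} minutes")
print(f"residual / s: {residual_in_s:.2f}")
print("outside 95% forecast interval:", outside_interval)
```

The observed 195 minutes lies above the upper forecast bound, which supports treating this day as an outlier.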