Homework: Ordinary Least Squares (OLS) regression in SAS and Enterprise Miner
Question 1: Short answer, classical homework problem
A club keeps track of the number of reservations for its buffet lunch (X) and the number of members who actually eat there (Y). Although the club requests reservations, it does not refuse walk-in customers, and occasionally people who reserve spots do not show up. To predict Y from X, the club must choose among only these equations:
Y = 35 + 1X + e
Y = 68 + 0.8X + e
Y = 79 + 0.7X + e
Which of these comes closest in spirit to an ordinary least squares fit? Explain your choice. Here are the (X,Y) data pairs:
(140, 178), (135, 162), (147, 180), (138, 188)
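One way to put numbers behind the comparison is to compute the residuals each candidate line leaves on the four data pairs. The following is a minimal sketch, not a required method: the dataset name buffet and the variable names e1-e3 and sq1-sq3 are made up for illustration. It prints the sum of the residuals and the sum of squared residuals for each equation, and the interpretation is left to you.

```sas
/* Sketch: residuals and squared residuals for the three candidate lines. */
/* Dataset and variable names here are illustrative assumptions.          */
data buffet;
   input x y;
   e1 = y - (35 + 1.0*x);   /* residual from Y = 35 + 1X   */
   e2 = y - (68 + 0.8*x);   /* residual from Y = 68 + 0.8X */
   e3 = y - (79 + 0.7*x);   /* residual from Y = 79 + 0.7X */
   sq1 = e1**2;             /* squared residuals, summed below */
   sq2 = e2**2;
   sq3 = e3**2;
   datalines;
140 178
135 162
147 180
138 188
;
run;

/* Column sums: residual totals (e1-e3) and sums of squared errors (sq1-sq3) */
proc means data=buffet sum maxdec=2;
   var e1-e3 sq1-sq3;
run;
```

Keep in mind that an ordinary least squares fit with an intercept has residuals that sum to zero and minimizes the sum of squared residuals over all possible lines, not just these three.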
Question 2: Deer-Related Crashes Report Question (this mimics the in-class regression demo)
Create a folder for your data library. I called mine c:\workshop\winsas\aaem. Open SAS and issue your libname statement. Mine is
libname aaem 'c:\workshop\winsas\aaem';
Deer and other data are available from the data link on our course web page:
http://www4.stat.ncsu.edu/~dickey/Analytics/Datamine/data/
This dataset comes from a personal communication with the NC Department of Transportation. It is a monthly time series of automobile crash reports in North Carolina, split into the variables DEER (number of accident reports involving the word Deer) and NONDEER (other accidents). Imagine you work for an automobile insurance company. The goal here is to see whether there would be any reason to purchase a radio ad or include a flyer in a monthly bill urging customers to be aware of deer in the coming month or so. With that in mind, write up a businesslike report that addresses the following specific points, adding anything else you think would be of interest. End with your recommendation in terms of the goal.
(1) Plot deer and p = deer/(deer+nondeer) by time (2 graphs) adding some commentary.
(2) In SAS/STAT (not Enterprise Miner) run a regression of deer on time t and the seasonal dummies mon1-mon12, then run this again leaving out mon12. Look at what happened and include a brief explanation in your report. Is there a significant increase or decrease in deer-related crashes over time?
(3) Try the same thing in Enterprise Miner:
- Create a project
- Add the startup code using your libname statement (I’d use the one above, e.g.)
- Create a diagram (DeerCrash e.g.)
- Create a data source (aaem.deer)
- Reject nondeer, specify date as an ID, time and mon1 through mon12 as interval. Specify deer as the target.
- Pull the data source into your diagram.
- In the properties panel of the data source node, select imported data … then highlight the deer data and select actions-> plot to make a line plot.
- Under the model subtab, select regression (near the right side of the icon list)
- Connect the data source node to the regression node.
- Right-click on the regression node, select run, then view the results. Compare to the PROC REG output. Are they the same?
- Look at the parameter estimates. Show how these are combined to compute the predicted value for December 2004 and compare the prediction to the observed value.
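For steps (1) and (2), the SAS side might look roughly like the sketch below. It assumes the dataset is aaem.deer and that it holds the variables date, deer, nondeer, time and mon1-mon12 as described above. The work dataset name deerplot, the model labels Full and Drop12, and the axis label are illustrative only, so check the variable names against your copy of the data before running it.

```sas
/* Point the library at your own data folder (this is the instructor's path). */
libname aaem 'c:\workshop\winsas\aaem';

/* Step (1): add the proportion of crash reports that involve deer. */
data deerplot;
   set aaem.deer;
   p = deer / (deer + nondeer);
run;

/* One time series plot for the count ... */
proc sgplot data=deerplot;
   series x=date y=deer;
run;

/* ... and one for the proportion. */
proc sgplot data=deerplot;
   series x=date y=p;
   yaxis label="Proportion of crash reports involving deer";
run;

/* Step (2): regress deer on time and the monthly dummies, twice. */
proc reg data=aaem.deer;
   Full:   model deer = time mon1-mon12;   /* all twelve dummies plus intercept */
   Drop12: model deer = time mon1-mon11;   /* mon12 left out                    */
run;
quit;
```

Read any notes PROC REG prints with the first model and compare its parameter estimates table with the second model's before writing your explanation.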
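For the last step, the general shape of the calculation can be written out as below. The symbols are assumptions introduced for illustration: b_0 is the intercept, b_time the coefficient on time, b_j the coefficient on monj, and m_{j,t} the value (0 or 1) of monj in month t. No numbers are filled in, since they come from your own output.

```latex
\[
\hat{Y}_t = b_0 + b_{\mathrm{time}}\, t + \sum_{j=1}^{12} b_j\, m_{j,t},
\qquad
\hat{Y}_{\mathrm{Dec\,2004}} = b_0 + b_{\mathrm{time}}\, t_{\mathrm{Dec\,2004}} + b_{12}
\]
```

In December only mon12 equals 1, so every other dummy term drops out. If mon12 was left out of the model or its estimate was set to zero, then b_12 = 0 and the December prediction uses only the intercept and the time term. The prediction error is the observed December 2004 value minus this predicted value.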