SAS Program Code Comparing Regression Modeling with Autocorrelated Errors
This SAS program compares neural network estimates with time series autoregressive forecasting estimates, that is, estimates from regression modeling with autoregressive errors. The code is based on the private housing units started example. A second-order autoregressive model is applied: the current target value is modeled from the target values observed in the two preceding periods, that is, two separate target lag variables, along with two additional input variables, U.S. construction contracts and the average new-house mortgage rate. The basic configuration of the neural network model is illustrated in the autoregressive time series example in section 4.4.
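As a sketch of the regression with second-order autoregressive errors described above, the model can be written as follows. The symbols are illustrative and do not appear in the original: y_t stands for housing starts, x_{1t} for construction contracts, and x_{2t} for the mortgage rate. The negative signs on the autoregressive parameters follow the convention documented for PROC AUTOREG, and the normality of the error term is an assumption that corresponds to maximum likelihood estimation.

```latex
\begin{align}
y_t   &= \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \nu_t, \\
\nu_t &= -\varphi_1 \nu_{t-1} - \varphi_2 \nu_{t-2} + \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \sigma^2) \ \text{independently.}
\end{align}
```

Under this form the two lags enter through the error process rather than as explicit input variables, whereas the neural network uses the lagged target values directly as inputs.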
proc format;
value grpfmt 1 = 'Historical' 2 = 'Regression' 3 = 'Neural Network';
run;
filename sasdata 'c:\AutoReg\housecon.txt';
/* SAS/ETS Software Application Guide */
/* Chapter 9 page 152-153 */
* Read private housing units started data;
data housecon;
   infile sasdata;
   retain time 0;
   input date:monyy5. constr intrate hstarts @@;
   lagh1 = LAG(hstarts);   * Create one period behind target lag variable;
   lagh2 = LAG2(hstarts);  * Create second period behind target lag var;
   time = time + 1;
run;
* Fit the regression model with second-order autoregressive AR(2) errors
  by maximum likelihood;
proc autoreg data=housecon;
   model hstarts = constr intrate / nlag=2 method=ml;
   output out=autoreg p=pred r=resid;
run;
* Alternatively, fit an autoregressive AR(2) model using PROC ARIMA;
proc arima data=housecon;
   * The IDENTIFY statement names the target variable to predict in VAR=
     and lists the candidate input variables in CROSSCOR=, which is also
     where any differencing of the input variables would be specified.
     The INPUT= option of the ESTIMATE statement selects the input
     variables that enter the regression model. In this example the
     input variables are not differenced;
   identify var=hstarts crosscor=(constr intrate);
   estimate p=2 input=(constr intrate) method=ml;
run;
* Read the neural network prediction estimates for the training and test
  data sets. The scored data sets are created by selecting the Process
  and Score: Training, Validation or Test check box within the Output tab
  of the Neural Network node;
data neural(keep=date p_hstarts);
   set emdata.strnhttl
       emdata.stst9bld;
run;
* Combine the historical values with the autoregressive and neural
  network forecasting estimates by stacking the three data sets, with
  GROUP identifying the source of each observation;
data final;
   set housecon (in=ina rename=(hstarts=pred))
       autoreg  (in=inb)
       neural   (in=inc rename=(p_hstarts=pred));
   if ina then group = 1;
   else if inb then group = 2;
   else if inc then group = 3;
run;
* Plot the Neural Network estimates with the Auto-Regressive estimates
over time;
axis1 label=(c=black f= zapf h=1 j=c 'Date')
value=(h=1 f=zapf)
minor=none
width=3;
axis2 label=(c=black f= zapf h=1 j=c 'Housing Starts')
value=(h=1 f=zapf)
minor=none
order=(0 to 200 by 50)
width=3;
title1 font=zapf height=1.5 'Autoregression Estimates';
title2 font=zapf height=1.2
'Private Housing Started Measured in Thousands of Units';
symbol1 i=join value=dot color=black height=1;
symbol2 i=join value=circle color=black height=1;
symbol3 i=join value=star color=black height=1;
legend1 label=(f=zapf h=1 '') value=(j=c h=1 f=zapf) position=center;
proc gplot data=final;
plot pred*date=group / haxis=axis1
vaxis=axis2
legend=legend1;
footnote5 font=swiss justify=left height=1 'Autoregression Model: HOUSING STARTS = CONTRACTS + INTRATE + LAG(Y) + LAG2(Y) + Error';
footnote6 font=swiss justify=left height=1
"Data Source: SAS/ETS Software Application Guide 1";
format group grpfmt.;
run;
* Perform a diagnostic analysis of the residuals from the time series
  model. Check the Q chi-square statistics and the ACF and PACF plots of
  the residuals for white noise, which indicates an adequate model fit;
proc arima data=autoreg;
identify var=resid;
run;