1

Regress™User’s Guide

Jeffrey H. Moore

Graduate School of Business, Stanford University

Introduction

A statistical regression add-in to Microsoft Excel on Macintosh and Windows computers, Regress[1] makes exploratory regression analysis fast and convenient. It does this by displaying preliminary results in a temporary window that you can easily dismiss during the exploratory modeling phase. This allows you to run many regression models, interactively adding and dropping variables, as you evaluate alternative models for reasonableness and goodness-of-fit.

Since Regress is an add-in to Excel, you have all the normal Excel worksheet capabilities available. This includes, for example, the ability to cut and paste the results into another Excel worksheet, Microsoft Word document, or other Windows or Macintosh compatible application. Moreover, you have complete freedom to change the formatting, the fonts, the size of tables and charts, etc., to suit your particular requirements. This flexibility is rarely present in other dedicated regression packages.

ITutorial[ ]

Getting Started with Regress

To begin, launch Excel as usual.[2] Regress should appear as the last item on the "Tools" menu. See the figure below, left. If you do not see “Regress” listed, it has not been properly installed on your computer (or, if you purchased Regress, you neglected to re-start Excel after installing it on your computer.)

Now, click the “Regress” item on the Tools menu, causing Regress to be loaded into Excel as an Add-In. During the load process, you should see three changes:

(1)Excel will ask if you wish to allow a macro add-in to be opened. Click “Enable Macros” to begin.

(2) A herald window will remind you that you need to open a workbook containing your data. (See the figure below, right.)

(3)Finally, a small toolbar entitled “Regress,” containing three buttons, will appear in the upper right corner of your screen. (See the small figure below.)

When the herald window closes, loading of Regress is complete. Regress will remain loaded into Excel and available until you quit Regress by clicking the “Q” button. Normal spreadsheet functions are unaffected by Regress, and you can use Excel as usual. Since its only function is to initiate loading, the “Regress” item on the Tools menu is removed after the Regress Toolbar appears. Thereafter, all regression modeling is done by clicking each of the three Regress Toolbar buttons, the middle one of which is not yet defined.

The Regress Toolbar “floats” above all Excel windows, and as with any Excel toolbar, you may move it out of the way or dock it by click-dragging its title bar.[3]

A sheet within a workbook containing the data for your regression modeling is called a Regress “datasheet” in this User’s Guide. The topmost row in any Regress datasheet is a special row containing each variable’s name at the top of its column. The actual (numerical-only!) data should begin immediately below the variable names and occupy as many rows as there are cases. (In a database, data set rows are usually called “records;” in data analysis they are frequently called “cases.”) Important: Data must be in columns, each with a variable name at the top, and a missing datum, if any, in a column must be represented as a completely empty cell.

Now, using the standard Excel “File Open” dialog in the File menu, open the MY_DATA.XLS example datasheet, a copy of which should be on your hard disk or on a network disk. (Purchasers of Regress may copy the file from the installation disk.) This datasheet contains the example data which should appear as in the figure below.[4] The first MY_DATA variable is entitled “No.” and is nothing more than a numeric row label for each of the case numbers. The actual variables used in the Tutorial are named My_Var_1, Y, and Age.[5]

Important: Each column variable in a Regress datasheet must be Range Named by Excel. Range names attach a range of worksheet cell addresses to a name, and Regress uses this approach to map the variable name you select to its data. Range names are saved with the datasheet, and so, range naming the data need only be done once. Since Regress accesses the data only through range names, you are free to have non-range-named information (descriptive labels or text, formulas, charts, XY scatterplots, extraneous data, etc.) present elsewhere on your datasheet.

The variables of MY_DATA have already been range named. However, the easiest way to Range-Name a new sheet is to highlight the entire block of the data, including the variable names at the top of the columns, and then select the “Name Create...” item from the Insert menu. Clicking “OK” causes Excel to define each column as a separate range named variable. For example, MY_DATA was range named by highlighting cells A1 through D11, followed by selecting the “Name Create...” menu item from the Insert menu, and then clicking “OK.”

Your first task is to verify MY_DATA’s range names by selecting "Name Define..." from the Insert menu, as below.

As seen in the Define Name dialog box below, you can verify the range for any variable by clicking its name and noticing the range of cells to which that name refers. For example, the name “My_Var_1” has been assigned the range of cells B2 through B11. (Always verify that the variables in your datasheet have the proper ranges associated with them. Otherwise, Regress may not process all your data!) After reviewing the variable ranges, click “OK” to close the Define Name dialog box.

At this point, you should have the Regress Toolbar showing on your screen and have your properly ranged-named datasheet, MY_DATA, opened in Excel. For simplicity, there should be no other workbooks open except your datasheet.[6]

Now you are ready to begin exploratory regressions.

Exploratory Regression Analysis

Be sure your datasheet is open, and click the left “Regression” button on the Regress Toolbar.

Clicking the left button on the Regress Toolbar always displays the Exploratory Regression dialog box, as below. Displayed on the left is an alphabetic list of all the range-named variables in your open datasheet.[7]

The entry at the top right is the dependent variable you wish to predict. The dialog defaults to the first variable in the variable list. It is a drop-down list and you may select another variable to be your dependent variable. Select the variable named “Y” as the dependent variable, as shown below.

Now you are ready to select the independent (predictor) variables used to predict the dependent variable, Y. You do this by selecting one or more variables from the list on the left. The variable(s) that you select will be included in either a simple regression model (selecting one independent variable) or a multiple regression model (selecting more than one independent variable). For now, select the variable Age as the single independent variable you will use to predict values of Y. See below.

Next, click the button to cause the (simple) regression model to be fit to the data. Regress will open a new Excel workbook called “Exploratory Results,” shown below. By default, Exploratory Results displays the names of the dependent and independent variable(s) and two summaries, the “Regression Statistics” and the “Summary Table.”[8]

The Regression Statistics gives information about your model’s goodness of fit (signaled by the Adjusted R-square, “Adj.RSqr”), the Standard Error of the Regression, “Std.Err.,”[9] the number of cases used, “# Cases,” and number of cases dropped because of missing data, “#Missing,” if any.

The Summary Table gives the coefficients of the regression model, “Coeff.,” along with their standard errors, “Std.Err.,” and the resulting statistics, “t-Stat.” and “P-value,” used for hypothesis tests on the coefficients.

At the top of the Exploratory Results window are two buttons that you may click for additional information:

Helpgives a brief summary of your options.

Correlationpresents the correlation matrix among all the pairs of variables in your model.

Because it is just another worksheet, you are free to open new windows and do other work even though the Exploratory Results window is open. Also, you can save Exploratory Results to disk or print it, if you wish. However, these are not its intended functions. Rather, Exploratory Results presents preliminary information about your regression model that you will either dismiss immediately in favor of another model or keep in a more detailed report worksheet.

Keeping a Regression Model

When the Exploratory Results window appears, a change occurs in the Regress Toolbar buttons. All three buttons become defined, as shown on the right.

Clicking each of the Regress Toolbar buttons produces a different action, as documented below.

REGRESSION: Returns to the Exploratory Regression dialog for another round of modeling with this datasheet.

REPORT: Keeps the Exploratory Results model as a new worksheet report within the datasheet’s workbook.

STOP: Stops regression modeling with this datasheet.

Let’s assume that you want to keep this regression model. So, click the middle “Report” button. This produces the “Regression Results” dialog box, as shown below, containing many options for things to compute and keep in your report.

The first entry in the Regression Results dialog box is the name you wish to give for the report worksheet of your model. Regress always proposes to call your first model “Regression #1.” You can leave this name as it is or substitute a more meaningful name of your own choosing.[10].

Change the dialog box’s Regression Name to be “My Regression Model.” You may now select the tables and plots that you would like to save, including if you wish, the Exploratory Results information Regress previously calculated. So, click the Descriptive Statistics option. This will produce another dialog box listing some standard statistics for your variables, as below. These statistics are not regression model statistics, but are computed from your raw data for possible comparisons later with your regression model results.[11] Accept these defaults by clicking the “OK” button, returning you to the “Regression Results” dialog box.

Now, click the boxes to keep the items shown in the dialog below. You need not collect all this information. Of course, you can always check everything to get as much information on the regression model as possible. However, that the more things you check (especially plots) the longer it will take Regress to prepare your report and the more random access memory will be taken to hold your workbook.

Leave the items under “Advanced Options” unchecked. They will be covered later. Now click the OK button. As you can see below, Regress will create a new worksheet with your Regression Name assigned to it containing all the tables and plots, and place it into the workbook containing your datasheet.[12]

At the far left of your worksheet report are small (+,-) symbol buttons of an Excel outline. These buttons allow you to collapse and expand portions of your results so that you can manage the view of information detail on your display screen. By default, the Regression Statistics and Summary Table are expanded, as signaled by the buttons with “-” signs. A button with a “-” sign has no more detail to display; a button with a “+” sign conceals information. You can reveal, i.e., open, the concealed information by clicking any button having a “+” sign, thereby expanding that portion of the worksheet.

For example, click the “+” outline button to expand the Descriptive Statistics, as shown below.

Clicking a “-” sign outline button collapses information. For example, if you click the “-” sign outline button next to the Summary Table, the information subtended by the vertical line above it will be concealed. Remember, the information is not erased; it is simply hidden from view.

Expand the Histogram of Residuals, as shown below.[13] Two histograms are presented; the darker (red) bars are the actual residuals from the model, and the lighter (blue) bars give the height of the theoretical bell-shaped normal curve that you would expect from normally distributed residuals. This allows you to compare visually the shape of the theoretical and the actual frequency histograms to see if the results violate the “normal distribution of residuals” assumption for a regression model.[14]

Next, collapse the Histogram of Residuals and expand the Line Fit Plot., as shown below.

Whenever you define a “simple” regression model, the Line Fit Plot shows the Predicted-Y verses Age regression line itself with the scatter of Actual-Y values around it. (This plot is not possible for models with more than one predictor variable.) The vertical distance between the Actual-Y values and the regression line is the “Residual” or “Error” value for that case, given your current regression model.

Now, collapse the Line Fit Plot and expand the Residual Plots. Notice that two-level outlining is used advantageously for nesting the Residual Plots. You can expand and collapse each of these residual scatter plots by clicking its outline button or collapse all of them as a group.[15]

Click the outline buttons to expand the “Residuals -vs- Predicted Plot,” and the “Age Residual Plot,” as shown below.

Don’t forget that every Regress produced report is an entry in your datasheet’s workbook and is an Excel worksheet with its gridlines and row/column entries turned off. You can change fonts and number formats, copy/paste, annotate, etc., at will.

This completes your first regression model. Now is a good time to save your workbook. So, select the “Save” or “Save As” item from the File menu. Excel will prompt to convert MYDATA.XLS to workbook format during the Save operation. If you do not convert it to a workbook, Excel will save only the active worksheet and not the entire workbook.

How do you go back to create other regression models from this datasheet using different variables? You do this at any time by clicking the left button on the Regress Toolbar. At this point, you have several other options, however.

Clicking the middle “Report” button again will give you the opportunity to collect additional information from the previous model that you might not have checked earlier. To illustrate this, click the Report button. Note the same Regression Results dialog box appears. Regress always remembers your last actions.

If you click OK at this point, you will get a second report that is identical to the first one you kept. Since they would be redundant, un-check most of the tables and plots that you checked previously. Now, instead, check the boxes to keep the Line Fit Plot, Add Confidence Intervals, and Predicted and Residual Matrix, as below. When you check the Predicted and Residual Matrix option, the “Leverage Values” option un-gray’s to allow its selection. Leave this box un-checked for now.

Click the “OK” button. Regress will notice that you did not change your Regression Name and will complain, as below:

Since you are asking for more information on the same regression model, normally, you would “Append” the new information to the previous report worksheet. However, for the sake of illustration, let’s keep this second report in a separate worksheet. So, click the third option (“Save results instead to:”) to place the report into a second worksheet whose name defaults to an incremented version of the previous name, which you can override, if you wish, with another name. For now, accept the new default name by clicking the OK button.

Regress opens a new worksheet using the name of “My Regression Model #2.” Contained in it is a new Statistics and Summary Table, Line Fit Plot, and the Residual Table. Whenever the advanced option “Add Confidence Intervals” is checked, Regress also produces three additional statistics in the Statistics and Summary Table: (1) the regression model’s degrees of freedom, “Deg.Free,” and using the degrees of freedom and the default 95% option, (2) the “back-of-statistics-book” t-table value for a two tail, (1-95%)/2 = 2.5%, hypothesis test, “t(2.5%, 8).” In addition, (3) the 95% confidence intervals are presented for each regression coefficient.

Next, expand the Line Fit plot. Whenever you define a one-predictor regression model and check the “Add Confidence Intervals” option, Regress also adds the 95% confidence interval lines for forecasting the range of individual population values of Y given Age.[16]

In all appropriate tables and plots, such as above, you may, of course, override the 95% default Regress uses by changing the “95%” entry in the earlier dialog box, repeated below.