Applied Regression: SPSS Reference Sheet

Biostat 503.881 Fall 2012, University of Michigan

Exploratory Analysis
Create a Histogram: Graphs → Legacy Dialogs → Histogram → move desired variable into “Variable” field → (Change title if desired)
Create a Boxplot: Graphs → Legacy Dialogs → Boxplot → Simple → Select “Summaries of separate variables” (if you want a boxplot for one variable) → move variable into “Boxes Represent”
QQ Plot: Analyze → Descriptive Statistics → Q-Q Plots → move desired variable into “Variable” field
Descriptive Stats: Analyze → Descriptive Statistics → Descriptives → Enter variable into field → Select the “Options” button to change/specify desired statistics
Frequency Counts: Analyze → Descriptive Statistics → Frequencies → Enter variable into field
Correlation matrix: Analyze → Correlate → Bivariate → Enter desired variables into box (at least 2) and select appropriate method (Pearson, Spearman)
Simple scatter plot (e.g., SBP vs. weight): Graphs → Legacy Dialogs → Scatter/Dot → Click Simple Scatter, then Define → move desired variable into “Y Axis” field, and another into the “X Axis” field. Add a title.
You may add a grouping variable into a scatterplot, to separate smokers from non-smokers, for example. You can:
A. Create color-coded groups within a single scatterplot: move a variable into the “Set markers by” box in the scatterplot dialog box. Smokers would appear in one color, and non-smokers in another, in the same scatterplot.
B. Create separate panels for each subgroup: move a variable into the “Rows” box under “Panel by.” You can also panel by columns. This creates two separate scatterplots for your data, one for smokers, and another for non-smokers.
Scatterplot: Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define → Move desired variables into X-axis and Y-axis fields
Matrix scatterplot: Graphs → Legacy Dialogs → Scatter/Dot → Matrix Scatter → Define → Move desired variables into Matrix Variables field
Frequency Tables: Analyze → Descriptive Statistics → Crosstabs → Enter desired row and column variables → Click “Cells” → Make sure “Observed” is checked; make sure “Column” is checked (this gives the percentage within each column; you may also check “Row” if you want the percentage within each row)
Select (delete) cases: Data → Select Cases → Select "If condition is satisfied" → press the "If" button below it → Type in desired condition, such as “Diabetes=0” to include only non-diabetics → Click Continue → Back in the "Select Cases" window, select "Delete unselected cases" under "Output" → Click OK
(This procedure selects non-diabetics (Diabetes=0) and deletes the diabetic cases from the dataset.)
Basic Statistical Methods
Correlation: Analyze → Correlate → Bivariate → Enter all variables you would like pairwise correlations for → Make sure “Two-tailed” is selected; select whether you want “Pearson” or “Spearman” (Pearson is standard)
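For reference, a minimal Python sketch (outside SPSS) of what the two methods compute. The data and the function names pearson, ranks, and spearman are made up for illustration; Spearman is simply Pearson applied to the ranks.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: weight and systolic blood pressure for 5 subjects
weight = [60, 72, 80, 95, 110]
sbp = [110, 118, 121, 140, 190]
print(pearson(weight, sbp), spearman(weight, sbp))
```

In this made-up example the relationship is monotone but not linear, so Spearman equals 1 while Pearson is below 1.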
One-Way ANOVA: Analyze → Compare Means → One-Way ANOVA → Enter “Dependent Variable” (continuous outcome) → Enter “Factor” Variable (categorical variable to break down outcome into groups) → If multiple comparisons desired: Click Post Hoc button → check desired test (“Bonferroni”, “Tukey”, etc.)
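To see where the F statistic in the ANOVA table comes from, here is a minimal Python sketch under made-up data. The function name one_way_anova_f is illustrative and not an SPSS command; the p-value lookup is omitted.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [v for g in groups for v in g]
    grand = mean(all_obs)
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(all_obs) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Hypothetical outcome values in three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```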
Transforming a Variable: Transform → Compute Variable → In the “Target Variable” field, enter the desired name of the new variable (note: cannot contain spaces) → In the “Numeric Expression” box, enter the desired transformation.
***For the specific example of Centering a Covariate: if you want to center a variable Age, and the mean age is 46, then move Age into “Numeric Expression”, then enter a minus sign with the keypad, then use the keypad to type in “46” (the expression reads Age - 46) → Click OK.
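The same centering step, sketched in Python with made-up ages whose mean happens to be 46 (the variable names are illustrative):

```python
from statistics import mean

# Hypothetical ages; their mean is 46, matching the example above
age = [30, 40, 46, 52, 62]
age_mean = mean(age)

# Centered covariate: each value minus the mean
age_c = [a - age_mean for a in age]
print(age_mean, age_c)
```

After centering, the new variable has mean 0, so the regression intercept refers to a subject of average age.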
Create Interaction Variable, Switch Values of Indicator Variables: see “More Regression Techniques,” below
Two-independent-samples t-test: Analyze → Compare Means → Independent-Samples T Test.
Class example: Select fasting glucose as the Test Variable and exercise at least 3 times a week as the Grouping Variable (make sure you define the groups)
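As a rough illustration of the equal-variances version of this test, a minimal Python sketch with made-up glucose values. The function name pooled_t is illustrative; SPSS also reports a row for unequal variances, which is not computed here.

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se, na + nb - 2

# Hypothetical fasting glucose values by exercise group
exercisers = [90, 95, 100, 105, 110]
non_exercisers = [100, 105, 110, 115, 120]
t_stat, df = pooled_t(exercisers, non_exercisers)
print(t_stat, df)
```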
Linear Regression
Simple or Multiple Linear Regression Equation: Analyze → Regression → Linear → Insert “Dependent” variable (outcome) → Insert “Independent” Variable(s) (predictor(s)/covariate(s)) → Click OK
To add Confidence Intervals for Coefficients: follow the above, and in the main dialog box → Click “Statistics” Button → Click “Confidence Intervals” under “Regression Coefficients” category → change the confidence level, if you want anything other than 95% → (“Estimates” gives your estimated coefficients, the β’s, but that option should already be checked.)
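For a single predictor, the estimates and the confidence interval for the slope can be sketched in Python as follows. The data are made up, the function name simple_ols is illustrative, and the t critical value is supplied by hand (from a t table) rather than computed.

```python
from statistics import mean

def simple_ols(x, y, t_crit):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, (ci_low, ci_high)), where the interval is for b1.
    t_crit is the t critical value for n-2 degrees of freedom, supplied
    by the caller; it is not computed here.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual sum of squares and standard error of the slope
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = (sse / (n - 2) / sxx) ** 0.5
    return b0, b1, (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# Hypothetical data; 3.182 is the 97.5th percentile of t with 3 df (95% CI)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1, ci = simple_ols(x, y, t_crit=3.182)
print(b0, b1, ci)
```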
Checking Assumptions: Plots, Predicted Values, Residuals, DFBetas: Analyze → Regression → Linear → Enter Dependent and Independent Variables as usual → Hit “Plots” → Check “Histogram” and “Normal Probability Plot” (Normality) → Check “Produce all Partial Plots” (Linearity) → Click Continue → Click Save Button → Check “Unstandardized” under Predicted Values (saves the predicted values, the ŷ’s) → Check “Standardized” under Residuals (saves the standardized residuals) → Check “DFBeta(s)” under Influence Statistics (checks for influential observations) → Click Continue and OK
Plot Residual vs. Predicted Value (Check Constant Variance Assumption): Perform a Linear Regression as usual → Hit Plots button → Move ZRESID to the Y-axis box → Move ZPRED to the X-axis box → Click Continue and OK
Assess Linearity with a Smooth Plot: Create a scatterplot as usual → after you have created it, double-click the scatterplot, and the Chart Editor will pop up → click the icon that looks like a scatter with a line through it (hover your mouse and it says “Add Fit Line at Total”) → Check the “Loess” Option → Click the Lines Tab → Pick your favorite color → Increase weight for a thicker line, if desired → Hit Apply, Close, and click back in the Output Screen
More Regression Techniques
Convert a Continuous Variable Into Categories: Analyze → Descriptive Statistics → Frequencies → Move in Variable → Hit Statistics Button → Check “Cut points for” and enter the number of equal groups you would like (ex: for 5 categories, enter 5) → Transform → Compute Variable → Enter appropriate range values, based on your cut points (you are basically “slicing” the data into 5 pieces, with the cut points based on the values of the percentiles). This creates 5 different indicator variables, G1, G2, G3, G4, and G5.
Construct Ordered Categories Variable: Follow the instructions above for converting a continuous variable into categories. Now we want to combine these indicator variables into one ordered variable. Say you have 5 indicator variables, G1, G2, G3, G4, and G5. Then in Transform → Compute Variable, after naming the new variable, enter the following in the Numeric Expression box: G1+2*G2+3*G3+4*G4+5*G5. The new variable takes the value 1 for subjects in G1, 2 for subjects in G2, etc.
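The two steps above (slice into indicators, then combine into one ordered variable) can be sketched in Python. The cut points and the function names indicators and ordered are hypothetical; in practice the cut points come from the Frequencies output.

```python
def indicators(x, cuts):
    """Return indicator values [G1, ..., Gk] for value x, given sorted cut points.

    A value equal to a cut point is placed in the lower group.
    """
    bounds = [float("-inf")] + list(cuts) + [float("inf")]
    return [int(bounds[i] < x <= bounds[i + 1]) for i in range(len(bounds) - 1)]

def ordered(ind):
    """Combine indicators into one ordered variable: G1 + 2*G2 + 3*G3 + ..."""
    return sum((k + 1) * g for k, g in enumerate(ind))

# Hypothetical quintile cut points (4 cut points give 5 groups)
cuts = [20, 40, 60, 80]
g = indicators(35, cuts)
print(g, ordered(g))
```

Exactly one indicator equals 1 for each subject, so the sum picks out that subject's group number.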
Creating Prediction Plots: Create a scatterplot as usual → after you have created it, double-click the scatterplot, and the Chart Editor will pop up → click the icon that looks like a scatter with a line through it (hover your mouse and it says “Add Fit Line at Total”) → Click the Fit Line option → Check the “Linear” Option → Check Mean CI’s → Hit Apply, Close, and click back in the Output Screen
A More Complex Example of Prediction Plots: Fit the model, as usual → Click Save → Check Unstandardized in Predicted Values, save both Mean and Individual Prediction Intervals → Create a scatterplot, as usual, of the outcome vs. predicted value → Follow instructions from above to fit a line to this scatterplot
Obtain Tolerance: Fit a model, as usual → Click the Statistics button → Check Collinearity Diagnostics (the Tolerance will be a column in the Coefficients Box)
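As a reminder of what the Tolerance column means: the tolerance of a predictor is 1 minus the R-squared from regressing it on the other predictors, and the VIF is its reciprocal. A minimal Python sketch for the two-predictor case, where that R-squared is just the squared correlation (data and names are made up):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical values of two predictors in the same model
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

# With only two predictors, R^2 of x1 on x2 equals the squared correlation
r = pearson(x1, x2)
tolerance = 1 - r ** 2
vif = 1 / tolerance
print(tolerance, vif)
```

A tolerance near 0 means the predictor is almost a linear combination of the others.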
Automatic Model Selection in SPSS: Fit a model, as usual → Open the “Method” drop-down list → Select your desired method of selection
Fit a Stratified Model: Fit a model, as usual → Move the variable you want to stratify by into the Selection Variable Box → Hit the Rule Button → Enter the value of the variable for the subjects you want to include (Ex: if males=1, females=0, and you only want to include males in your model, enter a 1)
Creating an Interaction Variable: Transform → Compute Variable → Give the variable a meaningful name in Target Variable (Ex: If creating Age*Gender Interaction, call it “Age_Gender”) → In the Numeric Expression Box, multiply the two factors together
Switch the values of an Indicator Variable: Transform → Compute Variable → Give the variable a meaningful name in Target Variable → In the Numeric Expression Box, enter 1 minus your factor (computes 1-1=0 and 1-0=1, which changes 0’s to 1’s and 1’s to 0’s)
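Both Compute Variable expressions above, sketched in Python on made-up values (variable names are illustrative):

```python
# Hypothetical data: age in years and a 0/1 gender indicator
age = [30, 45, 60]
gender = [1, 0, 1]

# Interaction variable: product of the two factors (Age*Gender)
age_gender = [a * g for a, g in zip(age, gender)]

# Switched indicator: 1 minus the original indicator
gender_switched = [1 - g for g in gender]

print(age_gender, gender_switched)
```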
Odds, Logistic Regression
Contingency Table: Analyze → Descriptive Statistics → Crosstabs → Enter Column and Row Variables → If you want Expected Counts: Click “Cells” → check off “Expected”
Chi-Square Test: Analyze → Descriptive Statistics → Crosstabs → Enter Column(s) and Row(s) → Click “Statistics” → check off “Chi-square”
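The expected counts and the Pearson chi-square statistic reported by Crosstabs can be sketched in Python as follows. The table is made up and the function name chi_square is illustrative; the p-value lookup is omitted.

```python
def chi_square(table):
    """Return (expected counts, chi-square statistic, df) for a table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    # Expected count in each cell: row total * column total / grand total
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return expected, stat, df

# Hypothetical 2x2 table of observed counts
table = [[10, 20], [30, 40]]
expected, stat, df = chi_square(table)
print(expected, stat, df)
```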
Binary Logistic Regression: Analyze → Regression → Binary Logistic → Enter Dependent Variable and Covariate(s) → Click “Options” and check CI for exp(B) if desired
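The exp(B) column is the odds ratio, and its confidence interval is obtained by exponentiating the interval for B. A minimal Python sketch, assuming a made-up coefficient and standard error (the function name odds_ratio_ci is illustrative):

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """Return (odds ratio, lower, upper) from a coefficient B and its SE.

    z = 1.96 gives an approximate 95% interval.
    """
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical coefficient: B = ln(2), so the odds ratio is 2
or_est, or_low, or_high = odds_ratio_ci(math.log(2), 0.2)
print(or_est, or_low, or_high)
```

The interval is symmetric around B on the log scale, so it is not symmetric around the odds ratio itself.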
Use a Categorical Variable as several Indicators in Logistic Regression: Follow steps in slide 14 → Click “Categorical” → Move the desired variable into “Categorical Covariates” box → Choose which category you want as the reference (first or last) → Click “Change” → Then click Continue
Change Scale of a Variable for Logistic Regression: Transform → Compute Variable → Name new variable in Target Variable → Enter the numeric expression you want
Logistic Model with Interaction Term: Run Logistic Model, as usual → From the variable list, select your first covariate, hold the control key down, and then select your second covariate (both will be highlighted) → Then click the “>a*b>” button (SPSS will create the interaction variable for us!) → Click OK
Logistic Model - Check for Influential Observations: Run Logistic Model, as usual → Click “Save” → Under “Influence”, check off “DfBeta(s)”
Remove Influential Observations: To find the person you want, sort by id: Data → Sort Cases → Sort by: enter “id” → Scroll to find the influential person you previously identified → Select row → Right-click → Select “Cut”
Obtain Predicted Probabilities for Individuals: Run Logistic Model, as usual → Click “Save” → Under “Predicted Values”, check off “Probabilities”
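The saved probabilities come from applying the inverse logit to the fitted linear predictor. A minimal Python sketch for one covariate, with made-up coefficients (the function name predicted_probability is illustrative):

```python
import math

def predicted_probability(b0, b1, x):
    """Inverse logit of the linear predictor b0 + b1*x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Hypothetical fitted coefficients: intercept -4.0, slope 0.1 per unit of x
for x in (20, 40, 60):
    print(x, predicted_probability(-4.0, 0.1, x))
```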
Plot of Predicted Probabilities: Complete steps above to obtain predicted probabilities → Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Enter Predicted Probability in Y-axis → Enter covariate in X-axis → Enter variable you want to color code by in the “Set Markers by” box
Area Under the ROC Curve: Complete steps above to obtain predicted probabilities → Analyze → ROC Curve → Test Variable: enter predicted probability → State Variable: enter outcome → Value of State Variable: enter 1 → check off “With diagonal reference line”
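One way to interpret the area under the ROC curve: it is the proportion of (event, non-event) pairs in which the event has the higher predicted probability, with ties counted as one half. A minimal Python sketch on made-up values (the function name auc is illustrative):

```python
def auc(probs, outcomes):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and observed outcomes (1 = event)
probs = [0.1, 0.4, 0.35, 0.8]
outcomes = [0, 0, 1, 1]
print(auc(probs, outcomes))
```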
Poisson Regression
1) Create Offset Manually: Transform → Compute Variable → Name the new variable and take the natural log of the time variable that will be your offset (Arithmetic → Ln)
2) Regression Model: Analyze → Generalized Linear Models → Generalized Linear Models → select “Poisson loglinear” → click “Response” tab, enter response variable
3) Enter Predictors: Click “Predictors” tab: enter any categorical predictors into the “Factors” box → click “Options” for reference group specification; enter any continuous predictors into the “Covariates” box
4) Enter Offset Variable/Finalize Model: Enter Offset Variable in box → click “Model” tab → Enter the desired factors you want SPSS to put in the current model → if you want interactions, SPSS can create them for you directly here, so you do not need to create them in the dataset
5) Specify Statistics: Click the “Statistics” tab → select desired statistics (parameters, RR’s, etc.)
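To see what the offset does in the fitted model, here is a minimal Python sketch with one predictor and made-up coefficients. The function name expected_count is illustrative; the model is log(expected count) = ln(time) + b0 + b1*x, so exp(b1) is the rate ratio.

```python
import math

def expected_count(time, b0, b1, x):
    """Expected count under a Poisson log-linear model with offset ln(time)."""
    offset = math.log(time)  # the offset variable created in step 1
    return math.exp(offset + b0 + b1 * x)

# Hypothetical coefficients: baseline rate 3 per unit time, rate ratio 2
b0, b1 = math.log(3), math.log(2)
rate_ratio = math.exp(b1)

print(expected_count(2, b0, b1, 0))  # unexposed, 2 units of follow-up time
print(expected_count(2, b0, b1, 1))  # exposed, same follow-up time
print(rate_ratio)
```

Because the offset enters with a fixed coefficient of 1, doubling the follow-up time doubles the expected count without changing the rate ratio.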
Survival Analysis, Cox Regression
Kaplan-Meier Curves: Analyze → Survival → Kaplan-Meier… → Enter your time variable → Enter the outcome variable you are interested in, in the Status box (death, injury, etc.) → Click “Define Event” → Enter the value of the variable that indicates the event has occurred (ex: if you are modeling death and 1 indicates death, enter 1) → Click Continue → Click Save → Check off survival and standard error of survival → Continue → Options → Check off survival under plots → Continue → OK
Log Rank Test: Follow the steps above for Kaplan-Meier, but also: → add the variable for which you want to test groups in the “Factor” box → Hit “Compare Factor” → check off log rank → Continue → OK
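The survival estimates saved by the Kaplan-Meier procedure follow the product-limit formula: at each event time, multiply the running survival by (1 - events / number at risk). A minimal Python sketch on made-up data (the function name kaplan_meier is illustrative; standard errors and the log rank test are not computed):

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events[i] is 1 if the event occurred at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    surv = 1.0
    at_risk = len(data)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_t = [e for tt, e in data if tt == t]
        deaths = sum(same_t)
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Everyone observed at time t (event or censored) leaves the risk set
        at_risk -= len(same_t)
        i += len(same_t)
    return curve

# Hypothetical follow-up times and status (1 = event, 0 = censored)
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```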
Cox Regression:
1) Basic Set-up: Analyze → Survival → Cox Regression → Enter the time variable → Enter the outcome variable in the Status box → Click Define Event → Enter the value that indicates the event has happened (usually coded as a 1) → Continue
2) Put Covariates in Model: Enter all covariates in the covariates box → IF any covariates are categorical, click the Categorical button → move variables into the categorical covariates box → specify first or last reference category → CLICK CHANGE to save what you just did → Continue
3) Specify Plots: Click Plots → Check off survival under plot type → if you want separate lines for a category, move that variable into the “Separate Lines for” field → Continue
4) Obtain CI’s for the Hazard Ratio: Click Options → check off CI for exp(B) → Continue → OK

Other resources

Excellent additional general tutorials and specific guides for SPSS and other software.

Another guide: tests, ANOVA, linear regression and steps on determining which to use.