Applied Regression: SPSS Reference Sheet
Biostat 503.881 Fall 2012, University of Michigan
Exploratory Analysis
Create a Histogram: Graphs → Legacy Dialogs → Histogram → move desired variable into “Variable” field (change title if desired)
Create a Boxplot: Graphs → Legacy Dialogs → Boxplot → Simple → select “Summaries of separate variables” (if you want a boxplot for one variable) → move variable into “Boxes Represent”
QQ Plot: Analyze → Descriptive Statistics → Q-Q Plots → move desired variable into “Variable” field
Descriptive Stats: Analyze → Descriptive Statistics → Descriptives → enter variable into field → select the “Options” button to change/specify desired statistics
Frequency Counts: Analyze → Descriptive Statistics → Frequencies → enter variable into field
Correlation matrix: Analyze → Correlate → Bivariate → enter desired variables into box (at least 2) and select appropriate method (Pearson, Spearman)
Simple scatter plot (e.g., SBP vs. weight): Graphs → Legacy Dialogs → Scatter/Dot → click Simple Scatter, then Define → move desired variable into “Y Axis” field, and another into the “X Axis” field → add a title.
You may add a grouping variable into a scatterplot, to separate smokers from non-smokers, for example. You can:
A. Create color-coded groups within a single scatterplot: move a variable into the “Set markers by” box in the scatterplot dialog box. Smokers would appear in one color, and non-smokers in another, in the same scatterplot.
B. Create separate panels for each subgroup: move a variable into the “Rows” box under “Panel by.” You can also panel by columns. This creates two separate scatterplots for your data, one for smokers, and another for non-smokers.
Scatterplot: Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define → move desired variables into X-axis and Y-axis fields
Matrix scatterplot: Graphs → Legacy Dialogs → Scatter/Dot → Matrix Scatter → Define → move desired variables into Matrix Variables field
Frequency Tables: Analyze → Descriptive Statistics → Crosstabs → enter desired row and column variables → click “Cells” → make sure “Observed” is checked; make sure “Column” is checked (this gives the percentage within each column; you may also check “Row” if you want the percentage within each row)
Select (delete) cases: Data → Select Cases → select "If condition is satisfied" → press the "If" button below it → type in the desired condition, such as “Diabetes=0” to include only non-diabetics → click Continue → back in the "Select Cases" window, select "Delete unselected cases" under "Output" → click OK
(This procedure keeps the non-diabetics (Diabetes=0) and deletes the diabetic cases from the dataset.)
Basic Statistical Methods
Correlation: Analyze → Correlate → Bivariate → enter all variables you would like pairwise correlations for → make sure “Two-tailed” is selected; select whether you want “Pearson” or “Spearman” (Pearson is standard)
One-Way ANOVA: Analyze → Compare Means → One-Way ANOVA → enter “Dependent Variable” (continuous outcome) → enter “Factor” variable (categorical variable that breaks the outcome into groups) → if multiple comparisons are desired: click the Post Hoc button → check desired test (“Bonferroni”, “Tukey”, etc.)
Transforming a Variable: Transform → Compute Variable → in the “Target Variable” field, enter the desired name of the new variable (note: cannot contain spaces) → in the “Numeric Expression” box, enter the desired transformation.
***For the specific example of Centering a Covariate: if you want to center a variable Age, and the mean age is 46, then move Age into “Numeric Expression”, enter a minus sign with the keypad, then use the keypad to type in “46” (the expression reads Age - 46) → click OK.
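To make the centering step concrete, here is a minimal Python sketch of the same arithmetic. The ages are hypothetical values chosen so that the mean is 46, as in the example; this only illustrates the calculation and is not SPSS output.

```python
# Hypothetical ages (assumption: chosen so the mean is 46, as in the example).
ages = [40, 44, 46, 48, 52]
mean_age = sum(ages) / len(ages)

# Same arithmetic as the Numeric Expression "Age - 46" in Compute Variable.
age_centered = [age - mean_age for age in ages]

print(mean_age)      # 46.0
print(age_centered)  # [-6.0, -2.0, 0.0, 2.0, 6.0]
```

After centering, the new variable has mean 0, so the regression intercept refers to a subject of average age.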
Create Interaction Variable, Switch Values of Indicator Variables: see “More Regression Techniques,” below
Two-independent-samples t-test: Analyze → Compare Means → Independent-Samples T Test.
Class example: Select fasting glucose as the Test Variable and exercise at least 3 times a week as the Grouping Variable (make sure you define the groups)
Linear Regression
Simple or Multiple Linear Regression Equation: Analyze → Regression → Linear → insert “Dependent” variable (outcome) → insert “Independent” variable(s) (predictor(s)/covariate(s)) → click OK
To add Confidence Intervals for Coefficients: follow the above, and in the main dialog box, click the “Statistics” button → check “Confidence intervals” under the “Regression Coefficients” category → change the confidence level if you want anything other than 95% (“Estimates” gives your coefficient estimates, the β̂’s, but that option should already be checked.)
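For intuition about what the coefficient estimates are in the simple (one-predictor) case, the sketch below computes the least-squares slope and intercept by hand. The weight and SBP values are hypothetical, and the variable names are illustrative only.

```python
# Hypothetical data: weight (predictor) and systolic blood pressure (outcome).
weight = [60.0, 70.0, 80.0, 90.0, 100.0]
sbp = [110.0, 118.0, 121.0, 130.0, 136.0]

n = len(weight)
mean_x = sum(weight) / n
mean_y = sum(sbp) / n

# Least-squares estimates for simple linear regression:
# slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weight, sbp))
sxx = sum((x - mean_x) ** 2 for x in weight)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predicted values and residuals (what the Save button stores in SPSS).
predicted = [intercept + slope * x for x in weight]
residuals = [y - p for y, p in zip(sbp, predicted)]

print(round(slope, 2), round(intercept, 1))  # 0.64 71.8
```

With an intercept in the model, the residuals sum to zero up to rounding error.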
Checking Assumptions (Plots, Predicted Values, Residuals, DFBetas): Analyze → Regression → Linear → enter Dependent and Independent variables as usual → hit “Plots” → check “Histogram” and “Normal Probability Plot” (normality) → check “Produce all Partial Plots” (linearity) → click Continue → click the Save button → check “Unstandardized” under Predicted Values (saves the predicted values, the ŷ’s) → check “Standardized” under Residuals (saves the standardized residuals) → check “DFBeta(s)” under Influence Statistics (checks for influential observations) → click Continue and OK
Plot Residual vs. Predicted Value (Check Constant Variance Assumption): perform a linear regression as usual → hit the Plots button → move ZRESID to the Y-axis box → move ZPRED to the X-axis box → click Continue and OK
Assess Linearity with a Smooth Plot: create a scatter plot as usual → after you have created it, double-click the scatterplot, and the Chart Editor will pop up → click the icon that looks like a scatter with a line through it (hover your mouse and it says “Add Fit Line at Total”) → check the “Loess” option → click the Lines tab → pick your favorite color → increase weight for a thicker line, if desired → hit Apply, Close, and click back in the Output screen
More Regression Techniques
Convert a Continuous Variable Into Categories: Analyze → Descriptive Statistics → Frequencies → move in variable → hit the Statistics button → check “Cut Point For” and enter the number of categories you would like (ex: for 5 categories, enter 5) → Transform → Compute Variable → enter appropriate range values, based on your cut points (you are basically “slicing” the data into 5 pieces, with the cut points based on the values of the percentiles). This creates 5 different indicator variables, G1, G2, G3, G4, and G5.
Construct Ordered Categories Variable: Follow the instructions above for converting a continuous variable into categories. Now we want to combine all of these indicator variables into one ordered variable. Say you have 5 indicator variables, G1, G2, G3, G4, and G5. Then in Transform → Compute Variable, after naming the Target Variable, enter the following in the Numeric Expression box: G1+2*G2+3*G3+4*G4+5*G5. The new variable takes the value 1 for subjects in G1, 2 for subjects in G2, etc.
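The two steps above (slice at percentile cut points, then combine the indicators) can be sketched in Python as follows. The data are hypothetical, `group_of` is an illustrative helper, and the percentile definition used by Python's `statistics.quantiles` may differ slightly from the one SPSS uses.

```python
import statistics
from bisect import bisect_left

# Hypothetical continuous variable measured on 20 subjects.
values = list(range(1, 21))

# Four cut points split the data into five groups (quintiles).
cut_points = statistics.quantiles(values, n=5)

def group_of(v):
    """Return the group number 1..5; a value equal to a cut point
    is placed in the lower group (an arbitrary choice for this sketch)."""
    return bisect_left(cut_points, v) + 1

# Indicator variables G1..G5: 1 if the subject falls in that group, else 0.
G = {k: [1 if group_of(v) == k else 0 for v in values] for k in range(1, 6)}

# Ordered category variable: G1 + 2*G2 + 3*G3 + 4*G4 + 5*G5.
ordered = [sum(k * G[k][i] for k in range(1, 6)) for i in range(len(values))]
print(ordered)
```

Because each subject has exactly one indicator equal to 1, the weighted sum simply returns that subject's group number.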
Creating Prediction Plots: create a scatter plot as usual → after you have created it, double-click the scatterplot, and the Chart Editor will pop up → click the icon that looks like a scatter with a line through it (hover your mouse and it says “Add Fit Line at Total”) → click the Fit Line option → check the “Linear” option → check Mean CI’s → hit Apply, Close, and click back in the Output screen
A More Complex Example of Prediction Plots: fit the model, as usual → click Save → check Unstandardized in Predicted Values; save both Mean and Individual Prediction Intervals → create a scatter plot, as usual, of the outcome vs. predicted value → follow the instructions above to fit a line to this scatterplot
Obtain Tolerance: fit a model, as usual → click the Statistics button → check Collinearity Diagnostics (the Tolerance will be a column in the Coefficients table)
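As a reminder of what the Tolerance column measures: the tolerance of a predictor is 1 minus the R² from regressing that predictor on the other predictors, and the variance inflation factor (VIF) is its reciprocal. The sketch below assumes a model with only two predictors, in which case that R² is just their squared Pearson correlation; the data are hypothetical.

```python
# Hypothetical values of two predictors measured on five subjects.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(x1)
m1 = sum(x1) / n
m2 = sum(x2) / n

s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
s11 = sum((a - m1) ** 2 for a in x1)
s22 = sum((b - m2) ** 2 for b in x2)

# With two predictors, R^2 of x1 regressed on x2 equals the squared correlation.
r_squared = s12 ** 2 / (s11 * s22)

tolerance = 1 - r_squared  # small tolerance signals collinearity
vif = 1 / tolerance

print(round(tolerance, 2), round(vif, 2))  # 0.36 2.78
```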
Automatic Model Selection in SPSS: fit a model, as usual → click the Method drop-down → select your desired method of selection
Fit a Stratified Model: fit a model, as usual → move the variable you want to stratify by into the Selection Variable box → hit the Rule button → enter the value of that variable for the subjects you want to include (ex: if males=1, females=0, and you only want to include males in your model, enter a 1)
Creating an Interaction Variable: Transform → Compute Variable → give the variable a meaningful name in Target Variable (ex: if creating an Age*Gender interaction, call it “Age_Gender”) → in the Numeric Expression box, multiply the two factors together
Switch the Values of an Indicator Variable: Transform → Compute Variable → give the variable a meaningful name in Target Variable → in the Numeric Expression box, enter 1 minus your factor (computes 1-1=0 and 1-0=1, which changes 0’s to 1’s and 1’s to 0’s)
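Both Compute Variable recipes above are simple element-wise arithmetic. A small Python sketch, using hypothetical values for Age and a 0/1 Gender indicator:

```python
# Hypothetical data: age in years and a 0/1 gender indicator.
age = [30, 45, 60, 52]
gender = [1, 0, 1, 0]

# Interaction variable: the product of the two factors (Age * Gender).
age_gender = [a * g for a, g in zip(age, gender)]

# Switched indicator: 1 minus the factor turns 0's into 1's and 1's into 0's.
gender_switched = [1 - g for g in gender]

print(age_gender)       # [30, 0, 60, 0]
print(gender_switched)  # [0, 1, 0, 1]
```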
Odds, Logistic Regression
Contingency Table: Analyze → Descriptive Statistics → Crosstabs → enter Column and Row variables → if you want expected counts: click “Cells” → check off “Expected”
Chi-Square Test: Analyze → Descriptive Statistics → Crosstabs → enter Column(s) and Row(s) → click “Statistics” → check off “Chi-square”
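To see where the expected counts and the Pearson chi-square statistic come from, here is a minimal sketch for a hypothetical 2×2 table. It computes the uncorrected Pearson statistic only; SPSS prints additional versions (for example, a continuity-corrected one for 2×2 tables).

```python
# Hypothetical 2x2 table of observed counts (rows: exposure, columns: outcome).
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell = (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)                  # [[20.0, 30.0], [20.0, 30.0]]
print(round(chi_square, 2), df)  # 16.67 1
```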
Binary Logistic Regression: Analyze → Regression → Binary Logistic → enter Dependent variable and Covariate(s) → click “Options” and check CI for exp(B) if desired
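The exp(B) column is the odds ratio for a one-unit increase in the covariate, and its confidence interval is obtained by exponentiating the interval for B. The sketch below assumes a Wald-type 95% interval, B ± 1.96·SE; the coefficient and standard error are hypothetical values, not output from any dataset.

```python
import math

# Hypothetical logistic regression output for one covariate.
b = math.log(2)  # coefficient B (log odds ratio), chosen so exp(B) is 2
se = 0.25        # standard error of B

odds_ratio = math.exp(b)

# Wald 95% CI on the log-odds scale, then exponentiated.
z = 1.96
ci_lower = math.exp(b - z * se)
ci_upper = math.exp(b + z * se)

print(round(odds_ratio, 2), round(ci_lower, 2), round(ci_upper, 2))
```

The interval is symmetric around B on the log scale, so it is not symmetric around exp(B).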
Use a Categorical Variable as Several Indicators in Logistic Regression: follow steps in slide 14 → click “Categorical” → move the desired variable into the “Categorical Covariates” box → choose which category you want as the reference (first or last) → click “Change” → then click Continue
Change Scale of a Variable for Logistic Regression: Transform → Compute Variable → name new variable in Target Variable → enter the numeric expression you want
Logistic Model with Interaction Term: run logistic model, as usual → from the variable list, select your first covariate, hold the Control key down, and then select your second covariate (both will be highlighted) → then click the “>a*b>” button (SPSS will create the interaction variable for us!) → click OK
Logistic Model, Check for Influential Observations: run logistic model, as usual → click “Save” → under “Influence”, check off “DFBetas”
Remove Influential Observations: to find the person you want, sort by id: Data → Sort Cases → Sort by: enter “id” → scroll to find the influential person you previously identified → select the row → left click → select “Cut”
Obtain Predicted Probabilities for Individuals: run logistic model, as usual → click “Save” → under “Predicted”, check off “Probabilities”
Plot of Predicted Probabilities: complete steps above to obtain predicted probabilities → Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → enter predicted probability in Y-axis → enter covariate in X-axis → enter the variable you want to color code by in the “Set Markers by” box
Area Under the ROC Curve: complete steps above to obtain predicted probabilities → Analyze → ROC Curve → Test Variable: enter predicted probability → State Variable: enter outcome → Value of State Variable: enter 1 → check off “With diagonal reference line”
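One way to interpret the area under the ROC curve: it is the proportion of (case, non-case) pairs in which the case has the higher predicted probability, counting ties as one half. The sketch below computes it that way from hypothetical predicted probabilities and outcomes; it is an illustration of the definition, not the SPSS algorithm.

```python
# Hypothetical predicted probabilities and observed outcomes (1 = event).
predicted = [0.90, 0.80, 0.40, 0.35, 0.30, 0.10]
outcome = [1, 1, 0, 1, 0, 0]

cases = [p for p, y in zip(predicted, outcome) if y == 1]
non_cases = [p for p, y in zip(predicted, outcome) if y == 0]

# Score each (case, non-case) pair: 1 if the case ranks higher, 0.5 if tied.
score = 0.0
for pc in cases:
    for pn in non_cases:
        if pc > pn:
            score += 1.0
        elif pc == pn:
            score += 0.5

auc = score / (len(cases) * len(non_cases))
print(round(auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to the diagonal reference line (no discrimination), and 1.0 to perfect separation.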
Poisson Regression
1) Create Offset Manually: Transform → Compute Variable → name the new variable and take the natural log of the time variable that will be your offset (Arithmetic → Ln)
2) Regression Model: GLM → Poisson Log Linear → click the “Response” tab → enter response variable
3) Enter Predictors: click the “Predictors” tab: enter any categorical predictors into the “Factors” box → click “Options” for reference group specification; enter any continuous predictors into the “Covariates” box
4) Enter Offset Variable/Finalize Model: enter the offset variable in its box → click the “Model” tab → enter the factors you want SPSS to put in the current model → if you want interactions, SPSS can create them for you here directly, so you do not need to create them in the dataset
5) Specify Statistics: click the “Statistics” tab → select desired statistics (parameters, RR’s, etc.)
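To illustrate why the offset in step 1 is the natural log of time: with a log link, log(expected count) = log(time) + b0 + b1·x, so the expected count is time × exp(b0 + b1·x) and the model describes a rate. The sketch below uses hypothetical follow-up times and hypothetical coefficients (`b0`, `b1`); nothing here is fitted from data.

```python
import math

# Hypothetical follow-up time (person-years) and a 0/1 exposure indicator.
person_years = [120.5, 300.0, 45.25]
exposed = [0, 1, 1]

# Step 1: the offset is the natural log of the time variable.
log_time = [math.log(t) for t in person_years]

# Hypothetical coefficients, as if taken from a fitted Poisson model.
b0, b1 = -3.0, 0.5

# Expected counts include the offset with its coefficient fixed at 1.
expected_counts = [math.exp(off + b0 + b1 * x) for off, x in zip(log_time, exposed)]

# Dividing by time recovers the rate; exp(b1) is the rate ratio (RR).
rates = [e / t for e, t in zip(expected_counts, person_years)]
rate_ratio = math.exp(b1)

print([round(r, 4) for r in rates], round(rate_ratio, 3))
```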
Survival Analysis, Cox Regression
Kaplan-Meier Curves: Analyze → Survival → Kaplan-Meier… → enter your time variable → enter the outcome variable you are interested in, in the Status box (death, injury, etc.) → click “Define Event” → enter the value of the variable that indicates the event has occurred (ex: if you are modeling death and 1 indicates death, enter 1) → click Continue → click Save → check off Survival and Standard error of survival → Continue → Options → check off Survival under Plots → Continue → OK
Log Rank Test: follow the steps above for Kaplan-Meier, but also: add the variable for which you want to test groups in the “Factor” box → hit “Compare Factor” → check off Log rank → Continue → OK
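For reference, the Kaplan-Meier survival estimate that SPSS plots is a running product: at each event time, the current survival is multiplied by (1 − events / number at risk). The sketch below is a minimal implementation on hypothetical data; `kaplan_meier` is an illustrative helper name, and standard errors are not computed.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_estimate), ...].

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        n_events = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: six subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored subjects contribute to the number at risk up to their censoring time but never count as events.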
Cox Regression:
1) Basic Set-up: Analyze → Survival → Cox Regression → enter the time variable → enter the outcome variable in the Status box → click Define Event → enter the value that indicates the event has happened (usually coded as a 1) → Continue
2) Put Covariates in Model: enter all covariates in the Covariates box → if any covariates are categorical, click the Categorical button → move variables into the Categorical Covariates box → specify first or last reference category → CLICK CHANGE to save what you just did → Continue
3) Specify Plots: click Plots → check off Survival under plot type → if you want separate lines for a category, move that variable into the “Separate Lines for” field → Continue
4) Obtain CI’s for the Hazard Ratio: click Options → check off CI for exp(B) → Continue → OK
Other resources
Excellent additional general tutorials and specific guides for SPSS and other software.
Another guide: tests, ANOVA, linear regression and steps on determining which to use.