SPSS #6: Exploring Causality Using Multiple Regression and Path Analysis
(References: Szafran Ch. 9 and 10; Nachmias, Ch. 16 and 17; Agresti and Finlay, Ch. 10-13)
Introduction:
The final SPSS assignment will be the core of your analysis for your major research project. It consists of a multi-part multiple regression analysis and a causal path analysis like the one demonstrated in class during the Nov. 8th lecture. Please print this assignment and complete all applicable sections.
Your multiple regression analysis will consist of 1) your main dependent variable, 2) one or two main independent variables and should incorporate 3) one or two control/intervening variables. For this assignment, do not use more than 5-6 variables since this will make your interpretation much more difficult! If necessary, you can add additional variables or do other statistical tests later on.
This assignment is more extensive than the previous ones you have done in the lab. The deadline for SPSS #6 (this assignment) is Nov. 15th, so you will have two weeks to complete it. Please save all of your syntax/output in a separate file but print only the syntax and output for Part II.
The final SPSS assignment is designed to be the analysis for your major research project. It is an individual assignment that each of you is required to submit. However, lab partners can work co-operatively and can provide each other with suggestions and feedback. The assignment is worth 10% of your mark. Upon completion of the assignment, you will have completed the core analytical component of your final project, although in some cases, further revision may still be necessary to make sense of the variables that you have chosen for your project. The critique that you receive upon submission of the assignment can be used to 'perfect' the SPSS analysis for your project.
In submitting the assignment, attach only the output from Part II. Additional calculations for the path analysis (Part III) can be attached on a separate page. Due Nov. 15th.
The dataset for this assignment is one of the following (as previously stated in your problem):
Canadian Labour Force Survey Jan 2012
Ontario Material Deprivation Survey 2009
Canadian Youth Smoking Survey 2009
City Happiness Survey 2006
Since the dataset that you will be using is larger than the one used for assignments 1-5, you may wish to make some changes to it before you begin. Following are a few suggestions:
a) If you find it confusing to work with an increased number of variables, you could eliminate the variables not needed for your project and save only the variables that you will actually be using, or may conceivably use later on. Go to File>Save As...>Variables and uncheck the unwanted variables, saving the remaining variables in a new .sav file under a different name.
b) If you plan to select for certain cases (for instance, if you only want to look at cases from a certain region or province), do so by going to Data>Select Cases and save the result as your new .sav file.
c) Once you have your new .sav file, make all your changes (recoding, etc.) to this new file, saving the changes as you go. Make sure you do not save over the original data file, in case you need more from it later on!
Part I Preliminary Steps:
1. In a few sentences, outline your revised problem for your major research project.
2. What is your DV?
3. What is (are) your main (one or two) independent variable(s)?
4. What other IV's (one or two controls/intervening variables) are you planning to use?
5. Examine the questions used and the levels of measurement of your dependent variable and your main independent variables. Make sure your DV is at the interval-ratio or ordinal (with at least 5 categories) level. List this information below.
6. If necessary, create an index to use as one of the main variables in your analysis. Your DV should be at the interval-ratio level, and if it is not, you can create an index for your DV. To do this, find a series of related variables with the same answer categories and follow the techniques suggested in Szafran. If desired, you can create more than one index for your analysis, but if you are using ordinal variables with more than five categories, you can use them "as is." If you have created an index, briefly describe it and its associated reliability (use Cronbach's alpha) below.
7. Run frequencies and histograms (Analyze>Descriptive Statistics>Frequencies), asking for a normal curve and skewness/kurtosis statistics for all interval-ratio variables, summated ordinal indexes and ordinal variables. You do not need to print any of this output; just examine it and save the file in order to answer #8 below and for future reference.
8. Check histograms and statistics for skewness and kurtosis. List and comment on the skewness and kurtosis values below. If any variables are extremely skewed (use +/- 2 as a guideline, see text p. 111-2), they are not appropriate for a multiple regression analysis and should be treated differently. If they are only moderately skewed, merely note this below and go on with the assignment. Do not worry about kurtosis, merely report it and comment. For your final project, your best choice is to recode very skewed variables as 'dummy' variables in your multiple regression analysis. Note: Do not use a recoded binary (categorical) variable for your DV! See me or the TA if in doubt about any of this.
9. Create a bivariate correlation matrix (see Ch. 9 p. 218-22) incorporating the above variables, listing your DV first. (Don't worry if not all your variables are at the interval-ratio level, because you are merely checking to see if your proposed relationships between your DV and other variables are 'real'.) Go to Analyze>Correlate>Bivariate and, under Options, select "Exclude cases listwise" (p. 221).
10. Examine the correlations of your DV with the independent variables. Are all variables significantly related to your DV? List the relationships, Pearson's r and p-values (sig. level) below. Use asterisks to indicate significance (* = p < .05, ** = p < .01, *** = p < .001).
11. Examine the correlations between your IV's. Do any of the inter-correlations seem very high (beyond +/- .600), indicating possible multicollinearity (text, p. 248-9)? List the r and p-values between all IV's below, indicating which variables may not be appropriate for this analysis.
12. Drop any variables that are not significantly correlated with your DV unless you have theoretical reasons for keeping them in the model. If there are indications of multicollinearity in the IV's, drop one of the two variables that are highly correlated or choose a different variable (again see text, p. 248-9).
13. Run a revised correlation matrix and check for changes.
14. Revise your causal model. On the separate page attached below, draw your revised causal model, indicating all relationships you expect to find. You will use this diagram for the regression analysis and path analysis in Part II. Submit this page with your assignment.
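The statistics requested in steps 6, 8 and 10-11 can also be checked by hand. The following is a minimal Python sketch, not SPSS syntax and not part of the required output. The function names and the small data set are hypothetical and are there only to show the arithmetic behind Cronbach's alpha, the skewness statistic and Pearson's r. The skewness formula is the bias-adjusted sample version, which is assumed here to correspond to the value SPSS reports.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """Cronbach's alpha for an index.

    items: one list of scores per item, all in the same respondent order.
    """
    k = len(items)
    # Index score for each respondent = sum of that respondent's item scores.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def skewness(values):
    """Bias-adjusted sample skewness (assumed to match the SPSS statistic)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def pearson_r(x, y):
    """Pearson's r for two variables measured on the same cases."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: three 5-point items answered by six respondents.
item_a = [1, 2, 3, 4, 4, 5]
item_b = [2, 2, 3, 3, 5, 5]
item_c = [1, 3, 3, 4, 5, 4]
index = [sum(scores) for scores in zip(item_a, item_b, item_c)]
other_iv = [10, 12, 15, 15, 19, 22]  # hypothetical second variable

print("alpha:", round(cronbach_alpha([item_a, item_b, item_c]), 3))
print("skewness of index:", round(skewness(index), 3))
r = pearson_r(index, other_iv)
print("r:", round(r, 3), "| possible multicollinearity:", abs(r) > 0.6)
```

The last line applies the +/- .600 guideline from step 11; the +/- 2 guideline from step 8 can be applied to the skewness value in the same way.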
Part II Your Multiple Regression Analysis (Note: print and submit the output for this section):
1. Create scatterplots for your DV and all IV's at the interval-ratio, ordinal or summated ordinal (index) level. Check for linearity and homoscedasticity and briefly comment on this below.
2. Do a multiple regression analysis using the process outlined in class and in Ch. 10 that incorporates at least three other variables. If any of your IV's are at the nominal level, you will need to first recode them into binary "dummy" variables to use in the analysis. To do the regression analysis, go to Analyze>Regression>Linear. Use the default method Enter and under Statistics, check off Estimates, Model Fit, Descriptives, and Part and Partial Correlations. Under Options, the default "Exclude cases listwise" should be selected.
3. Examine the Model Summary. Interpret R and Adj. R² below.
4. Examine the ANOVA table. Report and interpret F and p-value below.
5. Examine the Coefficients table. Report and interpret Slopes (b), their significance (t and p-values) and Beta values below.
6. Look at the zero-order and partial correlations. Look for indications of spuriousness and multicollinearity. Comment on any large changes below.
7. Enter the Beta Weights (the path coefficients) from your multiple regression analysis onto your revised model diagram (the one that you will be submitting on the separate page).
8. Use regression to calculate the Betas for the other causal relationships in your model. You will need to run a partial regression analysis for each endogenous IV in your model. Exogenous variables (those that have no prior causes in your model) can be ignored. Enter beta weights for all paths onto your revised model that you have drawn on the separate page.
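For a model with exactly two IV's, the Beta weights reported in the Coefficients table can be recovered from the three bivariate correlations alone. This also shows why a Beta differs from the zero-order r whenever the IV's are correlated with each other (step 6). The following is a minimal Python sketch under that two-IV assumption; the function name and the correlation values are hypothetical, and SPSS itself handles any number of IV's.

```python
def betas_two_ivs(r_y1, r_y2, r_12):
    """Standardized coefficients (Betas) and R-squared for a DV with two IV's.

    r_y1, r_y2: zero-order correlations of the DV with IV 1 and IV 2.
    r_12: correlation between the two IV's.
    Valid only for exactly two IV's and |r_12| < 1.
    """
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    return beta_1, beta_2, r_squared


# Hypothetical correlations taken from a listwise correlation matrix.
b1, b2, r2 = betas_two_ivs(r_y1=0.50, r_y2=0.40, r_12=0.50)
print("Beta 1:", round(b1, 3))
print("Beta 2:", round(b2, 3))
print("R squared:", round(r2, 3))
```

When r_12 is zero, each Beta equals its zero-order correlation. A Beta that drops far below its zero-order r once the second IV is entered is the kind of large change step 6 asks you to comment on.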
Part III Path Analysis: Calculating the Causal Effects
1. Do a complete path analysis according to the guidelines given in class (Nov. 8). Show your calculations below the causal model on the separate page attached to this assignment.
2. Which variable(s) have the greatest causal effect (direct and indirect) on your DV? How much unexplained variability (error or 1 - R²) is there in your model?
3. Are you satisfied with this model? Do you think the model is correctly specified, over-identified (unnecessary variables) or under-identified (missing crucial IV's that should have been included)?
How might you revise or make changes to this model at a later date?
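The path calculations can be checked with a short script. The following is a minimal Python sketch, assuming a recursive model (no feedback loops) and the usual decomposition in which an indirect effect is the product of the Betas along a path and the total effect is the sum over all paths. The variable names, Beta values and R² are hypothetical, and the guidelines given in class take precedence where they differ.

```python
def causal_effects(paths, cause, dv):
    """Return (direct, indirect, total) effects of `cause` on `dv`.

    paths: dict mapping (cause, effect) pairs to Beta weights.
    Assumes the model has no feedback loops.
    """
    direct = paths.get((cause, dv), 0.0)

    def total_from(node):
        # Sum, over every path from node to the DV, of the product of Betas.
        if node == dv:
            return 1.0
        return sum(beta * total_from(effect)
                   for (source, effect), beta in paths.items()
                   if source == node)

    total = total_from(cause)
    return direct, total - direct, total


def unexplained_variance(r_squared):
    """Share of the DV's variability not explained by the model."""
    return 1 - r_squared


# Hypothetical model: x1 -> x2 -> y, plus a direct path x1 -> y.
paths = {("x1", "x2"): 0.50, ("x1", "y"): 0.30, ("x2", "y"): 0.40}

for iv in ("x1", "x2"):
    direct, indirect, total = causal_effects(paths, iv, "y")
    print(iv, "direct:", round(direct, 3),
          "indirect:", round(indirect, 3),
          "total:", round(total, 3))

print("unexplained:", round(unexplained_variance(0.35), 3))  # hypothetical R squared
```

Comparing the total effects across IV's gives the answer to question 2.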
Revised Causal Model with Path Coefficients:
Path Calculations (continue on reverse or attach separate page if necessary):