Moderation Analysis with Regression

Categorical Data Analysis

Packet CD03

Dale Berger,

Claremont Graduate University


Statistics website: http://wise.cgu.edu

This document is designed to aid note taking during the associated PowerPoint presentation and to serve as a reference for later use. It provides selected formulas, figures, SPSS syntax and output, and references, along with detailed explanations of SPSS output.

This document, the associated PowerPoint presentation, SPSS data files, supplemental reading, and other materials are available at http://sakai.claremont.edu under Resources.

Moderation analysis with regression

Examples of moderation (identify X, Y, and Z)

Model of salary for men and women

SPSS example of moderation with a dichotomous moderator

SPSS point-and-click commands

SPSS regression output and interpretations

Figure presenting the findings

Table presenting regression analysis

Dummy moderator variable and centered continuous X variable

Multicollinearity and tolerance

SPSS example of moderation with a continuous moderator

SPSS output and table for presentation

Interpretations

Figure presenting the findings

Summary

References

Appendices: SPSS syntax for moderation analyses

Hayes’ SPSS macro PROCESS

Hayes’ SPSS macro MODMED

Moderation Analysis with Regression

Group differences in treatment effects are often especially important to measure and understand. If a treatment has greater effects for women than for men, we say that sex ‘moderates’ the effects of the treatment. When there is moderation, it may be misleading to describe overall treatment effects without taking group membership into account. Moderation analysis can guide decisions about interpreting effects and redesigning treatments for different groups.

Moderation is interaction. For example, among eighth grade children, drug use by a child can be predicted by drug use of their friends. However, the relationship is weaker for children who have greater parental monitoring. This finding can be displayed by showing that the slope of the regression line predicting the child’s drug use from drug use by friends differs with the level of parental monitoring.

“Parental monitoring moderates the relationship between drug use by children and drug use by their friends, such that the relationship is weaker for children with stronger parental monitoring.”

In general, Variable Z is a moderator of the relationship between X and Y if the strength of the relationship between X and Y depends on the level of Z. A moderator relationship can be illustrated with an arrow from Variable Z (the moderator variable) pointing to the arrow that connects X and Y (see Figure 2). A model can include both mediation and moderation. We can include a path from X to Z, indicating that Z may also mediate the relationship between X and Y. The X-Z path would not affect our analysis of the moderation effect. Z can be a moderator even if it has no direct effect on Y and thus no mediation effect. The effect of X on Y for a specific value of Z is called a simple effect of X on Y for that value of Z.

Figure 2: Model Showing Z Moderating the X-Y Relationship

Examples of moderation (identify X, Y, and Z):

The impact of a program is greater for younger adolescents than for older adolescents.

X =

Y =

Z =

Learning outcomes are positively related to amount of study time for children who use either Book A or Book B, but the relationship is stronger for those who use Book B.

X =

Y =

Z =

The relationship between education and occupational prestige is greater for women than for men.

X =

Y =

Z =

Models of Salary for Men and Women

We wish to test the null hypothesis that the relationship between salary and time on the job is the same for men and women in a large organization. We have data on salary, years on the job, and gender for a random sample of n=200 employees.

Y = salary in $1000s; X1 = years on the job; X2 = gender (men = 0; women = 1)

To test for an interaction, we create a special interaction term, X3 = X1 * X2.

Regression analysis yielded the following model:

Y' = 55.0 + 1.5X1 -3.4X2 + .7X3

With this model, we can predict salary for any individual if we know X1 and X2 for that person.

For men, X2 = 0, and also X3 = 0 because X3 = X1 * X2.

Thus, for men, the regression model simplifies to Y' = 55.0 + 1.5X1.

For women, X2 = 1, so also X3 = X1. Thus, for women, the regression model simplifies to

Y' = 55.0 + 1.5X1 + (-3.4) + .7X1, or Y' = 51.6 + 2.2X1
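The algebra above can be checked numerically. The short Python sketch below (our illustration, not part of the original packet) codes the fitted model and recovers the separate intercepts and slopes for men and women.

```python
# Coefficients from Y' = 55.0 + 1.5*X1 - 3.4*X2 + .7*X3, where X3 = X1 * X2
B0, B1, B2, B3 = 55.0, 1.5, -3.4, 0.7

def predicted_salary(years, female):
    """Predicted salary in $1000s; female = 0 for men, 1 for women."""
    x3 = years * female            # interaction term X3 = X1 * X2
    return B0 + B1 * years + B2 * female + B3 * x3

# Group-specific intercepts (years = 0) and slopes (change per year)
men_intercept = predicted_salary(0, 0)                   # 55.0
women_intercept = predicted_salary(0, 1)                 # 51.6
men_slope = predicted_salary(1, 0) - men_intercept       # 1.5
women_slope = predicted_salary(1, 1) - women_intercept   # 2.2 = 1.5 + .7
```

The slope difference (women_slope - men_slope) is exactly the interaction weight B3 = .7, which is why testing B3 tests the difference in slopes.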

We can use the models for men and women to create a diagram showing the simple effects for men and women. Elements of the original regression equation can be interpreted as follows.

The constant (55.0) is the predicted salary for someone who has values of zero on all predictors. In this example, men with zero years on the job have X1 = 0, X2 = 0, and X3 = 0, so the constant is the predicted salary for men with zero years on the job, i.e., 55.0 or $55,000.

If we did not have an interaction term in the model, then both men and women would be given the same regression coefficient on X1. Because there is an interaction term, the coefficient of 1.5 on X1 applies only to men (who have values of zero on the interaction term), so the model predicts average salary to be $1500 greater for each year on the job for men. This does not mean that every individual man is expected to earn $1500 more each year, but rather 1.5 describes the slope for men in the cross-sectional data. In general, the coefficient on X1 is the simple effect of X1 when X2 = 0.

The coefficient of -3.4 on X2 is the modeled difference in salary between men and women who have zero years on the job. This is the difference in the constant for the models for men and women.

The coefficient of .7 on the interaction term X3 indicates the difference in the regression weight for men and women. Because X3 = 0 for men and X3 = X1 for women, the weight on X3 is the additional weight given to X1 for women. The null hypothesis for the test of the regression weight on X3 is that this weight is zero in the population, which would mean that the slopes of the regression model for men and women are the same. If this null hypothesis is rejected, the conclusion is that the slopes are different. In the example, the slope is .7 greater for women, indicating that the average increment in predicted salary per year is $700 greater for women. In the regression models for men and women, the weight on X1 is 1.5 for men and 2.2 for women, a difference of .7.

An assumption is that residuals from the model are reasonably homogeneous and normally distributed across levels of X1 and X2. If relationships are nonlinear, the nonlinear components should be included in the model.

SPSS Example: Moderation Effects with a Dichotomous Moderator

Occupational prestige as measured by a standard scale is positively related to years of education, but is the relationship the same for men and women? For this example, we can use data from a national sample of U.S. adults in the file 1991 U.S. General Social Survey.sav, as provided by SPSS. This sample includes n=1415 cases with complete data on the three variables of occupational prestige, years of education, and gender.

In this example, the dependent variable (Y) is occupational prestige and the independent variable (X) is years of education, while gender is a potential moderator (Z). Moderation in this example is indicated by an interaction between X and Z in predicting Y, which would indicate that the relationship between X and Y depends on the level of Z. A special term must be constructed to represent the interaction of X and Z. With regression we can test whether this term contributes beyond the main effects of X and Z in predicting Y. Mathematically, the interaction term can be computed as the product of X and Z. To demonstrate how this works in our example, we compute the product of Education (X) and Sex (Z) to create a new variable that we name EdxSex (XZ), and we include EdxSex in a final model to predict Y.
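The product-term computation is simple to express in code. As a sketch, here is the same EdxSex computation in Python with pandas, using a few invented rows in place of the GSS file (sex coded 1 = male, 2 = female, as in the real data set):

```python
import pandas as pd

# Toy rows standing in for the GSS variables (values invented for illustration)
df = pd.DataFrame({"educ": [12, 16, 14], "sex": [1, 2, 2]})

# The interaction term is simply the product of predictor and moderator
df["EdxSex"] = df["educ"] * df["sex"]
```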

SPSS Commands (Point and Click):

Call up SPSS and the GSS1991 data file (available online or from Sakai, Resources, Data files, select file 1991 U.S. General Social Survey.sav).

First, create the interaction term. Click Transform, Compute…, in the Target Variable: window enter EdxSex, select educ and click the black triangle to enter educ into the Numeric Expression: window, click *, select sex and click the triangle. This should give educ * sex in the Numeric Expression: window. This expression can be typed into the window instead of selecting and clicking. You can click OK to run this computation, or you can click Paste to save the syntax in a syntax file to be run later. If you use Paste, go to the syntax window and run this computation because we need the new variable EdxSex for the next analysis. (Highlight the compute statement and the Execute command, press the triangle to run.)

The regression analysis to test interactions is hierarchical: we must enter the main effects of education and sex before we enter the interaction term.
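The same three-block hierarchy can also be sketched outside SPSS, for example in Python with statsmodels. The data below are synthetic stand-ins (the generating coefficients are arbitrary), so the estimates will not match Table 1; the point is the structure of the hierarchical comparison.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for the GSS variables (sex coded 1 = male, 2 = female,
# mirroring the real file); the true coefficients here are arbitrary
educ = rng.integers(8, 21, size=n)
sex = rng.integers(1, 3, size=n)
prestg80 = 20 + 2.0 * educ - 5 * sex + 0.4 * educ * sex + rng.normal(0, 8, n)
df = pd.DataFrame({"prestg80": prestg80, "educ": educ, "sex": sex})
df["EdxSex"] = df["educ"] * df["sex"]

# Three hierarchical blocks, as in the SPSS analysis:
m1 = smf.ols("prestg80 ~ educ", data=df).fit()            # block 1: X only
m2 = smf.ols("prestg80 ~ educ + sex", data=df).fit()      # block 2: + Z
m3 = smf.ols("prestg80 ~ educ + sex + EdxSex", data=df).fit()  # block 3: + XZ

# R-squared change attributable to the interaction term
r2_change = m3.rsquared - m2.rsquared
```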

Click Analyze, Regression, Linear…, select prestg80 and click the black triangle to enter it as the Dependent variable. Select educ and click the triangle to enter educ as the first independent variable. Click Next to go to the second block; select sex and click the triangle to enter sex as the second independent variable. Click Next to go to the third block; select EdxSex and click the triangle to enter EdxSex as the third independent variable in a hierarchical analysis.

For illustration, we will ask for a lot of statistics. Click Statistics…, select Estimates, Model fit, R squared change, Descriptives, Part and partial correlations, and Collinearity diagnostics, and click Continue.

Click Plots, select *ZRESID as the Y variable and *ZPRED as the X variable, check Histogram, and click Continue. Click Paste to save the syntax.

Go to the syntax window and run the regression analysis. Table 1 shows selected SPSS output.

Table 1: Test of Moderation Effects with a Dichotomous Moderator (N=1415)

The first model uses only Education (X) to predict Y (Occupational Prestige). We see that education is a strong predictor, with r = beta = .520, t(1413) = 22.864, p < .001.

The second model predicts Occupational Prestige (Y) from the additive effects of Education (X) and Sex (Z), assuming no moderation. From Unstandardized Coefficients we find the following:

Ŷ = B0 + B1X + B2Z = 14.294 + 2.286X - .709Z

The coefficient B1 = 2.286 can be interpreted as indicating that for either males or females, one additional unit of X (one more year of education) is associated with 2.286 more units of predicted Y (+2.286 on the Occupational Prestige scale). However, if there is an interaction, the model may be misleading. If the effects of education are different for males and females, this simple model is not accurate for either group.

The third model in Table 1 includes the interaction term, resulting in the following equation:

Ŷ = B0 + B1X + B2Z + B3XZ = 22.403 + 1.668X - 6.083Z + .412XZ

The test of statistical significance of the interaction term yields t(1411)=2.050, p=.041. We conclude that the relationship between Education and Occupational Prestige differs for males and females. This also means that the sex difference on occupational prestige varies with level of education. (Statistical significance doesn’t necessarily indicate a large or important effect.)

How does this work? It is instructive to compute the regression equations separately for males and females. In this data set, Sex (Z) is coded Z=1 for males and Z=2 for females. Thus, for males the equation reduces to Ŷ = 22.403 + 1.668X - (6.083)(1) + .412X(1), which can be written as Ŷ = 22.403 - 6.083 + 1.668X + .412X, or Ŷm = 16.320 + 2.080X.

For females the equation is Ŷ = 22.403 + 1.668X - (6.083)(2) + .412X(2), which can be written as Ŷ = 22.403 - 12.166 + 1.668X + .824X, or Ŷf = 10.237 + 2.492X.

The weight on the XZ interaction term (B3 = .412) is the difference in the regression weight on X for females and males (2.492 vs. 2.080). Thus, a test of B3 is a test of the sex difference in the regression weight on education when predicting occupational prestige.
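These reductions are easy to verify numerically. The Python sketch below (our own illustration, not SPSS output) plugs the Table 1 coefficients into the full model and recovers the group-specific intercepts and slopes, including the .412 slope difference.

```python
# Full-model coefficients from Table 1 (third model)
B0, B1, B2, B3 = 22.403, 1.668, -6.083, 0.412

def prestige(educ, sex):
    """Predicted occupational prestige; sex: 1 = male, 2 = female."""
    return B0 + B1 * educ + B2 * sex + B3 * educ * sex

# Group-specific intercepts and slopes implied by the full model
male_intercept = prestige(0, 1)                    # 22.403 - 6.083  = 16.320
female_intercept = prestige(0, 2)                  # 22.403 - 12.166 = 10.237
male_slope = prestige(1, 1) - male_intercept       # 1.668 + .412    = 2.080
female_slope = prestige(1, 2) - female_intercept   # 1.668 + .824    = 2.492

# B3 equals the difference in slopes between females and males
slope_diff = female_slope - male_slope             # .412
```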

We can conclude that, on average, education has a statistically significantly stronger relationship with occupational prestige for females than for males. In the model without the interaction term, the regression weight of 2.286 on Education overestimates the relationship for males and underestimates the relationship for females. Of course, statistical significance does not imply that this difference is large enough to be theoretically or practically interesting.

An interaction is often illustrated effectively with a figure. Figure 7 shows the size and direction of the main effects and interaction, and where the modeled sex effect is largest, etc. This graph was made with Excel using regression weights from SPSS. You can access this Excel worksheet through http://WISE.cgu.edu under WISE Stuff in a file called Plotting Regression Interactions.XLS. It is important to note that the figure is a model of the relationship, not the actual data (which probably would not show such a nice regular pattern).
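If Excel is not at hand, the model-implied lines can also be computed in a few lines of Python (our sketch; the education values 8 and 20 are chosen only for illustration). The same arithmetic locates where the modeled sex effect changes sign.

```python
# Coefficients from Table 1 (third model): Yhat = B0 + B1*X + B2*Z + B3*X*Z
B0, B1, B2, B3 = 22.403, 1.668, -6.083, 0.412

def prestige(educ, sex):          # sex: 1 = male, 2 = female
    return B0 + B1 * educ + B2 * sex + B3 * educ * sex

# Two model-implied points per group are enough to draw each straight line;
# education = 8 and 20 years are illustrative endpoints
points = {sex: [(e, prestige(e, sex)) for e in (8, 20)] for sex in (1, 2)}

# The modeled sex effect per unit of Z is B2 + B3*X; it is zero where the
# two lines cross, at X = -B2 / B3 years of education (about 14.8 here)
crossover = -B2 / B3
```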