SPS 580 Lecture 6 Controls Z-P multiple regression interaction

  1. LANGUAGE FOR INTERPRETING SLOPES: Income → Neighborhood Pessimism

DAS: X (0,1)  Y(int)

Slope = -.293

Higher income people score, on average, .29 points lower on neighborhood pessimism than lower income people.

DAS: X (int 0,3)  Y(int)

Slope = -.152

With each unit increase in income quartile, the neighborhood pessimism score drops by .15 points.

DAS: X (int 0,3)  Y(0,1)

Slope = -.091

With each unit increase in income quartile, the likelihood of being pessimistic about the neighborhood drops by 9 percent.

DAS: X (0,1)  Y(0,1)

Slope = - .176

Higher income people are 18% less likely to be pessimistic about their neighborhood than lower income people.
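
A minimal Python sketch (not part of the PASW workflow) of why a slope with a (0,1) X can be read as a difference between two groups: the regression slope equals the difference between the two group averages. The eight respondents are made up for illustration, and `ols_slope` is a hypothetical helper name.

```python
from statistics import mean

def ols_slope(x, y):
    """Bivariate regression slope: covariation of X and Y over variation of X."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Made-up data: X = 1 for higher income, Y = 1 for pessimistic
income = [0, 0, 0, 0, 1, 1, 1, 1]
pessimistic = [1, 1, 0, 0, 1, 0, 0, 0]

slope = ols_slope(income, pessimistic)
low_avg = mean(y for x, y in zip(income, pessimistic) if x == 0)
high_avg = mean(y for x, y in zip(income, pessimistic) if x == 1)
difference = high_avg - low_avg

print(slope, difference)  # the two numbers agree
```

Because Y is also (0,1) here, each group average is the share of that group that is pessimistic, which is why the slope can be talked about as "less likely."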

  2. INTRODUCING . . . CONTROL VARIABLES

A control variable enters the picture when the theory/idea says there is another factor that explains the X → Y relationship. For example . . .

The reason higher income people are 18% less likely to be pessimistic about their neighborhood is that higher income people live in places where there is less fear of crime, and fear of crime causes pessimism about the neighborhood.

Income causes Fear of Crime, which causes Neighborhood Pessimism

X1 → X2 → Y

If the control variable is measured in the same survey, then there are statistical procedures to find out whether that control variable explains the X → Y relationship.

  3. GETTING A CONTROL VARIABLE . . . Fear of Crime

 Two variables are available in the data set, asked in the same year, etc.

We will code each (0,1) and create a scale (0,2).

 We need to pick a cut point for each (0,1) coding that results in a NICE scale.

Is there a policy relevant group? . . . the policy relevant group is usually the extreme end – in this case “a lot” or “none.” Coding it this way would result in bad skew. There is not a good reason to do this.

 The (low, high) coding should match the language of the theory . . . Fear causes Pessimism, so 0 = low fear 1 = high fear

 The coding of “don’t knows” should match the language of the theory: 1 = high fear, 0 = other, DK

The (0,1) coding should mean the same thing for each variable in the scale . . . if crimnbr (1) = a lot + some then victim1 (1) = high + moderate

 What coding produces minimal skew (larger variance)? (1+2 =1) (3-8=0)

PRETTY NICE scale
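
A minimal Python sketch of the recoding described above, assuming both items use response codes 1 through 8 and the cut point (1+2 = 1) (3-8 = 0). The variable names `crimnbr` and `victim1` come from the lecture; the four respondents and the helper name `dichotomize` are made up for illustration.

```python
def dichotomize(code, high_codes=(1, 2)):
    """Return 1 (high fear) for codes 1-2 and 0 for everything else (codes 3-8, including DK)."""
    return 1 if code in high_codes else 0

# Made-up respondents: (crimnbr code, victim1 code)
respondents = [(1, 2), (3, 1), (4, 8), (2, 5)]

# Sum of the two (0,1) items gives a (0,2) fear-of-crime scale
fear_scale = [dichotomize(crimnbr) + dichotomize(victim1)
              for crimnbr, victim1 in respondents]

print(fear_scale)  # [2, 1, 0, 1]
```

Using the same cut point for both items keeps the (0,1) coding meaning the same thing for each variable in the scale.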

  4. TESTING THE IMPACT OF A CONTROL VARIABLE

A. When you control for a variable, it means you hold it constant.

B. So if you want to look at the causal impact of Income (X1) on Neighborhood Pessimism (Y), controlling for Fear of Crime (X2), it means you need to separate the survey sample into two groups (low fear, high fear) and look at the causal impact of Income (X1) on Neighborhood Pessimism within each of these two groups.

C. Like so many other things in life, this is pretty easy to do with crosstabs . . .

CROSSTABS

Layer 1 = Fear of crime (X2)

Row variable = income (X1)

Column variable = pessimism (Y)

 PQ version

Causal impact controlling for Fear
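
A minimal Python sketch of what the layered crosstab delivers: the income difference in pessimism computed separately within each fear group. The counts are made up for illustration (they were picked to give conditional differences of -11% and -17%, the values graphed later in this lecture); they are not the survey data.

```python
# (fear group, income group) -> (number of respondents, number pessimistic)
# Made-up counts, not the actual survey table
table = {
    ("low fear", "low income"): (1000, 250),
    ("low fear", "high income"): (1000, 140),
    ("high fear", "low income"): (1000, 540),
    ("high fear", "high income"): (500, 185),
}

def share_pessimistic(fear, income):
    n, pessimistic = table[(fear, income)]
    return pessimistic / n

# One conditional difference per category of the control variable
conditional = {
    fear: share_pessimistic(fear, "high income") - share_pessimistic(fear, "low income")
    for fear in ("low fear", "high fear")
}

for fear, diff in conditional.items():
    print(f"{fear}: {diff:+.2f}")
```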

  5. HOW TO DETERMINE WHETHER THE CONTROL VARIABLE “EXPLAINS” THE ORIGINAL X1 → Y RELATIONSHIP

A. The average conditional difference shows the amount of the X1 → Y relationship that remains when the explanatory variable X2 is controlled

Three way table

conditional differences

B.Question: does Fear of Crime explain the relationship between Income and Pessimism?

a)Total Bivariate Relationship = Zero Order effect (difference, slope) = -.18

(Because zero variables are controlled)

b)Direct effect = Partial (difference, slope) = -.14

c)Amount explained by third variable = -.04

  1. Intervening effect . . . if X1 causes X2 (X1 → X2)
  2. Spurious effect . . . if X2 causes X1 (X2 → X1)
  3. We’ll talk about Causal Order among X variables next week

C. Answer: Somewhat. Fear of crime explains 22% of the original relationship (-.04 / -.18). Controlling for fear of crime, there is still a direct effect of income on pessimism of -.14; that is, among people with the same level of fear, higher income people are 14% less likely than lower income people to be pessimistic about their neighborhood.

D.PQ way to report significance of the partial slopes

E.PQ way to summarize the impact of controlling a third variable
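
The arithmetic behind the 22% can be sketched in a few lines of Python, using the zero order and partial differences reported above. The variable names are chosen for illustration only.

```python
zero_order = -0.18   # total bivariate difference (nothing controlled)
partial = -0.14      # direct effect, controlling for fear of crime

explained = zero_order - partial          # amount explained by the third variable
share_explained = explained / zero_order  # portion of the original relationship explained
share_remaining = partial / zero_order    # portion that remains as a direct effect

print(f"explained = {explained:.2f}")
print(f"share explained = {share_explained:.0%}, share remaining = {share_remaining:.0%}")
```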

  6. SIGN ME UP . . . HOW DO I GET THE AVERAGE OF THE CONDITIONAL DIFFERENCES (aka THE PARTIAL, or THE DIRECT EFFECT)?

A. It would be nice if you could just add up the conditionals and get the simple average by dividing by however many conditionals there are (in this case there are two conditionals because fear is (0,1), but there could be more if X2 had 3+ categories).

B.But Nooooooo . . . the partial is a WEIGHTED AVERAGE of conditionals

  1. (THE NEXT COMMENT IS FOR EXTRA CREDIT, SKIP IT IF YOU ARE HAVING TROUBLE IN THIS CLASS)
  2. The weights depend on the variance of the difference in each conditional table: PARTIAL = Sum of (weight * conditional difference)
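
For the extra credit crowd, a Python sketch of the weighted average, checked against an ordinary least squares fit. It assumes X1 and X2 are both (0,1); in that case the least-squares weight for each conditional table works out to n_low * n_high / (n_low + n_high), the amount of variation in income inside that table. The cell counts are made up, and `solve` is a small hypothetical helper, not a PASW routine.

```python
# Made-up cells: (x1 income, x2 fear, n respondents, n pessimistic)
cells = [(0, 0, 1000, 250), (1, 0, 1000, 140),
         (0, 1, 1000, 540), (1, 1, 500, 185)]

def conditional_difference(x2):
    share = {x1: k / n for x1, g, n, k in cells if g == x2}
    return share[1] - share[0]

def weight(x2):
    n = {x1: size for x1, g, size, k in cells if g == x2}
    return n[0] * n[1] / (n[0] + n[1])

total_weight = weight(0) + weight(1)
partial = sum(weight(g) * conditional_difference(g) for g in (0, 1)) / total_weight

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]

# Normal equations for Y = a + B1*x1 + B2*x2, built from the cell counts
xtx = [[0.0] * 3 for _ in range(3)]
xty = [0.0] * 3
for x1, x2, n, k in cells:
    v = (1, x1, x2)
    for i in range(3):
        xty[i] += k * v[i]
        for j in range(3):
            xtx[i][j] += n * v[i] * v[j]

a, b1, b2 = solve(xtx, xty)
print(f"weighted average = {partial:.3f}, regression slope for X1 = {b1:.3f}")
```

With these made-up counts the simple average of the two conditionals would be -.14, while the weighted average and the regression slope both come out at -.134, so the weighting matters even in a small example.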

C.So let’s just have PASW calculate it for us . . .

ANALYZE / REGRESSION / LINEAR

Dependent Neighborhood Pessimism (Y)

Independent(s) Income (X1) Fear of Crime (X2)

OPTIONS Exclude cases pairwise

Coefficients (a)

Model 1                                                  Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)                                               .284               .011                             25.768    .000
income50pct above or below median                        -.138              .012         -.146               -11.308   .000
crimscaleDICHOT nbrhd crime + victimization likelihood   .231               .012         .246                19.008    .000

a. Dependent Variable: nbhdscaleDICHOT bad vs other
  1. This is a multiple regression (more than one X variable)
  • The slope for X1 is the partial (direct) effect
  • This is where the -.14 comes from
  • It is the impact of X1 → Y for a regression model that also includes X2
  • If the partial is NOT statistically significant, then it could = 0 and the control variable is said to fully explain the original X1 → Y relationship
  • That didn’t happen here . . . significance test . . . t = (partial / SE) = -11.3, p < .05
  • We already saw that 78% of the original relationship remains (i.e., is not explained by X2) and now we also learn that the partial is statistically significant
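
A quick Python sketch of the significance test, using the B and Std. Error printed in the coefficients table. The 1.96 cutoff is the usual large-sample value for p < .05 and is an assumption here, not something read off the output. The hand calculation gives about -11.5 rather than the printed -11.308, presumably because the table shows B and SE rounded to three decimals.

```python
partial = -0.138      # B for income, from the coefficients table
std_error = 0.012     # its standard error, as printed (rounded)

t = partial / std_error
significant = abs(t) > 1.96   # assumed large-sample cutoff for p < .05

print(f"t = {t:.1f}, significant at .05: {significant}")
```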
  7. PREDICTED AVERAGE SCORES FOR Y

A. The regression equation predicts the average on Y as a function of scores on two X variables

Predicted average on Y = a + B1 * (x1) + B2 * (x2)

Predicted average on Y = .284 - .138 * (x1) + .231 * (x2)

B.The prediction is an equation for two lines on a graph . . .

 one line shows the linear relationship between income and pessimism among those with high fear

 the other line shows the linear relationship among those with low fear

The slope (impact of X1 → Y) is the same for each of these lines because the partial is the weighted average of the conditional differences and is assumed to be the same within each condition.
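
The two lines can be sketched in Python by plugging the four combinations of x1 and x2 into the regression equation above. The function name `predicted_pessimism` is made up for illustration; the coefficients are the ones from the PASW output.

```python
def predicted_pessimism(x1, x2, a=0.284, b1=-0.138, b2=0.231):
    """Predicted average on Y from the multiple regression equation."""
    return a + b1 * x1 + b2 * x2

for x2, fear in ((0, "low fear"), (1, "high fear")):
    low_income = predicted_pessimism(0, x2)
    high_income = predicted_pessimism(1, x2)
    print(f"{fear}: low income {low_income:.3f}, high income {high_income:.3f}, "
          f"slope {high_income - low_income:+.3f}")
```

Both lines drop by .138 from low to high income; the high fear line simply sits .231 higher, which is what "parallel lines" means here.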

  8. STATISTICAL INTERACTION

A. In almost every analysis, however, the actual slope is not going to be the same in each condition. You can find out how different they are by graphing observed data:

The slope is a little steeper among those with High Fear (-17%)

than it is for those with Low Fear (-11%)

It is OK if the slopes are A LITTLE different because the regression program is a robot that treats them as separate estimates of the partial and assumes the weighted average of the two is the best overall estimate of the partial.

80% of the time this is what happens, the slopes are A LITTLE different, no worry

B. Which means that 20% of the time the slopes are A LOT different.

Let’s imagine that the control variable is Place of Residence and the theory is: The reason higher income people are less likely to be pessimistic about their neighborhood is that higher income people are more likely to live in the suburbs, and suburban residents are generally less pessimistic about their neighborhood . . .

Income (X1) causes Place of Residence (X2), which causes Neighborhood Pessimism (Y)

 And let’s imagine the observed data look like this

 Slope for Chicago = -17%

 Slope for Suburbs = 0%

The regression robot will calculate the partial slope as the weighted average of the conditional slopes . . . i.e., about -8%

But the predicted average scores for Y will always be pretty far off

The regression equation would understate the income difference in the city and overstate the income difference in the suburbs.

C. When the conditional slopes are A LOT different from each other it is called a statistical INTERACTION. When an interaction is present:

  1. The partial slope calculated by the regression program is WRONG
  2. The regression equation is WRONG
  3. The predictions from the regression equation don’t fit the data very well

Q1: How can you tell if you have a statistical INTERACTION?

A1: Graph the observed data and see if the lines are parallel

A2: Make a table that compares observed with predicted average on Y:

 Residuals show where they disagree; large residuals point to an interaction.
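
A Python sketch of the observed versus predicted table for the imagined Chicago/suburbs example. Only the two conditional slopes (-17% and 0%) come from the lecture; the four cell values are made up to match them, and equal cell sizes are assumed so that the weights are equal and the partial slopes are simple averages.

```python
# Made-up observed shares pessimistic, keyed by (place, income); income 1 = higher income
observed = {
    ("Chicago", 0): 0.30, ("Chicago", 1): 0.13,
    ("Suburbs", 0): 0.10, ("Suburbs", 1): 0.10,
}

# Conditional slopes: the income difference within each place
slope_city = observed[("Chicago", 1)] - observed[("Chicago", 0)]     # about -.17
slope_suburbs = observed[("Suburbs", 1)] - observed[("Suburbs", 0)]  # 0

# Equal cell sizes assumed, so each partial is a simple average of conditionals
b1 = (slope_city + slope_suburbs) / 2
b2 = ((observed[("Chicago", 0)] - observed[("Suburbs", 0)])
      + (observed[("Chicago", 1)] - observed[("Suburbs", 1)])) / 2
grand_mean = sum(observed.values()) / 4
a = grand_mean - b1 * 0.5 - b2 * 0.5

def predicted(place, income):
    x2 = 1 if place == "Chicago" else 0
    return a + b1 * income + b2 * x2

residuals = {cell: obs - predicted(*cell) for cell, obs in observed.items()}
for (place, income), resid in residuals.items():
    print(f"{place:8s} income={income}  observed={observed[(place, income)]:.3f}  "
          f"predicted={predicted(place, income):.3f}  residual={resid:+.4f}")
```

The partial comes out at about -.085 for both places, so the equation understates the income difference in the city and overstates it in the suburbs, and every cell is missed by about 4 points.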

Q2: What do you do if you have a statistical INTERACTION

A1: Right now, note the problem and proceed

A2: In a couple of weeks, test the INTERACTION TERM and include it in the equation if the t-test is significant (TBA)

  9. ANOTHER EXAMPLE: X1 (interval 0,3), X2 (interval 0,2), Y (interval 0,3)

A.THEORY: Income causes Fear of Crime, which causes Neighborhood Pessimism

B.EXPLAIN THE VARIABLES . . . DESCRIPTIVES

 Descriptives, range, mean

 bar charts to show how NICE they are

C.ZERO ORDER RELATIONSHIP TO BE EXPLAINED

Table of means (not shown)

 Graph to explore curvilinearity

Slope = -.152, T = -15, p < .05

Equation Y = .752 - .152 (X1)

D.INTRODUCE CONTROL VARIABLE

ANALYZE / COMPARE MEANS / MEANS

Dependent List: Y

Independent List: X2, then NEXT, then X1

Options: Mean, CONTINUE, OK

 Table of Means

E.REPORT THE RESULTS

1. Plot the means; carefully label everything

 Explore interaction

a little different

Be sure to ask me how to do a regression in Excel to solve for the conditional slopes

2. Report the direct effect and its significance . . . Partial = -.101, T = -10, p < .05

3. Report the regression equation: Y = .422 - .101 * (X1) + .369 * (X2)

4. Make a table of predicted means OR a graph of the predicted means, and use it to talk through the findings from the multiple regression

5. Make a table that summarizes the impact of the control variable

Explain the impact of the control variable . . . i.e., controlling for Fear explains 34% of the zero order relationship between income and neighborhood pessimism
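
A Python sketch of steps 4 and 5 for this example: a table of predicted means from the reported regression equation, plus the share of the zero order slope explained by the control variable. The layout of the printed table and the variable names are illustrative choices, not a PQ exhibit.

```python
A, B1, B2 = 0.422, -0.101, 0.369   # coefficients from the reported equation
ZERO_ORDER = -0.152                # bivariate slope before controlling for fear

def predicted_mean(x1, x2):
    """Predicted average pessimism for income quartile x1 (0-3) and fear score x2 (0-2)."""
    return A + B1 * x1 + B2 * x2

# Step 4: table of predicted means, one row per fear score
print("fear   " + "  ".join(f"inc={x1}" for x1 in range(4)))
for x2 in range(3):
    row = "  ".join(f"{predicted_mean(x1, x2):5.3f}" for x1 in range(4))
    print(f"{x2:<6d} {row}")

# Step 5: impact of the control variable
share_explained = (ZERO_ORDER - B1) / ZERO_ORDER
print(f"explained by fear: {share_explained:.0%}")
```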

ASSIGNMENT 6:

Part 1: Calculate a regression slope in Excel.

In the Excel File WEEK 6 SUPPORT MATERIALS there is a spreadsheet called ASSGT 6 part 1 which shows the results of a recent survey of SPS graduates who studied hard and did well in SPS 580. The X variable is the number of years since graduation, the Y variable is the average salary.

  1. Write a seven-word poem in the space provided. When you are satisfied with the poem, freeze the Y variable.
  2. Use Excel and your brain to fill in the boxes: what is the XY slope, mean(X), mean(Y) and the intercept
  3. Use the slope and intercept to fill in the predicted average(Y) as f(X)
  4. Make a graph of the observed average(Y) and the predicted avg(Y) as f(X)
  5. All you need to turn in is the graph, with two sentences max commenting on the results.
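
If you want to check your Excel work, here is a Python sketch of the same calculation: slope, means, intercept, and predicted average(Y). The years and salaries are made up for illustration (salary in thousands); they are not the values in the ASSGT 6 spreadsheet.

```python
from statistics import mean

# Made-up data: years since graduation and average salary (thousands)
years = [1, 2, 3, 4, 5]
salary = [50, 54, 61, 63, 72]

mean_x, mean_y = mean(years), mean(salary)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, salary))
sxx = sum((x - mean_x) ** 2 for x in years)

slope = sxy / sxx                      # change in average Y per unit of X
intercept = mean_y - slope * mean_x    # the line passes through (mean X, mean Y)
predicted = [intercept + slope * x for x in years]

print(f"slope = {slope:.1f}, intercept = {intercept:.1f}")
for x, obs, pred in zip(years, salary, predicted):
    print(f"X = {x}: observed {obs}, predicted {pred:.1f}")
```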

FROM THIS POINT ON follow guidelines for writing reports and rules for PQ exhibits

Part 2: Develop an X1 → Y theory in a population of interest. Test the impact of an intervening variable X2.

  1. Choose/calculate/recode X1 (interval) and Y (interval)
  2. Don’t go beyond 5 categories for X1 to keep the graphs tidy
  3. Y can have any number of categories
  4. Explain the variables in English, use bar charts to show they are NICE
  5. Explain the zero order results
  6. Choose/calculate/recode the intervening variable X2, i.e., the theory is that X1 causes X2 and X2 causes Y
  7. X2 can be dichotomous or interval
  8. If interval, don’t go beyond 4 categories in order to keep the graphs tidy
  9. Explain the impact of the control variable, following the 5 steps in the lecture, and providing the PQ documentation that goes along with those steps.
