Lecture 2: Introduction to path analysis
Based on Asher (1983), “Causal models”, summarizing results by Wright (1934), “The method of path coefficients”, and results by Simon (1957) and Blalock (1964)
Path analysis models
Endogenous variables (the causal structure): A, B, C, D
Exogenous variables (background): E, F
The statistical model
A sequence of linear regression models
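$A = \alpha_A + \beta_{AB} B + \beta_{AC} C + \beta_{AD} D + \beta_{AE} E + \beta_{AF} F + \varepsilon_A$
$B = \alpha_B + \beta_{BC} C + \beta_{BD} D + \beta_{BE} E + \beta_{BF} F + \varepsilon_B$
$C = \alpha_C + \beta_{CD} D + \beta_{CE} E + \beta_{CF} F + \varepsilon_C$
$D = \alpha_D + \beta_{DE} E + \beta_{DF} F + \varepsilon_D$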
The equations defining the regression models are often referred to as a set of structural equations
Structural equation models (SEM) permit a more complex set of similar equations
A path analysis model is a particularly simple special case of an SEM
A missing arrow in the path diagram means that the corresponding regression parameter is equal to 0
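$A = \alpha_A + \beta_{AB} B + \beta_{AD} D + \beta_{AF} F + \varepsilon_A$
$B = \alpha_B + \beta_{BC} C + \beta_{BE} E + \varepsilon_B$
$C = \alpha_C + \beta_{CD} D + \varepsilon_C$
$D = \alpha_D + \beta_{DE} E + \beta_{DF} F + \varepsilon_D$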
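The link between a missing arrow and a zero regression parameter can be illustrated by simulation. The sketch below is not from the lecture: the coefficient values, sample size, and the helper `ols` are all hypothetical choices. It generates data from the reduced model and then fits the full regression for A, where the coefficients on C and E should come out close to 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # hypothetical sample size

# Exogenous background variables E and F, allowed to be correlated.
E = rng.normal(size=n)
F = 0.5 * E + rng.normal(size=n)

# Endogenous variables generated in causal order D, C, B, A,
# following the reduced model (illustrative coefficients only).
D = 0.4 * E + 0.3 * F + rng.normal(size=n)
C = 0.5 * D + rng.normal(size=n)
B = 0.6 * C + 0.2 * E + rng.normal(size=n)
A = 0.3 * B + 0.4 * D + 0.2 * F + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares fit of y on an intercept and the predictors xs."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Full regression for A on all prior variables: order is
# intercept, B, C, D, E, F. C and E have no arrow into A.
coef_A = ols(A, B, C, D, E, F)
print(np.round(coef_A, 3))
```

The same helper can be applied to B, C and D to fit the remaining equations of the sequence.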
Path analysis with standardized variables
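$Z_A = \beta_{AB} Z_B + \beta_{AD} Z_D + \beta_{AF} Z_F + \varepsilon_A$
$Z_B = \beta_{BC} Z_C + \beta_{BE} Z_E + \varepsilon_B$
$Z_C = \beta_{CD} Z_D + \varepsilon_C$
$Z_D = \beta_{DE} Z_E + \beta_{DF} Z_F + \varepsilon_D$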
Simon (1957) and Blalock (1964)
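The path coefficient $\beta_{XY}$ is a positive multiple of $r_{XY|Z}$, so $\beta_{XY} = 0$ exactly when $r_{XY|Z} = 0$
where
Z is the vector of all variables appearing before Y in the path diagram
$r_{XY|Z}$ is the partial correlation between X and Y given Z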
Missing arrow between X and Y in a path diagram
$\Leftrightarrow r_{XY|Z} = 0$
$\Leftrightarrow$ X and Y are conditionally independent given Z ($X \perp Y \mid Z$)
when (X, Y, Z) is multivariate normal
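A partial correlation can be read off the inverse of the correlation matrix. The sketch below uses a hypothetical correlation matrix for (X, Y, Z), chosen so that the X–Y correlation is entirely explained by Z. The function name `partial_corr` is an illustrative helper, not part of any package used in the lecture.

```python
import numpy as np

def partial_corr(R, i, j):
    """Partial correlation of variables i and j given all other variables in R."""
    P = np.linalg.inv(R)  # precision (inverse correlation) matrix
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

# Hypothetical correlations: r_XY = 0.30 = r_XZ * r_YZ = 0.60 * 0.50.
R = np.array([
    [1.00, 0.30, 0.60],   # X
    [0.30, 1.00, 0.50],   # Y
    [0.60, 0.50, 1.00],   # Z
])

r_xy_given_z = partial_corr(R, 0, 1)  # vanishes up to rounding error
r_xz_given_y = partial_corr(R, 0, 2)  # stays clearly positive
print(r_xy_given_z, r_xz_given_y)
```

Here X and Y are marginally correlated (0.30), yet the partial correlation given Z is zero, which is the situation a missing arrow between X and Y describes.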
Conditional independencies
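$A \perp C \mid B,D,E,F$ and $A \perp E \mid B,C,D,F$
$B \perp D \mid C,E,F$ and $B \perp F \mid C,D,E$
$C \perp E \mid D,F$ and $C \perp F \mid D,E$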
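These statements can be generated mechanically from the path diagram: every prior variable without an arrow into a given endogenous variable is conditionally independent of it given the remaining prior variables. A minimal sketch, where the dictionary `parents` encodes the arrows of the reduced model and all names are illustrative:

```python
# Causal order: A is last, so the variables to its right in this list are "prior".
order = ["A", "B", "C", "D", "E", "F"]

# Arrows of the reduced model: each endogenous variable and its direct causes.
parents = {
    "A": {"B", "D", "F"},
    "B": {"C", "E"},
    "C": {"D"},
    "D": {"E", "F"},
}

statements = []
for i, y in enumerate(order):
    if y not in parents:          # E and F are exogenous
        continue
    prior = order[i + 1:]
    for x in prior:
        if x not in parents[y]:   # missing arrow x -> y
            given = tuple(z for z in prior if z != x)
            statements.append((y, x, given))

for y, x, given in statements:
    print(f"{y} indep. of {x} given {','.join(given)}")
```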
Path analysis models are graphical
Models defined by recursive structure and assumptions of conditional independence given prior variables are referred to as chain graph models
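Since
$A \perp C \mid B,D,E,F$ and $A \perp E \mid B,C,D,F$
$B \perp D \mid C,E,F$ and $B \perp F \mid C,D,E$
$C \perp E \mid D,F$ and $C \perp F \mid D,E$
define a graphical model, it follows that path analysis models are chain graph models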
Acyclic chain graph models are called DAGs (directed acyclic graphs)
Measuring causal effect in path analysis models
There appears to be consensus that the appropriate measure of the causal effect of X on Y is the marginal correlation, $r_{YX}$
The problem:
$r_{YX}$ is a composite of direct, indirect and spurious (confounding) effects.
Is it possible to estimate these different effects, such that confounding can be eliminated and we can estimate the proportion of the effect explained by intermediate variables?
Wright’s (1934) rules
The correlation between two variables in a path analysis model can be decomposed into a sum of the direct path coefficient and some compound path coefficients.
The coefficient of a compound path is equal to the product of the path coefficients for the simple paths comprising it.
The compound paths must satisfy the following conditions:
1) No path may pass through the same variable more than once.
2) No path may go backward against the direction of an arrow after the path has gone forward on a different arrow.
3) No path may pass through an edge connecting exogenous variables more than once in any single path.
Decomposition of causal effects
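$r_{AD} = \beta_{AD}$ (direct)
$+ \beta_{AB}\beta_{BC}\beta_{CD}$ (indirect)
$+ \beta_{DF}\beta_{AF} + \beta_{DE}\beta_{BE}\beta_{AB} + \beta_{DF} r_{EF} \beta_{BE}\beta_{AB} + \beta_{DE} r_{EF} \beta_{AF}$ (spurious)
The total effect $= \beta_{AD} + \beta_{AB}\beta_{BC}\beta_{CD}$
The mediated proportion $= \beta_{AB}\beta_{BC}\beta_{CD} / (\beta_{AD} + \beta_{AB}\beta_{BC}\beta_{CD})$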
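The decomposition can be checked numerically. The sketch below assumes hypothetical standardized path coefficients and a hypothetical $r_{EF}$ (none of these values come from the lecture). It builds the correlation matrix implied by the structural equations and compares the implied $r_{AD}$ with the sum of path products over all paths between D and A permitted by Wright's rules.

```python
import numpy as np

# Hypothetical standardized path coefficients: (outcome, cause) -> value.
b = {
    ("A", "B"): 0.30, ("A", "D"): 0.20, ("A", "F"): 0.10,
    ("B", "C"): 0.40, ("B", "E"): 0.25,
    ("C", "D"): 0.50,
    ("D", "E"): 0.30, ("D", "F"): 0.20,
}
r_EF = 0.40  # hypothetical correlation between the exogenous variables

# Exogenous variables first, then endogenous variables in causal order.
order = ["E", "F", "D", "C", "B", "A"]
idx = {v: i for i, v in enumerate(order)}

# Implied correlation matrix: the correlation of a variable with any earlier
# variable u is the coefficient-weighted sum of its causes' correlations with u.
R = np.eye(len(order))
R[idx["E"], idx["F"]] = R[idx["F"], idx["E"]] = r_EF
for v in ["D", "C", "B", "A"]:
    causes = {p: c for (child, p), c in b.items() if child == v}
    for u in order[:idx[v]]:
        r = sum(c * R[idx[p], idx[u]] for p, c in causes.items())
        R[idx[v], idx[u]] = R[idx[u], idx[v]] = r

# Wright's rules: one product per permitted path between D and A.
direct = b["A", "D"]
indirect = b["A", "B"] * b["B", "C"] * b["C", "D"]
spurious = (
    b["D", "F"] * b["A", "F"]                            # D <- F -> A
    + b["D", "E"] * b["B", "E"] * b["A", "B"]            # D <- E -> B -> A
    + b["D", "F"] * r_EF * b["B", "E"] * b["A", "B"]     # D <- F - E -> B -> A
    + b["D", "E"] * r_EF * b["A", "F"]                   # D <- E - F -> A
)
wright_total = direct + indirect + spurious

total_effect = direct + indirect
mediated_proportion = indirect / total_effect
print(R[idx["A"], idx["D"]], wright_total, mediated_proportion)
```

The implied correlation and the sum of path products agree, and the spurious part is exactly what separates $r_{AD}$ from the total effect.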
The example
Inference
Three standard linear regression analyses, one for each of the endogenous variables, will give us the (standardized) regression parameters we need for the calculation of effects.
Specialized software (Mplus) is available for situations with more variables.
Sex does not confound the estimates of the regression parameters pertaining to plasma glucose.
Statistical analysis using Mplus
Variables
B = GLUC60
L = SYSBT40
M = BMI40
O = SEX
The model statement
MODEL:
B on L M; ! B depends on L and M
L on M O; ! L depends on M and O
M on O; ! M depends on O
SAMPLE STATISTICS
Means
           B          L          M          O
      ______     ______     ______     ______
1      5.538    123.458     23.656      1.533
Covariances
           B          L          M          O
      ______     ______     ______     ______
B      2.167
L      3.867    211.540
M      1.654     16.963     12.002
O     -0.062     -1.561     -0.394      0.249
Correlations
           B          L          M          O
      ______     ______     ______     ______
B      1.000
L      0.181      1.000
M      0.324      0.337      1.000
O     -0.084     -0.215     -0.228      1.000
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
    Value                   0.001
    Degrees of Freedom          1
    P-Value                0.9738
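The reported P-value can be checked by hand. With 1 degree of freedom the chi-square tail probability has a closed form in terms of the complementary error function, so no statistics library is needed. The single degree of freedom is consistent with the one path the model omits (B on O), although that reading is an interpretation rather than something stated in the output. Because the printed test statistic is rounded to 0.001, the result only agrees with 0.9738 approximately.

```python
import math

chi2_value = 0.001  # test statistic as printed (rounded to three decimals)

# For 1 degree of freedom: P(chi2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_value / 2))
print(round(p_value, 4))
```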
           Estimates      S.E. Est./S.E.       Std     StdYX
B ON
    L          0.008     0.004     2.086     0.008     0.081
    M          0.126     0.016     7.703     0.126     0.297
L ON
    M          1.274     0.155     8.234     1.274     0.303
    O         -4.252     1.074    -3.960    -4.252    -0.146
M ON
    O         -1.580     0.260    -6.067    -1.580    -0.228
Residual Variances
    B          1.924     0.105    18.344     1.924     0.889
    L        183.022     9.977    18.344   183.022     0.866
    M         11.362     0.619    18.344    11.362     0.948
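The StdYX column can be approximately reproduced from the unstandardized estimates by rescaling each coefficient with the standard deviations of predictor and outcome. The sketch below assumes the usual definition, coefficient × sd(predictor) / sd(outcome), and uses the sample variances from the SAMPLE STATISTICS output. Small discrepancies are expected because the printed estimates are rounded (0.008 in particular).

```python
import math

# Sample variances from the covariance matrix in the output.
var = {"B": 2.167, "L": 211.540, "M": 12.002, "O": 0.249}

# Unstandardized estimates: (outcome, predictor) -> coefficient.
est = {
    ("B", "L"): 0.008,
    ("B", "M"): 0.126,
    ("L", "M"): 1.274,
    ("L", "O"): -4.252,
    ("M", "O"): -1.580,
}

# Standardized coefficient = b * sd(predictor) / sd(outcome).
std_yx = {
    (y, x): coef * math.sqrt(var[x] / var[y])
    for (y, x), coef in est.items()
}

for (y, x), value in std_yx.items():
    print(f"{y} ON {x}: {value:.3f}")
```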
ESTIMATED MODEL AND RESIDUALS (OBSERVED - ESTIMATED)
Model Estimated Covariances/Correlations/Residual Correlations
           B          L          M          O
      ______     ______     ______     ______
B      2.164
L      3.861    211.226
M      1.652     16.937     11.984
O     -0.062     -1.559     -0.393      0.249
Residuals for Covariances/Correlations/Residual Correlations
           B          L          M          O
      ______     ______     ______     ______
B      0.000
L      0.000      0.000
M      0.000      0.000      0.000
O      0.001      0.000      0.000      0.000
Fitted correlation coefficients are not always calculated
Corr(B,M) = 1.652 / sqrt(2.164 × 11.984) = 0.324
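When the software does not print fitted correlations, they can be computed from the model-estimated covariance matrix by dividing each covariance by the product of the two fitted standard deviations. A minimal sketch using the fitted covariances printed above (variable order B, L, M, O):

```python
import numpy as np

names = ["B", "L", "M", "O"]

# Model-estimated covariance matrix, filled in symmetrically.
fitted_cov = np.array([
    [ 2.164,   3.861,  1.652, -0.062],
    [ 3.861, 211.226, 16.937, -1.559],
    [ 1.652,  16.937, 11.984, -0.393],
    [-0.062,  -1.559, -0.393,  0.249],
])

sd = np.sqrt(np.diag(fitted_cov))
fitted_corr = fitted_cov / np.outer(sd, sd)

corr_BM = fitted_corr[names.index("B"), names.index("M")]
print(np.round(fitted_corr, 3))
```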
Standardized regression coefficients
The effect of BMI on plasma glucose
Direct effect = 0.297
Indirect effect = 0.303 × 0.081 = 0.025
Spurious effect = (-0.228) × (-0.146) × 0.081 = 0.003
Total effect = 0.322
Mediated proportion = 7.8 %
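The arithmetic behind these figures can be retraced from the StdYX estimates. In the sketch below the variable names are illustrative. The mediated proportion of 7.8 % corresponds to the three-decimal rounded values (0.025 / 0.322); carrying the unrounded products through gives a slightly smaller proportion.

```python
# Standardized (StdYX) estimates from the Mplus output.
b_glucose_bmi = 0.297   # B on M
b_glucose_sbp = 0.081   # B on L
b_sbp_bmi = 0.303       # L on M
b_sbp_sex = -0.146      # L on O
b_bmi_sex = -0.228      # M on O

direct = b_glucose_bmi
indirect = b_sbp_bmi * b_glucose_sbp                # BMI -> blood pressure -> glucose
spurious = b_bmi_sex * b_sbp_sex * b_glucose_sbp    # BMI <- sex -> blood pressure -> glucose

total_effect = direct + indirect
implied_correlation = direct + indirect + spurious

# Proportion as on the slide, from values rounded to three decimals.
mediated_rounded = round(indirect, 3) / round(total_effect, 3)
# Proportion from the unrounded products.
mediated_exact = indirect / total_effect

print(round(indirect, 3), round(spurious, 3), round(total_effect, 3))
print(f"{mediated_rounded:.1%}", f"{mediated_exact:.1%}")
```

The sum of all three components reproduces the fitted correlation between glucose and BMI to three decimals.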
The causal effect of Sex on Plasma glucose
Corr(Glucose, Sex) = the change in the standardized glucose variable after a change of 1 in the standardized sex variable
          Natural code       p    Standardized code
Male                 1   0.467               -1.148
Female               2   0.533                0.996
This is not a natural measure of causal effect for binary variables
The natural measure of causal effect of Sex on Glucose = the difference in unstandardized glucose between men and women
$\mathrm{Cov}_{\mathrm{fit}}(\text{Glucose}, \text{Sex})$ = the fitted covariance between Glucose and Sex under the model = -0.062
$Z_{\text{glucose}} = r_{\text{glucose,sex}} Z_{\text{sex}}$
The causal effect of Sex in natural units
=
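One way to express this effect in natural units, offered here as an assumption rather than as the lecture's own formula, is the regression slope implied by the fitted moments: the fitted covariance divided by the variance of Sex. This equals the fitted correlation rescaled by sd(Glucose) / sd(Sex). With Sex coded 1 = male and 2 = female, the slope is the fitted difference in mean glucose, women minus men.

```python
import math

# Fitted moments from the model-estimated covariance matrix.
cov_glucose_sex = -0.062
var_glucose = 2.164
var_sex = 0.249

# Slope of glucose on sex implied by the fitted moments.
slope = cov_glucose_sex / var_sex

# Equivalent route: unstandardize the fitted correlation.
r_fit = cov_glucose_sex / math.sqrt(var_glucose * var_sex)
slope_from_r = r_fit * math.sqrt(var_glucose / var_sex)

print(round(r_fit, 3), round(slope, 3))
```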
Do we have a paradox?
Sex does not confound the estimates of the regression parameters pertaining to glucose, but Sex does confound $r_{\text{Glucose,BMI}}$ (not much, but still …)
Since the standardized regression coefficients are measures of direct causal effect it follows that Sex confounds the measure of indirect
Since we can estimate the size of the spurious effect we can in this specific case adjust for confounding.
But could this have been avoided?
The above structure is based on observational data
The fitted correlation between Glucose and BMI is equal to 0.324, a little higher than the estimated total effect of 0.322
If it had been possible to randomize BMI in such a way that the marginal distribution of BMI after randomization was the same as in the observational study, then the path diagram would look like …
The path diagram for the randomized BMI experiment
Direct effect = 0.297
Indirect effect = 0.303 × 0.081 = 0.025
Spurious effect = 0
Total effect = 0.322
Mediated proportion = 7.8 %
Causal effects may be estimated by fitting the randomized model to the data from the observational study
Estimates are slightly different
The fitted correlation between BMI and Glucose is equal to 0.322, exactly the same as the previous estimate of the total causal effect!
Summary
Two ways to calculate total causal effects in linear causal models:
I. Use Wright’s rules to calculate direct, indirect and spurious effects. The total effect = direct + indirect
II. Fit the model that would have been appropriate if the cause had been randomized. The total effect is equal to the fitted marginal correlation between cause and effect
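The two routes can be compared on the glucose example. The sketch below computes the correlation between BMI and glucose implied by the standardized equations, once with the observed BMI–Sex correlation and once with that correlation set to 0, as randomization of BMI would give. It makes a simplifying assumption: the standardized coefficients are held fixed at their observational estimates, whereas an actual refit of the randomized model gives slightly different estimates. The function name is illustrative.

```python
def implied_corr_glucose_bmi(r_bmi_sex):
    """Correlation between glucose and BMI implied by the standardized model."""
    b_glucose_bmi, b_glucose_sbp = 0.297, 0.081   # B on M, B on L
    b_sbp_bmi, b_sbp_sex = 0.303, -0.146          # L on M, L on O
    # Correlation between blood pressure and BMI: direct path plus the path via sex.
    corr_sbp_bmi = b_sbp_bmi + b_sbp_sex * r_bmi_sex
    return b_glucose_bmi + b_glucose_sbp * corr_sbp_bmi

observational = implied_corr_glucose_bmi(-0.228)  # sex influences BMI
randomized = implied_corr_glucose_bmi(0.0)        # arrow from sex to BMI removed

print(round(observational, 3), round(randomized, 3))
```

Under randomization the implied correlation equals direct + indirect from method I, since the only spurious path runs through the arrow that randomization removes.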
A more challenging example
Standardized regression coefficients