Lecture 2 - Introduction to path analysis.doc

Introduction to path analysis

Based on Asher (1983) on

“Causal models”

summarizing results by

Wright (1934) on

“The method of path coefficients”

and results by

Simon (1957) and Blalock (1964)

Path analysis models

Endogenous variables (the causal part of the structure):

A,B,C,D

Exogenous variables (background)

E, F

The statistical model

A sequence of linear regression models

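$A = \alpha_A + \beta_{AB}B + \beta_{AC}C + \beta_{AD}D + \beta_{AE}E + \beta_{AF}F + \varepsilon_A$

$B = \alpha_B + \beta_{BC}C + \beta_{BD}D + \beta_{BE}E + \beta_{BF}F + \varepsilon_B$

$C = \alpha_C + \beta_{CD}D + \beta_{CE}E + \beta_{CF}F + \varepsilon_C$

$D = \alpha_D + \beta_{DE}E + \beta_{DF}F + \varepsilon_D$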

The equations defining the regression models are often referred to as a set of structural equations

Structural equation models (SEM) permit a more complex set of similar equations

A path analysis model is a particularly simple special case of an SEM

A missing arrow in the path diagram means that the corresponding regression parameter is equal to 0

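$A = \alpha_A + \beta_{AB}B + \beta_{AD}D + \beta_{AF}F + \varepsilon_A$

$B = \alpha_B + \beta_{BC}C + \beta_{BE}E + \varepsilon_B$

$C = \alpha_C + \beta_{CD}D + \varepsilon_C$

$D = \alpha_D + \beta_{DE}E + \beta_{DF}F + \varepsilon_D$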
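As an illustration of how such a set of structural equations can be estimated one regression at a time, here is a minimal Python sketch. The coefficient values, the sample size and the helper name `ols` are made up for the illustration and are not taken from the lecture; intercepts are set to 0 and the exogenous variables E and F are assumed to be independent standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000  # hypothetical sample size

# Exogenous variables (assumed independent standard normal in this sketch)
E = rng.normal(size=n)
F = rng.normal(size=n)

# Hypothetical coefficients; variables are generated in causal order D, C, B, A,
# leaving out the terms whose arrows are missing in the path diagram
D = 0.5 * E + 0.3 * F + rng.normal(size=n)
C = 0.4 * D + rng.normal(size=n)
B = 0.6 * C + 0.2 * E + rng.normal(size=n)
A = 0.3 * B + 0.5 * D + 0.1 * F + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares fit of y on an intercept and xs; returns the slopes only."""
    X = np.column_stack([np.ones_like(y), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

# One ordinary regression per endogenous variable
fit_A = ols(A, B, D, F)
fit_B = ols(B, C, E)
fit_C = ols(C, D)
fit_D = ols(D, E, F)

print("A on B, D, F:", np.round(fit_A, 2))
print("B on C, E:   ", np.round(fit_B, 2))
print("C on D:      ", np.round(fit_C, 2))
print("D on E, F:   ", np.round(fit_D, 2))
```

With a sample this large, each fitted slope should land close to the coefficient used to generate the data.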

Path analysis with standardized variables

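$Z_A = \beta_{AB}Z_B + \beta_{AD}Z_D + \beta_{AF}Z_F + \varepsilon_A$

$Z_B = \beta_{BC}Z_C + \beta_{BE}Z_E + \varepsilon_B$

$Z_C = \beta_{CD}Z_D + \varepsilon_C$

$Z_D = \beta_{DE}Z_E + \beta_{DF}Z_F + \varepsilon_D$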

Simon (1957) and Blalock (1964)

$\beta_{XY} = r_{XY|Z}$

where

Z is the vector of all variables appearing before Y in the path diagram

$r_{XY|Z}$ is the partial correlation between X and Y given Z

Missing arrow between X and Y in a path diagram

$r_{XY|Z} = 0$

X and Y are conditionally independent given Z ($X \perp Y \mid Z$)

when X, Y, Z is multivariate normal

Conditional independencies

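$A \perp C \mid B,D,E,F$ and $A \perp E \mid B,C,D,F$

$B \perp D \mid C,E,F$ and $B \perp F \mid C,D,E$

$C \perp E \mid D,F$ and $C \perp F \mid D,E$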

Path analysis models are graphical

Models defined by recursive structure and assumptions of conditional independence given prior variables are referred to as chain graph models

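Since

$A \perp C \mid B,D,E,F$ and $A \perp E \mid B,C,D,F$

$B \perp D \mid C,E,F$ and $B \perp F \mid C,D,E$

$C \perp E \mid D,F$ and $C \perp F \mid D,E$

define a graphical model, it follows that path analysis models are chain graph models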

Acyclic chain graph models are called DAGs

(directed acyclic graphs)

Measuring causal effect

in path analysis models

There appears to be consensus that the appropriate measure of the causal effect of X on Y is the marginal correlation, $r_{YX}$

The problem:

$r_{YX}$ is a composite of direct, indirect and spurious (confounding) effects.

Is it possible to estimate these different effects, so that confounding can be eliminated and so that we can estimate the proportion of the effect explained by intermediate variables?

Wright (1934)’s rules

The correlation between two variables in a path analysis model can be decomposed into a sum of the direct path coefficient and some compound path coefficients.

The coefficient of a compound path is equal to the product of the path coefficients for the simple paths comprising it.

The compound paths must satisfy the following conditions:

1) No path may pass through the same variable more than once.

2) No path may go backward against the direction of an arrow after the path has gone forward on a different arrow.

3) No path may pass through an edge connecting exogenous variables more than once in any single path.

Decomposition of causal effects

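$r_{AD} = \beta_{AD} + \beta_{AB}\beta_{BC}\beta_{CD} + \beta_{DF}\beta_{AF} + \beta_{DE}\beta_{BE}\beta_{AB} + \beta_{DF}r_{EF}\beta_{BE}\beta_{AB}$

The total effect $= \beta_{AD} + \beta_{AB}\beta_{BC}\beta_{CD}$

The mediated proportion

$= \beta_{AB}\beta_{BC}\beta_{CD} / (\beta_{AD} + \beta_{AB}\beta_{BC}\beta_{CD})$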

The example

Inference

Three standard linear regression analyses, one for each of the endogenous variables, will give us the (standardized) regression parameters we need for the calculation of effects.

Specialized software (Mplus) is available for situations with more variables.

Sex does not confound the estimates of the regression parameters pertaining to plasma glucose.

Statistical analysis using Mplus

Variables

B = GLUC60

L = SYSBT40

M = BMI40

O = SEX

The model statement

MODEL:

B on L M; ! B depends on L and M

L on M O; ! L depends on M and O

M on O; ! M depends on O

SAMPLE STATISTICS

Means

              B         L         M         O
         ______    ______    ______    ______
 1        5.538   123.458    23.656     1.533

Covariances

              B         L         M         O
         ______    ______    ______    ______
 B        2.167
 L        3.867   211.540
 M        1.654    16.963    12.002
 O       -0.062    -1.561    -0.394     0.249

Correlations

              B         L         M         O
         ______    ______    ______    ______
 B        1.000
 L        0.181     1.000
 M        0.324     0.337     1.000
 O       -0.084    -0.215    -0.228     1.000

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

Value 0.001

Degrees of Freedom 1

P-Value 0.9738
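The single degree of freedom corresponds to the one arrow left out of the model statement, the arrow from O to B. Under the Simon and Blalock result, a missing arrow means that the partial correlation between B and O given L and M should be close to 0. The following Python sketch checks this from the sample correlations printed above; the function name `partial_corr` is introduced here for illustration only, and the partial correlation is obtained from the inverse of the correlation matrix.

```python
import numpy as np

# Sample correlations from the Mplus output, variable order B, L, M, O
R = np.array([
    [ 1.000,  0.181,  0.324, -0.084],
    [ 0.181,  1.000,  0.337, -0.215],
    [ 0.324,  0.337,  1.000, -0.228],
    [-0.084, -0.215, -0.228,  1.000],
])

def partial_corr(R, i, j):
    """Partial correlation of variables i and j given all other variables in R."""
    P = np.linalg.inv(R)  # precision (inverse correlation) matrix
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

# B is index 0 and O is index 3; conditioning is on the remaining variables L and M
r_BO_given_LM = partial_corr(R, 0, 3)
print("r(B,O | L,M) =", round(r_BO_given_LM, 4))
```

A value very near 0 agrees with the small chi-square statistic reported for the model.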

                Estimates      S.E.  Est./S.E.       Std     StdYX

 B ON
    L               0.008     0.004      2.086     0.008     0.081
    M               0.126     0.016      7.703     0.126     0.297

 L ON
    M               1.274     0.155      8.234     1.274     0.303
    O              -4.252     1.074     -3.960    -4.252    -0.146

 M ON
    O              -1.580     0.260     -6.067    -1.580    -0.228

 Residual Variances
    B               1.924     0.105     18.344     1.924     0.889
    L             183.022     9.977     18.344   183.022     0.866
    M              11.362     0.619     18.344    11.362     0.948
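The StdYX column can be approximately reproduced by rescaling each unstandardized estimate by the ratio of the standard deviations of predictor and outcome. The sketch below is a rough check only: it uses the sample variances from the output above rather than the model-estimated ones, and the names `var`, `est` and `standardize` are introduced for the illustration.

```python
from math import sqrt

# Sample variances from the SAMPLE STATISTICS output (diagonal of the covariances)
var = {"B": 2.167, "L": 211.540, "M": 12.002, "O": 0.249}

# Unstandardized estimates: (outcome, predictor) -> coefficient
est = {
    ("B", "L"): 0.008, ("B", "M"): 0.126,
    ("L", "M"): 1.274, ("L", "O"): -4.252,
    ("M", "O"): -1.580,
}

def standardize(coef, outcome, predictor):
    """Standardized coefficient = coef * SD(predictor) / SD(outcome)."""
    return coef * sqrt(var[predictor]) / sqrt(var[outcome])

std = {pair: standardize(b, *pair) for pair, b in est.items()}
for (outcome, predictor), value in std.items():
    print(f"{outcome} ON {predictor}: {value:.3f}")
```

Four of the five values agree with the StdYX column to three decimals. The coefficient of B on L comes out slightly lower than 0.081, because the printed estimate 0.008 is itself rounded to a single significant digit.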

ESTIMATED MODEL AND RESIDUALS (OBSERVED - ESTIMATED)

Model Estimated Covariances/Correlations/Residual Correlations

              B         L         M         O
         ______    ______    ______    ______
 B        2.164
 L        3.861   211.226
 M        1.652    16.937    11.984
 O       -0.062    -1.559    -0.393     0.249

Residuals for Covariances/Correlations/Residual Correlations

              B         L         M         O
         ______    ______    ______    ______
 B        0.000
 L        0.000     0.000
 M        0.000     0.000     0.000
 O        0.001     0.000     0.000     0.000

Fitted correlation coefficients are not always reported, but they can be calculated from the fitted covariances:

Corr(B,M) = 1.652 / sqrt(2.164 × 11.984) = 0.324

Standardized regression coefficients

The effect of BMI on plasma glucose

Direct effect = 0.297

Indirect effect = 0.303 × 0.081 = 0.025

Spurious effect = (-0.228) × (-0.146) × 0.081 = 0.003

Total effect = 0.322

Mediated proportion = 7.8 %
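The arithmetic behind these numbers can be written out as a short Python sketch. The variable names are chosen here for readability; the coefficients are the StdYX values from the Mplus output, and the paths follow Wright's rules for the model B on L M, L on M O, M on O.

```python
# Standardized (StdYX) path coefficients from the Mplus output
b_BM = 0.297    # M -> B
b_BL = 0.081    # L -> B
b_LM = 0.303    # M -> L
b_LO = -0.146   # O -> L
b_MO = -0.228   # O -> M

direct = b_BM                    # M -> B
indirect = b_LM * b_BL           # M -> L -> B
spurious = b_MO * b_LO * b_BL    # M <- O -> L -> B (confounding by O)

total = direct + indirect
mediated = indirect / total
implied_corr_BM = direct + indirect + spurious

print(f"direct   = {direct:.3f}")
print(f"indirect = {indirect:.3f}")
print(f"spurious = {spurious:.3f}")
print(f"total    = {total:.3f}")
print(f"mediated = {100 * mediated:.1f} %")
print(f"implied Corr(B,M) = {implied_corr_BM:.3f}")
```

Computed from the unrounded products, the mediated proportion comes out near 7.6 %; the 7.8 % quoted above corresponds to the rounded values 0.025 / 0.322.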

The causal effect of Sex on Plasma glucose

Corr(Glucose,Sex)

=

The change in the standardized glucose variable, after a change of 1 on the standardized sex variable

Sex / Natural code / p / Standardized code
Male / 1 / 0.467 / -1.148
Female / 2 / 0.533 / 0.996

This is not a natural measure of causal effect for binary variables

The natural measure of causal effect of Sex on Glucose

=

the difference in unstandardized glucose between men and women

$\mathrm{Covar}_{\mathrm{fit}}$(Glucose, Sex)

=

the fitted covariance between Glucose and Sex under the model

=

-0.062

$Z_{\mathrm{glucose}} = r_{\mathrm{glucose,sex}} Z_{\mathrm{sex}}$

The causal effect of Sex in natural units

=

Do we have a paradox?

Sex does not confound the estimates of the regression parameters pertaining to glucose

but

Sex does confound $r_{\mathrm{Glucose,BMI}}$

(not much, but still …)

Since the standardized regression coefficients are measures of direct causal effect, it follows that Sex confounds the measure of indirect effect

Since we can estimate the size of the spurious effect we can in this specific case adjust for confounding.

But could this have been avoided?

The above structure is based on observational data

The fitted correlation between Glucose and BMI is equal to 0.324.

A little higher than the estimated total effect of 0.322

If it had been possible to randomize BMI in such a way that the marginal distribution of BMI after randomization was the same as in the observational study, then the path diagram would look like ….

The path diagram for the randomized BMI experiment

Direct effect = 0.297

Indirect effect = 0.303 × 0.081 = 0.025

Spurious effect = 0

Total effect = 0.322

Mediated proportion = 7.8 %

Causal effects may be estimated by fit of the randomized model to the data from the observational study

Estimates are slightly different

The fitted correlation between BMI and Glucose is equal to

0.322

Exactly the same as the previous estimates of the total causal effect!

Summary

Two ways to calculate total causal effects in linear causal models:

I

Use Wright’s rules to calculate direct, indirect and spurious effects

The total effect = direct + indirect

II

Fit the model that would have been appropriate if the cause had been randomized

The total effect is equal to the fitted marginal correlation between cause and effect
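The two routes can be compared numerically with a small Python sketch that computes the correlations implied by a recursive model from its standardized path coefficients. The function `implied_corr` and its arguments are hypothetical names introduced here. As a simplification, the "randomized" model below reuses the observational StdYX estimates and only removes the arrow from O to M, whereas the lecture notes that a real refit gives slightly different estimates.

```python
def implied_corr(order, parents):
    """Correlations implied by a recursive model with standardized coefficients.

    order   : variables listed in causal order (causes first)
    parents : {child: {parent: standardized path coefficient}}
    Variables without parents are treated as uncorrelated with earlier ones.
    """
    r = {(v, v): 1.0 for v in order}
    for i, child in enumerate(order):
        for earlier in order[:i]:
            # corr(child, earlier) = sum over parents p of coef(p) * corr(p, earlier)
            value = sum(b * r[(p, earlier)]
                        for p, b in parents.get(child, {}).items())
            r[(child, earlier)] = r[(earlier, child)] = value
    return r

order = ["O", "M", "L", "B"]

# Observational model: O -> M, O -> L, M -> L, L -> B, M -> B
observational = {
    "M": {"O": -0.228},
    "L": {"M": 0.303, "O": -0.146},
    "B": {"L": 0.081, "M": 0.297},
}

# Randomized-BMI model (assumed here): same coefficients, but no arrow O -> M
randomized = {
    "L": {"M": 0.303, "O": -0.146},
    "B": {"L": 0.081, "M": 0.297},
}

r_obs = implied_corr(order, observational)
r_rand = implied_corr(order, randomized)

print("observational Corr(B,M):", round(r_obs[("B", "M")], 3))
print("randomized    Corr(B,M):", round(r_rand[("B", "M")], 3))
```

Under these assumptions the observational model implies a correlation of about 0.324, which includes the spurious path through O, while the randomized model implies about 0.322, the sum of the direct and indirect effects.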

A more challenging example

Standardized regression coefficients
