EFA, CFA, and SEM
(You should have read the EFA chapter in T&F, the Structural Equation Modeling chapter, and be reading the Byrne text.)
Observed Variable
A variable whose values are observable.
Examples: IQ Test scores (Scores are directly observable), GREV, GREQ, GREA, UGPA, Minnesota Job Satisfaction Scale, Affective Commitment Scale, Gender, Questionnaire items.
Latent Variable
A variable, i.e., characteristic, presumed to exist, but whose values are NOT observable. A Factor in Factor Analysis literature. A characteristic of people that is not directly observable.
Examples: Intelligence, Depression, Job Satisfaction, Affective Commitment, Tendency to display an affective state.
No direct observation of values of latent variables is possible. Brain states? Brain chemistry?
Indicator
An observed variable whose values are assumed to be related to the values of a latent variable.
Reflective Indicator
An observed variable whose values are partially determined by, i.e., are influenced by or reflect, the values of a latent variable. For example, responses to Conscientiousness items are assumed to reflect a person’s Conscientiousness.
Formative Indicator
An observed variable whose values partially determine, i.e., cause or form, the values of a latent variable.
Exogenous Variable (Ex = Out)
A variable whose values originate from or are caused by factors outside the model, i.e., are not explained within the theory with which we’re working. That is, a variable whose variation we don’t attempt to explain or predict with whatever theory we’re working with. Causes of exogenous variables originate outside the model. Exogenous variables can be observed or latent.
Endogenous Variable (En ≈ In)
A variable whose values are explained within the theory with which we’re working. We account for all variation in the values of endogenous variables using the constructs of whatever theory we’re working with. Causes of endogenous variables originate within the model.
Basic SEM Path Analytic Notation
Observed variables are symbolized by squares or rectangles.
Latent Variables are symbolized by Circles or ellipses.
Correlations or covariances between variables are represented by double-headed arrows.
“Causal” or “Predictive” or “Regression” relationships between variables are represented by single-headed arrows.
Exogenous Observed Variables
Exogenous variables connect to other variables in the model through either a “causal” arrow or a correlation.
Exogenous Latent Variables
Exogenous latent variables also connect to other variables in the model through either a “causal” arrow or a correlation.
Endogenous Observed Variables - Endogenous Latent Variables
Endogenous variables connect to other variables in the model by being on the “receiving” end of one or more “causal” arrows. Specifically, endogenous variables are typically represented as being “caused” by 1) other variables in the theory and 2) random error. Thus, 100% of the variation in every endogenous variable is accounted for by either other variables in the model or random error. This means that random error is an exogenous latent variable in SEM diagrams. Random error is a catch-all concept representing all “other” things that are affecting the endogenous variable.
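The claim that all of the variance of an endogenous variable is split between model variables and error can be checked numerically. What follows is a minimal sketch in Python, assuming made-up data and an ordinary least-squares fit; the numbers and helper names are illustrative and are not taken from the course data.

```python
# Hypothetical data: x is exogenous, y is endogenous.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

def mean(v):
    return sum(v) / len(v)

def covariance(u, v):
    # Population covariance (divide by N); covariance(v, v) is the variance.
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

slope = covariance(x, y) / covariance(x, x)
intercept = mean(y) - slope * mean(x)

predicted = [intercept + slope * a for a in x]   # part of y due to x
error = [b - p for b, p in zip(y, predicted)]    # part of y due to "E"

explained = covariance(predicted, predicted)
unexplained = covariance(error, error)

# The two printed values agree: Var(y) = Var(predicted) + Var(error).
print(covariance(y, y), explained + unexplained)
```

In the fitted sample the error term is also uncorrelated with x, which is why it can be drawn as a separate exogenous latent variable.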
Summary statistics associated with symbols
Our SEM program, Amos, prints means and variances above and to the right of each symbol. Typically the mean and variance of latent variables are fixed at 0 and 1 respectively, although there are exceptions to this in advanced applications.
Path Diagrams of Analyses We’ve Done Previously
Following is how some of the analyses we’ve performed previously would be represented using path diagrams.
1. Simple correlation between two observed variables.
2. Simple correlations between three observed variables.
3. Simple regression of an observed dependent variable onto one observed independent variable.
4. Multiple Regression of an observed dependent variable onto three observed independent variables.
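Diagrams 1 and 3 describe the same pair of variables in two ways. In the standardized solution, the weight on the single-headed arrow of a simple regression equals the correlation that the double-headed arrow would carry. A small sketch in Python, assuming made-up scores (the data and names are hypothetical):

```python
import math

# Hypothetical scores on two observed variables.
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 8.0]

def mean(v):
    return sum(v) / len(v)

def covariance(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

sd_x = math.sqrt(covariance(x, x))
sd_y = math.sqrt(covariance(y, y))

r = covariance(x, y) / (sd_x * sd_y)        # double-headed arrow (diagram 1)
slope = covariance(x, y) / covariance(x, x)  # raw weight on X -> Y (diagram 3)
standardized_slope = slope * sd_x / sd_y     # standardized weight on X -> Y

print(r, standardized_slope)  # the two values agree
```

This equality holds only with a single predictor; with several correlated predictors (diagram 4) the standardized weights generally differ from the simple correlations.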
ANOVA in SEM Models
Since ANOVA is simply regression analysis, the representation of ANOVA in SEM is merely as a regression analysis. The key is to represent the differences between groups with group coding variables, just as we did in 513 and in the beginning of 595 . . .
1) Independent Groups t-test
The two groups are represented by a single, dichotomous observed group-coding variable. It is the independent variable in the regression analysis.
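To see why the t-test can be drawn as a regression, here is a hedged sketch in Python with made-up scores and 0/1 group coding (the data and variable names are assumptions for illustration). The slope on the coding variable equals the difference between the group means, and the t for that slope matches the pooled-variance t.

```python
import math

group0 = [4.0, 5.0, 6.0, 5.0]        # hypothetical scores, group coded 0
group1 = [7.0, 8.0, 6.0, 9.0, 8.0]   # hypothetical scores, group coded 1

y = group0 + group1
x = [0.0] * len(group0) + [1.0] * len(group1)  # dichotomous coding variable
n = len(y)

def mean(v):
    return sum(v) / len(v)

# Regression of y on the group-coding variable.
mx, my = mean(x), mean(y)
sxx = sum((a - mx) ** 2 for a in x)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
t_regression = slope / math.sqrt(sse / (n - 2) / sxx)

# Classic independent-groups t with pooled variance.
m0, m1 = mean(group0), mean(group1)
ss_within = sum((v - m0) ** 2 for v in group0) + sum((v - m1) ** 2 for v in group1)
pooled_var = ss_within / (n - 2)
t_classic = (m1 - m0) / math.sqrt(pooled_var * (1 / len(group0) + 1 / len(group1)))

print(slope, m1 - m0)           # same value
print(t_regression, t_classic)  # same value
```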
2) One Way ANOVA
The K groups are represented by K-1 group-coding variables created using one of the coding schemes (although I recommend contrast coding). They are the independent variables in the regression analysis. If orthogonal contrast codes are used and the group sizes are equal, the correlations between all the group-coding variables are 0, so no arrows between them need be shown.
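The zero correlation between contrast-coding variables depends on the codes being orthogonal and on the groups being the same size. The following sketch in Python, assuming K = 3 groups and one particular set of orthogonal codes (the codes and group sizes are illustrative choices), shows the correlation for a balanced and an unbalanced design.

```python
import math

# Orthogonal contrast codes for K = 3 groups (K - 1 = 2 coding variables).
# First code compares A with B; second compares A and B with C.
CODES = {"A": (1, 1), "B": (-1, 1), "C": (0, -2)}

def code_correlation(group_sizes):
    """Pearson correlation between the two coding variables."""
    c1, c2 = [], []
    for group, n in group_sizes.items():
        c1 += [CODES[group][0]] * n
        c2 += [CODES[group][1]] * n
    m1, m2 = sum(c1) / len(c1), sum(c2) / len(c2)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(c1, c2))
    sxx = sum((a - m1) ** 2 for a in c1)
    syy = sum((b - m2) ** 2 for b in c2)
    return sxy / math.sqrt(sxx * syy)

balanced = code_correlation({"A": 4, "B": 4, "C": 4})
unbalanced = code_correlation({"A": 2, "B": 4, "C": 6})

print(balanced)    # zero: no arrow needed between the coding variables
print(unbalanced)  # nonzero: a double-headed arrow would be needed
```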
3) Factorial ANOVA.
Each factor is represented by G-1 group-coding variables created using one of the coding schemes. The interaction(s) is/are represented by products of the group-coding variables representing the factors. Again, no correlations between coding variables need be shown if contrast codes are used.
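As an illustration of the product rule for interactions, here is a small sketch in Python for a balanced 2 x 2 design, assuming made-up scores and -1/+1 contrast codes (all values are hypothetical). The interaction coding variable is simply the product of the two factor codes.

```python
# Hypothetical scores for a balanced 2 x 2 design.
# Keys are (code for factor A, code for factor B).
cells = {
    (+1, +1): [10.0, 12.0],
    (+1, -1): [6.0, 8.0],
    (-1, +1): [5.0, 7.0],
    (-1, -1): [5.0, 3.0],
}

a_col, b_col, ab_col, y_col = [], [], [], []
for (a, b), scores in cells.items():
    for score in scores:
        a_col.append(a)
        b_col.append(b)
        ab_col.append(a * b)   # interaction code = product of factor codes
        y_col.append(score)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# In a balanced design the three coding variables are mutually orthogonal
# and each sums to zero, so each regression weight is dot(code, y) / dot(code, code).
weight_a = dot(a_col, y_col) / dot(a_col, a_col)
weight_b = dot(b_col, y_col) / dot(b_col, b_col)
weight_ab = dot(ab_col, y_col) / dot(ab_col, ab_col)

print(weight_a, weight_b, weight_ab)
```

A nonzero weight on the product variable indicates that the effect of one factor differs across levels of the other.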
Path Diagrams representing Exploratory Factor Analysis
1) Exploratory Factor Analysis solution with one factor.
The factor is represented by a latent variable with three or more observed indicators. (Three is the generally recommended minimum number of indicators for a factor.)
Note that factors are exogenous. Indicators are endogenous. Since the indicators are endogenous, all of their variance must be accounted for by the model. Thus, each indicator must have an error latent variable to account for the variance in it not accounted for by the factor.
2) Exploratory Factor Analysis solution with two orthogonal factors.
Each factor is represented by a latent variable with three or more indicators. The orthogonality of the factors is represented by the fact that there is no arrow connecting the factor symbols.
Let’s assume that Obs1, 2, and 3 are thought to be primary indicators of F1 and 4,5,6 of F2.
For exploratory factor analysis, each variable is allowed to load on all factors. Of course, the hope is that the loadings will be substantial on only some of the factors and will be close to 0 on the others, but the loadings on all factors are retained, even if they’re close to 0. The loadings that might be close to 0 in the model are shown in red and orange.
Orthogonal factors represent uncorrelated aspects of behavior.
Note what is assumed here: There are two independent characteristics of behavior – F1 and F2. Each one influences responses to all six items, although it is hoped that F1 influences primarily the first 3 items and that F2 influences primarily the last 3 items.
If Obs 1 thru Obs 3 are one class of behavior and Obs 4 thru Obs 6 are a second class, then if the loadings “fit” the expected pattern, this would be evidence for the existence of two independent dispositions – that represented by F1 and that represented by F2.
3) Exploratory Factor Analysis solution with two oblique factors.
Each factor is represented by a latent variable with three or more indicators. The obliqueness of the factors is represented by the fact that there IS a double-headed arrow connecting the factors, representing the correlation between them.
Again, in exploratory factor analysis, all indicators load on all factors, even if the loadings are close to zero.
Exploratory factor analysis (EFA) programs, such as that in SPSS, always report estimates of all loadings.
This solution is potentially as important as the orthogonal solution, although in general, I think that researchers are more interested in independent dispositions than they are in correlated dispositions. But discovering why two dispositions are separate but still correlated is an important and potentially rewarding task.
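One way to see the practical difference between orthogonal and oblique solutions is to compute the correlation between two indicators that the model implies. The sketch below, in Python, assumes standardized indicators, factors with variance 1, uncorrelated error terms, and a set of made-up loadings; none of these numbers come from a real analysis.

```python
# Hypothetical standardized loadings: rows are Obs1..Obs6, columns are F1, F2.
LOADINGS = [
    [0.8, 0.1],
    [0.7, 0.0],
    [0.6, 0.1],
    [0.1, 0.7],
    [0.0, 0.8],
    [0.1, 0.6],
]

def implied_correlation(i, j, factor_corr):
    """Model-implied correlation between two different indicators i and j."""
    phi = [[1.0, factor_corr], [factor_corr, 1.0]]  # factor correlation matrix
    return sum(
        LOADINGS[i][k] * phi[k][m] * LOADINGS[j][m]
        for k in range(2)
        for m in range(2)
    )

# Obs1 (primary indicator of F1) with Obs5 (primary indicator of F2).
orthogonal = implied_correlation(0, 4, 0.0)
oblique = implied_correlation(0, 4, 0.5)

print(orthogonal)  # small: only the cross-loadings link the two items
print(oblique)     # larger: the factor correlation also links them
```

With orthogonal factors, items from different classes can correlate only through cross-loadings; with oblique factors they also correlate through the arrow connecting the factors.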
Confirmatory vs Exploratory Factor Analysis
In Exploratory Factor Analysis, the loading of every item on every factor is estimated. The analyst hopes that some of those loadings will be large and some will be small. An EFA two-orthogonal-factor model is represented by the following diagram.
Note that there are arrows (loadings) connecting each variable to each factor. We have no hypotheses about the loading values – we’re exploring – so we estimate all loadings and let them lead us. No EFA programs (except that in Mplus) allow you to specify or fix loadings to pre-determined values.
In contrast to the exploration implicit in EFA, a factor analysis in which some loadings are fixed at specific values is called a Confirmatory Factor Analysis. The analysis is confirming one or more hypotheses about loadings, hypotheses represented by our fixing them at specific (usually 0) values.
Unfortunately, EFA and CFA cannot be done using the same computer program, except in Mplus.
The problem is that all EFA programs except that in Mplus won’t allow some loadings to be fixed at predetermined values. And CFA programs, except Mplus, cannot estimate the above model. Amos and all CFA programs other than Mplus require that some of the loadings be fixed.
So, in many instances, you will have to employ both SPSS (for EFA) and Amos (for CFA) in exploring the interrelations between variables and factors. Often, analysts will use an EFA program to estimate ALL loadings to all factors, then use an SEM program to perform a confirmatory factor analysis, fixing those loadings that were close to 0 in the EFA to 0 in the CFA.
Note that in the above confirmatory model, loadings of indicators 4-6 on F1 are fixed at 0, as are loadings of indicators 1-3 on F2. (The arrows are missing, therefore assumed to be zero.)
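The EFA-then-CFA workflow can be sketched as a simple bookkeeping step. The Python fragment below assumes a made-up table of EFA loadings and an arbitrary cutoff of .30 for "close to 0"; both the loadings and the cutoff are illustrative assumptions, not values from SPSS or Amos output.

```python
# Hypothetical EFA loadings: item -> (loading on F1, loading on F2).
EFA_LOADINGS = {
    "Obs1": (0.78, 0.05),
    "Obs2": (0.71, -0.02),
    "Obs3": (0.65, 0.11),
    "Obs4": (0.08, 0.74),
    "Obs5": (-0.03, 0.80),
    "Obs6": (0.12, 0.62),
}

THRESHOLD = 0.30  # arbitrary cutoff for "close to 0", assumed for this sketch

# Loadings at or above the cutoff stay free in the CFA; the rest are fixed at 0,
# which in the path diagram means the arrow is simply not drawn.
cfa_pattern = {
    item: tuple("free" if abs(loading) >= THRESHOLD else 0.0 for loading in loadings)
    for item, loadings in EFA_LOADINGS.items()
}

n_free = sum(cell == "free" for row in cfa_pattern.values() for cell in row)

for item, row in cfa_pattern.items():
    print(item, row)
print("free loadings:", n_free)
```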
The Identification Problem
Consider the simple regression model . . .
Quantities which can be computed from the data . .
Mean of the X variable
Variance of the X variable
Mean of the Y variable
Variance of the Y variable
Intercept of the X->Y regression
Slope of the X->Y regression
Quantities in the diagram .
Remember that in SEM path diagrams, all the variance in every endogenous variable must be accounted for. For that reason, the path diagram includes a latent “Other factors” variable, labeled “E”.
Mean of X
Mean of E
Variance of X
Variance of E
Intercept of the X->Y regression
Slope of the X->Y regression
Covariance of E with Y
Whoops! There are 6 quantities in the data but 7 in the model. There are too few quantities in the data, so the model is underidentified: there aren’t enough quantities from the data to identify each model value.
Dealing with underidentification . . .
The mean of E is always assumed to be 0.
1) Fix the variance of E to be 1.
So in this regression model, the path diagram will be
In this case, the model is said to be “just identified” or “completely identified”. This means that every estimable quantity in the model corresponds to one quantity obtained from the data.
Or,
2) Fix covariance of E with Y at 1.
Underidentified: Bad.
Just identified: OK
Overidentified: Great, you have degrees of freedom.
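Why does fixing one value cure the problem? A minimal sketch in Python, assuming that the "covariance of E with Y" refers to the weight on the E -> Y arrow and using a made-up residual variance: the data determine only the product of the squared weight and the variance of E, so many pairs of values fit equally well until one of them is fixed.

```python
import math

residual_variance = 4.0  # hypothetical: variance in Y not explained by X

def implied_residual_variance(weight, var_e):
    # Variance contributed to Y by the path  E --weight--> Y.
    return weight ** 2 * var_e

# Without a constraint, different (weight, Var(E)) pairs fit the data equally well.
candidates = [(1.0, 4.0), (2.0, 1.0), (4.0, 0.25)]
implied = [implied_residual_variance(w, v) for w, v in candidates]
print(implied)  # every pair reproduces the same residual variance

# Option 1: fix Var(E) at 1 and estimate the weight.
weight_when_variance_fixed = math.sqrt(residual_variance / 1.0)

# Option 2: fix the weight at 1 and estimate Var(E).
variance_when_weight_fixed = residual_variance / 1.0 ** 2

print(weight_when_variance_fixed, variance_when_weight_fixed)
```

Either option gives a unique solution, and both describe the same amount of unexplained variance in Y.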
Identification in CFA models
Here’s a typical CFA two-factor model.
Making the residuals part of the CFA identified . . .
1. Fix all residual variances to 1.
or
2. Fix all E-O covariances to 1.
Making the Factors part of the CFA identified
1. Fix one of the loadings for each factor at 1
Or
2. Fix the variance of each factor at 1.
Examples
1. Fixing all variances.
2. Fixing residual loadings and factor variances.
3. Fixing residual loadings and factor loadings.
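The two ways of identifying the factors leave the same number of degrees of freedom. The following Python sketch counts them under several assumptions that are mine, not Amos output: six indicators, two correlated factors, each indicator loading on exactly one factor, residual loadings fixed at 1, and means ignored. The function and option names are hypothetical.

```python
def cfa_degrees_of_freedom(n_indicators, n_factors, scaling):
    """Data moments minus free parameters for a simple-structure CFA."""
    # Distinct variances and covariances among the indicators.
    moments = n_indicators * (n_indicators + 1) // 2

    residual_variances = n_indicators            # residual loadings fixed at 1
    factor_covariances = n_factors * (n_factors - 1) // 2

    if scaling == "factor_variance":             # each factor variance fixed at 1
        loadings = n_indicators
        factor_variances = 0
    elif scaling == "marker_loading":            # one loading per factor fixed at 1
        loadings = n_indicators - n_factors
        factor_variances = n_factors
    else:
        raise ValueError("unknown scaling option")

    free = loadings + factor_variances + factor_covariances + residual_variances
    return moments - free

print(cfa_degrees_of_freedom(6, 2, "factor_variance"))
print(cfa_degrees_of_freedom(6, 2, "marker_loading"))
```

Both calls give the same positive result, so under these assumptions the model is overidentified whichever scaling is chosen.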
Programming with path diagrams: Introduction to Amos
Amos is an add-on program to SPSS that performs confirmatory factor analysis and structural equation modeling.
It is designed to emphasize a visual interface and has been written so that virtually all analyses can be performed by drawing path diagrams.
It also contains a text-based programming language for those who wish to write programs in the command language.
The Amos drawing toolkit with functions of the most frequently used tools.
Creating an Amos analysis
1. Open Amos Graphics.
2. File -> Data Files . . . (Because you have to connect the path diagram to a data file.)
3. Specify the name of the file that contains the raw or summary data.
a. Click on the [File Name] button.
b. Navigate to the file and double-click on it.
c. Click on the [OK] button.
In this example, I opened a file called IncentiveData080707.sav
4. Draw the desired path diagram using the appropriate drawing tools.
The example below is a simple correlation analysis.
Amos Details
For most of the analyses you’ll perform using Amos, you should get in the habit of doing the following . . .
View -> Analysis Properties -> Estimation
Check “Estimate means and intercepts”
View -> Analysis Properties -> Output
Check “Standardized estimates”
Check “Squared multiple correlations”
Remember that you must fix some parameter values to make the models identified.
Doing old things in a new way: Analyses we’ve done before, now performed using Amos
The data used for this example are the VALDAT data. We’ll simply look at the output here. Later, we’ll focus on the menu sequences needed to get this output.
a. SPSS analysis of the correlation of FORMULA with P511G
Correlations
b. Amos Input Path Diagram - Input Parameter Values
(Note, I told Amos to estimate means for this analysis.)
c. Amos Output Path Diagram - Unstandardized (Raw) coefficients
d. Amos Output Path Diagram - Standardized coefficients
Simple Regression Analysis: SPSS and Amos
The data used here are the VALDAT data.
a. SPSS Version 10 output