QUESTIONS of the MOMENT...

"Is there any way to improve Average Variance Extracted (AVE) in a Latent Variable (LV) X?"

(The APA citation for this paper is Ping, R. A. (2007). "Is there any way to improve Average Variance Extracted (AVE) in a Latent Variable (LV) X?" [on-line paper]. http://home.att.net/~rpingjr/LowAVE.doc)

Low AVE in an LV is not always "fatal" to publishing a model test. Experience suggests that not all reviewers accept AVE as "the" measure of convergent validity; some prefer reliability, for reasons that will be explained later. Thus, if an LV is reliable, that may be a sufficient demonstration of convergent validity for some reviewers. However, low AVE can also produce discriminant validity problems (see "Notes on Salesperson-Employer Relationships..." under the "Socio-Economic Relationship Termination" menu pick on the "Home" web page). Nevertheless, because some authors and reviewers do not accept AVE, they may prefer other discriminant validity criteria (see p. 13 of Step V in the "Testing Latent Variable Models with Survey Data" monograph on this web page).
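For reference, AVE for an LV is typically computed from its standardized loadings: the sum of the squared loadings divided by that sum plus the sum of the items' error variances (with standardized loadings this reduces to the average squared loading). The following is a minimal Python sketch of that calculation; the loadings are hypothetical and only illustrate why a set of loadings near .7 produces an AVE just below .50.

# Minimal sketch of the usual AVE calculation, assuming standardized loadings
# (so each item's error variance is 1 - loading**2). The loadings below are
# hypothetical, not from any real study.

def average_variance_extracted(loadings):
    """AVE = sum of squared loadings / (sum of squared loadings + sum of error variances)."""
    squared = [l ** 2 for l in loadings]
    error_variances = [1.0 - s for s in squared]
    return sum(squared) / (sum(squared) + sum(error_variances))

if __name__ == "__main__":
    loadings = [0.72, 0.68, 0.65, 0.70]  # hypothetical standardized loadings for LV X
    print(round(average_variance_extracted(loadings), 3))  # about 0.473, just under the 0.50 rule of thumb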

The logic for possibly ignoring AVE might be that most "interesting" theoretical model testing studies involve a "first-time" model and an initial test that together could/should be viewed as largely "exploratory." This "first test" usually uses new measures in new models tested for the first time, etc., and insisting that the new measures be "perfect" may be inappropriate for a "first-time" study, because new knowledge would go unpublished until a "perfect" study is attained. AVE adherents, of course, might reply that concluding anything from measures that contain more than 50% error variance is ill-advised, especially because there are so few replication studies.

In my opinion, an AVE slightly below 0.50 may be acceptable in a really "interesting" "first-time" study, 1) if it does not produce discriminant validity problems, 2) if the diminished AVE is noted and discussed in the Limitations section of the paper, 3) if any significant effects involving the low AVE LVs are held to a higher significance requirement (e.g., |t| >= 2.2 rather than |t| >= 2.0), and 4) if any discussion of interpretation, and especially implications, involving the low AVE LVs is clearly labeled as "very provisional" and in need of replication. (A little-used procedure for obtaining an additional replication study to "confirm," or at least investigate further, low AVE results is a "Scenario Analysis"--see "Step III" in the "Testing Latent Variable Models with Survey Data" monograph on the previous web page.) The logic would be that the model may be too interesting to suppress its first test. In different words, the focus of the paper should shift to the new theory developed, and the contributions should include a note that more measurement work is needed on the low AVE measures. For emphasis, the alternative with low AVE would be a propositional paper, which might be considerably less "interesting."

Nevertheless, AVE can occasionally be improved by "weeding" a measure to maximize reliability (reliability and AVE are strongly correlated in real-world data), or by using the "EXCEL template for weeding measures..." on the previous web page. In case of discriminant validity problems, a little-known "residual centering" procedure could be used; please e-mail me for details.
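As an illustration only (this is a generic sketch, not the EXCEL template mentioned above), a "weeding" loop might repeatedly drop the item whose deletion most improves coefficient alpha and stop when no deletion helps, as in the following Python sketch with hypothetical data:

# Generic "weeding" illustration: drop the item whose removal most improves
# coefficient alpha, repeating until no single deletion helps. This is a
# hypothetical sketch, not the author's EXCEL template.

import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def weed(items: pd.DataFrame, min_items: int = 3) -> pd.DataFrame:
    current = items.copy()
    while current.shape[1] > min_items:
        base = cronbach_alpha(current)
        # alpha for each "leave one item out" version of the measure
        trials = {col: cronbach_alpha(current.drop(columns=col)) for col in current.columns}
        dropped, best = max(trials.items(), key=lambda kv: kv[1])
        if best <= base:
            break  # no single deletion improves alpha
        current = current.drop(columns=dropped)
    return current

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    factor = rng.normal(size=(100, 1))                      # hypothetical common factor
    items = pd.DataFrame(factor + rng.normal(size=(100, 5)),
                         columns=[f"x{i}" for i in range(1, 6)])
    kept = weed(items)
    print("items kept:", list(kept.columns), "alpha:", round(cronbach_alpha(kept), 2))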

Clustering the cases into 3 groups, using Ward's method and squared Euclidean distance, sometimes improves AVE. The cases should cluster into 2 large clusters (e.g., the respondents who reported high values of the study variables, and the respondents who reported low values), and a (hopefully) small cluster of "oddball" cases, some of which may be candidates for omission from the data set. (Experience suggests these oddball cases, especially those near the centroid of the oddball cluster, tend to contribute disproportionately to error in structural equation analysis.) In my opinion, dropping these cases before any structural models are estimated is still "good science"--e.g., it is similar to dropping incomplete, echeloned, etc. questionnaires.
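A rough sketch of this clustering step, assuming the cases-by-variables data are in a numpy array (SciPy's "ward" linkage operates on Euclidean distances, and Ward's criterion minimizes the increase in within-cluster squared distances), might look like the following; the random data stand in for the real survey responses:

# Hypothetical sketch: Ward's method on the study variables, cut into 3 clusters,
# then flag the smallest ("oddball") cluster for inspection.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))                 # placeholder for the real cases-by-variables data

Z = linkage(data, method="ward")                 # Ward's method (Euclidean-based)
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters

sizes = {c: int((labels == c).sum()) for c in np.unique(labels)}
oddball = min(sizes, key=sizes.get)              # smallest cluster = candidate "oddball" cases
oddball_cases = np.where(labels == oddball)[0]
print("cluster sizes:", sizes, "oddball case indices:", oddball_cases[:10])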

"Bootstrapping" could also be used to investigate AVE, and possibly improve it (see p. 17 of Step V in "Testing Latent Variable Models with Survey Data" monograph on the previous web page). Experience suggests that AVE may vary across bootstrap subsamples, suggesting that there is a case(s) in the sample that is contributing to low AVE.

However, reporting the results of these procedures may be tricky. One option would be to report the average AVE from the bootstrap (if it is more "acceptable" than the full sample AVE). Another would be to drop the offending case(s) if doing so improves AVE. A third option might be to do both. In all cases, a) the full sample AVE probably should be reported, b) a brief comment probably should be added to the effect that the bootstrap or omitted-case(s) AVE was reported to shed additional light on the sample's underlying error structure (i.e., the low AVE may have been due to a few cases), and c) measurement and structural model results from both the full sample and the sample with case(s) omitted probably should be reported.

Curiously, experience suggests that measurement and structural model results are not particularly sensitive to the case(s) omitted to improve an AVE in the neighborhood of .5, although in real-world data a borderline significance(s) may change. In different words, an LV with an AVE of .45 behaves practically the same in its measurement and structural models as it does with an AVE of .5 in real-world data. (However, this is no longer the case as AVE continues to move away from .5 in either direction.)

However, while the above procedures sometimes "improve" convergent validity, experience suggests that they usually will not remedy discriminant validity problems. (There are other procedures to improve discriminant validity, but at some point it may be "better science" to simply admit that the measures need more work.)