Recent developments in method bias

Development of method bias models for Big Five questionnaires.

The original Faking model. Nhung Nguyen’s dissertation data – Summer 2003

Nhung had gathered data involving administration of the 50-item IPIP Big Five questionnaire twice – once with instructions to respond honestly, once with instructions to fake good.

When working with these data on a paper involving faking of situational judgment tests, I hit on the idea that the Big 5 latent variables were common across the two instructional conditions, but that there was an additional influence on responding in the faking condition. I later found that others had considered this notion. This led to the following set of models.

Model 1: Basic CFA of Parcels formed from Honest and Faked Questionnaire Items.

Below is the model above with one additional latent variable, representing individual differences in the tendency to “add a little to each response” or “subtract a little from each response”, called F, for faking here.

To simplify the presentation, the 3 regression arrows to each set of 3 parcels are represented as a single arrow.
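
For readers who want to experiment with this structure, here is a minimal sketch of the model in lavaan-style syntax, using the Python semopy package. The parcel names (e_h1 . . . o_f3, for the honest and faked parcels of each dimension) and the data file are hypothetical placeholders, and the identification constraints we actually used (e.g., whether F is allowed to correlate with the traits) are not shown.

    import pandas as pd
    import semopy

    # Each Big Five factor spans its honest (h) and faked (f) parcels;
    # F loads only on the faked parcels.
    lines = []
    for factor, d in zip("EACSO", "eacso"):
        parcels = " + ".join(f"{d}_{c}{i}" for c in "hf" for i in (1, 2, 3))
        lines.append(f"{factor} =~ {parcels}")
    lines.append("F =~ " + " + ".join(f"{d}_f{i}" for d in "eacso" for i in (1, 2, 3)))
    desc = "\n".join(lines)

    data = pd.read_csv("parcels.csv")    # hypothetical file of parcel scores
    model = semopy.Model(desc)
    model.fit(data)
    print(semopy.calc_stats(model))      # chi-square, CFI, RMSEA, etc.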

Note that the fit is not spectacular, but that it is much better than the fit of the previous model.

Since we were afraid that the paper might be rejected outright because the fit indices were not close enough to the traditional threshold values, we looked around for ways to improve fit. We realized that when participants are asked to respond to the same item twice, even under different instructional sets, their responses to those identical items will both be influenced by specific, idiosyncratic aspects of the items. Thus, across participants, responses to identical items will be positively correlated.

These idiosyncratic influences are part of the “other” influences captured by the residual terms. So we allowed the residuals of identical testlets to be correlated. This led to the following model . . .
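
In the sketch's syntax, those correlated residuals are just a set of "~~" statements between each honest parcel and its identical faked twin (hypothetical names as before):

    # Residual covariances between identical honest/faked parcels.
    pairs = [f"{d}_h{i} ~~ {d}_f{i}" for d in "eacso" for i in (1, 2, 3)]
    residual_lines = "\n".join(pairs)   # append to the model description before fitting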

The fit of this model is closer to being acceptable. Note that F influences only the faked items, not the honest items. We have since discovered that there is an analogous influence on the honest items, one we call M, for method bias.

The fit was better, but still not quite at “rejection-proof” levels.

We considered other possibilities and discovered that there were positive correlations among the F testlets that were not accounted for by the loadings of those testlets on the single F factor. These seemed to be dimension-specific effects. To account for them we could have introduced a different F latent variable for each dimension. Instead, we chose to allow the residuals of testlets within each dimension to be correlated. This led to the following, final model . . . We felt that the fit of this model was acceptable, and submitted the paper for presentation to SIOP, 2004 based on it.
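
In the sketch's syntax, those within-dimension residual covariances among the faked testlets can be generated as follows (hypothetical names as before):

    from itertools import combinations

    # Residual covariances among faked parcels within each dimension.
    within = [f"{a} ~~ {b}"
              for d in "eacso"
              for a, b in combinations([f"{d}_f{i}" for i in (1, 2, 3)], 2)]
    within_lines = "\n".join(within)    # append to the model description before fitting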

Faking Model Conceptualized as a Longitudinal Growth Model – ALL of Summer, 2004

For each dimension, the single-letter latent variable is the Intercept. The latent variable whose name begins with F is the slope.

While my wife and son and daughter-in-law visited Europe, I stayed home to try to develop a perspective on the model we had developed. I did this in response to the remark of a reviewer of the SIOP paper who said, “learn all you can about longitudinal models . . .” I spent the summer doing just that and figuring out how to conceptualize the faking model as an LGM. It turns out to have been a bust, I think.

Measuring Method bias in Honest conditions – 2006.

F in the above is a latent variable that represents a systematic bias on the part of participants to adjust their responses to ALL items on a questionnaire. Most participants adjusted their responses positively in the faking condition, but some adjusted them negatively. This type of adjustment is what has been studied under the heading of method bias for more than 20 years. So it could be said that the above model is consistent with the conceptualization that faking is a form of method bias that emerges under instructions to fake.

The existence of a method bias in the faking condition led to the question: Is there an analogous (or a different) bias occurring when participants are instructed to respond honestly? The natural extension of the above model is one in which a latent variable like F is added, with the Honest testlets as indicators. Here it is . . .

Note that this model fits the data quite well (if you ignore the chi-square).
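
In terms of the earlier sketch, the only change is one more method factor loading on the honest parcels (hypothetical names as before; identification constraints are again omitted):

    # Method factor over the honest parcels.
    honest = [f"{d}_h{i}" for d in "eacso" for i in (1, 2, 3)]
    m_line = "M =~ " + " + ".join(honest)   # append to the model description
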
Estimating method bias from a single session of data – 2006.

At the time we believed that the ability to estimate a method bias latent variable depended on our use of two-condition data – with an honest condition and a faking condition.

We then decided to see whether or not the method latent variables (M or F) could be estimated from the data of only one condition. Here are the results for the H condition of Nhung’s study . . .

The date on the output is probably not correct, since we didn’t start looking at M until 2005. I often change models without changing the documentation associated with them. That is the weak point of documentation – it must be kept consistent. Who has the time?

This model is significant in two ways. First, it demonstrates that the “general” factor (called M here) can be estimated from the data of ONE condition. Second, it demonstrates that there is apparently a general factor effect even when participants are told to respond honestly.
The following is the “same” model applied to only the Nguyen faking condition data.

The main points of this and the previous page are that

1) Method effects exist in both faked data and in honest data, and

2) Big 5 latent variables AND a method bias latent variable could be measured from the data of a single instructional condition.
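
For those who want to try this, here is a minimal single-condition sketch in the same hypothetical syntax (honest condition shown; the faked-condition version is identical with M relabeled F; identification constraints on the method factor are not shown):

    import pandas as pd
    import semopy

    # Five trait factors plus one method factor, all from one condition.
    lines = []
    for factor, d in zip("EACSO", "eacso"):
        lines.append(f"{factor} =~ " + " + ".join(f"{d}_h{i}" for i in (1, 2, 3)))
    lines.append("M =~ " + " + ".join(f"{d}_h{i}" for d in "eacso" for i in (1, 2, 3)))

    data = pd.read_csv("honest_parcels.csv")   # hypothetical file of parcel scores
    model = semopy.Model("\n".join(lines))
    model.fit(data)
    print(semopy.calc_stats(model))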

2007 – Measuring faking from a single session.

The fact that the method latent variable could be measured from the data of a single session meant that it might be possible to measure “faking” from the data of a single session, something that has been done only once, by Cellar et al. in 1996. We applied the faking model to both conditions of Nhung’s data and then to only the faking-condition data.

For each application, we computed factor scores of the F latent variable. If the F latent variable in the one-condition data was measuring faking in the same way as the F latent variable in the two-condition data, the factor scores should be highly correlated. Here’s a scatterplot of the faking latent variable factor scores from Nhung’s data and from the data of a follow-up study with Lyndsay Wrensen . . .

b. Wrensen and Biderman (2005)

The above relationships strongly suggest that faking measured from one-condition data is highly correlated with faking measured from two-condition data. This suggests that the faking model could be applied to the data of a single session and the amount of faking by participants in that session measured.
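
The check behind the scatterplot is just a correlation between the two sets of factor scores. A sketch, assuming the scores have been exported to text files (file names hypothetical; in practice the factor scores would come from the fitted models, e.g., via Mplus’s SAVEDATA facility):

    import numpy as np
    from scipy.stats import pearsonr

    # F factor scores from the two-condition and one-condition models.
    f_two = np.loadtxt("f_scores_two_condition.txt")
    f_one = np.loadtxt("f_scores_one_condition.txt")
    r, p = pearsonr(f_two, f_one)
    print(f"r = {r:.3f}, p = {p:.4g}")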

Parcels vs. Items as Indicators – 2007

The models above were all applied to testlet/parcel data. That is, each indicator was the average of responses to two or three items. We did that originally because of a belief that we would not get acceptable goodness-of-fit unless we applied the models to parcel data. In the period 2005-2007 we began considering the use of individual items as indicators, rather than parcels. One reason for this was that having more indicators gives you more degrees of freedom, and allows you more freedom to estimate latent variables. The downside is that I believe there is a general tendency for models of individual items to have poorer fit indices than models of parcels. For example, below are graphs of fit indices CFI and RMSEA for individual-item indicators and 2-item parcel indicators for the same data. Note that goodness-of-fit is generally better for the 2-item parcel data; in particular, the CFI values move from traditionally unacceptable to traditionally acceptable when parcels are the indicators.
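
For concreteness, here is how 2-item parcels can be formed, assuming hypothetical item columns e1 . . . e10, a1 . . . a10, and so on; the actual item-to-parcel assignment we used may have differed:

    import pandas as pd

    items = pd.read_csv("ipip50.csv")   # hypothetical file of item responses

    # Average adjacent items within each dimension into 2-item parcels.
    parcels = {}
    for d in "eacso":
        for p in range(5):              # five 2-item parcels per dimension
            pair = [f"{d}{2 * p + 1}", f"{d}{2 * p + 2}"]
            parcels[f"{d}_p{p + 1}"] = items[pair].mean(axis=1)
    parcel_df = pd.DataFrame(parcels)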

Positively-worded and negatively-worded method biases – 2008.

Nhung Nguyen had mentioned in emails regarding method bias that we should look at method bias associated with item wording, specifically with positively-worded items and with negatively-worded items.

We decided to look at bias associated with positively-worded and with negatively-worded items for the several studies we’ve conducted here. Here’s what the path diagram of a Mp, Mn model with individual items as indicators looks like . . . Because of the complexity of the path diagram, all of the applications of the MpMn model have been done using Mplus, which is programmed with commands rather than figures.
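
For readers without Mplus, the MpMn structure can also be written compactly in the lavaan-style syntax of the earlier sketches. The item names, and the assumption that the first five items of each dimension are positively worded, are placeholders; the actual IPIP keying differs by dimension:

    # Build the MpMn model description programmatically.
    pos_idx, neg_idx = range(1, 6), range(6, 11)   # assumed keying
    lines = []
    for factor, d in zip("EACSO", "eacso"):
        lines.append(f"{factor} =~ " + " + ".join(f"{d}{i}" for i in range(1, 11)))
    lines.append("Mp =~ " + " + ".join(f"{d}{i}" for d in "eacso" for i in pos_idx))
    lines.append("Mn =~ " + " + ".join(f"{d}{i}" for d in "eacso" for i in neg_idx))
    lines.append("Mp ~~ Mn")
    mp_mn_desc = "\n".join(lines)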

Here’s a summary of Mplus output from application of the MpMn model to four datasets.

In each application, the 50 individual IPIP items (10 per Big Five dimension) were the indicators.

                                            Dataset
Model                        df     Nguyen    Wrensen   Damron    Sebren
------------------------------------------------------------------------
CFA with no method          1165    2252.12   2315.73   2839.79   2552.45
CFA with M                  1115    2031.74   2048.11   2449.20   2253.26
CFA with Mp, Mn             1114    1972.32   2025.24   2282.74   2184.08
M vs. no M                    50     220.38*   267.62*   390.59*   299.19*
MpMn vs. M                     1      59.42*    22.87*   166.46*    69.18*
MpMn CFI                             .785      .712      .833      .708
MpMn RMSEA                           .062      .070      .054      .072
Correlation of Mp with Mn            .766      .844      .754      .752
------------------------------------------------------------------------
* p < .001
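
The starred chi-square difference tests are ordinary likelihood-ratio tests of nested models, and can be verified with a few lines of Python (the values below are the Nguyen column of the table):

    from scipy.stats import chi2

    def chi_square_difference(chisq_reduced, df_reduced, chisq_full, df_full):
        """Likelihood-ratio test comparing two nested models."""
        delta = chisq_reduced - chisq_full
        ddf = df_reduced - df_full
        return delta, ddf, chi2.sf(delta, ddf)

    # MpMn vs. M for the Nguyen dataset.
    delta, ddf, p = chi_square_difference(2031.74, 1115, 1972.32, 1114)
    print(f"delta chi-square = {delta:.2f}, df = {ddf}, p = {p:.4g}")   # 59.42, 1, p < .001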

The bottom line is that there is considerable evidence that the responses of participants to Big Five items are influenced by

1. The amount of the particular Big 5 characteristic that each participant possesses

2. A tendency to adjust responses to all positively worded items. The adjustment is positive for some people, negligible for some, and negative for others.

3. A tendency to adjust responses to all negatively worded items. The adjustment is positive for some people, negligible for some, and negative for others.

The item-wording adjustments measured here are independent of the Big Five dimensions.

The item-wording adjustments are positively correlated with each other, although not so positively correlated that they can be treated as a single latent variable. This is shown by the significant MpMn vs. M chi-squares.

2009 – Three types of bias factor – General bias, negative bias, and positive bias

As we explored the idea that there are different bias factors associated with different item wordings, I questioned the idea that there were only two such factors. It seemed more reasonable that there are THREE bias factors – a negative, a positive, and a general bias factor. My idea was buttressed by a recent article by Marsh et al. (2010) in which three factors – a negative, a positive, and a general factor – were found to account for data of the Rosenberg Self-Esteem scale. We explored this possibility by comparing several models for five different datasets. The comparisons are summarized in the following figure from a paper recently submitted to the Journal of Research in Personality. Model 6 is the model I believe best represents Big Five questionnaire data.

Figure 1. Models compared. Each rectangle represents the items indicating a Big Five dimension. The left half of each rectangle represents positively-worded items and the right half negatively-worded items. A single arrow drawn from a factor to a rectangle represents all the loadings from that factor to the indicators represented by the rectangle. Residual latent variables have been omitted for clarity.

The results of comparisons . . .

Table 2.

Chi-square goodness-of-fit measures and chi-square difference tests.

---------------------------------------------------------------------------------------
                                        Analysis
                      1        2        3        4        5        6          df
Questionnaire        IPIP     IPIP     IPIP     IPIP     IPIP     NEO     IPIP / NEO
---------------------------------------------------------------------------------------
Chi-square
  Model 1           2174.4   2552.5   3523.0   3734.4   2568.6   3219.6   1165 / 1700
  Model 2           1901.7   2253.3   3063.7   2431.2   2241.5   2893.1   1115 / 1640
  Model 3           1853.7   2186.2   2715.7   2275.0   2230.9   2838.2   1114 / 1639
  Model 4           1786.0   2136.2   2638.9   2112.9   2085.0   2744.8   1089 / 1611
  Model 5           1758.2   2101.9   2629.6   2152.7   2044.1   2732.7   1091 / 1609
  Model 6           1642.9   1980.7   2162.2   1912.3   1962.1   2589.6   1065 / 1580
---------------------------------------------------------------------------------------
Δχ² Model 2 vs 1     272.7    299.2    459.3    672.9    327.1    326.5     50 / 60
Δχ² Model 3 vs 2      54.0     67.1    348.0    156.2     31.1     54.9      1 / 1
r(Mp, Mn)             .77      .76      .33      .50      .86      .89
Δχ² Model 4 vs 2     115.7    117.1    424.8    318.3    156.5    148.3     26 / 29
Δχ² Model 5 vs 2     143.5    151.4    434.1    278.5    197.4    160.4     24 / 31
Δχ² Model 6 vs 2     258.8    272.6    901.5    518.9    279.4    303.5     50 / 60
Δχ² Model 6 vs 4     143.1    155.5    476.7    200.6    122.9    155.2     24 / 31
Δχ² Model 6 vs 5     115.3    121.2    467.4    240.4     82.0    143.1     26 / 29
---------------------------------------------------------------------------------------
Note. For Analysis 3, residual variance of one item set to .001.
For Analysis 4, residual variance of one item set to .001.
For Analysis 5, variance of Mp set to .001.

In all data sets the most general model, Model 6, fit significantly better than any of the other models.

This suggests that the most appropriate model for Big Five data is one that includes EIGHT factors – 5 Big Five trait factors and THREE method bias factors – one influencing only negatively worded items, a second influencing only positively worded items, and a third influencing all items.

Here is a more detailed figure representing Model 6.
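
In the syntax of the MpMn sketch above, Model 6 adds only one more line: a general bias factor M loading on all 50 items. Identifying three overlapping bias factors requires constraints (e.g., keeping M, Mp, and Mn mutually uncorrelated); the exact constraints we used are not shown here.

    # Model 6: general bias factor over all items (names as in the MpMn sketch).
    m_line = "M =~ " + " + ".join(f"{d}{i}" for d in "eacso" for i in range(1, 11))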

But wait, there’s more . . .

2010 – Method factors as measures of well-being??

A few years ago a person with whom I was acquainted was going through some very rough times. One of the primary problems was depression. That person had taken a Big Five questionnaire. When the Big Five questionnaire was scored for just the five traits, there was nothing terribly unusual about the profile of scores. Even the Emotional Stability score, while below average, was not as far below average as one would have expected based on the severity of the depression at that time.

However, when the Big Five was scored for SIX factors – the Big Five plus M – a striking profile emerged. The person’s scores on the Big Five traits, including Emotional Stability, were nearly normal, but the person’s M score was VERY low. At the time, we were still considering M to be a measure of faking, and I didn’t do anything immediately with the information. It was one of those isolated pieces of information that you store away for future reference.

Last year, I gathered data on the Big Five, and remembering that person’s profile, I included a measure of depression and also a measure of self-esteem in the questionnaire packet that was given to students. In the analysis of the data, I correlated M scores with both depression and self-esteem scores.

Here are the results, from a paper under review at SIOP for next year . . .

Table 1. Means, standard deviations, correlations, and reliability coefficients for study variables.

---------------------------------------------------------------------------------------
       Mean    SD      E       A       C       S       O       M      CCD     RSE
---------------------------------------------------------------------------------------
E      4.75   1.04    .885
A      5.30   0.74    .317c   .789
C      4.57   0.86    .007    .164a   .823
S      4.24   0.99    .237b   .176a  -.021    .842
O      4.85   0.82    .244c   .335c   .270c   .156a   .812
M      0.00   0.38    .714c   .616c   .231b   .592c   .292c   .912
CCD    1.84   0.83   -.202b  -.309c  -.330c  -.284c  -.192b  -.412c   .920
RSE    5.65   0.87    .285c   .188a   .381c   .242c   .359c   .401c  -.674c   .847
---------------------------------------------------------------------------------------
a p < .05   b p < .01   c p < .001
Note. Reliability coefficients are on the diagonal.

M correlates very negatively with Depression (CCD) and very positively with Self Esteem (RSE). Its correlations with these are larger than the correlations of Emotional Stability (S) with both.

In fact, we argued in the paper that the correlations of S with both CCD and RSE were spurious, caused by the influence of M on the Big Five, CCD, and RSE scores.

When the effect of M on S is removed, the correlation between “purified” S and CCD was .01 and with RSE was .00.
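
The “purification” here is simple partialling: residualize S on M, then correlate the residual with CCD and RSE. A sketch with numpy (the score file and its layout are hypothetical):

    import numpy as np

    def residualize(y, x):
        """Return y with the linear effect of x removed."""
        slope, intercept = np.polyfit(x, y, 1)
        return y - (slope * x + intercept)

    # Columns: S, M, CCD, RSE scores, one row per participant.
    s, m, ccd, rse = np.loadtxt("scores.csv", delimiter=",", skiprows=1).T
    s_pure = residualize(s, m)
    print(np.corrcoef(s_pure, ccd)[0, 1])   # ≈ .01 in our data
    print(np.corrcoef(s_pure, rse)[0, 1])   # ≈ .00 in our data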

2011 – The Big Two and the General Factor of Personality (GFP)

Several theorists believe that there are higher order factors that influence the Big Five.

The Big Two theorists believe that there are two 2nd-order factors – Stability and Plasticity. Stability is believed to influence Agreeableness, Conscientiousness, and Emotional Stability. Plasticity is believed to influence Extraversion and Openness.

Other theorists believe that there is a single higher-order factor – called the general factor of personality, or GFP. It has been conceptualized as a 3rd-order factor, influencing Stability and Plasticity.

2011 – M and the GFP

Contrast the GFP model with the models we’ve been considering. Clearly they 1) are different and 2) can coexist.

Our data have indicated that when M is estimated, the correlations between the Big Five factors are reduced to essentially zero. Since the indicators of a factor must be correlated (else there is no reason for the factor), this result provides little support for the GFP as presented below.

Some studies regarding the GFP have used the first unrotated factor in an EFA of items. They found that the GFP estimated in that way correlated positively with self-presentation. But I would argue that what they’ve done is get crude estimates of M and have replicated our finding of the relationship of M to self-presentation.