Bond and Fox, Chapter 6 – Likert Scales
The authors spend pages 102-103 of the 2nd edition spreading what might be termed lies about Likert scaling.
They suggest that the response of 5 on a 5-point scale is taken by researchers to mean 5 times the value of the response of 1 on the same 5-point scale. No one who knows anything about measurement would presume that.
There are some assumptions underlying common usage of Likert scales that can be questioned, however.
For example,
Does a 5 response to Item 1 from an Extraversion scale represent the same amount of Extraversion as a 5 response to Item 10 from the same scale?
Does a 5 response to Item 1 represent more extraversion than a 3 response to Item 3?
Does a 3 response to item 1 represent more Extraversion than a 2 response to the same item?
Is the difference between a 2 and a 3 response the same as the difference between a 4 and a 5?
These are the kinds of issues that Rasch modeling attempts to address.
The relationship between Likert Scales and Right/Wrong tests
Consider the IPIP Extraversion item, “Am the life of the party.”
A person who is extraverted will agree with this item.
Thus, an agreement response is the “correct” response for Extraverts to this item.
So, if we count the number of agreement responses to the Extraversion items, that’s counting the number of “correct” responses to those items. The person with the higher count has more Extraversion, i.e., greater “ability” on the Extraversion dimension.
So, if we were to score all Likert items as Disagree=0 and Agree=1 and if we were to count the 1s to get the person’s total Extraversion score, then scoring them would be no different from scoring a Right/Wrong test by counting the 1s.
That is, a Likert item can be viewed as simply a Right/Wrong item in which the Agreement response is the correct answer.
Measurement of personality and intelligence is essentially the same??
Example of scoring a personality test as Right/Wrong, i.e., treating responses as a dichotomy
The data here are taken from a study in which the IPIP and the NEO-FFI Big Five instruments were compared.
There were 189 respondents – UTC UG and G students. Half filled out the IPIP first and the NEO second. The order was reversed for the other half. Bias Study.
The focus here will be on the IPIP Extraversion scale.
The IPIP Extraversion items are
1 I am the life of the party.
2 I don't talk a lot.
3 I feel comfortable around people.
4 I keep in the background.
5 I start conversations.
6 I have little to say.
7 I talk to a lot of different people at parties.
8 I don't like to draw attention to myself.
9 I don't mind being the center of attention.
10 I am quiet around strangers.
The standard SPSS stuff on the data is below. Alpha was .890. Negatively worded items were reverse-scored prior to this and all following analyses.
Scoring as a Right/Wrong test.
To score a test as a right/wrong test, the responses must be dichotomized into Disagreement vs Agreement.
The dichotomization should be made near the middle of the response scale.
I’ll actually try all possible dichotomizations here, in order to show you why the middle is better.
First, I dichotomized the item responses as 1=0 and 2,3,4,5=1 and computed the sum of 1s.
Then I dichotomized the item responses as 1,2=0 and 3,4,5=1 and computed the sum of 1s.
Next was 1,2,3=0 and 4,5=1 and computed the sum of 1s.
Finally, it was 1,2,3,4=0 and 5=1. . . . (Boy, I had so much energy when I was young.)
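The four dichotomizations can be sketched in a few lines of Python. This is only an illustration, not the SPSS syntax actually used; the function names are hypothetical, and the example respondent is case 2 from the first ten cases listed in the traditional analysis later in these notes.

```python
def dichotomize(responses, cut):
    """Score each 1-5 response as 1 if it is above `cut`, otherwise 0."""
    return [1 if r > cut else 0 for r in responses]


def right_wrong_score(responses, cut):
    """Count the 1s, i.e., the number of "correct" (agreement) responses."""
    return sum(dichotomize(responses, cut))


# Case 2 from the first ten cases (negatively worded items already reverse-scored).
case = [4, 5, 4, 4, 4, 4, 4, 2, 4, 4]

# cut=1 means 1=0 vs 2,3,4,5=1; cut=2 means 1,2=0 vs 3,4,5=1; and so on.
scores_by_cut = {cut: right_wrong_score(case, cut) for cut in (1, 2, 3, 4)}
print(scores_by_cut)  # {1: 10, 2: 9, 3: 9, 4: 1}
```

For this respondent the two extreme cut points give a score at or near the ceiling or the floor, while the two middle cut points agree with each other.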
The correlations of the 4 new right/wrong dichotomies with the original gold standard summated mean-of-responses Extraversion scale scores are
Correlations with Extraversion gold standard scale scores (biasiext)
Statistic / biasid1iext / biasid2iext / biasid3iext / biasid4iext
Pearson Correlation / .673 / .900 / .929 / .671
Sig. (2-tailed) / .000 / .000 / .000 / .000
N / 189 / 189 / 189 / 189
As you can see, the two “middle” dichotomizations – 1,2=0 vs 3,4,5=1 and 1,2,3=0 vs 4,5=1 – yielded “right/wrong” scale scores that were very highly correlated with the original summated scores. Dichotomizing at 1 vs 2,3,4,5 and at 1,2,3,4 vs 5 yielded scores that were less valid.
I’m not recommending that you do this. But in a pinch, you can get a fairly good estimate of a personality summated scale score from dichotomized variables treated as if they were right/wrong items, as long as the dichotomization is in the “middle” of the Likert response format.
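The comparison can be reproduced in outline as follows. This is a minimal sketch that uses only the first ten cases listed in the traditional analysis later in these notes, so its correlations will not equal the values reported for all 189 respondents. The `pearson` helper and variable names are illustrative assumptions.

```python
import math

# First ten cases (negatively worded items already reverse-scored).
CASES = [
    [3, 3, 4, 3, 4, 3, 3, 3, 3, 3],
    [4, 5, 4, 4, 4, 4, 4, 2, 4, 4],
    [4, 4, 4, 4, 4, 4, 4, 3, 4, 3],
    [2, 4, 4, 4, 4, 4, 5, 1, 1, 2],
    [2, 2, 4, 3, 3, 3, 2, 2, 3, 2],
    [4, 3, 3, 4, 4, 2, 4, 3, 3, 4],
    [3, 5, 4, 3, 4, 4, 4, 3, 3, 2],
    [1, 2, 2, 2, 2, 2, 2, 1, 1, 2],
    [4, 5, 4, 4, 5, 5, 5, 4, 5, 4],
    [2, 4, 2, 4, 4, 2, 2, 2, 3, 2],
]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# "Gold standard" score: mean of the ten original responses.
likert_means = [sum(c) / len(c) for c in CASES]

correlations = {}
for cut in (1, 2, 3, 4):
    # Right/wrong score: number of responses above the cut point.
    counts = [sum(1 for r in c if r > cut) for c in CASES]
    correlations[cut] = pearson(counts, likert_means)

for cut, r in correlations.items():
    print(f"cut above {cut}: r = {r:.3f}")
```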
Here’s a screen shot of the first few cases of the original responses (biasiext) and the 1,2,3=0 / 4,5=1 dichotomized responses (biasd3ext) . . .
As you can see, the scale scores based on the dichotomized items, biasd3iext, look different from the scale scores based on actual responses, biasiext. However they’re quite highly correlated.
Rasch analysis of personality items dichotomized so that they’re like right/wrong items.
Here’s the Rasch analysis of the above 1,2,3=0 vs 4,5=1 items as if they were right/wrong items.
(File is BondFoxChapter6D.txt)
Item information
TABLE 14.1 Bias Study Dichotomized (1,2,3vs4,5)Ex ZOU872WS.TXT Apr 1 19:38 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 2 CATS 1.0.0
------
Person: REAL SEP.: 1.49 REL.: .69 ... Item: REAL SEP.: 6.30 REL.: .98
Item STATISTICS: ENTRY ORDER
+------+
|ENTRY TOTAL MODEL| INFIT | OUTFIT |PTMEA|EXACT MATCH| |
|NUMBER SCORE COUNT MEASURE S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR.| OBS% EXP%| Item |
|------+------+------+-----+------+------|
| 1 58 189 1.74 .21|1.01 .1| .85 -.4| .63| 81.3 81.3| 01 Life of party |
| 2 121 189 -.80 .20| .94 -.6|1.00 .1| .65| 78.3 78.4| 02R Don't talk a lot |
| 3 138 189 -1.54 .22|1.06 .6| .78 -.6| .60| 78.3 81.4| 03 Comfortable around people|
| 4 94 189 .26 .20| .84 -1.7| .74 -1.7| .71| 81.3 77.6| 04R Keep in background |
| 5 137 189 -1.49 .22| .80 -1.9| .66 -1.1| .67| 84.9 81.1| 05 Start conversations |
| 6 140 189 -1.63 .22|1.05 .5| .81 -.5| .59| 81.9 81.9| 06R Have little to say |
| 7 111 189 -.40 .20| .92 -.8|1.16 .9| .66| 79.5 77.6| 07 Talk to diff people |
| 8 42 189 2.55 .24|1.11 .9|1.63 1.4| .54| 84.9 84.8| 08R Don't draw attention |
| 9 97 189 .14 .20|1.36 3.4|1.28 1.5| .55| 63.9 77.4| 09 Don't mind being center |
| 10 71 189 1.17 .20| .97 -.3| .83 -.8| .66| 79.5 79.3| 10R Quiet around strangers |
|------+------+------+-----+------+------|
| MEAN 88.9 166.0 .00 .21|1.01 .0| .97 -.1| | 79.4 80.1| |
| S.D. 33.1 .0 1.38 .01| .15 1.4| .29 1.0| | 5.6 2.3| |
+------+
There is one item in this analysis that fits very poorly – “Don’t mind being the center of attention” (infit ZSTD = 3.4). In the traditional reliability analysis this item also had the lowest corrected item-total correlation (.524), although no item’s removal would have increased alpha.
Same information as above, but ordered by Extraversion
Top items – Respondent must have a LOT of E to endorse – lots of E required to get this “right”.
Bottom items – Respondent may endorse with just a little E – little E required to get this “right”
TOTAL SCORE is the number of 4/5 responses after reverse-scoring of negatively worded items. A small TOTAL SCORE marks an item that only those high in Extraversion would endorse.
TABLE 13.1 Bias Study Dichotomized (1,2,3vs4,5)Ex ZOU872WS.TXT Apr 1 19:38 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 2 CATS 1.0.0
------
Person: REAL SEP.: 1.49 REL.: .69 ... Item: REAL SEP.: 6.30 REL.: .98
Item STATISTICS: MEASURE ORDER
+------+
|ENTRY TOTAL MODEL| INFIT | OUTFIT |PTMEA|EXACT MATCH| |
|NUMBER SCORE COUNT MEASURE S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR.| OBS% EXP%| Item |
|------+------+------+-----+------+------|
| 8 42 189 2.55 .24|1.11 .9|1.63 1.4| .54| 84.9 84.8| 08R Don't draw attention |
| 1 58 189 1.74 .21|1.01 .1| .85 -.4| .63| 81.3 81.3| 01 Life of party |
| 10 71 189 1.17 .20| .97 -.3| .83 -.8| .66| 79.5 79.3| 10R Quiet around strangers |
| 4 94 189 .26 .20| .84 -1.7| .74 -1.7| .71| 81.3 77.6| 04R Keep in background |
| 9 97 189 .14 .20|1.36 3.4|1.28 1.5| .55| 63.9 77.4| 09 Don't mind being center |
| 7 111 189 -.40 .20| .92 -.8|1.16 .9| .66| 79.5 77.6| 07 Talk to diff people |
| 2 121 189 -.80 .20| .94 -.6|1.00 .1| .65| 78.3 78.4| 02R Don't talk a lot |
| 5 137 189 -1.49 .22| .80 -1.9| .66 -1.1| .67| 84.9 81.1| 05 Start conversations |
| 3 138 189 -1.54 .22|1.06 .6| .78 -.6| .60| 78.3 81.4| 03 Comfortable around people|
| 6 140 189 -1.63 .22|1.05 .5| .81 -.5| .59| 81.9 81.9| 06R Have little to say |
|------+------+------+-----+------+------|
| MEAN 88.9 166.0 .00 .21|1.01 .0| .97 -.1| | 79.4 80.1| |
| S.D. 33.1 .0 1.38 .01| .15 1.4| .29 1.0| | 5.6 2.3| |
+------+
The item “Negation of: Don’t draw attention to myself” is the item only the most extraverted “got correct”. Most respondents agreed with this item. Only those respondents who were the most highly extraverted disagreed with it – they DO draw attention to themselves.
The item “Negation of: Have little to say” is the item most people, from the least extraverted to the most extraverted, “got correct”. People even just slightly extraverted disagree with this – they DO have something to say. Only an extreme introvert would agree with this item.
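The link between an item's measure and how often it is "gotten correct" follows from the dichotomous Rasch model, in which the probability of endorsement depends only on the difference between the person measure and the item measure (both in logits). The sketch below uses the two extreme item measures from the table above; the person at 0 logits is hypothetical, and the function name is an illustrative assumption.

```python
import math


def rasch_probability(theta, delta):
    """P(endorse) under the dichotomous Rasch model; theta and delta in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


# Item measures taken from the MEASURE ORDER table above.
item_measures = {
    "08R Don't draw attention": 2.55,
    "06R Have little to say": -1.63,
}

theta = 0.0  # hypothetical person at the mean of the item measures

probs = {label: rasch_probability(theta, d) for label, d in item_measures.items()}
for label, p in probs.items():
    print(f"{label}: {p:.2f}")
# 08R Don't draw attention: 0.07
# 06R Have little to say: 0.84
```

Under the model, this hypothetical person would rarely "get" item 8 correct but would usually "get" item 6 correct.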
Item Map of dichotomized items
TABLE 12.2 Bias Study Dichotomized (1,2,3vs4,5)Ex ZOU872WS.TXT Apr 1 19:38 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 2 CATS 1.0.0
------
Persons MAP OF Items
<more>|<rare>
3 ###### +
.############ |
|T
|
|
| 08R Don't draw attention
|
|
|
|
2 S+
########## |
|
| 01 Life of party
|
|
|S
|
| 10R Quiet around strangers
######### |
1 +
|
|
|
|
######## |
|
| 04R Keep in background
M|
| 09 Don't mind being center
0 +M
########### |
|
|
| 07 Talk to diff people
|
######## |
|
| 02R Don't talk a lot
|
-1 +
|
.########## |
|
|S
S| 03 Comfortable around people
| 05 Start conversations
| 06R Have little to say
|
|
.######## |
-2 +
|
|
|
|
|
|
|
.##### |T
|
-3 .##### +
<less>|<frequ
EACH '#' IS 2.
The persons on the left extend beyond the items in the high extraversion direction (upwards) so there were people who likely got all items “correct.” People also extend beyond the items in the low extraversion direction (downwards), so there were people who didn’t get many items “correct” in this dichotomization. The fact that the distribution of person “Extraversion” values extends beyond the item values is good for estimation of item Extraversion.
Comparison of person measures from the dichotomized scale vs Likert measures.
I pasted the Rasch person measures into SPSS and created a scatterplot with the Rasch person measures from the analysis of dichotomized items on the vertical axis and the Likert summated scores of the original responses on the horizontal axis. Note that the person measures are based on the (1,2,3 vs 4,5) dichotomized values treated as wrong (1,2,3)/right (4,5) answers. Here’s the scatterplot . . .
There are multiple Likert scale scores for each Rasch measure value because a given proportion “correct” could be obtained by means of different combinations of the 10 Extraversion responses.
The r-squared is .861, which means the r is .928, essentially the same as the correlation of the sum of “correct” responses found above (.929, p 3 of this lecture).
The correlation of Rasch measures of dichotomized items with the traditional Likert summated scale scores is very large. In spite of this, inspection of the scatterplot yields some reason not to treat them as being completely equivalent. For example, the points circled in the scatterplot all have the same Rasch score of -2.2 or so. Yet their Likert means range from 2 to 3.2, a wide range of Likert scores all for persons with the same Rasch score. The same criticism applies to the highest and lowest Rasch measures. Each single Rasch value represents a range of summated score values. This is due to the dichotomization.
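A small sketch shows how this happens. When everyone answers the same items, the Rasch person measure for dichotomous data depends only on the count of "correct" responses, so two people with the same count receive the same measure whatever their original responses were. The two response patterns below are hypothetical.

```python
def dichotomized_count(responses, cut=3):
    """Number of responses above the cut (1,2,3 = 0 vs 4,5 = 1 by default)."""
    return sum(1 for r in responses if r > cut)


def likert_mean(responses):
    """Traditional summated score expressed as the mean response."""
    return sum(responses) / len(responses)


# Two hypothetical respondents, each with exactly one 4/5 response.
person_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 4]
person_b = [3, 3, 3, 3, 3, 3, 3, 3, 3, 5]

for name, resp in (("A", person_a), ("B", person_b)):
    print(name, dichotomized_count(resp), likert_mean(resp))
# A 1 1.3
# B 1 3.2
```

Both respondents have one "correct" answer and so would share a Rasch measure, yet their Likert means are almost two scale points apart.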
So we won’t continue to dichotomize the Likert responses and analyze them as right/wrong answers. This was what is called an intellectual exercise – an exercise in “what if”.
Using all the response values in analyses of Likert items – Analysis as Dimension Scales
Start here on 3/25/15 or 4/5/17 or 4/12/17.
We hardly ever dichotomize Likert items and score them as right/wrong.
While personality items can be considered to be kind of like right/wrong items, the two types of item are not exactly the same – specifically, strength of agreement with a personality item has no direct counterpart in right/wrong answers. In right/wrong answers, you’re either right or you’re wrong. “Amount of rightness” or “amount of wrongness” is rarely measured.
To assess this strength of agreement with the statement, most Likert items use response formats with multiple response options, each with a different value, such as 1 thru 5 or 1 thru 7. Appropriate analyses of such items must take all the information in the responses into account.
Dichotomizing as done above ignores differences in responses within the two categories.
For example, a response of 1 is not the same as a response of 2 or 3, even though they were all categorized as 0 in the above scheme. A response of 7 is more agreement than a response of 5, so treating them both as simply “Agree” jus’ ain’t right.
Rasch (and general IRT) analysis can incorporate multiple responses to each item.
The ideas behind these analyses date to the seminal work of L. L. Thurstone in the 1920s.
It uses a model of respondent behavior that assumes the reading of an item results in an internal “amount” of agreement. “I am the life of the party.”
------|------|------|------|------
This internal “amount” is evaluated against the perceived response alternatives each of which also has an internal representation on the same continuum.
------|------|------|------|------
SD D N A SA
The response alternative to which the internal “amount” is closest is picked as the response.
This means that both the perception of the item and the perceptions of the response alternatives are internalized and placed on an internal continuum.
The assumption is made that there is a dividing line between the internalization of adjacent response alternatives.
These dividing lines are called thresholds in much of the literature on scaling of Likert data.
The Thurstone assumption
T1 T2 T3 T4
------|------|------|------|------
SD D N A SA
For example, the internal representation of “I am the life of the party” is closest to the internal representation of “Agree”, so the participant responds A (for Agree) to that item.
So, the program estimates K-1 thresholds for a K-response format. This means that for a 5-response format with SD, D, N, A, SA, it would estimate 4 thresholds.
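The threshold idea can be sketched as a simple lookup: the internal "amount" of agreement falls between two adjacent thresholds, and the response is the category for that interval. The threshold values below are hypothetical and chosen only for illustration; they are not estimates from the data.

```python
from bisect import bisect_right

LABELS = ["SD", "D", "N", "A", "SA"]

# Hypothetical threshold locations T1..T4 on the internal continuum.
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]


def respond(amount, thresholds=THRESHOLDS):
    """Return the response category whose interval contains `amount`."""
    # bisect_right counts how many thresholds lie at or below the amount,
    # which is the index of the chosen category.
    return LABELS[bisect_right(thresholds, amount)]


print(respond(0.9))   # A
print(respond(-2.0))  # SD
print(respond(2.0))   # SA
```

With K = 5 response labels there are K-1 = 4 dividing lines, which is why the list of thresholds is one shorter than the list of labels.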
The Bond and Fox program also estimates values for the midpoints of the intervals between thresholds – the points corresponding to SD, D, N, A, and SA in the above illustration.
The program reports the threshold values as well as response values on the same scale as people and items, in case you want to use them.
The program estimates the following for Likert data . . .
1. Person “ability” values. More appropriately, person positions on the dimension.
2. Item “difficulty” values. Position of items on the same personality dimension.
Items with high numeric positions are those only endorsed by persons with the most of whatever it is that the items represent, e.g., Extraversion.
3. Threshold values relative to each item.
We won’t pay much attention to these in the hope of avoiding sensory overload.
Trying to deal with 4 thresholds X 50 items, for example, would mean that we would have to try to interpret 200 different values. It’s probably the case that individual item thresholds are about the same for each item, so . . .
4. Average threshold values – we’ll consider these.
5. Average Response “marker” values – locations on the internal continuum corresponding to each response category.
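To show how person position, item position, and a shared set of thresholds combine, here is a minimal sketch that assumes the Andrich rating scale form of the Rasch model. These notes do not state which polytomous model the program fits, so that choice is an assumption, and the threshold and person values below are hypothetical. Only the item measure (.80 for "Life of party") comes from the Likert analysis reported later in these notes.

```python
import math


def category_probabilities(theta, delta, taus):
    """Rating scale model: probability of each of the len(taus)+1 categories.

    theta = person position, delta = item position, taus = thresholds shared
    across items (all in logits).
    """
    # The exponent for category k is the running sum of (theta - delta - tau_j)
    # over the first k thresholds; the lowest category has an empty sum (0).
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


taus = [-2.0, -0.5, 0.5, 2.0]  # hypothetical thresholds
delta = 0.80                   # item 1, "Life of party"

low = category_probabilities(-1.0, delta, taus)   # hypothetical low-E person
high = category_probabilities(2.0, delta, taus)   # hypothetical high-E person

for label, p_low, p_high in zip(["SD", "D", "N", "A", "SA"], low, high):
    print(f"{label}: {p_low:.2f} {p_high:.2f}")
```

The more extraverted hypothetical person puts more probability on the upper categories, which is the sense in which all five response values carry information.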
Rasch Control file for the analysis – Bond&FoxChapter6B.txt.
The data are Extraversion items from the “Bias” study conducted several years ago. 189 persons responded to both the IPIP-50 and the NEO-FFI Big 5 questionnaires. Responses were on a 5-point scale due to the fact that the NEO questionnaire packets were set up for 5-point scales.
&INST ; initial line (can be omitted)
TITLE = "Bias Study Extraversion Items"
PERSON = Person ; persons are ...
ITEM = Item ; items are ...
ITEM1 = 5 ; column of response to first item in data record
NI = 10 ; number of items
NAME1 = 1 ; column of first character of person label
NAMELEN = 4 ; length of person label
XWIDE = 1 ; number of columns per item response
TOTAL = Yes ; show total raw scores
CHART = Yes ; produce across-pathway picture
MNSQ = No ; use Standardized fit statistics
STBIAS = Yes ; Adjust for estimation bias
MAXPAGE = 60 ; Maximum lines per page
IREFER= FFFFFFFFFF ; Forward and Reversed items; all items are F here
CODES = 12345 ; valid codes in data file
IVALUEF = 12345 ; Forward items
IVALUER = 54321 ; Reversed items if there are any reversed items
CLFILE = * ; category label file for category naming
1+1 "STD Strongly Disagree" ; Item 1 is a forward item
1+2 "D Disagree"
1+3 "N Neither A nor D"
1+4 "A Agree"
1+5 "STA Strongly Agree"
* ; end of CLFILE=* list
&END
01 Life of party
02R Don't talk a lot
03 Comfortable around people
04R Keep in background
05 Start conversations
06R Have little to say
07 Talk to diff people
08R Don't draw attention
09 Don't mind being center
10R Quiet around strangers
END LABELS
30013343433333
30034544444244
(Data lines – one for each respondent. Values of negatively-worded items are reverse-scored.)
N = 189 for this dataset.
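The column settings in the control file determine how each data line is read. The sketch below mimics that layout in Python, purely as an illustration; it is not how the Rasch program is implemented, and the function names are hypothetical. The record shown is the first data line above. The recoding table shows what IVALUER would do to an item flagged R in IREFER (none are here, since the items were reverse-scored beforehand).

```python
# Settings copied from the control file.
NAME1, NAMELEN = 1, 4      # person label starts in column 1, 4 characters long
ITEM1, NI, XWIDE = 5, 10, 1  # first response in column 5, 10 items, 1 column each
CODES, IVALUER = "12345", "54321"


def parse_record(line):
    """Split one data line into (person label, list of integer responses)."""
    label = line[NAME1 - 1:NAME1 - 1 + NAMELEN]
    start = ITEM1 - 1
    responses = [
        int(line[start + i * XWIDE:start + (i + 1) * XWIDE]) for i in range(NI)
    ]
    return label, responses


def reverse_score(response):
    """Recode a response the way IVALUER maps CODES (1->5, 2->4, ...)."""
    table = dict(zip(CODES, IVALUER))
    return int(table[str(response)])


label, responses = parse_record("30013343433333")
print(label, responses)  # 3001 [3, 3, 4, 3, 4, 3, 3, 3, 3, 3]
print(reverse_score(2))  # 4
```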
Traditional Analysis of these data.
Here are the first 10 cases
(We’ve seen this before, so it’s included here to refresh our memories.)
origorder ie1 ie2 ie3 ie4 ie5 ie6 ie7 ie8 ie9 ie10
1 3 3 4 3 4 3 3 3 3 3
2 4 5 4 4 4 4 4 2 4 4
3 4 4 4 4 4 4 4 3 4 3
4 2 4 4 4 4 4 5 1 1 2
5 2 2 4 3 3 3 2 2 3 2
6 4 3 3 4 4 2 4 3 3 4
7 3 5 4 3 4 4 4 3 3 2
8 1 2 2 2 2 2 2 1 1 2
9 4 5 4 4 5 5 5 4 5 4
10 2 4 2 4 4 2 2 2 3 2
Here’s the RELIABILITY Output
Reliability Statistics
Cronbach's Alpha / Cronbach's Alpha Based on Standardized Items / N of Items
.890 / .892 / 10
Summary Item Statistics
Statistic / Mean / Minimum / Maximum / Range / Maximum/Minimum / Variance / N of Items
Item Means / 3.353 / 2.693 / 3.820 / 1.127 / 1.418 / .183 / 10
Item-Total Statistics
Item / Scale Mean if Item Deleted / Scale Variance if Item Deleted / Corrected Item-Total Correlation / Squared Multiple Correlation / Cronbach's Alpha if Item Deleted
ie1 / 30.63 / 45.467 / .631 / .476 / .879
ie2 / 29.88 / 43.076 / .703 / .566 / .874
ie3 / 29.73 / 47.411 / .577 / .373 / .883
ie4 / 30.15 / 44.705 / .717 / .542 / .874
ie5 / 29.74 / 45.196 / .711 / .550 / .875
ie6 / 29.71 / 46.822 / .581 / .428 / .883
ie7 / 30.14 / 42.251 / .690 / .590 / .875
ie8 / 30.84 / 46.358 / .569 / .434 / .884
ie9 / 30.26 / 45.940 / .524 / .454 / .887
ie10 / 30.71 / 43.748 / .623 / .458 / .881
None of the items “dragged down” alpha.
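For reference, alpha and "alpha if item deleted" can be computed with the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The sketch below applies it to only the first ten cases listed above, so its results will differ from the SPSS values, which are based on all 189 respondents. Function names are illustrative.

```python
from statistics import variance

# First ten cases (negatively worded items already reverse-scored).
CASES = [
    [3, 3, 4, 3, 4, 3, 3, 3, 3, 3],
    [4, 5, 4, 4, 4, 4, 4, 2, 4, 4],
    [4, 4, 4, 4, 4, 4, 4, 3, 4, 3],
    [2, 4, 4, 4, 4, 4, 5, 1, 1, 2],
    [2, 2, 4, 3, 3, 3, 2, 2, 3, 2],
    [4, 3, 3, 4, 4, 2, 4, 3, 3, 4],
    [3, 5, 4, 3, 4, 4, 4, 3, 3, 2],
    [1, 2, 2, 2, 2, 2, 2, 1, 1, 2],
    [4, 5, 4, 4, 5, 5, 5, 4, 5, 4],
    [2, 4, 2, 4, 4, 2, 2, 2, 3, 2],
]


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item responses."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(rows, i):
    """Alpha recomputed with item i (0-based) removed."""
    return cronbach_alpha([row[:i] + row[i + 1:] for row in rows])


alpha = cronbach_alpha(CASES)
deleted = [alpha_if_deleted(CASES, i) for i in range(10)]

print(f"alpha = {alpha:.3f}")
for i, a in enumerate(deleted, start=1):
    flag = "  <- removal would raise alpha" if a > alpha else ""
    print(f"ie{i}: {a:.3f}{flag}")
```

An item "drags down" alpha when its alpha-if-deleted value exceeds the full-scale alpha; in the SPSS output above, no item does.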
Here’s a dot plot of the scale scores based on the 10 items.
Mean = 3.35
Median = 3.40
SD = 0.74
Skewness = -0.45
The distribution of scale scores was centered slightly above the midpoint (3) of the 1-5 response scale.
The distribution is skewed to the left.
Application of Rasch model to a Dimension scale – Bias study Extraversion responses
(File is Bond&FoxChapter6B.txt; Negatively worded items were reverse-scored prior to entry, so all items will be treated by B&F as if they were positively-scored.)
Note that the Person reliability estimate of .85 indicates that if the sample of persons were given an equivalent set of items, we’d expect a correlation of .85 between the person measures for the two sets of items.
Correlation of .85 between scores of same people using equivalent items – so good test.
The item reliability of .98 indicates that if the same items were used on an equivalent sample of persons, the correlations between item measures for the two samples would be expected to be .98. Correlation of .98 between measures of the same items using equivalent people.
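The REL. values reported alongside the REAL SEP. values in the table headers in these notes are consistent with the usual relation between the two indices, reliability = separation² / (1 + separation²). The sketch below checks that relation against the four pairs of values reported in these notes; the function name is illustrative.

```python
def reliability_from_separation(sep):
    """Reliability implied by a separation index: sep^2 / (1 + sep^2)."""
    return sep ** 2 / (1 + sep ** 2)


# (separation, reported reliability) pairs from the table headers in these notes.
reported = {
    "Likert persons": (2.42, 0.85),
    "Likert items": (7.03, 0.98),
    "Dichotomized persons": (1.49, 0.69),
    "Dichotomized items": (6.30, 0.98),
}

for label, (sep, rel) in reported.items():
    print(f"{label}: computed {reliability_from_separation(sep):.2f}, reported {rel:.2f}")
```

The drop in person reliability from .85 to .69 is one way of seeing how much information the dichotomization threw away.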
Items – ordered by measure.
TABLE 13.1 Bias Study Extraversion Items ZOU268WS.TXT Apr 2 12:27 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 5 CATS 1.0.0
------
Person: REAL SEP.: 2.42 REL.: .85 ... Item: REAL SEP.: 7.03 REL.: .98
Item STATISTICS: MEASURE ORDER
+------+
|ENTRY TOTAL MODEL| INFIT | OUTFIT |PTMEA|EXACT MATCH| |
|NUMBER SCORE COUNT MEASURE S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR.| OBS% EXP%| Item |
|------+------+------+-----+------+------|
| 8 509 189 1.14 .09| .98 -.2|1.03 .3| .65| 52.9 48.3| 08R Don't draw attention |
| 10 533 189 .93 .09|1.16 1.6|1.16 1.5| .70| 46.5 47.6| 10R Quiet around strangers |
| 1 548 189 .80 .09| .86 -1.4| .93 -.7| .69| 50.8 48.1| 01 Life of party |
| 9 618 189 .18 .09|1.32 2.9|1.28 2.6| .62| 48.1 49.0| 09 Don't mind being center |
| 4 639 189 -.01 .10| .68 -3.6| .75 -2.6| .75| 64.2 49.8| 04R Keep in background |
| 7 641 189 -.03 .10|1.20 1.9|1.17 1.6| .73| 47.6 49.9| 07 Talk to diff people |
| 2 691 189 -.52 .10|1.13 1.2|1.06 .6| .74| 46.5 52.9| 02R Don't talk a lot |
| 5 718 189 -.81 .10| .75 -2.5| .74 -2.6| .75| 61.0 55.2| 05 Start conversations |
| 3 719 189 -.82 .10| .96 -.3| .88 -1.1| .65| 59.4 56.0| 03 Comfortable around people|
| 6 722 189 -.86 .10|1.03 .3|1.01 .2| .65| 59.9 56.2| 06R Have little to say |
|------+------+------+-----+------+------|
| MEAN 627.8 187.0 .00 .10|1.01 .0|1.00 .0| | 53.7 51.3| |
| S.D. 76.7 .0 .72 .00| .19 1.9| .17 1.6| | 6.4 3.2| |
+------+
Items with large positive measures are those that only the most extraverted respondents endorsed. That is, if a person endorsed those items, they were very extraverted.
Items with negative measures are those that moderately or even slightly extraverted respondents endorsed. Note that disagreement is endorsement for negatively-worded items. So even low-extraversion participants disagreed with “Have little to say.”
Note that there are 3 items (4, 5, and 9) for which the infit ZSTD values indicate that responses to the items were not as consistent with the model as we would like. Two of these infit values are negative (items 4 and 5), indicating “very good, perhaps too good consistency”.
One item - #9 – has a large positive infit value. This is an item for which some high E people might disagree and low E people might agree.
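The screening of items by fit can be sketched as a simple filter on the infit ZSTD column of the table above. The cutoff of ±2 is a common rule of thumb and is an assumption here, not a value stated in these notes; the labels "underfit" and "overfit" are the usual names for large positive and large negative values.

```python
# Infit ZSTD by item entry number, copied from the MEASURE ORDER table above.
INFIT_ZSTD = {
    8: -0.2, 10: 1.6, 1: -1.4, 9: 2.9, 4: -3.6,
    7: 1.9, 2: 1.2, 5: -2.5, 3: -0.3, 6: 0.3,
}

CUTOFF = 2.0  # assumed rule-of-thumb cutoff


def flag_misfit(zstd_by_item, cutoff=CUTOFF):
    """Return {item: 'underfit' or 'overfit'} for items beyond the cutoff."""
    flags = {}
    for item, z in zstd_by_item.items():
        if z > cutoff:
            flags[item] = "underfit"   # responses noisier than the model expects
        elif z < -cutoff:
            flags[item] = "overfit"    # responses more consistent than expected
    return flags


flags = flag_misfit(INFIT_ZSTD)
print(flags)  # {9: 'underfit', 4: 'overfit', 5: 'overfit'}
```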