Bond and Fox, Chapter 5
Invariance in Rasch modeling
Invariance of item difficulty estimates – obtained from different samples, perhaps of unequal ability.
Invariance of person ability estimates – obtained from different items from the same test item pool.
What do we mean by invariance?
We do not mean exactly equal estimates from two different analyses.
If the items were ratio scaled, invariance would mean equal except for a multiplicative constant.
To a multiplicative constant means that you could multiply all the values from one set by the same value and you would get all the values of the 2nd set, such as converting from inches to feet.
In this case of the Rasch modeling that we’re doing here, since estimates are approximately (believed to be) interval scaled, it means equal except for an additive AND a multiplicative constant, i.e., a linear transformation.
Invariance means that one set of estimates can be transformed to the other through a linear transformation, as in linear regression analysis.
The quickest way to get a feeling for whether two sets of estimates are invariant in this sense is to create a scatterplot of them. If it’s linear, then one set is a linear transformation of the other.
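A quick numeric companion to the scatterplot: if one set of estimates really is a linear transformation of the other, the Pearson correlation between the two sets will be (near) 1. A minimal sketch in Python, using made-up difficulty estimates (not the BLOT values):

```python
import math

# Hypothetical item difficulty estimates (logits) from two samples.
# These numbers are made up for illustration only.
sample1 = [-1.2, -0.5, 0.0, 0.4, 1.1, 1.8]
sample2 = [-0.9, -0.2, 0.3, 0.7, 1.4, 2.1]  # sample1 shifted by +0.3

def pearson_r(x, y):
    """Pearson correlation: near 1.0 when one set is (nearly) a
    linear transformation of the other."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r(sample1, sample2), 3))  # 1.0: an exact linear relation
```

Here sample2 is exactly sample1 shifted by 0.3 logits, so r is exactly 1. Real estimates carry error, so in practice we look for r close to 1 and a scatter that hugs a straight line.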
Chapter 5 deals with invariance of BLOT item difficulty estimates from two different samples.
Two groups of estimates are obtained – one from higher functioning students and one from students who are lower functioning.
If the items are functioning properly, the estimates of item difficulty from the higher functioning students should be a linear transformation of the estimates from the lower functioning students.
This is a pretty stringent test, because the abilities of the two groups will be clustered at different points in the common difficulty/ability axis. This means that some items will not be “covered” by the high ability group and different items will not be “covered” by the low ability group.
If we get invariance of item estimates in spite of this asymmetry of overlap, that’s a good thing for the test.
I have the feeling that the authors are kind of showing off here. Think of giving the same test to a group of high performers and then to a group of low performers. The high performers would mostly get good scores on the test and the poor performers bad scores. How could you equate item difficulties using classical psychometric techniques? Rasch techniques work, however.
The Bond&FoxChapter5.txt command and data file
&INST ; initial line (can be omitted)
TITLE = "Bond & Fox BLOT data: Chapter 5"
PERSON = Person ; persons are ...
ITEM = Item ; items are ...
ITEM1 = 9 ; column of response to first item in data record
NI = 35 ; number of items
NAME1 = 1 ; column of first character of person label
NAMELEN = 7 ; length of person-identifying label
@ABILITY= $S5W1 ; Ability level is coded in Person Label, column 5. H = High, L = Low
@GENDER = $S7W1 ; Gender is coded in Person Label, column 7. B=Boy, G=Girl, x=unknown
XWIDE = 1 ; number of columns per item response
CODES = 10 ; valid codes in data file: 1 and 0
UIMEAN = 0 ; item mean for local origin
USCALE = 1 ; user scaling for logits
UDECIM = 2 ; reported decimal places for user scaling
TOTAL = Yes ; show total raw scores
CHART = Yes ; produce across-pathway picture
MNSQ = No ; use Standardized fit statistics
CONVERGE= L ; Convergence decided by logit change
LCONVERGE=.00001 ; Set logit convergence tight because of anchoring
IAFILE = * ; Item anchor file to preset the difficulty of an item
4 0 ; Item 4 exactly at 0 logit point.
* ; End of anchor list
&END
01 Negation (to negate identity) ; Item labels courtesy of Trevor Bond
02 Reciprocal (to negate identity)
03 Implication
04 Incompatibility
05 Multiplicative compensation
06 Correlations
07 Correlations
08 Correlations
09 Conjunction
10 Disjunction
11 Conjunctive negation
12 Affirmation of p
13 Reciprocal exclusion
14 Probability
15 Reciprocal implication
16 Reciprocal (to negate identity)
17 Identity (to negate reciprocal)
18 Negation (to negate correlative)
19 Reciprocal (to cause disequilibrium)
20 Negation (to cause disequilibrium)
21 Correlative + negation > equilibrium
22 Reciprocal + negation > disequilibrium
23 Correlative + identity > disequilibrium
24 Coordination of two systems of reference
25 Complete negation
26 Complete affirmation
27 Negation of p
28 Non-implication
29 Affirmation of q
30 Equivalence
31 Negation of q
32 Negation of reciprocal implication
33 Probability
34 Coordination of two systems of reference
35 Coordination of two systems of reference
END NAMES
001 HG 11111111110110101101011111111011111
The purpose of this file is to assess invariance of item difficulties from two subsamples – a subsample of relatively poor performers and a subsample of relatively good performers. The estimates of the item difficulties should be highly correlated. Note the H. It’s there to indicate the ability group to which each student belongs. (G is the gender group, if that were of interest here.)
Estimation for High performers.
I typed: PSELECT=????H
The P stands for Person; SELECT stands for selection. So this is BF’s way of selecting persons for the analysis. On each line of data, the identifying information that determines whether that line, i.e., person, will be included in the analysis is specified . . .
The ???? is a way of indicating that we’ll accept anything that occurs in the first four columns of a line.
The H indicates that we’ll accept only lines of data with an H in column 5 of the line.
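The same column-mask logic can be sketched in a few lines of Python. The data records below are hypothetical but follow the file’s layout (person label in columns 1–7, ability code in column 5, item responses starting in column 9):

```python
# Hypothetical data records in the Bond&FoxChapter5.txt layout:
# 3-char ID, space, ability code (col 5), gender (col 6), space, responses.
lines = [
    "001 HG 11111111110110101101011111111011111",
    "002 LB 10111010110010101001010111011011011",
    "003 HB 11111111111110111101011111111111111",
]

# PSELECT=????H keeps only lines whose 5th character is 'H'
high_performers = [ln for ln in lines if len(ln) >= 5 and ln[4] == "H"]
print(len(high_performers))  # 2 of the 3 hypothetical records match
```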
The output tells us that the data of 79 persons were analyzed. 35 items were analyzed.
Reliability of the person ordering was .34. Reliability of the item ordering was .79.
Getting BF3 to save the item info based on H students
We could request an output table and then copy the item difficulty measures from that table and paste them into SPSS. (That’s what you did for the homework assignment that was due today.) However, the program allows us to save these item difficulty measures in a file for later comparison within this program, avoiding our having to use SPSS.
Menu sequence: Output files -> Item File IFILE=
Check the box,
“Permanent file: request file name”
Then click on [OK].
Type the name of the file in which the information is to be stored. The “.txt” part is important.
Here are the leftmost characters of a few lines of what was saved in the file, Ch5H.txt . . .
; Item Bond & Fox BLOT data: Chapter 5 Mar 28 8:11 2012
;ENTRY MEASURE STTS COUNT SCORE ERROR IN.MSQ IN.ZSTD OUT.MS OUT.ZSTD
1 -.93 1 79.0 76.0 .60 .93 .04 .43 -.66
2 -.93 1 79.0 76.0 .60 .95 .08 .50 -.51
3 .76 1 79.0 66.0 .32 .96 -.16 .82 -.49
4 .00 2 79.0 72.0 .41 1.00 .09 .75 -.40
5 -1.36 1 79.0 77.0 .72 .97 .18 .53 -.24
The two columns we’ll be most interested in are the two on the left – the item number column (labeled ENTRY) and the item measure column.
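If you ever wanted to pull ENTRY and MEASURE back out of such a file outside the program, a parsing sketch (using the lines shown above; the exact column layout can differ across program versions, so treat this as an assumption):

```python
# A few lines as saved in Ch5H.txt (comment lines start with ';')
ifile_text = """\
; Item Bond & Fox BLOT data: Chapter 5 Mar 28 8:11 2012
;ENTRY MEASURE STTS COUNT SCORE ERROR IN.MSQ IN.ZSTD OUT.MS OUT.ZSTD
1 -.93 1 79.0 76.0 .60 .93 .04 .43 -.66
2 -.93 1 79.0 76.0 .60 .95 .08 .50 -.51
3 .76 1 79.0 66.0 .32 .96 -.16 .82 -.49
"""

measures = {}
for line in ifile_text.splitlines():
    if line.startswith(";") or not line.strip():
        continue  # skip header/comment lines
    fields = line.split()
    entry, measure = int(fields[0]), float(fields[1])
    measures[entry] = measure

print(measures)  # {1: -0.93, 2: -0.93, 3: 0.76}
```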
Estimation for Low Performers
The Bond&Fox Steps program has to be relaunched, this time with PSELECT=????L to select the Low performers.
The output tells us that the data of 71 persons were analyzed. 35 items were analyzed.
Reliability of the person ordering was .69. Reliability of the item ordering was .89.
From the previous analysis we have the item difficulties of the High Performers in the external file called Ch5H.txt and from this analysis we have the item difficulties of the Low performers in the program itself.
The program allows us to create a scatterplot of the two sets of estimates.
Plots ->Compare Statistics Scatterplot
You’ll see
Then, after you click on the Excel icon . . .
Note that there are 2 items – 7 and 6 – that are not particularly invariant across samples.
Overall, the relationship appears to me to be essentially linear.
Comparing estimates the old-fashioned way – using SPSS
I ran the program, typing PSELECT=????H at the Extra Specifications prompt.
I then requested an output table by choosing 14. Item: Entry
1. The MEASURES from BF Steps for the H group . . .
ENTRY TOTAL
|NUMBER SCORE COUNT MEASURE
|------
| 1 76 79 -.93
| 2 76 79 -.93
| 3 66 79 .76
| 4 72 79 .00A
| 5 77 79 -1.36
| 6 78 79 -2.07
| 7 79 79 -3.29
| 8 65 79 .86
| 9 68 79 .54
| 10 76 79 -.93
| 11 67 79 .65
| 12 79 79 -3.29
| 13 61 79 1.21
| 14 73 79 -.17
| 15 63 79 1.04
| 16 72 79 .00
| 17 74 79 -.38
| 18 76 79 -.93
| 19 70 79 .30
| 20 77 79 -1.36
| 21 34 79 3.00
| 22 77 79 -1.36
| 23 69 79 .43
| 24 74 79 -.38
| 25 69 79 .43
| 26 70 79 .30
| 27 76 79 -.93
| 28 52 79 1.86
| 29 75 79 -.62
| 30 58 79 1.44
| 31 69 79 .43
| 32 60 79 1.29
| 33 76 79 -.93
| 34 77 79 -1.36
| 35 77 79 -1.36
|------
| MEAN 67.2 76.0 -.23
| S.D. 9.0 .0 1.32
3. I copied the tables into Word, then Alt-selected the Measures and pasted them into SPSS, naming them with either an “l” or “h” prefix.
I also Alt-selected and copied the total score values for the items. These are simply the total number within each sample that got each item correct.
The two sets of measures (along with item total scores, i.e., number of persons getting each item correct) in SPSS.
Note that item difficulty is the number missed, so for each group I subtracted the total score (number correct) from the subsample size (79 for the H group and 71 for the L group) to get Htotalmissd and Ltotalmissd.
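That bookkeeping step can be sketched as follows, using the first five H-group total scores from the table above (the variable names here are mine, not the SPSS ones):

```python
N_HIGH = 79  # H subsample size

# total correct for items 1-5 of the H group (from the table above)
total_correct = {1: 76, 2: 76, 3: 66, 4: 72, 5: 77}

# item "difficulty" as number missed = subsample size minus number correct
total_missed = {item: N_HIGH - score for item, score in total_correct.items()}
print(total_missed)  # {1: 3, 2: 3, 3: 13, 4: 7, 5: 2}
```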
The relationship of the measures in an SPSS scatterplot. Same as the BF Excel scatterplot.
The relationship of Total Missed for low performers vs total missed for high performers.
OK – this clearly demonstrates an advantage of the Rasch method.
The item difficulties estimated from the Rasch analyses are much more nearly linearly related than those estimated simply as the total number of persons missing each item.
The graphs below clearly show that the item total score measures and the Rasch item measures are not linearly related.
The H group measures
The L group measures
This means that if you’re going to characterize the items, characterizing them in terms of total score will yield a different picture of the items than characterizing them in terms of the Rasch measure.
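One way to see why the two characterizations differ: under the Rasch model an item’s difficulty grows roughly as the log-odds of missing it, ln(missed/correct), which is S-shaped rather than linear in the raw count. A sketch of that curvature (N = 79 as in the H group; the specific missed counts are illustrative):

```python
import math

N = 79  # subsample size, as in the H group

# log-odds "difficulty" for a few illustrative numbers missed;
# equal-sized steps in the raw count produce much larger logit steps
# near the extremes than in the middle of the range
for missed in (2, 10, 40, 70, 77):
    correct = N - missed
    logit = math.log(missed / correct)
    print(missed, round(logit, 2))
# 2 -3.65
# 10 -1.93
# 40 0.03
# 70 2.05
# 77 3.65
```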
Invariance of WPT Item Scores estimated from Low Performers vs High Performers
The above example illustrated the superiority of Rasch measures for the BLOT, probably collected in Australia.
Below is an assessment using WPT data collected here in the good ol’ USA.
The data are from the Balanced Scale study. The purpose of the study was to create a Big Five questionnaire in which the number of positively-worded items was equal to the number of negatively-worded items for each domain. We gave a bunch of other questionnaires, just in case we might need their data. Interestingly, the “balanced scale” questionnaire was never created. But we’ve used the heck out of the data from the other questionnaires.
Here are the WPT distributions of number correct for the two subsamples . . .
Report: wpt (WPT Form II)

wptGTmdn    Mean    N     Std. Deviation
.00         18.02   109   2.915
1.00        26.72    94   3.581
Total       22.05   203   5.421
BF3 Output from low performers
Now the high performers
Note that the reliability of the estimates of person abilities is not high.
However, the reliability of the estimates of item difficulties is quite high.
PSY 5950C BF5 - 1
Here are the two Item maps – Low performers on the left
TABLE 12.2 Balanced Scale WPT data – WPT <= 22 ZOU187WS.TXT Mar 28 2017 13:48
INPUT: 109 PERSON 50 ITEM REPORTED: 109 PERSON 50 ITEM 2 CATS Bond&Fox 3.91.0
------
MEASURE PERSON - MAP - ITEM
<more>|<rare>
5 + I0032 I0043 I0046 I0048 I0049 I0050
|
|
|
|
|
4 + I0035 I0037 I0045
|T
|
|
|
|
3 + I0039
|
| I0022 I0041
|
| I0040
|
2 +
|S I0042
| I0047
| I0036
|
| I0033
1 +
| I0019 I0031 I0044
| I0025 I0034
| I0018 I0026 I0038
|
T| I0013 I0027
0 +M I0010
######## |
########## S| I0028 I0029
######## | I0003
####### M| I0024
##### | I0006
-1 ##### + I0007 I0017
#### S| I0009 I0020
.## | I0001 I0030
.### | I0011 I0021
. T|
# |S I0015 I0023
-2 +
| I0004 I0008 I0016
|
|
| I0002
| I0012 I0014
-3 +
|
| I0005
|
|
|T
-4 +
<less>|<freq
EACH "#" IS 2: EACH "." IS 1
MEASURE PERSON - MAP - ITEM
<more>|<rare>
6 + I0048 I0049 I0050
|
|
|
| I0046
5 +
|
| I0043
|
|T
4 +
|
. | I0045
|
|
3 +
| I0041
| I0037 I0039 I0040
. | I0032
|S
2 + I0047
. | I0022 I0035 I0042
. T| I0033
## | I0044
### |
1 # S+ I0019
## | I0034
.##### | I0010 I0027 I0036
########## M| I0018 I0038
###### |
0 .####### +M I0025 I0028
######## S| I0013 I0026
| I0006 I0024
| I0011 I0031
T| I0017 I0029
-1 + I0007 I0021
| I0003
| I0009 I0020
|
|
-2 + I0001 I0015
|S I0030
|
| I0002
| I0004 I0012 I0023
-3 + I0008 I0016
|
|
| I0005
|
-4 +
|T I0014
|
|
|
-5 +
<less>|<freq
EACH "#" IS 2: EACH "." IS 1
I copied total correct and the Rasch Measure from each subsample to an SPSS file.
Here are the first few lines of the SPSS file . . .
Here are correlations between comparable measures . . .
Correlations

                                    GE23Missd   GE23Measure
LE22Missd    Pearson Correlation      .939         .891
             Sig. (2-tailed)          .000         .000
             N                          50           50
LE22Measure  Pearson Correlation      .946         .951
             Sig. (2-tailed)          .000         .000
             N                          50           50
Here are the scatterplots of the relationships between the two sets of item difficulty values . . .
The Rasch Measures are more nearly linearly related, though the difference is not super striking.
The Rasch Measures show the problems associated with estimating the item difficulty of the most difficult WPT items – there are no persons “around” those measures, so the estimates are quite unreliable.