Bond and Fox, Chapter 5
Invariance in Rasch modeling
Invariance of item difficulty estimates – obtained from different samples, perhaps of unequal ability.
Invariance of person ability estimates – obtained from different items from the same test item pool.
What do we mean by invariance?
We do not mean exactly equal estimates from two different analyses.
If the items were ratio scaled, invariance would mean equal except for a multiplicative constant.
To a multiplicative constant means that you could multiply all the values from one set by the same value and you would get all the values of the 2nd set, such as converting from inches to feet.
In this case of the Rasch modeling that we’re doing here, since estimates are approximately (believed to be) interval scaled, it means equal except for an additive AND a multiplicative constant, i.e., a linear transformation.
Invariance means that one set of estimates can be transformed to the other through a linear transformation, as in linear regression analysis.
The quickest way to get a feeling for whether two sets of estimates are invariant in this sense is to create a scatterplot of them. If it’s linear, then one set is a linear transformation of the other.
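A quick numeric companion to the scatterplot: if one set of estimates really is a linear transformation of the other, the Pearson correlation between the two sets will be (near) 1. A minimal sketch in Python, using made-up difficulty estimates (not the BLOT values):

```python
import math

# Hypothetical item difficulty estimates (logits) from two samples.
# These numbers are made up for illustration only.
sample1 = [-1.2, -0.5, 0.0, 0.4, 1.1, 1.8]
sample2 = [-0.9, -0.2, 0.3, 0.7, 1.4, 2.1]  # sample1 shifted by +0.3

def pearson_r(x, y):
    """Pearson correlation: near 1.0 when one set is (nearly) a
    linear transformation of the other."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r(sample1, sample2), 3))  # 1.0: an exact linear relation
```

Here sample2 is exactly sample1 shifted by 0.3 logits, so r is exactly 1. Real estimates carry error, so in practice we look for r close to 1 and a scatter that hugs a straight line.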
Chapter 5 deals with invariance of BLOT item difficulty estimates from two different samples.
Two groups of estimates are obtained – one from higher functioning students and one from students who are lower functioning.
If the items are functioning properly, the estimates of item difficulty from the higher functioning students should be a linear transformation of the estimates from the lower functioning students.
This is a pretty stringent test, because the abilities of the two groups will be clustered at different points in the common difficulty/ability axis. This means that some items will not be “covered” by the high ability group and different items will not be “covered” by the low ability group.
If we get invariance of item estimates in spite of this asymmetry of overlap, that’s a good thing for the test.
I have the feeling that the authors are kind of showing off here. Think of giving the same test to a group of high performers and then to a group of low performers. The high performers would mostly get good scores on the test and the poor performers bad scores. How could you equate item difficulties using classical psychometric techniques? Rasch techniques work, however.
The Bond&FoxChapter5.txt command and data file
&INST ; initial line (can be omitted)
TITLE = "Bond & Fox BLOT data: Chapter 5"
PERSON = Person ; persons are ...
ITEM = Item ; items are ...
ITEM1 = 9 ; column of response to first item in data record
NI = 35 ; number of items
NAME1 = 1 ; column of first character of person label
NAMELEN = 7 ; length of person-identifying label
@ABILITY= $S5W1 ; Ability level is coded in Person Label, column 5. H = High, L = Low
@GENDER = $S7W1 ; Gender is coded in Person Label, column 7. B=Boy, G=Girl, x=unknown
XWIDE = 1 ; number of columns per item response
CODES = 10 ; valid codes in data file: 1 and 0
UIMEAN = 0 ; item mean for local origin
USCALE = 1 ; user scaling for logits
UDECIM = 2 ; reported decimal places for user scaling
TOTAL = Yes ; show total raw scores
CHART = Yes ; produce across-pathway picture
MNSQ = No ; use Standardized fit statistics
CONVERGE= L ; Convergence decided by logit change
LCONVERGE=.00001 ; Set logit convergence tight because of anchoring
IAFILE = * ; Item anchor file to preset the difficulty of an item
4 0 ; Item 4 exactly at 0 logit point.
* ; End of anchor list
&END
01 Negation (to negate identity) ; Item labels courtesy of Trevor Bond
02 Reciprocal (to negate identity)
03 Implication
04 Incompatibility
05 Multiplicative compensation
06 Correlations
07 Correlations
08 Correlations
09 Conjunction
10 Disjunction
11 Conjunctive negation
12 Affirmation of p
13 Reciprocal exclusion
14 Probability
15 Reciprocal implication
16 Reciprocal (to negate identity)
17 Identity (to negate reciprocal)
18 Negation (to negate correlative)
19 Reciprocal (to cause disequilibrium)
20 Negation (to cause disequilibrium)
21 Correlative + negation > equilibrium
22 Reciprocal + negation > disequilibrium
23 Correlative + identity > disequilibrium
24 Coordination of two systems of reference
25 Complete negation
26 Complete affirmation
27 Negation of p
28 Non-implication
29 Affirmation of q
30 Equivalence
31 Negation of q
32 Negation of reciprocal implication
33 Probability
34 Coordination of two systems of reference
35 Coordination of two systems of reference
END NAMES
001 HG 11111111110110101101011111111011111
The purpose of this file is to assess invariance of item difficulties from two subsamples – a subsample of relatively poor performers and a subsample of relatively good performers. The estimates of the item difficulties should be highly correlated. Note the H. It’s there to indicate the ability group to which each student belongs. (G is the gender group, if that were of interest here.)
Estimation for High performers.
I typed: PSELECT=????H
The P stands for Person; SELECT stands for selection. So this is BF’s way of selecting persons for the analysis. On each line of data, the identifying information that determines whether that line, i.e., person, will be included in the analysis is specified . . .
The ???? is a way of indicating that we’ll accept anything that occurs in the first four columns of a line.
The H indicates that we’ll accept only lines of data with an H in column 5 of the line.
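The same column-mask logic can be sketched in a few lines of Python. The data records below are hypothetical but follow the file’s layout (person label in columns 1–7, ability code in column 5, item responses starting in column 9):

```python
# Hypothetical data records in the Bond&FoxChapter5.txt layout:
# 3-char ID, space, ability code (col 5), gender (col 6), space, responses.
lines = [
    "001 HG 11111111110110101101011111111011111",
    "002 LB 10111010110010101001010111011011011",
    "003 HB 11111111111110111101011111111111111",
]

# PSELECT=????H keeps only lines whose 5th character is 'H'
high_performers = [ln for ln in lines if len(ln) >= 5 and ln[4] == "H"]
print(len(high_performers))  # 2 of the 3 hypothetical records match
```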
The output tells us that the data of 79 persons were analyzed. 35 items were analyzed.
Reliability of the person ordering was .34. Reliability of the item ordering was .79.
Getting BF3 to save the item info based on H students
We could request an output table and then copy the item difficulty measures from that table and paste them into SPSS. (That’s what you did for the homework assignment that was due today.) However, the program allows us to save these item difficulty measures in a file for later comparison within this program, avoiding our having to use SPSS.
Menu sequence: Output files -> Item File IFILE=
Check the box,
“Permanent file: request file name”
Then click on [OK].
Type the name of the file in which the information is to be stored. The “.txt” part is important.
Here are the leftmost characters of a few lines of what was saved in the file, Ch5H.txt . . .
; Item Bond & Fox BLOT data: Chapter 5 Mar 28 8:11 2012
;ENTRY MEASURE STTS COUNT SCORE ERROR IN.MSQ IN.ZSTD OUT.MS OUT.ZSTD
1 -.93 1 79.0 76.0 .60 .93 .04 .43 -.66
2 -.93 1 79.0 76.0 .60 .95 .08 .50 -.51
3 .76 1 79.0 66.0 .32 .96 -.16 .82 -.49
4 .00 2 79.0 72.0 .41 1.00 .09 .75 -.40
5 -1.36 1 79.0 77.0 .72 .97 .18 .53 -.24
The two columns we’ll be most interested in are the two on the left – the item number column (labeled ENTRY) and the item measure column.
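If you ever wanted to pull ENTRY and MEASURE back out of such a file outside the program, a parsing sketch (using the lines shown above; the exact column layout can differ across program versions, so treat this as an assumption):

```python
# A few lines as saved in Ch5H.txt (comment lines start with ';')
ifile_text = """\
; Item Bond & Fox BLOT data: Chapter 5 Mar 28 8:11 2012
;ENTRY MEASURE STTS COUNT SCORE ERROR IN.MSQ IN.ZSTD OUT.MS OUT.ZSTD
1 -.93 1 79.0 76.0 .60 .93 .04 .43 -.66
2 -.93 1 79.0 76.0 .60 .95 .08 .50 -.51
3 .76 1 79.0 66.0 .32 .96 -.16 .82 -.49
"""

measures = {}
for line in ifile_text.splitlines():
    if line.startswith(";") or not line.strip():
        continue  # skip header/comment lines
    fields = line.split()
    entry, measure = int(fields[0]), float(fields[1])
    measures[entry] = measure

print(measures)  # {1: -0.93, 2: -0.93, 3: 0.76}
```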
Estimation for Low Performers
The Bond&Fox Steps program has to be relaunched, this time with PSELECT=????L to select the Low performers.
The output tells us that the data of 71 persons were analyzed. 35 items were analyzed.
Reliability of the person ordering was .69. Reliability of the item ordering was .89.
From the previous analysis we have the item difficulties of the High Performers in the external file called Ch5H.txt and from this analysis we have the item difficulties of the Low performers in the program itself.
The program allows us to create a scatterplot of the two sets of estimates.
Plots ->Compare Statistics Scatterplot
You’ll see
Then, after you click on the Excel icon . . .
Note that there are 2 items – 7 and 6 – that are not particularly invariant across samples.
Overall, the relationship appears to me to be essentially linear.
Comparing estimates the old-fashioned way – using SPSS
I ran the program, typing PSELECT=????H at the Extra Specifications prompt.
I then requested an output table by choosing 14. Item: Entry
1. The MEASURES from BF Steps for the H group . . .
ENTRY TOTAL
|NUMBER SCORE COUNT MEASURE
|------
| 1 76 79 -.93
| 2 76 79 -.93
| 3 66 79 .76
| 4 72 79 .00A
| 5 77 79 -1.36
| 6 78 79 -2.07
| 7 79 79 -3.29
| 8 65 79 .86
| 9 68 79 .54
| 10 76 79 -.93
| 11 67 79 .65
| 12 79 79 -3.29
| 13 61 79 1.21
| 14 73 79 -.17
| 15 63 79 1.04
| 16 72 79 .00
| 17 74 79 -.38
| 18 76 79 -.93
| 19 70 79 .30
| 20 77 79 -1.36
| 21 34 79 3.00
| 22 77 79 -1.36
| 23 69 79 .43
| 24 74 79 -.38
| 25 69 79 .43
| 26 70 79 .30
| 27 76 79 -.93
| 28 52 79 1.86
| 29 75 79 -.62
| 30 58 79 1.44
| 31 69 79 .43
| 32 60 79 1.29
| 33 76 79 -.93
| 34 77 79 -1.36
| 35 77 79 -1.36
|------
| MEAN 67.2 76.0 -.23
| S.D. 9.0 .0 1.32
3. I copied the tables into Word, then Alt-selected the Measures and pasted them into SPSS, naming them with either an “l” or “h” prefix.
I also Alt-selected and copied the total score values for the items. These are simply the total number within each sample that got each item correct.
The two sets of measures (along with item total scores, i.e., number of persons getting each item correct) in SPSS.
Note that item difficulty is the number missed, so for each group I subtracted the total score (number correct) from the subsample size (79 for the H group and 71 for the L group) to get Htotalmissd and Ltotalmissd.
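That bookkeeping step can be sketched as follows, using the first five H-group total scores from the table above (the variable names here are mine, not the SPSS ones):

```python
N_HIGH = 79  # H subsample size

# total correct for items 1-5 of the H group (from the table above)
total_correct = {1: 76, 2: 76, 3: 66, 4: 72, 5: 77}

# item "difficulty" as number missed = subsample size minus number correct
total_missed = {item: N_HIGH - score for item, score in total_correct.items()}
print(total_missed)  # {1: 3, 2: 3, 3: 13, 4: 7, 5: 2}
```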
The relationship of the measures in an SPSS scatterplot. Same as the BF Excel scatterplot.
The relationship of Total Missed for low performers vs total missed for high performers.
OK – this clearly demonstrates an advantage of the Rasch method.
The item difficulties estimated from the Rasch analyses are much more nearly linearly related than those estimated simply as the total number of persons missing each item.
The graphs below clearly show that the item total score measures and the Rasch item measures are not linearly related.
The H group measures
The L group measures
This means that if you’re going to characterize the items, characterizing them in terms of total score will yield a different picture of the items than characterizing them in terms of the Rasch measure.
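One way to see why the two characterizations differ: under the Rasch model an item’s difficulty grows roughly as the log-odds of missing it, ln(missed/correct), which is S-shaped rather than linear in the raw count. A sketch of that curvature (N = 79 as in the H group; the specific missed counts are illustrative):

```python
import math

N = 79  # subsample size, as in the H group

# log-odds "difficulty" for a few illustrative numbers missed;
# equal-sized steps in the raw count produce much larger logit steps
# near the extremes than in the middle of the range
for missed in (2, 10, 40, 70, 77):
    correct = N - missed
    logit = math.log(missed / correct)
    print(missed, round(logit, 2))
# 2 -3.65
# 10 -1.93
# 40 0.03
# 70 2.05
# 77 3.65
```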
Invariance of WPT Item Scores estimated from Low Performers vs High Performers
The above example illustrated the superiority of Rasch measures for the BLOT, probably collected in Australia.
Below is an assessment using WPT data collected here in the good ol’ USA.
The data are from the Balanced Scale study. The purpose of the study was to create a Big Five questionnaire in which the number of positively-worded items was equal to the number of negatively-worded items for each domain. We gave a bunch of other questionnaires, just in case we might need their data. Interestingly, the “balanced scale” questionnaire was never created. But we’ve used the heck out of the data from the other questionnaires.
Here are the WPT distributions of number correct for the two subsamples . . .
Report: wpt (WPT Form II)

wptGTmdn    Mean    N     Std. Deviation
.00         18.02   109   2.915
1.00        26.72    94   3.581
Total       22.05   203   5.421
BF3 Output from low performers
Now the high performers
Note that the reliability of the estimates of person abilities is not high.
However, the reliability of the estimates of item difficulties is quite high.
PSY 5950C BF5 - 1
Here are the two Item maps – Low performers on the left
TABLE 12.2 Balanced Scale WPT data – WPT <= 22 ZOU187WS.TXT Mar 28 2017 13:48
INPUT: 109 PERSON 50 ITEM REPORTED: 109 PERSON 50 ITEM 2 CATS Bond&Fox 3.91.0
------
MEASURE PERSON - MAP - ITEM
<more>|<rare>
5 + I0032 I0043 I0046 I0048 I0049 I0050
|
|
|
|
|
4 + I0035 I0037 I0045
|T
|
|
|
|
3 + I0039
|
| I0022 I0041
|
| I0040
|
2 +
|S I0042
| I0047
| I0036
|
| I0033
1 +
| I0019 I0031 I0044
| I0025 I0034
| I0018 I0026 I0038
|
T| I0013 I0027
0 +M I0010
######## |
########## S| I0028 I0029
######## | I0003
####### M| I0024
##### | I0006
-1 ##### + I0007 I0017
#### S| I0009 I0020
.## | I0001 I0030
.### | I0011 I0021
. T|
# |S I0015 I0023
-2 +
| I0004 I0008 I0016
|
|
| I0002
| I0012 I0014
-3 +
|
| I0005
|
|
|T
-4 +
<less>|<freq
EACH "#" IS 2: EACH "." IS 1
MEASURE PERSON - MAP - ITEM
<more>|<rare>
6 + I0048 I0049 I0050
|
|
|
| I0046
5 +
|
| I0043
|
|T
4 +
|
. | I0045
|
|
3 +
| I0041
| I0037 I0039 I0040
. | I0032
|S
2 + I0047
. | I0022 I0035 I0042
. T| I0033
## | I0044
### |
1 # S+ I0019
## | I0034
.##### | I0010 I0027 I0036
########## M| I0018 I0038
###### |
0 .####### +M I0025 I0028
######## S| I0013 I0026
| I0006 I0024
| I0011 I0031
T| I0017 I0029
-1 + I0007 I0021
| I0003
| I0009 I0020
|
|
-2 + I0001 I0015
|S I0030
|
| I0002
| I0004 I0012 I0023
-3 + I0008 I0016
|
|
| I0005
|
-4 +
|T I0014
|
|
|
-5 +
<less>|<freq
EACH "#" IS 2: EACH "." IS 1
I copied total correct and the Rasch Measure from each subsample to an SPSS file.
Here are the first few lines of the SPSS file . . .
Here are correlations between comparable measures . . .
Correlations

                                    GE23Missd   GE23Measure
LE22Missd    Pearson Correlation      .939         .891
             Sig. (2-tailed)          .000         .000
             N                          50           50
LE22Measure  Pearson Correlation      .946         .951
             Sig. (2-tailed)          .000         .000
             N                          50           50
Here are the scatterplots of the relationships between the two sets of item difficulty values . . .
The Rasch Measures are more nearly linearly related, though the difference is not super striking.
The Rasch Measures show the problems associated with estimating the item difficulty of the most difficult WPT items – there are no persons “around” those measures, so the estimates are quite unreliable.