Scale Validation

Scale validation: Analysis of the quality of summated index scales

Svend Kreiner

Dept. of Biostatistics, Univ. of Copenhagen

Index scales

Provide indirect measurement of unobservable phenomena

Defined by functions summarizing responses to a number of items

S = f(Y1,…,Yk)

Examples:

Educational tests

Psychological or psychiatric tests

Measurement of socioeconomic status

Attitude measurements

BMI

Health-related quality of life measurement scales

Most measurement instruments are summated scales, $S = \sum_i Y_i$.

BMI and social class are two examples of scales with other types of scale functions.
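The difference between a summated scale and a scale with another type of scale function can be sketched in a few lines. The function names and the example values below are hypothetical and only serve as illustration.

```python
def summated_score(responses):
    """Summated scale: S is the plain sum of the item responses Y_1..Y_k."""
    return sum(responses)


def bmi(weight_kg, height_m):
    """BMI is an index whose scale function is not a sum: weight / height^2."""
    return weight_kg / height_m ** 2


# Hypothetical responses to ten items, each scored 0, 1 or 2
responses = [2, 1, 0, 1, 0, 0, 1, 0, 0, 0]
score = summated_score(responses)

# Hypothetical person: 70 kg, 1.75 m
bmi_value = bmi(70.0, 1.75)

print(score, round(bmi_value, 1))
```

Both functions map several observed quantities to one number; only the first is a summated scale in the sense used here.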

Example: CHIPS items – a cognitive test

Figure 1. Four CHIPS items.

The physical functioning (PF) subscale of SF-36

Does your health now limit you in these activities? If so, how much?

PF1) Vigorous activities

PF2) Moderate activities

PF3) Lifting or carrying groceries

PF4) Climbing several flights of stairs

PF5) Climbing one flight of stairs

PF6) Bending, kneeling, or stooping

PF7) Walking more than a mile

PF8) Walking several blocks

PF9) Walking one block

PF10) Bathing or dressing yourself

Three response categories

0: Not limited

1: Limited a little

2: Limited a lot

The PADL (Physical Activities of Daily Living) measure of functional ability of healthy elderly.

Mobility function / Lower limb function / Upper limb function
A: Are you able to walk indoors? / G: Are you able to wash the lower part of the body? / L: Are you able to wash the upper part of the body?
B: Are you able to walk out of doors in nice weather? / H: Are you able to cut your toenails? / M: Are you able to cut your fingernails?
C: Are you able to walk out of doors in bad weather? / I: Are you able to go to the toilet yourself? / N: Are you able to comb your hair?
D: Are you able to manage stairs? / J: Are you able to dress the lower part of the body? / O: Are you able to wash your hair?
E: Are you able to get outdoors? / K: Are you able to take shoes/stockings on/off? / P: Are you able to dress the upper part of the body?
F: Are you able to get up from a chair or bed? / /

0 = “Cannot do it at all, or cannot do it without getting tired”

1 = “Can do it without getting tired”

Do CHIPS, SF-36 and PADL provide good measurements?

What can we require of high-quality scales?

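Type of requirement / Type of considerations / Requirement
Validity / Substantive / Face validity
Validity / Substantive / Content validity
Validity / Statistical / Criterion validity
Validity / Statistical / Construct validity
Technical / Statistical / Sufficiency
Technical / Statistical / Objectivity
Technical / Statistical / No DIF
Technical / Statistical / Reliability
Technical / Statistical / Sensitivity/specificity
Technical / Statistical / Ability to discriminate
Practical / / Simplicity
Practical / / Feasibility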

How many of these requirements do CHIPS, SF-36 and PADL meet?

Face validity

Requires subject-matter arguments

Some general requirements related to face validity:

Existence of a latent quantitative or ordinal variable (the construct)

A causal relation with the latent variable as the cause and item responses as effects

Monotone relationships: $E(Y_i \mid \Theta = \theta)$ is an increasing function of $\theta$

Some thinking about response behaviour

What construct does the PF subscale measure?

Is it different from the construct measured by the PADL items?
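The monotonicity requirement listed above can be illustrated with a concrete item response function. The sketch below assumes a dichotomous item following a Rasch (logistic) model, which is an assumption made here for illustration; the requirement itself does not depend on that model. For a dichotomous item, the expected response equals the probability of a positive response.

```python
import math


def rasch_expected_response(theta, difficulty):
    """E(Y | Theta = theta) = P(Y = 1 | theta) for a dichotomous Rasch item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Grid of latent trait values from -3.0 to 3.0 in steps of 0.5
thetas = [-3.0 + 0.5 * step for step in range(13)]

# Hypothetical item difficulty of 0.5
expected = [rasch_expected_response(t, difficulty=0.5) for t in thetas]

# Monotonicity: each expected response exceeds the previous one
is_increasing = all(a < b for a, b in zip(expected, expected[1:]))
print(is_increasing)
```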

Content validity

A question of item coverage:

A summated scale should include items relating to all relevant aspects of the construct.

The PADL “item bank”

How should items for a short version of the PADL scale be selected?

No content validity

Content validity

Psychometrics

Three different (but related) traditions:

Classical psychometrics (not to be discussed here)

Item response theory for categorical items

Factor analysis (not to be discussed here)

Criterion validity

The score must correlate with all variables known in advance to be correlated with the latent variable

The PF score of SF-36 must be correlated with self-reported health:

Self-reported health by PF score
PF score / Very good / Good / Fair / Bad / Very bad
0 / 44.8% / 48.4% / 6.6% / 0.2%
1 / 21.4% / 61.6% / 16.3% / 0.7%
2 / 19.8% / 57.1% / 20.7% / 1.8% / 0.5%
3 / 10.5% / 55.3% / 31.6% / 2.6%
4 / 12.5% / 46.2% / 35.6% / 5.8%
5 / 9.5% / 40.5% / 44.6% / 4.1% / 1.4%
6 / 10.5% / 50.9% / 35.1% / 1.8% / 1.8%
7 / 25.7% / 20.0% / 40.0% / 14.3%
8 / 7.1% / 21.4% / 57.1% / 14.3%
9 / 9.4% / 18.8% / 62.5% / 9.4%
10 / 8.6% / 14.3% / 68.6% / 5.7% / 2.9%
11 / 29.6% / 48.1% / 18.5% / 3.7%
12 / 5.0% / 15.0% / 65.0% / 15.0%
13 / 3.7% / 3.7% / 70.4% / 18.5% / 3.7%
14 / 21.4% / 50.0% / 28.6%
15 / 18.2% / 54.5% / 9.1% / 18.2%
16 / 9.1% / 45.5% / 27.3% / 18.2%
17 / 9.1% / 36.4% / 45.5% / 9.1%
18 / 14.3% / 71.4% / 14.3%
19 / 9.1% / 27.3% / 36.4% / 27.3%
20 / 12.5% / 50.0% / 37.5%
Total / 29.4% / 47.8% / 19.0% / 3.0% / 0.8%

Goodman & Kruskal’s $\gamma = 0.607$ (p < 0.001)
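Goodman and Kruskal's gamma for a two-way table with ordered rows and columns is (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs of observations. A minimal sketch of the computation follows; the function name and the toy counts are hypothetical and are not the data behind the table above.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table of counts with ordered categories."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a later row
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:      # later row, later column: concordant pairs
                        concordant += table[i][j] * table[k][m]
                    elif m < j:    # later row, earlier column: discordant pairs
                        discordant += table[i][j] * table[k][m]
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when there are no untied pairs")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical 3 x 3 table of counts
toy_counts = [
    [20, 10, 5],
    [10, 15, 10],
    [5, 10, 20],
]
gamma = goodman_kruskal_gamma(toy_counts)
print(round(gamma, 3))
```

Pairs tied on either variable are ignored, which is why gamma tends to be larger in absolute value than rank correlations that penalise ties.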

Criterion validity

The PF score must be correlated with SRH

Goodman & Kruskal’s $\gamma = 0.607$ (p < 0.001)

PADL

Goodman & Kruskal’s $\gamma = -0.544$ (p < 0.001)

Construct validity

Items have to be face valid and item responses must not depend directly on anything but the latent variable

Illustrated in the IRT graph

Psychometric models are graphical models defined by assumptions concerning conditional independence

Notation

Four types of variables:

Items: $Y = (Y_1, \ldots, Y_k)$

The total score: $S = \sum_i Y_i$

The latent trait variable: $\Theta$

Exogenous variables: $X = (X_1, \ldots, X_m)$

Conditional independence:

$X \perp Y \mid Z \iff P(X \mid Y, Z) = P(X \mid Z)$
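The definition of conditional independence can be checked numerically on a small joint distribution. The sketch below builds a hypothetical distribution of three binary variables in which X and Y each depend only on Z, verifies that P(X | Y, Z) = P(X | Z), and shows that X and Y are nevertheless dependent when Z is ignored. All probabilities are made-up illustration values.

```python
from itertools import product

# Hypothetical model: Z is a fair coin; X and Y each depend only on Z
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: 0.2, 1: 0.8}  # P(X = 1 | Z = z)
p_y_given_z = {0: 0.3, 1: 0.9}  # P(Y = 1 | Z = z)


def bernoulli(p, value):
    """Probability of `value` (0 or 1) when P(value = 1) is p."""
    return p if value == 1 else 1.0 - p


# Joint distribution P(x, y, z) = P(z) P(x | z) P(y | z)
joint = {
    (x, y, z): p_z[z] * bernoulli(p_x_given_z[z], x) * bernoulli(p_y_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Sum the joint over all cells matching the given values of x, y, z."""
    return sum(
        p
        for (x, y, z), p in joint.items()
        if all({"x": x, "y": y, "z": z}[name] == v for name, v in fixed.items())
    )


# Conditional independence: P(X = 1 | Y = y, Z = z) equals P(X = 1 | Z = z)
cond_holds = all(
    abs(prob(x=1, y=y, z=z) / prob(y=y, z=z) - prob(x=1, z=z) / prob(z=z)) < 1e-12
    for y, z in product((0, 1), repeat=2)
)

# Marginal dependence: P(X = 1 | Y = 1) differs from P(X = 1)
marginally_dependent = abs(prob(x=1, y=1) / prob(y=1) - prob(x=1)) > 1e-6

print(cond_holds, marginally_dependent)
```

If Z is read as the latent variable and X and Y as two items, this is the pattern psychometric models rely on: items are associated marginally, but the association disappears once the latent variable is held fixed.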

Construct validity

Construct validity requires

1) Unidimensionality

2) Monotonicity and causality

3) Local independence ($Y_i \perp Y_j \mid \Theta$)

4) No differential item functioning (DIF): ($Y_i \perp X_j \mid \Theta$)

Consequences of construct validity

1) Items must be positively correlated

2) Items must be positively correlated with rest scores

3) If the score correlates with an exogenous variable, X, then all items must correlate with X in the same way.

4) If the latent variable is monotonically related to an exogenous variable, X, then the same is required for the score (criterion validity)

1) – 3) are requirements of consistency

4) is a requirement of criterion validity
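Consequence 2) refers to rest scores: the rest score for item i is the total score with item i left out, S - Y_i. A minimal sketch of how rest scores and the sign of the item-rest association might be checked is given below. The response matrix is hypothetical, and a plain covariance is used here as a simple stand-in for whichever association measure is preferred.

```python
def rest_scores(responses, item):
    """Rest score for `item`: the person's total score minus that item's response."""
    return [sum(person) - person[item] for person in responses]


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical responses: six persons (rows), three dichotomous items (columns)
responses = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]

n_items = len(responses[0])
item_rest_cov = []
for item in range(n_items):
    item_values = [person[item] for person in responses]
    item_rest_cov.append(covariance(item_values, rest_scores(responses, item)))

# Consistency check: every item should be positively associated with its rest score
all_positive = all(c > 0 for c in item_rest_cov)
print(all_positive)
```

The item itself is removed from the score before the comparison so that the association is not inflated by the item being correlated with itself.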

Association among PF items (gamma coefficients)

A B C D E F G H I J rest score

------

A Vig.act. Gamma 0.915 0.889 0.861 0.873 0.835 0.858 0.839 0.821 0.845 0.860