Chapter 6. Validity

-The degree to which a test measures what it purports to measure.

-It tells us what can be inferred from test scores.

(EX) Measuring head size as an index of intelligence.

1. Face validity:

-The degree to which a test appears, to the person being tested, to measure what it is supposed to measure (i.e., the appearance of the test, not what it is actually measuring).

(EX) MMPI-2 vs. Rorschach.

-Face validity can affect test-takers’ (or their family members’) attitudes, motivation, and cooperation.

-The mere appearance of validity (i.e., face validity) is not an acceptable basis for interpretive inferences from test scores.

2. Content validity:

(a) The degree to which a test covers a representative sample of the behavior domain to be measured.

(EX) 2 reliability questions and 48 validity questions?

(EX) Vocabulary questions for this course?

(b) Procedure:

-List the topics/chapters to be covered in one column and the actual test questions in another column.

-Ask numerous raters to rate each item on whether the skill or knowledge it measures (e.g., the definition of validity) is essential, useful but not essential, or not necessary for measuring the target domain (i.e., psychological testing). Then compute the CVR for each item and compare it to the minimum-value table constructed by Lawshe (1975) (e.g., 5 raters = .99, 8 raters = .75, 10 raters = .62).

*Content validity ratio (CVR) = (Ne − n/2) / (n/2)

[Ne = number of raters rating the item “essential”, n = total number of raters]
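A minimal sketch of the CVR computation in Python, assuming a hypothetical set of ratings from 10 raters for one item; the Lawshe (1975) cutoff for 10 raters (.62) is hard-coded only for the comparison shown.

```python
# Content validity ratio (CVR) per Lawshe (1975): CVR = (Ne - n/2) / (n/2),
# where Ne = number of "essential" ratings and n = total number of raters.

def content_validity_ratio(ratings):
    """ratings: list of labels such as 'essential', 'useful', 'not necessary'."""
    n = len(ratings)
    ne = sum(1 for r in ratings if r == "essential")
    return (ne - n / 2) / (n / 2)

# Hypothetical ratings for one item from 10 raters.
item_ratings = ["essential"] * 9 + ["useful"]
cvr = content_validity_ratio(item_ratings)
print(f"CVR = {cvr:.2f}")            # (9 - 5) / 5 = 0.80
print("Retain item:", cvr >= 0.62)   # Lawshe's minimum value for 10 raters
```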

3. Criterion-related validity (Concurrent vs. predictive validity)

(a) The degree to which test scores can be used to infer an individual’s standing on some measure of interest (the criterion), either at the present time (concurrent; e.g., a final exam intended to measure how much you have learned in this course so far) or in the future (predictive; e.g., the GRE used to predict performance in graduate school).

-Concurrent validity: The extent to which test scores (i.e., final test scores) may be used to estimate an individual’s present standing on a criterion (i.e., how much learned).

(EX) The BDI vs. Kim’s depression index.

-Predictive validity: The extent to which test scores (i.e., final test scores) may be used to predict an individual’s future performance on a criterion (i.e., GPA in graduate school).

(EX) Kim’s D.I. vs. suicide rates in the future.
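Criterion-related validity is typically summarized as a validity coefficient, the correlation between test scores and criterion scores. A minimal sketch with invented numbers (both score vectors below are hypothetical):

```python
# Validity coefficient: correlation between the test (predictor) and the criterion.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: admission-test scores and later graduate GPA for 8 students.
test_scores = np.array([310, 325, 300, 335, 315, 290, 320, 330])
grad_gpa    = np.array([3.2, 3.7, 3.0, 3.9, 3.4, 2.9, 3.5, 3.8])

r, p = pearsonr(test_scores, grad_gpa)
print(f"validity coefficient r = {r:.2f} (p = {p:.3f})")
```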

-Criterion contamination: when the criterion measure is itself influenced by the predictor. (EX) Validating diagnoses made by the MMPI against diagnoses found in chart records that were themselves partly based on the MMPI.

-Incremental validity: More than one predictor.

The degree to which an additional predictor explains something about the criterion measure that is not explained by the predictors already in use (see the regression sketch below).
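A sketch of one common way to examine incremental validity: compare the variance in the criterion explained (R²) by the existing predictor alone with the R² after adding the new predictor. All data and variable names below are simulated and hypothetical.

```python
# Incremental validity: does a new predictor add explained variance (R^2)
# beyond predictors already in use?
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
existing = rng.normal(size=n)                  # predictor already in use
new_pred = rng.normal(size=n)                  # candidate additional predictor
criterion = 0.6 * existing + 0.3 * new_pred + rng.normal(scale=0.8, size=n)

X_old  = existing.reshape(-1, 1)
X_both = np.column_stack([existing, new_pred])

r2_old  = LinearRegression().fit(X_old, criterion).score(X_old, criterion)
r2_both = LinearRegression().fit(X_both, criterion).score(X_both, criterion)

print(f"R^2, existing predictor only:     {r2_old:.3f}")
print(f"R^2, existing + new predictor:    {r2_both:.3f}")
print(f"Incremental validity (delta R^2): {r2_both - r2_old:.3f}")
```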

-Decision theory

Hit: The proportion of people the test accurately identifies as possessing a particular trait (true positive; retain Ha) or not possessing the trait (true negative; reject Ha).

Miss: The proportion of people the test inaccurately identifies as having (false positive: Type I error) or not having (false negative: Type II error) a particular characteristic.
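A sketch of these decision-theory terms using an invented 2 x 2 breakdown of test decisions against true status; the counts are hypothetical and chosen only for illustration.

```python
# Decision theory: hits and misses when a test classifies people on a trait.
# Hypothetical counts for 100 people.
true_positive  = 40   # test says "has trait", person has the trait     (hit)
true_negative  = 35   # test says "no trait", person lacks the trait    (hit)
false_positive = 15   # test says "has trait", person lacks the trait   (miss: Type I)
false_negative = 10   # test says "no trait", person has the trait      (miss: Type II)

total = true_positive + true_negative + false_positive + false_negative
hit_rate  = (true_positive + true_negative) / total
miss_rate = (false_positive + false_negative) / total

print(f"Hit rate:  {hit_rate:.2f}")   # 0.75
print(f"Miss rate: {miss_rate:.2f}")  # 0.25
```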

4. Construct validity

-The extent to which a test is measuring a construct or trait as theoretically defined.

(a) Developmental changes.

(Ex) Intelligence is supposed to increase as infants grow up; therefore, scores on a test developed to measure intelligence must change as a function of age.
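A sketch of a developmental-change check using simulated raw scores for children at four ages; with a valid ability test, the group means (and the age-score correlation) should rise with age. All numbers below are invented.

```python
# Developmental change: raw scores on an ability test should increase with age.
import numpy as np

rng = np.random.default_rng(5)
ages = np.repeat([4, 6, 8, 10], 50)                    # hypothetical children at four ages
raw_scores = 5 * ages + rng.normal(scale=8, size=ages.size)

for age in np.unique(ages):
    print(f"age {age:2d}: mean raw score = {raw_scores[ages == age].mean():.1f}")
print(f"correlation(age, score) = {np.corrcoef(ages, raw_scores)[0, 1]:.2f}")
```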

(b) Pre- vs. post-test changes: Test scores should change as theory suggests.

(Ex) The Marital Satisfaction Measure after sex therapy.
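A sketch of a pre- vs. post-test check using a paired t-test on simulated marital-satisfaction scores before and after therapy; the scores and the size of the expected gain are hypothetical.

```python
# Pre- vs. post-test change: scores should move in the theoretically expected direction.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(4)
n = 30
pre  = rng.normal(loc=50, scale=10, size=n)        # satisfaction before therapy
post = pre + rng.normal(loc=8, scale=5, size=n)    # expected improvement after therapy

t, p = ttest_rel(post, pre)
print(f"mean change = {np.mean(post - pre):.1f}, t = {t:.2f}, p = {p:.4f}")
```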

(c) Homogeneity (item analysis):

(Ex) High correlations among items of a test.

(Ex) High correlations between individuals’ scores on each item and their total scores.
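A sketch of an item-analysis check of homogeneity: correlate each item with the total score, here the corrected total that excludes the item itself so the correlation is not inflated. The response matrix below is simulated.

```python
# Homogeneity (item analysis): corrected item-total correlations.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_items = 100, 5
# Hypothetical responses driven by one common factor, so items should hang together.
ability = rng.normal(size=(n_people, 1))
responses = ability + rng.normal(scale=0.7, size=(n_people, n_items))

total = responses.sum(axis=1)
for j in range(n_items):
    corrected_total = total - responses[:, j]      # exclude the item itself
    r = np.corrcoef(responses[:, j], corrected_total)[0, 1]
    print(f"item {j + 1}: corrected item-total r = {r:.2f}")
```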

(d) Convergent validity: Positive correlations with other tests that are already validated as measures of similar constructs.

(Ex) Intelligence test scores must be positively correlated with school performance and other IQ tests already empirically validated.

(e) Discriminant validity: No or very weak correlations with other tests that are intended to measure theoretically different constructs.

(Ex) Intelligence test scores should not be correlated with scores on personality tests.
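A sketch that checks convergent and discriminant validity together: correlate a new test with an established test of the same construct (expected to be high) and with a measure of a different construct (expected to be near zero). All scores below are simulated.

```python
# Convergent vs. discriminant validity with simulated scores.
import numpy as np

rng = np.random.default_rng(2)
n = 150
intelligence = rng.normal(size=n)

new_iq_test  = intelligence + rng.normal(scale=0.5, size=n)   # new measure of intelligence
old_iq_test  = intelligence + rng.normal(scale=0.5, size=n)   # established IQ test
extraversion = rng.normal(size=n)                             # unrelated personality trait

convergent   = np.corrcoef(new_iq_test, old_iq_test)[0, 1]
discriminant = np.corrcoef(new_iq_test, extraversion)[0, 1]
print(f"convergent r (new IQ vs. old IQ):         {convergent:.2f}")    # expect high
print(f"discriminant r (new IQ vs. extraversion): {discriminant:.2f}")  # expect near 0
```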

(f) Factor analysis:

-A statistical procedure to simplify the description of behavior by reducing the number of categories from an initial multiplicity of test variables to a few common factors or traits.

-That is, a statistical technique for discovering coherent subsets of variables, relatively independent of one another, within a single set of variables.

-Exploratory factor analysis (i.e., discovering how many factors a test or a set of variables is measuring) vs. confirmatory factor analysis (i.e., testing whether a test or a set of variables fits a factor structure that is explicitly hypothesized in advance).
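A sketch of an exploratory factor analysis using scikit-learn’s FactorAnalysis on simulated scores built from two latent factors; in practice the number of factors would also be judged from eigenvalues or fit criteria, and confirmatory factor analysis would normally be done in structural-equation-modeling software instead.

```python
# Exploratory factor analysis: reduce many observed variables to a few common factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n = 300
verbal  = rng.normal(size=(n, 1))   # latent factor 1
spatial = rng.normal(size=(n, 1))   # latent factor 2

# Six observed test scores: three load on verbal, three on spatial.
observed = np.hstack([
    verbal  + rng.normal(scale=0.5, size=(n, 3)),
    spatial + rng.normal(scale=0.5, size=(n, 3)),
])

fa = FactorAnalysis(n_components=2).fit(observed)
print("factor loadings (variables x factors):")
print(np.round(fa.components_.T, 2))
```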

5. Bias and fairness.

-Cross-cultural bias.

-Raters’ biases (leniency, severity, halo effect, central tendency error, etc.)