Chapter 4 Typical Performance Test Design

Objective Tests:

True-False Items - the simplest and easiest to construct; however, subjects may tend to omit items. The MMPI includes a "cannot say" option, but if there are too many of these responses, a score cannot be validly calculated.

Likert Items - often used on typical performance tests. They may offer more precision than true-false items.

Odd or Even? - Test takers like an odd number of choices (e.g., 5 or 7) because it provides a neutral center point. However, they may overuse the center response, which has little value in identifying individual differences. Dr. Richman suggests an even number.

How many choices? - There can be as few as 3 choices, but usually there are more. Reliability increases up to 7 choices and then declines with 8 or more.

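Forced Choice (Ipsative) Items - the test taker must choose between (or among) statements rather than rate each one separately, so a higher score on one scale comes at the expense of the others. The Edwards Personal Preference Schedule (EPPS) and Kuder Occupational Interest Survey (KOIS) both use these items (the EPPS pairs two statements per item; the KOIS uses three).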
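The ipsative property can be seen in a small scoring sketch. The scale names and item pairs below are hypothetical (not actual EPPS or KOIS content), and each item is assumed to pair two statements keyed to different scales. Whatever a person chooses, the scale scores always add up to the number of items, so they describe the relative strength of needs or interests within that person rather than levels that can be compared across people.

```python
# Minimal sketch of forced-choice (ipsative) scoring.
# ITEMS is hypothetical: each item pairs two statements, and each
# statement is keyed to one scale.
ITEMS = [
    ("achievement", "affiliation"),
    ("achievement", "order"),
    ("affiliation", "order"),
    ("order", "achievement"),
]


def score_ipsative(choices):
    """choices[i] is 0 or 1: which statement of ITEMS[i] was picked."""
    # Start every scale at zero so unchosen scales still appear.
    scores = {scale: 0 for pair in ITEMS for scale in pair}
    for pair, pick in zip(ITEMS, choices):
        scores[pair[pick]] += 1  # one point to the scale that was chosen
    return scores


for choices in ([0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 1, 0]):
    scores = score_ipsative(choices)
    # The total is always len(ITEMS): a point gained on one scale
    # is a point that could not go to another.
    print(scores, "total =", sum(scores.values()))
```

Because the totals are fixed, a person cannot score high on every scale at once, which is exactly what "ipsative" means here.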

Q-Sort - The subject sorts a set of cards bearing adjectives (e.g., friendly) into a number of piles ranging from "like me" to "not like me." Favored as a research tool by the humanistic psychologist Carl Rogers in the mid-1900s. Not a very popular technique today.

Dissimulation - biased and invalid ways of responding, a problem in testing. The two most common types are:

1. Response Set - responding so as to present oneself in a certain way (e.g., fake bad or fake good).

2. Response Style - unintentionally responding in a certain way (e.g., yea-sayers who tend to agree with items, nay-sayers who tend to disagree, or a tendency to choose neutral responses).

Validity Scales - many tests have scales to detect dissimulation. The MMPI has four traditional validity scales, and later versions add more:

1. ? cannot say - detects reluctance to disclose

2. L (lie scale) - detects overt attempts to present oneself in a positive light

3. F (infrequency, or "fake bad," scale) - detects over-endorsement of pathology

4. K (correction, or subtle "fake good," scale) - detects more subtle fake-good attempts than L; very low scores can instead indicate too much willingness to disclose
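Validity scales like these are scored by counting responses. The sketch below assumes a hypothetical answer key for an L-type scale and a hypothetical cutoff for "cannot say" responses; neither is the actual MMPI key or cutoff. It counts omitted items and keyed L responses, and flags the record as invalid when omissions exceed the cutoff, which is the logic behind the "cannot say" rule for true-false tests.

```python
# Hypothetical key: item number -> the response that counts toward the
# L scale (e.g., denying a common minor fault). Not the real MMPI key.
L_KEY = {1: False, 5: False, 9: False, 12: False}

# Hypothetical cutoff: more "cannot say" answers than this and the
# record is treated as invalid. Not an official MMPI value.
MAX_CANNOT_SAY = 10


def score_validity(responses):
    """responses maps item number -> True, False, or None ("cannot say")."""
    cannot_say = sum(1 for answer in responses.values() if answer is None)
    # An omitted (None) or missing item never matches a True/False key.
    lie = sum(1 for item, keyed in L_KEY.items() if responses.get(item) == keyed)
    return {"?": cannot_say, "L": lie, "valid": cannot_say <= MAX_CANNOT_SAY}


# Example: 20 items all answered False except item 3, which was skipped.
answers = {item: False for item in range(1, 21)}
answers[3] = None
print(score_validity(answers))
```

The F and K scales could be counted the same way with their own (here unspecified) keys.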

Item Selection and Scale Construction

1. Logical Content or Rational Strategy - an early, simple method: just ask (e.g., "Are you depressed?"). Obviously, this strategy leads to a "face validity" problem.

Face Validity - what the item is getting at is obvious, so the subject can control the impression he or she makes.

2. Theoretical Approach - items are derived from a theory. The Thematic Apperception Test (TAT) is based on Henry Murray's theory of "psychological needs" (e.g., power, submission). Projective tests in general are based on psychoanalytic theory.

3. "Empirical" (Scientific) Methods - two types, driven by data rather than by theory:

A. Criterion / Contrasted Groups Method (also called Empirical Keying) - the MMPI is an example. A large number of items is given to different groups. If a criterion group (e.g., depressed people) answers an item (e.g., "I feel tense") differently from a normal comparison group, that item goes into the depression scale, even though it does not really "look like" a depression item.
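A minimal sketch of contrasted-groups item selection, assuming true/false items and a simple difference in endorsement rates as the selection rule. The threshold (min_diff), the function names, and the toy data are hypothetical; real test construction would use much larger samples and a statistical test rather than a fixed cutoff. An item is kept if the criterion group and the comparison group endorse it at clearly different rates, and it is keyed in whichever direction the criterion group favors, regardless of what the item's content looks like.

```python
def select_items(criterion, comparison, min_diff=0.25):
    """Contrasted-groups item selection (simplified sketch).

    criterion, comparison: lists of people, each a list of True/False
    answers to the same items. Returns {item index: keyed answer} for
    items whose endorsement rates differ by at least min_diff.
    """
    n_items = len(criterion[0])
    key = {}
    for i in range(n_items):
        # Proportion of each group answering True to item i.
        p_crit = sum(person[i] for person in criterion) / len(criterion)
        p_comp = sum(person[i] for person in comparison) / len(comparison)
        diff = p_crit - p_comp
        if abs(diff) >= min_diff:
            # Key the item in the direction the criterion group favors.
            key[i] = diff > 0
    return key


def scale_score(answers, key):
    """Count how many keyed items a person answered in the keyed direction."""
    return sum(1 for i, keyed in key.items() if answers[i] == keyed)


# Toy data: item 0 stands in for "I feel tense"; items 1 and 2 are unspecified.
depressed = [[True, True, False], [True, False, False],
             [True, True, False], [True, False, False]]
comparison = [[True, False, True], [False, True, True],
              [False, False, True], [False, True, False]]

key = select_items(depressed, comparison)
print(key)
print(scale_score([True, False, False], key))
```

Item 1 is dropped because both groups endorse it equally often, while item 2 is kept and keyed "False" because the depressed group tends to reject it.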

B. Factor Analytic Method - a complex statistical method. Many items are administered to groups of people, and the responses are fed into the computer, which detects items that "group" or "cluster" together (i.e., correlate with one another). The result is a number of "factors." Raymond Cattell's 16PF Questionnaire contains 16 factor-derived scales.

While the computer identifies a number of factors, THE TEST DEVELOPER STILL NEEDS TO DECIDE WHAT TO CALL THE FACTORS!
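The clustering idea can be sketched with NumPy on a hypothetical correlation matrix for six items, in which items 0-2 intercorrelate at .8, items 3-5 at .6, and items from different clusters are uncorrelated. As a simplification, the sketch extracts principal components of the correlation matrix and keeps those with eigenvalues above 1 (the Kaiser rule of thumb); tests such as the 16PF were built with full factor-analytic methods, including factor rotation, which this does not implement. The output is only numbered factors and loadings; naming them is still left to the developer.

```python
import numpy as np

# Hypothetical correlation matrix for six items (two clusters).
R = np.array([
    [1.0, 0.8, 0.8, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.8, 0.0, 0.0, 0.0],
    [0.8, 0.8, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.6, 0.6],
    [0.0, 0.0, 0.0, 0.6, 1.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 0.6, 1.0],
])


def extract_factors(corr):
    """Principal-component extraction with the eigenvalue > 1 rule."""
    eigvals, eigvecs = np.linalg.eigh(corr)      # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.sum(eigvals > 1.0))               # number of factors to keep
    # Loadings: correlation of each item with each retained component.
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    return eigvals, loadings


eigvals, loadings = extract_factors(R)
# Assign each item to the factor on which it loads most strongly.
clusters = np.argmax(np.abs(loadings), axis=1)
print(np.round(eigvals, 2))
print(np.round(loadings, 2))
print(clusters)
```

The computer reports only that items 0-2 form one factor and items 3-5 another; whether to call them, say, "sociability" and "anxiety" is a judgment the developer makes by reading the items.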

Critics claim that theory-based tests have low reliability and validity.

Critics claim that empirically based tests lack coherence and focus.

Both approaches are needed.

Projective Tests: usually individually administered, time-consuming, and expensive; favored by psychoanalysts. They may use (1) verbal, (2) pictorial, or (3) drawing-type items. Projectives are based on the "projective hypothesis."

Projective Hypothesis - when responding to an ambiguous stimulus, we project our unconscious thoughts and feelings onto it (Frank, 1939).

Carl Jung - pioneered the use of the "word association" test. Response "latency" (how long the subject takes to respond) was as important as the response itself.

The two best-known projectives are the Rorschach and the TAT, both of the "pictorial" type.

In general, projectives are criticized for low reliability and low validity.

The Exner System - a structured scoring system (the Comprehensive System) developed for the Rorschach to improve its psychometric properties. No comparably widely used scoring system exists for the TAT.

Proponents claim that projectives provide a richness of understanding not available with objective tests.

Halo Effect - subject characteristics affecting the examiner's administration or scoring of the test. Though called a "halo," it can either raise OR lower the score, depending on the examiner's reaction to the subject.