Chapter 3 Maximal Performance Test Design
Guidelines for Alternate Choice Item Creation - (4 types) 1. matching, 2. true-false, 3. multiple choice, 4. ipsative.
General:
Stem - the statement or question part of an alternate choice item, should be: (1) only as long as necessary, (2) simple in language, (3) grammatically correct.
Distractors (the wrong answers) - should (1) all be in grammatical agreement with the stem, (2) all be reasonable responses but clearly wrong, (3) be about equal in their "attractiveness" as an answer (an obviously wrong distractor contributes nothing).
True-False Items - Avoid the words "always," "never," and "sometimes." The first two "pull" for "false" whereas "sometimes" "pulls" for "true."
Matching Item sets - (1) specify if responses can be used for more than one question, (2) have one more response than questions, (3) cover only a single domain or only part of a larger domain.
Open Ended Questions - should (1) be as specific as possible, (2) avoid vague instructions such as "discuss" or "describe." "List" or "compare" are better.
Guessing - is a problem in testing theory. The fewer the choices, the higher the probability of a correct guess (true-false - 50%, 3 choices - 33%, 4 choices - 25%, 5 choices - 20%).
Probability of a correct guess = 1 / number of choices
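The formula above can be checked with a few lines of Python. This is only a minimal sketch; the function name guess_probability is made up for illustration.

```python
def guess_probability(num_choices: int) -> float:
    """Chance of a correct blind guess on an item with num_choices options."""
    if num_choices < 2:
        raise ValueError("an alternate choice item needs at least two options")
    return 1 / num_choices


# Reproduce the percentages listed in the notes (true-false = 2 choices).
for k in (2, 3, 4, 5):
    print(f"{k} choices: {guess_probability(k):.0%}")
```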
Correction for Guessing - Applying a formula that deducts points that are "presumed" to be the result of simply guessing. To be statistically sound, two assumptions must be met.
1. the guesses were "pure" or "random" and not "educated" guesses.
2. all of the wrong answers were guesses.
Though once more popular, this is NOT used very much nowadays, perhaps because those two assumptions are rarely met.
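The notes do not give the formula itself. The one most commonly cited in testing texts is: corrected score = number right - (number wrong / (number of choices - 1)), with omitted items counted as neither right nor wrong. The sketch below assumes that formula; the function name and the example numbers are hypothetical.

```python
def corrected_score(num_right: int, num_wrong: int, num_choices: int) -> float:
    """Classic correction for guessing (assumed formula): R - W / (k - 1).

    Omitted items are not passed in; they count as neither right nor wrong.
    """
    if num_choices < 2:
        raise ValueError("items need at least two choices")
    return num_right - num_wrong / (num_choices - 1)


# Hypothetical test taker: 60 right, 20 wrong, 20 omitted, 5-choice items.
print(corrected_score(60, 20, 5))
```

Note how the formula leans on both assumptions listed above: it treats every wrong answer as a blind guess and estimates how many right answers were lucky guesses from that count.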
Types of Scores / Transforming Scores
Local Norms - rather than comparing to a huge national sample (as on the Stanford-Binet), the norm group might be a school or even a single class.
Raw Scores - the actual score (i.e., number of points) you earned on your test.
Percentiles - tell us how a person scored relative to a group of others. When converting to percentiles we are going from the "interval" level to the "ordinal" level. This is a "non-linear transformation," meaning that the equal unit scale of measurement is lost.
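As a rough illustration, a percentile rank can be computed by counting how many scores in the norm group fall below a given score. The sketch below assumes one common convention (percent below plus half of the ties); other conventions exist, and the function name and scores are made up.

```python
def percentile_rank(score: float, group_scores: list[float]) -> float:
    """Percent of the group scoring below `score`, counting half of any ties."""
    if not group_scores:
        raise ValueError("the norm group must not be empty")
    below = sum(s < score for s in group_scores)
    equal = sum(s == score for s in group_scores)
    return 100 * (below + 0.5 * equal) / len(group_scores)


# Hypothetical class of five raw scores.
group = [50, 60, 70, 80, 90]
print(percentile_rank(70, group))  # middle score of the group
```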
Percentile Cutoffs - e.g., scores below the 25th and above the 75th percentiles (the lower and upper quartiles) are often used to create "extreme groups" (e.g., low and high on extraversion) to be used in research.
Quartiles - the distribution of scores is split into four divisions at the 25th, 50th, 75th percentiles.
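A sketch of how quartile cut points might be used to form "extreme groups," using statistics.quantiles from the Python standard library. Its default "exclusive" method is assumed here; other quantile methods can give slightly different cut points. The scores are made up.

```python
from statistics import quantiles

# Hypothetical extraversion scores for eleven people.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 returns the three cut points at the 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(scores, n=4)

low_group = [s for s in scores if s < q1]   # below the 25th percentile
high_group = [s for s in scores if s > q3]  # above the 75th percentile

print(q1, q2, q3)
print(low_group, high_group)
```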
Deciles - the distribution of scores is split into ten divisions at the 10th, 20th percentiles, etc. Not used very much nowadays.
Age Equivalents - Often used in reporting children's test results. Like Binet's mental age. If a child scores the same as the "average" child of a given age, the child is performing at that age equivalent.
Grade Equivalents - Like age equivalents, often used in reporting children's test results, such as on "achievement tests." If a child scores the same as the "average" child in a given "grade," the child is performing at that grade equivalent.
Intelligence Quotient (IQ) - Attributed to Lewis Terman (though there is some debate). Expresses the intelligence of a person compared with others of his/her age as a single number, a "quotient."
Mental age - we know this from Binet's work
Chronological age - our actual age in years
IQ = (mental age / chronological age) X 100
You can see that the "average" IQ is 100: when mental age equals chronological age, the ratio is 1, and 1 X 100 = 100.
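The ratio formula is easy to try out. A minimal sketch, with a hypothetical function name and made-up ages:

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ = (mental age / chronological age) X 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


print(ratio_iq(8, 8))   # mental age equals chronological age
print(ratio_iq(10, 8))  # performing like an average 10-year-old at age 8
```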
Modern Tests - like the Wechsler IQ tests actually use a different method for calculating the IQ, based on the mean and standard deviation of the test.
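A sketch of this "deviation" approach, assuming the commonly cited Wechsler convention of an IQ scale with a mean of 100 and a standard deviation of 15. The function name and the norm-group mean and standard deviation in the example are made up; real tests use published norm tables.

```python
def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float,
                 iq_mean: float = 100, iq_sd: float = 15) -> float:
    """Place a raw score on an IQ scale via its distance from the norm-group mean."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (raw_score - norm_mean) / norm_sd  # distance from the mean in SD units
    return iq_mean + iq_sd * z


# Hypothetical age group with a raw-score mean of 50 and SD of 10.
print(deviation_iq(60, norm_mean=50, norm_sd=10))  # one SD above the mean
```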
Standardized Scores - allow variables on greatly differing scales (e.g., GRE and GPA) to be compared, plotted on one graph, etc.
Z Scores - are standardized scores with a mean of 0 and a standard deviation of 1.
T Scores - are standardized scores with a mean of 50 and a standard deviation of 10.
Unlike percentiles, these are "linear transformations," meaning that the equal unit (interval) scale of measurement is retained.
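Both transformations can be sketched in a few lines. This example assumes the group's own mean and population standard deviation (statistics.pstdev) are used; with a sample standard deviation the numbers would differ slightly. The function names and scores are made up.

```python
from statistics import mean, pstdev


def z_scores(scores: list[float]) -> list[float]:
    """Rescale scores to a mean of 0 and a standard deviation of 1."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("all scores are identical; z scores are undefined")
    return [(x - m) / sd for x in scores]


def t_scores(scores: list[float]) -> list[float]:
    """Rescale scores to a mean of 50 and a standard deviation of 10."""
    return [50 + 10 * z for z in z_scores(scores)]


raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores (mean 5, SD 2)
print(z_scores(raw))
print(t_scores(raw))
```

Because each score is only shifted and rescaled, equal raw-score differences stay equal after the transformation, which is what "linear" means here.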
Calculating a Criterion Referenced (content referenced) Score: (simply percent correct)
Percent Correct = (number of points earned / number of points possible) X 100
Ex. 75 / 100 = .75, and .75 X 100 = 75% Correct
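The same calculation as a small Python sketch (the function name is hypothetical):

```python
def percent_correct(points_earned: float, points_possible: float) -> float:
    """Percent Correct = (points earned / points possible) X 100."""
    if points_possible <= 0:
        raise ValueError("points possible must be positive")
    return points_earned / points_possible * 100


print(percent_correct(75, 100))  # the example from the notes
```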
Objective Referenced Scoring - rather than simply one overall score, what is of interest is how the test taker demonstrates knowledge on a SET of objectives, with a set of questions for each objective.
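One way to picture objective referenced scoring is a separate percent correct for each objective. In the sketch below, the objective names, the item results, and the function name are all made up for illustration.

```python
def objective_scores(responses: dict[str, list[bool]]) -> dict[str, float]:
    """Percent correct per objective; each list holds one True/False per item."""
    return {
        objective: 100 * sum(items) / len(items)
        for objective, items in responses.items()
        if items  # skip objectives that have no items
    }


# Hypothetical test with two objectives and four items each.
results = {
    "fractions": [True, True, False, True],
    "decimals": [True, False, False, False],
}
print(objective_scores(results))
```

Both test takers with the same overall total could show very different profiles across objectives, which is the information a single overall score hides.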
Pass/Fail or Mastery Scores - really just the same as criterion referenced scores.