The Reliability and Validity of the Myers-Briggs Type Indicator® Instrument

The following excerpts are adapted from Chapter 7 of Building People, Building Programs, written by Drs. Gordon Lawrence and Charles Martin.

Reliability

What is reliability? Reliability is how consistently a test measures what it attempts to measure. Why is consistency important? Well, when you measure something with an instrument two times, you want it to come out with the same answer (or close to it) both times. With the MBTI® instrument, as with other psychological instruments, you want the person to come out the same type both times they take it (this is test-retest reliability, the kind most people care about).

Because personality is "slippery" to measure, psychological instruments cannot have the same consistency you would expect from, say, a ruler. But there are generally accepted standards for psychological instruments. . . . It should be understood that the MBTI® instrument meets and exceeds the standards for psychological instruments in terms of its reliability.

There is also a kind of reliability that addresses the degree to which someone answers questions consistently on any given scale during a single taking of the MBTI® instrument. This is, not surprisingly, called internal consistency reliability. It is of special interest to people who construct instruments because the more consistency there is, the less "noise" there is in the measurement process. It is of interest to MBTI® practitioners because it tells us that there is more "noise" when using the MBTI® instrument with some groups of respondents, and this is important to know.
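To make the idea of internal consistency concrete, here is a minimal sketch of Cronbach's alpha, one widely used internal consistency statistic. The excerpt does not say which statistic is reported for the MBTI® instrument, so the choice of alpha is an assumption, and the function name and the sample scores are made up for illustration only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: one list per respondent, each holding that person's
    numeric score on every item of the scale (hypothetical data layout).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item across respondents.
    item_columns = list(zip(*responses))
    item_variances = [variance(column) for column in item_columns]

    # Sample variance of each respondent's total score on the scale.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Illustrative (invented) scores: 4 respondents, 3 items on one scale.
sample = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(sample)
```

In this invented sample every respondent answers all three items identically, so the items move together perfectly and alpha comes out at its maximum. Answers that scatter within a respondent (more "noise") pull the value down.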

Some conclusions about the reliability of the MBTI® instrument that would be helpful to know . . .

Reliabilities (when scores are treated as continuous scores, as in most other psychological instruments) are as good as or better than those of other personality instruments.
On retest, people come out the same on three or four of the four type preferences 75-90% of the time.
When people change their type on retest, it is usually on one scale, and on a scale where the preference clarity was low.
The reliabilities are quite good across age and ethnic groups, although reliabilities on some scales with some groups may be somewhat lower. The T-F scale tends to have the lowest reliability of the four scales.
There are some groups for whom reliabilities are especially low (children, for example), and caution needs to be exercised in thinking about using the MBTI® instrument with these groups.
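The retest figure in the list above (three or four preferences the same) can be computed from paired results along the following lines. This is only a sketch: the function names are hypothetical, and the four test-retest pairs are invented for illustration, not data from the Manual.

```python
def preferences_matched(first_type, second_type):
    """Count how many of the four preference letters agree between two results."""
    if len(first_type) != 4 or len(second_type) != 4:
        raise ValueError("type codes must have four letters, e.g. 'INTJ'")
    return sum(a == b for a, b in zip(first_type.upper(), second_type.upper()))


def share_with_at_least(pairs, minimum=3):
    """Fraction of people whose retest matches on at least `minimum` preferences."""
    if not pairs:
        raise ValueError("need at least one test-retest pair")
    counts = [preferences_matched(first, second) for first, second in pairs]
    return sum(count >= minimum for count in counts) / len(counts)


# Invented test-retest pairs (first result, second result).
retest_pairs = [
    ("INTJ", "INTJ"),  # all four preferences the same
    ("ENFP", "ENTP"),  # changed on one scale (T-F)
    ("ISTJ", "ESFJ"),  # changed on two scales
    ("ESTP", "ESTP"),  # all four preferences the same
]
agreement = share_with_at_least(retest_pairs, minimum=3)
```

With these invented pairs, three of the four people match on at least three preferences, so `agreement` is 0.75.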

Validity

What is validity? Validity is the degree to which an instrument measures what it intends to measure, and the degree to which the "thing" that the instrument measures has meaning. Why is this important? If type is real (or rather, if it is an idea that reflects the real world with any accuracy), then we should be able to use type to understand and predict people's behavior to some degree. Type should help us make useful distinctions in the values, attitudes and behaviors of different people.

The question of validity essentially asks, "Is this type stuff real?" Chapter Nine in the Manual (3rd Edition) broadly describes the kind of research that is done to demonstrate the validity of the MBTI® instrument, and large amounts of data are summarized in that chapter. Three broad categories of data are summarized: (1) evidence for the validity of the four separate scales; (2) evidence for the validity of the four preference pairs as dichotomies; and (3) evidence for the validity of whole types or particular combinations of preferences. These three categories of data all speak to the question of validity.

Three books offered through the CAPT catalog discuss the reliability and validity of the MBTI® instrument in depth: The MBTI Manual, Statistics and Measurement, and Building People, Building Programs.