Chapter 4: Reliability & Validity

Reliability: “To what extent can we say that the data are consistent?”

  • About consistency
  • Different statistical techniques are available; each produces a reliability coefficient ranging from 0.00 (totally inconsistent) to +1.00 (totally consistent)

Different techniques to measure reliability

Approach/Method / Description / Remarks
Test-Retest (coefficient of stability) / Measures across time / The article needs to state the time lapse between the two testings. Stability usually drops as the interval lengthens, so a high coefficient is more impressive after a long interval.
Parallel-Forms (coefficient of equivalence) / Measures across forms / Two forms of the same instrument, both intended to measure the same thing
Internal Consistency / Measures across the parts (items) of a single instrument / Will be expectedly high (unusual if low)
  1. Split-half / Performance on the odd- and even-numbered items is scored separately and the two half-scores are correlated
  2. Kuder-Richardson #20 (K-R 20) / Order of items does not matter (the result reflects all possible splits of the items)
  3. Cronbach’s alpha / Same as K-R 20 if items are dichotomous; more versatile because items can have more than 2 possible values
Interrater Reliability / Measures across raters
  1. Kendall’s coefficient of concordance / For ranked (ordinal) data
  2. Cohen’s kappa / For nominal (categorical) data
  3. Intraclass Correlation (ICC) / Reliability of ratings (for raw scores)
  4. Pearson’s product-moment correlation / For raw scores
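
To make a few rows of the table concrete, here is a minimal Python sketch of an odd-even split-half coefficient, K-R 20, Cronbach's alpha, and Cohen's kappa. It rests on several assumptions: the score matrix and rater labels are made-up illustration data, all function names are hypothetical, and the split-half value is stepped up with the Spearman-Brown formula, a correction these notes do not mention but which is conventionally applied to split-half estimates.

```python
from statistics import mean, pvariance, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(scores):
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(scores):
    """Cronbach's alpha from a respondents-by-items score matrix."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """K-R 20 for items scored 0/1 (uses population variances throughout)."""
    n, k = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_pq / total_var)


def cohen_kappa(ratings_1, ratings_2):
    """Cohen's kappa: agreement between two raters on nominal categories."""
    n = len(ratings_1)
    categories = set(ratings_1) | set(ratings_2)
    p_observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    p_chance = sum(
        (ratings_1.count(c) / n) * (ratings_2.count(c) / n) for c in categories
    )
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: 6 respondents x 4 items, each scored 0 (wrong) or 1 (right).
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

# Hypothetical ratings: two raters classify the same 8 essays.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]

print("split-half:", split_half(item_scores))
print("K-R 20:    ", kr20(item_scores))
print("alpha:     ", cronbach_alpha(item_scores))
print("kappa:     ", cohen_kappa(rater_a, rater_b))
```

Because the items here are dichotomous, kr20 and cronbach_alpha return the same value (up to floating-point rounding), which is the equivalence noted in the table. The split-half value can differ from both because it depends on one particular split of the items.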

Standard Error of Measurement (SEM):

Used to estimate the range within which a score would likely fall if a given measured object were to be remeasured
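
The notes give no formula here, so the following is a sketch under an assumption: the usual classical-test-theory expression SEM = SD × sqrt(1 − reliability), where SD is the standard deviation of the observed scores. The function names, the example numbers (SD of 15, reliability of 0.91, observed score of 100), and the z multiplier for widening the band are all hypothetical choices for illustration.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.0):
    """Band of observed score +/- z * SEM (z=1.0 gives a one-SEM band)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical example: SD = 15, reliability coefficient = 0.91.
sem = standard_error_of_measurement(15, 0.91)  # about 4.5
low, high = score_band(100, 15, 0.91)          # one SEM either side of 100
print("SEM:", sem)
print("band:", low, "to", high)
```

The more reliable the data, the smaller the SEM and the narrower the band: with a reliability of 1.00 the band collapses to the observed score itself.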

Some warnings about reliability

  1. Different methods of assessing reliability consider consistency from different perspectives. E.g. a high coefficient of stability does not necessarily mean high internal consistency.
  2. Reliability coefficients apply to data, not to measuring instruments: they are characteristics of the data rather than of the instrument that produced the data.
  3. Place more faith in good results for large groups than for small groups.
  4. If a test is administered under time pressure, estimates of internal consistency (split-half, K-R 20, alpha) will be spuriously high, so don’t be overly impressed.
  5. Reliability is not the only criterion used to assess the quality of data.

Validity

  • Concerned with accuracy, i.e. whether the measuring instrument measures what it purports to measure
  • Reliability is a necessary but not sufficient condition for validity: valid data are reliable, but not all reliable data are valid
  • Three kinds of validity:

  1. Content Validity
     • Content experts evaluate the instrument; we should ask:
        - Who did the evaluations?
        - What did they check/do?
        - What was the outcome?
  2. Criterion-Related Validity
     • Scores are compared with a relevant, established criterion variable
     • The two sets of scores are correlated to produce the validity coefficient
     • Two kinds:
        - Concurrent validity (the two tests are administered at the same or nearly the same time)
        - Predictive validity (the test is administered before the criterion is measured)
  3. Construct Validity
     • Three ways to establish it:
        1. Provide correlations with convergent and discriminant variables; scores should correlate highly with convergent variables (convergent validity) and weakly with discriminant variables (discriminant validity)
        2. Show that certain groups obtain higher mean scores on the new instrument than other groups, with the groups determined on logical grounds prior to the test
        3. Conduct a factor analysis
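
The correlational checks listed above can be sketched in Python. This is an illustration under stated assumptions, not a prescribed procedure: the three score lists are made-up data, and the names new_scale, established_measure, and unrelated_measure are hypothetical. The same Pearson correlation would serve as the validity coefficient in a criterion-related study, with the criterion scores in place of established_measure.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for the same 8 people on three measures.
new_scale = [12, 15, 11, 18, 20, 14, 16, 9]            # instrument being validated
established_measure = [30, 34, 29, 40, 43, 33, 37, 25]  # convergent variable
unrelated_measure = [5, 3, 6, 4, 5, 3, 6, 4]            # discriminant variable

r_convergent = pearson_r(new_scale, established_measure)
r_discriminant = pearson_r(new_scale, unrelated_measure)

# Evidence of construct validity: a high convergent correlation
# together with a low (near-zero) discriminant correlation.
print("convergent r:  ", r_convergent)
print("discriminant r:", r_discriminant)
```

With only 8 cases these coefficients are very unstable, so a real study would need a much larger group before either value could be taken seriously.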

Warnings about Validity

  1. Validity (like reliability) is a characteristic of the data, not the instrument.
  2. Correlation plays a central role in assessing construct validity (the first two ways), so remember the warnings about correlation.

Final Comments

1. How high should reliability and validity coefficients be?

Answer: They should be judged relative to those of other available instruments.

2. Researchers should use multiple methods to assess reliability and validity.

3. Reliability and validity relate to data quality, which by itself does not determine the degree to which a study’s results can be trusted. Conclusions can still be worthless because the wrong statistical procedure was used or the design of the study was deficient. In other words, reliability and validity are important, but other important concerns must be attended to as well.