Systematic Bias and Random Error in Scientific Research:

What They Are, How to Avoid Them

By Andria Cimino

Research outcomes fuel the growth of scientific knowledge. Because today's results drive tomorrow's hypotheses, the design, setting, and data-collection methods of research experiments and studies must be carefully analyzed to ensure the highest degree of accuracy. There are two principal sources of skewed results: random error (also called "noise") and systematic bias (also known as artifact). The first is like a cold: normal, routine, and treatable, if not curable. The second is like a poison for which there is no antidote; only foresight prevents it from spreading throughout a study and contaminating all results. Whereas random error affects the reliability of a study's results, systematic bias undermines the study's validity, rendering the results useless. Systematic bias is therefore by far the more serious problem of the two.

Research almost always "has a small cold." Random error derives from variables within an experiment or study that researchers cannot completely control, such as individual differences among subjects. For instance, consider an experiment that measures the speed of axon conduction by timing how long it takes a line of subjects, standing hand to shoulder, to pass on a squeeze. A difference such as the proportion of right-handed to left-handed subjects might influence the results, because those using the "wrong" hand would likely respond a little more slowly than those using their dominant hand. This could introduce variance, particularly if numerous trials were run and those using their nondominant hand improved their squeezing time with practice (lecture).
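
To make this concrete, here is a minimal, hypothetical Python simulation of the squeeze-passing demonstration. The relay times, handedness mix, and practice effect are invented numbers for illustration, not data from the actual demonstration:

    import random
    import statistics

    # Hypothetical simulation of the squeeze-passing demonstration. All
    # numbers (relay times, handedness mix, practice effect) are invented.
    random.seed(1)

    def chain_time(n_dominant, n_nondominant, trial):
        """Total time for one squeeze to travel down the chain, in seconds."""
        total = 0.0
        practice_gain = 0.004 * trial  # nondominant hands speed up with practice
        for _ in range(n_dominant):
            total += random.gauss(0.10, 0.02)  # per-person relay, dominant hand
        for _ in range(n_nondominant):
            total += random.gauss(0.13 - practice_gain, 0.02)  # "wrong" hand: slower
        return total

    # A chain of 20 subjects (15 dominant-hand, 5 nondominant) over 10 trials.
    times = [chain_time(15, 5, t) for t in range(10)]
    print("trial times (s):", [round(t, 2) for t in times])
    print("spread (SD):", round(statistics.stdev(times), 3), "s")

Run repeatedly with different seeds, the trial times scatter around the true chain time; that scatter is exactly the "noise" the essay describes.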

Another area where random error may creep in is measurement. In the above example, the reflexes of the person keeping time may slow as the experiment progresses over many trials, or, if there are two timekeepers, minute differences between their stopwatches could introduce variability. The amount of variability matters, because if the reliability of the measure is very low, it decreases the chance of finding statistical significance (Gray, p. 43).

Fortunately, there are a number of ways to minimize the effect of random error on results. Because the error is random, it can be corrected by statistical analysis: averaging washes out the dross of unreliability, allowing the gold of the best results to shine through. The amount of error itself is also measurable, which researchers do by applying the standard deviation formula to their results. These corrections work best when applied to large data sets, so researchers can control for random error in the design of their studies by planning to run as many trials, or include as many subjects, as necessary to create that large data set.
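
The following short Python sketch shows how averaging and the standard deviation work together; the true value of 1.50 s and the noise level of 0.20 s are assumed purely for illustration:

    import random
    import statistics

    # Hypothetical illustration: each trial measures the same true quantity
    # (assumed 1.50 s) plus random noise (assumed SD of 0.20 s).
    TRUE_TIME = 1.50
    NOISE_SD = 0.20
    random.seed(42)

    for n in (5, 50, 500):
        trials = [random.gauss(TRUE_TIME, NOISE_SD) for _ in range(n)]
        mean = statistics.mean(trials)
        sd = statistics.stdev(trials)  # estimates the amount of random error
        print(f"{n:4d} trials: mean = {mean:.3f} s, SD = {sd:.3f} s")

    # As n grows, the mean settles near the true value while the SD stays
    # near the noise level: averaging removes random error from the estimate,
    # and the standard deviation quantifies how noisy each trial was.

As the number of trials grows, the average converges toward the true value while the standard deviation reports how much random error each individual trial carried, which is why large data sets are the design-stage remedy for noise.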

Unlike random error, systematic bias creeps into an experiment or study from the outside. Also unlike random error, it cannot be corrected for; it must be avoided, which requires careful forethought in the study's design, including how the subjects are chosen and how measurements are taken. If the subjects in one group differ in a fundamental way from the subjects in a control group or from the population the study is intended to represent, the result is a biased sample (Gray, pp. 42-43). A very simple (and highly unlikely) example of a biased sample would be using men as subjects in a study of a drug's effect on women's health.

Measurements can be biased too, in either reliability or validity. An unreliable measuring tool is one that fails to yield similar results with each use. An invalid tool is one that does not measure what it is intended to measure. Measurement bias due to a lack of validity is more damaging than a lack of reliability because it leads to false conclusions (Gray, p. 43).

An example of this may be seen in Robert Tryon's selective breeding experiment, which tested his hypothesis that behavior could be strongly influenced by genes. Tryon mated the rats that had made the fewest errors in a maze with each other, and likewise those that had made the most errors with each other. To control for the possibility that learning was occurring in the nest, he cross-fostered the rats with mothers from the opposite group. Within seven generations, Tryon had a population of "maze bright" rats that clearly outperformed the seventh-generation "maze dull" rats in running the same maze. He concluded that the rats had developed into truly "dull" and "bright" strains (Gray, pp. 64-65).

But was this a valid conclusion? Given that Tryon did not control for other physiological factors that could have influenced the first-generation rats' performance in the maze, and that any such factor would have been strengthened by selective breeding (Gray, pp. 65-66), one can argue that this was a case of systematic bias.

For instance, it is not clear whether the first-generation rats all weighed the same amount. Weight differences could have played a role in the outcome in a number of ways: (a) more muscle mass and thus the ability to run the maze more quickly, (b) more fat and thus less ability to run the maze quickly, or (c) differences in nutrition that may have influenced the ability to concentrate and problem-solve. Likewise, other physiological differences, such as vision, or even differences in behavioral influences, such as hunger and curiosity, could have accounted for the differences in first-generation performance (Gray, p. 66).

The Tryon experiment raises the possibility of another form of bias: observer expectancy. When researchers expect a certain outcome, they may interpret ambiguous results in the favored direction. The fact that, in different experiments conducted by other researchers, Tryon's "dull" rats performed just as well as, and sometimes even better than, the "bright" rats suggests that some observer bias may have been at work in Tryon's experiment. It is unclear whether Tryon ran the maze trials himself, but even if students ran them, they probably knew the purpose of the experiment. The solution to this kind of bias is to keep those who run the trials blind, i.e., uninformed of the purpose of the experiment.

Another type of bias, subject expectancy, was probably not an issue in the Tryon experiment. This bias occurs when subjects' expectations influence their responses. For example, a subject in a headache study might report feeling better after receiving a drug because he expected to feel better, not necessarily because the drug actually helped. In such studies, a placebo is often given to one group as a control for this effect. When, in addition, the observers recording the results do not know which subjects received the placebo, both subjects and observers are kept blind, and the study is called a double-blind experiment.
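
A minimal Python sketch of how such blinding might be arranged follows; the subject IDs and group codes are hypothetical, and in practice a third party would hold the code key until data collection ends:

    import random

    # Hypothetical sketch of double-blind assignment. Subject IDs and group
    # codes are invented; a third party holds the key until the study ends.
    random.seed(7)

    subjects = [f"subject_{i:02d}" for i in range(1, 21)]
    random.shuffle(subjects)
    half = len(subjects) // 2

    key = {"A": "drug", "B": "placebo"}  # withheld from subjects and observers
    assignments = {s: ("A" if i < half else "B") for i, s in enumerate(subjects)}

    # Observers record results by code only, so neither they nor the
    # subjects can let expectations color the outcome.
    for subject in sorted(assignments):
        print(subject, "->", assignments[subject])

Because neither the observers nor the subjects can map the codes to conditions until the key is revealed, expectations on either side cannot systematically push results in the favored direction.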

As the above examples attest, preventing systematic bias and minimizing random error are crucial to the scientific integrity and meaningfulness of research results.

Gray, Peter. Psychology. 4th ed. New York: Worth Publishers, 2002.
