Probabilities and statistical formulas

The key point of the Hardy-Weinberg Equation, discussed earlier, is that the frequency of alleles in a stable population remains constant from one generation to the next. There are requirements placed on the population being studied: it must be large, it must have random mating patterns, it must not experience mutation or migration, and all genotypes must reproduce with equal success (Kimball, html link). In general, the human population of the world exhibits all of these necessary characteristics.

Knowing that the overall allele frequency in large human populations is stable makes calculating the accuracy of DNA evidence easier, because we are not trying to make estimates about a moving target. The likelihood that two people have identical DNA profiles doesn’t change from one generation to the next.

Prior odds, posterior odds, likelihood of guilt

Prior to the introduction of any evidence, each juror has some feeling for the guilt or innocence of the suspect, which can be referred to as the "prior odds of guilt". These prior odds are heavily in favor of the defendant being innocent at the start of the trial; if a juror didn't feel that way, he should have been excused. After each piece of evidence is presented, each juror subconsciously reevaluates his belief in the innocence or guilt of the defendant; this becomes the "posterior odds of guilt", and also becomes the new "prior odds of guilt" as the next piece of evidence is presented. The strength of each piece of evidence can be mathematically described by a ratio, the "likelihood of guilt".

posterior odds of guilt = likelihood of guilt * prior odds of guilt

For instance, if a juror thought that a suspect had a 1 in a million chance of being guilty (prior odds = 1 / 10^6), and a piece of damaging evidence was presented that indicated that the suspect’s likelihood of guilt was a billion to one (10^9 / 1), then the posterior odds should be 1000 to 1 in favor of guilt.
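To make the arithmetic concrete, here is a minimal Python sketch of this odds-updating rule (the prior and likelihood values are simply the hypothetical numbers from the example above):

def posterior_odds(prior_odds, likelihood_ratio):
    # Bayes' rule in odds form: posterior odds = likelihood of guilt * prior odds of guilt
    return likelihood_ratio * prior_odds

prior = 1 / 10**6        # juror's prior odds of guilt: 1 in a million
likelihood = 10**9       # strength of the damaging evidence: a billion to one
print(posterior_odds(prior, likelihood))   # 1000.0, i.e. 1000 to 1 in favor of guilt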

Of course, these feelings of guilt or innocence on the part of each juror are not really quantifiable to the point where odds can be placed on them, but they do provide a statistician with a framework in which the "strength" of any piece of evidence can be judged.

The "strength" of DNA evidence really has two components: the implications or importance of the assertions being made by the prosecution and the defense, and the likelihood that these assertions are correct. For instance, if a piece of hair was found in the bathroom of a house, and DNA tests indicated that the likelihood that this piece of hair came from the house's occupant was 200 billion to 1, this piece of evidence really isn't that "strong". You would expect to find hair from the occupant of a house in the house's bathroom, so what have you proved? On the other hand, if the semen of a man was found in a rape victim's vagina, and the DNA evidence indicated that the likelihood it came from the male suspect was 50 million to 1, this DNA evidence is extremely "strong". Even though the likelihood ratio in the second example is relatively low compared to the first example, the implications are much more significant. In discussing the statistical aspects of DNA evidence, we will not consider the significance of the assertions being made with this evidence; instead, we will focus exclusively on the likelihood of these assertions being correct.

As has been stated earlier, DNA evidence merely tries to connect or disconnect a piece of evidence to a suspect. It makes no assertions of guilt or innocence!

Questions that a court wants answered

Before you can determine the accuracy of DNA fingerprinting, you first need to be precise about exactly what questions are being answered. Essentially, the accuracy of the DNA evidence boils down to two related questions:

1) Given that the DNA evidence matches the DNA of the suspect, how likely is the evidence to have this profile if it came from the suspect?

2) Given that the DNA evidence matches the DNA of the suspect, how likely is the evidence to have this profile if it came from someone other than the suspect?

These two questions can be written mathematically with conditional probabilities (Weir, slide 26):

  • E = the event that the DNA evidence at the crime scene matches the DNA of the suspect.
  • Hp = the prosecution’s hypothesis that the DNA evidence at the crime scene comes from the suspect.
  • Hd = the defense’s hypothesis that the DNA evidence at the crime scene comes from someone other than the suspect.
  • L = the likelihood that the prosecution’s hypothesis is correct versus that of the defense’s hypothesis. (Notice that the importance of these two hypotheses is not being considered. We are only looking at the accuracy of the DNA evidence.)

L = P(E | Hp) / P(E | Hd)

The numerator, P(E | Hp), should be one, since the DNA evidence at the crime scene matches the DNA of the suspect, which is exactly what the prosecution would expect. The denominator can be reduced to the following:

  • CprofileA = the event that some unknown criminal has DNA profile A (the combination of all of the alleles measured), which happens to match the evidence at the crime scene.
  • SprofileA = the event that the suspect has DNA profile A (the combination of all of the alleles measured), which happens to match the evidence at the crime scene.

P(E | Hd) = P(CprofileA | SprofileA)

We already know that the suspect has a DNA profile that matches the DNA profile of the evidence. If having this DNA profile were completely independent of anyone else having this same DNA profile, then the denominator would become just the probability of anyone having the crime scene's DNA profile. The probability of this completely independent event occurring can be calculated by multiplying the frequencies of each of the alleles involved.

For illustration purposes, let's assume that the DNA profile at the crime scene involves 9 loci, which means 18 alleles (n = 18). Let's also assume, merely for convenience, that the frequency of each of the 18 alleles happens to be the same, Pa = 0.1. The chance of someone having the crime scene's DNA profile, assuming complete independence, is:

P(CprofileA) = P(SprofileA) = P(anyone having profileA) = Pa^n = 0.1^18 = 1/10^18

The likelihood ratio would become:

L = 10^18
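The same independence calculation can be written as a short Python sketch (the 18 alleles and the common frequency of 0.1 are the illustrative assumptions above):

n_alleles = 18     # 9 loci, 2 alleles per locus
p_allele = 0.1     # assumed frequency of every allele, for convenience

p_profile = p_allele ** n_alleles     # probability of the full profile under independence: 1e-18
likelihood_ratio = 1 / p_profile      # L = 1e+18
print(p_profile, likelihood_ratio)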

However, as will be discussed momentarily, the frequency of alleles is NOT independent.

Dependency of the data and how it affects its accuracy

The dependency in the data comes from the fact that all humans have essentially the same genome. In fact, the genomes of a chimpanzee and a human are roughly 98% the same. Obviously, DNA fingerprinting looks at those parts of the human genome that display significant variability within human populations. However, once a certain allele has been found in one person, the likelihood that it will be found in another person goes up tremendously.

The probability in the sample 9-locus system in the previous section was based upon the false assumption that the frequency of each allele is independent of the frequencies of the other alleles. What are the effects of all of humanity having roughly a common evolutionary history? A simple expression is available for the joint probabilities of pairs of alleles (Weir, slide 35). For a matching pair of alleles a, the probability of seeing the pair again, given that it has already been seen once, is:

P(aa | aa) = (2*Theta + (1-Theta)*Pa)(3*Theta + (1-Theta)*Pa) / ((1+Theta)(1+2*Theta))

where:

  • Pa = the frequency of allele a
  • Theta = the probability that two alleles of a gene, each from a different randomly selected person, are identical due to evolutionary means. (The alleles come from the same common ancestor some time in antiquity.)

Theta is a measure of the relatedness, due to evolutionary forces, between two alleles in a general population (Weir, 32:15). If theta = 0, then there is no relatedness between the two alleles and they are independent. If theta = 1, then an allele found in one person must also be found in everyone else: perfect dependence.

Looking at the formula, the more narrowly you define your subpopulation, the more likely it is that two alleles from different people come from a common ancestor. So as the subpopulation becomes more narrowly defined, theta increases. In general, a safe value for theta for the world's population is around 4% (Weir, 32:15).

Now let's return to the calculation that was made in the previous section for the 9-locus (18-allele) system, in which we said that each allele had a frequency of 0.1 for simplicity's sake. Applying the above formula with theta = 0.04, we get the following:

P(aa|aa) = 0.0338462

Since the formula is for pairs of alleles, we take the square root to get:

Pa = 0.183973

Using our new value for Pa, our hypothetical 18-allele system now has a probability of:

Pa^18 = (0.183973)^18 = 5.83 x 10^-14

On one hand, the new overall value is still quite small, and the likelihood ratio is still quite large. On the other hand, the value has weakened by a factor of:

(5.83 x 10^-14) / (10^-18) = 5.83 x 10^4
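The entire correction can be reproduced with a few lines of Python (theta = 0.04 and Pa = 0.1 are the assumed values from above, and the formula is the theta-corrected pair probability quoted earlier):

theta = 0.04   # relatedness due to shared ancestry, world population
p_a = 0.1      # assumed allele frequency

# probability of seeing the pair aa again, given that it has already been seen once
p_aa_given_aa = ((2*theta + (1 - theta)*p_a) * (3*theta + (1 - theta)*p_a)) / ((1 + theta) * (1 + 2*theta))
p_a_effective = p_aa_given_aa ** 0.5    # effective per-allele frequency, ~0.184
p_profile = p_a_effective ** 18         # ~5.8e-14 instead of 1e-18
print(p_aa_given_aa, p_a_effective, p_profile, p_profile / 1e-18)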

Sampling effects and confidence intervals

It is impossible to know the real variation within the entire human population or any sizable subpopulation. There are over 6 billion people on this planet, and the number is growing. It is unreasonable to expect anyone to build a DNA profile for every person on Earth, and so we must estimate how accurately our DNA databases reflect the whole population.

However, we do have ever-growing DNA databases, and from these databases we can estimate the frequency of any allele. If we assume that the count of an allele in a DNA database follows a binomial distribution, with the probability of "success" being the frequency of the allele in question, then we can create a confidence interval for the frequency of that allele (Spiegel, p 207, 213).

  • Panew = the new value that we will use for the frequency of allele a.
  • Pa = the frequency of allele a in the DNA database. It can also be thought of as the probability of "success", since a binomial distribution is being assumed.
  • N = the number of entries in the DNA database for the locus in question.
  • Zc = the desired confidence level, expressed in standard deviations.

Panew = Pa + Zc ( Pa (1 - Pa) / N )^0.5

The first thing to notice about this formula is that we are only concerned about how much we might have underestimated the frequency of allele a, Pa, and therefore we add something to it in order to gain a level of confidence that we have not unduly strengthened our DNA evidence. Normally, this formula would produce a range of values, Pa +/- Zc ( Pa (1 - Pa) / N )^0.5, but here only the upper end of the range is used.

Returning to our 9 locus (18 allele) example, with the frequency of each of these alleles being 0.1, let’s set Zc = 3 standard deviations and assume that our DNA database has N = 10000 alleles of the locus in question.

Panew = 0.1 + 3 ((0.1)(0.9)/10000)^0.5 = 0.109

A confidence interval of 3 standard deviations means we are 99.73% sure that the real value of whatever is being estimated falls within 3 standard deviations of the measured value. But we are not concerned about overestimating the frequency of allele a, so we have simply added 3 standard deviations to its value. Since we don't care that the real value of the frequency might fall below this confidence interval, our "confidence" that we have fairly (from the defense's point of view) estimated the frequency of allele a is really 99.865%.

Applying the new value of Pa to our 18 allele system, the accuracy of our evidence now becomes:

0.109^18 = 4.72 x 10^-18

And the likelihood ratio is now L = 2.12 x 10^17

Again, the likelihood ratio has been reduced, in this case by a factor of:

(2.12 x 10^17) / 10^18 = 0.212
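Here is the same adjustment as a Python sketch (Pa = 0.1, Zc = 3, and N = 10000 are the assumed values above):

p_a = 0.1       # allele frequency taken from the DNA database
n_db = 10000    # number of database entries for the locus (assumed)
z_c = 3         # 3 standard deviations, one-sided confidence of 99.865%

p_a_new = p_a + z_c * (p_a * (1 - p_a) / n_db) ** 0.5   # ~0.109
p_profile = p_a_new ** 18                               # ~4.7e-18
likelihood_ratio = 1 / p_profile                        # ~2.1e17
print(p_a_new, p_profile, likelihood_ratio, likelihood_ratio / 1e18)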

Note that the larger our DNA databases become, the smaller the correction to the allele frequency we must make (N is getting bigger). Since DNA databases around the world are growing very fast, the corrections to allele frequencies due to incomplete sampling of the entire population will soon become very small.
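As a quick illustration of this point, running the same calculation with progressively larger (arbitrary) database sizes shows the correction shrinking:

p_a, z_c = 0.1, 3
for n_db in (10000, 100000, 1000000):
    p_a_new = p_a + z_c * (p_a * (1 - p_a) / n_db) ** 0.5
    print(n_db, p_a_new)   # 0.109, then ~0.103, then ~0.101: the correction shrinks as N grows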

Police / Laboratory Error Rates

Despite the seemingly overpowering accuracy of DNA fingerprints, the gathering, storing, and processing of the evidence itself involves humans, and therefore human error. Of course, this is true of all forms of evidence presented in court, but it is the overwhelming factor when talking about DNA evidence. A piece of DNA evidence may place a suspect at the scene of a crime beyond doubt, unless the evidence itself was mishandled. Police departments and crime laboratories are understandably reluctant to gather and publish statistics on how often they mishandle evidence, and on how often a case is brought against a suspect with evidence that apparently was mishandled. Most often, if the police or the prosecuting attorneys determine that a piece of evidence has been mishandled, it is not used, in order to avoid tainting the case. The percentage of cases actually brought to trial that involve evidence ultimately deemed to be improperly handled or improperly processed varies tremendously from police department to police department and from crime laboratory to crime laboratory. In most cases, these numbers are simply not available.

To gain an idea of how overwhelming the error rate of gathering and handling the evidence is, let's take a hypothetical situation. Suppose that the district attorney's office brings 10000 cases to trial that involve DNA evidence, and that the DNA evidence is handled incorrectly by the police only 2% of the time. Also, assume that the crime laboratory processes this DNA evidence incorrectly only 0.1% of the time. Finally, let's assume that in each case the likelihood that the DNA evidence comes from some unknown person, instead of the respective suspect, is 1 in 50 million. So we have the following probabilities for failure:

  • P(police mishandle the evidence) = 0.02
  • P(laboratory processes the evidence incorrectly) = 0.001
  • P(evidence comes from someone other than the suspect) = 1 / (50 x 10^6) = 2 x 10^-8

How many cases will be affected by poor police practices? How many cases will be affected by an error made by the crime lab? What is the overall strength of this DNA evidence? What is the overall strength of the DNA evidence, assuming that the DNA tests are 100% accurate and only the potential human error is involved? How many cases will involve faulty DNA evidence?
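A quick Python sketch answers these questions under the hypothetical rates above (2% police mishandling, 0.1% laboratory error, a 1-in-50-million random match):

n_cases = 10000
p_police = 0.02         # police mishandle the evidence
p_lab = 0.001           # laboratory processes the evidence incorrectly
p_random = 1 / 50e6     # evidence actually comes from someone other than the suspect

print(n_cases * p_police)   # ~200 cases affected by poor police practices
print(n_cases * p_lab)      # ~10 cases affected by a crime lab error

# a case involves faulty DNA evidence if at least one of the three things goes wrong
p_faulty = 1 - (1 - p_police) * (1 - p_lab) * (1 - p_random)
print(p_faulty, n_cases * p_faulty)   # ~0.021 per case, ~210 of the 10000 cases

# ignoring the DNA test itself, human error alone gives nearly the same number
p_human_only = 1 - (1 - p_police) * (1 - p_lab)
print(1 / p_human_only)     # overall strength of the evidence: only about 48 to 1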

As you can see, of the 10000 trials, faulty DNA evidence will be involved in approximately 210 of them. (You don't merely add up the numbers of trials negatively affected by the three sources of error, because two errors might happen to affect the same trial.) Also, for all practical purposes, the overall strength of the DNA evidence is really the overall reliability of the humans involved in bringing the evidence to trial.

Problems trying to measure the accuracy of the police and crime labs

Like everyone, police and scientists resist having measurements made of how effectively they are doing their jobs. This is just human nature. Consequently, there is a tremendous resistance to gathering and maintaining statistics about how often a particular member of the police force or a particular technician makes a mistake that finds its way into the proceedings of a court case. This resistance also applies institutionally to the police forces and crime labs around the world, not just on an individual basis.

Beyond the natural human resistance to being measured, there are other impediments. The frequency of testing a crime lab or a police department (or specific individuals of these organizations) usually is too low to make any usable predictions. If a crime lab is tested once every 3 months, and the tests for 2 years (8 tests) all are performed perfectly, this certainly doesn’t mean that the error rate of this crime lab is zero. Likewise, if one of these tests fails, this doesn’t mean that the failure rate is 12.5%.

The circumstances of what is being tested are constantly changing. If a police department discovers that a certain individual was ignoring the rules that preserve the "chain of evidence", he is either given a different assignment, retrained, or let go. The same can be said for an individual working at a crime lab. If a weakness is discovered in a certain procedure, especially if this weakness becomes public, as would happen in a court case, the procedure is fixed. Essentially, the system (procedures and personnel) that is to be measured is changing as it is being measured.

If the error rate of a police force or crime lab is to be measured meaningfully, testing probably must be pervasive and constant, and it will most likely involve a significant expense.

Definition of Uniqueness

Proponents of DNA fingerprinting have long sought to keep numbers out of the courtroom, just like their counterparts who examine fingerprints, handwriting, or hair and fibers. In most court cases, these other experts will merely state that, in their expert opinion, this piece of evidence matches something else. How accurate the match is usually is not presented, nor is it necessary to do so. The credentials of the expert are examined, and if they are in order, his opinion is all that matters.

On the other hand, DNA fingerprint experts still need to present the statistical numbers behind their conclusions. The DNA technology itself, while no longer new and already "officially accepted", is still not completely understood by the average juror, judge, or lawyer, so for now the numbers behind the conclusions are still required.

The requirement of numbers is starting to change. In 1997, the FBI said that “if the likelihood of a random match is less than one in 260 billion, the examiner can testify that the samples are an exact match” (cited by Weir, slide 1). Effectively, the FBI was saying that there now was a point at which legally we could say that a particular DNA profile was unique. Let’s examine the numbers behind this statement.

  • P = the probability that a random member of the population has the DNA profile obtained at a crime scene.
  • a = the probability that somebody in the population has this profile.
  • N = the number of people in the population

The probability of not obtaining a random match of the DNA profile at the crime scene is:

(1 - P)^N

For small P, this is approximately 1 - N*P = 1 - a. The U.S. population at the time was approximately 260 million people. If you set a = 0.001, then

P = a / N = 0.001 / (260 x 10^6) = 1 / (260 x 10^9)

Thus, when the FBI made this original statement about the likelihood of a random match, they were really saying that it was acceptable to declare that an exact match had been found because there was a 99.9% chance of not finding the specific DNA profile from another source within the population of the United States at that time (Weir, slide 50 + discussion).
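The arithmetic behind this statement can be checked with a short Python sketch (the 260 million population figure and a = 0.001 come from the discussion above):

n_pop = 260e6          # approximate U.S. population at the time
a = 0.001              # acceptable chance that somebody else in the population has the profile
p_match = a / n_pop    # the FBI's threshold: 1 / (260 x 10^9)

p_no_random_match = (1 - p_match) ** n_pop
print(p_match, p_no_random_match)   # ~3.8e-12 and ~0.999, a 99.9% chance of no random match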
