Team 250, Page 1

Summary

Given the possible number of genetic variations, the probability of having a naturally occurring Doppelganger is low. This is why DNA evidence acquired at crime scenes is such conclusive evidence when presented in criminal trials. Though the process of DNA fingerprinting is fallible, the probability that two unrelated people with the same DNA exist is microscopic. Barring, then, that you have an identical evil twin, the probability that you will be mistaken for a criminal based on such evidence is low. Fingerprints, however, being only a portion of this genetic identity, seem far less restricting. It is then conceivably possible that one could be mistaken as the perpetrator of a crime based on fingerprint evidence. It is our goal to determine exactly how probable this is.

One of the progenitors of the study of fingerprint identity was Sir Francis Galton, who identified characteristic ridge patterns in the skin that vary widely among a population, but which are constant over time to an individual. In addition to these minutiae, fingerprints also have an overall pattern that in nearly all cases falls into one of three groups: loops, arches, and whorls. Using both the overall fingerprint patterns and a set of the most commonly occurring Galton Characteristics (GCs), we created a model to test the individuality of fingerprints, based on a probabilistic interpretation: highly probable fingerprints are less individual, and less probable fingerprints are more individual.

In this model, we first divided an ideal rectangular thumbprint into squares of equal area, denoted as cells. Knowing that any comparison between two fingerprints first matches the general pattern of a fingerprint and then a certain number of GCs, we calculated the fingerprint patterns that have the maximum probability of occurrence. This was done by using figures which determined the relative frequency of occurrence of each of the patterns and GCs.

To start, we assumed that from an ideal thumbprint containing N total cells, we chose to confirm the form and placement of n GCs in those cells. Our model proceeds in stages, first choosing the overall pattern of the print, and then proceeding to choose n locations of GCs from the N total placements possible. Once the pattern and placement have been determined, it remains only to factor in the relative occurrence probabilities of each GC in order to determine a measure of the individuality of the fingerprint.

The model is constructed based on a number of assumptions. First, we assume that the patterns and GCs occur independently; neither has an influence on the other’s probability. Since there has been no conclusive evidence that a particular fingerprint pattern has any influence on the minutiae present in the fingerprint, this seems to be a valid assumption, and hence no unnecessary restrictions were placed on the form of the fingerprint. In later stages of our analysis, we account for the fact that dependencies may exist, and alter the selection of GCs accordingly. Second, we assume that the GCs occur independently of one another; that is, in the n cells in which we wish to confirm the presence of GCs, placement has no effect on which characteristic is selected. The construction of the model allowed us to calculate the ability to confirm a fingerprint based on partial fingerprint evidence. In addition, we used population figures of many countries and the entire world to find what the minimum number of GCs in common between fingerprints should be before a match can be said to occur.

In testing this model, we did not calculate the probability of occurrence for every individual pattern and placement of GCs. Rather, we calculated only the probability of the most likely occurrence. Also, the orientation of GCs was not taken into consideration. This may at first seem to be a weakness, but is in fact a strength, as requiring a fingerprint to occur with GCs oriented in a particular direction is stricter than not requiring any particular direction for their placement. Thus, any fingerprint occurring in nature is hypothetically less likely to occur than our calculated maximum. For a template fingerprint with 12 identified minutiae, a reasonable required number given new advancements in laser recognition of fingerprints, the probability of finding a match was calculated to be on the order of $10^{-13}$. This figure shows that even the most likely fingerprint is highly individual, and that fingerprint identification is as reliable on ideal grounds as DNA identification, which has reliability on the order of $10^{-10}$.

AN INQUIRY INTO INDIVIDUALITY OF THUMBPRINTS

Asma Al-Rawi, Steve Gilberston, Jonathan Whitmer

Kansas State University

Mathematical Contest in Modeling 2004

I. Introduction

“How can you disbelieve in me when I have created each one of you down to the prints on your fingers?”

--God (The Holy Qur’an 75:3-4) [4]

The above reference, depending on one’s religiousness or secularism, either confirms that fingerprints are distinct to individuals or, at the very least, shows that knowledge of the variation of fingerprints between persons, and of its usefulness in identification, has existed since the 7th century. In modern Western culture, the idea of using fingerprints as a means of identification first appeared in an article written by Henry Faulds in 1880 in the journal Nature [3]. His interest was aroused by his discovery of ridged pattern imprints in handmade pottery. After performing a series of experiments to determine the differences in fingerprints among individuals as well as their resilience, he recommended that these ridged imprints be used as evidence of criminal identity at the scene of a crime. At the root of this assertion is the assumption of uniqueness in each human’s fingerprint patterns. There are several commonalities in the patterns of ridged skin, however, which allow fingerprints to be systematically classified.

For example, the ridged lines on fingers appear in a number of major pattern types: loops, which comprise the largest portion of all fingerprints and occur in two chiralities; whorls, which are characterized by the spiraling pattern of the ridges; and arches, which comprise the smallest major group [1]. Other possible manifestations exist; however, their occurrence is very rare. In addition to these major groups, the ridges of different fingerprints show certain defining characteristics. This idea was prevalent in one of the first attempted quantifications of fingerprint individuality, which was performed by Sir Francis Galton in 1892 [1]. The patterns of finger ridge divergences and combinations, termed minutiae, are also identified as Galton Characteristics in his honor. Later developments have incorporated his ideas along with other print-determining factors to establish more exactly each print’s uniqueness [1,2,6].

Whether or not each fingerprint pattern is truly unique, fingerprints have been widely used as a form of identification in forensic science. Recently, however, the validity of fingerprint evidence has been called into question, as evidenced by the case United States v. Mitchell, which presented the US with its first challenge as to the admissibility of latent fingerprint evidence as a means of identification [7]. This necessitates a reevaluation of the validity of fingerprint uniqueness in measurement. Thus, we are faced with the problem of determining the probability that two people in the world might share the same fingerprints to measurable accuracy. This is quite a complex problem if one allows it to be, as there seem at first to be almost infinitely many variations within ridge patterns whose appearance and interplay must be accounted for, and yet it has a simple and elegant solution which we will show in this paper. In our study, we focus not on each of the ten fingers, but on only the thumb, which effectively serves as an upper bound for the multiple occurrence probability of all friction ridged skin. Our calculations have found on the basis of a discrete probability model that it is extremely unlikely that two people with the same thumbprints have ever existed, within the limitations of current measurement practices.

II. Model

The first step in devising a model for thumbprint individuality is simply to understand what types of fingerprints exist. As mentioned previously, fingerprints occur in what seems to be an infinite number of variations, determined by both their overall pattern and the distribution of Galton Characteristics (GCs). The patterns fall into three main categories: loops, arches, and whorls. These can be further divided into over a thousand subcategories [1]. Figure 1 shows the major types of prints.

FIGURE 1. The four most common fingerprint patterns: left and right loops, whorls, and arches. From

Prints which fall into these categories can, to the untrained eye, and oftentimes even the trained eye, appear very similar. When the contribution of GCs is factored in, a particular fingerprint’s unique character starts to become apparent. The major types of GCs are illustrated in Figure 2. Whether the pattern on the finger is a loop, arch, or whorl, GCs occur randomly throughout the entire print. These occurrences give distinct attributes to the print that can be systematically classified.

FIGURE 2. A chart showing the 10 most common forms of Galton Characteristics. (Osterburg ??)

The central problem, given a known classification of a fingerprint by its pattern and GCs, is to calculate the probability that an identical fingerprint exists. Our model focuses specifically on thumbprints, for a variety of reasons. For instance, a thumb is easy to idealize. In practice, when fingerprints are taken, the finger is rolled over nearly its entire surface above the first knuckle. This is similar to the unrolling of an uncapped cylinder. The shape of this print on paper is approximately rectangular. The thumbprint has the largest area, and also the largest number of defining qualities, due to the random distribution of GCs.

For an ideal rectangular thumbprint, we partition the area into N equally sized squares, with a minimum size on the order of one square millimeter, due to the minimum extent to which a GC can be identified as occurring in one of the N squares. Since only a finite number of visible GCs can occur on a single patterned finger, a discrete probability method is useful for determining the possibility of Doppelganger thumbs. It is then perfectly admissible to use a counting argument to find approximately the number of possible arrangements of friction ridges on the thumb, and their relative occurrences based on the features they contain.

It should be noted that ideal fingerprints as described above do not usually occur in actual fieldwork. Usually only portions of fingerprints are left by oils or other substances on the fingers of the criminal; these are called latent prints. After these latent prints are developed and brought into visible form, they are described as partial prints. These partial prints contain only a fraction of the total surface of the friction ridged skin on the thumb. Using similar ideas to the ones above, we can model partial prints simply by decreasing N; that is, limiting the number of cells on which the prints have to match up. Since a partial print cannot possibly match the rest of the cells contained in an ideal print, the characteristics of those cells are irrelevant. Decreasing N then gives an accurate model, as we can say that the area we are sampling from is smaller. Accordingly, the probability of matching the print among people of a given population grows, as we show below.

III. Probability Algorithms

Our first step was to measure the dimensions of an idealized thumb. Averaging over the three members in our group, we found the dimensions of a nearly rectangular print, when measured as described above, to be approximately 3 cm by 4 cm, or 30 mm by 40 mm. Thus there are approximately 1200 square millimeters on one thumbprint. We took each square millimeter to be a cell, so that in our ideal thumb model, a full print has 1200 possible identification points.
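As a rough illustration of the cell count, the following sketch tiles an idealized rectangular print with square cells. The function name `cell_count` and the `coverage` parameter, which stands in for the fraction of the thumb recovered in a partial print, are illustrative assumptions and not part of the model as stated.

```python
def cell_count(width_mm: float, height_mm: float,
               cell_mm: float = 1.0, coverage: float = 1.0) -> int:
    """Number of square cells N in an idealized rectangular print.

    coverage is the (hypothetical) fraction of the full print that is
    available: 1.0 for a full rolled print, less for a partial print.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Whole cells that fit along each side of the rectangle.
    full = int(width_mm // cell_mm) * int(height_mm // cell_mm)
    return int(full * coverage)


N_full = cell_count(30, 40)                   # 30 mm x 40 mm, 1 mm cells
N_partial = cell_count(30, 40, coverage=0.5)  # half of the print recovered
print(N_full, N_partial)
```

With 1 mm cells the full print gives N = 1200, and a partial print is modeled simply by a smaller N.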

In practice, a suspect’s thumbprint and the thumbprint found at the scene of the crime are compared to each other on both the overall pattern and a certain number of distinguishing characteristics. The distinguishing factors can correspond to either scars on the suspect’s thumbprint or GCs. Since scars are the result of completely random events, and thus are nearly impossible to quantify without exact personal histories, our model considers only the cases in which GCs occupy these identifying points. In previous models [1,2], the relation between GCs and the overall pattern was not considered; only the occurrence of GCs was taken into account. In our model, various degrees of pattern and GC independence were considered. This accounts for the possibility that a certain percentage of the GCs are inherent in the overall pattern. In the case where pattern and GC occurrences are completely independent, one can separate the probability of a fingerprint’s occurrence into two factors:

$$P_{fp} = P_p \, P_{GC} \qquad (1)$$

In the above equation, $P_{fp}$ is the probability a particular fingerprint will occur, $P_p$ is the probability a particular pattern will occur, some approximate figures for which are given in Table 1, and $P_{GC}$ is the probability of a particular combination of GCs.

Class of Print / Probability
Right Loop / 0.325
Left Loop / 0.325
Whorl / 0.3
Arch / 0.05
Total / 1

TABLE 1: A list of approximate occurrence probabilities of the four most common thumbprint patterns, from Osterburg et al. The loop category is determined therein to have a 65% occurrence probability, which here is divided into the two chiralities, which are easily distinguishable and occur at nearly the same rate overall.
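The factorization in equation (1) can be sketched in a few lines, using the pattern probabilities of Table 1. The names `PATTERN_PROB` and `fingerprint_probability` are illustrative, and the value passed for the GC probability below is a placeholder, not a figure from this paper.

```python
import math

# Approximate pattern probabilities from Table 1.
PATTERN_PROB = {
    "right loop": 0.325,
    "left loop": 0.325,
    "whorl": 0.3,
    "arch": 0.05,
}


def fingerprint_probability(pattern: str, p_gc: float) -> float:
    """Equation (1) under full independence: P_fp = P_p * P_GC."""
    return PATTERN_PROB[pattern] * p_gc


# The four classes are treated as exhaustive, so they sum to 1.
assert math.isclose(sum(PATTERN_PROB.values()), 1.0)

# Placeholder GC probability, chosen only to show the call.
example = fingerprint_probability("arch", 1e-6)
print(example)
```

Because the loop classes have the largest pattern probability, they give the largest value of the product for any fixed GC probability.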

Our model treats non-measured GCs and cells in which there are no GCs as equivalent empty cells. Thus, in the case where GCs are dependent on which pattern a fingerprint has, we can still use this independence model, by noting that since a particular percentage of the GCs are determined by the pattern, we can treat those as empty space in which no defining characteristic occurs.

Suppose, then, that we wish to find the probability that a particular distribution of measured GCs occurs. To do this, we note that of the N total cells in the fingerprint, only n of these cells have any significance in terms of GC measurement. The number of ways this can be distributed is easy to compute. Placing all measured cells on the same level, we begin placing GCs and empty cells on the surface of the thumbprint. At first there are n GCs to place within the total area of the print, and N total cells to place them in. If the first cell is empty space, we are left with N-1 cells in which to place characteristics, and n characteristics. If the first cell contains a characteristic, we have N-1 remaining cells in which to place characteristics, and n-1 GCs. Iterating this choice process over all N cells, we find that the number of ways we can place the GCs is

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!} \qquad (2).$$
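The first-cell argument above is the usual recurrence for binomial coefficients. A minimal sketch, assuming the count of placements is the binomial coefficient "N choose n", checks the recurrence against Python's built-in `math.comb` on a small grid. The function name `placements` is illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def placements(N: int, n: int) -> int:
    """Ways to place n GCs in N cells, counted by the first-cell argument."""
    if n == 0:
        return 1          # nothing left to place: one way
    if n > N:
        return 0          # more GCs than cells: impossible
    # First cell empty -> (N-1, n); first cell holds a GC -> (N-1, n-1).
    return placements(N - 1, n) + placements(N - 1, n - 1)


# The recurrence agrees with the closed form on a small example.
assert placements(30, 5) == comb(30, 5)

# For the full 1200-cell print with 12 measured GCs, use the closed form
# directly (the recursive version would exceed the default recursion depth).
print(comb(1200, 12))
```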

This leaves us to calculate the probability that each GC cell contains a particular GC. Osterburg et al. give relative frequencies of occurrence for each characteristic, averaged over 39 fingers. These figures are given in Table 2. In our model, since we disregard empty spaces, we considered only the relative frequency of the eleven most common elements. Double occurrences, or the event that two GCs occur in the same space, while certainly possible, were ignored in this model calculation, due to their small frequency. The multiple-occurrence figure in Table 2 is misleading, as it accounts for all double occurrences together, not double occurrences of particular types.

Parameter / Cell configuration / Frequency / Probability of Parameter
0 / Empty / 6,584 / 0.766
1 / Island / 152 / 0.018
2 / Bridge / 105 / 0.012
3 / Spur / 64 / 0.007
4 / Dot / 130 / 0.015
5 / Ending ridge / 715 / 0.083
6 / Fork / 328 / 0.038
7 / Lake / 55 / 0.006
8 / Trifurcation / 5 / 0.001
9 / Double bifurcation / 12 / 0.001
10 / Delta / 17 / 0.002
11 / Broken ridge / 119 / 0.014
12 / Multiple occurrences / 305 / 0.036
Total / 8,591 / 1.000

TABLE 2. Experimentally determined Galton Characteristic probability numbers. From Osterburg, et al. Our model disregards multiple occurrences, hence for our purposes, the characteristics numbered 0 and 12 are empty cells. Only the characteristics numbered 1-11 are relevant.

The relative probability is a necessary factor for determining which characteristic is most likely to occur in the n GC cells. The relative probability of the $i$th characteristic is given by:

$$P_{rel}(i) = \frac{P(i)}{\sum_{j=1}^{11} P(j)} \qquad (3),$$

where the elements $P(i)$ are taken from Table 2. The index $i$ in this case ranges from 1 to 11, as our model considers only single GC occurrences, and treats the low probability and multiple occurrence GCs as empty space. It should be noted that their inclusion would decrease the relative probability of the $i$th term as defined above; hence, it would decrease the upper bound which our calculation aims to set. Clearly, the sum of these relative probability quantities is 1, hence they are validly defined as probabilities.
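The normalization in equation (3) can be sketched from the raw counts in Table 2. Since each tabulated probability is a count divided by the same total, normalizing the counts of characteristics 1 to 11 gives the same ratios without the rounding in the probability column. The names `GC_FREQUENCY` and `relative_probabilities` are illustrative.

```python
# Frequencies of the eleven single Galton Characteristics from Table 2
# (parameters 1-11; empty cells and multiple occurrences are excluded).
GC_FREQUENCY = {
    "island": 152,
    "bridge": 105,
    "spur": 64,
    "dot": 130,
    "ending ridge": 715,
    "fork": 328,
    "lake": 55,
    "trifurcation": 5,
    "double bifurcation": 12,
    "delta": 17,
    "broken ridge": 119,
}


def relative_probabilities(freq: dict) -> dict:
    """Normalize frequencies so they sum to 1 over the characteristics kept."""
    total = sum(freq.values())
    return {name: count / total for name, count in freq.items()}


P_REL = relative_probabilities(GC_FREQUENCY)
most_likely = max(P_REL, key=P_REL.get)
print(most_likely, round(P_REL[most_likely], 3))
```

On these counts the ending ridge is the most likely characteristic, with a relative probability of about 0.42.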

For n GCs, the probability of each arrangement is built from the relative probability of each GC raised to the power of the number of times that GC is selected, divided by the number of ways to divide those n elements into groups categorized by the eleven GCs considered. Though the idea is complex, the notation is mathematically simple: it is the product of the selection probabilities divided by the multinomial coefficient corresponding to n choosing $n_1$ of GC number 1, $n_2$ of GC number 2, etc. If we further divide this quantity by the number of ways the n GCs can be placed, given in equation (2), we obtain the probability of each arrangement of n GCs, shown in equation (4a).

$$P_{GC} = \frac{1}{\binom{N}{n}} \cdot \frac{\prod_{i=1}^{11} P_{rel}(i)^{n_i}}{\dfrac{n!}{n_1!\, n_2! \cdots n_{11}!}} \qquad (4a)$$

Here $P_{rel}(i)$ is the relative probability of GC number $i$ from equation (3), and $n_i$ is the number of times that GC is selected. One should note that in the above,

$$\sum_{i=1}^{11} n_i = n \qquad (4b),$$

hence there are only as many stages considered in the determination of GCs as there are GCs that are measured and available to compare to.
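The following sketch is one reading of the verbal description of equation (4a): the product of relative probabilities raised to their selection counts, divided by the multinomial coefficient and by the number of placements from equation (2). Whether this matches the authors' exact formula is an assumption, and the function names and the small input values are illustrative only.

```python
from math import comb, factorial, prod


def multinomial(counts: list) -> int:
    """Multinomial coefficient n! / (n_1! n_2! ... n_k!), with n = sum(counts)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)   # each intermediate quotient is an integer
    return result


def arrangement_probability(p_rel: list, counts: list, N: int) -> float:
    """Assumed form of (4a): prod(p_i ** n_i) / (multinomial * C(N, n))."""
    if len(p_rel) != len(counts):
        raise ValueError("p_rel and counts must have the same length")
    n = sum(counts)               # equation (4b): the n_i sum to n
    selection = prod(p ** c for p, c in zip(p_rel, counts))
    return selection / (multinomial(counts) * comb(N, n))


# Toy example: two characteristic types, one of each, on a 4-cell grid.
print(arrangement_probability([0.5, 0.5], [1, 1], N=4))
```

When all n selected characteristics are of the same type, the multinomial coefficient equals 1 and only the placement count and the power of the single relative probability remain.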