Supplementary data - Curiosities of X markers and haplotypes

The λ-model

To illustrate the model, we construct a simple example. Consider the frequency data given in Tillmar [1] for 652 unrelated male individuals, typed with the commercial Argus X12 kit (QIAGEN, Hilden, Germany). We focus on the first three markers, which are located together in a cluster. Using the observed data and the λ-model, we estimate frequencies for all possible haplotypes. In Figure 3 we have plotted the sum of squared differences between the estimated and the expected frequencies. Specifically, we computed $\sum_i (F_i - p_i)^2 / p_i$ for a range of values of the λ parameter. The plot illustrates that, with increasing λ, the estimated frequencies quickly approach the expectations.

Figure 3. The sum of squared differences between expected and estimated haplotype frequencies against the value of λ. The y-axis is on a log scale.
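
The behaviour shown in Figure 3 can be sketched numerically. The snippet below is a minimal illustration, not the code used to produce the figure. It assumes that the λ-model estimator has the form F_i = (c_i + λ p_i) / (C + λ), where c_i is the observed count of haplotype i and C is the database size; the exact formula is given in the paper, not in this supplement. The counts are made up for four haplotypes and are not the Argus X12 data, and the function names are hypothetical.

```python
def lambda_estimate(count, total, expected, lam):
    """Assumed lambda-model form: F = (c + lambda * p) / (C + lambda)."""
    return (count + lam * expected) / (total + lam)


def discrepancy(counts, expected, lam):
    """Sum over haplotypes of (F_i - p_i)^2 / p_i for a given lambda."""
    total = sum(counts)
    return sum(
        (lambda_estimate(c, total, p, lam) - p) ** 2 / p
        for c, p in zip(counts, expected)
    )


# Hypothetical toy data: four haplotypes, 20 observations in total.
counts = [12, 3, 4, 1]
expected = [0.25, 0.25, 0.25, 0.25]

for lam in (0.01, 1, 100, 10000):
    print(lam, discrepancy(counts, expected, lam))
```

Under this assumed form, F_i - p_i = (c_i - C p_i) / (C + λ), so the measure shrinks in proportion to 1 / (C + λ)^2, which is consistent with the rapid decline described above.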

Case 1

We use the formula derived in the paper

And the table

SNP 2 \ SNP 1 / Allele 1 / Allele 2
Allele 1 / $F_{1,1}^{exp}+D$ / $F_{2,1}^{exp}-D$
Allele 2 / $F_{1,2}^{exp}-D$ / $F_{2,2}^{exp}+D$

Here the sign of D is not considered, and the expected haplotype frequency is 0.25 for all haplotypes.
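
As a small illustration of the table, the sketch below builds the four haplotype frequencies from a chosen D with an expected frequency of 0.25. The function name and the value D = 0.10 are hypothetical choices for the example; haplotypes are keyed by (SNP 1 allele, SNP 2 allele).

```python
def haplotype_table(d, expected=0.25):
    """Two-SNP haplotype frequencies, keyed by (SNP 1 allele, SNP 2 allele)."""
    if abs(d) > expected:
        raise ValueError("|D| must not exceed the expected haplotype frequency")
    return {
        (1, 1): expected + d,
        (2, 1): expected - d,
        (1, 2): expected - d,
        (2, 2): expected + d,
    }


table = haplotype_table(0.10)  # hypothetical degree of LD

# Marginal frequency of allele 1 at each SNP.
snp1_allele1 = table[(1, 1)] + table[(1, 2)]
snp2_allele1 = table[(1, 1)] + table[(2, 1)]

print(table)
print(sum(table.values()), snp1_allele1, snp2_allele1)
```

Whatever the value of D, the four frequencies sum to one and each allele keeps a marginal frequency of 0.5; D only shifts weight between the two diagonals of the table.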

We get

And we see that if D=0 we get,

In the case where we can unambiguously determine the female genotypes, we have from the paper that

If we instead use the λ-model to estimate the haplotype frequencies, we have,

Given that C=100 and that c will take values depending on the degree of LD we get,

Case 2

We use the formula derived in the paper,

And evaluate the haplotype frequencies. We have that,

Where we have used the formula to estimate haplotype frequencies. Using the information that the total database size is 500 (C=500), that all the expected haplotype frequencies are 0.13, and the observations as given in the paper, we get,

Re-arranging we get,

Where we see that if λ is very small we get

While for very large λ we get
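
The mechanism behind these two limits can be sketched for a single haplotype frequency, assuming the estimator takes the form below (the formula itself is given in the paper, so this form is an assumption here). The symbol c denotes the observed count of the haplotype, whose value is given in the paper; C = 500 and p = 0.13 are the values stated above.

```latex
F(\lambda) = \frac{c + \lambda p}{C + \lambda} = \frac{c + 0.13\,\lambda}{500 + \lambda},
\qquad
\lim_{\lambda \to 0} F(\lambda) = \frac{c}{500},
\qquad
\lim_{\lambda \to \infty} F(\lambda) = 0.13 .
```

In words, a very small λ leaves the estimate at the observed proportion, while a very large λ pulls it to the expected frequency.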

Case 3

We have from the paper that

Using also that

Haplotype / Observations
F19,21,13 / 0
F14,21,10 / 1
F24.1,20,13 / 1
F19,20,13 / 10
F24.1,21,13 / 2
F14,21,13 / 1
F19,21,10 / 1

And that C=1000 and all allele frequencies are 0.1, we get,

That can be further reduced to
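
The Case 3 inputs (the observation table, C = 1000 and allele frequencies of 0.1) can be plugged into a small sketch of the haplotype frequency estimates. Two assumptions are made: that the estimator has the form F = (c + λ p) / (C + λ), and that the expected haplotype frequency p is the product of the three allele frequencies, 0.1^3. The function names and the λ values are hypothetical.

```python
# Observation counts from the Case 3 table, keyed by the alleles at the three markers.
observations = {
    ("19", "21", "13"): 0,
    ("14", "21", "10"): 1,
    ("24.1", "20", "13"): 1,
    ("19", "20", "13"): 10,
    ("24.1", "21", "13"): 2,
    ("14", "21", "13"): 1,
    ("19", "21", "10"): 1,
}

C = 1000                     # database size given in the text
ALLELE_FREQ = 0.1            # every allele frequency given in the text
expected = ALLELE_FREQ ** 3  # assumption: product of the three allele frequencies


def lambda_estimate(count, total, p, lam):
    """Assumed lambda-model form: F = (c + lambda * p) / (C + lambda)."""
    return (count + lam * p) / (total + lam)


def estimate_all(lam):
    """Estimated frequency of every haplotype in the table for a given lambda."""
    return {
        hap: lambda_estimate(c, C, expected, lam)
        for hap, c in observations.items()
    }


for lam in (1, 100, 10000):  # hypothetical lambda values
    est = estimate_all(lam)
    print(lam, est[("19", "21", "13")], est[("19", "20", "13")])
```

Under these assumptions the unobserved haplotype 19,21,13 still receives a small positive frequency, which comes entirely from the λ term.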

Case 5

We have from the paper that

We further have that all haplotypes observed in the case have been observed once, that the total number of observations is 200, and that the individual allele frequencies are 0.1. We derive F18,x,y and F19,x,y as

and

Which, in loose notation, is a summation over all possible haplotypes with 18 or 19 as the allele at the first marker, with i and j indexing the alleles at the second and third markers.
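
As a sketch of what this summation gives, assume the estimator form F_{a,i,j} = (c_{a,i,j} + λ p_a p_i p_j) / (C + λ), with the expected haplotype frequency taken as the product of the allele frequencies. Both are assumptions here, since the formula itself is given in the paper. Because the allele frequencies at each marker sum to one, the expected part collapses to p_a = 0.1, and the counts add up to n_a, the number of database haplotypes carrying allele a at the first marker. The symbol n_a is introduced only for this illustration; its value depends on the case data in the paper.

```latex
F_{a,x,y} = \sum_{i,j} F_{a,i,j}
          = \frac{\sum_{i,j} c_{a,i,j} + \lambda\, p_a \sum_{i} p_i \sum_{j} p_j}{C + \lambda}
          = \frac{n_a + 0.1\,\lambda}{200 + \lambda},
\qquad a \in \{18, 19\}.
```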

References

1. Tillmar, A.O., Population genetic analysis of 12 X-STRs in Swedish population. Forensic Science International: Genetics, 2012. 6(2): p. e80-e81.
