Supplementary data - Curiosities of X markers and haplotypes
The λ-model
To assist in understanding the model, we construct a simple example. Consider the frequency data given in Tillmar et al. [1] for 652 unrelated male individuals, typed with the commercial Argus X12 kit (QIAGEN, Hilden, Germany). We focus on the first three markers, which are located together in a cluster. Using the observed data and the λ-model, we estimate frequencies for all possible haplotypes. In Figure 3 we plot the sum of squared differences between the estimated and the expected frequencies. Specifically, we computed $\sum_i (F_i - p_i)^2 / p_i$, where $F_i$ is the estimated and $p_i$ the expected frequency of haplotype $i$, for a range of values of the λ parameter. The plot illustrates that, with increasing λ, the estimated frequencies quickly approach the expectations.
Figure 3. The sum of squared differences between expected and estimated haplotype frequencies against the value of λ. The y-axis is on a log scale.
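The qualitative behaviour in Figure 3 can be illustrated with a short sketch. It assumes that the λ-model estimator has the form F_i = (c_i + λ p_i) / (C + λ), with c_i the observed count of haplotype i, C the database size and p_i the expected frequency. That form is not restated in this supplement, so it should be read as an assumption. The counts below are invented for illustration and are not the data from [1].

```python
def lambda_estimate(count, total, expected, lam):
    """Assumed lambda-model estimator: (c_i + lam * p_i) / (C + lam)."""
    return (count + lam * expected) / (total + lam)


def squared_deviation(counts, expected, lam):
    """Sum over haplotypes of (F_i - p_i)^2 / p_i for a given lambda."""
    total = sum(counts)
    return sum(
        (lambda_estimate(c, total, p, lam) - p) ** 2 / p
        for c, p in zip(counts, expected)
    )


# Hypothetical toy data: four haplotypes, NOT the counts from reference [1].
counts = [40, 10, 10, 40]
expected = [0.25, 0.25, 0.25, 0.25]

# The statistic shrinks as lambda grows, because the estimates are pulled
# towards the expected frequencies.
for lam in (0.001, 1, 10, 100, 1000, 10000):
    print(f"lambda={lam:>8}: {squared_deviation(counts, expected, lam):.6f}")
```

Under this assumed estimator, F_i - p_i = (c_i - C p_i) / (C + λ), so the statistic falls off as 1 / (C + λ)^2, which would be consistent with a rapid decline on a log-scale axis.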
Case 1
We use the formula derived in the paper
and the table
SNP 1, SNP 2 / Allele 1 / Allele 2
Allele 1 / $F_{1,1}^{\mathrm{exp}} + D$ / $F_{2,1}^{\mathrm{exp}} - D$
Allele 2 / $F_{1,2}^{\mathrm{exp}} - D$ / $F_{2,2}^{\mathrm{exp}} + D$
Here the sign of D is not considered, and the expected haplotype frequency is 0.25 for all haplotypes.
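As a small illustration of the table, the sketch below builds the four haplotype frequencies from a given D with an expected frequency of 0.25. The function name and the dictionary layout are hypothetical. The keys simply repeat the subscript pairs as they are written in the table.

```python
def haplotype_table(d, expected=0.25):
    """Haplotype frequencies keyed by the subscript pair used in the table."""
    table = {
        (1, 1): expected + d,
        (2, 1): expected - d,
        (1, 2): expected - d,
        (2, 2): expected + d,
    }
    if min(table.values()) < 0:
        raise ValueError("|D| must not exceed the expected frequency")
    return table


for d in (0.0, 0.1, 0.25):
    table = haplotype_table(d)
    # The four frequencies always sum to one, whatever the value of D.
    print(d, table, "sum =", round(sum(table.values()), 12))
```

Each allele keeps a marginal frequency of 0.5 for every D, since each row and each column of the table contains one +D and one -D term. D changes only how the alleles are combined into haplotypes.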
We get
And we see that if D=0 we get,
In the case where we can unambiguously determine the female genotypes, we have from the paper that
If we instead use the λ-model to estimate the haplotype frequency, we have
Given that C=100 and that c takes values depending on the degree of LD, we get
Case 2
We use the formula derived in the paper,
and evaluate the haplotype frequencies. We have that
where we have used the formula to estimate haplotype frequencies. Given that the total database size is 500 (C=500), that all expected haplotype frequencies are 0.13, and the observations given in the paper, we get
Re-arranging we get,
where we see that if λ is very small we get
while for very large λ we get
Case 3
We have from the paper that
Using also that
Haplotype / Observations
$F_{19,21,13}$ / 0
$F_{14,21,10}$ / 1
$F_{24.1,20,13}$ / 1
$F_{19,20,13}$ / 10
$F_{24.1,21,13}$ / 2
$F_{14,21,13}$ / 1
$F_{19,21,10}$ / 1
and that C=1000 and all allele frequencies are 0.1, we get
which can be further reduced to
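To make the inputs of this case concrete, the sketch below evaluates a haplotype frequency estimate for each observation in the table. It rests on two assumptions that are not stated in this supplement: the λ-model estimator is taken to be F = (c + λ p) / (C + λ), and the expected frequency p of a three-marker haplotype is taken to be the product of the allele frequencies, 0.1^3. The λ values are arbitrary examples. The formula from the paper is not reproduced here.

```python
C = 1000                      # database size, as stated in the text
ALLELE_FREQ = 0.1             # every allele frequency, as stated in the text
EXPECTED = ALLELE_FREQ ** 3   # assumption: product over the three markers

# Observation counts copied from the table above (alleles kept as strings).
OBSERVATIONS = {
    ("19", "21", "13"): 0,
    ("14", "21", "10"): 1,
    ("24.1", "20", "13"): 1,
    ("19", "20", "13"): 10,
    ("24.1", "21", "13"): 2,
    ("14", "21", "13"): 1,
    ("19", "21", "10"): 1,
}


def estimate(count, lam):
    """Assumed lambda-model estimator: (c + lam * p) / (C + lam)."""
    return (count + lam * EXPECTED) / (C + lam)


for lam in (0.001, 1.0, 1000.0):  # hypothetical example values of lambda
    print(f"lambda = {lam}")
    for haplotype, count in OBSERVATIONS.items():
        label = ",".join(haplotype)
        print(f"  F[{label}] = {estimate(count, lam):.6g}")
```

With a very small λ the unobserved haplotype gets an estimate close to zero, while a large λ moves every estimate towards the assumed expected value.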
Case 5
We have from the paper that
We further have that all haplotypes observed in the case have been observed once, that the total number of observations is 200, and that the individual allele frequencies are 0.1. We derive $F_{18,x,y}$ and $F_{19,x,y}$ as
and
which, in loose notation, is a summation over all possible haplotypes with 19 and 18, respectively, as the allele at the first marker, with i and j corresponding to the alleles at the second and third markers.
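The summation over the second and third markers can be sketched numerically. The example assumes the estimator form F = (c + λ p) / (C + λ), with p equal to the product of allele frequencies, and assumes ten alleles of frequency 0.1 at each of the last two markers so that the frequencies sum to one. The observed haplotypes, the allele labels and the value of λ are all hypothetical.

```python
from itertools import product

C = 200            # total number of observations, as stated in the text
P_ALLELE = 0.1     # every allele frequency, as stated in the text
LAM = 1.0          # hypothetical value of lambda
ALLELES = range(10)  # hypothetical labels: ten alleles per marker

# Hypothetical observed haplotypes, each seen exactly once.
observed = {(18, 0, 1): 1, (18, 2, 3): 1, (19, 4, 5): 1}


def estimate(count, expected, lam):
    """Assumed lambda-model estimator: (c + lam * p) / (C + lam)."""
    return (count + lam * expected) / (C + lam)


def marginal(first_allele, lam):
    """Sum the estimates over all alleles i, j at the second and third marker."""
    return sum(
        estimate(observed.get((first_allele, i, j), 0), P_ALLELE ** 3, lam)
        for i, j in product(ALLELES, repeat=2)
    )


for allele in (18, 19):
    n_seen = sum(c for (a, _, _), c in observed.items() if a == allele)
    closed_form = (n_seen + LAM * P_ALLELE) / (C + LAM)
    print(allele, marginal(allele, LAM), closed_form)
```

Under these assumptions the brute-force sum agrees with (n + λ · 0.1) / (C + λ), where n is the number of observed haplotypes that carry the given allele at the first marker.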
References
1. Tillmar, A.O., Population genetic analysis of 12 X-STRs in Swedish population. Forensic Science International: Genetics, 2012. 6(2): p. e80-e81.