Supplementary materials

Here we provide details of the derivation of the two main theoretical results of the paper.

The first theoretical result describes the difference curve obtained by subtracting the melting curves of samples of different genotypes, each mixed with homozygous reference DNA making up fraction x of the total DNA.

1) This curve is given by the difference in heteroduplex fractions of the mixtures times the difference between a weighted average of the homoduplex curves and the mean of the heteroduplex curves.

2) When the melting curves of wild type and mutant homozygotes are indistinguishable, the weighted average of the homoduplex curves is just this common homozygote curve. In this case the reference DNA fraction which maximizes separation of the difference curves of mixtures with all three genotypes of bi-allelic diploid DNA is $x = 1/7$.

Samples of different genotypes are designated W for wild type, M for homozygous mutant, and H for heterozygous mutant; forward and reverse strands of wild type amplicon duplexes are designated w and w’, respectively, and forward and reverse strands of homozygous mutant amplicon duplexes are designated m and m’, respectively. We will assume that the genotype of the reference DNA is wild type, but the roles of wild type and homozygous mutant are arbitrary here and may be reversed.

We assume amplification preserves the relative proportion of all strand species. Thus, Table 1a models both the initial and post-extension relative concentrations of wild type (reference) and homozygous mutant duplexes when samples of the three genotypes are mixed with homozygous reference DNA making up fraction x of the total DNA, so that the sample fraction is 1-x. The result is immediate when the sample is of the same or opposite homozygous genotype. When the sample is heterozygous, half of the sample contributes to the wild type duplex fraction and the other half contributes to the mutant homozygous duplex fraction.

Table 1a. Homoduplex fractions at extension phase of PCR

Genotype / [ww’] / [mm’]
W / 1 / 0
M / x / 1-x
H / $x + \tfrac{1}{2}(1-x) = \tfrac{1}{2}(1+x)$ / $\tfrac{1}{2}(1-x)$

Next we assume that a final denaturation and reannealing step prior to melting promotes the formation of duplexes independently of whether the strands are perfectly complementary ([ww’], [mm’]) or just approximately complementary ([wm’], [mw’]). Mathematically, this says that the resulting fraction of each of these duplexes is the product of the fractions of its two individual strands, which are given by the duplex concentrations in Table 1a. The rows of Table 1b are then obtained by multiplying out the expressions in Table 1a in binomial (FOIL) form, much like a Punnett square. Since the sums of the rows of Table 1a equal 1, so do the sums of the rows of Table 1b: when a+b=1, (a+b)(a+b)=aa+ab+ba+bb=1.

Table 1b. Duplex fractions after denaturation and reannealing of PCR product

Genotype / [ww’] / [wm’] / [mw’] / [mm’]
W / 1 / 0 / 0 / 0
M / $x^2$ / $x(1-x)$ / $x(1-x)$ / $(1-x)^2$
H / $\tfrac{1}{4}(1+x)^2$ / $\tfrac{1}{4}(1-x^2)$ / $\tfrac{1}{4}(1-x^2)$ / $\tfrac{1}{4}(1-x)^2$
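
The construction of Tables 1a and 1b can be checked numerically. The following is a minimal Python sketch, not code from the paper; the names `SAMPLE_WILD_TYPE`, `strand_fractions`, and `duplex_fractions` are hypothetical, and exact rational arithmetic is used only so that the entries can be compared with the closed forms in the tables.

```python
from fractions import Fraction

# Fraction of wild-type strands contributed by the sample itself, per genotype.
SAMPLE_WILD_TYPE = {"W": Fraction(1), "M": Fraction(0), "H": Fraction(1, 2)}

def strand_fractions(genotype, x):
    """Table 1a: (wild-type, mutant) fractions for reference DNA fraction x."""
    x = Fraction(x)
    w = x + (1 - x) * SAMPLE_WILD_TYPE[genotype]
    return w, 1 - w

def duplex_fractions(genotype, x):
    """Table 1b: random reannealing multiplies strand fractions pairwise."""
    w, m = strand_fractions(genotype, x)
    return {"ww'": w * w, "wm'": w * m, "mw'": m * w, "mm'": m * m}

if __name__ == "__main__":
    for g in "WMH":
        row = duplex_fractions(g, Fraction(1, 4))
        print(g, {k: str(v) for k, v in row.items()}, "sum =", sum(row.values()))
```

Each row sums to one for any x between 0 and 1, as the Punnett-square argument requires.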

Our next assumption is that the melting curve of a mixture of duplexes is given by the weighted sum of the individual duplex melting curves in proportion to their relative concentrations. This makes sense since individual duplex melting curves already take into account all fluorescence differences due to double vs. single stranded state of the duplex at a given temperature as well as potentially reduced fluorescence of double stranded heteroduplex bubbles. Identifying the duplex fluorescence vs. temperature melting curves by their two strands, Fww’, Fwm’, Fmw’, Fmm’, and weighting them by the duplex concentrations from Table 1b gives Table 1c, the theoretical melting curves of mixtures of reference DNA fraction x with the various genotypes. Note that when x=0, these expressions reduce to the wild type and mutant homozygote melting curves, and the heterozygote curve given by the equally weighted sum of one-quarter of each duplex, two homoduplexes and two heteroduplexes, so its overall heteroduplex content is one-half. We may interchange the words homozygote and homoduplex when referring to their melting curves, but not heterozygote and heteroduplex.

Table 1c. Melting curves of mixtures

Genotype / Melting Curve
W / $F_{ww'}(T)$
M / $x^2 F_{ww'}(T) + x(1-x) F_{wm'}(T) + x(1-x) F_{mw'}(T) + (1-x)^2 F_{mm'}(T)$
H / $\tfrac{1}{4}(1+x)^2 F_{ww'}(T) + \tfrac{1}{4}(1-x^2) F_{wm'}(T) + \tfrac{1}{4}(1-x^2) F_{mw'}(T) + \tfrac{1}{4}(1-x)^2 F_{mm'}(T)$

Theoretically computed examples of the four duplex curves and the three genotype curves for the HFE amplicons of the manuscript, corresponding to the experimental examples in Fig. 2 of the manuscript, are shown in Fig. S1. The two homozygous genotypes and the two homoduplexes all share one identical melting curve, so only four distinct melting curves are visible. These curves are obtained by inverting the van 't Hoff equation for temperature in terms of product and reactant concentrations, with $\Delta H$ and $\Delta S$ obtained from nearest-neighbor approximations for the duplexes and thermodynamic parameters described in [S1, S2].
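
As an illustration only, and not the paper's exact computation, the sketch below inverts the van 't Hoff relation for a single two-state, non-self-complementary duplex with equal strand concentrations. The function name `melting_temperature_at`, the total strand concentration, and the enthalpy and entropy values are hypothetical placeholders, not nearest-neighbor values for the HFE amplicons.

```python
import math

R = 1.987  # gas constant in cal/(mol K)

def melting_temperature_at(theta, dH, dS, c_total):
    """Temperature (K) at which a fraction theta of strands is in duplex form.

    Assumes a two-state model A + B <-> AB with each strand at c_total / 2,
    so K = 2*theta / ((1 - theta)**2 * c_total) and dH - T*dS = -R*T*ln(K).
    Units: dH in cal/mol, dS in cal/(mol K), c_total in mol/L.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    K = 2.0 * theta / ((1.0 - theta) ** 2 * c_total)
    return dH / (dS - R * math.log(K))

if __name__ == "__main__":
    dH, dS, c_total = -150_000.0, -400.0, 1e-6  # hypothetical values
    for theta in (0.9, 0.5, 0.1):
        print(theta, round(melting_temperature_at(theta, dH, dS, c_total), 2))
```

Sweeping theta from near 1 to near 0 traces out temperature as a function of duplex fraction, which is the sense in which the equation is "inverted"; at theta = 1/2 the expression reduces to the familiar melting temperature formula with R ln(c_total/4).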

Subtracting the melting curves in the lower two rows of Table 1c from the melting curve in the top row gives, in Table 1d, the theoretical difference curves between mixtures with wild type and mutant homozygous samples, and between mixtures with wild type and heterozygous samples, respectively. The difference curve between mixtures with mutant homozygous and heterozygous samples will be obtained as the difference of these differences after simplification. Note that the sum of the coefficients of the melting curves in each row of Table 1d is zero, since we have subtracted rows whose coefficients each add to one.

Table 1d. Between mixture difference curves

Genotypes / Difference Curve
W-M / $(1-x^2) F_{ww'}(T) - x(1-x) F_{wm'}(T) - x(1-x) F_{mw'}(T) - (1-x)^2 F_{mm'}(T)$
W-H / $\left(1-\tfrac{1}{4}(1+x)^2\right) F_{ww'}(T) - \tfrac{1}{4}(1-x^2) F_{wm'}(T) - \tfrac{1}{4}(1-x^2) F_{mw'}(T) - \tfrac{1}{4}(1-x)^2 F_{mm'}(T)$

In Table 1e, we factor out the heteroduplex content from the expression in each row of Table 1d. Since the coefficients of the heteroduplex curves $F_{wm'}$, $F_{mw'}$ in Table 1d are the same, both equal to minus one-half of the total heteroduplex content, the coefficients of $F_{wm'}$, $F_{mw'}$ inside the brackets in Table 1e are both $-\tfrac{1}{2}$. Since the sum of the homoduplex and heteroduplex curve coefficients in each row of Table 1d is zero, the bracketed coefficients of $F_{ww'}$, $F_{mm'}$ in Table 1e must sum to $+1$.

Table 1e. Factored form of difference curves.

Genotypes / Difference Curve (Factored Form)
W-M / $2x(1-x)\left[\left(m_{ww'}(x)F_{ww'}(T) + m_{mm'}(x)F_{mm'}(T)\right) - \tfrac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]$
W-H / $\tfrac{1}{2}(1-x^2)\left[\left(h_{ww'}(x)F_{ww'}(T) + h_{mm'}(x)F_{mm'}(T)\right) - \tfrac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]$

where $m_{ww'}(x) + m_{mm'}(x) = 1$ and $h_{ww'}(x) + h_{mm'}(x) = 1$. Explicitly, dividing the Table 1d coefficients by the heteroduplex content gives $m_{ww'}(x) = (1+x)/(2x)$, $m_{mm'}(x) = -(1-x)/(2x)$, $h_{ww'}(x) = (3+x)/(2(1+x))$, and $h_{mm'}(x) = -(1-x)/(2(1+x))$. Weights that sum to one define a weighted average in the general (affine) sense. Note that the weight on $F_{mm'}$ is negative for $0<x<1$, because $F_{mm'}$ enters Table 1d with a minus sign, so this is not a convex combination (linear interpolation) of the two homoduplex curves; the distinction disappears when the two homoduplex curves coincide. Table 1e says that difference curves are described by the total heteroduplex content times a weighted average of the homoduplex curves minus the mean heteroduplex curve.
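
The factoring step can be reproduced directly from the Table 1d coefficients. The sketch below is a minimal illustration under that assumption, not the authors' code; the function names `difference_coefficients` and `factored_form` are hypothetical. It returns the heteroduplex content together with the bracketed weights on the two homoduplex curves, so their sum and signs can be inspected for any x strictly between 0 and 1.

```python
from fractions import Fraction

def difference_coefficients(pair, x):
    """Table 1d coefficients of (F_ww', F_wm', F_mw', F_mm') for 'W-M' or 'W-H'."""
    x = Fraction(x)
    if pair == "W-M":
        w = x              # wild-type strand fraction in the M mixture
    elif pair == "W-H":
        w = (1 + x) / 2    # wild-type strand fraction in the H mixture
    else:
        raise ValueError(pair)
    m = 1 - w
    # Wild-type mixture curve (pure F_ww') minus the mixture curve of the sample.
    return (1 - w * w, -w * m, -m * w, -m * m)

def factored_form(pair, x):
    """Return (heteroduplex content, weight on F_ww', weight on F_mm')."""
    c_ww, c_wm, c_mw, c_mm = difference_coefficients(pair, x)
    content = -(c_wm + c_mw)   # total heteroduplex content of the mixture
    return content, c_ww / content, c_mm / content

if __name__ == "__main__":
    for pair in ("W-M", "W-H"):
        content, a, b = factored_form(pair, Fraction(1, 4))
        print(pair, "content =", content, "weights =", a, b, "sum =", a + b)
```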

Since we are most interested in the situation when it is difficult to distinguish the homozygous mutant melting curve from the wild type melting curve, we focus first on the case when they are taken to be identical. When nearest-neighbor thermodynamics predicts that the melting curves of the two homoduplexes are the same, it does not imply that the melting curves of the two heteroduplexes are the same; in fact, that would be an unlikely coincidence. And though the relative concentrations of the two heteroduplexes in these mixtures are always equal, the relative concentrations of the homoduplexes are not. What is unique to the identical homozygote situation is that, with respect to melting and difference curves, all weighting factors give the same result: when $F_{ww'}(T)=F_{mm'}(T)$, then $aF_{ww'}(T)+bF_{mm'}(T)$ is the same for any a and b with a+b=1. This allows us to replace each of the weighting factors in Table 1e, $m_{ww'}$, $m_{mm'}$, $h_{ww'}$, $h_{mm'}$, by $\tfrac{1}{2}$, and completely separate the heteroduplex dependence from the temperature dependence of the difference curves. Table 1f shows that in the case of identical homozygotes, the melting curve difference is given by the heteroduplex content difference between mixtures times a universal difference curve equal to the mean homoduplex curve minus the mean heteroduplex curve. The theoretical universal difference curve and the difference curve between unmixed homozygous and heterozygous samples are also shown in Fig. S1. The latter is one-half of the former, in accordance with the theory.

Table 1f. Difference curves when homozygote curves are equal, Fww’(T)= Fmm’(T)

Genotypes / Difference Curve When Fww’(T)= Fmm’(T)
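W-M / $2x(1-x)\left[\tfrac{1}{2}\left(F_{ww'}(T)+F_{mm'}(T)\right) - \tfrac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right] = m(x)F(T)$
W-H / $\tfrac{1}{2}(1-x^2)\left[\tfrac{1}{2}\left(F_{ww'}(T)+F_{mm'}(T)\right) - \tfrac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right] = h(x)F(T)$
M-H / $\left(\tfrac{3}{2}x^2-2x+\tfrac{1}{2}\right)\left[\tfrac{1}{2}\left(F_{ww'}(T)+F_{mm'}(T)\right) - \tfrac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right] = (h(x)-m(x))F(T)$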

Here $m(x)=2x(1-x)$ is the heteroduplex content of the mixture with a mutant homozygous sample, $h(x)=\tfrac{1}{2}(1-x^2)$ is the heteroduplex content of the mixture with a heterozygous sample, and $F(T) = \tfrac{1}{2}\left(F_{ww'}(T)+F_{mm'}(T)\right) - \tfrac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)$ is the difference of the mean homoduplex curve and the mean heteroduplex curve. Since there is no heteroduplex content in the mixture with wild type, m(x) and h(x) are also the difference in heteroduplex content between their respective mixtures and wild type. We have also obtained the expression for the difference curve between mixtures of reference DNA with homozygous mutant and with heterozygous samples by subtracting the differences of each from wild type: M-H = (W-H)-(W-M), which equals $(h(x)-m(x))F(T)$. Only the magnitude of this difference matters for what follows.
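
For reference, the heteroduplex content difference between the mixtures with heterozygous and homozygous mutant samples can be expanded and factored as follows. This is a worked step added for convenience, using only $m(x)=2x(1-x)$ and $h(x)=\tfrac{1}{2}(1-x^2)$ as read off from Table 1b.

```latex
\begin{aligned}
h(x) - m(x) &= \tfrac{1}{2}(1 - x^2) - 2x(1 - x) \\
            &= \tfrac{3}{2}x^2 - 2x + \tfrac{1}{2} \\
            &= \tfrac{1}{2}(3x - 1)(x - 1).
\end{aligned}
```

The difference therefore vanishes at $x = 1/3$ and $x = 1$, is positive for $0 \le x < 1/3$, and is negative for $1/3 < x < 1$.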

In the case of identical homozygote melting curves, described by Table 1f, we can explicitly determine the reference DNA fraction x which maximizes the separation between melting curves corresponding to mixtures of that fraction with different genotypes. This is because the separation between any such pair of curves is proportional to the magnitude of the heteroduplex content difference between the mixtures, which we have computed for all three pairs as m(x), h(x), and |h(x)-m(x)|, and which are plotted in Fig. S2. Any quantitative measure of separation, such as the area between curves or their maximum separation, will be proportional to the appropriate one of these functions. For the sake of definiteness, we measure the ability to distinguish all genotypes by the maximum separation between the closest pair of curves, so the separation between a pair of mixtures is given by their heteroduplex content difference times the maximum value of F(T), which we will call $F_{\max}$. For example, when the reference DNA fraction x=0 and samples are unmixed, wild type and homozygous mutant samples have heteroduplex content equal to 0, and heterozygous samples have heteroduplex content equal to $\tfrac{1}{2}$. The maximum difference between either homozygous curve and the heterozygous curve is $\tfrac{1}{2}F_{\max}$, but the maximum difference between the two homozygous curves is zero. To maximize our ability to distinguish all three genotypes, we seek the value of the reference DNA fraction x which maximizes the smallest of the three absolute heteroduplex content difference functions. To find this value, we first place the three absolute heteroduplex content difference functions in increasing order, separately on each interval of x where the order differs.

Table 1g. Order of absolute heteroduplex content difference on intervals where it differs.

Interval / Smallest, s(x) / Middle / Largest
$0<x<\tfrac{1}{7}$ / m(x) / h(x)-m(x) / h(x)
$\tfrac{1}{7}<x<\tfrac{1}{3}$ / h(x)-m(x) / m(x) / h(x)
$\tfrac{1}{3}<x<1$ / m(x)-h(x) / h(x) / m(x)
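
The interval endpoints are the values of x at which two of the three functions cross. As a worked check, using $m(x)=2x(1-x)$ and $h(x)=\tfrac{1}{2}(1-x^2)=\tfrac{1}{2}(1-x)(1+x)$ and dividing by the common factor $1-x \neq 0$:

```latex
\begin{aligned}
m(x) = h(x) - m(x): \quad 4x(1-x) &= \tfrac{1}{2}(1-x)(1+x) \\
8x &= 1 + x \\
x &= \tfrac{1}{7}, \\[4pt]
m(x) = h(x): \quad 2x(1-x) &= \tfrac{1}{2}(1-x)(1+x) \\
4x &= 1 + x \\
x &= \tfrac{1}{3}.
\end{aligned}
```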

The first column gives the function s(x) we wish to maximize across the full interval of reference DNA fractions x. In Fig. S2, this is the union of the lowest graphs on each interval: m(x) from x=0 to $x=1/7$, h(x)-m(x) from $x=1/7$ to $x=1/3$, and m(x)-h(x) from $x=1/3$ to x=1. A theorem from calculus confirms our visual intuition that the maximum of s(x) can only occur where the slope of its tangent is zero or where it does not have a well-defined tangent: if s(x) has a local extremum at x=a, then s'(a)=0 or s'(a) does not exist. The only place where s'(x)=0 is halfway between the roots $1/3$ and 1 of m(x)-h(x) (as s(x) is quadratic on this interval), i.e., at $x=2/3$. This corresponds to adding twice as much wild-type DNA as there was unknown DNA and gives a maximum separation of $\tfrac{1}{6}F_{\max}$ between the heterozygous and homozygous SNP curves, or $\tfrac{1}{3}$ of the original separation $\tfrac{1}{2}F_{\max}$ between the heterozygous melting curve and the other two. The separation between the wild-type melting curve and the other two melting curves will be larger.

The only places where s(x) is not differentiable are where it changes form, i.e., at $x=1/7$ and $x=1/3$. Comparing the values at these points and at $x=2/3$, namely $s(1/7)=12/49$, $s(1/3)=0$, and $s(2/3)=1/6$, we find the optimal mixture fraction occurs at $x=1/7$, as indicated in Fig. S2. For this reference DNA fraction, at the temperature of maximum separation, the melting curves of mixtures with heterozygous and wild type samples are $\tfrac{24}{49}F_{\max}$ apart, barely reduced from the separation $\tfrac{1}{2}F_{\max} = \tfrac{24.5}{49}F_{\max}$ when the same samples were not mixed. What we have gained is that at this reference DNA fraction, instead of overlapping the wild type curve as it did when the samples were not mixed, the melting curve of a mixture with a homozygous mutant sample is exactly halfway between the melting curves of mixtures with wild type and heterozygous samples, $\tfrac{12}{49}F_{\max}$ away from both at the temperature of maximum separation.
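
The maximization can also be confirmed by brute force. The following is a minimal sketch in Python, assuming only the heteroduplex content functions $m(x)=2x(1-x)$ and $h(x)=\tfrac{1}{2}(1-x^2)$; the function names and the grid search are illustrative choices and are not taken from the paper.

```python
from fractions import Fraction

def m(x):
    """Heteroduplex content of the mixture with a homozygous mutant sample."""
    return 2 * x * (1 - x)

def h(x):
    """Heteroduplex content of the mixture with a heterozygous sample."""
    return (1 - x * x) / 2

def s(x):
    """Smallest of the three pairwise heteroduplex content differences."""
    return min(m(x), h(x), abs(h(x) - m(x)))

def best_on_grid(denominator):
    """Reference fraction k/denominator (k = 0..denominator) that maximizes s."""
    grid = [Fraction(k, denominator) for k in range(denominator + 1)]
    return max(grid, key=s)

if __name__ == "__main__":
    # 420 is divisible by 3 and 7, so the candidate points lie on the grid.
    best = best_on_grid(420)
    print("best x =", best, " s(best) =", s(best))
    for x in (Fraction(1, 7), Fraction(1, 3), Fraction(2, 3)):
        print("x =", x, " m =", m(x), " h =", h(x), " s =", s(x))
```

Exact fractions are used so that the comparison of candidate points is free of rounding error.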

In retrospect, we can give a simple heuristic explanation for this value when we recognize that it corresponds to adding one part wild-type DNA to six parts unknown sample. As we saw above, the melting curves will be optimally separated when the homozygous mutant curve is equally separated from both the wild-type and heterozygous melting curves, so the heteroduplex content of the mixture with a homozygous sample must be exactly half that of a mixture with a heterozygous sample. The ratio of 1 part wild-type to 6 parts unknown is optimal because when we divide 6 into equal parts (3+3, representing the heterozygous sample strands), add 1 to one of the parts (4 = 3 wild type sample strands plus one reference strand), and multiply (3)(4)=12 to represent heteroduplexes formed, we obtain exactly twice the product of the original number (6, representing the homozygous SNP strands) multiplied by one (the reference strand). So at the simplest level, it is because $(\tfrac{1}{2}\cdot 6)(\tfrac{1}{2}\cdot 6+1) = 2\,(6)(1)$ that the optimal reference DNA fraction is $\tfrac{1}{7}$. This is visualized in the animation
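
The same counting argument can be written for a general ratio of one part reference to n parts sample, which shows that n = 6 is the only ratio that works. The symbol n is introduced here for illustration and does not appear in the original derivation.

```latex
\begin{aligned}
\tfrac{n}{2}\left(\tfrac{n}{2} + 1\right) &= 2\,(n \cdot 1) \\
\tfrac{n}{2} + 1 &= 4 \\
n &= 6, \qquad x = \frac{1}{n+1} = \frac{1}{7}.
\end{aligned}
```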

References for the Supplementary Materials

[S1] J. SantaLucia Jr., A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proc. Natl. Acad. Sci. USA 95 (1998) 1460-1465.

[S2] N. Peyret, P.A. Seneviratne, H.T. Allawi, J. SantaLucia Jr., Nearest neighbor thermodynamics of DNA with A·A, C·C, G·G, and T·T mismatches, Biochemistry 38 (1999) 3468-3477.