Elizabeth Hom

Epi 516

November 16, 2011

Homework #4

Question 1

(a)

The values of P(A) and P(B) were given.

I calculated P(a) = 1 - P(A) and P(b) = 1 - P(B). Then I used Excel to calculate the bounds of the theoretical range of the linkage disequilibrium coefficient D_AB for the four scenarios:

Scenario / P(A) / P(a) / P(B) / P(b) / - P(A)*P(B) / - P(a)*P(b) / P(a)*P(B) / P(A)*P(b)
1 / 0.5 / 0.5 / 0.5 / 0.5 / -0.25 / -0.25 / 0.25 / 0.25
2 / 0.95 / 0.05 / 0.95 / 0.05 / -0.9025 / -0.0025 / 0.0475 / 0.0475
3 / 0.95 / 0.05 / 0.05 / 0.95 / -0.0475 / -0.0475 / 0.0025 / 0.9025
4 / 0.5 / 0.5 / 0.95 / 0.05 / -0.475 / -0.025 / 0.475 / 0.025
Range of D_AB
Scenario / Minimum = Max(-P(A)*P(B), -P(a)*P(b)) / Maximum = Min(P(a)*P(B), P(A)*P(b))
1 / -0.25 / 0.25
2 / -0.0025 / 0.0475
3 / -0.0475 / 0.0025
4 / -0.025 / 0.025

Based on the range of D_AB, the absolute value of the linkage disequilibrium coefficient, |D_AB|, runs from 0 up to the larger of |Minimum| and Maximum. Its theoretical range for the four scenarios is:

Scenario / Range of |D_AB|
1 / [0, 0.25]
2 / [0, 0.0475]
3 / [0, 0.0475]
4 / [0, 0.025]
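The bounds in the two tables above can be reproduced with a short Python sketch, used here in place of the Excel sheet. The function name `d_range` is hypothetical; it applies the same Max/Min rules as the table headers and takes the upper bound of |D_AB| as the larger magnitude of the two bounds.

```python
def d_range(p_A, p_B):
    """Return (D_min, D_max, |D|_max) given allele frequencies P(A) and P(B)."""
    p_a, p_b = 1 - p_A, 1 - p_B
    d_min = max(-p_A * p_B, -p_a * p_b)   # Minimum = Max(-P(A)P(B), -P(a)P(b))
    d_max = min(p_a * p_B, p_A * p_b)     # Maximum = Min(P(a)P(B), P(A)P(b))
    abs_max = max(-d_min, d_max)          # |D| runs from 0 to the larger magnitude
    return d_min, d_max, abs_max

# (P(A), P(B)) for the four scenarios
scenarios = {1: (0.5, 0.5), 2: (0.95, 0.95), 3: (0.95, 0.05), 4: (0.5, 0.95)}
results = {s: d_range(pA, pB) for s, (pA, pB) in scenarios.items()}
for s, (lo, hi, abs_hi) in results.items():
    print(f"Scenario {s}: D in [{lo:.4f}, {hi:.4f}], |D| in [0, {abs_hi:.4f}]")
```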

b)

D reaches its theoretical maximum value when D = P(a)*P(B) or D = P(A)*P(b), whichever is smaller.

We can also use the definition of the linkage disequilibrium coefficient,

D=P(AB)-[P(A)*P(B)]

In the first case, at the theoretical maximum value,

D = [P(a)*P(B)] = P(AB) - [P(A)*P(B)]

P(AB) = [P(a)*P(B)] + [P(A)*P(B)] = P(B)*[P(a) + P(A)] = P(B)

Because P(AB) + P(aB) = P(B), P(AB) = P(B) means that P(aB) = 0. Thus, one of the four possible haplotypes (aB) is not present in this population.

In the second case, at the theoretical maximum value,

D = [P(A)*P(b)] = P(AB) - [P(A)*P(B)]

P(AB) = [P(A)*P(b)] + [P(A)*P(B)] = P(A)*[P(b) + P(B)] = P(A)

Because P(AB) + P(Ab) = P(A), P(AB) = P(A) means that P(Ab) = 0. Thus, one of the four possible haplotypes (Ab) is not present in the population.

This makes sense: when D reaches its maximum value, the two loci can be thought of as being in "complete linkage disequilibrium." When complete linkage disequilibrium occurs, at least one haplotype is absent from the population, so at most 3 of the 4 possible haplotypes occur. We can imagine that the 2 loci cannot be separated by recombination, and thus one haplotype is missing.
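As a numerical check of the second case, the Python sketch below rebuilds the four haplotype frequencies from P(A), P(B), and D using P(AB) = P(A)P(B) + D, P(Ab) = P(A)P(b) - D, P(aB) = P(a)P(B) - D, and P(ab) = P(a)P(b) + D, which follow from the definition of D and the fact that haplotype frequencies add up to the allele frequencies. The helper name `haplotype_freqs` is hypothetical. For scenario 4 from part (a), setting D to its maximum, P(A)*P(b) = 0.025, drives P(Ab) to 0.

```python
def haplotype_freqs(p_A, p_B, D):
    """Haplotype frequencies implied by allele frequencies and D = P(AB) - P(A)P(B)."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return {
        "AB": p_A * p_B + D,
        "Ab": p_A * p_b - D,
        "aB": p_a * p_B - D,
        "ab": p_a * p_b + D,
    }

# Scenario 4 from part (a): P(A) = 0.5, P(B) = 0.95
p_A, p_B = 0.5, 0.95
d_max = min((1 - p_A) * p_B, p_A * (1 - p_B))  # smaller of P(a)P(B) and P(A)P(b)
freqs = haplotype_freqs(p_A, p_B, d_max)
for hap, f in freqs.items():
    print(hap, round(f, 4))
```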

Question 2

a)

For SNP1, I calculated the allele frequencies of A1 and A2:

FA1 = (nA1A1 + ½ nA1A2) /N = (115+ 0.5*119)/260 = 0.67

FA2 = 1-FA1 = 1-0.67 = 0.33
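The allele-frequency formula can be sketched in Python as below (the function name `allele_freq` is hypothetical). Keeping the unrounded value, 174.5/260 ≈ 0.6712, matters: the expected genotype counts in the Excel table are consistent with the unrounded frequency rather than with 0.67.

```python
def allele_freq(n_11, n_12, n_22):
    """Frequency of allele 1: (n_11 + 0.5 * n_12) / N, with N = number of individuals."""
    n = n_11 + n_12 + n_22
    return (n_11 + 0.5 * n_12) / n

f_A1 = allele_freq(115, 119, 26)  # SNP1 genotype counts, N = 260
f_A2 = 1 - f_A1
print(round(f_A1, 4), round(f_A2, 4))
```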

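To determine whether SNP1 is in Hardy-Weinberg equilibrium, I used a chi-square test. To calculate the chi-square statistic, I used the formula:

$$X^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where O_i and E_i are the observed and expected counts for genotype i. Under Hardy-Weinberg equilibrium the expected counts are N*FA1^2 for A1A1, 2N*FA1*FA2 for A1A2, and N*FA2^2 for A2A2, computed with the unrounded FA1 = 174.5/260.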

I used Excel to calculate the separate terms for each genotype:

Genotype / A1A1 / A1A2 / A2A2
Observed count / 115 / 119 / 26
Expected count / 117.1163 / 114.7673 / 28.11635
Chi-square term / 0.038243 / 0.156104 / 0.1593

I summed these separate terms: X^2 = 0.038243 + 0.156104 + 0.1593 = 0.353647
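The Excel steps for SNP1 (expected counts under Hardy-Weinberg equilibrium, the per-genotype terms, and their sum) can be sketched in Python as below. The function name `hwe_chi_square` is hypothetical; it estimates the allele frequency from the same genotype counts, as in part (a).

```python
def hwe_chi_square(n_11, n_12, n_22):
    """Pearson chi-square statistic for Hardy-Weinberg equilibrium.

    Returns (expected counts, per-genotype terms, X^2).
    """
    n = n_11 + n_12 + n_22
    p = (n_11 + 0.5 * n_12) / n  # estimated frequency of allele 1 (unrounded)
    q = 1 - p
    observed = [n_11, n_12, n_22]
    expected = [n * p**2, n * 2 * p * q, n * q**2]
    terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return expected, terms, sum(terms)

expected, terms, x2 = hwe_chi_square(115, 119, 26)  # SNP1
print([round(e, 4) for e in expected])
print([round(t, 6) for t in terms], round(x2, 6))
```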

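With 1 degree of freedom (3 genotype classes, minus 1, minus 1 for the estimated allele frequency), I calculated: P(X^2 ≥ 0.353647) = 1 - P(X^2 ≤ 0.353647) = 1 - 0.4479443 = 0.5520557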
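For a chi-square distribution with 1 degree of freedom, the upper-tail probability has a closed form through the standard normal: P(X^2 ≥ x) = P(|Z| ≥ √x) = erfc(√(x/2)). The Python sketch below uses only the standard library; the function name `chi2_df1_pvalue` is hypothetical, and `scipy.stats.chi2.sf(x, df)` would give the same value for any number of degrees of freedom.

```python
import math

def chi2_df1_pvalue(x2):
    """Upper-tail p-value P(X^2 >= x2) for a chi-square distribution with 1 df.

    Uses the identity P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x2 / 2))

p_snp1 = chi2_df1_pvalue(0.353647)  # X^2 for SNP1
print(round(p_snp1, 4))
```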

I used the criterion p < 0.05 to reject the null hypothesis. Because p = 0.55 > 0.05, I fail to reject the null hypothesis that SNP1 is in Hardy-Weinberg equilibrium.

To determine whether SNP2 is in Hardy-Weinberg equilibrium, I also used a chi-square test. For SNP2, I calculated the allele frequencies of B1 and B2:

FB1 = (nB1B1 + ½ nB1B2) /N = (47+ 0.5*125)/260 = 0.42

FB2 = 1-FB1 = 1-0.42 = 0.58

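To calculate the chi-square statistic, I used the formula:

$$X^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

with expected counts N*FB1^2 for B1B1, 2N*FB1*FB2 for B1B2, and N*FB2^2 for B2B2, computed with the unrounded FB1 = 109.5/260.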

I used Excel to calculate the separate terms for each genotype:

Genotype / B1B1 / B1B2 / B2B2
Observed count / 47 / 125 / 88
Expected count / 46.11635 / 126.7673 / 87.11635
Chi-square term / 0.016932 / 0.024639 / 0.008963

I summed these separate terms: X^2 = 0.016932 + 0.024639 + 0.008963 = 0.050534

With 1 degree of freedom, I calculated: P(X^2 ≥ 0.050534) = 1 - P(X^2 ≤ 0.050534) = 1 - 0.1778632 = 0.8221368

I used the criterion p < 0.05 to reject the null hypothesis. Because p = 0.82 > 0.05, I fail to reject the null hypothesis that SNP2 is in Hardy-Weinberg equilibrium.
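The whole SNP2 test (allele frequency, expected counts, X^2, and the 1-df p-value) can be sketched end to end in Python. The function name `hwe_test` is hypothetical; the p-value uses the identity P(X^2 ≥ x) = erfc(√(x/2)), which holds only for 1 degree of freedom.

```python
import math

def hwe_test(n_11, n_12, n_22):
    """Pearson chi-square test of Hardy-Weinberg equilibrium; returns (X^2, p-value)."""
    n = n_11 + n_12 + n_22
    p = (n_11 + 0.5 * n_12) / n  # estimated frequency of allele 1
    q = 1 - p
    observed = [n_11, n_12, n_22]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value

x2_snp2, p_snp2 = hwe_test(47, 125, 88)  # SNP2 genotype counts
print(round(x2_snp2, 6), round(p_snp2, 4))
```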

b)

My null hypothesis is that the two loci, SNP1 and SNP2, are in linkage equilibrium, that is, alleles at SNP1 and SNP2 occur together on haplotypes independently (P(AiBj) = P(Ai)*P(Bj)). My alternative hypothesis is that the two loci are not in linkage equilibrium (they are in linkage disequilibrium).

The observed haplotype counts were calculated as follows:

A1B1 = 36 + 36 + 62 + 5 + 28 = 167

A1B2 = 62 + 17 + 17 + 24 + 62 = 182

A2B1 = 5 + 24 + 6 + 6 + 11 = 52

A2B2 = 28 + 62 + 11 + 9 + 9 = 119

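The expected counts were calculated from the allele frequencies in part (a) and the total of 520 haplotypes (2 haplotypes for each of the 260 offspring) observed in the sample. The formulas below show the rounded allele frequencies; the expected counts in the table use the unrounded values P(A1) = 174.5/260 and P(B1) = 109.5/260.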

E(A1B1) =Nhaplotypes*P(A1)*P(B1)= 520*0.67*0.42

E(A1B2)=Nhaplotypes*P(A1)*P(B2)=520*0.67*0.58

E(A2B1)=Nhaplotypes*P(A2)*P(B1)=520*0.33*0.42

E(A2B2)=Nhaplotypes*P(A2)*P(B2)=520*0.33*0.58
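The expected haplotype counts can be sketched in Python as below, keeping the allele frequencies unrounded; with these values the results match the table (with the rounded 0.67 and 0.42, E(A1B1) would instead be 520 × 0.67 × 0.42 ≈ 146.33). The variable names are illustrative only.

```python
# Allele frequencies from part (a), kept unrounded
p_A1 = (115 + 0.5 * 119) / 260
p_B1 = (47 + 0.5 * 125) / 260
p_A2, p_B2 = 1 - p_A1, 1 - p_B1
n_hap = 2 * 260  # two haplotypes per offspring

# Under linkage equilibrium, E(AiBj) = N_haplotypes * P(Ai) * P(Bj)
expected = {
    "A1B1": n_hap * p_A1 * p_B1,
    "A1B2": n_hap * p_A1 * p_B2,
    "A2B1": n_hap * p_A2 * p_B1,
    "A2B2": n_hap * p_A2 * p_B2,
}
for hap, e in expected.items():
    print(hap, round(e, 4))
```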

Here are the observed and expected haplotype counts:

Haplotype / Observed / Expected / Chi-square term
A1B1 / 167 / 146.9827 / 2.72612102
A1B2 / 182 / 202.0173 / 1.98345682
A2B1 / 52 / 72.01731 / 5.56383764
A2B2 / 119 / 98.98269 / 4.04810778

I summed these separate terms: X^2 = 2.73 + 1.98 + 5.56 + 4.05 = 14.32

With 1 degree of freedom ((2 - 1) × (2 - 1) for the 2 × 2 table of haplotypes), I calculated: P(X^2 ≥ 14.32152327) = 1 - P(X^2 ≤ 14.32152327) = 1 - 0.9998459 = 0.0001540929

I used the criterion p < 0.05 to reject the null hypothesis. Because p = 0.00015 < 0.05, I reject the null hypothesis that the 2 loci are in linkage equilibrium and conclude that the 2 loci are in linkage disequilibrium.
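The linkage-equilibrium test amounts to a chi-square test of independence on the 2 × 2 table of haplotype counts. The Python sketch below takes the allele frequencies from the haplotype margins (349/520 for A1 and 219/520 for B1, which equal 174.5/260 and 109.5/260 from part (a)) and uses the 1-df identity P(X^2 ≥ x) = erfc(√(x/2)) for the p-value. Variable names are illustrative.

```python
import math

observed = {"A1B1": 167, "A1B2": 182, "A2B1": 52, "A2B2": 119}
n = sum(observed.values())  # 520 haplotypes

# Marginal allele frequencies from the haplotype counts
p_A1 = (observed["A1B1"] + observed["A1B2"]) / n
p_B1 = (observed["A1B1"] + observed["A2B1"]) / n
probs = {
    "A1B1": p_A1 * p_B1,
    "A1B2": p_A1 * (1 - p_B1),
    "A2B1": (1 - p_A1) * p_B1,
    "A2B2": (1 - p_A1) * (1 - p_B1),
}
x2 = sum((observed[h] - n * probs[h]) ** 2 / (n * probs[h]) for h in observed)
# 2 x 2 table: (2 - 1) * (2 - 1) = 1 degree of freedom
p_value = math.erfc(math.sqrt(x2 / 2))
print(round(x2, 4), p_value)
```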

c)

For the observed data, here are the linkage disequilibrium parameter estimates:

i.

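$$D = P(A_1B_1) - P(A_1)P(B_1) = \frac{167}{520} - \frac{349}{520}\cdot\frac{219}{520} = 0.038495$$

where 349/520 and 219/520 are the unrounded values of P(A1) ≈ 0.67 and P(B1) ≈ 0.42 from part (a).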
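The D estimate can be checked with a few lines of Python. The sketch also shows why the unrounded allele frequencies matter: plugging in the rounded 0.67 and 0.42 gives about 0.0398 instead of 0.038495. Variable names are illustrative.

```python
n_hap = 520
p_A1B1 = 167 / n_hap
p_A1 = (167 + 182) / n_hap  # same as (115 + 0.5*119)/260
p_B1 = (167 + 52) / n_hap   # same as (47 + 0.5*125)/260

D = p_A1B1 - p_A1 * p_B1           # unrounded allele frequencies
D_rounded = p_A1B1 - 0.67 * 0.42   # rounded allele frequencies, for comparison
print(round(D, 6), round(D_rounded, 6))
```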

ii.

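$$D' = \frac{D}{D_{max}} \text{ if } D > 0, \quad D_{max} = \min[P(A_1)P(B_2),\ P(A_2)P(B_1)]$$

$$D' = \frac{0.038495}{\min(0.388495,\ 0.138495)} = \frac{0.038495}{0.138495} = 0.277951$$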

iii.

$$r^2 = \frac{D^2}{P(A_1)P(A_2)P(B_1)P(B_2)} = \frac{0.038495^2}{0.220706 \times 0.243783} = 0.027541$$
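D' and r^2 can be computed together in the Python sketch below, assuming D > 0 as in this data so that D_max = min[P(A1)P(B2), P(A2)P(B1)] (the same bound as in Question 1). As a cross-check, for a 2 × 2 table the Pearson chi-square statistic without continuity correction equals N times r^2, so 520 × r^2 should reproduce X^2 ≈ 14.32 from part (b). Variable names are illustrative.

```python
n_hap = 520
counts = {"A1B1": 167, "A1B2": 182, "A2B1": 52, "A2B2": 119}
p_A1 = (counts["A1B1"] + counts["A1B2"]) / n_hap
p_B1 = (counts["A1B1"] + counts["A2B1"]) / n_hap
p_A2, p_B2 = 1 - p_A1, 1 - p_B1
D = counts["A1B1"] / n_hap - p_A1 * p_B1

# D > 0 here, so D' = D / D_max with D_max = min[P(A1)P(B2), P(A2)P(B1)]
assert D > 0
d_max = min(p_A1 * p_B2, p_A2 * p_B1)
D_prime = D / d_max

# r^2 = D^2 / [P(A1)P(A2)P(B1)P(B2)]
r2 = D**2 / (p_A1 * p_A2 * p_B1 * p_B2)
print(round(D_prime, 6), round(r2, 6), round(n_hap * r2, 4))
```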