Calculation of numbers of synonymous and non-synonymous substitutions per site using the method of Nei & Gojobori (1986).

Show that syn and non-syn sites evolve at different rates.

Need to calculate:

S = no. syn sites

N = no. non-syn sites

Sd = no. syn differences

Nd = no. non-syn differences

Now define:

DS = Sd/S (fraction of syn sites that differ)

DN = Nd/N (fraction of non-syn sites that differ)

These are equivalent to D in the Jukes-Cantor model, i.e. the observed fraction of sites that differ between the two sequences.

We can use the JC distance formula to calculate two evolutionary distances.

dS = -(3/4) ln(1 - 4DS/3)   (no. of syn subs per syn site)

dN = -(3/4) ln(1 - 4DN/3)   (no. of non-syn subs per non-syn site)

These are equivalent to the usual Jukes-Cantor d, which is the number of substitutions per site if all sites are equivalent.
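The correction above can be sketched in a few lines of Python. This is only an illustration: the function name jc_distance and the range check are assumptions made here, not part of the original method description.

```python
import math

def jc_distance(p):
    """Jukes-Cantor distance from the observed fraction p of differing sites.

    The same formula gives dS from DS and dN from DN.
    """
    # The logarithm is undefined once p reaches 3/4 (saturation).
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1 - 4 * p / 3)

# The correction is small for small p and grows quickly as p nears 0.75.
for p in (0.05, 0.2, 0.6):
    print(p, round(jc_distance(p), 3))
```

The corrected distance is always at least as large as the observed fraction, because it accounts for multiple substitutions at the same site.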

For most pairs of homologous protein-coding sequences, we expect dS > dN because purifying selection slows down the rate of non-syn subs.

If we know the time t since two species diverged, we can calculate the rates of syn and non-syn subs:

dS/2t and dN/2t.

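The divisor is 2t because both lineages have been accumulating substitutions for time t since their common ancestor. If t is measured in millions of years, these rates are numbers of subs per site per million years.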

If we don’t know t, we can still compare the two distances. The ratio dN/dS tells us how much slower the non-syn subs are.
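As a rough sketch of the arithmetic, the rates and the ratio could be computed as below. The function name substitution_rates and the input values are hypothetical, chosen only to show the calculation; they are not data from any real pair of species.

```python
def substitution_rates(d_s, d_n, t):
    """Return (syn rate, non-syn rate, dN/dS) for divergence time t.

    Distances are divided by 2t because the two lineages have each
    evolved independently for time t.
    """
    if t <= 0 or d_s <= 0:
        raise ValueError("t and d_s must be positive")
    return d_s / (2 * t), d_n / (2 * t), d_n / d_s

# Hypothetical inputs: dS = 0.5, dN = 0.1, divergence 50 million years ago.
rate_s, rate_n, ratio = substitution_rates(d_s=0.5, d_n=0.1, t=50.0)
print(rate_s, rate_n, ratio)
```

Note that the ratio does not depend on t, which is why dN/dS can be compared even when the divergence time is unknown.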

Notation:

d is sometimes called K

dS is sometimes called KS

dN is sometimes called KA (where the A means amino acid subs)

dN/dS is the same thing as KA/KS

Worked example: two aligned sequences of five codons.

Codon    1    2    3    4    5
         Pro  Phe  Gly  Leu  Phe
Seq 1    CCC  UUU  GGG  UUA  UUU
Seq 2    CCC  UUC  GCG  CUA  GUA
         Pro  Phe  Ala  Leu  Val

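Calculate S for each codon.

Check the genetic code. For each site, S is the fraction of the three possible base changes that are synonymous, and N = 1 - S:

A fourfold degenerate site counts as S = 1 (N = 0)

A non-degenerate site counts as S = 0 (N = 1)

A twofold degenerate site counts as S = 1/3 (N = 2/3)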
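The counting rule can be sketched in Python by trying every single-base change in a codon and checking whether the amino acid stays the same. The names BASES, AMINO, CODE and syn_sites are choices made for this sketch. It assumes the standard genetic code and treats a change to a stop codon as non-synonymous, which is what the UUA calculation in this example implies.

```python
from fractions import Fraction

# Standard genetic code, codons ordered with bases U, C, A, G ('*' = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def syn_sites(codon):
    """Number of synonymous sites S in one codon (N = 3 - S)."""
    total = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a change
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Each synonymous change contributes 1/3 of a site.
            if CODE[mutant] == CODE[codon]:
                total += Fraction(1, 3)
    return total

for codon in ("CCC", "UUU", "UUA", "CUA", "GUA"):
    print(codon, CODE[codon], syn_sites(codon))
```

Exact fractions are used so that values such as 1/3 and 4/3 are not blurred by floating-point rounding.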

1. S = 0 + 0 + 1 = 1

2. S = 0 + 0 + 1/3 = 1/3

3. S = 0 + 0 + 1 = 1 (whether we look at Gly or Ala codons)

4. for UUA, S = 1/3 + 0 + 1/3 = 2/3

for CUA, S = 1/3 + 0 + 1 = 4/3

Take the average of these: S = 1 for codon 4.

5. for UUU, S = 1/3

for GUA, S = 1

Take average: S = 2/3

For the whole sequence, S = 1 + 1/3 + 1 + 1 + 2/3 = 4

N = total number of sites - S = 15 - 4 = 11
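The whole-sequence totals can be checked with a short sketch that averages S over the two sequences codon by codon. The helper names (syn_sites, split_codons, count_sites) are assumptions of this sketch. Codon 3 of Seq 2 is entered as GCG, the Ala codon that matches the amino acid label and the value S = 1 used above.

```python
from fractions import Fraction

# Standard genetic code, codons ordered with bases U, C, A, G ('*' = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def syn_sites(codon):
    """S for one codon: synonymous single-base changes, each worth 1/3."""
    same = sum(CODE[codon[:p] + b + codon[p + 1:]] == CODE[codon]
               for p in range(3) for b in BASES if b != codon[p])
    return Fraction(same, 3)

def split_codons(seq):
    """Cut a sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def count_sites(seq1, seq2):
    """Return (S, N): S averaged over both sequences, N = total sites - S."""
    pairs = zip(split_codons(seq1), split_codons(seq2))
    S = sum((syn_sites(a) + syn_sites(b)) / 2 for a, b in pairs)
    N = len(seq1) - S
    return S, N

S, N = count_sites("CCCUUUGGGUUAUUU", "CCCUUCGCGCUAGUA")
print(S, N)  # 4 11
```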

Calculate Sd and Nd for each codon.

1. Sd = 0, Nd = 0 (codons identical)

2. Sd = 1, Nd = 0 (Phe --> Phe)

3. Sd = 0, Nd = 1 (Gly --> Ala)

4. Sd = 1, Nd = 0 (Leu --> Leu)

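5. Two positions differ, so this could happen two ways:

Route 1: UUU --> GUU --> GUA (Phe --> Val --> Val)

First step Nd = 1, second step Sd = 1, giving Sd = 1, Nd = 1

Route 2: UUU --> UUA --> GUA (Phe --> Leu --> Val)

First step Nd = 1, second step Nd = 1, giving Sd = 0, Nd = 2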

Take average of these two:

Sd = 0.5, Nd = 1.5

(note that if all three positions were different there would be 6 routes to average)
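The averaging over routes can be sketched by enumerating every order in which the differing positions could have changed. The function name count_differences is an assumption of this sketch, and it makes one simplification that should be stated plainly: routes that pass through a stop codon are not excluded, which does not matter for the codons in this example. Codon 3 of Seq 2 is entered as GCG (Ala).

```python
from fractions import Fraction
from itertools import permutations

# Standard genetic code, codons ordered with bases U, C, A, G ('*' = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def count_differences(c1, c2):
    """Return (Sd, Nd) between two codons, averaged over all routes."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return Fraction(0), Fraction(0)
    routes = list(permutations(diff))  # 1, 2 or 6 orders of change
    sd = nd = Fraction(0)
    for order in routes:
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == CODE[current]:
                sd += 1  # amino acid unchanged: synonymous step
            else:
                nd += 1  # amino acid changed: non-synonymous step
            current = nxt
    return sd / len(routes), nd / len(routes)

pairs = [("CCC", "CCC"), ("UUU", "UUC"), ("GGG", "GCG"),
         ("UUA", "CUA"), ("UUU", "GUA")]
total_sd = sum(count_differences(a, b)[0] for a, b in pairs)
total_nd = sum(count_differences(a, b)[1] for a, b in pairs)
print(total_sd, total_nd)  # 5/2 5/2
```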

Total Sd = 2.5, Total Nd = 2.5

DS = 2.5/4 = 0.625, DN = 2.5/11 = 0.227

dS = 1.34, dN = 0.271

The non-syn rate is much slower than the syn rate in this example: dN/dS = 0.271/1.34, which is about 0.20.
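The final step of the example can be reproduced with a short sketch that takes the four totals and returns the proportions, the corrected distances and their ratio. The function name ng_distances and the dictionary layout are assumptions made for illustration.

```python
import math

def ng_distances(S, N, Sd, Nd):
    """From site and difference totals, return DS, DN, dS, dN and dN/dS."""
    DS, DN = Sd / S, Nd / N
    # Jukes-Cantor correction applied separately to each class of site.
    dS = -0.75 * math.log(1 - 4 * DS / 3)
    dN = -0.75 * math.log(1 - 4 * DN / 3)
    return {"DS": DS, "DN": DN, "dS": dS, "dN": dN, "dN/dS": dN / dS}

# Totals from the worked example: S = 4, N = 11, Sd = Nd = 2.5.
result = ng_distances(S=4, N=11, Sd=2.5, Nd=2.5)
for name, value in result.items():
    print(name, round(value, 3))
```

With only five codons the estimates are very noisy, and DS = 0.625 is close to the saturation limit of 0.75, so the example is best read as a demonstration of the arithmetic.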