Simulating Natural Selection

Genetics – Biology 331 – Introduction to Population Genetics

David Marcey, 2017

A. Background

Read the following 7 pages of background material carefully and thoroughly. When prompted, answer the questions or fill in the blanks. Then proceed to the experimental simulations of evolution on pages 8-10. To get started, please review the meaning of some basic terms used in evolutionary biology and genetics. These are:

1) Natural Selection: The force that acts to change the characteristics of a population over time through the differential reproductive fitness of individuals. Differential reproductive fitness refers to different degrees of success at producing viable offspring by individuals in a population.

2) Selection (selective) pressure: The force of natural selection. The magnitude of this force can vary greatly, depending on the degree of variability of a particular characteristic and/or the degree of instability of the environment in which a population exists. Of course, the environment includes physical factors as well as biotic ones, since populations are part of complex ecosystems that include the other organisms that members of a particular species interact with.

3) Gene: a segment of DNA that is inherited as a unit and that may or may not encode a protein or RNA product. Many DNA segments in eukaryotes have no function and are therefore insensitive to natural selection pressure. Many genes or regulatory DNA sequences do encode functional information and are therefore subject to selective pressure.

4) Allele: one version of a gene. A gene can have many different alleles, but each individual of a diploid species carries only two alleles at any one time (one inherited paternally through a spermatozoan, one inherited maternally through an egg).

5) Genotype: The particular combination of alleles carried by an individual. For example, in a one-gene, two-allele model, the possible genotypes are A1/A1, A1/A2, or A2/A2. There are two homozygous genotypes (A1/A1 and A2/A2) and one heterozygous genotype (A1/A2).

6) Phenotype: The appearance of an organism, as determined by genotype. Although environmental factors contribute to the production of any one phenotype, these factors are usually not considered in simple simulations. When phenotypes change during evolution, this is largely due to changes in genotypes.

7) Dominant allele: an allele is said to be dominant when a heterozygote has the same phenotype as a homozygous genotype for that same allele. For example, if the phenotypes of A1/A1 and A1/A2 are the same, allele A1 is said to be dominant.

8) Recessive allele: an allele is said to be recessive when a heterozygote has the same phenotype as a homozygous genotype for a different allele. For example, if the phenotypes of A1/A1 and A1/A2 are the same, allele A2 is said to be recessive (A1 is dominant). Likewise, if the phenotypes of A2/A2 and A1/A2 are the same, allele A1 is said to be recessive (A2 is dominant).

9) Incomplete (Intermediate) Dominance: Incomplete dominance occurs when a heterozygote has a different phenotype than either homozygote. For example, if A1/A2 has a phenotype that is intermediate between that of A1/A1 and A2/A2, then both alleles are considered incompletely dominant.

10) Relative Fitness: Relative fitness refers to the phenotype of differential reproductive fitness of individuals. These differences can arise due to many factors, including differential survival. In genetics terms, the relative fitness of different genotypes is indicated by the variable w, which can vary from 0.0 to 1.0. Relative fitness (w) is positively correlated with differential reproductive success. For example, if w11, w12, and w22 are the relative fitnesses of the corresponding genotypes (A1/A1, A1/A2, A2/A2), and w11 = 1.0, w12 = 1.0, and w22 = 0.8, then it can be inferred that allele A1 is dominant, and that it confers a selective reproductive advantage to individuals that inherit it.

11) Evolution: a change in the gene pool (allelic frequencies) over time.

To test your understanding of a few of the above terms, fill in the blanks of the following sentences in a way that makes sense.

1) The phenotype of an organism can largely be determined by its ______.

2) The ______ of genotype A1/A1 was found to be less than that of the A1/A2 heterozygotes because A1/A2 individuals leave more successful offspring than A1/A1 individuals.

3) The ______, A2, is ______ because the phenotype of A1/A2 is the same as that of A2/A2.

4) The genotype A2/A2 has greater ______ than that of A1/A2 because A2/A2 individuals leave more successful offspring than A1/A2 individuals.

Now, on to the information necessary to understand our simulation experiments.

Our simulations will be done with a simple population level model using one gene (A) with two alleles, A1 and A2. Important features of this model include:

The variable p is the frequency of allele A1 in the population at any time. The frequency of allele A2 is q.
The allele frequencies p and q vary between 0.0 and 1.0 and always add up to 1, which represents the total of all alleles in the population in any given generation.

Example: if q = 0.7, then p = (1-q) = 0.3.

If, at any given time, p and q are known, then the frequencies (f) of the three possible genotypes at that time can be calculated using simple Mendelian genetics. Again, the frequencies always add to 1. If there are p reproductive cells bearing A1 and q reproductive cells bearing A2, then a Punnett square gives us the genotype frequencies.

                         | frequency of A1 sperm (p)         | frequency of A2 sperm (q)
frequency of A1 eggs (p) | frequency of A1/A1 genotype (p^2) | frequency of A1/A2 genotype (pq)
frequency of A2 eggs (q) | frequency of A1/A2 genotype (pq)  | frequency of A2/A2 genotype (q^2)

In the example just used:

(f)A1/A1 = p^2 = 0.3^2 = 0.09;
(f)A1/A2 = 2pq = 2 x (0.3 x 0.7) = 0.42;
(f)A2/A2 = q^2 = 0.7^2 = 0.49.

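Using the Punnett square, it is easy to see that the pq product must be multiplied by 2 in order to calculate the frequency of the A1/A2 genotype, because the heterozygote appears in two cells of the square (an A1 egg with an A2 sperm, and an A2 egg with an A1 sperm).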
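If you would like to check the Punnett-square arithmetic by computer, the following short Python sketch does the same calculation. It is only an illustration: the function name genotype_frequencies is our own choice and is not part of any program used in this lab.

```python
def genotype_frequencies(p):
    """Genotype frequencies from random union of gametes, given p = freq(A1)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0.0 and 1.0")
    q = 1.0 - p  # p and q always add up to 1
    return {
        "A1/A1": p * p,      # A1 egg meets A1 sperm
        "A1/A2": 2 * p * q,  # two Punnett-square cells give a heterozygote
        "A2/A2": q * q,      # A2 egg meets A2 sperm
    }


freqs = genotype_frequencies(0.3)
for genotype, f in freqs.items():
    print(f"(f){genotype} = {f:.2f}")
print(f"total = {sum(freqs.values()):.2f}")
```

For p = 0.3 this prints the same three frequencies worked out by hand in the example (0.09, 0.42, and 0.49), and a total of 1.00.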

When all the individuals in a population are homozygous for a particular allele, that allele is said to be fixed in the population. For example, p = 1 (q = 0) if A1 is fixed, whereas q = 1 (p = 0) if A2 is fixed.

Let’s now apply the above information to calculate changes in allele frequencies under different selection conditions. First, we will make certain assumptions for conditions that rarely obtain in nature.

The Hardy-Weinberg (H-W) Equilibrium

In order to model the simplest case of allele frequency change, or lack thereof, in a one-gene, two-allele simulation we will make a series of unrealistic assumptions about an experimental population. These assumptions are:

1) all genotypes have equivalent relative fitness: w11 = w12 = w22 = 1;

2) there is a very, very large population size, practically infinite;

3) there is no migration into or out of the population under study;

4) there is no mutation, i.e. no change in allele frequency due to spontaneous mutation of A1 to A2 or vice versa;

5) there is random mating, i.e. no mating preference of one genotype for another.

Given these conditions, there will be no change in allele or genotype frequencies. There is no selection, no introduction of genes from outside the population, and no alleles produced by mutation. A population meeting these criteria for a gene will be in Hardy-Weinberg equilibrium for that gene, and allele and genotype frequencies will not change over time once the equilibrium is established.

To understand this, let’s consider a simple example.

Say that the starting genotype frequencies (in generation 1 of our example) are as follows (of course they add up to 1):

(f)A1/A1 = 0.2
(f)A1/A2 = 0.4
(f)A2/A2 = 0.4

Calculate the frequencies (fractions) of reproductive cells (gametes) that carry the A1 and A2 alleles produced by generation 1 genotypes. A1/A1 individuals will only produce A1 gametes, A1/A2 individuals will produce ½ A1 gametes and ½ A2 gametes, and A2/A2 individuals will only produce A2 gametes. The allele frequencies of these gametes are therefore:

p = (f)A1/A1 + ½ [(f)A1/A2] = ? ______
q = (f)A2/A2 + ½ [(f)A1/A2] = ? ______

Make the above calculation and fill in the blanks before proceeding.

Naturally, p + q = 1 (total alleles in the population). Now let’s see what random mixing of these gametes will produce in terms of genotype frequencies in generation 2. Using our Punnett square tool, we have:

                               | frequency of A1 sperm (p) = 0.4          | frequency of A2 sperm (q) = 0.6
frequency of A1 eggs (p) = 0.4 | frequency of A1/A1 genotype (p^2) = _?__ | frequency of A1/A2 genotype (pq) = _?__
frequency of A2 eggs (q) = 0.6 | frequency of A1/A2 genotype (pq) = _?__  | frequency of A2/A2 genotype (q^2) = _?__

Thus, the generation 2 genotype frequencies will be:

(f)A1/A1 = _?__
(f)A1/A2 = _?__
(f)A2/A2 = _?__

We find that generation 2 genotype frequencies are different than in generation 1. Evolution has occurred! We need to examine what happens next by repeating our calculations of allele frequencies and next generation genotypes.

First, generation 2 allele (gamete) frequencies are:

p = (f)A1/A1 + ½ [(f)A1/A2] = 0.16 + 0.24 = 0.4
q = (f)A2/A2 + ½ [(f)A1/A2] = 0.36 + 0.24 = 0.6

As before, random mixing of these gametes will produce the following genotype frequencies for generation 3:

                               | frequency of A1 sperm (p) = 0.4          | frequency of A2 sperm (q) = 0.6
frequency of A1 eggs (p) = 0.4 | frequency of A1/A1 genotype (p^2) = 0.16 | frequency of A1/A2 genotype (pq) = 0.24
frequency of A2 eggs (q) = 0.6 | frequency of A1/A2 genotype (pq) = 0.24  | frequency of A2/A2 genotype (q^2) = 0.36

Thus, the generation 3 genotype frequencies will be:

(f)A1/A1 = 0.16
(f)A1/A2 = 0.48
(f)A2/A2 = 0.36

You can immediately see that the generation 3 genotype frequencies are the same as those in generation 2, and that both allele and genotype frequencies will be static from generation 2 forward. For this gene (A), evolution has ceased: the gene is said to be in Hardy-Weinberg (H-W) equilibrium. This makes intuitive as well as mathematical sense. If there is no selection, i.e. all genotypes have equivalent relative fitnesses, and there is no net change in allele frequency due to migration or mutation, and there is a huge population size with random mixing of gametes, then we would not anticipate evolutionary change in allele frequencies for the gene under consideration.
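The generation-by-generation bookkeeping in this example can also be sketched in Python. The two helper functions below (allele_frequencies and random_mating) are hypothetical names chosen for illustration; they simply repeat the gamete calculation and the Punnett square from the example, under the same H-W assumptions.

```python
def allele_frequencies(f11, f12, f22):
    """Gamete (allele) frequencies p and q produced by the three genotypes."""
    p = f11 + 0.5 * f12  # all gametes of A1/A1 plus half of those of A1/A2
    q = f22 + 0.5 * f12  # all gametes of A2/A2 plus half of those of A1/A2
    return p, q


def random_mating(p, q):
    """Genotype frequencies (A1/A1, A1/A2, A2/A2) from random gamete mixing."""
    return p * p, 2 * p * q, q * q


genotypes = (0.2, 0.4, 0.4)  # generation 1 of the example
history = [genotypes]
for _ in range(3):  # produce generations 2, 3, and 4
    p, q = allele_frequencies(*genotypes)
    genotypes = random_mating(p, q)
    history.append(genotypes)

for generation, (f11, f12, f22) in enumerate(history, start=1):
    print(f"generation {generation}: "
          f"(f)A1/A1 = {f11:.2f}, (f)A1/A2 = {f12:.2f}, (f)A2/A2 = {f22:.2f}")
```

The printed rows for generation 2 onward should be identical, which is the signature of H-W equilibrium. Try other starting genotype frequencies (they must add up to 1) to convince yourself that equilibrium is always reached after one round of random mating.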

The Effect of Natural Selection

We can now move on to consider the effect of natural selection on changes in allele and genotype frequencies over time (evolution). We will not consider the mathematical derivation of formulas that feed into various selection simulations, but it should be possible to understand that if individuals differ in their reproductive success according to genotype, then, unlike H-W equilibria cases, we must weight the allelic contribution of each genotype to the next generation differently, according to the genotypes’ relative fitness (w). This will lead to changes in allele frequencies, and therefore genotype frequencies, and therefore evolution (!) as the following example simulations show.
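For those who want to see what "weighting by relative fitness" looks like in practice, here is a minimal Python sketch of a single generation of selection. It assumes the standard textbook recursion for a one-gene, two-allele model with random mating; this handout does not state which formulas the simulation software uses internally, and the function name select_one_generation is hypothetical.

```python
def select_one_generation(p, w11, w12, w22):
    """Return (next p, mean fitness) after one generation of selection.

    Assumed model: random mating, then each genotype contributes gametes
    in proportion to its frequency multiplied by its relative fitness.
    """
    q = 1.0 - p
    # Genotype frequencies before selection (Punnett square).
    f11, f12, f22 = p * p, 2 * p * q, q * q
    # Mean fitness: genotype frequencies weighted by relative fitness.
    mean_w = f11 * w11 + f12 * w12 + f22 * w22
    # A1 gametes come from A1/A1 parents and half of the A1/A2 parents,
    # each weighted by fitness; dividing by mean_w keeps p + q = 1.
    p_next = (f11 * w11 + 0.5 * f12 * w12) / mean_w
    return p_next, mean_w


# No selection: p should not change (H-W conditions).
print(select_one_generation(0.3, 1.0, 1.0, 1.0))

# A1 dominant and advantageous: p should increase.
p_next, mean_w = select_one_generation(0.1, 1.0, 1.0, 0.8)
print(f"p goes from 0.1000 to {p_next:.4f}; mean fitness = {mean_w:.3f}")
```

Note that when all three fitnesses are equal, the weighting has no effect and the calculation reduces to the H-W case.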

Figure 1, below, illustrates theoretical changes of the frequencies of the two alleles (A1, A2) in different selection scenarios over 100 generations of reproduction.

Figure 1. The three graphs in both (A) and (B) monitor the frequency of an advantageous allele, A1, over 100 generations assuming three different situations: 1) A1 is dominant (A1/A1 and A1/A2 genotypes have the same phenotype); 2) incomplete dominance (A1/A2 has a phenotype intermediate between A1/A1 and A2/A2); 3) A1 is recessive (A2/A2 and A1/A2 have the same phenotype). In this example, the relative fitnesses of the A1/A1, A1/A2, and A2/A2 genotypes in the dominant, intermediate, and recessive cases are, respectively:

A1 Dominant: w11, w12, w22 = 1.0, 1.0, 0.8;

A1 Incompletely Dominant: w11, w12, w22 = 1.0, 0.9, 0.8;

A1 Recessive: w11, w12, w22 = 1.0, 0.8, 0.8.

In (A), the starting frequency of A1 (p0) is 0.01. In (B), it is 0.10. Note that the relative fitnesses in all cases indicate that A1 is selectively advantageous.
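Curves of this kind can be approximated with a short Python sketch that applies one generation of selection repeatedly. The recursion used here is the standard textbook one for a one-gene, two-allele model and is an assumption on our part; the published figure may have been produced differently, so treat the numbers as a rough check rather than an exact reproduction. The names next_p, simulate, and SCENARIOS are hypothetical.

```python
def next_p(p, w11, w12, w22):
    """Frequency of A1 after one generation of selection (assumed recursion)."""
    q = 1.0 - p
    mean_w = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / mean_w


def simulate(p0, fitnesses, generations=100):
    """Return the trajectory [p0, p1, ..., p_generations]."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_p(trajectory[-1], *fitnesses))
    return trajectory


# Relative fitnesses (w11, w12, w22) from the Figure 1 legend.
SCENARIOS = {
    "A1 dominant": (1.0, 1.0, 0.8),
    "A1 incompletely dominant": (1.0, 0.9, 0.8),
    "A1 recessive": (1.0, 0.8, 0.8),
}

results = {}
for p0 in (0.01, 0.10):  # panel (A) and panel (B) starting frequencies
    for name, fitnesses in SCENARIOS.items():
        results[(p0, name)] = simulate(p0, fitnesses)
        final_p = results[(p0, name)][-1]
        print(f"p0 = {p0:.2f}  {name:<26} p after 100 generations = {final_p:.4f}")
```

Each entry of results holds a full 101-point trajectory, so the values can be plotted against generation number with any graphing tool to compare with the shapes of the curves in Figure 1.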

Figure is from Futuyma (2007), Evolution.

Refer to Figure 1(A) and note that if p is very low, e.g. 0.01, and A1 is recessive, it can take a very long time for p to increase, greater than 100 generations, even if the allele confers a selective advantage (Figure 1A). Also note that once established in the population, e.g. p = 0.1, an advantageous, recessive allele can increase to fixation (p = 1) in a relatively short time (Figure 1B).

Refer to Figure 1(B) again and note the interesting, and somewhat counterintuitive, result that only if A1, the advantageous allele, is recessive or incompletely dominant does A1 become fixed in the population (p = 1.0). Please spend some time and develop a speculative hypothesis that could explain why A1 will never become fixed if A1 is dominant.

Write your speculation down here before proceeding with this background section:

Figure 2, below, shows the mean fitness of a population plotted as a function of p (i.e. frequency of allele A1). The mean fitness of the population is the average fitness of individuals in the population relative to the fittest genotype. Relative fitness values (w11, w12, w22) of three different genotypes (A1/A1; A1/A2; A2/A2) are varied in each simulation, leading to different outcomes. The different scenarios represent two different, opposite cases of directional selection (A, B), a case of overdominance (C), and a case of underdominance (D). By studying the graphs, paying particular attention to the relative fitnesses (w) of the three different genotypes in each case (do this!), it is possible to discern the effects of genotypic fitness differences on changes in allele frequency, i.e. evolution, and to understand that there are implications of these changes for the average fitness of the population as a whole. The ^p symbol specifies an equilibrium value of p.

Figure 2. (A) Directional selection (w11 > w12 > w22), which would lead to fixation of A1 (elimination of A2): the population’s mean fitness is maximized at 1.0 when all individuals are A1/A1 homozygotes. (B) Directional selection (w22 > w12 > w11), which would lead to fixation of A2 (elimination of A1): the population’s mean fitness is maximized at 1.0 when all individuals are A2/A2 homozygotes. (C) Overdominance (heterozygote has highest relative fitness), where the mean fitness of the population reaches a maximum at a stable equilibrium value of p that depends upon the relative fitnesses of the three genotypes. Equilibrium allele frequencies are restored if they are perturbed. (D) Underdominance (heterozygote has lowest relative fitness), where the equilibrium value of allele frequency is unstable. Deviations from this value lead to either fixation or loss of A1. Figure is from Futuyma (2007), Evolution.
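The stable and unstable equilibria described in (C) and (D) can be explored with the Python sketch below. Several things here are assumptions rather than content of the figure: the fitness values are hypothetical (the figure's actual values are not listed in this handout), the recursion is the standard textbook one, and the equilibrium formula is obtained by asking at what value of p the two alleles have equal average fitness. All function names are our own.

```python
def mean_fitness(p, w11, w12, w22):
    """Average fitness of the population when the frequency of A1 is p."""
    q = 1.0 - p
    return p * p * w11 + 2 * p * q * w12 + q * q * w22


def next_p(p, w11, w12, w22):
    """Frequency of A1 after one generation of selection (assumed recursion)."""
    q = 1.0 - p
    return (p * p * w11 + p * q * w12) / mean_fitness(p, w11, w12, w22)


def interior_equilibrium(w11, w12, w22):
    """Equilibrium p strictly between 0 and 1, or None if there is none."""
    denominator = 2 * w12 - w11 - w22
    if denominator == 0:
        return None
    p_hat = (w12 - w22) / denominator
    return p_hat if 0.0 < p_hat < 1.0 else None


def run(p0, fitnesses, generations=200):
    """Frequency of A1 after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_p(p, *fitnesses)
    return p


# Hypothetical (w11, w12, w22) values chosen for illustration only.
OVERDOMINANCE = (0.8, 1.0, 0.6)
UNDERDOMINANCE = (1.0, 0.6, 0.8)

for label, w in (("overdominance", OVERDOMINANCE),
                 ("underdominance", UNDERDOMINANCE)):
    p_hat = interior_equilibrium(*w)
    print(f"{label}: equilibrium p = {p_hat:.3f}, "
          f"mean fitness there = {mean_fitness(p_hat, *w):.3f}")
    # Start slightly below and slightly above the equilibrium.
    for p0 in (p_hat - 0.05, p_hat + 0.05):
        print(f"  start p = {p0:.3f} -> p after 200 generations = {run(p0, w):.3f}")
```

With overdominance, both starting points should return to the equilibrium; with underdominance, they should move away from it in opposite directions, toward loss or fixation of A1.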

For each graph of Figure 2: What is the dominance relationship of A1 and A2? Which allele confers a selective advantage? You should now be able to revisit Figure 1 and determine which of the scenarios just described (directional selection, overdominance, underdominance) was operative in the graphs of Figure 1. Please do this now.

B. Simulations Using AlleleA1

We will use the freeware application AlleleA1 to experiment with allele and genotype frequency changes in response to different selection scenarios. After some time running these simulations, you should be able to recognize and construct different types of selection scenarios (directional, overdominance, underdominance). All simulations using this program will be using a one-gene, two-allele model. You should work with one lab partner in running these simulations.

The following section will briefly describe the use of AlleleA1. You can download both the AlleleA1.exe program and a manual describing its use (also provided as an appendix to this handout) using the link provided on the Biology 331 course syllabus.

When you run AlleleA1, you will initially get a window that looks like this:

The window has a graph area with the frequency of the A1 allele (p) on the y axis and the number of generations on the x axis. There are also boxes in which you can change the parameters of the simulation. For the experiments you will run today, you will probably wish to change only the following parameters:

Starting frequency of A1
Fitness (relative), w11, of genotype A1A1 (=A1/A1)
Fitness (relative), w12, of genotype A1A2 (=A1/A2)
Fitness (relative), w22, of genotype A2A2 (=A2/A2)

You can choose to overlay consecutive simulations by choosing the “Multiple” or “Auto” radio buttons in the lower right corner of the window. For the “Auto” option, consecutive plots will be shown in different colors.

If you wish, you can change the number of generations in the simulation by using the arrow at the bottom right corner of the graph area.

For each run (simulation), in addition to the results displayed in the graph area, you will also get exact numbers of the final frequencies of the alleles (A1, A2) and genotypes (A1/A1, A1/A2, A2/A2) displayed to the right of the graph area. It is a good idea to observe these numbers. When you run experiments, you will need to record these numbers as part of your results.

At this point, feel free to play with the parameters mentioned above, running simulations by hitting the “Run” button to get a feel for the AlleleA1 program. For each run of the simulation, you will get a plot of the changes in the frequency of the A1 allele (p) over time (generations). For now, run some simulations without paying much attention to the precise nature of the results. Spend at least 5 minutes “playing.”

Experiments with AlleleA1

Even though these are simulations, treat the results generated by your in silico experiments as you would any laboratory or field data. For each simulation run, record in your lab notebook the parameter settings, and the final frequencies of alleles and genotypes obtained. Keeping accurate track of your quantitative results is very important as you explore the ramifications of changing selective conditions. You should record your observations of the graph data, e.g. how many generations did it take for significant change to occur and what are the final frequencies of A1 and A2.

1. The absence of natural selection.

1) Hit the “Reset” button to return AlleleA1 to starting parameters.

2) For the first simulation, keep all parameters set to the default, reset values. Run a simulation. Repeat, varying only the parameter of the starting frequency of allele A1 (p). What happens? Record your data (numbers and graphs) for multiple experiments with different starting p’s. Look closely at the starting parameters for this set of simulations. Note that since there are no differences in the relative fitnesses of the three genotypes, no migration or mutation, infinite population size, and random mating (no inbreeding), the conditions for Hardy-Weinberg (H-W) equilibria are met for each simulation. What is the most distinguishing feature of your results from H-W equilibria experiments?

2. Selection.

Let’s introduce selection into our simulation experiments by recreating the situations of Figure 1(B), remembering that in this Figure’s simulations, A1 was advantageous in every case and that the starting frequency of A1 was 0.1. Before starting these simulations, reset to default program values.

First, set the fitnesses of the genotypes as for the A1 dominant scenario:

w11, w12, w22 = 1.0, 1.0, 0.8, respectively. Run a simulation. What happened? Did A1 become fixed? Be sure to check the numbers to the right of the graph before making any conclusions. Try it again over a larger number of generations. Did increasing the generations lead to fixation of A1?

Next, set the fitnesses of the genotypes as for the A1 recessive scenario:

w11, w12, w22 = 1.0, 0.8, 0.8, respectively. Run the simulation. What happened? Did A1 become fixed? Be sure to check the numbers to the right of the graph before making any conclusions.

Finally, set the fitnesses of the genotypes as for the incomplete dominance scenario: w11, w12, w22 = 1.0, 0.9, 0.8, respectively. Run the simulation. What happened? Did A1 become fixed? Be sure to check the numbers to the right of the graph before making any conclusions.

What type of selection scenario (directional, overdominance, underdominance) have you just simulated? Revisit Figure 1(A) [starting p = 0.01] and repeat the above three simulations. For the A1 recessive case, does p increase over 100 generations? Expand the number of generations and see what happens. Record your data.
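As an optional cross-check on the results you record, the Python sketch below counts how many generations it takes for p to reach a chosen threshold in each scenario. It is not a substitute for running AlleleA1: it assumes the standard textbook selection recursion, which may differ in detail from what the program does, and the threshold of 0.99 and the function names are hypothetical choices made for illustration.

```python
def next_p(p, w11, w12, w22):
    """Frequency of A1 after one generation of selection (assumed recursion)."""
    q = 1.0 - p
    mean_w = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / mean_w


def generations_until(p0, fitnesses, threshold=0.99, max_generations=10_000):
    """Generations needed for p to reach the threshold, or None if it never does."""
    p = p0
    for generation in range(1, max_generations + 1):
        p = next_p(p, *fitnesses)
        if p >= threshold:
            return generation
    return None


# Relative fitnesses (w11, w12, w22) used in the selection experiments.
SCENARIOS = {
    "A1 dominant": (1.0, 1.0, 0.8),
    "A1 incompletely dominant": (1.0, 0.9, 0.8),
    "A1 recessive": (1.0, 0.8, 0.8),
}

for p0 in (0.10, 0.01):
    for name, fitnesses in SCENARIOS.items():
        n = generations_until(p0, fitnesses)
        print(f"starting p = {p0:.2f}  {name:<26} generations to reach p >= 0.99: {n}")
```

Compare these counts with what you observe when you expand the number of generations in AlleleA1, and note any differences in your lab notebook.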