Nancy Schoeppner

Genetics 303

11/4/11

The Application of Genetic Techniques in Wild Butterfly Populations inspired by

Functional Genomics of Life History Variation in a Butterfly Metapopulation

The Glanville fritillary butterfly, Melitaea cinxia (Nymphalidae), has been the subject of an extensive long-term study of how population dynamics function at the landscape scale. The butterfly is found on the Aland Islands, which lie off the coast of Finland. These islands contain approximately 4000 suitable habitats that can support butterfly populations. The Glanville fritillary lays its eggs on two specific host plants, Plantago lanceolata and Veronica spicata, which the caterpillars eat in order to grow and metamorphose into adult butterflies. These host plants are limited to thin-soil habitats bordering granite outcrops, which in turn limits the distribution of the butterflies (Klepsatel and Flatt 2011). Ilkka Hanski and his colleagues have been surveying the 4000 habitat sites since 1993, recording which sites had a butterfly population and the size of that population. The results of this long-term survey have shown that only a fraction of the 4000 suitable habitats contain butterflies at any one time, creating a mosaic of populations across the landscape (Metapopulation Research website). As sites become overpopulated and competition increases, butterflies migrate from inhabited sites to found new populations in uninhabited sites. This process of stochastic extinction and recolonization at the landscape scale is referred to as metapopulation dynamics. Because each population is founded and then grows, metapopulation dynamics provide a source of balancing selection: butterflies that are good dispersers (high metabolism, high endurance, early egg production, high contribution to egg production) are successful in founding “new” populations, while butterflies that are good competitors (high efficiency of resource use, later reproduction, longer reproduction) are more successful in existing “old” populations (Wheat et al. 2011). 
While this interpretation of how selection operates in a natural population makes good logical sense, it has not been verified using studies of DNA sequence variation or gene expression in both “new” (recently colonized) and “old” (existing) populations. The major stumbling block that has prevented this type of research in most natural populations is the lack of a sequenced genome and of microarrays that can be used to look for patterns of variation in gene expression within and among populations. Until recently, genome sequences have not been available for most non-model organisms. Genome sequences have been available for Drosophila, C. elegans, and Arabidopsis, but these model organisms have been used more often for basic research questions about the role of genes and less often for studies of evolution in natural populations. The goal of this paper is to describe how a group of researchers has developed an extensive understanding of the interplay between genetics, evolution, and ecology using the Glanville fritillary butterfly and a related butterfly species. I begin by describing how variation in the sequence and structure of phosphoglucose isomerase (Pgi) was linked to differences in fitness in Colias eurytheme, a butterfly in the family Pieridae, which is a sister taxon to the Nymphalidae. Next I discuss how a description of the transcriptome for M. cinxia was developed using a new sequencing technique (454 pyrosequencing), and then how that transcriptome was used to detect balancing selection acting on Pgi in the M. cinxia populations living on the Aland Islands.

Phosphoglucose isomerase (Pgi) is a gene that codes for the enzyme (PGI) that catalyzes the second reaction in glycolysis, in which glucose 6-phosphate is converted into fructose 6-phosphate. Glycolysis is needed to convert glucose into pyruvate, which is then used by the mitochondria to produce ATP. PGI is a protein dimer that is formed through the interaction of two polypeptide chains. The polypeptide chains interact to form the active site, where glucose 6-phosphate attaches to the enzyme to be converted into fructose 6-phosphate. One amino acid residue, His 392, has been identified as being particularly important to the enzyme’s function. The histidine at position 392 on each polypeptide chain is positioned such that it helps form the catalytic site of the other polypeptide chain. This interaction between the two polypeptide chains is important because multiple alleles for Pgi have been identified in butterfly populations (Wheat et al. 2006). The performance of the enzyme is therefore likely to differ among individuals with different genotypes, because the genotype determines how the two polypeptide chains fit together. Some combinations of alleles will produce polypeptide chains that fit together better than others, which will in turn make some enzymes more effective than others. The organisms that contain the more effective enzymes should also have more efficient ATP production, which should translate into fitness effects.

The sequence of PGI is highly conserved across taxa. For example, butterfly PGI shows 69% sequence identity to both rabbit and human PGI sequences (Wheat et al. 2006). Because of this similarity, the 3-D structure of PGI, which has been determined for rabbit and human, can be used to predict the 3-D shape of butterfly PGI. The shape of the proteins produced by the different alleles can then be analyzed to determine why some versions of the enzyme are more efficient than others. This analysis provides information about which mutations in PGI produce important differences in performance among different butterfly genotypes.
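
To make the idea of "percent sequence identity" concrete, the short Python sketch below counts matching positions in two sequences that have already been aligned. It is only an illustration: the function name and the toy peptides are my own assumptions, not the PGI sequences or the software used by Wheat et al. (2006), and published identity values can differ depending on how gaps are treated.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned, ungapped positions at which two sequences match.

    Assumes the two sequences are already aligned (same length, with the
    gap character marking insertions or deletions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip alignment gaps rather than counting them as mismatches
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical 10-residue peptides that differ at two positions.
toy_a = "MKTAYIAKQR"
toy_b = "MKSAYLAKQR"
print(percent_identity(toy_a, toy_b))  # 8 of 10 positions match -> 80.0
```

Skipping gapped positions is one of several common conventions; dividing by the full alignment length instead would give a lower value for sequences with many insertions or deletions.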

Wheat et al. (2006) used DNA sequencing to identify 4 different Pgi alleles in a population of C. eurytheme that was collected in Tracy, California. The average numbers of synonymous differences (nucleotide changes that do not change the amino acid sequence) and nonsynonymous differences (nucleotide changes that do alter the amino acid sequence) were calculated for the alleles. The average number of synonymous differences between pairs of the four alleles ranged from 29.2 to 41.4, while the average number of nonsynonymous differences ranged from 2.8 to 5.0. The amount of variation observed in this single population of butterflies was much higher than the variation previously observed in Pgi in Drosophila melanogaster and in other genes in C. eurytheme (Wheat et al. 2006). The authors contend that this high level of polymorphism in Pgi is maintained by balancing selection via heterozygote advantage. Because PGI is a dimer, two polypeptide chains must interact to form the functional enzyme. In a heterozygote the two polypeptide chains can differ, while in a homozygote the two polypeptide chains are the same. In previous studies using another butterfly species, the enzymes produced by heterozygotes were found to perform better than the enzymes produced by homozygotes (Watt 1983, Watt et al. 1996). The authors connect this observation to their findings by suggesting that the dimers formed in heterozygotes suffer from fewer conformational constraints, which allows them to form more effective active sites. They suggest that homozygotes would be limited in the way that the two chains could fit together because of the symmetry of the chains. The authors support this contention with an analysis of how the nucleotide substitutions that they found altered the 3-D shape of the protein produced by the different alleles. 
The authors found that the nonsynonymous nucleotide substitutions only occurred in regions of the gene that coded for amino acids found on the exterior region of the enzyme. They also found that a few substitutions in particular altered the shape of the active site of the enzyme. These substitutions typically involved a change in charge of the amino acid, which changed the way that the amino acid interacted with its environment and the other amino acids in the area.
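
The distinction between synonymous and nonsynonymous differences can be illustrated with a small Python sketch that compares two aligned coding sequences codon by codon. This is a deliberately simplified count under my own assumptions (the function name and toy sequences are hypothetical): each differing nucleotide is judged by substituting it alone into the first sequence's codon, whereas the averages reported by Wheat et al. (2006) would come from a formal method that also handles codons with more than one change.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_differences(cds_a: str, cds_b: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) nucleotide differences.

    Assumes both coding sequences are aligned, in frame, free of gaps,
    and contain only the bases A, C, G, and T.
    """
    cds_a, cds_b = cds_a.upper(), cds_b.upper()
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3], cds_b[i:i + 3]
        for pos in range(3):
            if codon_a[pos] == codon_b[pos]:
                continue
            # Introduce this single change into codon_a and see whether
            # the encoded amino acid stays the same.
            mutated = codon_a[:pos] + codon_b[pos] + codon_a[pos + 1:]
            if CODON_TABLE[mutated] == CODON_TABLE[codon_a]:
                syn += 1
            else:
                nonsyn += 1
    return syn, nonsyn


# Toy alleles: GCT -> GCC keeps alanine; CAT -> CGT changes His to Arg.
allele_1 = "ATGGCTCATAAA"
allele_2 = "ATGGCCCGTAAA"
print(count_differences(allele_1, allele_2))  # (1, 1)
```

The toy example also shows why a nonsynonymous change can matter for the enzyme: replacing histidine with arginine changes the side chain, which is the kind of substitution that could alter how a residue interacts with its neighbors.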

The synonymous nucleotide substitutions were then analyzed with a test called Tajima’s D to determine whether selection was maintaining the variation in the gene. Tajima’s D is used to determine whether the variation observed among a sample of DNA sequences is due to random (genetic drift) or nonrandom (selection, population bottlenecks, population expansions) processes. The test is based on a model that uses the number of segregating sites and the average number of nucleotide differences between pairs of sequences to compute a test value. If the DNA sequence of interest is evolving neutrally, the test value is expected to be close to zero. A negative test value is interpreted to indicate a recent population bottleneck followed by population growth. A positive test value is interpreted to indicate a decrease in population size or balancing selection. In this study the authors calculated overall Tajima’s D values using the synonymous nucleotide changes in the four alleles, and then performed a sliding window analysis in which they calculated a Tajima’s D value for a small segment of nucleotides within Pgi, shifted the window along the DNA sequence, and calculated a new value. They performed this sliding window test for two exons of Pgi (exons 7 and 9) to look for extreme changes in the Tajima’s D value. Overall, the Tajima’s D value for Pgi was found to be negative, which is not consistent with balancing selection acting to maintain variation across the whole gene. However, areas of high positive values of Tajima’s D were found in exons 7 and 9 in the sliding window analysis. This supports the idea that balancing selection is acting on those sections of the gene. 
Taken together, the authors interpreted these results to indicate that the different alleles did in fact produce versions of PGI with different 3-D shapes that varied in function, and that a likely performance advantage in heterozygotes was responsible for maintaining the allelic variation observed in exons 7 and 9.
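
The Tajima’s D calculation and the sliding window analysis described above can be sketched in Python as follows. This is a minimal illustration using the standard formulas from Tajima (1989), not the software or settings that Wheat et al. (2006) used; the function names and toy sequences are my own assumptions, and the sketch uses every site in the alignment rather than only synonymous changes.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned sequences (None if no variation)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of alignment columns with more than one nucleotide.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return None  # D is undefined when there are no segregating sites

    # pi: average number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


def sliding_window_d(seqs, window, step):
    """Return (start position, Tajima's D) for each window along the alignment."""
    length = len(seqs[0])
    return [
        (start, tajimas_d([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Toy data: two divergent groups at intermediate frequency (positive D),
# versus variation made up only of rare, single-copy changes (negative D).
two_groups = ["AAAA", "AAAA", "TTTT", "TTTT"]
singletons = ["AAAA", "TAAA", "ATAA", "AATA"]
print(tajimas_d(two_groups), tajimas_d(singletons))
print(sliding_window_d(two_groups, window=2, step=1))
```

The two toy samples show the logic of the test: when a few distinct alleles are each common, pairwise differences are large relative to the number of segregating sites and D is positive, which is the pattern expected under balancing selection.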

In a 2010 study, Wheat and colleagues found a similar result when looking at polymorphisms in Pgi in a different butterfly species, the Glanville fritillary, Melitaea cinxia. The change in study organism was likely driven in part by the fact that the previous study was completed as part of the author’s Ph.D. research, while the studies that I describe in the remainder of the paper were completed in a different lab as a post-doc. The switch was probably not just a matter of changing jobs and universities, however, but also a strategic choice. The ecology of the Glanville fritillary has been studied intensively since 1993, making this species an ideal candidate for studies that bridge the gap between ecological and evolutionary patterns and genetics. It was already known that different allozymes (forms of the enzyme) were present in the population and that individuals with different allozymes exhibited differences in flight metabolic rate, female fecundity, and lifespan (Wheat et al. 2010). Wheat et al. (2010) analyzed the coding region of Pgi in 22 adult butterflies collected in 2004 from 15 different populations across the Aland Islands. Coding sequences were amplified from cDNA and haplotypes were identified. They found 16 unique haplotypes in the samples taken from the 22 individuals. The authors constructed a neighbor network of these Pgi haplotypes to try to answer questions about the origin of the genetic variation in the butterfly populations. The different allozymes of PGI corresponded to distinct haplotype clades, with the majority of the nucleotide differences occurring between rather than within the clades (Wheat et al. 2010). They also used a computer program called BEAST to estimate how long ago the different haplotype groups shared a common ancestor. 
Overall, the authors found a higher amount of polymorphism in Pgi than would be predicted by neutral theory. This is consistent with the hypothesis that balancing selection is acting in this population to maintain the variation. The authors also ran extensive tests to rule out other possible mechanisms, related to changes in population size, that might account for the high polymorphism. They modeled different scenarios of population change and found that, although some scenarios could produce the observed patterns of polymorphism, it was unlikely that changes in population size accounted for the observed results. In addition, when the results from this study were compared to the results obtained in C. eurytheme, a similar pattern of variation was detected, but there were enough differences to suggest that the balancing selection observed in the two species was the result of convergent evolution. This result led the authors to conclude that the polymorphism that they observed in Pgi in both species of butterfly is due to long-term balancing selection that occurs in many butterfly populations, and is not strictly due to the population structure of either species.

While the previous studies illustrate the power of using genetic techniques to help explain patterns of evolution observed in natural populations, Pgi is not likely to be acting in isolation. To better understand the evolutionary and ecological implications of genetic differences, scientists would like to be able to look for differences in gene expression between populations. One tool that is needed to do this is a microarray for the species of interest. Given the cost and time needed to make microarrays, they have not generally been available for non-model species like the Glanville fritillary. New sequencing techniques have recently been developed that are cheaper and faster. Vera et al. (2008) describe how they used 454 pyrosequencing to produce a transcriptome for M. cinxia. To do this, they started with RNA that was isolated from 80 larval individuals collected from a diverse range of populations across the main Aland island. The authors then made a cDNA library from the RNA pool and sequenced it using both the 454 pyrosequencing technique and traditional Sanger sequencing. The authors found that 454 pyrosequencing produced results that could be used to construct a microarray that showed both high repeatability and the ability to detect biological differences among individuals (Vera et al. 2008). An unexpected additional result was that the authors also detected non-butterfly cDNA in their analysis. This occurred because the RNA that was used to make the cDNA library came from whole insects. The whole-insect RNA was screened for polyadenylation to eliminate RNA from many microorganisms, but some sequences from microsporidia were sequenced as well. This was surprising because these butterflies were not known to carry microsporidia. The authors suggest that this technique can also be used to detect parasites (xenobiotics) in host species.

The final paper that I will discuss builds on the previous work to apply genomics to questions about ecology and evolution. The authors use a functional genomics approach to identify genes that are thought to be involved in regulating differences in life history characteristics between butterflies in established old populations and butterflies in newly colonized populations. This focus on old and new populations is based on the idea that the selection pressures in these two types of populations are different. Butterflies in old populations likely face higher competition, while butterflies that colonize new habitats need to fly long distances and reproduce before they die. These different selection pressures should maintain both types of butterfly in the metapopulation. To test this idea using a genomic approach, Wheat et al. (2011) used the microarray that was developed with the 454 pyrosequencing technique to look at gene expression in old and new populations. They found that females from new populations showed higher expression of genes involved in egg provisioning (including larval serum proteins and amino acid transporters) and egg production. They also found that new-population females had a higher level of juvenile hormone, which is important in triggering sexual maturation, and a larger number of mature eggs. Together these results indicate that females from new populations are able to reproduce faster and at an earlier age. In addition to the differences in gene expression related to egg production, the authors also found differences related to flight performance. There was higher expression of genes related to increased protein turnover in response to damage caused by intense flight. The authors also found a difference in the frequencies of two alleles that are important to metabolism. 
Individuals in the new populations had a higher frequency of both a Pgi allele and an Sdhd (succinate dehydrogenase) allele that are linked to more efficient ATP production. Butterflies that had certain Pgi and Sdhd alleles had better metabolic capabilities than other butterflies and were better able to fly long distances. Overall, the importance of these results is that gene expression analysis helped uncover a mechanistic explanation for the ecological trade-offs (i.e., between competitive ability and dispersal) and the proposed evolutionary patterns (disruptive selection) that are thought to be at work in natural populations.