Text S1: Assessing Gene Presence Absence Patterns

Patterns of gene gain or loss could also contribute to the differentiation between N-fertilized and control populations. For a conservative assessment of gene presence/absence variation (PAV), we excluded singletons (genes found in only a single strain). The pan genome (i.e., core and flexible genome) contained a total of 14,222 gene clusters, 5283 of which could be mapped to reference coordinates. To examine the distribution of the flexible genome, we constructed a gene presence index (GPI) for each gene in the pan genome that mapped to the WSM1325 reference: GPI = (Total # control strains with gene/34) – (Total # N-fertilized strains with gene/28), where 34 and 28 are the numbers of control and N-fertilized strains, respectively. Thus, the GPI reflects the degree to which a particular gene is found in control rhizobia compared to N-fertilized rhizobia; positive values indicate genes found more commonly in control strains, and negative values indicate genes found more commonly in N-fertilized strains.
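The GPI calculation can be illustrated with a minimal sketch. The function name, argument names, and input checks below are hypothetical and are not taken from the original analysis; only the formula and the two group sizes (34 control, 28 N-fertilized) come from the text above.

```python
N_CONTROL = 34      # number of control strains (from the text)
N_FERTILIZED = 28   # number of N-fertilized strains (from the text)


def gene_presence_index(n_control_with_gene: int,
                        n_fertilized_with_gene: int,
                        n_control: int = N_CONTROL,
                        n_fertilized: int = N_FERTILIZED) -> float:
    """Return the GPI for one gene.

    GPI = (control strains with gene / n_control)
          - (N-fertilized strains with gene / n_fertilized)

    Positive values: gene more common in control strains.
    Negative values: gene more common in N-fertilized strains.
    """
    # Input validation is an illustrative assumption, not part of the source.
    if not 0 <= n_control_with_gene <= n_control:
        raise ValueError("control count out of range")
    if not 0 <= n_fertilized_with_gene <= n_fertilized:
        raise ValueError("N-fertilized count out of range")
    return (n_control_with_gene / n_control
            - n_fertilized_with_gene / n_fertilized)


# Boundary cases described in the text:
print(gene_presence_index(34, 0))   # in every control strain only -> 1.0
print(gene_presence_index(0, 28))   # in every N-fertilized strain only -> -1.0
print(gene_presence_index(17, 14))  # half of each group -> 0.0
```

Because each term is a proportion rather than a raw count, the index is not biased by the unequal number of strains in the two groups.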

Our analyses of gene presence/absence variation (PAV) revealed high variation in gene content among strains; nevertheless, there were no obvious PAV gene candidates driving partner quality decline because no genes were completely absent in control strains and present in all N-fertilized strains (GPI = -1), or vice versa (GPI = 1; Fig. 1A,B below). There was little overall deviation from GPI = 0, with 95% of the values ranging from -0.113 to 0.122 (observed mean = 0.006, SD = 0.054, min = -0.359, max = 0.443), revealing that many genes are found in similar proportions of control and N-fertilized strains. Genes in the top 1% (N = 53) or bottom 1% (N = 53) of GPI values were found more commonly among control strains or N-fertilized strains, respectively (orange triangles in Fig. 1B). The top and bottom 1% of GPI values do not contain obvious symbiosis genes such as nif, nod, or fix genes (Dataset S3); nevertheless, given that most symbiosis genes have been identified in laboratory manipulations using isogenic mutants, it is likely that a large fraction of the genes generating natural variation in partner quality remain unannotated [1].
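The selection of the top and bottom 1% of GPI values could be sketched as follows. This is an illustrative sketch only: the function name, the dictionary input, the rounding of the tail size, and the handling of tied values are all assumptions, as the text does not state how these were treated. With 5283 mapped genes, rounding 1% gives 53 genes per tail, matching the counts reported above.

```python
from typing import Dict, List, Tuple


def gpi_tails(gpi_by_gene: Dict[str, float],
              fraction: float = 0.01) -> Tuple[List[str], List[str]]:
    """Return (top, bottom) gene lists for the given tail fraction.

    top:    genes with the highest GPI (more common in control strains)
    bottom: genes with the lowest GPI (more common in N-fertilized strains)
    """
    # Tail size is rounded to the nearest integer (an assumption).
    k = round(len(gpi_by_gene) * fraction)
    # Sort gene names by ascending GPI; ties keep their input order.
    ranked = sorted(gpi_by_gene, key=gpi_by_gene.get)
    bottom = ranked[:k]
    top = ranked[len(ranked) - k:]
    return top, bottom


# Hypothetical toy input: 200 made-up genes with evenly spaced GPI values.
toy = {f"gene{i}": (i - 100) / 1000 for i in range(200)}
top, bottom = gpi_tails(toy)
print(len(top), len(bottom))  # 2 2
```

Taking fixed-size tails in this way flags the most extreme genes for inspection; it is a descriptive cutoff and does not by itself constitute a significance test.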

Presence-absence variation (PAV) in natural bacterial populations, which arises through gain and loss of genes within particular lineages, is likely important to adaptation. The evolutionary mechanisms governing this variation are just starting to be elucidated [2–4]. Because formal tests of selection on microbial flexible gene content are in their infancy [5], we instead looked for a significant association between gene PAV and the N environment. Here we found no genes that were completely absent in control strains and present in all N-fertilized strains (GPI = -1), or vice versa (GPI = 1); therefore, we could not definitively connect any variable gene content (other than the pSym region) to decreased rhizobium partner quality after long-term N exposure. Characteristic of the microbial “U”-shaped gene frequency distribution [2], a large portion of genes in the pan genome were found at high frequency, with much of the remaining gene content occurring at low frequency. High-frequency (i.e., core or nearly core) genes typically encode housekeeping functions irrespective of local environment, whereas the prevalence of low-frequency genes suggests that much of the flexible gene content is either neutral or important for responding to highly dynamic selective agents (e.g., phage predation) in local microhabitats [3,6,7]. A more comprehensive understanding of the relative contributions of PAV versus nucleotide variation in microbial adaptation will require better theoretical models, i.e., a “neutral model” [8] of PAV, as well as more ecological genomic investigations focused on bacteria.

Figure 1: Patterns of gene presence and absence across the genome for N-fertilized and control Rhizobium leguminosarum. Plot A: Total number of strains (out of 62) possessing each gene in the pan-genome (5283 genes total). Plot B: Gene presence index (GPI) for all genes in the pan-genome for all 62 strains. Genes with GPI values larger than zero were found more often in control strains; genes with values below zero were found more often in N-fertilized strains. Orange triangles denote the top and bottom 1% of GPI values (106 genes total). The region of the pSym (in pink) that houses symbiosis genes of interest (e.g., nif and nod genes) is highlighted in light blue.

References:

1. Heath, K. D., Burke, P. V. & Stinchcombe, J. R. 2012 Coevolutionary genetic variation in the legume-rhizobium transcriptome. Mol. Ecol. 21, 4735–4747. (doi:10.1111/j.1365-294X.2012.05629.x)

2. Lobkovsky, A. E., Wolf, Y. I. & Koonin, E. V. 2013 Gene frequency distributions reject a neutral model of genome evolution. Genome Biol. Evol. 5, 233–242. (doi:10.1093/gbe/evt002)

3. Cordero, O. X. & Polz, M. F. 2014 Explaining microbial genomic diversity in light of evolutionary ecology. Nat. Rev. Microbiol. 12, 263–273. (doi:10.1038/nrmicro3218)

4. Epstein, B., Sadowsky, M. J. & Tiffin, P. 2014 Selection on horizontally transferred and duplicated genes in Sinorhizobium (Ensifer), the root-nodule symbionts of Medicago. Genome Biol. Evol. 6, 1199–1209. (doi:10.1093/gbe/evu090)

5. Shapiro, B. J. 2013 Signatures of Natural Selection and Ecological Differentiation in Microbial Genomes. In Ecological Genomics (eds C. R. Landry & N. Aubin-Horth), pp. 339–359. Springer Netherlands. (doi:10.1007/978-94-007-7347-9)

6. Hao, W. & Golding, B. G. 2006 The fate of laterally transferred genes: Life in the fast lane to adaptation or death. Genome Res. 16, 636–643. (doi:10.1101/gr.4746406)

7. Doolittle, W. F. & Zhaxybayeva, O. 2009 On the origin of prokaryotic species. Genome Res. 19, 744–756. (doi:10.1101/gr.086645.108)

8. Kimura, M. 1983 The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press.