Supplemental methods, figures, and tables for Lamb et al.: A high-throughput belowground plant diversity assay using next-generation sequencing of the trnL intron


Figure 1: Alignment of trnL operon DNA sequences from diverse plant species across the location of primers selected for this study. The c-1 and h-1 primers were modified from the trnL c and h primers designed by Taberlet et al. (1991) and Taberlet et al. (2007), respectively. Note that primer h-1 is written in reverse order to Taberlet’s primer h. The modifications incorporate degeneracies to capture more plant species. The forward primer “c-1” and the reverse primer “h-1”, together with their matches to a number of vascular plant species, are shown in the top and bottom panels, respectively.

DNA extraction efficiency and qPCR amplification efficiency assay

Molecular characterization of communities is susceptible to biases in DNA extraction, amplification and sequencing of target genes. DNA extraction (yield, purity, etc.) varies among plants. For example, species with higher polyphenol and polysaccharide content are difficult to extract from and can give low yields of genomic DNA (Aljanabi et al. 1999; Salzman et al. 1999). Root size, age and structure can also influence extraction efficiency (Fisk et al. 2010). Genomic DNA subsequently amplified by PCR is also subject to bias because template composition and secondary structure influence amplification efficiency. For example, GC content both upstream and downstream of the target region can influence the stability of the DNA at certain temperatures (Siciliano et al. 2007). Because PCR amplification is exponential, relatively small differences in efficiency can lead to large differences in amplicon abundance, so some species in a mixed community will be inaccurately represented. This is a particular concern in communities of relatively low diversity: the low diversity of plant root systems relative to microbial communities makes plant applications particularly susceptible to such biases (Siciliano et al. 2007). Furthermore, all Next-Generation Sequencing (NGS) platforms are subject to variation in the fidelity with which they sequence amplicons. Some platforms suffer from homopolymer extensions (Roche 454, Ion Torrent) as well as general mis-pairing of bases during reactions. Comparisons of NGS platforms show Illumina to have the highest accuracy, with extremely low indel rates and substitution rates of less than 1% (Junemann et al. 2013).
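The effect of exponential amplification on small efficiency differences can be illustrated with a short calculation. The sketch below assumes an idealized reaction in which each template amplifies at a constant per-cycle efficiency E, so that N = N0 (1 + E)^cycles; real reactions plateau, so the numbers are an upper-bound illustration rather than a prediction. The function names are hypothetical, and the two efficiencies are the mean values reported in this supplement for Bromus inermis (95.7%) and Bromus anomalus (86%).

```python
def amplicon_copies(start_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized PCR yield: N = N0 * (1 + E) ** cycles, with E between 0 and 1."""
    return start_copies * (1.0 + efficiency) ** cycles


def fold_bias(efficiency_a: float, efficiency_b: float, cycles: int) -> float:
    """Ratio of amplicon copies for template A over template B,
    assuming equal starting copies and constant efficiencies (an idealization)."""
    return ((1.0 + efficiency_a) / (1.0 + efficiency_b)) ** cycles


if __name__ == "__main__":
    # Mean efficiencies reported in this supplement for two Bromus species.
    for cycles in (10, 20, 40):
        print(cycles, round(fold_bias(0.957, 0.86, cycles), 2))
```

Under these assumptions, a difference of roughly ten percentage points in efficiency grows to a several-fold difference in amplicon abundance over 40 cycles.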

While biases in molecular techniques exist, most can be mitigated by including steps to calculate the level of error, efficiency or bias involved during the resequencing of amplicons. Here we evaluate two potential biases (DNA extraction yield and amplification efficiency) in five common rough fescue grassland species.

Five reference plant species were used to determine mean extraction yields and amplification efficiency to evaluate the potential for cross-taxa bias. The species chosen were the forbs Achillea millefolium and Campanula rotundifolia and the grasses Bromus anomalus, Bromus inermis, and Festuca altaica ssp. hallii. Each species was grown separately in the greenhouse, and root samples were collected in 1.5 mL tubes along a gradient of six weights (5, 10, 20, 50, 80 and 100 mg dry weight per species) in triplicate and then extracted using a Power Plant Pro DNA Isolation Kit (MOBIO, Cat# 13400-50). The kit protocol was followed, with the exception that a Retsch MM-400 grinder (Retsch, Haan, Germany) was used. The extracted DNA was then eluted in 30 µL of elution buffer. Concentration estimates for these extracts were obtained in ng/µL (Nano-drop 2000, Nano-drop products, DE, USA) and extracts were visualised on a 2% agarose gel.
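As an illustration of how a concentration reading relates to a per-mass yield, the sketch below converts ng/µL to ng DNA per mg root using the 30 µL elution volume stated above. It assumes the full elution volume is recovered. The function names and the example readings are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

ELUTION_VOLUME_UL = 30.0  # elution volume stated in the methods


def yield_ng_per_mg(conc_ng_per_ul: float, root_mass_mg: float,
                    elution_volume_ul: float = ELUTION_VOLUME_UL) -> float:
    """Total DNA recovered (ng) divided by root dry mass (mg)."""
    if root_mass_mg <= 0:
        raise ValueError("root mass must be positive")
    return conc_ng_per_ul * elution_volume_ul / root_mass_mg


def summarise(yields):
    """Mean and sample standard deviation of a list of yields."""
    return mean(yields), stdev(yields)


if __name__ == "__main__":
    # Hypothetical (concentration in ng/uL, root mass in mg) readings.
    readings = [(9.5, 5.0), (21.0, 10.0), (38.0, 20.0)]
    yields = [yield_ng_per_mg(conc, mass) for conc, mass in readings]
    print(summarise(yields))
```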

qPCR was used to assess amplification efficiency with the trnL primers. Reactions were performed using a Quantitect SYBR green PCR kit (Qiagen, Toronto, ON, Canada) on an ABI 7500 Realtime PCR system (ABI). The program consisted of a 95 °C hold for 15 min, followed by 40 cycles of 95 °C, 50 °C, 72 °C and 77 °C for 30, 30, 72 and 45 s, respectively. The reactions were terminated with three steps consisting of 95 °C for 15 s, 60 °C for 60 s and 95 °C for 15 min. Template dilutions were required to dilute PCR inhibitors.
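The supplement does not state how amplification efficiency was derived from the qPCR data. One common approach, shown here purely as an assumption, is to fit a standard curve of Ct against log10 template amount across a dilution series and compute E = 10^(-1/slope) - 1. The function names and the Ct values below are hypothetical.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def efficiency_from_standard_curve(log10_template, ct_values):
    """Efficiency E (1.0 = 100%) from the standard-curve slope: E = 10**(-1/slope) - 1."""
    slope = fit_slope(log10_template, ct_values)
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical tenfold dilution series (log10 relative template) and Ct values.
    log10_template = [0.0, -1.0, -2.0, -3.0]
    ct_values = [15.1, 18.6, 22.0, 25.6]
    print(efficiency_from_standard_curve(log10_template, ct_values))
```

A slope near -3.32 corresponds to perfect doubling each cycle (E = 100%); shallower or steeper slopes indicate efficiencies above or below that value.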

DNA extraction efficiency differed among species (e.g. 59.8 ± 11.0 ng/mg root for Festuca altaica ssp. hallii versus 33.0 ± 5.1 ng/mg root for Campanula rotundifolia). Yield differed significantly among the species analysed (p = 0.04), delineating two groups of high and low yield (Figure S2). PCR amplification efficiency also varied among these taxa (95.7 ± 13.8% for Bromus inermis versus 86 ± 2.88% for Bromus anomalus). While the species assessed represent a diverse collection, they do not provide a proxy for all plant taxa, particularly with respect to DNA extraction efficiency. To further explore the effects of these biases on community composition, in silico analysis was required.

Although the molecular techniques applied here introduce biases, these do not have a significant effect on plant community composition (Figure S3). In silico generation of plant communities showed that although some species under-report (Bromus anomalus, R = 0.267) and some over-report (Festuca altaica ssp. hallii, R = 1.002), these factors do not affect the overall difference between expected and observed in silico data. Sequencing platforms also introduce their own biases into sequence data. To test this, mock communities were analysed.
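The in silico scaling can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: each species' expected count is weighted by its DNA yield and by an idealized amplification factor (1 + E)^cycles, and the weighted signal is renormalized to relative abundance. The weighting scheme, the function names and the example values are all hypothetical, and the ratio computed here is not claimed to be the R statistic quoted above.

```python
def relative_abundance(counts):
    """Convert a {species: amount} mapping to fractions that sum to 1."""
    total = sum(counts.values())
    return {species: amount / total for species, amount in counts.items()}


def simulate_observed(expected_counts, dna_yield, efficiency, cycles=40):
    """Weight expected counts by extraction yield and an assumed (1 + E) ** cycles factor."""
    signal = {
        species: count * dna_yield[species] * (1.0 + efficiency[species]) ** cycles
        for species, count in expected_counts.items()
    }
    return relative_abundance(signal)


def reporting_ratio(expected_counts, observed_fraction):
    """Observed / expected relative abundance; < 1 under-reports, > 1 over-reports."""
    expected_fraction = relative_abundance(expected_counts)
    return {
        species: observed_fraction[species] / expected_fraction[species]
        for species in expected_counts
        if expected_fraction[species] > 0
    }


if __name__ == "__main__":
    # Hypothetical community and bias parameters, for illustration only.
    expected = {"species_a": 10, "species_b": 30}
    dna_yield = {"species_a": 60.0, "species_b": 33.0}
    efficiency = {"species_a": 0.95, "species_b": 0.90}
    observed = simulate_observed(expected, dna_yield, efficiency, cycles=10)
    print(reporting_ratio(expected, observed))
```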

Figure S2. DNA yield and qPCR efficiency from 5 specimens of five common grassland species used to test the efficacy of this assay. Error bars are one standard deviation.

Figure S3: In silico analysis of the molecular biases imparted in this study. Mock communities were constructed by scaling aboveground plant species counts for DNA extraction yield and amplification efficiency to produce correlated belowground counts. Communities were compared to see whether species over or under report abundance. For clarity, the upper-right inset expands the observed community composition datapoints above 0.9 and the lower-left inset expands observed community composition below 0.1.

Figure S4. Rarefaction curves showing the relationship between number of sampled reads and number of unique OTUs recovered from thirteen samples taken from the A soil horizon in a rough fescue grassland.

Figure S5. Rarefaction curves showing the relationship between number of sampled reads and number of unique OTUs recovered from nine soil samples taken from the B soil horizon in a rough fescue grassland.
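For readers unfamiliar with rarefaction, the sketch below shows one simple way such a curve can be computed: reads are subsampled without replacement at increasing depths, and the number of distinct OTUs is averaged over replicate draws. It is a generic illustration and not the procedure used to produce Figures S4 and S5. The function name, the replicate count and the example OTU table are hypothetical.

```python
import random


def rarefaction_curve(otu_counts, depths, replicates=10, seed=0):
    """Mean number of unique OTUs at each subsampling depth.

    otu_counts: {otu_id: number_of_reads} for one sample.
    depths: read depths to subsample; depths above the sample total are skipped.
    """
    rng = random.Random(seed)
    # Expand the count table into one entry per read.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(reads):
            continue
        richness = [len(set(rng.sample(reads, depth))) for _ in range(replicates)]
        curve.append((depth, sum(richness) / replicates))
    return curve


if __name__ == "__main__":
    # Hypothetical OTU table for a single sample.
    sample = {"otu1": 500, "otu2": 120, "otu3": 15, "otu4": 3}
    for depth, otus in rarefaction_curve(sample, [10, 50, 100, 300, 638]):
        print(depth, otus)
```

A curve that flattens as depth increases suggests that additional reads would recover few further OTUs from that sample.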

Table S1. Summary statistics for grassland, synthetic community, and Arctic samples (mean number of reads ± 1 standard deviation) for the number of raw reads from Illumina sequencing and numbers of filtered reads retained at three stages in the bioinformatics pipeline. Note that the values reported here for the Arctic samples are based on a subset of 23 samples.

Pipeline stage / Grassland / Synthetic Communities / Arctic Samples
Raw reads / 30048 ± 26622.02 / 134561.06 ± 98276.87 / 22647 ± 21237.9
Post screen.seqs / 19811.08 ± 27882.86 / 79662.5 ± 67406.92 / 20009.23 ± 18868.22
Post pre.cluster / 16056.38 ± 26370.73 / 77583.16 ± 65974.13 / 9693.86 ± 11795.85
Chimera checked / 15691.25 ± 25518.88 / 75955.78 ± 64592.47 / 9465.18 ± 11570.47

Table S2. Species detection rates in mock root communities. Number of mock communities indicates the number of the 47 mock communities where a species was present, and abundance when present indicates the mean abundance (percentage of the community) when the species was present. The proportions of communities with correct detections of a species and mistaken detection of a species are indicated by the true and false detection rates.

Species / Number of Communities / Abundance When Present / True Detection Rate / False Detection Rate
Achillea millefolium / 14 / 1.5 / 0.64 / 0.27
Agoseris glauca / 4 / 1.7 / 0 / 0.21
Agropyron spp. / 40 / 8.3 / 0.7 / 0
Agrostis scabra / 1 / 1.5 / 0 / 0
Androsace septentrionalis / 5 / 1.5 / 0.8 / 0.02
Anemone spp. / 18 / 13.8 / 0.72 / 0.1
Arabis hirsuta / 1 / 0.7 / 0 / 0
Aster ericoides / 9 / 2.6 / 0.44 / 0.03
Aster laevis / 4 / 1.7 / 0 / 0.05
Astragalus spp. / 31 / 2.2 / 0.97 / 0.13
Bouteloua gracilis / 1 / 1 / 0 / 0
Bromus anomalus / 2 / 13.7 / 0 / 0
Bromus inermis / 40 / 54.3 / 0.98 / 0.86
Campanula rotundifolia / 4 / 1 / 0.25 / 0.02
Carex spp. / 36 / 4.7 / 0.42 / 0
Cerastium arvense / 5 / 4 / 0.6 / 0
Cirsium spp. / 12 / 3.6 / 0.08 / 0
Erigeron spp. / 1 / 28.4 / 1 / 0
Erysimum inconspicuum / 2 / 0.9 / 0 / 0
Festuca hallii / 39 / 18.7 / 0.97 / 0.5
Galium boreale / 22 / 2.7 / 0.73 / 0
Helictotrichon hookeri / 3 / 10 / 0.67 / 0.02
Juncus balticus / 6 / 2.5 / 0.5 / 0.02
Koeleria macrantha / 3 / 1.2 / 0 / 0
Linum lewisii / 1 / 0.6 / 0 / 0
Melilotus officinalis / 2 / 1.1 / 0 / 0
Penstemon spp. / 2 / 10.9 / 0.5 / 0.02
Poa palustris / 1 / 2.8 / 0 / 0
Poa pratensis / 2 / 5.7 / 0 / 0
Rosa spp. / 31 / 5.8 / 0.94 / 0
Sonchus arvensis / 8 / 10.2 / 0.75 / 0
Stellaria longifolia / 1 / 2.2 / 1 / 0
Stipa spp. / 10 / 8 / 0.5 / 0
Symphoricarpos occidentalis / 12 / 9.1 / 0.67 / 0
Thermopsis rhombifolia / 11 / 5.8 / 0.55 / 0
Vicia americana / 37 / 1.9 / 0.86 / 0
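The two rates in Table S2 can be read as follows: the true detection rate is the fraction of communities containing a species in which it was detected, and the false detection rate is the fraction of communities lacking the species in which it was nonetheless detected. This reading is an inference from the caption, although it appears consistent with the table (for example, 6 detections among the 7 communities lacking Bromus inermis would give 0.86). The function below is a hypothetical sketch of that calculation.

```python
def detection_rates(present, detected):
    """Return (true_rate, false_rate) for one species across mock communities.

    present, detected: equal-length sequences of booleans, one entry per community.
    A rate is None when its denominator is zero.
    """
    if len(present) != len(detected):
        raise ValueError("present and detected must have the same length")
    n_present = sum(1 for p in present if p)
    n_absent = len(present) - n_present
    true_hits = sum(1 for p, d in zip(present, detected) if p and d)
    false_hits = sum(1 for p, d in zip(present, detected) if (not p) and d)
    true_rate = true_hits / n_present if n_present else None
    false_rate = false_hits / n_absent if n_absent else None
    return true_rate, false_rate


if __name__ == "__main__":
    # Hypothetical layout: species present in 40 of 47 communities,
    # detected in 39 of those and in 6 of the 7 where it was absent.
    present = [True] * 40 + [False] * 7
    detected = [True] * 39 + [False] + [True] * 6 + [False]
    print(detection_rates(present, detected))
```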

Supplemental References

Aljanabi SM, Forget L, Dookun A (1999) An improved and rapid protocol for the isolation of polysaccharide- and polyphenol-free sugarcane DNA. Plant Molecular Biology Reporter 17: 281-281. doi: 10.1023/a:1007692929505.

Fisk MC, Yanai RD, Fierer N (2010) A molecular approach to quantify root community composition in a northern hardwood forest — testing effects of root species, relative abundance, and diameter. Can J For Res 40: 836-841. doi: 10.1139/X10-022.

Junemann S, Sedlazeck FJ, Prior K, Albersmeier A, John U, Kalinowski J, Mellmann A, Goesmann A, von Haeseler A, Stoye J, Harmsen D (2013) Updating benchtop sequencing performance comparison. Nat Biotechnol 31: 294-296. doi: 10.1038/nbt.2522.

Salzman R, Fujita T, Zhu-Salzman K, Hasegawa P, Bressan R (1999) An improved RNA isolation method for plant tissues containing high levels of phenolic compounds or carbohydrates. Plant Molecular Biology Reporter 17: 11-17.

Siciliano SD, Ma W, Powell S (2007) Evaluation of quantitative polymerase chain reaction to assess nosZ gene prevalence in mixed microbial communities. Can J Microbiol 53: 636-642. doi: 10.1139/W07-014.

Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, Vermat T, Corthier G, Brochmann C, Willerslev E (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res 35: e14. doi: 10.1093/nar/gkl938.

Taberlet P, Gielly L, Pautou G, Bouvet J (1991) Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Mol Biol 17: 1105-1109.
