Distinct sensitivity of fungal freshwater guilds to water quality

Oliver Röhl1, Derek Peršoh1, Moritz Mittelbach1, Vasco Elbrecht2, Andreas Brachmann3, Julia Nuy4, Jens Boenigk4, Florian Leese2 and Dominik Begerow1

1 AG Geobotany, Faculty of Biology and Biotechnology; Ruhr Universität Bochum; Universitätsstr. 150, 44801 Bochum, Germany.

2 AG Aquatic Ecosystem Research, Faculty of Biology; Universität Duisburg-Essen; Universitätsstr. 5, 45141 Essen, Germany.

3 Genetics,Faculty of Biology, Ludwig-Maximilians-University Munich, Großhaderner Straße2-4, 82152 Planegg-Martinsried, Germany.

4 Department of Biodiversity, Faculty of Biology; Universität Duisburg-Essen; Universitätsstr. 5, 45141 Essen, Germany.

Corresponding author: ; Telephone: 0049 234 / 322 24043

Journal: Mycological Progress

Bioinformatics pipeline

In the following list we will explain the order an details of the applied bioinformatic pipeline for both presented algorithms (CD-HIT-OTU and usearch). The workflow consists of demultiplexing, quality filtering, data transformation and OTU picking. Parts, written in italic, should be replaced by individual file and folder names.

1)  Extracting barcodes and tags from reads

Qiime > extract_barcodes.py f Input FastQ forward -r Input FastQ reverse -c barcode_paired_end --bc1_len 13 --bc2_len 13 -o output folder

This command is used in a shell script for all Illumina index combinations, like 501-701, 501-702, 501-703 etc.

2)  Recognition of tags and quality filtering

Qiime > split_libraries_fastq.py i input FastQ -o output folder -m mapping files -b barcode file --barcode_type 26 --store_demulitplexed_fastq -r 9 -q 19

The barcode file is generated during step 1. The command is processed for forward and reverse reads respectively.

3)  Generating a file with all samples

Qiime > cat FastQ file1 FastQ file2 … > output file

4)  Splitting of data into tag specific subfiles

Qiime > filter_fasta.py -f input FastQ -o output FastQ --sample_id_fp IDs.txt

This is necessary to identify the conserved flanking region for cdHIT-OTU. Sample ID´s are dependent on the respective pipetting scheme.

5)  Trimming of sequences to equal length

FastX > fastx_timmer -Q 33 -f first base to keep -l last base to keep -i input FastQ -o output FastQ

Absolute necessary in case of global alignment cluster algorithms (see usearch homepage).

6)  Clustering with cdHIT-OTU

CD-HIT-OTU > Cd-hit-otu-all-single.pl -i input FastQ -o output folder -m blast -t 180 -p primingsites.txt -c 0.97 -e 0.01

Sequence length is defined with –t option and restricted to 180 bp. CdHIT-OTU requires conserved regions for clustering at the beginning of sequences (primingsites). These are defined with –p and are highly dependent on barcode marker region. In this study we choose highly conserved flanking region of the 5.8 S rDNA.

7)  Generating OTU table

CD-HIT-OTU > Clstr_sample_count_matrix.pl _ output folder

Generates an OTU table with abundance of OTUs in each assigned sample.

8)  Taxonomy assignment

CD-HIT-OTU > assign_taxonomy.py -i input OTU list -o output folder -r UNITE_database.fasta -t UNITE_database.txt

The input OTU list is generated during step 6. UNITE reference database is downloaded on http://microbiology.se/tag/unite/. For this study we used version 7 (released 02.03.2015).

Following commands are only applied for usearch dataset, since CD-HIT-OTU has an implemented sequence check before clustering it is required to filter usearch dataset for comparison. For this we used the same dataset that was clustered in step 6.

9)  Sample filtering

Qiime > filter_fasta.py -f input.fastq -o output.fastq --sample_id_fp Sample_ID.txt

For further procession of files it is necessary to split up sequences into sample dependent files.

10)  Quality filtering

Qiime > split_libraries_fastq.py -i input_sample1.fastq --sample_id Sample1_ID -o output_folder/ -q 30 --barcode_type 'not-barcoded'

Sequences are filtered with stronger quality filter compared to step 2. Error probability of 0.1 is applied to all bases.

11)  Dereplication and singleton removing

Vsearch > ./vsearch-1.1.1-osx-x86_64 --derep_fulllength outut_folder from step 10/seqs.fna --output output_dereplicated.fasta --log=logfile --sizeout --minuniquesize 2

Input file is generated during step 11. Quality filtering must be done before step 11 because here fastq files are automatically translated into fasta files. Size out command enables the counting of repeated sequences. The “minuniquesize” is responsible for removal of singletons. Value defines the minimum read repeat number for further steps. Can be adapted in larger datasets to maintain stringent quality filtering.

12)  Combination of single files

Qiime > cat FastQ file1 FastQ file2 … > output file

13)  Size dependent sorting

Usearch > ./usearch8.1.1861_i86osx32 -sortbysize input.fasta -fastaout output.fasta -minsize 2

Clustering with usearch requires a sorting step when sequences are counted after dereplication. In this step it is possible to remove singletons via the “minsize” option if necessary.

14)  Clustering with usearch

Usearch > ./usearch8.1.1861_i86osx32 -cluster_otus input_sorted.fasta -otus output_rep_seqs.fasta -minsize 2 -relabel OTU -uparseout output_logfile.txt -sizein –sizeout

This command clusters all the remaining sequences according to the 0.97 similarity threshold, which is set by default. Important options are the “sizein” and “sizeout” commands. Due to them the read counts from the dereplication step are also considered.

15)  Generate OTU table with usearch

Usearch > ./usearch8.1.1861_i86osx32 -usearch_global all_seqs.fasta -db rep_seqs_OTUs.fasta -strand plus -id 0.97 -otutabout output.txt -biomout output.biom

This command generates the OTU table by mapping the representative reads from each OTU against the complete dataset. It generates two output files. A excel compatible .txt file and .biom file for further data procession in qiime.

Analyses of stressor interaction

Fig. S1 Stressor interactions. Pseudo-F values according to PERMANOVA (10,000 replicates) are given for the whole eukaryotic and fungal communities and the three preference groups, separately for litter and tile compartments. For each treatment combination (decreased flow velocity x enriched sediment, decreased flow velocity x enriched salt, enriched sediment x enriched salt, decreased flow velocity x enriched sediment x enriched salt) the significance of respective interactions was tested. Significant interactions are marked with asterisks (*= p < 0.05; **= p < 0.005; ***= p < 0.0005).

OTU read ratio

Fig. S2 Assignment of OTUs to guilds. 406 OTUs are categorized into 72 aquatic and 334 amphibious OTUs and sorted by ascending litter-biofilm read ratio. Ratios range from 0.0001 to 10,000, while the first one indicates a strong preference to biofilm, indicates the latter one a strong litter preference. The vertical line indicates the point of equal abundance between litter and biofilm. This was tested as the most suitable threshold between aquatic and amphibious OTUs.

OTU and read counts across treatments

The dependence of OTU and read counts in samples from different treatments was assessed. Species richness was not significantly different among treatments (values of pairwise t-Test not shown)(Fig. S3). Read counts also differed not significantly (values of pairwise t-Test not shown).

The observed differences in counts are not congruent with the differences in community composition (Fig. 5 and Fig. S1).

Fig. S3 OTU and read counts in dependence or treatment. Variability among OTU count (A) and read count (C) for litter as well as OTU count (B) and read count (D) for biofilm samples is given. Each treatment was tested in eight replicates for litter and four for biofilm compartments. The thick bar within each boxplot represents the median, while whiskers indicate the 0.25 and 0.75 quartiles. Dodd’s represent outlier samples. Applied stressors are abbreviated: L = decreased flow velocity; LN = decreased flow velocity + enriched salt; LNS = all three stressors; LS = decreased flow velocity + enriched sediment; N = enriched salt; NS = enriched salt + enriched sediment; S = enriched sediment and C = no stressor applied.

Abandonment of rarefying

The number of reads correlated significantly for samples of each compartment (Fig. S4). However, the five samples with the highest and lowest read counts, respectively, did not differ significantly according to OTU composition. ANOSIM (one-way, 10,000 permutations) an R-value of 0.028 (p-value = 0.34) for differences among the respective litter and an R-value of 0.132 (p-value = 0.09) for the tile samples.

Because of the negligible and non-significant differences according to the predominantly applied type of analyses in this study, we decided against rarefying our data, to minimize data manipulation. It was shown that the proportion of false positives increases considerably during rarefying for OTUs with variable abundance across samples (McMurdie and Holmes 2014). Clustering accuracy also decreased due to omitted read counts and added noise from random subsampling. McMurdie and Holmes (2014) showed that reliable results are obtained without rarefying with an adequate number of biological replicates. With eight and four biological replicates of litter and tiles compartments respectively we fulfilled these requirements and therefore conducted our analyses without rarefying.

Fig. S4 Read versus OTU count in litter (A) and tile (B) compartments. Linear regression is used to analyse the correlation between OTU and read count for litter (blue squares) and biofilm compartments (red triangles). R2 indicates the quality of the regression line calculated with the linear fit model.

Primer sequences

Table S1 PCR1 Forward primer sequences are TACACGACGCTCTTCCGATCT – Tag – GTACACACCGCCCGTC, those of Reverse primers CAGACGTGTGCTCTTCCGATC – *Tag – GCTGCGTTCTTCATCGATGC. Illumina indices and adapters were introduced in the second PCR using the PCR2 forward primer AATGATACGGCGACCACCGAGATCTACAC – index – ACACTCTTTCCCTACACGACGCTCTTCCGATCT and the reverse primer CAAGCAGAAGACGGCATACGAGAT – index - GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC.

Euk1391F / Tag / ITS2 / Tag / PSP501 / Tag / PSP701 / Tag
F13 / GGAG / R1 / ACAGA / -1 / TAGATCGC / -1 / TAAGGCGA
F14 / AGTTAAT / R2 / CTCACT / -2 / CTCTCTAT / -2 / CGTACTAG
F15 / CAGTGCGAGA / R3 / TGCTGAA / -3 / TATCCTCT / -3 / AGGCAGAA
F16 / TATAGACGATGCGT / R4 / GTGCACCA / -4 / AGAGTAGA / -4 / TCCTGAGC
--- / --- / --- / --- / -5 / GTAAGGAG / -5 / GGACTCCT

*NOTE: Introducing a T in front of the PCR2 reverse primers ensures compatibility with the Illumina sequencing platform.

ANOSIM results on stressor impact

Table S2 Response of preference group 1 (PG1) to individual and combined stressors. Influence of applied stressors on preference groups assessed with ANOSIM (pairwise test) and 10,000 replicates. Applied stressors (L = decreased flow velocity; LN = decreased flow velocity + enriched salt; LNS = all three stressors; LS = decreased flow velocity + enriched sediment; N = enriched salt; NS = enriched salt + enriched sediment; S = enriched sediment) are compared to the control ( C ).

PG1 / Litter Euk. / Biofilm Euk. / Litter Fungi / Biofilm Fungi
R-val. / p-val. / R-val. / p-val. / R-val. / p-val. / R-val. / p-val.
C vs. L / -0.049 / 0.690 / 0.111 / 0.340 / -0.016 / 0.520 / 0.241 / 0.170
C vs. LN / 0.087 / 0.150 / 0.75 / 0.029 / -0.031 / 0.610 / 0.073 / 0.370
C vs. LNS / 0.415 / 0.002 / 0.417 / 0.057 / 0.113 / 0.087 / 0.021 / 0.480
C vs. LS / -0.045 / 0.690 / 0.75 / 0.029 / 0.036 / 0.260 / 0.229 / 0.170
C vs. N / 0.08 / 0.150 / 0.385 / 0.086 / 0.008 / 0.400 / -0.094 / 0.510
C vs. NS / 0.137 / 0.048 / 0.823 / 0.029 / -0.047 / 0.720 / 0.135 / 0.200
C vs. S / 0.134 / 0.057 / 0.542 / 0.029 / 0.131 / 0.064 / 0.135 / 0.064

Table S3 Response of preference group 2 (PG2) to individual and combined stressors. Influence of applied stressors on preference groups assessed with ANOSIM (pairwise test) and 10,000 replicates. Applied stressors (L = decreased flow velocity; LN = decreased flow velocity + enriched salt; LNS = all three stressors; LS = decreased flow velocity + enriched sediment; N = enriched salt; NS = enriched salt + enriched sediment; S = enriched sediment) are compared to the control ( C ).

PG2 / Litter Euk. / Biofilm Euk. / Litter Fungi / Biofilm Fungi
R-val. / p-val. / R-val. / p-val. / R-val. / p-val. / R-val. / p-val.
C vs. L / 0.052 / 0.210 / -0.074 / 0.570 / 0.065 / 0.170 / -0.074 / 0.620
C vs. LN / 0.103 / 0.130 / 0.146 / 0.170 / 0.103 / 0.130 / 0.271 / 0.140
C vs. LNS / 0.002 / 0.430 / 0.042 / 0.450 / 0.027 / 0.310 / 0.021 / 0.450
C vs. LS / -0.027 / 0.600 / 0.052 / 0.400 / -0.011 / 0.500 / 0.625 / 0.029
C vs. N / 0.013 / 0.390 / -0.021 / 0.540 / -0.001 / 0.440 / -0.021 / 0.450
C vs. NS / -0.027 / 0.560 / 0.083 / 0.280 / -0.034 / 0.580 / 0.208 / 0.110
C vs. S / -0.044 / 0.660 / 0.177 / 0.140 / -0.048 / 0.680 / 0.24 / 0.086

Table S4 Response of preference group 3 (PG3) to individual and combined stressors. Influence of applied stressors on preference groups assessed with ANOSIM (pairwise test) and 10,000 replicates. Applied stressors (L = decreased flow velocity; LN = decreased flow velocity + enriched salt; LNS = all three stressors; LS = decreased flow velocity + enriched sediment; N = enriched salt; NS = enriched salt + enriched sediment; S = enriched sediment) are compared to the control ( C ).

PG3 / Litter Euk. / Biofilm Euk. / Litter Fungi / Biofilm Fungi
R-val. / p-val. / R-val. / p-val. / R-val. / p-val. / R-val. / p-val.
C vs. L / -0.055 / 0.720 / -0.167 / 0.850 / -0.033 / 0.620 / 0 / 0.660
C vs. LN / 0.001 / 0.440 / 0.083 / 0.220 / 0.013 / 0.350 / 0.5 / 0.330
C vs. LNS / -0.008 / 0.480 / -0.198 / 0.970 / -0.032 / 0.630 / -0.5 / 1.000
C vs. LS / -0.051 / 0.740 / 0.01 / 0.450 / -0.051 / 0.730 / 0 / 0
C vs. N / -0.077 / 0.890 / 0 / 0.400 / -0.065 / 0.830 / 0.083 / 0.500
C vs. NS / 0.116 / 0.054 / -0.031 / 0.570 / 0.109 / 0.056 / 0 / 0.660
C vs. S / -0.066 / 0.790 / -0.01 / 0.620 / -0.049 / 0.700 / 0 / 0

PERMANOVA results on stressor impact

Table S5 Total community responses on applied stressors. Pseudo-F and p-values according to multifactorial PERMANOVA calculating 10,000 replicates. Compartment and (sub-)community responses are listed for single stressors (L = decreased flow velocity; S = enriched sediment; N = enriched salt) and interactions (“×”) of among stressors. Single stressor responses are visulaizedvisualized in Fig. 5, interactions in Fig. S1.

Compartment / Community / Parameter / L / S / N / L × S / L × N / S × N / L × S × N
Litter / Eukaryotic OTUs / Pseudo-F / 0.831 / 0.753 / 2.458 / 1.044 / 1.803 / 1.117 / 0.738
p / 0.713 / 0.832 / 0.000 / 0.380 / 0.015 / 0.284 / 0.847
Litter / Fungal OTUs / Pseudo-F / 0.750 / 0.743 / 1.831 / 1.038 / 1.688 / 0.955 / 0.652
p / 0.807 / 0.815 / 0.016 / 0.392 / 0.029 / 0.499 / 0.920
Litter / Non-fungal OTUs / Pseudo-F / 1.562 / 1.281 / 8.060 / 1.363 / 0.740 / 0.997 / 1.286
p / 0.099 / 0.213 / 0.000 / 0.172 / 0.706 / 0.422 / 0.202
Tile / Eukaryotic OTUs / Pseudo-F / 2.734 / 4.647 / 15.988 / 1.400 / 0.913 / 2.783 / 0.672
p / 0.025 / 0.004 / 0.000 / 0.222 / 0.468 / 0.027 / 0.644
Tile / Fungal OTUs / Pseudo-F / 0.851 / 2.711 / 1.620 / 0.536 / 0.753 / 1.645 / 1.492
p / 0.495 / 0.040 / 0.151 / 0.749 / 0.569 / 0.151 / 0.186
Tile / Non-fungal OTUs / Pseudo-F / 4.287 / 0.746 / 47.258 / 0.755 / 2.978 / 0.784 / 0.238
p / 0.021 / 0.490 / 0.000 / 0.478 / 0.051 / 0.464 / 0.836

Table S6 Preference group 1 (PG1) responses on applied stressors. Pseudo-F and p-values according to multifactorial PERMANOVA calculating 10,000 replicates. Compartment and (sub-)community responses are listed for single stressors (L = decreased flow velocity; S = enriched sediment; N = enriched salt) and interactions (“×”) of among stressors. Single stressor responses are visualized in Fig. 5, interactions in Fig. S1.