Large-scale comparative metagenomics of Blastocystis, a common member of the human gut microbiome

Running title: Metagenomic analysis of human Blastocystis

Francesco Beghini 1,*, Edoardo Pasolli 1,*, Tin Duy Truong 1, Lorenza Putignani 2, Simone M. Cacciò 3,^, Nicola Segata 1,^

  1. Centre for Integrative Biology, University of Trento, Via Sommarive 9, Trento 38123 Italy
  2. Units of Parasitology and Human Microbiome, Bambino Gesù Children's Hospital and Research Institute, Piazza Sant'Onofrio 4, Rome 00165 Italy
  3. Department of Infectious Diseases, Istituto Superiore di Sanità, Viale Regina Elena 299, Rome 00161 Italy

* Equal contribution.

^ Corresponding authors: ,

Supplementary Tables and Figures

Supplementary Table 1 (see uploaded file). Statistics for the nine reference genomes considered, which come from eight different STs. Numbers are reported for both the original genomes and the screened genomes. Screening was performed to remove potential bacterial and archaeal contamination.

Supplementary Table 2 (see uploaded file). Prevalence for each Blastocystis subtype in every category considered in Figure 1.

Supplementary Table 3 (see uploaded file). Breadth of coverage and relative abundance of Blastocystis in subjects infected over two timepoints.

Supplementary Table 4 (see uploaded file). Statistics for the 43 genomes associated with different STs reconstructed from the metagenomes.

Supplementary Table 5 (see uploaded file). Description of the 795 ST-specific KOG functions, which were determined after statistical significance testing with false-discovery rate correction. The table reports the KOG functions with an adjusted p-value less than 0.2.
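As an illustration of the adjusted p-values mentioned for Supplementary Table 5, the following is a minimal sketch of one common false-discovery rate correction, the Benjamini-Hochberg procedure. The legend does not state which correction was used, so the choice of procedure, the function name, and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used here only as an example of an FDR correction;
    the source does not name the procedure that was applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, not taken from the study.
example = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example)
# Keep only the entries below the 0.2 cutoff quoted in the legend.
selected = [i for i, q in enumerate(adjusted) if q < 0.2]
```

With a cutoff of 0.2 on the adjusted values, about one in five of the retained functions may be false discoveries, which is a more permissive filter than the usual 0.05.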

Supplementary Table 6 (see uploaded file). p-values from the statistical significance test for associations between Blastocystis presence and other organisms of the microbiome.

Supplementary Figure 1. Analysis on semi-synthetic data revealed that Blastocystis is detected through the developed methodology when present at a concentration as low as 0.03% in typical metagenomic samples of 50M reads.

Supplementary Figure 2. Blastocystis prevalence across different DNA extraction procedures.

Supplementary Figure 3. p-values from the statistical significance test for differences in the prevalence of specific STs across continents. "-" denotes a lack of statistical significance.

Supplementary Figure 4. GC content of the reconstructed genomes associated with the four most prevalent STs.

Supplementary Figure 5. Heatmap reporting the ST-specific KOG functions. Further details on these KOG functions are reported in Supplementary Table 5.

Supplementary Figure 6. The LEfSe analysis conducted on all the taxonomic levels extends the results reported in Figure 6E. Several microorganisms are statistically associated (α = 0.05) with Blastocystis presence or absence at high effect size (threshold at 3.3). Clostridiales and Firmicutes are associated with the presence of Blastocystis, whereas Bacteroides and Proteobacteria tend to be associated with its absence.

Supplementary Figure 7. Results additional to those reported in Figure 6A-D show that the presence (or absence) of Blastocystis is associated with major differences in the intestinal microbiome.

Supplementary Figure 8. The Gini-Simpson and Shannon indexes were used to estimate the alpha diversity in each dataset in the absence (red) or presence (green) of Blastocystis. In only one case (marked with the blue asterisk) did we observe a statistically significant difference between the two conditions.
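For reference, the two alpha-diversity indexes named in Supplementary Figure 8 can be computed from a vector of taxon abundances as in the sketch below. It uses the standard definitions (Shannon: minus the sum of p·ln p; Gini-Simpson: one minus the sum of p squared). The function names and the example abundances are hypothetical, and the natural logarithm is an assumption, since the legend does not state the base used.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def gini_simpson_index(abundances):
    """Gini-Simpson index 1 - sum(p_i ** 2)."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


# Hypothetical abundance profiles, not taken from the study.
even_sample = [25, 25, 25, 25]    # four equally abundant taxa
skewed_sample = [97, 1, 1, 1]     # one dominant taxon

even_shannon = shannon_index(even_sample)         # equals ln(4)
even_gini = gini_simpson_index(even_sample)       # equals 0.75
skewed_shannon = shannon_index(skewed_sample)
skewed_gini = gini_simpson_index(skewed_sample)
```

Both indexes are higher for the even profile than for the skewed one, which is the sense in which they measure within-sample diversity.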

Supplementary Figure 9. False discovery rate plot. The two plots show the average number of subtypes detected at a given breadth of coverage. The distribution ranges from seven (all the STs in addition to the dominant one) to one (only the dominant ST detected), but it already plateaus at 10% breadth of coverage, confirming that this threshold does not produce false positives.
