Supplementary Information

Materials and Methods

Statistical analyses - A hypergeometric distribution can be generated through a series of Monte Carlo simulations (Fig. S1) under two conditions: 1) sampling is random and without replacement, and 2) each outcome is either a success or a failure [1]. If a process under investigation meets these two conditions, the hypergeometric probability function gives the probability of each possible level of representation. Here, it is used to model the experiment and to yield the probability that a given number of the prominent bacterial families carry genes encoding an enzyme of interest.
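The link between repeated random sampling and the hypergeometric probabilities can be illustrated with a minimal Python sketch. It is not the authors' simulation: the function names, the toy parameters (N = 50, m = 10, n = 5), the number of trials, and the seed are all assumptions chosen for illustration. Each trial draws n items without replacement from a population containing m successes and records how many successes were drawn.

```python
import random
from collections import Counter
from math import comb


def simulate_hypergeometric(N, m, n, trials, seed=0):
    """Empirical distribution of successes in n draws without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    population = [1] * m + [0] * (N - m)  # 1 = success, 0 = failure
    counts = Counter(sum(rng.sample(population, n)) for _ in range(trials))
    return {k: counts[k] / trials for k in range(min(m, n) + 1)}


def exact_pmf(k, N, m, n):
    """Exact hypergeometric probability of k successes."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


# Toy parameters (hypothetical, not the values used in the study).
empirical = simulate_hypergeometric(N=50, m=10, n=5, trials=20000)
for k, freq in empirical.items():
    print(k, round(freq, 4), round(exact_pmf(k, 50, 10, 5), 4))
```

With enough trials, the simulated frequencies should approach the exact probabilities, which is why the closed-form function can stand in for the simulation.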

The hypergeometric test was performed in R using the phyper function [2] with the lower.tail argument set to FALSE for all bacterial groups capable of utilizing or producing the aforementioned compounds. The population size was 772, the total number of bacterial families present on the G2 PhyloChip under the updated taxonomy. The sample size was the number of distinct bacterial families found to be highly abundant in ESRD patients or in control individuals. The number of bacterial families in each reference set was obtained from KEGG or from reference articles, as described in the main text. The number of families in the sample set considered to possess a metabolic activity of interest was determined by comparison with the relevant reference set.
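The test described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script. The helper names are hypothetical, and the two family sets are small excerpts (six ESRD-associated families named in the Results and four entries from Table S2), so m, n, and k here are toy values; only N = 772 comes from the text. In R, phyper(q, m, N - m, n, lower.tail = FALSE) returns P(X > q). The text does not state which value of q was passed, so the sketch assumes q = k - 1, which gives P(X >= k).

```python
from math import comb


def hypergeom_pmf(k, N, m, n):
    """P(X = k) for n draws without replacement from N items, m of them successes."""
    if k < max(0, n - (N - m)) or k > min(m, n):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def hypergeom_upper_tail(q, N, m, n):
    """P(X > q), the quantity R's phyper returns when lower.tail = FALSE."""
    return sum(hypergeom_pmf(i, N, m, n) for i in range(q + 1, min(m, n) + 1))


# Illustrative excerpts only (not the full sample or reference sets).
sample_families = {"Beutenbergiaceae", "Cellulomonadaceae", "OM60",
                   "SUP05", "Verrucomicrobiaceae", "Xanthomonadaceae"}
reference_families = {"Bacillaceae", "Cellulomonadaceae",
                      "Clostridiaceae", "Xanthomonadaceae"}

N = 772                                          # families on the G2 PhyloChip
m = len(reference_families)                      # reference-set size (toy value)
n = len(sample_families)                         # sample-set size (toy value)
k = len(sample_families & reference_families)    # overlap found via the reference set

# Assumed convention: probability of observing at least k overlapping families.
p_at_least_k = hypergeom_upper_tail(k - 1, N, m, n)
print(k, p_at_least_k)
```

Passing q = k instead of k - 1 would give the strict tail P(X > k), so the choice of convention changes the reported probability.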

Results

Taxonomy Update of Bacterial OTU and Families

The recent Greengenes taxonomy [3] was used to update the annotations of OTUs found to be significantly increased in ESRD patients [1]. OTUs previously annotated under the following families were reclassified during the reannotation (applied to all OTUs for consistency of reporting): Brachybacterium, Catenibacterium, and Peptostreptococcaceae (re-annotated with 100% identity to be within Dermabacteraceae, Coprobacillaceae, and Clostridiaceae, respectively). The OTUs previously classified in the families called Nesterenkonia_FM and Thiotrix_FM are now assigned to the Micrococcaceae and Thiotrichaceae families, respectively. The reannotation also allowed us to assign taxa to 13 previously unclassified OTUs (Table 1). As a result, the total number of significantly abundant bacterial families in the ESRD group increased to 19 with the addition of the Beutenbergiaceae, Cellulomonadaceae, OM60, SUP05, Verrucomicrobiaceae, and Xanthomonadaceae families. Similarly, three of the four OTUs that had two- to three-fold lower average abundances in the ESRD samples were also reclassified: Sutterellaceae (1 OTU) and Bacteroidaceae (2 OTUs) are now classified as the Alcaligenaceae and Prevotellaceae families, respectively. See Table S1 for a complete list of families after reannotation.

References

1. Kroese DP, Taimre T, Botev ZI. Handbook of Monte Carlo Methods. New York, USA: Wiley Series in Probability and Statistics, John Wiley and Sons; 2011.

2. Kachitvichyanukul V, Schmeiser B. Computer generation of hypergeometric random variates. Journal of Statistical Computation and Simulation 1985;22(2):127–145.

3. McDonald D, Price MN, Goodrich J, et al. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 2012;6(3):610–618.

Tables and Figures

Table S1 – Changes to bacterial taxonomic names after re-annotation using a more recent Greengenes taxonomy (gg_12_10).

Bacterial Functional Gene Reference Sets

Table S2 – A set of bacterial families with publicly reported strains possessing genes encoding the urease enzyme.

Reported Families with Urease Genes
Acaryochloridaceae
Acetobacteraceae
Acidimicrobiaceae
Aeromonadaceae
Alcaligenaceae
Alcanivoracaceae
Alicyclobacillaceae
Alphaproteobacteria unclassified
Alteromonadaceae
Aquificaceae
Bacillaceae
Bacteroidales unclassified
Beijerinckiaceae
Betaproteobacteria unclassified
Bifidobacteriaceae
Blattabacteriaceae
Bradyrhizobiaceae
Brucellaceae
Burkholderiaceae
Burkholderiales unclassified
Caldilineaceae
Campylobacteraceae
Catenulisporaceae
Cellulomonadaceae
Chamaesiphonaceae
Chitinophagaceae
Chromatiaceae
Chroococcaceae
Clostridiaceae
Comamonadaceae
Conexibacteraceae
Corynebacteriaceae
Cyanobacteriaceae
Cystobacteraceae
Cystobacterineae
Cytophagaceae
Deinococcaceae
Dermabacteraceae
Dermocarpellaceae
Desulfovibrionaceae
Ectothiorhodospiraceae
Enterobacteriaceae
Family XI. Incertae Sedis
Family XVII. Incertae Sedis
Flavobacteriaceae
Francisellaceae
Frankiaceae
Geodermatophilaceae
Gomontiellaceae
Gordoniaceae
Hahellaceae
Haliangiaceae
Halomonadaceae
Helicobacteraceae
Herpetosiphonaceae
Hydrococcaceae
Hyphomicrobiaceae
Hyphomonadaceae
Kineosporiaceae
Lactobacillaceae
Magnetococcaceae
Merismopediaceae
Methylobacteriaceae
Methylococcaceae
Methylophilaceae
Micrococcaceae
Microcystaceae
Micromonosporaceae
Moraxellaceae
Mycobacteriaceae
Mycoplasmataceae
Myxococcaceae
Nakamurellaceae
Neisseriaceae
Nitrosomonadaceae
Nocardiaceae
Nocardiopsaceae
Nostocaceae
Oceanospirillaceae
Opitutaceae
Oscillatoriaceae
Oxalobacteraceae
Paenibacillaceae
Pasteurellaceae
Peptococcaceae
Phormidiaceae
Phycisphaeraceae
Phyllobacteriaceae
Piscirickettsiaceae
Planctomycetaceae
Polyangiaceae
Porphyromonadaceae
Promicromonosporaceae
Propionibacteriaceae
Pseudanabaenaceae
Pseudoalteromonadaceae
Pseudomonadaceae
Pseudonocardiaceae
Psychromonadaceae
Puniceicoccaceae
Rhizobiaceae
Rhodobacteraceae
Rhodocyclaceae
Rhodospirillaceae
Rikenellaceae
Rivulariaceae
Ruminococcaceae
Segniliparaceae
Shewanellaceae
Sphingomonadaceae
Staphylococcaceae
Streptococcaceae
Streptomycetaceae
Streptosporangiaceae
Synechococcaceae
Teredinidae
Thermoanaerobacterales Family III. Incertae Sedis
Thermodesulfobacteriaceae
Thermomicrobiaceae
Thermomonosporaceae
Tsukamurellaceae
Vibrionaceae
Xanthobacteraceae
Xanthomonadaceae
Xenococcaceae

Table S3 – A set of bacterial families with publicly reported strains possessing genes encoding the uricase enzyme.

Reported Families with Uricase Genes
Acidobacteriaceae
Alicyclobacillaceae
Bacillaceae
Catenulisporaceae
Cellulomonadaceae
Chitinophagaceae
Deinococcaceae
Dermabacteraceae
Frankiaceae
Geodermatophilaceae
Glycomycetaceae
Kineosporiaceae
Methylobacteriaceae
Microbacteriaceae
Micrococcaceae
Micromonosporaceae
Mycobacteriaceae
Nakamurellaceae
Nocardiaceae
Nocardioidaceae
Nocardiopsaceae
Paenibacillaceae
Phycisphaeraceae
Planctomycetaceae
Polyangiaceae
Propionibacteriaceae
Pseudonocardiaceae
Rhizobiaceae
Rubrobacteraceae
Solibacteraceae
Streptomycetaceae
Streptosporangiaceae
Trueperaceae
Xanthobacteraceae

Table S4 – A set of bacterial families with publicly reported strains possessing a gene encoding the tryptophanase enzyme.

Reported Families with Tryptophanase Gene
Acidobacteria unclassified
Aeromonadaceae
Bacteroidaceae
Brachyspiraceae
Bradyrhizobiaceae
Burkholderiaceae
Burkholderiales unclassified
Caldilineaceae
Clostridiaceae
Clostridiales unclassified
Comamonadaceae
Corynebacteriaceae
Cystobacterineae
Cytophagaceae
Desulfovibrionaceae
Elusimicrobiaceae
Enterobacteriaceae
Family XVIII. Incertae Sedis
Flavobacteriaceae
Fusobacteriaceae
Halanaerobiaceae
Hyphomicrobiaceae
Hyphomonadaceae
Ignavibacteriaceae
Intrasporangiaceae
Methylocystaceae
Micromonosporaceae
Myxococcaceae
Natranaerobiaceae
Neisseriaceae
Nocardioidaceae
Opitutaceae
Pasteurellaceae
Peptococcaceae
Planctomycetaceae
Porphyromonadaceae
Prevotellaceae
Propionibacteriaceae
Pseudonocardiaceae
Rhizobiaceae
Rhodobacteraceae
Rhodocyclaceae
Rhodospirillaceae
Rhodothermaceae
Rikenellaceae
Saprospiraceae
Shewanellaceae
Solibacteraceae
Spirochaetaceae
Streptomycetaceae
Synergistaceae
Syntrophaceae
Thermaceae
Thermoanaerobacteraceae
Thermodesulfobiaceae
Thermotogaceae
Verrucomicrobiaceae
Vibrionaceae

Table S5 – A set of bacterial families reported in published journals to be capable of producing p-Cresol in the gut of animals.

Reported Families Capable of p-Cresol Production
Bacteroidaceae
Clostridiaceae
Enterobacteriaceae
Lactobacillaceae

Table S6 – A set of bacterial families with publicly reported strains possessing a gene encoding the phosphotransbutyrylase enzyme.

Reported Families with Phosphotransbutyrylase Gene
Acetobacteraceae
Acidithiobacillaceae
Alphaproteobacteria unclassified
Bacillaceae
Bacillales Family XII. Incertae Sedis
Bacteroidaceae
Beijerinckiaceae
Bradyrhizobiaceae
Burkholderiaceae
Caulobacteraceae
Chlorobiaceae
Chloroflexaceae
Clostridiaceae
Comamonadaceae
Cystobacterineae
Deferribacteraceae
Deinococcaceae
Desulfomicrobiaceae
Desulfovibrionaceae
Ectothiorhodospiraceae
Elusimicrobiaceae
Enterococcaceae
Erysipelotrichaceae
Eubacteriaceae
Geobacteraceae
Halanaerobiaceae
Halobacteroidaceae
Lachnospiraceae
Lactobacillaceae
Listeriaceae
Methylobacteriaceae
Myxococcaceae
Natranaerobiaceae
Peptococcaceae
Peptostreptococcaceae
Phyllobacteriaceae
Planctomycetaceae
Porphyromonadaceae
Prevotellaceae
Rhodobacteraceae
Rhodospirillaceae
Spirochaetaceae
Staphylococcaceae
Synergistaceae
Syntrophaceae
Thermaceae
Thermoanaerobacteraceae
Thermoanaerobacterales Family III. Incertae Sedis
Thermodesulfobiaceae
Thermotogaceae
Vibrionaceae

Table S7 – A set of bacterial families with publicly reported strains possessing a gene encoding the butyrate kinase enzyme.

Reported Families with Butyrate Kinase Gene
Acidaminococcaceae
Bacillaceae
Bacillales Family XII. Incertae Sedis
Bacteroidaceae
Caldisericaceae
Clostridiaceae
Coriobacteriaceae
Cystobacterineae
Deferribacteraceae
Deinococcaceae
Desulfomicrobiaceae
Desulfovibrionaceae
Elusimicrobiaceae
Enterobacteriaceae
Enterococcaceae
Erysipelotrichaceae
Eubacteriaceae
Family XVIII. Incertae Sedis
Fusobacteriaceae
Geobacteraceae
Halanaerobiaceae
Halobacteroidaceae
Lachnospiraceae
Lactobacillaceae
Listeriaceae
Myxococcaceae
Natranaerobiaceae
Pelobacteraceae
Peptococcaceae
Peptostreptococcaceae
Porphyromonadaceae
Prevotellaceae
Rhodobacteraceae
Rikenellaceae
Spirochaetaceae
Staphylococcaceae
Synergistaceae
Syntrophaceae
Thermaceae
Thermoanaerobacteraceae
Thermoanaerobacterales Family III. Incertae Sedis
Thermodesulfobiaceae
Thermotogaceae
Vibrionaceae

Figure S1. Probability formula for generating a hypergeometric distribution (N = total number of families designed to be on the G2 PhyloChip; m = number of known families identified as possessing a certain enzymatic function; n = number of bacterial families with significant intensity differences between the ESRD and control groups; k = number of families possessing that function among the 19 found to be significantly more abundant in ESRD patients or the 3 determined to show the largest increase in control individuals).

Hypergeometric Distribution Probability Function
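For reference, the standard hypergeometric probability function, written with the symbols defined in the Figure S1 caption, takes the following form. This is the textbook statement of the distribution, not a transcription of the original figure, and the tail sum assumes the convention used by R's phyper when lower.tail is FALSE.

```latex
% Probability of exactly k reference-set families among the n sampled families
P(X = k) = \frac{\binom{m}{k}\,\binom{N - m}{n - k}}{\binom{N}{n}},
\qquad \max(0,\, n - (N - m)) \le k \le \min(m, n)

% Upper tail returned by phyper(q, m, N - m, n, lower.tail = FALSE)
P(X > q) = \sum_{i = q + 1}^{\min(m, n)} \frac{\binom{m}{i}\,\binom{N - m}{n - i}}{\binom{N}{n}}
```

Here the numerator counts the ways to choose k families from the m with the function and n - k from the N - m without it, and the denominator counts all ways to choose n families from N.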