Supplementary methods, Zaura et al.,“On the ecosystemic network of saliva in healthy young adults”

This cross-sectional observational study was carried out within the Top Institute Food and Nutrition theme Oral Health, the project “Novel strategies to promote oral health”. The study was carried out on 268 healthy young adults (aged 18-32 years) after overnight fasting and refraining from all oral hygiene procedures for 24 h. The inclusion and exclusion criteria were as described in the Dutch Trial Register under trial number NTR3649 ( and by Prodan et al (Prodan et al., 2015) and Oliveira et al (Oliveira et al., 2015). Peripheral blood sample was drawn from the antecubital fossa into an EDTA-Vacutainer blood collection tube. Standard cell counts and biochemistry of the blood samples were analyzed in a hospital-based laboratory to confirm that the study individuals presented with blood values within the normal range. The study was conducted according to the Declaration of Helsinki and the study protocol was reviewed and approved by the Medical Ethical Committee of the Academic Medical Centre of Amsterdam (2012_210#B2012406).

Saliva collection and processing

Unstimulated saliva was collected as described previously (Prodan et al., 2015) in two sessions of 5 min with a 5 min interval between the two samples. Sample collection occurred between 9 and 10 am, before any food or drink intake that day, by drooling into a sterile ice-cooled vial. The first sample was immediately aliquotted into two vials and stored at -80°C for microbiological and metabolome analyses, respectively, while the second sample was processed for biochemical analyses (Prodan et al., 2015). These included salivary flow rate measurements, pH and buffered pH measurements, determination of the total protein content, salivary mucins (MUC5B and MUC7), albumin, lactoferrin, secretory IgA, Cystatin S and enzymatic activities of amylase, chitinase and lysozyme and protease activity in saliva as described by Prodan et al.(2015).

In brief, salivary pH was measured immediately after acquisition with a Eutech pH 5+ microelectrode pH meter (Thermo Scientific, USA). The pH buffering capacity of saliva was assayed by mixing 20 μl of saliva sample with 40 μl of 0.005 M HCl, followed by buffered pH measurement.

Total protein content was measured using a Pierce BCA Protein Assay Kit (Thermo Scientific) in 96-well polystyrene microplates (Greiner Bio-One), according to the manufacturer's specifications withbovine serum albumin (BSA) used as a standard. Optical readouts for the BCA assay and for all ELISA assays were obtained using a Multiscan FC microplate photometer (Thermo Scientific).

For the ELISA assays high-binding 96-well polystyrene microplates (Greiner Bio-One) were used.

For mucin quantification, 100 μl of each saliva sample was initially diluted 1:100 in coating buffer (0.1 M Na2CO3; pH 9.6) and 100 μl of the diluted saliva was added to a microplate well. Then, twofold serial dilutions of each saliva sample were made (in coating buffer) and the plates were incubated overnight at 4°C. MUC5B levels were then determined using monoclonal antibody (mAb) F2. MUC7 levels were determined using a polyclonal rabbit antibody, anti-MUC7. All microplates contained a reference sample consisting of pooled saliva from 10 volunteers. The concentrations of MUC5B and MUC7, respectively, were expressed as 1 arbitrary unit (AU) ml−1.

For the albumin assay, microplates were coated with a rabbit polyclonal antibody, anti-(human albumin) (Sigma-Aldrich, USA), overnight at room temperature. Saliva samples were subsequently serially diluted twofold and incubated for 2 h at 37°C. Captured salivary albumin was detected using horseradish peroxidase (HRP)-conjugated rabbit anti-(human albumin) (GeneTex, USA). Human serum albumin (Sigma-Aldrich) was used as a standard.

For the lactoferrin ELISA assay microplates were coated with polyclonal rabbit anti-(human lactoferrin) (Sigma-Aldrich). Purified human lactoferrin (Sigma-Aldrich) was used as a standard. Captured lactoferrin was assayed with HRP-conjugated rabbit anti-(human lactoferrin) (RayBiotech, USA).

For the secretory-IgA assay, microplates were coated overnight at room temperature with monoclonal rabbit anti-(human secretory IgA) (Sigma-Aldrich). The microplates were then incubated with the saliva samples for 2 h at 37°C. Captured secretory-IgA was detected with HRP-conjugated goat anti-(human IgA) (Sigma-Aldrich). A standard of purified human secretory IgA (Nordic-MUbio, Susteren, the Netherlands) was used.

For the Cystatin S assay, samples were coated directly onto microplates overnight at 4°C, together with a pooled saliva reference sample. Cystatin S was detected with a monoclonal mouse anti-(human cystatin S) antibody. Captured antibodies were detected with HRP-conjugated polyclonal rabbit anti-mouse (Sigma-Aldrich).

All fluorescence-based enzymatic activity assays were performed in black 96-well polypropylene microplates (Greiner Bio-One).

Amylase activity was measured using an EnzChek Ultra Amylase Assay kit (Thermo Scientific), according to the manufacturer's specifications. Chitinase activity was quantified by adding 50 μl of saliva to a substrate solution of 4-methylumbelliferyl β-d-N,N′,N′′-triacetylchitotrioside (Sigma-Aldrich) to a final reaction volume of 200 μl and a substrate concentration of 40 nM. The increase in fluorescence was subsequently acquired for 15 min at 37°C.

Lysozyme activity was measured using an EnzChek Lysozyme Activity kit (Thermo Scientific), according to the manufacturer's specifications.

Protease activity was measured based on the cleavage of a fluorescence resonance energy transfer (FRET) substrate. Fluorescence was recorded at 2 min intervals in a BMG Fluostar Galaxy microplate reader (MTX Lab Systems, USA).

All ELISAs and enzymatic activity assays were performed in duplicate.

Dietary assessment

A food frequency questionnaire (FFQ) was developed to assess habitual dietary intake in the previous month. The FFQ was based on consumption data of 20-40 year old participants of the Dutch National Food Consumption Survey of 2007-2010 (van Rossum et al., 2011) and the Dutch food composition database (RIVM 2010).Selection of food items and calculation of the weighted average nutrient composition of the food items were performed using the validated Dutch FFQ-tool™ (Molag, 2010). Foods that contributed at least 0.1% to the total level or to the between-person variation of intake of energy and nutrients of interest were selected. The FFQ was developed to assess intake of energy, macronutrients (protein, fat, linoleic acid, ALA, EPA, DHA, carbohydrates, mono- and disaccharides, polysaccharides, dietary fiber, alcohol), and micronutrients (calcium, vitamin B1, B2, B6, B12, D, C, folate equivalents, and organic acids). Comparable FFQs were validated for energy, fat, and B-vitamins (Feunekes et al., 1993; Siebelink et al., 2011; Verkleij-Hagoort et al., 2007). The final FFQ included questions on frequency and amount of intake of 130 food items. An online version of the FFQ was available for participants to fill in the questionnaires.

16S rRNA gene amplicon sequencing and data processing

For DNA isolation, saliva samples were thawed and homogenized by vortexing. 200 µl of saliva was added to a vial containing 200 µl of lysis buffer (Mag Mini DNA Isolation Kit, LGC ltd, UK), 250 µL zirconium beads (0.1mm; BioSpec products, Bartlesville, OK, USA) and 200 µL of phenol saturated with Tris-HCl (pH 8.0; Carl Roth GMBH, Germany). Samples were placed in a Mini-BeadBeater-96 ( BioSpec products, Bartlesville, OK, USA) for 2 min at 2100 oscillations/min. DNA was extracted using the Mag Mini DNA Isolation Kit.

To determine the amount of bacterial DNA, a quantitative polymerase chain reaction (qPCR) using primers specific for the bacterial 16S rRNA gene was applied (forward: TCCTACGGGAGGCAGCAGT; reverse: GGACTACCAGGGTATCTAATCCTGTT; probe: 6FAM-CGTATTACCGCGGCTGCTGGCAC-BHQ1)(Ciric et al., 2010).

For 16S rDNA amplicon sequencing of the V4 hypervariable region, 1 ng of DNA was amplified as described by Kozichet al.(Kozich et al., 2013) with the exception that 33 cycles were used instead of 35, using F515/R806 primers (Caporaso et al., 2011). Primers included the Illumina adapters and a unique 8-nt sample index sequence key (Kozich et al., 2013). The amount of DNA per sample was quantified using the Quant-iT™ PicoGreen® dsDNA Assay Kit (Thermo Fisher Scientific). The amplicon libraries were pooled in equimolar amounts and purified using the IllustraTM GFXTM PCR DNA and Gel Band Purification Kit (GE Healthcare, Eindhoven, The Netherlands). Amplicon quality and size was analyzed on an Agilent 2100 Bioanalyzer (Santa Clara, CA, USA). Paired-end sequencing of amplicons was conducted on the Illumina MiSeq platform (Illumina, Eindhoven, The Netherlands). The sequence data was processed with mothur v.1.31.2 (Schloss et al., 2009) in line with the mothurMiSeq SOP (Kozich et al., 2013). Before merging the read pairs, low quality regions were trimmed using Btrim(Kong, 2011) with a sliding window size of 5 nt and average quality score of 25. After merging, the sequences were filtered by length (range: 243 – 263), while no ambiguous bases were allowed. The unique sequences were aligned to the bacterial SILVA SEED reference alignment release 102 (available at: sequences were filtered usingscreen.seqs with parameters “optimize=start-end, criteria=90”. Chimeric sequences were identified per sample using UCHIME (Edgar et al., 2011) in de novo mode and removed from all samples.Next, sequences occurring less than 10 times in the entire dataset were removed. Taxonomic names were assigned to all sequences using the Ribosomal Database Project (RDP) naïve Bayesian classifier with a confidence threshold of 60%,1000 iterations (Wang et al., 2007) and the mothur-formatted version of the RDP training set v.9 (trainset9_032012). Sequences were clustered into operational taxonomic units (OTUs) using furthest neighbor hierarchical clustering with a hard cut-off of 3%, after which consensus taxonomy for the OTUs was generated using a confidence threshold of 60%. In addition, the OTU representative sequences were assigned a taxonomic name based on the Human Oral Microbiome Database (Chen et al., 2010). First, BLAST (Altschul et al., 1997) was used to obtain the top 20 matches in the HOMD database (BLASTN against HOMD 16S rRNARefSeq version 14.5 on homd.org; default parameters at that time: -q -5 -r 4 -G 5 -E 5). HOMD names were assigned at species level, if possible, only if the sequence similarity was equal or larger than 98.5% and if the entire query sequence was aligned. Between 95% and 98.5% sequence similarity assignments were done at genus level (or higher). In case of tied top-hits, i.e. the same percentage sequence similarity, the corresponding taxon names were combined, for example as “Veillonellaatypica/Veillonelladispar”. The obtained OTU-table was filtered to remove OTUs occurring at a fraction less than 10-5 and subsampled at equal depth (12000 reads/sample). The data is stored at dedicated database at TNO and is available upon request via

Metabolome assessment

All saliva samples were immediately stored at -80°C. At the time of analysis, samples were extracted, prepared and processed at Metabolon (Durham, NC, USA) as described previously (Evans et al., 2009). Samples were analyzed on three separate platforms: two integrated LC/MS-MS and one GC/MS. The LC/MS-MS portion of the platform was based on a Waters ACQUITY UPLC and a Thermo-Finnigan LTQ mass spectrometer with a linear ion-trap (LIT) mass analyzer front end and a Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometer backend. The sample extract was split into two aliquots, dried, then reconstituted in acidic or basic LC-compatible solvents. One aliquot was analyzed using acidic positive ion optimized conditions (eluted using water and methanol both containing 0.1% formic acid) and the other using basic negative ion optimized conditions (water/methanol, containing 6.5 mM ammonium bicarbonate) on separate, dedicated columns. The MS analysis alternated between MS and data-dependent MS2 scans using dynamic exclusion. The samples destined for GC/MS analysis were re-dried under vacuum desiccation prior to being derivatized with bistrimethyl-silyl-triflouroacetamide (BSTFA). The GC column was 5% phenyl and the temperature ramp was from 40°C to 300°C in a 16 min period. GC/MS samples were analyzed on a Thermo-Finnigan Trace DSQ fast-scanning single-quadrupole mass spectrometer using electron impact ionization. A data normalization step was performed to correct variation resulting from instrument inter-day tuning differences. Each compound was corrected in run-day blocks by registering the medians to equal one (1.00) and normalizing each data point proportionately. Overall process variability was determined by calculating the median Relative Standard Deviation (RSD) for all endogenous metabolites present in 100% of technical replicates of pooled sample. The RSD was 12%. Missing values were assumed to be caused by the compound in question being below the detection limit in the respective samples and were imputed with the minimum detected value for that compound across all samples after the normalization step. Due to differences in ionization potentials between different compounds, normalized peak areas were scaled by the median value for each compound, i.e., each value for a particular compound in a sample was expressed as a multiplication of the median value for that compound across all samples in which it was detected. The normalized metabolome dataset provided by Metabolon was range-scaled between 0 and 10, except for Elastic Net analyses. Metabolites with a single value were omitted from statistical analysis. This resulted in a dataset with 493 metabolites.

Statistical analyses

Spectral clustering (SC) was performed using the neighborhood co-regularized SC algorithm (Imangaliyev et al., 2015; Tsivtsivadze et al., 2013), ( based on the SC method (Luxburg, 2007). For SC on microbiome, the OTU data was pre-processed by removing rare OTUs, i.e., OTUs with a count of less than 5 (per individual) in less than 10 individuals, then sample-wise normalized to a sum of one and feature-wise range-scaled from 0 to 10. For SC on metabolome, the range-scaled metabolome dataset with 493 metabolites without sample-wise normalization was used. Next, per each dataset, a similarity matrix was calculated based on the Euclidean distances between each pair of samples. A co-occurrence matrix was subsequently calculated based on the clustering results, quantifying the tendency of any two samples to fall within the same cluster over many k-means clustering assignments using varying parameters. After defining the number of clusters, examples were assigned to the clusters using a probabilistic decomposition algorithm (ter Braak et al., 2009).

Principal component analysis (PCA), permutational analysis of variance (PERMANOVA), Shannon diversity index and Chao-1 estimate of species richness was calculated using PAST software v. 3.04 (Hammer et al., 2001). Microbiome data was log2-transformed before PCA.

Differences among the clusters in host parameters and in microbiome alpha diversity were assessed using Kruskal-Wallis test and Mann-Whitney test, the differences in gender – by Pearson Chi-square test in SPSS version 21. False Discovery Rate correction of the P values for multiple comparisons was performed in R.

LDA Effective Size (LEfSe) biomarker discovery tool (Segata et al., 2011) was used with one against all strategy for multiclass analysis with factorial Kruskal-Wallis test among classes and logarithmic LDA score >2, P<0.05 (default) to discriminate the SC clusters.

Assessment of significant patterns of microbial co-occurrence or mutual exclusion was performed using CoNetv.1.0b6 (Faust et al., 2012) and visualized in Cytoscape v. 3.3.0. For this, the taxonomic names in theOTU table were made non-redundant by combining OTUs with identical taxonomic species assignment. Separate analyses were performed per each SC cluster, including only taxa that were present in at least 90% of the samples in the respective cluster. An ensemble approach was used, where two measures of correlation (Spearman and Kendall) and two measures of dissimilarity (Bray Curtis and Kullback-Leibler) were combined as described by Faust et al.,(Faust et al., 2012) with a threshold of 0.6 and at least two methods supporting an association. The data matrix was randomized by 100 row-wise permutations and P values were adjusted by Benjamini-Hochberg FDR (Benjamini and Hochberg, 1995)correction, retaining only P<0.05.

High-dimensional feature selection by Significance Analysis of Microarrays (SAM) analysis (Tusher et al., 2001) in TIGR MultiExperiment Viewer (111.tm4.org/mev.html) was used to compare scaled metabolite data among metabolome sample clusters and to assess the differences in host-related SC clusters and microbiome SC clusters by metabolites.

Elastic Net regression, a regularization and variable selection method, was used to select features (OTUs or metabolites) that were related to the host environmental variables. It is a regularized regression model that combines the penalties of the least absolute shrinkage and selection operator (LASSO) and ridge regression methods (Zou and Hastie, 2005). Here, we applied Elastic Net with Stability Selection (Meinshausen and Bühlmann, 2010). Different parameter sets were used, and for each parameter set a model was trained on 80% of the data with ten-fold cross-validation, selecting the most stable features. For the stability selection 100 runs were performed, while the range between maximum stability and minimum stability values for features to be reported was set at 20%. The Elastic Net with Stability Selection was implemented in Python using scikit-learn v.0.16.1 (Pedregosa et al., 2011). The Elastic Net mixing parameter was set at 0.3 to 0.8 in 20 steps. The model's alpha parameter ranged from 10-2 to 102 in 150 steps. The raw metabolome data was filtered for use with Elastic Net as follows: First, all unidentified metabolites and metabolites with more than 25% missing measurements were removed from the raw data. Second, missing values were imputed by the minimum value per metabolite (assuming this is the detection limit). Third, all metabolites were scaled to zero mean and unit variance. Finally, outliers, defined as metabolite measurements ≥3 standard deviations from the mean and individual samples with more than 45 of such metabolites, were removed. This filtered metabolome dataset or the top 400 largest OTUs (scaled to zero mean and unit variance)and a biochemical variable (centered to zero mean) were used as input data for the Elastic Net. For each feature, the stability (max. 1), average weight and standard deviation were reported.

The relation between the microbiome (top 400 OTUs) and the range-scaled metabolome dataset (all 493 metabolites) was assessed using Spearman Correlation with Bonferroni corrected p-values. The concordance between the two datasets was assessed using Procrustes Analysis (Gower, 1975) and the Mantel test (Mantel, 1967) implemented in QIIME (Caporaso et al., 2010) v.1.8.0, both with 10000 random permutations. The PCA coordinates as well as the Bray-Curtis dissimilarities were calculated in PAST using the log2-transformed microbiome data and the range-scaled metabolome data. The network layout was made using Gephi v 0.9.1. (Bastian et al., 2009).