Online resource methods
Can We Trust Untargeted Metabolomics?: Results of the Metabo-Ring initiative, a large-scale, multi-instrument inter-laboratory study
Sample collection and preparation
- Test #1
For 20 hours before sample collection, volunteers were asked to avoid alcohol and medication and to moderate their physical activity. They were instructed to eat a dinner between 7 and 9 pm comprising 2 regular slices of ham, approximately 100 g dry weight of spaghetti with 20 g of butter, and one plain yoghurt with 20 g of refined sugar, accompanied by tap water only. The volunteers were asked to fast until urine collection on the following morning. In the morning, the totality of the bladder was emptied, and the second void was collected in the laboratory in 50 mL Falcon tubes with a screw cap.
The added mixture of 32 standard molecules contained (online resource Table 1) ascorbic acid, citrulline, creatinine, taurine, uric acid, caffeine, glutaric acid, inosine, isoleucine, leucine, pyroglutamic acid, methionine, methylmalonic acid, N-methylhistidine, aminobenzoic acid, phenylalanine, proline, riboflavin, adenine, adenosine, adipic acid, azelaic acid, caffeic acid, tryptophan, tyrosine, uracil, uridine, chenodeoxycholic acid, cholic acid, cortisone, deoxycholic acid, and glycocholic acid. A stock solution was prepared containing all the compounds, each at a final concentration of 0.1 mg/mL. For LCMS analysis, 100 µL of the stock solution was evaporated to dryness under a flux of nitrogen and resuspended with 100 µL of the urine sample. The mixture was centrifuged at 14000 RPM for 10 min at 4°C, and 50 µL of the supernatant was removed and diluted with 150 µL of extrapure water (MilliQ). After centrifugation at 3000 RPM for 10 min at 4°C, an aliquot of the supernatant was transferred to an LC vial for LC-MS analysis. For NMR analysis, 1.5 mL of the standard mix stock solution was dried and rediluted with 500 µL of urine. Prior to analysis, 200 µL of phosphate buffer was added to each of the urine samples (phosphate buffer (pH 7.4): Dissolve 2.885 g anhydrous Na2HPO4, 0.525 g anhydrous NaH2PO4, 17.2 mg sodium 3-trimethylsilylpropionate-d4 (TSP), 60 mg sodium azide (NaN3) in 100 mL H2O/D2O (80%/20% vol/vol),see Beckonert et al, Nature Protocols, 2007, 2, 11, 2692-2703), and the mixture was centrifuged at 14000 RPM and 4°C for 10 min. Five hundred mL of the supernatant was then transferred to an NMR tube. A pre-study using various concentrations of the standard mixture allowed the procedures to be chosen to enable optimal detection, with the concentrations of each compound brought to 25 µg/mL, and 250 µg/mL in the urine samples to be analysed by LCMS andNMR, respectively. A quality control (QC) sample for either the spiked or non-spiked urine and a blank sample were also prepared by pipetting and merging 50 µL of each respective urine sample or by replacing the urine with extrapure water (blank), respectively.
All the samples were prepared in one laboratory and dispatched to all the participants in dry ice. The samples were stored at -80°C until analysis within 1-3 months. Two vials per sample were provided for duplicate analysis.
The urine density was also measured to apply a correction factor whenever necessary (see experimental section in the manuscript and online resource Table 5).
In addition, as specified in the experimental section, an outlier (pregnant woman at mid-gestation) was blindly introduced, and the participants were asked to seek for an outlier individual and to identify and annotate the discriminating features; these are reported in the following online resource Table 4.
- Test #2: Diet composition
see online resource Table 2
- Statistics
RV coefficients
The RV coefficient between the two data tables X1 and X2 was calculated as follows:
Where:
W1=X1*X1T
W2=X2*X2 T
t_W1 W1=trace(W1*W1)
t_W2 W2=trace(W2*W2)
t_W1 W2=trace(W1*W2)
The RV coefficients matrix was calculated based on the 22 data tables obtained from Test #1 and on the 12 data tables obtained from Test #2. For each Test, the RV coefficients indicated the strength of the association between the data tables.For each data Table, the average RV coefficient was calculated as the average of all the RV coefficients between this Table and all other tables. A value close to one indicates that the corresponding Table contains similar information to all the others. To test the significance of the assessment of the RV coefficients, the coefficients were re-calculated after 100 random re-samplings. For each of these, the observations (i.e., human urine samples for Test #1 and plasma rat samples for Test #2) were ranked in a random order independently in all data tables. Therefore, the assumption was that the RV coefficients calculated on randomly ordered samples would be close to zero and much lower than the RV coefficients calculated on the original data. For each instrument, the distribution of the 100 random RV coefficients was graphically represented using boxplots (Figure 1). The average RV coefficients within the NMR and LCMS instruments were also compared.
Common components and specific weights analysis (CCSWA or ComDim)
The aim of CCSWA is to investigate the relationships among several data tables. Derived from the consensus or global scores of all the tables, T(:,i), local scores were calculated for each Table, j, and each Common Component, i, by the following procedure:
Local Loadings, LocalLx{j}(i,:), were calculated for each centred and normalised data Table, Table{j}, and each Common Component, i, as:
LocalLx{j}(i,:) =inv( T(:,i)' * T(:,i) ) * T(:,i)' * Table{j};
For common component 1, Table {j} is used as is; for subsequent common components, it is deflated first.
These Local Loadings were then used to calculate the corresponding Local Scores, Local_T(j,:,i)
Temp = ( LocalLx{j} * Table{j}' )';
Local_T(j,:,i) = Temp(:,i) / norm( Temp(:,i) );
A local score represents the projection of individuals (samples) in the common space for a given data Table. By projecting the samples using both global scores and local scores onto the same plot, the dispersion of all the data tables around the consensus scores in the common space can be visualised.
Identification of discriminating features based on the CCSW analysis in Test #1
One of the common components calculated by the CCSW analysis gave complete separation between the non-spiked group and the spiked group. Given the experimental setup, this common component should explain the maximum amount of common variance shared by all instruments (e.g., common spectral information). Thus, the CCSW analysis calculated scores for all samples that represented the common component. For each data Table, correlations between the common component and the features were calculated. Features with a correlation greater than 0.8 were considered as discriminating features. The number of discriminating features selected using the CCSW analysis as well as the list of these features were matched and compared with the discriminating features selected independently by platform operators.
Scaling
Each data Table was scaled independently to provide comparable data between instruments. Because NMR and LCMS instruments provide data with different characteristics, two different scalings were applied. The rationale was to standardise each data Table in order to retain the specific structure of each method while avoiding particular tables that had an effect because they were similar due to data size. The data extracted from the NMR instruments were Pareto-scaled 14. The data produced by the LCMS instruments were first Log10 transformed after the addition of a constant to avoid negative values. Then, the Log10 transformed data were Pareto-scaled.
Data acquisition and post-processing
- Participant N1
Urine samples were prepared by mixing urine (500 µl) with phosphate buffer (200 µl, pH 7.4, 0.2 M) containing 10% D2O as a field frequency lock and sodium trimethylsilyl-2,2,3,3-tetradeuteropropionate (TSP, internal standard) as a chemical shift reference. Buffered urine samples were then centrifuged at 13,000 g for 10 min to remove any precipitates, and aliquots of 0.6 mL were transferred to 5 mm NMR tubes for 1H NMR analysis.
All 1H NMR spectra were acquired at 300 K on a Bruker DRX-600 Avance NMR spectrometer operating at 600.13 MHz for 1H resonance frequency using an inverse detection 5 mm 1H-13C-15N cryoprobe. The spectra were collected using a solvent suppression pulse sequence based on a 1D nuclear Overhauser effect spectroscopy pulse sequence (Trd-90°-t1-90°-tm-90°-Taq) to saturate the residual water signal, first during the relaxation delay (Trd=2s) and second during the mixing time (tm=150 ms). The delay t1 was held constant at 3 s. For each 1D NOESY spectrum, 128 free induction decay values (FIDs) were collected into 32,768 data points using a spectral width of 12,000 Hz with an acquisition time of 1.36 s. For all the spectra, the FIDs were multiplied by an exponential weighting function corresponding to a line broadening of 0.3Hz and zero-filled before Fourier transformation. The acquired NMR spectra were phase and baseline corrected manually and referenced to the TSP signal (=0 ppm).
The urine spectra were data-reduced prior to statistical analysis using AMIX software (Analysis of Mixtures, Bruker, v3.9.11, Karlsruhe, Germany). The spectral region 0.5-10.0 ppm was segmented into consecutive non-overlapping regions of 0.01 ppm (buckets). The region 4.5-6.5 ppm in the urine spectra was excluded from the statistical analysis to eliminate artefacts from residual water and urea resonances. To minimise possible differences in concentration between the samples, each integrated region was normalised to the total spectral area. The values of all variables were Pareto-scaled before PCA (Principal Component Analysis) and PLS-DA (Partial Least Squares Discriminant Analysis) analyses using SIMCA-P software (version 12, Umetrics, Umea, Sweden). Discriminant variables were determined using VIP (Variable Importance in the Projection), and an arbitrary threshold of VIP>1.5 was chosen to select the variables. The spectral assignments were based on matching the 1D and 2D data to reference spectra in a database ( and to reports in the literature.
- Participant N2
Sample preparation:
Phosphate buffer (pH 7.4) was prepared from anhydrous sodium phosphate dibasic (2.885 g),anhydrous sodium phosphate monobasic (0.525 g), sodium 3-trimethylsilylpropionate-d4 (17.2 mg), and sodium azide (60 mg) dissolved in 100 mL of a mixture of water and deuterium oxide (80% :20%; vol : vol). The phosphate buffer solution (200 μL) was added to urine (500 μL), homogenised by vigorous shaking, and centrifuged at 20000 g and 4°C for 10 min. The supernatant (600 μL) was transferred to an NMR tube.
Spectroscopic analysis:
NMR spectra were acquired on a Bruker AVANCE III 600 spectrometer (Magnet system 14.09 T 600 MHz/54 mm operating at 600.17 MHz for 1H) using a TXI 5 mm z-gradients probe and running the TOPSPIN 2.1 (Bruker) software. Automation was controlled by the IconNMR software. The sample temperature was regulated at 300 K with an allowance of approximately 5 min for setting the temperature before data acquisition. Automatic tuning and matching of the probe was carried out for each sample. The samples were not spun. Standard shim settings from a reference file were loaded with each new sample.D2O (from buffer) was used for the field/frequency lock and shims were optimised for each sample using the Topshim programme for gradient shimming (Z1 to Z6) followed by trimming of (Z1, Z2, X, Y) the shims. The 900 pulse length was determined for the first sample and applied to all subsequent samples.
For each sample 1D proton with water suppression, pulse sequence (NOESY 1D) and 2D proton 1H (JRES) were performed [1]. The sequence for the NOESY 1D repeat was –D1-t-90°-t-90°-tm-90°-AQ, where D1 (4 s) is the relaxation delay, 90° is the already determined 90° radio-frequency pulse length, t (4 µs) is a very short delay, tm (0.15 s) is a mixing time delay and AQ (3.90 s) is the data acquisition time. Low power rf irradiation was applied at the water frequency during D1 and tm to pre-saturate the water signal. Spectra were run with a fixed receiver gain (RG=64). Each spectrum consisted of 128 scans and 8 dummy scans of 64K data points with a spectral width of 14 ppm. Furthermore, two-dimensional J-resolved NMR spectra were acquired. The pulse sequence comprises relaxation delay—90°—(t1/4)—180°—(t1/2)—180°—(t1/4)—acquisition, where t1 is an incremented time delay used to create an indirect time axis for the second dimension. 2D 1H JRES NMR spectra were acquired with a 2.0 s relaxation delay using 4 transients per 64 increments that were collected into 32k data points, using spectral widths of 8417.5 Hz along the direct dimension (i.e., chemical shift axis) and of 80 Hz along the indirect dimension (i.e., spin–spin coupling axis) and using excitation sculpting for the water suppression.
On samples (K1, N11, N5), HSQC, 2D H-1/X correlation via double inept transfer using sensitivity improvement, phase sensitive using Echo/Antiecho-TPPI gradient selection with decoupling during acquisition using trim pulses in inept transfer using shaped pulses for all 180 degree pulses on f2 – channel with gradients in back-inept spectra were recorded. The spectra were acquired with a 1.5 s relaxation delay using 32 scans for K1 and 128 scans for N1 and N5 per 256 increments that were collected into 2K data points using spectral widths of 8417.5 Hz in F2 and 30185.3 Hz in F1. All 2D spectra were phased and calibrated at 0.0 ppm to TMSP.
Data analysis:
The time domain signals obtained with the NOESY 1D sequence were Fourier transformed with a 0.3 Hz exponential line broadening factor zero filled to give spectra with 64K real points, phase corrected, baseline corrected (usually only the first two terms of a 5th order polynomial correction were used) and, for urine, referenced to TSP at 0 ppm. Prior to Fourier transformation, where stated, JRES spectra were zero filled to give spectra with 32K real points in the direct dimension and 64K real points in the indirect dimension. Data in the direct and then indirect dimensions were either apodised using SINE apodisation (i.e., multiplied by a sine-bell window function in each dimension) and SEM apodisation (i.e., multiplied by a combined exponential and sine-bell function along the direct dimension and by a sine-bell function along the indirect dimension). Then, the 2D spectra were tilted by 45° and symmetrised. Finally, each spectrum was calibrated by setting TMSP to 0 ppm on the direct and indirect dimensions, respectively. Positive skyline projections of the 2D spectrum along F2 were made. A 1D spectrum (pJRES) was obtained.
The phased, baseline-corrected and referenced 1H NMR spectra (NOESY 1D and pJRES) were automatically reduced to ASCII files using AMIX (Bruker). The spectral intensities (positive) were scaled to TMSP and reduced to integrate regions or ‘‘buckets’’ of equal width (0.04 ppm) corresponding to the region of d 10.5 to 0.0. The region from 4.85 ppm to 4.75 ppm was removed from the analysis due to the residual signal of water. Proton signals corresponding to TSP-d4 (at 0.00 ppm) were also removed. A total of 252 buckets were created over the spectral range. The generated ASCII file was imported into Microsoft Excel (Microsoft Corporation) for the addition of labels. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were performed with SIMCA-P software (v. 12.0, Umetrics, Umea, Sweden). The scaling method for PCA was Pareto and was unit of variance for PLS-DA. To test the significance of the differences observed between the spiked and control samples, a Mann-Whitney-Wilcoxon test using R software ( was applied.
Identification of the different metabolites was obtained via data comparison with NMR spectra (Amix database pH 7) and with standard metabolite chemical shift tables from the HMDB database.
Reference:
1.Ludwig C, Viant MR: Two-dimensional J-resolved NMR spectroscopy: review of a key methodology in the metabolomics toolbox. Phytochemical analysis: PCA 2010, 21(1):22-32.
- Participant N3
NMR samples were prepared using 200 l of pH 7.4 200 mM phosphate buffer (H2O/D2O, 80/20 v/v) mixed with 400 l of urine and placed into a 5 mm NMR tube. All the 1D 1H NMR experiments were carried out on a Bruker Avance spectrometer operating at 600 MHz for the 1H frequency using a 5-mm broadband observe probe at 300 K. Spectra were collected using a solvent suppression pulse sequence based on a 1D nuclear Overhauser effect spectroscopy pulse sequence (Trd-90°-t1-90°-tm-90°-Taq) to saturate the residual water signal, first during the relaxation delay (Trd=2s) and second during the mixing time (tm=150 ms). The delay t1 was held constant at 3 s. For each 1D NOESY spectrum, 128 free induction decay values (FIDs) were collected into 32,768 data points using a spectral width of 9,600 Hz with an acquisition time of 1.71 s. For all the spectra, the FIDs were multiplied by an exponential weighting function corresponding to a line broadening of 0.3 Hz and zero-filled before Fourier transformation. The acquired NMR spectra were phase and baseline corrected manually and referenced to the taurine signal (=3.43 ppm).
Before statistical analysis, the spectral region between 0.5 and 10.5 ppm was divided into buckets with variable sizes (one bucket per identified signal when possible; alternatively, 0.04 ppm-width buckets were used) using AMIX software (Bruker Biospin, Karlsruhe, Germany) in order to reduce the residual shifts of the signals due to pH variations. The region 4.77-6.56 ppm in the urine spectra was excluded from the statistical analysis to eliminate artefacts from residual water and urea resonances. To minimise possible differences in concentration between the samples, each integrated region was normalised to the total spectral area.
Principal component analysis based on the correlation matrix was carried out on the NMR dataset using SIMCA P+ v12 software (Umetrics, Umea, Sweden). Buckets with a PC1 coefficient superior to 0.5 were considered as statistically significant for discrimination between the 2 analysed groups.
Assignments were performed using Chenomx software (Chenomx Inc., Canada), 1H-1H TOCSY and 1H-13C HSQC experiments.
- Participant N4
NMR procedure
The urine samples were thawed at room temperature and prepared for 1H NMR spectroscopy by mixing 400 μL urine with 200 μL phosphate buffer (0.2 M Na2HPO4 and 0.038 M NaH2PO4, pH 7.4) prepared with 80% D2O and containing 9 mM sodium azide and 1 mM TSP (3-trimethylsilyl propionate) as a chemical shift reference. The sample was shaken, and 500 μL was transferred into a 5 mm NMR tube for spectral acquisition. 1H NMR spectra were recorded at 600 MHz on a Bruker Avance spectrometer (Bruker BioSpin GmbH, Rheinstetten, Germany) running TOPSPIN 2.0 software and fitted with a cryoprobe and a 60 slot autosampler. The acquisition order was randomised with respect to the collection date and sample codes. Each 1H NMR spectrum was acquired with 128 scans, 90° pulses (10.8 µs), a spectral width of 8389 Hz, an acquisition time of 1.95 s and a relaxation delay of 2.0 s. The noesygppr1d pre-saturation sequence was used to suppress the residual water signal with low power selective irradiation at the water frequency during the relaxation delay and mixing time (0.15 s). The spectra were transformed with zero filling and 1 Hz line broadening, manually phased, baseline corrected and referenced by setting the TSP methyl signal to 0 ppm. Metabolites were identified using information found in the literature [1] or on the web (Human Metabolome Database and by using the 2D-NMR methods COSY, HSQC, and HMBC.