Online Supplemental Material
Diabetes Associated Metabolic Perturbations in NOD Mice
*Dmitry Grapov1, *Johannes Fahrmann1, Jessica Hwang2, Ananta Poudel2, Junghyo Jo3, Vipul Periwal3, Oliver Fiehn1, and Manami Hara2
1NIH West Coast Metabolomics Center, University of California Davis, Davis, California;
2Department of Medicine, The University of Chicago, Chicago, Illinois;
3Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
*contributed equally to this manuscript
Correspondence to: Manami Hara, D.D.S., Ph.D., Department of Medicine,
The University of Chicago, 5841 South Maryland Avenue, MC1027, Chicago, IL, USA.
Tel: (773) 702-3727. Fax: (773) 834-0486. Email: .
Supplemental methods
Metabolomic Analysis
Analyses were conducted on 15 µl plasma aliquots, which were extracted with 1 ml of degassed acetonitrile:isopropanol:water (3:3:2) at −20°C and centrifuged; the supernatant was removed and the solvents were evaporated to dryness under reduced pressure. To remove membrane lipids and triglycerides, dried samples were reconstituted with acetonitrile/water (1:1), decanted and taken to dryness under reduced pressure. Internal standards, C8–C30 fatty acid methyl esters (FAMEs), were added, and samples were derivatized with methoxyamine hydrochloride in pyridine and subsequently with MSTFA (Sigma-Aldrich) for trimethylsilylation of acidic protons.
A Gerstel MPS2 automatic liner exchange system (ALEX) was used to eliminate sample cross-contamination during the GC-TOF analysis. One microliter of sample was injected at 50°C (ramped to 250°C) in splitless mode with a 25 sec splitless time. An Agilent 6890 gas chromatograph (Santa Clara, CA) was used with a 30 m long, 0.25 mm i.d. Rtx5Sil-MS column with 0.25 µm 5% diphenyl film; an additional 10 m integrated guard column was used (Restek, Bellefonte, PA) (Weckwerth et al. 2004; Fiehn 2008; Kind et al. 2007). The chromatographic gradient consisted of a constant flow of 1 ml/min, ramping the oven temperature from 50°C to 330°C over 22 min. Mass spectrometry was done using a Leco Pegasus IV time-of-flight (TOF) mass spectrometer, with a 280°C transfer line temperature, electron ionization at −70 V and an ion source temperature of 250°C. Mass spectra were acquired at 1750 V detector voltage at m/z 85–500 with 20 spectra/sec.
Acquired spectra were further processed using the BinBase database (Fiehn et al. 2005; Scholz and Fiehn 2007). Briefly, output results (Kind et al. 2007) were filtered based on multiple parameters to exclude noisy or inconsistent peaks. All entries in BinBase were matched against the Fiehn mass spectral library of 1,200 authentic metabolite spectra using retention index and mass spectrum information or the NIST11 commercial library.
Data Analysis
A chi-squared test was used to confirm independence between diabetic outcome, gender and age (p>0.05). The false discovery rate (FDR) associated with the multiple hypotheses tested (n=476) was controlled according to Benjamini and Hochberg (Benjamini and Hochberg 1995), allowing at most 5% expected false positives among significant results (q=0.05). The FDR was also independently estimated (Strimmer 2008) and reported as the “q-value”.
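The Benjamini and Hochberg adjustment described above can be illustrated with a minimal sketch. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the original analysis, whose software is not specified here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four tests (illustration only).
example_p = [0.01, 0.04, 0.03, 0.005]
example_adjusted = [round(p, 4) for p in benjamini_hochberg(example_p)]
print(example_adjusted)  # [0.02, 0.04, 0.04, 0.02]
```

A hypothesis would then be called significant when its adjusted p-value is at or below 0.05.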
Orthogonal signal correction partial least squares discriminant analysis (O-PLS-DA) (Svensson O. 2002) was implemented on gender- and age-adjusted, logarithm-transformed and autoscaled metabolomic measurements. Model testing and feature selection were used to identify and validate the top 10% of metabolic discriminants of diabetic from non-diabetic animals. The full sample set (n=76) was split into 2/3 training and 1/3 test sets while conserving the proportion of diabetic and non-diabetic samples in each set. Training data were used to carry out feature selection and model optimization, and final model performance was validated using the test data. The training set was used to select the number of model latent variables (LVs) and orthogonal LVs (OLVs) based on leave-one-out cross-validation. A preliminary 1 LV and 1 OLV model was developed using the full variable set (n=476), and its loadings and scores were used for feature selection to identify the top ~10% (n=48) of all metabolic determinants of the T1DM phenotype. Features (metabolites) were selected based on significant correlation of raw values with model scores (Spearman's, padj ≤ 0.05) (Wiklund et al. 2008) and an absolute value of model loadings on LV1 ≥ 90th quantile (Palermo et al. 2009). The top 10% feature model was evaluated using internal training and testing cross-validation and permutation testing, and final model performance was estimated through the prediction of class labels for the initially held-out test data. Internal training and testing were done by further splitting the initial training set into 2/3 pseudo-training and 1/3 pseudo-test sets. This was randomly repeated 100 times and used to estimate distributions for the model performance statistics: fit to training data (Q2) and root mean squared error of prediction (RMSEP).
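The class-proportion-preserving 2/3 training and 1/3 test split can be sketched as follows. This is an illustration only: the function name `stratified_split`, the random seed and the rounding rule for the test-set size are assumptions, and the class counts (53 non-diabetic, 23 diabetic) are the sums of the female and male counts in Supplemental Table S1.

```python
import random


def stratified_split(labels, test_fraction=1 / 3, seed=0):
    """Split sample indices into train/test, preserving class proportions."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Rounding to the nearest integer is an assumed convention.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["non-diabetic"] * 53 + ["diabetic"] * 23
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

The same function could be applied again to the training indices to form the pseudo-training and pseudo-test sets, with a different seed for each of the 100 repeats.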
The probability of achieving the model's predictive performance by chance was tested by comparing the Q2 and RMSEP distributions to those of the null hypothesis, generated using 100 randomly permuted models (random class labels) calculated using procedures identical to those used for the true model. The selected feature set and final model performance were validated by predicting the class labels for the held-out test set, and these predictions were used to estimate model sensitivity, specificity and the area under the receiver operating characteristic curve.
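The logic of testing against randomly permuted class labels can be sketched in a few lines. The sketch makes several assumptions: it uses the absolute difference in class means of a single variable as a stand-in statistic (not Q2 or RMSEP from an O-PLS-DA model), it summarizes the comparison as an empirical p-value, and all names and data are hypothetical.

```python
import random


def mean_difference(values, labels):
    """Stand-in performance statistic: absolute difference of class means."""
    group1 = [v for v, l in zip(values, labels) if l == 1]
    group0 = [v for v, l in zip(values, labels) if l == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def permutation_p_value(values, labels, statistic, n_permutations=100, seed=0):
    """Fraction of label permutations performing at least as well as observed."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random class labels define the null model
        if statistic(values, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate above zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical, well-separated toy data: 12 samples per class.
toy_values = list(range(12)) + list(range(100, 112))
toy_labels = [0] * 12 + [1] * 12
p_value = permutation_p_value(toy_values, toy_labels, mean_difference)
print(p_value)
```

In the analysis described above, the statistic would instead be computed by refitting the full model on each permuted label set.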
A biochemical and chemical similarity network was developed for all measured and annotated metabolites (n = 188, as reported by BinBase). Biochemical interactions were determined based on product-precursor relationships defined in the KEGG RPAIR database. Molecular structural similarities were used to determine relationships between molecules not participating in direct biochemical transformations (Barupal et al. 2012). Structural relationships were defined based on Tanimoto similarity coefficients (> 0.7) which were calculated based on PubChem Substructure Fingerprints (Cao et al. 2008).
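The structural similarity rule (Tanimoto coefficient > 0.7) can be illustrated with a small sketch. Fingerprints are represented here as sets of "on" bit positions; the function names and the toy fingerprints are hypothetical and do not correspond to real PubChem fingerprints.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fingerprints, threshold=0.7):
    """Return (name_a, name_b, score) for pairs scoring above the threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[name_a], fingerprints[name_b])
        if score > threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical fingerprints (illustration only).
toy_fingerprints = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 4, 5},
    "C": {7, 8},
}
print(similarity_edges(toy_fingerprints))  # [('A', 'B', 0.8)]
```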
A partial correlation network was calculated to analyze relationships between the top 10% O-PLS-DA feature set and all significant T1DM-associated perturbations in known metabolites. Direct estimation of partial correlations for data in which the number of variables exceeds the sample size can be computationally intractable or fail to converge due to unreliable estimates of the variables’ joint probability distributions (Castelo and Roverato 2009; Whittaker 1990). To ensure convergence of the maximum likelihood estimates of the covariance matrix, which is necessary for the calculation of partial correlations, all possible pairwise connections were reduced to the strongest associations. This was achieved using reduced-order or q-order partial correlations (Castelo and Roverato 2009), which have been successfully applied to the analysis of high-dimensional metabolomic data sets (Oresic et al. 2012). Q-order partial correlations (q=1, 19, 38, 56) were calculated using 1000 replications, and used to estimate the average non-rejection rate (β) for all pairwise relationships. Analysis of the relationship between vertex number, edge degree and β was used to select β=0.4 for edge acceptance, which retained 81 metabolites with an average of 1.3 connections per species, or 108 pairwise relationships. Using this approach, smaller average non-rejection rates correspond to stronger q-order partial correlations. Coefficients of partial correlation, p-values and FDR-adjusted p-values (Benjamini and Hochberg 1995) were calculated for all q-order selected relationships. This procedure was used to identify 99 significant (92% of retained q-order edges) and conditionally independent pairwise relationships (padj≤0.05) between significantly changed metabolites and top predictors for T1DM.
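To make the idea of a partial correlation concrete, the sketch below computes a first-order (q=1) partial correlation using the standard formula based on pairwise Pearson correlations. It does not implement the non-rejection rate estimation described above; the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def first_order_partial_correlation(x, y, z):
    """Correlation of x and y after conditioning on a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y are both driven by z plus independent components.
z = [1, 1, 1, 1, -1, -1, -1, -1]
x = [3, 3, 1, 1, -1, -1, -3, -3]
y = [3, 1, 3, 1, -1, -3, -1, -3]
print(round(pearson(x, y), 3))                             # 0.8
print(abs(first_order_partial_correlation(x, y, z)) < 1e-9)  # True
```

In this toy case x and y are strongly correlated, yet their partial correlation given z is essentially zero, which is the kind of indirect association that a partial correlation network is intended to exclude.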
Network mapping was used to encode statistical and multivariate modeling results in the network edge and vertex properties. Edges, representing metabolite relationships, display the type (color) and strength (thickness) of the relationship between any two species. Vertices, representing metabolites, encode the significance and direction of change in metabolites (color) in diabetic relative to non-diabetic animals. Vertex border and shape identify the top 10% selected species and their biochemical domains, while size encodes each species' contribution to the multivariate discrimination model (O-PLS-DA loadings on LV 1 for the full feature model).
Supplemental Table S1. Characteristics* of NOD mice for metabolomic studies.
Characteristic / non-diabetic / diabetic
Female / 23 / 7
Male / 30 / 16
Age (wks)† / 24 (8,40) / 32 (6,40)
Weight (g) / 26.5 ± 5.2 / 20 ± 7‡
Glucose (mg/dL) / 128 ± 21 / 557 ± 95‡
* values reported as mean ± standard deviation unless otherwise noted
† values reported as median (minimum, maximum)
‡ unpaired two-sample t-test p-value ≤ 0.05
Supplemental Table S2. Summary* of metabolic perturbations associated with type 1 diabetes.
Biochemical domain† (No)‡ / non-diabetic / diabetic / FC§ / padj||
Amino acid metabolism (28) / 1103800 ± 360500 / 1011900 ± 232300 / 0.92 / 0.1894
Carbohydrate metabolism (22) / 299960 ± 137100 / 674760 ± 221000 / 2.25 / 1.87E-07
Energy metabolism (3) / 3847.9 ± 1643 / 5632.2 ± 1616 / 1.46 / 0.00019
Lipid metabolism (11) / 96300 ± 49770 / 260660 ± 281100 / 2.71 / 0.012149
Nucleotide metabolism (5) / 14219 ± 7229 / 25099 ± 16260 / 1.77 / 0.006487
Organic acids (7) / 9170.9 ± 5807 / 16417 ± 7752 / 1.79 / 0.000587
Other (15) / 30741 ± 11650 / 42248 ± 11950 / 1.37 / 0.000587
Unknown (140) / 374580 ± 122700 / 689720 ± 199700 / 1.84 / 3.78E-07
* values are reported as the mean ± standard deviation of the sum and only include significantly changed metabolites. See Supplemental Table S3 for a description of all metabolites.
† defined based on IDEOM database (Creek et al. 2012)
‡ number of significantly altered metabolites (padj≤0.05)
§ fold change of means in diabetic compared to non-diabetic animals
|| false discovery rate adjusted p-value
Supplemental Table S3. Summary statistics for all measured* known and unknown metabolites.
metabolite / non-diabetic† / diabetic / FC‡ / padj§ / q-value||
Amino Acid Metabolism
glycine / 166000 ± 67000 / 152000 ± 68000 / 0.9 / 9.00E-01 / 2.69E-01
alanine / 205000 ± 130000 / 264000 ± 210000 / 1.3 / 4.22E-01 / 1.26E-01
lysine / 121000 ± 49000 / 65700 ± 37000 / 0.5 / 8.69E-04 / 2.58E-04
glutamine / 90100 ± 64000 / 105000 ± 59000 / 1.2 / 2.09E-01 / 6.14E-02
ornithine / 40100 ± 15000 / 20600 ± 18000 / 0.5 / 8.76E-06 / 2.60E-06
tryptophan / 98000 ± 37000 / 72000 ± 28000 / 0.7 / 4.04E-01 / 1.21E-01
tyrosine / 65500 ± 28000 / 56100 ± 45000 / 0.9 / 3.99E-02 / 1.19E-02
urea / 460000 ± 420000 / 297000 ± 160000 / 0.6 / 1.43E-02 / 4.27E-03
histidine / 29300 ± 12000 / 22400 ± 10000 / 0.8 / 4.83E-02 / 1.43E-02
proline / 42300 ± 24000 / 62100 ± 43000 / 1.5 / 1.48E-01 / 4.39E-02
threonine / 13200 ± 7900 / 18000 ± 16000 / 1.4 / 1.57E-01 / 4.68E-02
ethanolamine / 5650 ± 3300 / 7830 ± 8200 / 1.4 / 9.75E-01 / 2.91E-01
sarcosine / 21700 ± 9000 / 13000 ± 8500 / 0.6 / 4.83E-04 / 1.44E-04
2-ketoisocaproic acid / 1210 ± 1400 / 2370 ± 2000 / 2 / 2.52E-02 / 7.50E-03
homoserine / 304 ± 120 / 254 ± 300 / 0.8 / 3.65E-02 / 1.09E-02
kynurenine / 321 ± 250 / 362 ± 130 / 1.1 / 1.66E-01 / 4.96E-02
phosphoethanolamine / 1620 ± 720 / 1400 ± 610 / 0.9 / 4.87E-01 / 1.45E-01
isoleucine / 32900 ± 18000 / 63900 ± 36000 / 1.9 / 1.87E-02 / 5.55E-03
shikimic acid / 229 ± 370 / 308 ± 610 / 1.4 / 3.47E-01 / 1.03E-01
N-acetylglutamate / 710 ± 450 / 460 ± 300 / 0.6 / 5.66E-02 / 1.67E-02
creatinine / 8800 ± 5200 / 5710 ± 3900 / 0.6 / 1.13E-02 / 3.35E-03
pantothenic acid / 1380 ± 1200 / 4350 ± 2000 / 3.1 / 1.17E-07 / 3.46E-08
indole-3-acetate / 2190 ± 1000 / 2850 ± 1200 / 1.3 / 6.73E-02 / 2.01E-02
3-hydroxypropionic acid / 1560 ± 620 / 2710 ± 1700 / 1.7 / 2.82E-03 / 8.42E-04
trans-4-hydroxyproline / 4410 ± 1800 / 2760 ± 1700 / 0.6 / 3.51E-04 / 9.92E-05
homocystine / 400 ± 380 / 713 ± 380 / 1.8 / 3.80E-04 / 1.10E-04
oxoproline / 121000 ± 41000 / 69500 ± 69000 / 0.6 / 6.65E-05 / 1.97E-05
indole-3-lactate / 1820 ± 1800 / 3960 ± 3400 / 2.2 / 1.14E-03 / 3.41E-04
hydrocinnamic acid / 1240 ± 1600 / 2930 ± 2600 / 2.4 / 2.28E-03 / 6.80E-04
5-methoxytryptamine / 27800 ± 9800 / 20400 ± 8700 / 0.7 / 2.25E-01 / 6.64E-02
valine / 96100 ± 53000 / 193000 ± 120000 / 2 / 7.85E-03 / 2.33E-03
serine / 40600 ± 21000 / 72800 ± 81000 / 1.8 / 1.78E-01 / 5.30E-02
phenylalanine / 17500 ± 8200 / 24100 ± 10000 / 1.4 / 1.10E-01 / 3.26E-02
phenylacetic acid / 131 ± 110 / 300 ± 290 / 2.3 / 1.02E-03 / 3.04E-04
methionine / 8300 ± 5300 / 9630 ± 6000 / 1.2 / 5.05E-01 / 1.50E-01
isothreonic acid / 1140 ± 600 / 1680 ± 700 / 1.5 / 5.84E-03 / 1.72E-03
glutaric acid / 809 ± 460 / 1950 ± 3700 / 2.4 / 8.63E-02 / 2.58E-02
glutamic acid / 10600 ± 5400 / 8690 ± 5800 / 0.8 / 8.32E-02 / 2.47E-02
cysteine / 2710 ± 2000 / 1970 ± 1600 / 0.7 / 3.99E-02 / 1.19E-02
aspartic acid / 5750 ± 2000 / 4760 ± 2500 / 0.8 / 5.05E-02 / 1.50E-02
asparagine / 6060 ± 2400 / 8110 ± 9400 / 1.3 / 7.93E-01 / 2.37E-01
4-hydroxyproline / 4910 ± 3300 / 2660 ± 2000 / 0.5 / 3.46E-02 / 1.02E-02
pipecolic acid / 675 ± 1500 / 1380 ± 2400 / 2 / 9.01E-03 / 2.66E-03
phenylethylamine / 1910 ± 1200 / 2790 ± 2700 / 1.5 / 2.89E-01 / 8.63E-02
N-acetyl-D-tryptophan / 265 ± 110 / 399 ± 220 / 1.5 / 3.72E-02 / 1.11E-02
leucine / 80800 ± 49000 / 168000 ± 110000 / 2.1 / 1.08E-02 / 3.19E-03
cystine / 3180 ± 2100 / 7180 ± 6200 / 2.3 / 1.58E-03 / 4.73E-04