Supplementary Material

Measurement Design and Quality Control

A total of 194 study samples were analyzed in duplicate (true sample duplicates, including sample preparation and chromatographic analysis) in four analysis batches. The first and second batches together contained a total of 194 study samples. The third and fourth batches together also contained a total of 194 study samples. Each batch comprised 96-98 study samples, 21 calibration samples, 14 quality control (QC) samples and 14 blank samples. Sets of seven calibration samples were run at the beginning, in the middle and at the end of each batch. A pair consisting of one QC sample and one blank sample was run before and after each calibration set and after every ten study samples. Within each batch, study samples were run in randomized order, but samples from the same subject were always run consecutively.
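The run-order rule described above (random order within a batch, with samples from one subject kept together) can be illustrated with a minimal Python sketch. The subject identifiers, sample names and the function `randomized_run_order` are hypothetical and are not taken from the study; the order of samples within a subject is simply kept as given, which is an assumption.

```python
import random

def randomized_run_order(samples_by_subject, seed=0):
    """Shuffle the subjects, then list each subject's samples back to back."""
    rng = random.Random(seed)
    subjects = list(samples_by_subject)
    rng.shuffle(subjects)  # random order of subjects within the batch
    order = []
    for subject in subjects:
        # Samples of one subject stay adjacent in the run order.
        order.extend(samples_by_subject[subject])
    return order

# Hypothetical batch: three subjects, each with a "before" and an "after" sample.
batch = {
    "S01": ["S01_before", "S01_after"],
    "S02": ["S02_before", "S02_after"],
    "S03": ["S03_before", "S03_after"],
}
run_order = randomized_run_order(batch, seed=42)
```

Calibration, QC and blank samples would then be interleaved at the fixed positions given in the text; that step is left out of the sketch.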

Data obtained for the quality control samples were used to monitor within-batch and between-batch differences. The relative standard deviation (RSD) was calculated for each metabolite/internal standard ratio (see Supplementary Table 3). For most metabolites the RSD was lower than 10 %. Only five of the 106 metabolites were excluded from further data analysis, because their RSD was higher than 30 % in the QC samples and because they were not present in more than 20 % of study samples. Additionally, retention times and metabolite/internal standard ratios obtained for each metabolite were plotted versus sample order to visualize possible trends. No clear trends within or between batches were observed for any of the analyzed metabolites. It was therefore concluded that the overall quality of the data was good and that no data correction was needed.
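As an illustration of the RSD-based quality criterion, the following Python sketch computes the RSD of a set of QC ratios and applies a two-part exclusion rule. It assumes the sample (n-1) standard deviation and interprets the second criterion as "missing in more than 20 % of study samples"; both are assumptions, and the function names and example values are hypothetical.

```python
import statistics

def rsd_percent(ratios):
    """Relative standard deviation in percent: 100 * SD / mean (sample SD assumed)."""
    return 100.0 * statistics.stdev(ratios) / statistics.mean(ratios)

def is_excluded(qc_ratios, fraction_missing, rsd_limit=30.0, missing_limit=0.20):
    """Exclude a metabolite only if both criteria hold (assumed interpretation)."""
    return rsd_percent(qc_ratios) > rsd_limit and fraction_missing > missing_limit

# Hypothetical metabolite/internal standard ratios measured in QC samples.
stable = [0.8, 1.0, 1.2]    # RSD of 20 %
unstable = [0.5, 1.0, 1.5]  # RSD of 50 %
```

Under this reading, `unstable` would be excluded only if the metabolite were also missing in more than 20 % of the study samples.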

Multilevel-PLS-DA procedure

A schematic representation of the most important steps of the multilevel-PLS-DA procedure with double cross-validation and variable selection is given in Supplementary Figure 4. The procedure contains two additional steps compared to the standard double cross-validation scheme [1]. First, for each of the six comparisons (Adifference vs. Cdifference, Bdifference vs. Cdifference, Adifference vs. Bdifference, Abefore vs. Aafter, Bbefore vs. Bafter and Cbefore vs. Cafter), 20 submodels were built. Each of the 20 submodels used a different design matrix specifying which samples were assigned to the test set, rest set, validation set and training set during the single cross-validation (1CV) and double cross-validation (2CV) loops. For the final interpretation of model performance, mean diagnostic outcomes and mean variable ranks were calculated over the 20 submodels (see boxes 15 and 16 in Supplementary Figure 4). Second, a variable selection procedure was included in the 1CV loop. Variable selection consisted of five consecutive steps, in each of which a selected number of the highest-ranked variables from 1CV was carried forward to further 1CV model building (Supplementary Figure 4) [2]. This results in many 1CV models built with five different sets of variables and different numbers of latent variables. After 1CV model optimization, the best set of variables and the optimal number of latent variables were chosen (box 11 in Supplementary Figure 4).
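The use of 20 submodels with different randomly permuted designs, followed by averaging, can be sketched as follows. This is only an illustration under stated assumptions: a design is represented here as a permuted list of sample indices, and the functions `permuted_designs` and `mean_variable_ranks` are hypothetical helpers, not part of the original procedure.

```python
import random
import statistics

def permuted_designs(n_samples, n_submodels=20, seed=0):
    """Create one random permutation of the sample indices per submodel."""
    rng = random.Random(seed)
    designs = []
    for _ in range(n_submodels):
        order = list(range(n_samples))
        rng.shuffle(order)
        designs.append(order)
    return designs

def mean_variable_ranks(ranks_per_submodel):
    """Average each variable's rank over all submodels."""
    variables = ranks_per_submodel[0].keys()
    return {v: statistics.mean(r[v] for r in ranks_per_submodel) for v in variables}

designs = permuted_designs(n_samples=10)
# Hypothetical ranks of two variables in two submodels.
averaged = mean_variable_ranks([{"a": 1, "b": 2}, {"a": 2, "b": 1}])
```

Mean diagnostic outcomes (box 15) could be averaged over the submodels in the same way.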

In the first step of the multilevel-PLS-DA procedure, the data were randomly split into a test set and a rest set using a design matrix with randomly permuted rows, which indicated which samples belonged to the test set and which to the rest set (Supplementary Figure 4). The test set was used to evaluate the final model performance, and the rest set was used in the single cross-validation (1CV) procedure. Ten-fold double cross-validation (2CV) was used, which means that 1/10 of the samples formed the test set, 9/10 of the samples formed the rest set, and the 2CV procedure was repeated 10 times. In 1CV the rest set was randomly split into a validation set and a training set. Nine-fold single cross-validation was used, so that the validation set comprised 1/9 and the training set 8/9 of the samples in the rest set, and the 1CV loop was repeated 9 times. On the basis of the training set, a number of PLS-DA models with different numbers of PLS components and different sets of selected variables (resulting from the variable selection steps) were built (box 8 in Supplementary Figure 4).
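The nested splitting scheme (10-fold 2CV with a 9-fold 1CV inside each rest set) can be sketched in Python as below. The sketch splits plain sample indices, which is a simplification; how the original design matrices handled paired samples from the same subject is not specified here. The sample count of 90 and all function names are hypothetical.

```python
import random

def k_fold_indices(items, k, rng):
    """Shuffle the items and deal them into k folds of near-equal size."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def double_cv_splits(n_samples, outer_k=10, inner_k=9, seed=0):
    """Yield (test, rest, inner_splits) for each outer (2CV) fold.

    inner_splits is a list of (validation, training) pairs for the 1CV loop.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for test in outer_folds:
        held_out = set(test)
        rest = [i for i in range(n_samples) if i not in held_out]
        inner_splits = []
        for validation in k_fold_indices(rest, inner_k, rng):
            in_validation = set(validation)
            training = [i for i in rest if i not in in_validation]
            inner_splits.append((validation, training))
        yield test, rest, inner_splits

# Hypothetical data set with 90 samples: 9 test, 81 rest, 9 validation, 72 training.
splits = list(double_cv_splits(n_samples=90))
```

Each sample ends up in the test set exactly once across the ten outer folds, so a prediction is obtained for every sample.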

In the first step of variable selection, all variables were included in the model building. Rank products (RP) (box 9) were calculated for all variables on the basis of their regression coefficients in the 1CV models built on the training set. A rank of 1 corresponds to the largest absolute value of the regression coefficient. In the second variable selection step, variables were reordered according to their rank products, and only a part of the variables (e.g. 1/2 or 1/3) with the lowest rank products was used to build new 1CV PLS models. New 1CV rank products were then generated and used to reorder and select a new set of variables for the third round of variable selection, and so forth.
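One way to picture a single round of this rank-based selection is the sketch below. It assumes that the rank product of a variable is the geometric mean of its ranks over several models (a common definition, but not confirmed by the text), and it uses made-up regression coefficients for four metabolite symbols from Supplementary Table 2. All function names are hypothetical.

```python
import math

def ranks_from_coefficients(coefficients):
    """Rank variables by absolute regression coefficient; rank 1 = largest."""
    ordered = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}

def rank_products(rank_tables):
    """Geometric mean of each variable's ranks over the models (assumed definition)."""
    n = len(rank_tables)
    return {name: math.prod(t[name] for t in rank_tables) ** (1.0 / n)
            for name in rank_tables[0]}

def select_lowest(rank_product, fraction):
    """Keep the given fraction of variables with the lowest rank products."""
    n_keep = max(1, int(len(rank_product) * fraction))
    return sorted(rank_product, key=rank_product.get)[:n_keep]

# Made-up coefficients from two hypothetical 1CV models.
model_1 = {"PC01": 0.9, "SM03": -0.4, "TG05": 0.1, "LPC04": -0.7}
model_2 = {"PC01": -0.5, "SM03": 0.6, "TG05": 0.05, "LPC04": 0.2}
rp = rank_products([ranks_from_coefficients(model_1), ranks_from_coefficients(model_2)])
selected = select_lowest(rp, fraction=0.5)
```

Repeating these three steps on the selected variables gives the next round of variable selection.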

In each 1CV loop, a number of PLS-DA models with different numbers of PLS components and five different sets of selected variables were built on the basis of the training set. These models were used to predict the class labels of the validation set (box 5A in Supplementary Figure 4), and the prediction results were used to select the optimal number of PLS components as well as the optimal set of variables, with the AUROC as the optimization criterion. The optimal number of PLS components (box 10) and the optimal set of variables (box 11) were then used to build an optimized model (box 12) on the rest set (training and validation sets together). This optimized model was evaluated using the prediction results for the test set (box 3A). After ten repetitions of the double cross-validation procedure (10-fold 2CV), prediction results were obtained for all samples. These results constituted one of the 20 PLS-DA submodels, which were used to calculate the mean prediction outcomes of the model (box 15) and the mean variable rank products (box 16).
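The AUROC-based choice of model settings can be illustrated with a small Python sketch. The AUROC is computed here from its pairwise definition (the fraction of positive/negative pairs in which the positive sample receives the higher score), and the candidate settings, labels and predicted scores are made up. How the AUROC was aggregated over the 1CV folds in the original procedure is not specified, so the sketch evaluates a single validation set only.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def best_setting(validation_scores, labels):
    """Return the (n_components, variable_set) key with the highest AUROC."""
    return max(validation_scores, key=lambda key: auroc(labels, validation_scores[key]))

# Made-up validation-set labels and predicted scores for two candidate settings.
labels = [1, 1, 0, 0]
candidates = {
    (1, "all_variables"): [0.6, 0.4, 0.5, 0.3],
    (2, "top_half"): [0.9, 0.8, 0.2, 0.1],
}
chosen = best_setting(candidates, labels)
```

The chosen number of components and variable set would then be used to refit the model on the full rest set before predicting the test set.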

Supplementary figures

Supplementary Figure 1. Data preprocessing for three data analysis approaches. Legend: i, subject; j, metabolite; A, group with treatment A; B, group with treatment B; C, group with treatment C; panel D0, data set before approach-specific preprocessing; panel D1, data used in the first data analysis approach – data before and after treatment are stacked underneath each other per metabolite j; panel D2, data used in the second data analysis approach – the mean level of each metabolite j is calculated per subject i and subtracted from the levels before and after treatment; panel D3, data used in the third data analysis approach – the difference (δ) between the level of metabolite j after and before treatment is calculated per subject i.

Supplementary Figure 2. PCA analysis results. (a) PC1/PC2 scores scatter plot for differences in metabolite profiles before and after intervention for groups A and C (Adifference vs. Cdifference); (b) PC1/PC2 scores scatter plot for differences in metabolite profiles before and after intervention for groups B and C (Bdifference vs. Cdifference); (c) PC1/PC2 loadings scatter plot for differences in metabolite profiles before and after intervention for groups A and C (Adifference vs. Cdifference); (d) PC1/PC2 loadings scatter plot for differences in metabolite profiles before and after intervention for groups B and C (Bdifference vs. Cdifference).

Supplementary Figure 3. Correlation analysis between differences in levels of lipid metabolites before and after intervention (lipidomics profiles) and differences in levels of lipoproteins before and after intervention (lipoprotein profiles) for treatment groups A (panel A), B (panel B) and C (panel C). Lipidomics profiles are composed of 101 lipid metabolites from 8 lipid subclasses, e.g. cholesteryl esters (CE), in the order listed in Supplementary Table 2. Lipoprotein profiles include 4 parameters: TAG (total triacylglycerols), TC (total cholesterol), LDL-C and HDL-C. Statistically significant positive correlations are shown in red without x (dark red p<0.01, light red p<0.05) and statistically significant negative correlations in blue with x (dark blue p<0.01, light blue p<0.05).

Supplementary Figure 4. Multilevel-PLS-DA procedure with double cross validation and variable selection

Supplementary tables

Supplementary Table 1. List of calibration standards and internal standards used in the UPLC-MS Lipidome Platform (the commercial source is given in brackets).

Standard (source and catalog number) / Lipid class / Formula
Calibration Standards
LPC 19:0 (Avanti Polar Lipids cat. no. 855776P) / lysophosphatidylcholine / C27H56NO7P
PG 14:0/14:0 (Avanti Polar Lipids cat. no. 840445P) / diacylphosphatidylglycerol / C34H66NaO10P
PE 15:0/15:0 (Avanti Polar Lipids cat. no. 850704P) / diacylphosphatidylethanolamine / C35H70NO8P
PC 19:0/19:0 (Avanti Polar Lipids cat. no. 850367P) / diacylphosphatidylcholine / C46H92NO8P
TG 15:0/15:0/15:0 (Sigma-Aldrich cat. no. T4257) / triacylglycerol / C48H92O6
Internal Standards
LPC 17:0 (Avanti Polar Lipids cat. no. 855676P) / lysophosphatidylcholine / C25H52NO7P
PG 17:0/17:0 (Avanti Polar Lipids cat. no. 830456P) / diacylphosphatidylglycerol / C40H78NaO10P
PE 17:0/17:0 (Avanti Polar Lipids cat. no. 830756P) / diacylphosphatidylethanolamine / C39H78NO8P
PC 17:0/17:0 (Avanti Polar Lipids cat. no. 850360P) / diacylphosphatidylcholine / C42H84NO8P
TG 17:0/17:0/17:0 (Sigma-Aldrich cat. no. T2151) / triacylglycerol / C54H104O6

Supplementary Table 2. Lipids analyzed by the UPLC-MS method. Symbol: * – metabolites excluded from further data analysis by quality control

number / symbol / assignment / number / symbol / assignment
1 / CE02 / CE(18:2) / 51 / PC38 / PC(40:4)
* / CE04 / CE(20:5) / 52 / PE01 / PE(34:2)
2 / CE05 / CE(20:4) / 53 / PE02 / PE(O-36:5)
3 / CE06 / CE(22:6) / 54 / PE03 / PE(36:4)
4 / DG02 / DG(36:3) / 55 / PE05 / PE(O-38:7)
5 / LPC01 / LPC(14:0) / 56 / PE06 / PE(O-38:5)
6 / LPC02 / LPC(O-16:1) / 57 / PE07 / PE(38:6)
7 / LPC03 / LPC(O-16:0) / 58 / PE09 / PE(38:2)
8 / LPC04 / LPC(16:1) / 59 / SM01 / SM(d18:1/14:0)
9 / LPC05 / LPC(16:0) / 60 / SM02 / SM(d18:1/15:0)
10 / LPC07 / LPC(18:3) / 61 / SM03 / SM(d18:1/16:1)
11 / LPC08 / LPC(18:2) / 62 / SM04 / SM(d18:1/16:0)
12 / LPC09 / LPC(18:1) / 63 / SM05 / SM(d18:1/17:0)
13 / LPC10 / LPC(18:0) / 64 / SM06 / SM(d18:1/18:2)
14 / LPC11 / LPC(20:5) / 65 / SM07 / SM(d18:1/18:1)
15 / LPC12 / LPC(20:4) / 66 / SM08 / SM(d18:1/18:0)
16 / LPC13 / LPC(20:3) / 67 / SM09 / SM(d18:1/20:1)
17 / LPC14 / LPC(22:6) / 68 / SM10 / SM(d18:1/20:0)
18 / LPC16 / LPC(20:1) / 69 / SM11 / SM(d18:1/21:0)
19 / LPC17 / LPC(20:0) / 70 / SM12 / SM(d18:1/22:1)
20 / LPE02 / LPE(18:0) / 71 / SM13 / SM(d18:1/22:0)
* / LPE04 / LPE(22:6) / 72 / SM14 / SM(d18:1/23:1)
21 / PC01 / PC(32:2) / 73 / SM15 / SM(d18:1/23:0)
22 / PC02 / PC(32:1) / 74 / SM16 / SM(d18:0/24:2)
23 / PC03 / PC(32:0) / 75 / SM17 / SM(d18:0/24:1)
24 / PC04 / PC(O-34:3) / 76 / SM18 / SM(d18:0/24:0)
25 / PC05 / PC(O-34:2) / 77 / SM19 / SM(d18:0/25:1)
26 / PC06 / PC(O-34:1) / 78 / SM20 / SM(d18:0/25:0)
27 / PC07 / PC(34:4) / 79 / TG18 / TG(40:6)
28 / PC08 / PC(34:3) / 80 / TG17 / TG(40:7)
29 / PC09 / PC(34:2) / 81 / TG03 / TG(42:0)
30 / PC10 / PC(34:1) / 82 / TG05 / TG(44:1)
31 / PC12 / PC(O-36:6) / 83 / TG09 / TG(46:1)
32 / PC13 / PC(O-36:5) / 84 / TG13 / TG(48:2)
33 / PC14 / PC(O-36:4) / 85 / TG26 / TG(50:3)
34 / PC15 / PC(O-36:3) / 86 / TG40 / TG(52:4)
35 / PC16 / PC(O-36:2) / 87 / TG10 / TG(46:0)
36 / PC17 / PC(36:6) / 88 / TG14 / TG(48:1)
* / PC18 / PC(36:5) / 89 / TG52 / TG(54:5)
37 / PC19 / PC(36:4) / 90 / TG28 / TG(50:2)
38 / PC20 / PC(36:3) / 91 / TG41 / TG(52:3)
39 / PC21 / PC(36:2) / 92 / TG53 / TG(54:4)
40 / PC22 / PC(36:1) / * / TG15 / TG(48:0)
41 / PC23 / PC(O-38:7) / 93 / TG29 / TG(50:1)
42 / PC25 / PC(O-38:5) / 94 / TG66 / TG(56:5)
43 / PC26 / PC(O-38:4) / 95 / TG60 / TG(55:2)
44 / PC27 / PC(38:7) / 96 / TG42 / TG(52:2)
45 / PC28 / PC(38:6) / 97 / TG54 / TG(54:3)
46 / PC30 / PC(38:4) / * / TG35 / TG(51:1)
47 / PC31 / PC(38:3) / 98 / TG31 / TG(51:4)
48 / PC32 / PC(38:2) / 99 / TG30 / TG(50:0)
49 / PC35 / PC(40:8) / 100 / TG44 / TG(52:1)
50 / PC37 / PC(40:6) / 101 / TG55 / TG(54:2)

Supplementary Table 3. Quality control data. Relative standard deviations (RSDs) calculated over all quality control samples for individual metabolite/internal standard ratios. Symbols: * – metabolites excluded from further data analysis, † – metabolites with RSD higher than 25 %

metabolite / RSD (%) / metabolite / RSD (%) / metabolite / RSD (%)
CholE02 / 26.15 / PC16 / 5.72 / SM12 / 12.29
CholE04* / 53.76 / PC17 / 17.63 / SM13 / 10.16
CholE05 / 19.49 / PC18* / 42.35 / SM14 / 11.43
CholE06 / 24.03 / PC19 / 4.01 / SM15 / 8.63
DG02 / 22.93 / PC20 / 3.90 / SM16 / 9.27
LPC01 / 5.16 / PC21 / 4.51 / SM17 / 9.73
LPC02 / 11.47 / PC22 / 4.09 / SM18 / 5.73
LPC03 / 10.03 / PC23 / 22.66 / SM19 / 14.78
LPC04 / 4.99 / PC25 / 8.87 / SM20 / 18.59
LPC05 / 6.36 / PC26 / 23.51 / TG18 / 14.99
LPC07 / 6.57 / PC27 / 15.37 / TG17 / 15.55
LPC08 / 3.94 / PC28 / 5.07 / TG03 / 26.22
LPC09 / 4.70 / PC30 / 3.61 / TG05 / 19.66
LPC10 / 14.80 / PC31 / 5.09 / TG09 / 16.06
LPC11 / 7.84 / PC32 / 9.26 / TG13 / 13.38
LPC12 / 4.20 / PC35 / 14.12 / TG26 / 5.57
LPC13 / 11.03 / PC37 / 4.70 / TG40 / 5.16
LPC14 / 4.43 / PC38 / 16.40 / TG10† / 30.86
LPC16† / 25.15 / PE01 / 14.25 / TG14 / 18.87
LPC17† / 27.90 / PE02 / 13.58 / TG52 / 14.74
LPE02 / 22.87 / PE03 / 14.90 / TG28 / 13.15
LPE04* / 32.31 / PE05 / 11.99 / TG41 / 10.72
PC01 / 14.87 / PE06 / 8.02 / TG53 / 5.57
PC02 / 4.19 / PE07 / 8.00 / TG15* / 38.25
PC03 / 3.59 / PE09 / 10.74 / TG29 / 7.68
PC04 / 5.00 / SM01 / 15.71 / TG66 / 10.22
PC05† / 29.78 / SM02† / 26.38 / TG60 / 7.64
PC06 / 12.34 / SM03 / 13.91 / TG42 / 12.94
PC07 / 14.92 / SM04 / 8.15 / TG54 / 7.89
PC08 / 6.00 / SM05 / 4.88 / TG35* / 39.80
PC09 / 8.96 / SM06† / 30.22 / TG31 / 24.82
PC10 / 4.82 / SM07 / 5.17 / TG30† / 32.08
PC12 / 15.99 / SM08 / 9.77 / TG44 / 7.43
PC13 / 6.49 / SM09 / 6.37 / TG55 / 7.76
PC14 / 7.34 / SM10 / 10.01
PC15 / 16.24 / SM11 / 10.44

Supplementary Table 4. Performance of multilevel-PLS-DA models.