Supplementary Information

Spectral quality determination

We define the measure of quality to be the fraction of b and y ions observed among the peaks of high intensity. More specifically, we define

Quality = (Nb + Ny) / (2*Length – 2),

where Nb and Ny are the number of b and y ion peaks, respectively, whose intensity ranks are less than 100, and Length is the number of amino acids in the peptide. The denominator is the total number of possible b and y ions, since a peptide of Length residues yields Length – 1 ions of each type. The measure therefore ranges from 0.0, when no b or y ion peaks are present among the top 100 peaks of a spectrum, to 1.0, when all b and y ions are present among the top 100 peaks. The quality measure, as defined above, is determined a posteriori, after the database searching algorithm has identified the matching peptide and the corresponding b and y ions.
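The definition above can be sketched in a few lines of Python. This is only an illustration: the function name and the example counts are hypothetical, and Nb and Ny are assumed to have been counted already from an identified spectrum.

```python
def quality(n_b, n_y, length):
    """A posteriori quality: matched b and y ions over all possible b and y ions.

    n_b, n_y: number of b and y ion peaks found among the top-ranked peaks.
    length:   number of amino acids in the identified peptide.
    """
    if length < 2:
        raise ValueError("a peptide needs at least two residues to yield b/y ions")
    # A peptide of `length` residues has (length - 1) b ions and (length - 1) y ions.
    return (n_b + n_y) / (2 * length - 2)


# Hypothetical example: a 10-residue peptide with 6 b ions and 3 y ions observed.
print(quality(n_b=6, n_y=3, length=10))
```

For the hypothetical 10-residue peptide, 9 of the 18 possible fragment ions are observed, so the measure is 0.5.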

We determined a predicted quality score (Q) based on two features that have been shown (ref. 2) to be good predictors of spectral quality: Qdiffs, the likelihood that two peaks in the spectrum differ by the mass of an amino acid, and Qcomplements, the likelihood that a pair of peaks in the spectrum have complementary m/z-values summing to the mass of the parent ion. Both features are weighted by the intensities of the respective peaks, as described below.

Since our purpose is to define a quality score that is independent of the intensity scale or the number of peaks in the spectrum, we introduce a rank-based normalized intensity function NormI(x) as follows:

NormI(x) = max{0, 1 – (20*Rank(x) / Maxmz)},

where Rank(x) is the rank of peak x, and Maxmz is the maximum significant m/z-value in the spectrum. The Maxmz term implicitly normalizes for the length of the peptide: a shorter peptide, such as one generating a charge +1 spectrum, will have a lower Maxmz, and therefore fewer peaks whose normalized intensity is non-zero. For a typical spectrum, the value of Maxmz is approximately 2000, resulting in zero normalized intensities for all but the top 100 peaks.
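A minimal Python sketch of the normalized intensity function is given here for illustration. It assumes that ranks start at 1 for the most intense peak and that ties are broken by input order; neither convention is stated in the text, and the helper names are hypothetical.

```python
def intensity_ranks(intensities):
    """Return the intensity rank of each peak (1 = most intense; assumed convention)."""
    order = sorted(range(len(intensities)), key=lambda i: -intensities[i])
    ranks = [0] * len(intensities)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def norm_intensity(rank, max_mz):
    """Rank-based normalized intensity: max{0, 1 - 20 * rank / max_mz}."""
    return max(0.0, 1.0 - 20.0 * rank / max_mz)


# With max_mz = 2000 the weight decreases linearly with rank and reaches zero at rank 100.
for rank in (1, 50, 100, 150):
    print(rank, norm_intensity(rank, max_mz=2000.0))

# A lower max_mz (shorter peptide) leaves fewer peaks with non-zero weight.
print(norm_intensity(50, max_mz=1000.0))
```

Because only ranks enter the function, rescaling all intensities by a constant factor leaves every NormI value unchanged.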

Using the normalized intensity function, we define the following quantities:

GoodDiffs = ∑ {NormI(x) + NormI(y) | M(x) – M(y) ≈ Mi, for i = 1, 2, …, 20},

where M(x) is the m/z-value of peak x, M1, M2, …, M20 are the amino acid masses, and the comparison implied by ≈ uses a 0.4 Dalton tolerance,

TotalDiffs = ∑ {NormI(x) + NormI(y) | 56 < M(x) – M(y) < 187},

where 56 and 187 bracket the lowest and highest amino acid residue masses (glycine, approximately 57 Da, and tryptophan, approximately 186 Da), respectively,
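The two sums over peak differences can be sketched as follows. Several details here are assumptions rather than part of the definition: the table of unmodified monoisotopic residue masses, counting each unordered peak pair once even when its difference matches more than one amino acid (for example leucine and isoleucine), and representing a peak as an (m/z, rank) tuple.

```python
from itertools import combinations

# Monoisotopic residue masses of the 20 standard amino acids (assumed, unmodified).
RESIDUE_MASSES = [
    57.02146, 71.03711, 87.03203, 97.05276, 99.06841,       # G A S P V
    101.04768, 103.00919, 113.08406, 113.08406, 114.04293,  # T C L I N
    115.02694, 128.05858, 128.09496, 129.04259, 131.04049,  # D Q K E M
    137.05891, 147.06841, 156.10111, 163.06333, 186.07931,  # H F R Y W
]
TOLERANCE = 0.4  # Dalton, as stated for the "approximately equal" comparison


def norm_intensity(rank, max_mz):
    return max(0.0, 1.0 - 20.0 * rank / max_mz)


def diff_sums(peaks, max_mz):
    """Return (GoodDiffs, TotalDiffs) for a list of (mz, rank) peaks."""
    good = total = 0.0
    for (mz_a, rank_a), (mz_b, rank_b) in combinations(peaks, 2):
        diff = abs(mz_a - mz_b)
        weight = norm_intensity(rank_a, max_mz) + norm_intensity(rank_b, max_mz)
        if 56.0 < diff < 187.0:
            total += weight
        # Count a matching pair once, even if several residue masses fit.
        if any(abs(diff - mass) <= TOLERANCE for mass in RESIDUE_MASSES):
            good += weight
    return good, total


# Toy spectrum: 200.0 and 271.04 differ by about the alanine mass;
# 271.04 and 420.0 differ by 148.96, which matches no residue mass.
peaks = [(200.0, 1), (271.04, 2), (420.0, 3)]
good_diffs, total_diffs = diff_sums(peaks, max_mz=2000.0)
print(good_diffs, total_diffs)
```

Every residue mass plus or minus the tolerance lies inside the open interval (56, 187), so in this sketch each pair counted in GoodDiffs is also counted in TotalDiffs.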

GoodComplements = ∑ {NormI(x) + NormI(y) | M(x) + M(y) ≈ Mparent},

where Mparent is the mass of the parent ion, and finally:

AllComplements = ∑ {NormI(x) + NormI(y) | Mparent – 100 < M(x) + M(y) < Mparent + 100}.
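The complement sums can be sketched in the same way. This illustration assumes that the 0.4 Dalton tolerance also applies to the comparison with Mparent, that each unordered peak pair is counted once, and that the caller supplies the parent mass in whatever convention makes complementary fragment m/z-values sum to it; the function names are hypothetical.

```python
from itertools import combinations


def norm_intensity(rank, max_mz):
    return max(0.0, 1.0 - 20.0 * rank / max_mz)


def complement_sums(peaks, parent_mass, max_mz, tolerance=0.4, window=100.0):
    """Return (GoodComplements, AllComplements) for a list of (mz, rank) peaks."""
    good = total = 0.0
    for (mz_a, rank_a), (mz_b, rank_b) in combinations(peaks, 2):
        pair_sum = mz_a + mz_b
        weight = norm_intensity(rank_a, max_mz) + norm_intensity(rank_b, max_mz)
        # All pairs whose summed m/z falls within +/- 100 of the parent mass.
        if parent_mass - window < pair_sum < parent_mass + window:
            total += weight
        # Pairs whose summed m/z matches the parent mass within the tolerance.
        if abs(pair_sum - parent_mass) <= tolerance:
            good += weight
    return good, total


# Toy spectrum with a parent mass of 1000.0:
# 300.0 + 700.2 is complementary; 300.0 + 650.0 only falls inside the window.
peaks = [(300.0, 1), (700.2, 2), (650.0, 3)]
good_complements, all_complements = complement_sums(peaks, 1000.0, max_mz=2000.0)
print(good_complements, all_complements)
```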

Using these quantities, we define the two normalized components of the quality score, Qdiffs and Qcomplements:

Qdiffs = GoodDiffs / TotalDiffs,

and

Qcomplements = GoodComplements / AllComplements.

The overall quality score Q is then calculated as a linear combination of Qdiffs and Qcomplements. Using a training set of confidently identified peptides, the coefficients of the linear combination were obtained by a multivariate linear regression with Qdiffs and Qcomplements as explanatory variables and the a posteriori measure of spectral quality Quality as the response variable. The multivariate linear regression gave Qdiffs and Qcomplements highly significant non-zero coefficients as judged by P-values. The final quality score was obtained as:

Q = 5.3 Qdiffs + 3.6 Qcomplements,

with a correlation coefficient of the linear regression of 0.82.
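As an illustration, the final score can be computed from the four sums as sketched below. Returning 0.0 when a denominator is zero is an assumption, since the text does not say how empty sums are handled, and the 0.2 cut-off is taken from the caption of Supplementary Figure 1; the function names and example values are hypothetical.

```python
GOOD_QUALITY_THRESHOLD = 0.2  # from the caption of Supplementary Figure 1


def safe_ratio(numerator, denominator):
    """Ratio that falls back to 0.0 for an empty denominator (assumed convention)."""
    return numerator / denominator if denominator > 0 else 0.0


def quality_score(q_diffs, q_complements):
    """Predicted quality score Q = 5.3 * Qdiffs + 3.6 * Qcomplements."""
    return 5.3 * q_diffs + 3.6 * q_complements


def is_good_quality(q):
    return q > GOOD_QUALITY_THRESHOLD


# Hypothetical sums for one spectrum.
q_diffs = safe_ratio(3.0, 100.0)        # GoodDiffs / TotalDiffs = 0.03
q_complements = safe_ratio(1.0, 50.0)   # GoodComplements / AllComplements = 0.02
q = quality_score(q_diffs, q_complements)
print(q, is_good_quality(q))
```

For these hypothetical values Q is approximately 0.231, which lies above the 0.2 threshold.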

Supplementary Tables

Supplementary Table 1: Total number of acquired spectra, and the number of identified spectra, peptides, and proteins, as a function of threshold trigger values on the LTQ analyzer. Each experiment was performed in triplicate, and the values of each replicate experiment are listed in the table.

Threshold value / Number of acquired spectra / Number of identified spectra / Number of identified peptides / Number of identified proteins
1e1 / 19580 / 8117 / 2150 / 359
1e1 / 19588 / 8210 / 2188 / 363
1e1 / 18869 / 7998 / 2090 / 352
1e2 / 19692 / 8180 / 2200 / 365
1e2 / 19193 / 8004 / 2177 / 371
1e2 / 19613 / 8110 / 2189 / 359
1e3 / 19609 / 8380 / 2470 / 370
1e3 / 18952 / 8255 / 2211 / 361
1e3 / 18900 / 8196 / 2254 / 360
1e4 / 18747 / 8411 / 2471 / 375
1e4 / 17786 / 8280 / 2380 / 370
1e4 / 18322 / 8520 / 2456 / 383
1e5 / 13860 / 7320 / 1858 / 337
1e5 / 12971 / 7112 / 1720 / 330
1e5 / 14167 / 7551 / 1809 / 351
1e6 / 1283 / 855 / 731 / 141
1e6 / 833 / 693 / 680 / 120
1e6 / 1051 / 740 / 720 / 133
1e7 / 30 / 23 / 19 / 0
1e7 / 10 / 8 / 8 / 0
1e7 / 32 / 22 / 20 / 0

Supplementary Table 2: Total number of acquired spectra, and the number of identified spectra, peptides, and proteins, as a function of threshold trigger values on the Orbitrap-LTQ analyzer. Each experiment was performed in triplicate, and the values of each replicate experiment are listed in the table.

Threshold value / Number of acquired spectra / Number of identified spectra / Number of identified peptides / Number of identified proteins
1e1 / 16231 / 8202 / 2277 / 368
1e1 / 16519 / 8449 / 2325 / 372
1e1 / 16192 / 7800 / 2010 / 349
1e2 / 16175 / 8222 / 2348 / 377
1e2 / 16630 / 7904 / 2072 / 352
1e2 / 16302 / 7913 / 2334 / 385
1e3 / 16620 / 8183 / 2335 / 377
1e3 / 16045 / 8069 / 2122 / 353
1e3 / 16521 / 8044 / 2200 / 357
1e4 / 15568 / 8387 / 2336 / 363
1e4 / 15315 / 8267 / 2342 / 384
1e4 / 15425 / 8054 / 2420 / 380
1e5 / 14826 / 8408 / 2410 / 369
1e5 / 14925 / 8460 / 2295 / 367
1e5 / 14060 / 8000 / 2350 / 361
1e6 / 6424 / 3739 / 2050 / 348
1e6 / 6828 / 3962 / 1922 / 333
1e6 / 6174 / 3578 / 2004 / 330
1e7 / 814 / 552 / 497 / 118
1e7 / 433 / 280 / 248 / 69
1e7 / 691 / 437 / 393 / 93

Supplementary Table 3: Summary for “mock” Orbitrap-LTQ data analysis (data was analyzed ignoring information provided by the high mass accuracy of the Orbitrap). Total number of acquired spectra, and the number of identified spectra, peptides, and proteins, as a function of threshold trigger values. Each experiment was performed in triplicate, and the values of each replicate experiment are listed in the table.

Threshold value / Number of acquired spectra / Number of identified spectra / Number of identified peptides / Number of identified proteins
1e1 / 16231 / 7523 / 2015 / 355
1e1 / 16519 / 7439 / 2141 / 357
1e1 / 16192 / 7298 / 1867 / 338
1e2 / 16175 / 7984 / 2086 / 356
1e2 / 16630 / 7733 / 1883 / 340
1e2 / 16302 / 7754 / 2115 / 367
1e3 / 16620 / 7962 / 2098 / 359
1e3 / 16045 / 7820 / 1977 / 340
1e3 / 16521 / 7839 / 2003 / 341
1e4 / 15568 / 8121 / 2120 / 348
1e4 / 15315 / 8017 / 2123 / 360
1e4 / 15425 / 7855 / 2217 / 367
1e5 / 14826 / 8098 / 2175 / 348
1e5 / 14925 / 8186 / 2065 / 347
1e5 / 14060 / 7783 / 2103 / 352
1e6 / 6424 / 3544 / 1866 / 330
1e6 / 6828 / 3783 / 1687 / 321
1e6 / 6174 / 3255 / 1831 / 318
1e7 / 814 / 552 / 462 / 103
1e7 / 433 / 280 / 211 / 63
1e7 / 691 / 437 / 352 / 87

Supplementary Figures

Supplementary Figure 1: Good quality spectra (Q > 0.2) as a percentage of the total number of tandem mass spectra acquired. LTQ data is shown in red, while Orbitrap data is shown in blue.