Investigation of Scrambled Ions in Tandem Mass Spectra Part 2: The Influence On Peptide Identification

Journal Name: Journal of the American Society for Mass Spectrometry

Nai-ping Dong, Yi-zeng Liang*, Lun-zhao Yi, Hong-mei Lu

College of Chemistry and Engineering, Central South University, Changsha 410083, P. R. China.

E-mail for Corresponding Author:

Supplementary Material

Figure S-1. Score variation after removing the scrambled ions

Figure S-2. Fraction of intensity and number of scrambled ions versus variances of XCorr scores

Figure S-3. Fraction of intensity and number of scrambled ions versus variances of dot product scores

Figure S-4. Variation of dot product score with different penalization factors

Figure S-5. Distribution of deltaCn

Figure S-6. Distribution of ΔHyperscore

Figure S-7. Preprocessed MS/MS spectra with and without scrambled ions

Figure S-1. Relationship between the scores derived from the original MS/MS spectra (containing non-direct sequence ions) and those derived from the same spectra with the non-direct sequence ions removed. A: X!Tandem hyperscores; B: PepNovo RnkScr scores; C: Lutefisk pevzscr scores. a: Agilent dataset; b: LTQ dataset; c: LTQ-FT dataset; d: QSTAR dataset. X_origin: score X derived from the original MS/MS spectra; X_scrdel: score X derived from the MS/MS spectra with the non-direct sequence ions removed.

Figure S-2. Fraction of intensity (a) and number (b) of the non-direct sequence ions in tandem mass spectra versus the variances of XCorr scores after removing the ions. Dataset used: Agilent. Peptide identification algorithm: Crux.
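The two quantities plotted in Figures S-2 and S-3 can be computed per spectrum once every peak has been annotated. The sketch below is a minimal illustration, assuming the annotation step has already flagged which peaks are non-direct sequence ions; the `Peak` class, its `non_direct` flag, and the `scrambled_fractions` function are hypothetical names and are not part of Crux or SpectraST.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float
    intensity: float
    non_direct: bool  # hypothetical flag: True if annotated as a non-direct sequence ion


def scrambled_fractions(peaks):
    """Return (fraction of total intensity, number of peaks) for non-direct sequence ions."""
    total = sum(p.intensity for p in peaks)
    flagged = [p for p in peaks if p.non_direct]
    # Guard against empty or all-zero spectra.
    intensity_fraction = sum(p.intensity for p in flagged) / total if total > 0 else 0.0
    return intensity_fraction, len(flagged)


spectrum = [
    Peak(175.12, 20.0, False),
    Peak(262.15, 50.0, True),
    Peak(389.21, 30.0, False),
]
print(scrambled_fractions(spectrum))  # (0.5, 1)
```

The function returns the absolute count of flagged peaks; dividing by `len(peaks)` would give the fraction by number instead, if that is the quantity of interest.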

Figure S-3. Fraction of intensity (a) and number (b) of the non-direct sequence ions in tandem mass spectra versus the variances of dot product scores after removing the ions. Dataset used: Agilent. Peptide identification algorithm: SpectraST.
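SpectraST scores library matches with a dot product between the query and library spectra. The following is a generic sketch of a normalized dot product (cosine similarity of binned, square-root-scaled intensities), written under assumed preprocessing choices (1 Th bins, square-root intensity scaling). SpectraST's actual preprocessing and scoring differ in detail, and the function names are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def dot_product(query, library, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two (m/z, intensity) peak lists."""
    # Square-root scaling damps the influence of the few most intense peaks.
    a = {k: math.sqrt(v) for k, v in bin_spectrum(query, bin_width).items()}
    b = {k: math.sqrt(v) for k, v in bin_spectrum(library, bin_width).items()}
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Only bins present in both spectra contribute to the numerator.
    shared = sum(a[k] * b[k] for k in a.keys() & b.keys())
    return shared / (norm_a * norm_b)


query = [(100.0, 4.0), (200.0, 9.0)]
library = [(100.2, 4.0), (200.4, 9.0)]
print(dot_product(query, library))         # approximately 1.0
print(dot_product(query, [(100.0, 4.0)]))  # approximately 0.555
```

Removing a peak from only one of the two spectra changes both the shared term and the norms, so deleting non-direct sequence ions can move the score in either direction: it lowers the score when the deleted peak was matched in the other spectrum and raises it when it was not.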

Figure S-4. SpectraST dot product scores obtained by searching the SLnd library versus those obtained by searching the SLnsd library, with penalization factors of 0.1 (A) and 1 (B). All scores are derived from the LTQ dataset. The red line in each panel indicates the y = x diagonal.

Figure S-5. Distribution of the difference of XCorr scores between the first- and second-ranked sequences. A: Agilent dataset; B: LTQ-FT dataset; C: QSTAR dataset.

Figure S-6. Distribution of ΔHyperscore between the first- and second-ranked sequences. A: Agilent dataset; B: LTQ-FT dataset; C: QSTAR dataset.
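Figures S-5 and S-6 plot, for each spectrum, the gap between the scores of the first- and second-ranked candidate sequences. A minimal sketch of that computation follows. The function name is hypothetical, and the `normalized` option, which divides by the top score as in the conventional SEQUEST deltaCn, is included only for comparison, since the captions describe the plain difference.

```python
def top_two_gap(scores, normalized=False):
    """Gap between the best and second-best candidate scores of one spectrum.

    normalized=False: plain difference, as described in the captions.
    normalized=True: difference divided by the best score (SEQUEST-style deltaCn).
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidate scores")
    first, second = sorted(scores, reverse=True)[:2]
    gap = first - second
    if normalized:
        if first == 0:
            raise ValueError("cannot normalize by a zero top score")
        return gap / first
    return gap


print(top_two_gap([35.0, 52.5, 20.0]))  # 17.5
```

The same function applies to XCorr (Figure S-5) and to X!Tandem hyperscores (Figure S-6); only the list of candidate scores changes.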

Figure S-7. Variation of the tandem mass spectrum actually used by X!Tandem for peptide identification. The most intense peak of the original MS/MS spectrum was assigned to a non-direct sequence ion, which allowed the incorrect sequence LSFNPTQLEEQCHI to outscore the correct sequence AAFDMFDADGGGDISVK. After this fragment ion was removed, the normalized MS/MS spectrum differed substantially from the original one, and the algorithm identified the correct sequence.
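The effect described for Figure S-7 can be reproduced with toy numbers. The sketch assumes that preprocessing rescales each spectrum so that its most intense peak equals 100; X!Tandem's actual preprocessing involves further steps, and the m/z values, intensities, tolerance, and function names below are hypothetical.

```python
def normalize_to_max(peaks, top=100.0):
    """Scale intensities so the most intense peak equals `top` (assumed preprocessing)."""
    if not peaks:
        return []
    max_intensity = max(intensity for _, intensity in peaks)
    if max_intensity <= 0:
        return list(peaks)
    return [(mz, intensity * top / max_intensity) for mz, intensity in peaks]


def remove_and_renormalize(peaks, flagged_mzs, tol=0.01, top=100.0):
    """Drop peaks within `tol` of any flagged m/z, then renormalize the remaining peaks."""
    kept = [(mz, i) for mz, i in peaks if all(abs(mz - f) > tol for f in flagged_mzs)]
    return normalize_to_max(kept, top)


original = [(300.0, 1000.0), (450.0, 200.0), (600.0, 100.0)]
# Suppose the dominant peak at m/z 300.0 is the non-direct sequence ion.
print(normalize_to_max(original))
# [(300.0, 100.0), (450.0, 20.0), (600.0, 10.0)]
print(remove_and_renormalize(original, [300.0]))
# [(450.0, 100.0), (600.0, 50.0)]
```

Once the dominant peak is gone, the remaining peaks are rescaled by a factor of five, so each fragment ion that a candidate sequence explains carries more weight in the normalized spectrum.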