Comet JASMS Manuscript

Supplemental Materials

Table 1 supplemental data:

LTQ Orbitrap Velos file FIG1C_TIIMAC_500UG_AUTO_1.raw downloaded from PRIDE ( and converted to mzXML using ReAdW version 2014.1.1.

Sequence database searched: UniProt proteomes human sequence database (downloaded 10/30/2014) with common contaminants appended (88,855 total sequence entries) or UniProt proteomes yeast sequence database (downloaded 10/30/2014) with common contaminants appended (6,792 total sequence entries). URL ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/proteomes/.

Relevant search parameters common to all searches:

decoy_search = 1

num_threads = 0

peptide_mass_tolerance = 20.00
peptide_mass_units = 2

mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1
num_enzyme_termini = 2

allowed_missed_cleavage = 2
variable_mod01 = 15.9949 M 0 3 -1 0 0
use_B_ions = 1
use_Y_ions = 1
use_NL_ions = 1
use_sparse_matrix = 0
add_C_cysteine = 57.021464
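As a worked illustration of the precursor tolerance listed above: with peptide_mass_units = 2 the tolerance is read in ppm, so the 20.00 ppm window scales with peptide mass. The sketch below is only an arithmetic aid; the function name is hypothetical and the isotope_error setting is ignored.

```python
def ppm_window(peptide_mass, tol_ppm=20.0):
    """Return (low, high) mass bounds for a symmetric ppm tolerance."""
    delta = peptide_mass * tol_ppm / 1e6  # ppm -> absolute mass units
    return peptide_mass - delta, peptide_mass + delta


# Example: a 2000 Da peptide has a +/- 0.04 Da window at 20 ppm.
low, high = ppm_window(2000.0)
print(f"{low:.4f} to {high:.4f}")
```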

High-res searches:

fragment_bin_tol = 0.02

fragment_bin_offset = 0.0
theoretical_fragment_ions = 0

Low-res searches:

fragment_bin_tol = 1.0005

fragment_bin_offset = 0.4
theoretical_fragment_ions = 1
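To illustrate how the high-res and low-res fragment settings differ, the sketch below maps a fragment m/z value to a bin index. The formula int(mz / fragment_bin_tol + (1 - fragment_bin_offset)) is an assumption about how the bin width and offset combine; it is not stated in this document, and the function name is hypothetical.

```python
def fragment_bin(mz, bin_tol, bin_offset):
    """Map an m/z value to an integer bin index (assumed binning formula)."""
    return int(mz / bin_tol + (1.0 - bin_offset))


low_res = dict(bin_tol=1.0005, bin_offset=0.4)
high_res = dict(bin_tol=0.02, bin_offset=0.0)

# Two fragment peaks 0.05 m/z apart.
a, b = 500.25, 500.30
same_low = fragment_bin(a, **low_res) == fragment_bin(b, **low_res)
same_high = fragment_bin(a, **high_res) == fragment_bin(b, **high_res)
print("share a low-res bin:", same_low)
print("share a high-res bin:", same_high)
```

Under this assumption the two peaks fall into the same bin with the low-res settings and into different bins with the high-res settings.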

Human searches add:

variable_mod02 = 79.966331 STY 0 3 -1 0 0

Figure 3 supplemental data:

LTQ Orbitrap Velos file FIG1C_TIIMAC_500UG_AUTO_1.raw downloaded from PRIDE ( converted to mzXML using ReAdW version 2014.1.1, and then converted to ms2 format using MzXML2Search (TPP v4.8.0). The ms2 file was duplicated and concatenated together until there were 50,000 spectra in the Comet search.
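The text does not say how the ms2 file was duplicated. One possible approach is sketched below, assuming the usual ms2 layout in which each spectrum begins with a line whose first field is "S" and any preceding lines are file headers. The function names are hypothetical, and scan numbers are simply repeated rather than renumbered.

```python
def split_ms2(text):
    """Split ms2 text into (header_lines, list_of_spectrum_blocks)."""
    header, spectra = [], []
    for line in text.splitlines():
        if line.split()[:1] == ["S"]:
            spectra.append([line])        # start of a new spectrum
        elif spectra:
            spectra[-1].append(line)      # Z/I/peak lines of current spectrum
        else:
            header.append(line)           # file-level header lines
    return header, spectra


def replicate_ms2(text, target_count):
    """Repeat the spectra cyclically until target_count spectra are written."""
    header, spectra = split_ms2(text)
    if not spectra:
        raise ValueError("no spectra found")
    out = list(header)
    for i in range(target_count):
        out.extend(spectra[i % len(spectra)])
    return "\n".join(out) + "\n"


demo = "H\tdemo\nS\t1\t1\t500.0\n100.0 1\nS\t2\t2\t600.0\n200.0 2\n"
big = replicate_ms2(demo, 5)
print(sum(1 for l in big.splitlines() if l.split()[:1] == ["S"]))
```

For the search described here, target_count would be 50,000.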

Sequence database searched: UniProt proteomes human sequence database (downloaded 10/30/2014) with common contaminants appended (88,855 total sequence entries).

Relevant search parameters common to all searches:

decoy_search = 0

num_threads = 4

peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1

search_enzyme_number = 1

num_enzyme_termini = 2

allowed_missed_cleavage = 2

variable_mod01 = 15.9949 M 0 3 -1 0 0
fragment_bin_tol = 1.0005

fragment_bin_offset = 0.4

theoretical_fragment_ions = 1

use_B_ions = 1
use_Y_ions = 1
use_NL_ions = 1

use_sparse_matrix = 1
add_C_cysteine = 57.021464

Figure 5 supplemental data:

LTQ Orbitrap Velos files PRO_CAD_IT_01.raw and PRO_CAD_IT_02.raw downloaded from SCOR (http://scor.chem.wisc.edu/data/raw/Frag_meth_Detect_comparison.tar.gz) and converted to mzXML using ReAdW version 2014.1.1.

Sequence database searched: UniProt proteomes human sequence database (downloaded 10/30/2014) with common contaminants appended (88,855 total sequence entries).

Relevant search parameters common to all searches:

decoy_search = 0

num_threads = 0

peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1

num_enzyme_termini = 2
allowed_missed_cleavage = 2
variable_mod01 = 15.9949 M 0 3 -1 0 0
fragment_bin_tol = 1.0005
theoretical_fragment_ions = 1

use_B_ions = 1
use_Y_ions = 1
use_NL_ions = 1

use_sparse_matrix = 1
add_C_cysteine = 57.021464

Figure 5 supplemental data:

LTQ Orbitrap Velos file PRO_CAD_IT_01.raw downloaded from SCOR (http://scor.chem.wisc.edu/data/raw/Frag_meth_Detect_comparison.tar.gz) and converted to mzXML using ReAdW version 2014.1.1.

Sequence database searched: UniProt proteomes human sequence database (downloaded 10/30/2014) with common contaminants appended, concatenated with their reverse decoy counterparts (177,710 total sequence entries).
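A minimal sketch of building such a concatenated target-decoy database follows. It assumes plain full-sequence reversal and a hypothetical "DECOY_" header prefix; the tool and naming convention actually used are not given here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            entries.append([line[1:], ""])
        elif entries:
            entries[-1][1] += line        # sequences may span several lines
    return [(h, s) for h, s in entries]


def with_reversed_decoys(entries, prefix="DECOY_"):
    """Append one reversed-sequence decoy entry per target entry."""
    decoys = [(prefix + h, s[::-1]) for h, s in entries]
    return entries + decoys


fasta = ">P1 demo\nMKTAYIAK\nQR\n>P2 demo\nMDLSTYAK\n"
combined = with_reversed_decoys(read_fasta(fasta))
for header, seq in combined:
    print(header, seq)
```

Each target contributes exactly one decoy, which is consistent with the entry count doubling from 88,855 to 177,710.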

Relevant Comet search parameters:

peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1

num_enzyme_termini = 2
allowed_missed_cleavage = 2
variable_mod01 = 15.9949 M 0 3 -1 0 0
fragment_bin_tol = 1.0005
theoretical_fragment_ions = 1

use_B_ions = 1
use_Y_ions = 1
use_NL_ions = 1

use_sparse_matrix = 1
add_C_cysteine = 57.021464

Relevant UW SEQUEST search parameters:

peptide_mass_tolerance = 20.00
peptide_mass_units = 2
mass_type_parent = 1
mass_type_fragment = 1
isotope_error = 1
search_enzyme_number = 1

num_enzyme_termini = 2
allowed_missed_cleavage = 2
diff_search_options: 15.9949 M 0.0 X 0.0 X 0.0 X 0.0 X 0.0 X
fragment_bin_tol = 1.0005

fragment_bin_startoffset = 0.4
ion_series = 0 0 1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

theoretical_fragment_ions = 1

use_sparse_matrix = 1
add_C_cysteine = 57.021464

Relevant Thermo SEQUEST search parameters:

peptide mass tolerance: 20.00 ppm
fragment mass tolerance: 0.8 Da

variable modification: oxidation M

static modification: carbamidomethyl C
fragment ion types: b, y

mass range: 600 to 5000 Da

fragment and precursor mass type: monoisotopic

enzyme: trypsin, full

missed cleavages: 2

Sparse Matrix Format: determining the optimum number of bins in a segment

Evaluation of the sparse matrix horizontal dimension. The Table 1 data are searched against the yeast database. The number of bins per sparse matrix segment is varied from 10 to 500, and the effects on search speed and memory use are measured. Varying the number of bins has no significant effect on search times. Memory use does vary, however, and reaches its minimum at 100 bins. Thus 100 bins per segment are applied in the sparse matrix.

bins per segment / memory use (GB) / run time (mm:ss)
10 / 8 / 2:28
25 / 4.5 / 2:22
50 / 3.5 / 2:25
100 / 3.3 / 2:20
200 / 3.7 / 2:21
300 / 4.2 / 2:25
500 / 5.2 / 2:21
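The conclusion drawn from the table can be checked with a few lines of arithmetic. The sketch below simply transcribes the rows above and reports the setting with the lowest memory use and the spread in run time; the variable names are illustrative only.

```python
# (bins per segment, memory use in GB, run time as mm:ss), from the table.
rows = [
    (10, 8.0, "2:28"),
    (25, 4.5, "2:22"),
    (50, 3.5, "2:25"),
    (100, 3.3, "2:20"),
    (200, 3.7, "2:21"),
    (300, 4.2, "2:25"),
    (500, 5.2, "2:21"),
]


def to_seconds(mmss):
    """Convert an mm:ss string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


best_bins = min(rows, key=lambda r: r[1])[0]   # lowest memory use
times = [to_seconds(r[2]) for r in rows]
spread = max(times) - min(times)               # run-time range in seconds

print("lowest memory at", best_bins, "bins per segment")
print("run-time spread:", spread, "s")
```

Run time varies by only a few seconds across all settings, while memory use ranges from 3.3 GB to 8 GB.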

Decoy database support

Comet’s internally generated decoy sequences, invoked using the “decoy_search” parameter, are based on the pseudo-reversed strategy that was initially implemented by Sage-N Sorcerer. This strategy takes each target peptide and reverses all amino acids except for either the C-terminal residue (most enzymes) or the N-terminal residue (for AspN digestion), which is kept in place. For example, the tryptic peptide DLSTYAK would generate the decoy peptide AYTSLDK. If the protease AspN were applied in a search, the target peptide DVLNHGST would generate the corresponding decoy peptide DTSGHNLV. This decoy strategy has three benefits: every target peptide has a corresponding decoy peptide of exactly the same mass, so the target and decoy mass distributions are identical; the number of target and decoy peptides analyzed is exactly the same; and all decoy peptides maintain an appropriate terminal residue consistent with the enzyme applied.
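The pseudo-reversal can be sketched in a few lines. This is an illustration rather than Comet's actual code: the function name and the "keep" argument are hypothetical, and the fixed terminus in each call is chosen to reproduce the two worked examples in the text.

```python
def pseudo_reverse(peptide, keep="C"):
    """Reverse a peptide while holding one terminal residue in place.

    keep="C" holds the C-terminal residue fixed; keep="N" holds the
    N-terminal residue fixed.
    """
    if len(peptide) < 2:
        return peptide
    if keep == "C":
        return peptide[:-1][::-1] + peptide[-1]
    if keep == "N":
        return peptide[0] + peptide[1:][::-1]
    raise ValueError("keep must be 'C' or 'N'")


tryptic_decoy = pseudo_reverse("DLSTYAK", keep="C")
aspn_decoy = pseudo_reverse("DVLNHGST", keep="N")
print(tryptic_decoy, aspn_decoy)

# The decoy is a permutation of the target, so the masses are identical.
print(sorted(tryptic_decoy) == sorted("DLSTYAK"))
```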

Comet itself applies no filtering based on the target or decoy entries. This means every peptide hit, whether target or decoy, is faithfully reported. The user can choose to run decoy searches as if the database is concatenated (target and decoy entries are scored against each other in competition) or as if the target and decoy databases were searched separately (resulting in separate search results reporting the target hits and the decoy hits for each spectrum query).
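The difference between the two reporting modes can be illustrated for a single spectrum query. The sketch below is a simplified model, not Comet's implementation: the function name, the scores, and the peptide lists are all hypothetical, and higher scores are assumed to be better.

```python
def report_hits(target_hits, decoy_hits, concatenated, top_n=1):
    """Report the top hits for one spectrum in either decoy mode.

    Each hit is a (peptide, score) tuple.
    """
    def best(hits):
        return sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]

    if concatenated:
        # Targets and decoys compete in one ranked list.
        labeled = [(pep, score, "target") for pep, score in target_hits]
        labeled += [(pep, score, "decoy") for pep, score in decoy_hits]
        return {"combined": best(labeled)}
    # Separate mode: one result list for targets, one for decoys.
    return {"target": best(target_hits), "decoy": best(decoy_hits)}


targets = [("DLSTYAK", 2.1), ("ELVISK", 1.4)]
decoys = [("AYTSLDK", 2.6)]
print(report_hits(targets, decoys, concatenated=True))
print(report_hits(targets, decoys, concatenated=False))
```

In the concatenated case the higher-scoring decoy displaces the target as the top hit, whereas the separate case reports the best target and the best decoy side by side. No hit is filtered out in either mode.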