Step-by-step guide to spectrum clustering and merging:

  1. Ensure that Java is installed on the system. To test this, open a command-line window (Start > Run, then type cmd), type java and press Enter. If it prints usage information for the java command, Java is installed and everything is fine!
  1. Download the clustering application from
  2. Unzip the file into a suitable location
  1. Download and unzip the large Q-TOF test data set (10056 spectra) from the same location, or use the data from yesterday. NOTE: If the number of spectra is too large for your computer, ask for a smaller subset.
  1. Start the application by double-clicking clustering.cmd.
  1. Select the input spectrum folder and the folder where the output should be written.
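The first checks above (is Java available, and is the data set small enough to handle) can also be scripted. The following is a minimal Python sketch, assuming Python 3.9 or later and the usual MGF convention that every spectrum starts with a `BEGIN IONS` line. The function names and the default folder name `all` are illustrative choices, not part of the clustering application.

```python
import shutil
import sys
from pathlib import Path


def java_on_path() -> bool:
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


def count_spectra(mgf_path: Path) -> int:
    """Count spectra in one MGF file by counting its BEGIN IONS lines."""
    count = 0
    with mgf_path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip().upper() == "BEGIN IONS":
                count += 1
    return count


def count_folder(folder: Path) -> int:
    """Print and sum the spectrum counts of all .mgf files in `folder`."""
    total = 0
    for mgf_path in sorted(folder.glob("*.mgf")):
        n = count_spectra(mgf_path)
        print(f"{mgf_path.name}: {n} spectra")
        total += n
    return total


if __name__ == "__main__":
    print("java found on PATH" if java_on_path() else "java NOT found on PATH")
    # Hypothetical default: the "all" subfolder of the unzipped Q-TOF data.
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("all")
    print(f"Total: {count_folder(folder)} spectra")
```

Run it with the input folder as its only argument (the script file name is up to you). If the total is far more than your computer handles comfortably, that is the cue to ask for a smaller subset.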

In our case, we will start with the spectra from the Q-TOF instrument. Choose the directory containing all the mgf files (the subfolder named “all”) as input, and a suitable (temporary) output folder.

In section 2 – Similarity options, select NorBel sigmoid as the similarity measure.

In section 4 – merging options, select “merge clusters to uberspectra”.

Also in section 4, change Fragment merging to “Accumulate – use important”.
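The tutorial does not describe how the application's “Accumulate – use important” fragment merging works internally. As a purely illustrative sketch of the general idea of building a consensus spectrum from cluster members (not the application's actual algorithm), assume peaks are grouped into fixed-width m/z bins, intensities in each bin are summed across members, and a bin is kept only if it appears in at least a given fraction of the members. The names `merge_spectra`, `bin_width` and `importance_fraction` are hypothetical; `importance_fraction` only loosely mirrors the “Fraction for importance” setting mentioned in the exercises.

```python
from collections import defaultdict

Peak = tuple[float, float]  # (m/z, intensity)


def merge_spectra(spectra: list[list[Peak]], bin_width: float = 0.5,
                  importance_fraction: float = 0.5) -> list[Peak]:
    """Merge member spectra into one consensus peak list (illustrative only).

    Fragment peaks are grouped into fixed-width m/z bins. Intensities in a bin
    are accumulated (summed) across members, and a bin is kept only if it
    occurs in at least `importance_fraction` of the member spectra.
    """
    intensity_sum: dict[int, float] = defaultdict(float)
    weighted_mz: dict[int, float] = defaultdict(float)
    members_with_bin: dict[int, int] = defaultdict(int)

    for spectrum in spectra:
        seen_bins = set()
        for mz, intensity in spectrum:
            b = int(mz // bin_width)
            intensity_sum[b] += intensity
            weighted_mz[b] += mz * intensity
            seen_bins.add(b)
        # Count each bin at most once per member spectrum.
        for b in seen_bins:
            members_with_bin[b] += 1

    min_members = importance_fraction * len(spectra)
    merged = []
    for b in sorted(intensity_sum):
        if members_with_bin[b] >= min_members and intensity_sum[b] > 0:
            mz = weighted_mz[b] / intensity_sum[b]  # intensity-weighted m/z
            merged.append((mz, intensity_sum[b]))
    return merged
```

Fixed-width bins are a simplification: two peaks closer than the bin width can still land in neighbouring bins. Raising `importance_fraction` keeps only peaks shared by more members, which shows how a single merging parameter can change the peak list that a search engine later scores.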

Then go to the output page and press GO! Wait until the progress bar says “Done”; this can take some time, depending on your computer.

  1. Have a look at the output clusters by pressing the “browse output” button. The biggest clusters probably contain contaminants, while others contain peptide material.
  1. Now download and unzip the ion-trap dataset from the web page (or use the data from yesterday). Perform a clustering using the non-identified spectra (in the non_id folder). In section 4.35 – Output settings, select “output basis files for clusters”. This tells the program to write, for each cluster, an mgf file containing the original member spectra.
  1. Check that the output folder contains a file called 7merged_MCF7_60_TX100_060515H_13_11530_54.mgf and one called 7merged_MCF7_60_TX100_060515H_13_11530_54.basis.mgf.
  1. Go to and press the Mascot link. Then press the MS/MS Ion Search link. On the search page, fill in your credentials and the following settings: database “Swiss-Prot”, taxonomy Homo sapiens, peptide tolerance ±0.5 Da, MS/MS tolerance ±0.5 Da. Select the file 7merged_MCF7_60_TX100_060515H_13_11530_54.basis.mgf as the data file and choose instrument type ESI-TRAP.
  1. Then press Start Search.
  2. You'll see that there are several hits for the spectra in this cluster.
  3. Redo the search with the merged file 7merged_MCF7_60_TX100_060515H_13_11530_54.mgf.
  4. Now you should see a significant hit, with a higher score than any of the original member spectra of the cluster.
  5. Mascot’s scoring is quite sensitive to small changes in the spectra.
  1. Try changing some settings and re-running the clustering and searches. The merging parameters in particular tend to change the subsequent Mascot results.
  2. For example, without changing the clustering settings, try to cluster only the spectra in the “id” folder of the ion-trap data. A cluster with the filename 34merged_MCF7_60_TX100_060515H_13_11530_41.mgf should be produced. Experiment with the merging settings (how the fragment merging is done, for example by changing the “Fraction for importance” parameter) and see how the Mascot score of this spectrum is affected.
  1. DONE!
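The output-file check in the ion-trap steps above can also be scripted. This Python sketch assumes Python 3 and the MGF convention that each spectrum starts with a `BEGIN IONS` line; the function names are hypothetical. It confirms that both the merged file and its `.basis.mgf` companion exist and counts the spectra in each. Because the basis file holds the cluster's original member spectra, its count shows how many spectra went into the merged spectrum.

```python
import sys
from pathlib import Path
from typing import Dict, Optional

# Cluster file stem from the ion-trap non_id run described above.
CLUSTER = "7merged_MCF7_60_TX100_060515H_13_11530_54"


def count_spectra(mgf_path: Path) -> int:
    """Count spectra in one MGF file by counting its BEGIN IONS lines."""
    with mgf_path.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip().upper() == "BEGIN IONS")


def check_cluster(output_dir: Path, cluster: str = CLUSTER) -> Dict[str, Optional[int]]:
    """Map the merged and .basis file names to spectrum counts (None if missing)."""
    counts: Dict[str, Optional[int]] = {}
    for name in (f"{cluster}.mgf", f"{cluster}.basis.mgf"):
        path = output_dir / name
        counts[name] = count_spectra(path) if path.is_file() else None
    return counts


if __name__ == "__main__":
    # Pass the output folder chosen in the application; defaults to the current folder.
    output_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, n in check_cluster(output_dir).items():
        status = "MISSING" if n is None else f"{n} spectra"
        print(f"{name}: {status}")
```

Passing a different stem as the `cluster` argument of `check_cluster`, such as `34merged_MCF7_60_TX100_060515H_13_11530_41` for the “id” folder run, checks that cluster instead.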