One of the important challenges for collaborative proteomic studies is the variety of search algorithms supplied with mass spectrometers. Some of these search algorithms are proprietary, with key elements not described in the open literature or even disclosed to the user laboratory. Each investigator has many options in choosing parameters for the software search that identifies peptides from the mass spectra of ion fragments, and then in deducing the best protein match from yet another broad array of gene and protein databases, including different versions of each evolving database. Expert curation of such collaborative datasets is therefore required. At the PPP Jamboree Workshop of June 2004, the offer to generate cross-algorithm analyses with PPP data was strongly endorsed, and many months of effort were invested.
Kapp et al. report a unique analysis of alternative search algorithms. They used one raw file from the Pacific Northwest National Laboratory LCQ-MS/MS data on serum depleted only of IgG, published by Adkins et al., which served as a basis for the later FT-ICR-MS analyses for the PPP (Lab 28). The same spectra were analyzed with MASCOT, SEQUEST (with and without PeptideProphet), Sonar, Spectrum Mill, and X!Tandem, each run by experts familiar with its use. Careful manual inspection was applied as well, though it is always a challenge to understand exactly what criteria were used in manual inspection. The paper provides a useful description of the features of each search engine and categorizes them into heuristic algorithms and probabilistic algorithms. The authors then present and compare the engines' performance in identifying peptides and proteins, benchmarking them across a range of specified false-positive rates. In all, 600 peptides were identified, of which 355 were found with very high confidence (estimated error rate 1%) by all four of MASCOT, SEQUEST, Spectrum Mill, and X!Tandem. The authors concluded that no one of these algorithms outperforms the rest. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but less well than MASCOT, X!Tandem, and Sonar in terms of specificity. The authors therefore recommend using at least two search engines for consensus scoring, though the scheme for creating combined scores awaits further work. The probabilistic algorithm, MASCOT, correctly identified the most peptides, while the re-scoring algorithm, PeptideProphet, enhanced the overall performance of SEQUEST. The paper uses reversed-sequence searches, as well as probabilistic estimates, to assess false-positive rates. Unfortunately, the spectra in this dataset were dominated by high-abundance proteins, such that the 600 peptides were matched to only 40-60 proteins using a trypsin-constrained search.
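
To make the two ideas in this summary concrete, the following is a minimal Python sketch of a reversed-sequence false-positive estimate and of a simple consensus across search engines. It is an illustration only, not the procedure of Kapp et al.: the `Match` record, the function names, the decoys-over-targets ratio, and the "at least two engines" voting rule are all assumptions introduced here, and the paper itself notes that a scheme for combined scoring awaits further work.

```python
from collections import Counter
from typing import Iterable, Mapping, NamedTuple, Set


class Match(NamedTuple):
    """One peptide-spectrum match (hypothetical record layout)."""
    peptide: str
    score: float      # engine-specific score; higher is assumed to be better
    is_decoy: bool    # True if the hit came from the reversed-sequence database


def reverse_sequence(protein: str) -> str:
    """Build a decoy entry by reversing a protein sequence."""
    return protein[::-1]


def estimated_false_positive_rate(matches: Iterable[Match], threshold: float) -> float:
    """Estimate the false-positive rate among target hits at a score threshold.

    Assumption: hits to reversed sequences are as frequent as random hits to
    real sequences, so decoys / targets above the threshold approximates the rate.
    """
    accepted = [m for m in matches if m.score >= threshold]
    decoys = sum(1 for m in accepted if m.is_decoy)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


def consensus_peptides(results: Mapping[str, Iterable[str]], min_engines: int = 2) -> Set[str]:
    """Return peptides reported by at least `min_engines` search engines."""
    counts: Counter = Counter()
    for peptides in results.values():
        counts.update(set(peptides))  # count each peptide once per engine
    return {peptide for peptide, n in counts.items() if n >= min_engines}
```

In practice one would lower the score threshold until the estimated rate reaches the chosen limit (for example the 1% level quoted above) and only then compare the accepted peptide lists across engines.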