Independent analyses of raw spectra or peaklists

After the original data submission protocol, built upon peptide sequences and protein identifications, had been established, three groups emerged with the capability to perform centralized, independent analyses. Such analyses bypass the peculiarities of the search engine software embedded in particular MS instruments, as well as the criteria that individual investigators applied in setting thresholds for high- and lower-confidence identifications or in manually inspecting the spectra.

Beer at IBM/Haifa developed PepMiner software [42], which clusters very large numbers of raw spectra and then applies SEQUEST-like analysis and scoring to produce peptide and protein IDs. Beer et al. [43] applied this method to the spectra from laboratories 1, 2, 17, 22, 28, 29, 34, and 40. The data from laboratory 1 included those submitted for the Core Dataset(s) as well as those in the Barnea et al. [39] special project paper. They identified 14 296 peptides, which were assigned to 4985 proteins with one or more peptides, 2895 proteins with two or more peptides, and 1646 proteins with three or more peptides. Of the 4985 IDs, 2245 were in common with the 15 519 unintegrated PPP IDs and 1983 with the 9504 integrated PPP IDs. The 2895 proteins based on two or more peptides compare with our 2868 based on two or more peptides for the same eight laboratories, with 865 in common with our Core Dataset.
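The counts above (proteins supported by one or more, two or more, or three or more peptides, and the number shared with another ID list) follow from a simple grouping step. The Python sketch below illustrates that bookkeeping under stated assumptions: the input is a hypothetical list of (peptide, protein) assignment pairs, the function names are illustrative, and a peptide shared by several proteins counts toward each of them. It is not PepMiner's implementation.

```python
from collections import defaultdict


def proteins_by_min_peptides(assignments, min_peptides):
    """Return proteins supported by at least `min_peptides` distinct peptides.

    assignments: iterable of (peptide_sequence, protein_id) pairs (hypothetical
    input format). Repeated observations of the same peptide count once, and a
    peptide mapped to several proteins counts toward each of them.
    """
    peptides_per_protein = defaultdict(set)
    for peptide, protein in assignments:
        peptides_per_protein[protein].add(peptide)
    return {prot for prot, peps in peptides_per_protein.items()
            if len(peps) >= min_peptides}


def overlap(ids, reference):
    """Number of IDs shared with a reference ID list."""
    return len(set(ids) & set(reference))


if __name__ == "__main__":
    # Toy data for illustration only; not PPP results.
    assignments = [
        ("PEPTIDEA", "P1"), ("PEPTIDEB", "P1"), ("PEPTIDEC", "P1"),
        ("PEPTIDED", "P2"), ("PEPTIDEE", "P2"),
        ("PEPTIDEF", "P3"),
        ("PEPTIDEA", "P1"),  # repeated observation, counted once
    ]
    reference = {"P1", "P3", "P9"}
    for k in (1, 2, 3):
        ids = proteins_by_min_peptides(assignments, k)
        # prints: "1 3 2", then "2 2 1", then "3 1 1"
        print(k, len(ids), overlap(ids, reference))
```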

Deutsch et al. [44], at the Institute for Systems Biology in Seattle, US, used SEQUEST with the PeptideProphet/ProteinProphet software developed by the Eng group to estimate error rates and the probability of correct assignment of spectra to peptide sequences and then to protein IDs [15, 45]. Analyzing the PPP datasets from laboratories 2, 12, 22, 28, 29, 34 (B1-heparin only), 37, and 40 with the PeptideAtlas process [46], they observed 6929 distinct peptides with a probability score >0.90, of which 6342 mapped to 1606 different EnsEMBL proteins and 1131 different EnsEMBL genes. Reduction of multiple mappings yielded 960 different proteins, of which 479 have matches in the PPP 3020.
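The workflow summarized above filters peptides by probability and then reduces multiple protein mappings. The Python sketch below shows one way to express those two steps. It is an illustration under assumptions, not the PeptideAtlas or ProteinProphet algorithm: the threshold mirrors the >0.90 cutoff in the text, the reduction step substitutes a plain greedy set cover (repeatedly pick the protein explaining the most remaining peptides), and all function names and identifiers are hypothetical.

```python
def confident_peptides(psms, threshold=0.90):
    """Distinct peptide sequences from (peptide, probability) pairs above threshold."""
    return {peptide for peptide, prob in psms if prob > threshold}


def greedy_parsimony(peptide_to_proteins):
    """Reduce multiple peptide-to-protein mappings with a greedy set cover.

    Repeatedly selects the protein that explains the most not-yet-explained
    peptides (ties broken alphabetically) until every mappable peptide is
    covered. Returns the selected proteins in selection order.
    """
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)
    # Peptides with no protein mapping cannot be explained, so skip them.
    unexplained = {pep for pep, prots in peptide_to_proteins.items() if prots}
    chosen = []
    while unexplained:
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


if __name__ == "__main__":
    # Hypothetical PSMs and mappings; not PPP or EnsEMBL data.
    psms = [("AAK", 0.95), ("AAK", 0.99), ("CCR", 0.92), ("DDE", 0.50), ("EEK", 0.97)]
    mapping = {"AAK": {"protA", "protB"}, "CCR": {"protA"},
               "DDE": {"protD"}, "EEK": {"protC"}}
    gene_of = {"protA": "geneX", "protB": "geneX", "protC": "geneY", "protD": "geneZ"}

    peptides = confident_peptides(psms)                    # AAK, CCR, EEK
    kept = {pep: mapping[pep] for pep in peptides if pep in mapping}
    mapped_proteins = set().union(*kept.values())          # protA, protB, protC
    mapped_genes = {gene_of[p] for p in mapped_proteins}   # geneX, geneY
    print(len(peptides), len(mapped_proteins), len(mapped_genes))  # 3 3 2
    print(greedy_parsimony(kept))                          # ['protA', 'protC']
```

Greedy set cover yields a small, but not necessarily minimal, protein list, and it handles shared peptides more crudely than probabilistic protein-inference tools.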

Kapp et al. [11] at the Ludwig Institute in Melbourne are applying MASCOT and the Digger software developed at Ludwig to submissions from 14 laboratories; their analyses, though still incomplete, show more than 500 high-confidence, non-redundant proteins from trypsin-constrained searches.

In addition, Beavis at the Manitoba Centre for Proteomics created a dataset of 16 191 EnsEMBL proteins from the PPP raw spectra using X!Tandem [47], of which 9497 matched to IPI v2.21, 3903 to our unintegrated list, and 2828 to our 9504 proteins based on one or more peptides. Of 5816 IPI proteins with two or more peptides, 1259 matched to the 5102 unintegrated proteins and 913 to the 3020 Core Dataset proteins.

Martens et al. [48] noted the value of these independent analyses in overcoming numerous sources of variation arising from the search algorithm, the database, and the investigator. They recommend that m/z peaklists routinely be made publicly available, but stop short of recommending the same for raw data, which currently lack standardized formats, let alone the infrastructure required for centralized storage and distribution. However, a plan to assure access to the raw spectra as well as the peaklists can facilitate wide dissemination and use of complex datasets. We have demonstrated this in the present collaboration, through use of the data by both the participating laboratories and the independent analysts, and through their incorporation into PRIDE by EBI, into PeptideAtlas by ISB and ETH, and into the Global Proteome Machine DataBase by Beavis.

It is striking that these independent analyses differed not only in the proteins they identified, but also in the peptides identified from the same MS/MS spectra that were the basis for the protein matches. Given the many sources of error in peptide identification [49], further improvements in software and analytical methods are needed; automated de novo sequencing can help, and chemical synthesis of peptides to determine their spectra directly can be employed selectively.
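Disagreement at the peptide level can be quantified spectrum by spectrum. The Python sketch below compares two hypothetical analyses of the same spectra, each represented as a dictionary from a spectrum identifier to the assigned peptide. Treating isoleucine and leucine as equivalent is an assumption made here because the two residues have identical mass; the function names and data are illustrative and do not correspond to any of the tools named above.

```python
def normalize(peptide):
    """Uppercase and treat isobaric I and L as the same residue (an assumption)."""
    return peptide.upper().replace("I", "L")


def compare_assignments(run_a, run_b):
    """Compare peptide assignments from two analyses of the same spectra.

    run_a, run_b: dicts mapping a spectrum identifier to the peptide sequence
    that analysis assigned to it (unassigned spectra are simply absent).
    Returns counts of spectra where both analyses agree or disagree, and of
    spectra identified by only one of them.
    """
    shared = run_a.keys() & run_b.keys()
    agree = sum(1 for s in shared if normalize(run_a[s]) == normalize(run_b[s]))
    return {
        "agree": agree,
        "disagree": len(shared) - agree,
        "only_a": len(run_a.keys() - run_b.keys()),
        "only_b": len(run_b.keys() - run_a.keys()),
    }


if __name__ == "__main__":
    # Hypothetical spectrum IDs and sequences, for illustration only.
    run_a = {"s1": "PEPTIDEK", "s2": "LAGK", "s3": "AAR"}
    run_b = {"s1": "PEPTLDEK", "s2": "VVGK", "s4": "MMR"}
    print(compare_assignments(run_a, run_b))
    # {'agree': 1, 'disagree': 1, 'only_a': 1, 'only_b': 1}
```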
