Protein identification data set

The identified protein accession numbers, names, search database and version, sequences of the identified peptides, and an estimate of confidence for each protein identification, plus any supporting information about PTMs (from experimental measurements, or other sources), and estimates of relative protein abundance in the specimen. Identification data sets were stored as peptide lists, reflecting the fact

Tab. 1 Comparison of the HUPO PPP data model with guidance for publishing peptide and protein identification data by Carr et al. [4]

| Guideline proposed by Carr et al. [4] | HUPO PPP data model |
| --- | --- |
| 1. Supporting information | |
| The method and/or program used to create the "peak list" from raw data and the parameters used in the creation of this peak list. | Data were collected as a part of free text description of performed experiments. Recommendation to use PEDRO tool was moot, since tool was not ready for use. |
| The name and version of the program(s) used for database searching and specific parameters used for its (their) operation. | Name of the search program collected, but not version or operation parameters. |
| Scores used to interpret MS/MS data and thresholds and values specific to judging certainty of identification, whether any statistical analysis was applied to validate the results, and a description of how it was applied. | Scores and thresholds were collected. |
| The name and version of sequence database used; the count of number of protein entries in it at the time searched. | Both name and version of the sequence database were collected. The sequence database itself was also recorded. |
| 2. Information regarding the observed sequence coverage | |
| Table that lists for each protein the sequences of all identified peptides. | Peptides (sequences) identified for each protein were collected. |
| To calculate the sequence coverage different forms of the same peptide are to be counted as only a single peptide. | All forms of identified peptides were collected, but as long as they have the same amino acid sequence they were counted only once. |
| The total number of MS/MS-interpreted spectra assigned to peptides corresponding to each protein. | Raw spectra were collected. |
| 3. Protein assignments based on single-peptide assignments | |
| The sequence of the peptide used to make each such assignment, together with the amino acids N- and C-terminals to that peptide's sequence. | Sequence of the peptide was collected but not the terminal information. |
| The precursor mass and charge. | The precursor charge state was collected as a part of the peptide data. The mass was requested as part of the peak list information. |
| The scores for this peptide. | Scores were collected. |
| 4. Biological conclusions based on observation of a single peptide matching to a protein | |
| Such conclusions must be supported by inclusion of the corresponding MS/MS spectrum. | Raw spectra were requested for all the MS/MS identifications (including single peptide). |
| 5. Peptide mass fingerprint data | |
| In addition to listing the number of masses matched to the identified protein, authors should also state the number of masses not matched in the spectrum and the sequence coverage observed. | Only peptides matched to the identified protein were collected. Sequence coverage was calculated. |
| Parameters and thresholds used to analyze the data. | Data collected only as a part of free text description of performed experiments. No particular information was requested. |
| 6. Ambiguous protein identifications | |
| The same protein appears in many cases under different names and accession numbers in the database. When matching peptides to members of such a family, it is the authors' responsibility to demonstrate that they are aware of the problem and have taken reasonable measures to eliminate redundancy. In cases where a single-protein member of a multiprotein family has been singled out, the authors should explain how the other members of the group were ruled out. | A data integration workflow was specially designed to address this problem. It is described in the following sections. |
| 7. Submission of MS/MS spectra | |
| Submission of all MS/MS spectra mentioned in the paper as supplemental material. The dta, pkl, and mgf files are accepted. | Raw spectra in the instrument native format were collected and are available on request. They may be converted to the other formats with use of special software. |
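The sequence-coverage rule in section 2 of Tab. 1 (different forms of a peptide with the same amino acid sequence count only once) can be illustrated with a minimal Python sketch. The function names and the modification notation (bracketed tags such as `[+80]` or marker characters such as `*`) are assumptions made for this illustration; they are not taken from the HUPO PPP data model.

```python
import re
from typing import List


def strip_modifications(peptide: str) -> str:
    """Reduce a peptide string to its bare amino acid sequence.

    Assumption: modifications are written as bracketed or parenthesised
    tags (e.g. "TAY[+80]IAK") or as non-letter markers (e.g. "M*PEK").
    """
    without_tags = re.sub(r"\[[^\]]*\]|\([^)]*\)", "", peptide)
    return re.sub(r"[^A-Za-z]", "", without_tags).upper()


def sequence_coverage(protein: str, peptides: List[str]) -> float:
    """Fraction of protein residues covered by at least one peptide.

    All forms of a peptide that share the same amino acid sequence
    are collapsed into one entry before coverage is computed.
    """
    if not protein:
        return 0.0
    unique_sequences = {strip_modifications(p) for p in peptides}
    covered = [False] * len(protein)
    for seq in unique_sequences:
        if not seq:
            continue
        # Mark every occurrence of the peptide in the protein sequence.
        start = protein.find(seq)
        while start != -1:
            for i in range(start, start + len(seq)):
                covered[i] = True
            start = protein.find(seq, start + 1)
    return sum(covered) / len(protein)


# Hypothetical example: two forms of TAYIAK count as a single peptide.
example_protein = "MKTAYIAKQRQISFVK"
example_peptides = ["TAYIAK", "TAY[+80]IAK", "QISFVK"]
coverage = sequence_coverage(example_protein, example_peptides)
```

In this made-up example the modified and unmodified forms of TAYIAK collapse to one sequence, so 12 of the 16 residues are covered.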

that some laboratories applied significant protein fractionation prior to tryptic digest and mass spectral analysis. In a pure "bottom up" strategy, any protein can contribute any peptide and no information is gained by retaining group structure for peptides. However, when protein fractionation is used, knowledge that a group of peptides were all derived from the same protein fraction can enhance the power of identification.
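One way to picture a peptide list that retains its fraction grouping is the following minimal Python sketch. The class names, field names, and the `fraction_id` label are hypothetical and chosen only for illustration; the actual HUPO PPP storage schema is not specified in this section.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PeptideIdentification:
    sequence: str      # amino acid sequence of the identified peptide
    charge: int        # precursor charge state
    score: float       # search-engine score
    fraction_id: str   # hypothetical label of the originating protein fraction


@dataclass
class ProteinIdentification:
    accession: str
    name: str
    database: str
    database_version: str
    confidence: float
    peptides: List[PeptideIdentification] = field(default_factory=list)


def group_by_fraction(
    peptides: List[PeptideIdentification],
) -> Dict[str, List[str]]:
    """Group peptide sequences by the protein fraction they came from."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for peptide in peptides:
        groups[peptide.fraction_id].append(peptide.sequence)
    return dict(groups)


# Hypothetical records: two peptides from fraction F1, one from F2.
records = [
    PeptideIdentification("TAYIAK", 2, 41.0, "F1"),
    PeptideIdentification("QISFVK", 2, 37.5, "F1"),
    PeptideIdentification("LVNELTEFAK", 3, 52.3, "F2"),
]
by_fraction = group_by_fraction(records)
```

Keeping `fraction_id` on each peptide record preserves the group structure described above, while a pure "bottom up" data set could simply use one fraction label for all peptides.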

