Data submission process

The data submission strategy was designed to make submission simple for the participants while keeping it error-proof and relatively easy to process for the data collection and integration center. As stated above, the consensus data model of the PPP pilot phase included only a limited representation of methods and results, to minimize the time commitment for participating experimentalists. Two submission methods were offered: (a) a combination of Microsoft Excel™, Microsoft Word™, and text forms, or (b) an XML schema-based file format (PEDRO [5, 6]). Those who chose the form-based submission were asked to fill out a set of preformatted Excel/Word/text document templates and submit them online using a web-based submission server at the University of Michigan. Those who chose the XML format were asked to generate one or more XML documents using the provided XML schema and email them to the European Bioinformatics Institute. The schema of the XML document allowed all the information to be collected in one hierarchically organized file. To generate the XML documents, the participants were encouraged to use the PEDRO data entry tool [6] or to export XML directly from their existing LIMS. The XML documents were checked for compliance with the schema and forwarded to the University of Michigan for further processing.
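The compliance check on incoming XML documents can be illustrated with a minimal sketch. It is not the validation procedure used in the project: it tests only well-formedness and the presence of a few required elements, whereas full XML Schema validation would need a schema-aware validator. The element names (`Sample`, `Experiment/Method`, `Experiment/Results`) and the function `check_submission` are hypothetical and are not taken from the PEDRO schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element paths; the real PEDRO schema defines its own structure.
REQUIRED_PATHS = ["Sample", "Experiment/Method", "Experiment/Results"]


def check_submission(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A document that is not well-formed cannot be checked any further.
        return [f"not well-formed: {exc}"]
    # Report every required element that is absent below the root element.
    return [
        f"missing element: {path}"
        for path in REQUIRED_PATHS
        if root.find(path) is None
    ]
```

In a workflow like the one described, a document returning an empty list would be forwarded for further processing, and any other result would be sent back to the submitting laboratory.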

During the course of the project, we decided to request the raw MS/MS spectra in the form of instrument files in the native format of the spectrometer. These files, sometimes in excess of several gigabytes, were too large to be collected by the standard data submission route; instead, CDs or DVDs were submitted to the University of Michigan Core and distributed to three groups for special cross-data-set analyses (see Omenn et al., Kapp et al., and Beer et al., this issue).

At the beginning of the project, each participating laboratory received two distinct identifiers: a numeric public identifier, used for interactions with the submission centers and other laboratories, and a three-character private code, known only to the laboratory and the central data analysis group. The private codes were used to create data surveys without disclosing the identity of submitters.
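The two-identifier scheme can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the sequential numbering, the code alphabet (uppercase letters and digits), and the function names `assign_identifiers` and `anonymized_survey` are all hypothetical.

```python
import secrets
import string

# Assumed alphabet for the three-character private code.
ALPHABET = string.ascii_uppercase + string.digits


def random_code(length=3):
    """Draw a random code of the given length from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def assign_identifiers(lab_names):
    """Give each lab a numeric public ID and a unique private code."""
    assigned = {}
    used_codes = set()
    for public_id, name in enumerate(lab_names, start=1):
        code = random_code()
        while code in used_codes:  # redraw on the rare collision
            code = random_code()
        used_codes.add(code)
        assigned[name] = {"public_id": public_id, "private_code": code}
    return assigned


def anonymized_survey(assigned, counts_by_lab):
    """Key per-lab counts by private code so that lab names are not exposed."""
    return {
        assigned[name]["private_code"]: count
        for name, count in counts_by_lab.items()
    }
```

A survey table built this way lets each laboratory locate its own row by its private code, while other readers cannot link any row to a named laboratory.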

