Analyses of human plasma and serum proteomes using HUPO filter criteria

All the data described above were analyzed using the XCorr/DCn/Sf criteria. From experience, we know that a substantial number of the proteins identified by a single peptide using these criteria are incorrect, and that the probability of a protein being correctly identified increases with the number of unique peptides identified. For biomarker discovery, however, a less stringent filter is preferable so that potentially interesting low-abundance proteins are not excluded from the analysis. This inevitably increases the number of false positives and requires more effort to verify the data generated. As the aim of the HUPO PPP is to provide the most accurate description of the human serum/plasma proteome possible, HUPO selected a more stringent filter (XCorr > 1.9 (z = 1), 2.2 (z = 2), 3.75 (z = 3); DCn > 0.1; and RSp < 4) to minimize false identifications. The analysis of our plasma and serum data sets using the HUPO criteria is summarized in Tab. 1. With the more stringent HUPO criteria, the number of nonredundant proteins identified by ≥2 peptides in the smaller plasma data set was reduced by only 5.9%, while the single-peptide proteins were reduced by 28.3%. A larger reduction was observed with the serum data set, where proteins identified by ≥2 peptides and by a single peptide were reduced by 30.9 and 34.6%, respectively. In both data sets, the reduction in proteins identified by ≥2 peptides came mainly from the two-peptide proteins (Tab. 1). This is consistent with the expectation that identification errors are most likely for single-peptide proteins, followed by two-peptide proteins, and least likely for proteins identified by more than two peptides.
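As an illustration of how the charge-dependent thresholds quoted above combine, the following is a minimal sketch of such a filter. It is not the in-house software used in this study: the `PeptideHit` record layout and the function name are hypothetical, the strict inequalities simply mirror the text, and the third example hit is made up. The first two hits are taken from Tab. 2.

```python
from dataclasses import dataclass

# Thresholds as quoted in the text (strict inequalities, as written there).
MIN_XCORR = {1: 1.9, 2: 2.2, 3: 3.75}  # keyed by charge state z
MIN_DCN = 0.1
MAX_RSP = 4


@dataclass(frozen=True)
class PeptideHit:
    """One SEQUEST peptide match (hypothetical record layout)."""
    sequence: str
    z: int        # precursor charge state
    xcorr: float  # cross-correlation score
    dcn: float    # delta correlation (DCn)
    rsp: int      # rank of the preliminary score (RSp)


def passes_hupo_filter(hit: PeptideHit) -> bool:
    """Return True only if the hit meets all three quoted criteria."""
    min_xcorr = MIN_XCORR.get(hit.z)
    if min_xcorr is None:
        # Charge states other than 1-3 are not covered by the quoted criteria.
        return False
    return hit.xcorr > min_xcorr and hit.dcn > MIN_DCN and hit.rsp < MAX_RSP


hits = [
    PeptideHit("VHDVNDNWPVFTHR", 3, 4.08, 0.52, 1),  # from Tab. 2
    PeptideHit("HLACLPR", 2, 2.38, 0.11, 1),         # from Tab. 2
    PeptideHit("AAAAAK", 2, 2.10, 0.25, 1),          # made up: XCorr too low for z = 2
]
accepted = [h.sequence for h in hits if passes_hupo_filter(h)]
print(accepted)
```

Keeping the XCorr cutoffs in a dictionary keyed by charge state makes the charge dependence explicit and leaves the DCn and RSp tests independent of z, as in the quoted criteria.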

In this study, nonredundant proteins are defined as proteins with different accession numbers. The in-house software used for the analysis of these data sets did not eliminate potential redundancy caused by SEQUEST assigning the same peptide, found in different pixels, to different but homologous database entries. To address this issue, the plasma and serum data sets were reanalyzed using the DTASelect program [30], which is capable of grouping redundant identifications. The program was used to filter peptides using the HUPO high-stringency criteria. Proteins that were subsets of others or contained the description 'keratin' were removed. A total of 576 and 2725 nonredundant proteins were reported for the plasma and serum data sets, respectively, using the DTASelect program, compared with 575 and 2890 proteins using our in-house software. Therefore, the redundancy in our analysis is minimal.
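The idea of removing proteins that are subsets of others can be sketched as follows. This is only an illustration of the principle, not DTASelect's actual algorithm: the function name, the input layout (accession mapped to a set of peptide sequences) and the choice to drop keratins before the subset test are all assumptions, and the accessions and peptides in the example are made up.

```python
def remove_redundant(proteins, descriptions):
    """Drop keratins and proteins whose peptides are a subset of another's.

    proteins:     dict mapping accession -> iterable of peptide sequences
    descriptions: dict mapping accession -> database description string
    """
    # Step 1 (assumed order): discard entries described as keratin.
    kept = {
        acc: frozenset(peps)
        for acc, peps in proteins.items()
        if "keratin" not in descriptions.get(acc, "").lower()
    }

    nonredundant = {}
    # Step 2: visit larger peptide sets first, so that any superset has
    # already been kept by the time a subset is examined.
    for acc in sorted(kept, key=lambda a: (-len(kept[a]), a)):
        peps = kept[acc]
        if any(peps <= other for other in nonredundant.values()):
            continue  # subset of, or identical to, a protein already kept
        nonredundant[acc] = peps
    return nonredundant


# Made-up example: P2 is a subset of P1, and K1 is a keratin.
proteins = {
    "P1": {"AAK", "BBK", "CCK"},
    "P2": {"AAK", "BBK"},
    "P3": {"DDK"},
    "K1": {"EEK"},
}
descriptions = {
    "P1": "Protein one",
    "P2": "Protein one, homologous entry",
    "P3": "Protein three",
    "K1": "Keratin, type I",
}
result = remove_redundant(proteins, descriptions)
print(sorted(result))
```

Entries with identical peptide sets are collapsed to the one with the alphabetically first accession, which is an arbitrary tie-break chosen for this sketch.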

Tab. 2 Examples of low-abundance proteins (<100 ng/mL) and the corresponding peptides identified in the human plasma and serum samples

| Sample | Name | ng/mL a) | Sequence | z | XCorr | DCn | RSp |
|---|---|---|---|---|---|---|---|
| Plasma | Vascular endothelial-cadherin | 30 | VHDVNDNWPVFTHR | 3 | 4.08 | 0.52 | 1 |
| Serum | Vascular endothelial-cadherin | 30 | DTGENLETPSSFTIK | 2 | 4.32 | 0.43 | 1 |
| | | | EYFAIDNSGR | 2 | 2.59 | 0.55 | 1 |
| | | | KPLIGTVLAM*DPDAAR | 3 | 3.76 | 0.32 | 1 |
| | | | VDAETGDVFAIER | 2 | 3.75 | 0.61 | 1 |
| | | | VHDVNDNWPVFTHR | 2 | 3.45 | 0.51 | 1 |
| | | | YEIVVEAR | 2 | 2.39 | 0.27 | 1 |
| Plasma | L-selectin | 17 | NKEDCVEIYIK | 2 | 3.56 | 0.38 | 1 |
| Serum | L-selectin | 17 | NKEDCVEIYIK | 2 | 4.29 | 0.34 | 1 |
| | | | SLTEEAENWGDGEPNNK | 2 | 4.38 | 0.50 | 1 |
| | | | SLTEEAENWGDGEPNNKK | 2 | 5.22 | 0.61 | 1 |
| | | | SYYWIGIR | 2 | 2.65 | 0.35 | 1 |
| | | | TICESSGIWSNPSPICQK | 2 | 5.33 | 0.50 | 1 |
| Plasma | Metalloproteinase inhibitor 1 | 14 | GFQALGDAADIR | 2 | 3.72 | 0.45 | 1 |
| Serum | Metalloproteinase inhibitor 1 | 14 | GFQALGDAADIR | 2 | 3.05 | 0.26 | 1 |
| | | | HLACLPR | 2 | 2.38 | 0.11 | 1 |
| | | | LQSGTHCLWTDQLLQGSEK | 3 | 5.32 | 0.52 | 1 |
| Serum | Vascular endothelial growth factor D | 0.500 | SEQQIRAASSLEELLR | 2 | 2.47 | 0.13 | 2 |
| Serum | Calcitonin | 0.190 | SALESSPADPATLSEDEAR | 2 | 2.24 | 0.15 | 3 |
| Serum | Tumor necrosis factor (TNF-α) | 0.041 | PWYEPIYLGGVFQLEK | 2 | 2.87 | 0.30 | 1 |

a) Concentration values were obtained from [27]
* Indicates methionine oxidation

When both data sets were combined, a total of 3146 nonredundant proteins were identified using the HUPO criteria (Tab. 1). Of these, 567 (18.0%) were identified by ≥2 unique peptides and 82.0% were single-peptide proteins. Since immunoglobulins were depleted in both samples, they constituted only 1.3% of the total nonredundant proteins (or 3.2% of proteins with ≥2 peptides) identified in the combined data set (Tab. 1). The number of proteins common to both data sets is only 319 and is limited by the lower-sensitivity method used in the plasma analysis (Tab. 1). In addition, 92.5% of the proteins with ≥2 peptides in plasma were identified in the serum analysis, but only 40.6% of serum proteins with ≥2 peptides were identified in the plasma analysis. However, 49.5% of the common proteins identified in plasma are single-peptide proteins. Since these single-peptide proteins were also identified using a different instrument and sample, a large percentage of the single-peptide proteins identified in plasma are probably correct. In support of this, many of the single-peptide proteins identified in plasma, as well as in serum, have rich MS/MS fragmentation patterns that agree well with the peptide sequences assigned by SEQUEST. Examples of the MS/MS spectra for single-peptide proteins identified in both data sets are shown in Fig. 7. All of the major peaks in both MS/MS spectra can be accounted for by fragment ions from the predicted peptide sequences, indicating that the peptide assignment is correct. Of particular interest is the protein creatine kinase M which, in the MB isoform, is an important serum marker for myocardial infarction [31]. Therefore, even though the single-peptide protein category contains the most false positives, it also contains many important correct entries that cannot be ignored.
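The counts quoted for the combined data set can be cross-checked with simple set arithmetic. The sketch below assumes that the in-house totals of 575 (plasma) and 2890 (serum) are the HUPO-filtered counts that were merged; under that assumption, inclusion-exclusion with the 319 common proteins reproduces the combined total. The function name and the toy accession sets are hypothetical.

```python
def overlap_summary(plasma_ids, serum_ids):
    """Summarize the overlap between two collections of accession numbers."""
    plasma, serum = set(plasma_ids), set(serum_ids)
    return {
        "combined": len(plasma | serum),    # nonredundant union
        "common": len(plasma & serum),      # found in both data sets
        "plasma_only": len(plasma - serum),
        "serum_only": len(serum - plasma),
    }


# Toy example with made-up accessions.
toy = overlap_summary({"A", "B", "C"}, {"B", "C", "D", "E"})
print(toy)

# Consistency check on the counts reported in the text
# (assumption: 575 and 2890 are the HUPO-filtered in-house totals).
PLASMA_TOTAL, SERUM_TOTAL, COMMON = 575, 2890, 319
combined_total = PLASMA_TOTAL + SERUM_TOTAL - COMMON
multi_peptide_share = round(100 * 567 / combined_total, 1)
print(combined_total, multi_peptide_share)
```

Under the stated assumption, the arithmetic gives 3146 combined proteins, of which the 567 proteins with two or more peptides make up 18.0%, in agreement with the figures above.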

Fig. 7 Representative MS/MS spectra of low-abundance proteins identified by single-peptide matches. MS/MS spectra of the doubly charged ions with m/z 617.45 (GFQALGDAADIR) and m/z 755.22 (LSVEALNSLTGEFK) are shown.

Examples of low-abundance proteins identified in the plasma and serum samples using HUPO criteria are shown in Tab. 2. The list provides an estimate of the detection limit of the protein array pixelation strategy. Some proteins in the low ng/mL range can be detected from 45 mg of plasma using the LCQ Deca XP+, whereas some proteins in the pg/mL range can be detected from the 204 mg of serum analyzed using the LTQ mass spectrometer. Not surprisingly, the ability to detect low-abundance proteins decreases as protein abundance decreases. For example, among the low-abundance proteins described by Haab et al. [27] in their Tab. 2, 14 out of the 20 proteins in the 1 to 100 ng/mL concentration range were detected in our serum analysis, whereas only 2 out of 19 proteins at concentrations below 1 ng/mL were detected. In addition, most of the lower-abundance proteins identified in plasma are single-peptide proteins, whereas the same proteins were identified with multiple peptides in the serum analysis using the more sensitive linear IT mass spectrometer. This indicates that the use of the highly sensitive LTQ mass spectrometer coupled with our optimized method allows detection of proteins over a concentration range of 10^9.
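A rough calculation illustrates the detection rates and the concentration range quoted above. The serum albumin concentration used here (about 40 mg/mL) is a typical textbook value and an assumption of this sketch, not a figure from this study. The TNF-α concentration is taken from Tab. 2, and the detection counts are those quoted in the text.

```python
import math


def orders_of_magnitude(high, low):
    """Number of decades separating two concentrations in the same units."""
    return math.log10(high / low)


ALBUMIN_NG_PER_ML = 4.0e7    # assumption: ~40 mg/mL, not taken from the text
TNF_ALPHA_NG_PER_ML = 0.041  # lowest concentration listed in Tab. 2

span = orders_of_magnitude(ALBUMIN_NG_PER_ML, TNF_ALPHA_NG_PER_ML)
print(f"dynamic range: about {span:.1f} orders of magnitude")

# Detection rates in the serum analysis, from the counts quoted in the text
# (detected, total) for the proteins listed by Haab et al. [27].
detected = {"1-100 ng/mL": (14, 20), "<1 ng/mL": (2, 19)}
rates = {rng: round(100 * d / n, 1) for rng, (d, n) in detected.items()}
print(rates)
```

With these inputs the span comes out just under nine decades, and the detection rate falls from 70.0% in the 1 to 100 ng/mL range to about 10.5% below 1 ng/mL.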

This study demonstrates the utility of a novel 4-D protein profiling strategy, protein array pixelation, for comprehensive profiling of human plasma and serum proteomes. The four separations used in this strategy greatly reduce plasma/serum complexity, allowing access to proteins differing in abundance by up to nine orders of magnitude. Using HUPO criteria for high-confidence protein identifications, this strategy detected a total of 3104 nonredundant proteins, after excluding keratins and immunoglobulins. Of these identified proteins, 549 were identified with two or more unique peptides. Although larger amounts of sample are used for the early steps, the final LC-ESI-MS/MS analyses are based on very low amounts of sample (45 mg of plasma and 204 mg of serum). The total time required for analyzing each sample was similar to that of the MudPIT approaches described by others [12, 15]. Analysis of the HUPO serum sample (BDCA02) using the highly sensitive LTQ mass spectrometer and an optimized method produced a very rich data set that contained >90% of the proteins with two or more peptides identified in the plasma sample. Most importantly, many low-abundance proteins (from <100 ng/mL down to pg/mL levels) were identified in this data set. In conclusion, the protein array pixelation strategy is a powerful method for comprehensive protein profiling and for protein biomarker discovery.

We would like to thank Brian Haab for sharing data prior to publication. We are also grateful to John Yates, III from The Scripps Research Institute for providing the DTASelect software. This work was supported in part by the National Institutes of Health Grants CA94360 and CA77048 to D.W.S., and by institutional grants to the Wistar Institute, including an NCI Cancer Core Grant (CA10815) and the Commonwealth Universal Research Enhancement Program, Pennsylvania Department of Health.
