  • Brief Communication
Semi-supervised learning for peptide identification from shotgun proteomics datasets

Abstract

Shotgun proteomics uses liquid chromatography–tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
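The semi-supervised idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes decoy peptide-spectrum matches (PSMs) serve as negative examples, that target PSMs passing a q-value cutoff serve as provisional positives, and that a linear classifier is retrained for a few rounds. The function names, the 1% cutoff, the three iterations, the plain stochastic-gradient hinge-loss trainer (a stand-in for a real SVM solver) and the synthetic two-feature data are all assumptions made for illustration.

```python
import random


def q_values(scores, is_decoy):
    """Target-decoy q-values: the FDR at a score threshold is estimated as
    (#decoys) / (#targets) among PSMs scoring at or above that threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs, decoys, targets = [], 0, 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this threshold or any more lenient one.
    q = [0.0] * len(scores)
    best = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        best = min(best, fdrs[rank])
        q[order[rank]] = best
    return q


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, epochs=30, lr=0.01, lam=0.01, seed=0):
    """Stochastic gradient descent on the L2-regularized hinge loss
    (a simple stand-in for a dedicated SVM solver)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            violated = y[i] * (dot(w, X[i]) + b) < 1
            w = [wj * (1 - lr * lam) for wj in w]  # weight decay
            if violated:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def rescore(X, is_decoy, init_feature=0, q_cut=0.01, iters=3):
    """Iteratively retrain: decoys are negatives, confident targets positives."""
    scores = [x[init_feature] for x in X]  # start from one search-engine score
    for _ in range(iters):
        q = q_values(scores, is_decoy)
        pos = [i for i in range(len(X)) if not is_decoy[i] and q[i] <= q_cut]
        neg = [i for i in range(len(X)) if is_decoy[i]]
        if not pos:
            break  # no confident targets to learn from; keep current scores
        train = pos + neg
        labels = [1] * len(pos) + [-1] * len(neg)
        w, b = train_linear_svm([X[i] for i in train], labels)
        scores = [dot(w, x) + b for x in X]
    return scores


# Synthetic data (hypothetical): 200 correct targets, 200 incorrect targets
# and 400 decoys, each described by two noisy features.
rng = random.Random(1)


def psm(mean):
    return [rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)]


X = [psm(3.0) for _ in range(200)] + [psm(0.0) for _ in range(600)]
is_decoy = [False] * 400 + [True] * 400
new_scores = rescore(X, is_decoy)
```

In this sketch the learned score combines both features, whereas the starting score uses only the first one. A real analysis would use the features reported by the search engine and would guard against overfitting, for example with cross-validation.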

Figure 1: Comparison of SEQUEST post-processing methods.
Figure 2: A peptide that was re-ranked by Percolator.

Acknowledgements

This work was funded by US National Institutes of Health grants P41 RR011823 and R01 EB007057.

Author information

Contributions

M.J.M. conceived the initial idea of using decoy peptide-spectrum matches (PSMs) as negative examples. L.K. and W.S.N. proposed training a support vector machine with semi-supervised learning. L.K. implemented Percolator and performed the computational experiments. J.W. provided machine learning expertise. J.D.C. performed the initial proof-of-concept experiment and provided mass spectrometry expertise. W.S.N., L.K. and M.J.M. wrote the article.

Corresponding author

Correspondence to Michael J MacCoss.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4, Supplementary Tables 1 and 2, Supplementary Methods, Supplementary Data (PDF 1393 kb)

About this article

Cite this article

Käll, L., Canterbury, J., Weston, J. et al. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 4, 923–925 (2007). https://doi.org/10.1038/nmeth1113

  • DOI: https://doi.org/10.1038/nmeth1113
