Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides

Title: Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides
Publication Type: Journal Article
Year of Publication: 2006
Authors: Nesvizhskii AI, Roos FF, Grossmann J, Vogelzang M, Eddes JS, Gruissem W, Baginsky S, Aebersold R
Journal: Mol Cell Proteomics
Volume: 5
Pagination: 652-70
Date Published: Apr
PMID: 16352522
Keywords: *Polymorphism, Genetic; *Protein Processing, Post-Translational; *Proteomics; Alternative Splicing; Amino Acid Sequence; Mass Spectrometry; Molecular Sequence Data; Peptides/*metabolism
Abstract: In mass spectrometry-based proteomics, frequently hundreds of thousands of MS/MS spectra are collected in a single experiment. Of these, a relatively small fraction is confidently assigned to peptide sequences, whereas the majority of the spectra are not further analyzed. Spectra are not assigned to peptides for diverse reasons. These include deficiencies of the scoring schemes implemented in the database search tools, sequence variations (e.g. single nucleotide polymorphisms) or omissions in the database searched, post-translational or chemical modifications of the peptide analyzed, or the observation of sequences that are not anticipated from the genomic sequence (e.g. splice forms, somatic rearrangement, and processed proteins). To increase the amount of information that can be extracted from proteomic MS/MS datasets we developed a robust method that detects high quality spectra within the fraction of spectra unassigned by conventional sequence database searching and computes a quality score for each spectrum. We also demonstrate that iterative search strategies applied to such detected unassigned high quality spectra significantly increase the number of spectra that can be assigned from datasets and that biologically interesting new insights can be gained from existing data.
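
The workflow outlined in the abstract (a conventional search first, then further searches restricted to unassigned spectra of high quality) might be sketched as follows. This is a toy illustration only, not the authors' implementation: the `Spectrum` class, the `iterative_search` function, the `min_quality` threshold, and the lookup tables standing in for search engines are all hypothetical, and the quality score is taken as given rather than computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Spectrum:
    scan_id: str
    quality: float                  # hypothetical precomputed quality score in [0, 1]
    peptide: Optional[str] = None   # assigned peptide, or None if unassigned


# A search pass maps a spectrum to a peptide string, or None if nothing matches.
SearchPass = Callable[[Spectrum], Optional[str]]


def iterative_search(
    spectra: List[Spectrum],
    conventional: SearchPass,
    follow_ups: List[SearchPass],
    min_quality: float = 0.5,       # assumed cutoff, chosen only for illustration
) -> List[Spectrum]:
    # Pass 1: conventional search on every spectrum.
    for s in spectra:
        s.peptide = conventional(s)
    # Later passes: only spectra that are still unassigned AND of high quality.
    for search in follow_ups:
        for s in spectra:
            if s.peptide is None and s.quality >= min_quality:
                s.peptide = search(s)
    return spectra


# Toy stand-ins for real search engines: plain lookups keyed by scan id.
standard_hits: Dict[str, str] = {"s1": "PEPTIDEK"}
modified_hits: Dict[str, str] = {"s2": "PEPT[+80]IDEK", "s3": "LOWQUALK"}

spectra = [Spectrum("s1", 0.9), Spectrum("s2", 0.8), Spectrum("s3", 0.1)]
result = iterative_search(
    spectra,
    conventional=lambda s: standard_hits.get(s.scan_id),
    follow_ups=[lambda s: modified_hits.get(s.scan_id)],
)
```

In this toy run, `s1` is assigned by the conventional pass, `s2` is recovered by the follow-up pass, and `s3` stays unassigned because its quality falls below the assumed cutoff, so the more expensive follow-up search is never spent on it.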
