Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with Complex Patterns of Posttranslational Modifications

Shenheng Guan and Alma L. Burlingame



Problem

Input: an MS/MS spectrum of a mixture of peptides:
  • Heavily modified protein
  • Same amino acid sequence
  • Same PTM
  • Same total number of PTMs
  • Different PTM configurations

Example: two peptides with two methylations each.

LATK[+32]AARKSAE

LATK[+16]AARK[+16]SAE

Problem:
  • Identify the PTM configurations
  • Estimate their relative abundance


Work flow


Peptide identification

Input:
  • A deisotoped MS/MS spectrum of a mixture of peptides
  • An identified peptide, the type of PTM, and the number of PTMs

Example:
  • Peptide: LATKAARKSAPATGGVKKPHRYRPGTVALRE
  • PTM: Methylation
  • #PTM: 4

Problem:
  • Identify the PTM configurations
  • Estimate their relative abundance


All possible configurations

Assumptions:
  • All methylations are on lysine residues
  • Each lysine residue has at most 3 methyl groups
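Under these two assumptions the candidate set can be enumerated directly. The following is a minimal sketch, not the authors' code: the function name and its parameters are hypothetical, and a configuration is represented simply as the number of methyl groups on each lysine, in sequence order.

```python
from itertools import product

def enumerate_configurations(peptide, n_ptm, site="K", max_per_site=3):
    """Return every way to place n_ptm methyl groups on the `site` residues.

    Each configuration is a tuple giving the number of methyl groups on each
    site residue, in sequence order. (Hypothetical helper for illustration.)
    """
    positions = [i for i, aa in enumerate(peptide) if aa == site]
    configs = [
        counts
        for counts in product(range(max_per_site + 1), repeat=len(positions))
        if sum(counts) == n_ptm
    ]
    return positions, configs

peptide = "LATKAARKSAPATGGVKKPHRYRPGTVALRE"
positions, configs = enumerate_configurations(peptide, n_ptm=4)
print(positions)     # 0-based positions of the lysine residues
print(len(configs))  # number of candidate configurations
```

For the example peptide (four lysines, four methyl groups) this yields 31 candidates, which is small enough to score exhaustively.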


Configuration identification

Score of a spectrum-configuration pair:
  • Spectrum S: ETD peak list
  • Configuration C: theoretical peak list (c-ions)
  • Sc(S,C) is the number of peaks matched between the real peak list and the theoretical peak list

Greedy algorithm:
  • Compute the matching score for each configuration
  • Remove the configuration with the highest score from the configuration set, and remove the peaks in S that are matched to that configuration
  • Repeat the above steps until all configurations have score 0
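The greedy loop can be sketched in a few lines. This is an illustration under stated assumptions: peaks are plain m/z values, a peak counts as matched when it lies within a fixed tolerance `tol` of a theoretical peak, and the function names and toy masses are hypothetical rather than taken from the slides.

```python
def matched_peaks(spectrum, theoretical, tol=0.01):
    """Return the observed peaks lying within `tol` of any theoretical peak."""
    return {s for s in spectrum if any(abs(s - t) <= tol for t in theoretical)}

def greedy_identify(spectrum, configurations, tol=0.01):
    """Greedily pick configurations by number of matched peaks.

    `configurations` maps a configuration name to its theoretical c-ion
    m/z list. Returns the selected names in the order they were chosen.
    """
    remaining_peaks = set(spectrum)
    candidates = dict(configurations)
    selected = []
    while candidates:
        # Score every remaining configuration against the remaining peaks.
        matches = {name: matched_peaks(remaining_peaks, peaks, tol)
                   for name, peaks in candidates.items()}
        best = max(matches, key=lambda name: len(matches[name]))
        if not matches[best]:
            break  # every remaining configuration has score 0
        selected.append(best)
        remaining_peaks -= matches[best]  # explained peaks are used up
        del candidates[best]
    return selected

# Toy m/z values chosen only to exercise the loop.
spectrum = [100.0, 200.0, 300.0, 400.0, 214.0, 314.0]
configurations = {
    "C1": [100.0, 200.0, 300.0, 400.0],
    "C2": [100.0, 214.0, 314.0, 414.0],
    "C3": [100.0, 200.0, 314.0, 414.0],
}
selected = greedy_identify(spectrum, configurations)
print(selected)
```

In the toy data C3 is never selected: once C1 and C2 have consumed their peaks, nothing is left to support it.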


Configuration identification results


Estimation of relative abundance

We have four identified configurations C1, C2, C3, C4, with relative abundances x1, x2, x3, x4 that sum to 1.

Consider the ith c-ion with charge z:
  • Five possible peaks p0, …, p4 (the c-ion carrying 0 to 4 methyl groups)
  • Suppose p2 is matched to C1 and C2
  • Observed peak intensity: I(p2)
  • Theoretical peak intensity: (x_1 + x_2) \sum_{j=0}^{4} I(p_j)

Compute the observed and theoretical peak intensity pair for each matched c-ion.


Estimation of relative abundance

Find x1, x2, x3, x4 such that the sum of the squared errors over these intensity pairs is minimized.

This is solved with a standard non-negative least-squares procedure.
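A small numerical sketch of this fit follows. The slides do not name a solver, so the code substitutes a plain projected-gradient loop for a library non-negative least-squares routine. The toy data (three configurations, two c-ions) and the choice to express intensities as fractions of each c-ion's summed intensity are assumptions made for illustration.

```python
def nnls_projected_gradient(A, b, iters=5000):
    """Minimize 0.5 * ||A x - b||^2 subject to x >= 0 (projected gradient)."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n_cols] * n_cols
    for _ in range(iters):
        residual = [sum(A[i][k] * x[k] for k in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        grad = [sum(A[i][k] * residual[i] for i in range(n_rows))
                for k in range(n_cols)]
        # Gradient step followed by projection onto x >= 0.
        x = [max(0.0, x[k] - step * grad[k]) for k in range(n_cols)]
    return x

# Hypothetical toy data. Each row marks which configurations a matched
# peak is assigned to; b holds the observed intensity fractions.
A = [
    [1, 0, 0],  # c-ion a, peak matched to C1
    [0, 1, 1],  # c-ion a, peak matched to C2 and C3
    [1, 1, 0],  # c-ion b, peak matched to C1 and C2
    [0, 0, 1],  # c-ion b, peak matched to C3
]
b = [0.5, 0.5, 0.8, 0.2]

x = nnls_projected_gradient(A, b)
total = sum(x)
abundances = [v / total for v in x]  # rescale so the abundances sum to 1
print([round(v, 3) for v in abundances])
```

Rescaling after the fit is one simple way to honor the sum-to-one condition. In practice a library solver such as `scipy.optimize.nnls` would replace the hand-written loop.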


A Novel Approach for Untargeted Post-translational Modification Identification Using Integer Linear Optimization and Tandem Mass Spectrometry

Richard C. Baliban, Peter A. DiMaggio, Mariana D. Plazas-Mayorca, Nicolas L. Young, Benjamin A. Garcia and Christodoulos A. Floudas


Bottom-up PTM identification

Two approaches:
  • Tags
  • Non-tags

Restricted Unrestricted

PILOT_PTM


Preprocessing

  • Remove all peaks related to the precursor ion
  • Keep only locally significant peaks
  • Deisotope
  • Remove the neutral offset if the peak does not have a complementary peak
  • Each candidate peak has a list of supporting peaks


ILP Model

Input:
  • A preprocessed, deisotoped spectrum S = {a1, a2, …, am}
  • A peptide (theoretical b-ion peak list) P = {b1, b2, …, bn}
  • A list of all known PTMs

Theoretical peak bk:
  • CSk is the set of all peaks (indices) in S that bk can be matched to with PTMs

Real peak aj:
  • Posj is the set of all peaks (indices) in P that aj can be matched to with PTMs
  • Supportj is the set of all peaks (indices) supporting peak j in S
  • Multj is the set of all peaks (indices) that peak j supports


ILP Model

Binary variables:
  • pj,k = 1 if peak aj in S is matched to bk in P, otherwise pj,k = 0
  • yj = 1 if peak aj is a supporting peak or a matched peak, otherwise yj = 0


ILP Model

Objective

Subject to:
  • One peak in P can only match one peak in S
  • One peak in S can only match one peak in P
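The two matching constraints can be written with the binary variables pj,k and the index sets CSk and Posj. The form below is a plausible reconstruction from the verbal statements, not a copy of the slide's own formulas, which did not survive in this transcript.

```latex
\sum_{j \in CS_k} p_{j,k} \le 1 \qquad \forall\, k \in \{1, \dots, n\}

\sum_{k \in Pos_j} p_{j,k} \le 1 \qquad \forall\, j \in \{1, \dots, m\}
```

The first inequality limits each theoretical peak bk to at most one observed peak. The second limits each observed peak aj to at most one theoretical peak.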


ILP Model

Subject to:
  • No three consecutive missing peaks
  • The intensity of peak i is counted iff there exists a peak j such that peak i supports j and peak j is a matched peak
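The matching constraints are easy to check for a given assignment. The sketch below tests a candidate matching against the one-to-one rules and against one reading of "no three consecutive missing peaks", namely that every window of three consecutive theoretical peaks contains at least one match. The function name and the 0-based indexing are assumptions made for illustration.

```python
def is_feasible(matches, n_theoretical):
    """Check a candidate matching against the verbal matching constraints.

    `matches` is a list of (j, k) pairs: observed peak j matched to
    theoretical peak k, with k running from 0 to n_theoretical - 1.
    """
    observed = [j for j, _ in matches]
    theoretical = [k for _, k in matches]
    # Each observed peak and each theoretical peak is used at most once.
    if len(set(observed)) != len(observed):
        return False
    if len(set(theoretical)) != len(theoretical):
        return False
    # Every window of three consecutive theoretical peaks has a match.
    matched = set(theoretical)
    for k in range(n_theoretical - 2):
        if not ({k, k + 1, k + 2} & matched):
            return False
    return True

print(is_feasible([(0, 0), (1, 2), (2, 5)], n_theoretical=7))
print(is_feasible([(0, 0), (1, 4)], n_theoretical=6))
```

The second call fails because theoretical peaks 1, 2, and 3 are all unmatched.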


ILP Model

  • Solve using CPLEX
  • Report the top-10 variable assignments

Existing problem: no constraint requires that the distance between two neighboring matched peaks match the mass of a residue (with PTM).
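To make the missing requirement concrete, the sketch below tests whether the mass difference between two neighboring matched peaks can be explained by a single residue, optionally carrying one PTM. The residue and PTM tables are small subsets of standard monoisotopic masses. The function name, the tolerance, and the one-PTM-per-residue limit are assumptions for illustration only.

```python
# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111,
}
# Monoisotopic mass shifts (Da) for a few common PTMs.
PTM_MASS = {"methyl": 14.01565, "acetyl": 42.01057, "phospho": 79.96633}

def explain_jump(mass_diff, tol=0.01):
    """Return (residue, ptm) pairs whose mass matches mass_diff within tol.

    `ptm` is None when the unmodified residue alone explains the jump.
    """
    hits = []
    for aa, mass in RESIDUE_MASS.items():
        if abs(mass_diff - mass) <= tol:
            hits.append((aa, None))
        for ptm, shift in PTM_MASS.items():
            if abs(mass_diff - (mass + shift)) <= tol:
                hits.append((aa, ptm))
    return hits

print(explain_jump(142.11061))  # a methylated lysine
print(explain_jump(71.03711))   # alanine, or glycine plus a methyl group
print(explain_jump(50.0))       # no residue explains this jump
```

The second call shows why such constraints still leave ambiguity: a methylated glycine has the same mass as an unmodified alanine.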


New constraints

For each pj,k:
  • The set of candidate ion peaks j' with respect to k' such that no valid jump exists between j and j'
  • The maximum and minimum masses that can be reached from j, respectively


New constraints

Neighboring matched peaks do not conflict

Conflicting matched peaks must have a matched peak between them

The distance between two matched peaks should be bounded


Postprocessing

  • Re-score the 10 candidate modified peptides using a cross-correlation score
  • Recheck modifications if there are unmatched peaks indicating non-modification


Test data sets

  • Test set A: 44 CID spectra (ion trap) and 174 ETD spectra (Orbitrap) of chemically synthesized phosphopeptides, manually validated
  • Test set B: 58 ECD spectra (FTICR) of histone H3-(1–50) N-terminal tail, manually validated
  • Test set C: 553 CID spectra (Orbitrap) of propionylated histone fragments, manually validated
  • Test set D: 525 modified and 6025 unmodified CID spectra (Orbitrap) from a chromatin fraction, identified by SEQUEST and validated by MASCOT, with low-quality spectra removed manually
  • Test set E: 36 (ion trap), 37 (Q-TOF), and 4061 (Orbitrap) unmodified CID spectra, validated in the same way as test set D


Residue prediction accuracy


Peptide prediction accuracy


Comparison on test sets C and D1

Peptide and residue prediction accuracy


Comparison on test sets C and D1

Subsequence prediction accuracy


Running time


Q & A