Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with
Complex Patterns of Posttranslational Modifications
Shenheng Guan and Alma L. Burlingame
Problem
Input: An MS/MS spectrum of a mixture of peptides:
    Heavily modified protein
    Same amino acid sequence
    Same PTM
    Same total number of PTMs
    Different PTM configurations
Example: Two peptides with two methylations each.
LATK[+32]AARKSAE
LATK[+16]AARK[+16]SAE
Problem:
    Identify the PTM configurations
    Estimate their relative abundance
Work flow
Peptide identification
Input
    A deisotoped MS/MS spectrum of a mixture of peptides
    An identified peptide, the type of PTMs, and the number of PTMs
Example
    Peptide: LATKAARKSAPATGGVKKPHRYRPGTVALRE
    PTM: Methylation
    #PTM: 4
Problem
    Identify the PTM configurations
    Estimate their relative abundance
All possible configurations
Assumptions:
    All methylations are on lysine residues
    Each lysine residue has at most 3 methyl groups
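The candidate set implied by these two assumptions can be enumerated directly. The following is a minimal sketch, not the authors' code: the function name `enumerate_configurations` and its parameters are hypothetical, and a configuration is represented simply as a tuple of methyl counts, one per lysine.

```python
from itertools import product

def enumerate_configurations(peptide, n_ptm, max_per_site=3, site="K"):
    """Return every way to place n_ptm groups on the `site` residues.

    Each configuration is a tuple with one count per site residue, in
    sequence order; each count lies between 0 and max_per_site.
    (Function name and representation are assumptions for illustration.)
    """
    positions = [i for i, aa in enumerate(peptide) if aa == site]
    configs = [
        counts
        for counts in product(range(max_per_site + 1), repeat=len(positions))
        if sum(counts) == n_ptm
    ]
    return positions, configs

# Example peptide from the slides: 4 methylations, at most 3 per lysine.
positions, configs = enumerate_configurations(
    "LATKAARKSAPATGGVKKPHRYRPGTVALRE", n_ptm=4
)
print(positions)     # 0-based indices of the lysine residues
print(len(configs))  # number of candidate configurations
```

For this peptide the sketch finds four lysines and 31 candidate configurations, small enough to score exhaustively.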
Configuration identification
Score of Spectrum-Configuration-Pair
    Spectrum S: ETD peak list
    Configuration C: theoretical peak list (c-ion)
    Sc(S,C) is the number of matched peaks in the real peak list and the theoretical peak list.
Greedy algorithm
    Compute the matching score for each configuration
    Remove the configuration with the highest score from the configuration set, and remove the peaks in S that are matched to that configuration
    Repeat the above steps until all configurations have score 0
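The greedy loop above can be sketched as follows. This is an illustrative version under stated assumptions: peaks are plain m/z floats, matching uses an absolute tolerance `tol` (a hypothetical parameter with an arbitrary default), and the theoretical c-ion lists are supplied by the caller instead of being computed from residue masses.

```python
def greedy_identify(spectrum_peaks, theoretical, tol=0.01):
    """Greedy selection of configurations by matched-peak count.

    spectrum_peaks: observed m/z values (the ETD peak list S).
    theoretical: dict mapping configuration name -> list of c-ion m/z values.
    tol: absolute m/z matching tolerance (an assumed value).
    Returns a list of (configuration name, score) in selection order.
    """
    remaining = list(spectrum_peaks)
    candidates = dict(theoretical)
    selected = []

    def matched(theo):
        # Observed peaks lying within tol of any theoretical peak.
        return [p for p in remaining if any(abs(p - t) <= tol for t in theo)]

    while candidates:
        scores = {name: len(matched(theo)) for name, theo in candidates.items()}
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            break  # every remaining configuration has score 0
        hits = matched(candidates.pop(best))
        selected.append((best, len(hits)))
        # Peaks explained by the chosen configuration are removed from S.
        remaining = [p for p in remaining if p not in hits]
    return selected

# Toy data: made-up m/z values, not real c-ion masses.
toy_spectrum = [100.0, 200.0, 300.0, 400.0]
toy_theoretical = {
    "C1": [100.0, 200.0, 300.0],
    "C2": [300.0, 400.0],
    "C3": [100.0, 200.0],
}
print(greedy_identify(toy_spectrum, toy_theoretical))
```

In the toy run, C1 is picked first with three matches; once its peaks are removed, C2 keeps one match and C3 drops to zero, so C3 is never reported.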
Configuration identification results
Estimation of relative abundance
We have four identified configurations C1, C2, C3, C4.
x1, x2, x3, x4: the relative abundances (their sum equals 1)
Consider the ith c-ion with charge z
Five possible peaks p0, …, p4
Suppose p2 is matched to C1, C2
Observed peak intensity: I(p2)
Theoretical peak intensity: $(x_1 + x_2) \sum_{j=0}^{4} I(p_j)$
Compute the observed and theoretical peak intensity pair for each matched c-ion
Estimation of relative abundance
Find x1, x2, x3, x4 such that the sum of the squared errors of these intensity pairs is minimized.
Standard non-negative least-squares procedure
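Written out, the fit might look like the following. The notation is an assumption introduced here for illustration, not taken from the slides: $k$ indexes the matched peaks, $O_k$ is the observed intensity of matched peak $k$, $M_k$ is the set of configurations matched to it, and $T_k$ is the summed intensity of all candidate peaks of the corresponding c-ion.

```latex
\min_{x_1,\dots,x_4 \,\ge\, 0} \;
  \sum_{k} \Bigl( O_k - T_k \sum_{i \in M_k} x_i \Bigr)^{2},
\qquad
T_k = \sum_{j=0}^{4} I(p_j)
```

For the example above, where p2 is matched to C1 and C2, the corresponding term would be $\bigl(I(p_2) - (x_1 + x_2)\,T\bigr)^2$. The problem is linear in the $x_i$, which is why a non-negative least-squares solver applies.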
A Novel Approach for Untargeted Post-translational Modification Identification Using Integer Linear Optimization and Tandem Mass Spectrometry
Richard C. Baliban, Peter A. DiMaggio, Mariana D. Plazas-Mayorca, Nicolas L. Young, Benjamin A. Garcia and Christodoulos A. Floudas
Bottom up PTM identification
Two approaches
    Tags
    Non-tags
Restricted
Unrestricted
PILOT_PTM
Preprocessing
Remove all peaks related to the precursor ion
Only keep locally significant peaks
Deisotope
Remove neutral offset if the peak does not have a complementary peak.
Each candidate peak has a list of supporting peaks.
ILP Model
Input
    A preprocessed deisotoped spectrum S = {a1, a2, …, am}
    A peptide (theoretical b-ion peak list) P = {b1, b2, …, bn}
    A list of all known PTMs
Theoretical peak bk
    CSk is the set of all possible peaks (indices) in S that bk can be matched to with PTMs
Real peak aj
    Posj is the set of all possible peaks (indices) in P that aj can be matched to with PTMs
    Supportj is the set of all peaks (indices) supporting peak j in S
    Multj is the set of all peaks (indices) peak j supports
ILP Model
Binary variables
pj,k = 1 if peak aj in S is matched to bk in P, otherwise pj,k = 0
yj = 1 if peak aj is a supporting peak or matched peak, otherwise yj = 0
ILP Model
Objective
Subject to
    One peak in P can only match one peak in S
    One peak in S can only match one peak in P
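The formulas for these two constraints did not survive in the slide text. A usual linear form for one-to-one matching, written with the binary variables pj,k and the candidate sets CSk and Posj defined earlier, would be the following; treat it as a plausible reading, not a transcription of the original model.

```latex
\sum_{j \in CS_k} p_{j,k} \le 1 \quad \forall k \in \{1,\dots,n\},
\qquad
\sum_{k \in Pos_j} p_{j,k} \le 1 \quad \forall j \in \{1,\dots,m\}
```

The first inequality lets each theoretical peak bk take at most one observed peak; the second lets each observed peak aj be assigned to at most one theoretical peak.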
ILP Model
Subject to:
    No three consecutive missing peaks
    The intensity of peak i is counted iff there exists a peak j such that peak i supports j and peak j is a matched peak.
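The "no three consecutive missing peaks" rule is easy to state as a check on a finished assignment. The helper below is a hypothetical post-hoc validator, not the ILP constraint as the authors formulated it: it takes one flag per theoretical peak b1, …, bn (true if some observed peak was matched to it) and reports whether a run of three or more unmatched peaks occurs.

```python
def violates_missing_rule(matched_flags, max_run=2):
    """Return True if more than max_run consecutive peaks are unmatched.

    matched_flags: iterable of truthy/falsy values, one per theoretical
    peak in sequence order (hypothetical representation).
    """
    run = 0
    for is_matched in matched_flags:
        run = 0 if is_matched else run + 1
        if run > max_run:
            return True
    return False

print(violates_missing_rule([1, 0, 0, 1, 0, 0, 1]))  # two-peak gaps only
print(violates_missing_rule([1, 0, 0, 0, 1]))        # one three-peak gap
```

Inside an ILP the same rule would be imposed as a linear constraint over each window of three consecutive theoretical peaks, so that violating assignments are never generated in the first place.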
ILP Model
Solve using CPLEX
Report top-10 variable assignments
Existing problem: no constraints require that the distance between two neighboring matched peaks match the mass of a residue (with PTM)
New constraints
For each pj,k
Set of candidate ion peaks j’ with respect to k’ such that no valid jump exists between j and j’
The maximum and minimum masses that can be reached from j, respectively
New constraints
Neighboring matched peaks do not conflict
Conflicting matched peaks must have a matched peak between them
The distance between two matched peaks should be bounded
Postprocessing
Re-scoring the 10 candidate modified peptides
    Cross-correlation score
Recheck modifications if there are unmatched peaks indicating non-modification
Test data sets
Test set A: 44 CID spectra (ion trap) and 174 ETD spectra (Orbitrap) of chemically synthesized phosphopeptides, manually validated
Test set B: 58 ECD spectra (FTICR) of histone H3-(1–50) N-terminal tail, manually validated
Test set C: 553 CID spectra (Orbitrap) of propionylated histone fragments, manually validated
Test set D: 525 modified and 6025 unmodified CID spectra (Orbitrap) from a chromatin fraction, identified by SEQUEST and validated by MASCOT, with low-quality spectra removed manually
Test set E: 36 (ion trap), 37 (Q-TOF), and 4061 (Orbitrap) unmodified CID spectra, validated in the same way as test set D
Residue prediction accuracy
Peptide prediction accuracy
Comparison on test sets C and D1: Peptide and residue prediction accuracy
Comparison on test sets C and D1: Subsequence prediction accuracy
Running time
Q & A