Upload
ernesto-keaton
View
215
Download
1
Tags:
Embed Size (px)
Citation preview
De novo glycan structure search with CID MS/MS spectra of
native N-glycopeptides
18.12.2008Hannu Peltoniemi
De novo vs database matching
MS2 spectrum
Unknown glycan
glycandatabase
Database matching
matching
Best scoring glycan(s) in the DB
• Only those structures that are in the DB can be found• OK if comprehensive DB• If glycan not in the DB the result may be closest matching (wrong) structure or no result at all
MS2 spectrum
Unknown glycan
De novo
Best scoring glycans
• No database -> also new structures can be found !• Computational intensive, requires high quality spectra• Typically no definite answer, but a set of high scoring structures.
On the fly structure generation and matching
De novo structure search
Part of the N-glycopeptide workflow:Joenväärä et al., N-Glycoproteomics
- An automated workflow approach., Glycobiology 2008,18(4):339-349.
Input: Protonated, deconvoluted MS2 spectra
Steps:1) identification of peptides 2) identification of N-glycan compositions 3) identification of de novo N-glycan structures (branching, no linkage)
Input data
Spectrum with annotated glycopeptide and glycan composition fragments.
Example data
Peptide: QDQCIYNTTYLNVQRGlycan composition: 6 Hex 5 HexNac 3 NeuAc
Same data, different view:
O O OOOO
OO O
OO O
O O
OO O
O OO
OO O
O O
OOO O
O
O
OO O
OO O
O
Hex
Hex
NeuAc=0 NeuAc=1 NeuAc=2 NeuAc=3
6
6
5 5 5 5
0
0
0 0 0 0
composition: 6 Hex 5 HexNac 3 NeuAc
Glycan fragments attached to peptide
Free glycans
HexNAc HexNAc HexNAc HexNAc
The puzzle
• All the measured fragment compositions of a unknown structure with the given total composition are known• Some theoretical fragments may be missing• Some measured fragments may be false
O O OOOO
OO O
OO O
O
What is the structure that explains best the data?
?
Solution
The problem is split to two phases
1)Generation of possible structures: Structures are grown starting from N-glycan core. The population size is limited by removing structures with lowest fit with peptide+glycan fragments
2) Scoring: The set of structures are scored with full data. The final glycopeptide score is set to sum of peptide and glycan structure scores.
measured
theoretical
Initialization
The missfit (cost) between theoretical structure and measured data is defined as the number of not matching theoretical and measured fragments.
Example data: peptide + 5 Hex 4 HexNAc
Growing structuresStart (core)
End (final composition)
add unit
add unit
add unit
add unit
If population grows too large structures with highest cost are removed.
Scoring
...
Score is calculated as –log10(P), where P is the probability (binomial) that a random set of fragments would match as well or better as the ranked structure. The final glycopeptide score is sum of peptide and structure scores.
highest scoring
lowest scoring
Options
• All glycosidig bonds can be broken• Unlimited number of cuts
Assumptions
• Monosaccharide names• Number of possible connections with each monosaccharide• Accepted connections between monosaccharides• Start structures (N-glycan cores)• Max population size when growing structures
Testing with in silico generated data
structure theoretical spectrum
fragmentation
randomly removing and adding noise fragments
x x xxxx
xxx
xxxx
xxxx
xx
xxx
xxx
xx x
x x
xxx
xxxxx
xxxxxx
xxxxx
xxxxx
xxxx
xx x
xxx
xxxx
xxxx
xxxx
xxx
xxx
xxx
xxx
xx x x x
NeuAc=0 NeuAc=1 NeuAc=2 NeuAc=3
Hex
Hex
HexNAc HexNAc HexNAc HexNAc
peptide+glycan
glycan
x x xxxx
xx
x
xx
x
x
x x
xx
xxxx x
xx
xx
x
xxx
x
xxx
xx x
x xxx
x
x
x
x
x
x
xx
xxxx
x
xx x
input to the de novo algoritm
randomized spectrum
no noise2 noise fragm ents4 noise fragm ents
20 30 40 50 60 70 80
02
04
06
08
01
00
Correct structure w ith rank 3
Removed reducing end fragments (%)
Re
sults
ma
tch
ing
th
e c
rite
ria (
%)
20 30 40 50 60 70 80
02
04
06
08
01
00
Correct structure w ith rank 1
Removed reducing end fragments (%)
Re
sults
ma
tch
ing
th
e c
rite
ria (
%)
Percentage of runs (% )
(20,40) (40,60) (60,80) (80,100) (20,40) (40,60) (60,80) (80,100)
Removed reducing, non reducing end fragments (% )
Removed reducing, non reducing end fragments (% )
Results of the in silico tests
If about ½ of the theoretical fragments present => The correct structure is among the few highest scoring ones.
Each mark is a result of a 100 runs.
Testing with serum sample
• Very complex wet lab data set, i.e. a human serum specimen• Removal the high abundance proteins prior to LC-MS/MS • 80 spectra with identified peptide and glycan compositions• 62 spectra with putative structures• Mostly typical structures• Mostly small structures, large ones seems to be hard to catch
NeuAc=0 NeuAc=1 NeuAc=2 NeuAc=3
Hex
Hex
HexNAc HexNAc HexNAc HexNAc
Reducing end fragm ents(attached to peptide).
Non reducing end fragm ents(free glycans).
0
0
6
6
0 0 0 05 5 5 5X : theoretical O : m easured
x x xxxx
xxx
xxxx
xxxx
O O OOOO
OO O
OO O
O
xx
xxx
xxxO
OO O
O OO
xx x
x
OO O
O xO
xxx
xxxxx
xxxxxx
xxxxx
xxxxx
xxxx
OOO O
O
O
xx x
xxx
xxxx
xxxx
xxxx
xxx
OO O
OO x
xx
xxx
xxx
xx
O
Ox x x
G lyca n is a tta che d to pe ptideQ D Q C IY N T T Y L N V Q R (A lpha -1 -a c id g lyco pro te in 1 ).
S e rum , m /z=1194.93, z=4
T hree best sco ring s truc tu res.
73 .2 72 .8 72 .6S co re
M e a s ure d a nd the o re tica l fra gm e nts fo r the be s t s co ring s truc tu re .
Example serum spectrum
ANT3(224,187), FIBG(78), THRB(121), A1AG1(56), FETUA(156), HPT(241), HRG(344), FIBB(394), TRFE(630), IGHA1(144), A1AT(70,107,271), { VINEX(102), HPTR(126) }
FIBG(78), HRG(344), IGHA1(144) VTNC(169)
IGHG1(180), IGHG2(176) IGHA1(144) A1AG1(93)
IGHG2(176) IGHA1(144) CO2(621), CO3(85)
IGHG2(176) IGHA1(144) CO3(85)
Structures found from the serum sample
Conclusions
• De novo glycan structure identification of intact glycopeptides is possible
• High quality spectra is necessary
• Typically no definite answer but a few structures matching equally well => biological insight still needed if one identified structure needs to be picked