De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi hannu.peltoniemi@appliednumerics.fi



De novo vs database matching

Database matching

[Diagram: MS2 spectrum of an unknown glycan + glycan database -> matching -> best scoring glycan(s) in the DB]

• Only those structures that are in the DB can be found
• OK if the DB is comprehensive
• If the glycan is not in the DB, the result may be the closest matching (wrong) structure or no result at all


De novo

[Diagram: MS2 spectrum of an unknown glycan -> on-the-fly structure generation and matching -> best scoring glycans]

• No database -> new structures can also be found
• Computationally intensive, requires high-quality spectra
• Typically no definite answer, but a set of high-scoring structures


De novo structure search

Part of the N-glycopeptide workflow: Joenväärä et al., N-Glycoproteomics - An automated workflow approach, Glycobiology 2008, 18(4):339-349.

Input: Protonated, deconvoluted MS2 spectra

Steps:
1) identification of peptides
2) identification of N-glycan compositions
3) de novo identification of N-glycan structures (branching, no linkage)


Input data

Spectrum with annotated glycopeptide and glycan composition fragments.


Example data

Peptide: QDQCIYNTTYLNVQR
Glycan composition: 6 Hex 5 HexNAc 3 NeuAc


Same data, different view:

[Figure: measured fragment compositions (O) on Hex (0 to 6) versus HexNAc (0 to 5) grids, one panel per NeuAc count (NeuAc=0, 1, 2, 3). One row of panels shows glycan fragments attached to the peptide, the other free glycans. Total composition: 6 Hex 5 HexNAc 3 NeuAc.]


The puzzle

• All the measured fragment compositions of an unknown structure with the given total composition are known
• Some theoretical fragments may be missing
• Some measured fragments may be false

[Figure: measured fragment compositions (O) next to an unknown structure (?).]

Which structure best explains the data?


Solution

The problem is split into two phases:

1) Generation of possible structures: Structures are grown starting from the N-glycan core. The population size is limited by removing the structures with the lowest fit to the peptide+glycan fragments.

2) Scoring: The set of structures is scored with the full data. The final glycopeptide score is the sum of the peptide and glycan structure scores.


Initialization

The misfit (cost) between a theoretical structure and the measured data is defined as the number of theoretical and measured fragments that do not match.

[Figure: measured versus theoretical fragments.]

Example data: peptide + 5 Hex 4 HexNAc
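The misfit count defined above can be sketched in Python. This is a minimal illustration under assumptions that are not on the slide: a structure is written as a nested tuple `(name, children)`, only peptide-attached fragments (connected pieces that still contain the root unit) are generated, and a fragment is identified by its composition alone. The names `root_fragments`, `misfit` and `comp` are hypothetical.

```python
from collections import Counter
from itertools import product


def root_fragments(node):
    """Compositions of all connected fragments that contain the root of `node`.

    Assumption: any glycosidic bond may be cut, with no limit on the number of cuts.
    Each composition is a frozenset of (name, count) pairs so it can be put in a set.
    """
    name, children = node
    # Per child: either the bond is cut (empty Counter) or some fragment
    # rooted at that child stays attached.
    options = [[Counter()] + [Counter(dict(f)) for f in root_fragments(c)]
               for c in children]
    result = set()
    for combo in product(*options):
        total = Counter({name: 1})
        for part in combo:
            total += part
        result.add(frozenset(total.items()))
    return result


def misfit(structure, measured):
    """Number of fragments that are theoretical but not measured, or vice versa."""
    return len(root_fragments(structure) ^ measured)


def comp(**counts):
    """Helper to write a composition, e.g. comp(HexNAc=2, Hex=1)."""
    return frozenset(counts.items())


# Illustrative structure: HexNAc-HexNAc-Hex with two Hex branches.
man = ("Hex", (("Hex", ()), ("Hex", ())))
core = ("HexNAc", (("HexNAc", (man,)),))

# Illustrative "measured" set: one theoretical fragment missing, one false fragment.
measured = {comp(HexNAc=1), comp(HexNAc=2), comp(HexNAc=2, Hex=1),
            comp(HexNAc=2, Hex=3), comp(HexNAc=3)}
print(misfit(core, measured))
```

Here the cost is the size of the symmetric difference between the two fragment sets, so a missing theoretical fragment and a false measured fragment are penalized equally.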


Growing structures

[Diagram: Start (core) -> add unit -> add unit -> add unit -> add unit -> End (final composition)]

If the population grows too large, the structures with the highest cost are removed.
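Growing with a capped population resembles a beam search. The following Python sketch is a simplified illustration, not the presented implementation. It assumes structures are nested tuples `(name, children)`, uses only peptide-attached fragment compositions, ignores any rules on which monosaccharides may be connected, and scores a partly grown structure only against measured fragments that are no larger than it. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def fragments(node):
    """Compositions of connected fragments containing the root (any bond may be cut)."""
    name, children = node
    options = [[Counter()] + [Counter(dict(f)) for f in fragments(c)]
               for c in children]
    return {frozenset(sum(combo, Counter({name: 1})).items())
            for combo in product(*options)}


def composition(node):
    name, children = node
    total = Counter({name: 1})
    for child in children:
        total += composition(child)
    return total


def canonical(node):
    """Sort branches so that equivalent trees compare equal."""
    name, children = node
    return (name, tuple(sorted(canonical(c) for c in children)))


def grow(node, unit):
    """Yield every structure obtained by attaching `unit` as a new leaf."""
    name, children = node
    yield (name, children + ((unit, ()),))
    for i, child in enumerate(children):
        for grown in grow(child, unit):
            yield (name, children[:i] + (grown,) + children[i + 1:])


def cost(structure, measured):
    """Misfit against the measured fragments that the structure could already explain."""
    limit = sum(composition(structure).values())
    reachable = {m for m in measured if sum(n for _, n in m) <= limit}
    return len(fragments(structure) ^ reachable)


def search(core, target, measured, max_population=20):
    """Grow from `core` to the `target` composition, pruning the costliest structures."""
    population = {canonical(core)}
    steps = sum(target.values()) - sum(composition(core).values())
    for _ in range(steps):
        candidates = set()
        for s in population:
            missing = Counter(target) - composition(s)
            for unit in missing:
                candidates.update(canonical(g) for g in grow(s, unit))
        ranked = sorted(candidates, key=lambda s: (cost(s, measured), s))
        population = set(ranked[:max_population])
    return sorted(population, key=lambda s: (cost(s, measured), s))


# Illustrative run: recover a small branched structure from its own fragments.
core = ("HexNAc", (("HexNAc", (("Hex", ()),)),))
man = ("Hex", (("Hex", ()), ("Hex", ())))
truth = ("HexNAc", (("HexNAc", (man,)),))
measured = fragments(truth)
ranked = search(core, composition(truth), measured)
print(len(ranked), cost(ranked[0], measured))
```

In this toy run more than one final structure reaches zero cost, because peptide-attached compositions alone cannot separate a branched from a linear arrangement of the same units.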


Scoring

Score is calculated as -log10(P), where P is the (binomial) probability that a random set of fragments would match as well as or better than the ranked structure. The final glycopeptide score is the sum of the peptide and structure scores.

[Figure: candidate structures ranked from highest scoring to lowest scoring.]
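A score of the form -log10(P) with a binomial tail can be sketched in Python. The slide does not say how the number of trials or the per-fragment random match probability are chosen, so `n_theoretical` and `p_random` below are assumptions made for illustration, as are the function names.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def structure_score(n_theoretical, n_matched, p_random):
    """-log10 of the chance that random fragments match at least n_matched times.

    Assumptions: each theoretical fragment is one trial, and p_random is the
    probability that a random fragment matches a measured one.
    """
    tail = binomial_tail(n_matched, n_theoretical, p_random)
    return -math.log10(max(tail, 1e-300))  # guard against underflow to zero


def glycopeptide_score(peptide_score, glycan_structure_score):
    """Final score: sum of the peptide and structure scores."""
    return peptide_score + glycan_structure_score


# Illustrative numbers only.
print(structure_score(n_theoretical=10, n_matched=8, p_random=0.2))
```

With this form, a structure that matches no better than chance scores near zero, and each additional matched fragment raises the score.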


Options

• All glycosidic bonds can be broken
• Unlimited number of cuts

Assumptions

• Monosaccharide names
• Number of possible connections with each monosaccharide
• Accepted connections between monosaccharides
• Start structures (N-glycan cores)
• Max population size when growing structures
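The settings listed above could be collected in one configuration object. The Python sketch below is hypothetical: the class, field names and every value in it are placeholders chosen for illustration, not the values used in the presented work.

```python
from dataclasses import dataclass, field

# Placeholder start structure as a nested tuple (name, children).
EXAMPLE_CORE = ("HexNAc", (("HexNAc", (("Hex", ()),)),))


@dataclass
class SearchSettings:
    # Monosaccharide names
    monosaccharides: tuple = ("Hex", "HexNAc", "NeuAc")
    # Number of possible connections with each monosaccharide (placeholder values)
    max_connections: dict = field(
        default_factory=lambda: {"Hex": 3, "HexNAc": 3, "NeuAc": 1})
    # Accepted (parent, child) connections (placeholder values)
    accepted_links: frozenset = frozenset({
        ("HexNAc", "HexNAc"), ("HexNAc", "Hex"),
        ("Hex", "Hex"), ("Hex", "HexNAc"), ("Hex", "NeuAc"),
    })
    # Start structures (N-glycan cores)
    start_structures: tuple = (EXAMPLE_CORE,)
    # Max population size when growing structures
    max_population: int = 100


def can_attach(settings, parent, parent_links, child):
    """True if `child` may be added to `parent`, which already has `parent_links` bonds."""
    if (parent, child) not in settings.accepted_links:
        return False
    return parent_links < settings.max_connections[parent]


settings = SearchSettings()
print(can_attach(settings, "Hex", 1, "NeuAc"))
```

Keeping the chemistry in data rather than in code would let the same search run with different monosaccharide sets or connection rules.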


Testing with in silico generated data

[Diagram: structure -> fragmentation -> theoretical spectrum -> randomly removing fragments and adding noise fragments -> randomized spectrum -> input to the de novo algorithm]

[Figure: theoretical and randomized fragment compositions (x) on Hex versus HexNAc grids, one panel per NeuAc count (NeuAc=0, 1, 2, 3), shown for peptide+glycan fragments and for glycan fragments.]
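The randomization step can be sketched in Python as below. The sketch assumes a fragment is a tuple of (Hex, HexNAc, NeuAc) counts and that noise fragments are drawn uniformly from the compositions that fit inside the total composition but are not theoretical. It treats all fragments as one set, whereas the tests may handle reducing and non reducing end fragments separately. The function names are hypothetical.

```python
import random
from itertools import product

NAMES = ("Hex", "HexNAc", "NeuAc")


def all_compositions(total):
    """Every non-empty (Hex, HexNAc, NeuAc) count tuple within `total`."""
    ranges = [range(total[name] + 1) for name in NAMES]
    return {c for c in product(*ranges) if any(c)}


def randomize(theoretical, remove_fraction, n_noise, total, rng):
    """Drop a fraction of the theoretical fragments and add false ones.

    Raises ValueError if n_noise exceeds the number of available noise compositions.
    """
    n_keep = len(theoretical) - round(remove_fraction * len(theoretical))
    kept = set(rng.sample(sorted(theoretical), n_keep))
    pool = sorted(all_compositions(total) - set(theoretical))
    noise = set(rng.sample(pool, n_noise))
    return kept | noise


# Illustrative data: five theoretical fragments of a small structure.
total = {"Hex": 3, "HexNAc": 2, "NeuAc": 0}
theoretical = {(0, 1, 0), (0, 2, 0), (1, 2, 0), (2, 2, 0), (3, 2, 0)}
noisy = randomize(theoretical, remove_fraction=0.4, n_noise=2,
                  total=total, rng=random.Random(0))
print(sorted(noisy))
```

Passing a seeded `random.Random` keeps each test run reproducible, which helps when a search result has to be compared against the known structure.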


Results of the in silico tests

[Figure: two panels, "Correct structure with rank 3" and "Correct structure with rank 1". Each plots results matching the criteria (%, 0 to 100) against removed reducing end fragments (%, 20 to 80) for three series: no noise, 2 noise fragments, 4 noise fragments.]

[Figure: two panels showing percentage of runs (%) against removed reducing, non reducing end fragments (%), binned as (20,40), (40,60), (60,80), (80,100).]

If about ½ of the theoretical fragments are present => the correct structure is among the few highest scoring ones.

Each mark is the result of 100 runs.


Testing with serum sample

• Very complex wet-lab data set, i.e. a human serum specimen
• Removal of the high-abundance proteins prior to LC-MS/MS
• 80 spectra with identified peptide and glycan compositions
• 62 spectra with putative structures
• Mostly typical structures
• Mostly small structures; large ones seem to be hard to catch


Example serum spectrum

Glycan is attached to peptide QDQCIYNTTYLNVQR (Alpha-1-acid glycoprotein 1). Serum, m/z=1194.93, z=4.

[Figure: three best scoring structures. Score: 73.2, 72.8, 72.6.]

[Figure: measured and theoretical fragments for the best scoring structure (X: theoretical, O: measured) on Hex (0 to 6) versus HexNAc (0 to 5) grids, one panel per NeuAc count (NeuAc=0, 1, 2, 3). One row of panels shows reducing end fragments (attached to peptide), the other non reducing end fragments (free glycans).]


Structures found from the serum sample

[Figure: glycan structures found, each labelled with the proteins listed below.]

ANT3(224,187), FIBG(78), THRB(121), A1AG1(56), FETUA(156), HPT(241), HRG(344), FIBB(394), TRFE(630), IGHA1(144), A1AT(70,107,271), { VINEX(102), HPTR(126) }

FIBG(78), HRG(344), IGHA1(144) VTNC(169)

IGHG1(180), IGHG2(176) IGHA1(144) A1AG1(93)

IGHG2(176) IGHA1(144) CO2(621), CO3(85)

IGHG2(176) IGHA1(144) CO3(85)


Conclusions

• De novo glycan structure identification of intact glycopeptides is possible

• High-quality spectra are necessary

• Typically no definite answer, but a few structures that match equally well => biological insight is still needed if a single identified structure has to be picked