Top-down characterization of proteins in bacteria with unsequenced genomes Nathan Edwards Georgetown University Medical Center


Microorganism Identification

Homeland-security/defense applications: long history of fingerprinting approaches

Clinical applications in strain identification: selection of treatment and/or antibiotics

New applications in microbiome analysis: bacterial colonies in the gut, ..., chronic wound infections

Competing with genomic approaches (PCR, next-generation sequencing)? The primary selling point is speed.

Microorganism Identification

Match spectra against proteome (or genome) sequences for (species) identification; provides matches that are robust with respect to instrumentation and sample preparation

Many bacteria will never be sequenced or "finished" (pathogen simulants, for example)...

...but many have been: about 2500 to date.


Can we use the available sequence to identify proteins from unknown, unsequenced bacteria? Yes, for some proteins in some organisms!


Intact protein LC-MS/MS

Crude cell lysate

Capillary HPLC, C8 column

LTQ-Orbitrap XL

Precursor scan: resolution 30,000 at m/z 400

Data-dependent precursor selection: 5 most abundant ions, 10-second dynamic exclusion, charge state +3 or greater

CAD product ion scan: resolution 15,000 at m/z 400
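The data-dependent selection rules above (5 most abundant ions, charge +3 or greater, 10-second dynamic exclusion) can be sketched as a small filter over one survey scan. This is a simplified illustration and does not reproduce the instrument's acquisition logic. The Peak type, the select_precursors function, the m/z exclusion width mz_tol and the example peak values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """One peak in a survey (precursor) scan; charge is 0 when unassigned."""
    mz: float
    intensity: float
    charge: int


def select_precursors(peaks, now_s, excluded, top_n=5, min_charge=3,
                      exclusion_s=10.0, mz_tol=0.01):
    """Choose up to top_n precursors for MS/MS from one survey scan.

    `excluded` maps the m/z of each previously selected precursor to the time (s)
    it was selected; it is updated in place. mz_tol is a hypothetical exclusion width.
    """
    # Forget precursors whose exclusion window has expired.
    for mz, t in list(excluded.items()):
        if now_s - t > exclusion_s:
            del excluded[mz]
    chosen = []
    for p in sorted(peaks, key=lambda pk: pk.intensity, reverse=True):
        if p.charge < min_charge:
            continue  # only charge states +3 or greater
        if any(abs(p.mz - mz) <= mz_tol for mz in excluded):
            continue  # already selected within the last exclusion_s seconds
        chosen.append(p)
        excluded[p.mz] = now_s
        if len(chosen) == top_n:
            break
    return chosen


# Hypothetical survey-scan peaks (m/z, intensity, charge).
scan = [Peak(820.45, 9.0e5, 8), Peak(610.30, 8.0e5, 2), Peak(905.12, 7.0e5, 7),
        Peak(733.88, 6.0e5, 3), Peak(1012.51, 5.0e5, 4), Peak(688.02, 4.0e5, 5),
        Peak(954.77, 3.0e5, 6), Peak(701.36, 2.0e5, 9)]
exclusion = {}
first = select_precursors(scan, now_s=0.0, excluded=exclusion)
second = select_precursors(scan, now_s=5.0, excluded=exclusion)
```

In this example, the second scan 5 s later skips the five ions that are still excluded and picks the next two +3-or-higher ions. The doubly charged peak is never selected.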

[Figure: Y. rohdei LC-MS/MS run yr_inclusion (3/11/2009). Top: total ion chromatograms of the MS and MS/MS scans, RT 19.04 to 25.39 min. Bottom: product-ion spectrum averaged over scans #1937 to 2437 (RT 19.45 to 24.36 min, m/z 195 to 2000), with fragment peaks annotated by charge state; annotation "756.70 +8 MW 6044.11".]

CID Protein Fragmentation Spectrum from Y. rohdei
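Decharging rests on a simple relation. An ion of neutral mass M that carries z protons appears at m/z = (M + z × 1.007276) / z, so M = z × (m/z - 1.007276). The sketch below applies only that arithmetic to a few of the charge-labelled peaks in the spectrum. It is not THRASH, which also infers the charge state and the monoisotopic peak from the isotope distribution. The function names are hypothetical, and peaks labelled z=? are skipped.

```python
PROTON = 1.007276  # proton mass (Da)


def neutral_mass(mz, charge):
    """Neutral mass M of an ion seen at m/z with charge z: M = z * (m/z - proton mass)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON)


def decharge(peaks):
    """Neutral masses for (m/z, z) pairs; peaks with unknown charge (None) are skipped."""
    return [neutral_mass(mz, z) for mz, z in peaks if z is not None]


# Charge-labelled peaks from the Y. rohdei spectrum; "z=?" is represented as None.
labelled = [(576.83, 2), (840.16, 7), (720.39, 2), (1253.14, None)]
masses = decharge(labelled)
```

The three peaks with assigned charges give neutral masses of about 1151.65, 5874.07 and 1438.77 Da.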


Enterobacteriaceae Protein Sequences

Exhaustive set of all Enterobacteriaceae family protein sequences from Swiss-Prot, TrEMBL, RefSeq, GenBank, and CMR

...plus Glimmer3 gene predictions on RefSeq Enterobacteriaceae genomes, with both primary and alternative translation start sites

Filtered for intact mass in the range 1 kDa to 20 kDa: 253,626 distinct protein sequences from 256 species

Derived from the Rapid Microorganism Identification Database (RMIDb.org) infrastructure.
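The 1 kDa to 20 kDa intact-mass filter can be illustrated roughly as follows. This sketch is not the RMIDb pipeline and rests on stated assumptions. It treats proteins as unmodified with the initiator methionine retained. It uses monoisotopic residue masses, because the slide does not say whether monoisotopic or average mass was used. It also discards sequences that contain non-standard residue codes. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def intact_mass(seq):
    """Neutral monoisotopic mass of an unmodified protein, or None for non-standard residues."""
    if not seq or any(a not in MONO for a in seq):
        return None
    return sum(MONO[a] for a in seq) + WATER


def filter_by_intact_mass(seqs, lo=1_000.0, hi=20_000.0):
    """Keep distinct sequences whose intact mass lies in [lo, hi] Da, in first-seen order."""
    kept = []
    for seq in dict.fromkeys(seqs):  # drops duplicate sequences, preserves order
        m = intact_mass(seq)
        if m is not None and lo <= m <= hi:
            kept.append(seq)
    return kept
```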


ProSightPC 2.0

Product ion scan decharging, enabled by high-resolution fragment ion measurements (THRASH algorithm implementation)

Absolute mass search mode: 15 ppm fragment ion match tolerance, 250 Da precursor ion match tolerance

"Single-click" analysis of entire LC-MS/MS datafile.

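Here is a minimal sketch of the absolute mass search idea. A candidate must fall within 250 Da of the observed precursor mass. Each candidate is then scored by how many decharged fragment masses lie within 15 ppm of its theoretical b- or y-fragment masses. ProSightPC itself reports probability-based scores (the E-values on later slides). Counting matches, as done here, is a simplification chosen to keep the example short. All names are hypothetical, and MASTEVKLNR is a made-up toy sequence.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def fragment_masses(seq):
    """Neutral b- and y-fragment masses for every backbone cleavage of seq."""
    res = [MONO[a] for a in seq]
    b = [sum(res[:i]) for i in range(1, len(res))]
    y = [sum(res[i:]) + WATER for i in range(1, len(res))]
    return b + y


def within_ppm(observed, theoretical, ppm):
    """True if observed lies within ppm parts-per-million of theoretical."""
    return abs(observed - theoretical) <= abs(theoretical) * ppm * 1e-6


def absolute_mass_search(precursor_mass, fragments, database,
                         precursor_tol_da=250.0, fragment_tol_ppm=15.0):
    """Rank sequences by how many observed fragment masses they explain.

    Only sequences whose intact mass is within precursor_tol_da of the observed
    precursor mass are scored. Returns (matched_count, sequence) pairs, best first.
    """
    hits = []
    for seq in database:
        theo_mass = sum(MONO[a] for a in seq) + WATER
        if abs(theo_mass - precursor_mass) > precursor_tol_da:
            continue  # outside the precursor window
        theo = fragment_masses(seq)
        n = sum(1 for f in fragments
                if any(within_ppm(f, t, fragment_tol_ppm) for t in theo))
        hits.append((n, seq))
    return sorted(hits, key=lambda h: h[0], reverse=True)
```

When the fragment masses of MASTEVKLNR are searched against a small list that contains it, the search ranks MASTEVKLNR first with all 18 fragments matched. A sequence whose mass falls far outside the 250 Da window is never scored.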
Other tools

Explored using standard search engines: decharge and format as a charge +1 spectrum; X!Tandem scoring plugin (ProSight, ΔM); OMSSA, Mascot, etc.

MS-Tools: MS-Deconv, MS-TopDown, MS-Align, MS-Align+, MS-Align-E!
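Formatting a decharged spectrum as charge +1 can be sketched as writing one MGF (Mascot Generic Format) entry. In that entry, every neutral mass M is reported as [M+H]+ = M + 1.007276. Using MGF as the interchange format is an assumption, since the slide does not name a file format. The function name and title string are also hypothetical.

```python
PROTON = 1.007276  # proton mass (Da)


def to_charge1_mgf(title, precursor_mass, fragments):
    """Write one MGF entry that presents a decharged spectrum as charge +1.

    precursor_mass and each fragment mass are neutral masses (Da); fragments is a
    list of (mass, intensity) pairs. Every mass M is written as [M+H]+ = M + proton.
    """
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mass + PROTON:.5f}",
        "CHARGE=1+",
    ]
    for mass, intensity in sorted(fragments):  # ascending mass
        lines.append(f"{mass + PROTON:.5f} {intensity:.1f}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"
```

Every peak is written as singly protonated, so a peptide-oriented engine needs no charge-state handling for these top-down fragments.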

[Figure: the CID protein fragmentation spectrum from Y. rohdei, repeated]

Match to Y. pestis 50S ribosomal protein L32

Exact-match sequence…

Phylogeny: Protein vs. DNA

[Figure panels: protein sequence; 16S rRNA sequence]

What about mixtures?


Shared Small Ribosomal Proteins


Identified E. herbicola proteins

30S ribosomal protein S19: m/z 686.39, z 15+, E-value 1.96e-16, Δ 0.007

Six proteins identified with |Δ| < 0.02

Identified E. herbicola proteins

DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128

Eight proteins identified with "large" |Δ|

Identified E. herbicola proteins

DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 1.91e-58

Use "Sequence Gazer" to find the mass shift; ΔM mode can "tolerate" one shift for free!
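One way to reason about a precursor mass shift is to ask which single amino-acid substitutions would explain it. The sketch below does only that. It does not describe how Sequence Gazer works, and it ignores post-translational modifications and multiple substitutions. The function name and the default tolerance are hypothetical.

```python
from itertools import permutations

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_substitutions(delta, tol=0.02):
    """Single substitutions (from_aa, to_aa) whose mass change matches delta within tol.

    delta is observed minus database mass (Da); replacing from_aa by to_aa changes
    the mass by MONO[to_aa] - MONO[from_aa].
    """
    return sorted(
        (a, b) for a, b in permutations(MONO, 2)
        if abs((MONO[b] - MONO[a]) - delta) <= tol
    )
```

For a shift of -14.01565 Da (one CH2 fewer) with a 0.001 Da tolerance, the function returns six substitutions: A→G, E→D, I→V, L→V, Q→N and T→S. Observed fragments still have to localize which of these, if any, is present.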

ProSightPC: ΔM mode

[Figure: protein sequence vs. experimental precursor, differing by ΔM; b- and y-ions, plus b'- and y'-ions shifted by ΔM]

Match a single "blind" mass shift for free!

Also: PIITA (Tsai et al. 2009)
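The ΔM idea can be sketched in two steps. First, compute ΔM as the observed precursor mass minus the database sequence's mass. Then match the observed fragments against the ordinary b- and y-ions and also against copies shifted by ΔM (b' and y'). Fragments on the N-terminal side of the unknown shift match b-ions, and those extending past it match b'-ions; the same holds symmetrically for y. A single unlocalized shift therefore costs no matched ions. This is a simplified illustration that assumes neutral monoisotopic masses and a 15 ppm tolerance. It does not reproduce ProSightPC's scoring, and all names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def b_y_masses(seq):
    """Neutral b-fragment (N-terminal) and y-fragment (C-terminal) masses for seq."""
    res = [MONO[a] for a in seq]
    b = [sum(res[:i]) for i in range(1, len(res))]
    y = [sum(res[i:]) + WATER for i in range(1, len(res))]
    return b, y


def count_matches(observed, theoretical, tol_ppm=15.0):
    """Number of observed masses within tol_ppm of any theoretical mass."""
    return sum(1 for o in observed
               if any(abs(o - t) <= t * tol_ppm * 1e-6 for t in theoretical))


def delta_m_mode(seq, precursor_mass, observed, tol_ppm=15.0):
    """Match observed fragments to seq, allowing one unlocalized mass shift.

    Returns (delta_m, matches_without_shift, matches_with_b'_and_y'_ions).
    """
    b, y = b_y_masses(seq)
    dm = precursor_mass - (sum(MONO[a] for a in seq) + WATER)
    plain = count_matches(observed, b + y, tol_ppm)
    shifted = [m + dm for m in b + y]  # b'- and y'-ions
    with_dm = count_matches(observed, b + y + shifted, tol_ppm)
    return dm, plain, with_dm
```

Consider the toy database sequence MASTEVKLNR and a sample carrying an E→D substitution. Plain matching explains the 9 sample fragments that do not span the substitution, barring chance coincidences. ΔM mode explains all 18 sample fragments.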

Identified E. herbicola proteins

DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128

Extract the N- and C-terminal sequence supported by at least 3 b- or y-ions
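The rule for extracting confidently established terminal sequence can be read in more than one way, and the sketch below assumes one reading. Under it, the N-terminal stretch runs from the first residue to the farthest matched unshifted b-ion. The C-terminal stretch runs from the farthest matched unshifted y-ion to the last residue. Each stretch is reported only if at least 3 ions of that type matched. On this reading, residues inside a stretch count as supported even if not every bond between them was observed. The function name and the fragment numbering convention are hypothetical.

```python
def supported_termini(seq, b_matched, y_matched, min_ions=3):
    """Return (n_term, c_term) stretches of seq backed by matched unshifted ions.

    b_matched / y_matched are sets of fragment numbers i, where b_i covers seq[:i]
    and y_i covers seq[-i:]. A terminus is None if fewer than min_ions matched.
    """
    n_term = seq[:max(b_matched)] if len(b_matched) >= min_ions else None
    c_term = seq[-max(y_matched):] if len(y_matched) >= min_ions else None
    return n_term, c_term
```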


E. herbicola protein sequences


E. herbicola sequences found in other species


Phylogenetic placement of E. herbicola

[Figure panels: phylogram; cladogram] (phylogeny.fr, "One-Click" mode)

Genome annotation errors

UniProt: E. coli cell division protein ZapB

22 (371) E. coli strains

MQFRRGMTMSLEVFEKLEAKVQQAIDTITL…

3 (204); 17 (166)

0 (2)

Need ±1500 Da precursor tolerance…
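An annotation error in the translation start site changes a protein's intact mass by the residues gained or lost, and that change can easily exceed a 250 Da precursor tolerance. As an illustration, the sketch below computes how much mass would be removed if translation began at each downstream Met in the prefix shown on the slide. It assumes the initiator Met is retained and that no other processing occurs. The slide does not state whether these particular start sites are the ones behind the strain counts, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def alternative_start_shifts(prefix):
    """For each downstream Met, the (1-based position, mass removed in Da) if translation
    started there instead of at residue 1. Assumes the initiator Met is kept in both forms."""
    shifts = []
    for i, aa in enumerate(prefix):
        if i > 0 and aa == "M":
            shifts.append((i + 1, sum(MONO[a] for a in prefix[:i])))
    return shifts
```

For MQFRRGMTMSLEVFEKLEAKVQQAIDTITL, the function reports about 775.39 Da for a start at residue 7 and about 1007.48 Da for a start at residue 9. Both shifts are well beyond a 250 Da window.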

Conclusions

Protein identification for unsequenced organisms.

Identification and localization for sequence mutations and post-translational modifications.

Extraction of confidently established sequence suitable for phylogenetic analysis.

Genome annotation correction.

New paradigm for phylogenetic analysis?

Acknowledgements

Dr. Catherine Fenselau, Avantika Dhabaria, Joe Cannon*, Colin Wynne*: University of Maryland Biochemistry

Dr. Yan Wang: University of Maryland Proteomics Core

Dr. Art Delcher: University of Maryland CBCB

Funding: NIH/NCI