
Was T. rex Just a Big Chicken?

Computational Proteomics

Phillip Compeau and Pavel Pevzner

adjusted by Jovana Kovačević

Bioinformatics Algorithms: an Active Learning Approach

© 2015 by Compeau and Pevzner. All rights reserved.


Was T. rex Just a Big Chicken?

• Paleontology Meets Computing

• Decoding an Ideal Spectrum

• From Ideal to Real Spectra

• Peptide Sequencing

• Peptide Identification

• Spectral Dictionaries

T. rex and Chicken Collagens Are Nearly Identical!


Frederick Sanger’s Two Nobel Prizes

Peptide fragments:

GIVEECCA
GIVEECCASV
GIVEECCASVC
GIVEECCASVCSL
GIVEECCASVCSLY
SLYELEDYC
ELEDY
ELEDYCD
LEDYCD
EDYCD
FVDEHLCG
FVDEHLCGSHL
HLCGSHL
SHLVEA
VEALY
YLVCG
LVCGERGF
LVCGERGFF
GFFYTPK
YTPKA

Sequence assembled from the overlapping fragments:

GIVEECCASVCSLYELEDYCDFVDEHLCGSHLVEALYLVCGERGFFYTPKA

1958: protein sequencing
1977: DNA sequencing

1958: protein sequencing difficult, DNA sequencing impossible
Today: protein sequencing difficult, DNA sequencing trivial

[Figure: genome sequencing]
• Start with multiple identical copies of a genome
• Shatter the genome into reads
• Sequence the reads (AGAATATCA, GAGAATATC, TGAGAATAT)
• Assemble the genome using overlapping reads (...TGAGAATATCA...)

Sequencing Proteins Today

• Putative proteome
  – If we know a genome, we can predict all the genes that the genome encodes.
  – Translating the predicted genes gives the putative proteome (the set of all proteins encoded by the genome).
  – But how can we determine whether a protein is synthesized in a specific tissue?

• Peptide identification
  – In practice, merely confirming that a 10-amino-acid peptide from a known protein is present in a sample confirms that the sample contains this protein.

• Peptide sequencing
  – Inferring the amino acid sequence of a peptide without relying on a proteome.
  – Used when the proteome is unknown.

Sequencing Proteins with Mass Spectrometry

• Most mass spectrometers can only measure masses of rather short peptides (e.g., fewer than 30-40 amino acids). To bypass this limitation:
  – Proteases (e.g., trypsin) break proteins into short peptides.
  – A mass spectrometer breaks these peptides into charged fragment ions and measures the mass/charge ratio* and intensity of each ion.

How do we reconstruct the peptide from the collection of mass/charge ratios?

* For simplicity, we assume that all masses are integers and all charges are 1.

Which Peptide Generated This Spectrum?

[Figure: a mass spectrum plotting intensity (%) against mass/charge (m/z, 200 to 1200); [M+2H]2+ = 646.20]


Was T. rex Just a Big Chicken?

• Paleontology Meets Computing

• Decoding an Ideal Spectrum

• From Ideal to Real Spectra

• Peptide Sequencing

• Peptide Identification

• Spectral Dictionaries

• The Ostrich Hemoglobin Riddle

• Searching for Post-Translational Modifications

• Spectral Alignment Algorithm

Prefix and Suffix Peptides

Peptide:       R    E    D    C    A
Residue mass:  156  129  115  103  71

Prefix masses: 0 156 285 400 503 574
Suffix masses: 71 174 289 418

Reconstructing a Peptide from Prefix/Suffix Masses

Ideal Spectrum: Collection of all prefix and suffix masses of a peptide.

Note: we don’t know which masses correspond to prefixes and which masses correspond to suffixes.

Peptide explains Spectrum if IdealSpectrum(Peptide) = Spectrum.

IdealSpectrum(REDCA):
0 71 156 174 285 289 400 418 503 574
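The definition of the ideal spectrum can be sketched in a few lines of Python. The sketch assumes the usual integer residue mass table (the slides themselves only show the masses of R, E, D, C and A), and the names MASS and ideal_spectrum are illustrative rather than taken from the deck.

```python
# Integer residue masses. This table is an assumption of the sketch;
# the slides only list the masses of a few amino acids.
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def ideal_spectrum(peptide):
    """Return the sorted masses of all prefixes and suffixes of peptide, plus 0."""
    masses = {0}
    total = sum(MASS[aa] for aa in peptide)
    prefix = 0
    for aa in peptide:
        prefix += MASS[aa]
        masses.add(prefix)          # mass of the prefix ending at this residue
        masses.add(total - prefix)  # mass of the complementary suffix
    return sorted(masses)


print(ideal_spectrum("REDCA"))
```

A set is used here, so a mass shared by a prefix and a suffix is reported once; a version that keeps duplicates would collect the masses in a list instead.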

Reconstructing a Peptide from an Ideal Spectrum

Decoding an Ideal Spectrum Problem: Reconstruct a peptide from its ideal spectrum.
• Input: A collection of integers Spectrum.
• Output: An amino acid string Peptide that explains Spectrum.

Graph(Spectrum)
• Nodes: masses in the spectrum.
• Edges: connect node i to node j if j - i is the mass of an amino acid a. Label this edge by a.

Spectrum: 0 71 156 174 285 289 400 418 503 574

In Graph(Spectrum), the path 0→156→285→400→503→574 spells REDCA, and the path 0→71→174→289→418→574 spells ACDER.
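The node and edge rules for Graph(Spectrum) can be sketched as follows. This is a minimal illustration that assumes the usual integer residue mass table; the helper names labels_by_mass and spectrum_graph are hypothetical. Amino acids with equal mass share one edge label, such as K/Q.

```python
from collections import defaultdict

# Integer residue masses (an assumption of this sketch, not listed in the slides).
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def labels_by_mass(mass_table):
    """Map each mass to a label; equal-mass residues are joined, e.g. 'K/Q'."""
    by_mass = defaultdict(list)
    for aa, m in mass_table.items():
        by_mass[m].append(aa)
    return {m: "/".join(sorted(aas)) for m, aas in by_mass.items()}


def spectrum_graph(spectrum):
    """Return edges (i, j, label) for every pair of masses whose difference j - i
    is the mass of an amino acid."""
    label = labels_by_mass(MASS)
    nodes = sorted(set(spectrum))
    edges = []
    for position, i in enumerate(nodes):
        for j in nodes[position + 1:]:
            if j - i in label:
                edges.append((i, j, label[j - i]))
    return edges


for edge in spectrum_graph([0, 71, 156, 174, 285, 289, 400, 418, 503, 574]):
    print(edge)
```

The double loop compares every pair of masses, which is quadratic in the number of masses; that is acceptable for spectra of this size.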

DecodingIdealSpectrum Algorithm

DecodingIdealSpectrum(Spectrum)
    construct Graph(Spectrum)
    find a path Path from source to sink in Graph(Spectrum)
    return amino acid string spelled by labels of Path

Spectrum: 0 71 156 174 285 289 400 418 503 574

Does This Approach Work for All Spectra?

Spectrum: 0 57 114 128 215 229 316 330 387 444

Graph(Spectrum) has the following labeled edges:
0→57 (G), 0→114 (N), 0→128 (K/Q), 57→114 (G), 57→128 (A), 114→215 (T), 114→229 (D), 128→215 (S), 128→229 (T), 215→316 (T), 215→330 (D), 229→316 (S), 229→330 (T), 316→387 (A), 316→444 (K/Q), 330→387 (G), 330→444 (N), 387→444 (G)

The path 0→57→114→229→330→444 spells GGDTN, but the prefix and suffix masses of GGDTN are 0 57 114 215 229 330 387 444, so IdealSpectrum(GGDTN) ≠ Spectrum!

Correcting DecodingIdealSpectrum

Not every path from source to sink spells a peptide that explains Spectrum, so each candidate must be checked. For example, the path spelling NTTAG passes the check: IdealSpectrum(NTTAG) = Spectrum.

DecodingIdealSpectrum(Spectrum)
    construct Graph(Spectrum)
    for each path Path from source to sink in Graph(Spectrum)
        Peptide ← amino acid string spelled by labels of Path
        if IdealSpectrum(Peptide) = Spectrum
            return Peptide

• This algorithm is not efficient: the number of paths may be exponential in the number of nodes (= number of masses in the spectrum).
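The corrected procedure can be sketched as a depth-first enumeration of source-to-sink paths that returns the first peptide whose ideal spectrum matches. The sketch assumes the usual integer residue mass table, and the names ideal_spectrum and decode_ideal_spectrum are illustrative. It is not claimed to be the authors' implementation, and like the pseudocode it may take exponential time.

```python
# Integer residue masses (an assumption of this sketch).
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def ideal_spectrum(peptide):
    """Sorted masses of all prefixes and suffixes of peptide, plus 0."""
    masses = {0}
    total = sum(MASS[aa] for aa in peptide)
    prefix = 0
    for aa in peptide:
        prefix += MASS[aa]
        masses.update((prefix, total - prefix))
    return sorted(masses)


def decode_ideal_spectrum(spectrum):
    """Return some peptide that explains spectrum, or None if there is none."""
    target = sorted(set(spectrum))
    nodes = set(target)
    sink = target[-1]
    # One representative letter per mass (I stands for I/L, K for K/Q).
    aa_by_mass = {}
    for aa, m in MASS.items():
        aa_by_mass.setdefault(m, aa)

    def extend(node, peptide):
        if node == sink:
            # Accept a path only if its peptide really explains the spectrum.
            return peptide if ideal_spectrum(peptide) == target else None
        for m, aa in aa_by_mass.items():
            if node + m in nodes:
                found = extend(node + m, peptide + aa)
                if found is not None:
                    return found
        return None

    return extend(0, "")


print(decode_ideal_spectrum([0, 57, 114, 128, 215, 229, 316, 330, 387, 444]))
```

Several peptides can explain the same ideal spectrum (a peptide and its reversal always do), so the sketch returns whichever explaining peptide its search order reaches first.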


From Ideal to Real Spectra

Ideal spectrum of REDCA: 0 71 156 174 285 289 400 418 503 574
Real spectrum:           0 71 99 156 180 196 228 285 289 320 400 421 503 574

Real spectra have both false and missing masses. Here 99, 180, 196, 228, 320 and 421 are false masses, while 174 and 418 are missing.

Decoding a (Real) Spectrum Problem: Reconstruct a peptide from its spectrum.
• Input: A collection of integers Spectrum.
• Output: An amino acid string Peptide that explains Spectrum the best (among all possible amino acid strings).

Which Peptide Generated This Spectrum?

[Figure: DinosaurSpectrum, intensity (%) versus mass/charge (m/z, 200 to 1200); [M+2H]2+ = 646.20]

• Once the peptide is known, how can we measure how well a peptide explains a spectrum?

Annotating a Spectrum

• Once we infer the peptide that generated a given spectrum, we can annotate the spectrum by establishing a correspondence between peaks in the spectrum and prefixes/suffixes of the peptide.

[Figure: DinosaurSpectrum (intensity versus mass/charge) with annotated peaks. A peak matching a prefix peptide of length 10 is denoted b10; a peak matching a suffix peptide of length 3 is denoted y3. Shown alongside are the annotated spectra of VNVADCGAEALAR ([M+2H]2+ = 673.46) and GLVGAPGLRGLPGK ([M+2H]2+ = 646.20).]

Shared Peak Count

GLVGAPGLRGLPGK annotates b10, b11, b13, y3, y4, y12 in DinosaurSpectrum (Shared Peak Count = 6).

• Shared Peak Count: the number of peaks annotated by a peptide.
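Under the deck's simplifying assumptions (integer masses, all charges equal to 1), the shared peak count can be sketched as the number of spectrum masses that coincide with a proper prefix or suffix mass of the peptide. The function name shared_peak_count and the mass table are assumptions of this sketch, and real b- and y-ion mass offsets are deliberately ignored. The example reuses the real spectrum shown next to the ideal spectrum of REDCA.

```python
# Integer residue masses (an assumption of this sketch).
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def shared_peak_count(peptide, spectrum):
    """Count spectrum masses explained by a proper prefix or suffix of peptide."""
    peaks = set(spectrum)
    total = sum(MASS[aa] for aa in peptide)
    fragment_masses = set()
    prefix = 0
    for aa in peptide[:-1]:                  # skip the full peptide itself
        prefix += MASS[aa]
        fragment_masses.add(prefix)          # prefix (b-type) fragment
        fragment_masses.add(total - prefix)  # suffix (y-type) fragment
    return len(fragment_masses & peaks)


real_spectrum = [0, 71, 99, 156, 180, 196, 228, 285, 289, 320, 400, 421, 503, 574]
print(shared_peak_count("REDCA", real_spectrum))
```

Because the count treats every matched peak equally, it ignores peak intensities entirely.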

Another Candidate Peptide

GLVGAPGLRGLPGK annotates b10, b11, b13, y3, y4, y12 (Shared Peak Count = 6).

ATKIVDCFMTY annotates b3, b6, b9, y2, y3, y4, y5, y6, y7, y8 (Shared Peak Count = 10).

How Should We Score an Annotated Spectrum?

• Shared Peak Count? It ignores intensities.
• Sum of intensities of explained peaks? Large peaks may dominate the score.

Idea: build a probabilistic model of spectra so that large peaks contribute to the score but do not dominate it.

Spectral Vectors

Transform the spectrum of mass m into a spectral vector

s1, …, si, …, sm

The value si (amplitude) approximates the likelihood that mass i is the prefix mass of an (unknown!) peptide that generated the spectrum.

From a Peptide to a Peptide Vector

Peptide:        R        E        D        C        A
Mass:           156      129      115      103      71
Peptide vector: 00…01    00…01    00…01    00…01    00…01
                156 bits 129 bits 115 bits 103 bits 71 bits

Each block ends in a 1, so the peptide vector has a 1 at every prefix mass (156, 285, 400, 503, 574) and 0 elsewhere.

Converting a Peptide into a Peptide Vector Problem: Convert a peptide into a peptide vector.
• Input: A string of amino acids Peptide.
• Output: The peptide vector of Peptide.

From a Peptide Vector to a Peptide

Converting a Peptide Vector into a Peptide Problem: Convert a binary vector into a peptide.
• Input: A binary vector P.
• Output: A peptide whose peptide vector is equal to P (if such a peptide exists).
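Both conversion problems can be sketched directly from the block structure of the peptide vector. The function names and the integer mass table below are assumptions of this sketch. Because I/L and K/Q have equal integer masses, the reverse conversion can only return one representative letter for each of those masses.

```python
# Integer residue masses (an assumption of this sketch).
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def peptide_to_vector(peptide, mass=MASS):
    """Binary vector with a 1 at each prefix mass (coordinates are 1-based)."""
    vector = []
    for aa in peptide:
        vector.extend([0] * (mass[aa] - 1) + [1])  # one block per residue
    return vector


def vector_to_peptide(vector, mass=MASS):
    """Return a peptide whose peptide vector equals vector, or None."""
    aa_by_mass = {}
    for aa, m in mass.items():
        aa_by_mass.setdefault(m, aa)   # I for I/L, K for K/Q
    peptide, gap = [], 0
    for bit in vector:
        gap += 1
        if bit == 1:
            if gap not in aa_by_mass:  # block length is not an amino acid mass
                return None
            peptide.append(aa_by_mass[gap])
            gap = 0
    return "".join(peptide) if gap == 0 else None


v = peptide_to_vector("REDCA")
print(len(v), vector_to_peptide(v))
```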

From a Spectrum to a Spectral Vector

[Figure: DinosaurSpectrum (mass m), intensity versus mass/charge, with the amplitudes -5, +3, +9 and +7 marked at four peaks]

The larger the peak at mass i, the larger the value (amplitude) si of the spectral vector.

An amplitude such as +9 is not the intensity of the peak! It is a likelihood that this peak will be annotated by a prefix of an (unknown!) peptide that generated the spectrum.

Spectral vector: an integer-valued vector with m coordinates

s1 ... -5 ... +3 ... +9 ... +7 ... sm


Scoring Peptide against Spectrum

The score of Peptide against Spectrum is the dot product of the peptide vector (p1, …, pm) of Peptide and the spectral vector (s1, …, sm) of Spectrum:

score(Peptide, Spectrum) = p1*s1 + p2*s2 + … + pm*sm

Peptide:  000…001 000…001 000…001 000…001 000…001
Spectrum: s1 ... -5 ... +3 ... +9 ... +7 ... sm
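The dot-product score can be sketched with a small example. To keep the vectors short, the sketch borrows the deck's toy alphabet (amino acids X and Z with masses 4 and 5) and a 22-coordinate spectral vector as read from the deck's DAG example; treat the exact vector values, and the names TOY_MASS, peptide_vector and score, as assumptions of this sketch.

```python
# Toy alphabet: amino acids X and Z with masses 4 and 5.
TOY_MASS = {"X": 4, "Z": 5}


def peptide_vector(peptide, mass):
    """Binary vector with a 1 at each prefix mass of peptide."""
    vector = []
    for aa in peptide:
        vector.extend([0] * (mass[aa] - 1) + [1])
    return vector


def score(peptide, spectral_vector, mass):
    """Dot product of the peptide vector and the spectral vector."""
    p = peptide_vector(peptide, mass)
    if len(p) != len(spectral_vector):
        raise ValueError("peptide mass must equal the length of the spectral vector")
    return sum(pi * si for pi, si in zip(p, spectral_vector))


# Toy spectral vector s1..s22 (values assumed from the deck's toy example).
spectrum = [0, 0, 0, 4, -2, -3, -1, -7, 6, 5, 3, 2, 1, 9, 3, -8, 0, 3, 1, 2, 1, 0]
print(score("XZZXX", spectrum, TOY_MASS))
```

Only the coordinates at prefix masses (here 4, 9, 14, 18 and 22) contribute, because every other coordinate of the peptide vector is 0.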

Peptide Sequencing Problem

Peptide Sequencing Problem: Given a spectral vector, find a peptide vector with maximum score against this spectral vector.
• Input: A spectral vector Spectrum.
• Output: An amino acid string Peptide that maximizes score(Peptide, Spectrum) among all possible peptides.

Building a DAG from a Spectral Vector

1. For a spectral vector Spectrum = s1, …, sm, construct DAG(Spectrum) on nodes {0, 1, …, m}.
2. Assign weight si to node i (node 0 has weight 0).
3. Connect node i to node j if j - i is equal to the mass of an amino acid.

Toy alphabet: amino acids X and Z with masses 4 and 5

Spectrum = (0, 0, 0, 4, -2, -3, -1, -7, 6, 5, 3, 2, 1, 9, 3, -8, 0, 3, 1, 2, 1, 0)

The peptide XZZXX visits nodes 0, 4, 9, 14, 18 and 22:

Score(XZZXX, Spectrum) = 0 + 4 + 6 + 9 + 3 + 0 = 22

Peptides = Paths in DAG(Spectrum)

• Peptide: any path from source to sink in DAG(Spectrum).
• score(Peptide, Spectrum): sum of the weights of the nodes it visits.
• Peptide Sequencing Problem: finding a maximum-weight path in a node-weighted DAG.

Peptide Sequencing = Finding a Path in DAG(Spectrum)

Peptide Sequencing Problem: Given a spectral vector, find a peptide vector with maximum score against this spectral vector.
• Input: A spectral vector Spectrum.
• Output: A maximum-weight path in DAG(Spectrum).

STOP and Think: How do we find a maximum-weight path in a node-weighted DAG?
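One standard answer is dynamic programming over the nodes in increasing order, since every edge of DAG(Spectrum) goes from a smaller mass to a larger one. The sketch below is a minimal illustration of that idea rather than the authors' code; it uses the toy alphabet (X = 4, Z = 5), and the spectral vector values and the names TOY_MASS and sequence_peptide are assumptions.

```python
TOY_MASS = {"X": 4, "Z": 5}


def sequence_peptide(spectral_vector, mass):
    """Return (peptide, score) for a maximum-weight path from node 0 to node m."""
    m = len(spectral_vector)
    weight = [0] + list(spectral_vector)   # node 0 has weight 0
    unreachable = float("-inf")
    best = [unreachable] * (m + 1)         # best[j]: top score of a path 0 -> j
    back = [None] * (m + 1)                # back[j]: (previous node, residue)
    best[0] = 0
    for j in range(1, m + 1):
        for aa, a in mass.items():
            i = j - a
            if i >= 0 and best[i] != unreachable and best[i] + weight[j] > best[j]:
                best[j] = best[i] + weight[j]
                back[j] = (i, aa)
    if best[m] == unreachable:
        return None, None                  # no peptide has total mass m
    residues, node = [], m
    while node > 0:                        # walk the backpointers to the source
        node, aa = back[node]
        residues.append(aa)
    return "".join(reversed(residues)), best[m]


# Toy spectral vector s1..s22 (values assumed from the deck's toy example).
spectrum = [0, 0, 0, 4, -2, -3, -1, -7, 6, 5, 3, 2, 1, 9, 3, -8, 0, 3, 1, 2, 1, 0]
print(sequence_peptide(spectrum, TOY_MASS))
```

The running time is proportional to m times the alphabet size, even though the number of candidate peptides grows exponentially with m.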

Generating Spectrum from an (Unknown) Peptide

[Figure: an unknown peptide (???????????) generates DinosaurSpectrum, intensity (%) versus mass/charge; [M+2H]2+ = 646.20]

Reconstructing Peptide from Spectrum

[Figure: the unknown peptide (???????????) is to be reconstructed from DinosaurSpectrum]

De novo Reconstruction!

[Figure: DinosaurSpectrum ([M+2H]2+ = 646.20) annotated with the b- and y-ions of the reconstructed peptide ATKIVDCFMTY]

But this highest-scoring peptide is biologically incorrect!

Scoring functions that reliably assign the highest score to the biologically correct peptide remain unknown...

Imagine that You Know the Proteome…

…HKMPRSTATPKRMGGCTFSPCFTKRLMATSGLVGAPGLRGLPGKMGGCTFGTRACFGH…

The highest-scoring peptide in Proteome: GLVGAPGLRGLPGK

The correct peptide may not score highest among all peptides, but it typically scores highest among all peptides in the proteome*.

* If the resulting score is sufficiently high.

Peptide identification: reconstructing a peptide as the highest-scoring peptide occurring in a proteome.

Peptide Sequencing vs. Peptide Identification

Which approach is faster?

Peptide Identification searches all peptides from Proteome:

MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN

Result: AVGELTK

Peptide Sequencing searches all possible peptides (20^n):

AAAAAAAA, AAAAAAAC, AAAAAAAD, AAAAAAAE, AAAAAAAG, AAAAAAAF, AAAAAAAH, AAAAAAAI, …, AVGELTI, AVGELTK, AVGELTL, AVGELTM, …, YYYYYYYS, YYYYYYYT, YYYYYYYV, YYYYYYYY

Result: AVGELTK

[Figure: graph with amino-acid-labeled edges, shown for both approaches]

The set of all peptides in Proteome is much smaller than the set of all possible peptides.

However, peptide sequencing algorithms are much faster, even though their search space is much larger.

Peptide sequencing eliminates the time-consuming scan of Proteome by modeling the problem as the Longest Path in a DAG Problem.

However, since the scoring function is imperfect, peptide sequencing remains inaccurate: state-of-the-art tools correctly reconstruct only 30% of spectra.

Peptide Sequencing vs. Peptide Identification
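The Longest Path in a DAG formulation mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a toy alphabet of two amino acids X and Z with integer masses 4 and 5 (the toy example used later in these slides), a spectral vector given as a list of node weights with index 0 as the source, and a hypothetical function name `sequence_peptide`.

```python
import math

# Toy alphabet (an assumption for illustration): integer masses 4 and 5.
MASS = {"X": 4, "Z": 5}

def sequence_peptide(weights, mass=MASS):
    """Return (best_score, peptide) for a maximum-weight source-to-sink path.

    weights[i] is the weight of node i; node 0 is the source and the last
    node is the sink. An edge labeled a runs from node i - mass[a] to node i.
    Returns None if no peptide has the required total mass.
    """
    sink = len(weights) - 1
    best = [-math.inf] * (sink + 1)   # best[i]: max score of a path 0 -> i
    last = [None] * (sink + 1)        # last[i]: amino acid on the final edge
    best[0] = weights[0]
    for i in range(1, sink + 1):
        for aa, m in mass.items():
            if i - m >= 0 and best[i - m] + weights[i] > best[i]:
                best[i] = best[i - m] + weights[i]
                last[i] = aa
    if best[sink] == -math.inf:
        return None
    # Backtrack from the sink to spell the peptide.
    peptide, i = [], sink
    while i > 0:
        peptide.append(last[i])
        i -= mass[last[i]]
    return best[sink], "".join(reversed(peptide))

weights = [int(c) for c in "00001100010002"]
result = sequence_peptide(weights)  # best score 4 (XZX and ZXX tie)
```

The running time of this sketch grows with the spectrum length times the alphabet size and does not depend on the length of any proteome, which is why sequencing can be fast despite its larger search space.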


Was T. rex Just a Big Chicken?

• Paleontology Meets Computing

• Decoding an Ideal Spectrum

• From Ideal to Real Spectra

• Peptide Sequencing

• Peptide Identification

• Spectral Dictionaries


Peptide Identification Problem: Find a peptide from a proteome with maximum score against a spectrum.

• Input: A spectral vector Spectrum and an amino acid string Proteome.

• Output: An amino acid string Peptide that maximizes score(Peptide, Spectrum) among all substrings of Proteome.

STOP and Think: How can we possibly construct

the T. rex proteome?

The Peptide Identification Problem
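A brute-force solution to the Peptide Identification Problem might look like the sketch below. It is an illustration under stated assumptions, not the method used in the T. rex study: the alphabet is the toy pair X and Z with integer masses 4 and 5, the spectral vector is a list of node weights with index 0 as the source, and the names `score` and `identify_peptide` are hypothetical.

```python
MASS = {"X": 4, "Z": 5}  # toy integer masses (assumption)

def score(peptide, weights, mass=MASS):
    """Sum of spectral-vector entries at the peptide's prefix masses, or None
    if the peptide's total mass differs from the spectrum's (len(weights) - 1)."""
    position, total = 0, weights[0]
    for aa in peptide:
        position += mass[aa]
        if position >= len(weights):
            return None
        total += weights[position]
    return total if position == len(weights) - 1 else None

def identify_peptide(weights, proteome, mass=MASS):
    """Return (best_score, peptide) over all substrings of proteome whose
    mass matches the spectrum, or None if there is no such substring."""
    target = len(weights) - 1
    best = None
    for start in range(len(proteome)):
        total_mass = 0
        for end in range(start, len(proteome)):
            total_mass += mass[proteome[end]]
            if total_mass > target:
                break
            if total_mass == target:
                candidate = proteome[start:end + 1]
                s = score(candidate, weights, mass)
                if best is None or s > best[0]:
                    best = (s, candidate)
                break
    return best

weights = [int(c) for c in "00001100010002"]
match = identify_peptide(weights, "ZZXXZZ")  # (4, "ZXX")
```

Unlike de novo sequencing, this search only ever scores peptides that actually occur in Proteome, so its cost grows with the proteome length.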


• 90% of proteins making up

animal bones are collagens.

• Since collagens are often conserved across

species, collagens in T. rex were likely similar to

collagens in some present-day species.

Approximating the T. rex Proteome


• As a sanity check, Asara

compared the T. rex spectra

against the UniProt database

(≈ 200 million amino acids

from hundreds of species).

• Asara also included some mutated versions of

collagens from present-day species; we will call

the augmented database UniProt+.*

Approximating the T. rex Proteome

* For simplicity, we concatenate all proteins in UniProt+ into a single string Proteome.


Most of the high-scoring peptides identified in

UniProt+ were chicken collagens, supporting the

hypothesis that birds evolved from dinosaurs.

Searching T. rex Spectra Against UniProt+


DinosaurPeptide = GLVGAPGLRGLPGK is only

one mutation away from a chicken collagen

peptide.

Searching T. rex Spectra Against UniProt+

But how can we be sure that DinosaurPeptide is

the correct interpretation of DinosaurSpectrum?

[Figure: spectra (intensity (%) vs. m/z) with b- and y-ion annotations for the peptides VNVADCGAEALAR ([M+2H]2+ = 673.46) and GLVGAPGLRGLPGK ([M+2H]2+ = 646.20)]

Statistical Significance of DinosaurPeptide

DinosaurPeptide is the highest-scoring peptide for DinosaurSpectrum among all peptides in UniProt+.

But billions of peptides not occurring in UniProt+ outscore DinosaurPeptide.

STOP and Think: Does this concern you?

We need to develop a method for evaluating the statistical significance of identified peptides.

* If the resulting score is sufficiently high


Given a parameter threshold, a peptide Peptide and

a spectral vector Spectrum form a Peptide-

Spectrum Match (PSM) if:

• Peptide is a highest-scoring peptide against

Spectrum among all peptides in Proteome

• Score(Peptide, Spectrum) ≥ threshold

Peptide-Spectrum Matches (PSMs)


PSM_threshold(Proteome, SpectralVectors): the set of Peptide-Spectrum Matches (PSMs) resulting from a set of SpectralVectors (for a given Proteome and threshold).


PSM Search Problem: Identify all Peptide-Spectrum Matches scoring above a threshold for a set of spectra and a proteome.

• Input: A set of spectral vectors SpectralVectors, an amino acid string Proteome, and a score threshold threshold.

• Output: The set of Peptide-Spectrum Matches PSM_threshold(Proteome, SpectralVectors).

PSM Search Problem
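The PSM Search Problem can be sketched by running a best-match search per spectrum and keeping matches that clear the threshold. The sketch below is illustrative only: it assumes the toy amino acids X and Z with integer masses 4 and 5, represents each PSM as a (peptide, spectrum index) pair, and introduces a hypothetical `max_len` parameter to bound the substrings it enumerates.

```python
MASS = {"X": 4, "Z": 5}  # toy integer masses (assumption)

def score(peptide, weights):
    """Sum of node weights at the peptide's prefix masses, or None if the
    peptide's total mass differs from the spectrum's (len(weights) - 1)."""
    prefix, total = 0, weights[0]
    for aa in peptide:
        prefix += MASS[aa]
        if prefix >= len(weights):
            return None
        total += weights[prefix]
    return total if prefix == len(weights) - 1 else None

def psm_search(spectral_vectors, proteome, threshold, max_len=10):
    """Return a set of (peptide, spectrum_index) pairs: for each spectrum,
    a highest-scoring substring of proteome, kept if its score >= threshold."""
    substrings = {proteome[i:j]
                  for i in range(len(proteome))
                  for j in range(i + 1, min(i + max_len, len(proteome)) + 1)}
    psms = set()
    for index, weights in enumerate(spectral_vectors):
        scored = [(score(p, weights), p) for p in sorted(substrings)]
        scored = [(s, p) for s, p in scored if s is not None]
        if scored:
            best_score, best_peptide = max(scored)  # ties: larger string wins
            if best_score >= threshold:
                psms.add((best_peptide, index))
    return psms

spectra = [[int(c) for c in "00001100010002"],
           [int(c) for c in "0000100010"]]
strict = psm_search(spectra, "ZZXXZZ", threshold=4)
loose = psm_search(spectra, "ZZXXZZ", threshold=1)
```

Raising the threshold from 1 to 4 drops the weak match for the second spectrum, which is the role the threshold plays in the definition of a PSM.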


Was T. rex Just a Big Chicken?

• Paleontology Meets Computing

• Decoding an Ideal Spectrum

• From Ideal to Real Spectra

• Peptide Sequencing

• Peptide Identification

• Spectral Dictionaries


STOP and Think: A PSM search of 1,000 spectra

from a human sample against the human proteome

results in 100 PSMs whose score surpassed a

threshold.

• What is the fraction of erroneous PSMs among

them?

Hint: Repeat the same experiment for a randomly

generated DecoyProteome of the same size as the

human proteome.

Decoy Proteome

If you identify 5 PSMs in DecoyProteome, then 5/100 of the PSMs identified in the human proteome are estimated to be incorrect.
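The decoy estimate is simply the ratio of the number of decoy PSMs to the number of target PSMs at the same threshold (the false discovery rate defined on the next slide). A minimal sketch, with a hypothetical function name:

```python
def false_discovery_rate(num_decoy_psms, num_target_psms):
    """Estimated fraction of erroneous PSMs among those found in the
    target proteome, using the decoy PSM count as the error estimate."""
    if num_target_psms == 0:
        raise ValueError("no PSMs identified in the target proteome")
    return num_decoy_psms / num_target_psms

human_fdr = false_discovery_rate(5, 100)  # the example above
trex_fdr = false_discovery_rate(1, 27)    # the T. rex search against UniProt+
```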

False Discovery Rate

False discovery rate (FDR):

FDR = |PSM_threshold(DecoyProteome, SpectralVectors)| / |PSM_threshold(Proteome, SpectralVectors)|

For the T. rex spectra, there are 27 PSMs in UniProt+ and only 1 PSM in DecoyProteome with score ≥ 100 (FDR = 1/27 ≈ 3.7%).

STOP and Think: Have we found ≈ 27* T. rex peptides?!

* Many of these PSMs correspond to contaminants, e.g., keratin from human skin.

How can we estimate the statistical significance of an individual PSM?


The Monkey and the Typewriter

abagytegertoyhktyhkyrzaxujhotgemamaghtkmjytrabagytegertozhkoghk

yrzacatxujhotgemamaghtkdhairytdgbikemjytrcgtyyghjotfghtsybdkkpw

kfffldogjfiegbebgncnslkcfscnnclnscnscnsnovcsnovslvnsnvnvnsnvsvv

slnlnsvlnsnvnslnvnlsvnsnnsvnslvnscatlvslvslvlmbgjgaggeyjllfghlh

mhlhjjlhjlhabracadabraghytnlkprstyrhketryabcnccowcnchairmtdgwom

bikedmdppdtyhtgftxcjabcjwqbcoewbvcoewvbexovervhhddwdwqdhgyusjff

fgfghhhhhy…

The Monkey Can Spell!

[Figure: the random text from the previous slide, shown next to the cover of "The Monkey Dictionary" (NEW EDITION, 2,000 new words, even more nonsense)]


Expected Number of Strings from Dictionary

The Monkey and the Typewriter Problem: Find the expected

number of strings from dictionary appearing in a randomly

generated text.

• Input: A set of strings Dictionary and an integer n.

• Output: The expected number of strings from Dictionary that

appear in a randomly generated string of length n.
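By linearity of expectation, the Monkey and the Typewriter Problem has a closed-form answer when every letter is typed independently and uniformly at random. The sketch below rests on that assumption, uses a 26-letter alphabet (also an assumption), and counts every occurrence of every dictionary word; the function name is hypothetical.

```python
def expected_dictionary_hits(dictionary, n, alphabet_size=26):
    """Expected number of (position, word) occurrences of dictionary words
    in a uniformly random string of length n."""
    total = 0.0
    for word in set(dictionary):
        # A word of length k can start at n - k + 1 positions and matches
        # at each of them with probability 1 / alphabet_size^k.
        positions = max(n - len(word) + 1, 0)
        total += positions / alphabet_size ** len(word)
    return total

one_word = expected_dictionary_hits(["the"], 3)
too_short = expected_dictionary_hits(["the"], 2)
```

For n much larger than the word lengths, the number of start positions is close to n, which gives the simpler approximation n times the sum of the word probabilities.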

Expected Number of High-Scoring Peptides

Expected Number of High-Scoring Peptides Problem: Find the expected number of high-scoring peptides (against a given spectrum) in a decoy proteome.

• Input: A spectral vector Spectrum, an integer n, and a score threshold.

• Output: The expected number of peptides in a decoy proteome of length n that score at least threshold against Spectrum.

STOP and Think: Are these problems equivalent?

Spectral Dictionary

Dictionary_threshold(Spectrum): the set of all peptides with score at least threshold against Spectrum.

[Figure: spectrum, intensity (%) vs. m/z, [M+2H]2+ = 646.20]

Expected Number of High-Scoring Peptides Problem: Find the expected number of high-scoring peptides (against a given spectrum) in a decoy proteome.

• Input: A spectral vector Spectrum, an integer n, and a score threshold.

• Output: The expected number of peptides from Dictionary_threshold(Spectrum) occurring in a decoy proteome of length n.

Expected Number of High-Scoring Peptides Problem: Find the expected number of high-scoring peptides (against a given spectrum) in a decoy proteome.

• Input: Peptides Dictionary_threshold(Spectrum) and an integer n.

• Output: The expected number of strings from Dictionary_threshold(Spectrum) occurring in a decoy proteome of length n.


Expected Number of Occurrences of Peptides from Dictionary in DecoyProteome

• Probability that a string Peptide matches a string starting at a given position in DecoyProteome:

Pr(Peptide) = 1/20^|Peptide|

• Expected number of times Peptide appears in DecoyProteome of length n:

E(Peptide, n) ≈ n * Pr(Peptide) = n * 1/20^|Peptide|

• Expected number of times peptides from Dictionary appear in DecoyProteome of length n:

E(Dictionary, n) ≈ n * (∑_{each Peptide in Dictionary} 1/20^|Peptide|) = n * Pr(Dictionary)

How many peptides in DecoyUniProt+ are expected to score at least -19 against DinosaurSpectrum, i.e., what is E(Dictionary_{-19}(DinosaurSpectrum), |UniProt+|)?
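The approximation E(Dictionary, n) ≈ n * Pr(Dictionary) can be evaluated directly when the dictionary is small enough to list. The sketch below uses exact fractions and a three-peptide toy dictionary; the function names and the toy dictionary are illustrative assumptions, and a real spectral dictionary is far too large to enumerate this way.

```python
from fractions import Fraction

def dictionary_probability(dictionary, alphabet_size=20):
    """Pr(Dictionary): sum over peptides of 1 / alphabet_size^|Peptide|."""
    return sum(Fraction(1, alphabet_size ** len(p)) for p in set(dictionary))

def expected_occurrences(dictionary, n, alphabet_size=20):
    """E(Dictionary, n) ~ n * Pr(Dictionary), the approximation above."""
    return n * dictionary_probability(dictionary, alphabet_size)

toy_dictionary = ["XZX", "ZXX", "XXZ"]
toy_probability = dictionary_probability(toy_dictionary)  # 3/8000
toy_expected = expected_occurrences(toy_dictionary, 8000)
```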


Probability of Spectral Dictionary

Probability of Spectral Dictionary Problem: Find the

probability of a spectral dictionary for a given spectrum and

score threshold.

• Input: A spectral vector Spectrum and a score threshold

threshold.

• Output: The probability of Dictionary_threshold(Spectrum).

Probability and Size of Spectral Dictionary

Size of Spectral Dictionary Problem: Find the size of a spectral dictionary for a given spectrum and score threshold.

• Input: A spectral vector Spectrum and a score threshold threshold.

• Output: The size of Dictionary_threshold(Spectrum).

Computing the Size of a Spectral Dictionary

• Given a spectral vector s = s_1 … s_i … s_n

• size(i, t): number of peptides matching the i-prefix s_1 … s_i with score t

• size_a(i, t): number of peptides matching the i-prefix s_1 … s_i with score t and ending in amino acid a:

size(i, t) = Σ_{all amino acids a} size_a(i, t)

• Removing the last amino acid a (of mass |a|) from a peptide results in a shorter peptide with mass i - |a| and score t - s_i:

size(i, t) = Σ_{all amino acids a} size_a(i, t) = Σ_{all amino acids a} size(i - |a|, t - s_i)

• Initialization: size(0, 0) = 1, size(0, t) = 0 for t ≠ 0, and size(i, t) = 0 for i < 0
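The recurrence for size(i, t) translates into a short dynamic program. The sketch below is a simplified illustration: it assumes non-negative integer entries in the spectral vector (real spectral vectors may contain negative entries, which would require shifting the score axis), stores the source node at index 0 with weight 0, and uses hypothetical function names.

```python
def size_table(weights, masses, max_score):
    """size[i][t]: number of peptides of mass i with score t.

    weights[i] plays the role of s_i; weights[0] is the source and is
    assumed to be 0. masses lists the integer amino acid masses.
    """
    m = len(weights) - 1
    size = [[0] * (max_score + 1) for _ in range(m + 1)]
    size[0][0] = 1
    for i in range(1, m + 1):
        for t in range(weights[i], max_score + 1):
            size[i][t] = sum(size[i - a][t - weights[i]]
                             for a in masses if i - a >= 0)
    return size

def dictionary_size(weights, masses, threshold, max_score):
    """Number of peptides of full mass scoring at least threshold."""
    table = size_table(weights, masses, max_score)
    return sum(table[-1][max(threshold, 0):])

weights = [int(c) for c in "00001100010002"]
table = size_table(weights, [4, 5], max_score=4)
toy_size = dictionary_size(weights, [4, 5], threshold=3, max_score=4)
```

Here `max_score` must be at least the largest achievable score; the sum of all entries of the spectral vector is a safe upper bound under the non-negativity assumption.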

• Given Spectrum = s_1 … s_m, construct DAG(Spectrum) on nodes 0, …, m with weight of node i equal to s_i.

• A path from source to sink spells out a peptide.

• Sum of weights of nodes on path = score of PSM.

Example: amino acids X and Z with respective masses 4 and 5.

Spectrum = 00001100010002

Score(XXZ, Spectrum) = 0 + 1 + 0 + 2 = 3

Score(XZX, Spectrum) = 0 + 1 + 1 + 2 = 4
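For a spectrum this small, every source-to-sink path in DAG(Spectrum) can be enumerated, which reproduces the two scores above. The recursive helper below is a hypothetical sketch for the toy example only; the number of paths grows exponentially with the spectrum length, so it is not a practical algorithm.

```python
def paths_with_scores(weights, mass):
    """Map each peptide spelled by a source-to-sink path to its score,
    the sum of the node weights along that path."""
    sink = len(weights) - 1
    results = {}

    def extend(node, peptide, total):
        if node == sink:
            results[peptide] = total
            return
        for aa, m in mass.items():
            if node + m <= sink:
                extend(node + m, peptide + aa, total + weights[node + m])

    extend(0, "", weights[0])
    return results

weights = [int(c) for c in "00001100010002"]
all_paths = paths_with_scores(weights, {"X": 4, "Z": 5})
```

Paths that cannot reach the sink exactly, such as XXX (mass 12) or ZZ (mass 10), are dead ends and do not appear in the result.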

Computing size(i, t)

Spectrum = 00001100010002, amino acids X and Z with masses 4 and 5

size(i, t) = Σ_{all amino acids a} size(i - |a|, t - s_i)

For example, where s_i = 1 the recurrence reads size(i, t) = Σ_{all amino acids a} size(i - |a|, t - 1); where s_i = 0 it reads size(i, t) = Σ_{all amino acids a} size(i - |a|, t - 0).

Completed table (columns are i = 0, …, 13):

i    0  1  2  3  4  5  6  7  8  9 10 11 12 13
t=0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
t=1  0  0  0  0  1  1  0  0  1  0  1  0  1  0
t=2  0  0  0  0  0  0  0  0  0  2  0  0  0  0
t=3  0  0  0  0  0  0  0  0  0  0  0  0  0  1
t=4  0  0  0  0  0  0  0  0  0  0  0  0  0  2

Computing the Probability of a Spectral Dictionary

• Given a spectral vector s = s_1 … s_i … s_n

• Pr(i, t): sum of probabilities of all peptides matching the i-prefix s_1 … s_i with score t

• Pr_a(i, t): sum of probabilities of all peptides matching the i-prefix s_1 … s_i with score t and ending in amino acid a:

Pr(i, t) = Σ_{all amino acids a} Pr_a(i, t)

• Removing the last amino acid a from a peptide results in a shorter peptide with mass i - |a|, score t - s_i, and 20 times larger probability:

Pr(i, t) = Σ_{all amino acids a} Pr_a(i, t) = Σ_{all amino acids a} Pr(i - |a|, t - s_i) / 20
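The recurrence for Pr(i, t) can be filled in the same way as the size table, dividing by 20 at each step. The sketch below makes several assumptions for illustration: Pr(0, 0) = 1 as the starting value, non-negative integer entries in the spectral vector, exact arithmetic via fractions, and the toy masses 4 and 5 combined with the slides' factor of 1/20 per amino acid; the function names are hypothetical.

```python
from fractions import Fraction

def probability_table(weights, masses, max_score, alphabet_size=20):
    """pr[i][t]: total probability of peptides of mass i with score t."""
    m = len(weights) - 1
    pr = [[Fraction(0)] * (max_score + 1) for _ in range(m + 1)]
    pr[0][0] = Fraction(1)  # the empty peptide (assumed initialization)
    for i in range(1, m + 1):
        for t in range(weights[i], max_score + 1):
            incoming = sum((pr[i - a][t - weights[i]]
                            for a in masses if i - a >= 0), Fraction(0))
            pr[i][t] = incoming / alphabet_size
    return pr

def spectral_dictionary_probability(weights, masses, threshold, max_score):
    """Probability of the dictionary of peptides scoring >= threshold."""
    pr = probability_table(weights, masses, max_score)
    return sum(pr[-1][max(threshold, 0):], Fraction(0))

weights = [int(c) for c in "00001100010002"]
pr = probability_table(weights, [4, 5], max_score=4)
toy_probability = spectral_dictionary_probability(weights, [4, 5], 3, 4)
```

This computes the dictionary's probability without listing its peptides, which matters because a real spectral dictionary can contain hundreds of billions of peptides.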

[Figure: the table for Pr(i, t) is filled in like the table for size(i, t), using Pr(i, t) = Σ_{all amino acids a} Pr(i - |a|, t - s_i) / 20 on Spectrum = 00001100010002; for example, the entry at i = 4, t = 1 becomes 1/20]

Statistical Significance of the PSM (DinosaurPeptide, DinosaurSpectrum)

Reminder: PSM (DinosaurPeptide, DinosaurSpectrum) has score -19.

STOP and Think: What is the statistical significance of the PSM (DinosaurPeptide, DinosaurSpectrum) found in searches against the UniProt+ database of length n ≈ 200 million amino acids?

Hint: Dictionary_{-19}(DinosaurSpectrum) contains 219,136,251,374 peptides (!) and has probability 0.00018.

STOP and Think: How many PSMs with score at least -19 do we expect to find in a decoy proteome of the same size as UniProt+?

n * Pr(Dictionary_{-19}(DinosaurSpectrum)) = 35,311
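As a rough check of the order of magnitude, the rounded figures quoted on the slides can be multiplied directly. The variable names are illustrative, and the small gap between this result and 35,311 presumably reflects rounding of n and of the probability (an assumption; the slides do not state the unrounded values).

```python
n = 200_000_000          # approximate length of UniProt+ in amino acids
pr_dictionary = 0.00018  # rounded probability of the spectral dictionary
expected_decoy_psms = n * pr_dictionary  # about 36,000
```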

Finding DinosaurPeptide as an interpretation of DinosaurSpectrum is no more surprising than the monkey typing “THE” after 200 million attempts...


...which is not surprising at all!
