BITS - Search engines for mass spec data

http://www.bits.vib.be/training

BITS MS Data Processing – Search Engines UGent, Gent, Belgium – 19 September 2011

Lennart Martens lennart.martens@UGent.be

Lennart MARTENS lennart.martens@ebi.ac.uk

Proteomics Services Group European Bioinformatics Institute

Hinxton, Cambridge United Kingdom www.ebi.ac.uk


Computational Omics and Systems Biology Group

Department of Medical Protein Research, VIB; Department of Biochemistry, Ghent University

Ghent, Belgium

THREE TYPICAL PRE-PROCESSING STEPS

Global thresholding versus local thresholding (figure: spectrum with the precursor peak marked)

Noise thresholding

From: http://www.purdue.edu/dp/bioscience/images/spectrum.jpg
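Global and local thresholding can be sketched as follows. This is a minimal illustration, not the method behind the figure: it assumes a peak list of (m/z, intensity) pairs, and the 5% relative cut-off, the 100 m/z window width and the number of peaks kept per window are hypothetical parameters chosen for the example.

```python
def global_threshold(peaks, rel_cutoff=0.05):
    """Keep peaks whose intensity is at least rel_cutoff times the base peak."""
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    return [(mz, i) for mz, i in peaks if i >= rel_cutoff * base]


def local_threshold(peaks, window=100.0, top_n=6):
    """Keep the top_n most intense peaks within each fixed-width m/z window."""
    bins = {}
    for mz, i in peaks:
        bins.setdefault(int(mz // window), []).append((mz, i))
    kept = []
    for group in bins.values():
        # most intense first, then keep only the top_n of this window
        kept.extend(sorted(group, key=lambda p: p[1], reverse=True)[:top_n])
    return sorted(kept)  # restore m/z order
```

With a base peak at m/z 150.1, a global cut-off discards a weak peak at m/z 880.5, whereas local thresholding keeps it because it is the most intense peak in its own window.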

Charge deconvolution (peptides)

From: Gill et al, EMBO Journal, 2000
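For a peptide, the charge can be read from the isotope envelope: neighbouring isotope peaks are about 1.003355/z apart on the m/z axis (the 13C/12C mass difference divided by the charge), and the neutral mass then follows from M = z · (m/z − 1.007276). The sketch below assumes the input is the list of isotope peak m/z values of a single envelope; the helper names, the 0.01 tolerance and the maximum charge of 6 are illustrative choices.

```python
PROTON = 1.007276        # proton mass (Da)
C13_SPACING = 1.003355   # mass difference between 13C and 12C (Da)


def charge_from_isotopes(isotope_mzs, tol=0.01, max_charge=6):
    """Return the charge whose expected isotope spacing (1.003355 / z) matches the data."""
    diffs = [b - a for a, b in zip(isotope_mzs, isotope_mzs[1:])]
    if not diffs:
        return None  # a single peak carries no spacing information
    spacing = sum(diffs) / len(diffs)
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_SPACING / z) <= tol:
            return z
    return None


def neutral_mass(mz, z):
    """Neutral mass M of a protonated ion observed at m/z with charge z."""
    return z * (mz - PROTON)
```

For isotope peaks at m/z 500.2700, 500.7717 and 501.2734 the spacing of about 0.5017 points to charge 2, giving a neutral mass of about 998.525 Da.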

Charge deconvolution (proteins)

Figure: monoisotopic mass versus average mass
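Intact proteins carry many charges, so one molecule produces an envelope of peaks. Two adjacent peaks with m/z values m1 < m2 carry charges z + 1 and z, so z · (m2 − p) = (z + 1) · (m1 − p), which gives z = (m1 − p)/(m2 − m1), with p the proton mass. The sketch assumes the input peaks are consecutive charge states of one protein and averages the per-peak mass estimates; the protein mass in the example is hypothetical. When isotopes are not resolved, the peak apexes yield an average rather than a monoisotopic mass.

```python
PROTON = 1.007276  # proton mass (Da)


def charge_of_adjacent_pair(m1, m2):
    """Charge z of the peak at m2, given its lower-m/z neighbour m1 (charge z + 1).

    Solving z * (m2 - p) = (z + 1) * (m1 - p) gives z = (m1 - p) / (m2 - m1).
    """
    return round((m1 - PROTON) / (m2 - m1))


def deconvolve_envelope(mz_values):
    """Average neutral mass estimated from consecutive charge states of one protein."""
    mzs = sorted(mz_values)
    masses = []
    for m1, m2 in zip(mzs, mzs[1:]):
        z = charge_of_adjacent_pair(m1, m2)  # charge of the peak at m2
        masses.append(z * (m2 - PROTON))
    # the lowest-m/z peak carries one charge more than its right-hand neighbour
    z_first = charge_of_adjacent_pair(mzs[0], mzs[1]) + 1
    masses.append(z_first * (mzs[0] - PROTON))
    return sum(masses) / len(masses)
```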

Centroiding (peak picking)

From: Last et al, Nature Rev. Mol. Cell Bio., 2007
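Centroiding turns each profile-mode peak into a single (m/z, intensity) pair. A minimal sketch, assuming parallel lists of profile m/z and intensity values: every local maximum becomes one peak whose m/z is the intensity-weighted mean of the apex and its two neighbours. This three-point centroid is a simple stand-in chosen for illustration, not a specific vendor algorithm.

```python
def centroid(mz, intensity, min_intensity=0.0):
    """Reduce profile data to a centroided peak list of (m/z, apex intensity)."""
    peaks = []
    for k in range(1, len(mz) - 1):
        left, apex, right = intensity[k - 1], intensity[k], intensity[k + 1]
        # a local maximum above the noise floor marks one peak
        if apex > left and apex >= right and apex > min_intensity:
            total = left + apex + right
            mz_c = (mz[k - 1] * left + mz[k] * apex + mz[k + 1] * right) / total
            peaks.append((mz_c, apex))
    return peaks
```

A symmetric profile peak sampled at m/z 100.00 to 100.04 with intensities 0, 50, 100, 50, 0 collapses to a single centroid at m/z 100.02.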

A total ion current chromatogram, corrected by typical pre-processing steps.

Combined results

Figure: file size (MB) of RAW files, GZIPped RAW files, peak lists and GZIPped peak lists, for Q-TOF I and Esquire HCT data

See: Martens et al., Proteomics, 2005

Data size reduction

MS/MS IDENTIFICATION

PEPTIDE FRAGMENT FINGERPRINTING

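Figure: the example sequence LENNART broken into prefix fragments (such as LENNAR) and suffix fragments (such as ENNART, NNART and NART), each giving a peak in the MS/MS spectrum (intensity versus m/z)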

Peptide sequences and MS/MS spectra
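The fragment ladder in the figure can be computed directly. The sketch below assumes an unmodified peptide, singly charged fragments and monoisotopic residue masses, and uses LENNART only as the figure's toy sequence: b ions are N-terminal fragments (prefix residues + proton) and y ions are C-terminal fragments (suffix residues + water + proton).

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids; I and L are isobaric
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ladders(peptide):
    """Return singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # b_i: first i residues
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i: last i residues
    return b_ions, y_ions
```

Each complementary pair b_i and y_(n−i) sums to the peptide mass plus two protons, so the prefix and suffix fragments carry the same sequence information read from opposite ends.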

Figure: protein sequence database → in silico digest → peptide sequences (YSFVATAER, HETSINGK, MILQEESTVYYR, SEFASTPINK) → in silico theoretical MS/MS spectra (intensity versus m/z) → matching against the experimental MS/MS spectrum → peptide scores:

1) YSFVATAER 34
2) YSFVSAIR 12
3) FFLIGGGGK 12

Peptide fragment fingerprinting (PFF)
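A toy version of the workflow in the figure: in silico trypsin digestion, theoretical b/y spectra, and a shared-peak-count score used to rank candidates. This is a hypothetical sketch under simplifying assumptions (no missed cleavages, singly charged fragments, a fixed 0.5 Da tolerance); real engines also restrict candidates by precursor mass first and use far more elaborate scores.

```python
import re

# Monoisotopic residue masses (Da); I and L are isobaric
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def tryptic_digest(protein, min_length=6):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if len(p) >= min_length]


def theoretical_spectrum(peptide):
    """Singly charged b and y ion m/z values."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def shared_peak_count(theoretical, experimental, tol=0.5):
    """Number of experimental peaks explained by some theoretical fragment."""
    return sum(1 for e in experimental if any(abs(e - t) <= tol for t in theoretical))


def rank_candidates(candidates, experimental, tol=0.5):
    """Score every candidate peptide and sort best first."""
    scored = [(shared_peak_count(theoretical_spectrum(p), experimental, tol), p)
              for p in candidates]
    return sorted(scored, key=lambda sp: sp[0], reverse=True)
```

Concatenating the four peptides from the figure and digesting them recovers the same four sequences; a spectrum simulated from HETSINGK then ranks HETSINGK first with all 14 of its fragments matched.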

• Spectral comparison: database sequence → theoretical spectrum, compared with the experimental spectrum
• Sequential comparison: experimental spectrum → de novo sequence, compared with the database sequence
• Threading comparison: the database sequence is threaded through the experimental spectrum

From: Eidhammer, Flikka, Martens, Mikalsen – Wiley 2007

Three types of PFF identification

• MASCOT (Matrix Science) http://www.matrixscience.com
• SEQUEST (Scripps, Thermo Fisher Scientific) http://fields.scripps.edu/sequest
• X!Tandem (The Global Proteome Machine Organization) http://www.thegpm.org/TANDEM
• OMSSA (NCBI) http://pubchem.ncbi.nlm.nih.gov/omssa/

The most popular algorithms

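Figure: score distributions of incorrect and correct identifications, separated by a threshold score. Incorrect identifications scoring above the threshold are false positives; correct identifications scoring below it are false negatives.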

Adapted from: www.proteomesoftware.com – Wiki pages

Overall concept of scores and cut-offs
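The trade-off in the figure can be made concrete by counting, for one threshold, the incorrect identifications that pass (false positives) and the correct ones that fail (false negatives). The sketch assumes we know which identifications are correct, as for a benchmark sample of known composition; in real searches these labels are not available.

```python
def errors_at_threshold(scores, is_correct, threshold):
    """Return (false positives, false negatives) for one threshold score.

    A false positive is an incorrect identification scoring at or above the
    threshold; a false negative is a correct one scoring below it.
    """
    false_pos = sum(1 for s, ok in zip(scores, is_correct) if s >= threshold and not ok)
    false_neg = sum(1 for s, ok in zip(scores, is_correct) if s < threshold and ok)
    return false_pos, false_neg
```

Raising the threshold from 25 to 45 on the scores 10, 20, 30, 40 and 50 (with 30 and 50 correct) trades one false positive for one false negative.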

Figure: number of identifications and percentage of false positives at cut-offs p = 0.05, 0.01, 0.005 and 0.0005; higher stringency gives fewer false positives

Playing with probabilistic cut-off scores
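If a search engine reports, for each spectrum, the probability p that its best match is random, the effect of a cut-off can be estimated: at cut-off c, roughly c × N of N searched spectra would pass by chance alone. The sketch below treats every spectrum as potentially random, which gives a conservative estimate; the function name and the cap at 100% are choices made for this example.

```python
def cutoff_table(p_values, cutoffs=(0.05, 0.01, 0.005, 0.0005)):
    """For each cut-off return (cut-off, accepted, expected false positives, % false positives).

    Expected false positives are estimated as cut-off * number of spectra,
    i.e. as if every spectrum only had random matches (a conservative bound).
    """
    n = len(p_values)
    rows = []
    for c in cutoffs:
        accepted = sum(1 for p in p_values if p <= c)
        expected_fp = c * n
        pct = min(100.0, 100.0 * expected_fp / accepted) if accepted else 0.0
        rows.append((c, accepted, expected_fp, pct))
    return rows
```

Tightening the cut-off lowers the estimated fraction of false positives; with real data, the number of accepted identifications usually drops as well.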

• Very well established search engine

• Can be used for MS/MS (PFF) identifications

• Based on a cross-correlation score (includes peak height)

• Published core algorithm (patented, licensed to Thermo), Eng, JASMS 1994

• Provides a preliminary score (Sp), rank, cross-correlation score (XCorr) and the normalized score difference between the top two ranks (deltaCn, ∆Cn)

• Thresholding is up to the user, and is commonly done per charge state

• Many extensions exist to perform a more automatic validation of results

SEQUEST

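$$\mathrm{XCorr} = R_0 - \frac{1}{151}\sum_{i=-75}^{+75} R_i, \qquad R_i = \sum_{j=1}^{n} x_j \, y_{j+i}$$

$$\Delta C_n = \frac{\mathrm{XCorr}_1 - \mathrm{XCorr}_2}{\mathrm{XCorr}_1}$$

where x and y are the two binned spectra being compared (theoretical and experimental), n is the number of bins, and XCorr_1 and XCorr_2 are the scores of the first- and second-ranked candidates.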

From: MacCoss et al., Anal. Chem. 2002

From: Peng et al., J. Prot. Res., 2002

SEQUEST: some additional pictures
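The XCorr and deltaCn definitions can be implemented directly on binned spectra. The sketch assumes both spectra are already converted to equal-width m/z bins (lists of intensities); SEQUEST's own pre-processing of the experimental spectrum is omitted, and the function names are illustrative.

```python
def cross_correlation(x, y, offset):
    """R_offset = sum over bins j of x[j] * y[j + offset] (bins outside y are skipped)."""
    total = 0.0
    for j, xj in enumerate(x):
        k = j + offset
        if 0 <= k < len(y):
            total += xj * y[k]
    return total


def xcorr(x, y, max_offset=75):
    """XCorr = R_0 minus the mean of R_i over offsets -max_offset .. +max_offset."""
    offsets = range(-max_offset, max_offset + 1)  # 151 offsets for max_offset = 75
    background = sum(cross_correlation(x, y, i) for i in offsets) / len(offsets)
    return cross_correlation(x, y, 0) - background


def delta_cn(xcorr_first, xcorr_second):
    """Normalized XCorr difference between the top two candidates."""
    return (xcorr_first - xcorr_second) / xcorr_first
```

Subtracting the mean correlation over shifted alignments rewards a match that is good specifically at zero offset, rather than one that correlates with everything.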

• Very well established search engine, Perkins, Electrophoresis 1999

• Can do MS (PMF) and MS/MS (PFF) identifications

• Based on the MOWSE score

• Unpublished core algorithm (trade secret)

• Predicts an a priori threshold score that identifications need to pass

• From version 2.2, Mascot allows integrated decoy searches

• Provides rank, score, threshold and expectation value per identification

• Customizable confidence level for the threshold score

Mascot

Figure: linear fit y = 8.3761x − 34.089 (R² = 0.9985), with x = log10(number of AA)

Mascot: some additional pictures
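Mascot's core algorithm is unpublished, but its scores are reported as −10 · log10(P), where P is the probability that the observed match is a random event. The sketch converts between P and score and derives a threshold by dividing the significance level by the number of candidate sequences tested; that division is a Bonferroni-style assumption made here, not Mascot's exact recipe, and the function names are hypothetical.

```python
import math


def score_from_probability(p):
    """Mascot-style score: -10 * log10(P), P being the probability of a random match."""
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Inverse conversion: the random-match probability behind a score."""
    return 10.0 ** (-score / 10.0)


def threshold_score(n_candidates, significance=0.05):
    """Score a match must exceed to be significant after correcting for n_candidates tests."""
    return score_from_probability(significance / n_candidates)


def expectation_value(score, n_candidates):
    """Expected number of random matches scoring at least this well."""
    return n_candidates * probability_from_score(score)
```

With one million candidate sequences and p = 0.05 the threshold comes out at about 73; because it depends on log10 of the number of candidates, larger databases demand higher scores.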


• A successful open source search engine, Craig and Beavis, RCMS 2003

• Based on a hyperscore (P_i is either 0 or 1, marking unmatched and matched fragments)

• Relies on a hypergeometric distribution (hence hyperscore)

• Published core algorithm, and is freely available

• Provides hyperscore and expectancy score (the discriminating one)

• X!Tandem is fast and can handle modifications in an iterative fashion

• Has rapidly gained popularity as an (auxiliary) search engine

X!Tandem

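$$\mathrm{HyperScore} = \left(\sum_{i=1}^{n} I_i P_i\right) \cdot N_b! \cdot N_y!$$

where I_i is the intensity of fragment peak i, P_i is 1 if that fragment is matched and 0 otherwise, and N_b and N_y are the numbers of matched b and y ions.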
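The hyperscore formula translates directly into code. In the sketch below each theoretical fragment is given as a hypothetical (ion type, intensity, matched) tuple, where the intensity is that of the experimental peak matched to the fragment; peak matching and spectrum pre-processing are left out.

```python
import math


def hyperscore(fragments):
    """HyperScore = (sum of I_i * P_i) * N_b! * N_y!.

    fragments: list of (ion_type, intensity, matched) with ion_type 'b' or 'y';
    P_i is 1 for a matched fragment and 0 otherwise.
    """
    dot = sum(intensity for _, intensity, matched in fragments if matched)
    n_b = sum(1 for ion, _, matched in fragments if matched and ion == "b")
    n_y = sum(1 for ion, _, matched in fragments if matched and ion == "y")
    return dot * math.factorial(n_b) * math.factorial(n_y)
```

The factorial terms grow quickly, so a candidate that explains many b and y ions gains far more than one matching a few intense peaks.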

Figure: hyperscore distributions (x-axis: hyperscore) with the significance threshold and the resulting expectation value, E-value = e-8.2

Adapted from: Brian Searle, ProteomeSoftware, http://www.proteomesoftware.com/XTandem_edited.pdf

X!Tandem: some additional pictures
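The expectation value in the figure comes from the distribution of hyperscores of all candidate peptides scored against the same spectrum. A sketch in that spirit, under stated assumptions: the logarithm of the number of candidates scoring at least s is taken to fall roughly linearly in the upper tail, a straight line is fitted there by least squares, and the line is extrapolated to the best score. The integer score steps and the tail start are hypothetical choices, not X!Tandem's exact procedure.

```python
import math


def extrapolated_evalue(scores, best_score, tail_start):
    """Estimate an E-value by extrapolating the log-survival curve of candidate scores.

    For each integer score s from tail_start up to the highest score, count the
    candidates scoring >= s, fit log10(count) = a + b * s, and evaluate at best_score.
    """
    xs, ys = [], []
    for s in range(int(tail_start), int(max(scores)) + 1):
        count = sum(1 for v in scores if v >= s)
        if count > 0:
            xs.append(s)
            ys.append(math.log10(count))
    if len(xs) < 2:
        raise ValueError("need at least two tail points to fit a line")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return 10 ** (intercept + slope * best_score)
```

A best score far beyond the random-candidate distribution extrapolates to a very small expected number of equally good random hits.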

A note on how the scores differ

Figure: HyperScore, DeltaCn and E-value compared (labels: accuracy score, relative score)

Adapted from: Brian Searle, ProteomeSoftware

• A successful open source search engine, Geer, JPR 2004

• Relies on a Poisson distribution

• Published core algorithm, and is freely available

• Provides an expectancy score, similar to the BLAST E-value

• OMSSA was recently upgraded to take peak intensity into account

• Got really good marks in a recently published comparative study

Yeast lysate spectrum: number of fragment peak m/z matches against all sequences in the NCBI nr library, with a fitted Poisson distribution.

Validation of the Poisson distribution model: mean number of modelled and measured matching peaks (against the NCBI nr database) for two mass tolerances.

Adapted from: Geer et al., J. Prot. Res., 2004

OMSSA: some additional pictures
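OMSSA's principle can be illustrated with a Poisson model: if random fragment matches occur with mean μ per candidate (μ depends on the number of peaks and the mass tolerance), the chance of k or more random matches is P(X ≥ k), and multiplying by the number of candidates N gives an expectation value, much like a BLAST E-value. The sketch takes μ as given and uses hypothetical function names; it is not OMSSA's implementation.

```python
import math


def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu): chance of k or more random peak matches."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def poisson_evalue(matches, mu, n_candidates):
    """Expected number of random candidates with at least `matches` peak matches."""
    return n_candidates * poisson_tail(matches, mu)
```

A candidate matching many more peaks than the random expectation μ gets a tiny tail probability, which stays small even after multiplying by the number of candidates.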

COMPARATIVE STUDIES

Kapp et al., Proteomics, 2005

1.6x more?!

Balgley et al., Mol. Cell. Proteomics, 2007

Figure: overlap of the results of Mascot, SEQUEST, Phenyx and ProteinSolver

Figure courtesy of Dr. Christian Stephan, Medizinisches Proteom-Center, Ruhr-Universität Bochum; Human Brain Proteome Project

Combining the output of search algorithms
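Combining engines comes down to set operations on their identifications: the union is what the combination reports, and the identifications unique to one engine are what that engine adds. The sketch assumes identifications were already validated separately per engine and are comparable as strings (for example peptide sequences); the engine names in the example are placeholders.

```python
def combine_results(results):
    """Merge identifications from several search engines.

    results: dict mapping an engine name to a set of identifications.
    Returns the union of all identifications and, per engine, how many
    identifications only that engine reported.
    """
    union = set().union(*results.values())
    unique = {}
    for engine, ids in results.items():
        others = set().union(*(v for e, v in results.items() if e != engine))
        unique[engine] = len(ids - others)
    return union, unique
```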

SEQUENTIAL COMPARISON ALGORITHMS

Image from: Matthias Wilm, EMBL Heidelberg, Germany http://www.narrador.embl-heidelberg.de/GroupPages/PageLink/activities/SeqTag.html

Figure: a sequence tag in an MS/MS spectrum

The concept of sequence tags was introduced by Mann and Wilm (Mann and Wilm, Anal. Chem. 1994, 66: 4390-4399).

Sequence tags
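Reading a sequence tag amounts to finding a chain of peaks whose successive mass differences equal residue masses. The sketch below assumes a centroided list of fragment m/z values from one ion series, all singly charged; it returns the longest such chain with its start and end m/z, the flanking masses that accompany a tag. Leucine and isoleucine have identical masses, so the tag reports L for both.

```python
# Monoisotopic residue masses (Da); L is listed before I, so isobaric steps read as L
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def longest_tag(peaks, tol=0.02):
    """Return (tag, start_mz, end_mz) for the longest chain of residue-sized steps."""
    mzs = sorted(peaks)
    best = [("", mz) for mz in mzs]  # best tag ending at each peak, with its start m/z
    for j in range(len(mzs)):
        for i in range(j):
            for aa, mass in RESIDUE.items():
                if abs(mzs[j] - mzs[i] - mass) <= tol:
                    tag, start = best[i]
                    if len(tag) + 1 > len(best[j][0]):
                        best[j] = (tag + aa, start)
    k = max(range(len(mzs)), key=lambda idx: len(best[idx][0]))
    tag, start = best[k]
    return tag, start, mzs[k]
```

Among peaks at m/z 300.0, 414.04293, 485.08004, 641.18115 and 742.22883, mixed with noise peaks, the successive differences spell the tag NART.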

• Tabb, Anal. Chem. 2003, Tabb, JPR 2008, Dasari, JPR 2010

• Recent implementations of the sequence tag approach

• Refine hits by peak mapping in a second stage to resolve ambiguities

• Rely on an empirical fragmentation model

• Published core algorithms; DirecTag and TagRecon are freely available

• Most useful to retrieve unexpected peptides (modifications, variations)

• Entire workflows exist (e.g., combination with IDPicker)

GutenTag, DirecTag, TagRecon

From: Tabb et al., Anal. Chem., 2003

GutenTag: some additional pictures
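A tag becomes specific once its flanking mass is used: a candidate peptide must contain the tag, and the residues before the tag must account for the tag's starting b-ion m/z. The sketch below checks only this N-terminal flank against a list of peptides; it is a simplified illustration of the tag-plus-flanking-mass idea, not the GutenTag, DirecTag or TagRecon scoring.

```python
# Monoisotopic residue masses (Da); I and L are isobaric
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def tag_matches(tag, start_mz, peptides, tol=0.02):
    """Return (peptide, position) pairs where the tag occurs and the preceding
    residues explain the tag's starting b-ion m/z (prefix mass + proton)."""
    hits = []
    for pep in peptides:
        pos = pep.find(tag)
        while pos != -1:
            prefix_mass = sum(RESIDUE[aa] for aa in pep[:pos])
            if abs(prefix_mass + PROTON - start_mz) <= tol:
                hits.append((pep, pos))
                break
            pos = pep.find(tag, pos + 1)  # try the next occurrence
    return hits
```

The tag NART starting at m/z 357.177 selects LENNART, whose prefix LEN matches the flank, and rejects other peptides that merely contain NART.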

Example of a manual de novo interpretation of an MS/MS spectrum: no database is needed to extract a sequence!

Algorithms: Lutefisk, Sherenga, PEAKS, PepNovo

References: Dancik 1999, Taylor 2000, Fernandez-de-Cossio 2000, Ma 2003, Zhang 2004, Frank 2005, Grossmann 2005

De novo compared to sequence tags
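De novo sequencing reads the whole sequence rather than a short tag, for example by finding a path of residue-sized steps from mass 0 to the residue mass implied by the precursor. The sketch below is a toy spectrum-graph search under strong assumptions (a complete b-ion ladder, singly charged ions, single-residue steps only, a neutral precursor mass as input); programs such as Lutefisk, Sherenga, PEAKS and PepNovo use both ion series and far more elaborate scoring.

```python
# Monoisotopic residue masses (Da); L is listed before I, so isobaric steps read as L
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def de_novo(b_ion_mzs, precursor_mass, tol=0.02):
    """Read a sequence from singly charged b-ion m/z values and the neutral precursor mass."""
    total = precursor_mass - WATER  # sum of all residue masses
    prefixes = {mz - PROTON for mz in b_ion_mzs if 0 < mz - PROTON < total}
    nodes = sorted({0.0, total} | prefixes)  # spectrum graph nodes: prefix masses
    best = {0: ""}  # node index -> best sequence reaching that node
    for j in range(1, len(nodes)):
        for i in range(j):
            if i not in best:
                continue
            gap = nodes[j] - nodes[i]
            for aa, mass in RESIDUE.items():
                if abs(gap - mass) <= tol:
                    candidate = best[i] + aa
                    # prefer paths that pass through more peaks
                    if j not in best or len(candidate) > len(best[j]):
                        best[j] = candidate
                    break
    return best.get(len(nodes) - 1)  # None if the precursor mass is unreachable
```

Unlike a tag, the result must account for the full precursor mass, so a single missing ladder peak that leaves a two-residue gap breaks this naive version.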

Thank you!

Questions?
