Insider secrets to PTM analysis using proteomics mass spectrometry
David Chiang, Chairman, Sage-N Research
[email protected]
Stanford SUMS Meeting, Sept 2, 2010




100902SUMS 2 www.SageNResearch.com

Can we really do these with mass spec?

1. Identify peptides with PTMs?
   • Phosphorylation
   • SUMO-ylation
   • Glycosylation

2. Quantify them?

3. Localize the PTM sites?
   • Ascore for phospho


* Your mileage will vary. (Helps if the Data God is smiling on you that day.)



Incredible potential of Proteomics

          Expensive Instrument     Cryptic Data   Tricky Data Analysis     Eureka!
Protein   Mass Spectrometry        Spectrum       Supercomputer            Your Face Here
DNA       X-Ray Crystallography    Photo 51       Lots of Paper, Pencils   Fame & Fortune


Agenda

• Overview
• PTM data analysis
• Scripting capability is key
• SORCERER for cancer PTMs


Scientific Advisory Board

John R. Yates, III, Scripps Research Institute

Steven Gygi, Harvard Medical School

Ruedi Aebersold, Swiss Institute of Tech. Zurich

Roman Zubarev, Uppsala University


Some of Our Customers


Proteomics in Principle: Simple!

1. Prepare clean sample

2. Separate and measure

3. Interpret the data


Proteomics in practice: ‘Food Network’ + ‘CSI’

• Play Chef with sample preparation
• Play Detective with data interpretation

Success needs both of these very different skill sets!


If the FBI did Protein ID …

Basic algorithm (FBI’s AFIS)

Step 1: Measure Raw Data [MSMS]
– Extract all available fingerprints from crime scene

Step 2: Search Engine [SEQUEST]
– Get top-scoring DB hits for each fingerprint
– Several sub-scores (size, features, correlation of “minutiae”)

Step 3: Determine True/False Hits [PeptideProphet]
– Decide which hits are valid and which are iffy
– Individual hits assigned true/false or probability value

Step 4: Infer Suspect [ProteinProphet]
– Combine one or more finger/thumb prints back to person

“Q-Tof”

“Ion Trap”


Proteomics Costs & Time (adapted from Mallick & Kuster NBT 7/10)

• How long to get results?
  – 5-10 working days for simple protein ID and quantitation
  – 4-6 weeks for quantitative protein expression profiling
  – 2-6 months for PTM analysis

• How much does it cost?
  – $50-200 for simple protein ID
  – $500-2K for simple PTM analysis
  – $5K-15K for complex PTM analysis
  – $1K-$2K for quantitative protein expression profiling


Data Analysis for Phosphorylation Profiling

• Search CID spectra with STY +80 Da
  – Look for intact phosphate
    • Tyrosine
    • Phosphate stabilized near Proline
  – Run Ascore algorithm to localize sites if possible

• Search CID spectra with ST -18 Da, precursor -80 Da
  – Look for phosphate losses during separation

• Search CID MSA/MS3 with precursor -80 Da
• Search ETD with STY +80 Da
• Other
  – HCD

All depend on robust peptide ID foundation.
All require some workflow scripting.
All require some dataset-dependent detective work.
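To make the "STY +80 Da" idea concrete, here is a minimal Python sketch, not the SORCERER implementation. It computes a peptide's monoisotopic mass with a given number of phosphates and lists the candidate site placements that a localization step such as Ascore would have to tell apart. The nominal 80 Da and 98 Da shifts are taken here as HPO3 (79.96633 Da) and H3PO4 (97.97690 Da). The function names and the example peptide are hypothetical.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PHOSPHO = 79.966331          # HPO3: the nominal "+80 Da" on S/T/Y
PHOSPHORIC_ACID = 97.976896  # H3PO4: the nominal "98 Da" neutral loss


def peptide_mass(sequence, n_phospho=0):
    """Neutral monoisotopic mass of a peptide carrying n_phospho phosphates."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_phospho * PHOSPHO


def phosphosite_candidates(sequence, n_phospho):
    """All ways to place n_phospho phosphates on S/T/Y (0-based positions)."""
    sites = [i for i, aa in enumerate(sequence) if aa in "STY"]
    return list(combinations(sites, n_phospho))


peptide = "ASPTYK"  # made-up example sequence
print(round(peptide_mass(peptide, n_phospho=1), 4))
print(phosphosite_candidates(peptide, 1))  # [(1,), (3,), (4,)]
```

Every placement has the same precursor mass, which is why site localization needs fragment-level evidence and some scripting around the search engine.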


Other PTMs of Interest

• SUMO-ylation (Small Ubiquitin-like MOdifier)
  – ~100-residue peptide attached to K in consensus sequence ψ-K-x-D/E (ψ = large hydrophobic residue)
  – Scripting required around Search Engine
    • Account for PTM fragmentation
    • Search consensus sequence

• Glycosylation
  – Scripting for N-linked glycopeptides for consensus sequence N-x-S/T
  – (See www.Proteomics2.com blog for info 10/08)
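One way such consensus-sequence scripting could look is a pair of regular expressions, sketched below in Python. This is an illustration rather than the Sage-N scripts. Two assumptions go beyond the slide: the hydrophobic position of the SUMO motif is restricted to I, L or V, and proline is excluded at the x position of the N-linked sequon. The protein string is made up.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
# N-linked sequon N-x-S/T; excluding P at x is an added assumption.
N_GLYC = re.compile(r"(?=(N[^P][ST]))")
# SUMO consensus: hydrophobic residue, K, any residue, D/E.
# Taking the hydrophobic residue as I, L or V is an assumption.
SUMO = re.compile(r"(?=([ILV]K[A-Z][DE]))")


def find_motifs(pattern, sequence):
    """Return (1-based start position, matched motif) pairs."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


protein = "MNGSLKQENNTSVKAE"  # made-up example sequence
print(find_motifs(N_GLYC, protein))  # [(2, 'NGS'), (9, 'NNT'), (10, 'NTS')]
print(find_motifs(SUMO, protein))    # [(5, 'LKQE'), (13, 'VKAE')]
```

For the sequon the modified N is the first residue of each match; for the SUMO motif the modified K is the second.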


Which Quantitation Method? (adapted from Mallick & Kuster NBT 7/10)

Metabolic labeling (e.g. SILAC, 15N)
• Small changes (10-50%) in cell cultures

Peptide labeling (iTRAQ, TMT)
• Moderate changes (50-200%) for multiplex experiments (e.g. time course, dose response)

Label-free using MS1
• Moderate changes (50-200%) for comparing many similar experiments

Label-free using spectral counting
• Large changes (>100%) for comparing many similar experiments

Single/Multiple Reaction Monitoring (SRM, MRM) with spiked standards (AQUA)
• Absolute quantity in complex sample (e.g. serum)


Peptide Search Engine = Foundation

Tandem Mass Spectrometer: Agilent, ABI, Bruker, Hitachi, Shimadzu, Thermo, Waters, …

Peptide Search Engine: SEQUEST®, Mascot®, X!Tandem, PHENYX®, OMSSA, ProteinPilot™, ProteinProspector, SpectrumMill, InspecT

Peptide Inference / Protein Inference: SCAFFOLD, Trans-Proteomic Pipeline, DTASelect, Census, Proteome Discoverer

PTM (Ascore)

Quant. (SpecCount)

SEQUEST is a registered trademark of University of Washington. Others are trademarks of their respective owners.


MASCOT: Search Engine and Protein ID Application

• Relatively simple but extensively tuned scoring model based on PMF (peptide mass fingerprinting)
  – Unpublished model, but appears to be Ionscore = Max(#Matched m/z’s) x Factor [Factor commonly ~11]
  – Ideal for TOF data
    • Every peak is “significant”, mass-accurate, with no extra peaks

• Search Engine key steps
  1. From spectrum, derive list of characteristic m/z’s (peaks are ignored in scoring)
  2. Peptide score according to # matched m/z’s vs. theory
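The conjectured "matched m/z count times a factor" idea can be sketched in a few lines of Python. This only illustrates the slide's guess and is not Mascot's actual (unpublished) scoring. The function name, the 0.5 m/z tolerance and the example peak lists are hypothetical; the factor of 11 is the value quoted on the slide.

```python
def matched_peak_score(observed_mz, theoretical_mz, tol=0.5, factor=11.0):
    """Count theoretical m/z values that have an observed m/z within tol,
    then scale the count by factor. Peak intensities are not used."""
    matched = sum(
        1
        for t in theoretical_mz
        if any(abs(o - t) <= tol for o in observed_mz)
    )
    return matched * factor


observed = [175.1, 262.2, 400.0, 533.3]            # made-up peak list
theoretical = [175.119, 262.151, 349.183, 533.27]  # made-up fragment list
print(matched_peak_score(observed, theoretical))   # 33.0 (3 matches x 11)
```

A count-based score like this works well only when nearly every peak is real and mass-accurate, which matches the slide's point about TOF data.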


SEQUEST: Search Engine Only

• Original search engine, circa 1993
  – Cross-correlation score (like cell phones etc.)
  – Compute-intensive, but sensitive
  – Developed for ion trap data
    • Lots of extra peaks, with poor mass accuracy
  – Needs post-processing
    • PeptideProphet, Proteome Discoverer, DTASelect, Scaffold

• Search Engine key steps
  1. Find top 500 DB entries with good overlap with spectrum (peaks are considered)
  2. Calculate cross-correlation etc., and report top hit(s)
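The cross-correlation idea can be sketched in plain Python as the zero-offset dot product of binned observed and theoretical spectra minus the average dot product over nearby offsets. This is a simplified illustration in the spirit of the published XCorr, not the SEQUEST code: it skips the intensity preprocessing a real engine applies, and the 1 m/z bin width, the ±75 bin window and all function names are assumptions.

```python
def bin_spectrum(peaks, n_bins, bin_width=1.0):
    """Turn (m/z, intensity) pairs into a fixed-length intensity vector."""
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] = max(vec[idx], intensity)
    return vec


def dot_at_offset(observed, theoretical, offset):
    """Dot product of observed[i] with theoretical[i + offset]."""
    n = len(observed)
    return sum(
        observed[i] * theoretical[i + offset]
        for i in range(max(0, -offset), min(n, n - offset))
    )


def xcorr(observed, theoretical, max_offset=75):
    """Zero-offset correlation minus the mean over nonzero offsets."""
    background = sum(
        dot_at_offset(observed, theoretical, tau)
        for tau in range(-max_offset, max_offset + 1)
        if tau != 0
    )
    return dot_at_offset(observed, theoretical, 0) - background / (2 * max_offset)


obs = bin_spectrum([(100.2, 1.0), (200.4, 1.0), (300.1, 1.0)], 400)
good = bin_spectrum([(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)], 400)
bad = bin_spectrum([(150.0, 1.0), (250.0, 1.0), (350.0, 1.0)], 400)
print(xcorr(obs, good))  # 3.0
print(xcorr(obs, bad))   # slightly negative
```

Subtracting the offset background is what makes the score tolerant of the many extra peaks in ion trap data, at the cost of more computation per candidate.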


SEQUEST 3G on Sorcerer

Key Steps:
1. Fast but accurate spectrum matching that uses ppm mass accuracy
2. Complete "on the fly" consideration of chosen PTMs
3. All candidate matches (not just a "short list") get rigorous cross-correlation scores ...
4. ... to enable most sensitive ranking and effective compilation of results
5. Comprehensive XCorrs also allow reliable E-value results for simple interpretation, as well as computation of legacy SEQUEST scores


SORCERER Scripting Platform
MUSE = Modular Utilities for Search Engines

• Re-score engines for search results
  – Ascore phosphorylation site localization
  – Other similarity scores (e.g. binomial scoring used by Mann, Gygi)
  – Post-process and filter for decoys, incorrect mods

• Pre-process spectra for search and re-search
  – Calculate mass shift and re-adjust precursor masses
  – Compute FDR using different formulas
  – Mass-accurate MS3 from mass-accurate MS2
  – Add/subtract 98 Da for phospho searches

• Interactive data query
  – Run what-if’s quickly on complex search results

• Workflow automation
  – Auto archive after search
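As an illustration of the kind of re-scoring module listed above, here is a hedged Python sketch of a binomial similarity score: the chance of matching at least k of n theoretical ions at random, reported as -10·log10(P). Taking the difference between two candidate site placements is loosely modeled on how Ascore is usually described. It is not the MUSE or Ascore code, and the match counts and per-ion probability below are made-up numbers.

```python
from math import comb, log10


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random matches."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def binomial_score(k, n, p):
    """-10 * log10 of the random-match probability; higher is better."""
    return -10 * log10(binomial_tail(k, n, p))


# Hypothetical case: 6 site-determining ions, 6% chance of a random match each.
score_site_a = binomial_score(5, 6, 0.06)  # placement A matches 5 of 6 ions
score_site_b = binomial_score(1, 6, 0.06)  # placement B matches 1 of 6 ions
print(round(score_site_a - score_site_b, 1))  # localization margin, A over B
```

A large margin says one placement explains the site-determining ions far better than the other; a margin near zero means the site is ambiguous.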


Difference between ‘Script’ and ‘Program’
e.g. Microsoft Excel Example

Microsoft Excel is a Program

Customize platform via VBasic Scripts


SORCERER Script Example: VersaSearch™ for combined CID/ETD

• Unified search accommodates ETD variations
  – All-CID, all-ETD, alternating, & decision tree

• Peptide terminus mods to account for b vs. c, y vs. z-radical
  – Post-processing to define “true” vs. “decoy”

• Post-search re-scoring provides other similarity scores
  – E.g. binomial scores
  – User-defined modules for special PTMs
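One plausible reading of the terminus-mod trick, sketched in Python: a c ion is a b ion plus NH3 (+17.02655), and a z-radical ion is a y ion minus NH3 plus H (-16.01872). Shifting the peptide's N-terminus and C-terminus by those amounts would therefore let a b/y-based engine score ETD spectra. This is an interpretation of the slide, not the VersaSearch implementation. The residue table is truncated, and the function name and example peptide are hypothetical.

```python
# Monoisotopic masses (Da); the residue table is truncated for brevity.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "K": 128.09496, "Y": 163.06333,
}
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549
HYDROGEN = 1.007825


def fragment_ions(sequence):
    """Singly charged b, c, y and z-radical m/z for each backbone cleavage."""
    ions = []
    for i in range(1, len(sequence)):
        n_term = sum(RESIDUE_MASS[aa] for aa in sequence[:i])
        c_term = sum(RESIDUE_MASS[aa] for aa in sequence[i:])
        b = n_term + PROTON
        y = c_term + WATER + PROTON
        ions.append({
            "site": i,
            "b": b,
            "c": b + AMMONIA,                     # c = b + NH3
            "y": y,
            "z_radical": y - AMMONIA + HYDROGEN,  # z-radical = y - NH3 + H
        })
    return ions


for ion in fragment_ions("GASK"):  # made-up example peptide
    print(ion["site"], round(ion["b"], 4), round(ion["c"], 4),
          round(ion["y"], 4), round(ion["z_radical"], 4))
```

Because every c ion carries the same offset from its b ion, and every z-radical ion the same offset from its y ion, the correction can be expressed as two fixed terminus modifications instead of a new fragmentation model.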

[Workflow diagram: RAW → Extract peaklist → peaks → SEQUEST 3G Peptide ID → pep hits → Protein Inference → prot hits. MUSE steps shown: Determine CID vs. ETD; Tag True/Decoy w/ MUSE; Peptide FDR w/ terminus mods]
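The true/decoy tagging and peptide FDR steps in this workflow can be illustrated with a short Python sketch. It shows two commonly used target-decoy estimates for a concatenated target+decoy search: decoys/targets, and 2·decoys/(targets+decoys). It does not show which formulas MUSE actually uses. The function name, result keys and example hits are hypothetical.

```python
def fdr_estimates(hits, threshold):
    """hits: iterable of (score, is_decoy) pairs.
    Returns two common target-decoy FDR estimates for score >= threshold."""
    hits = list(hits)
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    total = targets + decoys
    if total == 0:
        return {"decoy_over_target": 0.0, "twice_decoy_over_total": 0.0}
    return {
        # Decoy count as an estimate of false target hits.
        "decoy_over_target": decoys / max(targets, 1),
        # Doubled decoys over all hits passing the threshold.
        "twice_decoy_over_total": 2 * decoys / total,
    }


# Made-up search results: (score, is_decoy)
hits = [(3.1, False), (2.9, False), (2.5, True), (2.4, False),
        (2.2, False), (1.8, True), (1.5, False)]
print(fdr_estimates(hits, threshold=2.0))
# {'decoy_over_target': 0.25, 'twice_decoy_over_total': 0.4}
```

The two formulas can disagree noticeably on small result sets, which is one reason to keep the FDR calculation scriptable rather than fixed.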


SORCERER Script Example: DTASelect and Census

• Great tools for filtering, reporting and quant from Yates Lab
  – Available from Yates lab (fields.scripps.edu)
  – We recommend it for SILAC etc. (commercial offering in development)

• On SORCERER, install scripts and run census_select.muse
  – Runs DTASelect, then Census
  – View DTASelect report directly from Sorcerer Web GUI
  – Runs Census pre-analysis using search results and supplied params/config

census_select.muse -c ~/custom/census/census_config_silac.xml


Hedge Fund IT Model for Proteomics

Server + Software + Storage + Support + Service + Training

Specialized IT Platform: Scripting, Visualization

Informaticians

Traders/Biologists: Specialist End-Users

SORCERER Enterprise Product + Service


Sorcerer product line

• Sorcerer 2: well-established standard
  – Self-contained tower (or 2U rack unit)
  – Can fit under bench, shared by lab
  – One high-end processing node, with integrated storage
  – Good match for single Orbi or similar

• Sorcerer Enterprise: new HT platform
  – Pre-integrated rack cabinet
  – Goes in server room, shared site-wide
  – Includes multinode blade server with high-capacity storage subsystem
  – Ideal for multi-instrument throughput


Requirements for PTM Data Analysis

• Robust peptide ID workflow
  – Heavy-duty compute servers for hedge-fund-class data crunching

• Scripting platform for prototyping analyses
  – Generally customized for each workflow

• Both expert “Chef” and “Detective” needed
  – Semi-custom detective work is Sage-N’s specialty

For more info, contact [email protected]
We work with top labs on semi-custom workflow scripting for cancer-related PTMs.