Insider secrets to PTM analysis using proteomics mass spectrometry
David Chiang, Chairman, Sage-N Research
david@SageNResearch.com
Stanford SUMS Meeting, Sept 2, 2010
100902SUMS 2 www.SageNResearch.com
Can we really do these with mass spec?
1. Identify peptides with PTMs?
  • Phosphorylation
  • SUMO-ylation
  • Glycosylation
2. Quantify them?
3. Localize the PTM sites?
  • Ascore for phospho
* Your mileage will vary. (Helps if the Data God is smiling on you that day.)
Incredible potential of Proteomics
Expensive Instrument → Cryptic Data → Tricky Data Analysis → Eureka!
• Mass Spectrometry → Protein Spectrum → Supercomputer → Your Face Here
• X-Ray Crystallography → DNA Photo 51 → Lots of Paper, Pencils → Fame & Fortune
Agenda
• Overview
• PTM data analysis
• Scripting capability is key
• SORCERER for cancer PTMs
Scientific Advisory Board
• John R. Yates, III, Scripps Research Institute
• Steven Gygi, Harvard Medical School
• Ruedi Aebersold, Swiss Institute of Tech. Zurich
• Roman Zubarev, Uppsala University
Some of Our Customers
Proteomics in Principle: Simple!
1. Prepare clean sample
2. Separate and measure
3. Interpret the data
Proteomics in practice: ‘Food Network’ + ‘CSI’
• Play Chef with sample preparation
• Play Detective with data interpretation
Success needs both of these very different skill sets!
If the FBI did Protein ID …
Basic algorithm (FBI’s AFIS)
Step 1: Measure Raw Data [MSMS]
– Extract all available fingerprints from crime scene
Step 2: Search Engine [SEQUEST]
– Get top-scoring DB hits for each fingerprint
– Several sub-scores (size, features, correlation of “minutiae”)
Step 3: Determine True/False Hits [PeptideProphet]
– Decide which hits are valid and which are iffy
– Individual hits assigned true/false or probability value
Step 4: Infer Suspect [ProteinProphet]
– Combine one or more finger/thumb prints back to person
[Figure labels: “Q-Tof”, “Ion Trap”]
Proteomics Costs & Time (adapted from Mallick & Kuster NBT 7/10)
• How long to get results?
  – 5-10 working days for simple protein ID and quantitation
  – 4-6 weeks for quantitative protein expression profiling
  – 2-6 months for PTM analysis
• How much does it cost?
  – $50-200 for simple protein ID
  – $500-2K for simple PTM analysis
  – $5K-15K for complex PTM analysis
  – $1K-$2K for quantitative protein expression profiling
Data Analysis for Phosphorylation Profiling
• Search CID spectra with STY +80 Da
  – Look for intact phosphate
    • Tyrosine
    • Phosphate stabilized near Proline
  – Run Ascore algorithm to localize sites if possible
• Search CID spectra with ST -18 Da, precursor -80 Da
  – Look for phosphate losses during separation
• Search CID MSA/MS3 with precursor -80 Da
• Search ETD with STY +80 Da
• Other
  – HCD
All depend on robust peptide ID foundation.
All require some workflow scripting.
All require some dataset-dependent detective work.
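The bookkeeping behind these phospho searches can be sketched in a few lines. This is an illustration only, not Sorcerer or SEQUEST code: the function names and the example peptide are hypothetical, and the monoisotopic deltas are the standard values that the slide rounds to 80, 18, and 98 Da.

```python
from itertools import combinations

# Monoisotopic mass deltas in Da (the slide rounds these to 80, 18, and 98).
HPO3 = 79.96633      # intact phosphate added to S, T, or Y
H2O = 18.01056       # net loss left on S/T after the phosphate leaves as H3PO4
H3PO4 = HPO3 + H2O   # neutral loss from the precursor, about 98 Da


def candidate_sites(peptide, residues="STY"):
    """Return 0-based positions of residues that could carry a phosphate."""
    return [i for i, aa in enumerate(peptide) if aa in residues]


def site_assignments(peptide, n_phospho, residues="STY"):
    """Enumerate every way to place n_phospho phosphates on the peptide."""
    return list(combinations(candidate_sites(peptide, residues), n_phospho))


def mass_delta(n_intact, n_dehydrated=0):
    """Total mass shift for intact (+80) and dehydrated (-18) sites."""
    return n_intact * HPO3 - n_dehydrated * H2O


if __name__ == "__main__":
    pep = "AGSPTYLK"  # hypothetical peptide, not taken from the talk
    print(candidate_sites(pep))
    print(site_assignments(pep, 2))
    print(round(mass_delta(1), 5), round(mass_delta(0, 1), 5))
```

Each extra candidate site multiplies the number of assignments a localization step such as Ascore has to discriminate, which is one reason site localization is only possible "if possible".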
Other PTMs of Interest
• SUMO-ylation (Small Ubiquitin-like MOdifier)
  – ~100 residue peptide attached to K in consensus sequence ψ-K-x-D/E
  – Scripting required around Search Engine
    • Account for PTM fragmentation
    • Search consensus sequence
• Glycosylation
  – Scripting for N-linked glycopeptides for consensus sequence N-x-S/T
  – (See www.Proteomics2.com blog for info 10/08)
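A consensus-sequence scan of the kind mentioned for both PTMs might look like the sketch below. It is not the talk's script: the residue set used for the hydrophobic first position of the SUMO motif is an assumption, and `find_motif_sites` is a hypothetical helper. Many glycosylation tools also exclude proline at the middle position of N-x-S/T; the pattern here follows the slide as written.

```python
import re

# Assumed residue set for the hydrophobic position of the SUMO consensus;
# the slide gives only the consensus, not the allowed residues.
HYDROPHOBIC = "VILMF"

# Zero-width lookaheads so that overlapping motif instances are all reported.
SUMO_MOTIF = re.compile(rf"(?=([{HYDROPHOBIC}]K[A-Z][DE]))")
NGLYC_MOTIF = re.compile(r"(?=(N[A-Z][ST]))")


def find_motif_sites(sequence, motif, offset):
    """Return 0-based positions of the modified residue for each motif hit.

    offset is the index of the modified residue inside the motif
    (1 for the K in the SUMO motif, 0 for the N in N-x-S/T).
    """
    return [m.start() + offset for m in motif.finditer(sequence)]


if __name__ == "__main__":
    seq = "AVKDENGTLIKQE"  # hypothetical sequence for illustration
    print(find_motif_sites(seq, SUMO_MOTIF, 1))
    print(find_motif_sites(seq, NGLYC_MOTIF, 0))
```

Restricting the search to motif-bearing residues keeps the modification search space small, which is the point of scripting around the search engine rather than allowing the PTM on every K or N.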
Which Quantitation Method? (adapted from Mallick & Kuster NBT 7/10)
Metabolic labeling (e.g. SILAC, 15N)
• Small changes (10-50%) in cell cultures
Peptide labeling (iTRAQ, TMT)
• Moderate changes (50-200%) for multiplex experiments (e.g. time course, dose response)
Label-free using MS1
• Moderate changes (50-200%) for comparing many similar experiments
Label-free using spectral counting
• Large changes (>100%) for comparing many similar experiments
Single/Multiple Reaction Monitoring (SRM, MRM) with spiked standards (AQUA)
• Absolute quantity in complex sample (e.g. serum)
Peptide Search Engine = Foundation
• Tandem Mass Spectrometer: Agilent, ABI, Bruker, Hitachi, Shimadzu, Thermo, Waters, …
• Peptide Search Engine: SEQUEST®, Mascot®, X!Tandem, PHENYX®, OMSSA, ProteinPilot™, ProteinProspector, SpectrumMill, InspecT, …
• Peptide Inference, Protein Inference, …: SCAFFOLD, Trans-Proteomic Pipeline, DTASelect, Census, Proteome Discoverer, …
• PTM (Ascore)
• Quant. (SpecCount)
SEQUEST is a registered trademark of University of Washington. Others are trademarks of their respective owners.
MASCOT
Search Engine and Protein ID Application
• Relatively simple but extensively tuned scoring model based on PMF
  – Unpublished model, but appears to be Ionscore = Max(#Matched m/z’s) x Factor [Factor commonly ~11]
  – Ideal for TOF data
    • Every peak is “significant”, mass-accurate, with no extra peaks
• Search Engine key steps
  1. From spectrum, derive list of characteristic m/z’s (peaks are ignored in scoring)
  2. Peptide score according to # matched m/z’s vs. theory
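The scoring guess on this slide (matched m/z count times a factor) can be written out as a toy function. To be clear about assumptions: Mascot's real model is unpublished, the function names and the 0.5 Da tolerance are hypothetical, and the "Max" over peak selections in the slide's formula is not modeled here.

```python
def count_matches(observed_mz, theoretical_mz, tol=0.5):
    """Count theoretical m/z values that have an observed m/z within tol."""
    return sum(
        1
        for t in theoretical_mz
        if any(abs(o - t) <= tol for o in observed_mz)
    )


def ion_score(observed_mz, theoretical_mz, factor=11.0, tol=0.5):
    """Slide's approximation: number of matched m/z values times a factor."""
    return count_matches(observed_mz, theoretical_mz, tol) * factor


if __name__ == "__main__":
    # Hypothetical m/z lists for illustration only.
    observed = [175.1, 304.2, 401.3, 500.0]
    theoretical = [175.12, 304.16, 417.25]
    print(ion_score(observed, theoretical, tol=0.1))
```

A count-based score like this works when nearly every peak is meaningful, as the slide says of TOF data, and degrades when a spectrum has many extra peaks that match by chance.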
SEQUEST
Search Engine Only
• Original search engine, circa 1993
  – Cross-correlation score (like cell phones etc.)
  – Compute-intensive, but sensitive
  – Developed for ion trap data
    • Lots of extra peaks, with poor mass accuracy
  – Needs post-processing
    • PeptideProphet, Proteome Discoverer, DTASelect, Scaffold
• Search Engine key steps
  1. Find top 500 DB entries with good overlap with spectrum (peaks are considered)
  2. Calculate cross-correlation etc., and report top hit(s)
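The idea of a cross-correlation score can be shown with a heavily simplified sketch: bin both spectra, take the dot product at zero offset, and subtract the average dot product over nearby offsets as a background estimate. This is not SEQUEST's implementation. The unit-intensity binning, the 1 Da bin width, the ±75 bin lag window, and the exclusion of the zero lag from the background are all assumptions made for illustration, and real preprocessing of intensities is omitted.

```python
def bin_spectrum(mz_values, n_bins, bin_width=1.0):
    """Turn a list of m/z values into a unit-intensity vector of n_bins bins."""
    vec = [0.0] * n_bins
    for mz in mz_values:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] = 1.0
    return vec


def dot_at_lag(observed, theoretical, lag):
    """Dot product of the theoretical vector against observed shifted by lag."""
    n = len(observed)
    total = 0.0
    for i, t in enumerate(theoretical):
        j = i + lag
        if t and 0 <= j < n:
            total += t * observed[j]
    return total


def xcorr(observed, theoretical, max_lag=75):
    """Zero-lag correlation minus the mean correlation over nonzero lags."""
    lags = [lag for lag in range(-max_lag, max_lag + 1) if lag != 0]
    background = sum(dot_at_lag(observed, theoretical, lag) for lag in lags)
    return dot_at_lag(observed, theoretical, 0) - background / len(lags)


if __name__ == "__main__":
    # Hypothetical peak positions for illustration only.
    obs = bin_spectrum([100.2, 200.7, 300.1], 400)
    good = bin_spectrum([100.2, 200.7, 300.1], 400)
    shifted = bin_spectrum([110.2, 210.7, 310.1], 400)
    print(xcorr(obs, good), xcorr(obs, shifted))
```

Subtracting the off-lag background is what makes the score tolerant of spectra with many extra peaks: random overlap shows up at every lag and cancels, while a true match only lines up at lag zero.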
SEQUEST 3G
on Sorcerer
Key Steps:
1. Fast but accurate spectrum matching that uses ppm mass accuracy
2. Complete "on the fly" consideration of chosen PTMs
3. All candidate matches (not just a "short list") get rigorous cross-correlation scores ...
4. ... to enable most sensitive ranking and effective compilation of results
5. Comprehensive XCorrs also allow reliable E-value results for simple interpretation, as well as computation of legacy SEQUEST scores
SORCERER Scripting Platform
MUSE = Modular Utilities for Search Engines
• Re-score engines for search results
  – Ascore phosphorylation site localization
  – Other similarity scores (e.g. binomial scoring used by Mann, Gygi)
  – Post-process and filter for decoys, incorrect mods
• Pre-process spectra for search and re-search
  – Calculate mass shift and re-adjust precursor masses
  – Compute FDR using different formulas
  – Mass-accurate MS3 from mass-accurate MS2
  – Add/subtract 98 Da for phospho searches
• Interactive data query
  – Run what-ifs quickly on complex search results
• Workflow automation
  – Auto archive after search
Difference between ‘Script’ and ‘Program’
e.g. Microsoft Excel Example
• Microsoft Excel is a Program
• Customize platform via Visual Basic scripts
SORCERER Script Example: VersaSearch™ for combined CID/ETD
• Unified search accommodates ETD variations
  – All-CID, all-ETD, alternating, & decision tree
• Peptide terminus mods to account for b vs. c, y vs. z-radical
• Post-processing to define “true” vs. “decoy”
• Post-search re-scoring provides other similarity scores
  – E.g. binomial scores
  – User-defined modules for special PTMs
[Workflow diagram: RAW → Extract peaklist → peaks → SEQUEST 3G Peptide ID → pep hits → Protein Inference → prot hits. Additional steps: Determine CID vs. ETD; Tag True/Decoy w/ MUSE; Peptide FDR w/ terminus mods]
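The terminus-modification trick can be illustrated with fragment masses. A c ion is a b ion plus NH3, and a z-radical ion is a y ion minus NH3 plus one hydrogen, so shifting the two termini by those constant amounts lets a b/y calculation produce c/z values. The sketch below assumes singly charged fragments and standard monoisotopic masses; `fragment_ions` and its `mode` argument are hypothetical names, not part of VersaSearch.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
H_ATOM = 1.00783
H2O = 18.01056
NH3 = 17.02655


def fragment_ions(peptide, mode="CID"):
    """Return (N-terminal, C-terminal) singly charged fragment m/z lists.

    mode "CID" gives b and y ions; mode "ETD" gives c and z-radical ions
    by applying constant shifts at the two termini.
    """
    if mode == "CID":
        n_shift, c_shift = 0.0, 0.0
    elif mode == "ETD":
        n_shift, c_shift = NH3, H_ATOM - NH3
    else:
        raise ValueError(f"unknown mode: {mode}")
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n_term, c_term = [], []
    for i in range(1, len(masses)):
        n_term.append(sum(masses[:i]) + PROTON + n_shift)
        c_term.append(sum(masses[-i:]) + H2O + PROTON + c_shift)
    return n_term, c_term


if __name__ == "__main__":
    pep = "GAK"  # hypothetical peptide for illustration
    print(fragment_ions(pep, "CID"))
    print(fragment_ions(pep, "ETD"))
```

Because the shifts are constant per terminus, one search configuration per fragmentation mode is enough, and the script only has to decide which mode applies to each spectrum.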
SORCERER Script Example: DTASelect and Census
• Great tools for filtering, reporting and quant from Yates Lab
  – Available from Yates lab (fields.scripps.edu)
  – We recommend it for SILAC etc. (commercial offering in development)
• On SORCERER, install scripts and run census_select.muse
  – Runs DTASelect, then Census
  – View DTASelect report directly from Sorcerer Web GUI
  – Runs Census pre-analysis using search results and supplied params/config
    census_select.muse -c ~/custom/census/census_config_silac.xml
Hedge Fund IT Model for Proteomics
Server + Software + Storage + Support + Service + Training
• Specialized IT Platform: Scripting, Visualization
• Informaticians
• Traders/Biologists: Specialist End-Users
SORCERER Enterprise Product + Service
Sorcerer product line
• Sorcerer 2: well-established standard
  – Self-contained tower (or 2U rack unit)
  – Can fit under bench, shared by lab
  – One high-end processing node, with integrated storage
  – Good match for single Orbi or similar
• Sorcerer Enterprise: new HT platform
  – Pre-integrated rack cabinet
  – Goes in server room, shared site-wide
  – Includes multinode blade server with high-capacity storage subsystem
  – Ideal for multi-instrument throughput
Requirements for PTM Data Analysis
• Robust peptide ID workflow
  – Heavy-duty compute servers for hedge-fund-class data crunching
• Scripting platform for prototyping analyses
  – Generally customized for each workflow
• Both expert “Chef” and “Detective” needed
  – Semi-custom detective work is Sage-N’s specialty
For more info, contact David@SageNResearch.com.
We work with top labs on semi-custom workflow scripting for cancer-related PTMs.