Regular or Decaf? Options for Quantitative Biomarker Discovery
Mary F Lopez
Director, BRIMS
MSACL, San Diego, Feb 8, 2011
A Tale of Two Discoveries:
Unbiased Targeted
Metabolic Pathway
Targeted, multiplexed SRM
Rank by ROC, Ratio
Quantitative differential analysis
Clinical samples
Unbiased, quantitative LC-MS/MS
Global differential analysis
List of biomarkers
Rank by ROC, Ratio
Clinical samples
Collaboration with Dr. MingMing Ning, Mass General Hospital and Harvard University
Questions we asked:
1. What proteins may be involved in PFO related strokes?
2. What proteins may differentiate ischemic from hemorrhagic strokes?
Discovery and verification of cardiovascular and stroke biomarkers in blood
Atrial septum
•The prevalence of PFOs in the general population is around 25%, but it is doubled in cryptogenic (unknown cause) stroke patients. These patients are often young and “healthy”.
•If there is a clot traveling into the right side of the heart, it can cross the PFO, enter the left atrium, and travel out of the heart and to the brain causing a stroke.
•This suggests a causal relationship between PFO and cryptogenic stroke.
•Supported by NIH/NINDS (Dr Tom Jacobs), the MGH Cardio-Neurology Division evaluates patients with PFO-related stroke and the therapeutic efficacy of surgical PFO closure and other stroke treatments.
•Venous blood samples from stroke patients are taken before PFO closure (upon admission) and at the 12-month follow-up after PFO closure.
•Biomarkers for PFO-related stroke could be clinically useful.
Number of patients
Sample type Patient
5 PFO pre OP Stroke
8 Patient matched PFO post OP Stroke
PFO and Stroke
When the atrial septum does not close properly, it is called a patent foramen ovale or PFO.
Design and Optimization
• Robust, commercially available nanoflow LC
• Commercially available columns
• Focus on stable spray
• Focus on high reproducibility of peak intensities, CV<8%
Pass 1: Quantification
• Chromatographic alignment
• Uncompromised full scan measurements
• Each sample is measured once – no need for replicates
• Internal peptide standards (normalization)
• Triplicate runs of peptide standards every 12 runs (instrument QC)
• “Top10” data dependent acquisition
• Stringent precursor ion selection criteria
Pass 2: Identification
• Targeted fragmentation by inclusion list
• Relaxed precursor ion selection criteria
• Not all samples measured – subset as determined from SIEVE analysis
• Internal peptide standards
• Marker stratification using multi marker and single ROC AUC (SIEVE 1.3)
• Export to Ingenuity pathway analysis
methods Inclusion list
Unbiased Discovery using LC-MS/MS requires a new approach to make it quantitative
Strategy for label-free LC-MS/MS differential expression analysis
Frame
• Global intensity-based features
• Reconstructed chromatograms
• Significance statistics and annotation filters
Align
• Chromatographic alignment
• Scalable Adaptive Tiled Algorithm
Identify
• SEQUEST or Mascot for protein/peptides
• ChemSpider for small molecules
LC setup for Two-Pass Workflow
• Controlled trapping flow rates ensure consistent sample retention and salt removal.
• Rapid column equilibration allows for enhanced duty cycle.
• Hydrophobicity differences from trapping column to resolving column allow for effective refocusing.
• Larger resolving column allows for higher capacity, and rapid application of gradient to the column (flow rates to 1.0 µL/min).
Waste tubing
HV in(from source)
5cm trap column
25cm resolving column
From pump/autosampler
To Orbitrap Velos
Data Quality – Spray Stability
March 30 April 4
Spray stability is the largest factor in reproducible measurements.
* Get it on BRIMS
Method for Assessing Systematic Errors without Sample Technical Replicates
• Systematic errors are assessed from triplicate acquisitions of standard sample.
• Internal standards are spiked in all samples.
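The two ideas above (triplicate standard runs for instrument QC, spiked internal standards for normalization) can be sketched in a few lines. This is only an illustration, not the SIEVE implementation: the function names, the intensity values and the single-standard scaling rule are all assumptions made for the example.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0

def normalize(sample_intensity, internal_std_intensity, reference_std_intensity):
    """Rescale a sample peak by the drift of its spiked internal standard.

    If the internal standard reads lower than the reference value, the sample
    peak is scaled up by the same factor, and vice versa (assumed simple rule).
    """
    return sample_intensity * reference_std_intensity / internal_std_intensity

# Hypothetical peak intensities of one standard peptide in a triplicate QC block.
qc_triplicate = [1.00e6, 1.05e6, 0.97e6]
cv = percent_cv(qc_triplicate)
print(f"QC block CV = {cv:.1f}% (target from the slides: below 8%)")

# Hypothetical patient sample: the internal standard read 10% low in this run.
corrected = normalize(2.0e6, internal_std_intensity=0.9e6, reference_std_intensity=1.0e6)
print(f"Normalized intensity = {corrected:.3e}")
```

A QC block whose CV exceeds the target would flag the surrounding patient runs for review before any differential analysis.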
Pass 1 Acquisition cycle (figure)
• Blank run
• Standards calibration (three runs)
• Column regeneration: Top 10 fragmentation
• Patient samples: full scan only (six runs shown)
The Two-Pass workflow increases sensitivity by effectively fractionating samples in silico
• Typical MS acquisition parameters are not geared for quantification.
• Data dependent acquisition triggers MS2 based on intensity, so most low abundance biomarkers are not identified in complex mixtures with a large dynamic range, e.g., blood.
• Classical “shotgun” approaches focus on physical sample fractionation strategies such as depletion and cation exchange coupled with data dependent acquisition.
• Physical fractionation such as depletion and cation exchange results in loss of albumin binding proteins and multiple runs for each sample.
• These approaches are very labor intensive, time consuming and typically do not allow for rigorous quantification and statistical power because fewer samples are analyzed due to time and instrument constraints.
• The Two-Pass Workflow using Inclusion Lists optimizes parameters for full scan quantification and MS2 triggering separately.
• This results in:
Higher sensitivity and deeper coverage of the proteome, i.e., more IDs
Precise and reproducible quantification
Flexibility in creating the inclusion list based upon desired attributes such as differential expression, PTMs or other parameters
Fewer replicates needed, since LC reproducibility is high and %CVs are low (ca. 8%)
Greater biological sampling power (more samples can be run in a shorter time)
Less of the circular biomarker identification syndrome, i.e., “we identified albumin AGAIN”
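As a rough sketch of how a Pass 2 inclusion list might be derived from Pass 1 results: keep unidentified, differentially expressed features and emit an m/z value with a retention time window for each. The `Frame` fields, the thresholds and the window width are hypothetical choices for illustration, not SIEVE's export format or the parameters used in the study.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    mz: float         # precursor m/z of the quantified feature
    rt_min: float     # retention time at peak apex, in minutes
    ratio: float      # expression ratio between the two sample groups
    p_value: float    # significance of the difference
    identified: bool  # True if a peptide ID already exists for this feature

def build_inclusion_list(frames, min_fold=1.5, max_p=0.05, rt_window=2.0, max_entries=500):
    """Return (m/z, RT start, RT end) tuples for features worth targeting in Pass 2."""
    picked = []
    for f in frames:
        # Treat up- and down-regulation symmetrically: 0.5 and 2.0 are both 2-fold.
        fold = max(f.ratio, 1.0 / f.ratio) if f.ratio > 0 else float("inf")
        if not f.identified and fold >= min_fold and f.p_value <= max_p:
            picked.append(f)
    picked.sort(key=lambda f: f.p_value)  # most significant features first
    half = rt_window / 2.0
    return [(round(f.mz, 4), f.rt_min - half, f.rt_min + half) for f in picked[:max_entries]]

# Hypothetical Pass 1 features.
frames = [
    Frame(523.2841, 30.0, 0.45, 0.001, False),   # changed, unidentified: targeted
    Frame(611.3000, 42.0, 1.10, 0.400, False),   # unchanged: skipped
    Frame(702.3500, 18.5, 2.40, 0.0005, True),   # already identified: skipped
    Frame(455.7000, 25.0, 3.00, 0.010, False),   # changed, unidentified: targeted
]
inclusion = build_inclusion_list(frames)
for mz, start, end in inclusion:
    print(f"{mz:.4f}  {start:.1f}-{end:.1f} min")
```

Filtering on other attributes (for example a suspected PTM mass shift) would only change the condition inside the loop.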
Quantitative Statistics for the Two-Pass Workflow
[Figure: unique peptides identified by data dependent “Top 10” and by inclusion lists 1, 2 and 3; values shown: 2076, 498, 461, 540]
Data Dependent “Top 10” vs Inclusion list
Ion Score vs Concentration of Spiked Standard Peptide in Plasma
[Figure: Mascot Ion Score (0 to 300) vs Concentration (amol, 1 to 10000, log scale); series: Top 10, Two-Pass]
Experiment for the PFO stroke study
Sample groups were identified in SIEVE at the beginning of the analysis
Number of patients
Sample type Patient
5 PFO pre OP Stroke
8 Patient matched PFO post OP Stroke
Data demonstrated high reproducibility and robustness of measurements
Reconstructed ion chromatogram of an example frame (not differentially expressed)
Whisker plot of expression ratios for all 13 peptides identified for protein gi119372317
Gray area represents 90% confidence interval for expected protein ratio
3575 unique peptides and 263 proteins were identified in the study with high confidence
128 were differentially expressed (determined by ratio)
ROC* analysis: How can we quickly rank the potential “usefulness” of putative biomarkers for clinical research?
Why? Expression ratio and p-value may not necessarily be specific to the pathology.
How can we query the data and test the classification power of the target analytes?
• Create ROC curves by plotting the true positive rate against the false positive rate while adjusting the criteria threshold. The area under the curve (AUC) is a measurement of classification power.
• Use AUC to select optimal candidates and discard suboptimal candidates.
• AUC values for a useful marker range from 0.5 (no better than chance) to 1.0. An AUC of 1.0 indicates a specificity and sensitivity of 100%.
• We have developed a multi marker ROC algorithm that calculates errors
*Receiver Operating Characteristic (a classification model)
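The threshold sweep described above can be written out directly. The sketch below is a generic textbook ROC/AUC calculation with the trapezoid rule, not the ROC Station or SIEVE code, and the marker values and labels are made up for the example.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold from high to low and collect (FPR, TPR) points.

    labels: 1 for the positive group (e.g. pre OP), 0 for the negative group.
    A sample is called positive when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical marker intensities for six samples.
scores = [0.90, 0.80, 0.70, 0.40, 0.35, 0.30]
labels = [1, 1, 0, 1, 0, 0]
marker_auc = auc(roc_points(scores, labels))
print(f"AUC = {marker_auc:.3f}")
```

For a marker that is lower in the positive group, this calculation returns a value below 0.5; reporting `max(a, 1 - a)` puts such markers on the same 0.5 to 1.0 scale.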
[Figure: ROC curve, Sensitivity vs Specificity]
ROC Station* algorithm calculates AUC
• Single and multiple markers
• Calculates errors
• Simulates data
* Get it on BRIMS
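The ROC Station algorithm itself is not described in these slides, so the following is only a generic stand-in for the three listed capabilities: it scores two markers together (equal-weight sum of z-scores, an assumed combination rule), and it estimates an error on the AUC by bootstrap resampling, which is one common way to simulate data. All values are hypothetical.

```python
import random
from statistics import mean, stdev

def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def zscores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def combine(marker_a, marker_b):
    """Assumed multi-marker score: equal-weight sum of per-marker z-scores."""
    return [a + b for a, b in zip(zscores(marker_a), zscores(marker_b))]

def bootstrap_auc_error(scores, labels, n_boot=1000, seed=0):
    """Standard deviation of the AUC over bootstrap resamples of the patients."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # a resample needs both classes to define an AUC
            estimates.append(auc_pairwise([scores[i] for i in sample], ys))
    return stdev(estimates)

# Hypothetical levels of two markers in six patients (1 = group A, 0 = group B).
marker_a = [5.1, 4.1, 5.5, 3.9, 4.2, 4.0]
marker_b = [2.0, 2.6, 1.6, 1.5, 2.1, 1.4]
labels = [1, 1, 1, 0, 0, 0]

combo = combine(marker_a, marker_b)
for name, s in [("A", marker_a), ("B", marker_b), ("A+B", combo)]:
    print(f"{name}: AUC = {auc_pairwise(s, labels):.2f} +/- {bootstrap_auc_error(s, labels):.2f}")
```

With so few samples the combined score separates the groups perfectly, which is exactly the situation where a held-out verification set matters before trusting the number.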
Description | Peptides | Ratio* (Pre OP vs Post OP) | % standard error | StdDev (Pre OP vs Post OP) | P-value (Pre OP vs Post OP) | Avg ROC AUC
_gi_4503635_ref_NP_000497.1_ prothrombin preproprotein [Homo sapiens] 4 0.55 16.57 0.09 9.9E-20 1.00
_gi_261878616_ref_NP_001159907.1_ inter_alpha_trypsin inhibitor heavy chain H1 isoform c [Homo sapiens] 5 0.48 19.94 0.10 9.9E-20 1.00
_gi_283806712_ref_NP_001164609.1_ clusterin isoform 3 [Homo sapiens] 6 0.53 16.32 0.09 9.9E-20 0.99
_gi_70778918_ref_NP_002207.2_ inter_alpha_trypsin inhibitor heavy chain H2 [Homo sapiens] 16 0.51 9.27 0.05 9.9E-20 0.99
_gi_32483410_ref_NP_000574.2_ vitamin D_binding protein precursor [Homo sapiens] 7 0.45 19.76 0.09 9.9E-20 0.99
_gi_41393602_ref_NP_958850.1_ complement C1s subcomponent precursor [Homo sapiens] 3 0.56 18.32 0.10 9.9E-20 0.99
_gi_4502261_ref_NP_000479.1_ antithrombin_III precursor [Homo sapiens] 12 0.31 13.57 0.04 9.9E-20 0.98
_gi_31542984_ref_NP_002209.2_ inter_alpha_trypsin inhibitor heavy chain H4 isoform 1 precursor [Homo sapiens] 19 0.45 13.00 0.06 9.9E-20 0.97
_gi_50659080_ref_NP_001076.2_ alpha_1_antichymotrypsin precursor [Homo sapiens] 11 0.49 12.94 0.06 9.9E-20 0.96
_gi_239752152_ref_XP_002348153.1_ PREDICTED: hypothetical protein XP_002348153 [Homo sapiens] 3 0.56 16.58 0.09 9.9E-20 0.96
_gi_73858570_ref_NP_001027466.1_ plasma protease C1 inhibitor precursor [Homo sapiens] 9 0.57 11.69 0.07 9.9E-20 0.96
_gi_38016947_ref_NP_001726.2_ complement C5 preproprotein [Homo sapiens] 8 0.60 16.35 0.10 9.9E-20 0.96
_gi_4557321_ref_NP_000030.1_ apolipoprotein A_I preproprotein [Homo sapiens] 13 0.54 10.15 0.05 9.9E-20 0.96
_gi_62739186_ref_NP_000177.2_ complement factor H isoform a precursor [Homo sapiens] 4 0.60 19.42 0.12 9.9E-20 0.95
_gi_4557871_ref_NP_001054.1_ serotransferrin precursor [Homo sapiens] 16 0.21 13.61 0.03 9.9E-20 0.95
_gi_4557485_ref_NP_000087.1_ ceruloplasmin precursor [Homo sapiens] 22 0.37 13.15 0.05 9.9E-20 0.95
_gi_296080754_ref_NP_001171670.1_ fibrinogen beta chain isoform 2 preproprotein [Homo sapiens] 18 0.21 14.15 0.03 9.9E-20 0.95
_gi_70906437_ref_NP_000500.2_ fibrinogen gamma chain isoform gamma_A precursor [Homo sapiens] 16 0.54 9.85 0.05 9.9E-20 0.94
_gi_169214179_ref_XP_001724196.1_ PREDICTED: similar to complement component 3 [Homo sapiens] 12 0.49 14.89 0.07 9.9E-20 0.94
_gi_4557325_ref_NP_000032.1_ apolipoprotein E precursor [Homo sapiens] 9 0.45 18.63 0.08 9.9E-20 0.94
Top 21 single proteins with highest ROC AUC for PFO Stroke Study
* Ratio = PRE OP/POST OP
Top 2 ROC AUC candidates, selected literature references
Clin Chim Acta. 2009 Apr;402(1-2):160-3. Inter-alpha-trypsin inhibitor heavy chain 4 is a novel marker of acute ischemic stroke. Kashyap RS, Nayak AR, Deshpande PS, Kabra D, Purohit HJ, Taori GM, Daginawala HF. Biochemistry Research Laboratory, Central India Institute of Medical Sciences, 88/2 Bajaj Nagar Nagpur-10, India.
Stroke. 2007 Jul;38(7):2070-3. Epub 2007 May 24. Prothrombotic mutations as risk factors for cryptogenic ischemic cerebrovascular events in young subjects with patent foramen ovale. Botto N, Spadoni I, Giusti S, Ait-Ali L, Sicari R, Andreassi MG. CNR Institute of Clinical Physiology, G. Pasquinucci Hospital, Massa, Italy.
Description | Avg ROC AUC
_gi_4503635_ref_NP_000497.1_ prothrombin preproprotein [Homo sapiens] 1.00
_gi_261878616_ref_NP_001159907.1_ inter_alpha_trypsin inhibitor heavy chain H1 isoform c [Homo sapiens] 1.00
Biological context? Ingenuity Pathways Analysis (IPA)
Top network: Lipid Metabolism
Top disease: Neurological Disease
Top physiological system development and function: Hematological system
Top Canonical pathways
• Acute phase signaling
• Coagulation system
• Complement system
• Intrinsic Prothrombin Pathway
• Extrinsic Prothrombin Pathway
The entire PFO stroke dataset was uploaded and analyzed with IPA
Targeted
Targeted discovery of stroke biomarkers using a multiplexed assay for a panel of apolipoproteins
Number of patients | Blood collection time | Sample type
53 | Upon admission | Ischemic stroke
26 | Upon admission | Hemorrhagic stroke
Apolipoproteins and stroke
The relative levels of various apolipoproteins can be important biomarkers for heart disease, stroke, Alzheimer’s, diabetes and metabolic syndrome.
Typically, these proteins are individually measured in blood by immunoassay.
The availability of a multiplexed assay that could simultaneously and quantitatively measure a panel of apolipoproteins would be an extremely useful clinical research tool.
We decided to interrogate clinical samples to see if apolipoproteins could be used to classify different types of strokes.
Clinical Samples
Ischemic vs hemorrhagic stroke
• About 80 percent of strokes are ischemic, caused by a blockage of the vessels that supply blood to the brain. More than 400,000 people in the United States every year are affected.
• About 20 percent of all strokes are hemorrhagic; this type of stroke involves the rupture of a blood vessel in or around the brain.
• TPA is the only treatment for ischemic stroke. It can only be given within 6 hrs of the event.
• If TPA is given to a hemorrhagic stroke patient, death can result.
• An assay that could accurately differentiate ischemic from hemorrhagic stroke quickly would be clinically useful.
Diagnosis for acute stroke is currently by:
• Neurological exam
• CAT scan
• MRI
• Lumbar puncture
SRM assay development is automated and efficient
List of Targeted Proteins
Discovery data: Protein Discoverer, SIEVE, Peptide Atlas, NIST, GPM
Recombinant Protein
Heavy-Labeled Peptides
QC Standards
Exhaustive List: - Peptides - Transitions
Identify and Verify: - Best Peptides - Best Transitions
Refine Transition List
Optimize LC Gradient
Verify the LC-SRM Assay with Recombinant Digests
Analyze Biological Samples
Pinpoint
Pinpoint Algorithmic prediction
Single day development of a multiplexed assay for a panel of apolipoproteins
1 Import protein sequences and prior LC-MS/MS discovery data library for 10 Apolipoproteins
2 Choose optimal “proteotypic” peptides, i.e., highest intensity and unique. Narrow list down to one peptide per protein
3 Choose at least 5 fragment transitions per peptide. This ensures accurate identification of peptides. Create method and run sample triplicates.
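To make the transition step concrete, the sketch below computes the precursor m/z and singly charged y-ion m/z values for a peptide from standard monoisotopic residue masses and lists five candidate transitions. It does not reproduce Pinpoint: the peptide `SAMPLER` is a made-up sequence, cysteine is treated as unmodified, and picking the five largest y ions is an assumed placeholder for ranking by observed intensity in discovery data.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def precursor_mz(peptide, charge=2):
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge

def y_ions(peptide):
    """Singly charged y ions y1..y(n-1): C-terminal fragments plus water plus one proton."""
    ions = []
    for i in range(1, len(peptide)):
        fragment = peptide[-i:]
        ions.append((f"y{i}", sum(MONO[aa] for aa in fragment) + WATER + PROTON))
    return ions

def candidate_transitions(peptide, n=5, charge=2):
    """Pair the precursor m/z (Q1) with the n largest y ions (Q3). Assumed heuristic."""
    q1 = precursor_mz(peptide, charge)
    largest = sorted(y_ions(peptide), key=lambda ion: ion[1], reverse=True)[:n]
    return [(round(q1, 4), name, round(mz, 4)) for name, mz in largest]

transitions = candidate_transitions("SAMPLER")  # hypothetical tryptic peptide
for q1, name, q3 in transitions:
    print(f"Q1 {q1:.4f} -> {name} {q3:.4f}")
```

Monitoring several fragments of the same precursor is what gives the assay its specificity: co-eluting transitions with consistent relative intensities are hard to produce by chance.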
ROC analysis of apolipoprotein levels in hemorrhagic vs ischemic stroke patients: Single marker AUC
Top AUC for single marker
Apo CIII 0.85 +/- 0.05
1. Apolipoprotein Panel: Apo AI, Apo AII, Apo AIV, Apo B, Apo CI, Apo CII, Apo CIII, Apo D, Apo E, Apo H
ROC analysis of apolipoprotein levels in hemorrhagic vs ischemic stroke patients: Multi marker AUC
Top AUC for multi markers
Apo CIII and Apo AI 1.0 +/- 0
Apo CIII and Apo CI 1.0 +/- 0
Parathyroid hormone (84 aa) is an example of a protein that has several clinically relevant variants in vivo
[Figure: mass spectra, Relative Intensity vs m/z. Labeled variants in the m/z 4200 to 6200 range: 37-84, 38-84, 38-77, 34-84, 28-84, 48-84, 45-84, 34-77, 37-77. Intact 1-84 in the m/z 9325 to 9525 range (spectra at 3X).]
Renal failure samples
Therefore, protein biomarker discovery must include biomarker characterization in a variety of bona fide clinical samples
Summary
• Unbiased Discovery workflows must include quantification and a significant sample N to encompass biological variability.
• Targeted discovery using SRM can be a shortcut if relevant metabolic pathways are known.
• Characterization of protein biomarker heterogeneity and isoforms is a necessary part of discovery workflows.
Targeted
Unbiased
Acknowledgements
BRIMS TEAM
Mary Lopez, Director
David Sarracino, Manager, Biomarker Workflows
Bryan Krastins, Biomarker Scientist
Amol Prakash, Bioinformatic Scientist
Michael Athanas, Software Consultant
Jennifer Sutton, Manager, Biomarker Research
Thermo Fisher: Scott Peterman, Amy Zumwalt, Andreas Huhmer, Bernard Delanghe
IBI, ASU Biodesign Institute: Randall Nelson, Dobrin Nedelkov, Paul Oran, Chad Borges
Mass General Hospital, Harvard U.: MingMing Ning, Ferdinando S Buonanno, Eng H Lo
Mayo Clinic: Ravinder Singh, David Barnidge