Modeling Biological Systems and Analyzing Large-Scale Data Sets
Ilya Shmulevich
TCGA Data Types
TCGA Research Network
Heterogeneous data
Clinical variables contributing to tumor aggressiveness
(values listed from less aggressive to more aggressive)
• Distant metastasis: M0 (no), M1 (yes)
• Tumor stage: early (I-II), late (III-IV)
• Fraction of lymph nodes positive by H&E: 0 – 100 %
• Lymphatic invasion present: no, yes
• Vascular invasion present: no, yes
• Histological type: mucinous, non-mucinous
Nature 487, 330-337, 2012.
Vesteinn Thorsson
FBXW7
Vesteinn Thorsson, Dick Kreisberg
The Regulome Explorer is an interactive web application that allows the user to explore multivariate relationships in data.
Richard Kreisberg, Jake Lin, Timo Erkkilä, Sheila Reynolds
explorer.cancerregulome.org
RF-ACE is a multivariate statistical inference method based on ensembles of decision trees. It seeks to uncover significant associations between features in the input data matrix.
Timo Erkkilä, Sheila Reynolds, Kari Torkkola
RF-ACE has high predictive power and is resistant to over-fitting.
Computational challenges:
• mixed data types: continuous, discrete, and categorical
• tens of thousands of features × tens or hundreds of samples
• non-linear, noisy, and multivariate relationships
• correlated features
• missing data
http://code.google.com/p/rf-ace/
Timo Erkkilä
RF-ACE features:
• handles mixed variable types
• does not require imputation of missing values
• random subsampling rather than combinatorial search
• statistical testing removes redundant features
• “importance” p-value for each candidate predictor
• fast, portable implementation in C++
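The idea of an "importance" p-value for each candidate predictor can be illustrated with a small permutation test. This is only a sketch under stated assumptions, not RF-ACE itself: the score used here is an absolute Pearson correlation standing in for a tree-ensemble importance, the function names `abs_corr` and `contrast_p_value` are hypothetical, and missing values (written as `None`) are simply skipped per feature rather than imputed.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation.

    A stand-in score for illustration only; it is NOT the tree-based
    importance that RF-ACE computes.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def contrast_p_value(feature, target, n_perm=200, seed=0):
    """Empirical p-value of a feature's score against shuffled copies of itself.

    Samples where either value is None are dropped (no imputation).
    """
    rng = random.Random(seed)
    pairs = [(f, t) for f, t in zip(feature, target)
             if f is not None and t is not None]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    observed = abs_corr(xs, ys)
    hits = 0
    for _ in range(n_perm):
        shuffled = xs[:]          # a permuted copy carries no real signal
        rng.shuffle(shuffled)
        if abs_corr(shuffled, ys) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A feature whose score is rarely matched by its shuffled copies receives a small p-value; a feature that behaves like its own shuffles does not. Random permutations replace any exhaustive search over feature combinations, which is what keeps this kind of test affordable when there are tens of thousands of features.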
Google I/O keynote presentation, June 27, 2012
600,000 cores
A multilevel pan-cancer view: from genes to hallmarks
Theo Knijnenburg
Mutational investment
Billions of Associations!
Motivating questions
• Repurposing
  – Which existing cancer drugs may be therapeutic in which other cancers?
  – Which inhibitors with no current cancer indications may be therapeutic in certain cancers?
• Opportunity
  – TCGA primary tumor data may serve as the basis for guided investigation of these open questions
Guiding principle
• The direct protein target for most inhibitors is not the sensitizing aberrated protein itself
  – e.g., AKT1 inhibitors are most effective against cell lines with PTEN mutations
Song et al. (2012)
Proof of concept: Associations between drug targets (e.g., AKT1) and sensitizing aberrations (e.g., PTEN) are also evident in TCGA
PTEN mutations in UCEC
AKT1 protein expression related to PTEN mutation in UCEC
[Figure: AKT1 RPPA protein expression vs. PTEN mutation status. Sources: genespot.org, cancerregulome.org]
Association
drug target : sensitizing aberration pairs
Approach
• Create a large heterogeneous graph of associations from TCGA data, literature, databases, …
  – [Billions of edges, terabytes of data]
• Query on Cray YarcData uRiKA graph analytics appliance
  – No locality of reference, graphs hard to partition
  – [Minutes rather than hours per query]
• Identify aberrated gene → target → drug relationships for drugs with and without known efficacy in cancer
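The aberrated gene → target → drug step amounts to a two-hop traversal of the association graph. A minimal in-memory sketch follows, assuming a plain list of (subject, relation, object) edges; the relation names `associated_with` and `targeted_by` and both function names are hypothetical, and the four edges are taken from examples elsewhere in these slides. It does not represent how the uRiKA appliance stores or queries the graph.

```python
from collections import defaultdict

# Illustrative edges only, drawn from examples in the slides.
# Relation names are hypothetical labels chosen for this sketch.
EDGES = [
    ("PTEN mutation", "associated_with", "PIK3R1"),
    ("PIK3R1", "targeted_by", "Wortmannin"),
    ("TP53 mutation", "associated_with", "ABCG2"),
    ("ABCG2", "targeted_by", "Nelfinavir"),
]


def build_index(edges):
    """Index edges by (subject, relation) for constant-time neighbor lookup."""
    index = defaultdict(list)
    for subject, relation, obj in edges:
        index[(subject, relation)].append(obj)
    return index


def candidate_drugs(index, aberration):
    """Follow aberration -> target -> drug and return every complete path."""
    paths = []
    for target in index.get((aberration, "associated_with"), []):
        for drug in index.get((target, "targeted_by"), []):
            paths.append((aberration, target, drug))
    return paths


if __name__ == "__main__":
    idx = build_index(EDGES)
    for path in candidate_drugs(idx, "TP53 mutation"):
        print(" -> ".join(path))
```

Each hop touches an unrelated region of the graph, which is one way to see why such queries have little locality of reference and why the graph is hard to partition.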
Genomic aberration → Synthetic lethal protein targets → Candidate compounds
Genomic aberration: TP53 mutation
Synthetic lethal protein targets: …, ATR, CHEK1, PAK3, PLK1, SGK2, …, WEE1
Integrating multiple data sources into a (big) graph
Genomic aberrations, therapeutic targets, candidate inhibitors
RNAi
Graph Data Model: Resource Description Framework (RDF)
[Figure: example RDF graph. The labels below are recoverable from the diagram; the edges connecting them are not.]
Resource nodes: <http://www.systemsbiology.net/tp53y_n_somatic>, <http://www.systemsbiology.net/tp53>, <http://www.systemsbiology.net/brca2>, <http://www.systemsbiology.net/biotin>, <http://www.systemsbiology.net/Drug>, <http://www.systemsbiology.net/gata3gexp>
Blank nodes: _:blankGeneNMD, _:blankDrugGeneNMD, _:blankPairwise
Literals: gnab, gexp, brca_05nov, 0.25151, 1.1628, -0.511
Predicates: <http://www.systemsbiology.net/feature#label>, <http://www.systemsbiology.net/feature#source>, <http://www.systemsbiology.net/nmd#term1>, <http://www.systemsbiology.net/nmd#term2>, <http://www.systemsbiology.net/nmd#nmd>, <http://www.systemsbiology.net/nmd#combocount>, <http://www.systemsbiology.net/association#dataset>, <http://www.systemsbiology.net/association#feature1>, <http://www.systemsbiology.net/association#feature2>, <http://www.systemsbiology.net/association#correlation>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
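To make the graph data model concrete, here is a small sketch that writes one pairwise association as RDF triples in N-Triples syntax. The predicate IRIs come from the diagram; which subject and object each predicate connects is an assumption, since the diagram's edges did not survive extraction. The helper names (`iri`, `literal`, `pairwise_association`, `to_ntriples`) are hypothetical, and numeric values are written as plain string literals for simplicity.

```python
BASE = "http://www.systemsbiology.net/"


def iri(path):
    """Wrap a path under the base namespace as an N-Triples IRI."""
    return f"<{BASE}{path}>"


def literal(value):
    """Render a value as a plain (untyped) N-Triples string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def pairwise_association(node_id, feature1, feature2, correlation, dataset):
    """Triples describing one association, hung off a blank node.

    ASSUMPTION: this subject/object wiring is one plausible reading of the
    diagram, not a confirmed description of the original data.
    """
    subject = f"_:{node_id}"
    return [
        (subject, iri("association#feature1"), iri(feature1)),
        (subject, iri("association#feature2"), iri(feature2)),
        (subject, iri("association#correlation"), literal(correlation)),
        (subject, iri("association#dataset"), literal(dataset)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # The pairing of these two features with this correlation is assumed.
    triples = pairwise_association(
        "blankPairwise", "tp53y_n_somatic", "gata3gexp", -0.511, "brca_05nov"
    )
    print(to_ntriples(triples))
```

Using a blank node per association lets each edge carry its own attributes (dataset, correlation) without requiring a globally unique name for every pairwise result.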
Example SPARQL Query
Seed Gene List
Associated Genes
Small Molecules
Cancer Type
Literature
TCGA
Database
Literature
cancer.gov approved drugs
Example Result: PTEN associations in UCEC
Genomic aberration: PTEN
Candidate targets: ASRGL1, ESR1, GLYATL2, PLIN3, HADH, NT5E, PIK3R3, GABRE, PGR, FBP1, SMPD3, GRIN1, PIK3R1, RARG, AADAT, CACNA2D2, SST, SRD5A1, B4GALT1, ADRA1B, KCNJ12, RYR1, SLC6A14, RETSAT, FAAH, SRR, NQO1, CEACAM1, KCNK6, ACADS, CRAT, ELOVL4, FOLH1, ALDH1A3, SORD, ASS1, NADSYN1, PRNP, NDUFA11, KCNH2, CPS1, SLC22A5, HMGCR, ALDH18A1, PARS2, GLS, B4GALT4, ACACB, SLC38A3, GSR, OAZ3, TCN1, SLC1A1, SMPD4, BHMT2, HSD17B4, GRIK5, GLDC, PPIB, PIPOX, ADA, SCN3B, S100A1, PLG, SLC1A4, CBS, GLRB, ACVR1B, SLC6A2
Candidate inhibitors: Acepromazine, Acitretin, Adapalene, Adenine, Adenosine monophosphate, Adenosine triphosphate, Adinazolam, Alfuzosin, Alitretinoin, Allylestrenol, Alpha-Linolenic Acid, Alprazolam, Alteplase, Aminocaproic Acid, Amiodarone, Amitriptyline, Amoxapine, Amsacrine, Anistreplase, Aprotinin, Arcitumomab, Aripiprazole, Astemizole, Atomoxetine, Atorvastatin, Bepridil, Biotin, Bromazepam, Bromocriptine, Bupropion, Caffeine, Capromab, Carglumic acid, Carmustine, Carvedilol, Chlordiazepoxide, Chlorotrianisene, Chlorpheniramine, Chlorpromazine, Cinolazepam, Cisapride, Clobazam, Clomifene, Clomipramine, Clonazepam, Clorazepate, Clotiazepam, Clozapine, Cocaine, Conjugated Estrogens, Cysteamine, Danazol, Dantrolene, Dapiprazole, Debrisoquin, Desipramine, Desogestrel, Desvenlafaxine, Dexmethylphenidate, Dextroamphetamine, Diazepam, Dicumarol, Dienestrol, Diethylpropion, Diethylstilbestrol, Dipyridamole, Dofetilide, Doxazosin, Doxepin, Drospirenone, Droxidopa, Duloxetine, Dutasteride, Dydrogesterone, Ephedra, Ephedrine, Epinephrine, Ergotamine, Escitalopram, Estazolam, Estradiol, Estramustine, Estriol, Estrone, Estropipate, Ethinyl Estradiol, Ethynodiol Diacetate, Etonogestrel, Felodipine, Finasteride, Fludiazepam, Fluoxymesterone, Flurazepam, Fluticasone Propionate, Fluvastatin, Fulvestrant, Gabapentin, Galsulfase, Ginkgo biloba, Glutathione, Glycine, Guanadrel Sulfate, Guanethidine, Halazepam, Halofantrine, Hydroxocobalamin, Ibutilide, Idursulfase, Imipramine, Isoproterenol, Isradipine, Ketazolam, Labetalol, L-Alanine, L-Arginine, L-Asparagine, L-Aspartic Acid, L-Carnitine, L-Citrulline, L-Cysteine, Levonordefrin, Levonorgestrel, L-Glutamic Acid, L-Histidine, Lindane, Lisdexamfetamine, L-Methionine, Lorazepam, L-Ornithine, Lovastatin, L-Proline, L-Serine, Maprotiline, Mazindol, Medroxyprogesterone, Megestrol, Melatonin, Menadione, Meperidine, Mestranol, Methamphetamine, Methotrimeprazine, Methoxamine, Methylphenidate, Mianserin, Miconazole, Midazolam, Midodrine, Mifepristone, Milnacipran, Modafinil, N-Acetyl-D-glucosamine, NADH, Naloxone, Nefazodone, Nicardipine, Nitrazepam, Nitrendipine, Norelgestromin, Norepinephrine, Norethindrone, Norgestimate, Nortriptyline, Olanzapine, Olopatadine, Orphenadrine, Oxazepam, Paliperidone, Paroxetine, Pentostatin, Pentoxifylline, Pergolide, Phendimetrazine, Phenmetrazine, Phentermine, Phenylephrine, Phosphatidylserine, Pimozide, Pravastatin, Prazepam, Prazosin, Progesterone, Promazine, Propafenone, Propericiazine, Propiomazine, Protriptyline, Pseudoephedrine, Pyridoxine, Quazepam, Quetiapine, Quinestrol, Quinidine, Raloxifene, Reboxetine, Reteplase, Risperidone, Rosuvastatin, S-Adenosylmethionine, Sertindole, Sibutramine, Simvastatin, Sotalol, Streptokinase, Suramin, Tamoxifen, Tamsulosin, Tazarotene, Temazepam, Tenecteplase, Terazosin, Terfenadine, Tetracycline, Thiopental, Thioproperazine, Thioridazine, Toremifene, Tramadol, Tranexamic Acid, Tretinoin, Triazolam, Trilostane, Trimipramine, Urokinase, Venlafaxine, Verapamil, Vitamin A, Xylometazoline, Ziprasidone, Zonisamide
Example Result: PTEN associations in UCEC
Genomic aberrations Candidate targets Candidate inhibitors
PTEN PIK3R1/PIK3CA Wortmannin
[Figure: PIK3R1 gene expression vs. PTEN mutation status]
PDB id 3hhm
Repurposing existing cancer drugs in other cancers
Genomic aberrations Candidate targets Candidate inhibitors
Existing cancer indication Target Cancer Drug A
New cancer indication
Example Result
• TP53 is frequently mutated in most tumor types
• ABCG2, also known as Breast Cancer Resistance Protein (BCRP), is associated with TP53 mutation in TCGA breast cancer data
• Nelfinavir, an HIV protease inhibitor, also binds ABCG2 and many other proteins
• High-throughput cell line screening of breast cancer cells recently identified Nelfinavir as a selective inhibitor. “It can be brought to HER2-breast cancer treatment trials with the same dosage regimen as that used among HIV patients.” [Shim et al. JNCI 2012]
Source: EMBO Rep. 2004 May; 5(5): 470–476.
Source: http://www.sjrcd.org/soilhealth/soilagg.html
Source: http://www.webmd.com
Source: http://www.theregister.co.uk
Understanding behavior of massive multicellular systems: BioCellion
Ductal Carcinoma model:
Nicholas Flann, Utah State Univ.
Acknowledgments
Brady Bernard, Ryan Bressler, Andrea Eakin, Timo Erkkilä, Lisa Iype, Seunghwa Kang, Theo Knijnenburg, Roger Kramer, Richard Kreisberg, Kalle Leinonen, Jake Lin, Yuexin Liu, Michael Miller, Sheila Reynolds, Hector Rovira, Vesteinn Thorsson, Da Yang, Wei Zhang