Modeling Biological Systems and Analyzing Large-Scale Data Sets

Ilya Shmulevich


TCGA Data Types

TCGA Research Network

Heterogeneous data

Clinical variables contributing to tumor aggressiveness (Less Aggressive / More Aggressive)

• Distant Metastasis: M0 = No / M1 = Yes

• Tumor Stage: Early (I-II) / Late (III-IV)

• Fraction Lymph Nodes Positive by H & E: 0 – 100 %

• Lymphatic Invasion Present: No / Yes

• Vascular Invasion Present: No / Yes

• Histological Type: Mucinous / Non-mucinous

Nature, 487, 330-337, 2012.

Vesteinn Thorsson


FBXW7


Vesteinn Thorsson, Dick Kreisberg


http://explorer.cancerregulome.org

Web-based Apps

Regulome Explorer is an interactive web application for exploring multivariate relationships in data.

Richard Kreisberg, Jake Lin, Timo Erkkilä, Sheila Reynolds


RF-ACE is a multivariate statistical inference method based on ensembles of decision trees. It seeks to uncover significant associations between features in the input data matrix.

Timo Erkkilä, Sheila Reynolds, Kari Torkkola

RF-ACE has high predictive power and is resistant to over-fitting.

Computational challenges:

• mixed data types: continuous, discrete, and categorical

• tens of thousands of features x tens or hundreds of samples

• non-linear, noisy, and multivariate relationships

• correlated features

• missing data


http://code.google.com/p/rf-ace/

Timo Erkkilä

RF-ACE features:

• handles mixed variable types

• does not require imputation of missing values

• random subsampling rather than combinatorial search

• statistical testing removes redundant features

• “importance” p-value for each candidate predictor

• fast, portable implementation in C++
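
The idea behind an "importance" p-value can be sketched in a few lines of Python. This is not the RF-ACE implementation: instead of an ensemble of decision trees it scores each feature with a single best threshold split, and it handles only numeric features. What it keeps is the principle of comparing a feature's score against "artificial contrasts" (shuffled copies of the same feature) and skipping missing values rather than imputing them. The names `split_importance` and `contrast_p_value` are hypothetical.

```python
import random


def sum_squared_error(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_importance(feature, target):
    """Fraction of target variance explained by the best single threshold split.

    Stand-in for a tree-ensemble importance score (an assumption of this sketch).
    Samples whose feature value is None (missing) are skipped, not imputed.
    """
    pairs = sorted((x, y) for x, y in zip(feature, target) if x is not None)
    if len(pairs) < 4:
        return 0.0
    ys = [y for _, y in pairs]
    total = sum_squared_error(ys)
    if total == 0:
        return 0.0
    best = 0.0
    for i in range(2, len(pairs) - 1):       # keep at least two samples per side
        if pairs[i - 1][0] == pairs[i][0]:   # cannot split between equal values
            continue
        remaining = sum_squared_error(ys[:i]) + sum_squared_error(ys[i:])
        best = max(best, (total - remaining) / total)
    return best


def contrast_p_value(feature, target, n_contrasts=100, seed=0):
    """Empirical p-value: how often a shuffled copy of the feature scores as well."""
    rng = random.Random(seed)
    observed = split_importance(feature, target)
    contrast = list(feature)
    hits = 0
    for _ in range(n_contrasts):
        rng.shuffle(contrast)                # artificial contrast: same values, no signal
        if split_importance(contrast, target) >= observed:
            hits += 1
    return (hits + 1) / (n_contrasts + 1)
```

Calling `contrast_p_value(feature, target)` for each candidate predictor gives one p-value per feature. A feature whose score is never matched by its shuffled copies receives the smallest possible value, 1 / (n_contrasts + 1).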


Google I/O keynote presentation, June 27, 2012

600,000 cores

A multilevel pan-cancer view: from genes to hallmarks

Theo Knijnenburg

Mutational investment


Billions of Associations!

Motivating questions

• Repurposing

– Which existing cancer drugs may be therapeutic in which other cancers?

– Which inhibitors with no current cancer indications may be therapeutic in certain cancers?

• Opportunity

– TCGA primary tumor data may serve as the basis for guided investigation of these open questions

Guiding principle

• The direct protein target for most inhibitors is not the sensitizing aberrated protein itself

– e.g., AKT1 inhibitors are most effective against cell lines with PTEN mutations

Song et al. (2012)

Proof of concept: Associations between drug targets (e.g., AKT1) and sensitizing aberrations (e.g., PTEN) are also evident in TCGA

PTEN mutations in UCEC

AKT1 protein expression related to PTEN mutation in UCEC

[Figure: AKT1 RPPA protein expression versus PTEN mutation status; genespot.org, cancerregulome.org]

Association

drug target : sensitizing aberration pairs

Approach

• Create large heterogeneous graph of associations from TCGA data, literature, databases, …

– [Billions of edges, terabytes of data]

• Query on Cray YarcData uRiKA graph analytics appliance

– No locality of reference, graphs hard to partition

– [Minutes rather than hours per query]

• Identify aberrated gene → target → drug relationships for drugs with and without known efficacy in cancer
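
The last step, finding aberrated gene → target → drug relationships, amounts to a two-hop path query. A minimal Python sketch on a toy graph follows. The gene, target and drug pairs are taken from examples in this talk, but the two flat edge lists and the names `build_index` and `find_paths` are assumptions for illustration only. The real graph has billions of edges and is queried on a dedicated appliance, not with in-memory dictionaries.

```python
from collections import defaultdict

# Toy edges; the two-table layout is an assumption of this sketch.
ABERRATION_TO_TARGET = [
    ("PTEN", "AKT1"),
    ("PTEN", "PIK3R1"),
    ("TP53", "ABCG2"),
]
TARGET_TO_DRUG = [
    ("PIK3R1", "Wortmannin"),
    ("ABCG2", "Nelfinavir"),
]


def build_index(edges):
    """Map each source node to the list of nodes it points to."""
    index = defaultdict(list)
    for source, destination in edges:
        index[source].append(destination)
    return index


def find_paths(aberration, targets_by_gene, drugs_by_target):
    """Return every (aberrated gene, target, drug) path starting at one aberration."""
    return [
        (aberration, target, drug)
        for target in targets_by_gene.get(aberration, [])
        for drug in drugs_by_target.get(target, [])
    ]


targets_by_gene = build_index(ABERRATION_TO_TARGET)
drugs_by_target = build_index(TARGET_TO_DRUG)

for gene in ("PTEN", "TP53"):
    for path in find_paths(gene, targets_by_gene, drugs_by_target):
        print(" -> ".join(path))
```

Targets with no drug edge in the toy table (here AKT1) yield no path. Separating drugs with and without known efficacy in cancer would need one more attribute per drug node, which this sketch leaves out.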

Genomic Aberration: TP53 mutation

Synthetic lethal protein targets: ATR, CHEK1, PAK3, PLK1, SGK2, WEE1

Candidate compounds

Integrating multiple data sources into a (big) graph

Genomic aberrations, Therapeutic targets, Candidate inhibitors

RNAi

Graph Data Model: Resource Description Framework (RDF)

[Figure: example RDF graph. Labels recoverable from the figure are listed below.]

Nodes:

<http://www.systemsbiology.net/tp53y_n_somatic>

<http://www.systemsbiology.net/tp53>

<http://www.systemsbiology.net/brca2>

<http://www.systemsbiology.net/biotin>

<http://www.systemsbiology.net/Drug>

<http://www.systemsbiology.net/gata3gexp>

_:blankGeneNMD

_:blankDrugGeneNMD

_:blankPairwise

Literal values: gnab, gexp, brca_05nov, 0.25151, 1.1628, -0.511

Edge labels:

<http://www.systemsbiology.net/feature#label>

<http://www.systemsbiology.net/feature#source>

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

<http://www.systemsbiology.net/association#dataset>

<http://www.systemsbiology.net/association#feature1>

<http://www.systemsbiology.net/association#feature2>

<http://www.systemsbiology.net/association#correlation>

<http://www.systemsbiology.net/nmd#term1>

<http://www.systemsbiology.net/nmd#term2>

<http://www.systemsbiology.net/nmd#combocount>

<http://www.systemsbiology.net/nmd#nmd>
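
The RDF model stores everything as (subject, predicate, object) triples, with blank nodes such as `_:blankPairwise` grouping the properties of one association. The Python sketch below shows that structure and a wildcard pattern match over it. The URIs and literal values are the labels from the figure, but how they are wired into triples is an assumption, since the figure's edges did not survive extraction. The helpers `match` and `association_record` are hypothetical.

```python
SB = "http://www.systemsbiology.net/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed wiring of the figure's labels into triples (illustration only).
TRIPLES = [
    ("_:blankPairwise", SB + "association#feature1", SB + "tp53y_n_somatic"),
    ("_:blankPairwise", SB + "association#feature2", SB + "gata3gexp"),
    ("_:blankPairwise", SB + "association#correlation", -0.511),
    ("_:blankPairwise", SB + "association#dataset", "brca_05nov"),
    (SB + "tp53y_n_somatic", SB + "feature#source", "gnab"),
    (SB + "gata3gexp", SB + "feature#source", "gexp"),
    (SB + "biotin", RDF_TYPE, SB + "Drug"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def association_record(triples, node):
    """Collect one blank node's properties, keyed by the predicate's fragment."""
    return {p.split("#")[-1]: o for _, p, o in match(triples, subject=node)}


print(association_record(TRIPLES, "_:blankPairwise"))
print(match(TRIPLES, predicate=RDF_TYPE, obj=SB + "Drug"))
```

A SPARQL query expresses the same kind of pattern declaratively, with variables in place of the `None` wildcards, and joins several patterns to walk from a seed gene to associated genes and on to small molecules.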

Example SPARQL Query

Seed Gene List

Associated Genes

Small Molecules

Cancer Type

Literature

TCGA

Database

Literature

cancer.gov approved drugs

Example Result: PTEN associations in UCEC

Genomic aberrations: PTEN

Candidate targets: ASRGL1, ESR1, GLYATL2, PLIN3, HADH, NT5E, PIK3R3, GABRE, PGR, FBP1, SMPD3, GRIN1, PIK3R1, RARG, AADAT, CACNA2D2, SST, SRD5A1, B4GALT1, ADRA1B, KCNJ12, RYR1, SLC6A14, RETSAT, FAAH, SRR, NQO1, CEACAM1, KCNK6, ACADS, CRAT, ELOVL4, FOLH1, ALDH1A3, SORD, ASS1, NADSYN1, PRNP, NDUFA11, KCNH2, CPS1, SLC22A5, HMGCR, ALDH18A1, PARS2, GLS, B4GALT4, ACACB, SLC38A3, GSR, OAZ3, TCN1, SLC1A1, SMPD4, BHMT2, HSD17B4, GRIK5, GLDC, PPIB, PIPOX, ADA, SCN3B, S100A1, PLG, SLC1A4, CBS, GLRB, ACVR1B, SLC6A2

Candidate inhibitors: Acepromazine, Acitretin, Adapalene, Adenine, Adenosine monophosphate, Adenosine triphosphate, Adinazolam, Alfuzosin, Alitretinoin, Allylestrenol, Alpha-Linolenic Acid, Alprazolam, Alteplase, Aminocaproic Acid, Amiodarone, Amitriptyline, Amoxapine, Amsacrine, Anistreplase, Aprotinin, Arcitumomab, Aripiprazole, Astemizole, Atomoxetine, Atorvastatin, Bepridil, Biotin, Bromazepam, Bromocriptine, Bupropion, Caffeine, Capromab, Carglumic acid, Carmustine, Carvedilol, Chlordiazepoxide, Chlorotrianisene, Chlorpheniramine, Chlorpromazine, Cinolazepam, Cisapride, Clobazam, Clomifene, Clomipramine, Clonazepam, Clorazepate, Clotiazepam, Clozapine, Cocaine, Conjugated Estrogens, Cysteamine, Danazol, Dantrolene, Dapiprazole, Debrisoquin, Desipramine, Desogestrel, Desvenlafaxine, Dexmethylphenidate, Dextroamphetamine, Diazepam, Dicumarol, Dienestrol, Diethylpropion, Diethylstilbestrol, Dipyridamole, Dofetilide, Doxazosin, Doxepin, Drospirenone, Droxidopa, Duloxetine, Dutasteride, Dydrogesterone, Ephedra, Ephedrine, Epinephrine, Ergotamine, Escitalopram, Estazolam, Estradiol, Estramustine, Estriol, Estrone, Estropipate, Ethinyl Estradiol, Ethynodiol Diacetate, Etonogestrel, Felodipine, Finasteride, Fludiazepam, Fluoxymesterone, Flurazepam, Fluticasone Propionate, Fluvastatin, Fulvestrant, Gabapentin, Galsulfase, Ginkgo biloba, Glutathione, Glycine, Guanadrel Sulfate, Guanethidine, Halazepam, Halofantrine, Hydroxocobalamin, Ibutilide, Idursulfase, Imipramine, Isoproterenol, Isradipine, Ketazolam, Labetalol, L-Alanine, L-Arginine, L-Asparagine, L-Aspartic Acid, L-Carnitine, L-Citrulline, L-Cysteine, Levonordefrin, Levonorgestrel, L-Glutamic Acid, L-Histidine, Lindane, Lisdexamfetamine, L-Methionine, Lorazepam, L-Ornithine, Lovastatin, L-Proline, L-Serine, Maprotiline, Mazindol, Medroxyprogesterone, Megestrol, Melatonin, Menadione, Meperidine, Mestranol, Methamphetamine, Methotrimeprazine, Methoxamine, Methylphenidate, Mianserin, Miconazole, Midazolam, Midodrine, Mifepristone, Milnacipran, Modafinil, N-Acetyl-D-glucosamine, NADH, Naloxone, Nefazodone, Nicardipine, Nitrazepam, Nitrendipine, Norelgestromin, Norepinephrine, Norethindrone, Norgestimate, Nortriptyline, Olanzapine, Olopatadine, Orphenadrine, Oxazepam, Paliperidone, Paroxetine, Pentostatin, Pentoxifylline, Pergolide, Phendimetrazine, Phenmetrazine, Phentermine, Phenylephrine, Phosphatidylserine, Pimozide, Pravastatin, Prazepam, Prazosin, Progesterone, Promazine, Propafenone, Propericiazine, Propiomazine, Protriptyline, Pseudoephedrine, Pyridoxine, Quazepam, Quetiapine, Quinestrol, Quinidine, Raloxifene, Reboxetine, Reteplase, Risperidone, Rosuvastatin, S-Adenosylmethionine, Sertindole, Sibutramine, Simvastatin, Sotalol, Streptokinase, Suramin, Tamoxifen, Tamsulosin, Tazarotene, Temazepam, Tenecteplase, Terazosin, Terfenadine, Tetracycline, Thiopental, Thioproperazine, Thioridazine, Toremifene, Tramadol, Tranexamic Acid, Tretinoin, Triazolam, Trilostane, Trimipramine, Urokinase, Venlafaxine, Verapamil, Vitamin A, Xylometazoline, Ziprasidone, Zonisamide

Example Result: PTEN associations in UCEC

Genomic aberrations: PTEN

Candidate targets: PIK3R1/PIK3CA

Candidate inhibitors: Wortmannin

[Figure: PIK3R1 gene expression versus PTEN mutation status]

PDB id 3hhm

Repurposing existing cancer drugs in other cancers

Genomic aberrations Candidate targets Candidate inhibitors

Existing cancer indication Target Cancer Drug A

New cancer indication

Example Result

• TP53 is frequently mutated in most tumor types

• ABCG2, also known as Breast Cancer Resistance Protein (BCRP), is associated with TP53 mutation in TCGA breast cancer data

• Nelfinavir, an HIV protease inhibitor, also binds ABCG2 and many other proteins

• High-throughput cell line screening of breast cancer cells recently identified Nelfinavir as a selective inhibitor. “It can be brought to HER2-breast cancer treatment trials with the same dosage regimen as that used among HIV patients.” [Shim et al. JNCI 2012]

Source: EMBO Rep. 2004 May; 5(5): 470–476.

Source: http://www.sjrcd.org/soilhealth/soilagg.html

Source: http://www.webmd.com

Source: http://www.theregister.co.uk

Understanding behavior of massive multicellular systems: BioCellion

Ductal Carcinoma model:

Nicholas Flann, Utah State Univ.

Acknowledgments

Brady Bernard, Ryan Bressler, Andrea Eakin, Timo Erkkilä, Lisa Iype, Seunghwa Kang, Theo Knijnenburg, Roger Kramer, Richard Kreisberg, Kalle Leinonen, Jake Lin, Yuexin Liu, Michael Miller, Sheila Reynolds, Hector Rovira, Vesteinn Thorsson, Da Yang, Wei Zhang