Psb tutorial cancer_pathways

Translational Genomics Research Institute | www.tgen.org

Cancer Pathway Analysis and Personalized Medicine

Jeff Kiefer, Research Associate Investigator, Translational Genomics Research Institute


Outline

Big Cancer Data Resources and Secondary Data Tools

Pathway Analysis - Resources, Methods, and Tools

Personalized Medicine - ‘Interpretation bottleneck’ Drug to Genomic Event Matching


Cancer Genome Data Repositories

https://www.ebi.ac.uk/arrayexpress/

http://www.ncbi.nlm.nih.gov/geo/

http://cancergenome.nih.gov/

https://icgc.org/



Cancer Genome Data Repositories and Data Portals

https://genome-cancer.ucsc.edu/

http://www.cbioportal.org/public-portal/

http://cancergenome.broadinstitute.org (TumorPortal)

https://dcc.icgc.org/

http://genomeportal.stanford.edu/pan-tcga

http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/







Pathways Analysis

Pathway analysis encompasses a number of different approaches and methods applied to large-scale -omic data sets.

The goal is to discover meaningful biological knowledge from large data sets often in the form of a gene list.

Strictly, a pathway is a step-wise signal transduction cascade. However, the term ‘pathway’ is also used loosely to encompass gene sets derived from expression signatures or from other biological groupings such as Gene Ontology terms.


(2012). PLOS Computational Biology, 8(2), e1002375. doi:10.1371/journal.pcbi.1002375.t001

Pathways Analysis

Good general review outlining techniques, resources, and issues in pathway analysis


Pathways Analysis

Threshold-Based = Enrichment analysis performed on a gene list derived from a statistical test.

Non-threshold Based = All data is used. First popularized by gene set enrichment analysis (GSEA).

‘de-novo’ Based = Pathways or gene sets derived from primary data.
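As an illustration of the threshold-based approach, the sketch below computes an over-representation p-value with a one-sided hypergeometric test, which is one common choice for this kind of enrichment analysis. The function name and the toy numbers are hypothetical and are not taken from any of the tools listed in this tutorial.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """Probability of seeing at least n_overlap pathway genes in the gene list.

    n_background: genes in the background (e.g. all genes measured)
    n_in_set:     background genes that belong to the pathway / gene set
    n_selected:   genes that passed the statistical threshold (the gene list)
    n_overlap:    genes in the list that also belong to the gene set
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy example: 20 background genes, a 5-gene pathway,
# a 5-gene list, and 3 genes shared between the list and the pathway.
p_value = hypergeom_enrichment_p(
    n_background=20, n_in_set=5, n_selected=5, n_overlap=3
)
print(f"over-representation p-value: {p_value:.4f}")
```

In practice the test is repeated for every gene set in a collection, so the resulting p-values would normally be corrected for multiple testing before results are ranked.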


Pathway Resources

http://www.reactome.org/

http://www.genome.jp/kegg/pathway.html

http://www.broadinstitute.org/gsea/msigdb/index.jsp

Commercial Resources

http://www.pathwaycommons.org/about/#main-container



Threshold-based Pathway Enrichment Tools

https://toppgene.cchmc.org/

http://amp.pharm.mssm.edu/Enrichr

http://www.ici.upmc.fr/cluego/


ToppGene: extensive pathway gene sets available for enrichment analysis


Easy to use web interface

Add a list of gene identifiers to perform enrichment analysis on.


Results sorted based on significance.



Results

Gene Set/Pathway Categories


Different Result Outputs


http://www.ici.upmc.fr/cluego/

ClueGO integrates Gene Ontology (GO) terms as well as pathways and creates a functionally organized GO/pathway term network.

[Figure: ClueGO network linking enriched GO/pathway terms to their member genes. Term nodes shown: collagen formation; collagen biosynthesis and modifying enzymes; forebrain development; transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds; regulation of cell development; regulation of cell differentiation; negative regulation of cell differentiation; regulation of Wnt signaling pathway; regulation of canonical Wnt signaling pathway; negative regulation of Wnt signaling pathway; regulation of ion transport; positive regulation of ion transport; intrinsic apoptotic signaling pathway; regulation of intrinsic apoptotic signaling pathway; regulation of cysteine-type endopeptidase activity involved in apoptotic process; p53 signaling pathway; osteoblast differentiation; rheumatoid arthritis; Toll-like receptor signaling pathway.]

Cytoscape App


Non-Threshold Pathway Enrichment Tools

http://www.broadinstitute.org/gsea/index.jsp


GSEA

Can be accessed through a number of resources and methods

Java Desktop, R-GSEA, GenePattern
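To make the non-threshold idea concrete, here is a simplified sketch of a GSEA-style running-sum enrichment score computed over a full ranked gene list. It is only an illustration: the function name, the toy genes, and the scores are hypothetical, and the permutation-based significance and normalization steps of the actual GSEA software are left out.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes: gene identifiers sorted from most up- to most down-regulated
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of gene identifiers to test
    weight:       exponent applied to |score| when a gene-set member is hit
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all gene-set members have a zero ranking score")

    running = 0.0
    best = 0.0
    for is_hit, w in zip(hits, hit_weights):
        # Step up at gene-set members, step down at every other gene.
        running += w / total_hit_weight if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy ranking of six genes.
genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, metric, {"A", "B"})     # set at the top
es_bottom = enrichment_score(genes, metric, {"E", "F"})  # set at the bottom
print(es_top, es_bottom)
```

Because every gene contributes to the running sum, no significance cut-off has to be chosen before the analysis, which is the key difference from the threshold-based tools above.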


GSEA Use Case: Anaplastic Thyroid Cancer vs Non-Tumor Thyroid


GSEA Use Case


GSEA Visualization with Enrichment Map


GSEA Visualization with Enrichment Map

(2010) PLoS ONE, 5(11), 1–12. doi:10.1371/journal.pone.0013984.t001

http://www.baderlab.org/Software/EnrichmentMap

Cytoscape App


EDDY computes the discrepancy between the probability distributions of candidate network structures, based on the likelihood of each network in each class of samples.


The methodology can exploit complex interactions that differ between two conditions, such as tumor vs. normal, and that might be missed by traditional approaches based on differential gene expression.


EDDY (Evaluation of Differential DependencY) investigates differential dependencies between conditions. It computes the differential dependency statistic (JS) and its statistical significance (p-value, via permutation) between conditions, based on the likelihoods of genetic networks (a probability distribution).
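The statistic-plus-permutation idea can be sketched as follows, assuming that JS denotes the Jensen-Shannon divergence between two discrete distributions. This is not the EDDY implementation: EDDY compares distributions over candidate network structures, whereas the hypothetical `class_js` helper below compares the empirical distribution of a single categorical variable per class, purely as a stand-in.

```python
import math
import random


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def class_js(data, labels):
    """Stand-in statistic: JS between per-class distributions of `data` values."""
    categories = sorted(set(data))
    dists = []
    for cls in (1, 2):
        values = [d for d, lab in zip(data, labels) if lab == cls]
        dists.append([values.count(c) / len(values) for c in categories])
    return js_divergence(dists[0], dists[1])


def permutation_p_value(data, labels, statistic, n_perm=200, seed=0):
    """Compare the observed statistic with values under shuffled class labels."""
    rng = random.Random(seed)
    observed = statistic(data, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(data, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: class 1 samples show state "x", class 2 state "y".
toy_data = ["x"] * 10 + ["y"] * 10
toy_labels = [1] * 10 + [2] * 10
js_observed, p_perm = permutation_p_value(toy_data, toy_labels, class_js)
print(js_observed, p_perm)
```

Shuffling the class labels keeps the class sizes fixed while breaking any association between class and data, so the permuted statistics approximate the null distribution.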

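[Figure: EDDY workflow. A gene set of interest is drawn from a gene set database (MSigDB, ...). For Class 1 and Class 2, the likelihood of each possible (or probable) dependency structure among the genes (A, B, C) is estimated, and the two distributions are compared with JS. The resulting dependencies are labeled as Class 1 specific, Class 2 specific, or common.]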



•  GSEA appears under-powered and selects a disproportionate number of gene sets for one subtype (mesenchymal).

•  GSCA appears overly sensitive, with a high false-positive rate.

(#): Overlap with EDDY gene sets

The number of identified subtype-specific gene sets

Simulation II: comparison of EDDY with other methods

Figure 5 illustrates the ROC curves of GSCA, GSEA and EDDY from Simulation II, and Table 1 lists the area under curve values of the corresponding ROC curves in Figure 5.

From the results of the interaction-focused simulation experiments (Figure 5), EDDY demonstrates superior performance than the other two methods. This is partly due to the fact that the data were generated from models assuming conditional dependencies in gene expressions rather than simple linear correlations, which is also assumed by the Bayesian network model that the current implementation of EDDY uses. The performance of EDDY declines with increasing the size of gene sets, but it improves with more computations (by using larger M as shown in Figure 5). Another observation is significantly lower false positive rates of EDDY than that of other methods (Supplementary Figure S10–S12).

This simulation study clearly indicates that EDDY significantly outperforms other methods when differential gene sets are defined in the sense of gene interactions, with significantly lower false-positive rates.

Besides the simulation scenario covered in this study, there can be various different simulation configurations depending on the methods to generate synthetic data sets and the definition of differential gene sets. However, it is not feasible to enumerate and cover all such different cases, and thus, they are left for future studies.

Comparison of EDDY with other methods in application to TCGA GBM gene expression data

Table 2 lists the number of statistically significant gene sets identified with the three different methods for each subtype. EDDY and GSEA produced different results, as EDDY identified 10 to 22 gene sets for each subtype, whereas GSEA identified 245 gene sets for mesenchymal but just a few for other subtypes. Moreover, there is only one common gene set (for mesenchymal) between the results from the two methods. A possible hypothesis of GSEA identifying many gene sets only for mesenchymal is that mesenchymal is the most differentiated form of GBM (physiologically or genotypically) (27), and many genes are differentially expressed in mesenchymal compared with other subtypes. Compared with GSEA, the results of EDDY are relatively less biased to a specific subtype (for the lists of identified gene sets from EDDY and GSEA, see Supplementary Tables S1–S8). Compared with the other two methods, GSCA identified much more gene sets as statistically significant, from 68 to 80% of the tested gene sets, making it almost noninformative (for the lists of identified gene sets from GSCA, see the supplementary file provided in http://biocomputing.tgen.org/software/EDDY). This becomes clearer from Figure 6, where the P-values from GSCA are much closer to 0 in general than that of EDDY and GSEA (for direct comparison of P-values from

Figure 5. The comparison of ROC of GSCA, GSEA and EDDY in identifying differential gene sets from the interaction-focused simulation experiments. (A) v = 10. (B) v = 20. (C) v = 30.

Table 1. The area under curve values of GSCA, GSEA and EDDY in identifying differential gene sets from Simulation II

Method           v = 10    v = 20    v = 30
GSCA             0.5774    0.5822    0.5965
GSEA             0.4911    0.5574    0.6075
EDDY (M=1000)    0.7440    0.6768    0.6704
EDDY (M=5000)    0.8287    0.7580    0.7064

Top performance in each column: EDDY (M=5000).

Table 2. The number of statistically significant gene sets for each subtype

Method    Classical    Mesenchymal    Neural       Proneural
EDDY      13           10             22           22
GSEA      1 (0)        245 (1)        6 (0)        3 (0)
GSCA      1590 (11)    1432 (7)       1681 (21)    1563 (17)

The number of common cases with EDDY is indicated in the parentheses.

Nucleic Acids Research, 2014


Two Pathways Identified with EDDY Enriched in Proneural Glioblastoma Phenotype

EDDY identified the G2 pathway and p53 pathway gene sets as having differential dependencies, which are related to the enrichment of p53 mutations in the proneural subtype. Heat maps show that the genes in these pathways are not differentially expressed, so they would not be identified by GSEA.


PARADIGM

Nature, Vol. 507, March 20, 2014

MEMo

https://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Ciriello.pdf

Both methods employ multiple genomic data types to identify altered pathways.

Employed in TCGA studies



Personalized Medicine ‘Interpretation Bottleneck’

Drug Target Annotation


Personalized Medicine Pipeline

Good, B. M., Ainscough, B. J., McMichael, J. F., Su, A. I., & Griffith, O. L. (2014). Organizing knowledge to enable personalization of medicine in cancer, 1–9. doi:10.1186/s13059-014-0438-7


Drug Target Matching for Personalized Medicine



Framework for Clinical Mapping Genomic Aberration to Drugs



Drug Target Resources

A number of resources are available for mapping drugs to gene targets.

Issues with available sources
• Different annotation schemes and data structures lead to misleading results for the end user.
• Contextual information around the drug and target is often not annotated.
• Not all annotations are therapeutically actionable or appropriate.


Drug to Target Annotation

Information for linking drugs to genes should be based on primary literature.

Curated information should be annotated with controlled vocabulary and arrayed in a structured format.

Rules need to capture explicit drug-target response information but also be flexible enough to capture inferred information that may not always be explicitly stated. This is important for further research.
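One way to picture a structured, controlled-vocabulary annotation is sketched below. Everything here is an assumption made for illustration: the class and field names are hypothetical, the aberration and evidence vocabularies are loosely modeled on the labels in the network legend later in this tutorial, and the genes, drugs, and sources are placeholders rather than curated facts.

```python
from dataclasses import dataclass
from enum import Enum


class Aberration(Enum):
    """Controlled vocabulary for the type of genomic event."""
    CNV = "CNV"
    SNV = "SNV"
    EXP = "EXP"


class Evidence(Enum):
    """Whether the drug-target response is stated explicitly or inferred."""
    DIRECT = "direct"
    INFERRED = "inferred"


@dataclass(frozen=True)
class DrugTargetRule:
    gene: str
    aberration: Aberration
    drug: str
    responds: bool        # True = predicted response, False = predicted non-response
    evidence: Evidence
    source: str           # primary-literature reference backing the rule

    def edge_label(self):
        """Label such as 'yes_direct' or 'no_inferred' for network display."""
        return f"{'yes' if self.responds else 'no'}_{self.evidence.value}"


def match_drugs(patient_events, rules):
    """Return the rules whose (gene, aberration) pair occurs in the patient."""
    events = set(patient_events)
    return [rule for rule in rules if (rule.gene, rule.aberration) in events]


# Placeholder rules and patient events, for illustration only.
rules = [
    DrugTargetRule("GENE_A", Aberration.SNV, "DRUG_X", True, Evidence.DIRECT, "REF_1"),
    DrugTargetRule("GENE_A", Aberration.CNV, "DRUG_Y", False, Evidence.INFERRED, "REF_2"),
    DrugTargetRule("GENE_B", Aberration.EXP, "DRUG_Z", True, Evidence.INFERRED, "REF_3"),
]
patient = [("GENE_A", Aberration.SNV), ("GENE_B", Aberration.EXP)]
matches = match_drugs(patient, rules)
for rule in matches:
    print(rule.drug, rule.edge_label(), rule.source)
```

Keeping the evidence type and the literature source on every rule lets downstream users separate explicitly reported responses from inferred ones instead of treating all annotations as equally actionable.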


Example annotation workflow for capturing drug to target information.


Visualization of Drug Target Network


[Figure legend: Aberration Type Color Key: CNV, SNV, EXP, DRUG, Other. Edge Interaction Key: yes_direct, yes_inferred, no_direct, no_inferred. Additional key entries: DRUG, BIOMARKER, MODIFIER.]

Patient Specific Drug Target Network

Patient Genomic Information


Impact Areas for Text Mining

•Identify and extract interaction information for network and pathway reconstruction.

•Aid in identifying and extracting genomic events linked to drug response to better enable personalized medicine.

