Translational Genomics Research Institute | www.tgen.org
Cancer Pathway Analysis and Personalized Medicine
Jeff Kiefer, Research Associate Investigator, Translational Genomics Research Institute
Outline
Big Cancer Data Resources and Secondary Data Tools
Pathway Analysis - Resources, Methods, and Tools
Personalized Medicine - ‘Interpretation Bottleneck’: Drug to Genomic Event Matching
Cancer Genome Data Repositories
https://www.ebi.ac.uk/arrayexpress/
http://www.ncbi.nlm.nih.gov/geo/
http://cancergenome.nih.gov/
https://icgc.org/
Cancer Genome Data Repositories and Data Portals
https://genome-cancer.ucsc.edu/
http://www.cbioportal.org/public-portal/
http://cancergenome.broadinstitute.org (TumorPortal)
https://dcc.icgc.org/
http://genomeportal.stanford.edu/pan-tcga
http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/
Pathways Analysis
Pathway analysis encompasses a number of different approaches and methods applied to large-scale -omic data sets.
The goal is to discover meaningful biological knowledge from large data sets, often in the form of a gene list.
Strictly, ‘pathway’ describes a step-wise signal transduction pathway. However, the term is also loosely used to encompass gene sets derived from signatures or from other biological process annotations such as the Gene Ontology.
(2012). PLOS Computational Biology, 8(2), e1002375. doi:10.1371/journal.pcbi.1002375.t001
Pathways Analysis
Good general review outlining techniques, resources, and issues in pathway analysis
Pathways Analysis
Threshold-Based = Enrichment analysis performed on a gene list derived from statistical test.
Non-threshold Based = All data is used. First popularized by gene set enrichment analysis (GSEA).
‘de-novo’ Based = Pathways or gene sets derived from primary data.
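As a rough illustration of the threshold-based approach, the sketch below scores one pathway against a thresholded gene list with a one-sided hypergeometric test, a common choice for over-representation analysis. The function names and the toy gene identifiers are hypothetical, and the tools named in this deck may use different statistics or corrections.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe containing pathway_size pathway members."""
    upper = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def enrich(gene_list, pathway, universe):
    """Return the overlapping genes and the enrichment p-value.
    Genes outside the universe are ignored (an assumption of this sketch)."""
    universe = set(universe)
    genes = set(gene_list) & universe
    members = set(pathway) & universe
    hits = genes & members
    p = hypergeom_enrichment_p(len(universe), len(members), len(genes), len(hits))
    return sorted(hits), p

if __name__ == "__main__":
    # Toy identifiers g0..g9, purely illustrative.
    universe = [f"g{i}" for i in range(10)]
    hits, p = enrich(["g0", "g1", "g2"], ["g0", "g1", "g2"], universe)
    print(hits, p)
```

In practice one p-value is computed per gene set, and the results are then corrected for multiple testing before being sorted by significance.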
Pathway Resources
http://www.reactome.org/
http://www.genome.jp/kegg/pathway.html
http://www.broadinstitute.org/gsea/msigdb/index.jsp
http://www.pathwaycommons.org/about/#main-container
Commercial Resources
Threshold-based Pathway Enrichment Tools
https://toppgene.cchmc.org/
http://amp.pharm.mssm.edu/Enrichr
http://www.ici.upmc.fr/cluego/
ToppGene: extensive pathway gene sets available for enrichment analysis
Easy to use web interface
Add a list of gene identifiers to perform enrichment analysis on.
Results sorted based on significance.
Results
Gene Set/Pathway Categories
Different Result Outputs
http://www.ici.upmc.fr/cluego/
ClueGO integrates Gene Ontology (GO) terms as well as pathways and creates a functionally organized GO/pathway term network.
[Figure: ClueGO network linking genes to enriched GO/pathway terms. Terms shown: Collagen biosynthesis and modifying enzymes; Collagen formation; forebrain development; Transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds; negative regulation of cell differentiation; regulation of cell differentiation; regulation of cell development; negative regulation of Wnt signaling pathway; regulation of Wnt signaling pathway; regulation of canonical Wnt signaling pathway; regulation of cysteine-type endopeptidase activity involved in apoptotic process; regulation of intrinsic apoptotic signaling pathway; intrinsic apoptotic signaling pathway; p53 signaling pathway; regulation of ion transport; positive regulation of ion transport; osteoblast differentiation; Rheumatoid arthritis; Toll-like receptor signaling pathway.]
Cytoscape App
Non-Threshold Pathway Enrichment Tools
http://www.broadinstitute.org/gsea/index.jsp
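To make the non-threshold idea concrete, here is a minimal sketch of a GSEA-style enrichment score: a weighted, Kolmogorov-Smirnov-like running sum over the full ranked gene list. The function name, the `weight` parameter default, and the toy data are assumptions for illustration. The actual GSEA software additionally estimates significance by permutation and normalizes scores across gene sets, which this sketch omits.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Maximum deviation from zero of a running sum over a ranked list.

    ranked_genes: genes ordered from most to least associated with the phenotype.
    scores: ranking metric for each gene, in the same order.
    gene_set: the pathway or signature being tested.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, in_set) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for s, h in zip(scores, in_set):
        # Step up (weighted) at gene-set members, step down uniformly otherwise.
        running += abs(s) ** weight / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

if __name__ == "__main__":
    genes = ["a", "b", "c", "d"]   # toy identifiers
    metric = [4.0, 3.0, 2.0, 1.0]  # toy ranking metric
    print(enrichment_score(genes, metric, {"a", "b"}))  # set at the top of the list
    print(enrichment_score(genes, metric, {"c", "d"}))  # set at the bottom of the list
```

Because every gene contributes to the running sum, no significance cutoff is needed to build an input gene list, which is the key difference from the threshold-based tools.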
GSEA
Can be accessed through a number of resources and methods:
Java Desktop
R-GSEA
GenePattern
GSEA Use Case: Anaplastic Thyroid Cancer vs Non-Tumor Thyroid
GSEA Visualization with Enrichment Map
(2010) PLoS ONE, 5(11), 1–12. doi:10.1371/journal.pone.0013984.t001
http://www.baderlab.org/Software/EnrichmentMap
Cytoscape App
EDDY computes the discrepancy between probability distributions of candidate network structures, based on the likelihood of each network across classes of samples.
A methodology that can exploit complex interactions that differ between two conditions, such as tumor vs. normal, and that might be missed by traditional approaches based on differential gene expression.
EDDY (Evaluation of Differential DependencY) investigates differential dependencies between conditions. It computes the differential dependency statistic (JS) and its statistical significance (p-value, via permutation) between conditions, based on the likelihoods of genetic networks (a probability distribution).
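The two ingredients named above, a divergence between two probability distributions and a permutation p-value, can be sketched as follows. This assumes that JS denotes the Jensen-Shannon divergence. The toy statistic here compares simple categorical frequencies per class and is only a stand-in: it is not EDDY's network-likelihood computation, and all function names are hypothetical.

```python
import random
from collections import Counter
from math import log2

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions,
    each given as a dict mapping outcome -> probability."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)

def empirical(labels):
    """Empirical distribution of a list of categorical labels."""
    counts = Counter(labels)
    return {k: c / len(labels) for k, c in counts.items()}

def permutation_p_value(class1, class2, n_perm=1000, seed=0):
    """Shuffle class membership to estimate how often a divergence at least
    as large as the observed one arises by chance."""
    rng = random.Random(seed)
    observed = js_divergence(empirical(class1), empirical(class2))
    pooled = list(class1) + list(class2)
    n1 = len(class1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = js_divergence(empirical(pooled[:n1]), empirical(pooled[n1:]))
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Toy labels standing in for per-sample evidence; not real data.
    js, p = permutation_p_value(["s1"] * 20, ["s2"] * 20, n_perm=200)
    print(js, p)
```

The divergence is 0 when the two class distributions are identical and 1 (in base 2) when they share no outcomes, so larger values indicate more class-specific structure.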
[Figure: EDDY workflow. A gene set of interest (genes A, B, C) is taken from a gene set DB (MSigDB, …). The likelihoods of possible (or probable) dependency structures are computed for Class 1 and Class 2 and compared with JS, separating Class 1 specific, Class 2 specific, and common dependencies.]
[Figure: likelihoods of possible (or probable) dependency structures among genes A, B, C for Class 1 and Class 2, resolved into Class 1 specific, Class 2 specific, and common dependencies.]
The number of identified subtype-specific gene sets
(#): Overlap with EDDY gene sets
• GSEA appears under-powered and also selects a disproportionate number of gene sets for one subtype.
• GSCA appears to be overly sensitive, with a high false-positive rate.
Simulation II: comparison of EDDY with other methods
Figure 5 illustrates the ROC curves of GSCA, GSEA and EDDY from Simulation II, and Table 1 lists the area under curve values of the corresponding ROC curves in Figure 5.
From the results of the interaction-focused simulation experiments (Figure 5), EDDY demonstrates superior performance than the other two methods. This is partly due to the fact that the data were generated from models assuming conditional dependencies in gene expressions rather than simple linear correlations, which is also assumed by the Bayesian network model that the current implementation of EDDY uses. The performance of EDDY declines with increasing the size of gene sets, but it improves with more computations (by using larger M as shown in Figure 5). Another observation is significantly lower false positive rates of EDDY than that of other methods (Supplementary Figure S10–S12).
This simulation study clearly indicates that EDDY significantly outperforms other methods when differential gene sets are defined in the sense of gene interactions, with significantly lower false-positive rates.
Besides the simulation scenario covered in this study, there can be various different simulation configurations depending on the methods to generate synthetic data sets and the definition of differential gene sets. However, it is not feasible to enumerate and cover all such different cases, and thus, they are left for future studies.
Comparison of EDDY with other methods in application to TCGA GBM gene expression data
Table 2 lists the number of statistically significant gene sets identified with the three different methods for each subtype. EDDY and GSEA produced different results, as EDDY identified 10–22 gene sets for each subtype, whereas GSEA identified 245 gene sets for mesenchymal but just a few for other subtypes. Moreover, there is only one common gene set (for mesenchymal) between the results from the two methods. A possible hypothesis of GSEA identifying many gene sets only for mesenchymal is that mesenchymal is the most differentiated form of GBM (physiologically or genotypically) (27), and many genes are differentially expressed in mesenchymal compared with other subtypes. Compared with GSEA, the results of EDDY are relatively less biased to a specific subtype (for the lists of identified gene sets from EDDY and GSEA, see Supplementary Tables S1–S8). Compared with the other two methods, GSCA identified much more gene sets as statistically significant, from 68 to 80% of the tested gene sets, making it almost noninformative (for the lists of identified gene sets from GSCA, see the supplementary file provided in http://biocomputing.tgen.org/software/EDDY). This becomes clearer from Figure 6, where the P-values from GSCA are much closer to 0 in general than that of EDDY and GSEA (for direct comparison of P-values from
Figure 5. The comparison of ROC of GSCA, GSEA and EDDY in identifying differential gene sets from the interaction-focused simulation experiments. (A) v = 10. (B) v = 20. (C) v = 30.
Table 1. The area under curve values of GSCA, GSEA and EDDY in identifying differential gene sets from Simulation II
Method           v = 10   v = 20   v = 30
GSCA             0.5774   0.5822   0.5965
GSEA             0.4911   0.5574   0.6075
EDDY (M=1000)    0.7440   0.6768   0.6704
EDDY (M=5000)    0.8287   0.7580   0.7064
Bold face indicates top performance.
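For readers unfamiliar with the area under the ROC curve reported in Table 1, the sketch below computes it directly from per-gene-set scores and ground-truth labels, using the equivalent rank interpretation: the probability that a truly differential gene set scores higher than a non-differential one. The function name and toy inputs are hypothetical, and this is not necessarily how the paper's values were computed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: higher means 'more likely differential' (for example 1 - p-value).
    labels: True for truly differential gene sets, False otherwise.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy scores for four simulated gene sets, two of them truly differential.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [True, True, False, False]))
```

An AUC near 0.5 means a method ranks differential gene sets no better than chance, which is the context for reading the values in the table.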
Table 2. The number of statistically significant gene sets for each subtype
Method   Classical    Mesenchymal   Neural      Proneural
EDDY     13           10            22          22
GSEA     1 (0)        245 (1)       6 (0)       3 (0)
GSCA     1590 (11)    1432 (7)      1681 (21)   1563 (17)
The number of common cases with EDDY is indicated in the parentheses.
Nucleic Acids Research, 2014
Two Pathways Identified with EDDY Enriched in Proneural Glioblastoma Phenotype
EDDY identified the G2 pathway and p53 pathway gene sets as having differential dependencies that are related to the enrichment of p53 mutations in the proneural subtype. Heat maps show that the genes in these pathways are not differentially expressed, so they would not be identified by GSEA.
PARADIGM
Nature, Vol 507, March 20, 2014
MEMo
https://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Ciriello.pdf
Both methods employ multiple genomic data types to identify altered pathways.
Employed in TCGA studies
Personalized Medicine ‘Interpretation Bottleneck’
Drug Target Annotation
Personalized Medicine Pipeline
Good, B. M., Ainscough, B. J., McMichael, J. F., Su, A. I., & Griffith, O. L. (2014). Organizing knowledge to enable personalization of medicine in cancer, 1–9. doi:10.1186/s13059-014-0438-7
Drug Target Matching for Personalized Medicine
Good, B. M., Ainscough, B. J., McMichael, J. F., Su, A. I., & Griffith, O. L. (2014). Organizing knowledge to enable personalization of medicine in cancer, 1–9. doi:10.1186/s13059-014-0438-7
Framework for Clinical Mapping Genomic Aberration to Drugs
Good, B. M., Ainscough, B. J., McMichael, J. F., Su, A. I., & Griffith, O. L. (2014). Organizing knowledge to enable personalization of medicine in cancer, 1–9. doi:10.1186/s13059-014-0438-7
Drug Target Resources
A number of resources are available for mapping drugs to gene targets.
Issues with available sources
•Different annotation schemes and data structures lead to misleading results for the end user.
•Contextual information around the drug and target is often not annotated.
•Not all annotations are therapeutically actionable or appropriate.
Drug to Target Annotation
Information for linking drugs to genes should be based on primary literature.
Curated information should be annotated with controlled vocabulary and arrayed in a structured format.
Rules need to capture explicit drug-target response information but also be flexible enough to capture inferred information that may not always be explicitly stated. This is important for further research.
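One way to picture a structured, controlled-vocabulary annotation is the record sketched below. The class name, field names, and validation rules are hypothetical. The evidence terms and aberration types are borrowed from the keys in the network legend later in this deck, and reading them as response (yes/no) combined with evidence type (direct/inferred) is an assumption. The drug, gene, and source values are placeholders, not real annotations.

```python
from dataclasses import dataclass

# Controlled vocabularies (terms taken from the deck's legend; meaning assumed).
EVIDENCE_TERMS = {"yes_direct", "yes_inferred", "no_direct", "no_inferred"}
ABERRATION_TYPES = {"SNV", "CNV", "EXP", "Other"}

@dataclass(frozen=True)
class DrugTargetAnnotation:
    drug: str
    gene: str
    aberration_type: str  # kind of genomic event on the gene
    evidence: str         # explicit vs inferred drug-target response
    source: str           # identifier of the primary-literature source

    def __post_init__(self):
        # Reject free-text values so every record stays machine-comparable.
        if self.aberration_type not in ABERRATION_TYPES:
            raise ValueError(f"unknown aberration type: {self.aberration_type}")
        if self.evidence not in EVIDENCE_TERMS:
            raise ValueError(f"unknown evidence term: {self.evidence}")
        if not self.source:
            raise ValueError("a primary-literature source is required")

if __name__ == "__main__":
    record = DrugTargetAnnotation(
        drug="drugX", gene="GENE1", aberration_type="SNV",
        evidence="yes_inferred", source="placeholder-citation",
    )
    print(record)
```

Keeping inferred evidence as a first-class vocabulary term, rather than discarding it, reflects the point above that rules should capture information that is not explicitly stated.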
Example annotation workflow for capturing drug to target information.
Visualization of Drug Target Network
[Figure legend. Aberration Type Color Key: CNV, SNV, EXP, DRUG, Other. Edge Interaction Key: yes_direct, yes_inferred, no_direct, no_inferred. Additional key entries: DRUG, BIOMARKER, MODIFIER.]
Patient Specific Drug Target Network
Patient Genomic Information
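A patient-specific network of this kind can be thought of as the subset of curated drug-target annotations whose genomic event is present in the patient. The sketch below shows that matching step under stated assumptions: the dictionary keys, the function name, and all drug and gene values are hypothetical placeholders, and real matching would also need to handle variant-level detail and clinical context.

```python
def match_patient(aberrations, annotations):
    """Return network edges (drug, gene, aberration_type, evidence) for
    every annotation whose (gene, aberration_type) occurs in the patient.

    aberrations: set of (gene, aberration_type) tuples from the patient.
    annotations: list of dicts with keys drug, gene, aberration_type, evidence.
    """
    # Index annotations by genomic event for direct lookup.
    by_event = {}
    for ann in annotations:
        key = (ann["gene"], ann["aberration_type"])
        by_event.setdefault(key, []).append(ann)

    edges = []
    for event in sorted(aberrations):
        for ann in by_event.get(event, []):
            edges.append((ann["drug"], event[0], event[1], ann["evidence"]))
    return edges

if __name__ == "__main__":
    knowledge_base = [  # placeholder entries, not real drug-gene claims
        {"drug": "drugX", "gene": "GENE1", "aberration_type": "SNV", "evidence": "yes_direct"},
        {"drug": "drugY", "gene": "GENE1", "aberration_type": "SNV", "evidence": "no_inferred"},
        {"drug": "drugZ", "gene": "GENE2", "aberration_type": "CNV", "evidence": "yes_inferred"},
    ]
    patient = {("GENE1", "SNV"), ("GENE3", "EXP")}
    for edge in match_patient(patient, knowledge_base):
        print(edge)
```

Carrying the evidence term on each edge lets a viewer distinguish predicted response from predicted resistance, and directly stated from inferred relationships.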
Impact Areas for Text Mining
•Identify and extract interaction information for network and pathway reconstruction.
•Aid in identifying and extracting genomic events linked to drug response to better enable personalized medicine.