Oncogenomics Workshop - EBI - UK
March 14th, 2013
Nuria Lopez-Bigas
University Pompeu Fabra
Barcelona
http://bg.upf.edu
The Mechanisms of tumorigenesis
Data Computational methods
Analysis
Results
IntOGen.org
across projects - across cancer sites
http://beta.intogen.org
Cancer Genomics Data
Patient cohort, primary tumors:
• Expression patterns
• Somatic mutations
• Epigenomic profiles
• Structural aberrations
• Copy number alterations
Cancer genome re-sequencing
Laboratory protocol: tumor sample and matched normal sample → exome/whole genome sequencing → reads
Analysis protocol: reads → alignment → aligned reads → mutation calling → tumor somatic mutations
File formats: FASTQ (reads), BAM (aligned reads), VCF (tumor somatic mutations)
Tumours are heterogeneous in nature (multiclonality)
Variant calling pipelines entail judgement calls
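The end product of the re-sequencing protocol is a VCF file, and the methods discussed later only need the list of somatic mutations it contains. As a minimal sketch, assuming a standard tab-separated VCF whose first seven columns are CHROM, POS, ID, REF, ALT, QUAL and FILTER, such a list could be extracted as follows. The function name and the choice to keep only records whose FILTER is PASS or "." are assumptions made for this illustration, not part of any pipeline described in the talk.

```python
def read_somatic_mutations(vcf_text):
    """Return (chrom, pos, ref, alt) tuples for records that pass filters."""
    mutations = []
    for line in vcf_text.splitlines():
        # Skip blank lines, meta-information (##...) and the header (#CHROM...).
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, _qual, flt = line.split("\t")[:7]
        # Assumption: keep PASS records and records with no filter applied (".").
        if flt not in ("PASS", "."):
            continue
        # ALT may hold several comma-separated alleles; emit one tuple per allele.
        for allele in alt.split(","):
            mutations.append((chrom, int(pos), ref, allele))
    return mutations
```

Because each caller applies its own filters, the same tumour can yield different lists, which is one way the judgement calls mentioned above show up in practice.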
Which mutations are cancer drivers?
How to identify cancer drivers?
Find signs of positive selection across tumour re-sequenced genomes
Frequency based approaches to identify drivers
Assume that cancer drivers are mutated more frequently than background in a cohort of tumours
Recurrence analysis
[Figure: genes × samples matrix; legend: not mutated, mutated, driver gene]
Methods: MutSig (Broad Institute), MuSiC-SMG (Washington University)
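The recurrence assumption can be illustrated with a one-sided binomial test per gene. This is only a sketch of the general idea and not how MutSig or MuSiC-SMG work internally: the single uniform background rate `bg_rate` (mutations per base pair per sample), the gene length and the function names are all assumptions made for the example.

```python
from math import comb

def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def recurrence_pvalue(mutated_samples, total_samples, gene_length_bp, bg_rate):
    """Probability of seeing at least this many mutated samples by chance."""
    # Chance that one sample carries at least one background mutation in the gene.
    p_hit = 1 - (1 - bg_rate) ** gene_length_bp
    return binomial_sf(mutated_samples, total_samples, p_hit)
```

The result depends entirely on `bg_rate`, which is why a poorly estimated background mutation rate is a central weakness of this family of methods.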
Main challenges of frequency-based approaches
• It is difficult to estimate background mutation rates correctly
• They cannot identify driver genes mutated at low recurrence
• They need raw data (e.g. BAM files) to assess sequencing coverage per region
• They are computationally costly
How to identify drivers across projects in a scalable way?
Ideally computational methods that:
• Do not need large or protected data (e.g. only the list of tumour somatic mutations)
• Are not computationally expensive
• Are robust to differences in mutation calling
We have developed 2 methods with these properties: OncodriveFM and OncodriveCLUST
Finding drivers using functional impact bias (FM bias)
Gonzalez-Perez and Lopez-Bigas. NAR 2012
Abel Gonzalez-Perez
[Figure: FI scores (low to high) of the mutations observed in Gene A and Gene B]
Functional Impact metrics:
• SIFT
• Mutation Assessor
• Polyphen2
OncodriveFM
OncodriveFM method's details
STEP 1: Assess the functional impact (FI) of all variants
1. Compute FI scores for nsSNVs (combining MutationAssessor, SIFT, Polyphen2)
2. Compute FI scores of other variants (STOP, synonymous and frameshift) using a set of rules

Variant      SIFT  Polyphen2  MutationAssessor
Synonymous   1     0          -2
STOP-gain    0     1          3.5
Frameshift   0     1          3.5

[Figure legend: not mutated; FI score from low to high]
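The rule set in STEP 1 can be written as a small lookup table. The sketch below copies the three rule rows from the slide; the consequence labels (`"missense"`, `"stop_gain"` and so on) and the function name are hypothetical, chosen only for this illustration.

```python
# Rule-based scores from the slide, as (SIFT, Polyphen2, MutationAssessor).
RULE_SCORES = {
    "synonymous": (1.0, 0.0, -2.0),
    "stop_gain": (0.0, 1.0, 3.5),
    "frameshift": (0.0, 1.0, 3.5),
}

def fi_scores(consequence, predicted=None):
    """Return (sift, polyphen2, mutation_assessor) for one variant."""
    if consequence == "missense":
        # nsSNVs are scored by the predictors themselves, not by the rules.
        if predicted is None:
            raise ValueError("nsSNVs need scores from the three predictors")
        return predicted
    return RULE_SCORES[consequence]
```

Note that SIFT runs in the opposite direction to the other two metrics (values near 0 are the damaging ones), which is why truncating variants get SIFT 0 but Polyphen2 1.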
STEP 2: Compute FM bias per gene
[Figure: OncodriveFM; genes × samples matrix coloured by functional impact (low to high); legend: not mutated, driver gene]
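One way to picture the FM bias of a gene is as an empirical test: is the average FI score of the gene's mutations higher than that of equally sized random sets of mutations from the same cohort? The sketch below follows that idea under simplifying assumptions (one FI score per mutation, sampling with replacement from all observed scores). It is an illustration only; the function name, the sampling scheme and the parameters are assumptions and may differ from the published OncodriveFM procedure.

```python
import random
from statistics import mean

def fm_bias_pvalue(gene_scores, all_scores, n_perm=10000, seed=0):
    """Empirical p-value for a gene's mean FI score being unusually high."""
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    # Count random draws whose mean FI score reaches the observed one.
    hits = sum(
        mean(rng.choices(all_scores, k=k)) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because the null is built from the mutations themselves, no background mutation rate is needed, and a gene with only a handful of high-impact mutations can still stand out.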
Compute FM bias per module
[Figure: samples × modules matrix (module 1, module 2) coloured by FI score (low to high; not mutated shown separately), with an FM qvalue colour scale]
OncodriveFM main advantages (FM bias approach)
• It does not depend on background mutation rates
• Only needs a list of somatic mutations
• It is computationally cheap
• Can identify driver genes mutated at low recurrence
OncodriveFM Results
One example: TCGA Glioblastoma (Gonzalez-Perez and Lopez-Bigas. NAR 2012)

Gene      FM bias qvalue  MutSig qvalue
TP53      8.5E-11         <1.0E-8
PTEN      8.5E-11         <1.0E-8
EGFR      8.5E-11         <1.0E-8
NF1       8.5E-11         <1.0E-8
RB1       2.5E-9          <1.0E-8
FKBP9     8.5E-11         2.7E-8
ERBB2     1.2E-8          1.0E-8
PIK3R1    1.2E-8          1.0E-8
PIK3CA    2.3E-4          1.0E-8
PIK3C2G   0.002           6.1E-5
IDH1      8.5E-11         NA
ZNF708    7.4E-10         NS
FGFR3     3.2E-9          0.82
CDKN2A    2.5E-8          NS
ALDH1A3   5.2E-5          NS
PDGFRA    1.5E-6          0.21
FGFR1     2.0E-6          0.65
MAPK9     2.2E-5          NS
DCN       1.5E-6          NS
PIK3C2A   6.2E-5          NS
CHEK2     1               0.002
PSMD13    1               0.01
GSTM5     1               0.009

[Figure: per-sample heatmap of MA scores (scale -2 to 5; not mutated shown separately) with an FM / MutSig qvalue colour scale (0, 0.05, 1)]
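The results are reported as q-values, that is, p-values corrected for testing many genes at once. The slides do not state which correction is used; as a minimal sketch, assuming the widely used Benjamini-Hochberg procedure, the adjustment could look like this (the function name is chosen for this example).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is capped by the next one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

A gene is then called significant when its q-value falls below a chosen cut-off such as 0.05.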
[Figure: TCGA Glioblastoma (2013), OncodriveFM qvalue vs MutSig qvalue per gene. Labelled genes: PIK3R1, PTEN, EGFR, TP53, IDH1, RB1, NF1, BRAF, PIK3CA, SPTA1, KRTAP4-11, GABRA6, KEL, CDH18, RPL5, STAG2, OR8K3, OR5AR1, LZTR1, MYH8]
[Figure: TCGA BLCA (2013), OncodriveFM qvalue vs MutSig qvalue per gene. Labelled genes: TP53, KDM6A, FBXW7, NFE2L2, EP300, RB1, ERCC2, CDKN1A, ARID1A]
PIK3CA is recurrently mutated in the same residue in breast tumours
These mutations receive low scores from functional impact metrics
[Figure: PIK3CA, number of protein affecting mutations (y axis, 0 to 80) by protein position (x axis, ticks at 0 and 1047); hotspot labelled H1047L]
Finding drivers using regional clustering of mutations
Tamborero et al., Under review
[Figure: KRAS, number of protein affecting mutations (y axis, 0 to 500) by protein position (x axis, 0 to 175); position 12 labelled]
OncodriveCLUST
David Tamborero
OncodriveCLUST method's details (Tamborero et al., Under review)
[Figure: steps (I) to (VI) for Gene A and Gene B. Mutations are counted per amino acid; positions above a threshold (Th) define clusters (C1 in Gene A; C1 and C2 in Gene B); the gene score is the sum of its cluster scores, S_geneA = S_C1 and S_geneB = S_C1 + S_C2; gene scores are then expressed as Z values (Z_A, Z_B)]
Background model obtained by calculating the clustering score per gene of the coding-silent mutations
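The steps in the diagram can be sketched in a few lines. The slide only fixes the overall shape (a threshold, clusters, a gene score that sums cluster scores, and a Z value against a coding-silent background). Everything else below is an assumption made for illustration: the fixed count threshold, the `max_gap` used to join neighbouring positions, and the weighting of each position by its share of mutations divided by a power of √2 of its distance to the cluster peak.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev

def gene_clustering_score(positions, threshold=2, max_gap=5):
    """Sum of cluster scores for one gene, given mutated amino acid positions."""
    counts = Counter(positions)
    total = len(positions)
    # Positions mutated at least `threshold` times seed the clusters.
    seeds = sorted(p for p, c in counts.items() if c >= threshold)
    clusters, current = [], []
    for p in seeds:
        if current and p - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    score = 0.0
    for cluster in clusters:
        peak = max(cluster, key=lambda p: counts[p])
        # Down-weight positions by their distance to the cluster peak.
        score += sum(
            (counts[p] / total) / sqrt(2) ** abs(p - peak) for p in cluster
        )
    return score

def z_score(score, background_scores):
    """Compare a gene score with scores obtained from coding-silent mutations."""
    return (score - mean(background_scores)) / stdev(background_scores)
```

A gene whose mutations pile up on one or two residues scores close to 1, while a gene with the same number of mutations spread along the protein scores 0.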
OncodriveCLUST main advantages
• It does not depend on background mutation rates
• It is computationally cheap
• Only needs a list of somatic mutations
• It is complementary to OncodriveFM
OncodriveCLUST Results
[Figure: heatmaps for BRCA and LUSC, one row per gene, with columns CGC, q OncoFM, q OncoCLUST and q MutSig.
BRCA genes: TP53, CDH1, GATA3, SF3B1, AKT1, MLL3, MAP2K4, RUNX1, PTEN, RB1, MYB, NF1, PIK3CA, GNAS, CBFB, PIK3R1, KRAS, FGFR2, EP300, HLF, ARID1A, MLLT4, JAK2, BRCA1, ARID2, ERBB2, NIN
LUSC genes: TP53, CDKN2A, NFE2L2, FBXW7, PIK3CA, PTEN, NF1, EP300, MLL2, JUN, CDH11, EGFR, NOTCH1, MLL3, RB1, PPP2R1A, GPC3, ABL2, SMARCA4, MYH9, NSD1, TSC1, EBF1, NCOA2, ARID1A, APC, BRCA1, DICER1
Legend: gene significance obtained by 3 methods, 2 methods, 1 method, or only by OncodriveCLUST; Cancer Gene Census phenotype: dominant, recessive; corrected p-value scale: 0, 0.05, 1; not assessable]
Combining methods with complementary principles helps to obtain a more comprehensive and reliable list of cancer drivers
✓ Functional Impact Bias
✓ Mutation Clustering
✓ Mutation Frequency
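Combining the three signals can be as simple as recording, for each gene, which methods call it significant. The sketch below assumes each method reports a q-value per gene (or `None` when the gene is not assessable) and uses a 0.05 cut-off; the gene names and q-values in the usage example are made up for illustration.

```python
def supporting_methods(qvalues_by_method, alpha=0.05):
    """Map each significant gene to the list of methods that support it."""
    support = {}
    for method, qvalues in qvalues_by_method.items():
        for gene, q in qvalues.items():
            # None marks a gene that the method could not assess.
            if q is not None and q < alpha:
                support.setdefault(gene, []).append(method)
    return support

# Hypothetical q-values, for illustration only.
example = {
    "OncodriveFM": {"GENE_A": 1e-6, "GENE_B": 0.8, "GENE_C": 0.01},
    "OncodriveCLUST": {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": None},
    "MutSig": {"GENE_A": 1e-8, "GENE_B": 0.5, "GENE_C": 0.3},
}
support = supporting_methods(example)
```

Genes supported by several methods are the most reliable calls, while genes found by a single method show what each principle contributes on its own.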
IntOGen SM-Analysis pipeline: to interpret catalogs of cancer somatic mutations
Input data: catalogs of tumor somatic mutations
Analysis pipeline (powered by Wok, a Workflow Management System; Christian Perez-Llamas):
✓ Identify consequences of mutations (Ensembl VEP)
✓ Assess functional impact of nsSNVs (SIFT, PPH2, MA and TransFIC)
✓ Compute frequency of mutations per gene and pathway
✓ Identify candidate driver genes (OncodriveFM and OncodriveCLUST)
Output: browser
Currently: 27 projects, 12 cancer sites, 3329 tumours
http://beta.intogen.org
27 cancer sequencing datasets analysed so far (total = 3329 samples)

CANCER SITE           AUTHORS                                    SOURCE            NUMBER OF SAMPLES
brain                 TCGA                                       TCGA DATA PORTAL  248
brain                 DKFZ                                       ICGC DCC          114
brain                 Johns Hopkins University                   ICGC DCC          88
breast                TCGA                                       TCGA DATA PORTAL  510
breast                Broad Institute                            PubMed            102
breast                WTSI                                       ICGC DCC          100
breast                Washington University School of Medicine   PubMed            75
breast                University of British Columbia             PubMed            65
breast                Johns Hopkins University                   ICGC DCC          41
colon                 TCGA                                       TCGA DATA PORTAL  105
colon                 Johns Hopkins University                   ICGC DCC          34
corpus uteri          TCGA                                       TCGA DATA PORTAL  247
hematopoietic         CLL-ICGC                                   ICGC DCC          109
hematopoietic         Dana-Farber Cancer Institute               PubMed            90
kidney                TCGA                                       TCGA DATA PORTAL  298
liver and bile ducts  IACR                                       ICGC DCC          24
lung and bronchus     TCGA                                       TCGA DATA PORTAL  177
lung and bronchus     Washington University School of Medicine   ICGC DCC          156
lung and bronchus     Johns Hopkins University                   PubMed            43
lung and bronchus     Medical College of Wisconsin               PubMed            31
lung and bronchus     University of Cologne                      PubMed            26
oropharynx            Broad Institute                            PubMed            74
ovary                 TCGA                                       TCGA DATA PORTAL  337
pancreas              Johns Hopkins University                   ICGC DCC          113
pancreas              Queensland Centre for Medical Genomics     ICGC DCC          67
pancreas              Ontario Institute for Cancer Research      ICGC DCC          33
stomach               Pfizer Worldwide Research and Development  PubMed            22
Combining results across projects
[Figure: for each project of cancer site A (project 1 to project 4, ...), OncodriveFM turns the genes × samples functional impact matrix (high, low, no mutation) into per-gene p-values (colour scale 0, 0.05, 1); the per-project p-values are then combined into one result per gene for cancer site A]
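The slides do not say which statistic is used to combine the per-project p-values of a gene. As one common option, assumed here purely for illustration, Fisher's method sums the log p-values and compares the result with a chi-square distribution; with an even number of degrees of freedom the tail probability has a closed form, so no external library is needed.

```python
from math import exp, log

def fisher_combine(pvalues):
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(pvalues)
    # Fisher's statistic X = -2 * sum(ln p) follows chi-square with 2k d.o.f.
    half = -sum(log(p) for p in pvalues)  # this is X / 2
    # Closed-form survival function: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return exp(-half) * total
```

For example, two projects that each give a gene p = 0.05 combine to roughly 0.017, so consistent moderate evidence across projects becomes stronger evidence at the level of the cancer site.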
Onexus creates IntOGen web discovery tool
Jordi Deu-Pons
Tabulated Files → Web discovery tool
Powered by Onexus (www.onexus.org)
http://beta.intogen.org
[Figure: IntOGen browser view of mutation frequency for KRAS, TP53, SMAD4, CDKN2A and SMARCA4]
http://beta.intogen.org/analysis
Use case 1: Cohort analysis
User's data (tumor somatic mutations per sample) → SM pipeline → user's private browser
• View matrix of mutated genes per sample
• See predicted impact of mutations
• Find cancer driver genes
• Find FM-biased pathways
• Explore the results in the context of accumulated knowledge in IntOGen
Use case 2: Single sample analysis
User's data (tumor somatic mutations in one tumor) → SM pipeline → user's private browser
• See predicted impact of mutations
• Find recurrent mutations found in IntOGen
• Find mutations in candidate driver genes found in IntOGen
PanCancer project
Visualization and analysis of genomic data using Interactive Heatmaps
http://www.gitools.org
Perez-Llamas and Lopez-Bigas. PLoS ONE 2011
Christian Perez-Llamas
Multidimensional heatmaps
Michael P. Schroeder
Sort by mutually exclusive alterations
Schroeder MP, Gonzalez-Perez A and Lopez-Bigas N. Visualizing multidimensional cancer genomics data. Genome Medicine. 2013, 5:9
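Sorting a heatmap so that mutually exclusive alterations become visible can be sketched with two plain sorts: genes by how often they are altered, then samples by their alteration pattern across the ordered genes. This is a generic illustration of the idea and not necessarily the algorithm implemented in Gitools; the function name and the 0/1 matrix layout are assumptions.

```python
def sort_mutually_exclusive(matrix, genes, samples):
    """matrix[g][s] is 1 if gene g is altered in sample s, else 0."""
    # Most frequently altered genes go to the top.
    gene_order = sorted(range(len(genes)), key=lambda g: -sum(matrix[g]))
    # Samples altered in the top gene come first, then those altered in the
    # second gene only, and so on, which produces a staircase pattern.
    sample_order = sorted(
        range(len(samples)),
        key=lambda s: tuple(-matrix[g][s] for g in gene_order),
    )
    return (
        [genes[g] for g in gene_order],
        [samples[s] for s in sample_order],
    )
```

When alterations are mutually exclusive, the sorted heatmap shows blocks that step down from left to right with little vertical overlap.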
Summary
• OncodriveFM and OncodriveCLUST are complementary methods to identify cancer drivers
• Oncodrive methods are scalable and robust
• IntOGen contains results of analysing more than 3000 tumours to identify cancer drivers across sites
• IntOGenSM pipeline is available to run your own projects
• TCGA PanCancer analysis on the way
• Gitools - interactive heatmaps - useful to explore multidimensional cancer genomics data
Biomedical Genomics Lab
@bbglab @nlbigas
Gunes Gundem
Christian Perez-Llamas
Jordi Deu-Pons
Michael Schroeder
Alba Jené-Sanz
Nuria Lopez-Bigas
David Tamborero
Abel Gonzalez-Perez
Alberto Santos
http://bg.upf.edu/blog