Oncogenomics Workshop - EBI - UK March 14th, 2013 Nuria Lopez-Bigas University Pompeu Fabra Barcelona http://bg.upf.edu

Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Page 1: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Oncogenomics Workshop - EBI - UK
March 14th, 2013

Nuria Lopez-Bigas
University Pompeu Fabra
Barcelona
http://bg.upf.edu

Page 2: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

[Overview diagram: Data, Computational methods, Analysis and Results (IntOGen.org), towards the mechanisms of tumorigenesis; across projects, across cancer sites]

Page 3: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

http://beta.intogen.org

Page 5: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Cancer Genomics Data

Patient cohort: primary tumors

• Expression patterns
• Somatic mutations
• Epigenomic profiles
• Structural aberrations
• Copy number alterations

Page 7: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Cancer genome re-sequencing

Laboratory protocol: tumor sample and matched normal sample → exome/whole genome sequencing → reads

Analysis protocol: reads → alignment → aligned reads → mutation calling → tumor somatic mutations

File formats: FASTQ (reads), BAM (aligned reads), VCF (tumor somatic mutations)

• Tumours are heterogeneous in nature (multiclonality)
• Variant calling pipelines entail judgement calls

Page 10: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Which mutations are cancer drivers?

Page 12: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

How to identify cancer drivers?

Find signs of positive selection across tumour re-sequenced genomes

Page 14: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Frequency based approaches to identify drivers

Assume that cancer drivers are mutated more frequently than background in a cohort of tumours

[Diagram, recurrence analysis: matrix of genes (rows) by samples (columns), each cell mutated or not mutated; a recurrently mutated row is a driver gene]

Methods: MutSig (Broad Institute), MuSiC-SMG (Washington University)

Main challenges of frequency based approaches:

• Difficult to estimate the background mutation rates correctly
• Cannot identify driver genes that are mutated at low recurrence
• Need raw data (e.g. BAM files) to assess sequencing coverage per region
• Computationally costly
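The frequency assumption above can be illustrated with a one-sided binomial test. This is only a minimal sketch, not the MutSig or MuSiC-SMG implementation: it assumes a single uniform background rate per base and per sample, and the function names and parameters are hypothetical. It also shows why the first challenge matters, since the p-value depends directly on the assumed background rate.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k).

    Summing the lower tail keeps the loop short when k is small, at the
    cost of precision for p-values close to machine epsilon.
    """
    lower = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower)


def recurrence_pvalue(observed_mutations, gene_length_bp, n_samples, background_rate):
    """P-value for seeing at least `observed_mutations` in one gene.

    Assumption (illustrative only): every base in every sample mutates
    independently with the same probability `background_rate`.
    """
    trials = gene_length_bp * n_samples
    return binomial_tail(observed_mutations, trials, background_rate)


if __name__ == "__main__":
    # Hypothetical gene: 1000 bp, 100 tumours, background of 1e-6 per base.
    for observed in (1, 3, 5):
        print(observed, recurrence_pvalue(observed, 1000, 100, 1e-6))
```

Changing `background_rate` by an order of magnitude changes which genes look significant, which is the estimation problem listed in the first bullet.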

Page 17: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

How to identify drivers across projects in a scalable way?

Ideally computational methods that:

• Do not need large or protected data (e.g. they work from a list of tumour somatic mutations)
• Are not computationally expensive
• Are robust to differences in mutation calling

We have developed 2 methods with these properties: OncodriveFM and OncodriveCLUST

Page 18: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveFM

Finding drivers using functional impact bias (FM bias)

Functional Impact metrics:

• SIFT
• Mutation Assessor
• Polyphen2

[Diagram: mutations in Gene A and Gene B coloured by FI score, from low to high]

Abel Gonzalez-Perez

Gonzalez-Perez and Lopez-Bigas. NAR 2012

Page 19: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveFM method’s details

STEP 1: Assess the functional impact (FI) of all variants

1. Compute FI scores for nsSNVs (combining MutationAssessor, SIFT, Polyphen2)
2. Compute FI scores of other variants (STOP, synonymous and frameshift) using a set of rules

Variant type | SIFT | Polyphen2 | MutationAssessor
Synonymous | 1 | 0 | -2
STOP-gain | 0 | 1 | 3.5
Frameshift | 0 | 1 | 3.5

[Legend: FI score from low to high; not mutated]
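The rule table for STEP 1 can be written down as a small lookup. This is a sketch under stated assumptions: the score values are the ones on the slide, but the dictionary keys, the function name and the way nsSNV predictor scores are passed in are hypothetical and not taken from the OncodriveFM code.

```python
# Fixed scores from the slide for variant types that the three
# predictors do not score themselves. Key names are illustrative.
RULE_SCORES = {
    "synonymous": {"SIFT": 1.0, "Polyphen2": 0.0, "MutationAssessor": -2.0},
    "stop_gain": {"SIFT": 0.0, "Polyphen2": 1.0, "MutationAssessor": 3.5},
    "frameshift": {"SIFT": 0.0, "Polyphen2": 1.0, "MutationAssessor": 3.5},
}


def functional_impact(variant_type, predictor_scores=None):
    """Return the FI scores for one variant.

    nsSNVs keep the scores computed by the predictors (supplied by the
    caller); every other supported type gets the rule-based scores.
    """
    if variant_type == "nsSNV":
        if predictor_scores is None:
            raise ValueError("nsSNVs need scores from the predictors")
        return dict(predictor_scores)
    if variant_type not in RULE_SCORES:
        raise ValueError(f"no rule for variant type {variant_type!r}")
    return dict(RULE_SCORES[variant_type])


if __name__ == "__main__":
    print(functional_impact("frameshift"))
    print(functional_impact("nsSNV", {"SIFT": 0.02, "Polyphen2": 0.97, "MutationAssessor": 2.8}))
```

With this lookup, truncating variants sit at the damaging end of every scale and synonymous variants at the benign end, so all variant types can enter the same per-gene comparison.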

Page 20: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

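OncodriveFM method’s details

STEP 2: Compute FM bias per gene

[Diagram: matrix of genes (rows) by samples (columns), coloured by functional impact from low to high, with unmutated cells marked "not mutated"; OncodriveFM marks the genes biased towards high functional impact as driver genes]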
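One way to picture the FM bias of STEP 2 is a resampling test: compare the mean FI score of a gene's mutations with the means of equally sized random draws from all mutations in the cohort. The sketch below assumes this null model and a single FI score per mutation; the function name, the number of resamples and the toy scores are illustrative choices, not the published procedure.

```python
import random
from statistics import mean


def fm_bias_pvalue(gene_scores, all_scores, n_resamples=10000, seed=0):
    """Empirical p-value for a bias towards high functional impact.

    Counts how often a random draw (with replacement) of len(gene_scores)
    scores from `all_scores` has a mean at least as high as the gene's.
    """
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = sum(
        mean(rng.choices(all_scores, k=k)) >= observed
        for _ in range(n_resamples)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Toy cohort: mostly low-impact mutations, a few high-impact ones.
    cohort = [-2.0] * 900 + [3.5] * 100
    print(fm_bias_pvalue([3.5, 3.5, 3.5, 3.5, 3.5], cohort))     # small p-value
    print(fm_bias_pvalue([-2.0, -2.0, -2.0, -2.0, -2.0], cohort))  # no bias
```

Because the null comes from the cohort's own mutations, the test needs no background mutation rate and works for genes with only a handful of mutations.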

Page 21: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveFM method’s details

Compute FM bias per module

[Diagram: matrix of samples by genes grouped into module 1 and module 2, coloured by FI score from low to high (or not mutated); OncodriveFM gives an FM q-value per module]

Page 22: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveFM main advantages

Main advantages of the FM bias approach:

• It does not depend on background mutation rates
• Only needs list of somatic mutations
• It is computationally cheap
• Can identify lowly recurrent mutated driver genes

Page 23: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveFM Results

One example: TCGA Glioblastoma

Gene | FM bias q-value | MutSig q-value
TP53 | 8.5E-11 | <1.0E-8
PTEN | 8.5E-11 | <1.0E-8
EGFR | 8.5E-11 | <1.0E-8
NF1 | 8.5E-11 | <1.0E-8
RB1 | 2.5E-9 | <1.0E-8
FKBP9 | 8.5E-11 | 2.7E-8
ERBB2 | 1.2E-8 | 1.0E-8
PIK3R1 | 1.2E-8 | 1.0E-8
PIK3CA | 2.3E-4 | 1.0E-8
PIK3C2G | 0.002 | 6.1E-5
IDH1 | 8.5E-11 | NA
ZNF708 | 7.4E-10 | NS
FGFR3 | 3.2E-9 | 0.82
CDKN2A | 2.5E-8 | NS
ALDH1A3 | 5.2E-5 | NS
PDGFRA | 1.5E-6 | 0.21
FGFR1 | 2.0E-6 | 0.65
MAPK9 | 2.2E-5 | NS
DCN | 1.5E-6 | NS
PIK3C2A | 6.2E-5 | NS
CHEK2 | 1 | 0.002
PSMD13 | 1 | 0.01
GSTM5 | 1 | 0.009

[Heatmap legend: MA score per sample (not mutated, or from -2 to 5); FM / MutSig q-value colour scale (0, 0.05, 1)]

Gonzalez-Perez and Lopez-Bigas. NAR 2012

Page 24: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveFM Results

TCGA Glioblastoma (2013)

[Scatter plot: OncodriveFM q-value against MutSig q-value. Labelled genes: PIK3R1, PTEN, EGFR, TP53, IDH1, RB1, NF1, BRAF, PIK3CA, SPTA1, KRTAP4-11, GABRA6, KEL, CDH18, RPL5, STAG2, OR8K3, OR5AR1, LZTR1, MYH8]

Page 25: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveFM Results

TCGA BLCA (2013)

[Scatter plot: OncodriveFM q-value against MutSig q-value. Labelled genes: TP53, KDM6A, FBXW7, NFE2L2, EP300, RB1, ERCC2, CDKN1A, ARID1A]

Page 26: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

PIK3CA is recurrently mutated in the same residue in breast tumours

Lowly scored by functional impact metrics

[Plot: number of protein-affecting mutations (y-axis, 0 to 80) by protein position in PIK3CA (x-axis, position 1047 marked); labelled mutation: H1047L]

Page 27: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveCLUST

Finding drivers using regional clustering of mutations

[Plot: number of protein-affecting mutations (y-axis, 0 to 500) by protein position in KRAS (x-axis, 0 to 175), with position 12 marked]

David Tamborero

Tamborero et al., Under review

Page 28: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

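OncodriveCLUST method’s details

[Diagram, panels (I) to (VI), comparing Gene A and Gene B: mutations are counted per amino acid position; positions above a threshold (Th) define clusters (C1 in Gene A; C1 and C2 in Gene B); the gene score is the sum of its cluster scores:

S_geneA = S_C1
S_geneB = S_C1 + S_C2

The gene scores S_geneA and S_geneB are then expressed as Z-scores (Z_A, Z_B) against a background model obtained by calculating the clustering score per gene of the coding-silent mutations.]

Tamborero et al., Under review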
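The clustering idea on this slide (a threshold on mutations per position, clusters, a gene score as the sum of cluster scores, and a Z-score against a silent-mutation background) can be sketched as follows. Everything specific here is an assumption made for illustration: the fixed count threshold, the maximum gap of 5 residues used to merge positions, and the cluster score defined as the fraction of the gene's mutations inside the cluster. The slide does not give these details, so this is not the OncodriveCLUST implementation.

```python
from collections import Counter
from statistics import mean, pstdev


def find_clusters(positions, threshold, max_gap=5):
    """Group positions with at least `threshold` mutations into clusters.

    Neighbouring above-threshold positions are merged when they are at
    most `max_gap` residues apart (illustrative choice).
    """
    counts = Counter(positions)
    seeds = sorted(p for p, c in counts.items() if c >= threshold)
    clusters = []
    for pos in seeds:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters, counts


def gene_clustering_score(positions, threshold, max_gap=5):
    """Gene score = sum of cluster scores (S_gene = S_C1 + S_C2 + ...)."""
    if not positions:
        return 0.0
    clusters, counts = find_clusters(positions, threshold, max_gap)
    total = len(positions)
    # Assumed cluster score: share of the gene's mutations in the cluster.
    return sum(sum(counts[p] for p in cluster) / total for cluster in clusters)


def z_score(gene_score, background_scores):
    """Standardise a gene score against background (e.g. silent) scores."""
    spread = pstdev(background_scores)
    if spread == 0:
        raise ValueError("background scores have no variance")
    return (gene_score - mean(background_scores)) / spread


if __name__ == "__main__":
    # Toy gene with a hotspot around residues 12 and 13.
    hotspot_gene = [12] * 8 + [13] * 3 + [61] * 2 + [100, 146]
    print(gene_clustering_score(hotspot_gene, threshold=2))
    print(gene_clustering_score([10, 50, 90, 130], threshold=2))
```

A gene with scattered mutations scores 0 in this sketch, while a gene with a hotspot scores close to 1, which is the contrast between Gene A and Gene B style profiles that the Z-score then quantifies.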

Page 29: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveCLUST main advantages

Main advantages of the mutation clustering approach:

• It does not depend on background mutation rates
• It is computationally cheap
• Only needs list of somatic mutations
• It is complementary to OncodriveFM

Page 30: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

OncodriveCLUST Results

[Heatmaps for two datasets, BRCA and LUSC. Columns per gene: CGC (Cancer Gene Census phenotype: dominant or recessive), q OncoFM, q OncoCLUST, q MutSig. Corrected p-values are coloured on a scale 0, 0.05, 1, with a separate colour for "Not assessable". Gene significance is obtained by: 3 methods, 2 methods, 1 method, or only by OncodriveCLUST.]

BRCA genes: TP53, CDH1, GATA3, SF3B1, AKT1, MLL3, MAP2K4, RUNX1, PTEN, RB1, MYB, NF1, PIK3CA, GNAS, CBFB, PIK3R1, KRAS, FGFR2, EP300, HLF, ARID1A, MLLT4, JAK2, BRCA1, ARID2, ERBB2, NIN

LUSC genes: TP53, CDKN2A, NFE2L2, FBXW7, PIK3CA, PTEN, NF1, EP300, MLL2, JUN, CDH11, EGFR, NOTCH1, MLL3, RB1, PPP2R1A, GPC3, ABL2, SMARCA4, MYH9, NSD1, TSC1, EBF1, NCOA2, ARID1A, APC, BRCA1, DICER1

Page 31: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Combining methods with complementary principles helps to obtain a more comprehensive and reliable list of cancer drivers

✓ Functional Impact Bias
✓ Mutation Clustering
✓ Mutation Frequency

Page 36: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

IntOGen SM-Analysis pipeline
To interpret catalogs of cancer somatic mutations

Input data: catalogs of tumor somatic mutations

Analysis pipeline (powered by Wok, a workflow management system):

✓ Identify consequences of mutations (Ensembl VEP)
✓ Assess functional impact of nsSNVs (SIFT, PPH2, MA and TransFIC)
✓ Compute frequency of mutations per gene and pathway
✓ Identify candidate driver genes (OncodriveFM and OncodriveCLUST)

Browser: http://beta.intogen.org

Currently: 27 projects, 12 cancer sites, 3329 tumours

Christian Perez-Llamas
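The "frequency of mutations per gene" step of the pipeline can be pictured as the fraction of tumours in a project that carry at least one mutation in each gene. The sketch below assumes mutations arrive as (sample, gene) pairs; that record layout and the function name are hypothetical and do not describe the IntOGen pipeline's actual input format.

```python
from collections import defaultdict


def mutation_frequency_per_gene(mutations, n_samples):
    """Fraction of samples with at least one mutation in each gene.

    `mutations` is an iterable of (sample_id, gene) pairs; several
    mutations of the same gene in one sample are counted once.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in mutations:
        samples_by_gene[gene].add(sample_id)
    return {gene: len(samples) / n_samples for gene, samples in samples_by_gene.items()}


if __name__ == "__main__":
    records = [("s1", "TP53"), ("s1", "TP53"), ("s2", "TP53"), ("s3", "KRAS")]
    print(mutation_frequency_per_gene(records, n_samples=4))
```

The same counting works per pathway if each gene is first mapped to the pathways it belongs to.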

Page 37: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

27 cancer sequencing datasets analysed so far

CANCER SITE | AUTHORS | SOURCE | Number of Samples
brain | TCGA | TCGA DATA PORTAL | 248
brain | DKFZ | ICGC DCC | 114
brain | Johns Hopkins University | ICGC DCC | 88
breast | TCGA | TCGA DATA PORTAL | 510
breast | Broad Institute | PubMed | 102
breast | WTSI | ICGC DCC | 100
breast | Washington University School of Medicine | PubMed | 75
breast | University of British Columbia | PubMed | 65
breast | Johns Hopkins University | ICGC DCC | 41
colon | TCGA | TCGA DATA PORTAL | 105
colon | Johns Hopkins University | ICGC DCC | 34
corpus uteri | TCGA | TCGA DATA PORTAL | 247
hematopoietic | CLL-ICGC | ICGC DCC | 109
hematopoietic | Dana-Farber Cancer Institute | PubMed | 90
kidney | TCGA | TCGA DATA PORTAL | 298
liver and bile ducts | IACR | ICGC DCC | 24
lung and bronchus | TCGA | TCGA DATA PORTAL | 177
lung and bronchus | Washington University School of Medicine | ICGC DCC | 156
lung and bronchus | Johns Hopkins University | PubMed | 43
lung and bronchus | Medical College of Wisconsin | PubMed | 31
lung and bronchus | University of Cologne | PubMed | 26
oropharynx | Broad Institute | PubMed | 74
ovary | TCGA | TCGA DATA PORTAL | 337
pancreas | Johns Hopkins University | ICGC DCC | 113
pancreas | Queensland Centre for Medical Genomics | ICGC DCC | 67
pancreas | Ontario Institute for Cancer Research | ICGC DCC | 33
stomach | Pfizer Worldwide Research and Development | PubMed | 22

Total = 3329

Page 39: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

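Combining results across projects

[Diagram: for each project of cancer site A (project 1, project 2, project 3, project 4, ...), OncodriveFM takes the matrix of genes by samples coloured by functional impact (high, low, no mutation) and gives a p-value per gene (colour scale 0, 0.05, 1); the per-project p-values are then combined into one result per gene for cancer site A]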
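The slide does not say how the per-project p-values are combined. As one possible illustration, the sketch below uses Fisher's method, a standard way to merge independent p-values for the same gene; choosing it here is an assumption, and the function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2n degrees of freedom; for even degrees of freedom the survival
    function has the closed form exp(-x/2) * sum_{k<n} (x/2)^k / k!.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0 or p > 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half = -sum(math.log(p) for p in pvalues)  # statistic / 2
    term = 1.0
    total = 1.0
    for k in range(1, len(pvalues)):
        term *= half / k
        total += term
    return min(1.0, math.exp(-half) * total)


if __name__ == "__main__":
    # Hypothetical p-values for one gene in four projects of one cancer site.
    print(fisher_combined_pvalue([0.04, 0.20, 0.01, 0.30]))
```

A gene that is only weakly significant in each project can become clearly significant once the projects of a cancer site are pooled, which is the point of the combination step.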

Page 42: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Onexus creates IntOGen web discovery tool

Tabulated files → Web discovery tool

Powered by Onexus: www.onexus.org

Jordi Deu-Pons

Page 43: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

http://beta.intogen.org

Page 54: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

[Figure: mutation frequency of KRAS, TP53, SMAD4, CDKN2A and SMARCA4]

Page 56: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

http://beta.intogen.org/analysis

Page 57: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Use case 1: Cohort analysis

User’s data (tumor somatic mutations per sample) → SM pipeline → user’s private browser

• View matrix of mutated genes per sample
• See predicted impact of mutations
• Find cancer driver genes
• Find FM-biased pathways
• Explore the results in the context of accumulated knowledge in IntOGen

Use case 2: Single sample analysis

User’s data (tumor somatic mutations in one tumor) → SM pipeline → user’s private browser

• See predicted impact of mutations
• Find recurrent mutations found in IntOGen
• Find mutations in candidate driver genes found in IntOGen

Page 61: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

[Overview diagram: Data, Computational methods, Analysis and Results, towards the mechanisms of tumorigenesis]

PanCancer project

Page 62: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Visualization and analysis of genomic data using Interactive Heatmaps

http://www.gitools.org Perez-Llamas and Lopez-Bigas. PLoS ONE 2011

Christian Perez-Llamas

Page 63: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Multidimensional heatmaps

Michael P. Schroeder

Sort by mutually exclusive alterations

Schroeder MP, Gonzalez-Perez A and Lopez-Bigas N. Visualizing multidimensional cancer genomics data. Genome Medicine. 2013, 5:9

Page 64: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Summary

• OncodriveFM and OncodriveCLUST are complementary methods to identify cancer drivers

• Oncodrive methods are scalable and robust

• IntOGen contains results of analysing more than 3000 tumours to identify cancer drivers across sites

• IntOGenSM pipeline is available to run your own projects

• TCGA PanCancer analysis on the way

• Gitools (interactive heatmaps) is useful to explore multidimensional cancer genomics data

Page 65: Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop

Biomedical Genomics Lab

@bbglab @nlbigas

Gunes Gundem

Christian Perez-Llamas

Jordi Deu-Pons

Michael Schroeder

Alba Jené-Sanz

Nuria Lopez-Bigas David Tamborero Abel Gonzalez-Perez

Alberto Santos

http://bg.upf.edu/blog