45
gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella, Mauricio Soto Gomez, Murray Patterson, Gianluca Della Vedova, Iman Hajirasouliha and Paola Bonizzoni ICCABS, Las Vegas 2018

gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell dataSimone Ciccolella, Mauricio Soto Gomez, Murray Patterson, Gianluca Della Vedova, Iman Hajirasouliha and Paola Bonizzoni

ICCABS, Las Vegas 2018

Page 2: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Introduction

Simone Ciccolella ICCABS 2018

• Cancer phylogeny

• Mutation losses

• gpps

• Experimental results

Page 3: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Cancer evolution

Simone Ciccolella ICCABS 2018

Page 4: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Cancer evolution

Simone Ciccolella ICCABS 2018

Sequencing

Page 5: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Infinite Sites Assumption

Simone Ciccolella ICCABS 2018

• The most used assumption for the inference of cancer evolutions• Permits the use of the simplest phylogeny model

No two mutations can occur at the same locus (site). Kimura, Genetics, 1969.

Page 6: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Infinite Sites Assumption

Simone Ciccolella ICCABS 2018

• The most used assumption for the inference of cancer evolutions• Permits the use of the simplest phylogeny model

No two mutations can occur at the same locus (site). Kimura, Genetics, 1969.

• “Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity for more effective cancer treatment.”

Kuipers et al., Genome Research, 2017.

• “In genomically unstable cancers, deletion of large chromosomal segments is common.”

Brown et al., Nature 8, 2017.

Page 7: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Phylogenies: Perfect vs Dollo

Simone Ciccolella ICCABS 2018

Perfect Phylogeny

BA

FC D

E

Each mutation is acquired once in the evolutionaryhistory

H

Page 8: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Phylogenies: Perfect vs Dollo

Simone Ciccolella ICCABS 2018

Perfect Phylogeny Dollo-k Phylogeny

BA

FC D

E

Each mutation is acquired once in the evolutionaryhistory

BA

FC D

A1–

E

B1–

G

A2–

H

I

Each mutation is acquired once, but it can be lost at most k times in the evolutionary history

H

Page 9: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Loss of a mutation

Simone Ciccolella ICCABS 2018

……

……

Page 10: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Loss of a mutation

Simone Ciccolella ICCABS 2018

……

……

……

Page 11: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Loss of a mutation

Simone Ciccolella ICCABS 2018

Page 12: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Loss of a mutation

Simone Ciccolella ICCABS 2018

……

……

Page 13: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Loss of a mutation

Simone Ciccolella ICCABS 2018

……

……

……

Page 14: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Loss of a mutation

Simone Ciccolella ICCABS 2018

Page 15: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Single Cell Sequencing

Simone Ciccolella ICCABS 2018

Page 16: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Available methods for SCS

Simone Ciccolella ICCABS 2018

SCITE:

• Markov Chain Monte Carlo (MCMC) maximum likelihood tree search

• Relies on the Perfect Phylogeny model• Produces solutions with respect to the Infinite

Site Assumption

Tree inference for single-cell data. Jahn K., Kuipers J. and Beerenwinkel N., Genome Biology, 2016.

Page 17: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Available methods for SCS

Simone Ciccolella ICCABS 2018

SCITE:

• Markov Chain Monte Carlo (MCMC) maximum likelihood tree search

• Relies on the Perfect Phylogeny model• Produces solutions with respect to the Infinite

Site Assumption

SiFit:

• Hidden Markov Model (HMM) maximum likelihood tree search

• Does not impose any specific phylogeny model

• Can produce solutions that violate the Infinite Site Assumption

SiFit: inferring tumor trees from single-cell sequencing data under finite-sites models.

Zafar H., Tzen A., Navin N., Chen K. and Nakhleh L., Genome Biology, 2017.

Tree inference for single-cell data. Jahn K., Kuipers J. and Beerenwinkel N., Genome Biology, 2016.

Page 18: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Infinite Sites Assumption• Once a mutation arises it is not possible to lose it

A B C Ds1 0 1 1 1s2 1 0 1 0s3 0 0 1 0s4 0 0 1 1s5 0 1 1 1

E11000

For each pair of columns (mutations) there are no three rows with all the three configurations (0,1),(1,0),(1,1)

Simone Ciccolella ICCABS 2018

Page 19: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Infinite Sites Assumption• Once a mutation arises it is not possible to lose it

A B C Ds1 0 1 1 1

s2 1 0 1 0

s3 0 0 1 0

s4 0 0 1 1

s5 0 1 1 1

E11000

For each pair of columns (mutations) there are no three rows with all the three configurations (0,1),(1,0),(1,1)

Simone Ciccolella ICCABS 2018

Page 20: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Infinite Sites Assumption• Once a mutation arises it is not possible to lose it

A B C Ds1 0 1 1 1s2 1 0 1 0s3 0 0 1 0s4 0 0 1 1s5 0 1 1 1

E1

1

0

0

0

For each pair of columns (mutations) there are no three rows with all the three configurations (0,1),(1,0),(1,1)

Simone Ciccolella ICCABS 2018

Page 21: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Infinite Sites Assumption• Once a mutation arises it is not possible to lose it

A B C Ds1 0 1 1 1s2 1 0 1 0s3 0 0 1 0s4 0 0 1 1s5 0 1 1 1

E1

1

0

0

0

For each pair of columns (mutations) there are no three rows with all the three configurations (0,1),(1,0),(1,1)

Simone Ciccolella ICCABS 2018

A binary matrix 𝑀 is a Dollo Phylogeny if the extended matrix 𝑀3 is a Perfect Phylogeny

Bonizzoni etal.,TCBB2018

Page 22: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Dollo extended matrix

A B C D E

s1 0 1 1 1 1

s2 1 0 1 0 1

s3 0 0 1 0 0

s4 0 0 1 1 0

s5 0 1 1 1 0

Simone Ciccolella ICCABS 2018

Page 23: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Dollo extended matrix

A A- B B- C C- D D- E E-

s1 0 1 1 1 1

s2 1 0 1 0 1

s3 0 0 1 0 0

s4 0 0 1 1 0

s5 0 1 1 1 0

Simone Ciccolella ICCABS 2018

Page 24: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Dollo extended matrix

A A- B B- C C- D D- E E-

s1 0 0 1 1 1 0 1 0 1 1

s2 1 0 0 0 1 0 0 0 1 0

s3 0 0 0 0 1 1 0 0 0 0

s4 0 0 0 0 1 0 1 0 0 0

s5 0 0 1 0 1 0 1 1 0 0

Simone Ciccolella ICCABS 2018

Page 25: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Dollo extended matrix

A A- B B- C C- D D- E E-

s1 0 0 1 1 1 0 1 0 1 1

s2 1 0 0 0 1 0 0 0 1 0

s3 0 0 0 0 1 1 0 0 0 0

s4 0 0 0 0 1 0 1 0 0 0

s5 0 0 1 0 1 0 1 1 0 0

Simone Ciccolella ICCABS 2018

Mutation A in s1 is never acquired and never lost

Page 26: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Dollo extended matrix

A A- B B- C C- D D- E E-

s1 0 0 1 1 1 0 1 0 1 1

s2 1 0 0 0 1 0 0 0 1 0

s3 0 0 0 0 1 1 0 0 0 0

s4 0 0 0 0 1 0 1 0 0 0

s5 0 0 1 0 1 0 1 1 0 0

Simone Ciccolella ICCABS 2018

Mutation C in s5 is acquired but never lost

Page 27: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Dollo extended matrix

A A- B B- C C- D D- E E-

s1 0 0 1 1 1 0 1 0 1 1

s2 1 0 0 0 1 0 0 0 1 0

s3 0 0 0 0 1 1 0 0 0 0

s4 0 0 0 0 1 0 1 0 0 0

s5 0 0 1 0 1 0 1 1 0 0

Simone Ciccolella ICCABS 2018

Mutation E in s1 is acquired and then lost

Page 28: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: ILP

Simone Ciccolella ICCABS 2018

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚

if𝐼 𝑐,𝑚 = 1if𝐼 𝑐,𝑚 = 0

Page 29: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: ILP

Simone Ciccolella ICCABS 2018

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚

if𝐼 𝑐,𝑚 = 1if𝐼 𝑐,𝑚 = 0

𝛼 = false negative rate

Page 30: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: ILP

Simone Ciccolella ICCABS 2018

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚

if𝐼 𝑐,𝑚 = 1if𝐼 𝑐,𝑚 = 0

𝛼 = false negative rate𝛽 = false positive rate

Page 31: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: ILP

Simone Ciccolella ICCABS 2018

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚

if𝐼 𝑐,𝑚 = 1if𝐼 𝑐,𝑚 = 0

𝛼 = false negative rate𝛽 = false positive rate𝐼 = Input matrix

Page 32: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: ILP

Simone Ciccolella ICCABS 2018

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚

if𝐼 𝑐,𝑚 = 1if𝐼 𝑐,𝑚 = 0

𝛼 = false negative rate𝛽 = false positive rate𝐼 = Input matrix𝐹 = Inferred matrix

Page 33: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: ILP

Simone Ciccolella ICCABS 2018

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚

if𝐼 𝑐,𝑚 = 1if𝐼 𝑐,𝑚 = 0

𝛼 = false negative rate𝛽 = false positive rate𝐼 = Input matrix𝐹 = Inferred matrix max7 7 log(𝑃 𝐼(𝑐,𝑚) 𝐹(𝑐,𝑚 )

?

B=

Page 34: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: ILP

Simone Ciccolella ICCABS 2018

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

• Additional set of constraints used to ensure a Dollo model on the tree

Gusfield etal.,ReCombinatorics 2014

Bonizzoni etal.,TCBB2018

Gusfield,RECOMB 2002

Page 35: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: Hill Climbing

Simone Ciccolella ICCABS 2018

Continues the search process after the ILP is stopped

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚

2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚…

ILP Solver

Page 36: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: Hill Climbing

Simone Ciccolella ICCABS 2018

Continues the search process after the ILP is stopped

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚

2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚…

ILP SolverTimeout

Sub-optimal solution

Page 37: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: Hill Climbing

Simone Ciccolella ICCABS 2018

Continues the search process after the ILP is stopped

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚

2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚…

ILP SolverTimeout

Sub-optimal solution

Hill Climbing

Page 38: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: Hill Climbing

Simone Ciccolella ICCABS 2018

Continues the search process after the ILP is stopped

max7 7 log𝑤(𝑐,𝑚)�

?∈A

B∈C

1)𝑤 𝑐,𝑚 = 1 − 𝛼 𝐹 𝑐,𝑚 + 𝛽 1 − 𝐹 𝑐,𝑚

2)𝑤 𝑐,𝑚 = 𝛼𝐹 𝑐,𝑚 + (1 − 𝛽) 1 − 𝐹 𝑐,𝑚…

ILP SolverTimeout

Sub-optimal solution

Hill Climbing

Final solution

Page 39: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: Hill Climbing

Simone Ciccolella ICCABS 2018

Continues the search process using the Subtree Prune and Reattach operation

Page 40: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: Hill Climbing

Simone Ciccolella ICCABS 2018

Continues the search process using the Subtree Prune and Reattach operation

A

B C

D E F

G HI

PruneReattach

Page 41: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

gpps: Hill Climbing

Simone Ciccolella ICCABS 2018

Continues the search process using the Subtree Prune and Reattach operation

A

B C

D E F

G HI

PruneReattach

A

B C

D E

F

G H

I

Page 42: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Results: Simulated data

Simone Ciccolella ICCABS 2018

• Ancestor-Descendant accuracy:Pairs of mutations in Ancestor-Descendant relationship correctly inferred

• Different lineages accuracy:Pairs of mutations in different branches correctly inferred

F1 scores

Ancestor-Descendant accuracy

Different-Lineages accuracy

gpps gpps

Page 43: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Real data: ER Breast Cancer

Simone Ciccolella ICCABS 2018

Data from: Wang et al. Nature, 2014

Tree inferred by gpps

Bold-faced mutations are drivers and red nodes are losses

DKEZ,DNM3,TRIM58,C15orf23,DCAF8L1,FBN2,H1ENT,TECTA,

ZNE318,CASP3,FUBP3,ITGAD,MUTHY,PPP2RE,LSG1,RABGAP1L,ROPN1B,BTLA,

DUSP12,CABP2,MARCH11,PAMK3,TCP11,CALD1,PITRM1,PRDM9,PIK3CA,GLCE,FCHSD2,SEC11A,

TRIB2,CNDP1,GPR64,FGFR2,CXXX1,c1orf223,KIAA1539,WDR16,PLXNA2

ZEHX4 CNDP1,TECTA,CXXX1

c1orf223,ZNE318,ITGAD,RABGAP1L

Manually curated clonal evolution

Page 44: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Conclusions

Simone Ciccolella ICCABS 2018

• gpps is an accurate tool for inferring intra-tumor progression and subclonal composition from SCS data

• gpps infers mutation losses employing a Dollo model

• gpps provides a new progression model on Single Cell data

• Future directions: 1. Explore different heuristics (Genetic Programming)2. Include the possible presence of doublets to the model

Page 45: gpps: an ILP-based approach for inferring cancer ... · gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data Simone Ciccolella,

Thank you!