14
BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies Thomas Sakoparnig Feb 05, 2016 1 / 15

BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

BitPhylogeny: a probabilistic framework forreconstructing intra-tumor phylogenies

Thomas Sakoparnig

Feb 05, 2016

1 / 15

Page 2: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Intra-tumor view of carcinogenesis

Figure: Nik-Zainal et al., (2012) Cell

2 / 15

Page 3: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Intra-tumor tree of one breast cancer sample

Figure: Nik-Zainal et al., (2012) Cell3 / 15

Page 4: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Intra-tumor phylogenies

Poly-clonal tumor

Clones Normal

A B CClassical phylogenetic trees,hierarchical clustering

Di�erentiation hierarchy

10%

60%30%

Goal: probabilistic model for differentiation hierarchy

4 / 15

Page 5: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Tree structured Dirichlet process - probabilistic clusteringprior

(a) Dirichlet process stick breaking

(b) Tree-structured stick breaking

Figure 1: a) Dirichlet process stick-breaking procedure, with a linear partitioning. b) Interleaving two stick-breaking processes yields a tree-structured partition. Rows 1, 3 and 5 are ν-breaks. Rows 2 and 4 are ψ-breaks.

“prototypes”) should be able to live at internal nodes in the tree, and 2) as the ancestor/descendantrelationships are not known a priori, the data should be infinitely exchangeable.

2 A Tree-Structured Stick-Breaking Process

Stick-breaking processes based on the beta distribution have played a prominent role in the develop-ment of Bayesian nonparametric methods, most significantly with the constructive approach to theDirichlet process (DP) due to Sethuraman [10]. A random probability measure G can be drawn froma DP with base measure αH using a sequence of beta variates via:

G =∞�

i=1

πi δθiπi = νi

i−1�i�=1

(1− νi�) θi ∼ H νi ∼ Be(1, α) π1 = ν1. (1)

We can view this as taking a stick of unit length and breaking it at a random location. We call theleft side of the stick π1 and then break the right side at a new place, calling the left side of this newbreak π2. If we continue this process of “keep the left piece and break the right piece again” as inFig. 1a, assigning each πi a random value drawn from H , we can view this is a random probabilitymeasure centered on H . The distribution over the sequence (π1, π2, · · · ) is a case of the GEMdistribution [11], which also includes the Pitman-Yor process [12]. Note that in Eq. (1) the θi are i.i.d.from H; in the current paper these parameters will be drawn according to a hierarchical process.

The GEM construction provides a distribution over infinite partitions of the unit interval, with naturalnumbers as the index set as in Fig. 1a. In this paper, we extend this idea to create a distribution overinfinite partitions that also possess a hierarchical graph topology. To do this, we will use finite-lengthsequences of natural numbers as our index set on the partitions. Borrowing notation from the Polyatree (PT) construction [13], let �=(�1, �2, · · · , �K), denote a length-K sequence of positive integers,i.e., �k∈N+. We denote the zero-length string as �=ø and use |�| to indicate the length of �’ssequence. These strings will index the nodes in the tree and |�| will then be the depth of node �.

We interleave two stick-breaking procedures as in Fig. 1b. The first has beta variates ν�∼Be(1, α(|�|))which determine the size of a given node’s partition as a function of depth. The second has betavariates ψ�∼Be(1, γ), which determine the branching probabilities. Interleaving these processespartitions the unit interval. The size of the partition associated with each � is given by

π� = ν�ϕ�

���≺�

ϕ��(1− ν��) ϕ��i= ψ��i

�i−1�j=1

(1− ψ�j) πø = νø, (2)

where ��i denotes the sequence that results from appending �i onto the end of �, and ��≺� indicatesthat � could be constructed by appending onto ��. When viewing these strings as identifying nodes ona tree, {��i : �i∈1, 2, · · · } are the children of � and {�� : ��≺�} are the ancestors of �. The {π�} inEq. (2) can be seen as products of several decisions on how to allocate mass to nodes and branches inthe tree: the {ϕ�} determine the probability of a particular sequence of children and the ν� and (1−ν�)terms determine the proportion of mass allotted to � versus nodes that are descendants of �.

2

Adams et al. (2010) NIPS

I Nodes correspond to clonesI Data is placed in nodes

5 / 15

Page 6: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Flexible framework

C

C

AA

AA

AA ...ACTACAGCAC..

...ACTACAGCAC..

...ACTACCGCAC..

...ACTACAGCAC..

...ACTACAGCAC..

...ACTACCGCAC..

...ACTACAGCAC..

...ACTACAGCAC..

...ACTACAGCAC..

I transition kernel - clone parameters depend on parent clonesI provides link to classical phylogenyI back mutation for methylationsI no back mutation for single nucleotide variants

6 / 15

Page 7: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Performance - clustering and tree summaries

mutator phenotype

0 0.01 0.02 0.05

0.0

0.2

0.4

0.6

0.8

1.0

v−m

easu

re

0 0.01 0.02 0.05

0.5

0.6

0.7

0.8

0.9

1.0

v−m

easu

re

tracesBitPhylogenyk−centroidshierarchical clustering

0 0.01 0.02 0.05

0.5

0.6

0.7

0.8

0.9

1.0

v−m

easu

re

error

5 10 15 20

05

1015

mono clone t[2]

mon

o_cl

one_

t[1]

truthBitPhylogenyk−centroidshierarchical clustering

max

imum

tree

dep

th

5 10 15 20 25 30

hyper clone t[2]

hype

r_cl

one_

t[1]

05

1015

max

imum

tree

dept

h

noiselessnoise levels:0.01,0.02,0.05

15 20 25 30

05

1015

20

number of clones

max

imum

tree

dep

thm

axim

um tr

ee d

epth

number of nodes

polyclonal

monclonal

7 / 15

Page 8: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Colon cancer

I about 10.000 cells persample

I Bisulfite sequencing(bulk sequencing)

I IRX2 locus: 201 bplocus

I span 8 CpG regionSottoriva et al. (2013)Cancer Research

8 / 15

Page 9: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Tumor I

0.0

0.2

0.4

0.6

0.8

1

CT_L2 CT_L3 CT_L7 CT_L8 CT_R1 CT_R4 CT_R5 CT_R6

A

0.0

0.5

1.0

1.5

2.0

2 3 4 5Maximum tree depth

Densi

ty

B

0 2 4 6 8 10 12

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Total branch legnth

Densi

ty

C

left sideright side

Left side Right side

9 / 15

Page 10: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Tumor II

0 2 4 6 8 10 12

0.0

0.1

0.2

0.3

0.4

0.5

Total branch length

Densi

ty

Number of clonesM

axim

um

tre

e d

ep

th4 6 8 10 12

2.6

2.8

3.0

3.2

left sideright side

10 / 15

Page 11: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Leukemia - Myeloproliferative neoplasm

I 56 cancer cellsI whole-exome

sequencingI 712 SNVsI 43 % allelic dropout

rateI assumption: infinite

sites modelHou et. al (2013) Cell

11 / 15

Page 12: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Clonal hierarchy

a:1

00

e:2

d:6

b:7

c:34

f:3

h:2i:4

subtree 2

subtree 1

subtree 3

g:1

0

12 / 15

Page 13: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Discussion

I Probabilistic model for intra-tumor phylogeny reconstructionI Faster inference neededI Comparison method for evolutionary trees is lackingI Vision: patient stratification

13 / 15

Page 14: BitPhylogeny: a probabilistic framework for reconstructing ...€¦ · 2 A Tree-Structured Stick-Breaking Process Stick-breaking processes based on the beta distribution have played

Acknowledgement

I Niko BeerenwinkelI Florian MarkowetzI Ke Yuan

14 / 15