Beta Tucker Decomposition for DNA Methylation Data
Aaron Schein UMass Amherst
Mingyuan Zhou Univ. Texas at Austin
Hanna Wallach Microsoft Research
Joint work with:
Pat Flaherty UMass Amherst
Dan Sheldon UMass Amherst
DNA methylation
CATTCCGCCTTCTCTCCCGAGG
CpG dinucleotides (a C followed by a G) are either methylated (M) or unmethylated
DNA methylation
[Figure: a long DNA sequence containing a gene, with a CpG-rich stretch highlighted as a CpG island (often in the promoter region)]
DNA methylation
[Figure: the same sequence with many CpG sites in the CpG island methylated (M)]
Gene is silenced
DNA methylation
[Figure: the same sequence with few CpG sites in the CpG island methylated (M)]
Gene is expressed
Abnormal DNA methylation
It can contribute to cancer:
• Hypomethylation of oncogenes
• Hypermethylation of tumor suppressor genes
[Baylin & Ohm (2006)]
Cancer taxonomies
Samples 1 and 2: "Breast cancer"
Samples 3 and 4: "Ovarian cancer"
Cancer taxonomies
Anatomically similar cancer cells may be genetically different
Anatomically different cancer cells may be genetically similar
Cancer taxonomies
Goal: Develop new taxonomies based on genetic information
ML solution: Unsupervised dimensionality reduction (PCA, NMF, ICA, …)
[Flusberg et al. (2010)] [Teschendorff et al. (2007)] [Wang et al. (2006)]
DNA methylation data
$\beta_{ij}$ = how methylated locus $j$ is in sample $i$, with $\beta_{ij} \in [0, 1]$
[Figure: matrix of beta values with rows Samples 1 to 4 and columns Loci 1 to 6]
CP decomposition
[Figure: the samples × loci matrix (Samples 1 to 4, Loci 1 to 6) approximated (≈) by the product of Θ (samples × components) and Φ (components × loci), with K "components" (k = 1, 2, 3)]
$\beta_{ij} \approx \sum_{k=1}^{K} \theta_{ik}\, \phi_{kj}$
$\beta_{ij} \approx \sum_{k=1}^{K} \theta_{ik}\, \pi_k\, \phi_{kj}$
CP decomposition
[Figure: the same approximation drawn as Θ × Π × Φ, where the K × K core Π holds the component weights $\pi_k$ on its diagonal]
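To make the weighted CP formula concrete, here is a minimal plain-Python sketch that reconstructs $\hat\beta_{ij} = \sum_k \theta_{ik}\,\pi_k\,\phi_{kj}$ from nested lists. The function name `cp_reconstruct` and the toy factor values are made up for illustration.

```python
def cp_reconstruct(theta, pi, phi):
    """Weighted CP reconstruction: beta_hat[i][j] = sum_k theta[i][k] * pi[k] * phi[k][j].

    theta: samples x K, pi: K component weights, phi: K x loci (plain nested lists).
    """
    n_samples, n_loci, K = len(theta), len(phi[0]), len(pi)
    return [
        [sum(theta[i][k] * pi[k] * phi[k][j] for k in range(K)) for j in range(n_loci)]
        for i in range(n_samples)
    ]


# Made-up toy factors: 2 samples, 3 loci, K = 2 components.
theta = [[1.0, 0.0],
         [0.5, 0.5]]
pi = [0.9, 0.1]
phi = [[1.0, 0.0, 0.5],
       [0.0, 1.0, 0.5]]
beta_hat = cp_reconstruct(theta, pi, phi)
print(beta_hat)
```

For these made-up factors the reconstruction is 0.9, 0, 0.45 for sample 1 and 0.45, 0.05, 0.25 for sample 2, up to floating-point rounding.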
Tucker decomposition
[Figure: the samples × loci matrix approximated (≈) by Θ (samples × clusters) × Π (clusters × components) × Φ (components × loci), with C "clusters" (c = 1, 2) and K "components" (k = 1, 2, 3)]
$\beta_{ij} \approx \sum_{c=1}^{C} \theta_{ic} \sum_{k=1}^{K} \pi_{ck}\, \phi_{kj}$
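A matching plain-Python sketch for the Tucker formula, with hypothetical names and toy values. It also checks the relationship to CP: with C = K and a diagonal core holding the weights $\pi_k$, the Tucker sum reduces to the weighted CP sum.

```python
def tucker_reconstruct(theta, core, phi):
    """Tucker reconstruction: beta_hat[i][j] = sum_c theta[i][c] * sum_k core[c][k] * phi[k][j].

    theta: samples x C, core: C x K (the matrix Pi), phi: K x loci.
    """
    n_samples, n_loci = len(theta), len(phi[0])
    C, K = len(core), len(core[0])
    return [
        [
            sum(theta[i][c] * sum(core[c][k] * phi[k][j] for k in range(K)) for c in range(C))
            for j in range(n_loci)
        ]
        for i in range(n_samples)
    ]


# Made-up toy factors with C = 2 clusters, K = 3 components, 4 loci.
theta = [[1.0, 0.0], [0.25, 0.75]]   # each row sums to 1
core = [[0.9, 0.5, 0.1],
        [0.2, 0.5, 0.8]]             # entries in [0, 1]
phi = [[1.0, 0.0, 0.0, 0.5],
       [0.0, 1.0, 0.0, 0.5],
       [0.0, 0.0, 1.0, 0.0]]         # each column sums to 1
beta_hat = tucker_reconstruct(theta, core, phi)

# CP as a special case: C = K and a diagonal core holding the weights pi_k.
pi = [0.9, 0.1]
theta_cp = [[1.0, 0.0], [0.5, 0.5]]
phi_cp = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
diag_core = [[pi[c] if c == k else 0.0 for k in range(2)] for c in range(2)]
tucker_cp = tucker_reconstruct(theta_cp, diag_core, phi_cp)
cp_direct = [[sum(theta_cp[i][k] * pi[k] * phi_cp[k][j] for k in range(2)) for j in range(3)]
             for i in range(2)]
max_gap = max(abs(a - b) for ra, rb in zip(tucker_cp, cp_direct) for a, b in zip(ra, rb))
print(beta_hat)
print(max_gap)
```

The non-diagonal core is what lets a cluster of samples load on several components at once, with weight $\pi_{ck}$ for component $k$.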
Our contributions:
• Novel generative model
  • Based on the Tucker decomposition
  • Matches the true data-generating process
    ✓ Beta likelihood
    ✓ Latent variables correspond to real quantities
    ✓ Priors match known sources of noise
• Gibbs sampler with closed-form conditionals
Beta Tucker decomposition
Is it better than PCA/NMF/ICA/etc. in practice?
• Comparable performance on (contrived) prediction tasks
Is it better than PCA/NMF/ICA/etc. in theory?
• Yes
• ??
[Ma et al. (2015)]
DNA methylation data
[Figure: the matrix of beta values again; zooming in on sample i and locus j shows the individual CpG sites in that locus, some methylated (M) and some unmethylated]
DNA methylation data
$y^{(m)}_{ij}$ = number of methylated CpG sites in locus $j$ of sample $i$
$y^{(u)}_{ij}$ = number of unmethylated CpG sites in locus $j$ of sample $i$
[Figure: the CpG sites in locus j of sample i] [Wang & Petronis (2008)]
DNA methylation data
$\lambda^{(m)}_{ij}$, $\lambda^{(u)}_{ij}$: two real-valued fluorescent intensities for locus $j$ of sample $i$ [Wang & Petronis (2008)]
DNA methylation data
$\beta_{ij} := \frac{\lambda^{(m)}_{ij}}{\lambda^{(m)}_{ij} + \lambda^{(u)}_{ij}}$ (the "beta value")
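The beta value is a one-line computation per locus. A sketch, assuming strictly positive intensities held in two parallel lists for one sample; the function name `beta_values` is hypothetical.

```python
def beta_values(meth, unmeth):
    """beta_ij := lam_m / (lam_m + lam_u) for one sample, one entry per locus.

    Assumes strictly positive intensities, so each beta value lies in (0, 1).
    """
    if len(meth) != len(unmeth):
        raise ValueError("need one methylated and one unmethylated intensity per locus")
    return [m / (m + u) for m, u in zip(meth, unmeth)]


# Made-up intensities for three loci of one sample.
print(beta_values([900.0, 100.0, 500.0], [100.0, 900.0, 500.0]))  # prints [0.9, 0.1, 0.5]
```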
[Figure: histogram of the intensities $\{\lambda^{(m)}_{ij}, \lambda^{(u)}_{ij}\}_{j=1}^{J}$ for a given sample $i$]
The histogram is modeled with gamma distributions that share a sample-specific rate $c_i$:
$\lambda^{(m)}_{ij} \sim \mathrm{Gam}(\cdots, c_i)$, $\lambda^{(u)}_{ij} \sim \mathrm{Gam}(\cdots, c_i)$
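Gamma-Beta relationship
If $\lambda_1 \sim \mathrm{Gam}(\alpha_1, c)$ and $\lambda_2 \sim \mathrm{Gam}(\alpha_2, c)$ independently, then $\frac{\lambda_1}{\lambda_1 + \lambda_2} \sim \mathrm{Beta}(\alpha_1, \alpha_2)$.
Applied to the intensities: $\lambda^{(m)}_{ij} \sim \mathrm{Gam}(\cdots, c_i)$ and $\lambda^{(u)}_{ij} \sim \mathrm{Gam}(\cdots, c_i)$ with $\beta_{ij} := \frac{\lambda^{(m)}_{ij}}{\lambda^{(m)}_{ij} + \lambda^{(u)}_{ij}}$ give $\beta_{ij} \sim \mathrm{Beta}(\cdots, \cdots)$.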
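The Gamma-Beta relationship is easy to check by simulation. This sketch (plain Python, made-up shapes $\alpha_1 = 2$, $\alpha_2 = 5$) draws the two gammas with a shared rate and compares the sample moments of the ratio with those of $\mathrm{Beta}(\alpha_1, \alpha_2)$, for several rates. Python's `random.gammavariate` takes a shape and a *scale*, so the scale passed in is 1 / rate; the helper names are hypothetical.

```python
import random


def ratio_of_gammas(a1, a2, rate, rng):
    """Draw lam1 ~ Gam(a1, rate), lam2 ~ Gam(a2, rate) independently; return lam1 / (lam1 + lam2)."""
    # random.gammavariate(alpha, beta) uses shape alpha and scale beta, so scale = 1 / rate.
    lam1 = rng.gammavariate(a1, 1.0 / rate)
    lam2 = rng.gammavariate(a2, 1.0 / rate)
    return lam1 / (lam1 + lam2)


def mean_and_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)


rng = random.Random(0)
a1, a2, n = 2.0, 5.0, 50_000
# Moments of Beta(a1, a2); neither involves the shared rate.
beta_mean = a1 / (a1 + a2)
beta_var = a1 * a2 / ((a1 + a2) ** 2 * (a1 + a2 + 1))

results = {}
for rate in (0.01, 1.0, 50.0):
    results[rate] = mean_and_var([ratio_of_gammas(a1, a2, rate, rng) for _ in range(n)])
    print(rate, results[rate], (beta_mean, beta_var))
```

The moments agree for every rate, which is why the sample-specific $c_i$ drops out of the beta value.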
Beta Tucker decomposition
[Figure: locus j of sample i with three methylated CpG sites and one unmethylated CpG site; each site contributes one piece of intensity to $\lambda^{(m)}_{ij}$ or $\lambda^{(u)}_{ij}$, and each intensity also receives one baseline piece]
$\lambda^{(m)}_{ij} = (\text{baseline}) + \sum_{s=1}^{y^{(m)}_{ij}} (\text{piece from site } s)$
$\lambda^{(u)}_{ij} = (\text{baseline}) + \sum_{s=1}^{y^{(u)}_{ij}} (\text{piece from site } s)$
Each site's piece $\sim \mathrm{Gam}(1, c_i)$ and each baseline $\sim \mathrm{Gam}(b_0, c_i)$, all independent, so
$\lambda^{(m)}_{ij} \sim \mathrm{Gam}\big(b_0 + y^{(m)}_{ij}, c_i\big)$, $\lambda^{(u)}_{ij} \sim \mathrm{Gam}\big(b_0 + y^{(u)}_{ij}, c_i\big)$
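The last step uses the additivity of independent gammas with a common rate: one $\mathrm{Gam}(b_0, c_i)$ baseline plus $y$ pieces $\mathrm{Gam}(1, c_i)$ has total shape $b_0 + y$. A Monte Carlo sketch with made-up values $y = 3$, $b_0 = 0.5$, $c_i = 2$; the helper name is hypothetical.

```python
import random


def intensity_from_sites(y, b0, rate, rng):
    """Baseline piece ~ Gam(b0, rate) plus one Gam(1, rate) piece for each of the y sites."""
    scale = 1.0 / rate  # random.gammavariate is parameterized by shape and scale
    total = rng.gammavariate(b0, scale)
    for _ in range(y):
        total += rng.gammavariate(1.0, scale)
    return total


rng = random.Random(1)
y, b0, rate, n = 3, 0.5, 2.0, 50_000
draws = [intensity_from_sites(y, b0, rate, rng) for _ in range(n)]
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n
# Gam(b0 + y, rate) has mean (b0 + y) / rate and variance (b0 + y) / rate**2.
print(mean, (b0 + y) / rate)      # both close to 1.75
print(var, (b0 + y) / rate ** 2)  # both close to 0.875
```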
Beta Tucker decomposition
$\beta_{ij} \sim \mathrm{Beta}\big(b_0 + y^{(m)}_{ij},\ b_0 + y^{(u)}_{ij}\big)$
is equivalent to:
$\lambda^{(m)}_{ij} \sim \mathrm{Gam}\big(b_0 + y^{(m)}_{ij}, c_i\big)$, $\lambda^{(u)}_{ij} \sim \mathrm{Gam}\big(b_0 + y^{(u)}_{ij}, c_i\big)$, $\beta_{ij} := \frac{\lambda^{(m)}_{ij}}{\lambda^{(m)}_{ij} + \lambda^{(u)}_{ij}}$
Beta Tucker decomposition
$y^{(m)}_{ij} \sim \mathrm{Pois}(\cdots)$, $y^{(u)}_{ij} \sim \mathrm{Pois}(\cdots)$
Beta Tucker decomposition
$y^{(m)}_{ij} \sim \mathrm{Pois}\Big(\sum_{c=1}^{C} \theta_{ic} \sum_{k=1}^{K} \pi_{ck}\, \phi_{kj}\Big)$
• $\theta_{ic}$: the probability that sample $i$ is in cluster $c$
• $\pi_{ck}$: the probability that samples in cluster $c$ methylate loci in component $k$
• $\phi_{kj}$: the probability that locus $j$ is in component $k$
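Beta Tucker decomposition
$\theta_i \sim \mathrm{Dir}(\eta_1, \ldots, \eta_C)$
$\pi_{ck} \sim \mathrm{Beta}\big(\eta^{(m)}_0, \eta^{(u)}_0\big)$
$\phi_j \sim \mathrm{Dir}(\nu_1, \ldots, \nu_K)$
$y^{(m)}_{ij} \sim \mathrm{Pois}\Big(\sum_{c=1}^{C} \theta_{ic} \sum_{k=1}^{K} \pi_{ck}\, \phi_{kj}\Big)$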
Beta Tucker decomposition
$p_{ij} := \sum_{c=1}^{C} \theta_{ic} \sum_{k=1}^{K} \pi_{ck}\, \phi_{kj}$: the probability that sample $i$ methylates CpG sites in locus $j$
$y^{(m)}_{ij} \sim \mathrm{Pois}(\rho\, p_{ij})$, where $\rho$ is the occurrence rate of CpG sites
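Beta Tucker decomposition
$p_{ij} := \sum_{c=1}^{C} \theta_{ic} \sum_{k=1}^{K} \pi_{ck}\, \phi_{kj}$
$y^{(m)}_{ij} \sim \mathrm{Pois}(\rho\, p_{ij})$, $y^{(u)}_{ij} \sim \mathrm{Pois}\big(\rho\, (1 - p_{ij})\big)$ ($\rho$: the occurrence rate of CpG sites)
$\lambda^{(m)}_{ij} \sim \mathrm{Gam}\big(b_0 + y^{(m)}_{ij}, c_i\big)$, $\lambda^{(u)}_{ij} \sim \mathrm{Gam}\big(b_0 + y^{(u)}_{ij}, c_i\big)$, $\beta_{ij} := \frac{\lambda^{(m)}_{ij}}{\lambda^{(m)}_{ij} + \lambda^{(u)}_{ij}}$
Equivalently: $\beta_{ij} \sim \mathrm{Beta}\big(b_0 + y^{(m)}_{ij},\ b_0 + y^{(u)}_{ij}\big)$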
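Putting the pieces together, the model can be forward-sampled. This is a sketch under stated assumptions, not the authors' implementation: symmetric Dirichlet priors (a single `eta` and `nu` instead of $\eta_1, \ldots, \eta_C$ and $\nu_1, \ldots, \nu_K$), one rate shared by all samples in place of $c_i$, `rho` for the CpG occurrence rate, made-up hyperparameter values, and a simple Knuth Poisson sampler because Python's standard library has no Poisson sampler. All function names are hypothetical.

```python
import math
import random


def dirichlet(alphas, rng):
    """Dirichlet draw by normalizing independent Gam(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate(I, J, C, K, rho, b0, rate, eta, eta_m, eta_u, nu, seed=0):
    """Forward-sample counts, intensities and beta values (symmetric priors, shared rate)."""
    rng = random.Random(seed)
    theta = [dirichlet([eta] * C, rng) for _ in range(I)]                      # theta_i
    pi = [[rng.betavariate(eta_m, eta_u) for _ in range(K)] for _ in range(C)]  # pi_ck
    phi = [dirichlet([nu] * K, rng) for _ in range(J)]                          # phi[j][k] = phi_kj
    out = {"theta": theta, "pi": pi, "phi": phi, "p": [], "y_m": [], "y_u": [], "beta": []}
    for i in range(I):
        p_row, ym_row, yu_row, beta_row = [], [], [], []
        for j in range(J):
            p_ij = sum(theta[i][c] * sum(pi[c][k] * phi[j][k] for k in range(K)) for c in range(C))
            y_m = poisson(rho * p_ij, rng)
            y_u = poisson(rho * (1.0 - p_ij), rng)
            lam_m = rng.gammavariate(b0 + y_m, 1.0 / rate)  # shape b0 + y, scale 1 / rate
            lam_u = rng.gammavariate(b0 + y_u, 1.0 / rate)
            p_row.append(p_ij)
            ym_row.append(y_m)
            yu_row.append(y_u)
            beta_row.append(lam_m / (lam_m + lam_u))
        out["p"].append(p_row)
        out["y_m"].append(ym_row)
        out["y_u"].append(yu_row)
        out["beta"].append(beta_row)
    return out


sim = simulate(I=4, J=6, C=2, K=3, rho=20.0, b0=1.0, rate=1.0,
               eta=0.5, eta_m=0.5, eta_u=0.5, nu=0.5, seed=0)
for row in sim["beta"]:
    print(" ".join(f"{b:.2f}" for b in row))
```

Because each $\theta_i$ and each $\phi_j$ sums to one, $1 - p_{ij} = \sum_c \theta_{ic} \sum_k (1 - \pi_{ck})\,\phi_{kj}$, so the unmethylated counts follow the same Tucker structure with $1 - \pi_{ck}$ in place of $\pi_{ck}$.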
[Figure: the model drawn as matrices. Θ (Samples 1 to 4 × clusters c = 1, 2), the core Π (clusters × components k = 1, 2, 3) and Φ (components × Loci 1 to 6) generate the count matrices $Y^{(m)}$ and $Y^{(u)}$ (samples × loci), which in turn generate the intensity matrices $\Lambda^{(m)}$ and $\Lambda^{(u)}$ and, through them, the beta values]
Inference
$P\big(\Theta, \Pi, \Phi \mid Y^{(m)}, Y^{(u)}, \cdots\big)$ = Poisson Tucker decomposition [Schein et al. (2016)]
Inference
$P\big(Y^{(m)}, Y^{(u)} \mid \Lambda^{(m)}, \Lambda^{(u)}, \cdots\big)$
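Inference
$y^{(m)}_{ij} \sim \mathrm{Pois}(\rho\, p_{ij})$, $\lambda^{(m)}_{ij} \sim \mathrm{Gam}\big(b_0 + y^{(m)}_{ij}, c_i\big)$
$P\big(y^{(m)}_{ij} \mid \lambda^{(m)}_{ij}, \cdots\big) = \,?$
The Poisson is not conjugate to the gamma here, since the count enters the gamma's shape parameter…
…but maybe the posterior still has a closed form…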
Inference
The Bessel distribution!
$P\big(y^{(m)}_{ij} \mid \lambda^{(m)}_{ij}, \cdots\big) = \mathrm{Bes}\Big(b_0 - 1,\ 2\sqrt{c_i\, \lambda^{(m)}_{ij}\, \rho\, p_{ij}}\Big)$ [Yuan & Kalbfleisch (2000)]
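The Bessel distribution
$\mathrm{Bes}(y; v, a) \propto \frac{1}{y!\, \Gamma(y + v + 1)} \left(\frac{a}{2}\right)^{2y + v}$, for $y = 0, 1, 2, \ldots$; the normalizing constant is $I_v(a)$, the modified Bessel function of the first kind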
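The Bessel posterior can be checked numerically. This sketch computes $P(y \mid \lambda) \propto \mathrm{Pois}(y; \rho p_{ij})\,\mathrm{Gam}(\lambda; b_0 + y, c_i)$ term by term and compares it with $\mathrm{Bes}\big(b_0 - 1, 2\sqrt{c_i \lambda \rho p_{ij}}\big)$, both normalized over the truncated support $0, \ldots, 200$. The parameter values and helper names are made up; because the standard library has no modified Bessel function, the normalizer is the truncated sum rather than $I_v(a)$.

```python
import math


def normalize_log_weights(logw):
    """Exponentiate and normalize log weights stably (subtract the max first)."""
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    total = sum(w)
    return [x / total for x in w]


def bessel_pmf(v, a, y_max):
    """Bes(y; v, a) for y = 0..y_max, normalized over the truncated support."""
    logw = [(2 * y + v) * math.log(a / 2) - math.lgamma(y + 1) - math.lgamma(y + v + 1)
            for y in range(y_max + 1)]
    return normalize_log_weights(logw)


def count_posterior_by_bayes(lam, rho_p, b0, rate, y_max):
    """P(y | lam) ∝ Pois(y; rho_p) * Gam(lam; shape b0 + y, rate), evaluated term by term."""
    logw = []
    for y in range(y_max + 1):
        shape = b0 + y
        log_pois = y * math.log(rho_p) - rho_p - math.lgamma(y + 1)
        log_gam = (shape * math.log(rate) + (shape - 1) * math.log(lam)
                   - rate * lam - math.lgamma(shape))
        logw.append(log_pois + log_gam)
    return normalize_log_weights(logw)


# Made-up values: b0 = 0.5, c_i = 2, lambda = 3, rho * p_ij = 4.
b0, rate, lam, rho_p, y_max = 0.5, 2.0, 3.0, 4.0, 200
bayes = count_posterior_by_bayes(lam, rho_p, b0, rate, y_max)
bessel = bessel_pmf(b0 - 1, 2 * math.sqrt(rate * lam * rho_p), y_max)
print(max(abs(p - q) for p, q in zip(bayes, bessel)))  # tiny: floating-point rounding only
```

The match follows from collecting the $y$-dependent factors of the Poisson-times-gamma product, $(c_i \lambda \rho p_{ij})^y / \big(y!\,\Gamma(y + b_0)\big)$, which is the Bessel kernel with $(a/2)^2 = c_i \lambda \rho p_{ij}$ and $v + 1 = b_0$.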
Sampling the Bessel
• Stable computation of Bessel functions [Amos (1974)]
• Exact rejection sampling (four methods) [Devroye (2002)]
• Table sampling [Zhou (2015)]
• Basic properties of the Bessel distribution [Yuan & Kalbfleisch (2000)]
It's easy and fast
https://github.com/aschein/fatwalrus
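For intuition about the sampling step, here is a naive inverse-CDF sampler on a truncated probability table. It is not one of the exact methods cited above (it truncates the support and rebuilds the table for every parameter pair), and the helper name `make_bessel_sampler` and its cutoff are assumptions. In the MCMC loop, a sampler like this would be called once for each $y^{(m)}_{ij}$ and each $y^{(u)}_{ij}$.

```python
import bisect
import itertools
import math
import random


def make_bessel_sampler(v, a):
    """Inverse-CDF sampler for Bes(v, a) on a truncated table.

    Returns (draw, pmf): draw(rng) returns one count; pmf is the normalized table.
    """
    y_max = int(a) + 50  # generous cutoff: the mass sits near a / 2 or below for v > -1
    # The constant factor (a/2)^v cancels after normalization, so it is omitted.
    logw = [2 * y * math.log(a / 2) - math.lgamma(y + 1) - math.lgamma(y + v + 1)
            for y in range(y_max + 1)]
    m = max(logw)
    weights = [math.exp(l - m) for l in logw]
    cdf = list(itertools.accumulate(weights))
    total = cdf[-1]
    pmf = [w / total for w in weights]

    def draw(rng):
        # Smallest y whose cumulative weight reaches u * total.
        return bisect.bisect_left(cdf, rng.random() * total)

    return draw, pmf


# Posterior of one methylated count given its intensity (made-up values).
b0, c_i, lam, rho_p = 0.5, 2.0, 3.0, 4.0
draw, pmf = make_bessel_sampler(b0 - 1, 2 * math.sqrt(c_i * lam * rho_p))
rng = random.Random(0)
samples = [draw(rng) for _ in range(50_000)]
exact_mean = sum(y * p for y, p in enumerate(pmf))
print(sum(samples) / len(samples), exact_mean)
```

Building the table costs time linear in the cutoff for every new $(v, a)$ pair, and each locus and sample has its own $a$, which is why dedicated exact samplers matter when $a$ is large.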
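MCMC algorithm
• $P\big(\Theta, \Pi, \Phi \mid Y^{(m)}, Y^{(u)}, \cdots\big)$: Poisson Tucker decomposition, $O(CK\,|Y_{>0}|)$, where $|Y_{>0}|$ is the number of nonzero counts
• $P\big(Y^{(m)}, Y^{(u)} \mid \Lambda^{(m)}, \Lambda^{(u)}, \cdots\big)$: sample Bessel counts, $O(2IJ)$
$\rho$ (the occurrence rate of CpG sites) controls sparsity! A smaller $\rho$ yields fewer nonzero counts, which makes the first step cheaper.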
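The first MCMC step can be sketched with the Poisson thinning (allocation) trick that Poisson Tucker samplers rely on: each nonzero count is split across the $C \times K$ cells of the core, after which $\pi_{ck}$ has a closed-form Beta conditional. This is a hedged illustration of one update (for $\pi$ only, with hypothetical helper names), assuming the cited sampler uses this standard augmentation; it is not the authors' code.

```python
import random
from collections import Counter


def allocate(y, weights, rng):
    """Split a count y into per-cell counts ~ Multinomial(y, weights / sum(weights))."""
    if y == 0:
        return Counter()  # zero counts need no work
    return Counter(rng.choices(range(len(weights)), weights=weights, k=y))


def resample_pi(y_m, y_u, theta, pi, phi, eta_m, eta_u, rng):
    """One Gibbs update of every pi_ck given the counts (phi[j][k] stores phi_kj).

    Methylated counts are allocated with weights theta_ic * pi_ck * phi_kj and
    unmethylated counts with theta_ic * (1 - pi_ck) * phi_kj.
    """
    C, K = len(pi), len(pi[0])
    cells = [(c, k) for c in range(C) for k in range(K)]
    n_m = Counter()  # methylated counts allocated to each (c, k)
    n_u = Counter()  # unmethylated counts allocated to each (c, k)
    for i, row in enumerate(y_m):
        for j, ym in enumerate(row):
            yu = y_u[i][j]
            if ym > 0:  # only nonzero counts cost O(CK)
                w = [theta[i][c] * pi[c][k] * phi[j][k] for c, k in cells]
                for idx, n in allocate(ym, w, rng).items():
                    n_m[cells[idx]] += n
            if yu > 0:
                w = [theta[i][c] * (1.0 - pi[c][k]) * phi[j][k] for c, k in cells]
                for idx, n in allocate(yu, w, rng).items():
                    n_u[cells[idx]] += n
    # The Poisson exponent depends on pi_ck only through pi_ck + (1 - pi_ck) = 1, so the
    # conditional is Beta(eta_m + allocated methylated total, eta_u + allocated unmethylated total).
    return [[rng.betavariate(eta_m + n_m[(c, k)], eta_u + n_u[(c, k)]) for k in range(K)]
            for c in range(C)]


# Made-up state: 2 samples, 3 loci, C = 2, K = 3.
rng = random.Random(0)
theta = [[0.7, 0.3], [0.2, 0.8]]
pi = [[0.9, 0.1, 0.5], [0.2, 0.6, 0.4]]
phi = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]]  # each phi[j] sums to 1
y_m = [[3, 0, 1], [0, 2, 0]]
y_u = [[0, 4, 0], [1, 0, 0]]
new_pi = resample_pi(y_m, y_u, theta, pi, phi, eta_m=0.5, eta_u=0.5, rng=rng)
print(new_pi)
```

Only nonzero counts are allocated, each at $O(CK)$ cost for its weights, which matches the $O(CK\,|Y_{>0}|)$ term. Under the same augmentation, the updates for $\theta_i$ and $\phi_j$ are conjugate Dirichlet draws built from the same allocated counts.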
Example results
[Figure: inferred Θ and Π]
The top locus in component 8 is in the promoter region of FLJ1030207
Hypomethylation of FLJ1030207 is a strong indicator of ovarian cancer [Model & Rujan (2009)]