Supplementary Material
Epigenetic histone modifications of human transposable elements: genome defense versus exaptation
Ahsan Huda, Leonardo Mariño-Ramírez and I. King Jordan

[Supplementary Figure 1; legend: Old Mapping, New Mapping. (a) y-axis: Number of tags of active modifications (millions), 0 to 30; x-axis: H3K4me3, H2BK5ac, H3K27ac, H3K9ac, H2BK120ac, H4K91ac, H3K18ac, H3K4ac, H4K5ac, H2AK5ac, H3K36ac, H2AK9ac, H2BK20ac, H3K79me2, H2AZ, H2BK12ac, H3K79me3, H4K8ac, H4K12ac, H3K4me2, H3K36me3, H4K20me1, H3K9me1, H4K16ac, H3K23ac, H3K79me1, H2BK5me1, H3K4me1. (b) y-axis: Number of tags of repressive modifications (millions), 0 to 30; x-axis: H3K9me3, H3K27me1, H3K14ac, H3R2me1, H4K20me3, H3K36me1, H3K9me2, H3R2me2, H3K27me3, H3K27me2.]

Supplementary Figure 1. Comparison of the original (old) and new mapping procedures. (a) Number of tags of active histone tail modifications and (b) number of tags of repressive histone tail modifications mapped using the old and the new mapping procedures.
[Supplementary Figure 2: y-axis: Expression fold change (log2), -3 to 2; x-axis: the 38 histone modifications, in the same order as in Supplementary Figure 1 (panel a followed by panel b).]
Supplementary Figure 2. Effect of individual histone modifications on CD4+ T cell gene expression levels. Histone tail modifications were determined to be present or absent in the promoter regions of human genes as described in the Methods section. For each modification, the log2 normalized ratio of the average expression level of genes where the modification is present to the average expression level of genes where it is absent is shown. Active modifications (red) have positive ratios, and repressive modifications (green) have negative ratios.
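The per-modification ratio plotted in Supplementary Figure 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name `expression_log2_ratio` and the toy values are hypothetical, and the normalization step mentioned in the caption (described in the Methods) is not reproduced, so the sketch uses raw mean expression levels.

```python
import math
from statistics import mean


def expression_log2_ratio(expression, has_mod):
    """Return log2(mean expression of genes with the modification in their
    promoter / mean expression of genes without it).

    expression: per-gene expression levels (hypothetical input)
    has_mod: per-gene booleans, True if the modification is present
    """
    present = [e for e, m in zip(expression, has_mod) if m]
    absent = [e for e, m in zip(expression, has_mod) if not m]
    if not present or not absent:
        raise ValueError("need genes both with and without the modification")
    return math.log2(mean(present) / mean(absent))


# Hypothetical toy data: six genes, modification called in the first three.
expr = [8.0, 6.0, 10.0, 2.0, 1.0, 3.0]
flags = [True, True, True, False, False, False]
ratio = expression_log2_ratio(expr, flags)
# Following the caption: positive ratios mark active, negative ratios repressive.
label = "active" if ratio > 0 else "repressive"
print(ratio, label)
```

Repeating the call once per modification gives one bar per modification; the sign of the ratio then separates active from repressive marks.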
Supplementary Table 1. Statistics for histone modification enrichment or depletion in TEs and in human gene expression. Significance was calculated using the G-test, and P-values were adjusted for multiple testing using the Bonferroni correction (refer to Figure 1).

Each row gives seven pairs of values, one pair for gene expression and one for each TE family (Alu, L1, LTR, DNA, L2, MIR); G is the G statistic and p the Bonferroni-adjusted P-value. Active modifications are listed first, followed by repressive modifications.

Modification        G p             G p             G p             G p             G p             G p             G p
Active
H3K4me3         1E+07 0        256505 0        631482 0         43338 0         47430 0        785.51 8E-173    1E+06 0
H2BK5ac         1E+06 0         32745 0         93028 0         30925 0        8147.4 0        1084.1 9E-238   328030 0
H3K27ac         1E+06 0         70683 0        113936 0         17276 0        7213.2 0        3037.6 0        631482 0
H3K9ac         704496 0         25868 0         39591 0        170.19 7E-39    1655.8 0        889.88 2E-195   393172 0
H2BK120ac      867104 0        400.38 5E-89    131025 0         13074 0        6183.6 0        3330.9 0         43488 0
H4K91ac        825807 0         40683 0        361975 0        3306.9 0         10818 0        5742.5 0        339941 0
H3K18ac        839704 0        3948.1 0        277985 0        905.94 5E-199   3900.1 0         11484 0        122161 0
H3K4ac         374450 0        233.18 1E-52    104050 0        1532.5 0        2043.1 0        5524.2 0         42598 0
H4K5ac         387350 0         52285 0        277515 0        2964.4 0        58.931 2E-14     17378 0         34153 0
H2AK5ac        3903.8 0        245.18 3E-55     31759 0        267.37 4E-60    2349.2 0        3721.6 0         52898 0
H3K36ac        610986 0         10898 0         83990 0        43.966 3E-11    97.349 6E-23     12808 0        197773 0
H2AK9ac         32030 0        521.13 2E-115   1895.2 0        237.85 1E-53    41.383 1E-10    119.29 9E-28    135891 0
H2BK20ac       526082 0         38126 0        246447 0        1629.1 0        2537.9 0         12181 0         45170 0
H3K79me2        21442 0         89641 0         45170 0         42666 0        6754.8 0        6781.9 0        210506 0
H2AZ            1E+06 0        162729 0        210232 0         24984 0        89.998 2E-21     19695 0         40338 0
H2BK12ac       178059 0         36576 0        3475.3 0         13799 0        800.01 5E-176   1022.1 3E-224   31.912 2E-08
H3K79me3       112140 0        758275 0        210506 0         15433 0        4203.6 0        14.253 0.0002   604264 0
H4K8ac         321071 0        5587.4 0        146713 0         95.08 2E-22     443.6 2E-98     12248 0        176401 0
H4K12ac         43343 0        794.61 8E-175    23035 0        7388.6 0        3178.8 0        4804.7 0        535117 0
H3K4me2        429727 0        133398 0        328030 0        3232.7 0        1351.8 6E-296   7970.3 0        210232 0
H3K36me3       755.92 2E-166   547116 0        197773 0         30072 0        436.06 8E-97    397.18 2E-88    104050 0
H4K20me1       108316 0        689687 0        604264 0        191451 0         12041 0        2979.5 0         39591 0
H3K9me1        316935 0        329307 0        393172 0        1812.2 0        18.433 2E-05     12763 0         42732 0
H4K16ac        185657 0         71982 0        173120 0        4358.4 0        2631.7 0        1663.9 0        277985 0
H3K23ac         19191 0        1029.7 6E-226    57430 0         15700 0          2106 0         13714 0         57430 0
H3K79me1        28992 0        386834 0        135891 0          7202 0        167.13 3E-38    2915.6 0        113936 0
H2BK5me1       102758 0        455709 0        535117 0         37498 0        8930.4 0        2481.6 0         83990 0
H3K4me1        364966 0         2E+06 0         1E+06 0         55818 0         20559 0         11246 0        277515 0
Repressive
H3K9me3         14451 0        422117 0        339941 0          1078 2E-236   3179.1 0        2838.4 0        146713 0
H3K27me1       8759.4 0        163552 0        122161 0         14708 0        972.46 2E-213   5277.9 0         23035 0
H3K14ac        3431.7 0        2621.7 0         42732 0         19526 0        646.07 2E-142   6931.8 0        173120 0
H3R2me1          4024 0         36426 0         40338 0         15997 0        476.23 1E-105    11191 0        361975 0
H4K20me3       2818.1 0        224032 0        176401 0        142879 0         90922 0         95259 0         31759 0
H3K36me1       4233.2 0        3358.6 0         52898 0         85314 0        2590.1 0         13606 0        1895.2 0
H3K9me2        1330.9 2E-291    23439 0         43488 0        105773 0        2090.8 0        2455.2 0         93028 0
H3R2me2         665.8 8E-147    97294 0        31.912 2E-08    734.18 1E-161   513.93 9E-114   3412.1 0        3475.3 0
H3K27me3       18.077 2E-05    8247.2 0         34153 0        270665 0        948.08 3E-208    38819 0        246447 0
H3K27me2       6453.1 0         11951 0         42598 0        207220 0         18.26 2E-05    4549.8 0        131025 0

Supplementary Table 2. Correlation and statistical significance between gene expression enrichment or depletion and TE enrichment or depletion for 38 histone modifications (refer to Figure 2).

                Alu         L1        LTR        DNA         L2        MIR
r          0.020132  -0.432517  -0.515575  -0.273047   0.082045   0.194877
t          0.120815  -2.878243  -3.610283  -1.702993   0.493936   1.192117
p-value    0.461729   0.106439   0.086011   0.169008   0.353964   0.222174
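The G-test with Bonferroni adjustment named in the Supplementary Table 1 caption can be sketched as follows. This rests on assumptions not stated in this supplement: each test is treated as a two-category goodness-of-fit test (tags inside versus outside a TE family, with the expected inside fraction taken from a background such as the fraction of the genome the family covers), so G has one degree of freedom. The function names and toy counts are hypothetical.

```python
import math


def g_test_inside_outside(obs_in, obs_out, expected_frac_in):
    """G-test of goodness of fit for tags inside vs outside a feature.

    expected_frac_in is the fraction of tags expected inside the feature
    under the null hypothesis. Returns (G, p, log2 observed/expected inside).
    With two categories the test has 1 degree of freedom.
    """
    total = obs_in + obs_out
    exp_in = total * expected_frac_in
    exp_out = total * (1.0 - expected_frac_in)
    g = 0.0
    for obs, exp in ((obs_in, exp_in), (obs_out, exp_out)):
        if obs > 0:  # a zero count contributes 0 (limit of o * ln(o / e))
            g += obs * math.log(obs / exp)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    # Upper tail of chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2))
    p = math.erfc(math.sqrt(g / 2.0))
    direction = math.log2(obs_in / exp_in) if obs_in > 0 else float("-inf")
    return g, p, direction


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: 150 of 1,000 tags fall in a family expected to hold 10%.
g, p, log2_oe = g_test_inside_outside(150, 850, 0.10)
print(f"G = {g:.2f}, p = {p:.2e}, log2(O/E) = {log2_oe:.3f}")
print(bonferroni([p, 0.01, 0.5]))
```

G itself is unsigned, which is why the sketch also returns log2(O/E): its sign tells enrichment from depletion. The number of tests passed to the Bonferroni step (for example, all modification and column combinations in the table) is an assumption.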
[Supplementary Figure 3: six scatter plots, one per TE class, with panels titled Alu, L1, L2, MIR, LTR and DNA. x-axis: Local enrichment fold change (log2); y-axis: Genomic enrichment fold change (log2). Correlation coefficients shown in the panels: r = 0.99, 0.95, 0.95, 0.91, 0.92 and 0.92.]

Supplementary Figure 3. Comparison of global versus local methods for computing TE-histone modification enrichment ratios. For each TE class (family), log2 normalized enrichment ratios were computed for the 38 histone modifications using either a genome-wide background tag count (y-axis) or a locally computed background tag count (x-axis).
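The two enrichment ratios compared in Supplementary Figure 3 can be sketched as follows. This assumes an enrichment ratio is the tag density inside a TE family divided by the tag density of a background, where the genome-wide background is all tags over the whole genome and the local background is tags in regions flanking the TEs; the actual local background is defined in the Methods section and may differ. The function name and all numbers are hypothetical.

```python
import math


def log2_enrichment(te_tags, te_bp, bg_tags, bg_bp):
    """log2 of (tags per bp inside TEs) / (tags per bp in a background)."""
    if min(te_tags, te_bp, bg_tags, bg_bp) <= 0:
        raise ValueError("counts and lengths must be positive")
    return math.log2((te_tags / te_bp) / (bg_tags / bg_bp))


# Hypothetical numbers for one modification and one TE family.
te_tags, te_bp = 300, 100_000                      # tags and bp covered by the family
genome_tags, genome_bp = 2_000_000, 1_000_000_000  # genome-wide background
local_tags, local_bp = 40, 10_000                  # assumed flanking-window background

genomic = log2_enrichment(te_tags, te_bp, genome_tags, genome_bp)
local = log2_enrichment(te_tags, te_bp, local_tags, local_bp)
print(f"genomic: {genomic:.3f}  local: {local:.3f}")
```

Computing both values for each of the 38 modifications yields the (x, y) pairs of one panel; in this toy case the two backgrounds even disagree in sign, which is the kind of discrepancy the figure checks for.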
[Supplementary Figure 4: y-axis: TE-modification enrichment (log2), -1.5 to 1; x-axis: the 38 histone modifications, in the same order as in Supplementary Figure 2; one series per TE family: Alu, L1, LTR, DNA, L2, MIR.]

Supplementary Figure 4. Enrichment or depletion of 38 individual histone modifications in TE families. Log2 normalized ratio of the number of tags of each of the 38 histone modifications located within each TE family over the local genomic background tag count. TE-modification enrichment values were calculated using the local genomic background histone modification tag counts, which were computed as described in the Methods section.
Supplementary Figure 5. Correlation between the enrichment of histone modifications in TE families and in human gene expression. The enrichment of the 38 histone modifications in human gene expression (Supplementary Figure 2) is plotted against their enrichment in each of the 6 TE families (Supplementary Figure 4). TE-histone modification enrichment values were calculated using the local genomic background histone modification tag counts, which were computed as described in the Methods section. Pearson correlation coefficient values (r) are shown.
[Supplementary Figure 5: six scatter plots; y-axis: Expression enrichment; x-axis: TE-family enrichment, -3 to 2. Panels: LTR enrichment (r = -0.56), DNA enrichment (r = -0.48), L2 enrichment (r = -0.17), MIR enrichment (r = -0.02), Alu enrichment (r = -0.21), L1 enrichment (r = -0.46).]
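Each panel's r is a Pearson correlation, across the 38 modifications, between expression enrichment and enrichment in one TE family. A minimal, dependency-free sketch is below; the function name and the toy vectors are hypothetical (Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 enrichment values for five modifications.
expression_enrichment = [1.2, 0.8, 0.3, -0.5, -1.1]
te_enrichment = [-0.9, -0.4, 0.1, 0.6, 0.7]
print(round(pearson_r(expression_enrichment, te_enrichment), 3))
```

A negative r, as in the toy data, means marks that go with higher expression tend to be depleted in that TE family.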
Supplementary Figure 6. Enrichment or depletion of active and repressive histone modifications in retrotransposons. Histone modifications were classified as active or repressive based on expression enrichment (Supplementary Figure 2). The log2 normalized ratios of the number of tags of active or repressive modifications located within each family of retrotransposons over the background modification tag counts are shown. Background modification tag counts were calculated using the local genomic background histone modification tag counts, which were computed as described in the Methods section. Retrotransposon families are arranged according to their relative age. Spearman rank correlations between active and repressive TE-modification enrichments (depletions) and the relative ages of TE families are shown.
[Supplementary Figure 6: x-axis: TE family (Alu, L1, LTR, DNA, L2, MIR); y-axis: TE-modification enrichment (log2), -0.8 to 0.4; series: Active, Repressive and a linear trend line for Repressive. Correlation values shown: r = 0.89 and r = 0.60.]
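Spearman's rank correlation, as reported in Supplementary Figure 6, is the Pearson correlation of the ranks, with tied values given their average rank. In the sketch below the age ranks simply follow the left-to-right order of the families in the figure (Alu youngest, MIR oldest), which is an assumption, and the enrichment values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


families = ["Alu", "L1", "LTR", "DNA", "L2", "MIR"]
age_rank = [1, 2, 3, 4, 5, 6]  # assumed: figure order, youngest to oldest
repressive = [-0.6, -0.5, -0.3, -0.2, 0.1, 0.2]  # hypothetical log2 enrichment
print(spearman_rho(age_rank, repressive))
```

Because only ranks matter, any monotone increase of enrichment with age gives rho = 1, whatever the spacing of the values.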
[Supplementary Figure 7 panels: (a) Alu active, (b) Alu repressive, (c) L1 active, (d) L1 repressive. x-axis: distance (x10,000 nucleotides), 0 to 100; y-axis: Normalized tag count. Correlation values shown in the panels: r = -0.61, r = -0.38, r = -0.89 and r = -0.90.]

Supplementary Figure 7. TE distance from genes versus histone modifications. Distances between TE sequences and the nearest genes are binned into 10 kb bins and plotted against the number of histone modification tags mapped to the TE sequences, normalized by TE length. Tags were mapped using the new mapping procedure (refer to Figure 5).
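The binning in Supplementary Figure 7 can be sketched as below: each TE copy is assigned to a 10 kb bin by its distance to the nearest gene, its tag count is divided by its length, and the normalized counts are averaged per bin. Averaging per-copy values (rather than pooling tags and lengths within a bin) and the 100-bin cut-off matching the 0 to 100 (x10,000 nt) axis are assumptions; the function name and the input tuples are hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # 10 kb bins, as stated in the caption


def binned_normalized_tags(elements, n_bins=100):
    """elements: iterable of (distance_to_nearest_gene_bp, te_length_bp, tag_count).

    Returns (bin_index, mean length-normalized tag count) for each non-empty bin;
    copies at or beyond n_bins * BIN_SIZE, or with non-positive length, are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, length, tags in elements:
        b = int(distance // BIN_SIZE)
        if b >= n_bins or length <= 0:
            continue
        sums[b] += tags / length
        counts[b] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]


# Hypothetical TE copies: (distance bp, length bp, tags)
copies = [(500, 300, 3), (9_999, 300, 0), (15_000, 100, 2), (2_000_000, 300, 9)]
for b, value in binned_normalized_tags(copies):
    print(f"{b * 10}-{(b + 1) * 10} kb: {value:.4f}")
```

The per-bin values can then be correlated with bin distance, which is what the r values in the panels summarize.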
Supplementary Table 3. Statistics for distances between Alu and L1 sequences and the nearest genes using uniquely mapped tags (refer to Figure 5).
                    active   repressive
Alu  r               -0.38        -0.67
     t         4.066878956  8.934549001
     p-value    4.8184E-05  1.23944E-14
L1   r               -0.27        -0.01
     t         2.775961966  0.098999899
     p-value  0.003197316  0.460670287
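The t values in Supplementary Tables 2 and 3 agree with the standard statistic for testing a Pearson correlation against zero, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n = 38 modifications for Supplementary Table 2 and n = 100 distance bins for Supplementary Table 3 (where t is reported as an absolute value). A minimal sketch follows; the function name is hypothetical, and the P-value step (for instance from the t distribution with n - 2 degrees of freedom via `scipy.stats.t.sf`) is not reproduced here.

```python
import math


def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Alu active row of Supplementary Table 3, assuming n = 100 bins of 10 kb
print(abs(t_from_r(-0.38, 100)))  # close to the tabulated 4.066878956
# Alu column of Supplementary Table 2, assuming n = 38 histone modifications
print(t_from_r(0.020132, 38))     # close to the tabulated 0.120815
```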
[Supplementary Figure 8: normal Q-Q plot; x-axis: Theoretical Normal quantiles, -3 to 3; y-axis: Expression fold change quantiles, -3 to 3.]
Supplementary Figure 8. Q-Q plots showing the concordance between the expression fold change data and data generated from a theoretical Normal distribution (refer to Figure 2).
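A normal Q-Q plot pairs the sorted observations with quantiles of the standard Normal distribution at matching probability levels. The sketch below uses the plotting positions (i - 0.5)/n, one common convention (the convention used for these figures is not stated), together with Python's `statistics.NormalDist.inv_cdf`; the function name and data are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (theoretical Normal quantile, observed value) pairs, sorted."""
    observed = sorted(data)
    n = len(observed)
    if n == 0:
        raise ValueError("need at least one observation")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n for i = 1..n (an assumed convention)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Hypothetical log2 fold changes
fold_changes = [0.4, -1.2, 0.9, 0.1, -0.3, 1.5, -0.8]
for q, x in normal_qq_points(fold_changes):
    print(f"{q:6.3f}  {x:5.2f}")
```

Points lying close to a straight line indicate concordance with a Normal distribution; marked departures from a line correspond to the non-concordance described for Supplementary Figures 9 and 10.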
Supplementary Table 4. Statistics for correlations between Alu and L1 subfamilies and histone modifications (refer to Figure 4).
divergence from consensus      active  repressive
Alu subfamilies  r               0.94        0.93
                 t            16.5311  15.1812148
                 p-value        3E-19  5.0865E-18
L1 subfamilies   r               0.53       -0.13
                 t            3.75001  -0.7866757
                 p-value      0.00029  0.21817526
[Supplementary Figure 8 (continued): six normal Q-Q plots, one per TE family (Alu, L1, L2, MIR, LTR, DNA); x-axis: Theoretical Normal quantiles, -3 to 3; y-axis: TE fold change quantiles.]

Supplementary Figure 8 (continued). Q-Q plots showing the concordance between the TE enrichment fold change data and data generated from a theoretical Normal distribution (refer to Figure 2).
[Supplementary Figure 9: six normal Q-Q plots in three rows, each row with an Alu subfamilies panel and an L1 subfamilies panel; x-axis: Theoretical Normal quantiles, -3 to 3; y-axes: Normalized tags quantiles (first and third rows) and Average millidiv quantiles (second row).]

Supplementary Figure 9. Q-Q plots showing the non-concordance of millidiv and tag counts with data generated from a theoretical Normal distribution (refer to Figure 4).
[Supplementary Figure 10: four normal Q-Q plots: Alu active, Alu repressive, L1 active, L1 repressive; x-axis: Theoretical Normal quantiles, -3 to 3; y-axis: Normalized tags quantiles.]

Supplementary Figure 10. Q-Q plots showing the non-concordance of modification tag counts, with respect to their distance from human genes, with data generated from a theoretical Normal distribution (refer to Figure 5).
[Supplementary Figure 11: y-axis: Percent divergence from consensus sequence, 0 to 20; x-axis: Alu subfamilies, in the order AluYa5, AluYb9, AluYb8, AluYd8, AluYf5, AluYf4, AluYg6, AluYk4, AluYk12, AluYc3, AluY, AluYa8, AluYc, AluSg7, AluSc5, AluSp, AluSq4, AluSq1, AluSc, AluSc8, AluYc5, AluSg4, AluSg, AluYk11, AluSq, AluSx4, AluSq2, AluSx1, AluSx3, AluYh9, AluSx, AluSz, AluSz6, FLAM_C, AluJo, AluJb, AluJr, AluJr4, FLAM_A, FRAM, FAM.]
Supplementary Figure 11. Age of Alu subfamilies determined by divergence from consensus sequence (refer to Figure 4).
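Per-subfamily percent divergence can be summarized as sketched below. This assumes per-copy divergence values in parts per thousand (the `milliDiv` convention of the UCSC Genome Browser RepeatMasker track, hence the division by 10 to obtain percent), averaged over all copies of a subfamily; lower divergence indicates a younger subfamily. The function name and the toy records are hypothetical and are not values from the figure.

```python
from collections import defaultdict


def mean_percent_divergence(copies):
    """copies: iterable of (subfamily, milli_div) with milli_div in parts per thousand.

    Returns [(subfamily, mean percent divergence)] sorted from least to most
    diverged, i.e. from youngest to oldest.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, milli_div in copies:
        totals[name] += milli_div
        counts[name] += 1
    # Mean milliDiv per subfamily, converted from parts per thousand to percent
    means = {name: totals[name] / counts[name] / 10.0 for name in totals}
    return sorted(means.items(), key=lambda item: item[1])


# Hypothetical per-copy records
records = [("AluYa5", 12), ("AluYa5", 18), ("AluSx", 95), ("AluJo", 140)]
for name, pct in mean_percent_divergence(records):
    print(f"{name}: {pct:.1f}%")
```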
[Supplementary Figure 12, two panels; y-axis: Percent divergence from consensus sequence (0 to 25 in the first panel, 0 to 35 in the second); x-axis: L1 subfamilies. First panel: L1HS, L1PA2, L1PA3, L1PA4, L1PA5, L1PA6, L1P1, L1PA7, L1P2, L1PA8, L1P, L1P3, L1PA10, L1PA8A, L1PB1, L1PA11, L1MA1, L1PA13, L1PB2, L1PA14, L1PBa1, L1MA2, L1PREC2, L1MA3, L1PA12, L1PB, L1P5, L1MA5A, L1PA16, L1PA15, L1PB3, L1PBa, L1P4, L1P3b, L1MA5, L1MA4A, L1MA8, L1MC1, L1MA4, L1PA17, L1MA7, L1MA6, L1M1, L1MB3, L1PB4, L1MB1, L1MC2, L1MB2, L1MB4, L1MA9, L1M3, L1MB7, L1P4a, L1MB8, L1M2a1, L1M3f, L1MB5, L1MD1, L1M2. Second panel: L1PA15-16, L1MA10, L1PBb, L1MC3, L1MCa, L1MD3, L1P4e, L1MD2, L1M4, L1M2c, L1M, L1M2a, L1M4c, L1P4d, L1MDa, L1ME2z, L1M3e, L1M4b, L1M2b, L1MD, L1ME1, L1P4b, L1ME3, L1MC5, L1P4c, L1ME2, L1MCb, HAL1-3A_ME, L1MEc, L1ME3A, L1MC, L1MDb, L1MC4, L1M3c, HAL1-2a_MD, L1MEf, L1ME3B, L1M3b, L1MC4a, L1ME3F, L1ME3D, L1M3de, L1ME3E, L1ME4a, L1M3d, HAL1b, L1MEb, L1MEg, L1MEa, L1MCc, L1ME3C, L1M5, L1ME5, L1MEe, HAL1, L1M7, L1MEg1, L1M3a, L1MEg2, HAL1N1_MD, L1MEd, L1M6.]
Supplementary Figure 12. Age of L1 subfamilies determined by divergence from consensus sequence (refer to Figure 4).