Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
1
Variation in mutation rates among human populations 1
Iain Mathieson1, David Reich1,2,3 2 3
1 Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 4 2 Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA. 5 3 Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts 02115, USA. 6
7
8
Abstract 9
10
Mutations occur at vastly different rates across the genome, and between populations. Here, we measure 11
variation in the mutational spectrum in a sample of human genomes representing all major world populations. We 12
find at least two distinct signatures of variation. One, private to certain Native American populations, is novel and 13
is concentrated at CpG sites. The other is consistent with a previously reported signature characterized by 14
TCC>TTC mutations in Europeans and other West Eurasians. We describe the geographic extent of this signature 15
and show that it is detectable in the genomes of ancient, but not archaic humans. We hypothesize that these two 16
signatures are driven by independent processes and both result from differences in either the rate, or repair 17
efficiency, of damage due to deamination of methylated bases – respectively guanine and cytosine for the two 18
processes. Variation in these processes could be due to environmental, genetic, or life-history variation between 19
populations, and dramatically affects the spectrum of rare variation in different populations. 20
21
Introduction 22
23
For a process that provides such a fundamental contribution to genetic diversity, the human germline 24
mutation rate is surprisingly poorly understood. Different estimates of the mutation rate–the mean number of 25
mutations per-generation, or per-year–are largely inconsistent with each other [1,2], and similar uncertainty 26
surrounds parameters such as the paternal age effect [3-5], the effect of life-history traits [6,7], and the sequence-27
context determinants of mutations [5,8]. Here, we investigate a related question. Rather than trying to determine 28
the absolute values of parameters of the mutation rate, we ask how much the relative mutation rate–specifically, 29
the relative rate of different classes of mutations–varies between different human populations. 30
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
2
31
Motivation for this comes from two sources. First, analysis of tumor genomes has demonstrated a number 32
of different mutational signatures operating at different rates in somatic cells and cancers, many of which can be 33
linked to specific biological processes or environmental exposures [9-11]. It seems plausible that population-34
specific genetic factors of environmental exposures might similarly lead to variation in germline mutation rates. 35
Second, it is known that some mutations, most notably TCC>TTC, are enriched in Europeans relative to East 36
Asians and Africans [8] though the geographical extent, history, and biological basis for this signal are unclear. 37
By analyzing whole-genome sequence data from diverse world populations, together with high coverage ancient 38
genomes, we aimed to further characterize variation in human mutation rates, to place this variation in a historical 39
context, and to determine the underlying biological or environmental explanations. 40
41
Results 42
43
We first analyzed data from 300 individuals sequenced to high coverage (mean coverage depth 43X) as 44
part of the Simons Genome Diversity project [12] (SGDP). We classified single nucleotide polymorphisms 45
(SNPs) into one of 96 mutational classes according to the SNP, and the two flanking bases. We represent these by 46
the ancestral sequence and the derived base so for example “ACG>T” represents the ancestral sequence 3’–ACG–47
5’ mutating to 3’–ATG–5’. In order to increase power to detect population-specific variation, we first focused on 48
variants where there were exactly two copies of the derived allele in the sample (f2 variants). For each individual, 49
we counted the number of f2 mutations in each mutational class that they carried, and normalized by the number 50
of ATA>C mutations (the most common class and one that did not seem to vary across populations in a 51
preliminary analysis). The residual mutation intensities form a 96×300 matrix, and we used non-negative matrix 52
factorization [10,13] (NMF, implemented in the NMF package [14] in R) to identify specific mutational features. 53
NMF decomposes a matrix into a set of sparse factors, here putatively representing different mutational processes, 54
and individual-specific loadings for each factor, measuring the intensity of each process in each individual. 55
56
NMF requires us to specify the number of signatures (the factorization rank) in advance. For f2 variants 57
we chose a factorization rank of 4, based on standard diagnostic criteria (Supplementary Fig. 1). This identified 58
four mutational signatures; of which two were uncorrelated with each other, were robust across frequencies, 59
replicated in non-cell-line samples, were consistent across samples from the same populations, and had clear 60
geographic distributions (Figure 1, Supplementary Fig. 2). Signature 1 corresponds to the previously described 61
European signal [8] characterized by TCC>T, ACC>T, CCC>T and TCT>T (possibly also including CCG>T, 62
which overlaps with signature 2). Loadings of this component almost perfectly separate West Eurasians from 63
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
3
other populations, with South-West Asians intermediate. This component is seen most strongly in Western and 64
Mediterranean Europe, with decreasing intensity in Northern and Eastern Europe, the Middle East and South-west 65
Asia. This signal is similar to signature 11 from the COSMIC catalog of somatic mutation in cancer [15] (Pearson 66
correlation ρ=0.85) which is most commonly found in melanoma and glioblastoma and is associated with use of 67
chemotherapy drugs which act as alkylating agents, damaging DNA through guanine methylation. 68
69
Signature 2 is restricted to some South and Central American populations and, possibly, Aboriginal 70
Australians. It is characterized by NCG>T mutations similar to the signature caused by deamination of methylated 71
cytosine at CpG sites, corresponding to COSMIC signature 1 (ρ=0.96). Interestingly, this signal is found in South 72
America in Andean populations like Quechua and Piapoco, and in Central American populations such as Mayan 73
and Nahua, but not in the Amazonian Surui and Karitiana, nor in North American populations. 74
75
The remaining two signatures are more difficult to identify (Supplementary Fig. 2). Signature 3 is 76
characterized by GT>GG mutations, particularly GTG>GGG. It is found in some East Asian and some South 77
American populations but is not consistent within populations and does not have a consistent geographic pattern. 78
All affected samples are derived from cell lines. It does not match any mutational signature seen in COSMIC 79
(maximum ρ=0.05). Plausibly this represents some as-yet uncharacterized sample-specific process or cell-line 80
artifact. Signature 4 is diffuse, possibly representing a background mutation rate and is most correlated with 81
COSMIC signature 5 (ρ=0.52) which is found in all cancers and has unknown aetiology. It is significantly 82
reduced in only a single cell-line derived sample (Quechua-2), so probably represents some unknown cell-line or 83
data processing artifact. 84
85
We checked that these signatures were robust when we looked at different frequencies and factorization 86
ranks. For f3 variants with rank 4 we recovered the same signatures as with f2 variants (Supplementary Fig. 3), and 87
retained signatures 1 and 2 when we used rank 3 (Supplementary Fig. 4). At f1 the variation is apparently 88
dominated by cell line artifacts because principal component analysis (PCA) separates cell line from non cell line 89
derived samples (Supplementary Fig. 5A). However, NMF on f1 variants excluding cell line derived samples 90
recovers signatures consistent with signatures 1 and 2 (Supplementary Fig. 5B-C). PCA on f2 variants does not 91
distinguish cell line samples, but does separate samples by geographic region, and recovers factor loadings 92
consistent with NMF-derived signatures 1-3 (Supplementary Fig. 6). To check that our results were not an artifact 93
of the normalization we used, we repeated the analysis normalizing by the total number of mutations in each 94
sample, rather than the number of ATA>C mutations, and obtained equivalent results (Supplementary Figure 7). 95
96
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
4
Demographic effects such as population size changes confound direct estimates of mutation rate 97
differences. However, the proportion of mutations likely attributable to signature 1 (i.e TCT>T, TCC>T, CCC>T 98
and ACC>T) increases from a mean of 7.8% in Africans to 10.0% (range 8.8-11.1%) in West Eurasians which, 99
with the strong assumption that the only differences in mutation are the ones we detected, would provide a lower-100
bound for the increase in genome-wide mutation rate of 2.3% (range 1.1-3.6%). A similar calculation for 101
American samples with apparent excess of signature 2 gives a range of increase of 4.9-10.8%. 102
103
We replicated these results using data from phase 3 of the 1000 Genomes project [16], confirming that 104
mutations consistent with signature 1 are enriched in populations of European and South Asian ancestry (Figure 105
2A) and that mutations consistent with signature 2 are enriched in Peruvians (PEL) and Mexicans (MXL) – the 106
two 1000 Genomes populations with the most Native American ancestry (Figure 2B). 107
108
To study the time depth of these signals, we investigated whether signature 1 could be detected in ancient 109
samples by constructing a corrected statistic, that measures the intensity of the mutations enriched in signature 1, 110
normalized to reduce spurious signals that arise from ancient DNA damage (methods). This statistic is enriched to 111
European levels in both an eight thousand year old European hunter-gatherer and a seven thousand year old Early 112
European Farmer [17] but not in a 45,000 year old Siberian [18], nor in the Neanderthal [19] or Denisovan[20] 113
genomes (Figure 3). This statistic is predicted by neither estimated hunter-gatherer ancestry, nor early farmer 114
ancestry, in 31 samples from 13 populations for which ancestry estimates were available [17] (linear regression p-115
values 0.22 and 0.15, respectively). Thus the effect is not strongly driven by this division of ancestry. If it has an 116
environmental basis, it is not predicted by latitude (linear regression of signature 1 loadings against latitude; 117
p=0.68), but is predicted by longitude (p=6×10-8; increasing east to west). We cannot apply the same approach to 118
Signature 2, because post-mortem deamination of methylated CpG sites is common in all ancient DNA, even 119
when treated with uracil-DNA glycosylase [21]. 120
121
Finally, we investigated the dependence of the two signatures on four genomic features. First we 122
investigated dependence on transcriptional strand. Signature 1 shows a skew whereby the C>T mutation is more 123
likely to occur on the transcribed (i.e. noncoding) strand in West Eurasians, relative to populations from other 124
regions (Figure 4 A&B). Because transcription coupled repair is more likely to repair mutations on the transcribed 125
strand [22] this result, consistent with Harris (2015) [8], suggests that the excess signature 1 mutations in West 126
Eurasians are driven more by G>A than by C>T mutations. Signature 2 shows a global skew where the C>T 127
mutation is more likely to occur on the untranscribed strand, consistent with these mutations resulting from 128
deamination of methylated cytosine, and we do not see a significant difference between individuals with high 129
versus low levels of signature 2 mutations (Figure 4 C,D). Second, we obtained methylation data for a testis cell 130
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
5
line, produced by the Encyclopedia of DNA Elements (ENCODE) project [23]. Signature 2 mutations are ~8.5 131
times as likely to occur in regions of high (>=50%) versus low (<50%) methylation. We do not detect any 132
difference in this ratio between regions, or between individuals with high versus low signature 2 mutation rates, 133
although the number of mutations involved is probably too low to provide much power (Methods; Fisher’s exact 134
test P=0.14). Third, we tested dependence on B statistic [24], a measure of conservation. We found that the 135
relative magnitudes of both signatures 1 and 2 depend on B statistic, but that both these dependencies were 136
independent of the per-population intensities of the signatures (Figure 5 A,B). This, along with a similar result for 137
recombination rate, (Figure 5 C,D) confirm that the differences we detect are truly due to differences in mutation 138
rate, and not some other force like natural selection, recombination or biased gene conversion. 139
140
Discussion 141
142
We characterized two independent signals of variation in mutation rates between human populations, 143
however this may not be comprehensive. Our power to detect differences in mutation rates depends on a number 144
of factors, including sample size, and the level of background variation. For example, we would expect to have 145
more power to detect recent variations in Native Americans because they have relatively low background 146
variation. Nonetheless, it is clear that the patterns we did identify are robust, and represent real differences in 147
mutation rates. 148
149
We cannot be definitive about the causes of these patterns, but our analyses provide a number of clues. In 150
terms of the immediate mutagenic cause, signature 1 is most similar to COSMIC [15] signature 11 (Pearson 151
correlation ρ=0.85), which is associated with alkylating agents used as chemotherapy drugs, damaging DNA 152
through guanine methylation. The reversal of transcriptional strand bias for this signature in West Eurasians 153
supports the idea that the increased rate of these mutations in West Eurasians is driven by damage to guanine 154
bases, consistent with deamination of methyl-guanine to adenine, leading to the G>A (equivalently C>T) 155
mutations that we observe. Signature 1 is also highly correlated with COSMIC signature 7 (ρ=0.76), caused by 156
ultraviolet (UV) radiation exposure but it is difficult to imagine how this could affect the germline, would not 157
explain our observed increase in ACC>T mutations, would not be expected to reverse the strand bias, and should 158
produce an enrichment of CC>TT dinucleotide mutations in West Eurasians that we do not observe (p=0.41). 159
Harris (2015) [8] suggested that UV might cause germline mutations indirectly through folate deficiency in 160
populations with light skin pigmentation (since folate can be degraded in skin by UV radiation). It is unknown 161
what mutational signature would be caused by this effect, but the fact that we do not observe enrichment of 162
signature 1 in other lightly pigmented populations like Siberians suggests that it is not driving the signal. 163
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
6
164
Signature 2 seems most likely to be driven by deamination of methylated cytosine at CpG sites, which is 165
the major source of C>T muations at these sites. We find no evidence of any qualitative difference between the 166
CpG mutations in populations with high rates of signature 2 compared to other populations. Therefore we 167
hypothesize that variation in both signatures is driven by differences either in methylation (of guanine and 168
cytosine, respectively) level, rate of deamination of methylated bases, or in the efficiency of repair of damage 169
caused by deamination of methylated bases. 170
171
Why these factors would vary between populations is another question and, for this study, a matter for 172
speculation. Possibilities include variation in life-history traits, variation in environment, or systematic variation 173
in mutational or DNA repair processes due to either natural selection or neutral evolution. Over longer timescales, 174
variation in life history traits, for example the timing of reproduction and the onset of fertility, can lead to 175
variation in both total mutation rate and in the mutational spectrum, particularly in the relative rates of CpG and 176
non-CpG mutations [1,6,7,25,26]. While these effects are dramatic and important on the timescale of hominid 177
evolution, we do not think that variation in life-history traits drives the variation that we observed within humans. 178
First, we see little variation within African populations, despite great social and cultural diversity across the 179
populations represented in our study. Second, variation in signature 1 across Eurasia seems very smooth (Figure 180
1B) and we know of no life-history traits that vary so smoothly across populations. Signature 2 varies less 181
smoothly, and the relative rate of CpG mutations is known to be affected by life-history traits, although the 182
magnitude of the signal seems too large. For example, in the 1000 Genomes project, the proportion of rare C>T 183
mutations at CpG sites (a lower bound for the difference in mutation rate) varies by up to ~5% across populations 184
(Figure 2 B), on the same order as the difference in the proportion of CpG substitutions between human and 185
baboon lineages [26]. Studies of de novo mutations in humans have found no significant effect of paternal age on 186
CpG:non-CpG mutation ratio [4,5], but we cannot exclude the possibility that variation in signature 2 is driven by 187
dramatic variation in some other life history trait. 188
189
Other than life-history traits, damage or repair rates might vary either because of inherited variation in 190
DNA repair efficiency or other biochemical processes, or because of external environmental factors. In the first 191
case, a promising approach to detecting a causal locus would be to map mutation rate variation, either in large 192
pedigrees or in an admixed population like African Americans. In the second, we might identify the factor by 193
looking for correlations with environmental features, through mutation accumulation experiments in other 194
species, or comparison with mutational signatures identified in cancer with known causes. So far, however, the 195
ultimate cause of this variation remains unknown. 196
197
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
7
It is important to understand changes in the mutation rate on the timescale of hominid evolution in order 198
to calibrate demographic models of human evolution [27] and the observation of variation in mutation rates 199
between populations [8] made this calibration even more complicated. One further consequence of our results is 200
that the rate of CpG mutations, often assumed to be almost “clock-like” [7,25,26] may also vary over short 201
timescales, meaning that they may not be as useful for model calibration as previously though. Further work in 202
this area will involve more detailed measurement of mutation rates in diverse populations – to date, most work on 203
somatic, cancer, or de novo germline mutations has been conducted in popualtions of West Eurasian origin – and 204
the extension of these approaches to other populations will be required to fully understand variation in mutation 205
rates and its consequences for demographic modeling. 206
207
Methods 208
209
Identifying mutational signatures 210
211
We used SNPs called in 300 individuals from the Simons Genome Diversity Project [12] (SGDP). 212
Variant sites were called at filter level 1, and then any site that was variable in any sample was genotyped in every 213
sample. We polarized SNPs with the ancestral allele inferred in the human-chimp ancestor, and classified by the 214
two flanking bases in the human reference (hg19). We restricted to sites of given frequencies and merged reverse 215
complement classes to give counts of SNPs occurring in 96 possible mutational classes. We then normalized these 216
counts by the frequency of ATA>ACA mutations. The remaining matrix represents the normalized intensity of 217
each mutation class in each sample, relative to the sample with the lowest intensity. Formally, let Cij be the 218
counts of mutations in class i for sample j. Then, the intensities that we analyze, Xij are given by, 219
220
Xij =Cij
C{ATA>C} j 221
222
We decomposed this matrix X using non-negative matrix factorization [13] implemented in the NMF R 223
package [14] with the multiplicative algorithm introduced by Lee & Seung (1999) [13], initialized using the non-224
negative components from the output of a fastICA analysis[28] implemented in the fastICA package in R 225
(https://cran.r-project.org/web/packages/fastICA/index.html). For the diagnostic plots in Supplementary Fig. 2, 226
we used 200 random starting points to compare the results of different runs. When we initialized the matrix 227
randomly, rather than using fastICA, we obtained a slightly closer fit to the data (root-mean-squared error in X of 228
0.024 vs 0.025) and similar factor distributions (Supplementary Fig. 8A), except that all signatures were 229
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
8
dominated by CpG mutations (Supplementary Fig. 8B). Removing a constant amount of each CpG mutation from 230
each signature recovered signatures close to the fastICA-initialized signatures (Supplementary Fig. 8C), so we 231
concluded that this was a model-fitting artifact, and did not reflect true signatures. Finally we performed the 232
analysis on a matrix normalized be the total number of mutations in each sample 𝐶!"! rather than the number of 233
ATA>C mutations. (Supplementary Figure 7). 234
235
The ordering of the factors is arbitrary so, where necessary, we reordered for interpretability. To plot 236
mutational signatures and compare with the COSMIC signatures, we rescaled the intensities of each class 237
according to the trinucleotide frequencies in the human reference genome. The scale of the weightings is therefore 238
not easily interpretable. To perform principal component analysis on X, we normalized so that the variance of 239
each row was equal to 1. 240
241
Analysis of 1000 Genomes data 242
243
We classified 1000 Genomes mutations according to the ancestral allele inferred by the 1000 Genomes 244
project, and counted the number of f2 and f3 variants carried by each individual in each mutation class. We ignored 245
SNPs that were multi-allelic or where the ancestral state was not confidently assigned (confident assignment 246
shown by a capital letter in the “AA” tag in the “INFO” field of the vcf file). We excluded the four outlying 247
samples: HG01149(CLM), NA20582 & NA20540 (TSI), NA12275(CEU), NA19728(MXL). 248
249
Analysis of ancient genomes 250
251
We identified heterozygous sites in five ancient genomes from published vcf files, and restricted to sites 252
where there was a single heterozygote in the SGDP. The corrected signature 1 log-ratio is defined by 253
254
M = log2X{TCC>A} jX{ACC>A} jX{TCT>A} jX{CCC>A} jX{TCA>A} jX{ACA>A} jX{TCA>A} jX{CCA>A} j
⎧⎨⎪
⎩⎪
⎫⎬⎪
⎭⎪ 255
256
and then normalized so that the distribution in African populations has mean 0 and standard deviation 1. We 257
estimated bootstrap quantiles by resampling the counts Cij for the ancient samples and recomputing M. 258
259
260
261
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
9
Transcriptional strand 262
263
We downloaded the knownGenes table of the UCSC genes track from the UCSC genome browser 264
(http://genome.ucsc.edu/). Taking the union of all transcripts in this table, we classified each base of the genome 265
according to whether it was transcribed on the + or – strand, both, or neither (including uncalled bases). These 266
regions totaled 607Mb, 637Mb, 36Mb and 1,599Mb of sequence respectively. We then counted mutations in our 267
dataset that occurred in regions that were transcribed on the + or – strand, ignoring regions where both or neither 268
strand was transcribed. 269
270
Methylation status 271
272
We downloaded the Testis_BC 1 and 2 (two technical replicates from the same sample) tables from the 273
HAIB Methyl RRBS track from the UCSC genome browser (http://genome.ucsc.edu/). We constructed a list of 274
33,305 sites where both replicates had >=50% methylation and another list of 166,873 sites where both replicates 275
had <50% methylation. We then classified the CpG mutations in our dataset according to which, if either, of these 276
lists they fell into. Ultimately, there were only 1186 classified mutations in the whole dataset, including 43 in 277
Native American samples and 12 in Native American samples with high rates of signature 2. Therefore, although 278
we found no significant interactions between methylation status and population, it may be simply that we lack 279
power to detect it. 280
281
B statistic and recombination rate 282
283
We classified each base of the genome according to which decile of B statistic [24] or HapMap 2 284
combined recombination rate [29] (in 1kb blocks) it fell into and counted mutations in each class. 285
286
Code availability 287
Scripts used to run the analysis are available from https://github.com/mathii/spectrum. 288
289
Acknowledgments 290
We thank Mark Lipson and Priya Moorjani for helpful comments. I.M. was supported by a long-term 291
fellowship from the Human Frontier Science Program LT001095/2014-L. D.R. is a Howard Hughes Medical 292
Institute investigator. 293
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
10
294
Figure 1: Distribution and characterization of mutational signatures 1 and 2. A: Factor coefficients for these two 295
signatures, for 300 individual samples colored by region. B: Geographic representation of the factor loadings from 296
panel A. Darker colors represent higher loadings. C: Characterization of the signatures in terms of mutation 297
intensity for each of 96 possible classes. Bars are scaled by the frequency of each trinucleotide in the human 298
reference genome. Below, the most highly correlated signatures from the COSMIC database are shown for 299
comparison. 300
301
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
0
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ●● ● ● ●
●● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ● ●● ●
● ● ● ● ●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00
.05
0.1
00
.15
0.2
00
.25
Component 4
Pro
po
rtio
n o
f m
uta
tio
ns
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
PCA4
PC
A1
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PC
A3
Signature 1
Signature 1 Signature 2
Signature 2
A B
CC>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>G
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
5
Component 2
Loadin
g
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ●●
● ● ●●
● ● ●● ●
● ●●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●● ●
●
●● ●
●
● ● ●
●
●
● ● ●●
● ● ● ● ● ● ● ● ● ● ● ●
●●
● ●● ● ● ●
●● ●
● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00
.05
0.1
00
.15
Component 1
Lo
ad
ing
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
COSMIC Signature 11 COSMIC Signature 1
Chane-1
Piapoco-2
Quechua-3
Mayan-1
Mayan-2 Quechua-1
Nahua-2
Nahua-1
Quechua-2
Mixtex-1Australian-3
Zapotec-1
Sign
atur
e 2
Signature 1
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
11
302 Figure 2: Signatures 1 and 2 in the 1000 Genomes. A: Proportions of f2 and f3 variants in signature 1 (here 303
defined as TCT>T, TCC>T, CCC>T and ACC>T) in each 1000 Genomes individual, by population. B: 304
Proportions of f2 and f3 variants in signature 2 (here defined as NCG>T, for any N) in each 1000 Genomes 305
individual, by population (five outlying samples excluded). 306
1000 Genomes population labels
ASW African Ancestry in Southwest USA
ACB African Caribbean in Barbados
BEB Bengali in Bangladesh
GBR British From England and Scotland, UK
CDX Chinese Dai - Xishuangbanna, China
CHD Chinese in Metropolitan Denver, USA
CLM Colombian in Medellin, Colombia
ESN Esan in Nigeria
FIN Finnish in Finland
GIH Gujarati Indians in Houston, USA
CHB Han Chinese in Beijing, China
CHS Han Chinese South, China
IBS Iberian populations in Spain
ITU Indian Telugu in the UK
JPT Japanese in Tokyo, Japan
KHV Kinh in Ho Chi Minh City, Vietnam
LWK Luhya in Webuye, Kenya
MKK Maasai in Kinyawa, Kenya
MSL Mende in Sierra Leone
MXL Mexican Ancestry in Los Angeles, USA
PEL Peruvian in Lima, Peru
PUR Puerto Rican in Puerto Rico
PJL Punjabi in Lahore, Pakistan
STU Sri Lankan Tamil in the UK
TSI Toscani in Italia
YRI Yoruba in Ibadan, Nigeria
Population
Sig
na
ture
1
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
● ●
●
●
●
●
●●
● ●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●●●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
● ●●
●
●
●
●
●
● ●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●● ●
●●
●
●●
●
●●
●●
● ●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●● ●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
● ●● ●
●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
● ●●
●
●●
●
●
●●
●●
●●
●●
●●
●
●
●●●
●●
●
●
●●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
0.0
70
.08
0.0
90
.10
KHV CHS CHB JPT PEL CDX MXL GWD MSL ASW ACB ESN YRI LWK CLM FIN PUR BEB STU GIH ITU GBR IBS CEU PJL TSI
A
B
Population
Sig
na
ture
2
●
●
●●●
●●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●● ●
●
●
●●
●
●
●
●
●●●
●●
●
●
●●
●●
●●
●
●●●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
● ●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
● ●
●
●●
●
●
●●●
●
●
●
●●
●
●
●● ●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
● ●
●
●●● ●● ●
●
●
●
●
●●●
●
●●
●●
●
● ●●
●
● ●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●●
●
●
●●
●
●
●
●
●● ●
●
●
●
●●
● ●
●
●
●● ●
●
●●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●●
●
●●
●
●●
●
●
●
●●●●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●●
●
● ●
●
●
●
●
●
● ●●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●●●
●
●●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
● ●
●●
●● ●
●
●●
●
●●
●●●
● ●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●●●
●●
●
● ●●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
● ●
●
●
●●
●●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●● ●
●●
●
●
●
●●
●●
● ●
● ●
●●
● ●
●
●●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●●●
●
●
●●
●●
●
●●
●
●●
●●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●● ●●
●
●
●●
●●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●●●
●●
●
●
●●
●
●●● ●
●
●
●
●●
●
●●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
● ●
●●●
●
● ●●
●
●●
●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●
●
●
● ●
●
●
●●●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
● ●
●
●
●
●●●
●
●
●●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●●
●●
●
●
●
●●
●
●
●
●
●
●●
●●
●●
●●
●●
●
●●
●●
●● ●● ●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●●
●●
● ●
●
●
●
●
●
●
●
●● ●
●●
●
●●
●●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●●
●●
●
●
●●
●●
●
●●
●●●● ●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
● ● ●
●
●● ●
●
●●
●●
●
●
● ●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●●●
● ●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●●
● ●
●
●
●
●
●●
● ●
●
●
●●
●
● ●●●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●● ●●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●● ●●
●●●
●
●●
●●
●
● ●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●● ●
●
● ●
●
●
●●
●
●
●
●
●
● ●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●●●
●
●●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
0.1
50
.20
0.2
5
LWK MSL PUR FIN ESN GWD ASW CDX ACB GIH YRI STU JPT CLM KHV CHS ITU PJL GBR BEB CHB CEU IBS TSI MXL PEL
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
12
307
Figure 3: Signature 1, corrected to be more robust to ancient DNA damage (Methods), for f2 variants in the SGDP 308
individuals, by region, and in five high coverage ancient genomes. Solid lines show 5-95% bootstrap quantiles. 309
M2
●
●
●●
●●●
●
●
● ●●
●● ● ●
●
●
●●
●
●● ●
●
●
● ●●
●●
●
●●
●
●
●
●
●
●
●●●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●●
●
●
●
●●
●
−5
0
5
10
15A
fric
a
Oceania
EastA
sia
Centr
alA
sia
Sib
eri
a
Am
eri
ca
South
Asia
WestE
ura
sia
Loschbour
Stu
ttgart
Ust' I
shim
Neandert
al
Denis
ova
●
●
●●
●
8,00
0 ye
ar-o
ld E
urop
ean
hunt
er-g
athe
rer
7,00
0 ye
ar-o
ld E
urop
ean
farm
er
45,0
00 y
ear-
old
Sibe
rian
Alta
i Nea
nder
thal
Alta
i Den
isova
n
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
13
310 Figure 4: Transcriptional strand bias in mutational signatures. We plot the log of the ratio of mutations occurring 311
on the untranscribed versus transcribed strand. Therefore a positive value indicates that the C>T mutation is more 312
common than the G>A mutation on the untranscribed (i.e. coding) strand. P values in brackets are, respectively, 313
ANOVA P-values for a difference between regions and t-test P-values for a difference between i) West Eurasia 314
and other regions (excluding South Asia) in A&B ii) 11American samples with high rates of signature 2 315
mutations and other regions in C&D. A: Boxplot of per-individual strand bias for mutations in signature 1 316
(TCT>T, TCC>T, CCC>T and ACC>T). One sample (S_Mayan-2) with an extreme value (0.48) is not shown. B: 317
Population-level means for each of the mutations comprising signature 1. C,D: as A&B but for signature 2. We 318
separated out the 11 American samples with high rates of signature 2 mutations. 319
●
●
●
●
−0.4
−0.2
0.0
0.2
0.4
Signature 1 (P=6e−06 , 7.4e−07)
Log−
ratio
stra
nd b
ias
Afric
a
Amer
ica
Cen
tralA
siaS
iber
ia
East
Asia
Oce
ania
Sout
hAsi
a
Wes
tEur
asia
●● ● ● ●
● ●
−0.4
−0.2
0.0
0.2
0.4
TCT>T (P=0.3 , 0.094)
●●
●● ● ●
●
−0.4
−0.2
0.0
0.2
0.4
TCC>T (P=2.2e−05 , 5.4e−08)
● ● ●●
●●
●
−0.4
−0.2
0.0
0.2
0.4
CCC>T (P=0.13 , 0.71)
●● ● ●
●●
●
−0.4
−0.2
0.0
0.2
0.4
ACC>T (P=0.049 , 0.087)
●
●
●
−0.4
−0.2
0.0
0.2
0.4
Signature 2 (P=0.23 , 0.029)
Log−
ratio
stra
nd b
ias
Afric
a
Amer
ica
(hig
h)
Amer
ica
(low
)
Cen
tralA
siaS
iber
ia
East
Asia
Oce
ania
Sout
hAsi
a
Wes
tEur
asia
● ●
●
● ● ● ● ●
−0.4
−0.2
0.0
0.2
0.4
ACG>T (P=0.044 , 0.63)
●
●
●●
● ●● ●
−0.4
−0.2
0.0
0.2
0.4
CCG>T (P=0.00005 , 1.5e−02)
●●● ●
●●
●●
−0.4
−0.2
0.0
0.2
0.4
GCG>T (P=0.014 , 0.49)
● ●●● ● ●
●●
−0.4
−0.2
0.0
0.2
0.4
TCG>T (P=0.25 , 0.84)
A B
C D
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
14
320 Figure 5: Dependence of mutational signatures on genomic features. A,B: dependence on conservation, measured 321
by B statistic (0=lowest B statistic; highest conservation). A: Comparison of proportions of signature 1 mutations 322
between West Eurasia and other populations (excluding South Asia). Blue dashed lines show a fitted linear model 323
of dependence with no interaction term. B: Comparison of proportions of signature 2 mutations between the 11 324
American samples with the highest proportions, and all other samples. Blue dashed lines show a fitted quadratic 325
model of dependence with no interaction term. C,D: As A&B, but showing dependence on recombination rate 326
decile computed in 1kb bins. 327
0.15
0.20
0.25
0.30
Recombination rate decile
Prop
ortio
n of
sig
natu
re 2
mut
atio
ns
0−10% 10−20% 20−30% 30−40% 40−50% 50−60% 60−70% 70−80% 80−90% 90−100%
0.04
0.06
0.08
0.10
0.12
Recombination rate decile
Prop
ortio
n of
sig
natu
re 1
mut
atio
ns
0−10% 10−20% 20−30% 30−40% 40−50% 50−60% 60−70% 70−80% 80−90% 90−100%
0.04
0.06
0.08
0.10
0.12
B statistic decile
Prop
ortio
n of
sig
natu
re 1
mut
atio
ns
0−10% 10−20% 20−30% 30−40% 40−50% 50−60% 60−70% 70−80% 80−90% 90−100%
0.15
0.20
0.25
0.30
B statistic decile
Prop
ortio
n of
sig
natu
re 2
mut
atio
ns0−10% 10−20% 20−30% 30−40% 40−50% 50−60% 60−70% 70−80% 80−90% 90−100%
A B
C D
West EurasiaExcluding West Eurasia and South Asia
11 American samples with strongest sig. 2Other samples
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
15
References 328
329
1. Segurel L, Wyman MJ, Przeworski M (2014) Determinants of mutation rate variation in the human germline. 330 Annu Rev Genomics Hum Genet 15: 47-70. 331
2. Scally A (2016) Mutation rates and the evolution of germline structure. Philos Trans R Soc Lond B Biol Sci 332 371. 333
3. Genome of the Netherlands C (2014) Whole-genome sequence variation, population structure and demographic 334 history of the Dutch population. Nat Genet 46: 818-825. 335
4. Kong A, Frigge ML, Masson G, Besenbacher S, Sulem P, et al. (2012) Rate of de novo mutations and the 336 importance of father's age to disease risk. Nature 488: 471-475. 337
5. Rahbari R, Wuster A, Lindsay SJ, Hardwick RJ, Alexandrov LB, et al. (2016) Timing, rates and spectra of 338 human germline mutation. Nat Genet 48: 126-133. 339
6. Amster G, Sella G (2016) Life history effects on the molecular clock of autosomes and sex chromosomes. Proc 340 Natl Acad Sci U S A 113: 1588-1593. 341
7. Gao Z, Wyman MJ, Sella G, Przeworski M (2016) Interpreting the Dependence of Mutation Rates on Age and 342 Time. PLoS Biol 14: e1002355. 343
8. Harris K (2015) Evidence for recent, population-specific evolution of the human mutation rate. Proc Natl Acad 344 Sci U S A 112: 3439-3444. 345
9. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, et al. (2013) Signatures of mutational 346 processes in human cancer. Nature 500: 415-421. 347
10. Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR (2013) Deciphering signatures of 348 mutational processes operative in human cancer. Cell Rep 3: 246-259. 349
11. Behjati S, Huch M, van Boxtel R, Karthaus W, Wedge DC, et al. (2014) Genome sequencing of normal cells 350 reveals developmental lineages and mutational processes. Nature 513: 422-425. 351
12. Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, et al. (2016) The landscape of human genome diversity. 352 In press. 353
13. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401: 354 788-791. 355
14. Gaujoux R, Seoighe C (2010) A flexible R package for nonnegative matrix factorization. BMC 356 Bioinformatics 11: 367. 357
15. Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, et al. (2015) COSMIC: exploring the world's 358 knowledge of somatic mutations in human cancer. Nucleic Acids Res 43: D805-811. 359
16. Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, et al. (2015) A global reference for 360 human genetic variation. Nature 526: 68-74. 361
17. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, et al. (2014) Ancient human genomes suggest three 362 ancestral populations for present-day Europeans. Nature 513: 409-413. 363
18. Fu Q, Li H, Moorjani P, Jay F, Slepchenko SM, et al. (2014) Genome sequence of a 45,000-year-old modern 364 human from western Siberia. Nature 514: 445-449. 365
19. Prufer K, Racimo F, Patterson N, Jay F, Sankararaman S, et al. (2014) The complete genome sequence of a 366 Neanderthal from the Altai Mountains. Nature 505: 43-49. 367
20. Meyer M, Kircher M, Gansauge MT, Li H, Racimo F, et al. (2012) A high-coverage genome sequence from 368 an archaic Denisovan individual. Science 338: 222-226. 369
21. Briggs AW, Stenzel U, Meyer M, Krause J, Kircher M, et al. (2010) Removal of deaminated cytosines and 370 detection of in vivo methylation in ancient DNA. Nucleic Acids Res 38: e87. 371
22. Green P, Ewing B, Miller W, Thomas PJ, Program NCS, et al. (2003) Transcription-associated mutational 372 asymmetry in mammalian evolution. Nat Genet 33: 514-517. 373
23. Consortium EP (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-374 74. 375
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
16
24. McVicker G, Gordon D, Davis C, Green P (2009) Widespread genomic signatures of natural selection in 376 hominid evolution. PLoS Genet 5: e1000471. 377
25. Kim SH, Elango N, Warden C, Vigoda E, Yi SV (2006) Heterogeneous genomic molecular clocks in 378 primates. PLoS Genet 2: e163. 379
26. Hwang DG, Green P (2004) Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral 380 substitution patterns in mammalian evolution. Proc Natl Acad Sci U S A 101: 13994-14001. 381
27. Scally A, Durbin R (2012) Revising the human mutation rate: implications for understanding human 382 evolution. Nat Rev Genet 13: 745-753. 383
28. Hyvärinen A (1999) Fast and Robust Fixed-Point Algorithms for Independent Component Analysis. IEEE 384 Transactions on Neural Networks 10. 385
29. International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437: 1299-1320. 386 387
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
17
388 Supplementary Figure 1: Diagnostic plots for NMF using variants of different frequencies. Rows from top- 389
variants at frequency 2, 3 and 4. Columns- each plot shows the value of a measure, computed over 50 random 390
start points, for factorization ranks from 2 to 8. From left to right: Dispersion, a measure of reproducibility of 391
clusters across runs (1=perfectly reproducible); Residual sum of squares (lower=better fit); Silhouette, a measure 392
of how reliably elements can be assigned to clusters (1=perfectly reliably). In supplementary plots, we denote the 393
signatures obtained from fr variants with rank k by signaturer,k, so the signatures in the main text are equivalent to 394
signature2,4. 395
●
●
●●
●
●
●
●
●
●
●
● ●●
●
●
●●
●
●
●
●
●
● ●
● ●
●
● ●
● ●
●
●
●
dispersion rss silhouette
0.6
0.7
0.8
0.9
1.0
20
30
40
0.00
0.25
0.50
0.75
1.00
2 3 4 5 6 7 8 2 3 4 5 6 7 8 2 3 4 5 6 7 8Factorization rank
Measure type●
●
●
●
Basis
Best fit
Coefficients
Consensus
NMF rank survey
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
dispersion rss silhouette
0.4
0.5
0.6
0.7
0.8
0.9
20
25
30
35
40
45
0.25
0.50
0.75
1.00
2 3 4 5 6 7 8 2 3 4 5 6 7 8 2 3 4 5 6 7 8Factorization rank
Measure type●
●
●
●
Basis
Best fit
Coefficients
Consensus
NMF rank survey
● ●
●
●
●●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●● ● ●
●
●
●
●
●
●●
●
●
dispersion rss silhouette
0.6
0.7
0.8
0.9
1.0
25
30
35
40
45
0.00
0.25
0.50
0.75
1.00
2 3 4 5 6 7 8 2 3 4 5 6 7 8 2 3 4 5 6 7 8Factorization rank
Measure type●
●
●
●
Basis
Best fit
Coefficients
Consensus
NMF rank survey
f2
f3
f4
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
18
396 Supplementary Figure 2: Distribution and characterization of mutational signatures2,4 3 and 4. A: Per-sample 397
coefficients for signatures 3 and 4. B: Geographic distribution of signatures 3 and 4. C: Mutational spectrum of 398
signature 3. D: Mutational spectrum of signature 4. E-F: Comparison of loadings of 1 and 2 with signatures 3 and 399
4. 400
●
●●
● ●●
●
●
● ●
●
● ●● ●
●
● ●
●
●
●● ● ●
●● ● ● ●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ● ●
●●
●●
● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●● ● ● ● ●
● ● ● ●
● ●
● ●
0.0
00.0
10.0
20.0
30.0
40.0
50.0
6
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
●
●
● ● ●
●
●
●
●
●
●●
● ● ●
●● ● ● ● ● ● ●
●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ●● ● ● ● ●
● ● ● ●● ●
● ●●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ●
● ●
●
●
● ● ● ●
0.0
00
.05
0.1
00
.15
0.2
00
.25
Component 3
Pro
po
rtio
n o
f m
uta
tio
ns
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
Han-3
Pima-2
MIxe-3
Mongola-1
Russian-1
Dai-4
Miao-1
Miao-2
Quechua-2
Signature 3
Sign
atur
e 4
Signature 3
Signature 4
Signature 3 Signature 4
C>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>G
Signature 4
Sign
atur
e 1
Signature 2
Sign
atur
e 3
A B
C D
E FCell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
19
401 Supplementary Figure 3: Mutational signatures inferred from f3 variants with rank 4. A: Factor coefficients for 402
signature3,4 1 and 2 – compare with Figure 1A. B: Factor coefficients for signature3,4 3 and 4 – compare with 403
Supplementary Figure 2A. C: Mutational signatures3,4 1-4 – compare with Figure 1 C and D, and Supplementary 404
Figure 2 C and D. 405
●
●
●
● ● ●
●
●
●●
●
● ● ●
●●
● ●
●●
●●
●
●● ●
●● ● ●
●●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ● ● ● ●
● ● ●
●● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
● ●●
● ● ● ● ●
● ● ● ●
● ●● ●
0.0
00.0
20.0
40.0
60.0
8
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●●
●
●
●
●
●
● ● ●
●
●● ● ●
●
●
● ● ● ● ● ●●
●
● ● ●●
● ● ●
●
● ● ● ●
●
● ●
●
● ●
●
●
●
● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●
●
●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
00.3
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ●●
● ● ●
●
● ● ● ● ● ● ●●
● ● ●●
● ● ●● ● ● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
PCA2
PC
A4
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA3
PC
A1
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA4
PC
A3
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA2
PC
A1
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA2
PC
A4
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA3
PC
A1
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA4
PC
A3
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA2
PC
A1
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
Signature3,4 1 Signature3,4 3
Sign
atur
e 3,4 2
Sign
atur
e 3,4 4
C>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>G
Signature3,4 1
Signature3,4 2
Signature3,4 3
Signature3,4 4
A B
C
Nahua-2
Chipewyan-2
Quechua-2
Mayan-2
Pima-2
Piapoco-2
Zapotec-1
Mixe-1
Pima-2
Han-3
Dai-4
Russian-1
Miao-2
Mongola-1
Karitiana-3
Quechua-2
Chippewayan-2
Nahua-2
Piapoco-1Australian-3
Australian-4
●
●
●
● ● ●
●
●
●●
●
● ● ●
●●
● ●
●●
●●
●
●● ●
●● ● ●
●●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ● ● ● ●
● ● ●
●● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
● ●●
● ● ● ● ●
● ● ● ●
● ●● ●
0.0
00.0
20.0
40.0
60.0
8
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●●
●
●
●
●
●
● ● ●
●
●● ● ●
●
●
● ● ● ● ● ●●
●
● ● ●●
● ● ●
●
● ● ● ●
●
● ●
●
● ●
●
●
●
● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●
●
●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
00.3
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ●●
● ● ●
●
● ● ● ● ● ● ●●
● ● ●●
● ● ●● ● ● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●
● ● ●
●
●
●●
●
● ● ●
●●
● ●
●●
●●
●
●● ●
●● ● ●
●●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ● ● ● ●
● ● ●
●● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
● ●●
● ● ● ● ●
● ● ● ●
● ●● ●
0.0
00.0
20.0
40.0
60.0
8
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●●
●
●
●
●
●
● ● ●
●
●● ● ●
●
●
● ● ● ● ● ●●
●
● ● ●●
● ● ●
●
● ● ● ●
●
● ●
●
● ●
●
●
●
● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●
●
●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
00.3
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ●●
● ● ●
●
● ● ● ● ● ● ●●
● ● ●●
● ● ●● ● ● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●
● ● ●
●
●
●●
●
● ● ●
●●
● ●
●●
●●
●
●● ●
●● ● ●
●●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ● ● ● ●
● ● ●
●● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
● ●●
● ● ● ● ●
● ● ● ●
● ●● ●
0.0
00.0
20.0
40.0
60.0
8
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●●
●
●
●
●
●
● ● ●
●
●● ● ●
●
●
● ● ● ● ● ●●
●
● ● ●●
● ● ●
●
● ● ● ●
●
● ●
●
● ●
●
●
●
● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●
●
●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
00.3
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ●●
● ● ●
●
● ● ● ● ● ● ●●
● ● ●●
● ● ●● ● ● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
20
406 Supplementary Figure 4: Mutational signatures inferred from f3 variants with rank 3. A: Factor coefficients for 407
signature3,3 1 and 2. B: Factor coefficients for signature3,3 3 and. C: Mutational signatures3,3 1-3. 408
PCA2
PC
A3
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA2
PC
A1
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
C>A C>G C>T T>A T>C T>GC>A C>G C>T T>A T>C T>G
Signature3,3 1
Signature3,3 1
Signature3,3 2
Signature3,3 3
Signature3,3 2
Sign
atur
e 3,3 2
Sign
atur
e 3,3 3
A B
C●
●●
● ●●
●
●● ●
●
● ● ●
●● ● ●
● ●●
● ●●
● ● ● ● ● ●● ●
●
●
●
● ● ●
●
● ●●
●
●
●
●
●
●●
● ●● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●●
●
●● ●
●● ●
●● ● ● ● ● ● ● ●
● ● ● ●● ● ● ●
0.0
00.0
50.1
00.1
5
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ● ● ●●
● ● ●● ● ● ●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●
●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
00.3
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
0.1
0.2
0.3
0.4
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
● ●●
●
●● ●
●
● ● ●
●● ● ●
● ●●
● ●●
● ● ● ● ● ●● ●
●
●
●
● ● ●
●
● ●●
●
●
●
●
●
●●
● ●● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●●
●
●● ●
●● ●
●● ● ● ● ● ● ● ●
● ● ● ●● ● ● ●
0.0
00.0
50.1
00.1
5
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ● ● ●●
● ● ●● ● ● ●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●
●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
00.3
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
0.1
0.2
0.3
0.4
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
● ●●
●
●● ●
●
● ● ●
●● ● ●
● ●●
● ●●
● ● ● ● ● ●● ●
●
●
●
● ● ●
●
● ●●
●
●
●
●
●
●●
● ●● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●●
●
●● ●
●● ●
●● ● ● ● ● ● ● ●
● ● ● ●● ● ● ●
0.0
00.0
50.1
00.1
5
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ● ● ●●
● ● ●● ● ● ●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●
●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
00.3
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
0.1
0.2
0.3
0.4
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
Pima-2
Mixe-1
Nahua-2
Han-3
Quechua-3
Dai-4Piapoco-2
Mayan-2
Chipewyan-2
Zapotec-2Quechua-3
Chane-1 Kurumba-1
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
21
409 410
Supplementary Figure 5: Analysis of f1 variants A: The first two principal components of the mutational 411
spectrum of f1 variants, showing the difference between cell line and primary tissue derived samples. B&C: 412
Mutational signatures inferred from f1 variants with rank 2, but excluding cell line samples. B: Factor loadings for 413
signature1*,2 1 and 2 (asterisk denotes no cell lines). C: Mutational signatures1*,2 1 and 2. Signature1*,2 1 is 414
confounded with CpG mutations in this case, but clearly shows an elevated level of TCC>T and ACC>T 415
mutations. 416
PCA1
PC
A2
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
●●●●●●●●●●●
●●●
●
●●●●●●●●
●●●●●●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●●●●●●●●●●●●●●●
●
●
●
●
●
●●
●●●
●●
●●
●●●●●●●●●●●●●●●●●●
0.0
00
.04
0.0
80
.12
Component 1
Pro
po
rtio
n o
f m
uta
tio
ns
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.AAT
C.A
AT
G.A
AT
T.A
CTA
.AC
TC
.AC
TG
.AC
TT.A
GTA
.AG
TC
.AG
TG
.AG
TT.A
TTA
.AT
TC
.AT
TG
.AT
TT.A
ATA
.CAT
C.C
AT
G.C
AT
T.C
CTA
.CC
TC
.CC
TG
.CC
TT.C
GTA
.CG
TC
.CG
TG
.CG
TT.C
TTA
.CT
TC
.CT
TG
.CT
TT.C
ATA
.GAT
C.G
AT
G.G
AT
T.G
CTA
.GC
TC
.GC
TG
.GC
TT.G
GTA
.GG
TC
.GG
TG
.GG
TT.G
TTA
.GT
TC
.GT
TG
.GT
TT.G
●●●
●●●●●●●
●●●●●●●●●●●●●
●●●●●●●●●●
●
●
●●
●
●
●●●
●
●
●●
●
●●●●●●●●●●●●●●●●●
●●●●
●●●●●
●●●
●●●●●●●●●●●●●●●●●●●●
0.0
00
.05
0.1
00
.15
Component 2
Pro
po
rtio
n o
f m
uta
tio
ns
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.AAT
C.A
AT
G.A
AT
T.A
CTA
.AC
TC
.AC
TG
.AC
TT.A
GTA
.AG
TC
.AG
TG
.AG
TT.A
TTA
.AT
TC
.AT
TG
.AT
TT.A
ATA
.CAT
C.C
AT
G.C
AT
T.C
CTA
.CC
TC
.CC
TG
.CC
TT.C
GTA
.CG
TC
.CG
TG
.CG
TT.C
TTA
.CT
TC
.CT
TG
.CT
TT.C
ATA
.GAT
C.G
AT
G.G
AT
T.G
CTA
.GC
TC
.GC
TG
.GC
TT.G
GTA
.GG
TC
.GG
TG
.GG
TT.G
TTA
.GT
TC
.GT
TG
.GT
TT.G
Signature1*,2 1
Sign
atur
e 1*,2 2
Signature1*,2 1
Signature1*,2 2
B C
−10 0 10 20 30 40 50
−50
51
0
PCA1
PC
A2
Cell line sample
Primary tissue sample
A
Hawaiian-1Surui-1
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
22
417 Supplementary Figure 6: Principal component analysis of the mutational spectrum of f2 variants. A&B: 418
Principal component positions. Labeled by sample source (A) and geographic region (B). C: Component loadings. 419
Note that principal components 2,3 and 4 correspond roughly to mutational signatures2,4 3, 2 and 1 respectively. 420
PCA1
PC
A2
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA2
PC
A3
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA3
PC
A4
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA4
PC
A5
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●● ●
●
●
0.0
50.1
00.1
5
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
● ●●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
● ●●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
● ●
●
●
● ●
●
●
● ●
●
−0.3
−0.2
−0.1
0.0
0.1
0.2
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●●
●
●
●●
●●
●
● ●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
● ●
●●
●
●●
● ●●
●●
● ●
●●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●●
●
−0.2
−0.1
0.0
0.1
0.2
0.3
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ● ●
●
●
●●
●
●● ●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
● ●●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
● ●● ●
●
●
●
●
● ●
●
●
●●
●
●
● ●
●
●
●
● ● ●
●
●
●● ● ●
● ●
●
●
−0.1
0.0
0.1
0.2
0.3
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●● ●
●
●
0.0
50.1
00.1
5
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
● ●●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
● ●●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
● ●
●
●
● ●
●
●
● ●
●
−0.3
−0.2
−0.1
0.0
0.1
0.2
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●●
●
●
●●
●●
●
● ●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
● ●
●●
●
●●
● ●●
●●
● ●
●●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●●
●
−0.2
−0.1
0.0
0.1
0.2
0.3
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ● ●
●
●
●●
●
●● ●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
● ●●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
● ●● ●
●
●
●
●
● ●
●
●
●●
●
●
● ●
●
●
●
● ● ●
●
●
●● ● ●
● ●
●
●
−0.1
0.0
0.1
0.2
0.3
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
B
C
−10 −5 0 5 10 15
−15
−10
−50
PCA1
PC
A2
Cell line sample
Primary tissue sample
A
Piapoco-2
Pima-2
Han-3Mixe-1
Piapoco-2
Pima-2
Han-3
Mixe-1
Han-3Mixe-1
Pima-2
Dai-4
Chane-1
Quechua-2
Piapoco-2
Chane-1
Nahua-1
Chane-1
Piapoco-2
Han-3
Mixe-1
Dai-4
Pima-2
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
23
421 Supplementary Figure 7: NMF analysis of f2 variants at rank 4 - as the main analysis, but normalizing the 422
mutational spectra by the total number of mutations in each sample, rather than the number of ATA>C mutations. 423
Component 2
Com
ponent 4
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
Component 3
Com
ponent 1
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
Component 3
Com
ponent 4
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
Component 3
Com
ponent 2
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
Signature 1 Signature 3
Sign
atur
e 2
Sign
atur
e 4
A B
C>A C>G C>T T>A T>C T>GC>A C>G C>T T>A T>C T>GC
C>A C>G C>T T>A T>C T>GC>A C>G C>T T>A T>C T>G
●
●●
● ● ●
●
●
● ●
●
● ●● ●
●
● ●
●
●
●● ● ●
●● ● ● ●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ● ●
●●
●●
● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●● ● ● ● ●
● ● ● ●
● ●
●●
0.0
00.0
10.0
20.0
30.0
40.0
50.0
60.0
7
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ●● ● ● ●
●● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ● ●● ●
● ● ● ●
●
●
●
●
●
● ●
●●
●
●
●●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
●
●
● ● ●
●
●
●
●
●
●●
● ● ●
●● ● ● ● ● ● ●
●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ●● ● ● ● ●
● ● ● ● ●●
● ●●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ●
●●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
● ● ●
●
●
● ●
●
● ●● ●
●
● ●
●
●
●● ● ●
●● ● ● ●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ● ●
●●
●●
● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●● ● ● ● ●
● ● ● ●
● ●
●●
0.0
00.0
10.0
20.0
30.0
40.0
50.0
60.0
7
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ●● ● ● ●
●● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ● ●● ●
● ● ● ●
●
●
●
●
●
● ●
●●
●
●
●●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
●
●
● ● ●
●
●
●
●
●
●●
● ● ●
●● ● ● ● ● ● ●
●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ●● ● ● ● ●
● ● ● ● ●●
● ●●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ●
●●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
● ● ●
●
●
● ●
●
● ●● ●
●
● ●
●
●
●● ● ●
●● ● ● ●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ● ●
●●
●●
● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●● ● ● ● ●
● ● ● ●
● ●
●●
0.0
00.0
10.0
20.0
30.0
40.0
50.0
60.0
7
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ●● ● ● ●
●● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ● ●● ●
● ● ● ●
●
●
●
●
●
● ●
●●
●
●
●●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
●
●
● ● ●
●
●
●
●
●
●●
● ● ●
●● ● ● ● ● ● ●
●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ●● ● ● ● ●
● ● ● ● ●●
● ●●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ●
●●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
● ● ●
●
●
● ●
●
● ●● ●
●
● ●
●
●
●● ● ●
●● ● ● ●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ● ● ● ●
●●
●●
● ● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●● ● ● ● ●
● ● ● ●
● ●
●●
0.0
00.0
10.0
20.0
30.0
40.0
50.0
60.0
7
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ●● ● ● ●
●● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ● ●● ●
● ● ● ●
●
●
●
●
●
● ●
●●
●
●
●●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
●
●
● ● ●
●
●
●
●
●
●●
● ● ●
●● ● ● ● ● ● ●
●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ●● ● ● ● ●
● ● ● ● ●●
● ●●
● ● ●●
● ● ●
●
● ● ● ● ● ● ● ● ● ●
●●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
● ● ●
●
● ● ●
●
● ● ●
●
● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
50.3
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
Signature 1 Signature 2
Signature 3 Signature 4
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;
24
424 Supplementary Figure 8: NMF analysis of f2 variants at rank 4 with random initialization of the NMF algorithm. 425
A: Distribution of signatures across samples. B: Mutational signatures 1-4 (starting top left, anticlockwise). C: 426
Mutational signatures 1-4 where, for each CpG mutation class, we subtracted the minimum over all four 427
signatures from the signature. 428
●
●●
● ● ●
●
●● ●
●
● ● ●●
● ● ●●
●● ●
●●
● ● ● ● ● ●
●●
●
●
●
● ● ●
●
●●
●
●
●
●
●
●
●●
● ●● ● ● ● ● ●
● ● ● ● ● ● ●
●
●
●
●
●
●
●●
●
●● ●
●●
●●
● ●● ● ● ● ● ●
● ● ●● ● ●
● ●
0.0
00.0
20.0
40.0
60.0
80.1
0
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
● ● ●
●
● ● ●
●
● ●●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ●
● ● ● ●● ● ● ●
● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●
● ●●
●
● ●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
● ●●
● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ●
● ● ● ● ●● ●
●
●● ●
●● ● ●
● ● ● ● ● ● ● ●●
● ● ●● ●
●●
●
● ● ●
●
● ● ●
●
●●
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●●
● ● ● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
5
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
● ● ●
●
●● ●
●
● ● ●●
● ● ●●
●● ●
●●
● ● ● ● ● ●
●●
●
●
●
● ● ●
●
●●
●
●
●
●
●
●
●●
● ●● ● ● ● ● ●
● ● ● ● ● ● ●
●
●
●
●
●
●
●●
●
●● ●
●●
●●
● ●● ● ● ● ● ●
● ● ●● ● ●
● ●
0.0
00.0
20.0
40.0
60.0
80.1
0
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
● ● ●
●
● ● ●
●
● ●●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ●
● ● ● ●● ● ● ●
● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●
● ●●
●
● ●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
● ●●
● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ●
● ● ● ● ●● ●
●
●● ●
●● ● ●
● ● ● ● ● ● ● ●●
● ● ●● ●
●●
●
● ● ●
●
● ● ●
●
●●
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●●
● ● ● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
5
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
● ● ●
●
●● ●
●
● ● ●●
● ● ●●
●● ●
●●
● ● ● ● ● ●
●●
●
●
●
● ● ●
●
●●
●
●
●
●
●
●
●●
● ●● ● ● ● ● ●
● ● ● ● ● ● ●
●
●
●
●
●
●
●●
●
●● ●
●●
●●
● ●● ● ● ● ● ●
● ● ●● ● ●
● ●
0.0
00.0
20.0
40.0
60.0
80.1
0
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
● ● ●
●
● ● ●
●
● ●●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ●
● ● ● ●● ● ● ●
● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●
● ●●
●
● ●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
● ●●
● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ●
● ● ● ● ●● ●
●
●● ●
●● ● ●
● ● ● ● ● ● ● ●●
● ● ●● ●
●●
●
● ● ●
●
● ● ●
●
●●
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●●
● ● ● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
5
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●●
● ● ●
●
●● ●
●
● ● ●●
● ● ●●
●● ●
●●
● ● ● ● ● ●
●●
●
●
●
● ● ●
●
●●
●
●
●
●
●
●
●●
● ●● ● ● ● ● ●
● ● ● ● ● ● ●
●
●
●
●
●
●
●●
●
●● ●
●●
●●
● ●● ● ● ● ● ●
● ● ●● ● ●
● ●
0.0
00.0
20.0
40.0
60.0
80.1
0
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
● ● ●
●
● ● ●
●
● ●●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
00.2
5
Component 2P
roport
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ●
● ● ● ●● ● ● ●
● ● ● ● ● ● ● ● ● ● ●●
● ● ● ● ● ● ● ● ●
●
●
●
●
●
●
●
● ●●
●
● ●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
● ●●
● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
5
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●● ●
● ● ● ● ●● ●
●
●● ●
●● ● ●
● ● ● ● ● ● ● ●●
● ● ●● ●
●●
●
● ● ●
●
● ● ●
●
●●
●
●
●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●●
● ● ● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●●
●
●
● ● ● ●
0.0
00.0
50.1
00.1
5
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
C>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>GB
CC>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>G
●
●
●
● ●●
●
●
● ●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●
● ● ● ● ●●
●
●●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ● ●●
●
● ●
●
●● ●
●●
0.0
00.0
10.0
20.0
30.0
4
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●●
● ● ● ● ● ● ● ● ●● ● ●
●● ● ● ● ● ● ●
●● ● ● ● ● ● ● ● ●
● ●
●
●● ●
●
● ●●
●
● ●●
●
●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
● ●● ●
● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●
● ●●
●
●●
●●
● ● ●●
● ● ●● ●
●●
●
●● ●
● ● ● ●● ●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
●
●●
● ● ● ●●
●●
●● ● ● ● ● ● ● ●
● ● ● ●● ● ● ●
0.0
00.0
20.0
40.0
60.0
80.1
00.1
2
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●
●
●● ●
●
●
●
●
●
●●
●
● ● ●
●
●●
●●
●●
●
●
● ● ●
●●
●
●
●
●● ●
●
●●
●
●
●
●
●
●
●
●●
●● ● ● ● ●
●●
● ● ● ● ● ●
●
●
●
●
●●
●●
●
● ●●
● ● ●●
● ●● ● ● ● ● ●
●
●
●
●
● ●●
●
0.0
00.0
20.0
40.0
60.0
80.1
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●
● ●●
●
●
● ●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●
● ● ● ● ●●
●
●●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ● ●●
●
● ●
●
●● ●
●●
0.0
00.0
10.0
20.0
30.0
4
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●●
● ● ● ● ● ● ● ● ●● ● ●
●● ● ● ● ● ● ●
●● ● ● ● ● ● ● ● ●
● ●
●
●● ●
●
● ●●
●
● ●●
●
●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
● ●● ●
● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●
● ●●
●
●●
●●
● ● ●●
● ● ●● ●
●●
●
●● ●
● ● ● ●● ●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
●
●●
● ● ● ●●
●●
●● ● ● ● ● ● ● ●
● ● ● ●● ● ● ●
0.0
00.0
20.0
40.0
60.0
80.1
00.1
2
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●
●
●● ●
●
●
●
●
●
●●
●
● ● ●
●
●●
●●
●●
●
●
● ● ●
●●
●
●
●
●● ●
●
●●
●
●
●
●
●
●
●
●●
●● ● ● ● ●
●●
● ● ● ● ● ●
●
●
●
●
●●
●●
●
● ●●
● ● ●●
● ●● ● ● ● ● ●
●
●
●
●
● ●●
●
0.0
00.0
20.0
40.0
60.0
80.1
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●
● ●●
●
●
● ●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●
● ● ● ● ●●
●
●●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ● ●●
●
● ●
●
●● ●
●●
0.0
00.0
10.0
20.0
30.0
4
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●●
● ● ● ● ● ● ● ● ●● ● ●
●● ● ● ● ● ● ●
●● ● ● ● ● ● ● ● ●
● ●
●
●● ●
●
● ●●
●
● ●●
●
●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
● ●● ●
● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●
● ●●
●
●●
●●
● ● ●●
● ● ●● ●
●●
●
●● ●
● ● ● ●● ●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
●
●●
● ● ● ●●
●●
●● ● ● ● ● ● ● ●
● ● ● ●● ● ● ●
0.0
00.0
20.0
40.0
60.0
80.1
00.1
2
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●
●
●● ●
●
●
●
●
●
●●
●
● ● ●
●
●●
●●
●●
●
●
● ● ●
●●
●
●
●
●● ●
●
●●
●
●
●
●
●
●
●
●●
●● ● ● ● ●
●●
● ● ● ● ● ●
●
●
●
●
●●
●●
●
● ●●
● ● ●●
● ●● ● ● ● ● ●
●
●
●
●
● ●●
●
0.0
00.0
20.0
40.0
60.0
80.1
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
●
●
● ●●
●
●
● ●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●
● ● ● ● ●●
●
●●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ● ●●
●
● ●
●
●● ●
●●
0.0
00.0
10.0
20.0
30.0
4
Component 1
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●●
● ● ● ● ● ● ● ● ●● ● ●
●● ● ● ● ● ● ●
●● ● ● ● ● ● ● ● ●
● ●
●
●● ●
●
● ●●
●
● ●●
●
●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
● ●● ●
● ●●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0.0
00.0
50.1
00.1
50.2
0
Component 2
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●
● ●●
●
●●
●●
● ● ●●
● ● ●● ●
●●
●
●● ●
● ● ● ●● ●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●●
●
●●
● ● ● ●●
●●
●● ● ● ● ● ● ● ●
● ● ● ●● ● ● ●
0.0
00.0
20.0
40.0
60.0
80.1
00.1
2
Component 3
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
●
● ●
●
●● ●
●
●
●
●
●
●●
●
● ● ●
●
●●
●●
●●
●
●
● ● ●
●●
●
●
●
●● ●
●
●●
●
●
●
●
●
●
●
●●
●● ● ● ● ●
●●
● ● ● ● ● ●
●
●
●
●
●●
●●
●
● ●●
● ● ●●
● ●● ● ● ● ● ●
●
●
●
●
● ●●
●
0.0
00.0
20.0
40.0
60.0
80.1
0
Component 4
Pro
port
ion o
f m
uta
tions
AC
A.A
AC
C.A
AC
G.A
AC
T.A
CC
A.A
CC
C.A
CC
G.A
CC
T.A
GC
A.A
GC
C.A
GC
G.A
GC
T.A
TC
A.A
TC
C.A
TC
G.A
TC
T.A
AC
A.G
AC
C.G
AC
G.G
AC
T.G
CC
A.G
CC
C.G
CC
G.G
CC
T.G
GC
A.G
GC
C.G
GC
G.G
GC
T.G
TC
A.G
TC
C.G
TC
G.G
TC
T.G
AC
A.T
AC
C.T
AC
G.T
AC
T.T
CC
A.T
CC
C.T
CC
G.T
CC
T.T
GC
A.T
GC
C.T
GC
G.T
GC
T.T
TC
A.T
TC
C.T
TC
G.T
TC
T.T
ATA
.A
AT
C.A
AT
G.A
AT
T.A
CTA
.A
CT
C.A
CT
G.A
CT
T.A
GTA
.A
GT
C.A
GT
G.A
GT
T.A
TTA
.A
TT
C.A
TT
G.A
TT
T.A
ATA
.C
AT
C.C
AT
G.C
AT
T.C
CTA
.C
CT
C.C
CT
G.C
CT
T.C
GTA
.C
GT
C.C
GT
G.C
GT
T.C
TTA
.C
TT
C.C
TT
G.C
TT
T.C
ATA
.G
AT
C.G
AT
G.G
AT
T.G
CTA
.G
CT
C.G
CT
G.G
CT
T.G
GTA
.G
GT
C.G
GT
G.G
GT
T.G
TTA
.G
TT
C.G
TT
G.G
TT
T.G
PCA3
PC
A2
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA4
PC
A1
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA2
PC
A4
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
PCA1
PC
A3
Cell lines
Other
●
●
●
●
●
●
●
EastAsia
Africa
WestEurasia
America
Oceania
SouthAsia
CentralAsiaSiberia
A
Signature2,4 1 Signature2,4 3
Sign
atur
e 2,42
Sign
atur
e 2,4 4
. CC-BY 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/063578doi: bioRxiv preprint first posted online Jul. 13, 2016;