Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Evolution and molecular characteristics of SARS-CoV-2 1
genome 2
Yunmeng Bai1#, Dawei Jiang1#, Jerome R Lon1#, Xiaoshi Chen1, Meiling Hu1, Shudai 3
Lin1, Zixi Chen1, Xiaoning Wang1,2, Yuhuan Meng3*, Hongli Du1* 4
1School of Biology and Biological Engineering, South China University of Technology, 5
Guangzhou 510006, China 6
2State Clinic Center of Gearitic, Chinese PLA General Hospital, Beijing 100853, 7
China 8
3Guangzhou KingMed Transformative Medicine Institute Co., Ltd, Guangzhou 9
510330, China 10
# Equal contribution. 11
* Corresponding authors. 12
E-mail: [email protected](Du HL), [email protected](Meng YH) 13
14
There are 6 figures, 4 tables, 4 supplementary figures and 6 supplementary tables. 15
16
Abstract 17
In the evolution analysis of 622 or 3624 complete SARS-CoV-2 genomes, the 18
estimated Ka/Ks ratio is 1.008 or 1.094, which is higher than that of SARS-CoV and 19
MERS-CoV, and the time to the most recent common ancestor (tMRCA) is inferred in 20
late September 2019, indicating that SARS-CoV-2 may have completed positive 21
selection pressure of the cross-host evolution in the early stage. Further we find 9 key 22
specific sites of highly linkage which play a decisive role in the classification of 23
SARS-CoV-2. Notably the frequencies of haplotype TTTG and H1 are generally high 24
in European countries and correlated to death rate (r>0.37) based on 3624 or 16373 25
SARS-CoV-2 genomes, which indicated that the haplotypes might be related to 26
pathogenicity of SARS-CoV-2. The development trends in chronological order and 27
Ka/Ks ratio of each major haplotype subgroup showed the four major haplotype 28
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
subgroups were going through different evolution patterns, the H3 haplotype 29
subgroup was going through a purifying evolution and almost disappeared after 30
detection, while H1 haplotype subgroup was going through a neutral evolution and 31
globally increased with time. The evolution and molecular characteristics of more 32
than 16000 genomic sequences provided a new perspective for revealing 33
epidemiology of SARS-CoV-2 and coping with SARS-CoV-2 effectively. 34
KEYWORDS: SARS-CoV-2; evolution; classification; haplotype 35
36
Introduction 37
The global outbreak of SARS-CoV-2 is currently and increasingly recognized as a 38
serious, public health concern worldwide. To date, a total of seven types of 39
coronaviruses that can infect humans have been found, among which the more typical 40
ones are SARS-CoV, MERS-CoV and SARS-CoV-2. They spread across species and 41
usually cause mild to severe respiratory diseases. The total number of SARS-CoV and 42
MERS-CoV infections are only 8,069 and 2,494 with reproduction number (R0) 43
fluctuates 2.5-3.9 and 0.3-0.8, respectively. However, SARS-CoV-2 has infected 44
2471136 people in 212 countries up to April 22th(1), with the basic R0 ranging from 45
1.4 to 6.49(2). Among these three typical coronavirus, MERS-CoV has the highest 46
death rate of 34.40%, SARS-CoV has the modest death rate of 9.59%, SARS-CoV-2 47
is about 6.99% and 7.69% death rate of the global and Wuhan, but is quite high death 48
rate in some countries of Europe, such as Belgium, Italy, United Kingdom, 49
Netherlands, Spain and France, which even reaches 14.95%, 13.39%, 13.48%, 50
11.61%, 10.42% and 13.60% respectively according to the data of April 22th, 2020. 51
Except for the shortage of medical supplies, aging and other factors, it is not clear 52
whether there is virus mutation effect in these countries with such a significantly 53
increased death rate. 54
It has been reported that SARS-CoV-2 belongs to beta-coronavirus and is mainly 55
transmitted by the respiratory tract, which belongs to the same subgenus 56
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
(SarbeCoVirus) as SARS-CoV(3). Through analyzing the genome and structure of 57
SARS-CoV-2, its receptor-binding domain (RBD) was found to bind with 58
angiotensin-converting enzyme 2 (ACE2), which is also one of the receptors for 59
binding SARS-CoV(4). Some early genomic studies have shown that SARS-CoV-2 is 60
similar to certain bat viruses (RaTG13, with the whole genome homology of 96.2%(5)) 61
and Malayan pangolins coronaviruses (GD/P1L and GDP2S, with the whole genome 62
homology of 92.4%(6)). They have speculated several possible origins of 63
SARS-CoV-2 based on its spike protein characteristics (cleavage sites or the 64
RBD)(5-7), in particular, SARS-CoV-2 exhibits 97.4% amino acid similarity to the 65
Guangdong pangolin coronaviruses in RBD, even though it is most closely related to 66
bat coronavirus RaTG13 at the whole genome level. However, it is not enough to 67
present genome-wide evolution by a single gene or local evolution of RBD, whether 68
bats or pangolins play an important role in the zoonotic origin of SARS-CoV-2 69
remains uncertain(8). Furthermore, since there is little molecular characteristics 70
analysis of SARS-CoV-2 based on a large number of genomes, it is difficult to 71
determine whether the virus has significant variation that affects its phenotype. 72
Thereby, it is necessary to further reveal the phylogenetic evolution and molecular 73
characteristics of the whole genome of SARS-CoV-2 in order to develop a 74
comprehensive understanding of the virus and provide a basis for designing vaccines 75
and antiviral treatment strategies. Fortunately, with the increase of SARS-CoV-2 76
genome data, we have an opportunity to reveal phylogenetic evolution and molecular 77
characteristics of SARS-CoV-2 more comprehensively. In this study, the Ka/Ks ratio, 78
substitution rate and the time to the most recent common ancestor (tMRCA) of 79
SARS-CoV-2, SARS-CoV and MERS-CoV were analyzed using KaKs_Calculator 80
and BEAST, and then 622 SARS-CoV-2 genome sequences with high quality were 81
clustered using maximum likelihood (ML) with bootstrap of 100 to ensure the 82
reliability of evolutionary trees. According to the clusters of phylogenetic trees, the 83
characteristic mutations of SARS-CoV-2 were screened and their potential 84
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
significance were explored. The results of this study will provide a landscape for us to 85
further understand SARS-CoV-2, and supply a possible basis for the prevention and 86
treatment of SARS-CoV-2 in different areas. 87
Results 88
Genome sequences 89
We got a total of 1053 genomic sequences up to March 22th, 2020. According to the 90
filter criteria, 37 sequences with ambiguous time, 314 with low quality and 78 with 91
similarity of 100% were removed. A total of 624 sequences were obtained to perform 92
multiple sequences alignment. Two highly divergent sequences (EPI_ ISL_414690, 93
EPI_ISL_415710) according to the firstly constructed phylogenetic tree were also 94
filtered out (Table S1). The remaining 622 sequences were used to reconstruct a 95
phylogenetic tree. In addition, total of 3624 and 16373 genomic sequences were 96
redownloaded up to April 6th, 2020 and May 10th, 2020 respectively for further 97
exploring the evolution and molecular characteristics of SARS-CoV-2. 98
Estimate of evolution rate and the time to the most recent common ancestor for 99
SARS-CoV, MERS-CoV, and SARS-CoV-2 100
In this study, date for SARS-CoV-2 ranged from 2019/12/26 to 2020/03/18 was 101
collected. The average Ka/Ks of all the coding sequences was closer to 1 (1.008), 102
which indicating the genome was going through a neutral evolution. We also 103
reevaluated the Ka/Ks of SARS-CoV and MERS-CoV through the whole period, and 104
found the ratio were smaller than SARS-CoV-2 (Table 1). To estimate more credible 105
Ka/Ks for SARS-CoV-2, we recalculated it using redownloaded 3624 genome 106
sequences ranged from 2019/12/26 to 2020/04/06, and the average Ka/Ks of it was 107
1.094 (Table 1), which was almost same with the above result. 108
We assessed the temporal signal using TempEst v1.5.3(9). All three datasets 109
exhibit a positive correlation between root-to-tip divergence and sample collecting 110
time (Figure S1), hence they are suitable for molecular clock analysis in BEAST(9, 111
10). The substitution rate of SARS-CoV-2 genome was estimated to be 1.601 x 10-3 112
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
(95% CI: 1.418 x 10-3 - 1.796 x 10-3, Table 2, Figure S2A) substitution/site/year, 113
which is as in the same order of magnitude as SARS-CoV and MERS-CoV. The 114
tMRCA was inferred on the late September, 2019 (95% CI: 2019/08/28- 115
2019/10/26,Table 2, Figure S2B), about 2 months before the early cases of 116
SARS-CoV-2(11). 117
Phylogenetic tree and clusters of SARS-CoV-2 118
The no-root phylogenetic trees constructed by the maximum likelihood method with 119
PhyML 3.1(12) and MEGA(13) were showed in Figure 1, Figure S3A and 3B. 120
According to the shape of phylogenetic trees, we divided 622 sequences into three 121
clusters: Cluster 1 including 76 sequences mainly from North America, Cluster 2 122
including 367 sequences from all regions of the world, and Cluster 3 including 179 123
sequences mainly from Europe (Table S2). 124
The specific sites of each Cluster 125
The Fst and population frequency of a total of 9 sites (NC_045512.2 as reference 126
genome) were detected (Table 3, Table S3). Thereinto, three (C17747T, A17858G and 127
C18060T) are the specific sites of Cluster 1, and four (C241T, C3037T, C14408T and 128
A23403G) are the specific sites of Cluster 3. Notably, C241T was located in the 129
5’-UTR region and the others were located in coding regions (6 in ofr1ab gene, 1 in S 130
gene and 1 in ORF8 gene). Five of them were missense variant, including C14408T, 131
C17747T and A17858G in ofr1ab gene, A23403G in S gene, and C28144T in ORF8 132
gene. The PCA results showed that these 9 specific sites could clearly separate the 133
three Clusters, while all SNV dataset could not clearly separate Cluster 1 and Cluster 134
2 (Figure S4), which further suggested that these 9 specific sites are the key sites for 135
separating the three Clusters. 136
Linkage of specific sites 137
We found the 9 specific sites are highly linkage based on 622 genome sequences 138
(Figure 2A), then we carried out a further linkage analysis using the 3624 genome 139
sequences (Figure 2B). As a result, for the 3624 genome sequences, 3 specific sites in 140
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
the Cluster 1 were almost complete linkage, and haplotype CAC and TGT accounted 141
for 98.65% of all the 3 site haplotypes. The same phenomenon was also found in 4 142
specific sites of Cluster 3, and haplotype CCCA and TTTG accounted for 97.68% of 143
all the 4 site haplotypes. Intriguingly, the 9 specific sites were still highly linkage, and 144
four haplotypes, including TTCTCACGT (H1), CCCCCACAT (H2), CCTCTGTAC 145
(H3) and CCTCCACAC (H4), accounted for 95.89% of all the 9 site haplotypes. 146
Thereinto, H1 and H3 had completely different bases at the 9 specific sites. The 147
frequencies of each site and major haplotype for each country were showed in Figure 148
3 and Table S4. The data showed that the haplotype TTTG of the 4 specific sites in 149
Cluster 3 had existed globally at present, and still exhibited high frequencies in most 150
European countries but quite low in Asian countries. While the haplotype TGT of the 151
3 specific sites in Cluster 1 existed almost only in North America and Australia. For 152
the 9 specific sites, most countries had only two or three major haplotypes, except 153
America and Australia had all of four major haplotypes with higher frequency. 154
Characteristics of major haplotype subgroups 155
All haplotypes of 9 specific sites for 3624 or 16373 genomes and the numbers of them 156
were shown in Table S5. Four major haplotypes H1, H2, H3 and H4, three minor 157
haplotypes H5, H7 and H8 close to H1 and one minor haplotype H6 close to H3 were 158
found in both datasets. The numbers of these haplotypes for 16373 genomes with 159
clear collection data detected in each country in chronological order were shown in 160
Figure 4A. From these results, H2 and H4 haplotype subgroups had existed for a long 161
time (2019-12-24 to 2020-04-28), and the H3 haplotype subgroup almost disappeared 162
after detection (2020-02-18 to 2020-04-28), while H1 haplotype subgroup was 163
globally increasing with time (2020-02-18 to 2020-05-05), which indicated H1 164
subgroup had adapted to the human hosts, and was under an adaptive growth period 165
worldwide. However, due to the nonrandom sampling on early phase (only patients 166
having a recent travel to Wuhan were detected), some earlier cases of H3 may be lost, 167
the high proportion of H3 subgroup during February 18 to March 10 could confirm 168
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
this point. 169
H3 and H4 subgroups had the lowest Ka/Ks ratio in 3624 or 16373 genomes among 170
the 4 major subgroups (Table 4), suggesting that H3 and H4 subgroup might be going 171
through a purifying evolution and have existed for a long time, while H2 and H1 172
subgroup might be going through a positive selection and a neutral evolution 173
respectively according to their Ka/Ks ratios (Table 4), which were consistent with the 174
above phenomenon that only H1 subgroup was expanding with time around the world. 175
From the whole genome mutations in each major haplotype subgroup (Figure 4B, 176
Table S6), we found that except the 9 specific sites, there was no common mutations 177
with frequencies more than 0.05 in these haplotype subgroups but between H2 and H4 178
subgroups. 179
Phylogenetic network of haplotype subgroups 180
Phylogenetic networks were inferred with 697 mutations called from 3624 genomes 181
dataset. For these datasets, the network structures of TCS and MSN were similar. The 182
major haplotype subgroups H4 and H2 were in the middle of the network, while H1 183
and H3 were in the end nodes of the network (Figure 5). According to the 184
phylogenetic networks, we proposed four hypothesis: (1) the ancestral haplotypes 185
evolved in four directions to obtain H1, H2, H3 and H4 respectively or evolved in two 186
or more directions to obtain two or more major haplotypes and then involved into the 187
other major haplotype(s); (2) H2 or H4 evolved in two directions respectively and 188
finally generated H1 and H3; (3) H1 evolved in one direction to generate H2 and H4, 189
and then evolved into H3; (4) H3 evolved in one direction to generate H4 and H2, and 190
then evolved into H1. Although we cannot exclude the first hypothesis based on the 191
present data, but if there are evolutionary transformation relationships between the 192
four major haplotypes, according to the Ka/Ks, population size and development trend 193
in chronological order of each major haplotype subgroup, we speculate that the most 194
likely transformation hypothesis is that H3 is the ancestral haplotype, which is 195
gradually eliminated with selection, while H4 and H2 is the transitional haplotype in 196
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
the evolution process, and H1 may be the haplotype to be finally fixed. 197
Correlation analysis of specific sites with death rate 198
To explore the relationship between death rate and the 9 specific sites, we used 199
Pearson method to calculate the correlation coefficient between death rate and 200
frequency of each specific site or major haplotype in 17 countries with 3624 genomes 201
at early stage. As a result, all r values of 241T, 3037T, 14408T, 23403G and haplotype 202
TTTG and H1 were more than 0.4 (Figure 6, Table S4). We also evaluated the 203
correlation coefficient with 16373 genomes in 30 countries, the r values of haplotype 204
TTTG and H1 were still more than 0.37 (Table S4), which might be increased when 205
the death rates of some countries were further stabilized with time. These finding 206
integrating their high frequencies in most European countries indicated that the 4 sites 207
and haplotype TTTG and H1 might be related to the pathogenicity of SARS-CoV-2. 208
209
Discussion 210
SARS-CoV-2 poses a great threat to the production, living and even survival of 211
human beings(14). With the further outbreak of SARS-CoV-2 in the world, further 212
understanding on the evolution and molecular characteristics of SARS-CoV-2 based 213
on a large number of genome sequences will help us cope better with the challenges 214
brought by SARS-CoV-2. In the present study, we estimated the Ka/Ks ratio and 215
tMRCA of SARS-CoV-2, SARS-CoV and MERS-CoV, and constructed the 216
phylogenetic tree of SARS-CoV-2 genomes. As a result, we found that 9 specific sites 217
of highly linkage could play a decisive role in the classification of SARS-CoV-2, and 218
the haplotype TTTG might be related to pathogenicity of SARS-CoV-2 based on more 219
than 16000 genomes. All these results could accelerate our further understanding of 220
SARS-CoV-2. 221
Exploring evolution rate, tMRCA and phylogenetic tree of SARS-CoV-2 would 222
help us better understand the virus(15). The average Ka/Ks for all the coding 223
sequences of 622 and 3624 SARS-CoV-2 genomes was 1.008 and 1.094, which was 224
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
higher than those of SARS-CoV and MERS-CoV, indicating that the SARS-CoV-2 225
was going through a neutral evolution. In general, SARS-CoV would show purifying 226
selection pressures and the Ka/Ks would go down at the middle or end of the 227
epidemic(16). Therefore, it seemed to be reasonable considering that SARS-CoV-2 228
was under an exponential growth period around the world. Interestingly, we also 229
found that the subgroups of different haplotypes (H1, H2, H3 and H4) seemed to 230
experience different evolutionary patterns according to their Ka/Ks. The H3 haplotype 231
subgroup disappeared soon after detection (2020-02-18 to 2020-04-07, Figure 4A), 232
while H1 haplotype subgroup was globally increasing with time, these evolution and 233
changes should be considered in developing therapeutic drugs and vaccines as soon as 234
possible. The tMRCA of SARS-CoV-2 was inferred on the late September, 2019 (95% 235
CI: 2019/08/28- 2019/10/26), about 2 months before the early cases of 236
SARS-CoV-2(11). We also estimated tMRCA of SARS-CoV and MERS-CoV with 237
the same methods, both were about 3 months later than the corresponding tMRCAs 238
estimated by previous studied(17, 18), which indicated the tMRCA of SARS-CoV-2 239
we estimated was reasonable. A recent study used TreeDater method to estimate 240
tMRCA for more than 7000 SARS-CoV-2 genomes and indicated the tMRCA of 241
SARS-CoV-2 was around 6 October 2019 to 11 December 2019, which were about 242
one and a half months later than ours and in broad agreement with six previous 243
studies all performed on smaller subsets of the SARS-CoV-2 genomes with BEAST 244
method(19). While we used the most common method BEAST to estimate tMRCA of 245
SARS-CoV-2 based on 622 genomes, indicating that the estimated tMRCA are 246
reliable. 247
A recent study clustered 160 SARS-CoV-2 whole-genome sequences into A, B 248
and C groups by a phylogenetic network analysis(20). The cluster result was similar 249
to our study: both the samples in Cluster A and our Cluster 1 were mainly from the 250
United States; the samples in Cluster C and our Cluster 3 were mainly from European 251
countries, while Cluster B and our Cluster 2 were mainly from China and the other 252
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
regions. It was interesting that the markers in Cluster B (T8782C and C28144T) were 253
also found in our study, but the other markers in Cluster A (T29095C) and Cluster C 254
(G26144T) were not significantly in our study. That may be caused by different 255
sample sizes and different constructing methods of phylogenetic tree. Based on the 256
base substitution model, the ML method avoids the possible "long-branch attraction" 257
problem in the maximum parsimony method and is faster than the Bayesian 258
method(21), hence it could be used as a reliable method for phylogenetic analysis. 259
Nowadays, there is still a shortage of studies on phylogenetic analysis of 260
SARS-CoV-2. Some studies used the genome of bat SARS-like-CoV(22), RaTG13(22) 261
or MT019529 from Wuhan (https://bigd.big.ac.cn/ncov/tree) as the root of 262
phylogenetic tree. Unfortunately, there was no obvious evidence showing that 263
SARS-CoV-2 was from the bat coronavirus even the identity between SARS-CoV-2 264
and RaTG13 was up to 96.2%(5). In our study, the tMRCA of SARS-CoV-2 was 265
inferred on the late September 2019, which indicated there might exist an earlier 266
SARS-CoV-2 strain we didn't find. Then in the case of unclear source of 267
SARS-CoV-2 and high homology of its genomes (> 99.9% homology mostly), it may 268
be inappropriate to identify the evolutionary characteristics inside the genomes by 269
taking bat SARS-like-CoV, RaTG13 or MT019529 as root. Therefore, the maximum 270
likelihood method was used in current study to construct a no-root tree to obtain the 271
reliable clusters with different characteristics. Based on the no-root tree, we identified 272
9 specific sites of highly linkage that play a decisive role in the classification of 273
clusters successfully. Among the four major haplotypes, H1 and H3 were in Cluster 3 274
and Cluster 1, respectively, while there were two haplotypes H2 and H4 in Cluster 2 275
(Figure 1, Figure S3B). 276
Among the 9 specific sites, 8 of them were located in coding regions (6 in ofr1ab 277
gene, 1 in S gene and 1 in ORF8 gene), and 5 of them were missense variant, 278
including C14408T, C17747T and A17858G in ofr1ab gene, A23403G in S gene, and 279
C28144T in ORF8 gene. Orf1ab is composed of two partially overlapping open 280
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
reading frames (orf), orf1a and 1b. It is proteolytic cleaved into 16 non-structural 281
proteins (nsp), including nsp1 (suppress antiviral host response), nsp9 (RNA/DNA 282
binding activity), nsp12 (RNA-dependent RNA polymerase), nsp13 (helicase) and 283
others(23), indicating the vital role of it in transcription, replication, innate immune 284
response and virulence(24). C14408T with high frequencies of T in European 285
countries were located at nsp12 region, indicating that this missense variant might 286
influence the role of RNA polymerase. Spike glycoprotein, the largest structural 287
protein on the surface of coronaviruses, comprises of S1 and S2 subunits which 288
mediating binding the receptor on the host cell surface and fusing membranes, 289
respectively(25). It has been reported that S protein of SARS-CoV-2 can bind 290
angiotensin-converting enzyme 2 (ACE2) with higher affinity than that of SARS(4, 5). 291
Whether the missense variant of A23403G in S gene, which with high frequencies of 292
G in European countries, will change the affinity between S protein and ACE2 293
remains to be further investigated. 294
It seems to take a long time to fix mutations finally according to the mutation 295
frequency of each subgroup. For example, H2 and H4 subgroups, which have been 296
detected for more than four months from December 24, 2019 to May 5, 2020 (Figure 297
4A), have more mutations with higher frequencies, but the highest mutation frequency 298
is only 0.486 at the position of 11083 (Figure 4B, Table S6). From these phenomena, 299
it can be inferred that it takes a long time for the specific sites of each major subgroup 300
to be fixed, but it may be faster if the early population is small. In addition, there is 301
also the possibility that an ancestor strain evolved in four directions by obtaining the 302
specific mutations directly and produced the four current major haplotypes, so the 303
evolution time may be shorter, which seems to be consistent with the phenomenon 304
that four major haplotypes can be detected in two months (Figure 4A). However, if 305
there exist transformation evolution pattern among the major haplotypes of 306
SARS-CoV-2, it is difficult to complete the transformation among the four major 307
haplotypes within two months at the current evolution rate of each major haplotype 308
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
population. Therefore, we speculate that the transformation between the four major 309
haplotypes may have been completed for a long time, which have not been detected. 310
Previous studies showed that the death rate of SARS-CoV-2 is affected by many 311
factors, such as medical supplies, aging, life style and other factors(26, 27). However, 312
our study found that the 4 specific sites (C241T, C3037T, C14408T and A23403G) in 313
the Cluster 3 were almost complete linkage, and the frequency of haplotype TTTG 314
was generally high in European countries and correlated to death rate (r>0.37) based 315
on 3624 or 16373 SARS-CoV-2 genomes, which provides a new perspective to the 316
reasons of relatively high death rate in Europe. In order to make frequency relatively 317
credible, we selected countries with more than 10 genomes (except South Korea with 318
16 genomes, all the other countries with more than 20 genomes) to analyze the 319
correlation between frequencies of haplotypes and death rate. Therefore, the 320
correlation between the frequencies of haplotypes and death rate obtained in this study 321
is quite reliable, and these specific sites should be considered in designing new 322
vaccine and drug development of SARS-CoV-2. 323
Conclusion and prospective 324
The Ka/Ks ratio of SARS-CoV-2 and tMRCA of SARS-CoV-2 (95% CI: 2019/08/28- 325
2019/10/26) indicated that SARS-CoV-2 might have completed the positive selection 326
pressure of cross-host evolution in the early stage and be going through a neutral 327
evolution at present. The 9 specific sites with highly linkage were found to play a 328
decisive role in the classification of clusters. Thereinto, 3 of them are the specific sites 329
of Cluster 1 mainly from North America and Australia, 4 of them are the specific sites 330
of Cluster 3 mainly from Europe in the early stage, both of these 3 and 4 specific sites 331
are almost complete linkage. The frequencies of haplotype TTTG for the 4 specific 332
sites and H1 for 9 specific sites were generally high in European countries and 333
correlated to death rate (r>0.37) based on 3624 or 16373 SARS-CoV-2 genomes, 334
which suggested these haplotypes might relate to pathogenicity of SARS-CoV-2 and 335
provided a new perspective to the reasons of relatively high death rates in Europe. 336
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
The relationship between the haplotype TTTG or H1 and the pathogenicity of 337
SARS-CoV-2 needs to be further verified by clinical samples or virus virulence test 338
experiment, and the 9 specific sites with highly linkage may provide a starting point 339
for the traceability research of SARS-CoV-2. Given that the different evolution 340
patterns of different haplotypes subgroups, we should consider these evolution and 341
changes in the development of therapeutic drugs and vaccines as soon as possible. 342
343
Materials and methods 344
Genome sequences 345
The complete genome sequences of SARS-CoV-2 were downloaded from China 346
National Center for Bioinformation (https://bigd.big.ac.cn/ncov/release_genome/) and 347
GISAID (https://www.gisaid.org/) up to March 22th, 2020. The sequences were 348
filtered out according to the following criteria: (1) sequences with ambiguous time; (2) 349
sequences with low quality which contained the counts of unknown bases > 15 and 350
degenerate bases > 50 (https://bigd.big.ac.cn/ncov/release_genome); (3) sequences 351
with similarity of 100% were removed to unique one. Finally, 624 high quality 352
genomes with precise collection time were selected and aligned using MAFFT v7 353
with automatic parameters. Besides, the genome sequences of 7 SARS-CoV and 475 354
MERS-CoV were also downloaded from NCBI (https://www.ncbi.nlm.nih.gov/), and 355
the MERS-CoV dataset including samples collected from both human and camel. In 356
addition, for further exploring evolution and molecular characteristics of 357
SARS-CoV-2 based on the larger amount of genomic data, we redownloaded 358
validation datasets of the genome sequences from GISAID up to April 6th, 2020 and 359
May 10th, 2020 respectively. 360
Estimate of evolution rate and the time to the most recent common ancestor for 361
SARS-CoV, MERS-CoV, and SARS-CoV-2 362
The average Ka, Ks and Ka/Ks for all coding sequences were calculated using 363
KaKs_Calculator v1.2(28), and the substitution rate and tMRCA were estimated using 364
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
BEAST v2.6.2(10). The temporal signal with root-to-tip divergence was visualized in 365
TempEst v1.5.3(9) using a ML whole genome tree with bootstrap value as input. For 366
SARS-CoV and SARS-CoV-2, we selected a strict molecular clock and Coalescent 367
Exponential Population Model. For MERS-CoV, we selected a relaxed molecular 368
clock and Birth Death Skyline Serial Cond Root Model. We used the tip dates and 369
chose the HKY as the site substitution model in all these analyses. Markov Chain 370
Monte Carlo (MCMC) chain length was set to 10,000,000 steps sampling after every 371
1000 steps. The output was examined in Tracer v1.6 372
(http://tree.bio.ed.ac.uk/software/tracer/). 373
Variants calling of SARS-CoV-2 genome sequences 374
Each genome sequence was aligned to the Wuhan-Hu-1 reference genome 375
(NC_045512.2) using bowtie2 with default parameters(29), and variants were called 376
by samtools (sort; mpileup -gf) and bcftoots (call -vm). The merge VCF files were 377
created by bgzip and bcftools (merge --missing-to-ref)(30, 31). 378
Phylogenetic tree construction and virus isolates clustering for SARS-CoV-2 379
After alignment of 624 SARS-CoV-2 genomes with high quality and manually deleted 380
2 highly divergent genomes (EPI_ISL_415710, EPI_ISL_414690) according to the 381
firstly constructed phylogenetic tree, the aligned dataset of 622 sequences was 382
phylogenetically analyzed. The SMS method was used to select GTR+G as the base 383
substitution model(32), and the PhyML 3.1(12) and MEGA(13) were used to 384
construct the no-root phylogenetic tree by the maximum likelihood method with the 385
bootstrap value of 100. The online tool iTOL(33) was used to visualized the 386
phylogenetic tree. The clusters were defined by the shape of phylogenetic tree. 387
Detection of specific sites from each Cluster 388
Information (ID, countries/regions and collection time) and variants (NC_045512.2 as 389
reference genome) of each genome from each Cluster were extracted. The allele 390
frequency and nucleotide divergency (pi) for each site in the virus population of each 391
Cluster were measured by vcftools(34). The Fst were also calculated by vcftools(34) 392
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
to assess the diversity between the Clusters. Sites with high level of Fst together with 393
different major allele in each Cluster were filtered as the specific sites. PCA were 394
analyzed by the GCTA v1.93.1beta(35) with the specific sites and all SNV dataset 395
respectively. 396
Linkage analysis of specific sites and characteristics of major haplotype 397
subgroups 398
The linkage disequilibrium of the specific sites were analyzed by haploview(36), and 399
the statistics of the haplotype of the specific sites for each Cluster or country were 400
used in-house perl script. 401
Phylogenetic network of haplotype subgroups 402
The phylogenetic networks were inferred by PopART package v1.7.2(37) using TCS 403
and minimum spanning network (MSN) methods respectively. 404
Frequencies of specific sites or haplotypes and correlation with death rate 405
The frequencies of specific sites for each country were calculated. The death rate was 406
estimated with Total Deaths /Confirmed Cases based on the data from Johns Hopkins 407
resources on May 12th, 2020 408
(https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467409
b48e9ecf6).The correlation coefficient between death rate and frequencies of specific 410
site or haplotype in different countries was calculated using Pearson method. 411
412
Authors’ contributions 413
HD conceived the study. YM, YB, DJ, JL and XC carried out the data analysis and 414
wrote the manuscript. MH, SL, and ZC collected data and revised the manuscript. 415
XW attended the discussions. HD and YM supervised the whole work and revised the 416
manuscript. 417
418
Competing interests 419
The authors declare no competing interests. 420
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
421
Acknowledgements 422
This work was supported by the National Key R&D Program of China 423
(2018YFC0910201), the Key R&D Program of Guangdong Province 424
(2019B020226001), the Science and the Technology Planning Project of Guangzhou 425
(201704020176 and 2020Q-P013). 426
427
References 428 1. Coronavirus disease 2019 (COVID-19) Situation Report – 93. WHO. 2020. 429 2. Liu Y, Gayle AA, Wilder-Smith A, Rocklov J. The reproductive number of COVID-19 is higher 430 compared to SARS coronavirus. J Travel Med. 2020;27(2). 431 3. Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, et al. Genomic characterisation and epidemiology of 432 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 433 2020;395(10224):565-74. 434 4. Wrapp D, Wang N, Corbett KS, Goldsmith JA, Hsieh CL, Abiona O, et al. Cryo-EM structure of 435 the 2019-nCoV spike in the prefusion conformation. Science. 2020;367(6483):1260-3. 436 5. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated 437 with a new coronavirus of probable bat origin. Nature. 2020;579(7798):270-3. 438 6. Lam TT, Shum MH, Zhu HC, Tong YG, Ni XB, Liao YS, et al. Identifying SARS-CoV-2 related 439 coronaviruses in Malayan pangolins. Nature. 2020. 440 7. Zhang YZ, Holmes EC. A Genomic Perspective on the Origin and Emergence of SARS-CoV-2. 441 Cell. 2020. 442 8. Andersen KG, Rambaut A, Lipkin WI, Holmes EC, Garry RF. The proximal origin of 443 SARS-CoV-2. Nature Medicine. 2020. 444 9. Rambaut A, Lam TT, Max Carvalho L, Pybus OG. Exploring the temporal structure of 445 heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2016;2(1):vew007. 446 10. Bouckaert R, Vaughan TG, Barido-Sottani J, Duchene S, Fourment M, Gavryushkina A, et al. 447 BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol. 448 2019;15(4):e1006650. 449 11. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 450 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497-506. 451 12. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and 452 methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst 453 Biol. 2010;59(3):307-21. 454 13. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics 455 Analysis across Computing Platforms. Mol Biol Evol. 2018;35(6):1547-9. 456 14. Guo YR, Cao QD, Hong ZS, Tan YY, Chen SD, Jin HJ, et al. The origin, transmission and clinical 457 therapies on coronavirus disease 2019 (COVID-19) outbreak - an update on the status. Mil Med Res. 458
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
2020;7(1):11. 459 15. Yuen K-S, Ye Z-W, Fung S-Y, Chan C-P, Jin D-Y. SARS-CoV-2 and COVID-19: The most 460 important research questions. Cell Biosci. 2020;10:40. 461 16. Chinese SMEC. Molecular evolution of the SARS coronavirus during the course of the SARS 462 epidemic in China. Science. 2004;303(5664):1666-9. 463 17. Zhao Z, Li H, Wu X, Zhong Y, Zhang K, Zhang YP, et al. Moderate mutation rate in the SARS 464 coronavirus genome and its implications. BMC Evol Biol. 2004;4:21. 465 18. Cotten M, Watson SJ, Zumla AI, Makhdoom HQ, Palser AL, Ong SH, et al. Spread, circulation, 466 and evolution of the Middle East respiratory syndrome coronavirus. mBio. 2014;5(1). 467 19. van Dorp L, Acman M, Richard D, Shaw LP, Ford CE, Ormond L, et al. Emergence of genomic 468 diversity and recurrent mutations in SARS-CoV-2. Infection, Genetics and Evolution. 2020:104351. 469 20. Forster P, Forster L, Renfrew C, Forster M. Phylogenetic network analysis of SARS-CoV-2 470 genomes. Proceedings of the National Academy of Sciences. 2020:202004999. 471 21. Holder M, Lewis PO. Phylogeny estimation: traditional and Bayesian approaches. Nat Rev Genet. 472 2003;4(4):275-84. 473 22. Zhang L, Shen F-m, Chen F, Lin Z. Origin and Evolution of the 2019 Novel Coronavirus. Clinical 474 Infectious Diseases. 2020. 475 23. Chan JF, Kok KH, Zhu Z, Chu H, To KK, Yuan S, et al. Genomic characterization of the 2019 476 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting 477 Wuhan. Emerg Microbes Infect. 2020;9(1):221-36. 478 24. Graham RL, Sparks JS, Eckerle LD, Sims AC, Denison MR. SARS coronavirus replicase proteins 479 in pathogenesis. Virus Res. 2008;133(1):88-100. 480 25. Li F. Structure, Function, and Evolution of Coronavirus Spike Proteins. Annu Rev Virol. 481 2016;3(1):237-61. 482 26. Malavolta M, Giacconi R, Brunetti D, Provinciali M, Maggi F. Exploring the Relevance of 483 Senotherapeutics for the Current SARS-CoV-2 Emergency and Similar Future Global Health Threats. 484 Cells. 2020;9(4). 485 27. Liu Y, Du X, Chen J, Jin Y, Peng L, Wang HHX, et al. Neutrophil-to-lymphocyte ratio as an 486 independent risk factor for mortality in hospitalized patients with COVID-19. J Infect. 2020. 487 28. Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J. KaKs_Calculator: calculating Ka and Ks 488 through model selection and model averaging. Genomics Proteomics Bioinformatics. 489 2006;4(4):259-63. 490 29. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 491 2012;9(4):357-9. 492 30. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map 493 format and SAMtools. Bioinformatics (Oxford, England). 2009;25(16):2078-9. 494 31. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and 495 population genetical parameter estimation from sequencing data. Bioinformatics (Oxford, England). 496 2011;27(21):2987-93. 497 32. Lefort V, Longueville JE, Gascuel O. SMS: Smart Model Selection in PhyML. Mol Biol Evol. 498 2017;34(9):2422-4. 499 33. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. 500
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
Nucleic Acids Res. 2019;47(W1):W256-W9. 501 34. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format 502 and VCFtools. Bioinformatics (Oxford, England). 2011;27(15):2156-8. 503 35. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait 504 analysis. Am J Hum Genet. 2011;88(1):76-82. 505 36. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype 506 maps. Bioinformatics. 2005;21(2):263-5. 507
37. Leigh JW, Bryant D, Nakagawa S. popart: full ‐ feature software for haplotype network 508
construction. Methods in Ecology and Evolution. 2015;6(9):1110-6. 509
510
Figure Legends 511
Figure 1 Phylogenetic tree and clusters of 622 SARS-CoV-2 genomes 512
The 622 sequences were clustered into three clusters: Cluster 1 were mainly from 513
North America, Cluster 2 were from regions all over the world, and Cluster 3 were 514
mainly from Europe. 515
Figure 2 Linkage disequilibrium plot of haplotypes of the 9 specific sites 516
A. The plot for 622 genome sequences; B. The plot for 3624 genome sequences. 517
Figure 3 The frequencies of both the 9 specific sites and haplotypes 518
The frequencies of the 9 specific sites (A) and haplotypes (B) in each country for 519
3624 genomes. 520
Figure 4 The characteristics of haplotype subgroups 521
A. The numbers of haplotypes of the 9 specific sites for 16373 genomes with clear 522
collection data detected in each country in chronological order; B. The whole genome 523
mutations in each major haplotype subgroup. 524
Figure 5 Phylogenetic network of haplotype subgroups for 3624 genomes 525
The network was inferred by POPART using TCS method. Each colored vertex 526
represents a haplotype, with different colors indicating the different sampling areas. 527
Hatch marks along edge indicated the number of mutations. Small black circles within 528
the network indicated unsampled haplotypes. H1-H5 subgroups were point out 529
according haplotypes of the 9 specific sites, and other small subgroups were not 530
specially pointed out. 531
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
Figure 6 The correlation between death rate and frequencies of both the 9 532
specific sites and haplotypes 533
534
Tables 535
Table 1 Statistics of Ka, Ks and Ka/Ks ratios for all coding regions of the 536
SARS-CoV, MERS-CoV and SARS-CoV-2 genome sequences 537
Table 2 Substitution rate and tMRCA estimated by BEAST v2.6.2 538
Table 3 The information of the 9 specific sites in each Cluster 539
Table 4 Statistics of Ka, Ks and Ka/Ks ratios for all coding regions of each 4 540
major haplotype subgroup with 3624 and 16373 genomes respectively 541
542
Supplementary material 543
Figure S1 Regression analyses of the root-to-tip divergence with the 544
maximum-likelihood phylogenetic tree of SARS-CoV (A), MERS-CoV (B) and 545
SARS-CoV-2 (C) 546
The root-to-tip genetic distances were inferred in TempEst v1.5.3. Three data sets 547
exhibit a positive correlation between root-to-tip divergence and sample collecting 548
time, and they appear to be suitable for molecular clock analysis. 549
Figure S2 Substitution rate and tMRCA estimate of SARS-CoV-2 550
A. Estimated substitution rate of SARS-CoV-2 using BEAST v2.6.2 under a strict 551
molecular clock. The mean rate was 1.601 x 10-3 (substitutions per site per year). B. 552
The estimated tMRCA of SARS-CoV-2 using BEAST v2.6.2 under a strict molecular 553
clock. The mean tMRCA was 2019.74 (date: 2019-09-27). 554
Figure S3 Phylogenetic tree of 622 SARS-CoV-2 genomes 555
A. Phylogenetic tree constructed by PhyML with bootstrap value of 100; B. Phylogenetic tree 556
constructed by MEGA with bootstrap value of 100. 557
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
Figure S4 The PCAs analysis result based on the 9 specific sites (A) and all SNVs 558
(B) 559
Table S1 The information of the first downloaded complete genome sequences 560
Table S2 The sequence IDs in each Cluster 561
Table S3 The details information of the SNVs in all Clusters 562
Table S4 The information of death rate, frequencies of the 9 specific sites and 563
major haplotypes in each country for 3624 and 16373 genomes respectively 564
Table S5 All haplotypes of the 9 specific sites and numbers of them for 3624 and 565
16373 genomes respectively 566
Table S6 Whole genome mutations and frequencies in each major haplotype 567
subgroup of 3624 and 16373 genomes respectively 568
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted May 15, 2020. ; https://doi.org/10.1101/2020.04.24.058933doi: bioRxiv preprint
https://doi.org/10.1101/2020.04.24.058933http://creativecommons.org/licenses/by-nc-nd/4.0/
EPI.ISL.413573
EPI.ISL.414624
EPI.ISL.413018
EPI.ISL.415605
EPI.ISL.413577
EPI.ISL.412873
EPI.ISL.41
5707
EPI.ISL.403963
EPI.ISL.416338
MT152824
EPI.ISL.415154
EPI.ISL.4140
20
EPI.ISL.414617
EPI.ISL.410531
EPI.IS
L.4156
41
EPI.ISL.416415
70-
20
03
10
06
CD
MN
EPI.ISL.413015
EPI.ISL.415626
EPI.IS
L.4164
82
EPI.ISL.415711
EPI.ISL.413853
EPI.ISL.416448
MN994468
EPI.ISL.415471
EPI.ISL.412872
EPI.ISL.414537
EPI.ISL.413459
EPI.ISL.416411
EPI.ISL.413864
EPI.ISL.416442
EPI.ISL.414563
EPI.ISL.414571
EPI.ISL.416036
EPI.ISL.414544
EPI.ISL.414638
MT159712
EPI.ISL.413599
EPI.ISL.41
4011
EPI.ISL.414554
EPI.ISL.414630
EPI.IS
L.4154
56
EPI.ISL.413214
EPI.ISL.413582
EPI.ISL.415621
EPI.ISL.
415128
EPI.ISL.415700
EPI.ISL.416373
EPI.ISL.411915
MT106053
EPI.ISL.412981
MT019529
EPI.ISL.415640
MT072688
EPI.ISL.407896
EPI.ISL.416451
EPI.ISL.415602
EPI.ISL.415625
MN988713
EPI.ISL.406533
EPI.ISL.415596
EPI.ISL.415543
EPI.ISL.414429
EPI.ISL.407893
EPI.ISL.415606
EPI.ISL.416521
EPI.ISL.406973
EPI.ISL.413861
EPI.ISL.415608
EPI.ISL.41637679
44
14.
LSI.I
PE
EPI.ISL.
414005
EPI.ISL.408514
EPI.IS
L.4105
46
EPI.IS
L.4145
24
EPI.IS
L.4157
09
EPI.IS
L.4140
45
EPI.ISL.416336
MT163718
MT159718
EPI.ISL.413555
EPI.ISL.415609
13
55
14.
LSI.I
PE
EPI.ISL.414632
64
46
14.
LSI.I
PE
EPI.ISL.415595
EPI.ISL.407987
EPI.ISL.412978
EPI.ISL.413863
EPI.ISL.415627
EPI.ISL.416497
EPI.ISL.413490
EPI.ISL.
413648
EPI.ISL.407313
EPI.ISL.415619
EPI.ISL.406595
EPI.IS
L.4060
31
EPI.ISL.413996
EPI.ISL.408668
EPI.ISL.408485
EPI.ISL.408976
12
54
14.
LSI
.IP
E
MT188341
EPI.ISL.415516
87
48
04.
LSI
.IP
E
EPI.ISL.408977
MT066176EPI.ISL.410717
EPI.ISL.416519
EPI.IS
L.4165
07
EPI.ISL.416402
EPI.ISL.415156
EPI.ISL.416333
EPI.ISL.413020
EPI.ISL.415607
EPI.ISL.415158
MN994467
EPI.ISL.414536
NMDC60013002-09
EPI.ISL.414597
EPI.ISL.414467
MT012098
EPI.ISL.408486
EPI.ISL.416458
EPI.ISL.416361
EPI.ISL.414625
EPI.ISL.415603
EPI.ISL.412975
MT159720
78
44
14.
LSI.I
PE
EPI.ISL.415646
EPI.ISL.415153
26
93
04.
LSI
.IP
E
LC522974
EPI.ISL.416412
EPI.ISL.415485
EPI.ISL.415597
EPI.ISL.414470
EPI.ISL.415654
EPI.ISL.414634
EPI.ISL.415650
EPI.ISL.415614
EPI.ISL.415486
EPI.ISL.413860
EPI.ISL.414635
MT123293
EPI.ISL.4140
19
EPI.ISL.415467
EPI.ISL.415598
EPI.ISL.415612
EPI.ISL.415581
EPI.ISL.
416493
EPI.ISL.403936
EPI.ISL.415501
EPI.ISL.411950
EPI.ISL.415129
EPI.ISL.413595
EPI.ISL.415658
EPI.ISL.
414525
EPI.ISL.415491
EPI.ISL.414618
82
93
14.
LSI.I
PE
EPI.ISL.414600
EPI.ISL.408488
EPI.ISL.416431
97
36
14.
LSI
.IP
E
02
64
14.
LSI.I
PE
EPI.ISL.416377
EPI.IS
L.4157
01
EPI.ISL.414601
EPI.ISL.412870
EPI.ISL.415698
LC521925
EPI.ISL.414626
EPI.ISL.415534
EPI.ISL.411952
EPI.ISL.415476
52
35
89
NM
EPI.ISL.414663
EPI.ISL.414526
EPI.ISL.416363
EPI.IS
L.4139
99
EPI.ISL.415525
EPI.ISL.415488
MT126808
EPI.ISL.413600
EPI.ISL.412912
EPI.ISL.416317
EPI.ISL.413651
EPI.ISL.414688
EPI.ISL.412026
MN996531
EPI.ISL.414595
EPI.ISL.407193
EPI.ISL.414578
MN938384
EPI.ISL.416035
EPI.ISL.416425
EPI.ISL.414691
EPI.ISL.414633
EPI.ISL.415157
EPI.ISL.413014
MT159707
EPI.ISL.412030
EPI.ISL.415473
EPI.ISL.416498
EPI.ISL.416465
MT019532
EPI.ISL.416353
EPI.ISL.416436
EPI.ISL.415620
EPI.ISL.416389
EPI.ISL.415497
EPI.ISL.414621
EPI.ISL.
414006
23
12
04.
LSI
.IP
E
45
06
01
TM
EPI.ISL.414938
EPI.ISL.408479
EPI.ISL.41
5704
EPI.ISL.416513
EPI.IS
L.4154
55
EPI.ISL.415468
LC522975
EPI.ISL.416473
EPI.ISL.413604
EPI.ISL.406535
EPI.ISL.416479
EPI.ISL.410714
EPI.ISL.413580
EPI.ISL.411
219
EPI.ISL.413857
EPI.ISL.415613
EPI.ISL.416455
MT044258
EPI.ISL.411926
EPI.ISL.414522
NMDC60013002-08
MT135041
EPI.ISL.409067
EPI.ISL.4164
89
EPI.ISL.416024
EPI.ISL.408430
EPI.ISL.413456
EPI.ISL.406592
EPI.ISL.415592
EPI.ISL.416398
EPI.ISL.414586
EPI.ISL.411902
MT159716
EPI.ISL.415615
EPI.ISL.414580
EPI.ISL.414627
MT019533
EPI.ISL.416512
EPI.ISL.415457
EPI.ISL.413022
EPI.ISL.412982
EPI.ISL.416403
EPI.ISL.416514
EPI.ISL.410719
MN908947
EPI.ISL.416320
EPI.ISL.414428
EPI.ISL.413854
EPI.IS
L.4144
45
EPI.ISL.416342
EPI.ISL.412973
EPI.ISL.
414631
EPI.ISL.415458
MT184908
EPI.ISL.414579
MT007544
EPI.ISL.416397
EPI.ISL.413653
EPI.ISL.410716
EPI.ISL.41
6426
EPI.ISL.41
4451
EPI.ISL.414539
EPI.ISL.416434
EPI.ISL.414023
02
95
14.
LSI.I
PE
EPI.ISL.416477
EPI.IS
L.4140
22EPI
.ISL.4
13997
EPI.ISL.413650
EPI.ISL.415481
EPI.ISL.4154
92
EPI.ISL.415601
52
93
14.
LSI.I
PE
EPI.ISL.415583
EPI.ISL.415151
MT159717
EPI.ISL.416330
EPI.ISL.416390
EPI.ISL.416452
EPI.ISL.416334
EPI.ISL.414636
EPI.ISL.415703
EPI.ISL.415532
EPI.ISL.413592
EPI.ISL.413563
EPI.ISL.413559
EPI.ISL.415539
06
46
14.
LSI.I
PE
EPI.ISL.413560
EPI.ISL.415743
CNA0007332
EPI.ISL.416468
EPI.ISL.
415435
EPI.ISL.416441
EPI.ISL.416510
MT039873
MT163719
EPI.ISL.416406
EPI.ISL.410715
EPI.ISL.414637
EPI.IS
L.4145
27
EPI.ISL.416322
MT039887
EPI.ISL.416430
EPI.ISL.415604
EPI.ISL.412979
EPI.ISL.416494
EPI.ISL.414555
EPI.IS
L.4145
69
EPI.ISL.414623
EPI.ISL.412972
MT1637
16
EPI.ISL.414593
EPI.ISL.414641
MT093631
EPI.ISL.414545
EPI.ISL.416323
EPI.ISL.
415630
EPI.ISL.414940
EPI.ISL.
416031
EPI.ISL.408484
EPI.ISL.416367
EPI.ISL.410713
EPI.ISL.411218
EPI.ISL.414587
EPI.ISL.416362
EPI.ISL.416433
MT159705
EPI.ISL.410536
EPI.ISL.413572
EPI.ISL.413017
EPI.ISL.414523
EPI.ISL.410535
EPI.IS
L.4136
02
EPI.ISL.414468
EPI.ISL.416469
EPI.ISL.416429
EPI.ISL.415611
MT135043
EPI.ISL.413213
EPI.ISL.410718
EPI.ISL.413489
MT123290
CNA0007335
EPI.ISL.416454
MT121215
EPI.ISL.416471
EPI.ISL.415487
EPI.IS
L.4163
25
MT188339
EPI.IS
L.4129
74
EPI.ISL.416502
EPI.ISL.414559
EPI.ISL.414476
MT066175
EPI.ISL.415622
EPI.ISL.
416499
26
25
79
NM
EPI.ISL.415503
EPI.ISL.416046
EPI.ISL.410537
EPI.ISL.414687
EPI.ISL.412028
MT027064
EPI.ISL.416491
EPI.ISL.416444
EPI.ISL.406534
EPI.ISL.413647
EPI.ISL.414562
72
24
04.
LSI
.IP
E
MT184913
MN996527
EPI.ISL.413589
EPI.ISL.415624
EPI.ISL.416495
EPI.IS
L.4146
46
EPI.ISL.414692
EPI.ISL.416438
EPI.ISL.41
5706
EPI.ISL.414500
EPI.ISL.414551
EPI.ISL.4156
31
MT106052
CNA0007334
NMDC60013002-06
EPI.ISL.407894
EPI.IS
L.4144
57
EPI.ISL.407976
EPI.ISL.413558
MT184912
EPI.ISL.416439
EPI.ISL.41
6032
EPI.ISL.
413603
EPI.ISL.406536
EPI.IS
L.4112
20
EPI.ISL.
415652
EPI.ISL.416340
EPI.ISL.413513
EPI.ISL.414941
EPI.ISL.416355
EPI.ISL.416443
EPI.ISL.414642
EPI.ISL.408481
EPI.ISL.416324
EPI.ISL.416370
EPI.ISL.416335
LC528233
EPI.ISL.416496
39
56
04.
LSI
.IP
E
EPI.ISL.414549
EPI.ISL.412983
MT163717
EPI.ISL.407071
EPI.ISL.416365
EPI.ISL.415648
EPI.ISL.
412116
EPI.ISL.416044
99
44
14.
LSI
.IP
E
EPI.ISL.416511
EPI.ISL.410984
88
43
14.
LSI
.IP
E
EPI.ISL.416028
88
46
14.
LSI.I
PE
EPI.ISL.416435
MT0661
56
EPI.ISL.416141
MT159715
EPI.ISL.408482
EPI.ISL.415500
48
55
14.
LSI.I
PE
EPI.ISL.416467
EPI.ISL.410720
EPI.ISL.413023
EPI.ISL.415702
EPI.ISL.413594
EPI.ISL.415535
MT184910
EPI.ISL.415454
EPI.ISL.403932
EPI.IS
L.4156
42
EPI.ISL.416033
LC522973
EPI.ISL.415616
MT044257
06
011
4.L
SI.I
PE
EPI.ISL.407073
EPI.ISL.415660
MN996529
EPI.ISL.415649
EPI.ISL.413485
EPI.ISL.416366
EPI.ISL.408515
EPI.ISL.406596
EPI.ISL.413515
MN996530
MT019531
EPI.ISL.416437
MT093571
EPI.ISL.416337
EPI.ISL.416472
MT027062
EPI.ISL.414689
EPI.ISL.415617
EPI.ISL.
414007
EPI.ISL.406597
EPI.ISL.416432
EPI.ISL.406594
EPI.ISL.413021
EPI.ISL.416457
95
42
14.
LSI
.IP
E
EPI.ISL.41
3019
EPI.ISL.413584
EPI.ISL.414591
EPI.ISL.414628
EPI.ISL.413566
24
06
14.
LSI
.IP
E
EPI.ISL.41
5705
EPI.ISL.416332
EPI.IS
L.4145
28
EPI.ISL.416382
EPI.ISL.416339
EPI.ISL.416440
16
46
14.
LSI.I
PE
MT123292
LC522972
EPI.ISL.413593
EPI.ISL.416326
EPI.ISL.416445
MT159708
EPI.ISL.414012
EPI.ISL.415479
MT019530
MN988668
EPI.ISL.416318
MT159719
MT159722
EPI.IS
L.4156
43
EPI.ISL.416476
EPI.IS
L.4130
24
EPI.ISL.416140
EPI.ISL.416428
90
54
14.
LSI.I
PE
EPI.ISL.415155
EPI.ISL.415527
EPI.ISL.416506
EPI.ISL.4155
12
EPI.ISL.415594
MT188340
EPI.ISL.415105
EPI.ISL.412898
EPI.ISL.415591
EPI.ISL.415505
EPI.ISL.4164
81
EPI.ISL.
414009
EPI.ISL.404228
EPI.ISL.414592
EPI.ISL.
415651
EPI.IS
L.4120
29
EPI.ISL.415475
EPI.ISL.414629
EPI.IS
L.4164
80
EPI.ISL.411066
EPI.ISL.413596
MT049951
EPI.ISL.416405
50
54
14.
LSI.I
PE
EPI.ISL.416316
EPI.ISL.416372
EPI.ISL.416034
EPI.IS
L.4145
17
EPI.ISL.414684
EPI.IS
L.4165
03
EPI.ISL.415462
EPI.ISL.408480
EPI.ISL.414577
EPI.ISL.416475
EPI.ISL.415600
EPI.ISL.413597
EPI.IS
L.4146
48
EPI.ISL.415541
MT123291
NMDC60013002-10
EPI.ISL.415460
MT159706
EPI.ISL.415506
EPI.ISL.414552
EPI.ISL.413579
EPI.IS
L.4156
55
MT184911
EPI.ISL.
415478
EPI.ISL.407988
66
46
14.
LSI.I
PE
EPI.ISL.414558
EPI.ISL.416047
EPI.ISL.415461
EPI.ISL.415741
EPI.ISL.406862
EPI.ISL.415629
EPI.ISL.408431
EPI.ISL.406531
MT050493
EPI.ISL.415510
EPI.ISL.412980
MN997409
EPI.ISL.416470
EPI.ISL.416478
EPI.ISL.414616
EPI.ISL.412869
EPI.ISL.403934
EPI.ISL.416409
13
93
14.
LSI.I
PE
EPI.ISL.416319
EPI.ISL.416321
EPI.ISL.411927
EPI.ISL.416029
EPI.ISL.414520
EPI.ISL.414008
EPI.ISL.416474
EPI.ISL.415482
EPI.ISL.416509
EPI.ISL.416447
EPI.ISL.415459
EPI.ISL.412871
75
65
14.
LSI.I
PE
EPI.ISL.416505
EPI.ISL.406538
LC528232
EPI.ISL.414435
EPI.ISL.416401
EPI.ISL.415708
EPI.ISL.416387
EPI.ISL.416341
EPI.ISL.413858
EPI.ISL.416399
EPI.ISL.415472
59
45
14.
LSI.I
PE
EPI.ISL.414643
MT159709
EPI.ISL.414452
MT118835
MT163720
EPI.IS
L.4140
21
EPI.ISL.41
3016
EPI.ISL.416407
EPI.ISL.415159
MN908947_2019/12/30_China/Hubei/Wuhan
MN938384_2020/01/10_China/Guangdong/Shenzhen ne
hzn
eh
S/g
no
dg
na
uG/
ani
hC
_11
/1
0/0
20
2_
26
25
79
NM
ytn
uo
Chsi
mo
ho
nS/
not
gni
hs
aW/
set
atS
deti
nU
_9
1/1
0/0
20
2_
52
35
89
NM
MN988713_2020/01/21_UnitedStates/Illinois/Chicago
MN994467_2020/01/23_UnitedStates/California/LosAngeles
MN994468_2020/01/22_UnitedStates/California/OrangeCounty
MN997409_2020/01/22_UnitedStates/Arizona/Phoenix
MN988668_2020/01/02_China/Hubei/Wuhan
CNA0007332_2019/12/26_China/Hubei/Wuhan
CNA0007334_2020/01
/01_China/Hubei/Wu
han
CNA0007335_2020/01/05_China/Hubei/Wuhan
MT007544_2020/01/25_Australia/Victoria/Clayton
NMDC60013002-06_2019/12/30_C
hina/Hubei/Wuhan
na
hu
W/ie
bu
H/a
nih
C_
70/
10/
02
02
_7
0-2
00
31
00
6C
DM
N
NMDC60013002-08_2019/12/30_China/Hubei/Wuhan
NMDC60013002-09_2020/01/01_China/Hubei/Wuhan
NMDC60013002-10_2019/12/30_China/Hubei/WuhanLC521925_2020/01/25_Japan/Aichi
MT027062_2020/01/29_UnitedStates/California
MT027064_2020/01/29_UnitedStates/California
MT066175_2020/02/01_China/Taiwan/Taipei
oyk
oT/
na
paJ
_9
2/1
0/0
20
2_
37
92
25
CLLC522974_2020/01/31_Japan/Tokyo
LC522975_2020/01/31_Japan/Tokyo
LC522972_2020/01/29_Japan/Kyoto
MT039887_2020/01/31_UnitedStates/Wisconsin
MT044258_2020/01/27_UnitedStates/California
MT044257_2020/01/28_UnitedStates/Illinois
MT049951_2020/01/17_China/Yunnan
MT066176_2020/02/05_China/Taiwan/Taipei
MT072688_2020/01/13_Nepal/Kathmandu
MT093631_2020/01/08_China
MT093571_2020/02/07_Sweden
MT106052_2020/02/06_UnitedStates/California
MT106053_2020/02/10_UnitedStates
/California
sa
xe
T/s
etat
Sd
eti
nU
_11
/2
0/0
20
2_
45
06
01
TM
MT118835_2020/02/23_UnitedStates/California/Solano
MT123291_2020/01/29_China/Guangdong/Guangzhou
MT123290_2020/02/05_China/Guangdong/Guangzhou
LC528233_2020/02/10_Japan/KanagawaLC528232_2020/02/10_Japan/Kanagawa
MT152824_2020/02/24_UnitedStates/Washington/SnohomishCounty
MT126808_2
020/02/28_
Brazil
MT1637
16_202
0/02/2
7_Unit
edStat
es/Was
hingto
n
MT135041_2020/01/26_China/Beijing
MT135043_2020/01/28_China/Beijing
MT163717_2020/02/28_UnitedStates/Washington
MT163718_2020/02/29_UnitedStates/Washington
MT163719_2020/03/01_UnitedStates/Washington
MT163720_2020/03/01_UnitedStates/Washington
MT050493_2020/01/31_India/KeralaState
MT012098_2020/01/27_India/KeralaState
MT159717_2020/02/17_UnitedStates
MT159718_2020/02/18_UnitedStates
MT159719_2020/02/18_UnitedStates
MT159720_2020/02/21_UnitedStates
MT159722_2020/02/21_UnitedStates
MT159705_2020/02/17_UnitedStates
MT159706_2020/02/17_UnitedStates
MT159707_2020/02/17_UnitedStates
MT159708_2020/02/17_UnitedStates
MT159709_2020/02/20_UnitedStates
MT159712_2020/02/25_UnitedStates
MT159715_2020/02/24_UnitedStates
MT159716_2020/02/24_UnitedStates
MT123292_2020/01/27_China/Guangdong/Guangzhou
MT123293_2020/01/29_China/Guangdong/Guangzhou
MT121215_2020/02/02_China/Shanghai
MT0661
56_202
0/01/3
0_Ital
y
MT184908_2020/02/21_UnitedStates
MT184910_2020/02/18_UnitedStates
MT184911_2020/02/17_UnitedStatesMT184912_2020/02/17_UnitedStates
MT184913_2020/02/24_UnitedStates
MT188339_2020/03/09_UnitedStates/MN
MT188340_2020/03/07_UnitedStates/MN
MT188341_2020/03/05_UnitedStates/MN
MN996527_2019/12/30_Ch
ina/Hubei/Wuhan
MN996529_2019/12/30_China/Hubei/Wuhan
na
hu
W/ie
bu
H/a
nih
C_
03/
21/
91
02
_0
35
69
9N
M
MN996531_2019/12/30_China/Hubei/Wuhan
MT019529_2019/12/23_China/Hubei/Wuhan
MT019530_2019/12/30_China/Hubei/Wuhan
MT019531_2019/12/30_China/Hubei/Wuhan
MT019532_2019/12/30_China/Hubei/Wuhan
MT019533_2020/01/01_China/Hubei/Wuhan
uo
hz
gn
aH/
gn
aije
hZ/
ani
hC
_0
2/1
0/0
20
2_
37
89
30
TM
na
hu
W/ie
bu
H/a
nih
C/ai
sA
_0
3/2
1/9
10
2_
23
12
04
_L
SI_I
PE
EPI_ISL_403932_2020/01/14_Asia/China/Guandong/Shenzhen
EPI_ISL_403934_202
0/01/15_Asia/China
/Guandong/Shenzhen
EPI_ISL_403936_2020/01/17_Asia/China/Guangdong/Zhuhai
iru
ba
htn
oN/
dn
alia
hT/
ais
A_
80/
10/
02
02
_2
69
30
4_
LSI
_IP
E
EPI_ISL_403963_2020/01/13_Asia/Thailand/Nonthaburi
EPI_ISL_404227_2020/01/16_Asia/China/Zhejiang
EPI_ISL_404228_2020/01/17_Asia/China/Zhejiang
EPI_IS
L_4060
31_202
0/01/2
3_Asia
/Taiwa
n/Kaoh
siung
EPI_ISL_406531_2020/01/22_Asia/China/Guangdong/Zhuhai
EPI_ISL_406533_2020/01/22_Asia/China/Guangdong/Guangzhou
EPI_ISL_406534_2020/01/22_Asia/China/Guangdong/Foshan
EPI_ISL_406535_2020/01/22_Asia/China/Guangdong/Foshan
EPI_ISL_406536_2020/01/22_Asia/China/Guangdong/Foshan
EPI_ISL_406538_2020/01/23_Asia/China/Guangdong
EPI_ISL_406592_2020/01/13_Asia/China/Guandong/Shenzhen
ne
hzn
eh
S/g
no
dn
au
G/a
nih
C/ai
sA
_3
1/1
0/0
20
2_
39
56
04
_L
SI_I
PE
EPI_ISL_406594_2020/01/16_Asia/China/Guandong/Shenzhen
EPI_ISL_406595_2020/
01/16_Asia/China/Gua
ndong/Shenzhen
EPI_ISL_406596_2020/01/23_Europe/France/Ile-de-France/Paris
EPI_ISL_406597_2020/01/23_Europe/France/Ile-de-France/Paris
EPI_ISL_406862_2020/01/28_Europe/Germany/Bavaria/Munich
EPI_ISL_406973_2020/01/23_Asia/Singapore
EPI_ISL_407071_2020/01/29_Europe/UnitedKingdom/England
EPI_ISL_407073_2020/01/29_Europe/UnitedKingdom/England
EPI_ISL_407193_2020/01/25_Asia/SouthKorea/Gyeonggi-do
EPI_ISL_407313_2020/01/19_Asia/China/Zhejiang/Hangzhou
EPI_ISL_407893_2020/01/24_Oceania/Australia/NewSouthWales/Sydney
EPI_ISL_407894_2020/01/28_Oceania/Australia/Queensland/GoldCoast
EPI_ISL_407896_2020/01/30_Oceania/Australia/Queensland/GoldCoast
EPI_ISL_407976_2020/02/03_Europe/Belgium/Leuven
EPI_ISL_407987_2020/01/25_Asia/Singapore
EPI_ISL_407988_2020/02/01_Asia/Singapore
EPI_ISL_408430_2020/01/29_Europe/France/Ile-de-France/Paris
EPI_ISL_408431_2020/01/29_Europe/France/Ile-de-France/Paris
na
uh
cg
no
Y/q
niq
gn
oh
C/a
nih
C/ai
sA
_1
2/1
0/0
20
2_
87
48
04
_L
SI_I
PE
EPI_ISL_408479_2020/01/23_Asia/China/Chongqing/Zhongxian
EPI_ISL_408480_2020/01/17_Asia/China/Yunnan/Kunming
EPI_ISL_408481_2020/01/18_Asia/China/Chongqing
EPI_ISL_408482_2020/01/19_Asia/China/Shandong/Qingdao
EPI_ISL_408484_2020/01/15_Asia/China/Sichuan/Chengdu
EPI_ISL_408485_2020/01/18_Asia/China/Beijing
EPI_ISL_408486_2020/01/11_Asia/China/J
iangxi/Pingxiang
EPI_ISL_408488_2020/01/19_Asia/China/Jiangsu/Huaian
EPI_ISL_408514_2020/01/01_Asia/China/Hubei/Wuhan
EPI_ISL_408515_2020/01/01_Asia/China/Hubei/Wuhan
EPI_ISL_408668_2020/01/24_Asia/Vietnam/ThanhHoa
EPI_ISL_408976_2020/01/22_Oceania/Australia/NewSouthWales/Sydney
EPI_ISL_408977_2020/01/25_Oceania/Australia/NewSouthWales/Sydney
EPI_ISL_409067_2020/01/29_NorthAmerica/USA/Massachusetts
EPI_ISL_410531_2020/01/25_Asia/Japan/Nara
EPI_ISL_410535_2020/02/03_Asia/Singapore
EPI_ISL_410536_2020/02/06_Asia/Singapore
EPI_ISL_410537_2020/02/09_Asia/Singapore
EPI_IS
L_4105
46_202
0/01/3
1_Euro
pe/Ita
ly/Rom
e
EPI_ISL_410713_2020/01/27_Asia/Singapore
EPI_ISL_410714_2020/02/03_Asia/Singapore
EPI_ISL_410715_2020/02/04_Asia/Singapore
EPI_ISL_410716_2020/02/04_Asia/Singapore
EPI_ISL_410717_2020/02/05_Oceania/Australia/Queensland/GoldCoast
EPI_ISL_410718_2020/02/05_Oceania/Australia/Queensland/GoldCoast
EPI_ISL_410719_2020/02/02_Asia/Singapore
EPI_ISL_410720_2020/01/23_Europe/France/Ile-de-France/Paris
EPI_ISL_410984_2020/01/29_Europe/France/Ile-de-France/Paris
naij
uF/
ani
hC/
ais
A_
12/
10/
02
02
_0
60
114
_L
SI_I
PE
EPI_ISL_411066_2020/01/22_Asia/China/Fujian
EPI_ISL_411218_2020/02/02_Europe/France/Ile-de-France/Paris
EPI_ISL_411
219_2020/01/28_Europe/France/Ile-de-France/Paris
EPI_IS
L_4112
20_202
0/01/2
8_Euro
pe/Fra
nce/Il
e-de-F
rance/
Paris
EPI_ISL_411902_2020/01/27_Asia/Cambodia/Sihanoukville
EPI_ISL_411915_2020/01/25_Asia/Taiwan/Taoyuan
EPI_ISL_411926_2020/01/24_Asia/Taiwan/Taipei
EPI_ISL_411927_2020/01/28_Asia/Taiwan/Taipei
EPI_ISL_411950_2020/01/23_Asia/China/Jiangsu
EPI_ISL_411952_2020/01/24_Asia/China/Jiangsu
EPI_ISL_412026_2020/02/23_Asia/China/Anhui/Hefei
EPI_ISL_412028_2020/01/22_Asia/HongKong
EPI_IS
L_4120
29_202
0/01/3
0_Asia
/HongK
ong
EPI_ISL_412030_2020/02/01_Asia/HongKong
EPI_ISL_
412116_2
020/02/0
9_Europe
/UnitedK
ingdom/E
ngland
uo
hzg
niJ/i
eb
uH/
ani
hC/
ais
A_
80/
10/
02
02
_9
54
21
4_
LSI
_IP
E
EPI_ISL_412869_2020/01/30_Asia/SouthKorea/Seoul
EPI_ISL_412870_2020/01/30_Asia/SouthKorea/Seoul
EPI_ISL_412871_2020/01/31_Asia/SouthKorea/Seoul
EPI_ISL_412872_2020/02/01_Asia/SouthKorea/Gyeonggi-do
EPI_ISL_412873_2020/02/06_Asia/SouthKorea/Chungcheongnam-do
EPI_ISL_412898_2019/12/3
0_Asia/China/Hubei/Wuhan
EPI_ISL_412912_2020/02/25_Europe/Germany/Baden-Wuerttemberg
EPI_ISL_412972_2020/02/27_NorthAmerica/Mexico/MexicoCity
EPI_ISL_412973_2
020/02/20_Europe
/Italy/Lombardy
EPI_IS
L_4129
74_202
0/01/2
9_Euro
pe/Ita
ly/Rom
e
EPI_ISL_412975_2020/02/28_Oceania/Australia/NewSouthWales/Sydney
EPI_ISL_412978_2020/01/17_Asia/China/Hubei/Wuhan
EPI_ISL_412979_2020/01/18_Asia/China/Hubei/Wuhan
EPI_ISL_412980_2020/01/18_Asia/China/Hubei/Wuhan
EPI_ISL_412981_2020/01/18_Asia/China/Hubei/Wuhan
EPI_ISL_412982_2020/02/07_Asia/China/Hubei/Wuhan
EPI_ISL_412983_2020/02/08_Asia/China/Hubei/Tianmen
EPI_ISL_413014_2020/01/25_NorthAmerica/Canada/Ontario
EPI_ISL_413015_2020/01/23_NorthAmerica/Canada/Ontario
EPI_ISL_41
3016_2020/
02/28_Sout
hAmerica/B
razil/SaoP
aulo/SaoPa
ulo
EPI_ISL_413017_2020/02/06_Asia/SouthKorea
EPI_ISL_413018_2020/02/06_Asia/SouthKorea
EPI_ISL_41
3019_2020/
02/26_Euro
pe/Switzer
land/Zuric
h
EPI_ISL_413020_2020/02/27_Europe/Switzerland/Zurich
EPI_ISL_413021_2020/02/29_Europe/Switzerland/Zurich
EPI_ISL_413022_2020/02/29_Europe/Switzerland/Zurich
EPI_ISL_413023_2020/02/29_Europe/Switzerland/Zurich
EPI_IS
L_4130
24_202
0/02/2
9_Euro
pe/Swi
tzerla
nd/Zur
ich
EPI_ISL_413213_2020/02/29_Oceania/Australia/NewSouthWales/Sydney
EPI_ISL_413214_2020/02/29_Oceania/Australia/NewSouthWales/Sydney
EPI_ISL_413456_2020/02/20_NorthAmerica/USA/Washington/KingCounty
EPI_ISL_413459_2020/02/20_Asia/Japan/Tokyo
EPI_ISL_413485_2020/01/24_Asia/China/Anhui/Suzhou
tcir
tsi
Dgr
ebs
nie
H/ail
ah
pts
eW
eni
hR
htr
oN/
yn
amr
eG/
ep
oru
E_
82/
20/
02
02
_8
84
31
4_
LSI
_IP
E
EPI_ISL_413489_202
0/03/03_Europe/Ita
ly/Lombardy/Milan
EPI_ISL_413490_2020/02/27_Oceania/NewZealand/Auckland
EPI_ISL_413513_2020/02/27_Asia/SouthKorea
EPI_ISL_413515_2020/02/27_Asia/SouthKorea
EPI_ISL_413555_2020/02/27_Europe/UnitedKingdom/Wales
EPI_ISL_413558_2020/02/27_NorthAmerica/USA/California/SolanoCounty
EPI_ISL_413559_2020/02/27_NorthAmerica/USA/California/SolanoCounty
EPI_ISL_413560_2020/02/28_NorthAmerica/USA/Washington
EPI_ISL_413563_2020/03/03_NorthAmerica/USA/Washington
EPI_ISL_413566_2020/03/02_Europe/Netherlands/Blaricum
EPI_ISL_413572_2020/03/01_Europe/Netherlands/Haarlem
EPI_ISL_413573_2020/03/02_Europe/Netherlands/HardinxveldGiessendam
EPI_ISL_413577_2020/03/02_Europe/Netherlands/Naarden
EPI_ISL_413579_2020/03/03_Europe/Netherlands/Nootdorp
EPI_ISL_413580_2020/03/02_Europe/Netherlands/Oisterwijk
EPI_ISL_413582_2020/03/01_Europe/Netherlands/Rotterdam
EPI_ISL_413584_2020/03/03_Europe/Netherlands/Rotterdam
EPI_ISL_413589_2020/03/01_Europe/Netherlands/Utrecht
EPI_ISL_413592_2020/03/02_
Asia/Taiwan/Taipei
EPI_ISL_413593_2020/02/29_Europe/Luxembourg
EPI_ISL_413594_2020/02/28_Oceania/Australia/NSW/Sydney
EPI_ISL_413595_2020/02/28_Oceania/Australia/NSW/Sydney
EPI_ISL_413596_2020/02/28_Oceania/Australia/NSW/Sydney
EPI_ISL_413597_2020/03/02_Oceania/Australia/NSW/Sydney
EPI_ISL_413599_2020/03/04_Oceania/Australia/NSW/Sydney
EPI_ISL_413600_2020/03/03_Oceania/Australia/NSW/Sydney
EPI_IS
L_4136
02_202
0/03/0
3_Euro
pe/Fin
land/H
elsink
i
EPI_ISL_
413603_2
020/03/0
3_Europe
/Finland
/Helsink
i
EPI_ISL_413604_2020/03/03_Europe/Finland/Helsinki
EPI_ISL_413647_2020/03/01_Europe/Portugal
EPI_ISL_
413648_2
020/03/0
1_Europe
/Portuga
l
EPI_ISL_413650_2020/03/05_NorthAmerica/USA/Washington
EPI_ISL_413651_2020/03/05_NorthAmerica/USA/Washington
EPI_ISL_413653_2020/03/05_NorthAmerica/USA/Washington
EPI_ISL_413853_2020/01/30_Asia/China/Guangdong
EPI_ISL_413854_2020/01/30_Asia/China/Guangdong
EPI_ISL_413857_2020/01/31_Asia/China/GuangdongEPI_ISL_413858_2020/01/30_Asia/China/Guangdong
EPI_ISL_413860_2020/01/30_Asia/China/Guangdong
EPI_ISL_413861_2020/02/01_Asia/China/Guangdong
EPI_ISL_413863_2020/02/01_Asia/China/Guangdong
EPI_ISL_413864_2020/02/09_Asia/China/Guangdong
ocsic
nar
Fn
aS/
ainr
ofila
C/A
SU/
acir
em
Ahtr
oN
_5
0/3
0/0
20
2_
52
93
14
_L
SI_I
PE
oc
sic
nar
Fn
aS/
ainr
ofila
C/A
SU/
acire
mA
htro
N_
50/
30/
02
02
_8
29
31
4_
LSI
_IP
Eoc
sic
nar
Fn
aS/
ainr
ofila
C/A
SU/
acire
mA
htro
N_
50/
30/
02
02
_1
39
31
4_
LSI
_IP
E
EPI_ISL_413996_2
020/02/24_Europe
/Switzerland/Tes
sin
EPI_IS
L_4139
97_202
0/02/2
6_Euro
pe/Swi
tzerla
nd/Gen
eva
EPI_IS
L_4139
99_202
0/02/2
7_Euro
pe/Swi
tzerla
nd/Arg
ovie
EPI_ISL_
414005_2
020/02/2
5_Europe
/UnitedK
ingdom/E
ngland
EPI_ISL_
414006_2
020/02/2
8_Europe
/UnitedK
ingdom/E
ngland
EPI_ISL_
414007_2
020/02/2
8_Europe
/UnitedK
ingdom/E
ngland
EPI_ISL_414008_2020/02/27_Europe/UnitedKingdom/England
EPI_ISL_
414009_2
020/02/2
5_Europe
/UnitedK
ingdom/E
ngland
EPI_ISL_41
4011_2020/
02/26_Euro
pe/UnitedK
ingdom/Eng
land
EPI_ISL_414012_2020/02/27_Europe/UnitedKingdom/England
EPI_ISL_4140
19_2020/02/2
7_Europe/Swi
tzerland/Gen
eva
EPI_ISL_4140
20_2020/02/2
7_Europe/Swi
tzerland/Gen
eva
EPI_IS
L_4140
21_202
0/02/2
7_Euro
pe/Swi
tzerla
nd/Bas
el
EPI_IS
L_4140
22_202
0/02/2
7_Euro
pe/Swi
tzerla
nd/Gen
eva
EPI_ISL_414023_2020/03/01_Europe/Switzerland/Vaud
EPI_IS
L_4140
45_202
0/03/0
4_Sout
hAmeri
ca/Bra
zil/Ri
odeJan
eiro/R
iodeJa
neiro
EPI_ISL_414428_2020/03/02_Europe/Netherlands/NoordBrabant
EPI_ISL_414429_2020/03/02_Europe/Netherlands/NoordBrabant
EPI_ISL_414435_2020/03/03_Europe/Netherlands/Utrecht
EPI_IS
L_4144
45_202
0/03/0
3_Euro
pe/Net
herlan
ds/Zui
dHolla
nd
EPI_ISL_41
4451_2020/
03/06_Euro
pe/Netherl
ands/Noord
Brabant
EPI_ISL_414452_2020/03/06_Europe/Netherlands/NoordBrabant
EPI_IS
L_4144
57_202
0/03/0
6_Euro
pe/Net
herlan
ds/Noo
rdBrab
ant
EPI_ISL_414467_2020/03/05_Europe/Netherlands/ZuidHolland
EPI_ISL_414468
_2020/03/06_Eu
rope/Netherlan
ds/ZuidHolland
EPI_ISL_414470_2020/03/06_Europe/Netherlands/ZuidHolland
EPI_ISL_414476_2020/02/29_NorthAmerica/USA/NewYork
EPI_ISL_414487_2020/03/04_Europe/Ireland/Cork t
cirtsiD
gre
bsni
eH/
aila
hpts
eW
eni
hR
htro
N/yn
amr
eG/
ep
oru
E_
52/
20/
02
02
_7
94
41
4_
LSI
_IP
E tci
rtsi
Dgr
eb
sni
eH/
aila
hpt
se
We
nih
Rht
ro
N/y
na
mre
G/e
por
uE
_6
2/2
0/0
20
2_
99
44
14
_L
SI_I
PE
EPI_ISL_414500_2020/03/04_Europe/UnitedKingdom/England/Yorkshire/Sheffield
tcirtsiD
gre
bsni
eH/
aila
hpts
eW
eni
hR
htro
N/yn
amr
eG/
ep
oru
E_
72/
20/
02
02
_5
05
41
4_
LSI
_IP
Etcirt
siD
gre
bsni
eH/
aila
hpts
eW
eni
hR
htro
N/yn
amr
eG/
ep
oru
E_
82/
20/
02
02
_9
05
41
4_
LSI
_IP
E
EPI_IS
L_4145
17_202
0/02/2
5_Asia
/HongK
ong
EPI_ISL_414520_2020/03/02_Europe/Germany/Munich
hci
nu
M/y
na
mre
G/e
por
uE
_2
0/3
0/0
20
2_
12
54
14
_L
SI_I
PE
EPI_ISL_414522_2020/02/28_Europe/UnitedKingdom/England
EPI_ISL_414523_2020/02/27_Europe/UnitedKingdom/England
EPI_IS
L_4145
24_202
0/03/0
1_Euro
pe/Uni
tedKin
gdom/E
ngland
EPI_ISL_
414525_2
020/03/0
2_Europe
/Unit