23
1 Pangolin homology associated with 2019-nCoV 1 Tao Zhang 1* , Qunfu Wu 1* , Zhigang Zhang 1# 2 1 State key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, 3 School of Life Sciences, Yunnan University, Kunming, Yunnan, 650091, China 4 * These authors contributed equally 5 # Correspondence to: [email protected] (Z.Z.G.) 6 . CC-BY-NC-ND 4.0 International license preprint (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this this version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253 doi: bioRxiv preprint

Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

1

Pangolin homology associated with 2019-nCoV 1

Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 2

1State key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, 3

School of Life Sciences, Yunnan University, Kunming, Yunnan, 650091, China 4

*These authors contributed equally 5

#Correspondence to: [email protected] (Z.Z.G.) 6

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 2: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

2

Abstract 1

To explore potential intermediate host of a novel coronavirus is vital to rapidly control 2

continuous COVID-19 spread. We found genomic and evolutionary evidences of the 3

occurrence of 2019-nCoV-like coronavirus (named as Pangolin-CoV) from dead Malayan 4

Pangolins. Pangolin-CoV is 91.02% and 90.55% identical at the whole genome level to 5

2019-nCoV and BatCoV RaTG13, respectively. Pangolin-CoV is the lowest common 6

ancestor of 2019-nCoV and RaTG13. The S1 protein of Pangolin-CoV is much more 7

closely related to 2019-nCoV than RaTG13. Five key amino-acid residues involved in the 8

interaction with human ACE2 are completely consistent between Pangolin-CoV and 2019-9

nCoV but four amino-acid mutations occur in RaTG13. It indicates Pangolin-CoV has 10

similar pathogenic potential to 2019-nCoV, and would be helpful to trace the origin and 11

probable intermediate host of 2019-nCoV.12

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 3: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

3

In the late December of 2019, an epidemic pneumonia (the W.H.O. announced it 1

recently as Corona Virus Disease (COVID-19)(1)) outbreak in city of Wuhan in China and 2

soon widely spreads all over the world. According to authoritative statistics, the COVID-3

19 has caused more than 40,000 laboratory-confirmed infections with more than 1000 4

deaths by 12 February 2020 and is still increasing. It was caused by a novel identified 5

coronavirus 2019-nCoV (the International Committee on Taxonomy of Viruses (ICTV) 6

renamed this virus as severe acute respiratory syndrome coronavirus 2, SARS-CoV-2 (1)). 7

Released complete genomes of 2019-nCoVs (2, 3) have helped rapid identification and 8

diagnosis of the COVID-19. Another key task is to find potential origin of 2019-nCoV(4). 9

Unsurprisingly, like SARS-CoV and MERS-CoV(5), the bat is still a probable origin of the 10

2019-nCoV because the 2019-nCoV shared 96% whole genome identity with a bat 11

coronavirus Bat-CoV-RaTG13 from Rhinolophus affinis from Yunnan Province(2). 12

However, SARS-CoV and MERS-CoV usually pass into medium host like civets or camels 13

before leaping to humans(4). It indicates that the 2019n-Cov was probably transmitted to 14

humans by some other animals. Considering the earliest COVID-19 patient reported no 15

exposure at the seafood market(6), finding intermediate host of 2019-nCoV is vital to block 16

its transmission. 17

On 24 October 2019, Liu and his colleagues from the Guangdong Wildlife Rescue 18

Center of China (7) firstly detected the existence of SARS-liked coronavirus from lung 19

samples of two dead Malayan Pangolins with a frothy liquid in lung and pulmonary fibrosis, 20

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 4: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

4

which is close on the outbreak of COVID-19. From their published results, all virus contigs 1

assembled from 2 lung samples (lung07, lung08) showed not high identities ranging from 2

80.24% to 88.93% with known SARS coronavirus. Hence, we conjectured that dead 3

Malayan pangolin may carry a new coronavirus close to 2019-nCoV. 4

To confirm our assumption, we downloaded raw RNA-seq data (SRA accession 5

number PRJNA573298) of those two lung samples from SRA and conducted consistent 6

quality control and contamination removing as described by Liu’s study(7). We found 1882 7

clean reads from lung08 sample mapped upon 2019-nCoV reference genome (GenBank 8

Accession MN908947)(3) with high genome coverage of 76.02%. We performed de novo 9

assembly of those reads and totally obtained 36 contigs with length ranging from 287bp to 10

2187bp with mean length of 700bp. Blasting against proteins from 2845 coronavirus 11

reference genomes including RaTG13, 2019-nCoVs and other known coronaviruses, we 12

found 22 contigs can be best matched to 2019-nCoVs (70.6%-100% aa identity; average: 13

95.41%) and 12 contigs matched to Bat SARS-like coronavirus (92.7%-100% aa identity; 14

average: 97.48%) (Table S1). These results indicate Malayan pangolin indeed carries a 15

novel coronavirus (here named as Pangolin-CoV) close to 2019-nCoV. 16

Using reference-guided scaffolding approach, we created Pangolin-CoV draft 17

genome (19,587 bp) based on the above 34 contigs. Remapping 1882 reads against the draft 18

genome resulted in 99.99% genome coverage at a mean depth of 7.71 X (range: 1X-47X) 19

(Figure 1A). Based on Simplot analysis, Pangolin-Cov showed highly overall genome 20

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 5: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

5

sequence identity throughout the genomes to RaTG13 (90.55%) and 2019-nCoV (91.02%) 1

(Figure 1B), although there is greatly high identity 96.2% between 2019-nCoV and 2

RaTG13(3). Another SARS-like coronavirus more similar to Pangolin-Cov were Bat 3

SARSr-CoV ZXC21(85.65%) and Bat SARSr-CoV ZC45 (85.01%). These results indicate 4

Pangolin-Cov may be the common origin of 2019-nCoV and RaTG13. 5

The viral genome organization of Pangolin-Cov was characterized by sequence 6

alignment against 2019-nCoV (GenBank Accession MN908947) and RaTG13. The 7

Pangolin-Cov genome consists of six major open reading frames (ORFs) common to 8

coronaviruses and other four accessory genes (Figure 1C and Table S2). Further analysis 9

indicates that Pangolin-Cov genes covered 2019-nCoV genes with coverage ranging from 10

45.8% to 100% (average coverage 76.9%). Pangolin-Cov genes shared high average 11

nucleotide and amino identity with both 2019-nCoV (MN908947) (93.2% nt/ 94.1% aa) 12

and RaTG13 (92.8% nt/93.5% aa) (Figure 1C and Table S2). Surprisingly, some of 13

Pangolin-Cov genes showed higher aa sequence identity to 2019-nCoV than RaTG13, 14

including orf1b (73.4/72.8), S-protein (97.5/95.4), orf7a (96.9/93.6), and orf10 (97.3/94.6). 15

High S-protein amino acid identity implies function similarity between Pangolin-Cov and 16

2019-nCoV. 17

To determine the evolutionary relationships among Pangolin-Cov, 2019-nCoV and 18

previously identified coronaviruses, we estimated phylogenetic trees based on the 19

nucleotide sequences of the whole genome sequence, RNA-dependent RNA polymerase 20

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 6: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

6

gene (RdRp), non-structural protein genes ORF1a and 1b, and the main structural proteins 1

encoded by the S and M genes. In all phylogenies, Pangolin-CoV, RaTG13 and 2019-nCoV 2

were clustered into a well-supported group, here named as “SARS-CoV-2 group” (Figure 3

2 and Figures S1 to S2). This group represents a novel Beta-coronaviruses group. Within 4

this group, RaTG13 and 2019-nCoV was grouped together, and the Pangolin-CoV was 5

their lowest common ancestor. However, whether the basal position of the SARS-CoV-2 6

group is SARSr-CoV ZXC21 and/or SARSr-CoV ZC45 or not is still in debate. Such 7

debate also occurred in both Wu et al.(3) and Zhou et al. (2)studies. Possible explanation 8

is due to a past history of recombination in Beta-CoV group(3). It is noteworthy that our 9

discovered evolutionary relationships of coronaviruses shown by the whole genome, RdRp 10

gene, and S-gene were highly consistent with that discovered by complete genome 11

information in Zhou et al. study(2). It indicates our Pangolin-CoV draft genome has enough 12

genomic information to trace true evolutionary position of Pangolin-CoV in coronaviruses. 13

The coronavirus spike (S) protein consisting of 2 subunits (S1 and S2) mediates 14

infection of receptor-expressing host cells and is a critical target for antiviral neutralizing 15

antibodies. S1 contains a receptor binding domain (RBD) about 193 amino acid fragment, 16

which is responsible for recognizing and binding with the cell surface receptor(8, 9). Zhou 17

et al. had experimentally confirmed that 2019-nCoV is able to use human, Chinese 18

horseshoe bats, civet, and pig ACE2 as an entry receptor in the ACE2-expressing cells(2), 19

suggesting the RBD of 2019-nCoV mediates infection to human and other animals. To gain 20

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 7: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

7

a sequence-level insight into understanding pathogen potential of Pangolin-Cov, we 1

investigated the amino acid variation pattern of S1 protein from Pangolin-CoV, 2019-2

nCoV, RaTG13, and other representative SARS-CoVs. The amino acid phylogenetic tree 3

showed the S1 protein of Pangolin-CoV is more closely related to that of 2019-CoV than 4

RaTG13. Within the RBD, we further found Pangolin-CoV and 2019-nCOV was highly 5

conserved with only one amino acid change (500H/500Q) (Figure 3) but not belonging to 6

five key residues involved in the interaction with human ACE2(2, 9). In contrast, RaTG13 7

has 17 amino acid residue changes and 4 of them belonged to key amino acid residue 8

(Figure 3). These results indicate Pangolin-Cov has similar pathogen potential to 2019-9

nCoV. 10

The nucleocapsid protein (N-protein) is the most abundant protein in coronavirus. 11

The N-protein is a highly immunogenic phosphoprotein, and it is normally very conserved. 12

The N protein of coronavirus is often used as a marker in diagnostic assays. To gain a 13

further insight into the diagnostic potential for Pangolin-Cov, we investigated the amino 14

acid variation pattern of N-protein from Pangolin-CoV, 2019-nCoV, RaTG13, and other 15

representative SARS-CoV. Phylogenetic analysis based on N protein supports Pangolin-16

Cov as a sister taxon of 2019-nCoV and RaTG13 (Figure 4). We further found seven amino 17

acid mutations can differentiate our defined “SAR-CoV-2 group” (12N, 26 G, 27S, 104D, 18

218A, 335T, 346N, 350Q) from other known SARS-CoVs (12S, 26D, 27N, 104E, 218T, 19

335H, 346Q, 350N). Two amino acid sites (38P and 268Q) are shared by Pangolin-Cov, 20

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 8: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

8

RaTG13 and SARS-CoVs, which are mutated as 38S and 268A in 2019-nCoV. Only one 1

amino acid residue shared by Pangolin-CoV and other SARS-CoVs (129E) is consistently 2

changed in both 2019-nCoV and RaTG13 (129D). Our observed amino acid changes in N-3

protein would be useful for developing antigen for much more sensitive serological 4

detection of 2019-nCoV. 5

Based on published metagenomic data, this study provides the first report on a 6

potential closely related kin (Pangolin-CoV) of 2019-nCoV, which was discovered from 7

dead Malayan Pangolins after extensive rescue efforts. Aside from RaTG13, the Pangolin-8

CoV is the most closely related to 2019-nCoV. Due to original sample unavailable, we did 9

not perform further experiments to confirm our findings, including PCR validation, 10

serological detection, and even the isolation of virus particle etc. However, on 7 February, 11

researchers from the South China Agricultural University in Guangzhou reported pangolin 12

would be the potential candidate host of 2019-nCoV for isolating a virus 99% similar to 13

2019-nCoV in genome (Data unpublished). Our discovered Pangolin-CoV genome showed 14

91.02% nt identity with 2019-nCoV, implying Pangolin-CoV could be different from that 15

unpublished. Whether pangolin species is a good candidate for 2019-nCoV still need to be 16

further investigated. Considering the wide spread of SARSr-CoV in their natural reservoirs, 17

our findings would be meaningful to find novel intermediate hosts of 2019-nCoV for 18

blocking interspecies transmission. 19

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 9: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

9

Materials and Methods 1

Data preparation 2

We downloaded raw data of lung08 and lung07 published by Liu’s study(7) from 3

NCBI sequence read archive (SRA) under Bio Project PRJNA573298. Raw reads were 4

first adaptor- and quality-trimmed using the Trimmomatic program (version 0.39)(10). For 5

removing host contamination, Bowtie2 (version 2.3.4.3) (11)was used to map clean reads 6

to the host reference genome of Manis javanica (NCBI Project ID: PRJNA256023). Only 7

unmapped reads were mapped to 2019-nCoV reference genome (GenBank Accession 8

MN908947) for identifying virus reads. 9

Read Assembly and construction of consensus sequence 10

Virus-targeted reads were assembled de novo using MEGAHIT (v1.1.3)(12). Read 11

remapping to assembled contigs was performed by using Bowtie2(11). Mapping coverage 12

and depth were produced using Samtools (version 1.9)(13). Contigs were taxonomically 13

annotated using BLAST 2.9.0+ against 2845 Coronavirus reference genomes (Table S1). 14

Bat-Cov-RaTG13 genome was downloaded from NGDC database (https://bigd.big.ac.cn/) 15

(Accession no. GWHABKP00000000)(2). 2019-nCoV reference genome was downloaded 16

from NCBI (Accession no. MN908947)(3). Other coronavirus genomes were downloaded 17

ViPR database (https://www.viprbrc.org/brc/home.spg?decorator=corona) on 6 February 18

2020. We further used reference-guided strategy to construct draft genome based on those 19

contigs taxonomically annotated to 2019-nCoVs, SARS-CoV, and Bat SARS-like CoV. 20

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 10: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

10

Each contig was aligned against 2019-nCoV reference genome with MUSCLE software 1

(version 3.8.31)(14). Aligned contigs were merged into consensus scaffold with BioEdit 2

version 7.2.5 (http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-3

alignment-editor.html) following manually quality checking. Small fragments with less 4

than length 25bp were discarded, if these fragments were not covered by any large 5

fragments. The potential open reading frames (ORFs) of finally obtained draft genome 6

were annotated by the alignment to 2019-nCoV reference genome (Accession no. 7

MN908947). 8

Phylogenetic relationship analysis 9

Sequence alignment was carried out using MUSCLE software(14). Alignment 10

accuracy was checked manually base-by-base. Gblocks(15) was used to process the gap in 11

aligned sequence. Using MegaX (version 10.1.7)(16), we inferred all maximum likelihood 12

(ML) phylogenetic trees under the best-fit DNA/anima acid substitute model with 1000 13

bootstrap replications. Phylogenetic analyses were performed using the nucleotide 14

sequences of various CoV gene data sets: the whole genome and ORF1a, ORF1b, 15

Membrane (M) gene, spike (S) and RNA-dependent RNA polymerase (RdRp) gene. The 16

best model of M is GTR+G and all others are GTR+G+I. Two additional protein-based 17

trees were constructed under WAG+G (S1 subunit of S protein) and JTT+G (N-protein), 18

respectively. Branches with values< 70% bootstrap were hidden in all phylogenetic trees. 19

Acknowledgments: 20

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 11: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

11

This study was supported by the 1

Second Tibetan Plateau Scientific Expedition and Research (STEP) program (no. 2

2019QZKK0503), the National Key Research and Development Program of China (no. 3

2018YFC2000500), the Key Research Program of the Chinese Academy of Sciences (no. 4

KFZD-SW-219), and the Chinese National Natural Science Foundation (no. 31970571). 5

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 12: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

12

Figure legends 1

Figure 1. Genome-related analysis. (A) Sequence depth of mapped reads remapping to 2

Pangolin-Cov. (B) Similarity plot based on the full-length genome sequence of Pangolin-3

Cov. Full-length genome sequences of 2019-nCoV (Beta-CoV/Wuhan-Hu-1), BatCov-4

RaTG13, Bat SARSr-CoV 21, Bat SARSr-CoV45, Bat SARSr-CoV WIV1, and SARS-5

CoV BJ01 were used as reference sequences. (C) Comparison of common genome 6

organization similarity among 2019-nCoV, Pangolin-Cov and BatCov-RaTG13 linked to 7

Table S2. 8

Figure 2. Phylogenetic relationship of coronavirus based on whole genome and RdRp gene 9

nucleotide sequences. Red text denotes the Malayan Pangolin-CoV. Pink text denotes 10

2019-nCoV (SARS-CoV-2). Green text denotes a bat coronavirus having 96% similarity 11

at genome level to SARS-CoV-2. Blue text denotes reference coronaviruses used in 12

Figure1B. Detailed information can be found in Materials and Methods. 13

Figure 3. Amino acid sequence alignment of the S1 protein and its phylogeny. The 14

receptor-binding motif of SARS-CoV and the homologous region of other coronaviruses 15

are indicated by the grey box. The key amino acid residues involved in the interaction with 16

human ACE2 are marked with the orange box. Bat SARS-like CoVs had been reported not 17

to use ACE2, had amino acid deletions at two motifs marked by the yellow box. Detailed 18

information can be found in Materials and Methods. 19

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 13: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

13

Figure 4. Amino acid sequence alignment of N protein and its phylogeny. Highly-1

conserved amino acid residues in N-protein marked by colors have diagnostic potential. 2

Detailed information can be found in Materials and Methods. 3

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 14: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

14

References 1

1. A. E. Gorbalenya, Severe acute respiratory syndrome-related coronavirus – The 2

species and its viruses, a statement of the Coronavirus Study Group. bioRxiv, 3

2020.2002.2007.937862 (2020). 4

2. P. Zhou et al., A pneumonia outbreak associated with a new coronavirus of probable 5

bat origin. Nature, https://doi.org/10.1038/s41586-020-2008-3 (2020). 6

3. F. Wu et al., A new coronavirus associated with human respiratory disease in China. 7

Nature, https://doi.org/10.1038/s41586-020-2012-7 (2020). 8

4. J. Cui, F. Li, Z.-L. Shi, Origin and evolution of pathogenic coronaviruses. Nat. Rev. 9

Microbiol. 17, 181-192 (2019). 10

5. W. Li et al., Bats are natural reservoirs of SARS-like coronaviruses. Science 310, 676-11

679 (2005). 12

6. C. Huang et al., Clinical features of patients infected with 2019 novel coronavirus in 13

Wuhan, China. Lancet 395, 497-506 (2020). 14

7. P. Liu, W. Chen, J.-P. Chen, Viral metagenomics revealed sendai virus and coronavirus 15

infection of Malayan Pangolins (Manis javanica). Viruses 11, 979 (2019). 16

8. X.-Y. Ge et al., Isolation and characterization of a bat SARS-like coronavirus that uses 17

the ACE2 receptor. Nature 503, 535-538 (2013). 18

9. S. K. Wong, W. Li, M. J. Moore, H. Choe, M. Farzan, A 193-amino acid fragment of 19

the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2. J. 20

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 15: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

15

Biol. Chem. 279, 3197-3201 (2004). 1

10. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina 2

sequence data. Bioinformatics 30, 2114-2120 (2014). 3

11. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat. Meth. 4

9, 357-359 (2012). 5

12. D. Li, C.-M. Liu, R. Luo, K. Sadakane, T.-W. Lam, MEGAHIT: an ultra-fast single-6

node solution for large and complex metagenomics assembly via succinct de Bruijn 7

graph. Bioinformatics 31, 1674-1676 (2015). 8

13. H. Li et al., The sequence alignment/map format and SAMtools. Bioinformatics 25, 9

2078-2079 (2009). 10

14. R. C. Edgar, MUSCLE: multiple sequence alignment with high accuracy and high 11

throughput. Nucleic Acids Res. 32, 1792-1797 (2004). 12

15. G. Talavera, J. Castresana, Improvement of phylogenies after removing divergent 13

and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 14

564-577 (2007). 15

16. S. Kumar, G. Stecher, M. Li, C. Knyaz, K. Tamura, MEGA X: molecular evolutionary 16

genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547-1549 (2018). 17

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 16: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

Figure 1

A

B

C

0 5000 10000 15000 20000

010

2030

40

Genome nucleotide position

Sequ

ensi

ng d

epth

Genome Coverage: 99.99% (19585nt/19587nt)Mean depth: 7.71

Genome nucleotide position300002500020000150001000050000

Perc

enta

ge n

ucle

otid

e id

entit

y

100

90

80

70

60

50

40

Beta-CoV/Wuhan-Hu-1 (91.02%)

SARS-CoV BJ01 (73.62)

Bat CoV RaTG13 (90.55%)

Bat SARSr-CoV WIV1(73.94%)

Bat SARSr-CoV ZXC21(85.65%)Bat SARSr-CoV ZC45 (85.01%)

Query: Pangolin CoV

0 10000 20000 30000

EM N

101a 1b 3a

67a8S

E M N 10Mean

(DNA/AA %)

Mean (DNA/AA %)

1a 1b 3a 6 7a 8S

E M N 101a 1b 3a 6 7a 8S90.3/97.0 90.4/72.8 87.3/95.4 95.0/96.8 98.3/97.4 93.0/98.6 93.8/96.3 90.4/93.6 91.5/96.6 95.4/96.4 98.3/94.6 92.8/93.5

89.9/96.8 90.7/73.4 88.3/97.5 94.5/96.8 98.3/97.4 93.0/98.6 95.1/96.3 92.0/96.9 91.7/94.8 94.9/96.4 99.1/97.3 93.2/94.1

19,587 bpPangolin CoVPangolin

Human

Bat 29,855 bpBat CoV RaTG13

29,903 bp2019-nCoV

Figure 1. Genome-related analysis. (A) Sequence depth of mapped reads remapping to Pangolin-Cov. (B)

Similarity plot based on the full-length genome sequence of Pangolin-Cov. Full-length genome sequences of

2019-nCoV (Beta-CoV/Wuhan-Hu-1), BatCov-RaTG13, Bat SARSr-CoV 21, Bat SARSr-CoV45,

Bat SARSr-CoV WIV1, and SARS-CoV BJ01 were used as reference sequences. (C) Comparison of common

genome organization similarity among 2019-nCoV, Pangolin-Cov and BatCov-RaTG13 l inked to Table S2.

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 17: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

RdRp

Human CoV 229E93

SARS-CoV SZ3

SARS-CoV BJ01

Bat SARSr-CoV YNLF31C

Bat SARSr-CoV WIV1

Bat SARSr-CoV SHC014

Bat SARSr-CoV Rp3

Bat SARSr-CoV WIV16

Bat SARSr-CoV Rs4231

Bat SARSr-CoV GX2013

Bat SARSr-CoV SC2018

Bat SARSr-CoV Rf1

Bat SARSr-CoV SX2013

Bat SARSr-CoV HuB2013

Bat SARSr-CoV ZXC21

Bat SARSr-CoV HKU3-1

Bat SARSr-CoV ZC45

Bat SARSr-CoV Longquan-140

Bat SARSr-CoV BM48-31 Pangolin-CoV

Bat CoV RaTG13

Beta-CoV/Wuhan-Hu-1

Beta-CoV/Wuhan/IPBCAMS-WH-01

Beta-CoV/Wuhan/IPBCAMS-WH-02

Beta-CoV/Wuhan/IPBCAMS-WH-03

Beta-CoV/Wuhan/IPBCAMS-WH-04

Beta-CoV/Wuhan/IPBCAMS-WH-05

Bat Hp-BetaCoV Zhejiang2013

MERS-CoV

Tylonycteris bat CoV HKU4

Pipistrellus bat CoV HKU5

Rousettus bat CoV HKU9

Human CoV HKU1

Mus musculus MHV-1

Human CoV OC43

TGEV Mink-CoV

PEDV

Scotophilus bat CoV 512

Rhinolophus bat CoV HKU2

Miniopterus bat CoV HKU8

Miniopterus bat CoV1

Human CoV NL63

100100

10081

100

100

100100

85

100

100

92

100

100

100

100

100100

100

99

10098

100

Whole GenomeFigure 2

Bet

a-C

oVA

lpha

-CoV

Bet

a-C

oVA

lpha

-CoV

Bat SARSr-CoV WIV16

Bat SARSr-CoV Rs4231

Bat SARSr-CoV WIV1

Bat SARSr-CoV SHC014

SARS-CoV SZ3

SARS-CoV BJ01

Bat SARSr-CoV YNLF31C

Bat SARSr-CoV GX2013

Bat SARSr-CoV Rp3

Bat SARSr-CoV SC2018

Bat SARSr-CoV HuB2013

Bat SARSr-CoV Rf1

Bat SARSr-CoV SX2013

Bat SARSr-CoV HKU3-1

Bat SARSr-CoV Longquan-140

Bat SARSr-CoV ZXC21

Bat SARSr-CoV ZC45 Pangolin-CoV

Bat CoV RaTG13

Beta-CoV/Wuhan/IPBCAMS-WH-03 Beta-CoV/Wuhan-Hu-1

Beta-CoV/Wuhan/IPBCAMS-WH-02 Beta-CoV/Wuhan/IPBCAMS-WH-04

Beta-CoV/Wuhan/IPBCAMS-WH-01 Beta-CoV/Wuhan/IPBCAMS-WH-05

Bat SARSr-CoV BM48-31

Bat Hp-BetaCoV Zhejiang2013

Rousettus bat CoV HKU9

MERS-CoV

Tylonycteris bat CoV HKU4

Pipistrellus bat CoV HKU5

Mus musculus MHV-1

Human CoV OC43

TGEV

Mink-CoV

Rhinolophus bat CoV HKU2

Human CoV NL63

Human CoV 229E

PEDV

Scotophilus bat CoV 512

Miniopterus bat CoV HKU8

Miniopterus bat CoV1

Human CoV HKU1

100

100

100

100

100

81

100

100

100

100

99

100

100

100

100

100100

100

100

100

100

100100

100

100100

100

100

100

100100

100

100

100

0.200.50

SARS-CoV-2group SARS-CoV-2

group

Figure 2. Phylogenetic relationship of coronavirus based on whole genome and RdRp gene nucleotide sequences. Red text

denotes the Malayan Pangolin-CoV. Pink text denotes 2019-nCoV (SARS-CoV-2). Green text denotes a bat coronavirus having 96%

similarity at genome level to SARS-CoV-2. Blue text denotes reference coronaviruses used in Figure1B. Detailed information can be

found in Materials and Methods.

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 18: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

340Figure 3 350 360 370 380 390 400 410 420 430. . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . .

Beta-CoV/Wuhan-Hu-1 P N I T N L C P F G E V F N A T R F A S V Y A W N R K R I S N C V A D Y S V L Y N S A S F S T F K C Y G V S P T K L N D L C F T N V Y A D S F V I R G D E V R Q I A P G Q T G K I A D Y N Y K L P D D F 432Beta-CoV/Wuhan/IPBCAMS-WH-02 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432Beta-CoV/Wuhan/IPBCAMS-WH-05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432Beta-CoV/Wuhan/IPBCAMS-WH-03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432Beta-CoV/Wuhan/IPBCAMS-WH-04 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432Beta-CoV/Wuhan/IPBCAMS-WH-01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432Pangolin-CoV . . . . . . . . . . . . . . . . T . . . . . . . . . . . . . . . . . . . . . . . . . T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . V . . . . . . . . . . . . . . R . . G . . . . . . . . . 432Bat_CoV_RaTG13 . . . . . . . . . . . . . . . . T . . . . . . . . . . . . . . . . . . . . . . . . . T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . T . . . . . . . . . . . . . . . . . . . . . . . . . . 432SARS-CoV_SZ3 . . . . . . . . . . . . . . . . K . P . . . . . E . . . . . . . . . . . . . . . . . T . . . . . . . . . . . A . . . . . . . . S . . . . . . . . V K . . D . . . . . . . . . . V . . . . . . . . . . . . 432SARS-CoV_BJ01 . . . . . . . . . . . . . . . . K . P . . . . . E . . K . . . . . . . . . . . . . . T F . . . . . . . . . . A . . . . . . . . S . . . . . . . . V K . . D . . . . . . . . . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_WIV1 . . . . . . . . . . . . . . . . T . P . . . . . E . . . . . . . . . . . . . . . . . T . . . . . . . . . . . A . . . . . . . . S . . . . . . . . V K . . D . . . . . . . . . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_WIV16 . . . . . . . . . . . . . . . . T . P . . . . . E . . . . . . . . . . . . . . . . . T . . . . . . . . . . . A . . . . . . . . S . . . . . . . . V K . . D . . . . . . . . . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_ZXC21 . . . . . V . . . H K . . . . . . . P . . . . . E . T K . . D . I . . . T . F . . . T . . . . . . . . . . . . S . . I . . . . . S . . . . T . L . . F S . . . . V . . . . . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_ZC45 . . . . . V . . . H K . . . . . . . P . . . . . E . T K . . D . I . . . T . F . . . T . . . . . . . . . . . . S . . I . . . . . S . . . . T . L . . F S . . . . V . . . . . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_YNLF31C . . . . . . . . . D K . . . . . . . P . . . . . E . T K . . D . . . . . T . F . . . T . . . . . N . . . . . . S . . I . . . . . S . . . . T . L . . F S . . . . V . . . . . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_SX2013 . . . . . . . . . D K . . . . . . . P . . . . . E . T K . . D . . . . . T . F . . . T . . . . . N . . . . . . S . . I . . . . . S . . . . T . L . . F S . . . . V . . . . . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_Rf1 . . . . . . . . . D K . . . . . . . P . . . . . E . T K . . D . . . . . T . F . . . T . . . . . N . . . . . . S . . I . . . . . S . . . . T . L . . F S . . . . V . . . . . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_Rp3 . . . . . R . . . D K . . . . . . . P N . . . . E . T K . . D . . . . . T . . . . . T . . . . . . . . . . . . S . . I . . . . . S . . . . T . L . . S S . . . . V . . . E . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_GX2013 . . . . . R . . . D K . . . . . . . P N . . . . E . T K . . D . . . . . T . . . . . T . . . . . . . . . . . . S . . I . . . . . S . . . . T . L . . S S . . . . V . . . E . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_HKU3-1 . . . . . R . . . D K . . . . . . . P N . . . . E . T K . . D . . . . . T . . . . . T . . . . . . . . . . . . S . . I . . . . . S . . . . T . L . . S S . . . . V . . . E . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_Longquan-140 . . . . . R . . . D K . . . V . . . P N . . . . E . T K . . D . . . . . T . . . . . T . . . . . . . . . . . . S . . I . . . . . S . . . . T . L . . S S . . . . V . . . E . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_SC2018 . . . . . R . . . D K . . . . S . . P . . . . . E . I K . . D . . . . . T . . . . . T . . . . . . . . . . . . S . . I . . . . . S . . . . T . L . . S S . . . . V . . . E . . V . . . . . . . . . . . . 432Bat_SARSr-CoV_HuB2013 . . . . . R . . . D R . . . . S . . P . . . . . E . T K . . D . . . . . T . . . . . T . . . . . . . . . . . . S . . I . . . . . S . . . . T . L . . S S . . . . V . . . E . . V . . . . . . . . . . . . 432

440 450 460 470 480 490 500 510 520. . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . . . . | . .

Beta-CoV/Wuhan-Hu-1 T G C V I A W N S N N L D S K V G G N Y N Y L Y R L F R K S N L K P F E R D I S T E I Y Q A G T P C N G V E G F N C Y F P L Q S Y G F Q P T N G V G Y Q P Y R V V V L S F E L L H A 522

Beta-CoV/Wuhan/IPBCAMS-WH-02 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522Beta-CoV/Wuhan/IPBCAMS-WH-05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522Beta-CoV/Wuhan/IPBCAMS-WH-03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522Beta-CoV/Wuhan/IPBCAMS-WH-04 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522Beta-CoV/Wuhan/IPBCAMS-WH-01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522Pangolin-CoV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H . . . . . . . . . . . . . . . . . . . . N . 522Bat_CoV_RaTG13 . . . . . . . . . K H I . A . E . . . F . . . . . . . . . A . . . . . . . . . . . . . . . . . K . . . . Q T . L . . . Y . . Y R . . . Y . . D . . . H . . . . . . . . . . . . . N . 522SARS-CoV_SZ3 M . . . L . . . T R . I . A T S T . . . . . K . . Y L . H G K . R . . . . . . . N V P F S P D K . . T P - P - L . . . W . . K D . . . Y T . S . I . . . . . . . . . . . . . . . N . 520SARS-CoV_BJ01 M . . . L . . . T R . I . A T S T . . . . . K . . Y L . H G K . R . . . . . . . N V P F S P D K . . T P - P - L . . . W . . N D . . . Y T . T . I . . . . . . . . . . . . . . . N . 520Bat_SARSr-CoV_WIV1 . . . . L . . . T R . I . A T Q T . . . . . K . . S L . H G K . R . . . . . . . N V P F S P D K . . T P - P - . . . . W . . N D . . . Y I . . . I . . . . . . . . . . . . . . . N . 520Bat_SARSr-CoV_WIV16 . . . . L . . . T R . I . A T Q T . . . . . K . . S L . H G K . R . . . . . . . N V P F S P D K . . T P - P - . . . . W . . N D . . . Y I . . . I . . . . . . . . . . . . . . . N . 520Bat_SARSr-CoV_ZXC21 . . . . . . . . T A K Q . T G H - - - - - . F . . S H . S T K . . . . . . . L . S . - - - - - - - - - - - - - - . G V R T . S - . D . N . N V P L E . . A T . . . . . . . . . . N . 502Bat_SARSr-CoV_ZC45 . . . . . . . . T A K Q . V G N - - - - - . F . . S H . S T K . . . . . . . L . S . - - - - - - - - - - - - - - . G V R T . S - . D . N . N V P L E . . A T . . . . . . . . . . N . 502Bat_SARSr-CoV_YNLF31C . . . . . . . . T A K Y . V G S - - - - - . F . . S H . S . K . . . . . . . L . S . - - - - - - - - - - - - - - . G A R T . S - . D . N Q N V P L E . . A T . . . . . . . . . . N . 502Bat_SARSr-CoV_SX2013 . . . . . . . . T A K Q . V G S - - - - - . F . . S H . S . K . . . . . . . L . S . - - - - - - - - - - - - - - . G V R T . S - . D . N Q Y V P L E . . A T . . . . . . . . . . N . 502Bat_SARSr-CoV_Rf1 . . . . . . . . T A K Q . V G S - - - - - . F . . S H . S . K . . . . . . . L . S . - - - - - - - - - - - - - - . G V R T . S - . D . N Q N V P L E . . A T . . . . . . . . . . N . 502Bat_SARSr-CoV_Rp3 . . . . . . . . T A K Q . Q G Q - - - - - . Y . . S H . . T K . . . . . . . L . S . - - - - - - - - - - - - - - . G V R T . S - . D . Y . S V P . A . . A T . . . . . . . . . . N . 502Bat_SARSr-CoV_GX2013 . . . . . . . . T A K Q . T G N - - - - - . Y . . S H . . T K . . . . . . . L . S D - - - - - - - - - - - - - - . G V Y T . S T . D . N . N V P . A . . A T . . . . . . . . . . N . 503Bat_SARSr-CoV_HKU3-1 . . . . . . . . T A K H . T G N - - - - - . Y . . S H . . T K . . . . . . . L . S D - - - - - - - - - - - - - - . G V Y T . S T . D . N . N V P . A . . A T . . . . . . . . . . N . 503Bat_SARSr-CoV_Longquan-140 . . . . . . . . T A K Q . I G N - - - - - . Y . . S H . . T K . . . . . . . L . S D - - - - - - - - - - - - - - . G V Y T . S T . D . N . N V P . A . . A T . . . . . . . . . . N . 503Bat_SARSr-CoV_SC2018 . . . . . . . . T A K Q . T G S - - - - - . Y . . S H . . T K . . . . . . . L . S D - - - - - - - - - - - - - - . G V Y T . S T . D . N . N V P . A . . A T . . . . . . . . . . N . 503Bat_SARSr-CoV_HuB2013 . . . . . . . . T A K Q . T G Y - - - - - . Y . . S H . . T K . . . . . . . L . S D - - - - - - - - - - - - - - . G V Y T . S T . D . N . N V P . A . . A T . . . . . . . . . . N . 503

99

91

99

84

72

98

96

100

9393

99

Figure 3. Amino acid sequence alignment of the S1 protein and its phylogeny.

The receptor-binding motif of SARS-CoV and the homologous region of other

coronaviruses are indicated by the grey box. The key amino acid residues involved

in the interaction with human ACE2 are marked with the orange box. Bat SARS-like

CoVs had been reported not to use ACE2, had amino acid deletions at two motifs

marked by the yellow box. Detailed information can be found in Materials and Methods.

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 19: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

Beta-CoV/Wuhan-Hu-1Beta-CoV/Wuhan/IPBCAMS-WH-02Beta-CoV/Wuhan/IPBCAMS-WH-03Beta-CoV/Wuhan/IPBCAMS-WH-05Beta-CoV/Wuhan/IPBCAMS-WH-01Beta-CoV/Wuhan/IPBCAMS-WH-04Bat_CoV_RaTG13Pangolin-CoVBat_SARSr-CoV_ZXC21Bat_SARSr-CoV_ZC45SARS-CoV_SZ3SARS-CoV_BJ01Bat_SARSr-CoV_WIV1Bat_SARSr-CoV_SHC014Bat_SARSr-CoV_WIV16Bat_SARSr-CoV_YNLF31CBat_SARSr-CoV_Rs4231Bat_SARSr-CoV_Longquan-140Bat_SARSr-CoV_HKU3-1Bat_SARSr-CoV_GX2013Bat_SARSr-CoV_Rf1Bat_SARSr-CoV_SX2013Bat_SARSr-CoV_HuB2013Bat_SARSr-CoV_Rp3Bat_SARSr-CoV_SC2018

Beta-CoV/Wuhan-Hu-1Beta-CoV/Wuhan/IPBCAMS-WH-02Beta-CoV/Wuhan/IPBCAMS-WH-03Beta-CoV/Wuhan/IPBCAMS-WH-05Beta-CoV/Wuhan/IPBCAMS-WH-01Beta-CoV/Wuhan/IPBCAMS-WH-04Bat_CoV_RaTG13Pangolin-CoVBat_SARSr-CoV_ZXC21Bat_SARSr-CoV_ZC45SARS-CoV_SZ3SARS-CoV_BJ01Bat_SARSr-CoV_WIV1Bat_SARSr-CoV_SHC014Bat_SARSr-CoV_WIV16Bat_SARSr-CoV_YNLF31CBat_SARSr-CoV_Rs4231Bat_SARSr-CoV_Longquan-140Bat_SARSr-CoV_HKU3-1Bat_SARSr-CoV_GX2013Bat_SARSr-CoV_Rf1Bat_SARSr-CoV_SX2013Bat_SARSr-CoV_HuB2013Bat_SARSr-CoV_Rp3Bat_SARSr-CoV_SC2018

10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|MSDNGPQ-NQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPANNAAIVLQLPQGTTLPKGFYAEGSRGGSQASSRSSSRSRNSSRNSTPGSSRGTSPARMAGNGGDAAL.......-...........................................................................................................................................................................................................................-...........................................................................................................................................................................................................................-...........................................................................................................................................................................................................................-...........................................................................................................................................................................................................................-...........................................................................................................................................................................................................................-.............................P.................................................................................................................................................................................S...........-................A............P...........................R..............................................................E......N.............................................................T.............................-..SS............SDNS.....N...P.........................N.T..............K......................E........................E........................................................................................T.........-...S............SDNSK....N...P.........................N.T..............K......................E........................E........................................................................................T.........S...S.........T...DN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET.........S...S.........T...DN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET.........S...S.........T...DN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET.........P...S.........T.P.DN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET.........P...S.........T..IDN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET........H-...S.S.......T...DN....G.N...P.........................E.R..Q..........G..............V........E................S.......E.......................N....T..................................GN................I.SG..ET.........P...S.........T...DN....G.N...P.........................E.R.............G..............V........E................S.......E.......................N....T..................................GN...........N......SG..ET.........-S..S.........T..ADN..D.G.....P.........................E.R.............GK........K....V........E................S.......E..V....................N.......................................GN...........N....L.SG..ET.........-S..S.........A..NDN..D.G.....P.........................E.R.............GK.............V........E................S.......E..V....................N.............................S.........GN...........S....L.SG..ET.........S...S..S......T...DN..D.G.....P.........................E.R.............GK.............V........E................S.......E..V...I................N.......................................GN...........N....L.SG..ET.........-..CS.............DN..D.G.....P.........................G....Q..........GR.............V........E................S.......E..V....................N.........................N.............GN..T........N....V.SG..ET.........-...S.............DN..D.G...V.P.........................G....Q..........GR.............V........E................S.......E..V....................N.........................N.............GN..T........N....V.SG..ET.........-...S.............DN..D.G.....P.........................E.R.............GK.............V........E................S.......E..V....................N.......................................GN...........N......SG..ET.........-...S.........T...DN..D.G.....P.........................E.R.............GK.............V........E................S.......E..V....................N.......................................GN...........N......SG..ET.........-...S.........T...DN..D.G.....P.........................E...............GK.............V........E................S.......E..V....................N.......................................GN...........N......SG..ET..

230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|...ALLLLDRLNQLESKMSGKGQQQQGQTVTKKSAAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDDFSKQLQQSMS--SADSTQA*.................................................................................................................................................................................................--.......*.................................................................................................................................................................................................--.......*.................................................................................................................................................................................................--.......*.................................................................................................................................................................................................--.......*.................................................................................................................................................................................................--.......*.......................S.......................Q.................................................................................................................................................--.......*...............................................Q................................Q.................................................................------------------------------..-N.............--.......*............N.I................................Q..................................................................H..........Q...N.............................L......................E..........G--T.....*............N.V................................Q..................................................................H..........Q...N.............................L......................E..........G--T.....*..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V.....................R..........Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................................Q............D.........D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V.......P........................Q..................................................................H..........Q...N..........................T..A.P....P...P.........M....R...N...GA.......*..............V.......P........................Q............................I.....................................H..........Q...N..........................T..A.P........P.........M....R...H...GA.......*..............V................................Q......................D.T.........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................TS..............Q............D........................................S..........I.H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V.................S..............Q............D.....................................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*..............V................................S..................................................................H..........Q...N..........................T..A.P....-...P.........M....R...N...GA.......*..............V..RS............................Q..................................................................H..........Q...N............I.............T..A.P........P.........M....R...N...GA.......*..............V................................Q..................................................................H..........Q...N.M..........A.............T..A.P........P.........M....R...N...GA.......*

100

100

90

52

65

100100

96

93

Figure 4

Figure 4. Amino acid sequence alignment of N protein and its phylogeny. Highly-conserved amino acid residues in N-protein marked by

colors have diagnostic potential. Detailed information can be found in Materials and Methods.

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 20: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

SARS-CoV-2group

SFigure S1

M

0.50

Bet

a-C

oVA

lpha

-CoV

Bat SARSr-CoV WIV1

Bat SARSr-CoV SHC014

Bat SARSr-CoV WIV16

Bat SARSr-CoV YNLF31C

Bat SARSr-CoV Rf1

Bat SARSr-CoV SX2013

SARS-CoV SZ3

SARS-CoV BJ01

Bat SARSr-CoV SC2018

Bat SARSr-CoV Rs4231

Bat SARSr-CoV GX2013

Bat SARSr-CoV Rp3

Bat SARSr-CoV Longquan-140

Bat SARSr-CoV HuB2013

Bat SARSr-CoV HKU3-1

Bat SARSr-CoV BM48-31

Bat SARSr-CoV ZXC21

Bat SARSr-CoV ZC45

Pangolin-CoV

Bat CoV RaTG13

Beta-CoV/Wuhan-Hu-1

Beta-CoV/Wuhan/IPBCAMS-WH-01

Beta-CoV/Wuhan/IPBCAMS-WH-02

Beta-CoV/Wuhan/IPBCAMS-WH-03

Beta-CoV/Wuhan/IPBCAMS-WH-04

Beta-CoV/Wuhan/IPBCAMS-WH-05

Bat Hp-BetaCoV Zhejiang2013

Rousettus bat CoV HKU9

MERS-CoV

Tylonycteris bat CoV HKU4

Pipistrellus bat CoV HKU5

Human CoV HKU1

Mus musculus MHV-1

Human CoV OC43

TGEV

Mink-CoV

Rhinolophus bat CoV HKU2

Human CoV NL63

Human CoV 229E

PEDV

Scotophilus bat CoV 512

Miniopterus bat CoV HKU8

Miniopterus bat CoV1

99

10089

73

9970

100675564

91

77

10051

99

68

91

100

99

95

62100

69

100

99

86

98

9866

91

100

Bet

a-C

oVA

lpha

-CoV

Bat SARSr-CoV WIV16

Bat SARSr-CoV Rs4231

Bat SARSr-CoV WIV1

Bat SARSr-CoV SHC014

SARS-CoV SZ3

SARS-CoV BJ01

Bat SARSr-CoV YNLF31C

Bat SARSr-CoV GX2013

Bat SARSr-CoV Rp3

Bat SARSr-CoV SC2018

Bat SARSr-CoV HuB2013

Bat SARSr-CoV Rf1

Bat SARSr-CoV SX2013

Bat SARSr-CoV HKU3-1

Bat SARSr-CoV Longquan-140

Bat SARSr-CoV ZXC21

Bat SARSr-CoV ZC45

Pangolin-CoV

Bat CoV RaTG13

Beta-CoV/Wuhan/IPBCAMS-WH-03

Beta-CoV/Wuhan-Hu-1

Beta-CoV/Wuhan/IPBCAMS-WH-02

Beta-CoV/Wuhan/IPBCAMS-WH-04

Beta-CoV/Wuhan/IPBCAMS-WH-01

Beta-CoV/Wuhan/IPBCAMS-WH-05 Bat SARSr-CoV BM48-31

Bat Hp-BetaCoV Zhejiang2013

Rousettus bat CoV HKU9

MERS-CoV

Tylonycteris bat CoV HKU4

Pipistrellus bat CoV HKU5

Mus musculus MHV-1

Human CoV OC43

TGEV

Mink-CoV

Rhinolophus bat CoV HKU2

Human CoV NL63

Human CoV 229E

PEDV

Scotophilus bat CoV 512

Miniopterus bat CoV HKU8

Miniopterus bat CoV1

Human CoV HKU1

100

100

100

100

100

81

100

100

100

100

99

100

100

100

100

100

100

100

100

100

100

100100

100

100100

100

100

100

100100

100

100

100

0.50

SARS-CoV-2group

Figure S1. Phylogenetic relationship of coronavirus based on ORF1a gene (A) and ORF1b gene (B) nucleotide sequences.

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 21: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

orf1aFigure S2

0.20

Bet

a-C

oVA

lpha

-CoV

Bat SARSr-CoV WIV16

Bat SARSr-CoV Rs4231

Bat SARSr-CoV WIV1 Bat SARSr-CoV SHC014

SARS-CoV SZ3

SARS-CoV BJ01 Bat SARSr-CoV GX2013

Bat SARSr-CoV Rp3

Bat SARSr-CoV YNLF31C

Bat SARSr-CoV SC2018

Bat SARSr-CoV HuB2013

Bat SARSr-CoV Rf1

Bat SARSr-CoV SX2013

Bat SARSr-CoV HKU3-1

Bat SARSr-CoV Longquan-140

Bat SARSr-CoV BM48-31

Bat SARSr-CoV ZXC21 Bat SARSr-CoV ZC45

Pangolin-CoV Bat CoV RaTG13 Beta-CoV/Wuhan/IPBCAMS-WH-01 Beta-CoV/Wuhan-Hu-1 Beta-CoV/Wuhan/IPBCAMS-WH-02 Beta-CoV/Wuhan/IPBCAMS-WH-03 Beta-CoV/Wuhan/IPBCAMS-WH-04 Beta-CoV/Wuhan/IPBCAMS-WH-05

Bat Hp-BetaCoV Zhejiang2013

Rousettus bat CoV HKU9

MERS-CoV

Tylonycteris bat CoV HKU4

Pipistrellus bat CoV HKU5

Human CoV HKU1

Mus musculus MHV-1

Human CoV OC43

TGEV

Mink-CoV

Rhinolophus bat CoV HKU2

Human CoV NL63

Human CoV 229E

PEDV

Scotophilus bat CoV 512

Miniopterus bat CoV HKU8

Miniopterus bat CoV1

98

90

100

100100

100

100

90

93

100

86

97

100

100

100

97

100

94

83

98100

100

100

100

100

10088

100

83

orf1b

0.20

Bet

a-C

oVA

lpha

-CoV

Bat SARSr-CoV WIV16

Bat SARSr-CoV Rs4231

Bat SARSr-CoV WIV1

Bat SARSr-CoV SHC014

Bat SARSr-CoV GX2013

Bat SARSr-CoV Rp3

Bat SARSr-CoV YNLF31C

SARS-CoV SZ3

SARS-CoV BJ01

Bat SARSr-CoV SC2018

Bat SARSr-CoV Rf1

Bat SARSr-CoV SX2013

Bat SARSr-CoV HuB2013

Bat SARSr-CoV HKU3-1

Bat SARSr-CoV Longquan-140

Bat SARSr-CoV ZXC21

Bat SARSr-CoV ZC45

Bat SARSr-CoV BM48-31

Pangolin-CoV

Bat CoV RaTG13

Beta-CoV/Wuhan-Hu-1

Beta-CoV/Wuhan/IPBCAMS-WH-01

Beta-CoV/Wuhan/IPBCAMS-WH-02

Beta-CoV/Wuhan/IPBCAMS-WH-03

Beta-CoV/Wuhan/IPBCAMS-WH-04

Beta-CoV/Wuhan/IPBCAMS-WH-05

Bat Hp-BetaCoV Zhejiang2013

Rousettus bat CoV HKU9

Tylonycteris bat CoV HKU4

Pipistrellus bat CoV HKU5

MERS-CoV

Human CoV HKU1

Mus musculus MHV-1

Human CoV OC43

TGEV

Mink-CoV

PEDV

Scotophilus bat CoV 512

Miniopterus bat CoV HKU8

Miniopterus bat CoV1

Rhinolophus bat CoV HKU2

Human CoV NL63

Human CoV 229E

100

100

87

97

95

100

100

100

10081

100

100

85

100

99

100

100

100

100

100

72

95100

86

100100

100

100

100

99

75

100

100

SARS-CoV-2group

SARS-CoV-2group

Figure S2. Phylogenetic relationship of coronavirus based on S gene (A) and M gene (B) nucleotide sequences.

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint

Page 22: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

Table S1. Contigs taxonomically annotated by using BLASTx against 2845 Coronavirus reference genomes.

Contig_ID Contig Length (nt) Subject id aa identity (%) alignment length q. start q. end s. start s. end e-value bit score

Con_01 1306 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 86.8 423 37 1305 1 422 1.40E-221 767.3

Con_02 383 AVP78041|non-structural polyprotein 1ab [Bat SARS-like coronavirus] 96.1 127 3 383 431 557 7.50E-64 241.5

Con_03 398 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 70.6 102 2 307 867 968 6.50E-34 142.1

Con_04 541 AVP78030|non-structural polyprotein 1ab [Bat SARS-like coronavirus] 95.5 179 4 540 1311 1489 9.90E-94 341.3

Con_05 525 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 93.1 174 2 523 1495 1668 1.90E-94 343.6

Con_06 611 AVP78041|non-structural polyprotein 1ab [Bat SARS-like coronavirus] 92.7 165 60 554 1671 1835 6.30E-89 325.5

Con_07 307 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 96.2 53 2 160 1866 1918 1.90E-25 113.6

Con_08 721 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 95.5 179 167 703 2700 2878 9.10E-95 345.1

Con_09 1078 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 98.6 359 2 1078 2966 3324 1.80E-208 723.4

Con_10 987 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 99.7 329 1 987 3670 3998 8.20E-179 624.8

Con_11 751 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 100 202 144 749 3960 4161 3.00E-80 297

Con_12 2138 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 96.4 413 859 2094 4402 4814 5.10E-242 835.9

Con_13 562 AIA62319|ORF1ab polyprotein [BtRs-BetaCoV/GX2013] 100 186 3 560 5165 5350 5.80E-113 405.2

Con_14 512 ARI44798|orf1ab polyprotein [Bat coronavirus] 100 170 2 511 5307 5476 3.50E-101 365.9

Con_15 296 ATO98191|non-structural polyprotein 1ab [Bat SARS-like coronavirus] 100 46 157 294 5307 5352 6.50E-23 105.1

Con_16 983 ARI44798|orf1ab polyprotein [Bat coronavirus] 100 288 3 866 5431 5718 3.50E-166 582.8

Con_17 646 ATO98167|non-structural polyprotein 1ab [Bat SARS-like coronavirus] 98.1 215 2 646 5708 5922 2.10E-122 436.8

Con_18 640 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 98.6 213 2 640 5985 6197 9.50E-128 454.5

Con_19 832 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 92.2 244 1 723 6151 6394 2.20E-140 496.9

Con_20 1830 QHO60603|orf1ab polyprotein [Wuhan seafood market pneumonia virus] 98.6 570 2 1711 6527 7096 0.00E+00 1145.2

Con_21 496 QHO60594|surface glycoprotein [Wuhan seafood market pneumonia virus] 95.8 118 143 496 307 424 1.70E-63 240.7

Con_22 440 QHO60594|surface glycoprotein [Wuhan seafood market pneumonia virus] 96.6 145 1 435 378 522 9.80E-84 307.8

Con_23 381 QHO60594|surface glycoprotein [Wuhan seafood market pneumonia virus] 93.1 131 1 381 658 788 1.20E-61 234.2

Con_24 503 QHO60594|surface glycoprotein [Wuhan seafood market pneumonia virus] 100 157 3 473 785 941 7.00E-86 315.1

Con_25 646 ABD75323|spike protein [Bat SARS CoV Rf1/2004] 95.1 183 96 644 946 1128 3.20E-99 359.8

Con_26 381 QHO60595|orf3a protein [Wuhan seafood market pneumonia virus] 96.9 127 1 381 114 240 4.40E-72 268.9

Con_27 900 AVP78045|membrane protein [Bat SARS-like coronavirus] 99.5 222 170 835 1 222 2.40E-121 433.7

Con_28 359 QHO60599|orf7a protein [Wuhan seafood market pneumonia virus] 100 42 131 256 1 42 3.40E-18 89.7

Con_29 989 AVP78048|hypothetical protein [Bat SARS-like coronavirus] 95 121 461 823 1 121 1.40E-66 251.9

Con_30 488 QHO60601|nucleocapsid phosphoprotein [Wuhan seafood market pneumonia virus] 96.9 162 1 486 3 164 3.20E-91 332.8

Con_31 287 AAU04658|nucleocapsid protein [SARS coronavirus civet010] 97.8 46 2 139 258 303 7.70E-21 98.2

Con_32 555 QHO60601|nucleocapsid phosphoprotein [Wuhan seafood market pneumonia virus] 97.8 184 2 553 119 302 7.80E-78 288.5

Con_33 461 QHO60601|nucleocapsid phosphoprotein [Wuhan seafood market pneumonia virus] 98.2 109 2 328 257 365 1.50E-58 224.2

Con_34 375 QHO60602|orf10 protein [Wuhan seafood market pneumonia virus] 97.4 38 90 203 1 38 4.90E-15 79.3

.C

C-B

Y-N

C-N

D 4.0 International license

preprint (which w

as not certified by peer review) is the author/funder. It is m

ade available under aT

he copyright holder for thisthis version posted F

ebruary 20, 2020. .

https://doi.org/10.1101/2020.02.19.950253doi:

bioRxiv preprint

Page 23: Pangolin homology associated with 2019-nCoV · 1 Pangolin homology associated with 2019-nCoV 2 Tao Zhang1*, Qunfu Wu1*, Zhigang Zhang1# 3 1State key Laboratory for Conservation and

Table S2. Comparing nt and aa sequence identity difference of ten genes among Pangolin-Cov, 2019-nCoV, and BatCov-RaTG13

Genes 2019-nCoV (nt) RaTG13 (nt) Pangolin-CoV (nt) orf1ab 21290 21287 13669 (partial) orf1a 13128 13215 7342 (partial) orf1b 8072 8072 6327 (partial) S 3822 3810 2156 (partial) 3a 828 828 379 (partial) E 228 228 120 (partial) M 669 666 669 (Complete) orf6 186 186 175 (partial) orf7a 366 366 296 (partial) orf8 366 366 366 (Complete) N 1260 1260 1192 (partial) orf10 117 117 117 (Complete) Nucleotide 2019-nCoV vs Pangolin-CoV 2019-nCoV vs RaTG13 Pangolin-CoV vs RaTG13 orf1ab 90.3% 96.6% 90.3% orf1a 89.9% 96.2% 90.3% orf1b 90.7% 97.0% 90.4% S 88.3% 91.5% 87.3% 3a 94.5% 98.6% 95.0% E 98.3% 100.0% 98.3% M 93.0% 95.8% 93.0% orf6 95.1% 98.8% 93.8% orf7a 92.0% 95.1% 90.4% orf8 91.7% 96.9% 91.5% N 94.9% 96.7% 95.4% orf10 99.1% 99.1% 98.3% Average 93.2% 96.9% 92.8% Amino Acid 2019-nCoV vs Pangolin-CoV 2019-nCoV vs RaTG13 Pangolin-CoV vs RaTG13 orf1ab 87.1% 95.8% 86.9% orf1a 96.8% 98.6% 97.0% orf1b 73.4% 92.2% 72.8% S 97.5% 96.7% 95.4% 3a 96.8% 100.0% 96.8% E 97.4% 100.0% 97.4% M 98.6% 100.0% 98.6% orf6 96.3% 100.0% 96.3% orf7a 96.9% 96.9% 93.6% orf8 94.8% 94.8% 96.6% N 96.4% 99.0% 96.4% orf10 97.3% 97.3% 94.6% Average 94.1% 97.6% 93.5%

.CC-BY-NC-ND 4.0 International licensepreprint (which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for thisthis version posted February 20, 2020. . https://doi.org/10.1101/2020.02.19.950253doi: bioRxiv preprint