Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Understanding genomic diversity,pan-genome, and evolution of SARS-CoV-2Arohi Parlikar*, Kishan Kalia*, Shruti Sinha, Sucheta Patnaik,Neeraj Sharma, Sai Gayatri Vemuri and Gaurav Sharma
Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, Karnataka, India* These authors contributed equally to this work.
ABSTRACTCoronovirus disease 2019 (COVID-19) infection, which originated from Wuhan,China, has seized the whole world in its grasp and created a huge pandemic situationbefore humanity. Since December 2019, genomes of numerous isolates have beensequenced and analyzed for testing confirmation, epidemiology, and evolutionarystudies. In the first half of this article, we provide a detailed review of the history andorigin of COVID-19, followed by the taxonomy, nomenclature and genomeorganization of its causative agent Severe Acute Respiratory Syndrome-relatedCoronavirus-2 (SARS-CoV-2). In the latter half, we analyze subgenus Sarbecovirus(167 SARS-CoV-2, 312 SARS-CoV, and 5 Pangolin CoV) genomes to understandtheir diversity, origin, and evolution, along with pan-genome analysis of genusBetacoronavirus members. Whole-genome sequence-based phylogeny of subgenusSarbecovirus genomes reasserted the fact that SARS-CoV-2 strains evolved from theircommon ancestors putatively residing in bat or pangolin hosts. We predicted a fewcountry-specific patterns of relatedness and identified mutational hotspots with high,medium and low probability based on genome alignment of 167 SARS-CoV-2strains. A total of 100-nucleotide segment-based homology studies revealed that themajority of the SARS-CoV-2 genome segments are close to Bat CoV, followed bysome to Pangolin CoV, and some are unique ones. Open pan-genome of genusBetacoronavirus members indicates the diversity contributed by the novel virusesemerging in this group. Overall, the exploration of the diversity of these isolates,mutational hotspots and pan-genome will shed light on the evolution andpathogenicity of SARS-CoV-2 and help in developing putative methods of diagnosisand treatment.
Subjects Bioinformatics, Evolutionary Studies, Genomics, Microbiology, VirologyKeywords COVID-19, SARS, Coronavirus, Pandemic, Viral disease, Genome, Bioinformatics,Genomics, Mutations
INTRODUCTIONThe emergence of coronavirus disease 2019 (COVID-19) has sent people from across theworld into a state of high alert, and they are trying to survive complete or partial lockdowns(Lescure et al., 2020). The causative agent of this pandemic is a novel Severe AcuteRespiratory Syndrome (SARS) Coronavirus, Severe Acute Respiratory Syndrome-relatedCoronavirus-2 (SARS-CoV-2). Since 2000, the world has witnessed two major coronavirusoutbreaks: the first, of SARS, caused by SARS-CoV, in 2002 in China (Zhong et al., 2003),
How to cite this article Parlikar A, Kalia K, Sinha S, Patnaik S, Sharma N, Vemuri SG, Sharma G. 2020. Understanding genomic diversity,pan-genome, and evolution of SARS-CoV-2. PeerJ 8:e9576 DOI 10.7717/peerj.9576
Submitted 29 April 2020Accepted 29 June 2020Published 17 July 2020
Corresponding authorGaurav Sharma,[email protected]
Academic editorAbhishek Kumar
Additional Information andDeclarations can be found onpage 24
DOI 10.7717/peerj.9576
Copyright2020 Parlikar et al.
Distributed underCreative Commons CC-BY 4.0
and the second, of Middle East Respiratory Syndrome (MERS) in 2012 in Saudi Arabia(Zaki et al., 2012). Amongst coronaviruses, SARS-CoV and MERS are known as highlypathogenic viruses that cause pneumonia and respiratory system infections (Coleman &Frieman, 2014). Besides these, several other low pathogenic strains are also known thatcause mild respiratory diseases and infect majorly the upper respiratory tract (Han et al.,2020). After the diagnosis of the first reported case in Wuhan, China, and genomesequencing of the Wuhan Hu-1 strain (SARS-CoV-2 reference genome), scientists acrossthe world are striving to develop vaccines and diagnostic kits and understand itsevolution and epidemiology. Several potential drug molecules to combat COVID-19 areundergoing clinical trials. As no vaccine or drugs are available at present, the only waysknown to contain transmission are social distancing, regular washing of hands, andcovering mouth and nose while coughing or sneezing, thereby, preventing direct or indirectphysical contact and containing the transfer of infected respiratory droplets.
This study comprises a detailed overview of the history and origin of COVID-19along with the taxonomy, nomenclature and genome structure of its causative agentSARS-CoV-2. Following that, we analyze 167 SARS-CoV-2, 312 SARS-CoV and fivePangolin CoV genomes to study their genomic variability, evolution and mutationhotspots. We also characterize the pan-genome of the genus Betacoronavirus (includingthe recently sequenced Pangolin CoV) to classify their proteins-coding regions into core,accessory, and unique categories within each genome group.
TAXONOMY AND NOMENCLATURECoronaviruses are members of family Coronaviridae that include enveloped positive sensessRNA containing viruses, taxonomically classified amongst order Nidovirales in the realmRiboviria. Like other viruses, they thrive in the gray area of living and non-living asthey do not have the machinery to survive outside a living host cell. Therefore, they cannotreplicate outside the host. The Riboviruses or realm Riboviria members replicate byutilizing RNA-dependent RNA-polymerases (RdRps). Their genetic material that is, RNAserves as messenger RNA (mRNA), directly translating into proteins and undergoesgenetic recombination in the presence of another viral genome in the host cell(Barr & Fearns, 2010).
Members of order Nidovirales have a positive sense linear (capped and polyadenylated)RNA molecule, known for causing severe infections. The word “Nido” stands for “nest”implying that all Nidoviruses express subgenomic mRNAs (sgRNA) in a nested form. Theyare known to infect hosts within three important classes in Vertebrates, namely mammals(Mammalia), birds (Aves) and fishes (Pisces). Coronaviruses (family Coronaviridae;subfamily Orthocoronavirinae) commonly infect both mammals and birds; Torovirus(family Tobaniviridae; subfamily Torovirinae) infect specifically mammals and Bafinivirus(family Tobaniviridae; subfamily Piscanivirinae) infect fishes. Coronavirus virions arespherical, Torovirus are crescent-shaped, and Bafinivirus are rod-like; however, allmembers of this family are adorned with a crown or club-shaped surface proteinscalled the Spike proteins. Because of their crown-like morphology observed in electronmicrographs, they are named Coronaviruses. Family Coronaviridae has a single
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 2/31
subfamily Orthocoronavirinae and its members form a monophyletic clade. They arefurther classified into four defined genera based on phylogenetic studies of conservedgenome regions and their serological cross-reactivity: Alphacoronavirus, Betacoronavirus,Gammacoronavirus and Deltacoronavirus. Under genus Betacoronavirus, five lineagesdiverge: Lineage A (subgenus: Embecovirus) includes HCoV-OC43 and HCoV-HKU1,Lineage B (subgenus: Sarbecovirus) includes SARS-CoV and SARS-CoV-2, Lineage C(subgenus: Merbecovirus) contains Bat coronavirus BtCoV-HKU4 and BtCoV-HKU5,Lineage D (subgenus: Nobecovirus) includes Rousettus Bat coronavirus BtCoV-HKU9and lineage E (subgenus: Hibecovirus) includes Bat Hp-betacoronavirus/Zhejiang2013.The subgenus Sarbecovirus members tend to undergo deep recombination events leadingto the formation of new alleles (Boni et al., 2020) and zoonotic transfer.
ORIGIN AND HISTORY OF THE CORONAVIRUSINFECTIONSThe first Coronavirus associated disease was described in 1931 (Peiris, 2012). The viruseswere first isolated from humans in the UK and USA around the same time. The firstisolated specimen was B814, taken from a boy exhibiting symptoms of the common cold in1960 and cultured in human embryonic tracheal organ culture (Tyrrell & Bynoe, 1966).The specimen was deemed distinct from Adeno-, Entero- or Rhinoviruses, being etherlabile and propagated only in organ cultures (Kendall, Bynoe & Tyrrell, 1962; Tyrrell &Bynoe, 1966). Later, the gradual discoveries of 229E in 1966 (Hamre & Procknow, 1966),OC43 in 1967 (McIntosh, Becker & Chanock, 1967), SARS-CoV in 2002, Bovine CoV in2006, MERS-CoV in 2012 and SARS-CoV-2 in 2019 (Zhu et al., 2020) were majorsignificant steps in coronavirus history.
Coronaviruses have been associated with mild respiratory infections and cold inhumans and animals (specifically poultry and livestock). These viruses were notconsidered treacherous until the SARS epidemic that emerged in China in November 2002.The associated SARS-CoV species were first classified as a separate clade underBetacoronavirus after this outbreak, post which new species including those of humancoronaviruses (hCoVs) have been identified. The 2002 SARS epidemic ultimately infected8,096 people, with 774 deaths. Later, MERS-CoV emerged in Saudi Arabia in September2012, causing 2,494 confirmed cases of infection and 858 deaths across 27 countries(https://www.who.int/emergencies/diseases/en/).
The current pandemic, COVID-19, is caused by the virus, initially designated “2019novel coronavirus” or “2019-nCoV”, later re-named SARS-CoV-2, segregating it as a novelSARS species (Huang et al., 2020). Now, it forms a new clade in the subgenus Sarbecovirusand is established as the 7th member of the family which infects humans (Simmonds et al.,2017; Gorbalenya et al., 2020; Zhu et al., 2020). The first patients of COVID-19 wereidentified in late December 2019 when several local health centers reported clusters ofpatients with pneumonia caused by an unknown pathogen, epidemiologically linked toliving animal and seafood wholesale market in Wuhan. The Chinese Center for DiseaseControl and Prevention called a rapid action team for the epidemiological investigation.Bronchoalveolar-lavage fluids were propagated on human respiratory tract epithelial cell
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 3/31
cultures for 4–6 weeks. Extracted nucleic acid was identified as viral RNA by real-timereverse transcription PCR targeting the RdRp region of pan-BetaCoV. The electronmicrographs revealed the presence of distinctive spikes of length 9–12 nm, establishingmorphological resemblance with the Coronaviridae family (Zhu et al., 2020). Followingthis, SARS-CoV-2 infection has been spreading worldwide. As of 29 April 2020(submission date), 213 countries and territories around the world are under surveillance,with more than 3.15 million confirmed cases and over 218,490 reported fatalities, and thesenumbers continue to exponentially increase day by day.
Since December 2019, numerous SARS-CoV-2 genomes have been sequenced, whichallows researchers to address many critical questions regarding its origin, transmission,epidemiology, and most importantly to design vaccines and viral detection kits. Viralgenome sequencing and multiple sequence alignment of SARS-CoV-2 genomes haverevealed its identity with Bat CoV RaTG13 (MN996532) and Pangolin-CoV (Cui, Li &Shi, 2019; Boni et al., 2020; Rehman et al., 2020). A comparative phylogenetic studyrevealed that the Pangolin-CoV genes shared a high level of sequence identity withSARS-CoV-2 genes, specifically orf1b, the spike (S), orf7a and orf10 genes (Zhang, Wu &Zhang, 2020). Higher identity of S protein sequence implies the functional similaritybetween Pangolin-CoV and SARS-CoV-2 as compared to the RaTg1 strain (Zhang, Wu &Zhang, 2020) further suggesting Pangolin as an intermediate host for infection and naturalreservoir of SARS-CoV-2-like strains.
CORONAVIRUS GENOME ORGANIZATIONCoronaviruses belong to family Coronaviridae and comprise enveloped viruses thatreplicate in the cytoplasm of animal host cells; distinguished by the presence of asingle-stranded positive-sense RNA genome (about 30 kb in length) (Marra et al., 2003).Coronaviruses possess the largest genomes among all known RNA viruses (Fehr &Perlman, 2015), ranging from 25.32 kb in Porcine Deltacoronavirus PD-CoV to 31.775 kbin Beluga whale coronavirus SW1 (information available on NCBI Virus webpagehttps://www.ncbi.nlm.nih.gov/labs/virus/). SARS-CoV-2 is a spherical or pleomorphicenveloped particle, containing single-stranded, positive-sense RNA associated with anucleoprotein, lying inside a capsid composed of matrix protein (Lai et al., 2020).
The SARS-CoV-2 reference genome (NC_045512; Wuhan Hu-1 strain) is 29,903nucleotides long, which consists of A (8,954 nt, 29.94%), G (5,863 nt, 19.61%), C (5,492 nt,18.37%), T (9,594 nt, 32.08%). Compared to SARS-CoV-2, the SARS-CoV referencegenome (NC_004718; Tor1 strain) is a little larger, that is, 29,751 nucleotides and itsnucleotide composition is marginally different: A (8,481 nt, 28.50), G (6,187 nt, 20.80%),C (5,940 nt, 19.97%), T (9,143 nt, 30.73%). We identified that the GC content (averageof all genomes under this study) of SARS-CoV, SARS-CoV-2, and Pangolin CoV is40.81%, 38% and 38.52% respectively. Like other Betacoronavirus members, this genomecontains two flanking untranslated regions (UTRs), 5′-UTR (265 nt) and 3′-UTR (229 nt)sequences.
A typical CoV genome contains at least six ORFs (Graham et al., 2008). The genomes ofall coronaviruses usually encode four well-conserved and characterized structural proteins:
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 4/31
S (spike), E (envelope), M (membrane) and N (nucleocapsid) (Graham et al., 2008;Fuk-Woo Chan et al., 2020), encoded by sgRNA 2, 4, 5 and 9a respectively present at 3′endand sharing 1/3 of the genome in SARS-CoV (Graham et al., 2008).
SARS-CoV-2 Proteins: The SARS-CoV-2 genome has 12 protein-coding regions, whichencode two categories of proteins: first, Structural proteins, which give characteristicstructure to the virus and are involved in viral entry, and second, Non-structural (NS)or accessory proteins, which help in viral replication (Marra et al., 2003; Graham et al.,2008; Hu et al., 2018; Fuk-Woo Chan et al., 2020). Most of the information available so farabout these proteins is based on SARS-CoV and other members of the genusBetacoronavirus.
STRUCTURAL PROTEINSSpike proteinThe S protein is a 1,273 aa trimeric, cell-surface glycoprotein consisting of two subunits(S1 and S2), encoded by the gene S. The S1 subunit is responsible for receptor binding(Hu et al., 2018). This is also important for mediating the fusion of viral and hostmembranes. Both these processes are critical for virus entry into host cells (Tan, Lim &Hong, 2005). SARS-CoV-2 has a polybasic cleavage site RRAR, at the junction of S1 and S2subunits, which enables effective cleavage by Furin and other proteases (Andersen et al.,2020; Zhang, Wu & Zhang, 2020). Furthermore, one proline residue is also inserted atthe leading cleavage site of SARS-CoV-2, making “PRRA”, the final inserted sequence.Comparison of this site within different Betacoronavirus members revealed that theinsertion of a Furin cleavage site at the S1–S2 junction of SARS-CoV-2 enhances cell-cellfusion (Follis, York & Nunberg, 2006;De Haan et al., 2008; Andersen et al., 2020). Similarly,an effective cleavage of the polybasic cleavage site in Hemagglutinin esterase proteinfacilitated the inter-species transmission of MERS-like coronaviruses from bats to humans(Andersen et al., 2020). Majorly, variations in the S protein are responsible for twoattributes, tissue tropism and host ranges of different CoVs (Hu et al., 2018). It wasobserved that S protein underwent several drastic changes during the viral infection.The S protein of SARS-CoV-2 is more vulnerable to mutations, especially in the aminoacids associated with the spike protein-cell receptor interface. Interestingly, the amino acidsequence represented ~19% changes with four major insertions and ~81% sequencesimilarity in contrast to SARS-CoV (Ahmed, Quadeer & McKay, 2020; Rehman et al.,2020).
Nucleocapsid proteinN proteins are 419 aa long phosphoproteins weighing ~46kDa. These have helix bindingproperties. Coronavirus N proteins possess three easily distinguishable and highlyconserved domains; the N-terminal domain (NTD) (N1b), the C-terminal domain (CTD)(N2b) and the N3 region (Grossoehme et al., 2009; Cong et al., 2019). The N proteinplays a vital role in virion structure formation as it is localized in both the replication-transcription region and the ERGIC (Endoplasmic reticulum-Golgi apparatusIntermediate Compartment), the site of virion assembly (Tok & Tatar, 2017).
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 5/31
The N protein of SARS-CoV is primarily expressed during the early stages of SARS-CoVinfection (Surjit & Lal, 2008; Grossoehme et al., 2009; Cong et al., 2019). It is importantas a diagnostic marker as it induces a strong immune response (Hu et al., 2018).Evolutionary analysis has shown 89–91% sequence homology between the N proteins inthe SARS-CoV and Bat SL-CoV (Hu et al., 2018; Ahmed, Quadeer & McKay, 2020).The N protein can bind to Nsp3 protein to help bind the genome to replication/transcription complex (RTC) (Cong et al., 2019) and package the encapsulated genomeinto virions (Chen, Liu & Guo, 2020). IBV, SARS CoV and MHV N protein undergophosphorylation and allow discrimination of non-viral mRNA binding from viralmRNA binding (Cong et al., 2019). N protein is also an antagonist of interferon (IFN)(Kopecky-Bromberg et al., 2007) and virus-encoded repressor of RNA interference (RNAi),therefore, it appears to benefit the viral replication (Chen, Liu & Guo, 2020).
Membrane proteinThe most abundant structural protein in the genome is the membrane (M) glycoprotein,which spans across the membrane bilayer three times. Thus, M glycoproteins have threetransmembrane regions, leaving a short NH2-terminal domain outside the virus and along COOH terminus (cytoplasmic domain) inside the virion (Tok & Tatar, 2017;Mousavizadeh & Ghasemi, 2020). The M protein plays a key role in regenerating virions inthe cell. M proteins undergo glycosylation in the Golgi apparatus which is crucial forthe virion to fuze into the cell and to make antigenic proteins. As mentioned before, Nprotein forms a complex by binding to genomic RNA andM protein triggers the formationof interacting virions in the endoplasm (Tok & Tatar, 2017).
Envelope glycoproteinE glycoproteins are composed of approximately 76–109 amino acids in other coronavirusspecies. E protein plays a crucial role in the assembly and morphogenesis of virionswithin the host. About 30 amino acids in the N-terminus of the E protein enablebinding to the viral membranes. Co-expression of E and M proteins with mammalianexpression vectors enable the formation of virus-like structures within the cell (Tok &Tatar, 2017).
NON-STRUCTURAL (ACCESSORY) PROTEINSORF1ab and ORF1aThe SARS 5′ proximal gene 1 (~22 kb) comprises two long overlapping open readingframes, orf1a and orf1ab, encoding for polyproteins 1a and 1ab. These polyproteins arecleaved by viral proteases that is, PLpro (papain-like protease) and 3CLpro (chymotrypsin-like protease) into 16 non-structural proteins, Nsp1–Nsp16. Expression of orf1abinvolves a (−1) ribosomal frameshift upstream of orf1a stop codon, thus forming thefull-length ORF1ab. ORF1a polyprotein of ~500 kDa encodes for Nsp 1–11, while ORF1abpolyprotein of ~800 kDa encodes for all Nsp1–16. The ORF1a and ORF1ab polyproteinsundergo post-translational modifications to form mature proteins and hence are not
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 6/31
detected during infection (Graham et al., 2008; Lei, Kusov & Hilgenfeld, 2018). A briefdescription of these proteins is provided below:
i. Nsp1: SARS-CoV Nsp1 is an N-terminal protein coded by the first gene of ORF1ab,which plays an important role in SARS-CoV pathogenesis by inhibiting the hostgene expression via binding to the small subunit of the ribosome and then truncatingthe translation activity. Nsp1 induces endonucleolytic RNA cleavage of a cappedhost mRNA, making it translationally incompetent (Tanaka et al., 2012). Nsp1inhibits the type-1 IFN expression and other antiviral signaling pathways, thussuppressing the innate immune system. The viral mRNA is resistant to Nsp1-mediated RNA cleavage in the infected host cells, however, the mechanism throughwhich it achieves this is unknown (Tanaka et al., 2012).
ii. Nsp2: the function of Nsp2 in SARS-CoV is unknown. The experimental results showthat the deletion of the nsp2 gene from MHV and SARS-CoV was tolerated withmodest growth and some RNA defect. Also, there was no observable effect on thepolyprotein processing in the mutants. Another evidence shows that neither theNsp2 protein nor the nsp2 gene is involved in pathogenesis (Graham et al., 2005).
iii. Nsp3: the multi-domain SARS-CoV Nsp3 is a 215 kDa glycosylated transmembranemultidomain protein involved in viral replication and transcription. It may serveas a scaffolding protein for numerous other proteins (Angelini et al., 2013).The organization of various Nsp3 domains differs amongst the Coronavirusgenomes. However, eight domains are common to all CoVs: ubiquitin-like domain 1(Ubl1), the glutamate-rich acidic domain also known as “hypervariable region”, an Xmacrodomain, ubiquitin-like domain 2 (Ubl2), PL2pro (papain-like protease 2),ectodomain 3Ecto, also known as “zinc-finger domain”, and domains Y1 and CoV-Y(unknown functions). Nsp3 releases itself, Nsp1 and Nsp2 from the polyproteins.It alters the post-translational modification of host proteins to antagonize theinnate immune response by de-MARylating, de-PARylating, de-ubiquitinating,or de-ISGylating the host proteins. Meanwhile, Nsp3 modifies itself by theN-glycosylation of the ectodomain and can also interact with host proteins (such asRCHY1) to enhance viral survival (Lei, Kusov & Hilgenfeld, 2018).
iv. Nsp4: it is known that Coronaviruses induce double-membrane vesicles (DMVs).Any alteration in the DMVs’ morphology impairs the RNA synthesis andtherefore growth of Nsp4 mutants (Gadlage et al., 2010).
v. Nsp5: the Nsp5 protease also known as 3CLpro or Mpro, cleaves the Nsp peptides at11 cleavage sites (Stobart et al., 2013). Nsp5 of Porcine Deltacoronavirus (PDCoV)is a type I IFN antagonist. It disrupts the IFN signaling pathway by cleaving theNF-κB essential modulator (NEMO), thus impairing the host’s ability to activate theIFN response (Zhu et al., 2017).
vi. Nsp6: Nsp6, along with Nsp3 and Nsp4, has membrane proliferation ability. Thus, itcan induce perinuclear vesicles localized around the microtubule-organizingcenter and DMV formation (Angelini et al., 2013). The coronavirus Nsp6 protein
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 7/31
restricts the autophagosome expansion either directly or indirectly throughstarvation or chemical inhibition of MTOR signaling (Cottam, Whelband &Wileman, 2014).
vii. Nsp7: the Nsp12 RNA-dependent RNA polymerase cannot act on its own anddepends upon stimulation by a complex of Nsp7 and Nsp8. Thus, Nsp7 acts asone of the cofactors along with Nsp8 to stimulate Nsp12. It is hypothesized that Nsp7and Nsp8 heterodimers stabilize the Nsp12 regions involved in RNA binding; also,Nsp8 acts as a minor RdRp (Kirchdoerfer & Ward, 2019).
viii. Nsp8: Nsp8 is a 22 kDa non-canonical polymerase shown to be capable of de novoRNA synthesis with low fidelity on single-stranded RNA templates. The ‘main’ RdRpin Coronaviruses is the Nsp12 which employs a primer-dependent initiationmechanism. These observations led to a hypothesis that Nsp8 would act as an RNAprimase and synthesize a short primer that will be extended by Nsp12 (Te Velthuis,Van den Worm & Snijder, 2012).
ix. Nsp9: Nsp9 is an RNA-binding subunit in the replication complex in allcoronaviruses (Zeng et al., 2018).
x. Nsp10: SARS-CoV Nsp10 binds to and stimulates the exoribonucleases, Nsp14, andNsp16, thus playing an important role in the replication-transcription complex(RTC) formation (Bouvet et al., 2014). Nsp10 acts as an essential co-factor intriggering Nsp16 2′O-MTase activity, which suggests its involvement in theregulation of capping of viral RNA (Lugari et al., 2010).
xi. Nsp11: the exact function of Nsp11 in Coronaviridae is unknown. Arterivirus Nsp11was identified as NendoU (Nidoviral uridylate-specific endoribonucleases) that playsa role in the viral life cycle. Nsp11 could be essential to produce helicase (Woo et al.,2005).
xii. Nsp12: Nsp12 is a 102 kDa RNA-dependent RNA polymerase (RdRp) featuringall conserved motifs of the known RdRps. It is the most conserved protein incoronaviruses and assumes a center stage in the viral RTC. The Nsp7/Nsp8 complexenhances the binding of Nsp12 to RNA. Nsp8 is the second, non-canonical RdRp incoronaviruses (Subissi et al., 2014).
xiii. Nsp13: Nsp13 is a 66.5 kDa multi-functional protein. The N terminal has azinc-binding domain while the C-terminal harbors a helicase domain-containingconserved motif of superfamily-1 helicases (Subissi et al., 2014).
xiv. Nsp14: Nsp14 is a 60 kDa bifunctional enzyme. Its N-terminal is a 3′–5′exoribonuclease involved in the mismatch repair system, improving the fidelityof virus replication via RNA proofreading. Yeast trans-complementationstudies have shown that the C-terminal domain of SARS-CoV Nsp14 is anS-adenosylmethionine-dependent guanine-N7-methyltransferase (MTase) (Subissiet al., 2014; Zeng et al., 2016) with no RNA sequence specificity (Subissi et al., 2014).
xv. Nsp15: SARS-CoV Nsp15 is a uridine specific ribonuclease that cleaves the 3′ ofuridylates, producing 2′–3′ cyclic phosphate ends (Subissi et al., 2014).
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 8/31
xvi. Nsp16: Nsp16 is a 2′-O-Methyl Transferase (Zeng et al., 2016) and requires Nsp10 asa stimulatory factor to exhibit its MTase activity (Chen et al., 2011). EukaryoticmRNA has a unique 5′ cap; to evade the host machinery, the viral RNA must bemade indistinguishable from the host mRNA by capping the viral RNA (Menachery,Debbink & Baric, 2014). Both Nsp14 and Nsp16 are involved in the modificationof viral RNA cap structure (Zeng et al., 2016) to maintain the viral RNA stability,ensure protein translation and immune escape (Lugari et al., 2010; Subissi et al.,2014). The SARS-CoV genome encodes two SAM-dependent methyltransferases(Zeng et al., 2016); Nsp14 N7 methyltransferase, and Nsp16 2′-O-methyl transferase,that methylates the RNA at N7 of guanosine and ribose 2′O sites, respectively(Chen et al., 2011). This process also involves Nsp10 for stabilizing the SAM bindingregion and RNA binding of Nsp16 (Subissi et al., 2014).
ORF3a proteinThe orf3a gene lies between S and E genes and encodes for a transmembrane protein(McBride & Fielding, 2012). SARS-CoV-2 ORF3a protein shows 97.82% homology to NS3 ofBat coronavirus RaTG13 and 72% homology to SARS-CoV ORF3a protein (Issa et al., 2020).It is localized to the cell membrane and partly to the ER and the Golgi perinuclearspace in the host cell. ORF3a is known to activate PERK (PKR-like ER kinase) signalingpathway through which the viral particles escape ER-associated degradation and theresulting ER stress also induces apoptosis. ORF3a of both SARS-CoV and SARS-CoV-2 havean APA3_viroporin (a pro-apoptosis protein) conserved domain (McBride & Fielding, 2012;Issa et al., 2020). ORF3a has been shown to activate NF-κB and JNK. This leads to theupregulation of the chemokine named RANTES (Regulated upon Activation, Normal T CellExpressed and Secreted) along with IL-8 in A549 and HEK293T cells (Liu et al., 2014).The extracellular N-terminus of protein can evoke a humoral immune response. Bindingof ORF3a protein to caveolin-1 may be required for viral uptake and release (Liu et al., 2014).ORF3a can augment the activation of the p38 MAPK pathway and induce the mitochondriato leak inducing apoptosis (Liu et al., 2014).
ORF6 proteinSARS-CoV ORF6/Protein 6 is a 63-aa polypeptide. It has an amphipathic 1–40 aaN-terminal portion and a highly polar C-terminal. The amino acid residues 2–32 in theN-terminal form an a-helix which is embedded in the cell membrane. There are twosignal sequences in the C-terminal; aa 49–52 sequence YSEL targets proteins forincorporating into endosomes and the acidic tail which signals ER export. This proteinis localized in the ER and Golgi apparatus. ORF6 is incorporated into VLPs, whenco-expressed with SARS-CoV S, M and E structural proteins. It enhances viral replication,thus serving an important role in pathogenesis during SARS-CoV infection (Liu et al.,2014). ORF6 is also well known as a β-interferon antagonist; its overexpression inhibitsnuclear import of STAT1 in IFN-β treated cells. Along with ORF3b and N protein, theSARS-CoV ORF6 inhibits activation of IRF-3 via phosphorylation and binding of IRF-3 to
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 9/31
a promoter with IRF-3 binding sites. IRF-3 is an important protein for the expression ofIFN. ORF6 may disturb the ER/Golgi transport necessary for the interferon response(Kopecky-Bromberg et al., 2007). During SARS-CoV infection, the C-terminal domain ofORF6 modulates the host protein nuclear transport and type-I interferon signaling, thusplaying an important role in immune evasion (Liu et al., 2014).
ORF7a proteinSARS-CoV ORF7a/ Protein 7a is a 122 aa type-I transmembrane protein (Liu et al., 2014).It consists of 15 aa N terminal signal peptide sequence, an 81 aa ectodomain, 21 aa Cterminal transmembrane domain, and a cytoplasmic tail of 5 aa residues. The ectodomainof ORF7a binds to human LFA-1 (lymphocyte function-related antigen 1) on Jurkat cellsvia the alpha integrin-I domain of LFA-1. This suggests that probably LFA-1 is abinding factor or receptor for SARS-CoV on human leukocytes (McBride & Fielding,2012). SARS-CoV ORF7a is localized in the ER-Golgi Intermediate Compartment(ERGIC), the assembly point of coronaviruses. It interacts with M and E structuralproteins, indicating a possible role in viral assembly during SARS-CoV replication.In mammalian cells infected with SARS-CoV, ORF7a has been known to inducecaspase-dependent apoptosis by cleaving PARP (poly(ADP-ribose) polymerase), anapoptotic marker. Its pro-apoptotic nature is a result of the interaction between itstransmembrane domain with Bcl-XL, an anti-apoptotic protein of the Bcl-2 family(Liu et al., 2014). Like ORF3a, overexpression of ORF7a activates NF-κB and c-Jun N-terminal kinase (JNK), augmenting the production of pro-inflammatory cytokines such asinterleukin 8 (IL-8) and RANTES (Liu et al., 2014). ORF7a overexpression inducedapoptosis can occur via a caspase-3 dependent pathway as well as p38 MAPK pathway(McBride & Fielding, 2012).
ORF7b proteinSARS-CoV ORF7b/Protein 7b is a 44 aa long, highly hydrophobic, integraltransmembrane protein with a luminal N-terminal and cytoplasmic C-terminal.The expression of ORF7a is reduced significantly when the orf7a start codon (upstreamof 7b) is mutated to a strong Kozak sequence or when an additional start codon AUG isinserted upstream of the orf7b start codon. Thus, ORF7b may be expressed by “leakyscanning” of the ribosome (Liu et al., 2014). The Golgi-restricted localization of ORF7bwas attributed to the transmembrane domain (Liu et al., 2014). However, the role ofORF7a and ORF7b during the replication of SARS-CoV remains uncertain (McBride &Fielding, 2012).
ORF8 proteinHuman SARS-CoV-2 isolated from early patients, Civet SARS-CoV and other batSARSr-CoV contains the full-length ORF8 (Fuk-Woo Chan et al., 2020). Two genomes ofgenus Betacoronavirus isolated from horseshoe bat, SARS-Rf-BatCoV YNLF_31C andYNLF_34C are 93% identical to the human and civet SARS-CoV genome. However, allHuman SARS-CoV isolates from mid and late-phase patients contain a signature 29
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 10/31
nucleotide deletion in the orf8 gene, splitting it into orf8a and orf8b. This suggests thatORF8 may play a role in interspecies transmission. The generation of civet SARSr-CoVcould be a result of a potential recombination event between SARSr-Rf-BatCoVs andSARSr-Rs-BatCoVs identified around ORF8 (Lau et al., 2015). Interestingly, the newSARS-CoV-2 ORF8 shares very less homology with the conserved ORF8 or ORF8bisolated from Human SARS-CoV or its related viruses, Civet SARS-CoV, Bat-CoVYNLF_31C andYNLF_34C. The ORF8 lacks any known functional domain or motif.SARS-CoV ORF8b has an aggregation motif VLVVL (75–79aa) which is known to activateNLRP3 inflammasomes and trigger intracellular stress pathways (Fuk-Woo Chan et al.,2020), but this is not conserved in SARS-CoV-2. Notably, both ORF3 and ORF8 inSARS-CoV-2 are highly divergent from the interferon antagonist ORF3a andinflammasome activator ORF8b in SARS-CoV (Yuen et al., 2020).
ORF10 proteinThe SARS-CoV-2 ORF10 protein has no homologs in SARS-CoV and any other membersof genus Betacoronavirus (Xu et al., 2020). Viruses are known to sabotage ubiquitinationpathways for replication and pathogenesis. The ORF10 interacts with specifically theCUL2ZYG11B complex and the rest of the members of the Cullin-2 (CUL2) RING E3 ligasecomplex. ZYG11B degrades substrates with exposed N-terminal glycine residues.By studying the ORF10 interactome, it was observed that it interacts the most with theCULZYG11B complex (Gordon et al., 2020). This evidence shows that ORF10 might bind tothis complex and hijack its ubiquitination of restriction factors (Gordon et al., 2020).
METHODOLOGY (GENOMES, DATABASES, AND TOOLSUSED IN THIS STUDY)Complete genomes, belonging to SARS-CoV-2 (167 genomes), SARS-CoV (312 genomes)and Pangolin CoV (five genomes) were downloaded from NCBI Virus webpage(https://www.ncbi.nlm.nih.gov/labs/virus/) as on 29 March 2020. Similarly, RefSeqassemblies of 56 suborder Cornidovirineae genomes (including 18 from Betacoronavirusgenus) along with the genome of Paguronivirus-1 (as an outgroup) were downloadedas on 1 April 2020. The latest version of the Virus database was downloaded from theNCBI Virus page as on 6 April 2020.
For alignments of the studied genomes, as required for phylogeny and mutationalstudies, we have used MUSCLE v3.8.1551 and MEGAX (Edgar, 2004; Kumar et al., 2018)tools as per their default settings. To generate maximum likelihood phylogenies ofgenome-based alignments, RAxML v8.2.12 (Stamatakis, 2014) was used to performrapid bootstrap analysis and search for bestscoring ML tree in one program run (“-f a”parameter) using GTRGAMMA as nucleotide substitution model (-m) with 100 bootstrapvalues. After obtaining the newik trees from RAxML, an online iTOL platform (Letunic &Bork, 2019) was used to visualize phylogenetic trees as shown in this study. For eachexperiment in respective result sections, we have provided comprehensive details about thegenomes under study, research methodology, and used parameters.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 11/31
For the pangenome study, Proteinortho v6.0.12 (Lechner et al., 2011) was used with anE value cutoff 1E−05 for each identified hit along with other default settings. It utilizesdiamond v0.8.36.98 and BLAST 2.8.1+ (Christiam et al., 2009; Buchfink, Xie & Huson,2014) to identify ortholog proteins using reciprocal blast strategy. For the identificationof pairwise average nucleotide identity (ANI) values within all genomes under study,PYANI v0.1.2 (Pritchard et al., 2016) program was used. Throughout this study, theWuhan Hu-1 strain has been used as a reference for SARS-CoV-2 genomes.
RESULTS AND DISCUSSIONHuman SARS-CoV-2 genome is quite disparate as compared to otherBetacoronavirus genomesThe SARS outbreak in 2003 and MERS in 2012 had already set a precedent for widespreadgenome analysis, therefore, with the recent emergence of COVID-19 in November2019, extensive sequencing and genome analysis of SARS-CoV-2 isolates are beingperformed. Since February 2020, several Bat and Pangolin coronaviruses have also beensequenced to understand the probable origin of SARS-CoV-2. In this study, we haveanalyzed 312 SARS-CoV, 167 SARS-CoV-2 and 5 Pangolin CoV genomes to understandtheir genomic conservation, unique genes, mutational hotspots and respective evolutionand origin.
Phylogeny relationship at suborder Cornidovirineae levelSARS-CoV-2 is a member of suborder Cornidovirineae, family Coronaviridae, genusBetacoronavirus, and subgenus Sarbecovirus. In this study, we have tried to understand thestrain diversity and evolutionary relationships at different taxonomy levels in a top-bottomclassification scheme. For this analysis, the complete whole-genome sequences of 56suborder Cornidovirineae reference genomes were analyzed, which included membersfrom five genera i.e. 23 Alphacoronavirus, 20 Betacoronavirus, 10 Deltacoronavirus andthree Gammacoronavirus and the genome of Paguronivirus-1 was included as an outgroupfor this study. Their genome length varied from 25,425-31,686 nucleotides. ClustalWalignment (using MEGA-X) generated an alignment of 35,601 nucleotides off which17,284 conserved sites were stripped and used to generate an ML-based phylogeny usingRAxML. We noted that this phylogeny is unambiguously able to demarcate all the generainto their respective monophyletic clades (Fig. S1). The significant variations amongstbranch length distances indicate the relative diversity within each genus.
Phylogeny relationship at genus Betacoronavirus levelWe generated the phylogeny of 18 Betacoronavirus reference strains (along with fivePangolin CoV strains) using their complete genomes ranging within 29,114–31,526nucleotides. ClustalW based whole genome alignment using MEGAX generated analignment of length 33,346 from which 24,555 conserved nucleotide sites were strippedand used to generate ML phylogeny (Fig. S2). Three distinct clades were seen in thephylogeny: the first clade includes strains from subgenus Sarbecovirus, Nobecovirus andHibecovirus, second with Embecovirus members, and the third clade of members of
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 12/31
subgenus Merbecovirus. Within the first clade, both members of subgenus Sarbecovirus(SARS-CoV and SARS-CoV-2) are grouped with Pangolin CoV strains, which is supportedby their percentage identity. All Pangolin CoV genomes are >99.8% identical to each otherwhereas their closest relatives are the SARS-CoV-2 reference strain (~85% identity),followed by the SARS-CoV reference strain (~79% identity). These three groups are closest(~57% identity) to their sister branch occupied by subgenus Hibecovirus strain: BatHp-betacoronavirus Zhejiang 2013. Two reference strains of another subgenusNobecovirus (~67% identity with each other) form a separate clade that is closest (~52%identity with other group members) to the above-mentioned clade comprising subgenusSarbecovirus and Hibecovirus. The second clade only has representatives of subgenusEmbecovirus including Murine hepatitis virus, Bovine CoV, Rabbit CoV, Rat CoV, etc.Some of these strains form respective subclades in the phylogeny based on their closeness.Strains belonging to subgenus Merbecovirus that include the MERS, form the thirdclade. MERS genomes (England 1 and MERS Co.) are 99.7% identical to each other andtherefore form a separate sub-clade. As observed from their phylogenetic branch lengthsand percentage identity matrix, all subgenera are quite distinct from each other.As expected, members of each subgenus are closer to each other, therefore formingrespective monophyletic clades.
Phylogeny relationship at subgenus Sarbecovirus levelTo understand the similarities and differences among the strains of subgenus Sarbecovirus,we analyzed the complete genomes of 103 SARS-CoV-2 and 312 SARS-CoV isolatesincluding the reference strains of Wuhan Hu-1 (SARS-CoV-2) and Tor2 (SARS-CoV).Since Pangolin CoV belongs to subgenus Sarbecovirus, we also included 5 Pangolin CoVgenome sequences in our dataset and aligned them using MUSCLE. The genome size ofSARS-CoV, Pangolin CoV and SARS-CoV-2 strains were in the range of 29,013–30,311nucleotides, 29,795–29,806 nucleotides and 29,325–29,945 nucleotides, respectively.The alignment of length 31,718 nucleotides was stripped to a conserved region of 26,762nucleotides (sites with gaps in between them were excluded) which was used to generate amaximum likelihood phylogeny using RAxML (Fig. 1; Fig. S3). We noted from thephylogeny that SARS-CoV-2 strains form a single monophyletic clade as compared to theSARS-CoV. SARS-CoV strain RaTG13 procured from the bat in 2013 from Yunnan,China is closest to SARS-CoV-2 strains, suggesting that they might have diverged from acommon ancestor. Average Nucleotide Identity (ANI) studies also indicated that theSARS-CoV-2 reference genome and the Bat RaTG13 branch share 96.11% identity at thegenome level, a finding supported by earlier reports (Lv et al., 2020; Zhou et al., 2020).The next closest strains in the phylogeny were those of Pangolin CoV, all forming asubclade. Other close relatives are the SARS-CoV strains Bat CoVZC45 and BatCoVZXC21, procured from Zhoushan, eastern China in 2018. The phylogeny suggeststhat Pangolin CoVs are closer to SARS-CoV-2 strains as compared to CoVZC45 andCoVZXC21, however, genome-wide percentage identity suggests the opposite—thegenomes of both CoVZC45 and CoVZXC21 are ~89% identical to SARS-CoV-2, asopposed to its 86% identity with Pangolin CoVs.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 13/31
MT0
4033
5.1
Pan
golin
PC
oV G
X-P
5L
JF292911 SA
RS
-1 US
A M
us MA
15 d4ym1
MT044258 SARS-2 USA Hum CA6
KY
4171
45 S
AR
S-1
Chi
na B
at R
f409
2
KF514423 SARS-1 USA NA WTic c3P20-2009
JF292922 SARS-1 USA NA ExoN1 c5P1
KF514417 SARS-1 USA NA ExoN1 c5 3P20-2010
MT1
6372
1 SA
RS-
2 U
SA H
um W
A9-
UW
6
MT159720 SARS-2 USA Hum CruiseA-4
AY291315 S
AR
S-1 G
ermany H
um Frankfurt-1
HQ890529 SARS-1 USA Mus MA15-ExoN1 d2ym4
KF514400 SARS-1 USA NA WTic c1 8P20-2010
MT0
6617
5 S
AR
S-2
Tai
wan
Hum
NTU
01
HQ890536 SARS-1 USA Mus MA15-ExoN1 d2om3
LC528233 SARS-2 NA Hum Hu-DP-Kng-19-027-RNA
AY427439 SAR
S-1 NA H
um A
S
FJ882953 SARS-1 USA Mus MA15-ExoN1 P3pp4
AY
394979 SA
RS
-1 NA
NA
GZ-C
AY394998 SARS-1 China NA LC1
AY279354 SARS-1 NA NA BJ04
AY362699 SARS-1 NA NA TWC3
LC528232 SARS-2 NA Hum Hu-DP-Kng-19-020-RNA
JF292915 SA
RS
-1 US
A M
us MA
15 d4ym5
EU371564 SARS-1 NA NA BJ182-12
DQ
4120
43 S
AR
S-1
NA
Bat
Rm
1
HQ890533 SARS-1 USA Mus MA15-ExoN1 d4ym3
DQ
898174 SA
RS
-1 Canada N
A C
V7
MK
062182 SA
RS
-1 US
A H
um
Urb
ani icS
AR
S-C
3-MA
AY3949
85 S
ARS-1 China N
A HSZ-B
b
GQ
1535
48 S
AR
S-1
Ch
ina
Bat
HK
U3-
13
FJ882942 SARS-1 USA Mus MA15-ExoN1 P3pp5
KF514399 SARS-1 USA NA WTic c1 1P20-2010
AY
323977 SA
RS
-1 Italy NA
HS
R-1
GQ
1535
44 S
AR
S-1
Ch
ina
Bat
HK
U3-
9
AY345987 SARS-1 HongKong NA AG02
MN996529 SARS-2 China Hum WIV05
AY394990 SARS-1 China NA HZS2-E
AY
278741 SA
RS
-1 NA
NA
Urb
ani
KF514416 SARS-1 USA NA ExoN1 c5 8P20-2010
MT0
7286
4.1
Pan
golin
PC
oV G
X-P
2V
AY502930 SARS-1 Taiwan NA TW7
AY283794 S
AR
S-1 S
ingapore NA
Sin2500
JF292920 SA
RS
-1 US
A M
us MA
15 d3om5
MT184913 SARS-2 USA Hum CruiseA-26
AY291451 SAR
S-1 Taiwan N
A TW1
AY350750 SARS-1 NA NA PUMC01
EU371562 SARS-1 NA NA BJ182-4
GU553365 SARS-1 USA M
on HKU-39849 HARROD-00003
HQ890527 SARS-1 USA Mus MA15-ExoN1 d2ym2
HQ890535 SARS-1 USA Mus MA15-ExoN1 d2om2
FJ882946 SARS-1 USA NA wtic-MB P3pp13
AY485278 SARS-1 NA NA Sino3-11
FJ882955 SARS-1 USA NA ExoN1 P3pp19
FJ95
9407
SA
RS-
1 C
hina
Mon
A00
1
AY54
5916
SA
RS-
1 C
hina
NA
HC
-SZ-
266-
03
KF514420 SARS-1 USA NA ExoN1 c5P10-2009
MK
062179 SA
RS
-1 US
A H
um
Urb
ani icS
AR
S
FJ882951 SARS-1 USA Mus MA15-ExoN1 P3pp3
AY502931 SARS-1 Taiwan NA TW8
AY864805 SARS-1 C
hina NA B
J162
MT0
5049
3 SA
RS-
2 In
dia
Hum
166
AY394989 SARS-1 China NA HZS2-D
MT121215 SARS-2 China Hum SH01
MT1
6371
9 SA
RS-2
USA H
um W
A7-UW
4
MT184910 SARS-2 USA Hum CruiseA-23
MT198653 SARS-2 Spain Hum Valencia001
MK
062184 SA
RS
-1 US
A H
um
Urb
ani icS
AR
S-C
7-MA
AY559091 S
AR
S-1 S
ingapore NA
SinP
4
AY282752 SARS-1 HongKong NA Su10
MT1849
11 S
ARS-2 USA H
um Cru
iseA-24
LR75
7995
SA
RS
-2 C
hina
Hum
sea
food
-mar
ket
AY395000 SARS-1 China NA LC3
AY394991 SARS-1 China NA HZS2-Fc
3102X
S t aB a
nih
C 1-S
RA
S 318374JK
FJ882941 SARS-1 USA NA ExoN1 P3pp8
JF292903 SARS-1 USA Mus MA15-ExoN1 d4ym5
FJ882962 SARS-1 USA Mus MA15-ExoN1 P3pp10AY345988 SARS-1 HongKong NA AG03
8-3U
KH ta
B ani
hC 1-
SR
AS 345351
QG
AY3044
88 S
ARS-1 H
ongKong NA S
Z16
KF514396 SARS-1 USA NA WTic c3P10-2009
MT159722 SARS-2 USA Hum CruiseA-6
AY
283796 SA
RS
-1 Sin
gap
ore N
A S
in2679
MT159717 SARS-2 U
SA Hum C
ruiseA-1
KF514409 SARS-1 USA NA WTic c2P20-2009
KF514421 SARS-1 USA NA WTic c1 2P20-2010
GQ
1535
42 S
AR
S-1
Ch
ina
Bat
HK
U3-
7
AY394999 SARS-1 China NA LC2
JX163927 S
AR
S-1 U
SA
NA
Tor2-F
P1-10851
KF514388 SARS-1 USA NA WTic c1 5P20-2010
GQ
1535
47 S
AR
S-1
Ch
ina
Bat
HK
U3-
12
MT1
3504
4 SA
RS-
2 C
hina
Hum
235
HQ
890541 SA
RS
-1 US
A M
us MA
15 d2ym1
FJ882940 SARS-1 USA NA ExoN1 P3pp37
KF514414 SARS-1 USA NA ExoN1 c5P20-2009
FJ882938 SARS-1 USA NA wtic-MB
AA
N er
opa
gni
S 1-
SR
AS
4909
55Y
648
niS
MT126808 SARS-2 Brazil Hum SP02
AY502932 SARS-1 Taiwan NA TW9
KY
4171
47 S
AR
S-1
Ch
ina
Bat
Rs4
237
MN
9383
84 S
AR
S-2
Chi
na H
um S
Z-00
2a 2
020
AY394987 SARS-1 China NA HZS2-Fb
MT0
4033
3.1
Pan
golin
PC
oV G
X-P
4L
KF514419 SARS-1 USA NA WTic c1P10-2009
JF292916 SAR
S-1 USA M
us MA
15 d3om1
MG
7729
34 S
AR
S-1
Ch
ina
Bat
Co
VZ
XC
21
GQ
1535
39 S
AR
S-1
Ch
ina
Bat
HK
U3-
4
AY357075 SARS-1 NA NA PUMC02
KF514412 SARS-1 USA NA ExoN1 c13P20-2009
MN996530 SARS-2 China Hum WIV06
AY3044
86 S
ARS-1 H
ongKong NA S
Z3
AY362698 SARS-1 NA NA TWC2
MT192765 SARS-2 USA Hum PC00101P
KF
5699
96 S
AR
S-1
Ch
ina
Bat
LY
Ra1
1
KU
9736
92 S
AR
S-1
Chi
na B
at F
46
KF514413 SARS-1 USA NA WTic c1 6P20-2010
HQ890539 SARS-1 USA Mus MA15-ExoN1 d3om1
AY485277 SARS-1 NA NA Sino1-11
KF514392 SARS-1 USA NA WTic c1 4P20-2010
AY
714217 SA
RS
-1 US
A H
um
CD
C-200301157
HQ890530 SARS-1 USA Mus MA15-ExoN1 d2ym5
KF514391 SARS-1 USA NA ExoN1 c5 9P20-2010
MT159718 SARS-2 USA Hum CruiseA-2
AY394983 SARS-1 China NA HSZ2-A
MT0398
88 S
ARS-2 U
SA Hum
MA1
AY6868
64 S
ARS-1 N
A Mon B
039
AY395002 SARS-1 China NA LC5
KF514393 SARS-1 USA NA ExoN1 c5 10P20-2010
MT1
8833
9 SA
RS-2
USA H
um M
N3-M
DH3
FJ882936 SARS-1 USA NA wtic-MB P3pp2
MT0
4995
1 SARS-2
Chi
na H
um Y
unna
n-01
AY345986 SARS-1 HongKong NA AG01
KF514408 SARS-1 USA NA WTic c2P1-2009
DQ
6488
57 S
AR
S-1
NA
Bat
279
-200
5
MT1
0605
2 S
AR
S-2
US
A H
um C
A7
EU371560 SARS-1 NA NA BJ182a
AY278490 SARS-1 NA NA BJ03
MT184908 SARS-2 USA Hum CruiseA-21
MT066176 SARS-2 Taiwan Hum NTU02
JF292907 SAR
S-1 U
SA M
us MA
15 d2ym2
MT106053 SARS-2 USA H
um CA8
HQ890526 SARS-1 USA Mus MA15-ExoN1 d2ym1
FJ882950 SARS-1 USA NA ExoN1 P3pp60
HQ890531 SARS-1 USA Mus MA15-ExoN1 d4ym1
MT019530 SARS-2 China Hum WH-02-2019
KY
4171
49 S
AR
S-1
Chi
na B
at R
s425
5
MT066156 SARS-2 Italy Hum INMI1
AY56
8539
SA
RS-
1 C
hina
Hum
GZ0
401
LC529905 SARS-2 Japan Hum TKYE6182
MT093571 SARS-2 Sweden Hum 01
MT123293 SARS-2 China Hum IQTC03
MT159711 SARS-2 USA Hum CruiseA-13
EU371563 SARS-1 NA NA BJ182-8
AY395001 SARS-1 China NA LC4
GQ
1535
41 S
AR
S-1
Ch
ina
Bat
HK
U3-
6
KY
4171
48 S
AR
S-1
Ch
ina
Bat
Rs4
247
MT007544 SARS-2 Australia Hum VIC01
LR757998 SARS-2 China Hum seafood-market
AY502929 SARS-1 Taiwan NA TW6
MT0
4033
6.1
Pan
golin
PC
oV G
X-P
5E
AY54
5918
SA
RS-
1 C
hina
NA
HC
-GZ-
32-0
3
JF292908 SAR
S-1 USA M
us MA
15 d2ym3
AY61
3949
SARS-
1 Chi
na M
on P
C4-13
6
DQ
497008 SA
RS
-1 US
A M
us M
A-15
AY394986 SARS-1 C
hina NA H
SZ-Cb
MT184912 SARS-2 USA Hum CruiseA-25
JX163928 SAR
S-1 USA
NA Tor2-FP1-10895
AY39
4996
SA
RS
-1 N
A N
A Z
S-B
KF514404 SARS-1 USA NA WTic c1 9P20-2010
EU371559 SARS-1 NA NA ZJ02
KF514411 SARS-1 USA NA ExoN1 c13P10-2009
AY310120 S
AR
S-1 N
A N
A FR
A
AY348314 SARS-1 Taiwan NA TC3
AY
559090 SA
RS
-1 Singapore N
A S
inP3
KJ4
7381
1 S
AR
S-1
Ch
ina
Bat
JL
2012
AY54
5917
SAR
S-1
Chin
a NA
HC-
GZ-
81-0
3
KJ4
7381
5 S
AR
S-1
Ch
ina
Bat
GX
2013
MT1
2329
2 S
AR
S-2
Chi
na H
um IQ
TC04
AY
559083 SA
RS
-1 Sin
gap
ore N
A S
in3408
KC
8810
06 S
AR
S-1
Chi
na B
at R
s336
7
MT1
5282
4 SA
RS-
2 US
A Hu
m W
A2
MN996528 SARS-2 China Hum WIV04
MT1
6371
8 SA
RS-2
USA H
um W
A6-UW
3
AY39
0556
SA
RS-
1 C
hina
NA
GZ0
2
AY502928 SARS-1 Taiwan NA TW5
KF514410 SARS-1 USA NA ExoN1 c8P20-2009
MT07
2688
SARS-2
Nep
al H
um 6
1-TW
KY
4171
51 S
AR
S-1
Chi
na B
at R
s732
7
MT159713 SARS-2 USA Hum CruiseA-15
FJ882961 SA
RS
-1 US
A M
us MA
15 P3pp5
FJ882937 SARS-1 USA NA wtic-MB P3pp18
HQ890540 SARS-1 USA Mus MA15-ExoN1 d3om2
MT159710 SARS-2 USA Hum CruiseA-9
FJ882954 SARS-1 USA NA ExoN1 P3pp46
KJ4
7381
4S
AR
S-1
Ch
ina
Bat
Hu
B20
13
DQ
182595 SA
RS
-1 Ch
ina N
A Z
J0301
JX99
3988
SA
RS
-1 C
hin
a B
at C
p-Y
un
nan
2011
GU
553363 SAR
S-1 China H
um H
KU
-39849 HA
RR
OD
-00001
AY283797 SAR
S-1 Singapore N
A Sin2748
FJ882929 SARS-1 USA NA ExoN1 P3pp1
AY502925 SAR
S-1 Taiwan N
A TW2
FJ882959 SARS-1 USA Mus MA15-ExoN1 P3pp6
AY
351680 SA
RS
-1 NA
NA
ZM
Y-1
AY502924 SARS-1 Taiwan NA TW11
MK
062180 SA
RS
-1 US
A H
um
Urb
ani icS
AR
S-M
A
AY502926 SAR
S-1 Taiw
an NA
TW3
DQ
6488
56 S
AR
S-1
NA
Bat
273
-200
5
AY338175 SARS-1 NA NA TC2
MT159714 SARS-2 USA Hum CruiseA-16
AY321118 SARS-1 NA NA TW
C
AY5720
35 S
ARS-1 C
hina
Mon
civ
et01
0
FJ58
8686
SA
RS
-1 C
hina
Bat
Rs6
72-2
006
KY
4171
44 S
AR
S-1
Chi
na B
at R
s408
4
MT019533 SARS-2 China Hum WH-05
KY
4171
42 S
AR
S-1
Ch
ina
Bat
As6
526
FJ882933 SARS-1 USA NA wtic-MB P3pp6
DQ
0223
05 S
AR
S-1
Ch
ina
Bat
HK
U3-
1
AY
559086 SA
RS
-1 Sin
gap
ore N
A S
in849
AY283798 S
AR
S-1 S
ingapore NA
Sin2774
MT1
0605
4 S
AR
S-2
US
A H
um T
X1
MN
9965
32 S
AR
S-1
Chi
na B
at R
aTG
13
MT159716 SARS-2 USA Hum CruiseA-18
MT0
2088
1 SA
RS-
2 U
SA H
um W
A1-
F6
JF292917 SA
RS
-1 US
A M
us MA
15 d3om2
KY
4171
46 S
AR
S-1
Chi
na B
at R
s423
1
MT184907 SARS-2 USA Hum CruiseA-19
AY278488 SARS-1 NA NA BJ01
FJ882944 SARS-1 USA NA ExoN1 P3pp23
KF514406 SARS-1 USA NA ExoN1 c13P1-2009
AY278487 SARS-1 NA NA BJ02
KF514403 SARS-1 USA NA ExoN1 c5 1P20-2010
FJ882935 SARS-1 USA NA wtic-MB P3pp21
AY61
3947
SA
RS-
1 C
hina
NA
GZ0
402
AY57
2038
SA
RS-
1 C
hina
Mon
civ
et02
0
MT159706 SARS-2 USA Hum CruiseA-8
FJ882927 SARS-1 USA NA wtic-MB P1pp1
AY54
5914
SARS-1
Chi
na N
A HC-S
Z-79
-03
KF514390 SARS-1 USA NA ExoN1 c5 4P20-2010
NC 045512 SARS-2 China Hum Wuhan-Hu-1
AY864806 SARS-1 NA N
A BJ202
MT019532 SARS-2 China Hum WH-04-2019
GQ
1535
45 S
AR
S-1
Ch
ina
Bat
HK
U3-
10
JN854286 SARS-1 UK NA recHKU-39849
KF514395 SARS-1 USA NA ExoN1 c8P1-2009
AY
559097 SA
RS
-1 Sin
gap
ore N
A S
in3408L
AY559089 S
AR
S-1 S
ingapore NA
SinP
2
AY283795 S
AR
S-1 S
ingapore NA
Sin2677
AS
RA
S 78
0955
YA
N er
opa
gni
S 1-
V527
3ni
S
HQ
890542 SAR
S-1 USA M
us MA
15 d2om1
DQ640652 SARS-1 China Hum GDH-BJH01
FJ882949 SARS-1 USA NA wtic-MB P3pp23
AY54
5915
SARS-
1 Chi
na N
A HC-S
Z-DM
1-03
MN988668 SARS-2 China Hum WHU01
MT159719 SARS-2 USA Hum CruiseA-3
AY278491 SARS-1 China NA HKU-39849
MT184909 SARS-2 USA Hum CruiseA-22
KF514401 SARS-1 USA NA ExoN1 c5 5P20-2010
AY39
5003
SA
RS
-1 N
A N
A Z
S-C
FJ882947 SARS-1 USA NA wtic-MB P3pp7
MN99
4467
SARS-2
USA H
um C
A1
HQ
890544 SAR
S-1 USA
Mus M
A15 d2om
3
FJ882952 SA
RS
-1 US
A M
us MA
15 P3pp4
AY772062 SARS-1 NA Mus W
H20
MN988669 SARS-2 China Hum WHU02
MT019529 SARS-2 China Hum WH-01-2019
HQ
890545 SAR
S-1 USA
Mus M
A15 d2om
4
MT118835 SARS-2 USA Hum CA9
JX99
3987
SA
RS
-1 C
hin
a B
at R
p-S
haa
nxi
2011
FJ882934 SARS-1 USA NA wtic-MB P3pp29A
N 1-S
RA
S 240214Q
D1f
R taB
AY57
2034
SARS-1
Chi
na M
on c
ivet
007
JF292905 SARS-1 USA Mus MA15-ExoN1 d3om4
MG
7729
33 S
AR
S-1
Chi
na B
at C
oVZC
45
MN996531 SARS-2
China Hum W
IV07
AY559088 S
AR
S-1 S
ingapore NA
SinP
1
KF514407 SARS-1 USA NA ExoN1 c5 7P20-2010
FJ882943 SARS-1 USA Mus MA15-ExoN1
AY
559081 SA
RS
-1 Sin
gap
ore N
A S
in842
AY463059 SARS-1 NA N
A QXC1
DQ
0842
00 S
AR
S-1
Ch
ina
Bat
HK
U3-
3
MK
062181 SA
RS
-1 US
A H
um
Urb
ani icS
AR
S-C
3
AY5459
19 S
ARS-1 C
hina
NA C
FB-S
Z-94
-03
JF292918 SA
RS
-1 US
A M
us MA
15 d3om3
JF292914 SA
RS
-1 US
A M
us MA
15 d4ym4
AY39
4997
SA
RS
-1 N
A N
A Z
S-A
MT159708 SARS-2 USA Hum CruiseA-11
FJ882956 SARS-1 USA NA ExoN1 P3pp53
AY502927 SAR
S-1 Taiwan N
A TW
4
MT0
4425
7 SARS-2
USA H
um IL
2
KF514422 SARS-1 USA NA WTic c1 3P20-2010 N
C 004718 S
AR
S-1 C
anad
a Hu
m To
r2
AY
559093 SA
RS
-1 Sin
gap
ore N
A S
in845
AY
559085 SA
RS
-1 Sin
gap
ore N
A S
in848
MN996527 SARS-2 China H
um WIV02
MT1
6372
0 SA
RS-
2 U
SA H
um W
A8-
UW
5
JF292910 SA
RS
-1 US
A M
us MA
15 d2ym5
AY461660 S
AR
S-1 R
ussia NA
SoD
AY394992 SARS-1 China NA HZS2-CKF514415 SARS-1 USA NA W
Tic c1 7P20-2010
KY
4171
50 S
AR
S-1
Chi
na B
at R
s487
4
MT123291 SARS-2 China Hum IQTC02
HQ890528 SARS-1 USA Mus MA15-ExoN1 d2ym3
AY278554 SARS-1 H
ongKong NA W
1
DQ
0841
99 S
AR
S-1
Ch
ina
Bat
HK
U3-
2
JQ316196 SARS-1 UK NA HKU-39849 UO
B
MT020781 SARS-2 Finland Hum Jan29
AY595412 SARS-1 China NA LLJ-2004
HQ890537 SARS-1 USA Mus MA15-ExoN1 d2om4
MN994468 SARS-2 USA Hum CA2
MT0270
63 S
ARS-2 U
SA Hum
CA4
JX162087 SARS-1 USA NA ExoN1 c5P10
JF292921 SARS-1 USA NA wtic-MB c1P1
AY3949
95 S
ARS-1 C
hina N
A HSZ-C
c
MT192773 SARS-2 VietNam H
um nCoV-19-02S
KF514397 SARS-1 USA NA WTic c2P10-2009
KC
8810
05 S
AR
S-1
Chi
na B
at R
sSH
C01
4
KY
4171
52 S
AR
S-1
Chi
na B
at R
s940
1
GQ
1535
46 S
AR
S-1
Ch
ina
Bat
HK
U3-
11
AY3949
94 S
ARS-1 C
hina N
A HSZ-B
c
MT2
2661
0 SARS-2
Chi
na H
um K
MS1
FJ882928 SARS-1 USA NA ExoN1 P1pp1
AY
297028 SA
RS
-1 NA
NA
ZJ01
FJ882963 S
AR
S-1 U
SA
Hu
m P
2
AY394850 SARS-1 NA NA WHU
GU
553364 SAR
S-1 China H
um H
KU
-39849 HA
RR
OD
-00002
AY
274119 SA
RS
-1 Can
ada H
um
Tor2
AY463060 SARS-1NA N
A QXC2
KF514402 SARS-1 USA NA ExoN1 c5 6P20-2010
AS
RA
S 28
0955
YA
N er
opa
gni
S 1-
258
niS
MT192759 SARS-2 Taiwan Hum CGMH-CGU-01
KF3
6745
7 S
AR
S-1
Chi
na B
at W
IV1
JF292902 SARS-1 USA Mus MA15-ExoN1 d4ym4
AY
559092 SA
RS
-1 Singapore N
A S
inP5
FJ882926 SARS-1 USA NA ExoN1
MT1597
15 S
ARS-2 U
SA Hum
Cru
iseA-1
7
KF514389 SARS-1 USA NA ExoN1 c8P10-2009
AY
394978 SA
RS
-1 NA
NA
GZ-B
MT039873 SARS-2 China Hum HZ-1
JF292909 SA
RS
-1 US
A M
us MA
15 d2ym4
AY61
3950
SARS-
1 Chi
na M
on P
C4-22
7
MT027064 SARS-2 USA Hum CA5
FJ882960 SARS-1 USA NA ExoN1 P3pp34MT012098 SARS-2 India Hum 29
MT0270
62 S
ARS-2 U
SA Hum
CA3
MT093631 SARS-2 China Hum WH-09
AY357076 SARS-1 NA NA PUMC03
MT159721 SARS-2 USA Hum CruiseA-5
AY502923 SARS-1 Taiwan NA TW10
FJ882948 SA
RS
-1 US
A M
us MA
15 P3pp3
MT188340 SARS-2 USA Hum MN2-MDH2
AS
RA
S 48
0955
YA
N er
opa
gni
S 1-
V567
3ni
S
MT159707 SARS-2 USA Hum CruiseA-10
MN
9853
25 S
AR
S-2
USA
Hum
WA
1
MT0
2088
0 SA
RS-
2 U
SA H
um W
A1-
A12
HQ890538 SARS-1 USA Mus MA15-ExoN1 d2om5
MT159709 SARS-2 USA Hum CruiseA-12
AY68
6863
SARS-1
NA M
on A
022
AY5155
12 S
ARS-1 C
hina
Mon H
C-SZ-6
1-03
FJ882931 SARS-1 USA NA ExoN1 P3pp12
AY
559096 SA
RS
-1 Sin
gap
ore N
A S
in850
MN
9752
62 S
AR
S-2
Chi
na H
um S
Z-00
5b 2
020
MN
9974
09 S
AR
S-2
US
A H
um A
Z1
EU371561 SARS-1 NA NA BJ182b
JF292912 SAR
S-1 USA M
us MA
15 d4ym2
MT163716 SARS-2 USA Hum WA3-UW1
MT1
3504
2 SA
RS-
2 C
hina
Hum
231
MN98
8713
SARS-2
USA H
um IL
1
MT039890 SARS-2 SouthKorea Hum SNU01
FJ882939 SARS-1 USA NA wtic-MB P3pp16
MT192772 SARS-2 VietN
am Hum nCoV-19-01S
JF292906 SARS-1 USA Mus MA15-ExoN1 d3om5
HQ890532 SARS-1 USA Mus MA15-ExoN1 d4ym2
KF514405 SARS-1 USA NA ExoN1 c5 2P20-2010
3102Be
H t aB a
nih
C 1-S
RA
S 218374JK
AY394993 SARS-1 NA NA HGZ8L2
AY313906 SARS-1 China NA GD69
KY
4171
43 S
AR
S-1
Chi
na B
at R
s408
1
MT019531 SARS-2 China Hum WH-03-2019
LR757996 SARS-2 China Hum seafood-market
MT1
8834
1 SARS-2
USA H
um M
N1-M
DH1
MT1
3504
3 S
AR
S-2
Chi
na H
um 2
33
KF514394 SARS-1 USA NA WTic c1P20-2009
MT1
3504
1 SA
RS-
2 C
hina
Hum
105
MT1
6371
7 SA
RS-2
USA H
um W
A4-UW
2
FJ882930 SARS-1 USA NA ExoN1
FJ882957 S
AR
S-1 U
SA
Mu
s MA
15
JF292904 SARS-1 USA Mus MA15-ExoN1 d3om3
HQ890534 SARS-1 USA Mus MA15-ExoN1 d2om1
KP
8868
09 S
AR
S-1
Ch
ina
Bat
YN
LF
34C
AY338174 SARS-1 NA NA TC1
HQ
890543 SA
RS
-1 US
A M
us MA
15 d2om2
AY508724 SARS-1 NA NA NS-1
AY304495 SARS-1 HongKong NA GZ50
MT123290 SARS-2 China Hum IQTC01
KY
3524
07 S
AR
S-1
Ken
ya B
at K
Y72
MK
062183 SA
RS
-1 US
A H
um
Urb
ani icS
AR
S-C
7
JX163923 S
AR
S-1 U
SA
NA
Tor2-FP1-10912
JF292919 SAR
S-1 USA M
us MA
15 d3om4
MT039887 SARS-2 USA Hum WI1
AY27
8489
SA
RS-
1 N
A N
A G
D01
JX163924 S
AR
S-1 U
SA
NA
Tor2-FP1-10851
FJ882945 SA
RS
-1 US
A M
us MA
15 P3pp6
KP
8868
08 S
AR
S-1
Ch
ina
Bat
YN
LF
31C
JF292913 SA
RS
-1 US
A M
us MA
15 d4ym3
MT0
4033
4.1
Pan
golin
PC
oV G
X-P
1E
AY654624 SARS-1 NA NA TJF
AY61
3948
SARS-
1 Chi
na M
on P
C4-13
JX163926 S
AR
S-1 U
SA
NA
Tor2-F
P1-10912
KF514398 SARS-1 USA NA WTic c1 10P20-2010
FJ882932 SARS-1 USA NA wtic-MB P3pp14
KJ4
7381
6 S
AR
S-1
Chi
na B
at Y
N20
13
DQ
0716
15 S
AR
S-1
Ch
ina
Bat
Rp
3
GQ
1535
40 S
AR
S-1
Ch
ina
Bat
HK
U3-
5
AY
559095 SA
RS
-1 Sin
gap
ore N
A S
in847
FJ882958 SA
RS
-1 US
A M
us MA
15 P3pp
7
MT159705 SARS-2 USA Hum CruiseA-7
KF514418 SARS-1 USA NA WTic c3P1-2009
MT159712 SARS-2 USA Hum CruiseA-14
HQ
890546 SAR
S-1 USA
Mus M
A15 d2om
5
JX163925 S
AR
S-1 U
SA
NA
Tor2-F
P1-10895
0.08
0.16
0.24
0.32
0.4
0.48
0.56
0.64
0.72
0.04
0.12
0.2
0.28
0.36
0.44
0.52
0.6
0.68
Tree scale: 0.1
Taxonomy
SARS-CoV-2
SARS-CoV
Pangolin CoV
Host
Human
Bat
Pangolin
Monkey
Civets
Mouse
Pig
Unknown
Figure 1 Maximum Likelihood (ML) phylogeny representing relationship amongst 312 SARS-CoV, 103 SARS-CoV-2, and five Pangolin CoVstrains. The whole-genome sequences of 420 isolates were aligned using MUSCLE and stripped to include the highly conserved alignments across allstrains. The final alignment was subjected to RAxML to generate the ML phylogeny utilizing the GTRGAMMA model of nucleotide substitutionwith 100 bootstrap replicates. The phylogeny is depicted with branch length consideration. The inner-circle represents the taxonomy of all strains(depicting SARS-CoV, SARS-CoV-2 and Pangolin CoV). The outermost circle represents the respective host of each strain. Inner Blue and Grayalternative dashed lines represent an internal tree scale with a branch length increment of 0.04 from inside to outside.
Full-size DOI: 10.7717/peerj.9576/fig-1
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 14/31
We also examined the ANI scores of the above-identified closest strains of SARS-CoV-2with all SARS-CoV, SARS-CoV-2 and Pangolin CoV genomic sequences used in this study(Table S2). We found that the Bat RaTG13 strain is 95.97–96.12% identical whencompared to all SARS-CoV-2 genomes under study. Amongst the SARS-CoV strains, BatCoVZC45 and Bat CoVZXC21 are its closest neighbors with 89% genome identity,indicating a common ancestor in bat coronaviruses, whereas, with Pangolin CoV, theyshare lowest (~86%) nucleotide identity. Bat CoVZC45 and Bat CoVZXC21 are ~97%identical to each other, whereas their identity with SARS-CoV-2 (~89%) is higher thanthat with SARS-CoV (86–89%) and lowest with Pangolin CoV (~85%). Similarly,Pangolin CoVs share maximum identity with Bat RaTG13 SARS-CoV (86.50%) followedby SARS-CoV-2 (86.30–86.38%) and lowest with other SARS-CoV. Therefore, we canargue that as SARS-CoV-2, Bat RaTG13 and Pangolin CoV have considerable genomesimilarity, they possibly diverged from a common ancestor and SARS-CoV-2 was latertransmitted to humans through recombination and transformation events via an unknownintermediate host.
Phylogeny relationship at SARS-CoV-2 levelSARS-CoV-2 strains, like other viruses, have a high mutation rate for better adaptabilityand survival. Therefore, one of our aims was to understand the diversity amongSARS-CoV-2 isolates. Whole-genome sequences of 167 SARS-CoV-2 were aligned usingMUSCLE (alignment length 29,950 nucleotides) and the conserved sites of length 29,725were used to generate a maximum-likelihood phylogeny using RAxML (Fig. 2; Fig. S4).Irrespective of high similarity amongst all (>99.90% identity), several subclades can beidentified from the phylogeny based on their distance from each other, depicting theirsubtypes. Our data has 97 isolates from the USA, 46 from China, five from Japan, fourfrom Spain, three from Taiwan, two from Vietnam and India, one isolate each fromSweden, South Korea, Pakistan, Nepal, Italy, Finland, Brazil and Australia. From thegenome sequence data of 15 distinct geographical locations, we were able to identify a fewphylogenetic clusters depicting country-specific patterns. We identified multiple clustersper country suggesting the existence of multiple subtypes. The KMS1 isolate fromChina was found to be the most distinct one, followed by WA-UW230, WA-UW194,WA-UW218 isolates from the USA. The Wuhan Hu-1 isolate shares a sister cladewith the USA Cruise samples, indicative of high similarity. Isolates sampled at theUniversity of Washington (WA, USA) form two separate subclades within two differentclades in the phylogeny, suggesting that even amongst people residing in a limitedarea in the state of Washington, multiple strains exist and are causing COVID-19.Furthermore, both the WA subclades have Valencia-Spain isolates as the closestbranch/subclade, suggesting that at least two individuals from these two locationsindependently encountered each other and transmitted different subtypes of the virus.This study also included two distinct genomes sampled from India which weredistributed by the phylogeny into two separate clusters signifying the existence ofdifferent subtypes.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 15/31
As strains from diverse locations continue to be sequenced and analyzed, more reliablecountry-specific patterns showing relatedness can be obtained. Similar to several cuttingedge studies (Woo et al., 2010; Fuk-Woo Chan et al., 2020; Zhang, Wu & Zhang, 2020;Rehman et al., 2020), this study also provides an example of how phylogenetic analysiscan help generate clusters of identical strains, further enabling the identification ofsignature mutations amongst those diverse groups.
MT093631 China WH-09
MT246469 USA WA-UW212
NC 045512 China Wuhan-Hu-1
MT121215 China SH01 MT246479 USA WA-UW222
MT233523 Spain Valencia8
MT184911 U
SA CruiseA-24
MT253700 China HZ-4
81M
T039873 China HZ-1
MT027064 U
SACA
5
MT2
4644
9 U
SA W
A-U
W19
2
MT184913 USA CruiseA-26
MT246478 USA WA-UW221
MT246486 USA WA-UW229
MT159709 USA CruiseA-12
MT253697 China HZ-178
MT253708 China H
Z-79
MT159719 USA CruiseA-3
MT246476 USA WA-UW219
MN996531 China W
IV07
MT251975 USA WA-UW239
MT253703 China HZ-551
MT246667 USA FDAARGOS 983
MT246488 USA WA-UW231
MN988669 China WHU02
MT253698 China H
Z-185
MT2
5197
3 U
SA W
A-U
W23
7
MT066176 Taiwan NTU02
MT135042 China 2
31
MT251980 USA WA-UW242
MT019529 China IPBCAM
S-WH-01
LC529905 Japan TKYE6182
MT2
5197
9 U
SA W
A-U
W24
1
MT2
5370
6 Ch
ina
HZ-
62
MT251978 USA WA-UW235
MT246473 USA WA-UW216
MT039888 U
SA MA1
MT184907 USA CruiseA-19
MT246461 USA WA-UW204
MT253701 China HZ-48
MT039887 USA WI1
MT163717 USA WA4-UW2
MT159712 USA CruiseA-14
MT123291 China IQTC02
MT2
4645
0 U
SA W
A-U
W19
3
MT019533 China IPBCA
MS-W
H-05
MT2
2661
0 Ch
ina
KMS1
20P
S li z
arB
8086
21T
M
MT123290 China IQTC01
MT246474 USA WA-UW217
MT066156 Italy IN
MI1
MT240479 Pakistan Gilgit1
MN
994468 USA CA
2
MT019531 China IPBCAMS-WH-03
MT123293 China IQTC03
LC528233 Japan 19-027
LR757998 China Wuhan seafood m
arket
MT184909 U
SA CruiseA-22
MT253696 China HZ-162
MT0
6617
5 Ta
iwan
NTU
01
MT246466 USA WA-UW209
LC534419 Ja
pan 19-4
37
MT246452 USA WA-UW195
MT192772 VietNam 01S
MN996529 China WIV05
791
WU-
AW
AS
U 45
4642
TM
MN
9752
62 C
hina
HKU
-SZ-
005b
MN9
9740
9 US
A AZ
1
MT246477 USA WA-UW220
MT118835 USA CA9
MN996530 China WIV06
MT0
4425
7 U
SA IL
2
MT027062 U
SA CA3
MT163718 USA WA6-UW3
MT246471 USA WA-UW214
MT019532 China IPBCAMS-WH-04
MT198652 Spain Valencia003
MT246464 USA WA-UW207
MT152824 USA WA2M
T159713 USA CruiseA-15
MT246489 USA WA-UW232
MT253705 China HZ-60 M
T027063 USA CA4
MT251977 USA WA-UW234
MT163716 U
SA WA
3-UW
1
MT2
4648
1 US
A W
A-UW
224
MT188340 USA M
N2-MDH2
MT2
3352
2 Spain
Valen
cia7
MN
9944
67 U
SA C
A1
MT012098 India 29
MT159711 USA CruiseA-13
MT159720 USA CruiseA-4
MN
9887
13U
SA IL
1
MT246475 USA WA-UW218
MT163719 USA WA7-UW4
MT159722 USA CruiseA-6
MT246457 USA WA-UW200
MT233526 USA DAARGOS 983
MT2
5197
6 U
SA W
A-U
W24
0
MN996528 China WIV04
MT2
4648
4 U
SA W
A-U
W22
7
MT246459 USA WA-UW202
MT159707 USA CruiseA-10
MT2
5370
9 Ch
ina
HZ-
90
MT2
4648
0 USA
WA-
UW22
3
774-Z
H ani
hC 996352
TM
MT159705 USA CruiseA-7
MT1
35043 Chin
a 233
MT0
4995
1 Ch
ina
Yunn
an-0
1
LR757996 China Wuhan seafood market
MT184908 USA CruiseA-21
9-Aesi
urC
AS
U 017951T
M
MT039890 SouthKorea SN
U01
MT2
4646
7 US
A W
A-UW
210
LC534418 Japan 19-031
MT0
5049
3 In
dia 1
66
MT253702 China HZ-49
MT2
4647
0 USA
WA-U
W21
3
MT188339 USA MN3-MDH3
MT020781 Finland 29 Jan
MT007544 Australia VIC01
MT246472 USA WA-UW215
MT253710 China HZ-91
MT2
4645
3 U
SA W
A-U
W19
6
MT159721 USA CruiseA-5
MT246462 USA WA-UW205
MT1
9276
5 USA
PC0
0101
P
MT159708 USA CruiseA-11
MT2
4645
1 USA
WA-U
W19
4
MT159706 USA CruiseA-8
MT192773 VietNam 02S
MT020881 USA WA1-F6
MT1
2329
2 Ch
ina
IQTC
04
MT072688 N
epal 61-TW
MT246455 USA WA-UW198
MT192759 Taiwan CGM
H-CGU-01M
N93
8384
Chi
na H
KU-S
Z-00
2a
MT2
4646
0 US
A W
A-UW
203
MT246482 USA WA-UW225
MT159718 U
SA CruiseA-2
MT1
3504
4 Chin
a 235
MT1
0605
4 US
A TX
1
MT253704 China HZ-576
MT044258 USA CA6M
T159717 USA CruiseA-1
MN996527 China W
IV02
MT1
5971
5 U
SA C
ruis
eA-1
7
MN988668 China WHU01
LC528232 Japan 19-020
MT159714 USA CruiseA-16
MT159716 USA CruiseA-18
MT233519 Spain Valencia5
MN985325 USA WA1
MT106053 USA CA8
MT1
0605
2 U
SA C
A7
MT020880 USA WA1-A12
MT2
5197
4 US
A W
A-UW
238
MT253707 China HZ-638
MT184910 USA CruiseA-23
MT1
3504
1 Chi
na 10
5
MT188341 USA MN1-MDH1
MT2
4649
0 U
SA W
A-U
W23
3
LR75
7995
Chi
na W
uhan
seaf
ood
mar
ket
MT2
4648
7 U
SA W
A-U
W23
0
MT093571 Sw
eden 01
MT019530 China IPBCAMS-W
H-02
MT184912 USA CruiseA-25
0.00005
0.00015
0.00025
0.00035
0.00045
0.00055
0.00065
0.00075
0.00085
0.00095
Tree scale: 0.001
Geo Location:
China
USA
India
Spain
Vietnam
Pakistan
Sweden
Japan
Italy
Brazil
Figure 2 ML phylogeny representing relationship amongst 167 SARS-CoV-2 strains. The whole-genome sequences of 167 isolates were alignedusing MUSCLE, stripped to include only conserved alignments, and subjected to RAxML to generate the ML phylogeny utilizing the GTRGAMMAmodel of nucleotide substitution with 100 bootstrap replicates. The phylogeny is depicted with branch length consideration. The inner circlerepresents the respective geo-location of each strain. Inner Blue and Gray alternative dashed lines represent an internal tree scale with a branchlength increment of 0.00005 from inside to outside. Full-size DOI: 10.7717/peerj.9576/fig-2
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 16/31
Mutations are continuously diversifying the Human SARS-CoV-2strainsLike the previous section, the whole genome sequence alignment of 167 SARS-CoV-2isolates were used for this study. As anticipated, ~100 nucleotides at both N andC-terminals were highly diverse and, therefore, changes in these sites were not consideredin this study. For the rest of the alignment, the percentage of nucleotide occurrence at eachsite (along with gap and other non-standard nucleotides) was calculated using theSARS-CoV-2 Wuhan Hu-1 genome as the reference (Fig. 3). The mutational probabilitywas classified into four categories based on random distribution patterns: high (>19%variation), medium (4–19% variation), low (2–3% variation) and very low (<2% variation).Overall, 415 mutation sites were identified, among which non-standard nucleotideswere present in the genome at 125 sites, which have been ignored in this study.We ultimately selected 290 sites with a confirmed variation. Amongst these, we found43 sites with high, medium and low mutation probabilities within 9 out of 12protein-coding regions in SARS-CoV-2 genomes, which also included the genes (S, M andN), identified as core proteins across genus Betacoronavirus. Amongst all mutation sites,we found 21 sites within the gene encoding Spike protein, 18 sites in the gene encodingNucleocapsid protein and three sites in gene encoding M protein, suggesting that eventhese highly conserved protein-coding regions (core proteins as later identified viapan-genome analysis) are capable of mutating. Also, we recognized 142 mutational sites inorf1a, 50 in orf1b, 6 in orf3a, 2 in the envelope protein-coding gene, 4 in orf6, and orf8, 3 inorf7b and 1 in orf10. We also identified 36 sites within non-coding regions.
Amongst sites with high mutation probability, C8,782T in orf1a showed ~35% variationfrom C to T, suggesting a transition mutation. Likewise, T28,144C in orf8 also has~35% transition mutation variations. We found three hotspot mutational sites withinthe orf1b region of orf1ab, which has three transitions, two C to T (C17,747T andC18,060T; ~20% variation) and one A to G (A17,858G; 19% variation). In the spikeprotein-coding region, we were able to identify one transition site (A23,403G nucleotide)with medium (11%) mutational probability. Overall, we were able to identify nine hotspotswith more than two consecutive probable mutational nucleotides in vicinity (197–210(non-coding), 508–522 (orf1a), 686–694 (orf1a), 4,879–4,881 (orf1a), 20,298–20,300(orf1b), 21,385–21,389 (orf1b), 21,991–21,993 (S), 28,878–28,883 (N), 29,750–29,759(non-coding)). Most of these hotspot sites showed ~1–2% mutational probabilities andnone of these have high or medium mutational probability as discussed in Fig. 3.We believe these genomic locations are the probable sites of evolutionary divergence, andtherefore govern the evolutionary capabilities of these viruses.
We also analyzed transition and transversion possibilities within all these mutationalcombinations. Out of the 290 identified mutational sites, 244 had a nucleotide substitution,distributed into 158 transitions and 86 transversions. Transition events are known to bemore frequent and less likely to cause a change at the protein level as compared totransversions (Sanjuán et al., 2010). Most transition events are silent mutations, whiletransversion events may cause change at the protein level and therefore impact the
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 17/31
pathogenesis of the disease (Lyons & Lauring, 2017). Therefore, we suggest that amongstSARS-CoV-2 genomes, transversion events, that are occurring in significant numbers,would provide more evolutionary diversity and possibly lead to divergence, as compared totransition events.
Site 29095
Site 11764
Sit
e 12
773
Site 22033
Site 01059
Site 19269
Sit
e 00
119
1942
1 et
iS
Site 00832
Sit
e 00
198
Site 06636
Site 27226
Site 05716
Site
160
75
Site 22984
Site 09157
Sit
e 00
104
Site 19175
Site 23952
Site 07016
Site
002
06Site 29752
Site 1
8060
Site 12115
Site
154
18
Site 22988
Site 18
603
Site
002
10
Site 29774
Site 21991
Site 09924
Sit
e 12
578
Site 29635
Site 21707
Site 27806
Site 03518
Site 25563
Site 01469
Site 29742
Site 09159
Site 20936
Site 05062
Site 29751
Site 28863
Site 06026
Site 06996
Site 09511
Site 02277
Site 04402
Site 29757
Site 1
7423
Site 27525
Site 00692
Site
132
25Site 00833
Site 26326
Sit
e 12
600
Site 18975
Site 03259
Site 09474
Site
002
05
Site 06031
Site 29756
Site 00693
Site 26936
Site 06819
Site 12355
Site
005
09
Site
168
87
Site 25810
Site
003
32
43521 etiS
Site 00
522
Site 02971
Site 12160
Site 09274
Site 00694
Site 09561
Site 21575
Site 28657
Site 00691
Site 11750
Site 08768
Site 11207
Site 09034
Site 21316
Site 01348
Site 12213
Site
146
57
Site
005
14
Site 03373
Site 11956
Site 0
0517
Site
155
97
Site 29754
Site
148
05
Site 07866Site 08987
Sit
e 00
124
Site 12208
Site 02632
Site 04880
Site 03992
Site 24351
100 etiS
11
Site 28916
Site 19065
Site 05784
Site 18
814
Sit
e 12
582
Site 01545
Site 11233
Site 28409
Site
170
00
Sit
e 00
120
Site 03738
Sit
e 12
572
Site 00
654
Site
005
10
Site 01691
Site 05572
Site
156
07
Site 20031
Site 22303
Site 11161
Site 11752
Site 27327
Sit
e 00
197
Site 00690
Site 04879
Site
132
26
Site 29254
100 etiS
12
Site08388
Site 09477
Site 26729
Site
003
54
Site 25775Site 25494
Site
172
47
Site 06968
Site 22104
Sit
e 12
660
Site 28854
Site 03411
Site
164
67
Site 18877
No
mu
tation
Site 21644
Site 28881
Site 29230
3742
1 et
iS
Site 06035
Site
005
16
Site 21387
Site 27925
Site 29736
Site 21784
Site 04288
Site 21385
Site 20299
Site 28792
Site 24034
Site 21386
Site 1
7747
Site 21147
Site 10319
Site 22224
Site 20980
Site 29753
Site 06695
Site 12378
Site 12467
Site 20823
Site 09180
Site 21389
Site 11320
Site 26354
Site 20298
Site 29758
Site 22785
Site 10036
Site 1
7474
Site 26144
Site
144
08
Site 08782
Site 10232Site
178
58
Site 29755
Site 28077
Site 27964
Site 29750
Site 18
149
Site
153
24
Site 01548
Sit
e 12
793
Site 00689
Sit
e 00
201
Site 28144Site 27807
Site 05052
Site 0
0521
Site
174
10 Site 09491
Site 00686
Site
169
12
Sit
e 13
072
Site
173
76
Site 12202
Site 12464
Site 28882
Site 29301
Site 07479
Site 27722
Site 29553
Site 02717
Site 20281
Site 12041
Site 27276
Site 08001
Site 29303
Site 04881
Sit
e 00
200
Site 04307
Site
005
15
Site
004
90
Site 01102
Sit
e 00
140
Site
005
08
Site 19610
Site 10507
Site 23185
Site 11410
Site 0
0519
Site 03778
Site 29759
Site 18
689
Site 00884
Site 28883
Site
002
08
Site 19018
Site
002
21
Site 27046
Sit
e 00
186
Site 03099
Site
142
29
Site 24325
Site 28878
Site 27384
Site 00687
Site 02269
Site 04642
Site 03037
Sit
e 12
685
Site
005
12
Site 22432S
ite 0
0202
Site 03177
Site 29527
Site 21992
Site 00688
Site 06501
41521 etiS
Site
168
77
Site 23403
Site 25979
Site
173
73
Site 07105
Site
002
41Site 29140
Site 0
0518
Site 05845
Site 28378
Site 00
614
Site 06310
Site 05084
Site 01385
Site 21137
Site 02446
Site 20300Site
005
20
Site
159
27
Site
002
54
Site 09534
Site 02091
Site 21993
Site 01397
Site
002
04
Site 11083
Site
005
11Si
te 0
0513
A
C
G
T
Others
GapMutational Probability
High
Medium
Low
Very Low
orf1a
orf1b
S
orf3a
E
orf6
M
orf7b
orf8
N
orf10
Figure 3 Mutation sites and hotspots distribution amongst 167 SARS-CoV-2 genomes as compared to Wuhan Hu-1. MUSCLE-basedalignment was used to analyze the distribution of all nucleotides (including gap and other nucleotides) at each alignment site and the relativepercentage of each nucleotide is represented here in a circular manner respectively for each nucleotide. The innermost circle represents the respectiveorf and non-coding DNA according to the location as mentioned in the third circle from inside. The second circle represents the mutationalprobability score of each locus distributed in high, medium, low and very low categories. Full-size DOI: 10.7717/peerj.9576/fig-3
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 18/31
Evolutionary relationship amongst SARS-CoV-2, Bat CoV andPangolin CoVTo re-examine the closest hosts of these viruses, we performed homology studies on smallgenome fragments (100 nucleotides) to identify their close relatives. We segmented each ofthe SARS-CoV-2, Pangolin CoV, SARS-CoV and MERS genome (reference genome ofeach category) into 100 nucleotide fragments, identified their closest homologs in thevirus database, filtered the strains belonging to its taxonomy, selected top 50 hits persegment, calculated the host CoV distribution and represented it as a heatmap for eachsegment/strain along with the closest (topmost) hit per segment/strain (Fig. 4).MERS-CoV is already known to originate from the bat, however, the infection spread viacamels either directly/indirectly (Azhar et al., 2014). On excluding Camel CoVs fromthe data, we found several segments showing maximum similarity representation in BatCoVs, followed by Hedgehog and Human CoV strains, however, most of the parts wereuniquely present in MERS. As expected, when Camel CoV strains were included, they werethe closest best hit as indicated in the “MERS Closest” category (Fig. 4).
When we further investigated the same patterns in the SARS-CoV genome, weidentified the Bat CoV as the closest strain for most of the reference segments, followedby SARS-CoV-2 and Pangolin CoV. In the closest category, we found Bat CoV as theclosest suggesting their evolutionary origin. However, many segments do not showany closeness with any strain, indicating their uniqueness in the reference genome.We performed the same analysis on the Pangolin CoV strain GX-P5E and found thatseveral segments have close homology-distribution within SARS-CoV and SARS-CoV-2genomes. Interestingly, as visible in the “Pangolin CoV Closest” category, several hits showmaximum similarity with Bat SARS-CoV, followed by SARS-CoV-2 and a few HumanCoVs.
Likewise, SARS-CoV-2 genome fragments show maximum homology representation inBat CoV followed by Pangolin CoV. In the “closest category”, Bat CoV indisputably standsout as the closest host, which is corroborated by whole-genome phylogeny and thepercentage identity matrix studies. However, the presence of several Pangolin CoVhomologous segments in this analysis indicates that certain specific segments in theSARS-CoV-2 genome share a closer relationship with Pangolin CoV strains as comparedto Bat CoVs.
Overall, our analysis indicates that the SARS-CoV-2 genome has a close relationshipwith Bat CoV and Pangolin CoV, along with the presence of a few unique segments.This reasserts that SARS-CoV-2 strains might have evolved from a common ancestorof Bat SARS-CoV and Pangolin CoV strains and during evolution, several segments in theSARS-CoV-2 genome might have diverged as compared to Bat SARS-CoV and PangolinCoV. The recognized unique genomic segments in this study may be used as potentialtargets for diagnostic kits.
Pan-genome (Core, accessory and unique gene) analysisRegardless of their approximately similar genome size, the members of Betacoronavirusharbor huge diversity (5–14 CDS) in the number of protein-coding regions, as per the
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 19/31
available NCBI genome annotations. SARS-CoV and SARS-CoV-2, belonging to subgenusSarbecovirus, have 14 and 12 protein-coding regions respectively, revealing the diversitywithin the same subgenus. Similarly, Pangolin CoV (studied strain: GX-P5E), which alsobelongs to subgenus Sarbecovirus, has only nine protein-coding regions. To identify the
1930
1 19
400
7601 7700
25901 26000
1101
120
0
21001 21100
26101 26200
24101 24200
5901 6000
11001 11100
12801 12900
1830
1 18
400
25501 25600
2901
300
0
5401 5500
20501 20600
21801 21900
2030
1 204
00
10901 11000
701
800
28901 29000
23501 23600
26901 27000
23001 23100
6201 6300
3601
370
0
24601 24700
10101 10200
14301 14400
3001
310
0
8101 820016
401
1650
0
9701 9800
13701 13800
3801
390
0
25101 25200
12301 12400
27101 27200
8001 8100
9801 9900
2701
280
0
1630
1 16
400
6701 6800
28801 2890016
901
1700
0
9001 9100
21201 21300
27301 27400
10201 10300
1790
1 18
000
14901 15000
25201 25300
12501 12600
26201 26300
23901 24000
3901
400
0
2801
290
0
6401 6500
22801 22900
28201 28300
7501 7600
12601 12700
2301
240
0
30101 30200
29201 29300
5801 5900
10501 10600
8401 8500
23101 23200
1920
1 19
300
2201
230
0
5201 5300
27001 27100
21301 21400
6301 6400
27601 27700
29501 29600
1880
1 18
900
1760
1 17
700
20401 20500
14201 14300
25601 25700
27801 27900
1710
1 17
200
29701 29800
22701 22800
30001 30100
901
1000
1960
1 19
700
29801 29900
1970
1 19
800
10401 10500
21901 22000
13901 14000
11801 11900
26501 26600
15501 15600
4101
4200
25701 25800
4701 4800
13401 13500
1670
1 16
800
11401 11500
14101 14200
801
900
2501
260
0
4201
4300
8301 8400
24001 24100
29601 29700
1400114100
26801 26900
6801 6900
7301 74007201 7300
12701 12800
10001 10100
28101 28200
1890
1 19
000
24801 24900
15301 15400
29001 29100
4001
4100
8901 9000
1910
1 19
200
26401 26500
15401 15500
10301 10400
9601 9700
3201
330
0
1701
180
0
14801 14900
15101 15200
1860
1 18
700
22601 22700
1850
1 18
600
1750
1 17
600
7901 8000
4801 4900
22001 22100
7001 7100
7801 7900
14401 14500
12001 12100
7701 7800
5701 5800
1990
1 20
000
21101 21200
002 101
1840
1 18
500
1780
1 17
900
1680
1 16
900
20601 20700
1770
1 17
800
1601
170
0
13101 13200
8701 8800
29301 29400
1700
1 17
100
25301 25400
3701
380
0
24301 24400
26601 26700
22201 22300
5101 5200
20801 20900
23201 23300
3501
360
0
11701 11800
28701 28800
1201
130
0
15701 15800
4501 4600
1301
140
0
5601 5700
25401 25500
11201 11300
1001
110
0
2001
210
0
6101 6200
15001 15100
2601
270
0
28001 28100
14701 14800
22301 22400
22901 23000
22401 22500
00261 10161
2010
1 202
00
22101 22200
23801 23900
27201 27300
1740
1 17
500
12901 13000
0006
1 10
951
5301 5400
201
300
21701 21800
4901 5000
00161 100611620
1 16
300
26701 26800
1501
160
0
9201 9300
8201 8300
2101
220
0
7101 720017
301
1740
0
25001 25100
12101 12200
27701 278004401 4500
1801
190
0
501
600
13001 13100
9501 9600
28401 28500
23301 23400
22501 22600
2000
1 201
00
8601 87001
100
1950
1 19
600
601
700
301
400
24201 24300
2020
1 203
00
10701 10800
4601 4700
28501 28600
20701 20800
6601 6700
9401 9500
29901 30000
3401
350
0
23601 23700
4301
4400
1800
1 18
100
15601 15700
5501 5600
1660
1 16
700
6901 7000
13201 13300
7401 7500
20901 21000
27901 28000
23401 23500
21401 21500
15201 15300
8801 8900
5001 5100
12401 12500
9301 9400
1401
150
0
14601 14700
21601 21700
11601 11700
401
500
21501 21600
9101 9200
1980
1 19
900
6501 6600
12201 12300
29101 29200
25801 25900
1870
1 18
800 13501 13600
11101 11200
11301 11400
13801 13900
1720
1 17
300
0095
1 10
851
10601 10700
13301 13400
1940
1 19
500
27501 27600
11901 12000
24901 25000
1901
200
0
26001 2610018
201
1830
0
24501 24600
24701 24800
1650
1 16
600
28301 28400
13601 13700
10801 10900
1900
1 19
100
11501 1160031
01 3
200
23701 23800
24401 24500
14501 14600
28601 28700
2401
250
0
26301 26400
9901 10000
27401 27500
1810
1 18
200
8501 8600
29401 29500
3301
340
0
6001 6100
Bat-CoV
Hedgehog-CoV
Human-CoV
Bird-CoV
MERS Unique
Bat-CoV
SARS-CoV-2
Pangolin-CoV
Bird/Rabbit-CoV
SARS-CoV Unique
Bat-CoV
SARS-CoV-2
Human-CoV
Other-CoV
Pangolin-CoV Unique
Bat-CoV
Pangolin-CoV
Human-CoV
SARS-CoV-2 Unique
MERS Closest
Camel
SARS-CoV Closest
Bat
SARS1 Unique
Pangolin CoV Closest
Bat-CoV
SARS-CoV-2
Human-CoV
Pangolin-CoV Unique
Representation Score
0
10
20
30
40
50
60
70
80
90
100
SARS-CoV-2 Closest
Bat CoV
Human
Pangolin
SARS2 Unique
SARS-CoV-2 Closest
SARS-CoV Closest
Pangolin CoV Closest
MERS Closest
SA
RS
-CoV
-2
Pan
golin
-CoV
SA
RS
-CoV
ME
RS
-CoV
Figure 4 Host distribution of each 100-nucleotide segment in the reference genomes of MERS, SARS-CoV, Pangolin CoV, and SARS-CoV-2.Each 100-nucleotide segment per strain was subjected to BLASTn against the virus database. Self-category hits were removed from the analysis, hostinformation of top 50 hits was identified, and the relative percentage is represented here as a heatmap for each segment in each genome. In the closestcategory, the host information of best hit (after removing self-hit) is represented in the form of a multi-color stripe. The innermost circle representsthe location of each segment to start from 1 to 100 nucleotides to the total genome length of the genome.
Full-size DOI: 10.7717/peerj.9576/fig-4
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 20/31
presence of (co-)orthologous proteins present across all these genomes, a bidirectional(reciprocal) best BLAST hit-based homology detection strategy was followed using the toolProteinortho (Lechner et al., 2011) (Fig. 5; Fig. S5). A total of 44 protein-coding regionclusters represent the pan-genome of nineteen studied genus Betacoronavirus membersamongst which, 3 protein-coding regions (S, M and N proteins) were found to beconserved across all genomes, representing the core genome. A similar analysis hasbeen performed on all Betacoronavirus species; however, it does not include PangolinCoV (Alam et al., 2020). Our study also recognized that the pan-genome of genusBetacoronavirus is open, which means that with the addition of a new species, thepan-genome is continuously increasing (Fig. S6). The increase in the pan-genome at eachstep is approximately proportional to the increase in unique genes. Since each newlyidentified strain has additional unique gene sequences, it is effectively impossible to predicttheir complete pan-genome. As expected, the members of each subgenus, which areclosest to each other, have only a few variations, however, with the addition of a newsubgenus member(s), the pan-genome expands immediately.
orf1ab, the largest coding region, codes for two polyproteins ORF1a and ORF1ab(266–13,468,13,468–21,555) because of the “leaky mechanism” of the ribosome, a (-1)frameshift just upstream of the orf1a translation termination codon. All CoVs, except forstrain MHV A59 of subgenus Embecovirus, translate the full-length orf1ab. MHV A59,
Figure 5 Pan-genome (Core, accessory, and unique) analysis amongst 19 Betacoronavirus referencegenomes (including one Pangolin CoV). The X-axis represents the strain combinations as black-filledcircles, whereas Y-axis represents the distribution of genes in the respective combinations and inside eachvertical bar, those gene names are written. For each strain, their respective subgenus category has beenpointed out on the left side as the colored scale in the corner. In the left-down corner, the number ofgenes encoded by each genome is depicted as a horizontal bar. In the lowermost panel, pan-genomecategories (core, accessory and unique genes) are shown. Full-size DOI: 10.7717/peerj.9576/fig-5
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 21/31
however, has separate polyproteins from orf1a and orf1b. Due to the absence of ribosomalslippage reported in several coronaviruses, ORF1a is encoded by only 9 out of 18genomes, making it an accessory protein cluster of the pan-genome. ORF1b polyprotein isencoded as a part of orf1ab in most of the species belonging to genus Betacoronavirus,again except for strain MHV A59, where it is encoded separately as RdRp, forming aunique gene. Envelope protein (76–82 aa across Betacoronavirus) is another accessoryprotein encoded by 15 out of 18 genomes. In another cluster, we identified the presence ofsmall envelope protein in two out of those three genomes where E protein is absent(both in subgenus Nobecovirus). No E protein-encoding region is found in the MHV A59genome annotations. The gene encoding ORF3 is distributed into two subunits (orf3aand orf3b) in SARS-CoV (via an internal ribosomal entry mechanism), whereas onlythe first subunit (orf3a; 276 aa) is present in SARS-CoV-2. The gene for ORF3a isconserved only amongst subgenus Sarbecovirus organisms. This protein activates thepro-transcription of gene IL-1β and helps its maturation; both signals are prerequisitesfor NLRP3 (NOD-, LRR- and pyrin domain-containing protein 3) inflammasomeactivation during infection (Siu et al., 2019). Conservation of ORF3a across SARS-CoV-2,SARS-CoV and Pangolin CoVs provides a clue about its unique infection strategy. ORF3b(155 aa) is only present in SARS-CoV and absent in all other coronaviruses. It hasbeen suggested that ORF3b supports SARS-CoV infection by inhibiting the type-1Interferon (IFN) synthesis and therefore, IFN signaling (McBride & Fielding, 2012).ORF6 is also a subgenus Sarbecovirus specific protein (conserved across SARS-CoV,SARS-CoV-2 and Pangolin CoV) localized in the Endoplasmic Reticulum or the Golgimembrane. During infection, it interacts with and interrupts the nuclear import complexformation. During infection-caused interferon signaling, this interruption inhibits STAT1transportation to the nucleus and blocks the expression of genes involved, thereforemimicking an antiviral state (Frieman et al., 2007).
Another accessory protein of the Betacoronavirus pan-genome is ORF7a, which is aunique protein amongst subgenus Sarbecovirus members. In SARS-CoV, it is knownthat bone marrow stromal antigen-2 (BST-2 or tetherin or CD317) can restrict the virusrelease from inside the cell causing partial inhibition of infection. However, ORF7a caninhibit the action of this protein, thus promoting the infection (Taylor et al., 2015). ORF7bis present in both SARS-CoV-2 and SARS-CoV but absent in Pangolin CoV NCBIannotations. Like orf7b in SARS-CoV, its initiation codon overlaps with orf7a viaribosome leaky scanning and it works as a structural component (Schaecher, Mackenzie &Pekosz, 2007). orf8 gene (365 nucleotides) is uniquely present in SARS-CoV-2 andPangolin CoV but absent in SARS-CoV. Distinctively, orf8 in SARS-CoV is separatedinto two proteins ORF8a (119 nucleotides) and ORF8b (254 nucleotides). ORF8 inSARS-CoV-2 is highly divergent from the inflammasome activator ORF8b in SARS-CoV(Yuen et al., 2020). The function of this protein is not clear yet, but it has been suggestedthat overall ORF8ab or ORF8a and ORF8b together modulate viral replication andpathogenesis in unknown fashion (McBride & Fielding, 2012). ORF10 is a unique proteinin SARS-CoV-2 strains. We also performed the homology analysis of ORF10 protein
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 22/31
against the NCBI Non-redundant database and found no hits, which suggests that thisprotein is specific to SARS-CoV-2.
This analysis also revealed that some proteins are specific to a subgenus. SubgenusEmbecovirus has a unique combination of proteins, namely HE, ORF4, internal protein,NS2a and NS3a, that are present in most genomes of this subgenus. These proteins mightbe considered unique to subgenus Embecovirus as compared to the rest of the genusBetacoronavirus. Hemagglutinin Esterase glycoprotein (HE) is a nonessential proteinwith sialic acid-binding and acetyl esterase activity that create tinier spikes (secondattachment factor) on the viral envelope in several MHV strains. orf4 (also annotated asorf5a, ns12.9) encodes a nonstructural accessory protein that is, known to function as typeI interferon antagonist. The internal gene is encoded within the N gene in MHV CoV;however, its function is not well-known. Similarly, subgenus Merbecovirus members alsohave some unique proteins (NS3a, NS3b, NS4b, ORF5), that are absent in other subgenera.We also identified unique proteins in each organism, which were absent in otherBetacoronavirus. Overall, it can be argued that although genus Betacoronavirus membershave ~10 proteins coding regions per genome, they have conserved similarity in only~10% (3–4 proteins) of the genes and their diversity (~40% accessory genes and ~50%unique genes) spans the rest of the genome, leading to their different pathogenesis acrossdiverse hosts and distinct evolution. We identified that even among subgenus Sarbecovirusmembers (SARS-CoV, SARS-CoV-2 and Pangolin CoV), orf3b, orf7b, orf8, orf9b andorf10 are uniquely distributed within at least one genome, therefore depicting widediversity within the same subgenus.
CONCLUSIONSHumanity has already encountered coronavirus infections twice in the past two decades,in the form of the SARS and MERS epidemics. The latest coronavirus infection,COVID-19, caused by the highly pathogenic and transmissible SARS-CoV-2, turned into apandemic in sheer three months. This work provides a brief review of the taxonomy,history, origin, and genome organization of the virus, followed by comparative genomicsresearch focused on understanding the evolutionary relationship amongst 167SARS-CoV-2, 312 SARS-CoV and 5 Pangolin CoV genome sequences as available on29 March 2020. We identified that SARS-CoV-2 strains form a monophyletic clade distinctfrom SARS-CoV and Pangolin CoV. This study reaffirmed that they are closest to theBat CoV RaTG13 strain followed by Pangolin CoV, suggesting that SARS-CoV-2 evolvedfrom a common ancestor putatively residing in bat or pangolin hosts. We identified severalmutation sites and hotspots within SARS-CoV genomes with high, medium and lowprobabilities. Homology studies based on 100 nucleotide segments in the SARS-CoV-2reference genome pointed out their close relationships with Bat and Pangolin CoVs, alongwith the unique nature of a few segments. We recognized that 44 protein-coding regionsconstitute the pan-genome for nineteen genus Betacoronavirus strains. Moreover, theirpan-genome is open, highlighting the wide diversity provided by newly identified novelstrains. Even members of subgenus Sarbecovirus are diverse relative to each other due to
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 23/31
the relative presence of unique protein-coding regions orf3b, orf7b, orf8 and orf9b andorf10. Overall, this review and research highlight the diversity within the availableSARS-CoV-2 genomes, their potential mutational sites hotspots, and probableevolutionary relationship with other coronaviruses, which might further assist in ourunderstanding of their evolution, epidemiology and pathogenicity.
ACKNOWLEDGEMENTSWe acknowledge Tanvi Kale, IBAB MSc (2018–20) student for graciously offering herartwork to use as a cover image for this manuscript.
ADDITIONAL INFORMATION AND DECLARATIONS
FundingDepartment of Science and Technology (DST), Government of India and IBAB, Bengaluruprovided financial and infrastructure support to Gaurav Sharma. The funders had no rolein study design, data collection and analysis, decision to publish, or preparation of themanuscript.
Grant DisclosuresThe following grant information was disclosed by the authors:DST-INSPIRE Faculty Award from Department of Science and Technology (DST),Government of India to GS.
Competing InterestsThe authors declare that they have no competing interests.
Author Contributions� Arohi Parlikar performed the experiments, analyzed the data, authored or revieweddrafts of the paper, and approved the final draft.
� Kishan Kalia performed the experiments, analyzed the data, prepared figures and/ortables, authored or reviewed drafts of the paper, and approved the final draft.
� Shruti Sinha performed the experiments, analyzed the data, authored or reviewed draftsof the paper, and approved the final draft.
� Sucheta Patnaik performed the experiments, analyzed the data, authored or revieweddrafts of the paper, and approved the final draft.
� Neeraj Sharma performed the experiments, analyzed the data, prepared figures and/ortables, authored or reviewed drafts of the paper, and approved the final draft.
� Sai Gayatri Vemuri performed the experiments, analyzed the data, authored or revieweddrafts of the paper, and approved the final draft.
� Gaurav Sharma conceived and designed the experiments, performed the experiments,analyzed the data, prepared figures and/or tables, authored or reviewed drafts of thepaper, and approved the final draft.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 24/31
Data AvailabilityThe following information was supplied regarding data availability:
The strains and genomes used in this research were retrieved from NCBI Virus(https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/) on 29 March 2020. This information isprovided in Table S1.
Supplemental InformationSupplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.9576#supplemental-information.
REFERENCESAhmed SF, Quadeer AA, McKay MR. 2020. Preliminary identification of potential vaccine targets
for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies.Viruses 12(3):254 DOI 10.3390/v12030254.
Alam I, Kamau A, Kulmanov M, Arold ST, Pain A, Gojobori T, Duarte CM. 2020. Functionalpangenome analysis provides insights into the origin, function and pathways to therapy ofSARS-CoV-2 coronavirus. bioRxiv DOI 10.1101/2020.02.17.952895.
Andersen KG, Rambaut A, Lipkin WI, Holmes EC, Garry RF. 2020. The proximal origin ofSARS-CoV-2. Nature Medicine 26:450–452 DOI 10.1038/s41591-020-0820-9.
Angelini MM, Akhlaghpour M, Neuman BW, Buchmeier MJ. 2013. Severe acute respiratorysyndrome coronavirus nonstructural proteins 3, 4, and 6 induce double-membrane vesicles.mBio 4(4):e00524-13 DOI 10.1128/mBio.00524-13.
Azhar EI, El-Kafrawy SA, Farraj SA, Hassan AM, Al-Saeed MS, Hashem AM, Madani TA. 2014.Evidence for camel-to-human transmission of MERS coronavirus. New England Journal ofMedicine 370(26):2499–2505 DOI 10.1056/NEJMoa1401505.
Barr JN, Fearns R. 2010. How RNA viruses maintain their genome integrity. Journal of GeneralVirology 91(6):1373–1387 DOI 10.1099/vir.0.020818-0.
Boni MF, Lemey P, Jiang X, Lam TT-Y, Perry B, Castoe T, Rambaut A, Robertson DL. 2020.Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19pandemic. bioRxiv DOI 10.1101/2020.03.30.015008.
Bouvet M, Lugari A, Posthuma CC, Zevenhoven JC, Bernard S, Betzi S, Imbert I, Canard B,Guillemot JC, Lécine P, Pfefferle S, Drosten C, Snijder EJ, Decroly E, Morelli X. 2014.Coronavirus Nsp10, a critical co-factor for activation of multiple replicative enzymes.Journal of Biological Chemistry 289(37):25783–25796 DOI 10.1074/jbc.M114.577353.
Buchfink B, Xie C, Huson DH. 2014. Fast and sensitive protein alignment using DIAMOND.Nature Methods 12(1):59–60 DOI 10.1038/nmeth.3176.
Chen Y, Liu Q, Guo D. 2020. Emerging coronaviruses: genome structure, replication, andpathogenesis. Journal of Medical Virology 92(4):418–423 DOI 10.1002/jmv.25681.
Chen Y, Su C, Ke M, Jin X, Xu L, Zhang Z, Wu A, Sun Y, Yang Z, Tien P, Ahola T, Liang Y,Liu X, Guo D. 2011. Biochemical and structural insights into the mechanisms of SARScoronavirus RNA ribose 2′-O-Methylation by nsp16/nsp10 protein complex. PLOS Pathogens7(10):e1002294 DOI 10.1371/journal.ppat.1002294.
Christiam C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009.BLAST+: architecture and applications. BMC Bioinformatics 10:421DOI 10.1186/1471-2105-10-421.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 25/31
Coleman CM, Frieman MB. 2014. Coronaviruses: important emerging human pathogens.Journal of Virology 88(10):5209–5212 DOI 10.1128/JVI.03488-13.
Cong Y, Ulasli M, Schepers H, Mauthe M, V’kovski P, Kriegenburg F, Thiel V, De Haan CAM,Reggiori F. 2019. Nucleocapsid protein recruitment to replication-transcription complexesplays a crucial role in coronaviral life cycle. Journal of Virology 94(4):e01925-19DOI 10.1128/JVI.01925-19.
Cottam EM, Whelband MC, Wileman T. 2014. Coronavirus NSP6 restricts autophagosomeexpansion. Autophagy 10(8):1426–1441 DOI 10.4161/auto.29309.
Cui J, Li F, Shi ZL. 2019. Origin and evolution of pathogenic coronaviruses. Nature ReviewsMicrobiology 17(3):181–192 DOI 10.1038/s41579-018-0118-9.
De Haan CAM, Haijema BJ, Schellen P, Schreur PW, Te Lintelo E, Vennema H, Rottier PJM.2008. Cleavage of group 1 coronavirus spike proteins: how furin cleavage is traded off againstheparan sulfate binding upon cell culture adaptation. Journal of Virology 82(12):6078–6083DOI 10.1128/JVI.00074-08.
Edgar RC. 2004.MUSCLE: multiple sequence alignment with high accuracy and high throughput.Nucleic Acids Research 32(5):1792–1797 DOI 10.1093/nar/gkh340.
Fehr AR, Perlman S. 2015. Coronaviruses: an overview of their replication and pathogenesis.In: Maier H, Bickerton E, Britton P, eds. Coronaviruses: Methods and Protocols. New York:Springer, 1–23.
Follis KE, York J, Nunberg JH. 2006. Furin cleavage of the SARS coronavirus spike glycoproteinenhances cell–cell fusion but does not affect virion entry. Virology 350(2):358–369DOI 10.1016/j.virol.2006.02.003.
Frieman M, Yount B, Heise M, Kopecky-Bromberg SA, Palese P, Baric RS. 2007. Severe acuterespiratory syndrome coronavirus ORF6 antagonizes STAT1 function by sequestering nuclearimport factors on the rough endoplasmic reticulum/golgi membrane. Journal of Virology81(18):9812–9824 DOI 10.1128/JVI.01012-07.
Fuk-Woo Chan J, Kok K-H, Zhu Z, Chu H, Kai-Wang To K, Yuan S, Yuen K-Y. 2020. Genomiccharacterization of the 2019 novel human-pathogenic coronavirus isolated from a patient withatypical pneumonia after visiting Wuhan. Emerging Microbes & Infections 9(1):221–236DOI 10.1080/22221751.2020.1719902.
Gadlage MJ, Sparks JS, Beachboard DC, Cox RG, Doyle JD, Stobart CC, Denison MR. 2010.Murine hepatitis virus nonstructural protein 4 regulates virus-induced membrane modificationsand replication complex function. Journal of Virology 84(1):280–290DOI 10.1128/JVI.01772-09.
Gorbalenya AE, Baker SC, Baric RS, De Groot RJ, Drosten C, Gulyaeva AA, Haagmans BL,Lauber C, Leontovich AM, Neuman BW, Penzar D, Perlman S, Poon LLM, Samborskiy DV,Sidorov IA, Sola I, Ziebuhr J. 2020. The species severe acute respiratory syndrome-relatedcoronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nature Microbiology5(4):536–544 DOI 10.1038/s41564-020-0695-z.
Gordon DE, Jang GM, Bouhaddou M, Xu J, Obernier K, White KM, O’Meara MJ, Rezelj VV,Guo JZ, Swaney DL, Tummino TA, Huettenhain R, Kaake RM, Richards AL,Tutuncuoglu B, Foussard H, Batra J, Haas K, Modak M, Kim M, Haas P, Polacco BJ,Braberg H, Fabius JM, Eckhardt M, Soucheray M, Bennett MJ, Cakir M, McGregor MJ, Li Q,Meyer B, Roesch F, Vallet T, Mac Kain A, Miorin L, Moreno E, Naing ZZC, Zhou Y, Peng S,Shi Y, Zhang Z, Shen W, Kirby IT, Melnyk JE, Chorba JS, Lou K, Dai SA, Barrio-Hernandez I, Memon D, Hernandez-Armenta C, Lyu J, Mathy CJP, Perica T, Pilla KB,Ganesan SJ, Saltzberg DJ, Rakesh R, Liu X, Rosenthal SB, Calviello L, Venkataramanan S,
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 26/31
Liboy-Lugo J, Lin Y, Huang X-P, Liu Y, Wankowicz SA, Bohn M, Safari M, Ugur FS, Koh C,Savar NS, Tran QD, Shengjuler D, Fletcher SJ, O’Neal MC, Cai Y, Chang JCJ,Broadhurst DJ, Klippsten S, Sharp PP, Wenzell NA, Kuzuoglu D, Wang H-Y, Trenker R,Young JM, Cavero DA, Hiatt J, Roth TL, Rathore U, Subramanian A, Noack J, Hubert M,Stroud RM, Frankel AD, Rosenberg OS, Verba KA, Agard DA, Ott M, Emerman M, Jura N,von Zastrow M, Verdin E, Ashworth A, Schwartz O, d’Enfert C, Mukherjee S, Jacobson M,Malik HS, Fujimori DG, Ideker T, Craik CS, Floor SN, Fraser JS, Gross JD, Sali A, Roth BL,Ruggero D, Taunton J, Kortemme T, Beltrao P, Vignuzzi M, García-Sastre A, Shokat KM,Shoichet BK, Krogan NJ. 2020. A SARS-CoV-2 protein interaction map reveals targets fordrug repurposing. Epub ahead of print 30 April 2020. Nature DOI 10.1038/s41586-020-2286-9.
Graham RL, Sims AC, Brockway SM, Baric RS, Denison MR. 2005. The nsp2 replicase proteinsof murine hepatitis virus and severe acute respiratory syndrome coronavirus are dispensable forviral replication. Journal of Virology 79(21):13399–13411DOI 10.1128/JVI.79.21.13399-13411.2005.
Graham RL, Sparks JS, Eckerle LD, Sims AC, Denison MR. 2008. SARS coronavirusreplicase proteins in pathogenesis. Virus Research 133(1):88–100DOI 10.1016/j.virusres.2007.02.017.
Grossoehme NE, Li L, Keane SC, Liu P, Dann CE, Leibowitz JL, Giedroc DP. 2009. CoronavirusN protein N-terminal domain (NTD) specifically binds the transcriptional regulatory sequence(TRS) and melts TRS-cTRS RNA duplexes. Journal of Molecular Biology 394(3):544–557DOI 10.1016/j.jmb.2009.09.040.
Hamre D, Procknow JJ. 1966. A new virus isolated from the human respiratory tract.Proceedings of the Society for Experimental Biology and Medicine 121(1):190–193DOI 10.3181/00379727-121-30734.
Han Q, Lin Q, Jin S, You L. 2020. Coronavirus 2019-nCoV: a brief perspective from the front line.Journal of Infection 80(4):373–377 DOI 10.1016/j.jinf.2020.02.010.
Hu D, Zhu C, Ai L, He T, Wang Y, Ye F, Yang L, Ding C, Zhu X, Lv R, Zhu J, Hassan B, Feng Y,Tan W, Wang C. 2018. Genomic characterization and infectivity of a novel SARS-likecoronavirus in Chinese bats. Emerging Microbes & Infections 7(1):1–10DOI 10.1038/s41426-018-0155-5.
Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X, Cheng Z, Yu T, Xia J,Wei Y, Wu W, Xie X, Yin W, Li H, Liu M, Xiao Y, Gao H, Guo L, Xie J, Wang G, Jiang R,Gao Z, Jin Q, Wang J, Cao B. 2020. Clinical features of patients infected with 2019 novelcoronavirus in Wuhan, China. Lancet 395(10223):497–506DOI 10.1016/S0140-6736(20)30183-5.
Issa E, Merhi G, Panossian B, Salloum T, Tokajian S. 2020. SARS-CoV-2 and ORF3a:nonsynonymous mutations, functional domains, and viral pathogenesis. mSystems5(3):e00266-20 DOI 10.1128/mSystems.00266-20.
Kendall EJC, Bynoe ML, Tyrrell DAJ. 1962. Virus isolations from common colds occurring in aresidential school. BMJ 2(5297):82–86 DOI 10.1136/bmj.2.5297.82.
Kirchdoerfer RN, Ward AB. 2019. Structure of the SARS-CoV nsp12 polymerase bound to nsp7and nsp8 co-factors. Nature Communications 10(1):1–9 DOI 10.1038/s41467-019-10280-3.
Kopecky-Bromberg SA, Martinez-Sobrido L, Frieman M, Baric RA, Palese P. 2007. Severe acuterespiratory syndrome coronavirus open reading frame (ORF) 3b, ORF 6, and nucleocapsidproteins function as interferon antagonists. Journal of Virology 81(2):548–557DOI 10.1128/JVI.01782-06.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 27/31
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018.MEGA X: molecular evolutionary geneticsanalysis across computing platforms. Molecular Biology and Evolution 35(6):1547–1549DOI 10.1093/molbev/msy096.
Lai AL, Millet JK, Daniel S, Freed JH, Whittaker GR. 2020. The SARS-CoV fusion peptide formsan extended bipartite fusion platform that perturbs membrane order in a calcium-dependentmanner. Journal of Molecular Biology 429(24):3875–3892 DOI 10.1016/j.jmb.2017.10.017.
Lau SKP, Feng Y, Chen H, Luk HKH, Yang W-H, Li KSM, Zhang Y-Z, Huang Y, Song Z-Z,Chow W-N, Fan RYY, Ahmed SS, Yeung HC, Lam CSF, Cai J-P, Wong SSY, Chan JFW,Yuen K-Y, Zhang H-L, Woo PCY, Perlman S. 2015. Severe acute respiratory syndrome (SARS)coronavirus ORF8 protein is acquired from SARS-related coronavirus from greater horseshoebats through recombination. Journal of Virology 89(20):10532–10547DOI 10.1128/JVI.01048-15.
Lechner M, Findeiß S, Steiner L, Marz M, Stadler PF, Prohaska SJ. 2011. Proteinortho: detectionof (Co-)orthologs in large-scale analysis. BMC Bioinformatics 12(1):124DOI 10.1186/1471-2105-12-124.
Lei J, Kusov Y, Hilgenfeld R. 2018. Nsp3 of coronaviruses: structures and functions of a largemulti-domain protein. Antiviral Research 149:58–74 DOI 10.1016/j.antiviral.2017.11.001.
Lescure FX, Bouadma L, Nguyen D, Parisey M, Wicky PH, Behillil S, Gaymard A,Bouscambert-DuchampM, Donati F, Le Hingrat Q, Enouf V, Houhou-Fidouh N, Valette M,Mailles A, Lucet JC, Mentre F, Duval X, Descamps D, Malvy D, Timsit JF, Lina B,Van-der-Werf S, Yazdanpanah Y. 2020. Clinical and virological data of the first cases ofCOVID-19 in Europe: a case series. Lancet Infectious Diseases 20:697–706DOI 10.1016/S1473-3099(20)30200-0.
Letunic I, Bork P. 2019. Interactive tree of life (iTOL) v4: recent updates and new developments.Nucleic Acids Research 47(W1):W256–W259 DOI 10.1093/nar/gkz239.
Liu DX, Fung TS, Chong KK-L, Shukla A, Hilgenfeld R. 2014. Accessory proteins of SARS-CoVand other coronaviruses. Antiviral Research 109:97–109 DOI 10.1016/j.antiviral.2014.06.013.
Lugari A, Betzi S, Decroly E, Bonnaud E, Hermant A, Guillemot JC, Debarnot C, Borg JP,Bouvet M, Canard B, Morelli X, Lécine P. 2010. Molecular mapping of the RNA cap2′-O-methyltransferase activation interface between severe acute respiratory syndromecoronavirus nsp10 and nsp16. Journal of Biological Chemistry 285(43):33230–33241DOI 10.1074/jbc.M110.120014.
Lv L, Li G, Chen J, Liang X, Li Y. 2020. Comparative genomic analysis revealed specific mutationpattern between human coronavirus SARS-CoV-2 and Bat-SARSr-CoV RaTG13. bioRxivDOI 10.1101/2020.02.27.969006.
Lyons DM, Lauring AS. 2017. Evidence for the selective basis of transition-to-transversionsubstitution bias in two RNA viruses. Molecular Biology and Evolution 34(12):3205–3215DOI 10.1093/molbev/msx251.
Marra MA, Jones SJM, Astell CR, Holt RA, Brooks-Wilson A, Butterfield YSN, Khattra J,Asano JK, Barber SA, Chan SY, Cloutier A, Coughlin SM, Freeman D, Girn N, Griffith OL,Leach SR, Mayo M, McDonald H, Montgomery SB, Pandoh PK, Petrescu AS, Robertson AG,Schein JE, Siddiqui A, Smailus DE, Stott JM, Yang GS, Plummer F, Andonov A, Artsob H,Bastien N, Bernard K, Booth TF, Bowness D, Czub M, Drebot M, Fernando L, Flick R,Garbutt M, Gray M, Grolla A, Jones S, Feldmann H, Meyers A, Kabani A, Li Y, Normand S,Stroher U, Tipples GA, Tyler S, Vogrig R, Ward D, Watson B, Brunham RC, Krajden M,Petric M, Skowronski DM, Upton C, Roper RL. 2003. The genome sequence of theSARS-associated coronavirus. Science 300(5624):1399–1404 DOI 10.1126/science.1085953.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 28/31
McBride R, Fielding B. 2012. The role of severe acute respiratory syndrome (SARS)-coronavirusaccessory proteins in virus pathogenesis. Viruses 4(11):2902–2923 DOI 10.3390/v4112902.
McIntosh K, Becker WB, Chanock RM. 1967. Growth in suckling-mouse brain of “IBV-like”viruses from patients with upper respiratory tract disease. Proceedings of the National Academyof Sciences of the United States of America 58(6):2268–2273 DOI 10.1073/pnas.58.6.2268.
Menachery VD, Debbink K, Baric RS. 2014. Coronavirus non-structural protein 16: evasion,attenuation, and possible treatments. Virus Research 194:191–199DOI 10.1016/j.virusres.2014.09.009.
Mousavizadeh L, Ghasemi S. 2020. Genotype and phenotype of COVID-19: their roles inpathogenesis. Epub ahead of print 31 March 2020. Journal of Microbiology, Immunology andInfection DOI 10.1016/j.jmii.2020.03.022.
Peiris JSM. 2012. Coronaviruses. Medical Microbiology (Eighteenth Edition) 12:587–593DOI 10.1016/B978-0-7020-4089-4.00072-X.
Pritchard L, Glover RH, Humphris S, Elphinstone JG, Toth IK. 2016. Genomics and taxonomyin diagnostics for food security: soft-rotting enterobacterial plant pathogens. Analytical Methods8(1):12–24 DOI 10.1039/C5AY02550H.
Rehman S, Shafique L, Ihsan A, Liu Q. 2020. Evolutionary trajectory for the emergence of novelcoronavirus SARS-CoV-2. Pathogens 9(3):240 DOI 10.3390/pathogens9030240.
Sanjuán R, Nebot MR, Chirico N, Mansky LM, Belshaw R. 2010. Viral mutation rates.Journal of Virology 84(19):9733–9748 DOI 10.1128/JVI.00694-10.
Schaecher SR, Mackenzie JM, Pekosz A. 2007. The ORF7b protein of severe acute respiratorysyndrome coronavirus (SARS-CoV) is expressed in virus-infected cells and incorporated intoSARS-CoV particles. Journal of Virology 81(2):718–731 DOI 10.1128/JVI.01691-06.
Simmonds P, Adams MJ, Benkő Mária, Breitbart M, Brister JR, Carstens EB, Davison AJ,Delwart E, Gorbalenya AE, Harrach B, Hull R, King AMQ, Koonin EV, Krupovic M,Kuhn JH, Lefkowitz EJ, Nibert ML, Orton R, Roossinck MJ, Sabanadzovic S, Sullivan MB,Suttle CA, Tesh RB, Van der Vlugt RA, Varsani A, Zerbini FM. 2017. Virus taxonomy in theage of metagenomics. Nature Reviews Microbiology 15(3):161–168DOI 10.1038/nrmicro.2016.177.
Siu K, Yuen K, Castano-Rodriguez C, Ye Z, Yeung M, Fung S, Yuan S, Chan C, Yuen K,Enjuanes L, Jin D. 2019. Severe acute respiratory syndrome coronavirus ORF3a proteinactivates the NLRP3 inflammasome by promoting TRAF3-dependent ubiquitination of ASC.FASEB Journal 33(8):8865–8877 DOI 10.1096/fj.201802418R.
Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of largephylogenies. Bioinformatics 30(9):1312–1313 DOI 10.1093/bioinformatics/btu033.
Stobart CC, Sexton NR, Munjal H, Lu X, Molland KL, Tomar S, Mesecar AD, Denison MR.2013. Chimeric exchange of coronavirus nsp5 proteases (3CLpro) identifies common anddivergent regulatory determinants of protease activity. Journal of Virology 87(23):12611–12618DOI 10.1128/JVI.02050-13.
Subissi L, Imbert I, Ferron F, Collet A, Coutard B, Decroly E, Canard B. 2014. SARS-CoVORF1b-encoded nonstructural proteins 12–16: replicative enzymes as antiviral targets.Antiviral Research 101:122–130 DOI 10.1016/j.antiviral.2013.11.006.
Surjit M, Lal SK. 2008. The SARS-CoV nucleocapsid protein: a protein with multifarious activities.Infection, Genetics and Evolution 8(4):397–405 DOI 10.1016/j.meegid.2007.07.004.
Tan YJ, Lim SG, Hong W. 2005. Characterization of viral proteins encoded by theSARS-coronavirus genome. Antiviral Research 65(2):69–78 DOI 10.1016/j.antiviral.2004.10.001.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 29/31
Tanaka T, Kamitani W, DeDiego ML, Enjuanes L, Matsuura Y. 2012. Severe acute respiratorysyndrome coronavirus nsp1 facilitates efficient propagation in cells through a specifictranslational shutoff of host mRNA. Journal of Virology 86(20):11128–11137DOI 10.1128/JVI.01700-12.
Taylor JK, Coleman CM, Postel S, Sisk JM, Bernbaum JG, Venkataraman T, Sundberg EJ,Frieman MB. 2015. Severe acute respiratory syndrome coronavirus ORF7a inhibits bonemarrow stromal antigen 2 virion tethering through a novel mechanism of glycosylationinterference. Journal of Virology 89(23):11820–11833 DOI 10.1128/JVI.02274-15.
Te Velthuis AJW, Van den Worm SHE, Snijder EJ. 2012. The SARS-coronavirus nsp7+nsp8complex is a unique multimeric RNA polymerase capable of both de novo initiation and primerextension. Nucleic Acids Research 40(4):1737–1747 DOI 10.1093/nar/gkr893.
Tok TT, Tatar G. 2017. Structures and functions of coronavirus proteins : molecular modeling ofviral nucleoprotein. International Journal of Virology & Infectious Diseases 2:1–7.
Tyrrell DA, Bynoe ML. 1966. Cultivation of viruses from a high proportion of patients with colds.Lancet 1(7428):76–77 DOI 10.1016/S0140-6736(66)92364-6.
Woo PCY, Huang Y, Lau SKP, Tsoi HW, Yuen KY. 2005. In silico analysis of ORF1ab incoronavirus HKU1 genome reveals a unique putative cleavage site of coronavirus HKU1 3C-likeprotease. Microbiology and Immunology 49(10):899–908DOI 10.1111/j.1348-0421.2005.tb03681.x.
Woo PCY, Huang Y, Lau SKP, Yuen KY. 2010. Coronavirus genomics and bioinformaticsanalysis. Viruses 2(8):1805–1820 DOI 10.3390/v2081803.
Xu J, Zhao S, Teng T, Abdalla AE, Zhu W, Xie L, Wang YGX. 2020. Systematic comparison oftwo animal-to-human transmitted human coronaviruses: SARS-CoV-2 and SARS-CoV. Viruses12(2):244 DOI 10.3390/v12020244.
Yuen KS, Ye ZW, Fung SY, Chan CP, Jin DY. 2020. SARS-CoV-2 and COVID-19: the mostimportant research questions. Cell & Bioscience 10:40 DOI 10.1186/s13578-020-00404-4.
Zaki AM, Van Boheemen S, Bestebroer TM, Osterhaus ADME, Fouchier RAM. 2012. Isolationof a novel coronavirus from a man with pneumonia in Saudi Arabia. New EnglandJournal of Medicine 367(19):1814–1820 DOI 10.1056/NEJMoa1211721.
Zeng Z, Deng F, Shi K, Ye G, Wang G, Fang L, Xiao S, Fu Z, Peng G. 2018. Dimerization ofcoronavirus nsp9 with diverse modes enhances its nucleic acid binding affinity.Journal of Virology 92(17):e00692-18 DOI 10.1128/JVI.00692-18.
Zeng C, Wu A, Wang Y, Xu S, Tang Y, Jin X, Wang S, Qin L, Sun Y, Fan C, Snijder EJ,Neuman BW, Chen Y, Ahola T, Guo D. 2016. Identification and characterization of a ribose2′-O-Methyltransferase encoded by the ronivirus branch of nidovirales. Journal of Virology90(15):6675–6685 DOI 10.1128/JVI.00658-16.
Zhang T, Wu Q, Zhang Z. 2020. Probable pangolin origin of SARS-CoV-2 associated with theCOVID-19 outbreak. Current Biology 30(7):1346–1351.e2 DOI 10.1016/j.cub.2020.03.022.
Zhong NS, Zheng BJ, Li YM, Poon LLM, Xie ZH, Chan KH, Li PH, Tan SY, Chang Q, Xie JP,Liu XQ, Xu J, Li DX, Yuen KY, Peiris JSM, Guan Y. 2003. Epidemiology and cause of severeacute respiratory syndrome (SARS) in Guangdong. People’s Republic of China, in February362:1353–1358 DOI 10.1016/S0140-6736(03)14630-2.
Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, Si H-R, Zhu Y, Li B, Huang C-L,Chen H-D, Chen J, Luo Y, Guo H, Jiang R-D, Liu M-Q, Chen Y, Shen X-R, Wang X,Zheng X-S, Zhao K, Chen Q-J, Deng F, Liu L-L, Yan B, Zhan F-X, Wang Y-Y, Xiao G-F,Shi Z-L. 2020. A pneumonia outbreak associated with a new coronavirus of probable bat origin.Nature 579(7798):270–273 DOI 10.1038/s41586-020-2012-7.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 30/31
Zhu X, Fang L, Wang D, Yang Y, Chen J, Ye X, Foda MF, Xiao S. 2017. Porcine deltacoronavirusnsp5 inhibits interferon-$β$ production through the cleavage of NEMO. Virology 502:33–38DOI 10.1016/j.virol.2016.12.005.
Zhu N, Zhang D, WangW, Li X, Yang B, Song J, Zhao X, Huang B, Shi W, Lu R, Niu P, Zhan F,Ma X, Wang D, Xu W, Wu G, Gao GF, Tan W. 2020. A novel coronavirus from patients withpneumonia in China, 2019. New England Journal of Medicine 382(8):727–733DOI 10.1056/NEJMoa2001017.
Parlikar et al. (2020), PeerJ, DOI 10.7717/peerj.9576 31/31