Coronavirus SARS-CoV-2: Analysis of subgenomic mRNA transcription, 3CLpro and PL2pro protease cleavage sites and protein synthesis
Corresponding author: Miguel Ramos-Pascual

Abstract

Coronaviruses have recently caused severe worldwide outbreaks: SARS (Severe Acute Respiratory Syndrome) in 2002 and MERS (Middle-East Respiratory Syndrome) in 2012. At the end of 2019, a new coronavirus outbreak appeared in Wuhan (China), with a seafood market as the first focus of infection, becoming a pandemic in 2020 and spreading mainly into Europe and Asia [Zu et al 2020]. Although the virus family is well known and the symptoms are similar to those of other coronaviruses (fever, pneumonia, small pleural effusions), this specific virus type presents considerable differences, such as higher transmission and mortality rates, which make it a challenge for diagnostic methods, treatments and vaccines.

Coronavirus.pro (SARS-CoV-2) App is a module of Virus.pro, a C++ application which simulates the Coronavirus replication cycle. This software identifies virus types in a short time and provides FASTA files of virus proteins, a list of RNA sequences (regulatory, packaging, transcription and translation) and secondary structures (stem-loops, helices, palindromes, mirrors), once the virus genome has been sequenced. The code is supported by other bioinformatics tools, such as the Vienna RNA package, Varna software and ClustalW2.

Coronavirus.pro has identified the 2019-nCoV virus as a beta-coronavirus more closely related to the SARS type than to MERS. However, it presents significant differences, such as the spike glycoprotein precursor, characteristic of this virus type, and the increased number of transcription regulating sequences (TRS), producing more subgenomic mRNAs and synthesizing more fusion proteins than SARS/MERS. This could be related to its more severe health effects (toxicity) on host patients compared with other coronaviruses.

The software has identified a list of structural, non-structural and accessory proteins in the 2019-nCoV virus genome similar to those of SARS and MERS. It has also found several ORFs encoding accessory proteins with unknown TRS (i.e. AP3b, AP9b, AP11, AP12 and AP14a/b). Furthermore, there is a subgenomic mRNA, the shortest with 374bp, which translates no proteins and is specific only to the SARS type virus. Finally, there are some accessory proteins, AP2 in SARS/MERS and AP2a/b in 2019-nCoV, encoded before ORF1.2 and ORF1.4 respectively, which have not been previously reported.

2019-nCoV protein sequences have been compared with those from SARS and MERS. As 3CLpro (nsp5) and RdRp (nsp12) have >90% similarity with SARS, some antiviral drugs effective against SARS coronavirus, such as protease inhibitors or RNA-dependent RNA polymerase inhibitors, could also be effective against this virus type. Nevertheless, further comparisons would be required, including other types of estimators.

These results are useful as a first step, together with other bioinformatics and pharmacological tools, in order to develop diagnostic methods (real time RT-PCR or ELISA tests), new vaccines or antiviral drugs which block virus replication at any stage: fusion inhibitors, RdRp inhibitors and PL2pro/3CLpro protease inhibitors.

Keywords: SARS-CoV, MERS-CoV, 2019-nCoV, Coronavirus, virus proteins, protease cleavage sites
1. Introduction

Coronaviruses (CoVs) are viruses that cause diseases in mammals and birds, including humans, with symptoms such as enteritis in bats, mice and pigs, and upper respiratory malfunctions and potentially lethal respiratory infections in humans [Fehr and Perlman 2015]. A large variety of coronaviruses has been previously studied and analyzed. These viruses are responsible for 2-10% of common colds in immunocompetent individuals (i.e. the 229E, OC43, NL63 and HKU1 types). However, other types can cause severe respiratory syndromes, such as SARS-CoV (Severe Acute Respiratory Syndrome Coronavirus), which caused an epidemic in 2002-2003 originating in Guangdong (China) [Vijayanand et al 2004], and MERS-CoV (Middle East Respiratory Syndrome Coronavirus), which spread in 2014-2015 through Saudi Arabia into Egypt, Oman and Qatar, from bats to dromedary camels, the source of infection in humans [Aleanizy et al 2017].

At the end of 2019, a new SARS outbreak appeared in Wuhan (China), with a seafood market as the first focus of infection, becoming a pandemic in 2020 and spreading mainly into Europe and Asia, with a general state of alarm declared in several countries, such as Spain and Italy [Liao et al 2020, Zu et al 2020, Giovanetti et al 2020].

Coronavirus infections normally have low case fatality rates, with symptoms more severe than the common cold, affecting mainly the respiratory tract (ciliated epithelium of the trachea, nasal mucosa and alveolar cells of the lung). Although the virus family is well known and the symptoms are similar to those of other coronaviruses (fever, pneumonia, small pleural effusions), this specific type of virus presents considerable differences, such as higher infection/transmission and mortality rates, which make it a challenge for disease protection, prevention, diagnostic methods, vaccines and treatments [Wu et al 2020, Zhu et al 2020].

Virus pharmacology is based on preventive actions (vector-based or RBD-based vaccines), diagnostic methods (real time RT-PCR or ELISA tests) and antiviral drugs. The SARS-CoV and MERS-CoV epidemics have expanded the use of several drugs, especially virus cycle inhibitors against coronaviruses (fusion inhibitors, RdRp inhibitors or PL2pro/3CLpro protease inhibitors) [Li G and De Clercq 2019, Raoult et al 2020].

In order to develop these methods, the virus replication cycle must be simulated with computational tools, especially for this virus family, which has a complex replication cycle. Coronaviruses synthesize in a first stage a viral RNA-dependent RNA polymerase (RdRp) and multiple proteases, transcribe several subgenomic mRNAs and translate them progressively into viral proteins through several ribosomal pathways: (-1) programmed frameshift, leaky scanning and internal ribosome entry site (IRES) [Plant et al 2005]. Some mRNAs include genes encoding large polypeptide chains, which are cleaved by the 3CLpro and PL2pro proteases, producing non-structural proteins that act as enzymes catalyzing the assembly and packaging of new viruses [Sawicki et al 2007, Fehr and Perlman 2015, Oxford et al. 2016].

Coronavirus.pro (2019-nCoV) App is a C++ code which simulates the Coronavirus replication cycle. This software identifies virus types in a short time and provides FASTA files of virus proteins, a list of RNA sequences (regulatory, packaging, transcription and translation) and secondary structures (stem-loops, helices, palindromes, mirrors), once the virus genome has been sequenced [Ramos-Pascual 2019]. The code is supported by other RNA analysis tools, such as the Vienna RNA package and Varna software [Gruber et al 2008, Darty et al 2009]. These results are useful as a first step, together with other bioinformatics and pharmacological tools, in order to develop diagnostic methods, new vaccines and antiviral drugs.
2. The Coronavirus: classification, structure, genome and virus cycle

2.1 Classification

Coronaviruses (CoVs) belong to the order Nidovirales, family Coronaviridae, and are divided into several groups: alpha-, beta-, gamma- and delta-coronavirus. Viruses 229E and OC43, the first to be isolated and responsible for the common cold, belong to the alpha-coronavirus (229E) and beta-coronavirus (OC43) groups, while SARS and MERS are also beta-coronaviruses.

2.2 Structure of the virion

Coronaviruses have diameters from 100 to 160 nm, with very large, heavily glycosylated spikes (S) of 200 kDa and 20 nm, placed around the virus membrane like a crown, hence their name, in a trimer configuration (fig. 1). The viral RNA genome is encapsidated in a helical nucleocapsid phosphoprotein (N), also known as ribonucleoprotein (RNP), and enveloped in a virus particle with different membrane (M) and envelope (E) glycoproteins.

Some coronaviruses also include a hemagglutinin-esterase glycoprotein (HE) in the outer membrane. HE helps during attachment, destroying certain sialic acid receptors on the host-cell surface. Not all strains show hemagglutination, which is observed only in beta-coronaviruses of subgroup 2a (HCoV-HKU1, MHV) and toroviruses (BToV) [Brian et al. 1995, de Groot 2006].

Fig. 1 - Structure of a general Coronavirus virion particle
5' UTR (ORF1.1, ORF1.2 ...), whereas at the 3' UTR are the genes for structural proteins (N, M, E and S). These genes are interspaced with several accessory genes, encoding accessory proteins (AP), whose number is characteristic of each virus type. Some of these AP are not essential for in vitro or in vivo replication.

As transcription starts at a different TRS in each subgenomic mRNA, the number of ORF genes varies with virus type, and therefore so does the number of polypeptide chains. This produces different frequencies of non-structural proteins during the virus cycle. For example, SARS produces ORF1.1 and ORF1.2 genes, whereas 2019-nCoV produces ORF1.1 to ORF1.6 genes, synthesizing several groups of fusion proteins [van Boheemen et al. 2012]. A (-1) programmed slippery ribosome frameshift is placed approximately in the middle of the ORF1 genes, which are then translated into polypeptides pp1a and pp1ab [Dinman 2012, Bock et al 2019]. Furthermore, coronaviruses use a leaky scanning mechanism (shunting) to synthesize proteins from overlapping ORFs, translating different proteins from the same mRNA [Nakagawa et al 2016].

The surface glycosylated spike (S) is processed in some coronaviruses by proteolytic cleavage of a spike precursor [Belouzard et al 2009]. The number of spike precursors is characteristic of each coronavirus. For example, in the case of SARS-CoV, two spike precursors (Sp1 and Sp2) are proteolytically cleaved, producing two surface glycosylated spikes (S1 and S2) and a protease fragment (S0).

Figure 2 shows a scheme of the main genes in the Coronavirus family, including ORF1.1 with a (-1) slippery ribosome frameshift. Genes transcribed from different TRS are placed on another line. Figure 3 presents a scheme of the ORF1.1 gene and non-structural proteins (nsp1 to nsp16), including accessory protein AP2. Tables 1 and 2 summarize the main genes and proteins of the SARS-CoV and MERS-CoV coronaviruses.
[Figure 2: genome maps (5' end to poly(A) tail, An) of SARS-CoV, HKU1-CoV and MERS-CoV, each showing ORF1.1a and ORF1.1b separated by the (-1) slippery ribosome frameshift (translated into pp1a and pp1ab), followed by the structural genes (spike or spike precursors, E, M, N; HE in HKU1-CoV) and the interspersed accessory genes]
Fig. 2 - Scheme of the genes in viral genomes from the Coronavirus family: non-structural proteins (white), structural (green), accessory (blue) and other ORF (grey). Fusion proteins and subgenomic mRNAs are not depicted.

Fig. 3 - Scheme of the ORF1.1 gene and description of the non-structural proteins (nsp1 to nsp16) of SARS-CoV (SWISS Model). Accessory protein AP2 has been included.

Table 1 - Summary of the main genes and characteristics of SARS-CoV [Xu et al 2003, Liu et al 2014] and MERS-CoV [Li et al 2019]

Type | Coding genes | Protein | Description
Non-structural | ORF 1.1 | pp1a | Polyprotein 1a
 |  | pp1ab | Polyprotein 1ab, [-1] PRF
Structural and accessory ORF | ORF 1.2 | AP2 | Accessory protein AP2, unknown
 |  | pp1a | Polyprotein 1a
 |  | pp1ab | Polyprotein 1ab, [-1] PRF
 | ORF2 | Sp (a) | Surface glycosylated spike precursor (Sp)
 |  | S | Surface glycosylated spike (S)
 |  | S0 | Spike protease fragment (S0)
 | ORF3a | AP3a | Viral pathogenesis, apoptosis induction, cell cycle arrest, modulation of NF-kB-mediated inflammation
 | ORF3b | AP3b (b) | IRES translation, viral pathogenesis, not required for SARS-CoV replication
 | ORF4 | E | Envelope membrane
 | ORF5 | M | Transmembrane glycoprotein
 |  | AP5 | Unknown, only MERS
 | ORF6 | AP6 | Type I IFN production and signaling inhibition, only SARS
 | ORF7 | AP7a/b | Viral pathogenesis, apoptosis induction, cell cycle arrest, modulation of NF-kB-mediated inflammation
 | ORF8 | AP8a/b (b) |
 | ORF9 | N | Nucleocapsid phosphoprotein
 | ORF9b | AP9b (b) | Viral pathogenesis, apoptosis induction, cell cycle arrest, modulation of NF-kB-mediated inflammation, named AP8b in MERS
 | ORF11 | AP11 (b) | Unknown, only SARS
 | ORF14 | AP14a/b (b) | unknown

(a) Spike precursor length and number depend on virus type (Sp=S+S0)
(b) TRS unknown

Table 2 - Description of non-structural proteins (Polyprotein pp1ab) of SARS and MERS coronavirus [Chen et al 2020]

Protease | Protein | Comments
PL2pro | nsp1 | Leader protein, suppresses antiviral host response, promotes degradation of host mRNAs, inhibiting IFN signaling
 | nsp2 | unknown
 | nsp3 | ADP-ribose 1-phosphatase, PL2pro (papain-like protease 2)
 | nsp3a | unknown
 | nsp3b | unknown
3CLpro | nsp4 | DMV formation, complex with nsp3
 | nsp5 | 3C-like (3CLpro), Mpro, polypeptide cleaving
 | nsp6 | Restricting autophagosome expansion, DMV formation
 | nsp7 | Cofactor with nsp8 and nsp12
 | nsp8a | DNA primase, cofactor with nsp7 and nsp12
 | nsp8b |
 | nsp9 | Dimerization and RNA/DNA binding activity
 | nsp10 | Interacts with nsp14 and nsp16 [Bouvet et al 2010, 2012]
 | nsp11 | Short peptide at pp1a end
 | nsp12 | RNA-dependent RNA polymerase (RdRp)
 | nsp13a | Helicase, NTPase nucleoside 5' triphosphatase (ZD, NTPase/HEL)
 | nsp13b |
 | nsp14 | 3'-to-5' exoribonuclease (nuclease ExoN homolog)
 | nsp15 | Endoribonuclease (endoRNAse), evasion of dsRNA sensors
 | nsp16 | S-adenosylmethionine-dependent ribose 2'-O-methyltransferase (2'-O-MT)
Figure 4 - Scheme of a Coronavirus replication cycle: [1] attachment and fusion (APN/ACE2/DPP4), [2] endocytosis, [3] translation of vRNA (ORF1.1), [4] assembly of RNA-dependent RNA polymerase (RdRp) and other non-structural proteins (nsp) by proteases, [5] transcription of subgenomic mRNAs by RdRp, [6] translation of subgenomic mRNAs and protein synthesis, [7] assembly in the membranous ERGIC regions and [8] fusion with the plasma membrane and exit (BioRender http://app.biorender.io)
3. Coronavirus.pro

3.1 Description

Coronavirus.pro is a module of Virus.pro, a modular C++ software application that simulates mainly RNA and DNA virus replication cycles: Ebola, HIV-1, HCV (Hepatitis C), CoV, HSV1 (Human Herpes Virus), PV1 (Poliovirus 1, Mahoney). The software reproduces several stages of the virus replication cycle, from attachment and fusion to virion exit from the host cell, focusing on the more complex stages, such as subgenomic mRNA translation, protein synthesis and protease catalytic processing (see fig. 5 and 6).

Virus.pro contains a set of RNA/DNA databases and protein databases to scan viral genome and protein sequences for recognized motifs, reconstruct secondary structures (helices, stem-loops, palindromes, mirrors) and identify RNA-protein interaction regions. The software is supported by other applications, such as the Vienna RNA package for dot-bracket notation and Varna for plotting (see fig. 7) [Gruber et al 2008, Darty et al 2009]. The code also contains machine-learning algorithms, through which new viruses, RNA/DNA sequences and proteins can be added to the internal databases for future identification and analysis. The software has been validated against other bioinformatics tools, such as Blastp or Swiss-Model [Altschul et al 1990, Camacho et al 2008, Waterhouse et al 2018, Ramos-Pascual 2019].
Fig. 5 - Software Virus.pro for simulating RNA/DNA virus replication cycles
Fig. 6 - Software Coronavirus.pro for simulating SARS/MERS/2019-nCoV virus replication cycle
Fig. 7 - Scheme of the Virus.pro software (RNA module)
Coronavirus.pro includes a preprocessor that converts a viral genome sequence file into a plain sequence format (a list of nucleotides). The preprocessor supports genomes in the following formats: FASTA, EMBL, GenBank, Stockholm 1.0 and GCG (see fig. 8).
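The preprocessing step described above can be illustrated for the FASTA case with a minimal sketch. It is not the Coronavirus.pro implementation: the function names readFastaSequence and fastaToPlain are hypothetical, only the first record of the file is read, and every non-letter character (digits, spaces, line endings, gap symbols) is assumed to be safe to discard.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Coronavirus.pro): reads the first FASTA
// record from a stream and returns its sequence as one uppercase string.
std::string readFastaSequence(std::istream& in) {
    std::string line, sequence;
    bool inRecord = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '>') {
            if (inRecord) break;  // a second header starts: stop after the first record
            inRecord = true;      // header line itself carries no nucleotides
            continue;
        }
        if (!inRecord) continue;  // ignore anything before the first header
        for (unsigned char ch : line) {
            // Assumption: only letters are sequence symbols; everything else is dropped.
            if (std::isalpha(ch)) sequence += static_cast<char>(std::toupper(ch));
        }
    }
    return sequence;
}

// Convenience wrapper for FASTA text already held in memory.
std::string fastaToPlain(const std::string& fastaText) {
    std::istringstream in(fastaText);
    return readFastaSequence(in);
}
```

With a made-up two-line record, fastaToPlain(">example\nacgt\nTTGA\n") yields the plain nucleotide list ACGTTTGA. The other formats listed (EMBL, GenBank, Stockholm 1.0, GCG) would each need their own header and numbering rules, which are not sketched here.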
Fig. 8 - Software for preprocessing viral genome sequence file. Supported formats: FASTA, EMBL, GenBank, Stockholm 1.0 and GCG

pro) and a cysteine 3C-like proteinase (3CLpro), synthesized from nsp3 and nsp5, respectively [Chen et al 2005].

Coronavirus.pro simulates the proteolytic effect of the coronavirus proteases PL2pro and 3CLpro through multiple protease pattern sequences (table 3). Most of these sequences have been previously validated in research studies and others have been proposed by comparison with protein databases, such as UniProt or NCBI, and recursive simulations with the software [Kiemer et al 2004, Sulea et al 2006, Ramos-Pascual 2019].

Table 3 - Protease cleavage site sequences for Coronavirus proteins (SARS-CoV and MERS-CoV)

PCS | Sequence | SARS-CoV | MERS-CoV
[1.1] | VSQIQ↓SRLT | S1/S2-S0 | -
[1.2] | GKIQD↓SLSST | S1/S2-S0 | -
[1.3] | GAMQT↓GFTTT | - | S1/S2-S0
[2] | YPKLQ↓ASQAW | M1-M2 | -
[3] | SNNLQ↓GLEN | - | N1-N2
[4] | ETRVQ↓CSTN | - | N2-N3
[5] | ELNGG↓AVTRY | nsp1-nsp2 | -
[6] | DPKGK↓YAQNL |  | nsp1-nsp2
[7] | RLKGG↓APIKG | nsp2-nsp3 | nsp2-nsp3
[8] | KSSVQ↓SVAG | nsp3-nsp3a | -
[9] | KNTVK↓SVGKF | nsp3-nsp3a | -
[10] | AQGLK↓KFYKE | - | nsp3-nsp3a
[11/4] | ETRVQ↓CSTN | nsp3a-nsp3b | nsp3a-nsp3b
[12] | SLKGG↓KIVST | nsp3b-nsp4 | -
[13] | KIVGG↓APTWF | - | nsp3b-nsp4
[14] | SAVLQ↓SGFRK | nsp4-nsp5 | nsp4-nsp5
[15] | GVTFQ↓GKFK | nsp5-nsp6 | -
[16] | GVVMQ↓SGVRK | - | nsp5-nsp6
[17] | VATVQ↓SKMSD | nsp6-nsp7 | -
[18] | VATLQ↓AENV | nsp7-nsp8a | -
[19] | VAAMQ↓SKLTD | nsp8a-nsp8b | nsp6-nsp7
[20] | HSVLQ↓APMST | - | nsp7-nsp8a
[21] | AVKLQ↓NNELS | nsp8b-nsp9 | nsp8a-nsp9
[22] | TVRLQ↓AGNAT | nsp9-nsp10 | nsp9-nsp10
[23] | EPLMQ↓SADA | nsp10-nsp11/nsp12 | -
[24] | ALPQS↓KDSNF | - | nsp10-nsp11/nsp12
[25] | HTVLQ↓AVGAC | nsp12-nsp13a | nsp12-nsp13a
[26/18] | VATLQ↓AENV | nsp13a-nsp14 | nsp13a-nsp13b
[27] | YKLQS↓QIVTG | - | nsp13b-nsp14
[28] | FTRLQ↓SLENV | nsp14-nsp15 | -
[29] | TKVQG↓LENIA | - | nsp14-nsp15
[30/2] | YPKLQ↓ASQAW | nsp15-nsp16 | nsp15-nsp16
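Cleavage sites are identified with a coarse approximation in which each protease cleavage sequence (A) is scanned along each protein amino acid sequence (B). If ka and kb are respectively the amino acid lengths of sequence A and protein B, the protein is cleaved at the positions with the lowest Levenshtein distance (the closest match to the cleavage sequence), calculated as:

```cpp
#include <algorithm>
#include <string>
#include <vector>
// Levenshtein distance between sequence a (length ka) and sequence b (length kb)
int levenshtein(const std::string& a, const std::string& b) {
    const int ka = static_cast<int>(a.size()), kb = static_cast<int>(b.size());
    std::vector<std::vector<int>> d(ka + 1, std::vector<int>(kb + 1, 0));
    for (int i = 1; i <= ka; i++) d[i][0] = i;  // first column: i deletions
    for (int j = 1; j <= kb; j++) d[0][j] = j;  // first row: j insertions
    for (int i = 1; i <= ka; i++)
        for (int j = 1; j <= kb; j++) {
            int c = (a[i - 1] == b[j - 1]) ? 0 : 1;  // substitution cost
            d[i][j] = std::min({d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + c});
        }
    return d[ka][kb];
}
```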
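The scanning step described above can be sketched as a sliding window. This is a minimal illustration under stated assumptions, not the Coronavirus.pro code: the names editDistance, CleavageHit and findCleavageSite are hypothetical, the pattern is split into the residues before and after the ↓ mark, the window has the same length as the pattern, and the best site is taken to be the window with the smallest edit distance.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein (edit) distance between two amino acid strings, two-row version.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical result type: where the protein would be cut and how well the
// pattern matched there.
struct CleavageHit {
    std::size_t cutPosition;  // number of residues that precede the cut
    int distance;             // edit distance of the best window (0 = exact match)
};

// Hypothetical helper: slides the pattern upstream+downstream along the protein
// and keeps the first window with the smallest distance. If the protein is
// shorter than the pattern, distance stays at pattern length + 1 (no hit).
CleavageHit findCleavageSite(const std::string& protein,
                             const std::string& upstream,
                             const std::string& downstream) {
    const std::string pattern = upstream + downstream;
    CleavageHit best{0, static_cast<int>(pattern.size()) + 1};
    for (std::size_t start = 0; start + pattern.size() <= protein.size(); ++start) {
        const int d = editDistance(pattern, protein.substr(start, pattern.size()));
        if (d < best.distance) best = {start + upstream.size(), d};
    }
    return best;
}
```

For a made-up protein MKTAYSAVLQSGFRKMAF and pattern [14] SAVLQ↓SGFRK, the sketch reports a cut after residue 10 with distance 0. A real implementation would probably also apply a distance threshold so that poorly matching windows are not cleaved; the source does not state one, so none is assumed here.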
4. Results and discussion

4.1 Virus identification: comparison with SARS-CoV and MERS-CoV

Coronavirus.pro has been used with sequence MN908947 (Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1, complete genome). This sequence is specific to the virus type that caused the outbreak in Wuhan, Hubei province (China), the first infection focus (23-jan-2020), which has been named 2019-nCoV [NCBI database].

The code has also been applied to other coronavirus types, such as sequences NC004718 (SARS coronavirus, complete genome) and NC019843 (MERS Middle East respiratory syndrome coronavirus, complete genome) [Snijder et al 2003, Moreno et al 2017], which have been taken as reference sequences. Other sequences have also been used to compare similarity with the 2019-nCoV virus genome (see table 4).
Table 4 - Summary of Coronavirus sequence files applied to Coronavirus.pro software
Virus | Sequence | Date | Description | bp | Comments
2019-nCoV | MN908947 | 23-JAN-2020 | Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1 (2019-nCoV) | 29903 | [ref]
MERS-CoV | NC019843 | 13-AUG-2018 | Middle East respiratory syndrome coronavirus | 30119 | [ref]
SARS-CoV | NC004718 | 13-AUG-2018 | SARS-CoV coronavirus | 29751 | [ref]
 | KY417149 | 18-DEC-2017 | Bat SARS-like coronavirus isolate Rs4255 | 29743 | -
 | AY278488 | 01-SEP-2009 | SARS coronavirus BJ01 isolate genome sequence | 29725 | -
Fig. 9 - MERS-CoV [1-356bp] - Scheme of the 5' UTR secondary structures (SL1-SL5C). Labelled elements: SL1, SL2, TRS-L [53-78bp], SL4, SL4b, SL5A, SL5B, SL5C, RS-I [32bp], RS-II, NRE-II (196bp)

Fig. 10 - SARS-CoV [1-300bp] - Scheme of the 5' UTR secondary structures (SL1-SL5C). Labelled elements: SL1, SL2, TRS-L [58-80bp], SL4, SL5A, SL5B, SL5C, NRE-I (101bp), NRE-II (153bp), NRE-III (250bp), RS-I, RS-II, RS-III

Fig. 11 - 2019-nCoV [1-300bp] - Scheme of the 5' UTR secondary structures (SL1-SL5C). Labelled elements: SL1, SL2, TRS-L [61-83bp], SL4, SL4b, SL5A, SL5B, SL5C, NRE-II (154bp), NRE-III (251bp), RS-I, RS-II

Fig. 12 - Scheme of the 3' UTR secondary structures [1-100bp] of SARS-CoV, 2019-nCoV and MERS-CoV: stem-loop (SL) and Poly(A) tail
Proteins | PCS SARS-CoV | PCS 2019-nCoV | PCS MERS-CoV
Structural proteins | | |
S1 & S2 / S0 | [1.1] SQIQE↓SLTTT | [1.2] GKIQD↓SLSST | [1.3] GAMQT↓GFTTT
M1 / M2 | [2] YKLGA↓SQRVG | [2] YKLGA↓SQRVA | -
N1 / N2 | - | - | [3] NRLQA↓LESGK
N2 / N3 | - | - | [4] QRVQG↓SITQR
Non-structural proteins | | |
nsp1 / nsp2 | [5] ELNGG↓AVTRY | [5] ELNGG↓AYTRY | [6] DPKGK↓YAQNL
nsp2 / nsp3 | [7] RLKGG↓APIKG | [7] LKGGA↓PTKVT | [7] RLKGG↓APVKK
nsp3 / nsp3a | [8] NSVKS↓VAKLC | [9] KNTVK↓SVGKF | [10] AQGLK↓KFYKE
nsp3a / nsp3b | [11/4] TRVEC↓TTIVN | [11/4] TRVEC↓TTIVN | [11/4] TRVEA↓STVVC
nsp3b / nsp4 | [12] SLKGG↓KIVST | [12] ALKGG↓KIVNN | [13] KIVGG↓APTWF
nsp4 / nsp5 | [14] SAVLQ↓SGFRK | [14] SAVLQ↓SGFRK | [14] GVLQS↓GLVKM
nsp5 / nsp6 | [15] GVTFQ↓GKFKK | [15] VTFQS↓AVKRT | [16] GVVMQ↓SGVRK
nsp6 / nsp7 | [17] VATVQ↓SKMSD | [17] VATVQ↓SKMSD | [19] VAAMQ↓SKLTD
nsp7 / nsp8a | [18] ATLQA↓IASEF | [18] ATLQA↓IASEF | [20] SVLQA↓TLSEF
nsp8a / nsp8b | [19] AAMQR↓KLEKM | [19] AAMQR↓KLEKM | [21] AVKLQ↓NNEIK
nsp8b / nsp9 | [21] AVKLQ↓NNELS | [21] AVKLQ↓NNELS |
nsp9 / nsp10 | [22] TVRLQ↓AGNAT | [22] TVRLQ↓AGNAT | [22] TVRLQ↓AGSNT
nsp10 / nsp11 & nsp12 | [23] EPLMQ↓SADAS | [23] PMLQS↓ADAQS | [24] ALPQS↓KDSNF
nsp12 / nsp13a | [25] HTVLQ↓AVGAC | [25] HTVLQ↓AVGAC | [25] TTLQA↓VGSCV
nsp13a / nsp13b | [26/18] VATLQ↓AENVT | [26/18] VATLQ↓AENVT | [26/18] ATLTA↓PTIVN
nsp13b / nsp14 |  |  | [27] YKLQS↓QIVTG
nsp14 / nsp15 | [28] FTRLQ↓SLENV | [28] FTRLQ↓SLENV | [29] TKVQG↓LENIA
nsp15 / nsp16 | [30/2] YPKLQ↓ASQAW | [30/2] YPKLQ↓SSQAW | [30/2] TFYPR↓LQASA
Table 6 - MERS-CoV open-reading frames (ORF) and proteins identified with Coronavirus.pro

mRNA-TRS (a) | j (bp) | ORF | Protein | (Aa) | Fusion protein | (Aa) | Comments
1-[2] | 63 | ORF1.1 | pp1a | 4391 | - | - | [nsp1-nsp11]
 |  |  | pp1ab | 7078 | - | - | [nsp1-nsp10], [nsp12-nsp16]
2-[1] | 3904 | ORF1.2 | AP2 | 58 | - | - | unknown
 |  |  | pp2a | 3022 | pp2/nsp3 | 846 | [nsp3a-nsp11]
 |  |  | pp2ab | 5709 | pp2/nsp3 | 846 | [nsp3a-nsp10], [nsp12-nsp16]
3-[1] | 11815 | ORF1.3 | pp3a | 487 | pp3/nsp7 | 24 | [nsp8-nsp11]
 |  |  | pp3ab | 3174 | pp3/nsp7 | 24 | [nsp8-nsp10], [nsp12-nsp16]
4-[2] | 12751 | ORF1.4 | pp4a | 226 | pp4/nsp9 | 72 | [nsp10-nsp11]
 |  |  | pp4ab | 2913 | pp4/nsp9 | 72 | [nsp10-nsp16]
5-[2] | 21405 | ORF2 | Sp | 1354 | - | - | Surface glycoprotein spike precursor
 |  |  | S | 1010 | - | - | Surface glycoprotein spike
 |  |  | S0 | 344 | - | - | S0 protease fragment
6-[2] | 25521 | ORF3 | AP3 | 103 | - | - | Accessory protein AP3
7-[1] | 25843 | ORF4a | AP4a | 109 | - | - | Accessory protein AP4a
8-[2] | 25928 | ORF4b | AP4b | 246 | - | - | Accessory protein AP4b
9-[2] | 26833 | ORF5 | AP5 | 224 | - | - | Accessory protein AP5
10-[2] | 27583 | ORF6 | E | 82 | - | - | Envelope protein
11-[2] | 27838 | ORF7 | M | 219 | - | - | Membrane protein
 |  | ORF8a | N | 413 | - | - | Nucleocapsid phosphoprotein
 |  |  | N1/N2/N3 | 223/166/25 | - | - | N1/N2/N3 protease fragments

(a) Transcription regulating sequence: [1] TRS1 - 5'-cuaaac-3' // [2] TRS2 - 5'-acgaac-3'

mRNA | j (bp) | ORF | Protein | (Aa) | Comments
a | >25532 | ORF3b | AP3b | 66 | Accessory protein AP3b, IRES translation
b | >28570 | ORF8b | AP8b1-b7 | 113/105/9061/55/53/49/37 | Accessory proteins AP8b1-b7
c | >28990 | ORF14 | AP14a/b | 42/108 | Accessory proteins AP14a/b
SARS-CoV includes a transcription regulating sequence (TRS) which is not found in the MERS virus type, TRS3 (5'-cuaaacgaac-3'), and which is also present in 2019-nCoV. SARS-CoV has several accessory proteins (AP3a, AP6, AP7a) which are translated directly from mRNAs. AP6, although named nsp6 in some studies, is not processed by any protease, unlike the non-structural proteins. Other proteins, such as AP11, are characteristic only of SARS-CoV. Furthermore, the shortest mRNA (ORF15), with a length of 263bp, translates no significant proteins and has an unknown function. Tables 8 and 9 show the open-reading frames (ORF) and proteins (structural, non-structural, accessory and fusion) for SARS identified with Coronavirus.pro.
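Locating the three TRS motifs quoted in the table footnotes (TRS1 5'-cuaaac-3', TRS2 5'-acgaac-3', TRS3 5'-cuaaacgaac-3') can be sketched as a plain string scan. This is only an illustration, not the Coronavirus.pro algorithm: the names TrsHit, toRna and findTrs are hypothetical, exact matching is assumed, and because TRS3 contains both shorter motifs, a TRS1 or TRS2 occurrence that lies inside a TRS3 occurrence is assumed to count as TRS3 only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one motif occurrence.
struct TrsHit {
    std::size_t position;  // 0-based index of the first nucleotide of the motif
    int type;              // 1 = TRS1, 2 = TRS2, 3 = TRS3
};

// Lowercases the sequence and maps t to u, so that database sequences written
// in the DNA alphabet can be compared with the RNA motifs.
std::string toRna(const std::string& genome) {
    std::string rna;
    rna.reserve(genome.size());
    for (unsigned char ch : genome) {
        const char c = static_cast<char>(std::tolower(ch));
        rna += (c == 't') ? 'u' : c;
    }
    return rna;
}

// Hypothetical helper: reports every exact TRS occurrence, longest motif first.
std::vector<TrsHit> findTrs(const std::string& genome) {
    const std::string trs1 = "cuaaac", trs2 = "acgaac", trs3 = "cuaaacgaac";
    const std::string rna = toRna(genome);
    std::vector<TrsHit> hits;
    for (std::size_t i = 0; i < rna.size(); ++i) {
        if (rna.compare(i, trs3.size(), trs3) == 0) {
            hits.push_back({i, 3});
        } else if (rna.compare(i, trs1.size(), trs1) == 0) {
            hits.push_back({i, 1});
        } else if (rna.compare(i, trs2.size(), trs2) == 0) {
            // TRS2 starts 4 nucleotides into TRS3; skip it if it is that tail.
            const bool insideTrs3 = i >= 4 && rna.compare(i - 4, trs3.size(), trs3) == 0;
            if (!insideTrs3) hits.push_back({i, 2});
        }
    }
    return hits;
}
```

On the made-up sequence GGCTAAACGAACTTACGAACGGCTAAACTT the sketch reports TRS3 at index 2, TRS2 at index 14 and TRS1 at index 22. The positions j (bp) in the tables are 1-based genome coordinates, so a comparison with them would need an offset of one.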
Table 8 - SARS-CoV open-reading frames (ORF) and proteins identified with Coronavirus.pro

mRNA-TRS (a) | j (bp) | ORF | Protein | (Aa) | Fusion protein | (Aa) | Comments
1-[3] | 63 | ORF1.1 | pp1a | 4383 | - | - | [nsp1-nsp11]
 |  |  | pp1ab | 7074 | - | - | [nsp1-nsp10], [nsp12-nsp16]
2-[1] | 3665 | ORF1.2 | AP2 | 50 | - | - | unknown
 |  |  | pp2a | 3095 | pp2/nsp3 | 894 | [nsp4-nsp11]
 |  |  | pp2ab | 5786 | pp2/nsp3 | 894 | [nsp4-nsp10], [nsp12-nsp16]
 |  |  | pp2b | 2628 | pp3/nsp12 | 856 | [nsp13-nsp16]
3-[1] | 3800 | ORF1.2 | AP2 | 50 | - | - | unknown
 |  |  | pp2a | 3095 | pp2/nsp3 | 894 | [nsp4-nsp11]
 |  |  | pp2ab | 5786 | pp2/nsp3 | 894 | [nsp4-nsp10], [nsp12-nsp16]
 |  |  | pp2b | 2628 | pp3/nsp12 | 856 | [nsp13-nsp16]
4-[3] | 21482 | ORF2b | S1p | 1255 | - | - | Surface glycoprotein spike precursor
 |  |  | S1 | 917 | - | - | Surface glycoprotein spike
 |  |  | S0 | 339 | - | - | S0 protease fragment
5-[1] | 21913 | ORF2a | S2p | 1112 | - | - | Surface glycoprotein spike precursor
 |  |  | S2 | 774 | - | - | Surface glycoprotein spike
 |  |  | S0 | 339 | - | - | S0 protease fragment
6-[2] | 25260 | ORF3a | AP3a | 274 | - | - | Accessory protein AP3a (SARS acsp3)
7-[2] | 26109 | ORF4 | E | 76 | - | - | Envelope protein
8-[3] | 26344 | ORF5 | M | 221 | - | - | Membrane protein
9-[2] | 26913 | ORF6 | AP6 | 63 | - | - | Accessory protein AP6 (SARS nsp6)
10-[2] | 27267 | ORF7a | AP7a | 122 | - | - | Accessory protein AP7a
11-[3] | 27769 | ORF8a | AP8a | 40 | - | - | Accessory protein AP8a
12-[2] | 28106 | ORF9a | N | 422 | - | - | Nucleocapsid phosphoprotein (p9a)
13-[1] | 29489 | ORF15 | - | - | - | - | -

(a) Transcription regulating sequence: [1] TRS1 - 5'-cuaaac-3' // [2] TRS2 - 5'-acgaac-3' // [3] TRS3 - 5'-cuaaacgaac-3'
Table 9 - SARS-CoV open-reading frames (ORF) with unknown TRS predicted by Coronavirus.pro

mRNA | j (bp) | ORF | Protein | (Aa) | Comments
a | >25478 | ORF3b | AP3b | 175 | Accessory protein AP3b, IRES translation
b | >25640 | ORF3b | AP3b2 | 142 | Accessory protein AP3b2
c | >27273 | ORF7b | AP7b | 45 | AP7b
d | >27779 | ORF8b | AP8b | 85 | AP8b
e | >28120 | ORF9b | AP9b | 98 | Accessory protein AP9b, MA15 ExoN1
f | >28130 | ORF11 | AP11 | 73 | AP11
g | >28500 | ORF14 | AP14a/b | 71/105 | AP14a/b

Table 10 - 2019-nCoV open-reading frames (ORF) and proteins identified with Coronavirus.pro

mRNA-TRS (a) | j (bp) | ORF | Protein | (Aa) | Fusion protein | (Aa) | Comments
1-[3] | 66 | ORF1.1 | pp1a | 4406 | - | - | [nsp1-nsp11]
 |  |  | pp1ab | 7097 | - | - | [nsp1-nsp10], [nsp12-nsp16]
2-[1] | 753 | ORF1.2 | pp2a | 4233 | pp2/nsp1 | 8 | [nsp2-nsp11]
 |  |  | pp2ab | 6923 | pp2/nsp1 | 8 | [nsp2-nsp10], [nsp12-nsp16]
3-[1] | 2358 | ORF1.3 | pp3a | 3676 | pp3/nsp2 | 89 | [nsp3-nsp11]
 |  |  | pp3ab | 6366 | pp3/nsp2 | 89 | [nsp3-nsp10], [nsp12-nsp16]
4-[1] | 3597 | ORF1.4 | AP2a | 47 | - | - | unknown
 |  |  | AP2b | 52 | - | - |
 |  |  | pp4a | 3095 | pp4/nsp3 | 893 | [nsp3a-nsp11]
 |  |  | pp4ab | 5785 | pp4/nsp3 | 893 | [nsp3a-nsp10], [nsp12-nsp16]
5-[1] | 6936 | ORF1.5 | pp5a | 2153 | pp5/nsp3a | 170 | [nsp3b-nsp11]
 |  |  | pp5ab | 4843 | pp5/nsp3a | 170 | [nsp3b-nsp10], [nsp12-nsp16]
6-[1] | 8655 | ORF1.6 | pp6a | 1377 | pp6/nsp4 | 234 | [nsp5-nsp11]
 |  |  | pp6ab | 4067 | pp6/nsp4 | 234 | [nsp5-nsp10], [nsp12-nsp16]
7-[1] | 13730 | ORF1.7 | pp7 | 2595 | pp7/nsp12 | 824 | [nsp13-nsp16]
8-[1] | 16049 | ORF1.8 | pp8 | 1807 | pp8/nsp12 | 34 | [nsp13-nsp16]
9-[1] | 18452 | ORF1.9 | pp9 | 1019 | pp9/nsp14 | 375 | [nsp15-nsp16]
10-[1] | 20384 | ORF1.10 | pp10 | 374 | pp10/nsp15 | 76 | nsp16
11-[3] | 21552 | ORF2 | Sp | 1274 | - | - | Surface glycoprotein spike precursor
 |  |  | S | 936 | - | - | Surface glycoprotein spike
 |  |  | S0 | 338 | - | - | S0 protease fragment
12-[2] | 25385 | ORF3a | AP3a | 276 | - | - | (SARS ORF3/ORF3a/X1/U274)
13-[2] | 26237 | ORF4 | E | 76 | - | - | Envelope protein
14-[3] | 26469 | ORF5 | M | 223 | - | - | Transmembrane protein
 |  |  | M1 | 183 | - | - | M1 protease fragment
 |  |  | M2 | 40 | - | - | M2 protease fragment
15-[2] | 27041 | ORF6 | AP6 | 62 | - | - | (SARS ORF6/p6)
16-[2] | 27388 | ORF7a | AP7a | 122 | - | - | (SARS ORF8/U122/X4/ORF7a)
17-[1] | 27644 | ORF7b | AP7b | 43 | - | - | (SARS ORF7b)
18-[3] | 27884 | ORF8 | AP8 | 122 | - | - | (SARS ORF8)
19-[3] | 28256 | ORF9 | N | 420 | - | - | Nucleocapsid phosphoprotein (p9a)
20-[1] | 29530 | ORF15 | - | - | - | - | -

(a) Transcription regulating sequence: [1] TRS1 - 5'-cuaaac-3' // [2] TRS2 - 5'-acgaac-3' // [3] TRS3 - 5'-cuaaacgaac-3'

mRNA | j (bp) | ORF | Protein | (Aa) | Comments
a | >25405 | ORF3b1 | AP3b1 | 42 | AP3b1
 |  | ORF3b2 | AP3b2 | 34 | AP3b2
b | >25457 | ORF3b3 | AP3b3 | 58 | AP3b3
 |  | ORF3b4 | AP3b4 | 152 | AP3b4
c | >28274 | ORF9b | AP9b | 98 | AP9b
d | >28305 | ORF11 | AP11 | 73 | AP11
e | >28359 | ORF12 | AP12 | 43 | AP12
f | >28450 | ORF14 | AP14a/b | 74/187 | AP14a/b
Common to all of these beta-coronaviruses, there are several accessory proteins whose expression in vivo and in vitro has not been proven, and whose function is therefore still unknown. This is the case for accessory protein AP2 in SARS/MERS and AP2a/b in 2019-nCoV.

4.5 Coronavirus proteins

There are considerable differences between spike glycoproteins. For example, the number of spike glycoproteins differs from MERS and also among SARS virus types. The KY417149 (SARS) virus sequence encodes three spike glycoprotein precursors of different amino acid lengths (S1p, S2p and S3p), which are later processed by a virus protease into S1, S2 and S3 spikes, with a common fragment S0. NC004718 and AY278488 (SARS) synthesize two spike precursors (S1p and S2p), whereas in 2019-nCoV and MERS only one is processed. Spike glycoproteins from the same virus, although having different lengths, are estimated to have 100% identity, as observed from their identity matrices.

In the case of the other proteins (N, M and E), it can be observed that this virus is more closely related to SARS than to MERS, as discussed previously. However, it also presents around 10% differences with other SARS, so it could be considered a different virus type.

All these proteins have been aligned with Clustal 1.2 to compare similarities [Higgins 1994, Brown et al 1998] (see Annex A for alignment details).

Table 12 compares structural proteins in these genome sequences of beta-coronaviruses. Identity is calculated as id(%) = (1 - d/L) x 100, where d is the Levenshtein distance between both sequences and L is the length of the SARS/MERS reference protein sequence. Although there are other distance estimators (Needleman-Wunsch, Smith-Waterman, Damerau-Levenshtein), the Levenshtein distance is an accurate estimator for highly similar sequences.

In the case of MERS, no identity has been found in any protein (< 50%). Table 13 compares 2019-nCoV proteins with several virus genome sequences of SARS-CoV.
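The identity estimator id(%) = (1 - d/L) x 100 defined above can be written out as a short sketch. The function names levenshteinDistance and identityPercent are hypothetical, and returning 0 for an empty reference is an assumption made here to avoid a division by zero; the source does not describe that case.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance d between two protein sequences (two-row version).
int levenshteinDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// id(%) = (1 - d/L) x 100, with L the length of the reference sequence.
// The value can drop below zero when the query is much longer than the reference.
double identityPercent(const std::string& query, const std::string& reference) {
    if (reference.empty()) return 0.0;  // assumption: undefined case reported as 0
    const int d = levenshteinDistance(query, reference);
    return (1.0 - static_cast<double>(d) / static_cast<double>(reference.size())) * 100.0;
}
```

For two made-up ten-residue sequences that differ in a single position, d = 1 and L = 10, so the identity is 90%. Because L is the reference length only, the measure is not symmetric when the two sequences differ in length.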
As observed in Table 13, most non-structural proteins (nsp1, nsp3b and nsp5 to nsp16), accessory proteins AP7a/b and structural proteins M, N and E have the highest percentages of similarity (>70%), indicating that this virus is more closely related to the SARS type than to MERS. The spike glycoprotein (S), most accessory proteins (except AP7a/b, AP11 and AP14a) and non-structural proteins nsp2 to nsp3a and nsp4 have low similarity (< 50 %), suggesting that those proteins are characteristic of this virus type and potential targets for specific vaccines and antiviral drugs. The fact that the non-structural proteins are similar to those of SARS indicates that antiviral drugs could also be effective against this virus.

Table 13 - Comparison of structural, non-structural and accessory proteins of 2019-nCoV with SARS-CoV
Protein NC004718 KY417149 AY27848813-AUG-2018 18-DEC-2017 01-SEP-2009% Id (1) % Id (3) % Id (2)
Structural N 90.48 90.24 90.48M 90.58 89.24 90.58M1 91.26 90.16 91.80M2 82.50 85.00 85.00E 94.74 94.74 94.74Sp < 50 % < 50 % < 50 %SS0 94.67 94.67 94.67Non-structural Nsp1 83.89 85.00 84.44Nsp2 < 50 % < 50 % < 50 %Nsp3Nsp3aNsp3b 87.65 87.94 88.24Nsp4 < 50 % < 50 % < 50 %Nsp5 95.44 95.77 95.77Nsp6 87.54 87.20 87.20Nsp7 97.62 100 98.81Nsp8a 94.64 98.21 98.21Nsp8b 95.74 97.16 97.16Nsp9 95.58 97.35 97.35Nsp10 96.43 97.14 96.43Nsp11 84.62 76.92 76.92Nsp12 96.24 96.03 96.24Nsp13 99.50 99.50 99.67Nsp14 94.69 95.64 95.07Nsp15 88.15 88.73 88.73Nsp16 93.31 94.31 93.31Accessory AP2a < 50 % < 50% < 50 %AP2bAP3aAP3b1AP3b2AP3b3AP3b4AP6 67.74AP7a 85.25 87.70 85.25AP7b 79.55 81.82 79.55AP8 < 50 % < 50 % < 50 %AP9bAP11 76.71 73.97 76.71AP12 < 50 % < 50 % < 50 %AP14a 74.32% 74.32 74.32AP14b < 50 % < 50 % < 50 % As 3CLpro (nsp5) and RdRp (nsp12) have >90% similarities with SARS, some antiviral drugs, such as proteaseinhibitors or RNA-dependent RNA polymerase inhibitors could be effective to this virus type. Nevertheless,further comparisons would be required, including other types of estimators.5. ConclusionsCoronavirus.pro software provides an accurate and reliable simulation model of Coronaviruses replicationcycles: SARS/MERS/2019-nCoV. The code simulates transcription of subgenomic mRNAs, translation,protease cleavage, protein synthesis and virus assembly, including all fusion proteins.As a result of the analysis, 2019-nCoV can be identified as a beta-coronavirus type SARS-CoV virus with highconfidence, named SARS-CoV2, and it is consistent with other recent research analysis. Similarities havebeen found in 5’utr and 3’utr regions, protease cleavage sites and amino acid composition of bothstructural and non-structural proteins [Ceraolo and Giorgi 2020, Gorbalenya et al. 
2020, Wu et al 2020]However, there are still differences between both coronavirus (SARS-CoV and 2019-nCoV), as the numberof spike precursors and accessory proteins.Coronavirus.pro is able to identify virus type and family, comparing virus genome and proteins with proteinand RNA motifs databases. In this case, 2019-nCoV has been identified as a beta-coronavirus SARS in morethan 70% than with MERS. However several differences have been found with SARS/MERS. 2019-nCoV hasmore transcription regulating sequences (TRS) interspaced in the genome and consequently, is producingmore subgenomic mRNAs and more fusion proteins during RdRp transcription, which could explain moresevere health effects and infectivity than SARS/MERS.The software has identified those proteins characteristics of 2019-nCoV: Spike S, AP3a, AP3b, AP8, AP9b,AP12 and AP14b and nsp2/3/3a, with similarity < 50 % with other beta-coronaviruses.Coronavirus.pro has predicted also some accessory proteins in all beta-coronavirus which have not beenpreviously described, called AP2 in SARS-CoV and MERS-CoV, and AP2a/AP2b in 2019-nCoV, respectively.These proteins are encoded in the same genetic region as PL2pro protease (nsp3) and are translated beforeORF1.2 (SARS/MERS) and ORF1.4 (2019-nCoV). If they are expressed in vivo or in vitro is not clearlyunderstood, as they could be part of a leaky scanning/shunting mechanism.The software has predicted some additional protease cleavage sites, giving place to some hypotheticalproteins, as nsp3a↓nsp3b, nsp8a↓nsp8b, nsp13a↓nsp13b, M1↓M2 and N1↓N2↓N3. These cleavagesites must be discussed and supported in detail with other methods.As a conclusion, Coronavirus.pro (2019-nCoV) is able to identify virus genomes and provides in short timesuseful results (FASTA files of virus proteins and RNA secondary structures). 
Future research will focus on interactions between RNA and protein sequences and intracellular processes, fusion protein synthesis, RNA packaging and virus assembly, as carried out previously for HIV using Monte Carlo simulations. These results will be applied to develop preventive measures (vaccines), diagnostic methods (real-time RT-PCR or ELISA tests) and antiviral drugs (fusion inhibitors, RdRp inhibitors or PL2pro/3CLpro protease inhibitors).
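
The similarity percentages discussed in this work can be illustrated with a short sketch. The Python fragment below is not the Coronavirus.pro implementation (which is written in C++). It assumes that similarity is measured as percent identity over the gap-free columns of a pairwise alignment, and that values under 50 % are reported as "< 50 %". The function names `percent_identity` and `report`, the gap character and the toy sequences are hypothetical and chosen only for illustration; other estimators or denominators may give different figures.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * matches / compared


def report(value: float, threshold: float = 50.0) -> str:
    """Format a percentage, collapsing low values into '< threshold %'."""
    if value >= threshold:
        return f"{value:.2f}"
    return f"< {threshold:.0f} %"


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not taken from any real protein).
    seq_a = "MKT-LLV"
    seq_b = "MKTALIV"
    print(report(percent_identity(seq_a, seq_b)))
```

Gap columns are excluded from the denominator here; counting them as mismatches, or dividing by the length of the shorter sequence, would be equally defensible and lowers the reported value.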
6. References
Aleanizy FS, Mohmed N, Algahtani FY and Mohamed RAEH. Outbreak of Middle East respiratory syndrome coronavirus in Saudi Arabia: a retrospective study. BMC Infect Dis. 2017; 17: 23. doi: 10.1186/s12879-016-2137-3
Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ. Basic local alignment search tool. J Mol Biol 215:403-410 (1990)
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O’Donovan C, Redaschi N and Yeh L. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 32 D115-D119 (2004)
Belouzard S, Chu VC and Whittaker GR. Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites. Proc Natl Acad Sci USA 106(14):5871-6 (2009)
Bock LV, Caliskan N, Korniy N, Peske F, Rodnina MV and Grubmüller H. Thermodynamic control of -1 programmed ribosomal frameshifting. Nature Communications 10: 4598 (2019)
Bonnal S, Boutonnet C, Prado-Lourenço and Vagner S. IRESdb: The Internal Ribosome Entry Site database. Nucleic Acids Res 31(1): 427-428 (2003)
Bouvet M, Debarnot C, Imbert I, Selisko B, Snijder EJ, Canard B et al. In vitro reconstitution of SARS-coronavirus mRNA cap methylation. PLoS Pathog 6 (2010)
Bouvet M, Imbert I, Subissi L, Gluais L, Canard B, Decroly E. RNA 3’-end mismatch excision by the severe acute respiratory syndrome coronavirus non-structural protein nsp10/nsp14 exoribonuclease complex. Proc Natl Acad Sci USA 109 (2012)
Brian DA, Hogue BG and Kienzle TE. The Coronavirus Hemagglutinin Esterase Glycoprotein. In: Siddell SG (ed) The Coronaviridae. The Viruses. Springer, Boston, MA (1995)
Brown NP, Leroy C, Sander C. MView: A Web compatible database search or multiple alignment viewer. Bioinformatics 14(4):380-381 (1998)
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K and Madden TL. BLAST+: architecture and applications. BMC Bioinformatics 10:421 (2009)
Ceraolo C and Giorgi FM.
Genomic variance of the 2019-nCoV coronavirus. Journal of Medical Virology. Short Comm. (2020)
Chen S, Chen L, Luo H, Sun T, Chen J, Ye F, Cai J, Shen J, Shen X and Jiang H. Enzymatic activity characterization of SARS coronavirus 3C-like protease by fluorescence resonance energy transfer technique. Acta Pharmacologica Sinica 26, 99–106 (2005)
Chen Y, Liu Q and Guo D. Emerging coronaviruses: Genome structure, replication and pathogenesis. J Med Virol 92(4) (2020)
Darty K, Denise A and Ponty Y. VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics 25(15): 1974–1975 (2009)
De Groot. Structure, function and evolution of the hemagglutinin-esterase proteins of corona- and toroviruses. Glycoconj J 23, 59–72 (2006). https://doi.org/10.1007/s10719-006-5438-8
Dinman JD. Control of gene expression by translational recoding. Adv Protein Chem Struct Biol 86, 129–149 (2012)
Doytchinova IA and Flower DR. Identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties. Vaccine 25:856-866 (2007)
Doytchinova IA and Flower DR. VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 8:4 (2007)
Fehr AR and Perlman S. Coronaviruses: An Overview of their Replication and Pathogenesis. Methods Mol Biol 1282: 1–23 (2015) doi: 10.1007/978-1-4939-2438-7_1
Giovanetti M, Benvenuto D, Angeletti S and Ciccozzi M. The first two cases of 2019-nCoV in Italy: Where they come from? J Med Virol 92(5): 518-521 (2020) doi: 10.1002/jmv.25699
Gorbalenya AE, Baker SC, Baric RS, de Groot RJ, Drosten C, Gulyaeva AA, Haagmans BL et al. Coronaviridae Study Group of The International Committee on Taxonomy of Viruses. Nature Microbiology. Consensus Statement (2020)
Gruber AR, Lorenz R, Bernhart SH, Neuboeck R and Hofacker IL. The Vienna RNA Websuite. Nucleic Acids Res 36: W70-W74 (2008)
He Y and Shibo J. Vaccine Design for Severe Acute Respiratory Syndrome Coronavirus.
Viral Immunology 18(2) (2005)
Higgins DG. CLUSTAL V: multiple alignment of DNA and protein sequences. Methods Mol Biol 25, 307–318 (1994)
Irigoyen N, Firth AE, Jones JD, Chung BYW, Siddell SG and Brierley I. High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome Profiling. PLOS Pathogens (2016)
Jia HP, Look DC, Shi L, Hickey M, Pewe L, Netland J et al. ACE2 Receptor Expression and Severe Acute Respiratory Syndrome Coronavirus Infection Depend on Differentiation of Human Airway Epithelia. J Virol 79(23) 14614-14621 (2005)
Kiemer L, Lund O, Brunak S and Blom N. Coronavirus 3CLpro proteinase cleavage sites: Possible relevance to SARS virus pathology. BMC Bioinformatics 5:72 (2004)
Li F, Li W, Farzan M and Harrison SC. Structure of SARS Coronavirus Spike Receptor-Binding Domain Complexed with Receptor. Science 309(5742), 1864-1868 (2005)
Li G and De Clercq E. Therapeutic options for the 2019 novel coronavirus (2019-nCoV). Nature Reviews. Supplementary Information (2019)
Li Y, Hu C, Wu N, Yao H and Li L. Molecular Characteristics, functions and related pathogenicity of MERS-CoV proteins. Engineering 5, 940-947 (2019)
Liao X, Wang B and Kang Y. Novel coronavirus infection during the 2019-2020 epidemic: preparing intensive care units-the experience in Sichuan Province, China. Intensive Care Med 46(2):357-360 (2020) doi: 10.1007/s00134-020-05954-2
Lin Y, Zhang X, Wu R and Lai M. The 3’ Untranslated Region of Coronavirus RNA is required for subgenomic mRNA transcription from a defective interfering RNA. Journal of Virology 7236-7240 (1996)
Liu DX, Fung TS, Chong KKL, Shukla A, Hilgenfeld R. Accessory proteins of SARS-CoV and other coronaviruses. Antiviral Research 109, 97-109 (2014)
th Edition (2016) Oxford Publishing
Peng YH, Lin CH, Lin CN, Lo CY, Tsai TL and Wu HY. Characterization of the Role of Hexamer AGUAAA and Poly(A) Tail in Coronavirus Polyadenylation.
PLoS One 11(10) (2016)
Plant E, Pérez-Alvarado, Jacobs JL, Mukhopadhyay, Hennig M and Dinman JD. A three-stemmed mRNA pseudoknot in the SARS Coronavirus Frameshift Signal. PLoS Biol 3(6) (2005)
Ramos-Pascual M. Simulation of HIV-1 virus cycle with fortran 90 / C++. Lambert Academic Publishing (LAP) ISBN: 978-3-659-78218-3 (2019)
Ramos-Pascual M. Simulation of HIV-1 virus infection of a CD4+T lymphocyte by Monte Carlo. Grin Publishing. ISBN (Ebook) 978-3-346-05467-8 (2019)
Raoult D, Zumla A, Locatelli F, Ippolito G and Kroemer G. Coronavirus infections: Epidemiological, clinical and immunological features and hypotheses. Cell Stress 2 (2020) doi: 10.15698/cst2020.04.216
Sawicki SG, Sawicki DL and Siddell SG. A Contemporary View of Coronavirus Transcription. J Virol 81(1):20-29 (2007)
Sethna PB, Hung SL, Brian DA. Coronavirus subgenomic minus-strand RNAs and the potential for mRNA replicons. Proc Natl Acad Sci USA 86, 5626-5630 (1989)
Sharma K, Akerström S, Sharma AK, Chow VTK, Teow S, Abrenica B, Booth SA, Booth TF, Mirazimi A and Lal SK. SARS-CoV 9b Protein Diffuses into Nucleus, Undergoes Active Crm1 Mediated Nucleocytoplasmic Export and Triggers Apoptosis When Retained in the Nucleus. PLoS One 6(5) (2011)
Song J, Tan H, Perry AJ, Akutsu T, Webb GI, Whisstock JC and Pike R. PROSPER: An Integrated feature-based tool for predicting protease substrate cleavage sites. PLoS ONE 7(11): e50300 https://doi.org/10.1371/journal.pone.0050300
Snijder EJ, Bredenbeek PJ, Dobbe JC, Thiel V, Ziebuhr J, Poon LL, Guan Y, Rozanov M, Spaan WJ and Gorbalenya AE. Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage. J Mol Biol 331(5), 991-1004 (2003)
Sulea T, Lindner HA, Purisima EO and Ménard R. Binding site-based classification of coronaviral papain-like proteases.
Proteins 62(3) (2006)
van Boheemen S, de Graaf M, Lauber C, Bestebroer TM, Raj VS, Zaki AM, Osterhaus AD, Haagmans BL, Gorbalenya AE, Snijder EJ and Fouchier RA. Genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans. MBio 3(6), e00473-12 (2012)
Vijayanand P, Wilkins E, Woodhead M. Severe acute respiratory syndrome (SARS): a review. Clin Med (Lond) 4(2):152-60 (2004)
Wang H, Yang P, Liu K, Guo F, Zhang Y, Zhang G and Jiang C. SARS coronavirus entry into host cells through a novel clathrin- and caveolae-independent endocytic pathway. Cell Research 18(2), 290-301 (2008)
Wang Q, Wong G, Lu G, Yan J and Gao GF. MERS-CoV spike protein: Targets for vaccines and therapeutics. Antiviral Research 133, 165-177 (2016)
Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, Heer FT, de Beer TAP, Rempfer C, Bordoli L, Lepore R, Schwede T. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46, W296-W303 (2018)
Wu A, Peng Y, Huang B et al. Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) originating in China. Cell Host & Microbe (2020)
Wu Z and McGoogan JM. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention. JAMA. 2020 Feb 24. doi: 10.1001/jama.2020.2648
Xu J, Hu J, Wang J, Han Y, Hu Y, Wen J, Li Y, Ji J et al. Genome Organization of the SARS-CoV. Genomics Proteomics Bioinformatics 1(3): 226-235 (2003)
Yang D and Leibowitz JL. The Structure and Functions of Coronavirus Genomic 3’ and 5’ Ends. Virus Research. doi: 10.1016/j.virusres.2015.02.025
Zeng Q, Langereis MA, van Vliet ALW, Huizinga EG and de Groot RJ. Structure of coronavirus hemagglutinin-esterase offers insight into corona and influenza virus evolution. Proc Natl Acad Sci USA 105(26):9065–9069 (2008)
Zhu N et al.
A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020 Feb 20;382(8):727-733. doi: 10.1056/NEJMoa2001017
Zu ZY, Jiang MD, Xu PP, Chen W, Ni QQ, Lu GM, Zhang LJ. Coronavirus Disease 2019 (COVID-19): A perspective from China. Radiology. 2020 Feb 21:200490. doi: 10.1148/radiol.2020200490

Annex A - Sequence alignment of Spike Glycoprotein
A.1 Spike glycoproteins (S)
MView 1.63, Copyright © 1997-2018 Nigel P. Brown
A.2 S0 protein from Spike glycoprotein precursor (Sp)
A.3 Envelope protein (E)
A.4 Membrane protein (M)