bioRxiv | 2021

Variation in synonymous nucleotide composition among genomes of sarbecoviruses and consequences for the origin of COVID-19

 

Abstract


The subgenus Sarbecovirus includes two human viruses, SARS-CoV and SARS-CoV-2, responsible for the SARS epidemic and the COVID-19 pandemic, respectively, as well as many bat viruses and two pangolin viruses. Here, the synonymous nucleotide composition (SNC) of Sarbecovirus genomes was analysed by examining third codon-positions, dinucleotides, and degenerate codons. The results show evidence for the following eight groups: (i) SARS-CoV related coronaviruses (SCoVrC; including many bat viruses from China), (ii) SARS-CoV-2 related coronaviruses (SCoV2rC; including five bat viruses from Cambodia, Thailand and Yunnan), (iii) pangolin viruses, (iv) three bat viruses showing evidence of recombination between SCoVrC and SCoV2rC genomes, (v) two highly divergent bat viruses from Yunnan, (vi) the bat virus from Japan, (vii) the bat virus from Bulgaria, and (viii) the bat virus from Kenya. All these groups can be diagnosed by specific nucleotide compositional features except group (iv), the SCoVrC/SCoV2rC recombinants. In particular, SCoV2rC genomes are characterised by the lowest percentages of cytosine and the highest percentages of uracil at third codon-positions, whereas the genomes of pangolin viruses exhibit the highest percentages of adenine at third codon-positions. I suggest that latitudinal and taxonomic differences in the imbalanced nucleotide pools available in host cells during viral replication can explain the seven diagnosable SNC groups detected here among Sarbecovirus genomes. A related effect due to hibernating bats is also considered. I conclude that the two independent host switches from Rhinolophus bats to pangolins resulted in convergent mutational constraints, and that SARS-CoV-2 emerged directly from a horseshoe bat virus.
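
To make the third codon-position measure concrete, here is a minimal Python sketch of how percentages of A, C, G and U at third codon-positions could be computed for one coding sequence. It is an illustration under stated assumptions, not the author's pipeline: the function name third_position_composition is hypothetical, the sequence is assumed to start in frame, DNA "T" is reported as "U", a trailing partial codon is ignored, and codons containing ambiguous bases (such as "N") are skipped. The abstract does not say how the study handled stop codons, ambiguous bases or overlapping reading frames.

```python
from collections import Counter


def third_position_composition(cds: str) -> dict[str, float]:
    """Return the percentage of A, C, G and U at third codon positions.

    Hypothetical helper for illustration. Assumptions: `cds` is an in-frame
    coding sequence, 'T' is converted to 'U', any trailing partial codon is
    ignored, and codons containing non-ACGU characters are skipped.
    """
    seq = cds.upper().replace("T", "U")
    counts: Counter = Counter()
    # Step through complete codons only (drop a trailing partial codon).
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        # Skip codons with ambiguous or unexpected characters.
        if set(codon) <= set("ACGU"):
            counts[codon[2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGU"}
    return {base: 100.0 * counts[base] / total for base in "ACGU"}


if __name__ == "__main__":
    # Toy sequence: codons AUG GCU UUA CCC GGA AAU -> third bases G U A C A U
    print(third_position_composition("ATGGCTTTACCCGGAAAT"))
```

For the toy sequence above, A and U each account for one third of third codon-positions and C and G for one sixth each. Comparing such percentages across genomes is the kind of contrast the abstract reports (for example, lowest C and highest U at third positions in SCoV2rC genomes).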

DOI 10.1101/2021.08.26.457807
Language English
Journal bioRxiv

Full Text