Proceedings of the National Academy of Sciences | 2021

Virus-like insertions with sequence signatures similar to those of endogenous nonretroviral RNA viruses in the human genome


Abstract


Significance: Ancient animals left diverse physical fossil records from which we can deduce that species with extraordinary features once populated our planet. By infecting germlines, some ancient viruses deposited genetic fossil records. However, inferring that a sequence is a viral fossil has so far required homology to circulating viruses. We developed a method to recognize viral fossils that do not closely resemble known viruses. Rather than homology, we detected sequence patterns of fossilized and modern RNA viruses that distinguish them from human sequences. Our results indicate that as-yet-undiscovered fossils from unknown viruses remain hidden in animal genomes. These relics of the ancient virosphere, including sequences reported here, will expand our knowledge about the diversity of ancient viruses and also our genomes.

Understanding the genetics and taxonomy of ancient viruses will give us great insights into not only the origin and evolution of viruses but also how viral infections played roles in our evolution. Endogenous viruses are remnants of ancient viral infections and are thought to retain the genetic characteristics of viruses from ancient times. In this study, we used machine learning of endogenous RNA virus sequence signatures to identify viruses in the human genome that have not been detected or are already extinct. Here, we show that the k-mer occurrence of ancient RNA viral sequences remains similar to that of extant RNA viral sequences and can be differentiated from that of other human genome sequences. Furthermore, using this characteristic, we screened RNA viral insertions in the human reference genome and found virus-like insertions with phylogenetic and evolutionary features indicative of an exogenous origin but lacking homology to previously identified sequences. Our analysis indicates that animal genomes still contain unknown virus-derived sequences and provides a glimpse into the diversity of the ancient virosphere.
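To make the idea of a k-mer "sequence signature" concrete, the following is a minimal sketch of how a k-mer occurrence profile could be computed and compared between two sequences. It is an illustration only, not the authors' pipeline: the function names `kmer_frequencies` and `cosine_similarity`, the choice of k = 3, and the use of cosine similarity in place of a trained machine-learning classifier are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=3):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Hypothetical helper for illustration. Windows containing characters
    other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    # Fixed ordering of all 4**k possible k-mers so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy sequences, invented for this example; not real viral or human data.
    candidate = "ATGGCGTACGTTAGCATGCGTACGATCG"
    reference = "ATGGCGTTCGTTAGCTTGCGAACGATCG"
    score = cosine_similarity(kmer_frequencies(candidate),
                              kmer_frequencies(reference))
    print(f"k-mer profile similarity: {score:.3f}")
```

In a screening setting, vectors like these would typically serve as input features to a classifier trained on known viral and host sequences; the similarity score here only shows that two profiles can be compared without any sequence alignment.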

Volume 118
DOI 10.1073/pnas.2010758118
Language English
Journal Proceedings of the National Academy of Sciences

Full Text