Mutations on COVID-19 diagnostic targets
arXiv preprint, [q-bio.GN], May
Rui Wang, Yuta Hozumi, Changchuan Yin ∗, and Guo-Wei Wei †
Department of Mathematics, Michigan State University, MI 48824, USA
Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
Abstract
Effective, sensitive, and reliable diagnostic reagents are of paramount importance for combating the ongoing coronavirus disease 2019 (COVID-19) pandemic at a time when there is neither a preventive vaccine nor a specific drug available for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It would be an absolute tragedy if currently used diagnostic reagents were undermined in any manner. Based on the genotyping of 7818 SARS-CoV-2 genome samples collected up to May 1, 2020, we reveal that essentially all of the current COVID-19 diagnostic targets have had mutations. We further show that SARS-CoV-2 has the most devastating mutations on the targets of various nucleocapsid (N) gene primers and probes, which have unfortunately been used by countries around the world to diagnose COVID-19. Our findings explain what has seriously gone wrong with a specific diagnostic reagent made in China. To understand whether SARS-CoV-2 genes have mutated unevenly, we have computed the mutation ratio and mutation h-index of all SARS-CoV-2 genes, indicating that the N gene is the most non-conservative gene in the SARS-CoV-2 genome. Our findings enable researchers to target the most conservative SARS-CoV-2 genes and proteins for the design and development of COVID-19 diagnostic reagents, preventive vaccines, and therapeutic medicines.

∗ Address correspondence to Changchuan Yin. E-mail: [email protected]
† Address correspondence to Guo-Wei Wei. E-mail: [email protected]

... K-means methods to cluster these mutations, resulting in at least five globally distinct subtypes of SARS-CoV-2 genomes, from early Cluster I to late Cluster V. Table 1 shows the cluster distributions of samples (N_NS) and total mutation counts (N_TF) for 11 countries.

Table 1: The cluster distributions of samples (N_NS) and total mutation counts (N_TF) for 11 countries.
          Cluster I      Cluster II     Cluster III    Cluster IV     Cluster V
Country   N_NS   N_TF    N_NS   N_TF    N_NS   N_TF    N_NS   N_TF    N_NS   N_TF
US         739   5149     248   1623     514   4968      60    555     677   5035
CA          40    240      13     72      28    193      14    119      19    126
AU          63    434     354   3810     182   1315      99    873      96    691
UA           2     13     554   3785     607   4206     597   5730      60    457
CN          23     54     179    865       7     63       1     13       1      7
DE           0      0      12     42       3     18       8     70      20    131
FR           0      0      14     55     105    755       6     49      66    463
UK           0      0      23     90      10     55       4     30       0      0
IT           0      0       6    134      22    161      12    140       0      0
JP           0      0      67    194       0      0       0      0       0      0
KR           0      0      26    160       0      0       0      0       0      0

The US, Canada (CA), Australia (AU), Ukraine (UA), and China (CN) samples involve all five clusters. Among them, China initially had samples only in Clusters I and II, and its sample distribution reached the other clusters after March 2020. Germany (DE) and France (FR) samples are in Clusters II, III, IV, and V. United Kingdom (UK) and Italy (IT) samples are mainly in Clusters II, III, and IV. Japan (JP) and Korea (KR) samples belong to Cluster II only. Cluster II is common to all countries.

Table 2 provides all mutations on various primers and probes and their occurrence frequencies in the various clusters. More detailed mutation information is given in Tables S2-S42 of the Supporting Material. It is interesting to note that N-China-F [10] is the most inefficient reagent among all primers/probes: its SARS-CoV-2 target has eight mutations involving samples in all five clusters, which may explain many media reports about the inefficiency of certain COVID-19 diagnostic kits made in China. Note that primers and probes are typically short, around 20 nucleotides in length.

Table 2: Summary of mutations on COVID-19 diagnostic primers and probes and their occurrence frequencies in clusters.
Primer/probe          Mutations   Total frequency   Cluster I   Cluster II   Cluster III   Cluster IV   Cluster V
... b [3]                     5                15           3            4             6            0           2
N-Sarbeco-P b [3]             2                 3           0            0             1            2           0
N-Sarbeco-R b [3]             7                33           7            6             8            0          12
N-China-F [10]                8              4194           9           76            29         4062          15
N-China-R [10]                7                14           2            1             7            3           1
N-China-P [10]                0                 0           0            0             0            0           0
N-HK-F [10]                   4                44           0            3            16           25           0
N-HK-R [10]                   3                12           2            0             7            2           1
N-JP-F [10]                   2                 5           3            2             0            0           0
N-JP-R [10]                   2                 4           0            2             2            0           0
N-TL-F [10]                   7                40           1           32             4            3           0
N-TL-R [10]                   6                14           0            6             6            1           1
N-TL-P [10]                   3                12           0            1             2            9           0
E-Sarbeco-F1 c              ...

Table 3: Gene-specific statistics of SARS-CoV-2 single mutations.
Gene type                        Gene site      Gene length   Unique SNPs   Mutation ratio   h-index
NSP1                             266:805                540           121           0.2241         8
NSP2                             806:2719              1914           407           0.2126        16
NSP3                             2720:8554             5835           912           0.1563        18
NSP4                             8555:10054            1500           203           0.1353        11
NSP5 (3CL)                       10055:10972            918           130           0.1416        10
NSP6                             10973:11842            870           133           0.1529         8
NSP7                             11843:12091            249            37           0.1486         5
NSP8                             12092:12685            594            77           0.1296         4
NSP9                             12686:13024            339            48           0.1416         6
NSP10                            13025:13441            417            44           0.1055         4
NSP11                            13442:13480             39             5           0.1282         2
RNA-dependent-polymerase         13442:16236           2796           363           0.1298        15
Helicase                         16237:18039           1803           227           0.1259        12
3'-to-5' exonuclease             18040:19620           1581           241           0.1524        10
endoRNAse                        19621:20658           1038           143           0.1378        10
2'-O-ribose methyltransferase    20659:21552            894           115           0.1286         8
Spike protein                    21563:25384           3819           622           0.1629        17
ORF3a protein                    25393:26220            825           231           0.2800        13
Envelope protein                 26245:26472            225            30           0.1333         6
Membrane glycoprotein            26523:27191            666           105           0.1577        11
ORF6 protein                     27202:27387            183            47           0.2568         6
ORF7a protein                    27394:27759            363            88           0.2424         6
ORF7b protein                    27756:27887            129            10           0.0775         2
ORF8 protein                     27894:28259            363            90           0.2479         8
Nucleocapsid protein             28274:29533           1257           340           0.2705        29
ORF10 protein                    29558:29674            114            27           0.2368         4

To understand whether SARS-CoV-2 genes differ in their mutation patterns, we analyze the gene-specific statistics of SARS-CoV-2 single mutations. Table 3 lists the mutation ratio, i.e., the number of unique single-nucleotide polymorphisms (SNPs) divided by the corresponding gene length, for all SARS-CoV-2 genes. A smaller mutation ratio for a given gene indicates a higher degree of conservativeness. The ORF7b gene has the smallest mutation ratio, 0.0775. The N gene has the second largest mutation ratio, 0.2705, which is very close to the largest ratio of 0.2800 for the ORF3a gene. To take mutation frequency into consideration, we introduce the mutation h-index, defined as the maximum value of h such that the given gene section has h single mutations that have each occurred at least h times. Normally, larger genes tend to have a higher h-index.
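The two statistics defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mutation_ratio` and `mutation_h_index` are introduced here for illustration, and the per-mutation frequency list is a made-up toy input, since per-mutation counts are given only in the Supporting Material. The SNP counts and gene lengths are copied from Table 3.

```python
from typing import Sequence


def mutation_ratio(unique_snps: int, gene_length: int) -> float:
    """Number of unique SNPs divided by the gene length."""
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    return unique_snps / gene_length


def mutation_h_index(frequencies: Sequence[int]) -> int:
    """Largest h such that h mutations have each occurred at least h times.

    `frequencies` holds one occurrence count per unique single mutation
    in the gene section of interest.
    """
    ranked = sorted(frequencies, reverse=True)
    h = 0
    for rank, freq in enumerate(ranked, start=1):
        if freq >= rank:
            h = rank  # the top `rank` mutations all occur at least `rank` times
        else:
            break
    return h


# Values taken from Table 3 (unique SNPs, gene length).
n_gene_ratio = mutation_ratio(340, 1257)  # nucleocapsid protein
orf7b_ratio = mutation_ratio(10, 129)     # ORF7b protein

# Hypothetical occurrence counts for six mutations (not real data).
toy_frequencies = [50, 7, 4, 3, 1, 1]
toy_h = mutation_h_index(toy_frequencies)

print(f"N gene ratio: {n_gene_ratio:.4f}")
print(f"ORF7b ratio:  {orf7b_ratio:.4f}")
print(f"toy h-index:  {toy_h}")
```

In the toy input, three mutations occur at least three times each, but only three occur at least four times, so the h-index is 3. A single very frequent mutation therefore cannot raise the index on its own, which is why the h-index complements the mutation ratio.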
Table 3 shows that, with a moderate length, the N gene has the largest h-index of 29, which is significantly higher than the second largest h-index of 18 for NSP3. Therefore, it was truly unfortunate for the world to have selected SARS-CoV-2 N gene primers and probes as diagnostic reagents for combating COVID-19.

In summary, the targets of currently used COVID-19 diagnostic reagents have had numerous mutations that have seriously undermined our ability to combat COVID-19. In the Supporting Material, we provide a full list of all 5117 SNP variants, including their positions and mutation types. This information, together with the ranking of the degree of conservativeness of SARS-CoV-2 genes or proteins given in Table 3, enables researchers to avoid non-conservative genes (or their proteins) and mutated nucleotide segments in designing COVID-19 diagnostics, vaccines, and drugs.

Methods and materials

... k-means clustering of all samples.

Acknowledgment
This work was supported in part by NIH grant GM126189, NSF Grants DMS-1721024, DMS-1761320, and IIS1900473, Michigan Economic Development Corporation, Bristol-Myers Squibb, and Pfizer. The authors thank the IBM TJ Watson Research Center, the COVID-19 High Performance Computing Consortium, and NVIDIA for computational assistance.
References

[1] A. M. Casto, M.-L. Huang, A. Nalla, G. A. Perchetti, R. Sampoleo, L. Shrestha, Y. Wei, H. Zhu, A. L. Greninger, and K. R. Jerome. Comparative performance of SARS-CoV-2 detection assays using seven different primer/probe sets and one assay kit. medRxiv, 2020.
[2] J. F.-W. Chan, C. C.-Y. Yip, K. K.-W. To, T. H.-C. Tang, S. C.-Y. Wong, K.-H. Leung, A. Y.-F. Fung, A. C.-K. Ng, Z. Zou, H.-W. Tsoi, et al. Improved molecular diagnosis of COVID-19 by the novel, highly sensitive and specific COVID-19-RdRp/Hel real-time reverse transcription-PCR assay validated in vitro and with clinical specimens. Journal of Clinical Microbiology, 58(5), 2020.
[3] V. M. Corman, O. Landt, M. Kaiser, R. Molenkamp, A. Meijer, D. K. Chu, T. Bleicker, S. Brünink, J. Schneider, M. L. Schmidt, et al. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance, 25(3):2000045, 2020.
[4] Y. J. Jung, G.-S. Park, J. H. Moon, K. Ku, S.-H. Beak, S. Kim, E. C. Park, D. Park, J.-H. Lee, C. W. Byeon, et al. Comparative analysis of primer-probe sets for the laboratory confirmation of SARS-CoV-2. bioRxiv, 2020.
[5] M. Levandowsky and D. Winter. Distance between sets. Nature, 234(5323):34–35, 1971.
[6] A. K. Nalla, A. M. Casto, M.-L. W. Huang, G. A. Perchetti, R. Sampoleo, L. Shrestha, Y. Wei, H. Zhu, K. R. Jerome, and A. L. Greninger. Comparative performance of SARS-CoV-2 detection assays using seven different primer/probe sets and one assay kit. Journal of Clinical Microbiology, 2020.
[7] S. Pfefferle, S. Reucher, D. Nörz, and M. Lütgehetmann. Evaluation of a quantitative RT-PCR assay for the detection of the emerging coronavirus SARS-CoV-2 using a high throughput system. Eurosurveillance, 25(9):2000152, 2020.
[8] Y. Shu and J. McCauley. GISAID: Global initiative on sharing all influenza data – from vision to reality. Eurosurveillance, 22(13), 2017.
[9] F. Sievers and D. G. Higgins. Clustal Omega. Current Protocols in Bioinformatics, 48(1):3–13, 2014.
[10] B. Udugama, P. Kadhiresan, H. N. Kozlowski, A. Malekjahani, M. Osborne, V. Y. Li, H. Chen, S. Mubareka, J. Gubbay, and W. C. Chan. Diagnosing COVID-19: The disease and tools for detection. ACS Nano, 2020.
[11] C. B. Vogels, A. F. Brito, A. L. Wyllie, J. R. Fauver, I. M. Ott, C. C. Kalinich, M. E. Petrone, M.-L. Landry, E. F. Foxman, and N. D. Grubaugh. Analytical sensitivity and efficiency comparisons of SARS-CoV-2 qRT-PCR assays. medRxiv, 2020.
[12] F. Wu, S. Zhao, B. Yu, Y.-M. Chen, W. Wang, Z.-G. Song, Y. Hu, Z.-W. Tao, J.-H. Tian, Y.-Y. Pei, et al. A new coronavirus associated with human respiratory disease in China.