bioRxiv | 2021

CoCoRV: a rare variant analysis framework using publicly available genotype summary counts to prioritize germline disease-predisposition genes

Abstract


Sequencing cases without matched healthy controls hinders the prioritization of germline disease-predisposition genes. To circumvent this problem, genotype summary counts from public data sets can serve as controls. However, systematic inflation and false positives can arise if confounding factors are not addressed. We propose a new framework, the consistent summary counts based rare variant burden test (CoCoRV), to address these challenges. CoCoRV provides consistent variant quality control and filtering, an ethnicity-stratified rare variant association test, accurate estimation of inflation factors, and powerful FDR control, and it can detect rare variants in high linkage disequilibrium. When we applied CoCoRV to pediatric cancer cohorts, the top genes identified were cancer-predisposition genes. We also applied CoCoRV to identify disease-predisposition genes in adult brain tumors and amyotrophic lateral sclerosis. Given that potential confounding factors were well controlled after applying the framework, CoCoRV provides a cost-effective solution for prioritizing disease-risk genes enriched with rare pathogenic variants.
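
To make the idea of an ethnicity-stratified burden test on summary counts concrete, the following is a minimal sketch, not the CoCoRV implementation. It assumes that, for one gene, each ethnicity contributes a 2x2 table of rare-variant carriers versus non-carriers in cases and in public controls, and it combines the strata with a Cochran-Mantel-Haenszel statistic. The function name cmh_test and all counts are hypothetical and chosen only for illustration; the abstract does not state which stratified test CoCoRV uses.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test (no continuity correction) over 2x2 strata.

    Each table is a tuple (a, b, c, d):
      a = cases carrying a qualifying rare variant
      b = cases not carrying one
      c = controls carrying a qualifying rare variant
      d = controls not carrying one
    Returns (chi-square statistic, p-value, Mantel-Haenszel pooled odds ratio).
    """
    observed = expected = variance = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if variance == 0:
        return 0.0, 1.0, float("nan")
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = or_num / or_den if or_den > 0 else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical per-ethnicity counts for a single gene (illustrative only).
strata = {
    "stratum_1": (12, 988, 30, 49970),
    "stratum_2": (3, 197, 10, 9990),
}
chi2, p_value, odds_ratio = cmh_test(strata.values())
print(f"chi2={chi2:.2f} p={p_value:.3g} pooled OR={odds_ratio:.2f}")
```

Stratifying in this way compares cases only with controls of the same ethnicity, so differences in rare variant frequency between populations do not by themselves produce an association signal. With public summary counts, the control carrier counts would in practice have to be estimated from per-variant allele counts, which this sketch does not attempt.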

DOI 10.1101/2021.09.29.462472
Language English
Journal bioRxiv

Full Text