bioRxiv | 2021

Accurate assignment of disease liability to genetic variants using only population data

Abstract


Purpose: The growing size of public variant repositories prompted us to test how accurately the pathogenicity of DNA variants can be predicted using population data alone.

Methods: Under the a priori assumption that the ratios of variant prevalence in healthy versus affected populations form two distinct distributions (pathogenic and benign), we used a Bayesian method to assign each variant a probability of belonging to either distribution.

Results: The approach, termed BayPR, accurately parsed 300 of 313 expertly curated cystic fibrosis transmembrane conductance regulator (CFTR) variants: 284 of 296 pathogenic/likely pathogenic (P/LP) variants in one distribution and 16 of 17 benign/likely benign (B/LB) variants in the other. BayPR produced an area under the receiver operating characteristic curve (AUC) of 0.99 for 103 functionally confirmed missense CFTR variants, equal to or exceeding that of ten commonly used algorithms (AUC range: 0.54 to 0.99). Application of BayPR to expertly curated variants in eight genes associated with seven Mendelian conditions assigned ≥80% disease-causing probability to 1,350 of 1,374 (98.3%) P/LP variants and ≤20% to 22 of 23 (95.7%) B/LB variants.

Conclusion: Agnostic to variant type or functional effect, BayPR provides probabilities of pathogenicity for DNA variants responsible for Mendelian disorders using only variant counts in affected and unaffected population samples.
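
The two-distribution idea described in the abstract can be illustrated with a minimal sketch. This is not the BayPR implementation, whose details the abstract does not give. The sketch assumes, purely for illustration, that the log10 ratio of variant frequency in affected versus unaffected samples follows one of two Gaussian components, and it applies Bayes' rule to obtain the probability of the pathogenic component. The function names, the pseudocount, the 0.5 prior, the component means and standard deviations, and the direction of the ratio are all hypothetical choices.

```python
import math


def log_prevalence_ratio(affected_count, affected_total,
                         control_count, control_total, pseudo=0.5):
    """log10 of (frequency in affected) / (frequency in unaffected).

    The pseudocount (an assumption) avoids division by zero for
    variants that are unobserved in one of the two samples.
    """
    freq_affected = (affected_count + pseudo) / (affected_total + 2 * pseudo)
    freq_control = (control_count + pseudo) / (control_total + 2 * pseudo)
    return math.log10(freq_affected / freq_control)


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def posterior_pathogenic(x, prior=0.5,
                         pathogenic=(2.0, 0.75), benign=(0.0, 0.75)):
    """Posterior probability that x was drawn from the pathogenic component.

    The (mean, sd) pairs are hypothetical values, not fitted parameters.
    Working with log-odds keeps the calculation numerically stable.
    """
    log_odds = (math.log(prior / (1.0 - prior))
                + normal_logpdf(x, *pathogenic)
                - normal_logpdf(x, *benign))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Made-up allele counts, for illustration only.
enriched = log_prevalence_ratio(50, 10_000, 1, 100_000)
common = log_prevalence_ratio(10, 10_000, 100, 100_000)
print(posterior_pathogenic(enriched), posterior_pathogenic(common))
```

In a real analysis the component parameters and the prior would be estimated from data rather than fixed by hand. The ≥80% and ≤20% cut-offs quoted in the abstract would then be applied to the resulting posterior probabilities.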

DOI 10.1101/2021.04.19.440463
Language English
Journal bioRxiv

Full Text