bioRxiv | 2021

Genomic analyses of 10,376 individuals provides comprehensive map of genetic variations, structure and reference haplotypes for Chinese population

Abstract


Here, we initiated the Westlake BioBank for Chinese (WBBC) pilot project with 4,535 individuals with whole-genome sequencing data and 5,841 individuals with high-density genotyping data. We identified 80.99 million SNPs and INDELs, of which 38.6% are novel. The genetic structure of the Chinese population supports the geographical boundaries formed by the Qinling-Huaihe Line and the Nanling Mountains. The genetic architecture within North Han was more homogeneous than that within South Han, and the effective population size history of Lingnan began to diverge from those of the other three regions starting 6 thousand years ago. In addition, we identified a novel locus (SNX29) under selection pressure and confirmed several loci associated with alcohol metabolism and the histocompatibility system. We observed significant selection on genes involved in epidermal cell differentiation and skin development only in southern Chinese populations. Finally, the WBBC haplotype panel, a population-specific reference panel, substantially improved imputation performance for low-frequency and rare variants in the Chinese population compared with the 1000 Genomes Project (1KG) panel, and merging East Asian (EAS) individuals into WBBC to increase its haplotype size improved performance across all minor allele frequency (MAF) bins. We provide an online imputation server (https://wbbc.westlake.edu.cn/) that can yield higher imputation accuracy than existing panels, especially for lower-frequency variants.
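Imputation panels are commonly compared by grouping variants into MAF bins and summarising accuracy within each bin. The abstract does not state which accuracy metric or bin edges WBBC used, so the sketch below is only an illustration under assumptions: it uses the widely used per-variant squared Pearson correlation (r²) between imputed dosages and true genotypes (coded 0/1/2), and the bin edges in `MAF_EDGES` and all function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical MAF bin edges, for illustration only (not taken from the WBBC study).
MAF_EDGES = [0.0, 0.001, 0.005, 0.01, 0.05, 0.5]


def minor_allele_frequency(genotypes):
    """MAF from true genotypes coded as alternate-allele counts (0, 1, 2)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def squared_pearson(x, y):
    """Squared Pearson correlation of x and y; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    if sxx == 0 or syy == 0:
        return None
    return (sxy * sxy) / (sxx * syy)


def r2_by_maf_bin(true_genotypes, imputed_dosages, edges=MAF_EDGES):
    """Mean per-variant r^2 between imputed dosages and true genotypes, per MAF bin.

    Each argument holds one list per variant, with one entry per individual.
    Bins are half-open [low, high); a MAF equal to the last edge (0.5) is
    placed in the last bin. Returns {(low, high): mean_r2} for non-empty bins.
    """
    per_bin = {}
    for truth, dosage in zip(true_genotypes, imputed_dosages):
        r2 = squared_pearson(truth, dosage)
        if r2 is None:
            continue  # monomorphic variants cannot be scored by correlation
        maf = minor_allele_frequency(truth)
        k = min(bisect_right(edges, maf), len(edges) - 1) - 1
        per_bin.setdefault((edges[k], edges[k + 1]), []).append(r2)
    return {b: sum(v) / len(v) for b, v in per_bin.items()}


if __name__ == "__main__":
    # Toy data: one common variant and one rare variant (1 carrier in 100 people).
    truth = [[0, 1, 2, 0], [1] + [0] * 99]
    dosage = [[0.1, 0.9, 1.7, 0.2], [0.9] + [0.0] * 99]
    for (low, high), r2 in sorted(r2_by_maf_bin(truth, dosage).items()):
        print(f"MAF [{low}, {high}): mean r2 = {r2:.3f}")
```

Because a rare variant has very few carriers, a single mis-imputed carrier can change its r² sharply, which is one reason imputation gains are usually reported separately for low-frequency and rare bins.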

DOI 10.21203/rs.3.rs-184446/v1
Language English
Journal bioRxiv

Full Text