Yungang He
CAS-MPG Partner Institute for Computational Biology
Publication
Featured research published by Yungang He.
Nature | 2004
Bo Wen; Hui Li; Daru Lu; Xiufeng Song; Feng Zhang; Yungang He; Feng Li; Yang Gao; Xianyun Mao; Liang Zhang; Ji Qian; Jingze Tan; Jianzhong Jin; Wei Huang; Ranjan Deka; Bing Su; Ranajit Chakraborty; Li Jin
The spread of culture and language in human populations is explained by two alternative models: the demic diffusion model, which involves mass movement of people; and the cultural diffusion model, which refers to cultural impact between populations and involves limited genetic exchange between them. The mechanism of the peopling of Europe has long been debated, a key issue being whether the diffusion of agriculture and language from the Near East was concomitant with a large movement of farmers. Here we show, by systematically analysing Y-chromosome and mitochondrial DNA variation in Han populations, that the pattern of the southward expansion of Han culture is consistent with the demic diffusion model, and that males played a larger role than females in this expansion. The Han people, who all share the same culture and language, exceed 1.16 billion (2000 census), and are by far the largest ethnic group in the world. The expansion process of Han culture is thus of great interest to researchers in many fields.
American Journal of Human Genetics | 2009
Shuhua Xu; Xianyong Yin; Shilin Li; Wenfei Jin; Haiyi Lou; Ling Yang; Xiaohong Gong; Hongyan Wang; Yiping Shen; Xuedong Pan; Yungang He; Yajun Yang; Yi Wang; Wenqing Fu; Yu An; Jiucun Wang; Jingze Tan; Ji Qian; Xiaoli Chen; Xin Zhang; Yangfei Sun; Xuejun Zhang; Bai-Lin Wu; Li Jin
To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily on Europeans. Han Chinese, the largest ethnic group in the world, composing 20% of the entire global human population, is largely underrepresented in such studies. A well-recognized challenge is the fact that population structure can cause spurious associations in GWAS. In this study, we examined population substructures in a diverse set of over 1700 Han Chinese samples collected from 26 regions across China, each genotyped at approximately 160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern Han, central Han, and southern Han. However, simulated case-control studies showed that genetic differentiation among these clusters, although very small (F_ST = 0.0002 to 0.0009), is sufficient to lead to an inflated rate of false-positive results even when the sample size is moderate. The top two SNPs with the greatest frequency differences between the northern Han and southern Han clusters (F_ST > 0.06) were found in the FADS2 gene, which associates with the fatty acid composition in phospholipids, and in the HLA complex P5 gene (HCP5), which associates with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that most differentiated genes among clusters are involved in cardiac arteriopathy (p < 10^-101). These signals indicating significant differences among Han Chinese subpopulations should be carefully explained in case they are also detected in association studies, especially when sample sources are diverse.
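To give a sense of the scale of the F_ST values quoted above, the following is a minimal sketch of Wright's F_ST for a single biallelic SNP in two equally weighted populations. The function name and the allele frequencies are illustrative assumptions; this is not the estimator or the data used in the study.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are hypothetical allele frequencies in each population.
    """
    p_bar = (p1 + p2) / 2                      # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)              # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                             # monomorphic site: no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t


# Hypothetical frequencies that differ by only two percentage points
print(fst_two_pops(0.49, 0.51))
```

Under these assumptions, a frequency difference of a couple of percentage points already gives an F_ST on the order of 10^-4, the same order of magnitude as the between-cluster values reported in the abstract.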
Annals of Human Genetics | 2008
Feng Zhang; Z. Li; Bo Wen; J. Jiang; M. Shao; Yingnan Zhao; Yungang He; Xiao Song; Ji Qian; Daru Lu; Li Jin
The gene families in the AZFc region of the Y chromosome have been shown to be functionally important in human spermatogenesis. The gr/gr deletion, a partial AZFc deletion that reduces the copy numbers of all the AZFc gene families, was identified as a significant risk factor for spermatogenic impairment in Dutch, Spanish and Italians. However, the presence of this deletion in healthy French and Germans questioned its importance in male infertility. In this study, we have shown that the gr/gr deletion does not render an increased risk in Han Chinese. In fact, the gr/gr deletion is frequent (about 8%) in our survey of 886 East Asians from 8 ethnic groups. Furthermore, the DAZ1/DAZ2 deletion has been detected as the primary subtype of the gr/gr deletion in East Asians, though this doublet has been considered as crucial for normal spermatogenesis in Europeans. The different spermatogenic effects of various types of the partial AZFc deletion suggest that the functional difference between AZFc gene copies is a likely cause of inconsistent associations of the gr/gr deletion with spermatogenic impairment across populations.
PLOS ONE | 2014
Shi Yan; Chuan-Chao Wang; Hong-Xiang Zheng; Wei Wang; Zhendong Qin; Lan-Hai Wei; Yi Wang; Xuedong Pan; Wenqing Fu; Yungang He; Li-Jun Xiong; Wenfei Jin; Shilin Li; Yu An; Hui Li; Li Jin
Demographic change of human populations is one of the central questions for delving into the past of human beings. To identify major population expansions related to male lineages, we sequenced 78 East Asian Y chromosomes at 3.9 Mbp of the non-recombining region, discovered >4,000 new SNPs, and identified many new clades. The relative divergence dates can be estimated much more precisely using a molecular clock. We found that all the Paleolithic divergences were binary; however, three strong star-like Neolithic expansions at ∼6 kya (thousand years ago) (assuming a constant substitution rate of 1×10^-9/bp/year) indicate that ∼40% of modern Chinese are patrilineal descendants of only three super-grandfathers at that time. This observation suggests that the main patrilineal expansion in China occurred in the Neolithic Era and might be related to the development of agriculture.
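The molecular-clock reasoning above can be sketched as a short calculation. The substitution rate (1×10^-9/bp/year) and the sequenced length (3.9 Mbp) are taken from the abstract; the function name and the mutation count are hypothetical, chosen only to illustrate the arithmetic, and the study's own dating procedure may differ.

```python
def clock_age_years(n_mutations: float, rate_per_bp_year: float, length_bp: float) -> float:
    """Age of a lineage under a strict molecular clock.

    n_mutations is the number of substitutions accumulated along one lineage
    since the ancestor; the expected count is rate * length * time.
    """
    return n_mutations / (rate_per_bp_year * length_bp)


RATE = 1e-9        # substitutions per bp per year (from the abstract)
LENGTH = 3.9e6     # bp of non-recombining Y sequence (from the abstract)

# Hypothetical count: about 23.4 substitutions along a lineage
age = clock_age_years(23.4, RATE, LENGTH)
print(round(age))  # roughly 6,000 years, i.e. about 6 kya
```

With these inputs, one substitution is expected roughly every 256 years across the sequenced region, which is why a few dozen lineage-specific mutations correspond to a Neolithic time depth.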
Molecular Biology and Evolution | 2013
Longli Kang; Hong-Xiang Zheng; Feng Chen; Shi Yan; Kai Liu; Zhendong Qin; Lijun Liu; Zhipeng Zhao; Lei Li; Xiaofeng Wang; Yungang He; Li Jin
The Sherpa are an ethnic group who have lived on the southern slopes of the Himalayas for hundreds of years. They are famous as extraordinary mountaineers and guides and are considered a good example of successful adaptation to the low-oxygen environment of the Tibetan highlands. Mitochondrial DNA (mtDNA) variation might be important in highland adaptation given its role in coding core subunits of oxidative phosphorylation in mitochondria. In this study, we sequenced the complete mtDNA genomes of 76 unrelated Sherpa individuals. Generally, the Sherpa mtDNA haplogroup composition was close to that of Tibetan populations. However, we found three lineage expansions in Sherpas, two of which (C4a3b1 and A4e3a) were Sherpa-specific. Both lineage expansions might have begun within the past few hundred years. In particular, nine individuals carry identical haplogroup C4a3b1. Based on the history of the Sherpas and a Bayesian skyline plot, we constructed various demographic models and found that these lineage expansions are unlikely to occur under neutral models, especially for C4a3b1. Nonsynonymous mutations harbored in C4a3b1 (G3745A) and A4e3a (T4216C) are both ND1 mutants (A147T and Y304H, respectively). Secondary structure predictions showed that G3745A was structurally close to other pathogenic mutants, whereas T4216C itself was reported as the primary mutation for Leber's hereditary optic neuropathy. Thus, we propose that these mutations had a certain effect on Complex I function and might be important in the high-altitude adaptation of the Sherpa people.
Proceedings of the National Academy of Sciences of the United States of America | 2006
Wei Huang; Yungang He; Haifeng Wang; Ying Wang; Yangfan Liu; Yi Wang; Xun Chu; Liang Xu; Yayun Shen; Xiaoyan Xiong; Hui Li; Bo Wen; Ji Qian; Wentao Yuan; Chenhui Zhang; Hongquan Jiang; Guoping Zhao; Zhu Chen; Li Jin
The discovery of the block-like structure of linkage disequilibrium (LD) in human populations holds the promise of delineating the etiology of common diseases. However, understanding the magnitude, mechanism, and utility of between-population LD sharing is critical for future genome-wide association studies. In this study, substantial LD sharing between six non-African populations was observed, although much less between African-American and non-African, based on 20,000 SNPs of chromosome 21. We also demonstrated the respective roles of recombination and demographic events in shaping LD sharing. Furthermore, we showed that the haplotype-tagged SNPs chosen from one population are portable to the others in East Asia. Therefore, we concluded that the magnitude of LD sharing between human populations justifies the use of representative populations for selecting haplotype-tagged SNPs in genome-wide association studies of complex diseases.
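As a small illustration of the quantity behind linkage disequilibrium, the sketch below computes the common r² measure for two biallelic SNPs from haplotype and allele frequencies. The function name and the example frequencies are assumptions for illustration; the abstract does not state which LD statistic or pipeline the study used.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical frequencies: A and B always occur together (complete LD)
print(ld_r_squared(0.5, 0.5, 0.5))
# Hypothetical frequencies: A and B are statistically independent
print(ld_r_squared(0.25, 0.5, 0.5))
```

A tag SNP chosen in one population is portable to another only to the extent that r² between the tag and the SNPs it represents stays high in both populations.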
Scientific Reports | 2012
Yungang He; Wei R. Wang; Shuhua Xu; Li Jin
The genetic origins of Japanese populations have been controversial. Upper Paleolithic Japanese, i.e. Jomon, developed independently in the Japanese islands for more than 10,000 years until the isolation was ended by influxes of continental immigrants about 2,000 years ago. However, knowledge of the origin of Jomon and its contribution to the gene pool of contemporary Japanese is still limited, despite extensive studies using mtDNA and Y chromosomes. In this report, we aimed to infer the origin of Jomon and to estimate its contribution to Japanese by fitting an admixture model with missing data from Jomon to genome-wide data from 94 worldwide populations. Our results showed that the genetic contributions of Jomon, the Paleolithic contingent in Japanese, are 54.3∼62.3% in Ryukyuans and 23.1∼39.5% in mainland Japanese, respectively. Utilizing inferred allele frequencies of the Jomon population, we further showed that the Paleolithic contingent in Japanese had a Northeast Asian origin.
Human Mutation | 2010
Zhimin Wang; Yanping Li; Beilan Wang; Yungang He; Yi Wang; Huifeng Xi; Yifeng Li; Ying Wang; Dingliang Zhu; Jianzhong Jin; Wei Huang; Li Jin
Our previous study in an isolated population showed an association between a genetic variant in the catalase gene (CAT) and essential hypertension (EH). This study indicates that three variants in the promoter and 5′‐UTR region of CAT are predominant in Chinese Han, and they form two major haplotypes. A case–control study showed that the CATH2 haplotype confers susceptibility to EH (P_genotype=0.0017, and P_allele=0.00078). Subjects bearing CATH1/CATH2 and CATH2/CATH2 genotypes demonstrated a higher susceptibility to EH than CATH1/CATH1 homozygotes, with odds ratios of 1.474 and 1.625, respectively. Also, CATH1/CATH1 individuals had a later‐onset age (P=0.015). Expression analysis using luciferase reporter vectors indicated that the CATH1 haplotype showed a lower transcriptional activity than the haplotype CATH2 (P<0.05 in all four cell lines), and we observed similar results in the endogenous allelic expression ratios of CATH1/CATH2 in cell lines. In contrast, most CATH1 haplotypes showed a higher transcription level than CATH2 haplotypes (10 out of 11 or 90.9%) in blood from normal individuals (P<0.01). We therefore hypothesize that CATH1 and CATH2 may play alternating roles at different levels of oxidative stress. Hum Mutat 31:272–278, 2010.
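For readers unfamiliar with how odds ratios such as those above are derived, here is a minimal sketch of the calculation from a 2×2 case-control table. The function name and the counts are hypothetical; the abstract reports only the resulting odds ratios, not the underlying genotype counts.

```python
def odds_ratio(case_exposed: int, case_unexposed: int,
               control_exposed: int, control_unexposed: int) -> float:
    """Odds ratio from a 2x2 case-control table.

    "Exposed" here would mean carrying the risk genotype and "unexposed"
    the reference genotype (for example CATH1/CATH1).
    """
    if case_unexposed == 0 or control_exposed == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)


# Hypothetical counts, chosen only to show the arithmetic
print(odds_ratio(60, 40, 50, 50))
```

An odds ratio above 1 means the odds of carrying the risk genotype are higher among cases than among controls.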
Scientific Reports | 2015
Bin Zhou; Hui Dong; Yungang He; Jian Sun; Weirong Jin; Qing Xie; Rong Fan; Minxian Wang; Ran Li; Yangyi Chen; Shaoqing Xie; Yan Shen; Xin Huang; Wang S; Fengming Lu; Jidong Jia; Zhuang H; Stephen Locarnini; Guoping Zhao; Li Jin; Jinlin Hou
Reverse transcriptase (RT) mutations contribute to hepatitis B virus resistance during antiviral therapy with nucleos(t)ide analogs. However, the composition of the RT quasispecies and their interactions during antiviral treatment have not yet been thoroughly defined. In this report, 10 patients from each of 3 different virological response groups, i.e., complete virological response, partial virological response and virological breakthrough, were selected from a multicenter trial of Telbivudine treatment. Variations in the drug resistance-related critical RT regions in 107 serial serum samples from the 30 patients were examined by ultra-deep sequencing. A total of 496,577 sequence reads were obtained, with an average sequencing coverage of 4,641X per sample. The phylogenies of the quasispecies revealed the independent origins of two critical quasispecies, i.e., the rtA181T and rtM204I mutants. Data analyses and theoretical modeling showed a cooperative-competitive interplay among the quasispecies. In particular, rtM204I mutants compete against other quasispecies, which eventually leads to virological breakthrough. However, in the absence of rtM204I mutants, synergistic growth of the drug-resistant rtA181T mutants with the wild-type quasispecies could drive the composition of the viral population into a state of partial virological response. Furthermore, we demonstrated that the frequency of drug-resistant mutations in the early phase of treatment is important for predicting the virological response to antiviral therapy.
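The coverage figure above can be checked with simple arithmetic. The sketch assumes that "average sequencing coverage per sample" means total reads divided by the number of serum samples, which is consistent with the reported numbers but is not stated explicitly in the abstract; the variable names are illustrative.

```python
TOTAL_READS = 496_577   # sequence reads reported in the abstract
N_SAMPLES = 107         # serial serum samples reported in the abstract

# Assumption: each read spans the targeted RT region, so reads per sample
# approximates the depth of coverage for that sample.
mean_coverage = TOTAL_READS / N_SAMPLES
print(round(mean_coverage))
```

Under that assumption the result rounds to the 4,641X reported in the abstract.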
Molecular Biology and Evolution | 2014
Minxian Wang; Xin Huang; Ran Li; Hongyang Xu; Li Jin; Yungang He
Studies of natural selection, followed by functional validation, are shedding light on the genetic mechanisms underlying human evolution and adaptation. Classic methods for detecting selection, such as the integrated haplotype score (iHS) and Fay and Wu's H statistic, are useful for searching for candidate genes under positive selection. These methods, however, have limited capability to localize causal variants in selection target regions. In this study, we developed a novel method based on the conditional coalescent tree to detect recent positive selection by counting unbalanced mutations on coalescent gene genealogies. Extensive simulation studies revealed that our method is more robust than many other approaches against biases due to various demographic effects, including population bottleneck, expansion, or stratification, while not sacrificing its power. Furthermore, our method demonstrated its superiority in localizing causal variants from massive linked genetic variants. The rate of successful localization was about 20-40% higher than that of other state-of-the-art methods on simulated data sets. On empirical data, validated functional causal variants of four well-known positively selected genes (ADH1B, MCM6, APOL1, and HBB) were all successfully localized by our method. Finally, the computational efficiency of this new method was much higher than that of iHS implementations, that is, 24-66 times faster than the REHH package, and more than 10,000 times faster than the original iHS implementation. These magnitudes make our method suitable for application to large sequencing data sets. Software can be downloaded from https://github.com/wavefancy/scct.
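For context on the haplotype-based statistics that the new method is compared against, the sketch below computes extended haplotype homozygosity (EHH), the quantity that iHS integrates over distance. This is an illustration of the classic building block only, not the conditional coalescent tree method described above; the function name and the toy haplotypes are assumptions.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], end: int) -> float:
    """Extended haplotype homozygosity over the first `end` sites.

    `haplotypes` holds the chromosomes that carry the same core allele,
    each written as a string of alleles extending away from the core.
    EHH is the probability that two randomly chosen chromosomes are
    identical across the whole interval.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(h[:end] for h in haplotypes)   # group identical extended haplotypes
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data: four chromosomes that are identical for two sites, then split
haps = ["AAC", "AAC", "AAT", "AAT"]
print(ehh(haps, 2))
print(ehh(haps, 3))
```

EHH decays as the interval is extended; a slow decay around a derived allele is the signal of recent positive selection that haplotype-based tests look for.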