Zhenglong Gu
Cornell University
Publication
Featured research published by Zhenglong Gu.
Nature | 2003
Zhenglong Gu; Lars M. Steinmetz; Xun Gu; Curt Scharfe; Ronald W. Davis; Wen-Hsiung Li
Deleting a gene in an organism often has little phenotypic effect, owing to two mechanisms of compensation. The first is the existence of duplicate genes: that is, the loss of function in one copy can be compensated by the other copy or copies. The second mechanism of compensation stems from alternative metabolic pathways, regulatory networks, and so on. The relative importance of the two mechanisms has not been investigated except for a limited study, which suggested that the role of duplicate genes in compensation is negligible. The availability of fitness data for a nearly complete set of single-gene-deletion mutants of the Saccharomyces cerevisiae genome has enabled us to carry out a genome-wide evaluation of the role of duplicate genes in genetic robustness against null mutations. Here we show that there is a significantly higher probability of functional compensation for a duplicate gene than for a singleton, a high correlation between the frequency of compensation and the sequence similarity of two duplicates, and a higher probability of a severe fitness effect when the duplicate copy that is more highly expressed is deleted. We estimate that in S. cerevisiae at least a quarter of those gene deletions that have no phenotype are compensated by duplicate genes.
Nature | 2001
Wen-Hsiung Li; Zhenglong Gu; Haidong Wang; Anton Nekrutenko
The completion of the human genome will greatly accelerate the development of a new branch of science—evolutionary genomics. We can now directly address important questions about the evolutionary history of human genes and their regulatory sequences. Computational analyses of the human genome will reveal the number of genes and repetitive elements, the extent of gene duplication and compositional heterogeneity in the human genome, and the extent of domain shuffling and domain sharing among proteins. Here we present some first glimpses of these features.
Trends in Genetics | 2002
Zhenglong Gu; Dan L. Nicolae; Henry H-S. Lu; Wen-Hsiung Li
For more than 30 years, expression divergence has been considered as a major reason for retaining duplicated genes in a genome, but how often and how fast duplicate genes diverge in expression has not been studied at the genomic level. Using yeast microarray data, we show that expression divergence between duplicate genes is significantly correlated with their synonymous divergence (K(S)) and also with their nonsynonymous divergence (K(A)) if K(A) ≤ 0.3. Thus, expression divergence increases with evolutionary time, and K(A) is initially coupled with expression divergence. More interestingly, a large proportion of duplicate genes have diverged quickly in expression and the vast majority of gene pairs eventually become divergent in expression. Indeed, more than 40% of gene pairs show expression divergence even when K(S) is ≤ 0.10, and this proportion becomes >80% for K(S) > 1.5. Only a small fraction of ancient gene pairs do not show expression divergence.
Proceedings of the National Academy of Sciences of the United States of America | 2007
Wu Wei; John H. McCusker; Richard W. Hyman; Ted Jones; Ye Ning; Zhiwei Cao; Zhenglong Gu; Dan Bruno; Molly Miranda; Michelle Nguyen; Julie Wilhelmy; Caridad Komp; Raquel Tamse; Xiaojing Wang; Peilin Jia; Philippe P. Luedi; Peter J. Oefner; Lior David; Fred S. Dietrich; Yixue Li; Ronald W. Davis; Lars M. Steinmetz
We sequenced the genome of Saccharomyces cerevisiae strain YJM789, which was derived from a yeast isolated from the lung of an AIDS patient with pneumonia. The strain is used for studies of fungal infections and quantitative genetics because of its extensive phenotypic differences to the laboratory reference strain, including growth at high temperature and deadly virulence in mouse models. Here we show that the ≈12-Mb genome of YJM789 contains ≈60,000 SNPs and ≈6,000 indels with respect to the reference S288c genome, leading to protein polymorphisms with a few known cases of phenotypic changes. Several ORFs are found to be unique to YJM789, some of which might have been acquired through horizontal transfer. Localized regions of high polymorphism density are scattered over the genome, in some cases spanning multiple ORFs and in others concentrated within single genes. The sequence of YJM789 contains clues to pathogenicity and spurs the development of more powerful approaches to dissecting the genetic basis of complex hereditary traits.
Nature Genetics | 2004
Zhenglong Gu; Scott A. Rifkin; Kevin P. White; Wen-Hsiung Li
Using microarray gene expression data from several Drosophila species and strains, we show that duplicated genes, compared with single-copy genes, significantly increase gene expression diversity during development. We show further that duplicate genes tend to cause expression divergences between Drosophila species (or strains) to evolve faster than do single-copy genes. This conclusion is also supported by data from different yeast strains.
Genome Biology | 2003
Peng Zhang; Zhenglong Gu; Wen-Hsiung Li
Background: Following gene duplication, two duplicate genes may experience relaxed functional constraints or acquire different mutations, and may also diverge in function. Whether the two copies will evolve in different patterns remains unclear, however, because previous studies have reached conflicting conclusions. In order to resolve this issue, by providing a general picture, we studied 250 independent pairs of young duplicate genes from the whole human genome. Results: We showed that nearly 60% of the young duplicate gene pairs have evolved at the amino-acid level at significantly different rates from each other. More than 25% of these gene pairs also showed significantly different ratios of nonsynonymous to synonymous rates (Ka/Ks ratios). Moreover, duplicate pairs with different rates of amino-acid substitution also tend to differ in the Ka/Ks ratio, with the fast-evolving copy tending to have a slightly higher Ks than the slow-evolving one. Lastly, a substantial portion of fast-evolving copies have accumulated amino-acid substitutions evenly across the protein sequences, whereas most of the slow-evolving copies exhibit uneven substitution patterns. Conclusions: Our results suggest that duplicate genes tend to evolve in different patterns following the duplication event. One copy evolves faster than the other and accumulates amino-acid substitutions evenly across the sequence, whereas the other copy evolves more slowly and accumulates amino-acid substitutions unevenly across the sequence. Such different evolutionary patterns may be largely due to different functional constraints on the two copies.
Proceedings of the National Academy of Sciences of the United States of America | 2014
Kaixiong Ye; Jian Lu; Fei Ma; Alon Keinan; Zhenglong Gu
Significance: There are hundreds to thousands of copies of mitochondrial DNA (mtDNA) in each human cell in contrast to only two copies of nuclear DNA. High-frequency pathogenic mtDNA mutations have been found in patients with classic mitochondrial diseases, premature aging, cancers, and neurodegenerative diseases. In this study we investigated the distribution of heteroplasmic mutations, their pathogenic potential, and their underlying evolutionary forces using genome sequence data from the 1000 Genomes Project. Our results demonstrated the prevalence of low-frequency high-pathogenic-potential mtDNA mutations in healthy human individuals. These deleterious mtDNA mutations, when reaching high frequency, could provide a likely source of mitochondrial dysfunction. Managing the expansion of deleterious mtDNA mutations could be a promising means of preventing disease progression.
Abstract: A majority of mitochondrial DNA (mtDNA) mutations reported to be implicated in diseases are heteroplasmic, a status with coexisting mtDNA variants in a single cell. Quantifying the prevalence of mitochondrial heteroplasmy and its pathogenic effect in healthy individuals could further our understanding of its possible roles in various diseases. A total of 1,085 human individuals from 14 global populations have been sequenced by the 1000 Genomes Project to a mean coverage of ∼2,000× on mtDNA. Using a combination of stringent thresholds and a maximum-likelihood method to define heteroplasmy, we demonstrated that ∼90% of the individuals carry at least one heteroplasmy. At least 20% of individuals harbor heteroplasmies reported to be implicated in disease. Mitochondrial heteroplasmies tend to show high pathogenicity and are significantly overrepresented in disease-associated loci. Consistent with their deleterious effect, heteroplasmies with derived allele frequency larger than 60% within an individual show a significant reduction in pathogenicity, indicating the action of purifying selection. Purifying selection on heteroplasmies can also be inferred from nonsynonymous and synonymous heteroplasmy comparison and the unfolded site frequency spectra for different functional sites in mtDNA. Nevertheless, in comparison with population polymorphic mtDNA mutations, the purifying selection is much less efficient in removing heteroplasmic mutations. The prevalence of mitochondrial heteroplasmy with high pathogenic potential in healthy individuals, along with the possibility of these mutations drifting to high frequency inside a subpopulation of cells across lifespan, emphasizes the importance of managing mitochondrial heteroplasmy to prevent disease progression.
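As a rough illustration of what a threshold-based heteroplasmy screen at a single mtDNA site might look like, the Python sketch below computes the minor allele fraction from per-base read counts. It is not the study's method: the maximum-likelihood step mentioned in the abstract is omitted, and the function names and the `min_fraction` and `min_depth` cutoffs are hypothetical values chosen only for this example.

```python
from typing import Dict


def minor_allele_fraction(base_counts: Dict[str, int]) -> float:
    """Fraction of reads supporting the second most common base at one site."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    ordered = sorted(base_counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return second / total


def is_candidate_heteroplasmy(
    base_counts: Dict[str, int],
    min_fraction: float = 0.01,  # hypothetical cutoff, not taken from the paper
    min_depth: int = 100,        # hypothetical cutoff, not taken from the paper
) -> bool:
    """Flag a site when depth and minor allele fraction both pass the cutoffs."""
    total = sum(base_counts.values())
    if total < min_depth:
        return False
    return minor_allele_fraction(base_counts) >= min_fraction


if __name__ == "__main__":
    site = {"A": 1900, "G": 100}  # made-up read counts for one site
    print(minor_allele_fraction(site), is_candidate_heteroplasmy(site))
```

A real pipeline would also need to account for sequencing error, strand bias, and nuclear copies of mitochondrial sequence, none of which this sketch attempts.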
Gene | 2000
Zhenglong Gu; Haidong Wang; Anton Nekrutenko; Wen-Hsiung Li
The densities of repetitive elements in the human genome were calculated in each GC content class using non-overlapping windows of 50 kb. The density of Alu is two to three times higher in GC-rich regions than in AT-rich regions, while the opposite is true for LINE1. In contrast, LINE2 and other elements, such as DNA transposons, are more uniformly distributed in the genome. The number of Alus in the human genome was estimated to be 1.4 million, higher than previous estimates. About 40% of the autosomes and approximately 51% of the X and Y chromosomes are occupied by repetitive elements. In total, the human genome is estimated to contain more than 4 million repetitive elements. The GC contents (%) of repetitive elements and their flanking regions were also calculated. The GC contents of almost all kinds of repeats are positively correlated with the window GC contents, suggesting that a repetitive sequence is subject to the same mutation pressure as its surrounding regions, so it tends to have the same GC content as its surrounding regions. This observation supports the regional mutation hypothesis. The only two exceptions are AluYa and AluYb8, the two youngest Alu subfamilies. The GC content of AluYb8 is negatively correlated with that of its surrounding regions, while AluYa shows no correlation, suggesting different insertion patterns for these two young Alu subfamilies. This suggestion was supported by the fact that the average genetic distance between members of AluYb8 in each GC window class is positively correlated with the GC content of the window, but no correlation was found for AluYa. AluYa is more frequent on the Y chromosome than on other chromosomes; the same is true for LTR retroviruses. This pattern might be correlated with the evolutionary history of the Y chromosome.
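The windowing idea in this abstract can be sketched in a few lines of Python: split a sequence into non-overlapping windows, compute the GC fraction of each, and count how many repeat elements start in each window. This is only an illustration under stated assumptions. The function names are hypothetical, repeats are represented by made-up start coordinates, and the paper's actual annotation and GC-class binning procedure is not reproduced.

```python
from typing import List, Sequence


def gc_content_by_window(sequence: str, window_size: int = 50_000) -> List[float]:
    """GC fraction of each full non-overlapping window (trailing partial window dropped)."""
    fractions = []
    for start in range(0, len(sequence) - window_size + 1, window_size):
        window = sequence[start:start + window_size].upper()
        gc = window.count("G") + window.count("C")
        acgt = gc + window.count("A") + window.count("T")
        # Windows with no A/C/G/T bases (for example all N) get NaN.
        fractions.append(gc / acgt if acgt else float("nan"))
    return fractions


def repeat_counts_by_window(
    repeat_starts: Sequence[int], sequence_length: int, window_size: int = 50_000
) -> List[int]:
    """Number of repeat elements whose 0-based start falls in each full window."""
    n_windows = sequence_length // window_size
    counts = [0] * n_windows
    for pos in repeat_starts:
        idx = pos // window_size
        if 0 <= idx < n_windows:
            counts[idx] += 1
    return counts


if __name__ == "__main__":
    toy = "GGCCAATT"  # toy sequence; a tiny window size stands in for 50 kb
    print(gc_content_by_window(toy, window_size=4))
    print(repeat_counts_by_window([0, 1, 5], len(toy), window_size=4))
```

Pairing the two lists window by window gives the GC fraction and repeat count needed to compare repeat density across GC content classes.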
PLOS Genetics | 2010
K. T. Nishant; Wu Wei; Eugenio Mancera; Juan Lucas Argueso; Andreas Schlattl; Nicolas Delhomme; Xin Ma; Carlos Bustamante; Jan O. Korbel; Zhenglong Gu; Lars M. Steinmetz; Eric Alani
Accurate estimates of mutation rates provide critical information to analyze genome evolution and organism fitness. We used whole-genome DNA sequencing, pulsed-field gel electrophoresis, and comparative genome hybridization to determine mutation rates in diploid vegetative and meiotic mutation accumulation lines of Saccharomyces cerevisiae. The vegetative lines underwent only mitotic divisions while the meiotic lines underwent a meiotic cycle every ∼20 vegetative divisions. Similar base substitution rates were estimated for both lines. Given our experimental design, these measures indicated that the meiotic mutation rate is within the range of being equal to zero to being 55-fold higher than the vegetative rate. Mutations detected in vegetative lines were all heterozygous while those in meiotic lines were homozygous. A quantitative analysis of intra-tetrad mating events in the meiotic lines showed that inter-spore mating is primarily responsible for rapidly fixing mutations to homozygosity as well as for removing mutations. We did not observe 1–2 nt insertion/deletion (in-del) mutations in any of the sequenced lines and only one structural variant in a non-telomeric location was found. However, a large number of structural variations in subtelomeric sequences were seen in both vegetative and meiotic lines that did not affect viability. Our results indicate that the diploid yeast nuclear genome is remarkably stable during the vegetative and meiotic cell cycles and support the hypothesis that peripheral regions of chromosomes are more dynamic than gene-rich central sections where structural rearrangements could be deleterious. This work also provides an improved estimate for the mutational load carried by diploid organisms.
Proceedings of the National Academy of Sciences of the United States of America | 2013
Xiaoqiu Liu; Huifeng Jiang; Zhenglong Gu; Jeffrey W. Roberts
Bacteriophage lambda is one of the most extensively studied organisms and has been a primary model for understanding basic modes of genetic regulation. Here, we examine the progress of lambda gene expression during phage development by ribosome profiling and, thereby, provide a very-high-resolution view of lambda gene expression. The known genes are expressed in a predictable fashion, authenticating the analysis. However, many previously unappreciated potential open reading frames become apparent in the expression analysis, revealing an unexpected complexity in the pattern of lambda gene function.