Zhengdong D. Zhang | Researchain

Publication

Featured research published by Zhengdong D. Zhang.

Nature | 2011

Mapping copy number variation by population-scale genome sequencing

Ryan E. Mills; Klaudia Walter; Chip Stewart; Robert E. Handsaker; Ken Chen; Can Alkan; Alexej Abyzov; Seungtai Yoon; Kai Ye; R. Keira Cheetham; Asif T. Chinwalla; Donald F. Conrad; Yutao Fu; Fabian Grubert; Iman Hajirasouliha; Fereydoun Hormozdiari; Lilia M. Iakoucheva; Zamin Iqbal; Shuli Kang; Jeffrey M. Kidd; Miriam K. Konkel; Joshua M. Korn; Ekta Khurana; Deniz Kural; Hugo Y. K. Lam; Jing Leng; Ruiqiang Li; Yingrui Li; Chang-Yun Lin; Ruibang Luo

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.

Science | 2012

A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes

Daniel G. MacArthur; Suganthi Balasubramanian; Adam Frankish; Ni Huang; James A. Morris; Klaudia Walter; Luke Jostins; Lukas Habegger; Joseph K. Pickrell; Stephen B. Montgomery; Cornelis A. Albers; Zhengdong D. Zhang; Donald F. Conrad; Gerton Lunter; Hancheng Zheng; Qasim Ayub; Mark A. DePristo; Eric Banks; Min Hu; Robert E. Handsaker; Jeffrey A. Rosenfeld; Menachem Fromer; Mike Jin; Xinmeng Jasmine Mu; Ekta Khurana; Kai Ye; Mike Kay; Gary Saunders; Marie-Marthe Suner; Toby Hunt

Defective Gene Detective

Identifying genes that give rise to diseases is one of the major goals of sequencing human genomes. However, putative loss-of-function genes, which are often some of the first identified targets of genome and exome sequencing, have often turned out to be sequencing errors rather than true genetic variants. In order to identify the true scope of loss-of-function genes within the human genome, MacArthur et al. (p. 823; see the Perspective by Quintana-Murci) extensively validated the genomes from the 1000 Genomes Project, as well as an additional European individual, and found that the average person has about 100 true loss-of-function alleles of which approximately 20 have two copies within an individual. Because many known disease-causing genes were identified in “normal” individuals, the process of clinical sequencing needs to reassess how to identify likely causative alleles.

Validation of predicted nonfunctional alleles in the human genome affects the medical interpretation of genomic analyses.

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.

Genome Biology | 2009

PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data

Jan O. Korbel; Alexej Abyzov; Xinmeng Jasmine Mu; Nicholas Carriero; Philip Cayting; Zhengdong D. Zhang; Michael Snyder; Mark Gerstein

Personal-genomics endeavors, such as the 1000 Genomes Project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted, multi-cutoff scoring strategy and showed its relative insensitivity to base-calling errors.
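The basic signal that paired-end mapping exploits can be illustrated with a simplified read-pair classifier. This is a generic sketch of the paired-end idea, not PEMer's code: the ReadPair record, the classify_pair function, the single cutoff k, the fixed read length and the assumption of a forward/reverse (inward-facing) library are all hypothetical, and PEMer's coverage-adjusted multi-cutoff scoring and error models are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for one mapped read pair."""
    chrom1: str
    pos1: int      # leftmost mapped coordinate of read 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # leftmost mapped coordinate of read 2
    strand2: str


def classify_pair(p: ReadPair, mean: float, sd: float,
                  read_len: int, k: float = 3.0) -> str:
    """Label a pair by comparing its implied fragment length with the library
    distribution (mean, sd). Assumes an inward-facing forward/reverse library."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal"
    if p.strand1 == p.strand2:
        # Both reads on the same strand: orientation consistent with an inversion.
        return "inversion"
    # Order the two reads along the chromosome.
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(p.pos1, p.strand1), (p.pos2, p.strand2)])
    if left_strand == "-":
        # Reads face away from each other, as expected across a tandem duplication.
        return "tandem_duplication"
    span = right_pos + read_len - left_pos  # approximate fragment length
    if span > mean + k * sd:
        return "deletion"    # reference carries sequence missing from the sample
    if span < mean - k * sd:
        return "insertion"   # sample carries sequence missing from the reference
    return "concordant"
```

For example, with a library mean of 300 bp, a standard deviation of 30 bp and 100 bp reads, a forward read at position 1000 paired with a reverse read at position 1900 implies a 1000 bp fragment and is labeled a deletion candidate. A real caller would cluster many such discordant pairs before reporting a variant.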

Genome Research | 2013

The origin, evolution, and functional impact of short insertion–deletion variants identified in 179 human genomes

Stephen B. Montgomery; David L. Goode; Erika Kvikstad; Cornelis A. Albers; Zhengdong D. Zhang; Xinmeng Jasmine Mu; Guruprasad Ananda; Bryan Howie; Konrad J. Karczewski; Kevin S. Smith; Vanessa Anaya; Rhea Richardson; Joseph S. Davis; Daniel G. MacArthur; Arend Sidow; Laurent Duret; Mark Gerstein; Kateryna D. Makova; Jonathan Marchini; Gil McVean; Gerton Lunter

Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
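As a worked check on the heterogeneity figures, the per-base indel density inside the 4.03% of the genome that harbors 43%-48% of indels, relative to the density in the remaining 95.97%, follows directly from those two percentages. The function name below is illustrative; only the input percentages come from the abstract.

```python
def density_enrichment(frac_indels: float, frac_genome: float) -> float:
    """Ratio of per-base indel density inside a genomic fraction to the
    density outside it, given the share of indels that fall inside."""
    inside = frac_indels / frac_genome                # indels per unit genome, inside
    outside = (1 - frac_indels) / (1 - frac_genome)   # indels per unit genome, outside
    return inside / outside


low = density_enrichment(0.43, 0.0403)
high = density_enrichment(0.48, 0.0403)
```

With these inputs the enriched fraction carries roughly 18-fold (43%) to 22-fold (48%) more indels per base than the rest of the genome, which is the sense in which the mutagenesis rates are "highly heterogeneous".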

Nucleic Acids Research | 2002

NCIR: a database of non-canonical interactions in known RNA structures

Uma Nagaswamy; Maia Larios-Sanz; James Hury; Shakaala Collins; Zhengdong D. Zhang; Qin Zhao; George E. Fox

The secondary and tertiary structure of an RNA molecule typically includes a number of non-canonical base-base interactions. The known occurrences of these interactions are tabulated in the NCIR database, which can be accessed from http://prion.bchs.uh.edu/bp_type/. The number of examples is now over 1400, which is an increase of >700% since the database was first published. This dramatic increase reflects the addition of data from the recently published crystal structures of the 50S (2.4 Å) and 30S (3.0 Å) ribosomal subunits. In addition, non-canonical interactions observed in published crystal and NMR structures of tRNAs, group I introns, ribozymes, RNA aptamers and synthetic oligonucleotides are included. Properties associated with these interactions, such as sequence context, sugar pucker conformation, glycosidic angle conformation, melting temperature, chemical shift and free energy, are also reported when available. Out of the 29 anticipated pairs with at least two hydrogen bonds, 28 have been observed to date. In addition, several novel examples, not generally predicted, have also been encountered, bringing the total of such pairs to 36. Added to this list are a variety of single, bifurcated, triple and quadruple interactions. The most common non-canonical pairs are the sheared GA, GA imino, AU reverse Hoogsteen, and the GU and AC wobble pairs. The most frequent triple interaction connects N3 of an A with the amino of a G that is also involved in a standard Watson-Crick pair.

Nucleic Acids Research | 2000

Database of non-canonical base pairs found in known RNA structures

Uma Nagaswamy; Neil Voss; Zhengdong D. Zhang; George E. Fox

Atomic resolution RNA structures are being published at an increasing rate. It is common to find a modest number of non-canonical base pairs in these structures in addition to the usual Watson-Crick pairs. This database summarizes the occurrence of these rare base pairs in accordance with standard nomenclature. The database, http://prion.bchs.uh.edu/, contains information such as sequence context, sugar pucker conformation, anti/syn base conformations, chemical shift, pKa values, melting temperature and free energy. Of the 29 anticipated pairs with two or more hydrogen bonds, 20 have been encountered to date. In addition, four unexpected pairs with two hydrogen bonds have been reported bringing the total to 24. Single hydrogen bond versions of five of the expected geometries have been encountered among the single hydrogen bond interactions. In addition, 18 different types of base triplets have been encountered, each of which involves three to six hydrogen bonds. The vast majority of the rare base pairs are antiparallel with the bases in the anti configuration relative to the ribose. The most common are the GU wobble, the Sheared GA pair, the Reverse Hoogsteen pair and the GA imino pair.

Nature Reviews Genetics | 2014

Comparative genetics of longevity and cancer: insights from long-lived rodents

Vera Gorbunova; Andrei Seluanov; Zhengdong D. Zhang; Vadim N. Gladyshev; Jan Vijg

Mammals have evolved a remarkable diversity of ageing rates. Within the single order of Rodentia, maximum lifespans range from 4 years in mice to 32 years in naked mole rats. Cancer rates also differ substantially between cancer-prone mice and almost cancer-proof naked mole rats and blind mole rats. Recent progress in rodent comparative biology, together with the emergence of whole-genome sequence information, has opened opportunities for the discovery of genetic factors that control longevity and cancer susceptibility.

Proceedings of the National Academy of Sciences of the United States of America | 2009

EBNA1 regulates cellular gene expression by binding cellular promoters

Allon Canaan; Izhak Haviv; Alexander E. Urban; Vincent P. Schulz; Steve Hartman; Zhengdong D. Zhang; Dean Palejev; Albert B. Deisseroth; Jill Lacy; Michael Snyder; Mark Gerstein; Sherman M. Weissman

Epstein–Barr virus (EBV) is associated with several types of lymphomas and epithelial tumors including Burkitt's lymphoma (BL), HIV-associated lymphoma, posttransplant lymphoproliferative disorder, and nasopharyngeal carcinoma. EBV nuclear antigen 1 (EBNA1) is expressed in all EBV-associated tumors and is required for latency and transformation. EBNA1 initiates latent viral replication in B cells, maintains the viral genome copy number, and regulates transcription of other EBV-encoded latent genes. These activities are mediated through the ability of EBNA1 to bind viral DNA. To further elucidate the role of EBNA1 in the host cell, we have examined the effect of EBNA1 on cellular gene expression by microarray analysis using the B cell BJAB and the epithelial 293 cell lines transfected with EBNA1. Analysis of the data revealed distinct profiles of cellular gene changes in BJAB and 293 cell lines. Subsequently, chromatin immunoprecipitation revealed a direct binding of EBNA1 to cellular promoters. We have correlated EBNA1-bound promoters with changes in gene expression. Sequence analysis of the 100 most enriched promoters revealed a DNA motif that differs from the EBNA1 binding site in the EBV genome.

PLOS Computational Biology | 2008

Modeling ChIP Sequencing In Silico with Applications

Zhengdong D. Zhang; Joel Rozowsky; Michael Snyder; Joseph T. Chang; Mark Gerstein

ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.
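The in silico approach described above, placing tags onto the genome according to assumed distributions for background and binding sites, can be sketched in a few lines. This is a simplified illustration, not the authors' simulator: the genome is reduced to fixed-size windows rather than linked tag clusters, and the function name, the Pareto-distributed site strengths and the parameter defaults are assumptions. Only the idea of a non-uniform, gamma-distributed background comes from the abstract.

```python
import random
from collections import Counter


def simulate_chipseq(n_windows: int, n_tags: int, n_sites: int,
                     site_fraction: float, gamma_shape: float = 1.0,
                     gamma_scale: float = 1.0, seed: int = 0) -> list[int]:
    """Return simulated tag counts per genomic window (hypothetical model)."""
    rng = random.Random(seed)
    # Background: each window gets a gamma-distributed weight, so some regions
    # attract far more background tags than others (non-uniform background).
    background = [rng.gammavariate(gamma_shape, gamma_scale)
                  for _ in range(n_windows)]
    # Binding sites: randomly chosen windows whose strengths follow a
    # heavy-tailed Pareto distribution (an assumption for illustration).
    sites = rng.sample(range(n_windows), n_sites)
    strengths = [rng.paretovariate(1.5) for _ in sites]
    n_site_tags = round(n_tags * site_fraction)
    # Drop background tags and site tags in proportion to the weights.
    counts = Counter(rng.choices(range(n_windows), weights=background,
                                 k=n_tags - n_site_tags))
    counts.update(rng.choices(sites, weights=strengths, k=n_site_tags))
    return [counts[w] for w in range(n_windows)]
```

Tabulating `Counter(simulate_chipseq(10_000, 200_000, 50, 0.2))` gives the distribution of tag counts per window, which can then be compared with observed data. Because the gamma distribution's coefficient of variation is 1/sqrt(gamma_shape), raising `gamma_shape` pushes the background toward the uniform model that the abstract argues is inadequate.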

Proceedings of the National Academy of Sciences of the United States of America | 2013

Naked mole-rat has increased translational fidelity compared with the mouse, as well as a unique 28S ribosomal RNA cleavage

Jorge Azpurua; Zhonghe Ke; Iris X. Chen; Quanwei Zhang; Dmitri N. Ermolenko; Zhengdong D. Zhang; Vera Gorbunova; Andrei Seluanov

Significance: Molecular mechanisms responsible for differences in longevity between animal species are largely unknown. Here we show that the longest-lived rodent, the naked mole-rat, has more accurate protein translation than the mouse. Furthermore, we show that the naked mole-rat has a unique fragmented ribosomal RNA structure. Such cleaved ribosomal RNA has been reported for only one other species of mammal. This article suggests the importance of protein translation in aging and provides insight into the mechanisms of longevity.

The naked mole-rat (Heterocephalus glaber) is a subterranean eusocial rodent with a markedly long lifespan and resistance to tumorigenesis. Multiple data implicate modulation of protein translation in longevity. Here we report that 28S ribosomal RNA (rRNA) of the naked mole-rat is processed into two smaller fragments of unequal size. The two breakpoints are located in the 28S rRNA divergent region 6 and excise a fragment of 263 nt. The excised fragment is unique to the naked mole-rat rRNA and does not show homology to other genomic regions. Because this hidden break site could alter ribosome structure, we investigated whether translation rate and amino acid incorporation fidelity were altered. We report that naked mole-rat fibroblasts have significantly increased translational fidelity despite having translation rates comparable to those of mouse fibroblasts. Although we cannot directly test whether the unique 28S rRNA structure contributes to the increased fidelity of translation, we speculate that it may change the folding or dynamics of the large ribosomal subunit, altering the rate of GTP hydrolysis and/or interaction of the large subunit with tRNA during accommodation, thus affecting the fidelity of protein synthesis. In summary, our results show that naked mole-rat cells produce fewer aberrant proteins, supporting the hypothesis that the more stable proteome of the naked mole-rat contributes to its longevity.
