Publication


Featured research published by Zhaojun Zhang.


Nature Genetics | 2015

Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance

James J. Crowley; Vasyl Zhabotynsky; Wei Sun; Shunping Huang; Isa Kemal Pakatci; Yunjung Kim; Jeremy R. Wang; Andrew P. Morgan; John D. Calaway; David L. Aylor; Zaining Yun; Timothy A. Bell; Ryan J. Buus; Mark Calaway; John P. Didion; Terry J. Gooch; Stephanie D. Hansen; Nashiya N. Robinson; Ginger D. Shaw; Jason S. Spence; Corey R. Quackenbush; Cordelia J. Barrick; Randal J. Nonneman; Kyungsu Kim; James Xenakis; Yuying Xie; William Valdar; Alan B. Lenarcic; Wei Wang; Catherine E. Welsh

Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Because regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in further characterizing these mechanisms. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. Greater than 80% of mouse genes have cis regulatory variation. Effects from these variants influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a new global allelic imbalance in expression favoring the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice and provide a new resource toward understanding the genetic control of transcription in mammals.


Genes and Immunity | 2014

Using the emerging Collaborative Cross to probe the immune system

J. Phillippi; Yuying Xie; Darla R. Miller; Timothy A. Bell; Zhaojun Zhang; Alan B. Lenarcic; David L. Aylor; S. H. Krovi; David W. Threadgill; F. Pardo-Manuel De Villena; Wei Wang; William Valdar; Jeffrey A. Frelinger

The Collaborative Cross (CC) is an emerging panel of recombinant inbred (RI) mouse strains. Each strain is genetically distinct but all descended from the same eight inbred founders. In 66 strains from incipient lines of the CC (pre-CC), as well as the 8 CC founders and some of their F1 offspring, we examined subsets of lymphocytes and antigen-presenting cells. We found significant variation among the founders, with even greater diversity in the pre-CC. Genome-wide association using inferred haplotypes detected highly significant loci controlling B-to-T cell ratio, CD8 T-cell numbers, CD11c and CD23 expression. Comparison of overall strain effects in the CC founders with strain effects at QTL in the pre-CC revealed sharp contrasts in the genetic architecture of two traits with significant loci: variation in CD23 can be explained largely by additive genetics at one locus, whereas variation in B-to-T ratio has a more complex etiology. For CD23, we found a strong QTL whose confidence interval contained the CD23 structural gene Fcer2a. Our data on the pre-CC demonstrate the utility of the CC for studying immunophenotypes and the value of integrating founder, CC and F1 data. The extreme immunophenotypes observed could have pleiotropic effects in other CC experiments.


Bioinformatics | 2014

RNA-Skim: a rapid method for RNA-Seq quantification at transcript level

Zhaojun Zhang; Wei Wang

Motivation: The RNA-Seq technique has been demonstrated as a revolutionary means for exploring the transcriptome because it provides deep coverage and base pair-level resolution. RNA-Seq quantification has proven to be an efficient alternative to the microarray technique in gene expression studies, and it is a critical component of RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require the alignment of fragments to either a genome or a transcriptome, entailing a time-consuming and intricate alignment step. To improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, has recently been proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of designing an efficient alignment-free method for transcriptome quantification. Even though Sailfish is substantially faster than alternative alignment-dependent methods such as Cufflinks, using all k-mers in the transcriptome quantification impedes the scalability of the method. Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity, and introduces the notion of sig-mers, which are a special type of k-mers uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable with any state-of-the-art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem into smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses <4% of the k-mers and <10% of the CPU time required by Sailfish. It is able to finish transcriptome quantification in <10 min per sample by using just a single thread on a commodity computer, which represents a >100× speedup over the state-of-the-art alignment-based methods, while delivering comparable or higher accuracy. Availability and implementation: The software is available at http://www.csbio.unc.edu/rs. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
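The sig-mer idea in the abstract above can be illustrated with a minimal Python sketch. It is not the RNA-Skim implementation: it assumes the transcript clusters are already given, treats every k-mer that occurs in exactly one cluster as a sig-mer, and ignores reverse complements and any further selection step the real tool may apply. The function names (`kmers`, `find_sig_mers`, `count_sig_mers`) and the toy sequences are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return every length-k substring of seq, in order (with repeats)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def find_sig_mers(clusters, k):
    """Map each cluster id to the k-mers that occur in that cluster only.

    clusters: dict of cluster id -> list of transcript sequences.
    """
    owner = {}  # k-mer -> owning cluster id, or None once seen in 2+ clusters
    for cid, transcripts in clusters.items():
        cluster_kmers = set()
        for seq in transcripts:
            cluster_kmers.update(kmers(seq, k))
        for km in cluster_kmers:
            owner[km] = None if km in owner else cid
    sig = {cid: set() for cid in clusters}
    for km, cid in owner.items():
        if cid is not None:
            sig[cid].add(km)
    return sig


def count_sig_mers(reads, sig, k):
    """Count, per cluster, how many k-mers in the reads are sig-mers."""
    lookup = {km: cid for cid, kms in sig.items() for km in kms}
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in lookup:
                counts[lookup[km]] += 1
    return counts


if __name__ == "__main__":
    # Toy clusters; the k-mer "ACGT" occurs in both and is therefore excluded.
    toy_clusters = {
        "c1": ["ACGTAC", "CGTACG"],
        "c2": ["TTGACG", "GACGTT"],
    }
    toy_sig = find_sig_mers(toy_clusters, k=4)
    print({cid: sorted(kms) for cid, kms in toy_sig.items()})
    print(dict(count_sig_mers(["CGTACGTT"], toy_sig, k=4)))
```

Because each sig-mer belongs to a single cluster, the per-cluster counts can be passed to independent abundance estimators, which is the property the abstract relies on for running clusters in parallel.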


Genetics | 2014

Bayesian modeling of haplotype effects in multiparent populations.

Zhaojun Zhang; Wei Wang; William Valdar

A general Bayesian model, Diploffect, is described for estimating the effects of founder haplotypes at quantitative trait loci (QTL) detected in multiparental genetic populations; such populations include the Collaborative Cross (CC), Heterogeneous Stocks (HS), and many others for which local genetic variation is well described by an underlying, usually probabilistically inferred, haplotype mosaic. Our aim is to provide a framework for coherent estimation of haplotype and diplotype (haplotype pair) effects that takes into account the following: uncertainty in haplotype composition for each individual; uncertainty arising from small sample sizes and infrequently observed haplotype combinations; possible effects of dominance (for noninbred subjects); genetic background; and that provides a means to incorporate data that may be incomplete or has a hierarchical structure. Using the results of a probabilistic haplotype reconstruction as prior information, we obtain posterior distributions at the QTL for both haplotype effects and haplotype composition. Two alternative computational approaches are supplied: a Markov chain Monte Carlo sampler and a procedure based on importance sampling of integrated nested Laplace approximations. Using simulations of QTL in the incipient CC (pre-CC) and Northport HS populations, we compare the accuracy of Diploffect, approximations to it, and more commonly used approaches based on Haley–Knott regression, describing trade-offs between these methods. We also estimate effects for three QTL previously identified in those populations, obtaining posterior intervals that describe how the phenotype might be affected by diplotype substitutions at the modeled locus.


Bioinformatics | 2013

GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference owing to RNA-seq reads misalignment

Zhaojun Zhang; Shunping Huang; Jack Wang; Xiang Zhang; Fernando Pardo-Manuel de Villena; Leonard McMillan; Wei Wang

Motivation: RNA-seq techniques provide an unparalleled means for exploring a transcriptome with deep coverage and base pair level resolution. Various analysis tools have been developed to align and assemble RNA-seq data, such as the widely used TopHat/Cufflinks pipeline. A common observation is that a sizable fraction of the fragments/reads align to multiple locations of the genome. These multiple alignments pose substantial challenges to existing RNA-seq analysis tools. Inappropriate treatment may result in reporting spurious expressed genes (false positives) and missing the real expressed genes (false negatives). Such errors impact the subsequent analysis, such as differential expression analysis. In our study, we observe that ∼3.5% of transcripts reported by the TopHat/Cufflinks pipeline correspond to annotated nonfunctional pseudogenes. Moreover, ∼10.0% of reported transcripts are not annotated in the Ensembl database. These genes could be either novel expressed genes or false discoveries. Results: We examine the underlying genomic features that lead to multiple alignments and investigate how they generate systematic errors in RNA-seq analysis. We develop a general tool, GeneScissors, which exploits machine learning techniques guided by biological knowledge to detect and correct spurious transcriptome inference by existing RNA-seq analysis methods. In our simulated study, GeneScissors can predict spurious transcriptome calls owing to misalignment with an accuracy close to 90%. It provides substantial improvement over the widely used TopHat/Cufflinks or MapSplice/Cufflinks pipelines in both precision and F-measure. On real data, GeneScissors reports 53.6% fewer pseudogenes and 0.97% more expressed and annotated transcripts, when compared with the TopHat/Cufflinks pipeline. In addition, among the 10.0% unannotated transcripts reported by TopHat/Cufflinks, GeneScissors finds that >16.3% of them are false positives. Availability: The software can be downloaded at http://csbio.unc.edu/genescissors/ Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
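For readers unfamiliar with the evaluation terms used in the abstract above, the following sketch computes precision, recall, and the F-measure (their harmonic mean) from counts of true positives, false positives, and false negatives. These are the standard textbook definitions, not code from GeneScissors, and the counts in the example are made up purely for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw confusion counts.

    tp: reported transcripts that are truly expressed
    fp: reported transcripts that are spurious
    fn: truly expressed transcripts that were not reported
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Removing spurious calls lowers the false-positive count and so raises precision, while the F-measure shows whether that gain came at the cost of recall.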


G3: Genes, Genomes, Genetics | 2012

HTreeQA: Using Semi-Perfect Phylogeny Trees in Quantitative Trait Loci Study on Genotype Data

Zhaojun Zhang; Xiang Zhang; Wei Wang

With the advances in high-throughput genotyping technology, the study of quantitative trait loci (QTL) has emerged as a promising tool to understand the genetic basis of complex traits. Methodology development for the study of QTL has recently attracted significant research attention. Local phylogeny-based methods have been demonstrated to be powerful tools for uncovering significant associations between phenotypes and single-nucleotide polymorphism markers. However, most existing methods are designed for homozygous genotypes, and a separate haplotype reconstruction step is often needed to resolve heterozygous genotypes. This approach has limited power to detect nonadditive genetic effects and imposes an extensive computational burden. In this article, we propose a new method, HTreeQA, that uses a tristate semi-perfect phylogeny tree to approximate the perfect phylogeny used in existing methods. The semi-perfect phylogeny trees are used as high-level markers for association study. HTreeQA uses the genotype data as direct input without phasing. HTreeQA can handle complex local population structures. It is suitable for QTL mapping in any mouse population, including the incipient Collaborative Cross lines. Applying HTreeQA, we find significant QTL for two phenotypes of the pre-CC lines, white head spot and running distance at day 5/6. These findings are consistent with known genes and QTL discovered in independent studies. Simulation studies under three different genetic models show that HTreeQA can detect a wider range of genetic effects and is more efficient than existing phylogeny-based approaches. We also provide rigorous theoretical analysis to show that HTreeQA has a lower error rate than alternative methods.


PLOS Computational Biology | 2012

Chapter 10: Mining Genome-Wide Genetic Markers

Xiang Zhang; Shunping Huang; Zhaojun Zhang; Wei Wang

Genome-wide association studies (GWAS) aim to discover genetic factors underlying phenotypic traits. The large number of genetic factors poses both computational and statistical challenges. Various computational approaches have been developed for large-scale GWAS. In this chapter, we will discuss several widely used computational approaches in GWAS. The following topics will be covered: (1) An introduction to the background of GWAS. (2) The existing computational approaches that are widely used in GWAS. This will cover single-locus, epistasis detection, and machine learning methods that have been recently developed in the biology, statistics, and computer science communities. This part will be the main focus of this chapter. (3) The limitations of current approaches and future directions.


Nature Genetics | 2015

Erratum: Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance (Nature Genetics (2015) 47 (353-360))

James J. Crowley; Vasyl Zhabotynsky; Wei Sun; Shunping Huang; Isa Kemal Pakatci; Yunjung Kim; Jeremy R. Wang; Andrew P. Morgan; John D. Calaway; David L. Aylor; Zaining Yun; Timothy A. Bell; Ryan J. Buus; Mark Calaway; John P. Didion; Terry J. Gooch; Stephanie D. Hansen; Nashiya N. Robinson; Ginger D. Shaw; Jason S. Spence; Corey R. Quackenbush; Cordelia J. Barrick; Randal J. Nonneman; Kyungsu Kim; James Xenakis; Yuying Xie; William Valdar; Alan B. Lenarcic; Wei Wang; Catherine E. Welsh

Nat. Genet. 47, 353–360 (2015); published online 2 March 2015; corrected after print 16 April 2015. In the version of this article initially published, an accession number was not provided for RNA-seq data sets. The RNA-seq data sets that passed quality control are available at the Sequence Read Archive (SRA) under accession SRP056236.


Knowledge Discovery and Data Mining | 2011

Clustering with relative constraints

Eric Yi Liu; Zhaojun Zhang; Wei Wang


Archive | 2009

Global Dynamical Significance of Zigzag Fractures in South Polar Ice Cap of Mars

Z. Q. Zeng; Zhaojun Zhang; S. J. Birnbaum; Hongjie Xie; Wenkui Yang

Collaboration


Zhaojun Zhang's most frequent collaborators.

Top Co-Authors

Wei Wang
University of California

Shunping Huang
University of North Carolina at Chapel Hill

William Valdar
University of North Carolina at Chapel Hill

Alan B. Lenarcic
University of North Carolina at Chapel Hill

David L. Aylor
North Carolina State University

Timothy A. Bell
University of North Carolina at Chapel Hill

Xiang Zhang
Case Western Reserve University

Yuying Xie
University of North Carolina at Chapel Hill

Andrew P. Morgan
University of North Carolina at Chapel Hill

Catherine E. Welsh
University of North Carolina at Chapel Hill