Publication


Featured research published by Shunping Huang.


Nature Genetics | 2014

Heritability and genomics of gene expression in peripheral blood

Fred A. Wright; Patrick F. Sullivan; Andrew I. Brooks; Fei Zou; Wei Sun; Kai Xia; Vered Madar; Rick Jansen; Wonil Chung; Yi Hui Zhou; Abdel Abdellaoui; Sandra Batista; Casey Butler; Guanhua Chen; Ting-huei Chen; David B. D'Ambrosio; Paul J. Gallins; Min Jin Ha; Jouke-Jan Hottenga; Shunping Huang; Mathijs Kattenberg; Jaspreet Kochar; Christel M. Middeldorp; Ani Qu; Andrey A. Shabalin; Jay A. Tischfield; Laura Todd; Jung-Ying Tzeng; Gerard van Grootheest; Jacqueline M. Vink

We assessed gene expression profiles in 2,752 twins, using a classic twin design to quantify expression heritability and quantitative trait loci (eQTLs) in peripheral blood. The most highly heritable genes (∼777) were grouped into distinct expression clusters, enriched in gene-poor regions, associated with specific gene function or ontology classes, and strongly associated with disease designation. The design enabled a comparison of twin-based heritability to estimates based on dizygotic identity-by-descent sharing and distant genetic relatedness. Consideration of sampling variation suggests that previous heritability estimates have been upwardly biased. Genotyping of 2,494 twins enabled powerful identification of eQTLs, which we further examined in a replication set of 1,895 unrelated subjects. A large number of non-redundant local eQTLs (6,756) met replication criteria, whereas a relatively small number of distant eQTLs (165) met quality control and replication standards. Our results provide a new resource toward understanding the genetic control of transcription.
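As a rough illustration of what a classic twin design estimates, the sketch below applies Falconer's formula, h² = 2(r_MZ − r_DZ), to the trait correlations of monozygotic and dizygotic twin pairs. This is a textbook estimator chosen for illustration; the abstract does not state which estimator the study used, and the function name `falconer_h2` is hypothetical.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Estimate narrow-sense heritability from twin correlations.

    r_mz: trait correlation within monozygotic twin pairs
    r_dz: trait correlation within dizygotic twin pairs

    Falconer's formula is a textbook estimator, used here only as an
    illustration (an assumption, not the method from the abstract).
    """
    h2 = 2.0 * (r_mz - r_dz)
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, h2))


if __name__ == "__main__":
    # Made-up correlations for one gene's expression level.
    print(falconer_h2(0.6, 0.4))
```

Clipping to [0, 1] is a convenience for display only; a variance-component model would handle sampling variation more carefully.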


Bioinformatics | 2010

TEAM: efficient two-locus epistasis tests in human genome-wide association study

Xiang Zhang; Shunping Huang; Fei Zou; Wei Wang

As a promising tool for identifying genetic markers underlying phenotypic differences, genome-wide association study (GWAS) has been extensively investigated in recent years. In GWAS, detecting epistasis (or gene–gene interaction) is preferable over single locus study since many diseases are known to be complex traits. A brute force search is infeasible for epistasis detection on the genome-wide scale because of the intensive computational burden. Existing epistasis detection algorithms are designed for datasets consisting of homozygous markers and small sample sizes. In human studies, however, the genotype may be heterozygous, and the number of individuals can reach the thousands. Thus, existing methods are not readily applicable to human datasets. In this article, we propose an efficient algorithm, TEAM, which significantly speeds up epistasis detection for human GWAS. Our algorithm is exhaustive, i.e. it does not ignore any epistatic interaction. Utilizing the minimum spanning tree structure, the algorithm incrementally updates the contingency tables for epistatic tests without scanning all individuals. Our algorithm has broader applicability and is more efficient than existing methods for large sample studies. It supports any statistical test that is based on contingency tables, and enables control of both the family-wise error rate and the false discovery rate. Extensive experiments show that our algorithm only needs to examine a small portion of the individuals to update the contingency tables, and it achieves at least an order of magnitude speedup over the brute force approach.
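The incremental update described in this abstract can be sketched as follows. This is a minimal illustration, not the TEAM implementation: it assumes that neighbouring nodes in the tree are SNPs whose genotype vectors differ in only a few individuals (the abstract does not spell this out), and all function names are hypothetical.

```python
from collections import Counter


def build_table(snp_a, snp_b, phenotype):
    """Full scan: count individuals per (genotype A, genotype B, phenotype) cell."""
    return Counter(zip(snp_a, snp_b, phenotype))


def differing_individuals(snp_a, snp_a_next):
    """Indices of individuals whose genotype differs between two SNPs."""
    return [i for i, (g, h) in enumerate(zip(snp_a, snp_a_next)) if g != h]


def update_table(table, changed, snp_a, snp_a_next, snp_b, phenotype):
    """Derive the table for (snp_a_next, snp_b) from the table for (snp_a, snp_b).

    Only the individuals listed in `changed` are visited, so the cost
    depends on how many genotypes differ, not on the sample size.
    """
    updated = Counter(table)
    for i in changed:
        updated[(snp_a[i], snp_b[i], phenotype[i])] -= 1
        updated[(snp_a_next[i], snp_b[i], phenotype[i])] += 1
    return +updated  # unary plus drops cells whose count fell to zero


if __name__ == "__main__":
    snp_a = [0, 1, 2, 1, 0, 2]
    snp_a_next = [0, 1, 2, 2, 0, 1]  # differs from snp_a in two individuals
    snp_b = [1, 1, 0, 2, 0, 2]
    phenotype = [0, 1, 0, 1, 1, 0]

    changed = differing_individuals(snp_a, snp_a_next)
    table = build_table(snp_a, snp_b, phenotype)
    print(update_table(table, changed, snp_a, snp_a_next, snp_b, phenotype))
```

In this sketch the list of differing individuals would be computed once per tree edge and reused for every partner SNP, which is where the saving over a full rescan would come from.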


Bioinformatics | 2012

seeQTL: A searchable database for human eQTLs

Kai Xia; Andrey A. Shabalin; Shunping Huang; Vered Madar; Yi Hui Zhou; Wei Wang; Fei Zou; Wei Sun; Patrick F. Sullivan; Fred A. Wright

Summary: seeQTL is a comprehensive and versatile eQTL database, including various eQTL studies and a meta-analysis of HapMap eQTL information. The database presents eQTL association results in a convenient browser, using both segmented local-association plots and genome-wide Manhattan plots. Availability and implementation: seeQTL is freely available for non-commercial use at http://www.bios.unc.edu/research/genomic_software/seeQTL/. Supplementary information: Supplementary data are available at Bioinformatics online.


Nature Genetics | 2015

Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance

James J. Crowley; Vasyl Zhabotynsky; Wei Sun; Shunping Huang; Isa Kemal Pakatci; Yunjung Kim; Jeremy R. Wang; Andrew P. Morgan; John D. Calaway; David L. Aylor; Zaining Yun; Timothy A. Bell; Ryan J. Buus; Mark Calaway; John P. Didion; Terry J. Gooch; Stephanie D. Hansen; Nashiya N. Robinson; Ginger D. Shaw; Jason S. Spence; Corey R. Quackenbush; Cordelia J. Barrick; Randal J. Nonneman; Kyungsu Kim; James Xenakis; Yuying Xie; William Valdar; Alan B. Lenarcic; Wei Wang; Catherine E. Welsh

Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Because regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in further characterizing these mechanisms. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. Greater than 80% of mouse genes have cis regulatory variation. Effects from these variants influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a new global allelic imbalance in expression favoring the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice and provide a new resource toward understanding the genetic control of transcription in mammals.


Source Code for Biology and Medicine | 2011

Tools for efficient epistasis detection in genome-wide association study

Xiang Zhang; Shunping Huang; Fei Zou; Wei Wang

Background: Genome-wide association study (GWAS) aims to find genetic factors underlying complex phenotypic traits, for which epistasis or gene-gene interaction detection is often preferred over single-locus approach. However, the computational burden has been a major hurdle to apply epistasis test in the genome-wide scale due to a large number of single nucleotide polymorphism (SNP) pairs to be tested. Results: We have developed a set of three efficient programs, FastANOVA, COE and TEAM, that support epistasis test in a variety of problem settings in GWAS. These programs utilize permutation test to properly control error rate such as family-wise error rate (FWER) and false discovery rate (FDR). They guarantee to find the optimal solutions, and significantly speed up the process of epistasis detection in GWAS. Conclusions: A web server with user interface and source codes are available at the website http://www.csbio.unc.edu/epistasis/. The source codes are also available at SourceForge http://sourceforge.net/projects/epistasis/.


Database | 2014

A novel multi-alignment pipeline for high-throughput sequencing data

Shunping Huang; James Holt; Chia Yu Kao; Leonard McMillan; Wei Wang

Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo
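The merge step described above might look like the following sketch. The abstract does not give the actual merging rule, so the rule here (keep the reference with the fewest mismatches, and mark ties as ambiguous) is an assumption, as are the function name `merge_alignments` and the input layout.

```python
def merge_alignments(per_reference):
    """Assign each read to the reference sequence it aligns to best.

    per_reference: {reference_name: {read_id: mismatch_count}}
    Returns {read_id: reference_name or "ambiguous"}.

    Hypothetical rule (not taken from the abstract): fewest mismatches
    wins, and a tie between references is reported as "ambiguous".
    """
    best = {}  # read_id -> (fewest mismatches seen, references achieving it)
    for ref_name, hits in per_reference.items():
        for read_id, mismatches in hits.items():
            current = best.get(read_id)
            if current is None or mismatches < current[0]:
                best[read_id] = (mismatches, [ref_name])
            elif mismatches == current[0]:
                current[1].append(ref_name)
    return {
        read_id: refs[0] if len(refs) == 1 else "ambiguous"
        for read_id, (_, refs) in best.items()
    }


if __name__ == "__main__":
    alignments = {
        "founder_A": {"r1": 0, "r2": 2, "r3": 1},
        "founder_B": {"r1": 1, "r2": 2, "r4": 0},
    }
    print(merge_alignments(alignments))
```

Reads that align to only one reference (r3 and r4 in the example) are kept, which corresponds to the idea of rescuing reads that a single common reference would miss.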


Bioinformatics | 2013

GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference owing to RNA-seq reads misalignment

Zhaojun Zhang; Shunping Huang; Jack Wang; Xiang Zhang; Fernando Pardo-Manuel de Villena; Leonard McMillan; Wei Wang

Motivation: RNA-seq techniques provide an unparalleled means for exploring a transcriptome with deep coverage and base pair level resolution. Various analysis tools have been developed to align and assemble RNA-seq data, such as the widely used TopHat/Cufflinks pipeline. A common observation is that a sizable fraction of the fragments/reads align to multiple locations of the genome. These multiple alignments pose substantial challenges to existing RNA-seq analysis tools. Inappropriate treatment may result in reporting spurious expressed genes (false positives) and missing the real expressed genes (false negatives). Such errors impact the subsequent analysis, such as differential expression analysis. In our study, we observe that ∼3.5% of transcripts reported by TopHat/Cufflinks pipeline correspond to annotated nonfunctional pseudogenes. Moreover, ∼10.0% of reported transcripts are not annotated in the Ensembl database. These genes could be either novel expressed genes or false discoveries. Results: We examine the underlying genomic features that lead to multiple alignments and investigate how they generate systematic errors in RNA-seq analysis. We develop a general tool, GeneScissors, which exploits machine learning techniques guided by biological knowledge to detect and correct spurious transcriptome inference by existing RNA-seq analysis methods. In our simulated study, GeneScissors can predict spurious transcriptome calls owing to misalignment with an accuracy close to 90%. It provides substantial improvement over the widely used TopHat/Cufflinks or MapSplice/Cufflinks pipelines in both precision and F-measurement. On real data, GeneScissors reports 53.6% less pseudogenes and 0.97% more expressed and annotated transcripts, when compared with the TopHat/Cufflinks pipeline. In addition, among the 10.0% unannotated transcripts reported by TopHat/Cufflinks, GeneScissors finds that >16.3% of them are false positives. 
Availability: The software can be downloaded at http://csbio.unc.edu/genescissors/. Supplementary information: Supplementary data are available at Bioinformatics online.


International Conference on Bioinformatics | 2013

Transforming Genomes Using MOD Files with Applications

Shunping Huang; Chia Yu Kao; Leonard McMillan; Wei Wang

Next generation sequencing techniques have enabled new methods of DNA and RNA quantification. Many of these methods require a step of aligning short reads to some reference genome. If the target organism differs significantly from this reference, alignment errors can lead to significant errors in downstream analysis. Various attempts have been made to integrate known genetic variants into the reference genome so as to construct sample-specific genomes to improve read alignments. However, many hurdles in generating and annotating such genomes remain unsolved. In this paper, we propose a general framework for mapping back and forth between genomes. It employs a new format, MOD, to represent known variants between genomes, and a set of tools that facilitate genome manipulation and mapping. We demonstrate the utility of this framework using three inbred mouse strains. We built pseudogenomes from the mm9 mouse reference genome for three highly divergent mouse strains based on MOD files and used them to map the gene annotations to these new genomes. We observe that a large fraction of genes have their positions or ranges altered. Finally, using RNA-seq and DNA-seq short reads from these strains, we demonstrate that mapping to the new genomes yields a better alignment result than mapping to the standard reference. The MOD files for the 17 mouse strains sequenced in the Wellcome Trust Sanger Institute's Mouse Genomes Project can be found at http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo. The auxiliary tools (i.e. MODtools and Lapels), written in Python, are available at http://code.google.com/p/modtools/ and http://code.google.com/p/lapels/.
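To make the idea of mapping coordinates between a reference and a variant-modified genome concrete, here is a minimal sketch. It does not use the MOD format or the MODtools/Lapels APIs, neither of which is described in the abstract; the variant tuples `(ref_pos, ref_len, alt_len)` and the function `ref_to_new` are hypothetical stand-ins.

```python
def ref_to_new(pos, variants):
    """Map a 0-based reference position to the modified genome.

    variants: sorted, non-overlapping tuples (ref_pos, ref_len, alt_len),
    meaning "ref_len reference bases starting at ref_pos are replaced by
    alt_len bases". This layout is an illustrative assumption and is not
    the MOD file format.
    Returns None when the position has no counterpart (deleted bases).
    """
    shift = 0  # cumulative length change from variants left of pos
    for ref_pos, ref_len, alt_len in variants:
        if pos < ref_pos:
            break
        if pos < ref_pos + ref_len:
            offset = pos - ref_pos
            return ref_pos + shift + offset if offset < alt_len else None
        shift += alt_len - ref_len
    return pos + shift


if __name__ == "__main__":
    variants = [
        (5, 0, 2),   # 2-base insertion before reference position 5
        (10, 3, 0),  # 3-base deletion of positions 10..12
        (20, 1, 1),  # single-base substitution at position 20
    ]
    for p in (3, 5, 10, 13, 20):
        print(p, "->", ref_to_new(p, variants))
```

The same cumulative-shift idea explains why gene annotations can change position or range once insertions and deletions are applied to the reference.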


International Conference on Bioinformatics | 2013

Read Annotation Pipeline for High-Throughput Sequencing Data

James Holt; Shunping Huang; Leonard McMillan; Wei Wang

Mapping reads to a reference sequence is a common step when analyzing allele effects in high throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings, and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins.


Genetics | 2012

Rapid and Robust Resampling-Based Multiple-Testing Correction with Application in a Genome-Wide Expression Quantitative Trait Loci Study

Xiang Zhang; Shunping Huang; Wei Sun; Wei Wang

Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. In a typical eQTL study, the huge number of genetic markers and expression traits and their complicated correlations present a challenging multiple-testing correction problem. The resampling-based test using permutation or bootstrap procedures is a standard approach to address the multiple-testing problem in eQTL studies. A brute force application of the resampling-based test to large-scale eQTL data sets is often computationally infeasible. Several computationally efficient methods have been proposed to calculate approximate resampling-based P-values. However, these methods rely on certain assumptions about the correlation structure of the genetic markers, which may not be valid for certain studies. We propose a novel algorithm, rapid and exact multiple testing correction by resampling (REM), to address this challenge. REM calculates the exact resampling-based P-values in a computationally efficient manner. The computational advantage of REM lies in its strategy of pruning the search space by skipping genetic markers whose upper bounds on test statistics are small. REM does not rely on any assumption about the correlation structure of the genetic markers. It can be applied to a variety of resampling-based multiple-testing correction methods including permutation and bootstrap methods. We evaluate REM on three eQTL data sets (yeast, inbred mouse, and human rare variants) and show that it achieves accurate resampling-based P-value estimation with much less computational cost than existing methods. The software is available at http://csbio.unc.edu/eQTL.
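For context, the brute-force resampling procedure that the abstract calls computationally infeasible at scale can be sketched as below. This is the baseline only, not REM and not its pruning strategy: it recomputes every marker's statistic for every permutation. The choice of absolute Pearson correlation as the test statistic, the max-statistic adjustment, the `(count + 1) / (n_perm + 1)` convention, and all names are assumptions made for illustration.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_stat_adjusted_pvalues(markers, trait, n_perm=200, seed=0):
    """Family-wise adjusted p-values by brute-force permutation.

    For each permutation of the trait, every marker is re-tested and the
    maximum statistic is recorded; a marker's adjusted p-value is the
    fraction of permutations whose maximum reaches its observed statistic.
    """
    rng = random.Random(seed)
    observed = [abs_corr(m, trait) for m in markers]
    exceed = [0] * len(markers)
    shuffled = list(trait)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(abs_corr(m, shuffled) for m in markers)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add-one convention keeps p-values strictly positive.
    return [(c + 1) / (n_perm + 1) for c in exceed]


if __name__ == "__main__":
    trait = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    markers = [
        [0, 0, 0, 1, 1, 1, 2, 2],  # tracks the trait closely
        [1, 0, 2, 0, 1, 2, 0, 1],  # essentially unrelated
    ]
    print(max_stat_adjusted_pvalues(markers, trait))
```

The inner `max(...)` over all markers is the expensive step: its cost grows with the number of markers times the number of permutations, which is the burden a pruning approach would aim to reduce.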

Collaboration


Dive into Shunping Huang's collaborations.

Top Co-Authors

Wei Wang, University of California
Wei Sun, University of North Carolina at Chapel Hill
Xiang Zhang, Case Western Reserve University
Fei Zou, University of North Carolina at Chapel Hill
Leonard McMillan, University of North Carolina at Chapel Hill
Ginger D. Shaw, University of North Carolina at Chapel Hill
James J. Crowley, University of North Carolina at Chapel Hill
Patrick F. Sullivan, University of North Carolina at Chapel Hill
Vasyl Zhabotynsky, University of North Carolina at Chapel Hill
Alan B. Lenarcic, University of North Carolina at Chapel Hill