Publication


Featured research published by Chelsea J.-T. Ju.


Nature Genetics | 2018

Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits

Farhad Hormozdiari; Steven Gazal; Bryce van de Geijn; Hilary Finucane; Chelsea J.-T. Ju; Po-Ru Loh; Armin Schoech; Yakir A. Reshef; Xuanyao Liu; Luke O’Connor; Alexander Gusev; Eleazar Eskin; Alkes L. Price

There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84× for eQTLs; P = 1.19 × 10^−31) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80× for expression (e)QTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10^−35). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.

A new set of functional annotations based on fine-mapped molecular quantitative trait loci from GTEx and BLUEPRINT consortium data is enriched for disease heritability across 41 diseases and complex traits.


International Conference on Data Engineering | 2017

AZTEC: A Cloud-based Computational Platform to Integrate Biomedical Resources

Patrick Tan; Yichao Zhou; Xinxin Huang; Giuseppe M. Mazzeo; Chelsea J.-T. Ju

Omics phenotyping has become increasingly recognized in our path to Precision Medicine. A major computational challenge of our investigator community is to identify the necessary data analytical tools to process multi-omics data. To aid navigation of the analytical tools fragmented across the web, we have created a novel computational resource platform, AZTEC, that empowers users to simultaneously search a diverse array of digital resources including databases, standalone software, web services, publications, and large libraries composed of many interrelated functions. AZTEC fosters an environment of sustainable resource support and discovery, enabling researchers to overcome the challenges of information science. An online video of the demonstration can be viewed at https://www.youtube.com/watch?v=FIxvlof6Kbg.


International Conference on Bioinformatics | 2017

Fleximer: Accurate Quantification of RNA-Seq via Variable-Length k-mers

Chelsea J.-T. Ju; Ruirui Li; Zhengliang Wu; Jyun-Yu Jiang; Zhao Yang; Wei Wang

The advent of RNA-Seq has made it possible to quantify transcript expression on a large scale simultaneously. This technology generates small fragments of each transcript sequence, known as sequencing reads. As the first step of data analysis towards expression quantification, most existing methods align these reads to a reference genome or transcriptome to establish their origins. However, read alignment is computationally costly. Recently, a series of methods have been proposed to perform a lightweight quantification analysis in an alignment-free manner. These methods utilize the notion of k-mers, which are short consecutive sequences representing the signatures of each transcript, to estimate the relative abundance from RNA-Seq reads. Current k-mer based approaches make use of a set of fixed-size k-mers; however, the true signatures of each transcript may not exist at a fixed size. In this paper, we demonstrate the importance of k-mer selection in transcript abundance estimation. We propose a novel method, Fleximer, to efficiently discover and select an optimal set of k-mers with flexible lengths. Using both simulated and real datasets, we show that, with fewer k-mers, Fleximer is able to cover a similar number of reads as Sailfish and Kallisto. The selected k-mers have more distinguishing features, and thus substantially reduce the errors in transcript abundance estimation.
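
To make the point about fixed-size k-mers concrete, the following minimal Python sketch finds, for each transcript, the k-mers that occur in no other transcript. The toy sequences and the function names `kmers` and `signature_kmers` are illustrative assumptions. This is not Fleximer's selection procedure, only a picture of why a single fixed k may fail to give every transcript a signature.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(transcripts, k):
    """Map each transcript name to the k-mers found in no other transcript."""
    per_tx = {name: kmers(seq, k) for name, seq in transcripts.items()}
    # Count in how many transcripts each k-mer appears.
    owners = Counter()
    for ks in per_tx.values():
        owners.update(ks)
    return {name: {km for km in ks if owners[km] == 1}
            for name, ks in per_tx.items()}


# Hypothetical toy transcripts, not real data.
toy = {"tx1": "ACGTACGA", "tx2": "ACGTACGT"}

for k in (3, 5):
    print(k, signature_kmers(toy, k))
```

In this toy case, tx2 has no distinguishing 3-mer, while at length 5 each transcript has one. A situation like this is what motivates allowing the k-mer length to vary per transcript.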


bioRxiv | 2017

TahcoRoll: An Efficient Approach for Signature Profiling in Genomic Data through Variable-Length k-mers

Chelsea J.-T. Ju; Jyun-Yu Jiang; Ruirui Li; Zeyu Li; Wei Wang

k-mer profiling has been one of the trending approaches to analyze read data generated by high-throughput sequencing technologies. The tasks of k-mer profiling include, but are not limited to, counting the frequencies and determining the occurrences of short sequences in a dataset. The notion of k-mer has been extensively used to build de Bruijn graphs in genome or transcriptome assembly, which requires examining all possible k-mers present in the dataset. Recently, an alternative way of profiling has been proposed, which constructs a set of representative k-mers as genomic markers and profiles their occurrences in the sequencing data. This technique has been applied in both transcript quantification through RNA-Seq and taxonomic classification of metagenomic reads. Most of these applications use a set of fixed-size k-mers, since the majority of existing k-mer counters are inadequate to process genomic sequences with variable-length k-mers. However, choosing the appropriate k is challenging, as it varies for different applications. As pioneering work on profiling a set of variable-length k-mers, we propose TahcoRoll, which enhances the Aho-Corasick algorithm. More specifically, we use one bit to represent each nucleotide, and integrate the rolling hash technique to construct an efficient in-memory data structure for this task. Using both synthetic and real datasets, results show that TahcoRoll outperforms existing approaches in time efficiency, memory efficiency, or both, without using any disk space. In addition, compared to the most efficient state-of-the-art k-mer counters, such as KMC and MSBWT, TahcoRoll is the only approach that can process long read data from both PacBio and Oxford Nanopore on a commodity desktop computer. The source code of TahcoRoll is implemented in C++14 and available at https://github.com/chelseaju/TahcoRoll.git.
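
As a rough illustration of the rolling hash idea mentioned above, the sketch below counts occurrences of a set of variable-length k-mers in reads. It is a deliberate simplification and not TahcoRoll's data structure: it does not use Aho-Corasick, it encodes each nucleotide as a base-4 digit rather than one bit, and it assumes reads contain only A, C, G, and T. The names `hash_of` and `count_kmers` are hypothetical.

```python
from collections import defaultdict

BASE = 4
MOD = (1 << 61) - 1  # large prime modulus to keep collisions rare
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def hash_of(s):
    """Polynomial hash of a nucleotide string."""
    h = 0
    for ch in s:
        h = (h * BASE + CODE[ch]) % MOD
    return h


def count_kmers(reads, queries):
    """Count occurrences of each query k-mer (lengths may differ) in reads."""
    # Group the queries by length: length -> {hash: [queries with that hash]}.
    by_len = defaultdict(dict)
    for q in dict.fromkeys(queries):  # drop duplicate queries
        by_len[len(q)].setdefault(hash_of(q), []).append(q)

    counts = {q: 0 for q in queries}
    for read in reads:
        for k, table in by_len.items():
            if len(read) < k:
                continue
            top = pow(BASE, k - 1, MOD)  # weight of the leading digit
            h = hash_of(read[:k])
            for i in range(len(read) - k + 1):
                if i > 0:
                    # Roll the window: drop read[i-1], append read[i+k-1].
                    h = ((h - CODE[read[i - 1]] * top) * BASE
                         + CODE[read[i + k - 1]]) % MOD
                for q in table.get(h, ()):
                    if read[i:i + k] == q:  # confirm to rule out collisions
                        counts[q] += 1
    return counts


print(count_kmers(["ACGTACGT", "TTACG"], ["ACG", "ACGT", "TT"]))
```

Each window update costs constant time regardless of k, which is what makes rolling hashes attractive here. This sketch still scans every read once per distinct query length, a cost that a structure built for variable-length k-mers is meant to avoid.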


bioRxiv | 2017

Leveraging molecular QTL to understand the genetic architecture of diseases and complex traits

Farhad Hormozdiari; Steven Gazal; Bryce van de Geijn; Hilary Finucane; Chelsea J.-T. Ju; Po-Ru Loh; Armin Schoech; Yakir A. Reshef; Xuanyao Liu; Luke O'Connor; Alexander Gusev; Eleazar Eskin; Alkes L. Price

There is increasing evidence that many GWAS risk loci are molecular QTL for gene expression (eQTL), histone modification (hQTL), splicing (sQTL), and/or DNA methylation (meQTL). Here, we introduce a new set of functional annotations based on causal posterior probabilities (CPP) of fine-mapped molecular cis-QTL, using data from the GTEx and BLUEPRINT consortia. We show that these annotations are very strongly enriched for disease heritability across 41 independent diseases and complex traits (average N = 320K): 5.84x for GTEx eQTL, and 5.44x for eQTL, 4.27-4.28x for hQTL (H3K27ac and H3K4me1), 3.61x for sQTL and 2.81x for meQTL in BLUEPRINT (all P ≤ 1.39e-10), far higher than enrichments obtained using standard functional annotations that include all significant molecular cis-QTL (1.17-1.80x). eQTL annotations that were obtained by meta-analyzing all 44 GTEx tissues generally performed best, but tissue-specific blood eQTL annotations produced stronger enrichments for autoimmune diseases and blood cell traits and tissue-specific brain eQTL annotations produced stronger enrichments for brain-related diseases and traits, despite high cis-genetic correlations of eQTL effect sizes across tissues. Notably, eQTL annotations restricted to loss-of-function intolerant genes from ExAC were even more strongly enriched for disease heritability (17.09x; vs. 5.84x for all genes; P = 4.90e-17 for difference). All molecular QTL except sQTL remained significantly enriched for disease heritability in a joint analysis conditioned on each other and on a broad set of functional annotations from previous studies, implying that each of these annotations is uniquely informative for disease and complex trait architectures.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2017

Efficient Approach to Correct Read Alignment for Pseudogene Abundance Estimates

Chelsea J.-T. Ju; Zhuangtian Zhao; Wei Wang

RNA-Sequencing has been the leading technology to quantify expression of thousands of genes simultaneously. The data analysis of an RNA-Seq experiment starts from aligning short reads to the reference genome/transcriptome or reconstructed transcriptome. However, current aligners lack the sensitivity to distinguish reads that come from homologous regions of a genome. One group of these homologies is the paralog pseudogenes. Pseudogenes arise from duplication of a set of protein coding genes, and have been considered as degraded paralogs in the genome due to their loss of functionality. Recent studies have provided evidence to support their novel regulatory roles in biological processes. With the growing interest in quantifying the expression level of pseudogenes in different tissues or cell lines, it is critical to have a sensitive method that can correctly align ambiguous reads and accurately estimate the expression level among homologous genes. Previously in PseudoLasso, we proposed a linear regression approach to learn read alignment behaviors, and to leverage this knowledge for abundance estimation and alignment correction. In this paper, we extend the work of PseudoLasso by grouping the homologous genomic regions into different communities using a community detection algorithm, followed by building a linear regression model separately for each community. The results show that this approach is able to retain the same accuracy as PseudoLasso. By breaking the genome into smaller homologous communities, the running time is improved from quadratic growth to linear with respect to the number of genes.
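
The grouping step can be pictured with a small Python sketch. It uses connected components (via union-find) as a stand-in for the community detection algorithm, which the abstract does not name, and the region names and homology pairs are made up for illustration. The per-community regression itself is not shown.

```python
def group_regions(regions, homolog_pairs):
    """Partition regions into groups linked by homology (connected components)."""
    parent = {r: r for r in regions}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in homolog_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for r in regions:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical regions and homology links, not real annotations.
regions = ["geneA", "pseudoA1", "pseudoA2", "geneB", "pseudoB1", "geneC"]
pairs = [("geneA", "pseudoA1"), ("geneA", "pseudoA2"), ("geneB", "pseudoB1")]

groups = group_regions(regions, pairs)
joint_cost = len(regions) ** 2                  # one model over all regions
split_cost = sum(len(g) ** 2 for g in groups)   # one model per group
print(groups, joint_cost, split_cost)
```

The two cost figures are only a crude proxy, assuming the work per model grows with the square of the number of regions it covers. They show why many small groups are cheaper than one joint model, but they are not the paper's complexity analysis.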


American Journal of Human Genetics | 2017

Widespread Allelic Heterogeneity in Complex Traits

Farhad Hormozdiari; Anthony Zhu; Gleb Kichaev; Chelsea J.-T. Ju; Ayellet V. Segrè; Jong Wha J. Joo; Hyejung Won; Sriram Sankararaman; Bogdan Pasaniuc; Sagiv Shifman; Eleazar Eskin


arXiv: Digital Libraries | 2017

Aztec: A Platform to Render Biomedical Software Findable, Accessible, Interoperable, and Reusable

Wei Wang; Brian J. Bleakley; Chelsea J.-T. Ju; Vincent Kyi; Patrick Tan; Howard Choi; Xinxin Huang; Yichao Zhou; Justin Wood; Ding Wang; Alex A. T. Bui; Peipei Ping


Journal of Molecular and Cellular Cardiology | 2017

Temporal Dynamics of Plasma Metabolites in ISO-induced Cardiac Remodeling in Mice

Quan Cao; Howard Choi; Ding Wang; David A. Liem; Chelsea J.-T. Ju; Jennifer S Polson; Wei Wang; Peipei Ping


F1000Research | 2017

Aztec: automated biomedical tool index with improved information retrieval system

Wei Wang; Yichao Zhou; Patrick Tan; Vincent Kyi; Xinxin Huang; Chelsea J.-T. Ju; Justin Wood; Peipei Ping

Collaboration


Dive into Chelsea J.-T. Ju's collaborations.

Top Co-Authors

Wei Wang

University of California

Peipei Ping

University of California

Eleazar Eskin

University of California

David A. Liem

University of California

Ding Wang

University of California
