Jingchun Zhu
University of California, Santa Cruz
Publications
Featured research published by Jingchun Zhu.
Bioinformatics | 2010
Charles J. Vaske; Stephen Charles Benz; J. Zachary Sanborn; Dent Earl; Christopher W. Szeto; Jingchun Zhu; David Haussler; Joshua M. Stuart
Motivation: High-throughput data is providing a comprehensive view of the molecular changes in cancer tissues. New technologies allow for the simultaneous genome-wide assay of the state of genome copy number variation, gene expression, DNA methylation and epigenetics of tumor samples and cancer cell lines. Analyses of current data sets find that genetic alterations between patients can differ but often involve common pathways. It is therefore critical to identify relevant pathways involved in cancer progression and detect how they are altered in different patients. Results: We present a novel method for inferring patient-specific genetic activities incorporating curated pathway interactions among genes. A gene is modeled by a factor graph as a set of interconnected variables encoding the expression and known activity of a gene and its products, allowing the incorporation of many types of omic data as evidence. The method predicts the degree to which a pathway's activities (e.g. internal gene states, interactions or high-level ‘outputs’) are altered in the patient using probabilistic inference. Compared with a competing pathway activity inference approach called SPIA, our method identifies altered activities in cancer-related pathways with fewer false positives in both a glioblastoma multiforme (GBM) and a breast cancer dataset. PARADIGM identified consistent pathway-level activities for subsets of the GBM patients that are overlooked when genes are considered in isolation. Further, grouping GBM patients based on their significant pathway perturbations divides them into clinically-relevant subgroups having significantly different survival outcomes. These findings suggest that therapeutics might be chosen that target genes at critical points in the commonly perturbed pathway(s) of a group of patients.
Availability: Source code available at http://sbenz.github.com/Paradigm Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
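The factor-graph idea in the abstract above can be illustrated with a toy model. The sketch below is not the PARADIGM implementation: the three-state encoding, the `agree` factor, its `strength` value, and the function names are all assumptions chosen for illustration, and inference is done by brute-force enumeration rather than by whatever algorithm the published tool uses.

```python
from itertools import product

# Hypothetical three-state encoding: down (-1), unchanged (0), up (1).
STATES = (-1, 0, 1)


def agree(child, parent, strength=0.8):
    """Toy factor: large when the two states match, small otherwise."""
    return strength if child == parent else (1.0 - strength) / 2


def posterior_activity(copy_number_obs, expression_obs):
    """Posterior over a gene's hidden activity state, by enumeration.

    Hidden variables: dna -> mrna -> activity.
    Evidence: one observed copy-number state and one observed expression state.
    """
    scores = {s: 0.0 for s in STATES}
    for dna, mrna, activity in product(STATES, repeat=3):
        weight = (
            agree(copy_number_obs, dna)    # evidence factor for copy-number data
            * agree(expression_obs, mrna)  # evidence factor for expression data
            * agree(mrna, dna)             # DNA state influences mRNA state
            * agree(activity, mrna)        # mRNA state influences activity
        )
        scores[activity] += weight
    total = sum(scores.values())
    return {s: w / total for s, w in scores.items()}


if __name__ == "__main__":
    # Amplified copy number plus high expression should favour the "up" state.
    print(posterior_activity(copy_number_obs=1, expression_obs=1))
```

Enumeration is only workable for a handful of variables; it is used here because it makes the joint distribution explicit, not because it scales to curated pathways.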
Nucleic Acids Research | 2011
Mary Goldman; Brian Craft; Teresa Swatloski; Kyle Ellrott; Melissa S. Cline; Mark Diekhans; Singer Ma; Chris Wilks; Joshua M. Stuart; David Haussler; Jingchun Zhu
The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu) comprises a suite of web-based tools to integrate, visualize and analyze cancer genomics and clinical data. The browser displays whole-genome views of genome-wide experimental measurements for multiple samples alongside their associated clinical information. Multiple data sets can be viewed simultaneously as coordinated ‘heatmap tracks’ to compare across studies or different data modalities. Users can order, filter, aggregate, classify and display data interactively based on any given feature set including clinical features, annotated biological pathways and user-contributed collections of genes. Integrated standard statistical tools provide dynamic quantitative analysis within all available data sets. The browser hosts a growing body of publicly available cancer genomics data from a variety of cancer types, including data generated from the Cancer Genome Atlas project. Multiple consortiums use the browser on confidential prepublication data enabled by private installations. Many new features have been added, including the hgMicroscope tumor image viewer, hgSignature for real-time genomic signature evaluation on any browser track, and ‘PARADIGM’ pathway tracks to display integrative pathway activities. The browser is integrated with the UCSC Genome Browser; thus inheriting and integrating the Genome Browser’s rich set of human biology and genetics data that enhances the interpretability of the cancer genomics data.
Nature Methods | 2011
Xin Zhou; Brett Maricque; Mingchao Xie; Daofeng Li; Vasavi Sundaram; Eric A Martin; Brian C. Koebbe; Cydney Nielsen; Martin Hirst; Peggy J. Farnham; Robert M. Kuhn; Jingchun Zhu; Ivan Smirnov; W. James Kent; David Haussler; Pamela A. F. Madden; Joseph F. Costello; Ting Wang
To the Editor: Advances in next-generation sequencing have reshaped the landscape of genomic and epigenomic research. Large consortia such as the Encyclopedia of DNA Elements, the Roadmap Epigenomics Mapping Consortium and The Cancer Genome Atlas have generated tens of thousands of sequencing-based genome-wide datasets, creating a reference and resource for the scientific community. Small groups of researchers now can rapidly obtain huge volumes of genomic data, which need to be placed in the context of the consortium data for comparison. These data are often accompanied by rich metadata describing the sample and experiment, which is critical for their interpretation. Visualizing, navigating and interpreting such data in a meaningful way is a daunting challenge1. We developed the Human Epigenome Browser to host Human Epigenome Atlas data produced by the Roadmap Epigenomics Project2 and to support navigation of the Atlas and its interactive visualization, integration, comparison and analysis (http://epigenomegateway.wustl.edu/; see Supplementary Note and Supplementary Protocol for main components and use). The Browser is web-based, and it extends the seminal concept introduced by the University of California Santa Cruz Cancer Genomics Browser3 to support large, sequencing-based datasets. Epigenome measurements are displayed as genome heatmaps in which color gradients reflect signal strength (Fig. 1 and Supplementary Fig. 1). Metadata such as cell type, assay type, epigenetic mark and phenotype of the sample are encoded numerically and displayed in different colors by a metadata heatmap next to the genome heatmap (Fig. 1 and Supplementary Figs. 2–4). Investigators can zoom and pan in a ‘Google Maps’–like style to examine dozens to
Scientific Reports | 2013
Melissa S. Cline; Brian Craft; Teresa Swatloski; Mary Goldman; Singer Ma; David Haussler; Jingchun Zhu
The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu) offers interactive visualization and exploration of TCGA genomic, phenotypic, and clinical data, as produced by the Cancer Genome Atlas Research Network. Researchers can explore the impact of genomic alterations on phenotypes by visualizing gene and protein expression, copy number, DNA methylation, somatic mutation and pathway inference data alongside clinical features, Pan-Cancer subtype classifications and genomic biomarkers. Integrated Kaplan–Meier survival analysis helps investigators to assess survival stratification by any of the information.
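For readers unfamiliar with the Kaplan–Meier analysis mentioned above, the estimator itself is short enough to sketch. This is a minimal standard-library version of the textbook product-limit estimator, not the browser's code; the function name `kaplan_meier` and the sample data are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. death) was observed at that time,
               0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # each patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times (months) and event flags for one patient group.
    for time, prob in kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]):
        print(time, round(prob, 3))
```

Stratifying by a feature, as the abstract describes, amounts to calling the function once per patient group and comparing the resulting curves.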
PLOS Computational Biology | 2005
Jingchun Zhu; J. Zachary Sanborn; Mark Diekhans; Craig B. Lowe; Tom H. Pringle; David Haussler
Taking advantage of the complete genome sequences of several mammals, we developed a novel method to detect losses of well-established genes in the human genome through syntenic mapping of gene structures between the human, mouse, and dog genomes. Unlike most previous genomic methods for pseudogene identification, this analysis is able to differentiate losses of well-established genes from pseudogenes formed shortly after segmental duplication or generated via retrotransposition. Therefore, it enables us to find genes that were inactivated long after their birth, which were likely to have evolved nonredundant biological functions before being inactivated. The method was used to look for gene losses along the human lineage during the approximately 75 million years (My) since the common ancestor of primates and rodents (the euarchontoglire crown group). We identified 26 losses of well-established genes in the human genome that were all lost at least 50 My after their birth. Many of them were previously characterized pseudogenes in the human genome, such as GULO and UOX. Our methodology is highly effective at identifying losses of single-copy genes of ancient origin, allowing us to find a few well-known pseudogenes in the human genome missed by previous high-throughput genome-wide studies. In addition to confirming previously known gene losses, we identified 16 previously uncharacterized human pseudogenes that are definitive losses of long-established genes. Among them is ACYL3, an ancient enzyme present in archaea, bacteria, and eukaryotes, but lost approximately 6 to 8 Mya in the ancestor of humans and chimps. Although losses of well-established genes do not equate to adaptive gene losses, they are a useful proxy to use when searching for such genetic changes. This is especially true for adaptive losses that occurred more than 250,000 years ago, since any genetic evidence of the selective sweep indicative of such an event has been erased.
The EMBO Journal | 2012
Jesse R. Raab; Jonathan Chiu; Jingchun Zhu; Sol Katzman; Sreenivasulu Kurukuti; Paul A. Wade; David Haussler; Rohinton T. Kamakaka
Insulators help separate active chromatin domains from silenced ones. In yeast, gene promoters act as insulators to block the spread of Sir and HP1 mediated silencing while in metazoans most insulators are multipartite autonomous entities. tDNAs are repetitive sequences dispersed throughout the human genome and we now show that some of these tDNAs can function as insulators in human cells. Using computational methods, we identified putative human tDNA insulators. Using silencer blocking, transgene protection and repressor blocking assays we show that some of these tDNA‐containing fragments can function as barrier insulators in human cells. We find that these elements also have the ability to block enhancers from activating RNA pol II transcribed promoters. Characterization of a putative tDNA insulator in human cells reveals that the site possesses chromatin signatures similar to those observed at other better‐characterized eukaryotic insulators. Enhanced 4C analysis demonstrates that the tDNA insulator makes long‐range chromatin contacts with other tDNAs and ETC sites but not with intervening or flanking RNA pol II transcribed genes.
Molecular & Cellular Proteomics | 2014
Rehan Akbani; Karl-Friedrich Becker; Neil O. Carragher; Theodore C. Goldstein; Leanne De Koning; Ulrike Korf; Lance A. Liotta; Gordon B. Mills; Satoshi Nishizuka; Michael Pawlak; Emanuel F. Petricoin; Harvey B. Pollard; Bryan Serrels; Jingchun Zhu
Reverse phase protein array (RPPA) technology introduced a miniaturized “antigen-down” or “dot-blot” immunoassay suitable for quantifying the relative, semi-quantitative or quantitative (if a well-accepted reference standard exists) abundance of total protein levels and post-translational modifications across a variety of biological samples including cultured cells, tissues, and body fluids. The recent evolution of RPPA combined with more sophisticated sample handling, optical detection, quality control, and better quality affinity reagents provides exquisite sensitivity and high sample throughput at a reasonable cost per sample. This facilitates large-scale multiplex analysis of multiple post-translational markers across samples from in vitro, preclinical, or clinical samples. The technical power of RPPA is stimulating the application and widespread adoption of RPPA methods within academic, clinical, and industrial research laboratories. Advances in RPPA technology now offer scientists the opportunity to quantify protein analytes with high precision, sensitivity, throughput, and robustness. As a result, adopters of RPPA technology have recognized critical success factors for useful and maximum exploitation of RPPA technologies, including the following: preservation and optimization of pre-analytical sample quality, application of validated high-affinity and specific antibody (or other protein affinity) detection reagents, dedicated informatics solutions to ensure accurate and robust quantification of protein analytes, and quality-assured procedures and data analysis workflows compatible with application within regulated clinical environments. In 2011, 2012, and 2013, the first three Global RPPA workshops were held in the United States, Europe, and Japan, respectively. 
These workshops provided an opportunity for RPPA laboratories, vendors, and users to share and discuss results, the latest technology platforms, best practices, and future challenges and opportunities. The outcomes of the workshops included a number of key opportunities to advance the RPPA field and provide added benefit to existing and future participants in the RPPA research community. The purpose of this report is to share and disseminate, as a community, current knowledge and future directions of the RPPA technology.
PLOS ONE | 2014
Amie Radenbaugh; Singer Ma; Adam D. Ewing; Joshua M. Stuart; Eric A. Collisson; Jingchun Zhu; David Haussler
The detection of somatic single nucleotide variants is a crucial component to the characterization of the cancer genome. Mutation calling algorithms thus far have focused on comparing the normal and tumor genomes from the same individual. In recent years, it has become routine for projects like The Cancer Genome Atlas (TCGA) to also sequence the tumor RNA. Here we present RADIA (RNA and DNA Integrated Analysis), a novel computational method combining the patient-matched normal and tumor DNA with the tumor RNA to detect somatic mutations. The inclusion of the RNA increases the power to detect somatic mutations, especially at low DNA allelic frequencies. By integrating an individual’s DNA and RNA, we are able to detect mutations that would otherwise be missed by traditional algorithms that examine only the DNA. We demonstrate high sensitivity (84%) and very high precision (98% and 99%) for RADIA in patient data from endometrial carcinoma and lung adenocarcinoma from TCGA. Mutations with both high DNA and RNA read support have the highest validation rate of over 99%. We also introduce a simulation package that spikes in artificial mutations to patient data, rather than simulating sequencing data from a reference genome. We evaluate sensitivity on the simulation data and demonstrate our ability to rescue back mutations at low DNA allelic frequencies by including the RNA. Finally, we highlight mutations in important cancer genes that were rescued due to the incorporation of the RNA.
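The idea of rescuing low-frequency DNA variants with RNA evidence can be made concrete with a toy decision rule. The sketch below is not RADIA's algorithm or its filters: the `Pileup` type, the `call_somatic` function, the labels it returns, and every threshold are hypothetical values picked only to show how a third evidence source changes the outcome.

```python
from dataclasses import dataclass


@dataclass
class Pileup:
    """Read counts at one genomic position (hypothetical container)."""
    ref_reads: int
    alt_reads: int

    @property
    def vaf(self) -> float:
        """Variant allele frequency: fraction of reads carrying the alternate allele."""
        depth = self.ref_reads + self.alt_reads
        return self.alt_reads / depth if depth else 0.0


def call_somatic(normal_dna, tumor_dna, tumor_rna,
                 dna_min_vaf=0.10, rescue_min_vaf=0.02,
                 rna_min_alt=4, normal_max_vaf=0.01):
    """Classify a site using illustrative (assumed) thresholds."""
    if normal_dna.vaf > normal_max_vaf:
        return "germline_or_artifact"   # variant is also visible in the normal sample
    if tumor_dna.vaf >= dna_min_vaf:
        return "somatic_dna"            # tumor DNA alone is sufficient
    if tumor_dna.vaf >= rescue_min_vaf and tumor_rna.alt_reads >= rna_min_alt:
        return "somatic_rna_rescued"    # weak DNA signal backed by RNA reads
    return "no_call"


if __name__ == "__main__":
    normal = Pileup(ref_reads=40, alt_reads=0)
    weak_tumor = Pileup(ref_reads=97, alt_reads=3)  # 3% DNA allelic frequency
    print(call_somatic(normal, weak_tumor, Pileup(20, 6)))
    print(call_somatic(normal, weak_tumor, Pileup(20, 0)))
```

The two calls differ only in RNA support, which is the situation the abstract describes: a DNA-only caller with these thresholds would miss the first site.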
Nature Biotechnology | 2017
John Vivian; Arjun Arkal Rao; Frank Austin Nothaft; Christopher Ketchum; Joel Armstrong; Adam M. Novak; Jacob Pfeil; Jake Narkizian; Alden Deran; Audrey Musselman-Brown; Hannes Schmidt; Peter Amstutz; Brian Craft; Mary Goldman; Kate R. Rosenbloom; Melissa S. Cline; Brian O'Connor; Megan Hanna; Chet Birger; W. James Kent; David A. Patterson; Anthony D. Joseph; Jingchun Zhu; Sasha Zaranek; Gad Getz; David Haussler; Benedict Paten
open sharing of protocols. With a precise ontology to describe standardized protocols, it may be possible to share methods widely and create community standards. We envisage that in future individual research laboratories, or clusters of colocated laboratories, will have in-house, low-cost automation work cells but will access DNA foundries via the cloud to carry out complex experimental workflows. Technologies enabling this from companies such as Emerald Cloud Lab (S. San Francisco, CA, USA), Synthace (London) and Transcriptic (Menlo Park, CA, USA) could, for example, send experimental designs to foundries and return output data to a researcher. This ‘mixed economy’ should accelerate the development and sharing of standardized protocols and metrology standards and shift a growing proportion of molecular, cellular and synthetic biology into a fully quantitative and reproducible era.
bioRxiv | 2016
John Vivian; Arjun Rao; Frank Austin Nothaft; Christopher Ketchum; Joel Armstrong; Adam M. Novak; Jacob Pfeil; Jake Narkizian; Alden Deran; Audrey Musselman-Brown; Hannes Schmidt; Peter Amstutz; Brian Craft; Mary Goldman; Kate R. Rosenbloom; Melissa S. Cline; Brian O'Connor; Megan Hanna; Chet Birger; W. James Kent; David A. Patterson; Anthony D. Joseph; Jingchun Zhu; Sasha Zaranek; Gad Getz; David Haussler; Benedict Paten
Toil is portable, open-source workflow software that supports contemporary workflow definition languages and can be used to run scientific workflows securely, reproducibly and efficiently at large scale. To demonstrate Toil, we processed over 20,000 RNA-seq samples to create a consistent meta-analysis of five datasets free of computational batch effects that we make freely available. Nearly all the samples were analysed in under four days using a commercial cloud cluster of 32,000 preemptable cores.
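The figures in the abstract imply a rough ceiling on compute per sample. The calculation below is back-of-the-envelope arithmetic using only the rounded numbers quoted above; it assumes every core was busy for the full four days, so it gives an upper bound and not a measured cost. The function name is an illustrative choice.

```python
def core_hours_per_sample(cores: int, days: float, samples: int) -> float:
    """Upper bound on core-hours per sample if all cores ran the whole time."""
    total_core_hours = cores * days * 24
    return total_core_hours / samples


if __name__ == "__main__":
    # 32,000 cores x 4 days x 24 h = 3,072,000 core-hours across 20,000 samples.
    print(f"{core_hours_per_sample(32_000, 4, 20_000):.1f} core-hours per sample")
    # prints "153.6 core-hours per sample"
```

Because the abstract says "over 20,000" samples and "under four days", the true per-sample figure would be lower than this bound.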