Publication


Featured research published by Charles Yu.


Science | 2010

Identification of functional elements and regulatory circuits by Drosophila modENCODE

Sushmita Roy; Jason Ernst; Peter V. Kharchenko; Pouya Kheradpour; Nicolas Nègre; Matthew L. Eaton; Jane M. Landolin; Christopher A. Bristow; Lijia Ma; Michael F. Lin; Stefan Washietl; Bradley I. Arshinoff; Ferhat Ay; Patrick E. Meyer; Nicolas Robine; Nicole L. Washington; Luisa Di Stefano; Eugene Berezikov; Christopher D. Brown; Rogerio Candeias; Joseph W. Carlson; Adrian Carr; Irwin Jungreis; Daniel Marbach; Rachel Sealfon; Michael Y. Tolstorukov; Sebastian Will; Artyom A. Alekseyenko; Carlo G. Artieri; Benjamin W. Booth

From Genome to Regulatory Networks

For biologists, having a genome in hand is only the beginning; much more investigation is still needed to characterize how the genome is used to help produce a functional organism (see the Perspective by Blaxter). In this vein, Gerstein et al. (p. 1775) summarize for the Caenorhabditis elegans genome, and The modENCODE Consortium (p. 1787) summarize for the Drosophila melanogaster genome, full transcriptome analyses over developmental stages, genome-wide identification of transcription factor binding sites, and high-resolution maps of chromatin organization. Both studies identified highly occupied target (HOT) regions in the nematode and fly genomes, where DNA was bound by more than 15 of the transcription factors analyzed, and characterized the expression of related genes. Overall, the studies provide insights into the organization, structure, and function of the two genomes and provide basic information needed to guide and correlate both focused and genome-wide studies.

The Drosophila modENCODE project demonstrates the functional regulatory network of flies.

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.


Cell | 2011

A Protein Complex Network of Drosophila melanogaster

K. G. Guruharsha; Jean François Rual; Bo Zhai; Julian Mintseris; Pujita Vaidya; Namita Vaidya; Chapman Beekman; Christina Y. Wong; David Y. Rhee; Odise Cenaj; Emily McKillip; Saumini Shah; Mark Stapleton; Kenneth H. Wan; Charles Yu; Bayan Parsa; Joseph W. Carlson; Xiao Chen; Bhaveen Kapadia; K. VijayRaghavan; Steven P. Gygi; Susan E. Celniker; Robert A. Obar; Spyros Artavanis-Tsakonas

Determining the composition of protein complexes is an essential step toward understanding the cell as an integrated system. Using coaffinity purification coupled to mass spectrometry analysis, we examined protein associations involving nearly 5,000 individual, FLAG-HA epitope-tagged Drosophila proteins. Stringent analysis of these data, based on a statistical framework designed to define individual protein-protein interactions, led to the generation of a Drosophila protein interaction map (DPiM) encompassing 556 protein complexes. The high quality of the DPiM and its usefulness as a paradigm for metazoan proteomes are apparent from the recovery of many known complexes, significant enrichment for shared functional attributes, and validation in human cells. The DPiM defines potential novel members for several important protein complexes and assigns functional links to 586 protein-coding genes lacking previous experimental annotation. The DPiM represents, to our knowledge, the largest metazoan protein complex map and provides a valuable resource for analysis of protein complex evolution.


Nature | 2014

Diversity and dynamics of the Drosophila transcriptome

James B. Brown; Nathan Boley; Robert C. Eisman; Gemma May; Marcus H. Stoiber; Michael O. Duff; Ben W. Booth; Jiayu Wen; Soo Park; Ana Maria Suzuki; Kenneth H. Wan; Charles Yu; Dayu Zhang; Joseph W. Carlson; Lucy Cherbas; Brian D. Eads; David J. Miller; Keithanne Mockaitis; Johnny Roberts; Carrie A. Davis; Erwin Frise; Ann S. Hammonds; Sara H. Olson; Sol Shenker; David Sturgill; Anastasia A. Samsonova; Richard Weiszmann; Garret Robinson; Juan Hernandez; Justen Andrews

Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.


Genome Biology | 2002

A Drosophila full-length cDNA resource

Mark Stapleton; Joe Carlson; Peter Brokstein; Charles Yu; Mark Champe; Reed A. George; Hannibal Guarin; Brent Kronmiller; Joanne Pacleb; Soo Park; Ken Wan; Gerald M. Rubin; Susan E. Celniker

Background: A collection of sequenced full-length cDNAs is an important resource both for functional genomics studies and for the determination of the intron-exon structure of genes. Providing this resource to the Drosophila melanogaster research community has been a long-term goal of the Berkeley Drosophila Genome Project. We have previously described the Drosophila Gene Collection (DGC), a set of putative full-length cDNAs that was produced by generating and analyzing over 250,000 expressed sequence tags (ESTs) derived from a variety of tissues and developmental stages.

Results: We have generated high-quality full-insert sequence for 8,921 clones in the DGC. We compared the sequence of these clones to the annotated Release 3 genomic sequence, and identified more than 5,300 cDNAs that contain a complete and accurate protein-coding sequence. This corresponds to at least one splice form for 40% of the predicted D. melanogaster genes. We also identified potential new cases of RNA editing.

Conclusions: We show that comparison of cDNA sequences to a high-quality annotated genomic sequence is an effective approach to identifying and eliminating defective clones from a cDNA collection and ensuring its utility for experimentation. Clones were eliminated either because they carry single nucleotide discrepancies, which most probably result from reverse transcriptase errors, or because they are truncated and contain only part of the protein-coding sequence.


Genome Research | 2011

Genome-wide analysis of promoter architecture in Drosophila melanogaster

Roger A. Hoskins; Jane M. Landolin; James B. Brown; Jeremy E. Sandler; Hazuki Takahashi; Timo Lassmann; Charles Yu; Benjamin W. Booth; Dayu Zhang; Kenneth H. Wan; Li Yang; Nathan Boley; Justen Andrews; Thomas C. Kaufman; Brenton R. Graveley; Peter J. Bickel; Piero Carninci; Joseph W. Carlson; Susan E. Celniker

Core promoters are critical regions for gene regulation in higher eukaryotes. However, the boundaries of promoter regions, the relative rates of initiation at the transcription start sites (TSSs) distributed within them, and the functional significance of promoter architecture remain poorly understood. We produced a high-resolution map of promoters active in the Drosophila melanogaster embryo by integrating data from three independent and complementary methods: 21 million cap analysis of gene expression (CAGE) tags, 1.2 million RNA ligase mediated rapid amplification of cDNA ends (RLM-RACE) reads, and 50,000 cap-trapped expressed sequence tags (ESTs). We defined 12,454 promoters of 8037 genes. Our analysis indicates that, due to non-promoter-associated RNA background signal, previous studies have likely overestimated the number of promoter-associated CAGE clusters by fivefold. We show that TSS distributions form a complex continuum of shapes, and that promoters active in the embryo and adult have highly similar shapes in 95% of cases. This suggests that these distributions are generally determined by static elements such as local DNA sequence and are not modulated by dynamic signals such as histone modifications. Transcription factor binding motifs are differentially enriched as a function of promoter shape, and peaked promoter shape is correlated with both temporal and spatial regulation of gene expression. Our results contribute to the emerging view that core promoters are functionally diverse and control patterning of gene expression in Drosophila and mammals.


Scientific Data | 2014

Long-read, whole-genome shotgun sequence data for five model organisms

Kristi Kim; Paul Peluso; Primo Babayan; P. Jane Yeadon; Charles Yu; William W. Fisher; Chen-Shan Chin; Nicole A Rapicavoli; David Rank; Joachim J. Li; David E. A. Catcheside; Susan E. Celniker; Adam M. Phillippy; Casey M. Bergman; Jane M Landolin

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.


Nucleic Acids Research | 2005

Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP)

Roger A. Hoskins; Mark Stapleton; Reed A. George; Charles Yu; Kenneth H. Wan; Joseph W. Carlson; Susan E. Celniker

cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5′- and 3′-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT–PCR approaches.


Nature Protocols | 2006

High-Throughput Plasmid cDNA Library Screening

Kenneth H. Wan; Charles Yu; Reed A. George; Joseph W. Carlson; Roger A. Hoskins; Robert Svirskas; Mark Stapleton; Susan E. Celniker

Libraries of cDNA clones are valuable resources for analyzing the expression, structure and regulation of genes, and for studying protein functions and interactions. Full-length cDNA clones provide information about intron and exon structures, splice junctions, and 5′ and 3′ untranslated regions (UTRs). Open reading frames (ORFs) derived from cDNA clones can be used to generate constructs allowing the expression of both wild-type proteins and proteins tagged at their amino or carboxy terminus. Thus, obtaining full-length cDNA clones and sequences for most or all genes in an organism is essential for understanding genome functions. EST sequencing samples cDNA libraries at random, an approach that is most useful at the beginning of large-scale screening projects. As projects progress towards completion, however, the probability of identifying unique cDNAs by EST sequencing diminishes, resulting in poor recovery of rare transcripts. Here we describe an adapted, high-throughput protocol intended for the recovery of specific, full-length clones from plasmid cDNA libraries in 5 d.


Genome Announcements | 2017

Complete Genome Sequence of Lactobacillus plantarum Oregon-R-modENCODE Strain BDGP2 Isolated from Drosophila melanogaster Gut

Kenneth H. Wan; Charles Yu; Soo Park; Ann S. Hammonds; Benjamin W. Booth; Susan E. Celniker

Lactobacillus plantarum Oregon-R-modENCODE strain BDGP2 was isolated from the gut of Drosophila melanogaster for functional host microbial interaction studies. The complete genome comprised a single circular chromosome of 3,407,160 bp, with a G+C content of 44%, and four plasmids.


Genome Announcements | 2017

Complete Genome Sequence of Enterococcus durans Oregon-R-modENCODE Strain BDGP3, a Lactic Acid Bacterium Found in the Drosophila melanogaster Gut

Kenneth H. Wan; Charles Yu; Soo Park; Ann S. Hammonds; Benjamin W. Booth; Susan E. Celniker

Enterococcus durans Oregon-R-modENCODE strain BDGP3 was isolated from the Drosophila melanogaster gut for functional host-microbe interaction studies. The complete genome is composed of a single circular chromosome of 2,983,334 bp, with a G+C content of 38%, and a single plasmid of 5,594 bp.
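
The G+C content quoted in the two genome announcements above is the share of G and C bases among all bases in a sequence. The following is a minimal sketch of that calculation and is not taken from either paper: the function name `gc_content` and the toy sequence are illustrative assumptions, and excluding ambiguous bases such as N from the denominator is a design choice made here.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Hypothetical helper for illustration only. Ambiguous bases
    (anything other than A, C, G, T) are left out of the denominator.
    """
    seq = sequence.upper()
    # Count only unambiguous bases.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence, not real genome data.
toy = "ATGCGCNNAT"
print(round(gc_content(toy), 1))
```

For a real genome, the same function could be applied to the chromosome sequence read from a FASTA file; the reported values of 44% and 38% are rounded percentages.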

Collaboration


Dive into Charles Yu's collaborations.

Top Co-Authors

Kenneth H. Wan

Lawrence Berkeley National Laboratory

Susan E. Celniker

Lawrence Berkeley National Laboratory

Joseph W. Carlson

Lawrence Berkeley National Laboratory

Soo Park

Lawrence Berkeley National Laboratory

Ann S. Hammonds

Lawrence Berkeley National Laboratory

Benjamin W. Booth

Lawrence Berkeley National Laboratory

Mark Stapleton

Lawrence Berkeley National Laboratory

Reed A. George

Lawrence Berkeley National Laboratory

Roger A. Hoskins

Lawrence Berkeley National Laboratory

Dayu Zhang

Indiana University Bloomington
