Carrie A. Davis | Researchain

Publication

Featured research published by Carrie A. Davis.

Bioinformatics | 2013

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin; Carrie A. Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha; Philippe Batut; Mark Chaisson; Thomas R. Gingeras

MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
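The "sequential maximum mappable seed search in uncompressed suffix arrays" mentioned in the abstract can be illustrated with a toy sketch. This is not STAR's C++ implementation: the function names, the naive suffix-array construction, and the example genome and read are all assumptions made for illustration, and the clustering, stitching, scoring and mismatch handling steps are left out entirely.

```python
def build_suffix_array(text):
    """Naive suffix array: suffix start offsets sorted by suffix (toy sizes only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def maximal_mappable_prefix(read, genome, sa):
    """Return (length, positions) of the longest prefix of `read` found in `genome`."""
    # Binary search for where `read` would be inserted among the sorted suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < read:
            lo = mid + 1
        else:
            hi = mid
    # In sorted order, the longest shared prefix is with one of the two neighbours.
    best = 0
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            best = max(best, common_prefix_len(read, genome[sa[idx]:]))
    if best == 0:
        return 0, []
    prefix = read[:best]
    # Linear scan for simplicity; the matches are contiguous in the suffix array.
    positions = sorted(p for p in sa if genome.startswith(prefix, p))
    return best, positions


def sequential_seeds(read, genome, sa):
    """Repeat the prefix search on the unmapped remainder of the read."""
    seeds = []
    offset = 0
    while offset < len(read):
        length, positions = maximal_mappable_prefix(read[offset:], genome, sa)
        if length == 0:
            offset += 1  # base absent from the genome: skip it (simplification)
            continue
        seeds.append((offset, length, positions))
        offset += length
    return seeds


# Hypothetical toy genome: exon, GT...AG intron, exon.
genome = "TTACGGATCA" + "GTCCCCCCAG" + "CATTGCAACT"
# Hypothetical spliced read: end of the first exon joined to the start of the second.
read = "ACGGATCA" + "CATTGCAA"
sa = build_suffix_array(genome)
print(sequential_seeds(read, genome, sa))
```

On this toy input the read splits into two seeds, one of 8 bases at genome position 2 and one of 8 bases at position 20, so the seed boundary falls at the splice junction without any prior junction annotation.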

Nature | 2012

Landscape of transcription in human cells

Sarah Djebali; Carrie A. Davis; Angelika Merkel; Alexander Dobin; Timo Lassmann; Ali Mortazavi; Andrea Tanzer; Julien Lagarde; Wei Lin; Felix Schlesinger; Chenghai Xue; Georgi K. Marinov; Jainab Khatun; Brian A. Williams; Chris Zaleski; Joel Rozowsky; Maik Röder; Felix Kokocinski; Rehab F. Abdelhamid; Tyler Alioto; Igor Antoshechkin; Michael T. Baer; Nadav S. Bar; Philippe Batut; Kimberly Bell; Ian Bell; Sudipto Chakrabortty; Xian Chen; Jacqueline Chrast; Joao Curado

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell’s regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

Genome Research | 2012

The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression

Thomas Derrien; Rory Johnson; Giovanni Bussotti; Andrea Tanzer; Sarah Djebali; Hagen Tilgner; Gregory Guernec; David Martin; Angelika Merkel; David G. Knowles; Julien Lagarde; Lavanya Veeravalli; Xiaoan Ruan; Yijun Ruan; Timo Lassmann; Piero Carninci; James B. Brown; Leonard Lipovich; José Manuel Rodríguez González; Mark G. Thomas; Carrie A. Davis; Ramin Shiekhattar; Thomas R. Gingeras; Tim Hubbard; Cedric Notredame; Jennifer Harrow; Roderic Guigó

The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.

Nature | 2011

The developmental transcriptome of Drosophila melanogaster

Brenton R. Graveley; Angela N. Brooks; Joseph W. Carlson; Michael O. Duff; Jane M. Landolin; Li Min Yang; Carlo G. Artieri; Marijke J. van Baren; Nathan Boley; Benjamin W. Booth; James B. Brown; Lucy Cherbas; Carrie A. Davis; Alexander Dobin; Renhua Li; Wei Lin; John H. Malone; Nicolas R Mattiuzzo; David S. Miller; David Sturgill; Brian B. Tuch; Chris Zaleski; Dayu Zhang; Marco Blanchette; Sandrine Dudoit; Brian D. Eads; Richard E. Green; Ann S. Hammonds; Lichun Jiang; Phil Kapranov

Drosophila melanogaster is one of the most well-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.

Science | 2010

Identification of functional elements and regulatory circuits by Drosophila modENCODE

Sushmita Roy; Jason Ernst; Peter V. Kharchenko; Pouya Kheradpour; Nicolas Nègre; Matthew L. Eaton; Jane M. Landolin; Christopher A. Bristow; Lijia Ma; Michael F. Lin; Stefan Washietl; Bradley I. Arshinoff; Ferhat Ay; Patrick E. Meyer; Nicolas Robine; Nicole L. Washington; Luisa Di Stefano; Eugene Berezikov; Christopher D. Brown; Rogerio Candeias; Joseph W. Carlson; Adrian Carr; Irwin Jungreis; Daniel Marbach; Rachel Sealfon; Michael Y. Tolstorukov; Sebastian Will; Artyom A. Alekseyenko; Carlo G. Artieri; Benjamin W. Booth

From Genome to Regulatory Networks: For biologists, having a genome in hand is only the beginning; much more investigation is still needed to characterize how the genome is used to help produce a functional organism (see the Perspective by Blaxter). In this vein, Gerstein et al. (p. 1775) summarize for the Caenorhabditis elegans genome, and The modENCODE Consortium (p. 1787) summarize for the Drosophila melanogaster genome, full transcriptome analyses over developmental stages, genome-wide identification of transcription factor binding sites, and high-resolution maps of chromatin organization. Both studies identified highly occupied target (HOT) regions of the nematode and fly genomes, where DNA was bound by more than 15 of the transcription factors analyzed, and characterized the expression of related genes. Overall, the studies provide insights into the organization, structure, and function of the two genomes and provide basic information needed to guide and correlate both focused and genome-wide studies.

The Drosophila modENCODE project demonstrates the functional regulatory network of flies.

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.

Cell | 2012

Extensive Promoter-centered Chromatin Interactions Provide a Topological Basis for Transcription Regulation

Guoliang Li; Xiaoan Ruan; Raymond K. Auerbach; Kuljeet Singh Sandhu; Meizhen Zheng; Ping Wang; Huay Mei Poh; Yufen Goh; Joanne Lim; Jingyao Zhang; Hui Shan Sim; Su Qin Peh; Fabianus Hendriyan Mulawadi; Chin Thing Ong; Yuriy L. Orlov; Shuzhen Hong; Zhizhuo Zhang; Steve Landt; Debasish Raha; Ghia Euskirchen; Chia-Lin Wei; Weihong Ge; Huaien Wang; Carrie A. Davis; Katherine I. Fisher-Aylor; Ali Mortazavi; Mark Gerstein; Thomas R. Gingeras; Barbara J. Wold; Yi Sun

Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other, implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.

Genome Research | 2011

Synthetic spike-in standards for RNA-seq experiments

Lichun Jiang; Felix Schlesinger; Carrie A. Davis; Yu Zhang; Renhua Li; Marc L. Salit; Thomas R. Gingeras; Brian Oliver

High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs with various lengths and GC content, covering a 2^20 concentration range, as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments as well as to derive standard curves for quantifying the abundance of transcripts. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but we observed significantly larger imprecision than expected under pure Poisson sampling errors. We use the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length as well as stereotypic heterogeneity in coverage across transcripts correlated with position relative to RNA termini and priming sequence bias. These effects lead to biased quantification for short transcripts and individual exons, which is a serious problem for measurements of isoform abundances, but one that can be partially corrected using appropriate models of bias. By using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. By using data collected as part of the model organism and human Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
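As a rough illustration of how a standard curve of the kind described in the abstract could be derived, the sketch below fits a straight line to spike-in read density against input concentration in log2 space and then inverts it. The function names and the synthetic numbers are assumptions made for illustration, not the authors' procedure or data, and ordinary least squares is used here only as the simplest possible fit.

```python
import math


def fit_standard_curve(concentrations, read_densities):
    """Least-squares fit of log2(read density) = slope * log2(concentration) + intercept."""
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(d) for d in read_densities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(read_density, slope, intercept):
    """Invert the fitted curve to estimate input concentration from read density."""
    return 2.0 ** ((math.log2(read_density) - intercept) / slope)


# Made-up spike-in data spanning a 2^20 range; densities are exactly 3x the input,
# so a perfectly linear response gives a slope of 1 in log2 space.
concentrations = [2.0 ** k for k in range(0, 21, 4)]
read_densities = [3.0 * c for c in concentrations]

slope, intercept = fit_standard_curve(concentrations, read_densities)
print(slope, intercept)
print(estimate_concentration(3072.0, slope, intercept))
```

A slope close to 1 corresponds to the linearity between read density and RNA input that the abstract reports. Real replicates would scatter around the line, and the abstract notes that this scatter is larger than pure Poisson sampling would predict.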

Genome Biology | 2012

An encyclopedia of mouse DNA elements (Mouse ENCODE)

John A. Stamatoyannopoulos; Michael Snyder; Ross C. Hardison; Bing Ren; Thomas R. Gingeras; David M. Gilbert; Mark Groudine; M. A. Bender; Rajinder Kaul; Theresa K. Canfield; Erica Giste; Audra K. Johnson; Mia Zhang; Gayathri Balasundaram; Rachel Byron; Vaughan Roach; Peter J. Sabo; Richard Sandstrom; A Sandra Stehling; Robert E. Thurman; Sherman M. Weissman; Philip Cayting; Manoj Hariharan; Jin Lian; Yong Cheng; Stephen G. Landt; Zhihai Ma; Barbara J. Wold; Job Dekker; Gregory E. Crawford

To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

Nature | 2014

Diversity and dynamics of the Drosophila transcriptome

James B. Brown; Nathan Boley; Robert C. Eisman; Gemma May; Marcus H. Stoiber; Michael O. Duff; Ben W. Booth; Jiayu Wen; Soo Park; Ana Maria Suzuki; Kenneth H. Wan; Charles Yu; Dayu Zhang; Joseph W. Carlson; Lucy Cherbas; Brian D. Eads; David J. Miller; Keithanne Mockaitis; Johnny Roberts; Carrie A. Davis; Erwin Frise; Ann S. Hammonds; Sara H. Olson; Sol Shenker; David Sturgill; Anastasia A. Samsonova; Richard Weiszmann; Garret Robinson; Juan Hernandez; Justen Andrews

Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.

Nature Methods | 2012

Accurate identification of human Alu and non-Alu RNA editing sites

Gokul Ramaswami; Wei Ju Lin; Robert Piskol; Meng How Tan; Carrie A. Davis; Jin Billy Li

We developed a computational framework to robustly identify RNA editing sites using transcriptome and genome deep-sequencing data from the same individual. As compared with previous methods, our approach identified a large number of Alu and non-Alu RNA editing sites with high specificity. We also found that editing of non-Alu sites appears to be dependent on nearby edited Alu sites, possibly through the locally formed double-stranded RNA structure.
