Publication


Featured research published by Catherine M. Farrell.


Nucleic Acids Research | 2014

RefSeq: an update on mammalian reference sequences

Kim D. Pruitt; Garth Brown; Susan M. Hiatt; Françoise Thibaud-Nissen; Alexander Astashyn; Olga Ermolaeva; Catherine M. Farrell; Jennifer Hart; Melissa J. Landrum; Kelly M. McGarvey; Michael R. Murphy; Nuala A. O’Leary; Shashikant Pujar; Bhanu Rajput; Sanjida H. Rangwala; Lillian D. Riddick; Andrei Shkeda; Hanzhen Sun; Pamela Tamez; Raymond E. Tully; Craig Wallin; David Webb; Janet Weber; Wendy Wu; Michael DiCuccio; Paul Kitts; Donna Maglott; Terence Murphy; James Ostell

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.


Nucleic Acids Research | 2016

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary; Mathew W. Wright; J. Rodney Brister; Stacy Ciufo; Diana Haddad; Richard McVeigh; Bhanu Rajput; Barbara Robbertse; Brian Smith-White; Danso Ako-adjei; Alexander Astashyn; Azat Badretdin; Yiming Bao; Olga Blinkova; Vyacheslav Brover; Vyacheslav Chetvernin; Jinna Choi; Eric Cox; Olga Ermolaeva; Catherine M. Farrell; Tamara Goldfarb; Tripti Gupta; Daniel H. Haft; Eneida Hatcher; Wratko Hlavina; Vinita Joardar; Vamsi K. Kodali; Wenjun Li; Donna Maglott; Patrick Masterson

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.


Genome Research | 2009

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes

Kim D. Pruitt; Jennifer Harrow; Rachel A. Harte; Craig Wallin; Mark Diekhans; Donna Maglott; Steve Searle; Catherine M. Farrell; Jane Loveland; Barbara J. Ruef; Elizabeth Hart; Marie-Marthe Suner; Melissa J. Landrum; Bronwen Aken; Sarah Ayling; Robert Baertsch; Julio Fernandez-Banet; Joshua L. Cherry; Val Curwen; Michael DiCuccio; Manolis Kellis; Jennifer M. Lee; Michael F. Lin; Michael Schuster; Andrei Shkeda; Clara Amid; Garth Brown; Oksana Dukhanina; Adam Frankish; Jennifer Hart

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.


Proceedings of the National Academy of Sciences of the United States of America | 2002

The insulation of genes from external enhancers and silencing chromatin

Bonnie Burgess-Beusse; Catherine M. Farrell; Miklos Gaszner; Michael D. Litt; Vesco Mutskov; Félix Recillas-Targa; Melanie A. Simpson; Adam G. West; Gary Felsenfeld

Insulators are DNA sequence elements that can serve in some cases as barriers to protect a gene against the encroachment of adjacent inactive condensed chromatin. Some insulators also can act as blocking elements to protect against the activating influence of distal enhancers associated with other genes. Although most of the insulators identified so far derive from Drosophila, they also are found in vertebrates. An insulator at the 5′ end of the chicken β-globin locus marks a boundary between an open chromatin domain and a region of constitutively condensed chromatin. Detailed analysis of this element shows that it possesses both enhancer blocking activity and the ability to screen reporter genes against position effects. Enhancer blocking is associated with binding of the protein CTCF; sites that bind CTCF are found at other critical points in the genome. Protection against position effects involves other properties that appear to be associated with control of histone acetylation and methylation. Insulators thus are complex elements that can help to preserve the independent function of genes embedded in a genome in which they are surrounded by regulatory signals they must ignore.


Molecular and Cellular Biology | 2002

Conserved CTCF Insulator Elements Flank the Mouse and Human β-Globin Loci

Catherine M. Farrell; Adam G. West; Gary Felsenfeld

A binding site for the transcription factor CTCF is responsible for enhancer-blocking activity in a variety of vertebrate insulators, including the insulators at the 5′ and 3′ chromatin boundaries of the chicken β-globin locus. To date, no functional domain boundaries have been defined at mammalian β-globin loci, which are embedded within arrays of functional olfactory receptor genes. In an attempt to define boundary elements that could separate these gene clusters, CTCF-binding sites were searched for at the most distal DNase I-hypersensitive sites (HSs) of the mouse and human β-globin loci. Conserved CTCF sites were found at 5′HS5 and 3′HS1 of both loci. All of these sites could bind to CTCF in vitro. The sites also functioned as insulators in enhancer-blocking assays at levels correlating with CTCF-binding affinity, although enhancer-blocking activity was weak with the mouse 5′HS5 site. These results show that with respect to enhancer-blocking elements, the architecture of the mouse and human β-globin loci is similar to that found previously for the chicken β-globin locus. Unlike the chicken locus, the mouse and human β-globin loci do not have nearby transitions in chromatin structure but the data suggest that 3′HS1 and 5′HS5 may function as insulators that prevent inappropriate interactions between β-globin regulatory elements and those of neighboring domains or subdomains, many of which possess strong enhancers.


Molecular and Cellular Biology | 2003

A Complex Chromatin Landscape Revealed by Patterns of Nuclease Sensitivity and Histone Modification within the Mouse β-Globin Locus

Michael Bulger; Dirk Schübeler; M. A. Bender; Joan Hamilton; Catherine M. Farrell; Ross C. Hardison; Mark Groudine

In order to create an extended map of chromatin features within a mammalian multigene locus, we have determined the extent of nuclease sensitivity and the pattern of histone modifications associated with the mouse β-globin genes in adult erythroid tissue. We show that the nuclease-sensitive domain encompasses the β-globin genes along with several flanking olfactory receptor genes that are inactive in erythroid cells. We describe enhancer-blocking or boundary elements on either side of the locus that are bound in vivo by the transcription factor CTCF, but we found that they do not coincide with transitions in nuclease sensitivity flanking the locus or with patterns of histone modifications within it. In addition, histone hyperacetylation and dimethylation of histone H3 K4 are not uniform features of the nuclease-sensitive mouse β-globin domain but rather define distinct subdomains within it. Our results reveal a complex chromatin landscape for the active β-globin locus and illustrate the complexity of broad structural changes that accompany gene activation.


Nucleic Acids Research | 2014

Current status and new features of the Consensus Coding Sequence database

Catherine M. Farrell; Nuala A. O’Leary; Rachel A. Harte; Jane Loveland; Laurens Wilming; Craig Wallin; Mark Diekhans; Daniel Barrell; Stephen M. J. Searle; Bronwen Aken; Susan M. Hiatt; Adam Frankish; Marie-Marthe Suner; Bhanu Rajput; Charles A. Steward; Garth Brown; Ruth Bennett; Michael R. Murphy; Wendy Wu; Mike Kay; Jennifer Hart; Jeena Rajan; Janet Weber; Catherine Snow; Lillian D. Riddick; Toby Hunt; David Webb; Mark G. Thomas; Pamela Tamez; Sanjida H. Rangwala

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.


Database | 2012

Tracking and coordinating an international curation effort for the CCDS Project

Rachel A. Harte; Catherine M. Farrell; Jane Loveland; Marie-Marthe Suner; Laurens Wilming; Bronwen Aken; Daniel Barrell; Adam Frankish; Craig Wallin; Steve Searle; Mark Diekhans; Jennifer Harrow; Kim D. Pruitt

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. Database URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi


Nucleic Acids Research | 2018

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

Shashikant Pujar; Nuala A. O’Leary; Catherine M. Farrell; Jane Loveland; Jonathan M Mudge; Craig Wallin; Carlos García Girón; Mark Diekhans; If Barnes; Ruth Bennett; Andrew E Berry; Eric Cox; Claire Davidson; Tamara Goldfarb; Jose Gonzalez; Toby Hunt; John D. Jackson; Vinita Joardar; Mike P Kay; Vamsi K. Kodali; Fergal J Martin; Monica McAndrews; Kelly M. McGarvey; Mike Murphy; Bhanu Rajput; Sanjida H. Rangwala; Lillian D. Riddick; Ruth L Seal; Marie-Marthe Suner; David Webb

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.


Genes & Development | 2002

The barrier function of an insulator couples high histone acetylation levels with specific protection of promoter DNA from methylation

Vesco Mutskov; Catherine M. Farrell; Paul A. Wade; Alan P. Wolffe; Gary Felsenfeld

Collaboration


Top Co-Authors

Bhanu Rajput, National Institutes of Health

Gary Felsenfeld, National Institutes of Health

Craig Wallin, University of California

David Webb, National Institutes of Health

Kim D. Pruitt, National Institutes of Health

Lillian D. Riddick, National Institutes of Health

Donna Maglott, National Institutes of Health

Garth Brown, National Institutes of Health

Jennifer Hart, National Institutes of Health

Mark Diekhans, University of California