Publication


Featured research published by Olga Ermolaeva.


Nucleic Acids Research | 2014

RefSeq: an update on mammalian reference sequences

Kim D. Pruitt; Garth Brown; Susan M. Hiatt; Françoise Thibaud-Nissen; Alexander Astashyn; Olga Ermolaeva; Catherine M. Farrell; Jennifer Hart; Melissa J. Landrum; Kelly M. McGarvey; Michael R. Murphy; Nuala A. O’Leary; Shashikant Pujar; Bhanu Rajput; Sanjida H. Rangwala; Lillian D. Riddick; Andrei Shkeda; Hanzhen Sun; Pamela Tamez; Raymond E. Tully; Craig Wallin; David Webb; Janet Weber; Wendy Wu; Michael DiCuccio; Paul Kitts; Donna Maglott; Terence Murphy; James Ostell

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.


Nucleic Acids Research | 2016

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary; Mathew W. Wright; J. Rodney Brister; Stacy Ciufo; Diana Haddad; Richard McVeigh; Bhanu Rajput; Barbara Robbertse; Brian Smith-White; Danso Ako-adjei; Alexander Astashyn; Azat Badretdin; Yiming Bao; Olga Blinkova; Vyacheslav Brover; Vyacheslav Chetvernin; Jinna Choi; Eric Cox; Olga Ermolaeva; Catherine M. Farrell; Tamara Goldfarb; Tripti Gupta; Daniel H. Haft; Eneida Hatcher; Wratko Hlavina; Vinita Joardar; Vamsi K. Kodali; Wenjun Li; Donna Maglott; Patrick Masterson

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.


Nature Genetics | 1998

Data management and analysis for gene expression arrays

Olga Ermolaeva; Mohit Rastogi; Kim D. Pruitt; Gregory D. Schuler; Michael L. Bittner; Yidong Chen; Richard Simon; Paul S. Meltzer; Jeffrey M. Trent; Mark S. Boguski

Microarray technology makes it possible to simultaneously study the expression of thousands of genes during a single experiment. We have developed an information system, ArrayDB, to manage and analyse large-scale expression data. The underlying relational database was designed to allow flexibility in the nature and structure of data input and also in the generation of standard or customized reports through a web-browser interface. ArrayDB provides varied options for data retrieval and analysis tools that should facilitate the interpretation of complex hybridization results. A sampling of ArrayDB storage, retrieval and analysis capabilities is available (http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/), along with information on a set of approximately 15,000 genes used to fabricate several widely used microarrays. Information stored in ArrayDB is used to provide integrated gene expression reports by linking array target sequences with NCBI's Entrez retrieval system, UniGene and KEGG pathway views. The integration of external information resources is essential in interpreting intrinsic patterns and relationships in large-scale gene expression data.


Nucleic Acids Research | 2015

Gene: a gene-centered information resource at NCBI

Garth Brown; Vichet Hem; Kenneth S. Katz; Michael Ovetsky; Craig Wallin; Olga Ermolaeva; Igor Tolstoy; Tatiana Tatusova; Kim D. Pruitt; Donna Maglott; Terence Murphy

The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.
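Programmatic access through E-Utilities, mentioned in the abstract, can be illustrated with a small sketch. It assumes the public E-Utilities base URL `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/` and the ESearch endpoint (`esearch.fcgi`) with its `db`, `term`, `retmode` and `retmax` parameters. The helper names `gene_esearch_url` and `fetch_gene_ids` are hypothetical and are not part of any NCBI library. The IDs returned are the tracked integer Gene identifiers described above, delivered as strings in the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-Utilities base URL (assumed current; check NCBI's documentation).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def gene_esearch_url(term: str, retmax: int = 20) -> str:
    """Build a hypothetical helper URL for an ESearch query against the Gene database."""
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def fetch_gene_ids(term: str, retmax: int = 20) -> list:
    """Return Gene IDs (as strings) matching `term`; requires network access."""
    with urlopen(gene_esearch_url(term, retmax)) as resp:
        data = json.load(resp)
    # ESearch JSON responses list matching UIDs under esearchresult.idlist.
    return data["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Entrez field tags: [sym] = gene symbol, [orgn] = organism.
    print(gene_esearch_url("BRCA1[sym] AND human[orgn]"))
```

Only the URL builder runs offline. `fetch_gene_ids` issues a live request, and NCBI asks heavy users to identify themselves and respect its rate limits.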


Proceedings of the 1999 Advances in Fluorescence Sensing Technology | 1999

Clustering analysis for gene expression data

Yidong Chen; Olga Ermolaeva; Michael L. Bittner; Paul S. Meltzer; Jeffrey M. Trent; Edward R. Dougherty; Sinan Batman

The recent development of cDNA microarrays allows ready access to large amounts of gene expression data for many genetic materials. Gene expression in tissue samples can be quantitatively analyzed by hybridizing fluor-tagged mRNA to targets on a cDNA microarray. Ratios of average expression levels arising from co-hybridized normal and pathological samples are extracted via image segmentation, yielding the gene expression pattern. The gene expression in a given biological process may provide a fingerprint of the sample's development or of its response to a certain treatment. We propose a K-means-based algorithm that clusters together genes whose expression levels fluctuate in parallel. The resulting clusters suggest functional relationships between genes, and known genes belonging to a particular functional class may provide an indication of the function of unknown genes in the same cluster.
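The clustering idea in this abstract can be sketched with plain Lloyd's K-means over per-gene expression-ratio profiles. This is a generic K-means, not the authors' exact algorithm. Several choices are assumptions made only for illustration: log2-transforming the ratios, using Euclidean distance, and seeding with the first k profiles so the result is deterministic. The function names are hypothetical.

```python
import math


def to_log2(ratios):
    """Convert pathological/normal expression ratios to log2 scale (assumption)."""
    return [math.log2(r) for r in ratios]


def kmeans(profiles, k, max_iter=100):
    """Cluster per-gene log-ratio profiles with Lloyd's K-means.

    profiles: list of equal-length lists, one per gene.
    Seeds centroids with the first k profiles (a simplifying assumption).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical log2 ratio profiles for four genes across three samples.
    genes = [[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0], [1.1, 2.1, 2.9], [-0.9, -2.1, -3.0]]
    labels, _ = kmeans(genes, 2)
    print(labels)  # → [0, 1, 0, 1]
```

Euclidean distance groups genes whose profiles are close in magnitude. To match "fluctuate in parallel" more literally, the distance could instead be based on correlation, which groups genes that rise and fall together regardless of amplitude.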


pp. 949-955 | 2008

The genome of the model beetle and pest Tribolium castaneum

Stephen Richards; Richard A. Gibbs; George M. Weinstock; Susan J. Brown; Robin E. Denell; Richard W. Beeman; Richard A.L. Gibbs; Gregor Bucher; Markus Friedrich; Cornelis J. P. Grimmelikhuijzen; Martin Klingler; Marcé D. Lorenzen; Siegfried Roth; Reinhard Schröder; Diethard Tautz; Evgeny M. Zdobnov; Donna M. Muzny; Tony Attaway; Stephanie Bell; Christian Buhay; Mimi N. Chandrabose; Dean Chavez; Kp Clerk-Blankenburg; Andrew Cree; Marvin Diep Dao; Clay Davis; Joseph Chacko; Huyen Dinh; Shannon Dugan-Rocha; Gerald Fowler

Collaboration



Top Co-Authors

Donna Maglott
National Institutes of Health

Kim D. Pruitt
National Institutes of Health

Alexander Astashyn
National Institutes of Health

Bhanu Rajput
National Institutes of Health

Catherine M. Farrell
National Institutes of Health

Craig Wallin
National Institutes of Health

Garth Brown
National Institutes of Health

Jeffrey M. Trent
Translational Genomics Research Institute

Michael L. Bittner
Translational Genomics Research Institute

Paul S. Meltzer
National Institutes of Health