Publication


Featured research published by Tatiana Tatusova.


Nucleic Acids Research | 2004

Database resources of the National Center for Biotechnology Information.

David Wheeler; Deanna M. Church; Ron Edgar; Scott Federhen; Wolfgang Helmberg; Thomas L. Madden; Joan Pontius; Gregory D. Schuler; Lynn M. Schriml; Edwin Sequeira; Tugba O. Suzek; Tatiana Tatusova; Lukas Wagner

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.


Nucleic Acids Research | 2004

Entrez Gene: gene-centered information at NCBI

Donna Maglott; James Ostell; Kim D. Pruitt; Tatiana Tatusova

Entrez Gene is NCBI's database for gene-specific information. Entrez Gene includes records from genomes that have been completely sequenced, that have an active research community to contribute gene-specific information or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of both curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases and from other databases within NCBI. Records in Entrez Gene are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is provided via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities), and for bulk transfer by FTP.
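As a hedged illustration of the programmatic access route mentioned in this abstract, the sketch below builds an E-Utilities ESummary request URL for records keyed by integer Gene IDs. The helper name gene_summary_url and the example IDs are assumptions made for illustration and do not come from the paper; the code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-Utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_summary_url(gene_ids, retmode="json"):
    """Build an ESummary URL for one or more Entrez Gene integer IDs.

    Hypothetical helper for illustration only; it does not fetch anything.
    """
    ids = [int(g) for g in gene_ids]
    if not ids:
        raise ValueError("at least one Gene ID is required")
    if any(g <= 0 for g in ids):
        raise ValueError("Gene IDs must be positive integers")
    params = {
        "db": "gene",                          # Entrez database name
        "id": ",".join(str(g) for g in ids),   # comma-separated ID list
        "retmode": retmode,                    # response format
    }
    # safe="," keeps the commas in the ID list unescaped.
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    # Illustrative Gene IDs; any valid integer identifiers would do.
    print(gene_summary_url([7157, 672]))
```

Because identifiers are stable integers, validating them before building the query is cheap; a real client would also need to respect NCBI's usage guidelines for request rates.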


Nucleic Acids Research | 2001

The COG database: new developments in phylogenetic classification of proteins from complete genomes

Roman L. Tatusov; Darren A. Natale; Igor Garkavtsev; Tatiana Tatusova; Uma Shankavaram; Bachoti S. Rao; Boris Kiryutin; Michael Y. Galperin; Natalie D. Fedorova; Eugene V. Koonin

The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, which includes proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, that are shared with bacteria and/or archaea. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.


Nucleic Acids Research | 2003

Database resources of the National Center for Biotechnology

David Wheeler; Deanna M. Church; Scott Federhen; Alex E. Lash; Thomas L. Madden; Joan Pontius; Gregory D. Schuler; Lynn M. Schriml; Edwin Sequeira; Tatiana Tatusova; Lukas Wagner

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR (e-PCR), Open Reading Frame (ORF) Finder, Reference Sequence (RefSeq), UniGene, HomoloGene, ProtEST, Database of Single Nucleotide Polymorphisms (dbSNP), Human/Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker (MM), Evidence Viewer (EV), Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.


Nucleic Acids Research | 2012

NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

Kim D. Pruitt; Tatiana Tatusova; Garth Brown; Donna Maglott

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16 000 organisms, 2.4 × 10^6 genomic records, 13 × 10^6 proteins and 2 × 10^6 RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).


Journal of Virology | 2008

The Influenza Virus Resource at the National Center for Biotechnology Information

Yiming Bao; Pavel Bolotov; Dmitry Dernovoy; Boris Kiryutin; Leonid Zaslavsky; Tatiana Tatusova; Jim Ostell; David J. Lipman

Influenza epidemics cause morbidity and mortality worldwide (4). Each year in the United States, more than 200,000 patients are admitted to hospitals because of influenza and there are approximately 36,000 influenza-related deaths (14). In recent years, several subtypes of avian influenza viruses have jumped host species to infect humans. The H5N1 subtype, in particular, has been reported in 328 human cases and has caused 200 human deaths in 12 countries (World Health Organization, http://www.who.int/csr/disease/avian_influenza/country/cases_table_2007_09_10/en/index.html). These viruses have the potential to cause a pandemic in humans. Antiviral drugs and vaccines must be developed to minimize the damage that such a pandemic would bring. To achieve this, it is vital that researchers have free access to viral sequences in a timely fashion, and sequence analysis tools need to be readily available. Historically, the number of influenza virus sequences in public databases has been far less than those of some well-studied viruses, such as human immunodeficiency virus. The number of complete influenza virus genomes has been even smaller. In addition, many of the sequences were collected in the course of influenza surveillance programs that prioritized antigenically novel isolates. Although collecting antigenically novel isolates is appropriate for surveillance, it results in biased samples of sequenced isolates that are not representative of community cases of influenza (2, 13). Therefore, in 2004, the National Institute of Allergy and Infectious Diseases (NIAID) launched the Influenza Genome Sequencing Project (7), which aims to rapidly sequence influenza viruses from samples collected all over the world. Viral sequences were generated at the J. Craig Venter Institute, annotated at the National Center for Biotechnology Information (NCBI), and deposited in GenBank. 
In just over 2 years after the initiation of the project, more than 2,000 complete genomes of influenza viruses A and B had been deposited in GenBank. To help the research community to make full use of the wealth of information from such a large amount of data, which will be increasing continuously, the Influenza Virus Resource was created at NCBI in 2004.


Nucleic Acids Research | 2009

NCBI Reference Sequences: current status, policy and new initiatives.

Kim D. Pruitt; Tatiana Tatusova; William Klimke; Donna Maglott

NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10^6 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.


Nucleic Acids Research | 2016

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary; Mathew W. Wright; J. Rodney Brister; Stacy Ciufo; Diana Haddad; Richard McVeigh; Bhanu Rajput; Barbara Robbertse; Brian Smith-White; Danso Ako-adjei; Alexander Astashyn; Azat Badretdin; Yiming Bao; Olga Blinkova; Vyacheslav Brover; Vyacheslav Chetvernin; Jinna Choi; Eric Cox; Olga Ermolaeva; Catherine M. Farrell; Tamara Goldfarb; Tripti Gupta; Daniel H. Haft; Eneida Hatcher; Wratko Hlavina; Vinita Joardar; Vamsi K. Kodali; Wenjun Li; Donna Maglott; Patrick Masterson

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.


Omics A Journal of Integrative Biology | 2008

Toward an Online Repository of Standard Operating Procedures (SOPs) for (Meta)genomic Annotation

Samuel V. Angiuoli; Aaron Gussman; William Klimke; Guy Cochrane; Dawn Field; George M Garrity; Chinnappa D. Kodira; Nikos C. Kyrpides; Ramana Madupu; Victor Markowitz; Tatiana Tatusova; Nicholas R. Thomson; Owen White

The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.


Nucleic Acids Research | 2016

NCBI prokaryotic genome annotation pipeline.

Tatiana Tatusova; Michael DiCuccio; Azat Badretdin; Vyacheslav Chetvernin; Eric P. Nawrocki; Leonid Zaslavsky; Alexandre Lomsadze; Kim D. Pruitt; Mark Borodovsky; James Ostell

Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment-based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, NCBI's new Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.

Collaboration


Dive into Tatiana Tatusova's collaborations.

Top Co-Authors

Kim D. Pruitt
National Institutes of Health

Donna Maglott
National Institutes of Health

Terence Murphy
National Institutes of Health

Garth Brown
National Institutes of Health

Craig Wallin
University of California

Mike Murphy
National Institutes of Health

Leonid Zaslavsky
National Institutes of Health

William Klimke
National Institutes of Health

Ilene Mizrachi
National Institutes of Health

Yiming Bao
National Institutes of Health