Publication


Featured research published by Igor Tolstoy.


Nucleic Acids Research | 2016

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary; Mathew W. Wright; J. Rodney Brister; Stacy Ciufo; Diana Haddad; Richard McVeigh; Bhanu Rajput; Barbara Robbertse; Brian Smith-White; Danso Ako-adjei; Alexander Astashyn; Azat Badretdin; Yiming Bao; Olga Blinkova; Vyacheslav Brover; Vyacheslav Chetvernin; Jinna Choi; Eric Cox; Olga Ermolaeva; Catherine M. Farrell; Tamara Goldfarb; Tripti Gupta; Daniel H. Haft; Eneida Hatcher; Wratko Hlavina; Vinita Joardar; Vamsi K. Kodali; Wenjun Li; Donna Maglott; Patrick Masterson; et al.

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) through a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.


Nucleic Acids Research | 2014

RefSeq microbial genomes database: new representation and annotation strategy

Tatiana Tatusova; Stacy Ciufo; Boris Fedorov; Kathleen O'Neill; Igor Tolstoy

The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the bioinformatics tools used for annotation, analysis and visualization. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.


Nucleic Acids Research | 2009

The National Center for Biotechnology Information's Protein Clusters Database

William Klimke; Richa Agarwala; Azat Badretdin; Slava Chetvernin; Stacy Ciufo; Boris Fedorov; Boris Kiryutin; Kathleen O’Neill; Wolfgang Resch; Sergei Resenchuk; Susan C. Schafer; Igor Tolstoy; Tatiana Tatusova

Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to maintain this deluge of data efficiently and keep it up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nucleotide sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.


Nucleic Acids Research | 2015

Gene: a gene-centered information resource at NCBI

Garth Brown; Vichet Hem; Kenneth S. Katz; Michael Ovetsky; Craig Wallin; Olga Ermolaeva; Igor Tolstoy; Tatiana Tatusova; Kim D. Pruitt; Donna Maglott; Terence Murphy

The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.


Nucleic Acids Research | 2015

Update on RefSeq microbial genomes resources

Tatiana Tatusova; Stacy Ciufo; Scott Federhen; Boris Fedorov; Richard McVeigh; Kathleen O'Neill; Igor Tolstoy; Leonid Zaslavsky

The NCBI RefSeq genome collection (http://www.ncbi.nlm.nih.gov/genome) represents all three major domains of life (Eukarya, Bacteria and Archaea) as well as viruses. Prokaryotic genome sequences are the most rapidly growing part of the collection. During 2014, more than 10 000 microbial genome assemblies were publicly released, bringing the total number of prokaryotic genomes close to 30 000. We continue to improve the quality and usability of the microbial genome resources by providing easy access to the data and to the results of pre-computed analyses, and by improving analysis and visualization tools. A number of improvements have been incorporated into the Prokaryotic Genome Annotation Pipeline. Several new features have been added to the RefSeq prokaryotic genome data processing pipeline, including the calculation of genome groups (clades) and the optimization of protein cluster generation using a pan-genome approach.


International Journal of Systematic and Evolutionary Microbiology | 2016

Phylogenomic analysis of the family Peptostreptococcaceae (Clostridium cluster XI) and proposal for reclassification of Clostridium litorale (Fendrich et al. 1991) and Eubacterium acidaminophilum (Zindel et al. 1989) as Peptoclostridium litorale gen. nov. comb. nov. and Peptoclostridium acidaminophilum comb. nov.

Michael Y. Galperin; Vyacheslav Brover; Igor Tolstoy; Natalya Yutin

In 1994, analyses of clostridial 16S rRNA gene sequences led to the assignment of 18 species to Clostridium cluster XI, separating them from Clostridium sensu stricto (Clostridium cluster I). Subsequently, most cluster XI species were assigned to the family Peptostreptococcaceae, with some species being reassigned to new genera. However, several misclassified Clostridium species remained, creating a taxonomic conundrum and confusion regarding their status. Here, we have re-examined the phylogeny of cluster XI species by comparing the 16S rRNA gene-based trees with protein- and genome-based trees, where available. The resulting phylogeny of the Peptostreptococcaceae was consistent with the recent proposals on creating seven new genera within this family. This analysis also revealed a tight clustering of Clostridium litorale and Eubacterium acidaminophilum. Based on these data, we propose reassigning these two organisms to the new genus Peptoclostridium as Peptoclostridium litorale gen. nov. comb. nov. (the type species of the genus) and Peptoclostridium acidaminophilum comb. nov., respectively. As correctly noted in the original publications, the genera Acetoanaerobium and Proteocatella also fall within cluster XI and can be assigned to the Peptostreptococcaceae. Clostridium sticklandii, which falls within the radiation of the genus Acetoanaerobium, is proposed to be reclassified as Acetoanaerobium sticklandii comb. nov. The remaining misnamed members of the Peptostreptococcaceae, [Clostridium] hiranonis, [Clostridium] paradoxum and [Clostridium] thermoalcaliphilum, remain to be properly classified.


Archive | 2018

Bacteriophage Taxonomy: An Evolving Discipline

Igor Tolstoy; Andrew M. Kropinski; J. Rodney Brister

While taxonomy is an often-unappreciated branch of science, it serves very important roles. Bacteriophage taxonomy has evolved from a mainly morphology-based discipline, characterized by the work of David Bradley and Hans-Wolfgang Ackermann, to the holistic approach that is taken today. The Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV) takes a comprehensive approach to classifying prokaryote viruses, measuring overall DNA and protein identity and phylogeny before making decisions about the taxonomic position of a new virus. The huge number of complete genomes being deposited with NCBI and other public databases has resulted in a reassessment of the taxonomy of many viruses, and the future will see the introduction of new viral families and higher orders.


bioRxiv | 2017

Genomic, proteomic, and phylogenetic analysis of spounaviruses indicates paraphyly of the order Caudovirales

Jakub Barylski; François Enault; Bas E. Dutilh; Margo Bp Schuller; Robert Edwards; Annika Gillis; Jochen Klumpp; Petar Knezevic; Mart Krupovic; Jens H. Kuhn; Rob Lavigne; Hanna M. Oksanen; Matthew B. Sullivan; Johannes Wittmann; Igor Tolstoy; J. Rodney Brister; Andrew M. Kropinski; Evelien M. Adriaenssens

It is almost a cliche that tailed bacteriophages of the order Caudovirales are the most abundant and diverse viruses in the world. Yet, their taxonomy still consists of a single order with just three families: Myoviridae, Siphoviridae, and Podoviridae. Thousands of newly discovered phage genomes have recently challenged this morphology-based classification, revealing that tailed bacteriophages are genomically even more diverse than once thought. Here, we evaluate a range of methods for bacteriophage taxonomy by using a particularly challenging group as an example, the Bacillus phage SPO1-related viruses of the myovirid subfamily Spounavirinae. Exhaustive phylogenetic and phylogenomic analyses indicate that the spounavirins are consistent with the taxonomic rank of family and should be divided into at least five subfamilies. This work is a case study for virus genomic taxonomy and the first step in an impending massive reorganization of the tailed bacteriophage taxonomy.

Since the mid-20th century, prokaryotic double-stranded DNA viruses producing tailed particles ("tailed phages") were grouped according to virion tail morphology. In the early 1980s, these viruses were classified into the families Myoviridae, Siphoviridae, and Podoviridae, later included in the order Caudovirales. However, recent massive sequencing of prokaryotic virus genomes revealed that caudovirads are extremely diverse. The official taxonomic framework does not adequately reflect caudovirad evolutionary relationships. Here, we reevaluate the classification of caudovirads using a particularly challenging group of viruses with large dsDNA genomes: SPO1-like viruses associated with the myovirid subfamily Spounavirinae. Our extensive genomic, proteomic, and phylogenetic analyses reveal that some of the currently established caudovirad taxa, especially at the family and subfamily rank, can no longer be supported. Spounavirins alone need to be elevated to family rank and divided into at least five major clades, a first step in an impending massive reorganization of caudovirad taxonomy.


Bioinformatics and Biomedicine | 2011

An approach to phylogenomic analysis of bacterial pathogens

Leonid Zaslavsky; Vyacheslav Chetvernin; Dmitry Dernovoy; Boris Fedorov; William Klimke; Alexandre Souvorov; Igor Tolstoy; Tatiana Tatusova; David J. Lipman

From the beginning of the microbial genome sequencing era, researchers have shown a commendable commitment to phylogenetic diversity. The completion of one genome from each prokaryotic division or phylum is still a frequently articulated community goal. However, largely because of the interest in human pathogens and advances in sequencing technologies, there are now also a number of very closely related genomes whose organization and gene content can be directly compared. Studying the genetic variability of pathogenic bacteria using whole-genome sequencing provides a way to understand the mechanisms of bacterial adaptation to rapid environmental changes and can be a source of useful information on virulence mechanisms. The bacterial genome datasets available in public archives represent a large collection of genomes at different levels of sequence quality and assembly. A fast and reliable method of phylogenetic classification based on genome sequences provides a necessary foundation for more detailed comparative analysis. NCBI has developed an approach to grouping bacterial organisms into phylogenetic clades using a genome dissimilarity measure based on the comparison of universally conserved markers. Special adjustments have been made to compensate for data inaccuracy and incompleteness. Tests performed on complete and draft genomes from the phylum Proteobacteria demonstrated that the proposed robust genomic distance allows stable and reliable species-level clustering and can be used for forming phylogenetic clades. Because the tradeoff for the increased robustness of the method is limited sensitivity at a very fine level, phylogenomic refinement can be performed within each constructed clade when fine-level phylogenetic resolution of close genomes is necessary.


Archive | 2014

About Prokaryotic Genome Processing and Tools

Tatiana Tatusova; Stacy Ciufo; Boris Fedorov; Kathleen O’Neill; Igor Tolstoy; Leonid Zaslavsky

Collaboration


Top Co-Authors

Tatiana Tatusova

National Institutes of Health

Boris Fedorov

National Institutes of Health

Stacy Ciufo

National Institutes of Health

Leonid Zaslavsky

National Institutes of Health

J. Rodney Brister

National Institutes of Health

Kathleen O’Neill

National Institutes of Health

Azat Badretdin

National Institutes of Health

Donna Maglott

National Institutes of Health

Olga Ermolaeva

National Institutes of Health
