Publication


Featured research published by Roman L. Tatusov.


BMC Bioinformatics | 2003

The COG database: an updated version includes eukaryotes

Roman L. Tatusov; Natalie D. Fedorova; John D. Jackson; Aviva R. Jacobs; Boris Kiryutin; Eugene V. Koonin; Dmitri M. Krylov; Raja Mazumder; Sergei L. Mekhedov; Anastasia N. Nikolskaya; B Sridhar Rao; Sergei Smirnov; Alexander V. Sverdlov; Sona Vasudevan; Yuri I. Wolf; Jodie J. Yin; Darren A. Natale

Background: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. Results: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set.
This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. Conclusion: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
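The coverage figures quoted in this abstract can be rechecked from the counts it gives. The short sketch below is only an arithmetic check on those published numbers; the helper name `coverage` is introduced here for illustration and is not part of the COG tooling.

```python
# Arithmetic check of the coverage figures quoted in the abstract.
# `coverage` is a hypothetical helper name used only for this illustration.

def coverage(n_in_clusters: int, n_total: int) -> float:
    """Percentage of proteins that fall into orthologous clusters."""
    return 100.0 * n_in_clusters / n_total

# Counts taken directly from the abstract.
cog = coverage(138_458, 185_505)   # proteins in COGs / proteins in 66 genomes
kog = coverage(59_838, 110_655)    # proteins in KOGs / eukaryotic gene products

print(f"COG coverage: {cog:.1f}%")
print(f"KOG coverage: {kog:.1f}%")
print(f"Mean proteins per COG: {138_458 / 4873:.1f}")
print(f"Mean proteins per KOG: {59_838 / 4852:.1f}")
```

The computed coverages come out at about 74.6% and 54.1%, consistent with the rounded 75% and ~54% stated in the abstract.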


Nucleic Acids Research | 2000

The COG database: a tool for genome-scale analysis of protein functions and evolution

Roman L. Tatusov; Michael Y. Galperin; Darren A. Natale; Eugene V. Koonin

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt on a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
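The abstract states the construction criterion (consistency of genome-specific best hits) without giving the procedure. The sketch below is one simplified reading of that criterion, not the authors' implementation: it looks for triangles of proteins from three different genomes in which every protein's best hit in each of the other two genomes is the other triangle member, then merges triangles that share a side. The triangle-merging step, the function names, and the toy protein and genome labels are all assumptions made for illustration.

```python
from itertools import combinations

def find_triangles(best_hit, genome_of):
    """Find trios of proteins from three genomes with mutually consistent best hits.

    best_hit maps (protein, target_genome) -> best-hit protein in that genome.
    genome_of maps protein -> genome label.
    """
    triangles = set()
    for trio in combinations(sorted(genome_of), 3):
        if len({genome_of[p] for p in trio}) < 3:
            continue  # all three proteins must come from different genomes
        consistent = all(
            best_hit.get((x, genome_of[y])) == y
            for x in trio for y in trio if x != y
        )
        if consistent:
            triangles.add(frozenset(trio))
    return triangles

def merge_triangles(triangles):
    """Merge triangles that share a side (a pair of proteins) into clusters."""
    clusters = []  # each entry: (member set, set of sides)
    for tri in sorted(triangles, key=sorted):
        members = set(tri)
        sides = {frozenset(pair) for pair in combinations(sorted(tri), 2)}
        untouched = []
        for m, s in clusters:
            if s & sides:          # shares at least one side: absorb it
                members |= m
                sides |= s
            else:
                untouched.append((m, s))
        untouched.append((members, sides))
        clusters = untouched
    return sorted(sorted(m) for m, _ in clusters)

def toy_best_hits(groups, genome_of):
    """Build symmetric best hits inside each hypothetical ortholog group."""
    hits = {}
    for group in groups:
        for x in group:
            for y in group:
                if x != y:
                    hits[(x, genome_of[y])] = y
    return hits

# Hypothetical toy data: four genomes A-D, two ortholog families.
genome_of = {"a1": "A", "b1": "B", "c1": "C", "d1": "D",
             "a2": "A", "b2": "B", "c2": "C"}
best_hit = toy_best_hits([["a1", "b1", "c1", "d1"], ["a2", "b2", "c2"]], genome_of)
clusters = merge_triangles(find_triangles(best_hit, genome_of))
print(clusters)
```

The exhaustive loop over all trios is only practical for toy inputs; a real pipeline would start from precomputed best-hit tables and would also need to handle paralogs and multidomain proteins, which this sketch ignores.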


Nucleic Acids Research | 2001

The COG database: new developments in phylogenetic classification of proteins from complete genomes

Roman L. Tatusov; Darren A. Natale; Igor Garkavtsev; Tatiana Tatusova; Uma Shankavaram; Bachoti S. Rao; Boris Kiryutin; Michael Y. Galperin; Natalie D. Fedorova; Eugene V. Koonin

The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt on a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.


Journal of Bacteriology | 2001

Genome Sequence and Comparative Analysis of the Solvent-Producing Bacterium Clostridium acetobutylicum

Jörk Nölling; Gary L. Breton; Marina V. Omelchenko; Kira S. Makarova; Qiandong Zeng; Rene Gibson; Hong Mei Lee; JoAnn Dubois; Dayong Qiu; Joseph Hitti; GTC Sequencing Center Production, Finishing, and Bioinformatics Teams; Yuri I. Wolf; Roman L. Tatusov; Fabrice Sabathé; Lynn Doucette-Stamm; Philippe Soucaille; Michael J. Daly; George N. Bennett; Eugene V. Koonin; Douglas R. Smith

The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria. However, the C. acetobutylicum genome also contains a significant number of predicted operons that are shared with distantly related bacteria and archaea but not with B. subtilis. Phylogenetic analysis is compatible with the dissemination of such operons by horizontal transfer. The enzymes of the solventogenesis pathway and of the cellulosome of C. acetobutylicum comprise a new set of metabolic capacities not previously represented in the collection of complete genomes. These enzymes show a complex pattern of evolutionary affinities, emphasizing the role of lateral gene exchange in the evolution of the unique metabolic profile of the bacterium. Many of the sporulation genes identified in B. subtilis are missing in C. acetobutylicum, which suggests major differences in the sporulation process. Thus, comparative analysis reveals both significant conservation of the genome organization and pronounced differences in many systems that reflect unique adaptive strategies of the two gram-positive bacteria.


Microbiology and Molecular Biology Reviews | 2001

Genome of the Extremely Radiation-Resistant Bacterium Deinococcus radiodurans Viewed from the Perspective of Comparative Genomics

Kira S. Makarova; L. Aravind; Yuri I. Wolf; Roman L. Tatusov; Kenneth W. Minton; Eugene V. Koonin; Michael J. Daly

SUMMARY The bacterium Deinococcus radiodurans shows remarkable resistance to a range of damage caused by ionizing radiation, desiccation, UV radiation, oxidizing agents, and electrophilic mutagens. D. radiodurans is best known for its extreme resistance to ionizing radiation; not only can it grow continuously in the presence of chronic radiation (6 kilorads/h), but also it can survive acute exposures to gamma radiation exceeding 1,500 kilorads without dying or undergoing induced mutation. These characteristics were the impetus for sequencing the genome of D. radiodurans and the ongoing development of its use for bioremediation of radioactive wastes. Although it is known that these multiple resistance phenotypes stem from efficient DNA repair processes, the mechanisms underlying these extraordinary repair capabilities remain poorly understood. In this work we present an extensive comparative sequence analysis of the Deinococcus genome. Deinococcus is the first representative with a completely sequenced genome from a distinct bacterial lineage of extremophiles, the Thermus-Deinococcus group. Phylogenetic tree analysis, combined with the identification of several synapomorphies between Thermus and Deinococcus, supports the hypothesis that it is an ancient group with no clear affinities to any of the other known bacterial lineages. Distinctive features of the Deinococcus genome as well as features shared with other free-living bacteria were revealed by comparison of its proteome to the collection of clusters of orthologous groups of proteins. Analysis of paralogs in Deinococcus has revealed several unique protein families. In addition, specific expansions of several other families including phosphatases, proteases, acyltransferases, and Nudix family pyrophosphohydrolases were detected. Genes that potentially affect DNA repair and recombination and stress responses were investigated in detail. 
Some proteins appear to have been horizontally transferred from eukaryotes and are not present in other bacteria. For example, three proteins homologous to plant desiccation resistance proteins were identified, and these are particularly interesting because of the correlation between desiccation and radiation resistance. Compared to other bacteria, the D. radiodurans genome is enriched in repetitive sequences, namely, IS-like transposons and small intergenic repeats. In combination, these observations suggest that several different biological mechanisms contribute to the multiple DNA repair-dependent phenotypes of this organism.


Methods in Enzymology | 1996

Applications of network BLAST server.

Thomas L. Madden; Roman L. Tatusov; Jinghui Zhang

The sequence databases continue to grow at an extraordinary rate. Contributions come from both small laboratories and large-scale projects, such as the Merck EST project. This growth has placed new demands on computational sequence comparison tools such as BLAST. Even now it is no longer practical to evaluate some BLAST reports manually; it is necessary to filter the output by, for example, organism, source, or degree of annotation. The new network BLAST service makes such tools possible. It is also possible to present BLAST output in different formats, such as BLANCE. Perhaps most important of all, it becomes simple to call BLAST from another application, making it one step within an integrated system. This makes the automated preparation of sequence evaluations that include BLAST runs possible. In the near future we expect to see a number of applications that use the network BLAST interface to help molecular biologists search against a database that is growing not only in size but in biological richness.


BMC Evolutionary Biology | 2001

Genome trees constructed using five different approaches suggest new major bacterial clades

Yuri I. Wolf; Igor B. Rogozin; Nick V. Grishin; Roman L. Tatusov; Eugene V. Koonin

Background: The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes. Results: Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and iii) Actinomycetes-Deinococcales-Cyanobacteria.
The latter group also appeared to join the low-GC Gram-positive bacteria at a deeper tree node. These new groupings of bacteria were supported by the analysis of alternative topologies in the concatenated ribosomal protein tree using the Kishino-Hasegawa test and by a census of the topologies of 132 individual groups of orthologous proteins. Additionally, the results of this analysis put into question the sister-group relationship between the two major archaeal groups, Euryarchaeota and Crenarchaeota, and suggest instead that Euryarchaeota might be a paraphyletic group with respect to Crenarchaeota. Conclusions: We conclude that, the extensive horizontal gene flow and lineage-specific gene loss notwithstanding, extension of phylogenetic analysis to the genome scale has the potential of uncovering deep evolutionary relationships between prokaryotic lineages.
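The first of the five approaches in this abstract builds trees from the presence or absence of genomes in orthologous clusters. The abstract does not say which distance measure was used, so the sketch below assumes a Jaccard distance purely for illustration; the genome names and cluster identifiers are hypothetical toy data, not values from the paper.

```python
# Gene-content distance from presence-absence data.
# Assumption: Jaccard distance is used here for illustration only; the
# abstract does not specify the measure the authors applied.

def gene_content_distance(clusters_a, clusters_b):
    """Jaccard distance between the sets of clusters present in two genomes."""
    a, b = set(clusters_a), set(clusters_b)
    union = a | b
    if not union:
        return 0.0  # two empty genomes are treated as identical
    return 1.0 - len(a & b) / len(union)

def distance_matrix(presence):
    """All pairwise distances, keyed by (genome, genome)."""
    names = sorted(presence)
    return {
        (x, y): gene_content_distance(presence[x], presence[y])
        for x in names for y in names
    }

# Hypothetical presence lists: genome -> orthologous clusters it is found in.
presence = {
    "genome_1": {"C1", "C2", "C3"},
    "genome_2": {"C2", "C3", "C4"},
    "genome_3": {"C5"},
}
matrix = distance_matrix(presence)
for (x, y), d in sorted(matrix.items()):
    if x < y:
        print(f"{x} vs {y}: {d:.2f}")
```

A matrix like this could be passed to any distance-based tree-building method. Because shared gene content also reflects gene loss and horizontal transfer, as the abstract notes, such trees are not purely phylogenetic.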


Trends in Genetics | 1998

Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles

L. Aravind; Roman L. Tatusov; Yuri I. Wolf; D. Roland Walker; Eugene V. Koonin

Sequencing of multiple complete genomes of bacteria and archaea makes it possible to perform systematic, genome-scale comparisons that aim to delineate the genomic complement of a particular phenotype. Recently, the first genome of a hyperthermophilic bacterium, Aquifex aeolicus, has been sequenced [1]. Previous studies based on rRNA and aminoacyl-tRNA analysis had suggested a very early divergence of Aquifex from the rest of the bacteria [2,3]. Aquifex is exceptional among bacteria in that it occupies the hyperthermophilic niche otherwise dominated by archaea [2]. In the published analysis of the Aquifex genome, it has been concluded that the genome sequence yielded ‘only a few specific indications of thermophily’ [1]. With three genomes of extreme thermophilic archaea (Methanococcus jannaschii, Methanobacterium thermoautotrophicum and Archaeoglobus fulgidus) currently available [4–6], we reasoned that a detailed comparison of the Aquifex and archaeal genomes could reveal genome-scale adaptations for thermophily. The protein sequences encoded in all complete bacterial genomes were compared with the nonredundant protein sequence database using the gapped BLAST program [7], and a phylogenetic breakdown was automatically produced using the TAX_COLLECTOR program (Ref. 8, and D.R. Walker, unpublished). The results show that the fraction of Aquifex gene products that have archaeal proteins as clear best hits is by far greater than for each of the other bacteria (Table 1). Taking the fraction of ‘archaeal’ genes in Bacillus subtilis (Table 1) as a conservative estimate for the random expectation in a bacterial genome and using the normal approximation of the binomial distribution, it could be estimated that the excess of ‘archaeal’ genes in Aquifex could not be explained by a random fluctuation, with p << 10^-10.
A reciprocal comparison showed that, for proteins encoded in each of the three archaeal genomes, Aquifex proteins are the best hits significantly more frequently than proteins from other bacteria, even those with genomes 2–3 times larger than the Aquifex genome, such as Synechocystis sp. or B. subtilis (Table 2). In a complementary analysis, bacterial proteins were compared with ...
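The significance estimate mentioned in this excerpt relies on the normal approximation of the binomial distribution. The sketch below shows the general form of such a test. The counts in the usage lines are hypothetical placeholders, since Table 1 is not reproduced on this page, and the function name `excess_z_test` is introduced only for this illustration.

```python
import math

def excess_z_test(k, n, p0):
    """One-sided test for an excess of successes, normal approximation.

    k  : observed count (e.g. genes whose best hit is archaeal)
    n  : total number of genes examined
    p0 : background fraction expected by chance (e.g. from a reference genome)
    Returns (z, p_value), where p_value is the upper-tail probability.
    """
    mean = n * p0
    sd = math.sqrt(n * p0 * (1.0 - p0))
    z = (k - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) for standard normal
    return z, p_value

# Hypothetical numbers, NOT taken from the paper's Table 1.
z, p = excess_z_test(k=60, n=100, p0=0.5)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

With a large excess over the background fraction, z grows quickly and the tail probability becomes extremely small, which is the kind of result the excerpt reports.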


Current Biology | 1996

Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli

Roman L. Tatusov; Arcady Mushegian; Peer Bork; Nigel P. Brown; William S. Hayes; Mark Borodovsky; Kenneth E. Rudd; Eugene V. Koonin

BACKGROUND The 1.83 Megabase (Mb) sequence of the Haemophilus influenzae chromosome, the first completed genome sequence of a cellular life form, has been recently reported. Approximately 75 % of the 4.7 Mb genome sequence of Escherichia coli is also available. The life styles of the two bacteria are very different - H. influenzae is an obligate parasite that lives in human upper respiratory mucosa and can be cultivated only on rich media, whereas E. coli is a saprophyte that can grow on minimal media. A detailed comparison of the protein products encoded by these two genomes is expected to provide valuable insights into bacterial cell physiology and genome evolution. RESULTS We describe the results of computer analysis of the amino-acid sequences of 1703 putative proteins encoded by the complete genome of H. influenzae. We detected sequence similarity to proteins in current databases for 92 % of the H. influenzae protein sequences, and at least a general functional prediction was possible for 83 %. A comparison of the H. influenzae protein sequences with those of 3010 proteins encoded by the sequenced 75 % of the E. coli genome revealed 1128 pairs of apparent orthologs, with an average of 59 % identity. In contrast to the high similarity between orthologs, the genome organization and the functional repertoire of genes in the two bacteria were remarkably different. The smaller genome size of H. influenzae is explained, to a large extent, by a reduction in the number of paralogous genes. There was no long range colinearity between the E. coli and H. influenzae gene orders, but over 70 % of the orthologous genes were found in short conserved strings, only about half of which were operons in E. coli. Superposition of the H. influenzae enzyme repertoire upon the known E. coli metabolic pathways allowed us to reconstruct similar and alternative pathways in H. influenzae and provides an explanation for the known nutritional requirements. 
CONCLUSIONS By comparing proteins encoded by the two bacterial genomes, we have shown that extensive gene shuffling and variation in the extent of gene paralogy are major trends in bacterial evolution; this comparison has also allowed us to deduce crucial aspects of the largely uncharacterized metabolism of H. influenzae.


Proceedings of the National Academy of Sciences of the United States of America | 2002

The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens

Alexei I. Slesarev; Katja V. Mezhevaya; Kira S. Makarova; Nikolai Polushin; Ov Shcherbinina; Vera V. Shakhova; Galina I. Belova; L. Aravind; Darren A. Natale; Igor B. Rogozin; Roman L. Tatusov; Yuri I. Wolf; Karl O. Stetter; Andrei Malykh; Eugene V. Koonin; Sergei A. Kozyavkin

We have determined the complete 1,694,969-nt sequence of the GC-rich genome of Methanopyrus kandleri by using a whole direct genome sequencing approach. This approach is based on unlinking of genomic DNA with the ThermoFidelase version of M. kandleri topoisomerase V and cycle sequencing directed by 2′-modified oligonucleotides (Fimers). Sequencing redundancy (3.3×) was sufficient to assemble the genome with less than one error per 40 kb. Using a combination of sequence database searches and coding potential prediction, 1,692 protein-coding genes and 39 genes for structural RNAs were identified. M. kandleri proteins show an unusually high content of negatively charged amino acids, which might be an adaptation to the high intracellular salinity. Previous phylogenetic analysis of 16S RNA suggested that M. kandleri belonged to a very deep branch, close to the root of the archaeal tree. However, genome comparisons indicate that, in both trees constructed using concatenated alignments of ribosomal proteins and trees based on gene content, M. kandleri consistently groups with other archaeal methanogens. M. kandleri shares the set of genes implicated in methanogenesis and, in part, its operon organization with Methanococcus jannaschii and Methanothermobacter thermoautotrophicum. These findings indicate that archaeal methanogens are monophyletic. A distinctive feature of M. kandleri is the paucity of proteins involved in signaling and regulation of gene expression. Also, M. kandleri appears to have fewer genes acquired via lateral transfer than other archaea. These features might reflect the extreme habitat of this organism.

Collaboration


Dive into Roman L. Tatusov's collaborations.

Top Co-Authors

Eugene V. Koonin
National Institutes of Health

Yuri I. Wolf
National Institutes of Health

Michael Y. Galperin
National Institutes of Health

Kira S. Makarova
National Institutes of Health

Darren A. Natale
Georgetown University Medical Center

Igor B. Rogozin
National Institutes of Health

L. Aravind
National Institutes of Health

Tatiana Tatusova
National Institutes of Health

Kenneth E. Rudd
National Institutes of Health

Nick V. Grishin
University of Texas Southwestern Medical Center