Publication


Featured research published by Michael Y. Galperin.


Nucleic Acids Research | 2000

The COG database: a tool for genome-scale analysis of protein functions and evolution

Roman L. Tatusov; Michael Y. Galperin; Darren A. Natale; Eugene V. Koonin

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
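The best-hit consistency criterion mentioned in this abstract can be illustrated with a small sketch. It is a simplified toy, not the actual COG construction procedure: it assumes reciprocal best hits, proteins named as `genome:name`, and a hypothetical `best_hits` table standing in for the results of an all-against-all sequence comparison. Triangles of mutually consistent best hits from three different genomes are found and then merged when they share a side.

```python
from itertools import combinations

def genome_of(protein):
    # Assumed naming convention for this sketch: "genome:protein".
    return protein.split(":", 1)[0]

def cluster_by_best_hit_triangles(best_hits):
    """best_hits: protein -> {other genome -> best-hit protein in that genome}."""
    def reciprocal(a, b):
        return (best_hits.get(a, {}).get(genome_of(b)) == b
                and best_hits.get(b, {}).get(genome_of(a)) == a)

    proteins = sorted(best_hits)
    # A triangle is three proteins from three genomes, all pairwise reciprocal best hits.
    triangles = [
        frozenset(t) for t in combinations(proteins, 3)
        if len({genome_of(p) for p in t}) == 3
        and all(reciprocal(a, b) for a, b in combinations(t, 2))
    ]

    # Union-find over triangles: triangles sharing a side (two proteins) are merged.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}
    for idx, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            if edge in edge_owner:
                parent[find(idx)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = idx

    clusters = {}
    for idx, tri in enumerate(triangles):
        clusters.setdefault(find(idx), set()).update(tri)
    return sorted(sorted(c) for c in clusters.values())

# Toy input: protein "x" is conserved in four genomes; "y" only forms a pair.
toy_hits = {
    "A:x": {"B": "B:x", "C": "C:x", "D": "D:x"},
    "B:x": {"A": "A:x", "C": "C:x", "D": "D:x"},
    "C:x": {"A": "A:x", "B": "B:x", "D": "D:x"},
    "D:x": {"A": "A:x", "B": "B:x", "C": "C:x"},
    "A:y": {"B": "B:y"},
    "B:y": {"A": "A:y"},
}
toy_clusters = cluster_by_best_hit_triangles(toy_hits)
```

In this toy input the four `x` proteins end up in one cluster, while the `y` pair forms no triangle and is left unclustered. The brute-force triangle search is cubic in the number of proteins and is only meant to make the criterion concrete.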


Nucleic Acids Research | 2001

The COG database: new developments in phylogenetic classification of proteins from complete genomes

Roman L. Tatusov; Darren A. Natale; Igor Garkavtsev; Tatiana Tatusova; Uma Shankavaram; Bachoti S. Rao; Boris Kiryutin; Michael Y. Galperin; Natalie D. Fedorova; Eugene V. Koonin

The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.


Microbiology and Molecular Biology Reviews | 2013

Cyclic di-GMP: the First 25 Years of a Universal Bacterial Second Messenger

Ute Römling; Michael Y. Galperin; Mark Gomelsky

Twenty-five years have passed since the discovery of cyclic dimeric (3′→5′) GMP (cyclic di-GMP or c-di-GMP). From the relative obscurity of an allosteric activator of a bacterial cellulose synthase, c-di-GMP has emerged as one of the most common and important bacterial second messengers. Cyclic di-GMP has been shown to regulate biofilm formation, motility, virulence, the cell cycle, differentiation, and other processes. Most c-di-GMP-dependent signaling pathways control the ability of bacteria to interact with abiotic surfaces or with other bacterial and eukaryotic cells. Cyclic di-GMP plays key roles in lifestyle changes of many bacteria, including transition from the motile to the sessile state, which aids in the establishment of multicellular biofilm communities, and from the virulent state in acute infections to the less virulent but more resilient state characteristic of chronic infectious diseases. From a practical standpoint, modulating c-di-GMP signaling pathways in bacteria could represent a new way of controlling formation and dispersal of biofilms in medical and industrial settings. Cyclic di-GMP participates in interkingdom signaling. It is recognized by mammalian immune systems as a uniquely bacterial molecule and therefore is considered a promising vaccine adjuvant. The purpose of this review is not to overview the whole body of data in the burgeoning field of c-di-GMP-dependent signaling. Instead, we provide a historic perspective on the development of the field, emphasize common trends, and illustrate them with the best available examples. We also identify unresolved questions and highlight new directions in c-di-GMP research that will give us a deeper understanding of this truly universal bacterial second messenger.


Molecular Microbiology | 2005

C-di-GMP: the dawning of a novel bacterial signalling system

Ute Römling; Mark Gomelsky; Michael Y. Galperin

Bis‐(3′‐5′)‐cyclic dimeric guanosine monophosphate (c‐di‐GMP) has come to the limelight as a result of the recent advances in microbial genomics and increased interest in multicellular microbial behaviour. Known for more than 15 years as an activator of cellulose synthase in Gluconacetobacter xylinus, c‐di‐GMP is emerging as a novel global second messenger in bacteria. The GGDEF and EAL domain proteins involved in c‐di‐GMP synthesis and degradation, respectively, are (almost) ubiquitous in bacterial genomes. These proteins affect cell differentiation and multicellular behaviour as well as interactions between the microorganisms and their eukaryotic hosts and other phenotypes. While the role of GGDEF and EAL domain proteins in bacterial physiology and behaviour has gained appreciation, and significant progress has been achieved in understanding the enzymology of c‐di‐GMP turnover, many questions regarding c‐di‐GMP‐dependent signalling remain unanswered. Among these, the key questions are the identity of targets of c‐di‐GMP action and mechanisms of c‐di‐GMP‐dependent regulation. This review discusses phylogenetic distribution of the c‐di‐GMP signalling pathway in bacteria, recent developments in biochemical and structural characterization of proteins involved in its metabolism, and biological processes affected by c‐di‐GMP. The accumulated data clearly indicate that a novel ubiquitous signalling system in bacteria has been discovered.


Proceedings of the National Academy of Sciences of the United States of America | 2003

Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome

Alexis Dufresne; Marcel Salanoubat; Frédéric Partensky; François Artiguenave; Ilka M. Axmann; Valérie Barbe; Simone Duprat; Michael Y. Galperin; Eugene V. Koonin; Florence Le Gall; Kira S. Makarova; Martin Ostrowski; Sophie Oztas; Catherine Robert; Igor B. Rogozin; David J. Scanlan; Nicole Tandeau de Marsac; Jean Weissenbach; Patrick Wincker; Yuri I. Wolf; Wolfgang R. Hess

Prochlorococcus marinus, the dominant photosynthetic organism in the ocean, is found in two main ecological forms: high-light-adapted genotypes in the upper part of the water column and low-light-adapted genotypes at the bottom of the illuminated layer. P. marinus SS120, the complete genome sequence reported here, is an extremely low-light-adapted form. The genome of P. marinus SS120 is composed of a single circular chromosome of 1,751,080 bp with an average G+C content of 36.4%. It contains 1,884 predicted protein-coding genes with an average size of 825 bp, a single rRNA operon, and 40 tRNA genes. Together with the 1.66-Mbp genome of P. marinus MED4, the genome of P. marinus SS120 is one of the two smallest genomes of a photosynthetic organism known to date. It lacks many genes that are involved in photosynthesis, DNA repair, solute uptake, intermediary metabolism, motility, phototaxis, and other functions that are conserved among other cyanobacteria. Systems of signal transduction and environmental stress response show a particularly drastic reduction in the number of components, even taking into account the small size of the SS120 genome. In contrast, housekeeping genes, which encode enzymes of amino acid, nucleotide, cofactor, and cell wall biosynthesis, are all present. Because of its remarkable compactness, the genome of P. marinus SS120 might approximate the minimal gene complement of a photosynthetic organism.


Journal of Bacteriology | 2006

Structural Classification of Bacterial Response Regulators: Diversity of Output Domains and Domain Combinations

Michael Y. Galperin

CheY-like phosphoacceptor (or receiver [REC]) domain is a common module in a variety of response regulators of the bacterial signal transduction systems. In this work, 4,610 response regulators, encoded in complete genomes of 200 bacterial and archaeal species, were identified and classified by their domain architectures. Previously uncharacterized output domains were analyzed and, in some cases, assigned to known domain families. Transcriptional regulators of the OmpR, NarL, and NtrC families were found to comprise almost 60% of all response regulators; transcriptional regulators with other DNA-binding domains (LytTR, AraC, Spo0A, Fis, YcbB, RpoE, and MerR) account for an additional 6%. The remaining one-third is represented by the stand-alone REC domain (approximately 14%) and its combinations with a variety of enzymatic (GGDEF, EAL, HD-GYP, CheB, CheC, PP2C, and HisK), RNA-binding (ANTAR and CsrA), protein- or ligand-binding (PAS, GAF, TPR, CAP_ED, and HPt) domains, or newly described domains of unknown function. The diversity of domain architectures and the abundance of alternative domain combinations suggest that fusions between the REC domain and various output domains is a widespread evolutionary mechanism that allows bacterial cells to regulate transcription, enzyme activity, and/or protein-protein interactions in response to environmental challenges. The complete list of response regulators encoded in each of the 200 analyzed genomes is available online at http://www.ncbi.nlm.nih.gov/Complete_Genomes/RRcensus.html.


Nucleic Acids Research | 2015

Expanded microbial genome coverage and improved protein family annotation in the COG database

Michael Y. Galperin; Kira S. Makarova; Yuri I. Wolf; Eugene V. Koonin

Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics.


BMC Evolutionary Biology | 2003

Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes

Boris Mirkin; Trevor I. Fenner; Michael Y. Galperin; Eugene V. Koonin

Background: Comparative analysis of sequenced genomes reveals numerous instances of apparent horizontal gene transfer (HGT), at least in prokaryotes, and indicates that lineage-specific gene loss might have been even more common in evolution. This complicates the notion of a species tree, which needs to be re-interpreted as a prevailing evolutionary trend, rather than the full depiction of evolution, and makes reconstruction of ancestral genomes a non-trivial task.

Results: We addressed the problem of constructing parsimonious scenarios for individual sets of orthologous genes given a species tree. The orthologous sets were taken from the database of Clusters of Orthologous Groups of proteins (COGs). We show that the phyletic patterns (patterns of presence-absence in completely sequenced genomes) of almost 90% of the COGs are inconsistent with the hypothetical species tree. Algorithms were developed to reconcile the phyletic patterns with the species tree by postulating gene loss, COG emergence and HGT (the latter two classes of events were collectively treated as gene gains). We prove that each of these algorithms produces a parsimonious evolutionary scenario, which can be represented as mapping of loss and gain events on the species tree. The distribution of the evolutionary events among the tree nodes substantially depends on the underlying assumptions of the reconciliation algorithm, e.g. whether or not independent gene gains (gain after loss after gain) are permitted. Biological considerations suggest that, on average, gene loss might be a more likely event than gene gain. Therefore different gain penalties were used and the resulting series of reconstructed gene sets for the last universal common ancestor (LUCA) of the extant life forms were analysed. The number of genes in the reconstructed LUCA gene sets grows as the gain penalty increases. However, qualitative examination of the LUCA versions reconstructed with different gain penalties indicates that, even with a gain penalty of 1 (equal weights assigned to a gain and a loss), the set of 572 genes assigned to LUCA might be nearly sufficient to sustain a functioning organism. Under this gain penalty value, the numbers of horizontal gene transfer and gene loss events are nearly identical. This result holds true for two alternative topologies of the species tree and even under random shuffling of the tree. Therefore, the results seem to be compatible with approximately equal likelihoods of HGT and gene loss in the evolution of prokaryotes.

Conclusions: The notion that gene loss and HGT are major aspects of prokaryotic evolution was supported by quantitative analysis of the mapping of the phyletic patterns of COGs onto a hypothetical species tree. Algorithms were developed for constructing parsimonious evolutionary scenarios, which include gene loss and gain events, for orthologous gene sets, given a species tree. This analysis shows, contrary to expectations, that the number of predicted HGT events that occurred during the evolution of prokaryotes might be approximately the same as the number of gene losses. The approach to the reconstruction of evolutionary scenarios employed here is conservative with regard to the detection of HGT because only patterns of gene presence-absence in sequenced genomes are taken into account. In reality, horizontal transfer might have contributed to the evolution of many other genes also, which makes it a dominant force in prokaryotic evolution.
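The effect of the gain penalty described in this abstract can be made concrete with a small sketch. It uses a generic Sankoff-style weighted parsimony recursion on a presence-absence pattern, not the algorithms developed in the paper. The nested-tuple tree encoding, the function name `parsimony_cost`, and the choice to count the emergence of a gene at the root as one gain are assumptions made for illustration.

```python
def parsimony_cost(tree, present, gain_penalty=1.0, loss_penalty=1.0):
    """Return (minimal cost, gene assigned to root?) for one phyletic pattern.

    tree: nested tuples of species names, e.g. (("A", "B"), ("C", "D")).
    present: set of species in which the gene is present.
    """
    inf = float("inf")

    def costs(node):
        # Returns (cost if the gene is absent at node, cost if it is present).
        if isinstance(node, str):
            return (inf, 0.0) if node in present else (0.0, inf)
        absent_total = present_total = 0.0
        for child in node:
            child_absent, child_present = costs(child)
            # Parent lacks the gene: the child either lacks it or gains it.
            absent_total += min(child_absent, child_present + gain_penalty)
            # Parent has the gene: the child either keeps it or loses it.
            present_total += min(child_present, child_absent + loss_penalty)
        return absent_total, present_total

    root_absent, root_present = costs(tree)
    # Assumption of this sketch: presence at the root costs one gain (emergence).
    with_gene = root_present + gain_penalty
    if with_gene < root_absent:
        return with_gene, True
    return root_absent, False

toy_tree = (("A", "B"), ("C", "D"))
toy_pattern = {"A", "B", "C"}
low_penalty = parsimony_cost(toy_tree, toy_pattern, gain_penalty=1.0)
high_penalty = parsimony_cost(toy_tree, toy_pattern, gain_penalty=2.0)
```

For this toy pattern, a gain penalty of 1 leaves the gene out of the root (ties are resolved toward absence here), whereas a gain penalty of 2 makes a single early gain plus one loss cheaper than two independent gains, so the gene is assigned to the root. This mirrors, in miniature, the statement that the reconstructed ancestral gene set grows as the gain penalty increases.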


Molecular Microbiology | 1997

Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea.

Eugene V. Koonin; Arcady Mushegian; Michael Y. Galperin; D. Roland Walker

Protein sequences encoded in three complete bacterial genomes, those of Haemophilus influenzae, Mycoplasma genitalium and Synechocystis sp., and the first available archaeal genome sequence, that of Methanococcus jannaschii, were analysed using the blast2 algorithm and methods for amino acid motif detection. Between 75% and 90% of the predicted proteins encoded in each of the bacterial genomes and 73% of the M. jannaschii proteins showed significant sequence similarity to proteins from other species. The fraction of bacterial and archaeal proteins containing regions conserved over long phylogenetic distances is nearly the same and close to 70%. Functions of 70–85% of the bacterial proteins and about 70% of the archaeal proteins were predicted with varying precision. This contrasts with the previous report that more than half of the archaeal proteins have no homologues and shows that, with more sensitive methods and detailed analysis of conserved motifs, archaeal genomes become as amenable to meaningful interpretation by computer as bacterial genomes. The analysis of conserved motifs resulted in the prediction of a number of previously undetected functions of bacterial and archaeal proteins and in the identification of novel protein families. In spite of the generally high conservation of protein sequences, orthologues of 25% or less of the M. jannaschii genes were detected in each individual completely sequenced genome, supporting the uniqueness of archaea as a distinct domain of life. About 53% of the M. jannaschii proteins belong to families of paralogues, a fraction similar to that in bacteria with larger genomes, such as Synechocystis sp. and Escherichia coli, but higher than that in H. influenzae, which has approximately the same number of genes as M. jannaschii. Certain groups of proteins, e.g. molecular chaperones and DNA repair enzymes, thought to be ubiquitous and represented in the minimal gene set derived by bacterial genome comparison, are missing in M. jannaschii, indicating massive non‐orthologous displacement of genes responsible for essential functions. An unexpectedly large fraction of the M. jannaschii gene products, 44%, shows significantly higher similarity to bacterial than to eukaryotic proteins, compared with 13% that have eukaryotic proteins as their closest homologues (the rest of the proteins show approximately the same level of similarity to bacterial and eukaryotic homologues or have no homologues). Proteins involved in translation, transcription, replication and protein secretion are most closely related to eukaryotic proteins, whereas metabolic enzymes, metabolite uptake systems, enzymes for cell wall biosynthesis and many uncharacterized proteins appear to be ‘bacterial’. A similar prevalence of proteins of apparent bacterial origin was observed among the currently available sequences from the distantly related archaeal genus, Sulfolobus. It is likely that the evolution of archaea included at least one major merger between ancestral cells from the bacterial lineage and the lineage leading to the eukaryotic nucleocytoplasm.


Nature Biotechnology | 2000

Who's your neighbor? New computational approaches for functional genomics.

Michael Y. Galperin; Eugene V. Koonin

Several recently developed computational approaches in comparative genomics go beyond sequence comparison. By analyzing phylogenetic profiles of protein families, domain fusions, gene adjacency in genomes, and expression patterns, these methods predict many functional interactions between proteins and help deduce specific functions for numerous proteins. Although some of the resultant predictions may not be highly specific, these developments herald a new era in genomics in which the benefits of comparative analysis of the rapidly growing collection of complete genomes will become increasingly obvious.
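One of the approaches named in this abstract, comparison of phylogenetic profiles, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than any specific published method: the Jaccard index as the similarity measure, the 0.8 threshold, and the family and genome names are all hypothetical choices.

```python
from itertools import combinations

def jaccard(a, b):
    # Shared genomes divided by all genomes in which either family occurs.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def linked_pairs(profiles, threshold=0.8):
    """profiles: family name -> set of genomes encoding a member of the family.

    Returns family pairs whose presence-absence profiles are at least
    `threshold` similar, as candidates for a functional link.
    """
    pairs = []
    for first, second in combinations(sorted(profiles), 2):
        score = jaccard(profiles[first], profiles[second])
        if score >= threshold:
            pairs.append((first, second, score))
    return pairs

toy_profiles = {
    "famA": {"g1", "g2", "g3", "g4"},
    "famB": {"g1", "g2", "g3", "g4"},
    "famC": {"g1", "g5"},
}
toy_links = linked_pairs(toy_profiles)
```

Here `famA` and `famB` occur in exactly the same genomes and are reported as a candidate pair, while `famC` shares only one genome with them and is not. As the abstract cautions, predictions of this kind may not be highly specific, so the output is best read as a list of hypotheses.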

Collaboration


Top Co-Authors

Eugene V. Koonin

Uniformed Services University of the Health Sciences

Kira S. Makarova

National Institutes of Health

Yuri I. Wolf

National Institutes of Health

Mark J. Jedrzejas

Children's Hospital Oakland Research Institute

Roman L. Tatusov

National Institutes of Health

Darren A. Natale

Georgetown University Medical Center
