Alexandre Lomsadze | Researchain

Publication

Featured research published by Alexandre Lomsadze.

Nucleic Acids Research | 2016

NCBI prokaryotic genome annotation pipeline.

Tatiana Tatusova; Michael DiCuccio; Azat Badretdin; Vyacheslav Chetvernin; Eric P. Nawrocki; Leonid Zaslavsky; Alexandre Lomsadze; Kim D. Pruitt; Mark Borodovsky; James Ostell

Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment-based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.

Nucleic Acids Research | 2010

Ab initio gene identification in metagenomic sequences

Wenhan Zhu; Alexandre Lomsadze; Mark Borodovsky

We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that as a result of application of the new method, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes.
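As a rough illustration of two ideas mentioned in this abstract (splitting a genome into short sequences for evaluation, and approximating an oligonucleotide frequency as a logistic function of nucleotide composition), here is a minimal Python sketch. The function names and the `slope` and `intercept` parameters are hypothetical and chosen for illustration only; they are not the fitted values or the code of the published method.

```python
import math


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def split_into_fragments(seq, length):
    """Cut seq into consecutive, non-overlapping fragments of a fixed length.

    A trailing piece shorter than `length` is dropped.
    """
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]


def logistic_frequency(gc, slope, intercept):
    """Hypothetical logistic approximation of a frequency as a function of GC.

    slope and intercept are illustrative placeholders, not published values.
    """
    return 1.0 / (1.0 + math.exp(-(slope * gc + intercept)))
```

In such a setup, each short fragment would get its own GC estimate, and that estimate would select the model parameters used to score the fragment.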

Genome Research | 2008

Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training

Vardges Ter-Hovhannisyan; Alexandre Lomsadze; Yury O. Chernoff; Mark Borodovsky

We describe a new ab initio algorithm, GeneMark-ES version 2, that identifies protein-coding genes in fungal genomes. The algorithm does not require a predetermined training set to estimate parameters of the underlying hidden Markov model (HMM). Instead, the anonymous genomic sequence in question is used as an input for iterative unsupervised training. The algorithm extends our previously developed method tested on genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. To better reflect features of fungal gene organization, we enhanced the intron submodel to accommodate sequences with and without branch point sites. This design enables the algorithm to work equally well for species with the kinds of variations in splicing mechanisms seen in the fungal phyla Ascomycota, Basidiomycota, and Zygomycota. Upon self-training, the intron submodel switches on in several steps to reach its full complexity. We demonstrate that the algorithm's accuracy, both at the exon and the whole gene level, compares favorably with the accuracy of gene finders that employ supervised training. Application of the new method to known fungal genomes indicates substantial improvement over existing annotations. By eliminating the effort necessary to build comprehensive training sets, the new algorithm can streamline and accelerate the process of annotation in a large number of fungal genome sequencing projects.

Proceedings of the National Academy of Sciences of the United States of America | 2010

Insights into evolution of multicellular fungi from the assembled chromosomes of the mushroom Coprinopsis cinerea (Coprinus cinereus)

Jason E. Stajich; Sarah K. Wilke; Dag Ahrén; Chun Hang Au; Bruce W. Birren; Mark Borodovsky; Claire Burns; Björn Canbäck; Lorna A. Casselton; Chi Keung Cheng; Jixin Deng; Fred S. Dietrich; David C. Fargo; Mark L. Farman; Allen C. Gathman; Jonathan M. Goldberg; Roderic Guigó; Patrick J. Hoegger; James Hooker; Ashleigh Huggins; Timothy Y. James; Takashi Kamada; Sreedhar Kilaru; Chinnapa Kodira; Ursula Kües; Doris M. Kupfer; Hoi Shan Kwan; Alexandre Lomsadze; Weixi Li; Walt W. Lilly

The mushroom Coprinopsis cinerea is a classic experimental model for multicellular development in fungi because it grows on defined media, completes its life cycle in 2 weeks, produces some 10^8 synchronized meiocytes, and can be manipulated at all stages in development by mutation and transformation. The 37-megabase genome of C. cinerea was sequenced and assembled into 13 chromosomes. Meiotic recombination rates vary greatly along the chromosomes, and retrotransposons are absent in large regions of the genome with low levels of meiotic recombination. Single-copy genes with identifiable orthologs in other basidiomycetes are predominant in low-recombination regions of the chromosome. In contrast, paralogous multicopy genes are found in the highly recombining regions, including a large family of protein kinases (FunK1) unique to multicellular fungi. Analyses of P450 and hydrophobin gene families confirmed that local gene duplications drive the expansions of paralogous copies and the expansions occur in independent lineages of Agaricomycotina fungi. Gene-expression patterns from microarrays were used to dissect the transcriptional program of dikaryon formation (mating). Several members of the FunK1 kinase family are differentially regulated during sexual morphogenesis, and coordinate regulation of adjacent duplications is rare. The genomes of C. cinerea and Laccaria bicolor, a symbiotic basidiomycete, share extensive regions of synteny. The largest syntenic blocks occur in regions with low meiotic recombination rates, no transposable elements, and tight gene spacing, where orthologous single-copy genes are overrepresented. The chromosome assembly of C. cinerea is an essential resource in understanding the evolution of multicellularity in the fungi.

Genome Biology | 2012

The genome of the polar eukaryotic microalga Coccomyxa subellipsoidea reveals traits of cold adaptation

Guillaume Blanc; Irina V. Agarkova; Jane Grimwood; Alan Kuo; Andrew J. Brueggeman; David D. Dunigan; James R. Gurnon; Istvan Ladunga; Erika Lindquist; Susan Lucas; Jasmyn Pangilinan; Thomas Pröschold; Asaf Salamov; Jeremy Schmutz; Donald P. Weeks; Takashi Yamada; Alexandre Lomsadze; Mark Borodovsky; Jean-Michel Claverie; Igor V. Grigoriev; James L. Van Etten

Background: Little is known about the mechanisms of adaptation of life to the extreme environmental conditions encountered in polar regions. Here we present the genome sequence of a unicellular green alga from the division Chlorophyta, Coccomyxa subellipsoidea C-169, which we will hereafter refer to as C-169. This is the first eukaryotic microorganism from a polar environment to have its genome sequenced. Results: The 48.8 Mb genome contained in 20 chromosomes exhibits significant synteny conservation with the chromosomes of its relatives Chlorella variabilis and Chlamydomonas reinhardtii. The order of the genes is highly reshuffled within synteny blocks, suggesting that intra-chromosomal rearrangements were more prevalent than inter-chromosomal rearrangements. Remarkably, Zepp retrotransposons occur in clusters of nested elements with strictly one cluster per chromosome probably residing at the centromere. Several protein families overrepresented in C. subellipsoidea include proteins involved in lipid metabolism, transporters, cellulose synthases and short alcohol dehydrogenases. Conversely, C-169 lacks proteins that exist in all other sequenced chlorophytes, including components of the glycosyl phosphatidyl inositol anchoring system, pyruvate phosphate dikinase and the photosystem 1 reaction center subunit N (PsaN). Conclusions: We suggest that some of these gene losses and gains could have contributed to adaptation to low temperatures. Comparison of these genomic features with the adaptive strategies of psychrophilic microbes suggests that prokaryotes and eukaryotes followed comparable evolutionary routes to adapt to cold environments.

Journal of Virology | 2004

Identification of Proteins Associated with Murine Cytomegalovirus Virions

Lisa M. Kattenhorn; Ryan Mills; Markus Wagner; Alexandre Lomsadze; Vsevolod J. Makeev; Mark Borodovsky; Hidde L. Ploegh; Benedikt M. Kessler

Proteins associated with the murine cytomegalovirus (MCMV) viral particle were identified by a combined approach of proteomic and genomic methods. Purified MCMV virions were dissociated by complete denaturation and subjected to either separation by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and in-gel digestion or treated directly by in-solution tryptic digestion. Peptides were separated by nanoflow liquid chromatography and analyzed by tandem mass spectrometry (LC-MS/MS). The MS/MS spectra obtained were searched against a database of MCMV open reading frames (ORFs) predicted to be protein coding by an MCMV-specific version of the gene prediction algorithm GeneMarkS. We identified 38 proteins from the capsid, tegument, glycoprotein, replication, and immunomodulatory protein families, as well as 20 genes of unknown function. Observed irregularities in coding potential suggested possible sequence errors in the 3′-proximal ends of m20 and M31. These errors were experimentally confirmed by sequencing analysis. The MS data further indicated the presence of peptides derived from the unannotated ORFs ORFc225441-226898 (m166.5) and ORF105932-106072. Immunoblot experiments confirmed expression of m166.5 during viral infection.

Bioinformatics | 2016

BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS

Katharina Hoff; Simone Lange; Alexandre Lomsadze; Mark Borodovsky; Mario Stanke

Motivation: Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. Results: We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in BAM format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when the latter used RNA-Seq as the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. Availability and implementation: BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/. Supplementary information: Supplementary data are available at Bioinformatics online.

PLOS ONE | 2011

The Genome Sequence of the North-European Cucumber (Cucumis sativus L.) Unravels Evolutionary Adaptation Mechanisms in Plants

Rafał Wóycicki; Justyna Witkowicz; Piotr Gawroński; Joanna Dąbrowska; Alexandre Lomsadze; Magdalena Pawełkowicz; Ewa Siedlecka; Kohei Yagi; Wojciech Pląder; Anna Seroczyńska; Mieczysław Śmiech; Wojciech Gutman; Katarzyna Niemirowicz-Szczytt; Grzegorz Bartoszewski; Norikazu Tagashira; Yoshikazu Hoshi; Mark Borodovsky; Stanislaw Karpinski; Stefan Malepszy; Zbigniew Przybecki

Cucumber (Cucumis sativus L.), a widely cultivated crop, originated in the Eastern Himalayas, and its secondary domestication regions include highly divergent climate conditions, e.g. temperate and subtropical. We wanted to uncover adaptive genome differences between the cucumber cultivars and to learn what sort of evolutionary molecular mechanisms regulate genetic adaptation of plants to different ecosystems and organism biodiversity. Here we present the draft genome sequence of the Cucumis sativus genome of the North-European Borszczagowski cultivar (line B10) and comparative genomics studies with the known genomes of: C. sativus (Chinese cultivar – Chinese Long (line 9930)), Arabidopsis thaliana, Populus trichocarpa and Oryza sativa. Cucumber genomes show extensive chromosomal rearrangements, distinct differences in quantity of the particular genes (e.g. involved in photosynthesis, respiration, sugar metabolism, chlorophyll degradation, regulation of gene expression, photooxidative stress tolerance, higher non-optimal temperatures tolerance and ammonium ion assimilation) as well as in distributions of abscisic acid-, dehydration- and ethylene-responsive cis-regulatory elements (CREs) in promoters of orthologous groups of genes, which lead to the specific adaptation features. Abscisic acid treatment of non-acclimated Arabidopsis and C. sativus seedlings induced moderate freezing tolerance in Arabidopsis but not in C. sativus. This experiment, together with analysis of abscisic acid-specific CRE distributions, gives a clue as to why C. sativus is much more susceptible to moderate freezing stresses than A. thaliana. Comparative analysis of all five genomes showed that each species and/or cultivar has a specific profile of CRE content in promoters of orthologous genes. Our results constitute a substantial and original resource for basic and applied research on environmental adaptations of plants, which could facilitate creation of new crops with improved growth and yield in divergent conditions.

Nucleic Acids Research | 2014

Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm

Alexandre Lomsadze; Paul D. Burns; Mark Borodovsky

We present a new approach to automatic training of a eukaryotic ab initio gene finding algorithm. With the advent of Next-Generation Sequencing, automatic training has become paramount, allowing genome annotation pipelines to keep pace with the speed of genome sequencing. Earlier we developed GeneMark-ES, currently the only gene finding algorithm for eukaryotic genomes that performs automatic training in unsupervised ab initio mode. The new algorithm, GeneMark-ET, augments GeneMark-ES with a novel method that integrates RNA-Seq read alignments into the self-training procedure. Use of ‘assembled’ RNA-Seq transcripts is far from trivial; a significant error rate of assembly was revealed in recent assessments. We demonstrated in computational experiments that the proposed method of incorporation of ‘unassembled’ RNA-Seq reads improves the accuracy of gene prediction; particularly, for the 1.3 GB genome of Aedes aegypti the mean value of prediction Sensitivity and Specificity at the gene level increased over GeneMark-ES by 24.5%. In the current surge of genomic data, when the need for accurate sequence annotation is higher than ever, GeneMark-ET will be a valuable addition to the narrow arsenal of automatic gene prediction tools.
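The accuracy figure quoted in this abstract is a mean of gene-level Sensitivity and Specificity. A minimal Python sketch of how such a value can be computed follows. It assumes the convention common in the gene prediction literature, where Specificity means the fraction of predicted genes that are correct (TP / (TP + FP)), and it assumes a gene counts as correct only when its whole structure matches the annotation exactly. The function name and the gene representation are hypothetical; the abstract does not spell out the evaluation procedure.

```python
def gene_level_accuracy(predicted, annotated):
    """Return (sensitivity, specificity, mean) at the gene level.

    Each gene is any hashable description of a full gene structure,
    e.g. (sequence id, strand, tuple of exon coordinates). A predicted
    gene is a true positive only if it matches an annotated gene exactly.
    """
    predicted = set(predicted)
    annotated = set(annotated)
    true_positives = len(predicted & annotated)
    # Sensitivity: share of annotated genes that were predicted exactly.
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    # Specificity (gene-finding convention): share of predictions that are correct.
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy data: one of two predictions matches one of two annotated genes.
gene_a = ("chr1", "+", ((1, 100), (200, 300)))
gene_b = ("chr1", "-", ((500, 600),))
gene_c = ("chr2", "+", ((10, 90),))
sn, sp, mean = gene_level_accuracy([gene_a, gene_b], [gene_a, gene_c])
```

With this toy input, `sn`, `sp` and `mean` are all 0.5, since one of the two annotated genes is recovered and one of the two predictions is correct.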

Methods in Molecular Biology | 2009

In Silico Identification of Genes in Bacteriophage DNA

Andrew M. Kropinski; Mark Borodovsky; Tim Carver; Ana Cerdeño-Tárraga; Aaron E. Darling; Alexandre Lomsadze; Padmanabhan Mahadevan; Paul Stothard; Donald Seto; Gary Van Domselaar; David S. Wishart

One of the most satisfying aspects of a genome sequencing project is the identification of the genes contained within it. These are of two types: those which encode tRNAs and those which produce proteins. After a general introduction on the properties of protein-encoding genes and the utility of the Basic Local Alignment Search Tool (BLASTX) to identify genes through homologs, a variety of tools are discussed by their creators. These include, for genome annotation: GeneMark, Artemis, and BASys; and, for genome comparisons: Artemis Comparison Tool (ACT), Mauve, CoreGenes, and GeneOrder.