Anton Korobeynikov
Saint Petersburg State University
Publication
Featured research published by Anton Korobeynikov.
BMC Genomics | 2013
Sergey I. Nikolenko; Anton Korobeynikov; Max A. Alekseyev
Error correction of sequenced reads remains a difficult task, especially in single-cell sequencing projects with extremely non-uniform coverage. While existing error correction tools designed for standard (multi-cell) sequencing data usually come up short in single-cell sequencing projects, algorithms actually used for single-cell error correction have so far been very simplistic. We introduce several novel algorithms based on Hamming graphs and Bayesian subclustering in our new error correction tool BayesHammer. While BayesHammer was designed for single-cell sequencing, we demonstrate that it also improves on existing error correction tools for multi-cell sequencing data while working much faster on real-life datasets. We benchmark BayesHammer on both k-mer counts and actual assembly results with the SPAdes genome assembler.
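The Hamming-graph idea mentioned in the abstract above can be illustrated with a toy sketch. It assumes the usual definition: vertices are the distinct k-mers found in the reads, and two k-mers are joined when their Hamming distance is at most a threshold. The function names, the `tau` parameter, and the brute-force pairwise comparison are illustrative assumptions, not the tool's actual implementation, and the Bayesian subclustering step is not shown.

```python
from itertools import combinations


def kmers(read, k):
    """Return all length-k substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def hamming_components(reads, k, tau=1):
    """Cluster distinct k-mers into connected components of the Hamming graph.

    Two k-mers are adjacent when their Hamming distance is <= tau
    (tau is a hypothetical parameter name). Returns (components, counts).
    """
    # Count how often each distinct k-mer occurs across all reads.
    counts = {}
    for read in reads:
        for km in kmers(read, k):
            counts[km] = counts.get(km, 0) + 1

    # Union-find over the distinct k-mers.
    parent = {km: km for km in counts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Brute-force all pairs: quadratic, fine only for toy inputs.
    for a, b in combinations(counts, 2):
        if hamming(a, b) <= tau:
            parent[find(a)] = find(b)

    comps = {}
    for km in counts:
        comps.setdefault(find(km), []).append(km)
    return sorted(sorted(c) for c in comps.values()), counts
```

Within a component, a rare k-mer next to a frequent one is a natural candidate for a sequencing error. The quadratic pair scan would not scale to real datasets, so a practical tool needs a faster way to find neighbouring k-mers.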
Genome Research | 2017
Sergey Nurk; Dmitry Meleshko; Anton Korobeynikov; Pavel A. Pevzner
While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.
PLOS ONE | 2014
R. Cameron Coates; Sheila Podell; Anton Korobeynikov; Alla Lapidus; Pavel A. Pevzner; David H. Sherman; Eric E. Allen; Lena Gerwick; William H. Gerwick
Cyanobacteria possess the unique capacity to naturally produce hydrocarbons from fatty acids. Hydrocarbon compositions of thirty-two strains of cyanobacteria were characterized to reveal novel structural features and insights into hydrocarbon biosynthesis in cyanobacteria. This investigation revealed new double bond (2- and 3-heptadecene) and methyl group positions (3-, 4- and 5-methylheptadecane) for a variety of strains. Additionally, results from this study and literature reports indicate that hydrocarbon production is a universal phenomenon in cyanobacteria. All cyanobacteria possess the capacity to produce hydrocarbons from fatty acids yet not all accomplish this through the same metabolic pathway. One pathway comprises a two-step conversion of fatty acids first to fatty aldehydes and then alkanes that involves a fatty acyl ACP reductase (FAAR) and aldehyde deformylating oxygenase (ADO). The second involves a polyketide synthase (PKS) pathway that first elongates the acyl chain followed by decarboxylation to produce a terminal alkene (olefin synthase, OLS). Sixty-one strains possessing the FAAR/ADO pathway and twelve strains possessing the OLS pathway were newly identified through bioinformatic analyses. Strains possessing the OLS pathway formed a cohesive phylogenetic clade with the exception of three Moorea strains and Leptolyngbya sp. PCC 6406 which may have acquired the OLS pathway via horizontal gene transfer. Hydrocarbon pathways were identified in one-hundred-forty-two strains of cyanobacteria over a broad phylogenetic range and there were no instances where both the FAAR/ADO and the OLS pathways were found together in the same genome, suggesting an unknown selective pressure maintains one or the other pathway, but not both.
Journal of Natural Products | 2015
Karin Kleigrewe; Jehad Almaliti; Isaac Yuheng Tian; Robin B. Kinnel; Anton Korobeynikov; Emily A. Monroe; Brendan M. Duggan; Vincenzo Di Marzo; David H. Sherman; Pieter C. Dorrestein; Lena Gerwick; William H. Gerwick
An innovative approach was developed for the discovery of new natural products by combining mass spectrometric metabolic profiling with genomic analysis and resulted in the discovery of the columbamides, a new class of di- and trichlorinated acyl amides with cannabinomimetic activity. Three species of cultured marine cyanobacteria, Moorea producens 3L, Moorea producens JHB, and Moorea bouillonii PNG, were subjected to genome sequencing and analysis for their recognizable biosynthetic pathways, and this information was then compared with their respective metabolomes as detected by MS profiling. By genome analysis, a presumed regulatory domain was identified upstream of several previously described biosynthetic gene clusters in two of these cyanobacteria, M. producens 3L and M. producens JHB. A similar regulatory domain was identified in the M. bouillonii PNG genome, and a corresponding downstream biosynthetic gene cluster was located and carefully analyzed. Subsequently, MS-based molecular networking identified a series of candidate products, and these were isolated and their structures rigorously established. On the basis of their distinctive acyl amide structure, the most prevalent metabolites were evaluated for cannabinomimetic properties and found to be moderate-affinity ligands for CB1.
Bioinformatics | 2014
Andrey D. Prjibelski; Irina Vasilinetc; Anton Bankevich; Alexey Gurevich; Tatiana Krivosheeva; Sergey Nurk; Son K. Pham; Anton Korobeynikov; Alla Lapidus; Pavel A. Pevzner
Next-generation sequencing (NGS) technologies have raised a challenging de novo genome assembly problem that is further amplified in recently emerged single-cell sequencing projects. While various NGS assemblers can use information from several libraries of read-pairs, most of them were originally developed for a single library and do not fully benefit from multiple libraries. Moreover, most assemblers assume uniform read coverage, a condition that does not hold for single-cell projects, where utilization of read-pairs is even more challenging. We have developed the exSPAnder algorithm, which accurately resolves repeats in the case of both single and multiple libraries of read-pairs in both standard and single-cell assembly projects. Availability and implementation: http://bioinf.spbau.ru/en/spades
Bioinformatics | 2016
Dmitry Antipov; Anton Korobeynikov; Jeffrey S. McLean; Pavel A. Pevzner
Motivation: Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, the inexpensive short-read technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) and short reads has the potential to generate high-quality assemblies at reduced cost. Results: We describe the hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads. Availability and implementation: hybridSPAdes is implemented in C++ as a part of the SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades Supplementary information: Supplementary data are available at Bioinformatics online.
Journal of Statistical Software | 2015
Nina Golyandina; Anton Korobeynikov; Alex Shlemov; Konstantin Usevich
Implementation of multivariate and 2D extensions of singular spectrum analysis (SSA) by means of the R package Rssa is considered. The extensions include MSSA for simultaneous analysis and forecasting of several time series and 2D-SSA for analysis of digital images. A new extension of 2D-SSA called shaped 2D-SSA is introduced for analysis of images of arbitrary shape, not necessarily rectangular. It is shown that implementation of shaped 2D-SSA can serve as a basis for implementation of MSSA and other generalizations. Efficient implementation of operations with Hankel and Hankel-block-Hankel matrices through the fast Fourier transform is suggested. Examples with code fragments in R, which explain the methodology and demonstrate the proper use of Rssa, are presented.
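The remark about Hankel matrices and the fast Fourier transform can be made concrete with a small sketch. For a series x of length N and a vector v of length K, the product of the Hankel matrix H[i][j] = x[i + j] with v is a slice of the linear convolution of x with the reversed v, so it can be computed with FFTs without ever forming H. The code below is a pure-Python illustration of that general technique, written in Python rather than R; the function names are hypothetical and it is not the Rssa implementation.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hankel_matvec(x, v):
    """Compute H @ v for the Hankel matrix H[i][j] = x[i + j] via FFT."""
    N, K = len(x), len(v)
    if not 1 <= K <= N:
        raise ValueError("need 1 <= len(v) <= len(x)")
    # Pad to a power of two that can hold the full linear convolution.
    size = 1
    while size < N + K - 1:
        size *= 2
    fx = fft([complex(t) for t in x] + [0j] * (size - N))
    fw = fft([complex(t) for t in reversed(v)] + [0j] * (size - K))
    prod = [p * q for p, q in zip(fx, fw)]
    # The inverse transform is unnormalized, so divide by size.
    conv = [c.real / size for c in fft(prod, invert=True)]
    # Entry i of H @ v is entry i + K - 1 of the convolution.
    return conv[K - 1:N]


def hankel_matvec_naive(x, v):
    """Reference implementation: explicit sums over the Hankel entries."""
    L = len(x) - len(v) + 1
    return [sum(x[i + j] * v[j] for j in range(len(v))) for i in range(L)]
```

The FFT route costs on the order of N log N operations, compared with L times K for the explicit product, which matters because SSA decompositions apply such products repeatedly.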
Scientific Reports | 2015
Rajat Shuvro Roy; Dana C. Price; Alexander Schliep; Guohong Cai; Anton Korobeynikov; Hwan Su Yoon; Eun Chan Yang; Debashish Bhattacharya
A broad swath of eukaryotic microbial biodiversity cannot be cultivated in the lab and is therefore inaccessible to conventional genome-wide comparative methods. One promising approach to study these lineages is single cell genomics (SCG), whereby an individual cell is captured from nature and genome data are produced from the amplified total DNA. Here we tested the efficacy of SCG to generate a draft genome assembly from a single sample, in this case a cell belonging to the broadly distributed MAST-4 uncultured marine stramenopiles. Using de novo gene prediction, we identified 6,996 protein-encoding genes in the MAST-4 genome. This genetic inventory was sufficient to place the cell within the tree of life (ToL) using multigene phylogenetics and provided preliminary insights into the complex evolutionary history of horizontal gene transfer (HGT) in the MAST-4 lineage.
Microbiology | 2016
Michelle Schorn; Mohammad Alanjary; Kristen Aguinaldo; Anton Korobeynikov; Sheila Podell; Nastassia V. Patin; Tommie Lincecum; Paul R. Jensen; Nadine Ziemert; Bradley S. Moore
Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, consistently had the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.
Bioinformatics | 2015
Irina Vasilinetc; Andrey D. Prjibelski; Alexey Gurevich; Anton Korobeynikov; Pavel A. Pevzner
Motivation: Advances in next-generation sequencing technologies and sample preparation recently enabled generation of high-quality jumping libraries that have the potential to significantly improve short read assemblies. However, assembly algorithms have to catch up with experimental innovations to benefit from them and to produce high-quality assemblies. Results: We present a new algorithm that extends the recently described exSPAnder universal repeat resolution approach to enable its application to several challenging data types, including jumping libraries generated by the recently developed Illumina Nextera Mate Pair protocol. We demonstrate that, with these improvements, bacterial genomes can often be assembled in a few contigs using only a single Nextera Mate Pair library of short reads. Availability and implementation: The described algorithms are implemented in C++ as a part of the SPAdes genome assembler, which is freely available at bioinf.spbau.ru/en/spades. Supplementary information: Supplementary data are available at Bioinformatics online.