Max A. Alekseyev | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Max A. Alekseyev is active.

Explore More

Publication

Featured researches published by Max A. Alekseyev.

Journal of Computational Biology | 2012

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing

Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey Gurevich; Mikhail Dvorkin; Alexander S. Kulikov; Valery M. Lesin; Sergey I. Nikolenko; Son Pham; Andrey D. Prjibelski; Alexey V. Pyshkin; Alexander V. Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A. Alekseyev; Pavel A. Pevzner

The lions share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

BMC Genomics | 2013

BayesHammer: Bayesian clustering for error correction in single-cell sequencing

Sergey I. Nikolenko; Anton Korobeynikov; Max A. Alekseyev

Error correction of sequenced reads remains a difficult task, especially in single-cell sequencing projects with extremely non-uniform coverage. While existing error correction tools designed for standard (multi-cell) sequencing data usually come up short in single-cell sequencing projects, algorithms actually used for single-cell error correction have been so far very simplistic.We introduce several novel algorithms based on Hamming graphs and Bayesian subclustering in our new error correction tool BAYES HAMMER. While BAYES HAMMER was designed for single-cell sequencing, we demonstrate that it also improves on existing error correction tools for multi-cell sequencing data while working much faster on real-life datasets. We benchmark BAYES HAMMER on both k-mer counts and actual assembly results with the SPADES genome assembler.

Genome Research | 2009

Breakpoint graphs and ancestral genome reconstructions

Max A. Alekseyev; Pavel A. Pevzner

Recently completed whole-genome sequencing projects marked the transition from gene-based phylogenetic studies to phylogenomics analysis of entire genomes. We developed an algorithm MGRA for reconstructing ancestral genomes and used it to study the rearrangement history of seven mammalian genomes: human, chimpanzee, macaque, mouse, rat, dog, and opossum. MGRA relies on the notion of the multiple breakpoint graphs to overcome some limitations of the existing approaches to ancestral genome reconstructions. MGRA also generates the rearrangement-based characters guiding the phylogenetic tree reconstruction when the phylogeny is unknown.

Theoretical Computer Science | 2008

Multi-break rearrangements and chromosomal evolution

Max A. Alekseyev; Pavel A. Pevzner

Most genome rearrangements (e.g., reversals and translocations) can be represented as 2-breaks that break a genome at 2 points and glue the resulting fragments in a new order. Multi-break rearrangements break a genome into multiple fragments and further glue them together in a new order. While multi-break rearrangements were studied in depth for k=2 breaks, the k-break distance problem for arbitrary k remains unsolved. We prove a duality theorem for multi-break distance problem and give a polynomial algorithm for computing this distance.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2007

Colored de Bruijn Graphs and the Genome Halving Problem

Max A. Alekseyev; Pavel A. Pevzner

Breakpoint graph analysis is a key algorithmic technique in studies of genome rearrangements. However, breakpoint graphs are defined only for genomes without duplicated genes, thus limiting their applications in rearrangement analysis. We discuss a connection between the breakpoint graphs and de Bruijn graphs that leads to a generalization of the notion of breakpoint graph for genomes with duplicated genes. We further use the generalized breakpoint graphs to study the genome halving problem (first introduced and solved by Nadia El-Mabrouk and David Sankoff). The El-Mabrouk-Sankoff algorithm is rather complex, and, in this paper, we present an alternative approach that is based on generalized breakpoint graphs. The generalized breakpoint graphs make the El-Mabrouk-Sankoff result more transparent and promise to be useful in future studies of genome rearrangements

PLOS Computational Biology | 2005

Are There Rearrangement Hotspots in the Human Genome

Max A. Alekseyev; Pavel A. Pevzner

In a landmark paper, Nadeau and Taylor [18] formulated the random breakage model (RBM) of chromosome evolution that postulates that there are no rearrangement hotspots in the human genome. In the next two decades, numerous studies with progressively increasing levels of resolution made RBM the de facto theory of chromosome evolution. Despite the fact that RBM had prophetic prediction power, it was recently refuted by Pevzner and Tesler [4], who introduced the fragile breakage model (FBM), postulating that the human genome is a mosaic of solid regions (with low propensity for rearrangements) and fragile regions (rearrangement hotspots). However, the rebuttal of RBM caused a controversy and led to a split among researchers studying genome evolution. In particular, it remains unclear whether some complex rearrangements (e.g., transpositions) can create an appearance of rearrangement hotspots. We contribute to the ongoing debate by analyzing multi-break rearrangements that break a genome into multiple fragments and further glue them together in a new order. In particular, we demonstrate that (1) even if transpositions were a dominant force in mammalian evolution, the arguments in favor of FBM still stand, and (2) the “gene deletion” argument against FBM is flawed.

Genome Biology | 2010

Comparative genomics reveals birth and death of fragile regions in mammalian evolution

Max A. Alekseyev; Pavel A. Pevzner

BackgroundAn important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals raised some doubts about their existence.ResultsHere we demonstrate that fragile regions are subject to a birth and death process, implying that fragility has a limited evolutionary lifespan.ConclusionsThis finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions as a phenomenon reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome.

Journal of Computational Biology | 2008

Multi-Break Rearrangements and Breakpoint Re-Uses: From Circular to Linear Genomes

Max A. Alekseyev

Multi-break rearrangements break a genome into multiple fragments and further glue them together in a new order. While 2-break rearrangements represent standard reversals, fusions, fissions, and translocations, 3-break rearrangements represent a natural generalization of transpositions. Alekseyev and Pevzner (2007a, 2008a) studied multi-break rearrangements in circular genomes and further applied them to the analysis of chromosomal evolution in mammalian genomes. In this paper, we extend these results to the more difficult case of linear genomes. In particular, we give lower bounds for the rearrangement distance between linear genomes and for the breakpoint re-use rate as functions of the number and proportion of transpositions. We further use these results to analyze comparative genomic architecture of mammalian genomes.

Journal of Computational Biology | 2013

Pathset graphs: a novel approach for comprehensive utilization of paired reads in genome assembly.

Son K. Pham; Dmitry Antipov; Alexander V. Sirotkin; Glenn Tesler; Pavel A. Pevzner; Max A. Alekseyev

One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been improved algorithms for utilization of paired reads (mate-pairs). While in most assemblers, mate-pair information is used in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly in the assembly graph structure. However, the PDBG approach faces difficulties when the variation in the insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms that allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.

SIAM Journal on Computing | 2007

Whole Genome Duplications and Contracted Breakpoint Graphs

Max A. Alekseyev; Pavel A. Pevzner

The genome halving problem, motivated by the whole genome duplication events in molecular evolution, was solved by El-Mabrouk and Sankoff in the pioneering paper [SIAM J. Comput., 32 (2003), pp. 754-792]. The El-Mabrouk-Sankoff algorithm is rather complex, inspiring a quest for a simpler solution. An alternative approach to the genome halving problem based on the notion of the contracted breakpoint graph was recently proposed in [M. A. Alekseyev and P. A. Pevzner, IEEE/ACM Trans. Comput. Biol. Bioinformatics, 4 (2007), pp. 98-107]. This new technique reveals that while the El-Mabrouk-Sankoff result is correct in most cases, it does not hold in the case of unichromosomal genomes. This raises a problem of correcting a flaw in the El-Mabrouk-Sankoff analysis and devising an algorithm that deals adequately with all genomes. In this paper we efficiently classify all genomes into two classes and show that while the El-Mabrouk-Sankoff theorem holds for the first class, it is incorrect for the second class. The crux of our analysis is a new combinatorial invariant defined on duplicated permutations. Using this invariant we were able to come up with a full proof of the genome halving theorem and a polynomial algorithm for the genome halving problem.

Explore More