Pavel A. Pevzner
University of California, San Diego
Publication
Featured research published by Pavel A. Pevzner.
Journal of Computational Biology | 2012
Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey Gurevich; Mikhail Dvorkin; Alexander S. Kulikov; Valery M. Lesin; Sergey I. Nikolenko; Son Pham; Andrey D. Prjibelski; Alexey V. Pyshkin; Alexander V. Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A. Alekseyev; Pavel A. Pevzner
The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.
Proceedings of the National Academy of Sciences of the United States of America | 2001
Pavel A. Pevzner; Haixu Tang; Michael S. Waterman
For the last 20 years, fragment assembly in DNA sequencing followed the “overlap–layout–consensus” paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical “overlap–layout–consensus” approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old “repeat problem” in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. EULER, in contrast to the Celera assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.
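The reduction to an Eulerian path problem can be illustrated with a toy example. The sketch below is not the EULER implementation: it assumes error-free reads, collapses every k-mer to a single edge (so repeat multiplicities are ignored), and assumes an Eulerian path exists. The function names `de_bruijn_graph`, `eulerian_path`, `spell`, and `assemble` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Each distinct k-mer becomes one edge from its (k-1)-prefix to its (k-1)-suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes (without checking) that an Eulerian path exists."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    return path[::-1]


def spell(path):
    """Spell the sequence visited by a path of overlapping (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


def assemble(reads, k):
    return spell(eulerian_path(de_bruijn_graph(reads, k)))


if __name__ == "__main__":
    print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4))  # prints ATGGCGTGCA
```

Every k-mer is used exactly once as an edge, so the assembly question becomes "visit every edge once" rather than "visit every read once", which is the computational advantage the abstract refers to.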
Intelligent Systems in Molecular Biology | 2005
Alkes L. Price; Neil C. Jones; Pavel A. Pevzner
MOTIVATION: De novo repeat family identification is a challenging algorithmic problem of great practical importance. As the number of genome sequencing projects increases, there is a pressing need to identify the repeat families present in large, newly sequenced genomes. We develop a new method for de novo identification of repeat families via extension of consensus seeds; our method enables a rigorous definition of repeat boundaries, a key issue in repeat analysis. RESULTS: Our RepeatScout algorithm is more sensitive and is orders of magnitude faster than RECON, the dominant tool for de novo repeat family identification in newly sequenced genomes. Using RepeatScout, we estimate that approximately 2% of the human genome and 4% of mouse and rat genomes consist of previously unannotated repetitive sequence. AVAILABILITY: Source code is available for download at http://www-cse.ucsd.edu/groups/bioinformatics/software.html
Journal of Computational Biology | 1999
Vlado Dančík; Theresa A. Addona; Karl R. Clauser; James E. Vath; Pavel A. Pevzner
Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most powerful tools in proteomics for identifying proteins. Because complete genome sequences are accumulating rapidly, the recent trend in interpretation of MS/MS spectra has been database search. However, de novo MS/MS spectral interpretation remains an open problem typically involving manual interpretation by expert mass spectrometrists. We have developed a new algorithm, SHERENGA, for de novo interpretation that automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any type of mass spectrometer. The test data are used to construct optimal path scoring in the graph representations of MS/MS spectra. A ranked list of high scoring paths corresponds to potential peptide sequences. SHERENGA is most useful for interpreting sequences of peptides resulting from unknown proteins and for validating the results of database search algorithms in fully automated, high-throughput peptide sequencing.
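The idea of scoring paths in a graph representation of a spectrum can be sketched as follows. This is a strongly simplified illustration, not the SHERENGA algorithm: it assumes integer residue masses, treats every peak as a candidate prefix mass, and uses raw peak intensity as the score instead of learned ion types and thresholds. The names `RESIDUE_MASS` and `best_peptide` are hypothetical.

```python
# Nominal (integer) residue masses. L stands for both I and L, and K for both
# K and Q, because integer masses cannot tell those pairs apart.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
MASS_TO_RESIDUE = {mass: residue for residue, mass in RESIDUE_MASS.items()}


def best_peptide(peaks, parent_mass):
    """Return (score, peptide) for the best path from mass 0 to parent_mass.

    peaks maps a prefix mass to its intensity. Nodes are masses; an edge joins
    two masses whose difference equals a residue mass. Returns None if no
    path reaches parent_mass.
    """
    intensity = dict(peaks)
    intensity[0] = 0.0
    intensity.setdefault(parent_mass, 0.0)
    masses = sorted(intensity)

    # Edges always point to a larger mass, so ascending mass order is a
    # topological order and one dynamic-programming pass is enough.
    best = {0: (0.0, "")}
    for m in masses:
        if m not in best:
            continue  # unreachable from mass 0
        score, peptide = best[m]
        for m2 in masses:
            residue = MASS_TO_RESIDUE.get(m2 - m)
            if residue is None:
                continue
            candidate = (score + intensity[m2], peptide + residue)
            if m2 not in best or candidate[0] > best[m2][0]:
                best[m2] = candidate
    return best.get(parent_mass)


if __name__ == "__main__":
    # Prefix masses of G-A-S (57, 128, 215) plus one noise peak at 150.
    spectrum = {57: 5.0, 128: 3.0, 150: 9.0, 215: 4.0}
    print(best_peptide(spectrum, 215))  # prints (12.0, 'GAS')
```

In this example the mass 128 can be reached either as G+A or as a single K, and the path through the extra peak at 57 wins because it collects more intensity. The intense noise peak at 150 is ignored because no residue mass connects it to the rest of the graph.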
SIAM Journal on Computing | 1996
Vineet Bafna; Pavel A. Pevzner
Sequence comparison in molecular biology is in the beginning of a major paradigm shift: a shift from gene comparison based on local mutations (i.e., insertions, deletions, and substitutions of nucleotides) to chromosome comparison based on global rearrangements (i.e., inversions and transpositions of fragments). The classical methods of sequence comparison do not work for global rearrangements, and little is known in computer science about the edit distance between sequences if global rearrangements are allowed. In the simplest form, the problem of gene rearrangements corresponds to sorting by reversals, i.e., sorting of an array using reversals of arbitrary fragments. Recently, Kececioglu and Sankoff gave the first approximation algorithm for sorting by reversals with guaranteed error bound 2 and identified open problems related to chromosome rearrangements. One of these problems is Gollan's conjecture on the reversal diameter of the symmetric group. This paper proves the conjecture. Further, the problem of expected reversal distance between two random permutations is investigated. The reversal distance between two random permutations is shown to be very close to the reversal diameter, thereby indicating that reversal distance provides a good separation between related and nonrelated sequences in molecular evolution studies. The gene rearrangement problem forces us to consider reversals of signed permutations, as the genes in DNA could be positively or negatively oriented. An approximation algorithm for signed permutations is presented, which provides a performance guarantee of $\frac{3}{2}$. Finally, using the signed permutations approach, an approximation algorithm for sorting by reversals is described which achieves a performance guarantee of $\frac{7}{4}$.
SIAM Journal on Discrete Mathematics | 1998
Vineet Bafna; Pavel A. Pevzner
Proceedings of the National Academy of Sciences of the United States of America | 2003
Pavel A. Pevzner; Glenn Tesler
Nature Biotechnology | 2011
Phillip E. C. Compeau; Pavel A. Pevzner; Glenn Tesler
Symposium on the Theory of Computing | 1995
Sridhar Hannenhalli; Pavel A. Pevzner
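Sorting by reversals, as described in the Bafna and Pevzner abstract on reversals, can be made concrete with a small sketch. The code below is not one of the approximation algorithms from that paper: it only counts breakpoints of an unsigned permutation, derives the standard lower bound (one reversal changes two adjacencies, so it removes at most two breakpoints), and sorts with a naive procedure that uses at most n - 1 reversals. All function names are hypothetical.

```python
def reverse(perm, i, j):
    """Return a copy of perm with the fragment perm[i..j] (inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def breakpoints(perm):
    """Count adjacent pairs of 0, perm, n+1 that are not consecutive integers."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)


def reversal_lower_bound(perm):
    """Each reversal removes at most two breakpoints, so d >= ceil(b / 2)."""
    return -(-breakpoints(perm) // 2)


def naive_reversal_sort(perm):
    """Move 1, 2, ..., n into place one reversal at a time.

    Returns the list of (i, j) reversals applied. This is a baseline with no
    approximation guarantee of the kind discussed in the abstract.
    """
    perm = list(perm)
    steps = []
    for i in range(len(perm)):
        j = perm.index(i + 1)
        if j != i:
            perm = reverse(perm, i, j)
            steps.append((i, j))
    return steps


if __name__ == "__main__":
    p = [3, 1, 2, 4]
    print(breakpoints(p))           # prints 3
    print(reversal_lower_bound(p))  # prints 2
    print(naive_reversal_sort(p))   # prints [(0, 1), (1, 2)]
```

For this permutation the naive procedure happens to meet the breakpoint lower bound, so two reversals are optimal. In general the gap between such bounds and what a sorting procedure achieves is what a performance guarantee quantifies.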
Nature Communications | 2014
Sangtae Kim; Pavel A. Pevzner
Sequence comparison in computational molecular biology is a powerful tool for deriving evolutionary and functional relationships between genes. However, classical alignment algorithms handle only local mutations (i.e., insertions, deletions, and substitutions of nucleotides) and ignore global rearrangements (i.e., inversions and transpositions of long fragments). As a result, the applications of sequence alignment to analyze highly rearranged genomes (e.g., herpes viruses or plant mitochondrial DNA) are rather limited. The paper addresses the problem of genome comparison versus classical gene comparison and presents algorithms to analyze rearrangements in genomes evolving by transpositions. In the simplest form the problem corresponds to sorting by transpositions, i.e., sorting of an array using transpositions of arbitrary fragments. We derive lower bounds on transposition distance between permutations and present approximation algorithms for sorting by transpositions. The algorithms also imply a nontrivial upper bound on the transposition diameter of the symmetric group. Finally, we formulate two biological problems in genome rearrangements and describe the first algorithmic steps toward their solution.
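A transposition in the sense of the sorting-by-transpositions abstract swaps two adjacent fragments of a permutation. The sketch below shows the operation, the simple breakpoint lower bound (a transposition changes three adjacencies, so it removes at most three breakpoints), and an exhaustive breadth-first search that computes the exact distance for very small permutations only. It is an illustration under these assumptions, not the approximation algorithm of the paper, and the function names are hypothetical.

```python
from collections import deque


def transpose(perm, i, j, k):
    """Swap the adjacent blocks perm[i:j] and perm[j:k], with 0 <= i < j < k <= n."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def ordered_breakpoints(perm):
    """Count adjacent pairs (a, b) of 0, perm, n+1 with b != a + 1."""
    extended = (0,) + tuple(perm) + (len(perm) + 1,)
    return sum(1 for a, b in zip(extended, extended[1:]) if b != a + 1)


def transposition_lower_bound(perm):
    """Each transposition removes at most three breakpoints, so d >= ceil(b / 3)."""
    return -(-ordered_breakpoints(perm) // 3)


def transposition_distance(perm):
    """Exact distance to the sorted order by breadth-first search.

    The search space has n! states, so this is usable only for small n.
    """
    start = tuple(perm)
    target = tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current]
        n = len(current)
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    nxt = transpose(current, i, j, k)
                    if nxt not in dist:
                        dist[nxt] = dist[current] + 1
                        queue.append(nxt)
    return None  # not reached for a valid permutation


if __name__ == "__main__":
    print(transposition_distance((3, 4, 1, 2)))     # prints 1
    print(transposition_lower_bound((4, 3, 2, 1)))  # prints 2
    print(transposition_distance((4, 3, 2, 1)))     # prints 3
```

The reversed permutation (4, 3, 2, 1) shows that the breakpoint bound is not always tight: the bound is 2 but three transpositions are needed.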