Michael S. Waterman | Researchain

Publication

Featured research published by Michael S. Waterman.

Journal of Molecular Biology | 1981

Identification of common molecular subsequences

Temple F. Smith; Michael S. Waterman

The identification of maximally homologous subsequences among sets of long sequences is an important problem in molecular sequence analysis. The problem is straightforward only if one restricts consideration to contiguous subsequences (segments) containing no internal deletions or insertions. The more general problem has its solution in an extension of sequence metrics (Sellers, 1974; Waterman et al., 1976) developed to measure the minimum number of “events” required to convert one sequence into another. These developments in modern sequence analysis began with the heuristic homology algorithm of Needleman & Wunsch (1970), which first introduced an iterative matrix method of calculation. Numerous other heuristic algorithms have been suggested, including those of Fitch (1966) and Dayhoff (1969). More mathematically rigorous algorithms were suggested by Sankoff (1972), Reichert et al. (1973) and Beyer et al. (1979), but these were generally not biologically satisfying or interpretable. Success came with Sellers' (1974) development of a true metric measure of the distance between sequences. This metric was later generalized by Waterman et al. (1976) to include deletions/insertions of arbitrary length. This metric represents the minimum number of “mutational events” required to convert one sequence into another. It is of interest to note that Smith et al. (1980) have recently shown that under some conditions the generalized Sellers metric is equivalent to the original homology algorithm of Needleman & Wunsch (1970). In this letter we extend the above ideas to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology). The similarity measure used here allows for arbitrary length deletions and insertions.
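The segment-pair search described in this abstract can be illustrated with a small dynamic-programming sketch. This is a simplified illustration, not the paper's exact formulation: it assumes a linear gap penalty and integer scores (`match`, `mismatch`, `gap` are illustrative values chosen here), whereas the abstract allows deletions and insertions of arbitrary length under a more general weighting.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, segment_of_a, segment_of_b) for one best-scoring local alignment.

    Scoring values are assumptions for illustration; gaps are penalised linearly.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of any alignment ending at a[i-1], b[j-1]; never below 0.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest cell until a zero cell is reached.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


score, seg_a, seg_b = local_alignment("XXACGTXX", "YYACGTYY")
print(score, seg_a, seg_b)
```

Resetting negative cells to zero is what lets the best-scoring pair of segments start anywhere in either sequence, rather than forcing an end-to-end alignment.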

Advances in Applied Mathematics | 1981

Comparison of biosequences

Temple F. Smith; Michael S. Waterman

Homology and distance measures have been routinely used to compare two biological sequences, such as proteins or nucleic acids. The homology measure of Needleman and Wunsch is shown, under general conditions, to be equivalent to the distance measure of Sellers. A new algorithm is given to find similar pairs of segments, one segment from each sequence. The new algorithm, based on homology measures, is compared to an earlier one due to Sellers.

Proceedings of the National Academy of Sciences of the United States of America | 2001

An Eulerian path approach to DNA fragment assembly

Pavel A. Pevzner; Haixu Tang; Michael S. Waterman

For the last 20 years, fragment assembly in DNA sequencing followed the “overlap–layout–consensus” paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical “overlap–layout–consensus” approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old “repeat problem” in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. EULER, in contrast to the Celera assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.
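The reduction to an Eulerian path problem can be sketched on a toy example. The code below is only the core graph idea under strong assumptions (error-free reads, each k-mer used once, a single strand); it is not the EULER software, which the abstract describes as handling repeats and large-scale data. The function names are hypothetical.

```python
from collections import defaultdict


def distinct_kmers(reads, k):
    """Collect the distinct k-mers occurring in the reads (multiplicity is ignored)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return sorted(kmers)


def eulerian_path(kmers):
    """Treat each k-mer as an edge prefix -> suffix and walk every edge exactly once."""
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for kmer in kmers:
        u, v = kmer[:-1], kmer[1:]
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start at the node with one surplus outgoing edge, if any.
    start = next((n for n, b in balance.items() if b == 1), kmers[0][:-1])
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm, iterative form
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path


def spell(path):
    """Spell the sequence along a path of overlapping (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ACGTT", "GTTGC", "TTGCA"]
print(spell(eulerian_path(distinct_kmers(reads, 3))))
```

When the graph contains repeats, several Eulerian paths may exist, so the walk returned here is one valid reconstruction rather than a guaranteed unique answer.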

Archive | 1995

Introduction to Computational Biology

Michael S. Waterman

Proceedings of the National Academy of Sciences of the United States of America | 2002

A dynamic programming algorithm for haplotype block partitioning

Kui Zhang; Minghua Deng; Ting Chen; Michael S. Waterman; Fengzhu Sun

We develop a dynamic programming algorithm for haplotype block partitioning to minimize the number of representative single nucleotide polymorphisms (SNPs) required to account for most of the common haplotypes in each block. Any measure of haplotype quality can be used in the algorithm and of course the measure should depend on the specific application. The dynamic programming algorithm is applied to analyze the chromosome 21 haplotype data of Patil et al. [Patil, N., Berno, A. J., Hinds, D. A., Barrett, W. A., Doshi, J. M., Hacker, C. R., Kautzer, C. R., Lee, D. H., Marjoribanks, C., McDonough, D. P., et al. (2001) Science 294, 1719–1723], who searched for blocks of limited haplotype diversity. Using the same criteria as in Patil et al., we identify a total of 3,582 representative SNPs and 2,575 blocks that are 21.5% and 37.7% smaller, respectively, than those identified using a greedy algorithm of Patil et al. We also apply the dynamic programming algorithm to the same data set based on haplotype diversity. A total of 3,982 representative SNPs and 1,884 blocks are identified to account for 95% of the haplotype diversity in each block.
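The partitioning idea can be sketched as a one-dimensional dynamic program over SNP positions. This is a minimal sketch under assumptions: `is_block` and `cost` are hypothetical callables standing in for the block-quality criterion and the number of representative SNPs a block needs, neither of which is specified in the abstract.

```python
import math


def partition_blocks(n, is_block, cost):
    """Split SNPs 0..n-1 into consecutive blocks minimising the summed cost.

    is_block(i, j): hypothetical predicate, True if SNPs i..j-1 form an acceptable block.
    cost(i, j):     hypothetical cost, e.g. representative SNPs needed for that block.
    Returns (total_cost, [(start, end), ...]) with half-open block intervals.
    """
    best = [math.inf] * (n + 1)  # best[j] = minimal cost to partition the first j SNPs
    back = [-1] * (n + 1)        # back[j] = start of the last block in that optimum
    best[0] = 0
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] < math.inf and is_block(i, j):
                candidate = best[i] + cost(i, j)
                if candidate < best[j]:
                    best[j], back[j] = candidate, i
    if best[n] == math.inf:
        raise ValueError("no valid partition under the given block criterion")
    blocks, j = [], n
    while j > 0:
        blocks.append((back[j], j))
        j = back[j]
    blocks.reverse()
    return best[n], blocks


# Toy criterion: a block may span at most 3 SNPs and always needs 1 representative SNP.
total, blocks = partition_blocks(7, lambda i, j: j - i <= 3, lambda i, j: 1)
print(total, blocks)
```

Unlike a greedy left-to-right scan, the recurrence considers every admissible start for the last block, which is why it can reach a smaller total than a greedy choice.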

Advances in Mathematics | 1976

Some biological sequence metrics

Michael S. Waterman; Temple F. Smith; W. A. Beyer

Some new metrics are introduced to measure the distance between biological sequences, such as amino acid sequences or nucleotide sequences. These metrics generalize a metric of Sellers, who considered only single deletions, mutations, and insertions. The present metrics allow, for example, multiple deletions and insertions and single mutations. They also allow computation of the distance among more than two sequences. Algorithms for computing the values of the metrics are given which also compute best alignments. The connection with the information theory approach of Reichert, Cohen, and Wong is discussed.
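For orientation, the simplest case mentioned here (single deletions, mutations, and insertions) can be sketched as a unit-cost edit distance. This is an assumption-laden illustration: every event is given cost 1, and the multi-residue gap weights and multi-sequence distances introduced in the paper are not modelled.

```python
def edit_distance(a, b):
    """Minimum number of single insertions, deletions, and substitutions turning a into b.

    Unit costs are an assumption made for this sketch.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            substitution = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            deletion = prev[j] + 1      # drop a[i-1]
            insertion = curr[j - 1] + 1  # add b[j-1]
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(b)]


print(edit_distance("ACGT", "AGT"))
```

With unit costs the function is symmetric, is zero only for identical sequences, and satisfies the triangle inequality, which is what makes it a metric in the sense the abstract uses.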

Journal of Molecular Biology | 1987

A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons

Michael S. Waterman; Mark Eggert

The algorithm of Smith & Waterman for identification of maximally similar subsequences is extended to allow identification of all non-intersecting similar subsequences with similarity score at or above some preset level. The resulting alignments are found in order of score, with the highest scoring alignment first. In the case of single gaps or multiple gaps weighted linear with gap length, the algorithm is extremely efficient, taking very little time beyond that of the initial calculation of the matrix. The algorithm is applied to comparisons of tRNA-rRNA sequences from Escherichia coli. A statistical analysis is important for proper evaluation of the results, which differ substantially from the results of an earlier analysis of the same sequences by Bloch and colleagues.

Journal of Molecular Biology | 1994

Sequence alignment and penalty choice. Review of concepts, case studies and implications.

Martin Vingron; Michael S. Waterman

Alignment algorithms to compare DNA or amino acid sequences are widely used tools in molecular biology. The algorithms depend on the setting of various parameters, most notably gap penalties. The effect that such parameters have on the resulting alignments is still poorly understood. This paper begins by reviewing two recent advances in algorithms and probability that enable us to take a new approach to this question. The first tool we introduce is a newly developed method to delineate efficiently all optimal alignments arising under all choices of parameters. The second tool comprises insights into the statistical behavior of optimal alignment scores. From this we gain a better understanding of the dependence of alignments on parameters in general. We propose novel criteria to detect biologically good alignments and highlight some specific features about the interaction between similarity matrices and gap penalties. To illustrate our analysis we present a detailed study of the comparison of two immunoglobulin sequences.

Journal of Computational Biology | 1995

A new algorithm for DNA sequence assembly.

Ramana M. Idury; Michael S. Waterman

Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.

Mathematical Biosciences | 1978

RNA Secondary Structure: A Complete Mathematical Analysis

Michael S. Waterman; Temple F. Smith

Using a rigorous mathematical analysis, the prediction of RNA secondary structure as a function of free energy is obtained. The iterative method effectively allows a search over the entire configuration space of the RNA molecule not possible by earlier methods. The approach also allows for the direct inclusion of the nearest neighbor or stacking energies.
