Gene Myers | Researchain

Publication

Featured research published by Gene Myers.

Information Processing Letters | 1990

An O(NP) sequence comparison algorithm

Sun Wu; Udi Manber; Gene Myers; Webb Miller

Let A and B be two sequences of length M and N respectively, where without loss of generality N ⩾ M, and let D be the length of a shortest edit script (consisting of insertions and deletions) between them. A parameter related to D is the number of deletions in such a script, $P = \tfrac{1}{2}D - \tfrac{1}{2}(N - M)$. We present an algorithm for computing the shortest edit distance D between A and B whose worst-case running time is O(NP) and whose expected running time is O(N + PD). The algorithm is simple and is very efficient whenever A is similar to a subsequence of B. It is nearly twice as fast as the O(ND) algorithm of Myers, and much more efficient when A and B differ substantially in length.
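
A minimal Python sketch of the O(NP) idea, written under our own assumptions: it uses the common "furthest point on each diagonal" formulation, returns only the distance D (no edit script), indexes strings from 0, and the names `onp_distance`, `snake` and `step` are hypothetical. Diagonals below N − M are swept upward and those above it downward, so each round p only touches the diagonals reachable with p deletions.

```python
def onp_distance(a: str, b: str) -> int:
    """Insertion/deletion edit distance D between a and b (no substitutions)."""
    if len(a) > len(b):          # enforce M <= N by swapping; D is symmetric
        a, b = b, a
    m, n = len(a), len(b)
    delta = n - m
    offset = m + 1               # maps diagonal k (range -m-1 .. n+1) to a list index >= 0
    fp = [-1] * (m + n + 3)      # fp[k + offset]: furthest y reached on diagonal k = y - x

    def snake(k: int, y: int) -> int:
        # Slide down diagonal k while characters match; return the final y.
        x = y - k
        while x < m and y < n and a[x] == b[y]:
            x += 1
            y += 1
        return y

    def step(k: int) -> None:
        # Enter diagonal k by an insertion from k-1 or a deletion from k+1, whichever reaches further.
        fp[k + offset] = snake(k, max(fp[k - 1 + offset] + 1, fp[k + 1 + offset]))

    p = -1
    while fp[delta + offset] != n:
        p += 1
        for k in range(-p, delta):                # diagonals below delta, ascending
            step(k)
        for k in range(delta + p, delta, -1):     # diagonals above delta, descending
            step(k)
        step(delta)
    return delta + 2 * p                          # D = (N - M) + 2P
```

For example, `onp_distance("kitten", "sitting")` gives 5, since the two words share a longest common subsequence of length 4 and 6 + 7 − 2·4 = 5.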

Journal of the ACM | 1992

A Four Russians algorithm for regular expression pattern matching

Gene Myers

Given a regular expression R of length P and a word A of length N, the membership problem is to determine if A is in the language denoted by R. An O(PN/lg N) time algorithm is presented that is based on a lg N speedup of the standard O(PN) time simulation of R's nondeterministic finite automaton on A using a combination of the node-listing and “Four-Russians” paradigms. This result places a new worst-case upper bound on regular expression pattern matching. Moreover, in practice the method provides an implementation that is faster than existing software for small regular expressions.
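
The baseline that the abstract speeds up, the standard O(PN) state-set simulation of an NFA, can be sketched as follows. This is not the Four Russians method itself. The NFA is a hypothetical hand-built one for (a|b)*abb; building it from R (for example with Thompson's construction, which yields O(P) states and edges) is omitted. Each character costs time proportional to the number of edges, so N characters cost O(PN).

```python
from typing import Dict, List, Set, Tuple

Moves = Dict[int, List[Tuple[str, int]]]   # state -> [(symbol, next_state)]
Eps = Dict[int, List[int]]                 # state -> epsilon successors

def eps_closure(states: Set[int], eps: Eps) -> Set[int]:
    """All states reachable from `states` using only epsilon moves."""
    seen = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, []):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(word: str, start: int, accept: int, moves: Moves, eps: Eps) -> bool:
    """Membership test by tracking the set of active NFA states, one character at a time."""
    current = eps_closure({start}, eps)
    for ch in word:
        nxt = {t for s in current for (sym, t) in moves.get(s, []) if sym == ch}
        current = eps_closure(nxt, eps)
        if not current:          # no live states left, so the word cannot be accepted
            return False
    return accept in current

# Hypothetical hand-built NFA for (a|b)*abb: state 0 loops on a/b, then 4 -a-> 5 -b-> 6 -b-> 7.
MOVES: Moves = {0: [("a", 0), ("b", 0)], 4: [("a", 5)], 5: [("b", 6)], 6: [("b", 7)]}
EPS: Eps = {0: [4]}
```

With these tables, `nfa_accepts("babaabb", 0, 7, MOVES, EPS)` is true and `nfa_accepts("abba", 0, 7, MOVES, EPS)` is false.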

Research in Computational Molecular Biology | 1997

Progressive multiple alignment with constraints

Gene Myers; Sanford Selznick; Zheng Zhang; Webb Miller

A progressive alignment algorithm produces a multi-alignment of a set of sequences by repeatedly aligning pairs of sequences and/or previously generated alignments. We describe a method for guaranteeing that the alignment generated by a progressive alignment strategy satisfies a user-specified collection of constraints about where certain sequence positions should appear relative to others. Given a collection of constraints over sequences whose total length is , our algorithm takes time. An alignment of the -like globin gene clusters of several mammals illustrates the practicality of the method.

Combinatorial Pattern Matching | 1998

A fast bit-vector algorithm for approximate string matching based on dynamic programming

Gene Myers

The approximate string matching problem is to find all locations at which a query of length m matches a substring of a text of length n with k-or-fewer differences. Simple and practical bit-vector algorithms have been designed for this problem, most notably the one used in agrep. These algorithms compute a bit representation of the current state-set of the k-difference automaton for the query, and asymptotically run in O(nmk/w) time where w is the word size of the machine (e.g. 32 or 64 in practice). Here we present an algorithm of comparable simplicity that requires only O(nm/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the problem. Thus the algorithm's performance is independent of k, and it is found to be more efficient than the previous results for many choices of k and small m.
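
A minimal Python sketch of the bit-vector recurrences, following the widely published formulation with per-character match masks (`peq`) and vertical delta vectors (`pv`/`mv` for +1/−1 differences down a column). Names are our own choice. Python integers stand in for machine words, so this assumes m ≤ w is not a concern and does not show the blocking needed for m > w. It reports 0-based text positions where a match with at most k differences ends.

```python
from typing import Dict, List

def myers_search(query: str, text: str, k: int) -> List[int]:
    """End positions j in text where some substring ending at j is within k edits of query."""
    m = len(query)
    if m == 0:
        raise ValueError("query must be non-empty")
    mask = (1 << m) - 1
    high = 1 << (m - 1)              # bit for the last row of the DP column
    peq: Dict[str, int] = {}         # bit i set iff query[i] == c
    for i, c in enumerate(query):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, score = mask, 0, m       # column 0: every vertical delta is +1
    ends: List[int] = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)    # horizontal +1 deltas
        mh = pv & xh                     # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask            # shift in 0: a match may start anywhere in the text
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends
```

For instance, `myers_search("abc", "xxabcxx", 1)` returns `[3, 4, 5]`. Shifting in a 1 instead of 0 for `ph` would turn the same loop into a global edit-distance computation.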

Information Processing Letters | 1995

Approximately matching context-free languages

Gene Myers

Given a string w and a pattern p, approximate pattern matching merges traditional sequence comparison and pattern matching by asking for the minimum difference between w and a string exactly matched by p. We give an $O(PN^2(N + \log P))$ algorithm for approximately matching a string of length N and a context-free language specified by a grammar of size P. The algorithm generalizes the Cocke-Younger-Kasami algorithm for determining membership in a context-free language. We further sketch an O(P5N88p) algorithm for the problem where gap costs are concave and pose two open problems for such general comparison cost models.
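
For reference, the exact-membership Cocke-Younger-Kasami (CYK) algorithm that this work generalizes can be sketched as below. The approximate version, which replaces nonterminal sets with minimum costs and accounts for insertions and deletions, is not shown. The grammar is a hypothetical one in Chomsky normal form for { aⁿbⁿ : n ≥ 1 }, and the rule encoding is our own choice. Running time is O(R·n³) for R binary rules.

```python
from typing import List, Set, Tuple

TerminalRule = Tuple[str, str]             # (A, 'a')      for  A -> 'a'
BinaryRule = Tuple[str, Tuple[str, str]]   # (A, (B, C))   for  A -> B C

def cyk_member(word: str, start: str,
               terminal_rules: List[TerminalRule],
               binary_rules: List[BinaryRule]) -> bool:
    n = len(word)
    if n == 0:
        return False  # this sketch assumes the grammar has no S -> epsilon rule
    # table[i][l - 1] = set of nonterminals deriving the substring word[i:i + l]
    table: List[List[Set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {lhs for lhs, t in terminal_rules if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):   # left part word[i:i+split], right part the rest
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, (b, c) in binary_rules:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# Hypothetical CNF grammar: S -> A B | A T,  T -> S B,  A -> 'a',  B -> 'b'
TERMINALS: List[TerminalRule] = [("A", "a"), ("B", "b")]
BINARIES: List[BinaryRule] = [("S", ("A", "B")), ("S", ("A", "T")), ("T", ("S", "B"))]
```

Here `cyk_member("aaabbb", "S", TERMINALS, BINARIES)` is true, while `"abab"` is rejected.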

Combinatorial Pattern Matching | 1997

Estimating the Probability of Approximate Matches

Stefan Kurtz; Gene Myers

While considerable effort has been spent and some progress has been made on developing an analytic formula for the probability of an approximate match, such work has not achieved fruition [4, 6, 2, 1]. Therefore, we consider here the development of an unbiased estimation procedure for determining said probability given a specific string P ∈ Σ* and a specific cost function δ for weighting edit operations. Problems of this type are of general interest; see, for example, a recent paper [5] giving an unbiased estimator for counting the words of a fixed length in a regular language. We were further motivated by a particular application arising in the pattern matching system Anrep designed by us for use in genomic sequence analysis [8, 11]. Anrep accomplishes a search for a complex pattern by backtracking over subprocedures that find approximate matches. The subpatterns are searched in an order that attempts to minimize the expected running time of the search. Determining this optimal backtrack order requires a reasonably accurate estimate of the probability with which one will find an approximate match to each subpattern. Given that the probabilities involved are frequently 10 or less, the simple expedient of measuring match frequency over a random text of several thousand characters has been less than satisfactory. The unbiased estimator herein is shown to give good results in a matter of a thousand samples even for small probability patterns. Thus it is expected to improve the performance of Anrep and may have utility in estimating the significance of similarity searches. Proceeding formally, suppose we are given

Journal of Computational Biology | 2010

Error Tolerant Indexing and Alignment of Short Reads with Covering Template Families

Eldar Giladi; John Healy; Gene Myers; Chris Hart; Philipp Kapranov; Doron Lipson; Steve Roels; Edward C. Thayer; Stan Letovsky

The rapid adoption of high-throughput next generation sequence data in biological research is presenting a major challenge for sequence alignment tools—specifically, the efficient alignment of vast amounts of short reads to large references in the presence of differences arising from sequencing errors and biological sequence variations. To address this challenge, we developed a short read aligner for high-throughput sequencer data that is tolerant of errors or mutations of all types—namely, substitutions, deletions, and insertions. The aligner utilizes a multi-stage approach in which template-based indexing is used to identify candidate regions for alignment with dynamic programming. A template is a pair of gapped seeds, with one used with the read and one used with the reference. In this article, we focus on the development of template families that yield error-tolerant indexing up to a given error-budget. A general algorithm for finding those families is presented, and a recursive construction that creates families with higher error tolerance from ones with a lower error tolerance is developed.
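
The idea of a template as a pair of gapped seeds can be illustrated with a small sketch. It is not the paper's template families or its error-budget construction; the seed strings, function names and example sequences are hypothetical. A seed such as `"11011"` samples the positions marked `1`; using a different seed on the read than on the reference lets an exact key match survive an indel.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

def seed_key(seq: str, pos: int, seed: str) -> Optional[str]:
    """Characters of seq under the '1' positions of seed placed at pos; None if the seed overruns seq."""
    if pos + len(seed) > len(seq):
        return None
    return "".join(seq[pos + i] for i, bit in enumerate(seed) if bit == "1")

def candidate_hits(read: str, reference: str,
                   read_seed: str, ref_seed: str) -> List[Tuple[int, int]]:
    """(read_pos, ref_pos) pairs whose keys under the two seeds agree exactly."""
    if read_seed.count("1") != ref_seed.count("1"):
        raise ValueError("both seeds must sample the same number of positions")
    index: Dict[str, List[int]] = defaultdict(list)   # reference key -> start positions
    for j in range(len(reference)):
        key = seed_key(reference, j, ref_seed)
        if key is not None:
            index[key].append(j)
    hits: List[Tuple[int, int]] = []
    for i in range(len(read)):
        key = seed_key(read, i, read_seed)
        if key is not None:
            hits.extend((i, j) for j in index.get(key, []))
    return hits
```

For the read `"CGTA"` and reference `"TTCGATAGG"`, which carries an extra `A` inside the matching region, the pair (`"1111"`, `"11011"`) yields the candidate `[(0, 2)]`. The same contiguous seed on both sides (`"1111"`, `"1111"`) finds nothing. Candidates would then be verified with dynamic programming, as the abstract describes.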

Journal of Computational Biology | 2003

A table-driven, full-sensitivity similarity search algorithm

Gene Myers; Richard Durbin

Searching a database for a local alignment to a query under a typical scoring scheme, such as PAM120 or BLOSUM62 with affine gap costs, is a computation that has resisted algorithmic improvement due to its basis in dynamic programming and the weak nature of the signals being searched for. In a query preprocessing step, a set of tables can be built that permit one to (a) eliminate a large fraction of the dynamic programming matrix from consideration and (b) compute several steps of the remainder with a single table lookup. While this result is not an asymptotic improvement over the original Smith-Waterman algorithm, its complexity is characterized in terms of some sparse features of the matrix and it yields the fastest software implementation to date for such searches.
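
For comparison, the plain dynamic-programming baseline (Smith-Waterman local alignment with affine gaps, using Gotoh's three-matrix recurrence) can be sketched as follows. This is not the table-driven method. A caller-supplied `score` function stands in for PAM120 or BLOSUM62, and the gap convention (a gap of length L costs `gap_open + L * gap_extend`) is our assumption.

```python
from typing import Callable

def smith_waterman_affine(a: str, b: str, score: Callable[[str, str], int],
                          gap_open: int, gap_extend: int) -> int:
    """Best local alignment score of a and b; a gap of length L costs gap_open + L * gap_extend."""
    neg = -(10 ** 9)                              # stands in for minus infinity
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]     # best score of an alignment ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]   # ... ending with a gap in a (consumes b[j-1])
    F = [[neg] * (m + 1) for _ in range(n + 1)]   # ... ending with a gap in b (consumes a[i-1])
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open - gap_extend)
            H[i][j] = max(0, H[i - 1][j - 1] + score(a[i - 1], b[j - 1]), E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With a toy scheme of +2 per match and −1 per mismatch, `gap_open=3` and `gap_extend=1`, aligning `"ACGTACGT"` with `"ACGACGT"` scores 10: seven matches minus one gap of length 1. Every cell of the n × m matrix is filled here, which is exactly the work the table-driven approach tries to avoid.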

Archive | 1996

Combinatorial pattern matching

Daniel S. Hirschberg; Gene Myers

This book is the proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching, held June 10-12, 1996. It contains articles on the development and optimization of algorithms for the solution of problems related to pattern recognition, including in particular applications to DNA sequencing. Separate abstracts have been submitted for articles from this database.

Symposium on Discrete Algorithms | 1990