Sun Wu | Researchain

Publication

Featured research published by Sun Wu.

Communications of the ACM | 1992

Fast text searching: allowing errors

Sun Wu; Udi Manber

The string-matching problem is a very common problem. We are searching for a string P = p_1 p_2 ... p_m inside a large text file T = t_1 t_2 ... t_n, both sequences of characters from a finite character set Σ. The characters may be English characters in a text file, DNA base pairs, lines of source code, angles between edges in polygons, machines or machine parts in a production schedule, music notes and tempo in a musical score, and so forth. We want to find all occurrences of P in T; namely, we are searching for the set of starting positions F = {i | 1 ≤ i ≤ n − m + 1 such that t_i t_{i+1} ... t_{i+m−1} = P}. The two most famous algorithms for this problem are the Boyer-Moore algorithm [3] and the Knuth-Morris-Pratt algorithm [10]. There are many extensions to this problem; for example, we may be looking for a set of patterns, a pattern with wild cards, or a regular expression. String-matching tools are included in every reasonable text editor, word processor, and many other applications.
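
The set F defined above can be computed directly from its definition. The following is a minimal brute-force sketch, not the Boyer-Moore or Knuth-Morris-Pratt algorithm and not the method of the article; the function name `occurrences` is a hypothetical choice made for illustration.

```python
def occurrences(pattern, text):
    """Return F: the 1-based positions i with t_i ... t_{i+m-1} = P.

    Brute-force check of every candidate position, O(n * m) in the
    worst case. Illustrative only; the name is an assumption.
    """
    m, n = len(pattern), len(text)
    # range(n - m + 1) yields 0-based starts; add 1 to match the
    # 1-based indexing used in the definition of F.
    return [i + 1 for i in range(n - m + 1) if text[i:i + m] == pattern]


if __name__ == "__main__":
    print(occurrences("aba", "ababa"))
```

Overlapping occurrences are reported, since the definition places no restriction on how matches relate to each other.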

Combinatorial Pattern Matching | 1994

Proximity Matching Using Fixed-Queries Trees

Ricardo A. Baeza-Yates; Walter Cunto; Udi Manber; Sun Wu

We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance parameters of fixed-queries trees and experimental results that support the analysis. Fixed-queries trees are particularly efficient for applications in which comparing two elements is expensive.
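
To make the idea concrete, here is a small sketch of a tree in which every node on the same level compares against the same fixed key, so a query needs only one distance computation per level. It is an illustration under assumptions, not the authors' implementation: the dict-and-list representation, the names `build_fqt` and `search_fqt`, the bucket size, and the choice of a padded Hamming distance are all hypothetical.

```python
def hamming(a, b):
    """Mismatches in the overlap plus the length difference (a metric)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def build_fqt(elements, keys, dist, level=0, bucket=2):
    """Build a tree whose level-i nodes all split on keys[i].

    Internal nodes are dicts {distance: subtree}; leaves are lists.
    """
    if len(elements) <= bucket or level == len(keys):
        return list(elements)
    groups = {}
    for e in elements:
        groups.setdefault(dist(e, keys[level]), []).append(e)
    return {d: build_fqt(g, keys, dist, level + 1, bucket)
            for d, g in groups.items()}


def search_fqt(tree, keys, dist, query, radius):
    """Return all stored elements within `radius` of `query`."""
    # One distance computation per level, shared by every node on it.
    qd = [dist(query, k) for k in keys]
    found = []

    def visit(node, level):
        if isinstance(node, list):
            # Leaf bucket: compare the query with each element directly.
            found.extend(e for e in node if dist(query, e) <= radius)
            return
        for d, child in node.items():
            # Triangle inequality: a branch at distance d from the key
            # can hold a match only if |d - qd[level]| <= radius.
            if abs(d - qd[level]) <= radius:
                visit(child, level + 1)

    visit(tree, 0)
    return found


if __name__ == "__main__":
    words = ["book", "cook", "boot", "cake", "lake", "bake", "look"]
    keys = ["book", "cake"]  # hypothetical fixed keys, one per level
    tree = build_fqt(words, keys, hamming)
    print(sorted(search_fqt(tree, keys, hamming, "took", 1)))
```

The pruning step relies only on the triangle inequality, which is why the abstract requires it of the distance function.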

Information Processing Letters | 1990

An O(NP) sequence comparison algorithm

Sun Wu; Udi Manber; Gene Myers; Webb Miller

Let A and B be two sequences of length M and N respectively, where without loss of generality N ≥ M, and let D be the length of a shortest edit script (consisting of insertions and deletions) between them. A parameter related to D is the number of deletions in such a script, P = D/2 − (N − M)/2. We present an algorithm for finding a shortest edit distance of A and B whose worst-case running time is O(NP) and whose expected running time is O(N + PD). The algorithm is simple and is very efficient whenever A is similar to a subsequence of B. It is nearly twice as fast as the O(ND) algorithm of Myers, and much more efficient when A and B differ substantially in length.
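
The relation D = (N − M) + 2P suggests searching over increasing values of P until the end of both sequences is reached. The sketch below follows the commonly described furthest-point formulation of this idea; it is an illustrative reconstruction under that assumption, not code from the paper, and the names `onp_edit_distance`, `fp`, and `snake` are choices made here.

```python
def onp_edit_distance(a, b):
    """Length D of a shortest insert/delete edit script between a and b.

    Iterates over P = 0, 1, 2, ... and returns (N - M) + 2P for the
    first P at which the end of both sequences is reachable.
    """
    if len(a) > len(b):
        a, b = b, a  # ensure N >= M, as the abstract assumes
    m, n = len(a), len(b)
    delta = n - m
    offset = m + 1  # shifts diagonal k in [-(m+1), n+1] to a list index
    fp = [-1] * (m + n + 3)  # furthest y reached on each diagonal

    def snake(k, y):
        # Follow matching characters along diagonal k (where y - x = k).
        x = y - k
        while x < m and y < n and a[x] == b[y]:
            x += 1
            y += 1
        return y

    def step(k):
        y = max(fp[k - 1 + offset] + 1, fp[k + 1 + offset])
        fp[k + offset] = snake(k, y)

    p = 0
    while True:
        for k in range(-p, delta):             # diagonals below delta
            step(k)
        for k in range(delta + p, delta, -1):  # diagonals above delta
            step(k)
        step(delta)                            # the target diagonal
        if fp[delta + offset] == n:
            return delta + 2 * p
        p += 1


if __name__ == "__main__":
    print(onp_edit_distance("kitten", "sitting"))
```

Because substitutions are not allowed, the result equals M + N minus twice the length of a longest common subsequence, which gives a convenient cross-check.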

Algorithmica | 1996

A subquadratic algorithm for approximate limited expression matching

Sun Wu; Udi Manber; G. Myers

In this paper we present an efficient subquadratic-time algorithm for matching strings and limited expressions in large texts. Limited expressions are a subset of regular expressions that appear often in practice. The generalization from simple strings to limited expressions has a negligible effect on the speed of our algorithm, yet allows much more flexibility. Our algorithm is similar in spirit to that of Masek and Paterson [MP], but it is much faster in practice. Our experiments show a factor of four to five speedup against the algorithms of Sellers [Se] and Ukkonen [Uk1] independent of the sizes of the input strings. Experiments also reveal our algorithm to be faster, in most cases, than a recent improvement by Chang and Lampe [CL2], especially for small alphabet sizes for which it is two to three times faster.

Information Processing Letters | 1994

An algorithm for approximate membership checking with application to password security

Udi Manber; Sun Wu

Given a large set of words W, we want to be able to determine quickly whether a query word q is close to any word in the set. A new data structure is presented that allows such queries to be answered very quickly even for huge sets if the words are not too long and the query is quite close. The major application is in limiting password guessing by verifying, before a password is approved, that the password is not too close to a dictionary word. Other applications include spelling correction of bibliographic files and approximate matching.

Journal of Algorithms | 1995

A subquadratic algorithm for approximate regular expression matching

Sun Wu; Udi Manber; Eugene W. Myers

The main result of this paper is an algorithm for approximate matching of a regular expression of size m in a text of size n in time O(nm / log_{d+2} n), where d is the number of allowed errors. This algorithm is the first o(mn) algorithm for approximate matching to regular expressions.

Operations Research Letters | 1992

An algorithm for min-cost edge-disjoint cycles and its applications

Sun Wu; Udi Manber

The problems of finding minimum-cost and maximum-cost sets of edge-disjoint cycles in a weighted undirected graph are studied. The importance of this problem is that it presents a middle station in two reductions for planar graphs - one between the max-cut problem and that of max-weight matching, and the other between the Chinese Postman Problem and max-weight matching. The introduction of negative edge costs makes the reductions simple and efficient. We obtain new simpler algorithms for these two problems for planar graphs (where the max-weight matching problem can be solved very efficiently). We conclude that, in the case of planar weighted graphs (with arbitrary costs), all the three problems are mutually reducible and equivalent in terms of complexity.

Archive | 1999