Oren Kapah | Researchain

Publication

Featured research published by Oren Kapah.

International Conference on Computational Linguistics | 2004

Efficient unsupervised recursive word segmentation using minimum description length

Shlomo Argamon; Navot Akiva; Amihood Amir; Oren Kapah

Automatic word segmentation is a basic requirement for unsupervised learning in morphological analysis. In this paper, we formulate a novel recursive method for minimum description length (MDL) word segmentation, whose basic operation is resegmenting the corpus on a prefix (equivalently, a suffix). We derive a local expression for the change in description length under resegmentation, i.e., one which depends only on properties of the specific prefix (not on the rest of the corpus). Such a formulation permits use of a new and efficient algorithm for greedy morphological segmentation of the corpus in a recursive manner. In particular, our method does not restrict words to be segmented only once, into a stem+affix form, as do many extant techniques. Early results for English and Turkish corpora are promising.

SIAM Journal on Computing | 2009

On the Cost of Interchange Rearrangement in Strings

Amihood Amir; Tzvika Hartman; Oren Kapah; Avivit Levy; Ely Porat

Consider the following optimization problem: given two strings over the same alphabet, transform one into another by a succession of interchanges of two elements. In each interchange the two participating elements exchange positions. An interchange is given a weight that depends on the distance in the string between the two exchanged elements. The object is to minimize the total weight of the interchanges. This problem is a generalization of a classical problem on permutations (where every element appears once). The generalization considers general strings with possibly repeating elements, and a function assigning weights to the interchanges. The generalization to general strings (with unit weights) was mentioned by Cayley in the 19th century, and its complexity has been an open question since. We solve this open problem and consider various weight functions as well.
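For the classical special case that this abstract generalizes (a permutation, where every element appears once, with unit weight per interchange), the minimum number of interchanges equals the number of elements minus the number of cycles of the permutation. The sketch below illustrates only that textbook special case; it is not the paper's algorithm for general strings or weighted interchanges, and the function name `min_interchanges` is a hypothetical label chosen for illustration.

```python
def min_interchanges(source, target):
    """Minimum number of unit-cost interchanges turning source into target.

    Assumes every element is distinct (the permutation case only).
    """
    if len(set(source)) != len(source) or sorted(source) != sorted(target):
        raise ValueError("inputs must be permutations of the same distinct elements")

    # destination[i] is the index at which the element now at position i must end up.
    position_in_target = {element: i for i, element in enumerate(target)}
    destination = [position_in_target[element] for element in source]

    # Count the cycles of the permutation; a cycle of length k needs k - 1 interchanges.
    seen = [False] * len(destination)
    cycles = 0
    for start in range(len(destination)):
        if seen[start]:
            continue
        cycles += 1
        j = start
        while not seen[j]:
            seen[j] = True
            j = destination[j]

    return len(destination) - cycles


print(min_interchanges("bcda", "abcd"))  # one cycle of length 4 -> 3
print(min_interchanges("badc", "abcd"))  # two cycles of length 2 -> 2
```

With repeated elements the destination of each element is no longer unique, so this cycle count does not carry over directly to general strings.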

Theoretical Computer Science | 2008

Generalized LCS

Amihood Amir; Tzvika Hartman; Oren Kapah; B. Riva Shalom; Dekel Tsur

The Longest Common Subsequence (LCS) is a well-studied problem, having a wide range of implementations. Its motivation is in comparing strings. It has long been of interest to devise a similar measure for comparing higher dimensional objects, and more complex structures. In this paper we study the Longest Common Substructure of two matrices and show that this problem is NP-hard. We also study the Longest Common Subforest problem for multiple trees, including a constrained version. We show NP-hardness for k>2 unordered trees in the constrained LCS. We also give polynomial time algorithms for ordered trees and prove a lower bound for any decomposition strategy for k trees.
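As background for the one-dimensional problem that the abstract starts from, here is a minimal sketch of the standard dynamic program for the LCS length of two strings. It does not cover the matrix or tree generalizations studied in the paper; the name `lcs_length` is an illustrative choice.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Classic O(len(a) * len(b)) dynamic program, keeping only two rows.
    """
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence of the two shorter prefixes.
                current.append(previous[j - 1] + 1)
            else:
                # Drop the last symbol of one prefix or the other.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```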

Combinatorial Pattern Matching | 2004

Faster two dimensional pattern matching with rotations

Amihood Amir; Oren Kapah; Dekel Tsur

The most efficient currently known algorithms for two dimensional matching with rotation have a worst case time complexity of O(n^2m^3), where the size of the text is n^2 and the size of the pattern is m^2. In this paper we present two algorithms for the two dimensional rotated matching problem whose running time is O(n^2m^2). The preprocessing time of the first algorithm is O(m^5) and the preprocessing time of the second algorithm is O(m^4).

Combinatorial Pattern Matching | 2008

Approximate String Matching with Address Bit Errors

Amihood Amir; Yonatan Aumann; Oren Kapah; Avivit Levy; Ely Porat

A string S ∈ Σ^m can be viewed as a set of pairs S = { (σ_i, i) : i ∈ {0, ..., m−1} }. We consider approximate pattern matching problems arising from the setting where errors are introduced to the location component (i), rather than the more traditional setting, where errors are introduced to the content itself (σ_i). In this paper, we consider the case where bits of i may be erroneously flipped, either in a consistent or transient manner. We formally define the corresponding approximate pattern matching problems, and provide efficient algorithms for their resolution, while introducing some novel techniques.
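To make the "errors in the address" view concrete, the sketch below moves each symbol to the address obtained by flipping a fixed set of bits of its index. It assumes that "consistent" means the same bit positions are flipped in every address, which is one reading of the abstract and not a definition taken from the paper. The helper names and the brute-force search over masks are hypothetical illustrations, not the paper's algorithms.

```python
def flip_address_bits(s, mask):
    """Move the symbol at address i to address i XOR mask.

    Assumes len(s) is a power of two so that every flipped address is valid.
    """
    m = len(s)
    if m == 0 or m & (m - 1) != 0:
        raise ValueError("length must be a power of two")
    if not 0 <= mask < m:
        raise ValueError("mask must fit in the address width")
    out = [None] * m
    for i, symbol in enumerate(s):
        out[i ^ mask] = symbol
    return "".join(out)


def recover_mask(received, original):
    """Brute force: try every mask and return the first that restores original."""
    if len(received) != len(original):
        return None
    for mask in range(len(original)):
        if flip_address_bits(received, mask) == original:
            return mask
    return None


corrupted = flip_address_bits("abcdefgh", 0b100)
print(corrupted)                            # efghabcd
print(recover_mask(corrupted, "abcdefgh"))  # 4
```

Because XOR with a fixed mask is its own inverse, applying the same mask twice restores the original string.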

European Symposium on Algorithms | 2007

On the cost of interchange rearrangement in strings

Amihood Amir; Tzvika Hartman; Oren Kapah; Avivit Levy; Ely Porat

An underlying assumption in the classical sorting problem is that the sorter does not know the index of every element in the sorted array. Thus, comparisons are used to determine the order of elements, while the sorting is done by interchanging elements. In the closely related interchange rearrangement problem, final positions of elements are already given, and the cost of the rearrangement is the cost of the interchanges. This problem was studied only for the limited case of permutation strings, where every element appears once. This paper studies a generalization of the classical and well-studied problem on permutations by considering general strings input, thus solving an open problem of Cayley from 1849, and examining various cost models.

Theoretical Computer Science | 2006

Faster two-dimensional pattern matching with rotations

Amihood Amir; Oren Kapah; Dekel Tsur

The most efficient currently known algorithms for two-dimensional pattern matching with rotations have a worst case time complexity of O(n^2m^3), where the size of the text is n × n and the size of the pattern is m × m. In this paper we present a new algorithm for the problem whose running time is O(n^2m^2).

Theoretical Computer Science | 2009

Interchange rearrangement: The element-cost model

Oren Kapah; Gad M. Landau; Avivit Levy; Nitsan Oz

Given an input string S and a target string T when S is a permutation of T, the interchange rearrangement problem is to apply on S a sequence of interchanges, such that S is transformed into T. The interchange operation exchanges the position of the two elements on which it is applied. The goal is to transform S into T at the minimum cost possible, referred to as the distance between S and T. The distance can be defined by several cost models that determine the cost of every operation. There are two known models: The Unit-cost model and the Length-cost model. In this paper, we suggest a natural cost model: The Element-cost model. In this model, the cost of an operation is determined by the elements that participate in it. Though this model has been studied in other fields, it has never been considered in the context of rearrangement problems. We consider both the special case where all elements in S and T are distinct, referred to as a permutation string, and the general case, referred to as a general string. An efficient optimal algorithm for the permutation string case and efficient approximation algorithms for the general string case, which is NP-hard, are presented. The study is broadened to include the transposition rearrangement problem under the Element-cost model and under the other known models, in order to provide additional perspective on the new model.

Combinatorial Pattern Matching | 2007

Deterministic length reduction: fast convolution in sparse data and applications

Amihood Amir; Oren Kapah; Ely Porat

In this paper a deterministic algorithm for the length reduction problem is presented. This algorithm enables a new tool for performing fast convolution in sparse data. The proposed algorithm performs the convolution in O(n_1 log^3 n_1) time, where n_1 is the number of non-zero values in V_1. The algorithm assumes that V_1 is given in advance and that V_2 is given at run time.
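For orientation, the sketch below shows what a convolution of two sparse vectors computes, using the naive method that multiplies every pair of non-zero entries. It is a baseline for comparison only and is not the length reduction algorithm of the paper. Representing each vector as a dictionary from index to value, and the name `sparse_convolve`, are assumptions made for this example.

```python
from collections import defaultdict


def sparse_convolve(v1, v2):
    """Convolution of two sparse vectors given as {index: value} dictionaries.

    Naive baseline: time proportional to (non-zeros in v1) * (non-zeros in v2),
    independent of how far apart the non-zero indices are.
    """
    result = defaultdict(int)
    for i, a in v1.items():
        for j, b in v2.items():
            # The pair (i, j) contributes a * b to output index i + j.
            result[i + j] += a * b
    # Keep the output sparse by dropping entries that cancelled to zero.
    return {k: value for k, value in result.items() if value != 0}


v1 = {0: 1, 1000: 2}
v2 = {5: 3, 1000: 4}
print(sparse_convolve(v1, v2))  # {5: 3, 1000: 4, 1005: 6, 2000: 8}
```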

String Processing and Information Retrieval | 2008

Interchange Rearrangement: The Element-Cost Model

Oren Kapah; Gad M. Landau; Avivit Levy; Nitsan Oz

Given an input string S and a target string T when S is a permutation of T, the interchange rearrangement problem is to apply on S a sequence of interchanges, such that S is transformed into T. The interchange operation exchanges the position of the two elements on which it is applied. The goal is to transform S into T at the minimum cost possible, referred to as the distance between S and T. The distance can be defined by several cost models that determine the cost of every operation. There are two known models: The Unit-cost model and the Length-cost model. In this paper, we suggest a natural cost model: The Element-cost model. In this model, the cost of an operation is determined by the elements that participate in it. Though this model has been studied in other fields, it has never been considered in the context of rearrangement problems. We consider both the special case where all elements in S and T are distinct, referred to as a permutation string, and the general case, referred to as a general string. An efficient optimal algorithm for the permutation string case and efficient approximation algorithms for the general string case, which is