Publication


Featured research published by Yaacov Choueka.


Journal of Computer and System Sciences | 1974

Theories of automata on ω-tapes: A simplified approach

Yaacov Choueka

Using a combinatorial lemma on regular sets, and a technique of attaching a control unit to a parallel battery of finite automata, a simple and transparent development of McNaughton's theory of automata on ω-tapes is given. The lemma and the technique are then used to give an independent and equally simple development of Büchi's theory of nondeterministic automata on these tapes. Some variants of these models are also studied. Finally a third independent approach, modelled after a simplified version of Rabin's theory of automata on infinite trees, is developed.
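
As background for readers new to ω-automata, here is a minimal, self-contained sketch (not taken from the paper) of the acceptance condition involved: a Büchi automaton accepts an infinite word iff some run visits an accepting state infinitely often. For an ultimately periodic word u·v·v·v… and a deterministic automaton, this is decidable by running the automaton until the v-blocks start to cycle. The state names and example automaton below are invented for illustration.

```python
from typing import Dict, Set, Tuple

def run(delta: Dict[Tuple[str, str], str], state: str, word: str):
    """Run the deterministic automaton from `state` over the finite `word`,
    returning the final state and the set of states visited on the way."""
    visited = set()
    for symbol in word:
        state = delta[state, symbol]
        visited.add(state)
    return state, visited

def accepts_lasso(delta, start, accepting: Set[str], u: str, v: str) -> bool:
    """Does the automaton accept the omega-word u v v v ... ?"""
    state, _ = run(delta, start, u)          # consume the finite prefix u
    seen = {}      # entry state of each v-block -> its index
    blocks = []    # states visited inside each v-block
    while state not in seen:                 # must cycle: finitely many states
        seen[state] = len(blocks)
        state, visited = run(delta, state, v)
        blocks.append(visited)
    # The v-blocks from index seen[state] onward repeat forever; accept iff
    # one of them passes through an accepting state (visited infinitely often).
    return any(visited & accepting for visited in blocks[seen[state]:])

# Invented example over {a, b}: accept words with infinitely many b's.
delta = {("qa", "a"): "qa", ("qa", "b"): "qb",
         ("qb", "a"): "qa", ("qb", "b"): "qb"}
print(accepts_lasso(delta, "qa", {"qb"}, u="a", v="ab"))  # True
print(accepts_lasso(delta, "qa", {"qb"}, u="b", v="a"))   # False
```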


Journal of Computer and System Sciences | 1978

Finite automata, definable sets, and regular expressions over ω^n-tapes

Yaacov Choueka

The theory of finite automata and regular expressions over a finite alphabet Σ is here generalized to infinite tapes X = X₁ … X_k, where the X_i are themselves tapes of length ω^n, for some n ⩾ 0. Closure under the usual set-theoretical operations is established, and the equivalence of deterministic and nondeterministic automata is proved. A Kleene-type characterization of the definable sets is given, and finite-length generalized regular expressions are developed for finitely denoting these sets. Decision problems are treated; a characterization of regular tapes by multiperiodic sets is specified. Characterization by equivalence relations is discussed while stressing dissimilarities with the finite case.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1986

Improved hierarchical bit-vector compression in document retrieval systems

Aviezri S. Fraenkel; Shmuel T. Klein; Yaacov Choueka; E. Segal

The “concordance” of an information retrieval system can often be stored in the form of bit-maps, which are usually very sparse and should be compressed. Hierarchical bit-vector compression consists of partitioning a vector v_i into equi-sized blocks, constructing a new bit-vector v_{i+1} which points to the non-zero blocks in v_i, dropping the zero-blocks of v_i, and repeating the process for v_{i+1}. We refine the method by pruning some of the tree branches if they ultimately point to very few documents; these document numbers are then added to an appended list which is compressed by the prefix-omission technique. The new method was thoroughly tested on the bit-maps of the Responsa Retrieval Project, and gave a relative improvement of about 40% over the conventional hierarchical compression method.
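
The plain hierarchical scheme described in the first half of the abstract can be illustrated in a few lines. The following is a minimal sketch under assumed parameters (the block size of 4 and the list-of-bits representation are illustrative choices, not taken from the paper), without the pruning refinement the authors add:

```python
BLOCK = 4  # bits per block; an illustrative choice, real systems would tune it

def compress(bits):
    """Build the hierarchy v_1, v_2, ...: each level stores only the non-zero
    blocks of the level below, plus a small top-level vector kept verbatim."""
    levels = []
    while len(bits) > BLOCK:
        if len(bits) % BLOCK:                      # pad to whole blocks
            bits = bits + [0] * (BLOCK - len(bits) % BLOCK)
        blocks = [bits[i:i + BLOCK] for i in range(0, len(bits), BLOCK)]
        levels.append([b for b in blocks if any(b)])   # drop the zero-blocks
        bits = [1 if any(b) else 0 for b in blocks]    # v_{i+1}: block map
    levels.append(bits)
    return levels

def decompress(levels):
    bits = levels[-1]
    for stored in reversed(levels[:-1]):
        it = iter(stored)
        blocks = [next(it) if flag else [0] * BLOCK for flag in bits]
        bits = [bit for block in blocks for bit in block]
    return bits

v = [0] * 9 + [1] + [0] * 20 + [1, 1] + [0] * 32   # a sparse 64-bit vector
assert decompress(compress(v))[:len(v)] == v
```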


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1985

Efficient variants of Huffman codes in high level languages

Yaacov Choueka; Shmuel T. Klein; Yehoshua Perl

Although it is well-known that Huffman codes are optimal for text compression in a character-per-character encoding scheme, they are seldom used in practical situations since they require a bit-per-bit decoding algorithm, which has to be written in some assembly language, and will perform rather slowly. A number of methods are presented that avoid these difficulties. The decoding algorithms efficiently process the encoded string on a byte-per-byte basis, are faster than the original algorithm, and can be programmed in any high level language. This is achieved at the cost of storing some tables in the internal memory, but with no loss in the compression savings of the optimal Huffman codes. The internal memory space needed can be reduced either at the cost of increased processing time, or by using non-binary Huffman codes, which give sub-optimal compression. Experimental results for English and Hebrew text are also presented.
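
To make the table-driven idea concrete, here is a hedged sketch of byte-per-byte Huffman decoding: for every decoder state (a position inside the code tree) and every possible input byte, a table precomputes the emitted symbols and the next state, so the inner decoding loop never manipulates individual bits. The toy code, table layout, and function names are illustrative assumptions, not the paper's actual data structures:

```python
# An invented toy code; in practice it comes from Huffman's construction.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

def build_tree(code):
    """Decoding tree as nested dicts: inner nodes map '0'/'1' to a child,
    leaves are the decoded symbols."""
    root = {}
    for sym, bits in code.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
        node[bits[-1]] = sym
    return root

def build_table(code):
    """Precompute, for every decoder state (inner node) and byte value, the
    symbols decoded and the follow-up state, so decoding never touches bits."""
    root = build_tree(code)
    states = []
    def collect(node):                     # enumerate inner nodes; 0 = root
        states.append(node)
        for child in node.values():
            if isinstance(child, dict):
                collect(child)
    collect(root)
    index = {id(n): i for i, n in enumerate(states)}
    table = {}
    for i, node in enumerate(states):
        for byte in range(256):
            out, cur = [], node
            for bit in format(byte, "08b"):
                cur = cur[bit]
                if not isinstance(cur, dict):   # reached a leaf: emit, restart
                    out.append(cur)
                    cur = root
            table[i, byte] = ("".join(out), index[id(cur)])
    return table

def decode(table, data):
    out, state = [], 0
    for byte in data:
        symbols, state = table[state, byte]
        out.append(symbols)
    return "".join(out)

table = build_table(CODE)
# "abcd" encodes to 0 10 110 111, packed as 01011011 1 + seven zero pad bits;
# the pad decodes to spurious "a"s (a real decoder tracks the message length).
print(decode(table, bytes([0b01011011, 0b10000000])))  # abcdaaaaaaa
```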


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1988

Compression of concordances in full-text retrieval systems

Yaacov Choueka; Aviezri S. Fraenkel; Shmuel T. Klein

The concordance of a full-text information retrieval system contains, for every different word W of the data base, a list L(W) of “coordinates”, each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only for the savings in storage space, but also in order to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods are presented, which efficiently compress concordances of large full-text retrieval systems. The methods were tested on the concordance of the Responsa Retrieval Project and yield savings of up to 49% relative to the non-compressed file; this is a relative improvement of about 27% over the currently used prefix-omission compression technique.
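
For illustration, the prefix-omission idea mentioned above can be sketched as follows: since the list L(W) is sorted, consecutive coordinates tend to share leading components, so each entry need only record the length of the shared prefix and the differing tail. The tuple layout (document, paragraph, sentence, word) and this particular encoding are illustrative assumptions, not the paper's formats:

```python
def compress(coords):
    """Replace each coordinate by (shared-prefix length, differing tail)
    relative to its predecessor in the sorted list."""
    out, prev = [], ()
    for c in coords:
        shared = 0
        while shared < len(prev) and prev[shared] == c[shared]:
            shared += 1
        out.append((shared, c[shared:]))
        prev = c
    return out

def decompress(entries):
    coords, prev = [], ()
    for shared, tail in entries:
        c = prev[:shared] + tail
        coords.append(c)
        prev = c
    return coords

# Coordinates as (document, paragraph, sentence, word) tuples:
L = [(3, 1, 2, 5), (3, 1, 2, 9), (3, 1, 4, 1), (7, 2, 1, 3)]
packed = compress(L)
# packed == [(0, (3, 1, 2, 5)), (3, (9,)), (2, (4, 1)), (0, (7, 2, 1, 3))]
assert decompress(packed) == L
```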


International Journal of Computer Vision | 2011

Identifying Join Candidates in the Cairo Genizah

Lior Wolf; Rotem Littman; Naama Mayer; Tanya German; Nachum Dershowitz; Roni Shweka; Yaacov Choueka

A join is a set of manuscript-fragments that are known to originate from the same original work. The Cairo Genizah is a collection containing approximately 350,000 fragments of mainly Jewish texts discovered in the late 19th century. The fragments are today spread out in libraries and private collections worldwide, and there is an ongoing effort to document and catalogue all extant fragments. The task of finding joins is currently conducted manually by experts, and presumably only a small fraction of the existing joins have been discovered. In this work, we study the problem of automatically finding candidate joins, so as to streamline the task. The proposed method is based on a combination of local descriptors and learning techniques. To evaluate the performance of various join-finding methods, without relying on the availability of human experts, we construct a benchmark dataset that is modeled on the Labeled Faces in the Wild benchmark for face recognition. Using this benchmark, we evaluate several alternative image representations and learning techniques. Finally, a set of newly-discovered join-candidates has been identified using our method and validated by a human expert.
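
As a rough illustration of the local-descriptor half of such a pipeline (the learning component is omitted), the following sketch scores a candidate pair of fragment images by counting well-separated keypoint matches. It uses OpenCV's SIFT as a stand-in descriptor; the file paths, ratio threshold, and the choice of SIFT itself are assumptions for illustration, not the paper's actual representation:

```python
import cv2

def pair_score(path_a: str, path_b: str) -> int:
    """Crude join-candidate score: the number of keypoint matches that pass
    Lowe's ratio test between the two fragment images."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, desc_a = sift.detectAndCompute(img_a, None)
    _, desc_b = sift.detectAndCompute(img_b, None)
    # Keep a match only if its best distance clearly beats the second best;
    # the 0.75 threshold is a conventional, illustrative choice.
    matches = cv2.BFMatcher().knnMatch(desc_a, desc_b, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    return len(good)

# Hypothetical usage: rank candidate pairs by score, highest first.
# score = pair_score("fragment_0001.png", "fragment_0002.png")
```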


Journal of the ACM | 1978

KEDMA—Linguistic Tools for Retrieval Systems

R. Attar; Yaacov Choueka; Nachum Dershowitz; Aviezri S. Fraenkel

In a full-text natural-language retrieval system, frequent need for automatic linguistic analysis arises, e.g. for keyword expansion in a search process, content analysis, or automatic construction of concordances. The availability of sophisticated linguistic tools, which is highly desirable for languages such as English, is quite imperative for, say, Semitic languages, whose complex morphological structure renders simple-minded and approximate solutions such as suffix stripping totally useless. Sophisticated tools were designed and constructed via the fusion of grammatical analysis and grammatical synthesis, resulting in a set of global files which provide in some sense a complete grammatical and lexical description of the language. These files induce a set of local files which adapt to the database at hand and permit flexible on-line morphological analysis.


Archive | 2000

A comprehensive bilingual word alignment system

Yaacov Choueka; Ehud S. Conley; Ido Dagan

This chapter describes a general, comprehensive and robust word-alignment system and its application to the Hebrew-English language pair. A major goal of the system architecture is to assume as little as possible about its input and about the relative nature of the two languages, while allowing the use of (minimal) specific monolingual pre-processing resources when required. The system thus receives as input a pair of raw parallel texts and requires only a tokeniser (and possibly a lemmatiser) for each language. After tokenisation (and lemmatisation if necessary), a rough initial alignment is obtained for the texts using a version of Fung and McKeown's DK-vec algorithm (Fung and McKeown, 1997; Fung, this volume). The initial alignment is given as input to a version of the word_align algorithm (Dagan, Church and Gale, 1993), an extension of Model 2 of the IBM statistical translation models. Word_align produces a word level alignment for the texts and a probabilistic bilingual dictionary. The chapter describes the details of the system architecture, the algorithms implemented (emphasising implementation details), the issues regarding their application to Hebrew and similar Semitic languages, and some experimental results.
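
The word_align algorithm mentioned above builds on IBM's statistical translation models. As self-contained background (not the system's algorithm, and with an invented toy corpus rather than Hebrew-English data), here is the EM loop of the simpler IBM Model 1, which learns word-translation probabilities t(f|e) from sentence-aligned text:

```python
from collections import defaultdict

# Invented toy corpus of sentence-aligned pairs (source, target).
corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"]),
          (["a", "book"], ["ein", "buch"])]

e_vocab = {e for es, _ in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))   # t[f, e]: uniform start

for _ in range(10):                            # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for es, fs in corpus:
        for f in fs:
            norm = sum(t[f, e] for e in es)
            for e in es:                       # E-step: expected counts
                frac = t[f, e] / norm
                count[f, e] += frac
                total[e] += frac
    for f, e in count:                         # M-step: re-estimate t
        t[f, e] = count[f, e] / total[e]

print(round(t["haus", "house"], 3))            # approaches 1.0
```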


International Conference on Image Processing | 2011

Computerized paleography: Tools for historical manuscripts

Lior Wolf; Liza Potikha; Nachum Dershowitz; Roni Shweka; Yaacov Choueka

The Digital Age has brought with it large-scale digitization of historical records. The modern scholar of history or of other disciplines is often faced today with hundreds of thousands of readily-available and potentially-relevant full or fragmentary documents, but without computer aids that would make it possible to find the sought-after needles in the proverbial haystack of online images. The problems are even more acute when documents are handwritten, since optical character recognition does not provide quality results. We consider two tools: (1) a handwriting matching tool that is used to join together fragments of the same scribe, and (2) a paleographic classification tool that matches a given document to a large set of paleographic samples. Both tools are carefully designed not only to provide a high level of accuracy, but also to provide a clean and concise justification of the inferred results. This last requirement engenders challenges, such as sparsity of the representation, for which existing solutions are inappropriate for document analysis.


International Conference on Computer Vision | 2009

Automatically identifying join candidates in the Cairo Genizah

Lior Wolf; Rotem Littman; Naama Mayer; Nachum Dershowitz; Roni Shweka; Yaacov Choueka

A join is a set of manuscript-fragments that are known to originate from the same original work. The Cairo Genizah is a collection containing approximately 250,000 fragments of mainly Jewish texts discovered in the late 19th century. The fragments are today spread out in libraries and private collections worldwide, and there is an ongoing effort to document and catalogue all extant fragments.

Collaboration


Dive into Yaacov Choueka's collaborations.

Top Co-Authors

Aviezri S. Fraenkel
Weizmann Institute of Science