
Publication


Featured research published by Jonathan D. Cohen.


Journal of the Association for Information Science and Technology | 1995

Highlights: language- and domain-independent automatic indexing terms for abstracting

Jonathan D. Cohen

A method of drawing index terms from text is presented. The approach uses no stop list, stemmer, or other language‐ and domain‐specific component, allowing operation in any language or domain with only trivial modification. The method uses n‐gram counts, achieving a function similar to, but more general than, a stemmer. The generated index terms, which the author calls “highlights,” are suitable for identifying the topic for perusal and selection. An extension is also described and demonstrated which selects index terms to represent a subset of documents, distinguishing them from the corpus. Some experimental results are presented, showing operation in English, Spanish, German, Georgian, Russian, and Japanese.


ACM Transactions on Information Systems | 1997

Recursive hashing functions for n-grams

Jonathan D. Cohen

Many indexing, retrieval, and comparison methods are based on counting or cataloguing n-grams in streams of symbols. The fastest method of implementing such operations is through the use of hash tables. Rapid hashing of consecutive n-grams is best done using a recursive hash function, in which the hash value of the current n-gram is derived from the hash value of its predecessor. This article generalizes recursive hash functions found in the literature and proposes new methods offering superior performance. Experimental results demonstrate substantial speed improvement over conventional approaches, while retaining near-ideal hash value distribution.
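
To illustrate what "recursive" means here, the following is a minimal sketch that assumes a polynomial (Rabin-Karp style) rolling hash. It is one well-known recursive hash and is not claimed to be any of the functions proposed in the article. The names `ngram_hashes` and `direct_hash` and the `base` and `mod` defaults are illustrative assumptions.

```python
def direct_hash(gram: bytes, base: int = 257, mod: int = (1 << 61) - 1) -> int:
    """Hash one n-gram from scratch: O(n) work per n-gram."""
    h = 0
    for b in gram:
        h = (h * base + b) % mod
    return h


def ngram_hashes(data: bytes, n: int, base: int = 257, mod: int = (1 << 61) - 1):
    """Yield (offset, hash) for every n-gram, deriving each hash from its predecessor."""
    if n <= 0 or len(data) < n:
        return
    # Weight of the symbol that leaves the window: base**(n-1) mod mod.
    top = pow(base, n - 1, mod)
    h = direct_hash(data[:n], base, mod)
    yield 0, h
    for i in range(n, len(data)):
        # Drop the outgoing symbol, shift the rest, add the incoming symbol.
        # Python's % always returns a non-negative result for a positive modulus.
        h = ((h - data[i - n] * top) * base + data[i]) % mod
        yield i - n + 1, h
```

Each update costs a constant number of arithmetic operations regardless of n, whereas hashing every n-gram from scratch costs O(n) per position.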


Information Processing and Management | 1998

Hardware-assisted algorithm for full-text large-dictionary string matching using N-gram hashing

Jonathan D. Cohen

A method of full-text scanning for matches in a large dictionary is described. The method is suitable for SDI (selective dissemination of information) systems, accommodating large dictionaries (10^4–10^5 entries) and typical digital data rates (tens of megabytes per second or more). It can be implemented on a single commercially-available board hosted by a personal computer or entirely in software. The preferred approach employs a hardware primary test, followed by a software secondary test. The algorithm is described in detail, the implementation is sketched, and simulation results are presented.


Software - Practice and Experience | 1998

An n-gram hash and skip algorithm for finding large numbers of keywords in continuous text streams

Jonathan D. Cohen

A method of full-text scanning for matches in a large dictionary of keywords is described, suitable for Selective Dissemination of Information (SDI). The method is applicable to large dictionaries (say 10^4 to 10^5 entries), and to arbitrary byte streams for both patterns and data samples. The approach involves a sequence of tests, beginning with Boyer–Moore–Horspool skipping on digrams, followed by a succession of hash tests, and completed by trie searching, the combination of which is quite fast. Background information is provided, the algorithm and its implementation are described in detail, and experimental results are presented. In particular, tests suggest that the proposed method outperforms the algorithms of Aho–Corasick and Commentz–Walter when implementing large dictionaries.
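
The first stage, Horspool-style skipping keyed on digrams, can be sketched roughly as follows. This is a simplified illustration and not the paper's implementation: the hash tests and trie search are replaced here by a plain dictionary lookup and a direct string comparison, and the names `build_shift` and `find_keywords` are assumptions.

```python
def build_shift(keywords, m):
    """Map each digram in the length-m keyword prefixes to a safe skip distance."""
    shift = {}
    for kw in keywords:
        for j in range(1, m):
            dg = kw[j - 1:j + 1]  # digram ending at index j of the prefix
            shift[dg] = min(shift.get(dg, m - 1), m - 1 - j)
    return shift


def find_keywords(text, keywords):
    """Return (start, keyword) for every keyword occurrence in text."""
    m = min(len(k) for k in keywords)  # window length = shortest keyword
    if m < 2:
        raise ValueError("digram skipping needs keywords of length >= 2")
    shift = build_shift(keywords, m)
    # Candidate keywords grouped by the digram that ends their length-m prefix.
    buckets = {}
    for kw in keywords:
        buckets.setdefault(kw[m - 2:m], []).append(kw)
    hits = []
    pos = m - 1  # index of the last symbol in the current window
    while pos < len(text):
        dg = text[pos - 1:pos + 1]
        s = shift.get(dg, m - 1)  # unseen digram: skip nearly a whole window
        if s > 0:
            pos += s
            continue
        start = pos - m + 1
        # Stand-in for the paper's hash tests and trie search: compare directly.
        for kw in buckets.get(dg, ()):
            if text.startswith(kw, start):
                hits.append((start, kw))
        pos += 1
    return hits
```

Most window positions end in a digram that allows a skip, so only a small fraction of positions reach the more expensive verification step.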


Journal of the Association for Information Science and Technology | 1999

Massive query resolution for rapid selective dissemination of information

Jonathan D. Cohen

The tasks of choosing documents from a new collection and categorizing the choices, both on the basis of a body of standing queries, are known variously as selection and routing, selective dissemination of information (SDI), and information filtering. The combined operation of selecting and labeling documents naturally separates into two processes: feature scanning and query resolution. The first process examines a document for features and their locations; the second takes the findings from the first process, looks for satisfaction of combinations specified in the queries, and marks the document accordingly. When the body of queries is large, query resolution can become a significant factor in total processing speed. This paper outlines an efficient approach to performing query resolution on massive Boolean queries, suitable for implementation on a desktop computer. Algorithms are sketched in pseudo-code and experimental results are reported.
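
The split between feature scanning and query resolution can be illustrated with a small sketch. It assumes a deliberately restricted query form (a conjunction of required features plus a list of excluded features) and a counting scheme over an inverted index. The paper's own algorithms handle general Boolean queries and are not reproduced here. The names `build_index` and `resolve` are hypothetical.

```python
from collections import defaultdict


def build_index(queries):
    """queries: dict mapping query name -> (required_features, excluded_features).

    Returns an inverted index: feature -> names of queries that require it.
    """
    index = defaultdict(list)
    for name, (required, _excluded) in queries.items():
        for feature in set(required):
            index[feature].append(name)
    return index


def resolve(found_features, queries, index):
    """Given the features a scanner found in one document, return satisfied queries."""
    found = set(found_features)
    counts = defaultdict(int)
    # Touch only queries that mention at least one found feature.
    for feature in found:
        for name in index.get(feature, ()):
            counts[name] += 1
    matched = []
    for name, count in counts.items():
        required, excluded = queries[name]
        if count == len(set(required)) and not (found & set(excluded)):
            matched.append(name)
    return sorted(matched)
```

Because resolution is driven by the features actually found, queries that share no feature with the document are never examined. That property matters when the body of standing queries is large. In this sketch a query with no required features never matches.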


1982 Technical Symposium East | 1982

Frequency Division Multiplexing Optical Processors

Jonathan D. Cohen

This paper describes a method by which optical time- and space-integrating processors may be generalized and expanded by accommodating vector inputs. Many architectures using this frequency division multiplexing approach are presented, demonstrating new operations and generalizations of old ones.


1982 Technical Symposium East | 1982

Real Time Space Integrating Optical Ambiguity Processor

Jonathan D. Cohen

An optical ambiguity processor is described which allows real-time processing of wideband signals. One-dimensional acousto-optic cells are used as input transducers and no light-to-light modulators are required. Performance is predicted and measured. The material in this paper is distilled from the author's master's thesis of spring 1980.


Computing in Science and Engineering | 2009

Graph Twiddling in a MapReduce World

Jonathan D. Cohen


Journal of the Association for Information Science and Technology | 2000

Visualizing document classification: a search aid for the digital library

Yew-Huey Liu; Paul M. Dantzig; Martin William Sachs; James T. Corey; Mark T. Hinnebusch; Marc Damashek; Jonathan D. Cohen


Archive | 1988

Microwave and millimeter-wave spectrum analyzer

Jonathan D. Cohen
