Jonathan D. Cohen
National Security Agency
Publication
Featured research published by Jonathan D. Cohen.
Journal of the Association for Information Science and Technology | 1995
Jonathan D. Cohen
A method of drawing index terms from text is presented. The approach uses no stop list, stemmer, or other language‐ and domain‐specific component, allowing operation in any language or domain with only trivial modification. The method uses n‐gram counts, achieving a function similar to, but more general than, a stemmer. The generated index terms, which the author calls “highlights,” are suitable for identifying the topic for perusal and selection. An extension is also described and demonstrated which selects index terms to represent a subset of documents, distinguishing them from the corpus. Some experimental results are presented, showing operation in English, Spanish, German, Georgian, Russian, and Japanese.
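The abstract names n-gram counts as the method's basic ingredient. The sketch below illustrates only that counting step for character n-grams. The function name `ngram_counts` is an assumption made for this example, and the scoring that turns counts into "highlights" is not specified in the abstract, so it is not attempted here.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every overlapping character n-gram in text.

    Works on any Unicode string, so no language-specific component is needed.
    Returns an empty Counter when the text is shorter than n.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    # Bigrams of "banana": ba, an, na, an, na
    print(ngram_counts("banana", 2).most_common())
```

Counting overlapping character sequences lets related word forms share most of their n-grams, which is consistent with the abstract's remark that the counts play a role similar to a stemmer.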
ACM Transactions on Information Systems | 1997
Jonathan D. Cohen
Many indexing, retrieval, and comparison methods are based on counting or cataloguing n-grams in streams of symbols. The fastest method of implementing such operations is through the use of hash tables. Rapid hashing of consecutive n-grams is best done using a recursive hash function, in which the hash value of the current n-gram is derived from the hash value of its predecessor. This article generalizes recursive hash functions found in the literature and proposes new methods offering superior performance. Experimental results demonstrate substantial speed improvement over conventional approaches, while retaining near-ideal hash value distribution.
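As a rough illustration of the recursive idea, the sketch below uses the classic polynomial (Karp-Rabin style) rolling hash, in which each n-gram's hash is updated from its predecessor's in constant time. It is not necessarily one of the functions proposed in the article. The names `direct_hash`, `rolling_hashes`, `BASE`, and `MOD` are assumptions chosen for this example.

```python
BASE = 257            # assumed radix, larger than any byte value
MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime)


def direct_hash(gram):
    """Hash one n-gram (a bytes object) from scratch: O(n) work."""
    h = 0
    for b in gram:
        h = (h * BASE + b) % MOD
    return h


def rolling_hashes(data, n):
    """Yield the hash of every consecutive n-gram in data, O(1) per step."""
    if n <= 0 or len(data) < n:
        return
    top = pow(BASE, n - 1, MOD)   # weight of the symbol leaving the window
    h = direct_hash(data[:n])
    yield h
    for i in range(n, len(data)):
        # Remove the oldest symbol, shift the rest, append the newest symbol.
        h = ((h - data[i - n] * top) * BASE + data[i]) % MOD
        yield h


if __name__ == "__main__":
    for value in rolling_hashes(b"abracadabra", 3):
        print(value)
```

Each value produced this way equals the hash computed from scratch for the same window, so the recursion changes the cost per n-gram, not the resulting hash values.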
Information Processing and Management | 1998
Jonathan D. Cohen
A method of full-text scanning for matches in a large dictionary is described. The method is suitable for SDI (selective dissemination of information) systems, accommodating large dictionaries (10^4–10^5 entries) and typical digital data rates (tens of megabytes per second or more). It can be implemented on a single commercially-available board hosted by a personal computer or entirely in software. The preferred approach employs a hardware primary test, followed by a software secondary test. The algorithm is described in detail, the implementation is sketched, and simulation results are presented.
Software - Practice and Experience | 1998
Jonathan D. Cohen
A method of full-text scanning for matches in a large dictionary of keywords is described, suitable for Selective Dissemination of Information (SDI). The method is applicable to large dictionaries (say 10^4 to 10^5 entries), and to arbitrary byte streams for both patterns and data samples. The approach involves a sequence of tests, beginning with Boyer–Moore–Horspool skipping on digrams, followed by a succession of hash tests, and completed by trie searching, the combination of which is quite fast. Background information is provided, the algorithm and its implementation are described in detail, and experimental results are presented. In particular, tests suggest that the proposed method outperforms the algorithms of Aho–Corasick and Commentz–Walter when implementing large dictionaries.
Journal of the Association for Information Science and Technology | 1999
Jonathan D. Cohen
The tasks of choosing documents from a new collection and categorizing the choices, both on the basis of a body of standing queries, are known variously as selection and routing, selective dissemination of information (SDI), and information filtering. The combined operation of selecting and labeling documents naturally separates into two processes: feature scanning and query resolution. The first process examines a document for features and their locations; the second takes the findings from the first process, looks for satisfaction of combinations specified in the queries, and marks the document accordingly. When the body of queries is large, query resolution can become a significant factor in total processing speed. This paper outlines an efficient approach to performing query resolution on massive Boolean queries, suitable for implementation on a desktop computer. Algorithms are sketched in pseudo-code and experimental results are reported.
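The two-process split described in this abstract can be pictured with a small sketch. It assumes, purely for illustration, that features are literal substrings and that each query is an AND of OR-groups of features. The function names `scan_features` and `resolve_queries` and the query layout are hypothetical, and the paper's efficient resolution algorithm for massive query sets is not reproduced here.

```python
def scan_features(document, features):
    """Feature scanning: map each feature found to its start offsets."""
    found = {}
    for feature in features:
        start = document.find(feature)
        while start != -1:
            found.setdefault(feature, []).append(start)
            start = document.find(feature, start + 1)
    return found


def resolve_queries(found, queries):
    """Query resolution: return labels of queries satisfied by the findings.

    queries maps a label to a list of sets; a query is satisfied when every
    set contains at least one feature that the scan found.
    """
    present = set(found)
    return [label for label, clauses in queries.items()
            if all(present & clause for clause in clauses)]


if __name__ == "__main__":
    doc = "optical hash table for n-gram hashing"
    hits = scan_features(doc, {"hash", "optical", "trie"})
    standing_queries = {
        "hashing": [{"hash"}],
        "optical-hash": [{"optical"}, {"hash", "trie"}],
        "tries": [{"trie"}],
    }
    print(resolve_queries(hits, standing_queries))
```

Keeping the two steps separate means the document is scanned once regardless of how many queries exist, and only the second step grows with the size of the query set.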
1982 Technical Symposium East | 1982
Jonathan D. Cohen
This paper describes a method by which optical time- and space-integrating processors may be generalized and expanded by accommodating vector inputs. Many architectures using this frequency division multiplexing approach are presented, demonstrating new operations and generalizations of old ones.
1982 Technical Symposium East | 1982
Jonathan D. Cohen
An optical ambiguity processor is described which allows real-time processing of wideband signals. One-dimensional acousto-optic cells are used as input transducers and no light-to-light modulators are required. Performance is predicted and measured. The material in this paper is distilled from the author's master's thesis of spring 1980.
Computing in Science and Engineering | 2009
Jonathan D. Cohen
Journal of the Association for Information Science and Technology | 2000
Yew-Huey Liu; Paul M. Dantzig; Martin William Sachs; James T. Corey; Mark T. Hinnebusch; Marc Damashek; Jonathan D. Cohen
Archive | 1988
Jonathan D. Cohen