Paul S. Jacobs
General Electric
Publications
Featured research published by Paul S. Jacobs.
Communications of the ACM | 1990
Paul S. Jacobs; Lisa F. Rau
The future of natural language text processing is examined in the SCISOR prototype. Drawing on artificial intelligence techniques and applying them to financial news items, this tool illustrates some of the future benefits of natural language analysis through a combination of bottom-up and top-down processing.
Information Processing and Management | 1989
Lisa F. Rau; Paul S. Jacobs; Uri Zernik
Storing and accessing texts in a conceptual format has a number of advantages over traditional document retrieval methods. A conceptual format facilitates natural language access to text information. It can support imprecise and inexact queries, conceptual information summarization, and, ultimately, document translation. The lack of extensive linguistic coverage is the major barrier to extracting useful information from large bodies of text. Current natural language processing (NLP) systems do not have rich enough lexicons to cover all the important words and phrases in extended texts. Two methods of overcoming this limitation are (1) to apply a text processing strategy that is tolerant of unknown words and gaps in linguistic knowledge, and (2) to acquire lexical information automatically from the texts. These two methods have been implemented in a prototype intelligent information retrieval system called SCISOR (System for Conceptual Information Summarization, Organization and Retrieval). This article describes the text processing, language acquisition, and summarization components of SCISOR.
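For readers who want the flavor of the "tolerant" processing strategy mentioned above, here is a minimal Python sketch. It is not SCISOR's actual code; the lexicon, concept names, and example story are invented for illustration.

```python
# Minimal sketch (not SCISOR's actual code): a scanner that tolerates unknown
# words by extracting only the terms it has concepts for, instead of failing
# on the first gap in its lexicon. Lexicon entries and concept names are
# invented placeholders.
LEXICON = {
    "acquisition": "ACQUISITION-EVENT",
    "merger": "MERGER-EVENT",
    "offer": "OFFER-EVENT",
    "share": "EQUITY",
    "cash": "PAYMENT-MEDIUM",
}

def tolerant_scan(text):
    """Return the concepts found in the text, simply skipping unknown words."""
    found = []
    for token in text.lower().replace(",", " ").replace(".", " ").split():
        concept = LEXICON.get(token)
        if concept is not None:
            found.append((token, concept))
    return found

if __name__ == "__main__":
    story = "Warnaco received a merger offer of 36 dollars per share in cash."
    print(tolerant_scan(story))
    # [('merger', 'MERGER-EVENT'), ('offer', 'OFFER-EVENT'),
    #  ('share', 'EQUITY'), ('cash', 'PAYMENT-MEDIUM')]
```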
Conference on Applied Natural Language Processing | 1988
Lisa F. Rau; Paul S. Jacobs
The SCISOR system is a computer program designed to scan naturally occurring texts in constrained domains, extract information, and answer questions about that information. The system currently reads newspaper stories in the domain of corporate mergers and acquisitions. The language analysis strategy used by SCISOR combines full syntactic (bottom-up) parsing and conceptual expectation-driven (top-down) parsing. Four knowledge sources, including syntactic and semantic information and domain knowledge, interact in a flexible manner. This integration produces a more robust semantic analyzer that deals gracefully with gaps in lexical and syntactic knowledge, transports easily to new domains, and facilitates the extraction of information from texts.
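A rough illustration of how top-down expectations can absorb bottom-up fragments follows; the frame, slot names, and fragments are hypothetical and far simpler than SCISOR's four interacting knowledge sources.

```python
# Hypothetical sketch of expectation-driven slot filling: a domain frame
# states what it expects (top-down), and fragments recovered by bottom-up
# analysis fill the slots even when the full syntactic parse is incomplete.
ACQUISITION_FRAME = {               # top-down expectations for an acquisition story
    "suitor": {"semantic_type": "COMPANY"},
    "target": {"semantic_type": "COMPANY"},
    "price":  {"semantic_type": "MONEY"},
}

def fill_frame(fragments):
    """fragments: list of (semantic_type, text) produced by bottom-up analysis."""
    frame = {}
    for slot, expectation in ACQUISITION_FRAME.items():
        for sem_type, text in fragments:
            if sem_type == expectation["semantic_type"] and text not in frame.values():
                frame[slot] = text
                break
    return frame

if __name__ == "__main__":
    # Fragments a bottom-up parser might recover from a noisy sentence.
    fragments = [("COMPANY", "ACME Corp"), ("COMPANY", "Widget Inc"), ("MONEY", "$42 a share")]
    print(fill_frame(fragments))
    # {'suitor': 'ACME Corp', 'target': 'Widget Inc', 'price': '$42 a share'}
```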
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1991
Lisa F. Rau; Paul S. Jacobs
Indexing text for accurate retrieval is a difficult and important problem. On-line information services generally depend on "keyword" indices rather than other methods of retrieval, because of the practical features of keywords for storage, dissemination, and browsing as well as for retrieval. However, these methods of indexing have two major drawbacks: First, they must be laboriously assigned by human indexers. Second, they are inaccurate, because of mistakes made by these indexers as well as the difficulties users have in choosing keywords for their queries, and the ambiguity a keyword may have. Current natural language text processing (NLP) methods help to overcome these problems. Such methods can provide automatic indexing and keyword assignment capabilities that are at least as accurate as human indexers in many applications. In addition, NLP systems can increase the information contained in keyword fields by separating keywords into segments, or distinct fields that capture certain discriminating content or relations among keywords. This paper reports on a system that uses natural language text processing to derive keywords from free-text news stories, separate these keywords into segments, and automatically build a segmented database. The system is used as part of a commercial news "clipping" and retrieval product. Preliminary results show improved accuracy, as well as reduced cost, resulting from these automated techniques.
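The segmentation idea can be pictured in a few lines of Python; the segment names and vocabularies below are invented placeholders, not the rules used in the commercial clipping product.

```python
# Invented sketch of building a "segmented" keyword record: instead of one
# flat keyword list, each keyword is routed into a distinct field that
# captures its role in the story.
SEGMENT_RULES = {
    "COMPANY": {"ge", "ibm", "exxon"},
    "EVENT":   {"merger", "acquisition", "layoff", "lawsuit"},
    "PLACE":   {"japan", "europe", "california"},
}

def segment_keywords(keywords):
    """Split a flat keyword list into named segments; leftovers go to OTHER."""
    record = {segment: [] for segment in SEGMENT_RULES}
    record["OTHER"] = []
    for kw in keywords:
        for segment, vocabulary in SEGMENT_RULES.items():
            if kw.lower() in vocabulary:
                record[segment].append(kw)
                break
        else:
            record["OTHER"].append(kw)
    return record

if __name__ == "__main__":
    print(segment_keywords(["IBM", "merger", "Japan", "chips"]))
    # {'COMPANY': ['IBM'], 'EVENT': ['merger'], 'PLACE': ['Japan'], 'OTHER': ['chips']}
```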
Human Language Technology | 1991
Paul S. Jacobs; George R. Krupka; Lisa F. Rau
Ordinarily, one thinks of the problem of natural language understanding as one of making a single, left-to-right pass through an input, producing a progressively refined and detailed interpretation. In text interpretation, however, the constraints of strict left-to-right processing are an encumbrance. Multi-pass methods, especially by interpreting words using corpus data and associating units of text with possible interpretations, can be more accurate and faster than single-pass methods of data extraction. Quality improves because corpus-based data and global context help to control false interpretations; speed improves because processing focuses on relevant sections. The most useful forms of pre-processing for text interpretation use fairly superficial analysis that complements the style of ordinary parsing but uses much of the same knowledge base. Lexico-semantic pattern matching, with rules that combine lexical analysis with ordering and semantic categories, is a good method for this form of analysis. This type of pre-processing is efficient, takes advantage of corpus data, prevents many garden paths and fruitless parses, and helps the parser cope with the complexity and flexibility of real text.
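As a concrete, simplified picture of lexico-semantic pattern matching, here is a small Python sketch; the semantic categories, pattern, and rule format are assumptions made for illustration, not the authors' rule language.

```python
# Simplified sketch of lexico-semantic pattern matching: each rule is an
# ordered sequence of tests, where a test may require a specific word or a
# semantic category, and a match tags the span with an interpretation.
import re

SEM_CATEGORIES = {
    "COMPANY": {"acme", "widgetco"},
    "TAKEOVER_VERB": {"acquire", "acquired", "buy", "bought"},
}

# Each pattern: (interpretation, list of tests); a test is ("word", w) or ("cat", c).
PATTERNS = [
    ("ACQUISITION", [("cat", "COMPANY"), ("cat", "TAKEOVER_VERB"), ("cat", "COMPANY")]),
]

def test_token(test, token):
    kind, value = test
    if kind == "word":
        return token == value
    return token in SEM_CATEGORIES[value]          # kind == "cat"

def match_patterns(sentence):
    tokens = re.findall(r"[a-z]+", sentence.lower())
    hits = []
    for label, tests in PATTERNS:
        for start in range(len(tokens) - len(tests) + 1):
            window = tokens[start:start + len(tests)]
            if all(test_token(t, tok) for t, tok in zip(tests, window)):
                hits.append((label, " ".join(window)))
    return hits

if __name__ == "__main__":
    print(match_patterns("Acme acquired WidgetCo last week."))
    # [('ACQUISITION', 'acme acquired widgetco')]
```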
International Conference on Computational Linguistics | 1990
Uri Zernik; Paul S. Jacobs
Recent work in text analysis has suggested that data on words that frequently occur together reveal important information about text content. Co-occurrence relations can serve two main purposes in language processing. First, the statistics of co-occurrence have been shown to produce accurate results in syntactic analysis. Second, the way that words appear together can help in assigning thematic roles in semantic interpretation. This paper discusses a method for collecting co-occurrence data, acquiring lexical relations from the data, and applying these relations to semantic analysis.
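A minimal sketch of the co-occurrence collection step follows, using pointwise mutual information as the association score; the windowing scheme, toy corpus, and scoring choice are illustrative assumptions rather than the paper's exact method.

```python
# Rough sketch of collecting word co-occurrence data and scoring candidate
# lexical relations with pointwise mutual information (PMI). Probability
# estimates here are deliberately crude; this is illustration, not the paper's
# procedure.
import math
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences, window=5):
    """Count single words, and word pairs that co-occur within `window` positions."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        for i, j in combinations(range(len(tokens)), 2):
            if j - i <= window:
                pair_counts[tuple(sorted((tokens[i], tokens[j])))] += 1
    return word_counts, pair_counts

def pmi(pair, word_counts, pair_counts):
    """Rough PMI estimate; larger values suggest a stronger lexical relation."""
    p_pair = pair_counts[pair] / sum(pair_counts.values())
    p1 = word_counts[pair[0]] / sum(word_counts.values())
    p2 = word_counts[pair[1]] / sum(word_counts.values())
    return math.log2(p_pair / (p1 * p2))

if __name__ == "__main__":
    corpus = ["the company announced a hostile takeover bid",
              "the takeover bid was rejected by the board",
              "the company reported quarterly earnings"]
    wc, pc = cooccurrence_counts(corpus)
    print(round(pmi(("bid", "takeover"), wc, pc), 2))   # strongly associated pair
```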
IEEE Intelligent Systems | 1993
Paul S. Jacobs
NLDB, a knowledge-based system that automatically categorizes news stories for dissemination, retrieval, and browsing, is discussed. The major knowledge-based component of NLDB is a lexico-semantic pattern matcher that identifies combinations of words and phrases, as well as more complex patterns. These include word roots, grammatical categories, and semantic structures, such as verbs describing classes of events. It is shown that this linguistic analysis outperforms statistical methods. Because building lexico-semantic patterns can be a laborious process, a set of statistical methods that automates pattern acquisition while preserving the benefits of a knowledge-based approach is developed.
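One way to picture the statistical acquisition step is sketched below; the labelled sample, category names, and selection heuristic are invented for illustration and are far simpler than the methods developed in the article.

```python
# Invented sketch of statistical pattern acquisition: count which terms
# co-occur with each story category in a labelled sample and keep the most
# frequent ones as candidate lexico-semantic patterns for a human to review.
from collections import Counter, defaultdict

def candidate_patterns(labelled_stories, top_n=3):
    """labelled_stories: list of (category, text). Returns per-category top terms."""
    by_category = defaultdict(Counter)
    for category, text in labelled_stories:
        by_category[category].update(set(text.lower().split()))
    return {cat: counts.most_common(top_n) for cat, counts in by_category.items()}

if __name__ == "__main__":
    sample = [
        ("MERGERS",  "acme agrees to acquire widgetco in stock swap"),
        ("MERGERS",  "board approves acquisition of widgetco"),
        ("EARNINGS", "acme reports record quarterly earnings"),
    ]
    print(candidate_patterns(sample))
```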
MUC4 '92 Proceedings of the 4th Conference on Message Understanding | 1992
George R. Krupka; Paul S. Jacobs; Lisa F. Rau; Lois Childs; Ira Sider
The GE NLTOOLSET is a set of text interpretation tools designed to be easily adapted to new domains. This report summarizes the system and its performance on the MUC-4 task.
MUC5 '93 Proceedings of the 5th Conference on Message Understanding | 1993
Paul S. Jacobs; George R. Krupka; Lisa F. Rau; Michael L. Mauldin; Teruko Mitamura; Tsuyoshi Kitani; Ira Sider; Lois Childs
This paper describes the GE-CMU TIPSTER/SHOGUN system as configured for the TIPSTER 24-month (MUC-5) benchmark, and gives details of the system's performance on the selected Japanese and English texts. The SHOGUN system is a distillation of some of the key ideas that emerged from previous benchmarks and experiments, emphasizing a simple architecture in which the focus is on detailed corpus-based knowledge. This design allowed the project to meet its goal of achieving advances in coverage and accuracy while showing consistently good performance across languages and domains.
International Journal of Intelligent Systems | 1992
Paul S. Jacobs
Transportability has perpetually been the nemesis of natural language processing systems, in both the research and commercial sectors. During the last 20 years, the technology has not moved much closer to providing robust coverage of everyday language, and has failed to produce commercial successes beyond a few specialized interfaces and application programs. The redesign required for each application has limited the impact of natural language systems. Trump (TRansportable Understanding Mechanism Package) is a natural language analyzer that functions in a variety of domains, in both interfaces and text processing. While other similar efforts have treated transportability as a problem in knowledge engineering, Trump instead relies mainly on a “core” of knowledge about language and a set of techniques for applying that knowledge within a domain. The information about words, word meanings, and linguistic relations in this generic knowledge base guides the conceptual framework of language interpretation in each domain. Trump uses this core knowledge to piece together a conceptual representation of a natural language input by combining generic and specialized information. The result has been a language processing system that is capable of performing fairly extensive analysis with a minimum of customization for each application.
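The core-plus-domain design can be sketched as a layered lexicon lookup; the entries, sense names, and overlay below are hypothetical, not Trump's actual knowledge base.

```python
# Hypothetical sketch of the design described above: a generic "core" lexicon
# shared by every application, with a thin domain-specific layer consulted
# first, so each new domain customizes rather than rebuilds the system.
CORE_LEXICON = {
    "issue":   {"sense": "TRANSFER-EVENT"},
    "report":  {"sense": "COMMUNICATION-EVENT"},
    "company": {"sense": "ORGANIZATION"},
}

FINANCE_OVERLAY = {
    "issue": {"sense": "SECURITIES-OFFERING"},   # domain-specific refinement
}

def lookup(word, domain_overlay):
    """Prefer the domain-specific sense, fall back to the generic core sense."""
    return domain_overlay.get(word) or CORE_LEXICON.get(word)

if __name__ == "__main__":
    print(lookup("issue", FINANCE_OVERLAY))    # {'sense': 'SECURITIES-OFFERING'}
    print(lookup("report", FINANCE_OVERLAY))   # {'sense': 'COMMUNICATION-EVENT'}
```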