
Publications


Featured research published by Lisa F. Rau.


Information Processing and Management | 1995

Automatic condensation of electronic publications by sentence selection

Ronald Brandow; Karl Mitze; Lisa F. Rau

As electronic information access becomes the norm and the variety of retrievable material increases, automatic methods of summarizing or condensing text will become critical. This paper describes a system that performs domain-independent automatic condensation of news from a large commercial news service encompassing 41 different publications. The system was evaluated against a baseline that condensed the same articles using only the first portion of the texts (the lead), up to the target length of the summaries. Three lengths were evaluated for 250 documents under both methods, for a total of 1,500 suitability judgements. The outcome of perhaps the largest evaluation of human-versus-machine summarization performed to date was unexpected: the lead-based summaries significantly outperformed the “intelligent” summaries, achieving acceptability ratings of over 90%, compared to 74.4%. This paper briefly reviews the literature, details the implications of these results, and addresses the remaining hopes for content-based summarization. We expect the results presented here to be useful to other researchers investigating the viability of summarization through sentence-selection heuristics.
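
The contrast the abstract draws can be pictured with a small sketch (hypothetical Python, not the paper's implementation): a lead-based summary simply keeps the first sentences up to the target length, while a sentence-selection summary scores sentences, here by crude content-word frequency, and keeps the top-scoring ones in document order.

    import re
    from collections import Counter

    def split_sentences(text):
        # Naive splitter; a production system would use a trained tokenizer.
        return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

    def lead_summary(text, n=3):
        # Lead baseline: the first n sentences of the article.
        return split_sentences(text)[:n]

    def selection_summary(text, n=3):
        # Sentence selection: score each sentence by the document-wide
        # frequency of its longer words, then keep the top n in original order.
        sentences = split_sentences(text)
        freq = Counter(w for w in re.findall(r'[a-z]+', text.lower()) if len(w) > 3)
        ranked = sorted(range(len(sentences)),
                        key=lambda i: -sum(freq[w] for w in
                                           re.findall(r'[a-z]+', sentences[i].lower())))
        return [sentences[i] for i in sorted(ranked[:n])]

The paper's finding is that the first, trivial strategy was judged acceptable more often than the second.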


Communications of the ACM | 1990

SCISOR: extracting information from on-line news

Paul S. Jacobs; Lisa F. Rau

The future of natural language text processing is examined in the SCISOR prototype. Drawing on artificial intelligence techniques and applying them to financial news items, the prototype illustrates some of the future benefits of natural language analysis through a combination of bottom-up and top-down processing.


Proceedings of the Seventh IEEE Conference on Artificial Intelligence Applications | 1991

Extracting company names from text

Lisa F. Rau

A detailed description is given of an implemented algorithm that extracts company names automatically from financial news. Extracting company names from text is one problem; recognizing subsequent references to the same company is another. The author addresses both problems in an implemented, well-tested module that operates as a detachable process from a set of natural language processing tools. The algorithm combines heuristics, exception lists, and extensive corpus analysis, and it generates the most likely variations that extracted names may go by, for use in subsequent retrieval. Tested on over one million words of naturally occurring financial news, the system has extracted thousands of company names with over 95% accuracy (precision) compared to a human, and succeeded in extracting 25% more companies than were indexed by a human.
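
A toy illustration of the approach the abstract outlines (heuristic suffix cues, an exception list, and generation of likely name variants); the cue words and lists here are invented for the example, not taken from the paper.

    import re

    CORPORATE_SUFFIXES = {"Inc", "Corp", "Co", "Ltd", "PLC"}   # hypothetical cue list
    EXCEPTIONS = {"Dow Jones Industrial Average"}              # known non-companies

    def extract_company_names(text):
        # Heuristic: a run of capitalized words ending in a corporate suffix.
        suffixes = '|'.join(sorted(CORPORATE_SUFFIXES, key=len, reverse=True))
        pattern = r'((?:[A-Z][\w&-]*\s+)+(?:' + suffixes + r'))'
        return {m.group(1) for m in re.finditer(pattern, text)
                if m.group(1) not in EXCEPTIONS}

    def name_variants(name):
        # Generate the likely variations a name may go by in later references:
        # the full name, the name minus its suffix, and the first word alone.
        words = name.split()
        variants = {name, words[0]}
        if words[-1] in CORPORATE_SUFFIXES:
            variants.add(' '.join(words[:-1]))
        return variants

    # extract_company_names("General Electric Co said ...") -> {'General Electric Co'}
    # name_variants("General Electric Co")
    # -> {'General Electric Co', 'General Electric', 'General'}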


Information Processing and Management | 1989

Information extraction and text summarization using linguistic knowledge acquisition

Lisa F. Rau; Paul S. Jacobs; Uri Zernik

Storing and accessing texts in a conceptual format has a number of advantages over traditional document retrieval methods. A conceptual format facilitates natural language access to text information. It can support imprecise and inexact queries, conceptual information summarization, and, ultimately, document translation. The lack of extensive linguistic coverage is the major barrier to extracting useful information from large bodies of text. Current natural language processing (NLP) systems do not have rich enough lexicons to cover all the important words and phrases in extended texts. Two methods of overcoming this limitation are (1) to apply a text processing strategy that is tolerant of unknown words and gaps in linguistic knowledge, and (2) to acquire lexical information automatically from the texts. These two methods have been implemented in a prototype intelligent information retrieval system called SCISOR (System for Conceptual Information Summarization, Organization and Retrieval). This article describes the text processing, language acquisition, and summarization components of SCISOR.
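
Method (2) can be illustrated with a toy sketch (hypothetical, far simpler than SCISOR's acquisition component): an unknown word inherits a tentative semantic category from the known words it co-occurs with.

    from collections import Counter, defaultdict

    # Hypothetical seed lexicon: word -> semantic category.
    LEXICON = {"acquire": "TAKEOVER", "merger": "TAKEOVER",
               "shares": "FINANCE", "dividend": "FINANCE"}

    def acquire_lexicon(sentences):
        # For each unknown word, tally the categories of the known words it
        # shares a sentence with, then adopt the majority category.
        evidence = defaultdict(Counter)
        for sentence in sentences:
            words = sentence.lower().split()
            categories = [LEXICON[w] for w in words if w in LEXICON]
            for w in words:
                if w not in LEXICON:
                    evidence[w].update(categories)
        return {w: tally.most_common(1)[0][0]
                for w, tally in evidence.items() if tally}

    # acquire_lexicon(["a merger would acquire the shares"])
    # maps the unknown word 'would' to 'TAKEOVER' (the majority category).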


Information Processing and Management | 1987

Knowledge organization and access in a conceptual information system

Lisa F. Rau

Traditional approaches to information retrieval, based on automatic or manually constructed keywords, are inappropriate for certain desirable tasks in an intelligent information system. Obtaining simple answers to direct questions, a summary of an event sequence that could span multiple documents, and an update of recent developments in an ongoing event sequence are three examples of such tasks. In this paper, the SCISOR system is described. SCISOR illustrates the potential for increased recall and precision of stored information through the understanding in context of articles in its domain of corporate takeovers. A constrained form of marker passing is used to answer queries of the knowledge base posed in natural language. Among other desirable characteristics, this method of retrieval focuses search on likely candidates, and tolerates incomplete or incorrect input indices very well.
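
A minimal sketch of constrained marker passing as the abstract describes it (the graph and the hop constraint here are hypothetical): markers spread a bounded number of links from each query concept through the knowledge base, and nodes reached from every query concept become retrieval candidates.

    from collections import deque

    def pass_markers(graph, start, max_hops=2):
        # Breadth-first marker propagation, constrained to max_hops links.
        seen, frontier = {start}, deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        return seen

    def answer_query(graph, query_concepts, max_hops=2):
        # Nodes marked from every query concept become the answer candidates;
        # the intersection focuses search and tolerates noisy query concepts.
        marked = [pass_markers(graph, c, max_hops) for c in query_concepts]
        return set.intersection(*marked) if marked else set()

    # Hypothetical fragment of a takeover knowledge base:
    kb = {"GE": {"takeover-1"},
          "RCA": {"takeover-1"},
          "takeover-1": {"GE", "RCA", "price-6.4B"}}
    # answer_query(kb, ["GE", "RCA"]) includes "takeover-1"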


Conference on Applied Natural Language Processing | 1988

Integrating top-down and bottom-up strategies in a text processing system

Lisa F. Rau; Paul S. Jacobs

The SCISOR system is a computer program designed to scan naturally occurring texts in constrained domains, extract information, and answer questions about that information. The system currently reads newspaper stories in the domain of corporate mergers and acquisitions. The language analysis strategy used by SCISOR combines full syntactic (bottom-up) parsing and conceptual expectation-driven (top-down) parsing. Four knowledge sources, including syntactic and semantic information and domain knowledge, interact in a flexible manner. This integration produces a more robust semantic analyzer that deals gracefully with gaps in lexical and syntactic knowledge, transports easily to new domains, and facilitates the extraction of information from texts.
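
One way to picture the integration (a hypothetical sketch, not SCISOR's actual analyzer): attempt a full bottom-up parse, and when lexical or syntactic gaps defeat it, fall back on top-down expectations that fill event slots from whatever fragments were recognized.

    def bottom_up_parse(sentence, lexicon=("ge", "acquired", "rca")):
        # Stand-in for a full syntactic parser: it succeeds only when every
        # word is known, and signals a knowledge gap by returning None.
        words = sentence.lower().split()
        return words if all(w in lexicon for w in words) else None

    def top_down_fill(fragments, expectations):
        # Expectation-driven analysis: each expectation maps an event slot
        # to a predicate over the fragments that were recognized.
        return {slot: next((f for f in fragments if accepts(f)), None)
                for slot, accepts in expectations.items()}

    def analyze(sentence, expectations):
        parse = bottom_up_parse(sentence)
        if parse is not None:
            return parse                  # full syntactic reading succeeded
        return top_down_fill(sentence.split(), expectations)

    # analyze("GE acquired RCA yesterday!!",
    #         {"acquirer": str.isupper, "verb": lambda w: w.endswith("ed")})
    # -> {'acquirer': 'GE', 'verb': 'acquired'}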


Communications of the ACM | 1995

Commercial applications of natural language processing

Kenneth Ward Church; Lisa F. Rau

Vast quantities of text are becoming available in electronic form, ranging from published documents (e.g., electronic dictionaries, encyclopedias, libraries and archives for information retrieval services), to private databases (e.g., marketing information, legal records, medical histories), to personal email and faxes. Online information services are reaching mainstream computer users. There were over 15 million Internet users in 1993, and projections are for 30 million in 1997. With media attention reaching all-time highs, hardly a day goes by without a new article on the National Information Infrastructure, digital libraries, networked services, digital convergence or intelligent agents. This attention is moving natural language processing along the critical path for all kinds of novel applications.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1991

Creating segmented databases from free text for text retrieval

Lisa F. Rau; Paul S. Jacobs

Indexing text for accurate retrieval is a difficult and important problem. On-line information services generally depend on “keyword” indices rather than other methods of retrieval, because of the practical features of keywords for storage, dissemination, and browsing as well as for retrieval. However, these methods of indexing have two major drawbacks: First, they must be laboriously assigned by human indexers. Second, they are inaccurate, because of mistakes made by these indexers as well as the difficulties users have in choosing keywords for their queries, and the ambiguity a keyword may have. Current natural language text processing (NLP) methods help to overcome these problems. Such methods can provide automatic indexing and keyword assignment capabilities that are at least as accurate as human indexers in many applications. In addition, NLP systems can increase the information contained in keyword fields by separating keywords into segments, or distinct fields that capture certain discriminating content or relations among keywords. This paper reports on a system that uses natural language text processing to derive keywords from free-text news stories, separate these keywords into segments, and automatically build a segmented database. The system is used as part of a commercial news “clipping” and retrieval product. Preliminary results show improved accuracy, as well as reduced cost, resulting from these automated techniques.
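
A toy version of the segmentation idea (the segment names and word lists are hypothetical): derived keywords are sorted into distinct fields, so a query can discriminate among the roles keywords play rather than searching one flat list.

    # Hypothetical segment lexicons; the paper derives these with NLP rather
    # than fixed word lists.
    SEGMENTS = {
        "company":  {"ge", "rca", "ibm"},
        "event":    {"merger", "acquisition", "takeover"},
        "industry": {"aerospace", "electronics"},
    }

    def build_record(doc_id, keywords):
        # Place each keyword in its own field (segment) instead of one flat
        # keyword list.
        record = {"id": doc_id, "company": [], "event": [], "industry": []}
        for kw in keywords:
            for segment, vocabulary in SEGMENTS.items():
                if kw.lower() in vocabulary:
                    record[segment].append(kw)
                    break
        return record

    # build_record("doc-1", ["GE", "RCA", "takeover", "electronics"])
    # -> {'id': 'doc-1', 'company': ['GE', 'RCA'],
    #     'event': ['takeover'], 'industry': ['electronics']}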


Human Language Technology | 1991

Lexico-semantic pattern matching as a companion to parsing in text understanding

Paul S. Jacobs; George R. Krupka; Lisa F. Rau

Ordinarily, one thinks of the problem of natural language understanding as one of making a single, left-to-right pass through an input, producing a progressively refined and detailed interpretation. In text interpretation, however, the constraints of strict left-to-right processing are an encumbrance. Multi-pass methods, especially those that interpret words using corpus data and associate units of text with possible interpretations, can be more accurate and faster than single-pass methods of data extraction. Quality improves because corpus-based data and global context help to control false interpretations; speed improves because processing focuses on relevant sections.

The most useful forms of pre-processing for text interpretation use fairly superficial analysis that complements the style of ordinary parsing but uses much of the same knowledge base. Lexico-semantic pattern matching, with rules that combine lexical analysis with ordering and semantic categories, is a good method for this form of analysis. This type of pre-processing is efficient, takes advantage of corpus data, prevents many garden paths and fruitless parses, and helps the parser cope with the complexity and flexibility of real text.
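
A small sketch of a lexico-semantic pattern rule as the abstract characterizes it (the rule format and category table are hypothetical): each pattern mixes literal words with semantic categories and must match in order, marking a span with an interpretation before full parsing.

    # Hypothetical word -> semantic category table.
    CATEGORIES = {"ge": "COMPANY", "rca": "COMPANY",
                  "bought": "ACQUIRE-VERB", "acquired": "ACQUIRE-VERB"}

    # A pattern is an ordered list of elements: literal words in lower case,
    # semantic categories in upper case.
    TAKEOVER_PATTERN = ["COMPANY", "ACQUIRE-VERB", "COMPANY"]

    def match_pattern(pattern, tokens):
        # Slide the pattern across the token stream and return the first
        # span whose tokens satisfy every element in order.
        for start in range(len(tokens) - len(pattern) + 1):
            window = tokens[start:start + len(pattern)]
            if all(element == token.lower()
                   or CATEGORIES.get(token.lower()) == element
                   for element, token in zip(pattern, window)):
                return start, start + len(pattern)
        return None

    # match_pattern(TAKEOVER_PATTERN, "GE acquired RCA".split()) -> (0, 3)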


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1988

Natural language techniques for intelligent information retrieval

Paul S. Jacobs; Lisa F. Rau

Neither natural language processing nor information retrieval is any longer a young field, but the two areas have yet to achieve a graceful interaction. Mainly, the reason for this incompatibility is that information retrieval technology depends upon relatively simple but robust methods, while natural language processing involves complex knowledge-based systems that have never approached robustness. We provide an analysis of areas in which natural language and information retrieval come together, and describe a system that joins the two fields by combining technology, choice of application area, and knowledge acquisition techniques.

Collaboration


Dive into Lisa F. Rau's collaborations.

Top Co-Authors

Teruko Mitamura

Carnegie Mellon University


Tsuyoshi Kitani

Carnegie Mellon University
