Mary S. Neff | Researchain

Publication

Featured research published by Mary S. Neff.

North American Chapter of the Association for Computational Linguistics | 2000

Multi-document summarization by visualizing topical content

Rie Kubota Ando; Branimir Boguraev; Roy J. Byrd; Mary S. Neff

This paper describes a framework for multi-document summarization which combines three premises: coherent themes can be identified reliably; highly representative themes, running across subsets of the document collection, can function as multi-document summary surrogates; and effective end-use of such themes should be facilitated by a visualization environment which clarifies the relationship between themes and documents. We present algorithms that formalize our framework, describe an implementation, and demonstrate a prototype system and interface.

North American Chapter of the Association for Computational Linguistics | 2003

The Talent system: TEXTRACT architecture and data model

Mary S. Neff; Roy J. Byrd; Branimir Boguraev

We present the architecture and data model for TEXTRACT, a document analysis framework for text analysis components. The framework and components have been deployed in research and industrial environments for text analysis and text mining tasks.

Natural Language Engineering | 2005

Visualization-enabled multi-document summarization by Iterative Residual Rescaling

Rie Kubota Ando; Branimir Boguraev; Roy J. Byrd; Mary S. Neff

This paper describes a novel approach to multi-document summarization, which explicitly addresses the problem of detecting, and retaining for the summary, multiple themes in document collections. We place equal emphasis on the processes of theme identification and theme presentation. For the former, we apply Iterative Residual Rescaling (IRR); for the latter, we argue for graphical display elements. IRR is an algorithm designed to account for correlations between words and to construct multi-dimensional topical space indicative of relationships among linguistic objects (documents, phrases, and sentences). Summaries are composed of objects with certain properties, derived by exploiting the many-to-many relationships in such a space. Given their inherent complexity, our multi-faceted summaries benefit from a visualization environment. We discuss some essential features of such an environment.

Conference on Applied Natural Language Processing | 1988

Creating and querying lexical data bases

Mary S. Neff; Roy J. Byrd; Omneya A. Rizk

Users of computerized dictionaries require powerful and flexible tools for analyzing and manipulating the information in them. This paper discusses a system for grammatically describing and parsing entries from machine-readable dictionary tapes and a lexical data base representation for storing the dictionary information. It also describes a language for querying, formatting, and maintaining dictionaries and other lexical data stored with that representation.

Natural Language Engineering | 2004

The Talent system: TEXTRACT architecture and data model

Mary S. Neff; Roy J. Byrd; Branimir Boguraev

We present the architecture and data model for TEXTRACT, a robust, scalable and configurable document analysis framework. TEXTRACT has been engineered as a pipeline architecture, allowing for rapid prototyping and application development by freely mixing reusable, existing, language analysis plugins and custom, new, plugins with customizable functionality. We discuss design issues which arise from requirements of industrial strength efficiency and scalability, and which are further constrained by plugin interactions, both among themselves, and with a common data model comprising an annotation store, document vocabulary and a lexical cache. We exemplify some of these by focusing on a meta-plugin: an interpreter for annotation-based finite state transduction, through which many linguistic filters can be implemented as stand-alone plugins. The framework and component plugins have been extensively deployed in both research and industrial environments, for a broad range of text analysis and mining tasks.

Conference on Information and Knowledge Management | 2008

Semi-automated logging of contact center telephone calls

Roy J. Byrd; Mary S. Neff; Wilfried Teiken; Youngja Park; Keh-Shin F. Cheng; Stephen C. Gates; Karthik Visweswariah

Modern businesses use contact centers as a communication channel with users of their products and services. The largest factor in the expense of running a telephone contact center is the labor cost of its agents. IBM Research has built a new system, Contact-Center Agent Buddies (CAB), which is designed to help reduce the average handle time (AHT) for customer calls, thereby also reducing their cost. In this paper, we focus on the call logging subsystem, which helps agents reduce the time they spend documenting those calls. We built a Template CAB and a Call Logging CAB, using a pipeline consisting of audio capture of a telephone conversation, automatic speech recognition, text analysis, and log generation. We developed techniques for ASR text cleansing, including normalization of expressions and acronyms, domain terms, capitalization, and boundaries for sentences, paragraphs, and call segments. We found that simple heuristics suffice to generate high-quality logs from the normalized sentences. The pipeline yields a candidate call log which the agents can edit in less time than it takes them to generate call logs manually. Evaluation of the Call Logging CAB in an industrial contact center environment shows that it reduces the amount of time agents spend logging calls by at least 50% without compromising the quality of the resulting call documentation.

Hawaii International Conference on System Sciences | 1999

ASHRAM: active summarization and markup

Mary S. Neff; James W. Cooper

Typically, searching for information in a document collection amounts to refining a query and then scanning a large number of documents to determine their relevance. Active Summarization Having Related Active Markup (ASHRAM) is a facility for representing and automatically selecting, marking, and linking useful and/or salient items in a document, to make it easier for the user to determine the main points in a document or navigate through documents without having to read all of them. ASHRAM is a novel client server system and user interface consisting of dynamically generated HTML, JavaScript and Java which requests information from a document database stored on a server. We describe a system for summarization by sentence extraction and a user interface for representation that allows the user to exploit the summary not only as an aid for relevance assessment of documents, but as an active aid to document navigation. The server-based scalable text summarization and keyword extraction system uses Natural Language Processing (NLP) technology and corpus-based NLP techniques in the foreground and databases constructed using NLP technology in the background.

Language Resources and Evaluation | 2010

A framework for traversing dense annotation lattices

Branimir Boguraev; Mary S. Neff

Pattern matching, or querying, over annotations is a general purpose paradigm for inspecting, navigating, mining, and transforming annotation repositories, the common representation basis for modern pipelined text processing architectures. The open-ended nature of these architectures and expressiveness of feature structure-based annotation schemes account for the natural tendency of such annotation repositories to become very dense, as multiple levels of analysis get encoded as layered annotations. This particular characteristic presents challenges for the design of a pattern matching framework capable of interpreting ‘flat’ patterns over arbitrarily dense annotation lattices. We present an approach where a finite state device applies (compiled) pattern grammars over what is, in effect, a linearized ‘projection’ of a particular route through the lattice. The route is derived by a mix of static grammar analysis and runtime interpretation of navigational directives within an extended grammar formalism; it selects just the annotation sequence appropriate for the patterns at hand. For expressive and efficient pattern matching in dense annotation stores, our implemented approach achieves a mix of lattice traversal and finite state scanning by exposing a language which, to its user, provides constructs for specifying sequential, structural, and configurational constraints among annotations.

International Conference on Computational Linguistics | 2000

The effects of analysing cohesion on document summarisation

Branimir Boguraev; Mary S. Neff

We argue that in general, the analysis of lexical cohesion factors in a document can drive a summarizer, as well as enable other content characterization tasks. More narrowly, this paper focuses on how one particular cohesion factor, simple lexical repetition, can enhance an existing sentence extraction summarizer, by enabling strategies for overcoming some particularly jarring end-user effects in the summaries, typically due to coherence degradation, readability deterioration, and topical under-representation. Lexical repetition is instrumental to, among other things, the topical make-up of a text, and in our framework a lexical repetition-based model of discourse segmentation, capable of detecting topic shifts, is integrated with a linguistically-aware summarizer utilizing notions of salience and dynamically-adjustable summary size. We show that even by leveraging lexical repetition alone, summaries are of comparable, and under certain conditions better, quality than the ones delivered by a state-of-the-art summarizer. This is encouraging for a broad research platform focusing on the recognition and use of cohesive devices in text for a range of content characterisation and document management tasks.

Archive | 1998