Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Adam L. Berger is active.

Publication


Featured research published by Adam L. Berger.


Machine Learning | 1999

Statistical Models for Text Segmentation

Doug Beeferman; Adam L. Berger; John D. Lafferty

This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based on a technique that incrementally builds an exponential model to extract features that are correlated with the presence of boundaries in labeled training text. The models use two classes of features: topicality features that use adaptive language models in a novel way to detect broad changes of topic, and cue-word features that detect occurrences of specific words, which may be domain-specific, that tend to be used near segment boundaries. Assessment of our approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts. Quantitative results on these domains are presented using a new probabilistically motivated error metric, which combines precision and recall in a natural and flexible way. This metric is used to make a quantitative assessment of the relative contributions of the different feature types, as well as a comparison with decision trees and previously proposed text segmentation algorithms.
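The error metric mentioned above can be illustrated with a small sketch: sample pairs of word positions a fixed distance k apart and count how often the hypothesis disagrees with the reference about whether the two positions fall in the same segment. This is a minimal illustration in the spirit of that metric, not the paper's exact formulation; the segment-label representation and the window size k are assumptions made here.

def window_error(reference, hypothesis, k):
    # reference/hypothesis are lists of segment labels, one per word
    # position; positions sharing a label lie in the same segment.
    # Returns the fraction of position pairs (i, i+k) on which the
    # hypothesis and the reference disagree about same-segment status.
    assert len(reference) == len(hypothesis)
    errors, total = 0, 0
    for i in range(len(reference) - k):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += int(same_ref != same_hyp)
        total += 1
    return errors / total if total else 0.0

# Example: a hypothesis that misses the second boundary.
ref = [0] * 5 + [1] * 5 + [2] * 5
hyp = [0] * 5 + [1] * 10
print(window_error(ref, hyp, k=4))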


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2000

Bridging the lexical chasm: statistical approaches to answer-finding

Adam L. Berger; Rich Caruana; David A. Cohn; Dayne Freitag; Vibhu O. Mittal

This paper investigates whether a machine can automatically learn the task of finding, within a large collection of candidate responses, the answers to questions. The learning process consists of inspecting a collection of answered questions and characterizing the relation between question and answer with a statistical model. For the purpose of learning this relation, we propose two sources of data: Usenet FAQ documents and customer service call-center dialogues from a large retail company. We will show that the task of “answer-finding” differs from both document retrieval and traditional question-answering, presenting challenges different from those found in these problems. The central aim of this work is to discover, through theoretical and empirical investigation, those statistical techniques best suited to the answer-finding problem.
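As a rough illustration of bridging the lexical chasm, the sketch below ranks candidate answers by interpolating direct lexical overlap with a learned word-to-word association table. The table translation, the mixture weight alpha, and the smoothing are hypothetical ingredients for illustration, not the models evaluated in the paper.

import math

def score_answer(question, answer, translation, alpha=0.5, vocab_size=50000):
    # translation[q_word][a_word] is an estimated probability that an
    # answer word accounts for a question word. Each question word is
    # scored by mixing exact overlap with that association, then the
    # log-probabilities are summed; higher scores rank higher.
    answer_words = answer.lower().split()
    score = 0.0
    for q in question.lower().split():
        direct = answer_words.count(q) / len(answer_words)
        assoc = sum(translation.get(q, {}).get(a, 0.0) for a in answer_words) / len(answer_words)
        p = alpha * direct + (1 - alpha) * assoc + 1.0 / vocab_size
        score += math.log(p)
    return score

# Candidates would then be ranked by descending score:
# best = max(candidates, key=lambda a: score_answer(query, a, translation))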


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2000

OCELOT: a system for summarizing Web pages

Adam L. Berger; Vibhu O. Mittal

We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has focused on news articles, web pages are quite different in both structure and content. Instead of coherent text with a well-defined discourse structure, they are more likely to be a chaotic jumble of phrases, links, graphics and formatting commands. Such text provides little foothold for extractive summarization techniques, which attempt to generate a summary of a document by excerpting a contiguous, coherent span of text from it. This paper builds upon recent work in non-extractive summarization, producing the gist of a web page by “translating” it into a more concise representation rather than attempting to extract a text span verbatim. OCELOT uses probabilistic models to guide it in selecting and ordering words into a gist. This paper describes a technique for learning these models automatically from a collection of human-summarized web pages.
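As a loose illustration of non-extractive gisting, the sketch below selects salient words from a page and orders them under a bigram language model. Raw frequency as the salience score, the greedy ordering, and the bigram table are stand-ins assumed here; OCELOT's actual word-selection and ordering models are the subject of the paper.

from collections import Counter

def gist(page_words, bigram, length=10):
    # Select the most frequent longer words as the gist vocabulary,
    # then order them greedily, always appending the word the bigram
    # model finds most likely to follow the previous one.
    counts = Counter(w.lower() for w in page_words if len(w) > 3)
    chosen = [w for w, _ in counts.most_common(length)]
    if not chosen:
        return ""
    ordered = [chosen.pop(0)]
    while chosen:
        prev = ordered[-1]
        nxt = max(chosen, key=lambda w: bigram.get((prev, w), 0.0))
        chosen.remove(nxt)
        ordered.append(nxt)
    return " ".join(ordered)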


Human Language Technology | 1994

The Candide system for machine translation

Adam L. Berger; Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; John R. Gillett; John D. Lafferty; Robert L. Mercer; Harry Printz; Lubos Ures

We present an overview of Candide, a system for automatic translation of French text to English text. Candide uses methods of information theory and statistics to develop a probability model of the translation process. This model, which is made to accord as closely as possible with a large body of French and English sentence pairs, is then used to generate English translations of previously unseen French sentences. This paper provides a tutorial in these methods, discussions of the training and operation of the system, and a summary of test results.
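The probability model of the translation process is the source-channel decomposition of the IBM statistical translation work from which Candide grew. Writing f for a French sentence and e for a candidate English translation, decoding searches for

\hat{e} = \arg\max_{e} \Pr(e \mid f) = \arg\max_{e} \Pr(e)\,\Pr(f \mid e)

where \Pr(e) is an English language model and \Pr(f \mid e) a translation model whose parameters are fit to the aligned French-English sentence pairs.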


Meeting of the Association for Computational Linguistics | 2000

Query-relevant summarization using FAQs

Adam L. Berger; Vibhu O. Mittal

This paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have, but as a proxy, we use a collection of FAQ (frequently-asked question) documents. Taking a learning approach enables a principled, quantitative evaluation of the proposed system, and the results of some initial experiments---on a collection of Usenet FAQs and on a FAQ-like set of customer-submitted questions to several large retail companies---suggest the plausibility of learning for summarization.
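One natural way to write such a model, assumed here for illustration and consistent with the description above, scores a candidate summary s of a document d with respect to a query q as

p(s \mid d, q) \propto p(q \mid s)\, p(s \mid d)

where the first factor rewards relevance of the summary to the query, the second rewards fidelity of the summary to the document, and the query-relevant summary is the maximizer over candidates.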


International Conference on Acoustics, Speech and Signal Processing | 1998

Just-in-time language modelling

Adam L. Berger; Robert C. Miller

Traditional approaches to language modelling have relied on a fixed corpus of text to inform the parameters of a probability distribution over word sequences. Increasing the corpus size often leads to better-performing language models, but no matter how large, the corpus is a static entity, unable to reflect information about events which postdate it. We introduce an online paradigm which interleaves the estimation and application of a language model. We present a Bayesian approach to online language modelling, in which the marginal probabilities of a static trigram model are dynamically updated to match the topic being dictated to the system. We also describe the architecture of a prototype we have implemented which uses the World Wide Web (WWW) as a source of information, and provide results from some initial proof of concept experiments.
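A minimal sketch of the interleaving idea, assuming a simple fixed-weight mixture rather than the Bayesian update of trigram marginals described above: the static trigram estimate is combined with a unigram model re-estimated from documents retrieved for the current topic.

def jit_probability(w, history, static_trigram, topical_unigram, lam=0.8):
    # history is the list of preceding words (at least two).
    # static_trigram maps (w1, w2, w3) to a probability estimated
    # offline; topical_unigram maps a word to its relative frequency
    # in just-retrieved, topic-matched documents. The floor values
    # and the weight lam are illustrative assumptions.
    p_static = static_trigram.get((history[-2], history[-1], w), 1e-7)
    p_topic = topical_unigram.get(w, 1e-7)
    return lam * p_static + (1 - lam) * p_topic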


International Conference on Acoustics, Speech and Signal Processing | 1998

Cyberpunc: a lightweight punctuation annotation system for speech

Doug Beeferman; Adam L. Berger; John D. Lafferty

This paper describes a lightweight method for the automatic insertion of intra-sentence punctuation into text. Despite the intuition that pauses in an acoustic stream are a positive indicator for some types of punctuation, this work will demonstrate the feasibility of a system which relies solely on lexical information. Besides its potential role in a speech recognition system, such a system could serve equally well in non-speech applications such as automatic grammar correction in a word processor and parsing of spoken text. After describing the design of a punctuation-restoration system, which relies on a trigram language model and a straightforward application of the Viterbi algorithm, we summarize results, both quantitative and subjective, of the performance and behavior of a prototype system.
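A minimal sketch of lexical-only punctuation restoration, using a bigram model for brevity: under a bigram model the per-gap decisions decouple, so comparing the scores with and without a comma at each gap is exact. The trigram model described above couples neighbouring decisions and motivates the Viterbi search the paper uses; the bigram table and smoothing floor are assumptions here.

import math

def restore_commas(words, bigram, comma=","):
    # For each gap between adjacent words, insert a comma exactly when
    # the language model prefers the detour through the comma token.
    def lp(a, b):
        return math.log(bigram.get((a, b), 1e-6))
    out = [words[0]]
    for prev, cur in zip(words, words[1:]):
        if lp(prev, comma) + lp(comma, cur) > lp(prev, cur):
            out.append(comma)
        out.append(cur)
    return " ".join(out).replace(" " + comma, comma)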


Human-Computer Interaction with Mobile Devices and Services | 2004

Automatic Partitioning of Web Pages Using Clustering

Richard Romero; Adam L. Berger

This paper introduces a method for automatically partitioning richly-formatted electronic documents. An automatic partitioning system has many potential uses, but we focus here on one: dividing web content into fragments small enough to be delivered to and rendered on a mobile phone or PDA. The segmentation algorithm is analyzed on both theoretical and empirical grounds, with a suite of measurements.
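As a simplified stand-in for the clustering algorithm analyzed in the paper, the sketch below greedily groups adjacent content blocks of a page into fragments that fit a byte budget suitable for a small device; the budget and the adjacency-only grouping are assumptions made for illustration.

def partition_blocks(blocks, max_bytes=2048):
    # blocks is a list of text blocks in document order. Adjacent
    # blocks are merged into a fragment until adding the next block
    # would exceed the byte budget, at which point a new fragment starts.
    fragments, current, size = [], [], 0
    for block in blocks:
        b = len(block.encode("utf-8"))
        if current and size + b > max_bytes:
            fragments.append(current)
            current, size = [], 0
        current.append(block)
        size += b
    if current:
        fragments.append(current)
    return fragments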


Human-Computer Interaction with Mobile Devices and Services | 2004

CENTAUR: A Two-Panel User Interface for Mobile Document Access

Greg Schohn; Adam L. Berger

This paper introduces a novel user interface designed to mitigate some of the usability problems in mobile web access. The interface consists of two side-by-side panels for representing a richly-formatted document (e.g. a web page). In one panel is a “wide angle” view of the entire document, partitioned into regions. In the other panel is a “zoomed in” view of the currently-active region. We describe a prototype system, called Centaur, which automatically and in real-time reformats electronic documents into this side-by-side view.


Computational Linguistics | 1996

A maximum entropy approach to natural language processing

Adam L. Berger; Vincent J. Della Pietra; Stephen A. Della Pietra

Collaboration


Dive into Adam L. Berger's collaborations.

Top Co-Authors

John D. Lafferty (Carnegie Mellon University)

Doug Beeferman (Carnegie Mellon University)

Vibhu O. Mittal (Jordan University of Science and Technology)