David D. Palmer | Researchain

Publication

Featured research published by David D. Palmer.

Meeting of the Association for Computational Linguistics | 1997

A Trainable Rule-Based Algorithm for Word Segmentation

David D. Palmer

This paper presents a trainable rule-based algorithm for performing word segmentation. The algorithm provides a simple, language-independent alternative to large-scale lexical-based segmenters requiring large amounts of knowledge engineering. As a stand-alone segmenter, we show our algorithm to produce high performance Chinese segmentation. In addition, we show the transformation-based algorithm to be effective in improving the output of several existing word segmentation algorithms in three different languages.
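As a rough illustration of what applying an ordered list of segmentation rules can look like, here is a minimal sketch. It is not the paper's algorithm: the rule format (a pair of adjacent characters plus the boundary value to set), the function names, and the toy English input are all assumptions made for illustration, and rule learning is omitted entirely. The starting point treats every character as its own word, and each rule in turn rewrites the boundaries it matches.

```python
from typing import List, Tuple

# Hypothetical rule format: (left char, right char, boundary value to set).
# True means "word boundary between the two characters", False means "join".
Rule = Tuple[str, str, bool]


def apply_rules(text: str, boundaries: List[bool], rules: List[Rule]) -> List[bool]:
    """Apply rules in order; boundaries[i] sits between text[i] and text[i + 1]."""
    result = list(boundaries)
    for left, right, value in rules:
        for i in range(len(text) - 1):
            if text[i] == left and text[i + 1] == right:
                result[i] = value
    return result


def to_words(text: str, boundaries: List[bool]) -> List[str]:
    """Cut the text at every position whose boundary flag is True."""
    if not text:
        return []
    words, start = [], 0
    for i, is_boundary in enumerate(boundaries):
        if is_boundary:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


# Toy input: start with every character as its own word.
text = "thecatsat"
initial = [True] * (len(text) - 1)

# Hand-written rules standing in for learned ones (assumption).
rules: List[Rule] = [
    ("t", "h", False),
    ("h", "e", False),
    ("c", "a", False),
    ("a", "t", False),
    ("s", "a", False),
]

segmented = to_words(text, apply_rules(text, initial, rules))
```

Because the rules run in sequence, a later rule can override the effect of an earlier one, which is what lets this style of rule list correct the output of another segmenter as well as a naive starting point.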

Conference on Applied Natural Language Processing | 1994

Adaptive Sentence Boundary Disambiguation

David D. Palmer; Marti A. Hearst

Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. This work demonstrates the feasibility of using prior probabilities of part-of-speech assignments, as opposed to words or definite part-of-speech assignments, as contextual information. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.

Conference on Applied Natural Language Processing | 1997

A Statistical Profile of the Named Entity Task

David D. Palmer; David S. Day

In this paper we present a statistical profile of the Named Entity task, a specific information extraction task for which corpora in several languages are available. Using the results of the statistical analysis, we propose an algorithm for lower bound estimation for Named Entity corpora and discuss the significance of the cross-lingual comparisons provided by the analysis.

Computer Speech & Language | 2005

Improving Out-of-Vocabulary Name Resolution

David D. Palmer; Mari Ostendorf

This paper presents algorithms for generating targeted name lists for candidate out-of-vocabulary (OOV) words for applications in language processing, particularly speech recognition. Focusing on names, which are shown to be the dominant class of OOVs in news broadcasts, the approach involves offline generation of a large name list and online pruning based on a phonetic distance. The resulting list can be used in a rescoring pass in automatic speech recognition. We also show that a simple variation of the approach can be used to generate alternate name spellings, which may be useful for query expansion in information retrieval. By using a wide variety of sources, including automatic name phrase tagging of temporally relevant news text, OOV coverage can be improved by nearly a factor of two with only a 10% increase in the word list size. For one source, coverage increased from 13% to 94%. Phonetic pruning can be used to reduce the list size by an order of magnitude with only a small loss in coverage.
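The online pruning step can be pictured with a small sketch. The abstract does not say which phonetic distance is used, so the version below assumes a plain unit-cost edit distance over phoneme symbols; the function names, the threshold parameter, and the tiny ARPAbet-style lexicon are hypothetical and only serve the example.

```python
from typing import Dict, List, Sequence, Tuple


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phoneme symbols with unit costs (assumption)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete a phoneme from a
                            curr[j - 1] + 1,      # insert a phoneme from b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def prune_name_list(
    hypothesis: Sequence[str],
    name_lexicon: Dict[str, List[str]],
    max_distance: int,
) -> List[Tuple[str, int]]:
    """Keep names whose pronunciation is within max_distance of the hypothesis."""
    scored = [
        (name, phoneme_edit_distance(hypothesis, phones))
        for name, phones in name_lexicon.items()
    ]
    kept = [(name, dist) for name, dist in scored if dist <= max_distance]
    return sorted(kept, key=lambda item: (item[1], item[0]))


# Hypothetical offline name list with illustrative pronunciations.
lexicon = {
    "Palmer": ["P", "AA", "M", "ER"],
    "Palma": ["P", "AA", "M", "AH"],
    "Parker": ["P", "AA", "R", "K", "ER"],
    "Hearst": ["HH", "ER", "S", "T"],
}

# Phoneme sequence hypothesized for an OOV region (assumption).
candidates = prune_name_list(["P", "AA", "M", "ER"], lexicon, max_distance=1)
```

With this lexicon and a threshold of 1, only "Palmer" and "Palma" survive. A real system would likely weight substitutions by phonetic similarity rather than treating all phoneme pairs as equally distant.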

International Conference on Human Language Technology Research | 2001

Improving information extraction by modeling errors in speech recognizer output

David D. Palmer; Mari Ostendorf

In this paper we describe a technique for improving the performance of an information extraction system for speech data by explicitly modeling the errors in the recognizer output. The approach combines a statistical model of named entity states with a lattice representation of hypothesized words and errors annotated with recognition confidence scores. Additional refinements include the use of multiple error types, improved confidence estimation, and multipass processing. In combination, these techniques improve named entity recognition performance over a text-based baseline by 28%.

Speech Communication | 2000

Robust information extraction from automatically generated speech transcriptions

David D. Palmer; Mari Ostendorf; John D. Burger

This paper describes a robust system for information extraction (IE) from spoken language data. The system extends previous hidden Markov model (HMM) work in IE, using a state topology designed for explicit modeling of variable-length phrases and class-based statistical language model smoothing to produce state-of-the-art performance for a wide range of speech error rates. Experiments on broadcast news data show that the system performs well with temporal and source differences in the data. In addition, strategies for integrating word-level confidence estimates into the model are introduced, showing improved performance by using a generic error token for incorrectly recognized words in the training data and low confidence words in the test data.

Meeting of the Association for Computational Linguistics | 1998

Named Entity Scoring for Speech Input

John D. Burger; David D. Palmer; Lynette Hirschman

This paper describes a new scoring algorithm that supports comparison of linguistically annotated data from noisy sources. The new algorithm generalizes the Message Understanding Conference (MUC) Named Entity scoring algorithm, using a comparison based on explicit alignment of the underlying texts, followed by a scoring phase. The scoring procedure maps corresponding tagged regions and compares these according to tag type and tag extent, allowing us to reproduce the MUC Named Entity scoring for identical underlying texts. In addition, the new algorithm scores for content (transcription correctness) of the tagged region, a useful distinction when dealing with noisy data that may differ from a reference transcription (e.g., speech recognizer output). To illustrate the algorithm, we have prepared a small test data set consisting of a careful transcription of speech data and manual insertion of SGML named entity annotation. We report results for this small test corpus on a variety of experiments involving automatic speech recognition and named entity tagging.

Proceedings of the TIPSTER Text Program: Phase II | 1996

MITRE: Description of the Alembic System as Used in MET

John S. Aberdeen; John D. Burger; David S. Day; Lynette Hirschman; David D. Palmer; Patricia Robinson; Marc B. Vilain

Alembic is a comprehensive information extraction system that has been applied to a range of tasks. These include the now-standard components of the formal MUC evaluations: name tagging (NE in MUC-6), name normalization (TE), and template generation (ST). The system has also been exploited to help segment and index broadcast video and was used for early experiments on variants of the co-reference identification task. (For details, see [1].)

ANLP/NAACL-ReadingComp '00 Proceedings of the 2000 ANLP/NAACL Workshop on Reading comprehension tests as evaluation for computer-based language understanding systems - Volume 6 | 2000

Some challenges of developing fully-automated systems for taking audio comprehension exams

David D. Palmer

Audio comprehension tests are designed to help evaluate a listener's understanding of a spoken passage and are frequently a key component of language competency exams. Just as reading comprehension exams are proving useful in evaluating text-based language processing technology, audio comprehension exams can be used to evaluate spoken language processing systems. In this paper we discuss some of the challenges of developing automated systems for taking audio comprehension exams.

Computational Linguistics | 1997