Todd Ward | Researchain

Publication

Featured research published by Todd Ward.

meeting of the association for computational linguistics | 2002

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni; Salim Roukos; Todd Ward; Wei-Jing Zhu

Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that cannot be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.

international conference on acoustics speech and signal processing | 1996

Statistical natural language understanding using hidden clumpings

Mark E. Epstein; Kishore Papineni; Salim Roukos; Todd Ward; S. Della Pietra

We present a new approach to natural language understanding (NLU) based on the source-channel paradigm, and apply it to ARPA's Air Travel Information Service (ATIS) domain. The model uses techniques similar to those used by IBM in statistical machine translation. The parameters are trained using the exact match algorithm; a hierarchy of models is used to facilitate the bootstrapping of more complex models from simpler models.

IEEE Transactions on Speech and Audio Processing | 2004

Automatic recognition of spontaneous speech for access to multilingual oral history archives

William Byrne; David S. Doermann; Martin Franz; Samuel Gustman; Jan Hajic; Douglas W. Oard; Michael Picheny; Josef Psutka; Bhuvana Ramabhadran; Dagobert Soergel; Todd Ward; Wei-Jing Zhu

Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65-84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with manually annotated boundary positions at a 0.35 false alarm rate. Categorization was considerably more challenging, with a nearest-neighbor technique yielding F=0.3. This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of that difference. The paper concludes with a description of how these capabilities could be used together to search large collections of recorded oral histories.

international acm sigir conference on research and development in information retrieval | 2001

Unsupervised and supervised clustering for topic tracking

Martin Franz; Todd Ward; J. Scott McCarley; Wei-Jing Zhu

We investigate important differences between two styles of document clustering in the context of Topic Detection and Tracking. Converting a Topic Detection system into a Topic Tracking system exposes fundamental differences between these two tasks that are important to consider in both the design and the evaluation of TDT systems. We also identify features that can be used in systems for both tasks.

ieee automatic speech recognition and understanding workshop | 1997

Towards a universal speech recognizer for multiple languages

Paul S. Cohen; Satya Dharanipragada; J. Gros; M. Monkowski; Chalapathy Neti; Salim Roukos; Todd Ward

We describe our initial efforts in building a universal recognizer for multiple languages that permits a user to switch languages seamlessly in a single session without requiring any switch in the speech recognition system. Towards this end we have begun building a universal speech recognizer for English and French languages. We experiment with a universal phonology for both French and English and describe speech recognition results for the ATIS task using a combined phonology. Our best results so far show about 5% relative performance degradation for English relative to a purely English system with about twice the vocabulary size and a 9% relative degradation in French relative to a purely French system.

meeting of the association for computational linguistics | 1997

Fertility Models for Statistical Natural Language Understanding

Stephen A. Della Pietra; Mark E. Epstein; Salim Roukos; Todd Ward

Several recent efforts in statistical natural language understanding (NLU) have focused on generating clumps of English words from semantic meaning concepts (Miller et al., 1995; Levin and Pieraccini, 1995; Epstein et al., 1996; Epstein, 1996). This paper extends the IBM Machine Translation Group's concept of fertility (Brown et al., 1993) to the generation of clumps for natural language understanding. The basic underlying intuition is that a single concept may be expressed in English as many disjoint clumps of words. We present two fertility models which attempt to capture this phenomenon. The first is a Poisson model which leads to appealing computational simplicity. The second is a general nonparametric fertility model. The general model's parameters are bootstrapped from the Poisson model and updated by the EM algorithm. These fertility models can be used to impose clump fertility structure on top of preexisting clump generation models. Here, we present results for adding fertility structure to unigram, bigram, and headword clump generation models on ARPA's Air Travel Information Service (ATIS) domain.
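
As a rough illustration of the Poisson fertility idea mentioned in the abstract above, the sketch below computes the probability that a concept generates n clumps under a Poisson distribution. The function name `poisson_fertility` and the per-concept mean `lam` are assumptions made for illustration; the abstract does not give the paper's actual parameterization.

```python
import math


def poisson_fertility(n: int, lam: float) -> float:
    """Poisson probability that a concept produces n clumps.

    lam is a hypothetical per-concept mean number of clumps; it is an
    illustrative assumption, not a value taken from the paper.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return math.exp(-lam) * lam ** n / math.factorial(n)


# Probability of 0, 1, 2, and 3 clumps for an assumed mean of 1.2.
for n in range(4):
    print(n, round(poisson_fertility(n, 1.2), 4))
```

A single mean parameter per concept is what makes a Poisson model computationally simple; a nonparametric model would instead store a separate probability for each clump count.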

international acm sigir conference on research and development in information retrieval | 2001

Quantifying the utility of parallel corpora

Martin Franz; J. Scott McCarley; Todd Ward; Wei-Jing Zhu

Our English-Chinese cross-language IR system is trained from parallel corpora; we investigate its performance as a function of training corpus size for three different training corpora. We find that the performance of the system as trained on the three parallel corpora can be related by a simple measure, namely the out-of-vocabulary rate of query words.
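
The out-of-vocabulary rate named in the abstract above can be sketched as the fraction of query word tokens that never appear in the training vocabulary. This is a minimal sketch under that assumption; the function name `oov_rate` and the toy data are hypothetical, and the paper may define the measure differently (for example, over word types rather than tokens).

```python
def oov_rate(query_words, training_vocab):
    """Fraction of query word tokens absent from the training vocabulary."""
    if not query_words:
        return 0.0
    missing = sum(1 for word in query_words if word not in training_vocab)
    return missing / len(query_words)


# Toy example: two of the four query words are unseen in training.
vocab = {"trade", "talks", "beijing", "economy"}
query = ["beijing", "trade", "tariffs", "summit"]
print(oov_rate(query, vocab))
```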

Archive | 2002

Segmentation and Detection at IBM

Satya Dharanipragada; Martin Franz; Jeffrey Scott McCarley; Todd Ward; Wei-Jing Zhu

IBM’s story segmentation uses a combination of decision tree and maximum entropy models. They take a variety of lexical, prosodic, semantic, and structural features as their inputs. Both types of models are source-specific, and we substantially lower C_seg by combining them. IBM’s topic detection system introduces a minimal hierarchy into the clustering: each cluster is comprised of one or more microclusters. We investigate the importance of merging microclusters together, and propose a merging strategy which improves our performance.

north american chapter of the association for computational linguistics | 2003

TIPS: a translingual information processing system

Yaser Al-Onaizan; Radu Florian; Martin Franz; Hany Hassan; Young-Suk Lee; J. Scott McCarley; Kishore Papineni; Salim Roukos; Jeffrey S. Sorensen; Christoph Tillmann; Todd Ward; Fei Xia

Searching online information is increasingly a daily activity for many people. The multilinguality of online content is also increasing (e.g. the proportion of English web users, which has been decreasing as a fraction of the increasing population of web users, dipped below 50% in the summer of 2001). To improve the ability of an English speaker to search multilingual content, we built a system that supports cross-lingual search of an Arabic newswire collection and provides on-demand translation of Arabic web pages into English. The cross-lingual search engine supports a fast search capability (sub-second response for typical queries) and achieves state-of-the-art performance in the high precision region of the result list. The on-demand statistical machine translation uses the Direct Translation model along with a novel statistical Arabic Morphological Analyzer to yield state-of-the-art translation quality. The on-demand SMT uses an efficient dynamic programming decoder that achieves reasonable speed for translating web documents.

north american chapter of the association for computational linguistics | 2009

Improving Coreference Resolution by Using Conversational Metadata

Xiaoqiang Luo; Radu Florian; Todd Ward

In this paper, we propose the use of metadata contained in documents to improve coreference resolution. Specifically, we quantify the impact of speaker and turn information on the performance of our coreference system, and show that the metadata can be effectively encoded as features of a statistical resolution system, which leads to a statistically significant improvement in performance.
