Publication


Featured research published by Matthew G. Snover.


Workshop on Statistical Machine Translation | 2009

Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric

Matthew G. Snover; Nitin Madnani; Bonnie J. Dorr; Richard M. Schwartz

Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the correlation of the scores they assign to MT output with human judgments of translation performance. Different types of human judgments, such as Fluency, Adequacy, and HTER, measure varying aspects of MT performance that can be captured by automatic MT metrics. We explore these differences through the use of a new tunable MT metric: TER-Plus, which extends the Translation Edit Rate evaluation metric with tunable parameters and the incorporation of morphology, synonymy, and paraphrases. TER-Plus was shown to be one of the top metrics in NIST's Metrics MATR 2008 Challenge, having the highest average rank in terms of Pearson and Spearman correlation. Optimizing TER-Plus to different types of human judgments yields significantly improved correlations and meaningful changes in the weight of different types of edits, demonstrating significant differences between the types of human judgments.


Machine Translation | 2009

TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate

Matthew G. Snover; Nitin Madnani; Bonnie J. Dorr; Richard M. Schwartz

This paper describes a new evaluation metric, TER-Plus (TERp) for automatic evaluation of machine translation (MT). TERp is an extension of Translation Edit Rate (TER). It builds on the success of TER as an evaluation metric and alignment tool and addresses several of its weaknesses through the use of paraphrases, stemming, synonyms, as well as edit costs that can be automatically optimized to correlate better with various types of human judgments. We present a correlation study comparing TERp to BLEU, METEOR and TER, and illustrate that TERp can better evaluate translation adequacy.
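The abstract above describes TERp as an extension of Translation Edit Rate, which is at its core an edit-distance metric. A minimal sketch of that core computation, restricted to insertions, deletions, and substitutions, might look like the following; the full metric also permits block shifts, and TERp further adds paraphrase, stemming, and synonym matching with tunable edit costs, all omitted here. The function name is illustrative, not from the paper.

```python
def ter_no_shifts(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    A simplified stand-in for TER: real TER additionally allows
    block shifts of phrases at a fixed edit cost.
    """
    hyp, ref = hypothesis.split(), reference.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a hypothesis word
                          d[i][j - 1] + 1,         # insert a reference word
                          d[i - 1][j - 1] + cost)  # substitute or match
    return d[len(hyp)][len(ref)] / max(len(ref), 1)

print(ter_no_shifts("the cat sat", "the cat sat on the mat"))  # 0.5
```

Here three insertions are needed against a six-word reference, giving a TER of 0.5; lower scores indicate less post-editing effort.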


Empirical Methods in Natural Language Processing | 2008

Language and Translation Model Adaptation using Comparable Corpora

Matthew G. Snover; Bonnie J. Dorr; Richard M. Schwartz

Traditionally, statistical machine translation systems have relied on parallel bi-lingual data to train a translation model. While bi-lingual parallel data are expensive to generate, monolingual data are relatively common. Yet monolingual data have been under-utilized, having been used primarily for training a language model in the target language. This paper describes a novel method for utilizing monolingual target data to improve the performance of a statistical machine translation system on news stories. The method exploits the existence of comparable text: multiple texts in the target language that discuss the same or similar stories as found in the source language document. For every source document that is to be translated, a large monolingual data set in the target language is searched for documents that might be comparable to the source document. These documents are then used to adapt the MT system to increase the probability of generating texts that resemble the comparable document. Experimental results obtained by adapting both the language and translation models show substantial gains over the baseline system.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Reranking for Sentence Boundary Detection in Conversational Speech

Brian Roark; Yang Liu; Mary P. Harper; Robin Stewart; Matthew Lease; Matthew G. Snover; Izhak Shafran; Bonnie J. Dorr; John Hale; Anna Krasnyanskaya; Lisa Yung

We present a reranking approach to sentence-like unit (SU) boundary detection, one of the EARS metadata extraction tasks. Techniques for generating relatively small n-best lists with high oracle accuracy are presented. For each candidate, features are derived from a range of information sources, including the output of a number of parsers. Our approach yields significant improvements over the best performing system from the NIST RT-04F community evaluation.
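The n-best reranking scheme described above reduces to scoring each candidate's feature vector with a learned weight vector and keeping the top-scoring candidate. A minimal sketch follows; the feature names and weights are made up for illustration and do not come from the paper.

```python
def rerank(nbest, weights):
    """Pick the best candidate from an n-best list.

    nbest:   list of (candidate, feature_dict) pairs
    weights: learned feature weights, feature name -> float
    """
    def score(features):
        # Linear model: dot product of features and weights.
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return max(nbest, key=lambda item: score(item[1]))[0]

# Hypothetical candidates with language-model and parser-derived features.
nbest = [
    ("cand_a", {"lm_score": -12.0, "parser_score": -3.0}),
    ("cand_b", {"lm_score": -11.5, "parser_score": -5.0}),
]
weights = {"lm_score": 1.0, "parser_score": 0.5}
print(rerank(nbest, weights))  # cand_a
```

The value of the approach lies in the features: because the n-best lists are small but have high oracle accuracy, expensive evidence such as full parser output can be computed per candidate and folded into the score.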


North American Chapter of the Association for Computational Linguistics | 2004

A lexically-driven algorithm for disfluency detection

Matthew G. Snover; Bonnie J. Dorr; Richard M. Schwartz

This paper describes a transformation-based learning approach to disfluency detection in speech transcripts using primarily lexical features. Our method produces comparable results to two other systems that make heavy use of prosodic features, thus demonstrating that reasonable performance can be achieved without extensive prosodic cues. In addition, we show that it is possible to facilitate the identification of less frequently disfluent discourse markers by taking speaker style into account.


Meeting of the Association for Computational Linguistics | 2006

PCFGs with Syntactic and Prosodic Indicators of Speech Repairs

John Hale; Izhak Shafran; Lisa Yung; Bonnie J. Dorr; Mary P. Harper; Anna Krasnyanskaya; Matthew Lease; Yang Liu; Brian Roark; Matthew G. Snover; Robin Stewart

A grammatical method of combining two kinds of speech repair cues is presented. One cue, prosodic disjuncture, is detected by a decision tree-based ensemble classifier that uses acoustic cues to identify where normal prosody seems to be interrupted (Lickley, 1996). The other cue, syntactic parallelism, codifies the expectation that repairs continue a syntactic category that was left unfinished in the reparandum (Levelt, 1983). The two cues are combined in a Treebank PCFG whose states are split using a few simple tree transformations. Parsing performance on the Switchboard and Fisher corpora suggests that these two cues help to locate speech repairs in a synergistic way.


Machine Translation | 2009

Expected dependency pair match: predicting translation quality with expected syntactic structure

Jeremy G. Kahn; Matthew G. Snover; Mari Ostendorf

Recent efforts to develop new machine translation evaluation methods have tried to account for allowable wording differences either in terms of syntactic structure or synonyms/paraphrases. This paper primarily considers syntactic structure, combining scores from partial syntactic dependency matches with standard local n-gram matches using a statistical parser, and taking advantage of N-best parse probabilities. The new scoring metric, expected dependency pair match (EDPM), is shown to outperform BLEU and TER in terms of correlation to human judgments and as a predictor of HTER. Further, we combine the syntactic features of EDPM with the alternative wording features of TERp, showing a benefit to accounting for syntactic structure on top of semantic equivalency features.
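At its simplest, a dependency-pair match score like the one described above is an F-measure over dependency triples extracted from hypothesis and reference parses. The sketch below shows that simplified core; the actual EDPM metric additionally weights pairs by N-best parse probabilities and combines the result with n-gram scores, which this single-parse version omits.

```python
def dep_pair_f1(hyp_pairs, ref_pairs):
    """F1 over (head, dependent, label) triples from two parses.

    Simplified stand-in for a dependency-pair match score: real EDPM
    uses expected counts over N-best parses rather than a 1-best set.
    """
    hyp, ref = set(hyp_pairs), set(ref_pairs)
    if not hyp or not ref:
        return 0.0
    overlap = len(hyp & ref)
    p, r = overlap / len(hyp), overlap / len(ref)
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical parses for "the cat sat on the mat" vs. a hypothesis
# that dropped the determiner of "mat".
hyp = {("sat", "cat", "nsubj"), ("sat", "mat", "obl")}
ref = {("sat", "cat", "nsubj"), ("sat", "mat", "obl"), ("mat", "the", "det")}
print(dep_pair_f1(hyp, ref))  # 0.8
```

Matching on dependency pairs rather than surface n-grams is what lets the metric credit translations that express the same predicate-argument structure with different word order.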


Conference of the Association for Machine Translation in the Americas | 2005

A Study of Translation Error Rate with Targeted Human Annotation

Matthew G. Snover; Bonnie J. Dorr; Richard M. Schwartz; Linnea Micciulla; Ralph M. Weischedel


Language Resources and Evaluation | 2006

SParseval: Evaluation Metrics for Parsing Speech

Brian Roark; Mary P. Harper; Eugene Charniak; Bonnie J. Dorr; Mark Johnson; Jeremy G. Kahn; Yang Liu; Mari Ostendorf; John Hale; Anna Krasnyanskaya; Matthew Lease; Izhak Shafran; Matthew G. Snover; Robin Stewart; Lisa Yung


Archive | 2005

2005 Johns Hopkins Summer Workshop Final Report on Parsing and Spoken Structural Event Detection

Mary P. Harper; Bonnie J. Dorr; John Hale; Brian Roark; Izhak Shafran; Matthew Lease; Yang Liu; Matthew G. Snover; Lisa Yung; Anna Krasnyanskaya; Robin Stewart

Collaboration

Dive into Matthew G. Snover's collaborations.

Top Co-Authors

Matthew Lease (University of Texas at Austin)

Yang Liu (University of Texas at Dallas)

Lisa Yung (Johns Hopkins University)

Izhak Shafran (Johns Hopkins University)