Spence Green | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Spence Green is active.

Explore More

Publication

Featured researches published by Spence Green.

human factors in computing systems | 2013

The efficacy of human post-editing for language translation

Spence Green; Jeffrey Heer; Christopher D. Manning

Language translation is slow and expensive, so various forms of machine assistance have been devised. Automatic machine translation systems process text quickly and cheaply, but with quality far below that of skilled human translators. To bridge this quality gap, the translation industry has investigated post-editing, or the manual correction of machine output. We present the first rigorous, controlled analysis of post-editing and find that post-editing leads to reduced time and, surprisingly, improved quality for three diverse language pairs (English to Arabic, French, and German). Our statistical models and visualizations of experimental data indicate that some simple predictors (like source text part of speech counts) predict translation time, and that post-editing results in very different interaction patterns. From these results we distill implications for the design of new language translation interfaces.

Computational Linguistics | 2013

Parsing models for identifying multiword expressions

Spence Green; Marie-Catherine de Marneffe; Christopher D. Manning

Multiword expressions lie at the syntax/semantics interface and have motivated alternative theories of syntax like Construction Grammar. Until now, however, syntactic analysis and multiword expression identification have been modeled separately in natural language processing. We develop two structured prediction models for joint parsing and multiword expression identification. The first is based on context-free grammars and the second uses tree substitution grammars, a formalism that can store larger syntactic fragments. Our experiments show that both models can identify multiword expressions with much higher accuracy than a state-of-the-art system based on word co-occurrence statistics.We experiment with Arabic and French, which both have pervasive multiword expressions. Relative to English, they also have richer morphology, which induces lexical sparsity in finite corpora. To combat this sparsity, we develop a simple factored lexical representation for the context-free parsing model. Morphological analyses are automatically transformed into rich feature tags that are scored jointly with lexical items. This technique, which we call a factored lexicon, improves both standard parsing and multiword expression identification accuracy.

meeting of the association for computational linguistics | 2014

Word Segmentation of Informal Arabic with Domain Adaptation

Will Monroe; Spence Green; Christopher D. Manning

Segmentation of clitics has been shown to improve accuracy on a variety of Arabic NLP tasks. However, state-of-the-art Arabic word segmenters are either limited to formal Modern Standard Arabic, performing poorly on Arabic text featuring dialectal vocabulary and grammar, or rely on linguistic knowledge that is hand-tuned for each dialect. We extend an existing MSA segmenter with a simple domain adaptation technique and new features in order to segment informal and dialectal Arabic text. Experiments show that our system outperforms existing systems on newswire, broadcast news and Egyptian dialect, improvingsegmentationF1 scoreonarecently released Egyptian Arabic corpus to 95.1%, compared to 90.8% for another segmenter designed specifically for Egyptian Arabic.

workshop on statistical machine translation | 2014

Phrasal: A Toolkit for New Directions in Statistical Machine Translation

Spence Green; Daniel M. Cer; Christopher D. Manning

We present a new version of Phrasal, an open-source toolkit for statistical phrasebased machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like thebitext, and(c)web-basedinteractivemachine translation. A direct comparison with Moses shows favorable results in terms of decoding speed and tuning time.

empirical methods in natural language processing | 2014

Human Effort and Machine Learnability in Computer Aided Translation

Spence Green; Sida I. Wang; Jason Chuang; Jeffrey Heer; Sebastian Schuster; Christopher D. Manning

Analyses of computer aided translation typically focus on either frontend interfaces and human effort, or backend translation and machine learnability of corrections. However, this distinction is artificial in practice since the frontend and backend must work in concert. We present the first holistic, quantitative evaluation of these issues by contrasting two assistive modes: postediting and interactive machine translation (MT). We describe a new translator interface, extensive modifications to a phrasebased MT system, and a novel objective function for re-tuning to human corrections. Evaluation with professional bilingual translators shows that post-edit is faster than interactive at the cost of translation quality for French-English and EnglishGerman. However, re-tuning the MT system to interactive output leads to larger, statistically significant reductions in HTER versus re-tuning to post-edit. Analysis shows that tuning directly to HTER results in fine-grained corrections to subsequent machine output.

workshop on statistical machine translation | 2014

An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation

Spence Green; Daniel M. Cer; Christopher D. Manning

Scalable discriminative training methods are now broadly available for estimating phrase-based, feature-rich translation models. However, the sparse feature sets typically appearing in research evaluations are less attractive than standard dense features such as language and translation model probabilities: they often overfit, do not generalize, or require complex and slow feature extractors. This paper introduces extended features, which are more specific than dense features yet more general than lexicalized sparse features. Large-scale experiments show that extended features yield robust BLEU gains for both Arabic-English (+1.05) and Chinese-English (+0.67) relative to a strong feature-rich baseline. We also specialize the feature set to specific datadomains, identifyanobjectivefunction that is less prone to overfitting, and release fast, scalable, and language-independent tools for implementing the features.

user interface software and technology | 2014

Predictive translation memory: a mixed-initiative system for human language translation

Spence Green; Jason Chuang; Jeffrey Heer; Christopher D. Manning

The standard approach to computer-aided language translation is post-editing: a machine generates a single translation that a human translator corrects. Recent studies have shown this simple technique to be surprisingly effective, yet it underutilizes the complementary strengths of precision-oriented humans and recall-oriented machines. We present Predictive Translation Memory, an interactive, mixed-initiative system for human language translation. Translators build translations incrementally by considering machine suggestions that update according to the users current partial translation. In a large-scale study, we find that professional translators are slightly slower in the interactive mode yet produce slightly higher quality translations despite significant prior experience with the baseline post-editing condition. Our analysis identifies significant predictors of time and quality, and also characterizes interactive aid usage. Subjects entered over 99% of characters via interactive aids, a significantly higher fraction than that shown in previous work.

workshop on statistical machine translation | 2014

Stanford Universityâ€™s Submissions to the WMT 2014 Translation Task

Julia Neidert; Sebastian Schuster; Spence Green; Kenneth Heafield; Christopher D. Manning

We describe Stanford’s participation in the French-English and English-German tracks of the 2014 Workshop on Statistical Machine Translation (WMT). Our systems used large feature sets, word classes, and an optional unconstrained language model. Among constrained systems, ours performed the best according to uncased BLEU: 36.0% for French-English and 20.9% for English-German.

empirical methods in natural language processing | 2015

Hierarchical Incremental Adaptation for Statistical Machine Translation

Joern Wuebker; Spence Green; John DeNero

We present an incremental adaptation approach for statistical machine translation that maintains a flexible hierarchical domain structure within a single consistent model. Both weights and rules are updated incrementallyonastreamofpost-edits. Our multi-level domain hierarchy allows the system to adapt simultaneously towards local context at dierent levels of granularity, including genres and individual documents. Our experiments show consistent improvements in translation quality from all components of our approach.

meeting of the association for computational linguistics | 2016

Models and Inference for Prefix-Constrained Machine Translation

Joern Wuebker; Spence Green; John DeNero; Sasa Hasan; Minh-Thang Luong

We apply phrase-based and neural models to a core task in interactive machine translation: suggesting how to complete a partial translation. For the phrase-based system, we demonstrate improvements in suggestion quality using novel objective functions, learning techniques, and inference algorithms tailored to this task. Our contributions include new tunable metrics, an improved beam search strategy, an n-best extraction method that increases suggestion diversity, and a tuning procedure for a hierarchical joint model of alignment and translation. The combination of these techniques improves next-word suggestion accuracy dramatically from 28.5% to 41.2% in a large-scale English-German experiment. Our recurrent neural translation system increases accuracy yet further to 53.0%, but inference is two orders of magnitude slower. Manual error analysis shows the strengths and weaknesses of both approaches.

Explore More