Jonathan Weese
Johns Hopkins University
Publications
Featured research published by Jonathan Weese.
Workshop on Statistical Machine Translation | 2009
Zhifei Li; Chris Callison-Burch; Chris Dyer; Sanjeev Khudanpur; Lane Schwartz; Wren N. G. Thornton; Jonathan Weese; Omar F. Zaidan
We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context free grammars (SCFGs): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We demonstrate that the toolkit achieves state-of-the-art translation performance on the WMT09 French-English translation task.
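The pruning step named above lends itself to a compact illustration. Below is a minimal sketch of cube pruning over a single chart cell, assuming purely additive scores and ignoring non-local costs such as the language model; it is not Joshua's implementation, only an illustration of the lazy best-first enumeration the abstract refers to, with toy rules and hypotheses made up for the example.

```python
import heapq

def cube_prune(rules, left_items, right_items, k):
    """Lazily enumerate up to k best (rule, left, right) combinations.
    Assumes each list is sorted best-first and that the combined score is
    simply the sum of the parts; real decoders also fold in non-local costs
    such as the language model, which makes the search approximate."""
    def score(i, j, l):
        return rules[i][0] + left_items[j][0] + right_items[l][0]

    seen = {(0, 0, 0)}
    frontier = [(-score(0, 0, 0), (0, 0, 0))]   # max-heap via negated scores
    results = []
    while frontier and len(results) < k:
        neg, (i, j, l) = heapq.heappop(frontier)
        results.append((-neg, rules[i][1], left_items[j][1], right_items[l][1]))
        # push the three neighbours of the popped corner of the "cube"
        for di, dj, dl in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            n = (i + di, j + dj, l + dl)
            if n[0] < len(rules) and n[1] < len(left_items) \
                    and n[2] < len(right_items) and n not in seen:
                seen.add(n)
                heapq.heappush(frontier, (-score(*n), n))
    return results

# Toy inputs: (score, payload) pairs, already sorted best-first.
rules = [(-0.2, "X -> <X1 X2, X2 X1>"), (-0.9, "X -> <X1 X2, X1 X2>")]
left  = [(-0.1, "the house"), (-1.3, "a house")]
right = [(-0.4, "blue"), (-0.8, "navy")]
print(cube_prune(rules, left, right, k=3))
```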
Meeting of the Association for Computational Linguistics | 2009
Zhifei Li; Chris Callison-Burch; Chris Dyer; Juri Ganitkevitch; Sanjeev Khudanpur; Lane Schwartz; Wren N. G. Thornton; Jonathan Weese; Omar F. Zaidan
We describe Joshua (Li et al., 2009a), an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for translation via synchronous context free grammars (SCFGs): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We also provide a demonstration outline for illustrating the toolkit's features to potential users, whether they be newcomers to the field or power users interested in extending the toolkit.
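Both Joshua papers above also mention k-best extraction. The brute-force sketch below shows what that step computes on a toy packed forest; real decoders use far more efficient lazy algorithms (e.g., Huang and Chiang, 2005), and the forest, rules, and scores here are invented purely for illustration.

```python
import heapq
from itertools import product

def kbest(node, forest, k):
    """Return up to k (score, derivation) pairs for `node`, best first.
    Naive version: combine the k-best lists of the tail nodes of every
    incoming hyperedge and keep the top k results."""
    if node not in forest:                      # leaf: a single trivial derivation
        return [(0.0, node)]
    candidates = []
    for score, rule, tails in forest[node]:     # each incoming hyperedge
        tail_lists = [kbest(t, forest, k) for t in tails]
        for combo in product(*tail_lists):
            total = score + sum(s for s, _ in combo)
            candidates.append((total, (rule, [d for _, d in combo])))
    return heapq.nlargest(k, candidates, key=lambda x: x[0])

# A tiny made-up forest: node -> list of (edge score, rule, tail nodes).
forest = {
    "X[0,2]": [(-0.1, "X -> <la maison, the house>", [])],
    "X[0,3]": [(-0.4, "X -> <X1 bleue, blue X1>", ["X[0,2]"]),
               (-0.9, "X -> <X1 bleue, X1 blue>", ["X[0,2]"])],
    "S[0,3]": [(0.0, "S -> <X1, X1>", ["X[0,3]"])],
}
for score, derivation in kbest("S[0,3]", forest, k=2):
    print(round(score, 2), derivation)
```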
The Prague Bulletin of Mathematical Linguistics | 2010
Jonathan H. Clark; Jonathan Weese; Byung Gyu Ahn; Andreas Zollmann; Qin Gao; Kenneth Heafield; Alon Lavie
The Machine Translation Toolpack for LoonyBin: Automated Management of Experimental Machine Translation HyperWorkflows
Construction of machine translation systems has evolved into a multi-stage workflow involving many complicated dependencies. Many decoder distributions have addressed this by including monolithic training scripts (train-factored-model.pl for Moses and mr_runmer.pl for SAMT). However, such scripts can be tricky to modify for novel experiments and typically have limited support for the variety of job schedulers found on academic and commercial computer clusters. Further complicating these systems are hyperparameters, which often cannot be directly optimized by conventional methods, requiring users to determine which combination of values is best via trial and error. The recently released LoonyBin open-source workflow management tool addresses these issues by providing: 1) a visual interface for the user to create and modify workflows; 2) a well-defined logging mechanism; 3) a script generator that compiles visual workflows into shell scripts; and 4) the concept of HyperWorkflows, which intuitively and succinctly encode small experimental variations within a larger workflow. In this paper, we describe the Machine Translation Toolpack for LoonyBin, which exposes state-of-the-art machine translation tools as drag-and-drop components within LoonyBin.
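As a rough illustration of the HyperWorkflow idea, the sketch below expands a tiny workflow containing two variation points into concrete shell scripts, one per combination of choices. It is a toy under assumed script names, not LoonyBin's actual data model, interface, or script generator.

```python
from itertools import product

# A toy "hyperworkflow": an ordered list of steps, where each step offers one
# or more alternative commands. Steps with more than one alternative play the
# role of the experimental variation points the abstract describes. All script
# names below are hypothetical.
HYPERWORKFLOW = [
    ("tokenize", ["scripts/tokenize.sh data/train.fr-en"]),
    ("align",    ["scripts/run_giza.sh", "scripts/run_berkeley_aligner.sh"]),
    ("extract",  ["scripts/extract_grammar.sh"]),
    ("tune",     ["scripts/mert.sh --metric bleu", "scripts/mert.sh --metric ter"]),
]

def expand(hyperworkflow):
    """Expand the cross product of all variation points into concrete
    workflows, each rendered as a plain shell script."""
    alternatives = [alts for _, alts in hyperworkflow]
    for index, combo in enumerate(product(*alternatives)):
        script = "\n".join(["#!/bin/bash", "set -e", *combo])
        yield f"variant_{index:02d}", script

if __name__ == "__main__":
    for label, script in expand(HYPERWORKFLOW):
        print(f"# {label}\n{script}\n")
```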
The Prague Bulletin of Mathematical Linguistics | 2010
Jonathan Weese; Chris Callison-Burch
Visualizing Data Structures in Parsing-Based Machine Translation
As machine translation (MT) systems grow more complex and incorporate more linguistic knowledge, it becomes more difficult to evaluate independent pieces of the MT pipeline. Being able to inspect many of the intermediate data structures used during MT decoding allows a more fine-grained evaluation of MT performance, helping to determine which parts of the current process are effective and which are not. In this article, we present an overview of the visualization tools that are currently distributed with the Joshua (Li et al., 2009) MT decoder. We explain their use and present an example of how visually inspecting the decoder's data structures has led to useful improvements in the MT model.
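For a sense of the kind of intermediate structure such tools expose, here is a minimal, hypothetical derivation-tree printer; the rules and spans are invented, and Joshua's actual visualizers are graphical and operate on real decoder output.

```python
from dataclasses import dataclass, field

@dataclass
class DerivationNode:
    rule: str                 # the SCFG rule applied at this node
    span: tuple               # (start, end) indices over the source sentence
    children: list = field(default_factory=list)

def render(node, depth=0):
    """Print an indented, human-readable view of a derivation tree."""
    print("  " * depth + f"[{node.span[0]},{node.span[1]}] {node.rule}")
    for child in node.children:
        render(child, depth + 1)

# A made-up derivation for a three-word French input.
tree = DerivationNode("S -> <X1, X1>", (0, 3), [
    DerivationNode("X -> <X1 bleue, blue X1>", (0, 3), [
        DerivationNode("X -> <la maison, the house>", (0, 2)),
    ]),
])
render(tree)
```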
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Jonathan Weese; Juri Ganitkevitch; Chris Callison-Burch
Paraphrase evaluation is typically done either manually or through indirect, task-based evaluation. We introduce an intrinsic evaluation, PARADIGM, which measures the goodness of paraphrase collections that are represented using synchronous grammars. We formulate two measures that evaluate these paraphrase grammars using gold-standard sentential paraphrases drawn from a monolingual parallel corpus. The first measure calculates how often a paraphrase grammar is able to synchronously parse the sentence pairs in the corpus. The second measure enumerates paraphrase rules from the monolingual parallel corpus and calculates the overlap between this reference paraphrase collection and the paraphrase resource being evaluated. We demonstrate the use of these evaluation metrics on paraphrase collections derived from three different data types: multiple translations of classic French novels, comparable sentence pairs drawn from different newspapers, and bilingual parallel corpora. We show that PARADIGM correlates with human judgments more strongly than BLEU on a task-based evaluation of paraphrase quality.
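A back-of-the-envelope sketch of the two kinds of measures described above, under simplifying assumptions: synchronous parseability is treated as a black-box predicate, and paraphrase rules are represented as plain strings so that the second measure reduces to set overlap. All names and data below are illustrative, not the paper's actual formulation.

```python
def parse_coverage(sentence_pairs, can_parse):
    """Fraction of gold paraphrase pairs the grammar can synchronously parse."""
    parsed = sum(1 for pair in sentence_pairs if can_parse(*pair))
    return parsed / len(sentence_pairs)

def rule_overlap(reference_rules, candidate_rules):
    """Precision/recall of the candidate paraphrase rules against rules
    enumerated from the monolingual parallel corpus."""
    reference, candidate = set(reference_rules), set(candidate_rules)
    shared = reference & candidate
    precision = len(shared) / len(candidate) if candidate else 0.0
    recall = len(shared) / len(reference) if reference else 0.0
    return precision, recall

# Toy usage with made-up rules and a trivial stand-in parseability test.
reference = {"X -> <a dozen, twelve>", "X -> <pass away, die>"}
candidate = {"X -> <a dozen, twelve>", "X -> <big, large>"}
pairs = [("he passed away", "he died"), ("a dozen eggs", "twelve eggs")]
print(parse_coverage(pairs, lambda s, t: len(s.split()) == len(t.split())))
print(rule_overlap(reference, candidate))
```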
Meeting of the Association for Computational Linguistics | 2010
Chris Dyer; Adam Lopez; Juri Ganitkevitch; Jonathan Weese; Ferhan Türe; Phil Blunsom; Hendra Setiawan; Vladimir Eidelman; Philip Resnik
Workshop on Statistical Machine Translation | 2012
Juri Ganitkevitch; Yuan Cao; Jonathan Weese; Matt Post; Chris Callison-Burch
Workshop on Statistical Machine Translation | 2011
Jonathan Weese; Juri Ganitkevitch; Chris Callison-Burch; Matt Post; Adam Lopez
Workshop on Statistical Machine Translation | 2010
Zhifei Li; Chris Callison-Burch; Chris Dyer; Juri Ganitkevitch; Ann Irvine; Sanjeev Khudanpur; Lane Schwartz; Wren N. G. Thornton; Ziyuan Wang; Jonathan Weese; Omar F. Zaidan
Workshop on Statistical Machine Translation | 2013
Matt Post; Juri Ganitkevitch; Luke Orland; Jonathan Weese; Yuan Cao; Chris Callison-Burch