Jonathan Weese
Johns Hopkins University
Publications
Featured research published by Jonathan Weese.
Workshop on Statistical Machine Translation | 2009
Zhifei Li; Chris Callison-Burch; Chris Dyer; Sanjeev Khudanpur; Lane Schwartz; Wren N. G. Thornton; Jonathan Weese; Omar F. Zaidan
We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context free grammars (SCFGs): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We demonstrate that the toolkit achieves state-of-the-art translation performance on the WMT09 French-English translation task.
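The pruning step named above lends itself to a compact illustration. Below is a minimal sketch of cube pruning over a single chart cell, assuming purely additive scores and ignoring non-local costs such as the language model; it is not Joshua's implementation, only an illustration of the lazy best-first enumeration the abstract refers to, with toy rules and hypotheses made up for the example.

```python
import heapq

def cube_prune(rules, left_items, right_items, k):
    """Lazily enumerate up to k best (rule, left, right) combinations.
    Assumes each list is sorted best-first and that the combined score is
    simply the sum of the parts; real decoders also fold in non-local costs
    such as the language model, which makes the search approximate."""
    def score(i, j, l):
        return rules[i][0] + left_items[j][0] + right_items[l][0]

    seen = {(0, 0, 0)}
    frontier = [(-score(0, 0, 0), (0, 0, 0))]   # max-heap via negated scores
    results = []
    while frontier and len(results) < k:
        neg, (i, j, l) = heapq.heappop(frontier)
        results.append((-neg, rules[i][1], left_items[j][1], right_items[l][1]))
        # push the three neighbours of the popped corner of the "cube"
        for di, dj, dl in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            n = (i + di, j + dj, l + dl)
            if n[0] < len(rules) and n[1] < len(left_items) \
                    and n[2] < len(right_items) and n not in seen:
                seen.add(n)
                heapq.heappush(frontier, (-score(*n), n))
    return results

# Toy inputs: (score, payload) pairs, already sorted best-first.
rules = [(-0.2, "X -> <X1 X2, X2 X1>"), (-0.9, "X -> <X1 X2, X1 X2>")]
left  = [(-0.1, "the house"), (-1.3, "a house")]
right = [(-0.4, "blue"), (-0.8, "navy")]
print(cube_prune(rules, left, right, k=3))
```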
Meeting of the Association for Computational Linguistics | 2009
Zhifei Li; Chris Callison-Burch; Chris Dyer; Juri Ganitkevitch; Sanjeev Khudanpur; Lane Schwartz; Wren N. G. Thornton; Jonathan Weese; Omar F. Zaidan
We describe Joshua (Li et al., 2009a), an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for translation via synchronous context free grammars (SCFGs): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We also provide a demonstration outline for illustrating the toolkit's features to potential users, whether they be newcomers to the field or power users interested in extending the toolkit.
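Both Joshua papers above also mention k-best extraction. The brute-force sketch below shows what that step computes on a toy packed forest; real decoders use far more efficient lazy algorithms (e.g., Huang and Chiang, 2005), and the forest, rules, and scores here are invented purely for illustration.

```python
import heapq
from itertools import product

def kbest(node, forest, k):
    """Return up to k (score, derivation) pairs for `node`, best first.
    Naive version: combine the k-best lists of the tail nodes of every
    incoming hyperedge and keep the top k results."""
    if node not in forest:                      # leaf: a single trivial derivation
        return [(0.0, node)]
    candidates = []
    for score, rule, tails in forest[node]:     # each incoming hyperedge
        tail_lists = [kbest(t, forest, k) for t in tails]
        for combo in product(*tail_lists):
            total = score + sum(s for s, _ in combo)
            candidates.append((total, (rule, [d for _, d in combo])))
    return heapq.nlargest(k, candidates, key=lambda x: x[0])

# A tiny made-up forest: node -> list of (edge score, rule, tail nodes).
forest = {
    "X[0,2]": [(-0.1, "X -> <la maison, the house>", [])],
    "X[0,3]": [(-0.4, "X -> <X1 bleue, blue X1>", ["X[0,2]"]),
               (-0.9, "X -> <X1 bleue, X1 blue>", ["X[0,2]"])],
    "S[0,3]": [(0.0, "S -> <X1, X1>", ["X[0,3]"])],
}
for score, derivation in kbest("S[0,3]", forest, k=2):
    print(round(score, 2), derivation)
```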
The Prague Bulletin of Mathematical Linguistics | 2010
Jonathan H. Clark; Jonathan Weese; Byung Gyu Ahn; Andreas Zollmann; Qin Gao; Kenneth Heafield; Alon Lavie
The Machine Translation Toolpack for LoonyBin: Automated Management of Experimental Machine Translation HyperWorkflows
Construction of machine translation systems has evolved into a multi-stage workflow involving many complicated dependencies. Many decoder distributions have addressed this by including monolithic training scripts (train-factored-model.pl for Moses and mr_runmer.pl for SAMT). However, such scripts can be tricky to modify for novel experiments and typically have limited support for the variety of job schedulers found on academic and commercial computer clusters. Further complicating these systems are hyperparameters, which often cannot be directly optimized by conventional methods, requiring users to determine which combination of values is best via trial and error. The recently released LoonyBin open-source workflow management tool addresses these issues by providing: 1) a visual interface for the user to create and modify workflows; 2) a well-defined logging mechanism; 3) a script generator that compiles visual workflows into shell scripts; and 4) the concept of HyperWorkflows, which intuitively and succinctly encode small experimental variations within a larger workflow. In this paper, we describe the Machine Translation Toolpack for LoonyBin, which exposes state-of-the-art machine translation tools as drag-and-drop components within LoonyBin.
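As a rough illustration of the HyperWorkflow idea, the sketch below expands a tiny workflow containing two variation points into concrete shell scripts, one per combination of choices. It is a toy under assumed script names, not LoonyBin's actual data model, interface, or script generator.

```python
from itertools import product

# A toy "hyperworkflow": an ordered list of steps, where each step offers one
# or more alternative commands. Steps with more than one alternative play the
# role of the experimental variation points the abstract describes. All script
# names below are hypothetical.
HYPERWORKFLOW = [
    ("tokenize", ["scripts/tokenize.sh data/train.fr-en"]),
    ("align",    ["scripts/run_giza.sh", "scripts/run_berkeley_aligner.sh"]),
    ("extract",  ["scripts/extract_grammar.sh"]),
    ("tune",     ["scripts/mert.sh --metric bleu", "scripts/mert.sh --metric ter"]),
]

def expand(hyperworkflow):
    """Expand the cross product of all variation points into concrete
    workflows, each rendered as a plain shell script."""
    alternatives = [alts for _, alts in hyperworkflow]
    for index, combo in enumerate(product(*alternatives)):
        script = "\n".join(["#!/bin/bash", "set -e", *combo])
        yield f"variant_{index:02d}", script

if __name__ == "__main__":
    for label, script in expand(HYPERWORKFLOW):
        print(f"# {label}\n{script}\n")
```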
The Prague Bulletin of Mathematical Linguistics | 2010
Jonathan Weese; Chris Callison-Burch
Visualizing Data Structures in Parsing-Based Machine Translation
As machine translation (MT) systems grow more complex and incorporate more linguistic knowledge, it becomes more difficult to evaluate independent pieces of the MT pipeline. Being able to inspect many of the intermediate data structures used during MT decoding allows a more fine-grained evaluation of MT performance, helping to determine which parts of the current process are effective and which are not. In this article, we present an overview of the visualization tools that are currently distributed with the Joshua (Li et al., 2009) MT decoder. We explain their use and present an example of how visually inspecting the decoder's data structures has led to useful improvements in the MT model.
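For a sense of the kind of intermediate structure such tools expose, here is a minimal, hypothetical derivation-tree printer; the rules and spans are invented, and Joshua's actual visualizers are graphical and operate on real decoder output.

```python
from dataclasses import dataclass, field

@dataclass
class DerivationNode:
    rule: str                 # the SCFG rule applied at this node
    span: tuple               # (start, end) indices over the source sentence
    children: list = field(default_factory=list)

def render(node, depth=0):
    """Print an indented, human-readable view of a derivation tree."""
    print("  " * depth + f"[{node.span[0]},{node.span[1]}] {node.rule}")
    for child in node.children:
        render(child, depth + 1)

# A made-up derivation for a three-word French input.
tree = DerivationNode("S -> <X1, X1>", (0, 3), [
    DerivationNode("X -> <X1 bleue, blue X1>", (0, 3), [
        DerivationNode("X -> <la maison, the house>", (0, 2)),
    ]),
])
render(tree)
```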
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Jonathan Weese; Juri Ganitkevitch; Chris Callison-Burch
Paraphrase evaluation is typically done either manually or through indirect, task-based evaluation. We introduce an intrinsic evaluation, PARADIGM, which measures the goodness of paraphrase collections that are represented using synchronous grammars. We formulate two measures that evaluate these paraphrase grammars using gold-standard sentential paraphrases drawn from a monolingual parallel corpus. The first measure calculates how often a paraphrase grammar is able to synchronously parse the sentence pairs in the corpus. The second measure enumerates paraphrase rules from the monolingual parallel corpus and calculates the overlap between this reference paraphrase collection and the paraphrase resource being evaluated. We demonstrate the use of these evaluation metrics on paraphrase collections derived from three different data types: multiple translations of classic French novels, comparable sentence pairs drawn from different newspapers, and bilingual parallel corpora. We show that PARADIGM correlates with human judgments more strongly than BLEU on a task-based evaluation of paraphrase quality.
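A back-of-the-envelope sketch of the two kinds of measures described above, under simplifying assumptions: synchronous parseability is treated as a black-box predicate, and paraphrase rules are represented as plain strings so that the second measure reduces to set overlap. All names and data below are illustrative, not the paper's actual formulation.

```python
def parse_coverage(sentence_pairs, can_parse):
    """Fraction of gold paraphrase pairs the grammar can synchronously parse."""
    parsed = sum(1 for pair in sentence_pairs if can_parse(*pair))
    return parsed / len(sentence_pairs)

def rule_overlap(reference_rules, candidate_rules):
    """Precision/recall of the candidate paraphrase rules against rules
    enumerated from the monolingual parallel corpus."""
    reference, candidate = set(reference_rules), set(candidate_rules)
    shared = reference & candidate
    precision = len(shared) / len(candidate) if candidate else 0.0
    recall = len(shared) / len(reference) if reference else 0.0
    return precision, recall

# Toy usage with made-up rules and a trivial stand-in parseability test.
reference = {"X -> <a dozen, twelve>", "X -> <pass away, die>"}
candidate = {"X -> <a dozen, twelve>", "X -> <big, large>"}
pairs = [("he passed away", "he died"), ("a dozen eggs", "twelve eggs")]
print(parse_coverage(pairs, lambda s, t: len(s.split()) == len(t.split())))
print(rule_overlap(reference, candidate))
```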
Meeting of the Association for Computational Linguistics | 2010
Chris Dyer; Adam Lopez; Juri Ganitkevitch; Jonathan Weese; Ferhan Türe; Phil Blunsom; Hendra Setiawan; Vladimir Eidelman; Philip Resnik
Workshop on Statistical Machine Translation | 2012
Juri Ganitkevitch; Yuan Cao; Jonathan Weese; Matt Post; Chris Callison-Burch
Workshop on Statistical Machine Translation | 2011
Jonathan Weese; Juri Ganitkevitch; Chris Callison-Burch; Matt Post; Adam Lopez
Workshop on Statistical Machine Translation | 2010
Zhifei Li; Chris Callison-Burch; Chris Dyer; Juri Ganitkevitch; Ann Irvine; Sanjeev Khudanpur; Lane Schwartz; Wren N. G. Thornton; Ziyuan Wang; Jonathan Weese; Omar F. Zaidan
Workshop on Statistical Machine Translation | 2013
Matt Post; Juri Ganitkevitch; Luke Orland; Jonathan Weese; Yuan Cao; Chris Callison-Burch