Jonathan H. Clark | Researchain

Publication

Featured research published by Jonathan H. Clark.

Mexican International Conference on Artificial Intelligence | 2007

A classifier system for author recognition using synonym-based features

Jonathan H. Clark; Charles Hannon

The writing style of an author is a phenomenon that computer scientists and stylometrists have modeled in the past with some success. However, due to the complexity and variability of writing styles, simple models often break down when faced with real-world data. Thus, current trends in stylometry often employ hundreds of features in building classifier systems. In this paper, we present a novel set of synonym-based features for author recognition. We outline a basic model of how synonyms relate to an author's identity and then build two additional models refined to meet real-world needs. Experiments show strong correlation between the presented metric and the writing style of four authors, with the second of the three models outperforming the others. As modern stylometric classifier systems demand increasingly large feature sets, this new set of synonym-based features will serve to fill this ever-increasing need.

Workshop on Statistical Machine Translation | 2009

An Improved Statistical Transfer System for French-English Machine Translation

Greg Hanneman; Vamshi Ambati; Jonathan H. Clark; Alok Parlikar; Alon Lavie

This paper presents the Carnegie Mellon University statistical transfer MT system submitted to the 2009 WMT shared task in French-to-English translation. We describe a syntax-based approach that incorporates both syntactic and non-syntactic phrase pairs in addition to a syntactic grammar. After reporting development test results, we conduct a preliminary analysis of the coverage and effectiveness of the system's components.

The Prague Bulletin of Mathematical Linguistics | 2010

The Machine Translation Toolpack for LoonyBin: Automated Management of Experimental Machine Translation HyperWorkflows

Jonathan H. Clark; Jonathan Weese; Byung Gyu Ahn; Andreas Zollmann; Qin Gao; Kenneth Heafield; Alon Lavie

Construction of machine translation systems has evolved into a multi-stage workflow involving many complicated dependencies. Many decoder distributions have addressed this by including monolithic training scripts: train-factored-model.pl for Moses and mr_runmer.pl for SAMT. However, such scripts can be tricky to modify for novel experiments and typically have limited support for the variety of job schedulers found on academic and commercial computer clusters. Further complicating these systems are hyperparameters, which often cannot be directly optimized by conventional methods, requiring users to determine which combination of values is best via trial and error. The recently released LoonyBin open-source workflow management tool addresses these issues by providing: 1) a visual interface for the user to create and modify workflows; 2) a well-defined logging mechanism; 3) a script generator that compiles visual workflows into shell scripts; and 4) the concept of HyperWorkflows, which intuitively and succinctly encode small experimental variations within a larger workflow. In this paper, we describe the Machine Translation Toolpack for LoonyBin, which exposes state-of-the-art machine translation tools as drag-and-drop components within LoonyBin.
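To illustrate the idea of encoding small experimental variations within a larger workflow, the following is a minimal sketch, not LoonyBin's actual implementation or file format. It assumes a workflow can be modeled as an ordered list of steps, each with one or more named variants, and expands it into every concrete combination. The step names, variant names, and commands are all hypothetical placeholders.

```python
from itertools import product

def expand_variations(steps):
    """Yield one concrete workflow per combination of step variants.

    `steps` is an ordered list of (step_name, {variant_name: command}) pairs.
    This data model is an assumption made for illustration only.
    """
    names = [name for name, _ in steps]
    variant_lists = [list(variants.items()) for _, variants in steps]
    for combo in product(*variant_lists):
        yield [
            (name, variant, command)
            for name, (variant, command) in zip(names, combo)
        ]

# Hypothetical workflow: one alignment step, two language-model orders,
# two tuning settings. The commands are placeholders, not real tools.
example_steps = [
    ("align", {"default": "run-aligner"}),
    ("lm", {"3gram": "build-lm --order 3", "5gram": "build-lm --order 5"}),
    ("tune", {"seedA": "tune --seed 1", "seedB": "tune --seed 2"}),
]

workflows = list(expand_variations(example_steps))
for workflow in workflows:
    print(" -> ".join(f"{name}[{variant}]" for name, variant, _ in workflow))
```

In this sketch, a step with a single variant is shared by every expanded workflow, while each step with several variants multiplies the number of concrete runs, so the example above describes four experiments in one compact definition.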

Mexican International Conference on Computer Science | 2007

An Algorithm for Identifying Authors Using Synonyms

Jonathan H. Clark; Charles Hannon

An approach for identifying the human source of a text by leveraging the significance of synonyms in language is presented. While others have attempted to identify authors in the past, they have focused on purely statistical approaches such as word length distribution, number of distinct words, and language models. We claim that an author's choice of synonyms is idiosyncratic and can be used in determining the identity of an author, which we demonstrate via our algorithm for recognizing authors. This algorithm uses synonym sets from the WordNet lexical database to give more weight to words that have many common synonyms. The results of this method applied to the task of identifying the authors of classic literature show that there is a correlation between an author's synonym choice and the author's identity. With this new author recognition technology, we may now explore new avenues of intelligent and meaningful interaction with users.
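The weighting idea in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the abstract says only that words with many common synonyms receive more weight. The toy synonym sets stand in for WordNet synsets, and the scoring functions (`weighted_profile`, `similarity`, `most_likely_author`) are hypothetical names introduced here.

```python
from collections import Counter

# Hypothetical toy synonym sets standing in for WordNet synsets.
TOY_SYNSETS = [
    {"big", "large", "huge", "great"},
    {"happy", "glad", "cheerful"},
    {"said", "stated"},
]

def synonym_count(word):
    """Number of alternatives the author could have used instead of `word`."""
    return sum(len(synset) - 1 for synset in TOY_SYNSETS if word in synset)

def weighted_profile(tokens):
    """Relative frequency of each word, scaled by its synonym count.

    Words with no known synonyms carry no information about choice
    and are left out of the profile.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {
        word: (count / total) * synonym_count(word)
        for word, count in counts.items()
        if synonym_count(word) > 0
    }

def similarity(profile_a, profile_b):
    """Overlap score: sum of the smaller weight over shared words."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(min(profile_a[w], profile_b[w]) for w in shared)

def most_likely_author(unknown_tokens, known_texts):
    """Return the author whose profile overlaps most with the unknown text."""
    unknown = weighted_profile(unknown_tokens)
    return max(
        known_texts,
        key=lambda author: similarity(unknown, weighted_profile(known_texts[author])),
    )

known = {
    "A": "the huge dog was glad".split(),
    "B": "the large dog was happy".split(),
}
guess = most_likely_author("a huge cat was glad".split(), known)
print(guess)
```

Here the unknown text shares the choices "huge" and "glad" with author A, while author B expresses the same meanings with "large" and "happy", so the sketch attributes the text to A. A real system would look up synonym sets in WordNet and would need far more text per author.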

Meeting of the Association for Computational Linguistics | 2008

Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation

Jonathan H. Clark; Robert E. Frederking; Lori S. Levin

Syntax-based Machine Translation systems have recently become a focus of research with much hope that they will outperform traditional Phrase-Based Statistical Machine Translation (PBSMT). Toward this goal, we present a method for analyzing the morphosyntactic content of language from an Elicitation Corpus such as the one included in the LDC's upcoming LCTL language packs. The presented method discovers a mapping between morphemes and linguistically relevant features. By providing this tool that can augment structure-based MT models with these rich features, we believe the discriminative power of current models can be improved. We conclude by outlining how the resulting output can then be used in inducing a morphosyntactically feature-rich grammar for AVENUE, a modern syntax-based MT system.

Meeting of the Association for Computational Linguistics | 2011