Chris Quirk | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Chris Quirk is active.

Explore More

Publication

Featured researches published by Chris Quirk.

international conference on computational linguistics | 2004

Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources

Bill Dolan; Chris Quirk; Chris Brockett

We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluster. We evaluate both datasets using a word alignment algorithm and a metric borrowed from machine translation. Results show that edit distance data is cleaner and more easily-aligned than the heuristic data, with an overall alignment error rate (AER) of 11.58% on a similarly-extracted test set. On test data extracted by the heuristic strategy, however, performance of the two training sets is similar, with AERs of 13.2% and 14.7% respectively. Analysis of 100 pairs of sentences from each set reveals that the edit distance data lacks many of the complex lexical and syntactic alternations that characterize monolingual paraphrase. The summary sentences, while less readily alignable, retain more of the non-trivial alternations that are of greatest interest learning paraphrase relationships.

meeting of the association for computational linguistics | 2005

Dependency Treelet Translation: Syntactically Informed Phrasal SMT

Chris Quirk; Arul Menezes; Colin Cherry

We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. This method requires a source-language dependency parser, target language word segmentation and an unsupervised word alignment component. We align a parallel corpus, project the source dependency parse onto the target sentence, extract dependency treelet translation pairs, and train a tree-based ordering model. We describe an efficient decoder and show that using these tree-based models in combination with conventional SMT models provides a promising approach that incorporates the power of phrasal SMT with the linguistic generality available in a parser.

international joint conference on natural language processing | 2015

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets

Michel Galley; Chris Brockett; Alessandro Sordoni; Yangfeng Ji; Michael Auli; Chris Quirk; Margaret Mitchell; Jianfeng Gao; Bill Dolan

We introduce Discriminative BLEU (∆BLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [−1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses, ∆BLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in terms of both Spearman’s ρ and Kendall’s τ .

empirical methods in natural language processing | 2006

The impact of parse quality on syntactically-informed statistical machine translation

Chris Quirk; Simon Corston-Oliver

We investigate the impact of parse quality on a syntactically-informed statistical machine translation system applied to technical text. We vary parse quality by varying the amount of data used to train the parser. As the amount of data increases, parse quality improves, leading to improvements in machine translation output and results that significantly outperform a state-of-the-art phrasal baseline.

Machine Translation | 2006

Dependency treelet translation: the convergence of statistical and example-based machine-translation?

Chris Quirk; Arul Menezes

We describe a novel approach to MT that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with conventional SMT models to incorporate the power of phrasal SMT with the linguistic generality available in a parser. We show that this approach significantly outperforms a leading string-based Phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs, and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated bleu scores with a small human evaluation.

international conference on computational linguistics | 2008

Random Restarts in Minimum Error Rate Training for Statistical Machine Translation

Robert C. Moore; Chris Quirk

Ochs (2003) minimum error rate training (MERT) procedure is the most commonly used method for training feature weights in statistical machine translation (SMT) models. The use of multiple randomized starting points in MERT is a well-established practice, although there seems to be no published systematic study of its benefits. We compare several ways of performing random restarts with MERT. We find that all of our random restart methods outperform MERT without random restarts, and we develop some refinements of random restarts that are superior to the most common approach with regard to resulting model quality and training time.

language and technology conference | 2006

Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation

Chris Quirk; Arul Menezes

We begin by exploring theoretical and practical issues with phrasal SMT, several of which are addressed by syntax-based SMT. Next, to address problems not handled by syntax, we propose the concept of a Minimal Translation Unit (MTU) and develop MTU sequence models. Finally we incorporate these models into a syntax-based SMT system and demonstrate that it improves on the state of the art translation quality within a theoretically more desirable framework.

meeting of the association for computational linguistics | 2016

Compositional Learning of Embeddings for Relation Paths in Knowledge Base and Text

Kristina Toutanova; Victoria Lin; Wen-tau Yih; Hoifung Poon; Chris Quirk

Modeling relation paths has offered significant gains in embedding models for knowledge base (KB) completion. However, enumerating paths between two entities is very expensive, and existing approaches typically resort to approximation with a sampled subset. This problem is particularly acute when text is jointly modeled with KB relations and used to provide direct evidence for facts mentioned in it. In this paper, we propose the first exact dynamic programming algorithm which enables efficient incorporation of all relation paths of bounded length, while modeling both relation types and intermediate nodes in the compositional path representations. We conduct a theoretical analysis of the efficiency gain from the approach. Experiments on two datasets show that it addresses representational limitations in prior approaches and improves accuracy in KB completion.

Bioinformatics | 2014

Literome: PubMed-scale genomic knowledge base in the cloud

Hoifung Poon; Chris Quirk; Charlie DeZiel; David Heckerman

MOTIVATION Advances in sequencing technology have led to an exponential growth of genomics data, yet it remains a formidable challenge to interpret such data for identifying disease genes and drug targets. There has been increasing interest in adopting a systems approach that incorporates prior knowledge such as gene networks and genotype-phenotype associations. The majority of such knowledge resides in text such as journal publications, which has been undergoing its own exponential growth. It has thus become a significant bottleneck to identify relevant knowledge for genomic interpretation as well as to keep up with new genomics findings. RESULTS In the Literome project, we have developed an automatic curation system to extract genomic knowledge from PubMed articles and made this knowledge available in the cloud with a Web site to facilitate browsing, searching and reasoning. Currently, Literome focuses on two types of knowledge most pertinent to genomic medicine: directed genic interactions such as pathways and genotype-phenotype associations. Users can search for interacting genes and the nature of the interactions, as well as diseases and drugs associated with a single nucleotide polymorphism or gene. Users can also search for indirect connections between two entities, e.g. a gene and a disease might be linked because an interacting gene is associated with a related disease. AVAILABILITY AND IMPLEMENTATION Literome is freely available at literome.azurewebsites.net. Download for non-commercial use is available via Web services.

international acm sigir conference on research and development in information retrieval | 2009

Page hunt: improving search engines using human computation games

Hao Ma; Raman Chandrasekar; Chris Quirk; Abhishek Gupta

There has been a lot of work on evaluating and improving the relevance of web search engines. In this paper, we suggest using human computation games to elicit data from players that can be used to improve search. We describe Page Hunt, a single-player game. The data elicited using Page Hunt has several applications including providing metadata for pages, providing query alterations for use in query refinement, and identifying ranking issues. We describe an experiment with over 340 game players, and highlight some interesting aspects of the data obtained.

Explore More