Publications


Featured research published by Daniel M. Cer.


Language and Technology Conference | 2006

Learning to recognize features of valid textual entailments

Bill MacCartney; Trond Grenager; Marie-Catherine de Marneffe; Daniel M. Cer; Christopher D. Manning

This paper advocates a new architecture for textual inference in which finding a good alignment is separated from evaluating entailment. Current approaches to semantic inference in question answering and textual entailment have approximated the entailment problem as that of computing the best alignment of the hypothesis to the text, using a locally decomposable matching score. We argue that there are significant weaknesses in this approach, including flawed assumptions of monotonicity and locality. Instead we propose a pipelined approach where alignment is followed by a classification step, in which we extract features representing high-level characteristics of the entailment problem, and pass the resulting feature vector to a statistical classifier trained on development data. We report results on data from the 2005 Pascal RTE Challenge which surpass previously reported results for alignment-based systems.
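
The pipelined architecture is easy to sketch in code. Below is a minimal, hypothetical illustration of the two stages, not the paper's actual system: a crude lexical aligner produces a hypothesis-to-text mapping, and a separate linear classifier judges entailment from global features of that alignment (the feature set and weights are invented for the example).

```python
# Toy sketch of the align-then-classify architecture for textual entailment.
# Feature names, weights, and the aligner are illustrative, not from the paper.

def align(text_tokens, hyp_tokens):
    """Greedy lexical alignment: map each hypothesis token to a text token."""
    alignment = {}
    for i, h in enumerate(hyp_tokens):
        if h in text_tokens:
            alignment[i] = text_tokens.index(h)
    return alignment

def entailment_features(text_tokens, hyp_tokens, alignment):
    """High-level features of the pair, computed *after* alignment."""
    coverage = len(alignment) / len(hyp_tokens)  # fraction of hypothesis aligned
    negation_mismatch = ("not" in hyp_tokens) != ("not" in text_tokens)
    return [coverage, 1.0 if negation_mismatch else 0.0]

def classify(features, weights=(2.0, -3.0), bias=-1.0):
    """Linear classifier standing in for one trained on development data."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return score > 0  # True = entailed

text = "John bought a new car yesterday".split()
hyp = "John bought a car".split()
feats = entailment_features(text, hyp, align(text, hyp))
print(classify(feats))  # True: full coverage, no polarity mismatch
```

The point of the separation is visible even at this scale: the aligner only proposes a correspondence, and global properties of the whole aligned pair (coverage, polarity) decide entailment.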


International Conference on Computational Linguistics | 2014

SemEval-2014 Task 10: Multilingual Semantic Textual Similarity

Eneko Agirre; Carmen Banea; Claire Cardie; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Rada Mihalcea; German Rigau; Janyce Wiebe

In Semantic Textual Similarity (STS), systems rate the degree of semantic equivalence between two text snippets. This year, the participants were challenged with new data sets for English, as well as the introduction of Spanish as a new language in which to assess semantic similarity. For the English subtask, we exposed the systems to a diversity of testing scenarios by preparing additional OntoNotes-WordNet sense mappings and news headlines, as well as introducing new genres, including image descriptions, DEFT discussion forums, DEFT newswire, and tweet-newswire headline mappings. For Spanish, since this is, to our knowledge, the first time official evaluations have been conducted, we used well-formed text, featuring sentences extracted from encyclopedic content and newswire. The annotations for both tasks leveraged crowdsourcing. The Spanish subtask engaged 9 teams participating with 22 system runs, and the English subtask attracted 15 teams with 38 system runs.
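
STS systems are conventionally scored by how well their ratings correlate with gold human judgments; Pearson correlation is the task's standard measure. A self-contained sketch of that evaluation step, with made-up gold and system scores on the 0-5 STS scale:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between system scores and gold human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Gold ratings on the 0-5 STS scale and hypothetical system output.
gold   = [4.8, 0.5, 3.2, 2.0, 5.0]
system = [4.5, 1.0, 2.8, 2.5, 4.9]
print(round(pearson(gold, system), 3))
```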


North American Chapter of the Association for Computational Linguistics | 2015

SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability

Eneko Agirre; Carmen Banea; Claire Cardie; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Iñigo Lopez-Gazpio; Montse Maritxalar; Rada Mihalcea; German Rigau; Larraitz Uria; Janyce Wiebe

In semantic textual similarity (STS), systems rate the degree of semantic equivalence between two text snippets. This year, the participants were challenged with new datasets in English and Spanish. The annotations for both subtasks leveraged crowdsourcing. The English subtask attracted 29 teams with 74 system runs, and the Spanish subtask engaged 7 teams participating with 16 system runs. In addition, this year we ran a pilot task on interpretable STS, where systems needed to add an explanatory layer: they had to align the chunks in the sentence pair, explicitly annotating the kind of relation and the score of each chunk pair. The training and test data were manually annotated by an expert and included headline and image sentence pairs from previous years. Seven teams participated with 29 runs.
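
In the interpretable pilot, a system's output is thus an explicit chunk alignment in which each aligned pair carries a relation label and a similarity score. A sketch of what one such output record might look like; the data structure is illustrative, with labels loosely modeled on the task's relation inventory (e.g. EQUI for equivalence, SPE for specialization):

```python
from dataclasses import dataclass

@dataclass
class ChunkAlignment:
    """One aligned chunk pair in an interpretable-STS analysis."""
    left_chunk: str    # chunk from the first sentence
    right_chunk: str   # chunk from the second sentence
    relation: str      # e.g. EQUI (equivalent), SPE (more specific)
    score: float       # similarity of the chunk pair on the 0-5 scale

# Hypothetical analysis of a sentence pair.
analysis = [
    ChunkAlignment("a man", "a person", relation="SPE", score=4.0),
    ChunkAlignment("is playing", "plays", relation="EQUI", score=5.0),
    ChunkAlignment("the guitar", "a guitar", relation="EQUI", score=5.0),
]
for a in analysis:
    print(f"{a.left_chunk!r} <-> {a.right_chunk!r}  {a.relation}  {a.score}")
```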


Meeting of the Association for Computational Linguistics | 2007

Learning Alignments and Leveraging Natural Logic

Nathanael Chambers; Daniel M. Cer; Trond Grenager; David Leo Wright Hall; Chloé Kiddon; Bill MacCartney; Marie-Catherine de Marneffe; Daniel Ramage; Eric Yeh; Christopher D. Manning

We describe an approach to textual inference that improves alignments at both the typed dependency level and at a deeper semantic level. We present a machine learning approach to alignment scoring, a stochastic search procedure, and a new tool that finds deeper semantic alignments, allowing rapid development of semantic features over the aligned graphs. Further, we describe a complementary semantic component based on natural logic, which shows an added gain of 3.13% accuracy on the RTE3 test set.
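
The natural logic component rests on monotonicity: in an upward-monotone context a term may be replaced by a more general one, and in a downward-monotone context by a more specific one. A toy illustration of that licensing rule, with an invented miniature hypernym lexicon (not the paper's system):

```python
# Toy natural-logic check: is replacing `word` with `replacement` licensed,
# given the monotonicity of the position it occupies?

# Hypothetical hypernym facts: child -> parent ("dog is-a animal").
HYPERNYM = {"dog": "animal", "poodle": "dog"}

def is_broader(a, b):
    """True if b is an ancestor of a in the toy hypernym chain."""
    while a in HYPERNYM:
        a = HYPERNYM[a]
        if a == b:
            return True
    return False

def substitution_valid(word, replacement, monotonicity):
    """Upward contexts allow broadening, downward contexts allow narrowing."""
    if monotonicity == "up":
        return is_broader(word, replacement)
    if monotonicity == "down":
        return is_broader(replacement, word)
    return word == replacement

# "Some dogs bark" entails "Some animals bark": 'some' is upward monotone.
print(substitution_valid("dog", "animal", "up"))    # True
# "Every animal barks" entails "Every dog barks": 'every' is downward
# monotone in its first argument, so narrowing is licensed.
print(substitution_valid("animal", "dog", "down"))  # True
# Broadening under 'every' is not licensed.
print(substitution_valid("dog", "animal", "down"))  # False
```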


North American Chapter of the Association for Computational Linguistics | 2016

SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation

Eneko Agirre; Carmen Banea; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Rada Mihalcea; German Rigau; Janyce Wiebe

Paper presented at the 10th International Workshop on Semantic Evaluation (SemEval-2016), held on June 16-17, 2016, in San Diego, California.


Workshop on Statistical Machine Translation | 2008

Regularization and Search for Minimum Error Rate Training

Daniel M. Cer; Daniel Jurafsky; Christopher D. Manning

Minimum error rate training (MERT) is a widely used learning procedure for statistical machine translation models. We contrast three search strategies for MERT: Powell's method, the variant of coordinate descent found in the Moses MERT utility, and a novel stochastic method. It is shown that the stochastic method obtains test set gains of +0.98 BLEU on MT03 and +0.61 BLEU on MT05. We also present a method for regularizing the MERT objective that achieves statistically significant gains when combined with both Powell's method and coordinate descent.
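
MERT searches a low-dimensional weight space for the point minimizing corpus error, an objective that is piecewise constant and non-convex, which is why the choice of search strategy matters. The sketch below shows the general shape of a stochastic local search; the objective here is a toy stand-in with a known optimum, whereas real MERT re-scores n-best translation lists with BLEU at each candidate point.

```python
import random

random.seed(0)

def toy_error(weights):
    """Stand-in for corpus error under `weights` (real MERT evaluates
    BLEU over n-best lists; this toy surface just has a known optimum)."""
    target = [0.7, -0.3, 1.2]
    return sum((w - t) ** 2 for w, t in zip(weights, target))

def stochastic_search(weights, iterations=2000, step=0.1):
    """Randomized local search: perturb in a random direction,
    keep the move only if error drops."""
    best = toy_error(weights)
    for _ in range(iterations):
        candidate = [w + random.uniform(-step, step) for w in weights]
        err = toy_error(candidate)
        if err < best:
            weights, best = candidate, err
    return weights, best

w, e = stochastic_search([0.0, 0.0, 0.0])
print([round(x, 2) for x in w], round(e, 4))
```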


Machine Translation | 2009

Measuring machine translation quality as semantic equivalence: A metric based on entailment features

Sebastian Padó; Daniel M. Cer; Michel Galley; Daniel Jurafsky; Christopher D. Manning

Current evaluation metrics for machine translation have increasing difficulty in distinguishing good from merely fair translations. We believe the main problem to be their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that assesses the quality of MT output through its semantic equivalence to the reference translation, based on a rich set of match and mismatch features motivated by textual entailment. We first evaluate this metric in an evaluation setting against a combination metric of four state-of-the-art scores. Our metric predicts human judgments better than the combination metric. Combining the entailment and traditional features yields further improvements. Then, we demonstrate that the entailment metric can also be used as learning criterion in minimum error rate training (MERT) to improve parameter estimation in MT system training. A manual evaluation of the resulting translations indicates that the new model obtains a significant improvement in translation quality.
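
The metric's central move can be miniaturized: extract match and mismatch features between candidate and reference in both entailment directions, then combine them with a model fit to human judgments. The sketch below uses crude bag-of-words stand-ins for the entailment features, and its weights are invented placeholders rather than the paper's learned model.

```python
def entailment_feature_vector(candidate, reference):
    """Crude stand-ins for entailment match/mismatch features."""
    c, r = set(candidate.split()), set(reference.split())
    overlap = len(c & r) / len(c | r)  # lexical match
    missing = len(r - c) / len(r)      # reference content lost
    extra = len(c - r) / len(c)        # unsupported content added
    return [overlap, missing, extra]

def semantic_quality(candidate, reference, weights=(4.0, -2.0, -1.5)):
    """Linear combination standing in for a model fit to human judgments.
    Features are computed in both entailment directions and averaged."""
    fwd = entailment_feature_vector(candidate, reference)
    bwd = entailment_feature_vector(reference, candidate)
    feats = [(a + b) / 2 for a, b in zip(fwd, bwd)]
    return sum(w * f for w, f in zip(weights, feats))

ref = "the cabinet approved the new budget on friday"
good = "the cabinet approved the new budget friday"
bad = "the budget friday new the approved"
print(semantic_quality(good, ref) > semantic_quality(bad, ref))  # True
```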


Workshop on Statistical Machine Translation | 2014

Phrasal: A Toolkit for New Directions in Statistical Machine Translation

Spence Green; Daniel M. Cer; Christopher D. Manning

We present a new version of Phrasal, an open-source toolkit for statistical phrase-based machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like the bitext, and (c) web-based interactive machine translation. A direct comparison with Moses shows favorable results in terms of decoding speed and tuning time.


Workshop on Statistical Machine Translation | 2014

An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation

Spence Green; Daniel M. Cer; Christopher D. Manning

Scalable discriminative training methods are now broadly available for estimating phrase-based, feature-rich translation models. However, the sparse feature sets typically appearing in research evaluations are less attractive than standard dense features such as language and translation model probabilities: they often overfit, do not generalize, or require complex and slow feature extractors. This paper introduces extended features, which are more specific than dense features yet more general than lexicalized sparse features. Large-scale experiments show that extended features yield robust BLEU gains for both Arabic-English (+1.05) and Chinese-English (+0.67) relative to a strong feature-rich baseline. We also specialize the feature set to specific data domains, identify an objective function that is less prone to overfitting, and release fast, scalable, and language-independent tools for implementing the features.
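
The three feature granularities the paper contrasts can be made concrete. In the hypothetical sketch below, dense features are a few real-valued scores shared by all phrase pairs, lexicalized sparse features fire for specific word pairs, and an extended-style feature conditions on coarse word classes: more specific than dense scores but more general than surface lexical pairs (the class assignment is invented for illustration).

```python
def dense_features(tm_prob, lm_score):
    """A few real-valued scores shared across all phrase pairs."""
    return {"tm_log_prob": tm_prob, "lm_score": lm_score}

def sparse_features(src_phrase, tgt_phrase):
    """Fully lexicalized indicators: one feature per exact word pairing.
    Very specific, so they fire rarely and overfit easily."""
    return {f"lex:{s}~{t}": 1.0 for s in src_phrase for t in tgt_phrase}

def extended_features(src_phrase, tgt_phrase):
    """Intermediate granularity (illustrative): condition on coarse word
    classes instead of surface forms, so features fire more often than
    lexical ones while carrying more information than dense scores."""
    def word_class(w):  # hypothetical two-class clustering
        return "FUNC" if w in {"the", "a", "of", "la", "el", "de"} else "CONTENT"
    return {f"cls:{word_class(s)}~{word_class(t)}": 1.0
            for s in src_phrase for t in tgt_phrase}

src, tgt = ["la", "casa"], ["the", "house"]
print(dense_features(tm_prob=-1.2, lm_score=-4.7))
print(sparse_features(src, tgt))
print(extended_features(src, tgt))
```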


Joint Conference on Lexical and Computational Semantics | 2012

SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity

Eneko Agirre; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre

Collaboration


Dive into Daniel M. Cer's collaborations.

Top Co-Authors

Mona T. Diab

George Washington University

Eneko Agirre

University of the Basque Country

Aitor Gonzalez-Agirre

University of the Basque Country
