David Vilar | Researchain

Publication

Featured research published by David Vilar.

meeting of the association for computational linguistics | 2005

Novel Reordering Approaches in Phrase-Based Statistical Machine Translation

Stephan Kanthak; David Vilar; Richard Zens; Hermann Ney

This paper presents novel approaches to reordering in phrase-based statistical machine translation. We perform consistent reordering of source sentences in training and estimate a statistical translation model. Using this model, we follow a phrase-based monotonic machine translation approach, for which we develop an efficient and flexible reordering framework that makes it easy to introduce different reordering constraints. In translation, we apply source sentence reordering on word level and use a reordering automaton as input. We show how to compute reordering automata on-demand using IBM or ITG constraints, and also introduce two new types of reordering constraints. We further add weights to the reordering automata. We present detailed experimental results and show that reordering significantly improves translation quality.

workshop on statistical machine translation | 2007

Can We Translate Letters?

David Vilar; Jan-Thorsten Peter; Hermann Ney

Current statistical machine translation systems handle the translation process as the transformation of a string of symbols into another string of symbols. Normally the symbols dealt with are the words in different languages, sometimes with some additional information included, like morphological data. In this work we try to push the approach to the limit, working not on the level of words, but treating both the source and target sentences as strings of letters. We try to find out if a nearly unmodified state-of-the-art translation system is able to cope with the problem and whether it is capable of further generalizing translation rules, for example at the level of word suffixes and translation of unseen words. Experiments are carried out for the translation of Catalan to Spanish.
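As a rough illustration of what treating a sentence as a string of letters can look like in preprocessing, the sketch below converts words into space-separated letters so that a word-based system would see each letter as a token. The function names and the `_` word-boundary marker are assumptions made for this example; the abstract does not say how the paper encodes word boundaries.

```python
# Hypothetical preprocessing for letter-level translation.
# The boundary marker "_" is an assumption, not taken from the paper.
BOUNDARY = "_"


def to_letters(sentence: str) -> str:
    """Turn a sentence into a space-separated letter sequence."""
    words = sentence.split()
    # Letters within a word are separated by spaces; words by the marker.
    return f" {BOUNDARY} ".join(" ".join(word) for word in words)


def from_letters(letters: str) -> str:
    """Invert to_letters: rebuild the words from a letter sequence."""
    words = letters.split(f" {BOUNDARY} ")
    return " ".join(word.replace(" ", "") for word in words)


print(to_letters("bon dia"))
print(from_letters(to_letters("bon dia")))
```

The inverse function matters in practice because the output of a letter-level system would have to be joined back into words before evaluation.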

international conference natural language processing | 2004

Multi-label Text Classification Using Multinomial Models

David Vilar; María José Castro; Emilio Sanchis

Traditional approaches to pattern recognition tasks normally consider only the unilabel classification problem, that is, each observation (both in the training and test sets) has one unique class label associated with it. Yet in many real-world tasks this is only a rough approximation, as one sample can be labeled with a set of classes and thus techniques for the more general multi-label problem have to be explored. In this paper we review the techniques presented in our previous work and discuss their application to the field of text classification, using the multinomial (Naive Bayes) classifier. Results are presented on the Reuters-21578 dataset, and our proposed approach obtains satisfying results.

workshop on statistical machine translation | 2007

Human Evaluation of Machine Translation Through Binary System Comparisons

David Vilar; Gregor Leusch; Hermann Ney; Rafael E. Banchs

We introduce a novel evaluation scheme for the human evaluation of different machine translation systems. Our method is based on direct comparison of two sentences at a time by human judges. These binary judgments are then used to decide between all possible rankings of the systems. The advantages of this new method are the lower dependency on extensive evaluation guidelines, and a tighter focus on a typical evaluation task, namely the ranking of systems. Furthermore we argue that machine translation evaluations should be regarded as statistical processes, both for human and automatic evaluation. We show how confidence ranges for state-of-the-art evaluation measures such as WER and TER can be computed accurately and efficiently without having to resort to Monte Carlo estimates. We give an example of our new evaluation scheme, as well as a comparison with classical automatic and human evaluation on data from a recent international evaluation campaign.
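One simple way to picture deciding between all possible rankings from binary judgments is sketched below: every permutation of the systems is scored by how many pairwise preferences it agrees with. This is an assumption-laden illustration, not the procedure from the paper, whose decision rule the abstract does not spell out; `best_ranking` and the sample judgments are hypothetical.

```python
from itertools import permutations


def best_ranking(systems, judgments):
    """Return the ranking (best first) that agrees with the most judgments.

    judgments is a list of (preferred, other) system-name pairs, one per
    binary comparison made by a human judge.
    """
    def agreements(ranking):
        position = {name: i for i, name in enumerate(ranking)}
        # A judgment agrees with the ranking if the preferred system
        # is placed ahead of the other one.
        return sum(position[win] < position[lose] for win, lose in judgments)

    return max(permutations(systems), key=agreements)


# Hypothetical judgments over three systems.
judgments = [("B", "A"), ("B", "C"), ("A", "C"), ("A", "C"), ("C", "A")]
print(best_ranking(["A", "B", "C"], judgments))
```

Exhaustive enumeration grows factorially with the number of systems, so this sketch is only practical for a handful of systems.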

Machine Translation | 2012

Jane: an advanced freely available hierarchical machine translation toolkit

David Vilar; Daniel Stein; Matthias Huck; Hermann Ney

In this article we will describe the design and implementation of Jane, an efficient hierarchical phrase-based (HPB) toolkit developed at RWTH Aachen University. The system has been used by RWTH at several international evaluation campaigns, including the WMT and NIST evaluations, and is now freely available for non-commercial application. We will go through the main features of Jane, which include, among others, support for different search strategies, different language model formats, support for syntax-based enhancements to the HPB machine translation paradigm, string-to-dependency translation, extended lexicon models, different methods for minimum-error-rate training and distributed operation on a computer cluster. Special attention has been paid to the efficiency of the decoder, clean code and quality assurance through unit and regression testing. Results on current machine translation tasks are reported, which show that the system is able to obtain state-of-the-art performance.

The Prague Bulletin of Mathematical Linguistics | 2011

A Guide to Jane, an Open Source Hierarchical Translation Toolkit

Daniel Stein; David Vilar; Stephan Peitz; Markus Freitag; Matthias Huck; Hermann Ney

Jane is RWTH's hierarchical phrase-based translation toolkit. It includes tools for phrase extraction, translation and scaling factor optimization, with efficient and documented programs of which large parts can be parallelized. The decoder features syntactic enhancements, reorderings, triplet models, discriminative word lexica, and support for a variety of language model formats. In this article, we will review the main features of Jane and explain the overall architecture. We will also indicate where and how new models can be included.

meeting of the association for computational linguistics | 2005

Augmenting a small parallel text with morpho-syntactic language resources for Serbian-English statistical machine translation

Maja Popović; David Vilar; Hermann Ney; Slobodan T. Jovičić; Zoran Saric

In this work, we examine the quality of several statistical machine translation systems constructed on a small amount of parallel Serbian-English text. The main bilingual parallel corpus consists of about 3k sentences and 20k running words from an unrestricted domain. The translation systems are built on the full corpus as well as on a reduced corpus containing only 200 parallel sentences. A small set of about 350 short phrases from the web is used as additional bilingual knowledge. In addition, we investigate the use of monolingual morpho-syntactic knowledge, i.e., base forms and POS tags.

language resources and evaluation | 2012

Involving Language Professionals in the Evaluation of Machine Translation

Eleftherios Avramidis; Aljoscha Burchardt; Christian Federmann; Maja Popović; Cindy Tscherwinka; David Vilar

Significant breakthroughs in machine translation (MT) only seem possible if human translators are taken into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled the fast development of systems, it is not clear how systems can meet real-world (quality) requirements in industrial translation scenarios today. The taraXÜ project has paved the way for wide usage of multiple MT outputs through various feedback loops in system development. The project has integrated human translators into the development process, thus collecting feedback for possible improvements. This paper describes results from detailed human evaluation. Performance of different types of translation systems has been compared and analysed via ranking, error analysis and post-editing.

north american chapter of the association for computational linguistics | 2007

Analysis and System Combination of Phrase- and N-Gram-Based Statistical Machine Translation Systems

Marta Ruiz Costa-Jussà; Josep Maria Crego; David Vilar; José A. R. Fonollosa; José B. Mariño; Hermann Ney

In the framework of the Tc-Star project, we analyze and propose a combination of two Statistical Machine Translation systems: a phrase-based and an N-gram-based one. The exhaustive analysis includes a comparison of the translation models in terms of efficiency (number of translation units used in the search and computational time) and an examination of the errors in each system's output. Additionally, we combine both systems, showing accuracy improvements.

workshop on statistical machine translation | 2009

The RWTH Machine Translation System for WMT 2009

Maja Popović; David Vilar; Daniel Stein; Hermann Ney

RWTH participated in the shared translation task of the Fourth Workshop on Statistical Machine Translation (WMT 2009) with the German-English, French-English and Spanish-English language pairs in each translation direction. The submissions were generated using a phrase-based and a hierarchical statistical machine translation system with appropriate morpho-syntactic enhancements. POS-based reorderings of the source language for the phrase-based systems and splitting of German compounds for both systems were applied. For some tasks, a system combination was used to generate a final hypothesis. An additional English hypothesis was produced by combining all three final systems for translation into English.