Germán Sanchis-Trilles

Publication

Featured research published by Germán Sanchis-Trilles.

Pattern Recognition | 2012

Online adaptation strategies for statistical machine translation in post-editing scenarios

Pascual Martínez-Gómez; Germán Sanchis-Trilles; Francisco Casacuberta

One of the most promising approaches to machine translation consists in formulating the problem by means of a pattern recognition approach. By doing so, there are some tasks in which online adaptation is needed in order to adapt the system to changing scenarios. In the present work, we perform an exhaustive comparison of four online learning algorithms when combined with two adaptation strategies for the task of online adaptation in statistical machine translation. Two of these algorithms are already well-known in the pattern recognition community, such as the perceptron and passive-aggressive algorithms, but here they are thoroughly analyzed for their applicability in the statistical machine translation task. In addition, we also compare them with two novel methods, i.e., Bayesian predictive adaptation and discriminative ridge regression. In statistical machine translation, the most successful approach is based on a log-linear approximation to a posteriori distribution. According to experimental results, adapting the scaling factors of this log-linear combination of models using discriminative ridge regression or Bayesian predictive adaptation yields the best performance.

Empirical Methods in Natural Language Processing | 2008

Improving Interactive Machine Translation via Mouse Actions

Germán Sanchis-Trilles; Daniel Ortiz-Martínez; Jorge Civera; Francisco Casacuberta; Enrique Vidal; Hieu Hoang

Although Machine Translation (MT) is a very active research field which is receiving an increasing amount of attention from the research community, the results that current MT systems are capable of producing are still quite far away from perfection. Because of this, and in order to build systems that yield correct translations, human knowledge must be integrated into the translation process, which will be carried out in our case in an Interactive-Predictive (IP) framework. In this paper, we show that considering Mouse Actions as a significant information source for the underlying system improves the productivity of the human translator involved. In addition, we also show that the initial translations that the MT system provides can be quickly improved by an expert by only performing additional Mouse Actions. In this work, we will be using word graphs as an efficient interface between a phrase-based MT system and the IP engine.

Machine Translation | 2014

Interactive translation prediction versus conventional post-editing in practice: a study with the CasMaCat workbench

Germán Sanchis-Trilles; Vicent Alabau; Christian Buck; Michael Carl; Francisco Casacuberta; Mercedes García-Martínez; Ulrich Germann; Jesús González-Rubio; Robin L. Hill; Philipp Koehn; Luis A. Leiva; Bartolomé Mesa-Lao; Daniel Ortiz-Martínez; Herve Saint-Amand; Chara Tsoukala; Enrique Vidal

We conducted a field trial in computer-assisted professional translation to compare interactive translation prediction (ITP) against conventional post-editing (PE) of machine translation (MT) output. In contrast to the conventional PE set-up, where an MT system first produces a static translation hypothesis that is then edited by a professional (hence “post-editing”), ITP constantly updates the translation hypothesis in real time in response to user edits. Our study involved nine professional translators and four reviewers working with the web-based CasMaCat workbench. Various new interactive features aiming to assist the post-editor/translator were also tested in this trial. Our results show that even with little training, ITP can be as productive as conventional PE in terms of the total time required to produce the final translation. Moreover, translation editors working with ITP require fewer key strokes to arrive at the final version of their translation.

IberSPEECH | 2012

Online Learning of Log-Linear Weights in Interactive Machine Translation

Francisco Javier López-Salcedo; Germán Sanchis-Trilles; Francisco Casacuberta

Whenever the quality provided by a machine translation system is not enough, a human expert is required to correct the sentences provided by the machine translation system. In this environment, the human translator is generating bilingual data after each translation has been marked as correct, and expects the system to be able to learn from the errors made. In this paper, we analyse the appropriateness of discriminative ridge regression for adapting the scaling factors of a state-of-the-art machine translation system within a conventional post-editing scenario and also within an interactive machine translation setup. Results show that the strategies applied in the former setup cannot be directly applied in the latter framework. Hence, the discriminative ridge regression is revised and adapted for the interactive machine translation framework, with encouraging results.

International Conference on Computational Linguistics | 2011

Online learning via dynamic reranking for computer assisted translation

Pascual Martínez-Gómez; Germán Sanchis-Trilles; Francisco Casacuberta

New techniques for online adaptation in computer assisted translation are explored and compared to previously existing approaches. Under the online adaptation paradigm, the translation system needs to adapt itself to real-world changing scenarios, where training and tuning may only take place once, when the system is set up for the first time. For this purpose, post-edit information, as described by a given quality measure, is used as valuable feedback within a dynamic reranking algorithm. Two possible approaches are presented and evaluated. The first one relies on the well-known perceptron algorithm, whereas the second one is a novel approach using ridge regression in order to compute the optimum scaling factors within a state-of-the-art SMT system. Experimental results show that such algorithms are able to improve translation quality by learning from the errors produced by the system on a sentence-by-sentence basis.
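
As a rough illustration of the perceptron-style reranking idea mentioned in this abstract, the sketch below rescores an n-best list with a log-linear combination of feature scores and, after each sentence, nudges the scaling factors toward the hypothesis that the quality measure prefers. It is a minimal sketch under stated assumptions, not the authors' formulation: the names `score`, `rerank`, `perceptron_update`, the `(features, quality)` tuple layout, and the `learning_rate` parameter are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed layout: (feature vector h(x, y), quality score where higher is better).
Hypothesis = Tuple[List[float], float]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Log-linear score: weighted sum of the feature (log-)scores."""
    return sum(w * h for w, h in zip(weights, features))


def rerank(weights: Sequence[float], nbest: Sequence[Hypothesis]) -> int:
    """Index of the hypothesis that scores highest under the current weights."""
    return max(range(len(nbest)), key=lambda i: score(weights, nbest[i][0]))


def perceptron_update(
    weights: Sequence[float],
    nbest: Sequence[Hypothesis],
    learning_rate: float = 0.1,
) -> List[float]:
    """One perceptron-style step after the post-edited reference is known.

    If the model's top choice is not the hypothesis with the best quality,
    move the weights toward the better hypothesis and away from the chosen one.
    """
    predicted = rerank(weights, nbest)
    oracle = max(range(len(nbest)), key=lambda i: nbest[i][1])
    if nbest[predicted][1] >= nbest[oracle][1]:
        return list(weights)  # model already picks a best-quality hypothesis
    h_pred = nbest[predicted][0]
    h_oracle = nbest[oracle][0]
    return [
        w + learning_rate * (ho - hp)
        for w, ho, hp in zip(weights, h_oracle, h_pred)
    ]
```

In an online setting this update would be applied once per sentence, right after the user's post-edit makes the quality of each hypothesis computable, so that the next sentence is reranked with the adapted weights.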

Human Factors in Computing Systems | 2014

Representatively memorable: sampling the right phrase set to get the text entry experiment right

Luis A. Leiva; Germán Sanchis-Trilles

In text entry experiments, memorability is a desired property of the phrases used as stimuli. Unfortunately, to date there is no automated method to achieve this effect. As a result, researchers have to use either manually curated English-only phrase sets or sampling procedures that do not guarantee phrases being memorable. In response to this need, we present a novel sampling method based on two core ideas: a multiple regression model over language-independent features, and the statistical analysis of the corpus from which phrases will be drawn. Our results show that researchers can finally use a method to successfully curate their own stimuli targeting potentially any language or domain. The source code as well as our phrase sets are publicly available.

SSPR&SPR'10 Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition | 2010

Similarity word-sequence kernels for sentence clustering

Jesús Andrés-Ferrer; Germán Sanchis-Trilles; Francisco Casacuberta

In this paper, we present a novel clustering approach based on the use of kernels as similarity functions and the C-means algorithm. Several word-sequence kernels are defined and extended to verify the properties of similarity functions. Afterwards, these monolingual word-sequence kernels are extended to bilingual word-sequence kernels, and applied to the task of monolingual and bilingual sentence clustering. The motivation of this proposal is to group similar sentences into clusters so that specialised models can be trained for each cluster, with the purpose of reducing in this way both the size and complexity of the initial task. We provide empirical evidence that the use of bilingual kernels can lead to better clusters, in terms of intra-cluster perplexities.
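
To give a feel for what using a kernel as a sentence similarity function can look like, the sketch below counts shared word n-grams between two sentences and normalises the result to the range [0, 1]. This is a stand-in chosen for illustration only: the paper defines its own word-sequence kernels, and the n-gram count kernel, the cosine-style normalisation, and the names `ngram_counts`, `kernel`, `similarity`, and `max_n` are all assumptions made here.

```python
from collections import Counter
from math import sqrt


def ngram_counts(sentence: str, max_n: int = 2) -> Counter:
    """Count all word n-grams of length 1..max_n in a whitespace-tokenised sentence."""
    words = sentence.lower().split()
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def kernel(a: str, b: str, max_n: int = 2) -> float:
    """Inner product of the two n-gram count vectors (an assumed, simple kernel)."""
    counts_a = ngram_counts(a, max_n)
    counts_b = ngram_counts(b, max_n)
    return float(sum(c * counts_b[gram] for gram, c in counts_a.items()))


def similarity(a: str, b: str, max_n: int = 2) -> float:
    """Normalised kernel: 1.0 for identical n-gram profiles, 0.0 for no overlap."""
    denom = sqrt(kernel(a, a, max_n) * kernel(b, b, max_n))
    return kernel(a, b, max_n) / denom if denom else 0.0
```

A clustering algorithm such as C-means could then use `similarity` in place of a distance-based criterion; that step is not shown here.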

Conference of the Association for Machine Translation in the Americas | 2016

Integrating Online and Active Learning in a Computer-Assisted Translation Workbench

Daniel Ortiz-Martínez; Jesús González-Rubio; Vicent Alabau; Germán Sanchis-Trilles; Francisco Casacuberta

This chapter describes a pilot study aiming at testing the integration of online and active learning features into the computer-assisted translation workbench developed within the CASMACAT project. These features can be used to take advantage of the new knowledge implicitly provided by human experts when they generate new translations. Online learning (OL) allows the system to learn from user feedback in real time by incrementally adapting the parameters of the statistical models involved in the translation process. On the other hand, active learning (AL) determines those sentences that need to be supervised by the user so as to maximize the final translation quality minimizing user effort and, at the same time, improving the statistical model parameters. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. ITP is a computer assisted translation approach where the user interactively collaborates with a statistical machine translation system to generate high quality translations. User activity data was collected from ten translators using key-logging and eye-tracking. We found that ITP with OL performs better than standard ITP, especially in terms of typing effort required from the user to generate correct translations. Additionally, ITP with AL provides better translation quality than standard ITP for the same levels of user effort.

Human Computer Interaction with Mobile Devices and Services | 2014

A systematic comparison of 3 phrase sampling methods for text entry experiments in 10 languages

Germán Sanchis-Trilles; Luis A. Leiva

Today's reference datasets for conducting text entry experiments are only available in English, which may lead to misleading results when testing text entry methods with non-native English speakers. We compared 3 automated phrase sampling methods available in the literature: Random, Ngram, and MemRep. It was found that MemRep performs best according to a statistical analysis and qualitative observations. This resulted in a collection of 30 datasets across 10 major languages, and we wish to share them with the community via this paper.

SSPR&SPR'10 Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition | 2010

Bayesian adaptation for statistical machine translation

Germán Sanchis-Trilles; Francisco Casacuberta

In many pattern recognition problems, learning from training samples is a process that requires important amounts of training data and a high computational effort. Sometimes, only limited training data and/or limited computational resources are available, but there is also available a previous system trained for a closely related task and with enough training material. This scenario is very frequent in statistical machine translation and adaptation can be a solution to deal with this problem. In this paper, we present an adaptation technique for (state-of-the-art) log-linear modelling based on the well-known Bayesian learning paradigm. This technique has been applied to statistical machine translation and can be easily extended to other pattern recognition areas in which log-linear models are used. We show empirical results in which a small amount of adaptation data is able to improve both the nonadapted system and a system that optimises the above-mentioned weights only on the adaptation set.