Jesús González-Rubio

Publication

Featured research published by Jesús González-Rubio.

workshop on statistical machine translation | 2014

FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task

José G. C. de Souza; Jesús González-Rubio; Christian Buck; Marco Turchi; Matteo Negri

This paper describes the joint submission of Fondazione Bruno Kessler, Universitat Politècnica de València and University of Edinburgh to the Quality Estimation tasks of the Workshop on Statistical Machine Translation 2014. We present our submissions for Task 1.2, 1.3 and 2. Our systems ranked first for Task 1.2 and for the Binary and Level1 settings in Task 2.

conference of the european chapter of the association for computational linguistics | 2014

CASMACAT: A Computer-assisted Translation Workbench

Vicent Alabau; Christian Buck; Michael Carl; Francisco Casacuberta; Mercedes García-Martínez; Ulrich Germann; Jesús González-Rubio; Robin L. Hill; Philipp Koehn; Luis A. Leiva; Bartolomé Mesa-Lao; Daniel Ortiz-Martínez; Herve Saint-Amand; Germán Sanchis Trilles; Chara Tsoukala

CASMACAT is a modular, web-based translation workbench that offers advanced functionalities for computer-aided translation and the scientific study of human translation: automatic interaction with machine translation (MT) engines and translation memories (TM) to obtain raw translations or close TM matches for conventional post-editing; interactive translation prediction based on an MT engine’s search graph; detailed recording and replay of edit actions and translator’s gaze (the latter via eye-tracking); and support for an e-pen as an alternative input device. The system is open source software and interfaces with multiple MT systems.

international conference on pattern recognition | 2010

On the Use of Median String for Multi-source Translation

Jesús González-Rubio; Francisco Casacuberta

State-of-the-art approaches to multi-source translation involve a multimodal-like process which applies an individual translation system to each source language. Then, the translations of the individual systems are combined to obtain a consensus output. We propose to use the (generalised) median string as the consensus output of the individual translation systems. Different approximations to the median string are studied as well as different approaches to improve the median string performance when dealing with natural language strings. The proposed approaches were evaluated on the Europarl corpus, achieving significant improvements in translation quality.
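The consensus idea in this abstract can be illustrated with the set median, a common approximation that restricts the search for the median string to the input hypotheses themselves. The sketch below is a minimal illustration, not the authors' method: the word-level edit distance, the function names `edit_distance` and `set_median`, and the example sentences are all assumptions made for this example.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def set_median(hypotheses: List[str]) -> str:
    """Return the hypothesis with the smallest summed distance to all inputs.

    This is the set median: an approximation to the generalised median
    string that only considers the given hypotheses as candidates.
    """
    tokenized = [h.split() for h in hypotheses]

    def total_distance(i: int) -> int:
        return sum(edit_distance(tokenized[i], other) for other in tokenized)

    best = min(range(len(hypotheses)), key=total_distance)
    return hypotheses[best]


# Hypothetical outputs of three individual translation systems.
outputs = [
    "the house is small",
    "the house is little",
    "a house is small",
]
consensus = set_median(outputs)
```

Here `consensus` is the first output, since it differs from each of the other two by a single word. The generalised median string may lie outside the input set, so finding it requires a search over new strings rather than a simple selection.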

conference of the association for machine translation in the americas | 2016

Integrating Online and Active Learning in a Computer-Assisted Translation Workbench

Daniel Ortiz-Martínez; Jesús González-Rubio; Vicent Alabau; Germán Sanchis-Trilles; Francisco Casacuberta

This chapter describes a pilot study aimed at testing the integration of online and active learning features into the computer-assisted translation workbench developed within the CASMACAT project. These features can be used to take advantage of the new knowledge implicitly provided by human experts when they generate new translations. Online learning (OL) allows the system to learn from user feedback in real time by incrementally adapting the parameters of the statistical models involved in the translation process. Active learning (AL), in turn, determines which sentences need to be supervised by the user so as to maximize final translation quality while minimizing user effort and, at the same time, improving the statistical model parameters. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. ITP is a computer-assisted translation approach where the user interactively collaborates with a statistical machine translation system to generate high-quality translations. User activity data was collected from ten translators using key-logging and eye-tracking. We found that ITP with OL performs better than standard ITP, especially in terms of the typing effort required from the user to generate correct translations. Additionally, ITP with AL provides better translation quality than standard ITP for the same levels of user effort.

international conference on multimodal interfaces | 2011

An active learning scenario for interactive machine translation

Jesús González-Rubio; Daniel Ortiz-Martínez; Francisco Casacuberta

This paper provides the first experimental study of an active learning (AL) scenario for interactive machine translation (IMT). Unlike other IMT implementations where user feedback is used only to improve the predictions of the system, our IMT implementation takes advantage of user feedback to update the statistical models involved in the translation process. We introduce a sentence sampling strategy to select the sentences that are worth translating interactively, and a retraining method to update the statistical models with the user-validated translations. Both the sampling strategy and the retraining process are designed to work in real time to meet the severe time constraints inherent to the IMT framework. Experiments in a simulated setting showed that the use of AL dramatically reduces the user effort required to obtain translations of a given quality.

conference on computational natural language learning | 2016

Beyond Prefix-Based Interactive Translation Prediction

Jesús González-Rubio; Daniel Ortiz-Martínez; Francisco Casacuberta; Jose Miguel Benedi Ruiz

Current automatic machine translation systems require heavy human proofreading to produce high-quality translations. We present a new interactive machine translation approach aimed at providing a natural collaboration between humans and translation systems. As such, we grant the user complete freedom to validate and correct any part of the translations suggested by the system. Our approach is then designed according to the requirements placed by this unrestricted proofreading protocol, in particular the ability of the system to suggest new translations coherent with the set of potentially disjoint translation segments validated by the user. We evaluate our approach in a user-simulated setting where reference translations are considered the output desired by a human expert. Results show important reductions in the number of edits in comparison to decoupled post-editing and conventional prefix-based interactive translation prediction. Additionally, we provide evidence that it can also reduce the cognitive overload reported for interactive translation systems in previous user studies.
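The coherence requirement described in this abstract can be illustrated with a small filter over an n-best list. This is a hedged sketch, not the system presented in the paper: it assumes that validated segments must appear in the suggestion as contiguous word sequences, in the order given and without overlapping, and the names `is_coherent` and `next_suggestion` are hypothetical.

```python
from typing import List, Optional


def find_segment(tokens: List[str], segment: List[str], start: int) -> int:
    """Index of the first occurrence of `segment` in `tokens` at or after `start`, or -1."""
    n = len(segment)
    for i in range(start, len(tokens) - n + 1):
        if tokens[i:i + n] == segment:
            return i
    return -1


def is_coherent(hypothesis: str, validated: List[str]) -> bool:
    """True if every validated segment occurs in order, without overlap."""
    tokens = hypothesis.split()
    position = 0
    for segment in validated:
        seg_tokens = segment.split()
        index = find_segment(tokens, seg_tokens, position)
        if index < 0:
            return False
        # Continue searching after the matched segment (leftmost match first).
        position = index + len(seg_tokens)
    return True


def next_suggestion(nbest: List[str], validated: List[str]) -> Optional[str]:
    """First hypothesis in the (best-first) list that respects all validated segments."""
    for hypothesis in nbest:
        if is_coherent(hypothesis, validated):
            return hypothesis
    return None


# Hypothetical n-best list and user-validated segments.
nbest = [
    "the big house is red",
    "the house is very red",
    "a house was red",
]
suggestion = next_suggestion(nbest, ["the house", "red"])
```

With these inputs `suggestion` is the second hypothesis: the first one does not contain "the house" as a contiguous segment. A prefix-based protocol corresponds to the special case of a single validated segment anchored at the start of the sentence.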

Pattern Analysis and Applications | 2015

Minimum Bayes' risk subsequence combination for machine translation

Jesús González-Rubio; Francisco Casacuberta

System combination has proved to be a successful technique in the pattern recognition field. However, several difficulties arise when combining the outputs of tasks, such as machine translation, that generate structured patterns. So far, machine translation system combination approaches either implement sophisticated classifiers to select one of the provided translations, or generate new sentences by combining the “best” subsequences of the provided translations. We present minimum Bayes’ risk system combination (MBRSC), a system combination method for machine translation that brings together the advantages of sentence-selection and subsequence-combination methods. MBRSC is able to detect and utilize the “best” subsequences of the provided translations to generate the optimal consensus translation with respect to a particular performance metric. Experiments show that MBRSC obtains significant improvements in translation quality, and a particularly competitive performance when applied to languages with scarce resources.
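The minimum Bayes' risk decision rule behind this line of work can be sketched in its simplest, sentence-selection form: pick the candidate with the highest expected gain against all candidates, weighted by their probabilities. The code below is only that baseline, not MBRSC itself, which additionally builds new sentences from subsequences. The unigram F1 gain, the function names and the candidate weights are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Tuple


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Unigram F1 overlap, standing in for a real translation quality metric."""
    hyp_counts = Counter(hypothesis.split())
    ref_counts = Counter(reference.split())
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: List[Tuple[str, float]]) -> str:
    """Select the candidate with maximum expected gain (minimum Bayes' risk).

    `candidates` holds (translation, weight) pairs; weights are normalised
    here so that they act as a probability distribution.
    """
    total = sum(weight for _, weight in candidates)

    def expected_gain(hypothesis: str) -> float:
        return sum((weight / total) * unigram_f1(hypothesis, other)
                   for other, weight in candidates)

    return max((text for text, _ in candidates), key=expected_gain)


# Hypothetical system outputs with made-up confidence weights.
candidates = [
    ("the cat sat", 0.4),
    ("the cat sits", 0.3),
    ("a dog sat", 0.3),
]
selected = mbr_select(candidates)
```

Here `selected` is "the cat sat", which shares the most words, in probability-weighted terms, with the other candidates. Because this rule can only return one of the given sentences, it cannot profit from a good subsequence that appears inside an otherwise poor translation.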

Machine Translation | 2013

Dimensionality reduction methods for machine translation quality estimation

Jesús González-Rubio; J. Ramón Navarro-Cerdán; Francisco Casacuberta

Quality estimation (QE) for machine translation is usually addressed as a regression problem where a learning model is used to predict a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning, and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.

iberian conference on pattern recognition and image analysis | 2015

Improving the Minimum Description Length Inference of Phrase-Based Translation Models

Jesús González-Rubio; Francisco Casacuberta

We study the application of minimum description length (MDL) inference to estimate pattern recognition models for machine translation. MDL is a theoretically sound approach whose empirical results are, however, below those of the state-of-the-art pipeline of training heuristics. We identify potential limitations of current MDL procedures and provide a practical approach to overcome them. Empirical results support the soundness of the proposed approach.

conference of the european chapter of the association for computational linguistics | 2014

Inference of Phrase-Based Translation Models via Minimum Description Length

Jesús González-Rubio; Francisco Casacuberta

We present an unsupervised inference procedure for phrase-based translation models based on the minimum description length principle. In comparison to current inference techniques that rely on long pipelines of training heuristics, this procedure represents a theoretically well-founded approach to directly infer phrase lexicons. Empirical results show that the proposed inference procedure has the potential to overcome many of the problems inherent to the current inference approaches for phrase-based models.