Jesús González-Rubio
Polytechnic University of Valencia
Publication
Featured research published by Jesús González-Rubio.
Workshop on Statistical Machine Translation | 2014
José G. C. de Souza; Jesús González-Rubio; Christian Buck; Marco Turchi; Matteo Negri
This paper describes the joint submission of Fondazione Bruno Kessler, Universitat Politècnica de València and University of Edinburgh to the Quality Estimation tasks of the Workshop on Statistical Machine Translation 2014. We present our submissions for Task 1.2, 1.3 and 2. Our systems ranked first for Task 1.2 and for the Binary and Level1 settings in Task 2.
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Vicent Alabau; Christian Buck; Michael Carl; Francisco Casacuberta; Mercedes García-Martínez; Ulrich Germann; Jesús González-Rubio; Robin L. Hill; Philipp Koehn; Luis A. Leiva; Bartolomé Mesa-Lao; Daniel Ortiz-Martínez; Herve Saint-Amand; Germán Sanchis Trilles; Chara Tsoukala
CASMACAT is a modular, web-based translation workbench that offers advanced functionalities for computer-aided translation and the scientific study of human translation: automatic interaction with machine translation (MT) engines and translation memories (TM) to obtain raw translations or close TM matches for conventional post-editing; interactive translation prediction based on an MT engine’s search graph; detailed recording and replay of edit actions and translator’s gaze (the latter via eye-tracking); and support for e-pen as an alternative input device. The system is open source software and interfaces with multiple MT systems.
International Conference on Pattern Recognition | 2010
Jesús González-Rubio; Francisco Casacuberta
State-of-the-art approaches to multi-source translation involve a multimodal-like process which applies an individual translation system to each source language. Then, the translations of the individual systems are combined to obtain a consensus output. We propose to use the (generalised) median string as the consensus output of the individual translation systems. Different approximations to the median string are studied as well as different approaches to improve the median string performance when dealing with natural language strings. The proposed approaches were evaluated on the Europarl corpus, achieving significant improvements in translation quality.
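As a rough illustration of the consensus idea in this abstract, the sketch below computes the set median: the hypothesis, among those proposed by the individual systems, with the smallest total edit distance to all the others. This is one widely used approximation to the generalised median string; the abstract does not say which approximations the paper studies, so the choice of the set median, the word-level edit distance, and the function names `edit_distance` and `set_median` are all assumptions made for illustration.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            # Deletion, insertion, or substitution/match.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def set_median(hypotheses: List[str]) -> str:
    """Return the hypothesis with the smallest total distance to all others.

    Assumption: the consensus is restricted to the given hypotheses (set
    median), which only approximates the generalised median string.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    tokenized = [h.split() for h in hypotheses]
    totals = [sum(edit_distance(t, u) for u in tokenized) for t in tokenized]
    return hypotheses[totals.index(min(totals))]


# Hypothetical outputs of three source-language-specific systems.
outputs = ["the house is small", "the house is little", "a house is small"]
consensus = set_median(outputs)
```

With these three hypothetical outputs, the first one is selected because it is one edit away from each of the other two, while the other two are two edits away from each other.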
Conference of the Association for Machine Translation in the Americas | 2016
Daniel Ortiz-Martínez; Jesús González-Rubio; Vicent Alabau; Germán Sanchis-Trilles; Francisco Casacuberta
This chapter describes a pilot study aiming at testing the integration of online and active learning features into the computer-assisted translation workbench developed within the CASMACAT project. These features can be used to take advantage of the new knowledge implicitly provided by human experts when they generate new translations. Online learning (OL) allows the system to learn from user feedback in real time by incrementally adapting the parameters of the statistical models involved in the translation process. Active learning (AL), on the other hand, determines which sentences need to be supervised by the user so as to maximize the final translation quality while minimizing user effort and, at the same time, improving the statistical model parameters. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. ITP is a computer-assisted translation approach where the user interactively collaborates with a statistical machine translation system to generate high-quality translations. User activity data was collected from ten translators using key-logging and eye-tracking. We found that ITP with OL performs better than standard ITP, especially in terms of the typing effort required from the user to generate correct translations. Additionally, ITP with AL provides better translation quality than standard ITP for the same levels of user effort.
International Conference on Multimodal Interfaces | 2011
Jesús González-Rubio; Daniel Ortiz-Martínez; Francisco Casacuberta
This paper provides the first experimental study of an active learning (AL) scenario for interactive machine translation (IMT). Unlike other IMT implementations where user feedback is used only to improve the predictions of the system, our IMT implementation takes advantage of user feedback to update the statistical models involved in the translation process. We introduce a sentence sampling strategy to select the sentences that are worth translating interactively, and a retraining method to update the statistical models with the user-validated translations. Both the sampling strategy and the retraining process are designed to work in real time to meet the severe time constraints inherent to the IMT framework. Experiments in a simulated setting showed that the use of AL dramatically reduces the user effort required to obtain translations of a given quality.
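The sampling step described in this abstract can be pictured with a small sketch. The abstract does not specify the sampling criterion, so the one used here, ranking sentences by the fraction of their n-grams that the system has not seen before, is an assumption chosen for illustration, as are the names `ngrams`, `novelty` and `select_for_supervision`. Retraining is reduced to a stand-in: adding the n-grams of validated sentences to the set of seen n-grams.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], max_n: int = 2) -> Set[NGram]:
    """All n-grams of order 1..max_n in a token list."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def novelty(sentence: str, seen: Set[NGram], max_n: int = 2) -> float:
    """Fraction of the sentence's n-grams that are absent from `seen`."""
    grams = ngrams(sentence.split(), max_n)
    if not grams:
        return 0.0
    return len(grams - seen) / len(grams)


def select_for_supervision(batch: List[str], seen: Set[NGram], budget: int) -> List[str]:
    """Pick the `budget` most novel sentences; ties keep their input order."""
    ranked = sorted(batch, key=lambda s: novelty(s, seen), reverse=True)
    return ranked[:budget]


# Hypothetical data: n-grams already covered by the models, and a new batch.
seen = ngrams("the cat sat on the mat".split())
batch = ["the cat sat", "a dog barked loudly", "the dog sat"]
chosen = select_for_supervision(batch, seen, budget=1)

# Stand-in for retraining: the validated sentences extend the seen n-grams.
for sentence in chosen:
    seen |= ngrams(sentence.split())
```

Here only the sentence made up entirely of unseen n-grams is sent to the user; the other two would be translated automatically. Both steps are cheap set operations, which is in keeping with the real-time constraint the abstract mentions, although a real system would update translation and language models rather than a set of n-grams.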
Conference on Computational Natural Language Learning | 2016
Jesús González-Rubio; Daniel Ortiz-Martínez; Francisco Casacuberta; Jose Miguel Benedi Ruiz
Current automatic machine translation systems require heavy human proofreading to produce high-quality translations. We present a new interactive machine translation approach aimed at providing a natural collaboration between humans and translation systems. As such, we grant the user complete freedom to validate and correct any part of the translations suggested by the system. Our approach is then designed according to the requirements placed by this unrestricted proofreading protocol. In particular, the system must be able to suggest new translations that are coherent with the set of potentially disjoint translation segments validated by the user. We evaluate our approach in a user-simulated setting where reference translations are considered the output desired by a human expert. Results show important reductions in the number of edits in comparison to decoupled post-editing and conventional prefix-based interactive translation prediction. Additionally, we provide evidence that it can also reduce the cognitive overload reported for interactive translation systems in previous user studies.
Pattern Analysis and Applications | 2015
Jesús González-Rubio; Francisco Casacuberta
System combination has proved to be a successful technique in the pattern recognition field. However, several difficulties arise when combining the outputs of tasks, e.g. machine translation, that generate structured patterns. So far, machine translation system combination approaches either implement sophisticated classifiers to select one of the provided translations, or generate new sentences by combining the “best” subsequences of the provided translations. We present minimum Bayes’ risk system combination (MBRSC), a system combination method for machine translation that gathers together the advantages of sentence-selection and subsequence-combination methods. MBRSC is able to detect and utilize the “best” subsequences of the provided translations to generate the optimal consensus translation with respect to a particular performance metric. Experiments show that MBRSC obtains significant improvements in translation quality, and a particularly competitive performance when applied to languages with scarce resources.
Machine Translation | 2013
Jesús González-Rubio; J. Ramón Navarro-Cerdán; Francisco Casacuberta
Quality estimation (QE) for machine translation is usually addressed as a regression problem where a learning model is used to predict a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning, and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.
Iberian Conference on Pattern Recognition and Image Analysis | 2015
Jesús González-Rubio; Francisco Casacuberta
We study the application of minimum description length (MDL) inference to estimate pattern recognition models for machine translation. MDL is a theoretically sound approach whose empirical results are, however, below those of the state-of-the-art pipeline of training heuristics. We identify potential limitations of current MDL procedures and provide a practical approach to overcome them. Empirical results support the soundness of the proposed approach.
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Jesús González-Rubio; Francisco Casacuberta
We present an unsupervised inference procedure for phrase-based translation models based on the minimum description length principle. In comparison to current inference techniques that rely on long pipelines of training heuristics, this procedure represents a theoretically well-founded approach to directly infer phrase lexicons. Empirical results show that the proposed inference procedure has the potential to overcome many of the problems inherent to the current inference approaches for phrase-based models.