José G. C. de Souza
Dublin City University
Publications
Featured research published by José G. C. de Souza.
Workshop on Statistical Machine Translation | 2014
José G. C. de Souza; Jesús González-Rubio; Christian Buck; Marco Turchi; Matteo Negri
This paper describes the joint submission of Fondazione Bruno Kessler, Universitat Politècnica de València, and the University of Edinburgh to the Quality Estimation tasks of the Workshop on Statistical Machine Translation 2014. We present our submissions for Tasks 1.2, 1.3, and 2. Our systems ranked first for Task 1.2 and for the Binary and Level1 settings in Task 2.
Meeting of the Association for Computational Linguistics | 2014
Marco Turchi; Antonios Anastasopoulos; José G. C. de Souza; Matteo Negri
The automatic estimation of machine translation (MT) output quality is a hard task in which the selection of the appropriate algorithm and the most predictive features over reasonably sized training sets plays a crucial role. When moving from controlled lab evaluations to real-life scenarios, the task becomes even harder. For current MT quality estimation (QE) systems, additional complexity comes from the difficulty of modeling user and domain changes. Indeed, the instability of the systems with respect to data coming from different distributions calls for adaptive solutions that react to new operating conditions. To tackle this issue, we propose an online framework for adaptive QE that targets reactivity and robustness to user and domain changes. Contrastive experiments in different testing conditions involving user and domain changes demonstrate the effectiveness of our approach.
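To make the online protocol concrete, here is a minimal sketch of such an adaptive QE loop, assuming scikit-learn's SGDRegressor as a stand-in for the paper's actual learner; the feature extractor and the toy data stream are hypothetical placeholders, not the system's components.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Stand-in incremental learner; the paper's algorithm may differ.
model = SGDRegressor(loss="squared_error", learning_rate="constant", eta0=0.01)

def extract_features(source, translation):
    # Placeholder: real QE systems use richer features (length ratios,
    # language-model scores, word-alignment statistics, ...).
    return np.array([len(source.split()), len(translation.split()),
                     len(translation) / max(len(source), 1)])

# Toy stream of (source, translation, true quality score) triples.
stream = [("the cat sat", "le chat était assis", 0.2),
          ("good morning", "bonjour", 0.1)]

for source, translation, true_score in stream:
    x = extract_features(source, translation).reshape(1, -1)
    if hasattr(model, "coef_"):          # predict() fails before the first update
        print("predicted:", model.predict(x)[0])
    model.partial_fit(x, [true_score])   # single-instance adaptive update
```

The key property is that each labeled instance triggers an immediate model update, so the predictor can drift along with the incoming user or domain.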
North American Chapter of the Association for Computational Linguistics | 2015
José G. C. de Souza; Hamed Zamani; Matteo Negri; Marco Turchi; Daniele Falavigna
We investigate the problem of predicting the quality of automatic speech recognition (ASR) output under the following rigid constraints: i) reference transcriptions are not available, ii) confidence information about the system that produced the transcriptions is not accessible, and iii) training and test data come from multiple domains. To cope with these constraints (typical of the constantly growing volume of automatic transcriptions that can be found on the Web), we propose a domain-adaptive approach based on multitask learning. Different algorithms and strategies are evaluated with English data coming from four domains, showing that the proposed approach can cope with the limitations of previously proposed single-task learning methods.
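The paper's specific multitask algorithms are not reproduced here, but a simple illustration of the underlying idea is feature augmentation (Daumé III, 2007), in which each instance carries a shared copy of its features plus a copy in a block reserved for its own domain; the domain names and feature vector below are assumptions for illustration.

```python
import numpy as np

DOMAINS = ["broadcast_news", "lectures", "interviews", "web_video"]  # hypothetical

def augment(x, domain):
    """Map a d-dim vector to (1 + len(DOMAINS)) * d dims: shared + per-domain."""
    d = len(x)
    out = np.zeros((1 + len(DOMAINS)) * d)
    out[:d] = x                                  # shared block, seen by all domains
    i = DOMAINS.index(domain)
    out[(1 + i) * d:(2 + i) * d] = x             # domain-specific block
    return out

x = np.array([0.3, 1.2, 0.7])                    # toy ASR-QE feature vector
print(augment(x, "lectures"))
```

Any regressor trained on such augmented vectors learns shared weights from all domains while keeping room for domain-specific corrections, which is the multitask effect the abstract describes.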
International Joint Conference on Natural Language Processing | 2015
José G. C. de Souza; Matteo Negri; Elisa Ricci; Marco Turchi
We present a method for predicting machine translation output quality geared to the needs of computer-assisted translation (CAT). These include the capability to: i) continuously learn and self-adapt to a stream of data coming from multiple translation jobs, ii) react to data diversity by exploiting human feedback, and iii) leverage data similarity by learning and transferring knowledge across domains. To achieve these goals, we combine two supervised machine learning paradigms, online and multitask learning, adapting and unifying them in a single framework. We show the effectiveness of our approach in a regression task (predicting HTER, the human-targeted translation edit rate), in which online multitask learning outperforms the competitive online single-task and pooling methods used for comparison. This indicates the feasibility of integrating in a CAT tool a single QE component capable of simultaneously serving (and continuously learning from) multiple translation jobs involving different domains and users.
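As a rough illustration of how the two paradigms can be unified, the sketch below wraps per-job feature augmentation (the multitask part) around an incremental learner with single-instance updates (the online part). PassiveAggressiveRegressor, the job names, and the toy stream are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveRegressor

JOBS = ["job_legal", "job_it"]                   # hypothetical translation jobs

def augment(x, job):
    d = len(x)
    out = np.zeros((1 + len(JOBS)) * d)
    out[:d] = x                                  # shared block across all jobs
    i = JOBS.index(job)
    out[(1 + i) * d:(2 + i) * d] = x             # job-specific block
    return out

model = PassiveAggressiveRegressor(C=0.1)

# Toy stream of (job, feature_vector, HTER) triples from interleaved jobs.
stream = [("job_legal", np.array([0.4, 0.9]), 0.25),
          ("job_it",    np.array([0.8, 0.2]), 0.60)]

for job, x, hter in stream:
    z = augment(x, job).reshape(1, -1)
    if hasattr(model, "coef_"):                  # skip prediction before first fit
        print(job, "predicted HTER:", model.predict(z)[0])
    model.partial_fit(z, [hter])                 # one-instance online update
```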
Meeting of the Association for Computational Linguistics | 2016
Shahab Jalalvand; Matteo Negri; Marco Turchi; José G. C. de Souza; Daniele Falavigna; Mohammed R. H. Qwaider
We present TranscRater, an open-source tool for automatic speech recognition (ASR) quality estimation (QE). The tool allows users to perform ASR evaluation bypassing the need for reference transcripts and confidence information, which is common to current assessment protocols. TranscRater includes: i) methods to extract a variety of quality indicators from (signal, transcription) pairs and ii) machine learning algorithms which make it possible to build ASR QE models exploiting the extracted features. Confirming the positive results of previous evaluations, new experiments with TranscRater indicate its effectiveness in both WER prediction and transcription ranking tasks.
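For background on the prediction target: word error rate (WER) is the word-level edit distance between hypothesis and reference (substitutions + insertions + deletions), normalized by the number of reference words. A self-contained reference implementation follows; TranscRater's point is to estimate this score when no reference is available.

```python
# WER via dynamic-programming edit distance over words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words ≈ 0.33
```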
North American Chapter of the Association for Computational Linguistics | 2016
Duygu Ataman; José G. C. de Souza; Marco Turchi; Matteo Negri
This paper describes the system by FBK HLT-MT for cross-lingual semantic textual similarity measurement. Our approach is based on supervised regression with an ensemble of decision trees. In order to assign a semantic similarity score to an input sentence pair, the model combines features collected by state-of-the-art methods in machine translation quality estimation with distance metrics between cross-lingual embeddings of the two sentences. In our analysis, we compare different techniques for composing sentence vectors, several distance features, and ways to produce training data. The proposed system achieves a mean Pearson’s correlation of 0.39533, ranking 7th among all participants in the cross-lingual STS task organized within the SemEval 2016 evaluation campaign.
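A hedged sketch of this recipe: map each sentence into a shared cross-lingual embedding space, derive distance features, and fit a decision-tree ensemble. The random "embeddings", the ExtraTreesRegressor choice, and the toy data below are placeholders standing in for the system's actual components.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
# Random vectors stand in for real cross-lingual word embeddings.
embed = {w: rng.normal(size=50) for w in "the cat sat le chat assis".split()}

def sentence_vector(sentence):
    # One composition technique such systems compare: averaging word vectors.
    vecs = [embed[w] for w in sentence.split() if w in embed]
    return np.mean(vecs, axis=0)

def features(src, tgt):
    u, v = sentence_vector(src), sentence_vector(tgt)
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return [cos, np.linalg.norm(u - v)]          # cosine + Euclidean distance

X = [features("the cat sat", "le chat assis"),
     features("the cat", "le chat")]
y = [4.5, 4.8]                                   # toy STS gold scores (0-5 scale)

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([features("the sat", "le assis")]))
```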
Meeting of the Association for Computational Linguistics | 2016
Masoud Jalili Sabet; Matteo Negri; Marco Turchi; José G. C. de Souza; Marcello Federico
We present TMop, the first open-source tool for automatic Translation Memory (TM) cleaning. The tool implements a fully unsupervised approach to the task, which allows spotting unreliable translation units (sentence pairs in different languages which are supposed to be translations of each other) without requiring labeled training data. TMop includes a highly configurable and extensible set of filters capturing different aspects of translation quality. It has been evaluated on a test set composed of 1,000 translation units (TUs) randomly extracted from the English-Italian version of MyMemory, a large-scale public TM. Results indicate its effectiveness in automatically removing “bad” TUs, with performance comparable to a state-of-the-art supervised method (76.3 vs. 77.7 balanced accuracy).
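To give a feel for what such a filter looks like, here is a minimal sketch of one unsupervised, TMop-style check: discarding TUs whose source/target length ratio falls outside a plausible window. The bounds and the toy TM are illustrative, not TMop's actual configuration.

```python
# Toy translation memory; the last TU looks unreliable.
tm = [("the house is red", "la casa è rossa"),
      ("good morning", "buongiorno"),
      ("click here to unsubscribe now", "x")]

for src, tgt in tm:
    ratio = len(src.split()) / max(len(tgt.split()), 1)
    # Hypothetical bounds: flag TUs whose length ratio is implausible.
    label = "keep" if 0.5 <= ratio <= 2.0 else "discard"
    print(f"{label}: {src!r} -> {tgt!r}")
```

TMop combines many such filters and aggregates their decisions through configurable policies; this shows only the shape of a single filter.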
Workshop on Statistical Machine Translation | 2013
José G. C. de Souza; Christian Buck; Marco Turchi; Matteo Negri
MT Summit XIV | 2013
Raphael Rubino; José G. C. de Souza; Jennifer Foster; Lucia Specia
Topic models for translation quality estimation for gisting purposes. Presented at MT Summit XIV, 2–6 September 2013, Nice, France.
International Conference on Computational Linguistics | 2014
Matteo Negri; Marco Turchi; José G. C. de Souza; Daniele Falavigna