Publications


Featured research published by Raphael Rubino.


meeting of the association for computational linguistics | 2016

Findings of the 2016 Conference on Machine Translation.

Ondřej Bojar; Rajen Chatterjee; Christian Federmann; Yvette Graham; Barry Haddow; Matthias Huck; Antonio Jimeno Yepes; Philipp Koehn; Varvara Logacheva; Christof Monz; Matteo Negri; Aurélie Névéol; Mariana L. Neves; Martin Popel; Matt Post; Raphael Rubino; Carolina Scarton; Lucia Specia; Marco Turchi; Karin Verspoor; Marcos Zampieri

This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions, and the biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.


international conference on computational linguistics | 2011

A multi-view approach for term translation spotting

Raphael Rubino; Georges Linarès

This paper presents a multi-view approach for term translation spotting, based on a bilingual lexicon and comparable corpora. We propose to study different levels of representation for a term: the context, the theme and the orthography. These three views are studied individually and combined in order to rank translation candidates. We focus on French-English medical terms. Experiments show a significant improvement over the classical context-based approach, with an F-score of 40.3% for the first-ranked translation candidates.
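
The abstract does not spell out how the three views are combined, so the Python sketch below is illustrative only: the scoring functions are simplified stand-ins (set overlap for context, a topic-vector dot product for theme, character similarity for orthography) for the paper's models, and the combination weights, example term and candidate data are hypothetical.

```python
# Illustrative multi-view ranking of translation candidates.
# The three scoring functions are simplified stand-ins for the context,
# theme and orthography views described in the abstract.

from difflib import SequenceMatcher


def context_score(source_ctx: set, candidate_ctx: set) -> float:
    """Jaccard overlap between translated context words and the candidate's context."""
    if not source_ctx or not candidate_ctx:
        return 0.0
    return len(source_ctx & candidate_ctx) / len(source_ctx | candidate_ctx)


def theme_score(source_topics, candidate_topics) -> float:
    """Dot product between topic (theme) distributions of source and candidate."""
    return sum(a * b for a, b in zip(source_topics, candidate_topics))


def orthography_score(source_term: str, candidate: str) -> float:
    """Character-level similarity, useful for cognates in medical terminology."""
    return SequenceMatcher(None, source_term.lower(), candidate.lower()).ratio()


def rank_candidates(source_term, source_ctx, source_topics, candidates, weights=(1.0, 1.0, 1.0)):
    """Rank candidate translations by a weighted sum of the three views."""
    w_ctx, w_theme, w_orth = weights
    scored = []
    for cand, cand_ctx, cand_topics in candidates:
        score = (w_ctx * context_score(source_ctx, cand_ctx)
                 + w_theme * theme_score(source_topics, cand_topics)
                 + w_orth * orthography_score(source_term, cand))
        scored.append((score, cand))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy French->English example: "hépatite" with two candidates.
    candidates = [
        ("hepatitis", {"liver", "virus", "infection"}, [0.8, 0.1, 0.1]),
        ("headache", {"pain", "head"}, [0.1, 0.2, 0.7]),
    ]
    ranking = rank_candidates("hépatite", {"liver", "virus"}, [0.7, 0.2, 0.1], candidates)
    for score, cand in ranking:
        print(f"{cand}\t{score:.3f}")
```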


workshop on statistical machine translation | 2015

Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling

Raphael Rubino; Tommi A. Pirinen; Miquel Esplà-Gomis; Nikola Ljubešić; Sergio Ortiz Rojas; Vassilis Papavassiliou; Prokopis Prokopidis; Antonio Toral

This paper presents the machine translation systems submitted by the Abu-MaTran project for the Finnish-English language pair at the WMT 2015 translation task. We tackle the lack of resources and the complex morphology of the Finnish language by (i) crawling parallel and monolingual data from the Web and (ii) applying rule-based and unsupervised methods for morphological segmentation. Several statistical machine translation approaches are evaluated and then combined to obtain our final submissions, which are the top-performing English-to-Finnish unconstrained (all automatic metrics) and constrained (BLEU), and Finnish-to-English constrained (TER) systems.
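
As a rough illustration of point (ii), the sketch below shows marker-based segmentation pre- and post-processing around an SMT pipeline. The toy suffix splitter and the "@@" joining convention are assumptions made for this example; the systems described in the paper rely on dedicated rule-based and unsupervised segmenters.

```python
# Minimal sketch of marker-based segmentation pre-/post-processing for SMT,
# assuming some external segmenter (rule-based or unsupervised) provides the
# morph boundaries. The "@@" convention and the toy segmenter are illustrative.

def toy_segment(word: str) -> list:
    # Stand-in for a real morphological segmenter; here we only split off a
    # few hard-coded Finnish case endings.
    for suffix in ("ssa", "lla", "sta"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)], suffix]
    return [word]


def segment_sentence(sentence: str) -> str:
    """Apply segmentation and mark non-final morphs with a trailing '@@'."""
    out = []
    for word in sentence.split():
        morphs = toy_segment(word)
        out.extend(m + "@@" if i < len(morphs) - 1 else m for i, m in enumerate(morphs))
    return " ".join(out)


def desegment_sentence(sentence: str) -> str:
    """Invert the marking after decoding: glue 'xxx@@ yyy' back into 'xxxyyy'."""
    return sentence.replace("@@ ", "")


if __name__ == "__main__":
    src = "talossa on sauna"
    seg = segment_sentence(src)
    print(seg)                      # talo@@ ssa on sauna
    print(desegment_sentence(seg))  # talossa on sauna
```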


workshop on statistical machine translation | 2014

Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-Style Synthetic Rules

Raphael Rubino; Antonio Toral; Víctor M. Sánchez-Cartagena; Jorge Ferrández-Tordera; Sergio Ortiz Rojas; Gema Ramírez-Sánchez; Felipe Sánchez-Martínez; Andy Way

This paper presents the machine translation systems submitted by the Abu-MaTran project to the WMT 2014 translation task. The language pair concerned is English-French, with a focus on French as the target language. The French to English translation direction is also considered, based on the word alignment computed in the other direction. Large language and translation models are built using all the datasets provided by the shared task organisers, as well as the monolingual data from LDC. To build the translation models, we apply a two-step data selection method based on bilingual cross-entropy difference and vocabulary saturation, considering each parallel corpus individually. Synthetic translation rules are extracted from the development sets and used to train another translation model. We then interpolate the translation models, minimising the perplexity on the development sets, to obtain our final SMT system. Our submission for the English to French translation task was ranked second amongst nine teams and a total of twenty submissions.
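
The two-step selection can be sketched as follows; the add-one-smoothed unigram language models and the "keep a pair only while it adds enough new vocabulary" criterion are deliberate simplifications of the cross-entropy difference and vocabulary saturation methods named in the abstract.

```python
# Sketch of two-step data selection: (1) rank out-of-domain sentence pairs by
# bilingual cross-entropy difference against in-domain vs. general LMs,
# (2) keep a pair only while it still contributes new vocabulary.

import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram LM, enough to illustrate cross-entropy scoring."""

    def __init__(self, sentences):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 for unseen tokens

    def cross_entropy(self, sentence: str) -> float:
        toks = sentence.split() or [""]
        logp = sum(math.log((self.counts[t] + 1) / (self.total + self.vocab)) for t in toks)
        return -logp / len(toks)


def select(pairs, in_src, in_tgt, gen_src, gen_tgt, min_new_vocab=1):
    lm_in_src, lm_in_tgt = UnigramLM(in_src), UnigramLM(in_tgt)
    lm_gen_src, lm_gen_tgt = UnigramLM(gen_src), UnigramLM(gen_tgt)

    # Step 1: bilingual cross-entropy difference (lower = more in-domain-like).
    def ced(pair):
        s, t = pair
        return ((lm_in_src.cross_entropy(s) - lm_gen_src.cross_entropy(s))
                + (lm_in_tgt.cross_entropy(t) - lm_gen_tgt.cross_entropy(t)))

    ranked = sorted(pairs, key=ced)

    # Step 2: vocabulary saturation -- drop pairs that bring too few new types.
    seen, selected = set(), []
    for s, t in ranked:
        new_types = {tok for tok in (s + " " + t).split() if tok not in seen}
        if len(new_types) >= min_new_vocab:
            selected.append((s, t))
            seen |= new_types
    return selected
```

In the paper, this selection is applied to each parallel corpus individually, and the resulting translation models are then interpolated by minimising perplexity on the development sets.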


Machine Translation | 2015

Quality estimation-guided supplementary data selection for domain adaptation of statistical machine translation

Pratyush Banerjee; Raphael Rubino; Johann Roturier; Josef van Genabith

The problem of domain adaptation in statistical machine translation systems emanates from the fundamental assumption that test and training data are drawn independently from the same distribution (topic, domain, genre, style etc.). In real-life translation tasks, the sparseness of in-domain parallel training data often leads to poor model estimation, and consequently poor translation quality. Domain adaptation by supplementary data selection aims at addressing this specific issue by selecting relevant parallel training data from out-of-domain or general-domain bi-text to enhance the quality of a poor baseline system. State-of-the-art research in data selection focuses on the development of novel similarity measures to improve the relevance of selected data. However, in this paper we approach the problem from a different perspective. In contrast to the conventional approach of using the entire available target-domain data as a reference for supplementary data selection, we restrict the reference set to only those sentences that are expected to be poorly translated by the baseline MT system, using a Quality Estimation model. Our rationale is to focus help (i.e. supplementary training material) where it is needed most. Automatic quality estimation techniques are used to identify such poorly translated sentences in the target domain. The experiments reported in this paper show that (i) this technique provides statistically significant improvements over the unadapted baseline translation and (ii) using significantly smaller amounts of supplementary data, our approach achieves results comparable to state-of-the-art approaches using conventional reference sets.
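
The core idea, restricting the data-selection reference set to sentences the baseline is expected to translate poorly, can be sketched as follows. The feature set, the random-forest regressor and the quality threshold below are placeholder assumptions; a real quality estimation model would use far richer features than sentence lengths.

```python
# Sketch of building a quality-estimation-guided reference set: keep only the
# target-domain sentences whose baseline translations are predicted to be poor.

from sklearn.ensemble import RandomForestRegressor


def simple_qe_features(source: str, mt_output: str) -> list:
    # Toy features: source length, output length, and their ratio.
    ls, lt = len(source.split()), len(mt_output.split())
    return [ls, lt, lt / max(ls, 1)]


def build_reference_set(qe_model, domain_pairs, threshold):
    """Keep only target-domain sentences predicted to be translated poorly."""
    reference = []
    for source, mt_output in domain_pairs:
        predicted_quality = qe_model.predict([simple_qe_features(source, mt_output)])[0]
        if predicted_quality < threshold:  # assuming higher = better quality
            reference.append(source)
    return reference


if __name__ == "__main__":
    # Train a toy QE regressor on a handful of (features, quality score) points.
    train_pairs = [("a b c", "x y z"), ("a b", "x"), ("a b c d", "x y z w")]
    train_scores = [0.8, 0.3, 0.9]
    X = [simple_qe_features(s, t) for s, t in train_pairs]
    qe_model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, train_scores)

    domain_pairs = [("e f g", "u v"), ("e f", "u v w")]
    poorly_translated = build_reference_set(qe_model, domain_pairs, threshold=0.6)
    print(poorly_translated)  # sentences that would guide supplementary data selection
```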


language resources and evaluation | 2017

Crawl and crowd to bring machine translation to under-resourced languages

Antonio Toral; Miquel Esplà-Gomis; Filip Klubička; Nikola Ljubešić; Vassilis Papavassiliou; Prokopis Prokopidis; Raphael Rubino; Andy Way

We present a widely applicable methodology to bring machine translation (MT) to under-resourced languages in a cost-effective and rapid manner. Our proposal relies on web crawling to automatically acquire parallel data to train statistical MT systems if any such data can be found for the language pair and domain of interest. If that is not the case, we resort to (1) crowdsourcing to translate small amounts of text (hundreds of sentences), which are then used to tune statistical MT models, and (2) web crawling of vast amounts of monolingual data (millions of sentences), which are then used to build language models for MT. We apply these to two respective use-cases for Croatian, an under-resourced language that has gained relevance since it recently attained official status in the European Union. The first use-case regards tourism, given the importance of this sector to Croatia’s economy, while the second has to do with tweets, due to the growing importance of social media. For tourism, we crawl parallel data from 20 web domains using two state-of-the-art crawlers and explore how to combine the crawled data with bigger amounts of general-domain data. Our domain-adapted system is evaluated on a set of three additional tourism web domains and it outperforms the baseline in terms of automatic metrics and/or vocabulary coverage. In the social media use-case, we deal with tweets from the 2014 edition of the soccer World Cup. We build domain-adapted systems by (1) translating small amounts of tweets to be used for tuning by means of crowdsourcing and (2) crawling vast amounts of monolingual tweets. These systems outperform the baseline (Microsoft Bing) by 7.94 BLEU points (5.11 TER) for Croatian-to-English and by 2.17 points (1.94 TER) for English-to-Croatian on a test set translated by means of crowdsourcing. A complementary manual analysis sheds further light on these results.


north american chapter of the association for computational linguistics | 2016

Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification

Raphael Rubino; Ekaterina Lapshinova-Koltunski; Josef van Genabith

This paper introduces information density and machine translation quality estimation inspired features to automatically detect and classify human translated texts. We investigate two settings: discriminating between translations and comparable originally authored texts, and distinguishing two levels of translation professionalism. Our framework is based on delexicalised sentence-level dense feature vector representations combined with a supervised machine learning approach. The results show state-of-the-art performance for mixed-domain translationese detection with information density and quality estimation based features, while results on translation expertise classification are mixed.
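
A minimal sketch of the classification setup, assuming a few surface features as crude stand-ins for the information density and quality estimation features described in the paper; the toy sentences, labels and function-word list are invented for illustration.

```python
# Sketch of delexicalised, sentence-level dense features feeding a supervised
# classifier for translationese detection.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

FUNCTION_WORDS = {"the", "a", "of", "and", "to", "in", "that", "is", "it", "for"}


def dense_features(sentence: str) -> list:
    toks = sentence.lower().split()
    n = max(len(toks), 1)
    return [
        len(toks),                                    # sentence length
        len(set(toks)) / n,                           # type/token ratio
        sum(len(t) for t in toks) / n,                # mean word length
        sum(t in FUNCTION_WORDS for t in toks) / n,   # function-word ratio
    ]


if __name__ == "__main__":
    # Toy corpus: 1 = translated text, 0 = originally authored text.
    sentences = [
        "the report that was presented is of a very big importance",
        "it is in the interest of all of the members to agree",
        "we nailed the deadline and shipped a great product",
        "she grabbed coffee before the morning standup",
    ] * 5  # repeat so cross-validation has enough samples
    labels = [1, 1, 0, 0] * 5

    X = [dense_features(s) for s in sentences]
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X, labels, cv=5)
    print("cross-validated accuracy:", scores.mean())
```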


biomedical engineering and informatics | 2011

Audio indexing on a medical video database: The AVISON project

Grégory Senay; Stanislas Oger; Raphael Rubino; Georges Linarès; Thomas Parent

This paper presents an overview of our research conducted in the context of the AVISON project, which aims to develop a platform for indexing surgery videos of the Institute of Research Against Digestive Cancer (IRCAD). The platform is intended to provide friendly query-based access to the video database of the IRCAD institute, which is dedicated to the training of international surgeons. A text-based indexing system is used for querying the videos, where the textual contents are obtained with an automatic speech recognition system. The paper presents the new approaches that we proposed for dealing with these highly specialised data in an automatic manner. We present new approaches for obtaining a low-cost training corpus, for automatically adapting the automatic speech recognition system, for allowing multilingual querying of videos and, finally, for filtering documents that could affect the database quality due to transcription errors.


MT Summit XIV | 2013

Topic models for translation quality estimation for gisting purposes

Raphael Rubino; José G. C. de Souza; Jennifer Foster; Lucia Specia


workshop on statistical machine translation | 2012

DCU-Symantec Submission for the WMT 2012 Quality Estimation Task

Raphael Rubino; Jennifer Foster; Joachim Wagner; Johann Roturier; Rasoul Samad Zadeh Kaljahi; Fred Hollowood

Collaboration


Dive into Raphael Rubino's collaborations.

Top Co-Authors

Andy Way

Dublin City University
