Rudolf Rosa | Researchain

Publication

Featured research published by Rudolf Rosa.

Artificial Intelligence in Medicine | 2014

Adaptation of machine translation for multilingual information retrieval in the medical domain

Pavel Pecina; Ondřej Dušek; Lorraine Goeuriot; Jan Hajič; Jaroslava Hlaváčová; Gareth J. F. Jones; Liadh Kelly; Johannes Leveling; David Mareček; Michal Novák; Martin Popel; Rudolf Rosa; Aleš Tamchyna; Zdeňka Urešová

OBJECTIVE: We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve the effectiveness of cross-lingual IR. METHODS AND DATA: Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech-English, German-English, and French-English. MT quality is evaluated on data sets created within the Khresmoi project, and IR effectiveness is tested on the CLEF eHealth 2013 data sets. RESULTS: The search query translation results achieved in our experiments are outstanding: our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in a direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech-English, from 23.03 to 40.82 for German-English, and from 32.67 to 40.82 for French-English. This is a 55% improvement on average. In terms of IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French-English. For Czech-English and German-English, the increased MT quality does not lead to better IR results. CONCLUSIONS: Most of the MT techniques employed in our experiments improve MT of medical search queries. Intelligent training data selection in particular proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with IR performance: better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.
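The abstract above names BM25 as the retrieval model. As a rough illustration only, the following sketches the textbook Okapi BM25 scoring function. It is not Lucene's implementation and not the authors' code; the parameter values `k1=1.2` and `b=0.75`, the smoothed IDF variant, the function name `bm25_scores`, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (textbook BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative (an assumption here).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy collection, invented for illustration.
docs = [
    "chest pain treatment options".split(),
    "seasonal flu vaccine schedule".split(),
    "treatment of chronic back pain".split(),
]
scores = bm25_scores("pain treatment".split(), docs)
```

In a cross-lingual setup like the one described, the query passed to such a scorer would be the MT output (or several translation variants merged for query expansion), which is one reason translation quality and retrieval quality need not move together.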

International Joint Conference on Natural Language Processing | 2015

KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer

Rudolf Rosa; Zdeněk Žabokrtský

We present KLcpos3, a language similarity measure based on the Kullback-Leibler divergence of coarse part-of-speech tag trigram distributions in tagged corpora. It has been designed for multilingual delexicalized parsing, both for source treebank selection in single-source parser transfer and for source treebank weighting in multi-source transfer. In the selection task, KLcpos3 identifies the best source treebank in 8 out of 18 cases. In the weighting task, it brings +4.5% UAS absolute compared to unweighted parse tree combination.
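A minimal sketch of the idea behind such a measure, assuming sentences are given as lists of coarse POS tags. The function names, the direction of the divergence (target relative to source), and the handling of unseen source trigrams (treated as seen once, to keep the logarithm finite) are assumptions for this illustration and may differ from the published definition.

```python
import math
from collections import Counter

def trigram_counts(tag_sentences):
    """Count coarse POS tag trigrams over a list of tagged sentences."""
    counts = Counter()
    for tags in tag_sentences:
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    return counts, sum(counts.values())

def kl_cpos3(target_sentences, source_sentences):
    """KL divergence of the target trigram distribution from the source one."""
    tgt_counts, tgt_total = trigram_counts(target_sentences)
    src_counts, src_total = trigram_counts(source_sentences)
    divergence = 0.0
    for trigram, count in tgt_counts.items():
        p_tgt = count / tgt_total
        # Assumption: an unseen source trigram is treated as seen once.
        p_src = max(src_counts[trigram], 1) / src_total
        divergence += p_tgt * math.log(p_tgt / p_src)
    return divergence

# Toy tag sequences, invented for illustration.
target = [["DET", "NOUN", "VERB", "DET", "NOUN"]]
close = [["DET", "NOUN", "VERB", "DET", "NOUN"]] * 9 + [["NOUN", "ADJ", "VERB", "NOUN", "ADJ"]]
far = [["NOUN", "ADJ", "VERB", "NOUN", "ADJ"]] * 10

candidates = {"close": close, "far": far}
best_source = min(candidates, key=lambda name: kl_cpos3(target, candidates[name]))
```

For single-source selection, the candidate with the lowest divergence is picked; for multi-source weighting, some decreasing function of the divergence could serve as the weight of each source.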

Workshop on Statistical Machine Translation | 2014

Machine Translation of Medical Texts in the Khresmoi Project

Ondřej Dušek; Jan Hajič; Jaroslava Hlaváčová; Michal Novák; Pavel Pecina; Rudolf Rosa; Aleš Tamchyna; Zdeňka Urešová; Daniel Zeman

This paper presents the participation of the Charles University team in the WMT 2014 Medical Translation Task. Our systems are developed within the Khresmoi project, a large integrated project aiming to deliver a multi-lingual multi-modal search and access system for biomedical information and documents. Being involved in the organization of the Medical Translation Task, our primary goal is to set up a baseline for both its subtasks (summary translation and query translation) and for all translation directions. Our systems are based on the phrase-based Moses system and standard methods for domain adaptation. The constrained and unconstrained systems differ in the training data only.

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016

Dictionary-based Domain Adaptation of MT Systems without Retraining

Rudolf Rosa; Roman Sudarikov; Michal Novák; Martin Popel; Ondřej Bojar

We describe our submission to the IT-domain translation task of WMT 2016. We perform domain adaptation with dictionary data on already trained MT systems with no further retraining. We apply our approach to two conceptually different systems developed within the QTLeap project, TectoMT and Moses, as well as to Chimera, their combination. In all settings, our method improves the translation quality. Moreover, the basic variant of our approach is applicable to any MT system, including a black-box one.

International Workshop/Conference on Parsing Technologies | 2015

MSTParser Model Interpolation for Multi-Source Delexicalized Transfer

Rudolf Rosa; Zdeněk Žabokrtský

We introduce interpolation of trained MSTParser models as a resource combination method for multi-source delexicalized parser transfer. We present both an unweighted method and a variant in which each source model is weighted by the similarity of the source language to the target language. Evaluation on the HamleDT treebank collection shows that the weighted model interpolation performs comparably to the weighted parse tree combination method, while being computationally much less demanding.
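As a hedged illustration of what interpolating trained linear models can look like, the sketch below treats each model as a mapping from feature names to weights and takes a normalized weighted average. The dictionary representation, the normalization by the sum of the weights, the function name `interpolate_models`, and the feature names are assumptions for this example; they do not reflect the MSTParser model format or the exact procedure in the paper.

```python
from collections import defaultdict

def interpolate_models(models, weights=None):
    """Combine feature-weight dictionaries into one by weighted averaging."""
    if weights is None:
        # Unweighted variant: every source model contributes equally.
        weights = [1.0] * len(models)
    total = sum(weights)
    combined = defaultdict(float)
    for model, weight in zip(models, weights):
        for feature, value in model.items():
            # A feature missing from a model implicitly contributes zero.
            combined[feature] += (weight / total) * value
    return dict(combined)

# Hypothetical delexicalized features from two source-language models.
model_a = {"head=NOUN|child=DET": 1.0, "head=VERB|child=NOUN": 2.0}
model_b = {"head=NOUN|child=DET": 3.0}

unweighted = interpolate_models([model_a, model_b])
# Weights could come from a language similarity measure; 3:1 is made up here.
weighted = interpolate_models([model_a, model_b], weights=[3.0, 1.0])
```

The appeal of this kind of combination is that the target text is parsed once with the single interpolated model, rather than once per source model followed by a tree combination step.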

The Prague Bulletin of Mathematical Linguistics | 2013

MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service

Aleš Tamchyna; Ondřej Dušek; Rudolf Rosa; Pavel Pecina

We present a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre- and post-processing. It is currently used to provide MT between several languages for cross-lingual information retrieval in the EU FP7 Khresmoi project. The software consists of an application server and remote workers which handle text processing and communicate translation requests to MT systems. The communication between the application server and the workers is based on the XML-RPC protocol. We present the overall design of the software and test results which document the speed and scalability of our solution. Our software is licensed under the Apache 2.0 licence and is available for download from the Lindat-Clarin repository and GitHub.

Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) | 2017

Slavic Forest, Norwegian Wood.

Rudolf Rosa; Daniel Zeman; David Mareček

We once had a corp, or should we say, it once had us
They showed us its tags, isn’t it great, unified tags
They asked us to parse and they told us to use everything
So we looked around and we noticed there was near nothing
We took other langs, bitext aligned: words one-to-one
We played for two weeks, and then they said, here is the test
The parser kept training till morning,

Text, Speech and Dialogue | 2012

Dependency Relations Labeller for Czech

Rudolf Rosa; David Mareček

We present a MIRA-based labeller designed to assign dependency relation labels to edges in a dependency parse tree, tuned for Czech language. The labeller was created to be used as a second stage to unlabelled dependency parsers but can also improve output from labelled dependency parsers. We evaluate two existing techniques which can be used for labelling and experiment with combining them together. We describe the feature set used. Our final setup significantly outperforms the best results from the CoNLL 2009 shared task.

Workshop on Statistical Machine Translation | 2012

DEPFIX: A System for Automatic Correction of Czech MT Outputs

Rudolf Rosa; David Mareček; Ondřej Dušek

Archive | 2015

Universal Dependencies 1.0

Joakim Nivre; Željko Agić; Maria Jesus Aranzabe; Masayuki Asahara; Aitziber Atutxa; Miguel Ballesteros; John Bauer; Kepa Bengoetxea; Riyaz Ahmad Bhat; Cristina Bosco; Sam Bowman; Giuseppe G. A. Celano; Miriam Connor; Marie-Catherine de Marneffe; Arantza Diaz de Ilarraza; Kaja Dobrovoljc; Timothy Dozat; Tomaž Erjavec; Richárd Farkas; Jennifer Foster; Daniel Galbraith; Filip Ginter; Iakes Goenaga; Koldo Gojenola; Yoav Goldberg; Berta Gonzales; Bruno Guillaume; Jan Hajic; Dag Haug; Radu Ion
