Zdeňka Urešová
Charles University in Prague
Publication
Featured research published by Zdeňka Urešová.
Artificial Intelligence in Medicine | 2014
Pavel Pecina; Ondřej Dušek; Lorraine Goeuriot; Jan Hajič; Jaroslava Hlaváčová; Gareth J. F. Jones; Liadh Kelly; Johannes Leveling; David Mareček; Michal Novák; Martin Popel; Rudolf Rosa; Aleš Tamchyna; Zdeňka Urešová
OBJECTIVE: We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve effectiveness of cross-lingual IR.
METHODS AND DATA: Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech-English, German-English, and French-English. MT quality is evaluated on data sets created within the Khresmoi project and IR effectiveness is tested on the CLEF eHealth 2013 data sets.
RESULTS: The search query translation results achieved in our experiments are outstanding - our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech-English, from 23.03 to 40.82 for German-English, and from 32.67 to 40.82 for French-English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French-English. For Czech-English and German-English, the increased MT quality does not lead to better IR results.
CONCLUSIONS: Most of the MT techniques employed in our experiments improve MT of medical search queries. Especially the intelligent training data selection proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with the IR performance - better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.
Workshop on Statistical Machine Translation | 2014
Ondřej Dušek; Jan Hajič; Jaroslava Hlaváčová; Michal Novák; Pavel Pecina; Rudolf Rosa; Aleš Tamchyna; Zdeňka Urešová; Daniel Zeman
This paper presents the participation of the Charles University team in the WMT 2014 Medical Translation Task. Our systems are developed within the Khresmoi project, a large integrated project aiming to deliver a multi-lingual multi-modal search and access system for biomedical information and documents. Being involved in the organization of the Medical Translation Task, our primary goal is to set up a baseline for both its subtasks (summary translation and query translation) and for all translation directions. Our systems are based on the phrase-based Moses system and standard methods for domain adaptation. The constrained/unconstrained systems differ in the training data only.
Linguistic Annotation Workshop | 2015
Zdeňka Urešová; Ondřej Dušek; Eva Fučíková; Jan Hajič; Jana Šindlerová
This paper presents a resource and the associated annotation process used in a project of interlinking Czech and English verbal translational equivalents based on a parallel, richly annotated dependency treebank that also contains valency and semantic roles, namely the Prague Czech-English Dependency Treebank. One of the main aims of this project is to create a high-quality and relatively large empirical base which could be used both for linguistic comparative research and for natural language processing applications, such as machine translation or cross-language sense disambiguation. This paper describes the resulting lexicon, CzEngVallex, and the process of building it, as well as some interesting observations and statistics already obtained.
The Prague Bulletin of Mathematical Linguistics | 2016
Zdeňka Urešová; Eva Fučíková; Jana Šindlerová
This paper introduces a new bilingual Czech-English verbal valency lexicon (called CzEngVallex) representing a relatively large empirical database. It includes 20,835 aligned valency frame pairs (i.e., verb senses which are translations of each other) and their aligned arguments. This new lexicon uses data from the Prague Czech-English Dependency Treebank and also takes advantage of the existing valency lexicons for both languages: the PDT-Vallex for Czech and the EngVallex for English. The CzEngVallex is available for browsing as well as for download in the LINDAT/CLARIN repository. The CzEngVallex is meant to be used not only by traditional linguists, lexicographers, and translators but also by computational linguists, both for the purposes of enriching theoretical linguistic accounts of verbal valency from a cross-linguistic perspective and for an innovative use in various NLP tasks.
Text, Speech and Dialogue | 2012
Václava Kettnerová; Markéta Lopatková; Zdeňka Urešová
Under the term grammaticalized alternations, we understand changes in valency frames of verbs corresponding to different surface syntactic structures of the same lexical unit of a verb. Czech grammaticalized alternations are expressed either (i) by morphological means (diatheses), or (ii) by syntactic means (reciprocity). These changes are limited to changes in morphemic form(s) of valency complementations; moreover, they are regular enough to be captured by formal syntactic rules.
Journal of Linguistics/Jazykovedný časopis | 2017
Zdeňka Urešová; Eva Fučíková; Eva Hajičová
In this paper, we introduce our ongoing project about synonymy in bilingual context. This project aims at exploring semantic ‘equivalence’ of verb senses of generally different verbal lexemes in a bilingual (Czech-English) setting. Specifically, it focuses on their valency behavior within such equivalence groups. We believe that using bilingual context (translation) as an important factor in the delimitation of classes of synonymous lexical units (verbs, in our case) may help to specify the verb senses, including the relation of their (semantic) roles to other verb senses and to the roles of their arguments, more precisely than when using monolingual corpora. In our project, we work “bottom-up”, i.e., from evidence as recorded in our corpora, and not “top-down”, from a predefined set of semantic classes.
Linguistic Annotation Workshop | 2009
Barbora Hladká; Zdeňka Urešová
Corpus annotation plays an important role in linguistic analysis and computational processing of both written and spoken language. Syntactic annotation of spoken texts is clearly becoming a topic of considerable interest, driven by the desire to improve automatic speech recognition systems by incorporating syntax in the language models, or to build language understanding applications. Syntactic annotation of both written and spoken texts in the Czech Academic Corpus was created thirty years ago, when no other (even annotated) corpus of spoken texts existed. We will discuss how relevant and inspiring this annotation is to the current frameworks of spoken text annotation.
Language Resources and Evaluation | 2012
Jan Hajič; Eva Hajičová; Jarmila Panevová; Petr Sgall; Ondřej Bojar; Silvie Cinková; Eva Fučíková; Marie Mikulová; Petr Pajas; Jan Popelka; Jiří Semecký; Jana Šindlerová; Jan Štěpánek; Josef Toman; Zdeňka Urešová; Zdeněk Žabokrtský
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015) | 2015
Ondřej Dušek; Eva Fučíková; Jan Hajič; Martin Popel; Jana Šindlerová; Zdeňka Urešová
International Journal of Lexicography | 2016
Adam Przepiórkowski; Jan Hajič; Elżbieta Hajnicz; Zdeňka Urešová