Jaroslava Hlaváčová
Charles University in Prague
Publication
Featured research published by Jaroslava Hlaváčová.
Artificial Intelligence in Medicine | 2014
Pavel Pecina; Ondřej Dušek; Lorraine Goeuriot; Jan Hajic; Jaroslava Hlaváčová; Gareth J. F. Jones; Liadh Kelly; Johannes Leveling; David Mareček; Michal Novák; Martin Popel; Rudolf Rosa; Aleš Tamchyna; Zdeňka Urešová
OBJECTIVE: We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve effectiveness of cross-lingual IR.
METHODS AND DATA: Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech-English, German-English, and French-English. MT quality is evaluated on data sets created within the Khresmoi project and IR effectiveness is tested on the CLEF eHealth 2013 data sets.
RESULTS: The search query translation results achieved in our experiments are outstanding - our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech-English, from 23.03 to 40.82 for German-English, and from 32.67 to 40.82 for French-English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French-English. For Czech-English and German-English, the increased MT quality does not lead to better IR results.
CONCLUSIONS: Most of the MT techniques employed in our experiments improve MT of medical search queries. Especially the intelligent training data selection proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with the IR performance - better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies | 2017
Daniel Zeman; Martin Popel; Milan Straka; Jan Hajic; Joakim Nivre; Filip Ginter; Juhani Luotolahti; Sampo Pyysalo; Slav Petrov; Martin Potthast; Francis M. Tyers; Elena Badmaeva; Memduh Gokirmak; Anna Nedoluzhko; Silvie Cinková; Jaroslava Hlaváčová; Václava Kettnerová; Zdeňka Urešová; Jenna Kanerva; Stina Ojala; Anna Missilä; Christopher D. Manning; Sebastian Schuster; Siva Reddy; Dima Taji; Nizar Habash; Herman Leung; Marie-Catherine de Marneffe; Manuela Sanguinetti; Maria Simi
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.
Text, Speech and Dialogue | 2013
Jaroslava Hlaváčová; Anna Nedoluzhko
In Czech and in Russian there is a set of prefixes changing the meaning of imperfective verbs always in the same manner. The change often (in Czech always) demands adding a reflexive morpheme. This feature can be used for automatic recognition of words, without the need to store them in morphological dictionaries.
The Prague Bulletin of Mathematical Linguistics | 2008
Barbora Hladká; Jan Hajic; Jirka Hana; Jaroslava Hlaváčová; Jiří Mírovský; Jan Raab
The Czech Academic Corpus 2.0 Guide. The Czech Academic Corpus version 2.0 is a morphologically and syntactically annotated corpus of 650,000 words. The Czech Academic Corpus (CAC) was created by a team from the Institute of the Czech Language of the Academy of Sciences of the Czech Republic from 1971 to 1985. When the CAC project began, there were only two computerized annotated corpora available since the 1960s - the Brown Corpus of American English and the LOB Corpus of British English. Both corpora became well known to corpus linguists, whereas the CAC remained hidden mainly because of the 1980s political regime in the Czech Republic. The idea of transferring the internal format and annotation scheme of the CAC into the Prague Dependency Treebank (PDT) concept emerged during the work on the PDT's second version. The main goal was to make the CAC and the PDT fully compatible and thus enable the integration of the CAC into the PDT. The currently released second version of the CAC presents the complete conversion of the internal format and morphological and syntactical annotation schemes. The Czech Academic Corpus v. 2.0 is being published by the Linguistic Data Consortium.
The Prague Bulletin of Mathematical Linguistics | 2014
Jaroslava Hlaváčová; Anna Nedoluzhko
The paper discusses a set of verbal prefixes which, when added to a verb together with a reflexive morpheme, change the verb’s meaning always in the same manner. The prefixes form a sequence according to the degree of intensity with which they modify the verbal action. We present the process of verb intensification in three Slavic languages, namely Czech, Slovak and Russian.
Text, Speech and Dialogue | 2018
Jaroslava Hlaváčová
The paper presents an analysis of Czech verbal prefixes, the first step of a project whose ultimate goal is an automatic morphemic analysis of Czech. We studied prefixes that may occur in Czech verbs, especially their possible and impossible combinations. We describe a procedure for prefix recognition and derive several general rules for selecting the correct result. The analysis of “double” prefixes makes it possible to draw conclusions about the universality of the first prefix. We also added linguistic comments on several types of prefixes.
Text, Speech and Dialogue | 2016
Jaroslava Hlaváčová
We focus on the problem of homonymy and polysemy in morphological dictionaries, using the Czech morphological dictionary MorfFlex CZ [2] as an example. It is not necessary to distinguish meanings in morphological dictionaries unless the distinction has consequences in word formation or syntax. The contribution proposes several important rules and principles for achieving consistency.
Text, Speech and Dialogue | 2011
Jaroslava Hlaváčová; Michal Hrušecký
The paper deals with automatic methods for prefix extraction and their comparison. We present experiments with Czech and English and compare the results with regard to the size and type (wordforms vs. lemmas) of input data.
Text, Speech and Dialogue | 2001
Jaroslava Hlaváčová
Language Resources and Evaluation | 2006
Jaroslava Hlaváčová