Jaroslava Hlaváčová

Publication

Featured research published by Jaroslava Hlaváčová.

Artificial Intelligence in Medicine | 2014

Adaptation of machine translation for multilingual information retrieval in the medical domain

Pavel Pecina; Ondřej Dušek; Lorraine Goeuriot; Jan Hajic; Jaroslava Hlaváčová; Gareth J. F. Jones; Liadh Kelly; Johannes Leveling; David Mareček; Michal Novák; Martin Popel; Rudolf Rosa; Aleš Tamchyna; Zdeňka Urešová

OBJECTIVE: We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve the effectiveness of cross-lingual IR.

METHODS AND DATA: Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech-English, German-English, and French-English. MT quality is evaluated on data sets created within the Khresmoi project, and IR effectiveness is tested on the CLEF eHealth 2013 data sets.

RESULTS: The search query translation results achieved in our experiments are outstanding: our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in a direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech-English, from 23.03 to 40.82 for German-English, and from 32.67 to 40.82 for French-English. This is a 55% improvement on average. In terms of IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French-English. For Czech-English and German-English, the increased MT quality does not lead to better IR results.

CONCLUSIONS: Most of the MT techniques employed in our experiments improve MT of medical search queries. Intelligent training data selection in particular proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with IR performance: better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.

Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies | 2017

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Daniel Zeman; Martin Popel; Milan Straka; Jan Hajic; Joakim Nivre; Filip Ginter; Juhani Luotolahti; Sampo Pyysalo; Slav Petrov; Martin Potthast; Francis M. Tyers; Elena Badmaeva; Memduh Gokirmak; Anna Nedoluzhko; Silvie Cinková; Jaroslava Hlaváčová; Václava Kettnerová; Zdeňka Urešová; Jenna Kanerva; Stina Ojala; Anna Missilä; Christopher D. Manning; Sebastian Schuster; Siva Reddy; Dima Taji; Nizar Habash; Herman Leung; Marie-Catherine de Marneffe; Manuela Sanguinetti; Maria Simi

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

Text, Speech and Dialogue | 2013

Intensifying Verb Prefix Patterns in Czech and Russian

Jaroslava Hlaváčová; Anna Nedoluzhko

In both Czech and Russian, there is a set of prefixes that always change the meaning of imperfective verbs in the same manner. The change often (in Czech always) requires adding a reflexive morpheme. This feature can be used for automatic recognition of words, without the need to store them in morphological dictionaries.
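The recognition idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prefix list, the tiny verb lexicon, and the function name `recognize` are hypothetical, and the reflexive morpheme is assumed to directly follow the verb, which simplifies real Czech word order.

```python
# Illustrative assumptions only: this prefix list and lexicon are toy data,
# not taken from the paper.
INTENSIFYING_PREFIXES = ("na", "u", "vy")
KNOWN_IMPERFECTIVES = {"běhat", "chodit", "čekat"}
REFLEXIVE = "se"


def recognize(tokens):
    """Return (prefix, base_verb) if the token pair looks like
    prefix + known imperfective verb followed by the reflexive morpheme,
    otherwise return None."""
    if len(tokens) != 2 or tokens[1] != REFLEXIVE:
        return None
    word = tokens[0]
    for prefix in INTENSIFYING_PREFIXES:
        base = word[len(prefix):]
        # The prefixed form itself need not be stored anywhere;
        # only the base verb has to be in the lexicon.
        if word.startswith(prefix) and base in KNOWN_IMPERFECTIVES:
            return prefix, base
    return None


print(recognize(["naběhat", "se"]))
```

The point of the sketch is that only base verbs are stored; a prefixed reflexive form is analyzed on the fly instead of being listed in the dictionary.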

The Prague Bulletin of Mathematical Linguistics | 2008

The Czech Academic Corpus 2.0 Guide

Barbora Hladká; Jan Hajic; Jirka Hana; Jaroslava Hlaváčová; Jiří Mírovský; Jan Raab

The Czech Academic Corpus version 2.0 is a morphologically and syntactically annotated corpus of 650,000 words. The Czech Academic Corpus (CAC) was created by a team from the Institute of the Czech Language of the Academy of Sciences of the Czech Republic from 1971 to 1985. When the CAC project began, only two computerized annotated corpora had been available since the 1960s: the Brown Corpus of American English and the LOB Corpus of British English. Both corpora became well known to corpus linguists, whereas the CAC remained hidden, mainly because of the 1980s political regime in the Czech Republic. The idea of transferring the internal format and annotation scheme of the CAC into the Prague Dependency Treebank (PDT) concept emerged during the work on the PDT's second version. The main goal was to make the CAC and the PDT fully compatible and thus enable the integration of the CAC into the PDT. The currently released second version of the CAC presents the complete conversion of the internal format and of the morphological and syntactic annotation schemes. The Czech Academic Corpus v. 2.0 is being published by the Linguistic Data Consortium.

The Prague Bulletin of Mathematical Linguistics | 2014

Productive verb prefixation patterns

Jaroslava Hlaváčová; Anna Nedoluzhko

The paper discusses a set of verbal prefixes which, when added to a verb together with a reflexive morpheme, always change the verb’s meaning in the same manner. The prefixes form a sequence according to the degree of intensity with which they modify the verbal action. We present the process of verb intensification in three Slavic languages, namely Czech, Slovak and Russian.

Text, Speech and Dialogue | 2018

Prefixal Morphemes of Czech Verbs

Jaroslava Hlaváčová

The paper presents an analysis of Czech verbal prefixes, which is the first step of a project whose ultimate goal is an automatic morphemic analysis of Czech. We studied prefixes that may occur in Czech verbs, especially their possible and impossible combinations. We describe a procedure for prefix recognition and derive several general rules for selecting the correct result. The analysis of “double” prefixes allows us to draw conclusions about the universality of the first prefix. We also added linguistic comments on several types of prefixes.

Text, Speech and Dialogue | 2016

Homonymy and Polysemy in the Czech Morphological Dictionary

Jaroslava Hlaváčová

We focus on the problem of homonymy and polysemy in morphological dictionaries, using the example of the Czech morphological dictionary MorfFlex CZ [2]. It is not necessary to distinguish meanings in morphological dictionaries unless the distinction has consequences for word formation or syntax. The contribution proposes several important rules and principles for achieving consistency.

Text, Speech and Dialogue | 2011

Prefix recognition experiments

Jaroslava Hlaváčová; Michal Hrušecký

The paper deals with automatic methods for prefix extraction and their comparison. We present experiments with Czech and English and compare the results with regard to the size and type (wordforms vs. lemmas) of input data.
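The abstract does not say which extraction methods were compared. As a rough illustration of what automatic prefix extraction from a word list can look like, here is a minimal sketch of one simple heuristic: an initial segment counts as a prefix candidate whenever stripping it leaves another word from the same list. The function name `candidate_prefixes`, its parameters, and the sample word list are hypothetical and not taken from the paper.

```python
from collections import Counter


def candidate_prefixes(words, max_len=4, min_count=2):
    """Count initial segments whose removal yields another word from the
    same list, and return those seen at least min_count times.

    This is a generic heuristic chosen for illustration, not one of the
    methods evaluated in the paper."""
    vocab = set(words)
    counts = Counter()
    for word in vocab:
        # Keep at least one character for the remainder.
        for i in range(1, min(max_len, len(word) - 1) + 1):
            if word[i:] in vocab:
                counts[word[:i]] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


# Hypothetical toy input; real experiments would use corpus wordforms or lemmas.
sample = ["do", "undo", "redo", "tie", "untie", "retie", "lock", "unlock"]
print(candidate_prefixes(sample))
```

Running the same function once on wordforms and once on lemmas is one way to observe how the type and size of the input affect the candidates, which is the kind of comparison the abstract describes.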

Text, Speech and Dialogue | 2001