Marie Mikulová
Charles University in Prague
Publications
Featured research published by Marie Mikulová.
Spoken Language Technology Workshop | 2008
Jan Hajič; Silvie Cinková; Marie Mikulová; Petr Pajas; Jan Ptáček; Josef Toman; Zdeňka Urešová
We present a description of a new resource (the Prague Dependency Treebank of Spoken Language) being created for English and Czech for speech understanding, broad natural language analysis for dialog systems and other speech-related tasks, including speech editing. The resources we have created so far contain audio and a standard transcription of spontaneous speech, and as a novel layer, we add an edited (“reconstructed”) version of the spoken utterances. These edits go beyond the scope of current speech reconstruction efforts: in addition to the usual deletions of speech artifacts, fillers, etc., we also allow word modifications, insertions and word order changes. We have used both monologue and dialogue recordings in English and Czech to verify the feasibility of such transcription. We have also assessed the quality of the resulting annotation, since the relative freedom of the editing raises the question of what a “correct” annotation is.
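The edit operations listed in this abstract (deletions, word modifications, insertions and word order changes) can be illustrated with a token-level alignment between the transcript and its reconstructed version. The Python sketch below is only an illustration under assumed conventions: `Link`, `classify_link` and `word_order_changed` are hypothetical names, the alignment layout is assumed rather than taken from the treebank, and the English sentence is a toy example, not treebank data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical alignment link between transcript and edited tokens.

    Either side may be empty: an empty `edit` side marks a deleted token,
    an empty `orig` side marks an inserted one.
    """
    orig: tuple[int, ...]
    edit: tuple[int, ...]


def classify_link(link: Link, orig_tokens: list[str], edit_tokens: list[str]) -> str:
    """Label one link with the kind of edit it represents."""
    if not link.edit:
        return "deletion"  # e.g. fillers and other speech artifacts
    if not link.orig:
        return "insertion"  # material added to make the utterance fluent
    before = [orig_tokens[i] for i in link.orig]
    after = [edit_tokens[j] for j in link.edit]
    return "unchanged" if before == after else "modification"


def word_order_changed(links: list[Link]) -> bool:
    """True if linked tokens appear in a different relative order after editing."""
    # Order the links by their first original token, then check that their
    # first edited tokens come out in the same (non-decreasing) order.
    pairs = sorted((min(lk.orig), min(lk.edit)) for lk in links if lk.orig and lk.edit)
    edit_positions = [e for _, e in pairs]
    return edit_positions != sorted(edit_positions)


# Toy example: "uh I goes home yesterday" -> "yesterday I went back home"
orig = ["uh", "I", "goes", "home", "yesterday"]
edited = ["yesterday", "I", "went", "back", "home"]
links = [
    Link((0,), ()),    # "uh" deleted
    Link((1,), (1,)),  # "I" kept
    Link((2,), (2,)),  # "goes" -> "went"
    Link((3,), (4,)),  # "home" kept
    Link((4,), (0,)),  # "yesterday" moved to the front
    Link((), (3,)),    # "back" inserted
]
print([classify_link(lk, orig, edited) for lk in links])
# -> ['deletion', 'unchanged', 'modification', 'unchanged', 'unchanged', 'insertion']
print(word_order_changed(links))  # -> True
```

Reordering is detected across links rather than per link, since a word order change is a property of the relative positions of several tokens.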
Archive | 2017
Jan Hajič; Eva Hajičová; Marie Mikulová; Jiří Mírovský
This chapter provides relatively complete, though very brief, up-to-date information on the annotated corpus of Czech called the Prague Dependency Treebank (PDT). It is the first complex, linguistically motivated treebank based on a dependency syntactic theory; it contains annotation on several layers of sentence structure (Sects. 3, 4 and 5), as well as coreference and basic discourse relations, genre specification and multiword expressions (Sect. 6). Section 7 presents a commented list of the whole PDT-style family of follow-up treebanks developed in Prague, as well as information on treebanks of other languages that use the PDT-style annotation scheme in one way or another. The last section gives a brief description of the data format and the available tools.
Conference on Intelligent Text Processing and Computational Linguistics | 2015
Jan Hajič; Eva Hajičová; Marie Mikulová; Jiří Mírovský; Jarmila Panevová; Daniel Zeman
The aim of the present contribution is to scrutinize the ways in which so-called deletions of elements in the surface shape of the sentence are treated in syntactically annotated corpora, and to attempt a categorization of deletions within a multilevel annotation scheme. In Sect. 1 we explain the motivation for our research into this matter, and in Sect. 2 we briefly review how deletions are treated in some of the advanced annotation schemes for different languages. The core of the paper is Sect. 3, which is devoted to the treatment of deletions and node reconstructions on the two syntactic levels of the annotation scheme of the Prague Dependency Treebank (PDT). After a short account of the aspects of PDT relevant to the issue under discussion (Sect. 3.1) and of the treatment of deletions at the level of the surface structure of sentences (Sect. 3.2), we concentrate on selected types of reconstructions of deleted items on the underlying (tectogrammatical) level of PDT (Sect. 3.3). In Sect. 3.4 we present statistical data that offer stimulating and encouraging ground for further investigation, both for linguistic theory and for annotation practice. The results, the advantages of the approach applied and further perspectives are summarized in Sect. 4.
The Prague Bulletin of Mathematical Linguistics | 2018
Marie Mikulová; Eduard Bejček; Eva Hajičová; Jarmila Panevová
The aim of the contribution is to introduce a database of linguistic forms and their functions, built using the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help linguists study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing. We demonstrate possible ways of exploiting the ForFun database. This article is largely based on a paper presented at the 16th International Workshop on Treebanks and Linguistic Theories in Prague (Bejček et al., 2017).
Text, Speech and Dialogue | 2017
Marie Mikulová; Jiří Mírovský; Anja Nedoluzhko; Petr Pajas; Jan Štěpánek; Jan Hajič
We present a richly annotated spoken language resource, the Prague Dependency Treebank of Spoken Czech 2.0, whose primary purpose is to serve speech-related NLP tasks. The treebank features several novel annotation schemas close to the audio and transcript, while its morphological, syntactic and semantic annotation corresponds to that of the family of Prague Dependency Treebanks; it can thus also be used for linguistic studies, including comparative studies of text and speech. Its most novel feature is the approach to syntactic annotation, which differs from other similar corpora such as Treebank-3 [8] in that it does not attempt to impose syntactic structure on the raw input; instead, it includes an additional layer that edits the literal transcript into fluent Czech while keeping the original transcript explicitly aligned with the edited version. This allows the morphological, syntactic and semantic annotation to be deterministically and fully mapped back to the transcript and audio. It opens new possibilities for modeling morphology, syntax and semantics in spoken language, either on the original transcript with the mapped annotation or on the new layer after (automatic) editing. The corpus is publicly and freely available.
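The explicit alignment is what lets annotation of the edited layer be carried back to the transcript and the audio. Below is a minimal Python sketch under stated assumptions: alignment links are pairs of index tuples (transcript side, edited side), each transcript token has a start and end time in seconds, and labels are plain strings. `project_to_transcript`, `to_audio_spans` and the data layout are hypothetical illustrations, not the actual PDTSC 2.0 file format or tooling; the labels ACT, PRED and DIR3 are PDT-style functors used here only as example strings.

```python
from collections import defaultdict

# Hypothetical alignment format: each link pairs transcript token indices
# with edited-layer token indices (either side may be empty).
Alignment = list[tuple[tuple[int, ...], tuple[int, ...]]]


def project_to_transcript(edit_labels: dict[int, str], links: Alignment) -> dict[int, list[str]]:
    """Follow each link backwards, copying edited-token labels onto transcript tokens."""
    projected: dict[int, list[str]] = defaultdict(list)
    for orig_ids, edit_ids in links:
        for e in edit_ids:
            label = edit_labels.get(e)
            if label is None:
                continue  # this edited token carries no annotation
            for o in orig_ids:
                projected[o].append(label)
    return dict(projected)


def to_audio_spans(projected: dict[int, list[str]],
                   times: list[tuple[float, float]]) -> list[tuple[str, float, float]]:
    """Attach each projected label to the audio interval of its transcript token."""
    spans = []
    for o in sorted(projected):
        start, end = times[o]
        spans.extend((label, start, end) for label in projected[o])
    return spans


# Toy example: transcript "uh I goes home" edited to "I went home".
transcript_times = [(0.0, 0.3), (0.3, 0.5), (0.5, 0.9), (0.9, 1.3)]
links = [((0,), ()), ((1,), (0,)), ((2,), (1,)), ((3,), (2,))]
edit_labels = {0: "ACT", 1: "PRED", 2: "DIR3"}

projected = project_to_transcript(edit_labels, links)
print(projected)  # -> {1: ['ACT'], 2: ['PRED'], 3: ['DIR3']}
print(to_audio_spans(projected, transcript_times))
# -> [('ACT', 0.3, 0.5), ('PRED', 0.5, 0.9), ('DIR3', 0.9, 1.3)]
```

Because every edited token is reached only through explicit links, the projection involves no guessing; a token deleted during editing (the filler "uh" here) simply receives no annotation, and an edited token linked to several transcript tokens passes its label to all of them.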
The Prague Bulletin of Mathematical Linguistics | 2017
Veronika Kolářová; Jan Kolář; Marie Mikulová
The present paper extends the understanding of differences in how actions are expressed by verbal nouns in corpora of written vs. spoken Czech, namely in the Czech part of the Prague Czech-English Dependency Treebank and in the Prague Dependency Treebank of Spoken Czech. We show that while the written corpus includes more complex noun phrases with more explicit expression of adnominal participants, noun phrases in the spoken corpus contain more deletions and more exophoric references. We also carried out a quantitative analysis of the relative frequencies of combinations of participants modifying verbal nouns; although the written corpus shows higher relative frequencies, the ranking of the particular combinations by relative frequency is the same in both types of communication.
Journal of Linguistics/Jazykovedný časopis | 2017
Marie Mikulová; Eduard Bejček; Veronika Kolářová; Jarmila Panevová
We introduce a corpus-based description of selected adverbial meanings in Czech sentences. The basic repertory of these meanings has a long-standing tradition in both scientific and school grammars. Before the corpus era, however, researchers had to rely on their own excerption, whereas today syntactic research has a vast material basis in the form of available electronic corpora. Taking spatial adverbials as a case study, we describe the methodology we used to acquire a detailed, comprehensive and well-arranged description of the meanings of adverbials, including a list of their formal realizations with examples. The theoretical knowledge stemming from this work will lead to an improvement in the annotation of these meanings in the Prague Dependency Treebanks, which serve as the corpus sources for our research. The Prague Dependency Treebanks include data manually annotated on the layer of deep syntax and thus provide a large number of valuable examples on the basis of which the meanings of adverbials can be defined more accurately and subcategorized more precisely. Both the theoretical and the practical results will subsequently be used in NLP applications such as machine translation.
Language Resources and Evaluation | 2012
Jan Hajič; Eva Hajičová; Jarmila Panevová; Petr Sgall; Ondřej Bojar; Silvie Cinková; Eva Fučíková; Marie Mikulová; Petr Pajas; Jan Popelka; Jiří Semecký; Jana Šindlerová; Jan Štěpánek; Josef Toman; Zdeňka Urešová; Zdeněk Žabokrtský
Language Resources and Evaluation | 2010
Marie Mikulová; Jan Štěpánek
Archive | 2009
Marie Mikulová; Jan Štěpánek