Jiří Mírovský
Charles University in Prague
Publication
Featured research published by Jiří Mírovský.
Meeting of the Association for Computational Linguistics | 2009
Barbora Hladká; Jiří Mírovský; Pavel Schlesinger
We propose the PlayCoref game, whose purpose is to obtain a substantial amount of text data with coreference annotation. We describe the game design, covering the strategy, the instructions for the players, the selection and preparation of input texts, and the score evaluation.
Linguistic Annotation Workshop | 2009
Barbora Hladká; Jiří Mírovský; Pavel Schlesinger
PlayCoref is a concept of an on-line language game designed to acquire a substantial amount of text data with coreference annotation. We describe in detail various aspects of the game design and discuss features that affect the quality of the annotation.
Linguistic Annotation Workshop | 2009
Anna Nedoluzhko; Jiří Mírovský; Petr Pajas
The present paper outlines an ongoing project annotating extended nominal coreference and bridging anaphora in the Prague Dependency Treebank. We describe the annotation scheme with respect to the linguistic classification of coreferential and bridging relations, and we also cover the technical details of the annotation process. We present methods of helping the annotators: pre-annotation and several useful features implemented in the annotation tool. Our method of measuring inter-annotator agreement focuses on improving the annotation guidelines; we present the results of three successive measurements of the agreement.
Linguistic Annotation Workshop | 2014
Magdaléna Rysová; Jiří Mírovský
The paper introduces a possibility for new research offered by the multi-dimensional annotation of the Prague Dependency Treebank. It focuses on exploiting the annotation of coreference for the annotation of discourse relations expressed by multiword expressions. It tries to find which aspect interlinks these linguistic areas and how this interplay can be used in automatic searching for Czech expressions such as "despite this" (navzdory tomu) and "because of this fact" (díky této skutečnosti), which function as multiword discourse markers.
The Prague Bulletin of Mathematical Linguistics | 2008
Barbora Hladká; Jan Hajič; Jirka Hana; Jaroslava Hlaváčová; Jiří Mírovský; Jan Raab
The Czech Academic Corpus 2.0 Guide
The Czech Academic Corpus version 2.0 is a morphologically and syntactically annotated corpus of 650,000 words. The Czech Academic Corpus (CAC) was created by a team from the Institute of the Czech Language of the Academy of Sciences of the Czech Republic from 1971 to 1985. When the CAC project began, only two computerized annotated corpora were available, both dating from the 1960s: the Brown Corpus of American English and the LOB Corpus of British English. Both corpora became well known to corpus linguists, whereas the CAC remained hidden, mainly because of the 1980s political regime in the Czech Republic. The idea of transferring the internal format and annotation scheme of the CAC into the Prague Dependency Treebank (PDT) concept emerged during the work on the PDT's second version. The main goal was to make the CAC and the PDT fully compatible and thus enable the integration of the CAC into the PDT. The currently released second version of the CAC presents the complete conversion of the internal format and of the morphological and syntactical annotation schemes. The Czech Academic Corpus v. 2.0 is being published by the Linguistic Data Consortium.
The Prague Bulletin of Mathematical Linguistics | 2008
Jiří Mírovský
Netgraph Query Language for the Prague Dependency Treebank 2.0
We study the annotation of the Prague Dependency Treebank 2.0 (PDT 2.0) and assemble a list of requirements for a query language that would allow searching for and studying all linguistic phenomena annotated in the treebank. We propose an extension to the query language of an existing search tool, Netgraph 1.0, and show that the extended query language satisfies the list of requirements. We demonstrate how all principal linguistic phenomena annotated in the treebank can be searched for with the proposed query language and compare the query language to some other treebank search systems. The proposed query language has been implemented in the search tool Netgraph; we discuss features of a search tool that can simplify searching and make it more powerful. We also present a table that shows how much the users of Netgraph use various features of the implemented query language, and we mention several uses of Netgraph for treebanks other than PDT 2.0.
Archive | 2017
Jan Hajič; Eva Hajičová; Marie Mikulová; Jiří Mírovský
This chapter provides relatively complete, though very brief, up-to-date information on the annotated corpus of Czech called the Prague Dependency Treebank (PDT). It is the first complex linguistically motivated treebank based on a dependency syntactic theory, and it contains annotation on several layers of sentence structure (Sects. 3, 4 and 5), coreference and basic discourse relations, genre specification and multiword expressions (Sect. 6). Section 7 presents a commented list of the whole PDT-style family of follow-up treebanks developed in Prague, as well as information on treebanks of other languages that use the PDT-style annotation scheme in one way or another. The last section gives a brief description of the data format and the available tools.
Conference on Intelligent Text Processing and Computational Linguistics | 2016
Vladislav Kuboň; Markéta Lopatková; Jiří Mírovský
This paper gives an overview of the results of an automatic analysis of word order in 23 dependency treebanks. These treebanks have been collected within the HamleDT project, whose main goal is to provide universal annotation for dependency corpora; this also makes it possible to use identical queries for all the corpora. The analysis concentrates on a basic characteristic of word order: the order of the three main constituents, namely the predicate, the subject and the object. A quantitative analysis is performed separately for main clauses and subordinate clauses; the presence of an active verb is also taken into account. We show that in many languages subordinate clauses have a slightly different word order than main clauses, and that the choice of voice also has an impact on word order.
The Prague Bulletin of Mathematical Linguistics | 2016
Jan Hajič; Eva Hajičová; Jiří Mírovský; Jarmila Panevová
A case study based on experience in linguistic investigations using annotated monolingual and multilingual text corpora; the "cases" include descriptions of language phenomena belonging to different layers of the language system: morphology, surface and underlying syntax, and discourse. The analysis is based on the complex annotation of syntax, semantic functions, information structure and discourse relations in the Prague Dependency Treebank, a collection of annotated Czech texts. We want to demonstrate that corpus annotation is not a self-contained goal: in order to be consistent, it should be based on some linguistic theory, and, at the same time, it should serve as a test bed for the given linguistic theory in particular and for linguistic research in general.
Conference on Intelligent Text Processing and Computational Linguistics | 2015
Jan Hajič; Eva Hajičová; Marie Mikulová; Jiří Mírovský; Jarmila Panevová; Daniel Zeman
The aim of the present contribution is to scrutinize the ways in which so-called deletions of elements in the surface shape of the sentence are treated in syntactically annotated corpora, and to attempt a categorization of deletions within a multilevel annotation scheme. We first explain the motivations of our research into this matter (Sect. 1), and in Sect. 2 we briefly survey how deletions are treated in some of the advanced annotation schemes for different languages. The core of the paper is Sect. 3, which is devoted to the treatment of deletions and node reconstructions on the two syntactic levels of the annotation scheme of the Prague Dependency Treebank (PDT). After a short account of the aspects of PDT relevant to the issue under discussion (Sect. 3.1) and of the treatment of deletions at the level of the surface structure of sentences (Sect. 3.2), we concentrate on selected types of reconstructions of the deleted items on the underlying (tectogrammatical) level of PDT (Sect. 3.3). In Sect. 3.4 we present some statistical data that offer a stimulating and encouraging ground for further investigations, both for linguistic theory and for annotation practice. The results and advantages of the approach applied, together with further perspectives, are summarized in Sect. 4.