Magda Ševčíková
Charles University in Prague
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Magda Ševčíková.
text speech and dialogue | 2007
Magda Ševčíková; Zdeněk Žabokrtsky; Oldřich Krůza
This paper deals with the treatment of Named Entities (NEs) in Czech. We introduce a two-level NE classification. We have used this classification for manual annotation of two thousand sentences, gaining more than 11,000 NE instances. Employing the annotated data and Machine-Learning techniques (namely the top-down induction of decision trees), we have developed and evaluated a software system aimed at automatic detection and classification of NEs in Czech texts.
The Prague Bulletin of Mathematical Linguistics | 2011
Magda Ševčíková; Jarmila Panevová; Lenka Smejkalová
Specificity of the number of nouns in Czech and its annotation in Prague Dependency Treebank The paper focuses on the way how the grammatical category of number of nouns will be annotated in the forthcoming version of Prague Dependency Treebank (PDT 3.0), concentrating on the peculiarities beyond the regular opposition of singular and plural. A new semantic feature closely related to the category of number (so-called pair/group meaning) was introduced. Nouns such as ruce ‘hands’ or klíče ‘keys’ refer with their plural forms to a pair or to a typical group even more often than to a larger amount of single entities. Since pairs or groups can be referred to with most Czech concrete nouns, the pair/group meaning is considered as a grammaticalized meaning of nouns in Czech. In the present paper, manual annotation of the pair/group meaning is described, which was carried out on the data of Prague Dependency Treebank. A comparison with a sample annotation of data from Prague Dependency Treebank of Spoken Czech has demonstrated that the pair/group meaning is both more frequent and more easily distinguishable in the spoken than in the written data.
The Prague Bulletin of Mathematical Linguistics | 2018
Magda Ševčíková
Abstract The present paper deals with morphographemic alternations in Czech derivation with regard to the build-up of a large-coverage lexical resource specialized in derivational morphology of contemporary Czech (DeriNet database). After a summary of available descriptions in the Czech linguistic literature and Natural Language Processing, an extensive list of alternations is provided in the first part of the paper with a focus on their manifestation in writing. Due to the significant frequency and limited predictability of alternations in Czech derivation, several bottom-up methods were used in order to adequately model the alternations in DeriNet. Suffix-substitution rules proved to be efficient for alternations in the final position of the stem, whereas a specialized approach of extracting alternations from inflectional paradigms was used for modelling alternations within the roots. Alternations connected with derivation of verbs were handled as a separate task. DeriNet data are expected to be helpful in developing a tool for morphemic segmentation and, once the segmentation is available, to become a reliable resource for data-based description of word formation including alternations in Czech.
Archive | 2017
Jana Straková; Milan Straka; Magda Ševčíková; Zdeněk Žabokrtský
We present a corpus of Czech sentences with manually annotated named entities, in which a rich two-level hierarchy of named entity types was used. The corpus was the first available large Czech named entity resource and since 2007, it has stimulated the research in this field for Czech. We describe the two-level fine-grained hierarchy allowing embedded entities and the motivations leading to its design. We further discuss the data selection and the annotation process. We then show how the data can be used for training a named entity recognizer and we perform a number of experiments to critically evaluate the impact of the decisions made in the process of annotation on the named entity recognizer performance. We thoroughly discuss the effect of sentence selection, corpus size, part-of-speech tagging and lemmatization, representativeness and bias of the named entity distribution, classification granularity and other corpus properties in terms of supervised machine learning.
systems and frameworks for computational morphology | 2015
Magda Ševčíková
Morphological annotation constitutes a separate layer in the multi-layered annotation scenario of the Prague Dependency Treebank. At this layer, morphological categories expressed by a word form are captured in a positional part-of-speech tag. According to the Praguian approach based on the relation between form and function, functions (meanings) of morphological categories are represented as well, namely as grammateme attributes at the deep-syntactic (tectogrammatical) layer of the treebank.
text speech and dialogue | 2012
Magda Ševčíková; Jiří Mírovský
The paper focuses on the annotation of sentence modality in the Prague Dependency Treebank (PDT). Sentence modality (as the contrast between declarative, imperative, interrogative etc. sentences) is expressed by a combination of several means in Czech, from which the category of verbal mood and the final punctuation of the sentence are the most important ones. In PDT 2.0, sentence modality was assigned semi-automatically to the root node of each sentence (tree) and further to the roots of parenthesis and direct speech subtrees. As this approach was too simple to adequately represent the linguistic phenomenon in question, the method for assigning the sentence modality has been revised and elaborated for the forthcoming version of the treebank (PDT 3.0).
Archive | 2011
Eduard Bejček; Jan Hajic; Jarmila Panevová; Jiří Mírovský; Johanka Spoustová; Jan Štěpánek; Pavel Straňák; Pavel Šidák; Pavlína Vimmrová; Eva Šťastná; Magda Ševčíková; Lenka Smejkalová; Petr Homola; Jan Popelka; Markéta Lopatková; Lucie Hrabalová; Natalia Klyueva; Zdeněk Žabokrtský
language resources and evaluation | 2014
Magda Ševčíková; Zdenek Zabokrtsky
language resources and evaluation | 2016
Magda Ševčíková; Milan Straka; Jonáš Vidra; Adéla Limburská
language resources and evaluation | 2010
Jarmila Panevová; Magda Ševčíková