Magda Ševčíková | Researchain

Publication

Featured research published by Magda Ševčíková.

Text, Speech and Dialogue | 2007

Named entities in Czech: annotating data and developing NE tagger

Magda Ševčíková; Zdeněk Žabokrtský; Oldřich Krůza

This paper deals with the treatment of Named Entities (NEs) in Czech. We introduce a two-level NE classification. We have used this classification for manual annotation of two thousand sentences, yielding more than 11,000 NE instances. Employing the annotated data and machine-learning techniques (namely the top-down induction of decision trees), we have developed and evaluated a software system aimed at automatic detection and classification of NEs in Czech texts.

The Prague Bulletin of Mathematical Linguistics | 2011

Specificity of the number of nouns in Czech and its annotation in Prague Dependency Treebank

Magda Ševčíková; Jarmila Panevová; Lenka Smejkalová

The paper focuses on how the grammatical category of number of nouns will be annotated in the forthcoming version of Prague Dependency Treebank (PDT 3.0), concentrating on the peculiarities beyond the regular opposition of singular and plural. A new semantic feature closely related to the category of number (so-called pair/group meaning) was introduced. Nouns such as ruce ‘hands’ or klíče ‘keys’ refer with their plural forms to a pair or to a typical group even more often than to a larger amount of single entities. Since pairs or groups can be referred to with most Czech concrete nouns, the pair/group meaning is considered a grammaticalized meaning of nouns in Czech. In the present paper, manual annotation of the pair/group meaning is described, which was carried out on the data of Prague Dependency Treebank. A comparison with a sample annotation of data from Prague Dependency Treebank of Spoken Czech has demonstrated that the pair/group meaning is both more frequent and more easily distinguishable in the spoken than in the written data.

The Prague Bulletin of Mathematical Linguistics | 2018

Modelling Morphographemic Alternations in Derivation of Czech

Magda Ševčíková

The present paper deals with morphographemic alternations in Czech derivation with regard to the build-up of a large-coverage lexical resource specialized in derivational morphology of contemporary Czech (DeriNet database). After a summary of available descriptions in the Czech linguistic literature and Natural Language Processing, an extensive list of alternations is provided in the first part of the paper with a focus on their manifestation in writing. Due to the significant frequency and limited predictability of alternations in Czech derivation, several bottom-up methods were used in order to adequately model the alternations in DeriNet. Suffix-substitution rules proved to be efficient for alternations in the final position of the stem, whereas a specialized approach of extracting alternations from inflectional paradigms was used for modelling alternations within the roots. Alternations connected with derivation of verbs were handled as a separate task. DeriNet data are expected to be helpful in developing a tool for morphemic segmentation and, once the segmentation is available, to become a reliable resource for data-based description of word formation including alternations in Czech.
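The idea of a suffix-substitution rule for stem-final alternations can be illustrated with a minimal sketch. The rule list, the function names `derive_candidates` and `links_to`, and the example word pairs are assumptions made for illustration; they are not taken from DeriNet or from the paper's actual rule set.

```python
# Hypothetical suffix-substitution rules: (base ending, derived ending).
# Each rule folds a stem-final consonant alternation into the suffix swap.
RULES = [
    ("ka", "ční"),   # k -> č, e.g. ruka 'hand' -> ruční 'manual'
    ("ha", "žský"),  # h -> ž, e.g. Praha 'Prague' -> pražský 'of Prague'
]


def derive_candidates(base):
    """Return candidate derivatives obtained by replacing a matching ending."""
    candidates = []
    for old, new in RULES:
        if base.endswith(old):
            candidates.append(base[: -len(old)] + new)
    return candidates


def links_to(base, derived):
    """Check (case-insensitively) whether some rule maps base to derived."""
    return derived.lower() in (c.lower() for c in derive_candidates(base))
```

Under these assumptions, `links_to("Praha", "pražský")` holds, while an unrelated pair such as `ruka` and `rukáv` is not linked. Alternations inside the root would need a different mechanism, as the abstract notes.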

Archive | 2017

Czech Named Entity Corpus

Jana Straková; Milan Straka; Magda Ševčíková; Zdeněk Žabokrtský

We present a corpus of Czech sentences with manually annotated named entities, in which a rich two-level hierarchy of named entity types was used. The corpus was the first available large Czech named entity resource and since 2007, it has stimulated the research in this field for Czech. We describe the two-level fine-grained hierarchy allowing embedded entities and the motivations leading to its design. We further discuss the data selection and the annotation process. We then show how the data can be used for training a named entity recognizer and we perform a number of experiments to critically evaluate the impact of the decisions made in the process of annotation on the named entity recognizer performance. We thoroughly discuss the effect of sentence selection, corpus size, part-of-speech tagging and lemmatization, representativeness and bias of the named entity distribution, classification granularity and other corpus properties in terms of supervised machine learning.

Systems and Frameworks for Computational Morphology | 2015

Morphology Within the Multi-layered Annotation Scenario of the Prague Dependency Treebank

Magda Ševčíková

Morphological annotation constitutes a separate layer in the multi-layered annotation scenario of the Prague Dependency Treebank. At this layer, morphological categories expressed by a word form are captured in a positional part-of-speech tag. According to the Praguian approach based on the relation between form and function, functions (meanings) of morphological categories are represented as well, namely as grammateme attributes at the deep-syntactic (tectogrammatical) layer of the treebank.

Text, Speech and Dialogue | 2012

Sentence Modality Assignment in the Prague Dependency Treebank

Magda Ševčíková; Jiří Mírovský

The paper focuses on the annotation of sentence modality in the Prague Dependency Treebank (PDT). Sentence modality (as the contrast between declarative, imperative, interrogative etc. sentences) is expressed by a combination of several means in Czech, of which the category of verbal mood and the final punctuation of the sentence are the most important. In PDT 2.0, sentence modality was assigned semi-automatically to the root node of each sentence (tree) and further to the roots of parenthesis and direct speech subtrees. As this approach was too simple to adequately represent the linguistic phenomenon in question, the method for assigning the sentence modality has been revised and elaborated for the forthcoming version of the treebank (PDT 3.0).
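A simple rule of the kind the abstract describes as too coarse can be sketched as follows, combining only verbal mood and final punctuation. The function name `guess_modality`, the mood labels, and the order of the checks are assumptions for illustration; this is not the PDT 2.0 or PDT 3.0 procedure.

```python
def guess_modality(verb_mood, sentence):
    """Guess sentence modality from the main verb's mood and final punctuation.

    verb_mood is a hypothetical label: "indicative", "imperative" or
    "conditional". The return value is one of "interrogative",
    "imperative" or "declarative".
    """
    text = sentence.rstrip()
    final = text[-1] if text else ""
    if final == "?":
        # A final question mark overrides the mood in this toy rule.
        return "interrogative"
    if verb_mood == "imperative":
        return "imperative"
    # Everything else falls back to declarative.
    return "declarative"
```

For instance, `guess_modality("imperative", "Přijď zítra!")` gives `"imperative"`. A rule this small ignores embedded direct speech and parentheses, which is one reason a revised method would be needed.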

Archive | 2011

Prague Dependency Treebank 2.5

Eduard Bejček; Jan Hajič; Jarmila Panevová; Jiří Mírovský; Johanka Spoustová; Jan Štěpánek; Pavel Straňák; Pavel Šidák; Pavlína Vimmrová; Eva Šťastná; Magda Ševčíková; Lenka Smejkalová; Petr Homola; Jan Popelka; Markéta Lopatková; Lucie Hrabalová; Natalia Klyueva; Zdeněk Žabokrtský

Language Resources and Evaluation | 2014