Marie Kopřivová | Researchain

Publication

Featured research published by Marie Kopřivová.

Journal of Linguistics/Jazykovedný časopis | 2017

New Spoken Corpora of Czech: ORTOFON and DIALEKT

Zuzana Komrsková; Marie Kopřivová; David Lukeš; Petra Poukarová; Hana Goláňová

The paper introduces the ORTOFON corpus of spontaneous spoken Czech and the DIALEKT corpus of Czech dialects, describing their design principles and the practical solutions adopted during data collection.

International Conference on Computational and Corpus-Based Phraseology | 2017

Eye of a Needle in a Haystack

Milena Hnátková; Tomáš Jelínek; Marie Kopřivová; Vladimír Petkevič; Alexandr Rosen; Hana Skoumalová; Pavel Vondřička

We propose a multidimensional taxonomy of multiword expressions (MWEs) as a pattern applicable to entries in a representative lexicon of Czech MWEs. The taxonomy and the lexicon are useful for lexicography, for teaching Czech as a foreign language, and for theoretical research on MWEs as entities standing between lexicon and grammar, as well as for NLP tasks such as tagging and parsing, identification and search of MWEs, and word sense and semantic disambiguation. In addition to describing various types of idiomaticity, the taxonomy and the lexicon are designed to account for flexibility in morphology and word order, syntactic and lexical variants, and even creatively used fragments.

International Conference on Computational and Corpus-Based Phraseology | 2017

Contribution Towards a Corpus-Based Phraseology Minimum

Marie Kopřivová

This paper attempts to compile a list of the most commonly used (most typical) Czech idioms using corpus data with annotated collocations. Collocations are annotated in corpora of contemporary written Czech as well as in a corpus of spoken Czech containing transcripts of intimate conversations. Idioms are selected based on their frequency in different text types (newspapers and magazines, non-fiction, fiction, spoken language), and the resulting list includes only idioms that occur in at least two different text types. Each text type is briefly characterized by the types of idioms typical of it (according to formal criteria). The study confirms a substantial divide between idiom use in written and spoken language. A smaller difference can be observed between fiction on the one hand and non-fiction and newspapers on the other. This is mainly due to the interactive nature of fiction, which leads it to contain idioms with verbal components; these are used much as in spoken language, in interactions among the individual characters. By contrast, non-fiction and journalistic language tend to be more descriptive and contain more nominal idioms.
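The selection criterion described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's actual procedure: the frequency table, the idiom labels, and the parameters `top_n` and `min_types` are hypothetical. In particular, taking the `top_n` most frequent idioms within each text type is an assumed cut-off, since the abstract does not say how frequent an idiom must be within a text type to count.

```python
from collections import Counter

# Hypothetical toy data: idiom label -> frequency within each text type.
# Real counts would come from the collocation annotation in the corpora.
FREQS = {
    "newspapers": Counter({"idiom_a": 40, "idiom_b": 12, "idiom_c": 3}),
    "non-fiction": Counter({"idiom_a": 25, "idiom_d": 9}),
    "fiction": Counter({"idiom_b": 30, "idiom_e": 14, "idiom_a": 2}),
    "spoken": Counter({"idiom_e": 50, "idiom_f": 20}),
}


def phraseology_minimum(freqs, top_n=2, min_types=2):
    """Return idioms ranking among the top_n most frequent in at least min_types text types.

    top_n is an assumed per-type frequency cut-off (hypothetical parameter);
    min_types=2 mirrors the "at least two different text types" criterion.
    """
    presence = Counter()  # idiom -> number of text types where it made the cut
    for counts in freqs.values():
        for idiom, _ in counts.most_common(top_n):
            presence[idiom] += 1
    # Keep idioms attested across enough text types, in a stable sorted order.
    return sorted(idiom for idiom, n in presence.items() if n >= min_types)
```

With the toy data, `phraseology_minimum(FREQS)` keeps `idiom_a`, `idiom_b`, and `idiom_e`, each of which is among the two most frequent items in exactly two text types, while `idiom_d` and `idiom_f` are dropped because they make the cut in only one.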

Text, Speech, and Dialogue | 2015

Experimental Tagging of the ORAL Series Corpora: Insights on Using a Stochastic Tagger

David Lukeš; Petra Klimešová; Zuzana Komrsková; Marie Kopřivová

The ORAL series corpora of spontaneous spoken Czech currently contain neither lemmatization nor part-of-speech tagging. The main reason is that readily available NLP tools, designed primarily with written texts in mind, underperform when applied directly to speech transcripts, owing to various morphological and syntactic specificities of informal spoken language and the ways these are captured in transcription. The recently released MorphoDiTa toolchain, a highly optimized open-source tool for training and applying stochastic tagging models, makes it easy and fast to experiment with incremental changes in the training procedure. The article discusses the modifications to the morphological dictionary and training data used by the models that are necessary to improve their performance on the ORAL series corpora, as well as challenges that remain to be solved.

Language Resources and Evaluation | 2014

Mapping Diatopic and Diachronic Variation in Spoken Czech: The ORTOFON and DIALEKT Corpora

Marie Kopřivová; Hana Goláňová; Petra Klimešová; David Lukeš

Časopis pro moderní filologii (Journal for Modern Philology) | 2018

Variabilita českých frazémů v úzu (The Variability of Czech Idioms in Usage)

Tomáš Jelínek; Marie Kopřivová; Vladimír Petkevič; Hana Skoumalová

Corpus Pragmatics | 2017

Between Syntax and Pragmatics: The Causal Conjunction Protože in Spoken and Written Czech

Anna Čermáková; Zuzana Komrsková; Marie Kopřivová; Petra Poukarová

Časopis pro moderní filologii (Journal for Modern Philology) | 2015

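Slovo to v mluvených korpusech ČNK, jeho prefixace a reduplikace (The Word "to" in the Spoken Corpora of the Czech National Corpus: Its Prefixation and Reduplication)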

Petra Klimešová; Zuzana Komrsková; Marie Kopřivová; David Lukeš

Archive | 2008

ORAL2008: Balanced corpus of informal spoken Czech

Martina Waclawičová; Marie Kopřivová; Michal Křen; Lucie Válková

Archive | 2006

SYN2006PUB: corpus of Czech newspapers

František Čermák; Jaroslava Hlaváčová; Milena Hnátková; Tomáš Jelínek; Jan Kocek; Marie Kopřivová; Michal Křen; Renata Novotná; Vladimír Petkevič; Věra Schmiedtová; Hana Skoumalová; Johanka Spoustová; Michal Šulc; Zdeněk Velíšek
