Alexandr Rosen | Researchain

Publication

Featured research published by Alexandr Rosen.

Language Resources and Evaluation | 2014

Evaluating and automating the annotation of a learner corpus

Alexandr Rosen; Jirka Hana; Barbora Štindlová; Anna Feldman

The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked tiers, designed to handle a wide range of error types present in the input. Each tier corrects different types of errors; links between the tiers allow capturing errors in word order and complex discontinuous expressions. Errors are not only corrected, but also classified. The annotation scheme is tested on a data set including approx. 175,000 words with fair inter-annotator agreement results. We also explore the possibility of applying automated linguistic annotation tools (taggers, spell checkers and grammar checkers) to the learner text to support or even substitute manual annotation.

Language Resources and Evaluation | 2012

Building a learner corpus

Jirka Hana; Alexandr Rosen; Barbora Štindlová; Petr Jäger

The need for data about the acquisition of Czech by non-native learners prompted the compilation of the first learner corpus of Czech. After introducing its basic design and parameters, including a multi-tier manual annotation scheme and an error taxonomy, we focus on the more technical aspects: the transcription of handwritten source texts, the annotation process, and options for exploiting the result, together with the tools used for these tasks and the decisions behind these choices. To support or even substitute manual annotation, we assign some error tags automatically and use automatic annotation tools (a tagger and a spell checker).

Text, Speech and Dialogue | 2012

Combining Manual and Automatic Annotation of a Learner Corpus

Tomáš Jelínek; Barbora Štindlová; Alexandr Rosen; Jirka Hana

We present an approach to building a learner corpus of Czech, manually corrected and annotated with error tags using a complex grammar-based taxonomy of errors in spelling, morphology, morphosyntax, lexicon and style. This grammar-based annotation is supplemented by a formal classification of errors based on surface alternations. To supply additional information about non-standard or ill-formed expressions, we aim for a synergy of manual and automatic annotation, deriving information from both the original input and the manual annotation.

Archive | 1994

Machine Readable Dictionary as a Source of Grammatical Information

Eva Hajičová; Alexandr Rosen

The present contribution describes an effort to collect lexical data for an English parser in the context of a bilingual research project. The primary source of grammatical information is a computer-usable version of OALD (Hornby, 1974). The target lexicon's structure of verbal valency frames, inspired by the theoretical framework of Functional Generative Description, includes an underlying level. Its content can be derived, with some human supervision, from OALD's verb pattern codes. The results confirm the usefulness of machine-readable dictionaries for NLP applications.

International Conference on Computational and Corpus-Based Phraseology | 2017

Eye of a Needle in a Haystack

Milena Hnátková; Tomáš Jelínek; Marie Kopřivová; Vladimír Petkevič; Alexandr Rosen; Hana Skoumalová; Pavel Vondřička

We propose a multidimensional taxonomy of multiword expressions (MWEs) as a pattern applicable to entries in a representative lexicon of Czech MWEs. The taxonomy and the lexicon are useful for many reasons concerning lexicography, teaching Czech as a foreign language, and theoretical issues of MWEs as entities standing between lexicon and grammar, as well as for NLP tasks such as tagging and parsing, identification and search of MWEs, or word sense and semantic disambiguation. In addition to the description of various types of idiomaticity, the taxonomy and the lexicon are designed to account for flexibility in morphology and word order, syntactic and lexical variants and even creatively used fragments.

International Conference on Computational Linguistics | 1992

Derivation of underlying valency frames from a learner's dictionary

Alexandr Rosen; Eva Hajičová; Jan Hajič

The authors collect lexical data for a module of English syntactic analysis in the context of a bilingual research project. The computer-usable version of OALD (Hornby, 1974) is used as the primary source. The main focus is on the structure and derivation of valency frames for verbal entries in the target lexicon. An illustration of the complex relation between OALD's verb subcategorization codes and the target complementation paradigms is provided, and an approach to the design of the derivation procedure is suggested.

Linguistic Annotation Workshop | 2010