Karel Oliva
Austrian Research Institute for Artificial Intelligence
Publication
Featured research published by Karel Oliva.
meeting of the association for computational linguistics | 2001
Jan Hajic; Pavel Krbec; Pavel Kveton; Karel Oliva; Vladimír Petkevič
A hybrid system is described which combines the strength of manual rule-writing and statistical learning, obtaining results superior to both methods if applied separately. The combination of a rule-based system and a statistical one is not parallel but serial: the rule-based system performing partial disambiguation with recall close to 100% is applied first, and a trigram HMM tagger runs on its results. An experiment in Czech tagging has been performed with encouraging results.
text speech and dialogue | 2003
Karel Oliva; Pavel Květoň; Roman Ondruška
The paper deals with the computational complexity of Part-of-Speech tagging (aka morphological disambiguation) by means of rules derived from loosened negative n-grams. Loosened negative n-grams [2] were originally developed as a tool for the task of pure verification of results of Part-of-Speech tagging (corpus quality checking). It is shown that while the verification is just a polynomial problem, the time consumed by the tagging (disambiguation) task cannot be bounded by a polynomial in the general case. The results presented in the paper are relevant above all for disambiguation performed by means of Constraint-based Grammars [1] and similar frameworks, which are in fact only notational variants of the rules derived via loosened negative n-grams. Throughout the paper some familiarity with finite-state automata (FSA) and the class of NP problems is assumed.
text speech and dialogue | 2002
Pavel Kveton; Karel Oliva
After some theoretical discussion on the issue of representativity of a corpus, this paper presents a simple yet very efficient technique serving for (semi-)automatic detection of those positions in a part-of-speech tagged corpus where an error is to be suspected. The approach is based on the idea of learning and application of invalid bigrams, i.e. on the search for pairs of adjacent tags which constitute an incorrect configuration in a text of a particular language (in English, e.g., the bigram ARTICLE - VERB). Further, the paper describes the generalization of the invalid bigrams into extended invalid bigrams of length n, for any natural n, which provides a powerful tool for error detection in a corpus. The approach is illustrated by English, German and Czech examples.
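The invalid-bigram idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the invalid bigrams are learned, so the sketch simply assumes that any pair of tags never seen adjacent in a trusted sample is invalid. The function names, the toy tagset and the toy sentences are all hypothetical.

```python
from itertools import product


def learn_invalid_bigrams(tagged_sentences, tagset):
    """Return every tag pair over `tagset` that never occurs adjacently
    in the trusted sample (a naive assumption, not the paper's procedure)."""
    seen = set()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        seen.update(zip(tags, tags[1:]))
    return set(product(tagset, repeat=2)) - seen


def suspect_positions(sentence, invalid_bigrams):
    """Return indices i such that the tags at i and i + 1 form an invalid bigram."""
    tags = [tag for _, tag in sentence]
    return [i for i, pair in enumerate(zip(tags, tags[1:]))
            if pair in invalid_bigrams]


# Toy, hypothetical data: sentences are lists of (word, tag) pairs.
trusted = [
    [("the", "ART"), ("dog", "NOUN"), ("barks", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]
invalid = learn_invalid_bigrams(trusted, {"ART", "NOUN", "VERB"})
flagged = suspect_positions([("the", "ART"), ("barks", "VERB")], invalid)
```

On a sample this small, many perfectly legal tag pairs are also unseen and would be flagged, which is presumably why the abstract begins with the question of corpus representativity.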
international conference on computational linguistics | 2002
Pavel Květoň; Karel Oliva
This paper presents a simple yet in practice very efficient technique serving for automatic detection of those positions in a part-of-speech tagged corpus where an error is to be suspected. The approach is based on the idea of learning and later application of negative bigrams, i.e. on the search for pairs of adjacent tags which constitute an incorrect configuration in a text of a particular language (in English, e.g., the bigram ARTICLE - FINITE VERB). Further, the paper describes the generalization of the negative bigrams into negative n-grams, for any natural n, which indeed provides a powerful tool for error detection in a corpus. The implementation is also discussed, as well as evaluation of results of the approach when used for error detection in the NEGRA® corpus of German, and the general implications for the quality of results of statistical taggers. Illustrative examples in the text are taken from German, and hence at least a basic command of this language would be helpful for their understanding - due to the complexity of the necessary accompanying explanation, the examples are neither glossed nor translated. However, the central ideas of the paper should be understandable also without any knowledge of German.
text speech and dialogue | 2001
Karel Oliva
The performance of taggers is usually evaluated by their percentage success rate. Because such an approach is purely quantitative, all errors committed by the tagger are treated on a par for the purpose of the evaluation. This paper takes a different, qualitative stand on the topic, arguing that the previous viewpoint is not linguistically adequate: the errors (might) differ in severity. General implications for tagging are discussed, and a simple method is proposed and exemplified, able to (1) detect and in some cases even rectify the most severe errors and thus (2) contribute to arriving finally at a better tagged corpus. Some encouraging results achieved by a very simple, manually performed test and evaluation on a small sample of a corpus are given.
text speech and dialogue | 2000
Karel Oliva; Milena Hnátková; Vladimír Petkevič; Pavel Kveton
This paper describes the conception of a rule-based tagger (part-of-speech disambiguator) of Czech currently developed for tagging the Czech National Corpus (cf. [2]). The input of the tagger consists of sentences whose words are assigned all possible morphological analyses. The tagger disambiguates this input by successive elimination of tags which are syntactically implausible in the sentential context of the particular word. Due to this, the tagger promises substantially higher accuracy than current stochastic taggers for Czech. This is documented by the results concerning the disambiguation of the most frequent ambiguous word form in Czech, the word se.
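The elimination strategy described in the abstract above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tagger from the paper: the single rule, the English-style tag names and the function names are hypothetical, and the safeguard of never deleting a word's last remaining tag is an assumption of the sketch.

```python
def no_verb_after_article(readings, i):
    """Hypothetical rule: drop VERB at position i when the previous word
    is unambiguously an article."""
    if i > 0 and readings[i - 1] == {"ART"}:
        return readings[i] - {"VERB"}
    return readings[i]


def disambiguate(readings, rules):
    """Apply the rules repeatedly until no further tag can be eliminated.

    `readings` holds one set of candidate tags per word. Each rule returns
    a subset of the current candidates, so the loop always terminates.
    """
    readings = [set(r) for r in readings]
    changed = True
    while changed:
        changed = False
        for i in range(len(readings)):
            for rule in rules:
                reduced = rule(readings, i)
                # Assumption of this sketch: never remove the last candidate.
                if reduced and reduced != readings[i]:
                    readings[i] = reduced
                    changed = True
    return readings


# Toy input: "the" is unambiguous, the second word is NOUN/VERB ambiguous.
result = disambiguate([{"ART"}, {"NOUN", "VERB"}], [no_verb_after_article])
```

Because rules only delete readings, the correct tag survives as long as every rule is reliable, and any ambiguity left over can be handed to a later processing stage.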
Lecture Notes in Computer Science | 2005
Stefan Klatt; Karel Oliva
In this paper, we present techniques aimed at avoiding typical errors of state-of-the-art POS-taggers and at constructing high-quality POS-taggers with extremely low error rates. Such taggers are very helpful, if not even necessary, for many NLP applications organized in a pipeline architecture. The appropriateness of the suggested solutions is demonstrated in several experiments. Although these experiments were performed only with German data, the proposed modular architecture is applicable for many other languages, too.
text speech and dialogue | 2003
Tomáš Holan; Vladislav Kuboň; Martin Plátek; Karel Oliva
This paper describes the theoretical basis of a framework for syntactic analysis allowing for flexible, stepwise shifting of the separation line between two sets of ill-formed strings which, given a fixed degree of robustness, i.e. tolerable violation of grammatical constraints:
TAL. Traitement automatique des langues | 2000
Tomáš Holan; Vladislav Kubon; Karel Oliva; Martin Plátek
international workshop/conference on parsing technologies | 2001
Martin Plátek; Tomáš Holan; Vladislav Kubon; Karel Oliva