Jakub Zavrel
Tilburg University
Publications
Featured research published by Jakub Zavrel.
Machine Learning | 1999
Walter Daelemans; Antal van den Bosch; Jakub Zavrel
We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
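The core setup is a k-nearest-neighbour classifier over symbolic feature vectors, from which candidate instances can be edited out of memory if they look exceptional. Below is a minimal Python sketch; the overlap distance is standard in memory-based learning, but the leave-one-out prediction-strength criterion, the toy data, and the threshold are simplified illustrative assumptions, not the paper's exact typicality and class-prediction-strength measures.

```python
# Minimal sketch of memory-based learning with training-set editing.
# The editing criterion (leave-one-out prediction strength), toy data,
# and threshold are illustrative assumptions.
from collections import Counter

def overlap_distance(a, b):
    """Number of mismatching symbolic feature values."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(memory, query, k=3):
    """Classify by majority vote over the k nearest stored instances."""
    nearest = sorted(memory, key=lambda inst: overlap_distance(inst[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def prediction_strength(memory, index, k=3):
    """Leave-one-out check: can the rest of memory predict this instance?"""
    features, label = memory[index]
    rest = memory[:index] + memory[index + 1:]
    return 1.0 if knn_predict(rest, features, k) == label else 0.0

def edit_exceptions(memory, threshold=0.5, k=3):
    """Remove instances whose prediction strength falls below threshold."""
    return [inst for i, inst in enumerate(memory)
            if prediction_strength(memory, i, k) >= threshold]

# Toy symbolic task (features could be e.g. surrounding POS tags).
memory = [(("DT", "NN"), "NP"), (("DT", "JJ"), "NP"),
          (("IN", "DT"), "PP"), (("IN", "NN"), "PP"),
          (("DT", "VB"), "VP")]  # an "exceptional" instance
edited = edit_exceptions(memory, k=1)
print(len(memory), "->", len(edited), "instances after editing")
```

Running this removes the instances whose neighbours disagree with them; the paper's empirical point is that precisely this kind of editing tends to hurt generalization on NLP tasks, because many apparent exceptions recur in test data.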
Computational Linguistics | 2001
Hans van Halteren; Walter Daelemans; Jakub Zavrel
We examine how differences in language models, learned by different data-driven systems performing the same NLP task, can be exploited to yield a higher accuracy than the best individual system. We do this by means of experiments involving the task of morphosyntactic word class tagging, on the basis of three different tagged corpora. Four well-known tagger generators (hidden Markov model, memory-based, transformation rules, and maximum entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component. The reduction in error rate varies with the material in question, but can be as high as 24.3% with the LOB corpus.
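As a concrete illustration, per-token weighted voting can be sketched in a few lines. The tagger outputs and the accuracy-derived weights below are invented stand-ins for the four component systems; the paper evaluates several such voting schemes alongside second-stage classifiers.

```python
# Illustrative sketch of combining tagger outputs by weighted voting.
# Outputs and accuracy-based weights are made-up stand-ins for the
# HMM / memory-based / rule-based / maxent components in the paper.
from collections import defaultdict

def weighted_vote(tag_sequences, weights):
    """Per token, pick the tag with the highest summed tagger weight."""
    combined = []
    for position in zip(*tag_sequences):
        scores = defaultdict(float)
        for tag, weight in zip(position, weights):
            scores[tag] += weight
        combined.append(max(scores, key=scores.get))
    return combined

# Hypothetical outputs of four taggers on "Time flies like an arrow".
outputs = [
    ["NN", "VBZ", "IN", "DT", "NN"],   # e.g. an HMM tagger
    ["NN", "NNS", "IN", "DT", "NN"],   # e.g. a memory-based tagger
    ["NN", "VBZ", "VB", "DT", "NN"],   # e.g. a rule-based tagger
    ["NN", "VBZ", "IN", "DT", "NN"],   # e.g. a maxent tagger
]
weights = [0.96, 0.95, 0.94, 0.97]     # assumed held-out accuracies
print(weighted_vote(outputs, weights))
# -> ['NN', 'VBZ', 'IN', 'DT', 'NN']
```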
Meeting of the Association for Computational Linguistics | 1998
Hans van Halteren; Jakub Zavrel; Walter Daelemans
In this paper we examine how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic word class tagging. Four well-known tagger generators (hidden Markov model, memory-based, transformation rules, and maximum entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.
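Complementing the voting sketch above, a second-stage classifier can be illustrated as a learned mapping from the tuple of component tags to a corrected tag. The training pairs and the majority-vote fallback below are illustrative assumptions, not the paper's exact stacked setup.

```python
# Minimal stacking sketch: a second-stage classifier mapping the tuple
# of component-tagger outputs to a corrected tag. Training pairs and
# the majority fallback are invented for illustration.
from collections import Counter, defaultdict

def train_second_stage(tagger_outputs, gold_tags):
    """Learn, per tuple of component tags, the most frequent gold tag."""
    table = defaultdict(Counter)
    for tags, gold in zip(zip(*tagger_outputs), gold_tags):
        table[tags][gold] += 1
    return {tags: counts.most_common(1)[0][0] for tags, counts in table.items()}

def apply_second_stage(model, tagger_outputs):
    """Fall back to plain majority voting for unseen tag combinations."""
    result = []
    for tags in zip(*tagger_outputs):
        if tags in model:
            result.append(model[tags])
        else:
            result.append(Counter(tags).most_common(1)[0][0])
    return result

train_out = [["NN", "VB"], ["NN", "VBP"], ["NN", "VB"]]  # 3 taggers, 2 tokens
gold = ["NN", "VBP"]
model = train_second_stage(train_out, gold)
print(apply_second_stage(model, [["NN", "VB"], ["NN", "VBP"], ["NN", "VB"]]))
# -> ['NN', 'VBP']
```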
Meeting of the Association for Computational Linguistics | 1997
Jakub Zavrel; Walter Daelemans
This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains, and allows the easy integration of diverse information sources, such as rich lexical representations.
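One common way to realize such feature weighting is information gain, which gives more predictive features a larger share of the distance, mirroring the specific-to-general ordering of conditioning information in backed-off smoothing. The toy PP-attachment-style data below is an illustrative assumption.

```python
# Sketch of information-gain feature weighting for memory-based
# learning. The toy PP-attachment data is an illustrative assumption.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(instances, labels, feature_index):
    """H(class) minus expected H(class | feature value)."""
    base = entropy(labels)
    by_value = {}
    for inst, label in zip(instances, labels):
        by_value.setdefault(inst[feature_index], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return base - remainder

def weighted_distance(a, b, weights):
    """Overlap distance where a mismatch costs the feature's weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

# Toy data: (verb, noun1, preposition, noun2) -> attachment decision.
X = [("eat", "pizza", "with", "fork"), ("eat", "pizza", "with", "anchovies"),
     ("see", "man", "with", "telescope"), ("buy", "car", "with", "radio")]
y = ["verb", "noun", "verb", "noun"]
weights = [information_gain(X, y, i) for i in range(4)]
print([round(w, 3) for w in weights])
```

Note how the uninformative preposition feature (always "with" here) receives zero weight, so mismatches on it cost nothing in the distance; this is the automatic specificity hierarchy the paper argues for.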
Artificial Intelligence Review | 1996
Jakub Zavrel
Neural networks have recently been proposed for the construction of navigation interfaces for Information Retrieval systems. In this paper, we give an overview of some current research in this area. Most of the cited approaches use variants of the well-known Kohonen network. The Kohonen network implements a topology-preserving, dimensionality-reducing mapping, which can be applied for information visualization. We identify a number of problems in the application of Kohonen networks to Information Retrieval, most notably scalability, reliability, and retrieval effectiveness. To solve these problems we propose to use the Growing Cell Structures network, a variant of the Kohonen network that adapts more flexibly to the domain structure. This network was tested on two standard test collections, using a combined recall and precision measure, and compared to traditional IR methods such as the Vector Space Model and various clustering algorithms. The network performs at a competitive level of effectiveness and is suitable for visualization purposes. However, the incremental training procedures for the networks result in a reliability problem, and the approach is computationally intensive. The utility of the resulting maps for navigation also needs further improvement.
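For readers unfamiliar with the Kohonen network, a minimal self-organizing map over document vectors can be sketched as follows. The grid size, learning schedule, and random stand-in tf-idf vectors are assumptions for illustration; the paper itself advocates the Growing Cell Structures variant, which grows the grid incrementally rather than fixing it in advance.

```python
# Minimal self-organizing map sketch of the kind used for document maps.
# A 2-D grid of unit vectors is pulled toward nearby documents; the toy
# document vectors, grid size, and learning schedule are assumptions.
import numpy as np

rng = np.random.default_rng(0)
grid_w, grid_h, dim = 4, 4, 8            # 4x4 map over 8-term doc vectors
units = rng.random((grid_w * grid_h, dim))
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)])
docs = rng.random((50, dim))             # stand-in tf-idf document vectors

for t in range(200):
    lr = 0.5 * (1 - t / 200)             # decaying learning rate
    radius = 2.0 * (1 - t / 200) + 0.5   # shrinking neighbourhood
    doc = docs[rng.integers(len(docs))]
    winner = np.argmin(((units - doc) ** 2).sum(axis=1))
    grid_dist = np.linalg.norm(coords - coords[winner], axis=1)
    influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
    units += lr * influence[:, None] * (doc - units)

# Each document now maps to a grid cell, giving a 2-D browsable layout.
cells = [int(np.argmin(((units - d) ** 2).sum(axis=1))) for d in docs]
print(sorted(set(cells)))
```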
Conference on Computational Natural Language Learning | 2000
Anne Kool; Walter Daelemans; Jakub Zavrel
We investigate the usefulness of evolutionary algorithms in three incarnations of the problem of feature relevance assignment in memory-based language processing (MBLP): feature weighting, feature ordering, and feature selection. We use a simple genetic algorithm (GA) for this problem on two typical tasks in natural language processing: morphological synthesis and unknown word tagging. We find that GA feature selection always significantly outperforms the MBLP variant without selection, and that feature ordering and weighting with a GA significantly outperform a setup in which no weighting is used. However, GA selection does not perform significantly better than simple iterative feature selection methods, and GA weighting and ordering only reach performance similar to that of current information-theoretic feature weighting methods.
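A toy version of GA-based feature selection might look like the following: individuals are binary feature masks, and fitness is the leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to the selected features. The population size, mutation rate, and synthetic XOR-style data are illustrative assumptions.

```python
# Toy genetic-algorithm feature selection in the spirit of the paper:
# individuals are binary feature masks; fitness is leave-one-out 1-NN
# accuracy over selected features. All parameters and data are invented.
import random

random.seed(1)
N_FEATS = 6
# Synthetic data: only features 0 and 1 matter, the rest are noise.
data = [[random.randint(0, 1) for _ in range(N_FEATS)] for _ in range(40)]
labels = [x[0] ^ x[1] for x in data]

def fitness(mask):
    def dist(a, b):
        return sum(m and x != y for m, x, y in zip(mask, a, b))
    correct = 0
    for i, (x, y) in enumerate(zip(data, labels)):
        rest = [(d, l) for j, (d, l) in enumerate(zip(data, labels)) if j != i]
        nearest = min(rest, key=lambda dl: dist(dl[0], x))
        correct += nearest[1] == y
    return correct / len(data)

def evolve(pop_size=20, generations=15, p_mut=0.1):
    pop = [[random.randint(0, 1) for _ in range(N_FEATS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_FEATS)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (random.random() < p_mut) for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())  # ideally selects the two informative features
```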
Natural Language Engineering | 2009
Martijn Spitters; Marco De Boni; Jakub Zavrel; Remko Bonnema
We describe a system that automatically learns effective and engaging dialogue strategies, generated from a library of dialogue content, using reinforcement learning from user feedback. Besides the more usual clarification and verification components of dialogue, this library contains various social elements such as greetings, apologies, small talk, relational questions, and jokes. We tested the method through an experimental dialogue system that encourages take-up of exercise, and show that the learned dialogue policy performs as well as one built by human experts for this system.
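A heavily simplified view of such learning is an epsilon-greedy bandit that updates a per-action estimate of user satisfaction. The action set, the simulated feedback, and the single-state setting below are invented for illustration; the actual system learned over dialogue states and a much richer content library.

```python
# Hedged sketch of learning a dialogue policy from user feedback with an
# epsilon-greedy bandit. Actions, feedback model, and the single-state
# setting are invented for illustration.
import random

random.seed(0)
ACTIONS = ["clarify", "verify", "small_talk", "joke", "apologize"]
q = {a: 0.0 for a in ACTIONS}     # estimated user satisfaction per action
counts = {a: 0 for a in ACTIONS}

def simulated_feedback(action):
    """Stand-in for a real user rating in [0, 1]."""
    base = {"clarify": 0.6, "verify": 0.5, "small_talk": 0.7,
            "joke": 0.8, "apologize": 0.4}[action]
    return max(0.0, min(1.0, random.gauss(base, 0.2)))

for episode in range(500):
    if random.random() < 0.1:                 # explore
        action = random.choice(ACTIONS)
    else:                                     # exploit
        action = max(q, key=q.get)
    reward = simulated_feedback(action)
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]  # running mean

print(max(q, key=q.get), {a: round(v, 2) for a, v in q.items()})
```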
Research Group Technical Report Series | 2003
Walter Daelemans; Jakub Zavrel; Ko van der Sloot; Antal van den Bosch
International Conference on Computational Linguistics | 1996
Walter Daelemans; Jakub Zavrel; Peter Berck; Steven Gillis
Archive | 2007
Walter Daelemans; Jakub Zavrel; Ko van der Sloot; Antal van den Bosch