Jakub Zavrel
Tilburg University
Publications
Featured research published by Jakub Zavrel.
Machine Learning | 1999
Walter Daelemans; Antal van den Bosch; Jakub Zavrel
We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
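The core setup is a k-nearest-neighbour classifier over symbolic feature vectors, from which candidate instances can be edited out of memory if they look exceptional. Below is a minimal Python sketch; the overlap distance is standard in memory-based learning, but the leave-one-out prediction-strength criterion, the toy data, and the threshold are simplified illustrative assumptions, not the paper's exact typicality and class-prediction-strength measures.

```python
# Minimal sketch of memory-based learning with training-set editing.
# The editing criterion (leave-one-out prediction strength), toy data,
# and threshold are illustrative assumptions.
from collections import Counter

def overlap_distance(a, b):
    """Number of mismatching symbolic feature values."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(memory, query, k=3):
    """Classify by majority vote over the k nearest stored instances."""
    nearest = sorted(memory, key=lambda inst: overlap_distance(inst[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def prediction_strength(memory, index, k=3):
    """Leave-one-out check: can the rest of memory predict this instance?"""
    features, label = memory[index]
    rest = memory[:index] + memory[index + 1:]
    return 1.0 if knn_predict(rest, features, k) == label else 0.0

def edit_exceptions(memory, threshold=0.5, k=3):
    """Remove instances whose prediction strength falls below threshold."""
    return [inst for i, inst in enumerate(memory)
            if prediction_strength(memory, i, k) >= threshold]

# Toy symbolic task (features could be e.g. surrounding POS tags).
memory = [(("DT", "NN"), "NP"), (("DT", "JJ"), "NP"),
          (("IN", "DT"), "PP"), (("IN", "NN"), "PP"),
          (("DT", "VB"), "VP")]  # an "exceptional" instance
edited = edit_exceptions(memory, k=1)
print(len(memory), "->", len(edited), "instances after editing")
```

Running this removes the instances whose neighbours disagree with them; the paper's empirical point is that precisely this kind of editing tends to hurt generalization on NLP tasks, because many apparent exceptions recur in test data.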
Computational Linguistics | 2001
Hans van Halteren; Walter Daelemans; Jakub Zavrel
We examine how differences in language models, learned by different data-driven systems performing the same NLP task, can be exploited to yield a higher accuracy than the best individual system. We do this by means of experiments involving the task of morphosyntactic word class tagging, on the basis of three different tagged corpora. Four well-known tagger generators (hidden Markov model, memory-based, transformation rules, and maximum entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component. The reduction in error rate varies with the material in question, but can be as high as 24.3% with the LOB corpus.
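As a concrete illustration, per-token weighted voting can be sketched in a few lines. The tagger outputs and the accuracy-derived weights below are invented stand-ins for the four component systems; the paper evaluates several such voting schemes alongside second-stage classifiers.

```python
# Illustrative sketch of combining tagger outputs by weighted voting.
# Outputs and accuracy-based weights are made-up stand-ins for the
# HMM / memory-based / rule-based / maxent components in the paper.
from collections import defaultdict

def weighted_vote(tag_sequences, weights):
    """Per token, pick the tag with the highest summed tagger weight."""
    combined = []
    for position in zip(*tag_sequences):
        scores = defaultdict(float)
        for tag, weight in zip(position, weights):
            scores[tag] += weight
        combined.append(max(scores, key=scores.get))
    return combined

# Hypothetical outputs of four taggers on "Time flies like an arrow".
outputs = [
    ["NN", "VBZ", "IN", "DT", "NN"],   # e.g. an HMM tagger
    ["NN", "NNS", "IN", "DT", "NN"],   # e.g. a memory-based tagger
    ["NN", "VBZ", "VB", "DT", "NN"],   # e.g. a rule-based tagger
    ["NN", "VBZ", "IN", "DT", "NN"],   # e.g. a maxent tagger
]
weights = [0.96, 0.95, 0.94, 0.97]     # assumed held-out accuracies
print(weighted_vote(outputs, weights))
# -> ['NN', 'VBZ', 'IN', 'DT', 'NN']
```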
Meeting of the Association for Computational Linguistics | 1998
Hans van Halteren; Jakub Zavrel; Walter Daelemans
In this paper we examine how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic word class tagging. Four well-known tagger generators (hidden Markov model, memory-based, transformation rules, and maximum entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.
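Complementing the voting sketch above, a second-stage classifier can be illustrated as a learned mapping from the tuple of component tags to a corrected tag. The training pairs and the majority-vote fallback below are illustrative assumptions, not the paper's exact stacked setup.

```python
# Minimal stacking sketch: a second-stage classifier mapping the tuple
# of component-tagger outputs to a corrected tag. Training pairs and
# the majority fallback are invented for illustration.
from collections import Counter, defaultdict

def train_second_stage(tagger_outputs, gold_tags):
    """Learn, per tuple of component tags, the most frequent gold tag."""
    table = defaultdict(Counter)
    for tags, gold in zip(zip(*tagger_outputs), gold_tags):
        table[tags][gold] += 1
    return {tags: counts.most_common(1)[0][0] for tags, counts in table.items()}

def apply_second_stage(model, tagger_outputs):
    """Fall back to plain majority voting for unseen tag combinations."""
    result = []
    for tags in zip(*tagger_outputs):
        if tags in model:
            result.append(model[tags])
        else:
            result.append(Counter(tags).most_common(1)[0][0])
    return result

train_out = [["NN", "VB"], ["NN", "VBP"], ["NN", "VB"]]  # 3 taggers, 2 tokens
gold = ["NN", "VBP"]
model = train_second_stage(train_out, gold)
print(apply_second_stage(model, [["NN", "VB"], ["NN", "VBP"], ["NN", "VB"]]))
# -> ['NN', 'VBP']
```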
Meeting of the Association for Computational Linguistics | 1997
Jakub Zavrel; Walter Daelemans
This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains, and allows the easy integration of diverse information sources, such as rich lexical representations.
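One common way to realize such feature weighting is information gain, which gives more predictive features a larger share of the distance, mirroring the specific-to-general ordering of conditioning information in backed-off smoothing. The toy PP-attachment-style data below is an illustrative assumption.

```python
# Sketch of information-gain feature weighting for memory-based
# learning. The toy PP-attachment data is an illustrative assumption.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(instances, labels, feature_index):
    """H(class) minus expected H(class | feature value)."""
    base = entropy(labels)
    by_value = {}
    for inst, label in zip(instances, labels):
        by_value.setdefault(inst[feature_index], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return base - remainder

def weighted_distance(a, b, weights):
    """Overlap distance where a mismatch costs the feature's weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

# Toy data: (verb, noun1, preposition, noun2) -> attachment decision.
X = [("eat", "pizza", "with", "fork"), ("eat", "pizza", "with", "anchovies"),
     ("see", "man", "with", "telescope"), ("buy", "car", "with", "radio")]
y = ["verb", "noun", "verb", "noun"]
weights = [information_gain(X, y, i) for i in range(4)]
print([round(w, 3) for w in weights])
```

Note how the uninformative preposition feature (always "with" here) receives zero weight, so mismatches on it cost nothing in the distance; this is the automatic specificity hierarchy the paper argues for.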
Artificial Intelligence Review | 1996
Jakub Zavrel
Neural networks have recently been proposed for the construction of navigation interfaces for Information Retrieval systems. In this paper, we give an overview of some current research in this area. Most of the cited approaches use variants of the well-known Kohonen network. The Kohonen network implements a topology-preserving, dimensionality-reducing mapping, which can be applied for information visualization. We identify a number of problems in the application of Kohonen networks to Information Retrieval, most notably scalability, reliability, and retrieval effectiveness. To solve these problems we propose to use the Growing Cell Structures network, a variant of the Kohonen network that adapts more flexibly to the domain structure. This network was tested on two standard test collections, using a combined recall and precision measure, and compared to traditional IR methods such as the Vector Space Model and various clustering algorithms. The network performs at a competitive level of effectiveness and is suitable for visualization purposes. However, the incremental training procedures for the networks result in a reliability problem, and the approach is computationally intensive. The utility of the resulting maps for navigation also needs further improvement.
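For readers unfamiliar with the Kohonen network, a minimal self-organizing map over document vectors can be sketched as follows. The grid size, learning schedule, and random stand-in tf-idf vectors are assumptions for illustration; the paper itself advocates the Growing Cell Structures variant, which grows the grid incrementally rather than fixing it in advance.

```python
# Minimal self-organizing map sketch of the kind used for document maps.
# A 2-D grid of unit vectors is pulled toward nearby documents; the toy
# document vectors, grid size, and learning schedule are assumptions.
import numpy as np

rng = np.random.default_rng(0)
grid_w, grid_h, dim = 4, 4, 8            # 4x4 map over 8-term doc vectors
units = rng.random((grid_w * grid_h, dim))
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)])
docs = rng.random((50, dim))             # stand-in tf-idf document vectors

for t in range(200):
    lr = 0.5 * (1 - t / 200)             # decaying learning rate
    radius = 2.0 * (1 - t / 200) + 0.5   # shrinking neighbourhood
    doc = docs[rng.integers(len(docs))]
    winner = np.argmin(((units - doc) ** 2).sum(axis=1))
    grid_dist = np.linalg.norm(coords - coords[winner], axis=1)
    influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
    units += lr * influence[:, None] * (doc - units)

# Each document now maps to a grid cell, giving a 2-D browsable layout.
cells = [int(np.argmin(((units - d) ** 2).sum(axis=1))) for d in docs]
print(sorted(set(cells)))
```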
Conference on Computational Natural Language Learning | 2000
Anne Kool; Walter Daelemans; Jakub Zavrel
We investigate the usefulness of evolutionary algorithms in three incarnations of the problem of feature relevance assignment in memory-based language processing (MBLP): feature weighting, feature ordering, and feature selection. We use a simple genetic algorithm (GA) for this problem on two typical tasks in natural language processing: morphological synthesis and unknown word tagging. We find that GA feature selection always significantly outperforms the MBLP variant without selection, and that feature ordering and weighting with a GA significantly outperform a setup in which no weighting is used. However, GA selection does not perform significantly better than simple iterative feature selection methods, and GA weighting and ordering only reach performance similar to that of current information-theoretic feature weighting methods.
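A toy version of GA-based feature selection might look like the following: individuals are binary feature masks, and fitness is the leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to the selected features. The population size, mutation rate, and synthetic XOR-style data are illustrative assumptions.

```python
# Toy genetic-algorithm feature selection in the spirit of the paper:
# individuals are binary feature masks; fitness is leave-one-out 1-NN
# accuracy over selected features. All parameters and data are invented.
import random

random.seed(1)
N_FEATS = 6
# Synthetic data: only features 0 and 1 matter, the rest are noise.
data = [[random.randint(0, 1) for _ in range(N_FEATS)] for _ in range(40)]
labels = [x[0] ^ x[1] for x in data]

def fitness(mask):
    def dist(a, b):
        return sum(m and x != y for m, x, y in zip(mask, a, b))
    correct = 0
    for i, (x, y) in enumerate(zip(data, labels)):
        rest = [(d, l) for j, (d, l) in enumerate(zip(data, labels)) if j != i]
        nearest = min(rest, key=lambda dl: dist(dl[0], x))
        correct += nearest[1] == y
    return correct / len(data)

def evolve(pop_size=20, generations=15, p_mut=0.1):
    pop = [[random.randint(0, 1) for _ in range(N_FEATS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_FEATS)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (random.random() < p_mut) for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())  # ideally selects the two informative features
```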
Natural Language Engineering | 2009
Martijn Spitters; Marco De Boni; Jakub Zavrel; Remko Bonnema
We describe a system that automatically learns effective and engaging dialogue strategies, generated from a library of dialogue content, using reinforcement learning from user feedback. Besides the more usual clarification and verification components of dialogue, this library contains various social elements such as greetings, apologies, small talk, relational questions, and jokes. We tested the method through an experimental dialogue system that encourages take-up of exercise, and show that the learned dialogue policy performs as well as one built by human experts for this system.
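A heavily simplified view of such learning is an epsilon-greedy bandit that updates a per-action estimate of user satisfaction. The action set, the simulated feedback, and the single-state setting below are invented for illustration; the actual system learned over dialogue states and a much richer content library.

```python
# Hedged sketch of learning a dialogue policy from user feedback with an
# epsilon-greedy bandit. Actions, feedback model, and the single-state
# setting are invented for illustration.
import random

random.seed(0)
ACTIONS = ["clarify", "verify", "small_talk", "joke", "apologize"]
q = {a: 0.0 for a in ACTIONS}     # estimated user satisfaction per action
counts = {a: 0 for a in ACTIONS}

def simulated_feedback(action):
    """Stand-in for a real user rating in [0, 1]."""
    base = {"clarify": 0.6, "verify": 0.5, "small_talk": 0.7,
            "joke": 0.8, "apologize": 0.4}[action]
    return max(0.0, min(1.0, random.gauss(base, 0.2)))

for episode in range(500):
    if random.random() < 0.1:                 # explore
        action = random.choice(ACTIONS)
    else:                                     # exploit
        action = max(q, key=q.get)
    reward = simulated_feedback(action)
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]  # running mean

print(max(q, key=q.get), {a: round(v, 2) for a, v in q.items()})
```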
Research Group Technical Report Series | 2003
Walter Daelemans; Jakub Zavrel; Ko van der Sloot; Antal van den Bosch
International Conference on Computational Linguistics | 1996
Walter Daelemans; Jakub Zavrel; Peter Berck; Steven Gillis
Archive | 2007
Walter Daelemans; Jakub Zavrel; Ko van der Sloot; Antal van den Bosch