Antal van den Bosch | Researchain

Publication

Featured research published by Antal van den Bosch.

Machine Learning | 1999

Forgetting Exceptions is Harmful in Language Learning

Walter Daelemans; Antal van den Bosch; Jakub Zavrel

We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.

Archive | 2007

Arabic Computational Morphology: Knowledge-based and Empirical Methods

Abdelhadi Soudi; Antal van den Bosch; Günter Neumann

The morphology of Arabic poses special challenges to computational natural language processing systems. The exceptional degree of ambiguity in the writing system, the rich morphology, and the highly complex word formation process of roots and patterns all contribute to making computational approaches to Arabic very challenging. Indeed, many computational linguists across the world have taken up this challenge over time, and many of the researchers with a track record in this research area have contributed to this book. The book's subtitle aims to reflect that widely different computational approaches to the Arabic morphological system have been proposed. These accounts fall into two main paradigms: the knowledge-based and the empirical. Since morphological knowledge plays an essential role in any higher-level understanding and processing of Arabic text, the book also features a part on the role of Arabic morphology in larger applications, i.e. Information Retrieval (IR) and Machine Translation (MT).

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Broad expertise retrieval in sparse data environments

Krisztian Balog; Toine Bogers; Leif Azzopardi; Maarten de Rijke; Antal van den Bosch

Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings.
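As a rough illustration of what a generative language modeling baseline for finding experts can look like, the sketch below scores each person by the smoothed probability that the documents associated with them generate the query terms. It is one common formulation, not necessarily the one used in the paper; the function names, the uniform document-person weighting, the smoothing weight `lam`, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def doc_model(tokens, background, lam=0.5):
    """Return a function giving the smoothed probability p(term | document)."""
    counts, total = Counter(tokens), len(tokens)

    def p(term):
        # Interpolate the document estimate with the collection-wide estimate.
        return (1 - lam) * counts[term] / total + lam * background(term)

    return p


def rank_experts(query, docs, authors, lam=0.5):
    """Rank people by log p(query | person), mixing over each person's documents."""
    all_tokens = [t for tokens in docs.values() for t in tokens]
    bg_counts, bg_total = Counter(all_tokens), len(all_tokens)

    def background(term):
        return bg_counts[term] / bg_total

    models = {d: doc_model(tokens, background, lam) for d, tokens in docs.items()}
    scores = {}
    for person, doc_ids in authors.items():
        log_p = 0.0
        for term in query:
            # Assumption: p(document | person) is uniform over that person's documents.
            p_term = sum(models[d](term) for d in doc_ids) / len(doc_ids)
            log_p += math.log(p_term) if p_term > 0 else float("-inf")
        scores[person] = log_p
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection: document id -> tokens, person -> document ids.
DOCS = {
    "d1": ["speech", "synthesis", "phoneme"],
    "d2": ["recommender", "filtering", "news"],
    "d3": ["phoneme", "conversion"],
}
AUTHORS = {"alice": ["d1", "d3"], "bob": ["d2"]}

print(rank_experts(["phoneme"], DOCS, AUTHORS))
```

Refinements of the kind the abstract mentions, such as using organizational structure or a topic hierarchy, would change how documents and people are associated or weighted; the sketch leaves that association uniform.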

Meeting of the Association for Computational Linguistics | 1999

Memory-Based Morphological Analysis

Antal van den Bosch; Walter Daelemans

We present a general architecture for efficient and deterministic morphological analysis based on memory-based learning, and apply it to morphological analysis of Dutch. The system makes direct mappings from letters in context to rich categories that encode morphological boundaries, syntactic class labels, and spelling changes. Both precision and recall of labeled morphemes are over 84% on held-out dictionary test words and estimated to be over 93% in free text.
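The idea of mapping letters in context to categories can be sketched as a nearest-neighbour lookup over fixed-width letter windows. This is a minimal illustration, not the system described in the paper: the window width, the simple overlap metric, the two-label scheme (`B` for a letter that starts a new morpheme, `0` otherwise), and the English toy words are assumptions chosen to keep the example short.

```python
from collections import Counter


def windows(word, width=2, pad="_"):
    """Yield one fixed-width letter window per letter of `word`."""
    padded = pad * width + word + pad * width
    for i in range(len(word)):
        yield tuple(padded[i:i + 2 * width + 1])


def overlap(a, b):
    """Count the positions at which two windows hold the same letter."""
    return sum(x == y for x, y in zip(a, b))


def classify(window, memory, k=1):
    """Return the majority label among the k stored instances most similar to `window`."""
    ranked = sorted(memory, key=lambda inst: overlap(window, inst[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: one label per letter.
TRAIN = [("walked", "0000B0"), ("talked", "0000B0"), ("walks", "0000B")]

# The "memory" is simply every (window, label) pair seen in training.
MEMORY = [
    (window, label)
    for word, labels in TRAIN
    for window, label in zip(windows(word), labels)
]


def analyse(word):
    """Label every letter of `word` by looking up its window in memory."""
    return "".join(classify(window, MEMORY) for window in windows(word))


print(analyse("talks"))  # 0000B
```

The richer categories mentioned in the abstract (syntactic class labels, spelling changes) would replace the two toy labels; the lookup itself would stay the same.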

Conference on Recommender Systems | 2008

Recommending scientific articles using citeulike

Toine Bogers; Antal van den Bosch

We describe the use of the social reference management website CiteULike for recommending scientific articles to users, based on their reference library. We test three different collaborative filtering algorithms, and find that user-based filtering performs best. A temporal analysis of the data indexed by CiteULike shows that it takes about two years for the cold-start problem to disappear and recommendation performance to improve.
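User-based collaborative filtering over reference libraries can be illustrated as follows: find the users whose libraries overlap most with the target user's, then recommend the articles those neighbours hold that the target user lacks. The sketch is a generic version of the technique, not the paper's exact configuration; the cosine similarity on binary libraries, the neighbourhood size `k`, and the toy data are assumptions.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sets of item ids, treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user, libraries, k=2, n=3):
    """Rank items the user lacks by the summed similarity of the k nearest users holding them."""
    target = libraries[user]
    neighbours = sorted(
        ((cosine(target, items), other)
         for other, items in libraries.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        if sim > 0:  # ignore users with no overlap at all
            for item in libraries[other] - target:
                scores[item] += sim
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Hypothetical libraries: user -> set of article ids.
LIBRARIES = {
    "ann": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p4"},
    "cat": {"p2", "p3", "p5"},
    "dan": {"p6"},
}

print(recommend("ann", LIBRARIES))  # ['p4', 'p5']
```

A user such as `dan`, whose library overlaps with nobody's, receives no recommendations, which is a small-scale picture of the cold-start problem the abstract refers to.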

Progress in Speech Synthesis / Jan P.H. van Santen (ed.) | 1997

Language-independent data-oriented grapheme-to-phoneme conversion

Walter Daelemans; Antal van den Bosch

We describe an approach to grapheme-to-phoneme conversion that is both language-independent and data-oriented. Given a set of examples (spelling words with their associated phonetic representation) in a language, a grapheme-to-phoneme conversion system is automatically produced for that language that takes as its input the spelling of words and produces as its output the phonetic transcription according to the rules implicit in the training data. We describe the design of the system and compare its performance to knowledge-based and alternative data-oriented approaches.

Conference of the European Chapter of the Association for Computational Linguistics | 1993

Data-oriented methods for grapheme-to-phoneme conversion

Antal van den Bosch; Walter Daelemans

It is traditionally assumed that various sources of linguistic knowledge and their interaction should be formalised in order to be able to convert words into their phonemic representations with reasonable accuracy. We show that using supervised learning techniques, based on a corpus of transcribed words, the same and even better performance can be achieved, without explicit modeling of linguistic knowledge. In this paper we present two instances of this approach. A first model implements a variant of instance-based learning, in which a weighted similarity metric and a database of prototypical exemplars are used to predict new mappings. In the second model, grapheme-to-phoneme mappings are looked up in a compressed text-to-speech lexicon (table lookup) enriched with default mappings. We compare performance and accuracy of these approaches to a connectionist (backpropagation) approach and to the linguistic knowledge-based approach.

Data Mining and Knowledge Discovery | 2006

A Rule-Based Approach for Process Discovery: Dealing with Noise and Imbalance in Process Logs

Laura Măruşter; A.J.M.M. Weijters; Wil M. P. van der Aalst; Antal van den Bosch

Effective information systems require the existence of explicit process models. A completely specified process design needs to be developed in order to enact a given business process. This development is time-consuming and often subjective and incomplete. We propose a method that constructs the process model from process log data, by determining the relations between process tasks. To predict these relations, we employ a machine learning technique to induce rule sets. These rule sets are induced from simulated process log data generated by varying process characteristics such as noise and log size. Tests reveal that the induced rule sets have a high predictive accuracy on new data. The effects of noise and imbalance of execution priorities during the discovery of the relations between process tasks are also discussed. Knowing the causal, exclusive, and parallel relations, a process model expressed in the Petri net formalism can be built. We illustrate our approach with real-world data in a case study.

Conference on Recommender Systems | 2007

Comparing and evaluating information retrieval algorithms for news recommendation

Toine Bogers; Antal van den Bosch

In this paper, we argue that the performance of content-based news recommender systems has been hampered by using relatively old and simple matching algorithms. Using more current probabilistic retrieval algorithms results in significant performance boosts. We test our ideas on a test collection that we have made publicly available. We perform both binary and graded evaluation of our algorithms and argue for the need for more graded evaluation of content-based recommender systems.

Computers and The Humanities | 2000

Memory-based word sense disambiguation

Jorn Veenstra; Antal van den Bosch; Sabine Buchholz; Walter Daelemans; Jakub Zavrel

We describe a memory-based classification architecture for word sense disambiguation and its application to the SENSEVAL evaluation task. For each ambiguous word, a semantic word expert is automatically trained using a memory-based approach. In each expert, selecting the correct sense of a word in a new context is achieved by finding the closest match to stored examples of this task. Advantages of the approach include (i) fast development time for word experts, (ii) easy and elegant automatic integration of information sources, (iii) use of all available data for training the experts, and (iv) relatively high accuracy with minimal linguistic engineering.