
Publications


Featured research published by Véronique Hoste.


Natural Language Engineering | 2002

Parameter optimization for machine-learning of word sense disambiguation

Véronique Hoste; Iris Hendrickx; Walter Daelemans; Antal van den Bosch

Various Machine Learning (ML) approaches have been demonstrated to produce relatively successful Word Sense Disambiguation (WSD) systems. There are still unexplained differences among the performance measurements of different algorithms, so further investigation into which algorithm has the right ‘bias’ for this task is warranted. In this paper, we show that this is not easy to accomplish, due to intricate interactions between information sources, parameter settings, and properties of the training data. We investigate the impact of parameter optimization on generalization accuracy in a memory-based learning approach to English and Dutch WSD. A ‘word-expert’ architecture was adopted, yielding a set of classifiers, each specialized in one single wordform. The experts consist of multiple memory-based learning classifiers, each taking different information sources as input, combined in a voting scheme. We optimized the architectural and parametric settings for each individual word-expert by performing cross-validation experiments on the learning material. The results of these experiments show that the variation of both the algorithmic parameters and the information sources available to the classifiers leads to large fluctuations in accuracy. We demonstrate that optimization per word-expert leads to an overall significant improvement in the generalization accuracies of the produced WSD systems.
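
The per-word-expert optimization described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: scikit-learn's KNeighborsClassifier stands in for the memory-based learner, and the parameter grid, cross-validation setup, and data are placeholders.

```python
# Minimal sketch of per-word-expert parameter optimization via cross-validation.
# A k-NN classifier stands in for the memory-based learner used in the paper;
# feature matrices (X) and sense labels (y) per word form are assumed to exist.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def optimize_word_expert(X, y):
    """Cross-validate algorithmic parameters for one ambiguous word form."""
    param_grid = {
        "n_neighbors": [1, 3, 5, 7, 11],     # size of the nearest-neighbour set
        "weights": ["uniform", "distance"],  # neighbour weighting scheme
    }
    search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
    search.fit(X, y)
    return search.best_estimator_, search.best_params_, search.best_score_

# One expert per word form, each with its own optimal settings:
# experts = {word: optimize_word_expert(Xw, yw) for word, (Xw, yw) in data.items()}
```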


Meeting of the Association for Computational Linguistics | 2003

Learning to Predict Pitch Accents and Prosodic Boundaries in Dutch

Erwin Marsi; Martin Reynaert; Antal van den Bosch; Walter Daelemans; Véronique Hoste

We train a decision tree inducer (CART) and a memory-based classifier (MBL) on predicting prosodic pitch accents and breaks in Dutch text, on the basis of shallow, easy-to-compute features. We train the algorithms on both tasks individually and on the two tasks simultaneously. The parameters of both algorithms and the selection of features are optimized per task with iterative deepening, an efficient wrapper procedure that uses progressive sampling of training data. Results show a consistent significant advantage of MBL over CART, and also indicate that the tasks can be combined with little loss in generalization score. Tests on cross-validated data and on held-out data yield F-scores for MBL on accent placement of 84 and 87, respectively, and on breaks of 88 and 91, respectively. Accent placement is shown to outperform an informed baseline rule; reliably predicting breaks other than those already indicated by intra-sentential punctuation, however, appears to be more challenging.
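
The iterative-deepening wrapper can be approximated as follows. This is only a sketch of the general idea under simplifying assumptions: candidate feature subsets are scored on progressively larger samples and the weakest half is pruned each round; the classifier, sample sizes, and scoring are illustrative, not the settings from the paper.

```python
# Sketch of a wrapper search with progressive sampling (iterative deepening):
# candidate feature subsets are scored on growing samples of the training data,
# and the weakest half is discarded after every round. Classifier, sample sizes
# and cv folds are illustrative choices, not those used in the paper.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def iterative_deepening(X, y, candidates, sample_sizes=(500, 2000, 8000)):
    """X, y: NumPy arrays; candidates: list of tuples of feature indices."""
    survivors = list(candidates)
    rng = np.random.default_rng(0)
    for n in sample_sizes:
        idx = rng.choice(len(y), size=min(n, len(y)), replace=False)
        scored = []
        for feats in survivors:
            score = cross_val_score(KNeighborsClassifier(),
                                    X[np.ix_(idx, feats)], y[idx], cv=3).mean()
            scored.append((score, feats))
        scored.sort(reverse=True)                       # best candidates first
        survivors = [f for _, f in scored[: max(1, len(scored) // 2)]]
    return survivors[0]                                 # best subset found
```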


Discourse Anaphora and Anaphor Resolution Colloquium | 2007

Evaluating hybrid versus data-driven coreference resolution

Iris Hendrickx; Véronique Hoste; Walter Daelemans

In this paper, we present a systematic evaluation of a hybrid approach to Dutch coreference resolution that combines rule-based filtering with machine learning. Through the application of a selection of linguistically motivated negative and positive filters, which we apply in isolation and in combination, we study the effect of these filters on precision and recall using two different learning techniques: memory-based learning and maximum entropy modeling. Our results show that with the hybrid approach we can discard up to 92% of the training material without loss of performance. We also show that the filters improve the overall precision of the classifiers, leading to higher F-scores on the test set.
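
A minimal sketch of the hybrid setup, assuming a mention-pair formulation: a rule-based filter removes candidate pairs before a learner is trained on what remains. The number-agreement filter and the featurize helper are hypothetical placeholders, and LogisticRegression stands in for the maximum entropy learner.

```python
# Hybrid coreference sketch: rule-based filtering prunes candidate mention
# pairs before the data-driven learner is trained on the survivors. The filter
# and the featurize() helper are hypothetical placeholders; LogisticRegression
# stands in for the maximum entropy learner mentioned in the abstract.
from sklearn.linear_model import LogisticRegression

def number_agreement_filter(antecedent, anaphor):
    """Negative filter: discard pairs whose number labels clash."""
    return antecedent["number"] == anaphor["number"]

def train_hybrid_resolver(candidate_pairs, featurize):
    """candidate_pairs: [(antecedent, anaphor, label), ...] with dict mentions."""
    kept = [(a, b, lab) for a, b, lab in candidate_pairs
            if number_agreement_filter(a, b)]            # rule-based step
    X = [featurize(a, b) for a, b, _ in kept]            # numeric feature vectors
    y = [lab for _, _, lab in kept]
    return LogisticRegression(max_iter=1000).fit(X, y)   # data-driven step
```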


Computer Speech & Language | 2004

Using rule-induction techniques to model pronunciation variation in Dutch

Véronique Hoste; Walter Daelemans; Steven Gillis

In this paper, we present an inductive approach to the automatic extraction of knowledge about inter-regional pronunciation variation. We compare two different rule-induction techniques, both popular in language engineering applications, viz. the rule sequence learner Transformation-Based Error-Driven Learning (TBEDL; Brill, 1995) and the decision tree learner C5.0 (Quinlan, 1993). We investigate whether both techniques detect the same regularities and evaluate the extracted rules in terms of accuracy and in terms of linguistic relevance. As a case study, we apply the approach to Dutch and Flemish (the variety of Dutch spoken in Flanders, a part of Belgium), based on Celex and Fonilex, pronunciation lexica for Dutch and Flemish, respectively. Our main goal is to show that this approach allows the automatic acquisition of compact, interpretable translation rules between pronunciation varieties, on the basis of phonemic representations of words in both varieties (as output of phoneme recognition or, as in our case, on the basis of existing lexica). We also show that the observed differences coincide with the tendencies studied and described in linguistic comparative research of inter-regional pronunciation variation in standard Dutch.
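
The rule-induction step can be illustrated with a decision tree over aligned phoneme pairs. This sketch uses scikit-learn's CART-style tree as a stand-in for C5.0; the context features and the aligned Celex–Fonilex data are assumed, not reproduced from the paper.

```python
# Sketch of rule induction over aligned phonemes: a decision tree (scikit-learn's
# CART-style learner, standing in for C5.0) maps a Dutch phoneme plus its local
# context onto the corresponding Flemish phoneme, and the tree is printed as
# readable if-then rules. The aligned lexicon data is assumed to exist.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

def induce_rules(aligned_pairs):
    """aligned_pairs: [({'left': ..., 'focus': ..., 'right': ...}, target), ...]"""
    contexts, targets = zip(*aligned_pairs)
    vec = DictVectorizer(sparse=False)
    X = vec.fit_transform(contexts)
    tree = DecisionTreeClassifier(min_samples_leaf=5).fit(X, targets)
    return export_text(tree, feature_names=list(vec.get_feature_names_out()))
```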


Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme (Peter Spyns et al., eds.) | 2013

COREA: Coreference Resolution for Extracting Answers for Dutch

Iris Hendrickx; Gosse Bouma; Walter Daelemans; Véronique Hoste

Many natural language processing applications can benefit from the identification of coreference relations. For example, in information extraction and question answering, recall should in principle increase when the available information is expanded by linking expressions in a text that refer to the same discourse entity. Most current state-of-the-art systems for coreference resolution are based on supervised machine learning, and require (large) amounts of annotated data for training and testing. As these data were lacking for Dutch, the COREA project had as its goal the construction of a coreference corpus for Dutch, and the development of an automatic coreference resolution system that used the corpus as training material. In this paper, we present the results of our annotation efforts, and the design of the automatic resolution system. Furthermore, we discuss various experiments that were carried out to determine the accuracy of the resolution system and the potential benefit of incorporating coreference resolution in NLP tasks.
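
The recall argument can be made concrete with a toy substitution step: once coreference chains are resolved, pronominal mentions can be rewritten with a representative mention of their entity, so that downstream extraction or QA queries match more sentences. The chain representation below is a simplifying assumption, not the COREA format.

```python
# Toy illustration of how resolved coreference can raise recall downstream:
# pronouns are replaced by a representative mention of their entity, so a
# keyword-style query matches more sentences. The chain format is assumed.
def expand_with_chains(tokens, chains):
    """chains: {token index of a pronoun: representative mention string}."""
    return [chains.get(i, tok) for i, tok in enumerate(tokens)]

tokens = ["It", "was", "trained", "on", "the", "annotated", "corpus"]
chains = {0: "the resolution system"}
print(" ".join(expand_with_chains(tokens, chains)))
# -> "the resolution system was trained on the annotated corpus"
```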


Proceedings of the Eleventh Belgian-Dutch Conference on Machine Learning (BENELEARN 2001), December 21, Antwerp, Belgium | 2001

Rediscovering workflow models from event-based data

A.J.M.M. Weijters; W.M.P. van der Aalst; Véronique Hoste; G. de Pauw


Language Resources and Evaluation | 2002

Evaluation of Machine Learning Methods for Natural Language Processing Tasks

Walter Daelemans; Véronique Hoste


European Conference on Machine Learning | 2003

Combined optimization of feature selection and algorithm parameter interaction in machine learning of language

Walter Daelemans; Véronique Hoste; F. De Meulder; Bart Naudts


Meeting of the Association for Computational Linguistics | 2001

Classifier Optimization and Combination in the English All Words Task

Véronique Hoste; Anne Kool; Walter Daelemans


International Conference on Machine Learning | 2005

Comparing learning approaches to coreference resolution: there is more to it than 'bias'

Véronique Hoste; Walter Daelemans

Collaboration


Dive into Véronique Hoste's collaborations.

Top Co-Authors

Gosse Bouma

University of Groningen
