Iris Hendrickx | Researchain

Publication

Featured researches published by Iris Hendrickx.

Natural Language Engineering | 2002

Parameter optimization for machine-learning of word sense disambiguation

Véronique Hoste; Iris Hendrickx; Walter Daelemans; A. van den Bosch

Various Machine Learning (ML) approaches have been demonstrated to produce relatively successful Word Sense Disambiguation (WSD) systems. There are still unexplained differences among the performance measurements of different algorithms, hence it is warranted to deepen the investigation into which algorithm has the right ‘bias’ for this task. In this paper, we show that this is not easy to accomplish, due to intricate interactions between information sources, parameter settings, and properties of the training data. We investigate the impact of parameter optimization on generalization accuracy in a memory-based learning approach to English and Dutch WSD. A ‘word-expert’ architecture was adopted, yielding a set of classifiers, each specialized in one single wordform. The experts consist of multiple memory-based learning classifiers, each taking different information sources as input, combined in a voting scheme. We optimized the architectural and parametric settings for each individual word-expert by performing cross-validation experiments on the learning material. The results of these experiments show that the variation of both the algorithmic parameters and the information sources available to the classifiers leads to large fluctuations in accuracy. We demonstrate that optimization per word-expert leads to an overall significant improvement in the generalization accuracies of the produced WSD systems.
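The per-word-expert optimization described in this abstract can be illustrated with a minimal sketch. It assumes a plain k-nearest-neighbour classifier with an overlap distance as a stand-in for the memory-based learner, leave-one-out cross-validation on each word's training instances, and a single tuned parameter `k`. All function names and the candidate values are hypothetical and are not taken from the paper, which tunes more settings and combines several classifiers by voting.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k):
    """Return the majority sense among the k nearest training instances."""
    nearest = sorted(train, key=lambda inst: overlap_distance(inst[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(instances, k):
    """Leave-one-out accuracy of k-NN on one word's training instances.

    Assumes at least two instances, so the held-out item always has neighbours.
    """
    correct = 0
    for i, (features, label) in enumerate(instances):
        rest = instances[:i] + instances[i + 1:]
        correct += knn_predict(rest, features, k) == label
    return correct / len(instances)


def best_k_per_word(data_by_word, candidate_ks=(1, 3, 5)):
    """Pick, for each word separately, the k with the highest leave-one-out accuracy."""
    return {
        word: max(candidate_ks, key=lambda k: loo_accuracy(instances, k))
        for word, instances in data_by_word.items()
    }
```

Because the selection runs independently for every word, two word-experts can end up with different settings, which is the point the abstract makes about optimizing per expert rather than globally.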

meeting of the association for computational linguistics | 2002

Evaluating the results of a memory-based word-expert approach to unrestricted word sense disambiguation

Veronique Hoste; Walter Daelemans; Iris Hendrickx; Antal van den Bosch

In this paper, we evaluate the results of the Antwerp University word sense disambiguation system in the English all words task of SENSEVAL-2. In this approach, specialized memory-based word-experts were trained per word-POS combination. Through optimization by cross-validation of the individual component classifiers and the voting scheme for combining them, the best possible word-expert was determined. In the competition, this word-expert architecture resulted in accuracies of 63.6% (fine-grained) and 64.5% (coarse-grained) on the SENSEVAL-2 test data. In order to better understand these results, we investigated whether classifiers trained on different information sources performed differently on the different part-of-speech categories. Furthermore, the results were evaluated in terms of the available number of training items, the number of senses, and the sense distributions in the data set. We conclude that there is no information source which is optimal over all word-experts. Selecting the optimal classifier/voter for each single word-expert, however, leads to major accuracy improvements. We furthermore show that accuracies do not so much depend on the available number of training items, but largely on polysemy and sense distributions.

north american chapter of the association for computational linguistics | 2003

Memory-based one-step named-entity recognition: effects of seed list features, classifier stacking, and unannotated data

Iris Hendrickx; Antal van den Bosch

We present a memory-based named-entity recognition system that chunks and labels named entities in a one-shot task. Training and testing on CoNLL-2003 shared task data, we measure the effects of three extensions. First, we incorporate features that signal the presence of wordforms in external, language-specific seed (gazetteer) lists. Second, we build a second-stage stacked classifier that corrects first-stage output errors. Third, we add selected instances from classified unannotated data to the training material. The system that incorporates all three extensions attains an overall F-rate on the final test set of 78.20 on English and 63.02 on German.
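The first extension, seed-list features, could look roughly like the sketch below. It assumes a token-window representation with one boolean flag per seed list. The function name, the feature names and the window size are hypothetical; the abstract does not specify how the features were encoded.

```python
def token_features(tokens, index, seed_lists, window=1):
    """Build features for one token: nearby wordforms plus one flag per seed list.

    seed_lists maps a (hypothetical) list name such as "LOC" to a set of wordforms.
    Positions outside the sentence are padded with "_".
    """
    features = {}
    for offset in range(-window, window + 1):
        pos = index + offset
        features[f"word[{offset}]"] = tokens[pos] if 0 <= pos < len(tokens) else "_"
    for name, entries in seed_lists.items():
        # True when the focus wordform occurs in this seed (gazetteer) list.
        features[f"in_{name}"] = tokens[index] in entries
    return features
```

For example, with `seed_lists = {"LOC": {"Antwerp"}}`, the token "Antwerp" gets `in_LOC` set to `True`, which a downstream classifier can use as evidence for a location label.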

Archive | 2009

Anaphora Processing and Applications

Iris Hendrickx; Sobha Lalitha Devi; António Branco; Ruslan Mitkov

Resolution Methodology: Why Would a Robot Make Use of Pronouns? An Evolutionary Investigation of the Emergence of Pronominal Anaphora; Automatic Recognition of the Function of Singular Neuter Pronouns in Texts and Spoken Data; A Deeper Look into Features for Coreference Resolution.

Computational Applications: Coreference Resolution on Blogs and Commented News; Identification of Similar Documents Using Coherent Chunks.

Language Analysis: Binding without Identity: Towards a Unified Semantics for Bound and Exempt Anaphors; The Doubly Marked Reflexive in Chinese.

Human Processing: Definiteness Marking Shows Late Effects during Discourse Processing: Evidence from ERPs; Pronoun Resolution to Commanders and Recessors: A View from Event-Related Brain Potentials; Effects of Anaphoric Dependencies and Semantic Representations on Pronoun Interpretation.

international conference on natural language generation | 2008

GRAPH: the costs of redundancy in referring expressions

Emiel Krahmer; Mariët Theune; Jette Viethen; Iris Hendrickx

We describe a graph-based generation system that participated in the TUNA attribute selection and realisation task of the REG 2008 Challenge. Using a stochastic cost function (with certain properties for free), and trying attributes from cheapest to more expensive, the system achieves overall .76 DICE and .54 MASI scores for attribute selection on the development set. For realisation, it turns out that in some cases higher attribute selection accuracy leads to larger differences between system-generated and human descriptions.
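The two attribute-selection scores reported above compare a system's attribute set with a human-produced one. The sketch below uses the standard Dice coefficient and the usual definition of MASI (Jaccard overlap weighted by a monotonicity factor). The abstract does not spell out the formulas, so treat the details, including the handling of two empty sets, as assumptions.

```python
def dice(a, b):
    """Dice coefficient between two attribute sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def masi(a, b):
    """MASI: Jaccard overlap scaled by a monotonicity weight.

    Weight is 1 for identical sets, 2/3 when one set contains the other,
    1/3 for a partial overlap, and 0 for disjoint sets.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    jaccard = len(a & b) / len(a | b)
    if a == b:
        weight = 1.0
    elif a <= b or b <= a:
        weight = 2 / 3
    elif a & b:
        weight = 1 / 3
    else:
        weight = 0.0
    return jaccard * weight
```

MASI penalizes a redundant extra attribute more heavily than Dice does: for `{"colour", "type"}` against `{"colour", "type", "size"}`, Dice is 0.8 while MASI is 4/9.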

european conference on machine learning | 2005

Hybrid algorithms with instance-based classification

Iris Hendrickx; Antal van den Bosch

In this paper we aim to show that instance-based classification can replace the classifier component of a rule learner and of maximum-entropy modeling, thereby improving the generalization accuracy of both algorithms. We describe hybrid algorithms that combine rule learning models and maximum-entropy modeling with instance-based classification. Experimental results show that both hybrids are able to outperform the parent algorithm. We analyze and compare the overlap in errors and the statistical bias and variance of the hybrids, their parent algorithms, and a plain instance-based learner. We observe that the successful hybrid algorithms have a lower statistical bias component in the error than their parent algorithms; the fewer errors they make are also less systematic.

Neuroscience Letters | 2009

Reducing Redundancy in Multi-document Summarization Using Lexical Semantic Similarity

Iris Hendrickx; Walter Daelemans; Erwin Marsi; Emiel Krahmer

We present an automatic multi-document summarization system for Dutch based on the MEAD system. We focus on redundancy detection, an essential ingredient of multi-document summarization. We introduce a semantic overlap detection tool, which goes beyond simple string matching. Our results so far do not confirm our expectation that this tool would outperform the other tested methods.

discourse anaphora and anaphor resolution colloquium | 2009

Coreference Resolution on Blogs and Commented News

Iris Hendrickx; Veronique Hoste

We focus on automatic coreference resolution for blogs and news articles with user comments as part of a project on opinion mining. We aim to study the effect of the genre shift from edited, structured newspaper text to unedited, unstructured blog data. We compare our coreference resolution system on three data sets: newspaper articles, mixed newspaper articles and reader comments, and blog data. As can be expected, the performance of the automatic coreference resolution system drops drastically when tested on unedited text. We describe the characteristics of the different data sets and we examine the typical errors made by the resolution system.

discourse anaphora and anaphor resolution colloquium | 2007

Evaluating hybrid versus data-driven coreference resolution

Iris Hendrickx; Véronique Hoste; Walter Daelemans

In this paper, we present a systematic evaluation of a hybrid approach of combined rule-based filtering and machine learning to Dutch coreference resolution. Through the application of a selection of linguistically-motivated negative and positive filters, which we apply in isolation and combined, we study the effect of these filters on precision and recall using two different learning techniques: memory-based learning and maximum entropy modeling. Our results show that by using the hybrid approach, we can reduce the training material by up to 92% without performance loss. We also show that the filters improve the overall precision of the classifiers, leading to higher F-scores on the test set.

international conference on computational linguistics | 2008

Semantic and syntactic features for Dutch coreference resolution

Iris Hendrickx; Veronique Hoste; Walter Daelemans

We investigate the effect of encoding additional semantic and syntactic information sources in a classification-based machine learning approach to the task of coreference resolution for Dutch. We experiment both with a memory-based learning approach and a maximum entropy modeling method. As an alternative to using external lexical resources, such as the low-coverage Dutch EuroWordNet, we evaluate the effect of automatically generated semantic clusters as information source. We compare these clusters, which group together semantically similar nouns, to two semantic features based on EuroWordNet encoding synonym and hypernym relations between nouns. The syntactic function of the anaphor and antecedent in the sentence can be an important clue for resolving coreferential relations. As baseline approach, we encode syntactic information as predicted by a memory-based shallow parser in a set of features. We contrast these shallow-parse-based features with features encoding richer syntactic information from a dependency parser. We show that using both the additional semantic information and syntactic information leads to a small but significant performance improvement of our coreference resolution approach.
