Els Lefever
Ghent University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Els Lefever.
joint conference on lexical and computational semantics | 2009
Els Lefever; Veronique Hoste
We propose a multilingual unsupervised Word Sense Disambiguation (WSD) task for a sample of English nouns. Instead of providing manually sensetagged examples for each sense of a polysemous noun, our sense inventory is built up on the basis of the Europarl parallel corpus. The multilingual setup involves the translations of a given English polysemous noun in five supported languages, viz. Dutch, French, German, Spanish and Italian. The task targets the following goals: (a) the manual creation of a multilingual sense inventory for a lexical sample of English nouns and (b) the evaluation of systems on their ability to disambiguate new occurrences of the selected polysemous nouns. For the creation of the hand-tagged gold standard, all translations of a given polysemous English noun are retrieved in the five languages and clustered by meaning. Systems can participate in 5 bilingual evaluation subtasks (English -- Dutch, English -- German, etc.) and in a multilingual subtask covering all language pairs. As WSD from cross-lingual evidence is gaining popularity, we believe it is important to create a multilingual gold standard and run cross-lingual WSD benchmark tests.
meeting of the association for computational linguistics | 2009
Els Lefever; Lieve Macken; Veronique Hoste
We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied on the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of both the alignment and terminology extraction module for three different language pairs (French-English, French-Italian and French-Dutch) and highlight language-pair specific problems (e.g. different compounding strategy in French and Dutch). Comparisons with standard terminology extraction programs show an improvement of up to 20% for bilingual terminology extraction and competitive results (85% to 90% accuracy) for monolingual terminology extraction, and reveal that the linguistically based alignment module is particularly well suited for the extraction of complex multiword terms.
north american chapter of the association for computational linguistics | 2016
Georgeta Bordea; Els Lefever; Paul Buitelaar
This paper describes the second edition of the shared task on Taxonomy Extraction Evaluation organised as part of SemEval 2016. This task aims to extract hypernym-hyponym relations between a given list of domain-specific terms and then to construct a domain taxonomy based on them. TExEval-2 introduced a multilingual setting for this task, covering four different languages including English, Dutch, Italian and French from domains as diverse as environment, food and science. A total of 62 runs submitted by 5 different teams were evaluated using structural measures, by comparison with gold standard taxonomies and by manual quality assessment of novel relations.
Information Sciences | 2010
Els Lefever; Timur Fayruzov; Veronique Hoste; M. De Cock
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that no specification of the number of output clusters is required. This makes the algorithm very well suited for the Web Person Disambiguation task, where we do not know in advance how many individuals each person name refers to. We compare our results with state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and demonstrate favorable results. This is particularly interesting as the latter involve manual setting of a similarity threshold, or estimating the number of clusters in advance, while the fuzzy ant based clustering algorithm does not.
international conference on computational linguistics | 2008
Lieve Macken; Els Lefever; Veronique Hoste
We present a sub-sentential alignment system that links linguistically motivated phrases in parallel texts based on lexical correspondences and syntactic similarity. We compare the performance of our sub-sentential alignment system with different symmetrization heuristics that combine the GIZA++ alignments of both translation directions. We demonstrate that the aligned linguistically motivated phrases are a useful means to extract bilingual terminology and more specifically complex multiword terms.
north american chapter of the association for computational linguistics | 2015
Els Lefever
This paper describes our contribution to the SemEval-2015 task 17 on “Taxonomy Extraction Evaluation”. We propose a hypernym detection system combining three modules: a lexico-syntactic pattern matcher, a morphosyntactic analyzer and a module retrieving hypernym relations from structured lexical resources. Our system ranked first in the competition when considering the gold standard and manual evaluation, and second in the overall ranking. In addition, the experimental results show that all modules contribute to finding hypernym relations between terms.
meeting of the association for computational linguistics | 2016
Iris Hendrickx; Els Lefever; Ilja Croijmans; Asifa Majid; Antal van den Bosch
People find it difficult to name odors and flavors. In blind tests with everyday smells and tastes like orange or chocolate, only 50% is recognized and described correctly. Certain experts like wine reviewers are trained in recognizing and reporting on odor and flavor on a daily basis and they have a much larger vocabulary than lay people. In this research, we want to examine whether expert wine tasters provide consistent descriptions in terms of perceived sensory attributes of wine, both across various wine types and colors. We collected a corpus of wine reviews and performed preliminary experiments to analyse the semantic fields of „flavor” and „odor” in wine reviews. To do so, we applied distributional methods as well as pattern-based approaches. In addition, we show the first results of automatically predicting „color” and „region” of a particular wine, solely based on the reviewer’s text. Our classifiers perform very well when predicting red and white wines, whereas it seems more challenging to distinguish rose wines.
north american chapter of the association for computational linguistics | 2015
Cynthia Van Hee; Els Lefever; Veronique Hoste
This paper describes our contribution to the SemEval-2015 Task 11 on sentiment analysis of figurative language in Twitter. We considered two approaches, classification and regression, to provide fine-grained sentiment scores for a set of tweets that are rich in sarcasm, irony and metaphor. To this end, we combined a variety of standard lexical and syntactic features with specific features for capturing figurative content. All experiments were done using supervised learning with LIBSVM. For both runs, our system ranked fourth among fifteen submissions.
international conference on computational linguistics | 2014
Cynthia Van Hee; Marjan Van de Kauter; Orphée De Clercq; Els Lefever; Veronique Hoste
This paper describes our contribution to the SemEval-2014 Task 9 on sentiment analysis in Twitter. We participated in both strands of the task, viz. classification at message-level (subtask B), and polarity disambiguation of particular text spans within a message (subtask A). Our experiments with a variety of lexical and syntactic features show that our systems benefit from rich feature sets for sentiment analysis on user-generated content. Our systems ranked ninth among 27 and sixteenth among 50 submissions for task A and B respectively.
PLOS ONE | 2018
Cynthia Van Hee; Gilles Jacobs; Chris Emmery; Bart Desmet; Els Lefever; Ben Verhoeven; Guy De Pauw; Walter Daelemans; Veronique Hoste
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.