Els Lefever | Researchain

Publication

Featured research published by Els Lefever.

Joint Conference on Lexical and Computational Semantics | 2009

SemEval-2010 Task 3: Cross-lingual Word Sense Disambiguation

Els Lefever; Veronique Hoste

We propose a multilingual unsupervised Word Sense Disambiguation (WSD) task for a sample of English nouns. Instead of providing manually sense-tagged examples for each sense of a polysemous noun, our sense inventory is built up on the basis of the Europarl parallel corpus. The multilingual setup involves the translations of a given English polysemous noun in five supported languages, viz. Dutch, French, German, Spanish and Italian. The task targets the following goals: (a) the manual creation of a multilingual sense inventory for a lexical sample of English nouns and (b) the evaluation of systems on their ability to disambiguate new occurrences of the selected polysemous nouns. For the creation of the hand-tagged gold standard, all translations of a given polysemous English noun are retrieved in the five languages and clustered by meaning. Systems can participate in 5 bilingual evaluation subtasks (English-Dutch, English-German, etc.) and in a multilingual subtask covering all language pairs. As WSD from cross-lingual evidence is gaining popularity, we believe it is important to create a multilingual gold standard and run cross-lingual WSD benchmark tests.

Meeting of the Association for Computational Linguistics | 2009

Language-Independent Bilingual Terminology Extraction from a Multilingual Parallel Corpus

Els Lefever; Lieve Macken; Veronique Hoste

We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied on the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of both the alignment and terminology extraction module for three different language pairs (French-English, French-Italian and French-Dutch) and highlight language-pair specific problems (e.g. different compounding strategy in French and Dutch). Comparisons with standard terminology extraction programs show an improvement of up to 20% for bilingual terminology extraction and competitive results (85% to 90% accuracy) for monolingual terminology extraction, and reveal that the linguistically based alignment module is particularly well suited for the extraction of complex multiword terms.

North American Chapter of the Association for Computational Linguistics | 2016

SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)

Georgeta Bordea; Els Lefever; Paul Buitelaar

This paper describes the second edition of the shared task on Taxonomy Extraction Evaluation organised as part of SemEval 2016. This task aims to extract hypernym-hyponym relations between a given list of domain-specific terms and then to construct a domain taxonomy based on them. TExEval-2 introduced a multilingual setting for this task, covering four languages (English, Dutch, Italian and French) and domains as diverse as environment, food and science. A total of 62 runs submitted by 5 different teams were evaluated using structural measures, by comparison with gold standard taxonomies and by manual quality assessment of novel relations.

Information Sciences | 2010

Clustering web people search results using fuzzy ants

Els Lefever; Timur Fayruzov; Veronique Hoste; M. De Cock

Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that no specification of the number of output clusters is required. This makes the algorithm very well suited for the Web Person Disambiguation task, where we do not know in advance how many individuals each person name refers to. We compare our results with state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and demonstrate favorable results. This is particularly interesting as the latter involve manual setting of a similarity threshold, or estimating the number of clusters in advance, while the fuzzy ant based clustering algorithm does not.

International Conference on Computational Linguistics | 2008

Linguistically-Based Sub-Sentential Alignment for Terminology Extraction from a Bilingual Automotive Corpus

Lieve Macken; Els Lefever; Veronique Hoste

We present a sub-sentential alignment system that links linguistically motivated phrases in parallel texts based on lexical correspondences and syntactic similarity. We compare the performance of our sub-sentential alignment system with different symmetrization heuristics that combine the GIZA++ alignments of both translation directions. We demonstrate that the aligned linguistically motivated phrases are a useful means to extract bilingual terminology and more specifically complex multiword terms.
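As a rough illustration of what a symmetrization heuristic does, the sketch below combines two directional word alignments by intersection and by union. The abstract does not say which heuristics were compared, so the function name `symmetrize` and the index-pair representation are assumptions made for this example only.

```python
def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two directional word alignments (hypothetical helper).

    src_to_tgt: iterable of (source_index, target_index) links
    tgt_to_src: iterable of (target_index, source_index) links
    Returns (intersection, union) as sets of (source_index, target_index).
    """
    forward = set(src_to_tgt)
    # Flip the reverse-direction links so both sets use (source, target) order.
    backward = {(s, t) for (t, s) in tgt_to_src}
    # Intersection keeps only links found in both directions (high precision);
    # union keeps links found in either direction (high recall).
    return forward & backward, forward | backward


if __name__ == "__main__":
    inter, union = symmetrize([(0, 0), (1, 2), (2, 1)], [(0, 0), (2, 1), (3, 3)])
    print(sorted(inter))
    print(sorted(union))
```

Intersection and union are the two extremes; heuristics that start from the intersection and selectively add links from the union sit between them.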

North American Chapter of the Association for Computational Linguistics | 2015

LT3: A Multi-modular Approach to Automatic Taxonomy Construction

Els Lefever

This paper describes our contribution to the SemEval-2015 task 17 on “Taxonomy Extraction Evaluation”. We propose a hypernym detection system combining three modules: a lexico-syntactic pattern matcher, a morphosyntactic analyzer and a module retrieving hypernym relations from structured lexical resources. Our system ranked first in the competition when considering the gold standard and manual evaluation, and second in the overall ranking. In addition, the experimental results show that all modules contribute to finding hypernym relations between terms.
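To give a feel for lexico-syntactic pattern matching, here is a minimal sketch built around a single "X such as Y, Z and W" pattern. It is not the system's pattern set, which the abstract does not list; the names `SUCH_AS` and `extract_hypernym_pairs` are hypothetical, and the sketch only handles single-word terms.

```python
import re

# One illustrative pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Assumption: terms are single words; real systems match multiword noun phrases.
SUCH_AS = re.compile(
    r"(\w+) such as (\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)
# Separators inside the hyponym list: commas and the conjunctions "and" / "or".
SEPARATOR = re.compile(r"\s*(?:,|\band\b|\bor\b)\s*")


def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in SEPARATOR.split(match.group(2)):
            if hyponym:  # skip empty strings produced by ", and"
                pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    print(extract_hypernym_pairs("They sell fruits such as apples, pears and bananas."))
```

A pattern matcher like this is high precision but low recall, which is one reason to combine it with other modules, as the abstract describes.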

Meeting of the Association for Computational Linguistics | 2016

Very quaffable and great fun: Applying NLP to wine reviews

Iris Hendrickx; Els Lefever; Ilja Croijmans; Asifa Majid; Antal van den Bosch

People find it difficult to name odors and flavors. In blind tests with everyday smells and tastes like orange or chocolate, only 50% are recognized and described correctly. Certain experts like wine reviewers are trained in recognizing and reporting on odor and flavor on a daily basis and they have a much larger vocabulary than lay people. In this research, we want to examine whether expert wine tasters provide consistent descriptions in terms of perceived sensory attributes of wine across various wine types and colors. We collected a corpus of wine reviews and performed preliminary experiments to analyse the semantic fields of "flavor" and "odor" in wine reviews. To do so, we applied distributional methods as well as pattern-based approaches. In addition, we show the first results of automatically predicting "color" and "region" of a particular wine, solely based on the reviewer’s text. Our classifiers perform very well when predicting red and white wines, whereas it seems more challenging to distinguish rosé wines.

North American Chapter of the Association for Computational Linguistics | 2015

LT3: Sentiment Analysis of Figurative Tweets: piece of cake #NotReally

Cynthia Van Hee; Els Lefever; Veronique Hoste

This paper describes our contribution to the SemEval-2015 Task 11 on sentiment analysis of figurative language in Twitter. We considered two approaches, classification and regression, to provide fine-grained sentiment scores for a set of tweets that are rich in sarcasm, irony and metaphor. To this end, we combined a variety of standard lexical and syntactic features with specific features for capturing figurative content. All experiments were done using supervised learning with LIBSVM. For both runs, our system ranked fourth among fifteen submissions.

International Conference on Computational Linguistics | 2014

LT3: Sentiment Classification in User-Generated Content Using a Rich Feature Set

Cynthia Van Hee; Marjan Van de Kauter; Orphée De Clercq; Els Lefever; Veronique Hoste

This paper describes our contribution to the SemEval-2014 Task 9 on sentiment analysis in Twitter. We participated in both strands of the task, viz. classification at message-level (subtask B), and polarity disambiguation of particular text spans within a message (subtask A). Our experiments with a variety of lexical and syntactic features show that our systems benefit from rich feature sets for sentiment analysis on user-generated content. Our systems ranked ninth among 27 and sixteenth among 50 submissions for subtasks A and B, respectively.

PLOS ONE | 2018

Automatic Detection of Cyberbullying in Social Media Text

Cynthia Van Hee; Gilles Jacobs; Chris Emmery; Bart Desmet; Els Lefever; Ben Verhoeven; Guy De Pauw; Walter Daelemans; Veronique Hoste

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute most to the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.
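For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall on the positive class. The sketch below computes it from raw counts; the function name `f1_score` and the example counts are illustrative assumptions, not figures from the paper.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 for the positive class, computed from confusion-matrix counts."""
    predicted = true_pos + false_pos   # everything the classifier flagged
    actual = true_pos + false_neg      # everything that is truly positive
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean: penalises a large gap between precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(round(f1_score(true_pos=8, false_pos=2, false_neg=2), 2))
```

Because true negatives do not enter the formula, F1 is a common choice for tasks where the positive class is rare, as is typical for detection of harmful posts.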
