Publication


Featured research published by Veronique Hoste.


Joint Conference on Lexical and Computational Semantics | 2009

SemEval-2010 Task 3: Cross-lingual Word Sense Disambiguation

Els Lefever; Veronique Hoste

We propose a multilingual unsupervised Word Sense Disambiguation (WSD) task for a sample of English nouns. Instead of providing manually sense-tagged examples for each sense of a polysemous noun, our sense inventory is built on the basis of the Europarl parallel corpus. The multilingual setup involves the translations of a given English polysemous noun in five supported languages, viz. Dutch, French, German, Spanish and Italian. The task targets the following goals: (a) the manual creation of a multilingual sense inventory for a lexical sample of English nouns and (b) the evaluation of systems on their ability to disambiguate new occurrences of the selected polysemous nouns. For the creation of the hand-tagged gold standard, all translations of a given polysemous English noun are retrieved in the five languages and clustered by meaning. Systems can participate in 5 bilingual evaluation subtasks (English-Dutch, English-German, etc.) and in a multilingual subtask covering all language pairs. As WSD from cross-lingual evidence is gaining popularity, we believe it is important to create a multilingual gold standard and run cross-lingual WSD benchmark tests.


North American Chapter of the Association for Computational Linguistics | 2016

SemEval-2016 Task 5: Aspect Based Sentiment Analysis

Maria Pontiki; Dimitris Galanis; Haris Papageorgiou; Ion Androutsopoulos; Suresh Manandhar; Mohammad Al-Smadi; Mahmoud Al-Ayyoub; Yanyan Zhao; Bing Qin; Orphée De Clercq; Veronique Hoste; Marianna Apidianaki; Xavier Tannier; Natalia V. Loukachevitch; Evgeniy Kotelnikov; Núria Bel; Salud María Jiménez-Zafra; Gülşen Eryiğit

This paper describes the SemEval 2016 shared task on Aspect Based Sentiment Analysis (ABSA), a continuation of the respective tasks of 2014 and 2015. In its third year, the task provided 19 training and 20 testing datasets for 8 languages and 7 domains, as well as a common evaluation procedure. From these datasets, 25 were for sentence-level and 14 for text-level ABSA; the latter was introduced for the first time as a subtask in SemEval. The task attracted 245 submissions from 29 teams.


Expert Systems With Applications | 2013

Emotion detection in suicide notes

Bart Desmet; Veronique Hoste

The success of suicide prevention, a major public health concern worldwide, hinges on adequate suicide risk assessment. Online platforms are increasingly used for expressing suicidal thoughts, but manual monitoring is unfeasible given the information overload experts are confronted with. We investigate whether the recent advances in natural language processing, and more specifically in sentiment mining, can be used to accurately pinpoint 15 different emotions, which might be indicative of suicidal behavior. A system for automatic emotion detection was built using binary support vector machine classifiers. We hypothesized that lexical and semantic features could be an adequate way to represent the data, as emotions seemed to be lexicalized consistently. The optimal feature combination for each of the different emotions was determined using bootstrap resampling. Spelling correction was applied to the input data, in order to reduce lexical variation. Classification performance varied between emotions, with scores up to 68.86% F-score. F-scores above 40% were achieved for six of the seven most frequent emotions: thankfulness, guilt, love, information, hopelessness and instructions. The most salient features are trigram and lemma bags-of-words and subjectivity clues. Spelling correction had a slightly positive effect on classification performance. We showed that fine-grained automatic emotion detection benefits from classifier optimization and a combined lexico-semantic feature representation. The modest performance improvements obtained through spelling correction might indicate the robustness of the system to noisy input text. We conclude that natural language processing techniques have future application potential for suicide prevention.
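The F-scores reported above are computed per emotion from a binary classifier's decisions. As a minimal sketch of that metric only (not the authors' system or data), the function below computes precision, recall and F1 for one label; the function name and the toy labels are hypothetical and chosen purely for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for a single binary label.

    gold and predicted are equal-length sequences of booleans; True means
    the emotion is annotated in (or predicted for) that sentence.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos = sum(1 for g, p in zip(gold, predicted) if g and p)
    false_pos = sum(1 for g, p in zip(gold, predicted) if not g and p)
    false_neg = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # Guard against empty denominators (no predictions or no gold positives).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy labels for one emotion (e.g. "guilt"), not real data.
gold = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.2%} R={recall:.2%} F1={f1:.2%}")
```

Because F1 is the harmonic mean of precision and recall, a classifier cannot reach a high score by favouring one of the two, which matters for infrequent emotion labels.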


Spyns, P.; Odijk, J. (eds.), Essential Speech and Language Technology for Dutch | 2013

The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch

Nelleke Oostdijk; Martin Reynaert; Veronique Hoste; Ineke Schuurman

The construction of a large and richly annotated corpus of written Dutch was identified as one of the priorities of the STEVIN programme. Such a corpus, sampling texts from conventional and new media, is invaluable for scientific research and application development. The present chapter describes how the Dutch reference corpus was developed in two consecutive STEVIN-funded projects, viz. D-Coi and SoNaR. The construction of the corpus has been guided by (inter)national standards and best practices. At the same time, the achievements and experience gained in the D-Coi and SoNaR projects contributed to the further advancement and dissemination of those standards and practices.


BMC Bioinformatics | 2009

Linguistic feature analysis for protein interaction extraction

Timur Fayruzov; Martine De Cock; Chris Cornelis; Veronique Hoste

Background: The rapid growth of the amount of publicly available reports on biomedical experimental results has recently caused a boost of text mining approaches for protein interaction extraction. Most approaches rely implicitly or explicitly on linguistic, i.e., lexical and syntactic, data extracted from text. However, only few attempts have been made to evaluate the contribution of the different feature types. In this work, we contribute to this evaluation by studying the relative importance of deep syntactic features, i.e., grammatical relations, shallow syntactic features (part-of-speech information) and lexical features. For this purpose, we use a recently proposed approach that uses support vector machines with structured kernels.

Results: Our results reveal that the contribution of the different feature types varies for the different data sets on which the experiments were conducted. The smaller the training corpus compared to the test data, the more important the role of grammatical relations becomes. Moreover, deep syntactic information based classifiers prove to be more robust on heterogeneous texts where no or only limited common vocabulary is shared.

Conclusion: Our findings suggest that grammatical relations play an important role in the interaction extraction task. Moreover, the net advantage of adding lexical and shallow syntactic features is small related to the number of added features. This implies that efficient classifiers can be built by using only a small fraction of the features that are typically being used in recent approaches.


Expert Systems With Applications | 2015

Fine-grained analysis of explicit and implicit sentiment in financial news articles

Marjan Van de Kauter; Diane Breesch; Veronique Hoste

In the financial domain, news has an impact on the stock markets. Most sentiment analysis methods are coarse-grained and focus on explicit sentiment. Such a method is insufficient to detect topic-specific sentiment in financial news articles. We propose a novel fine-grained method that detects explicit and implicit sentiment. This is a viable method for topic-specific sentiment analysis in financial news text.

This paper focuses on topic-specific and more specifically company-specific sentiment analysis in financial newswire text. This application is of great use to researchers in the financial domain who study the impact of news (media) on the stock markets. We investigate the viability of a new fine-grained sentiment annotation scheme. Most of the current approaches to sentiment analysis focus on the detection of explicit sentiment. As news text often contains implicit sentiment, i.e. factual information implying positive or negative sentiment, our approach aims to identify both explicit and implicit sentiment. Furthermore, this sentiment is analyzed on a fine-grained level by detecting the topic of the sentiment, as sentiment is not always expressed towards the topics one is interested in. In order to test our approach, we assembled a corpus of company-specific news articles, which was manually labeled by four annotators to create a gold standard. We compare the results of our method to the performance of two coarse-grained baseline systems: a lexicon-based approach and a supervised machine learning approach that makes use of lexical features. Our fine-grained approach outperforms both baselines, and its output shows substantial to almost perfect agreement with the gold standard sentiment labels. Using our annotation scheme, we are able to filter out irrelevant sentiment expressions and detect explicit and implicit sentiment in a reliable way.
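The phrase "substantial to almost perfect agreement" matches the wording of the Landis and Koch scale commonly used to interpret kappa statistics (0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect). The abstract does not name the agreement measure, so the sketch below assumes Cohen's kappa between system output and gold labels; the function name and the toy labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n

    # Chance agreement: product of each rater's marginal label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: gold vs. system sentiment labels for ten expressions.
gold = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu", "pos", "neg"]
system = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu", "neg", "neg"]
kappa = cohen_kappa(gold, system)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw accuracy for the agreement expected by chance, which is why it is usually preferred over plain percentage agreement when label distributions are skewed.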


Meeting of the Association for Computational Linguistics | 2009

Language-Independent Bilingual Terminology Extraction from a Multilingual Parallel Corpus

Els Lefever; Lieve Macken; Veronique Hoste

We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied on the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of both the alignment and terminology extraction module for three different language pairs (French-English, French-Italian and French-Dutch) and highlight language-pair specific problems (e.g. different compounding strategy in French and Dutch). Comparisons with standard terminology extraction programs show an improvement of up to 20% for bilingual terminology extraction and competitive results (85% to 90% accuracy) for monolingual terminology extraction, and reveal that the linguistically based alignment module is particularly well suited for the extraction of complex multiword terms.


Meeting of the Association for Computational Linguistics | 2002

Evaluating the results of a memory-based word-expert approach to unrestricted word sense disambiguation

Veronique Hoste; Walter Daelemans; Iris Hendrickx; Antal van den Bosch

In this paper, we evaluate the results of the Antwerp University word sense disambiguation system in the English all words task of SENSEVAL-2. In this approach, specialized memory-based word-experts were trained per word-POS combination. Through optimization by cross-validation of the individual component classifiers and the voting scheme for combining them, the best possible word-expert was determined. In the competition, this word-expert architecture resulted in accuracies of 63.6% (fine-grained) and 64.5% (coarse-grained) on the SENSEVAL-2 test data. In order to better understand these results, we investigated whether classifiers trained on different information sources performed differently on the different part-of-speech categories. Furthermore, the results were evaluated in terms of the available number of training items, the number of senses, and the sense distributions in the data set. We conclude that there is no information source which is optimal over all word-experts. Selecting the optimal classifier/voter for each single word-expert, however, leads to major accuracy improvements. We furthermore show that accuracies do not so much depend on the available number of training items, but largely on polysemy and sense distributions.
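The word-expert idea described above can be illustrated with a small sketch: one memory-based (nearest-neighbour) classifier per information source for a single word-POS combination, combined by voting. This is an illustration under stated assumptions, not the Antwerp system: the overlap distance, the plain majority vote, the class and function names, and the toy features for the noun "bank" are all hypothetical, and the cross-validated optimization mentioned in the abstract is omitted.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions on which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class MemoryBasedClassifier:
    """Stores all training instances and classifies by nearest neighbours."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []

    def fit(self, instances, senses):
        self.memory = list(zip(instances, senses))
        return self

    def predict(self, instance):
        # Sort stored examples by distance and let the k nearest ones vote.
        nearest = sorted(
            self.memory, key=lambda item: overlap_distance(item[0], instance)
        )[: self.k]
        return Counter(sense for _, sense in nearest).most_common(1)[0][0]


def majority_vote(predictions):
    """Return the sense predicted by most component classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical word-expert for the noun "bank" with three information sources.
senses = ["shore", "finance", "shore", "finance"]
local_context = [("river", "was"), ("the", "approved"), ("muddy", "of"), ("central", "raised")]
keywords = [(1, 0), (0, 1), (1, 0), (0, 1)]  # presence of "water", "money"
pos_context = [("NN", "VBD"), ("DT", "VBD"), ("JJ", "IN"), ("JJ", "VBD")]

experts = [
    MemoryBasedClassifier().fit(local_context, senses),
    MemoryBasedClassifier().fit(keywords, senses),
    MemoryBasedClassifier().fit(pos_context, senses),
]
test_instance = [("river", "of"), (1, 0), ("JJ", "VBD")]
votes = [expert.predict(features) for expert, features in zip(experts, test_instance)]
predicted_sense = majority_vote(votes)
print(votes, "->", predicted_sense)
```

In this toy case the POS-based classifier disagrees with the other two, and the vote resolves the conflict, which mirrors the abstract's observation that no single information source is optimal for every word-expert.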


Information Sciences | 2010

Clustering web people search results using fuzzy ants

Els Lefever; Timur Fayruzov; Veronique Hoste; M. De Cock

Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that no specification of the number of output clusters is required. This makes the algorithm very well suited for the Web Person Disambiguation task, where we do not know in advance how many individuals each person name refers to. We compare our results with state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and demonstrate favorable results. This is particularly interesting as the latter involve manual setting of a similarity threshold, or estimating the number of clusters in advance, while the fuzzy ant based clustering algorithm does not.


ACM Transactions on Intelligent Systems and Technology | 2016

Multimodular Text Normalization of Dutch User-Generated Content

Sarah Schulz; Guy De Pauw; Orphée De Clercq; Bart Desmet; Veronique Hoste; Walter Daelemans; Lieve Macken

As social media constitutes a valuable source for data analysis for a wide range of applications, the need for handling such data arises. However, the nonstandard language used on social media poses problems for natural language processing (NLP) tools, as these are typically trained on standard language material. We propose a text normalization approach to tackle this problem. More specifically, we investigate the usefulness of a multimodular approach to account for the diversity of normalization issues encountered in user-generated content (UGC). We consider three different types of UGC written in Dutch (SNS, SMS, and tweets) and provide a detailed analysis of the performance of the different modules and the overall system. We also apply an extrinsic evaluation by evaluating the performance of a part-of-speech tagger, lemmatizer, and named-entity recognizer before and after normalization.
