Bart Desmet | Researchain

Publication

Featured research published by Bart Desmet.

Expert Systems With Applications | 2013

Emotion detection in suicide notes

Bart Desmet; Veronique Hoste

The success of suicide prevention, a major public health concern worldwide, hinges on adequate suicide risk assessment. Online platforms are increasingly used for expressing suicidal thoughts, but manual monitoring is unfeasible given the information overload experts are confronted with. We investigate whether the recent advances in natural language processing, and more specifically in sentiment mining, can be used to accurately pinpoint 15 different emotions, which might be indicative of suicidal behavior. A system for automatic emotion detection was built using binary support vector machine classifiers. We hypothesized that lexical and semantic features could be an adequate way to represent the data, as emotions seemed to be lexicalized consistently. The optimal feature combination for each of the different emotions was determined using bootstrap resampling. Spelling correction was applied to the input data, in order to reduce lexical variation. Classification performance varied between emotions, with scores up to 68.86% F-score. F-scores above 40% were achieved for six of the seven most frequent emotions: thankfulness, guilt, love, information, hopelessness and instructions. The most salient features are trigram and lemma bags-of-words and subjectivity clues. Spelling correction had a slightly positive effect on classification performance. We showed that fine-grained automatic emotion detection benefits from classifier optimization and a combined lexico-semantic feature representation. The modest performance improvements obtained through spelling correction might indicate the robustness of the system to noisy input text. We conclude that natural language processing techniques have future application potential for suicide prevention.
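
The abstract above says the best feature combination per emotion was chosen with bootstrap resampling, but does not spell out the procedure. The following is a minimal sketch of one common way to compare two systems by F-score under bootstrap resampling. The function names (`f_score`, `bootstrap_win_rate`), the number of samples and the win-rate criterion are assumptions for illustration, not the authors' implementation.

```python
import random


def f_score(gold, pred):
    """F1 for the positive class of a binary task; 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_win_rate(gold, pred_a, pred_b, n_samples=1000, seed=0):
    """Fraction of bootstrap samples on which system A has a strictly higher F1 than B.

    Each sample draws len(gold) instances with replacement, so both systems
    are always scored on the same resampled instances.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the sketch reproducible
    n = len(gold)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        if f_score(g, a) > f_score(g, b):
            wins += 1
    return wins / n_samples
```

A win rate close to 1.0 for feature set A over feature set B suggests that the difference is not an artefact of one particular test sample; a rate near 0.5 suggests the two are hard to tell apart.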

ACM Transactions on Intelligent Systems and Technology | 2016

Multimodular Text Normalization of Dutch User-Generated Content

Sarah Schulz; Guy De Pauw; Orphée De Clercq; Bart Desmet; Veronique Hoste; Walter Daelemans; Lieve Macken

As social media constitutes a valuable source for data analysis for a wide range of applications, the need for handling such data arises. However, the nonstandard language used on social media poses problems for natural language processing (NLP) tools, as these are typically trained on standard language material. We propose a text normalization approach to tackle this problem. More specifically, we investigate the usefulness of a multimodular approach to account for the diversity of normalization issues encountered in user-generated content (UGC). We consider three different types of UGC written in Dutch (SNS, SMS, and tweets) and provide a detailed analysis of the performance of the different modules and the overall system. We also apply an extrinsic evaluation by evaluating the performance of a part-of-speech tagger, lemmatizer, and named-entity recognizer before and after normalization.

Language Resources and Evaluation | 2015

The good, the bad and the implicit: a comprehensive approach to annotating explicit and implicit sentiment

Marjan Van de Kauter; Bart Desmet; Veronique Hoste

We present a fine-grained scheme for the annotation of polar sentiment in text, that accounts for explicit sentiment (so-called private states), as well as implicit expressions of sentiment (polar facts). Polar expressions are annotated below sentence level and classified according to their subjectivity status. Additionally, they are linked to one or more targets with a specific polar orientation and intensity. Other components of the annotation scheme include source attribution and the identification and classification of expressions that modify polarity. In previous research, little attention has been given to implicit sentiment, which represents a substantial amount of the polar expressions encountered in our data. An English and Dutch corpus of financial newswire text, consisting of over 45,000 words each, was annotated using our scheme. A subset of this corpus was used to conduct an inter-annotator agreement study, which demonstrated that the proposed scheme can be used to reliably annotate explicit and implicit sentiment in real-world textual data, making the created corpora a useful resource for sentiment analysis.

Workshop on Statistical Machine Translation | 2015

UGENT-LT3 SCATE System for Machine Translation Quality Estimation

Arda Tezcan; Veronique Hoste; Bart Desmet; Lieve Macken

This paper describes the submission of the UGENT-LT3 SCATE system to the WMT15 Shared Task on Quality Estimation (QE), viz. English-Spanish word and sentence-level QE. We conceived QE as a supervised Machine Learning (ML) problem, designed additional features, and combined these with the baseline feature set to estimate quality. The sentence-level QE system re-uses the predictions of the word-level QE system. We experimented with different learning methods and observed improvements over the baseline system for word-level QE with the use of the new features and by combining learning methods into ensembles. For sentence-level QE we show that using a single feature based on word-level predictions can perform better than the baseline system, and that using this in combination with additional features led to further improvements in performance.

Biomedical Informatics Insights | 2012

Combining Lexico-semantic Features for Emotion Classification in Suicide Notes

Bart Desmet; Veronique Hoste

This paper describes a system for automatic emotion classification, developed for the 2011 i2b2 Natural Language Processing Challenge, Track 2. The objective of the shared task was to label suicide notes with 15 relevant emotions on the sentence level. Our system uses 15 SVM models (one for each emotion) using the combination of features that was found to perform best on a given emotion. Features included lemmas and trigram bag of words, and information from semantic resources such as WordNet, SentiWordNet and subjectivity clues. The best-performing system labeled 7 of the 15 emotions and achieved an F-score of 53.31% on the test data.

PLOS ONE | 2018

Automatic Detection of Cyberbullying in Social Media Text

Cynthia Van Hee; Gilles Jacobs; Chris Emmery; Bart Desmet; Els Lefever; Ben Verhoeven; Guy De Pauw; Walter Daelemans; Veronique Hoste

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.

Information Sciences | 2018

Online suicide prevention through optimised text classification

Bart Desmet; Veronique Hoste

Online communication platforms are increasingly used to express suicidal thoughts. There is considerable interest in monitoring such messages, both for population-wide and individual prevention purposes, and to inform suicide research and policy. Online information overload prohibits manual detection, which is why keyword search methods are typically used. However, these are imprecise and unable to handle implicit references or linguistic noise. As an alternative, this study investigates supervised text classification to model and detect suicidality in Dutch-language forum posts. Genetic algorithms were used to optimise models through feature selection and hyperparameter optimisation. A variety of features was found to be informative, including token and character ngram bags-of-words, presence of salient suicide-related terms and features based on LSA topic models and polarity lexicons. The results indicate that text classification is a viable and promising strategy for detecting suicide-related and alarming messages, with F-scores comparable to human annotators (93% for relevant messages, 70% for severe messages). Both types of messages can be detected with high precision and minimal noise, even on large high-skew corpora. This suggests that they would be fit for use in a real-world prevention setting.
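
Among the features listed above are token and character n-gram bags-of-words. As a rough illustration of what such features look like, the sketch below counts n-grams in a single post. The function names, the whitespace tokenisation and the lowercasing are assumptions made for the example; the paper's actual preprocessing is not described in the abstract.

```python
from collections import Counter


def char_ngrams(text, n):
    """All overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def token_ngrams(text, n):
    """All overlapping token n-grams, using naive whitespace tokenisation."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n, unit="char"):
    """Map each n-gram to its frequency in text; unit is "char" or "token"."""
    if unit == "char":
        grams = char_ngrams(text.lower(), n)
    else:
        grams = token_ngrams(text, n)
    return Counter(grams)
```

Character n-grams tend to be more tolerant of spelling variation than token n-grams, since a misspelled word still shares most of its character sequences with the standard form.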

North American Chapter of the Association for Computational Linguistics | 2016

Mental distress detection and triage in forum posts: the LT3 CLPsych 2016 shared task system

Bart Desmet; Gilles Jacobs; Veronique Hoste

This paper describes the contribution of LT3 for the CLPsych 2016 Shared Task on automatic triage of mental health forum posts. Our systems use multiclass Support Vector Machines (SVM), cascaded binary SVMs and ensembles with a rich feature set. The best systems obtain macro-averaged F-scores of 40% on the full task and 80% on the green versus alarming distinction. Multiclass SVMs with all features score best in terms of F-score, whereas feature filtering with bi-normal separation and classifier ensembling are found to improve recall of alarming posts.
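
The abstract above mentions feature filtering with bi-normal separation (BNS). As a minimal sketch, BNS scores a binary feature by the distance between the inverse normal CDF of its true-positive rate and that of its false-positive rate. The function name `bns` and the clipping constant `eps` are assumptions for this example (clipping rates away from 0 and 1 is a conventional safeguard, since the inverse CDF is unbounded there); the system's actual settings are not given in the abstract.

```python
from statistics import NormalDist


def bns(tp, fp, pos, neg, eps=0.0005):
    """Bi-normal separation score of one binary feature.

    tp:  positive-class documents that contain the feature
    fp:  negative-class documents that contain the feature
    pos: total positive-class documents
    neg: total negative-class documents
    eps: clipping bound for the rates (assumed value)
    """
    inv_cdf = NormalDist().inv_cdf  # inverse CDF of the standard normal
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv_cdf(tpr) - inv_cdf(fpr))
```

Features would then be ranked by this score and only the highest-scoring ones kept. Because of the absolute value, a feature that is strongly associated with the negative class scores as high as one strongly associated with the positive class.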

Computational Linguistics in the Netherlands | 2013