Steven Van Canneyt | Researchain

Publication

Featured research published by Steven Van Canneyt.

Pattern Recognition Letters | 2016

Representation learning for very short texts using weighted word embedding aggregation

Cedric De Boom; Steven Van Canneyt; Thomas Demeester; Bart Dhoedt

We create text representations by weighting word embeddings using idf information. A novel median-based loss is designed to mitigate the negative effect of outliers. A dataset of semantically related textual pairs from Wikipedia and Twitter is made. Our method outperforms all word embedding baselines in a semantic similarity task. Our method is out-of-the-box and thus requires no retraining in different contexts.

Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, news recommendation, etc. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with the experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well on different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text, and is applicable out-of-the-box.
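As a rough illustration of weighting word embeddings with idf information, the sketch below computes a plain idf-weighted mean of word vectors. It is a simplified stand-in and not the authors' method: the paper learns its weights with a median-based loss, which is not reproduced here. The function names, the smoothed idf formula, and the two-dimensional toy embeddings are all assumptions made for this example.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def weighted_text_vector(tokens, embeddings, idf):
    """Idf-weighted mean of the vectors of all tokens known to both tables."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in embeddings and tok in idf:
            w = idf[tok]
            weight_sum += w
            for i, x in enumerate(embeddings[tok]):
                total[i] += w * x
    if weight_sum == 0.0:
        return total  # no known token: fall back to the zero vector
    return [x / weight_sum for x in total]


# Hypothetical toy data: "the" occurs in every text, "match" in only one.
corpus = [
    ["the", "match", "was", "great"],
    ["the", "game", "was", "long"],
    ["the", "weather", "was", "bad"],
]
embeddings = {"the": [1.0, 0.0], "match": [0.0, 1.0]}

idf = idf_table(corpus)
vec = weighted_text_vector(["the", "match"], embeddings, idf)
```

Because "match" is rarer than "the" in the toy corpus, it receives a larger idf weight, so the aggregated vector leans towards the embedding of "match".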

Proceedings of the 1st ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information | 2012

Using social media to find places of interest: a case study

Steven Van Canneyt; Olivier Van Laere; Steven Schockaert; Bart Dhoedt

In this paper, we show how the large amount of geographically annotated data in social media can be used to complement existing place databases. After explaining our method, we illustrate how this approach can be used to discover new instances of a given semantic type, using London as a case study. In particular, for several place types, our method finds places in London that are not yet contained in the databases used by Foursquare, Google, LinkedGeoData and Geonames. Encouraged by these results, we briefly sketch how similar techniques could potentially be used to identify likely errors in existing databases, to estimate the spatial extent of places, to discover semantic relationships between place types, and to recommend tags to users who are uploading photos.

Web Intelligence | 2012

Detecting Places of Interest Using Social Media

Steven Van Canneyt; Steven Schockaert; Olivier Van Laere; Bart Dhoedt

Place recommender systems are increasingly being used to find places of a given type that are close to a user-specified location. As it is important for these systems to use an up-to-date database with a wide coverage, there is a need for techniques that are capable of expanding place databases in an automated way. On the other hand, social media are a rich source of geographically distributed information. In this paper, we therefore propose an approach to discover new instances of a given place type by exploiting correlations between terms and locations in geotagged social media. For a variety of place types, our approach is able to find places which are not yet included in popular place databases such as Foursquare or Google Places.

International Conference on Data Mining | 2015

Learning Semantic Similarity for Very Short Texts

Cedric De Boom; Steven Van Canneyt; Steven Bohez; Thomas Demeester; Bart Dhoedt

Leveraging data on social media, such as Twitter and Facebook, requires information retrieval algorithms to be able to relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine-similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is little or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level. In order to pair short text fragments -- as a concatenation of separate words -- an adequate distributed sentence representation is needed, in existing literature often obtained by naively combining the individual word representations. We therefore investigated several text representations as a combination of word embeddings in the context of semantic pair matching. This paper investigates the effectiveness of several such naive techniques, as well as traditional tf-idf similarity, for fragments of different lengths. Our main contribution is a first step towards a hybrid method that combines the strength of dense distributed representations -- as opposed to sparse term matching -- with the strength of tf-idf based methods to automatically reduce the impact of less informative terms. Our new approach outperforms the existing techniques in a toy experimental set-up, leading to the conclusion that the combination of word embeddings and tf-idf information might lead to a better model for semantic content within very short text fragments.
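The failure of overlap-based similarity on very short texts can be seen in a minimal sketch. The function below computes cosine similarity over raw term counts, a simplification of tf-idf weighting that is assumed here for brevity; adding idf weights would not change a zero score, since the dot product only runs over shared terms. The function name and the example fragments are hypothetical.

```python
import math
from collections import Counter


def cosine_bow(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two related fragments that share no words at all.
related_score = cosine_bow(["huge", "storm", "tonight"],
                           ["heavy", "rain", "expected"])
# A fragment compared with itself.
self_score = cosine_bow(["huge", "storm", "tonight"],
                        ["huge", "storm", "tonight"])
```

The two weather-related fragments score zero despite their related meaning, which is the gap that dense word embeddings are meant to close.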

Information Sciences | 2016

Categorizing events using spatio-temporal and user features from Flickr

Steven Van Canneyt; Steven Schockaert; Bart Dhoedt

Even though the problem of event detection from social media has been well studied in recent years, few authors have looked at deriving structured representations for their detected events. We envision the use of social media for extracting large-scale structured event databases, which could in turn be used for answering complex (historical) queries. As a key stepping-stone towards this goal, we introduce a method for discovering the semantic type of extracted events, focusing in particular on how this type is influenced by the spatio-temporal grounding of the event, the profile of its attendees, and the semantic type of the venue and other entities which are associated with the event. We estimate the aforementioned characteristics from metadata associated with Flickr photos of the event and then use an ensemble learner to identify its most likely semantic type. Experimental results based on an event dataset from Upcoming.org and Last.fm show a marked improvement over bag-of-words based methods.

European Conference on Information Retrieval | 2015

Topic-Dependent Sentiment Classification on Twitter

Steven Van Canneyt; Nathan Claeys; Bart Dhoedt

In this paper, we investigate how discovering the topic discussed in a tweet can be used to improve its sentiment classification. In particular, a classifier is introduced consisting of a topic-specific classifier, which is only trained on tweets with the same topic as the given tweet, and a generic classifier, which is trained on all the tweets in the training set. The set of considered topics is obtained by clustering the hashtags that occur in the training set. A classifier is then used to estimate the topic of a previously unseen tweet. Experimental results based on a public Twitter dataset show that considering topic-specific sentiment classifiers indeed leads to an improvement.

Geographic Information Retrieval | 2014

Estimating the semantic type of events using location features from Flickr

Steven Van Canneyt; Steven Schockaert; Bart Dhoedt

Various methods for automatically detecting events from social media have been developed in recent years. However, little progress has been made towards extracting structured representations of such events, which severely limits the way in which the resulting event databases can be queried. As a first step to address this issue, we focus on the problem of discovering the semantic type of events. While current methods are almost exclusively based on bag-of-words methods, we show that additionally using location features can substantially improve the results. In particular, we use the tags associated with Flickr photos and the types of the known events near the venue of the event as context information.

International World Wide Web Conferences | 2017

Describing Patterns and Disruptions in Large Scale Mobile App Usage Data

Steven Van Canneyt; M. Bron; Andrew Haines; Mounia Lalmas

The advertising industry is seeking to use the unique data provided by the increasing usage of mobile devices and mobile applications (apps) to improve targeting and the experience with apps. As a consequence, understanding user behaviour with apps has attracted increasing interest from both academia and industry. In this paper we study user app engagement patterns and disruptions of those patterns in a data set unique in its scale and coverage of user activity. First, we provide a detailed account of temporal user activity patterns with apps and compare these to previous studies on app usage behavior. Then, in the second part, and the main contribution of this work, we take advantage of the scale and coverage of our sample and show how app usage behavior is disrupted through major political, social, and sports events.

International Conference on Data Mining | 2015

Optimizing the Popularity of Twitter Messages through User Categories

Rupert Lemahieu; Steven Van Canneyt; Cedric De Boom; Bart Dhoedt

In this paper, we investigate how the category of a Twitter user can be used to better predict and optimize the popularity of tweets. The contributions of this paper are threefold. First, we compare the influence of content features on the popularity of tweets for different user categories. Second, we present a regression model to predict the popularity of tweets given the content features as input. To construct this model, we interpolate a generic regression model, which is trained on all data, and a category-specific model, which is only trained on tweets from users of the same category as the user of the given tweet. In this way we can combine the advantage of the robustness of a generic model, with the ability of category-specific models to pick up on category-specific influence of content features. The third contribution is the investigation of the feasibility of boosting the popularity of a tweet by setting up an experiment in which we proactively adapt content features in order to optimize the popularity of tweets. Based on this research, we conclude that the introduction of user categories leads to a more precise analysis and better predictions. In the hands-on experiment, we observed a gain in popularity by proactively adapting content features.
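The interpolation of a generic and a category-specific regression model can be sketched as a weighted blend of their predictions. The example below is an assumption-laden toy: it uses one-feature ordinary least squares, a fixed mixing weight `lam`, and invented data, none of which come from the paper, whose actual features and interpolation scheme are not given in the abstract.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def interpolated_prediction(x, generic, specific, lam):
    """Blend predictions: lam=1 trusts the category model, lam=0 the generic one."""
    return lam * predict(specific, x) + (1.0 - lam) * predict(generic, x)


# Hypothetical data: (content feature value, popularity) per user category.
data = {
    "news": [(1, 10), (2, 20), (3, 30)],
    "celebrity": [(1, 100), (2, 200), (3, 300)],
}

# Generic model: trained on all tweets regardless of category.
all_points = [p for points in data.values() for p in points]
generic_model = fit_line([x for x, _ in all_points], [y for _, y in all_points])

# Category-specific model: trained only on tweets from "news" users.
news_model = fit_line([x for x, _ in data["news"]], [y for _, y in data["news"]])

blended = interpolated_prediction(2, generic_model, news_model, lam=0.5)
```

With these toy numbers the generic model predicts 110 and the news-specific model predicts 20 for a feature value of 2, so an equal blend yields 65. In practice the mixing weight would be tuned on held-out data.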

Multimedia Tools and Applications | 2018

Modeling and predicting the popularity of online news based on temporal and content-related features

Steven Van Canneyt; Philip Leroux; Bart Dhoedt; Thomas Demeester

As the market of globally available online news is large and still growing, there is strong competition between online publishers to reach the largest possible audience. Therefore an intelligent online publishing strategy is of the highest importance to publishers. A prerequisite for optimizing any online strategy is to have trustworthy predictions of how popular new online content may become. This paper presents a novel methodology to model and predict the popularity of online news. We first introduce a new strategy and mathematical model to capture view patterns of online news. After a thorough analysis of such view patterns, we show that well-chosen base functions lead to suitable models, and show how the influence of day versus night on the total view patterns can be taken into account to further increase the accuracy, without leading to more complex models. Second, we turn to the prediction of future popularity, given recently published content. By means of a new real-world dataset, we show that the combination of features related to content, meta-data, and the temporal behavior leads to significantly improved predictions, compared to existing approaches which only consider features based on the historical popularity of the considered articles. Whereas traditionally linear regression is used for the application under study, we show that the more expressive gradient tree boosting method proves beneficial for predicting news popularity.
