Cedric De Boom | Researchain

Publication

Featured research published by Cedric De Boom.

Pattern Recognition Letters | 2016

Representation learning for very short texts using weighted word embedding aggregation

Cedric De Boom; Steven Van Canneyt; Thomas Demeester; Bart Dhoedt

We create text representations by weighing word embeddings using idf information. A novel median-based loss is designed to mitigate the negative effect of outliers. A dataset of semantically related textual pairs from Wikipedia and Twitter is made. Our method outperforms all word embedding baselines in a semantic similarity task. Our method is out-of-the-box and thus requires no retraining in different contexts.

Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, news recommendation, etc. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with the experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well on different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text, and is applicable out-of-the-box.
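The idea of combining word embeddings with idf information can be illustrated with a minimal sketch. It computes a plain idf-weighted average of word vectors, which is a simplification and not the learned weighting or the median-based loss from the paper. The toy corpus, the two-dimensional embeddings, and all function names are assumptions made for illustration.

```python
import math

def idf_weights(corpus):
    """Return idf(w) = log(N / df(w)) for every word in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for word in set(doc):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}

def weighted_text_vector(tokens, embeddings, idf):
    """Idf-weighted average of the word vectors of all known tokens."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in embeddings and tok in idf:
            w = idf[tok]
            weight_sum += w
            for i, x in enumerate(embeddings[tok]):
                total[i] += w * x
    if weight_sum == 0.0:
        return total  # no informative words: fall back to the zero vector
    return [x / weight_sum for x in total]

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy data: three short texts and hand-made 2-d "embeddings".
corpus = [["the", "concert"], ["the", "gig"], ["the", "election"]]
embeddings = {
    "the": [1.0, 1.0],
    "concert": [1.0, 0.0],
    "gig": [0.9, 0.1],
    "election": [0.0, 1.0],
}
idf = idf_weights(corpus)  # "the" occurs in every text, so its idf is 0
vec_concert = weighted_text_vector(corpus[0], embeddings, idf)
vec_gig = weighted_text_vector(corpus[1], embeddings, idf)
vec_election = weighted_text_vector(corpus[2], embeddings, idf)
```

In this toy setting the uninformative word "the" receives zero weight, so the two music-related texts end up closer to each other than to the election text even though they share no informative word.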

International Conference on Data Mining | 2015

Learning Semantic Similarity for Very Short Texts

Cedric De Boom; Steven Van Canneyt; Steven Bohez; Thomas Demeester; Bart Dhoedt

Leveraging data on social media, such as Twitter and Facebook, requires information retrieval algorithms to be able to relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine-similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is small or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level. In order to pair short text fragments -- as a concatenation of separate words -- an adequate distributed sentence representation is needed, in existing literature often obtained by naively combining the individual word representations. We therefore investigated several text representations as a combination of word embeddings in the context of semantic pair matching. This paper investigates the effectiveness of several such naive techniques, as well as traditional tf-idf similarity, for fragments of different lengths. Our main contribution is a first step towards a hybrid method that combines the strength of dense distributed representations -- as opposed to sparse term matching -- with the strength of tf-idf based methods to automatically reduce the impact of less informative terms. Our new approach outperforms the existing techniques in a toy experimental set-up, leading to the conclusion that the combination of word embeddings and tf-idf information might lead to a better model for semantic content within very short text fragments.
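Why word-overlap methods struggle on very short texts can be seen in a small sketch of tf-idf cosine similarity. The three toy fragments and the function names are assumptions for illustration; the idf formula used here, log(N / df), is one common variant and not necessarily the one used in the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) with idf(w) = log(N / df(w))."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        )
    return vectors

def sparse_cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical fragments: the first two are related but share no word.
docs = [
    ["concert", "tonight"],
    ["gig", "this", "evening"],
    ["concert", "tickets"],
]
vecs = tfidf_vectors(docs)
related_no_overlap = sparse_cosine(vecs[0], vecs[1])
shared_word = sparse_cosine(vecs[0], vecs[2])
```

The first pair is semantically related yet scores exactly zero because no term is shared, while the third fragment scores above zero purely because it repeats the word "concert".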

Multimedia Tools and Applications | 2018

Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales

Cedric De Boom; Rohan Agrawal; Samantha Hansen; Esh Kumar; Romain Yon; Ching-Wei Chen; Thomas Demeester; Bart Dhoedt

The amount of content on online music streaming platforms is immense, and most users only access a tiny fraction of this content. Recommender systems are the application of choice to open up the collection to these users. Collaborative filtering has the disadvantage that it relies on explicit ratings, which are often unavailable, and generally disregards the temporal nature of music consumption. On the other hand, item co-occurrence algorithms, such as the recently introduced word2vec-based recommenders, are typically left without an effective user representation. In this paper, we present a new approach to model users through recurrent neural networks by sequentially processing consumed items, represented by any type of embeddings and other context features. This way we obtain semantically rich user representations, which capture a user’s musical taste over time. Our experimental analysis on large-scale user data shows that our model can be used to predict future songs a user will likely listen to, both in the short and long term.
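Sequentially processing consumed items into a user representation can be sketched with a basic recurrent update. This is a plain Elman-style recurrence, h_t = tanh(W_x x_t + W_h h_{t-1}), chosen for brevity; the abstract does not specify the architecture, and the weights, song embeddings, and function names below are all hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_step(w_x, w_h, x, h_prev):
    """One recurrent update: h = tanh(W_x x + W_h h_prev)."""
    from_input = matvec(w_x, x)
    from_state = matvec(w_h, h_prev)
    return [math.tanh(a + b) for a, b in zip(from_input, from_state)]

def user_vector(w_x, w_h, item_embeddings):
    """Fold a listening history (in order) into one hidden state."""
    hidden = [0.0] * len(w_h)
    for x in item_embeddings:
        hidden = rnn_step(w_x, w_h, x, hidden)
    return hidden

# Hypothetical fixed weights; in practice these would be learned.
W_X = [[1.0, 0.0], [0.0, 1.0]]
W_H = [[0.5, 0.0], [0.0, 0.5]]

# Hypothetical 2-d embeddings of two songs.
song_a = [1.0, 0.0]
song_b = [0.0, 1.0]

taste_ab = user_vector(W_X, W_H, [song_a, song_b])
taste_ba = user_vector(W_X, W_H, [song_b, song_a])
```

Because the state is updated one item at a time, the same two songs played in a different order give a different user vector, which is how a recurrent model can reflect the temporal nature of listening behavior.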

Probabilistic Graphical Models | 2014

Robustifying the Viterbi Algorithm

Cedric De Boom; Jasper De Bock; Arthur Van Camp; Gert de Cooman

We present an efficient algorithm for estimating hidden state sequences in imprecise hidden Markov models (iHMMs), based on observed output sequences. The main difference with classical HMMs is that the local models of an iHMM are not represented by a single mass function, but rather by a set of mass functions. We consider as estimates for the hidden state sequence those sequences that are maximal. In this way, we generalise the problem of finding a state sequence with highest posterior probability, as is commonly considered in HMMs, and solved efficiently by the Viterbi algorithm. An important feature of our approach is that there may be multiple maximal state sequences, typically for iHMMs that are highly imprecise. We show experimentally that the time complexity of our algorithm tends to be linear in this number of maximal sequences, and investigate how this number depends on the local models.
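For reference, the classical problem that this work generalises, finding the state sequence with highest posterior probability in a precise HMM, can be sketched as follows. This is the standard Viterbi algorithm, not the imprecise-HMM algorithm from the paper; the two-state model and its probabilities are a toy example assumed for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, probability) for a classical, precise HMM."""
    # best[t][s]: probability of the most likely path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that most likely path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state and follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model with two hidden states and three observations.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
best_path, best_prob = viterbi(
    ["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p
)
# best_path is ["Healthy", "Healthy", "Fever"] with probability 0.01512
```

In a precise HMM each local model is a single mass function, so there is one optimal value per state and time step. With sets of mass functions, as in an iHMM, several sequences can be maximal at once, which is the situation the paper addresses.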

International Conference on Data Mining | 2015

Optimizing the Popularity of Twitter Messages through User Categories

Rupert Lemahieu; Steven Van Canneyt; Cedric De Boom; Bart Dhoedt

In this paper, we investigate how the category of a Twitter user can be used to better predict and optimize the popularity of tweets. The contributions of this paper are threefold. First, we compare the influence of content features on the popularity of tweets for different user categories. Second, we present a regression model to predict the popularity of tweets given the content features as input. To construct this model, we interpolate a generic regression model, which is trained on all data, and a category-specific model, which is only trained on tweets from users of the same category as the user of the given tweet. In this way we can combine the advantage of the robustness of a generic model, with the ability of category-specific models to pick up on category-specific influence of content features. The third contribution is the investigation of the feasibility of boosting the popularity of a tweet by setting up an experiment in which we proactively adapt content features in order to optimize the popularity of tweets. Based on this research, we conclude that the introduction of user categories leads to a more precise analysis and better predictions. In the hands-on experiment, we observed a gain in popularity by proactively adapting content features.
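The interpolation of a generic and a category-specific regression model can be sketched as a weighted combination of their predictions. The linear form of the two models, the feature names, the weights, and the interpolation factor below are all assumptions made for illustration; the abstract does not state how the interpolation is parameterized.

```python
def linear_predict(weights, bias, features):
    """Prediction of a linear regression model for one feature vector."""
    return bias + sum(w * x for w, x in zip(weights, features))

def interpolate(generic_pred, category_pred, lam):
    """Blend two predictions; lam is the weight given to the generic model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * generic_pred + (1.0 - lam) * category_pred

# Hypothetical content features of one tweet: [has_hashtag, has_url, length / 100].
tweet = [1.0, 0.0, 0.5]

# Hypothetical models: one trained on all users, one on a single user category.
generic = linear_predict([4.0, 2.0, 8.0], 2.0, tweet)      # 2 + 4 + 0 + 4 = 10
category = linear_predict([10.0, 1.0, 12.0], 4.0, tweet)   # 4 + 10 + 0 + 6 = 20

popularity = interpolate(generic, category, lam=0.25)      # 0.25 * 10 + 0.75 * 20
```

A value of `lam` close to 1 leans on the robustness of the generic model, while a value close to 0 lets the category-specific model dominate.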

ACM SIGMM Conference on Multimedia Systems | 2018

Low-latency delivery of news-based video content

Jeroen van der Hooft; Dries Pauwels; Cedric De Boom; Stefano Petrangeli; Tim Wauters; Filip De Turck

Nowadays, news-based websites and portals provide significant amounts of multimedia content to accompany news stories and articles. Within this context, HTTP Adaptive Streaming is generally used to deliver video over the best-effort Internet, allowing smooth video playback and a good Quality of Experience (QoE). To stimulate user engagement with the provided content, such as browsing and switching between videos, reducing the video startup time has become more and more important: while the current median load time is on the order of seconds, research has shown that user waiting times must remain below two seconds to achieve an acceptable QoE. We developed a framework for low-latency delivery of news-related video content, integrating four optimizations either at server-side, client-side, or at the application layer. Using these optimizations, the video startup time can be reduced significantly, allowing user interaction and fast switching between available content. In this paper, we describe a proof of concept of this framework, using a large dataset of a major Belgian news provider. A dashboard is provided, which allows the user to interact with available video content and assess the gains of the proposed optimizations. In particular, we demonstrate how the proposed optimizations consistently reduce the video startup time in different mobile network scenarios. These reductions allow the news provider to improve the users' QoE, reducing the startup time to values well below two seconds in different mobile network scenarios.

Neural Computing and Applications | 2018

Character-level recurrent neural networks in practice: comparing training and sampling schemes

Cedric De Boom; Thomas Demeester; Bart Dhoedt

Recurrent neural networks are nowadays successfully used in an abundance of applications, going from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm that is commonly used to train these networks on specific tasks. Many deep learning frameworks have their own implementation of training and sampling procedures for recurrent neural networks, while there are in fact multiple other possibilities to choose from and other parameters to tune. In the existing literature, this is very often overlooked or ignored. In this paper, we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these different schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states for correctly initializing the model on subsequences often leads to unstable training behavior depending on the dataset.
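The task of predicting the next token in a sequence can be made concrete by showing how a text is cut into training pairs. The sketch below uses non-overlapping subsequences, which is only one possible scheme and is not claimed to match any specific scheme compared in the paper; the function name and the window length are assumptions.

```python
def next_char_pairs(text, seq_len):
    """Split text into (input, target) pairs for next-character prediction.

    Each target is the input shifted one character to the right, so the
    model is asked to predict character i + 1 from characters up to i.
    """
    pairs = []
    for start in range(0, len(text) - 1, seq_len):
        targets = text[start + 1:start + seq_len + 1]
        # Trim the final window so every input character has a target.
        inputs = text[start:start + len(targets)]
        pairs.append((inputs, targets))
    return pairs

pairs = next_char_pairs("hello world", seq_len=4)
# pairs is [("hell", "ello"), ("o wo", " wor"), ("rl", "ld")]
```

With windows like these, a design choice remains at every window boundary: start each subsequence from a fresh hidden state, or carry over the hidden state from the previous subsequence, which is the transfer the abstract reports can lead to unstable training.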

Proceedings of the 5th Workshop on Making Sense of Microposts | 2015