Publication


Featured research published by François Fouss.


international conference on data mining | 2006

An Experimental Investigation of Graph Kernels on a Collaborative Recommendation Task

François Fouss; Luh Yen; Alain Pirotte; Marco Saerens

This work presents a systematic comparison between seven kernels (or similarity matrices) on a graph, namely the exponential diffusion kernel, the Laplacian diffusion kernel, the von Neumann kernel, the regularized Laplacian kernel, the commute time kernel, and finally the Markov diffusion kernel and the cross-entropy diffusion matrix (both introduced in this paper), on a collaborative recommendation task involving a database. The database is viewed as a graph where elements are represented as nodes and relations as links between nodes. From this graph, seven kernels are computed, leading to a set of meaningful proximity measures between nodes that can be used to answer questions about the structure of the graph under investigation; in particular, to recommend items to users. Cross-validation results indicate that a simple nearest-neighbours rule based on the similarity measure provided by the regularized Laplacian, the Markov diffusion and the commute time kernels performs best. We therefore recommend the use of the commute time kernel for computing similarities between elements of a database, for two reasons: (1) it has an appealing interpretation in terms of random walks and (2) no parameter needs to be adjusted.
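
As a rough illustration of the commute-time kernel mentioned in the abstract (not the paper's exact experimental pipeline), the sketch below computes the kernel as the Moore-Penrose pseudoinverse of the graph Laplacian and applies a simple nearest-neighbours-style ranking; the `recommend` helper, its node indexing, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def commute_time_kernel(adjacency):
    # Commute-time kernel: the Moore-Penrose pseudoinverse of the graph Laplacian.
    degrees = np.diag(adjacency.sum(axis=1))
    laplacian = degrees - adjacency
    return np.linalg.pinv(laplacian)

def recommend(adjacency, user_node, item_nodes, k=3):
    # Hypothetical helper: rank candidate item nodes by their kernel similarity
    # to a user node, assuming users and items are nodes of one (e.g. bipartite) graph.
    K = commute_time_kernel(adjacency)
    return sorted(item_nodes, key=lambda i: K[user_node, i], reverse=True)[:k]
```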


knowledge discovery and data mining | 2007

Graph nodes clustering based on the commute-time kernel

Luh Yen; François Fouss; Christine Decaestecker; Pascal Francq; Marco Saerens

This work presents a kernel method for clustering the nodes of a weighted, undirected graph. The algorithm is based on a two-step procedure. First, the sigmoid commute-time kernel (KCT), providing a similarity measure between any pair of nodes by taking indirect links into account, is computed from the adjacency matrix of the graph. Then, the nodes of the graph are clustered by performing a kernel k-means or fuzzy k-means on this CT kernel matrix. For this purpose, a new, simple version of the kernel k-means and the kernel fuzzy k-means is introduced. The joint use of the CT kernel matrix and kernel clustering appears to be quite successful. Indeed, it provides good results on a document clustering problem involving the newsgroups database.
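
The two steps described above can be sketched as follows, under loose assumptions: an element-wise sigmoid applied to the commute-time kernel (the exact scaling used in the paper may differ) followed by a plain kernel k-means whose distances are computed directly from the kernel matrix.

```python
import numpy as np

def sigmoid_ct_kernel(adjacency, a=1.0):
    # Step 1 (sketch): commute-time kernel, then an element-wise sigmoid.
    # The constant `a` and the division by the standard deviation are
    # illustrative choices, not necessarily the paper's exact normalization.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    K = np.linalg.pinv(laplacian)
    return 1.0 / (1.0 + np.exp(-a * K / K.std()))

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    # Step 2 (sketch): kernel k-means; squared distances to cluster centroids
    # are expressed purely in terms of kernel entries.
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            if not members.any():
                continue
            m = members.sum()
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```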


Neural Computation | 2009

Randomized shortest-path problems: Two related models

Marco Saerens; Youssef Achbany; François Fouss; Luh Yen

This letter addresses the problem of designing the transition probabilities of a finite Markov chain (the policy) in order to minimize the expected cost for reaching a destination node from a source node while maintaining a fixed level of entropy spread throughout the network (the exploration). It is motivated by the following scenario. Suppose you have to route agents through a network in some optimal way, for instance, by minimizing the total travel cost; nothing particular up to now, you could use a standard shortest-path algorithm. Suppose, however, that you want to avoid purely deterministic routing policies in order, for instance, to allow some continual exploration of the network, avoid congestion, or avoid complete predictability of your routing strategy. In other words, you want to introduce some randomness or unpredictability in the routing policy (i.e., the routing policy is randomized). This problem, which will be called the randomized shortest-path problem (RSP), is investigated in this work. The global level of randomness of the routing policy is quantified by the expected Shannon entropy spread throughout the network and is provided a priori by the designer. Then, necessary conditions to compute the optimal randomized policy, minimizing the expected routing cost, are derived. Iterating these necessary conditions, reminiscent of Bellman's value-iteration equations, allows computing an optimal policy, that is, a set of transition probabilities in each node. Interestingly and surprisingly enough, this first model, while formulated in a totally different framework, is equivalent to Akamatsu's model (1996), appearing in transportation science, for a special choice of the entropy constraint. We therefore revisit Akamatsu's model by recasting it into a sum-over-paths statistical physics formalism, allowing easy derivation of all the quantities of interest in an elegant, unified way. For instance, it is shown that the unique optimal policy can be obtained by solving a simple linear system of equations. This second model is therefore more convincing because of its computational efficiency and soundness. Finally, simulation results obtained on simple, illustrative examples show that the models behave as expected.
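
For readers who want the shape of the sum-over-paths formulation, a schematic version is given below; the notation is illustrative and may not match the paper's exactly.

```latex
% Schematic sum-over-paths view: minimize the expected path cost at a fixed
% entropy level H_0 over distributions P on source-to-destination paths \wp.
\begin{align*}
  \min_{P} \; \sum_{\wp} P(\wp)\, C(\wp)
  \qquad \text{subject to} \qquad
  -\sum_{\wp} P(\wp)\,\ln P(\wp) \;=\; H_0 ,
\end{align*}
% whose solution is a Boltzmann distribution over paths,
\begin{align*}
  P^{*}(\wp) \;=\; \frac{\exp\big(-\theta\, C(\wp)\big)}
                        {\sum_{\wp'} \exp\big(-\theta\, C(\wp')\big)} ,
\end{align*}
% where the inverse temperature \theta is fixed by the prescribed entropy H_0.
```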


multiple classifier systems | 2004

Yet Another Method for Combining Classifiers Outputs: A Maximum Entropy Approach

Marco Saerens; François Fouss

In this paper, we present a maximum entropy (maxent) approach to the problem of fusing experts' opinions, or classifiers' outputs. The maxent approach is quite versatile and allows us to express in a clear, rigorous way the a priori knowledge that is available on the problem. For instance, our knowledge about the reliability of the experts and the correlations between these experts can easily be integrated: each piece of knowledge is expressed in the form of a linear constraint. An iterative scaling algorithm is used in order to compute the maxent solution of the problem. The maximum entropy method seeks the joint probability density of a set of random variables that has maximum entropy while satisfying the constraints. It is therefore the "most honest" characterization of our knowledge given the available facts (constraints). In the case of conflicting constraints, we propose to minimize the "lack of constraint satisfaction" or to relax some constraints and recompute the maximum entropy solution. The maxent fusion rule is illustrated by some simulations.
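
The core maxent step the abstract refers to can be written compactly; the notation below is generic and not taken from the paper.

```latex
% Generic maximum entropy problem with linear (expectation) constraints:
\begin{align*}
  \max_{p} \; -\sum_{x} p(x)\,\ln p(x)
  \qquad \text{subject to} \qquad
  \sum_{x} p(x)\, f_i(x) = b_i , \qquad \sum_{x} p(x) = 1 ,
\end{align*}
% whose solution has the exponential-family form
\begin{align*}
  p^{*}(x) \;\propto\; \exp\Big( \sum_{i} \lambda_i\, f_i(x) \Big) ,
\end{align*}
% with the Lagrange multipliers \lambda_i typically found by iterative scaling.
```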


Neurocomputing | 2008

Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy

Youssef Achbany; François Fouss; Luh Yen; Alain Pirotte; Marco Saerens

This paper presents a model that allows tuning continual exploration in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action in that state. Then, the exploration/exploitation trade-off is formulated as a global optimization problem: find the exploration strategy that minimizes the expected cumulated cost while maintaining fixed degrees of exploration at the states. In other words, maximize exploitation for constant exploration. This formulation leads to a set of nonlinear iterative equations reminiscent of the value-iteration algorithm and demonstrates that the Boltzmann strategy based on the Q-value is optimal in this sense. Convergence of those equations to a local minimum is proved for a stationary environment. Interestingly, in the deterministic case, when there is no exploration, these equations reduce to the Bellman equations for finding the shortest path. Furthermore, if the graph of states is directed and acyclic, the nonlinear equations can easily be solved by a single backward pass from the destination state. Stochastic shortest-path problems and discounted problems are also studied, and links between our algorithm and the SARSA algorithm are examined. The theoretical results are confirmed by simple simulations showing that the proposed exploration strategy outperforms the ε-greedy strategy.
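
A minimal sketch of the Boltzmann action-selection rule discussed above, assuming Q-values are expected costs (lower is better) and treating the temperature parameter as given rather than derived from the entropy constraints in the paper:

```python
import numpy as np

def boltzmann_policy(q_values, theta=1.0):
    # Softmax over negated costs: low-cost actions get high probability.
    # theta -> infinity recovers the greedy (purely exploitative) choice,
    # theta -> 0 gives uniform (purely exploratory) action selection.
    logits = -theta * np.asarray(q_values, dtype=float)
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Example: three admissible actions with expected costs 1.0, 1.5 and 3.0.
print(boltzmann_policy([1.0, 1.5, 3.0], theta=2.0))
```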


international conference on data mining | 2003

Links between Kleinberg's hubs and authorities, correspondence analysis, and Markov chains

François Fouss; Marco Saerens; Jean-Michel Renders

We show that Kleinberg's hubs and authorities model is closely related to both correspondence analysis, a well-known multivariate statistical technique, and a particular Markov chain model of navigation through the Web. The only difference between correspondence analysis and Kleinberg's method is the use of the average value of the hubs (authorities) scores for computing the authorities (hubs) scores, instead of the sum as in Kleinberg's method. We also show that correspondence analysis and our Markov model are related to SALSA, a variant of Kleinberg's model.
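
For reference, a minimal HITS power iteration is sketched below; per the abstract, the correspondence-analysis variant would replace the sums by averages (i.e., divide by the relevant node degrees). The toy graph is illustrative only.

```python
import numpy as np

def hits(adjacency, n_iter=100):
    # Kleinberg's HITS: authority scores are sums of the hub scores of pointing
    # pages, hub scores are sums of the authority scores of pointed-to pages,
    # each normalized at every iteration.
    A = np.asarray(adjacency, dtype=float)
    hubs = np.ones(A.shape[0])
    authorities = np.ones(A.shape[0])
    for _ in range(n_iter):
        authorities = A.T @ hubs
        authorities /= np.linalg.norm(authorities)
        hubs = A @ authorities
        hubs /= np.linalg.norm(hubs)
    return hubs, authorities

# Toy directed graph (row i links to column j).
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
print(hits(A))
```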


web intelligence | 2005

HITS is Principal Components Analysis

Marco Saerens; François Fouss

In this work, we show that Kleinbergs hubs and authorities model (HITS) is simply principal components analysis (PCA; maybe the most widely used multivariate statistical analysis method), albeit without centering, applied to the adjacency matrix of the graph of Web pages. We further show that a variant of HITS, SALSA, is closely related to correspondence analysis, another standard multivariate statistical analysis method. In addition, to provide a clear statistical interpretation for HITS, this result suggests to rely on existing work already published in the multivariate statistical analysis literature (extensions of PCA or correspondence analysis) in order to analyse or design new Web pages scoring procedures.
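
A quick numerical way to see the uncentered-PCA reading of HITS (a sketch on a hypothetical toy graph): the leading left and right singular vectors of the adjacency matrix coincide, up to sign, with the converged hub and authority scores, since those scores are the dominant eigenvectors of A Aᵀ and Aᵀ A.

```python
import numpy as np

# Uncentered PCA of the adjacency matrix amounts to its SVD; the dominant
# singular vectors match (up to sign) the HITS hub and authority score vectors.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
U, s, Vt = np.linalg.svd(A)
hub_scores = np.abs(U[:, 0])          # leading left singular vector
authority_scores = np.abs(Vt[0, :])   # leading right singular vector
print(hub_scores, authority_scores)
```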


international conference on artificial neural networks | 2016

Comparison of Graph Node Distances on Clustering Tasks

Felix Sommer; François Fouss; Marco Saerens

This work presents recent developments in graph node distances and tests them empirically on social network databases of various sizes and types. We compare two versions of a distance-based kernel k-means algorithm with the well-established Louvain method. The first version is a classic kernel k-means approach; the second version additionally makes use of node weights based on the Sum-over-Forests density index. Both kernel k-means algorithms employ a variety of classic and modern distances. We compare the results of all three algorithms using statistical measures and an overall rank comparison to ascertain their capabilities in community detection. Results show that two recently introduced distances outperform the others on the tested datasets.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

The Sum-over-Forests Density Index: Identifying Dense Regions in a Graph

Mathieu Senelle; Silvia García-Díez; Amin Mantrach; Masashi Shimbo; Marco Saerens; François Fouss

This work introduces a novel nonparametric density index defined on graphs, the Sum-over-Forests (SoF) density index. It is based on a clear and intuitive idea: high-density regions in a graph are characterized by the fact that they contain a large number of low-cost trees with high outdegrees, while low-density regions contain few of them. Therefore, a Boltzmann probability distribution on the countable set of forests in the graph is defined so that large (high-cost) forests occur with a low probability while short (low-cost) forests occur with a high probability. Then, the SoF density index of a node is defined as the expected outdegree of this node on the set of forests, thus providing a measure of density around that node. Following the matrix-forest theorem and a statistical physics framework, it is shown that the SoF density index can easily be computed in closed form through a simple matrix inversion. Experiments on artificial and real datasets show that the proposed index performs well on finding dense regions, for graphs of various origins.
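
The matrix-forest theorem invoked above can be stated compactly (in standard notation, not necessarily the paper's):

```latex
% Matrix-forest theorem (Chebotarev-Shamis): for a graph G with Laplacian L,
\begin{align*}
  \det(\mathbf{I} + \mathbf{L}) \;=\; \text{number of spanning rooted forests of } G ,
\end{align*}
% and the entries of (I + L)^{-1} are ratios of forest counts, which is why
% sum-over-forests quantities can be obtained through a single matrix inversion.
```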


international conference on pattern recognition | 2010

Normalized Sum-over-Paths Edit Distances

Silvia Garcia Diez; François Fouss; Masashi Shimbo; Marco Saerens

In this paper, normalized SoP string-edit distances, taking into account all possible alignments between two sequences, are investigated. These normalized distances are variants of the Sum-over-Paths (SoP) distances, which compute the expected cost over all sequence alignments by favoring low-cost ones, and therefore favoring good alignments. Such distances consider two sequences tied by many optimal or nearly optimal alignments as more similar than two sequences sharing only one, optimal, alignment. They depend on a parameter, θ, and reduce to the standard distances, the edit distance or the longest common subsequence, when θ → 0, while having the same time complexity. This paper puts the emphasis on applying some type of normalization to the expectation of the cost. Experimental results for clustering and classification tasks performed on four OCR data sets show that (i) the applied normalization generally improves the existing results, and (ii) as for the SoP edit-distances, the normalized SoP edit-distances clearly outperform the non-randomized measures, i.e., the standard edit distance and longest common subsequence.
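
For reference, the non-randomized baseline the abstract compares against (the standard edit distance, which the SoP distances recover in the θ → 0 limit) is the usual dynamic program; the sketch below assumes unit costs.

```python
def edit_distance(s, t):
    # Standard Levenshtein edit distance with unit insertion, deletion and
    # substitution costs; the SoP variants instead average over all alignments
    # with Boltzmann weights rather than keeping only the cheapest one.
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```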

Collaboration


Dive into François Fouss's collaborations.

Top Co-Authors

Marco Saerens
Université catholique de Louvain

Alain Pirotte
Université catholique de Louvain

Luh Yen
Université catholique de Louvain

Masashi Shimbo
Nara Institute of Science and Technology

Youssef Achbany
Université catholique de Louvain

Jean-Michel Renders
Université libre de Bruxelles

Virginie Vandenbulcke
Université catholique de Louvain

Felix Sommer
Université catholique de Louvain

Manuel Kolp
Université catholique de Louvain