Luh Yen

Université catholique de Louvain

Publication

Featured research published by Luh Yen.

international conference on data mining | 2006

An Experimental Investigation of Graph Kernels on a Collaborative Recommendation Task

François Fouss; Luh Yen; Alain Pirotte; Marco Saerens

This work presents a systematic comparison between seven kernels (or similarity matrices) on a graph, namely the exponential diffusion kernel, the Laplacian diffusion kernel, the von Neumann kernel, the regularized Laplacian kernel, the commute time kernel, and finally the Markov diffusion kernel and the cross-entropy diffusion matrix (both introduced in this paper), on a collaborative recommendation task involving a database. The database is viewed as a graph where elements are represented as nodes and relations as links between nodes. From this graph, seven kernels are computed, leading to a set of meaningful proximity measures between nodes. These measures make it possible to answer questions about the structure of the graph under investigation and, in particular, to recommend items to users. Cross-validation results indicate that a simple nearest-neighbours rule based on the similarity measure provided by the regularized Laplacian, the Markov diffusion and the commute time kernels performs best. We therefore recommend the use of the commute time kernel for computing similarities between elements of a database, for two reasons: (1) it has an appealing interpretation in terms of random walks and (2) no parameter needs to be adjusted.
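As a rough illustration of the commute time kernel mentioned in the abstract, the sketch below uses the standard definition as the Moore-Penrose pseudoinverse of the graph Laplacian L = D - A. The function names and the toy graph are assumptions made for this example; the abstract itself gives no implementation details.

```python
import numpy as np

def commute_time_kernel(adjacency):
    """Pseudoinverse of the graph Laplacian L = D - A (undirected graph)."""
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    return np.linalg.pinv(laplacian)

def commute_time(adjacency):
    """Average commute time n(i, j) = V_G * (k_ii + k_jj - 2 k_ij),
    where V_G is the sum of all entries of the adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    K = commute_time_kernel(A)
    d = np.diag(K)
    return A.sum() * (d[:, None] + d[None, :] - 2.0 * K)

# Hypothetical toy graph: a path 0 - 1 - 2 with unit weights.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
K = commute_time_kernel(path)
T = commute_time(path)
```

No parameter appears anywhere in the computation, which matches the second reason the abstract gives for recommending this kernel.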

knowledge discovery and data mining | 2008

A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances

Luh Yen; Marco Saerens; Amin Mantrach; Masashi Shimbo

This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path distance when θ is large and, on the other end, to the commute-time (or resistance) distance when θ is small (near zero). Intuitively, it corresponds to the expected cost incurred by a random walker in order to reach a destination node from a starting node while maintaining a constant entropy (related to θ) spread in the graph. The parameter θ therefore gradually biases the simple random walk on the graph towards the shortest-path policy. By adopting a statistical physics approach and computing a sum over all the possible paths (discrete path integral), it is shown that the RSP dissimilarity from every node to a particular node of interest can be computed efficiently by solving two linear systems of n equations, where n is the number of nodes. On the other hand, the dissimilarity between every pair of nodes is obtained by inverting an n × n matrix. The proposed measure can be used for various graph mining tasks such as computing betweenness centrality, finding dense communities, etc., as shown in the experimental section.
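The all-pairs computation by matrix inversion can be sketched as follows. This is a minimal sketch, assuming the closed form commonly used in the later RSP literature (W = P_ref ∘ exp(-θC), Z = (I - W)^{-1}, then an elementwise ratio); the abstract does not spell out these formulas, so the exact expressions, the function name, and the toy graph are assumptions for illustration only.

```python
import numpy as np

def rsp_dissimilarity(adjacency, cost, theta):
    """Symmetrized randomized shortest-path dissimilarity (sketch).

    adjacency: nonnegative weights of a strongly connected graph.
    cost: arc costs, with 0 where there is no arc.
    theta: positive parameter; large values favour shortest paths.
    """
    A = np.asarray(adjacency, dtype=float)
    C = np.asarray(cost, dtype=float)
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)   # natural random walk
    W = P_ref * np.exp(-theta * C)             # killed random walk
    Z = np.linalg.inv(np.eye(n) - W)           # fundamental matrix
    S = (Z @ (C * W) @ Z) / Z                  # expected cost, all paths
    C_bar = S - np.diag(S)[np.newaxis, :]      # expected cost, hitting paths
    return 0.5 * (C_bar + C_bar.T)

# Hypothetical toy graph: a path 0 - 1 - 2 with unit weights and unit costs.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]], dtype=float)
D_explore = rsp_dissimilarity(path, path, theta=0.1)  # close to a random walk
D_exploit = rsp_dissimilarity(path, path, theta=5.0)  # close to shortest paths
```

On this toy graph the end-to-end dissimilarity moves from roughly the random-walk value towards the shortest-path length 2 as θ grows, which is the interpolation the abstract describes.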

knowledge discovery and data mining | 2007

Graph nodes clustering based on the commute-time kernel

Luh Yen; François Fouss; Christine Decaestecker; Pascal Francq; Marco Saerens

This work presents a kernel method for clustering the nodes of a weighted, undirected graph. The algorithm is based on a two-step procedure. First, the sigmoid commute-time kernel (KCT), providing a similarity measure between any pair of nodes by taking the indirect links into account, is computed from the adjacency matrix of the graph. Then, the nodes of the graph are clustered by performing a kernel k-means or fuzzy k-means on this CT kernel matrix. For this purpose, a new, simple version of the kernel k-means and the kernel fuzzy k-means is introduced. The joint use of the CT kernel matrix and kernel clustering appears to be quite successful. Indeed, it provides good results on a document clustering problem involving the newsgroups database.
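The second step of the procedure, clustering directly on a kernel matrix, can be illustrated with a textbook kernel k-means. This sketch is not the simplified variant introduced in the paper and omits the sigmoid transform; the function name, the explicit initial labels, and the toy data are assumptions chosen to keep the example deterministic.

```python
import numpy as np

def kernel_kmeans(K, initial_labels, n_clusters, max_iter=100):
    """Plain kernel k-means on a precomputed kernel (Gram) matrix K."""
    K = np.asarray(K, dtype=float)
    labels = np.asarray(initial_labels).copy()
    n = K.shape[0]
    for _ in range(max_iter):
        dist = np.full((n, n_clusters), np.inf)
        for k in range(n_clusters):
            members = labels == k
            m = members.sum()
            if m == 0:
                continue  # an empty cluster attracts no points
            # Squared feature-space distance to the centroid of cluster k:
            # K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl, with j, l in cluster k.
            dist[:, k] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Hypothetical toy data: two well-separated groups on a line, linear kernel.
x = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
gram = x @ x.T
labels = kernel_kmeans(gram, initial_labels=[0, 1, 0, 1, 0, 1], n_clusters=2)
```

For graph nodes, the Gram matrix would be replaced by a kernel on the graph, such as the commute-time kernel, since the algorithm only ever reads kernel entries.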

Neural Computation | 2009

Randomized shortest-path problems: Two related models

Marco Saerens; Youssef Achbany; François Fouss; Luh Yen

This letter addresses the problem of designing the transition probabilities of a finite Markov chain (the policy) in order to minimize the expected cost for reaching a destination node from a source node while maintaining a fixed level of entropy spread throughout the network (the exploration). It is motivated by the following scenario. Suppose you have to route agents through a network in some optimal way, for instance, by minimizing the total travel cost (nothing particular up to now: you could use a standard shortest-path algorithm). Suppose, however, that you want to avoid pure deterministic routing policies in order, for instance, to allow some continual exploration of the network, avoid congestion, or avoid complete predictability of your routing strategy. In other words, you want to introduce some randomness or unpredictability in the routing policy (i.e., the routing policy is randomized). This problem, which will be called the randomized shortest-path problem (RSP), is investigated in this work. The global level of randomness of the routing policy is quantified by the expected Shannon entropy spread throughout the network and is provided a priori by the designer. Then, necessary conditions to compute the optimal randomized policy (minimizing the expected routing cost) are derived. Iterating these necessary conditions, reminiscent of Bellman's value iteration equations, allows computing an optimal policy, that is, a set of transition probabilities in each node. Interestingly and surprisingly enough, this first model, while formulated in a totally different framework, is equivalent to Akamatsu's model (1996), appearing in transportation science, for a special choice of the entropy constraint. We therefore revisit Akamatsu's model by recasting it into a sum-over-paths statistical physics formalism allowing easy derivation of all the quantities of interest in an elegant, unified way. For instance, it is shown that the unique optimal policy can be obtained by solving a simple linear system of equations. This second model is therefore more convincing because of its computational efficiency and soundness. Finally, simulation results obtained on simple, illustrative examples show that the models behave as expected.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

The Sum-over-Paths Covariance Kernel: A Novel Covariance Measure between Nodes of a Directed Graph

Amin Mantrach; Luh Yen; Jérôme Callut; Kevin Françoisse; Masashi Shimbo; Marco Saerens

This work introduces a link-based covariance measure between the nodes of a weighted directed graph, where a cost is associated with each arc. To this end, a probability distribution on the (usually infinite) countable set of paths through the graph is defined by minimizing the total expected cost between all pairs of nodes while fixing the total relative entropy spread in the graph. This results in a Boltzmann distribution on the set of paths such that long (high-cost) paths occur with a low probability while short (low-cost) paths occur with a high probability. The sum-over-paths (SoP) covariance measure between nodes is then defined according to this probability distribution: two nodes are considered as highly correlated if they often co-occur on the same (preferably short) paths. The resulting covariance matrix between nodes (say n nodes in total) is a Gram matrix and therefore defines a valid kernel on the graph. It is obtained by inverting an n × n matrix depending on the costs assigned to the arcs. In the same spirit, a betweenness score is also defined, measuring the expected number of times a node occurs on a path. The proposed measures could be used for various graph mining tasks such as computing betweenness centrality, semi-supervised classification of nodes, visualization, etc., as shown in Section 7.

IEEE Transactions on Knowledge and Data Engineering | 2011

A Link Analysis Extension of Correspondence Analysis for Mining Relational Databases

Luh Yen; Marco Saerens; François Fouss

This work introduces a link analysis procedure for discovering relationships in a relational database or a graph, generalizing both simple and multiple correspondence analysis. It is based on a random walk model through the database defining a Markov chain having as many states as elements in the database. Suppose we are interested in analyzing the relationships between some elements (or records) contained in two different tables of the relational database. To this end, in a first step, a reduced, much smaller, Markov chain containing only the elements of interest and preserving the main characteristics of the initial chain, is extracted by stochastic complementation. This reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion map subspace and visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined, and to multiple correspondence analysis when the database takes the form of a simple star-schema. On the other hand, a kernel version of the diffusion map distance, generalizing the basic diffusion map distance to directed graphs, is also introduced and the links with spectral clustering are discussed. Several data sets are analyzed by using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.
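The first step, extracting a reduced Markov chain by stochastic complementation, can be sketched with the standard formula S = P11 + P12 (I - P22)^{-1} P21, where block 1 holds the states of interest. The function name and the toy chain are assumptions for illustration; the subsequent diffusion map projection is not shown.

```python
import numpy as np

def stochastic_complement(P, keep):
    """Reduced transition matrix on the states in `keep`.

    P: row-stochastic transition matrix of the full chain.
    keep: indices of the states of interest (block 1).
    """
    P = np.asarray(P, dtype=float)
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(P.shape[0]), keep)
    P11 = P[np.ix_(keep, keep)]
    P12 = P[np.ix_(keep, drop)]
    P21 = P[np.ix_(drop, keep)]
    P22 = P[np.ix_(drop, drop)]
    # Solve (I - P22) X = P21 instead of forming the inverse explicitly.
    X = np.linalg.solve(np.eye(len(drop)) - P22, P21)
    return P11 + P12 @ X

# Hypothetical toy chain: random walk on the path 0 - 1 - 2.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
reduced = stochastic_complement(P, keep=[0, 2])
```

The reduced matrix is again row-stochastic: it describes the original walk observed only at the moments it visits a retained state.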

Neurocomputing | 2008

Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy

Youssef Achbany; François Fouss; Luh Yen; Alain Pirotte; Marco Saerens

This paper presents a model for tuning continual exploration in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action in that state. Then, the exploration/exploitation tradeoff is formulated as a global optimization problem: find the exploration strategy that minimizes the expected cumulated cost, while maintaining fixed degrees of exploration at the states. In other words, maximize exploitation for constant exploration. This formulation leads to a set of nonlinear iterative equations reminiscent of the value-iteration algorithm and demonstrates that the Boltzmann strategy based on the Q-value is optimal in this sense. Convergence of those equations to a local minimum is proved for a stationary environment. Interestingly, in the deterministic case, when there is no exploration, these equations reduce to the Bellman equations for finding the shortest path. Furthermore, if the graph of states is directed and acyclic, the nonlinear equations can easily be solved by a single backward pass from the destination state. Stochastic shortest-path problems and discounted problems are also studied, and links between our algorithm and the SARSA algorithm are examined. The theoretical results are confirmed by simple simulations showing that the proposed exploration strategy outperforms the ε-greedy strategy.
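The Boltzmann strategy and the entropy-based degree of exploration can be illustrated in a single state as below. This is a minimal sketch, assuming Q-values are expected costs (lower is better) and a hypothetical inverse-temperature parameter `theta`; it does not implement the paper's iterative equations.

```python
import numpy as np

def boltzmann_policy(q_values, theta):
    """Action probabilities proportional to exp(-theta * Q).

    q_values: expected costs of the admissible actions (lower is better).
    theta: inverse temperature; 0 gives a uniform policy, large values
           approach the greedy (minimum-cost) choice.
    """
    q = np.asarray(q_values, dtype=float)
    weights = np.exp(-theta * (q - q.min()))  # shift for numerical stability
    return weights / weights.sum()

def degree_of_exploration(probabilities):
    """Shannon entropy of the action distribution in one state."""
    p = np.asarray(probabilities, dtype=float)
    nonzero = p[p > 0]
    return float(-(nonzero * np.log(nonzero)).sum())

# Hypothetical costs for three admissible actions in one state.
costs = [1.0, 2.0, 3.0]
uniform = boltzmann_policy(costs, theta=0.0)
mild = boltzmann_policy(costs, theta=1.0)
greedy_like = boltzmann_policy(costs, theta=50.0)
```

Raising `theta` lowers the entropy, so fixing a degree of exploration for a state amounts to fixing the value of `theta` used there.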

european conference on machine learning | 2004