Youssef Achbany

Université catholique de Louvain

Publication

Featured research published by Youssef Achbany.

Neural Computation | 2009

Randomized shortest-path problems: Two related models

Marco Saerens; Youssef Achbany; François Fouss; Luh Yen

This letter addresses the problem of designing the transition probabilities of a finite Markov chain (the policy) in order to minimize the expected cost for reaching a destination node from a source node while maintaining a fixed level of entropy spread throughout the network (the exploration). It is motivated by the following scenario. Suppose you have to route agents through a network in some optimal way, for instance, by minimizing the total travel cost. Nothing particular up to now: you could use a standard shortest-path algorithm. Suppose, however, that you want to avoid pure deterministic routing policies in order, for instance, to allow some continual exploration of the network, avoid congestion, or avoid complete predictability of your routing strategy. In other words, you want to introduce some randomness or unpredictability in the routing policy (i.e., the routing policy is randomized). This problem, which will be called the randomized shortest-path problem (RSP), is investigated in this work. The global level of randomness of the routing policy is quantified by the expected Shannon entropy spread throughout the network and is provided a priori by the designer. Then, necessary conditions to compute the optimal randomized policy (minimizing the expected routing cost) are derived. Iterating these necessary conditions, reminiscent of Bellman's value iteration equations, allows computing an optimal policy, that is, a set of transition probabilities in each node. Interestingly and surprisingly enough, this first model, while formulated in a totally different framework, is equivalent to Akamatsu's model (1996), appearing in transportation science, for a special choice of the entropy constraint. We therefore revisit Akamatsu's model by recasting it into a sum-over-paths statistical physics formalism allowing easy derivation of all the quantities of interest in an elegant, unified way. For instance, it is shown that the unique optimal policy can be obtained by solving a simple linear system of equations. This second model is therefore more convincing because of its computational efficiency and soundness. Finally, simulation results obtained on simple, illustrative examples show that the models behave as expected.
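The abstract gives no equations, so the following is only a minimal sketch of how a randomized policy might be obtained from a single linear system. It assumes the sum-over-paths formulation commonly used in the RSP literature: a reference random walk `p_ref`, edge costs `cost`, and an inverse-temperature parameter `theta` that trades exploration against cost. The function name `rsp_policy`, the variable names, and the toy graph are all hypothetical, and the mapping from a prescribed entropy level to `theta` is not covered here.

```python
import numpy as np

def rsp_policy(cost, p_ref, dest, theta):
    """Sketch of a randomized shortest-path policy toward node `dest`.

    Assumed formulation (not taken from the abstract):
      W = p_ref * exp(-theta * cost), with the destination made absorbing,
      z solves (I - W) z = e_dest,
      policy[i, j] = W[i, j] * z[j] / z[i].
    Every node is assumed to be able to reach `dest`, so that z > 0.
    """
    n = cost.shape[0]
    w = p_ref * np.exp(-theta * cost)
    w[dest, :] = 0.0                      # absorbing destination
    e_dest = np.zeros(n)
    e_dest[dest] = 1.0
    z = np.linalg.solve(np.eye(n) - w, e_dest)
    policy = w * z[None, :] / z[:, None]
    return policy, z

# Toy directed graph with 4 nodes; node 3 is the destination.
# Missing edges have zero reference probability, so their cost entry is unused.
cost = np.array([
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])
p_ref = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])

policy_greedy, _ = rsp_policy(cost, p_ref, dest=3, theta=20.0)   # little exploration
policy_random, _ = rsp_policy(cost, p_ref, dest=3, theta=1e-6)   # close to p_ref
print(np.round(policy_greedy, 3))
print(np.round(policy_random, 3))
```

Under these assumptions, a large `theta` concentrates the policy on the cheapest route (0 → 1 → 3 in the toy graph), while a `theta` near zero leaves it close to the reference random walk.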

Neurocomputing | 2008

Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy

Youssef Achbany; François Fouss; Luh Yen; Alain Pirotte; Marco Saerens

This paper presents a model that allows continual exploration to be tuned in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action in that state. Then, the exploration/exploitation tradeoff is formulated as a global optimization problem: find the exploration strategy that minimizes the expected cumulated cost, while maintaining fixed degrees of exploration at the states. In other words, maximize exploitation for constant exploration. This formulation leads to a set of nonlinear iterative equations reminiscent of the value-iteration algorithm and demonstrates that the Boltzmann strategy based on the Q-value is optimal in this sense. Convergence of those equations to a local minimum is proved for a stationary environment. Interestingly, in the deterministic case, when there is no exploration, these equations reduce to the Bellman equations for finding the shortest path. Furthermore, if the graph of states is directed and acyclic, the nonlinear equations can easily be solved by a single backward pass from the destination state. Stochastic shortest-path problems and discounted problems are also studied, and links between our algorithm and the SARSA algorithm are examined. The theoretical results are confirmed by simple simulations showing that the proposed exploration strategy outperforms the ε-greedy strategy.
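As an illustration of the idea of a fixed degree of exploration in a single state, here is a minimal sketch, not the paper's algorithm. It assumes that Q-values are expected costs (lower is better), that the Boltzmann strategy takes the usual form p(a) ∝ exp(−θ·Q(a)), and that the parameter θ for a prescribed entropy can be found by bisection, since the entropy decreases as θ grows. The function names and the example Q-values are hypothetical.

```python
import numpy as np

def boltzmann(q_costs, theta):
    """Boltzmann action probabilities for cost-valued Q: p(a) ∝ exp(-theta * Q(a))."""
    # Subtracting the minimum keeps the exponentials numerically stable.
    weights = np.exp(-theta * (q_costs - q_costs.min()))
    return weights / weights.sum()

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability actions."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def theta_for_entropy(q_costs, target, lo=0.0, hi=1e3, iters=100):
    """Bisection on theta so that the policy entropy matches `target`.

    Relies on the entropy being non-increasing in theta; `target` is assumed
    to lie between the entropies obtained at `hi` and at `lo`.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(boltzmann(q_costs, mid)) > target:
            lo = mid          # still too exploratory: increase theta
        else:
            hi = mid          # too greedy: decrease theta
    return 0.5 * (lo + hi)

q_costs = np.array([1.0, 2.0, 4.0])      # hypothetical Q-values for three actions
theta = theta_for_entropy(q_costs, target=0.5)
probs = boltzmann(q_costs, theta)
print(theta, probs, entropy(probs))
```

In this sketch a target entropy of zero corresponds to purely greedy action selection and a target of ln(3) to uniform exploration over the three actions; values in between interpolate between the two.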

Adaptive Agents and Multi-Agents Systems | 2007

Dynamic task allocation within an open service-oriented MAS architecture

Ivan Jureta; Stéphane Faulkner; Youssef Achbany; Marco Saerens

A MAS architecture consisting of service centers is proposed. Within each service center, a mediator coordinates service delivery by allocating individual tasks to corresponding task specialist agents depending on their prior performance while anticipating performance of newly entering agents. By basing mediator behavior on a novel multicriteria-driven (including quality of service, deadline, reputation, cost, and user preferences) reinforcement learning algorithm, integrating the exploitation of acquired knowledge with optimal, undirected, continual exploration, adaptability to changes in agent availability and performance is ensured. The reported experiments indicate the algorithm behaves as expected and outperforms two standard approaches.

International Conference on Web Services | 2007