Yann-Michaël De Hauwere
Vrije Universiteit Brussel
Publications
Featured research published by Yann-Michaël De Hauwere.
Archive | 2012
Ann Nowé; Peter Vrancx; Yann-Michaël De Hauwere
Reinforcement Learning was originally developed for Markov Decision Processes (MDPs). It allows a single agent to learn a policy that maximizes a possibly delayed reward signal in a stochastic stationary environment. It guarantees convergence to the optimal policy, provided that the agent can experiment sufficiently and that the environment in which it is operating is Markovian. However, when multiple agents apply reinforcement learning in a shared environment, the resulting system may no longer fit the MDP model. In such systems, the optimal policy of an agent depends not only on the environment, but also on the policies of the other agents. These situations arise naturally in a variety of domains, such as robotics, telecommunications, economics, distributed control, auctions and traffic light control. In these domains multi-agent learning is used, either because of the complexity of the domain or because control is inherently decentralized. In such systems it is important that agents are capable of discovering good solutions to the problem at hand, either by coordinating with other learners or by competing with them. This chapter focuses on the application of reinforcement learning techniques in multi-agent systems. We describe a basic learning framework based on the economic research into game theory, and illustrate the additional complexity that arises in such systems. We also describe a representative selection of algorithms for the different areas of multi-agent reinforcement learning research.
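To make the single-agent baseline that this framework builds on concrete, below is a minimal sketch of tabular Q-learning for an MDP. The environment interface (reset, step, actions) and the hyperparameters are illustrative assumptions, not part of the chapter itself.

```python
# Minimal tabular Q-learning sketch for a single agent in an MDP.
# The environment object is an assumed interface: env.actions is a list of
# actions, env.reset() returns a start state, env.step(a) returns
# (next_state, reward, done).
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)  # (state, action) -> value estimate

    def greedy(state):
        return max(env.actions, key=lambda a: q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly exploit the current estimates, sometimes explore
            action = random.choice(env.actions) if random.random() < epsilon else greedy(state)
            next_state, reward, done = env.step(action)
            # bootstrap towards the immediate reward plus the discounted best next value
            best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```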
european workshop on multi-agent systems | 2010
Yann-Michaël De Hauwere; Peter Vrancx; Ann Nowé
A major challenge in multi-agent reinforcement learning remains dealing with the large state spaces typically associated with realistic multi-agent systems. As the state space grows, agent policies become increasingly complex and learning slows down. Advanced single-agent techniques are already very capable of learning optimal policies in large unknown environments. When multiple agents are present, however, we are challenged by an increase of the state-action space that is exponential in the number of agents, even though these agents do not always interfere with each other, so their presence need not always be included in the state information of the other agents. A solution to this problem lies in the use of generalized learning automata (GLA). In this paper we first demonstrate how GLA can help take the correct actions in large unknown multi-agent environments. Furthermore, we introduce a framework that deals with this issue by learning when to observe other agents. We also present an implementation of our framework, called 2observe, which we apply to a set of gridworld problems. Finally, we demonstrate that our approach is capable of transferring its knowledge to new agents entering the environment.
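As a rough illustration of the generalized learning automata mentioned above, the sketch below shows a two-action GLA whose action probabilities are a logistic function of state features and whose parameters are updated with a reward-modulated rule; the feature encoding, learning rate and class name are assumptions made purely for illustration, not the paper's exact construction.

```python
# Hedged sketch of a two-action generalized learning automaton (GLA).
import math
import random

class BinaryGLA:
    def __init__(self, n_features, learning_rate=0.05):
        self.u = [0.0] * n_features   # parameter vector over state features
        self.lr = learning_rate

    def prob_action1(self, x):
        # logistic (sigmoid) mapping from state features to the probability of action 1
        z = sum(ui * xi for ui, xi in zip(self.u, x))
        return 1.0 / (1.0 + math.exp(-z))

    def act(self, x):
        return 1 if random.random() < self.prob_action1(x) else 0

    def update(self, x, action, reward):
        # reward-modulated (REINFORCE-style) update: rewarded actions become more
        # probable in states with similar features, so the automaton generalizes
        # over the state space instead of storing one entry per state
        p = self.prob_action1(x)
        for i, xi in enumerate(x):
            self.u[i] += self.lr * reward * (action - p) * xi
```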
adaptive and learning agents | 2011
Yann-Michaël De Hauwere; Peter Vrancx; Ann Nowé
One of the main advantages of Reinforcement Learning is its ability to deal with a delayed reward signal. Through appropriate backup operations, rewards are propagated back through the state space. This allows agents to learn to take the action that results in the highest future (discounted) reward, even if that action yields a suboptimal immediate reward in the current state. In a multi-agent environment, agents can use the same principles as in single-agent RL, but have to apply them in the complete joint-state joint-action space to guarantee optimality. Learning in such a state space can, however, be very slow. In this paper we present our approach to mitigating this problem. Future Coordinating Q-learning (FCQ-learning) detects strategic interactions between agents several timesteps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but several timesteps before this is reflected in the reward signal. In these states, the algorithm augments the state information to include information about other agents, which is then used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning using sparse interactions.
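The action-selection idea behind such sparse-interaction learners can be sketched as follows: an agent normally acts on its local Q-table, but in states flagged as requiring coordination it switches to an augmented Q-table keyed on its own state plus the other agent's state. The detection mechanism itself is omitted here; the function names and the flagged_states set are illustrative assumptions.

```python
# Hedged sketch of sparse-interaction action selection (CQ-/FCQ-style).
from collections import defaultdict

def select_action(local_q, augmented_q, flagged_states, my_state, other_state, actions):
    if my_state in flagged_states:
        # coordination needed: condition on the other agent's state as well
        return max(actions, key=lambda a: augmented_q[((my_state, other_state), a)])
    # no interaction expected: act on local information only
    return max(actions, key=lambda a: local_q[(my_state, a)])

# both tables default to 0.0 for unseen entries
local_q = defaultdict(float)
augmented_q = defaultdict(float)
```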
north american fuzzy information processing society | 2012
Tim Brys; Yann-Michaël De Hauwere; Martine De Cock; Ann Nowé
Satisfiability in propositional logic is well researched and many approaches to checking and solving exist. In infinite-valued or fuzzy logics, however, there have only recently been attempts at developing methods for solving satisfiability. In this paper, we propose a new incomplete solver, based on a class of continuous optimization algorithms called evolution strategies. We show experimentally that our method is an important contribution to the state of the art in incomplete fuzzy-SAT solvers.
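As a rough sketch of how an evolution strategy can search for fuzzy-SAT models, the code below evolves real-valued assignments in [0, 1] whose fitness is the formula's truth degree; the (mu + lambda) scheme, mutation strength and example formula are illustrative assumptions rather than the paper's actual solver.

```python
# Hedged sketch: a simple (mu + lambda) evolution strategy for fuzzy satisfiability.
import random

def evolve(truth_degree, n_vars, mu=5, lam=20, sigma=0.1, generations=200):
    # truth_degree maps a list of n_vars values in [0, 1] to a truth degree in [0, 1]
    population = [[random.random() for _ in range(n_vars)] for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = random.choice(population)
            # Gaussian mutation, clipped back into the unit interval
            offspring.append([min(1.0, max(0.0, v + random.gauss(0.0, sigma))) for v in parent])
        population = sorted(population + offspring, key=truth_degree, reverse=True)[:mu]
        if truth_degree(population[0]) >= 1.0:
            break  # assignment fully satisfies the formula
    return population[0]

# Example: the Lukasiewicz conjunction max(0, x + y - 1) is fully satisfied at x = y = 1
best = evolve(lambda v: max(0.0, v[0] + v[1] - 1.0), n_vars=2)
```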
Agent and Multi-agent Technology for Internet and Enterprise Systems | 2010
Yann-Michaël De Hauwere; Peter Vrancx; Ann Nowé
A major challenge in multi-agent reinforcement learning remains dealing with the large state spaces typically associated with realistic multi-agent systems. As the state space grows, agent policies become more and more complex and learning slows down. The presence of possibly redundant information is one of the causes of this issue. Current single-agent techniques are already very capable of learning optimal policies in large unknown environments. When multiple agents are present, however, we are challenged by an increase of the state space which is exponential in the number of agents. A solution to this problem lies in the use of Generalized Learning Automata (GLA). In this chapter we first demonstrate how GLA can help take the correct actions in large unknown environments. Second, we introduce a general framework for multi-agent learning, in which learning happens on two separate layers and agents learn when to observe each other. Within this framework we introduce a new algorithm, called 2observe, which uses a GLA-based approach to distinguish between high-risk states, where the agents have to take each other's presence into account, and low-risk states, where they can act independently. Finally, we apply this algorithm to a gridworld problem because of its similarities to real-world problems such as autonomous robot control.
Knowledge Engineering Review | 2016
Yann-Michaël De Hauwere; Sam Devlin; Daniel Kudenko; Ann Nowé
Potential-based reward shaping is a commonly used approach in reinforcement learning to direct exploration based on prior knowledge. In both single- and multi-agent settings this technique speeds up learning without losing any theoretical convergence guarantees. However, if speed-ups through reward shaping are to be achieved in multi-agent environments, a different shaping signal should be used for each context in which agents have a different subgoal or are involved in a different interaction situation. This paper describes the use of context-aware potential functions in a multi-agent system in which the interactions between agents are sparse. This means that, unknown to the agents a priori, the interactions between the agents only occur sporadically in certain regions of the state space. During these interactions, agents need to coordinate in order to reach the globally optimal solution. We demonstrate how different reward shaping functions can be used on top of Future Coordinating Q-learning (FCQ-learning), an algorithm capable of automatically detecting when agents should take each other into consideration. Using FCQ-learning, coordination problems can even be anticipated before they occur, allowing them to be solved in time. We evaluate our approach on a range of gridworld problems, as well as a simulation of air traffic control.
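For reference, a minimal sketch of how a potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s) is added to a standard Q-learning update is given below; the potential function phi is an assumed placeholder, and a context-aware variant would simply switch between different potentials depending on the detected interaction situation.

```python
# Hedged sketch of a Q-learning update with potential-based reward shaping.
def shaped_q_update(q, phi, state, action, reward, next_state, next_actions,
                    alpha=0.1, gamma=0.95):
    # shaping term F(s, s') = gamma * Phi(s') - Phi(s);
    # adding it to the reward leaves the set of optimal policies unchanged
    shaping = gamma * phi(next_state) - phi(state)
    best_next = max(q.get((next_state, a), 0.0) for a in next_actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + shaping + gamma * best_next - old)
```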
international conference on intelligent transportation systems | 2015
Ivomar Brito Soares; Yann-Michaël De Hauwere; Kris Januarius; Tim Brys; Thierry Salvant; Ann Nowé
This paper considers how existing Reinforcement Learning (RL) techniques can be used to model and learn solutions for large-scale Multi-Agent Systems (MAS). The large-scale MAS of interest is the movement of departure flights in large airports, commonly known as the Departure MANagement (DMAN) problem. A particular DMAN subproblem is how to respect Central Flow Management Unit (CFMU) take-off time windows, which are time windows planned by flow management authorities that the take-off times of departure flights have to respect. An RL model to handle this problem is proposed, including the Markov Decision Process (MDP) definition, the behavior of the learning agents and how the problem can be modeled using RL, ranging from the simplest to the full RL problem. Several experiments illustrate the performance of the machine learning algorithm, with a comparison to how these problems are commonly handled by airport controllers today. The environment in which the agents learn is provided by the Fast Time Simulator (FTS) AirTOp, and the airport case study is John F. Kennedy International Airport (KJFK) in New York City, USA, one of the busiest airports in the world.
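A heavily simplified illustration of the window-compliance objective is sketched below: zero reward when the flight takes off inside its CFMU window and an increasing penalty the further the take-off falls outside it. The actual MDP in the paper (states, actions, coupling with AirTOp) is richer; this sketch only conveys the compliance idea and all names are assumptions.

```python
# Hedged sketch of a reward signal for respecting a CFMU take-off window.
def takeoff_reward(takeoff_time, window_start, window_end):
    # zero reward inside the window, increasing penalty outside of it
    if window_start <= takeoff_time <= window_end:
        return 0.0
    miss = (window_start - takeoff_time) if takeoff_time < window_start else (takeoff_time - window_end)
    return -float(miss)
```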
ieee symposium series on computational intelligence | 2015
Kevin Van Vaerenbergh; Yann-Michaël De Hauwere; Bruno Depraetere; Kristof Van Moffaert; Ann Nowé
Heating a home is an energy-consuming task. Most thermostats are programmed to turn on the heating at a particular time in order to reach and maintain a predefined target temperature. A lot of energy is wasted if these thermostats are not configured optimally, since most of them do not take energy consumption into account and are only concerned with reaching the target temperature. In this paper we present a learning approach based on policy gradient with parameter estimations to balance user comfort with energy consumption. Our results show that our approach is capable of offering good trade-off solutions between these objectives.
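The sketch below illustrates one way a parameter-based policy-gradient loop for such a controller could look: the policy is a small parameter vector (for example, preheat lead time and setpoint), the return trades off energy use against discomfort, and the gradient is estimated from perturbed rollouts. The simulate_day black box, the finite-difference estimator and all weights are assumptions for illustration, not the paper's exact method.

```python
# Hedged sketch of a finite-difference policy-gradient loop for a thermostat controller.
def evaluate(params, simulate_day, energy_weight=1.0, comfort_weight=1.0):
    # simulate_day is an assumed black-box simulator returning
    # (energy_used, comfort_penalty) for one day under the given parameters
    energy, discomfort = simulate_day(params)
    return -(energy_weight * energy + comfort_weight * discomfort)

def finite_difference_pg(simulate_day, params, step=0.5, lr=0.01, iterations=100):
    for _ in range(iterations):
        gradient = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += step
            down[i] -= step
            gradient.append((evaluate(up, simulate_day) - evaluate(down, simulate_day)) / (2 * step))
        # gradient ascent on the combined comfort/energy objective
        params = [p + lr * g for p, g in zip(params, gradient)]
    return params
```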
european workshop on multi-agent systems | 2011
Tim Brys; Yann-Michaël De Hauwere; Ann Nowé; Peter Vrancx
In cooperative multi-agent systems, group performance often depends more on the interactions between team members than on the performance of any individual agent. Hence, coordination among agents is essential to optimize the group strategy. One solution which is common in the literature is to let the agents learn in a joint action space. Joint Action Learning (JAL) enables agents to explicitly take into account the actions of other agents, but has the significant drawback that the action space in which the agents must learn scales exponentially in the number of agents. Local coordination is a way for a team to coordinate while keeping communication and computational complexity low. It allows the exploitation of a specific dependency structure underlying the problem, such as tight couplings between specific agents. In this paper we investigate a novel approach to local coordination, in which agents learn this dependency structure, resulting in coordination which is beneficial to the group performance. We evaluate our approach in the context of online distributed constraint optimization problems.
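A rough sketch of the idea of learning with whom to coordinate is given below: each agent keeps a simple value estimate per potential partner for "coordinate" versus "act alone", updated from the observed team reward, and only forms local couplings that have looked beneficial so far. This is only an illustration of learning a dependency structure, not the paper's exact algorithm; the class and its update rule are assumptions.

```python
# Hedged sketch of learning a coordination (dependency) structure.
class CoordinationLearner:
    def __init__(self, partners, alpha=0.1):
        # value[j][True]  ~ estimated team payoff when coordinating with partner j
        # value[j][False] ~ estimated team payoff when acting independently of j
        self.value = {j: {True: 0.0, False: 0.0} for j in partners}
        self.alpha = alpha

    def choose_partners(self):
        # coordinate only with partners for whom coordination has looked beneficial so far
        return {j for j, v in self.value.items() if v[True] >= v[False]}

    def update(self, chosen, reward):
        for j, v in self.value.items():
            coordinated = j in chosen
            v[coordinated] += self.alpha * (reward - v[coordinated])
```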
international conference on knowledge based and intelligent information and engineering systems | 2008
Yann-Michaël De Hauwere; Peter Vrancx; Ann Nowé
A key problem in multi-agent reinforcement learning remains dealing with the large state spaces typically associated with realistic distributed agent systems. As the state space grows, agent policies become more and more complex and learning slows down. One possible solution for an agent to continue learning in these large-scale systems is to learn a policy which generalizes over states, rather than trying to map each individual state to an action. In this paper we present a multi-agent learning approach capable of aggregating states, using simple reinforcement learners called learning automata (LA). Independent learning automata have already been shown to perform well in multi-agent environments. Previously we proposed LA-based multi-agent algorithms capable of finding a Nash equilibrium between agent policies. In these algorithms, however, one LA per agent is associated with each system state; as such, the approach is limited to discrete state spaces. Furthermore, when the number of states increases, the number of automata also increases and the learning speed of the system slows down. To deal with this problem, we propose to use Generalized Learning Automata (GLA), which are capable of identifying regions within the state space with the same optimal action, and thus of aggregating states. We analyze the behaviour of GLA in a multi-agent setting and demonstrate results on a set of sample problems.