Publications


Featured research published by Diederik M. Roijers.


Synthesis Lectures on Artificial Intelligence and Machine Learning | 2017

Multi-Objective Decision Making

Diederik M. Roijers; Shimon Whiteson

Many real-world decision problems have multiple objectives. For example, when choosing a medical treatment plan, we want to maximize the efficacy of the treatment, but also minimize the side effects. These objectives typically conflict, e.g., we can often increase the efficacy of the treatment, but at the cost of more severe side effects. In this book, we outline how to deal with multiple objectives in decision-theoretic planning and reinforcement learning algorithms. To illustrate this, we employ the popular problem classes of multi-objective Markov decision processes (MOMDPs) and multi-objective coordination graphs (MO-CoGs). First, we discuss different use cases for multi-objective decision making, and why they often necessitate explicitly multi-objective algorithms. We advocate a utility-based approach to multi-objective decision making, i.e., that what constitutes an optimal solution to a multi-objective decision problem should be derived from the available information about user utility. We show how different assumptions about user utility and what types of policies are allowed lead to different solution concepts, which we outline in a taxonomy of multi-objective decision problems. Second, we show how to create new methods for multi-objective decision making using existing single-objective methods as a basis. Focusing on planning, we describe two ways of creating multi-objective algorithms: in the inner loop approach, the inner workings of a single-objective method are adapted to work with multi-objective solution concepts; in the outer loop approach, a wrapper is created around a single-objective method that solves the multi-objective problem as a series of single-objective problems. After discussing the creation of such methods for the planning setting, we discuss how these approaches apply to the learning setting. Next, we discuss three promising application domains for multi-objective decision making algorithms: energy, health, and infrastructure and transportation. Finally, we conclude by outlining important open problems and promising future directions.
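
The outer loop approach described here lends itself to a compact illustration. Below is a minimal sketch, assuming three hypothetical treatment plans with invented [efficacy, -side-effects] value vectors and a linear scalarization: a wrapper repeatedly calls a single-objective solver, one weight vector at a time, to collect an approximate convex coverage set. None of the names or numbers come from the book.

```python
import numpy as np

# Candidate solutions and their multi-objective value vectors
# ([efficacy, -side_effects]); the numbers are invented for illustration.
value_vectors = {
    "plan_a": np.array([0.9, -0.8]),   # very effective, severe side effects
    "plan_b": np.array([0.6, -0.3]),
    "plan_c": np.array([0.2, -0.1]),   # mild on both objectives
}

def solve_scalarised(w):
    """The wrapped single-objective solver: best plan under utility w . V."""
    return max(value_vectors, key=lambda p: w @ value_vectors[p])

def outer_loop(n_weights=101):
    """Solve the multi-objective problem as a series of single-objective
    problems, one per weight vector, collecting an approximate convex
    coverage set (CCS): every solution optimal for some weighting."""
    ccs = {}
    for w1 in np.linspace(0.0, 1.0, n_weights):
        best = solve_scalarised(np.array([w1, 1.0 - w1]))
        ccs[best] = value_vectors[best]
    return ccs

print(outer_loop())  # plans that are optimal for at least one user utility
```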


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016

Balancing Relevance Criteria through Multi-Objective Optimization

Joost van Doorn; Daan Odijk; Diederik M. Roijers; Maarten de Rijke

Offline evaluation of information retrieval systems typically focuses on a single effectiveness measure that models the utility for a typical user. Such a measure usually combines a behavior-based rank discount with a notion of document utility that captures the single relevance criterion of topicality. However, for individual users relevance criteria such as credibility, reputability or readability can strongly impact the utility. Also, for different information needs the utility can be a different mixture of these criteria. Because of the focus on single metrics, offline optimization of IR systems does not account for different preferences in balancing relevance criteria. We propose to mitigate this by viewing multiple relevance criteria as objectives and learning a set of rankers that provide different trade-offs w.r.t. these objectives. We model document utility within a gain-based evaluation framework as a weighted combination of relevance criteria. Using the learned set, we are able to make an informed decision based on the values of the rankers and a preference w.r.t. the relevance criteria. On a dataset annotated for readability and a web search dataset annotated for sub-topic relevance, we demonstrate how trade-offs between relevance criteria can be made explicit, and show that several distinct trade-offs are available.
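
A small sketch of the gain-based utility model this abstract describes may help: document gain as a weighted combination of relevance criteria, combined with a log2 rank discount. The documents, criteria scores, and weight vectors below are invented for illustration; the paper's datasets and exact measure differ.

```python
import itertools, math

# Per-document scores on two relevance criteria: (topicality, readability).
docs = {"d1": (3, 0), "d2": (2, 2), "d3": (0, 3)}

def gain(doc, w):
    """Document utility as a weighted combination of relevance criteria."""
    return sum(wi * ci for wi, ci in zip(w, docs[doc]))

def dcg(ranking, w):
    """Gain-based offline measure: criterion-weighted gain with a
    behavior-based log2 rank discount."""
    return sum(gain(d, w) / math.log2(r + 2) for r, d in enumerate(ranking))

# Different preference weightings over the criteria yield different optimal
# rankings, making the trade-off between relevance criteria explicit.
for w in [(1.0, 0.0), (0.2, 0.8)]:
    best = max(itertools.permutations(docs), key=lambda rk: dcg(rk, w))
    print(w, "->", best)
```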


Computational Intelligence and Games | 2016

Monte Carlo Tree Search with options for general video game playing

Maarten de Waard; Diederik M. Roijers; S.C.J. Bakkes

General video game playing is a challenging research area in which the goal is to find one algorithm that can play many games successfully. “Monte Carlo Tree Search” (MCTS) is a popular algorithm that has often been used for this purpose. It incrementally builds a search tree based on observed states after applying actions. However, the MCTS algorithm always plans over actions and does not incorporate any higher-level planning, as one would expect from a human player. Furthermore, although many games have similar game dynamics, often no prior knowledge is available to general video game playing algorithms. In this paper, we introduce a new algorithm called “Option Monte Carlo Tree Search” (O-MCTS). It offers general video game knowledge and high-level planning in the form of “options”, which are action sequences aimed at achieving a specific subgoal. Additionally, we introduce “Option Learning MCTS” (OL-MCTS), which applies a progressive widening technique to the expected returns of options in order to focus exploration on fruitful parts of the search tree. Our new algorithms are compared to MCTS on a diverse set of twenty-eight games from the General Video Game AI competition. Our results indicate that by using MCTS's efficient tree searching technique on options, O-MCTS outperforms MCTS on most of the games, especially those in which a certain subgoal has to be reached before the game can be won. Lastly, we show that OL-MCTS improves its performance on specific games by learning expected values for options and shifting its bias toward higher-valued options.
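
To make the notion of options concrete, here is a much-simplified sketch: options as named action sequences aimed at a subgoal, evaluated by flat Monte Carlo rollouts. This is not the full O-MCTS tree search; the toy corridor environment, option set, and scoring rule are assumptions for illustration only.

```python
import random

random.seed(0)
GOAL = 7  # the subgoal: reach position 7 on a line

def step(pos, action):
    """Primitive dynamics: the intended move succeeds 90% of the time."""
    move = action if random.random() < 0.9 else -action
    return max(0, pos + move)

# Options: named action sequences aimed at a specific subgoal.
options = {
    "nudge_right": [1],
    "dash_right": [1, 1, 1],
    "retreat": [-1, -1],
}

def rollout_option(pos, actions):
    """Execute an option's action sequence from a given position."""
    for a in actions:
        pos = step(pos, a)
    return pos

def best_option(pos, n_sim=200):
    """Flat Monte Carlo over options: pick the option whose simulated
    outcomes end closest to the subgoal on average."""
    def score(name):
        total = sum(-abs(GOAL - rollout_option(pos, options[name]))
                    for _ in range(n_sim))
        return total / n_sim
    return max(options, key=score)

print(best_option(pos=3))  # plans at the level of options, not single moves
```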


Algorithmic Decision Theory | 2017

Interactive Thompson Sampling for Multi-objective Multi-armed Bandits

Diederik M. Roijers; Luisa M. Zintgraf; Ann Nowé

In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solution sets for unknown utility functions of users, based on the stochastic reward vectors only. In online MORL, on the other hand, the agent will often be able to elicit preferences from the user, enabling it to learn about the utility function of its user directly. In this paper, we study online MORL with user interaction employing the multi-objective multi-armed bandit (MOMAB) setting, perhaps the most fundamental MORL setting. We use Bayesian learning algorithms to learn about the environment and the user simultaneously. Specifically, we propose two algorithms: Utility-MAP UCB (umap-UCB) and Interactive Thompson Sampling (ITS), and show empirically that the performance of these algorithms in terms of regret closely approximates the regret of UCB and regular Thompson sampling provided with the ground truth utility function of the user from the start, and that ITS outperforms umap-UCB.
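
A condensed sketch of the core idea, learning the environment and the user's utility simultaneously, is given below. The Gaussian reward posterior, the discrete grid over utility weights, the crude comparison likelihood, and the fixed query schedule are all illustrative assumptions that simplify the ITS algorithm considerably.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])  # 3 arms, 2 objectives
true_w = np.array([0.3, 0.7])                                # hidden user utility

# Discrete posterior over candidate linear utility weight vectors.
w_grid = np.stack([np.linspace(0, 1, 21), 1 - np.linspace(0, 1, 21)], axis=1)
w_post = np.full(len(w_grid), 1.0 / len(w_grid))

counts = np.ones((3, 1))          # pseudo-counts per arm
sums = np.zeros((3, 2))           # summed reward vectors per arm

for t in range(500):
    # Thompson step: sample arm means and a utility vector, act greedily.
    sampled = rng.normal(sums / counts, 1.0 / np.sqrt(counts))
    w = w_grid[rng.choice(len(w_grid), p=w_post)]
    arm = int(np.argmax(sampled @ w))
    counts[arm] += 1
    sums[arm] += rng.normal(true_means[arm], 0.1)

    # Occasionally ask the user to compare two outcome vectors, and reweight
    # utility candidates by whether they agree with the answer.
    if t % 25 == 0:
        a, b = rng.normal(true_means[0], 0.1), rng.normal(true_means[2], 0.1)
        answer = (true_w @ a) > (true_w @ b)         # simulated user response
        agree = ((w_grid @ a) > (w_grid @ b)) == answer
        w_post = w_post * np.where(agree, 0.9, 0.1)
        w_post /= w_post.sum()

print("posterior mean utility weights:", w_post @ w_grid)
```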


Adaptive Agents and Multi-Agent Systems | 2017

Efficient Evaluation of Influenza Mitigation Strategies Using Preventive Bandits

Pieter Libin; Timothy Verstraeten; Kristof Theys; Diederik M. Roijers; Peter Vrancx; Ann Nowé

Pandemic influenza has the epidemic potential to kill millions of people. While different preventive measures exist, it remains challenging to implement them in an effective and efficient way. To improve preventive strategies, it is necessary to thoroughly understand their impact on the complex dynamics of influenza epidemics. To this end, epidemiological models provide an essential tool to evaluate such strategies in silico. Epidemiological models are frequently used to assist the decision making concerning the mitigation of ongoing epidemics. Therefore, rapidly identifying the most promising preventive strategies is crucial to adequately inform public health officials. To this end, we formulate the evaluation of prevention strategies as a multi-armed bandit problem. Through experiments, we demonstrate that it is possible to identify the optimal strategy using only a limited number of model evaluations, even if there is a large number of preventive strategies to consider.
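
The bandit formulation can be sketched compactly: each arm is a preventive strategy, and one pull is one stochastic model run. The simulator stub and the successive-elimination rule below are assumptions for illustration; the paper's epidemiological model and bandit algorithm are more sophisticated.

```python
import random

random.seed(0)

def simulate_epidemic(effect):
    """Stub for one in-silico epidemic run: a noisy attack rate to minimize."""
    return min(1.0, max(0.0, random.gauss(0.5 - effect, 0.05)))

strategies = {"none": 0.0, "school_closure": 0.15, "vaccination": 0.25}

def best_strategy(max_rounds=30):
    """Successive elimination: drop clearly inferior strategies early, so the
    best arm is identified with few expensive model evaluations."""
    outcomes = {s: [] for s in strategies}
    alive = set(strategies)
    for _ in range(max_rounds):
        for s in alive:
            outcomes[s].append(simulate_epidemic(strategies[s]))
        n = len(outcomes[next(iter(alive))])
        means = {s: sum(outcomes[s]) / n for s in alive}
        bound = 0.2 / n ** 0.5                  # heuristic confidence radius
        alive = {s for s in alive if means[s] <= min(means.values()) + bound}
        if len(alive) == 1:
            break
    return min(alive, key=lambda s: sum(outcomes[s]) / len(outcomes[s]))

print(best_strategy())  # finds 'vaccination' after a handful of model runs
```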


Algorithmic Decision Theory | 2017

Multi-criteria Coalition Formation Games

Ayumi Igarashi; Diederik M. Roijers

When forming coalitions, agents have different utilities per coalition. Game-theoretic approaches typically assume that the scalar utility for each agent for each coalition is public information. However, we argue that this is not a realistic assumption, as agents may not want to divulge this information or are even incapable of expressing it. To mitigate this, we propose the multi-criteria coalition formation game (MC2FG) model, in which there are different publicly available quality metrics (corresponding to different criteria) for which a value is publicly available for each coalition. The agents have private utility functions that determine their preferences with respect to these criteria, and thus also with respect to the different coalitions. Assuming that we can ask agents to compare two coalitions, we propose a heuristic (best response) algorithm for finding stable partitions in MC2FGs: local stability search (LSS). We show that while theoretically individually stable partitions need not exist in MC2FGs in general, empirically stable partitions can be found. Furthermore, we show that we can find individually stable partitions after asking only a small number of comparisons, which is highly important for applying this model in practice.
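
A toy version of the search loop might look as follows: agents are queried only through pairwise coalition comparisons, and best-response moves are applied until no agent wants to deviate. The random quality vectors and linear private utilities are invented, and this sketch ignores the consent of the coalition being joined, a simplification relative to the paper's notion of individual stability.

```python
import itertools, random

random.seed(1)
agents = range(4)

# Publicly known multi-criteria quality vectors, one per possible coalition.
quality = {frozenset(c): (random.random(), random.random())
           for r in range(1, 5) for c in itertools.combinations(agents, r)}

# Private utility weights: used only to answer comparisons, never revealed.
weights = {i: (random.random(), random.random()) for i in agents}

def prefers(agent, coal_a, coal_b):
    """The only query we may ask an agent: compare two coalitions."""
    u = lambda c: sum(w * q for w, q in zip(weights[agent], quality[c]))
    return u(coal_a) > u(coal_b)

def local_stability_search(partition):
    """Best-response moves driven by pairwise comparisons, with an iteration
    cap since a stable partition need not exist in general."""
    queries = 0
    for _ in range(100):
        moved = False
        for i in agents:
            cur = next(c for c in partition if i in c)
            for target in [c for c in partition if c != cur] + [frozenset()]:
                queries += 1
                if prefers(i, target | {i}, cur):
                    # Move agent i from its current coalition into the target.
                    partition = [c for c in partition if c not in (cur, target)]
                    partition += [c for c in (cur - {i}, target | {i}) if c]
                    moved = True
                    break
        if not moved:
            break
    return partition, queries

print(local_stability_search([frozenset({i}) for i in agents]))
```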


AI Matters | 2016

Multi-objective decision-theoretic planning: abstract

Diederik M. Roijers

Decision making is hard. It often requires reasoning about uncertain environments, partial observability and action spaces that are too large to enumerate. In such tasks decision-theoretic agents can often assist us. In most research on decision-theoretic agents, the desirability of actions and their effects is codified in a scalar reward function. However, many real-world decision problems have multiple objectives. In such cases the problem is more naturally expressed using a vector-valued reward function, leading to a multi-objective decision problem (MODP).
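
The contrast between scalar and vector-valued reward can be shown in a few lines. The nonlinear utility function below is a hypothetical example; the point is that when user utility over the vector return is unknown or nonlinear, the problem cannot simply be collapsed to a scalar reward in advance.

```python
import numpy as np

# Scalar feedback (classic MDP) vs. vector feedback (MODP), e.g. a medical
# domain with objectives [treatment_efficacy, -side_effects].
vector_return = np.array([0.9, -0.4])

def user_utility(v):
    """Hypothetical user utility over the vector-valued return. Because it
    is nonlinear (side effects past a threshold hurt disproportionately),
    no fixed scalar reward could encode it up front."""
    efficacy, neg_side = v
    penalty = 2.0 if neg_side < -0.5 else 0.5
    return efficacy + penalty * neg_side

print(user_utility(vector_return))  # 0.9 + 0.5 * (-0.4) = 0.7
```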


Parallel Problem Solving from Nature | 2018

Directed Locomotion for Modular Robots with Evolvable Morphologies

Gongjin Lan; Milan Jelisavcic; Diederik M. Roijers; Evert Haasdijk; A. E. Eiben

Morphologically evolving robot systems need to include a learning period right after ‘birth’ to acquire a controller that fits the newly created body. In this paper, we investigate learning one skill in particular: walking in a given direction. To this end, we apply the HyperNEAT algorithm guided by a fitness function that balances the distance travelled in a direction and the deviation between the desired and the actually travelled directions. We validate this method on a variety of modular robots with different shapes and sizes and observe that the best controllers produce trajectories that accurately follow the correct direction and reach a considerable distance in the given test interval.
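
The fitness function can be sketched as follows, under the assumption of a simple formulation: project the distance travelled onto the target direction and subtract a penalty proportional to the angular deviation. The exact weighting used with HyperNEAT in the paper may differ.

```python
import math

def directed_locomotion_fitness(start, end, target_angle, penalty=5.0):
    """start/end: (x, y) robot positions; target_angle: desired heading (rad).
    Rewards distance travelled along the target direction while penalizing
    deviation between the desired and actually travelled directions."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    distance = math.hypot(dx, dy)
    actual_angle = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi] before taking its magnitude.
    deviation = abs(math.atan2(math.sin(actual_angle - target_angle),
                               math.cos(actual_angle - target_angle)))
    return distance * math.cos(deviation) - penalty * deviation

print(directed_locomotion_fitness((0, 0), (2.0, 0.3), target_angle=0.0))
```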


Frontiers in Neurorobotics | 2018

Open-Ended Learning: A Conceptual Framework Based on Representational Redescription

Stéphane Doncieux; David Filliat; Natalia Díaz-Rodríguez; Timothy M. Hospedales; Richard J. Duro; Alexandre Coninx; Diederik M. Roijers; Benoît Girard; Nicolas Perrin; Olivier Sigaud

Reinforcement learning (RL) aims at building a policy that maximizes a task-related reward within a given domain. When the domain is known, i.e., when its states, actions and reward are defined, Markov Decision Processes (MDPs) provide a convenient theoretical framework to formalize RL. But in an open-ended learning process, an agent or robot must solve an unbounded sequence of tasks that are not known in advance and the corresponding MDPs cannot be built at design time. This defines the main challenge of open-ended learning: how can the agent learn to behave appropriately when adequate state, action, and reward representations are not given? In this paper, we propose a conceptual framework to address this question. We assume an agent endowed with low-level perception and action capabilities. This agent receives an external reward when it faces a task. It must discover the state and action representations that will let it cast the tasks as MDPs in order to solve them by RL. The relevance of the action or state representation is critical for the agent to learn efficiently. Considering that the agent starts with low-level, task-agnostic state and action spaces based on its low-level perception and action capabilities, we describe open-ended learning as the challenge of building the adequate representation of states and actions, i.e., of redescribing available representations. We suggest an iterative approach to this problem based on several successive Representational Redescription processes, and highlight the corresponding challenges in which intrinsic motivations play a key role.


arXiv: Artificial Intelligence | 2016

Multi-Objective Deep Reinforcement Learning

Hossam Mossalam; Yannis M. Assael; Diederik M. Roijers; Shimon Whiteson

Collaboration


Dive into Diederik M. Roijers's collaborations.

Top Co-Authors

Ann Nowé
Vrije Universiteit Brussel

Matthijs T. J. Spaan
Delft University of Technology

Kristof Theys
Rega Institute for Medical Research

Peter Vrancx
Vrije Universiteit Brussel

Pieter Libin
Vrije Universiteit Brussel

A. E. Eiben
VU University Amsterdam