Abdeslam Boularias
Max Planck Society
Publications
Featured research published by Abdeslam Boularias.
IEEE Transactions on Geoscience and Remote Sensing | 2014
Claudio Persello; Abdeslam Boularias; Michele Dalponte; Terje Gobakken; Erik Næsset; Bernhard Schölkopf
Active learning typically aims at minimizing the number of labeled samples to be included in the training set to reach a certain level of classification accuracy. Standard methods do not usually take into account the real annotation procedures and implicitly assume that all samples require the same effort to be labeled. Here, we consider the case where the cost associated with the annotation of a given sample depends on the previously labeled samples. In general, this is the case when annotating a queried sample is an action that changes the state of a dynamic system, and the cost is a function of the state of the system. In order to minimize the total annotation cost, the active sample selection problem is addressed in the framework of a Markov decision process, which allows one to plan the next labeling action on the basis of an expected long-term cumulative reward. This framework allows us to address the problem of optimizing the collection of labeled samples by field surveys for the classification of remote sensing data. The proposed method is applied to the ground sample collection for tree species classification using airborne hyperspectral images. Experiments carried out in the context of a real case study on forest inventory show the effectiveness of the proposed method.
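A minimal sketch of the planning idea in this abstract, not the authors' method: the next sample to label is chosen by a short finite-horizon lookahead that trades off an informativeness score against a state-dependent annotation cost, here simplified to the travel distance from the surveyor's current position. The `info_gain` values, the depth-3 horizon, and the toy survey area are illustrative placeholders.

```python
import numpy as np

def travel_cost(pos, sample_pos):
    # Annotation cost of a sample = distance travelled to reach it.
    return np.linalg.norm(pos - sample_pos)

def plan_next_label(pos, candidates, info_gain, horizon=3, gamma=0.9):
    """Pick the next sample to label by maximizing the discounted sum of
    (informativeness - annotation cost) over a short planning horizon."""
    def value(pos, remaining, depth):
        if depth == 0 or not remaining:
            return 0.0, None
        best_v, best_i = -np.inf, None
        for i in remaining:
            reward = info_gain[i] - travel_cost(pos, candidates[i])
            future, _ = value(candidates[i], remaining - {i}, depth - 1)
            if reward + gamma * future > best_v:
                best_v, best_i = reward + gamma * future, i
        return best_v, best_i

    return value(pos, set(range(len(candidates))), horizon)[1]

# Toy usage: 5 candidate ground samples scattered over a 2-D survey area.
rng = np.random.default_rng(0)
candidates = rng.uniform(0, 10, size=(5, 2))
info_gain = rng.uniform(1, 5, size=5)
print(plan_next_label(np.zeros(2), candidates, info_gain))
```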
intelligent robots and systems | 2011
Abdeslam Boularias; Oliver Kroemer; Jan Peters
Learning to grasp novel objects is an essential skill for robots operating in unstructured environments. We therefore propose a probabilistic approach for learning to grasp. In particular, we learn a function that predicts the success probability of grasps performed on surface points of a given object. Our approach is based on Markov Random Fields (MRF), and motivated by the fact that points that are geometrically close to each other tend to have similar grasp success probabilities. The MRF approach is successfully tested in simulation, and on a real robot using 3-D scans of various types of objects. The empirical results show a significant improvement over methods that do not utilize the smoothness assumption and classify each point separately from the others.
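The smoothness assumption can be illustrated with a small sketch: per-point grasp scores (random stand-ins here for the independent classifier outputs) are blended with the scores of their geometric neighbours on the object, so nearby surface points end up with similar success probabilities. This is only an approximation of the idea, not the paper's MRF inference.

```python
import numpy as np

def smooth_grasp_scores(points, scores, k=5, alpha=0.5, iters=20):
    """points: (N, 3) surface points; scores: (N,) independent success scores."""
    # k-nearest-neighbour graph from pairwise Euclidean distances.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]

    p = scores.copy()
    for _ in range(iters):
        # Blend each point's score with the mean score of its neighbours.
        p = (1 - alpha) * scores + alpha * p[neighbours].mean(axis=1)
    return p

rng = np.random.default_rng(1)
pts = rng.normal(size=(100, 3))    # stand-in for a 3-D object scan
noisy = rng.uniform(size=100)      # stand-in for independent per-point predictions
print(smooth_grasp_scores(pts, noisy)[:5])
```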
international conference on robotics and automation | 2015
Abdeslam Boularias; Felix Duvallet; Jean Oh; Anthony Stentz
We propose a language-driven navigation approach for commanding mobile robots in outdoor environments. We consider unknown environments that contain previously unseen objects. The proposed approach aims at making interactions in human-robot teams natural. Robots receive commands in natural language from human teammates, such as “Navigate around the building to the car left of the fire hydrant and near the tree”. A robot first needs to classify its surrounding objects into categories, using images obtained from its sensors. The result of this classification is a map of the environment, where each object is given a list of semantic labels, such as “tree” and “car”, with varying degrees of confidence. Then, the robot needs to ground the nouns in the command. Grounding, the main focus of this paper, is mapping each noun in the command to a physical object in the environment. We use a probabilistic model for interpreting the spatial relations, such as “left of” and “near”. The model is learned from examples provided by humans. For each noun in the command, a distribution over the objects in the environment is computed by combining spatial constraints with a prior given by the semantic classifiers' confidence values. The robot also needs to ground the navigation mode specified in the command, such as “navigate quickly” and “navigate covertly”, as a cost map. The cost map is also learned from examples, using Inverse Optimal Control (IOC). The cost map and the grounded goal are used to generate a path for the robot. This approach is evaluated on a robot in a real-world environment. Our experiments clearly show that the proposed approach is efficient for commanding outdoor robots.
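A toy sketch of the grounding step described above, with a hand-written stand-in for the learned spatial-relation model: the classifier's label confidences act as a prior over objects, a distance-based score stands in for the relation “near”, and their product is normalized into a distribution over candidate groundings.

```python
import numpy as np

def ground(objects, noun, relation, landmark_label):
    # Prior over objects from the semantic classifier's confidence for `noun`.
    prior = np.array([o["labels"].get(noun, 1e-3) for o in objects])
    # Landmark = the object most confidently labelled with `landmark_label`.
    landmark = max(objects, key=lambda o: o["labels"].get(landmark_label, 0.0))
    if relation == "near":
        dist = np.array([np.linalg.norm(o["pos"] - landmark["pos"]) for o in objects])
        likelihood = np.exp(-dist)           # closer objects score higher
    else:
        likelihood = np.ones(len(objects))   # unknown relation: uninformative
    posterior = prior * likelihood
    return posterior / posterior.sum()

objects = [
    {"labels": {"car": 0.8, "tree": 0.1}, "pos": np.array([2.0, 0.0])},
    {"labels": {"car": 0.7, "tree": 0.1}, "pos": np.array([9.0, 5.0])},
    {"labels": {"tree": 0.9},             "pos": np.array([3.0, 1.0])},
]
print(ground(objects, "car", "near", "tree"))   # favours the car next to the tree
```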
Biological Cybernetics | 2014
Katharina Muelling; Abdeslam Boularias; Betty J. Mohler; Bernhard Schölkopf; Jan Peters
Learning a complex task such as table tennis is a challenging problem for both robots and humans. Even after acquiring the necessary motor skills, a strategy is needed to choose where and how to return the ball to the opponent’s court in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In this paper, we suggest a computational model for representing and inferring strategies, based on a Markov decision problem, where the reward function models the goal of the task as well as the strategic information. We show how this reward function can be discovered from demonstrations of table tennis matches using model-free inverse reinforcement learning. The resulting framework allows us to identify the basic elements on which the selection of striking movements is based. We tested our approach on data collected from players with different playing styles and under different playing conditions. The estimated reward function was able to capture expert-specific strategic information that sufficed to distinguish the expert among players with different skill levels as well as different playing styles.
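The reward structure assumed in the abstract, a linear combination of strategy features, can be sketched with a simple feature-matching loop: the weights are updated until the feature expectations of a softmax preference over candidate striking movements approach the expert's demonstrated feature averages. The features and candidates below are toy placeholders, not table-tennis data, and this loop is only an illustration of the feature-matching principle behind inverse reinforcement learning.

```python
import numpy as np

def irl_feature_matching(expert_features, candidate_features, iters=200, lr=0.1):
    """expert_features: (d,) average features of the expert's demonstrated choices.
    candidate_features: (n, d) features of candidate striking movements."""
    w = np.zeros_like(expert_features)
    for _ in range(iters):
        # Softmax preference over candidates induced by the current reward w.phi.
        scores = candidate_features @ w
        p = np.exp(scores - scores.max())
        p /= p.sum()
        expected = p @ candidate_features
        # Move w so the expected features match the expert's feature averages.
        w += lr * (expert_features - expected)
    return w

rng = np.random.default_rng(2)
candidates = rng.normal(size=(10, 4))
expert = candidates[:3].mean(axis=0)   # pretend the expert prefers these three
print(irl_feature_matching(expert, candidates).round(2))
```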
european conference on machine learning | 2012
Abdeslam Boularias; Oliver Krömer; Jan Peters
We propose a graph-based algorithm for apprenticeship learning when the reward features are noisy. Previous apprenticeship learning techniques learn a reward function by using only local state features. This can be a limitation in practice, as often some features are misspecified or subject to measurement noise. Our graphical framework, inspired by work on Markov Random Fields, allows us to alleviate this problem by propagating information between states, and rewarding policies that choose similar actions in adjacent states. We demonstrate the advantage of the proposed approach on grid-world navigation problems, and on the problem of teaching a robot to grasp novel objects in simulation.
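One way to picture the propagation of information between adjacent states is Laplacian smoothing of the noisy rewards over the state graph, sketched below on a 10-state chain. This closed-form smoother is an illustration of the intuition, not the graphical model used in the paper.

```python
import numpy as np

def propagate_rewards(adjacency, noisy_rewards, lam=5.0):
    # Solve (I + lam * L) r = r_noisy, with L the graph Laplacian, so that
    # connected states end up with similar smoothed rewards.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.linalg.solve(np.eye(len(noisy_rewards)) + lam * laplacian, noisy_rewards)

# 10-state chain whose true reward increases smoothly; observed features are noisy.
n = 10
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
rng = np.random.default_rng(3)
noisy = np.linspace(0.0, 1.0, n) + rng.normal(scale=0.3, size=n)
print(propagate_rewards(adj, noisy).round(2))
```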
international conference on machine learning | 2009
Abdeslam Boularias; Brahim Chaib-draa
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive State Representations (PSRs). We compare PSR policies to Finite-State Controllers (FSCs), which are considered a standard model for policy gradient methods in POMDPs. We present a general Actor-Critic algorithm for learning both FSCs and PSR policies. The critic part computes a value function that has as variables the parameters of the policy. These parameters are gradually updated to maximize the value function. We show that the value function is polynomial for both FSCs and PSR policies, with a potentially smaller degree in the case of PSR policies. Therefore, the value function of a PSR policy can have fewer local optima than that of the equivalent FSC, and consequently, the gradient algorithm is more likely to converge to a globally optimal solution.
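For readers unfamiliar with the actor-critic split, the skeleton below shows the two roles on a tabular, fully observable toy problem: the critic maintains a value estimate, and its temporal-difference error drives the actor's policy-gradient update. The FSC and PSR policy parameterizations studied in the paper are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
n_states, n_actions = 5, 2
theta = np.zeros((n_states, n_actions))   # actor (policy) parameters
v = np.zeros(n_states)                    # critic's value estimates
alpha_actor, alpha_critic, gamma = 0.1, 0.2, 0.95

def step(s, a):
    # Toy dynamics: action 1 moves right, action 0 stays; reward at the last state.
    s2 = min(s + a, n_states - 1)
    return s2, 1.0 if s2 == n_states - 1 else 0.0

s = 0
for _ in range(2000):
    probs = np.exp(theta[s] - theta[s].max())
    probs /= probs.sum()
    a = rng.choice(n_actions, p=probs)
    s2, r = step(s, a)
    td = r + gamma * v[s2] - v[s]         # critic: temporal-difference error
    v[s] += alpha_critic * td
    grad = -probs.copy()
    grad[a] += 1.0                        # d log pi(a|s) / d theta[s]
    theta[s] += alpha_actor * td * grad   # actor: policy-gradient step
    s = 0 if s2 == n_states - 1 else s2

print(np.argmax(theta, axis=1))           # learned greedy action per state
```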
Neurocomputing | 2013
Abdeslam Boularias; Brahim Chaib-draa
We consider the problem of imitation learning when the examples, provided by a human expert, are scarce. Apprenticeship learning via inverse reinforcement learning provides an efficient tool for generalizing the examples, based on the assumption that the expert's policy maximizes a value function, which is a linear combination of state and action features. Most apprenticeship learning algorithms use only simple empirical averages of the features in the demonstrations as statistics of the expert's policy. However, this method is efficient only when the number of examples is sufficiently large to cover most of the states, or the dynamics of the system is nearly deterministic. In this paper, we show that the quality of the learned policies is sensitive to the error in estimating the averages of the features when the dynamics of the system is stochastic. To reduce this error, we introduce two new approaches for bootstrapping the demonstrations by assuming that the expert is near-optimal and the dynamics of the system is known. In the first approach, the expert's examples are used to learn a reward function and to generate further examples from the corresponding optimal policy. The second approach uses a transfer technique, known as graph homomorphism, in order to generalize the expert's actions to unvisited regions of the state space. Empirical results on simulated robot navigation problems show that our approach is able to learn sufficiently good policies from a small number of examples.
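The first bootstrapping approach can be sketched as follows, under the abstract's assumption that the dynamics are known: a reward recovered from the demonstrations (a placeholder vector here) is fed to value iteration, and the resulting policy is rolled out to generate additional synthetic examples.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=200):
    """P: (A, S, S) known transition probabilities; R: (S,) learned reward."""
    V = np.zeros(P.shape[1])
    Q = np.zeros(P.shape[:2])
    for _ in range(iters):
        Q = R + gamma * (P @ V)              # (A, S) action values
        V = Q.max(axis=0)
    return Q.argmax(axis=0)                  # greedy policy, one action per state

def generate_examples(P, policy, start, length, rng):
    s, traj = start, []
    for _ in range(length):
        a = policy[s]
        traj.append((s, a))                  # synthetic state-action example
        s = rng.choice(P.shape[1], p=P[a, s])
    return traj

rng = np.random.default_rng(5)
P = rng.dirichlet(np.ones(4), size=(2, 4))   # 2 actions, 4 states
R = np.array([0.0, 0.0, 0.0, 1.0])           # placeholder for the reward found by IRL
print(generate_examples(P, value_iteration(P, R), start=0, length=5, rng=rng))
```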
international conference on robotics and automation | 2010
Abdeslam Boularias; Brahim Chaib-draa
We consider the problem of apprenticeship learning when the expert's demonstration covers only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient solution to this problem based on the assumption that the expert is acting optimally in a Markov Decision Process (MDP). However, past work on IRL requires an accurate estimate of the frequency of encountering each feature of the states when the robot follows the expert's policy. Given that the complete policy of the expert is unknown, the feature frequencies can only be empirically estimated from the demonstrated trajectories. In this paper, we propose to use a transfer method, known as soft homomorphism, in order to generalize the expert's policy to unvisited regions of the state space. The generalized policy can be used either as the robot's final policy, or to calculate the feature frequencies within an IRL algorithm. Empirical results show that our approach is able to learn good policies from a small number of demonstrations.
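A rough sketch of the generalization step, with a Gaussian similarity on state features standing in for the soft homomorphism of the paper: the action distribution assigned to an unvisited state is a similarity-weighted average of the expert's action distributions in demonstrated states.

```python
import numpy as np

def generalize_policy(demo_features, demo_action_dists, query_features, sigma=1.0):
    """Assign each unvisited (query) state a similarity-weighted average of the
    expert's action distributions observed in the demonstrated states."""
    d = np.linalg.norm(query_features[:, None, :] - demo_features[None, :, :], axis=-1)
    w = np.exp(-(d ** 2) / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    return w @ demo_action_dists             # (n_query, n_actions)

rng = np.random.default_rng(6)
demo_features = rng.normal(size=(20, 3))                 # features of visited states
demo_action_dists = rng.dirichlet(np.ones(4), size=20)   # expert's action distributions
query_features = rng.normal(size=(5, 3))                 # unvisited states
print(generalize_policy(demo_features, demo_action_dists, query_features).round(2))
```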
international conference on machine learning and applications | 2008
Abdeslam Boularias
Learning by imitation has been shown to be a powerful paradigm for automated learning in autonomous robots. This paper presents a general framework of learning by imitation for stochastic and partially observable systems. The model is a Predictive Policy Representation (PPR) whose goal is to represent the teacher's policies without any reference to states. The model is fully described in terms of actions and observations only. We show how this model can efficiently learn the personal behavior and preferences of an assistive robot user.
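The state-free flavour of the model can be illustrated with a toy policy that conditions only on a short window of past action-observation pairs; the empirical counts below are a placeholder for the actual PPR learning procedure.

```python
from collections import Counter, defaultdict

def learn_history_policy(trajectories, window=2):
    """Count the teacher's next action for each window of (action, observation) pairs."""
    counts = defaultdict(Counter)
    for traj in trajectories:                # traj: [(action, observation), ...]
        for t in range(window, len(traj)):
            history = tuple(traj[t - window:t])
            counts[history][traj[t][0]] += 1
    return counts

def act(counts, history):
    c = counts.get(tuple(history))
    return c.most_common(1)[0][0] if c else None

demo = [[("left", "wall"), ("right", "door"), ("forward", "hall"),
         ("right", "door"), ("forward", "hall")]]
policy = learn_history_policy(demo)
print(act(policy, [("right", "door"), ("forward", "hall")]))   # -> "right"
```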
international conference on machine learning and applications | 2008
Abdeslam Boularias; Masoumeh T. Izadi; Brahim Chaib-draa
High dimensionality of the belief space in partially observable Markov decision processes (POMDPs) is one of the major factors that severely restrict the applicability of this model. Previous studies have demonstrated that the dimensionality of a POMDP can eventually be reduced by transforming it into an equivalent predictive state representation (PSR). In this paper, we address the problem of finding an approximate and compact PSR model corresponding to a given POMDP model. We formulate this problem in an optimization framework. Our algorithm tries to minimize the potential error caused by omitting some core tests. We also present an empirical evaluation on benchmark problems, illustrating the performance of this approach.
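One simple way to picture the search for a compact PSR is rank truncation of the matrix of test outcome probabilities: keep only enough components to explain most of its spectrum. The sketch below is an illustration of that view, not the optimization problem formulated in the paper, and the test-outcome matrix is a synthetic placeholder.

```python
import numpy as np

def approximate_psr_dimension(U, energy=0.99):
    """U: (n_states, n_tests) matrix of P(test succeeds | hidden state).
    Return how many components are needed to capture `energy` of its spectrum."""
    s = np.linalg.svd(U, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cumulative, energy)) + 1

rng = np.random.default_rng(7)
# Many tests, but outcomes are driven by only 3 underlying factors (low rank).
U = np.clip(rng.uniform(size=(8, 3)) @ rng.uniform(size=(3, 30)), 0.0, 1.0)
print(approximate_psr_dimension(U))          # -> 3 (or fewer, if two factors dominate)
```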