Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Abbas Abdolmaleki is active.

Publications


Featured research published by Abbas Abdolmaleki.


IEEE-RAS International Conference on Humanoid Robots | 2015

Regularized covariance estimation for weighted maximum likelihood policy search methods

Abbas Abdolmaleki; Nuno Lau; Luís Paulo Reis; Gerhard Neumann

Many episode-based (or direct) policy search algorithms maintain a multivariate Gaussian distribution as the search distribution over the parameter space of some objective function. One class of algorithms, such as episodic REPS, PoWER or PI2, uses a weighted maximum likelihood estimate (WMLE) to update the mean and covariance matrix of this distribution in each iteration. However, due to the high dimensionality of covariance matrices and the limited number of samples, the WMLE is an unreliable estimator. The use of the WMLE leads to over-fitted covariance estimates, and hence the variance/entropy of the search distribution decreases too quickly, which may cause premature convergence. To alleviate this problem, the estimated covariance matrix can be regularized in different ways, for example by using a convex combination of the diagonal covariance estimate and the sample covariance estimate. In this paper, we propose a new covariance matrix regularization technique for policy search methods that uses a convex combination of the sample covariance matrix and the old covariance matrix used in the last iteration. The combination weighting is determined by specifying the desired entropy of the new search distribution. With this mechanism, the entropy of the search distribution can be decreased gradually, without the over-fitting caused by the pure maximum likelihood estimate.
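The entropy-constrained interpolation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation; the bisection procedure and the assumption that the old covariance has more entropy than the target (and the sample covariance less) are mine.

```python
import numpy as np

def gauss_entropy(cov):
    """Differential entropy of a Gaussian with covariance matrix `cov`."""
    d = cov.shape[0]
    return 0.5 * (d * (1.0 + np.log(2.0 * np.pi)) + np.linalg.slogdet(cov)[1])

def blend_covariance(cov_sample, cov_old, target_entropy, iters=60):
    """Convex combination lam * cov_sample + (1 - lam) * cov_old whose entropy
    matches `target_entropy`, found by bisection on lam.

    Assumes gauss_entropy(cov_old) > target_entropy > gauss_entropy(cov_sample),
    i.e. the sample covariance is the over-fitted (shrunken) estimate."""
    lo, hi = 0.0, 1.0  # lam = 0 keeps the old covariance, lam = 1 the sample one
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        cov = lam * cov_sample + (1.0 - lam) * cov_old
        if gauss_entropy(cov) > target_entropy:
            lo = lam  # still too much entropy: move toward the sample covariance
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return lam * cov_sample + (1.0 - lam) * cov_old
```

Because the weighting is chosen from a desired entropy rather than a fixed shrinkage constant, the entropy schedule of the search distribution is controlled directly.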


Portuguese Conference on Artificial Intelligence | 2013

Omnidirectional Walking and Active Balance for Soccer Humanoid Robot

Nima Shafii; Abbas Abdolmaleki; Rui A. C. Ferreira; Nuno Lau; Luís Paulo Reis

Soccer humanoid robots must be able to fulfil their tasks in a highly dynamic soccer field, which requires highly responsive and dynamic locomotion, and keeping a humanoid's balance during walking is very difficult. The position of the Zero Moment Point (ZMP) is widely used for dynamic stability measurement in biped locomotion. In this paper, we present an omnidirectional walk engine, which mainly consists of a foot planner, a ZMP and Center of Mass (CoM) generator, and an active balance loop. The foot planner, based on the desired walk speed vector, generates future foot step positions that are then inputs to the ZMP generator. The cart-table model and a preview controller are used to generate the CoM reference trajectory from the predefined ZMP trajectory. An active balance method is presented which keeps the robot's trunk upright when faced with environmental disturbances. We have tested the biped locomotion control approach on a simulated NAO robot. Our results are encouraging, given that the robot has been able to walk fast and stably in any direction, with performances that compare well to the best RoboCup 2012 3D Simulation teams.
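The cart-table relation that the walk engine builds on can be sketched in the forward direction (an illustration only; the engine itself uses a preview controller to solve the inverse problem of finding a CoM trajectory whose ZMP tracks a reference):

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)

def zmp_from_com(x, dt, z_c):
    """ZMP trajectory induced by CoM positions `x` (sampled every `dt` seconds)
    under the cart-table model at constant CoM height `z_c`:
        p(t) = x(t) - (z_c / g) * x''(t),
    with the acceleration approximated by central finite differences."""
    acc = np.gradient(np.gradient(x, dt), dt)
    return x - (z_c / G) * acc
```

If the induced ZMP stays inside the support polygon of the feet, the gait is dynamically stable in the ZMP sense; that is the criterion the generator enforces.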


Ibero-American Conference on Artificial Intelligence | 2014

Omnidirectional Walking with a Compliant Inverted Pendulum Model

Abbas Abdolmaleki; Nima Shafii; Luís Paulo Reis; Nuno Lau; Jan Peters; Gerhard Neumann

In this paper, we propose a novel omnidirectional walking engine that achieves energy-efficient, human-like, stable and fast walking. We augment the 3D inverted pendulum with a spring model to implement a height change in the robot's center of mass trajectory. This model is used as a simplified model of the robot, and the zero moment point (ZMP) criterion is used as the stability indicator. The presented walking engine consists of five main modules: the next posture generator, the foot trajectory generator, the center of mass (CoM) trajectory generator, the robot posture controller, and the inverse kinematics (IK) solver. The focus of the paper is the generation of the position of the next step and of the CoM trajectory. For the trajectory generator, we extend the 3D-IPM with an undamped spring to implement height changes of the CoM. With this model we can implement active compliance for the robot's gait, resulting in a more energy-efficient movement. We present a modified method for solving the ZMP equations, whose derivation is based on the newly proposed model for omnidirectional walking. The walk engine is tested on a simulated and a real NAO robot. We use policy search to optimize the parameters of the walking engines for the standard 3D-LIPM and for our proposed model, to compare the performance of both models, each with its optimal parameters. We optimize the policy parameters in terms of energy efficiency for a fixed walking speed. The experimental results show the advantages of our proposed model over the 3D-LIPM.


Robot Soccer World Cup | 2016

Learning a Humanoid Kick with Controlled Distance

Abbas Abdolmaleki; David Apolinário Simões; Nuno Lau; Luís Paulo Reis; Gerhard Neumann

We investigate the learning of a flexible humanoid robot kick controller, i.e., a controller that is applicable for multiple contexts, such as different kick distances, different initial robot positions with respect to the ball, or both. Current approaches typically tune or optimise the parameters of a biped kick controller for a single context, such as a kick with the longest distance or a kick with one specific distance. Hence our research question is: how can we obtain a flexible kick controller that controls the robot (near) optimally for a continuous range of kick distances? The goal is to find a parametric function that, given a desired kick distance, outputs the (near) optimal controller parameters. We achieve the desired flexibility of the controller by applying a contextual policy search method. With such a contextual policy search algorithm, we can generalize the robot kick controller for different distances, where the desired distance is described by a real-valued vector. We also show that the optimal parameters of the kick controller are a non-linear function of the desired distance, and that a linear function fails to properly generalize the kick controller over the desired kick distances.
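The claim that a linear policy cannot generalize a nonlinearly varying optimum can be illustrated with a toy regression. The distance-to-parameter relation below is hypothetical, not the paper's data:

```python
import numpy as np

# Hypothetical setting: the optimal kick parameter theta*(s) varies
# nonlinearly with the desired distance s. Fitting it with linear vs.
# quadratic features shows why a linear contextual policy underfits.

s = np.linspace(1.0, 5.0, 40)             # desired kick distances (contexts)
theta_opt = 0.3 * s**2 - 0.5 * s + 1.0    # assumed nonlinear optimal parameter

def fit_mse(features):
    """Least-squares fit of theta_opt from the given context features; returns MSE."""
    w, *_ = np.linalg.lstsq(features, theta_opt, rcond=None)
    return float(np.mean((features @ w - theta_opt) ** 2))

ones = np.ones_like(s)
lin_mse = fit_mse(np.column_stack([ones, s]))          # linear policy over s
quad_mse = fit_mse(np.column_stack([ones, s, s**2]))   # nonlinear (quadratic) policy
```

The quadratic-feature policy recovers the relation exactly, while the linear one carries a systematic residual over the whole distance range.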


IEEE International Conference on Autonomous Robot Systems and Competitions | 2015

Contextual Policy Search for Generalizing a Parameterized Biped Walking Controller

Abbas Abdolmaleki; Nuno Lau; Luís Paulo Reis; Jan Peters; Gerhard Neumann

We investigate the learning of flexible robot locomotion controllers, i.e., controllers that are applicable for multiple contexts, for example different walking speeds, various slopes of the terrain, or other physical properties of the robot. In our experiments, the contexts are the desired linear walking speed and the direction of the gait. Current approaches for learning the control parameters of biped locomotion controllers are typically only applicable for a single context. They can be used for a particular context, for example to learn a gait with the highest speed, the lowest energy consumption, or a combination of both. The question of our research is: how can we obtain a flexible walking controller that controls the robot (near) optimally for many different contexts? We achieve the desired flexibility of the controller by applying the recently developed contextual relative entropy policy search (REPS) method. With such a contextual policy search algorithm, we can generalize the robot walking controller for different contexts, where a context is described by a real-valued vector. In this paper we also extend the contextual REPS algorithm to learn a non-linear policy over the contexts instead of a linear one. To validate our method, we perform a simulation experiment using a simulated NAO humanoid robot. The robot learns a policy to choose the controller parameters for a continuous set of walking speeds and directions.
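At the core of episodic REPS is an exponential reweighting of sampled returns followed by a weighted maximum-likelihood update of the search distribution. A minimal sketch (in the full algorithm the temperature eta is obtained from the REPS dual under a KL bound; here it is simply fixed):

```python
import numpy as np

def reps_weights(returns, eta):
    """REPS-style weights: w_i proportional to exp(R_i / eta), normalized.
    The max is subtracted before exponentiation for numerical stability."""
    z = (returns - np.max(returns)) / eta
    w = np.exp(z)
    return w / np.sum(w)

def weighted_mean_update(params, returns, eta=1.0):
    """Weighted maximum-likelihood update of the search distribution's mean:
    rows of `params` are sampled parameter vectors, `returns` their returns."""
    w = reps_weights(returns, eta)
    return w @ params
```

Smaller eta concentrates the weights on the best samples (a greedier update); larger eta keeps the update close to the uniform average, which is how REPS trades off exploitation against staying near the old distribution.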


Genetic and Evolutionary Computation Conference | 2017

Deriving and improving CMA-ES with information geometric trust regions

Abbas Abdolmaleki; Bob Price; Nuno Lau; Luís Paulo Reis; Gerhard Neumann

CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need for extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance updates that incorporate the current samples as well as the evolution path, and its invariance properties. Its update rules are composed of well-established heuristics, and the theoretical foundations of some of these rules are also well understood. In this paper we fully derive all CMA-ES update rules within the framework of expectation-maximisation-based stochastic search algorithms using information-geometric trust regions. We show that the use of the trust region results in updates similar to CMA-ES for the mean and the covariance matrix, while it allows for the derivation of an improved update rule for the step-size. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES), is fully derived from first-order optimization principles and performs favourably in comparison to the standard CMA-ES algorithm.
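A stripped-down weighted-recombination step in the spirit of the CMA-ES rank-mu update can look as follows. This is a sketch, not the paper's TR-CMA-ES: there are no evolution paths and no step-size adaptation, and the jitter constant is my own addition.

```python
import numpy as np

def sphere(x):
    """Test objective to minimise."""
    return float(np.sum(x ** 2))

def es_step(mean, cov, n_samples, rng, f=sphere):
    """One weighted-recombination step: sample, rank, and re-estimate the
    Gaussian from the better half with CMA-ES-style log-rank weights."""
    d = mean.size
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    order = np.argsort([f(x) for x in samples])        # best samples first
    mu = n_samples // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    elite = samples[order[:mu]]
    new_mean = w @ elite
    diff = elite - mean                                # deviations from the OLD mean
    new_cov = (w[:, None] * diff).T @ diff             # rank-mu covariance estimate
    return new_mean, new_cov + 1e-12 * np.eye(d)       # jitter keeps cov positive definite

rng = np.random.default_rng(1)
mean, cov = np.full(3, 5.0), np.eye(3)
for _ in range(60):
    mean, cov = es_step(mean, cov, 50, rng)
```

Without a step-size mechanism, this kind of update is exactly where entropy can collapse too quickly; the trust-region derivation in the paper targets the step-size rule in particular.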


Journal of Intelligent and Robotic Systems | 2016

Contextual Policy Search for Linear and Nonlinear Generalization of a Humanoid Walking Controller

Abbas Abdolmaleki; Nuno Lau; Luís Paulo Reis; Jan Peters; Gerhard Neumann

We investigate the learning of flexible robot locomotion controllers, i.e., controllers that are applicable for multiple contexts, for example different walking speeds, various slopes of the terrain, or other physical properties of the robot. In our experiments, the context is the desired linear walking speed of the gait. Current approaches for learning the control parameters of biped locomotion controllers are typically only applicable for a single context. They can be used for a particular context, for example to learn a gait with the highest speed, the lowest energy consumption, or a combination of both. The question of our research is: how can we obtain a flexible walking controller that controls the robot (near) optimally for many different contexts? We achieve the desired flexibility of the controller by applying the recently developed contextual relative entropy policy search (REPS) method, which generalizes the robot walking controller for different contexts, where a context is described by a real-valued vector. In this paper we also extend the contextual REPS algorithm to learn a non-linear policy over the contexts instead of a linear one; we call it RBF-REPS, as it uses radial basis functions. To validate our method, we perform three simulation experiments, including a walking experiment using a simulated NAO humanoid robot. The robot learns a policy to choose the controller parameters for a continuous set of forward walking speeds.
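The RBF construction can be sketched as follows: a policy that is linear in Gaussian radial basis features of the context is non-linear in the context itself. The centres and bandwidth below are arbitrary illustrative choices, not the paper's:

```python
import numpy as np

def rbf_features(context, centers, bandwidth):
    """Gaussian radial basis features of a scalar context, plus a bias term.
    A policy theta = w @ phi(context) is then non-linear in the context."""
    phi = np.exp(-((context - centers) ** 2) / (2.0 * bandwidth ** 2))
    return np.append(phi, 1.0)

centers = np.linspace(0.0, 1.0, 5)   # centres spread over a normalized context range
phi = rbf_features(0.55, centers, 0.2)
```

Each feature responds most strongly near its centre, so the weight vector can assign different controller parameters to different regions of the context space.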


International Conference on Autonomous Robot Systems and Competitions (ICARSC) | 2016

Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation

Abbas Abdolmaleki; David Apolinário Simões; Nuno Lau; Luís Paulo Reis; Gerhard Neumann

Stochastic search algorithms are black-box optimizers of an objective function. They have recently gained a lot of attention in operations research, machine learning and policy search of robot motor skills, due to their ease of use and their generality. However, with slightly different tasks or objective functions, many stochastic search algorithms require complete re-learning in order to adapt the solution to the new objective function or the new context. As such, we consider the contextual stochastic search paradigm. Here, we want to find good parameter vectors for multiple related tasks, where each task is described by a continuous context vector. Hence, the objective function might change slightly for each parameter vector evaluation. Contextual algorithms have been investigated in the field of policy search. However, contextual policy search algorithms typically suffer from premature convergence and perform unfavourably in comparison with state-of-the-art stochastic search methods. In this paper, we investigate a contextual stochastic search algorithm known as Contextual Relative Entropy Policy Search (CREPS), an information-theoretic algorithm that can learn for multiple tasks simultaneously. We extend that algorithm with a covariance matrix adaptation technique that alleviates the premature convergence problem. We call the new algorithm Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation (CREPS-CMA). We show that CREPS-CMA outperforms the original CREPS by orders of magnitude. We illustrate the performance of CREPS-CMA on several contextual tasks, including a complex simulated robot kick task.


International Journal of Advanced Robotic Systems | 2015

Development of an Omnidirectional Walk Engine for Soccer Humanoid Robots

Nima Shafii; Abbas Abdolmaleki; Nuno Lau; Luís Paulo Reis

Humanoid soccer robots must be able to carry out their tasks in a highly dynamic environment, which requires responsive omnidirectional walking. This paper explains a new omnidirectional walking engine for a humanoid soccer robot that mainly consists of a foot planner, a zero moment point (ZMP) trajectory generator, a centre of mass (CoM) calculator and an active balance feedback loop. An analytical approach is presented for generating the CoM trajectory, in which the motion equations of the cart-table model are solved using a Fourier approximation of the ZMP. With this approach, we propose a new time segmentation scheme in order to parametrize the double-support phase. An active balance method is also proposed which keeps the robot's trunk upright when faced with environmental disturbances. The walking engine is tested on both simulated and real NAO robots. Our results are encouraging given the fact that the robot performs favourably, walking quickly and in a stable manner in any direction in comparison...
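The Fourier step can be illustrated term by term. Under the cart-table equation p = x - (z_c/g)·x'', each harmonic decouples, so the CoM coefficients follow from the ZMP coefficients in closed form (a cosine-only series and the coefficient values below are my simplifications, not the paper's parametrization):

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)

def com_coeffs_from_zmp(p_coeffs, omega, z_c):
    """CoM Fourier cosine coefficients from ZMP cosine coefficients.

    With p(t) = sum_k p_k cos(k*omega*t) and x(t) = sum_k a_k cos(k*omega*t),
    the cart-table equation p = x - (z_c/g) * x'' decouples per harmonic:
        p_k = a_k * (1 + (z_c/g) * (k*omega)**2)
    so each a_k is obtained by a single division."""
    k = np.arange(len(p_coeffs))
    return np.asarray(p_coeffs, dtype=float) / (1.0 + (z_c / G) * (k * omega) ** 2)
```

This is why the Fourier approximation makes the CoM generation analytic: no numerical integration of the cart-table dynamics is needed.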


Portuguese Conference on Artificial Intelligence | 2011

A reinforcement learning based method for optimizing the process of decision making in fire brigade agents

Abbas Abdolmaleki; Mostafa Movahedi; Sajjad Salehi; Nuno Lau; Luís Paulo Reis

Decision making in complex, multi-agent and dynamic environments such as disaster spaces is a challenging problem in artificial intelligence. Uncertainty, noisy input data and stochastic behaviour, which are common characteristics of such environments, make real-time decision making more complicated. In this paper, an approach based on reinforcement learning is presented to address the bottleneck of the dynamicity and variety of conditions in such situations. This method is applied to the decision-making process of RoboCup Rescue Simulation fire brigade agents, and it learned a good strategy to save civilians and the city from fire. The utilized method increases the speed of learning and has very low memory usage. The effectiveness of the proposed method is shown through simulation results.
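A minimal tabular Q-learning loop of the kind that underlies such an approach is sketched below. The two-state task, reward model and hyper-parameters are toy assumptions, not the paper's fire brigade state/action design:

```python
import numpy as np

# Toy task: in either state, action 1 (say, "extinguish the nearest fire")
# yields reward; action 0 does not. Q-learning with epsilon-greedy exploration
# learns to prefer action 1. All of this is illustrative only.

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

def step(s, a):
    """Hypothetical environment: reward for action 1, cyclic state transition."""
    reward = 1.0 if a == 1 else 0.0
    return reward, (s + 1) % n_states

s = 0
for _ in range(2000):
    # epsilon-greedy action selection
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    r, s2 = step(s, a)
    # temporal-difference update toward r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
    s = s2
```

The table only stores one value per state-action pair, which matches the low memory footprint the abstract emphasizes.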

Collaboration


Dive into Abbas Abdolmaleki's collaborations.

Top Co-Authors

Nuno Lau

University of Aveiro

Hany Abdulsamad

Technische Universität Darmstadt

Riad Akrour

University of Paris-Sud
