Ngo Anh Vien
University of Stuttgart
Publications
Featured research published by Ngo Anh Vien.
Information Sciences | 2011
Ngo Anh Vien; Hwanjo Yu; TaeChoong Chung
Bayesian policy gradient algorithms have recently been proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods are known to reduce the variance and the number of samples needed to obtain accurate gradient estimates compared with conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of variance and sample count. As in the computation of the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that, with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate equals that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10].
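A hedged sketch of the update this abstract describes, assuming a linear (Fisher-like) kernel for the Bayesian quadrature and toy trajectory features; the estimator names and the damping term are illustrative, not the paper's exact construction:

```python
import numpy as np

def bq_gradient_mean(scores, returns, noise=1e-3):
    """Posterior mean of the policy gradient via Bayesian quadrature.

    scores:  (n, d) score vectors of n sampled trajectories
    returns: (n,)   observed returns of those trajectories
    """
    K = scores @ scores.T + noise * np.eye(len(returns))   # linear kernel
    return scores.T @ np.linalg.solve(K, returns)          # E[grad J | data]

def hessian_preconditioned_step(theta, grad_mean, hess_mean, lr=1.0, damping=1e-2):
    # The posterior mean of the Hessian estimate is symmetric in theory;
    # symmetrizing guards against numerical asymmetry. Its inverse then
    # acts as the learning rate schedule (a Newton-style preconditioned step).
    H = 0.5 * (hess_mean + hess_mean.T) + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(H, grad_mean)
```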
Applied Intelligence | 2013
Ngo Anh Vien; Wolfgang Ertel; TaeChoong Chung
This paper considers the problem of extending Training an Agent Manually via Evaluative Reinforcement (TAMER) to continuous state and action spaces. The TAMER framework enables a non-technical human to train an agent through a natural form of human feedback (negative or positive). The advantages of TAMER have been shown on tasks of training agents by human feedback alone or by combining human feedback with environment rewards. However, these methods were originally designed for discrete state-action or continuous state, discrete-action problems. This paper proposes an extension of TAMER that allows both continuous states and actions, called ACTAMER. The new framework utilizes any general function approximation of a human trainer's feedback signal. Moreover, the combined capability of ACTAMER and reinforcement learning is investigated and evaluated. The combination of human feedback and reinforcement learning is studied in two settings: sequential and simultaneous. Our experimental results demonstrate that the proposed method allows a human to successfully train an agent in two continuous state-action domains: Mountain Car and Cart-Pole (balancing).
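A minimal sketch of the ACTAMER idea, assuming a quadratic feature map and uniform sampling of candidate continuous actions; the framework permits any function approximator, so these choices (and all names here) are illustrative:

```python
import numpy as np

class HumanFeedbackModel:
    """Regress the human's scalar feedback H(s, a), then act greedily on it."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)  # must match len(features(s, a))
        self.lr = lr

    def features(self, state, action):
        # Hypothetical feature map; any general approximator could replace it.
        sa = np.concatenate([state, action])
        return np.concatenate([sa, sa ** 2, [1.0]])

    def predict(self, state, action):
        return self.features(state, action) @ self.w

    def update(self, state, action, feedback):
        # Gradient step toward the human's feedback signal (e.g. +1 / -1).
        phi = self.features(state, action)
        self.w += self.lr * (feedback - phi @ self.w) * phi

    def act(self, state, action_low, action_high, n_samples=64):
        # Continuous actions: pick the best of uniformly sampled candidates.
        cands = np.random.uniform(action_low, action_high,
                                  size=(n_samples, len(action_low)))
        scores = [self.predict(state, a) for a in cands]
        return cands[int(np.argmax(scores))]
```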
Applied Intelligence | 2013
Ngo Anh Vien; Wolfgang Ertel; Viet-Hung Dang; TaeChoong Chung
Bayesian model-based reinforcement learning can be formulated as a partially observable Markov decision process (POMDP) to provide a principled framework for optimally balancing exploitation and exploration; a POMDP solver can then be used to solve the problem. If the prior distribution over the environment's dynamics is a product of Dirichlet distributions, the POMDP's optimal value function can be represented using a set of multivariate polynomials. Unfortunately, the size of the polynomials grows exponentially with the problem horizon. In this paper, we examine the use of an online Monte-Carlo tree search (MCTS) algorithm for large POMDPs to solve the Bayesian reinforcement learning problem online. We show that such an algorithm successfully searches for a near-optimal policy. In addition, we examine the use of a parameter tying method to keep the model search space small, and propose the use of a nested mixture of tied models to increase the robustness of the method when our prior information does not allow us to specify the structure of tied models exactly. Experiments show that the proposed methods substantially improve the scalability of current Bayesian reinforcement learning methods.
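A toy sketch of the online loop, assuming discrete states and actions: Dirichlet counts form the belief, each simulation draws a concrete model from the posterior, and actions are scored by rollouts. This is a simplified stand-in for a full MCTS/POMCP-style search, not the paper's solver:

```python
import numpy as np

class BayesAdaptiveAgent:
    def __init__(self, n_states, n_actions, prior=1.0):
        # Dirichlet pseudo-counts for P(s' | s, a): the agent's belief.
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def sample_model(self):
        # One concrete transition model drawn from the current posterior.
        return np.apply_along_axis(np.random.dirichlet, 2, self.counts)

    def observe(self, s, a, s_next):
        self.counts[s, a, s_next] += 1.0

    def plan(self, s, reward, gamma=0.95, n_sims=200, horizon=20):
        # reward: (n_states, n_actions) array of known immediate rewards.
        n_actions = self.counts.shape[1]
        value = np.zeros(n_actions)
        for _ in range(n_sims):
            model = self.sample_model()
            for a in range(n_actions):
                value[a] += self.rollout(model, reward, s, a, gamma, horizon)
        return int(np.argmax(value))

    def rollout(self, model, reward, s, a, gamma, horizon):
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            s_next = np.random.choice(len(model[s, a]), p=model[s, a])
            total += discount * reward[s, a]
            discount *= gamma
            s, a = s_next, np.random.randint(model.shape[1])
        return total
```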
advances in computer-human interaction | 2008
Nguyen Hoang Viet; Ngo Anh Vien; SeungGwan Lee; TaeChoong Chung
The task of planning trajectories for a mobile robot has received considerable attention in the research literature. The problem involves computing a collision-free path between a start point and a target point in an environment of known obstacles. In this paper, we study an obstacle avoidance path planning problem using a multi ant colony system, in which several colonies of ants cooperate in finding good solutions by exchanging information. In the simulation, we experimentally investigate the behaviour of the multi-colony ant algorithm with different kinds of information exchanged among the colonies. Finally, we compare the behaviour of different numbers of colonies against a multi-start single-colony ant algorithm to show the improvement.
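A toy sketch of the multi-colony idea on an occupancy grid, assuming a simple elitist deposit rule and a periodic best-path exchange between colonies; the paper studies several exchange strategies, so everything here is illustrative:

```python
import numpy as np

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def run_ant(pher, grid, start, goal, rng, max_steps=200):
    """One ant walks the grid; returns its path if it reaches the goal."""
    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            return path
        cands, weights = [], []
        for dr, dc in MOVES:
            r, c = pos[0] + dr, pos[1] + dc
            if (0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]
                    and grid[r, c] == 0 and (r, c) not in path):
                heur = 1.0 / (1.0 + abs(r - goal[0]) + abs(c - goal[1]))
                cands.append((r, c))
                weights.append(pher[r, c] * heur)
        if not cands:
            return None
        w = np.array(weights) / sum(weights)
        pos = cands[rng.choice(len(cands), p=w)]
        path.append(pos)
    return None

def multi_colony(grid, start, goal, n_colonies=3, n_ants=20, n_iters=50,
                 rho=0.1, exchange_every=10, seed=0):
    rng = np.random.default_rng(seed)
    phers = [np.ones(grid.shape) for _ in range(n_colonies)]
    best = [None] * n_colonies
    for it in range(n_iters):
        for k, pher in enumerate(phers):
            pher *= (1.0 - rho)                      # evaporation
            for _ in range(n_ants):
                path = run_ant(pher, grid, start, goal, rng)
                if path and (best[k] is None or len(path) < len(best[k])):
                    best[k] = path
            if best[k]:
                for r, c in best[k]:                 # elitist deposit
                    pher[r, c] += 1.0 / len(best[k])
        if (it + 1) % exchange_every == 0:           # colonies share best path
            overall = min((p for p in best if p), key=len, default=None)
            if overall:
                for pher in phers:
                    for r, c in overall:
                        pher[r, c] += 1.0 / len(overall)
    return min((p for p in best if p), key=len, default=None)
```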
international conference on robotics and automation | 2016
Marc Toussaint; Thibaut Munzer; Yoan Mollard; Li Yang Wu; Ngo Anh Vien; Manuel Lopes
In human-robot collaboration, multi-agent domains, or single-robot manipulation with multiple end-effectors, the activities of the involved parties are naturally concurrent. Such domains are also naturally relational: they involve objects and multiple agents, and models should generalize over both. We propose a novel formalization of relational concurrent activity processes that allows us to transfer methods from standard relational MDPs, such as Monte-Carlo planning and learning from demonstration, to concurrent cooperation domains. We formally compare the formulation to previous propositional models of concurrent decision making, and demonstrate planning and learning-from-demonstration methods on a real-world human-robot assembly task.
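One hypothetical way to render this in code: agents run durative activities over typed objects, a world step advances all running activities together, and finished activities assert relational facts. This illustrates the flavor of such a formalism, not the paper's actual definition:

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str        # relational operator, e.g. "screw"
    agent: str       # who performs it
    objects: tuple   # typed arguments, e.g. ("bolt1", "plate2")
    remaining: int   # duration left, in decision epochs

@dataclass
class ConcurrentState:
    facts: set = field(default_factory=set)     # ground relational atoms
    running: list = field(default_factory=list) # concurrent activities

    def step(self):
        # Advance all concurrent activities by one epoch; activities that
        # finish assert their effects as new relational facts.
        still_running = []
        for act in self.running:
            act.remaining -= 1
            if act.remaining <= 0:
                self.facts.add((act.name + "_done", act.agent, *act.objects))
            else:
                still_running.append(act)
        self.running = still_running
```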
international symposium on neural networks | 2007
Ngo Anh Vien; Nguyen Hoang Viet; SeungGwan Lee; TaeChoong Chung
Path planning is an important task in mobile robot control. When the robot must move rapidly from an arbitrary start position to a target position in its environment, a proper path must avoid both static obstacles and moving obstacles of arbitrary shape. In this paper, an obstacle avoidance path planning approach for mobile robots is proposed using the Ant-Q algorithm. Ant-Q belongs to the family of ant colony based methods, which are distributed algorithms for combinatorial optimization problems based on the metaphor of ant colonies. In the simulation, we experimentally investigate the sensitivity of the Ant-Q algorithm to its three methods of delayed reinforcement updating, and we compare it with results obtained by other heuristic approaches based on genetic algorithms or the traditional ant colony system. Finally, we show the good results obtained by applying Ant-Q to a larger problem: Ant-Q finds very good paths at a higher convergence rate.
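A minimal sketch of the two rules the abstract refers to, assuming a tabular AQ matrix over graph edges: the pseudo-random-proportional action choice and the Q-learning-like backup follow the standard Ant-Q scheme, with illustrative parameters:

```python
import numpy as np

def ant_q_update(AQ, s, a, s_next, alpha=0.1, gamma=0.3, delayed=0.0):
    """One Ant-Q backup: blend the discounted best next AQ-value with the
    delayed reinforcement given at tour end (e.g. W / best_tour_length)."""
    AQ[s, a] = (1 - alpha) * AQ[s, a] + alpha * (delayed + gamma * AQ[s_next].max())

def choose_action(AQ, heur, s, valid, q0=0.9, beta=2.0, rng=np.random):
    """Pseudo-random-proportional rule: exploit with probability q0,
    otherwise sample proportionally to AQ * heuristic^beta."""
    valid = np.asarray(valid)
    scores = AQ[s, valid] * heur[s, valid] ** beta
    if rng.random() < q0:
        return int(valid[np.argmax(scores)])
    p = scores / scores.sum()
    return int(valid[rng.choice(len(valid), p=p)])
```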
international conference on tools with artificial intelligence | 2007
Ngo Anh Vien; TaeChoong Chung
Semi-Markov decision processes (SMDPs) are continuous-time generalizations of discrete-time Markov decision processes. A number of value and policy iteration algorithms have been developed for solving SMDP problems. However, solving an SMDP requires prior knowledge of the deterministic kernels and suffers from the curse of dimensionality. In this paper, we present the steepest descent direction based on a family of parameterized policies to overcome those limitations. The update rule is based on stochastic policy gradients employing Amari's natural gradient approach, which moves toward choosing a greedy optimal action. We then show considerable performance improvements of this method on a simple two-state SMDP and on the more complex SMDP of the call admission control problem.
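A hedged sketch of the update direction described here: estimate the Fisher information from sampled policy score vectors and precondition the vanilla gradient with its inverse (Amari's natural gradient). The damping and the assumed-given estimators are illustrative:

```python
import numpy as np

def natural_gradient_step(theta, scores, advantages, lr=0.05, damping=1e-3):
    """scores: (n, d) grad log pi(a|s) per sample; advantages: (n,)."""
    n = len(advantages)
    grad = scores.T @ advantages / n              # vanilla policy gradient
    fisher = scores.T @ scores / n                # empirical Fisher matrix
    fisher += damping * np.eye(len(theta))        # regularize for inversion
    step = np.linalg.solve(fisher, grad)          # F^{-1} grad
    return theta + lr * step
```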
intelligent robots and systems | 2015
Ngo Anh Vien; Marc Toussaint
Efficient object manipulation based only on force feedback typically requires a plan of actively contact-seeking actions to reduce uncertainty over the true environmental model. In principle, that problem could be formulated as a full partially observable Markov decision process (POMDP) whose observations are sensed forces indicating the presence or absence of contacts with objects. Such a naive application leads to a very large POMDP with high-dimensional continuous state, action, and observation spaces, and solving such large POMDPs is practically prohibitive. In other words, we face three challenging problems: 1) uncertainty over discontinuous contacts with objects; 2) high-dimensional continuous spaces; 3) optimizing not only trajectory cost but also execution time. As trajectory optimization is a powerful model-based method for motion generation, it can handle the last two issues effectively by computing locally optimal trajectories. This paper aims to integrate the advantages of trajectory optimization into existing POMDP solvers. The full POMDP formulation is solved using sample-based approaches, where each sampled model is quickly evaluated via trajectory optimization instead of simulating a large number of rollouts. To further accelerate the solver, we propose to integrate temporal abstraction, i.e., macro actions or temporal actions, into the POMDP model. We demonstrate the proposed method on a simulated 7-DoF KUKA arm and a physical Willow Garage PR2 platform. The results show that our proposed method can effectively seek contacts in complex scenarios and achieve near-optimal path planning performance.
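A schematic sketch of the evaluation idea: rather than scoring each model sampled from the belief with many rollouts, call a trajectory optimizer once per (model, macro-action) pair and use its locally optimal cost. The optimizer here is a stub; in the paper's setting it would be a motion-level solver, and the model representation and cost are toy assumptions:

```python
import numpy as np

def optimize_trajectory(model, start, goal):
    # Stub standing in for a real trajectory optimizer: returns a cost
    # trading off path length and execution time under the given model.
    return np.linalg.norm(np.asarray(goal) - np.asarray(start)) * model["friction"]

def evaluate_macro_actions(belief_samples, start, macro_goals):
    """Pick the contact-seeking macro action with the lowest expected
    locally optimal cost over models sampled from the current belief."""
    costs = np.zeros(len(macro_goals))
    for model in belief_samples:
        for i, goal in enumerate(macro_goals):
            costs[i] += optimize_trajectory(model, start, goal)
    return int(np.argmin(costs)), costs / len(belief_samples)
```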
Applied Intelligence | 2014
Ngo Anh Vien; Hung Quoc Ngo; Sungyoung Lee; TaeChoong Chung
In this paper, we propose to use hierarchical action decomposition to make Bayesian model-based reinforcement learning more efficient and feasible for larger problems. We formulate Bayesian hierarchical reinforcement learning as a partially observable semi-Markov decision process (POSMDP). The main POSMDP task is partitioned into a hierarchy of POSMDP subtasks. Each subtask might consist of only primitive actions or hierarchically call other subtasks' policies, since the policies of lower-level subtasks serve as macro actions in higher-level subtasks. A solution for this hierarchical action decomposition is to solve lower-level subtasks first, then higher-level ones. Because each formulated POSMDP has a continuous state space, we sample from a prior belief to build an approximate model, which we then solve using the recently introduced Monte Carlo Value Iteration with Macro-Actions solver. We name this method Monte Carlo Bayesian Hierarchical Reinforcement Learning. Simulation results show that our algorithm, by exploiting the action hierarchy, performs significantly better than flat Bayesian reinforcement learning in terms of both reward and, especially, solving time, by at least one order of magnitude.
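A conceptual sketch of the bottom-up scheme: solve each subtask first, then expose its policy as a macro action to its parent. The solver below is a placeholder for the Monte Carlo Value Iteration with Macro-Actions solver the paper uses; all names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    primitive_actions: list
    children: list = field(default_factory=list)
    policy: object = None

def solve_posmdp(actions):
    # Placeholder for an approximate POSMDP solver (e.g. MCVI with
    # macro-actions); returns some policy object over `actions`.
    return {"actions": actions}

def solve_hierarchy(task: Subtask):
    """Post-order traversal: children's solved policies become the
    macro actions available to the parent POSMDP."""
    macro_actions = [solve_hierarchy(child) for child in task.children]
    task.policy = solve_posmdp(task.primitive_actions + macro_actions)
    return task.policy
```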
international conference on machine learning and applications | 2012
Ngo Anh Vien; Wolfgang Ertel
Bayesian model-based reinforcement learning can be formulated as a partially observable Markov decision process (POMDP) to provide a principled framework for optimally balancing exploitation and exploration; a POMDP solver can then be used to solve the problem. If the prior distribution over the environment's dynamics is a product of Dirichlet distributions, the POMDP's optimal value function can be represented using a set of multivariate polynomials. Unfortunately, the size of the polynomials grows exponentially with the problem horizon. In this paper, we examine the use of an online Monte-Carlo tree search (MCTS) algorithm for large POMDPs to solve the Bayesian reinforcement learning problem online. We show that such an algorithm successfully searches for a near-optimal policy. In addition, we examine the use of a parameter tying method to keep the model search space small, and propose the use of a nested mixture of tied models to increase the robustness of the method when our prior information does not allow us to specify the structure of tied models exactly. Experiments show that the proposed methods substantially improve the scalability of current Bayesian reinforcement learning methods.
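The parameter-tying and nested-mixture ideas mentioned here can be sketched as follows, assuming discrete states whose transition rows are shared within tie groups and a mixture over candidate tying structures reweighted by predictive likelihood; the grouping and reweighting scheme are toy assumptions:

```python
import numpy as np

class TiedDirichletModel:
    def __init__(self, groups, n_actions, n_states, prior=1.0):
        # groups: maps each state to a tie-group id; tied states share counts,
        # which shrinks the model search space.
        self.groups = np.asarray(groups)
        n_groups = self.groups.max() + 1
        self.counts = np.full((n_groups, n_actions, n_states), prior)

    def observe(self, s, a, s_next):
        self.counts[self.groups[s], a, s_next] += 1.0

    def sample_row(self, s, a):
        return np.random.dirichlet(self.counts[self.groups[s], a])

class NestedMixture:
    """Mixture over candidate tying structures, reweighted by how well
    each structure predicts the observed transitions."""

    def __init__(self, models):
        self.models = models
        self.log_w = np.zeros(len(models))

    def observe(self, s, a, s_next):
        for i, m in enumerate(self.models):
            row = m.counts[m.groups[s], a]
            self.log_w[i] += np.log(row[s_next] / row.sum())  # predictive likelihood
            m.observe(s, a, s_next)

    def sample_model(self):
        w = np.exp(self.log_w - self.log_w.max())
        return self.models[np.random.choice(len(self.models), p=w / w.sum())]
```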