Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Michael H. Bowling is active.

Publication


Featured research published by Michael H. Bowling.


Journal of Artificial Intelligence Research | 2013

The arcade learning environment: an evaluation platform for general agents

Marc G. Bellemare; Yavar Naddaf; Joel Veness; Michael H. Bowling

In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available.
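
ALE survives today as the `ale-py` package, so the interaction loop the paper describes can be sketched directly; the ROM path below is a placeholder and the API shown is the modern `ale-py` one rather than the 2013 original.

```python
import random

# Minimal random agent on the Arcade Learning Environment.
# Assumes the `ale-py` package (pip install ale-py) and a locally supplied
# Atari 2600 ROM -- both are assumptions, not part of the original paper.
from ale_py import ALEInterface

ale = ALEInterface()
ale.setInt("random_seed", 123)
ale.loadROM("breakout.bin")          # hypothetical path to a ROM file

actions = ale.getLegalActionSet()    # the full Atari 2600 action set

for episode in range(5):
    total_reward = 0.0
    while not ale.game_over():
        total_reward += ale.act(random.choice(actions))  # one emulator step
        # (frame skipping, screen capture, etc. omitted for brevity)
    print(f"episode {episode}: reward {total_reward}")
    ale.reset_game()
```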


neural information processing systems | 2004

Convergence and No-Regret in Multiagent Learning

Michael H. Bowling

Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent were not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
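
The abstract does not spell out the update rule, but the GIGA half of the name (generalized infinitesimal gradient ascent) is projected gradient ascent on the mixed strategy. The sketch below shows just that building block in self-play on matching pennies; GIGA-WoLF's second, slower-moving strategy and its step-size coupling are deliberately omitted, so this is background, not the paper's algorithm.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def giga_step(x, payoff, opp, eta):
    """One projected-gradient update: move along the payoff gradient
    (expected payoff of each pure action) and project back to the simplex."""
    return project_simplex(x + eta * (payoff @ opp))

A = np.array([[1.0, -1.0], [-1.0, 1.0]])     # matching pennies, zero-sum
x, y = np.array([0.9, 0.1]), np.array([0.2, 0.8])
for t in range(1, 5001):
    eta = 1.0 / np.sqrt(t)                   # decaying step size
    x, y = giga_step(x, A, y, eta), giga_step(y, -A.T, x, eta)
print(x, y)                                  # orbits around (0.5, 0.5)
```

Plain GIGA famously cycles on games like matching pennies; damping that cycling without sacrificing the no-regret guarantee is exactly what the WoLF modification adds.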


neural information processing systems | 2009

Monte Carlo Sampling for Regret Minimization in Extensive Games

Marc Lanctot; Kevin Waugh; Martin Zinkevich; Michael H. Bowling

Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR), of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR's bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
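
Both CFR and MCCFR drive their per-information-set updates with regret matching: the next strategy is proportional to the positive part of the accumulated regrets. A minimal sketch of that core update at a single decision point (not the full tree-walking algorithm) follows; the biased rock-paper-scissors opponent is invented for the demo.

```python
import numpy as np

class RegretMatcher:
    """Regret matching at one decision point -- the update that
    CFR/MCCFR applies at every information set."""

    def __init__(self, n_actions):
        self.cum_regret = np.zeros(n_actions)
        self.cum_strategy = np.zeros(n_actions)

    def strategy(self):
        pos = np.maximum(self.cum_regret, 0.0)
        total = pos.sum()
        return pos / total if total > 0 else np.full(len(pos), 1 / len(pos))

    def observe(self, action_values):
        """action_values: the value each action would have earned this round."""
        s = self.strategy()
        self.cum_strategy += s
        self.cum_regret += action_values - s @ action_values

    def average_strategy(self):
        """The average strategy is what converges (to a best response here,
        or toward equilibrium in self-play)."""
        total = self.cum_strategy.sum()
        return self.cum_strategy / total if total > 0 else self.strategy()

rng = np.random.default_rng(0)
payoff = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], float)  # R, P, S
rm = RegretMatcher(3)
for _ in range(10_000):
    opp = rng.choice(3, p=[0.6, 0.2, 0.2])   # opponent over-plays rock
    rm.observe(payoff[:, opp])
print(rm.average_strategy())                  # mass concentrates on paper
```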


Proceedings of SPIE | 1999

Anticipation as a key for collaboration in a team of agents: a case study in robotic soccer

Manuela M. Veloso; Peter Stone; Michael H. Bowling

We investigate teams of complete autonomous agents that can collaborate towards achieving precise objectives in an adversarial dynamic environment. Our work is grounded in two concrete robotic soccer frameworks, simulation and small physical robots, and we have pursued both while emphasizing their different technical challenges. Creating effective members of a team is a challenging research problem. We first address this issue by introducing a team architecture which allows for a rich task decomposition between team members. The main contribution of this paper is an action-selection algorithm that allows a teammate to anticipate the needs of other teammates. Anticipation is critical for maximizing the probability of successful collaboration in teams of agents. We show how our contribution applies to the two robotic soccer frameworks and present controlled empirical results run in simulation. Anticipation was successfully used by both our CMUnited-98 simulator and CMUnited-98 small-robot teams in the RoboCup-98 competition; the two teams are RoboCup-98 world champions, each in its own league.
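
The anticipation idea, positioning a teammate where it is most likely to be useful next, can be caricatured as scoring candidate positions for a supporting player. Everything in the sketch below (the scoring terms, weights, and field geometry) is a hypothetical illustration, not the authors' algorithm.

```python
import numpy as np

def anticipation_score(pos, ball, goal, opponents,
                       w_pass=1.0, w_goal=0.5, w_clear=2.0):
    """Hypothetical score for a supporting position: prefer short passes,
    progress toward the goal, and clearance from the nearest opponent."""
    pass_cost = np.linalg.norm(pos - ball)
    goal_cost = np.linalg.norm(pos - goal)
    clearance = min(np.linalg.norm(pos - o) for o in opponents)
    return -w_pass * pass_cost - w_goal * goal_cost + w_clear * clearance

# Evaluate a coarse grid of candidate positions in the attacking half.
xs, ys = np.meshgrid(np.linspace(0, 50, 11), np.linspace(-25, 25, 11))
candidates = np.stack([xs.ravel(), ys.ravel()], axis=1)
ball, goal = np.array([30.0, 0.0]), np.array([50.0, 0.0])
opponents = [np.array([40.0, 5.0]), np.array([35.0, -10.0])]
best = max(candidates, key=lambda p: anticipation_score(p, ball, goal, opponents))
print(best)   # where the anticipating teammate should move
```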


international conference on machine learning | 2005

Action respecting embedding

Michael H. Bowling; Ali Ghodsi; Dana F. Wilkinson

Dimensionality reduction is the problem of finding a low-dimensional representation of high-dimensional input data. This paper examines the case where additional information is known about the data. In particular, we assume the data are given in a sequence with action labels associated with adjacent data points, such as might come from a mobile robot. The goal is a variation on dimensionality reduction, where the output should be a representation of the input data that is both low-dimensional and respects the actions (i.e., actions correspond to simple transformations in the output representation). We show how this variation can be solved with a semidefinite program. We evaluate the technique in a synthetic, robot-inspired domain, demonstrating qualitatively superior representations and quantitative improvements on a data prediction task.
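
Since the paper reduces the problem to a semidefinite program, a toy instance can be set up in an off-the-shelf solver. The sketch below follows the maximum-variance-unfolding style of SDP (maximize spread, preserve the lengths of temporally adjacent steps, center the embedding) with a crude stand-in for the action constraint; the paper's actual constraints, which force each action to act as a simple transformation of the representation, are stronger than what is shown.

```python
import numpy as np
import cvxpy as cp   # assumption: cvxpy with its bundled SDP solver

rng = np.random.default_rng(0)
n = 8
X = np.cumsum(rng.normal(size=(n, 5)), axis=0)    # toy high-dim trajectory
labels = np.array([0, 0, 1, 0, 0, 1, 0])          # action taken at each step
d = np.sum(np.diff(X, axis=0) ** 2, axis=1)       # adjacent squared distances
# Stand-in for "actions are simple transformations": same-action steps get
# the same length (per-action means keep the toy constraints consistent).
d_tgt = np.array([d[labels == labels[t]].mean() for t in range(n - 1)])

K = cp.Variable((n, n), PSD=True)                 # Gram matrix of the embedding
constraints = [cp.sum(K) == 0]                    # center at the origin
for t in range(n - 1):
    step = K[t, t] - 2 * K[t, t + 1] + K[t + 1, t + 1]
    constraints.append(step == d_tgt[t])

cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

# Low-dimensional coordinates from the top eigenvectors of K.
w, V = np.linalg.eigh(K.value)
print(V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0)))
```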


robot soccer world cup | 1999

The CMUnited-98 Small-Robot Team

Manuela M. Veloso; Michael H. Bowling; Sorin Achim; Kwun Han; Peter Stone

Robotic soccer is a challenging research domain which involves multiple agents that need to collaborate in an adversarial environment to achieve specific objectives. In this paper, we describe CMUnited, the team of small robotic agents that we developed to enter the RoboCup-97 competition. We designed and built the robotic agents, devised the appropriate vision algorithm, and developed and implemented algorithms for strategic collaboration between the robots in an uncertain and dynamic environment. The robots can organize themselves in formations, hold specific roles, and pursue their goals. In game situations, they have demonstrated their collaborative behaviors on multiple occasions. The robots can also switch roles to maximize the overall performance of the team. We present an overview of the vision processing algorithm which successfully tracks multiple moving objects and predicts trajectories. The paper then focuses on the agent behaviors ranging from low-level individual behaviors to coordinated, strategic team behaviors. CMUnited won the RoboCup-97 small-robot competition at IJCAI-97 in Nagoya, Japan.
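
The trajectory-prediction component can be approximated by fitting a constant-velocity model to recent observations of each tracked object; the sketch below is a generic least-squares version of that idea, not the CMUnited vision pipeline.

```python
import numpy as np

def predict_trajectory(observations, dt, steps):
    """Fit position = p0 + v * t to recent (x, y) observations and
    extrapolate `steps` frames ahead (constant-velocity assumption)."""
    obs = np.asarray(observations, float)
    t = np.arange(len(obs)) * dt
    A = np.stack([np.ones_like(t), t], axis=1)    # least-squares design matrix
    (p0, v), *_ = np.linalg.lstsq(A, obs, rcond=None)
    future_t = (len(obs) - 1 + np.arange(1, steps + 1)) * dt
    return p0 + np.outer(future_t, v)

ball = [(0.00, 0.00), (0.10, 0.05), (0.21, 0.09), (0.30, 0.16)]
print(predict_trajectory(ball, dt=1 / 30, steps=3))   # next three frames
```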


Journal of Artificial Intelligence Research | 2004

Existence of multiagent equilibria with limited agents

Michael H. Bowling; Manuela M. Veloso

Multiagent learning is a necessary yet challenging problem as multiagent systems become more prevalent and environments become more dynamic. Much of the groundbreaking work in this area draws on notable results from game theory, in particular, the concept of Nash equilibria. Learners that directly learn an equilibrium obviously rely on their existence. Learners that instead seek to play optimally with respect to the other players also depend upon equilibria since equilibria are fixed points for learning. From another perspective, agents with limitations are real and common. These may be undesired physical limitations as well as self-imposed rational limitations, such as abstraction and approximation techniques, used to make learning tractable. This article explores the interactions of these two important concepts: equilibria and limitations in learning. We introduce the question of whether equilibria continue to exist when agents have limitations. We look at the general effects limitations can have on agent behavior, and define a natural extension of equilibria that accounts for these limitations. Using this formalization, we make three major contributions: (i) a counterexample for the general existence of equilibria with limitations, (ii) sufficient conditions on limitations that preserve their existence, (iii) three general classes of games and limitations that satisfy these conditions. We then present empirical results from a specific multiagent learning algorithm applied to a specific instance of limited agents. These results demonstrate that learning with limitations is feasible, when the conditions outlined by our theoretical analysis hold.
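
The "natural extension of equilibria" can be stated in the usual best-response form, restricted to each agent's limited strategy set; the rendering below is a plausible reconstruction, and the paper's precise definition may differ in detail.

```latex
% Agent i's limitations restrict it to \Sigma_i' \subseteq \Sigma_i.
% A profile \sigma = (\sigma_1, \dots, \sigma_n) with \sigma_i \in \Sigma_i'
% is an equilibrium with limitations if no agent can gain by deviating
% within its own limited set:
\forall i,\ \forall \sigma_i' \in \Sigma_i':\quad
  u_i(\sigma_i, \sigma_{-i}) \;\ge\; u_i(\sigma_i', \sigma_{-i}).
```

The paper's counterexample then amounts to exhibiting limited strategy sets for which no such profile exists, even though the unrestricted game has a Nash equilibrium.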


international conference on machine learning | 2006

Learning predictive state representations using non-blind policies

Michael H. Bowling; Peter N. McCracken; Michael R. James; James Neufeld; Dana F. Wilkinson

Predictive state representations (PSRs) are powerful models of non-Markovian decision processes that differ from traditional models (e.g., HMMs, POMDPs) by representing state using only observable quantities. Because of this, PSRs can be learned solely using data from interaction with the process. The majority of existing techniques, though, explicitly or implicitly require that this data be gathered using a blind policy, where actions are selected independently of preceding observations. This is a severe limitation for practical learning of PSRs. We present two methods for fixing this limitation in most of the existing PSR algorithms: one when the policy is known and one when it is not. We then present an efficient optimization for computing good exploration policies to be used when learning a PSR. The exploration policies, which are not blind, significantly lower the amount of data needed to build an accurate model, thus demonstrating the importance of non-blind policies.
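
For the known-policy case, the natural correction is importance weighting: reweight each sample by the inverse probability that the (non-blind) behavior policy chose the test's actions given the preceding observations. The sketch below shows only that estimator idea, with a simplified tests-from-the-start setup and hypothetical data structures; it is not the paper's PSR-learning algorithm.

```python
def estimate_test_prob(episodes, policy_prob, test):
    """Self-normalized importance-weighted estimate of a test prediction
    p(test's observations | test's actions are executed).

    episodes: list of [(action, observation), ...] trajectories.
    policy_prob(history, action): probability the KNOWN behavior policy
        chose `action` after `history` (this is what a non-blind policy
        conditions on).
    test: sequence of (action, observation) pairs, checked from the start
        of each episode to keep the sketch short.
    """
    num = den = 0.0
    for ep in episodes:
        prefix = ep[:len(test)]
        if len(prefix) < len(test):
            continue
        if any(a != ta for (a, _), (ta, _) in zip(prefix, test)):
            continue   # the policy did not happen to execute the test's actions
        # Undo the policy's action bias with an inverse-probability weight.
        w, history = 1.0, []
        for a, o in prefix:
            w /= policy_prob(history, a)
            history.append((a, o))
        den += w
        if all(o == to for (_, o), (_, to) in zip(prefix, test)):
            num += w
    return num / den if den > 0 else 0.0
```

Self-normalizing by the summed weights trades a little bias for lower variance; the estimators in the paper, and especially its unknown-policy variant, are more involved.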


ISRR | 2007

Subjective Localization with Action Respecting Embedding

Michael H. Bowling; Dana F. Wilkinson; Ali Ghodsi; Adam Milstein

Robot localization is the problem of how to estimate a robot’s pose within an objective frame of reference. Traditional localization requires knowledge of two key conditional probabilities: the motion and sensor models. These models depend critically on the specific robot as well as its environment. Building these models can be time-consuming, manually intensive, and can require expert intuitions. However, the models are necessary for the robot to relate its own subjective view of sensors and motors to the robot’s objective pose. In this paper we seek to remove the need for human provided models. We introduce a technique for subjective localization, relaxing the requirement that the robot localize within a global frame of reference. Using an algorithm for action-respecting non-linear dimensionality reduction, we learn a subjective representation of pose from a stream of actions and sensations. We then extract from the data natural motion and sensor models defined for this new representation. Monte Carlo localization is used to track this representation of the robot’s pose while executing new actions and receiving new sensor readings. We evaluate the technique in a synthetic image manipulation domain and with a mobile robot using vision and laser sensors.
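
Monte Carlo localization itself is the standard particle filter over pose: predict with the motion model, weight with the sensor model, resample. A generic sketch follows; the placeholder 1-D models stand in for the learned, subjective models the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(1)

def mcl_step(particles, action, observation, motion_model, sensor_model):
    """One Monte Carlo localization update over a set of pose hypotheses."""
    predicted = motion_model(particles, action)          # predict
    weights = sensor_model(predicted, observation)       # weight
    weights = weights / weights.sum()
    idx = rng.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx]                                # resample

# Placeholder 1-D models just to make the sketch runnable; in the paper
# both are learned from the action-respecting embedding.
motion = lambda p, a: p + a + rng.normal(0, 0.1, size=p.shape)
sensor = lambda p, z: np.exp(-0.5 * ((p[:, 0] - z) / 0.5) ** 2)

particles = rng.uniform(0, 10, size=(500, 1))            # unknown initial pose
for action, obs in [(1.0, 3.1), (1.0, 4.0), (1.0, 5.2)]:
    particles = mcl_step(particles, action, obs, motion, sensor)
print(particles.mean(axis=0))                            # estimate near 5.2
```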


Computers and Games | 2004

Game-Tree search with adaptation in stochastic imperfect-information games

Darse Billings; Aaron Davidson; Terence Schauenberg; Neil Burch; Michael H. Bowling; Robert C. Holte; Jonathan Schaeffer; Duane Szafron

Building a high-performance poker-playing program is a challenging project. The best program to date, PsOpti, uses game theory to solve a simplified version of the game. Although the program plays reasonably well, it is oblivious to the opponent's weaknesses and biases. Modeling the opponent to exploit predictability is critical to success at poker. This paper introduces Vexbot, a program that uses a game-tree search algorithm to compute the expected value of each betting option, and does real-time opponent modeling to improve its evaluation function estimates. The result is a program that defeats PsOpti convincingly, and poses a much tougher challenge for strong human players.
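
Vexbot's core loop, searching the betting tree while an opponent model supplies action frequencies at the opponent's decision nodes, can be caricatured with a miniature expectimax. The tree shape, probabilities, and leaf values below are all invented for illustration; in Vexbot the leaf values come from observed showdowns and folds.

```python
def expected_value(node, opponent_model):
    """Expectimax over a tiny betting tree: max at our decision nodes,
    opponent-model-weighted average at theirs, fixed values at leaves."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "us":
        return max(expected_value(child, opponent_model) for child in payload)
    return sum(opponent_model[action] * expected_value(child, opponent_model)
               for action, child in payload)   # "opp" node

# Invented tree: we choose bet vs. check; after a bet the opponent
# folds, calls, or raises with modeled frequencies.
tree = ("us", [
    ("opp", [("fold",  ("leaf",  1.0)),
             ("call",  ("leaf",  0.4)),
             ("raise", ("leaf", -0.8))]),   # our bet
    ("leaf", 0.0),                          # our check
])
model = {"fold": 0.5, "call": 0.4, "raise": 0.1}   # observed frequencies
print(expected_value(tree, model))   # bet EV = 0.58 > check EV = 0.0
```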

Collaboration


Dive into Michael H. Bowling's collaborations.

Top Co-Authors


Manuela M. Veloso

Carnegie Mellon University


Kevin Waugh

Carnegie Mellon University
