
Publications


Featured research published by Arthur Guez.


Nature | 2016

Mastering the game of Go with deep neural networks and tree search

David Silver; Aja Huang; Chris J. Maddison; Arthur Guez; Laurent Sifre; George van den Driessche; Julian Schrittwieser; Ioannis Antonoglou; Veda Panneershelvam; Marc Lanctot; Sander Dieleman; Dominik Grewe; John Nham; Nal Kalchbrenner; Ilya Sutskever; Timothy P. Lillicrap; Madeleine Leach; Koray Kavukcuoglu; Thore Graepel; Demis Hassabis

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
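
As a rough illustration of how a policy prior can steer tree search, here is a minimal sketch of a PUCT-style selection rule in Python. The Node interface, the c_puct constant, and the exact form of the exploration bonus follow common descriptions of this family of algorithms and are assumptions, not the paper's implementation.

import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the policy network
        self.visit_count = 0    # N(s, a)
        self.value_sum = 0.0    # accumulated value-network evaluations
        self.children = {}      # action -> Node

    def q_value(self):
        # Mean of the evaluations backed up through this edge.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_action(node, c_puct=1.0):
    """Pick the child maximising Q(s, a) plus a prior-weighted bonus U(s, a)."""
    total_visits = sum(child.visit_count for child in node.children.values())
    def puct(child):
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q_value() + u
    return max(node.children, key=lambda a: puct(node.children[a]))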


Nature | 2017

Mastering the game of Go without human knowledge

David Silver; Julian Schrittwieser; Karen Simonyan; Ioannis Antonoglou; Aja Huang; Arthur Guez; Thomas Hubert; Lucas Baker; Matthew Lai; Adrian Bolton; Yutian Chen; Timothy P. Lillicrap; Fan Hui; Laurent Sifre; George van den Driessche; Thore Graepel; Demis Hassabis

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
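
A hedged sketch of the training objective the abstract describes, assuming the standard formulation in which the network jointly predicts move probabilities and the game winner; the function and argument names are illustrative.

import numpy as np

def self_play_loss(policy_logits, value, pi, z, weights, l2_scale=1e-4):
    # Cross-entropy between the search probabilities pi and the network policy.
    log_p = policy_logits - np.log(np.sum(np.exp(policy_logits)))
    policy_loss = -np.dot(pi, log_p)
    # Squared error between the predicted value and the game outcome z in {-1, +1}.
    value_loss = (z - value) ** 2
    # L2 regularisation on the network parameters.
    l2 = l2_scale * sum(np.sum(w ** 2) for w in weights)
    return value_loss + policy_loss + l2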


International Journal of Neural Systems | 2009

Treating epilepsy via adaptive neurostimulation: a reinforcement learning approach

Joelle Pineau; Arthur Guez; Robert D. Vincent; Gabriella Panuccio; Massimo Avoli

This paper presents a new methodology for automatically learning an optimal neurostimulation strategy for the treatment of epilepsy. The technical challenge is to automatically modulate neurostimulation parameters, as a function of the observed EEG signal, so as to minimize the frequency and duration of seizures. The methodology leverages recent techniques from the machine learning literature, in particular the reinforcement learning paradigm, to formalize this optimization problem. We present an algorithm which is able to automatically learn an adaptive neurostimulation strategy directly from labeled training data acquired from animal brain tissues. Our results suggest that this methodology can be used to automatically find a stimulation strategy which effectively reduces the incidence of seizures, while also minimizing the amount of stimulation applied. This work highlights the crucial role that modern machine learning techniques can play in the optimization of treatment strategies for patients with chronic disorders such as epilepsy.
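
A minimal sketch of how a stimulation policy might be learned from a batch of labelled transitions, assuming a linear fitted Q-iteration setup; the feature vectors, action set, and hyperparameters are assumptions, not the paper's algorithm.

import numpy as np

ACTIONS = [0.5, 1.0, 2.0]  # hypothetical stimulation frequencies (Hz)

def fitted_q_iteration(transitions, n_features, gamma=0.95, iters=50, lr=0.01):
    """transitions: list of (eeg_features, action_idx, reward, next_features)."""
    w = np.zeros((len(ACTIONS), n_features))  # one linear Q head per action
    for _ in range(iters):
        for phi, a, r, phi_next in transitions:
            # Bootstrapped target: reward plus discounted best next-state value.
            target = r + gamma * max(w[b] @ phi_next for b in range(len(ACTIONS)))
            w[a] += lr * (target - w[a] @ phi) * phi
    return w

def policy(w, phi):
    # Choose the stimulation setting with the highest estimated value.
    return int(np.argmax([w[a] @ phi for a in range(len(ACTIONS))]))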


Journal of Artificial Intelligence Research | 2013

Scalable and efficient Bayes-adaptive reinforcement learning based on Monte-Carlo tree search

Arthur Guez; David Silver; Peter Dayan

Bayesian planning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, planning optimally in the face of uncertainty is notoriously taxing, since the search space is enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach avoids expensive applications of Bayes' rule within the search tree by sampling models from current beliefs, and furthermore performs this sampling in a lazy manner. This enables it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems. As we show, our approach can even work in problems with an infinite state space that lie qualitatively out of reach of almost all previous work in Bayesian exploration.
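
A simplified sketch of the root-sampling idea, assuming a generic posterior/MDP interface: one model is drawn from the current beliefs at the root of each simulation and kept fixed throughout it, so no Bayes updates are needed inside the search. The full method builds a search tree with an upper-confidence rule; this sketch uses flat Monte-Carlo rollouts for brevity.

import random

def root_sampling_search(posterior, root_state, n_simulations=1000, depth=50, gamma=0.95):
    returns = {}  # first action -> (summed return, count)
    for _ in range(n_simulations):
        mdp = posterior.sample()  # one lazy model draw, fixed for this simulation
        action = random.choice(mdp.actions(root_state))
        first_action, state = action, root_state
        ret, discount = 0.0, 1.0
        for _ in range(depth):
            state, reward = mdp.step(state, action)  # simulate under the sampled model
            ret += discount * reward
            discount *= gamma
            action = random.choice(mdp.actions(state))  # uniform rollout policy
        total, count = returns.get(first_action, (0.0, 0))
        returns[first_action] = (total + ret, count + 1)
    return max(returns, key=lambda a: returns[a][0] / returns[a][1])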


International Conference on Robotics and Automation | 2010

Multi-tasking SLAM

Arthur Guez; Joelle Pineau

The problem of simultaneous localization and mapping (SLAM) is one of the most studied in the robotics literature. Most existing approaches, however, focus on scenarios where localization and mapping are the only tasks on the robot's agenda. In many real-world scenarios, a robot may be called on to perform other tasks simultaneously, in addition to localization and mapping. These can include target-following (or avoidance), search-and-rescue, point-to-point navigation, refueling, and so on. This paper proposes a framework that balances localization, mapping, and other planning objectives, thus allowing robots to solve sequential decision tasks under map and pose uncertainty. Our approach combines a SLAM algorithm with an online POMDP approach to solve diverse navigation tasks, without prior training, in an unknown environment.
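
A high-level sketch of the control loop such a framework implies; the robot, SLAM-filter, and planner objects are hypothetical placeholders for the components described above, not the paper's interfaces.

def control_loop(robot, slam_filter, planner, horizon=10):
    # The SLAM filter maintains a joint belief over the map and the robot pose;
    # an online POMDP planner chooses actions against that belief.
    while not robot.done():
        observation = robot.sense()
        slam_filter.update(robot.last_action, observation)
        belief = slam_filter.belief()
        # The planner's reward can trade off the task objective against
        # map and pose uncertainty, as the framework above proposes.
        action = planner.search(belief, horizon)
        robot.execute(action)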


Experimental Neurology | 2013

Adaptive control of epileptiform excitability in an in vitro model of limbic seizures

Gabriella Panuccio; Arthur Guez; Robert D. Vincent; Massimo Avoli; Joelle Pineau

Deep brain stimulation (DBS) is a promising tool for treating drug-resistant epileptic patients. Currently, the most common approach is fixed-frequency stimulation (periodic pacing) by means of stimulating devices that operate under open-loop control. However, a drawback of this DBS strategy is the impossibility of tailoring a personalized treatment, which also limits the optimization of the stimulating apparatus. Here, we propose a novel DBS methodology based on a closed-loop control strategy, developed by exploiting statistical machine learning techniques, in which stimulation parameters are adapted to the current neural activity, thus allowing for seizure suppression that is fine-tuned on the individual scale (adaptive stimulation). By means of field potential recording from adult rat hippocampus-entorhinal cortex (EC) slices treated with the convulsant drug 4-aminopyridine, we determined the effectiveness of this approach compared to low-frequency periodic pacing, and found that the closed-loop stimulation strategy: (i) has similar efficacy as low-frequency periodic pacing in suppressing ictal-like events but (ii) is more efficient than periodic pacing in that it requires fewer electrical pulses. We also provide evidence that the closed-loop stimulation strategy can alternatively be employed to tune the frequency of a periodic pacing strategy. Our findings indicate that the adaptive stimulation strategy may represent a novel, promising approach to DBS for individually-tailored epilepsy treatment.
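
A toy illustration of the efficiency comparison in the abstract: open-loop periodic pacing fires on a fixed schedule, while a closed-loop strategy fires only when activity crosses a detector threshold. The signal, threshold, and detector are invented for illustration and stand in for the learned controller described above.

import random

def periodic_pacing(signal, period=5):
    # Open-loop: one pulse every `period` samples, regardless of activity.
    return [t % period == 0 for t in range(len(signal))]

def adaptive_stimulation(signal, threshold=0.9):
    # Closed-loop: pulse only when measured activity exceeds the threshold.
    return [x > threshold for x in signal]

signal = [random.random() for _ in range(1000)]  # toy activity trace
print("periodic pulses:", sum(periodic_pacing(signal)))
print("adaptive pulses:", sum(adaptive_stimulation(signal)))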


bioRxiv | 2018

Adaptive planning in human search

Moritz J. F. Krusche; Eric Schulz; Arthur Guez; Maarten Speekenbrink

How do people plan ahead when searching for rewards? We investigate planning in a foraging task in which participants search for rewards on an infinite two-dimensional grid. Our results show that their search is best described by a model which searches at least 3 steps ahead. Furthermore, participants do not seem to update their beliefs during planning, but rather treat their initial beliefs as given, a strategy similar to a heuristic called root-sampling. This planning algorithm corresponds well with participants’ behavior in test problems with restricted movement and varying degrees of information, outperforming more complex models. These results enrich our understanding of adaptive planning in complex environments.
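
A toy sketch of the kind of fixed-depth planner the results point to, assuming a grid world in which reward beliefs are treated as given and not updated during planning; the belief dictionary and depth-3 default are illustrative assumptions.

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def lookahead_value(pos, beliefs, depth=3):
    """Expected reward of the best depth-step path under fixed beliefs."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for dx, dy in MOVES:
        nxt = (pos[0] + dx, pos[1] + dy)
        # Beliefs stay fixed during planning: no Bayesian update here.
        value = beliefs.get(nxt, 0.0) + lookahead_value(nxt, beliefs, depth - 1)
        best = max(best, value)
    return best

def choose_move(pos, beliefs, depth=3):
    def move_value(m):
        nxt = (pos[0] + m[0], pos[1] + m[1])
        return beliefs.get(nxt, 0.0) + lookahead_value(nxt, beliefs, depth - 1)
    return max(MOVES, key=move_value)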


CoRR abs/1509.06461 | 2015

Deep reinforcement learning with double Q-learning

Hado van Hasselt; Arthur Guez; David Silver


Neural Information Processing Systems | 2012

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

Arthur Guez; David Silver; Peter Dayan


National Conference on Artificial Intelligence | 2016

Deep reinforcement learning with double Q-learning

Hado van Hasselt; Arthur Guez; David Silver
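
A minimal tabular sketch of the double Q-learning idea the title builds on, assuming a generic discrete setting: one value function selects the greedy action while the other evaluates it, which reduces the overestimation bias of standard Q-learning. The environment interface and hyperparameters are assumptions.

import random
from collections import defaultdict

def double_q_update(qa, qb, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    if random.random() < 0.5:
        best = max(actions, key=lambda x: qa[(s_next, x)])  # QA selects...
        target = r + gamma * qb[(s_next, best)]             # ...QB evaluates
        qa[(s, a)] += alpha * (target - qa[(s, a)])
    else:
        best = max(actions, key=lambda x: qb[(s_next, x)])  # QB selects...
        target = r + gamma * qa[(s_next, best)]             # ...QA evaluates
        qb[(s, a)] += alpha * (target - qb[(s, a)])

qa, qb = defaultdict(float), defaultdict(float)  # the two value tables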
