Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Peter Sunehag is active.

Publication


Featured research published by Peter Sunehag.


Pervasive and Mobile Computing | 2010

Wearable sensor activity analysis using semi-Markov models with a grammar

Owen Thomas; Peter Sunehag; Gideon Dror; Sungrack Yun; Sungwoong Kim; Matthew W. Robards; Alexander J. Smola; Daniel J. Green; Philo U. Saunders

Detailed monitoring of training sessions is an important component of elite athletes' preparation. In this paper we describe an application that performs a precise segmentation and labeling of swimming sessions. This allows a comprehensive breakdown of the training session, including lap times, detailed statistics of strokes, and turns. To this end we use semi-Markov models (SMM), a formalism for labeling and segmenting sequential data, trained in a max-margin setting. To reduce the computational complexity of the task and at the same time enforce sensible output, we introduce a grammar into the SMM framework. Applying the trained model to test swimming sessions from different swimmers yields highly accurate segmentation as well as perfect labeling of individual segments. The results are significantly better than those achieved by discriminative hidden Markov models.
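The segmentation-and-labeling step that semi-Markov models perform can be sketched as a dynamic program that scores whole variable-length segments rather than individual time steps. This is a generic semi-Markov Viterbi sketch, not the authors' max-margin model; the scoring functions and the maximum segment duration are assumed inputs:

```python
import math

def semi_markov_viterbi(T, labels, max_dur, seg_score, trans_score):
    """Best segmentation of positions [0, T) into labeled segments.

    seg_score(s, t, lab): score of one segment covering [s, t) with label lab.
    trans_score(prev, lab): score of moving from label prev (None at start) to lab.
    """
    # best[t][lab] = best score of a segmentation of [0, t) whose last segment has label lab
    best = [{lab: -math.inf for lab in labels} for _ in range(T + 1)]
    back = [{lab: None for lab in labels} for _ in range(T + 1)]
    for t in range(1, T + 1):
        for lab in labels:
            for d in range(1, min(max_dur, t) + 1):
                s = t - d
                if s == 0:
                    prev = None
                    cand = trans_score(None, lab) + seg_score(0, t, lab)
                else:
                    prev = max(labels, key=lambda p: best[s][p] + trans_score(p, lab))
                    cand = best[s][prev] + trans_score(prev, lab) + seg_score(s, t, lab)
                if cand > best[t][lab]:
                    best[t][lab] = cand
                    back[t][lab] = (s, prev)
    # trace back from the best final label
    lab = max(labels, key=lambda l: best[T][l])
    segs, t = [], T
    while t > 0:
        s, prev = back[t][lab]
        segs.append((s, t, lab))
        t, lab = s, prev
    return list(reversed(segs))
```

A grammar of the kind the paper introduces can be imposed by having trans_score return -inf for disallowed label transitions, which both constrains the output and prunes the search.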


arXiv: Information Theory | 2013

(Non-)Equivalence of Universal Priors

Ian Wood; Peter Sunehag; Marcus Hutter

Ray Solomonoff invented the notion of universal induction featuring an aptly termed “universal” prior probability function over all possible computable environments [9]. The essential property of this prior was its ability to dominate all other such priors. Later, Levin introduced another construction — a mixture of all possible priors or “universal mixture” [12]. These priors are well known to be equivalent up to multiplicative constants. Here, we seek to further clarify the relationships between these three characterisations of a universal prior (Solomonoff’s, universal mixtures, and universally dominant priors). We see that the constructions of Solomonoff and Levin define an identical class of priors, while the class of universally dominant priors is strictly larger. We provide some characterisation of the discrepancy.
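For reference, the two constructions under discussion can be written down explicitly in the standard notation (with $U$ a monotone universal Turing machine, $\ell(p)$ the length of program $p$, and $\nu_i$ an enumeration of lower semicomputable semimeasures with weights $w_i$):

```latex
% Solomonoff's prior: sum over all programs p whose output starts with x
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

% Levin's universal mixture over an enumeration of semimeasures
\xi(x) \;=\; \sum_{i} w_i \, \nu_i(x), \qquad w_i > 0

% Dominance: \xi dominates \nu iff, for some constant c > 0 and all x,
\xi(x) \;\ge\; c \cdot \nu(x)
```

Equivalence up to multiplicative constants means $M$ and $\xi$ dominate each other; the abstract's claim is that the priors arising from the first two constructions coincide as a class, while priors merely satisfying the dominance inequality form a strictly larger class.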


Algorithmic Learning Theory | 2010

Consistency of feature Markov processes

Peter Sunehag; Marcus Hutter

We study long-term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact, useful state representation. The state is supposed to summarize useful information from the history. We want a method that is asymptotically consistent in the sense that it will provably, eventually, only choose between alternatives that satisfy an optimality property related to the criterion used. We extend our work to the case where there is side information that one can take advantage of and, furthermore, we briefly discuss the active setting, where an agent takes actions to achieve desirable outcomes.


European Workshop on Reinforcement Learning | 2011

Feature reinforcement learning in practice

Phuong Nguyen; Peter Sunehag; Marcus Hutter

Following a recent surge in using history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework called ΦMDP [13]. To create a practical algorithm we devise a stochastic search procedure for a class of context trees based on parallel tempering and a specialized proposal distribution. We provide the first empirical evaluation for ΦMDP. Our proposed algorithm achieves superior performance to the classical U-tree algorithm [20] and the recent active-LZ algorithm [6], and is competitive with MC-AIXI-CTW [29], which maintains a Bayesian mixture over all context trees up to a chosen depth. We are encouraged by our ability to compete with this sophisticated method using an algorithm that simply picks one single model, and uses Q-learning on the corresponding MDP. Our ΦMDP algorithm is simpler and consumes less time and memory. These results show promise for our future work on attacking more complex and larger problems.
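The final ingredient mentioned, Q-learning on the MDP induced by the selected model, is standard tabular Q-learning. A minimal sketch under an assumed environment interface (env_step and all parameter values are illustrative, not the paper's):

```python
import random
from collections import defaultdict

def q_learning(env_step, states, actions, episodes=500, alpha=0.1,
               gamma=0.95, eps=0.1, horizon=100):
    """Tabular Q-learning on a small MDP.

    env_step(s, a) -> (next_state, reward) is the assumed environment interface.
    """
    Q = defaultdict(float)  # Q[(state, action)], initialized to zero
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(horizon):
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a: Q[(s, a)])
            s2, r = env_step(s, a)
            # one-step temporal-difference update toward the bootstrapped target
            target = r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

In the paper's setting the states would be the context-tree-derived feature states of ΦMDP; here they are just an opaque finite set.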


Australasian Joint Conference on Artificial Intelligence | 2012

Optimistic agents are asymptotically optimal

Peter Sunehag; Marcus Hutter

We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds.
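The optimism principle behind these agents can be illustrated concretely for the finite deterministic case: keep every environment model consistent with the history, act according to the most optimistic one, and discard models the moment they mispredict. A minimal sketch (the model representation, planning horizon, and helper names are illustrative, not the paper's construction):

```python
def plan(model, s, actions, horizon):
    # Return (value, first_action) of the best `horizon`-step plan in `model`,
    # where model[(state, action)] = (next_state, reward) is deterministic.
    if horizon == 0:
        return 0.0, None
    best = (-float("inf"), None)
    for a in actions:
        s2, r = model[(s, a)]
        v, _ = plan(model, s2, actions, horizon - 1)
        if r + v > best[0]:
            best = (r + v, a)
    return best

def optimistic_agent(models, true_env, s0, actions, steps=30, horizon=4):
    """models: candidate deterministic MDPs, at least one matching true_env.
    Acts optimistically and returns the total reward collected."""
    consistent = list(models)
    s, total = s0, 0.0
    for _ in range(steps):
        # optimism: evaluate every consistent model and follow the best-looking one
        v, a = max(plan(m, s, actions, horizon) for m in consistent)
        s2, r = true_env[(s, a)]
        total += r
        # discard any model contradicted by the observed transition
        consistent = [m for m in consistent if m[(s, a)] == (s2, r)]
        s = s2
    return total
```

An over-optimistic wrong model gets tried, falsified, and removed, after which the agent behaves optimally; this elimination dynamic is the intuition behind the asymptotic optimality claim.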


Data Compression Conference | 2012

Adaptive Context Tree Weighting

Alexander O'Neill; Marcus Hutter; Wen Shao; Peter Sunehag

We describe an adaptive context tree weighting (ACTW) algorithm, as an extension to the standard context tree weighting (CTW) algorithm. Unlike the standard CTW algorithm, which weights all observations equally regardless of the depth, ACTW gives increasing weight to more recent observations, aiming to improve performance in cases where the input sequence is from a non-stationary distribution. Data compression results show ACTW variants improving over CTW on merged files from standard compression benchmark tests while never being significantly worse on any individual file.
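The core recency idea, discounting old counts so that recent observations weigh more, can be illustrated with a single KT-style binary estimator. This is a sketch of the weighting idea only, not the paper's ACTW variants, which apply such weighting throughout a full context tree:

```python
def kt_prob(a, b, x):
    # Krichevsky-Trofimov estimate of P(next = x) from counts a zeros, b ones
    return ((b if x == 1 else a) + 0.5) / (a + b + 1.0)

class DiscountedKT:
    """KT-style binary estimator whose counts decay by `gamma` per step,
    so recent symbols carry more weight; gamma = 1.0 recovers plain KT."""

    def __init__(self, gamma=0.98):
        self.gamma = gamma
        self.a = 0.0  # discounted count of 0s
        self.b = 0.0  # discounted count of 1s

    def predict(self, x):
        return kt_prob(self.a, self.b, x)

    def update(self, x):
        self.a *= self.gamma
        self.b *= self.gamma
        if x == 1:
            self.b += 1.0
        else:
            self.a += 1.0
```

On a source that switches from mostly 0s to mostly 1s, the discounted estimator assigns the new regime higher probability much sooner than the undiscounted one, which is the non-stationary behavior the abstract targets.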


Algorithmic Learning Theory | 2011

Axioms for rational reinforcement learning

Peter Sunehag; Marcus Hutter

We provide a formal, simple and intuitive theory of rational decision making, including sequential decisions that affect the environment. The theory has a geometric flavor, which makes the arguments easy to visualize and understand. Our theory is for complete decision makers, which means that they have a complete set of preferences. Our main result shows that a complete rational decision maker implicitly has a probabilistic model of the environment. We have a countable version of this result that sheds light on the issue of countable vs. finite additivity by showing how it depends on the geometry of the space over which we have preferences. This is achieved by fruitfully connecting rationality with the Hahn-Banach Theorem. The theory presented here can be viewed as a formalization and extension of the betting odds approach to probability of Ramsey and De Finetti [Ram31, deF37].


Artificial General Intelligence | 2012

Optimistic AIXI

Peter Sunehag; Marcus Hutter

We consider extending the AIXI agent by using multiple (or even a compact class of) priors. This has the benefit of weakening the conditions on the true environment that we need to prove asymptotic optimality. Furthermore, it decreases the arbitrariness of picking the prior or reference machine. We connect this to removing symmetry between accepting and rejecting bets in the rationality axiomatization of AIXI and replacing it with optimism. Optimism is often used to encourage exploration in the more restrictive Markov Decision Process setting and it alleviates the problem that AIXI (with geometric discounting) stops exploring prematurely.


International Conference on Data Mining | 2009

Semi-Markov kMeans Clustering and Activity Recognition from Body-Worn Sensors

Matthew W. Robards; Peter Sunehag

Subsequence clustering aims to find patterns that appear repeatedly in time series data. We introduce a novel subsequence clustering technique that we call semi-Markov kmeans clustering. The clustering results in ideal examples of the repeating patterns and in labeled segmentations that can be used as training data for sophisticated discriminative methods like max-margin semi-Markov models. We apply the new clustering technique to activity recognition from body-worn sensors, showing how it can enable a system to learn from data annotated only with an ordered list of the activity types undertaken. This kind of annotation, unlike a detailed segmentation of the sensor data, is easily provided by a non-expert user. We show that we can achieve equally good results using only an ordered list of activity types for training as when using a full, detailed, labeled segmentation.


Artificial General Intelligence | 2014

Intelligence as Inference or Forcing Occam on the World

Peter Sunehag; Marcus Hutter

We propose to perform the optimization task of Universal Artificial Intelligence (UAI) by learning a reference machine on which good programs are short. Further, we acknowledge that the choice of reference machine on which the UAI objective is based is arbitrary and, therefore, we learn a suitable machine for the environment we are in. This is based on viewing Occam’s razor as an imperative instead of as a proposition about the world. Since this principle cannot be true for all reference machines, we need to find a machine that makes the principle true. We want both good policies and the environment itself to have short implementations on the machine. Such a machine is learnt iteratively through a procedure that generalizes the principle underlying the Expectation-Maximization algorithm.

Collaboration


Dive into Peter Sunehag's collaborations.

Top Co-Authors

Marcus Hutter, Australian National University
Matthew W. Robards, Australian National University
Mayank Daswani, Australian National University
Gareth Oliver, Australian National University
Phuong Nguyen, Australian National University
Tamas Gedeon, Australian National University
Hadi Mohasel Afshar, Australian National University
Guy Lever, University College London
Karl Tuyls, University of Liverpool