Pedro A. Ortega | Researchain

Publication

Featured research published by Pedro A. Ortega.

arXiv: Statistics Theory | 2013

Thermodynamics as a theory of decision-making with information-processing costs

Pedro A. Ortega; Daniel A. Braun

Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here, we propose a thermodynamically inspired formalization of bounded rational decision-making where information processing is modelled as state changes in thermodynamic systems that can be quantified by differences in free energy. By optimizing a free energy, bounded rational decision-makers trade off expected utility gains and information-processing costs measured by the relative entropy. As a result, the bounded rational decision-making problem can be rephrased in terms of well-known variational principles from statistical physics. In the limit when computational costs are ignored, the maximum expected utility principle is recovered. We discuss links to existing decision-making frameworks and applications to human decision-making experiments that are at odds with expected utility theory. Since most of the mathematical machinery can be borrowed from statistical physics, the main contribution is to re-interpret the formalism of thermodynamic free-energy differences in terms of bounded rational decision-making and to discuss its relationship to human decision-making experiments.
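The trade-off described in this abstract can be illustrated with a small numerical sketch. It assumes a finite action set, a prior (default) policy, and an inverse temperature `beta` that sets how cheap information processing is. The function names and the specific form F = E[U] - KL/beta are illustrative assumptions and are not taken from the paper's text.

```python
import math

def bounded_rational_policy(utilities, prior, beta):
    """Return p(a) proportional to prior(a) * exp(beta * U(a)).

    Assumption: this exponential-tilt form is the standard maximizer of
    expected utility minus (1/beta) times the relative entropy to the prior.
    """
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [q * math.exp(beta * (u - top)) for u, q in zip(utilities, prior)]
    total = sum(weights)
    return [w / total for w in weights]

def free_energy(policy, utilities, prior, beta):
    """Expected utility minus the information cost KL(policy || prior) / beta."""
    expected_utility = sum(p * u for p, u in zip(policy, utilities))
    kl = sum(p * math.log(p / q) for p, q in zip(policy, prior) if p > 0.0)
    return expected_utility - kl / beta

if __name__ == "__main__":
    prior = [0.5, 0.3, 0.2]        # hypothetical default choice probabilities
    utilities = [1.0, 2.0, 0.5]    # hypothetical utilities
    for beta in (0.1, 1.0, 10.0, 100.0):
        policy = bounded_rational_policy(utilities, prior, beta)
        print(beta, [round(p, 4) for p in policy])
```

With `beta` near zero the policy stays at the prior, and as `beta` grows it concentrates on the highest-utility action, which mirrors the limit in which the maximum expected utility principle is recovered.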

PLOS Computational Biology | 2009

Nash Equilibria in Multi-Agent Motor Interactions

Daniel A. Braun; Pedro A. Ortega; Daniel M. Wolpert

Social interactions in classic cognitive games like the ultimatum game or the prisoner's dilemma typically lead to Nash equilibria when multiple competitive decision makers with perfect knowledge select optimal strategies. However, in evolutionary game theory it has been shown that Nash equilibria can also arise as attractors in dynamical systems that can describe, for example, the population dynamics of microorganisms. Similar to such evolutionary dynamics, we find that Nash equilibria arise naturally in motor interactions in which players vie for control and try to minimize effort. When confronted with sensorimotor interaction tasks that correspond to the classical prisoner's dilemma and the rope-pulling game, two-player motor interactions led predominantly to Nash solutions. In contrast, when a single player took both roles, playing the sensorimotor game bimanually, cooperative solutions were found. Our methodology opens up a new avenue for the study of human motor interactions within a game theoretic framework, suggesting that the coupling of motor systems can lead to game theoretic solutions.

Journal of Artificial Intelligence Research | 2010

A minimum relative entropy principle for learning and acting

Pedro A. Ortega; Daniel A. Braun

This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning | 2011

Path integral control and bounded rationality

Daniel A. Braun; Pedro A. Ortega; Evangelos A. Theodorou; Stefan Schaal

Path integral methods [1], [2], [3] have recently been shown to be applicable to a very general class of optimal control problems. Here we examine the path integral formalism from a decision-theoretic point of view, since an optimal controller can always be regarded as an instance of a perfectly rational decision-maker that chooses its actions so as to maximize its expected utility [4]. The problem with perfect rationality is, however, that finding optimal actions is often very difficult due to prohibitive computational resource costs that are not taken into account. In contrast, a bounded rational decision-maker has only limited resources and therefore needs to strike some compromise between the desired utility and the required resource costs [5]. In particular, we suggest an information-theoretic measure of resource costs that can be derived axiomatically [6]. As a consequence we obtain a variational principle for choice probabilities that trades off maximizing a given utility criterion and avoiding resource costs that arise due to deviating from initially given default choice probabilities. The resulting bounded rational policies are in general probabilistic. We show that the solutions found by the path integral formalism are such bounded rational policies. Furthermore, we show that the same formalism generalizes to discrete control problems, leading to linearly solvable bounded rational control policies in the case of Markov systems. Importantly, Bellman's optimality principle is not presupposed by this variational principle, but it can be derived as a limit case. This suggests that the information-theoretic formalization of bounded rationality might serve as a general principle in control design that unifies a number of recently reported approximate optimal control methods both in the continuous and discrete domain.

arXiv: Artificial Intelligence | 2014

Generalized Thompson sampling for sequential decision-making and causal inference

Pedro A. Ortega; Daniel A. Braun

Purpose: Sampling an action according to the probability that the action is believed to be the optimal one is sometimes called Thompson sampling. Methods: Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed by a Bayesian superposition of the policies weighted by their posterior probability of being optimal. Results: Here we discuss two important features of this approach. First, we show to what extent such generalized Thompson sampling can be regarded as an optimal strategy under limited information processing capabilities that constrain the sampling complexity of the decision-making process. Second, we show how such Thompson sampling can be extended to solve causal inference problems when interacting with an environment in a sequential fashion. Conclusion: In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.
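The sampling scheme described in this abstract can be sketched for a toy problem. The example assumes a two-armed Bernoulli bandit with two candidate environments whose optimal actions are known. The names `REWARD_PROBS`, `OPTIMAL_ACTION`, and `run_demo` are hypothetical, and the setup is an illustration rather than the authors' experiment.

```python
import random

# Hypothetical candidate environments: reward probability of each arm.
REWARD_PROBS = [[0.8, 0.2], [0.2, 0.8]]
# Optimal policy (best arm) assumed known for each candidate environment.
OPTIMAL_ACTION = [0, 1]

def sample_index(probs, rng):
    """Draw an index i with probability probs[i]."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against rounding in the cumulative sum

def run_demo(seed=0, steps=200, true_env=0):
    """Run Thompson sampling and return the final posterior over environments."""
    rng = random.Random(seed)
    posterior = [0.5, 0.5]
    for _ in range(steps):
        # Sample an environment from the posterior and act optimally for it.
        hypothesis = sample_index(posterior, rng)
        action = OPTIMAL_ACTION[hypothesis]
        # The true environment answers with a Bernoulli reward.
        reward = 1 if rng.random() < REWARD_PROBS[true_env][action] else 0
        # Only the observation updates the posterior; the agent's own action
        # is treated as given (an intervention), not as evidence.
        likelihoods = [p[action] if reward else 1.0 - p[action] for p in REWARD_PROBS]
        unnormalized = [q * l for q, l in zip(posterior, likelihoods)]
        total = sum(unnormalized)
        posterior = [x / total for x in unnormalized]
    return posterior

if __name__ == "__main__":
    print(run_demo())
```

Averaged over many draws, the action distribution is the posterior-weighted mixture of the per-environment policies, which is one way to read the "Bayesian superposition" mentioned in the abstract.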

Experimental Brain Research | 2011

Motor coordination: when two have to act as one

Daniel A. Braun; Pedro A. Ortega; Daniel M. Wolpert

Trying to pass someone walking toward you in a narrow corridor is a familiar example of a two-person motor game that requires coordination. In this study, we investigate coordination in sensorimotor tasks that correspond to classic coordination games with multiple Nash equilibria, such as “choosing sides,” “stag hunt,” “chicken,” and “battle of the sexes.” In these tasks, subjects made reaching movements reflecting their continuously evolving “decisions” while they received a continuous payoff in the form of a resistive force counteracting their movements. Successful coordination required two subjects to “choose” the same Nash equilibrium in this force-payoff landscape within a single reach. We found that on the majority of trials coordination was achieved. Compared to the proportion of trials in which miscoordination occurred, successful coordination was characterized by several distinct features: an increased mutual information between the players’ movement endpoints, an increased joint entropy during the movements, and by differences in the timing of the players’ responses. Moreover, we found that the probability of successful coordination depends on the players’ initial distance from the Nash equilibria. Our results suggest that two-person coordination arises naturally in motor interactions and is facilitated by favorable initial positions, stereotypical motor patterns, and differences in response times.

International Conference on Robotics and Automation | 2014

Monte Carlo methods for exact efficient solution of the generalized optimality equations

Pedro A. Ortega; Daniel A. Braun; Naftali Tishby

Previous work has shown that classical sequential decision making rules, including expectimax and minimax, are limit cases of a more general class of bounded rational planning problems that trade off the value and the complexity of the solution, as measured by its information divergence from a given reference. This allows modeling a range of novel planning problems having varying degrees of control due to resource constraints, risk-sensitivity, trust and model uncertainty. However, so far it has been unclear in what sense information constraints relate to the complexity of planning. In this paper, we introduce Monte Carlo methods to solve the generalized optimality equations in an efficient and exact way when the inverse temperatures in a generalized decision tree are of the same sign. These methods highlight a fundamental relation between inverse temperatures and the number of Monte Carlo proposals. In particular, it is seen that the number of proposals is essentially independent of the size of the decision tree.

Advances in Complex Systems | 2013

Metabolic Cost As An Organizing Principle For Cooperative Learning

David Balduzzi; Pedro A. Ortega; Michel Besserve

This article investigates how neurons can use metabolic cost to facilitate learning at a population level. Although decision-making by individual neurons has been extensively studied, questions regarding how neurons should behave to cooperate effectively remain largely unaddressed. Under assumptions that capture a few basic features of cortical neurons, we show that constraining reward maximization by metabolic cost aligns the information content of actions with their expected reward. Thus, metabolic cost provides a mechanism whereby neurons encode expected reward into their outputs. Further, aside from reducing energy expenditures, imposing a tight metabolic constraint also increases the accuracy of empirical estimates of rewards, increasing the robustness of distributed learning. Finally, we present two implementations of metabolically constrained learning that confirm our theoretical finding. These results suggest that metabolic cost may be an organizing principle underlying the neural code, and may also provide a useful guide to the design and analysis of other cooperating populations.

Entropy | 2014

Information-Theoretic Bounded Rationality and ε-Optimality

Daniel A. Braun; Pedro A. Ortega

Bounded rationality concerns the study of decision makers with limited information processing resources. Previously, the free energy difference functional has been suggested to model bounded rational decision making, as it provides a natural trade-off between an energy or utility function that is to be optimized and information processing costs that are measured by entropic search costs. The main question of this article is how the information-theoretic free energy model relates to simple ε-optimality models of bounded rational decision making, where the decision maker is satisfied with any action in an ε-neighborhood of the optimal utility. We find that the stochastic policies that optimize the free energy trade-off comply with the notion of ε-optimality. Moreover, this optimality criterion even holds when the environment is adversarial. We conclude that the study of bounded rationality based on ε-optimality criteria that abstract away from the particulars of the information processing constraints is compatible with the information-theoretic free energy model of bounded rationality.
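One elementary way to see a link of this kind is to check numerically how far the expected utility of an exponentially tilted policy falls below the best achievable utility. The sketch assumes a finite action set and a policy proportional to prior times exp(beta * utility). The bound log(1 / prior[best]) / beta used here is a standard textbook-style inequality chosen for illustration, not a result quoted from the article, and all function names are hypothetical.

```python
import math

def tilted_policy(utilities, prior, beta):
    """Policy proportional to prior(a) * exp(beta * U(a)) (assumed form)."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [q * math.exp(beta * (u - top)) for u, q in zip(utilities, prior)]
    total = sum(weights)
    return [w / total for w in weights]

def utility_gap(utilities, prior, beta):
    """Shortfall of the policy's expected utility relative to the best action."""
    policy = tilted_policy(utilities, prior, beta)
    expected = sum(p * u for p, u in zip(policy, utilities))
    return max(utilities) - expected

def gap_bound(utilities, prior, beta):
    """Illustrative upper bound on the gap: log(1 / prior[best]) / beta."""
    best = max(range(len(utilities)), key=lambda i: utilities[i])
    return math.log(1.0 / prior[best]) / beta

if __name__ == "__main__":
    prior = [0.5, 0.3, 0.2]        # hypothetical default policy
    utilities = [1.0, 2.0, 0.5]    # hypothetical utilities
    for beta in (0.5, 1.0, 5.0, 10.0):
        print(beta, utility_gap(utilities, prior, beta), gap_bound(utilities, prior, beta))
```

Under these assumptions the shortfall plays the role of ε and shrinks as `beta` grows, so a decision maker with more processing resources lands in a smaller neighborhood of the optimal utility.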

Current Opinion in Neurobiology | 2014

Dynamic belief state representations.

Daniel D. Lee; Pedro A. Ortega; Alan A. Stocker

Perceptual and control systems are tasked with the challenge of accurately and efficiently estimating the dynamic states of objects in the environment. To properly account for uncertainty, it is necessary to maintain a dynamical belief state representation rather than a single state vector. In this review, canonical algorithms for computing and updating belief states in robotic applications are delineated, and connections to biological systems are highlighted. A navigation example is used to illustrate the importance of properly accounting for correlations between belief state components, and to motivate the need for further investigations in psychophysics and neurobiology.
