Modular Object-Oriented Games: A Task Framework for Reinforcement Learning, Psychology, and Neuroscience
2021-02-17
Nicholas Watters, Joshua Tenenbaum, Mehrdad Jazayeri
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology; Center for Brains, Minds and Machines, MIT; McGovern Institute for Brain Research, MIT
In recent years, trends towards studying simulated games have gained momentum in the fields of artificial intelligence, cognitive science, psychology, and neuroscience. The intersections of these fields have also grown recently, as researchers increasingly study such games using both artificial agents and human or animal subjects. However, implementing games can be a time-consuming endeavor and may require a researcher to grapple with complex codebases that are not easily customized. Furthermore, interdisciplinary researchers studying some combination of artificial intelligence, human psychology, and animal neurophysiology face additional challenges, because existing platforms are designed for only one of these domains. Here we introduce Modular Object-Oriented Games (MOOG), a Python task framework that is lightweight, flexible, customizable, and designed for use by machine learning, psychology, and neurophysiology researchers.
1. Introduction
In recent years, trends towards studying object-based games have gained momentum in the fields of artificial intelligence, cognitive science, psychology, and neuroscience. In artificial intelligence, interactive physical games are now a common testbed for reinforcement learning (François-Lavet et al., 2018; Leike et al., 2017; Mnih et al., 2013; Sutton and Barto, 2018) and object representations are of particular interest for sample-efficient and generalizable AI (Battaglia et al., 2018; Greff et al., 2020; van Steenkiste et al., 2019). In cognitive science and psychology, object-based games are used to study a variety of cognitive capacities, such as planning, intuitive physics, and intuitive psychology (Chabris, 2017; Ullman et al., 2017). Developmental psychologists also use object-based visual stimuli to probe questions about object-oriented reasoning in infants and young animals (Spelke and Kinzler, 2007; Wood et al., 2020). In neuroscience, object-based computer games have recently been used to study decision-making and physical reasoning in both human and non-human primates (Fischer et al., 2016; McDonald et al., 2019; Rajalingham et al., 2021; Yoo et al., 2020).

Furthermore, a growing number of researchers are studying tasks using a combination of approaches from these fields. Comparing artificial agents with humans or animals performing the same tasks can help constrain models of human/animal behavior, generate hypotheses for neural mechanisms, and may ultimately facilitate building more intelligent artificial agents (Hassabis et al., 2017; Lake et al., 2017; Willke et al., 2019).

However, building a task that can be played by AI agents, humans, and animals is a time-consuming undertaking because existing platforms are typically designed for only one of these purposes. Professional game engines are designed for human play and are often complex libraries that are difficult to customize for training AI agents and animals.
Reinforcement learning platforms are designed for AI agents but are often too slow or inflexible for neuroscience work. Existing psychology and neurophysiology platforms are too limited to easily support complex interactive games.

In this work we offer a game engine that is highly customizable and designed to support tasks that can be played by AI agents, humans, and animals.
Corresponding author(s): [email protected]
2. Summary
The Modular Object-Oriented Games (MOOG) library is a general-purpose Python-based platform for interactive games. It aims to satisfy the following criteria:

• Usable for reinforcement learning, psychology, and neurophysiology. MOOG supports DeepMind dm_env and OpenAI Gym (Brockman et al., 2016) interfaces for RL agents and an MWorks interface for psychology and neurophysiology.

• Highly customizable. Environment physics, reward structure, agent interface, and more can be customized.

• Easy to rapidly prototype tasks. Tasks can be composed in a single short file.

• Light-weight and efficient. Most tasks run quickly, almost always faster than 100 frames per second on CPU and often much faster than that.

• Facilitates procedural generation for randomizing task conditions every trial.
3. Intended Users
MOOG was designed for use by the following kinds of researchers:

• Machine learning researchers studying reinforcement learning in 2.5-dimensional (2-D with occlusion) physical environments who want to quickly implement tasks in Python.

• Psychology researchers who want more flexibility than existing psychology platforms afford.

• Neurophysiology researchers who want to study interactive games yet still need to precisely control stimulus timing.

• Machine learning researchers studying unsupervised learning, particularly in the video domain. MOOG can be used to procedurally generate video datasets with controlled statistics.

MOOG may be particularly useful for interdisciplinary researchers studying AI agents, humans, and animals (or some subset thereof) all playing the same task.
4. Design
The core philosophy of MOOG is "one task, one file." Namely, a task should be implemented with a single configuration file. This configuration file is a short "recipe" for the task, containing as little substantive code as possible, and should define a set of components to pass to the MOOG environment. See Figure 1 for a schematic of these components.

Figure 1 | Components of a MOOG environment. See main text for details.

A MOOG environment receives the following components (or callables returning them) from the configuration file:

• State. The state is a collection of sprites. Sprites are polygonal shapes with color and physical attributes (position, velocity, angular velocity, and mass). Sprites are 2-dimensional, and the state is 2.5-dimensional with z-ordering for occlusion. The initial state can be procedurally generated from a custom distribution at the beginning of each episode. The state is structured in terms of layers, which helps hierarchical organization. See state_initialization for procedural generation tools.

• Physics. The physics component is a collection of forces that operate on the sprites. There are a variety of forces built into MOOG (collisions, friction, gravity, rigid tethers, ...) and additional custom forces can also be used. Forces perturb the velocity and angular velocity of sprites, and the sprite positions and angles are updated with Newton's method. See physics for more.

• Task. The task defines the rewards and specifies when to terminate a trial. See tasks for more.

• Action Space. The action space allows the subject to control the environment. Every environment step calls for an action from the subject. Action spaces may impart a force on a sprite (like a joystick), move a sprite in a grid (like an arrow-key interface), set the position of a sprite (like a touch-screen), or be customized. The action space may also be a composite of constituent action spaces, allowing for multi-agent tasks and multi-controller games. See action_spaces for more.

• Observers. Observers transform the environment state into an observation for the subject/agent playing the task. Typically, the observer includes a renderer producing an image. However, it is possible to implement a custom observer that exposes any function of the environment state. The environment can also have multiple observers. See observers for more.

• Game Rules (optional). If provided, the game rules define dynamics or transitions not captured by the physics. A variety of game rules are included, including rules to modify sprites when they come in contact, conditionally create new sprites, and control the phase structure of trials (e.g. fixation phase to stimulus phase to response phase). See game_rules for more.

Importantly, all of these components can be fully customized. If a user would like a physics force, action space, or game rule not provided by MOOG, they can implement a custom one, inheriting from the abstract base class for that component. This can typically be done with only a few lines of code.

The modularity of MOOG facilitates code re-use across experimental paradigms. For example, if a user would like to both collect behavior data from humans using a continuous joystick and train RL agents with discrete action spaces on the same task, they can re-use all other components in the task configuration, only changing the action space.

For users interested in doing psychology or neurophysiology, we include an example of how to run MOOG through MWorks, a platform with precise timing control and interfaces for eye trackers, HID devices, electrophysiology software, and more.
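The customization pattern above can be sketched in a few lines of plain Python. The class names and sprite representation below are illustrative only (they are not MOOG's actual base classes): a custom force subclasses an abstract base and perturbs sprite velocities, after which positions are advanced by one Newton/Euler-style step.

```python
import abc

class AbstractForce(abc.ABC):
    """Illustrative base class; MOOG's real abstract force class may differ."""

    @abc.abstractmethod
    def apply(self, sprite, dt):
        """Perturb the sprite's velocity in place."""

class ConstantGravity(AbstractForce):
    """Hypothetical custom force: constant downward acceleration."""

    def __init__(self, g=-0.001):
        self._g = g

    def apply(self, sprite, dt):
        sprite['vy'] += self._g * dt

def physics_step(sprites, forces, dt=1.0):
    """Forces perturb velocities; positions then update by one Euler step."""
    for s in sprites:
        for f in forces:
            f.apply(s, dt)
        s['x'] += s['vx'] * dt
        s['y'] += s['vy'] * dt

# Sprites here are plain dicts for illustration; MOOG uses Sprite objects.
ball = {'x': 0.5, 'y': 0.5, 'vx': 0.01, 'vy': 0.0}
physics_step([ball], [ConstantGravity()])
# After one step: vy = -0.001, x = 0.51, y = 0.499
```

Because each component is only coupled to the environment through such a small interface, swapping in a custom force (or action space, or game rule) leaves the rest of a task configuration untouched.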
5. Example Tasks
See the example_configs for a variety of task config files. Four of those are shown in Figure 2. See the demo documentation for videos of them all and instructions for how to run them with a Python GUI.

The MOOG codebase contains libraries of options for each of the components in Section 4, so implementing a task involves only combining the desired ingredients and feeding them to the environment. For an example, the following code fully implements a navigate-to-goal task, where the subject must move an agent via a joystick action space to a goal location:
Figure 2 | Example tasks. Time-lapse images of four example tasks. Left-to-right: (i) Pong - The subject aims to catch the yellow ball with the green paddle, (ii) Red-Green - The subject tries to predict whether the blue ball will contact the red square or the green square, (iii) Pac-Man - The subject moves the green agent to catch yellow pellets while avoiding the red ghosts, (iv) Collisions - The green agent avoids touching the bouncing polygons.

"""Navigate-to-goal task."""
import collections
from moog import action_spaces, environment, observers, physics, sprite, tasks

def state_initializer():
    goal = sprite.Sprite(x=0.1, y=0.1, shape='square', scale=0.1, c0=255)
    agent = sprite.Sprite(x=0.5, y=0.5, shape='circle', scale=0.1, c1=255)
    state = collections.OrderedDict([
        ('goal', [goal]),
        ('agent', [agent]),
    ])
    return state

phys = physics.Physics((physics.Drag(coeff_friction=0.25), 'agent'))
task = tasks.ContactReward(1., 'agent', 'goal', reset_steps_after_contact=5)
action_space = action_spaces.Joystick(0.01, 'agent')
obs = observers.PILRenderer(image_size=(256, 256))

env = environment.Environment(
    state_initializer=state_initializer,
    physics=phys,
    task=task,
    action_space=action_space,
    observers={'image': obs},
)

This is an extremely simple task, but by complexifying the state initializer and adding additional forces and game rules, a wide range of complex tasks can be implemented with few lines of code.
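Once constructed, such an environment is driven by a standard RL interaction loop through its dm_env or Gym interface (Section 2). The sketch below uses a hypothetical stub in place of a real MOOG environment so that it is self-contained; the stub's `reset`/`step` signature is an assumption for illustration, not MOOG's exact API.

```python
import random

class StubEnvironment:
    """Hypothetical stand-in for a MOOG environment (not the real API)."""

    def __init__(self, max_steps=10):
        self._max_steps = max_steps
        self._t = 0

    def reset(self):
        """Begin a new trial; return an observation dict keyed by observer name."""
        self._t = 0
        return {'image': None}

    def step(self, action):
        """Advance one timestep; reward 1.0 when the trial terminates."""
        self._t += 1
        reward = 1.0 if self._t == self._max_steps else 0.0
        done = self._t >= self._max_steps
        return {'image': None}, reward, done

def run_episode(env, policy):
    """Step the environment with a policy until the trial terminates."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total_reward += reward
    return total_reward

# A random joystick-like policy: a 2-D action in [-1, 1] x [-1, 1].
total = run_episode(
    StubEnvironment(),
    policy=lambda obs: (random.uniform(-1, 1), random.uniform(-1, 1)),
)
```

The same loop serves all three intended uses: an RL agent supplies `policy`, while for humans or animals the action is read from a joystick or other HID device each step.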
6. Limitations
Users should be aware of the following limitations of MOOG before choosing to use it for their research:

• Not 3D. MOOG environments are 2.5-dimensional, meaning that they render in 2 dimensions with z-ordering for occlusion. MOOG does not support 3D sprites.

• Very simple graphics. MOOG sprites are monochromatic polygons. There are no textures, shadows, or other visual effects. Composite sprites can be implemented by creating multiple overlapping sprites, but the graphics complexity is still very limited. This has the benefit of a small and easily parameterizable set of factors of variation of the sprites, but does make MOOG environments visually unrealistic.

• Imperfect collisions. MOOG's collision module implements Newtonian rotational mechanics, but it is not as robust as professional physics engines (e.g. it can be unstable if objects are moving very quickly and many collisions occur simultaneously).
7. Related Software
Professional game engines (e.g. Unity and Unreal) and visual reinforcement learning platforms (e.g. DeepMind Lab (Beattie et al., 2016), MuJoCo (Todorov et al., 2012), and VizDoom) are commonly used in the machine learning field for task implementation. While MOOG has some limitations compared to these (see above), it also offers some advantages:

• Python. MOOG tasks are written purely in Python, so users who are most comfortable with Python will find MOOG easy to use.

• Procedural Generation. MOOG facilitates procedural generation, with a library of compositional distributions to randomize conditions across trials.

• Online Simulation. MOOG supports online model-based RL, with a ground-truth simulator for tree search.

• Psychophysics. MOOG can be run with MWorks, a psychophysics platform.

• Speed. MOOG is fast on CPU. While the speed depends on the task and rendering resolution, MOOG typically runs at 200 fps with 512x512 resolution on a CPU, much faster than DeepMind Lab and MuJoCo and at least as fast as Unity and Unreal.

Python-based physics simulators, such as PyBullet (Coumans and Bai, 2016-2019) and Pymunk, are sometimes used in the psychology literature. While these offer highly accurate collision simulation, MOOG offers the following advantages:

• Customization. Custom forces and game rules can be easily implemented in MOOG.

• Psychophysics, Procedural Generation, and Online Simulation, as described above.

• RL Interface. A task implemented in MOOG can be used out-of-the-box to train RL agents, since MOOG is Python-based and has DeepMind dm_env and OpenAI Gym interfaces.

Psychology and neurophysiology researchers often use platforms such as PsychoPy (Peirce et al., 2019), PsychToolbox (Kleiner et al., 2007), and MWorks. These allow precise timing control and coordination with eye trackers and other controllers. MOOG can interface with MWorks to leverage all of those features and offers the following additional benefits:

• Flexibility. MOOG offers a large scope of interactive tasks. Existing psychophysics platforms are not easily customized for game-like tasks, action interfaces, and arbitrary object shapes.

• Physics. Existing psychophysics platforms do not have built-in physics, such as forces, collisions, etc.

• RL Interface, as described above.
Acknowledgments
We thank Chris Stawarz and Erica Chiu for their contributions to the codebase. We also thank Ruidong Chen, Setayesh Radkani, and Michael Yoo for their feedback as early users of MOOG.
References
P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu. Relational inductive biases, deep learning, and graph networks, 2018.

C. Beattie, J. Z. Leibo, D. Teplyashin, T. Ward, M. Wainwright, H. Küttler, A. Lefrancq, S. Green, V. Valdés, A. Sadik, J. Schrittwieser, K. Anderson, S. York, M. Cant, A. Cain, A. Bolton, S. Gaffney, H. King, D. Hassabis, S. Legg, and S. Petersen. DeepMind Lab. CoRR, abs/1612.03801, 2016. URL http://arxiv.org/abs/1612.03801.

G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. CoRR, abs/1606.01540, 2016. URL http://arxiv.org/abs/1606.01540.

C. F. Chabris. Six suggestions for research on games in cognitive science, 2017. URL https://doi.org/10.1111/tops.12267.

E. Coumans and Y. Bai. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org, 2016-2019.

J. Fischer, J. G. Mikhael, J. B. Tenenbaum, and N. Kanwisher. Functional neuroanatomy of intuitive physical inference. Proceedings of the National Academy of Sciences, 113(34):E5072-E5081, 2016.

V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare, and J. Pineau. An introduction to deep reinforcement learning. Foundations and Trends in Machine Learning, 11(3-4):219-354, 2018. ISSN 1935-8245. doi: 10.1561/2200000071. URL http://dx.doi.org/10.1561/2200000071.

K. Greff, S. van Steenkiste, and J. Schmidhuber. On the binding problem in artificial neural networks. arXiv preprint arXiv:2012.05208, 2020.

D. Hassabis, D. Kumaran, C. Summerfield, and M. Botvinick. Neuroscience-inspired artificial intelligence. Neuron, 95(2):245-258, 2017.

M. Kleiner, D. Brainard, D. Pelli, A. Ingling, R. Murray, and C. Broussard. What's new in Psychtoolbox-3, 2007.

B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 40, 2017.

J. Leike, M. Martic, V. Krakovna, P. A. Ortega, T. Everitt, A. Lefrancq, L. Orseau, and S. Legg. AI safety gridworlds, 2017.

K. R. McDonald, W. F. Broderick, S. A. Huettel, and J. M. Pearson. Bayesian nonparametric models characterize instantaneous strategies in a competitive dynamic game. Nature Communications, 10(1):1-12, 2019.

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

J. Peirce, J. R. Gray, S. Simpson, M. MacAskill, R. Höchenberger, H. Sogo, E. Kastman, and J. K. Lindeløv. PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1):195-203, 2019. ISSN 1554-3528. doi: 10.3758/s13428-018-01193-y. URL https://doi.org/10.3758/s13428-018-01193-y.

R. Rajalingham, A. Piccato, and M. Jazayeri. The role of mental simulation in primate physical inference abilities. bioRxiv, 2021. doi: 10.1101/2021.01.14.426741.

E. S. Spelke and K. D. Kinzler. Core knowledge. Developmental Science, 10(1):89-96, 2007.

R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. MIT Press, 2018.

E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In IROS, pages 5026-5033. IEEE, 2012. ISBN 978-1-4673-1737-5. URL http://dblp.uni-trier.de/db/conf/iros/iros2012.html.

T. D. Ullman, E. Spelke, P. Battaglia, and J. B. Tenenbaum. Mind games: Game engines as an architecture for intuitive physics. Trends in Cognitive Sciences, 21(9):649-665, 2017.

S. van Steenkiste, K. Greff, and J. Schmidhuber. A perspective on objects and systematic generalization in model-based RL. arXiv preprint arXiv:1906.01035, 2019.

T. L. Willke, S. B. M. Yoo, M. Capotă, S. Musslick, B. Y. Hayden, and J. D. Cohen. A comparison of non-human primate and deep reinforcement learning agent performance in a virtual pursuit-avoidance task. bioRxiv, page 567545, 2019.

J. N. Wood, D. Lee, B. W. Wood, and S. M. Wood. Reverse engineering the origins of visual intelligence. 2020.

S. B. M. Yoo, J. C. Tu, S. T. Piantadosi, and B. Y. Hayden. The neural basis of predictive pursuit. Nature Neuroscience, 23(2):252-259, 2020.